SQL and XML - Calculation of longest continuous pause

I have a very specific problem which i was hoping somebody could shed some light on. It is not exactly an error but more so help on the query i need to run to return the desired result set.

I have a table called xml_table with 2 columns; word_id, word_data:

word_id | word_data
1       | <results><channel id="1"><r s="0" d="650" w="Hello"/><r s="650" d="230" w="SIL"/></channel></results>
2       | <results><channel id="1"><r s="0" d="350" w="Sorry"/><r s="350" d="10" w="WHO"/></channel></results>
3       | <results><channel id="1"><r s="0" d="750" w="Please"/><r s="750" d="50" w="s"/></channel></results>
...

and so on where word_data is an XML String.

The XML String within each row is of the following format:

<results>
   <channel id="1">
      <r s="0" d="100" w="SIL"/>
      <r s="100" d="250" w="Sorry"/>
      <r s="350" d="100" w="WHO"/>
      <r s="450" d="350" w="SIL"/>
      <r s="800" d="550" w="SIL"/>
      <r s="1350" d="100" w="Hello"/>
      <r s="1450" d="200" w="s"/>
      <r s="1650" d="50" w="SIL"/>
      <r s="1700" d="100" w="SIL"/>
   </channel>
</results>

s represents start time d represents duration w represents word (the number of r tag is NOT fixed and changes from row to row of xml_table)

The idea now is to sift through each row, and within each XML, calculate the longest consecutive duration when a 'SIL' or 's' appears as a in the w attribute and then to return this in a new table as longest_pause (i.e longest consecutive SIL/s duration) with word_id and word_data also.

So in the above example xml we have three consecutive periods where the longest_pause can occur where the total durations are 100 (100), 900 (350+550) and 350 (200 + 50 + 100) and therefore the longest_pause is 900 so 900 would be returned.

I was wondering if anybody could help with this, so far i have:

DECLARE @xml XML
DECLARE @ordered_table TABLE (id VARCHAR(20) NOT NULL, start_time INT NOT NULL, duration INT NOT NULL, word VARCHAR(50) NOT NULL)

SELECT @xml = (SELECT word_data FROM xml_table where word_id = 1)
INSERT into @ordered_table_by_time(id, start_time, duration, word)

SELECT 'NAME' AS id, Tbl.Col.value('@s', 'INT'), Tbl.Col.value('@d', 'INT'), Tbl.Col.value('@w', 'varchar(50)') FROM @xml.nodes('/results/channel[@id="1"]/r') Tbl(Col)

i.e, I have created a table to put the XML into, but i do not know where to go from there,

Please can somebody help?

Thank you :)

Answers


Your attempt at solving this looks like you want to find the longest duration for one XML but the text suggests that you want to find the row in xml_table that has the longest duration.

Working with the one XML instance and modified version of your table variable you could do like this.

DECLARE @xml XML = '
<results>
   <channel id="1">
      <r s="0" d="100" w="SIL"/>
      <r s="100" d="250" w="Sorry"/>
      <r s="350" d="100" w="WHO"/>
      <r s="450" d="350" w="SIL"/>
      <r s="800" d="550" w="SIL"/>
      <r s="1350" d="100" w="Hello"/>
      <r s="1450" d="200" w="s"/>
      <r s="1650" d="50" w="SIL"/>
      <r s="1700" d="100" w="SIL"/>
   </channel>
</results>';

DECLARE @ordered_table TABLE
(
  id INT NOT NULL,
  start_time INT NOT NULL,
  duration INT NOT NULL,
  word VARCHAR(50) NOT NULL
);

INSERT INTO @ordered_table(id, start_time, duration, word)
SELECT row_number() over(order by Tbl.Col.value('@s', 'INT')), 
       Tbl.Col.value('@s', 'INT'), 
       Tbl.Col.value('@d', 'INT'), 
       Tbl.Col.value('@w', 'varchar(50)') 
FROM @xml.nodes('/results/channel[@id="1"]/r') Tbl(Col);

WITH C AS
(
  SELECT T.id,
         CASE WHEN T.word IN ('S', 'SIL') THEN T.duration ELSE 0 END AS Dur
  FROM @ordered_table as T
  WHERE T.ID = 1
  UNION ALL
  SELECT T.id,
         CASE WHEN T.word IN ('S', 'SIL') THEN C.Dur + T.duration ELSE 0 END AS Dur
  FROM @ordered_table as T
    INNER JOIN C 
      ON T.ID = C.ID + 1
)
SELECT TOP(1) *
FROM C
ORDER BY C.Dur DESC;

SQL Fiddle

I added a ID field that is used in a recursive CTE to walk through the nodes and calculating a running sum where w is SIL or s. Then fetching the longest duration from the CTE using TOP(1) ... ORDER BY.

If you instead want the row in xml_table with the longest duration you can do like this.

with C as
(
  select 1 as node,
         X.word_id,
         X.word_data,
         case when T.W in ('S', 'SIL') then T.D else 0 end as duration
  from dbo.xml_table as X
    cross apply (select X.word_data.value('(/results/channel[@id = "1"]/r/@d)[1]', 'int'),
                        X.word_data.value('(/results/channel[@id = "1"]/r/@w)[1]', 'nvarchar(100)')) as T(D, W)
  union all 
  select C.node + 1,
         X.word_id,
         X.word_data,
         case when T.W in ('S', 'SIL') then T.D + C.duration else 0 end as duration
  from C
    inner join dbo.xml_table as X
      on X.word_id = C.word_id
    cross apply (select X.word_data.value('(/results/channel[@id = "1"]/r/@d)[sql:column("C.Node")+1][1]', 'int'),
                        X.word_data.value('(/results/channel[@id = "1"]/r/@w)[sql:column("C.Node")+1][1]', 'nvarchar(100)')) as T(D, W)
  where T.W is not null
)
select T.word_id,
       T.word_data,
       T.duration
from
  (
  select row_number() over(partition by C.word_id order by C.duration desc) as rn,
         C.word_id,
         C.word_data,
         C.duration
  from C
  ) as T
where T.rn = 1
option (maxrecursion 0);

SQL Fiddle

The recursive CTE part works the same as before but but for multiple rows at the same time and it is getting the value for duration from the XML directly using the column node that is incremented for each iteration. The query against the CTE uses row_number() to find the longest duration for each row.


Have you considered using something like python instead?

You can query the SQL to get the data, then use regular expressions to extract the values from the XML, calculate the value wanted, then insert it back into the results table.

I recently did something slightly similar and decided doing the processing in python was a much easier way to do it if that's possible for you


Need Your Help

merge changes to git repo as if they had happened in the past

git

I'm pretty new to git, so forgive me if this is obvious (or impossible)!

Close AutoCompleteTextView dropdown

java android autocomplete

I've got an AutoCompleteTextView with a dropdown list. How can I close dropdown programmatically?