MYSQL nested/ correlated subquery with group_concat and unequal keys

I'm trying to write a MYSQL query to assemble data from multiple rows of one table and then join that result with another table. I am using GROUP_CONCAT to convert 4+ rows into one string and that part works to some extent, but not the way I want it to.

Some background first to help visualize the problem: I'm working on a Home Automation project for myself and looking for a way to correlate sensor input from various sources. I have a variety of sensors that are streaming data into different mysql tables and my primary means of relating data across tables is to use timestamps in each table which may not always match precisely and that's where I have trouble using simple SQL clauses. I'm asking for help now on one particular problem that I've not been able to solve for days.

I have setup two sample tables with some data and one query in sqlfiddle here

I will break up the question into 2 parts:

  1. I have a table that captures keystrokes from a keypad. Users can do multiple things on the keypad including entering a security code to unlock functions. A security code can be 4 or 5 digits long and always ends with a *. A 4 digit code results in 5 unique records in the database - 4 for the codes and 1 for *. I have to read these back in reverse order to read the code that was entered. I wrote the query below to build these codes for me.
SELECT GROUP_CONCAT(keyvalue ORDER BY id SEPARATOR '') AS code,
       skey2 
FROM (
    SELECT keyvalue,skey2,skey1,id 
    FROM lockactivity AS la 
    WHERE la.skey2 <= 92956 
    ORDER BY la.id DESC LIMIT 6 ) AS codelist
WHERE length(codelist.keyvalue)=1 AND codelist.keyvalue<>'*'

This handles 4/5 digit codes OK so far. The table may contain rows of codes back to back, or have other records in between.

I have two challenges with this query:

a) The first and likely obvious problem is that the query runs just fine for 1 condition at a time (la.skey -> 92956 in the query above). I can't seem to run this for the entire table at once. The number 92956 in the query above is a numeric representation of the time that the user pressed * one one particular occasion and i'm reading a few records before that time to assemble the code string. Ideally I want to get a list of all skey2's where the user pressed * with the corresponding code

b) Its also possible that more than one sensors log to the table at the same time so codes may be jumbled up with the lockip field being the unique identifier for the sensors.

  1. Once the first problem has been addressed, I want to join that output with the output of another table such that i show all records from the second table and just the code from the first table (as above). The two tables are related by skey2 - the time that the user presses * generates an image and the details of that are captured in the other table. I expect I will be able to do a simple join once the first query is available. If that's not possible or will be better accomplished another way please advise.

The database and all associated code (and these queries) will be running on a raspberry pi so I may be constrained by resources to some extent and I'm looking for ways advise to also make sure that the final queries are as efficient as possible. Both tables are expected to grow by a 100+ records each day.

I have attempted many combinations of queries in the last days - nested, correlated and combinations of both. I think this is more complicated because the code that I am looking for has to be assembled using records that have no real link between them (the skey2 value changes between individual records of one code). I also hit a roadblock trying to write a correlated query to directly attempt the output desired in question 2 above and i think that failed because MySQL won't let me have a correlation two levels deep.

Thanks for reading so far and my apology for the very long post. Again, the related tables and data are available on Sqlfiddle here

Answers


This is a time-series problem. You have a sequence of keystrokes (intermixed with other stuff) in this lockactivity table. We need to separate the keystrokes into groups that are nearby to each other in time.

This query (http://sqlfiddle.com/#!2/6ffed/21/0) does that. Just for completeness, I'm leaving in the * keystrokes.

SELECT TIMESTAMPDIFF(SECOND, @prevVal, req_time) AS timediff,
       @prevVal := req_time AS req_time,
       lockip, keyvalue
  FROM lockactivity,
       (SELECT @prevVal := MIN(req_time) -
                           INTERVAL 1 HOUR FROM lockactivity) AS r
 WHERE LENGTH(keyvalue) = 1
 ORDER BY lockip, id

The first few rows of the result set look like this:

TIMEDIFF     REQ_TIME           LOCKIP         KEYVALUE
3604      2014-02-27 09:29:55   192.168.1.49    3
0         2014-02-27 09:29:55   192.168.1.49    6
1         2014-02-27 09:29:56   192.168.1.49    4
0         2014-02-27 09:29:56   192.168.1.49    1
0         2014-02-27 09:29:56   192.168.1.49    *
1155      2014-02-27 09:49:11   192.168.1.49    3

See how each new "bunch" of keystrokes starts with a relatively large timediff?

Our next step is to create a scheme for putting a new serial number on each bunch of keystrokes. This nasty query (http://sqlfiddle.com/#!2/6ffed/25/0) does that.

SELECT TIMESTAMPDIFF(SECOND, @prevVal, req_time) AS timediff,
       IF(ABS(TIMESTAMPDIFF(SECOND, @prevVal, req_time))> 120,
          @seq:=@seq+1, 
          @seq) AS seq,
       @prevVal := req_time AS req_time,
       lockip, keyvalue
  FROM lockactivity,
       (SELECT @prevVal := MIN(req_time) - 
                           INTERVAL 1 HOUR FROM lockactivity) AS r,
       (SELECT @seq := 0) AS s
 WHERE LENGTH(keyvalue) = 1
 ORDER BY lockip, id

Notice the 120 above. I arbitrarily chose to cluster keystrokes that have a delay of 120 seconds or less one from the next. You may need to choose a different clustering number. Notice also that when we get the second ip, the timediff jumps backward, so I used ABS() for the clustering criterion.

Finally, we need to summarize this stuff with GROUP_CONCAT, etc. This query (http://sqlfiddle.com/#!2/6ffed/28/0) does that.

SELECT lockip,
       GROUP_CONCAT(keyvalue 
                    ORDER BY id
                    SEPARATOR '') AS keystrokes,
       MAX(req_time) AS finish_time

FROM (       

SELECT TIMESTAMPDIFF(SECOND, @prevVal, req_time) AS timediff,
       IF(ABS(TIMESTAMPDIFF(SECOND, @prevVal, req_time))> 120,
          @seq:=@seq+1, 
          @seq) AS seq,
       @prevVal := req_time AS req_time,
       id, lockip, keyvalue
  FROM lockactivity,
       (SELECT @prevVal := MIN(req_time) - 
                           INTERVAL 1 HOUR FROM lockactivity) AS r,
       (SELECT @seq := 0) AS s
 WHERE LENGTH(keyvalue) = 1
 ORDER BY lockip, id
) AS seq
GROUP BY seq, lockip
ORDER BY MAX(req_time)

Here's the result set, which seems to be exactly what you need. I am not completely sure about the last two key sequences. You may need to fiddle around with the cluster time to get that stuff right.

LOCKIP         KEYSTROKES   FINISH_TIME
192.168.1.49   3641*        2014-02-27 09:29:56
192.168.1.49   3            2014-02-27 09:49:11
192.168.1.49   3641*        2014-02-27 20:29:49
192.168.1.49   3641*        2014-02-27 20:33:32
192.168.1.55   1122*        2014-02-27 21:06:42
192.168.1.55   1122**       2014-02-27 21:45:52
192.168.1.55   1122*        2014-02-27 22:12:38
192.168.1.49   3641*11015*  2014-02-27 22:13:11
192.168.1.49   33015*11015* 2014-02-27 22:20:10

Finally, the OP made some further adjustments to the query to split out the star-delimited sequences.

Update by jinxjy: Per my comment, I've made some small changes to the final query above to use * as a separator in addition to the timestamp sequence created earlier. Other small edits include changing the 120 to 6 as that's the standard keystroke timeout on the keypad and hiding the * in the final code column. This has also been posted to sqlfiddle (http://sqlfiddle.com/#!2/6ffed/53/0)

 SELECT lockip,
       GROUP_CONCAT(keyvalue 
                    ORDER BY id
                    SEPARATOR '') AS code,
       MAX(req_time) AS finish_time
FROM (
SELECT TIMESTAMPDIFF(SECOND, @prevVal, req_time) AS timediff,
       IF(keyvalue="*",
          @flag:=1,
          @flag:=0) AS flag,       
       IF(ABS(TIMESTAMPDIFF(SECOND, @prevVal, req_time))> 6,
          @seq:=@seq+1+@flag, 
          @seq:=@seq+@flag) AS seq,
       @prevVal := req_time AS req_time,
       id, lockip, keyvalue
  FROM lockactivity,
       (SELECT @prevVal := MIN(req_time) - INTERVAL 1 HOUR FROM lockactivity) AS r,
       (SELECT @seq := 0) AS s,
       (SELECT @flag:= 0) AS n
 WHERE LENGTH(keyvalue) = 1
 ORDER BY lockip, id
) AS seq
WHERE flag=0
GROUP BY seq, lockip
ORDER BY MAX(req_time) DESC

This gives me the ideal result:

|       LOCKIP |  CODE |         FINISH_TIME |
|--------------|-------|---------------------|
| 192.168.1.49 | 11015 | 2014-02-27 22:20:10 |
| 192.168.1.49 | 33015 | 2014-02-27 22:20:04 |
| 192.168.1.49 | 11015 | 2014-02-27 22:13:11 |
| 192.168.1.49 |  3641 | 2014-02-27 22:13:06 |
| 192.168.1.55 |  1122 | 2014-02-27 22:12:38 |
| 192.168.1.55 |  1122 | 2014-02-27 21:45:48 |
| 192.168.1.55 |  1122 | 2014-02-27 21:06:41 |
| 192.168.1.49 |  3641 | 2014-02-27 20:33:32 |
| 192.168.1.49 |  3641 | 2014-02-27 20:29:48 |
| 192.168.1.49 |     3 | 2014-02-27 09:49:11 |
| 192.168.1.49 |  3641 | 2014-02-27 09:29:56 |

Interesting problem!


Need Your Help

Akka Futures Exceptions

scala exception exception-handling akka future

What happens when an actor of a future throws an exception?