Oracle SQL — Fixing data with “impossible” date ranges
I am working with a table that, for each ID, has a list of "events", where an event is whether or not the object represented by that ID went "On" or "Off." Items assumingly start as off if they are not in this table. An item can only legally go on if was not on at that point.
Here's an example of what it might look like:
Events ID Type Date 1 On 01-MAY-12 1 Off 01-JUN-12 1 On 05-JUN-12
Unfortunately, this data is terrible and it is filled with impossibilities. Here is what part of this table actually looks like:
Events ID Type Date 1 On 01-MAY-12 1 On 01-MAY-12 1 On 01-JUN-12 1 Off 01-JUL-12 1 Off 01-AUG-12 1 On 05-AUG-12
What I want to do is write a query that returns this table without junk data, where impossible rows (that is, rows that turn the item on when it's already on, or off when it's already off) are removed.
I don't care too much about the case of it being 'off' before being turned on for the first time as I don't believe we have any cases of that.
So, said query would return:
Events ID Type Date 1 On 01-MAY-12 1 Off 01-JUL-12 1 On 01-AUG-12
Any thoughts on how to do this? This logic of "return the first row under condition X" is causing me a lot of difficulty. If I just wanted the first on row or first off row, period, I could just group by and grab a min. But how do I do it in this case?
You can write this:
SELECT id, type, "date" FROM ( SELECT id, LAG(type) OVER (PARTITION BY id ORDER BY "date") AS prev_type, type, "date" FROM events ) WHERE type <> NVL(prev_type, '-') ORDER BY "date" ;
- date isn't really a valid column name unless you wrap it in double-quotes. I'm guessing that's not really the name of the column?
- You wrote, "I don't care too much about the case of it being 'off' before being turned on for the first time as I don't believe we have any cases of that", so I decided not to remove initial Offs, but that's actually no harder: just change NVL(prev_type, '-') to NVL(prev_type, 'Off').
- I didn't worry about the case that that you have two records with the same id and "date" and different types, because I had no idea how you wanted that handled. I hope you don't need them to be put in whatever-order-makes-them-possible, because that would be significantly harder.