Oracle SQL — Fixing data with “impossible” date ranges

I am working with a table that, for each ID, has a list of "events", where an event is whether or not the object represented by that ID went "On" or "Off." Items assumingly start as off if they are not in this table. An item can only legally go on if was not on at that point.

Here's an example of what it might look like:

Events
ID   Type    Date
1    On      01-MAY-12
1    Off     01-JUN-12
1    On      05-JUN-12

Unfortunately, this data is terrible and it is filled with impossibilities. Here is what part of this table actually looks like:

Events
ID   Type    Date
1    On      01-MAY-12
1    On      01-MAY-12
1    On      01-JUN-12
1    Off     01-JUL-12
1    Off     01-AUG-12
1    On      05-AUG-12

What I want to do is write a query that returns this table without junk data, where impossible rows (that is, rows that turn the item on when it's already on, or off when it's already off) are removed.

I don't care too much about the case of it being 'off' before being turned on for the first time as I don't believe we have any cases of that.

So, said query would return:

Events
ID   Type    Date
1    On      01-MAY-12
1    Off     01-JUL-12
1    On      01-AUG-12

Any thoughts on how to do this? This logic of "return the first row under condition X" is causing me a lot of difficulty. If I just wanted the first on row or first off row, period, I could just group by and grab a min. But how do I do it in this case?

Answers


You can write this:

SELECT id,
       type,
       "date"
  FROM ( SELECT id,
                LAG(type) OVER (PARTITION BY id ORDER BY "date") AS prev_type,
                type,
                "date"
           FROM events
       )
 WHERE type <> NVL(prev_type, '-')
 ORDER
    BY "date"
;

Notes:

  • date isn't really a valid column name unless you wrap it in double-quotes. I'm guessing that's not really the name of the column?
  • You wrote, "I don't care too much about the case of it being 'off' before being turned on for the first time as I don't believe we have any cases of that", so I decided not to remove initial Offs, but that's actually no harder: just change NVL(prev_type, '-') to NVL(prev_type, 'Off').
  • I didn't worry about the case that that you have two records with the same id and "date" and different types, because I had no idea how you wanted that handled. I hope you don't need them to be put in whatever-order-makes-them-possible, because that would be significantly harder.

Need Your Help

Java Program read input from a text file and modify it accordingly

java string file-io

I am writing a Java program that inputs a test file, performs some modifications to the data, then writes it to a new file output.

R - How to sum objects in a column between an interval defined by conditions on another column

r loops sum condition rowsum

This comes as an application to this question:Sum object in a column between an interval defined by another column

About UNIX Resources Network

Original, collect and organize Developers related documents, information and materials, contains jQuery, Html, CSS, MySQL, .NET, ASP.NET, SQL, objective-c, iPhone, Ruby on Rails, C, SQL Server, Ruby, Arrays, Regex, ASP.NET MVC, WPF, XML, Ajax, DataBase, and so on.