Adding string comparison where clause to inner join takes huge performance hit with SQLite
I'm playing around with Philadelphia transit data and I have an sqlite database storing the gtfs data. I have this query looking for departure times at a particular stop:
SELECT "stop_time".departure_time FROM "stop_time" INNER JOIN "trip" ON "trip".trip_id = "stop_time".trip_id WHERE (trip.route_id = '10726' ) -- AND (trip.service_id = '1') AND (stop_time.stop_id = '220') AND (time( stop_time.departure_time ) > time('08:30:45')) AND (time( stop_time.departure_time ) < time('09:30:45'));
The clause to match service_id to 1 is currently commented out. If I run the query as it is now, without matching service_id, it takes 2 seconds. If I uncomment the service_id clause, it'll take 30. I'm clueless as to why since I'm already looking into the trip table for the route_id.
This generally happens if an index is defined on (route_id, stop_id and departure_time) that is used when there is no filter on service_id. Once you include that in the WHERE clause, since it is not present in the index, it requires a TABLE SCAN and hence the shoot up in the execution time. If you include service_id also in the index definition, then TABLE SCAN would be replaced with INDEX SCAN.
The reason is you have an index on service_id that is being chosen in preference to either another index, and there being not many different values of service_id, so using the index isn't being very useful because there are so many rows for service_id = '1'.