JOIN vs. IN vs. EXISTS
I was reading an article that explained the difference between join and in and exists clause but I got confused with the explanation of different results when using NOT IN vs. NOT EXISTS clause. Can someone clarify why there is a difference between the output for NOT EXISTS clause vs. NOT IN clause? I tried after deleting the NULL row (t2.id = 8) from the table t2 and still got the same result.
Here's the SQL script from the article:
CREATE TABLE t1 (id INT, title VARCHAR(20), someIntCol INT) GO CREATE TABLE t2 (id INT, t1Id INT, someData VARCHAR(20)) GO INSERT INTO t1 SELECT 1, 'title 1', 5 UNION ALL SELECT 2, 'title 2', 5 UNION ALL SELECT 3, 'title 3', 5 UNION ALL SELECT 4, 'title 4', 5 UNION ALL SELECT null, 'title 5', 5 UNION ALL SELECT null, 'title 6', 5 INSERT INTO t2 SELECT 1, 1, 'data 1' UNION ALL SELECT 2, 1, 'data 2' UNION ALL SELECT 3, 2, 'data 3' UNION ALL SELECT 4, 3, 'data 4' UNION ALL SELECT 5, 3, 'data 5' UNION ALL SELECT 6, 3, 'data 6' UNION ALL SELECT 7, 4, 'data 7' UNION ALL SELECT 8, null, 'data 8' UNION ALL SELECT 9, 6, 'data 9' UNION ALL SELECT 10, 6, 'data 10' UNION ALL SELECT 11, 8, 'data 11'
And here's the SQL queries and their explanation:
-- IN doesn't get correct results. -- That's because of how IN treats NULLs and the Three-valued logic -- NULL is treated as an unknown, so if there's a null in the t2.t1id -- NOT IN will return either NOT TRUE or NOT UNKNOWN. And neither can be TRUE. -- when there's a NULL in the t1id column of the t2 table the NOT IN query will always return an empty set.
SELECT t1.* FROM t1 WHERE t1.id NOT IN (SELECT t1id FROM t2)
-- NOT EXISTS gets correct results
SELECT t1.* FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE t1.id = t2.t1id) GO DROP TABLE t2 DROP TABLE t1
Here's the link to the article: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx
As I can see, you can use them as the same thing in a lot of cases, but you can't forget the details behind them.
Probably you can get the same results applying both NOT IN and NOT EXISTS, but you could see differences in query which involve the NULL value. Because NOT EXISTS is the only way to obtain those rows with the NULL value.
You can see it better in this example:
update cars set c_owner = NULL where c_id = BMW03444
Well... Let's try to see if we have any car in stock that has not been sold yet.
select count(*) from cars where c_owner not it (select c_name from customers);
Where's the failure? Quite simple. You're not requesting a group of cars whose buyers has not been included in the list. You are simply asking for a car without owner. Anybody, even if he's not in the list. The correct form is:
select count(*) from cars c1 where not exists ( select c_owner from customers c2 where c1.c_owner=c2.customer_id );
This is because NOT IN needs specific values to check in. So NULL values are set as FALSE and not counted. NOT EXISTS checks the non existence of an element in a set, so NULL values are set as TRUE and are included.