A colleague had a problem concerning a query using the (NOT) IN predicate, use the following table creation script (attached to the blog entry) to see the effects of the problem and to repro it on your machine.

Imagine you have two tables (not that hard till here) and you want to get the information from Table 1 (#SomeTableToTest) where some columns of the table (SomeId) does not exists in a set of other IDs (SomeOtherId) of a second table (#SomeOtherTable). I hear you saying that this is not a hard query at all, as you can use the NOT IN operator. Well, first of all lets get back to the scratchboard and see how many rows should be in the result. A distinct selection of the both tables brings back a set with the following values:

SomeTableToTest SomeOtherTable
NULL
0
1300
1301
1303
1306 1306
1307 1307
1316
1317
1318
1346
2134
3536

The red values should be filtered, the orange ones ignored (as they do not exists in the first table), the green ones should be returned as they do not exist in the second table.

Returning to the actual implementation the query would be:

SELECT SomeId 
FROM #SomeTableToTest
WHERE SomeId NOT IN 
(
    SELECT  SomeOtherId FROM #SomeOtherTable
)

this returns:

SomeId
-----------

(0 row(s) affected)


Bummer, what the hell… Where are the 8 rows which should be returned ?

Lets try the equivalent query with EXISTS:

SELECT SomeId 
FROM #SomeTableToTest T1
WHERE NOT EXISTS 
(
    SELECT * FROM #SomeOtherTable T2
    WHERE t1.SomeId = T2.SomeOtherId
)
 

SomeId
-----------
0
1300
1301
1303
1316
1317
1318
1346

(8 row(s) affected)

Strange thing, the rows are there. Anything with the query that turns the result into garbage ? Lets try to substitute the query with static values:

SELECT SomeId 
FROM #SomeTableToTest 
WHERE SomeId NOT IN 
(
    NULL,
    1306,
    1307,
    2134,
    3536
)

SomeId
-----------

(0 row(s) affected)

OK, narrowed down the problem. It has something to do with the values not the subquery. The only thing which comes in mind is the NULL value, lets try to eliminate that:

SELECT SomeId 
FROM #SomeTableToTest 
WHERE SomeId NOT IN 
(
    1306,
    1307,
    2134,
    3536
)

SomeId
-----------
0
1300
1301
1303
1316
1317
1318
1346

(8 row(s) affected)

Yiehhaa ! So the problem can be boiled down do the NOT IN operator (EXISTS worked for us) and the NULL value. Looking in the bible of TSQL syntax (beside the books of Itzik :-))

notin

IN (Transact-SQL) http://msdn.microsoft.com/en-us/library/ms177682(SQL.90).aspx

“Any null values returned by subquery or expression that are compared to test_expression using IN or NOT IN return UNKNOWN. Using null values in together with IN or NOT IN can produce unexpected results.”

Ok, that's the problem. So the combination of NULL values and the EXISTS operator will cause weird results. I hear you saying: “Come on, what is the alternative for that ?” You could either use:

  • EXISTS
  • EXCEPT
  • Filter out the NULL values in your subquery (Caution: This might also change your expected result if you want to see those NULL values)

Regardless what solution you consider to take, keep in mind that you have to consider an alternative solution. I will cover the difference in performance in one of the next blogs entries.

UPDATE: See the followup here about the magic of NOT IN and NULL which is really trivial 

-Jens