Lately we had a customer who has a query with left outer join.   From looking at the indexes and statistics and filtering conditions, our instinct tells us that SQL Server should have used index seek.  But supprisingly, SQL Server uses index scan.  The table has significant number of rows.  This has caused high CPU consumption.

In order to illustrate the issue, I came up smaller repro.

create table t1 (ItemKey int, t1_name varchar(50))
go
create table t2 (ItemKey int, t2_name varchar(50))
go
create clustered index ix_t1_itemkey on t1 (ItemKey)
go
create index ix_t1_name on t1 (t1_name)
go

create clustered index ix_t2_itemkey on t2 (ItemKey)
go
create index ix_t2_name on t2 (t2_name)
go

insert into t1 values (1, 'xyz')
insert into t1 values (2,'abc')
insert into t2 values (1, 'def')
insert into t2 values (2,'uvw')
go

If we run the query below, we got a plan that chooses index scan on t1.ix_t1_name. 

select * from t1 left outer  join t2
on t1.itemkey = t2.itemkey and t2.t2_name = 'def'
where
t1.t1_name='abc'
or t2.t2_name='def'

select * from t1 left outer  join t2  on t1.itemkey = t2.itemkey and t2.t2_name = 'def'  where  t1.t1_name='abc'  or t2.t2_name='def'
  |--Filter(WHERE:([tempdb].[dbo].[t1].[t1_name]='abc' OR [tempdb].[dbo].[t2].[t2_name]='def'))
       |--Nested Loops(Left Outer Join, OUTER REFERENCES:([tempdb].[dbo].[t1].[ItemKey]))
            |--Index Scan(OBJECT:([tempdb].[dbo].[t1].[ix_t1_name]))
            |--Index Seek(OBJECT:([tempdb].[dbo].[t2].[ix_t2_name]), SEEK:([tempdb].[dbo].[t2].[t2_name]='def' AND [tempdb].[dbo].[t2].[ItemKey]=[tempdb].[dbo].[t1].[ItemKey]) ORDERED FORWARD)

 

So the question is:

If values of t1.t1_name is very selective and has an index on it, why doesn't SQL Server choose an index seek?  The answer is that it can't do that.   This is because the query has an 'OR' clause and the filtering condition is on t2_name.    In many cases, optimizer can push the filtering down before join or change outer join to inner join.  This is done on condition that results will stay the same.
In this particular query (above) , SQL Server has to do join first and then do filtering later because of the OR condition.  In order to do the left outer join, SQL has to preserve all rows from t1 (left table).  Hence the scan.  Now if SQL Server chose to push the filter t1_name='abc' down, the second row wouldn't be included.

Below are rows from t1 and t2 and the results from the left outer join.

table t1
ItemKey     t1_name
----------- --------------------------------------------------
2           abc
1           xyz

table t2
ItemKey     t2_name
----------- --------------------------------------------------
1           def
2           uvw

result from the query
ItemKey     t1_name           ItemKey     t2_name
----------- --------------- ----------- ---------------
2           abc                NULL        NULL
1           xyz                1           def

Now that we understand SQL can't use push down the filter to do index seek on t1, the question is:  can we get rid of the OR "or t2.t2_name='def'"?   After all,  the same condition in the join condition and eliminating this will enable SQL Server to do index seek and push down the filtering condition.
However, you can't do this.  If you remove the "or t2.t2_name='def'", the results will be different.  Here is the result.  row "1           xyz                1           def" is eliminated from the result.

ItemKey     t1_name                                            ItemKey     t2_name
----------- -------------------------------------------------- ----------- --------------------------------------------------
2           abc                                                NULL        NULL

 

What is the solution?

You can re-write the query with union.  The following query will perform much better with large number of rows in table t1.   In fact the  "and t2.t2_name = 'def'" is not even needed in the join condition.  And the second select is effectively inner join.
--this query will do index scan and filter later
select * from t1 left outer join t2
on t1.itemkey = t2.itemkey and t2.t2_name = 'def'
where
t1.t1_name='abc'
union
select *  from t1 left  outer join t2
on t1.itemkey = t2.itemkey and t2.t2_name = 'def'
where t2.t2_name='def'

**************************************************************************

Jack Li | Senior Escalation Engineer | Microsoft Server Support