Before SQL Server can perform an index seek, it must determine whether the keys of the index are suitable for evaluating a predicate in the query.
Single column indexes
Single column indexes are fairly straightforward. SQL Server can use single column indexes to answer most simple comparisons including equality and inequality (greater than, less than, etc.) comparisons. More complex expressions such as functions over a column and “like” predicates with a leading wildcard character will generally prevent SQL Server from using an index seek.
For example, suppose we have a single column index on a column “a.” We can use this index to seek on these predicates:
However, we cannot use it to seek on these predicates:
Multi-column indexes are slightly more complex. With a multi-column index, the order of the keys matters. It determines the sort order of the index and it affects the set of seek predicates that SQL Server can evaluate using the index.
For an easy way to visualize why order matters, think about the phone book. The phone book is like an index with the keys (last name, first name). The contents of the phone book are sorted by last name and it is easy to look someone up if we know their last name. However, if we have only a first name, it is very difficult to get a list of people with that name. We would need another phone book sorted on first name.
In the same way, if we have an index on two columns, we can only use the index to satisfy a predicate on the second column if we have an equality predicate on the first column. Even if we cannot use the index to satisfy the predicate on the second column, we may be able to use it on the first column. In this case, we introduce a residual predicate for the predicate on the second column. This is the same residual predicate that we use for scans.
For example, suppose we have a two column index on column “a” and “b.” We can use this index to seek on any of the predicates that worked on the single column index. We can also use it to seek on these additional predicates:
For the next set of examples, we can use the index to satisfy the predicate on column a, but not on column b. In these cases, we need a residual predicate.
Finally, we cannot use the index to seek on the next set of predicates as we cannot seek even on column a. In these cases, we must use a different index (e.g., one where column b is the leading column) or we must use a scan with a residual predicate.
A slightly more concrete example
Consider the following schema:
create table person (id int, last_name varchar(30), first_name varchar(30))create unique clustered index person_id on person (id)create index person_name on person (last_name, first_name)
Here are three queries and their corresponding text showplan. The first query seeks on both columns of the person_name index. The second query seeks on the first column only and uses a residual predicate to evaluate first_name. The third query cannot seek and uses a scan with a residual predicate.
select id from person where last_name = 'Doe' and first_name = 'John' |--Index Seek(OBJECT:([person].[person_name]), SEEK:([person].[last_name]='Doe' AND [person].[first_name]='John'))
select id from person where last_name > 'Doe' and first_name = 'John' |--Index Seek(OBJECT:([person].[person_name]), SEEK:([person].[last_name] > 'Doe'), WHERE:([person].[first_name]='John'))
select id from person where last_name like '%oe' and first_name = 'John' |--Index Scan(OBJECT:([person].[person_name]), WHERE:([person].[first_name]='John' AND [person].[last_name] like '%oe'))
Note: In case you are trying to reproduce these plans, in this and some of my prior examples, I’ve used index hints (which I haven’t shown) to ensure that I get the plan I want to illustrate without having to insert data into the table.
A bit more about index keys
In most cases, the index keys are the set of columns that you specify in the create index statement. However, when you create a non-unique non-clustered index on a table with a clustered index, we append the clustered index keys to the non-clustered index keys if they are not explicitly part of the non-clustered index keys. You can seek on these implicit keys just as if you specified them explicitly.
For example, given this schema:
create table T_heap (a int, b int, c int, d int, e int, f int)create index T_heap_a on T_heap (a)create index T_heap_bc on T_heap (b, c)create index T_heap_d on T_heap (d) include (e)create unique index T_heap_f on T_heap (f)
create table T_clu (a int, b int, c int, d int, e int, f int)create unique clustered index T_clu_a on T_clu (a)create index T_clu_b on T_clu (b)create index T_clu_ac on T_clu (a, c)create index T_clu_d on T_clu (d) include (e)create unique index T_clu_f on T_clu (f)
The key columns and covered columns for each index are:
a, b, c, d, e
a, d, e
Note that each of the non-clustered indexes on T_clu include the clustered index key column a with the exception of T_clu_f which is a unique index.
To be continued one last time …
I’m planning one more post about scans and seeks. I’ll write a bit more about some of the tradeoffs that SQL Server considers when deciding which index to use and whether to perform a scan, a seek, or a seek with a bookmark lookup.
In my last post , I looked at how SQL Server 2008 handles scans on partitioned tables. I explained that
I was under the impression that using the:
a in (2, 3, 5, 7)
form would result in poor use of indexes. Is this mistaken?
"a in (2, 3, 5, 7)" is equivalent to writing "a = 2 or a = 3 or a = 5 or a = 7". Both syntaxes give the same plan and (if an index seek plan is possible and chosen by the optimizer) will result in four separate index seeks (though all four of these seeks these will generally be implemented using a single index seek operator).
in: select id from person where cola > @a and colb > @b,
you get index seek only on cola, even if colb is also part of index. colb > @b is put in residue predicate.
To have colb also used in seek predicate, you need to write it as
cola = @a or cola > @a and colb > @b,
the optimizer can recognise it as a range, this is good.
however, if you add more columns to the condition, like colc and cold, the optimizer will fail to recognise it as range, and start to use table scan. That's a bad thing.
But I found a way to get around of this drawback, I just split the condition into multiple branches combined by several 'or' operators, then compiler will be able to recognize as a seek with multiple ranges, albeit at the cost of less efficient expression.
basically, the question boils down to how to pass a composite index bookmark to engine, such that the compiler recognize it as "x > bookmark_of_x" where x represnet a composite index, i.e.having more than two columns.
Unfortunately, SQL Server does not support a syntax for "composite" comparisons.
Great Artcile,It's really simple to understand a very complex operation.