To set up this scenario, run the script below:

USE tempdb

GO

IF OBJECT_ID ('test1') IS NOT NULL DROP TABLE test1

GO

CREATE TABLE test1 (c1 tinyint, c2 smallint)

DECLARE @x int

DECLARE @msg varchar(1000)

SET @x = 1

SET NOCOUNT ON

BEGIN TRAN

WHILE (@x <= 1000000)

BEGIN

INSERT INTO test1 (c1, c2) VALUES (@x % 255, CASE WHEN @x % 1000 = 500 THEN 1000 ELSE @x % 1000 END)

IF @x % 5000 = 0

BEGIN

COMMIT TRAN

BEGIN TRAN

SET @msg = 'Inserted ' + CONVERT(varchar(20), @x) + ' rows ...'

RAISERROR (@msg, 0, 1) WITH NOWAIT

END

SET @x = @x + 1

END

WHILE (@@TRANCOUNT > 0) COMMIT TRAN

SET NOCOUNT OFF

GO

CREATE INDEX idx1 ON test1 (c2)

UPDATE STATISTICS test1 WITH FULLSCAN

DBCC SHRINKDATABASE (db2)

-- DBCC SHOW_STATISTICS ('test1', 'idx1')GO

The hypothetical scenario here is that your users complain that the first of the two queries below runs very slowly. You find that if you force SQL to use index [idx1] with an index hint, the query executes much more quickly. But why should you change your app to compensate for what seems to be a SQL Server bug? :)

USE db2

GO

SET STATISTICS PROFILE ON

SET STATISTICS TIME ON

-- Query #1 (slow)

SELECT c1, c2 FROM test1 WHERE c2 = 500

-- Query #2 (fast)

SELECT c1, c2 FROM test1 WITH (INDEX = idx1) WHERE c2 = 500

SET STATISTICS TIME OFF

SET STATISTICS PROFILE OFFGO

To me, this suggests two questions: 1) Why isn't the fast plan chosen by default? 2) Are there any possible solutions that don't require modifying code to supply an index hint?

Despite appearances, this actually isn't a bug. Here's the explanation:

This table demonstrates uneven data distribution. The [c2] column holds values ranging from 0 to 999. There are exactly 1000 rows with the same [c2] value for each of the integers between 0 and 999. However, the value 500 is an anomaly; there are 0 rows with this value. Run the following query to show the number of rows with each [c2] value and you'll see that there are none with [c2]=500:

SELECT c2, COUNT(*) AS cnt FROM test1 GROUP BY c2

c2 cnt

------ -----------

...

498 1000

499 1000

501 1000

502 1000

...

When the server builds a histogram for the statistics on a column, it attempts to intelligently select range endpoints so that a given step of the histogram represents a range of values with similar density. If you run DBCC SHOW_STATISTICS on the index you can see the histogram:

-- DBCC SHOW_STATISTICS ('test1', 'idx1')

RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS

------------ ---------- ------- ------------------- --------

0 0.0 1000.0 0 0.0

499 498000.0 1000.0 498 1000.0

503 2000.0 1000.0 2 1000.0

1000 496000.0 1000.0 496 1000.0

RANGE_HI_KEY is the value that sets the upper bound for the range of values represented in each histogram step. The lower bound is RANGE_HI_KEY+1 for the preceding step. For example, consider this row:

499 498000.0 1000.0 498 1000.0

This indicates that there are 498000 rows (RANGE_ROWS) with a value between 1 and 498, inclusive. There are 1000 rows with the exact value of 499 (EQ_ROWS). AVG_RANGE_ROWS tells us that the typical value that falls within the range 1-498 shows up in 1000 rows. Now consider the next step in the histogram:

RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS

------------ ---------- ------- ------------------- --------

503 2000.0 1000.0 2 1000.0

This indicates that there are 2000 rows (RANGE_ROWS) with a value between 500 and 502, inclusive; and there are 1000 rows (EQ_ROWS) with the exact value of 503. The typical c2 value in the range 500-502 shows up average of 1000 times in the table. Recall what we know about the actual number of rows in this range:

c2 cnt

------ -----------

...

499 1000

501 1000

502 1000

503 1000

...

This histogram step describes its range of c2 values exactly. What the histogram doesn't tell us, however, is exactly how many times a particular value in the 500-502 range appears. Remember that each histogram step only summarizes the values within a given range; it doesn't tell you exactly how many of each particular value there are (that is, unless the table's domain is small enough for every distinct value to have its own step in the histogram). SQL knows it needs to search for rows where [c2]=500, so it locates this step in the statistics' histogram, finds that the typical value in this range shows up in 1000 rows, and uses this as its rowcount estimate.

This is a key point: if a value being searched for falls in the middle of a histogram step, the AVG_RANGE_ROWS for that step is used to estimate the number of matches that will be found. SQL is always optimistic and assumes that the value it is searching for is actually one of the range rows. In this case that assumption is incorrect. The QP ends up overestimating the number of rows that will be returned and choosing a table scan when an index seek would have been more efficient.

So if this isn't a bug, what could you do to fix the problem? One solution would be to make the nonclustered index a covering index. Alternately, you could make [idx1] a clustered index. And, if you're using SQL 2005, you could use a plan guide with a USE PLAN hint to force the fast plan without changing the query text. All of these solutions should cause SQL to select the fast index seek plan. The bad cardinality estimate will still be there, but the perf problem won't.

(Update: A blog post from Jack Li describing a similar problem can be found here.)