Making SQL Server index usage a bit more deterministic

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!

Making SQL Server index usage a bit more deterministic

  • Comments 9

Chris McKenzie blogged about a situation that I will be talking about in my upcoming TechEd 2005 presentations (particularly DBA319!), so I thought I might talk about the issue a bit here.

In his post Note to Self, Chris said:

Don't put ORDER BY Statements in Views.

Delay sorting until you're actually about to use the data.

If you end up sorting differently than the default sort specified by the ORDER BY in the view, it comes as an expensive performance hit. Witness a stored procedure that was breaking a unit test due to a SqlClient.SqlTimeout Exception (took 2.5 minutes to run). Removing the ORDER BY statements from the underlying views brought execution time down to 6 seconds.

Luckily it is not that as bad as all that (or it does not have to be). Though I actually said as much in my International Features in SQL Server 2000 white paper, which was reviewed and approved by the SQL Server International team (and several people on the Engine and at least one architect!):

Issues with defining collation at the column level

How often would you have a database that needs one sort order (for example, Latin1_General) and a column that needs a different one (for example Greek)? Sometimes, this may be crucial, but in other cases if the data in your database does not use a single collation, then it is probably multilingual data that may need to be sorted according to more than one collation. Being able to define multiple collations, each of which can be indexed, allows you to access the Greek data by specifying the Greek collation, and to have this query be an indexed search.

That last clause "and to have this query be an indexed search" is the crux of the matter. In the example provided earlier, using a COLLATE expression in the ORDER BY clause of a query gives you the functionality; however, this will not be an indexed ordering, so it will be slower for large datasets. As it stands, column-level collation makes sense only if you do not have monolingual data in a column, or if you denormalize your database to store different languages in different columns.

It was later pointed out to me that there was a way to make this work in Shiloh (SQL Server 2000), a way which gets much better in Yukon (SQL Server 2005). You can basically create multi-lingual indexes!

It works like this:

Say you have a Table tbl with column c with collation of Latin1_General_CI_AS

select * from tbl
order by c COLLATE Japanese_CI_AS

No Japanese index exists here -- so everything done with this uery will be slower.

create table tbl (c nvarchar(100), c2 as c COLLATE Japanese_CI_AS)
create index itc2 on tbl(c2)

The original query can now use the Japanese index!

Now, In SQL Server 2000, you must explicitly use that field c2 to get the index to work.

But in SQL Server 2005, the work was done in the engine to allow deterministic index usage that would not treat all indexes as "equal" and arbitrarily pick one. If an index exists on a column that matches the characteristice of the query, then the QPE will make intelligent use of that index.

Pretty cool trick, if you ask me. I wish I had known about in several of the applications I did, and before I wrote the white paper!

I will be delving into this sort of item and more in Orlando and in Amsterdam, so if you are going to TechEd 2005 then be sure to sign up for DBA 319: Best Practices for Search in Multilingual Data Sets (Using SQL Server Collation for Sorting and Indexing).

 

This post brought to you by "§" (U+00a7, a.k.a. SECTION SIGN)

Comment on the blather
Leave a Comment
  • Please add 3 and 5 and type the answer here:
  • Post
Blog - Comment List
  • Thanks for posting this. I solved my earler problem by delaying sorting until the stored procedure. I identified the problem as being that the ORDER BY clause in my stored proc was sorting the data on a different set of columns than the default sort specified by the underlying view. Are you saying that I could also have improved performance by adding indexes to the columns or column-groups that I will most often be sorting on? I may have to compare and benchmark the two approaches.

    This was a very interesting blog. Thanks again.
  • Yes, that is what I am saying. :-)

    And of course in SQLS 2005 the syntax is more natural since you never have to think about the indexes after you have created them; they will just work....

    Glad you liked it! I'll be doing more SQL server posts as I get closer and closer to TechEd.
  • Julia Lerman said just a few hours ago that she is a SQL Server luddite.
    I think that is not really...
  • A few minutes ago, I posted about Achieving case insensitivity, to deal with how best to make things...
  • Not all of the collations in SQL Server 2005 (a.k.a. Yukon) or earlier versions have coverage over...
  • (No, this is not the font post!)
    Bryan Murtha asks:

    I've read your blog and I was hoping to get out...
  • Previous posts in this series: Part 0: The empty string sorts the same in every language Part 1: The

  • So I was chatting with Kim and Paul after that .NET DA meeting I mentioned the other day. The one where

Page 1 of 1 (9 items)