Friday, May 06, 2005 3:06 PM
Michael S. Kaplan
Making SQL Server index usage a bit more deterministic
Chris McKenzie blogged about a situation that I will be talking about in my upcoming TechEd 2005 presentations (particularly DBA319!), so I thought I might talk about the issue a bit here.
In his post Note to Self, Chris said:
Don't put ORDER BY Statements in Views.
Delay sorting until you're actually about to use the data.
If you end up sorting differently than the default sort specified by the ORDER BY in the view, it comes as an expensive performance hit. Witness a stored procedure that was breaking a unit test due to a SqlClient.SqlTimeout Exception (took 2.5 minutes to run). Removing the ORDER BY statements from the underlying views brought execution time down to 6 seconds.
Luckily it is not that as bad as all that (or it does not have to be). Though I actually said as much in my International Features in SQL Server 2000 white paper, which was reviewed and approved by the SQL Server International team (and several people on the Engine and at least one architect!):
Issues with defining collation at the column level
How often would you have a database that needs one sort order (for example, Latin1_General) and a column that needs a different one (for example Greek)? Sometimes, this may be crucial, but in other cases if the data in your database does not use a single collation, then it is probably multilingual data that may need to be sorted according to more than one collation. Being able to define multiple collations, each of which can be indexed, allows you to access the Greek data by specifying the Greek collation, and to have this query be an indexed search.
That last clause "and to have this query be an indexed search" is the crux of the matter. In the example provided earlier, using a COLLATE expression in the ORDER BY clause of a query gives you the functionality; however, this will not be an indexed ordering, so it will be slower for large datasets. As it stands, column-level collation makes sense only if you do not have monolingual data in a column, or if you denormalize your database to store different languages in different columns.
It was later pointed out to me that there was a way to make this work in Shiloh (SQL Server 2000), a way which gets much better in Yukon (SQL Server 2005). You can basically create multi-lingual indexes!
It works like this:
Say you have a Table tbl with column c with collation of Latin1_General_CI_AS
select * from tbl
order by c COLLATE Japanese_CI_AS
No Japanese index exists here -- so everything done with this uery will be slower.
create table tbl (c nvarchar(100), c2 as c COLLATE Japanese_CI_AS)
create index itc2 on tbl(c2)
The original query can now use the Japanese index!
Now, In SQL Server 2000, you must explicitly use that field c2 to get the index to work.
But in SQL Server 2005, the work was done in the engine to allow deterministic index usage that would not treat all indexes as "equal" and arbitrarily pick one. If an index exists on a column that matches the characteristice of the query, then the QPE will make intelligent use of that index.
Pretty cool trick, if you ask me. I wish I had known about in several of the applications I did, and before I wrote the white paper!
I will be delving into this sort of item and more in Orlando and in Amsterdam, so if you are going to TechEd 2005 then be sure to sign up for DBA 319: Best Practices for Search in Multilingual Data Sets (Using SQL Server Collation for Sorting and Indexing).
This post brought to you by "ยง" (U+00a7, a.k.a. SECTION SIGN)