LinkedIn | FaceBook | Twitter
SQL Server uses something called the "Query Optimizer" to find the data you're looking for. To do that, there is another mechanism, called "Statistics", which can be created and maintained on columns in a table - even tables that don't have an Index.
You can read more about Statistics here, and in general they are a very good thing for performance. You could try to figure out which columns need statistics, but it's often better to let SQL Server do that for you. You can turn on the AUTO_CREATE_STATISTICS database option and SQL Server will automatically determine when it needs the Statistics and create them for you.
But just like Indexes, Statistics need to be maintained. You have a couple of options here. First, you can update the statistics manually with a Transact-SQL command, and second, you can set the AUTO_UPDATE_STATISTICS option. For the most part, the second option is best. If you do see lots of activity with either of these options and it begins to bog down the system (pretty rare, in my experience), you can perform these two tasks manually on a condition or a schedule. For me, the benefit of the increase in reads outweighs the cost of having them on automatically.
"If you do see lots of activity with either of these options" Would be nice to have a link on where I can see this!
Good point - there are some interesting bits of info in this link: http://it.toolbox.com/blogs/aSQLguy/auto-update-statistics-db-option-26468 and here: http://www.microsoft.com/technet/abouttn/flash/tips/tips_070604.mspx
And a friend wrote me to say:
"SQL Server does not use the Query Optimizer to find data. The Optimizer doesn’t find data, it finds a plan for the Execution Engine to use to retrieve the data. But I definitely agree with your subject. I tell people to always have these options on unless they have proved they are better off without them, and it is rare."
Thanks - this is a good place to mention this. This person is correct! I do tend to be less specific on some things because of the level of the topic, but they are of course correct. In fact, it’s a concert of features that “finds” data, but the optimizer uses the stats most often, so that’s why I put it that way.
Thanks for reading, and for helping me be more clear!
How much computer resources does this use up?
"The automatic statistics update uses a sampling rate that is way too small, and you cannot override this sampling rate. On a large table the sampling rate is often well less than 1%. This in turn causes the query optimizer to make bad decisions - and you only need a few such bad decisions to cripple database performance. I've spoken to Microsoft about this and they refuse to acknowledge there's a problem; so the only solution is to run statistics manually with FULLSCAN,NORECOMPUTE on a regular basis. We've found every 2 weeks is fine."
You are absolutely correct. The "black box" auto update algorithm and the "we are Microsoft, we know what's good for you" arrogance combine to make the life of a DBA with very large tables very difficult indeed.
People (incl. almost everyone at Microsoft apparently) have no clue at all what it is like to deal with 24*7 systems with billions of rows and 100s of millions of insertions/deletions per day. Running update statistics with fullscan on one index alone might take > 24hr!!
In my experience, any person low enough in the Microsoft organisation that a customer might interact with, likely has no clue. YMMV