Conor Cunningham’s blog on SQL Server, data-driven applications, and pretty much whatever other random stuff he decides to post.
Thanks for all who attended SQLBits last weekend and for those of you who attended my talks.
I had three talks this time at SQLBits, including the technical keynote on Saturday. I’ve attached these decks for your reference here:
Making DW Easy ppt deck
SQL Server 2012 ColumnStore Keynote ppt deck
Distributed Query Deep Dive ppt deck
With regard to your slide deck on column store indexes I have the following questions:
1. Does the query execution engine have the ability to work out the size of the L2 and size batches accordingly ?.
2. Should the general recommendation be not to use hyper threading for servers that have Intel processors so as not to split the level 2 cache.
3. When choosing server hardware (this leads on from question 1) should the size of the L2 cache relative to the number of cores be a consideration when using column store indexes.
I've re-read your slide deck, it looks like the answers to questions 1 and 3 are both yes, I'm presuming the answer to question 2 is also yes ?, can you please confirm this ?
I can give some general guidance here, but it is worth noting that different generations of CPUs have different relative performance, different cache sizes, and likely will ultimately require that you just try out HT and see how it works.
Historically I have recommended that HT be disabled for SQL workloads that required parallel query plans. The reasons were that the SQL QP would generate plans that statically assigned ranges to threads and early HT CPUs would progress at different speeds (meaning that you would slow down a parallel query and have some CPUs sitting idle waiting for the "slow hw threads" to finish. This distinction has been less important in more recent generations of CPUs. I haven't done a deep test on this space lately, but I still recommend systems that can have parallel queries to do some tests before production.
The new batch execution model for ColumnStore actually does not assign work statically to threads for the portions of queries that are running in batch mode. So, if you had a system that was only running batch mode queries, then it would be just fine HT and you would see some speedup. Each HW thread would make whatever progress it wants and the overall system would likely go faster. (again, feel free to validate on your machines!)
There are several tricks to this - not all queries run in batch mode. So, you can make this comparison for a given query, but the mix of queries and their plans impacts the "best setting" for this. Ultimately you'd need to look at all your likely query plans, see what fraction is in batch mode, then decide. (or at least measure with HT on/off for your workloads to see throughput).
To answer your specific questions, it's not something where we are trying to keep a compiled list of CPUs and do different things for each - we have picked an implementation that works well for current CPU options you can buy. We recommend you have a recent CPU to effectively use ColumnStores in SQL.
It's also not quite valid to say "disable HT because it might flush the L2 cache". This is possible, but really we're just greatly reducing the number of L2 cache misses. Even if this were exactly as you describe on a given piece of hardware for a given query with a given plan on a given data set, it likely will be different on different data sets, different queries, and still would be better than the row model for executing the same queries. So, you can try it in whatever configuration you'd like, but we're not making an active suggestion here on this point.
It gets more complicated in that some operators, such as project, make rows wider (which might not fit in the L2 cache). Other operators could add new rows, meaning that they might not into the existing batch. We're going to do things over time as we grow this space that makes the specific questions you are asking not live-or-die, so I would just "buy recent hardware".
My recommendation is to buy recent CPUs with big L2 caches, yes. It's not critical that you buy the biggest ones you can, but the algorithms we ship are meant to leverage L2 caches being there.
Finally, the batch processing code will evolve and improve in future releases. When you upgrade your servers, just be aware that you should also double-check what version of SQL Server has the most recent version of the columnstore code. This will help you get the most of your HW investments.
Many thanks for going to length you have to answer my question in such great detail, the size of it is almost like a blog posting in its own right.
I know that SQL Azure is not you area, but the optimizer very much is, and that in a contended shared tenancy environment there are issues around specific instances not being able to have exclusive use of the L2 cache, however, is this technology likely to find its way into SQL Azure ?.
SQL Azure is still a new platform and we are developing a lot of new functionality and key features every day. Right now, we don't do a lot to help people trying to make data warehouses to run on that platform yet, and I think it would be fair to say that it is possible but not the primary targetted scenarios. We don't really comment publicly on future features for any of our products - sorry.
Thanks for the information.