One of the few sp_configure options that good DBAs regularly change is “max degree of parallelism” or MAXDOP. So of course everyone is looking for the one best value to use for that setting. As with most things in the database world, the answer is “it depends.” I’m going to start with some background to help you understand some of the variables that affect how you should set that value on any given instance of SQL Server.
For the purposes of this discussion we can think of SQL Server schedulers as 1 visible, non-DAC scheduler per logical processor as seen by the operating system and made available to SQL Server. The terms processor, core, CPU, thread, and scheduler are often interchanged (sometimes correctly, sometimes not) throughout various documents and not all have just one meaning. I’ll try to stick to the SQL scheduler terminology to avoid confusion with logical/physical processors. For this discussion today I will assume you have not turned on CPU affinity options, you are not using WSRM, and your login is not limited by MAX_DOP settings in a Resource Governor resource pool.
As an example, say you have 4 quad cores with hyperthreading on. That comes to 4x4x2=32 visible schedulers available for non-DAC SQL Server queries.
You can see the number of visible schedulers with this query:
SELECT * FROM sys.dm_os_schedulers
When a query is compiled it is broken down into steps/operations internally. When the query runs, each step of the query plan that is eligible to go parallel can be optimized as using 1 or more schedulers. So if the plan has 10 steps, maybe step 1 gets 1 scheduler, step 2 gets 12 schedulers, step 3 gets 4 schedulers, step 4 gets 20 schedulers, etc. The total for the query can add up to more than the MAXDOP because the MAXDOP applies per step. The number of schedulers used per step is not saved in the query plan; it is chosen each time based on availability. For any step that uses more than one processor SQL will wait on CXPackets as it synchronizes the activity across schedulers (exchanges data across threads) for that step. As with other waits in SQL Server, a wait on CXPacket in and of itself is not a problem but rather a normal and expected part of how the system does its work. If the waits become excessive, then you can start looking at parallelism as a potential problem.
You can set the “max degree of parallelism” for the entire instance. This means that any given step in any given query will not be assigned more schedulers than the value set in that configuration option. The setting can be overridden (higher or lower) for individual TSQL queries with the OPTION MAXDOP query hint. Some DDL operations such as ALTER INDEX have MAXDOP hints available as well.
So say a query executes and one or more of its steps “go parallel”. This means that the query optimizer determines the query will finish faster overall when it uses parallel threads for one or more steps, even though it uses more CPU to do so. Partly it’s that the CPU time is more concentrated as more threads are active at once and partly that there is extra overhead in managing the context switches (CXPacket). So when you see CXPackets in use and even when you see waits on CXPackets it can be normal and expected.
Parallelism is very useful when you have a small number of very long running queries. Each query executes faster in parallel and since there are few queries at any given point in time there is likely little blocking or competition for resources. However, if you have many small, fast queries few will “go parallel” because the overhead of managing the data exchanges between threads is greater than the gain from executing in parallel. When you start mixing the types of queries it is harder for the optimizer to choose the right parallelism levels for the overall workload. One slow query may start at a time when things are relatively idle and it may use many threads. Then several small, fast queries come in while the first is still running and the CXPacket waits (number and duration) start to increase making all of the queries go slower. SQL’s CPU usage may go up significantly.
Another problem can arise when the statistics are out of date or very skewed. That means that the optimizer may choose a plan based on the expected number of rows (cardinality estimate) that doesn’t quite balance out as expected because the actual number of rows is so far off. In that case updating statistics (perhaps WITH FULLSCAN if the data is skewed) may help. In many cases adding or altering indexes can help as well since quite often fewer rows are touched when sufficient indexes exist and therefore the need for parallel data access decreases.
On a system where you expect all of the queries to be short/fast, setting the “max degree of parallelism” to 1 (never go parallel) basically gives priority to those short/fast queries and any big/slow queries just take longer to execute but don’t hold up other things as much. Leaving it at 0 (use any #) helps on systems where there are only a few queries running but all are slow/expensive. This is because SQL Server has the most flexibility with a setting of 0 and can choose to use as many schedulers as needed. Anything in between 1 and 0 is an attempt to manually balance those things if you feel SQL isn’t doing it the way you want it to. The only way to really know for sure what the best setting is on your system is to test average and peak workloads with different “max degree of parallelism” settings. You should keep in mind that as the data, number of users, other activity, etc. changes the “best” value may change as well.
Now we make it to the question of what value to use for “max degree of parallelism” on your particular instance. Keep in mind that these are general guidelines for OLTP systems. Warehouse/OLAP loads, edge cases, huge systems, many SQL instances, odd configurations, unusual workloads and other factors can outweigh the reasoning behind some of these recommendations. If you’re using hyper-threading, then that is really already doing context switches of a sort and you basically have two SQL schedulers mapped to one physical core. That can cause problems in the balancing equation, so if you have hyper-threading enabled you need to make sure “max degree of parallelism” is not 0 and is not greater than half the number of visible schedulers. If you use CPU affinity mask or WSRM to limit the number of cores available to SQL Server, then your “max degree of parallelism” should be no more than the number of cores available to that instance of SQL Server (if necessary divided by half for hyper-threading). If you are using hard NUMA, keep the value no larger than the number of cores in a NUMA node as this helps reduce costly foreign memory access. As a starting point for an average workload you probably don’t want to use a value greater than 8, though this is not a hard limit but rather a starting point/guideline. Start with the smallest value out of all the rules above. You may need to adjust the value based on your testing and/or as the workload and other factors change over time.
Next there is the question of how to know if the current “max degree of parallelism” is contributing substantially to a current performance or resource constraint problem. If your Process.SQLServr.%Processor Time/# of logical processors value is much higher than normal/expected AND wait stats shows more than 5% of your waits are on CXPackets, you may want to test lower (or non-zero) values of “max degree of parallelism”. You find connections that have currently running parallel queries by looking for SPIDs with multiple ECIDs in sys.sysprocesses. Query plans that are eligible for parallelism will have the Parallelism operator. If you decide your current degree of parallelism might be negatively impacting performance, use the information above to estimate what may be a good starting point for your particular instance and test/baseline/tune from there. No restart is required after you change the value; new queries automatically use the new value (unless overridden by a hint in the query).
· Case Study: Part 1: CXPACKET Wait Stats & 'max degree of parallelism' Option: Introduction to Using Wait Stats to Identify & Remediate Query Parallelism Bottlenecks http://blogs.msdn.com/b/jimmymay/archive/2008/11/28/case-study-part-1-cxpacket-wait-stats-max-degree-of-parallelism-option-introduction-to-using-wait-stats-to-identify-remediate-query-parallelism-bottlenecks.aspx // http://blogs.msdn.com/b/jimmymay/archive/2008/12/02/case-study-part-2-cxpacket-wait-stats-max-degree-of-parallelism-option-suppressing-query-parallelism-eliminated-cxpacket-waits-liberated-30-of-cpu.aspx
· SQL Server 2005 Waits and Queues http://technet.microsoft.com/en-us/library/cc966413.aspx
· Recommendations and Guidelines for 'max degree of parallelism' configuration option http://support.microsoft.com/kb/2023536
· Problem fixed in SQL Server 2008 SP1 CU3 and higher: 970399 FIX: The MAXDOP option for a running query or the max degree of parallelism option for the sp_configure stored procedure does not work in SQL Server 2008 http://support.microsoft.com/default.aspx?scid=kb;EN-US;970399
· SQL University: Parallelism Week - Introduction http://sqlblog.com/blogs/adam_machanic/archive/2010/05/24/sql-university-parallelism-week-introduction.aspx
· Maximum Number of Processors Supported by the Editions of SQL Server http://msdn.microsoft.com/en-us/library/ms143760(v=SQL.105).aspx
· Windows 2008 R2 - Groups, Processors, Sockets, Cores Threads, NUMA nodes what is all this? http://blogs.msdn.com/b/saponsqlserver/archive/2010/09/28/windows-2008-r2-groups-processors-sockets-cores-threads-numa-nodes-what-is-all-this.aspx
· Why Do I see more threads per SPID in sysprocesses than MAXDOP? http://blogs.msdn.com/b/sqlserverfaq/archive/2009/01/05/why-do-i-see-more-threads-per-spid-in-sysprocesses-than-maxdop.aspx
Note: This information was verified for SQL Server 2008 R2 and with very minor variations will apply also to SQL Server 2008 and SQL Server 2005.
I don't agree with your notes on hyperthreading, in particular: "you basically have two SQL schedulers mapped to one physical core." Hyperthreading allows each core to run two threads simultaneously, and on newer CPUs seems to perform quite well. I recently ran tests on a few very large parallel queries, with hyperthreading both on and off and maxdop set to 0 in both cases (at the query level -- I don't recommend leaving that setting at the server level, but that's a different topic). Enabling hyperthreading yielded a ~25% performance gain across the board for those workloads. Like anything else you should test on a case-by-case basis. There are no answers that work for everyone; but I think that generally speaking, enabling hyperthreading is a much better idea today than it used to be.
By the way, I don't like doing a lot of tweaking of MAXDOP at the server level these days. Instead I prefer to work with the cost threshold. More info here:
I did not intend to comment on whether or not you should enable hyper-threading, but if you do have on hyper-threading and do not leave MAXDOP=0 then your MAXDOP starting point should not be more than half the visible schedulers. Said another way, you don't want to set MAXDOP higher than the number of physical cores unless testing shows it is definitely beneficial in your specific circumstance.
Good info, Cindy. Thanks for sharing.
Thanks for this post.I've been trying to wrap my brain around parallelism for the last couple of weeks. This has helped some of the pieces fall into place. The sections "What is parallelism?" and "How does a query use parallelism?" really helped. I look forward to reading more of your posts in the future.
Thanks for your article. In my environment, we have a workload mix (many short-running queries and some long-running queries), and so we disable MAXDOP at the server level. However, when it comes to maintaining indexes off-hours, I've found that MAXDOP can be helpful. It appears that setting MAXDOP = 0 on a CREATE INDEX command will cause SQL Server to find the best degree of parallelism, even when MAXDOP is disabled at the server level.
Is there a way to determine how many parallel threads are being used by query? We have set the degree of parallelism to 3 at the server but when I query master..sysprocesses, sometimes I see 12-13 entries with different ecid for a single SPID.
@Praveen - the max degree of parallelism is per STEP in the query, NOT per query. So a given query could have many more threads than the MAXDOP setting.
Is there any way to know which step of the query is sent to which sceduler? Also, how much time that schedular is taking to execte the query step?