I was working on an index build issue for an 80 CPU system and kept seeing that only 64 CPUs were getting used. I had carefully studied sys.dm_os_spinlock_stats and sys.dm_os_wait_stats along with performance counters, memory usage patterns, and I/O activity. In fact, I had an 80 CPU, 2TB RAM, 4TB SSD system, so I was convinced SQL Server was CPU bound and that adding more CPUs to the index build could be a benefit.
Note: I must caution you that adding more CPUs can lead to reduced performance because a bottleneck such as memory or I/O can become a larger problem.
I like to think of it like pumping gas at my favorite filling station. The pump at the storage tank can only move so much liquid through a finite pipe size. Adding more filling outlets for patrons to use does not mean the overall flow of gas (gallons/sec) increases. In fact, I like filling when no one else is filling because I maximize my flow and reduce my overall time at the pump.
TEST any MAXDOP setting thoroughly, as you might find that less goes faster.
After looking at various SQL Server Books Online references and then stepping through the code, I realized our documentation is not as accurate as it could be. I hope this post can reduce some of the confusion.
There are plenty of references for tuning MAXDOP to allow queries to run at their best while reducing the overhead of the parallelism. You have all seen the references for capping MAXDOP at 8, or number of schedulers of the NUMA node if smaller, or …. The fact is, these are all great and recommended best practices.
This post is not intended to contradict any of the current recommendations. It is solely focused on a specific maintenance target. The rest of my system has to be idle so I can safely consume the majority of schedulers. It is probably the middle of the night, users are sleeping, and you want to schedule a job that can take full advantage of the overall system. I have reviewed various performance points and I believe using a high level of parallelism could allow my index build to complete quickly.
Warning: Index fragmentation with increased levels or parallelism: http://blogs.msdn.com/b/psssql/archive/2012/09/05/how-it-works-online-index-rebuild-can-cause-increased-fragmentation.aspx
There are several stages to determining the degree of parallelism (MAXDOP) a query can utilize.
Stage 1 – Compile
During compilation SQL Server considers the hints, sp_configure and resource workload group settings to see if a parallel plan should even be considered. A parallel plan is built only if the query operations allow parallel execution:
```
if hint is present and > 1 then build a parallel plan
else if no hint or hint (MAXDOP = 0)
    if sp_configure setting is 1 but workload group > 1 then build a parallel plan
    else if sp_configure setting is 0 or > 1 then build a parallel plan
```
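As a concrete illustration of the hint path, a MAXDOP hint on the statement itself overrides the sp_configure setting at compile time. The table and index names below are hypothetical:

```sql
-- Hypothetical table/index names. The MAXDOP hint requests a parallel
-- plan for this statement regardless of the server-wide
-- 'max degree of parallelism' setting.
ALTER INDEX ix_OrderDetail ON dbo.OrderDetail
    REBUILD WITH (ONLINE = ON, MAXDOP = 8);
```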
Stage 2 – Query Execution
When the query begins execution, the runtime degree of parallelism is determined. This involves many factors, already outlined in SQL Server Books Online: http://technet.microsoft.com/en-US/library/ms178065(v=SQL.105).aspx
Before SQL Server looks at the idle workers and other factors it determines the target for the degree of parallelism.
```
if sp_configure or query hint forces a serial plan then use 1
else if resource workload group is set
    if query hint present then use min(hint, resource workload group)
    else use resource workload group
```
If still 0 after the detailed calculations, the target is set to 64 (the default max for SQL Server, as documented in Books Online). This fooled me some because the 80 CPU system has 2 Windows processor groups x 40 CPUs. I might have expected a 40 CPU cap to avoid crossing a Windows processor group boundary. This is not the case: SQL Server hard codes the 64 CPU target when the runtime MAXDOP target is still 0 (the default).
The MAXDOP target is now adjusted for:
Actual CPU count (affinity settings from sp_configure and the resource pool).
Certain query types (index builds, for example) look at the number of partitions
Other query type limitations that may exist
Now SQL Server takes a look at the available workers (free workers for query execution). You can loosely calculate the free worker count on a scheduler from sys.dm_os_schedulers: free workers = current_workers_count - current_tasks_count.
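The free worker estimate above can be pulled directly from the DMV. This is only a rough approximation of what the engine tracks internally:

```sql
-- Rough free-worker estimate per scheduler; the engine's internal
-- accounting is more precise than this snapshot.
SELECT scheduler_id,
       current_workers_count,
       current_tasks_count,
       current_workers_count - current_tasks_count AS free_workers
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';
```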
Once the target is calculated the actual is determined by looking at the available resources to support a parallel execution. This involves determining the node(s) and CPUs with available workers.
Older versions of SQL Server used a polling mechanism every ~1 second to determine the node with the most free workers to target. This meant you could encounter race conditions from multiple queries, both going parallel on the same node when going parallel on separate nodes would have resulted in better CPU usage.
Newer builds of SQL Server actively track the free workers. This significantly reduces the possibility of assigning parallel queries to the same set of schedulers.
Trace flag 2466 - Force older version logic to determine number of available resources.
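If you need the older behavior for troubleshooting, the trace flag can be enabled in the usual ways. Test carefully; like any undocumented trace flag, it should be used under guidance from support:

```sql
-- Enable globally for all sessions until the next restart.
-- Use the -T2466 startup parameter to make it persistent.
DBCC TRACEON (2466, -1);
```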
The worker location information is then used to target an appropriate set of CPUs to assign the parallel task to.
In general the placement decisions are:
Using XEvents you can monitor the MAXDOP decision logic. For example:
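A minimal session sketch using the sqlserver.degree_of_parallelism event is shown below. The session name is my own; adjust the actions and target to taste:

```sql
-- Capture the degree of parallelism chosen for each statement.
CREATE EVENT SESSION [TrackDOP] ON SERVER
ADD EVENT sqlserver.degree_of_parallelism (
    ACTION (sqlserver.session_id, sqlserver.sql_text))
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION [TrackDOP] ON SERVER STATE = START;
```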
Back to trying to get my index build to use all 80 CPUs. I can do several things:
1. Use MAXDOP=80 query hint
2. Set sp_configure 'max degree of parallelism', 80 -- Warning: this applies to any query
3. Create resource pool/workload group and set MAXDOP=80 and assign only the index build connection to it using a resource governor classifier.
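Option 3 might look something like the following sketch. The pool, group, login, and function names are hypothetical, and the classifier function must be created in the master database:

```sql
-- Hypothetical names throughout. The classifier routes only the
-- index build login into the MAXDOP 80 workload group.
CREATE RESOURCE POOL IndexBuildPool;
CREATE WORKLOAD GROUP IndexBuildGroup
    WITH (MAX_DOP = 80)
    USING IndexBuildPool;
GO
CREATE FUNCTION dbo.fnIndexBuildClassifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @grp sysname = N'default';
    IF SUSER_SNAME() = N'IndexBuildLogin'
        SET @grp = N'IndexBuildGroup';
    RETURN @grp;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnIndexBuildClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```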
Here are the MAXDOP results on my 80 CPU system at different setting levels.
| Query Hint | sp_configure | Workgroup | RUNTIME |
|------------|--------------|-----------|---------|
| 0          | 0            | 80        | 80      |
| 0          | 0            | 0         | 64      |
| 1          | 0            | 80        | 1       |
| 2          | 0            | 80        | 2       |
| 0          | 80           | 0         | 80      |
| 0          | 1            | 0         | 1       |
| 0          | 2            | 0         | 2       |
| 80         | 2            | 0         | 80      |
| 80         | 10           | 2         | 2       |
You can monitor the number of parallel workers by querying:
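One rough approach is to count active tasks per request in sys.dm_os_tasks; a request running with parallel workers shows multiple task rows:

```sql
-- Active tasks per request; parallel requests show more than one task.
SELECT session_id, request_id, COUNT(*) AS task_count
FROM sys.dm_os_tasks
GROUP BY session_id, request_id
ORDER BY task_count DESC;
```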
Note: Some configuration changes may require a flush of the procedure cache (DBCC FREEPROCCACHE) or a disconnect/reconnect to take effect.
Bob Dorr - Principal SQL Server Escalation Engineer