The question invoking the discussion was why did a query elect to use 100+ workers, approximately half the configured worker threads? Before erasing this topic from my whiteboard let me document the highlights.
Often overlooked is the degree of parallelism decision is applied to each parallel operator of the plan.
For this example:
- Assume that the degree of parallelism elected for the query is 2.
- As a general rule assume that the input and output side of the operator requires the degree of parallelism work of workers. The various parallel operators and combine and distribute work making them one to many and many to one in some situations.
The query will have the controlling thread (1)
|--Stream Aggregate(GROUP BY:([ORDERS].[o_orderpriority])
|--Parallelism(Gather Streams, ORDER BY:
([ORDERS].[o_orderpriority] ASC)) (1 input and 2 output)
|--Stream Aggregate(GROUP BY:
|--Sort(ORDER BY:([ORDERS].[o_orderpriority] ASC))
|--Merge Join(Left Semi Join, MERGE:
|--Sort(ORDER BY:([ORDERS].[o_orderkey] ASC))
| |--Parallelism(Repartition Streams, (2 input and 2 output)
This query running at max degree of parallelism of 2 will require up to 8 workers.
I will spare you the show plan from the issue we were working on but the query looked like like the following and included 12 union alls.
select ... tblJan
select ... tblFeb
Each select was against a different month table and each of those tables was missing an index. To do the sorts and required joins each select required 4 DOP operators and was run on a 32 way system using a max degree or parallelism of 8.
(4 * 2 sides of each operator) * 8 = 64 * 12 queries = 768 workers.
SQL Server dynamically lowered the max degree of parallelism for this query but it still resulted in 100+ worker threads.
The solution for this issue was not to use the option MAXDOP hint but instead to create the proper indexes. The plan using the indexes was faster than any parallel plan.
Hint: When you are testing your applications set the max degree of parallelism to 1. Parallel execution is only determined by the current availability on the system. It is not a guarantee. I have seen many cases where the Q/A test ran within performance guidelines but as soon as the application deployed it started taking up too many workers or the queries are deemed too slow.
Parallelism can use power to hide query problems when if flex's its muscles. If an index is missing a sort can happen much faster on multiple CPUs. So a parallel sort can hide a missing index solution that is far better than throwing muscle at the problem. By setting the SQL Server instance to serial plan mode (max degree or parallelism of 1) in sp_configure you can identify and tune those queries.
SQL Server Senior Escalation Engineer