Why Do I see more threads per SPID in sysprocesses than MAXDOP?

Why Do I see more threads per SPID in sysprocesses than MAXDOP?

  • Comments 5

Most of the Customers using sql server 2000,2005 and 2008 have this doubt in their mind when using parallelism in sql server.

 

Why Do I see more threads per SPID in sysprocesses than MAXDOP?


The above question is being addressed in the following blog by Bob Dorr from SQL Server Escalation Services,

http://blogs.msdn.com/psssql/archive/2008/02/13/how-it-works-sql-server-per-query-degree-of-parallelism-worker-count-s.aspx

 

But we will try to provide some more clarification on this issue, in this blog by explaining the behaviour of query processor when dealing with parallelism with the help of following example.

 

We have used the following script from the book "Inside Sql server 2005 Query Tuning and Optimization" by Kalen Delaney, to induce parallelism in the execution plan,

 

create table [HugeTable1]

(

[Key] int,

[Data] int,

[Pad] char(200),

Constraint [PK1] PRIMARY KEY ([Key])

)

SET NOCOUNT ON

DECLARE @i int

BEGIN TRAN

set @i = 0

WHILE @i < 250000

BEGIN

   INSERT [HugeTable1] Values (@i,@i,NULL)

   SET @i = @i + 1

   if @i % 1000 = 0

   BEGIN

      COMMIT TRAN

      BEGIN TRAN

   END

END

COMMIT TRAN

 

SELECT [KEY],[DATA],[PAD] INTO [HugeTable2] FROM HugeTable1

 

ALTER TABLE [HugeTable2] ADD CONSTRAINT [PK2] PRIMARY KEY ([Key])

 

We use the following query which displays parallelism in it’s execution plan

 

set statistics profile on

select T1.[Key],T1.[Data],T2.[Data] From HugeTable1 T1 Join [HugeTable2] T2 ON T1.[Key] =T2.[Key] where T1.Data < 100 OPTION (MAXDOP 2)

 

Execution Plan

 

|--Parallelism(Gather Streams)         

  |--Nested Loops(Inner Join, OUTER REFERENCES:([T1].[Key]))

    |--Parallelism(Repartition Streams, RoundRobin Partitioning)

    |    |--Clustered Index Scan(OBJECT:([AdventureWorks2008].[dbo].[HugeTable1].[PK1] AS [T1])

                                                               WHERE:([AdventureWorks2008].[dbo].[HugeTable1].[Data] as [T1].[Data]<(100)))

  |--Clustered Index Seek(OBJECT:([AdventureWorks2008].[dbo].[HugeTable2].[PK2] AS [T2]),

                                                       SEEK:([T2].[KEY]=[AdventureWorks2008].[dbo].[HugeTable1].[Key] as [T1].[Key]) ORDERED FORWARD)

 

 

 

We ran the above query 100 times in a loop,identified the SPID while running the above query and queried the sysprocesses for that spid from another query window at the same time

 

select * from sys.sysprocesses where spid=56

 

spid   kpid   blocked waittype waittime lastwaittype cpu         physical_io   ecid   status   

56     5640   0       0x00BB   3        CXPACKET     66653       20605          0     suspended

56     5936   0       0x00BB   3        CXPACKET     2147483647  0              1     suspended

56     1252   0       0x00BB   1        CXPACKET     2147483647  0              2     suspended

56     3508   56      0x0024   0        LATCH_EX     2147483647  0              3     suspended

56     3580   0       0x0000   0        LATCH_EX     2147483647  0              4     runnable 

 

As we can see here we have 5 threads which are spawned to execute the query, however while executing the query we limited the DOP to 2 by specifying the MAXDOP hint to 2.

 

The reason is “MAXDOP is always specified per operator in the execution plan and not per execution plan”.

 

So in the above execution plan we have  3 operators in the execution plan viz Clustered Index Scan, Clustered Index Seek and Nested Loops.

 

So we have

- 2 threads (as per the MAXDOP hint) which are used to perform Clustered Index Scan,


- 2 threads (as per the MAXDOP hint) which are used for Nested Loop Join which also performs Clustered Index Seek to join the rows from outer Clustered Index Scan. 
In other words we do not have separate threads allocated for Clustered Index Seek opearor,


- 1 thread is used for parallel gather streams operator to collect and converge the parallel thread output from the Nested Loop join operator. This thread is basically a synchronizing thread (also referred as Thread 0 IN xml plan) which is used when there is parallelism in the execution plan of the query.

 

To justify the above explanation, we can observe the XML execution plan which also show the runtime behaviour of the plan

 

2 threads used to perform Clustered Index Scan

 

<RelOp NodeId="3" PhysicalOp="Clustered Index Scan" LogicalOp="Clustered Index Scan"….>

<RunTimeInformation>

<RunTimeCountersPerThread Thread="2" ActualRows="100" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="1" ActualRows="0" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="0" ActualRows="0" ActualEndOfScans="0" ActualExecutions="0" />

</RunTimeInformation>

 

 

2 threads used to perform Nested Loop as well the clustered index seek

 

<RelOp NodeId="1" PhysicalOp="Nested Loops" LogicalOp="Inner Join" ….>

<RunTimeInformation>

<RunTimeCountersPerThread Thread="2" ActualRows="50" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="1" ActualRows="50" ActualEndOfScans="1" ActualExecutions="1" />

<RunTimeCountersPerThread Thread="0" ActualRows="0" ActualEndOfScans="0" ActualExecutions="0" />

</RunTimeInformation>

 

 

Same threads used to perform Clustered Index Seek on inner query

 

<RelOp NodeId="4" PhysicalOp="Clustered Index Seek" LogicalOp="Clustered Index Seek"…>

<RunTimeInformation>

<RunTimeCountersPerThread Thread="2" ActualRows="50" ActualEndOfScans="0" ActualExecutions="50" />

<RunTimeCountersPerThread Thread="1" ActualRows="50" ActualEndOfScans="0" ActualExecutions="50" />

<RunTimeCountersPerThread Thread="0" ActualRows="0" ActualEndOfScans="0" ActualExecutions="0" />

</RunTimeInformation>

 

 

Finally Thread 0 which is shown in all the operators is used to gather streams from all the parallel threads and to provide the final output to the client

 

<RelOp NodeId="0" PhysicalOp="Parallelism" LogicalOp="Gather Streams"….>

<RunTimeInformation>

<RunTimeCountersPerThread Thread="0" ActualRows="100" ActualEndOfScans="1" ActualExecutions="1" />

</RunTimeInformation>

 

 

To conclude the MAXDOP setting is used to limit the number of threads per operation in the execution plan and does not limit the number of threads used to execute the query. So it is normal to see  threads per SPID in sysprocesses greater than MAXDOP setting.

 

However if you see lot of waits for CX Packets in sys.dm_os_wait_stats and High CPU being consumed by sql server, you can consider reducing the MAXDOP setting or identify the missing indexes which can expedite the processing of the query.

 


 

Parikshit Savjani,

SE, Microsoft SQL Server,

 

Leave a Comment
  • Please add 5 and 1 and type the answer here:
  • Post
  • PingBack from http://www.codedstyle.com/why-do-i-see-more-threads-per-spid-in-sysprocesses-than-maxdop/

  • Excellent blog, this explanation can answer also why you can see more threads than CPUs.

    I'm allways happy to see explanations that answer the reasoning of the behaviour.

    Thanks!!

  • Excellent blog, this explanation can answer also why you can see more threads than CPUs.

    I'm allways happy to see explanations that answer the reasoning of the behaviour.

    Thanks!!

  • Here's an excerpt from

    msdn.microsoft.com/.../ms178065(v=sql.100).aspx

    'Degree of parallelism determines the maximum number of CPUs that are being used; it does NOT (emphasis added) mean the number of threads that are being used'

    This clearly contradicts what's been said in this article.

    Please do clarify, as this is a major issue.

    Thanks lots!

  • Hi Reiner,

    Infact both the articles are trying to convey the same message which is "DOP doesn't decide the no. of threads which will be used for the complete query execution" (emphasis added for complete query)

    A query execution plan consists of multiple operators for e.g Clustered index scan, index seek, Nested Loop Join. In my blog here, I have tried to clarify that MAXDOP setting limits the number of threads per operator (emphasis added for operator) and doesn't decide the total threads used for the complete execution of the query.

    So to summarize, the MAXDOP setting limits the number of threads to be used per operator in the query execution and hence it limits  the number of CPUs to be used at any time during execution of the query since one thread can run on only 1 CPU but it definitely doesn't decide the number of threads used for the complete execution of the query.

    Hope this clarifies !!!

Page 1 of 1 (5 items)