Tom Stringer (@SQLife) was working on some HADR testing for a customer to simulate many availability groups and introduce significant load into the system to measure overhead and such. In his quest to do that he was seeing behavior that he couldn’t really explain and so worked with him to uncover what was happening under the covers.
For his test environment he was using a single CPU VM with the default worker thread setting on X64 which means it would result in 512 worker threads. Yet he was showing far more than 512 worker threads on his system which needed to be explained.
To give you context, in this test there were 40 Availability groups
select count(*)as NumAvailabilityGroups from sys.availability_groups
Sys.dm_os_schedulers showed a huge amount of workers, more than the max of 512 expected.
where status = 'Visible Online'
Sp_server_diagnostics backed this up as well:
<queryProcessing maxWorkers="512" workersCreated="923"
We then looked at sys.dm_os_workers to see if that would shed any light, and it did give us a clue. All the WAITFOR threads here were from the induced workload ( User queries), but we saw a ton of HADR type last_wait_types. We can see at least 1 thread per AG being the ‘preemptive Lease mechanism’ thread, and a bunch of other HADR type waits and also Broker and some others – but how do we deterministically figure out “what threads” are counted or not towards max worker thread limit?
select is_preemptive,state,last_wait_type,count(*) as NumWorkers from sys.dm_os_workers
Group by state,last_wait_type,is_preemptive
order by count(*) desc
So then the bigger question as to whether these threads are “counted” towards the Max worker threads or not. Bob Dorr had posted about the HADR Worker Pool earlier and from an AlwaysON Availability Groups perspective there is a pool of workers that are capped to a number that is [ Max worker threads – 40 ] but what we discovered is that these seemingly do not count towards the Max worker thread configured value and we had to prove that.
Looking at source, discovered that when we spawn a new worker thread, we calculate an IdealWorkerLimit and compare that with Max worker threads. In that calculation, if a task that is bound to a worker is marked as a “Permanent Task”, these are not “counted” towards the worker thread limit. So there are a bunch of System threads that spawn of Permanent tasks and as such as not counted towards the configured ‘max worker thread’ configured limit including but not limited to:
- Always On ( HADR Notification, HADR Controller etc )
- Lock Manager
- Log Shipping worker
- Some Fulltext
- SQL Trace
- Some of the Transport Threads such as Service Broker
- Some of the Recovery threads
Currently in sys.dm_os_tasks or in sys.dm_os_schedulers we don’t expose a column that exposes if a task is a “Permanent Task” if you will. You could roughly extrapolate it somewhat from the query on workers posted earlier, or looking at Exec requests as below.
select last_wait_type, count(*) as NumRequests from sys.dm_exec_requests
group by last_wait_type
Luckily sys.dm_exec_sessions has a column is_user_process that is exposed. So we can see that there are 385 requests that are “System” and 502 that are user
select is_user_process,count(*) as RequestCount from sys.dm_exec_sessions s
inner join sys.dm_exec_requests r
on s.session_id = r.session_id
group by is_user_process
Knowing this we can further get additional details including what type of command it is running, and many other details by modifying the query below. The query below is only listing threads that are “not” being counted against the ‘max worker thread limit’
;with cte as
from dm_exec_sessions s
inner join dm_exec_requests r
on s.session_id = r.session_id
inner join dm_os_tasks t
on r.task_address = t.task_address
inner join dm_os_workers w
on t.worker_address = w.worker_address
where s.is_user_process = 0
count(*) as cmd_cnt
group by is_user_process,command, last_wait_type
order by cmd_cnt desc
Moral of the story: Max worker threads limit does NOT have a hard cap in terms of accounting for all the System Tasks but we have demonstrated a way to approximate which tasks those are.
As a follow up, Tom will be posting another article with more details on monitoring HADR thread usage through a few other avenues.
Denzil Ribeiro – Sr. Premier Field Engineer