In most cases, people utilize SQL Server features and capabilities in a common and typical usage pattern.  But there are some instances where environments do something just a little different, or fill the bucket to the absolute brim.  With the advent of AlwaysOn Availability Groups, it is such a robust high availability and disaster recovery solution that as time goes on database professionals are going to think of really cool and interesting ways to stretch it wide, solving problems that we may not have originally thought of.  What brought on the deep dive of this specific topic was that a customer was looking to see what potential CPU problems could arise with 40+ availability groups, and a multiple of that of availability group databases. One of the most devastating bottlenecks is when you start running into exhausted CPUs, but with proper planning and focused monitoring you can minimize that risk and ensure existing functionality.

 Take an instance where you want to create and implement a large number of availability groups and/or availability databases.  You might be far exceeding what others have published and tested with.  When doing this, you will want to keep a close eye on what’s called the HADR Worker Pool.  This is basically the shared pool of workers across HADR databases.  For more information on specifics of what these tasks would include, please see Bob Dorr’s blog post on the topic. Regardless of what these workers are doing, the overconsumption of them could lead to less-than-optimal performance. As my peer, Denzil Riberio, previously wrote about in his blog post on actual worker threads exceeding max worker threads the contention may not surface itself in the form of a hard cap (although the max worker threads does contribute largely to the calculation of the worker limit, as explained below), but in other forms of CPU/resource contention.

When we run into resource contention, we instinctively do our basic troubleshooting on waits and queues, but the root of the problem may not be completely evident with traditional means.  So in order to fill in the blanks, we can look at a handful of corner strategies for troubleshooting HADR worker thread contention. 

SQL Server lends us a handful of ways to monitor and troubleshoot this, including XE events. The first one I want to look at is hadr_worker_pool_task.  Bob mentions this in his blog post, and it’s not necessarily the actual data and task information I’m concerned with normally (although this information is valuable and vital in dissecting specific problems and abnormal behavior). It’s the count of worker pool tasks that I will utilize for a few reasons: baselining during normal operation, and troubleshooting during abnormal situations.  All I pull is the count of these events (utilizing the event_counter XE target), and a comparison of current vs. baseline can provide whether or not HADR is simply doing more now than it typically does.

My first step is to create the XE session to start capturing this data:

create event session HadrWorkerPoolTask

on server

add event sqlserver.hadr_worker_pool_task

add target package0.event_counter

with

(

       max_memory = 4096 KB,

       event_retention_mode = allow_single_event_loss,

       max_dispatch_latency = 30 seconds,

       max_event_size = 0 KB,

       memory_partition_mode = none,

       track_causality = off,

       startup_state = on

);

go

 

Because the event_counter target is a running total, I want a way to have historical data and compare the deltas between two different times.  I accomplish this by creating a table in a utility database (in my example, the database is named DBA):

use DBA;

go

create table dbo.HadrWorkerPoolTask

(

       TaskLogId int identity(1, 1) not null

             constraint PK_HadrWorkerPoolTask_TaskLogId primary key clustered,

       LogDateTime datetime2(2) not null,

       TaskCount bigint not null

);

go

 

In order to pull the data from the event_counter, I wrote the following query:

;with cte as

(

       select

             convert(xml, t.target_data) as target_data

       from sys.dm_xe_session_targets t

       inner join sys.dm_xe_sessions s

       on t.event_session_address = s.address

       where s.name = 'HadrWorkerPoolTask'

       and t.target_name = 'event_counter'

)

select

       a.b.value('./@count', 'varchar(max)') as count

from cte

cross apply cte.target_data.nodes('CounterTarget/Packages/Package/Event') a(b);

 

This will give you the point-in-time count of this event, but I created the dbo.HadrWorkerPoolTask to routinely dump this count for tracking.  Modifying the above query slightly in order to persist the event counts, I end up with the following:

;with cte as

(

       select

             convert(xml, t.target_data) as target_data

       from sys.dm_xe_session_targets t

       inner join sys.dm_xe_sessions s

       on t.event_session_address = s.address

       where s.name = 'HadrWorkerPoolTask'

       and t.target_name = 'event_counter'

)

insert into dbo.HadrWorkerPoolTask(LogDateTime, TaskCount)

select

       getdate(),

       a.b.value('./@count', 'varchar(max)') as count

from cte

cross apply cte.target_data.nodes('CounterTarget/Packages/Package/Event') a(b);

 

In order to ensure this query is executed routinely, you can create a SQL Server Agent job to execute it every x minutes (or hours, or days, or whatever unit of granularity you are looking to achieve.  Just remember that the less frequent you grab and store the count of this event, your points of reference will blur and linearize the times in between possibly hiding trends).  So at this point in monitoring you now have a running log of total event counts for the hadr_worker_pool_task.  But we need to take this a step further in querying to get the deltas instead of the cumulative amount.  The below query will do just that, compare each time slot to the one sequentially before it and do the subtraction of the task count to get the delta:

select

       t1.LogDateTime,

       t1.TaskCount - t2.TaskCount as HadrTaskCount

from dbo.HadrWorkerPoolTask t1

inner join dbo.HadrWorkerPoolTask t2

on t1.TaskLogId = t2.TaskLogId + 1;

 

You can take this query and dump the result set into Excel, create an SSRS report, or do whatever you typically do to consume diagnostic data.  Here is a little sample of what it may look like (I just copy-pasted into Excel and created a line chart):

clip_image002

Now I have a nice chart of the HADR activity for this time period, and if/when something goes awry I can execute the same query for the current time span and compare the two charts.

 

Another piece of data that I like to keep an eye on is the hadr_thread_pool_worker_start event. This event fires when SQL Server starts up a new worker if there are no idle workers, or if the amount of required workers exceeds the active worker count (all the while staying within the bounds of the worker limit count).  We can consume a couple of things from this event: (1) the recordable demand that HADR is putting on the system by requiring additional workers, and (2) we can watch for the event where we simply can’t create another HADR worker, possibly due to hitting the worker limit (that limit is going to be the effective max worker threads [minus] 40, as explained by Bob Dorr).  If SQL Server is unable to start a new HADR worker then it will conveniently tell you this in the event’s output.

In order to create this event, I have implemented the below session definition:

create event session HadrThreadPoolWorkerStart

on server

add event sqlserver.hadr_thread_pool_worker_start

add target package0.event_file

(

       set filename = N'<path to XEL file>\HadrThreadPoolWorkerStart.xel'

)

with

(

       max_memory = 4096 KB,

       event_retention_mode = allow_single_event_loss,

       max_dispatch_latency = 30 seconds,

       max_event_size = 0 KB,

       memory_partition_mode = none,

       track_causality = off,

       startup_state = on

);

go

 

To view this session data, you can use the GUI (accessible through viewing the target or live data), or you can use a query like the one below:

declare @top_count int;

set @top_count = 100;

 

;with xe_cte as

(

       select

             object_name,

             cast(event_data as xml) as event_data

       from sys.fn_xe_file_target_read_file

       (

             '<path to XEL file>\HadrThreadPoolWorkerStart*.xel',

             null,

             null,

             null

       )

)

select top (@top_count)

       event_data.value('(/event/@timestamp)[1]', 'varchar(32)') as time_stamp,

       event_data.value('(/event/data/value)[3]', 'int') as active_workers,

       event_data.value('(/event/data/value)[2]', 'int') as idle_workers,

       event_data.value('(/event/data/value)[1]', 'int') as worker_limit,

       event_data.value('(/event/data/value)[4]', 'varchar(5)') as worker_start_success

from xe_cte

order by time_stamp desc;

 

This query will return a similar output:

clip_image004

The notable results of this query are active_workers vs. worker_limit (are we getting close to, or have we reached, the worker limit?), and worker_start_success (were we able to start a new HADR worker when we needed one?).  If you are consistently seeing worker_start_success as false (indicating SQL Server’s inability to start a new HADR worker) then you could be running into HADR worker pool starvation.  In the event of excessive worker thread contention and long term failures to start a worker, after 15 minutes there will be a logged message in the error log.  It is message id 35217, and in your error log the message will read:

“The thread pool for AlwaysOn Availability Groups was unable to start a new worker thread because there are not enough available worker threads. This may degrade AlwaysOn Availability Groups performance.  Use the "max worker threads" configuration option to increase number of allowable threads.

As an aside, where does worker_limit come from?  It is effective max worker threads – 40.  So on my single CPU VM, that’ll be 512 – 40 = 472.

There are other considerations when monitoring and troubleshooting HADR worker consumption and task quantity, but what this all boils down to is CPU contention.  Going back to the root of the problem, we can utilize other tried-and-true strategies to analyze this resource contention.  For instance, take note of the summary of workers and runnable tasks from the sys.dm_os_schedulers DMV:

select

       scheduler_id,

       status,

       current_tasks_count,

       current_workers_count,

       active_workers_count,

       work_queue_count,

       runnable_tasks_count,

       load_factor

from sys.dm_os_schedulers

where status = 'VISIBLE ONLINE';

 

Things to watch out for here are counts of work_queue_count and runnable_tasks_count consistently above zero.  Another query to have in your toolbox, looking for wait types associated with thread starvation:

select

       wait_type,

       count(*) as wait_type_count

from sys.dm_os_waiting_tasks

group by wait_type

order by wait_type_count desc;

 

This above query will give you a count of all current waiting tasks’ associated wait types.

 Lastly, when troubleshooting busy schedulers, I like to see what percentage of the total wait time the signal wait time is (signal wait time is the amount of time that tasks waited between being signaled and starting).  A sample query for this could be as follows:

;with cpu_wait_cte as

(

       select

             sum(wait_time_ms) as sum_wait_time_ms,

             sum(signal_wait_time_ms) as sum_signal_wait_time_ms

       from sys.dm_os_wait_stats

)

select

       convert(

             decimal(5, 2), sum_signal_wait_time_ms * 1.0 / sum_wait_time_ms * 100

       ) as signal_wait_time_percentage

from cpu_wait_cte;

 

This will be the cumulative stats from the last time the instance started (or this DMV was explicitly cleared), so you may need mold this query into a delta query from a start time to end time (or manually clear this DMV) instead of grabbing the wait stats in its entirety.  For all intents and purposes, the above query will give what percentage of the summation of time spent waiting that the signal wait time is.  This value should be frequently baselined and compared when troubleshooting possible CPU contention.  This could bring to light a higher amount of time waiting on the CPU to pick the task back up, possibly indicating CPU pressure.

I have illustrated a few possible approaches to monitoring and troubleshooting possible HADR worker thread issues and tying it all back to generic CPU contention problems.  With the right approach, you can maximize visibility on a commonly foggy aspect of HADR worker thread contention.

 

Thomas Stringer – SQL Server Premier Field Engineer

Twitter: @SQLife