I recently worked on an interesting system hang caused by a deadlocked work queue and wanted to share how we tracked down and resolved the issue.
In this example, we will demonstrate how cascaded work items can deadlock a work queue. To keep the illustrations simple, only three worker threads are shown; most work queues have 16 or more threads servicing them. In the first illustration, all three worker threads are waiting for the queue’s event to be signaled (indicating that a work item was inserted into the queue).
Thread ~ac0.001 is a process- or service-related thread that is currently in a running state (indicated by the color blue). This thread needs some work to be accomplished in an arbitrary context. To do this, the thread allocates an IO_WORKITEM (via IoAllocateWorkItem) and specifies a callback routine to be performed by the worker thread(s).
Fig. 1 – All worker threads available
Once the required work items have been defined, our thread calls IoQueueWorkItem for each item, which signals the queue’s notification KEVENT to indicate that there is work to be done. Figure 2 shows the work items inserted into the queue.
Fig. 2 – Thread inserts work items into the queue
At this point, the work queue’s event is signaled to wake up the worker(s). Once running, each of the threads removes a work item from the queue and runs the function associated with the work item.
Since each work item has a completion event, this event will be signaled when the worker is finished processing the item. Because of this, thread ~ac0.001 would normally call KeWaitForMultipleObjects to wait for the work item completion events to be signaled (Fig. 3).
Fig. 3 – Requesting thread waits for completion
Upon completing a work item, each worker thread signals the corresponding work item’s completion event and goes back into a wait state for new work items; once all of its completion events have been signaled, thread ~ac0.001 continues execution.
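The flow in Figs. 1–3 can be sketched with a user-mode analogy. This is an illustration of the pattern, not kernel code: Python threads stand in for the system worker threads, a `queue.Queue` plays the role of the work queue and its KEVENT, and a `threading.Event` plays the role of each item's completion event.

```python
import queue
import threading

NUM_WORKERS = 3
work_queue = queue.Queue()   # stands in for the work queue and its KEVENT

def worker():
    # Each worker blocks until an item is queued, runs the item's
    # callback, then signals that item's completion event.
    while True:
        routine, done = work_queue.get()
        routine()
        done.set()

for _ in range(NUM_WORKERS):
    threading.Thread(target=worker, daemon=True).start()

# The requesting thread (~ac0.001 in the figures) builds three work
# items, enqueues them, then waits on all three completion events,
# much like KeWaitForMultipleObjects.
results, events = [], []
for i in (1, 2, 3):
    done = threading.Event()
    events.append(done)
    work_queue.put((lambda i=i: results.append(i), done))

for done in events:
    done.wait()

print(sorted(results))   # all three items completed: [1, 2, 3]
```

As long as at least one worker is free to dequeue each item, every completion event is eventually signaled and the requesting thread resumes.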
Now let’s look at what happens when workers need to enqueue one or more additional work items to the same queue. This is not a problem as long as there is at least one available worker thread to service these cascaded work items. The available worker(s) would simply handle these and eventually this situation would resolve itself (Fig. 4).
Fig. 4 – Work items 4, 5, and 6 will be handled by Thread ~008.004
If no workers are available, work items 4, 5, and 6 will never be handled; no more items can be removed from the queue, and the situation cannot resolve itself (Fig. 5).
Fig. 5 – No available worker threads to handle items 4, 5, and 6
The result is a deadlock on the queue (Fig. 6). To avoid this type of bug, a developer must take cascaded work items into consideration. It is not ideal for a worker routine to enqueue work to its own queue, but it is a common practice in the software industry. To avoid deadlocks, a developer may want to consider using a separate work queue for these cascaded items. If this is not possible, the developer should ensure that there are sufficient worker threads to handle any reasonable load scenario.
Fig. 6 – Deadlocked work queue
In summary, exercise caution when writing your worker routines and be sure to do plenty of stress testing to ensure your driver’s design is feasible in real world scenarios.
A few weeks ago, I received a memory dump from a customer who was reporting a daily system hang.
One of the first things I do with a hang dump is a quick assessment with !vm, !locks, and !exqueue 6. If something obvious is going on, those three debugger commands will usually get things moving in the right direction.
In this case, the output of !exqueue 6 caught my eye. In the Critical Work Queue, every critical system worker thread was blocked in a wait issued by backyard_driver: each had called KeWaitForSingleObject and was waiting for a notification event. I also noticed a large number of pending work items for backyard_driver, with two different worker routines among them.
PENDING: IoWorkItem backyard_driver+0x830768F2 (0001e228)
IoObject (8823e148) Context (b40e4384)
PENDING: IoWorkItem backyard_driver+0x830767da (0001e110)
IoObject (8823e148) Context (b40be9f0)
From the thread stacks, I could tell that the waiting threads were executing the worker routine at offset +0x1e110, while the pending items were for the worker routine at offset +0x1e228.
8df6fb8c 96e3cccd 85801aa0 00000000 00000000 nt!KeWaitForSingleObject+0x393 (CONV: stdcall)
WARNING: Stack unwind information not available. Following frames may be wrong.
8df6fbe4 96e3d5bd 88266698 00000103 8df6fc0c backyard_driver+0x1cccd
8df6fbf4 82e924bc 88266698 8af1cb20 88267f00 backyard_driver+0x1d5bd
8df6fc0c 96eeb0ac 82e924bc 88267f00 8af1cb20 nt!IofCallDriver+0x63 (CONV: fastcall)
8df6fc28 96f08bbc 96ef64d0 859c7220 859893c8 filterdriver1+0x50ac
8df6fc40 96ef6516 85aea488 858b2e30 00000000 filterdriver2+0x5cc
8df6fc60 96f07cd1 859893c8 00000000 00000000 filterdriver2+0x2516
8df6fc7c 82ebeb33 88228c18 85afb628 859893c8 filterdriver2+0x2f1
8df6fcc4 96e3e0fe b40d625c b40d615c 8df6fcec nt!IopfCompleteRequest+0x128 (CONV: fastcall)
8df6fcd4 96e3e14d b40d6100 8b1cbaa8 8823e148 backyard_driver+0x1e0fe
8df6fcec 830767fd 8823e148 b40d615c 857524f8 backyard_driver+0x1e14d
8df6fd00 82ec3f3b 8b1cbaa8 00000000 857524f8 nt!IopProcessWorkItem+0x23 (CONV: stdcall)
8df6fd50 830646bb 00000000 ae892e65 00000000 nt!ExpWorkerThread+0x10d (CONV: stdcall)
8df6fd90 82f160f9 82ec3e2e 00000000 00000000 nt!PspSystemThreadStartup+0x9e (CONV: stdcall)
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x19
This means that one worker routine was calling IoQueueWorkItem to queue additional items to the same queue and then waiting for their completion. That is fine as long as at least one worker thread is available to process the cascaded work items, but in this case every worker thread was itself blocked waiting on other work items in the same queue. The queue was deadlocked.
0: kd> !exqueue 6
Dumping ExWorkerQueue: 82F90580
**** Critical WorkQueue( current = 0 maximum = 4 )
THREAD 857524f8 Cid 0004.0018 Teb: 00000000 Win32Thread: 00000000 WAIT: (Executive) KernelMode Non-Alertable
Owning Process 8568b600 Image: System
Attached Process N/A Image: N/A
Wait Start TickCount 30540 Ticks: 63029 (0:00:16:23.258)
Context Switch Count 27640
Win32 Start Address nt!ExpWorkerThread (0x82ec3e2e)
Stack Init 8df6ffd0 Current 8df6fa98 Base 8df70000 Limit 8df6d000 Call 0
Priority 13 BasePriority 13 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
*** ERROR: Module load completed but symbols could not be loaded for backyard_driver.sys
*** ERROR: Module load completed but symbols could not be loaded for filterdriver1.sys
*** ERROR: Symbol file could not be found. Defaulted to export symbols for filterdriver2.SYS -
ChildEBP RetAddr Args to Child
8df6fab0 82ec4b25 857524f8 00000000 82f7fd20 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
8df6fae8 82ec3423 857525b8 857524f8 85801aa0 nt!KiSwapThread+0x266 (CONV: fastcall)
8df6fb10 82ebd2cf 857524f8 857525b8 00000000 nt!KiCommitThreadWait+0x1df (CONV: stdcall)
<additional threads were found here with similar stacks>
THREAD 856be798 Cid 0004.01d0 Teb: 00000000 Win32Thread: 00000000 WAIT: (Executive) KernelMode Non-Alertable
Context Switch Count 1
Stack Init bedddfd0 Current beddda98 Base bedde000 Limit beddb000 Call 0
bedddab0 82ec4b25 856be798 00000000 82f7fd20 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
bedddae8 82ec3423 856be858 856be798 85bc4ac8 nt!KiSwapThread+0x266 (CONV: fastcall)
bedddb10 82ebd2cf 856be798 856be858 00000000 nt!KiCommitThreadWait+0x1df (CONV: stdcall)
bedddb8c 96e3cccd 85bc4ac8 00000000 00000000 nt!KeWaitForSingleObject+0x393 (CONV: stdcall)
bedddbe4 96e3d5bd 88266698 00000103 bedddc0c backyard_driver+0x1cccd
bedddbf4 82e924bc 88266698 85b2a8d8 88267f00 backyard_driver+0x1d5bd
bedddc0c 96eeb0ac 82e924bc 88267f00 85b2a8d8 nt!IofCallDriver+0x63 (CONV: fastcall)
bedddc28 96f08bbc 96ef64d0 85b33218 8b056f00 filterdriver1+0x50ac
bedddc40 96ef6516 85e27c90 85e2b9f0 00000000 filterdriver2+0x5cc
bedddc60 96f07cd1 8b056f00 00000000 00000000 filterdriver2+0x2516
bedddc7c 82ebeb33 88228c18 85aae320 8b056f00 filterdriver2+0x2f1
bedddcc4 96e3e0fe b40e2ae8 b40e29e8 bedddcec nt!IopfCompleteRequest+0x128 (CONV: fastcall)
bedddcd4 96e3e14d b40e2900 859cdee0 8823e148 backyard_driver+0x1e0fe
bedddcec 830767fd 8823e148 b40e29e8 856be798 backyard_driver+0x1e14d
bedddd00 82ec3f3b 859cdee0 00000000 856be798 nt!IopProcessWorkItem+0x23 (CONV: stdcall)
bedddd50 830646bb 80000000 9da20e65 00000000 nt!ExpWorkerThread+0x10d (CONV: stdcall)
bedddd90 82f160f9 82ec3e2e 80000000 00000000 nt!PspSystemThreadStartup+0x9e (CONV: stdcall)
IoObject (8823e148) Context (b40e45b8)
IoObject (8823e148) Context (b40e47ec)
IoObject (8823e148) Context (b40e4d88)
IoObject (8823e148) Context (882aea8c)
IoObject (8823e148) Context (85807bd4)
PENDING: IoWorkItem backyard_driver+0x830767da (0001e228)
IoObject (8823e148) Context (8580cca0)
IoObject (8823e148) Context (85811e54)
IoObject (8823e148) Context (8581713c)
IoObject (8823e148) Context (8581c424)
It turns out that this issue was triggered by a sudden surge of I/O related to a PnP arrival. To prevent it from recurring, we added a registry value that allows additional critical worker threads to be created.
AdditionalCriticalWorkerThreads : 0x14
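This value is a REG_DWORD under the Session Manager’s Executive key, and it is read at boot, so the extra threads appear after the next restart. Setting it from an elevated command prompt looks like this (0x14 = 20 additional threads):

```
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Executive" /v AdditionalCriticalWorkerThreads /t REG_DWORD /d 0x14 /f
```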
For more information about this registry key, refer to http://msdn.microsoft.com/en-us/library/ee377058(BTS.10).aspx
That’s all for now.
“The NTFS Doctor”
Thanks for posting. I was expecting some driver update at the end, but that registry tweak surprised me a little :)
Thank you for the clarification! Nice post!
Good advice: NEVER EVER wait for anything (other than a mutex acquisition) in a work item. It saves a lot of grief. Even issuing synchronous I/O should be avoided, if possible.
Also, NEVER wait in your POWER IRP handler, or touch paged memory (or issue registry operations). Even though the handler is running on PASSIVE_LEVEL.
Staged or not, with work items you may not know which queue they will go to. They may land in your queue even if other queues are available. Adding worker threads is not a solution; it's just a hack.
[Yes. I also tend to be idealistic and very disciplined when developing code. I am sure that there are developers who have accidentally painted themselves into a corner by doing things like this. For those developers, awareness is the key to change.]
[Ed: Certainly no one is saying that the driver can't be improved. However, this configuration change can provide immediate relief to a customer suffering with a server that frequently hangs.]