One of the most common issues I deal with are related to blocking. Some of this are pretty straight forward to identify, Others not that easy, especially if you have a process with many threads. When I have to troubleshoot blocking on WaitHandles (e.g Mutex) I usually find the need to use both user mode and kernel mode debugging to identify the guilty thread.

For demonstration purposes I’ve built a simple sample where a process appears hung.

 

The first thing to do is to attach the debugger to the process. My sample was built with CLR 4.0 so no more mscorwks (now it´s called clr.dll). Let’s start by loading the SOS extension.

 

0:007> .loadby sos clr

 

After looking at the threads stack with !clrstack we can see a thread blocked on WaitOne.

 

(managed stack)

0:007> !clrstack

OS Thread Id: 0x1b18 (7)

Child SP IP       Call Site

04fbf4f4 76fc00fd [HelperMethodFrame_1OBJ: 04fbf4f4] System.Threading.WaitHandle.WaitOneNative(System.Runtime.InteropServices.SafeHandle, UInt32, Boolean, Boolean)

04fbf59c 6a4b6eff System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle, Int64, Boolean, Boolean)

04fbf5b8 6a4b6ebd System.Threading.WaitHandle.WaitOne(Int32, Boolean)

04fbf5d0 6a4b6e80 System.Threading.WaitHandle.WaitOne()

04fbf5d8 002006c6 BlockingQueueWithManualResetEvents`1[[System.__Canon, mscorlib]].Enqueue(System.__Canon)

04fbf610 00200666 ThreadsManaged.Program+<>c__DisplayClass4.<Main>b__3()

04fbf61c 6a4d0bea System.Threading.ThreadHelper.ThreadStart_Context(System.Object)

04fbf628 6a500b6c System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)

04fbf64c 6a500aac System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)

04fbf668 6a4d0b74 System.Threading.ThreadHelper.ThreadStart()

04fbf88c 6b1d1f02 [GCFrame: 04fbf88c]

04fbfb50 6b1d1f02 [DebuggerU2MCatchHandlerFrame: 04fbfb50]

 

(native stack)

0:007> knL30

# ChildEBP RetAddr 

00 0508ef74 75c90962 ntdll!ZwWaitForMultipleObjects+0x15

01 0508f010 7607162d KERNELBASE!WaitForMultipleObjectsEx+0x100

02 0508f058 665cd0b6 KERNEL32!WaitForMultipleObjectsExImplementation+0xe0

03 0508f0bc 665cce8f clr!WaitForMultipleObjectsEx_SO_TOLERANT+0x56

04 0508f0dc 665ccd11 clr!Thread::DoAppropriateAptStateWait+0x4d

05 0508f170 665ccdd2 clr!Thread::DoAppropriateWaitWorker+0x17d

06 0508f1dc 6661a384 clr!Thread::DoAppropriateWait+0x60

07 0508f2dc 658f6eff clr!WaitHandleNative::CorWaitOneNative+0x196

 

My next step is to look at objects on this thread stack. We can see on top a Mutex sync object. Let’s dig into that one!!

 

0:007> !dso

OS Thread Id: 0x1b18 (7)

ESP/REG  Object   Name

04FBF53C 0230c0bc Microsoft.Win32.SafeHandles.SafeFileHandle

04FBF574 0230bc8c Microsoft.Win32.SafeHandles.SafeWaitHandle

04FBF59C 0230bc08 System.Threading.Mutex

04FBF5BC 0230bf0c System.Threading.ThreadStart

(…)

 

Below are the details on this Mutex object. Here we are interested on the waitHandle

 

0:007> !do 0230bc08

Name:        System.Threading.Mutex

MethodTable: 6a5643c4

EEClass:     6a337fb0

Size:        24(0x18) bytes

File:        C:\Windows\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll

Fields:

      MT    Field   Offset                 Type VT     Attr    Value Name

6a58bcb8  40001cd        4        System.Object  0 instance 00000000 __identity

6a5642b4  4000690        c        System.IntPtr  1 instance      1b8 waitHandle

6a5657c0  4000691        8 ...es.SafeWaitHandle  0 instance 0230bc8c safeWaitHandle

6a59168c  4000692       10       System.Boolean  1 instance        1 hasThreadAffinity

6a5642b4  4000693      b74        System.IntPtr  1   shared   static InvalidHandle

    >> Domain:Value  008056b8:ffffffff <<

6a59168c  40006dd      b80       System.Boolean  1   shared   static dummyBool

    >> Domain:Value  008056b8:1 <<

6a5643c4  40006de      180 ...m.Threading.Mutex  0   shared   static s_ReservedMutex

    >> Domain:Value  008056b8:00000000 <<

 

!handle is the command to see details on an existing handle. For more information on !handle please check the debugging tools documentation.

 

0:007> !handle 1b8 f

Handle 1b8

  Type                   Mutant

  Attributes          0

  GrantedAccess 0x1f0001:

         Delete,ReadControl,WriteDac,WriteOwner,Synch

         QueryState

  HandleCount    2

  PointerCount   4

  Name                 <none>

  Object Specific Information

    Mutex is Owned

 

We can see clearly that this mutex is owned by someone but it doesn´t state who is the owner.

Since we are dealing with kernel objects, some information is kept on internal kernel structures. In order to access this information we can use the kernel debugger  KD.EXE (it comes with the same package as Windbg.exe). As a side note please be aware that when using this what you are seeing is a snapshot and therefore it might sometimes not be enough to troubleshoot the issue.

 

Start by opening a command prompt on the debugger directory and issue kd /kL (kL is the option for local kernel debugging). Here you might face a “little” issue because if you haven´t enabled local kernel debugging you will have to do it now and that requires a reboot  (causing you to lose the opportunity to dig further on the issue in your application and having to wait for it to occur again)

 

 

C:\Debuggers\Debugging Tools for Windows (x64)>kd /kL

 

Microsoft (R) Windows Debugger Version 6.12.0001.577 AMD64

Copyright (c) Microsoft Corporation. All rights reserved.

 

Local kernel debugging is disabled by default since Windows Vista, you must run "bcdedit -debug on" and reboot to enable it.

Debuggee initialization failed, HRESULT 0x80004001

    "Not implemented"

 

C:\Debuggers\Debugging Tools for Windows (x64)>bcdedit  -debug on

The operation completed successfully.

 

o   After enabling debug I can start the debugger on kernel mode (it´s a bit ugly since it´s in the command prompt window, no pretty GUI. Also ensure that your user mode session is running and not stopped on the debugger). My first step would be to configure symbols if they aren´t already configured (for more information on configuring symbols please check www.microsoft.com/whdc/devtools/debugging/debugstart.mspx ).

Additionaly I always like to set !sym noisy so that I can see everything the debugger is doing when loading symbols.

 

lkd> .sympath SRV*c:\symbols*http://msdl.microsoft.com/download/symbols

Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols

Expanded Symbol search path is: srv*c:\symbols*http://msdl.microsoft.com/download/symbols

lkd> !sym noisy

noisy mode - symbol prompts on

 

Our first step in kernel mode is to find our process (in my case the process is called ThreadsManaged.exe) . To do that issue the command below (in yellow). As a result we are interested the process address (in green)

 

 

lkd> !process 0 0 ThreadsManaged.exe

PROCESS fffffa8009a3c7e0

    SessionId: 1  Cid: 006c    Peb: 7efdf000  ParentCid: 073c

    DirBase: 15849000  ObjectTable: fffff8a012fa3410  HandleCount: 138.

    Image: ThreadsManaged.exe

 

Now with the process address let’s see a bit detail about threads on this process. Our next step will require some cross information between our user mode session and kernel session.

 

lkd> !process fffffa8009a3c7e0 2

PROCESS fffffa8009a3c7e0

    SessionId: 1  Cid: 006c    Peb: 7efdf000  ParentCid: 073c

    DirBase: 15849000  ObjectTable: fffff8a012fa3410  HandleCount: 140.

    Image: ThreadsManaged.exe

 

        THREAD fffffa8009162b60  Cid 006c.1974  Teb: 000000007efdb000 Win32Thread: fffff900c2d54990 WAIT: (WrLpcReply) UserMode Non-Alertable

            fffffa8009162f20  Semaphore Limit 0x1

 

        THREAD fffffa800b7c3b60  Cid 006c.1b20  Teb: 000000007efd8000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable

            fffffa8009980c10  SynchronizationEvent

            fffffa8009a50210  SynchronizationEvent

            fffffa80077dfce0  SynchronizationEvent

 

        THREAD fffffa800b874b60  Cid 006c.04f4  Teb: 000000007efd5000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable

            fffffa8006d6cb20  NotificationEvent

            fffffa800d6afb90  SynchronizationEvent

            fffffa800737c9a0  SynchronizationEvent

 

        THREAD fffffa8007d83760  Cid 006c.1b3c  Teb: 000000007efad000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Alertable

            fffffa800b780060  SynchronizationTimer

            fffffa8008d5b060  SynchronizationTimer

            fffffa800d6eec20  SynchronizationTimer

 

        THREAD fffffa800dd75060  Cid 006c.1b34  Teb: 000000007efaa000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Alertable

            fffffa8007779fe0  NotificationEvent

 

        THREAD fffffa8007066b60  Cid 006c.0d10  Teb: 000000007efa7000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Alertable

            fffffa8007779fe0  NotificationEvent

 

        THREAD fffffa800d51a060  Cid 006c.1894  Teb: 000000007efa4000 Win32Thread: 0000000000000000 WAIT: (DelayExecution) UserMode Alertable

            fffffa800d51a338  Semaphore Limit 0x2

 

        THREAD fffffa800dd68060  Cid 006c.1b18  Teb: 000000007efa1000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Alertable

            fffffa800a1057d0  Mutant - owning thread fffffa800d51a060

 

        THREAD fffffa8007932b60  Cid 006c.0e68  Teb: 000000007ef9e000 Win32Thread: 0000000000000000 WAIT: (WrQueue) UserMode Alertable

            fffffa800a112aa0  QueueObject

 

        THREAD fffffa8009a1e060  Cid 006c.1180  Teb: 000000007ef9b000 Win32Thread: 0000000000000000 WAIT: (WrQueue) UserMode Alertable

            fffffa800d6f4060  QueueObject

 

        THREAD fffffa8007ab2060  Cid 006c.02e4  Teb: 000000007ef98000 Win32Thread: 0000000000000000 WAIT: (WrQueue) UserMode Alertable

            fffffa800d6f4060  QueueObject

 

So looking above we see that thread with cid (<pid>.<tid>) 006c.1b18 (in yellow) shows a Mutant (a.k.a Mutex) and the owner of this object is thread at address fffffa800d51a060 (in green). This thread as a Cid 006c.1894.

 

Back to our user mode session, break again in the debugger and list existing threads. Now just search for those cid in the output (below)

 

0:007> ~

   0  Id: 006c. 1974  Suspend: 1 Teb: 7efdb000 Unfrozen

   1  Id: 006c. 1b20  Suspend: 1 Teb: 7efd8000 Unfrozen

   2  Id: 006c. 04f4  Suspend: 1 Teb: 7efd5000 Unfrozen

   3  Id: 006c. 1b3c  Suspend: 1 Teb: 7efad000 Unfrozen

   4  Id: 006c. 1b34  Suspend: 1 Teb: 7efaa000 Unfrozen

   5  Id: 006c. 0d10  Suspend: 1 Teb: 7efa7000 Unfrozen

   6  Id: 006c. 1894  Suspend: 1 Teb: 7efa4000 Unfrozen

.  7  Id: 006c. 1b18  Suspend: 1 Teb: 7efa1000 Unfrozen

(…)

 

So remember from the beginning that our thread waiting was thread 7. Cross the information you got from the kernel session and you see that the owner thread on this mutex is thread 6. Below is the managed stack from thread 6 (the one responsible for the block).

 

OS Thread Id: 0x1894 (6)

Child SP IP       Call Site

0511f638 76fbfd31 [HelperMethodFrame: 0511f638] System.Threading.Thread.SleepInternal(Int32)

0511f684 6a5341b9 System.Threading.Thread.Sleep(Int32)

0511f688 002006e3 BlockingQueueWithManualResetEvents`1[[System.__Canon, mscorlib]].Enqueue(System.__Canon)

0511f6c0 00200616 ThreadsManaged.Program+<>c__DisplayClass4.<Main>b__2()

0511f6cc 6a4d0bea System.Threading.ThreadHelper.ThreadStart_Context(System.Object)

0511f6d8 6a500b6c System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)

0511f6fc 6a500aac System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)

0511f718 6a4d0b74 System.Threading.ThreadHelper.ThreadStart()

0511f93c 6b1d1f02 [GCFrame: 0511f93c]

0511fc00 6b1d1f02 [DebuggerU2MCatchHandlerFrame: 0511fc00]

 

It´s not often the we have the need to look both in kernel and user mode and the case above was a simple example where probably a user mode debugging session along with some code review would be enough to find the thread owning the kernel object. But sometimes this is not the case and it comes in hand to know what more options and tools we have to help us.

 

Have fun!!

Bruno