Tracking Down a Multi-Process Deadlock

Tracking Down a Multi-Process Deadlock

  • Comments 9

 

Hello, my name is Louis Shanks and my story starts off with a tricky little deadlock issue that I worked on recently. Let me set the stage by explaining the surface level symptom, and then we will get to the good stuff. Randomly when users running Vista would attempt to launch Internet Explorer nothing would happen. So they would again double-click the icon, and still nothing happened. So the users would reboot and go back to their web activities.

 

After some basic trouble shooting it was discovered that Iexplore.exe was running multiple times and every attempt to open the browser would result in another Iexplore.exe instance.  Since no UI was ever presented I went ahead and collected a few Iexplore.exe process dumps along with a complete system dump.

 

 The process dumps didn’t prove to be terribly useful so I decided to move over to the complete memory dump.

 

Since I knew we had multiple IExplore.exe processes running I wanted to investigate them first.

 

0: kd>!process 0 0

 

<snip>

PROCESS 87289d90  SessionId: 5  Cid: 0fdc    Peb: 7ffdc000  ParentCid: 0968

    DirBase: 7d66c520  ObjectTable: a203a178  HandleCount: 205.

    Image: iexplore.exe

 

PROCESS 861a9d90  SessionId: 5  Cid: 0d34    Peb: 7ffd9000  ParentCid: 0fdc

    DirBase: 7d66c7a0  ObjectTable: a5651a98  HandleCount: 170.

    Image: ieuser.exe

 

PROCESS 86c175b0  SessionId: 5  Cid: 1250    Peb: 7ffd9000  ParentCid: 0968

    DirBase: 7d66c940  ObjectTable: a0853f10  HandleCount:  91.

    Image: iexplore.exe


PROCESS 861ac7a8  SessionId: 5  Cid: 1024    Peb: 7ffde000  ParentCid: 0968

    DirBase: 7d66c620  ObjectTable: a83f7898  HandleCount:  91.

Image: iexplore.exe
</snip>

I decided to just start with the first one in the list (87289d90)

 

0: kd> !process 87289d90
    Image: iexplore.exe

    ElapsedTime                       00:24:06.666

 

It appears this process has been around a while, so I wanted to look at the first thread in this process as it is responsible for the UI in Internet Explorer.  I am looking at this thread because the surface level problem illustrates that when we launch Iexplore.exe the UI is never displayed. Here I used the !thread command to display the specific thread.

 

        THREAD 870b6398  Cid 0fdc.16a4  Teb: 7ffdf000 Win32Thread: fd850b78 WAIT: (UserRequest) UserMode Non-Alertable

            85e86cf0  SynchronizationEvent

            85cb35e0  SynchronizationEvent

        Not impersonating

        Attached Process          87289d90       Image:         iexplore.exe

        Wait Start TickCount      10817197       Ticks: 69571 (0:00:18:05.314)

        UserTime                  00:00:00.015

        KernelTime                00:00:00.093  

 

Based on the tick count this thread has been waiting for 18 minutes.

 

0: kd> kvn

# ChildEBP RetAddr  Args to Child             

00 931208d8 81a940a1 870b6398 803d2120 870b6420 nt!KiSwapContext+0x26

01 9312091c 81a8dacc 870b6398 00000000 00000002 nt!KiSwapThread+0x44f

02 93120970 81bedb4f 00000002 93120aa8 00000001 nt!KeWaitForMultipleObjects+0x53d

03 93120bfc 81bee2e3 00000002 00000001 00000000 nt!ObpWaitForMultipleObjects+0x256

04 93120d48 81a30b3a 00000002 0027e9c4 00000001 nt!NtWaitForMultipleObjects+0xcc

05 93120d48 77568254 00000002 0027e9c4 00000001 nt!KiFastCallEntry+0x12a

06 0027e970 775679fc 7708c973 00000002 0027e9c4 ntdll!KiFastSystemCallRet

07 0027e974 7708c973 00000002 0027e9c4 00000001 ntdll!NtWaitForMultipleObjects+0xc

08 0027ea10 7608ff95 0027e9c4 0027ea38 00000000 kernel32!WaitForMultipleObjectsEx+0x11d

09 0027ea64 773d3a46 00000028 0027eaac ffffffff USER32!RealMsgWaitForMultipleObjectsEx+0x13c

0a 0027ea8c 773d3b7f 0027eaac ffffffff 0027eabc ole32!CCliModalLoop::BlockFn+0x97

0b 0027eab4 774e75be ffffffff 003ddf90 0027ebc0 ole32!ModalLoop+0x5b

0c 0027ead0 774e9294 00000000 0027ebd4 00000000 ole32!ThreadSendReceive+0x12c

0d 0027eaf8 774e9100 0027ebc0 003d3538 0027ec1c ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+0x194

0e 0027ebd8 773d3c25 003d3538 0027ecfc 0027ece4 ole32!CRpcChannelBuffer::SendReceive2+0xef

0f 0027ebf4 773d3bd7 0027ecfc 0027ece4 003d3538 ole32!CCliModalLoop::SendReceive+0x1e

10 0027ec6c 77403efb 003d3538 0027ecfc 0027ece4 ole32!CAptRpcChnl::SendReceive+0x73

11 0027ecc0 76052ff4 003d3538 0027ecfc 0027ece4 ole32!CCtxComChnl::SendReceive+0x1c5

12 0027ecd8 76053055 003e50a4 0027ede4 76052e0d RPCRT4!NdrProxySendReceive+0x43

13 0027ece4 76052e0d 09ea0d59 0027f1fc 070001f3 RPCRT4!NdrpProxySendReceive+0xc

 

Based on the call stack it appears this thread made an OLE call and is now waiting for some objects to signal.

 

What is IExplore doing in this thread?  It is waiting on another process to service this OLE request.   We can find the PID of the other process by digging into the ole32 SendReceive call.

 

I looked at the first argument:

 

0: kd> dd 003d3538

003d3538  77403a50 773f57b4 00000003 00000002

003d3548  00000000 00000000 003d2ef0 003d8840

003d3558  0038b1c0 003e05f8 77403980 00070005

 

This gives us a pointer to another data structure that contains the information we are attempting to locate.


0: kd> dd 003d2ef0

003d2ef0  774ee600 003d2e70 00000d34 00000000

003d2f00  862941fa c94cde09 55a88424 801a2601

003d2f10  55a88424 801a2601 0000c400 00000d34

 

Ah there it is! 0d34 is the process id servicing our OLE request.

 

I went back and reviewed the !process 0 0 output to locate the process containing PID 0D34. I’m dumping it here.

 

PROCESS 861a9d90  SessionId: 5  Cid: 0d34    Peb: 7ffd9000  ParentCid: 0fdc

    DirBase: 7d66c7a0  ObjectTable: a5651a98  HandleCount: 170.

    Image: ieuser.exe

 

Ok, so the process handing this request is IEUSER.exe, which is the broker process used by Internet Explorer in Vista when running in protect mode.  This is new in Vista.

 

More on IEUser and Protected mode can be seen at the following location:
http://msdn.microsoft.com/en-us/library/bb250462.aspx

 

Now I needed to dig into the IEUser.exe process to find out what it was working on causing Iexplore to wait.

 

0: kd> !process 861a9d90 

PROCESS 861a9d90  SessionId: 5  Cid: 0d34    Peb: 7ffd9000  ParentCid: 0fdc

    DirBase: 7d66c7a0  ObjectTable: a5651a98  HandleCount: 170.

    Image: ieuser.exe


I discovered the thread in the ieuser.exe process working on our call from Iexplore.exe. However this thread is also waiting on access to something before it can complete.


0: kd> kvn+200

 # ChildEBP RetAddr  Args to Child             

00 a41b0c50 81a940a1 85ecbac0 85ecbb48 81af9920 nt!KiSwapContext+0x26

01 a41b0c94 81a2ddc8 85ecbac0 00000000 85e18510 nt!KiSwapThread+0x44f

02 a41b0ce8 81bf0d9f 85e18510 00000006 00000001 nt!KeWaitForSingleObject+0x492

03 a41b0d50 81a30b3a 00000224 00000000 00000000 nt!NtWaitForSingleObject+0xbe

04 a41b0d50 77568254 00000224 00000000 00000000 nt!KiFastCallEntry+0x12a

05 0163e8a4 77567a0c 7708c7c4 00000224 00000000 ntdll!KiFastSystemCallRet

06 0163e8a8 7708c7c4 00000224 00000000 00000000 ntdll!NtWaitForSingleObject+0xc

07 0163e918 7708c732 00000224 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xbe

08 0163e92c 747e60aa 00000224 ffffffff 00000000 kernel32!WaitForSingleObject+0x12

09 0163e948 747e5fe0 0163ec48 00497b38 00000000 RASAPI32!ReadPhonebookFileEx+0x1a4

0a 0163e968 747e4e9d 0163ec48 00000000 00000000 RASAPI32!ReadPhonebookFile+0x1b

0b 0163e9a8 747e52d3 0163ec48 0048f660 0163e9e0 RASAPI32!DwEnumEntriesFromPhonebook+0x35

0c 0163ee58 747e4fd4 0163ee9c 00000001 0048f660 RASAPI32!DwEnumEntriesInDir+0x1cc

0d 0163f0ac 747e677e 00000001 0048f660 0163f0f4 RASAPI32!DwEnumEntriesForPbkMode+0xab

0e 0163f0e0 76e534b9 00000000 00004180 0048f660 RASAPI32!RasEnumEntriesW+0xb0

0f 0163f10c 76e5342a 76ecc2dc 00000000 00000000 WININET!RasEnumHelp::RasEnumHelp+0x55

10 0163f120 76e533a3 00000001 00000000 00000001 WININET!DoConnectoidsExist+0x2b

11 0163f14c 76e534ff 76ecf02c 76ecc2dc 00000000 WININET!GetRasConnections+0x34

12 0163f168 76e4196e 00000000 0163f180 6ee3228c WININET!IsDialUpConnection+0xa9

13 0163f184 76e70564 00000001 00000000 00000000 WININET!FixProxySettingsForCurrentConnection+0x31

14 0163f1a8 6ec2db3b 00000000 0163f1bc 0163f3d4 WININET!InternetInitializeAutoProxyDll+0x6c

15 0163f1b8 75fe9142 004884a0 00000202 0047e900 ieframe!CShdocvwBroker::StartAutoProxyDetection+0x4d

16 0163f1d0 76054294 6ec2daf8 0163f3d8 00000001 RPCRT4!Invoke+0x2a

17 0163f5fc 76054a49 004858d0 0046e4e0 0047e900 RPCRT4!NdrStubCall2+0x27b

18 0163f64c 774e86d7 004858d0 0047e900 0046e4e0 RPCRT4!CStdStubBuffer_Invoke+0xa0

19 0163f694 774e877d 0047e900 0047f968 00480b38 ole32!SyncStubInvoke+0x3c

1a 0163f6e0 7740249e 0047e900 0047f860 004858d0 ole32!StubInvoke+0xb9

1b 0163f7bc 774023af 0046e4e0 00000000 004858d0 ole32!CCtxComChnl::ContextInvoke+0xfa

1c 0163f7d8 774e85df 0047e900 00000001 004858d0 ole32!MTAInvoke+0x1a

1d 0163f808 774e8a18 d0908070 0046e4e0 004858d0 ole32!AppInvoke+0xaa

1e 0163f8e4 774e85ef 0047e8a8 00474118 0047ebc0 ole32!ComInvokeWithLockAndIPID+0x32c

1f 0163f930 75fe94b0 0047ebc0 08ac70b7 0047ebc0 ole32!ThreadInvoke+0x2fd

 

I’m dumping the first parameter passed to KeWaitForSingleObject to determine the object type.

 

0: kd> !object 85e18510

Object: 85e18510  Type: (85484d40) Mutant

    ObjectHeader: 85e184f8 (old version)

    HandleCount: 3  PointerCount: 5

    Directory Object: 881f3030  Name: RasPbFile


Ok, we are waiting on a Mutex named RasPbFile. The big question in your mind now is, “who owns this lock”, right?

To determine this I’ll dump the structure for the mutex object:

 

0: kd> dt _KMUTANT 85e18510

nt!_KMUTANT

   +0x000 Header           : _DISPATCHER_HEADER

   +0x010 MutantListEntry  : _LIST_ENTRY [ 0x870b6564 - 0x870b6564 ]

   +0x018 OwnerThread      : 0x870b6398 _KTHREAD

   +0x01c Abandoned        : 0 ''

   +0x01d ApcDisable       : 0 ''

 

The owner of the mutex is 870b6398. The bad news is this is the same thread I started with in the Iexplore.exe process so we’re looking at a classic deadlock scenario. Thread 870b6398 running under Iexplore.exe has made an OLE call over to the IEuser.exe broker process. The IEuser process needs the mutex owned by the thread that called it in the first place.

This turned out to be a bug in the RasApi code, not the browser or IEuser process.  The RasAPI code was written in such a way that it would wait for an infinite period of time to gain the Mutex.

 

0163e92c 747e60aa 00000224 ffffffff 00000000 kernel32!WaitForSingleObject+0x12

 

 The good news is that we have created hotfix 951738 to address this issue.  The official KB article is still pending.

Leave a Comment
  • Please add 5 and 7 and type the answer here:
  • Post
  • The most interesting part is missing :) Would be nice to have a declaration of SendReceive (which is, I believe, undocumented) and the content of the structure which was used by you to obtain the PID.

  • When "digging into the ole32 SendReceive call" you chose those addresses to dump because you knew from the source code the parameters passed to that function?

    [You are correct; we knew how to find the process ID because the source code gave us the needed details. However, the technique itself can still be useful without source code, although we cannot guarantee that the function parameter will remain the same in other versions of ole32.dll.]
  • The list is a little longer today because of not posting last week. Enjoy! Microsoft Advanced Windows

  • To Lucian: given the stack trace of IE, I would start to look at ole32!CCliModalLoop::BlockFn source code to obtain the name of the object which is going to be signaled.

    Then search for this mutex name over all sources and, voila, you know the reason for deadlock.

  • There is a simpler method for RPC calls (illustrated from another dump) to use dpp and search for small numbers:

    00faf7e4 77c7fcf0 00133620 00faf80c 7778d073 RPCRT4!LRPC_CCALL::SendReceive+0x230 (FPO: [Non-Fpo])

    1: kd> dpp 00133620

    00133620  072bb008 77c7e5d8 RPCRT4!LRPC_CCALL::`vftable'

    00133624  00000010

    00133628  0738dd90 00070005

    0013362c  00000038

    00133630  00000005

    00133634  0771be9c 8a885d04

    00133638  072e1050 00000044

    0013363c  00080838 067e15d8

    00133640  00fafc48 7c829f59 ntdll!RtlFreeHeap+0x70f

    00133644  00fafa34 00000000

    00133648  00000000

    0013364c  faedcf5c

    00133650  11d131fe

    00133654  8000d2aa 00000000

    00133658  0e27c15f

    0013365c  00000030

    00133660  07a2abb0

    00133664  4007b10e

    00133668  66997ba0

    0013366c  50f338ee

    00133670  00003108

    00133674  00000005

    00133678  00000000

    0013367c  00000000

    00133680  ffffffff

    00133684  01a1db7f

    00133688  00000000

    0013368c  00000000

    00133690  00000002

    00133694  10010000 00000008

    00133698  00000a4c

    0013369c  000003f4

    Here is our PID:

    00133670  00003108

  • Actually my previous example is a bit different where we have RPC over LPC and !lpc command can do this. In your case I don't see the same simple way.

  • >> hotfix 951738 to address this issue.  The official KB article is still pending. <<

    Looks like the KB article is now available:

    "On a computer that is running Windows Vista or Windows Server 2008 and that has UAC and Internet Explorer 7 protected mode enabled, Internet Explorer 7 may stop responding when you try to visit any Web site" @ http://support.microsoft.com/kb/951738

  • How can you know the 003d2ef0 is pointing to another data structure?  You have the SendReceive codes and the data structures?

    Also, how can you know "the thread in the ieuser.exe process working on our call from Iexplore.exe" from the call stack?

    For us without the windows source codes, it is good if you can explain more about the function and the structure.

    Anyway, you still tell us some undocumented way to check the PID and what some function parameters are.

    Thanks

    [Please see our response here. Thanks.]
  • Used this today on a user dump and found the PID of the other process handling the request! Saved our customer from collecting a full system dump.

    --Kappa.

Page 1 of 1 (9 items)