• Ntdebugging Blog

    Debugging a Hang at “Applying Computer Settings”

    • 2 Comments

    Hi Everyone - My name is Aman. I'm an Escalation Engineer on the Microsoft GES (Global Escalation Services) team. We recently came across a critical situation where-in the following issues were reported by the customer:

     

    1.       DC login hangs at “applying computer settings” regardless of RDP or local console.

    2.       Connecting to the server using computer management console fails with "RPC Server is too busy to complete the operation"

    3.       Server responds to ping in the problem state but cannot be accessed through ports 445/139.

    4.       Reboot will fix the issue for a couple of days but the problem will return again.

     

    The case had already been troubleshot many times in the past without any results. So, to get to the root cause, it was escalated to GES and we took a full memory dump. In this blog I want to show you how to navigate between processes via LPC messages to drill down to root cause.

     

    Disclaimer – Not every hang at “Applying Computer Settings” requires a debug. Standard troubleshooting typically resolves this type of thing. 

     

    Now let's have a look at the Complete Memory Dump (note: a Kernel dump is not enough). Since the console session hangs at "applying computer settings", the target starting point of my debug will be the process called winlogon. Winlogon.exe is the executable responsible for initiating the logon process for users. If the console session is hung at "applying computer settings", winlogon has to have an explanation as to why it's unable to process the user logon successfully.

     

    So, let’s get the address of the winlogon process:

     

    kd> ! process 0 0 winlogon.exe

    PROCESS 85c63830  SessionId: 0  Cid: 01a4    Peb: 7ffdf000  ParentCid: 0158

        DirBase: 39f6a000  ObjectTable: e13d7928  HandleCount: 535.

        Image: winlogon.exe

     

    PROCESS 84f02020  SessionId: 1  Cid: 0344    Peb: 7ffdf000  ParentCid: 0158

        DirBase: 2d675000  ObjectTable: e35abc78  HandleCount: 313.

        Image: winlogon.exe

     

     

    So, why we have two winlogon processes executing? Simple, the customer tried to log into the domain controller via RDP and local console. Both methods resulted in the hang, hence creating two winlogon sessions. Let’s check the first one and see what's happening in there. I picked up only questionable threads for the sake of this session.

     

     

    kd> .process /p /r 85c63830

    Implicit process is now 85c63830

    Loading User Symbols

    ................................................................

    .........

    kd> !process 85c63830 7

    PROCESS 85c63830  SessionId: 0  Cid: 01a4    Peb: 7ffdf000  ParentCid: 0158

        DirBase: 39f6a000  ObjectTable: e13d7928  HandleCount: 535.

        Image: winlogon.exe

        VadRoot 85c599c8 Vads 157 Clone 0 Private 1266. Modified 1081. Locked 0.

        DeviceMap e1001150

        Token                             e1743a38

        ElapsedTime                       15:50:33.974

        UserTime                          00:00:00.187

        KernelTime                        00:00:02.125

        QuotaPoolUsage[PagedPool]         53444

        QuotaPoolUsage[NonPagedPool]      63864

        Working Set Sizes (now,min,max)  (1161, 50, 345) (4644KB, 200KB, 1380KB)

        PeakWorkingSetSize                2834

        VirtualSize                       49 Mb

        PeakVirtualSize                   52 Mb

        PageFaultCount                    10861

        MemoryPriority                    BACKGROUND

        BasePriority                      13

        CommitCharge                      1979

     

     

     

            THREAD 847318d0  Cid 01a4.58e4  Teb: 7ffdd000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable

                84731abc  Semaphore Limit 0x1

            Waiting for reply to LPC MessageId 00213bd7:

            Current LPC port e125d4e0

            Not impersonating

            DeviceMap                 e1001150

            Owning Process            85c63830       Image:         winlogon.exe

            Attached Process          N/A            Image:         N/A

            Wait Start TickCount      3650157        Ticks: 525 (0:00:00:08.203)

            Context Switch Count      2            

            UserTime                  00:00:00.000

            KernelTime                00:00:00.000

            Win32 Start Address winlogon!WlpExecuteNotify (0x010390b7)

            Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

            Stack Init f5c98000 Current f5c97c20 Base f5c98000 Limit f5c95000 Call 0

            Priority 14 BasePriority 13 PriorityDecrement 0

            ChildEBP RetAddr  Args to Child             

            f5c97c38 808202b6 847318d0 84731978 00000000 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])

            f5c97c50 8081fb6e 84731abc 84731a90 847318d0 nt!KiSwapThread+0x83 (FPO: [0,2,0])

            f5c97c94 80915355 84731abc 00000011 f5c97d01 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])

            f5c97d50 8082350b 0000080c 00087ba0 00087ba0 nt!NtRequestWaitReplyPort+0x776 (FPO: [SEH])

            f5c97d50 7c8285ec 0000080c 00087ba0 00087ba0 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ f5c97d64)

            0075fab0 7c82787b 77c80a6e 0000080c 00087ba0 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

            0075fab4 77c80a6e 0000080c 00087ba0 00087ba0 ntdll!NtRequestWaitReplyPort+0xc (FPO: [3,0,0])

            0075fb00 77c7fcf0 0075fb3c 0075fb20 77c80673 RPCRT4!LRPC_CCALL::SendReceive+0x230 (FPO: [1,11,0])

            0075fb0c 77c80673 0075fb3c 75822c38 0075ff28 RPCRT4!I_RpcSendReceive+0x24 (FPO: [1,0,0])

            0075fb20 77ce315a 0075fb68 00087c18 001125a8 RPCRT4!NdrSendReceive+0x2b (FPO: [2,0,0])

            0075ff08 75822b74 75822c38 75822b8a 0075ff28 RPCRT4!NdrClientCall2+0x22e (FPO: [SEH])

            0075ff20 75822b4f 00b81460 0075ff3c 0075ff60 WlNotify!SensApip_RPC_SensNotifyWinlogonEvent+0x1c (FPO: [2,1,0])

            0075ff30 75824e7a 0075ff3c 00100000 00000024 WlNotify!SensNotifyWinlogonEvent+0x1c (FPO: [1,0,0])

            0075ff60 0103917b 0075ff78 00000000 00000000 WlNotify!SensStopScreenSaverEvent+0x58 (FPO: [1,9,0])

            0075ffb8 77e64829 001125a8 00000000 00000000 winlogon!WlpExecuteNotify+0xc4 (FPO: [SEH])

            0075ffec 00000000 010390b7 001125a8 00000000 kernel32!BaseThreadStart+0x34 (FPO: [SEH])

     

    We found somewhere around 15 threads in the process above. Only the above thread looked suspicious since it has issued a LPC call and hasn't received a reply yet. Before we investigate further, let’s check the other winlogon process and find out about the threads in there as well....just in case if there is a commonality or not amongst both the processes...

     

    Note the long tick counts in these threads!

     

     

    kd> .process /p /r 84f02020

    Implicit process is now 84f02020

    Loading User Symbols

    ................................................................

    ................

    kd> !process 84f02020 7

    PROCESS 84f02020  SessionId: 1  Cid: 0344    Peb: 7ffdf000  ParentCid: 0158

        DirBase: 2d675000  ObjectTable: e35abc78  HandleCount: 313.

        Image: winlogon.exe

        VadRoot 85e86dd0 Vads 184 Clone 0 Private 656. Modified 722. Locked 0.

        DeviceMap e1001150

        Token                             e3a09030

        ElapsedTime                       04:51:33.657

        UserTime                          00:00:00.140

        KernelTime                        00:00:00.796

        QuotaPoolUsage[PagedPool]         49464

        QuotaPoolUsage[NonPagedPool]      10080

        Working Set Sizes (now,min,max)  (1099, 50, 345) (4396KB, 200KB, 1380KB)

        PeakWorkingSetSize                2193

        VirtualSize                       48 Mb

        PeakVirtualSize                   50 Mb

        PageFaultCount                    4312

        MemoryPriority                    BACKGROUND

        BasePriority                      13

        CommitCharge                      1329

     

            THREAD 847a93f0  Cid 0344.4ab0  Teb: 7ffd7000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable

                847a95dc  Semaphore Limit 0x1

            Waiting for reply to LPC MessageId 0020b318:

            Current LPC port e3761290

            Impersonation token:  e3a92250 (Level Impersonation)

            DeviceMap                 e1bca430

            Owning Process            84f02020       Image:         winlogon.exe

            Attached Process          N/A            Image:         N/A

            Wait Start TickCount      3500419        Ticks: 150263 (0:00:39:07.859)

            Context Switch Count      1            

            UserTime                  00:00:00.000

            KernelTime                00:00:00.000

            Win32 Start Address winlogon!WlpExecuteNotify (0x010390b7)

            Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

            Stack Init b9d00000 Current b9cffc20 Base b9d00000 Limit b9cfd000 Call 0

            Priority 13 BasePriority 13 PriorityDecrement 0

            Kernel stack not resident.

            ChildEBP RetAddr  Args to Child             

            b9cffc38 808202b6 847a93f0 847a9498 00000310 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])

            b9cffc50 8081fb6e 847a95dc 847a95b0 847a93f0 nt!KiSwapThread+0x83 (FPO: [0,2,0])

            b9cffc94 80915355 847a95dc 00000011 00000001 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])

            b9cffd50 8082350b 00000364 000b9760 000b9760 nt!NtRequestWaitReplyPort+0x776 (FPO: [SEH])

            b9cffd50 7c8285ec 00000364 000b9760 000b9760 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ b9cffd64)

            0087fab0 7c82787b 77c80a6e 00000364 000b9760 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

            0087fab4 77c80a6e 00000364 000b9760 000b9760 ntdll!NtRequestWaitReplyPort+0xc (FPO: [3,0,0])

            0087fb00 77c7fcf0 0087fb3c 0087fb20 77c80673 RPCRT4!LRPC_CCALL::SendReceive+0x230 (FPO: [1,11,0])

            0087fb0c 77c80673 0087fb3c 75822c38 0087ff28 RPCRT4!I_RpcSendReceive+0x24 (FPO: [1,0,0])

            0087fb20 77ce315a 0087fb68 000b9820 000a8bd8 RPCRT4!NdrSendReceive+0x2b (FPO: [2,0,0])

            0087ff08 75822b74 75822c38 75822b8a 0087ff28 RPCRT4!NdrClientCall2+0x22e (FPO: [SEH])

            0087ff20 75822b4f 000c1e30 0087ff3c 0087ff60 WlNotify!SensApip_RPC_SensNotifyWinlogonEvent+0x1c (FPO: [2,1,0])

            0087ff30 75824e19 0087ff3c 00080000 00000024 WlNotify!SensNotifyWinlogonEvent+0x1c (FPO: [1,0,0])

            0087ff60 0103917b 0087ff78 00000000 00000000 WlNotify!SensStartScreenSaverEvent+0x58 (FPO: [1,9,0])

            0087ffb8 77e64829 000a8bd8 00000000 00000000 winlogon!WlpExecuteNotify+0xc4 (FPO: [SEH])

            0087ffec 00000000 010390b7 000a8bd8 00000000 kernel32!BaseThreadStart+0x34 (FPO: [SEH])

     

            THREAD 847ad8d0  Cid 0344.4c7c  Teb: 7ffac000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable

                847adabc  Semaphore Limit 0x1

            Waiting for reply to LPC MessageId 0020bf59:

            Current LPC port e3761290

            Impersonation token:  e39b1358 (Level Impersonation)

            DeviceMap                 e1bca430

            Owning Process            84f02020       Image:         winlogon.exe

            Attached Process          N/A            Image:         N/A

            Wait Start TickCount      3513766        Ticks: 136916 (0:00:35:39.312)

            Context Switch Count      1            

            UserTime                  00:00:00.000

            KernelTime                00:00:00.000

            Win32 Start Address winlogon!WlpExecuteNotify (0x010390b7)

            Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

            Stack Init b9dc0000 Current b9dbfc20 Base b9dc0000 Limit b9dbd000 Call 0

            Priority 13 BasePriority 13 PriorityDecrement 0

            Kernel stack not resident.

            ChildEBP RetAddr  Args to Child             

            b9dbfc38 808202b6 847ad8d0 847ad978 00000341 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])

            b9dbfc50 8081fb6e 847adabc 847ada90 847ad8d0 nt!KiSwapThread+0x83 (FPO: [0,2,0])

            b9dbfc94 80915355 847adabc 00000011 b9dbfd01 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])

            b9dbfd50 8082350b 00000364 000ba340 000ba340 nt!NtRequestWaitReplyPort+0x776 (FPO: [SEH])

            b9dbfd50 7c8285ec 00000364 000ba340 000ba340 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ b9dbfd64)

            00b7fab0 7c82787b 77c80a6e 00000364 000ba340 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

            00b7fab4 77c80a6e 00000364 000ba340 000ba340 ntdll!NtRequestWaitReplyPort+0xc (FPO: [3,0,0])

            00b7fb00 77c7fcf0 00b7fb3c 00b7fb20 77c80673 RPCRT4!LRPC_CCALL::SendReceive+0x230 (FPO: [1,11,0])

            00b7fb0c 77c80673 00b7fb3c 75822c38 00b7ff28 RPCRT4!I_RpcSendReceive+0x24 (FPO: [1,0,0])

            00b7fb20 77ce315a 00b7fb68 000ba400 000a8b80 RPCRT4!NdrSendReceive+0x2b (FPO: [2,0,0])

            00b7ff08 75822b74 75822c38 75822b8a 00b7ff28 RPCRT4!NdrClientCall2+0x22e (FPO: [SEH])

            00b7ff20 75822b4f 000c1e30 00b7ff3c 00b7ff60 WlNotify!SensApip_RPC_SensNotifyWinlogonEvent+0x1c (FPO: [2,1,0])

            00b7ff30 75824e7a 00b7ff3c 00100000 00000024 WlNotify!SensNotifyWinlogonEvent+0x1c (FPO: [1,0,0])

            00b7ff60 0103917b 00b7ff78 00000000 00000000 WlNotify!SensStopScreenSaverEvent+0x58 (FPO: [1,9,0])

            00b7ffb8 77e64829 000a8b80 00000000 00000000 winlogon!WlpExecuteNotify+0xc4 (FPO: [SEH])

            00b7ffec 00000000 010390b7 000a8b80 00000000 kernel32!BaseThreadStart+0x34 (FPO: [SEH])

     

            THREAD 84989b40  Cid 0344.4c88  Teb: 7ffaa000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable

                84989d2c  Semaphore Limit 0x1

            Waiting for reply to LPC MessageId 0020bf60:

            Current LPC port e3761290

            Impersonation token:  e3a48030 (Level Impersonation)

            DeviceMap                 e1bca430

            Owning Process            84f02020       Image:         winlogon.exe

            Attached Process          N/A            Image:         N/A

            Wait Start TickCount      3513766        Ticks: 136916 (0:00:35:39.312)

            Context Switch Count      1            

            UserTime                  00:00:00.000

            KernelTime                00:00:00.000

            Win32 Start Address winlogon!WlpExecuteNotify (0x010390b7)

            Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

            Stack Init b9c44000 Current b9c43c20 Base b9c44000 Limit b9c41000 Call 0

            Priority 13 BasePriority 13 PriorityDecrement 0

            Kernel stack not resident.

            ChildEBP RetAddr  Args to Child             

            b9c43c38 808202b6 84989b40 84989be8 00000741 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])

            b9c43c50 8081fb6e 84989d2c 84989d00 84989b40 nt!KiSwapThread+0x83 (FPO: [0,2,0])

            b9c43c94 80915355 84989d2c 00000011 b9c43d01 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])

            b9c43d50 8082350b 00000364 000c18a8 000c18a8 nt!NtRequestWaitReplyPort+0x776 (FPO: [SEH])

            b9c43d50 7c8285ec 00000364 000c18a8 000c18a8 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ b9c43d64)

            00dffab0 7c82787b 77c80a6e 00000364 000c18a8 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

            00dffab4 77c80a6e 00000364 000c18a8 000c18a8 ntdll!NtRequestWaitReplyPort+0xc (FPO: [3,0,0])

            00dffb00 77c7fcf0 00dffb3c 00dffb20 77c80673 RPCRT4!LRPC_CCALL::SendReceive+0x230 (FPO: [1,11,0])

            00dffb0c 77c80673 00dffb3c 75822c38 00dfff28 RPCRT4!I_RpcSendReceive+0x24 (FPO: [1,0,0])

            00dffb20 77ce315a 00dffb68 000c1968 000bcd80 RPCRT4!NdrSendReceive+0x2b (FPO: [2,0,0])

            00dfff08 75822b74 75822c38 75822b8a 00dfff28 RPCRT4!NdrClientCall2+0x22e (FPO: [SEH])

            00dfff20 75822b4f 000c1e30 00dfff3c 00dfff60 WlNotify!SensApip_RPC_SensNotifyWinlogonEvent+0x1c (FPO: [2,1,0])

            00dfff30 75824d57 00dfff3c 00020000 00000024 WlNotify!SensNotifyWinlogonEvent+0x1c (FPO: [1,0,0])

            00dfff60 0103917b 00dfff78 00000000 00000000 WlNotify!SensLockEvent+0x58 (FPO: [1,9,0])

            00dfffb8 77e64829 000bcd80 00000000 00000000 winlogon!WlpExecuteNotify+0xc4 (FPO: [SEH])

            00dfffec 00000000 010390b7 000bcd80 00000000 kernel32!BaseThreadStart+0x34 (FPO: [SEH])

     

            THREAD 847d7b18  Cid 0344.4c9c  Teb: 7ffa6000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable

                847d7d04  Semaphore Limit 0x1

            Waiting for reply to LPC MessageId 0020bfb4:

            Current LPC port e3761290

            Impersonation token:  e423a358 (Level Impersonation)

            DeviceMap                 e1bca430

            Owning Process            84f02020       Image:         winlogon.exe

            Attached Process          N/A            Image:         N/A

            Wait Start TickCount      3514154        Ticks: 136528 (0:00:35:33.250)

            Context Switch Count      1            

            UserTime                  00:00:00.000

            KernelTime                00:00:00.000

            Win32 Start Address winlogon!WlpExecuteNotify (0x010390b7)

            Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

            Stack Init b9c00000 Current b9bffc20 Base b9c00000 Limit b9bfd000 Call 0

            Priority 13 BasePriority 13 PriorityDecrement 0

            Kernel stack not resident.

            ChildEBP RetAddr  Args to Child             

            b9bffc38 808202b6 847d7b18 847d7bc0 00002601 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])

            b9bffc50 8081fb6e 847d7d04 847d7cd8 847d7b18 nt!KiSwapThread+0x83 (FPO: [0,2,0])

            b9bffc94 80915355 847d7d04 00000011 b9bffd01 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])

            b9bffd50 8082350b 00000364 000ab970 000ab970 nt!NtRequestWaitReplyPort+0x776 (FPO: [SEH])

            b9bffd50 7c8285ec 00000364 000ab970 000ab970 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ b9bffd64)

            00fcfab0 7c82787b 77c80a6e 00000364 000ab970 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

            00fcfab4 77c80a6e 00000364 000ab970 000ab970 ntdll!NtRequestWaitReplyPort+0xc (FPO: [3,0,0])

            00fcfb00 77c7fcf0 00fcfb3c 00fcfb20 77c80673 RPCRT4!LRPC_CCALL::SendReceive+0x230 (FPO: [1,11,0])

            00fcfb0c 77c80673 00fcfb3c 75822c38 00fcff28 RPCRT4!I_RpcSendReceive+0x24 (FPO: [1,0,0])

            00fcfb20 77ce315a 00fcfb68 000aba30 000bd858 RPCRT4!NdrSendReceive+0x2b (FPO: [2,0,0])

            00fcff08 75822b74 75822c38 75822b8a 00fcff28 RPCRT4!NdrClientCall2+0x22e (FPO: [SEH])

            00fcff20 75822b4f 000c1e30 00fcff3c 00fcff60 WlNotify!SensApip_RPC_SensNotifyWinlogonEvent+0x1c (FPO: [2,1,0])

            00fcff30 75824db8 00fcff3c 00040000 00000024 WlNotify!SensNotifyWinlogonEvent+0x1c (FPO: [1,0,0])

            00fcff60 0103917b 00fcff78 00000000 00000000 WlNotify!SensUnlockEvent+0x58 (FPO: [1,9,0])

            00fcffb8 77e64829 000bd858 00000000 00000000 winlogon!WlpExecuteNotify+0xc4 (FPO: [SEH])

            00fcffec 00000000 010390b7 000bd858 00000000 kernel32!BaseThreadStart+0x34 (FPO: [SEH])

     

            THREAD 8478e8d0  Cid 0344.53d0  Teb: 7ffab000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable

                8478eabc  Semaphore Limit 0x1

            Waiting for reply to LPC MessageId 0020f7bb:

            Current LPC port e3761290

            Impersonation token:  e3811358 (Level Impersonation)

            DeviceMap                 e1bca430

            Owning Process            84f02020       Image:         winlogon.exe

            Attached Process          N/A            Image:         N/A

            Wait Start TickCount      3572296        Ticks: 78386 (0:00:20:24.781)

            Context Switch Count      1            

            UserTime                  00:00:00.000

            KernelTime                00:00:00.000

            Win32 Start Address winlogon!WlpExecuteNotify (0x010390b7)

            Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

            Stack Init b94bc000 Current b94bbc20 Base b94bc000 Limit b94b9000 Call 0

            Priority 13 BasePriority 13 PriorityDecrement 0

            Kernel stack not resident.

            ChildEBP RetAddr  Args to Child             

            b94bbc38 808202b6 8478e8d0 8478e978 00000310 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])

            b94bbc50 8081fb6e 8478eabc 8478ea90 8478e8d0 nt!KiSwapThread+0x83 (FPO: [0,2,0])

            b94bbc94 80915355 8478eabc 00000011 b94bbd01 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])

            b94bbd50 8082350b 00000364 000880a8 000880a8 nt!NtRequestWaitReplyPort+0x776 (FPO: [SEH])

            b94bbd50 7c8285ec 00000364 000880a8 000880a8 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ b94bbd64)

            00dbfab0 7c82787b 77c80a6e 00000364 000880a8 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

            00dbfab4 77c80a6e 00000364 000880a8 000880a8 ntdll!NtRequestWaitReplyPort+0xc (FPO: [3,0,0])

            00dbfb00 77c7fcf0 00dbfb3c 00dbfb20 77c80673 RPCRT4!LRPC_CCALL::SendReceive+0x230 (FPO: [1,11,0])

            00dbfb0c 77c80673 00dbfb3c 75822c38 00dbff28 RPCRT4!I_RpcSendReceive+0x24 (FPO: [1,0,0])

            00dbfb20 77ce315a 00dbfb68 00088168 0008bd80 RPCRT4!NdrSendReceive+0x2b (FPO: [2,0,0])

            00dbff08 75822b74 75822c38 75822b8a 00dbff28 RPCRT4!NdrClientCall2+0x22e (FPO: [SEH])

            00dbff20 75822b4f 000c1e30 00dbff3c 00dbff60 WlNotify!SensApip_RPC_SensNotifyWinlogonEvent+0x1c (FPO: [2,1,0])

            00dbff30 75824e19 00dbff3c 00080000 00000024 WlNotify!SensNotifyWinlogonEvent+0x1c (FPO: [1,0,0])

            00dbff60 0103917b 00dbff78 00000000 00000000 WlNotify!SensStartScreenSaverEvent+0x58 (FPO: [1,9,0])

            00dbffb8 77e64829 0008bd80 00000000 00000000 winlogon!WlpExecuteNotify+0xc4 (FPO: [SEH])

            00dbffec 00000000 010390b7 0008bd80 00000000 kernel32!BaseThreadStart+0x34 (FPO: [SEH])

     

            THREAD 846e6660  Cid 0344.56a8  Teb: 7ffa7000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable

                846e684c  Semaphore Limit 0x1

            Waiting for reply to LPC MessageId 0021117f:

            Current LPC port e3761290

            Impersonation token:  e37a5358 (Level Impersonation)

            DeviceMap                 e1bca430

            Owning Process            84f02020       Image:         winlogon.exe

            Attached Process          N/A            Image:         N/A

            Wait Start TickCount      3602966        Ticks: 47716 (0:00:12:25.562)

            Context Switch Count      1            

            UserTime                  00:00:00.000

            KernelTime                00:00:00.000

            Win32 Start Address winlogon!WlpExecuteNotify (0x010390b7)

            Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

            Stack Init b9500000 Current b94ffc20 Base b9500000 Limit b94fd000 Call 0

            Priority 13 BasePriority 13 PriorityDecrement 0

            Kernel stack not resident.

            ChildEBP RetAddr  Args to Child             

            b94ffc38 808202b6 846e6660 846e6708 00000241 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])

            b94ffc50 8081fb6e 846e684c 846e6820 846e6660 nt!KiSwapThread+0x83 (FPO: [0,2,0])

            b94ffc94 80915355 846e684c 00000011 b94ffd01 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])

            b94ffd50 8082350b 00000364 0008ba78 0008ba78 nt!NtRequestWaitReplyPort+0x776 (FPO: [SEH])

            b94ffd50 7c8285ec 00000364 0008ba78 0008ba78 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ b94ffd64)

            00f8fab0 7c82787b 77c80a6e 00000364 0008ba78 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

            00f8fab4 77c80a6e 00000364 0008ba78 0008ba78 ntdll!NtRequestWaitReplyPort+0xc (FPO: [3,0,0])

            00f8fb00 77c7fcf0 00f8fb3c 00f8fb20 77c80673 RPCRT4!LRPC_CCALL::SendReceive+0x230 (FPO: [1,11,0])

            00f8fb0c 77c80673 00f8fb3c 75822c38 00f8ff28 RPCRT4!I_RpcSendReceive+0x24 (FPO: [1,0,0])

            00f8fb20 77ce315a 00f8fb68 0008bb38 0009bbf0 RPCRT4!NdrSendReceive+0x2b (FPO: [2,0,0])

            00f8ff08 75822b74 75822c38 75822b8a 00f8ff28 RPCRT4!NdrClientCall2+0x22e (FPO: [SEH])

            00f8ff20 75822b4f 000c1e30 00f8ff3c 00f8ff60 WlNotify!SensApip_RPC_SensNotifyWinlogonEvent+0x1c (FPO: [2,1,0])

            00f8ff30 75824e7a 00f8ff3c 00100000 00000024 WlNotify!SensNotifyWinlogonEvent+0x1c (FPO: [1,0,0])

            00f8ff60 0103917b 00f8ff78 00000000 00000000 WlNotify!SensStopScreenSaverEvent+0x58 (FPO: [1,9,0])

            00f8ffb8 77e64829 0009bbf0 00000000 00000000 winlogon!WlpExecuteNotify+0xc4 (FPO: [SEH])

            00f8ffec 00000000 010390b7 0009bbf0 00000000 kernel32!BaseThreadStart+0x34 (FPO: [SEH])

     

            THREAD 8470d428  Cid 0344.56b4  Teb: 7ffa4000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable

                8470d614  Semaphore Limit 0x1

            Waiting for reply to LPC MessageId 00211186:

            Current LPC port e3761290

            Impersonation token:  e4167358 (Level Impersonation)

            DeviceMap                 e1bca430

            Owning Process            84f02020       Image:         winlogon.exe

            Attached Process          N/A            Image:         N/A

            Wait Start TickCount      3602966        Ticks: 47716 (0:00:12:25.562)

            Context Switch Count      1            

            UserTime                  00:00:00.000

            KernelTime                00:00:00.000

            Win32 Start Address winlogon!WlpExecuteNotify (0x010390b7)

            Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

            Stack Init b9378000 Current b9377c20 Base b9378000 Limit b9375000 Call 0

            Priority 13 BasePriority 13 PriorityDecrement 0

            Kernel stack not resident.

            ChildEBP RetAddr  Args to Child             

            b9377c38 808202b6 8470d428 8470d4d0 00002241 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])

            b9377c50 8081fb6e 8470d614 8470d5e8 8470d428 nt!KiSwapThread+0x83 (FPO: [0,2,0])

            b9377c94 80915355 8470d614 00000011 b9377d01 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])

            b9377d50 8082350b 00000364 0013b340 0013b340 nt!NtRequestWaitReplyPort+0x776 (FPO: [SEH])

            b9377d50 7c8285ec 00000364 0013b340 0013b340 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ b9377d64)

            016dfab0 7c82787b 77c80a6e 00000364 0013b340 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

            016dfab4 77c80a6e 00000364 0013b340 0013b340 ntdll!NtRequestWaitReplyPort+0xc (FPO: [3,0,0])

            016dfb00 77c7fcf0 016dfb3c 016dfb20 77c80673 RPCRT4!LRPC_CCALL::SendReceive+0x230 (FPO: [1,11,0])

            016dfb0c 77c80673 016dfb3c 75822c38 016dff28 RPCRT4!I_RpcSendReceive+0x24 (FPO: [1,0,0])

            016dfb20 77ce315a 016dfb68 0013b400 0007f808 RPCRT4!NdrSendReceive+0x2b (FPO: [2,0,0])

            016dff08 75822b74 75822c38 75822b8a 016dff28 RPCRT4!NdrClientCall2+0x22e (FPO: [SEH])

            016dff20 75822b4f 000c1e30 016dff3c 016dff60 WlNotify!SensApip_RPC_SensNotifyWinlogonEvent+0x1c (FPO: [2,1,0])

            016dff30 75824d57 016dff3c 00020000 00000024 WlNotify!SensNotifyWinlogonEvent+0x1c (FPO: [1,0,0])

            016dff60 0103917b 016dff78 00000000 00000000 WlNotify!SensLockEvent+0x58 (FPO: [1,9,0])

            016dffb8 77e64829 0007f808 00000000 00000000 winlogon!WlpExecuteNotify+0xc4 (FPO: [SEH])

            016dffec 00000000 010390b7 0007f808 00000000 kernel32!BaseThreadStart+0x34 (FPO: [SEH])

     

            THREAD 846c4b40  Cid 0344.56d0  Teb: 7ffa2000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable

                846c4d2c  Semaphore Limit 0x1

            Waiting for reply to LPC MessageId 00211223:

            Current LPC port e3761290

            Impersonation token:  e3826358 (Level Impersonation)

            DeviceMap                 e1bca430

            Owning Process            84f02020       Image:         winlogon.exe

            Attached Process          N/A            Image:         N/A

            Wait Start TickCount      3603490        Ticks: 47192 (0:00:12:17.375)

            Context Switch Count      1            

            UserTime                  00:00:00.000

            KernelTime                00:00:00.000

            Win32 Start Address winlogon!WlpExecuteNotify (0x010390b7)

            Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

            Stack Init b9310000 Current b930fc20 Base b9310000 Limit b930d000 Call 0

            Priority 13 BasePriority 13 PriorityDecrement 0

            Kernel stack not resident.

            ChildEBP RetAddr  Args to Child             

            b930fc38 808202b6 846c4b40 846c4be8 00002301 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])

            b930fc50 8081fb6e 846c4d2c 846c4d00 846c4b40 nt!KiSwapThread+0x83 (FPO: [0,2,0])

            b930fc94 80915355 846c4d2c 00000011 b930fd01 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])

            b930fd50 8082350b 00000364 0013c440 0013c440 nt!NtRequestWaitReplyPort+0x776 (FPO: [SEH])

            b930fd50 7c8285ec 00000364 0013c440 0013c440 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ b930fd64)

            0175fab0 7c82787b 77c80a6e 00000364 0013c440 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

            0175fab4 77c80a6e 00000364 0013c440 0013c440 ntdll!NtRequestWaitReplyPort+0xc (FPO: [3,0,0])

            0175fb00 77c7fcf0 0175fb3c 0175fb20 77c80673 RPCRT4!LRPC_CCALL::SendReceive+0x230 (FPO: [1,11,0])

            0175fb0c 77c80673 0175fb3c 75822c38 0175ff28 RPCRT4!I_RpcSendReceive+0x24 (FPO: [1,0,0])

            0175fb20 77ce315a 0175fb68 0013c500 00098ef8 RPCRT4!NdrSendReceive+0x2b (FPO: [2,0,0])

            0175ff08 75822b74 75822c38 75822b8a 0175ff28 RPCRT4!NdrClientCall2+0x22e (FPO: [SEH])

            0175ff20 75822b4f 000c1e30 0175ff3c 0175ff60 WlNotify!SensApip_RPC_SensNotifyWinlogonEvent+0x1c (FPO: [2,1,0])

            0175ff30 75824db8 0175ff3c 00040000 00000024 WlNotify!SensNotifyWinlogonEvent+0x1c (FPO: [1,0,0])

            0175ff60 0103917b 0175ff78 00000000 00000000 WlNotify!SensUnlockEvent+0x58 (FPO: [1,9,0])

            0175ffb8 77e64829 00098ef8 00000000 00000000 winlogon!WlpExecuteNotify+0xc4 (FPO: [SEH])

            0175ffec 00000000 010390b7 00098ef8 00000000 kernel32!BaseThreadStart+0x34 (FPO: [SEH])

     

    Wow!....look at that...around 8 threads, and all waiting for replies to the LPC calls issued. So, both the winlogon processes are stuck waiting for a reply for the LPC calls they made.

     

    The next part is to find out what's happening at the server part of these LPC calls since winlogon is stuck until the processing of the LPC calls are finished.

     

    To find the server part of the LPC call, we pick the message ID from each thread. Look for (Waiting for reply to LPC MessageId "message_id":) in the threads above. Let's start with the first winlogon process to find out the server end of the LPC call it had in there. If you remember, we did find one thread in the first winlogon process waiting for a reply to the LPC call and message id of that LPC call was 00213bd7.

     

    kd> !lpc message 00213bd7

    Searching message 213bd7 in threads ...

    Client thread 847318d0 waiting a reply from 213bd7                         

    Searching thread 847318d0 in port rundown queues ...

     

    Server communication port 0xe365aa90

        Handles: 1   References: 1

        The LpcDataInfoChainHead queue is empty

            Connected port: 0xe125d4e0      Server connection port: 0xe1c37920

     

    Client communication port 0xe125d4e0

        Handles: 1   References: 2

        The LpcDataInfoChainHead queue is empty

     

    Server connection port e1c37920  Name: senssvc

        Handles: 1   References: 17

        Server process  : 85e021e0 (svchost.exe)

        Queue semaphore : 850fa890

        Semaphore state 16 (0x10)

            Messages in queue:

            0000 e3791a88 - Busy  Id=001fbe4f  From: 0f4c.335c  Context=80060002  [e1c37930 . e1281378]

                       Length=00c000a8  Type=00000001 (LPC_REQUEST)

                       Data: 00000001 00000241 00000028 00000005 00010050 00d0fdac

            0000 e1281378 - Busy  Id=001fbe56  From: 0f4c.336c  Context=80060002  [e3791a88 . e15c5350]

                       Length=00c000a8  Type=00000001 (LPC_REQUEST)

                       Data: 00000001 00000241 00490065 0074006e 006c0065 00500000

            0000 e15c5350 - Busy  Id=00200326  From: 0f4c.3840  Context=80060002  [e1281378 . e14ba008]

                       Length=00c000a8  Type=00000001 (LPC_REQUEST)

                       Data: 00000001 00000241 0008c080 00000000 0049004d 00440048

            0000 e14ba008 - Busy  Id=0020032a  From: 0f4c.3848  Context=80060002  [e15c5350 . e1120290]

                       Length=00c000a8  Type=00000001 (LPC_REQUEST)

                       Data: 00000001 00000241 00000000 00000000 0000000c 00000000

            0000 e1120290 - Busy  Id=002049b3  From: 0f4c.3b08  Context=80060002  [e14ba008 . e40a48c8]

                       Length=00c000a8  Type=00000001 (LPC_REQUEST)

                       Data: 00000001 00000241 00600040 00600050 00900050 00600050

            0000 e40a48c8 - Busy  Id=00204e8d  From: 0f4c.09c8  Context=80060002  [e1120290 . e1ab41d0]

                       Length=00200008  Type=00000005 (LPC_PORT_CLOSED)

                       Data: fd48cde3 01c9700a

            0000 e1ab41d0 - Busy  Id=0020b318  From: 0344.4ab0  Context=80050001  [e40a48c8 . e1b35378]

                       Length=00c000a8  Type=00000001 (LPC_REQUEST)

                       Data: 00000001 00000241 00000028 00000005 00010050 00d0fdac

            0000 e1b35378 - Busy  Id=0020bf59  From: 0344.4c7c  Context=80050001  [e1ab41d0 . e1bdc918]

                       Length=00c000a8  Type=00000001 (LPC_REQUEST)

                       Data: 00000001 00000241 00000000 00000000 00000000 00000000

            0000 e1bdc918 - Busy  Id=0020bf60  From: 0344.4c88  Context=80050001  [e1b35378 . e1657ca0]

                       Length=00c000a8  Type=00000001 (LPC_REQUEST)

                       Data: 00000001 00000241 0037002d 00350032 00340033 00350035

            0000 e1657ca0 - Busy  Id=0020bfb4  From: 0344.4c9c  Context=80050001  [e1bdc918 . e3a463e8]

                       Length=00c000a8  Type=00000001 (LPC_REQUEST)

                       Data: 00000001 00000241 4ed07378 cf0ee4b3 5faeb176 0000000c

            0000 e3a463e8 - Busy  Id=0020f7bb  From: 0344.53d0  Context=80050001  [e1657ca0 . e1bcab08]

                       Length=00c000a8  Type=00000001 (LPC_REQUEST)

                       Data: 00000001 00000241 00000000 00000000 00000000 00000000

            0000 e1bcab08 - Busy  Id=0021117f  From: 0344.56a8  Context=80050001  [e3a463e8 . e4190008]

                       Length=00c000a8  Type=00000001 (LPC_REQUEST)

                       Data: 00000001 00000241 00000000 813a8198 00001601 00027011

            0000 e4190008 - Busy  Id=00211186  From: 0344.56b4  Context=80050001  [e1bcab08 . e40afca8]

                       Length=00c000a8  Type=00000001 (LPC_REQUEST)

                       Data: 00000001 00000241 002e0070 0061006d 00720072 006f0069

            0000 e40afca8 - Busy  Id=00211223  From: 0344.56d0  Context=80050001  [e4190008 . e37de340]

                       Length=00c000a8  Type=00000001 (LPC_REQUEST)

                       Data: 00000001 00000241 4bdd485c 5d18bba4 d4bb211b 0000000c

            0000 e37de340 - Busy  Id=00213bd7  From: 01a4.58e4  Context=80030000  [e40afca8 . e1c37930]

                       Length=00780060  Type=00000001 (LPC_REQUEST)

                       Data: 00000001 00000241 00000000 811fc308 00001601 000152cb

        The message queue contains 15 messages

        The LpcDataInfoChainHead queue is empty

     

    Done!                                            

     

    Interesting, along with the message id, 00213bd7 , in the queue, we also found all other message ID's seen in the second winlogon process.

    Take a look at the messages in queue and you will see all of them in there.

     

    So, now the question arises as to what is this process doing and why its unable to process these LPC call's successfully.

     

    Let's switch context to the Server process  :  85e021e0 (svchost.exe)

     

    kd> .process /p /r 85e021e0

    Implicit process is now 85e021e0 

    Loading User Symbols

    ................................................................

    ...........................................

     

    When I dumped the threads in this process, the thread count turned out to be an alarming 4900 which is extremely bizarre. Around 1325 of those threads were waiting to acquire a common critical section. This became a very solid point of interest and almost all of them (around 4870) were in a wait state pending a single common SynchronizationEvent i.e. 85d1eeb0.

     

    These threads wait for the message to come through via the LPC port to the server process. That also explains why the server service (user mode component: srvsvc.dll) is unable to process any remote requests.

     

    Few of those threads, waiting on the common critical section, looked like this: (Don’t get bothered by the “Kernel stack not resident” as we know memory gets swapped to disk)

     

    kd> !thread 84832af8

    THREAD 84832af8  Cid 03d8.4760  Teb: 7ef49000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable

        85d1eeb0  SynchronizationEvent

    Not impersonating

    DeviceMap                 e1001150

    Owning Process            85e021e0       Image:         svchost.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      3474348        Ticks: 176334 (0:00:45:55.218)

    Context Switch Count      1            

    UserTime                  00:00:00.000

    KernelTime                00:00:00.000

    Win32 Start Address srvsvc!XsProcessApisWrapper (0x74ed39a9)

    Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

    Stack Init b9fd4000 Current b9fd3c78 Base b9fd4000 Limit b9fd1000 Call 0

    Priority 8 BasePriority 8 PriorityDecrement 0

    Kernel stack not resident.

    ChildEBP RetAddr  Args to Child             

    b9fd3c90 808202b6 84832af8 84832ba0 00000000 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])

    b9fd3ca8 8081fb6e 00000000 00000000 00000000 nt!KiSwapThread+0x83 (FPO: [0,2,0])

    b9fd3cec 8090e64e 85d1eeb0 00000006 00000001 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])

    b9fd3d50 8082350b 00000314 00000000 00000000 nt!NtWaitForSingleObject+0x9a (FPO: [SEH])

    b9fd3d50 7c8285ec 00000314 00000000 00000000 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ b9fd3d64)

    3d6bfbc8 7c827d0b 7c83d236 00000314 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    3d6bfbcc 7c83d236 00000314 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])

    3d6bfc08 7c83d281 00000314 00000004 00000000 ntdll!RtlpWaitOnCriticalSection+0x1a3 (FPO: [2,7,4])

    3d6bfc28 7c82ee3b 7c8877a0 00000000 7ffd8000 ntdll!RtlEnterCriticalSection+0xa8 (FPO: [1,1,0])

    3d6bfcb8 7c82ec9f 3d6bfd28 3d6bfd28 00000000 ntdll!LdrpInitializeThread+0x68 (FPO: [SEH])

    3d6bfd14 7c8284c5 3d6bfd28 7c800000 00000000 ntdll!_LdrpInitialize+0x16f (FPO: [SEH])

    00000000 00000000 00000000 00000000 00000000 ntdll!KiUserApcDispatcher+0x25

     

    kd> !thread 847d8b40

    THREAD 847d8b40  Cid 03d8.4be0  Teb: 7ed63000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable

        85d1eeb0  SynchronizationEvent

    Not impersonating

    DeviceMap                 e1001150

    Owning Process            85e021e0       Image:         svchost.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      3549550        Ticks: 101132 (0:00:26:20.187)

    Context Switch Count      1            

    UserTime                  00:00:00.000

    KernelTime                00:00:00.000

    Win32 Start Address srvsvc!XsProcessApisWrapper (0x74ed39a9)

    Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

    Stack Init b98a4000 Current b98a3c78 Base b98a4000 Limit b98a1000 Call 0

    Priority 8 BasePriority 8 PriorityDecrement 0

    Kernel stack not resident.

    ChildEBP RetAddr  Args to Child             

    b98a3c90 808202b6 847d8b40 847d8be8 00000000 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])

    b98a3ca8 8081fb6e 00000000 00000000 00000000 nt!KiSwapThread+0x83 (FPO: [0,2,0])

    b98a3cec 8090e64e 85d1eeb0 00000006 00000001 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])

    b98a3d50 8082350b 00000314 00000000 00000000 nt!NtWaitForSingleObject+0x9a (FPO: [SEH])

    b98a3d50 7c8285ec 00000314 00000000 00000000 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ b98a3d64)

    4503fbc8 7c827d0b 7c83d236 00000314 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    4503fbcc 7c83d236 00000314 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])

    4503fc08 7c83d281 00000314 00000004 00000000 ntdll!RtlpWaitOnCriticalSection+0x1a3 (FPO: [2,7,4])

    4503fc28 7c82ee3b 7c8877a0 00000000 7ffd8000 ntdll!RtlEnterCriticalSection+0xa8 (FPO: [1,1,0])

    4503fcb8 7c82ec9f 4503fd28 4503fd28 00000000 ntdll!LdrpInitializeThread+0x68 (FPO: [SEH])

    4503fd14 7c8284c5 4503fd28 7c800000 00000000 ntdll!_LdrpInitialize+0x16f (FPO: [SEH])

    00000000 00000000 00000000 00000000 00000000 ntdll!KiUserApcDispatcher+0x25

     

    kd> !thread 847a8db0

    THREAD 847a8db0  Cid 03d8.4998  Teb: 7eed1000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable

        85d1eeb0  SynchronizationEvent

    Not impersonating

    DeviceMap                 e1001150

    Owning Process            85e021e0       Image:         svchost.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      3487819        Ticks: 162863 (0:00:42:24.734)

    Context Switch Count      1            

    UserTime                  00:00:00.000

    KernelTime                00:00:00.000

    Win32 Start Address srvsvc!XsProcessApisWrapper (0x74ed39a9)

    Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

    Stack Init b9dc4000 Current b9dc3c78 Base b9dc4000 Limit b9dc1000 Call 0

    Priority 8 BasePriority 8 PriorityDecrement 0

    Kernel stack not resident.

    ChildEBP RetAddr  Args to Child              

    b9dc3c90 808202b6 847a8db0 847a8e58 00000000 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])

    b9dc3ca8 8081fb6e 00000000 00000000 00000000 nt!KiSwapThread+0x83 (FPO: [0,2,0])

    b9dc3cec 8090e64e 85d1eeb0 00000006 00000001 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])

    b9dc3d50 8082350b 00000314 00000000 00000000 nt!NtWaitForSingleObject+0x9a (FPO: [SEH])

    b9dc3d50 7c8285ec 00000314 00000000 00000000 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ b9dc3d64)

    3f4bfbc8 7c827d0b 7c83d236 00000314 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    3f4bfbcc 7c83d236 00000314 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])

    3f4bfc08 7c83d281 00000314 00000004 00000000 ntdll!RtlpWaitOnCriticalSection+0x1a3 (FPO: [2,7,4])

    3f4bfc28 7c82ee3b 7c8877a0 00000000 7ffd8000 ntdll!RtlEnterCriticalSection+0xa8 (FPO: [1,1,0])

    3f4bfcb8 7c82ec9f 3f4bfd28 3f4bfd28 00000000 ntdll!LdrpInitializeThread+0x68 (FPO: [SEH])

    3f4bfd14 7c8284c5 3f4bfd28 7c800000 00000000 ntdll!_LdrpInitialize+0x16f (FPO: [SEH])

    00000000 00000000 00000000 00000000 00000000 ntdll!KiUserApcDispatcher+0x25

     

    kd> !thread 848e1db0

    THREAD 848e1db0  Cid 03d8.2c10  Teb: 7f67a000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable

        85d1eeb0  SynchronizationEvent

    Not impersonating

    DeviceMap                 e1001150

    Owning Process            85e021e0       Image:         svchost.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      3183097        Ticks: 467585 (0:02:01:46.015)

    Context Switch Count      1             

    UserTime                  00:00:00.000

    KernelTime                00:00:00.000

    Win32 Start Address srvsvc!XsProcessApisWrapper (0x74ed39a9)

    Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

    Stack Init f3570000 Current f356fc78 Base f3570000 Limit f356d000 Call 0

    Priority 8 BasePriority 8 PriorityDecrement 0

    Kernel stack not resident

     

    kd> !thread 84890db0

    THREAD 84890db0  Cid 03d8.31cc  Teb: 7f4e3000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable

        85d1eeb0  SynchronizationEvent

    Not impersonating

    DeviceMap                 e1001150

    Owning Process            85e021e0       Image:         svchost.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      3244800        Ticks: 405882 (0:01:45:41.906)

    Context Switch Count      1            

    UserTime                  00:00:00.000

    KernelTime                00:00:00.000

    Win32 Start Address srvsvc!XsProcessApisWrapper (0x74ed39a9)

    Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

    Stack Init f2ee8000 Current f2ee7c78 Base f2ee8000 Limit f2ee5000 Call 0

    Priority 8 BasePriority 8 PriorityDecrement 0

    Kernel stack not resident.

     

    kd> !thread 848223f0

    THREAD 848223f0  Cid 03d8.3714  Teb: 7f3d8000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable

        85d1eeb0  SynchronizationEvent

    Not impersonating

    DeviceMap                 e1001150

    Owning Process            85e021e0       Image:         svchost.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      3283235        Ticks: 367447 (0:01:35:41.359)

    Context Switch Count      1            

    UserTime                  00:00:00.000

    KernelTime                00:00:00.000

    Win32 Start Address srvsvc!XsProcessApisWrapper (0x74ed39a9)

    Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

    Stack Init f2ac4000 Current f2ac3c78 Base f2ac4000 Limit f2ac1000 Call 0

    Priority 8 BasePriority 8 PriorityDecrement 0

    Kernel stack not resident.

     

    kd> dt ntdll!_RTL_CRITICAL_SECTION 7c8877a0

       +0x000 DebugInfo        : 0x7c8877c0 _RTL_CRITICAL_SECTION_DEBUG

       +0x004 LockCount        : -19514

       +0x008 RecursionCount   : 2

       +0x00c OwningThread     : 0x00001634

       +0x010 LockSemaphore    : 0x00000314

       +0x014 SpinCount        : 0

     

    In Microsoft Windows Server 2003 Service Pack 1 and later versions of Windows, the LockCount field is parsed as follows: (also at link http://msdn.microsoft.com/en-us/library/cc267476.aspx)

    • The lowest bit shows the lock status. If this bit is 0, the critical section is locked; if it is 1, the critical section is not locked.
    • The next bit shows whether a thread has been woken for this lock. If this bit is 0, then a thread has been woken for this lock; if it is 1, no thread has been woken.
    • The remaining bits are the ones-complement of the number of threads waiting for the lock.

    In our case, the LockCount is -19514. The lowest bit can be determined in this way:

    kd> ? 0x1 & (-0n19514)

    Evaluate expression: 0 = 00000000

     

    kd>  (0x2 & (-0n19514)) >> 1

         ^ Syntax error in ' (0x2 & (-0n19514)) >> 1'

     

    kd> ?(0x2 & (-0n19514)) >> 1

    Evaluate expression: 1 = 00000001

     

    kd> ? ((-1) - (-0n19514)) >> 2

    Evaluate expression: 4878 = 0000130e

     

     

    So, there are 4878 threads waiting for this particular critical section. Seeing the number of above threads in the waiting list, it makes lot of sense.

    In RED above, we see the offender (0x00001634) who is not ready to leave the critical section.

     

    So now, at least we know which thread is causing the server to be unresponsive. Let’s go further and find out what’s happening in that particular thread.

     

    kd> !thread 85125020

    THREAD 85125020  Cid 03d8.1634  Teb: 7ffdb000 Win32Thread: e1754ea8 WAIT: (Unknown) UserMode Non-Alertable

        8512520c  Semaphore Limit 0x1

    Waiting for reply to LPC MessageId 001e295b:

    Current LPC port e38140c8

    IRP List:

        850eff68: (0006,0094) Flags: 00000800  Mdl: 00000000

    Not impersonating

    DeviceMap                 e1001150

    Owning Process            85e021e0       Image:         svchost.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      3470768        Ticks: 179914 (0:00:46:51.156)

    Context Switch Count      93                 LargeStack

    UserTime                  00:00:00.000

    KernelTime                00:00:00.015

    Win32 Start Address RPCRT4!ThreadStartRoutine (0x77c7b0f5)

    Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

    Stack Init f5424000 Current f5423c20 Base f5424000 Limit f5421000 Call 0

    Priority 10 BasePriority 8 PriorityDecrement 0

    Kernel stack not resident.

    ChildEBP RetAddr  Args to Child             

    f5423c38 808202b6 85125020 851250c8 00000500 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])

    f5423c50 8081fb6e 8512520c 851251e0 85125020 nt!KiSwapThread+0x83 (FPO: [0,2,0])

    f5423c94 80915355 8512520c 00000011 80912701 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])

    f5423d50 8082350b 00000a40 000e9dd8 000de270 nt!NtRequestWaitReplyPort+0x776 (FPO: [SEH])

    f5423d50 7c8285ec 00000a40 000e9dd8 000de270 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ f5423d64)

    00a0c580 7c82787b 77e77342 00000a40 000e9dd8 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    00a0c584 77e77342 00000a40 000e9dd8 000de270 ntdll!NtRequestWaitReplyPort+0xc (FPO: [3,0,0])

    00a0c5bc 77e77214 01b1f978 7c840936 00a0c604 kernel32!BasepQueryModuleData+0x121 (FPO: [5,6,4])

    00a0c5dc 7c84086e 01b1f978 7c840936 00a0c604 kernel32!BaseQueryModuleData+0x3b (FPO: [5,1,0])

    00a0c82c 7c8343b7 0009ec88 0009ec88 00000000 ntdll!LdrpCheckAppDatabase+0x9c (FPO: [1,138,4])

    00a0c840 7c8333ba 0009ec88 0016ecb0 00a0ce98 ntdll!LdrpCheckNXCompatibility+0x27 (FPO: [1,1,4])

    00a0c93c 7c834311 00000000 00000000 777964d0 ntdll!LdrpRunInitializeRoutines+0x20f (FPO: [SEH])

    00a0cbd0 7c834065 00000000 0016ecb0 00a0ce98 ntdll!LdrpLoadDll+0x3cd (FPO: [SEH])

    00a0ce4c 77e41bf3 0016ecb0 00a0ce98 00a0ce78 ntdll!LdrLoadDll+0x198 (FPO: [SEH])

    00a0ceb4 776adc89 00a0cf30 00000000 00000008 kernel32!LoadLibraryExW+0x1b2 (FPO: [SEH])

    00a0ced8 776adb95 00a0cf30 00a0cefc 00a0cf00 ole32!CClassCache::CDllPathEntry::LoadDll+0xb0 (FPO: [4,1,4])

    00a0cf08 776ad2da 00a0cf30 00a0d208 00a0cf28 ole32!CClassCache::CDllPathEntry::Create_rl+0x37 (FPO: [3,3,4])

    00a0d154 776ad1fd 00000001 00a0d208 00a0d184 ole32!CClassCache::CClassEntry::CreateDllClassEntry_rl+0xd8 (FPO: [3,139,4])

    00a0d19c 776ad112 00000001 00105624 00a0d1c8 ole32!CClassCache::GetClassObjectActivator+0x1fe (FPO: [3,10,4])

    00a0d1cc 776a1b7f 00a0d208 00a0dce8 00a0d7a0 ole32!CClassCache::GetClassObject+0x24 (FPO: [1,1,0])

    00a0d240 776a19b5 77794964 00a0d7a0 00a0dce8 ole32!CServerContextActivator::GetClassObject+0xfd (FPO: [3,23,4])

    00a0d278 776a1c4e 00a0d7a0 00a0dce8 00a0d7a0 ole32!ActivationPropertiesIn::DelegateGetClassObject+0xf3 (FPO: [2,7,0])

    00a0d298 776a1bfa 77794978 00a0d7a0 00a0dce8 ole32!CApartmentActivator::GetClassObject+0x4d (FPO: [3,2,0])

    00a0d2b0 776acb27 77794978 00000001 00000000 ole32!CProcessActivator::GCOCallback+0x2b (FPO: [6,0,4])

    00a0d2d0 776acad8 77794960 00a0d5fc 00000000 ole32!CProcessActivator::AttemptActivation+0x2c (FPO: [7,0,0])

    00a0d30c 776a1ca1 77794960 00a0d5fc 00000000 ole32!CProcessActivator::ActivateByContext+0x4f (FPO: [6,3,0])

    00a0d334 776a19b5 77794960 00a0d7a0 00a0dce8 ole32!CProcessActivator::GetClassObject+0x48 (FPO: [3,1,4])

    00a0d36c 776a1925 00a0d7a0 00a0dce8 0073006d ole32!ActivationPropertiesIn::DelegateGetClassObject+0xf3 (FPO: [2,7,0])

    00a0d5b4 776a19b5 7779487c 00000001 00a0dce8 ole32!CClientContextActivator::GetClassObject+0x88 (FPO: [3,3,0])

    00a0d5ec 776a1a07 00a0d7a0 00a0dce8 7767c9b0 ole32!ActivationPropertiesIn::DelegateGetClassObject+0xf3 (FPO: [2,7,0])

    00a0dda4 776a187e 00a0ded8 00000001 00000000 ole32!ICoGetClassObject+0x3a5 (FPO: [7,52,0])

    00a0ddd4 776a1841 00a0ded8 00000002 00000000 ole32!CComActivator::DoGetClassObject+0xe0 (FPO: [6,0,4])

    00a0ddf4 772d7218 00a0ded8 00000001 00000000 ole32!CoGetClassObject+0x20 (FPO: [5,0,0])

    00a0de60 772d65ce 01ab48d8 01aeba70 00a0de80 urlmon!CProtMgr::FindFirstCF+0x11a (FPO: [4,17,0])

    00a0de84 772c4408 01ab4870 01aeba70 00000000 urlmon!COInetSession::CreateHandler+0x2d (FPO: [6,1,0])

    00a0deec 772d6b6c 00083ff8 01aeba70 000841b0 urlmon!CTransaction::LoadHandler+0xbd (FPO: [5,15,4])

    00a0df18 772c8302 00083ff8 0000000d 00000001 urlmon!CTransaction::DispatchReport+0x203 (FPO: [8,1,4])

    00a0df44 772c874b 00083ff8 01b092e8 00000000 urlmon!CTransaction::DispatchPacket+0x31 (FPO: [2,0,4])

    00a0df64 772c6681 00083ff8 00000000 00a0e014 urlmon!CTransaction::OnINetCallback+0x92 (FPO: [2,1,0])

    00a0df9c 7739b6e3 00c60048 00000465 00000000 urlmon!TransactionWndProc+0x33 (FPO: [SEH])

    00a0dfc8 7739b874 772c664e 00c60048 00000465 USER32!InternalCallWinProc+0x28

    00a0e040 7739ba92 00000000 772c664e 00c60048 USER32!UserCallWinProcCheckWow+0x151 (FPO: [SEH])

    00a0e0a8 773a16e5 00a0e0d0 00000001 00a0e10c USER32!DispatchMessageWorker+0x327 (FPO: [SEH])

    00a0e0b8 772de54a 00a0e0d0 00084100 00000000 USER32!DispatchMessageA+0xf (FPO: [1,0,0])

    00a0e10c 772ca8f5 00000000 00000000 01ad6588 urlmon!CTransaction::CompleteOperation+0x15a (FPO: [1,15,0])

    00a0e148 772c77ee 00083ff8 01ac6930 01ad64e0 urlmon!CTransaction::Start+0x52c (FPO: [6,9,4])

    00a0e1cc 772c744f 00a0e214 01aa2b30 772c9c00 urlmon!CBinding::StartBinding+0x4d8 (FPO: [6,22,4])

    00a0f264 772c7319 01aa7230 00000000 01aa2b30 urlmon!CUrlMon::StartBinding+0x1d8 (FPO: [6,1051,4])

    00a0f29c 772eb8f5 01aa7230 01aa2b30 00000000 urlmon!CUrlMon::BindToStorage+0x67 (FPO: [5,3,4])

    00a0f2e0 7732238b 001422e0 00a0f344 001422e0 urlmon!CBaseBSCB::KickOffDownload+0x191 (FPO: [2,8,0])

    00a0f2f4 7732296f 001422e0 00a0f344 0000000c urlmon!CFileDownload::KickOffDownload+0x2d (FPO: [2,0,4])

    00a0f30c 77322ce6 00000000 00a0f344 00a0f3c4 urlmon!URLDownloadToFileW+0x51 (FPO: [5,0,0])

    00a0f444 00a0f59d 00000000 00a0f63e 00a0f632 urlmon!URLDownloadToFileA+0xd8 (FPO: [5,68,4])

    WARNING: Frame IP not in any known module. Following frames may be wrong.

    00a0f474 71bf396b 48544950 71bf3969 554a4649 0xa0f59d

     77430b8c 77e40000 00000000 00000004 0000ffff 000000b8 WS2HELP!WahOpenHandleHelper+0xf7 (FPO: [1,114,0])

     

    Now, looking at the thread, lets list down the observations and our interpretation from it:

     

    1)      So, by looking at API URLDownloadToFileW(), it seems this thread is busy downloading a file from the internet. The API and parameters are documented at  http://msdn.microsoft.com/en-us/library/ms836094.aspx Let’s find out which file are we trying to download here:

     

    kd> du 00a0f344

    00a0f344  http://10.162.13.115:31797/x

     

    kd> du 00a0f3c4

    00a0f3c4  "x.exe"

     

    So we are trying to download an executable here? It’s worth having the customer investigate this binary?

     

    2)      This thread is waiting for a reply to a LPC call issued (LPC MessageId 001e295b), so let’s find out who do we have at the server end of this LPC call:

     

    kd> !lpc message 001e295b

    Searching message 1e295b in threads ...

    Client thread 85125020 waiting a reply from 1e295b                         

    Searching thread 85125020 in port rundown queues ...

     

    Server connection port e1395030  Name: AELPort

        Handles: 1   References: 22

        Server process  : 85e021e0 (svchost.exe)

        Queue semaphore : 85c593f8

        Semaphore state 0 (0x0)

        The message queue is empty

     

    So, as you see above, the server piece of this LPC call is this process itself hence a straightforward deadlock situation. Also, please remember that critical section is process specific.

     

    3)      The stack below URLMON looks messed up so we shall have to look at the raw stack and reconstruct the correct stack. 

     

    kd> kv =00a0f8cc 0x00a0f8cc 71c49573

     

    ChildEBP RetAddr  Args to Child             

    00a0f8cc 71c49578 000c9620 000c93ac 00104100 NETAPI32!NetpwPathCanonicalize+0x76 (FPO: [6,0,4])

    00a0f8f4 74edc916 000c93ac 00104100 0000019a NETAPI32!NetpwPathCanonicalize+0x7b (FPO: [6,0,4])

    00a0f914 77c80193 000c9398 000c93ac 00104100 srvsvc!NetprPathCanonicalize+0x1c (FPO: [7,0,0])

    00a0f944 77ce33e1 74edc8fa 00a0fb28 00000007 RPCRT4!Invoke+0x30

    00a0fd44 77ce35c4 00000000 00000000 000b9fd8 RPCRT4!NdrStubCall2+0x299 (FPO: [SEH])

    00a0fd60 77c7ff7a 000b9fd8 000a80d8 000b9fd8 RPCRT4!NdrServerCall2+0x19 (FPO: [1,1,0])

    00a0fd94 77c8042d 74ed31d7 000b9fd8 00a0fe34 RPCRT4!DispatchToStubInCNoAvrf+0x38 (FPO: [SEH])

    00a0fde8 77c80353 0000001f 00000000 74ee3f0c RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x11f (FPO: [4,13,4])

    00a0fe0c 77c68e0d 000b9fd8 00000000 74ee3f0c RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3 (FPO: [4,0,4])

    00a0fe40 77c68cb3 000b9f98 000002a4 000b9f60 RPCRT4!OSF_SCALL::DispatchHelper+0x149 (FPO: [0,4,4])

    00a0fe54 77c68c2b 00000000 0000001f 000c9370 RPCRT4!OSF_SCALL::DispatchRPCCall+0x10d (FPO: [2,0,4])

    00a0fe84 77c68b5e 000c9370 030002bc 0000001f RPCRT4!OSF_SCALL::ProcessReceivedPDU+0x57f (FPO: [3,5,4])

    00a0fea4 77c6e8db 00000000 000002bc 00000000 RPCRT4!OSF_SCALL::BeginRpcCall+0x194 (FPO: [2,0,4])

    00a0ff04 77c6e7b4 00000000 000c9370 000002bc RPCRT4!OSF_SCONNECTION::ProcessReceiveComplete+0x435 (FPO: [3,17,0])

    00a0ff18 77c7b799 0008cd68 0000000c 00000000 RPCRT4!ProcessConnectionServerReceivedEvent+0x21 (FPO: [7,0,0])

    00a0ff84 77c7b9b5 00a0ffac 77c8872d 0008cd68 RPCRT4!LOADABLE_TRANSPORT::ProcessIOEvents+0x1b8 (FPO: [0,15,0])

    00a0ff8c 77c8872d 0008cd68 00000000 00000000 RPCRT4!ProcessIOEventsWrapper+0xd (FPO: [1,0,0])

    00a0ffac 77c7b110 0008b120 00a0ffec 77e64829 RPCRT4!BaseCachedThreadRoutine+0x9d (FPO: [1,2,4])

    00a0ffb8 77e64829 001262f0 00000000 00000000 RPCRT4!ThreadStartRoutine+0x1b (FPO: [1,0,0])

    00a0ffec 00000000 77c7b0f5 001262f0 00000000 kernel32!BaseThreadStart+0x34 (FPO: [SEH])

     

    Our first course of action was to install the latest and greatest netapi32.dll binary which cleared up the issue.

     

     

    I hope that helps!!

     

    Thanks and regards,

    Aman

  • Ntdebugging Blog

    The default interactive desktop heap size has been increased on 32-bit Vista SP1

    • 5 Comments

     

    This is going to be a short blog post, but considering the amount of feedback we’ve received on the our two previous desktop heap posts, I think this is worth blogging about.  32-bit Vista SP1 and 32-bit Windows Server 2008 both have a new value for the default size of interactive desktop heaps.  Previously, this heap size was 3 MB by default, but on Vista SP1 and Server 2008 the default interactive heap size is 12 MB.  We’ve heard your comments about the problems associated with the heap size, and this new default size should go a long way towards avoiding interactive desktop heap depletion.

     

    On 32-bit systems, running Vista SP1 or Windows Server 2008, you’ll now see that the relevant portion of the registry value looks like this (by default)...

     

    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows

    SharedSection=1024,12288,512

     

    64-bit Windows still has a default interactive desktop heap size of 20MB.  Keep in mind that in Vista, session view space (the memory range that desktop heaps are allocated from) is now a dynamic address range.  This means that in Vista and Server 2008 the individual desktop heap sizes can be increased to larger sizes with less concern over exhausting session view space.

     

    - Matthew Justice

  • Ntdebugging Blog

    LPC (Local procedure calls) Part 1 architecture

    • 5 Comments

     

    Hello, my name is Roy. I’m an Escalation Engineer in the CPR platforms team.  I’ll be doing a four part series on LPC over the coming month.  You’re sure to find this interesting. That being said let’s get started.

     

    Disclaimer: The purpose of this blog is to illustrate debugging techniques with LPC. Please do not rely on this information as documentation for the LPC APIs when writing your own code. As always, this is subject to change in the future. 

    What is LPC?

     

    LPC (Local Inter-Process Communication and NOT Local Procedure Calls) is a high speed message based communication mechanism implemented in the NT kernel. LPC can be used for communication between two user mode processes, between a user mode process and a kernel mode driver or between two kernel mode drivers.  One example would be two user mode processes communicating via LPC.  Like CSRSS.exe talking to SMSS.exe over SmssWinStationApiPort while creating a logon session or any process talking to LSASS.exe over LsaAuthenticationPort for security reasons. Another example of a user mode process communicating with a kernel mode driver would be KSecDD.sys talking to LSASS.exe for EFS key encryption and decryption during read/write of an encrypted file.

     

    LPC uses two different mechanisms for passing data between the client and the server process. It uses the LPC message buffer (for data size less than 304 bytes) or it uses a shared memory Section mapped to both client and server address space (for data size more than 304 bytes).

     

    Apart from being used as the protocol of choice of Remote Procedure Calls between processes running on the same system, LPC is also used throughout the system e.g. for Win32 Application’s communication with CSRSS.exe, Security Reference Monitor’s communication with LSASS, WinLogon’s communication with LSASS etc.

     

    LPC enforces synchronous communication model between the client and the server processes. Vista deprecates the use of LPC using a new mechanism called Asynchronous Local Inter-Process Communication (ALPC). ALPC has an inherent advantage over LPC in that all calls from client to the server are asynchronous i.e. the client does not need to block/wait for the server to respond to a message. In Vista, legacy application calls to LPC APIs are automatically redirected to the newer ALPC APIs.

     

     

    LPC APIs

     

    LPC APIs are native APIs i.e. they are exported in user mode by NTDLL.dll and in kernel mode by NTOSKRNL.exe. LPC APIs are not exposed at the Win32 level hence win32 applications cannot use the LPC facility directly. Win32 applications can however use LPC indirectly when using RPC by specifying LPC as its underling transport via the protocol sequence “ncalrpc”. All LPC APIs end in the word "Port" which implies an LPC communication endpoint.

     

    API

    Description

    NtCreatePort

    Used by server to create a connection port

    NtConnectPort

    Used by client to connect to a connection port

    NtListenPort

    Used by server to listen for connection requests on the connection port.

    NtAcceptConnectPort

    Used by server to accept connection requests on the connection port

    NtCompleteConnectPort

    Used by server to complete the acceptance of a connection request

    NtRequestPort

    Used to send a datagram message that does not have a reply

    NtRequestWaitReplyPort

    Used to send a message and wait for a reply

    NtReplyPort

    Used to send a reply to a particular message

    NtReplyWaitReplyPort

    Used to send a reply to a particular message and wait for a reply to a previous message

    NtReplyWaitReceivePort

    Used by server to send a reply to the client and wait to receive a message from the client

    NtImpersonateClientOfPort

    Used by server thread to temporarily borrow the security context of a client thread

     

     

    The following diagram illustrates the steps taken by an LPC server process to listen to inbound connection request from potential client and the steps taken by clients to connect to listening servers.

     

     

    Figure [1] LPC Client Server Connection Establishment Sequence

    NOTE: Many server processes use the NtReplyWaitReceivePort( ) API instead of NtListenPort( ). NtListenPort( ) drops all LPC messages except connection requests. Hence NtListenPort( ) can only be used for the first connection. For later connection requests NtReplyWaitReceivePort( ) is used.

     

    The following diagram illustrates the steps taken by an LPC client to send a request to an LPC server that is has already established a connection to and the steps taken by the server to responds to the message.

     

     

     

    Figure [2] LPC Client Server Data Transfer Sequence

     

     

    LPC Data Structures

     

     

    LPC Port Data Structure

     

    LPC endpoints are referred to as ports. LPC implementation uses the same port structure to represent the various types of ports. The ports used by LPC are Server Connection Ports which are named ports created by the server process to accept incoming connections from clients.  Client Communication Ports which are created by the client process to connect to a server process and Server Communication Port created by the server process.

     

    Relationships 

      

     

    Figure [3] LPC Port types and their relationships

     

     

    LPCP_PORT_OBJECT is the internal data structure used by LPC to represent a LPC port. LPCP_PORT_OBJECTs are allocated out of paged pool with the tag ‘Port’.

     

    kd> dt nt!_LPCP_PORT_OBJECT

       +0x000 ConnectionPort   : Ptr32 _LPCP_PORT_OBJECT

       +0x004 ConnectedPort    : Ptr32 _LPCP_PORT_OBJECT

       +0x008 MsgQueue         : _LPCP_PORT_QUEUE

       +0x018 Creator          : _CLIENT_ID

       +0x020 ClientSectionBase : Ptr32 Void

       +0x024 ServerSectionBase : Ptr32 Void

       +0x028 PortContext      : Ptr32 Void

       +0x02c ClientThread     : Ptr32 _ETHREAD

       +0x030 SecurityQos      : _SECURITY_QUALITY_OF_SERVICE

       +0x03c StaticSecurity   : _SECURITY_CLIENT_CONTEXT

       +0x078 LpcReplyChainHead : _LIST_ENTRY

       +0x080 LpcDataInfoChainHead : _LIST_ENTRY

       +0x088 ServerProcess    : Ptr32 _EPROCESS

       +0x088 MappingProcess   : Ptr32 _EPROCESS

       +0x08c MaxMessageLength : Uint2B

       +0x08e MaxConnectionInfoLength : Uint2B

       +0x090 Flags            : Uint4B

       +0x094 WaitEvent        : _KEVENT

     

     

    Field

    Description

    ConnectedPort

    Points to the Server Communication Port

    ConnectionPort

    Points to the Server Connection Port

    MsgQueue.Semaphore

    Used to signal the server thread about the presence of  a message in MsgQueue.ReceiveHead

    MsgQueue.ReceiveHead

    Head of a doubly linked list containing all the messages that are waiting to be dequeued by the server.

    MsgQueue.NonPagedPortQueue

    Points to the LPCP_NONPAGED_PORT_QUEUE structure for the client communication port for tracking lost replies.

    LpcReplyChainHead

    Head of a doubly linked list containing all the threads that are waiting for replies to messages sent to this port.

     

     

     

    LPC Message Data Structure

     

    LPC message are data structures that carry information from the LPC client to the LPC server and can be of various types like connection, request, close etc.

     

    LPCP_MESSAGE is the internal data structure used by LPC to represent a message. LPCP_MESSAGE structures are allocated from a system wide lookaside list with the tag ‘LpcM’.

     

    kd> dt nt!_LPCP_MESSAGE

       +0x000 Entry            : _LIST_ENTRY

       +0x000 FreeEntry        : _SINGLE_LIST_ENTRY

       +0x004 Reserved0        : Uint4B

       +0x008 SenderPort       : Ptr32 Void

       +0x00c RepliedToThread  : Ptr32 _ETHREAD

       +0x010 PortContext      : Ptr32 Void

       +0x018 Request          : _PORT_MESSAGE

     

     

    Field

    Description

    Request.MessageId

    Generated from the value of a global epoch LpcpNextMessageId. Used to uniquely identify a message.

    SenderPort

    Points to the LPCP_PORT_OBJECT of the client communication port

    Entry

    Is the list entry that is used to queue the message to the Server Communication/Connection Port’s MsgQueue.ReceiveHead

    Request

    Is a copy of the message buffer that was provided as the Request parameter to the call to NtRequestWaitReplyPort() or a copy the message buffer that was provided as the Reply parameter to NtReplyWaitRecivePortEx().

     

     

    LPC related fields in Thread Data Structure

     

    kd> dt nt!_ETHREAD -y Lpc

       +0x1c8 LpcReplyChain : _LIST_ENTRY

       +0x1f4 LpcReplySemaphore : _KSEMAPHORE

       +0x208 LpcReplyMessage : Ptr32 Void

       +0x208 LpcWaitingOnPort : Ptr32 Void

       +0x228 LpcReceivedMessageId : Uint4B

       +0x23c LpcReplyMessageId : Uint4B

       +0x250 LpcReceivedMsgIdValid : Pos 0, 1 Bit

       +0x250 LpcExitThreadCalled : Pos 1, 1 Bit

     

    The following table describes the fields of the ETHREAD data structure that are used for communication between LPC Client and Server process.

     

     

    Field

    Description

    LpcReplyMessageId

    Contains the ID of the message sent to the server while the client thread is waiting on the response.

    LpcWaitingOnPort

    Used by the LPC Client Thread to stores the LPC client communication port on which it is waiting for a reply.

    LpcReplySemaphore

    Used to block the LPC Client Thread while the server is processing the LPC message.

    LpcExitThreadCalled

    Used by the Lpc APIs to determine if the thread is currently in the process of exiting and if so returns STATUS_THREAD_IS_TERMINATING from the API call.

    LpcReplyChain

    Used to queue the thread to the list of threads waiting for replies from the server communication/connection port. The list head of this list is in LPCP_PORT_OBJECT->LpcReplyChainHead.

     

     

     

    Important global LPC connection ports

     

    Field

    Description

    LsaAuthenticationPort

     Used for the following tasks:

    1. Security package management like Kerberos, NTLM etc…
    2. Credential management
    3. User Account management
    4. EFS (Encrypted File System) key encryption, decryption

     

    SmssWinStationApiPort

     

     This port is used by the CSRSS to manage Window Stations i.e. multiple sessions.

     

    ErrorLogPort

     

     This port is used by IO manager and other components to send error log entries to the EventLog service.

     

     

     

     

    Important LPC port fields in the Process

    (nt!_EPROCESS)

     

    Field

    Description

    SecurityPort

    LSA uses this port for EFS (encrypted file system) and authentication.

     

    ExceptionPort

     When a thread does not catch an exception an LPC_EXCEPTION message is sent to this port. A subsystem might register an LPC port in this field to receive second chance exception information of process running in the subsystem. Default action of CSRSS is to terminate the process.

     

    DebugPort

     The kernel dispatches user mode exceptions to the process debugger on this port. Debug outputs using OutputDebugString( ) are passed to this port as DBG_PRINTEXCEPTION_C.

     

     

     

     

    Important LPC port fields in the Thread

    (nt!_ETHREAD)

     

    Field

    Description

    TerminationPort

    This is not a single port rather a chain of ports registered with the Process Manager. The process manager notifies all ports when the thread terminates.

     

    LpcWaitingOnPort

    This field has the address of the client communication port when the client thread waits for a reply to a LPC message from a server thread.

     

     

     

     

    Stay tuned...in the next post we will roll our sleeves up and dig into the debug extensions for LPC and some common states where  you may find these calls stuck.

     

  • Ntdebugging Blog

    Interpreting Event 153 Errors

    • 8 Comments

    Hello my name is Bob Golding and I would like to share with you a new event that you may see in the system event log.  Event ID 153 is an error associated with the storage subsystem. This event was new in Windows 8 and Windows Server 2012 and was added to Windows 7 and Windows Server 2008 R2 starting with hot fix KB2819485.

     

    An event 153 is similar to an event 129.  An event 129 is logged when the storport driver times out a request to the disk; I described event 129 messages in a previous article.  The difference between a 153 and a 129 is that a 129 is logged when storport times out a request, a 153 is logged when the storport miniport driver times out a request.  The miniport driver may also be referred to as an adapter driver or HBA driver, this driver is typically written the hardware vendor.

     

    Because the miniport driver has a better knowledge of the request execution environment, some miniport drivers time the request themselves instead of letting storport handle request timing.  This is because the miniport driver can abort the individual request and return an error rather than storport resetting the drive after a timeout.  Resetting the drive is disruptive to the I/O subsystem and may not be necessary if only one request has timed out.  The error returned from the miniport driver is bubbled up to the class driver who can log an event 153 and retry the request.

     

    Below is an example event 153:

     

    Event 153 Example

     

    This error means that a request failed and was retried by the class driver.  In the past no message would be logged in this situation because storport did not timeout the request.  The lack of messages resulted in confusion when troubleshooting disk errors because timeouts would occur but there would be no evidence of the error.

     

    The details section of the event the log record will present what error caused the retry and whether the request was a read or write. Below is the details output:

     

    Event 153 Details

     

    In the example above at byte offset 29 is the SCSI status, at offset 30 is the SRB status that caused the retry, and at offset 31 is the SCSI command that is being retried.  In this case the SCSI status was 00 (SCSISTAT_GOOD), the SRB status was 09 (SRB_STATUS_TIMEOUT), and the command was 28 (SCSIOP_READ). 

     

    The most common SCSI commands are:

    SCSIOP_READ - 0x28

    SCSIOP_WRITE - 0x2A

     

    The most common SRB statuses are below:

    SRB_STATUS_TIMEOUT - 0x09

    SRB_STATUS_BUS_RESET - 0x0E

    SRB_STATUS_COMMAND_TIMEOUT - 0x0B

     

    A complete list of SCSI operations and statuses can be found in scsi.h in the WDK.  A list of SRB statuses can be found in srb.h.

     

    The timeout errors (SRB_STATUS_TIMEOUT and SRB_STATUS_COMMAND_TIMEOUT) indicate a request timed out in the adapter. In other words a request was sent to the drive and there was no response within the timeout period.  The bus reset error (SRB_STATUS_BUS_RESET) indicates that the device was reset and that the request is being retried due to the reset since all outstanding requests are aborted when a drive receives a reset.

     

    A system administrator who encounters event 153 errors should investigate the health of the computer’s disk subsystem.  Although an occasional timeout may be part of the normal operation of a system, the frequent need to retry requests indicates a performance issue with the storage that should be corrected.

  • Ntdebugging Blog

    Kernel Stack Overflows

    • 5 Comments

    Hello, this is Omer, and today I would like to talk about a common error that we see in a lot of cases reported to us by customers. It involves drivers taking too much space on the kernel stack that results in a kernel stack overflow, which will then crash the system with one of the following bugchecks:

    1. STOP 0x7F: UNEXPECTED_KERNEL_MODE_TRAP with Parameter 1 set to EXCEPTION_DOUBLE_FAULT, which is caused by running off the end of a kernel stack.

    2. STOP 0x1E: KMODE_EXCEPTION_NOT_HANDLED, 0x7E: SYSTEM_THREAD_EXCEPTION_NOT_HANDLED, or 0x8E: KERNEL_MODE_EXCEPTION_NOT_HANDLED, with an exception code of STATUS_ACCESS_VIOLATION, which indicates a memory access violation.

    3. STOP 0x2B: PANIC_STACK_SWITCH, which usually occurs when a kernel-mode driver uses too much stack space.

    Kernel Stack Overview

    Each thread in the system is allocated with a kernel mode stack. Code running on any kernel-mode thread (whether it is a system thread or a thread created by a driver) uses that thread's kernel-mode stack unless the code is a DPC, in which case it uses the processor's DPC stack on certain platforms.  Stack grows negatively.  This means that the beginning (bottom) of the stack has a higher address than the end (top) of the stack.  For example, let's stay the beginning of your stack is 0x80f1000 and this is where your stack pointer (ESP) is pointing.  If you push a DWORD value onto the stack, its address would be 0x80f0ffc.  The next DWORD value would be stored at 0x80f0ff8 and so on up to the limit (top) of the allocated stack.  The top of the stack is bordered by a guard-page to detect overruns.

    The size of the kernel-mode stack varies among different hardware platforms. For example:

    · On x86-based platforms, the kernel-mode stack is 12K.

    · On x64-based platforms, the kernel-mode stack is 24K. (x64-based platforms include systems with processors using the AMD64 architecture and processors using the Intel EM64T architecture).

    · On Itanium-based platforms, the kernel-mode stack is 32K with a 32K backing store. (If the processor runs out of registers from its register file, it uses the backing store to hold the contents of registers until the allocating function returns. This doesn't affect stack allocations directly, but the operating system uses more registers on Itanium-based platforms than on other platforms, which makes relatively more stack available to drivers.)

    The stack sizes listed above are hard limits that are imposed by the system, and all drivers need to use space conservatively so that they can coexist.

    Exception Overview

    So, now that we have discussed the kernel stack, let’s dive into how the double fault actually happens.

    When we reach the top of the stack, one more push instruction is going to cause an exception. This could be either a simple push instruction, or something along the lines of a call instruction which also pushes the return address onto the stack, etc.

    The push instruction is going to cause the first exception. This will cause the exception handler to kick in, which will then try to allocate the trap frame and other variables on the stack. This causes the second exception.

    This time around, the operating system takes advantage a special x86 structure called the Task State Segment(TSS).  The OS stores the state of the registers in the TSS and then stops.  The TSS can be accessed via an entry in the global descriptor table, and can be used to debug the memory dump that is created.

    The Usual Suspects

    Rogue drivers are usually guilty of one or more of the following design flaws:

    1. Using the stack liberally. Instead of passing large amounts of data on the stack, driver writers should design functions to accept pointers to data structures. These data structures should be allocated out of system space memory(paged or non-paged pool). If you need to pass large number of parameters from one function to another, then group the parameters into a structure and then pass a pointer to that structure.

    2. Calling functions recursively. Heavily nested or recursive functions that are passing large amounts of data on the stack will use too much space and will overflow. Try to design drivers that use a minimal number of recursive calls and nested functions.

    Since the size of the stack is much smaller on x86 machines, you will run into these problems with x86 machines more frequently than any other platform.

    For a more detailed description, please visit

    http://www.microsoft.com/whdc/Driver/tips/KMstack.mspx.

    Debugging Kernel Stack Overflows

    Full kernel dumps are usually enough to find the offending driver. The most common bugcheck code that appears in these dumps is UNEXPECTED_KERNEL_MODE_TRAP (0x7f), with the first argument being EXCEPTION_DOUBLE_FAULT (0x8).

    When you get this dump, the first command that you should run is !analyze-v.

    0: kd> !analyze -v

    *******************************************************************************

    * Bugcheck Analysis *

    *******************************************************************************

    UNEXPECTED_KERNEL_MODE_TRAP (7f)

    This means a trap occurred in kernel mode, and it's a trap of a kind that the kernel isn't allowed to have/catch (bound trap) or that

    is always instant death (double fault). The first number in the bugcheck params is the number of the trap (8 = double fault, etc)

    Consult an Intel x86 family manual to learn more about what these traps are. Here is a *portion* of those codes:

    If kv shows a taskGate

    use .tss on the part before the colon, then kv.

    Else if kv shows a trapframe

    use .trap on that value

    Else

    .trap on the appropriate frame will show where the trap was taken(on x86, this will be the ebp that goes with the procedure KiTrap)

    Endif

    kb will then show the corrected stack.

    Arguments:

    Arg1: 00000008, EXCEPTION_DOUBLE_FAULT

    Arg2: 80042000

    Arg3: 00000000

    Arg4: 00000000

    Debugging Details:

    ------------------

    BUGCHECK_STR: 0x7f_8

    TSS: 00000028 -- (.tss 0x28)

    eax=87b90328 ebx=87b90328 ecx=8aa3d8c0 edx=87b90328 esi=b8cb7138 edi=8084266a

    eip=f7159c53 esp=b8cb7000 ebp=b8cb7010 iopl=0 nv up ei pl nz na po nc

    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010202

    Ntfs!NtfsInitializeIrpContext+0xc:

    f7159c53 57 push edi

    Resetting default scope

    DEFAULT_BUCKET_ID: DRIVER_FAULT

    PROCESS_NAME: System

    CURRENT_IRQL: 1

    TRAP_FRAME: b8cb8620 -- (.trap 0xffffffffb8cb8620)

    ErrCode = 00000000

    eax=c1587000 ebx=0000000e ecx=0000000f edx=00000000 esi=87dca350 edi=00000000

    eip=8093837b esp=b8cb8694 ebp=b8cb86d0 iopl=0 nv up ei ng nz ac po cy

    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010293

    nt!CcMapData+0x8c:

    8093837b 8a10 mov dl,byte ptr [eax] ds:0023:c1587000=??

    Resetting default scope

    LAST_CONTROL_TRANSFER: from f7158867 to f7159c53

    Let's follow the instructions that the debugger is giving us. Since the debugger gave us a .tss command, lets run that. After that, run a !thread to get the thread summary:

    0: kd> .tss 0x28

    eax=87b90328 ebx=87b90328 ecx=8aa3d8c0 edx=87b90328 esi=b8cb7138 edi=8084266a

    eip=f7159c53 esp=b8cb7000 ebp=b8cb7010 iopl=0 nv up ei pl nz na po nc

    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010202

    Ntfs!NtfsInitializeIrpContext+0xc:

    f7159c53 57 push edi

    0: kd> !thread

    THREAD 87dca350 Cid 0420.0990 Teb: 7ffdf000 Win32Thread: efdbe430 RUNNING on processor 0

    IRP List:

    89cba088: (0006,01fc) Flags: 00000404 Mdl: 00000000

    Not impersonating

    DeviceMap e10008d8

    Owning Process 8ab8e238 Image: System

    Wait Start TickCount 7260638 Ticks: 0

    Context Switch Count 17 LargeStack

    UserTime 00:00:00.000

    KernelTime 00:00:00.015

    Start Address 0x4a6810ea

    Stack Init b8cba000 Current b8cb7c64 Base b8cba000 Limit b8cb7000 Call 0

    Priority 14 BasePriority 13 PriorityDecrement 0

    We are looking for the kernel stack limits(above in red). For this particular stack we see that the stack starts at b8cba000, and ends at b8cb7000. If you look at the ESP register in the .tss output above, you will see that we have reached the stack limit. The current instruction being attempted is a push which overflows the stack and causes the bugcheck.

    Now that we have determined we do have a stack overflow, let’s find out what caused this, and who the offending driver is.

    The first thing that I do is dump the stack. You might need to increase the number of frames displayed to see the whole stack.

    0: kd> kb

    *** Stack trace for last set context - .thread/.cxr resets it

    ChildEBP RetAddr

    b8cb7010 f7158867 Ntfs!NtfsInitializeIrpContext+0xc

    b8cb71bc 8083f9c0 Ntfs!NtfsFsdRead+0xb7

    b8cb71d0 f7212c53 nt!IofCallDriver+0x45

    b8cb71f8 8083f9c0 fltmgr!FltpDispatch+0x6f

    b8cb720c ba547bcc nt!IofCallDriver+0x45

    WARNING: Stack unwind information not available. Following frames may be wrong.

    b8cb7214 8083f9c0 tmpreflt!TmpAddRdr+0x7b8

    b8cb7228 ba4e08be nt!IofCallDriver+0x45

    b8cb7430 ba4e09d3 DRIVER_A+0x28be

    b8cb7450 b85fa306 DRIVER_A+0x29d3

    b8cb763c b85fa50d DRIVER_B+0x8306

    b8cb765c 8082f0d7 DRIVER_B+0x850d

    b8cb7674 8082f175 nt!IoPageRead+0x109

    b8cb76f8 80849cd5 nt!MiDispatchFault+0xd2a

    b8cb7754 80837d0a nt!MmAccessFault+0x64a

    b8cb7754 8093837b nt!KiTrap0E+0xdc

    b8cb781c f718c0ac nt!CcMapData+0x8c

    b8cb783c f718c6e6 Ntfs!NtfsMapStream+0x4b

    b8cb78b0 f718c045 Ntfs!NtfsReadMftRecord+0x86

    b8cb78e8 f718c0f4 Ntfs!NtfsReadFileRecord+0x7a

    b8cb7920 f7155c3c Ntfs!NtfsLookupInFileRecord+0x37

    b8cb7a30 f715746a Ntfs!NtfsLookupAllocation+0xdd

    b8cb7bfc f7157655 Ntfs!NtfsPrepareBuffers+0x25d

    b8cb7dd8 f715575e Ntfs!NtfsNonCachedIo+0x1ee

    b8cb7ec4 f71588de Ntfs!NtfsCommonRead+0xaf5

    b8cb8070 8083f9c0 Ntfs!NtfsFsdRead+0x113

    b8cb8084 f7212c53 nt!IofCallDriver+0x45

    b8cb80ac 8083f9c0 fltmgr!FltpDispatch+0x6f

    b8cb80c0 ba547bcc nt!IofCallDriver+0x45

    b8cb80c8 8083f9c0 tmpreflt!TmpAddRdr+0x7b8

    b8cb80dc ba4e08be nt!IofCallDriver+0x45

    b8cb82e4 ba4e09d3 DRIVER_A+0x28be

    b8cb8304 b85fa306 DRIVER_A+0x29d3

    b8cb84f0 b85fa50d DRIVER_B+0x8306

    b8cb8510 8082f0d7 DRIVER_B+0x850d

    b8cb8528 8082f175 nt!IoPageRead+0x109

    b8cb85ac 80849cd5 nt!MiDispatchFault+0xd2a

    b8cb8608 80837d0a nt!MmAccessFault+0x64a

    b8cb8608 8093837b nt!KiTrap0E+0xdc

    b8cb86d0 f718c0ac nt!CcMapData+0x8c

    b8cb86f0 f718ef1b Ntfs!NtfsMapStream+0x4b

    b8cb8720 f7186aa7 Ntfs!ReadIndexBuffer+0x8f

    b8cb8894 f7187042 Ntfs!NtfsUpdateFileNameInIndex+0x62

    b8cb8990 f7186059 Ntfs!NtfsUpdateDuplicateInfo+0x2b0

    b8cb8b98 f7186302 Ntfs!NtfsCommonCleanup+0x1e82

    b8cb8d08 8083f9c0 Ntfs!NtfsFsdCleanup+0xcf

    b8cb8d1c f7212c53 nt!IofCallDriver+0x45

    b8cb8d44 8083f9c0 fltmgr!FltpDispatch+0x6f

    b8cb8d58 ba54809a nt!IofCallDriver+0x45

    b8cb8d80 ba54d01d tmpreflt!TmpQueryFullName+0x454

    b8cb8d90 8083f9c0 tmpreflt!TmpQueryFullName+0x53d7

    b8cb8da4 ba4e08be nt!IofCallDriver+0x45

    b8cb8fac ba4e09d3 DRIVER_A+0x28be

    b8cb8fcc b85fa306 DRIVER_A+0x29d3

    b8cb91b8 b85fa50d DRIVER_B+0x8306

    b8cb91d8 80937f75 DRIVER_B+0x850d

    b8cb9208 8092add4 nt!IopCloseFile+0x2ae

    b8cb9238 8092af7a nt!ObpDecrementHandleCount+0x10a

    b8cb9260 8092ae9e nt!ObpCloseHandleTableEntry+0x131

    b8cb92a4 8092aee9 nt!ObpCloseHandle+0x82

    b8cb92b4 80834d3f nt!NtClose+0x1b

    b8cb92b4 8083c0fc nt!KiFastCallEntry+0xfc

    b8cb9330 bf835765 nt!ZwClose+0x11

    b8cb9608 bf8aa2dd win32k!bCreateSection+0x2ad

    b8cb9660 bf826b45 win32k!EngMapFontFileFDInternal+0xc6

    b8cb96c0 bf82784a win32k!PUBLIC_PFTOBJ::bLoadFonts+0x17f

    b8cb991c bf9bcb67 win32k!PUBLIC_PFTOBJ::bLoadAFont+0x77

    b8cb9af0 bf9bcb16 win32k!bInitOneStockFontInternal+0x42

    b8cb9b0c bf9bb0e8 win32k!bInitOneStockFont+0x3f

    b8cb9cf4 bf9ba845 win32k!bInitStockFontsInternal+0x12a

    b8cb9cfc bf8246ad win32k!bInitStockFonts+0xa

    b8cb9d48 bf8242d5 win32k!InitializeGreCSRSS+0x149

    b8cb9d50 80834d3f win32k!NtUserInitialize+0x66

    b8cb9d50 7c82ed54 nt!KiFastCallEntry+0xfc

    0015fdb0 00000000 0x7c82ed54

    The next step is to calculate how much space each frame is taking up. This can be done by walking the stack manually. Just subtract the subsequent EBP from the current EBP for each frame and add up the space used by all the modules.

    Module Stack Usage Percentage
    Ntfs 4152 36%
    DRIVER_A 1572 14%
    win32k 2592 22%
    DRIVER_B 1656 14%
    tmpreflt 72 1%
    fltmgr 120 1%
    nt 1420 12%

    It would be easy to blame NTFS since it is the top stack consumer, but look closer.  Even though NTFS is using the most of space in our example, this is due to both DRIVER_A and DRIVER_B making repeated calls into NTFS to access data.  Alone, it is likely that neither driver would have caused a problem, but both drivers combined resulted in a bugcheck.  Conscientious driver writing and efficient stack usage would have prevented this problem.  Both drivers need to optimize the number of calls they make to NTFS.

    Further reading

    http://www.microsoft.com/whdc/driver/kernel/mem-mgmt.mspx

    http://www.microsoft.com/whdc/Driver/tips/KMstack.mspx

    http://support.microsoft.com/kb/186775

    For more information on Task State Segments, please see the Intel and AMD Processor manuals.

  • Ntdebugging Blog

    Basics of Debugging Windows

    • 8 Comments

    Hello, this is East again. This blog post is about a topic that we always skip over when discussing debugging; what and where are the tools for debugging. I will touch on the different types of debuggers, loading symbols and the basics of getting started with loading up a dump under your preferred debugger.

    Microsoft currently offers 4 types of debugging tools. With these tools you can remote debug another machine over firewire or serial cable (USB also but may not work consistently), as well as debug usermode processes and dump files.

    Command line debuggers:

    1 ) kd.exe: kernel debugger – Used to review Crash dumps created by a blue screen crash event or a stop error. (kd –z <location of dump> –y <location of symbols>)

    2 ) cdb.exe: User mode debugger for reviewing applications,  processes, and process dumps  (cdb  –z <location of dump> –y <location of symbols> )

    3 ) ntsd.exe: CDB and NTSD are virtually identical, except that NTSD spawns a new text window when it is started, whereas CDB inherits the Command Prompt window from which it was invoked.  When I refer to "CDB", it applies to both CDB and NTSD.

    Graphical User Interface Debugger:

    4) Windbg.exe is a GUI based debugger. It can debug the same things as KD & CDB using the same commands. Windbg gives you the ability to have multiple windows open simultaneously  to review source code or other selectable items under the view menu.

    I like using windbg for all of my user and kernel debugging, while  others I work with prefer kd for kernel debugging and cdb for user mode debugging.

     

    There are 32bit and 64bit debuggers available.

    NOTE: Some people use Visual Studio as well, but this blog post will not cover using Visual Studio as a debugger.

     

    You can review applications that already have started on your machine using CDB or Windbg. You can have the problematic application launch under the debugger as well:

    Cdb or Windbg

    -p <pid> specifies the decimal process ID to attach to ( use tlist or the task manger to obtain the PID)

    -psn <name> specifies the process to attach to by service name

    <application to launch> -y <symbol path>

    NOTE: windbg allows you to use menu options as well: select “Attach to a Process” on the File menu to debug a user-mode application that is currently running.

     

    What are dumps?

    Memory dumps are a record of what was in memory and the registers at the time of a crash. There are 3 types of memory dumps:

    NOTE: The type of dump that will be written upon bugcheck can be configured by right clicking my computer ->properties ->Advanced tab ->Settings, in the Write debugging section you will use the first drop down box to select what type of memory dump you want. (See KB307973)

    ·         Mini dump – is a subset of memory that is in use by the application creating the dump.

    A mini memory Dump file is written to %SystemRoot%\Minidump\Memory.dmp by default and is usually less than a 1mb in size.

    ·         Kernel only – This is used to review the machine’s kernel memory at the time of the crash.

    ·         Full/Complete – This is the largest kernel mode dump file. It contains all information from kernel and user mode address spaces that was in physical memory at the time of the dump (about the same size as the physical memory on the box).

    Kernel and Complete Memory Dumps are written to %SystemRoot%\Memory.dmp by default.

    Note: You can configure the server to crash using certain keystrokes . This would be useful when troubleshooting a hung server or a timing issue, KB244139 explains how to configure your server for a manual crash.

    You can also create dump files from an application or process, these are known as User-mode dumps.  Additional information can be found on these types dump in the Debugging Tools for Windows help file.

     

    How do I read a dump file?

    In order to make fast progress with a memory dump file, it is best to load symbol files. Symbol files contains data that the debugger uses to interpret the application or driver code. They may contain:

    -          Global variable names

    -          Function names

    Private Symbols would contain the above information and:

    -          Local variable names

    -          Source-line numbers

    -          Type information for variables, structures, etc.

     Microsoft currently has two ways you can access symbols for the Operating System:

    Service pack download site – You will need to create:

    -          Separate directories for Windows 2000 RTM, Windows 2000 SP1, Windows 2000 SP2, Windows XP RTM, etc.

    -          Separate directories for all of the above for free vs. checked build

    -          Separate directories for hotfix symbols

     

    Public symbol server – uses a symbol store, which is a collection of symbol files. The symbol server uses the time stamp & file size to match up symbols to the active binary.After getting your symbol files together, you will need a way to tell the debugger where they are located and set up some other options.

    To set the symbol path do one of the following:

    -          _NT_SYMBOL_PATH environment variable

    -          -y command line option

    -          .sympath (Set Symbol Path) debugger command

    -          WinDbg: File | Symbol File Path dialog, or CTRL+S

    To set the executable Image Path (needed for minidumps only), do one of the foolowing:

    -          -i command line option

    -          .exepath debugger command

    -          WinDbg: File | Image File Path dialog, or CTRL+i

    -          Source Path

    -          .srcpath WinDbg: File | Source File Path dialog, or CTRL+P

    If symbol errors appear when you begin, you can try the below commands to help narrow down some problems;

    !sym noisy — gives verbose symbol information

    AND

    .reload —  to reload all symbols

     

    Also using the srv* in your symbol path tells the debugger to load and save symbols being used out to a specific directory:

    srv*DownstreamStore*<symbol locations>

     

    NOTE: You must always use .reload after you change the symbol path or fix a symbol error — the debugger doesn’t automatically reload your symbols!

     

    Now that we are done with the overview, let’s configure our machine as a host computer to open memory a dump.  I will be using Microsoft Public Symbol servers and I want to store current symbols locally to my host machine.

    Using windbg I will set my current workspace symbols to: srv*c:\pubsymbols*http://msdl.microsoft.com/download/symbols

    Click the menu option File ->Symbol File Path or Ctrl + S. This will bring up an empty box that will allow you to enter or browse to your symbol path.

    If using kd you want to set an environment variable (_NT_SYMBOL_PATH) under “my computer properties -> advanced tab” to always start with your symbols set to:  “srv*c:\pubsymbols*http://msdl.microsoft.com/download/symbols” or use this same path in your command line:

    Kd –z <path to dump.file> -y srv*c:\pubsymbols*http://msdl.microsoft.com/download/symbols

     

    NOTE: Windbg will append any workspace symbol path with the one set by the _NT_SYMBOL_PATH environment variable during loading of a memory dump.

    Ok, now we know what debugger we want to use and we know our symbol locations. Let’s open our first kernel memory dump , located on <drive letter> <path to dump file>

    Using windbg, I will load a dump file using menu options File ->Open crash Dump (ctrl + D) or drag the the dump file into the debugger; you can even  start windbg at the command prompt.  My command would look like this:

    Windbg  –z C:\training\case 7f\MEMORY051308.22.DMP

    I did not use the –y for symbol path, as it is set already in my default workspace or in my environment variable.

    When the debugger first loads a dump file it displays several lines of information before giving you a prompt to get started with your commands (by default):

    Microsoft (R) Windows Debugger Version 6.9.0003.113 X86  ß debugger version

    Copyright (c) Microsoft Corporation. All rights reserved. ß Copyright of the debugger creator

    Loading Dump File [C:\training\case 7f\MEMORY051308.22.DMP] ß location of the dump file loading

    Kernel Summary Dump File: Only kernel address space is available ß type of memory dump (mini, kernel, or full)

    Symbol search path is: srv*c:\pubsymbols*http://msdl.microsoft.com/download/symbols ß Symbol path for this debug session

    Executable search path is:  ß points to the directory the executable files are located. For most situations this is not needed. For other situations please check the debugger help file.

     

    The next 4 lines talk about The OS version, service packs and how many processors are on the box

    1 -Windows Server 2003 Kernel Version 3790 (Service Pack 2) MP (8 procs) Free x86 compatible

    2 - Product: Server, suite: Enterprise TerminalServer SingleUserTS

    3 - Built by: 3790.srv03_sp2_gdr.070304-2240

    4 - Kernel base = 0x80800000 PsLoadedModuleList = 0x808a6ea8

     

    Next we would see when the machine crashed and how long it was up prior to this crash:

    Debug session time: Wed May 14 01:27:36.768 2008 (GMT-4)

    System Uptime: 0 days 16:32:51.921

     

    After completing the above process, the debugger starts loading the dump file and parsing through the loaded symbols. Here you may notice some warnings for some user space processes which are not included in the kernel dump. This is ok.

    WARNING: Process directory table base BFF0A080 doesn't match CR3 007AF000

    WARNING: Process directory table base BFF0A080 doesn't match CR3 007AF000

    Loading Kernel Symbols

    ...........................................................................................................................................

    Loading User Symbols

    PEB is paged out (Peb.Ldr = 7ffdf00c).  Type ".hh dbgerr001" for details

    Loading unloaded module list

    *******************************************************************************

    *                                                                             *

    *                        Bugcheck Analysis                                    *

    *                                                                             *

    *******************************************************************************

     

    1- Use !analyze -v to get detailed debugging information.

    2 - BugCheck 7F, {8, f773ffe0, 0, 0}

    3 - *** ERROR: Module load completed but symbols could not be loaded for ql2300.sy

    The three things I want to point out from above are:

    1 - !analyze –v: This is the debugger command used to help analyze a dump file by reviewing information passed to KeBugCheck including specific parameters of that crash. It will analyze this information and provide a definition of the bugcheck, a stack showing all current function calls, and, when possible, the name of an offending driver or process that the debugger thinks is at fault.  Please review the debugger help file for additional information in this area.

    2 – The type of bugcheck that occurred on the machine.

    3 – An error telling you about symbols missing or not available to help diagnose a particular driver or application. This can lead to a misdiagnostis if you’re not careful.

    Once loading is completed you should be at a kd> prompt. This prompt shows you the current processor you are using (if the machine has more than one).

    For this dump we are at processor 3 on an 8 proc machine:

    3: kd>

     

    To view the current crash stack location you can use the "K" command. There are multiple forms of this command, each one dumping the basic plus additional information. As functions are executed and call other functions, a call stack is created in stack memory. Here are two common commands to view the stack:

     

    3: kd> k

    ChildEBP RetAddr

    00000000 baebf0ce nt!KiTrap08+0x75

    b3a4bffc baebf737 storport!RaCallMiniportInterrupt+0x2

    b3a4c008 8088d889 storport!RaidpAdapterInterruptRoutine+0x1d

    b3a4c008 80a59d8e nt!KiInterruptDispatch+0x49

    b3a4c09c 80a5c2fc hal!HalpGenerateInterrupt+0x1d2

    b3a4c0c0 80a5c44d hal!HalpLowerIrqlHardwareInterrupts+0x108

    b3a4c0d0 808256ed hal!KfLowerIrql+0x59

    <snippet>

     

    3: kd> kb

    ChildEBP RetAddr  Args to Child

    00000000 baebf0ce 00000000 00000000 00000000 nt!KiTrap08+0x75

    b3a4bffc baebf737 97bedb88 b3a4c02c 8088d889 storport!RaCallMiniportInterrupt+0x2

    b3a4c008 8088d889 977b9e18 97bedad0 03010006 storport!RaidpAdapterInterruptRoutine+0x1d

    b3a4c008 80a59d8e 977b9e18 97bedad0 03010006 nt!KiInterruptDispatch+0x49

    b3a4c09c 80a5c2fc 97797004 97bedad0 00000102 hal!HalpGenerateInterrupt+0x1d2

    b3a4c0c0 80a5c44d 00000101 977b9e02 b3a4c0d8 hal!HalpLowerIrqlHardwareInterrupts+0x108

    b3a4c0d0 808256ed b3a4c0e8 baebf1c6 977b9bb0 hal!KfLowerIrql+0x59

    <snippet>

     

    Either one can be used depending on how much information you want to see and can use.

    This completes the Basic of Debugging Windows, Part I. I will create a Part II using specific questions gathered from our readers.

     

    Miscellaneous information:

    To go further with this topic I would suggest starting with the debugger help file included with the Microsoft Debugging Tools. 

    ADPlus – An automated way to use the cdb.exe to capture/create a usermode dump when a process hangs or crashes. (more info - http://msdn.microsoft.com/en-us/library/cc265629.aspx or kb286350)

    Public Symbols for Microsoft Operating Systems:

    Microsoft Public Symbol server : srv * DownstreamStore * http://msdl.microsoft.com/download/symbols

    example: srv*c:\mysyms*http://msdl.microsoft.com/download/symbols

     Microsoft Symbol packages http://www.microsoft.com/whdc/devtools/debugging/symbolpkg.mspx#d

    Use !Analyze-v to gather additional information about the bugcheck and a bucket-id for your dump file. The bucket-id can be submitted to Microsoft for review for similar crashes and resolutions. Try using the Microsoft Online Crash Analysis to submit your crash dump bucket-id for possible follow up from Microsoft or for Microsoft to look for trends: http://oca.microsoft.com/en/Welcome.aspx

    For concepts, tools and information about the system architecture:

    http://msdn.microsoft.com/en-us/default.aspx

    Windows Internal 4th edition (by Mark E. Russinovich & David A. Solomon) the whole book or Chapter 14 - Crash Dump Analysis

    Advanced Windows Debugging (by Mario Hewardt & Daniel Pravat )

    http://technet.microsoft.com/en-us/default.aspx

  • Ntdebugging Blog

    Remote Debugging connecting to a Remote Stub using the Microsoft Debugging Tools for Windows

    • 3 Comments

    The Microsoft Debugging Tools for Windows provide a couple ways to create a remote debugging connection including "Connecting to a remote session" and "Connecting to a remote stub". Connecting to a remote session is the most commonly used of the two, and is all you typically need, but it does not allow the client to access debugging symbols missing on the remote.

    Jeff’s blog post here has more information on some of the intricacies of debugging a remote session. Often a remote debugging session will be done within a corporate network where all users have access to symbol information, but if we are debugging outside of the secure environment where we cannot share symbols freely, debugging becomes more difficult .

    The Microsoft debugging tools package has two server side command line applications to help us do this by connecting to a remote stub: Dbgsrv.exe for user mode debugging and Kdsrv.exe for kernel mode debugging. They provide a thin server layer for the debugger to attach with full use of local symbols and source code.

    Conventional remote debug session

    Let’s illustrate the benefits of using remote stubs by showing the problems encountered when connecting to a remote session. 

     image_thumb[1][5]

    This diagram shows the conventional remote debug session. Both the debugging host and target are running windbg, though we could use KD or CDB. The goal is to debug a buggy application called Getif.exe which runs on the debugging target. We have symbols for the application on the debugging host however for security reasons the debugging target does not have symbol access to Getif.exe. On both machines we have Windows internet symbols loaded from the http://msdl.microsoft.com/download/symbols.

     

    Here is the symbol path on the debugging target machine. Because this system is outside our secure environment it does not have symbol access for the Getif.exe application. This will be problematic for debugging.

    clip_image008[4]_thumb[1][5]

     

    Here is the symbol path on the debugging host which includes a path to the symbols for the application because it’s inside the secure location.

    clip_image010[4]_thumb[1]

     

    To create the remote debugging server on the target machine, I specify the local port to use for the connection by using the .server command:

    .server tcp:port=<localport>

    clip_image012[4]_thumb[1]

     

    Next I select “Connect to Remote Session” on the debugging host.

    clip_image014[4]_thumb[1]

     

    Then enter the port and server name:

    clip_image016[4]_thumb[1]

     

    And this is where we run into the headache! Because we don’t have Getif.exe symbol access on the machine located outside of the secure environment, the stacks for getif.exe don’t display symbolic information. It makes it hard to debug the application because we can’t dump data types or view other symbolic info.

    clip_image018[4]_thumb[1]

    If we want to debug Getif.exe with symbols, we need to copy the symbol files to the debug target machine and add the location to the target’s symbol path. In our scenario this may not be possible because symbols are often restricted to in-house use and not allowed to be copied freely.

     

     

    Dbgsrv to the rescue!!

    image_thumb[3]

    By activating Dbgsrv on the debugging target machine, the debugger on the debugging host connects with full symbol access to the user mode process on the target. The diagram above illustrates the process.

    Let's run through an example using dbgsrv on the debugging target. On the debugging target enter

     <path to debugging tools>\dbgsrv -t tcp:port=<localport>

    Dbgsrv returns to a command prompt with no output in the command window and the debug port can be any open port on the system. It’s a good idea to confirm we are listening on the port we selected by using netstat. Here’s an example -  

    clip_image020[4]_thumb[1]

    Now let’s move our focus to the debugging host computer. Instead of choosing the typical “Connect a Remote Session”, we use the option, “Connect to a Remote Stub” from the File menu.

    clip_image022[4]_thumb[1]

    Next enter the port and server name.

    clip_image024[4]_thumb[1]

    After selecting OK, we are back to the empty Command window as expected. We won’t see any text until connected to the target process:

    clip_image026[4]_thumb[1]

    Next select File-Attach to a Process option.

    clip_image028[4]_thumb[1]

    It is important to note the “Attach to Process” dialog shows the list of processes running on the debugging target machine.

    clip_image030[4]_thumb[1]

    Now let’s attach to the process. The title bar confirms that we connected to the process on the kdsrv target:

    clip_image032[4]_thumb[1]

    Here you can see the benefit of connecting with a remote stub. The symbols for getif.exe are working for the application running on a machine located outside of the secure environment.

    clip_image034[4]_thumb[1]

     

     

    But what if I need to Kernel debug?

    You can do the same thing with Kernel debugging using Kdsrv.

    image_thumb[5]

    If we need to attach to a kernel debugger we can use kdsrv. Much like the previous example we enter the command:

    kdsrv -t tcp:port=<localport>

    We can see that we are listening on the local port we selected:

    clip_image036[4]_thumb[1]

    We must now start our debugger on the debugging host machine with a command line that includes the information needed to connect to the KdSrv host. As of Windbg 6.10.3.233 we cannot attach to the kdsrv remote stub from the Windbg graphical interface and then make the kernel connection to the debugging target .

    clip_image038[4]_thumb[1]

    Here is the string I used in the graphic above-

    windbg -k kdsrv:server=@{tcp:server=rodolpho,port=5004},trans=@{com:pipe,port=\\mimi\pipe\2008target}

    Note that we are debugging to a named pipe on a kernel debug target running under Microsoft Hyper-V! This is a great way to debug without the hassle of setting up a second machine and attaching with a null modem cable.

    If we wanted to attach to COM1 with a null-modem cable we would use:

    windbg -k kdsrv:server=@{tcp:server=rodolpho,port=5004},trans=@{com:port=com1,baud=11500}

    For more information on client and server syntax see http://msdn.microsoft.com/en-us/library/cc266440.aspx.

    After Windbg starts the command window will look just like if we attached with a local debugger. The title bar will confirm that we are attaching through the debug server:

    clip_image040[4]_thumb[1]

    We can also change context to the Getif.exe process after connecting to the remote stub. When we look at the application stack we have symbols for our application:

    clip_image042[4]_thumb[1]

    After we have our remote stub connection, we can then share our client as a remote session, but we will always handle the symbols on the remote stub client.

    Once you setup a remote stub a couple of times it’s as simple as setting up a remote session, and can make a remote debug go a lot smoother. For more information check out the debugging tools help, available outside of windbg as debugger.chm in the debugger directory, as well as http://msdn.microsoft.com/en-us/library/cc266434.aspx

    Security Note: These examples use TCP/IP or named pipes and are not password protected. The debugger remotes will allow SSL and Secure pipe connections with password and certificate authentication if you require greater security. See the online help or this link for more information.

  • Ntdebugging Blog

    Don't Believe Everything You Read

    • 3 Comments

    Recently, I was contacted by a customer who was advised by an ISV to set a registry value under one of the sub keys in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\.  Let's call it UseQuantumComputing = 1 (value name has been changed to protect the ISV).  The customer wanted to know what this value actually did and no one could find any documentation explaining it.  These issues often come to our team because we have access to the Windows source code. I did a bit of code review to find out what this value does.  As it turns out, nowhere in Windows source code between Windows 2000 and Windows Server 2012 do we ever check for or set UseQuantumComputing.

     

    I can think of a few reasons the ISV would suggest setting this value.  Perhaps they were under the impression this did something but got confused about the value name.  It's possible they hoped making a registry change would have a placebo effect.  Or, perhaps their software actually checks this value, not Windows.

     

    The latter of these possibilities is probably the worst case scenario. An ISV should not create a registry value inside of keys used for Windows’ own internal use. Why? The first reason is that there's no guarantee that Microsoft won't end up coincidentally using that same value name later. This would cause a conflict between the two users of the value.  Second, we have to consider the scenario where two different ISVs both decide to use the same value. That would be bad too.  Lastly, there's no guarantee that the key in use will still exist in later versions, or that it will be writeable or readable by the ISV due to permission changes.  In addition to all these reasons, there is the common sense issue that it is just confusing. Now the ISV's software and uninstaller needs to look all over the registry, not just in their own keys.

     

    On a similar note, I also recently had a case where a "Windows Tips" blog (not created, endorsed, or run by Microsoft) suggested using a registry value that was implemented in Windows but was not documented by Microsoft.  It turns out this value wasn't thoroughly tested (because it was undocumented and wasn't intended to be used in production), and using it would cause server hangs under certain conditions.  These hangs were only discovered after a large customer decided to implement the undocumented value across their enterprise. 

     

    Here are a few tips for IT Pros, developers, and users alike:

    • Don't implement random registry settings if you can't find documentation for that setting on an official Microsoft site, like MSDN, TechNet, or support.microsoft.com(information on forums or answer boards (e.g. social.*.microsoft.com or answers.*.microsoft.com) is not official documentation).  At best these unknown registry settings they will do nothing, at worst they will cause you headaches later.
    • If a key/value isn't documented, changes to it likely are not tested, and could put your machine in a state that makes it difficult or impossible to support.
    • If you are a developer, keep any of your registry settings in your own key. Don't pollute in others' keys.
    • If an ISV or Microsoft suggests you implement a setting, make sure you understand the implications of that setting.

     

    I'll leave you with the warning displayed in many of our KBs - it's there for a reason!

     

    WARNING: If you use Registry Editor incorrectly, you may cause serious problems that may require you to reinstall your operating system. Microsoft cannot guarantee that you can solve problems that result from using Registry Editor incorrectly. Use Registry Editor at your own risk.

     

    -Matt

  • Ntdebugging Blog

    Video: Using XPERF to root cause CPU consumption

    • 4 Comments

    Hi Debuggers,

     

    Here is a short video showing how to root cause CPU consumption issues on Vista/Server 2008 and higher operating systems with Xperf.

     

    For this scenario, Xperf is an easy to implement and powerful replacement for previous actions like gathering a series of dumps during the CPU, kernrate data, perfmon, other profilers, etc.

     

    Xperf allows one to quickly turn on profiling and see where the CPU time is being spent by using tracing already in the box!

     

     

    -Tate

     

     

    Note - Feel free to post questions here, or jump into a discussion about the video on our Twitter page ( @ntdebugging ).

     

     

    Share this post :
  • Ntdebugging Blog

    x64 Manual Stack Reconstruction and Stack Walking

    • 2 Comments

    My name is Trey Nash and I am an Escalation Engineer on the Core OS team.  My experience is as a software developer, and therefore my blog posts tend to be slanted in the direction of helping developers during the feature development, testing and the support phases.

    In this installment I would like to expand a bit on a previous post of mine called Challenges of Debugging Optimized x64 Code.  In that post I discussed the nuances of the x64 calling convention (thankfully of which there is only one) and how it is used in optimized builds of software.  The calling convention is sometimes referred to as the Application Binary Interface (ABI).  In this post, I would like to discuss the x64 unwind metadata and how you can use it in the debugger to manually walk a stack.

    In some cases, you may have a corrupted stack that the debugger simply cannot effectively walk for you.  This often happens because the debugger walks a stack from the top down (assuming the stack grows upwards as if it were a stack of places on a table), and if the stack is sufficiently trashed then the debugger cannot find its bearing.  In the x86 world, a large percentage of the time, you can spot the stack frames by following the chain of base pointers and then build a crafty stack backtrace command to display the stack at some point in time.  But in the x64 calling convention there is no base pointer.  In fact, once a function’s prolog code has executed the rsp register generally never changes until the epilog code.  To read more about x64 prolog and epilog code conventions, go here.

    Moreover, the syntax for creating a crafty stack backtrace command in the x64 environment is currently undocumented, and I aim to shed some light on that near the end of that blog post. J

    The Example Code

    For this blog post I have used the following example C# code that requires the .NET 4.0 framework and can be easily built from a Visual Studio 2010 command prompt.  You can find the code below:

    using System;
    using System.Numerics;
    using System.Threading;
    using System.Threading.Tasks;
    using System.Collections.Concurrent;

    class EntryPoint
    {
        const int FactorialsToCompute = 2000;

        static void Main() {
            var numbers = new ConcurrentDictionary<BigInteger, BigInteger>(4, FactorialsToCompute);

            // Create a factorial delegate.
            Func<BigInteger, BigInteger> factorial = null;
            factorial = (n) => ( n == 0 ) ? 1 : n * factorial(n-1);

            // Now compute the factorial of the list
            // concurrently.
            Parallel.For( 0,
                          FactorialsToCompute,
                          (i) => {
                            numbers[i] = factorial(i);
                          } );
        }
    }

    The spirit of this code is to concurrently compute the first 2000 factorials and store the results in a dictionary.  This code uses the new Task Parallel Library to distribute this work evenly across the multiple cores on the system.  To compile the example (assuming the code is stored in test.cs), you can execute the following command from a Visual Studio 2010 command prompt:

    csc /r:system.numerics.dll test.cs

    Note:    If you are using a 64bit platform, be sure to use the x64 command prompt shortcut installed by the Visual Studio 2010 installer.

    You can download a free evaluation of Visual Studio 2010
    here.

    x64 Unwind Metadata

    So how does the debugger and functions such as RtlVirtualUnwind know how to walk the x64 stack if it cannot find a base pointer?  The secret is that it uses unwind metadata that is typically baked into the Portable Executable (PE) file at link time. You can inspect this information using the /UNWINDINFO option of the command line tool dumpbin.  For example, I went to the directory on my machine which contains clr.dll (c:\Windows\Microsoft.NET\Framework\v4.0.30319) and dumped the unwind info looking for CLREvent::WaitEx, which I have pasted below:

      00013F20 000DFDB0 000DFE3C 007267D8  ?WaitEx@CLREvent@@QEAAKKW4WaitMode@@PEAUPendingSync@@@Z (public: unsigned long __cdecl CLREvent::WaitEx(unsigned long,enum WaitMode,struct PendingSync *))
        Unwind version: 1
        Unwind flags: UHANDLER
        Size of prologue: 0x20
        Count of codes: 10
        Unwind codes:
          20: SAVE_NONVOL, register=rbp offset=0xB0
          1C: SAVE_NONVOL, register=rbx offset=0xA8
          0F: ALLOC_SMALL, size=0x70
          0B: PUSH_NONVOL, register=r14
          09: PUSH_NONVOL, register=r13
          07: PUSH_NONVOL, register=r12
          05: PUSH_NONVOL, register=rdi
          04: PUSH_NONVOL, register=rsi
        Handler: 0020ADF0 __CxxFrameHandler3
        EH Handler Data: 007B3F54

    I’ll get into what all of this means shortly.

    Note:    The dumpbin.exe functionality is also exposed via the linker.  For example, the command “dumpbin.exe /?” is identical to “link.exe /dump /?”.

     

    Within the debugger, you can find this same information for a particular function using the .fnent command.  To demonstrate, I executed the example code within a windbg instance and broke in at some random point and chose one of the threads to look at which has a stack looking like the following:

      12  Id: f80.7f0 Suspend: 1 Teb: 000007ff`fffa0000 Unfrozen
     # Child-SP          RetAddr           Call Site
    00 00000000`04a51e18 000007fe`fd4e10ac ntdll!NtWaitForSingleObject+0xa
    01 00000000`04a51e20 000007fe`f48bffc7 KERNELBASE!WaitForSingleObjectEx+0x79
    02 00000000`04a51ec0 000007fe`f48bff70 clr!CLREvent::WaitEx+0x170
    03 00000000`04a51f00 000007fe`f48bfe23 clr!CLREvent::WaitEx+0xf8
    04 00000000`04a51f60 000007fe`f48d51d8 clr!CLREvent::WaitEx+0x5e
    05 00000000`04a52000 000007fe`f4995249 clr!SVR::gc_heap::wait_for_gc_done+0x98
    06 00000000`04a52030 000007fe`f48aef28 clr!SVR::GCHeap::Alloc+0xb4
    07 00000000`04a520a0 000007fe`f48aecc9 clr!FastAllocatePrimitiveArray+0xc5
    08 00000000`04a52120 000007fe`f071244c clr!JIT_NewArr1+0x389
    09 00000000`04a522f0 000007fe`f07111b5 System_Numerics_ni+0x2244c
    0a 00000000`04a52330 000007ff`00150acf System_Numerics_ni+0x211b5
    0b 00000000`04a523d0 000007ff`0015098c 0x7ff`00150acf
    0c 00000000`04a52580 000007ff`0015098c 0x7ff`0015098c
    0d 00000000`04a52730 000007ff`0015098c 0x7ff`0015098c
    0e 00000000`04a528e0 000007ff`0015098c 0x7ff`0015098c
    0f 00000000`04a52a90 000007ff`0015098c 0x7ff`0015098c
    10 00000000`04a52c40 000007ff`0015098c 0x7ff`0015098c
    11 00000000`04a52df0 000007ff`0015098c 0x7ff`0015098c
    12 00000000`04a52fa0 000007ff`0015098c 0x7ff`0015098c
    13 00000000`04a53150 000007ff`0015098c 0x7ff`0015098c

    At first glance, it may appear that this stack is already trashed since there is no symbol information for the bottom frames in the display.  Before jumping to this conclusion, recall that this is a managed application and therefore contains JIT compiled code.  To verify that the addresses without symbol information are JIT’ed code, you can do a couple of things.

    First, use the !EEHeap extension in the SOS extension to determine if these addresses reside in the JIT code heap.  Below, you can see the commands I used to both load the SOS extension and then display the Execution Engine (EE) Heap information:

    0:014> .loadby sos clr

    0:014> !EEHeap -loader
    Loader Heap:
    --------------------------------------
    System Domain:     000007fef50955a0
    LowFrequencyHeap:  000007ff00020000(2000:1000) Size: 0x1000 (4096) bytes.
    HighFrequencyHeap: 000007ff00022000(8000:1000) Size: 0x1000 (4096) bytes.
    StubHeap:          000007ff0002a000(2000:2000) Size: 0x2000 (8192) bytes.
    Virtual Call Stub Heap:
      IndcellHeap:     000007ff000d0000(6000:1000) Size: 0x1000 (4096) bytes.
      LookupHeap:      000007ff000dc000(4000:1000) Size: 0x1000 (4096) bytes.
      ResolveHeap:     000007ff00106000(3a000:1000) Size: 0x1000 (4096) bytes.
      DispatchHeap:    000007ff000e0000(26000:1000) Size: 0x1000 (4096) bytes.
      CacheEntryHeap:  Size: 0x0 (0) bytes.
    Total size:        Size: 0x8000 (32768) bytes.
    --------------------------------------
    Shared Domain:     000007fef5095040
    LowFrequencyHeap:  000007ff00020000(2000:1000) Size: 0x1000 (4096) bytes.
    HighFrequencyHeap: 000007ff00022000(8000:1000) Size: 0x1000 (4096) bytes.
    StubHeap:          000007ff0002a000(2000:2000) Size: 0x2000 (8192) bytes.
    Virtual Call Stub Heap:
      IndcellHeap:     000007ff000d0000(6000:1000) Size: 0x1000 (4096) bytes.
      LookupHeap:      000007ff000dc000(4000:1000) Size: 0x1000 (4096) bytes.
      ResolveHeap:     000007ff00106000(3a000:1000) Size: 0x1000 (4096) bytes.
      DispatchHeap:    000007ff000e0000(26000:1000) Size: 0x1000 (4096) bytes.
      CacheEntryHeap:  Size: 0x0 (0) bytes.
    Total size:        Size: 0x8000 (32768) bytes.
    --------------------------------------
    Domain 1:          00000000003e73c0
    LowFrequencyHeap:  000007ff00030000(2000:1000) 000007ff00140000(10000:5000) Size: 0x6000 (24576) bytes total, 0x1000 (4096) bytes wasted.
    HighFrequencyHeap: 000007ff00032000(8000:5000) Size: 0x5000 (20480) bytes.
    StubHeap:          Size: 0x0 (0) bytes.
    Virtual Call Stub Heap:
      IndcellHeap:     000007ff00040000(4000:1000) Size: 0x1000 (4096) bytes.
      LookupHeap:      000007ff0004b000(2000:1000) Size: 0x1000 (4096) bytes.
      ResolveHeap:     000007ff0007c000(54000:1000) Size: 0x1000 (4096) bytes.
      DispatchHeap:    000007ff0004d000(2f000:1000) Size: 0x1000 (4096) bytes.
      CacheEntryHeap:  Size: 0x0 (0) bytes.
    Total size:        Size: 0xf000 (61440) bytes total, 0x1000 (4096) bytes wasted.
    --------------------------------------
    Jit code heap:
    LoaderCodeHeap:    000007ff00150000(40000:2000) Size: 0x2000 (8192) bytes.
    Total size:        Size: 0x2000 (8192) bytes.

    --------------------------------------
    Module Thunk heaps:
    Module 000007fee5581000: Size: 0x0 (0) bytes.
    Module 000007ff000330d8: Size: 0x0 (0) bytes.
    Module 000007fef06f1000: Size: 0x0 (0) bytes.
    Total size:      Size: 0x0 (0) bytes.
    --------------------------------------
    Module Lookup Table heaps:
    Module 000007fee5581000: Size: 0x0 (0) bytes.
    Module 000007ff000330d8: Size: 0x0 (0) bytes.
    Module 000007fef06f1000: Size: 0x0 (0) bytes.
    Total size:      Size: 0x0 (0) bytes.
    --------------------------------------
    Total LoaderHeap size:   Size: 0x21000 (135168) bytes total, 0x1000 (4096) bytes wasted.

    =======================================

    I have highlighted the JIT heap information and you can see that the JIT’ed code instruction pointers in the stack fall within this range.

    The second sanity check you can perform is to use a variant of the u instruction to confirm that there is a call instruction just prior to that address as shown below:

    0:012> ub 0x7ff`0015098c
    000007ff`0015095e 488b01          mov     rax,qword ptr [rcx]
    000007ff`00150961 48898424b0000000 mov     qword ptr [rsp+0B0h],rax
    000007ff`00150969 488b4108        mov     rax,qword ptr [rcx+8]
    000007ff`0015096d 48898424b8000000 mov     qword ptr [rsp+0B8h],rax
    000007ff`00150975 4c8d8424b0000000 lea     r8,[rsp+0B0h]
    000007ff`0015097d 488b5308        mov     rdx,qword ptr [rbx+8]
    000007ff`00150981 488d8c24c0000000 lea     rcx,[rsp+0C0h]
    000007ff`00150989 ff5318          call    qword ptr [rbx+18h]

    So at this point we have verified that we probably have a valid stack.  But how does the debugger so effectively walk this stack for us if there is no stack frame pointer?  The answer, of course, is that it uses the unwind information.

    To explore the answer to that question, let’s focus on a particular frame within the stack such as frame 4 in the stack above.  The code at that frame is inside the function clr!CLREvent::WaitEx, and if we pass that to .fnent, we get the following output:

    0:012> .fnent clr!CLREvent::WaitEx
    Debugger function entry 00000000`04075e40 for:
    (000007fe`f48bfdb0)   clr!CLREvent::WaitEx   |  (000007fe`f48bfe3c)   clr!CLREvent::Set
    Exact matches:
        clr!CLREvent::WaitEx = <no type information>

    BeginAddress      = 00000000`000dfdb0
    EndAddress        = 00000000`000dfe3c
    UnwindInfoAddress = 00000000`007267d8

    Unwind info at 000007fe`f4f067d8, 20 bytes
      version 1, flags 2, prolog 20, codes a
      frame reg 0, frame offs 0
      handler routine: clr!_CxxFrameHandler3 (000007fe`f49eadf0), data 7b3f54
      00: offs 20, unwind op 4, op info 5    UWOP_SAVE_NONVOL FrameOffset: b0
      02: offs 1c, unwind op 4, op info 3    UWOP_SAVE_NONVOL FrameOffset: a8
      04: offs f, unwind op 2, op info d     UWOP_ALLOC_SMALL
      05: offs b, unwind op 0, op info e     UWOP_PUSH_NONVOL
      06: offs 9, unwind op 0, op info d     UWOP_PUSH_NONVOL
      07: offs 7, unwind op 0, op info c     UWOP_PUSH_NONVOL
      08: offs 5, unwind op 0, op info 7     UWOP_PUSH_NONVOL
      09: offs 4, unwind op 0, op info 6     UWOP_PUSH_NONVOL

     Notice that this output is virtually identical to the same information provided by dumpbin using the /UNWINDINFO option.

    I have highlighted two interesting values above.  The value highlighted in green is a relative virtual address (RVA) to the unwind info that is baked into the PE file by the linker.  The value highlighted in yellow is the actual virtual address of the unwind info and can be computed by adding the module base address shown below to the RVA for UnwindInfoAddress.

    0:012> lmnm clr

    start             end                 module name

    000007fe`f47e0000 000007fe`f5145000   clr

    By examining the PE header using !dh you can confirm that the unwind information resides in the .rdata section of the module, which I have shown below:

    0:012> !dh clr

    File Type: DLL
    FILE HEADER VALUES
        8664 machine (X64)
           6 number of sections
    4BA21EEB time date stamp Thu Mar 18 07:39:07 2010

    <snip>
    SECTION HEADER #2
      .rdata name
      1FC8EC virtual size
      67F000 virtual address
      1FCA00 size of raw data
      67E200 file pointer to raw data
           0 file pointer to relocation table
           0 file pointer to line numbers
           0 number of relocations
           0 number of line numbers
    40000040 flags
             Initialized Data
             (no align specified)
             Read Only
    <snip>

    Using the Unwind Info

    Now let’s take a look at the unwind info and compare it to the prolog code of the function with which it is associated.  For convenience, I have reprinted the .fnent output for the function:

    0:012> .fnent clr!CLREvent::WaitEx
    Debugger function entry 00000000`04075e40 for:
    (000007fe`f48bfdb0)   clr!CLREvent::WaitEx   |  (000007fe`f48bfe3c)   clr!CLREvent::Set
    Exact matches:
        clr!CLREvent::WaitEx = <no type information>

    BeginAddress      = 00000000`000dfdb0
    EndAddress        = 00000000`000dfe3c
    UnwindInfoAddress = 00000000`007267d8

    Unwind info at 000007fe`f4f067d8, 20 bytes
      version 1, flags 2, prolog 20, codes a
      frame reg 0, frame offs 0
      handler routine: clr!_CxxFrameHandler3 (000007fe`f49eadf0), data 7b3f54
      00: offs 20, unwind op 4, op info 5    UWOP_SAVE_NONVOL FrameOffset: b0
      02: offs 1c, unwind op 4, op info 3    UWOP_SAVE_NONVOL FrameOffset: a8
      04: offs f, unwind op 2, op info d     UWOP_ALLOC_SMALL
      05: offs b, unwind op 0, op info e     UWOP_PUSH_NONVOL
      06: offs 9, unwind op 0, op info d     UWOP_PUSH_NONVOL
      07: offs 7, unwind op 0, op info c     UWOP_PUSH_NONVOL
      08: offs 5, unwind op 0, op info 7     UWOP_PUSH_NONVOL
      09: offs 4, unwind op 0, op info 6     UWOP_PUSH_NONVOL

    The yellow highlighted value tells us that the prolog code for the function is 0x20 bytes in length. Using that information we can dump out the prolog code for the function:

    0:012> u clr!CLREvent::WaitEx clr!CLREvent::WaitEx+20
    clr!CLREvent::WaitEx:
    000007fe`f48bfdb0 488bc4          mov     rax,rsp
    000007fe`f48bfdb3 56              push    rsi
    000007fe`f48bfdb4 57              push    rdi
    000007fe`f48bfdb5 4154            push    r12
    000007fe`f48bfdb7 4155            push    r13
    000007fe`f48bfdb9 4156            push    r14
    000007fe`f48bfdbb 4883ec70        sub     rsp,70h
    000007fe`f48bfdbf 48c7442440feffffff mov   qword ptr [rsp+40h],0FFFFFFFFFFFFFFFEh
    000007fe`f48bfdc8 48895810        mov     qword ptr [rax+10h],rbx
    000007fe`f48bfdcc 48896818        mov     qword ptr [rax+18h],rbp

    The list of operations in the unwind info is listed in the reverse order of the operations in the assembly code.  Each of the UWOP_PUSH_NONVOL operations in the unwind info maps to a nonvolatile register that is pushed onto the stack for safe keeping in the prolog code.  I have highlighted the sections within the prolog and the .fnent output such that highlighting with like colors indicates related information.  Now, let’s take a look at the raw stack and tie all of this information together. 

    Below is the stack with the frame we are focusing on highlighted in yellow:

    0:012> kn
     # Child-SP          RetAddr           Call Site
    00 00000000`04a51e18 000007fe`fd4e10ac ntdll!NtWaitForSingleObject+0xa
    01 00000000`04a51e20 000007fe`f48bffc7 KERNELBASE!WaitForSingleObjectEx+0x79
    02 00000000`04a51ec0 000007fe`f48bff70 clr!CLREvent::WaitEx+0x170
    03 00000000`04a51f00 000007fe`f48bfe23 clr!CLREvent::WaitEx+0xf8
    04 00000000`04a51f60 000007fe`f48d51d8 clr!CLREvent::WaitEx+0x5e
    05 00000000`04a52000 000007fe`f4995249 clr!SVR::gc_heap::wait_for_gc_done+0x98
    06 00000000`04a52030 000007fe`f48aef28 clr!SVR::GCHeap::Alloc+0xb4
    07 00000000`04a520a0 000007fe`f48aecc9 clr!FastAllocatePrimitiveArray+0xc5
    08 00000000`04a52120 000007fe`f071244c clr!JIT_NewArr1+0x389
    09 00000000`04a522f0 000007fe`f07111b5 System_Numerics_ni+0x2244c
    0a 00000000`04a52330 000007ff`00150acf System_Numerics_ni+0x211b5
    0b 00000000`04a523d0 000007ff`0015098c 0x7ff`00150acf
    0c 00000000`04a52580 000007ff`0015098c 0x7ff`0015098c
    0d 00000000`04a52730 000007ff`0015098c 0x7ff`0015098c
    0e 00000000`04a528e0 000007ff`0015098c 0x7ff`0015098c
    0f 00000000`04a52a90 000007ff`0015098c 0x7ff`0015098c
    10 00000000`04a52c40 000007ff`0015098c 0x7ff`0015098c
    11 00000000`04a52df0 000007ff`0015098c 0x7ff`0015098c
    12 00000000`04a52fa0 000007ff`0015098c 0x7ff`0015098c
    13 00000000`04a53150 000007ff`0015098c 0x7ff`0015098c

    Note:    The symbols above look a little weird and may lead you to believe that WaitEx is calling itself recursively, but it is not.  It only appears that way because you need the private symbols for clr.dll to be able to see the real function name.  Only public symbols are available outside of Microsoft.

     

    And below is the raw stack relevant to this frame with some highlighting and annotations that I have added:

    0:012> dps 00000000`04a51f60-10 L20
    00000000`04a51f50  00000000`00000001
    00000000`04a51f58  000007fe`f48bfe23 clr!CLREvent::WaitEx+0x5e
    00000000`04a51f60  00000000`c0402388
    00000000`04a51f68  00000000`c0402500
    00000000`04a51f70  000007fe`f48afaf0 clr!SystemNative::ArrayCopy
    00000000`04a51f78  00000000`00000182
    00000000`04a51f80  00000000`04a521d0
    00000000`04a51f88  000007fe`00000001
    00000000`04a51f90  00000000`00000057
    00000000`04a51f98  00000000`c0402398
    00000000`04a51fa0  ffffffff`fffffffe
    00000000`04a51fa8  007f0000`04a521d0
    00000000`04a51fb0  fffff880`009ca540
    00000000`04a51fb8  000007fe`f483da5b clr!SVR::heap_select::select_heap+0x1c
    00000000`04a51fc0  fffff880`009ca540
    00000000`04a51fc8  000007fe`fd4e18aa KERNELBASE!ResetEvent+0xa
    00000000`04a51fd0  00000000`0043dc60
    00000000`04a51fd8  00000000`00000178
    00000000`04a51fe0  00000000`00493c10
    00000000`04a51fe8  00000000`0043dc60    
    ß saved rdi
    00000000`04a51ff0  00000000`00000001

       *** call into clr!CLREvent::WaitEx ***

    00000000`04a51ff8  000007fe`f48d51d8 clr!SVR::gc_heap::wait_for_gc_done+0x98
    00000000`04a52000  00000000`00493ba0
    00000000`04a52008  00000000`00493ba0    
    ß saved rbx
    00000000`04a52010  00000000`00000058    
    ß saved rbp
    00000000`04a52018  000007fe`f0711e0f System_Numerics_ni+0x21e0f
    00000000`04a52020  00000000`00000178
    00000000`04a52028  000007fe`f4995249 clr!SVR::GCHeap::Alloc+0xb4
    00000000`04a52030  00000000`0043a140
    00000000`04a52038  00000000`0043dc60
    00000000`04a52040  00000000`00000000
    00000000`04a52048  00000000`04a522e0

    In the stack listing I have used the same color highlighting scheme as before to show how the data on the raw stack correlates to the unwind data.  And, using green highlighting, I have shown how the Child-SP value correlates to the stack frame.

    The cyan highlighting represents nonvolatile registers that are pushed onto the stack in the prolog code.  The blue highlighting represents stack space reserved for locals and for register home space allocated for calling sub routines.  In the unwind data the stack reservation is represented by a UWOP_ALLOC_SMALL operation.  And the red highlighting represents nonvolatile registers that are stored in the home space of the previous stack frame and represented by a UWOP_SAVE_NONVOL operation stored in the unwind information.

    As you can see, we have all of the information we need in the unwind data to determine which slots on the stack are used for what.  The only thing we don’t know is the partitioning of the reserved stack space for locals, which is described by the private symbol information for the clr.dll module.

    Tying it all Together

    .fnent produces its output directly from parsing the definition of the UNWIND_INFO structure and it even gives you the address of where that structure lives in memory.  The UNWIND_INFO structure also contains a variable amount of UNWIND_CODE structures.  You can find details of the structure definitions for UNWIND_INFO and UNWIND_CODE here.  Each parsed line of unwind information in the .fnent output is backed by at least one of these structures.  In fact, you can see the correlation between the structure fields for UNWIND_INFO and the data in the .fnent output as shown below:

    From UNWIND_CODE:

    UBYTE

    Offset in prolog

    UBYTE: 4

    Unwind operation code

    UBYTE: 4

    Operation info

    From .fnent:

      05: offs b, unwind op 0, op info e     UWOP_PUSH_NONVOL

    The meaning of the OpInfo (operation info) field is dependent on the UnwindOp (unwind operation code) field and is spelled out in the documentation for UNWIND_CODE.  For example, for the UWOP_PUSH_NONVOL operation shown above, the OpInfo field is an index into the following table, which indicates which nonvolatile register this push is associated with.  Note that the values in the below table are in decimal, while the .fnent values are in hex:

    0

    RAX

    1

    RCX

    2

    RDX

    3

    RBX

    4

    RSP

    5

    RBP

    6

    RSI

    7

    RDI

    8 to 15

    R8 to R15

    Therefore, the previous line from the .fnent output represents a push operation for the r14 register (05: offs b, unwind op 0, op info e   UWOP_PUSH_NONVOL).  Looking at the assembly above, we see that the topmost UWOP_PUSH_NONVOL operation in the .fnent output correlates to the last nonvolatile register push in the prolog code (push    r14).

    Note:    Remember, the push operations in the .fnent output are listed in the reverse order of where they are in the actual prolog code.  This helps the unwind code easily calculate offsets of where they should live in the stack.

     

    One thing that you will notice in the x64 calling convention is that once the prolog code has executed, the value for rsp will very rarely change.  The Child-SP value in the stack displayed by the k commands is the value of rsp for that frame after the prolog code has executed.  So the offsets to access these nonvolatile registers are then applied to the Child-SP value (previously highlighted in green) to find where they live on the stack.  So, in a way, the Child-SP value acts like the base pointer we are used to on the x86 platform.

    In the .fnent output above, you will also see the following:

      00: offs 20, unwind op 4, op info 5    UWOP_SAVE_NONVOL FrameOffset: b0

    For UWOP_SAVE_NONVOL, you see that the .fnent output shows us the offset where we can find this register, and the register in question is represented by the OpInfo value that equates to rbp.  The offset above is applied to the Child-SP value (00000000`04a51f60 in this case) to produce the address 00000000`04a52010, which indicates that’s where we can find a saved copy of rbp.  I have also annotated where it lives in the raw stack output shown previously.

    Note:    If you’re wondering why rbp is stored in the previous stack frame, check out my previous post on this topic where I describe how in optimized builds, the compiler can use the home space from the previous stack frame to save nonvolatile registers thus saving them with a MOV operation as opposed to a PUSH operation.  This is possible because in optimized builds the home space is not necessarily used to store parameters.

     

    So how does all of This Work for CLR JIT Code?

    If you have asked this question, then you are definitely paying attention!  As we have shown, the compiler and linker are responsible for placing unwind info in the Portable Executable file at build time.  But what about dynamic code that is generated at runtime?  Certainly there must be unwind information for dynamically compiled code as well, otherwise there would be no way to walk the stack or unwind the stack after an exception.

    As it turns out, APIs exist for this very situation, including RtlAddFunctionTable  and RtlInstallFunctionTableCallback.  In fact, the CLR uses RtlInstallFunctionTableCallback.  The generated unwind information is then rooted in a linked list where the head is at ntdll!RtlpDynamicFunctionTable.  The format of the linked list items is undocumented as it is an implementation detail, but using dbghelp.dll you can find the unwind information for a given instruction pointer if you so desire by calling SymFunctionTableAccess64.

    In fact, if you want to see the CLR adding dynamic unwind info in action you can run the test code above under the debugger, and then at the initial breakpoint, before the application starts running, set the following breakpoint:

    bu ntdll!RtlInstallFunctionTableCallback

    When you let the application run you should then end up with a call stack at the breakpoint that looks like the following, which clearly shows the JIT compiler adding the unwind info to the table dynamically:

    0:000> kn
     # Child-SP          RetAddr           Call Site
    00 00000000`0017dca8 000007fe`f4832cc6 ntdll!RtlInstallFunctionTableCallback
    01 00000000`0017dcb0 000007fe`f4831422 clr!InstallEEFunctionTable+0x77
    02 00000000`0017df60 000007fe`f4828ca8 clr!StubLinker::EmitUnwindInfo+0x492
    03 00000000`0017e050 000007fe`f4832c1a clr!StubLinker::EmitStub+0xe8
    04 00000000`0017e0b0 000007fe`f48328e5 clr!StubLinker::LinkInterceptor+0x1ea
    05 00000000`0017e160 000007fe`f4831e40 clr!CTPMethodTable::CreateStubForNonVirtualMethod+0xa35
    06 00000000`0017e300 000007fe`f4832926 clr!CRemotingServices::GetStubForNonVirtualMethod+0x50
    07 00000000`0017e3c0 000007fe`f48223f3 clr!MethodDesc::DoPrestub+0x38b
    08 00000000`0017e4d0 000007fe`f47e2d07 clr!PreStubWorker+0x1df
    09 00000000`0017e590 000007fe`f48210b4 clr!ThePreStubAMD64+0x87
    0a 00000000`0017e660 000007fe`f48211c9 clr!CallDescrWorker+0x84
    0b 00000000`0017e6d0 000007fe`f4821245 clr!CallDescrWorkerWithHandler+0xa9
    0c 00000000`0017e750 000007fe`f4823cf1 clr!MethodDesc::CallDescr+0x2a1
    0d 00000000`0017e9b0 000007fe`f49cdc3d clr!MethodDescCallSite::Call+0x35
    0e 00000000`0017e9f0 000007fe`f4999f0d clr!AppDomain::InitializeDomainContext+0x1ac
    0f 00000000`0017ebf0 000007fe`f49212a1 clr!SystemDomain::InitializeDefaultDomain+0x13d
    10 00000000`0017f0c0 000007fe`f4923dd6 clr!SystemDomain::ExecuteMainMethod+0x191
    11 00000000`0017f670 000007fe`f4923cf3 clr!ExecuteEXE+0x43
    12 00000000`0017f6d0 000007fe`f49a7365 clr!CorExeMainInternal+0xc4
    13 00000000`0017f740 000007fe`f8ad3309 clr!CorExeMain+0x15

    But there is one more wrinkle to this picture.  We now know that by using RtlInstallFunctionTableCallback the CLR, or any other JIT engine, can register a callback that provides the unwind information at runtime.  But how does the debugger access this information?  When the debugger is broken into the process or if you are debugging a dump, it cannot execute the callback function registered with RtlInstallFunctionTableCallback.

    This is where the sixth and final parameter to RtlInstallFunctionTableCallback comes into play.  By providing the OutOfProcessCallbackDll parameter, the CLR is providing a dll which the debugger can use to effectively parse through the JITer’s unwind information statically.  When inspecting which path the CLR passes for OutOfProcessCallbackDll on my machine, I see the following string:

    0:000> du /c 80 000007fe`f5916160
    000007fe`f5916160  "C:\Windows\Microsoft.NET\Framework64\v4.0.30319\mscordacwks.dll"

    So, the debugger uses mscordacwks.dll to statically examine the unwind info while the process is broken in the debugger or while inspecting a dump.

    Note:    This is one of the many reasons why you must have a complete process dump to effectively post-mortem debug managed applications.

    Using the ‘k =’ Command to Dump the Stack

    If you look at the documentation for the k command, you’ll see that there is a way to override the base pointer when walking the stack.  However, the documentation leaves it a complete mystery as to how to apply this in the x64 world.  To demonstrate what I mean, consider the following stack from earlier:

    0:012> kn
     # Child-SP          RetAddr           Call Site
    00 00000000`04a51e18 000007fe`fd4e10ac ntdll!NtWaitForSingleObject+0xa
    01 00000000`04a51e20 000007fe`f48bffc7 KERNELBASE!WaitForSingleObjectEx+0x79

    02 00000000`04a51ec0 000007fe`f48bff70 clr!CLREvent::WaitEx+0x170
    03 00000000`04a51f00 000007fe`f48bfe23 clr!CLREvent::WaitEx+0xf8
    04 00000000`04a51f60 000007fe`f48d51d8 clr!CLREvent::WaitEx+0x5e
    05 00000000`04a52000 000007fe`f4995249 clr!SVR::gc_heap::wait_for_gc_done+0x98
    06 00000000`04a52030 000007fe`f48aef28 clr!SVR::GCHeap::Alloc+0xb4
    07 00000000`04a520a0 000007fe`f48aecc9 clr!FastAllocatePrimitiveArray+0xc5
    08 00000000`04a52120 000007fe`f071244c clr!JIT_NewArr1+0x389
    09 00000000`04a522f0 000007fe`f07111b5 System_Numerics_ni+0x2244c
    0a 00000000`04a52330 000007ff`00150acf System_Numerics_ni+0x211b5
    0b 00000000`04a523d0 000007ff`0015098c 0x7ff`00150acf
    0c 00000000`04a52580 000007ff`0015098c 0x7ff`0015098c
    0d 00000000`04a52730 000007ff`0015098c 0x7ff`0015098c
    0e 00000000`04a528e0 000007ff`0015098c 0x7ff`0015098c
    0f 00000000`04a52a90 000007ff`0015098c 0x7ff`0015098c
    10 00000000`04a52c40 000007ff`0015098c 0x7ff`0015098c
    11 00000000`04a52df0 000007ff`0015098c 0x7ff`0015098c
    12 00000000`04a52fa0 000007ff`0015098c 0x7ff`0015098c
    13 00000000`04a53150 000007ff`0015098c 0x7ff`0015098c

    Now, imagine the top of the stack is corrupted, which I have “simulated” by blacking out the top few frames in this stack dump.  Furthermore, let’s assume that we identified a frame where the stack starts to look sane again by looking at the raw stack below:

    0:012> dps 00000000`04a51e90 
    00000000`04a51e90  00000000`00000000
    00000000`04a51e98  00000000`04a52130
    00000000`04a51ea0  00000000`ffffffff
    00000000`04a51ea8  00000000`ffffffff
    00000000`04a51eb0  00000000`00000108
    00000000`04a51eb8  000007fe`f48bffc7 clr!CLREvent::WaitEx+0x170
    00000000`04a51ec0  00000000`00000000
    00000000`04a51ec8  00000000`00000108
    00000000`04a51ed0  000007fe`00000000
    00000000`04a51ed8  00000000`00000108
    00000000`04a51ee0  ffffffff`fffffffe
    00000000`04a51ee8  00000000`00000001
    00000000`04a51ef0  00000000`00000000
    00000000`04a51ef8  000007fe`f48bff70 clr!CLREvent::WaitEx+0xf8
    00000000`04a51f00  00000000`00000000
    00000000`04a51f08  00000000`00493ba0

    From looking at this stack, we can see the typical pattern of stack frames because the return addresses resolve to symbols of sorts.

    To dump out the corrupted stack, here is the undocumented syntax for the x64 platform:

    k = <rsp> <rip> <frame_count>

    <rsp> is the stack pointer to start with.  You want to use the stack pointer that would have been in rsp when that function was active.  Remember, typically rsp does not change after the function prolog code completes.  Therefore, if you pick the stack pointer just below the return address, you should be good.

    <rip> should be an instruction pointer from within the function that was executing at the time the <rsp> value above was in play.  In this case, the return address directly above <rsp> comes from that function and I have highlighted it in green.  This piece of information is critical so that the debugger can find the unwind metadata for the function that was current at this point in the stack.  Without it, the debugger cannot walk the stack.

    Armed with this information, you can construct a k command to dump the stack starting from this frame as shown below:

    0:012> kn = 00000000`04a51ec0 000007fe`f48bffc7 10
     # Child-SP          RetAddr           Call Site
    00 00000000`04a51ec0 000007fe`f48bff70 clr!CLREvent::WaitEx+0x170
    01 00000000`04a51f00 000007fe`f48bfe23 clr!CLREvent::WaitEx+0xf8
    02 00000000`04a51f60 000007fe`f48d51d8 clr!CLREvent::WaitEx+0x5e
    03 00000000`04a52000 000007fe`f4995249 clr!SVR::gc_heap::wait_for_gc_done+0x98
    04 00000000`04a52030 000007fe`f48aef28 clr!SVR::GCHeap::Alloc+0xb4
    05 00000000`04a520a0 000007fe`f48aecc9 clr!FastAllocatePrimitiveArray+0xc5
    06 00000000`04a52120 000007fe`f071244c clr!JIT_NewArr1+0x389
    07 00000000`04a522f0 000007fe`f07111b5 System_Numerics_ni+0x2244c
    08 00000000`04a52330 000007ff`00150acf System_Numerics_ni+0x211b5
    09 00000000`04a523d0 000007ff`0015098c 0x7ff`00150acf
    0a 00000000`04a52580 000007ff`0015098c 0x7ff`0015098c
    0b 00000000`04a52730 000007ff`0015098c 0x7ff`0015098c
    0c 00000000`04a528e0 000007ff`0015098c 0x7ff`0015098c
    0d 00000000`04a52a90 000007ff`0015098c 0x7ff`0015098c
    0e 00000000`04a52c40 000007ff`0015098c 0x7ff`0015098c
    0f 00000000`04a52df0 000007ff`0015098c 0x7ff`0015098c

    Note:    The frame count in the above k expression is required.  That is the way the debugger engine distinguishes between this variant of the command (with an overridden rip) and the documented form of k that does not provide an overridden rip.

    Conclusion

    Since the x64 calling convention does not utilize a base pointer (among other things), we need some extra information to effectively walk the stack.  That extra information comes in the form of unwind metadata and is generated by the compiler and linker for static code and baked into the portable executable file.  If you happen to code in assembly language, there are various macros that you must use to decorate your assembly code so that the assembler can generate the proper unwind metadata.  For dynamically compiled code, that information is instead provided at runtime by registering a callback with the system.  Knowing this information is critical if you encounter a corrupted stack and must piece it together manually.  In such situations you’ll need to know how to dig out the unwind metadata manually and use it to effectively reconstruct the call stack.

    That said, you could spare yourself some effort and use the undocumented variant of the k command described above to dump the stack starting at any frame. J

    Happy debugging everyone!

    "The example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious.  No association with any real company, organization, product, domain name, email address, logo, person, places, or events is intended or should be inferred."

Page 4 of 24 (235 items) «23456»