• Ntdebugging Blog

    Debug Sleuth at Work: Hung Server…..Mystery of the unprocessed SMB work item.


    Hello folks, Pushkar here. I recently worked on a case where the server was hung at “Applying User Settings” during the logon phase. You might ask what’s going to be new in this post, NTDebugging has bunch of posts covering such debug scenarios J. In my defense, this case was particularly interesting because it defied the general logic that I had about processing of network connection request. I am still a newbie in this area getting my feet wet in the arena of debugging. I had to seek help from one of the GES guru Aman Jain. He is an expert in the Windows Networking Stack. He helped me in confirming my suspicion.


    As we were stuck at “Applying User Settings”, the obvious path was to look at the activity of Winlogon to understand why is it stuck? I could see that there were seven logon requests which haven’t been processed yet.


    1: kd> !session

    Sessions on machine: 7

    Valid Sessions: 0 1 2 3 4 5 6


    1: kd> !process 0 0 winlogon.exe

    PROCESS fffffa806e132c10

        SessionId: 1  Cid: 0384    Peb: 7fffffdf000  ParentCid: 0298

        DirBase: db59d000  ObjectTable: fffff88007fd0430  HandleCount: 139.

        Image: winlogon.exe



    The above command returns 7 instances of Winlogon.exe, one instance per session. I picked one of the sessions as all the 7 sessions were in the same state.


    As this was a Windows 2008 Server, winlogon would open an LPC Channel with Logonui.exe to process the user’s logon and then it will call into Local Session Manager (LSM.EXE) to spawn the users’ session. I could see that winlogon is waiting on message reply from Logonui.exe.


    1: kd> !thread fffffa80`73d3d770

    THREAD fffffa8073d3d770  Cid 0384.1e2c  Teb: 000007fffffd7000 Win32Thread: 0000000000000000 WAIT: (WrLpcReply) UserMode Non-Alertable

        fffffa8073d3db00  Semaphore Limit 0x1

    Waiting for reply to ALPC Message fffff8801ffd9cf0 : queued at port fffffa804e9d9370 : owned by process fffffa806e134c10

    Not impersonating

    DeviceMap                 fffff880000073a0

    Owning Process            fffffa806e132c10       Image:         winlogon.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      4192487        Ticks: 1882 (0:00:00:29.359)

    Context Switch Count      2            

    UserTime                  00:00:00.000

    KernelTime                00:00:00.000

    Win32 Start Address 0x0000000076f67cb0

    Stack Init fffffa60110a9db0 Current fffffa60110a9670

    Base fffffa60110aa000 Limit fffffa60110a4000 Call 0

    Priority 13 BasePriority 13 PriorityDecrement 0 IoPriority 2 PagePriority 5

    Child-SP          RetAddr           Call Site

    fffffa60`110a96b0 fffff800`01ebb6fa nt!KiSwapContext+0x7f

    fffffa60`110a97f0 fffff800`01eb035b nt!KiSwapThread+0x13a

    fffffa60`110a9860 fffff800`01ee54e2 nt!KeWaitForSingleObject+0x2cb

    fffffa60`110a98f0 fffff800`0212ef34 nt!AlpcpSignalAndWait+0x92

    fffffa60`110a9980 fffff800`0212b9c6 nt!AlpcpReceiveSynchronousReply+0x44

    fffffa60`110a99e0 fffff800`0211b52f nt!AlpcpProcessSynchronousRequest+0x24f

    fffffa60`110a9b00 fffff800`01eb8ef3 nt!NtAlpcSendWaitReceivePort+0x19f

    fffffa60`110a9bb0 00000000`76f9756a nt!KiSystemServiceCopyEnd+0x13

    00000000`01f8f098 00000000`00000000 0x76f9756a


    1: kd> !alpc /m fffff8801ffd9cf0


    Message @ fffff8801ffd9cf0

      MessageID             : 0x0464 (1124)

      CallbackID            : 0x2844F0 (2639088)

      SequenceNumber        : 0x0000000B (11)

      Type                  : LPC_REQUEST

      DataLength            : 0x0064 (100)

      TotalLength           : 0x008C (140)

      Canceled              : No

      Release               : No

      ReplyWaitReply        : No

      Continuation          : Yes

      OwnerPort             : fffffa80481ba9b0 [ALPC_CLIENT_COMMUNICATION_PORT]

      WaitingThread         : fffffa8073d3d770

      QueueType             : ALPC_MSGQUEUE_PENDING

      QueuePort             : fffffa804e9d9370 [ALPC_CONNECTION_PORT]

      QueuePortOwnerProcess : fffffa806e134c10 (LogonUI.exe)

      ServerThread          : fffffa804f96d5a0                  <----------- LPC Message Server Thread

      QuotaCharged          : No

      CancelQueuePort       : 0000000000000000

      CancelSequencePort    : 0000000000000000

      CancelSequenceNumber  : 0x00000000 (0)

      ClientContext         : 0000000000243d00

      ServerContext         : 0000000000000000

      PortContext           : 0000000001c27600

      CancelPortContext     : 0000000000000000

      SecurityData          : 0000000000000000

      View                  : 0000000000000000


    1: kd> !thread fffffa804f96d5a0

    THREAD fffffa804f96d5a0  Cid 01ac.09fc  Teb: 000007fffffd7000 Win32Thread: fffff900c067dd50 WAIT: (UserRequest) UserMode Non-Alertable

        fffffa80516be920  NotificationEvent

    Not impersonating

    DeviceMap                 fffff880000073a0

    Owning Process            fffffa806e134c10       Image:         LogonUI.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      4192487        Ticks: 1882 (0:00:00:29.359)

    Context Switch Count      13                 LargeStack

    UserTime                  00:00:00.000

    KernelTime                00:00:00.000

    Win32 Start Address 0x000007fefed47780

    Stack Init fffffa60114e4db0 Current fffffa60114e4940

    Base fffffa60114e5000 Limit fffffa60114df000 Call 0

    Priority 14 BasePriority 13 PriorityDecrement 0 IoPriority 2 PagePriority 5

    Child-SP          RetAddr           Call Site

    fffffa60`114e4980 fffff800`01ebb6fa nt!KiSwapContext+0x7f

    fffffa60`114e4ac0 fffff800`01eb035b nt!KiSwapThread+0x13a

    fffffa60`114e4b30 fffff800`02119608 nt!KeWaitForSingleObject+0x2cb

    fffffa60`114e4bc0 fffff800`01eb8ef3 nt!NtWaitForSingleObject+0x98

    fffffa60`114e4c20 00000000`76f96d5a nt!KiSystemServiceCopyEnd+0x13

    00000000`0444ef18 00000000`00000000 0x76f96d5a


    This server thread is processing a message, but we can’t tell what it is waiting for. If this thread was finished processing a message we would have seen “WAIT: (WrLpcReceive)” for the thread wait reason. Often this type of wait is for a critical section or other synchronization object that is waiting for another thread in the process. This begged an investigation of Logonui.exe, to see what the other threads were doing.


    I noticed that another Logonui.exe thread is waiting on LPC Message processing by a thread in LSM.EXE.


    1: kd> !thread fffffa80`4e9e5060

    THREAD fffffa804e9e5060  Cid 01ac.0438  Teb: 000007fffffd9000 Win32Thread: fffff900c06db2e0 WAIT: (WrLpcReply) UserMode Non-Alertable

        fffffa804e9e53f0  Semaphore Limit 0x1

    Waiting for reply to ALPC Message fffff8801ffaacf0 : queued at port fffffa804e9b6060 : owned by process fffffa806e130c10

    Not impersonating

    DeviceMap                 fffff880000073a0

    Owning Process            fffffa806e134c10       Image:         LogonUI.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      4192488        Ticks: 1881 (0:00:00:29.343)

    Context Switch Count      165                 LargeStack

    UserTime                  00:00:00.000

    KernelTime                00:00:00.000

    Win32 Start Address 0x000007fefbaa151c

    Stack Init fffffa6008d26db0 Current fffffa6008d26670

    Base fffffa6008d27000 Limit fffffa6008d20000 Call 0

    Priority 15 BasePriority 13 PriorityDecrement 0 IoPriority 2 PagePriority 5

    Child-SP          RetAddr           Call Site

    fffffa60`08d266b0 fffff800`01ebb6fa nt!KiSwapContext+0x7f

    fffffa60`08d267f0 fffff800`01eb035b nt!KiSwapThread+0x13a

    fffffa60`08d26860 fffff800`01ee54e2 nt!KeWaitForSingleObject+0x2cb

    fffffa60`08d268f0 fffff800`0212ef34 nt!AlpcpSignalAndWait+0x92

    fffffa60`08d26980 fffff800`0212b9c6 nt!AlpcpReceiveSynchronousReply+0x44

    fffffa60`08d269e0 fffff800`0211b52f nt!AlpcpProcessSynchronousRequest+0x24f

    fffffa60`08d26b00 fffff800`01eb8ef3 nt!NtAlpcSendWaitReceivePort+0x19f

    fffffa60`08d26bb0 00000000`76f9756a nt!KiSystemServiceCopyEnd+0x13

    00000000`0272ea78 00000000`00000000 0x76f9756a


    At this point I decided to follow the LPC message chain trail to see where it ends and who is ultimately blocking it.


    1: kd> !alpc /m fffff8801ffaacf0


    Message @ fffff8801ffaacf0

      MessageID             : 0x0474 (1140)

      CallbackID            : 0x2844F6 (2639094)

      SequenceNumber        : 0x0000000D (13)

      Type                  : LPC_REQUEST

      DataLength            : 0x0058 (88)

      TotalLength           : 0x0080 (128)

      Canceled              : No

      Release               : No

      ReplyWaitReply        : No

      Continuation          : Yes

      OwnerPort             : fffffa804ea03c80 [ALPC_CLIENT_COMMUNICATION_PORT]

      WaitingThread         : fffffa804e9e5060

      QueueType             : ALPC_MSGQUEUE_PENDING

      QueuePort             : fffffa804e9b6060 [ALPC_CONNECTION_PORT]

      QueuePortOwnerProcess : fffffa806e130c10 (lsm.exe)

      ServerThread          : fffffa805180c700            <---- LPC Server thread for this message in LSM.EXE

      QuotaCharged          : No

      CancelQueuePort       : 0000000000000000

      CancelSequencePort    : 0000000000000000

      CancelSequenceNumber  : 0x00000000 (0)

      ClientContext         : 0000000001cf0e90

      ServerContext         : 0000000000000000

      PortContext           : 0000000001dc9df0

      CancelPortContext     : 0000000000000000

      SecurityData          : 0000000000000000

      View                  : 0000000000000000


    I noticed that the server thread which is supposed to process the LPC message for above thread is itself waiting on a message reply from a thread in Svchost.exe. LSM had bunch of other threads which were waiting on LPC message processing from server threads in Svchost.exe. I dumped out all the threads and their LPC message structure to locate the server threads in Svchost.exe which seems to be doing some activity and found that only for the thread below the server thread had some interesting information as other LPC Server threads in svchost.exe were Notification Timers.


    1: kd> !thread fffffa80`50910060

    THREAD fffffa8050910060  Cid 02e0.1a70  Teb: 000007fffffda000 Win32Thread: 0000000000000000 WAIT: (WrLpcReply) UserMode Non-Alertable

        fffffa80509103f0  Semaphore Limit 0x1

    Waiting for reply to ALPC Message fffff8801fe00cf0 : queued at port fffffa804e876b60 : owned by process fffffa804ea4fc10

    Not impersonating

    DeviceMap                 fffff880000073a0

    Owning Process            fffffa806e130c10       Image:         lsm.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      3187894        Ticks: 1006475 (0:04:21:41.110)

    Context Switch Count      97            

    UserTime                  00:00:00.000

    KernelTime                00:00:00.000

    Win32 Start Address 0x000007fefed47780

    Stack Init fffffa60125b1db0 Current fffffa60125b1670

    Base fffffa60125b2000 Limit fffffa60125ac000 Call 0

    Priority 9 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5

    Kernel stack not resident.

    Child-SP          RetAddr           Call Site

    fffffa60`125b16b0 fffff800`01ebb6fa nt!KiSwapContext+0x7f

    fffffa60`125b17f0 fffff800`01eb035b nt!KiSwapThread+0x13a

    fffffa60`125b1860 fffff800`01ee54e2 nt!KeWaitForSingleObject+0x2cb

    fffffa60`125b18f0 fffff800`0212ef34 nt!AlpcpSignalAndWait+0x92

    fffffa60`125b1980 fffff800`0212b9c6 nt!AlpcpReceiveSynchronousReply+0x44

    fffffa60`125b19e0 fffff800`0211b52f nt!AlpcpProcessSynchronousRequest+0x24f

    fffffa60`125b1b00 fffff800`01eb8ef3 nt!NtAlpcSendWaitReceivePort+0x19f

    fffffa60`125b1bb0 00000000`76f9756a nt!KiSystemServiceCopyEnd+0x13

    00000000`0205e538 00000000`00000000 0x76f9756a


    I looked at the server thread for this LPC message and that was also waiting on a notification event doing nothing. It again begged an investigation of the state of Svchost.exe to find out why the server thread hasn’t replied yet?


    1: kd> !alpc /m fffff8801fe00cf0


    Message @ fffff8801fe00cf0

      MessageID             : 0x03C4 (964)

      CallbackID            : 0x2140C2 (2179266)

      SequenceNumber        : 0x00000021 (33)

      Type                  : LPC_REQUEST

      DataLength            : 0x00C0 (192)

      TotalLength           : 0x00E8 (232)

      Canceled              : No

      Release               : No

      ReplyWaitReply        : No

      Continuation          : Yes

      OwnerPort             : fffffa804ef15d30 [ALPC_CLIENT_COMMUNICATION_PORT]

      WaitingThread         : fffffa8050910060

      QueueType             : ALPC_MSGQUEUE_PENDING

      QueuePort             : fffffa804e876b60 [ALPC_CONNECTION_PORT]

      QueuePortOwnerProcess : fffffa804ea4fc10 (svchost.exe)

      ServerThread          : fffffa8051435940

      QuotaCharged          : No

      CancelQueuePort       : 0000000000000000

      CancelSequencePort    : 0000000000000000

      CancelSequenceNumber  : 0x00000000 (0)

      ClientContext         : 00000000021a0f60

      ServerContext         : 0000000000000000

      PortContext           : 00000000033834d0

      CancelPortContext     : 0000000000000000

      SecurityData          : 0000000000000000

      View                  : 0000000000000000


    1: kd> !thread fffffa8051435940

    THREAD fffffa8051435940  Cid 0574.1ea0  Teb: 000007fffffd3000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Alertable

        fffffa601086d3e8  SynchronizationEvent

    IRP List:

        fffffa80514558b0: (0006,03a0) Flags: 00000884  Mdl: 00000000

    Impersonation token:  fffff8801fc4b050 (Level Impersonation)

    DeviceMap                 fffff8800e7c50e0

    Owning Process            fffffa804ea4fc10       Image:         svchost.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      3187932        Ticks: 1006437 (0:04:21:40.517)

    Context Switch Count      111            

    UserTime                  00:00:00.015

    KernelTime                00:00:00.000

    Win32 Start Address 0x000007fefed47780

    Stack Init fffffa601086ddb0 Current fffffa601086d040

    Base fffffa601086e000 Limit fffffa6010868000 Call 0

    Priority 10 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5

    Child-SP          RetAddr           Call Site

    fffffa60`1086d080 fffff800`01ebb6fa nt!KiSwapContext+0x7f

    fffffa60`1086d1c0 fffff800`01eb035b nt!KiSwapThread+0x13a

    fffffa60`1086d230 fffff800`020d2a2e nt!KeWaitForSingleObject+0x2cb

    fffffa60`1086d2c0 fffff800`020d2a87 nt!FsRtlCancellableWaitForMultipleObjects+0x62

    fffffa60`1086d320 fffffa60`015da75a nt!FsRtlCancellableWaitForSingleObject+0x27

    fffffa60`1086d360 fffffa60`015d8588 mup!MupLocateUncProviderForPath+0x40a

    fffffa60`1086d4a0 fffffa60`015d8130 mup!MupiCallUncProvider+0xe8

    fffffa60`1086d510 fffffa60`015d90ef mup!MupStateMachine+0x120

    fffffa60`1086d560 fffffa60`00f31e91 mup!MupCreate+0x2c3

    fffffa60`1086d5e0 fffffa60`00f4e26c fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x211

    fffffa60`1086d650 fffff800`021267d9 fltmgr!FltpCreate+0x25d

    fffffa60`1086d700 fffff800`0211ecb2 nt!IopParseDevice+0x5f9

    fffffa60`1086d8a0 fffff800`0211f842 nt!ObpLookupObjectName+0x593

    fffffa60`1086d9b0 fffff800`02124e47 nt!ObOpenObjectByName+0x2f2

    fffffa60`1086da80 fffff800`0212ecc8 nt!IopCreateFile+0x287

    fffffa60`1086db20 fffff800`01eb8ef3 nt!NtCreateFile+0x78

    fffffa60`1086dbb0 00000000`76f9726a nt!KiSystemServiceCopyEnd+0x13

    00000000`2674d738 00000000`00000000 0x76f9726a


    Bingo! I found two threads in Svchost.exe which showed some activity and were blocked for over 4 hours on a UNC Path access. From the IRP which was issued by the thread above I found out that it was attempting to contact one of the DCs.


    1: kd> !irp 0xfffffa80`514558b0

    Irp is active with 4 stacks 3 is current (= 0xfffffa8051455a10)

    No Mdl: No System Buffer: Thread fffffa8051435940:  Irp stack trace. 

         cmd  flg cl Device   File     Completion-Context

    [  0, 0]   0  0 00000000 00000000 00000000-00000000   


                      Args: 00000000 00000000 00000000 00000000

    [  0, 0]   0  0 00000000 00000000 00000000-00000000   


                      Args: 00000000 00000000 00000000 00000000

    >[  0, 0]   0 e0 fffffa8031d1a840 fffffa805145aec0 fffffa6000f332b0-fffffa8047b6c930 Success Error Cancel

                 \FileSystem\Mup  fltmgr!FltpSynchronizedOperationCompletion

                      Args: fffffa601086d848 01000040 00030000 00000000

    [  0, 0]   0  0 fffffa8031d1b880 fffffa805145aec0 00000000-00000000   


                      Args: fffffa601086d848 01000040 00030000 00000000


    1: kd> dt nt!_FILE_OBJECT fffffa805145aec0 FileName

       +0x058 FileName : _UNICODE_STRING "\DC005.contoso.com\PIPE\lsarpc"


    For the details on MUP you can refer to Support for UNC Naming and MUP.


    At first my thought was that something is wrong with the network stack because we are waiting on domain controller to respond. The thought was that either we had a networking issue or the Domain Controller wasn’t responding, but redirector would have timed out as it won’t wait for over 4 hours for an unresponsive connection. I started digging into the source code and found that we build a query context in MupLocateUncProviderForPath () function and issue another IRP which is sent down the I/O Stack. I searched for other IRPs which was issued by the above thread and found two IRPs.


    1: kd> !irpfind 0 0 thread fffffa8051435940

    Looking for IRPs with thread == fffffa8051435940


    Scanning large pool allocation table for Tag: Irp? (fffffa804ea83000 : fffffa804ed83000)



    Searching NonPaged pool (fffffa8030605000 : fffffa8c27e00000) for Tag: Irp?


      Irp    [ Thread ] irpStack: (Mj,Mn)   DevObj  [Driver]         MDL Process

    fffffa80514558b0 [fffffa8051435940] irpStack: ( 0, 0)  fffffa8031d1a840 [ \FileSystem\Mup]

    fffffa8073bc65e0 [fffffa8051435940] irpStack: ( e, 0)  fffffa804e842610 [ \FileSystem\mrxsmb]


    1: kd> !irp fffffa8073bc65e0

    Irp is active with 2 stacks 1 is current (= 0xfffffa8073bc66b0)

     No Mdl: No System Buffer: Thread fffffa8051435940:  Irp stack trace. 

         cmd  flg cl Device   File     Completion-Context

    >[  e, 0]   0 e1 fffffa804e842610 fffffa804e843f20 fffffa60015d4024-fffffa601086d3b0 Success Error Cancel pending

                 \FileSystem\mrxsmb     mup!MupiQueryPathCompletionRoutine

                      Args: 00000004 00000050 00140193 fffffa601086d408

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   


                      Args: 00000000 00000000 00000000 00000000


    This is the IRP that MUP has issued an IOCTL IOCTL_REDIR_QUERY_PATH to the redirector.  If the status returned by the driver is anything other than STATUS_PENDING then it will wait by calling FsRtlCancellableWaitForSingleObject.


    The 0xfffffa8051435940 thread doesn’t have any redirector function in its stack, which made me suspicious that there could be some other thread which is already working on similar operation and redirector is currently busy processing that request. I started looking for a worker thread which would be calling the redirector.


    1: kd> !stacks 0 mrxsmb

    Proc.Thread  .Thread  Ticks   ThreadState Blocker

                                [fffffa804e9c7c10 svchost.exe]

     248.000e84  fffffa804ee72bb0 ffe0d0ad Blocked    nt!FsRtlCancellableWaitForMultipleObjects+0x62

                                [fffffa804e9e12e0 svchost.exe]

     404.000f18  fffffa804edad4f0 ffe03e7e Blocked    nt!FsRtlCancellableWaitForMultipleObjects+0x62


    The above command will parse through all the kernel stacks for all the process and dump out one line summary info for the threads where it finds reference to that module in its stack. I have snipped the output to keep things in context.


    One of the sample threads where redirector is in action is shown below.


    1: kd> !thread fffffa804ee72bb0

    THREAD fffffa804ee72bb0  Cid 0248.0e84  Teb: 000007fffffa8000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Alertable

        fffffa8048f6c6b8  SynchronizationEvent

    IRP List:

        fffffa8073ac3980: (0006,03a0) Flags: 00000884  Mdl: 00000000

        fffffa807257fc60: (0006,03a0) Flags: 00000884  Mdl: 00000000

        fffffa80511627b0: (0006,03a0) Flags: 00000884  Mdl: 00000000

    Impersonation token:  fffff8800d515c40 (Level Impersonation)

    Owning Process            fffffa804e9c7c10       Image:         svchost.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      2043731        Ticks: 2150638 (0:09:19:10.167)

    Context Switch Count      893            

    UserTime                  00:00:00.031

    KernelTime                00:00:00.109

    Win32 Start Address 0x000007fefb96c534

    Stack Init fffffa60125a1db0 Current fffffa60125a0ab0

    Base fffffa60125a2000 Limit fffffa601259c000 Call 0

    Priority 10 BasePriority 1 PriorityDecrement 0 IoPriority 2 PagePriority 5

    Child-SP          RetAddr           Call Site

    fffffa60`125a0af0 fffff800`01ebb6fa nt!KiSwapContext+0x7f

    fffffa60`125a0c30 fffff800`01eb035b nt!KiSwapThread+0x13a

    fffffa60`125a0ca0 fffff800`020d2a2e nt!KeWaitForSingleObject+0x2cb

    fffffa60`125a0d30 fffff800`020d2a87 nt!FsRtlCancellableWaitForMultipleObjects+0x62

    fffffa60`125a0d90 fffffa60`0f625c14 nt!FsRtlCancellableWaitForSingleObject+0x27

    fffffa60`125a0dd0 fffffa60`0f634e9e rdbss!RxWaitForStableCondition+0x11c

    fffffa60`125a0e10 fffffa60`0f63599f rdbss!RxFindOrCreateConnections+0x572

    fffffa60`125a0ef0 fffffa60`0f627318 rdbss!RxConstructVirtualNetRoot+0xb7

    fffffa60`125a0f90 fffffa60`0f63815a rdbss!RxFindOrConstructVirtualNetRoot+0x594

    fffffa60`125a1100 fffffa60`0f627eee rdbss!RxCreateTreeConnect+0x13e

    fffffa60`125a1190 fffffa60`0f6041f6 rdbss!RxCommonCreate+0x20a

    fffffa60`125a1250 fffffa60`0f62568a rdbss!RxFsdCommonDispatch+0x786

    fffffa60`125a1340 fffffa60`1031f2cf rdbss!RxFsdDispatch+0x21a

    fffffa60`125a13b0 fffffa60`015d85f9 mrxsmb!MRxSmbFsdDispatch+0xbf

    fffffa60`125a13f0 fffffa60`015d8130 mup!MupiCallUncProvider+0x159

    fffffa60`125a1460 fffffa60`015d90ef mup!MupStateMachine+0x120

    fffffa60`125a14b0 fffffa60`00f4e342 mup!MupCreate+0x2c3

    fffffa60`125a1530 fffff800`021267d9 fltmgr!FltpCreate+0x333

    fffffa60`125a15e0 fffff800`0211ecb2 nt!IopParseDevice+0x5f9

    fffffa60`125a1780 fffff800`0211f842 nt!ObpLookupObjectName+0x593

    fffffa60`125a1890 fffff800`02124e47 nt!ObOpenObjectByName+0x2f2

    fffffa60`125a1960 fffff800`0212ecc8 nt!IopCreateFile+0x287

    fffffa60`125a1a00 fffff800`01eb8ef3 nt!NtCreateFile+0x78

    fffffa60`125a1a90 fffff800`01eb9400 nt!KiSystemServiceCopyEnd+0x13

    fffffa60`125a1c98 fffffa60`0f664a5e nt!KiServiceLinkage



    The above thread is waiting for the condition of the remote connection to become stable and that’s the reason it called the function RxWaitForStableCondition(), which accepts RX_CONTEXT as its first argument and the reason for why we entered into the wait as second argument. As we can see that the thread above was waiting for over 9 hours and 19 minutes.


    For additional reading you can go through RX_CONTEXT and IRP Management.


    The unassembled code for the function below is to find out where we store the pointer to the RX_CONTEXT structure. You may remember about the challenges of x64 debugging from this post.


    1: kd> .frame /r 05

    05 fffffa60`11e42dd0 fffffa60`0f634e9e rdbss!RxWaitForStableCondition+0x11c

    rax=0000000000000000 rbx=fffffa804f4de010 rcx=0000000000000000

    rdx=0000000000000000 rsi=0000000000000001 rdi=0000000000000000

    rip=fffffa600f625c14 rsp=fffffa6011e42dd0 rbp=fffffa805113a490

    r8=0000000000000000  r9=0000000000000000 r10=0000000000000000

    r11=0000000000000000 r12=fffffa805113a4a0 r13=fffffa6011e42fd8

    r14=fffffa804f4de010 r15=0000000000000000

    iopl=0         nv up di pl nz na pe nc

    cs=0000  ss=0000  ds=0000  es=0000  fs=0000  gs=0000             efl=00000000


    fffffa60`0f625c14 85c0            test    eax,eax


    1: kd> u rdbss!RxWaitForStableCondition


    fffffa60`0f625af8 488bc4          mov     rax,rsp

    fffffa60`0f625afb 48895808        mov     qword ptr [rax+8],rbx

    fffffa60`0f625aff 48896810        mov     qword ptr [rax+10h],rbp

    fffffa60`0f625b03 48897018        mov     qword ptr [rax+18h],rsi

    fffffa60`0f625b07 48897820        mov     qword ptr [rax+20h],rdi

    fffffa60`0f625b0b 4154            push    r12

    fffffa60`0f625b0d 4883ec30        sub     rsp,30h

    fffffa60`0f625b11 33ff            xor     edi,edi

    1: kd> u


    fffffa60`0f625b13 833a03          cmp     dword ptr [rdx],3

    fffffa60`0f625b16 418af1          mov     sil,r9b

    fffffa60`0f625b19 4d8be0          mov     r12,r8

    fffffa60`0f625b1c 488bea          mov     rbp,rdx

    fffffa60`0f625b1f 488bd9          mov     rbx,rcx     <--------- RX_CONTEXT is in RBX register

    fffffa60`0f625b22 7c07            jl      rdbss!RxWaitForStableCondition+0x33 (fffffa60`0f625b2b)

    fffffa60`0f625b24 33c0            xor     eax,eax

    fffffa60`0f625b26 e968010000      jmp     rdbss!RxWaitForStableCondition+0x19b (fffffa60`0f625c93)


    1: kd> !pool fffffa804f4de010 2           <---- [Argument 1] Here is pointer to RX_CONTEXT Structure

    Pool page fffffa804f4de010 region is Nonpaged pool

    *fffffa804f4de000 size:  670 previous size:    0  (Allocated) *RxIr

                Pooltag RxIr : RDBSS RxContext, Binary : rdbss.sys


    1: kd> !pool fffffa805113a490 2           <---- [Argument 2] Here is the Pointer to the Reason for Wait

    Pool page fffffa805113a490 region is Nonpaged pool

    *fffffa805113a3b0 size:  280 previous size:   10  (Allocated) *RxSc

                Pooltag RxSc : RDBSS SrvCall, Binary : rdbss.sys


    1: kd> dc fffffa805113a490 L1

    fffffa80`5113a490  00000001               <---------------- Transport is in Transition state


    We have bunch of RX_CONTEXT describing the IRPs pending on redirector for the transport of the remote connection to stabilize.


    Next obvious thought that came to my mind was to investigate the list where we store the IRPs which have been issued by the redirector for the transport driver to process and understand the state of those IRPs and understand why we are not progressing further? RxCeSend is the API which submits the IRPs to the Transport driver which is maintained in the list below. For more information about Redirector related APIs you can refer Connection Engine Management.


    1: kd> x rdbss!RxIrpsList

    fffffa60`0f620110 rdbss!RxIrpsList = <no type information>


    The rdbss!RxIrpsList is a pointer to the structure of type LIST_ENTRY.


    1: kd> dq fffffa60`0f620110 L2

    fffffa60`0f620110  fffffa80`510fd0d0 fffffa80`4edb5590


    1: kd> dq fffffa80`510fd0d0 L2            <---------------- LIST_ENTRY.FLINK 

    fffffa80`510fd0d0  fffffa80`50d2e300 fffffa60`0f620110


    1: kd> dq fffffa80`4edb5590 L2            <---------------- LIST_ENTRY.BLINK

    fffffa80`4edb5590  fffffa60`0f620110 fffffa80`4ff450d0


    This being a x64 windows each entry in this list is 0x30 bytes, this can change with OS version or driver updates, so don’t rely on this list or any of the offset that I referenced here in your driver code. Entry in the first column of this list is a pointer to the list entry. At offset 10 we store the pointer to the IRP. Add 0x10 bytes to get the IRP.


    1: kd> dl 0xfffffa80`510fd0d0

    fffffa80`510fd0d0  fffffa80`50d2e300 fffffa60`0f620110

    fffffa80`510fd0e0  fffffa80`7274e5a0 00000000`00000000

    fffffa80`50d2e300  fffffa80`50ee4490 fffffa80`510fd0d0

    fffffa80`50d2e310  fffffa80`50f16160 00000000`00000000

    fffffa80`50ee4490  fffffa80`727f1140 fffffa80`50d2e300

    fffffa80`50ee44a0  fffffa80`51491760 00000000`00000000

    fffffa80`727f1140  fffffa80`73a95860 fffffa80`50ee4490

    fffffa80`727f1150  fffffa80`511b9c60 00000000`00000000

    fffffa80`73a95860  fffffa80`73a8e3e0 fffffa80`727f1140

    fffffa80`73a95870  fffffa80`73a77c60 00000000`00000000

    fffffa80`73a8e3e0  fffffa80`73a9c290 fffffa80`73a95860

    fffffa80`73a8e3f0  fffffa80`73a46a30 00000000`00000000

    fffffa80`73a9c290  fffffa80`725ca490 fffffa80`73a8e3e0



    1: kd> !irp poi(fffffa80`510fd0d0+10)

    Irp is active with 2 stacks 2 is current (= 0xfffffa807274e6b8)

     No Mdl: No System Buffer: Thread 00000000:  Irp stack trace. 

         cmd  flg cl Device   File     Completion-Context

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   


                      Args: 00000000 00000000 00000000 00000000

    >[  f, 3]   0 e1 fffffa804b1fd8d0 fffffa8050f012f0 fffffa601030df70-fffffa805119fda0 Success Error Cancel pending

                 \Driver\Smb      mrxsmb!RxTdiAsynchronousConnectCompletion

                      Args: 00000000 fffffa8050d127e0 00000000 fffffa601031b6f0


    1: kd> !irp poi(fffffa80`50d2e300+10)

    Irp is active with 2 stacks 2 is current (= 0xfffffa8050f16278)

     No Mdl: System buffer=fffffa8050d59c88: Thread 00000000:  Irp stack trace. 

         cmd  flg cl Device   File     Completion-Context

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   


                      Args: 00000000 00000000 00000000 00000000

    >[  e, 0]   0 e1 fffffa804b1fd8d0 fffffa8050f012f0 fffffa60103063a4-fffffa8050d59c60 Success Error Cancel pending

                 \Driver\Smb      mrxsmb!RxTdiAsynchronousRequestCompletion

                      Args: 00000000 00000030 0021c0e8 00000000


    1: kd> !irp poi(fffffa80`50ee4490+10)

    Irp is active with 2 stacks 2 is current (= 0xfffffa8051491878)

     No Mdl: System buffer=fffffa805115dcb8: Thread 00000000:  Irp stack trace. 

         cmd  flg cl Device   File     Completion-Context

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   


                      Args: 00000000 00000000 00000000 00000000

    >[  e, 0]   0 e1 fffffa804b1fd8d0 fffffa8050f012f0 fffffa60103063a4-fffffa805115dc90 Success Error Cancel pending

                 \Driver\Smb      mrxsmb!RxTdiAsynchronousRequestCompletion

                      Args: 00000000 00000030 0021c0e8 00000000


    All these IRPs are waiting on SMB Transport Driver (responsible for handling TCP/IP port 445 related traffic) for further processing. The obvious feeling was to find out if there are threads which are making call to SMB function to see what they are they doing?


    Note: In Windows 2008 R2 and Windows 7 onwards redirector doesn’t use SMB as the transport driver directly.


    I searched through the dump but I couldn’t find a single thread which was calling any of the SMB function. This is the point where it gets more interesting. Why we don’t have any thread which is calling SMB function when we have bunch of IRPs pending on that driver?


    This was the time for me to collaborate with the Network Stack Guru Aman Jain to get some insight to figure out the answer to my question.


    It turned out that SMB driver will create a work item and it will use the System Delayed Worker Queue to queue the work item that it created and will populate the routine in the work Item structure which is supposed to process the work item once the worker thread picks up the work item.


    1: kd> x nt!ExWorkerQueue

    fffff800`01ff08a0 nt!ExWorkerQueue = <no type information>


    Above pointer is the list head for the 3 queues that system has namely the “Critical WorkQueue” , “Delayed WorkQueue” and the “HyperCritical WorkQueue” as see in the output of !exqueue command in the debugger. Each entry in this list is 0x58 bytes long and is of type _KQUEUE.


    Structure referenced below decides how a various work queues are enumerated. This structure is documented in this MSDN article.


    typedef enum _WORK_QUEUE_TYPE {

      CriticalWorkQueue        = 0,

      DelayedWorkQueue         = 1,

      HyperCriticalWorkQueue   = 2,

      MaximumWorkQueue         = 3



    Critical WorkQueue

    1: kd> dt nt!_KQUEUE fffff800`01ff08a0   

       +0x000 Header           : _DISPATCHER_HEADER

       +0x018 EntryListHead    : _LIST_ENTRY [ 0xfffff800`01ff08b8 - 0xfffff800`01ff08b8 ]

       +0x028 CurrentCount     : 0

       +0x02c MaximumCount     : 0x10

       +0x030 ThreadListHead   : _LIST_ENTRY [ 0xfffffa80`30a4da08 - 0xfffffa80`30a39d68 ]


    Delayed WorkQueue

    1: kd> dt nt!_KQUEUE fffff800`01ff08a0+58

       +0x000 Header           : _DISPATCHER_HEADER

       +0x018 EntryListHead    : _LIST_ENTRY [ 0xfffffa80`50226f00 - 0xfffffa80`5183f9f0 ]

       +0x028 CurrentCount     : 0

       +0x02c MaximumCount     : 0x10

       +0x030 ThreadListHead   : _LIST_ENTRY [ 0xfffffa80`30a398d8 - 0xfffffa80`51157d68 ]


    HyperCritical WorkQueue

    1: kd> dt nt!_KQUEUE fffff800`01ff08a0+58+58

       +0x000 Header           : _DISPATCHER_HEADER

       +0x018 EntryListHead    : _LIST_ENTRY [ 0xfffff800`01ff0968 - 0xfffff800`01ff0968 ]

       +0x028 CurrentCount     : 0

       +0x02c MaximumCount     : 0x10

       +0x030 ThreadListHead   : _LIST_ENTRY [ 0xfffffa80`30a4e1f8 - 0xfffffa80`30a4e1f8 ]


    1: kd> dd fffff80001ff08a0+58+40 L1

    fffff800`01ff0938  00000010 <----------- We created 0x10 (16) dynamic threads


    We create additional dynamic threads as in the event when the default number of threads in a particular work queue is unable to handle the Work Items being queued. In this case we created 16 additional threads, which is the maximum number of dynamic thread which can be created.


    1: kd> dd fffff80001ff08a0+58+4c L1

    fffff800`01ff0944  00004667


    1: kd> ? 00004667

    Evaluate expression: 18023 = 00000000`00004667  <-------- 18,023 work Items pending for processing


    I could see that there are work items pending in the Delayed WorkQueue because of the following relation which is not present in any other WorkQueue that we dumped.


    nt!_KQUEUE->EntryListHead.Flink != nt!_QUEUE->EntryListHead.Blink


    For details about Work Item Processing and Work Items you can refer following docs:

    System Worker Threads




    The next obvious steps involved dumping all work items pending in the list to see if I have work items related to SMB driver? And if the Work Items are present for SMB then why they haven’t been picked up by the worker thread?


    1: kd> !list "-t nt!_LIST_ENTRY.Flink -e -x\"dq poi(@$extret) L1; dps poi(@$extret)+0x20 L1\" fffffa80`50226f00"




    dq poi(@$extret) L1; dps poi(@$extret)+0x20 L1

    fffffa80`4b7c7e30  fffffa80`50a720e0

    fffffa80`4b7c7e50  fffffa60`0f5002e4 smb!SmbDelayedStartTcpSession


    dq poi(@$extret) L1; dps poi(@$extret)+0x20 L1

    fffffa80`72544e00  fffffa80`31aeb580

    fffffa80`72544e20  fffffa60`0f5002e4 smb!SmbDelayedStartTcpSession


    dq poi(@$extret) L1; dps poi(@$extret)+0x20 L1

    fffffa80`72759d50  fffffa80`51184690

    fffffa80`72759d70  fffffa60`0f5002e4 smb!SmbDelayedStartTcpSession


    dq poi(@$extret) L1; dps poi(@$extret)+0x20 L1

    fffffa80`73b416e0  fffffa80`5143c9c0

    fffffa80`73b41700  fffffa60`0f5002e4 smb!SmbDelayedStartTcpSession




    dq poi(@$extret) L1; dps poi(@$extret)+0x20 L1

    fffffa80`518392e0  fffffa80`31fafa60

    fffffa80`51839300  fffffa60`00becb58 mpio!MPIOPathVerify


    dq poi(@$extret) L1; dps poi(@$extret)+0x20 L1

    fffffa80`31fafa60  fffffa80`31fafab0

    fffffa80`31fafa80  fffffa60`00becb58 mpio!MPIOPathVerify


    dq poi(@$extret) L1; dps poi(@$extret)+0x20 L1

    fffffa80`31fafab0  fffffa80`31fafc30

    fffffa80`31fafad0  fffffa60`00becb58 mpio!MPIOPathVerify


    dq poi(@$extret) L1; dps poi(@$extret)+0x20 L1

    fffffa80`31fafc30  fffffa80`5183f9a0

    fffffa80`31fafc50  fffffa60`00becb58 mpio!MPIOPathVerify


    dq poi(@$extret) L1; dps poi(@$extret)+0x20 L1

    fffffa80`5183f9a0  fffffa80`4ea092e0

    fffffa80`5183f9c0  fffffa60`00becb58 mpio!MPIOPathVerify


    dq poi(@$extret) L1; dps poi(@$extret)+0x20 L1

    fffffa80`4ea092e0  fffffa80`5183f9f0

    fffffa80`4ea09300  fffffa60`00becb58 mpio!MPIOPathVerify


    Voila! Here are my SMB work items in the Delayed WorkQueue waiting to be processed. Another interesting observation was that this queue had around 18,023 work items in this queue waiting to be picked up by the worker threads. Most of the work items were related to MPIO. Investigation of the worker threads in the Delayed WorkQueue revealed that the worker threads were blocked processing MPIO work item on the DSM driver for the storage.


    I picked one of the worker entry from the _KQUEUE->ThreadListHead for the Delayed WorkQueue and validated that it is a thread pool allocation with !pool. As the size of _ETHREAD structure is 0x450 bytes and the pool allocation was for 0x490 bytes, this indicates that the pointer to the actual thread object would be 0x40 bytes into that pool allocation from the beginning of that pool block.


    1: kd> !pool 0xfffffa80`30a398d8  2

    Pool page fffffa8030a398d8 region is Nonpaged pool

    *fffffa8030a396e0 size:  490 previous size:   90  (Allocated) *Thre (Protected)

                Pooltag Thre : Thread objects, Binary : nt!ps


    1: kd> ?? sizeof(nt!_ETHREAD)

    unsigned int64 0x450


    1: kd> !thread fffffa8030a396e0+40

    THREAD fffffa8030a39720  Cid 0004.0044  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertable

        fffffa8031e8fe58  SynchronizationEvent

    Not impersonating

    DeviceMap                 fffff880000073a0

    Owning Process            fffffa80309ca110       Image:         System

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      2015907        Ticks: 2178462 (0:09:26:24.225)

    Context Switch Count      42461            

    UserTime                  00:00:00.000

    KernelTime                00:00:14.976

    Win32 Start Address nt!ExpWorkerThread (0xfffff80001ec07c8)

    Stack Init fffffa600232bdb0 Current fffffa600232b800

    Base fffffa600232c000 Limit fffffa6002326000 Call 0

    Priority 15 BasePriority 12 PriorityDecrement 0 IoPriority 2 PagePriority 5

    Child-SP          RetAddr           Call Site

    fffffa60`0232b840 fffff800`01ebb6fa nt!KiSwapContext+0x7f

    fffffa60`0232b980 fffff800`01eb035b nt!KiSwapThread+0x13a

    fffffa60`0232b9f0 fffffa60`00c9c698 nt!KeWaitForSingleObject+0x2cb

    fffffa60`0232ba80 fffffa60`00c97a06 MyBadDSM+0xb698

    fffffa60`0232bb10 fffffa60`00becca3 MyBadDSM+0x6a06

    fffffa60`0232bb60 fffff800`02119693 mpio!MPIOPathVerify+0x14b

    fffffa60`0232bcc0 fffff800`01ec08c3 nt!IopProcessWorkItem+0x27

    fffffa60`0232bcf0 fffff800`020c3ee7 nt!ExpWorkerThread+0xfb

    fffffa60`0232bd50 fffff800`01ef6616 nt!PspSystemThreadStartup+0x57

    fffffa60`0232bd80 00000000`00000000 nt!KxStartSystemThread+0x16


    At this point it was evident that something has gone wrong with either the storage or the HBA adapter, Inspection of the MyBadDSM.sys code at this point could reveal if it was the problem with the driver or something has gone wrong with the HBA or the Fiber connection to the storage.


    This brings me to the end of this post. I hope you enjoyed reading this post. Goodbye. Until next time!

  • Ntdebugging Blog

    Expanding Netmon to aid in debugging


    A walk-through of creating a Netmon parser in the context of a real case


    As is obvious to frequent readers of our blog, our team logs a lot of time in our debugger of choice (for some windbg, for others kd).  However, a debugger is not always the best tool for the job, and sometimes even the best tool has limitations. I found this to especially true when working a few Internet Printing Protocol (IPP) cases recently.


    Probably the biggest challenge of many IPP cases is the mixed environments you usually find IPP running in.  The benefit customers see in IPP over other print providers is it works natively or with minimal configuration on Windows, Mac, and Linux clients.  This makes it popular in places like college computer labs, where there isn’t one standard client system.   Unfortunately, this also means that we can’t really debug both sides of the communication when something goes wrong.


    In a recent case, a customer was having problems printing from their Windows Vista clients to a Linux Common Unix Printing System (CUPS) IPP server.  If the CUPS server was set to allow anonymous connections, everything worked perfectly. When the administrator enabled authentication, he found that most print jobs failed to print.  After a bit more testing, he found that small jobs (perhaps a page of printed text) worked fine, but larger, multi-page documents failed.


    For situations like this, I prefer to use a network trace to get a feeling for where the problem is occurring.  The problem was – IPP wasn’t one of the protocols built in to Netmon (and I find Wireshark’s IPP parser to not always work well – especially with Windows clients/servers).  I decided that the amount of time it would take to decode the traffic by hand could be better spent creating a Netmon IPP parser that I could use every time I ran into one of these issues.


    One of the great things about Netmon is you can view the source of every included parser.  This was a big help, as I hadn’t written a parser before.  [Note: all steps noted are written using Netmon 3.4.There might be slight differences in Netmon 3.3.]  To do this, open Netmon and click the Parsers tab. Under Object View, expand parser files and double click any of the .npl files.  The source will appear on the right.


    The language for Netmon parsers is similar to C++, with a limited set of statements.  These are all documented in the Netmon help file, but the ones I found useful are described below.  To begin, I started by defining a number of tables.  The basic idea of a table is to provide a way to convert a value to a string.  For example, one field in an IPP packet is the status of a printer, which is represented by an integer.  In order to allow Netmon to show printer states in a readable form, I created a table to convert the values as seen in Figure 1 below.


    Table IPPPrinterState //2911-4.4.11




                case 3 : "idle";

                case 4 : "processing";

                case 5 : "stopped";

                default : "Unknown Code";



    Figure 1: Netmon Table


    Each table is defined with the Table keyword, followed by a name for the table.  It may optionally be followed by a list of parameters, which I’ll use later.  In this case, I added a comment that specified which RFC and section this information comes from.  A table consists of a switch statement with a case for each value, and a default for all other cases, much like other programming languages.  I created tables like IPPPrinterState for each field that could be represented in an IPP packet from information I found in each of IPP’s RFCs.


    Once the tables were complete, I moved on to creating the Protocol portion of the parser.  This section of the code provides the logic that iterates through a packet and calls the tables for the appropriate data.  This section starts with either the RegisterBefore or RegisterAfter keyword.  These are used to determine how your parser is called.  Essentially, Netmon takes all of the parsers it has, and compiles them into one continuous binary (.npb) and registration tells Netmon where your parser fits.  For my case, I used the following registration code.


    [ RegisterAfter (HTTPPayloadData.OCSP, Ipp, Property.HTTPContentType.Contains("application/ipp")) ]


    This tells Netmon that, when compiling the parser, it should insert my code right after the code for the OCSP protocol in its HTTPPayloadData parser, my protocol should be called IPP, and it should enter my code path if the HTTP Payload is of content type “application/ipp”.  This allows my parser to work a bit differently than the Wireshark IPP parser – Wireshark uses a port number (631) to identify IPP traffic, whereas my code looks at HTTP content types.  The advantage of this, for me, is that Windows servers use port 80 for IPP by default, not 631, so in cases with a Windows IPP server, this parser should correctly identify the packets.  You may be wondering how or why I chose to register after OCSP.  Basically, I knew I needed my code to be registered in the section of code where HTTP does its payload content type processing.  So I opened up HTTP’s parser, and searched for the content type analysis. OCSP was the first protocol I found in HTTP’s content type logic, so I used that as the place to insert my protocol.


    After the registration comes the Protocol statement. I chose the following.


    Protocol IPP = FormatString("Status/OpCode = %s", IPPOperationsSupported(OpID))


    This names my protocol IPP and specifies that I want the description of the protocol to display the IPP status code.  This way, a user doesn’t need to drill down to find out if this is a print job or a printer status request.  You’ll notice FormatString is a function in Netmon that is similar to printf.  In this case, I am passing a variable (OpID, which is defined lower in my code) to my IPPOperationsSupported table to determine what this OpCode means.  Before I had a parser, I would need to look up the operations supported values in the IPP RFC for each packet I examined.


    Next is the body of the protocol.  Basically, this consists of a series of fields (like variables) that define how a packet is laid out.  Creating a field is similar to declaring a variable in C++. You start by choosing a data type that matches the size of the data in the packet and provide a name for that field.  For example, Figure 2 shown below contains the first seven lines of my Protocol.


    struct Version = FormatString("%i.%i", Major, Minor)


          INT8 Major;

          INT8 Minor;


    INT16 OpId = FormatString("%s (0x%X)", IPPOperationsSupported(this), this);

    INT32 RequestId;

    Figure 2: Code in the protocol block


    The IPP specification states that all packets begin with two 8-bit values, the first value specifies the major protocol version in use, and the second value specifies the minor.  In this case, I wrapped both in a struct so Netmon will display them as “Version: 1.0”, instead of separately as “Major: 1” “Minor: 0” on two lines.  After the version is a 16-bit field that specifies the operation requested (for example, print-job or get-printer-state).  I choose to display this value by looking it up in the IPPOperationsSupported table, then printing it as the string, followed by the hex value (e.g. “Get-Printer-Attributes (0xB)”).  The ‘this’ keyword simply uses the value of the current field, which in this case is the OpId.  Even though Netmon parses through the packet sequentially, this kind of use of a Field before its value is retrieved is allowed.  Finally, I set the RequestId field, which is a 32-bit int value.  Since this field is just a transaction ID for this conversation, I don’t need to do any formatting to it.


    After that, things got a little more complicated.  IPP’s structure allows for a variable number of attribute groups, each of which can contain a variable number of attributes.  For example, in response to the request “Get-Printer-Attributes” from the client, the server responds with the Printer Attributes group, which contains a number of attributes like printer-state, queued-job-count, and so on.  First, I needed to deal with the attribute groups in a loop until I’d read each one.  IPP specifies that the end of the attribute groups is specified with the value of 0x03, so I wrote a while loop to create attribute groups until FrameData[FrameOffset] is equal to 3 (See Figure 3 below).  FrameData and FrameOffset are special values provided by Netmon.  FrameData is an array of the entire contents of the frame, and FrameOffset is Netmon’s current location in the packet.  I use this instead of declaring a field here because referencing FrameData[FrameOffset] does not advance the parser frame location.  This is important because I want to consume that value further down.


    Inside that loop, I declared another struct that contains an attribute group.  Much like the Protocol IPP line above, we reference a field here that will be declared lower down.  This line does not advance the FrameOffset, since we don’t declare a field here.  The first line of this struct is the field declaration line that finally consumes the attribute group tag.  Below that is another While loop to process all attributes in the attribute group.  IPP differentiates between attributes and attribute groups by making all attribute group identifiers smaller than 0x10, and all attribute identifiers 0x10 or higher.  I use this as the condition for my loop.  Finally, I declare an Attribute struct inside this loop.  This struct is displayed after looking up how to properly print based on the Attribute Name and Value in the AttribDisplayTable.


    IPP declares attributes as an 8-bit type identifier (int, bool, string, etc.), a 16-bit int specifying the attribute name’s length, the name (a string), a 16-bit in value length, and a value.  Since I want to look up the value in various tables, depending on the Attribute Name, I store the Attribute Name as “AttName” in a property.  This way, I can continue to reference it while processing continues.  Properties are declared in brackets just above the field they will store.  In my case, I prepend the ‘Post.’ evaluation keyword to the property name.  This instructs Netmon to use the end result of the next line as its value, but before advancing the FrameOffset.  I do this again for the actual value, which I call Val.  If I did not use the Post evaluation keyword, Val would contain the unsigned int32 value of printer state, instead of the formatted string result I get by looking up printer state in its table.


    While [FrameData[FrameOffset] != 0x03]


          struct AttributeGroup = FormatString("%s", IPPTags(TagGroup))


                INT8 TagGroup = FormatString("%s (0x%X)", IPPTags(this), this);

                While [FrameData[FrameOffset] >= 0x10]


                      struct Attribute = AttribDisplayTable(AttName, Val)


                            INT8 Type = FormatString("%s (0x%X)", IPPTags(this), this);

                            INT16 NameLength;


                            AsciiString(NameLength) AttributeName;   

                            INT16 ValueLength;



                                  case "printer-state" :


                                  UINT32 PrinterState = FormatString("%s (0x%X)", IPPPrinterState(this),this);


    Figure 3: Loops in protocol block


    My case statements continue like printer-state for all possible attributes of IPP.  At the very end of the protocol block, after I’ve closed my switch, structs, and whiles, I have one more line, which consumes any data remaining in the packet.  This would contain document data if the packet was a print job, and is required so all the packet data is consumed before Netmon moves on to the next packet.  That line is:


    BLOB (FrameLength - FrameOffset) Data;


    As you can see, it is a binary blob data type, set to the size of the frame, less our current location.


    Finally, after my Protocol block, I needed to define my own data type.  IPP defines its own data format to specify printer uptime, so I created a data type for it as shown below in Figure 4.


    //Uptime format spec

    Number UPTIME


          Size = 4;

          DisplayFormat = (this != 0) ? FormatString("%i days %02i:%02i:%02i (%i seconds)",

                (this/86400), //Days

                (this-((this/86400)*86400))/3600, //Hours

                (this-(((this/86400)*86400)+(((this-((this/86400)*86400))/3600)*3600)))/60, //Minutes

                (this%60), //Seconds

                this) : "0"


    Figure 4: Custom data type


    The first line of Figure 4 specifies this will be a data type composed of numeric data named UPTIME.  Size specifies how many bytes the type uses.  DisplayFormat is what Netmon displays for this type. In this case, I use the x ? y : z syntax.  Netmon doesn’t have if/then/else keywords, but instead uses this ternary operator.  I use a special case for 0 since it seems to be a common return value in the traces I’ve looked at, and having ‘Uptime: 0 days 00:00:00 (0 seconds)’ seemed excessive.


    Figures 5 and 6 below show what the result looks like in Netmon.


    Figure 5: Frame Summary



    Figure 6: Frame Detail


    So what did the trace show?  Windows attempts to send IPP requests with no authentication first, then if it receives an access denied, retries with authentication.  This is by design, as the IPP server replies with the authentication types it supports in the access denied message.  In the case of print jobs that are too large to fit in a single packet, IPP’s spec allows servers to either issue the access denied message as soon as it receives the first packet, or after it has received the entire job.  It turns out that the IPP Print Provider on Windows was designed to send the entire job before listening for a response, so it missed the access denied message that CUPS sent after it received the first packet.  See http://support.microsoft.com/kb/976988/ for related information.  Want a copy of the IPP parser? It will be included in a future release of the Netmon Parser Update.


    I hope this post have given you a better idea of how Netmon works, how IPP works, and helps if you ever need to write a parser for your protocol.

    -Matt Burrough

  • Ntdebugging Blog

    Debugging a Bugcheck 0x109


    My name is Nader Khonsari. I am an escalation engineer in Platforms Global Escalation Services. I want to share with you a recent experience where 64-bit Windows Server 2008 servers at a customer location were encountering bugcheck 0x109 blue screen crashes.


    In 64-bit versions of the Windows kernel PatchGuard is present. If any driver or application attempts to modify the kernel the PatchGuard will generate the bugcheck (CRITICAL_STRUCTURE_CORRUPTION) mentioned below. PatchGuard protects the kernel from modification by malicious or badly written drivers or software.


    To further investigate this bugcheck you need to compare the impacted kernel function with a known reliable one. For instance, if the machine encountering this was running Windows Server 2008 service pack 2 with a post SP2 hotfix kernel you need to compare the impacted kernel function with that of service pack 2 kernel function. Usually you do not need to download and extract the post SP2 hotfix, because the vast majority of the kernel code has not been modified since the service pack.


    If you already have service pack 2 for Windows Server 2008 handy, expand the package using instructions included in KB928636:


    Windows6.0-KB948465-X64.exe /x

    expand.exe -f:* C:\WS08\SP2\windows6.0-kb948465-X64.cab C:\WS08\SP2\Expanded


    Locate the kernel binary from the expanded binaries and then open it up with your debugger just like you open a crash memory dump.


    windbg -z C:\WS08\SP2\Expanded\amd64_microsoft-windows-os-kernel_31bf3856ad364e35_6.0.6002.18005_none_ca3a763069a24eea\ntoskrnl.exe


    This is the bugcheck data from the dump:



    This bugcheck is generated when the kernel detects that critical kernel code or

    data have been corrupted. There are generally three causes for a corruption:

    1) A driver has inadvertently or deliberately modified critical kernel code

     or data. See http://www.microsoft.com/whdc/driver/kernel/64bitPatching.mspx

    2) A developer attempted to set a normal kernel breakpoint using a kernel

     debugger that was not attached when the system was booted. Normal breakpoints,

     "bp", can only be set if the debugger is attached at boot time. Hardware

     breakpoints, "ba", can be set at any time.

    3) A hardware corruption occurred, e.g. failing RAM holding kernel code or data.


    Arg1: a3a039d89b456543, Reserved

    Arg2: b3b7465eedc23277, Reserved

    Arg3: fffff80001778470, Failure type dependent information

    Arg4: 0000000000000001, Type of corrupted region, can be

            0 : A generic data region

            1 : Modification of a function or .pdata

            2 : A processor IDT

            3 : A processor GDT

            4 : Type 1 process list corruption

            5 : Type 2 process list corruption

            6 : Debug routine modification

            7 : Critical MSR modification


    Next, check the address at Arg3. This will give you the function that was modified, but not the offset of the modified instruction.


    3: kd> ln fffff80001778470

    (fffff800`01778470)   nt!KeSetSystemTime   |  (fffff800`01778790)   nt!BiLoadSystemStore

    Exact matches:

        nt!KeSetSystemTime = <no type information>


    Unassemble the same function in the SP2 kernel binary you expanded from the SP2 package. Do the same with the function of the crashed kernel and compare the two. You will find the modified opcode compared to that of the unmodified kernel.


    Below is the comparison of the nt!KeSetSystemTime code of the crashed kernel and that of the service pack 2 kernel respectively. They match fine except for the highlighted byte in the prefetch instruction which has been overwritten with a 0x1f.  This changed the instruction to a nop, which is done to prevent the prefetch operation from occurring on processors that don't support prefetch.



    fffff800`017785c6 0f1f0f          nop     dword ptr [rdi]

    fffff800`017785c9 488b07          mov     rax,qword ptr [rdi]

    fffff800`017785cc 493bc7          cmp     rax,r15

    fffff800`017785cf 7516            jne     nt!KeSetSystemTime+0x177



    00000001`4012e5b6 0f0d0f          prefetchw [rdi]

    00000001`4012e5b9 488b07          mov     rax,qword ptr [rdi]

    00000001`4012e5bc 493bc7          cmp     rax,r15

    00000001`4012e5bf 7516            jne    


    After further investigation this turned out to be a known issue in the VMware environment when the VM is moved from a non-prefetch to a prefetch architecture and even then, only in a live-migration case.  The issue is documented on VMWare's site at http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1008749&sliceId=1&docTypeID=DT_KB_1_1&dialogID=74787167&stateId=0 .

  • Ntdebugging Blog

    Debugging services startup in Svchost from a kernel mode debug session


    Windows shared services allow us to run system services together in a single service by having multiple DLLs run in a single process called Svchost. This allows Windows to have many services to run with the overhead of a single process.   You can find more information about shared services here and here.


    I like to do a lot of my user mode debugging from a kernel mode debugger. I can just attach to a Hyper-V VM though a virtual serial port and all my tools are available to me without having to set up a user mode debugger on the target, plus if my application calls into kernel mode I’m all set.


    The Svchost process won’t load the DLL until it’s time to start the service, so it can be tricky to set the correct breakpoint to catch the service starting up. This is complicated by the fact that the dll entry point will usually be <module>!ServiceMain, however this is not necessarily the case.


    We can leverage the fact that svchost.exe exists in the same shared memory across all the Svchost processes, and a change to this shared memory will affect all of the svchost process.  When you set a user mode breakpoint from a kernel mode debugger the debugger changes the user mode code by inserting an int 3 (the debugger changes the code back when it is executed).  We can take advantage of this functionality to set one breakpoint that will fire from any svchost process.


    If we can get past the Svchost process loading the service dll, we can find the host process main function or any other function we want to set a breakpoint on.


    First, find the process address of any Svchost. I’m just showing the first of a long list of processes:


    kd> !process  0 0 svchost.exe

    PROCESS fffffa80098c5970

        SessionId: 0  Cid: 0270    Peb: 7fffffd8000  ParentCid: 0204

        DirBase: 35986000  ObjectTable: fffff8a001115b20  HandleCount: 343.

        Image: svchost.exe


    Now let’s get into the context of this Svchost (it doesn’t matter which one) and set a breakpoint.  .process –i sets an invasive breakpoint so we can get into the active context of the process.


    kd> .process -i fffffa80098c5970

    You need to continue execution (press 'g' <enter>) for the context

    to be switched. When the debugger breaks in again, you will be in

    the new process context.


    Then go the debugger to enter the active process.


    kd> g

    Break instruction exception - code 80000003 (first chance)



    GetServiceMainFunctions()  is where Svchost both loads the service DLL and gets the service main function so let’s set a breakpoint for it:


    kd> .reload /user

    Loading User Symbols


    kd> bp svchost!GetServiceMainFunctions


    Now I just go the debugger and start my service.  I’ll use the workstation service as an example (I had previously stopped the workstation service for this test).


    kd> g


    At a command prompt on the target I enter:


    C:\> net start workstation


    In the debugger I get:


    Breakpoint 0 hit


    0033:00000000`ffc813cc fff3            push    rbx

    kd> k

    Child-SP          RetAddr           Call Site

    00000000`0339f7f8 00000000`ffc8128d svchost!GetServiceMainFunctions

    00000000`0339f800 000007fe`fe38a82d svchost!ServiceStarter+0xba

    00000000`0339f890 00000000`77a5f56d sechost!ScSvcctrlThreadW+0x25

    00000000`0339f8c0 00000000`77b93281 kernel32!BaseThreadInitThunk+0xd

    00000000`0339f8f0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d


    svchost!GetServiceMainFunctions is called from Svchost!ServiceStarter.  I just leave GetServiceMainFunctions with gu (Go Up). Now the service dll will have been loaded and I can load symbols for wkssvc.dll and get the service main() function.


    kd> gu


    0033:00000000`ffc8128d 488b7c2430      mov     rdi,qword ptr [rsp+30h] ss:002b:00000000`0339f830=000007fefb443a7c

    kd> k

    Child-SP          RetAddr           Call Site

    00000000`0339f800 000007fe`fe38a82d svchost!ServiceStarter+0xba

    00000000`0339f890 00000000`77a5f56d sechost!ScSvcctrlThreadW+0x25

    00000000`0339f8c0 00000000`77b93281 kernel32!BaseThreadInitThunk+0xd

    00000000`0339f8f0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d


    Now loading my user symbols I can get all the symbols for wkssvc.dll


    kd> .reload /user

    Loading User Symbols


    kd> dps 00000000`0339f800  l7

    00000000`0339f800  00000000`00000000

    00000000`0339f808  00000000`00000000

    00000000`0339f810  00000000`00000000

    00000000`0339f818  00000000`00000000

    00000000`0339f820  00000000`00000000

    00000000`0339f828  00000000`00000000

    00000000`0339f830  000007fe`fb443a7c wkssvc!ServiceMain <- Here is the Service Main function.


    I can now set a breakpoint on wkssvc!ServiceMain,  and as a bonus I now have access to all available breakpoints in wkssvc.dll


    kd> bp wkssvc!ServiceMain

    kd> g

    Breakpoint 1 hit


    0033:000007fe`fb443a7c 48895c2408      mov     qword ptr [rsp+8],rbx ss:002b:00000000`0339f800={wkssvc!SvchostPushServiceGlobals (000007fe`fb442ff4)}

    kd> k

    Child-SP          RetAddr           Call Site

    00000000`0339f7f8 00000000`ffc81344 wkssvc!ServiceMain

    00000000`0339f800 000007fe`fe38a82d svchost!ServiceStarter+0x1e8

    00000000`0339f890 00000000`77a5f56d sechost!ScSvcctrlThreadW+0x25

    00000000`0339f8c0 00000000`77b93281 kernel32!BaseThreadInitThunk+0xd

    00000000`0339f8f0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d


    Now I can debug the service on startup, not caring what instance of svchost it’s in with full user and kernel mode access.

  • Ntdebugging Blog

    I Want A Debugger Robot




    My name is Sabin from the Platforms Global Escalation Services team at Microsoft, and today I want to share with you a recent experience I had debugging an issue reported by an hardware manufacturer.


    The customer was doing a reboot test for their new server product line. They found that after hundreds of continuous reboots there was always a single instance that the server took more than 20 minutes to start up, when compared to an average 2 minute normal startup time. This happened only once every 300+ to 1000+ reboots. The number of reboots it took before the problem happened again varied randomly so it was difficult to predict when the problem would occur.


    Although they setup a live kernel debugging environment, they didn’t want to watch the computer screen for 10+ hours waiting for the problem to happen so they could manually hit Ctrl+Break in windbg. So instead they setup a video camera to film the computer screen 24x7, and they managed to find that when the “mysterious delay” happened the computer showed a gray screen with “Microsoft (R) Windows (R) version 5.1 (Build 3790: Service Pack 2)”.


    The case came to me and the customer even shipped a problematic server to our office to troubleshoot the cause of the delay. The problem was that I didn’t want to stare at the computer screen for 10+ hours either!


    The first thing I thought was that it would be fantastic if there were a robot sitting in front of Windbg, watching the elapsed time for each reboot, so it could hit Ctrl+Break in windbg if the server takes more than 10 minute to start. Then I asked myself, “Why not?”


    I decided to build such a “robot” myself.  I went around and checked the Debuggers SDK document (which can be found in the windbg help document debugger.chm), and I realized that what I needed was a customized debugger. The functionality of the debugger is simple, it should be able to recognize the time when the server first starts and the time when the server reboots. If there is more than 10 minutes between these two times the customized debugger automatically breaks in to the server. The event callback interface IDebugEventCallbacks::SessionStatus and the client interface IDebugControl::SetInterrupt can meet my needs perfectly.


    It is not that difficult to build such a customized debugger, which I called DBGRobot. I would like to share some code snippets which you may find helpful when building a customized debugger for a special debugging scenario, or as the basis for building a more complicated debugging robot.


    First, we need to download and install the Windows Driver Kit Version 7.1.0. When installing the WDK be sure to select Debugging Tools for Windows.




    If you install the WDK to its default folder, which for version 7.1.0 is C:\WinDDK\7600.16385.1, the C:\WinDDK\7600.16385.1\Debuggers\sdk\samples folder will contain the sample code from the Debugger SDK. The dumpstk sample is the one particularly interesting to us. We can copy some common code from it, such as the out.cpp and out.hpp which is the implementation of the IDebugOutputCallbacks interface.


    Now let’s do some coding.  The common code is copied from the Debuggers SDK sample Dumpstk. I also listed it here for clarity.


    The first step is to create the IDebugClient, IDebugControl and IDebugSymbols interfaces (although IDebugSymbols is not used in this case). You need to call the DebugCreate() function to create the IDebugClient interface, and then use IDebugClient->QueryInterface() to query the IDebugControl and IDebugSymbols interfaces.





        HRESULT Status;


        // Start things off by getting an initial interface from

        // the engine.  This can be any engine interface but is

        // generally IDebugClient as the client interface is

        // where sessions are started.

        if ((Status = DebugCreate(__uuidof(IDebugClient),

                                  (void**)&g_Client)) != S_OK)


            Exit(1, "DebugCreate failed, 0x%X\n", Status);



        // Query for some other interfaces that we'll need.

        if ((Status = g_Client->QueryInterface(__uuidof(IDebugControl),

                                               (void**)&g_Control)) != S_OK ||

            (Status = g_Client->QueryInterface(__uuidof(IDebugSymbols),

                                               (void**)&g_Symbols)) != S_OK)


            Exit(1, "QueryInterface failed, 0x%X\n", Status);




    If you want to see the output from the debugging engine, you also need to implement the IDebugOutputCallbacks interface. The main function to be implemented is IDebugOutputCallbacks::Output(), which is quite simple as we only need to see the output in the command prompt stdout stream:





        IN ULONG Mask,

        IN PCSTR Text




        fputs(Text, stdout);

        return S_OK;



    Here comes our main code logic: we need to implement the IDebugEventCallbacks interface and monitor the SessionStatus events. In order for the debugger engine to deliver the SessionStatus events to us we need to set the DEBUG_EVENT_SESSION_STATUS mask in IDebugEventCallbacks::GetInterestMask():





            OUT PULONG Mask




        // Here we monitor all the event types, although it is not necessary


        return S_OK;




    In the RobotEventCallBacks::SessionStatus() function, if we receive the DEBUG_SESSION_ACTIVE session status we know that Windows has started and connected to the debug session. Once the OS has started we create a new thread that waits for hEvent, an unnamed global event. In the new thread we send a timeout value for 10 minutes to the WaitForSingleObject() call.  The hEvent will be set when the session status is DEBUG_SESSION_END or DEBUG_SESSION_REBOOT, which means that the server rebooted and disconnected from the debug session. If WaitForSingleObject() returns WAIT_TIMEOUT, indicating that the server has not rebooted after 10 minutes, we send a break to the server via g_Control->SetInterrupt(DEBUG_INTERRUPT_ACTIVE).





        IN ULONG Status




        printf("Session Status changed, status = %d \n", Status);


        if (Status == DEBUG_SESSION_ACTIVE)


                    //bSessionActive = TRUE;

                    ::CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)TimerThread, NULL, 0, NULL);


        else if ((Status == DEBUG_SESSION_END) || (Status == DEBUG_SESSION_REBOOT) )


                    //bSessionActive = FALSE;





        return S_OK;






          IN PEXCEPTION_RECORD64 Exception,

          IN ULONG FirstChance



          printf("Captured exception 0x%x, FirstChance = %d", Exception->ExceptionCode, FirstChance);

          if (FirstChance && Exception->ExceptionCode == 0x80000003)


                bInterrupted = TRUE;



          return DEBUG_STATUS_NO_CHANGE;



    void TimerThread()


                    HRESULT Status;

                    DWORD result;


                    result = WaitForSingleObject(hEvent,10*60*1000);


                    if (result == WAIT_OBJECT_0) //the close event is signled within 10 minutes


                                    printf("Target reboots within 10 minutes \n");             



                    else if (result == WAIT_TIMEOUT)  //the close event is not singled within 10 minute, it means the server hangs


                        if ((Status = g_Control->SetInterrupt(DEBUG_INTERRUPT_ACTIVE)) !=S_OK)  //force the target to break


                                    Exit(1, "SetInterrupt failed, 0x%x\n", Status);







    Below is our main function. In the main function we need to create the interfaces, and set the callback interfaces g_Client->SetOutputCallbacks and g_Client->SetEventCallbacks. Then we call g_Control->WaitForEvent() in the while(TRUE) loop, which is a typical main loop for a debugger program. If WaitForEvent() returns it means that a debugging event happened, such an exception, and then you can send some debug commands via g_Control->Execute. Here we simply send a “.time” command to record the problematic time. The bInterrupted boolean is set in the RobotEventCallBacks::Exception() function when it detects a first chance 80000003 exception, which means a debugger break-in.


    Note: Initially I wanted to send a “.crash” command to crash the server and generate the memory dump.  When I tried this I found that although the “.crash” command did trigger a blue screen it failed to generate a memory dump because the problem happened so early in the boot process and the paging file was not initialized yet. So I changed it to just send a “.time” command and performed live debugging after the problem was captured by the tool. Initially I had forgotten to change the printf output, which is why you see the output “Succeeded in capturing the problem and crashing the server!” in the below screenshot.


    void __cdecl

    main(int argc, char* argv[])



        HRESULT Status;


        hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);  //create the global hEvent


        CreateInterfaces();  //create the global interfaces


        if ((Status = g_Client->SetOutputCallbacks(&g_OutputCb)) != S_OK)  //set the output callback interface


            Exit(1, "SetOutputCallbacks failed, 0x%X\n", Status);



        if ((Status = g_Client->SetEventCallbacks(&g_EventCb)) != S_OK)  //Set the event callback interface


            Exit(1, "SetEventCallbacks failed, 0x%X\n", Status);





        Status = g_Client->AttachKernel(DEBUG_ATTACH_KERNEL_CONNECTION, argv[1]); //Attach to the kernel debugging, the argv[1] should be “COM:Port=COM1,baud=115200”


        if (Status != S_OK)


                    Exit(1, "AttachKernel failed, 0x%X\n", Status);              




        while(TRUE)  //This is the main loop of the debugger program, it calls g_Control->WaitForEvent() repeatedly waiting for the debugging events



                        Status = g_Control->WaitForEvent(DEBUG_WAIT_DEFAULT,

                                                              INFINITE);  //wait for 10 minutes


                        printf("WaitForEvent returned. Status = 0x%x\n", Status);



                        if (Status == S_OK && bInterrupted)


                                    if ((Status = g_Control->Execute(DEBUG_OUTCTL_THIS_CLIENT, ".time", DEBUG_EXECUTE_ECHO)) !=S_OK)  //do .time to record the time the problem happens


                                                    Exit(1, "Execute failed, 0x%x\n", Status);




                                    Exit(0, "Succeeded in capturing the problem!");




        Exit(0, NULL);



    Now that we have the code we can build DBGRobot.exe using the WDK 7.1.0 build environment. Remember to set the following environment variables, as mentioned in the readme.txt under \WinDDK\7600.16385.1\Debuggers\samples


    SET DBGLIB_LIB_PATH=C:\WinDDK\7600.16385.1\Debuggers\sdk\lib

    SET DBGSDK_INC_PATH=C:\WinDDK\7600.16385.1\Debuggers\sdk\inc

    SET DBGSDK_LIB_PATH=C:\WinDDK\7600.16385.1\Debuggers\sdk\lib


    Now I have the “robot” watching the problem for me. I set up the live kernel debugging environment as usual and then ran the DBGRobot.exe tool as the debugger instead of windbg.  I used the command: dbgrobot.exe com:port=COM1,baudrate=115200.


    This is the screenshot when the server starts and reboots normally:



    I let the DBGRobot.exe run over night, and in the morning when I came back to the office, the “robot” had done its job perfectly and captured the problem for me! The problematic server was left in the break-in state so I just needed to launch windbg to attach to the server, and then I could do the debugging in windbg as normal.



    The cause of the delay turned out to be a hardware problem, but the actual problem is not important here. I hope this article can give you some new ideas about debugging, and using the debug tools in new ways.


  • Ntdebugging Blog

    Part 3: Understanding !PTE - Non-PAE and X64


    Hello, Ryan Mangipano (ryanman) again with part three of my series on understanding the output of the !PTE command. In this last installment I’ll continue our manual conversion of Virtual Addresses by converting a Non-PAE VA. Afterwards I’ll convert a VA from X64 Long Mode. Then I’ll discuss the TLB. If you haven’t read part one and two of this series I recommend taking a looking before jumping into the rest of this article.

    Non-PAE system VA to Physical Address conversion

    In part 1 and part 2 I discussed using windbg to manually x86 PAE virtual to physical address conversion with 4KB and 2MB pages. Now I’ll mention the same thing for Non-PAE systems. The official processor manuals explain how systems not using PAE have only two level of tables instead of the three used by PAE. This fact is because PAE added the Page Directory Pointer Table. This means the CR3 register will contain a pointer to the Page Directory Table. Also note the size of the table entries will be 32-bit, not the 64-bit size observed in the PAE tables. PAE expanded the table size to allow for more bits for the purpose of allowing the addressing of more physical memory.

    The PAE bit is bit number five, which is the sixth bit due to bit numbering starting at zero. You can see PAE is not enabled on this system.

           0: kd> .formats @cr4

      Binary:  00000000 00000000 00000110 11010001

     Another method of checking this:

                   1: kd> j ((@cr4 & 0y00000000000000000000000000100000) != 0) '.echo PAE flag Enabled';'.echo PAE flag Disabled'

    PAE flag Enabled

     Now I'll use the following random virtual address which is valid:


    0: kd> !pte f72c5c00

    F72C5C00  - PDE at C0300F70        PTE at C03DCB14

              contains 01014963      contains 06CE7963

            pfn 1014 G-DA--KWV    pfn 6ce7 G-DA—KWV


    0: kd> .formats f72c5c00

      Binary:  11110111 00101100 01011100 00000000

     Let’s break it down:


    PD offset                        11110111 00

    Page Table Offset                       101100 0101

    Offset in the Physical Page             1100 00000000 (since it’s a even 12 bits,  just refer to it in hex as c00)


    0: kd> !dd @cr3 + (0y1111011100 * @@(sizeof(nt!HARDWARE_PTE)))L1

    # a07df70 01014963


    0: kd> !dd 1014000 + (0y1011000101 * @@(sizeof(nt!HARDWARE_PTE)))L1

    # 1014b14 06ce7963

     Now that I have the physical page base, I'll place  the last 3 hex digits (c00)  from the Virtual Address onto the address base.


    0: kd> !dd 6ce7c00 L4

    # 6ce7c00 00000001 c0000005 00000000 00000000


    0: kd> dd f72c5c00 L4

    f72c5c00  00000001 c0000005 00000000 00000000

    X64 VA to Physical Address Conversion

    Just as PAE added a third level to the non-PAE two-level system, x64 Long mode adds a fourth level to the hierarchy.  This table is called the Page-Map Level-4 (PML4 table). AMD refers to the entries in this table as PML4E (Page-Map Level-4 Entry).  Intel refers to each entry as PML4-Table Entry. Internally we refer to this as the eXtended Page directory Entry (PXE).  Regardless of how you refer to these entries they contain indexes into the PDP table (Page Directory Pointer Table).

     Here is the output of the !pte command against a 64-bit address:

                   7: kd> !pte fffffade`c24eb7c0

                                     VA fffffadec24eb7c0


    contains 0000000111800863  contains 0000000119826863  contains 0000000119839963  contains 0000000001FF6121

    pfn 111800     ---DA--KWEV  pfn 119826     ---DA--KWEV  pfn 119839     -G-DA--KWEV  pfn 1ff6       -G--A—KREV

    I'll break it down in binary and use data from the processor manuals to separate the bits


    7: kd> .formats fffffade`c24eb7c0

      Binary:  11111111 11111111 11111010 11011110 11000010 01001110 10110111 11000000


    Sign extend               11111111 11111111

    PML4 offset               11111010 1

    PDP offset                1011110 11

    PD offset                 000010 010

    Page-Table offset         01110 1011

    Physical Page Offset      0111 11000000


     Now that I have the numbers, I'll plug them in and find the physical address. If you are having problems following along, refer to part one of this blog and the AMD x64 System Programming manual. You should be comparing the output below to the !pte output above


    7: kd> !dq @cr3 + ( 0y111110101 * @@(sizeof(ntkrnlmp!HARDWARE_PTE))) L1

    #  147fa8 00000001`11800863


    7: kd> !dq 0x00111800000 + (  0y101111011  * @@(sizeof(ntkrnlmp!HARDWARE_PTE))) L1

    #111800bd8 00000001`19826863


    7: kd> !dq 0x119826000 + ( 0y000010010  * @@(sizeof(ntkrnlmp!HARDWARE_PTE))) L1

    #119826090 00000001`19839963


    7: kd> !dq 0x119839000 + ( 0y011101011  * @@(sizeof(ntkrnlmp!HARDWARE_PTE))) L1

    #119839758 00000000`01ff6121


    7: kd> !dc 1ff67c0 L4

    # 1ff67c0 5085ff48 48000005 68244c8b 04a8f633 H..P...H.L$h3...


    7: kd> dc fffffade`c24eb7c0 L4

    fffffade`c24eb7c0  5085ff48 48000005 68244c8b 04a8f633  H..P...H.L$h3...

    TLB- Translation Lookaside Buffer and Conclusion

    The CPU’s memory management unit performs these operations to translate virtual addresses to physical. Wouldn’t it be great if we could cache the virtual address to physical page information in a location that can be accessed very quickly so that the CPU doesn’t have to look this up for future references to this page?  That is just what the Translation Lookaside Buffer (TLB) does. Hopefully this will shed some light on some basic memory structures like Large Pages, Flags, and the TLB so I encourage you to read more about these topics from the following sources-

    How PAE x86 works (on MSDN): http://technet.microsoft.com/en-us/library/cc736309(WS.10).aspx

    Intel  & AMD processor manuals: http://www.intel.com/products/processor/manuals/index.htm and http://developer.amd.com/documentation/guides/Pages/default.aspx#manuals

    “Windows Internals, 5th Edition” Mark E. Russinovich and David A. Solomon with Alex Ionescu  -Chapter 9: Memory Management



  • Ntdebugging Blog

    ‘Hidden’ ETW Stack Trace Feature – Get Stacks All Over the Place!



    I’ve heard Mark R. say that “sometimes Microsoft gave me a gift” when coming across an interesting feature not in mainstream documentation.  How true that often the odd find can prove ultimately very valuable.  Such was the case when I read Bruce Dawson’s Stack Walking in Xperf blog entry.

    Here’s the excerpt that blew me away:

    You can also record call stacks for manifest-based ETW (Event Tracing for Windows) events, but the syntax is quite different, and it only works on Windows 7 and above. When you specify your ETW provider to xperf after “-on” you can specify extra parameters after the provider name, separated by colons. These are flags, a level, and, for manifest-based providers, a list of extra data to record, which can include call stacks. You can leave the flags and level fields blank and just specify ‘stack’ (in single quotes) after three colons like this:

         xperf -on Latency -stackwalk profile -start browse -on Microsoft-IE:::'stack'  
         rem Your scenario goes here...     
         xperf -stop browse -stop -d mytrace.etl

    This sounds rather ordinary at first,  but looking at the syntax you may realize that this means for just about every manifest based provider (Microsoft-*) one can generate a stack trace on all ETW events therein.  This is really cool because if you are only interested in stacks, you don’t have to monkey around with any other trace level or flags which is often the most frustrating part about manual ETW tracing.  Recall from Ivan’s blog that there are over 600+ inbox providers to explore so we can go crazy here exploring components!  Let’s demonstrate.

    First, remember to use the Win7 SDK version of XPerf.  The XPerf(WPT) install *.msi files are found in the Bin directory after installing Win32 Development Tools, so you have to do a two-step at present to get the tools installed.

    Let’s list all the providers…

    C:\utils\xperf7sdk>xperf -providers | findstr "Microsoft-"
           93c05d69-51a3-485e-877f-1806a8731346:0x0001000000000000:0x4       : Microsoft-Windows-LUA
           9580d7dd-0379-4658-9870-d5be7d52d6de:0x0001000000000000:0x4       : Microsoft-Windows-WLAN-AutoConfig
           959f1fac-7ca8-4ed1-89dc-cdfa7e093cb0:0x0001000000000000:0x4       : Microsoft-Windows-HealthCenterCPL
           96ac7637-5950-4a30-b8f7-e07e8e5734c1:0x0001000000000000:0x4       : Microsoft-Windows-Kernel-BootDiagnostics
           96f4a050-7e31-453c-88be-9634f4e02139:0x0001000000000000:0x4       : Microsoft-Windows-UserPnp
           98583af0-fc93-4e71-96d5-9f8da716c6b8:0x0001000000000000:0x4       : Microsoft-Windows-Dwm-Udwm
           a50b09f8-93eb-4396-84c9-dc921259f952:0x0001000000000000:0x4       : Microsoft-Windows-Diagnosis-MSDE<
           ad5162d8-daf0-4a25-88a7-01cbeb33902e:0x0001000000000000:0x4       : Microsoft-Windows-WPDClassInstaller
           ae4bd3be-f36f-45b6-8d21-bdd6fb832853:0x0001000000000000:0x4       : Microsoft-Windows-Audio
           af2e340c-0743-4f5a-b2d3-2f7225d215de:0x0001000000000000:0x4       : Microsoft-Windows-Netshell
           b03d4051-3564-4e93-93db-3c34f1b5b503:0x0001000000000000:0x4       : Microsoft-Windows-Diagnosis-TaskManager
           b1f90b27-4551-49d6-b2bd-dfc6453762a6:0x0001000000000000:0x4       : Microsoft-Windows-PowerCpl
           b2a40f1f-a05a-4dfd-886a-4c4f18c4334c:0x0001000000000000:0x4       : Microsoft-PerfTrack-IEFRAME
           c89b991e-3b48-49b2-80d3-ac000dfc9749:0x0001000000000000:0x4       : Microsoft-Windows-Documents
           c9bdb4eb-9287-4c8e-8378-6896f0d1c5ef:0x0001000000000000:0x4       : Microsoft-Windows-HomeGroup-ProviderService
           dbe9b383-7cf3-4331-91cc-a3cb16a3b538:0x0001000000000000:0x4       : Microsoft-Windows-Winlogon
           de513a55-c345-438b-9a74-e18cac5c5cc5:0x0001000000000000:0x4       : Microsoft-Windows-Help
           ded165cf-485d-4770-a3e7-9c5f0320e80c:0x0001000000000000:0x4       : Microsoft-Windows-DeviceUx
           e978f84e-582d-4167-977e-32af52706888:0x0001000000000000:0x4       : Microsoft-Windows-TabletPC-InputPanel
           ed6b3ba8-95b2-4cf5-a317-d4af7003884c:0x0001000000000000:0x4       : Microsoft-Windows-Sidebar
           f3f14ff3-7b80-4868-91d0-d77e497b025e:0x0001000000000000:0x4       : Microsoft-Windows-WMP
           fbcfac3f-8459-419f-8e48-1f0b49cdb85e:0x0001000000000000:0x4       : Microsoft-Windows-NetworkProfile
           ffdb9886-80f3-4540-aa8b-b85192217ddf:0x0001000000000000:0x4       : Microsoft-PerfTrack-MSHTML

    We have a lot to play with as you can see, but let’s say you are trying to track down a socket leak.  Since we know sockets are implemented in afd.sys, let’s search for that provider…

    C:\utils\xperf7sdk>xperf -providers | findstr "AFD"        
    e53c6823-7bb8-44bb-90dc3f86090d48a6                               : Microsoft-Windows-Winsock-AFD       

    Now, we can form the XPerf syntax to trace socket operations during the leak.  The “AFDSession” is just the session name we are making up, the rest is the prescriptive syntax. Just to explain the syntax we are starting multiple sessions at the time here – kernel logger session + AFDSession. Similarly we are stopping both sessions at the same time and then merging both into the one file AFD.etl file.

    xperf -on Latency -stackwalk profile -start AFDSession -on Microsoft-Windows-Winsock-AFD:::'stack'
    <reproduce leak>
    xperf -stop AFDSession -stop -d AFD.etl

    Run xperfview AFD.etl and switch on “Stack counts by type graph”. If you don’t see this frame, make sure you are on the 7SDK version


    The beauty here is the Stack Counts by Type frame which allows us to explore which stack traces where collected, notice all the AFD stack count sources!


    Let’s say we have the simple case of a massive number of Create’s happening and thus handle counts rocketing up (leaking) for a given process.  Perhaps exhausting all ephemeral ports on the system, ultimately resulting in any number of hang symptoms.

    Simply filter the Events for AfdCreate as follows.


    Now, we have a filtered Frame of just those stack counts….


    You can imagine that if there were a lot more creates here that the frequency distribution shown above would be much higher, but for the sake of just exploring the feature, let’s just look at these small number of creates anyway.  In fact, let’s select them all by just choosing Select View in the context menu which will get all of these events over the trace sample.


    The frame area is now highlighted in light purple indicating the selected area.


    Because we are interested in Stacks, make sure you have configured your symbol path and have checked Load Symbols

    (be patient, this could take a while to build up your symbol cache)



    Next, invoke the Summary Table


    Click the flyout control on the left to select only the Creates again.


    Next, make it easy on yourself by Filtering on the interesting process only.


    After, much better.


    Now, here is the fruit of our labor in the full stack trace inside iexplore.exe allocating the socket handles.





    Consider that what we see above is read from the top down.  Iexplore.exe is doing this work through wininet, jsproxy, jscript, etc. all the way down through the calling of ws2_32.dll!socket which ultimately creates the socket and returned our file handle.  I’ve seen so many cases of ephemeral port exhaustion which is essentially a handle leak against sockets, these could have been solved in minutes with this tracing!  Keep in mind that this data isn’t perfectly suited for tracking handle leaks, especially small ones since there’s no accounting for outstanding handles.  However, sometimes the leak is egregious enough that you just need to look at activity over time to be pretty certain of the culprit.  Really, what we are doing here is just probing a particular ETW instrumented API, which is incredibly powerful.

    Typically, on 64-bit computers, you cannot capture the kernel stack in certain contexts when page faults are not allowed. To enable walking the kernel stack on AMD64, set the DisablePagingExecutive Memory Management registry value to 1 and reboot. The DisablePagingExecutive registry value is located under the following registry key: HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management

    In summary, this isn’t really a hidden feature since it’s documented in EnableTraceEx2’s Enable Parameters parameter, but I hope with this additional context you see what powerful analysis is possible.  Previously, this would have required live debugging a process, so now we have another example of the power of XPerf to decode and view key system activity, non-invasively!


  • Ntdebugging Blog

    x64 Manual Stack Reconstruction and Stack Walking


    My name is Trey Nash and I am an Escalation Engineer on the Core OS team.  My experience is as a software developer, and therefore my blog posts tend to be slanted in the direction of helping developers during the feature development, testing and the support phases.

    In this installment I would like to expand a bit on a previous post of mine called Challenges of Debugging Optimized x64 Code.  In that post I discussed the nuances of the x64 calling convention (thankfully of which there is only one) and how it is used in optimized builds of software.  The calling convention is sometimes referred to as the Application Binary Interface (ABI).  In this post, I would like to discuss the x64 unwind metadata and how you can use it in the debugger to manually walk a stack.

    In some cases, you may have a corrupted stack that the debugger simply cannot effectively walk for you.  This often happens because the debugger walks a stack from the top down (assuming the stack grows upwards as if it were a stack of places on a table), and if the stack is sufficiently trashed then the debugger cannot find its bearing.  In the x86 world, a large percentage of the time, you can spot the stack frames by following the chain of base pointers and then build a crafty stack backtrace command to display the stack at some point in time.  But in the x64 calling convention there is no base pointer.  In fact, once a function’s prolog code has executed the rsp register generally never changes until the epilog code.  To read more about x64 prolog and epilog code conventions, go here.

    Moreover, the syntax for creating a crafty stack backtrace command in the x64 environment is currently undocumented, and I aim to shed some light on that near the end of that blog post. J

    The Example Code

    For this blog post I have used the following example C# code that requires the .NET 4.0 framework and can be easily built from a Visual Studio 2010 command prompt.  You can find the code below:

    using System;
    using System.Numerics;
    using System.Threading;
    using System.Threading.Tasks;
    using System.Collections.Concurrent;

    class EntryPoint
        const int FactorialsToCompute = 2000;

        static void Main() {
            var numbers = new ConcurrentDictionary<BigInteger, BigInteger>(4, FactorialsToCompute);

            // Create a factorial delegate.
            Func<BigInteger, BigInteger> factorial = null;
            factorial = (n) => ( n == 0 ) ? 1 : n * factorial(n-1);

            // Now compute the factorial of the list
            // concurrently.
            Parallel.For( 0,
                          (i) => {
                            numbers[i] = factorial(i);
                          } );

    The spirit of this code is to concurrently compute the first 2000 factorials and store the results in a dictionary.  This code uses the new Task Parallel Library to distribute this work evenly across the multiple cores on the system.  To compile the example (assuming the code is stored in test.cs), you can execute the following command from a Visual Studio 2010 command prompt:

    csc /r:system.numerics.dll test.cs

    Note:    If you are using a 64bit platform, be sure to use the x64 command prompt shortcut installed by the Visual Studio 2010 installer.

    You can download a free evaluation of Visual Studio 2010

    x64 Unwind Metadata

    So how does the debugger and functions such as RtlVirtualUnwind know how to walk the x64 stack if it cannot find a base pointer?  The secret is that it uses unwind metadata that is typically baked into the Portable Executable (PE) file at link time. You can inspect this information using the /UNWINDINFO option of the command line tool dumpbin.  For example, I went to the directory on my machine which contains clr.dll (c:\Windows\Microsoft.NET\Framework\v4.0.30319) and dumped the unwind info looking for CLREvent::WaitEx, which I have pasted below:

      00013F20 000DFDB0 000DFE3C 007267D8  ?WaitEx@CLREvent@@QEAAKKW4WaitMode@@PEAUPendingSync@@@Z (public: unsigned long __cdecl CLREvent::WaitEx(unsigned long,enum WaitMode,struct PendingSync *))
        Unwind version: 1
        Unwind flags: UHANDLER
        Size of prologue: 0x20
        Count of codes: 10
        Unwind codes:
          20: SAVE_NONVOL, register=rbp offset=0xB0
          1C: SAVE_NONVOL, register=rbx offset=0xA8
          0F: ALLOC_SMALL, size=0x70
          0B: PUSH_NONVOL, register=r14
          09: PUSH_NONVOL, register=r13
          07: PUSH_NONVOL, register=r12
          05: PUSH_NONVOL, register=rdi
          04: PUSH_NONVOL, register=rsi
        Handler: 0020ADF0 __CxxFrameHandler3
        EH Handler Data: 007B3F54

    I’ll get into what all of this means shortly.

    Note:    The dumpbin.exe functionality is also exposed via the linker.  For example, the command “dumpbin.exe /?” is identical to “link.exe /dump /?”.


    Within the debugger, you can find this same information for a particular function using the .fnent command.  To demonstrate, I executed the example code within a windbg instance and broke in at some random point and chose one of the threads to look at which has a stack looking like the following:

      12  Id: f80.7f0 Suspend: 1 Teb: 000007ff`fffa0000 Unfrozen
     # Child-SP          RetAddr           Call Site
    00 00000000`04a51e18 000007fe`fd4e10ac ntdll!NtWaitForSingleObject+0xa
    01 00000000`04a51e20 000007fe`f48bffc7 KERNELBASE!WaitForSingleObjectEx+0x79
    02 00000000`04a51ec0 000007fe`f48bff70 clr!CLREvent::WaitEx+0x170
    03 00000000`04a51f00 000007fe`f48bfe23 clr!CLREvent::WaitEx+0xf8
    04 00000000`04a51f60 000007fe`f48d51d8 clr!CLREvent::WaitEx+0x5e
    05 00000000`04a52000 000007fe`f4995249 clr!SVR::gc_heap::wait_for_gc_done+0x98
    06 00000000`04a52030 000007fe`f48aef28 clr!SVR::GCHeap::Alloc+0xb4
    07 00000000`04a520a0 000007fe`f48aecc9 clr!FastAllocatePrimitiveArray+0xc5
    08 00000000`04a52120 000007fe`f071244c clr!JIT_NewArr1+0x389
    09 00000000`04a522f0 000007fe`f07111b5 System_Numerics_ni+0x2244c
    0a 00000000`04a52330 000007ff`00150acf System_Numerics_ni+0x211b5
    0b 00000000`04a523d0 000007ff`0015098c 0x7ff`00150acf
    0c 00000000`04a52580 000007ff`0015098c 0x7ff`0015098c
    0d 00000000`04a52730 000007ff`0015098c 0x7ff`0015098c
    0e 00000000`04a528e0 000007ff`0015098c 0x7ff`0015098c
    0f 00000000`04a52a90 000007ff`0015098c 0x7ff`0015098c
    10 00000000`04a52c40 000007ff`0015098c 0x7ff`0015098c
    11 00000000`04a52df0 000007ff`0015098c 0x7ff`0015098c
    12 00000000`04a52fa0 000007ff`0015098c 0x7ff`0015098c
    13 00000000`04a53150 000007ff`0015098c 0x7ff`0015098c

    At first glance, it may appear that this stack is already trashed since there is no symbol information for the bottom frames in the display.  Before jumping to this conclusion, recall that this is a managed application and therefore contains JIT compiled code.  To verify that the addresses without symbol information are JIT’ed code, you can do a couple of things.

    First, use the !EEHeap extension in the SOS extension to determine if these addresses reside in the JIT code heap.  Below, you can see the commands I used to both load the SOS extension and then display the Execution Engine (EE) Heap information:

    0:014> .loadby sos clr

    0:014> !EEHeap -loader
    Loader Heap:
    System Domain:     000007fef50955a0
    LowFrequencyHeap:  000007ff00020000(2000:1000) Size: 0x1000 (4096) bytes.
    HighFrequencyHeap: 000007ff00022000(8000:1000) Size: 0x1000 (4096) bytes.
    StubHeap:          000007ff0002a000(2000:2000) Size: 0x2000 (8192) bytes.
    Virtual Call Stub Heap:
      IndcellHeap:     000007ff000d0000(6000:1000) Size: 0x1000 (4096) bytes.
      LookupHeap:      000007ff000dc000(4000:1000) Size: 0x1000 (4096) bytes.
      ResolveHeap:     000007ff00106000(3a000:1000) Size: 0x1000 (4096) bytes.
      DispatchHeap:    000007ff000e0000(26000:1000) Size: 0x1000 (4096) bytes.
      CacheEntryHeap:  Size: 0x0 (0) bytes.
    Total size:        Size: 0x8000 (32768) bytes.
    Shared Domain:     000007fef5095040
    LowFrequencyHeap:  000007ff00020000(2000:1000) Size: 0x1000 (4096) bytes.
    HighFrequencyHeap: 000007ff00022000(8000:1000) Size: 0x1000 (4096) bytes.
    StubHeap:          000007ff0002a000(2000:2000) Size: 0x2000 (8192) bytes.
    Virtual Call Stub Heap:
      IndcellHeap:     000007ff000d0000(6000:1000) Size: 0x1000 (4096) bytes.
      LookupHeap:      000007ff000dc000(4000:1000) Size: 0x1000 (4096) bytes.
      ResolveHeap:     000007ff00106000(3a000:1000) Size: 0x1000 (4096) bytes.
      DispatchHeap:    000007ff000e0000(26000:1000) Size: 0x1000 (4096) bytes.
      CacheEntryHeap:  Size: 0x0 (0) bytes.
    Total size:        Size: 0x8000 (32768) bytes.
    Domain 1:          00000000003e73c0
    LowFrequencyHeap:  000007ff00030000(2000:1000) 000007ff00140000(10000:5000) Size: 0x6000 (24576) bytes total, 0x1000 (4096) bytes wasted.
    HighFrequencyHeap: 000007ff00032000(8000:5000) Size: 0x5000 (20480) bytes.
    StubHeap:          Size: 0x0 (0) bytes.
    Virtual Call Stub Heap:
      IndcellHeap:     000007ff00040000(4000:1000) Size: 0x1000 (4096) bytes.
      LookupHeap:      000007ff0004b000(2000:1000) Size: 0x1000 (4096) bytes.
      ResolveHeap:     000007ff0007c000(54000:1000) Size: 0x1000 (4096) bytes.
      DispatchHeap:    000007ff0004d000(2f000:1000) Size: 0x1000 (4096) bytes.
      CacheEntryHeap:  Size: 0x0 (0) bytes.
    Total size:        Size: 0xf000 (61440) bytes total, 0x1000 (4096) bytes wasted.
    Jit code heap:
    LoaderCodeHeap:    000007ff00150000(40000:2000) Size: 0x2000 (8192) bytes.
    Total size:        Size: 0x2000 (8192) bytes.

    Module Thunk heaps:
    Module 000007fee5581000: Size: 0x0 (0) bytes.
    Module 000007ff000330d8: Size: 0x0 (0) bytes.
    Module 000007fef06f1000: Size: 0x0 (0) bytes.
    Total size:      Size: 0x0 (0) bytes.
    Module Lookup Table heaps:
    Module 000007fee5581000: Size: 0x0 (0) bytes.
    Module 000007ff000330d8: Size: 0x0 (0) bytes.
    Module 000007fef06f1000: Size: 0x0 (0) bytes.
    Total size:      Size: 0x0 (0) bytes.
    Total LoaderHeap size:   Size: 0x21000 (135168) bytes total, 0x1000 (4096) bytes wasted.


    I have highlighted the JIT heap information and you can see that the JIT’ed code instruction pointers in the stack fall within this range.

    The second sanity check you can perform is to use a variant of the u instruction to confirm that there is a call instruction just prior to that address as shown below:

    0:012> ub 0x7ff`0015098c
    000007ff`0015095e 488b01          mov     rax,qword ptr [rcx]
    000007ff`00150961 48898424b0000000 mov     qword ptr [rsp+0B0h],rax
    000007ff`00150969 488b4108        mov     rax,qword ptr [rcx+8]
    000007ff`0015096d 48898424b8000000 mov     qword ptr [rsp+0B8h],rax
    000007ff`00150975 4c8d8424b0000000 lea     r8,[rsp+0B0h]
    000007ff`0015097d 488b5308        mov     rdx,qword ptr [rbx+8]
    000007ff`00150981 488d8c24c0000000 lea     rcx,[rsp+0C0h]
    000007ff`00150989 ff5318          call    qword ptr [rbx+18h]

    So at this point we have verified that we probably have a valid stack.  But how does the debugger so effectively walk this stack for us if there is no stack frame pointer?  The answer, of course, is that it uses the unwind information.

    To explore the answer to that question, let’s focus on a particular frame within the stack such as frame 4 in the stack above.  The code at that frame is inside the function clr!CLREvent::WaitEx, and if we pass that to .fnent, we get the following output:

    0:012> .fnent clr!CLREvent::WaitEx
    Debugger function entry 00000000`04075e40 for:
    (000007fe`f48bfdb0)   clr!CLREvent::WaitEx   |  (000007fe`f48bfe3c)   clr!CLREvent::Set
    Exact matches:
        clr!CLREvent::WaitEx = <no type information>

    BeginAddress      = 00000000`000dfdb0
    EndAddress        = 00000000`000dfe3c
    UnwindInfoAddress = 00000000`007267d8

    Unwind info at 000007fe`f4f067d8, 20 bytes
      version 1, flags 2, prolog 20, codes a
      frame reg 0, frame offs 0
      handler routine: clr!_CxxFrameHandler3 (000007fe`f49eadf0), data 7b3f54
      00: offs 20, unwind op 4, op info 5    UWOP_SAVE_NONVOL FrameOffset: b0
      02: offs 1c, unwind op 4, op info 3    UWOP_SAVE_NONVOL FrameOffset: a8
      04: offs f, unwind op 2, op info d     UWOP_ALLOC_SMALL
      05: offs b, unwind op 0, op info e     UWOP_PUSH_NONVOL
      06: offs 9, unwind op 0, op info d     UWOP_PUSH_NONVOL
      07: offs 7, unwind op 0, op info c     UWOP_PUSH_NONVOL
      08: offs 5, unwind op 0, op info 7     UWOP_PUSH_NONVOL
      09: offs 4, unwind op 0, op info 6     UWOP_PUSH_NONVOL

     Notice that this output is virtually identical to the same information provided by dumpbin using the /UNWINDINFO option.

    I have highlighted two interesting values above.  The value highlighted in green is a relative virtual address (RVA) to the unwind info that is baked into the PE file by the linker.  The value highlighted in yellow is the actual virtual address of the unwind info and can be computed by adding the module base address shown below to the RVA for UnwindInfoAddress.

    0:012> lmnm clr

    start             end                 module name

    000007fe`f47e0000 000007fe`f5145000   clr

    By examining the PE header using !dh you can confirm that the unwind information resides in the .rdata section of the module, which I have shown below:

    0:012> !dh clr

    File Type: DLL
        8664 machine (X64)
           6 number of sections
    4BA21EEB time date stamp Thu Mar 18 07:39:07 2010

      .rdata name
      1FC8EC virtual size
      67F000 virtual address
      1FCA00 size of raw data
      67E200 file pointer to raw data
           0 file pointer to relocation table
           0 file pointer to line numbers
           0 number of relocations
           0 number of line numbers
    40000040 flags
             Initialized Data
             (no align specified)
             Read Only

    Using the Unwind Info

    Now let’s take a look at the unwind info and compare it to the prolog code of the function with which it is associated.  For convenience, I have reprinted the .fnent output for the function:

    0:012> .fnent clr!CLREvent::WaitEx
    Debugger function entry 00000000`04075e40 for:
    (000007fe`f48bfdb0)   clr!CLREvent::WaitEx   |  (000007fe`f48bfe3c)   clr!CLREvent::Set
    Exact matches:
        clr!CLREvent::WaitEx = <no type information>

    BeginAddress      = 00000000`000dfdb0
    EndAddress        = 00000000`000dfe3c
    UnwindInfoAddress = 00000000`007267d8

    Unwind info at 000007fe`f4f067d8, 20 bytes
      version 1, flags 2, prolog 20, codes a
      frame reg 0, frame offs 0
      handler routine: clr!_CxxFrameHandler3 (000007fe`f49eadf0), data 7b3f54
      00: offs 20, unwind op 4, op info 5    UWOP_SAVE_NONVOL FrameOffset: b0
      02: offs 1c, unwind op 4, op info 3    UWOP_SAVE_NONVOL FrameOffset: a8
      04: offs f, unwind op 2, op info d     UWOP_ALLOC_SMALL
      05: offs b, unwind op 0, op info e     UWOP_PUSH_NONVOL
      06: offs 9, unwind op 0, op info d     UWOP_PUSH_NONVOL
      07: offs 7, unwind op 0, op info c     UWOP_PUSH_NONVOL
      08: offs 5, unwind op 0, op info 7     UWOP_PUSH_NONVOL
      09: offs 4, unwind op 0, op info 6     UWOP_PUSH_NONVOL

    The yellow highlighted value tells us that the prolog code for the function is 0x20 bytes in length. Using that information we can dump out the prolog code for the function:

    0:012> u clr!CLREvent::WaitEx clr!CLREvent::WaitEx+20
    000007fe`f48bfdb0 488bc4          mov     rax,rsp
    000007fe`f48bfdb3 56              push    rsi
    000007fe`f48bfdb4 57              push    rdi
    000007fe`f48bfdb5 4154            push    r12
    000007fe`f48bfdb7 4155            push    r13
    000007fe`f48bfdb9 4156            push    r14
    000007fe`f48bfdbb 4883ec70        sub     rsp,70h
    000007fe`f48bfdbf 48c7442440feffffff mov   qword ptr [rsp+40h],0FFFFFFFFFFFFFFFEh
    000007fe`f48bfdc8 48895810        mov     qword ptr [rax+10h],rbx
    000007fe`f48bfdcc 48896818        mov     qword ptr [rax+18h],rbp

    The list of operations in the unwind info is listed in the reverse order of the operations in the assembly code.  Each of the UWOP_PUSH_NONVOL operations in the unwind info maps to a nonvolatile register that is pushed onto the stack for safe keeping in the prolog code.  I have highlighted the sections within the prolog and the .fnent output such that highlighting with like colors indicates related information.  Now, let’s take a look at the raw stack and tie all of this information together. 

    Below is the stack with the frame we are focusing on highlighted in yellow:

    0:012> kn
     # Child-SP          RetAddr           Call Site
    00 00000000`04a51e18 000007fe`fd4e10ac ntdll!NtWaitForSingleObject+0xa
    01 00000000`04a51e20 000007fe`f48bffc7 KERNELBASE!WaitForSingleObjectEx+0x79
    02 00000000`04a51ec0 000007fe`f48bff70 clr!CLREvent::WaitEx+0x170
    03 00000000`04a51f00 000007fe`f48bfe23 clr!CLREvent::WaitEx+0xf8
    04 00000000`04a51f60 000007fe`f48d51d8 clr!CLREvent::WaitEx+0x5e
    05 00000000`04a52000 000007fe`f4995249 clr!SVR::gc_heap::wait_for_gc_done+0x98
    06 00000000`04a52030 000007fe`f48aef28 clr!SVR::GCHeap::Alloc+0xb4
    07 00000000`04a520a0 000007fe`f48aecc9 clr!FastAllocatePrimitiveArray+0xc5
    08 00000000`04a52120 000007fe`f071244c clr!JIT_NewArr1+0x389
    09 00000000`04a522f0 000007fe`f07111b5 System_Numerics_ni+0x2244c
    0a 00000000`04a52330 000007ff`00150acf System_Numerics_ni+0x211b5
    0b 00000000`04a523d0 000007ff`0015098c 0x7ff`00150acf
    0c 00000000`04a52580 000007ff`0015098c 0x7ff`0015098c
    0d 00000000`04a52730 000007ff`0015098c 0x7ff`0015098c
    0e 00000000`04a528e0 000007ff`0015098c 0x7ff`0015098c
    0f 00000000`04a52a90 000007ff`0015098c 0x7ff`0015098c
    10 00000000`04a52c40 000007ff`0015098c 0x7ff`0015098c
    11 00000000`04a52df0 000007ff`0015098c 0x7ff`0015098c
    12 00000000`04a52fa0 000007ff`0015098c 0x7ff`0015098c
    13 00000000`04a53150 000007ff`0015098c 0x7ff`0015098c

    Note:    The symbols above look a little weird and may lead you to believe that WaitEx is calling itself recursively, but it is not.  It only appears that way because you need the private symbols for clr.dll to be able to see the real function name.  Only public symbols are available outside of Microsoft.


    And below is the raw stack relevant to this frame with some highlighting and annotations that I have added:

    0:012> dps 00000000`04a51f60-10 L20
    00000000`04a51f50  00000000`00000001
    00000000`04a51f58  000007fe`f48bfe23 clr!CLREvent::WaitEx+0x5e
    00000000`04a51f60  00000000`c0402388
    00000000`04a51f68  00000000`c0402500
    00000000`04a51f70  000007fe`f48afaf0 clr!SystemNative::ArrayCopy
    00000000`04a51f78  00000000`00000182
    00000000`04a51f80  00000000`04a521d0
    00000000`04a51f88  000007fe`00000001
    00000000`04a51f90  00000000`00000057
    00000000`04a51f98  00000000`c0402398
    00000000`04a51fa0  ffffffff`fffffffe
    00000000`04a51fa8  007f0000`04a521d0
    00000000`04a51fb0  fffff880`009ca540
    00000000`04a51fb8  000007fe`f483da5b clr!SVR::heap_select::select_heap+0x1c
    00000000`04a51fc0  fffff880`009ca540
    00000000`04a51fc8  000007fe`fd4e18aa KERNELBASE!ResetEvent+0xa
    00000000`04a51fd0  00000000`0043dc60
    00000000`04a51fd8  00000000`00000178
    00000000`04a51fe0  00000000`00493c10
    00000000`04a51fe8  00000000`0043dc60    
    ß saved rdi
    00000000`04a51ff0  00000000`00000001

       *** call into clr!CLREvent::WaitEx ***

    00000000`04a51ff8  000007fe`f48d51d8 clr!SVR::gc_heap::wait_for_gc_done+0x98
    00000000`04a52000  00000000`00493ba0
    00000000`04a52008  00000000`00493ba0    
    ß saved rbx
    00000000`04a52010  00000000`00000058    
    ß saved rbp
    00000000`04a52018  000007fe`f0711e0f System_Numerics_ni+0x21e0f
    00000000`04a52020  00000000`00000178
    00000000`04a52028  000007fe`f4995249 clr!SVR::GCHeap::Alloc+0xb4
    00000000`04a52030  00000000`0043a140
    00000000`04a52038  00000000`0043dc60
    00000000`04a52040  00000000`00000000
    00000000`04a52048  00000000`04a522e0

    In the stack listing I have used the same color highlighting scheme as before to show how the data on the raw stack correlates to the unwind data.  And, using green highlighting, I have shown how the Child-SP value correlates to the stack frame.

    The cyan highlighting represents nonvolatile registers that are pushed onto the stack in the prolog code.  The blue highlighting represents stack space reserved for locals and for register home space allocated for calling sub routines.  In the unwind data the stack reservation is represented by a UWOP_ALLOC_SMALL operation.  And the red highlighting represents nonvolatile registers that are stored in the home space of the previous stack frame and represented by a UWOP_SAVE_NONVOL operation stored in the unwind information.

    As you can see, we have all of the information we need in the unwind data to determine which slots on the stack are used for what.  The only thing we don’t know is the partitioning of the reserved stack space for locals, which is described by the private symbol information for the clr.dll module.

    Tying it all Together

    .fnent produces its output directly from parsing the definition of the UNWIND_INFO structure and it even gives you the address of where that structure lives in memory.  The UNWIND_INFO structure also contains a variable amount of UNWIND_CODE structures.  You can find details of the structure definitions for UNWIND_INFO and UNWIND_CODE here.  Each parsed line of unwind information in the .fnent output is backed by at least one of these structures.  In fact, you can see the correlation between the structure fields for UNWIND_INFO and the data in the .fnent output as shown below:



    Offset in prolog

    UBYTE: 4

    Unwind operation code

    UBYTE: 4

    Operation info

    From .fnent:

      05: offs b, unwind op 0, op info e     UWOP_PUSH_NONVOL

    The meaning of the OpInfo (operation info) field is dependent on the UnwindOp (unwind operation code) field and is spelled out in the documentation for UNWIND_CODE.  For example, for the UWOP_PUSH_NONVOL operation shown above, the OpInfo field is an index into the following table, which indicates which nonvolatile register this push is associated with.  Note that the values in the below table are in decimal, while the .fnent values are in hex:

















    8 to 15

    R8 to R15

    Therefore, the previous line from the .fnent output represents a push operation for the r14 register (05: offs b, unwind op 0, op info e   UWOP_PUSH_NONVOL).  Looking at the assembly above, we see that the topmost UWOP_PUSH_NONVOL operation in the .fnent output correlates to the last nonvolatile register push in the prolog code (push    r14).

    Note:    Remember, the push operations in the .fnent output are listed in the reverse order of where they are in the actual prolog code.  This helps the unwind code easily calculate offsets of where they should live in the stack.


    One thing that you will notice in the x64 calling convention is that once the prolog code has executed, the value for rsp will very rarely change.  The Child-SP value in the stack displayed by the k commands is the value of rsp for that frame after the prolog code has executed.  So the offsets to access these nonvolatile registers are then applied to the Child-SP value (previously highlighted in green) to find where they live on the stack.  So, in a way, the Child-SP value acts like the base pointer we are used to on the x86 platform.

    In the .fnent output above, you will also see the following:

      00: offs 20, unwind op 4, op info 5    UWOP_SAVE_NONVOL FrameOffset: b0

    For UWOP_SAVE_NONVOL, you see that the .fnent output shows us the offset where we can find this register, and the register in question is represented by the OpInfo value that equates to rbp.  The offset above is applied to the Child-SP value (00000000`04a51f60 in this case) to produce the address 00000000`04a52010, which indicates that’s where we can find a saved copy of rbp.  I have also annotated where it lives in the raw stack output shown previously.

    Note:    If you’re wondering why rbp is stored in the previous stack frame, check out my previous post on this topic where I describe how in optimized builds, the compiler can use the home space from the previous stack frame to save nonvolatile registers thus saving them with a MOV operation as opposed to a PUSH operation.  This is possible because in optimized builds the home space is not necessarily used to store parameters.


    So how does all of This Work for CLR JIT Code?

    If you have asked this question, then you are definitely paying attention!  As we have shown, the compiler and linker are responsible for placing unwind info in the Portable Executable file at build time.  But what about dynamic code that is generated at runtime?  Certainly there must be unwind information for dynamically compiled code as well, otherwise there would be no way to walk the stack or unwind the stack after an exception.

    As it turns out, APIs exist for this very situation, including RtlAddFunctionTable  and RtlInstallFunctionTableCallback.  In fact, the CLR uses RtlInstallFunctionTableCallback.  The generated unwind information is then rooted in a linked list where the head is at ntdll!RtlpDynamicFunctionTable.  The format of the linked list items is undocumented as it is an implementation detail, but using dbghelp.dll you can find the unwind information for a given instruction pointer if you so desire by calling SymFunctionTableAccess64.

    In fact, if you want to see the CLR adding dynamic unwind info in action you can run the test code above under the debugger, and then at the initial breakpoint, before the application starts running, set the following breakpoint:

    bu ntdll!RtlInstallFunctionTableCallback

    When you let the application run you should then end up with a call stack at the breakpoint that looks like the following, which clearly shows the JIT compiler adding the unwind info to the table dynamically:

    0:000> kn
     # Child-SP          RetAddr           Call Site
    00 00000000`0017dca8 000007fe`f4832cc6 ntdll!RtlInstallFunctionTableCallback
    01 00000000`0017dcb0 000007fe`f4831422 clr!InstallEEFunctionTable+0x77
    02 00000000`0017df60 000007fe`f4828ca8 clr!StubLinker::EmitUnwindInfo+0x492
    03 00000000`0017e050 000007fe`f4832c1a clr!StubLinker::EmitStub+0xe8
    04 00000000`0017e0b0 000007fe`f48328e5 clr!StubLinker::LinkInterceptor+0x1ea
    05 00000000`0017e160 000007fe`f4831e40 clr!CTPMethodTable::CreateStubForNonVirtualMethod+0xa35
    06 00000000`0017e300 000007fe`f4832926 clr!CRemotingServices::GetStubForNonVirtualMethod+0x50
    07 00000000`0017e3c0 000007fe`f48223f3 clr!MethodDesc::DoPrestub+0x38b
    08 00000000`0017e4d0 000007fe`f47e2d07 clr!PreStubWorker+0x1df
    09 00000000`0017e590 000007fe`f48210b4 clr!ThePreStubAMD64+0x87
    0a 00000000`0017e660 000007fe`f48211c9 clr!CallDescrWorker+0x84
    0b 00000000`0017e6d0 000007fe`f4821245 clr!CallDescrWorkerWithHandler+0xa9
    0c 00000000`0017e750 000007fe`f4823cf1 clr!MethodDesc::CallDescr+0x2a1
    0d 00000000`0017e9b0 000007fe`f49cdc3d clr!MethodDescCallSite::Call+0x35
    0e 00000000`0017e9f0 000007fe`f4999f0d clr!AppDomain::InitializeDomainContext+0x1ac
    0f 00000000`0017ebf0 000007fe`f49212a1 clr!SystemDomain::InitializeDefaultDomain+0x13d
    10 00000000`0017f0c0 000007fe`f4923dd6 clr!SystemDomain::ExecuteMainMethod+0x191
    11 00000000`0017f670 000007fe`f4923cf3 clr!ExecuteEXE+0x43
    12 00000000`0017f6d0 000007fe`f49a7365 clr!CorExeMainInternal+0xc4
    13 00000000`0017f740 000007fe`f8ad3309 clr!CorExeMain+0x15

    But there is one more wrinkle to this picture.  We now know that by using RtlInstallFunctionTableCallback the CLR, or any other JIT engine, can register a callback that provides the unwind information at runtime.  But how does the debugger access this information?  When the debugger is broken into the process or if you are debugging a dump, it cannot execute the callback function registered with RtlInstallFunctionTableCallback.

    This is where the sixth and final parameter to RtlInstallFunctionTableCallback comes into play.  By providing the OutOfProcessCallbackDll parameter, the CLR is providing a dll which the debugger can use to effectively parse through the JITer’s unwind information statically.  When inspecting which path the CLR passes for OutOfProcessCallbackDll on my machine, I see the following string:

    0:000> du /c 80 000007fe`f5916160
    000007fe`f5916160  "C:\Windows\Microsoft.NET\Framework64\v4.0.30319\mscordacwks.dll"

    So, the debugger uses mscordacwks.dll to statically examine the unwind info while the process is broken in the debugger or while inspecting a dump.

    Note:    This is one of the many reasons why you must have a complete process dump to effectively post-mortem debug managed applications.

    Using the ‘k =’ Command to Dump the Stack

    If you look at the documentation for the k command, you’ll see that there is a way to override the base pointer when walking the stack.  However, the documentation leaves it a complete mystery as to how to apply this in the x64 world.  To demonstrate what I mean, consider the following stack from earlier:

    0:012> kn
     # Child-SP          RetAddr           Call Site
    00 00000000`04a51e18 000007fe`fd4e10ac ntdll!NtWaitForSingleObject+0xa
    01 00000000`04a51e20 000007fe`f48bffc7 KERNELBASE!WaitForSingleObjectEx+0x79

    02 00000000`04a51ec0 000007fe`f48bff70 clr!CLREvent::WaitEx+0x170
    03 00000000`04a51f00 000007fe`f48bfe23 clr!CLREvent::WaitEx+0xf8
    04 00000000`04a51f60 000007fe`f48d51d8 clr!CLREvent::WaitEx+0x5e
    05 00000000`04a52000 000007fe`f4995249 clr!SVR::gc_heap::wait_for_gc_done+0x98
    06 00000000`04a52030 000007fe`f48aef28 clr!SVR::GCHeap::Alloc+0xb4
    07 00000000`04a520a0 000007fe`f48aecc9 clr!FastAllocatePrimitiveArray+0xc5
    08 00000000`04a52120 000007fe`f071244c clr!JIT_NewArr1+0x389
    09 00000000`04a522f0 000007fe`f07111b5 System_Numerics_ni+0x2244c
    0a 00000000`04a52330 000007ff`00150acf System_Numerics_ni+0x211b5
    0b 00000000`04a523d0 000007ff`0015098c 0x7ff`00150acf
    0c 00000000`04a52580 000007ff`0015098c 0x7ff`0015098c
    0d 00000000`04a52730 000007ff`0015098c 0x7ff`0015098c
    0e 00000000`04a528e0 000007ff`0015098c 0x7ff`0015098c
    0f 00000000`04a52a90 000007ff`0015098c 0x7ff`0015098c
    10 00000000`04a52c40 000007ff`0015098c 0x7ff`0015098c
    11 00000000`04a52df0 000007ff`0015098c 0x7ff`0015098c
    12 00000000`04a52fa0 000007ff`0015098c 0x7ff`0015098c
    13 00000000`04a53150 000007ff`0015098c 0x7ff`0015098c

    Now, imagine the top of the stack is corrupted, which I have “simulated” by blacking out the top few frames in this stack dump.  Furthermore, let’s assume that we identified a frame where the stack starts to look sane again by looking at the raw stack below:

    0:012> dps 00000000`04a51e90 
    00000000`04a51e90  00000000`00000000
    00000000`04a51e98  00000000`04a52130
    00000000`04a51ea0  00000000`ffffffff
    00000000`04a51ea8  00000000`ffffffff
    00000000`04a51eb0  00000000`00000108
    00000000`04a51eb8  000007fe`f48bffc7 clr!CLREvent::WaitEx+0x170
    00000000`04a51ec0  00000000`00000000
    00000000`04a51ec8  00000000`00000108
    00000000`04a51ed0  000007fe`00000000
    00000000`04a51ed8  00000000`00000108
    00000000`04a51ee0  ffffffff`fffffffe
    00000000`04a51ee8  00000000`00000001
    00000000`04a51ef0  00000000`00000000
    00000000`04a51ef8  000007fe`f48bff70 clr!CLREvent::WaitEx+0xf8
    00000000`04a51f00  00000000`00000000
    00000000`04a51f08  00000000`00493ba0

    From looking at this stack, we can see the typical pattern of stack frames because the return addresses resolve to symbols of sorts.

    To dump out the corrupted stack, here is the undocumented syntax for the x64 platform:

    k = <rsp> <rip> <frame_count>

    <rsp> is the stack pointer to start with.  You want to use the stack pointer that would have been in rsp when that function was active.  Remember, typically rsp does not change after the function prolog code completes.  Therefore, if you pick the stack pointer just below the return address, you should be good.

    <rip> should be an instruction pointer from within the function that was executing at the time the <rsp> value above was in play.  In this case, the return address directly above <rsp> comes from that function and I have highlighted it in green.  This piece of information is critical so that the debugger can find the unwind metadata for the function that was current at this point in the stack.  Without it, the debugger cannot walk the stack.

    Armed with this information, you can construct a k command to dump the stack starting from this frame as shown below:

    0:012> kn = 00000000`04a51ec0 000007fe`f48bffc7 10
     # Child-SP          RetAddr           Call Site
    00 00000000`04a51ec0 000007fe`f48bff70 clr!CLREvent::WaitEx+0x170
    01 00000000`04a51f00 000007fe`f48bfe23 clr!CLREvent::WaitEx+0xf8
    02 00000000`04a51f60 000007fe`f48d51d8 clr!CLREvent::WaitEx+0x5e
    03 00000000`04a52000 000007fe`f4995249 clr!SVR::gc_heap::wait_for_gc_done+0x98
    04 00000000`04a52030 000007fe`f48aef28 clr!SVR::GCHeap::Alloc+0xb4
    05 00000000`04a520a0 000007fe`f48aecc9 clr!FastAllocatePrimitiveArray+0xc5
    06 00000000`04a52120 000007fe`f071244c clr!JIT_NewArr1+0x389
    07 00000000`04a522f0 000007fe`f07111b5 System_Numerics_ni+0x2244c
    08 00000000`04a52330 000007ff`00150acf System_Numerics_ni+0x211b5
    09 00000000`04a523d0 000007ff`0015098c 0x7ff`00150acf
    0a 00000000`04a52580 000007ff`0015098c 0x7ff`0015098c
    0b 00000000`04a52730 000007ff`0015098c 0x7ff`0015098c
    0c 00000000`04a528e0 000007ff`0015098c 0x7ff`0015098c
    0d 00000000`04a52a90 000007ff`0015098c 0x7ff`0015098c
    0e 00000000`04a52c40 000007ff`0015098c 0x7ff`0015098c
    0f 00000000`04a52df0 000007ff`0015098c 0x7ff`0015098c

    Note:    The frame count in the above k expression is required.  That is the way the debugger engine distinguishes between this variant of the command (with an overridden rip) and the documented form of k that does not provide an overridden rip.


    Since the x64 calling convention does not utilize a base pointer (among other things), we need some extra information to effectively walk the stack.  That extra information comes in the form of unwind metadata and is generated by the compiler and linker for static code and baked into the portable executable file.  If you happen to code in assembly language, there are various macros that you must use to decorate your assembly code so that the assembler can generate the proper unwind metadata.  For dynamically compiled code, that information is instead provided at runtime by registering a callback with the system.  Knowing this information is critical if you encounter a corrupted stack and must piece it together manually.  In such situations you’ll need to know how to dig out the unwind metadata manually and use it to effectively reconstruct the call stack.

    That said, you could spare yourself some effort and use the undocumented variant of the k command described above to dump the stack starting at any frame. J

    Happy debugging everyone!

    "The example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious.  No association with any real company, organization, product, domain name, email address, logo, person, places, or events is intended or should be inferred."

  • Ntdebugging Blog

    Uncovering How Workspaces Work in WinDbg


    Author - Jason Epperly

    Workspaces have always been a little confusing to me. I knew how to bend them to do what I needed to get the job done, however they still remained a bit mysterious. Recently I decided to sort this out, just so I knew how they worked under the hood. But before I show you my investigation let's discuss the different types of workspaces. Windbg uses several built-in types including Base, User, Kernel, Remote, Processor Architecture, Per Dump, and Per Executable. It also uses named workspaces (or user defined workspaces). When you perform a particular type of debugging (e.g. live user-mode, post-mortem dump analysis etc.) these workspaces are combined into the final environment. Here's a diagram to illustrate the possible combination of workspaces.


    • The green line is the case where WinDbg is used to open a dump file. In this scenario the Base workspace + per dump workspace is used. Note: per dump simply means each individual dump file opened gets its own workspace.
    • The blue line is the scenario where WinDbg is used to live debug a running application, using the Base+User-mode workspaces.
    • The orange line is an example of WinDbg used to perform a live kernel debug on an x86 machine. In this case windbg is using Base+Kernel+x86 workspaces.

    From the diagram you can see windbg typically uses a combination of two workspaces. While live kernel debugging it uses three workspaces.

    So what is in a workspace?

    • Session Information
      • All break point (bp's) and handling information for exceptions and events (sx settings).
      • All open source files.
      • All user-defined aliases.
    • Configuration settings
      • The symbol path.
      • The executable image path.
      • The source path.
      • The current source options that were set with l+, l- (Set Source Options).
      • Log file settings.
      • The COM or 1394 kernel connection settings, if the connection was started by using the graphical interface.
      • The most recent paths in each Open dialog box (except for the workspace file and text file paths, which are not saved).
      • The current .enable_unicode, .force_radix_output, and .enable_long_status settings.
    • WinDbg graphical interface.
      • The title of the WinDbg window
      • The Automatically Open Disassembly setting
      • The default font
      • The size and position of the WinDbg window on the desktop.
      • Which debugging information windows are open.
      • The size and position of each open window, including the window's size, its floating or docked status, whether it is tabbed with other windows, and all of the related settings in its shortcut menu.
      • The location of the pane boundary in the Debugger Command window and the word wrap setting in that window.
      • Whether the toolbar and status bar, and the individual toolbars on each debugging information window, are visible.
      • The customization of the Registers window.
      • The flags in the Calls window, Locals window, and Watch window.
      • The items that were viewed in the Watch window.
      • The cursor location in each Source window.

    All of these settings (except for the blue ones) are applied cumulatively (Base first, then the next workspace, etc). The blue items above are only loaded from the last workspace in the chain. To show this in action I created a simple walk through to illustrate the use of workspaces the debugger.

    First I opened windbg without the use of any command line options. When it opens in this dormant state (not attached to anything and has nothing opened) its using the Base workspace. If I don't change anything (e.g. window placement) I am not prompted with any workspace dialogs when I start debugging. However if I moved the debugger's main window to any location (we will call this position 1) followed by executing any of the highlighted operations below -


    I am prompted with this dialog-


    Choosing "Yes" on the dialog above integrates my changes into the "Base" workspace so window position 1 is now part of my Base workspace.

    Now I'm going to select "Open Executable" and browse to our old faithful target binary notepad.exe. Once the binary is opened, windbg uses Base+Notepad (per Executable file). Now I'll move the debugger's main window again (we will call this position 2) and choose the option Debug > Stop debugging. Because of the window location change, I am prompted with the following-


    If I choose 'Yes', windbg will use window position 2 for anytime I open the notepad executable in the future. After closing the notepad.exe executable, windbg reverts back to using the Base workspace.

    This time I'll actually launch notepad (not from the debugger) and attach to the running notepad.exe process with the debugger. We are now at Base+User-mode. I moved the debugger window (new position 3), selected Debug > Stop Debugging and get prompted with this dialog-


    Choosing "Yes" will store WinDbg window position 3 in the User-mode workspace. Once I have completed this step, Windbg is again using the Base workspace because we stopped debugging.

    To futher illustrate workspaces I'll attach a to a target Virtual Machine for Kernel Debugging but not break in. Windbg is using Base+Kernel now. I moved the window again and as soon as I break-in I get this dialog-


    I chose 'No' on the dialog because I'm getting the hang of things. If I move the window again and type qd (quit and detach) to end the current kernel debug session, I will see this dialog-


    So before we ended the session, we were at Base+Kernel+AMD64.

    Running through this exercise helped me understand why I usually create a named workspace, change all my settings and use the command line option -W to open my workspace. Hopefully this will clear up some of the complexities involved with workspaces. This is why the debugger help file recommends making all the changes you need at the lowest possible level (i.e. Base first, then the others).

    Hope this helps...

  • Ntdebugging Blog

    Storport ETW Logging to Measure Requests Made to a Disk Unit


    Greetings NTDEBUGGING community, Bob here again, and today I would like to let everyone know about a new feature implemented in the STORPORT.SYS binaries released in the following KB articles:

    ·         KB 979764 (Windows Server 2008)

    ·         KB 978000 (Windows Server 2008 R2)


    Beginning with these new versions, it is now possible to measure timing statistics for requests made to a system’s disk unit.  These measurements are taken at the lowest possible level of OS interaction with the storage adapter hardware, making it much easier to diagnose storage performance issues. The measurements are taken from the port driver, STORPORT.SYS, via ETW (Event Tracing for Windows).  Windows drivers are implemented in a layered architecture, so in our case the STORPORT driver interfaces directly with the adapter or miniport driver.  The STORPORT driver is responsible for managing requests and queues, and providing other upper level functions.  Each miniport driver is specific to a particular storage adapter card, and knows how to deliver requests to the disk unit over the transport medium, like fibre channel for instance.

    Configuring and Starting STORPORT ETW Logging

    To measure storage performance, the user enters a threshold value via the GUI interface. (This can be scripted as well; see KB 978000 for details.)  When an I/O time is equal to or greater than the entered threshold, an event 13 is written to the ETW log.  

    Here is a walkthrough to enable this feature via the GUI:

    Click Start, type “perfmon”, and press ENTER to start Performance Monitor.



    Expand “Data Collector Sets” and select “Event Trace Sessions”.



    Right-Click on “Event Trace Sessions” and select “New -> Data Collector Set”. 



    The following dialog will appear:



    Give the new data collector set a name in the dialog box.  In this example I called it “Storport Performance Logging”. Choose the “Create manually (Advanced)” option then click Next to see the following dialog.



    After clicking “Add...” on the dialog above the following list of providers will appear. Select “Microsoft-Windows-StorPort” and click OK.



    When the dialog below opens, select “Keywords (Any)” then click Edit.



    Check the box for IOPerfNotification, and then click OK. Note:  For Windows Server 2008 SP1 and SP2, some properties may not be present (e. g. Reset, PortNotification and PauseResume). 




    Next, select “Filter” and click Edit…



    This is where we enter our time threshold.  Remember that timing values greater than or equal to this filter value will be logged to the trace file.


    Select “Binary” and check the “Filter enabled” box


    In the “Filter data” field, enter the threshold value in the number of milliseconds. This must be entered in little endian format. Please refer to this table for the most common entries.



    Binary(little endian)









    01 00 00 00 00 00 00 00

    05 00 00 00 00 00 00 00

    0A 00 00 00 00 00 00 00

    0F 00 00 00 00 00 00 00


    Note: The “Filter type” value will always remain 0 as in the example above showing an entry for 1 ms. 


    WARNING - The whole data line needs to be filled in when entering the threshold. For demonstration purposes here is the WRONG WAY to enter 256 ms. This will not work because of the missing 00 entries on the line.


    Below is the correct way to enter the value (256ms) Note: The Filter value may have to be re-entered for next run because it could get reset.


    After clicking OK, you will see a new Event Trace session (screenshot below).  Right-Click the new Event Trace session and select “Properties”



    Next we need to set a location for the trace file.  Ensure you have sufficient disk space in the target folder because logs can be many Megabytes in size depending on the number of hits. 



    The target folder may be changed in the “Directory” tab



    I recommend choosing a directory that is more easily located, like C:\PerfLogs, in which case you can click “Browse”



    Note:  It is not recommended to store the trace file on the same storage you are troubleshooting.


    The name for the file is selected on the “File” tab.


    When you are done setting the file and directory information click OK.

    Now right-click the log and select “Start” to begin tracing.


    The “Status” should indicate that it is running.


    How long to run the log depends on you issue.  You should let the log run until the issue or slow down occur then check the log for delays.

    Viewing the Log File

    First right-click and select “Stop” to flush any cached data and stop the logging.


    Then use Event Viewer to examine the log.



    Next choose "Open the saved log…"






    After clicking OK a popup dialog will be displayed to ask if you want to convert the log. Select “No”. Here is some sample event data that you will likely see in the log.



    Example: Event ID 13



    Let's examine the “Details” tab which shows information about the I/O that exceeded or equaled the threshold we had set.  If you do not set a threshold, then all I/O events are logged, but for performance reasons we recommend setting a "reasonable" threshold like 15ms.


    Breaking down the Event Data we have …


    Miniport Extension: (Kernel address of the miniport extension, not of use to performance logging, only debugging)

    PortNumber:  This is the adapter port number (i.e. RaidPort1, etc.)

    PathID:  Path number if multi-pathing is in place

    TargetID:  Target LUN exposed to the Operating System

    LUN:  The Logical Unit Number of the physical storage where the particular I/O is directed

    *RequestDuration:  How long the request took in milliseconds (the number of note here for storage latency diagnostics)

    Command:  the SCSI command sent to the unit. (Usually read or write)  This is listed in decimal.  Convert this value to hexadecimal with Calc.exe for referencing SCSI commands http://en.wikipedia.org/wiki/SCSI_command

    ScsiStatus:  the status that came back from the drive

    SRBStatus:  the status as returned from adapter  (see srb.h and scsi.h in the Microsoft WDK or http://en.wikipedia.org/wiki/SCSI_Status_Code)


     Example: Command 42 equals 2A which is a Write of 10 bytes.



    Here’s the respective SCSI command:


    When viewing the log, the commands will be mostly reads and writes.  If I/O is determined to be slow at this level, then the issue is most likely disk or SAN related.

    Hope you enjoyed the walkthrough!









Page 9 of 24 (240 items) «7891011»