• Ntdebugging Blog

    TalkBack Video: Cumulative Time Zone Update for Windows

    • 2 Comments

    Hi, my name is Joel Schaeffer, and I am an Escalation Engineer on the Platforms Global Escalation Services (GES) team.   I created a video to discuss the August 2007 Windows Cumulative Time Zone Update.  It is hosted on Channel9, and can be found here.  It will explain how to install the patch, validate the install, manually update non-mainstream Windows versions, and how to update Windows Mobile clients.

     

    See the following knowledge base articles for more information about how to update time zone information on your Windows clients and servers:

     

    August 2007 cumulative time zone update for Microsoft Windows operating systems

    http://support.microsoft.com/?id=933360

     

    How to configure daylight saving time for Microsoft Windows operating systems

    http://support.microsoft.com/?id=914387

     

    An August 2007 daylight saving time update for Windows Mobile-based devices is available

    http://support.microsoft.com/?id=941898

     

    See http://www.microsoft.com/dst2007 for more details about how to prepare for the upcoming DST changes for all Microsoft products.

  • Ntdebugging Blog

    Talkback video: Desktop Heap

    • 3 Comments

    Hello, Matthew here again.  Starting today, my team will be bringing you content in the form of videos, as well blog posts.  We’ll be hosting these videos on Channel 9, and we’ll link them from the ntdebugging blog.  One way that we’ll be using video is as a means of highlighting topics we’ve already covered, and then answering some of the common question that we receive from you in the form of comments on the blog.  So if you leave a comment with an interesting question, we may answer it in an upcoming video.  We are also thinking about using video as a way to show various debugging techniques rather than just blog about them. 

    For our first video, we’ve chosen desktop heap as a topic.  We’ve had a lot of traffic on the desktop heap posts, and we’ve had a lot of questions too.  So check out the video, and let us know what you think!

    http://channel9.msdn.com/Blogs/jeff_dailey/ntdebugging-blog-talk-about-on-desktop-heap

  • Ntdebugging Blog

    LPC Part 3: Debugging a real world LPC scenario

    • 4 Comments

    Hello my name is Roy; I’m an EE on the Microsoft global escalation services / CPR team.   This blog post continues my previous posts on LPC. 

    Below I will discuss scenarios that are simulated to depict actual issues encountered by Microsoft Customer Support Services. The two following scenarios deal with client threads waiting for LPC reply and hence in some way leading to user experience of a hung system or component.

     

    CASE1 – When everything in LPC looks intact

     

    In this scenario tracing the reason for client threads wait is possible through LPC data structures and extensions.

     

    The issue is a hung SMS (Systems Management Server) service. The OS (operating system) itself is working fine. It is just SMS that has hung.

     

    Taking a user mode hang dump (adplus.vbs hang mode dump) does not help since all we see is that threads in SMS are making RPC calls and are stuck. We need to investigate where these RPC calls are stuck. From the user dump we can see one important thing about the RPC calls is that they are implemented over LPC. LPC is completely implemented in the kernel and hence taking a kernel dump would be the next step in this case.

     

    02e9e698 7c821c94 ntdll!KiFastSystemCallRet

    02e9e69c 77c72700 ntdll!NtRequestWaitReplyPort+0xc              <--- RPC implemented over LPC

    02e9e6f4 77c713ba RPCRT4!LRPC_CCALL::SendReceive+0x230

    02e9e700 77788056 RPCRT4!I_RpcSendReceive+0x24          <--- RPC calls made

    02e9e71c 77788011 ole32!ThreadSendReceive+0xf5

    02e9e738 77787ed7 ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+0x112

    02e9e818 77695349 ole32!CRpcChannelBuffer::SendReceive2+0xc1

    02e9e884 776952e1 ole32!CAptRpcChnl::SendReceive+0xab

    02e9e8d8 77ce127e ole32!CCtxComChnl::SendReceive+0x113

    02e9e8f4 77ce13ca RPCRT4!NdrProxySendReceive+0x43

    02e9ecdc 77ce11bd RPCRT4!NdrClientCall2+0x206

    02e9ecfc 77c53a12 RPCRT4!ObjectStublessClient+0x8b

    02e9ed0c 77688fd2 RPCRT4!ObjectStubless+0xf

    02e9ed70 7768915a ole32!CRpcResolver::CreateInstance+0x14e

    02e9efbc 776a9b68 ole32!CClientContextActivator::CreateInstance+0xfa

    02e9effc 776a9a9a ole32!ActivationPropertiesIn::DelegateCreateInstance+0xf7

    02e9f7b0 776b1275 ole32!ICoCreateInstanceEx+0x3cc

    02e9f7d8 776b1244 ole32!CComActivator::DoCreateInstance+0x28

    02e9f7fc 74cf1a51 ole32!CoCreateInstanceEx+0x23

    02e9f830 74cf19fe wbemprox!CDCOMTrans::DoActualCCI+0x3d

    02e9f874 74cf16b1 wbemprox!CDCOMTrans::DoCCI+0x12d

    02e9f934 74cf152e wbemprox!CDCOMTrans::DoActualConnection+0x25f

    02e9f960 74cf22ab wbemprox!CDCOMTrans::DoConnection+0x25

    02e9f9a0 1806eaa9 wbemprox!CLocator::ConnectServer+0x7c

    WARNING: Stack unwind information not available. Following frames may be wrong.

    02e9fa50 186c31b6 ccmcore!CCM::Utility::Wmi::Namespace::Open+0x1a2

    02e9fb44 186c357f PolicyAgent!DllUnregisterServer+0x10d8

    02e9fbb0 18706a63 PolicyAgent!DllUnregisterServer+0x14a1

    02e9fc8c 18708899 PolicyAgentEndpoint!DllGetClassObject+0x73d2

    02e9fd6c 187097d9 PolicyAgentEndpoint!DllGetClassObject+0x9208

    02e9fde0 01068bb8 PolicyAgentEndpoint!DllGetClassObject+0xa148

    02e9fe6c 7c83b332 CcmExec+0x68bb8

    02e9fec8 7c8161d0 ntdll!RtlpWaitOrTimerCallout+0x74

    02e9feec 7c81a3c5 ntdll!RtlpAsyncWaitCallbackCompletion+0x37

    02e9ff44 7c8200fc ntdll!RtlpWorkerCallout+0x71

    02e9ff64 7c81a3fa ntdll!RtlpExecuteWorkerRequest+0x4f

    02e9ff78 7c82017f ntdll!RtlpApcCallout+0x11

    02e9ffb8 77e6608b ntdll!RtlpWorkerThread+0x61

    02e9ffec 00000000 kernel32!BaseThreadStart+0x34

     

    We examine threads belonging to ccmexec.exe from the kernel dump.

     

     

    0: kd> !process 0 0 ccmexec.exe

    PROCESS 895a6430  SessionId: 0  Cid: 01c8    Peb: 7ffde000  ParentCid: 0278

        DirBase: 6f3c7000  ObjectTable: e22d6b00  HandleCount: 832.

        Image: CcmExec.exe

     

    0: kd> .process /p /r 895a6430

    Implicit process is now 895a6430

    Loading User Symbols

    ............................................................................................

     

     

    0: kd> !thread 88e75b10

    THREAD 88e75b10  Cid 01c8.0de0  Teb: 7ffa6000 Win32Thread: e2365ea8 WAIT: (WrLpcReply) UserMode Non-Alertable

        88e75cfc  Semaphore Limit 0x1

    Waiting for reply to LPC MessageId 00039236:

    Current LPC port e2327022

    Not impersonating

    DeviceMap                 e10018a8

    Owning Process            895a6430       Image:         CcmExec.exe

    Wait Start TickCount      99831          Ticks: 46825 (0:00:12:11.640)

    Context Switch Count      14119                 LargeStack

    UserTime                  00:00:00.125

    KernelTime                00:00:00.046

    Win32 Start Address ntdll!RtlpWorkerThread (0x7c83ad38)

    Start Address kernel32!BaseThreadStartThunk (0x77e6b5f3)

    Stack Init b9425000 Current b9424c08 Base b9425000 Limit b9421000 Call 0

    Priority 13 BasePriority 8 PriorityDecrement 0

    Kernel stack not resident.

     

    ChildEBP RetAddr  Args to Child             

    b9424c20 8083e6a2 88e75b88 88e75b10 88e75bb8 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])

    b9424c4c 8083f164 88e75cfc 88e75cd0 88e75b10 nt!KiSwapThread+0x284 (FPO: [Non-Fpo])

    b9424c94 8093983f 88e75cfc 00000011 b9424d01 nt!KeWaitForSingleObject+0x346 (FPO: [Non-Fpo])

    b9424d50 80834d3f 00000114 0017db40 0017db40 nt!NtRequestWaitReplyPort+0x776 (FPO: [Non-Fpo])

    b9424d50 7c82ed54 00000114 0017db40 0017db40 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ b9424d64)

    02e9e698 7c821c94 77c72700 00000114 0017db40 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    02e9e69c 77c72700 00000114 0017db40 0017db40 ntdll!NtRequestWaitReplyPort+0xc (FPO: [3,0,0])

    02e9e6f4 77c713ba 00127fd4 02e9e71c 77788056 RPCRT4!LRPC_CCALL::SendReceive+0x230 (FPO: [Non-Fpo])

    02e9e700 77788056 00127fd4 02e9e800 0009f488 RPCRT4!I_RpcSendReceive+0x24 (FPO: [Non-Fpo])

    02e9e71c 77788011 00000000 00000000 00000000 ole32!ThreadSendReceive+0xf5 (FPO: [Non-Fpo])

    02e9e738 77787ed7 02e9e800 02e9e900 0009f488 ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+0x112 (FPO: [Non-Fpo])

    02e9e818 77695349 0009f488 02e9e910 02e9e900 ole32!CRpcChannelBuffer::SendReceive2+0xc1 (FPO: [Non-Fpo])

    02e9e884 776952e1 0009f488 02e9e910 02e9e900 ole32!CAptRpcChnl::SendReceive+0xab (FPO: [Non-Fpo])

    02e9e8d8 77ce127e 0009f488 02e9e910 02e9e900 ole32!CCtxComChnl::SendReceive+0x113 (FPO: [Non-Fpo])

    02e9e8f4 77ce13ca 000a1934 02e9e93c 0600016e RPCRT4!NdrProxySendReceive+0x43 (FPO: [Non-Fpo])

    02e9ecdc 77ce11bd 77676298 7767951e 02e9ed14 RPCRT4!NdrClientCall2+0x206 (FPO: [Non-Fpo])

    02e9ecfc 77c53a12 00000014 00000004 02e9ed70 RPCRT4!ObjectStublessClient+0x8b (FPO: [Non-Fpo])

    02e9ed0c 77688fd2 000a1934 00000000 02e9f1b4 RPCRT4!ObjectStubless+0xf

    02e9ed70 7768915a 02e9f1b4 02e9f704 00000010 ole32!CRpcResolver::CreateInstance+0x14e (FPO: [Non-Fpo])

    02e9efbc 776a9b68 7778fcd0 00000000 00000000 ole32!CClientContextActivator::CreateInstance+0xfa (FPO: [Non-Fpo])

    02e9effc 776a9a9a 02e9f1b4 00000000 02e9f704 ole32!ActivationPropertiesIn::DelegateCreateInstance+0xf7 (FPO: [Non-Fpo])

    02e9f7b0 776b1275 74cf1a8c 00000000 00000014 ole32!ICoCreateInstanceEx+0x3cc (FPO: [Non-Fpo])

    02e9f7d8 776b1244 74cf1a8c 00000000 00000014 ole32!CComActivator::DoCreateInstance+0x28 (FPO: [Non-Fpo])

    02e9f7fc 74cf1a51 74cf1a8c 00000000 00000014 ole32!CoCreateInstanceEx+0x23 (FPO: [Non-Fpo])

    02e9f830 74cf19fe 02e9f898 00000001 00000000 wbemprox!CDCOMTrans::DoActualCCI+0x3d (FPO: [Non-Fpo])

    02e9f874 74cf16b1 02e9f898 00000001 00000000 wbemprox!CDCOMTrans::DoCCI+0x12d (FPO: [Non-Fpo])

    02e9f934 74cf152e 01801974 00000000 00000000 wbemprox!CDCOMTrans::DoActualConnection+0x25f (FPO: [Non-Fpo])

    02e9f960 74cf22ab 01801974 00000000 00000000 wbemprox!CDCOMTrans::DoConnection+0x25 (FPO: [Non-Fpo])

    02e9f9a0 1806eaa9 00b594d0 01801974 00000000 wbemprox!CLocator::ConnectServer+0x7c (FPO: [Non-Fpo])

    WARNING: Stack unwind information not available. Following frames may be wrong.

    02e9fa50 186c31b6 0062bdaa 00000001 00000000 ccmcore!CCM::Utility::Wmi::Namespace::Open+0x1a2

    02e9fb44 186c357f 02e9fb68 00000001 00000000 PolicyAgent!DllUnregisterServer+0x10d8

    02e9fbb0 18706a63 0056a6f0 00000001 00000000 PolicyAgent!DllUnregisterServer+0x14a1

    02e9fc8c 18708899 0063cd50 02e9fd0c 02e9fcec PolicyAgentEndpoint!DllGetClassObject+0x73d2

    02e9fd6c 187097d9 ffffffff 00000864 00000000 PolicyAgentEndpoint!DllGetClassObject+0x9208

    02e9fde0 01068bb8 00566a28 006af398 00000000 PolicyAgentEndpoint!DllGetClassObject+0xa148

    02e9fe6c 7c83b332 000008b6 0011c700 0011c7f0 CcmExec+0x68bb8

    02e9fec8 7c8161d0 01068a29 000008b6 0011c700 ntdll!RtlpWaitOrTimerCallout+0x74 (FPO: [Non-Fpo])

    02e9feec 7c81a3c5 0011c7f0 7c889880 01e89b18 ntdll!RtlpAsyncWaitCallbackCompletion+0x37 (FPO: [Non-Fpo])

    02e9ff44 7c8200fc 7c816199 0011c7f0 00000000 ntdll!RtlpWorkerCallout+0x71 (FPO: [Non-Fpo])

    02e9ff64 7c81a3fa 00000000 0011c7f0 01e89b18 ntdll!RtlpExecuteWorkerRequest+0x4f (FPO: [Non-Fpo])

     

    The !thread output contains some important information.

    1. WAIT: (WrLpcReply): This tells us that the thread is waiting for a LPC reply. This information called as the “Wait Reason” is passed to KeWaitForSingleObject( ) as a parameter. This is retrieved from the _KTHREAD.WaitReason field.
    2. Waiting for reply to LPC MessageId 00039236: This line gives the message ID of the LPC message that was sent to the server thread.
    3. Current LPC port e2327022: The port number listed is the client communication port.
    4. 88e75cfc  Semaphore Limit 0x1 : The semaphore that the thread is waiting for is signaled when a reply message is posted to the client communication ports message queue. NtRequestWaitReplyPort( ) waits on this semaphore.
    5. Ticks: 46825 (0:00:12:11.640): This thread has been waiting for 12mins for a reply from the server.

     

    The next step is to find the server thread that is processing this LPC message. We will use LPC debugger extensions to trace the server thread.

     

    0: kd> !lpc message 00039236

    Searching message 39236 in threads ...

        Server thread 89531020 is working on message 39236                        

    Client thread 88e75b10 waiting a reply from 39236                         

    Searching thread 88e75b10 in port rundown queues ...

     

    Server communication port 0xe2300a38

        Handles: 1   References: 1

        The LpcDataInfoChainHead queue is empty

            Connected port: 0xe2327020      Server connection port: 0xe13734b0

     

    Client communication port 0xe2327020

        Handles: 1   References: 9

        The LpcDataInfoChainHead queue is empty

     

    Server connection port e13734b0  Name: epmapper

        Handles: 1   References: 93

        Server process  : 896e1450 (svchost1.exe)

        Queue semaphore : 8995ac10

        Semaphore state 0 (0x0)

        The message queue is empty

     

        Messages in LpcDataInfoChainHead:

            0000 e2365188 - Busy  Id=0003832f  From: 04a8.13f4  Context=80050003  [e1373530 . e297ab00]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00000001 00050242 421e5fa2 9b3b48b4 a695daf5 00000001

            0000 e297ab00 - Busy  Id=00039232  From: 01c8.14e4  Context=8021000d  [e2365188 . e12cbea0]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 b8d06b40 b8d06b00 000003f4 000010a0

            0000 e12cbea0 - Busy  Id=00039236  From: 01c8.0de0  Context=8021000d  [e297ab00 . e1297430]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 4195c844 8ff87eb3 2b70d0f6 001e001c

            0000 e1297430 - Busy  Id=0003c96d  From: 0204.0224  Context=8022000e  [e12cbea0 . e1534a48]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 b8bb1b40 b8bb1b00 000003f4 00000fbc

            0000 e1534a48 - Busy  Id=0003db3b  From: 01c8.10e0  Context=8021000d  [e1297430 . e2515ca0]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 0000dc04 000001c8 7b6ee790 9e8a0590

            0000 e2515ca0 - Busy  Id=0003db3f  From: 01c8.0f5c  Context=8021000d  [e1534a48 . e29279e8]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 0000dc04 000001c8 7b6ee790 9e8a0590

            0000 e29279e8 - Busy  Id=0003db5a  From: 01c8.1618  Context=8021000d  [e2515ca0 . e1061008]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 00000000 00000000 00000000 00000000

            0000 e1061008 - Busy  Id=0004617f  From: 01c8.1220  Context=8021000d  [e29279e8 . e295fa80]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 00010c20 000001c8 37bd6287 35ebbd53

            0000 e295fa80 - Busy  Id=0004a38a  From: 01c8.1708  Context=8021000d  [e1061008 . e2966ea0]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 0001a41a 000001c8 cb786dae 60cdf739

            0000 e2966ea0 - Busy  Id=0004d0c2  From: 04a8.04d4  Context=80050003  [e295fa80 . e1261008]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00020001 00030242 5415989d 3d8048c1 00070005 00000000

            0000 e1261008 - Busy  Id=000544d7  From: 01c8.0758  Context=8021000d  [e2966ea0 . e1373530]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 00010c20 000001c8 37bd6287 35ebbd53

        The LpcDataInfoChainHead queue contains 11 messages

        Threads in RunDown queue :     0xe2364fc8    0xe297a940    0xe12cbce0    0xe1297270    0xe1534888    0xe2515ae0    0xe2927828    0xe1060e48    0xe295f8c0    0xe2966ce0    0xe1260e48

    Done.                     

     

    When a client thread sends a message to a server the message ID is stored in the clients ETHREAD.LpcReplyMessageId field. !lpc message searches the message ID in all threads in the system. Once client thread is identified ETHREAD.LpcWaitingOnPort will give the client communication port used for the LPC transaction. This has references to the server communication port and connection port.

     

    !lpc message dumps the message queue for all the port objects. Important thing to note here is that even though the message queue is empty for connection port there are 11 messages chained to LpcDataInfoChainHead. This tells us that 11 messages were dequeued from the server message queue but the server has not replied to any of the messages since once the server responds to a message it is unchained from the LpcDataInfoChainHead list and freed.

     

    Messaging Queue

    Figure [4] LPC Message Queuing stages

     

    The above figure shows messages are first queued to the MsgQueue and then chained to LpcDataInfoChainHead. This kind of chaining only happens when the message contains extra data. This does not happen for a LPC_CONNECTION_REQUEST or if the message does not contain extra data. In our scenario server thread has dequeued and started processing the hence we see it in the LpcDataInfoChainHead.

     

    We now dump the server thread to see why it has not responded to the message for 12mins.

     

    0: kd> .process /p /r 896e1450

    Implicit process is now 896e1450

    Loading User Symbols

    ............................................................................................................................

     

    0: kd> !thread 89531020

    THREAD 89531020  Cid 03f4.10a0  Teb: 7ffac000 Win32Thread: 00000000 WAIT: (WrLpcReply) UserMode Non-Alertable

        8953120c  Semaphore Limit 0x1

    Waiting for reply to LPC MessageId 0003923d:

    Current LPC port e22d29d8

    Impersonation token:  e15315d0 (Level Impersonation)

    Owning Process            896e1450       Image:         svchost1.exe

    Wait Start TickCount      144629         Ticks: 2027 (0:00:00:31.671)

    Context Switch Count      219            

    UserTime                  00:00:00.000

    KernelTime                00:00:00.000

    Win32 Start Address 0x00039236

    LPC Server thread working on message Id 39236

    Start Address kernel32!BaseThreadStartThunk (0x77e6b5f3)

    Stack Init b8d07000 Current b8d06c08 Base b8d07000 Limit b8d04000 Call 0

    Priority 10 BasePriority 8 PriorityDecrement 0

    ChildEBP RetAddr  Args to Child             

    b8d06c20 8083e6a2 89531098 89531020 895310c8 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])

    b8d06c4c 8083f164 8953120c 895311e0 89531020 nt!KiSwapThread+0x284 (FPO: [Non-Fpo])

    b8d06c94 8093983f 8953120c 00000011 b8d06d01 nt!KeWaitForSingleObject+0x346 (FPO: [Non-Fpo])

    b8d06d50 80834d3f 0000056c 000f3fc0 000f3fc0 nt!NtRequestWaitReplyPort+0x776 (FPO: [Non-Fpo])

    b8d06d50 7c82ed54 0000056c 000f3fc0 000f3fc0 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ b8d06d64)

    0097f0a8 7c821c94 77c72700 0000056c 000f3fc0 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    0097f0ac 77c72700 0000056c 000f3fc0 000f3fc0 ntdll!NtRequestWaitReplyPort+0xc (FPO: [3,0,0])

    0097f118 77c713ba 0097f154 0097f138 77c72c7f RPCRT4!LRPC_CCALL::SendReceive+0x230 (FPO: [Non-Fpo])

    0097f124 77c72c7f 0097f154 76d320d8 0097f540 RPCRT4!I_RpcSendReceive+0x24 (FPO: [Non-Fpo])

    0097f138 77ce219b 0097f180 000db5f8 00000000 RPCRT4!NdrSendReceive+0x2b (FPO: [Non-Fpo])

    0097f520 76d41213 76d320d8 76d402c8 0097f540 RPCRT4!NdrClientCall2+0x22e (FPO: [Non-Fpo])

    0097f538 76d3fc12 000b5618 000ce248 000ce268 rpcss!LocalCreateInstance+0x1c (FPO: [Non-Fpo])

    0097f5cc 76d3f033 0097f7d8 0097f654 0097f824 rpcss!CServerListEntry::CallServer+0x127 (FPO: [Non-Fpo])

    0097f604 76d3bc7f 00000001 00000008 00000000 rpcss!CServerTableEntry::CallRunningServer+0x181 (FPO: [Non-Fpo])

    0097f670 76d3bac5 0097f7d8 00000000 000e415c rpcss!Activation+0x337 (FPO: [Non-Fpo])

    0097f6cc 76d3dd08 000e415c 0097f7bc 0097f71c rpcss!ActivateFromProperties+0x213 (FPO: [Non-Fpo])

    0097f6dc 76d3dd5c 76d738d0 00000000 000e415c rpcss!CScmActivator::CreateInstance+0x10 (FPO: [Non-Fpo])

    0097f71c 76d3c162 000e415c 00000000 0097f7bc rpcss!ActivationPropertiesIn::DelegateCreateInstance+0xf7 (FPO: [Non-Fpo])

    0097f764 76d3bdb6 000e415c 0097f7bc 00000000 rpcss!ActivateFromPropertiesPreamble+0x4e3 (FPO: [Non-Fpo])

    0097f7ac 76d3dcdd 00000002 000ce2ac 00000000 rpcss!PerformScmStage+0xbb (FPO: [Non-Fpo])

    0097f8c8 77c70f3b 000cdbb0 000ce248 000ce268 rpcss!SCMActivatorCreateInstance+0xa2 (FPO: [Non-Fpo])

    0097f8f8 77ce23f7 76d3dc3e 0097fae0 00000007 RPCRT4!Invoke+0x30

    0097fcf8 77ce26ed 00000000 00000000 000cdcdc RPCRT4!NdrStubCall2+0x299 (FPO: [Non-Fpo])

    0097fd14 77c709be 000cdcdc 00094ef0 000cdcdc RPCRT4!NdrServerCall2+0x19 (FPO: [Non-Fpo])

    0097fd48 77c7093f 76d3156c 000cdcdc 0097fdec RPCRT4!DispatchToStubInCNoAvrf+0x38 (FPO: [Non-Fpo])

    0097fd9c 77c70865 00000004 00000000 76d73538 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x117 (FPO: [Non-Fpo])

    0097fdc0 77c734b1 000cdcdc 00000000 76d73538 RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3 (FPO: [Non-Fpo])

    0097fdfc 77c71bb3 000b44d8 0008f0e8 000dea40 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0x42c (FPO: [Non-Fpo])

    0097fe20 77c75458 0008f120 0097fe38 000b44d8 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0x127 (FPO: [Non-Fpo])

    0097ff84 77c5778f 0097ffac 77c5f7dd 0008f0e8 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0x430 (FPO: [Non-Fpo])

    0097ff8c 77c5f7dd 0008f0e8 00000000 00000000 RPCRT4!RecvLotsaCallsWrapper+0xd (FPO: [Non-Fpo])

    0097ffac 77c5de88 00084298 0097ffec 77e6608b RPCRT4!BaseCachedThreadRoutine+0x9d (FPO: [Non-Fpo])

    0097ffb8 77e6608b 000c0070 00000000 00000000 RPCRT4!ThreadStartRoutine+0x1b (FPO: [Non-Fpo])

    0097ffec 00000000 77c5de6d 000c0070 00000000 kernel32!BaseThreadStart+0x34 (FPO: [Non-Fpo])

     

    The server thread is waiting for a reply to another LPC message it issued while servicing the client request. Tracing the LPC message as above -

     

    0: kd> !lpc message 0003923d

    Searching message 3923d in threads ...

    Client thread 89531020 waiting a reply from 3923d                         

        Server thread 898cd998 is working on message 3923d                         

    Searching thread 89531020 in port rundown queues ...

     

    Server communication port 0xe22d58c8

        Handles: 1   References: 1

        The LpcDataInfoChainHead queue is empty

            Connected port: 0xe22d29d8      Server connection port: 0xe17a95f8

     

    Client communication port 0xe22d29d8

        Handles: 1   References: 15

        The LpcDataInfoChainHead queue is empty

     

    Server connection port e17a95f8  Name: OLE71892D01F993498EA1C3757C71D5

        Handles: 1   References: 37

        Server process  : 89bedd88 (svchost.exe)

        Queue semaphore : 89718210

        Semaphore state 0 (0x0)

        The message queue is empty

     

        Messages in LpcDataInfoChainHead:

            0000 e188c3b0 - Busy  Id=00039239  From: 03f4.111c  Context=80030001  [e17a9678 . e28b00f8]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00000001 00040342 0000d401 000004a8 2e6c8db6 4131cb31

            0000 e28b00f8 - Busy  Id=0003923d  From: 03f4.10a0  Context=80030001  [e188c3b0 . e2846008]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00000001 00040342 0000d401 000004a8 2e6c8db6 4131cb31

            0000 e2846008 - Busy  Id=0003c96e  From: 03f4.0df0  Context=80030001  [e28b00f8 . e27dcb70]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00000001 00040342 0000d401 000004a8 2e6c8db6 4131cb31

            0000 e27dcb70 - Busy  Id=0003db3c  From: 03f4.1158  Context=80030001  [e2846008 . e28c0788]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00000001 00040342 0000d401 000004a8 2e6c8db6 4131cb31

            0000 e28c0788 - Busy  Id=0003db41  From: 03f4.0fbc  Context=80030001  [e27dcb70 . e2983650]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00000001 00040342 0000d401 000004a8 2e6c8db6 4131cb31

            0000 e2983650 - Busy  Id=0003db5c  From: 03f4.0f40  Context=80030001  [e28c0788 . e2596ae0]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00000001 00040342 0000d401 000004a8 2e6c8db6 4131cb31

            0000 e2596ae0 - Busy  Id=00046180  From: 03f4.03d8  Context=80030001  [e2983650 . e28c3ea0]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00000001 00040342 0000d401 000004a8 2e6c8db6 4131cb31

            0000 e28c3ea0 - Busy  Id=0004a38d  From: 03f4.07c4  Context=80030001  [e2596ae0 . e294aea0]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00000001 00040342 0000d401 000004a8 2e6c8db6 4131cb31

            0000 e294aea0 - Busy  Id=0004d0c6  From: 03f4.1714  Context=80030001  [e28c3ea0 . e2977ea0]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00000001 00030342 0000d401 000004a8 2e6c8db6 4131cb31

            0000 e2977ea0 - Busy  Id=000544d8  From: 03f4.162c  Context=80030001  [e294aea0 . e241f008]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00000001 00040342 0000d401 000004a8 2e6c8db6 4131cb31

            0000 e241f008 - Busy  Id=0006077e  From: 03f4.031c  Context=80030001  [e2977ea0 . e2940820]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00000001 00040342 0000d401 000004a8 2e6c8db6 4131cb31

            0000 e2940820 - Busy  Id=000638da  From: 03f4.1754  Context=80030001  [e241f008 . e2576270]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00000001 00040342 0000d401 000004a8 2e6c8db6 4131cb31

            0000 e2576270 - Busy  Id=000638f1  From: 03f4.1794  Context=80030001  [e2940820 . e261c268]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00000001 00040342 0000d401 000004a8 2e6c8db6 4131cb31

            0000 e261c268 - Busy  Id=000638fe  From: 03f4.17a8  Context=80030001  [e2576270 . e17a9678]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00000001 00040342 0000d401 000004a8 2e6c8db6 4131cb31

        The LpcDataInfoChainHead queue contains 14 messages

        Threads in RunDown queue :     0xe188c1f0    0xe28aff38    0xe2845e48    0xe27dc9b0    0xe28c05c8    0xe2983490    0xe2596920    0xe28c3ce0    0xe294ace0    0xe2977ce0    0xe241ee48    0xe2940660    0xe25760b0    0xe261c0a8

    Done.        

     

     

    0: kd> .process /p /r 89bedd88

    Implicit process is now 89bedd88

    Loading User Symbols

    ............................................................................................................................

     

     

    0: kd> !thread 898cd998

    THREAD 898cd998  Cid 04a8.0508  Teb: 7ffd5000 Win32Thread: e23a6008 WAIT: (UserRequest) UserMode Non-Alertable

        89006340  SynchronizationEvent

    IRP List:

        88eb89e0: (0006,0094) Flags: 00000800  Mdl: 00000000

    Not impersonating

    DeviceMap                 e10018a8

    Owning Process            89bedd88       Image:         svchost.exe

    Wait Start TickCount      99831          Ticks: 46825 (0:00:12:11.640)

    Context Switch Count      7330                 LargeStack

    UserTime                  00:00:00.906

    KernelTime                00:00:00.531

    Win32 Start Address 0x0003923d

    LPC Server thread working on message Id 3923d

    Start Address kernel32!BaseThreadStartThunk (0x77e6b5f3)

    Stack Init b9a55000 Current b9a54c60 Base b9a55000 Limit b9a51000 Call 0

    Priority 13 BasePriority 8 PriorityDecrement 0

    Kernel stack not resident.

    ChildEBP RetAddr  Args to Child             

    b9a54c78 8083e6a2 898cda10 898cd998 898cda40 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])

    b9a54ca4 8083f164 00000000 00000000 00000000 nt!KiSwapThread+0x284 (FPO: [Non-Fpo])

    b9a54cec 8092db70 89006340 00000006 80834d01 nt!KeWaitForSingleObject+0x346 (FPO: [Non-Fpo])

    b9a54d50 80834d3f 00000d64 00000000 00000000 nt!NtWaitForSingleObject+0x9a (FPO: [Non-Fpo])

    b9a54d50 7c82ed54 00000d64 00000000 00000000 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ b9a54d64)

    0109f248 7c822124 77e6bad8 00000d64 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    0109f24c 77e6bad8 00000d64 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])

    0109f2bc 77e6ba42 00000d64 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac (FPO: [Non-Fpo])

    0109f2d0 77719486 00000d64 ffffffff 00000000 kernel32!WaitForSingleObject+0x12 (FPO: [Non-Fpo])

    0109f2e4 7769c09b 00086803 00000000 00086660 ole32!CComApartment::WaitForAccess+0x5c (FPO: [0,0,0])

    0109f2f8 776a417d 0109f344 02163088 02145ac0 ole32!CComApartment::GetPreRegMOID+0x41 (FPO: [Non-Fpo])

    0109f310 776a40d2 0109f344 77790858 00000001 ole32!GetPreRegMOID+0x24 (FPO: [Non-Fpo])

    0109f358 776a48e3 00000001 00000003 0109f3c4 ole32!CIDObject::GetOrCreateStdID+0x11e (FPO: [Non-Fpo])

    0109f37c 776a4419 02163088 00000000 00086860 ole32!ObtainStdIDFromUnk+0x92 (FPO: [Non-Fpo])

    0109f3a4 776a43c1 0109f468 00162690 02b8bd30 ole32!StdMarshalObject+0x49 (FPO: [Non-Fpo])

    0109f400 776a3974 0109f42c 0109f468 00162690 ole32!CDestObjectWrapper::MarshalInterface+0x150 (FPO: [Non-Fpo])

    0109f43c 77686f83 0109f468 00162690 02b8bd30 ole32!CoMarshalInterface+0x93 (FPO: [Non-Fpo])

    0109f490 77686e38 02b8bd30 00162690 00162690 ole32!MarshalHelperMulti+0x48 (FPO: [Non-Fpo])

    0109f4bc 77686d46 58b10040 00000001 00162690 ole32!GetInstanceHelperMulti+0x4d (FPO: [Non-Fpo])

    0109f584 77c70f3b 000a7938 00000000 0218b774 ole32!CObjServer::CreateInstance+0x30d (FPO: [Non-Fpo])

    0109f5a8 77ce23f7 77686b34 0109f790 00000004 RPCRT4!Invoke+0x30

    0109f9a8 77ce1a95 000b4398 000b2074 05cc596c RPCRT4!NdrStubCall2+0x299 (FPO: [Non-Fpo])

    0109fa00 77788ec1 000b4398 05cc596c 000b2074 RPCRT4!CStdStubBuffer_Invoke+0xc6 (FPO: [Non-Fpo])

    0109fa44 77788e74 05cc596c 000b4428 00000000 ole32!SyncStubInvoke+0x33 (FPO: [Non-Fpo])

    0109fa8c 7769b2b7 05cc596c 000b4288 000b4398 ole32!StubInvoke+0xa7 (FPO: [Non-Fpo])

    0109fb64 7769b0b5 000b2074 00000000 000b4398 ole32!CCtxComChnl::ContextInvoke+0xe3 (FPO: [Non-Fpo])

    0109fb80 77789926 05cc596c 00000001 000b4398 ole32!MTAInvoke+0x1a (FPO: [Non-Fpo])

    0109fbb0 77788c33 d0908070 000b2074 000b4398 ole32!AppInvoke+0xa3 (FPO: [Non-Fpo])

    0109fc84 77789936 05cc5918 000b22f8 000b4410 ole32!ComInvokeWithLockAndIPID+0x2c5 (FPO: [Non-Fpo])

    0109fcd0 77c709be 020ccc24 000b4410 020ccc24 ole32!ThreadInvoke+0x1e0 (FPO: [Non-Fpo])

    0109fd04 77c7093f 777887e0 020ccc24 0109fdec RPCRT4!DispatchToStubInCNoAvrf+0x38 (FPO: [Non-Fpo])

    0109fd58 77c70865 00000000 00000000 7767bfa8 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x117 (FPO: [Non-Fpo])

    0109fd7c 77c657eb 020ccc24 00000000 7767bfa8 RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3 (FPO: [Non-Fpo])

    0109fdbc 77c71e26 020ccc24 020ccbdc 00000000 RPCRT4!RPC_INTERFACE::DispatchToStubWithObject+0xc0 (FPO: [Non-Fpo])

    0109fdfc 77c71bb3 000c7db0 000b1498 00097c08 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0x42c (FPO: [Non-Fpo])

    0109fe20 77c75458 000b14d0 0109fe38 000c7db0 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0x127 (FPO: [Non-Fpo])

    0109ff84 77c5778f 0109ffac 77c5f7dd 000b1498 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0x430 (FPO: [Non-Fpo])

    0109ff8c 77c5f7dd 000b1498 00000000 00000000 RPCRT4!RecvLotsaCallsWrapper+0xd (FPO: [Non-Fpo])

    0109ffac 77c5de88 0008aa48 0109ffec 77e6608b RPCRT4!BaseCachedThreadRoutine+0x9d (FPO: [Non-Fpo])

    0109ffb8 77e6608b 000b1678 00000000 00000000 RPCRT4!ThreadStartRoutine+0x1b (FPO: [Non-Fpo])

     

    The above LPC server thread is waiting for allocation of COM OID (Object Identifier) since we have run out of OIDs. The way OIDs are allocated is by calling CComApartment::CallTheResolver( ). Searching threads in the system with CComApartment::CallTheResolver on the stack we get the below thread within svchost.exe.

     

     

    0: kd> !thread 88d76db0 

    THREAD 88d76db0  Cid 04a8.13f4  Teb: 7ffd6000 Win32Thread: e295c758 WAIT: (Unknown) UserMode Non-Alertable

        88d76f9c  Semaphore Limit 0x1

    Waiting for reply to LPC MessageId 0003832f:

    Current LPC port e15791b8

    Not impersonating

    DeviceMap                 e10018a8

    Owning Process            89bedd88       Image:         svchost.exe

    Wait Start TickCount      99073          Ticks: 47583 (0:00:12:23.484)

    Context Switch Count      6947                 LargeStack

    UserTime                  00:00:00.859

    KernelTime                00:00:00.562

    Win32 Start Address 0x0003832e

    LPC Server thread working on message Id 3832e

    Start Address kernel32!BaseThreadStartThunk (0x77e6b5f3)

    Stack Init b8af6000 Current b8af5c08 Base b8af6000 Limit b8af2000 Call 0

    Priority 9 BasePriority 8 PriorityDecrement 0

    Kernel stack not resident.

    ChildEBP RetAddr  Args to Child             

    b8af5c20 8083e6a2 88d76e28 88d76db0 88d76e58 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])

    b8af5c4c 8083f164 88d76f9c 88d76f70 88d76db0 nt!KiSwapThread+0x284 (FPO: [Non-Fpo])

    b8af5c94 8093983f 88d76f9c 00000011 b8af5d01 nt!KeWaitForSingleObject+0x346 (FPO: [Non-Fpo])

    b8af5d50 80834d3f 00000094 000dde10 000dde10 nt!NtRequestWaitReplyPort+0x776 (FPO: [Non-Fpo])

    b8af5d50 7c82ed54 00000094 000dde10 000dde10 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ b8af5d64)

    0251ed2c 7c821c94 77c72700 00000094 000dde10 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    0251ed30 77c72700 00000094 000dde10 000dde10 ntdll!NtRequestWaitReplyPort+0xc (FPO: [3,0,0])

    0251ed88 77c713ba 0251edc4 0251eda8 77c72c7f RPCRT4!LRPC_CCALL::SendReceive+0x230 (FPO: [Non-Fpo])

    0251ed94 77c72c7f 0251edc4 776720f8 0251f1b0 RPCRT4!I_RpcSendReceive+0x24 (FPO: [Non-Fpo])

    0251eda8 77ce219b 0251edf0 05c6b44c 0251f360 RPCRT4!NdrSendReceive+0x2b (FPO: [Non-Fpo])

    0251f190 7769c087 776720f8 77671df0 0251f1b0 RPCRT4!NdrClientCall2+0x22e (FPO: [Non-Fpo])

    0251f1a8 7769c04e 0008b0a8 0008bc30 0251f1f4 ole32!ServerAllocateOIDs+0x1c (FPO: [Non-Fpo])

    0251f1d8 7769c01b 0251f1f4 0251f368 0251f220 ole32!CRpcResolver::ServerAllocOIDs+0x2a (FPO: [Non-Fpo])

    0251f1fc 7769bf93 0251f368 0251f220 00000014 ole32!OXIDEntry::AllocOIDs+0x40 (FPO: [Non-Fpo])

    0251f36c 7769c0b2 00086803 00000000 00086660 ole32!CComApartment::CallTheResolver+0x6b (FPO: [Non-Fpo])

    0251f380 776a417d 0251f3cc 05c67490 05d59ee0 ole32!CComApartment::GetPreRegMOID+0x54 (FPO: [Non-Fpo])

    0251f398 776a40d2 0251f3cc 77790858 00000001 ole32!GetPreRegMOID+0x24 (FPO: [Non-Fpo])

    0251f3e0 776a48e3 00000001 00000003 0251f44c ole32!CIDObject::GetOrCreateStdID+0x11e (FPO: [Non-Fpo])

    0251f404 776a4419 05c67490 00000000 00086860 ole32!ObtainStdIDFromUnk+0x92 (FPO: [Non-Fpo])

    0251f42c 776a43c1 05e39b00 0251f504 02b30234 ole32!StdMarshalObject+0x49 (FPO: [Non-Fpo])

    0251f488 776a3974 0251f4b4 05e39b00 0251f504 ole32!CDestObjectWrapper::MarshalInterface+0x150 (FPO: [Non-Fpo])

    0251f4c4 77c544a3 05e39b00 0251f504 02b30234 ole32!CoMarshalInterface+0x93 (FPO: [Non-Fpo])

    0251f518 77c54e85 0251f5e8 02b30234 74ce6210 RPCRT4!NdrpInterfacePointerMarshall+0x9a (FPO: [Non-Fpo])

    0251f538 77c691ad 0251f5e8 000f2898 02b30234 RPCRT4!NdrpPointerMarshall+0x87 (FPO: [Non-Fpo])

    0251f550 77c690fc 0251f5e8 02b30234 74ce6210 RPCRT4!NdrPointerMarshall+0x30 (FPO: [Non-Fpo])

    0251f570 77c691ad 0251f5e8 00000000 00112e50 RPCRT4!NdrpPointerMarshall+0x116 (FPO: [Non-Fpo])

    0251f588 77c715ac 0251f5e8 00112e50 74ce620c RPCRT4!NdrPointerMarshall+0x30 (FPO: [Non-Fpo])

    0251f5ac 77ce249e 00000000 00000001 020ce8c8 RPCRT4!NdrpServerMarshal+0x82 (FPO: [Non-Fpo])

    0251f9a8 77ce1a95 020ce8c8 0011fa08 000f1454 RPCRT4!NdrStubCall2+0x374 (FPO: [Non-Fpo])

    0251fa00 77788ec1 020ce8c8 000f1454 0011fa08 RPCRT4!CStdStubBuffer_Invoke+0xc6 (FPO: [Non-Fpo])

    0251fa44 77788e74 000f1454 00137868 020debf0 ole32!SyncStubInvoke+0x33 (FPO: [Non-Fpo])

    0251fa8c 7769b2b7 000f1454 0014aaf8 020ce8c8 ole32!StubInvoke+0xa7 (FPO: [Non-Fpo])

    0251fb64 7769b0b5 0011fa08 00000000 020ce8c8 ole32!CCtxComChnl::ContextInvoke+0xe3 (FPO: [Non-Fpo])

    0251fb80 77789926 000f1454 00000001 020ce8c8 ole32!MTAInvoke+0x1a (FPO: [Non-Fpo])

    0251fbb0 77788c33 d0908070 0011fa08 020ce8c8 ole32!AppInvoke+0xa3 (FPO: [Non-Fpo])

    0251fc84 77789936 000f1400 020d3b48 00137850 ole32!ComInvokeWithLockAndIPID+0x2c5 (FPO: [Non-Fpo])

    0251fcd0 77c709be 0216202c 00137850 0216202c ole32!ThreadInvoke+0x1e0 (FPO: [Non-Fpo])

    0251fd04 77c7093f 777887e0 0216202c 0251fdec RPCRT4!DispatchToStubInCNoAvrf+0x38 (FPO: [Non-Fpo])

    0251fd58 77c70865 00000000 00000000 7767bfa8 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x117 (FPO: [Non-Fpo])

    0251fd7c 77c657eb 0216202c 00000000 7767bfa8 RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3 (FPO: [Non-Fpo])

     

     

    0: kd> !lpc message 0003832f

    Searching message 3832f in threads ...

        Server thread 8900edb0 is working on message 3832f                        

    Client thread 88d76db0 waiting a reply from 3832f                         

    Searching thread 88d76db0 in port rundown queues ...

     

    Server communication port 0xe188a6e8

        Handles: 1   References: 1

        The LpcDataInfoChainHead queue is empty

            Connected port: 0xe15791b8      Server connection port: 0xe13734b0

     

    Client communication port 0xe15791b8

        Handles: 1   References: 5

        The LpcDataInfoChainHead queue is empty

     

    Server connection port e13734b0  Name: epmapper

        Handles: 1   References: 93

        Server process  : 896e1450 (svchost1.exe)

        Queue semaphore : 8995ac10

        Semaphore state 0 (0x0)

        The message queue is empty

     

        Messages in LpcDataInfoChainHead:

            0000 e2365188 - Busy  Id=0003832f  From: 04a8.13f4  Context=80050003  [e1373530 . e297ab00]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00000001 00050242 421e5fa2 9b3b48b4 a695daf5 00000001

            0000 e297ab00 - Busy  Id=00039232  From: 01c8.14e4  Context=8021000d  [e2365188 . e12cbea0]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 b8d06b40 b8d06b00 000003f4 000010a0

            0000 e12cbea0 - Busy  Id=00039236  From: 01c8.0de0  Context=8021000d  [e297ab00 . e1297430]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 4195c844 8ff87eb3 2b70d0f6 001e001c

            0000 e1297430 - Busy  Id=0003c96d  From: 0204.0224  Context=8022000e  [e12cbea0 . e1534a48]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 b8bb1b40 b8bb1b00 000003f4 00000fbc

            0000 e1534a48 - Busy  Id=0003db3b  From: 01c8.10e0  Context=8021000d  [e1297430 . e2515ca0]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 0000dc04 000001c8 7b6ee790 9e8a0590

            0000 e2515ca0 - Busy  Id=0003db3f  From: 01c8.0f5c  Context=8021000d  [e1534a48 . e29279e8]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 0000dc04 000001c8 7b6ee790 9e8a0590

            0000 e29279e8 - Busy  Id=0003db5a  From: 01c8.1618  Context=8021000d  [e2515ca0 . e1061008]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 00000000 00000000 00000000 00000000

            0000 e1061008 - Busy  Id=0004617f  From: 01c8.1220  Context=8021000d  [e29279e8 . e295fa80]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 00010c20 000001c8 37bd6287 35ebbd53

            0000 e295fa80 - Busy  Id=0004a38a  From: 01c8.1708  Context=8021000d  [e1061008 . e2966ea0]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 0001a41a 000001c8 cb786dae 60cdf739

            0000 e2966ea0 - Busy  Id=0004d0c2  From: 04a8.04d4  Context=80050003  [e295fa80 . e1261008]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00020001 00030242 5415989d 3d8048c1 00070005 00000000

            0000 e1261008 - Busy  Id=000544d7  From: 01c8.0758  Context=8021000d  [e2966ea0 . e1373530]

                       Length=0044002c  Type=00380001 (LPC_REQUEST)

                       Data: 00010001 00040242 00010c20 000001c8 37bd6287 35ebbd53

        The LpcDataInfoChainHead queue contains 11 messages

        Threads in RunDown queue :     0xe2364fc8    0xe297a940    0xe12cbce0    0xe1297270    0xe1534888    0xe2515ae0    0xe2927828    0xe1060e48    0xe295f8c0    0xe2966ce0    0xe1260e48

    Done.                                             

     

     

    0: kd> !thread 8900edb0

    THREAD 8900edb0  Cid 03f4.0598  Teb: 7ffd4000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable

        89014cc0  NotificationEvent

    Not impersonating

    DeviceMap                 e10018a8

    Owning Process            896e1450       Image:         svchost1.exe

    Wait Start TickCount      144629         Ticks: 2027 (0:00:00:31.671)

    Context Switch Count      5892            

    UserTime                  00:00:00.765

    KernelTime                00:00:01.203

    Win32 Start Address 0x0003832f

    LPC Server thread working on message Id 3832f

    Start Address kernel32!BaseThreadStartThunk (0x77e6b5f3)

    Stack Init b95d1000 Current b95d0c60 Base b95d1000 Limit b95ce000 Call 0

    Priority 13 BasePriority 8 PriorityDecrement 0

    ChildEBP RetAddr  Args to Child             

    b95d0c78 8083e6a2 8900ee28 8900edb0 8900ee58 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])

    b95d0ca4 8083f164 00000000 00000000 00000000 nt!KiSwapThread+0x284 (FPO: [Non-Fpo])

    b95d0cec 8092db70 89014cc0 00000006 80935d01 nt!KeWaitForSingleObject+0x346 (FPO: [Non-Fpo])

    b95d0d50 80834d3f 000004e8 00000000 00000000 nt!NtWaitForSingleObject+0x9a (FPO: [Non-Fpo])

    b95d0d50 7c82ed54 000004e8 00000000 00000000 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ b95d0d64)

    0082f7a4 7c822124 77e6bad8 000004e8 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    0082f7a8 77e6bad8 000004e8 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])

    0082f818 77e6ba42 000004e8 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac (FPO: [Non-Fpo])

    0082f82c 77cab1e0 000004e8 ffffffff 000bc314 kernel32!WaitForSingleObject+0x12 (FPO: [Non-Fpo])

    0082f840 77cab333 0082f89c 000ad626 000bc314 RPCRT4!SWMRLock::LockSharedOrExclusiveOnLastWriter+0x32 (FPO: [Non-Fpo])

    0082f868 77c72e00 0082f89c 00000001 00000002 RPCRT4!SWMRLock::LockSharedOrExclusive+0xa6 (FPO: [Non-Fpo])

    0082f890 77c72f71 00000000 00000000 00000010 RPCRT4!NDRSContextUnmarshall2+0x1b9 (FPO: [Non-Fpo])

    0082f8bc 77c72eb6 4182f938 76d32c22 0082f938 RPCRT4!NdrServerContextNewUnmarshall+0xa6 (FPO: [Non-Fpo])

    0082f8d0 77c70f93 0082f938 0082fae4 76d32c22 RPCRT4!NdrUnmarshallHandle+0x52 (FPO: [Non-Fpo])

    0082f900 77ce2345 0082fae4 00094618 00000000 RPCRT4!NdrpServerUnMarshal+0x13d (FPO: [Non-Fpo])

    0082fcf8 77ce26ed 00000000 00000000 000bbd9c RPCRT4!NdrStubCall2+0x19f (FPO: [Non-Fpo])

    0082fd14 77c709be 000bbd9c 00094618 000bbd9c RPCRT4!NdrServerCall2+0x19 (FPO: [Non-Fpo])

    0082fd48 77c7093f 76d3156c 000bbd9c 0082fdec RPCRT4!DispatchToStubInCNoAvrf+0x38 (FPO: [Non-Fpo])

    0082fd9c 77c70865 00000005 00000000 76d736c0 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x117 (FPO: [Non-Fpo])

    0082fdc0 77c734b1 000bbd9c 00000000 76d736c0 RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3 (FPO: [Non-Fpo])

    0082fdfc 77c71bb3 00099188 0008f0e8 000adfd8 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0x42c (FPO: [Non-Fpo])

    0082fe20 77c75458 0008f120 0082fe38 00099188 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0x127 (FPO: [Non-Fpo])

    0082ff84 77c5778f 0082ffac 77c5f7dd 0008f0e8 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0x430 (FPO: [Non-Fpo])

    0082ff8c 77c5f7dd 0008f0e8 00000000 00000000 RPCRT4!RecvLotsaCallsWrapper+0xd (FPO: [Non-Fpo])

    0082ffac 77c5de88 00084298 0082ffec 77e6608b RPCRT4!BaseCachedThreadRoutine+0x9d (FPO: [Non-Fpo])

    0082ffb8 77e6608b 000bc298 00000000 00000000 RPCRT4!ThreadStartRoutine+0x1b (FPO: [Non-Fpo])

    0082ffec 00000000 77c5de6d 000bc298 00000000 kernel32!BaseThreadStart+0x34 (FPO: [Non-Fpo])

     

    The above thread belongs to the RPC service and is waiting for a lock to be acquired.

     

    0: kd> x /v svchost1!*service*name*

    pub global 01004078    0 svchost1!ServiceNames = <no type information>

     

    0: kd> du poi(01004078)

    000839f8  "RpcSs"

     

    To find why this thread is not able to acquire the lock, investigation of the threads within the RPC service is required. The reason the lock is not available is because there is a thread that has acquired the lock and is trying to reach a client that does not exist.

     

    The chain of events is summarized in the below figure. Note that all horizontal links were traversed using LPC data structures and debugger extensions. The vertical links were traversed by investigation beyond the scope of LPC. Thus the issue that started in a component of SMS was finally traced to RPC through the use of LPC data structures and debugger extension. LPC is one of the core inter-process communication mechanisms. Understanding LPC and learning to use its extensions helps a lot to debug issues where client threads are waiting for reply from the server.

     

    Chain of events

  • Ntdebugging Blog

    LPC part 2 Kernel Debugger Extensions

    • 1 Comments

    Hello my name is Roy, I’m an EE on the Microsoft global escalation services / CPR team.   This blog is a follow on to my first LPC blog.  We will be discussing debugger extensions that allow you to look at LPC related issues.

     

    Disclaimer: The purpose of this blog is to illustrate debugging techniques with LPC. Please do not rely on this information as documentation for the APIs when writing your own code. As always, this is subject to change in the future.  

     

    LPC Kernel Debugger Extensions

     

     

    Command

    Description

    !lpc                    

    Display the list and description of all the !lpc commands

    !lpc message [MessageId]

    Display the message with a given ID and all related information by attempting to match the given Message ID to the EHTREAD->LpcReceivedMessageId and to the ETHREAD->LpcReplyMessageId of all threads in the system.

    If the MessageId parameter is omitted then it attempts to display all the outstanding messages in the system by searching for the tag ‘LpcM’ in the pools.

    !lpc port [PortAddress] 

    Displays port information. If a server connection port address is specified then only information about that port is displayed. If either a client or server communication port is specified it prints information about the specified communication port, the peer communication port and the server connection port.

    If the PortAddress parameter is omitted then it attempts to walk the list of all objects of type “Port” and “WaitablePort” and display them. Note that for this feature to work the GFlags option “+otl” i.e. “Maintain a list of objects for each type” must be enabled.

    !lpc scan PortAddress   

    Displays port information. It attempts to walk the list of all objects of type “Port” and “WaitablePort” and display the one matching the specified port address. Note that for this feature to work the GFlags option “+otl” i.e. “Maintain a list of objects for each type” must be enabled.

    !lpc thread [ThreadAddr]

    If  ThreadAddr is specified it walks the list of threads in the ETHREAD-> LpcReplyChain to locate the list head i.e. a “Port” or “WaitablePort” object on which the thread is waiting for a reply.

    If the ThreadAddr parameter is omitted then it attempts to find all LPC server threads by looking for threads with a non-NULL EHTREAD->LpcReceivedMessageId and all client threads by looking for threads with a non-NULL ETHREAD->LpcReplyMessageId and displays them.

    !lpc PoolSearch

    Toggles a setting that controls whether the “lpc message” command will search for LPC message tag (‘LpcM’) in the kernel pools or not.

     

     

     

    LPC Kernel Debugger Extension Usage

     

     

    LPC Connection Port information from call stack

     

    On the call stack of any client or server threads blocked on LPC data transfer or LPC connection there will be a frame containing either one of the functions  NtRequestWaitReplyPort() or NtReplyWaitReceivePortEx() . The first parameter to either one of these functions is the handle to the port they are blocked on.

     

    kd> !thread 810de2a8 

    THREAD 810de2a8  Cid 01dc.01f4  Teb: 7ffde000 Win32Thread: 00000000 WAIT: (WrLpcReceive) UserMode Non-Alertable

        81131188  Semaphore Limit 0x7fffffff

        810de398  NotificationTimer

    Not impersonating

    DeviceMap                 e196c460

    Owning Process            810ddda0       Image:         rpclpcs.exe

    Wait Start TickCount      64555          Ticks: 402 (0:00:00:04.025)

    Context Switch Count      2            

    UserTime                  00:00:00.000

    KernelTime                00:00:00.000

    Win32 Start Address 0x77e76bf0

    Start Address 0x7c810856

    Stack Init f8e28000 Current f8e27c4c Base f8e28000 Limit f8e25000 Call 0

    Priority 8 BasePriority 8 PriorityDecrement 0 DecrementCount 0

    ChildEBP RetAddr  Args to Child             

    f8e27c64 804dc6a6 810de318 810de2a8 804dc6f2 nt!KiSwapContext+0x2e (FPO: [Uses EBP] [0,0,4])

    f8e27c70 804dc6f2 e1084108 8055a540 e1084108 nt!KiSwapThread+0x46 (FPO: [0,0,0])

    f8e27c98 8056a50a 00000001 00000010 00000001 nt!KeWaitForSingleObject+0x1c2 (FPO: [Non-Fpo])

    f8e27d48 804df06b 000007c4 002bff70 00000000 nt!NtReplyWaitReceivePortEx+0x3dc (FPO: [Non-Fpo])

    f8e27d48 7c90eb94 000007c4 002bff70 00000000 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ f8e27d64)

    002bff80 00000000 00000000 00000000 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

     

    kd> !handle 7c4 3 810ddda0       

    processor number 0, process 810ddda0

    PROCESS 810ddda0  SessionId: 0  Cid: 01dc    Peb: 7ffd9000  ParentCid: 01b4

        DirBase: 058cd000  ObjectTable: e1a13278  HandleCount:  18.

        Image: rpclpcs.exe

     

    Handle table at e107d000 with 18 Entries in use

    07c4: Object: e1084108  GrantedAccess: 001f0001 Entry: e107df88

    Object: e1084108  Type: (812b5c80) Port

        ObjectHeader: e10840f0 (old version)

            HandleCount: 1  PointerCount: 4

            Directory Object: e14c72c8  Name: rpclpc

     

    kd> !lpc port e1084108

     

    Server connection port e1084108  Name: rpclpc

        Handles: 1   References: 4

        Server process  : 810ddda0 (rpclpcs.exe)

        Queue semaphore : 81131188

        Semaphore state 0 (0x0)

        The message queue is empty

        The LpcDataInfoChainHead queue is empty

     

     

    LPC Messages that are waiting to be picked up by the server thread

     

    When a message is queued to the server connection port and the server thread has not yet not been signaled to pull out the message the ‘!lpc port <Port>” displays the following output that includes the message.

     

    kd> !lpc port e10d8388

     

    Client communication port 0xe10d8388

        Handles: 1   References: 2

        The LpcDataInfoChainHead queue is empty

            Connected port: 0xe106dc00      Server connection port: 0xe15512e0

     

    Server communication port 0xe106dc00

        Handles: 1   References: 1

        The LpcDataInfoChainHead queue is empty

     

    Server connection port e15512e0  Name: rpclpc

        Handles: 1   References: 9

        Server process  : ffbbebe8 (rpclpcs.exe)

        Queue semaphore : 81250848

        Semaphore state 0 (0x0)

            Messages in queue:

            0000 e10513f8 - Busy  Id=000048d8  From: 05f4.0108  Context=80020000  [e15512f0 . e15512f0]

                       Length=0054003c  Type=00000001 (LPC_REQUEST)

                       Data: 00000001 00000241 00000000 00000000 f877bc04 804ec10e

        The message queue contains 1 messages

        The LpcDataInfoChainHead queue is empty

     

     

    LPC Messages being processed by the server

     

    When an LPC message has been queued by the client, dequeued by the sever and being worked upon by the server both the client and the server thread have the message ID associated with them

     

    kd> !lpc message 16fa

    Searching message 16fa in threads ...

        Server thread 810de2a8 is working on message 16fa                        

    Client thread 810dc930 waiting a reply from 16fa                         

    Searching thread 810dc930 in port rundown queues ...

     

    Server connection port e1084108  Name: rpclpc

        Handles: 1   References: 3

        Server process  : 810ddda0 (rpclpcs.exe)

        Queue semaphore : 81131188

        Semaphore state 0 (0x0)

        The message queue is empty

        The LpcDataInfoChainHead queue is empty

    Done.                                             

     

    kd> !thread 810de2a8

    THREAD 810de2a8  Cid 01dc.01f4  Teb: 7ffde000 Win32Thread: 00000000 RUNNING on processor 0

    Not impersonating

    DeviceMap                 e196c460

    Owning Process            810ddda0       Image:         rpclpcs.exe

    Wait Start TickCount      65732          Ticks: 10 (0:00:00:00.100)

    Context Switch Count      4            

    UserTime                  00:00:00.000

    KernelTime                00:00:00.010

    Win32 Start Address 0x000016fa

    LPC Server thread working on message Id 16fa

    Start Address kernel32!BaseThreadStartThunk (0x7c810856)

    Stack Init f8e28000 Current f8e27728 Base f8e28000 Limit f8e25000 Call 0

    Priority 9 BasePriority 8 PriorityDecrement 0 DecrementCount 0

    ChildEBP RetAddr  Args to Child             

    f8e27d64 7c90eb94 badb0d00 002bfe24 00000000 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ f8e27d64)

    002bfe18 7c90e399 77e76703 000007c4 002bff70 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    002bfe1c 77e76703 000007c4 002bff70 00000000 ntdll!NtReplyWaitReceivePortEx+0xc (FPO: [5,0,0])

    002bff80 77e76c22 002bffa8 77e76a3b 00084540 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0xf4 (FPO: [Non-Fpo])

    002bff88 77e76a3b 00084540 7c90ee18 0006fb10 RPCRT4!RecvLotsaCallsWrapper+0xd (FPO: [Non-Fpo])

    002bffa8 77e76c0a 00083f20 002bffec 7c80b50b RPCRT4!BaseCachedThreadRoutine+0x79 (FPO: [Non-Fpo])

    002bffb4 7c80b50b 00084718 7c90ee18 0006fb10 RPCRT4!ThreadStartRoutine+0x1a (FPO: [Non-Fpo])

    002bffec 00000000 77e76bf0 00084718 00000000 kernel32!BaseThreadStart+0x37 (FPO: [Non-Fpo])

     

    kd> !thread 810dc930

    THREAD 810dc930  Cid 01e4.01ec  Teb: 7ffde000 Win32Thread: 00000000 READY

    Waiting for reply to LPC MessageId 000016fa:

    Pending LPC Reply Message:

        e16b8538: [e16b8538,e16b8538]

    Not impersonating

    DeviceMap                 e196c460

    Owning Process            810e9860       Image:         rpclpcc.exe

    Wait Start TickCount      65731          Ticks: 11 (0:00:00:00.110)

    Context Switch Count      26            

    UserTime                  00:00:00.000

    KernelTime                00:00:00.020

    Win32 Start Address rpclpcs!pre_c_init (0x01001a91)

    Start Address kernel32!BaseProcessStartThunk (0x7c810867)

    Stack Init fbe0e000 Current fbe0dc28 Base fbe0e000 Limit fbe0b000 Call 0

    Priority 8 BasePriority 8 PriorityDecrement 0 DecrementCount 16

    ChildEBP RetAddr  Args to Child             

    fbe0dc40 804ea3a4 00000000 810dc930 e107f388 nt!KiUnlockDispatcherDatabase+0x77 (FPO: [Uses EBP] [0,0,4])

    fbe0dc58 80586255 81131188 00000001 00000001 nt!KeReleaseSemaphore+0x70 (FPO: [Non-Fpo])

    fbe0dd10 80598c58 000847b0 0006f8c0 0006f8c8 nt!NtSecureConnectPort+0x635 (FPO: [Non-Fpo])

    fbe0dd3c 804df06b 000847b0 0006f8c0 0006f8c8 nt!NtConnectPort+0x24 (FPO: [Non-Fpo])

    fbe0dd3c 7c90eb94 000847b0 0006f8c0 0006f8c8 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ fbe0dd64)

    0006f8d0 7c93040b ffffffff 00000000 7c859e48 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    0006f9b0 7c8107fd 00000000 00084718 7c90e642 ntdll!DbgPrint+0x1b (FPO: [Non-Fpo])

    0006fe68 77ea11a8 00000022 00084540 00000000 kernel32!CreateRemoteThread+0x284 (FPO: [Non-Fpo])

    000847a8 00000000 00000001 00000000 00000000 RPCRT4!LRPC_ADDRESS::CompleteListen+0x84 (FPO: [Non-Fpo])

     

     

    Finding all threads in the system involved in LPC communication

     

    To get a consolidated list of all client threads in the system waiting on LPC replies and a list of server threads working on received LPC requests use the “!lpc threads” command without any parameters

     

    0: kd> !lpc thread

    Searching message 0 in threads ...

    Client thread 89a4cdb0 waiting a reply from 15f6                         

        Server thread 898c6a50 is working on message 63f09                        

        Server thread 89671020 is working on message 63f75                        

        Server thread 89904a48 is working on message 63f10                        

        Server thread 88d6bdb0 is working on message 3d6b1                        

        Server thread 89973db0 is working on message 63f32                         

        Server thread 896d32b8 is working on message 3454                        

        Server thread 8995d020 is working on message 63f76                        

        Server thread 8960f020 is working on message 63f77                         

        Server thread 898cd350 is working on message 61bd9                        

        Server thread 8900edb0 is working on message 3832f                        

        Server thread 8900fbf0 is working on message 33c2e                        

        Server thread 88d539a8 is working on message 3343f                        

    Client thread 89be4020 waiting a reply from 6077e                         

    Client thread 89012990 waiting a reply from 39239                         

        Server thread 89012990 is working on message 39232                        

    Client thread 89531020 waiting a reply from 3923d                         

        Server thread 89531020 is working on message 39236                        

    Client thread 88d13b40 waiting a reply from 3c96e                            

     

     

     

    Quick Tips

     

    8 No thread in the system is doing a NtListenPort( ) does that mean no LPC connection  port is open for connections?

     

    !stacks 2 NtListenPort does not return any thread because NtListenPort( ) calls NtRequestWaitReplyPort( ) to wait for an incoming LPC_CONNECTION_REQUEST message. Instead of calling NtListenPort( ) for listening for incoming connections most LPC connection threads directly call NtRequestWaitReplyPort( ).

     

    8 !lpc message does not show any message?

     

    !poolfind LpcM 1 should do the trick.

     

     

    0: kd> !lpc message

     

    Scanning large pool allocation table for Tag: LpcM (8974e000 : 8976e000)

     

    Searching Paged pool (e1000000 : f7000000) for Tag: LpcM

     

     

    0: kd> !poolfind LpcM 1

     

    Scanning large pool allocation table for Tag: LpcM (8974e000 : 8976e000)

     

     

    Searching Paged pool (e1000000 : f7000000) for Tag: LpcM

     

    e1061000 size:  168 previous size:    0  (Allocated) LpcM

    e1090758 size:  168 previous size:    8  (Allocated) LpcM

    e10a7a40 size:  168 previous size:    8  (Allocated) LpcM

    e1188468 size:    8 previous size:   20  (Free)      LpcM

    e1261000 size:  168 previous size:    0  (Allocated) LpcM

    e1297428 size:  168 previous size:    8  (Allocated) LpcM

     

    Once you have the pool blocks listed with the “LpcM” tag dump the ones that are “Allocated”. On a 32-bit machine add 8bytes to the pool address listed on the left most column. On a 64-bit machine add 16bytes i.e. 0x10. Dump the message as below to retrieve the message id.

     

    0: kd> dt nt!_LPCP_MESSAGE e1061000+8

       +0x000 Entry            : _LIST_ENTRY [ 0xe295fa80 - 0xe29279e8 ]

       +0x000 FreeEntry        : _SINGLE_LIST_ENTRY

       +0x004 Reserved0        : 0xe29279e8

       +0x008 SenderPort       : 0xe2327020

       +0x00c RepliedToThread  : (null)

       +0x010 PortContext      : 0x8021000d

       +0x018 Request          : _PORT_MESSAGE

     

    0: kd> dt nt!_LPCP_MESSAGE e1061000+8 Request.

       +0x018 Request  :

          +0x000 u1       : __unnamed

          +0x004 u2       : __unnamed

          +0x008 ClientId : _CLIENT_ID

          +0x008 DoNotUseThisField : 9.8460604703041575e-311

          +0x010 MessageId : 0x4617f

          +0x014 ClientViewSize : 0

          +0x014 CallbackId : 0

     

    0: kd> !lpc message 0x4617f

    Searching message 4617f in threads ...

        Server thread 88f91c08 is working on message 4617f                        

    Client thread 88f0f5f8 waiting a reply from 4617f                         

    Searching thread 88f0f5f8 in port rundown queues ...

     

    Server communication port 0xe2300a38

        Handles: 1   References: 1

        The LpcDataInfoChainHead queue is empty

            Connected port: 0xe2327020      Server connection port: 0xe13734b0

     

    Client communication port 0xe2327020

        Handles: 1   References: 9

        The LpcDataInfoChainHead queue is empty

     

    Server connection port e13734b0  Name: epmapper

        Handles: 1   References: 93

        Server process  : 896e1450 (svchost1.exe)

        Queue semaphore : 8995ac10

        Semaphore state 0 (0x0)

        The message queue is empty

     

     

    8   The number of server threads shown by !lpc thread processing requests is much more than the number of client threads waiting for reply.

     

    !lpc thread interprets stale data in ETHREAD structure as valid LPC information. An indication of the stale data is the ETHREAD.LpcExitThreadCalled will be set. This field is only set when the thread is exiting.

     

    0: kd> !lpc thread

    Searching message 0 in threads ...

    Client thread 89a4cdb0 waiting a reply from 15f6                         

        Server thread 898c6a50 is working on message 63f09                        

        Server thread 89671020 is working on message 63f75                         

        Server thread 89904a48 is working on message 63f10                        

        Server thread 88d6bdb0 is working on message 3d6b1                        

        Server thread 89973db0 is working on message 63f32                        

        Server thread 896d32b8 is working on message 3454                        

        Server thread 8995d020 is working on message 63f76                        

        Server thread 8960f020 is working on message 63f77                        

        Server thread 898cd350 is working on message 61bd9                        

        Server thread 8900edb0 is working on message 3832f                        

        Server thread 8900fbf0 is working on message 33c2e                        

        Server thread 88d539a8 is working on message 3343f     

     

    0: kd> !lpc message 3454

    Searching message 3454 in threads ...

        Server thread 896d32b8 is working on message 3454                        

    Done. 

     

     

    0: kd> dt nt!_ETHREAD 896d32b8 Lpc.

       +0x1c0 LpcReplyChain :  [ 0x896d3478 - 0x896d3478 ]

          +0x000 Flink : 0x896d3478 _LIST_ENTRY [ 0x896d3478 - 0x896d3478 ]

          +0x004 Blink : 0x896d3478 _LIST_ENTRY [ 0x896d3478 - 0x896d3478 ]

       +0x1ec LpcReplySemaphore :

          +0x000 Header : _DISPATCHER_HEADER

          +0x010 Limit : 1

       +0x200 LpcReplyMessage :

       +0x200 LpcWaitingOnPort :

       +0x220 LpcReceivedMessageId : 0x3454

       +0x234 LpcReplyMessageId : 0

       +0x248 LpcReceivedMsgIdValid : 0x1 ''

       +0x248 LpcExitThreadCalled : 0x1 ''

     

    8 Finding all client threads in the system waiting for LPC reply.

     

    !lpc thread

  • Ntdebugging Blog

    LPC (Local procedure calls) Part 1 architecture

    • 5 Comments

     

    Hello, my name is Roy. I’m an Escalation Engineer in the CPR platforms team.  I’ll be doing a four part series on LPC over the coming month.  You’re sure to find this interesting. That being said let’s get started.

     

    Disclaimer: The purpose of this blog is to illustrate debugging techniques with LPC. Please do not rely on this information as documentation for the LPC APIs when writing your own code. As always, this is subject to change in the future. 

    What is LPC?

     

    LPC (Local Inter-Process Communication and NOT Local Procedure Calls) is a high speed message based communication mechanism implemented in the NT kernel. LPC can be used for communication between two user mode processes, between a user mode process and a kernel mode driver or between two kernel mode drivers.  One example would be two user mode processes communicating via LPC.  Like CSRSS.exe talking to SMSS.exe over SmssWinStationApiPort while creating a logon session or any process talking to LSASS.exe over LsaAuthenticationPort for security reasons. Another example of a user mode process communicating with a kernel mode driver would be KSecDD.sys talking to LSASS.exe for EFS key encryption and decryption during read/write of an encrypted file.

     

    LPC uses two different mechanisms for passing data between the client and the server process. It uses the LPC message buffer (for data size less than 304 bytes) or it uses a shared memory Section mapped to both client and server address space (for data size more than 304 bytes).

     

    Apart from being used as the protocol of choice of Remote Procedure Calls between processes running on the same system, LPC is also used throughout the system e.g. for Win32 Application’s communication with CSRSS.exe, Security Reference Monitor’s communication with LSASS, WinLogon’s communication with LSASS etc.

     

    LPC enforces synchronous communication model between the client and the server processes. Vista deprecates the use of LPC using a new mechanism called Asynchronous Local Inter-Process Communication (ALPC). ALPC has an inherent advantage over LPC in that all calls from client to the server are asynchronous i.e. the client does not need to block/wait for the server to respond to a message. In Vista, legacy application calls to LPC APIs are automatically redirected to the newer ALPC APIs.

     

     

    LPC APIs

     

    LPC APIs are native APIs i.e. they are exported in user mode by NTDLL.dll and in kernel mode by NTOSKRNL.exe. LPC APIs are not exposed at the Win32 level hence win32 applications cannot use the LPC facility directly. Win32 applications can however use LPC indirectly when using RPC by specifying LPC as its underling transport via the protocol sequence “ncalrpc”. All LPC APIs end in the word "Port" which implies an LPC communication endpoint.

     

    API

    Description

    NtCreatePort

    Used by server to create a connection port

    NtConnectPort

    Used by client to connect to a connection port

    NtListenPort

    Used by server to listen for connection requests on the connection port.

    NtAcceptConnectPort

    Used by server to accept connection requests on the connection port

    NtCompleteConnectPort

    Used by server to complete the acceptance of a connection request

    NtRequestPort

    Used to send a datagram message that does not have a reply

    NtRequestWaitReplyPort

    Used to send a message and wait for a reply

    NtReplyPort

    Used to send a reply to a particular message

    NtReplyWaitReplyPort

    Used to send a reply to a particular message and wait for a reply to a previous message

    NtReplyWaitReceivePort

    Used by server to send a reply to the client and wait to receive a message from the client

    NtImpersonateClientOfPort

    Used by server thread to temporarily borrow the security context of a client thread

     

     

    The following diagram illustrates the steps taken by an LPC server process to listen to inbound connection request from potential client and the steps taken by clients to connect to listening servers.

     

     

    Figure [1] LPC Client Server Connection Establishment Sequence

    NOTE: Many server processes use the NtReplyWaitReceivePort( ) API instead of NtListenPort( ). NtListenPort( ) drops all LPC messages except connection requests. Hence NtListenPort( ) can only be used for the first connection. For later connection requests NtReplyWaitReceivePort( ) is used.

     

    The following diagram illustrates the steps taken by an LPC client to send a request to an LPC server that is has already established a connection to and the steps taken by the server to responds to the message.

     

     

     

    Figure [2] LPC Client Server Data Transfer Sequence

     

     

    LPC Data Structures

     

     

    LPC Port Data Structure

     

    LPC endpoints are referred to as ports. LPC implementation uses the same port structure to represent the various types of ports. The ports used by LPC are Server Connection Ports which are named ports created by the server process to accept incoming connections from clients.  Client Communication Ports which are created by the client process to connect to a server process and Server Communication Port created by the server process.

     

    Relationships 

      

     

    Figure [3] LPC Port types and their relationships

     

     

    LPCP_PORT_OBJECT is the internal data structure used by LPC to represent a LPC port. LPCP_PORT_OBJECTs are allocated out of paged pool with the tag ‘Port’.

     

    kd> dt nt!_LPCP_PORT_OBJECT

       +0x000 ConnectionPort   : Ptr32 _LPCP_PORT_OBJECT

       +0x004 ConnectedPort    : Ptr32 _LPCP_PORT_OBJECT

       +0x008 MsgQueue         : _LPCP_PORT_QUEUE

       +0x018 Creator          : _CLIENT_ID

       +0x020 ClientSectionBase : Ptr32 Void

       +0x024 ServerSectionBase : Ptr32 Void

       +0x028 PortContext      : Ptr32 Void

       +0x02c ClientThread     : Ptr32 _ETHREAD

       +0x030 SecurityQos      : _SECURITY_QUALITY_OF_SERVICE

       +0x03c StaticSecurity   : _SECURITY_CLIENT_CONTEXT

       +0x078 LpcReplyChainHead : _LIST_ENTRY

       +0x080 LpcDataInfoChainHead : _LIST_ENTRY

       +0x088 ServerProcess    : Ptr32 _EPROCESS

       +0x088 MappingProcess   : Ptr32 _EPROCESS

       +0x08c MaxMessageLength : Uint2B

       +0x08e MaxConnectionInfoLength : Uint2B

       +0x090 Flags            : Uint4B

       +0x094 WaitEvent        : _KEVENT

     

     

    Field

    Description

    ConnectedPort

    Points to the Server Communication Port

    ConnectionPort

    Points to the Server Connection Port

    MsgQueue.Semaphore

    Used to signal the server thread about the presence of  a message in MsgQueue.ReceiveHead

    MsgQueue.ReceiveHead

    Head of a doubly linked list containing all the messages that are waiting to be dequeued by the server.

    MsgQueue.NonPagedPortQueue

    Points to the LPCP_NONPAGED_PORT_QUEUE structure for the client communication port for tracking lost replies.

    LpcReplyChainHead

    Head of a doubly linked list containing all the threads that are waiting for replies to messages sent to this port.

     

     

     

    LPC Message Data Structure

     

    LPC message are data structures that carry information from the LPC client to the LPC server and can be of various types like connection, request, close etc.

     

    LPCP_MESSAGE is the internal data structure used by LPC to represent a message. LPCP_MESSAGE structures are allocated from a system wide lookaside list with the tag ‘LpcM’.

     

    kd> dt nt!_LPCP_MESSAGE

       +0x000 Entry            : _LIST_ENTRY

       +0x000 FreeEntry        : _SINGLE_LIST_ENTRY

       +0x004 Reserved0        : Uint4B

       +0x008 SenderPort       : Ptr32 Void

       +0x00c RepliedToThread  : Ptr32 _ETHREAD

       +0x010 PortContext      : Ptr32 Void

       +0x018 Request          : _PORT_MESSAGE

     

     

    Field

    Description

    Request.MessageId

    Generated from the value of a global epoch LpcpNextMessageId. Used to uniquely identify a message.

    SenderPort

    Points to the LPCP_PORT_OBJECT of the client communication port

    Entry

    Is the list entry that is used to queue the message to the Server Communication/Connection Port���s MsgQueue.ReceiveHead

    Request

    Is a copy of the message buffer that was provided as the Request parameter to the call to NtRequestWaitReplyPort() or a copy the message buffer that was provided as the Reply parameter to NtReplyWaitRecivePortEx().

     

     

    LPC related fields in Thread Data Structure

     

    kd> dt nt!_ETHREAD -y Lpc

       +0x1c8 LpcReplyChain : _LIST_ENTRY

       +0x1f4 LpcReplySemaphore : _KSEMAPHORE

       +0x208 LpcReplyMessage : Ptr32 Void

       +0x208 LpcWaitingOnPort : Ptr32 Void

       +0x228 LpcReceivedMessageId : Uint4B

       +0x23c LpcReplyMessageId : Uint4B

       +0x250 LpcReceivedMsgIdValid : Pos 0, 1 Bit

       +0x250 LpcExitThreadCalled : Pos 1, 1 Bit

     

    The following table describes the fields of the ETHREAD data structure that are used for communication between LPC Client and Server process.

     

     

    Field

    Description

    LpcReplyMessageId

    Contains the ID of the message sent to the server while the client thread is waiting on the response.

    LpcWaitingOnPort

    Used by the LPC Client Thread to stores the LPC client communication port on which it is waiting for a reply.

    LpcReplySemaphore

    Used to block the LPC Client Thread while the server is processing the LPC message.

    LpcExitThreadCalled

    Used by the Lpc APIs to determine if the thread is currently in the process of exiting and if so returns STATUS_THREAD_IS_TERMINATING from the API call.

    LpcReplyChain

    Used to queue the thread to the list of threads waiting for replies from the server communication/connection port. The list head of this list is in LPCP_PORT_OBJECT->LpcReplyChainHead.

     

     

     

    Important global LPC connection ports

     

    Field

    Description

    LsaAuthenticationPort

     Used for the following tasks:

    1. Security package management like Kerberos, NTLM etc…
    2. Credential management
    3. User Account management
    4. EFS (Encrypted File System) key encryption, decryption

     

    SmssWinStationApiPort

     

     This port is used by the CSRSS to manage Window Stations i.e. multiple sessions.

     

    ErrorLogPort

     

     This port is used by IO manager and other components to send error log entries to the EventLog service.

     

     

     

     

    Important LPC port fields in the Process

    (nt!_EPROCESS)

     

    Field

    Description

    SecurityPort

    LSA uses this port for EFS (encrypted file system) and authentication.

     

    ExceptionPort

     When a thread does not catch an exception an LPC_EXCEPTION message is sent to this port. A subsystem might register an LPC port in this field to receive second chance exception information of process running in the subsystem. Default action of CSRSS is to terminate the process.

     

    DebugPort

     The kernel dispatches user mode exceptions to the process debugger on this port. Debug outputs using OutputDebugString( ) are passed to this port as DBG_PRINTEXCEPTION_C.

     

     

     

     

    Important LPC port fields in the Thread

    (nt!_ETHREAD)

     

    Field

    Description

    TerminationPort

    This is not a single port rather a chain of ports registered with the Process Manager. The process manager notifies all ports when the thread terminates.

     

    LpcWaitingOnPort

    This field has the address of the client communication port when the client thread waits for a reply to a LPC message from a server thread.

     

     

     

     

    Stay tuned...in the next post we will roll our sleeves up and dig into the debug extensions for LPC and some common states where  you may find these calls stuck.

     

  • Ntdebugging Blog

    Desktop Heap, part 2

    • 21 Comments

     

    Matthew here again – I want to provide some follow-up information on desktop heap.   In the first post I didn’t discuss the size of desktop heap related memory ranges on 64-bit Windows, 3GB, or Vista.   So without further ado, here are the relevant sizes on various platforms...

     

     

    Windows XP (32-bit)

     

    ·         48 MB = SessionViewSize (default registry value, set for XP Professional, x86)

    ·         20 MB = SessionViewSize (if no registry value is defined) 

    ·         3072 KB = Interactive desktop heap size (defined in the registry, SharedSection 2nd value)

    ·         512 KB = Non-interactive desktop heap size (defined in the registry, SharedSection 3nd value)

    ·         128 KB = Winlogon desktop heap size

    ·         64 KB = Disconnect desktop heap size

     

     

     

    Windows Server 2003 (32-bit)

     

    ·         48 MB = SessionViewSize (default registry value)

    ·         20 MB = SessionViewSize (if no registry value is defined; this is the default for Terminal Servers)

    ·         3072 KB = Interactive desktop heap size (defined in the registry, SharedSection 2nd value)

    ·         512 KB = Non-interactive desktop heap size (defined in the registry, SharedSection 3nd value)

    ·         128 KB = Winlogon desktop heap size

    ·         64 KB = Disconnect desktop heap size

     

     

     

    Windows Server 2003 booted with 3GB (32-bit)

     

    ·         20 MB = SessionViewSize (registry value has no effect)

    ·         3072 KB = Interactive desktop heap size (defined in the registry, SharedSection 2nd value)

    ·         512 KB = Non-interactive desktop heap size (defined in the registry, SharedSection 3nd value)

    ·         128 KB = Winlogon desktop heap size

    ·         64 KB = Disconnect desktop heap size

     

    You may also see reduced heap sizes when running 3GB.  During the initialization of the window manager, an attempt is made to reserve enough session view space to accommodate the expected number of desktops heaps for a given session.  If the heap sizes specified in the SharedSection registry value have been increased, the attempt to reserve session view space may fail.  When this happens, the window manager falls back to a pair of “safe” sizes for desktop heaps (512KB for interactive, 128KB for non-interactive) and tries to reserve session space again, using these smaller numbers.  This ensures that even if the registry values are too large for the 20MB session view space, the system will still be able to boot.

     

     

     

    Windows Server 2003 (64-bit)

     

    ·         104 MB = SessionViewSize (if no registry value is defined; which is the default) 

    ·         20 MB = Interactive desktop heap size (defined in the registry, SharedSection 2nd value)

    ·         768 KB = Non-interactive desktop heap size (defined in the registry, SharedSection 3nd value)

    ·         192 KB = Winlogon desktop heap size

    ·         96 KB = Disconnect desktop heap size

     

     

     

    Windows Vista RTM (32-bit)

     

    ·         Session View space is now a dynamic kernel address range.  The SessionViewSize registry value is no longer used.

    ·         3072 KB = Interactive desktop heap size (defined in the registry, SharedSection 2nd value)

    ·         512 KB = Non-interactive desktop heap size (defined in the registry, SharedSection 3nd value)

    ·         128 KB = Winlogon desktop heap size

    ·         64 KB = Disconnect desktop heap size

     

     

     

    Windows Vista (64-bit) and Windows Server 2008 (64-bit)

     

    ·         Session View space is now a dynamic kernel address range.  The SessionViewSize registry value is no longer used.

    ·         20 MB = Interactive desktop heap size (defined in the registry, SharedSection 2nd value)

    ·         768 KB = Non-interactive desktop heap size (defined in the registry, SharedSection 3nd value)

    ·         192 KB = Winlogon desktop heap size

    ·         96 KB = Disconnect desktop heap size

     

     

    Windows Vista SP1 (32-bit) and Windows Server 2008 (32-bit)

     

    ·         Session View space is now a dynamic kernel address range.  The SessionViewSize registry value is no longer used.

    ·         12288 KB = Interactive desktop heap size (defined in the registry, SharedSection 2nd value)

    ·         512 KB = Non-interactive desktop heap size (defined in the registry, SharedSection 3nd value)

    ·         128 KB = Winlogon desktop heap size

    ·         64 KB = Disconnect desktop heap size 

     

     

    Windows Vista introduced a new public API function: CreateDesktopEx, which allows the caller to specify the size of desktop heap.

    Additionally, GetUserObjectInformation now includes a new flag for retrieving the desktop heap size (UOI_HEAPSIZE).

     

     

    [Update: 3/20/2008 - Added Vista SP1 and Server 2008 info.  More information can be found here.]

  • Ntdebugging Blog

    How Windows Starts Up (Part the second)

    • 7 Comments

    Howdy folks, David here again with part two of How Windows Starts Up.  Today we’ll be covering the Boot Loader Phase.  Let’s take a moment to recap where we are at this point.  So far, the computer has completed POST, executed the MBR code, located the active partition, executed the Boot Sector code in that partition, found NTLDR and loaded it into memory.  So what happens next?  During this phase, we’ll be discussing the following files located in the root of your system volume:  NTLDR, ntdetect.com and boot.ini.

     

    The Boot Loader Phase begins when NTLDR executes:

     

    1.    NTLDR switches the processor from real mode to 32-bit flat memory mode.

    How does it do this you ask?  NTLDR contains two parts – a 16-bit component which handles switching the processor to Protected Mode and a 32-bit component which handles everything else. 

     

    When NTLDR first begins execution, the processor is in an x86-operating mode called Real-Mode.  In Real-Mode, no virtual to physical translation of memory addresses occurs, which means that all memory addresses use the 16-bit Segment:Offset format, and only the first 1MB of the computer's physical memory is accessible.  However, we don’t have to concern ourselves with this too much because the first action NTLDR takes is to switch the CPU to Protected-Mode. 

     

    *Trivia Alert*  before we enter Protected-Mode, we enable the A20 Line – you IBM guys know what I’m talking about… 

     

    Still no virtual-to-physical translation occurs at this point in the boot process, but a full 32 bits of memory addressing becomes accessible, and now NTLDR can access up to 4GB of physical memory. 

     

    2.    NTLDR creates enough page tables to make memory below 16MB accessible with paging turned on.  This number has been increased with Windows Server 2003. 

     

    3.    NTLDR enables paging.  “Protected-Mode with Paging Enabled” is the mode in which Windows executes during normal operation.

     

    4.    NTLDR then starts the FAT and NTFS mini file system drivers contained within its own code.  Unlike the Boot Sector's file system code, NTLDR's code is sophisticated enough to read subdirectories.

     

    Note:   All disk access from this point until the Kernel Phase begins uses BIOS Interrupt 13h, Extended Interrupt 13h or NTBOOTDD.SYS

     

    5.    NTLDR locates and parses Hiberfil.sys to determine if it contains a valid hibernator file.  If a valid hibernator file is found, then that file is loaded into memory and execution continues from the point the computer was placed into hibernation.

     

    Hibernation files are interesting.  The file either contains a full memory image, or it contains a pointer (ARC path) to the boot partition of the last operating system that entered hibernation.  That boot partition will contain another Hiberfil.sys file, which contains a full memory image of the hibernating operating system.  If the Hiberfil.sys file is a pointer, the file will contain an ARC path using one of the following syntaxes:

     

                             - linkmulti()disk()rdisk()partition()

                             - linksignature()disk()rdisk()partition()

                             - linkscsi()disk()rdisk()partition()

     

    6.    If Hiberfil.sys is not found, NTLDR parses the Boot.ini file.  If the Boot.ini contains a reference to more than one operating system or boot option, then the boot loader screen is displayed. 

     

    This is a sample of a default Boot.ini file from a Windows 2000 Server-based computer:

     

    [boot loader]

    timeout=30

    default=multi(0)disk(0)rdisk(0)partition(1)\WINNT

    [operating systems]

    multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows 2000 Server" /fastdetect

     

    Let’s talk a bit about what’s in the Boot.ini file.  Boot.ini contains a list of ARC paths.  Ok, quick – five bonus points to the first person who can tell me what ‘ARC’ stands for…  come on…  Yes, you there in the red shirt… correct!  It stands for ‘Advanced RISC Computing’; now beam down with the captain.  The rest of you come with me…

     

    ARC paths may use the “Multi”, “SCSI” or “Signature” syntax.

     

    The “Multi” syntax instructs Windows to rely on the system BIOS to load system files.  This means that NTLDR will be using INT13 BIOS calls to find and load NTOSKRNL and any other files on the boot partition.

     

    The “SCSI” syntax tells Windows to load a SCSI device driver to access the boot partition.  A copy of the driver must be renamed to NTBOOTDD.SYS and placed in the root of the system partition.

     

    The “Signature” syntax is equivalent to the SCSI syntax, but is used instead to support the Plug and Play architecture in Windows 2000 or later.  Because Windows 2000 is a PnP OS, the SCSI controller number instance might vary each time you start Windows 2000, especially if you add new SCSI controller hardware after setup.

     

    The Signature syntax instructs NTLDR to locate the drive with a disk signature that matches the value in the parentheses, regardless of which SCSI controller number that the drive is connected to.  Keep in mind that the Signature syntax also requires the presence of an Ntbootdd.sys file in the root of the system partition.  If you recall from the previous post, the disk signature value is located in the MBR.

     

    Example:

    signature(8b467c12)disk(1)rdisk(0)partition(2)\winnt="Windows 2000"

     

          Signature() syntax is used only if one of the following conditions exists:

     

    ·         The System BIOS or controller hosting the boot partition on which Windows is installed, does not support INT13 Extensions, or has the INT13 Extensions option disabled, and the partition on which you installed Windows is either larger than ~7.8 gigabytes (GB) in size, or the ending cylinder number for that partition is higher than 1024.

    ·         The drive on which you installed Windows is connected to a SCSI controller with SCSI BIOS disabled, so INT13 BIOS calls cannot be used during the boot process.

     

                            227704 Windows May Use Signature() Syntax in the Boot.ini File

                            http://support.microsoft.com/default.aspx?scid=kb;EN-US;227704

     

    7.    If Boot.ini is not found, NTLDR attempts to continue the boot process by assuming that Windows was installed in the default location on the same partition (c:\windows).  If the c:\windows directory is not found, NTLDR displays this error message:  Windows could not start because the following file is missing or corrupt: \winnt root\system32\ntoskrnl.exe

     

    8.    If the boot loader screen is displayed, the user must select an operating system to boot.

    a.    If the user chooses to run an operating system other than Windows 2003/XP such as DOS or Windows 95/98, NTLDR loads and executes the Bootsect.dos file located in the root of the boot drive.

    (1)  If the Bootsect.dos file is found, NTLDR loads it into RAM and then passes control to it.

    (2)  Bootsect.dos then loads IO.sys, MSDOS.sys, Command.com, etc., as necessary and proceeds to boot the downlevel operating system.

    (3)  If the file is not found in the root, NTLDR displays:

    I/O Error accessing boot sector file multi(0)disk(0)rdisk(0)partition(1): \bootsect.dos

    b.    If the user selects a Windows NT-based operating system, NTLDR executes Ntdetect.com.

    (1)  If the ARC path points to a hard drive or partition that is not on the system, the following is displayed:

    Windows could not start because of a computer disk hardware configuration problem.  Could not read from the selected boot disk.  Check boot path and disk hardware.  Please check the Windows documentation about hardware disk configuration and your hardware reference manuals for additional information.

    (2)  The above message may also be displayed if you are using Signature() ARC path notation and the disk signature in the MBR is overwritten (for example, by a virus) or changed (corrupted) so that it no longer matches the Signature() value in the Boot.ini file.  The following error message may also be displayed when there are problems with the Signature () value:

    Windows could not start because of the following ARC firmware boot configuration problem: did not properly generate ARC name for HAL and system paths. Please check the Windows documentation about ARC configuration options and your hardware reference manuals for additional information.

     

    9.    Ntdetect.com initializes and detects all supported hardware that it can find.  (Troubleshooting tip:  There is an Ntdetect.chk on the Windows Resource Kit CD in the Debug.cab file, which can be renamed and used in place of Ntdetect.com to view what Ntdetect.com is detecting during startup.  You know you want to – go ahead, I’ll wait…)

     

    Ntdetect.com detects the following components:

     

    a)    Computer ID

    b)    Bus/adapter type

    c)    Video adapter

    d)    Keyboard

    e)    Communications ports

    f)     Floppy disks

    g)    Mouse or other pointing devices

    h)    Parallel ports

     

    10.  Ntdetect.com passes its list of hardware components to NTLDR.

     

    11.  If the /SOS switch was used in the Boot.ini for the OS that is loading, NTLDR will display the files it loads into memory from this point on.  Keep in mind – none of these files are being initialized.  They are just being loaded into RAM at this point.  I can’t tell you how many times someone has incorrectly concluded that there is a problem with MUP.SYS because it’s the last file displayed on this screen before their computer hangs.  Poor MUP – always getting a bad rap.  This list can be useful, however; it can help you identify potential malware that may be loading.

     

    12.  NTLDR loads Ntoskrnl.exe and Hal.dll into memory.  Again, *loading*, not initializing.

     

    13.  NTLDR loads the System registry hive into memory.  If Windows cannot load the registry hive, it may be necessary to restore the System hive from the ERD or by using the Windows Recovery Console to copy the files from the Windows\Repair or Windows\Repair\RegBack folder.

     

    14.  NTLDR reads the HKLM\System\Select registry key.

    a.    If the Last Known Good option was not selected, NTLDR looks for the “Default” value in the SELECT key to select which ControlSet to use.  (0x1 = ControlSet001, 0x2 = ContolSet002, 0x3 = ControlSet003) 

    b.    If the Last Known Good option was selected, NTLDR looks for the “LastKnownGood” value in the SELECT key to select the ControlSet to use.

     

    15.  NTLDR sets the “Current” value in the HKLM\System\Select key to the ControlSet being used.

     

    Note:  At this point, NTLDR has determined which ControlSet it needs to use, but we don’t actually create the “CurrentControlSet” symbolic link until the kernel initializes (more on that in our next installment).  Therefore, for the purposes of this discussion, I’ll use *ActiveControlSet* to refer to whichever registry ControlSet has been identified as the one NTLDR needs to use.

     

    16.  NTLDR scans HKLM\System\*ActiveControlSet*\Control\Nls\Codepage for the “Acp”, “Oemcp”, and “Oemhal” values.  These entries are pointers to the table of codepage information stored in the same key.  The table has the filenames that correspond to the codepage numbers.  NTLDR loads these files into memory.  The defaults are C_1252.nls, C_437.nls, and Vgaoem.fon.

     

    A code page is a character set specific to an operating system or hardware platform.  It maps character codes to individual characters.  Different code pages include different special characters, typically customized for a language or a group of languages.  The system uses code pages to translate keyboard input into character values for non-Unicode based applications, and to translate character values into characters for non-Unicode based output displays.

     

    17.  NTLDR scans HKLM\System\*ActiveControlSet*\Control\Nls\Language for the “Default” value.  This entry is a pointer to the table of language information stored in the same key.  NTLDR loads this file into memory.  The default is L_intl.nls.

     

    18.  NTLDR scans HKLM\System\*ActiveControlSet*\Services for device drivers with a “Start” value of 0x0.  It finds their “Group” and “Tag” values also.

     

    19.  NTLDR scans HKLM\System\*ActiveControlSet*\Control\ServiceGroupOrder for the order that service groups should be loaded.

     

    20.  NTLDR sorts the drivers by “Group” and then by their “Tag” value.  Each service or driver has a particular group to which it belongs.  All drivers within a specific group are ordered by tag. 

     

    21.  NTLDR loads the 0x0 device drivers into memory.  It does not initialize or start these drivers.

     

    22.  The board is set, and the pieces are moving.  NTLDR executes Ntoskrnl.exe.

     

    Troubleshooting the Boot Loader Phase

     

    Okay, now that we’ve seen what is supposed to happen, what could possibly go wrong?  Well, several things…

     

    First, all file-system activity during this phase uses BIOS Int13, Extended Int13 or Ntbootdd.sys. 

     

    1.    If your computer uses Extended Int 13, verify that Ext. Int13 is enabled in your system’s BIOS or SCSI BIOS as appropriate.

     

    2.    If you are using a SCSI driver, verify that you are using the correct SCSI driver (Ntbootdd.sys).  If you are using "SCSI()" syntax in the Boot.ini file, copy the correct device driver for the SCSI controller to the root and then rename it to Ntbootdd.sys.  This is the driver that is stored in the Winnt.sbs\System32\Drivers folder (for example, the Aic78xx.sys file).  If you use "MULTI()" syntax in the Boot.ini file, you do not need to do this.  Use the Windows Recovery Console or a Windows NT boot floppy to complete this step.

     

                301680 HOW TO: Create a Boot Disk for an NTFS or FAT Partition

                http://support.microsoft.com/default.aspx?scid=kb;EN-US;301680

     

    307654 How to install and use the Recovery Console in Windows XP

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;307654

     

     

    Significant Errors during this phase:

     

    ·         Windows could not start because of a computer disk hardware configuration problem.  Could not read from the selected boot disk.  Check boot path and disk hardware.  Please check the Windows documentation about hardware disk configuration and your hardware reference manuals for additional information.

     

    This error indicates that the ARC path is pointing to a hard drive or partition that is not on the system.  This message may also be displayed if you are using Signature() ARC path notation and the disk signature in the MBR is overwritten (for example, by a virus) or changed (corrupted) so that it no longer matches the Signature() value in the Boot.ini file. 

     

    ·         Windows could not start because of the following ARC firmware boot configuration problem: did not properly generate ARC name for HAL and system paths. Please check the Windows documentation about ARC configuration options and your hardware reference manuals for additional information.

     

    This error message may be displayed when there are problems with the Signature () ARC path syntax.

     

    Ntoskrnl.exe missing or corrupt.  This error could be the result of an incorrect ARC path in the Boot.ini file.  Use the information in the following knowledge base articles to correct the Boot.ini information.

     

                102873 BOOT.INI and ARC Path Naming Conventions and Usage

                http://support.microsoft.com/default.aspx?scid=kb;EN-US;102873

     

                155222 How to Determine the ARC Path

                http://support.microsoft.com/default.aspx?scid=kb;EN-US;155222

     

                130921 Creating an FT Boot Disk With Scsi( ) and Multi( ) Identifiers

                http://support.microsoft.com/default.aspx?scid=kb;EN-US;130921

     

    It is also possible to use the Windows XP Recovery Console to create a new Boot.ini for a Windows NT, Windows 2000 or .NET installation.  Please see the following knowledge base article for a description of this process:

     

                291980 A Discussion About the Bootcfg Command and Its Uses

                http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q291980

     

    Specify the /SOS switch in the Boot.ini file and monitor the output to the screen.  If the system stops responding before anything is output to the screen, then the problem may be with hardware detection.  Replace Ntdetect.com with Ntdetect.chk to view the hardware detected by the system during this phase.

     

                103659 Setup Hangs While Inspecting Hardware; How to Use Ntdetect.com

                http://support.microsoft.com/default.aspx?scid=kb;EN-US;103659

     

                320040 HOW TO: Use Ntdetect.chk to Identify Hardware Problems

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;320040

     

     

    Incorrect HAL

    If after specifying the /SOS option, you see that an incorrect version of the HAL.DLL is loading, use the following KB articles to replace the HAL or troubleshoot other HAL-related issues:

     

                314477 "Hardware Configuration Problem" Err Msg Starting Windows

                http://support.microsoft.com/default.aspx?scid=kb;EN-US;314477

     

                237556 Troubleshooting Windows 2000 Hardware Abstraction Layer Issues

                http://support.microsoft.com/default.aspx?scid=kb;EN-US;237556

     

    330184  "Invalid Boot.ini" or "Windows could not start" error messages

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;330184

     

     

    Unable to load System hive

    Several problems could happen here.  First, consider that the amount of memory we’re dealing with is only 16MB.  We have to squeeze NTLDR, the System Registry Hive, the kernel, the HAL, all the boot drivers and the PFN database into a relatively small area.  So, with Windows Server 2003, we’ve given ourselves a bit more room. 

     

                302594 The System hive memory limitation is improved in Windows Server 2003

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;302594

     

    If Windows is unable to load the System registry hive into memory, then it may be necessary to restore the System hive from the ERD or use the Windows Recovery Console to copy the files from the Winnt\Repair or Winnt\Repair\RegBack folder whichever version is most recent.

     

                307545 How to Recover from a Corrupted Registry

                http://support.microsoft.com/default.aspx?scid=kb;EN-US;307545

     

     

    Table of Startup Type Values

    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services

    0x0 = Boot

    Loaded into RAM by the Boot loader (NTLDR) during Boot Loader Phase, then initialized by the Kernel (NTOSKRNL) during Kernel Phase

    0x1 = System

    Loaded and initialized by the Kernel during Kernel Phase

    0x2 = Auto

    Loaded or started automatically at system startup

    0x3 = Manual

    Driver is manually started by the user or another process

    0x4 = Disabled

    Driver is not to be started (There are exceptions, such as NTFS)

     

     

    Conclusion

    Well folks, we certainly have covered a lot of ground so far.  Any questions?  Yes, you there in the front…  Ah, why have we not discussed any blue screens?  Very good question.  We haven’t seen any BSOD’s up to this point in the boot sequence because the kernel isn’t running yet and STOP errors are generated by the kernel.  No kernel, no blue screens. 

     

    That’s all the time we have this week boys & girls.  Join us next time for How Windows Starts Up, The Kernel Phase.

     

  • Ntdebugging Blog

    Where the rubber meets the road, or in this case the hardware meets the probe.

    • 1 Comments

    Hi my name is Bob, I’m an Escalation engineer with the Microsoft critical problem resolution team.  We had one of our readers ask how much we deal with hardware.  Well in response we recently worked on an interesting problem I thought I would share with you.  In this case it was interesting because it demonstrated an issue specific to multi-processor machines and something that probably sounded innocuous to the driver writer who caused the  problem.

     

    What was the problem?

     

    The problem was that everything worked except the time in the system was not being updated. The RTC had stopped.   We spent time reviewing the semantics of how the RTC should function and even hooked up an oscilloscope to the RTC on the motherboard and were able to turn it off and on with the debugger by writing out the correct port.  The trace on the scope validated our understanding of what had to be written to the port to turn the clock off.    One we had a clear understanding of this we understood what we were looking for in a driver that might cause the problem.  Note the clock typically fires every 10ms so you do not need a fast scope to do this.

     

     

    Special keyboard driver written

     

    In order to catch a dump in state we had to modify the keyboard driver.  It would cause an “Int 3” in its ISR instead of calling “bug check” for an E2 stop.    Because the RTC was not running the idle thread was not getting quantums and as a result a normal break in would not work.  However the system would respond to ISRs. 

     

    What was found

     

    All RTC interrupts were stopped - the clock was not running.  We checked all the obvious places to see if the RTC was disabled.

     

    We looked at the ICR in the I/O APIC.  This is the interrupt configuration register. There is a register for every interrupt pin on the APIC. These registers are used to tell the APIC what vector to send the processor so the processor can service the interrupt.  It also has configuration information about level and if it edge triggered and a mask bit.  The mask bit was not set.

    Below is a reenactment.

     

    0: kd> ed ffd01000

    ffd01000 00000034 20 ß Select register 20 which is pin 8.

     

     

    0: kd> ed ffd01010

    ffd01010 000008d1    ß Contents ß Vect D1 Bit 16 the interrupt mask bit is not set so it is OK.

     

     

    Next check the RTC status register which are I/O ports 70 and 71.  Port 70 is the address port.  Port 71 is the data port.  This information is from an old BIOS book.

     

    0: kd> ob 70 b       ß ‘B’ is a control register.   

    0: kd> ib 71

    00000071: 42         ß The value 42 means that the RTC is enabled.  Bit 6 is the enable.

     

     

    So what was it?

     

    The way the RTC works is it will interrupt at a certain interval.  When the interrupt is serviced, the status register has to be read to start it again.

     

    We discovered another driver that was reading  the clock, this was done by disassembly various drivers in the dump and looking for the I/O operation to ports 70 or 71.  The lower addresses selected by port 70 will yield the time when read.  That is what the driver was doing.

     

    You would think that simply reading the time in this way would not hurt anything.  However, in a multi-processor system, access has to be serialized.  There is only one set of I/O ports for the system.

     

    Since it takes two accesses to perform an operation on the clock, one address & one data, a collision between two processors can cause undetermined results.

     

    Below is a timing diagram of the issue;

     

     

    Proc 0 running OS RTC handler                         Proc 1 running XYZ driver

     

      T1          Set register select to status register

     

      T2                                                                                          Set register select to read time

     

      T3          Read status register to restart clock

     

     

    So at T3 the OS RTC handler reads the wrong register so the clock does not start. 

     

     

    Conclusion

     

    I thought this was an interesting problem that illustrates the need for serialization.  And it demonstrates what to look out for in a multi-proc environment.  You always have to think “What happens if the other processor does…….”

     

    For more information consult any BIOS developer manuals you may have lying around or this link we found http://www.geocities.com/SiliconValley/Campus/1671/docs/rtc.htm

    See the “Status Register C” section, “All future interrupts are disabled until this register is read - your interrupt handler *must* do it.”

  • Ntdebugging Blog

    How Windows Starts Up (part 1 of 4)

    • 11 Comments

    Hi folks, my name is David and I’m an Escalation Engineer for Microsoft.  Since Bryan wrote about How Windows Shuts Down, I thought it would be a good idea to cover How Windows Starts Up.

     

    This information applies specifically to Windows 2000, Windows XP, and Windows Server 2003.  I will blog separately on the boot changes in Windows Vista.

     

    Additional information about this topic may be found in Chapter 5 of Microsoft Windows Internals by Russinovich and Solomon and also on TechNet:  http://technet.microsoft.com/en-us/library/bb457123.aspx

     

    Methodology

    The key to understanding and troubleshooting system startup issues is to accurately ascertain the point of failure.  To facilitate this determination, I have divided the boot process into the following four phases.

     

                    1. Initial

                    2. Boot Loader

                    3. Kernel

                    4. Logon

     

    Over the next few weeks, I’ll be describing each of these phases in detail and providing appropriate guidance for troubleshooting relevant issues for each phase. 

     

    Initial Phase

    The Initial Phase of boot is divided into the Power-On Self Test (POST) and Initial Disk Access. 

     

    Power-On Self Test

    POST activities are fully implemented by the computer’s BIOS and vary by manufacturer.  Please refer to the technical documentation provided with your hardware for details.  However, regardless of manufacturer, certain generic actions are performed to ensure stable voltage, check RAM, enable interrupts for system usage, initialize the video adapter, scan for peripheral cards and perform a memory test if necessary.  Depending on manufacturer and configuration, a single beep usually indicates a successful POST. 

     

    Troubleshooting the POST

    ü  Make sure you have the latest BIOS and firmware updates for the hardware installed in the system.

    ü  Replace the CMOS battery if it has failed.

    ü  Investigate recently added hardware (RAM, Video cards, SCSI adapters, etc.)

    ü  Remove recently added RAM modules.

    ü  Remove all adapter cards, then replace individually, ensuring they are properly seated.

    ü  Move adapter cards to other slots on the motherboard.

    ü  If the computer still will not complete POST, contact your manufacturer.

     

    Initial Disk Access

    Depending on the boot device order specified in your computer’s BIOS, your computer may attempt to boot from a CD-ROM, Network card, Floppy disk, USB device or a hard disk.  For the purposes of this document, we’ll assume that we’re booting to a hard disk since that is the most common scenario.

     

    Before we discuss the sequence of events that occur during this phase of startup, we need to understand a little bit about the layout of the boot disk.  The structure of the hard disk can be visualized this way:  (Obviously, these data areas are not to scale)

     

     

     

    Hard disks are divided into Cylinders, Heads and Sectors.  A sector is the smallest physical storage unit on a disk and is almost always 512 bytes in size.  For more information about the physical structure of a hard disk, please refer to the following Resource Kit chapter:  http://www.microsoft.com/resources/documentation/windowsnt/4/server/reskit/en-us/resguide/diskover.mspx

     

    There are two disk sectors critical to starting the computer that we’ll be discussing in detail:  

     

    ·         Master Boot Record (MBR)

    ·         Boot Sector

     

    The MBR is always located at Sector 1 of Cylinder 0, Head 0 of each physical disk.   The Boot Sector resides at Sector 1 of each partition.  These sectors contain both executable code and the data required to run the code.

     

    Please note that there is some ambiguity involved in sector numbering.  Cylinder/Head/Sector (CHS) notation begins numbering at C0/H0/S1.  However, Absolute Sector numbering begins numbering at zero.  Absolute Sector numbering is often used in disk editing utilities such as DskProbe.  These differences are discussed in the following knowledge base article:

    Q97819  Ambiguous References to Sector One and Sector Zero

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q97819

     

    Now that we have that straight, when does this information get written to disk?  The MBR is created when the disk is partitioned.  The Boot Sector is created when you format a volume.  The MBR contains a small amount of executable code called the Master Boot Code, the Disk Signature and the partition table for the disk.  At the end of the MBR is a 2-byte structure called a Signature Word or End of Sector marker, which should always be set to 0x55AA.  A Signature Word also marks the end of an Extended Boot Record (EBR) and the Boot Sector.

     

    The Disk Signature, a unique number at offset 0x1B8, identifies the disk to the operating system.  Windows 2000 and higher operating systems use the disk signature as an index to store and retrieve information about the disk in the registry subkey HKLM\System\MountedDevices.

     

    The Partition Table is a 64-byte data structure within the MBR used to identify the type and location of partitions on a hard disk.  Each partition table entry is 16 bytes long (four entries max).  Each entry starts at a predetermined offset from the beginning of the sector as follows:

     

                                    Partition 1                           0x1BE    (446)

                                    Partition 2                           0x1CE    (462)

                                    Partition 3                           0x1DE    (478)

                                    Partition 4                           0x1EE    (494)

     

    The following is a partial example of a sample MBR showing three partition table entries in-use and one empty:

     

    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00 00 00

    000001D0: 81 0A 07 FE FF FF 8A F5 – 7F 00 3D 26 9C 00 00 00

    000001E0: C1 FF 05 FE FF FF C7 1B – 1C 01 D6 96 92 00 00 00

    000001F0: 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00

     

    Let’s take a look at each of the fields of a partition table entry individually.  For each of these explanations, I’ll use the first partition table entry above and highlight the relevant section.  Keep in mind that these values are little-endian.

     

    Byte Offset

    Field Length

    Sample Value

    Field Description

    0x1BE

    8 Bits

    0x80

    Boot Indicator

                    00=Do Not Use for Booting

                    80=Active partition (Use for Booting)

    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00     

     

    0x1BF

    8 Bits

    0x01

    Starting Head

    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00

     

    0x1C0

    0x1C1

    2 Byte

    Word

    0x01

    Starting Sector

    Only the first 6 bits are used.  The upper 2 bits of this byte are used by the Starting Cylinder field.

    0x00

    Starting Cylinder

    Uses 1 byte + 2 bits from the Starting Sector field to make up the cylinder value.  The Starting Cylinder is a 10-bit number with a maximum value of 1023.

    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00

     

     

     

    0x1C2

    8 Bits

    0x07

    System ID

    Defines the volume type.  0x07=NTFS

    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00

     

    Other Possible System ID Values:  

    Partition Type

    ID Value

    0x01

    FAT12 primary partition or logical drive (fewer than 32,680 sectors in the volume)

    0x04

    FAT16 partition or logical drive (32,680–65,535 sectors or 16 MB–33 MB)

    0x05

    Extended partition

    0x06

    BIGDOS FAT16 partition or logical drive (33 MB–4 GB)

    0x07

    Installable File System (NTFS partition or logical drive)

    0x0B

    FAT32 partition or logical drive

    0x0C

    FAT32 partition or logical drive using BIOS INT 13h extensions

    0x0E

    BIGDOS FAT16 partition or logical drive using BIOS INT 13h extensions

    0x0F

    Extended partition using BIOS INT 13h extensions

    0x12

    EISA partition

    0x42

    Dynamic disk volume

    0x86

    Legacy FT FAT16 disk *

    0x87

    Legacy FT NTFS disk *

    0x8B

    Legacy FT volume formatted with FAT32 *

    0x8C

    Legacy FT volume using BIOS INT 13h extensions formatted with FAT32 *

     

    Partition types denoted with an asterisk (*) indicate that they are also used to designate non-FT configurations such as striped and spanned volumes.

     

    0x1C3

    8 Bits

    0xFE

    Ending Head      (0xFE=254 decimal)

    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00

     

    0x1C4

    0x1C5

    2 Byte

    Word

    0xBF

    Ending Sector

    As with the Starting Sector, it only uses the first 6 bits of the byte.  The upper 2 bits are used by the Ending Cylinder field.

    0x09

    Ending Cylinder

    Uses 1 byte in addition to the upper 2 bits from the Ending Sector field to make up the cylinder value.  The Ending Cylinder is a 10-bit number with a maximum value of 1023.

    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00

     

    0x1C6

    32 Bits

    0x3F000000

    Relative Sectors

    The offset from the beginning of the disk to the beginning of the volume, counting by sectors.

    0x0000003F = 63

    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00

     

     

    0x1CA

    32 Bits

    0X4BF57F00

    Total Sectors

    The total number of sectors in the volume.

    0x007FF54B = 8,385,867 Sectors = 4GB

    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00

     

    Are you with me so far?  Good!  Now, Cylinder/Sector encoding can be a bit tricky, so let’s take a closer look. 

     

    Cylinder - Sector Encoding

    16

    15

    14

    13

    12

    11

    10

    9

    8

    7

    6

    5

    4

    3

    2

    1

    Cylinder bits 1-8

    Cyl bits

    9 & 10

    Sector value

    Bits 1-6

     

    As you can see, the Sector value occupies the lower 6 bits of the word and the Cylinder occupies the upper 10 bits of the word.  In our example, the starting values for Cylinder and Sector are 01 00.  Values in the MBR use reverse-byte ordering, which is also referred to as ‘little-endian’ notation.  Therefore, we swap the bytes and find that our starting values are Cyl 0, Sec 1.

     

    Our ending values are more interesting:  BF 09.  First, we swap the bytes and obtain a hex value of 0x09BF.  This value in binary notation is 100110111111.  The following table illustrates how we derive the correct partition table values from these bytes:

     

    Example:  BF 09

    8

    7

    6

    5

    4

    3

    2

    1

    10

    9

    6

    5

    4

    3

    2

    1

    0

    0

    0

    0

    1

    0

    0

    1

    1

    0

    1

    1

    1

    1

    1

    1

    10 Cylinder value bits 1-8

    Cyl bits

    9 & 10

    Sector value

    bits 1-6

     

    The 6 low bits are all set to 1, therefore our Sector value is 111111 or 63.   You can see above how the bits are arranged for the Cylinder value.  The value above is 1000001001 (521).  Since both Cylinder and Head values begin numbering at zero, we have a total of 522 Cylinders and 255 Heads represented here.  This gives us an ending CHS value of:  522 x 255 x 63 = 8,385,930 sectors. 

     

    Subtracting the starting CHS address (Cylinder 0, Head 1, Sector 1) (63) gives us the total size of this partition:  8,385,867 sectors or 4GB.  We can verify this number by comparing it to the Total Sectors represented in the partition table:  4B F5 7F 00.  Applying reverse-byte ordering gives us 00 7F F5 4B which equals 8,385,867 sectors.

     

    So, now that we have an understanding of what is contained within the structures on the disk, let’s look at the sequence of events that occur.  Remember, this is just after POST has successfully completed.

     

    1.       The motherboard ROM BIOS attempts to access the first boot device specified in the BIOS.  (This is typically user configurable and can be edited using the BIOS configuration utility.)

     

    2.       The ROM BIOS reads Cylinder 0, Head 0, and Sector 1 of the first boot device.

     

    3.       The ROM BIOS loads that sector into memory and tests it.

    a.       For a floppy drive, the first sector is a FAT Boot Sector.

    b.      For a hard drive, the first sector is the Master Boot Record (MBR).

     

    4.       When booting to the hard drive, the ROM BIOS looks at the last two signature bytes of Sector 1 and verifies they are equal to 55AA.

    a.       If the signature bytes do not equal 55AA the system assumes that the MBR is corrupt or the hard drive has never been partitioned.  This invokes BIOS Interrupt 18, which displays an error that is BIOS-vendor specific such as “Operating System not found”.

    b.      If the BIOS finds that the last two bytes are 55AA, the MBR program executes.

     

    5.       The MBR searches the partition table for a boot indicator byte 0x80 indicating an active partition.

    a.       If no active partition is found, BIOS Interrupt 18 is invoked and a BIOS error is displayed such as “Operating System not found”.

    b.      If any boot indicator in the MBR has a value other than 0x80 or 0x00, or if more than one boot indicator indicates an active partition (0x80), the system stops and displays “Invalid partition table”.

    c.       If an active partition is found, that partition’s Boot Sector is loaded and tested.

     

    6.       The MBR loads the active partition’s Boot Sector and tests it for 55AA.

    a.       If the Boot Sector cannot be read after five retries, the system halts and displays “Error loading operating system”.

    b.      If the Boot Sector can be read but is missing its 55AA marker, “Missing operating system” is displayed and the system halts.

    c.       The bootstrap area is made up of a total of 16 sectors (0-15).  If sector 0 for the bootstrap code is valid, but there is corruption in sectors 1-15, you may get a black screen with no error.  In this case, it may be possible to transplant sectors 1-15 (not Sector 0) from another NTFS partition using DskProbe.

    d.      If the Boot Sector is loaded and it passes the 55AA test, the MBR passes control over to the active partition’s Boot Sector.

     

    7.       The active partition’s Boot Sector begins executing and looks for NTLDR.  The contents of the Boot Sector are entirely dependent on the format of the partition.   For example, if the boot partition is a FAT partition, Windows writes code to the Boot Sector that understands the FAT file system.  However, if the partition is NTFS, Windows writes NTFS-capable code.  The role of the Boot Sector code is to give Windows just enough information about the structure and format of a logical drive to enable it to read the NTLDR file from the root directory.   The errors at this point of the boot process are file system specific.  The following are possible errors with FAT and NTFS Boot Sectors.

    a.       FAT:  In all cases it displays “press any key to restart” after either of the following errors.

    (1)    If NTLDR is not found “NTLDR is missing” is displayed

    (2)    If NTLDR is on a bad sector, “Disk Error” displays.

    b.      NTFS:  In all cases it displays “Press CTRL+ALT+DEL to restart” after any of the following errors.

    (1)    If NTLDR is on a bad sector, “A disk read error occurred” is displayed.  This message may also be displayed on Windows 2000 or higher systems if Extended Int13 calls are required, but have been disabled in the CMOS or SCSI BIOS.  This behavior may also be seen if an FRS (File Record Segment) cannot be loaded or if any NTFS Metadata information is corrupt.

    (2)    If NTLDR is not found, “NTLDR is missing” displays.

    (3)    If NTLDR is compressed, “NTLDR is compressed” displays.

     

    8.       Once NTLDR is found, it is loaded into memory and executed.  At this point the Boot Loader phase begins.

     

    Troubleshooting the Initial Phase (Initial Disk Access)

    If you are unable to boot Windows and the failure is occurring very early in the boot process, begin by creating a Windows boot floppy and using it to attempt to boot the system.

     

                    Q119467 How to Create a Bootable Disk for an NTFS or FAT Partition

                    http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q119467

     

    If you are unable to boot to Windows NT with a floppy, skip to the section titled “Unable to boot using floppy”.

     

    If you can successfully start the computer from the boot disk, the problem is limited to the Master Boot Record, the Boot Sector, or one or more of the following files:  NTLDR, NTDetect.com, or Boot.ini.  After Windows is running, you should immediately back up all data before you attempt to fix the Boot Sector or Master Boot Record.

     

    1.       Run a current virus scanning program to verify that no virus is present.

    2.       Select the method of troubleshooting based upon the error generated.

    ·         Operating System not found

    Typical BIOS-specific error indicating no 55AA signature on the MBR.  This must be repaired manually using a disk editor such as DskProbe.

    Warning:  Running FixMBR or FDISK /MBR in this case will clear the partition table.

    ·         Invalid Partition Table

    May be repaired using DiskProbe – check for multiple 0x80 boot indicators.

    ·         Error Loading Operating System

    This error indicates an invalid Boot Sector.  Use the “Repairing the Boot Sector” section below.

    ·         Missing Operating System

    This error indicates that the 55AA signature is missing from the Boot Sector.  This may be replaced using DskProbe or by using the “Repairing the Boot Sector” section below.

    ·         NTLDR is missing

    Use the “Repairing the NTLDR file” section below.

    ·         Disk error (FAT only)

    Indicates that NTLDR is located on a bad sector.

    ·         A disk read error occurred (NTFS only)

    Indicates that NTLDR is on a bad sector.  Use the “Repairing the NTLDR file” section below.

    ·         NTLDR is compressed

    Indicates that NTLDR is compressed.  Uncompress or replace NTLDR.  Note:  This error is rare.

    ·         Computer boots to a black screen with blinking cursor

    Could be an issue with the MBR or the active partition’s boot code (Q228734).  Use the appropriate section below.

     

    Repairing the MBR

    Use the FIXMBR command from the Windows Recovery Console to re-write the boot code located in the first 446 bytes of the MBR, but to leave the partition table intact.

     

    326215  How To Use the Recovery Console on a Windows Server 2003-Based Computer That Does Not Start

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;326215

     

    WARNING:  DO NOT use this method if you are receiving “Operating System not found” or other BIOS-specific error.  FIXMBR will zero out the partition table if the 55AA signature is missing from the MBR.  The data will be unrecoverable.

     

    Repairing the Boot Sector

    Use the FIXBOOT command from the Windows Recovery Console to write a new Boot Sector to the system partition.

     

    326215  How To Use the Recovery Console on a Windows Server 2003-Based Computer That Does Not Start

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;326215

     

    If you are not able to repair the Boot Sector by using this method, you may repair it manually using the following knowledge base article:

     

    Q153973 Recovering NTFS Boot Sector on NTFS Partitions

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q153973

     

    Repairing the NTLDR file

    Use the Recovery Console to copy NTLDR from \i386 directory of the Windows CD-ROM to the root of the hard drive.

     

    Unable to boot using floppy

    If you are unable to start the computer with a boot floppy and you are not receiving an error message, then the problem is most likely with your partition tables.  If invalid partitions are present and you are unable to start your computer with a boot disk, formatting the disk, reinstalling Windows and restoring from backup may be your only option.  It may be possible to recreate the partition table using DskProbe if a backup of the partition information is available.  It also may be possible to manually reconstruct the partition table however this is outside the scope of this blog post.  We may cover this later.

     

    If you do not have a current backup of the data on the computer, as a last resort a data recovery service may be able to recover some of your information. 

     

    Helpful knowledge base articles:

                    Q272395 Error Message: Boot Record Signature AA55 Not Found

                    http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q272395

     

                    Q155892 Windows NT Boot Problem: Kernel File Is Missing From the Disk

                    http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q155892

     

                    Q228004 Changing Active Partition Can Make Your System Unbootable

                    http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q228004

     

                    Q155053 Black Screen on Boot

                    http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q155053

     

                    Q314503 Computer Hangs with a Black Screen When You Start Windows

                    http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q314503

     

                    Q153973 Recovering NTFS Boot Sector on NTFS Partitions

                    http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q153973

     

    Okay boys & girls, that’s it for now.  Next time, I’ll continue with the boot sequence and cover the Boot Loader phase.  This is where things really get interesting… J

     

  • Ntdebugging Blog

    Server Hangs with Event ID: 2021 and 2022

    • 3 Comments

    Hi again!  This is Tate from the CPR team and I’m going to show you how to debug a Server Service hang and the sometimes dreaded Event ID:  2021 and Event ID:  2022.  There is much Voodoo about troubleshooting these two events but never fear, it’s possible to debug quickly given the right approach.

     

     

    Recall, I’ve talked about two other common Server Service events prior, 2019 and 2020.  (Event ID: 20xx usually denotes a srv.sys series event) Also, from now on I’m going to refer to the Server Service as srv. It’s in the driver srv.sys that the bulk of the work is happening and as such that’s just what we call it.

     

    Anyway, if you have seen these events before you likely have queried the web with the ID or description and have seen the big KB317249 article.

    This is an oldie but goodie, but the problem is that, well it’s big and it would help to have perspective on how srv is supposed to work because all the root causes for the event seem quite varied…but often there is only one of two things happening here.

     

    1.)  Blocking in the I/O stack preventing normal amounts of SMB requests from getting processed in a timely fashion. (this is the most common that we see, the SMB pump failure case)

     

    2.)  Abnormally large amount of requests directed at srv(this is usually a really chatty SMB application or usually accumulation of ChangeNotify requests).

     

    I’ll show you how to determine which of these two are happening but first let’s start with the event text as a refresher…

     

    Event ID: 2022
    Source: Srv
    Description: Server was unable to find a free connection n times in the last s seconds.

    Event ID: 2021
    Source: Srv
    Description: Server was unable to create a work item %2 times in the last %3 seconds.

     

     

    What does this mean?

     

    We really have to start with an explanation of Connections and Work Items.  So, internally to srv these are just structures used to represent a connection into srv and an item of work…couldn’t be that straightforward right?  It is, when a request comes into srv it has to use an Endpoint, Connection, a Work Item, and other structures to know about where the request came from, what the work is, and how to get the response back to what client.

     

    At its core, srv is just a Kernel Mode Driver that does SMB processing.  It takes in network SMB/CIFS requests, converts them into local I/O, and pumps the return back out to the client.  But the problem here is why would you be “unable to find” or “unable to create” one of these or what breaks this pump/processing?

     

     

    But first, let’s comment on storage.  Kernel memory is not free and as mentioned in prior blogs these structures take up space in the kernel pools and or may have preconfigured srv nominal limits (see MaxWorkItems in KB317249, etc.) if you hit either of these limits the event will be logged.  So as we learned prior running out of Paged or NonPaged pool can hang a machine no problem but can also log these events should Srv not be able to allocate memory on behalf of client requests.  (See the Event ID:  2019 2020 blog for that info and troubleshooting).  What usually happens here though are these WorkItems build up due to blocking or high load and cause the Event ID:  2019 or 2020…(with the top consumer usually LSwn – LanmanServer WorkItems) and then you will see these along with the 2021/2022s.  An event party if you will with the 2021/2022 issue in srv actually exhausting kernel memory pools.

     

    Could you bump up the limits of everything in \LanmanServer via KB317249 not read further and be okay?

     

    Maybe, but it depends how much blocking you may have or how high the load gets before the point of failure.  If you are logging this even after the server has been working fine under your normal ebb and flow of load in the past, it may be likely that this article will not completely fix your problem so further work is needed.  Keep in mind too that if you bump up the limits in the registry per the above explanation you could actually hit the pool limits sooner!  The values in KB317249 are pretty conservative so they shouldn’t hurt but again it might not completely help.  Also, if you don’t like changing things until you understand more…read on!

     

     

     

    REASON #1 for Event ID:  2021 and 2022

     

    The pump mentioned above is really the Srv!WorkerThread and KQUEUE architecture.  When requests come in they are put in KQUEUEs via KeInsertQueue and as you might imagine, the processing/draining of those queues is done by the Srv!WorkerThread(s) via a call to KeRemoveQueue.  These threads are the heart of Srv and are the key to determining what is usually happening to break the SMB processing here, so we usually look there for the answer.

     

    To check in on srv you gather a kernel memory.dmp via KB244139 during the problem or with livekd (Sysinternals) perhaps…but the best way is KB244139 or by attaching the kernel debugger.  The best time to gather this dump is when the server is somewhat unresponsive via the network (SMB) or you are constantly logging the 2021/2022 events…or if the machine is completely hung after hitting these Events.

     

    With memory.dmp in hand, just load it up and do the following to get the system process’ EPROCESS address via !process 0 0 system

     

    0: kd> !process 0 0 system

    PROCESS 89067520  SessionId: 0  Cid: 0008    Peb: 00000000  ParentCid: 0000

        DirBase: 00030000  ObjectTable: 89067008  TableSize: 490.

        Image: System

     

    Now change process context using that address with .process /r /p <system EPROCESS address>

     

    0: kd> .process /r /p 89067520

    Implicit process is now 89067520

    Loading User Symbols

     

    Now let’s dump out all the system threads using !process <system EPROCESS address>

     

    0: kd> !process 89067520

    PROCESS 89067520  SessionId: 0  Cid: 0008    Peb: 00000000  ParentCid: 0000

        DirBase: 00030000  ObjectTable: 89067008  TableSize: 490.

        Image: System

    .

    .

    Dumping out the threads…

    .

    .

     

    Once the output of threads is finished, I like to search up from the bottom for “Start Address srv!WorkerThread” looking at the call stacks…

     

    Note:  There’s a shortcut here for Windows 2003 to just do a “!stacks 2 Srv!WorkerThread” right away.  This output will give you a really quick look at the threads without having to .process into system, etc.

     

    Here’s an example of a “good thread” just waiting for work to come in…sitting in KeRemoveQueue

     

     THREAD 887c14e0  Cid 8.474  Teb: 00000000  Win32Thread: 00000000 WAIT: (WrEventPairLow) KernelMode Non-Alertable

                be9fd324  Unknown

            Not impersonating

            Owning Process 89067520

            Wait Start TickCount    160832116     Elapsed Ticks: 203335

            Context Switch Count    134294              

            UserTime                  0:00:00.0000

            KernelTime                0:00:09.0343

            Start Address srv!WorkerThread (0xbea06880)

            Stack Init beb11000 Current beb10d3c Base beb11000 Limit beb0e000 Call 0

            Priority 10 BasePriority 9 PriorityDecrement 0 DecrementCount 0

     

            ChildEBP RetAddr  Args to Child

            beb10d54 8042ef5f 88a10eb8 be9fd320 00000001 nt!KiSwapThread+0x1b1

            beb10d78 bea068ec 00000000 00000000 00000000 nt!KeRemoveQueue+0x197

            beb10da8 804578c2 be9fd300 00000000 00000000 srv!WorkerThread+0x6c

            beb10ddc 8046c966 bea06880 be9fd320 00000000 nt!PspSystemThreadStartup+0x54

            00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

     

     

    What does a “bad thread” look like?

    Here is a blocked Srv!WorkerThread that is trying to process a FindFirst SMB request…

     

    THREAD 887dd4e0  Cid 8.45c  Teb: 00000000  Win32Thread: 00000000 WAIT: (UserRequest) KernelMode Non-Alertable

                88255db8  SynchronizationEvent

            IRP List:

                86941288: (0006,01fc) Flags: 00000884  Mdl: 00000000

            Impersonation token:  e5d56990 (Level Impersonation)

            Owning Process 89067520

            Wait Start TickCount    160783273     Elapsed Ticks: 252178

    NOTE the Tick count here on this thread….it’s been waiting a while, not good.

            Context Switch Count    12993              

            UserTime                  0:00:00.0000

            KernelTime                0:00:00.0843

            Start Address srv!WorkerThread (0xbea06880)

            Stack Init bec29000 Current bec284f4 Base bec29000 Limit bec26000 Call 0

            Priority 10 BasePriority 9 PriorityDecrement 0 DecrementCount 0

     

            ChildEBP RetAddr  Args to Child

            bec2850c 8042d893 00000000 882549c8 00000000 nt!KiSwapThread+0x1b1

            bec28534 f5ae3408 88255db8 00000006 00000000 nt!KeWaitForSingleObject+0x1a3

            bec285bc f5ae4066 bec285e4 00000000 00000000 SomeFiltr+0x3408<<<<<<<making a blocking call here…

            bec285fc f5ae19fb 884ce3a0 01941288 8041eecb SomeFiltr+0x4066

            bec287a4 804c4994 884ce3a0 00000000 bec28894 SomeFiltr+0x19fb

            bec287dc 804531c2 884766c8 00000000 bec28894 nt!IopParseFile+0x44

            bec28854 804da4d8 0000051c bec28950 00000040 nt!ObpLookupObjectName+0xf8

            bec28964 804a4495 00000000 00000000 00000000 nt!ObOpenObjectByName+0xc8

            bec28a40 804a403a 8690d008 00000001 bec28b4c nt!IopCreateFile+0x407

            bec28a88 bea0282c 8690d008 00000001 bec28b4c nt!IoCreateFile+0x36

            bec28af8 bea050f6 88a0d2f8 8690d008 00000001 srv!SrvIoCreateFile+0x32a

            bec28b8c bea0acdb 88a0d2f8 00000001 8046b000 srv!SrvQueryDirectoryFile+0x1c8

            bec28c5c bea0936d 88a0d2f8 00000001 00000000 srv!SrvFind2Loop+0x171

            bec28cd0 bea07d2f 88a0d201 bea0a9c8 88a0d2f8 srv!DoFindFirst2+0x373

            bec28cd8 bea0a9c8 88a0d2f8 88a0d2f8 804636f0 srv!SrvSmbFindFirst2+0x31

            bec28cf0 bea089b7 88994f38 88a0d2f8 00000000 srv!ExecuteTransaction+0xdc

            bec28d68 be9f27ae 88a0d2f8 be9f2cde 88a0d300 srv!SrvSmbTransaction+0x5af

            bec28d70 be9f2cde 88a0d300 88a144a0 bec28da8 srv!SrvProcessSmb+0x46

            bec28d84 bea069a9 00000000 00000000 00000000 srv!SrvRestartReceive+0xa2

            bec28da8 804578c2 88a14400 00000000 00000000 srv!WorkerThread+0x129

            bec28ddc 8046c966 bea06880 88a144a0 00000000 nt!PspSystemThreadStartup+0x54

            00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

     

    Note:  The tick count (Elapsed Ticks) was not bad for the prior “good thread” because we’ve just been waiting in the call to KeRemoveQueue for that long for work to arrive.  No work to do in that queue, no problem.  However, in this “bad thread “case we’ve grabbed a WorkItem to process and been waiting for a long time (about 1hr 5 min.) stuck while processing it, yikes.

     

     

    Why is this bad?

     

    So if the Srv!WorkerThread is the worker thread that drains the WorkItems from this list of request in the KQUEUEs and number of these threads are finite(they are), then we have a problem!  As more requests come into srv they will continue to get placed in the KQUEUEs by different event handler threads at the transport layer but with no WorkerThread available to process the queue on the back end (and considering these Connections and WorkItems are stored in Pool and the number is finite as well) this is how and usually why the pump stops and we log these events!  So not only might we run out of the configured limits in the registry (KB317249) for srv, we might run out of Non Paged Pool, and we are definitely not giving clients a timely response if at all in this state.

     

    What if I see “good” and “bad” threads in my !process of the system process?

    This is likely because of the two queues that we have in service.  There are blocking and a non-blocking queues so you may see the threads that are waiting in KeRemoveQueue are actually for the blocking queue whereas your other bad threads are blocked on things like CreateFile, Find, etc. requests…supposed to be shorter time requests, appropriately so, on the the non-blocking work queues.  In general, if you see any Srv!WorkerThread blocked in anything but KeRemoveQueue for very long, it’s bad.

     

     

    So what is the solution?

     

    Referencing the above “bad thread” for SomeFiltr.sys note that the filter driver in question has chosen to make a blocking call to KeWaitForSingleObject.  This is blocking a FindFirst request from a client but more importantly it’s consuming an Srv!WorkerThread in the process as stated….so it’s actually the filter driver making the decision to stop the processing so the fault lies with it in this case.

     

    Now it’s worth mentioning that this is a fairly simple 1st tier blocking example but suffice to say any blocking call taking too long in a srv!WorkerThread can have bad consequences.  This could have easily been any filter driver or device driver in the I/O stack (between the network and srv and the disk) which is blocking doing some work to “help” process the I/O or more difficult to debug the filter or function driver could be waiting on some other queuing mechanism that it implements…so you see how srv can encounter a dependency failure here.

    Another trick that we use is looking at all other thread stacks containing the name of the filter driver to see if it’s doing something recognizable like waiting on another Kernel api, filter, etc…in other words guessing through the black box of it’s decision to wait on this thread and making an educated guess of what work it is trying to do on another…A common case is when the blocked thread may be waiting on a response from an LPC to LSASS which is performing some security check. Then it’s necessary to jump …..

     

    In summary, since we can’t see further into the blocking here through SomeFiltr.sys and there’s no other threads in process with this driver in the call stack we’d recommend testing updating, removing, or contact the vendor of the SomeFiltr.sys filter to see if this is fixed, etc.  You can do an “lmvm SomeFiltr”  to get the time/date information for the file.

     

     

    So, what if all the threads are just waiting like in the good thread example?

     

    This could be the unfortunate side of gathering a single memory.dmp to represent a problem that may be transient.  If a fraction of a second before you hit the Ctrl+Scroll+Scroll to issue the dump the Srv!WorkerThread unblocks, it may look completely normal.  What you can do for this class of issue that has temporary blocking is either try again to get it in state, or have a look at the device stack that srv has to work through to get things done(see devstack below).  Usually you will find out what filter drivers you are running, etc. and since that is the #1 reason for this event updating them is usually the solution because we or the vendor have likely already debugged this for you and with the filter vendor’s help have likely fixed it in a later version.

     

    This is why usually the first thing we target is filter drivers before debugging anything for hangs or if a dump is not available.  By frequency this is far and away the usual root cause for hangs of many many (did I say many?) symptom types.

     

    How do I check the members of device stack that Srv.sys has to work through to get I/O requests processed, filter drivers I need to update, etc?

     

    This is cool…so did you notice that there was an Irp listed in the prior “bad thread”?

     

    Dump it out…

     

    0: kd> !irp 86941288

    Irp is active with 11 stacks 11 is current (= 0x86941460)

     No Mdl Thread 887dd4e0:  Irp stack trace. 

         cmd  flg cl Device   File     Completion-Context

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                                                    Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                                                    Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                                                    Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                                                    Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                                                    Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                                                    Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                                                    Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                                                    Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                                                    Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                                                    Args: 00000000 00000000 00000000 00000000

    >[  0, 0]   1  0 884ce3a0 878977a8 00000000-00000000   

                           \FileSystem\SomeFitr

                                                    Args: bec28778 01000000 00070000 00000000

     

    Now grab the Device Object for SomeFiltr…

     

    0: kd> !devstack 884ce3a0

      !DevObj   !DrvObj            !DevExt   ObjectName

    > 884ce3a0  \FileSystem\SomeFiltr884ce458 

      88ba5c80  \Driver\InnocentDriver       88ba5d38 

      88ba5460  \FileSystem\Ntfs   88ba5518 

     

    Now you can see how this thing works….ntfs talks to InnocentDriver and it to SomeFiltr, etc….all these guys have to work flawlessly to not kill the srv request dependent upon this device stack.

     

    What if you didn’t have an Irp, no problem…everthing hangs off Object Manager so start at Ntfs’s device objects…This is cooler…

     

     

    0: kd> !drvobj \FileSystem\Ntfs  <<<<note that’s !DRVobj for the Driver Object

    Driver object (88dd2710) is for:

     \FileSystem\Ntfs

    Driver Extension List: (id , addr)

     

    Device Object list:

    882b4120  8717a620  88b91800  88b981e0

    88b9e600  88ba5460  88f583e0  88dd25f0

     

    0: kd> !devstack 882b4120

      !DevObj   !DrvObj            !DevExt   ObjectName

      86576020  \FileSystem\SomeFiltr865760d8  <<<<<<there you are Mr. SomeFiltr

    > 882b4120  \FileSystem\Ntfs   882b41d8 

     

    0: kd> !devstack 8717a620

      !DevObj   !DrvObj            !DevExt   ObjectName

      8792ae00  \FileSystem\SomeFiltr8792aeb8  <<<<<<and here

    > 8717a620  \FileSystem\Ntfs   8717a6d8 

    .

    .etc…

    .

    0: kd> !devstack 88dd25f0

      !DevObj   !DrvObj            !DevExt   ObjectName

      886c9020  \FileSystem\SomeFiltr886c90d8  <<<<<<<and here

      88f74f00  \FileSystem\DfsDriver88f74fb8 

      88f74020  \Driver\InnocentDriver       88f740d8 

    > 88dd25f0  \FileSystem\Ntfs   00000000  Ntfs

     

    The prudent thing to do if you aren’t sure of root cause by debugging the Srv!WorkerThreads you have captured in a memory.dmp is at least make sure all the filter drivers in these stacks are updated and perhaps even your storage drivers that are being used under ntfs serving your file share(s).

     

     

    REASON #2 for Event ID:  2021 and 2022

     

    What we have just discussed was the most common reason for hitting the Event ID:  2021 and 2022 event in blocking by a filter or function driver on the I/O stack and how to identify.  There are of course other reasons like bursts of demand and a lower than the maximum configured Srv (the nominal limits mentioned above and discussed in KB317249) or just simply a lot of long term requests that are on the system such as ChangeNotify requests.  How would you find out about these?  Analyze the network traffic.

     

    There are tons of network sniffers which allow you to gauge what type of SMB traffic is going in between servers so I won’t get into that here.  I will only say that if you have the ability to generate Statistics of top SMB commands, that’s the way to go.  You could however just filter on SMB and ballpark what kind of request and from which clients are occurring most and given your client configuration if that’s normal.  What we see here usually is a misbehaving network application in a loop, etc. hitting the file server with repeated requests for the same file, etc.  This is also a great strategy for high system process CPU issues where when you look at the thread call stacks generating the CPU via Process Explorer (sysinternals) you find the base of the activity in Srv!WorkerThreads coming and going rapidly.

     

    I will mention however there are a few known ways to reduce the cumulative load of ChangeNotifications and QueryFile and QueryPathInfo requests that Windows Clients actually generate.  You can check out the following KBs for these configuration changes on the client side:

     

    For the client SMB side’s shell:  NoRemoteChangeNotify.

    http://support.microsoft.com/kb/812669/en-us

    “An update is available in Windows to turn off directory notification SMB requests”

     

     

    For the client SMB side (redirector):  InfoCacheLevel

    (also has shell change NoRemoteRecursiveEvents )

    http://support.microsoft.com/kb/816375/en-us

    “Windows XP Explorer Pane flickers on mapped network drives”

     

     

     

    I hope you’ve enjoyed this post and hopefully the next time you see these srv events you can nail the root cause with windbg no problem!

     

    Keep that device stack updated and until next time…

Page 23 of 24 (239 items) «2021222324