• Ntdebugging Blog

    Using Netmon to figure out the source of high CPU in WMIprvse.exe

    • 5 Comments

    Recently I was working with a customer who had a file server experiencing high CPU in the WMIprvse.exe process. We received multiple user dumps and noted that someone or something was running the same query again and again. We needed to figure out what was running the query in a tight loop, causing the high CPU.

    image

    Figure 1 - Task Manager on FileServer

    Before we get into the exact troubleshooting steps, let me provide some background on WMI.  Winmgmt is the WMI service within a SVCHOST process running under the LocalSystem account.  In all cases, the WMI service automatically starts when the first management application or script requests connection to a WMI namespace. For more information, see Starting and Stopping the WMI Service. To avoid stopping all the services when a provider fails, each provider is loaded into a separate host process named "Wmiprvse.exe". This also allows each instance of Wmiprvse to run under a different account with varying security. For more details you can look at the MSDN documentation on WMI.

     

    I dumped out all the services in the various svchost.exe processes. You can do this from a command prompt by running the tasklist /svc command. In my instance, I found that the WinMgmt service was running in svchost, PID  452 (PID number will vary). Someone had to be making RPC calls to this svchost.exe process to run the WMI queries. It could be some local process on the machine; it could even be a process on a remote machine.

     

    At this point I requested user dumps of PID 452 from the customer.  This would allow me to determine who was making the RPC calls to svchost.exe to run the WMI queries.  While the customer was uploading the dumps, we decided to get a Network Monitor trace to see if the RPC calls were coming over the network.

     

    Immediately, I could see a lot of RPC traffic to the svchost.exe process(PID=452).

     

    image

    Figure 2 - Network Monitor Output from the FileServer. Notice the Source and destination ports and IP addresses. IP addresses are hidden by the aliases

    Looking at the RPC payload, I could see the text of the WMI query. You can see this in the Hex Details Pane. The query that was running in a loop was Select * from Win32_Process. Looks like I found the source of the WMI queries.

     

    At this point, we got the source IP for the RPC packets. We logged into the machine, and brought up the Task Manager.

     

    image

    Figure 3 - Task Manager on Remote Machine(Machine1)

    Immediately we saw that there was some script running inside a Wscript.exe process.  At this point I was pretty sure that this script was the culprit. The customer was not sure what this was, and was not comfortable terminating the process.  To prove my suspicion, I had him open a command prompt and run the following command, netstat –ano.

     

    image

    Figure 4 - Netstat output from Remote Machine

     

    From the output in Fig. 4, I could see a TCP connection created by PID 3532 (wscript.exe). Looking at the local and foreign addresses from the above output, they matched up exactly to what we were seeing in the Network Monitor trace.

     

    In the above case, we already had our suspicions on the wscript.exe process; however, sometimes it might not be that easy. In that case, we could have used the netstat output to look at all connections to the file server (157.59.123.121). If there were multiple connections, then we can also narrow it down by the port number. Based on that, we could have found the PID responsible.

     

    The customer called me later in the day, and told me that they had recently updated their scripts. One of their scripts had a bug which was running WMI scripts in a tight loop. Fixing the script caused the problem to go away.

     

    Had the query being coming from a local process, I would have had to debug the svchost.exe process, and figure out who was making the WMI calls. However, since we could see the traffic on netmon, we didn’t need to use the debugger. Interesting way to get to the root of the problem without using a debugger!

     

     

    Share this post :
  • Ntdebugging Blog

    XPERF Videos and other Blog Feedback

    • 1 Comments

    We appreciate the comments and feedback sent about the blog. You guys are great! We’ve had comments requesting to make the XPERF videos downloadable. I’m working to move them to a hosting site and will post the link here soon. 

    Are there any content topics you would like to see discussed on NTDebugging, please drop us a note and we’ll try to address it in a future blog.

    Thanks for reading the blog!

    Cheers,

    Ron Stock

  • Ntdebugging Blog

    Gone with the Exception...SX* to the Rescue!

    • 1 Comments

     

    Hi, my name is Anurag Sarin, and I'm an Escalation Engineer from the Platforms Global Escalation Services 24x7 Team, located in Bangalore, India.

     

    I recently worked a case dealing with a component that reported an exception. But it was not critical enough to crash the system and it was simply logged in a file. Because the root cause of the problem was not yet known, we did not have a deterministic way to reproduce the exception. The exception would just come and go. So catching the exception became a challenge. Do we take a dump? How do we break into the debugger to catch the exception when it occurs? Manual dumps wouldn't help because we didn't know when the exception would occur, and even if we did, it happened too fast to use Ctrl + Scroll Lock  + Scroll Lock.

     

    A similar challenge is when you want to start debugging just after a module loads without knowing when the module load will happen. You can run into this while reverse engineering code or while live debugging. You may want to save all debugging efforts until a particular module gets loaded. Sometimes you just want to know if a module ever gets loaded in a particular code path.

     

    In these situations, the ‘sx*’ command comes to our rescue. Below you will find a couple of nuggets to assist with these types of scenarios.

    Break on exception

     

    Let's say Windows Media Player reports “RPC Server is unavailable” when a user clicks "File\Open" and attempts to browse to a remote computer. In this situation the ‘sxe’ command is your friend. This command ‘sets an exception’ from the debugger. When this is set the target breaks into the debugger before any other error handlers are activated

     

    By the way, RPC_S_SERVER_UNAVAILABLE is win32 error 1722 (0x06ba) with the description “RPC Server is unavailable” and is a common networking issue.

     

    This is how WinError.h in Windows SDK documents it

     

    //

    // MessageId: RPC_S_SERVER_UNAVAILABLE

    //

    // MessageText:

    //

    // The RPC server is unavailable.

    //

    #define RPC_S_SERVER_UNAVAILABLE         1722L

     

    Here are the steps to catch the “RPC Server is unavailable” exception. 

     

    1. Attach the debugger to the process with the RPC_S_SERVER_UNAVAILABLE error.  In our example we’ll attach to wmplayer.exe.

     

    2. Break into debugger to set or verify the symbol path.

     

    3. Reload the symbols.

     

    .reload

     

    4. Clear any previously set exceptions.

     

    sdx  *

     

    5. Set the exception on 1722 (0x06ba) i.e. RPC_S_SERVER_UNAVAILABLE “RPC Server is unavailable”.

     

    sxe 06ba

     

    6. Let the process run until we hit the exception. You may hit a different exception before the one you’re looking for in which case you hit 'g' to continue execution.

     

    g

     

    7. This is the output from the debugger on a Vista RTM machine.

     

    (b2c.f28): Unknown exception - code 000006ba (first chance) 

    First chance exceptions are reported before any exception handling.

    This exception may be expected and handled.

    eax=0601e47c ebx=00000000 ecx=00000000 edx=049cd488 esi=000006ba edi=0601e5a4

    eip=76dbb09e esp=0601e47c ebp=0601e4cc iopl=0         nv up ei pl zr na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

    kernel32!RaiseException+0x58:

    76dbb09e c9              leave

    0:036> kv100

    ChildEBP RetAddr  Args to Child             

    0601e4cc 76ea0911 000006ba 00000001 00000000 kernel32!RaiseException+0x58 (FPO: [4,20,0])

    0601e4e4 76ea08e9 000006ba 0601e9dc 76f2e07d RPCRT4!RpcpRaiseException+0x1e (FPO: [0,0,0])

    0601e4f0 76f2e07d 000006ba f61cdc85 05545938 RPCRT4!RpcRaiseException+0x16

    0601e9dc 760a9e12 760a8908 760a95ea 0601e9fc RPCRT4!NdrClientCall2+0xd51                       

    0601e9f4 760a9e5b 0553fee0 00000064 0601ea54 NETAPI32!NetrWkstaGetInfo+0x19 (FPO: [3,1,0])

    0601ea40 75106f02 0553fee0 00000064 0601ea54 NETAPI32!NetWkstaGetInfo+0x3b (FPO: [SEH])      ßNetWkstaGetInfo called for request on information on workstation.

    0601ea5c 75106f86 0553fee0 0601ea84 0601eb5c ntlanman!ServerExists+0x19 (FPO: [2,2,0])

    0601ea98 75e24ede 0601ec00 05545938 0601ec44 ntlanman!NPDfsLmGetResourceInformation+0x177 (FPO: [4,8,4])

    0601eab0 75e23bb9 055939f0 caa0f4a6 00000000 MPR!CGetResourceInformation::TestProvider+0x1a (FPO: [1,0,0])

    0601eb08 75e23ac7 caa0f4ea 0601ec44 0601eb5c MPR!CRoutedOperation::GetResult+0xf6 (FPO: [SEH])

    0601eb44 75e23a79 0601ec54 0601eb90 75e24f0d MPR!CMprOperation::Perform+0x65 (FPO: [SEH])

    0601eb50 75e24f0d 00000001 75e24f5c 000000a4 MPR!CRoutedOperation::Perform+0x22 (FPO: [1,0,4])

    0601eb90 76f5978a 0601ec00 05545938 0601ec44 MPR!WNetGetResourceInformationW+0x26 (FPO: [4,13,0])

    0601ebb4 76f597f5 0601ec00 00000000 00000000 SHELL32!SHWNetGetResourceInformationAlloc+0x40 (FPO: [4,0,4])

    0601ebd8 76f63d75 0601ec00 0601ec54 0601ec44 SHELL32!CNetFolder::_GetResourceInformationAlloc+0xf4 (FPO: [5,0,4])

    0601f278 76f63bc3 001d019a 05560288 0553db08 SHELL32!CNetFolder::_ParseUNCName+0x1a7 (FPO: [7,414,4])

    0601f4c4 76fe2c3e 0553db50 001d019a 05560288 SHELL32!CNetFolder::ParseDisplayName+0x123 (FPO: [7,135,4])

    0601f52c 6de11f8c 05547c94 001d019a 05560288 SHELL32!CRegFolder::ParseDisplayName+0x90 (FPO: [7,14,4])

    0601f5b0 6de11e0d 001d019a 05560288 0553db08 NetworkExplorer!CNetworkExplorerFolder::_IsValidServer+0x9f (FPO: [5,21,4])

    0601f5e0 76fe58ab 0553da98 001d019a 05560288 NetworkExplorer!CNetworkExplorerFolder::ParseDisplayName+0x68 (FPO: [7,2,4])

    0601f608 76fe2d13 0553da98 049d0008 001d019a SHELL32!CDesktopFolder::_ChildParseDisplayName+0x22 (FPO: [8,0,4])

    0601f654 76fe2c3e 0553da98 001d019a 049d0008 SHELL32!CDesktopFolder::ParseDisplayName+0x8c (FPO: [7,6,4])

    0601f6bc 76fe2b91 0559544c 001d019a 05560288 SHELL32!CRegFolder::ParseDisplayName+0x90 (FPO: [7,14,4])

    0601f6f4 76c8c0f3 00000000 30c10000 0601f788 SHELL32!SHParseDisplayName+0x98 (FPO: [5,2,4])

    0601f734 76c8c03a 0601f994 0601f788 0601f780 COMDLG32!CAsyncParser::_ParseFromDesktop+0x73 (FPO: [3,7,0])

    0601fba0 76c808d8 0556b850 00000000 00000000 COMDLG32!CAsyncParser::_ParseOneItem+0x2e1 (FPO: [2,275,4])

    0601fbbc 76c80bc6 0601fc40 77c1b132 0555b088 COMDLG32!CAsyncParser::_Parse+0x4a (FPO: [0,1,0])

    0601fbc4 77c1b132 0555b088 00000000 00000000 COMDLG32!CAsyncParser::s_ThreadProc+0xd (FPO: [1,0,0])

    0601fc40 76de3833 0023c72c 0601fc8c 77b1a9bd SHLWAPI!WrapperThreadProc+0x10f (FPO: [1,25,4])

    0601fc4c 77b1a9bd 0023c72c 0601b824 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [1,0,0])

    0601fc8c 00000000 77c1b09a 0023c72c 00000000 ntdll!_RtlUserThreadStart+0x23 (FPO: [SEH])

     

     

    8. After you break into the stack for the 6ba exception, you might want to get a dump using the following command.

     

    0:000> .dump /ma c:\dump1.dmp

    Creating c:\dump2.dmp - mini user dump used

    Dump successfully written

     

    9. Now that we have a dump in the exception code path, it might be a good idea to tell the debugger to stop breaking for this exception.  

     

    sxd 06ba

     

     

    Now the debugger does not break for this exception.

    Note: To detach the debugger from the process do a ‘.detach’. If you just close the debugger it will exit the process too.

     

    Looking at the top of the stack, we see that wmplayer sent a request for a share access and failed due to (what looks like a network problem). I say this as we see that MRP  invokes ntlanman (Windows Redirector for Network Shares) which further request information on workstation.name ( NetWkstaGetInfo ) on the network (a Remote RPC call RPCRT4!NdrClientCall2). The next step should be to take a network trace and investigate if there are any networking problems on the network.

     

    And it would make root cause determination much easier if you knew the user was attempting to connect to a nonexistent machine name (\\<Bad Machine Name>) while doing the File\Open in Windows Media Player. J

    Break on Module load

     

    Now let’s talk about a scenario where we want to start debugging when a particular DLL gets loaded. Let’s take the example of mpr.dll loading using the scenario above.

     

    1.       Follow steps 1 through 4 mentioned above, to attach to wmplayer.exe

     

    2.        In this case we set an exception on ‘module load of MRP.DLL‘.

     

    sxe ld mpr.dll

     

    3. Now let the process run. It will break back into the debugger when the DLL is loaded.

     

     g

     

    This is the debugger output on a Windows 2008 machine when MPR.DLL gets loaded into the wmplayer process.

     

    0:001> g

    ModLoad: 753d0000 753e4000   C:\Windows\system32\MPR.dll

    eax=75a3b111 ebx=00000000 ecx=00000000 edx=75a7275c esi=7ffad000 edi=20000000

    eip=771a9a94 esp=0291df74 ebp=0291dfb8 iopl=0         nv up ei pl zr na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

    ntdll!KiFastSystemCallRet:

    771a9a94 c3              ret

    0:014> kvn ffff

     #   Memory  ChildEBP RetAddr  Args to Child             

    00           0291df70 771a8764 7717df04 000003d4 ffffffff ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    01         4 0291df74 7717df04 000003d4 ffffffff 0291e0a0 ntdll!ZwMapViewOfSection+0xc (FPO: [10,0,0])

    02        44 0291dfb8 7717c855 000003d4 0291e0a0 0291e088 ntdll!LdrpMapViewOfDllSection+0x67 (FPO: [5,2,4])

    03       108 0291e0c0 7717fab9 01e2f640 0291e100 01e66a58 ntdll!LdrpMapDll+0x417 (FPO: [SEH])

    04       274 0291e334 7717610e 01e2f640 72410321 0291e378 ntdll!LdrpLoadImportModule+0x206 (FPO: [4,145,4])

    05        4c 0291e380 77176233 01e2f640 01e4f020 0291e3ac ntdll!LdrpHandleOneNewFormatImportDescriptor+0x84 (FPO: [4,10,4])

    06        1c 0291e39c 77176248 01e2f640 01e4f020 724102c0 ntdll!LdrpHandleNewFormatImportDescriptors+0x1d (FPO: [3,0,4])

    07        7c 0291e418 7717c5ba 01e2f640 01e4f020 75a35de0 ntdll!LdrpWalkImportDescriptor+0x1f2 (FPO: [SEH])

    08       288 0291e6a0 77177a52 00000000 01e2f640 0291e994 ntdll!LdrpLoadDll+0x2f1 (FPO: [SEH])

    09       284 0291e924 76d031ba 01e2f640 0291e994 0291e954 ntdll!LdrLoadDll+0x22a (FPO: [SEH])

    0a        64 0291e988 75a4b34f 0291ea04 00000000 00000008 kernel32!LoadLibraryExW+0x252 (FPO: [SEH])

    0b        24 0291e9ac 75a4b239 0291ea04 0291e9d0 0291e9d4 ole32!CClassCache::CDllPathEntry::LoadDll+0xae (FPO: [4,1,4])

    0c        30 0291e9dc 75a469fe 0291ea04 0291ecec 0291e9fc ole32!CClassCache::CDllPathEntry::Create_rl+0x37 (FPO: [3,3,4])

    0d       24c 0291ec28 75a46913 00000001 0291ecec 0291ec58 ole32!CClassCache::CClassEntry::CreateDllClassEntry_rl+0xd4 (FPO: [3,139,4])

    0e        48 0291ec70 75a467b4 00000001 00505350 0291ec9c ole32!CClassCache::GetClassObjectActivator+0x224 (FPO: [3,10,4])

    0f        38 0291eca8 75a474c0 0291ecec 00000000 0291f2f4 ole32!CClassCache::GetClassObject+0x30 (FPO: [1,3,4])

    10        7c 0291ed24 75a4705b 75b4e428 00000000 0291f2f4 ole32!CServerContextActivator::CreateInstance+0x110 (FPO: [4,25,4])

    11        40 0291ed64 75a4772d 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])

    12        54 0291edb8 75a4764f 75b4e4ac 00000000 0291f2f4 ole32!CApartmentActivator::CreateInstance+0x112 (FPO: [4,13,4])

    13        20 0291edd8 75a47609 75b4e48c 00000001 00000000 ole32!CProcessActivator::CCICallback+0x6d (FPO: [6,1,4])

    14        20 0291edf8 75a475ba 75b4e48c 0291f150 00000000 ole32!CProcessActivator::AttemptActivation+0x2c (FPO: [7,0,0])

    15        3c 0291ee34 75a476af 75b4e48c 0291f150 00000000 ole32!CProcessActivator::ActivateByContext+0x4f (FPO: [6,3,4])

    16        28 0291ee5c 75a4705b 75b4e48c 00000000 0291f2f4 ole32!CProcessActivator::CreateInstance+0x49 (FPO: [4,1,4])

    17        40 0291ee9c 75a47121 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])

    18       260 0291f0fc 75a4705b 75b4e4b4 00000000 0291f2f4 ole32!CClientContextActivator::CreateInstance+0xb0 (FPO: [4,6,4])

    19        40 0291f13c 75a472c5 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])

    1a       7d4 0291f910 75a6e2a2 0291f9f4 00000000 00000017 ole32!ICoCreateInstanceEx+0x403 (FPO: [8,50,4])

    1b        60 0291f970 75a6e203 0291f9f4 00000000 00000017 ole32!CComActivator::DoCreateInstance+0xd9 (FPO: [7,11,4])

    1c        24 0291f994 75a6e1bc 0291f9f4 00000000 00000017 ole32!CoCreateInstanceEx+0x38 (FPO: [6,0,0])

    1d        30 0291f9c4 72751489 0291f9f4 00000000 00000017 ole32!CoCreateInstance+0x37 (FPO: [5,3,4])

    1e        44 0291fa08 727550b9 00000000 022cd14c 022cd140 FunDisc!CQueryWorker::CreateProvider+0x1a6 (FPO: [1,7,4])

    1f        34 0291fa3c 72755028 0291fa80 00000000 00000000 FunDisc!CQueryWorker::Execute+0x77 (FPO: [1,7,4])

    20        18 0291fa54 74892c26 00000000 0291fa80 0057f198 FunDisc!CFunctionInstanceCollectionQuery::Execute+0x66 (FPO: [2,0,4])

    21        1c 0291fa70 74892d21 0057f198 00000001 00000000 NetworkItemFactory!CNetworkItemFactory::_StartFDQuery+0x84 (FPO: [4,0,4])

    22        30 0291faa0 74892d8b 0057f198 0291fac0 7489478d NetworkItemFactory!CNetworkItemFactory::s_StartFDInMTA+0xa8 (FPO: [1,3,4])

    23         c 0291faac 7489478d 0057f198 00000000 00000000 NetworkItemFactory!FDBackgroundThreadHandler+0x1b (FPO: [1,0,0])

    24        14 0291fac0 76d24911 01e4abd0 0291fb0c 7718e4b6 NetworkItemFactory!ThreadHandler+0xf (FPO: [1,0,0])

    25         c 0291facc 7718e4b6 01e4abd0 75a3404c 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [1,0,0])

    26        40 0291fb0c 7718e489 7489477e 01e4abd0 ffffffff ntdll!__RtlUserThreadStart+0x23 (FPO: [SEH])

    27        18 0291fb24 00000000 7489477e 01e4abd0 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [2,2,0])

     

    From the stack above, you can clearly see that seven frames below the top is what ‘was’ happening just ‘before’ the dll got loaded. How do I get to the point just ‘after’ the module gets loaded? Well one way is to use ‘gu’ twelve times. Note: The purpose of this is to show you how to use the various exception related commands.

     

    0:014> gu

    eax=00000000 ebx=00000000 ecx=0291df74 edx=771a9a94 esi=7ffad000 edi=20000000

    eip=771a8764 esp=0291df78 ebp=0291dfb8 iopl=0         nv up ei pl zr na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

    ntdll!ZwMapViewOfSection+0xc:

    771a8764 c22800          ret     28h

    <SNIP>

     

    This is the point just ‘after’ MPR.DLL got loaded.

     

    0:014> kvn ffff

     #   Memory  ChildEBP RetAddr  Args to Child             

    00           0291e9dc 75a469fe 0291ea04 0291ecec 0291e9fc ole32!CClassCache::CDllPathEntry::Create_rl+0x37 (FPO: [3,3,4])

    01       24c 0291ec28 75a46913 00000001 0291ecec 0291ec58 ole32!CClassCache::CClassEntry::CreateDllClassEntry_rl+0xd4 (FPO: [3,139,4])

    02        48 0291ec70 75a467b4 00000001 00505350 0291ec9c ole32!CClassCache::GetClassObjectActivator+0x224 (FPO: [3,10,4])

    03        38 0291eca8 75a474c0 0291ecec 00000000 0291f2f4 ole32!CClassCache::GetClassObject+0x30 (FPO: [1,3,4])

    04        7c 0291ed24 75a4705b 75b4e428 00000000 0291f2f4 ole32!CServerContextActivator::CreateInstance+0x110 (FPO: [4,25,4])

    05        40 0291ed64 75a4772d 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])

    06        54 0291edb8 75a4764f 75b4e4ac 00000000 0291f2f4 ole32!CApartmentActivator::CreateInstance+0x112 (FPO: [4,13,4])

    07        20 0291edd8 75a47609 75b4e48c 00000001 00000000 ole32!CProcessActivator::CCICallback+0x6d (FPO: [6,1,4])

    08        20 0291edf8 75a475ba 75b4e48c 0291f150 00000000 ole32!CProcessActivator::AttemptActivation+0x2c (FPO: [7,0,0])

    09        3c 0291ee34 75a476af 75b4e48c 0291f150 00000000 ole32!CProcessActivator::ActivateByContext+0x4f (FPO: [6,3,4])

    0a        28 0291ee5c 75a4705b 75b4e48c 00000000 0291f2f4 ole32!CProcessActivator::CreateInstance+0x49 (FPO: [4,1,4])

    0b        40 0291ee9c 75a47121 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])

    0c       260 0291f0fc 75a4705b 75b4e4b4 00000000 0291f2f4 ole32!CClientContextActivator::CreateInstance+0xb0 (FPO: [4,6,4])

    0d        40 0291f13c 75a472c5 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])

    0e       7d4 0291f910 75a6e2a2 0291f9f4 00000000 00000017 ole32!ICoCreateInstanceEx+0x403 (FPO: [8,50,4])

    0f        60 0291f970 75a6e203 0291f9f4 00000000 00000017 ole32!CComActivator::DoCreateInstance+0xd9 (FPO: [7,11,4])

    10        24 0291f994 75a6e1bc 0291f9f4 00000000 00000017 ole32!CoCreateInstanceEx+0x38 (FPO: [6,0,0])

    11        30 0291f9c4 72751489 0291f9f4 00000000 00000017 ole32!CoCreateInstance+0x37 (FPO: [5,3,4])

    12        44 0291fa08 727550b9 00000000 022cd14c 022cd140 FunDisc!CQueryWorker::CreateProvider+0x1a6 (FPO: [1,7,4])

    13        34 0291fa3c 72755028 0291fa80 00000000 00000000 FunDisc!CQueryWorker::Execute+0x77 (FPO: [1,7,4])

    14        18 0291fa54 74892c26 00000000 0291fa80 0057f198 FunDisc!CFunctionInstanceCollectionQuery::Execute+0x66 (FPO: [2,0,4])

    15        1c 0291fa70 74892d21 0057f198 00000001 00000000 NetworkItemFactory!CNetworkItemFactory::_StartFDQuery+0x84 (FPO: [4,0,4])

    16        30 0291faa0 74892d8b 0057f198 0291fac0 7489478d NetworkItemFactory!CNetworkItemFactory::s_StartFDInMTA+0xa8 (FPO: [1,3,4])

    17         c 0291faac 7489478d 0057f198 00000000 00000000 NetworkItemFactory!FDBackgroundThreadHandler+0x1b (FPO: [1,0,0])

    18        14 0291fac0 76d24911 01e4abd0 0291fb0c 7718e4b6 NetworkItemFactory!ThreadHandler+0xf (FPO: [1,0,0])

    19         c 0291facc 7718e4b6 01e4abd0 75a3404c 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [1,0,0])

    1a        40 0291fb0c 7718e489 7489477e 01e4abd0 ffffffff ntdll!__RtlUserThreadStart+0x23 (FPO: [SEH])

    1b        18 0291fb24 00000000 7489477e 01e4abd0 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [2,2,0])

    Disable First Chance Exceptions

    Another case where sx* commands are helpful is when you need to disable first change exceptions. Click here for more on the topic.

    If you do not want to see a first chance exception in the debugger, you should disable first chance exception handling for the specific exception code. Otherwise, when the first chance exception occurs, you may need to instruct the debugger to pass on the exception to the program to be handled as usual. How do we do this?

     This is an example of disabling first chance Access Violation Exception.

    sxd av

    Let me show you a quick example using the MMC.exe process. On my Windows XP machine, I attached a debugger to the mmc.exe process and added “Computer Management” from File -> ‘Add/Remove Snap in’. The mmc.exe broke into the debugger on a first chance exception every time I clicked “Services and Applications”. I did not want this to happen because this was not the focus of my investigation.

    This is what the first chance exception looks like in the debugger.

    (d84.cc): Break instruction exception - code 80000003 (first chance)

    eax=7ffd7000 ebx=00000001 ecx=00000002 edx=00000003 esi=00000004 edi=00000005

    eip=7c901230 esp=00aeffcc ebp=00aefff4 iopl=0         nv up ei pl zr na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000246

    ntdll!DbgBreakPoint:

    7c901230 cc              int     3

     

    BTW, if you type ‘sx’, you get the current exception settings.

    0:002> sx

     

    <Snip>

     

      av - Access violation - break - not handled

     

    <Snip>

     

    So I disabled first chance exceptions by using this command.

    sxd av

    After running the command, first chance exceptions don’t cause the debugger to break. We only see the notifications.

     

    (d84.d40): Access violation - code c0000005 (first chance)

     

    After disabling first chance exceptions, this is our sx settings.

    0:002> sx

     

    <Snip>

     

      av - Access violation - second-chance break - not handled

     

    <Snip>

    Break on a user mode module from kernel mode debugging

     

    A common challenge we face in ‘day to day’ debugging is how to break on a ‘user mode’ DLL - ‘module load’, while doing a kernel mode debug. Here's an example of setting a break point on the mpr!WNetEnumResource API from kernel mode.

     

    1.       You must run !gflag +ksl to enable Kernel Stack Loading in the debugger. Now the kernel will notify the debugger when a user mode module loads occur (normally the kernel debugger is only notified of kernel module load, not user).  One caveat is the debugger is only notified on the first load of a particular module, rather than every load of the module.  However, this will let you set any breakpoints you're interested in the user mode module with ‘bu’.

     

    kd> !gflag +ksl

    New NtGlobalFlag contents: 0x00040000

        ksl - Enable loading of kernel debugger symbols

     

     2.      Set the process context to your user mode process using ‘.process -r -p <EPROCESS>’. I used mmc.exe in my case .

     

    PROCESS 899d0c78  SessionId: 1  Cid: 0850    Peb: 7ffdf000  ParentCid: 0138

        DirBase: 7f7bf160  ObjectTable: 94fedcc8  HandleCount: 345.

        Image: mmc.exe

     

    kd> .process -r -p 899d0c78

    Implicit process is now 899d0c78

    .cache forcedecodeuser done

    Loading User Symbols

    ................................................................

    .................

     

    3.      Set an exception on module load ‘sxe ld usermode.dll’

     

    kd> sxe ld mpr.dll

     

    kd> sx

    <Snip>

      ld - Load module - break

           (only break for mpr.dll)

    <Snip>

     

    This is what the debugger displays when mpr.dll gets loaded into the mmc process.

     

    nt!DbgLoadImageSymbols+0x47:

    816497f3 cc              int     3

     

    kd> k ffff

      Memory  ChildEBP RetAddr 

              8e089b20 81780dcb nt!DbgLoadImageSymbols+0x47

           54 8e089b74 8186d735 nt!MiLoadUserSymbols+0x1ed

           a4 8e089c18 8185f655 nt!MiMapViewOfImageSection+0x885

           70 8e089c88 8185f745 nt!MiMapViewOfSection+0x22a

           30 8e089cb8 8185f952 nt!MmMapViewOfSection+0x2a

           7c 8e089d34 81692a1a nt!NtMapViewOfSection+0x203

            0 8e089d34 77a59a94 nt!KiFastCallEntry+0x12a

              0771e050 77a58764 ntdll!KiFastSystemCallRet

            4 0771e054 77a2df04 ntdll!ZwMapViewOfSection+0xc

           44 0771e098 77a2c855 ntdll!LdrpMapViewOfDllSection+0x67

          108 0771e1a0 77a2fab9 ntdll!LdrpMapDll+0x417

          274 0771e414 77a2610e ntdll!LdrpLoadImportModule+0x206

           4c 0771e460 77a26233 ntdll!LdrpHandleOneNewFormatImportDescriptor+0x84

           1c 0771e47c 77a26248 ntdll!LdrpHandleNewFormatImportDescriptors+0x1d

           7c 0771e4f8 77a2c5ba ntdll!LdrpWalkImportDescriptor+0x1f2

          288 0771e780 77a27a52 ntdll!LdrpLoadDll+0x2f1

          284 0771ea04 777131ba ntdll!LdrLoadDll+0x22a

           64 0771ea68 7689b34f kernel32!LoadLibraryExW+0x252

           24 0771ea8c 7689b239 ole32!CClassCache::CDllPathEntry::LoadDll+0xae

           30 0771eabc 768969fe ole32!CClassCache::CDllPathEntry::Create_rl+0x37

          24c 0771ed08 76896913 ole32!CClassCache::CClassEntry::CreateDllClassEntry_rl+0xd4

           48 0771ed50 768967b4 ole32!CClassCache::GetClassObjectActivator+0x224

           38 0771ed88 768974c0 ole32!CClassCache::GetClassObject+0x30

           7c 0771ee04 7689705b ole32!CServerContextActivator::CreateInstance+0x110

           40 0771ee44 7689772d ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108

           54 0771ee98 7689764f ole32!CApartmentActivator::CreateInstance+0x112

           20 0771eeb8 76897609 ole32!CProcessActivator::CCICallback+0x6d

           20 0771eed8 768975ba ole32!CProcessActivator::AttemptActivation+0x2c

           3c 0771ef14 768976af ole32!CProcessActivator::ActivateByContext+0x4f

           28 0771ef3c 7689705b ole32!CProcessActivator::CreateInstance+0x49

           40 0771ef7c 76897121 ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108

          260 0771f1dc 7689705b ole32!CClientContextActivator::CreateInstance+0xb0

           40 0771f21c 768972c5 ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108

          7dc 0771f9f8 768be2a2 ole32!ICoCreateInstanceEx+0x403

           60 0771fa58 768be203 ole32!CComActivator::DoCreateInstance+0xd9

           24 0771fa7c 768be1bc ole32!CoCreateInstanceEx+0x38

           30 0771faac 733d1489 ole32!CoCreateInstance+0x37

           44 0771faf0 733d50b9 fundisc!CQueryWorker::CreateProvider+0x1a6

           34 0771fb24 733d5028 fundisc!CQueryWorker::Execute+0x77

           18 0771fb3c 751d2c26 fundisc!CFunctionInstanceCollectionQuery::Execute+0x66

           1c 0771fb58 751d2d21 networkitemfactory!CNetworkItemFactory::_StartFDQuery+0x84

           30 0771fb88 751d2d8b networkitemfactory!CNetworkItemFactory::s_StartFDInMTA+0xa8

            c 0771fb94 751d478d networkitemfactory!FDBackgroundThreadHandler+0x1b

           14 0771fba8 77734911 networkitemfactory!ThreadHandler+0xf

            c 0771fbb4 77a3e4b6 kernel32!BaseThreadInitThunk+0xe

           40 0771fbf4 77a3e489 ntdll!__RtlUserThreadStart+0x23

           18 0771fc0c 00000000 ntdll!_RtlUserThreadStart+0x1b

     

    So now we can set an unresolved break point on mpr!WNetEnumResourceW

     

    kd> bu mpr!WNetEnumResourceW

     

    And this is how we hit the break point for mpr!WNetEnumResourceW

     

    Breakpoint 0 hit

    mpr!WNetEnumResourceW:

    001b:75c83522 6a14            push    14h

     

    kd> !thread

    THREAD 8a0bfaf0  Cid 0850.0540  Teb: 7ff49000 Win32Thread: 00000000 RUNNING on processor 0

    Not impersonating

    DeviceMap                 92b291e8

    Owning Process            0       Image:         <Unknown>

    Attached Process          899d0c78       Image:         mmc.exe

    Wait Start TickCount      678430         Ticks: 0

    Context Switch Count      8            

    UserTime                  00:00:00.015

    KernelTime                00:00:00.031

    Win32 Start Address ntdll!TppWorkerThread (0x77a3dbb0)

    Stack Init 8e10e000 Current 8e10dc98 Base 8e10e000 Limit 8e10b000 Call 0

    Priority 10 BasePriority 8 PriorityDecrement 2 IoPriority 2 PagePriority 5

    ChildEBP RetAddr  Args to Child             

    070bfd34 75193c14 06db4658 070bfd5c 06ec9c28 mpr!WNetEnumResourceW   

    070bfd74 75193ee3 00000000 00000000 00000000 fdWNet!CWNetProvider::EnumerateFunc+0x1c3  

    070bfd84 75193f23 7ff49000 06e4f258 75193ee9 fdWNet!CWNetProviderWorkItem::Enumerate+0x24

    070bfd9c 77a18a5c 00000000 7007c2c5 06259e38 fdWNet!CWNetProvider::ThreadProc+0x3a   

    070bfe00 77a3de3f 01cfc920 06e4f258 7007c3f5 ntdll!RtlpTpWorkCallback+0xbf   

    070bff30 77734911 06259e30 070bff7c 77a3e4b6 ntdll!TppWorkerThread+0x545   

    070bff3c 77a3e4b6 06259e30 7007c3b9 00000000 kernel32!BaseThreadInitThunk+0xe

    070bff7c 77a3e489 77a3dbb0 06259e30 ffffffff ntdll!__RtlUserThreadStart+0x23

    070bff94 00000000 77a3dbb0 06259e30 00000000 ntdll!_RtlUserThreadStart+0x1b

     

    Hope this article helps you navigate around using the sx commands to collect data or debug. Take a look at the Windbg help files for more info on the sx* commands.

     

     

    Share this post :
  • Ntdebugging Blog

    !Search to the Rescue!

    • 6 Comments

    My name is Trey Nash and I am an Escalation Engineer on the Core OS team.  My experience is as a software developer, and therefore, my blog posts tend to be slanted in the direction of helping developers during the feature development, testing, and the support phases.

    Windbg is definitely a feature-rich debugger.  Sometimes, reading the debugger help during idle time can provide some great insight into the capabilities of the debugger.  However, the debugger help comes up short when you ask questions such as, “command huh-huh sure is cool, but when would I ever want to do that?!?”  Besides, if you’re caught reading the windbg help in your spare time, you may be on the receiving end of some ridicule form those in your social circle.

    In this post, I would like to speak a bit about the !search command, among others,  and when you would want to use it.  Additionally, I’ll be demonstrating some related techniques germane to when you would use !search in the first place.

    Not long ago, I was working with a dump from a machine that was hung and it was my job to find out why.  After applying many of the techniques in our hang dump blog post, I discovered that there was a thread in particular that was stuck, which I show below:

    0: kd> !thread fe016330
    THREAD fe016330  Cid 0004.02e0  Teb: 00000000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
        fcf1a698  SynchronizationEvent
        fe0163a8  NotificationTimer
    IRP List:
       
    fcb47650
    : (0006,01d8) Flags: 00000404  Mdl: 00000000
    Not impersonating
    DeviceMap                 e18008e8
    Owning Process            fe790648       Image:         System
    Attached Process          N/A            Image:         N/A
    Wait Start TickCount      75337682       Ticks: 145 (0:00:00:02.265)
    Context Switch Count      31848752            
    UserTime                  00:00:00.000
    KernelTime               
    00:25:38.000
    Start Address Treyresearch (0xf45629e0)
    Stack Init f50e1000 Current f50e05e8 Base f50e1000 Limit f50de000 Call 0
    Priority 15 BasePriority 15 PriorityDecrement 0
    ChildEBP RetAddr  Args to Child             
    f50e0600 e103d5b1 fe016330 fe0163d8 00000000 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
    f50e062c e103df9e fe016330 fd0321f8 00000000 nt!KiSwapThread+0x2e5 (FPO: [0,7,0])
    f50e0674 e101e05b fcf1a698 0000001b 00000000 nt!KeWaitForSingleObject+0x346 (FPO: [5,13,4])
    f50e06b0 e102e00a e3faf6e0 f50e0900 00000000 nt!ExpWaitForResource+0xd5 (FPO: [0,5,4])
    f50e06d0 f5a988cb
    fd0321f8 00000001 f50e08e4 nt!ExAcquireResourceExclusiveLite
    +0x8d (FPO: [2,3,0])
    f50e06e0 f5ad81c4 f50e0900 e3faf6e0 00000001
    Ntfs!NtfsAcquirePagingResourceExclusive
    +0x20 (FPO: [3,0,0])
    f50e08e4 f5ad8909 f50e0900 fcb47650 fdcc3020 Ntfs!NtfsCommonCleanup+0x193 (FPO: [SEH])
    f50e0a54 e1040153 fe00d718
    fcb47650 fcb47650 Ntfs!NtfsFsdCleanup
    +0xcf (FPO: [SEH])
    f50e0a68 f5b4fd28 fddbc818 fe6d1a28 00000000 nt!IofCallDriver+0x45 (FPO: [0,0,4])
    f50e0a94 e1040153 fdcc3020 fcb47650 fcb47650 fltmgr!FltpDispatch+0x152 (FPO: [2,6,0])
    f50e0aa8 f5b4fb25 fdde0cb0 fcb47650 fdd8dc18 nt!IofCallDriver+0x45 (FPO: [0,0,4])
    f50e0acc f5b4fcf5 f50e0aec fdde0cb0 00000000 fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x20b (FPO: [3,4,4])
    f50e0b04 e1040153 fdde0cb0 fcb47650 fcb47650 fltmgr!FltpDispatch+0x11f (FPO: [2,6,0])
    f50e0b18 e112ec0a fdfd9bd8 fe774730 fdfd9bf0 nt!IofCallDriver+0x45 (FPO: [0,0,4])
    f50e0b48 e112b6af fe790648 fdde0cb0 00010003 nt!IopCloseFile+0x2ae (FPO: [5,7,0])
    f50e0b78 e112b852 fe790648 00000001 fe774730 nt!ObpDecrementHandleCount+0xcc (FPO: [4,2,4])
    f50e0ba0 e112b776 e1802e48 fdfd9bf0 00006e54 nt!ObpCloseHandleTableEntry+0x131 (FPO: [5,1,0])
    f50e0be4 e112b7c1 00006e54 00000000 f50e0c00 nt!ObpCloseHandle+0x82 (FPO: [2,7,4])
    f50e0bf4 e1033bdf 00006e54 f50e0cfc e103b00c
    nt!NtClose
    +0x1b (FPO: [1,0,0])
    f50e0bf4 e103b00c 00006e54 f50e0cfc e103b00c nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ f50e0c00)
    f50e0c70 f4562119 00006e54 00030000 00000068 nt!ZwClose+0x11 (FPO: [1,0,0])
    WARNING: Stack unwind information not available. Following frames may be wrong.
    f50e0cfc f456229f f50e0d34 f50e0d2c f4577f50 Treyresearch+0x11119
    f50e0d38 f45626f9 fe016330 fc825368 00000000 Treyresearch+0x1129f
    f50e0d70 f45629ae f1ed8000 00002000 00000000 Treyresearch+0x116f9
    f50e0d90 f4562ba3 fc825318 fde59b38 00000003 Treyresearch+0x119ae
    f50e0dac e1120833 f4577e20 00000000 00000000 Treyresearch+0x11ba3
    f50e0ddc e103fe9f f45629e0 f4577e20 00000000 nt!PspSystemThreadStartup+0x2e (FPO: [SEH])
    00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

    Note:    Eagle-eye readers may have noticed that the debugger states every frame in the above thread uses frame pointer optimization (FPO).  This is a bug in version 6.11.0001.402 of the debugger.

    I have highlighted the interesting bits above.  It seems that this thread is some sort of worker thread, probably created by the Treyresearch driver.  It is doing some work that includes closing a particular file.  In the process of closing the file, NTFS wants to acquire the paging resource for this particular file, and that is where this thread gets stuck.

    What is the paging resource?  Many file systems have a per-file lock that one acquires when performing paging I/O such that other destabilizing activity cannot occur at the same time as a paging operation.  The paging resource for the file is this lock.

    To further illustrate the paging resource, let’s check out the file in question.  One handy things that you can do is follow the stack down to where you see the most recent call to nt!IofCallDriver.  You can see in the MSDN documentation that IoCallDriver accepts two parameters, a DEVICE_OBJECT* and an IRP*.  However, nt!IofCallDriver is a fastcall function, so you cannot find its parameters on the stack.  But since you know that nt!IofCallDriver is calling a driver dispatch routine, and since driver dispatch routines have the same prototype as IoCallDriver, you can easily find the IRP in question which I have highlighted in the Ntfs!NtfsFsdCleanup frame of the  thread’s stack above and dumped out below:

    0: kd> !irp fcb47650
    Irp is active with 10 stacks 10 is current (= 0xfcb47804)
     No Mdl: No System Buffer: Thread fe016330:  Irp stack trace. 
         cmd  flg cl Device   File     Completion-Context
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                         Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                         Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                         Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                         Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                         Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                         Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                         Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                         Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                         Args: 00000000 00000000 00000000 00000000
    >[ 12, 0]   0  0 fe00d718
    fdfd9bf0
    00000000-00000000   
                  \FileSystem\Ntfs
                         Args: 00000000 00000000 00000000 00000000

    And from the IRP above, we can find the real file that the thread above is trying to acquire the paging lock for:

    0: kd> !fileobj fdfd9bf0

    \Program Files\Treyresearch\Treyresearch.data

    Device Object: 0xfe6da738   \Driver\Ftdisk
    Vpb: 0xfe791818
    Access: Read Write Delete SharedRead SharedWrite SharedDelete

    Flags:  0x43062
           Synchronous IO
           Sequential Only
           Cache Supported
           Modified
           Size Changed
           Handle Created

    File Object is currently busy and has 0 waiters.

    FsContext:
    0xe3faf7a8      FsContext2: 0xe3faf8f0
    Private Cache Map: 0xfccf1fa0
    CurrentByteOffset: 6400164
    Cache Data:
      Section Object Pointers: fc956f3c
      Shared Cache Map: fccf1ec8         File Offset: 6400164
      Vacb: fe77bd80
      Your data is at: cbe80164

    The file object contains two fields named FsContext and FsContext2 shown above.  These fields are for the file system to store file system specific information.  Most file systems would store the paging resource in these context fields somewhere.  For example, NTFS uses FsContext to hold the stream control block (SCB) and you can surmise that somewhere down in the SCB is where NTFS stores the paging resource. (It’s actually more complicated than that, but that’s good enough for sake of this discussion)

    Now, let’s take a look at the paging resource itself.  You can see from the documentation of ExAcquireResourceExclusiveLite, the first parameter is an ERESOURCE and I have highlighted it in our thread stack above.  Given that, we can use the !locks command to get a better idea of what’s going on:

    0: kd> !locks -v fd0321f8

    Resource @ 0xfd0321f8    Shared 1 owning threads
        Contention Count = 2
        NumberOfSharedWaiters = 1
        NumberOfExclusiveWaiters = 1
         Threads:
    fe77f1e0-07<*>     
    ß This thread is the owner

         THREAD fe77f1e0  Cid 0004.0064  Teb: 00000000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
             fc825320  NotificationEvent
         Not impersonating
         DeviceMap                 e18008e8
         Owning Process            fe790648       Image:         System
         Attached Process          fced2d88       Image:         store.exe
         Wait Start TickCount      75281842       Ticks: 55985 (0:00:14:34.765)
         Context Switch Count      4440231            
         UserTime                  00:00:00.000
         KernelTime                00:01:51.171
         Start Address nt!MiMappedPageWriter (0xe101962c)
         Stack Init f6137000 Current f61366ac Base f6137000 Limit f6134000 Call 0
         Priority 17 BasePriority 8 PriorityDecrement 0
         ChildEBP RetAddr 
         f61366c4 e103d5b1 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
         f61366f0 e103df9e nt!KiSwapThread+0x2e5 (FPO: [0,7,0])
         f6136738 f4562e28 nt!KeWaitForSingleObject+0x346 (FPO: [5,13,4])
    WARNING: Stack unwind information not available. Following frames may be wrong.
         f6136760 f4563229 Treyresearch+0x11e28
         f6136788 f4559d8f Treyresearch+0x12229
         f61367e4 f4570b95 Treyresearch+0x8d8f
         f613684c f4570e39 Treyresearch+0x1fb95
         f6136898 f4570f4b Treyresearch+0x1fe39
         f61368c4 f5b4cb73 Treyresearch+0x1ff4b
         f613692c f5b4efc2 fltmgr!FltpPerformPostCallbacks+0x1c5 (FPO: [1,17,4])
         f6136940 f5b4f4f1 fltmgr!FltpProcessIoCompletion+0x10 (FPO: [1,0,0])
         f6136950 f5b4fb83 fltmgr!FltpPassThroughCompletion+0x89 (FPO: [3,0,4])
         f6136980 f5b4fcf5 fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x269 (FPO: [3,4,4])
         f61369b8 e1040153 fltmgr!FltpDispatch+0x11f (FPO: [2,6,0])
         f61369cc f452b2f8 nt!IofCallDriver+0x45 (FPO: [0,0,4])
         f6136a18 f452b6d3 exifs!NtSystemWrite+0x1ff (FPO: [Non-Fpo])
         f6136ab0 f452ae1d exifs!IfsInternalWrite+0x1a0 (FPO: [Non-Fpo])
         f6136b24 f4549e02 exifs!MRxIfsWrite+0x333 (FPO: [Non-Fpo])
         f6136b44 f4541a8e exifs!RxLowIoSubmit+0x180 (FPO: [Uses EBP] [2,2,4])
         f6136b54 f45427ed exifs!RxLowIoWriteShell+0x2e (FPO: [1,0,1])
         f6136c64 f452fbe3 exifs!RxCommonWrite+0xcc1 (FPO: [Non-Fpo])
         f6136cf8 f453dffd exifs!RxFsdCommonDispatch+0x2c4 (FPO: [Non-Fpo])
         f6136d24 f452439a exifs!RxFsdDispatch+0x93 (FPO: [Non-Fpo])
         f6136d40 e1040153 exifs!MRxIfsFsdDispatch+0x6c (FPO: [Non-Fpo])
         f6136d54 e101c5b4 nt!IofCallDriver+0x45 (FPO: [0,0,4])
         f6136d68 e101971d nt!IoAsynchronousPageWrite+0xd0 (FPO: [8,0,4])
         f6136dac e1120833
    nt!MiMappedPageWriter
    +0x12e (FPO: [1,4,0])
         f6136ddc e103fe9f nt!PspSystemThreadStartup+0x2e (FPO: [SEH])
         00000000 00000000 nt!KiThreadStartup+0x16

    fe78eb40-01   

         THREAD fe78eb40  Cid 0004.001c  Teb: 00000000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
             fcea3890  Semaphore Limit 0x7fffffff
             fe78ebb8  NotificationTimer
         Not impersonating
         DeviceMap                 e18008e8
         Owning Process            fe790648       Image:         System
         Attached Process          N/A            Image:         N/A
         Wait Start TickCount      75337718       Ticks: 109 (0:00:00:01.703)
         Context Switch Count      3480314            
         UserTime                  00:00:00.000
         KernelTime                00:01:35.875
         Start Address nt!ExpWorkerThread (0xe102da4b)
         Stack Init f60ef000 Current f60eebf0 Base f60ef000 Limit f60ec000 Call 0
         Priority 14 BasePriority 13 PriorityDecrement 1
         ChildEBP RetAddr 
         f60eec08 e103d5b1 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
         f60eec34 e103df9e nt!KiSwapThread+0x2e5 (FPO: [0,7,0])
         f60eec7c e101e05b nt!KeWaitForSingleObject+0x346 (FPO: [5,13,4])
         f60eecb8 e1024ba8 nt!ExpWaitForResource+0xd5 (FPO: [0,5,4])
         f60eecd8 f5a98915 nt!ExAcquireResourceSharedLite+0xf5 (FPO: [2,3,4])
         f60eece8 f5ae198a Ntfs!NtfsAcquirePagingResourceShared+0x20 (FPO: [3,0,0])
         f60eed04 e1044997 Ntfs!NtfsAcquireScbForLazyWrite+0x7a (FPO: [2,0,0])
         f60eed40 e104328e nt!CcWriteBehind+0x27 (FPO: [0,8,4])
         f60eed80 e102db08 nt!CcWorkerThread+0x15a (FPO: [SEH])
         f60eedac e1120833 nt!ExpWorkerThread+0xeb (FPO: [1,5,0])
         f60eeddc e103fe9f nt!PspSystemThreadStartup+0x2e (FPO: [SEH])
         00000000 00000000 nt!KiThreadStartup+0x16

         Threads Waiting On Exclusive Access:

                  fe016330      

    Now this is some juicy output.  I used the –v option to also expand some of the threads related to this lock.  The owner thread is the one with the asterisk (*) next to it and you can see from the expanded thread listing and based on the fact that the function at the bottom of the stack is nt!MiMappedPageWriter, that the thread in question is the mapped page writer.  This thread is a system thread that periodically sweeps through a list of dirty pages flushing them out to disk.  Interestingly, the mapped page writer has acquired the ERESOURCE seven times.  That is shown by the -07 next to the owner thread in the above output.  The second thread is waiting for shared access and it is a system file cache thread.  And finally, the third thread is our initial hung thread that is waiting on exclusive access.

    As a sanity check, let’s make sure that the ERESOURCE and the file in question are related.  Previously, I stated that in the NTFS file system the FsContext field of the file object contains an SCB.  Let’s pass that pointer to !pool and get some more information about it:

    0: kd> !pool 0xe3faf7a8 2
    Pool page e3faf7a8 region is Paged pool
    *e3faf6d8 size:  330 previous size:   20  (Allocated) *Ntff
                  Pooltag Ntff : FCB_DATA, Binary : ntfs.sys

    Now, we can use the search command (s), to search the pool memory above and see if our ERESOURCE is in there.  If so, that would satisfy our sanity check:

    0: kd> s -d e3faf6d8 L 330/4 fd0321f8
    e3faf72c  fd0321f8 0c9013aa 01c9969b 42002f46  .!..........F/.B
    e3faf7b4  fd0321f8 06410000 00000000 06400164  .!....A.....d.@.

    Now that we are satisfied that we have matched up the ERESOURCE with the file that owns it, let’s move on.  At first glance of the mapped writer thread, it looks like the offending entity is exifs.  After all, it’s the most interesting component on the mapped page writer stack.  But don’t be fooled.  What you see in the mapped page writer stack is a snapshot of what it was doing when the dump was taken, and that’s not necessarily the work that caused things to go bad in the first place.  Even though exifs is a file system, it is not NTFS.  And we know an NTFS file’s paging resource is locked.  Keep in mind that the mapped page writer is processing a list.  So the items on the list that have caused the contention may have long been taken off the list and processed.

    So what do you do?  Unfortunately, the badness happened some time ago.  We don’t have a stack to look at to show who did this and when.  But what we can do is perform a search of memory to see if there are any references to the ERESOURCE elsewhere in memory.  If we find some hits, maybe they will shed some more light on what is going on.  So, let’s go ahead and do that:

    0: kd> !search fd0321f8
    Searching PFNs in range 0000000B - 000DFFF9 for [FFFFFFFFFD0321F8 - FFFFFFFFFD0321F8]

    Pfn      Offset   Hit      Va       Pte     
    - - - - - - - - - - - - - - - - - - - - - - - - - - -
    00007AB7 000004E8 FD0321F8 FCAB74E8 C03F2ADC
           fcab74e0+0x8     : Ntfr  -- ERESOURCE
    00007CDF 00000950 FD0321F8 FCCDF950 C03F337C
           fccdf938+0x18    : NpFc  -- CCB, client control block - Process: fdd70248
    0000803A 0000022C FD03A1F8 FD03A22C C03F40E8
           fd03a220+0xc     : Vad   -- Mm virtual address descriptors
    0000977F 0000003C FD0321F8
    FE77F03C
    C03F9DFC
           fe77f000+0x3c    :
    MmWe  -- Work entries for writing out modified filesystem pages.
    0000977F 00000514 FD0321F8 FE77F514
    C03F9DFC
           fe77f4d8+0x3c    :
    MmWe  -- Work entries for writing out modified filesystem pages.
    0000977F 000005BC FD0321F8 FE77F5BC
    C03F9DFC
           fe77f580+0x3c    :
    MmWe  -- Work entries for writing out modified filesystem pages.
    0000977F 00000904 FD0321F8 FE77F904
    C03F9DFC
           fe77f8c8+0x3c    :
    MmWe  -- Work entries for writing out modified filesystem pages.
    0000977F 00000BA4 FD0321F8 FE77FBA4
    C03F9DFC
           fe77fb68+0x3c    :
    MmWe  -- Work entries for writing out modified filesystem pages.
    0000978A 00000324 FD0321F8 FE78A324
    C03F9E28
           fe78a2e8+0x3c    :
    MmWe  -- Work entries for writing out modified filesystem pages.
    0000978A 0000051C FD0321F8 FE78A51C
    C03F9E28
           fe78a4e0+0x3c    :
    MmWe  -- Work entries for writing out modified filesystem pages.
    00060B7B 0000072C FD0321F8 E3FAF72C C038FEBC
           e3faf6d8+0x54    : Ntff  -- FCB_DATA
    00060B7B 000007B4 FD0321F8 E3FAF7B4 C038FEBC
           e3faf6d8+0xdc    : Ntff  -- FCB_DATA
    000D9653 00000638 FD0321F8 F50E0638 C03D4380
    000D9653 00000694 FD0321F8 F50E0694 C03D4380
    000D9653 000006D8 FD0321F8 F50E06D8 C03D4380
    000DE415 0000054C F50321F8 F503254C C03D40C8
    000DFBF4 00000C40 FD0321F8 F60EEC40 C03D83B8
    000DFBF4 00000C9C FD0321F8 F60EEC9C C03D83B8
    000DFBF4 00000CE0 FD0321F8 F60EECE0 C03D83B8
    Search done.

    One thing to note is that there are some hits that are not exactly what we were looking for.  That’s because !search also looks for values that are one bit off from what you requested.  Check out the help for how you can adjust this behavior.  Also, !search performs some extra work along the way.  If it notices that the virtual address found is in the pool, it displays information about that pool entry.

    Do you spot the curiosity?  Remember that the ERESOURCE had been acquired seven times.  Correspondingly, there are seven hits in the !search list with the MmWe tag!  And not surprisingly, the description of that tag pulled from pooltag.txt file in the triage directory where the debugger is installed reveals that these are paging work entries.  Now we’re on to something.

    Note:    Incidentally, if you want to determine where a virtual address in the list above resides, you can always pass it to !address.  The addresses at the end of the !seach output are often addresses on some thread’s stack.  If you pass those addresses to !thread, it is smart enough to find the thread that is associated with that stack and display it for you.

     “OK, but how do I find the real culprit?”, you then say.  Well, you have to continue to dig with what you have.  Unfortunately, the contents of the pool entries with MmWe tags are not documented, although, we definitely know what they are.  Let’s take a look at one of them using !pool:

    0: kd> !pool FE77F03C
    Pool page fe77f03c region is Nonpaged pool
    *
    fe77f000 size:   a8
    previous size:    0  (Allocated) *MmWe
                  Pooltag MmWe : Work entries for writing out modified filesystem pages., Binary : nt!mm

    Now, we see where the pool entry starts and how big the entry is.  So, let’s take a look at the contents of the memory:

    0: kd> dps fe77f000 L a8/@$ptrsize
    fe77f000  0a150000
    fe77f004  65576d4d
    fe77f008  fe78a4e8
    fe77f00c  e10b3af0 nt!MmMappedPageWriterList
    fe77f010  06140000
    fe77f014  00000000
    fe77f018  06150000
    fe77f01c  00000000
    fe77f020  fcc20788
    fe77f024  00000000
    fe77f028  e10b3f20 nt!MmMappedFileHeader
    fe77f02c  00000000
    fe77f030  00000000
    fe77f034  fdfd9bf0
    fe77f038  fdc6f008
    fe77f03c  fd0321f8
    fe77f040  00000000
    fe77f044  00000000
    fe77f048  00000000
    fe77f04c  0002005c
    fe77f050  00000000
    fe77f054  f1d20000
    fe77f058  00000000
    fe77f05c  00010000
    fe77f060  00000000
    fe77f064  0007036c
    fe77f068  000cb59d
    fe77f06c  000a97fe
    fe77f070  0000d7ef
    fe77f074  00021c90
    fe77f078  0005d6b1
    fe77f07c  000a5642
    fe77f080  0004d5c3
    fe77f084  000ae354
    fe77f088  00038249
    fe77f08c  000509ea
    fe77f090  0009c915
    fe77f094  00018dd6
    fe77f098  000b94d7
    fe77f09c  0006ca58
    fe77f0a0  00091e29
    fe77f0a4  00000000

    I used the dps command so that it would check to see if any of the values matched to any known symbols.  As you can see, there are a couple of symbols in there.  Notice that I divided the size of the block by the pseudo register $ptrsize as well.  The symbols that dps found validate that this block of memory is associated with the mapped page writer.

    You may also ask yourself, “are any of these values pointers to other pool blocks?”  If you wanted to know that, you could iterate over each one of them passing them to !pool or !address.  That sounds tedious to do manually.  But thankfully, the debugger has some nice command tokens such as .foreach that make this a breeze.  If you want to pass each of the above values to !pool, you can perform the following in the debugger:

    .foreach /pS 1 /ps 1 ( value { dp /c 1 fe77f000 L a8/@$ptrsize } ) { .if( value != 0 ) {.printf "**** %p ****\n", ${value}; !pool ${value} 0x2} }

    The address highlighted above is the address of the pool block revealed by the previous !pool command.  a8 is the size of the block in bytes and since dp lists the memory in units of pointer size, I divide a8 by $ptrsize.

    I won’t show the full output here, because it’s rather verbose.  But I have duplicated the output from the dps command above with added pool tags next to items that are pool entries:

    0: kd> dps fe77f000 L a8/@$ptrsize
    fe77f000  0a150000
    fe77f004  65576d4d
    fe77f008  fe78a4e8   Pooltag MmWe : Work entries for writing out modified filesystem pages., Binary : nt!mm
    fe77f00c  e10b3af0 nt!MmMappedPageWriterList
    fe77f010  06140000
    fe77f014  00000000
    fe77f018  06150000
    fe77f01c  00000000
    fe77f020 
    fcc20788   Pooltag Irp  : Io, IRP packets
    fe77f024  00000000
    fe77f028  e10b3f20 nt!MmMappedFileHeader
    fe77f02c  00000000
    fe77f030  00000000
    fe77f034 
    fdfd9bf0   Pooltag File : File objects
    fe77f038  fdc6f008   Pooltag MmCa : Mm control areas for mapped files, Binary : nt!mm
    fe77f03c 
    fd0321f8
       Pooltag Ntfr : ERESOURCE, Binary : ntfs.sys
    fe77f040  00000000
    fe77f044  00000000
    fe77f048  00000000
    fe77f04c  0002005c

    <snip>

     

    Sure enough, the file object above is identical to the file we identified earlier on as the one that the worker thread was attempting to close a handle to.  Also, you can see that our ERESOURCE is in there as well.  And even more, now we have an IRP that may reveal even more information.  Let’s see:

    0: kd> !irp fcc20788 1
    Irp is active with 10 stacks 12 is current (= 00000000)
     Mdl=fe77f048: No System Buffer: Thread fd089e6c: 
    Irp is completed
    .  Pending has been returned
    Flags = 00000003
    ThreadListEntry.Flink = fcc20798
    ThreadListEntry.Blink = fcc20798
    IoStatus.Status = 00000000
    IoStatus.Information = 00010000

    RequestorMode = 00000000
    Cancel = 00
    CancelIrql = 0
    ApcEnvironment = 00
    UserIosb = fe77f018
    UserEvent = 00000000
    Overlay.AsynchronousParameters.UserApcRoutine = e101c95b
    Overlay.AsynchronousParameters.UserApcContext = fe77f008
    Overlay.AllocationSize = 00000000 - 00000000
    CancelRoutine = 00000000   
    UserBuffer = 00000000
    &Tail.Overlay.DeviceQueueEntry = fcc207c8
    Tail.Overlay.Thread = fd089e6c
    Tail.Overlay.AuxiliaryBuffer = e101cb86
    Tail.Overlay.ListEntry.Flink = 00000000
    Tail.Overlay.ListEntry.Blink = 00000000
    Tail.Overlay.CurrentStackLocation = 00000000
    Tail.Overlay.OriginalFileObject = 00000000
    Tail.Apc = 00300012
    Tail.CompletionKey = 00300012
         cmd  flg cl Device   File     Completion-Context
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                         Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                         Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                         Args: 00000000 00000000 00000000 00000000
     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

                         Args: 00000000 00000000 00000000 00000000
     [  4, 0]   0  0 fe755ab8 00000000 f5ce857e-fe709df8   
                  \Driver\Disk  PartMgr!PmIoCompletion
                         Args: 00000000 00000000 00000000 0000000c
     [  4, 0]   0  0 fe709df8 00000000 f5c16558-fe6da7f0   
                  \Driver\PartMgr     ftdisk!FtpRefCountCompletionRoutine
                         Args: 00000000 00000000 00000000 0000000c
     [  4, 0]   0  0 fe6da738 00000000 f5bc0638-fe6d6a80   
                  \Driver\Ftdisk      volsnap!VspRefCountCompletionRoutine
                         Args: 00000000 00000000 00000000 0000000c
     [  4, 0]   0  0 fe6d69c8 00000000 f5d0af28-00000000   
                  \Driver\VolSnap     vsp
                         Args: 00000000 00000000 00000000 0000000c
     [  4, 0]   0  0 fe6d66c0 00000000 f5a9351c-fde38cd0   
                  \Driver\VSP   Ntfs!NtfsSingleAsyncCompletionRoutine
                         Args: 00000000 00000000 00000000 0000000c
     [  4, 0]   0  0 fe00d718 00000000 00000000-00000000   
                  \FileSystem\Ntfs
                         Args: 00000000 00000000 00000000 00000000

    What is apparent from looking at this IRP is that it is flagged as completed.  Moreover, the information in the IoStatus fields looks to be relevant as well.  But after studying the situation a little deeper, it appeared that the completion routine had never been fired.  We can find out more about the completion routine by dumping out the Tail.Apc portion of the IRP as shown below:

    0: kd> dt nt!_IRP fcc20788 Tail.Apc.
       +0x040 Tail      :
          +0x000 Apc       :
             +0x000 Type      : 0x12 ''
             +0x001 SpareByte0 : 0 ''
             +0x002 Size      : 0x30 '0'
             +0x003 SpareByte1 : 0 ''
             +0x004 SpareLong0 : 0
             +0x008 Thread    :
    0xfe77f1e0
    _KTHREAD
             +0x00c ApcListEntry : _LIST_ENTRY [ 0xfcde8ce4 - 0xfd089e6c ]
             +0x014
    KernelRoutine : 0xe101cb86           void  nt!IopCompletePageWrite+0
             +0x018 RundownRoutine : (null)
             +0x01c NormalRoutine : (null)
             +0x020 NormalContext : (null)
             +0x024 SystemArgument1 : (null)
             +0x028 SystemArgument2 : (null)
             +0x02c ApcStateIndex : 0 ''
             +0x02d ApcMode   : 0 ''
             +0x02e
    Inserted  : 0x1 ''

    Recall from the rules of IRP processing on Windows that IRPS like these have their completion routines called within the thread context that initiated the I/O.  As you can see above, that thread is the same thread that is running nt!MiMappedPageWriter.  Moreover, the Inserted flag is set above, which means that the APC has been placed in the APC queue for the thread.  Deductive reasoning would imply that if the completion routine has not run, then the APC has not been delivered.  And the APC will not be delivered if normal kernel mode APCs are disabled at the moment.  So, let’s check on that by looking in the nt!KTHREAD structure:

    0: kd> dt nt!_KTHREAD 0xfe77f1e0 KernelApcDisable
       +0x070 KernelApcDisable :
    -2

    Sure enough, kernel APCs are disabled for this thread at the moment.  How can that be?  Well, there are several ways to disable normal kernel APC delivery and it often involves either directly or indirectly entering a critical or guarded region.  Critical regions are entered directly via KeEnterCriticalRegion and guarded regions are entered via KeEnterGuardedRegion.  However, there are several means of indirectly entering critical regions including FsRtlEnterFileSystem.    Additionally, holding a mutex object automatically places the holder in a critical region.

    Therefore, the root cause in this case was that the file system drivers appear to have put the thread into a state where it cannot receive APCs and, therefore, I/O initiated on that thread could not be completed.  The APCs build up in the queue so that they can be delivered when kernel APC delivery is re-enabled.  Incidentally, the documentation for FsRtlEnterFileSystem states that file system filter drivers should never disable normal kernel APCs across calls to IoCallDriver.

    Conclusion

    Many times, when it looks like you are hitting up against a brick wall in determining what went wrong in a dump, you can get a lot further than you initially expect with a little bit of intuition and the right tools.  Of course, this intuition will grow as you become more and more familiar with the Windows operating system internals, or whatever platform you work on.  Using !search to search physical memory in the dump file can help find references (a.k.a pointers) to objects in hard to find places.  Additionally, pool tag information along with the helpful text in the pooltag.txt file displayed by !pool goes a long way when it comes to figuring out what a particular pool block is used for.  Armed with all of these tools, you can always get farther than one may initially expect.

    Happy debugging everyone!

    "The example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious.  No association with any real company, organization, product, domain name, email address, logo, person, places, or events is intended or should be inferred."

     

    Share this post :
Page 1 of 1 (4 items)