Welcome to MSDN Blogs Sign in | Join | Help

 

Hello, I am East with a quick review of what to do with your memory dumps before opening a support case. It's not uncommon for an entity to open a Microsoft support case with the intent of having a dump file examined by the escalation team. In some cases the memory dump is deemed corrupt which ultimately adds expense and additional hours to resolve the case. The purpose of this blog is to show you ways to potentially save money by doing a preliminary 'memory dump check' before opening the support case. This article is targeted at system administrators who don't necessarily spend a great chunk of time in the debugger, but may be tasked with fixing a crashing server.

  

DUMPCHK

 

Microsoft provides a free debugging tools package to review memory dumps. After installing the tools, use Dumpchk.exe to validate both kernel and user memory dump files. It's a simple command line tool that doesn't even need the debugger.

 

The syntax for the command is

 

 dumpchk [-y <sympath>] <dumpfile>

 

The -y parameter is used to specify a symbol path and is not always necessary for all dump files, however I recommend using it to show additional information. Simply point it to the Microsoft public symbol store at this path - http://msdl.microsoft.com/download/symbols (note: this link is not accessible through internet browsers).

 

Here's an example -

 

clip_image002

With a user dump you would typically see output like this

clip_image004

Look for the "Finished dump check" message which usually indicates success. It is still possible for other forms of corruption in the dumps however Dumpchk is a great way to rule out obvious problems.

clip_image006

Note that other errors may be listed, some of which are actually benign. For example, the following error message does not represent a problem:

 

error 3 InitTypeRead( nt!_PEB at 7ffd5000) 

 

As a general rule look for "Finished dump check".

 

 

Validating a Corrupt Dump In the Debugger

 

In addition to using Dumpchk, I recommend opening the memory dump in the debugger to look for some of the more common problems. Remember we're not actually debugging to determine the root cause of the crash. Instead we're looking for errors to validate the health of the memory dump so you don't waste your time sending us a bad memory dump file.  It's easy to open a memory dump in the debugger.

 

1. Install the latest debugging tools package from the web. Download the correct platform, either x86 or x64.

2. Install the package. I recommend using C:\Debuggers for the install directory.

3. After the installation is complete, open Windbg and set the symbol path by selecting "File", then "Symbol File Path"

clip_image008

4. After clicking OK to set your symbol path, open the memory dump by selecting "File", then "Open Crash Dump". After opening the dump file, look for any of the conditions below.

 

 

NOTE: the following techniques are only applicable to Kernel memory dumps.

 

 

If you see the message, "KdDebuggerDataBlock is not present or unreadable" the dump is most likely corrupt. Several things can cause the corruption including a page file too small to accommodate the contents of memory, inadequate disk space to hold the dump, or perhaps a problem with the OS recognizing Dynamic Hardware Partitioning. This is only a partial list of things to check.

 

 

**************************************************************************

THIS DUMP FILE IS PARTIALLY CORRUPT.

KdDebuggerDataBlock is not present or unreadable.

**************************************************************************

Unable to read PsLoadedModuleList

**************************************************************************

THIS DUMP FILE IS PARTIALLY CORRUPT.

KdDebuggerDataBlock is not present or unreadable.

**************************************************************************

 

I need to point out one caveat about the error above. If it is accompanied by I/O errors then chances are you're opening the dump located on a network share, and the debugger is having problems accessing the share.

 

Ignored in-page I/O error

Ignored in-page I/O error

Ignored in-page I/O error

Ignored in-page I/O error

Ignored in-page I/O error

Exception 0xc0000006 while accessing file mapping

Unable to read PsLoadedModuleList

Ignored in-page I/O error

Ignored in-page I/O error

Ignored in-page I/O error

Ignored in-page I/O error

Ignored in-page I/O error

Exception 0xc0000006 while accessing file mapping

 

Copy the dump to the machine where the debugger is installed and reopen it in Windbg. If you continue to see the KdDebuggerDataBlock is not present or unreadable error locally then the dump is corrupt.

 

If you see this message it simply means the symbol file path is incorrect. Check for typos in the 'Symbol Search Path' dialog box. Remember to point your symbol path to http://msdl.microsoft.com/download/symbols

 

 

*******************************************************************************

*                                                                             *

*                        Bugcheck Analysis                                    *

*                                                                             *

*******************************************************************************

 

Use !analyze -v to get detailed debugging information.

 

BugCheck 50, {fffffff0, 0, 8086d649, 0}

 

***** Kernel symbols are WRONG. Please fix symbols to do analysis.

 

*************************************************************************

***                                                                   ***

***                                                                   ***

***    Your debugger is not using the correct symbols                 ***

***                                                                   ***

***    In order for this command to work properly, your symbol path   ***

***    must point to .pdb files that have full type information.      ***

***                                                                   ***

***    Certain .pdb files (such as the public OS symbols) do not      ***

***    contain the required information.  Contact the group that      ***

***    provided you with these symbols if you need this command to    ***

***    work.                                                          ***

***                                                                   ***

***    Type referenced: nt!_KPRCB                                     ***

***                                                                   ***

*************************************************************************

 

 

You can ignore these messages-

 

“Page ????? not present in the dump file. Type ".hh dbgerr004" for details “

“Page ????? not present in the dump file. Type ".hh dbgerr004" for details “

“Page ????? not present in the dump file. Type ".hh dbgerr004" for details “

“Page ????? not present in the dump file. Type ".hh dbgerr004" for details “

 

 

This indicates a page of memory needed by the debugger is missing from the dump. This does not mean the dump is corrupt and can be ignored, for the most part.

 

 

Page File Issues - One of the most common causes of corruption is an improperly configured page file. I highly recommend reviewing “How to generate a kernel or a complete memory dump file in Windows Server 2008” to properly configure the page file. If you don't have immediate access to the machine that generated the dump, and want to check for page file problems in the debugger, you can use the !vm command to weed out obvious issues.

 

clip_image010

If you see something similar to the output below, it typically indicates the pagefile was not large enough, or the pagefile could not be found (notice the values given are so large as to not make sense).

 

 

kd> !vm

 

*** Virtual Memory Usage ***

Physical Memory:     3145107 (  12580428 Kb)

Paging File Name paged out

  Current: 6385980911134738644 Kb  Free Space: 870495645396584060 Kb

  Minimum: 2351575501587540372 Kb  Maximum:    3053207781125429928 Kb

 

 

It's a good idea to set the page file at least 1 MB larger than the size of physical memory. Additionally the page file must be present on the system drive. This is the type of output you would see if the page file size is wrong or no page file was specified.

 

kd> !vm

 

*** Virtual Memory Usage ***

Physical Memory:  1903435   ( 7613740 Kb)

00000000: Unable to get page file

00000000: Unable to get paged pool info 0

unable to get nt!MmAllocatedNonPagedPool

unable to get nt!MmExtendedCommit

unable to get nt!MmPageFileFullExtendPages

unable to get nt!MmPageFileFullExtendCount

unable to get nt!MmTotalFreeSystemPtes

unable to get nt!MmSystemLockPagesCount

unable to get nt!MiSpecialPagesNonPaged

unable to get nt!MiSpecialPagesNonPagedMaximum

00000004: Unable to get number of free nonpaged PTEs

unable to get nt!MmSpecialPagesInUse

Available Pages:  1301896   ( 5207584 Kb)

ResAvail Pages:   1815514   ( 7262056 Kb)

Modified Pages:       426   (    1704 Kb)

NonPagedPool Usage: 22058   (   88232 Kb)

NonPagedPool Max:   65536   (  262144 Kb)

8c5f7000: Unable to get pool descriptor

 

If the system has no page file you might see the message below when loading the dump

 

 

**************************************************************************

THIS DUMP FILE IS PARTIALLY CORRUPT.

KdDebuggerDataBlock is not present or unreadable.

**************************************************************************

Unable to read PsLoadedModuleList

......

......

Unable to get program counter

GetContextState failed, 0xD0000147

Unable to get current machine context, NTSTATUS 0xC0000147 <- this machine may not have a pagefile.

GetContextState failed, 0xD0000147

GetContextState failed, 0xD0000147

 

 

For additional information on memory dumps check the KB. Hopefully the information in this article will help you make better business decisions by evaluating whether it's too early to open a support call due to a corrupt dump. Thanks for the read!

 

Share this post :

 

Hello, Venkatesh blogging here again. Recently I had a collaboration request to determine which mystery component was writing to the application event log. This was the event-

 

Event Type:        Error

Event Source:        ODBC

Event Category:        None

Event ID:        0

Date:                5/12/2009

Time:                2:13:19 PM

User:                N/A

Computer:        TX163611

Description:

The description for Event ID ( 0 ) in Source ( ODBC ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: Failed to enlist in DTC: SQL state 42000, native error 8509, error message [Microsoft][ODBC SQL Server Driver][SQL Server]Import of Microsoft Distributed Transaction Coordinator (MS DTC) transaction failed: 0x8004d00e(XACT_E_NOTRANSACTION)..

The external partner's machine was flooded with this event in the event log. To reproduce the problem in-house, our Developer Support team wrote a small test app to repro the issue locally. Unfortunately the issue reproduced only when we single stepped through the test app in Visual Studio, but it would work in a pinch.  The test application was not writing this event but we suspected  one of the background services from the application talked to MSDTC.exe or SQLServer.exe, and it may be writing this event.

 We know that applications use Win32 API ReportEvent (part of Advapi32.dll) to write events into event logs.  Advapi32 makes a Local RPC call to the Event Log Service running in Services.exe process. It’s the Event Log service that actually writes events to the log files. We can verify this using procmon which should show services.exe writing into the event log files.

 So I setup the machine for the kernel debug. Since I didn’t know who was writing to the application log, I needed to set a break point at the server side.  I set a break point on EventLog!ElfReportEventW in the Services.exe process using "!bpid < PID>" (more info on !bpid can be found in debugger help). EventLog!ElfReportEventW is the server side of the RPC to Advapi32!ReportEventW call.

First I got the PID of services.exe process:

 

kd> !process 0 0 services.exe

PROCESS 8d9aa020  SessionId: 0  Cid: 01ac    Peb: 7ffde000  ParentCid: 017c

    DirBase: 3549e080  ObjectTable: e151e698  HandleCount: 355.

    Image: services.exe

Next I used !bpid <PID> to set the context of the breakpoint in services.exe process:

 

 

kd> !bpid 01ac   

Finding winlogon.exe (0)...

Waiting for winlogon.exe to break.  This can take a couple of minutes...

Break instruction exception - code 80000003 (first chance)

Stepping to g_BreakinProcessId check...

Break into process 1ac set.  The next break should be in the desired process.

Break instruction exception - code 80000003 (first chance)

ntdll!DbgBreakPoint:

001b:7c81a3e1 cc              int     3

 

Then I verified the current process was services.exe by using the -1 flag. Since we see Services.exe in the output below, we know the !bpid command broke into the correct process.

kd> !process -1 0

PROCESS 8d9aa020  SessionId: 0  Cid: 01ac    Peb: 7ffde000  ParentCid: 017c

    DirBase: 3549e080  ObjectTable: e151e698  HandleCount: 355.

    Image: services.exe

This is how I loaded the user mode symbols to set the break point in Eventlog.dll

 

kd> .process /p /r 8d9aa020 

Implicit process is now 8d9aa020

.cache forcedecodeuser done

Loading User Symbols

......................

Here I'm setting the eventlog!ElfrReportEventW break point:

 

kd> x eventlog!*Reportevent*

7575b18f eventlog!ElfrReportEventAndSourceW = <no type information>

75753324 eventlog!ElfIntReportEvent = <no type information>

7575372e eventlog!ElfrReportEventW = <no type information>

75753a07 eventlog!ElfrReportEventA = <no type information>

kd> bp 7575372e

kd>

With the debugger ready, I single stepped through the test app in Visual Studio to repro the problem. The break point EventLog!ElfReportEventW hit in the services.exe process. Using the LPC message I was able to determine the client of the RPC call which turned out to be the test app.

 

kd> g

Breakpoint 0 hit

eventlog!ElfrReportEventW:

001b:7575372e 8bff            mov     edi,edi

Here I looked at the current thread to see the call stack:

kd> !thread

THREAD 8e0413d0  Cid 01ac.0694  Teb: 7ffa6000 Win32Thread: 00000000 RUNNING on processor 0

Not impersonating

DeviceMap                 e1000128

Owning Process            8d9aa020       Image:         services.exe

Attached Process          N/A            Image:         N/A

Wait Start TickCount      75870          Ticks: 0

Context Switch Count      2754            

UserTime                  00:00:00.281

KernelTime                00:00:01.968

Win32 Start Address 0x0006e72d

LPC Server thread working on message Id 6e72d

Start Address kernel32!BaseThreadStartThunk (0x77e617ec)

Stack Init f5d87000 Current f5d86c24 Base f5d87000 Limit f5d84000 Call 0

Priority 10 BasePriority 9 PriorityDecrement 0

ChildEBP RetAddr  Args to Child             

0129f8ac 77c80193 01021fe8 4a09ca4f 00000001 eventlog!ElfrReportEventW (FPO: [14,0,0])

0129f8f8 77ce33e1 7575372e 0129fae0 0000000e RPCRT4!Invoke+0x30

0129fcf8 77ce35c4 00000000 00000000 000d7c2c RPCRT4!NdrStubCall2+0x299 (FPO: [SEH])

0129fd14 77c7ff7a 000d7c2c 000b6e78 000d7c2c RPCRT4!NdrServerCall2+0x19 (FPO: [1,1,0])

0129fd48 77c8042d 75751b51 000d7c2c 0129fdec RPCRT4!DispatchToStubInCNoAvrf+0x38 (FPO: [SEH])

0129fd9c 77c80353 0000000b 00000000 75760050 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x11f (FPO: [4,13,4])

0129fdc0 77c811dc 000d7c2c 00000000 75760050 RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3 (FPO: [4,0,4])

0129fdfc 77c812f0 000d79c0 000a1028 000ce2c0 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0x42c (FPO: [0,6,4])

0129fe20 77c88678 000a1060 0129fe38 000d79c0 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0x127 (FPO: [4,4,4])

0129ff84 77c88792 0129ffac 77c8872d 000a1028 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0x430 (FPO: [0,14,0])

0129ff8c 77c8872d 000a1028 00000000 00000000 RPCRT4!RecvLotsaCallsWrapper+0xd (FPO: [1,0,0])

0129ffac 77c7b110 0009f4c0 0129ffec 77e6482f RPCRT4!BaseCachedThreadRoutine+0x9d (FPO: [1,2,4])

0129ffb8 77e6482f 000ccb58 00000000 00000000 RPCRT4!ThreadStartRoutine+0x1b (FPO: [1,0,0])

0129ffec 00000000 77c7b0f5 000ccb58 00000000 kernel32!BaseThreadStart+0x34 (FPO: [SEH])

 

From the !thread output, we know the thread above is working on LPC message 6e72d. We can use !lpc command to get the client thread:

 

kd> !lpc message 6e72d

Searching message 6e72d in threads ...

    Server thread 8e0413d0 is working on message 6e72d                        

Client thread 8da33220 waiting a reply from 6e72d                         

Searching thread 8da33220 in port rundown queues ...

 

Server communication port 0xe2875380

    Handles: 1   References: 1

    The LpcDataInfoChainHead queue is empty

        Connected port: 0xe23cedd0      Server connection port: 0xe1ceb2b0

 

Client communication port 0xe23cedd0

    Handles: 1   References: 2

    The LpcDataInfoChainHead queue is empty

 

Server connection port e1ceb2b0  Name: ntsvcs

    Handles: 1   References: 129

    Server process  : 8d9aa020 (services.exe)

    Queue semaphore : 8db981a0

    Semaphore state 0 (0x0)

    The message queue is empty

 

    Messages in LpcDataInfoChainHead:

        0000 e22aa008 - Busy  Id=0006e72d  From: 0f60.0db4  Context=80680017  [e1ceb330 . e1ceb330]

                   Length=0044002c  Type=00380001 (LPC_REQUEST)

                   Data: 00000001 000b0242 8092d465 8dad78d0 8dad78e8 8e34a720

    The LpcDataInfoChainHead queue contains 1 messages

    Threads in RunDown queue :     0xe22a9e48

Done. 

From the output above, we know the client thread is 8da33220. No we can use the !thread command to get the process it belongs to:

 

kd> !thread 8da33220

THREAD 8da33220  Cid 0f60.0db4  Teb: 7ffdd000 Win32Thread: e2890c20 WAIT: (Unknown) UserMode Non-Alertable

    8da3340c  Semaphore Limit 0x1

Waiting for reply to LPC MessageId 0006e72d:

Current LPC port e23cedd0

Not impersonating

DeviceMap                 e2295b28

Owning Process            8d1f95d8       Image:         ODBCLoggingTest.exe

Attached Process          N/A            Image:         N/A

Wait Start TickCount      75870          Ticks: 0

Context Switch Count      2049                 LargeStack

UserTime                  00:00:00.343

KernelTime                00:00:00.203

Win32 Start Address 0x79007bf0

Start Address kernel32!BaseProcessStartThunk (0x77e617f8)

Stack Init f54ee000 Current f54edc20 Base f54ee000 Limit f54ea000 Call 0

Priority 10 BasePriority 8 PriorityDecrement 0

ChildEBP RetAddr  Args to Child             

f54edc38 8082ffd7 8da33220 8da332c8 000007c1 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])

f54edc50 808287d4 8da3340c 8da333e0 8da33220 nt!KiSwapThread+0x83 (FPO: [0,2,0])

f54edc94 80916b4c 8da3340c 00000011 00000001 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])

f54edd50 80883938 00000478 00209058 00209058 nt!NtRequestWaitReplyPort+0x776 (FPO: [SEH])

f54edd50 7c82860c 00000478 00209058 00209058 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ f54edd64)

0012b80c 00000000 00000000 00000000 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

 

Currently we are in services process context. We need to change to the LPC client thread process context to get the complete call stack. We do this by using the .thread command (instead of !thread):

 

kd> .process /p /r 8d1f95d8      

Implicit process is now 8d1f95d8

.cache forcedecodeuser done

Loading User Symbols

................................................................

............

 

kd> !thread 8da33220

THREAD 8da33220  Cid 0f60.0db4  Teb: 7ffdd000 Win32Thread: e2890c20 WAIT: (Unknown) UserMode Non-Alertable

    8da3340c  Semaphore Limit 0x1

Waiting for reply to LPC MessageId 0006e72d:

Current LPC port e23cedd0

Not impersonating

DeviceMap                 e2295b28

Owning Process            8d1f95d8       Image:         ODBCLoggingTest.exe

Attached Process          N/A            Image:         N/A

Wait Start TickCount      75870          Ticks: 0

Context Switch Count      2049                 LargeStack

UserTime                  00:00:00.343

KernelTime                00:00:00.203

Win32 Start Address mscoree!_CorExeMain (0x79007bf0)

Start Address KERNEL32!BaseProcessStartThunk (0x77e617f8)

Stack Init f54ee000 Current f54edc20 Base f54ee000 Limit f54ea000 Call 0

Priority 10 BasePriority 8 PriorityDecrement 0

*** WARNING: Unable to verify checksum for System.Data.ni.dll

*** ERROR: Module load completed but symbols could not be loaded for System.Data.ni.dll

ChildEBP RetAddr  Args to Child             

f54edc38 8082ffd7 8da33220 8da332c8 000007c1 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])

f54edc50 808287d4 8da3340c 8da333e0 8da33220 nt!KiSwapThread+0x83 (FPO: [0,2,0])

f54edc94 80916b4c 8da3340c 00000011 00000001 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])

f54edd50 80883938 00000478 00209058 00209058 nt!NtRequestWaitReplyPort+0x776 (FPO: [SEH])

f54edd50 7c82860c 00000478 00209058 00209058 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ f54edd64)

0012b7bc 7c827899 77c80a6e 00000478 00209058 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

0012b7c0 77c80a6e 00000478 00209058 00209058 ntdll!ZwRequestWaitReplyPort+0xc (FPO: [3,0,0])

0012b80c 77c7fcf0 0012b848 0012b82c 77c80673 RPCRT4!LRPC_CCALL::SendReceive+0x230 (FPO: [1,11,0])

0012b818 77c80673 0012b848 7d1f73e8 0012bc34 RPCRT4!I_RpcSendReceive+0x24 (FPO: [1,0,0])

0012b82c 77ce315a 0012b874 00223340 00222b90 RPCRT4!NdrSendReceive+0x2b (FPO: [2,0,0])

0012bc14 7d1fc005 7d1f73e8 7d1fba3c 0012bc34 RPCRT4!NdrClientCall2+0x22e (FPO: [SEH])

0012bc2c 7d1fbfc6 00222ad0 4a09ca4f 00000001 ADVAPI32!ElfrReportEventW+0x1c (FPO: [14,1,0])

0012bca4 7d1fbf08 00222ad0 00000001 00000000 ADVAPI32!ElfReportEventW+0x5a (FPO: [SEH])

0012bd0c 488c4e16 00222ad0 00000001 00000000 ADVAPI32!ReportEventW+0xcb (FPO: [SEH])

0012dd4c 488c2bf1 0012e1c8 0012e9c8 0000213d ODBC32!Log+0xa9 (FPO: [1,2051,0])

0012e184 488c32c2 00000000 0012e1c8 0000213d ODBC32!HashForTrace+0xb7 (FPO: [4,262,0])

0012e9d8 4a757523 03f71ac0 03f72f30 04310128 ODBC32!CDispenser::EnlistResource+0x22d (FPO: [3,524,4])

0012ea10 4a7580f4 0020cff4 03f72f30 04310128 comsvcs!CHolder::SafeDispenserDriver::EnlistResource+0x22 (FPO: [SEH])

0012ea48 4a75892a 03f72f30 00000000 00000001 comsvcs!CHolder::AddRes+0xdb (FPO: [6,6,4])

0012eaac 488c3551 04310128 03f71b48 0012eb10 comsvcs!CHolder::AllocResource+0x346 (FPO: [3,14,4])

0012eacc 488c36b4 03f71ac0 03f71b48 0012eb10 ODBC32!CDispenser::TryAllocResource+0x49 (FPO: [4,0,4])

0012eb04 488c45c0 03f71ac0 03f72f30 00000000 ODBC32!CDispenser::GetActiveConnection+0x31 (FPO: [2,5,4])

0012eb20 488a29d3 03f71ab8 00000000 0012f69c ODBC32!GetActiveConn+0x7e (FPO: [2,0,4])

0012f260 0090ba00 03f71b48 00000000 012a3620 ODBC32!SQLDriverConnectW+0x9df (FPO: [8,457,4])

WARNING: Frame IP not in any known module. Following frames may be wrong.

0012f2a4 655ca10c 0012f308 013049e8 012e025c 0x90ba00

0012f318 655ca612 012e2db4 013049e8 0130495c System_Data_ni+0x47a10c

0012f32c 655cde8a 012e2db4 012e025c 00000000 System_Data_ni+0x47a612

0012f348 655cdb78 012e025c 012c3df0 0130495c System_Data_ni+0x47de8a

0012f378 656c681d 012e05b8 00000000 012c3df0 System_Data_ni+0x47db78

0012f39c 65223831 0130495c 012e0160 00000000 System_Data_ni+0x57681d

0012f3cc 655c07e1 012c3df0 0012f410 0012f428 System_Data_ni+0xd3831

0012f480 79e71b4c 0012f4cc 00000000 0012f510 System_Data_ni+0x4707e1

0012f490 79e821b1 0012f560 00000000 0012f530 mscorwks!CallDescrWorker+0x33

0012f510 79e96501 0012f560 00000000 0012f530 mscorwks!CallDescrWorkerWithHandler+0xa3 (FPO: [SEH])

0012f648 79e96534 0091c010 0012f714 0012f6e0 mscorwks!MethodDesc::CallDescr+0x19c (FPO: [5,12,4])

0012f664 79e96552 0091c010 0012f714 0012f6e0 mscorwks!MethodDesc::CallTargetWorker+0x1f (FPO: [4,0,0])

0012f67c 79eefa45 0012f6e0 38a6f4e1 00000000 mscorwks!MethodDescCallSite::CallWithValueTypes+0x1a (FPO: [1,0,0])

0012f7e0 79eef965 00912ff0 00000001 0012f81c mscorwks!ClassLoader::RunMain+0x223 (FPO: [SEH])

0012fa48 79eefeb5 00000000 38a6fc19 00000001 mscorwks!Assembly::ExecuteMainMethod+0xa6 (FPO: [1,144,4])

0012ff18 79ef009f 00400000 00000000 38a6fc69 mscorwks!SystemDomain::ExecuteMainMethod+0x456 (FPO: [2,301,4])

 

From the test app call we can clearly see that ODBC32.dll is writing the event to the event log using Advapi32!ReportEventW. With this information I engaged our internal ODBC32.dll development team to continue the investigation.

If we had known the test app was writing to the event log, we could have avoided the kernel debugging and set a breakpoint on Advapi32!ReportEventW in the Visual Studio itself. Since I use windbg, I didn’t even know you could set break points in OS modules while debugging in Visual Studio.

For other debuggers in the same boat, this is how you set breakpoint in Visual Studio on OS APIs.

1.  First setup the OS symbol path by either using the  _NT_SYMBOL_PATH environment variable, or by using the UI (Tools->Options->Debugging->Symbols) window.

2.  From the  Debug->New Breakpoint menu, select "Break at Function..."

3.  In the "New Breakpoint" window, enter {,,advapi32.dll}ReportEventW in the Function: box and click OK.

 More information about the syntax can be found at the below link:

http://msdn.microsoft.com/en-us/library/wztycb7f(VS.80).aspx

In conclusion, if you don’t know who is writing a particular event to the event log, set a breakpoint on eventlog!ElfrReportEventW in the services.exe process, and use the !lpc command to find the module\thread (Call stack)\process writing the event in the event log.

 

Feel free to "dialog" about the post on our Twitter page - http://twitter.com/ntdebugging

 

Share this post :

Hi Debuggers,

 

Here is a short video (6 min.) of how to root cause CPU consumption issues on Vista/Server 2008 and higher operating systems with Xperf. Download the video from the 'Attachement link' below.

 

For this scenario, Xperf is an easy to implement and powerful replacement for previous actions like gathering a series of dumps during the CPU, kernrate data, perfmon, other profilers, etc.

 

Xperf allows one to quickly turn on profiling and see where the CPU time is being spent by using tracing already in the box!

 

-Tate

 

 

Note - Feel free to post questions here, or jump into a discussion about the video on our Twitter page ( @ntdebugging ).

 

 

Share this post :

We're now on Twitter @ http://twitter.com/ntdebugging. If you want to chime in with feedback or suggestions for future articles feel free to drop us a "tweet". I thought this was would give us a better venue to interact with you guys.

 Cheers,

Ron Stock

My name is Ryan Mangipano (ryanman) and I am a Sr. Support Escalation Engineer at Microsoft.  Today’s blog will consist of a complete walkthrough of my recent analysis of a stop 0x50 along with the steps that  led me to  identify that  this crash was caused by pool corruption. In this particular case, I found the answer before completing the actual debug.

 

To begin the analysis, I entered the !analyze –v command into the debugger and examined the output relating to the bugcheck.  This is always my first step since it usually provides  an overview of what type of problem occurred. In the output listed below, you can see that the system has bugchecked due to an attempt by kernel mode code to access the invalid address 0xffffff50. 

 

 PAGE_FAULT_IN_NONPAGED_AREA (50)

Invalid system memory was referenced.  This cannot be protected by try-except,

it must be protected by a Probe.  Typically the address is just plain bad or it

is pointing at freed memory.

Arguments:

Arg1: ffffff50, memory referenced.

Arg2: 00000000, value 0 = read operation, 1 = write operation.

Arg3: 80846dd1, If non-zero, the instruction address which referenced the bad memory

                address.

Arg4: 00000000, (reserved)

 

Before proceeding with our debugging, I would like to point out that invalid addresses such as the one listed above are often obtained when code subtracts values from a null pointer.  To illustrate that  this particular address 0xffffff50 could be obtained in such a manner, I have provided output from the .formats debugger command below.  Notice that the  hex vaule 0xffffff50  can be used to represent negative 176 as a decimal.  It is a relatively common situation to arrive at a negative value by subtracting values from a null pointer (which is zero).   Since addresses are actually unsigned, this is really just an invalid positive address. 

 

0: kd> .formats ffffff50

Evaluate expression:

  Hex:     ffffff50

  Decimal: -176

 

 

Let’s proceed with our investigation.  The output of !analyze –v also provided us with the following:

 

TRAP_FRAME:  f4009940 -- (.trap 0xfffffffff4009940)

 

 This provided a  trap frame address in the form of a DML (Debugger Markup Language)  link that can be clicked on to quickly enter the command into the debugger.

 

Passing .trap the address of a trap frame will show the register state when the trap was generated.  This will also set the context to the state when the trap was generated.  This information was saved into the trap frame by the trap handler that executed after  the trap occurred.

 

 The trap frame address listed above could have instead been obtained by dumping out the stack using the command  kv .  !analyze –v is very good at displaying  the correct trapframe, however I feel that it is a good practice to dump out the stack to verify that the trap frame you are entering is in fact the one that you are interested in. You can see in the output below that there are two trapframes listed. I am interested in the topmost one that matches the trap that led us to the bugcheck. I can see that analyze –v has listed the trapframe that we are interested in, so I entered it into the debugger.

 

0: kd> kv

ChildEBP RetAddr  Args to Child             

f40098d8 808692ab 00000050 ffffff50 00000000 nt!KeBugCheckEx+0x1b (FPO: [5,0,0])

f4009928 80836c2a 00000000 ffffff50 00000000 nt!MmAccessFault+0x813 (FPO: [4,10,4])

f4009928 80846dd1 00000000 ffffff50 00000000 nt!KiTrap0E+0xdc (FPO: [0,0] TrapFrame @ f4009940)

f40099b0 809426b2 a9131e08 00000012 861ce318 nt!ObReferenceObjectSafe+0x3 (FPO: [0,0,0])

f40099c8 808544d9 861ce318 0007d960 00000000 nt!PsGetNextProcess+0x6c (FPO: [1,1,0])

f4009a58 8094292f 01210048 0007d960 00000000 nt!ExpGetProcessInformation+0x36d (FPO: [SEH])

f4009d4c 80833bef 00000005 01210048 0007d960 nt!NtQuerySystemInformation+0x11e0 (FPO: [SEH])

f4009d4c 7c8285ec 00000005 01210048 0007d960 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ f4009d64)

 

0: kd> .trap 0xfffffffff4009940

 

The output from the .trap command will be similar to the ouput from the r command. The  r command will dump out the registers and the current instruction that caused the error. Listed below are the output of the two commands:

 

0: kd> .trap 0xfffffffff4009940

ErrCode = 00000000

eax=861ce300 ebx=808b5be8 ecx=ffffff68 edx=ffffff50 esi=8747c8d0 edi=00000000

eip=80846dd1 esp=f40099b4 ebp=f40099c8 iopl=0         nv up ei pl nz ac pe cy

cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010217

nt!ObReferenceObjectSafe+0x3:

80846dd1 8b0a            mov     ecx,dword ptr [edx]  ds:0023:ffffff50=????????

 

0: kd> r

Last set context:

eax=861ce300 ebx=808b5be8 ecx=ffffff68 edx=ffffff50 esi=8747c8d0 edi=00000000

eip=80846dd1 esp=f40099b4 ebp=f40099c8 iopl=0         nv up ei pl nz ac pe cy

cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010217

nt!ObReferenceObjectSafe+0x3:

80846dd1 8b0a            mov     ecx,dword ptr [edx]  ds:0023:ffffff50=????????  ß This is the current instruction. We trapped on this instruction trying to dereference edx which contained ffffff50.

 

Next, we shall  dump out the stack and try to get a feel for what happened based on the function names listed in the stack. Remember that the functions at the bottom of the stack called the functions above them.  Here are the function names listed in the  top four stack frames along with my annotation. In this case I read the stack starting at the bottom and working my way up instead of the traditional way of starting at the bad value and backtracing to the problem.

 

0: kd> kcL4

·         nt!ObReferenceObjectSafe ß Finally, this function encountered the bad value and crashed. To observe this for yourself, please examine  the output of the 'r' command listed above.  You will notice that the current instruction at the time of the trap (the address in EIP) was unassembled. Above this instruction the function name and offset are listed  as nt!ObReferenceObjectSafe+0x3.  This means that the instruction we were executing when the trap occurred was located at offset 0x3 from the location that the symbol ObReferenceObjectSafe references.

·         nt!PsGetNextProcess ß Based on the function name here, it appears that we were trying to traverse the linked list of active processes on the system.

·         nt!ExpGetProcessInformation ß NtQuerySystemInformation() called this function. Based on the function name GetProcessInformation and the call listed above,  this was obviously an attempt to get information about the processes running on the system

·         nt!NtQuerySystemInformation ß We were trying to query system information

 

                Normally, my next step would typically be to unassemble the current function with  uf @$ip , locate the current instruction in the output by pasting the address in eip into the find diaglog box, and proceed to backtrace the source of the bad value until I had located the source. However, I have a bad habit of poking around for fun by dumping out various data while I am debugging. The  function names on the stack seemed to indicate that we were working with the list of processes. The address we died on suggested that we may have encountered a null pointer. I couldn't resist pausing a moment to dump out the list of processes on the system to see if we had encountered a null pointer while traversing the list of processes on the system. After all, I've reviewed dumps before that were that simple. Even if I found a null pointer in the list, I would still have to perform the actual debug and walk the assembly to prove it and identify how we crashed.  However, I felt it would be more exciting to work in the reverse of my normal  methods. Besides, it only takes a moment once you know what you are doing.

 

Let me pause a moment to provide a little background information here which will later prove relevant in understanding my method of thinking and comprehending what went wrong. In the nt!_EPROCESS  object there is a _LIST_ENTRY structure  located at offset 0x98 called ActiveProcessLinks. This forms a linked list of all the active NT!_EPROCESS structures on the system.  Let us first examine the _ LIST_ENTRY structure by reviewing the output below:

 

1: kd> dt nt!_LIST_ENTRY

   +0x000 Flink            : Ptr32 _LIST_ENTRY

   +0x004 Blink            : Ptr32 _LIST_ENTRY

 

As you can see above, a _LIST_ENTRY contains two pointers that create what is called a linked list. Blink represents a backwards link and Flink represents a forward link.  A double linked list may be traversed in either direction by following the pointer to the next or previous entry in the list. Linked lists are used heavily in operating system programming.  This particular  linked list contains a pointer to the previous _EPROCESS object’s ActiveProcessLinks field (blink) and the next _EPROCESS object’s ActiveProcessLinks field (flink).. The following command will display the activeprocesslinks field of the nt!_EPROCESS structure.

0: kd> dt nt!_EPROCESS activeprocesslinks

   +0x098 ActiveProcessLinks : _LIST_ENTRY  ß the +0x098 lets us know that this field is located 0x98 from the start of each  nt!_EPROCESS structure.  All of these EPROCESS structures on the system are linked together using the flink/blink method. This allows us to enumerate all of the processes on the system by following the links in either direction.

 

The first entry of a linked list is commonly referred to as the list head. The  list head is a _LIST_ENTRY structure that points to the first and last entry of the link list. Global variables are often used to store the location of the list head. Once you have the location of the head, you can then use it's flink/blink values to traverse the list in either direction.   Nt!PsActiveProcessHead is a global variable that points to the beginning of the list of Active Processes that we discussed in the previous paragraph. If you traverse this list starting at the head, you will end up in the activeprocesslinks field of each process on the system. This will allow code to find each _EPROCESS structure on the system by simply traversing this list. If you are an true geek like me and are  learning this for the first time, you are probably going to  fire up your debugger and dump out the list of active processes on your own system before you reach the end of this article.

Please note that by following the flink/blink you will arrive at offset 0x098 from this structure, not the start of the structure. So the complete list consists of a _LIST_ENTRY field present in each EPROCESS structure + the _LIST_ENTRY located at the memory referenced by the global variable nt!PsActiveProcessHead. You can use the 'x' examine symbols command to display this variable. I included the wildcards below only to demonstrate the ability to use wildcards with these commands.

 

0: kd> x nt!*PsActiveProcessHead*

808b5be8 nt!PsActiveProcessHead = <no type information>

0: kd> dt nt!*PsActiveProcessHead*

808b5be8  ntkrnlmp!PsActiveProcessHead

 

Back to our investigation.  At the point where we left off, we had not yet dug into the code to identify exactly what happened. We instead took an educated guess and decided to dump out the list of processes on the system. If it is found that we have a null value, this may very well allow us to work in reverse of our normal debugging.   Armed with the above background  information, we shall now dump out the list. If we get lucky, this will expedite the review of this memory dump by telling us what may have gone wrong before we even start digging into the code.

 

The beginning of the list sounds like a good place to start traversing  the entries. The value highlighted in the global variable above marks the address where the linked list starts. The following command will dump out this location and provide you with the flink and blink. The switch /c1 was used in order to limit the output to 1 column for easy blog annotation. For more information on the /c1 switch, type .hh dd into the debugger command interpreter and then press enter once the help file window appears.

 

0: kd> dd /c1 nt!PsActiveProcessHead L2

808b5be8  8a78c800  ß this is the forward link (flink)

808b5bec  85ddd510  ß this is the backwards link (blink)

 

                Now that we have the address that the list starts at, we need a way to dump it out. Windbg provides many ways  to automatically walk a linked list. You can use dt, !list, or the dl commands . In this case, let's use the  command 'dl' due to its simplicity . More information about this command may be obtained from the help file that is included with windbg.

 

0: kd> dl 0x8a78c800 ff 2   ß Here is an example of how to use this command to traverse the linked list starting at address 0x8a78c800. The value ff represents the maximum number of links to display. The value 0x8a78c800 was obtained from the flink field _LIST_ENTRY present at the global variable listed above. The value 2 represents the number of values to dump from each address. Since this linked list contains a forward link (flink) and backwards link (blink), we should pass in a value of 2 to dump both addresses.

 

As displayed below, I decided to use the global nt!PsActiveProcessHead instead of specifying an address to the command dl.  Here is the beginning of the output as displayed by the debugger.

 

0: kd> dl nt!PsActiveProcessHead ff 2

808b5be8  8a78c800 85ddd510 

8a78c800  89af0370 808b5be8  

888745a0  888492e0 89af0370

888492e0  89b17708 888745a0

89b17708  89b4e3d0 888492e0

... … Ommitting the rest of the output

 

                The plain output displayed above represents the type of output that you will see when entering the dl command into the debugger. In order to better illustrate the patterns and show the relationship between the flink and blink values,  I have bolded, underlined, highlighted, and commented various portions of the output and listed it below. Note the annotations that demonstrate how each member of the linked list points to the next and previous item. You can follow the list all the way down until you see we do in fact end up at a zeroed out address.

 

0: kd> dl nt!PsActiveProcessHead ff 2

808b5be8  8a78c800 85ddd510  <-- Notice that the dl command first displays  address  808b5be8 which is the address referenced by the symbol information for the global variable nt!PsActiveProcessHead that we provided. The next two values are the flink and the blink. I have highlighted the flink above. Notice how this flink is pointing to the next memory location listed below.

 

8a78c800  89af0370 808b5be8   <--  In this line of output, notice that the dl command has followed the flink to address 8a78c800  (which I have again highlighted)  and  is dumping out the new  flink and blink pair.  I have underlined the blink. Notice that the blink  points back to the address of the previous flink/blink pair.  

 

89af0370  888745a0 8a78c800  Notice the  patterns displayed, follow the output  all the way to the bottom, and see if you notice anything that may be a problem.

 

888745a0  888492e0 89af0370

888492e0  89b17708 888745a0

89b17708  89b4e3d0 888492e0

89b4e3d0  8883fe20 89b17708

8883fe20  89ab73e8 89b4e3d0

89ab73e8  887d7650 8883fe20

887d7650  888906a8 89ab73e8

888906a8  89483408 887d7650

89483408  88839d40 888906a8

… Ommitting output to reduce length. The links were all valid…

 

85ff6e20  8708be20 8663a278

8708be20  87176940 85ff6e20

87176940  861ce3b0 8708be20

861ce3b0  860010b8 87176940

860010b8  00000000 00000000   <-- Wait...This doesn't look correct. The blink should be 861ce3b0, not zero. The previous list entry's flink took us here to address 860010b8. It appears that our theory might just be correct. Observing this,  I suspected that memory location  860010b8 has been overwritten with zeros. It is also possible, however, that the value in the flink above was incorrect causing us to end up at some random address that contained zeros. To know for sure I dumped the list backwards to see if the next flink/blink pair had a blink pointing to 860010b8.

 

0: kd> dlb nt!PsActiveProcessHead ff 2ß This command dumps the list backwards. Below you can see the dlb command walking the linked list by starting at the list head and moving backwards (the list should make a circle). Don't be confused by the different patterns below. This is simply due to the fact that the list is being walked backward. The output displays the three value in the same order  address   flink  blink.

 

0: kd> dlb nt!PsActiveProcessHead ff 2

808b5be8  8a78c800 85ddd510 

85ddd510  808b5be8 85e58e20

85e58e20  85ddd510 8646ce20

8646ce20  85e58e20 85d92c08

85d92c08  8646ce20 8616ce20

… Ommitting output to reduce length. The links were all valid…

872ba380  86e456a0 86c8da10

86c8da10  872ba380 8692f708

8692f708  86c8da10 86d630b8

86d630b8  8692f708 8673eae0

8673eae0  86d630b8 87223a98

87223a98  8673eae0 860010b8  

860010b8  00000000 00000000    ß We arrived at the same address of 860010b8. So when walking the list both forward and backward we encounter pointers that take us to this address. This means that there should have been a valid flink/blink pair (which would be a valid nt!_EPROCESS activeprocesslinks _LIST_ENTRY ) at this address.  This memory appears to have instead been zeroed out. It’s flink pointer should be 87223a98 

 

When dumping the list out forward using dl, the line of output representing 860010b8 should have contained these values below, however it did not.

860010b8  87223a98  861ce3b0 

 

So what happed? In dumping out the area around the address, you can see it has been zeroed

 

0: kd> dd 860010b8-200 860010b8+200

86000eb8  00000000 00000000 00000000 00000000

86000ec8  00000000 00000000 00000000 00000000

86000ed8  00000000 00000000 00000000 00000000

86000ee8  00000000 00000000 00000000 00000000

… Ommitting output to reduce length. Entire range contained all zeros…

86001088  00000000 00000000 00000000 00000000

86001098  00000000 00000000 00000000 00000000

860010a8  00000000 00000000 00000000 00000000

860010b8  00000000 00000000 00000000 00000000

860010c8  00000000 00000000 00000000 00000000

860010d8  00000000 00000000 00000000 00000000

860010e8  00000000 00000000 00000000 00000000

… Ommitting output to reduce length. Entire range contained all zeros…

86001288  00000000 00000000 00000000 00000000

86001298  00000000 00000000 00000000 00000000

860012a8  00000000 00000000 00000000 00000000

860012b8  00000000

 

Out of curiosity, I ran the !address command to see what range this address falls into.

0: kd> !address 860010b8

  82b7e000 - 07c82000                          

          Usage       KernelSpaceUsageNonPagedPool

 

Next, we dump the pool information out. Note the corruption indication listed below.

 

0: kd> !pool 860010b8

Pool page 860010b8 region is Nonpaged pool

86001000 is not a valid large pool allocation, checking large session pool...

86001000 is freed (or corrupt) pool

Bad allocation size @86001000, zero is invalid

***

*** An error (or corruption) in the pool was detected;

*** Attempting to diagnose the problem.

***

*** Use !poolval 86001000 for more details.

***

 

0: kd> !poolval 86001000

Pool page 86001000 region is Nonpaged pool

Validating Pool headers for pool page: 86001000

Pool page [ 86001000 ] is __inVALID.

Analyzing linked list...

[ 86001000 --> 86001310 (size = 0x310 bytes)]: Corrupt region

Scanning for single bit errors...

None found

 

So now that we have opened the dump, set the trap frame, dumped the stack, then found  the answer. Now, we need to do the actual debug of the assembly code to both prove our theory and determine how this caused us to die. Let's revisit the activeprocesslinks field that we discussed before.

 

0: kd> dt nt!_EPROCESS activeprocesslinks

   +0x098 ActiveProcessLinks : _LIST_ENTRY  ß the linked list we were following is located 0x98 from the start of the nt!_EPROCESS.  The previous _LIST_ENTRY provided us with the address of this field by in the next process object. To get the start of the actual process object, we need to subtract 0x98 from that value. For example, if you were passed in a null address instead of a valid pointer and subtracted 0x98 you would get:

 

0: kd> ? 0x0-0x098 

Evaluate expression: -152 = ffffff68  ß If we start with a null value and backup 0x98 we get ffffff68 

 

As you can see below, that very value happens to be present in ECX. ECX should instead be the address of our process object. However,since our liked list pointer was null, we instead were following an invalid address. It’s all downhill from here:

 

0: kd> recx  ß What’s in ECX?

Last set context:

ecx=ffffff68   ß it’s the offset from the null pointer

 

Let’s keep digging to verify this by looking back in the assembly and walking through what happened in the final moments of the crash. To do this, I unassembled backwards using the ub command. Examination of the output of this command (listed below but limited to only one command for blog simplicity) will show that the code took the invalid address ffffff68 that should have been a pointer to the process object and subtracted 0x18. This gave the us the invalid address ffffff50 that we died on. The . represents the current instruction pointer and the L1 tells us to only display 1 unassembled assembly language instruction.

 

0: kd> ub . L1

nt!ObReferenceObjectSafe

80846dce 8d51e8          lea     edx,[ecx-18h]   ß we took the process object pointer in ecx and subtracted 0x18 from it and put it into edx to dereference

 

0: kd> redx

edx=ffffff50

 

0: kd> ? ffffff68-0x18

Evaluate expression: -176 = ffffff50

 

Next, we tried to dereference this invalid pointer. Note that @$ip is another way (just like .)to point to the instruction pointer.

 

0: kd> u @$ip L1

nt!ObReferenceObjectSafe+0x3:

80846dd1 8b0a            mov     ecx,dword ptr [edx]  ß edx contains the invalid value

 

You can also use the r command to dump out all of the registers and display this failed instruction as shown below.

 

0: kd> r

Last set context:

eax=861ce300 ebx=808b5be8 ecx=ffffff68 edx=ffffff50 esi=8747c8d0 edi=00000000

eip=80846dd1 esp=f40099b4 ebp=f40099c8 iopl=0         nv up ei pl nz ac pe cy

cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010217

nt!ObReferenceObjectSafe+0x3:

80846dd1 8b0a            mov     ecx,dword ptr [edx]  ds:0023:ffffff50=????????  ß we trapped trying to dereference the bad address that edx was directing us to

 

So, why would we be backing up from this address? Each process object has an Object Header that precedes it. The following command ?? uses the current Expression Evaluator (C++ is the default) to run  the sizeof() function against  the nt!_OBJECT_HEADER symbol information and output the size of the an object header.

 

0: kd> ?? sizeof (nt!_OBJECT_HEADER)

unsigned int 0x20

 

You can dump out this header to see where the start of the process object is with the following command

 

0: kd> dt nt!_OBJECT_HEADER

   +0x000 PointerCount     : Int4B  ß the code was accessing the offset 0 of the Object Header which is the PointerCount field

   +0x004 HandleCount      : Int4B

   +0x004 NextToFree       : Ptr32 Void

   +0x008 Type             : Ptr32 _OBJECT_TYPE

   +0x00c NameInfoOffset   : UChar

   +0x00d HandleInfoOffset : UChar

   +0x00e QuotaInfoOffset  : UChar

   +0x00f Flags            : UChar

   +0x010 ObjectCreateInfo : Ptr32 _OBJECT_CREATE_INFORMATION

   +0x010 QuotaBlockCharged : Ptr32 Void

   +0x014 SecurityDescriptor : Ptr32 Void

   +0x018 Body             : _QUAD  ß The process object is located here at offset 0x18. This means that backing up 0x18 would put you at offset 0x0 of the object header which is the Pointer Count field.

 

So let’s see if we can locate any stack data to validate this. Dumping the stack again, I observe the address of the process object.

 

0: kd> kvL2

ChildEBP RetAddr  Args to Child             

f40099b0 809426b2 a9131e08 00000012 861ce318 nt!ObReferenceObjectSafe+0x3 (FPO: [0,0,0])

f40099c8 808544d9 861ce318 0007d960 00000000 nt!PsGetNextProcess+0x6c (FPO: [1,1,0])

 

Let’s dump out this process field. Wow! Look what we have here. The address of the zeroed list entry from above.

0: kd> dt nt!_EPROCESS 0x861ce318 ActiveProcessLinks

   +0x098 ActiveProcessLinks : _LIST_ENTRY [ 0x860010b8 - 0x87176940 ]

 

0: kd> dd 0x860010b8 L2

860010b8  00000000 00000000

 

So to summarize the problem, we crashed because a memory address that should have contained valid flink/blink pointers to a process object contained the invalid value 00000000.  We expected this value to be pointing to a valid nt!_EPROCESS  ActiveProcessLinks field instead of containing a null pointer (zero). After obtaining this null pointer, we then subtracted 0x098 which normally would have brought us to the start of the process object. Instead we ended up with invalid pointer 0xffffff68. Finally, we subtracted another 0x18 which should have brought us to the PointerCount  field of the object header. This gave us address 0xffffff50 instead of a valid object header address. We then dereferenced this invalid address which led us to the bugcheck.

 

This crash was due to pool corruption, as shown in the output of the !pool command above.  This blog has been a walkthrough of debugging a memory dump and identifying how pool corruption led up to a crash. Identification of the reason for the corruption is a separate matter. These types of issues are often difficult to troubleshoot due to the fact that any driver in kernel mode could have corrupted this memory region. To further complicate matters, the actual corruption of this memory may have taken place millions of CPU cycles before the time this code stumbled across the corrupted region leading up to the crash that caused this memory dump to be initiated. A memory dump is simply a snapshot of information relating to the state of a system during one particular moment in time.  

 

                In this case, in order to identify the source of the pool corruption we need to use Special Pool.  Special Pool will use guard pages to catch a buffer overrun or underrun and should provide us with a dump that shows the code that causes the corruption.  You can find more information on Special Pool in KB188831.  Also, in some situations memory corruption can be caused by a driver overflowing on a DMA transfer causing corruption to physical pages rather than virtual pages as we see in typical pool corruption.

               

                Usually, a debug such as this one would have started with analysis of the function at the top of the stack and worked  down the stack reading the assembly to definitively identify the source of the bad pointer. However in this case, we were lucky enough to find the answer right away and then validate it with conventional debugging methods by reversing in reverse.

Share this post :

Hello, my name is Venkatesh Ganga. I’m a Senior Escalation Engineer on the Microsoft Platform Global Escalation Services team. In this blog I would like to talk about an interesting WMI issue I worked a few months ago. The idea behind this blog is to discuss how WMI works under the hood. In this article I show a few call stacks but the focus of the article is really to point out WMI concepts.  

 

The issue was interesting. The customer encountered the error, "Provider cannot perform the requested operation" for all WMI queries to any provider on some of his Windows XP machines. Also the WMI Control (wmimgmt.msc) properties showed the following:-

 

Figure 1:
image 

 

Moreover, nothing happened when he clicked on other tabs like Logging, Backup/Restore, etc.  Running WMIDIAG showed the message - “24 error(s) 0x80041024 - (WBEM_E_PROVIDER_NOT_CAPABLE) Provider cannot perform the requested operation“.

All the standard WMI troubleshooting was done before I was engaged, like rebuilding the WMI repository, checking DCOM permissions, etc. But nothing easily resolved the issue so I had the customer send a machine image to Microsoft for a local repro of the problem. Initially I didn’t know where to start with the issue because there were multiple symptoms. I decided to trace the execution of the inbox provider CIMWin32.dll by issuing the WMI query “Select * from Win32_Processor”, next find where it fails, and then go from there.

WMI maintains each provider’s physical implementation information such as the CLSID of the provider’s COM object, the hosting model, etc. in an instance of the __Win32Provider system class. Each provider can support one or more types mentioned below. Based on the type(s) the provider supports, WMI will create an instance of each supported type.

·         __ClassProviderRegistration

·         __EventConsumerProviderRegistration

·         __EventProvideRegistration

·         __InstanceProviderRegistration

·         __MethodProviderRegistration

·         __ObjectProviderRegistration

·         __PropertyProviderRegisration

For example, looking at the Windows inbox .mof file for the Cimwin32 provider, we can quickly determine that the provider registers for two types - the Instance Provider, and the Method Provider as Figure 2 displays. The inbox mof files are usually stored in c:\Windows\System32\wbem

 

Figure 2:

image

 

In Memory at Runtime 

Since our test query “select * from Win32_Processor” retrieves the instances of Win32_Processor class, WMI checks the CIMWin32’s __InstanceProviderRegistration object to see if the SupportsEnumeration property is set. While debugging the customer’s image, I found the runtime object WMI maintains for CIMWin32 provider, set to “False” for the ”SupportsEnumeration” property.

 WMI maintains an in-memory structure for each provider and fills that structure out with information from the repository.  The CreateInstanceEnumAsync checks the provider’s in-memory structure to determine whether it supports enumeration.  From my debugging on the customer’s system, I determined the provider did not support enumeration.

image

Here's a snippet from my debugging-

0:003> kL

ChildEBP RetAddr

0091f560 77e7a1ac  wmiprvse!CInterceptor_IWbemSyncProvider::CreateInstanceEnumAsync+0x21

0091f588 77ef421a  rpcrt4!Invoke+0x30

0091f998 77ef4bf3  rpcrt4!NdrStubCall2+0x297

0091f9f0 756bd7fe  rpcrt4!CStdStubBuffer_Invoke+0xc6

0091fa04 77600c31  fastprox!CBaseStublet::Invoke+0x22

0091fa44 77600bdb  ole32!SyncStubInvoke+0x33

0091fa8c 7750f237  ole32!StubInvoke+0xa7

0091fb64 7750f15c  ole32!CCtxComChnl::ContextInvoke+0xe3

0091fb80 77600b11  ole32!MTAInvoke+0x1a

0091fbb0 776009bc  ole32!AppInvoke+0x9c

0091fc84 77600715  ole32!ComInvokeWithLockAndIPID+0x2e0

0091fcd0 77e79c75  ole32!ThreadInvoke+0x1cd

0091fd04 77e79bda  rpcrt4!DispatchToStubInC+0x38

0091fd58 77e79b06  rpcrt4!RPC_INTERFACE::DispatchToStubWorker+0x113

0091fd7c 77e89f9c  rpcrt4!RPC_INTERFACE::DispatchToStub+0x84

0091fdbc 77e89fdd  rpcrt4!RPC_INTERFACE::DispatchToStubWithObject+0xc0

0091fdfc 77e7be65  rpcrt4!LRPC_SCALL::DealWithRequestMessage+0x2cd

0091fe20 77e76794  rpcrt4!LRPC_ADDRESS::DealWithLRPCRequest+0x16d

0091ff80 00000000  rpcrt4!LRPC_ADDRESS::ReceiveLotsaCalls+0x28f

Repository 

In contrast, when we checked the customer’s system, we determined  the repository showed  the provider’s “SupportEnumeration” property was set to true.  Verification steps below:

1.       Run wbemtest and connect to root\cimv2

2.       Enter the following query-

                                select * from __InstanceProviderRegistration where provider="__Win32Provider.Name=\"CIMWin32\""

3.       After running the query, you should see one object in the Query Result window. Double click it and look at the properties. On a working system you will see the SupportEnumeration property is set to “True” for the CimWin32 provider, as Figure 3 shows.

 

Figure 3:

image

 

Discrepancy Between the In-Memory and Repository

Now we know WMI is not initializing the provider correctly because there is a discrepancy between the in-memory objects it maintains, and the repository.

So I started debugging the CIMWin32 provider initialization where WMI fills the in-memory structure. During provider initialization, WMI makes a REFERENCES OF query to get all  types CIMWin32 supports. On our customer’s machine, this query returned zero objects, whereas on a working system it returned two objects (one for __InstanceProviderRegistration and other for __MethodProviderRegistration). Here's a screenshot of this action. 

"references of {__Win32Provider.Name="CimWin32"}"

image

Here is a stack of how we found the exact “REFERENCES OF“ query being issued.

0:007> kL

ChildEBP RetAddr 

0164fa38 59841a6c wbemcore!CWbemNamespace::ExecQuery

0164fa88 59841c41 wmiprvsd!CServerObject_ProviderRegistrationV1::QueryRepositoryUsingQuery+0x47

0164faac 59841d31 wmiprvsd!CServerObject_ProviderRegistrationV1::QueryRepository+0x42

0164fac8 5980f056 wmiprvsd!CServerObject_ProviderRegistrationV1::Load+0x33

0164fba8 762e9cfd wmiprvsd!CServerObject_BindingFactory::GetProvider+0x1f5

0164fc24 76301d1e wbemcore!CWbemNamespace::DynAux_ExecQueryExtendedAsync+0x81

0164fce0 76302912 wbemcore!CQueryEngine::ExecComplexQuery+0x1f4

0164fdec 763032bd wbemcore!CQueryEngine::ExecQlQuery+0x37

0164fe90 762fc769 wbemcore!CQueryEngine::ExecQuery+0x228

0164feac 762cef24 wbemcore!CAsyncReq_ExecQueryAsync::Execute+0x19

0164fed8 762ced4e wbemcore!CCoreQueue::pExecute+0x3c

0164ff08 762f25cb wbemcore!CCoreQueue::Execute+0x18

0164ff50 762cee89 wbemcore!CWbemQueue::Execute+0xf6

0164ff84 762cf055 wbemcore!CCoreQueue::ThreadMain+0x111

0164ffb4 7c80b683 wbemcore!CCoreQueue::_ThreadEntry+0x45

0164ffec 00000000 kernel32!BaseThreadStart+0x37

0:044> du 04f57424

04f57424  "references of {__Win32Provider.N"

04f57464  "ame="CIMWin32"}"

The repository driver, repdrvfs.dll, works on the query above. The references query failed with error 80041017 (WBEM_E_INVALID_QUERY) due to an apostrophe in the computer name. While parsing the query, the repository driver considers anything between the apostrophes, or quotes, as a name. This apostrophe in the computer name misleads WMI to look for a closing apostrophe, which is not there, subsequently the references query fails with error.


0:005> kL
ChildEBP RetAddr

05cbfec8 75214a74 repdrvfs!CNamespaceHandle::ExecReferencesQuery+0x1dd

05cbff88 75215518 repdrvfs!CNamespaceHandle::ExecQuerySink+0xc2

05cbffac 7520e11c repdrvfs!CExecQueryObject::Execute+0x28

05cbffb4 7c80b683 repdrvfs!A51FiberBase+0xd

05cbffec 7c82ffa9 kernel32!BaseThreadStart+0x37

05cbffec 00000000 kernel32!BaseFiberStart+0x17

 

0:005> du 00b56218
00b56218 "references of {\\COMPUTERNAME'User1"
00b56258 "\ROOT\cimv2:__Win32Provider.Name"
00b56298 "="CIMWin32"}"


0:005>r

image

WMI makes a similar REFERENCES OF query for all the providers to get the types they support. But they all fail for the same reason. Hence, WMI queries to all the providers fail. We renamed the computer name without an apostrophe and rebooted the box. After the rename, WMI initialized properly, and the WMI queries started working again. An apostrophe is not a valid character for the computer name. Unfortunately, we never found out how the computer name got set with an apostrophe. If we try to set the computer name through GUI, it throws an error saying the computer name has invalid characters.  Hopefully through this experience, you have a better understanding of how WMI implements providers to help you better troubleshoot your next WMI issue.

Here are the main takeaways from this article-

1.       WMI maintains an instance of __Win32Provider system class for each provider.  Based on the provider’s registration (see .mof file above) it maintains one or more registration system classes such as __InstanceProviderRegistration.

2.       WMI uses the REFERENCES OF query to map the instance of __Win32Provider to the providers registration types (e.g. __InstanceProviderRegistration).

3.       To view the various providers registered on your system, connect to root\Cimv2 and issue the following query: “Select * from __Win32Provider”.

4.       And to get the types that each provider supports, issue the following query:references of {__Win32Provider.Name="<PROVIDER_NAME>"}

 

Share this post :

Hi, my name is Todd Webb, I am an Escalation Engineer from the Global Escalation Services OEM Team.

 

I recently worked a case where we were trying to troubleshoot an issue and the system would hang with USB debugging enabled. This was a problem since USB debugging was the only kernel debugging option available on this particular hardware setup. I needed a quick and easy way to find the source of the problem without using many of the standard methods.  Basically I had to debug a debugger issue with no access to a debugger.  I knew the problem was likely USB related, but had no other information.  The method I used to troubleshoot this issue was to write some macros that would make a call to HalMakeBeep to generate audible indications of where we were in the code.  The primary reason for using this method was I could quickly and easily move the location in the code where I made the calls as I narrowed down where the problem occurred. 

 

The alternative would have been to add a bunch of logging and find some way to force a dump.  My initial assessment was this would be more time consuming and less flexible then the beep code method.  Others may have other methods to do this, but this just gives people another method of debugging driver start-up problems on systems that are not debuggable.

 

The main beep macro allows you to pass in a frequency and duration in milliseconds for a single beep.  Then based on the example below you can extend the single beep macro to create a series of beeps.

 

Here is an example of the macros I used:

 

//

// beep macros for troubleshooting without debugger

//

 

#define EHCI_BEEP1(f, d)                                              \

    {                                                                 \

        LARGE_INTEGER duration;                                       \

        duration.QuadPart = Int32x32To64(d, -10000);                  \

        HalMakeBeep(f);                                               \

        KeDelayExecutionThread(KernelMode, FALSE, &duration);         \

        HalMakeBeep(0);                                               \

        duration.QuadPart = Int32x32To64(500, -10000);                \

        KeDelayExecutionThread(KernelMode, FALSE, &duration);         \

    }

 

#define EHCI_BEEP2(f, d)                                              \

    {                                                                 \

        EHCI_BEEP1(f, d);                                             \

        EHCI_BEEP1(f, d);                                             \

    }

     

#define EHCI_BEEP3(f, d)                                              \

    {                                                                 \

        EHCI_BEEP2(f, d);                                             \

        EHCI_BEEP1(f, d);                                             \

    }

 

Here is an example of a long beep followed by three short beeps:

 

    //

    // debug - 1 long : 3 short

    //

    EHCI_BEEP1(600, 1000);

    EHCI_BEEP3(600, 500);

 

Share this post :

Trey Nash here again, and I would like to discuss a scenario we are all too familiar with.  You’ve worked your tail off for the past year, and for the past couple of months you even worked evenings and weekends.  Management rewarded your team with two weeks off in appreciation of your efforts.  But now that you’re back in the office, you’re hearing rumors percolating up from your tech support department that there are some cases where your application is crashing in the field for mysterious reasons.  What do you do?

The application happens to have been built with an unhandled exception filter registered via the AppDomain.UnhandledException event.  Therefore, you at least know that the application is failing with an InvalidCastException, but you cannot imagine why this is happening.

Wouldn’t it be nice if you could live debug on the affected system?  Unless you work on-site for your customer or your software is on a laptop and your customer is willing to send it to you, then I doubt you will get this opportunity.  What you need is a tool to capture the state of your application while it is failing.  Then the customer could capture this information and send it to you.

Enter ADPlus.  ADPlus is a free tool in the Debugging Tools for Windows package that scripts the CDB debugger allowing you to capture dumps for one or multiple processes on a system.  It also offers the following advantages:

·         ADPlus can monitor desktop applications, services, etc.

·         ADPlus can monitor multiple processes on the system.  When it collects a dump of those processes, it freezes and dumps them simultaneously.  This is essential for tracking down problems with inter-process communications.

·         ADPlus supports xcopy deployment meaning the customer does not need to install anything via Windows Installer, etc.  This minimizes configuration changes on the machine, and that is music to customers’ ears.  

 

Note:    Although ADPlus is xcopy installable, you still have to install the Debugging Tools for Windows package via Windows Installer as that is the only way Microsoft distributes it.  However, once you have installed Debugging Tools for Windows once, you can xcopy deploy ADPlus or the entire Debugging Tools for Windows package to another machine.  In fact, during development, I find it extremely handy to check in the development tools into the source repository.  Debugging Tools for Windows supports this by virtue of the fact that it is xcopy installable.

For those of you that are savvy with Windows Installer, you can perform an admin install using the msi for Debugging Tools for Windows and that will allow you to extract the files without actually installing the package on the machine, for example:

msiexec /a dbg_x86_6.11.1.404.msi

 

With all of that said, let’s see how ADPlus can help you diagnose problems with .NET applications.

The Sample Application

In the rest of this post, I will reference the C# 3.0 sample application that I have put together to illustrate using ADPlus to capture an unhandled exception in a .NET application.  The code is listed below:

using System;

using System.Linq;

using System.Runtime.Serialization;

 

class A

{

    public void SaySomething() {

        Console.WriteLine( "Yeah, Peter...." );

        throw new BadDesignException();

    }

}

 

class B : A

{

}

 

class C

{

}

 

class EntryPoint

{

    static void Main() {

        DoSomething();

    }

 

    static void DoSomething() {

        Func<int, object> generatorFunc = (x) => {

            if( x == 7 ) {

                return new C();

            } else {

                return new B();

            }

        };

 

        var collection = from i in Enumerable.Range( 0, 10 )

                         select generatorFunc(i);

 

        // Let's iterate over each of the items in the collection

        //

        // ASSUMING THEY ARE ALL DERIVED FROM A !!!!

        foreach( var item in collection ) {

            A a = (A) item;

            try {

                a.SaySomething();

            }

            catch( BadDesignException ) {

                // Swallow these here. The programmer chose to

                // use exceptions for normal control flow which

                // is *very* poor design.

            }

        }

    }

}

 

public class BadDesignException : Exception

{

    public BadDesignException() { }

    public BadDesignException( string msg ) : base( msg ) { }

    public BadDesignException( string msg, Exception x ) : base( msg, x ) { }

    protected BadDesignException( SerializationInfo si,

                                  StreamingContext ctx )

        :base( si, ctx ) { }

}

 

You can compile this sample code easily by putting it in a file, such as test.cs, and then from either a Visual Studio Command Prompt or from a Windows SDK Command Shell, execute the following:

csc /debug+ test.cs

 

Note:    The code is contrived and there are many bad things about this code from a design/coding standpoint, but that is intentional for the sake of illustration.  For example, one may want to re-think introducing a collection that contains references to System.Object.

The code above also uses features that are new to C# 3.0 (for brevity) including lambda expressions and LINQ.  If you would like to become more familiar with them, visit the
C# website on MSDN or reference one of the excellent books on C# such as Pro LINQ or Accelerated C# 2008.

 

The notable section of the code to focus on is the foreach loop within the EntryPoint.Main() method.  In that foreach loop, we are iterating over a collection of objects where we are assuming that they all derive from type A.  Thus the code attempts to cast all instances to a reference of type A and since I have intentionally put an instance of type C within that collection, it will fail at some point with an exception of type System.InvalidCastException.

Capturing the Problem

Using ADPlus in crash mode, we can capture the exception in the customer’s environment.  To illustrate this in action, let’s launch ADPlus in crash mode and monitor the test.exe example application built above by using the following command line:

adplus -crash -o c:\temp\test -FullOnFirst -sc c:\temp\test\test.exe

 

Note:    I built and tested this code in a directory named c:\temp\test on my machine as I wrote this.  Therefore, you will see references to it throughout this post.  Incidentally, I have found ADPlus is a lot easier to use if you feed it fully qualified paths.  If you get mysterious errors or behavior, and you are using relative paths with ADPlus, try fully qualifying the paths before you beat your head against the wall for too long trying to figure out what could be going wrong.

 

ADPlus is easier to launch if you add the directory where the Debugging Tools for Windows was installed to your PATH environment variable.  The command line above assumes this has been done.  Now, let me explain the command line options that I used above.  I highly recommend that you become familiar with all of the ADPlus command line options.

·         -crash launches ADPlus in crash mode.  This is the mode you want to use if your application is failing because of an unhandled exception.

·         -o c:\temp\test tells ADPlus that I want the output to be placed in c:\temp\test, which is the directory in which I built the test.exe application.

·         -FullOnFirst is very important for managed applications..  This tells ADPlus to grab a full process dump on first chance exceptions.  It is essential that you capture full dumps for managed applications, otherwise, all of the necessary data regarding the execution engine and the managed heap will be absent from the dump making it impossible to debug effectively.

·         -sc c:\temp\test.exe is only one of the ways you can point ADPlus to an application to monitor.  In this case, we’re instructing ADPlus to tell the debugger to explicitly launch the application.  Had it been a service, or if the application you want to monitor is already running, we would probably have used –p to attach to its PID or –pn to attach to the process by name.  Notice that I provided the full path to the application.

After you launch ADPlus, you are presented with the following dialog unless you use the –quiet option.

 

Once the application is finished executing, go to the directory that you specified in the ADPlus –o command line option and you should see a subdirectory named similarly to what you see in the previous dialog snapshot.  For the instance of test.exe I just executed, that directory is named Crash_Mode__Date_05-11-2009__Time_21-12-54PM.  Under that directory, there are quite a few files that I have listed below:

C:\temp\test\Crash_Mode__Date_05-11-2009__Time_21-12-54PM>dir /b
ADPlus_report.txt
CDBScripts
PID-0__Spawned0__1st_chance_CPlusPlusEH__full_15ac_2009-05-11_21-13-01-332_0eac.dmp
PID-0__Spawned0__1st_chance_NET_CLR__full_15ac_2009-05-11_21-12-55-482_0eac.dmp
PID-0__Spawned0__1st_chance_NET_CLR__full_15ac_2009-05-11_21-12-56-527_0eac.dmp
PID-0__Spawned0__1st_chance_NET_CLR__full_15ac_2009-05-11_21-12-57-370_0eac.dmp
PID-0__Spawned0__1st_chance_NET_CLR__full_15ac_2009-05-11_21-12-58-103_0eac.dmp
PID-0__Spawned0__1st_chance_NET_CLR__full_15ac_2009-05-11_21-12-58-867_0eac.dmp
PID-0__Spawned0__1st_chance_NET_CLR__full_15ac_2009-05-11_21-12-59-663_0eac.dmp
PID-0__Spawned0__1st_chance_NET_CLR__full_15ac_2009-05-11_21-13-00-505_0eac.dmp
PID-0__Spawned0__1st_chance_NET_CLR__full_15ac_2009-05-11_21-13-02-689_0eac.dmp
PID-0__Spawned0__1st_chance_Process_Shut_Down__full_15ac_2009-05-11_21-13-27-743_0eac.dmp
PID-0__Spawned0__2nd_chance_NET_CLR__full_15ac_2009-05-11_21-13-04-140_0eac.dmp
PID-0__Spawned0__Date_05-11-2009__Time_21-12-54PM.log
Process_List.txt

For each first chance exception, a dump file (.dmp) has been collected.  Notice that the dump files can tend to be very large because of the –FullOnFirst option.  You can subsequently load these dump files into either Windbg (or its variants) or the Visual Studio Debugger.  Personally, I prefer Windbg because I can then load the SOS extension along with the SOSEX extension and dig into the state of the application and the CLR.

Using the ADPlus default configuration as we have above, you can see that ADPlus generated dumps for one first-chance C++ exception, eight first-chance CLR exceptions, one second-chance CLR exception and one dump collected during process shut down.

What is Included in the Output

If you look in your Debugging Tools for Windows directory, you’ll notice that ADPlus is really just a very complex VBScript.  It generates quite a bit of useful information along with any problem dump files.  ADPlus_report.txt reports the configuration for ADPlus for this run.  This is handy if you need to know what it will do for a specific type of exception.

Process_list.txt is generated from executing tlist.exe, another tool that comes with Debugging Tools for Windows.

And finally, the CDBScripts subdirectory contains a .cfg file which is the exact debugger script ADPlus generated and subsequently fed to CDB to get the job done.  On my machine, when running against the sample application using the command line from the previous section, this file is named PID‑0__Spawned0.cfg.  If you’re ever curious or need to know exactly what ADPlus instructed the debugger to do, this file is the source.

 

Note:    The reason the name contains a zero PID is because we used the –sc option to launch the application.  Had we used the –p or –pn option, the PID in the filename would not be zero.

 

I don’t recommend executing this debugger script in a live debugger as doing so could overwrite the data that you already have collected.  Instead, if you need to couple ADPlus with live debugging, you should use the –gs option which I will describe shortly.

Pinpointing a Specific Exception

You’ll notice that in the previous run of ADPlus, quite a few dump files were generated and each one was fairly large.  In reality, there are times where a first chance exception does not always indicate a fatal situation.  In Accelerated C# 2008, I go into detail regarding how it is poor design practice to implement any sort of control flow or otherwise reasonably expected behavior using exceptions.  This is because exceptions are supposed to be truly exceptional events!  For more juicy details on how expensive exceptions are, I invite you to read an excellent blog post by Tess Ferrandez on the topic. At any rate, you may encounter situations where you get a lot of dumps for first chance exceptions you are not interested in as this example shows..

To alleviate this situation, you can create an ADPlus configuration file coupled with the !StopOnException command provided by the SOS extension to instruct ADPlus to filter out only the exceptions you’re interested in.  To do this, I created a configuration file named filter.config with the following contents:

<ADPlus>

 

    <!-- Configuring ADPlus to log only exceptions we're interested in -->

 

  <Exceptions>

     <Config>

        <!-- This is for the CLR exception -->

       <Code> clr </Code>

       <Actions1> Log </Actions1>

       <CustomActions1> .loadby sos mscorwks; !StopOnException System.InvalidCastException 1; j ($t1 = 1) '.dump /ma /u c:\dumps\InvalidCastException.dmp; gn' ; 'gn'  </CustomActions1>

       <ReturnAction1> VOID  </ReturnAction1>

       <Actions2> Log </Actions2>

     </Config>

  </Exceptions>

 

</ADPlus>

 

 

Note:    The <CustomActions1> element above is meant to be in one line in the config file.

 

The <CustomActions1> element is the element of interest in this configuration file.  This element allows you to specify which commands the debugger should execute on first chance exceptions.  Within this element, you can put any valid debugger commands (except windbg GUI-related commands).  If you need to execute multiple commands, as I have above, you simply separate them with semicolons.  In the <CustomActions1> element shown above, I first load the SOS extension using the .loadby command.  Then, I use the !StopOnException command in predicate mode to set the $t1 pseudo-register to 1 if the exception is of type System.InvalidCastException and 0 otherwise.  Then, the following j command is used to create a full dump if $t1 is 1 and do nothing otherwise.  The gn command in both paths of the j command tells the debugger to go unhandled so that the exception is propagated up rather than swallowed by the debugger.  If you were to go handled, thus swallowing the exception, the exception handlers in the code would never see it and the debugger would alter the behavior of the application.  And finally, note that the .dump command used to create the dump indicates the path where the dump will be stored.  In this case, I am placing it in the c:\dumps directory on my machine.

Now you can provide this configuration file to ADPlus using the following command:

adplus -crash -o c:\temp\test -FullOnFirst -c c:\temp\test\filter.config -sc c:\temp\test\test.exe

Notice that there are fewer dumps collected.  Along with the dump for the InvalidCastException, it also captures a C++ exception as well as when the application shuts down.  If you open the C++ exception dump in the debugger and inspect the stack, it shows that the CLR is in the process of generating the InvalidCastException.  The CLR catches the C++ exception and converts it into a managed exception.  The C++ exception dump was generated because I left the –FullOnFirst option in the previous command line.  You can eliminate the C++ exception dump by removing -FullOnFirst.

Using ADPlus during Live Debugging

I already hinted at the –gs ADPlus command line option earlier in this post.  This option allows you to create the debugger scripts ADPlus creates without actually running them in the debugger.  For example, if you execute the following command (I executed mine from my c:\temp\test directory):

adplus -crash -o c:\temp\test -FullOnFirst -c c:\temp\test\filter.config -gs livedebug

You will notice that ADPlus does not actually launch any debugger or your application.  Rather, it creates a subdirectory named livedebug.  When you go into that directory, you’ll notice it looks similar in layout to the crash directories created in the previous demonstration.  On my machine, I end up with the following two files:

C:\temp\test\livedebug\ADPlus_report.txt
C:\temp\test\livedebug\CDBScripts\PID-0__livedebug.cfg

The PID-0__livedebug.cfg file is actually a debugger script file containing debugger commands.  All we have to do now is launch our test application in the debugger and then execute this script.  Within my c:\temp\test directory, I can launch the debugger using the following command:

windbg test.exe

Once inside the debugger, I can invoke the ADPlus debugger script by executing the following $$< command:

$$<C:\temp\test\livedebug\CDBScripts\PID-0__livedebug.cfg

Once you execute the $$< command, you will notice that the script takes over and performs the same actions as ADPlus does when run conventionally from the command line.

As a further experiment, edit the filter.config file and remove the gn commands.  Now the debugger will wait for user input after executing the custom commands rather than continuing execution of the application.  This could be handy if you want the opportunity to perform debugging by hand if ADPlus encounters a certain situation.

First Chance vs. Second Chance Exceptions

Throughout this post, I have been focusing on first chance exceptions for the sake of illustration.  However, many times, you are only interested in the truly unhandled exceptions.  In that case, you would want to capture only second chance exceptions.  You certainly would want to capture first chance exceptions if you needed to capture the state at the exact point the exception was thrown and before the OS searches for any handlers.  Additionally, if you suspect that the application you are debugging may be handling an exception inappropriately (maybe even blindly swallowing it), then you certainly want to catch first chance exceptions in that case.

Conclusion

In this blog post I have introduced you to ADPlus and the utility it provides when troubleshooting problems in the field.  ADPlus lends itself well to capturing exceptional situations in the field since it requires no configuration changes on the affected machine thus making it an easy pill for your customers to swallow.  You may also find this very useful when working with your Quality Assurance team during the development phase.  For example, how many times have you encountered a situation where a problem only presents itself on a dusty old rarely-used machine in the back corner of a random QA engineer’s office?  How many times, in such situations, have you felt like the only way to trouble shoot the problem effectively is to install the Visual Studio debugger and start working on that machine?  Furthermore, what if the problem only happens on that dusty old machine about once a week.  ADPlus can help you avoid that madness by providing an easy mechanism for capturing full process dumps on the troubled machine so you can then take those dumps to your trusty development machine for further debugging and analysis.

Have fun out there!

 

Share this post :

In this video Michael introduces the sample based profiling tool named XPERF. He also demonstrates how to use the tool to uncover high DPC’s, and Disk I/O’s. If you would like to see more XPERF blogs on ntdebugging please send your feedback using the email link at the top of the blog site. Feel free to leave comments too.

Location of the video: http://ittv.net/DesktopModules/UltraVideoGallery/uvg.swf?vId=419&portalId=0

BTW  - I’m working on making a downloadable version for off-line viewing. Stay Tuned.

Link to Microsoft Windows Performance Toolkit

http://msdn.microsoft.com/en-us/performance/default.aspx

 

Link to XPerf version 4.5. This ships with the Win7 SDK below.

http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=6db1f17f-5f1e-4e54-a331-c32285cdde0c

 

We look forward to your feedback!

Ron Stock

Share this post :

Recently I was working with a customer who had a file server experiencing high CPU in the WMIprvse.exe process. We received multiple user dumps and noted that someone or something was running the same query again and again. We needed to figure out what was running the query in a tight loop, causing the high CPU.

image

Figure 1 - Task Manager on FileServer

Before we get into the exact troubleshooting steps, let me provide some background on WMI.  Winmgmt is the WMI service within a SVCHOST process running under the LocalSystem account.  In all cases, the WMI service automatically starts when the first management application or script requests connection to a WMI namespace. For more information, see Starting and Stopping the WMI Service. To avoid stopping all the services when a provider fails, each provider is loaded into a separate host process named "Wmiprvse.exe". This also allows each instance of Wmiprvse to run under a different account with varying security. For more details you can look at the MSDN documentation on WMI.

 

I dumped out all the services in the various svchost.exe processes. You can do this from a command prompt by running the tasklist /svc command. In my instance, I found that the WinMgmt service was running in svchost, PID  452 (PID number will vary). Someone had to be making RPC calls to this svchost.exe process to run the WMI queries. It could be some local process on the machine; it could even be a process on a remote machine.

 

At this point I requested user dumps of PID 452 from the customer.  This would allow me to determine who was making the RPC calls to svchost.exe to run the WMI queries.  While the customer was uploading the dumps, we decided to get a Network Monitor trace to see if the RPC calls were coming over the network.

 

Immediately, I could see a lot of RPC traffic to the svchost.exe process(PID=452).

 

image

Figure 2 - Network Monitor Output from the FileServer. Notice the Source and destination ports and IP addresses. IP addresses are hidden by the aliases

Looking at the RPC payload, I could see the text of the WMI query. You can see this in the Hex Details Pane. The query that was running in a loop was Select * from Win32_Process. Looks like I found the source of the WMI queries.

 

At this point, we got the source IP for the RPC packets. We logged into the machine, and brought up the Task Manager.

 

image

Figure 3 - Task Manager on Remote Machine(Machine1)

Immediately we saw that there was some script running inside a Wscript.exe process.  At this point I was pretty sure that this script was the culprit. The customer was not sure what this was, and was not comfortable terminating the process.  To prove my suspicion, I had him open a command prompt and run the following command, netstat –ano.

 

image

Figure 4 - Netstat output from Remote Machine

 

From the output in Fig. 4, I could see a TCP connection created by PID 3532 (wscript.exe). Looking at the local and foreign addresses from the above output, they matched up exactly to what we were seeing in the Network Monitor trace.

 

In the above case, we already had our suspicions on the wscript.exe process; however, sometimes it might not be that easy. In that case, we could have used the netstat output to look at all connections to the file server (157.59.123.121). If there were multiple connections, then we can also narrow it down by the port number. Based on that, we could have found the PID responsible.

 

The customer called me later in the day, and told me that they had recently updated their scripts. One of their scripts had a bug which was running WMI scripts in a tight loop. Fixing the script caused the problem to go away.

 

Had the query being coming from a local process, I would have had to debug the svchost.exe process, and figure out who was making the WMI calls. However, since we could see the traffic on netmon, we didn’t need to use the debugger. Interesting way to get to the root of the problem without using a debugger!

 

 

Share this post :

We appreciate the comments and feedback sent about the blog. You guys are great! We’ve had comments requesting to make the XPERF videos downloadable. I’m working to move them to a hosting site and will post the link here soon. 

Are there any content topics you would like to see discussed on NTDebugging, please drop us a note and we’ll try to address it in a future blog.

Thanks for reading the blog!

Cheers,

Ron Stock

 

Hi, my name is Anurag Sarin, and I'm an Escalation Engineer from the Platforms Global Escalation Services 24x7 Team, located in Bangalore, India.

 

I recently worked a case dealing with a component that reported an exception. But it was not critical enough to crash the system and it was simply logged in a file. Because the root cause of the problem was not yet known, we did not have a deterministic way to reproduce the exception. The exception would just come and go. So catching the exception became a challenge. Do we take a dump? How do we break into the debugger to catch the exception when it occurs? Manual dumps wouldn't help because we didn't know when the exception would occur, and even if we did, it happened too fast to use Ctrl + Scroll Lock  + Scroll Lock.

 

A similar challenge is when you want to start debugging just after a module loads without knowing when the module load will happen. You can run into this while reverse engineering code or while live debugging. You may want to save all debugging efforts until a particular module gets loaded. Sometimes you just want to know if a module ever gets loaded in a particular code path.

 

In these situations, the ‘sx*’ command comes to our rescue. Below you will find a couple of nuggets to assist with these types of scenarios.

Break on exception

 

Let's say Windows Media Player reports “RPC Server is unavailable” when a user clicks "File\Open" and attempts to browse to a remote computer. In this situation the ‘sxe’ command is your friend. This command ‘sets an exception’ from the debugger. When this is set the target breaks into the debugger before any other error handlers are activated

 

By the way, RPC_S_SERVER_UNAVAILABLE is win32 error 1722 (0x06ba) with the description “RPC Server is unavailable” and is a common networking issue.

 

This is how WinError.h in Windows SDK documents it

 

//

// MessageId: RPC_S_SERVER_UNAVAILABLE

//

// MessageText:

//

// The RPC server is unavailable.

//

#define RPC_S_SERVER_UNAVAILABLE         1722L

 

Here are the steps to catch the “RPC Server is unavailable” exception. 

 

1. Attach the debugger to the process with the RPC_S_SERVER_UNAVAILABLE error.  In our example we’ll attach to wmplayer.exe.

 

2. Break into debugger to set or verify the symbol path.

 

3. Reload the symbols.

 

.reload

 

4. Clear any previously set exceptions.

 

sdx  *

 

5. Set the exception on 1722 (0x06ba) i.e. RPC_S_SERVER_UNAVAILABLE “RPC Server is unavailable”.

 

sxe 06ba

 

6. Let the process run until we hit the exception. You may hit a different exception before the one you’re looking for in which case you hit 'g' to continue execution.

 

g

 

7. This is the output from the debugger on a Vista RTM machine.

 

(b2c.f28): Unknown exception - code 000006ba (first chance) 

First chance exceptions are reported before any exception handling.

This exception may be expected and handled.

eax=0601e47c ebx=00000000 ecx=00000000 edx=049cd488 esi=000006ba edi=0601e5a4

eip=76dbb09e esp=0601e47c ebp=0601e4cc iopl=0         nv up ei pl zr na pe nc

cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

kernel32!RaiseException+0x58:

76dbb09e c9              leave

0:036> kv100

ChildEBP RetAddr  Args to Child             

0601e4cc 76ea0911 000006ba 00000001 00000000 kernel32!RaiseException+0x58 (FPO: [4,20,0])

0601e4e4 76ea08e9 000006ba 0601e9dc 76f2e07d RPCRT4!RpcpRaiseException+0x1e (FPO: [0,0,0])

0601e4f0 76f2e07d 000006ba f61cdc85 05545938 RPCRT4!RpcRaiseException+0x16

0601e9dc 760a9e12 760a8908 760a95ea 0601e9fc RPCRT4!NdrClientCall2+0xd51                       

0601e9f4 760a9e5b 0553fee0 00000064 0601ea54 NETAPI32!NetrWkstaGetInfo+0x19 (FPO: [3,1,0])

0601ea40 75106f02 0553fee0 00000064 0601ea54 NETAPI32!NetWkstaGetInfo+0x3b (FPO: [SEH])      ßNetWkstaGetInfo called for request on information on workstation.

0601ea5c 75106f86 0553fee0 0601ea84 0601eb5c ntlanman!ServerExists+0x19 (FPO: [2,2,0])

0601ea98 75e24ede 0601ec00 05545938 0601ec44 ntlanman!NPDfsLmGetResourceInformation+0x177 (FPO: [4,8,4])

0601eab0 75e23bb9 055939f0 caa0f4a6 00000000 MPR!CGetResourceInformation::TestProvider+0x1a (FPO: [1,0,0])

0601eb08 75e23ac7 caa0f4ea 0601ec44 0601eb5c MPR!CRoutedOperation::GetResult+0xf6 (FPO: [SEH])

0601eb44 75e23a79 0601ec54 0601eb90 75e24f0d MPR!CMprOperation::Perform+0x65 (FPO: [SEH])

0601eb50 75e24f0d 00000001 75e24f5c 000000a4 MPR!CRoutedOperation::Perform+0x22 (FPO: [1,0,4])

0601eb90 76f5978a 0601ec00 05545938 0601ec44 MPR!WNetGetResourceInformationW+0x26 (FPO: [4,13,0])

0601ebb4 76f597f5 0601ec00 00000000 00000000 SHELL32!SHWNetGetResourceInformationAlloc+0x40 (FPO: [4,0,4])

0601ebd8 76f63d75 0601ec00 0601ec54 0601ec44 SHELL32!CNetFolder::_GetResourceInformationAlloc+0xf4 (FPO: [5,0,4])

0601f278 76f63bc3 001d019a 05560288 0553db08 SHELL32!CNetFolder::_ParseUNCName+0x1a7 (FPO: [7,414,4])

0601f4c4 76fe2c3e 0553db50 001d019a 05560288 SHELL32!CNetFolder::ParseDisplayName+0x123 (FPO: [7,135,4])

0601f52c 6de11f8c 05547c94 001d019a 05560288 SHELL32!CRegFolder::ParseDisplayName+0x90 (FPO: [7,14,4])

0601f5b0 6de11e0d 001d019a 05560288 0553db08 NetworkExplorer!CNetworkExplorerFolder::_IsValidServer+0x9f (FPO: [5,21,4])

0601f5e0 76fe58ab 0553da98 001d019a 05560288 NetworkExplorer!CNetworkExplorerFolder::ParseDisplayName+0x68 (FPO: [7,2,4])

0601f608 76fe2d13 0553da98 049d0008 001d019a SHELL32!CDesktopFolder::_ChildParseDisplayName+0x22 (FPO: [8,0,4])

0601f654 76fe2c3e 0553da98 001d019a 049d0008 SHELL32!CDesktopFolder::ParseDisplayName+0x8c (FPO: [7,6,4])

0601f6bc 76fe2b91 0559544c 001d019a 05560288 SHELL32!CRegFolder::ParseDisplayName+0x90 (FPO: [7,14,4])

0601f6f4 76c8c0f3 00000000 30c10000 0601f788 SHELL32!SHParseDisplayName+0x98 (FPO: [5,2,4])

0601f734 76c8c03a 0601f994 0601f788 0601f780 COMDLG32!CAsyncParser::_ParseFromDesktop+0x73 (FPO: [3,7,0])

0601fba0 76c808d8 0556b850 00000000 00000000 COMDLG32!CAsyncParser::_ParseOneItem+0x2e1 (FPO: [2,275,4])

0601fbbc 76c80bc6 0601fc40 77c1b132 0555b088 COMDLG32!CAsyncParser::_Parse+0x4a (FPO: [0,1,0])

0601fbc4 77c1b132 0555b088 00000000 00000000 COMDLG32!CAsyncParser::s_ThreadProc+0xd (FPO: [1,0,0])

0601fc40 76de3833 0023c72c 0601fc8c 77b1a9bd SHLWAPI!WrapperThreadProc+0x10f (FPO: [1,25,4])

0601fc4c 77b1a9bd 0023c72c 0601b824 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [1,0,0])

0601fc8c 00000000 77c1b09a 0023c72c 00000000 ntdll!_RtlUserThreadStart+0x23 (FPO: [SEH])

 

 

8. After you break into the stack for the 6ba exception, you might want to get a dump using the following command.

 

0:000> .dump /ma c:\dump1.dmp

Creating c:\dump2.dmp - mini user dump used

Dump successfully written

 

9. Now that we have a dump in the exception code path, it might be a good idea to tell the debugger to stop breaking for this exception.  

 

sxd 06ba

 

 

Now the debugger does not break for this exception.

Note: To detach the debugger from the process do a ‘.detach’. If you just close the debugger it will exit the process too.

 

Looking at the top of the stack, we see that wmplayer sent a request for a share access and failed due to (what looks like a network problem). I say this as we see that MRP  invokes ntlanman (Windows Redirector for Network Shares) which further request information on workstation.name ( NetWkstaGetInfo ) on the network (a Remote RPC call RPCRT4!NdrClientCall2). The next step should be to take a network trace and investigate if there are any networking problems on the network.

 

And it would make root cause determination much easier if you knew the user was attempting to connect to a nonexistent machine name (\\<Bad Machine Name>) while doing the File\Open in Windows Media Player. J

Break on Module load

 

Now let’s talk about a scenario where we want to start debugging when a particular DLL gets loaded. Let’s take the example of mpr.dll loading using the scenario above.

 

1.       Follow steps 1 through 4 mentioned above, to attach to wmplayer.exe

 

2.        In this case we set an exception on ‘module load of MRP.DLL‘.

 

sxe ld mpr.dll

 

3. Now let the process run. It will break back into the debugger when the DLL is loaded.

 

 g

 

This is the debugger output on a Windows 2008 machine when MPR.DLL gets loaded into the wmplayer process.

 

0:001> g

ModLoad: 753d0000 753e4000   C:\Windows\system32\MPR.dll

eax=75a3b111 ebx=00000000 ecx=00000000 edx=75a7275c esi=7ffad000 edi=20000000

eip=771a9a94 esp=0291df74 ebp=0291dfb8 iopl=0         nv up ei pl zr na pe nc

cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

ntdll!KiFastSystemCallRet:

771a9a94 c3              ret

0:014> kvn ffff

 #   Memory  ChildEBP RetAddr  Args to Child             

00           0291df70 771a8764 7717df04 000003d4 ffffffff ntdll!KiFastSystemCallRet (FPO: [0,0,0])

01         4 0291df74 7717df04 000003d4 ffffffff 0291e0a0 ntdll!ZwMapViewOfSection+0xc (FPO: [10,0,0])

02        44 0291dfb8 7717c855 000003d4 0291e0a0 0291e088 ntdll!LdrpMapViewOfDllSection+0x67 (FPO: [5,2,4])

03       108 0291e0c0 7717fab9 01e2f640 0291e100 01e66a58 ntdll!LdrpMapDll+0x417 (FPO: [SEH])

04       274 0291e334 7717610e 01e2f640 72410321 0291e378 ntdll!LdrpLoadImportModule+0x206 (FPO: [4,145,4])

05        4c 0291e380 77176233 01e2f640 01e4f020 0291e3ac ntdll!LdrpHandleOneNewFormatImportDescriptor+0x84 (FPO: [4,10,4])

06        1c 0291e39c 77176248 01e2f640 01e4f020 724102c0 ntdll!LdrpHandleNewFormatImportDescriptors+0x1d (FPO: [3,0,4])

07        7c 0291e418 7717c5ba 01e2f640 01e4f020 75a35de0 ntdll!LdrpWalkImportDescriptor+0x1f2 (FPO: [SEH])

08       288 0291e6a0 77177a52 00000000 01e2f640 0291e994 ntdll!LdrpLoadDll+0x2f1 (FPO: [SEH])

09       284 0291e924 76d031ba 01e2f640 0291e994 0291e954 ntdll!LdrLoadDll+0x22a (FPO: [SEH])

0a        64 0291e988 75a4b34f 0291ea04 00000000 00000008 kernel32!LoadLibraryExW+0x252 (FPO: [SEH])

0b        24 0291e9ac 75a4b239 0291ea04 0291e9d0 0291e9d4 ole32!CClassCache::CDllPathEntry::LoadDll+0xae (FPO: [4,1,4])

0c        30 0291e9dc 75a469fe 0291ea04 0291ecec 0291e9fc ole32!CClassCache::CDllPathEntry::Create_rl+0x37 (FPO: [3,3,4])

0d       24c 0291ec28 75a46913 00000001 0291ecec 0291ec58 ole32!CClassCache::CClassEntry::CreateDllClassEntry_rl+0xd4 (FPO: [3,139,4])

0e        48 0291ec70 75a467b4 00000001 00505350 0291ec9c ole32!CClassCache::GetClassObjectActivator+0x224 (FPO: [3,10,4])

0f        38 0291eca8 75a474c0 0291ecec 00000000 0291f2f4 ole32!CClassCache::GetClassObject+0x30 (FPO: [1,3,4])

10        7c 0291ed24 75a4705b 75b4e428 00000000 0291f2f4 ole32!CServerContextActivator::CreateInstance+0x110 (FPO: [4,25,4])

11        40 0291ed64 75a4772d 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])

12        54 0291edb8 75a4764f 75b4e4ac 00000000 0291f2f4 ole32!CApartmentActivator::CreateInstance+0x112 (FPO: [4,13,4])

13        20 0291edd8 75a47609 75b4e48c 00000001 00000000 ole32!CProcessActivator::CCICallback+0x6d (FPO: [6,1,4])

14        20 0291edf8 75a475ba 75b4e48c 0291f150 00000000 ole32!CProcessActivator::AttemptActivation+0x2c (FPO: [7,0,0])

15        3c 0291ee34 75a476af 75b4e48c 0291f150 00000000 ole32!CProcessActivator::ActivateByContext+0x4f (FPO: [6,3,4])

16        28 0291ee5c 75a4705b 75b4e48c 00000000 0291f2f4 ole32!CProcessActivator::CreateInstance+0x49 (FPO: [4,1,4])

17        40 0291ee9c 75a47121 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])

18       260 0291f0fc 75a4705b 75b4e4b4 00000000 0291f2f4 ole32!CClientContextActivator::CreateInstance+0xb0 (FPO: [4,6,4])

19        40 0291f13c 75a472c5 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])

1a       7d4 0291f910 75a6e2a2 0291f9f4 00000000 00000017 ole32!ICoCreateInstanceEx+0x403 (FPO: [8,50,4])

1b        60 0291f970 75a6e203 0291f9f4 00000000 00000017 ole32!CComActivator::DoCreateInstance+0xd9 (FPO: [7,11,4])

1c        24 0291f994 75a6e1bc 0291f9f4 00000000 00000017 ole32!CoCreateInstanceEx+0x38 (FPO: [6,0,0])

1d        30 0291f9c4 72751489 0291f9f4 00000000 00000017 ole32!CoCreateInstance+0x37 (FPO: [5,3,4])

1e        44 0291fa08 727550b9 00000000 022cd14c 022cd140 FunDisc!CQueryWorker::CreateProvider+0x1a6 (FPO: [1,7,4])

1f        34 0291fa3c 72755028 0291fa80 00000000 00000000 FunDisc!CQueryWorker::Execute+0x77 (FPO: [1,7,4])

20        18 0291fa54 74892c26 00000000 0291fa80 0057f198 FunDisc!CFunctionInstanceCollectionQuery::Execute+0x66 (FPO: [2,0,4])

21        1c 0291fa70 74892d21 0057f198 00000001 00000000 NetworkItemFactory!CNetworkItemFactory::_StartFDQuery+0x84 (FPO: [4,0,4])

22        30 0291faa0 74892d8b 0057f198 0291fac0 7489478d NetworkItemFactory!CNetworkItemFactory::s_StartFDInMTA+0xa8 (FPO: [1,3,4])

23         c 0291faac 7489478d 0057f198 00000000 00000000 NetworkItemFactory!FDBackgroundThreadHandler+0x1b (FPO: [1,0,0])

24        14 0291fac0 76d24911 01e4abd0 0291fb0c 7718e4b6 NetworkItemFactory!ThreadHandler+0xf (FPO: [1,0,0])

25         c 0291facc 7718e4b6 01e4abd0 75a3404c 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [1,0,0])

26        40 0291fb0c 7718e489 7489477e 01e4abd0 ffffffff ntdll!__RtlUserThreadStart+0x23 (FPO: [SEH])

27        18 0291fb24 00000000 7489477e 01e4abd0 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [2,2,0])

 

From the stack above, you can clearly see that seven frames below the top is what ‘was’ happening just ‘before’ the dll got loaded. How do I get to the point just ‘after’ the module gets loaded? Well one way is to use ‘gu’ twelve times. Note: The purpose of this is to show you how to use the various exception related commands.

 

0:014> gu

eax=00000000 ebx=00000000 ecx=0291df74 edx=771a9a94 esi=7ffad000 edi=20000000

eip=771a8764 esp=0291df78 ebp=0291dfb8 iopl=0         nv up ei pl zr na pe nc

cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

ntdll!ZwMapViewOfSection+0xc:

771a8764 c22800          ret     28h

<SNIP>

 

This is the point just ‘after’ MPR.DLL got loaded.

 

0:014> kvn ffff

 #   Memory  ChildEBP RetAddr  Args to Child             

00           0291e9dc 75a469fe 0291ea04 0291ecec 0291e9fc ole32!CClassCache::CDllPathEntry::Create_rl+0x37 (FPO: [3,3,4])

01       24c 0291ec28 75a46913 00000001 0291ecec 0291ec58 ole32!CClassCache::CClassEntry::CreateDllClassEntry_rl+0xd4 (FPO: [3,139,4])

02        48 0291ec70 75a467b4 00000001 00505350 0291ec9c ole32!CClassCache::GetClassObjectActivator+0x224 (FPO: [3,10,4])

03        38 0291eca8 75a474c0 0291ecec 00000000 0291f2f4 ole32!CClassCache::GetClassObject+0x30 (FPO: [1,3,4])

04        7c 0291ed24 75a4705b 75b4e428 00000000 0291f2f4 ole32!CServerContextActivator::CreateInstance+0x110 (FPO: [4,25,4])

05        40 0291ed64 75a4772d 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])

06        54 0291edb8 75a4764f 75b4e4ac 00000000 0291f2f4 ole32!CApartmentActivator::CreateInstance+0x112 (FPO: [4,13,4])

07        20 0291edd8 75a47609 75b4e48c 00000001 00000000 ole32!CProcessActivator::CCICallback+0x6d (FPO: [6,1,4])

08        20 0291edf8 75a475ba 75b4e48c 0291f150 00000000 ole32!CProcessActivator::AttemptActivation+0x2c (FPO: [7,0,0])

09        3c 0291ee34 75a476af 75b4e48c 0291f150 00000000 ole32!CProcessActivator::ActivateByContext+0x4f (FPO: [6,3,4])

0a        28 0291ee5c 75a4705b 75b4e48c 00000000 0291f2f4 ole32!CProcessActivator::CreateInstance+0x49 (FPO: [4,1,4])

0b        40 0291ee9c 75a47121 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])

0c       260 0291f0fc 75a4705b 75b4e4b4 00000000 0291f2f4 ole32!CClientContextActivator::CreateInstance+0xb0 (FPO: [4,6,4])

0d        40 0291f13c 75a472c5 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])

0e       7d4 0291f910 75a6e2a2 0291f9f4 00000000 00000017 ole32!ICoCreateInstanceEx+0x403 (FPO: [8,50,4])

0f        60 0291f970 75a6e203 0291f9f4 00000000 00000017 ole32!CComActivator::DoCreateInstance+0xd9 (FPO: [7,11,4])

10        24 0291f994 75a6e1bc 0291f9f4 00000000 00000017 ole32!CoCreateInstanceEx+0x38 (FPO: [6,0,0])

11        30 0291f9c4 72751489 0291f9f4 00000000 00000017 ole32!CoCreateInstance+0x37 (FPO: [5,3,4])

12        44 0291fa08 727550b9 00000000 022cd14c 022cd140 FunDisc!CQueryWorker::CreateProvider+0x1a6 (FPO: [1,7,4])

13        34 0291fa3c 72755028 0291fa80 00000000 00000000 FunDisc!CQueryWorker::Execute+0x77 (FPO: [1,7,4])

14        18 0291fa54 74892c26 00000000 0291fa80 0057f198 FunDisc!CFunctionInstanceCollectionQuery::Execute+0x66 (FPO: [2,0,4])

15        1c 0291fa70 74892d21 0057f198 00000001 00000000 NetworkItemFactory!CNetworkItemFactory::_StartFDQuery+0x84 (FPO: [4,0,4])

16        30 0291faa0 74892d8b 0057f198 0291fac0 7489478d NetworkItemFactory!CNetworkItemFactory::s_StartFDInMTA+0xa8 (FPO: [1,3,4])

17         c 0291faac 7489478d 0057f198 00000000 00000000 NetworkItemFactory!FDBackgroundThreadHandler+0x1b (FPO: [1,0,0])

18        14 0291fac0 76d24911 01e4abd0 0291fb0c 7718e4b6 NetworkItemFactory!ThreadHandler+0xf (FPO: [1,0,0])

19         c 0291facc 7718e4b6 01e4abd0 75a3404c 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [1,0,0])

1a        40 0291fb0c 7718e489 7489477e 01e4abd0 ffffffff ntdll!__RtlUserThreadStart+0x23 (FPO: [SEH])

1b        18 0291fb24 00000000 7489477e 01e4abd0 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [2,2,0])