• Ntdebugging Blog

    Smoking Gun Pool Corruption


    Hello, my name is Ron Stock and I’m an Escalation Engineer on the Microsoft Platforms Global Escalation Services Team. Today I’m going to talk about pool corruption, which manifests itself in various ways. It’s usually hard to track down because the culprit is long gone by the time the machine crashes. Tools such as Special Pool make our debug lives easier; however, tracking down corruption doesn’t always have to make you pull your hair out. In some cases, simply retracing the steps of the crash can reveal a smoking gun.

    Let’s take a look at a real world example. First we need to be in the right context so we set the trap frame to give us the register context when the machine crashed.


    2: kd> .trap 0xfffffffff470662c

    ErrCode = 00000002

    eax=35303132 ebx=fd24d640 ecx=fd24d78c edx=fd24d784 esi=fd24d598 edi=fd24d610

    eip=e083f7a5 esp=f47066a0 ebp=f47066e0 iopl=0         nv up ei pl nz na po nc

    cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010202


    e083f7a5 ff4818          dec     dword ptr [eax+18h]  ds:0023:3530314a=????????


    From the register output we can tell that the system crashed while attempting to dereference a pointer at memory location [eax+18h]. The value stored in register eax is probably the address of a structure, given that the code is accessing offset 18h from the address in eax. Currently eax holds 0x35303132, which is clearly not a valid kernel mode address. Most kernel mode addresses on 32-bit systems are at or above 0x80000000, assuming the machine is not using something like the /3GB switch. Our mission now is to determine how eax was set.
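
    The address check described above can be sketched in a few lines of Python. This is only an illustration: the helper name is_kernel_address is hypothetical, and the 0x80000000 and 0xC0000000 boundaries assume the default and /3GB layouts of 32-bit x86 Windows.

```python
# Assumed 32-bit x86 layout: kernel space starts at 0x80000000 by default,
# or at 0xC0000000 when the system is booted with /3GB.
KERNEL_BASE_DEFAULT = 0x80000000
KERNEL_BASE_3GB = 0xC0000000


def is_kernel_address(addr, three_gb=False):
    """Return True if a 32-bit address falls in the kernel part of the address space."""
    base = KERNEL_BASE_3GB if three_gb else KERNEL_BASE_DEFAULT
    return base <= addr <= 0xFFFFFFFF


eax = 0x35303132                      # value of eax from the trap frame
faulting = (eax + 0x18) & 0xFFFFFFFF  # address touched by dec dword ptr [eax+18h]

print(hex(faulting), is_kernel_address(faulting))
print(hex(0xFD24D640), is_kernel_address(0xFD24D640))  # ebx, for comparison
```

    The computed address matches the 3530314a shown next to the faulting instruction, and it fails the check, while a register such as ebx passes it.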


    First we’ll unassemble the failing function using the UF command.


    2: kd> uf nt!KeWaitForSingleObject





    e083f7a5 ff4818          dec     dword ptr [eax+18h]

    e083f7a8 8b4818          mov     ecx,dword ptr [eax+18h]

    e083f7ab 3b481c          cmp     ecx,dword ptr [eax+1Ch]

    e083f7ae 0f836ef9ffff    jae     nt!KeWaitForSingleObject+0x2a3 (e083f122)


    I truncated the results of the UF output to conserve space in this blog. Instruction e083f7a5 is the line of code that generated the fault so our focus is to determine how the value of eax was set prior to running instruction e083f7a5. Based on the UF output, instruction e083f11c could have jumped to e083f7a5. Let’s investigate how eax is set before instruction e083f11c jumped to the failing line.



    e083f107 8d4208          lea     eax,[edx+8]

    e083f10a 8b4804          mov     ecx,dword ptr [eax+4]

    e083f10d 8903            mov     dword ptr [ebx],eax

    e083f10f 894b04          mov     dword ptr [ebx+4],ecx

    e083f112 8919            mov     dword ptr [ecx],ebx

    e083f114 895804          mov     dword ptr [eax+4],ebx

    e083f117 8b4668          mov     eax,dword ptr [esi+68h]

    e083f11a 85c0            test    eax,eax

    e083f11c 0f8583060000    jne     nt!KeWaitForSingleObject+0x25b (e083f7a5) <-- Jump


    Instruction e083f117 moves a value into eax so I’m dumping the value here.


    2: kd> dd esi+68h l1

    fd24d600  35303132


    Bingo! There’s our bad value of 35303132 which is the value of the eax register too, so we probably took this code path. Just to confirm the current value of eax, I’m dumping the register which should mirror the results for eax when using the “r” command to get the full register set.


    2: kd> r eax

    Last set context:



    Now our focus moves to why dword ptr [esi+68h] holds the bad value. Without source code this can be challenging to narrow down; however, the !pool command comes in handy for cases like this.


    2: kd> ? esi+68h

    Evaluate expression: -47917568 = fd24d600
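
    The negative number in the output is simply the same 32-bit value read as a signed integer. A minimal Python sketch of what the ? command printed, using the esi value from the trap frame (the helper name to_signed32 is hypothetical):

```python
def to_signed32(value):
    """Interpret a 32-bit value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


esi = 0xFD24D598                      # esi from the trap frame
address = (esi + 0x68) & 0xFFFFFFFF   # same arithmetic as "? esi+68h"

print(f"Evaluate expression: {to_signed32(address)} = {address:08x}")
```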


    Let’s examine fd24d600 in a little more detail using the !pool command. The !pool command neatly displays an entire 4 KB page of kernel memory, listing all of the allocations contained on the page. From the output we can determine that our address is allocated from NonPaged pool and holds some sort of thread data, evidenced by the Thre tag next to our allocation. Notice the asterisk next to fd24d578 indicating the start of our pool allocation. Virtual address fd24d578 is the beginning of an 8 byte pool header, and the header is followed by the actual data blob. Be aware that not all memory is allocated from the pool, so the !pool command is not always useful. I have more information on !pool later in the blog.


    2: kd> !pool fd24d600

    Pool page fd24d600 region is Nonpaged pool

     fd24d000 size:  270 previous size:    0  (Allocated)  Thre (Protected)

     fd24d270 size:   10 previous size:  270  (Free)       `.lk

     fd24d280 size:   40 previous size:   10  (Allocated)  Ntfr

     fd24d2c0 size:   20 previous size:   40  (Free)       CcSc

     fd24d2e0 size:  128 previous size:   20  (Allocated)  PTrk

     fd24d408 size:  128 previous size:  128  (Allocated)  PTrk

     fd24d530 size:    8 previous size:  128  (Free)       Mdl

     fd24d538 size:   28 previous size:    8  (Allocated)  Ntfn

     fd24d560 size:   18 previous size:   28  (Free)       Muta


    *fd24d578 size:  270 previous size:   18  (Allocated) *Thre (Protected) <-- our pool


     fd24d7e8 size:  428 previous size:  270  (Allocated)  Mdl

     fd24dc10 size:   30 previous size:  428  (Allocated)  Even (Protected)

     fd24dc40 size:   30 previous size:   30  (Allocated)  TCPc

     fd24dc70 size:   18 previous size:   30  (Free)       SeTd

     fd24dc88 size:   28 previous size:   18  (Allocated)  Ntfn

     fd24dcb0 size:  128 previous size:   28  (Allocated)  PTrk

     fd24ddd8 size:  228 previous size:  128  (Allocated)  tdLL
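
    To make the layout concrete, here is a small Python sketch that finds the block containing an address and works out where the address sits inside it. The block list is a hand-copied subset of the !pool output above, and find_block and the constant names are hypothetical.

```python
POOL_HEADER_SIZE = 8  # the 8 byte pool header described above

# (start address, size in bytes, state, tag) copied from the !pool output
POOL_BLOCKS = [
    (0xFD24D538, 0x028, "Allocated", "Ntfn"),
    (0xFD24D560, 0x018, "Free", "Muta"),
    (0xFD24D578, 0x270, "Allocated", "Thre"),
    (0xFD24D7E8, 0x428, "Allocated", "Mdl"),
]


def find_block(address, blocks):
    """Return the block whose [start, start + size) range contains address, or None."""
    for start, size, state, tag in blocks:
        if start <= address < start + size:
            return start, size, state, tag
    return None


start, size, state, tag = find_block(0xFD24D600, POOL_BLOCKS)
offset_in_block = 0xFD24D600 - start
offset_in_data = offset_in_block - POOL_HEADER_SIZE

print(f"{start:08x} {tag} ({state}), block offset {offset_in_block:#x}, data offset {offset_in_data:#x}")
```

    The same list also shows that each block ends exactly where the next one begins, which is what a healthy page looks like from the pool header point of view.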


    I’ll dump out the contents of the allocation using the dc command starting at the pool header for this block of memory. Remember, we expect to move a value from [esi+68] into eax. Later the code dereferences [eax+18] which leads me to believe that eax is the base of a structure. So we expect a valid Kernel mode value to be moved into eax rather than something like a string, otherwise the code wouldn’t dereference an offset.



    2: kd> dc fd24d578

    fd24d578  0a4e0003 e5726854 00000003 00000002  ..N.Thr.........

    fd24d588  eb10ee70 20000000 e08b5c60 eb136f96  p...... `\...o..

    fd24d598  006e0006 00000000 fd24d5a0 fd24d5a0  ..n.......$...$.

    fd24d5a8  fd24d5a8 fd24d5a8 f4707000 f4704000  ..$...$..pp..@p.

    fd24d5b8  f4706d48 00000000 fd24d700 fd24d700  Hmp.......$...$.

    fd24d5c8  fd24d5c8 fd24d5c8 fd270290 01000100  ..$...$...'.....

    fd24d5d8  00000002 00000000 00000001 01000a02  ................

    fd24d5e8  00000000 fd24d640 32110000 0200009f  ....@.$....2....

    2: kd> dc

    fd24d5f8  00000000 20202020 32313532 000a6953  ....   25125Si.. <-- appears to be a string.

    fd24d608  20202020 20202020 20202020 5c4e4556              VEN\

    fd24d618  32313532 20202020 20202020 20202020  2512           

    fd24d628  00000000 00000000 00000000 00000000  ................

    fd24d638  00000000 00000000 fd24d78c fd24d78c  ..........$...$.

    fd24d648  00000000 fd24d784 fd24d640 30010000  ......$.@.$....0

    fd24d658  00343033 00000000 00000000 00000000  304.............

    fd24d668  00000000 01000000 00000000 00000000  ................

    2: kd> dc

    fd24d678  fd24d598 00000000 00000000 00000000  ..$.............

    fd24d688  fd24d618 fd24d618 fd24d598 fd24d610  ..$...$...$...$.

    fd24d698  00000000 00010102 00000000 00000000  ................

    fd24d6a8  00000000 00000000 e08aeee0 00000000  ................

    fd24d6b8  00000801 0000000f fd270290 0000000f  ..........'.....

    fd24d6c8  fd24d5c0 fd24d6d0 00000000 00000000  ..$...$.........

    fd24d6d8  00000000 00000000 00000000 00000000  ................

    fd24d6e8  00000000 00000000 f4707000 06300612  .........pp...0.


    Examining the memory contents above you can clearly see a string overwrite starting around 0xfd24d5f8. The memory we dereferenced, fd24d600 or [esi+68], is right in the middle of the string. The string appears to be a vendor number for a piece of hardware. After examining the setupapi.log and the OEM**.inf  files in the Windows\inf directory we found a similar string and narrowed it down to a third party.
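
    A quick way to confirm that a suspicious value is text rather than a pointer is to decode it as little-endian bytes, which is what the right-hand column of the dc output does. A minimal Python sketch (the helper name dwords_to_ascii is hypothetical):

```python
import struct


def dwords_to_ascii(dwords):
    """Render 32-bit values as the ASCII column of a dc dump (little-endian bytes)."""
    raw = b"".join(struct.pack("<I", d) for d in dwords)
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


print(repr(dwords_to_ascii([0x35303132])))   # the bad value loaded into eax
print(repr(dwords_to_ascii([0x20202020, 0x20202020, 0x20202020, 0x5C4E4556])))
print(repr(dwords_to_ascii([0xFD24D640])))   # a real pointer, for contrast
```

    The bad eax value decodes to four printable digits, and the dwords at fd24d608 decode to padding followed by VEN\, whereas a genuine kernel pointer decodes mostly to unprintable bytes.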


    A little more on the !pool command is important to mention. The memory address of interest may not always be allocated from the pool in which case you would encounter a message similar to this.


    0: kd> !pool 80000ae5

    Pool page 80000ae5 region is Unknown

    80000000 is not a valid large pool allocation, checking large session pool...

    80000000 is freed (or corrupt) pool

    Bad allocation size @80000000, too large



    *** An error (or corruption) in the pool was detected;

    *** Pool Region unknown (0xFFFFFFFF80000000)


    *** Use !poolval 80000000 for more details.



    If this had been the case I would have enabled Special Pool to narrow down the culprit.


  • Ntdebugging Blog

    Tracking down MmSt paged pool usage



    A trend that I’ve noticed recently is cases of paged pool depletion with high MmSt tag usage that remains after trying KB304101 (PoolUsageMaximum). These pool allocations are used by the memory manager for section object prototype PTEs. There are generally only two options when this happens: 1) upgrade to a 64-bit platform, or 2) reduce the size of the volumes. But we may want to know which mapped files are using this memory. Here is how it can be done. Start with !memusage.


    5: kd> !memusage

     loading PFN database

    loading (100% complete)

    Compiling memory usage data (99% Complete).

                 Zeroed:  19073 ( 76292 kb)

                   Free:      0 (     0 kb)

                Standby: 1468824 (5875296 kb)

               Modified:    368 (  1472 kb)

        ModifiedNoWrite:   1927 (  7708 kb)

           Active/Valid: 605772 (2423088 kb)

             Transition:      0 (     0 kb)

                    Bad:      0 (     0 kb)

                Unknown:      0 (     0 kb)

                  TOTAL: 2095964 (8383856 kb)

      Building kernel map

      Finished building kernel map

    Scanning PFN database - (100% complete)


    Following this you will see the list of mapped files and their control areas.


      Usage Summary (in Kb):

    Control Valid Standby Dirty Shared Locked PageTables  name

    8c62a638  1108  945868  3064     0     0     0  mapped_file( $Mft )


    The control area is the address at the far left and has a Segment field that contains the total number of PTEs.


    5: kd> dt 8c62a638 _CONTROL_AREA Segment->TotalNumberOfPtes


       +0x000 Segment                    :

          +0x004 TotalNumberOfPtes          : 0x1e8b00


    The MmSt allocations contain these PTEs, so all we need to do is multiply this by the size of a PTE to get the total size of the MmSt allocations for this control area. Note that there may be multiple allocations for this control area, but this number will reflect the total size of all these allocations.


    5: kd> ?? 0x1e8b00 * sizeof(nt!_MMPTE)

    unsigned int 0xf45800
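
    The same arithmetic can be reproduced outside the debugger. In this Python sketch the PTE size of 8 bytes is an assumption inferred from the result above (0xf45800 / 0x1e8b00), and mmst_bytes is a hypothetical helper name.

```python
PTE_SIZE = 8  # assumed sizeof(nt!_MMPTE) on this system, inferred from the output above


def mmst_bytes(total_ptes, pte_size=PTE_SIZE):
    """Bytes of MmSt pool needed to hold the prototype PTEs of one control area."""
    return total_ptes * pte_size


size = mmst_bytes(0x1E8B00)
print(hex(size), f"{size / (1024 * 1024):.2f} MB")
```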


    So now we know the MmSt size in bytes for a single control area, or mapped file. What if we would like to see the totals for all mapped files from the !memusage output? First, place the !memusage output in a text file and remove all header information. You will also need to remove all tail information including the page file and process summaries. Every line should look like these.


    8c62a638  1108  945868  3064     0     0     0  mapped_file( $Mft )

    8b06ac18   516      0     0     0     0     0    No Name for File


    We want to include the “No Name for File” entries since those are valid mapped files even though the name could not be located. Next strip out everything but the control area address. You can use Excel or any other tool that allows you to select and delete columns in a text file. Now we have a file with a single column of all the control areas on the system. The following debugger command script can be used to process this file.
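
    If you would rather not use Excel, the column stripping can be scripted. The following Python sketch keeps only the control area address from each data line and skips anything that does not look like one; the function name, the regular expression and the optional name filter are my own assumptions about the layout shown above, not part of any debugger tooling.

```python
import re

# A data line starts with the control area address followed by numeric columns.
DATA_LINE = re.compile(r"^\s*([0-9a-fA-F]{8,16})\s+\d+\s+\d+")


def extract_control_areas(lines, name_filter=None):
    """Return the control area addresses found in saved !memusage output lines."""
    areas = []
    for line in lines:
        match = DATA_LINE.match(line)
        if match is None:
            continue  # header, summary, or blank line
        if name_filter is not None and name_filter not in line:
            continue  # keep only entries mentioning name_filter, e.g. "$Mft"
        areas.append(match.group(1))
    return areas


sample = [
    "Control Valid Standby Dirty Shared Locked PageTables  name",
    "8c62a638  1108  945868  3064     0     0     0  mapped_file( $Mft )",
    "8b06ac18   516      0     0     0     0     0    No Name for File",
]
print("\n".join(extract_control_areas(sample)))
```

    Writing the returned addresses one per line to memusage.txt gives the single-column file that the command script expects.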


    $$ countptes.txt script

    r $t2 = 0;


    $$ Replace the memusage.txt file name with your file name.

    .foreach /f (ca "memusage.txt") {

        r $t1 = @@c++(((nt!_CONTROL_AREA *)(0x${ca}))->Segment->TotalNumberOfPtes);

        .printf "Control Area %p : %d\n", ${ca}, @$t1;

        r $t2 = @$t2 + @$t1;

    }

    .printf "Total PTEs : %d\n", @$t2;

    .printf "MmSt size  : %d bytes\n", (@$t2 * @@c++(sizeof(nt!_MMPTE)));


    The following command will execute the script.


    5: kd> $$><countptes.txt


    This will show the number of PTEs for each control area, followed by a summary.


    Total PTEs : 62790244

    MmSt size  : 502321952 bytes


    A common high user of MmSt allocations is $Mft. The cache manager will hold the MmSt allocations for these file system metadata files at a cost of up to 4 files per PTE. This technique can be used to determine how much $Mft is using MmSt pool memory by first using findstr at a command prompt to isolate just those values from the !memusage output.


    C:\Projects>findstr /c:"$Mft" memusage.txt >mftusage.txt


    After stripping out the control area addresses with Excel and running the command script you’ll have the size of the MmSt allocations for just the $Mft files. If this is consuming most of the MmSt bytes then you are limited to the options mentioned at the beginning of this article. There may be other options if something else is the primary user but it will likely involve reducing some heavy load on the system.



  • Ntdebugging Blog

    Work Queues and Dispatcher Headers




    Hi everyone, Bob here again with a description of Work Queues and Dispatcher Headers.  For those of you who look at dumps, you may have noticed that there are always threads waiting at KeRemoveQueue.  You may have wondered what this function does.  Well, I’m glad you asked… :)



    What are those threads doing?

    Those threads waiting on the Remove Queue are worker threads.  Worker threads are used when a system task cannot or does not want to do a particular task.  For example, a thread running a DPC cannot pend and wait for a task to be done, so it sends the work to a worker thread.



    How does this mechanism work?

    The worker thread and the entities that are going to give the worker thread work, each know of a KQUEUE structure.  The KQUEUE structure is initialized and, since the queue has an embedded dispatcher object, the worker thread pends on it waiting to be signaled.  That is what you see on one of these waiting stacks.


    Below is a KQUEUE:


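    typedef struct _KQUEUE {
        DISPATCHER_HEADER Header;    // embedded dispatcher object the worker threads wait on
        LIST_ENTRY EntryListHead;    // entries (work items) inserted into the queue
        ULONG CurrentCount;
        ULONG MaximumCount;
        LIST_ENTRY ThreadListHead;
    } KQUEUE, *PKQUEUE;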



    Below is an example of a waiter:


    Priority 9 BasePriority 9 PriorityDecrement 0

     Child-SP          RetAddr           Call Site

     fffffadc`b053dab0 fffff800`01027752 nt!KiSwapContext+0x85

     fffffadc`b053dc30 fffff800`01024ef0 nt!KiSwapThread+0x3c9    <-- Waits on the dispatcher object

     fffffadc`b053dc90 fffffadc`b9a380b0 nt!KeRemoveQueue+0x656

     fffffadc`b053dd10 fffff800`0124b972 srv!WorkerThread+0xb0

     fffffadc`b053dd70 fffff800`010202d6 nt!PspSystemThreadStartup+0x3e

     fffffadc`b053ddd0 00000000`00000000 nt!KxStartSystemThread+0x16



    What is a dispatcher object?

    A dispatcher object can be passed into kernel routines such as KeWaitForSingleObject.  This object is a synchronization object.  This means that a thread can wait on this object until another thread “signals” it.  The function KeRemoveQueue is waiting for its dispatcher object to be “signaled”.


    Below is a dispatcher object.  Basically threads are queued on this object until the object is “signaled”.  Once that happens the waiting thread is readied for execution.



       +0x000 Type             : UChar

       +0x001 Absolute         : UChar

       +0x001 NpxIrql          : UChar

       +0x002 Size             : UChar

       +0x002 Hand             : UChar

       +0x003 Inserted         : UChar

       +0x003 DebugActive      : UChar

       +0x000 Lock             : Int4B

       +0x004 SignalState      : Int4B              <-- Set when the object is signaled.

       +0x008 WaitListHead     : _LIST_ENTRY        <-- List of waiters on this object.



    Below is an actual dispatcher object for a queue:


    5: kd> dt nt!_dispatcher_header  fffffadcdb3ed368


       +0x000 Type             : 0x4 ''

       +0x001 Absolute         : 0 ''

       +0x001 NpxIrql          : 0 ''

       +0x002 Size             : 0x10 ''

       +0x002 Hand             : 0x10 ''

       +0x003 Inserted         : 0 ''

       +0x003 DebugActive      : 0 ''

       +0x000 Lock             : 1048580

       +0x004 SignalState      : 0

       +0x008 WaitListHead     : _LIST_ENTRY [ 0xfffffadc`db3f4ce8 - 0xfffffadc`da74dce8 ]   <-- List of threads waiting for this object
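
    Notice that Lock sits at the same offset as Type, so it is just another view of the first four bytes of the header. The following Python sketch splits the Lock value from the dump above back into those byte-sized fields; the function name is hypothetical, and only one field name per overlapping offset is shown.

```python
import struct


def decode_dispatcher_lock(lock):
    """Split the 32-bit Lock value into the four byte-sized fields that overlay it."""
    type_, absolute, size, inserted = struct.pack("<i", lock)
    return {"Type": type_, "Absolute": absolute, "Size": size, "Inserted": inserted}


fields = decode_dispatcher_lock(1048580)   # Lock value from the queue's header
print({name: hex(value) for name, value in fields.items()})
```

    The decoded Type of 0x4 and Size of 0x10 agree with the individual fields that dt printed for the queue.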



    Each thread has a Wait List entry for each object it is waiting for:


    5: kd> dt nt!_KWAIT_BLOCK 0xfffffadc`db3f4ce8

       +0x000 WaitListEntry    : _LIST_ENTRY [ 0xfffffadc`da74dce8 - 0xfffffadc`db3ed370 ]   <-- Next thread waiting for this object

       +0x010 Thread           : 0xfffffadc`db3f4bf0 _KTHREAD     <-- The thread waiting for the object

       +0x018 Object           : 0xfffffadc`db3ed368              <-- The object the thread is waiting for (queue object)

       +0x020 NextWaitBlock    : 0xfffffadc`db3f4ce8 _KWAIT_BLOCK <-- Next object this thread is waiting for  (thread 0xfffffadc`db3f4bf0) if any.

       +0x028 WaitKey          : 0

       +0x02a WaitType         : 0x1 ''

       +0x02b SpareByte        : 0 ''

       +0x02c SpareLong        : 1533340



    What wakes up or signals the thread?

    While the thread is waiting, another component can call KeInsertQueue to insert an entry into the work queue.  When that happens the thread is woken up, the system removes the entry from the work queue, and the thread’s call to KeRemoveQueue returns with that entry.  If no thread is waiting when KeInsertQueue is called, the entry is simply placed in the queue and the next call to KeRemoveQueue will not pend.
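
    The behaviour is easy to model in user mode. The sketch below is only an analogy written in Python: queue.Queue stands in for the KQUEUE, get() for KeRemoveQueue and put() for KeInsertQueue. The None sentinel and the doubling of each item are inventions of this sketch and have no kernel counterpart.

```python
import queue
import threading

work_queue = queue.Queue()  # stands in for the KQUEUE
results = []


def worker_thread():
    while True:
        item = work_queue.get()   # like KeRemoveQueue: blocks while the queue is empty
        if item is None:          # sentinel used only by this sketch to stop the worker
            break
        results.append(item * 2)  # "process" the work item


# Inserted before the worker exists: its first get() returns without waiting.
work_queue.put(1)

worker = threading.Thread(target=worker_thread)
worker.start()

work_queue.put(2)     # like KeInsertQueue: wakes the worker if it is waiting
work_queue.put(None)
worker.join()

print(results)
```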



    What about synchronization objects?

    When one thread wants to synchronize with another, a synchronization object (such as an event) is used.  When a thread waits for an event, another thread will signal the event when a job is done such as I/O.  The dispatcher objects above are used for all the synchronization objects.  As you can see by how the structures are designed, one thread can wait for many objects.


    Below is a thread that is waiting for a synchronization object.



    THREAD fffffadff752b040  Cid 0004.2858  

        fffffadcbe1c3768  NotificationEvent         <-- Object thread is waiting for.

    Not impersonating

    DeviceMap                 fffffa80000840f0

    Owning Process            fffffadce06e15a0       Image:         System

    Wait Start TickCount      49664324       Ticks: 247591 (0:01:04:28.609)

    Context Switch Count      1

    UserTime                  00:00:00.000

    KernelTime                00:00:00.000

    Start Address EmcpBase (0xfffffadcbe22d810)

    Stack Init fffffadcb870be00 Current fffffadcb870b940

    Base fffffadcb870c000 Limit fffffadcb8706000 Call 0

    Priority 8 BasePriority 8 PriorityDecrement 0

    Child-SP          RetAddr           : Args to Child                                                           : Call Site

    fffffadc`b870b980 fffff800`01027752 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSwapContext+0x85

    fffffadc`b870bb00 fffff800`0102835e : 00000000`00000000 00000000`00000000 fffffadf`f752b0d8 fffffadf`f752b040 : nt!KiSwapThread+0x3c9

    fffffadc`b870bb60 fffffadc`be21832b : 00000000`00000000 fffff800`00000000 00000000`00000000 fffffadc`be88b100 : nt!KeWaitForSingleObject+0x5a6

    fffffadc`b870bbe0 fffffadc`be1bd0da : 00000000`00000004 00000000`00000000 fffffadc`be239c40 00000000`00000000 : EmcpBase+0xb32b

    fffffadc`b870bc20 fffffadc`be22c9a1 : 00000000`00000000 fffffadc`b870bd08 fffffadc`be239c40 fffffadc`e06f6fe0 : EmcpMPAA+0xd0da

    fffffadc`b870bc70 fffffadc`be22d90b : fffffadc`da2338c0 00000000`00000001 fffffadc`d9eb3c10 fffffadc`b870bd08 : EmcpBase+0x1f9a1

    fffffadc`b870bce0 fffff800`0124b972 : fffffadc`d9f85780 fffffadf`f752b040 00000000`00000080 fffffadf`f752b040 : EmcpBase+0x2090b

    fffffadc`b870bd70 fffff800`010202d6 : fffff800`011b1180 fffffadf`f752b040 fffff800`011b5500 00000000`00000000 : nt!PspSystemThreadStartup+0x3e

    fffffadc`b870bdd0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16


    Dispatcher header from address above:


    5: kd> dt nt!_dispatcher_header    fffffadcbe1c3768


       +0x000 Type             : 0 ''

       +0x001 Absolute         : 0 ''

       +0x001 NpxIrql          : 0 ''

       +0x002 Size             : 0x6 ''

       +0x002 Hand             : 0x6 ''

       +0x003 Inserted         : 0 ''

       +0x003 DebugActive      : 0 ''

       +0x000 Lock             : 393216

       +0x004 SignalState      : 0

       +0x008 WaitListHead     : _LIST_ENTRY [ 0xfffffadf`f752b138 - 0xfffffadf`f752b138 ]


    Wait block for this thread:


    5: kd> dt 0xfffffadf`f752b138 _KWAIT_BLOCK


       +0x000 WaitListEntry    : _LIST_ENTRY [ 0xfffffadc`be1c3770 - 0xfffffadc`be1c3770 ]

       +0x010 Thread           : 0xfffffadf`f752b040 _KTHREAD

       +0x018 Object           : 0xfffffadc`be1c3768

       +0x020 NextWaitBlock    : 0xfffffadf`f752b138 _KWAIT_BLOCK

       +0x028 WaitKey          : 0

       +0x02a WaitType         : 0x1 ''

       +0x02b SpareByte        : 0 ''

       +0x02c SpareLong        : 1
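
    The pointers in these dumps can be cross-checked with a little arithmetic. In the Python sketch below, the 0x8 offset of WaitListHead comes from the dt output; the helper names are hypothetical, and the addresses are copied from the two examples above.

```python
WAIT_LIST_HEAD_OFFSET = 0x08  # offset of WaitListHead in the dispatcher header


def wait_list_head(object_address):
    """Address of the WaitListHead embedded in a dispatcher object."""
    return object_address + WAIT_LIST_HEAD_OFFSET


def is_last_waiter(wait_block_flink, object_address):
    """A wait block is the last one when its Flink points back at the list head."""
    return wait_block_flink == wait_list_head(object_address)


# Event example: a single waiter, so the wait block links straight back to the head.
event = 0xFFFFFADCBE1C3768
print(is_last_waiter(0xFFFFFADCBE1C3770, event))

# Queue example: the first wait block points on to another waiter.
queue_object = 0xFFFFFADCDB3ED368
print(is_last_waiter(0xFFFFFADCDA74DCE8, queue_object))

# In the event example the wait block lies inside the waiting thread itself.
print(hex(0xFFFFFADFF752B138 - 0xFFFFFADFF752B040))
```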




    I hope this gives a better understanding of Work Queues and Dispatcher Headers.  More detailed information can be found here:  http://msdn2.microsoft.com/en-us/library/ms810047.aspx and here:  http://www.microsoft.com/whdc/driver/kernel/locks.mspx.


  • Ntdebugging Blog

    NTDebugging Puzzler 0x00000005 (Better late than never)


    Hello NTDebuggers, from time to time we see the following problem.  It’s another access violation, and the debug notes below are from a minidump.   


    Here is what we need to know…


    • Generally speaking, what happened to cause this AV?

    • What method would you use to isolate the root cause of the failure?


    There are a lot of ways to do this.  We look forward to hearing your approach.


    We will post our methods and answer at the end of the week.   If you need anything please let us know.




    Microsoft (R) Windows Debugger  Version 6.8.0001.0

    Copyright (c) Microsoft Corporation. All rights reserved.



    Loading Dump File [D:\test123.dmp]

    User Mini Dump File: Only registers, stack and portions of memory are available


    0:000> k 123

    ChildEBP RetAddr 

    0017f93c 75e4edb5 ntdll!ZwWaitForMultipleObjects+0x15

    0017f9d8 75e430c3 kernel32!WaitForMultipleObjectsEx+0x11d

    0017f9f4 75ef2084 kernel32!WaitForMultipleObjects+0x18

    0017fa60 75ef22b1 kernel32!WerpReportFaultInternal+0x16c

    0017fa74 75ebbe60 kernel32!WerpReportFault+0x70

    0017fb00 7732d15a kernel32!UnhandledExceptionFilter+0x1c1

    0017fb08 773000c4 ntdll!_RtlUserThreadStart+0x6f

    0017fb1c 77361d05 ntdll!_EH4_CallFilterFunc+0x12

    0017fb44 772eb6d1 ntdll!_except_handler4+0x8e

    0017fb68 772eb6a3 ntdll!ExecuteHandler2+0x26

    0017fc10 772cee57 ntdll!ExecuteHandler+0x24

    0017fc10 10011127 ntdll!KiUserExceptionDispatcher+0xf

    *** ERROR: Module load completed but symbols could not be loaded for crash3.exe

    WARNING: Frame IP not in any known module. Following frames may be wrong.

    0017ff40 0040104a 0x10011127

    0017ffa0 75eb19f1 crash3+0x104a

    0017ffac 7732d109 kernel32!BaseThreadInitThunk+0xe

    0017ffec 00000000 ntdll!_RtlUserThreadStart+0x23


    0:000> lm

    start    end        module name

    00400000 0040d000   crash3     (no symbols)          

    6c250000 6c288000   odbcint    (deferred)            

    6c290000 6c2f5000   odbc32     (deferred)            

    72a00000 72a86000   comctl32   (deferred)            

    74820000 749b4000   comctl32_74820000   (deferred)            

    75240000 75251000   samlib     (deferred)            

    75260000 75281000   ntmarta    (deferred)            

    754b0000 75510000   secur32    (deferred)            

    75510000 75570000   imm32      (deferred)            

    75700000 75790000   gdi32      (deferred)            

    757a0000 75870000   user32     (deferred)            

    758a0000 758a6000   nsi        (deferred)            

    758b0000 759f4000   ole32      (deferred)            

    75a00000 75aaa000   msvcrt     (deferred)            

    75ab0000 75ba0000   rpcrt4     (deferred)            

    75ba0000 75c1d000   usp10      (deferred)            

    75c20000 75c75000   shlwapi    (deferred)            

    75d60000 75e27000   msctf      (deferred)            

    75e30000 75f40000   kernel32   (pdb symbols)  

    76140000 76189000   Wldap32    (deferred)            

    76190000 7624f000   advapi32   (deferred)            

    76250000 76d1e000   shell32    (deferred)             

    76d20000 76d94000   comdlg32   (deferred)            

    76da0000 76dcd000   ws2_32     (deferred)            

    77280000 77287000   psapi      (deferred)            

    77290000 77299000   lpk        (deferred)            

    772b0000 77400000   ntdll      (pdb symbols)     


    Good luck and happy debugging.





    [Update: our answer. Posted 5/13/2008]


    We enjoyed seeing different people’s approaches on this week’s puzzler.   This was a simple module unload.  We loaded a lib, did a GetProcAddress, freed the lib, and called the function.  The dump was a mini dump created via .dump /m C:\dump file.   There are various ways this type of scenario may arise.   Obviously someone could unload a lib, but why?    In most cases I’ve seen, it was due to a ref count problem in a COM object.  Poor accounting leads to one too many decrements, the DLL gets unloaded, and the next call into it produces this simple crash footprint.
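
    The clue in the minidump is that the faulting frame does not fall inside any module that is still loaded. A minimal Python sketch of that check, using a hand-copied subset of the lm output above (find_module is a hypothetical helper):

```python
# (start, end, name) copied from the lm output; only a subset is listed here.
MODULES = [
    (0x00400000, 0x0040D000, "crash3"),
    (0x75E30000, 0x75F40000, "kernel32"),
    (0x772B0000, 0x77400000, "ntdll"),
]


def find_module(address, modules):
    """Return the name of the module whose [start, end) range contains address."""
    for start, end, name in modules:
        if start <= address < end:
            return name
    return None


for address in (0x10011127, 0x0040104A, 0x75EB19F1):
    print(f"{address:08x} -> {find_module(address, MODULES)}")
```

    The caller at 0040104a resolves to crash3, but 10011127 resolves to nothing, which is what you would expect when code jumps into a DLL that has already been freed.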


    There are quite a few ways to track this down.  First of all, if you had the debugger attached and got a full dump or /ma dump you would have seen the unloaded module list.  This would have been a dead giveaway and is part of why we did the .dump /m.  There are other options you can enable that make tracking of module loads easy under the debugger.  I personally like “loader snaps” if I’m trying to track down module load shenanigans.   To enable this, just go into the image section of the gflags tool and enable loader snaps for the exe in question.  Now attach a debugger and watch the module load and GetProcAddress details scroll by.


    Yet another popular approach is to use process monitor.  This tool is not only easy to set up, but it also gives you great logs with call stacks and other details such as registry accesses.  


    This puzzler provided the bare minimum data required.  We did not give you much to go on because sometimes in real debugging scenarios you have to work with a lack of data.   I really liked how many people questioned the source of the dump file.  It really shows how familiar you all are with the various dump types.


    Great work!


  • Ntdebugging Blog

    How to have a colorful relationship with your dump files


    Hello NTDebuggers…  I look at a lot of dump files every day.  This being the case, I like to take full advantage of the customizable look and feel of windbg.   I actually have an association set up between DMP files and a CMD file that loads my customized COLOR workspace each time I double click on a dump file.  I like a black background with colorized source and debug command window output.   

    Here is a snapshot of my typical debug session.


    Here is how you set it up.

    1) Create the following CMD file and put it in your path.  It’s called D.CMD on my system.

    @echo off
    Title kd.exe -z %1
    start C:\Debuggers\windbg.exe -z "%~1" -W color

    2) Load windbg and under view \ options configure your color options.  Note: you will want to set every option in the list box.


    3) Save the workspace by selecting FILE \ Save Workspace.  I saved mine as COLOR.

    4) If you don’t have a dump file create one by opening a process in the debugger and doing a .dump /ma C:\test\test.dmp (I’ll assume you have a directory called C:\test)

    5) Select the C:\test\test.dmp file in explorer and right click on the properties. (note I used badwindow.dmp)


    6) Change the dump file association via drilling down to Change \ Browse \ (Select D.CMD)


    7) That’s all there is to it.  Now you can simply double click on any DMP file and windbg will start and load with your COLOR workspace.

    Good luck and happy debugging.

    Jeff Dailey-

  • Ntdebugging Blog

    Announcement: ODbgExt (Open Debugger Extension) on CodePlex



    Hello NTDebuggers, I’d like to announce something new for our community to share.  We have decided to host an Open Source Debugger Extension project called ODbgExt on codeplex.com.  Right now it’s just the basic framework.  This will be something we can work on together as a community.  Think of it as a debugger extension by the people for the people.  Jason Epperly and I (Jeff Dailey) will be the project managers for the codeplex environment.   You can find the project at http://www.codeplex.com/ODbgExt



    Project Description

    ODbgExt is an open source debugger extension for WinDbg that is intended to be developed by the debugging community. It is based on the Windows Debugger SDK Sample ‘exts’ and uses the COM Debugger interface IDebugControl. The primary objective of the project is to provide the debugging community a central location to share their debugger extensions that help to isolate common problems in the community and make debugging both live systems and dumps easier. The initial release will be the basic framework that we expect other developers to contribute to. There will be very basic functionality in the initial version. We, GES (Global Escalation Services), intend to do a series of blogs about writing debugger extensions over the coming year. As we blog about writing extensions we will include the code in ODbgExt (this project). This is the same group that runs the http://blogs.msdn.com/ntdebugging blog. We encourage you to sign up and contribute your debugger extension ideas and/or code to the project. Simply create a codeplex account and request access.

    Things we would like to include:

    • A Graphical representation of Kernel, and User mode execution time by process and thread, while correlating to idle / non-idle time
    • A Graphical representation of idle time for kernel and user mode.
    • A Graphical representation of pool resources used on a per-process basis along with handle table counts.
    • Hang detection, scanning for various conditions that could cause hangs in a system or user mode process.
    • Detailed system information such as oldest and newest binaries.
    • Binary info based on vendor name
    • A better disassembler that colorizes calls, jumps and indents to show code flow
    • Extensive use of DML (Debugger Mark-up Language) to enable more point and click debugging within windbg.
    • Support for a SQL Database backend to allow storing information about debugging sessions or binaries in a SQL Database.
    • Warnings when critical thresholds are exceeded such as handle counts over 10,000, Low PTE Conditions, etc.
    • Simplified searching for pool tags in binaries.
    • Support for VBA for WinDbg
    • Dump annotation, via dump streaming; the ability to embed data into a dump via the debugger extension and later retrieve it. (Imagine embedded debug notes)
    • Embedding a snapshot of performance data in the dump at the time the dump is taken, i.e. CPU, IO etc.

    We hope you are as excited about this project as we are!  Please feel free to suggest more ideas for this project and by all means feel free to sign up and contribute some code!


    Thank you, 


    Jeff Dailey

    Platforms Global Escalation Services

  • Ntdebugging Blog

    NTDebugging Puzzler 0x00000004: This didn’t puzzle the Debug Ninja, how about you?


    Hello NTDebuggers, we have been very impressed with the responses we’ve gotten to our previous puzzlers so far.  We invited the Debug Ninja to come up with a real challenge for this week.

    This server blue screened with a Bug Check 0xD1: DRIVER_IRQL_NOT_LESS_OR_EQUAL.  The challenge this week is to tell us, as specifically as possible, why it crashed.  We’ll give you a hint: the complete answer is not that ebx+5441F815h points to an invalid address.

    1: kd> .bugcheck

    Bugcheck code 000000D1

    Arguments e074281d 00000002 00000001 ba502493


    1: kd> kv

    ChildEBP RetAddr  Args to Child

    f78b6544 ba502493 badb0d00 00000001 00000000 nt!_KiTrap0E+0x2a7 (FPO: [0,0] TrapFrame @ f78b6544)

    f78b65e0 ba50d9d8 00000020 8c32cab8 00000022 tcpip!GetAddrType+0x19f (FPO: [Non-Fpo]) (CONV: stdcall)

    f78b6694 ba50dc56 8c32cab8 8ca71c2c 000005c8 tcpip!IPRcvPacket+0x66c (FPO: [Non-Fpo]) (CONV: stdcall)

    f78b66d4 ba50dd58 00000000 8ca63440 8ca71c0a tcpip!ARPRcvIndicationNew+0x149 (FPO: [Non-Fpo]) (CONV: stdcall)

    f78b6710 bada5550 8c4b53b8 00000000 f78b678c tcpip!ARPRcvPacket+0x68 (FPO: [Non-Fpo]) (CONV: stdcall)

    f78b6764 ba9c614b 8cac2ad0 f78b6784 00000002 NDIS!ethFilterDprIndicateReceivePacket+0x1d2 (FPO: [Non-Fpo]) (CONV: stdcall)


    1: kd> .trap f78b6544

    ErrCode = 00000002

    eax=8c32cab8 ebx=8c323008 ecx=00000001 edx=00000001 esi=8ca71c18 edi=f78b6618

    eip=ba502493 esp=f78b65b8 ebp=f78b65e0 iopl=0         nv up ei pl zr na pe nc

    cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00250246


    ba502493 008b15f84154    add     byte ptr [ebx+5441F815h],cl ds:0023:e074281d=??
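
    For readers less familiar with this bug check, the four arguments printed by .bugcheck can be labeled as follows. This is a minimal Python sketch, not part of the original debug; the helper name describe_d1 is hypothetical, the parameter meanings follow the public documentation for bug check 0xD1, and only access-type values 0 and 1 are decoded.

```python
# Hypothetical helper: label the four arguments of bug check 0xD1
# (DRIVER_IRQL_NOT_LESS_OR_EQUAL) taken from the .bugcheck output above.
def describe_d1(args):
    referenced, irql, access, instruction = args
    access_names = {0: "read", 1: "write"}  # other values are left undecoded
    return {
        "memory_referenced": f"{referenced:08x}",
        "irql": irql,
        "access": access_names.get(access, f"other ({access:#x})"),
        "faulting_instruction": f"{instruction:08x}",
    }

info = describe_d1([0xE074281D, 0x00000002, 0x00000001, 0xBA502493])
for name, value in info.items():
    print(f"{name:22} {value}")

# The referenced address is simply ebx plus the displacement in the instruction.
ebx = 0x8C323008
effective_address = (ebx + 0x5441F815) & 0xFFFFFFFF
print(f"{'ebx + 5441F815h':22} {effective_address:08x}")
```

    The referenced address matches the ds:0023:e074281d operand in the trap frame, and the instruction address matches eip.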


    The remainder of this puzzler is quite long; click here to see the complete debug notes required to solve this puzzler.  Post your responses to this page (and not the complete debug page), so we can keep all the comments together.  Also, we won’t post any of your responses until Friday.

    [Update, posted 4/30/2008]

    Today is Wednesday and so far we haven’t received any complete responses; the partial answers we have received are close but no one has determined the root cause of the problem.  When looking at problems like this it is often useful to ask “why?” when you think you have the answer and think about what other information may be available.  In this situation it is possible to identify a specific problem and even determine appropriate action to fix the problem.

    Perhaps we were a bit overzealous in posting 14 pages of assembly as a puzzler.  Below is an abbreviated debug with less assembly; the link to the complete debug is still available for those of you who want to try the full challenge.  Also, we are going to break our promise not to post responses until Friday and have posted some of the responses we have gotten so far.  Neither of these responses completely identifies the problem, but they are getting close.


    1: kd> dd @ebx+5441F815

    e074281d  ???????? ???????? ???????? ????????

    e074282d  ???????? ???????? ???????? ????????

    e074283d  ???????? ???????? ???????? ????????

    e074284d  ???????? ???????? ???????? ????????

    e074285d  ???????? ???????? ???????? ????????

    e074286d  ???????? ???????? ???????? ????????

    e074287d  ???????? ???????? ???????? ????????

    e074288d  ???????? ???????? ???????? ????????

    1: kd> !pte e074281d

                   VA e074281d

    PDE at 00000000C0603818    PTE at 00000000C0703A10

    contains 000000021B980963  contains E154FC1000000400

    pfn 21b980 -G-DA--KWEV                           not valid

                           Proto: 00000000E154FC10


    1: kd> ub ba50d9d8


    ba50d9c4 51              push    ecx

    ba50d9c5 50              push    eax

    ba50d9c6 53              push    ebx

    ba50d9c7 ff75f4          push    dword ptr [ebp-0Ch]

    ba50d9ca ff75f8          push    dword ptr [ebp-8]

    ba50d9cd ff75ec          push    dword ptr [ebp-14h]

    ba50d9d0 ff7508          push    dword ptr [ebp+8]

    ba50d9d3 e84a040000      call    tcpip!DeliverToUser (ba50de22)


    1: kd> uf tcpip!DeliverToUser



    ba50ded9 8b45f0          mov     eax,dword ptr [ebp-10h]

    ba50dedc 29451c          sub     dword ptr [ebp+1Ch],eax

    ba50dedf 8b471c          mov     eax,dword ptr [edi+1Ch]

    ba50dee2 ff7508          push    dword ptr [ebp+8]

    ba50dee5 2b4510          sub     eax,dword ptr [ebp+10h]

    ba50dee8 57              push    edi

    ba50dee9 8945fc          mov     dword ptr [ebp-4],eax

    ba50deec e817ffffff      call    tcpip!UpdateIPSecRcvBuf (ba50de08)



    ba50def1 807e0906        cmp     byte ptr [esi+9],6

    ba50def5 753e            jne     tcpip!DeliverToUser+0x12f (ba50df35)



    ba50def7 8b430c          mov     eax,dword ptr [ebx+0Ch]

    ba50defa 83b88c00000007  cmp     dword ptr [eax+8Ch],7

    ba50df01 0f857a510000    jne     tcpip!DeliverToUser+0x124 (ba513081)



    ba50df07 837f1000        cmp     dword ptr [edi+10h],0

    ba50df0b 741a            je      tcpip!DeliverToUser+0x114 (ba50df27)



    ba50df0d 640fb61551000000 movzx   edx,byte ptr fs:[51h]

    ba50df15 8b8880010000    mov     ecx,dword ptr [eax+180h]

    ba50df1b 3bca            cmp     ecx,edx

    ba50df1d 0f857145ffff    jne     tcpip!DeliverToUser+0x103 (ba502494)



    ba502494 8b15f84154ba    mov     edx,dword ptr [tcpip!_imp__KeNumberProcessors (ba5441f8)]

    ba50249a 0fbe12          movsx   edx,byte ptr [edx]

    ba50249d 3bca            cmp     ecx,edx

    ba50249f 0f8582ba0000    jne     tcpip!DeliverToUser+0x114 (ba50df27)



    ba50df23 c645f402        mov     byte ptr [ebp-0Ch],2



    ba50df27 640fb60d51000000 movzx   ecx,byte ptr fs:[51h]

    ba50df2f 898880010000    mov     dword ptr [eax+180h],ecx



    ba5024a5 e979ba0000      jmp     tcpip!DeliverToUser+0x110 (ba50df23)



    [Update: our answer, posted 5/2/2008]

    In response:  I would like to congratulate the people who figured this out and say thank you to all of the people who gave this a try and sent in a response.  I’m not sure if we made a challenge that was too easy, too difficult, or too long, but we did not get as many responses as we had expected.  If you looked at this and chose not to post a comment, or you have looked at our previous puzzlers but chose to skip this one, let us know why so we can post better puzzlers in the future.

    Here is the answer…

    Experienced assembly readers will recognize that this instruction is a valid instruction, but it takes action that no assembly writer would ever take.  An offset of 5441F815 is too big to be realistic; no structure is 5441F815 bytes long.

    1: kd> .trap f78b6544

    ErrCode = 00000002

    eax=8c32cab8 ebx=8c323008 ecx=00000001 edx=00000001 esi=8ca71c18 edi=f78b6618

    eip=ba502493 esp=f78b65b8 ebp=f78b65e0 iopl=0         nv up ei pl zr na pe nc

    cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00250246


    ba502493 008b15f84154    add     byte ptr [ebx+5441F815h],cl ds:0023:e074281d=??


    Why is this instruction telling us to go to a crazy offset?  One possible cause is that the instruction is corrupt; a good debug method for this problem would be to unassemble the code around this instruction to see if the adjacent instructions look valid, and then to compare these instructions to ones in a known good version of tcpip.sys.  Because this instruction was not corrupt neither of these steps helped, so I started looking at how we got to this instruction.

    Looking at the prior function on the stack, IPRcvPacket, I see that we actually called DeliverToUser and not GetAddrType.  Why is our current instruction in GetAddrType instead of DeliverToUser?  It is possible that the call went to the wrong place, that DeliverToUser called something else which returned to the wrong place, or that somehow the flow of execution did not go as intended.

    1: kd> ub ba50d9d8


    ba50d9c4 51              push    ecx

    ba50d9c5 50              push    eax

    ba50d9c6 53              push    ebx

    ba50d9c7 ff75f4          push    dword ptr [ebp-0Ch]

    ba50d9ca ff75f8          push    dword ptr [ebp-8]

    ba50d9cd ff75ec          push    dword ptr [ebp-14h]

    ba50d9d0 ff7508          push    dword ptr [ebp+8]

    ba50d9d3 e84a040000      call    tcpip!DeliverToUser (ba50de22)


    The next step I took was to analyze the DeliverToUser assembly to see how we might have ended up in GetAddrType instead of DeliverToUser.  What I found, and what several of our readers found, was that DeliverToUser contained a jump that went to 1 byte away from where we encountered the bogus instruction.

    ba50df1d 0f857145ffff    jne     tcpip!DeliverToUser+0x103 (ba502494)


    Why would eip point to ba502493 instead of ba502494?  That is the challenging question in this exercise, because the answer is not obvious.  Some engineers will conclude that the hardware must not have done what we asked it to do, and say that the hardware must be at fault.  However, we don’t really have any proof at this time that the hardware has actually done something wrong.  Hardware problems usually manifest themselves as a bit flip, where one bit is changed from the correct value.  Here the last digit of the address has gone from a 4 (0100) to a 3 (0011); a 3 bit change is hard to blame on hardware.
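
    The bit-counting argument can be checked mechanically. The following is a small Python sketch (the function name bits_changed is hypothetical) that XORs the intended jump target with the address actually reached and counts the bits that differ.

```python
def bits_changed(a, b):
    """Return how many bit positions differ between a and b."""
    return bin(a ^ b).count("1")

intended = 0xBA502494  # target of the jne in DeliverToUser
actual = 0xBA502493    # eip recorded in the trap frame

difference = intended ^ actual
print(f"xor = {difference:#x}, bits changed = {bits_changed(intended, actual)}")
```

    Three differing bits in the destination address is what makes a single hardware bit flip in eip itself an unconvincing explanation.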

    To answer the question “Why would eip point to ba502493 instead of ba502494?” we need to carefully analyze the instruction at ba50df1d.  I broke out my trusty instruction set reference and looked up the inner workings of the jne instruction.  On page 3-542 of the Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 2A: Instruction Set Reference, A-M I found the layout for a jne to a 32 bit offset.  The first two bytes are 0F 85, and the next 4 bytes are the offset we are going to jump to.  The offset is stored in little-endian order (71 45 ff ff in memory), so written as a value it is ffff4571.  This is our opcode, with the offset shown as that value:

    0f 85 ff ff 45 71

    Remember that by the time we make this jump the instruction pointer will have been moved to the next instruction, which will be at ba50df23 (ba50df1d plus the 6 bytes that are in this instruction).  If we take this offset (ff ff 45 71) and add it to the address of the next instruction we see that we’ll end up at ba502494 as expected.

    1: kd> ?ba50df23+ffff4571

    Evaluate expression: -1169152876 = ba502494


    If we change just one bit in this opcode, if we change 71 (0111 0001) to be 70 (0111 0000), we end up at ba502493 instead.  Now we can conclude that the hardware must have made a mistake, and now that we know what bit was flipped we can confidently say that we have identified the mistake that was made.

    1: kd> ?ba50df23+ffff4570

    Evaluate expression: -1169152877 = ba502493
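
    The same arithmetic can be reproduced outside the debugger. This is a minimal Python sketch under the assumption that the six bytes shown by uf (0f 85 71 45 ff ff) are the whole instruction; the function name jcc_rel32_target is hypothetical.

```python
import struct

def jcc_rel32_target(address, code):
    """Decode a 6-byte 0F 8x rel32 conditional jump and return its target."""
    if len(code) != 6 or code[0] != 0x0F or not 0x80 <= code[1] <= 0x8F:
        raise ValueError("not a 0F 8x rel32 conditional jump")
    (rel32,) = struct.unpack("<i", code[2:6])   # little-endian signed offset
    next_ip = address + len(code)               # eip already points past the jump
    return (next_ip + rel32) & 0xFFFFFFFF

good = bytes.fromhex("0f857145ffff")            # bytes shown in the dump
flipped = bytearray(good)
flipped[2] ^= 0x01                              # flip the lowest bit: 71 -> 70

good_target = jcc_rel32_target(0xBA50DF1D, good)
bad_target = jcc_rel32_target(0xBA50DF1D, bytes(flipped))
print(f"as stored in the dump: {good_target:08x}")
print(f"with one bit flipped:  {bad_target:08x}")
```

    A single flipped bit in the stored offset moves the target by exactly one byte, which is the difference between the intended address and the eip in the trap frame.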


    We still have some unanswered questions.  The most common question I get when I’ve found a bit flip is what piece of hardware is broken?  Unfortunately I can’t answer this question with the data I have.  In this case the jne instruction was good when we read it to put it in the dump but obviously it was bad when it was executed.  Thinking about the parts needed to get the instruction from RAM and execute it I could speculate that a processor, riser board, or motherboard could be at fault but I don’t have any data that proves where the problem happened.  It is unlikely that hardware diagnostics will find this type of failure because it is intermittent.  Unless the problem manifests itself while the diagnostics are running, they will likely report that there is no problem.  In the case of this server I wasn’t able to find out specifically what failed, uptime was more important than troubleshooting so a whole new server was put in place of this one and the problem has not been seen since.

    Bit flips can be caused by any number of problems.  Problems with a processor, a processor riser board, a motherboard, or RAM are common causes.  As we have documented before in this blog, improper voltage or overheating can also cause these types of problems.  However, the potential list of bit-flip causes is nearly endless; power supplies, power cables, incoming power fluctuations, radiation from nearby electronics, even sun spots, can all cause bit flips.

    I should also explain why the stack showed that we were in GetAddrType instead of DeliverToUser.  This is because tcpip.sys has been optimized with Basic Block Tools (BBT) optimization; blocks of code from the same function are located in different pages to reduce cache misses in common code paths.  This means that blocks of the GetAddrType code are located next to the DeliverToUser code.  As demonstrated below, there is a block of GetAddrType code that ends with an unconditional jmp at ba50248f and there is a block of DeliverToUser code that starts at ba502494.

    1: kd> ub ba502494


    ba502474 f6401401        test    byte ptr [eax+14h],1

    ba502478 0f84bd760000    je      tcpip!GetAddrType+0xb4 (ba509b3b)

    ba50247e 85db            test    ebx,ebx

    ba502480 0f84bd6b0200    je      tcpip!GetAddrType+0xa8 (ba529043)

    ba502486 32c0            xor     al,al

    ba502488 e967770000      jmp     tcpip!GetAddrType+0x15a (ba509bf4)

    ba50248d 33c9            xor     ecx,ecx

    ba50248f e993770000      jmp     tcpip!GetAddrType+0x19b (ba509c27)

    1: kd> u


    ba502494 8b15f84154ba    mov     edx,dword ptr [tcpip!_imp__KeNumberProcessors (ba5441f8)]

    ba50249a 0fbe12          movsx   edx,byte ptr [edx]

    ba50249d 3bca            cmp     ecx,edx

    ba50249f 0f8582ba0000    jne     tcpip!DeliverToUser+0x114 (ba50df27)

    ba5024a5 e979ba0000      jmp     tcpip!DeliverToUser+0x110 (ba50df23)

    ba5024aa 90              nop

    ba5024ab 90              nop

    ba5024ac 90              nop
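
    To see why starting one byte early produces the bogus add instruction, the bytes of the two adjacent blocks can be decoded by hand. The sketch below is in Python and handles only the single encoding needed here (opcode 00 /r with a 32-bit displacement); the function name decode_add_rm8_r8 is hypothetical and this is not a general disassembler.

```python
import struct

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

def decode_add_rm8_r8(code):
    """Decode opcode 00 /r (ADD r/m8, r8) for the mod=10 (disp32) form only."""
    opcode, modrm = code[0], code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x00 or mod != 0b10 or rm == 0b100:
        raise ValueError("form not handled by this sketch")
    (disp32,) = struct.unpack("<I", code[2:6])   # little-endian displacement
    return f"add byte ptr [{REG32[rm]}+{disp32:08X}h],{REG8[reg]}"

jmp_block = bytes.fromhex("e993770000")      # GetAddrType jmp at ba50248f
mov_block = bytes.fromhex("8b15f84154ba")    # DeliverToUser mov at ba502494
stream = jmp_block + mov_block
start = 0xBA502493 - 0xBA50248F              # offset of the bad eip in the stream

decoded = decode_add_rm8_r8(stream[start:start + 6])
print(decoded)
```

    The last byte of the jmp (00) becomes the opcode, the mov opcode (8b) becomes the ModRM byte, and the first four bytes of the mov operand become the displacement.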


    Congratulations to those who have read this far.  Hopefully you now understand the cause of a problem that has stumped many engineers, and you have some new tools in your toolbox when you’re investigating a tricky problem.  Remember to keep asking “why?” and not to assume anything; those are the approaches I used to find the source of this problem.

  • Ntdebugging Blog

    The Debug Ninja speaks: Debugging a stop 0x20


    Hello, I am the Debug Ninja.  Recently Jeff approached me about contributing to this debugging blog, and as the Debug Ninja I felt an obligation to share at least a small amount of Ninja knowledge with the world.  Today I will start by explaining how to debug stop 20 blue screens.  Unlike typical blue screens where debugging starts with stack analysis, a stop 20 requires a different approach.


    Now you are probably wondering, “Great Debug Ninja, what is a stop 20 blue screen?”  A stop 20’s literal translation is KERNEL_APC_PENDING_DURING_EXIT.  In common language that means that we attempted to terminate a thread while Asynchronous Procedure Calls were disabled for this thread.  The operating system forces a bugcheck under these conditions because if APCs are disabled at thread termination it means a driver has a bug that disabled APCs more times than it enabled them.  Usually these bugs result in difficult to debug crashes or hangs later, so we stop the system at thread termination to make debugging easier.


    Perhaps you are now asking “How might a driver disable APCs more times than it enables them?”  Good question Grasshopper.  As described in the WDK, a driver can disable APCs by entering a critical region, a guarded region, or by raising the IRQL to APC_LEVEL or higher.  However, not all of those methods will result in a stop 20 bugcheck.  Only calls that change the APC disable count in the KTHREAD structure can result in a stop 20.  The APIs KeEnterCriticalRegion, KeWaitForSingleObject, KeWaitForMultipleObjects, KeWaitForMutexObject, and FsRtlEnterFileSystem can decrement the APC disable count.  A driver should then call KeLeaveCriticalRegion, KeReleaseMutex, or FsRtlExitFileSystem to re-enable APCs; these calls increment the APC disable count in the KTHREAD structure.
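
    The counting rule can be illustrated with a small user-mode model. This is a Python sketch only: ModelThread, BugCheck, and the method names are hypothetical stand-ins, and the real counter lives in the KTHREAD structure and is maintained by the kernel.

```python
class BugCheck(Exception):
    """Stands in for the system bug check in this user-mode model."""

class ModelThread:
    def __init__(self):
        self.kernel_apc_disable = 0   # 0 means normal kernel APCs are enabled

    def enter_critical_region(self):  # models KeEnterCriticalRegion
        self.kernel_apc_disable -= 1

    def leave_critical_region(self):  # models KeLeaveCriticalRegion
        self.kernel_apc_disable += 1

    def exit_thread(self):
        # Thread termination checks that every disable was matched by an enable.
        if self.kernel_apc_disable != 0:
            raise BugCheck(f"0x20, APC disable count = {self.kernel_apc_disable}")

balanced = ModelThread()
balanced.enter_critical_region()
balanced.leave_critical_region()
balanced.exit_thread()                # no error: the count is back to zero

leaky = ModelThread()
leaky.enter_critical_region()         # a code path that never leaves the region
message = ""
try:
    leaky.exit_thread()
except BugCheck as stop:
    message = str(stop)
print("bug check", message)
```

    The second thread never re-enables APCs, so the model raises at exit in the same way the system bugchecks at thread termination.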


    As you review the APIs mentioned above you will see that there are several ways for a driver writer to get into a situation where APCs are disabled and not re-enabled.  Many of the ways we get into this situation are difficult to debug and require instrumentation that is beyond the scope of this blog.  In this blog we are going to focus on the most common cause of a stop 20 blue screen, an orphaned ERESOURCE.  A brief review of the WDK documentation for ExAcquireResourceExclusiveLite and ExAcquireResourceSharedLite will reveal that before you can acquire an ERESOURCE you must first disable normal kernel APC delivery by calling KeEnterCriticalRegion.  This means that if you orphan an ERESOURCE you will leave the APC disable count decremented, and when the thread is terminated the system will bugcheck.


    Now you certainly want to ask “Kind Ninja, will you show me how to debug such a problem?”  Absolutely Grasshopper!


    We start by opening the dump and checking the cause of the crash.


    1: kd> .bugcheck

    Bugcheck code 00000020

    Arguments 00000000 0000fffc 00000000 00000001
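
    If we assume, as the documentation for bug check 0x20 describes, that the second argument is the thread's APC disable count, and further assume that it is displayed here as a 16-bit two's-complement value, then 0000fffc is a negative count. Both are assumptions rather than something this dump proves; the Python sketch below only performs the sign conversion, and the helper name as_signed16 is hypothetical.

```python
def as_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

apc_disable_count = as_signed16(0x0000FFFC)  # second bug check argument
print(apc_disable_count)
```

    Under those assumptions, a negative count is consistent with APCs having been disabled more times than they were enabled on the terminating thread.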


    Next we check what thread was being terminated; we can see this in the call stack as the first parameter to PspTerminateThreadByPointer.


    1: kd> kb

    ChildEBP RetAddr  Args to Child

    b5e57c80 8094c546 00000020 00000000 0000fffc nt!KeBugCheckEx+0x1b

    b5e57d18 8094c63f 00000000 00000000 8bf99330 nt!PspExitThread+0x64c

    b5e57d30 8094c991 8bf99330 00000000 00000001 nt!PspTerminateThreadByPointer+0x4b

    b5e57d54 8088978c 00000000 00000000 05c2ffb8 nt!NtTerminateThread+0x71

    b5e57d54 7c8285ec 00000000 00000000 05c2ffb8 nt!KiFastCallEntry+0xfc


    Finally we can look at the list of ERESOURCE structures with !locks to see if our thread owns any of these locks.


    1: kd> !locks


    KD: Scanning for held locks....


    Resource @ Ninja!NinjaLock (0x808a48c0)    Shared 2 owning threads

        Contention Count = 35

         Threads: 8bf99330-02<*> 8c1d19f0-01<*>


    !locks shows us that the thread in question is a shared owner of the Ninja driver’s NinjaLock.  The author of the Ninja driver needs to look at how their driver uses this ERESOURCE and determine why the lock was orphaned.  Unfortunately, that means I need to do more work.  To find the bug that caused this problem I reviewed the code that uses NinjaLock.  That code was acquiring the NinjaLock inside of a try-except block.  I forgot to release the lock in the exception handler, resulting in the orphaned lock that we see here.  I guess that’s why I’m the Debug Ninja, and not the Code Writing Ninja.
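
    The shape of this bug is easy to reproduce in any language with exceptions. The following Python sketch is an analogy, not the Ninja driver's code: ModelResource, risky_work, buggy_path, and fixed_path are hypothetical names, and the resource only tracks its owners.

```python
class ModelResource:
    """User-mode stand-in for an ERESOURCE; it tracks owners only."""
    def __init__(self):
        self.owners = []

    def acquire_shared(self, thread):
        self.owners.append(thread)

    def release(self, thread):
        self.owners.remove(thread)

def risky_work():
    raise RuntimeError("simulated exception while holding the lock")

def buggy_path(lock, thread):
    try:
        lock.acquire_shared(thread)
        risky_work()
        lock.release(thread)          # skipped when risky_work raises
    except RuntimeError:
        pass                          # handler forgets to release: lock is orphaned

def fixed_path(lock, thread):
    lock.acquire_shared(thread)
    try:
        risky_work()
    except RuntimeError:
        pass
    finally:
        lock.release(thread)          # runs on both the normal and exception paths

buggy_lock, fixed_lock = ModelResource(), ModelResource()
buggy_path(buggy_lock, "8bf99330")
fixed_path(fixed_lock, "8bf99330")
print("buggy owners:", buggy_lock.owners)
print("fixed owners:", fixed_lock.owners)
```

    In the buggy path the thread is still listed as an owner after the handler runs, which is the state !locks reported for thread 8bf99330.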

  • Ntdebugging Blog

    NTDebugging Puzzler 0x00000003 (Matrix Edition) Some assembly required.


    Hello NTdebuggers, I'm very impressed with the depth of the answers we are seeing from our readers.  As I stated in last week's response, this week's puzzler is going to be harder.  With that said let's take it up a notch.  One of the things that is really cool about being an Escalation Engineer in GES/CPR is how far we go in the pursuit of solving complex problems.  If we're debugging some Microsoft code in a kernel dump or user mode, and our quest takes us into a binary that we don't have code or symbols for, we don't stop, we forge on!  Over the years there are members of our team that have had to port to or support Alpha, PowerPC, MIPS, IA64 and x64, myself included.  As a result most of us have books for just about every mainstream processor under the sun.  It's a good idea if you're going to be debugging on these platforms to have general working knowledge of the CPUs you will encounter.  The most common CPUs we deal with are x86, followed by x64 and IA64.  Microsoft doesn't support PPC, MIPS or Alpha anymore unless you're dealing with Xbox consoles, and those are PPC.  That said, this week's challenge is to tell us what the following assembly does.  You can tell us in C, or break it down and comment on the various sections.

    Some people like crossword puzzles; most of us in GES/CPR love to esreveR reenignE assembler. Have FUN!


    “I don’t even see the code anymore”  Cypher...
    0:000> uf myfun
    puzzler3!myfun [c:\source\puzzler\puzzler3\puzzler3\puzzler3.cpp @ 20]:
       20 00cc1480 55              push    ebp
       20 00cc1481 8bec            mov     ebp,esp
       20 00cc1483 81ecf0000000    sub     esp,0F0h
       20 00cc1489 53              push    ebx
       20 00cc148a 56              push    esi
       20 00cc148b 57              push    edi
       20 00cc148c 8dbd10ffffff    lea     edi,[ebp-0F0h]
       20 00cc1492 b93c000000      mov     ecx,3Ch
       20 00cc1497 b8cccccccc      mov     eax,0CCCCCCCCh
       20 00cc149c f3ab            rep stos dword ptr es:[edi]
       26 00cc149e 8b4508          mov     eax,dword ptr [ebp+8]
       26 00cc14a1 50              push    eax
       26 00cc14a2 e803fcffff      call    puzzler3!ILT+165(_strlen) (00cc10aa)
       26 00cc14a7 83c404          add     esp,4
       26 00cc14aa 8945e0          mov     dword ptr [ebp-20h],eax
       28 00cc14ad 8b45e0          mov     eax,dword ptr [ebp-20h]
       28 00cc14b0 8945f8          mov     dword ptr [ebp-8],eax
       28 00cc14b3 eb09            jmp     puzzler3!myfun+0x3e (00cc14be)
       28 00cc14b5 8b45f8          mov     eax,dword ptr [ebp-8]
       28 00cc14b8 83e801          sub     eax,1
       28 00cc14bb 8945f8          mov     dword ptr [ebp-8],eax
       28 00cc14be 837df800        cmp     dword ptr [ebp-8],0
       28 00cc14c2 7e60            jle     puzzler3!myfun+0xa4 (00cc1524)
       30 00cc14c4 c745ec00000000  mov     dword ptr [ebp-14h],0
       30 00cc14cb eb09            jmp     puzzler3!myfun+0x56 (00cc14d6)
       30 00cc14cd 8b45ec          mov     eax,dword ptr [ebp-14h]
       30 00cc14d0 83c001          add     eax,1
       30 00cc14d3 8945ec          mov     dword ptr [ebp-14h],eax
       30 00cc14d6 8b45f8          mov     eax,dword ptr [ebp-8]
       30 00cc14d9 83e801          sub     eax,1
       30 00cc14dc 3945ec          cmp     dword ptr [ebp-14h],eax
       30 00cc14df 7d41            jge     puzzler3!myfun+0xa2 (00cc1522)
       32 00cc14e1 8b4508          mov     eax,dword ptr [ebp+8]
       32 00cc14e4 0345ec          add     eax,dword ptr [ebp-14h]
       32 00cc14e7 0fbe08          movsx   ecx,byte ptr [eax]
       32 00cc14ea 8b5508          mov     edx,dword ptr [ebp+8]
       32 00cc14ed 0355ec          add     edx,dword ptr [ebp-14h]
       32 00cc14f0 0fbe4201        movsx   eax,byte ptr [edx+1]
       32 00cc14f4 3bc8            cmp     ecx,eax
       32 00cc14f6 7e28            jle     puzzler3!myfun+0xa0 (00cc1520)
       34 00cc14f8 8b4508          mov     eax,dword ptr [ebp+8]
       34 00cc14fb 0345ec          add     eax,dword ptr [ebp-14h]
       34 00cc14fe 8a08            mov     cl,byte ptr [eax]
       34 00cc1500 884dd7          mov     byte ptr [ebp-29h],cl
       35 00cc1503 8b4508          mov     eax,dword ptr [ebp+8]
       35 00cc1506 0345ec          add     eax,dword ptr [ebp-14h]
       35 00cc1509 8b4d08          mov     ecx,dword ptr [ebp+8]
       35 00cc150c 034dec          add     ecx,dword ptr [ebp-14h]
       35 00cc150f 8a5101          mov     dl,byte ptr [ecx+1]
       35 00cc1512 8810            mov     byte ptr [eax],dl
       36 00cc1514 8b4508          mov     eax,dword ptr [ebp+8]
       36 00cc1517 0345ec          add     eax,dword ptr [ebp-14h]
       36 00cc151a 8a4dd7          mov     cl,byte ptr [ebp-29h]
       36 00cc151d 884801          mov     byte ptr [eax+1],cl
       38 00cc1520 ebab            jmp     puzzler3!myfun+0x4d (00cc14cd)
       40 00cc1522 eb91            jmp     puzzler3!myfun+0x35 (00cc14b5)
       41 00cc1524 5f              pop     edi
       41 00cc1525 5e              pop     esi
       41 00cc1526 5b              pop     ebx
       41 00cc1527 81c4f0000000    add     esp,0F0h
       41 00cc152d 3bec            cmp     ebp,esp
       41 00cc152f e820fcffff      call    puzzler3!ILT+335(__RTC_CheckEsp) (00cc1154)
       41 00cc1534 8be5            mov     esp,ebp
       41 00cc1536 5d              pop     ebp
       41 00cc1537 c3              ret

    Good luck, and happy debugging.

    Jeff Dailey-

    In response:  Wow, you folks did it again. I was worried that not many of our readers would respond.  Our entire team was very impressed with the number and quality of the responses we saw.  Congratulations go out to all those assembler gurus out there that figured out this was a simple bubble sort.  We enjoyed seeing how various people went about solving this.  Some people compiled their code as they worked on reversing the function to verify the assembler.  This is a good approach.  Others just seemed to work it out end to end.  This is the approach I usually end up using because I’m typically in the middle of a debug and don’t actually need the source.
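
    For readers who want to check their own reading of the disassembly, here is a Python transliteration of the loop structure. It is a sketch rather than the original source: it operates on a bytearray instead of a C string, and it compares unsigned byte values where the assembly uses movsx (signed char), which gives the same order for plain ASCII input.

```python
def myfun(val):
    """Bubble sort a bytearray in place, following the loops in the disassembly."""
    length = len(val)                       # call to strlen
    i = length
    while i > 0:                            # outer loop: cmp [ebp-8],0 then jle
        j = 0
        while j < i - 1:                    # inner loop: cmp [ebp-14h],eax then jge
            if val[j] > val[j + 1]:         # cmp ecx,eax; jle skips the swap
                val[j], val[j + 1] = val[j + 1], val[j]
            j += 1
        i -= 1

text = bytearray(b"puzzler")
myfun(text)
print(text.decode())
```

    Each pass of the outer loop shortens the range the inner loop scans by one, which is the usual bubble sort pattern.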


    Great work!


    Here is the answer….



    void myfun(char *val)


    00321480  push        ebp 

    00321481  mov         ebp,esp

    00321483  sub         esp,0F0h

    00321489  push        ebx 

    0032148A  push        esi 

    0032148B  push        edi 

    0032148C  lea         edi,[ebp-0F0h]

    00321492  mov         ecx,3Ch

    00321497  mov         eax,0CCCCCCCCh

    0032149C  rep stos    dword ptr es:[edi]

           int i;

           int j;

           int len;

           char t;



    0032149E  mov         eax,dword ptr [val]

    003214A1  push        eax 

    003214A2  call        @ILT+165(_strlen) (3210AAh)

    003214A7  add         esp,4

    003214AA  mov         dword ptr [len],eax


           for (i=len;i>0;i--)

    003214AD  mov         eax,dword ptr [len]

    003214B0  mov         dword ptr [i],eax

    003214B3  jmp         myfun+3Eh (3214BEh)

    003214B5  mov         eax,dword ptr [i]

    003214B8  sub         eax,1

    003214BB  mov         dword ptr [i],eax

    003214BE  cmp         dword ptr [i],0

    003214C2  jle         myfun+0A4h (321524h)



                  for (j=0;j<i-1;j++)

    003214C4  mov         dword ptr [j],0

    003214CB  jmp         myfun+56h (3214D6h)

    003214CD  mov         eax,dword ptr [j]

    003214D0  add         eax,1

    003214D3  mov         dword ptr [j],eax

    003214D6  mov         eax,dword ptr [i]

    003214D9  sub         eax,1

    003214DC  cmp         dword ptr [j],eax

    003214DF  jge         myfun+0A2h (321522h)


                         if (val[j]>val[j+1])

    003214E1  mov         eax,dword ptr [val]

    003214E4  add         eax,dword ptr [j]

    003214E7  movsx       ecx,byte ptr [eax]

    003214EA  mov         edx,dword ptr [val]

    003214ED  add         edx,dword ptr [j]

    003214F0  movsx       eax,byte ptr [edx+1]

    003214F4  cmp         ecx,eax

    003214F6  jle         myfun+0A0h (321520h)



                                t=val[j];

    003214F8  mov         eax,dword ptr [val]

    003214FB  add         eax,dword ptr [j]

    003214FE  mov         cl,byte ptr [eax]

    00321500  mov         byte ptr [t],cl


                                val[j]=val[j+1];

    00321503  mov         eax,dword ptr [val]

    00321506  add         eax,dword ptr [j]

    00321509  mov         ecx,dword ptr [val]

    0032150C  add         ecx,dword ptr [j]

    0032150F  mov         dl,byte ptr [ecx+1]

    00321512  mov         byte ptr [eax],dl


                                val[j+1]=t;

    00321514  mov         eax,dword ptr [val]

    00321517  add         eax,dword ptr [j]

    0032151A  mov         cl,byte ptr [t]

    0032151D  mov         byte ptr [eax+1],cl



    00321520  jmp         myfun+4Dh (3214CDh)



    00321522  jmp         myfun+35h (3214B5h)


    00321524  pop         edi 

    00321525  pop         esi 

    00321526  pop         ebx 

    00321527  add         esp,0F0h

    0032152D  cmp         ebp,esp

    0032152F  call        @ILT+335(__RTC_CheckEsp) (321154h)

    00321534  mov         esp,ebp

    00321536  pop         ebp 

    00321537  ret         


    Thank You

    Jeff Dailey-

  • Ntdebugging Blog

    More dump forensics, understanding !locks, in this case a filter driver problem



    Written by Jeff Dailey: 


    Hello NTDebuggers, one of the most important things to understand in kernel debugging hung servers is the output of !locks.  There can be a lot of data and it’s not always clear what is going on.  One of the things I like to do in order to better understand the output is to use a visual representation of the resources involved and the threads that are blocking on those resources.   Before we can do that we need to understand what to look for so we can document it in our diagram. 


    It’s a good idea to understand ERESOURCEs in general before jumping into !locks.  The following MSDN article goes into lots of great detail.  http://msdn2.microsoft.com/en-us/library/aa490224.aspx


    Simply put, you will typically see threads either with access to or trying to gain access to resources.   If a thread has access to a resource it will be marked by <*>.   Threads that have access to a resource can block other threads from gaining access to said resource.


    You will see threads waiting for shared access.  These threads do not have the <*> and are listed above the threads that are Waiting on Exclusive Access.


    You will also see threads that are Waiting on Exclusive Access.  These threads are typically blocked waiting for the threads that have access or ownership of the resource to release it.


    Let’s take a look at one section of !locks output and annotate each thread section...


    Resource @ 0x896d2a68    Shared 1 owning threads  << This info is the ERESOURCE in question.

        Contention Count = 15292  << The amount of contention for the object.

        NumberOfSharedWaiters = 1  << This is self explanatory

        NumberOfExclusiveWaiters = 39 << Number of exclusive waiters in the Ex Waiter List

         Threads: 89bd1234-01<*> 896d2020-01   << We have two threads here.  The owner, or shared owner <*>89bd1234 and the shared Waiter 896d2020

         Threads Waiting On Exclusive Access:

                  888ed020       87c036f8       885dc7a0       8bc538b0  << All of these threads are waiting on exclusive access.    

                  88e8cda0       88796988       8905fda0       8974dc10      
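
    When the output is long, the same bookkeeping can be done with a short script before drawing the diagram. The Python sketch below parses text shaped like the sample above; parse_locks and the dictionary keys are hypothetical names, treating an entry without <*> on the Threads line as a shared waiter follows the annotation above, and real !locks output may contain variations this sketch does not handle.

```python
import re

SAMPLE = """\
Resource @ 0x896d2a68    Shared 1 owning threads
    Contention Count = 15292
    NumberOfSharedWaiters = 1
    NumberOfExclusiveWaiters = 39
     Threads: 89bd1234-01<*> 896d2020-01
     Threads Waiting On Exclusive Access:
              888ed020       87c036f8       885dc7a0       8bc538b0
              88e8cda0       88796988       8905fda0       8974dc10
"""

def parse_locks(text):
    """Return one dict per resource with its owners and waiters."""
    resources, current, in_exclusive = [], None, False
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"Resource @ .*?(0x[0-9a-fA-F]+)", line)
        if header:
            current = {"address": header.group(1), "owners": [],
                       "shared_waiters": [], "exclusive_waiters": []}
            resources.append(current)
            in_exclusive = False
        elif current is None or line.startswith("KD:"):
            in_exclusive = False          # scanner progress lines end a waiter list
        elif line.startswith("Threads Waiting On Exclusive Access"):
            in_exclusive = True
        elif line.startswith("Threads:"):
            # Entries look like 89bd1234-01<*>; the <*> marks an owner.
            for thread, owner in re.findall(r"([0-9a-f]{8})-\d+(<\*>)?", line):
                current["owners" if owner else "shared_waiters"].append(thread)
        elif in_exclusive and line:
            current["exclusive_waiters"].extend(re.findall(r"\b[0-9a-f]{8}\b", line))
    return resources

parsed = parse_locks(SAMPLE)
for res in parsed:
    print(res["address"], "owners:", res["owners"],
          "shared waiters:", res["shared_waiters"],
          "exclusive waiters listed:", len(res["exclusive_waiters"]))
```

    With every resource reduced to owners and waiters, a thread that appears as an owner of one resource and a waiter on another marks a link in the blocking chain.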



    Note the following output is completely fabricated, so alignment and variable names may not be valid.


    The following is some sample output from !locks.  In this scenario I document any ERESOURCE that has any threads waiting on exclusive access.  I also document the ERESOURCES as nodes and show the relationship to the Threads.  The key point is to show the threads involved, the resources they own, and the resources they are blocked on or trying to get exclusive access to.  Ultimately you need to work your way toward the head of the blocking chain of events to figure out what is holding up the entire chain of execution from moving forward.


    In this case you will see that a filter driver called MYFILTER has passed an invalid object to KeWaitForSingleObject.  As a result the thread blocked and all the other threads and processes related to those threads froze and could not move forward.  The machine was completely hung.



    1: kd> !locks


    KD: Scanning for held locks......


    Resource @ 0x8a50ee98    Shared 4 owning threads

         Threads: 896856d0-01<*> 89686778-01<*> 896862d0-01<*> 89685da0-01<*>

    KD: Scanning for held locks............................................................


    Resource @ 0x896dabcd    Exclusively owned

         Threads: 886e5678-01<*>

    KD: Scanning for held locks..


    Resource @ 0x896d2a68    Shared 1 owning threads

        Contention Count = 15292

        NumberOfSharedWaiters = 1

        NumberOfExclusiveWaiters = 39

         Threads: 89bd1234-01<*> 896d2020-01   

         Threads Waiting On Exclusive Access:

                  888ed020       87c036f8       885dc7a0       8bc538b0       

                  88e8cda0       88796988       8905fda0       8974dc10      

                  88d78020       87a7dda0       88b85b20       87b78020      

                  8936e8a0       87dd7ae8       886005a0       88557890      

                  887b3680       87cc2790       87dd4050       87fad8a0      

                  88179580       87b53d70       87cd2775       88ba0578      

                  87b676f8       8886b560       87f68388       89681da0      

                  88952720       888833c0      


    KD: Scanning for held locks................


    Resource @ 0x8959c790    Exclusively owned

        Contention Count = 4827

        NumberOfExclusiveWaiters = 35

         Threads: 89bd1234-01<*>

         Threads Waiting On Exclusive Access:

                  883e3aa0       88873020       88290020       87f5f588      

                  888154f0       88bd4b28       88cbc448       884bd6c8      

                  881e5da0       8935f518       87bcc978       8889e020      

                  88cb3020       88c92178       87cf9020       88daaac0      

                  89376020       88fe9020       887b29d0       87b6f7f0      

                  87e12020       87b4f498       894ee730       88810020      

                  881a8020       87dd55f0       888d3020       885f6da0      

                  881f7da0       880742e8       87a31b50       879ffb50      

                  88451da0       88646da0       8833a020      


    KD: Scanning for held locks.....................................................

    Resource @ 0x88ce81ff    Exclusively owned

        Contention Count = 108

         Threads: 87ad6f78-01<*>

    KD: Scanning for held locks......................................


    Resource @ 0x87da48fb    Exclusively owned

         Threads: 87bddda0-01<*>

    KD: Scanning for held locks.


    Resource @ 0x87df455c    Exclusively owned

        Contention Count = 2

        NumberOfExclusiveWaiters = 2

         Threads: 886e5678-01<*>

         Threads Waiting On Exclusive Access:

                  89bd1234       87ad6c68      


    KD: Scanning for held locks............................................


    Resource @ 0x87fcfe30    Shared 1 owning threads

         Threads: 8a60f8a3-01<*> *** Actual Thread 8a60f8a0

    KD: Scanning for held locks...........


    Resource @ 0x880ef1cd    Shared 1 owning threads

         Threads: 8a60c3af-01<*> *** Actual Thread 8a60c3a0

    KD: Scanning for held locks.

    27044 total locks, 9 locks currently held
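Working toward the head of the blocking chain can be sketched in a few lines of Python. This is only an illustration of the idea, not a debugger feature: the `RESOURCES` table is hand-copied from the fabricated output above (waiter lists truncated to two entries each), the names `blocked_on` and `blocking_chain` are hypothetical, and the sketch follows only the first owner of each resource, which is a simplification for shared resources.

```python
# Owners and exclusive waiters hand-copied from the fabricated !locks output above.
# Only resources that have waiters are listed; waiter lists are truncated for brevity.
RESOURCES = {
    "0x896d2a68": {"owners": ["89bd1234"], "waiters": ["888ed020", "87c036f8"]},
    "0x8959c790": {"owners": ["89bd1234"], "waiters": ["883e3aa0", "88873020"]},
    "0x87df455c": {"owners": ["886e5678"], "waiters": ["89bd1234", "87ad6c68"]},
}

def blocked_on(resources):
    """Map each waiting thread to the resource it is waiting for."""
    return {thread: address
            for address, entry in resources.items()
            for thread in entry["waiters"]}

def blocking_chain(thread, resources):
    """Follow waiter -> owner links until reaching a thread that waits on no listed resource."""
    waits = blocked_on(resources)
    chain = [thread]
    seen = {thread}
    while thread in waits:
        # Simplification: only the first owner of the resource is followed.
        owner = resources[waits[thread]]["owners"][0]
        if owner in seen:
            break  # a cycle here would indicate a deadlock between these threads
        chain.append(owner)
        seen.add(owner)
        thread = owner
    return chain

chain = blocking_chain("888ed020", RESOURCES)
head = chain[-1]
```

With the data above, the chain runs from waiter 888ed020 to 89bd1234 and ends at 886e5678, which owns resource 0x87df455c and is not waiting on any ERESOURCE in the output. That last thread is the one whose stack you would examine next to see what it is actually blocked on.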


    Good luck and happy debugging.
