• Ntdebugging Blog

    Call Stacks for Pool Allocations

    • 0 Comments

    Hello, it's the Debug Ninja back again for another NtDebugging Blog article.  For as long as I can remember user mode debuggers have had an easy way to get call stacks for heap allocations.  On more recent versions of Windows this has been as simple as using gflags +ust and umdh or !heap -k.  Kernel debuggers have not always had an easy way to determine who allocated a pool block.  Sure, we have pool tags to help us out, but often a programmer will use the same tag in many places and devalue this as a troubleshooting technique.

     

    Fortunately, starting in Windows Vista and Server 2008, kernel debuggers can get call stacks from pool allocations.  We can even get call stacks from pool frees.  This little known technique is not quite as useful as gflags +ust is for heap, but when it is needed it is very useful.

     

    First, you need to turn on special pool using driver verifier.  Verifier will obtain and track the call stack for the allocation and the free, so this technique will not work with traditional special pool as documented in KB188831 because those settings do not use driver verifier.  Because special pool requires additional memory overhead to run, this technique is not valuable for large memory leaks.  However, this technique is a good way to determine what code allocated or freed your pool block in other conditions.  For example, this works well if you find that pool has been freed when you expected it to be allocated.  This also works for smaller memory leaks, especially those for which you can easily reproduce the leak.  Analyzing the allocations and stacks for a leak must be done by hand, as there is no umdh-like tool for kernel mode.

     

    Step 1 - Turning on verifier

    In this example I am using Sysinternals’ notmyfault tool to generate the pool allocations.  Because I know the driver in question I set verifier to only monitor that driver.  Note that a reboot is required to make this setting take effect.

     

    Verifier /flags 1 /driver myfault.sys

     

    Step 2 - Finding the pool allocation to analyze

    For this example I am going to find the call stack of a leaked pool allocation.  First find the tag that is using the most pool by using !poolused.

     

    kd> !poolused 4

       Sorting by  Paged Pool Consumed

     

      Pool Used:

                NonPaged            Paged

     Tag    Allocs     Used    Allocs     Used

    Leak        0        0        23 23552000 UNKNOWN pooltag 'Leak', please update pooltag.txt

    CM31        0        0     20520 18514560 Internal Configuration manager allocations , Binary: nt!cm

    CIcr        0        0      2977  8511504 Code Integrity allocations for image integrity checking , Binary: ci.dll

     

    Next find the pool allocations for that tag with !poolfind.  There is some guessing to be done with all pool leak debugging techniques; you can’t be sure that the allocation you’re looking at has really been leaked and is not just in a state where it has not yet been freed.  You need to make an educated guess because there is no umdh-type functionality to analyze allocates and frees.  If you have the benefit of a live debug you can go the debugger and check back later to see if the memory has been freed or not.

     

    kd> !poolfind Leak

     

    Scanning large pool allocation table for Tag: Leak (fffffa8002e00000 : fffffa8002f80000)

     

    *fffff8a006a00000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a0058fa000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a006200000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a0068fa000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a0060fa000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a005a00000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a006c00000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a006400000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a0062fa000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a005afa000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a005c00000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a006e00000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a006600000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a0064fa000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a005cfa000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a006afa000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a005e00000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a006800000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a0066fa000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a005efa000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a006cfa000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a006000000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

    *fffff8a005800000 :large page allocation, Tag  is Leak, size  is 0xfa000 bytes

     

    Step 3 – Dump the call stack for the allocation

    This step is the easy one.  Once you have the address of the allocation use !verifier 0x80 Address.  If you were interested in all of the call stacks in the log you can run !verifier 0x80 without the Address parameter.

     

    kd> !verifier 0x80 fffff8a005800000

     

    Log of recent kernel pool Allocate and Free operations:

     

    There are up to 0x10000 entries in the log.

     

    Parsing 0x0000000000010000 log entries, searching for address 0xfffff8a005800000.

     

     

    ======================================================================

    Pool block fffff8a005800000, Size 00000000000fa000, Thread fffffa8002be4060

    fffff80001923cc6 nt!VeAllocatePoolWithTagPriority+0x2b6

    fffff80001923d3d nt!VerifierExAllocatePoolEx+0x1d

    fffff880042881f6 myfault+0x11f6

    fffff8800428842f myfault+0x142f

    fffff8000192e750 nt!IovCallDriver+0xa0

    fffff800017a3a97 nt!IopXxxControlFile+0x607

    fffff800017a42f6 nt!NtDeviceIoControlFile+0x56

    fffff80001487ed3 nt!KiSystemServiceCopyEnd+0x13

    ======================================================================

    Pool block fffff8a005800000, Size 0000000000001000, Thread fffffa8002187060

    fffff8000192393a nt!VfFreePoolNotification+0x4a

    fffff800015b6a6f nt!ExDeferredFreePool+0x107b

    fffff800017273eb nt!HvFreeDirtyData+0x7f

    fffff800017269bb nt!HvOptimizedSyncHive+0x53

    fffff80001726303 nt!CmFlushKey+0xaf

    fffff80001726b22 nt!NtFlushKey+0x142

    fffff80001487ed3 nt!KiSystemServiceCopyEnd+0x13

    Parsed entry 0000000000010000/0000000000010000...

    Finished parsing all pool tracking information.

     

    Keep in mind that the log may contain allocate and free information that predates the current use of the pool block, and that the log is of a fixed size so eventually old data will fall off the end.  The most recent use of the pool will be at the top of the output.  Usually this is the stack at the top of the output is what you are interested in, I have highlighted the relevant call stack in red.  In this instance we can see that the pool was most recently allocated by myfault.sys.

     

    Sometimes it is useful to have historical information about previous uses of the pool block such as when dealing with pool that was improperly freed.  In that scenario the most recent call stack may be from an allocate call when the pool block was reused by the memory manager and so you may need to go down several levels to find out where the pool was improperly freed.

     

    For more information on using !verifier you can refer to the debugger help in MSDN, http://msdn.microsoft.com/en-us/library/ff565591.aspx.

  • Ntdebugging Blog

    Debugging a CLOCK_WATCHDOG_TIMEOUT Bugcheck

    • 1 Comments

    Hi debuggers, Andrew Richards here for my first NT Debugging post. I thought I’d share a recent case that used a lot of discovery techniques to uncover the details of what was going on. Most bugchecks give you the information you need as arguments, but in the case of bugcheck 0x101, I had to go digging for a trap frame, the thread stack, look at the disassembly of the running threads, and lots of other goodies.

     

    As usual, the first thing I did was run !analyze -v to get a more detailed explanation of what a bugcheck 0x101 “CLOCK_WATCHDOG_TIMEOUT” is.

     

    0: kd> !analyze -v
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************

    CLOCK_WATCHDOG_TIMEOUT (101)
    An expected clock interrupt was not received on a secondary processor in an
    MP system within the allocated interval. This indicates that the specified
    processor is hung and not processing interrupts.
    Arguments:
    Arg1: 00000031, Clock interrupt time out interval in nominal clock ticks.
    Arg2: 00000000, 0.
    Arg3: 87337120, The PRCB address of the hung processor.
    Arg4: 00000003, 0.

     

    A bugcheck 0x101 occurs when the Clock interrupt (IRQL #28) has not been processed by each processor within a timeout.  The Clock interrupt is quite high in the IRQL table for x86; only the Inter-Processor Interrupt (IPI), Power-Fail and High interrupts are higher.

     

    31

    High

    30

    Power Fail

    29

    Inter-processor Interrupt

    28

    Clock

    27

    Profile/Synch

    26

    Device n

     

    5

    CMCI

    4

    3

    Device 1

    2

    DPC/Dispatch

    1

    APC

    0

    Passive

     

    The immediate hypothesis was that one of the processors was stuck processing an IPI (a common interrupt), causing it to be above the Clock IRQ level.

    The questions were:

    1.  At what IRQL was the problem processor?  Was it at IPI?

    2.  Why was it stuck?

    3.  Who did it?

     

    This was a 4 core box, so I ran !prcb four times to view the Processor Resource control Block (PRCB) of each processor. Processor #3 matched the PRCB specified in the bugcheck’s 3rd argument (Arg3 = 87337120). The first question was answered; Processor #3 was the culprit. Interestingly, the Arg4 value in the bugcheck has a value of 3.  Per the documentation it should be 0, but it seems that the processor number is provided.

     

    0: kd> !prcb 0
    PRCB for Processor 0 at 82b34d20:
    Current IRQL -- 28
    Threads--  Current 82b3e380 Next 00000000 Idle 82b3e380
    Processor Index 0 Number (0, 0) GroupSetMember 1
    Interrupt Count -- 000475e1
    Times -- Dpc    000008d6 Interrupt 0000007b
             Kernel 0001920f User      00003b0b

    0: kd> !prcb 1
    PRCB for Processor 1 at 807c7120:
    Current IRQL -- 0
    Threads--  Current 9bb65d48 Next 9bf52c10 Idle 807cc800
    Processor Index 1 Number (0, 1) GroupSetMember 2
    Interrupt Count -- 00030ae8
    Times -- Dpc    000005ca Interrupt 00000098
             Kernel 00017e4a User      00004eab

    0: kd> !prcb 2
    PRCB for Processor 2 at 87300120:
    Current IRQL -- 0
    Threads--  Current 9b41b6e8 Next 00000000 Idle 87305800
    Processor Index 2 Number (0, 2) GroupSetMember 4
    Interrupt Count -- 0002ab35
    Times -- Dpc    00000568 Interrupt 000000ac
             Kernel 0001a788 User      00002565

    0: kd> !prcb 3
    PRCB for Processor 3 at 87337120:
    Current IRQL -- 0
    Threads--  Current 8aaa17c8 Next 00000000 Idle 8733c800
    Processor Index 3 Number (0, 3) GroupSetMember 8
    Interrupt Count -- 00026c0d
    Times -- Dpc    00000620 Interrupt 0000008b
             Kernel 0001ac65 User      00001e33

     

    IRQL

    The Processor Context Records reported that Processors #0, #1 and #2 were at IRQL 31 (1f - HIGH), and Processor #3 was at IRQL 27 (1b - SYNCH). Having a value of 31 is expected at bugcheck as that is how the bugcheck gains control of the processor to gather the current context. It was strange that Processor #3 was different and that is matched the processor mentioned in the bugcheck

    0: kd> !pcr 0
    KPCR for Processor 0 at 82b34c00:
    ...
                    Irql: 0000001f
    ...

    0: kd> !pcr 1
    KPCR for Processor 1 at 807c7000:
    ...
                    Irql: 0000001f
    ...

    0: kd> !pcr 2
    KPCR for Processor 2 at 87300000:
    ...
                    Irql: 0000001f
    ...

    0: kd> !pcr 3
    KPCR for Processor 3 at 87337000:
    ...
                    Irql: 0000001b
    ...

     

    The next step was to look at the stacks of the processors to see what the threads were all involved in.

     

    Processor #0

    To determine the stack of the Processor #0’s thread before the bugcheck, the trap frame needed to be found. The trap frame is stored immediately above the interrupt handler. To find that, I looked at the Interrupt Descriptor Table to find out the name of the handler for clock interrupt and then I searched for that symbol on the stack.

     

    -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
    ==> Processor #0
    -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
    0: kd> knL
    # ChildEBP RetAddr 
    00 82b31674 82a84a6f nt!KeBugCheckEx+0x1e
    01 82b316b0 82a840be nt!KeAccumulateTicks+0x242
    02 82b316f0 82a83f6b nt!KeUpdateRunTime+0x145
    03 82b3174c 82a88c17 nt!KeUpdateSystemTime+0x613
    04 82b3174c 82a85e79 nt!KeUpdateSystemTimeAssist+0x13
    05 82b317e0 82abfa17 nt!KiIpiSendPacket+0xdd
    06 82b31820 82af0866 nt!KeFlushSingleTb+0x136
    07 82b3190c 82b2ab90 nt!MmFreeSpecialPool+0x2b4
    08 82b31970 82d4a06e nt!ExFreePoolWithTag+0xd6
    09 82b31980 82d3fab2 nt!ViCtxFreeIsrContext+0xf
    0a 82b31998 82d3a1c0 nt!VfIoFreeIrp+0xd3
    0b 82b319a8 8ced5986 nt!IovFreeIrpPrivate+0x47
    WARNING: Stack unwind information not available. Following frames may be wrong.
    0c 82b319c4 82d3acd4 irsir+0x2986
    0d 82b319f4 82a81933 nt!IovpLocalCompletionRoutine+0x14b
    0e 82b31a3c 82d3ab64 nt!IopfCompleteRequest+0x128
    0f 82b31aa4 8b7c6abd nt!IovCompleteRequest+0x133
    10 82b31ab4 8b7c6461 serial!SerialGetNextIrpLocked+0x61
    11 82b31ad8 8b7c7567 serial!SerialGetNextIrp+0x27
    12 82b31b00 8b7b9eb7 serial!SerialTryToCompleteCurrent+0x7a
    13 82b31b38 82a83039 serial!SerialReadTimeout+0x68
    14 82b31b7c 82a82fdd nt!KiProcessTimerDpcTable+0x50
    15 82b31c68 82a82e9a nt!KiProcessExpiredTimerList+0x101
    16 82b31cdc 82a8100e nt!KiTimerExpiration+0x25c
    17 82b31d20 82a80e38 nt!KiRetireDpcList+0xcb
    18 82b31d24 00000000 nt!KiIdleLoop+0x38

     

    0: kd> !irql
    Debugger saved IRQL for processor 0x0 -- 28 (CLOCK2_LEVEL)

     

    0: kd> !idt

    Dumping IDT:

    37:   82e35104 hal!PicSpuriousService37

    51:   8aba1558 serial!SerialCIsrSw (KINTERRUPT 8aba1500)

    71:   89003cd8 i8042prt!I8042KeyboardInterruptService (KINTERRUPT 89003c80)

    72:   8aba17d8 USBPORT!USBPORT_InterruptService (KINTERRUPT 8aba1780)

    82:   89123058 ataport!IdePortInterrupt (KINTERRUPT 89123000)

                   ataport!IdePortInterrupt (KINTERRUPT 89123a00)

                   ataport!IdePortInterrupt (KINTERRUPT 89123780)

                   ataport!IdePortInterrupt (KINTERRUPT 89123500)

    92:   8aba1058 Impcd+0x8540 (KINTERRUPT 8aba1000)

    a0:   8aba1a58 ndis!ndisMiniportMessageIsr (KINTERRUPT 8aba1a00)

    a2:   8aba1cd8 USBPORT!USBPORT_InterruptService (KINTERRUPT 8aba1c80)

    b0:   891232d8 ndis!ndisMiniportMessageIsr (KINTERRUPT 89123280)

    b1:   89123cd8 ACPI!ACPIInterruptServiceRoutine (KINTERRUPT 89123c80)

    b2:   8aba12d8 serial!SerialCIsrSw (KINTERRUPT 8aba1280)

    c1:   82e353f4 hal!HalpBroadcastCallService

    d1:   82e1d634 hal!HalpHpetClockInterrupt

    d2:   82e1d898 hal!HalpHpetRolloverInterrupt

    df:   82e351dc hal!HalpApicRebootService

    e1:   82e35958 hal!HalpIpiHandler

    e3:   82e356f8 hal!HalpLocalApicErrorService

    fd:   82e35f2c hal!HalpProfileInterrupt

    fe:   82e361a8 hal!HalpPerfInterrupt

     

    The search was between the nt!KiIpiSendPacket call (I just chose it as it didn’t seem to be bugcheck related) and the current stack pointer.  Using the dereferenced pointer (poi) and some maths, the trap frame location is retrieved.

     

    0: kd> dps @esp 82b317e0
    ...
    82b31744  82b31760 nt!KiDoubleFaultStack+0x2760 <<<<< This is not a real symbol; it actually is a part of the trap frame
    82b31748  82e1d72a hal!HalpHpetClockInterrupt+0xf6
    82b3174c  82b31760 nt!KiDoubleFaultStack+0x2760
    82b31750  82a88c17 nt!KeUpdateSystemTimeAssist+0x13
    ...

     

    0: kd> .trap poi(82b31748-4)
    ErrCode = 00000000
    eax=87300120 ebx=841882dc ecx=4cdfc4c4 edx=82b34d20 esi=807c7120 edi=82b738c4
    eip=82a85e79 esp=82b317d4 ebp=82b31820 iopl=0         nv up ei pl nz na po nc
    cs=0008  ss=0010  ds=0000  es=dea0  fs=040f  gs=0008             efl=00000202
    nt!KiIpiSendPacket+0xdd:
    82a85e79 f390            pause

     

    So you might be asking yourself, can this be done an easier way?  The answer is definitely Yes. You just need to use kv instead; it adds the trap frame information on the end of the line.

     

    0: kd> kv

    ChildEBP RetAddr  Args to Child             

    82b31674 82a84a6f 00000101 00000031 00000000 nt!KeBugCheckEx+0x1e

    82b316b0 82a840be 00026161 00000000 0001cd00 nt!KeAccumulateTicks+0x242

    82b316f0 82a83f6b 82a85e79 807c7120 00000000 nt!KeUpdateRunTime+0x145

    82b3174c 82a88c17 ffffff1b ffffff1b 000000d1 nt!KeUpdateSystemTime+0x613

    82b3174c 82a85e79 ffffff1b ffffff1b 000000d1 nt!KeUpdateSystemTimeAssist+0x13 (FPO: [0,2] TrapFrame @ 82b31760)

    82b317e0 82abfa17 00000001 00000000 82a3cbe1 nt!KiIpiSendPacket+0xdd (FPO: [6,2,0])

    82b31820 82af0866 a33d6f20 00000001 a33d00cf nt!KeFlushSingleTb+0x136

    82b3190c 82b2ab90 a33d6f20 8a2096d8 8a2096d8 nt!MmFreeSpecialPool+0x2b4

    82b31970 82d4a06e a33d6f20 00000000 82b31998 nt!ExFreePoolWithTag+0xd6

    82b31980 82d3fab2 a33d6f20 a33d6f20 a33d6f20 nt!ViIrpFree+0xf

    82b31998 82d3a1c0 8ced5986 905cadb0 82b319c4 nt!VfIoFreeIrp+0xd3

    82b319a8 8ced5986 a33d6f20 8ced58f0 82b31a6c nt!IovFreeIrpPrivate+0x47

    WARNING: Stack unwind information not available. Following frames may be wrong.

    82b319c4 82d3acd4 00000000 a33d6f20 905cadb0 irsir+0x2986

    82b319f4 82a81933 00000000 a33d6f20 82b31a6c nt!IovpLocalCompletionRoutine+0x14b

    82b31a3c 82d3ab64 a33d6f20 8ab151ac 8ab150f0 nt!IopfCompleteRequest+0x128

    82b31aa4 8b7c6abd 00000000 8ab151ac 82b31ad8 nt!IovCompleteRequest+0x133

    82b31ab4 8b7c6461 8ab151ac 8ab1518c 82b31b0c serial!SerialGetNextIrpLocked+0x61 (FPO: [Non-Fpo])

    82b31ad8 8b7c7567 8ab151ac 8ab1518c 82b31b0c serial!SerialGetNextIrp+0x27 (FPO: [Non-Fpo])

    82b31b00 8b7b9eb7 8ab150f0 00000000 8ab15002 serial!SerialTryToCompleteCurrent+0x7a (FPO: [Non-Fpo])

    82b31b38 82a83039 8ab15314 8ab15002 3c171d26 serial!SerialReadTimeout+0x68 (FPO: [Non-Fpo])

    82b31b7c 82a82fdd 82b34d20 82b31ca8 00000001 nt!KiProcessTimerDpcTable+0x50

    82b31c68 82a82e9a 82b34d20 82b31ca8 00000000 nt!KiProcessExpiredTimerList+0x101

    82b31cdc 82a8100e 0001cacf 9bb374c0 82b3e380 nt!KiTimerExpiration+0x25c

    82b31d20 82a80e38 00000000 0000000e 00000000 nt!KiRetireDpcList+0xcb

    82b31d24 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x38 (FPO: [0,0,0])

     

    By using the stack based maths, or the value provide by kv, the trap frame address is used to set the context (.trap <addr>) to the code running before the interrupt.  These are the stored registers and the stack at the time of the interrupt.

     

    0: kd> .trap 82b31760

    ErrCode = 00000000

    eax=87300120 ebx=841882dc ecx=4cdfc4c4 edx=82b34d20 esi=807c7120 edi=82b738c4

    eip=82a85e79 esp=82b317d4 ebp=82b31820 iopl=0         nv up ei pl nz na po nc

    cs=0008  ss=0010  ds=0000  es=dea0  fs=040f  gs=0008             efl=00000202

    nt!KiIpiSendPacket+0xdd:

    82a85e79 f390            pause

     

    0: kd> knL
      *** Stack trace for last set context - .thread/.cxr resets it
    # ChildEBP RetAddr 
    00 82b317e0 82abfa17 nt!KiIpiSendPacket+0xdd
    01 82b31820 82af0866 nt!KeFlushSingleTb+0x136
    02 82b3190c 82b2ab90 nt!MmFreeSpecialPool+0x2b4
    03 82b31970 82d4a06e nt!ExFreePoolWithTag+0xd6
    04 82b31980 82d3fab2 nt!ViCtxFreeIsrContext+0xf
    05 82b31998 82d3a1c0 nt!VfIoFreeIrp+0xd3
    06 82b319a8 8ced5986 nt!IovFreeIrpPrivate+0x47
    WARNING: Stack unwind information not available. Following frames may be wrong.
    07 82b319c4 82d3acd4 irsir+0x2986
    08 82b319f4 82a81933 nt!IovpLocalCompletionRoutine+0x14b
    09 82b31a3c 82d3ab64 nt!IopfCompleteRequest+0x128
    0a 82b31aa4 8b7c6abd nt!IovCompleteRequest+0x133
    0b 82b31ab4 8b7c6461 serial!SerialGetNextIrpLocked+0x61
    0c 82b31ad8 8b7c7567 serial!SerialGetNextIrp+0x27
    0d 82b31b00 8b7b9eb7 serial!SerialTryToCompleteCurrent+0x7a
    0e 82b31b38 82a83039 serial!SerialReadTimeout+0x68
    0f 82b31b7c 82a82fdd nt!KiProcessTimerDpcTable+0x50
    10 82b31c68 82a82e9a nt!KiProcessExpiredTimerList+0x101
    11 82b31cdc 82a8100e nt!KiTimerExpiration+0x25c
    12 82b31d20 82a80e38 nt!KiRetireDpcList+0xcb
    13 82b31d24 00000000 nt!KiIdleLoop+0x38

     

    Dissassembling the first few instructions reveals a jump (jmp) that is back up in the nt!KiIpiSendPacket function. Using the jmp location and the instruction after the jmp as the bound, we can disassemble the loop. At the time of the bugcheck, the thread was executing a pause (a CPU based delay), and seemingly doing this in a loop while waiting for to release it.

     

    0: kd> u @eip
    nt!KiIpiSendPacket+0xdd:
    82a85e79 f390            pause  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @eip
    82a85e7b eb9e            jmp     nt!KiIpiSendPacket+0x7f (82a85e1b)
    82a85e7d 8d4900          lea     ecx,[ecx]

    ...

    0: kd> u 82a85e1b 82a85e7d
    nt!KiIpiSendPacket+0x7f:
    82a85e1b 8b86a4180000    mov     eax,dword ptr [esi+18A4h]
    82a85e21 0bc0            or      eax,eax <<<<<< Checking if value is nonzero
    82a85e23 7538            jne     nt!KiIpiSendPacket+0xc1 (82a85e5d) <<<<<< Take jmp to stay in loop
    82a85e25 f00fb196a4180000 lock cmpxchg dword ptr [esi+18A4h],edx
    82a85e2d 75ec            jne     nt!KiIpiSendPacket+0x7f (82a85e1b)
    82a85e2f 59              pop     ecx
    82a85e30 d1e9            shr     ecx,1
    82a85e32 8d7f04          lea     edi,[edi+4]
    82a85e35 72df            jb      nt!KiIpiSendPacket+0x7a (82a85e16)
    82a85e37 75f7            jne     nt!KiIpiSendPacket+0x94 (82a85e30)
    82a85e39 59              pop     ecx
    82a85e3a 64890d8c190000  mov     dword ptr fs:[198Ch],ecx
    82a85e41 8b4c240c        mov     ecx,dword ptr [esp+0Ch]
    82a85e45 8b542410        mov     edx,dword ptr [esp+10h]
    82a85e49 64ff0590360000  inc     dword ptr fs:[3690h]
    82a85e50 52              push    edx
    82a85e51 51              push    ecx
    82a85e52 ff15a4a0a082    call    dword ptr [nt!_imp__HalRequestIpi (82a0a0a4)]
    82a85e58 5f              pop     edi
    82a85e59 5e              pop     esi
    82a85e5a c21800          ret     18h
    82a85e5d 41              inc     ecx
    82a85e5e 850d7c3ab782    test    dword ptr [nt!HvlLongSpinCountMask (82b73a7c)],ecx
    82a85e64 7513            jne     nt!KiIpiSendPacket+0xdd (82a85e79)
    82a85e66 f605783ab78240  test    byte ptr [nt!HvlEnlightenments (82b73a78)],40h <<<<<< Don’t spin if you’re an enlightened VM, just pause
    82a85e6d 740a            je      nt!KiIpiSendPacket+0xdd (82a85e79)
    82a85e6f 52              push    edx
    82a85e70 51              push    ecx
    82a85e71 51              push    ecx
    82a85e72 e8c8b60500      call    nt!HvlNotifyLongSpinWait (82ae153f)
    82a85e77 59              pop     ecx
    82a85e78 5a              pop     edx
    82a85e79 f390            pause  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @eip
    82a85e7b eb9e            jmp     nt!KiIpiSendPacket+0x7f (82a85e1b)

     

    The summary so far: Since processor #0 was the thread that created the bugcheck, it must have been interrupted by a Clock interrupt so as to trigger a CLOCK_WATCHDOG_TIMEOUT bugcheck. It is not surprising then that the value of !irql is CLOCK2.

    The existence of the nt!KiIpiSendPacket function lends weight to the thought that this was the creator of an IPI and is not one of the processors that wasn’t processing IPIs.

     

    Processor #1

    Using the same technique as Processor #0, the disassembly for the loop on Processor #1 is determined. Processor #1 is in a tight loop within the nt!KeFlushMultipleRangeTb function, an ancestor function of nt!ExAllocatePoolWithTag, a memory related operation.  This is an interesting function to see, considering that nt!ExFreePoolWithTag, the counterpart, is on Processor #0’s stack.

     

    -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
    ==> Processor #1
    -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
    1: kd> knL
    # ChildEBP RetAddr 
    00 95bfc7bc 82abb217 nt!KeFlushMultipleRangeTb+0x2d3
    01 95bfc89c 82aa7d11 nt!MiFlushTbAsNeeded+0x12e
    02 95bfc8dc 82b29487 nt!MiAllocatePagedPoolPages+0x567
    03 95bfc940 82aa4674 nt!MiAllocatePoolPages+0x1f
    04 95bfc998 82b2a132 nt!ExpAllocateBigPool+0xa6
    05 95bfc9fc 82aab6b1 nt!ExAllocatePoolWithTag+0x12d
    06 95bfca20 82c4cc62 nt!ExAllocatePoolWithTagPriority+0x196
    07 95bfca78 82c7a662 nt!IopQueryNameInternal+0x60
    08 95bfca98 82c57d88 nt!IopQueryName+0x1b
    09 95bfcb1c 82c71a50 nt!ObpQueryNameString+0x7f
    0a 95bfcb38 82c75e12 nt!ObQueryNameString+0x18
    0b 95bfcc14 82c65788 nt!EtwTraceProcess+0xa2
    0c 95bfcc38 82c73625 nt!PspExitProcess+0x37
    0d 95bfccb4 82c87051 nt!PspExitThread+0x59a
    0e 95bfcccc 82aba8c0 nt!PsExitSpecialApc+0x22
    0f 95bfcd1c 82a472a4 nt!KiDeliverApc+0x28b
    10 95bfcd1c 77556fc0 nt!KiServiceExit+0x64
    WARNING: Frame IP not in any known module. Following frames may be wrong.
    11 014ef918 00000000 0x77556fc0

     

    1: kd> !irql

    Debugger saved IRQL for processor 0x1 -- 0 (LOW_LEVEL)  <<<<<< Windows Internals 4th Edition notes that IRQL may not be saved; this explains the 0

     

    1: kd> u @eip
    nt!KeFlushMultipleRangeTb+0x2d3:
    82a40c31 f390            pause <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @eip
    82a40c33 8b07            mov     eax,dword ptr [edi]
    82a40c35 85c0            test    eax,eax
    82a40c37 75de            jne     nt!KeFlushMultipleRangeTb+0x2b9 (82a40c17)
    82a40c39 8a4d0f          mov     cl,byte ptr [ebp+0Fh]
    82a40c3c ff1558a1a082    call    dword ptr [nt!_imp_KfLowerIrql (82a0a158)]
    82a40c42 5f              pop     edi
    82a40c43 5e              pop     esi

     

    1: kd> u 82a40c17 82a40c39
    nt!KeFlushMultipleRangeTb+0x2b9:
    82a40c17 46              inc     esi
    82a40c18 85357c3ab782    test    dword ptr [nt!HvlLongSpinCountMask (82b73a7c)],esi
    82a40c1e 7511            jne     nt!KeFlushMultipleRangeTb+0x2d3 (82a40c31)
    82a40c20 f605783ab78240  test    byte ptr [nt!HvlEnlightenments (82b73a78)],40h
    82a40c27 7408            je      nt!KeFlushMultipleRangeTb+0x2d3 (82a40c31)
    82a40c29 56              push    esi
    82a40c2a e810090a00      call    nt!HvlNotifyLongSpinWait (82ae153f)
    82a40c2f eb02            jmp     nt!KeFlushMultipleRangeTb+0x2d5 (82a40c33)
    82a40c31 f390            pause <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @eip
    82a40c33 8b07            mov     eax,dword ptr [edi]
    82a40c35 85c0            test    eax,eax
    82a40c37 75de            jne     nt!KeFlushMultipleRangeTb+0x2b9 (82a40c17)

     

    This function tests a variable at edi in each loop. While the signal is not set, the thread goes around loop, eventually executes a pause instruction and then tries the test again. This thread seems to be waiting on someone to set a flag.

     

    Processor #2

    Processor #2 is also trying to send an IPI using nt!KiIpiSendPacket. It was caught in the same function that Processor #0 is in.  In this case, it is executing the jmp instruction that is one instruction after the pause which Processor #0 is executing.

     

    -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
    ==> Processor #2
    -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
    2: kd> knL
    # ChildEBP RetAddr 
    00 a0c4fac0 82a40bcd nt!KiIpiSendPacket+0xdf
    01 a0c4fafc 82b29431 nt!KeFlushMultipleRangeTb+0x26f
    02 a0c4fbe8 82b2aef1 nt!MiFreePoolPages+0x42c
    03 a0c4fc50 82aa6b37 nt!ExFreePoolWithTag+0x436
    04 a0c4fc64 82c3745d nt!MmFreeAccessPfnBuffer+0x2f
    05 a0c4fcc0 82c4b83a nt!PfpFlushBuffers+0x2ba
    06 a0c4fd50 82c12f5e nt!PfTLoggingWorker+0xaa
    07 a0c4fd90 82aba219 nt!PspSystemThreadStartup+0x9e
    08 00000000 00000000 nt!KiThreadStartup+0x19

     

    2: kd> !irql

    Debugger saved IRQL for processor 0x2 -- 0 (LOW_LEVEL)  <<<<<< Windows Internals 4th Edition notes that IRQL may not be saved; this explains the 0

     

    2: kd> u @eip
    nt!KiIpiSendPacket+0xdf:
    82a85e7b eb9e            jmp     nt!KiIpiSendPacket+0x7f (82a85e1b) <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @eip
    82a85e7d 8d4900          lea     ecx,[ecx]
    ...

     

    Summary so far

    So at this point we can say (without any real knowledge of how IPI handling is achieved) that:

     

    Processor #0 is waiting for Processor #1, #2 and/or #3 to respond to its IPI

    Processor #0 is running nt!ExFreePoolWithTag

    Processor #0 is reported to be at IRQL 28 (CLOCK2_LEVEL)

     

    Processor #1 is waiting for a flag to bet set; probably an IPI flag

    Processor #0 is running nt!ExAllocatePoolWithTag

    Processor #1 is reported to be at IRQL 0 (LOW_LEVEL); probably incorrect

     

    Processor #2 is waiting for Processor #0, #1 and/or #3 to respond to its IPI

    Processor #0 is running nt!ExFreePoolWithTag

    Processor #2 is reported to be at IRQL 0 (LOW_LEVEL); probably incorrect

     

    Processor #3 has been tagged by bugcheck as being the cause

    Processor #3 hasn’t been looked at yet

     

    Processor #3

    Analyzing Processor #3 was problematic. The Processor Context Record (PCR) was not completely captured in the dump. This happens when the bugcheck thread is unable to interrupt the target processor to gather the context – usually because it is running at the HIGH (31) IRQL.

    When the context is missing, all of the registers are treated as being zero. The zero value instruction pointer causes the stack to be incorrect when calling knL and various other commands.

     

    -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
    ==> Processor #3
    -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
    1: kd> ~3
    WARNING: Process directory table base 7B59C400 doesn't match CR3 00185000
    WARNING: Process directory table base 7B59C400 doesn't match CR3 00185000

     

    3: kd> r
    eax=00000000 ebx=00000000 ecx=00000000 edx=00000000 esi=00000000 edi=00000000
    eip=00000000 esp=00000000 ebp=00000000 iopl=0         nv up di pl nz na po nc
    cs=0000  ss=0000  ds=0000  es=0000  fs=0000  gs=0000             efl=00000000
    00000000 ??              ???

     

    3: kd> knL
    # ChildEBP RetAddr 
    WARNING: Frame IP not in any known module. Following frames may be wrong.
    00 00000000 00000000 0x0

     

    To determine the thread running on the processor, the !thread command was used. The value for the _ETHREAD (8aaa17c8) comes from the PCR that Windows maintains as part of scheduling. Once again, like the knL command, because the context was missing, the stack was not shown.

     

    3: kd> !pcr
    KPCR for Processor 3 at 87337000:
    ...
           CurrentThread:
    8aaa17c8
    ...

     

    3: kd> !thread
    THREAD
    8aaa17c8  Cid 0454.08ec  Teb: 00000000 Win32Thread: 00000000 RUNNING on processor 3
    IRP List:
        8f8c6f68: (0006,0094) Flags: 40000404  Mdl: 00000000
    Not impersonating
    DeviceMap                 89970d50
    Owning Process            89f93ac0       Image:         ZyxApp.exe
    Attached Process          N/A            Image:         N/A
    Wait Start TickCount      117454         Ticks: 589 (0:00:00:09.188)
    Context Switch Count      37             IdealProcessor: 3            
    UserTime                  00:00:00.000
    KernelTime                00:00:00.000
    Win32 Start Address 0x7753fd0f
    Stack Init 91aecfd0 Current 91aeca78 Base
    91aed000 Limit 91aea000 Call 0
    Priority 10 BasePriority 8 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
    ChildEBP RetAddr  Args to Child             
    00000000 00000000 00000000 00000000 00000000 0x0

     

    The important piece of information in the !thread output was the stack limits. These values allow me to do a search of the raw stack for known symbols. I used dps <limit> <base>. What I was looking for was the first symbol after the 0xffffffff entries which has a value above it that points to a location within this stack. In this case, is it the hal!KfLowerIrql symbol that has a value (base pointer) above it. This symbol is the deepest function that the stack has got to; it doesn’t necessarily mean that the stack is at this depth now.

     

    3: kd> dps 91aea000 91aed000
    91aea000  ffffffff
    91aea004  ffffffff

    91aec85c  ffffffff
    91aec860  ffffffff
    91aec864  ffffffff
    91aec868  ffffffff
    91aec86c  ffffffff
    91aec870  ffffffff
    91aec874  0001187f
    91aec878  00000010
    91aec87c  00011a1f
    91aec880  ffffffff
    91aec884  0001187f
    91aec888  82b7561f nt!MmSystemSpaceLock+0x1f
    91aec88c  91aec89c
    91aec890  82e20ba9 hal!KfLowerIrql+0x61
    91aec894  00011a00
    91aec898  82b7561f nt!MmSystemSpaceLock+0x1f
    91aec89c  91aec95c
    91aec8a0  82e1e92d hal!KeReleaseQueuedSpinLock+0x2d
    91aec8a4  82a3bdc7 nt!MiReturnNonPagedPoolVa+0x1d4
    91aec8a8  00000001

    91aec8ac  00000000

    91aec8b0  83cac9e8

    91aec8b4  ffffffff

    91aec8b8  ffffffff

    91aec8bc  ffffffff

    91aec8c0  ffffffff

    91aec8c4  ffffffff

    91aec8c8  ffffffff

    91aec8cc  ffffffff

    91aec8d0  82e21cee hal!HalpLegacyApicWriteIcr+0xa

    91aec8d4  91aec8f0

    91aec8d8  82e2aea0 hal!HalpRequestIpiSpecifyVector+0x40

    91aec8dc  00000000

    91aec8e0  83cac9e8

    91aec8e4  82b738cc nt!KiProcessorBlock+0xc

    91aec8e8  87300120

    91aec8ec  00000202

    91aec8f0  91aec914

    91aec8f4  82aa6e36 nt!MiInsertPageInFreeOrZeroedList+0x25b

    91aec8f8  83cac9e8

    91aec8fc  00000000

    91aec900  0000001f

    91aec904  00000001

    91aec908  83cac9e8

    91aec90c  00000003

    91aec910  00000007

    91aec914  00000fff

    91aec918  00000003

    91aec91c  82a40bcd nt!KeFlushMultipleRangeTb+0x26f

    91aec920  00000001

    91aec924  00000000

    91aec928  82a9fa43 nt!KiFlushTargetMultipleRangeTb

    91aec92c  00000000

    91aec930  91aec9bc

    91aec934  91aec9b8

    91aec938  00000001

    91aec93c  00000000

    91aec940  83e9cc04

    91aec944  00018736

    91aec948  8733a480

    91aec94c  00000003

    91aec950  91aec9bc

    91aec954  91aeca40

    91aec958  82b29431 nt!MiFreePoolPages+0x42c

    91aec95c  91aec9b8

    91aec960  1f000000

    91aec964  91aec9b8

    91aec968  00000000

    ...

     

    Using the first location found, I used the stack variables to build up a k= command. With an x86 stack, if you select the stack location above the symbol as the base pointer and stack pointer, and the symbol’s address as the instruction pointer, you’ll get a reconstructed stack.

     

    3: kd> k= 91aec89c 91aec88c 82e20ba9

    ChildEBP RetAddr 

    91aec89c 82e1e92d hal!KfLowerIrql+0x61

    91aec8a0 82a3bdc7 hal!KeReleaseQueuedSpinLock+0x2d

    91aec9a8 82aab049 nt!MiReturnNonPagedPoolVa+0x1d4

    91aec9c0 82ab1685 nt!FindNodeOrParent+0x2091aec9d8 82abd820 nt!RtlLookupElementGenericTableFullAvl+0x16

    91aeca0c 82e20ba9 nt!RtlLookupElementGenericTableAvl+0x18

    91aeca1c 82b199d2 hal!KfLowerIrql+0x61

    91aeca24 82d41ad1 nt!VfAvlUnlockShared+0x15

    91aeca40 82b2aef1 nt!VfRemLockDeleteMemoryRange+0x5c

    91aecaa8 963a55c5 nt!ExFreePoolWithTag+0x436

    91aecac8 963a61d4 Zyx+0x5c5
    91aecb00 82d3a6c3 Zyx+0x11d4
    91aecb24 82a4054a nt!IovCallDriver+0x258
    91aecb38 82c3b975 nt!IofCallDriver+0x1b
    91aecb7c 82c2c591 nt!IopDeleteFile+0x10c
    91aecb94 82a81d60 nt!ObpRemoveObjectRoutine+0x59
    91aecba8 82a81cd0 nt!ObfDereferenceObjectWithTag+0x88
    91aecbb0 82c4f308 nt!ObfDereferenceObject+0xd
    91aecbf0 82c7dba9 nt!ObpCloseHandleTableEntry+0x21d
    91aecc20 82c65f86 nt!ExSweepHandleTable+0x5f
    91aecc40 82c73666 nt!ObKillProcess+0x54
    91aeccb4 82c87051 nt!PspExitThread+0x5db
    91aecccc 82aba8c0 nt!PsExitSpecialApc+0x22
    91aecd1c 82a472a4 nt!KiDeliverApc+0x28b
    91aecd1c 775570b4 nt!KiServiceExit+0x64
    0147ff88 00000000 0x775570b4

     

    In this case, because nt!KiFlushTargetMultipleRangeTb was present on Processor #1 as well as being in the Processor #3 raw stack, I found the first symbol lower than this symbol on the (raw) stack that had a valid base pointer above it.  The stack was thus built around the nt!MiFreePoolPages function instead.

     

    91aec94c  00000003

    91aec950  91aec9bc

    91aec954  91aeca40

    91aec958  82b29431 nt!MiFreePoolPages+0x42c

    91aec95c  91aec9b8

    91aec960  1f000000

    91aec964  91aec9b8

    91aec968  00000000

     

    3: kd> k= 91aeca40 91aec954 82b29431

    ChildEBP RetAddr 

    91aeca40 82b2aef1 nt!MiFreePoolPages+0x42c

    91aecaa8 963a55c5 nt!ExFreePoolWithTag+0x436

    WARNING: Stack unwind information not available. Following frames may be wrong.

    91aecac8 963a61d4 Zyx+0x5c5

    91aecb00 82d3a6c3 Zyx+0x11d4

    91aecb24 82a4054a nt!IovCallDriver+0x258

    91aecb38 82c3b975 nt!IofCallDriver+0x1b

    91aecb7c 82c2c591 nt!IopDeleteFile+0x10c

    91aecb94 82a81d60 nt!ObpRemoveObjectRoutine+0x59

    91aecba8 82a81cd0 nt!ObfDereferenceObjectWithTag+0x88

    91aecbb0 82c4f308 nt!ObfDereferenceObject+0xd

    91aecbf0 82c7dba9 nt!ObpCloseHandleTableEntry+0x21d

    91aecc20 82c65f86 nt!ExSweepHandleTable+0x5f

    91aecc40 82c73666 nt!ObKillProcess+0x54

    91aeccb4 82c87051 nt!PspExitThread+0x5db

    91aecccc 82aba8c0 nt!PsExitSpecialApc+0x22

    91aecd1c 82a472a4 nt!KiDeliverApc+0x28b

    91aecd1c 775570b4 nt!KiServiceExit+0x64

    0147ff88 00000000 0x775570b4

     

    It’s very hard to get much further this stack since the exact registers are not known. In particular, it is hard to determine what function above nt!MiFreePoolPages the thread is executing at the moment.  As with the Processor #1 investigation, it is interesting to note that Processor #3 is also involved in a memory operation; specifically, it is doing a nt!ExFreePoolWithTag much like Processor #0 is.

     

    Inter-Processor Interrupts

    The only time an IPI (interrupt) is not processed immediately is when the target processor is at IPI IRQL or higher. The most common example being when it is already processing an IPI (interrupts of the same level cannot interrupt the handler for the same IRQL). In this case, the interrupt has to be queued until the interrupt mask allows its arrival. Usually, this design allows only one IPI to be processed at any one time.

     

    A deadlock (like) condition can arise though if an IPI is issued to a processor that is at a higher IRQL, and this processor (thread) attempts to send an IPI. The IPI logic blocks the send if there is an outstanding IPI to complete on the processor. The assumption being that interrupt queuing should be avoided as there is a probability of loss if the queue overflows.

     

    Looking in the Windows Internals book, there is a single sentence that says “Each interrupt level has a specific purpose. For example, the kernel issues an interprocessor interrupt (IPI) to request that another processor perform an action, such as dispatching a particular thread for execution or updating its translation look-aside buffer cache.”.  This is very interesting as the translation look-aside buffer is part of the memory manager, and the memory operations are being undertaken on all the processors.

     

    3rd Party Driver

    Instead of pulling my hair out combing through the threads on the system, the IPI code, the Memory Manager code or bugcheck code, I decided to look at the 3rd party driver in processor #3 to see if it was changing the IRQL.

     

    The first step was to find the bound of the Zyx+0x5c5 function.  The end address of the assembler is easy to determine, it is Zyx+0x5c5.  The question is, how big is the function? To work that out, you look at the assembler of the caller.  The caller’s assembler will point to the starting instruction in the Zyx+0x5c5 function.

     

    3: kd> k= 91aeca40 91aec954 82b29431

    ChildEBP RetAddr 

    91aeca40 82b2aef1 nt!MiFreePoolPages+0x42c

    91aecaa8 963a55c5 nt!ExFreePoolWithTag+0x436

    WARNING: Stack unwind information not available. Following frames may be wrong.

    91aecac8 963a61d4 Zyx+0x5c5

    91aecb00 82d3a6c3 Zyx+0x11d4

    91aecb24 82a4054a nt!IovCallDriver+0x258

    91aecb38 82c3b975 nt!IofCallDriver+0x1b

    91aecb7c 82c2c591 nt!IopDeleteFile+0x10c

    ...

     

    3: kd> ub Zyx+0x11d4
    Zyx+0x11b7:
    963a61b7 ff7008          push    dword ptr [eax+8]
    963a61ba 52              push    edx
    963a61bb 6a01            push    1
    963a61bd ff7018          push    dword ptr [eax+18h]
    963a61c0 e827fdffff      call    Zyx+0xeec (963a5eec)
    963a61c5 e9f1000000      jmp     Zyx+0x12bb (963a62bb)
    963a61ca e8d3fcffff      call    Zyx+0xea2 (963a5ea2)
    963a61cf e89ef3ffff      call    Zyx+0x572 (963a5572)

     

    3: kd> u 963a5572 Zyx+0x5c5
    Zyx+0x572:
    963a5572 8bff            mov     edi,edi
    963a5574 55              push    ebp
    963a5575 8bec            mov     ebp,esp
    963a5577 51              push    ecx
    963a5578 53              push    ebx
    963a5579 56              push    esi
    963a557a 57              push    edi
    963a557b b11f            mov     cl,1Fh <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 0x1F = 0n31 (HIGH)
    963a557d ff15006b3a96    call    dword ptr [Zyx+0x1b00 (963a6b00)] <<<<<<<< hal!KfRaiseIrql (via lookup)
    963a5583 bb14813a96      mov     ebx,offset Zyx+0x3114 (963a8114)
    963a5588 8bcb            mov     ecx,ebx
    963a558a 8845ff          mov     byte ptr [ebp-1],al
    963a558d e8c2100000      call    Zyx+0x1654 (963a6654) <<<<<<<<<<<<<<<< [Unresolved]
    963a5592 a1746d3a96      mov     eax,dword ptr [Zyx+0x1d74 (963a6d74)]
    963a5597 85c0            test    eax,eax
    963a5599 8b35586b3a96    mov     esi,dword ptr [Zyx+0x1b58 (963a6b58)] <<<< nt!ExFreePoolWithTag (via lookup)
    963a559f 7413            je      Zyx+0x5b4 (963a55b4)
    963a55a1 8b7808          mov     edi,dword ptr [eax+8]
    963a55a4 6a00            push    0
    963a55a6 50              push    eax
    963a55a7 ffd6            call    esi  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< nt!ExFreePoolWithTag (via 963a5599)
    963a55a9 85ff            test    edi,edi
    963a55ab 8bc7            mov     eax,edi
    963a55ad a3746d3a96      mov     dword ptr [Zyx+0x1d74 (963a6d74)],eax
    963a55b2 75ed            jne     Zyx+0x5a1 (963a55a1)
    963a55b4 a1a06d3a96      mov     eax,dword ptr [Zyx+0x1da0 (963a6da0)]
    963a55b9 85c0            test    eax,eax
    963a55bb 7413            je      Zyx+0x5d0 (963a55d0)
    963a55bd 8b7808          mov     edi,dword ptr [eax+8]
    963a55c0 6a00            push    0
    963a55c2 50              push    eax
    963a55c3 ffd6            call    esi <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< nt!ExFreePoolWithTag (via 963a5599)

     

    3: kd> ln poi(963a6b00)
    (82e20844)   hal!KfRaiseIrql   |  (82e208c6)   hal!HalpDispatchSoftwareInterrupt
    Exact matches:

     

    3: kd> ln poi(963a6b58)
    (82b2aaba)   nt!ExFreePoolWithTag   |  (82b2b660)   nt!ExDeferredFreePool
    Exact matches:

     

    The next step was to work out what functions the Zyx+0x5c5 function called, and with what parameters. The assembler didn’t make this immediately obvious. The address of the function calls were lookups. This is common when the address is imported by the code, instead of being bound. To get the function name of the call, I dereferenced the pointer (poi(<addr>) and passed the address to ln so that it listed the nearest symbols.

     

    The first function confirmed some the theory. A call to hal!KfRaiseIrql was being made with a IRQL of HIGH (31). And this was prior to a code path that called functions.

     

    The value of the @esi register was determined to be nt!ExFreePoolWithTag. This matched the function name in the reconstructed stack (nt!ExFreePoolWithTag+0x436) and confirmed that the stack was reconstructed in this area correctly.

     

    Conclusion

    The Zyx function raised IRQL to HIGH_LEVEL before calling nt!ExFreePoolWithTag. This function caused a Translation Look-aside Buffer (TLB) flush via nt!KiFlushTargetMultipleRangeTb) to occur. This in turn caused an IPI notification to the other processors (via nt!KiIpiSendPacket) to indicate the release of the TLB memory.

     

    The Processor #0 IPI was stuck as incomplete because Processor #3 could not be interrupted, and Processor #3 did not lower its IRQL as it was waiting to send its own IPI upon completion of the Processor #0 IPI – thus a deadlock was formed.

     

    I contacted the driver’s author and organized a new version that used the SYNCH (27) IRQL (the level immediately under IPI). The new build was sent to the customer and the immediate bugcheck that they were observing in their test environment disappeared.  Although this is not an ideal solution, it was sufficient for the time being.  An ideal solution would be to follow the documentation for ExFreePoolWithTag and not make the call with an IRQL greater than DISPATCH_LEVEL.

     

    I hope this case has provided a good foundation in how to recover and navigate through stacks without symbols or context. Once you understand the mechanics of calling conventions, the values needed to bound a disassembling operation and/or a stack reassemble are quite easy to determine.

     

Page 1 of 1 (2 items)