• Ntdebugging Blog

    Understanding Pool Corruption Part 3 – Special Pool for Double Frees

    • 0 Comments

    In Part 1 and Part 2 of this series we discussed pool corruption and how special pool can be used to identify the cause of such corruption.  In today’s article we will use special pool to catch a double free of pool memory.

     

    A double free of pool will cause a system to blue screen, however the resulting crash may vary.  In the most obvious scenario a driver that frees a pool allocation twice will cause the system to immediately crash with a stop code of C2 BAD_POOL_CALLER, and the first parameter will be 7 to indicate “Attempt to free pool which was already freed”.  If you experience such a crash, enabling special pool should be high on your list of troubleshooting steps.

     

    BAD_POOL_CALLER (c2)

    The current thread is making a bad pool request.  Typically this is at a bad IRQL level or double freeing the same allocation, etc.

    Arguments:

    Arg1: 0000000000000007, Attempt to free pool which was already freed

    Arg2: 00000000000011c1, (reserved)

    Arg3: 0000000004810007, Memory contents of the pool block

    Arg4: fffffa8001b10800, Address of the block of pool being deallocated

     

    A less obvious crash would be if the pool has been reallocated.  As we showed in Part 2, pool is structured so that multiple drivers share a page.  When DriverA calls ExFreePool to free its pool block the block is made available for other drivers.  If memory manager gives this memory to DriverF, and then DriverA frees it a second time, a crash may occur in DriverF when the pool allocation no longer contains the expected data.  Such a problem may be difficult for the developer of DriverF to identify without special pool.

     

     

    Special pool will place each driver’s allocation in a separate page of memory (as discussed in Part 2).  When a driver frees a pool block in special pool the whole page will be freed, and any access to a free page will cause an immediate bugcheck.  Additionally, special pool will place this page on the tail of the list of pages to be used again.  This increases the likelihood that the page will still be free when it is freed a second time, decreasing the likelihood of the DriverA/DriverF scenario shown above.

     

    To demonstrate this failure we will once again use the Sysinternals tool NotMyFault.  Choose the “Double free” option and click “Crash”.  Most likely you will get the stop C2 bugcheck mentioned above.  Enable special pool and reboot to get a more informative error.

     

    verifier /flags 1 /driver myfault.sys

     

    Choosing the “Double free” option with special pool enabled resulted in the following crash.  The bugcheck code PAGE_FAULT_IN_NONPAGED_AREA means some driver tried to access memory that was not valid.  This invalid memory was the freed special pool page.

     

    PAGE_FAULT_IN_NONPAGED_AREA (50)

    Invalid system memory was referenced.  This cannot be protected by try-except,

    it must be protected by a Probe.  Typically the address is just plain bad or it

    is pointing at freed memory.

    Arguments:

    Arg1: fffff9800a7fe7f0, memory referenced.

    Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.

    Arg3: fffff80060263888, If non-zero, the instruction address which referenced the bad memory address.

    Arg4: 0000000000000002, (reserved)

     

    Looking at the call stack we can see myfault.sys was freeing pool and ExFreePoolSanityChecks took a page fault that lead to the crash.

     

    kd> kn

    # Child-SP          RetAddr           Call Site

    00 fffff880`0419fe28 fffff800`5fd7e28a nt!DbgBreakPointWithStatus

    01 fffff880`0419fe30 fffff800`5fd7d8de nt!KiBugCheckDebugBreak+0x12

    02 fffff880`0419fe90 fffff800`5fc5b544 nt!KeBugCheck2+0x79f

    03 fffff880`041a05b0 fffff800`5fd1c5bc nt!KeBugCheckEx+0x104

    04 fffff880`041a05f0 fffff800`5fc95acb nt! ?? ::FNODOBFM::`string'+0x33e2a

    05 fffff880`041a0690 fffff800`5fc58eee nt!MmAccessFault+0x55b

    06 fffff880`041a07d0 fffff800`60263888 nt!KiPageFault+0x16e

    07 fffff880`041a0960 fffff800`6024258c nt!ExFreePoolSanityChecks+0xe8

    08 fffff880`041a09a0 fffff880`04c9b5d9 nt!VerifierExFreePoolWithTag+0x3c

    09 fffff880`041a09d0 fffff880`04c9b727 myfault!MyfaultDeviceControl+0x2fd

    0a fffff880`041a0b20 fffff800`60241a4a myfault!MyfaultDispatch+0xb7

    0b fffff880`041a0b80 fffff800`600306c7 nt!IovCallDriver+0xba

    0c fffff880`041a0bd0 fffff800`600458a6 nt!IopXxxControlFile+0x7e5

    0d fffff880`041a0d60 fffff800`5fc5a453 nt!NtDeviceIoControlFile+0x56

    0e fffff880`041a0dd0 000007fd`ea212c5a nt!KiSystemServiceCopyEnd+0x13

     

    Using the address from the bugcheck code, we can verify that the memory is in fact not valid:

     

    kd> dd fffff9800a7fe7f0

    fffff980`0a7fe7f0  ???????? ???????? ???????? ????????

    fffff980`0a7fe800  ???????? ???????? ???????? ????????

    fffff980`0a7fe810  ???????? ???????? ???????? ????????

    fffff980`0a7fe820  ???????? ???????? ???????? ????????

    fffff980`0a7fe830  ???????? ???????? ???????? ????????

    fffff980`0a7fe840  ???????? ???????? ???????? ????????

    fffff980`0a7fe850  ???????? ???????? ???????? ????????

    fffff980`0a7fe860  ???????? ???????? ???????? ????????

    kd> !pte fffff9800a7fe7f0

                                               VA fffff9800a7fe7f0

    PXE at FFFFF6FB7DBEDF98    PPE at FFFFF6FB7DBF3000    PDE at FFFFF6FB7E600298    PTE at FFFFF6FCC0053FF0

    contains 0000000002A91863  contains 0000000002A10863  contains 0000000000000000

    pfn 2a91      ---DA--KWEV  pfn 2a10      ---DA--KWEV  not valid

     

    So far we have enough evidence to prove that myfault.sys was freeing invalid memory, but how to we know this memory is being freed twice?  If there was a double free we need to determine if the first or second call to ExFreePool was incorrect.  To this so we need to determine what code freed the memory first.

     

    Driver Verifier special pool keeps track of the last 0x10000 calls to allocate and free pool.  You can dump this database with the !verifier 80 command.  To limit the data output you can also pass this command the address of the memory you suspect was double freed.

     

    Don’t assume the address in the bugcheck code is the address being freed, go get the address from the function that called VerifierExFreePoolWithTag.

     

    In the above call stack the call below VerifierExFreePoolWithTag is frame 9 (start counting with 0, or use kn).

     

    kd> .frame /r 9

    09 fffff880`041a09d0 fffff880`04c9b727 myfault+0x15d9

    rax=0000000000000000 rbx=fffff9800a7fe800 rcx=fffff9800a7fe800

    rdx=fffffa8001a37fa0 rsi=fffffa80035975e0 rdi=fffffa8003597610

    rip=fffff88004c9b5d9 rsp=fffff880041a09d0 rbp=fffffa80034568d0

    r8=fffff9800a7fe801  r9=fffff9800a7fe7f0 r10=fffff9800a7fe800

    r11=0000000000000000 r12=0000000000000000 r13=0000000000000000

    r14=fffff800600306c7 r15=fffffa8004381b80

    iopl=0         nv up ei ng nz na po nc

    cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000286

    myfault+0x15d9:

    fffff880`04c9b5d9 eb7a            jmp     myfault+0x1655 (fffff880`04c9b655)

     

    On x64 systems the first parameter is passed in rcx.  The below assembly shows that rcx originated from rbx.

     

    kd> ub fffff880`04c9b5d9

    myfault+0x15ba:

    fffff880`04c9b5ba ff15a80a0000    call    qword ptr [myfault+0x2068 (fffff880`04c9c068)]

    fffff880`04c9b5c0 33d2            xor     edx,edx

    fffff880`04c9b5c2 488bc8          mov     rcx,rax

    fffff880`04c9b5c5 488bd8          mov     rbx,rax

    fffff880`04c9b5c8 ff154a0a0000    call    qword ptr [myfault+0x2018 (fffff880`04c9c018)]

    fffff880`04c9b5ce 33d2            xor     edx,edx

    fffff880`04c9b5d0 488bcb          mov    rcx,rbx

    fffff880`04c9b5d3 ff153f0a0000    call    qword ptr [myfault+0x2018 (fffff880`04c9c018)]

     

    Run !verifier 80 using the address from rbx:

     

    kd> !verifier 80 fffff9800a7fe800

     

    Log of recent kernel pool Allocate and Free operations:

     

    There are up to 0x10000 entries in the log.

     

    Parsing 0x0000000000010000 log entries, searching for address 0xfffff9800a7fe800.

     

     

    ======================================================================

    Pool block fffff9800a7fe800, Size 0000000000000800, Thread fffffa80046ce4c0

    fffff80060251a32 nt!VfFreePoolNotification+0x4a

    fffff8005fe736c9 nt!ExFreePool+0x595

    fffff80060242597 nt!VerifierExFreePoolWithTag+0x47

    fffff88004c9b5ce myfault!MyfaultDeviceControl+0x2f2

    fffff88004c9b727 myfault!MyfaultDispatch+0xb7

    fffff80060241a4a nt!IovCallDriver+0xba

    fffff800600306c7 nt!IopXxxControlFile+0x7e5

    fffff800600458a6 nt!NtDeviceIoControlFile+0x56

    fffff8005fc5a453 nt!KiSystemServiceCopyEnd+0x13

    ======================================================================

    Pool block fffff9800a7fe800, Size 0000000000000800, Thread fffffa80046ce4c0

    fffff80060242a5d nt!VeAllocatePoolWithTagPriority+0x2d1

    fffff8006024b20e nt!XdvExAllocatePoolInternal+0x12

    fffff80060242f69 nt!VerifierExAllocatePool+0x61

    fffff88004c9b5c0 myfault!MyfaultDeviceControl+0x2e4

    fffff88004c9b727 myfault!MyfaultDispatch+0xb7

    fffff80060241a4a nt!IovCallDriver+0xba

    fffff800600306c7 nt!IopXxxControlFile+0x7e5

    fffff800600458a6 nt!NtDeviceIoControlFile+0x56

    fffff8005fc5a453 nt!KiSystemServiceCopyEnd+0x13

     

    The above output shows the pool block being allocated by myfault.sys and then freed by myfault.sys.  If we combine this information with the call stack leading up to our bugcheck we can conclude that the pool was freed once in MyfaultDeviceControl at offset 0x2f2, then freed again in MyfaultDeviceControl at offset 0x2fd.

     

    Now we know which driver is causing the problem, and if this is our driver we know which area of the code to investigate.

  • Ntdebugging Blog

    Event ID 157 "Disk # has been surprise removed"

    • 9 Comments

    Hello my name is Bob Golding and I would like to share information on a new error you may see in the system event log. It is Event ID 157 "Disk <n> has been surprise removed" with Source: disk.  This error indicates that the CLASSPNP driver has received a “surprise removal” request from the plug and play manager (PNP) for a non-removable disk.

     

    What does this error mean?

    The PNP manager does what is called enumerations.  An enumeration is a request sent to a driver that controls a bus, such as PCI, to take an inventory of devices on the bus and report back a list of the devices.  The SCSI bus is enumerated in a similar manner, as are devices on the IDE bus.

     

    These enumerations can happen for a number of reasons.  For example, hardware can request an enumeration when it detects a change in configuration.  Also a user can initiate an enumeration by selecting “scan for new devices” in device manager.  

     

    When an enumeration request is received, the bus driver will rescan the bus for all devices.  It will issue commands to the existing devices as though it was looking for new ones.  If these commands fail on an existing unit, the driver will mark the device as “missing”.  When the device is marked “missing”, it will not be reported back to PNP in the inventory.  When PNP determines that the device is not in the inventory it will send a surprise removal request to the bus driver so the bus driver can remove the device object.

     

    Since the CLASSPNP driver sits in the device stack and receives requests that are destined for disks, it sees the surprise removal request and logs an event if the disk is supposed to be non-removable.  An example of a non-removable disk is a hard drive on a SCSI or IDE bus.  An example of a removable disk is a USB thumb drive.

     

    Previously nothing was logged when a non-removable disk was removed, as a result disks would disappear from the system with no indication.  The event id 157 error was implemented in Windows 8.1 and Windows Server 2012 R2 to log a record of a disk disappearing.

     

    Why does this error happen?

    These errors are most often caused when something disrupts the system’s communication with a disk, such as a SAN fabric error or a SCSI bus problem.  The errors can also be caused by a disk that fails, or when a user unplugs a disk while the system is running.  An administrator that sees these errors needs to verify the heath of the disk subsystem.

     

    Event ID 157 Example:

     

Page 1 of 1 (2 items)