In Part 1 and Part 2 of this series we discussed pool corruption and how special pool can be used to identify the cause of such corruption. In today’s article we will use special pool to catch a double free of pool memory.
A double free of pool will cause a system to blue screen, however the resulting crash may vary. In the most obvious scenario a driver that frees a pool allocation twice will cause the system to immediately crash with a stop code of C2 BAD_POOL_CALLER, and the first parameter will be 7 to indicate “Attempt to free pool which was already freed”. If you experience such a crash, enabling special pool should be high on your list of troubleshooting steps.
The current thread is making a bad pool request. Typically this is at a bad IRQL level or double freeing the same allocation, etc.
Arg1: 0000000000000007, Attempt to free pool which was already freed
Arg2: 00000000000011c1, (reserved)
Arg3: 0000000004810007, Memory contents of the pool block
Arg4: fffffa8001b10800, Address of the block of pool being deallocated
A less obvious crash would be if the pool has been reallocated. As we showed in Part 2, pool is structured so that multiple drivers share a page. When DriverA calls ExFreePool to free its pool block the block is made available for other drivers. If memory manager gives this memory to DriverF, and then DriverA frees it a second time, a crash may occur in DriverF when the pool allocation no longer contains the expected data. Such a problem may be difficult for the developer of DriverF to identify without special pool.
Special pool will place each driver’s allocation in a separate page of memory (as discussed in Part 2). When a driver frees a pool block in special pool the whole page will be freed, and any access to a free page will cause an immediate bugcheck. Additionally, special pool will place this page on the tail of the list of pages to be used again. This increases the likelihood that the page will still be free when it is freed a second time, decreasing the likelihood of the DriverA/DriverF scenario shown above.
To demonstrate this failure we will once again use the Sysinternals tool NotMyFault. Choose the “Double free” option and click “Crash”. Most likely you will get the stop C2 bugcheck mentioned above. Enable special pool and reboot to get a more informative error.
verifier /flags 1 /driver myfault.sys
Choosing the “Double free” option with special pool enabled resulted in the following crash. The bugcheck code PAGE_FAULT_IN_NONPAGED_AREA means some driver tried to access memory that was not valid. This invalid memory was the freed special pool page.
Invalid system memory was referenced. This cannot be protected by try-except,
it must be protected by a Probe. Typically the address is just plain bad or it
is pointing at freed memory.
Arg1: fffff9800a7fe7f0, memory referenced.
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
Arg3: fffff80060263888, If non-zero, the instruction address which referenced the bad memory address.
Arg4: 0000000000000002, (reserved)
Looking at the call stack we can see myfault.sys was freeing pool and ExFreePoolSanityChecks took a page fault that lead to the crash.
# Child-SP RetAddr Call Site
00 fffff880`0419fe28 fffff800`5fd7e28a nt!DbgBreakPointWithStatus
01 fffff880`0419fe30 fffff800`5fd7d8de nt!KiBugCheckDebugBreak+0x12
02 fffff880`0419fe90 fffff800`5fc5b544 nt!KeBugCheck2+0x79f
03 fffff880`041a05b0 fffff800`5fd1c5bc nt!KeBugCheckEx+0x104
04 fffff880`041a05f0 fffff800`5fc95acb nt! ?? ::FNODOBFM::`string'+0x33e2a
05 fffff880`041a0690 fffff800`5fc58eee nt!MmAccessFault+0x55b
06 fffff880`041a07d0 fffff800`60263888 nt!KiPageFault+0x16e
07 fffff880`041a0960 fffff800`6024258c nt!ExFreePoolSanityChecks+0xe8
08 fffff880`041a09a0 fffff880`04c9b5d9 nt!VerifierExFreePoolWithTag+0x3c
09 fffff880`041a09d0 fffff880`04c9b727 myfault!MyfaultDeviceControl+0x2fd
0a fffff880`041a0b20 fffff800`60241a4a myfault!MyfaultDispatch+0xb7
0b fffff880`041a0b80 fffff800`600306c7 nt!IovCallDriver+0xba
0c fffff880`041a0bd0 fffff800`600458a6 nt!IopXxxControlFile+0x7e5
0d fffff880`041a0d60 fffff800`5fc5a453 nt!NtDeviceIoControlFile+0x56
0e fffff880`041a0dd0 000007fd`ea212c5a nt!KiSystemServiceCopyEnd+0x13
Using the address from the bugcheck code, we can verify that the memory is in fact not valid:
kd> dd fffff9800a7fe7f0
fffff980`0a7fe7f0 ???????? ???????? ???????? ????????
fffff980`0a7fe800 ???????? ???????? ???????? ????????
fffff980`0a7fe810 ???????? ???????? ???????? ????????
fffff980`0a7fe820 ???????? ???????? ???????? ????????
fffff980`0a7fe830 ???????? ???????? ???????? ????????
fffff980`0a7fe840 ???????? ???????? ???????? ????????
fffff980`0a7fe850 ???????? ???????? ???????? ????????
fffff980`0a7fe860 ???????? ???????? ???????? ????????
kd> !pte fffff9800a7fe7f0
PXE at FFFFF6FB7DBEDF98 PPE at FFFFF6FB7DBF3000 PDE at FFFFF6FB7E600298 PTE at FFFFF6FCC0053FF0
contains 0000000002A91863 contains 0000000002A10863 contains 0000000000000000
pfn 2a91 ---DA--KWEV pfn 2a10 ---DA--KWEV not valid
So far we have enough evidence to prove that myfault.sys was freeing invalid memory, but how to we know this memory is being freed twice? If there was a double free we need to determine if the first or second call to ExFreePool was incorrect. To this so we need to determine what code freed the memory first.
Driver Verifier special pool keeps track of the last 0x10000 calls to allocate and free pool. You can dump this database with the !verifier 80 command. To limit the data output you can also pass this command the address of the memory you suspect was double freed.
Don’t assume the address in the bugcheck code is the address being freed, go get the address from the function that called VerifierExFreePoolWithTag.
In the above call stack the call below VerifierExFreePoolWithTag is frame 9 (start counting with 0, or use kn).
kd> .frame /r 9
09 fffff880`041a09d0 fffff880`04c9b727 myfault+0x15d9
rax=0000000000000000 rbx=fffff9800a7fe800 rcx=fffff9800a7fe800
rdx=fffffa8001a37fa0 rsi=fffffa80035975e0 rdi=fffffa8003597610
rip=fffff88004c9b5d9 rsp=fffff880041a09d0 rbp=fffffa80034568d0
r8=fffff9800a7fe801 r9=fffff9800a7fe7f0 r10=fffff9800a7fe800
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
iopl=0 nv up ei ng nz na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000286
fffff880`04c9b5d9 eb7a jmp myfault+0x1655 (fffff880`04c9b655)
On x64 systems the first parameter is passed in rcx. The below assembly shows that rcx originated from rbx.
kd> ub fffff880`04c9b5d9
fffff880`04c9b5ba ff15a80a0000 call qword ptr [myfault+0x2068 (fffff880`04c9c068)]
fffff880`04c9b5c0 33d2 xor edx,edx
fffff880`04c9b5c2 488bc8 mov rcx,rax
fffff880`04c9b5c5 488bd8 mov rbx,rax
fffff880`04c9b5c8 ff154a0a0000 call qword ptr [myfault+0x2018 (fffff880`04c9c018)]
fffff880`04c9b5ce 33d2 xor edx,edx
fffff880`04c9b5d0 488bcb mov rcx,rbx
fffff880`04c9b5d3 ff153f0a0000 call qword ptr [myfault+0x2018 (fffff880`04c9c018)]
Run !verifier 80 using the address from rbx:
kd> !verifier 80 fffff9800a7fe800
Log of recent kernel pool Allocate and Free operations:
There are up to 0x10000 entries in the log.
Parsing 0x0000000000010000 log entries, searching for address 0xfffff9800a7fe800.
Pool block fffff9800a7fe800, Size 0000000000000800, Thread fffffa80046ce4c0
The above output shows the pool block being allocated by myfault.sys and then freed by myfault.sys. If we combine this information with the call stack leading up to our bugcheck we can conclude that the pool was freed once in MyfaultDeviceControl at offset 0x2f2, then freed again in MyfaultDeviceControl at offset 0x2fd.
Now we know which driver is causing the problem, and if this is our driver we know which area of the code to investigate.
Hello my name is Bob Golding and I would like to share information on a new error you may see in the system event log. It is Event ID 157 "Disk <n> has been surprise removed" with Source: disk. This error indicates that the CLASSPNP driver has received a “surprise removal” request from the plug and play manager (PNP) for a non-removable disk.
What does this error mean?
The PNP manager does what is called enumerations. An enumeration is a request sent to a driver that controls a bus, such as PCI, to take an inventory of devices on the bus and report back a list of the devices. The SCSI bus is enumerated in a similar manner, as are devices on the IDE bus.
These enumerations can happen for a number of reasons. For example, hardware can request an enumeration when it detects a change in configuration. Also a user can initiate an enumeration by selecting “scan for new devices” in device manager.
When an enumeration request is received, the bus driver will rescan the bus for all devices. It will issue commands to the existing devices as though it was looking for new ones. If these commands fail on an existing unit, the driver will mark the device as “missing”. When the device is marked “missing”, it will not be reported back to PNP in the inventory. When PNP determines that the device is not in the inventory it will send a surprise removal request to the bus driver so the bus driver can remove the device object.
Since the CLASSPNP driver sits in the device stack and receives requests that are destined for disks, it sees the surprise removal request and logs an event if the disk is supposed to be non-removable. An example of a non-removable disk is a hard drive on a SCSI or IDE bus. An example of a removable disk is a USB thumb drive.
Previously nothing was logged when a non-removable disk was removed, as a result disks would disappear from the system with no indication. The event id 157 error was implemented in Windows 8.1 and Windows Server 2012 R2 to log a record of a disk disappearing.
Why does this error happen?
These errors are most often caused when something disrupts the system’s communication with a disk, such as a SAN fabric error or a SCSI bus problem. The errors can also be caused by a disk that fails, or when a user unplugs a disk while the system is running. An administrator that sees these errors needs to verify the heath of the disk subsystem.
Event ID 157 Example: