In our previous article we discussed pool corruption that occurs when a driver writes too much data to a buffer. In this article we will discuss how special pool can help identify the driver that writes too much data.
Pool is typically organized to allow multiple drivers to store data in the same page of memory, as shown in Figure 1. By allowing multiple drivers to share the same page, pool provides for an efficient use of the available kernel memory space. However, this sharing requires that each driver be careful in how it uses pool; any bug where a driver uses pool improperly may corrupt a neighboring driver's pool data and cause a crash.
Figure 1 – Uncorrupted Pool
With pool organized as shown in Figure 1, if DriverA allocates 100 bytes but writes 120 bytes, it will overwrite the pool header and the data stored by DriverB. In Part 1 we demonstrated this type of buffer overflow using NotMyFault, but we were not able to identify which code had corrupted the pool.
Figure 2 – Corrupted Pool
To catch the driver that corrupted pool we can use special pool. Special pool changes the organization of the pool so that each allocation is placed in its own page of memory, which helps prevent drivers from accidentally writing to another driver's memory. Special pool also places the allocation at the end of the page and marks the next virtual page as invalid, making it a guard page. Any attempt to write past the end of the allocation then results in an immediate bugcheck.
Special pool also fills the unused portion of the page with a repeating pattern, referred to as “slop bytes”. These slop bytes are checked when the page is freed; if any errors are found in the pattern, a bugcheck is generated to indicate that the memory was corrupted. This type of corruption is not a buffer overflow; it may be an underflow or some other form of corruption.
Figure 3 – Special Pool
Because special pool stores each pool allocation in its own 4KB page, it increases memory usage. When special pool is enabled, the memory manager configures a limit on how much special pool may be allocated on the system; when this limit is reached, the normal pools are used instead. This limitation may be especially pronounced on 32-bit systems, which have less kernel address space than 64-bit systems.
Now that we have explained how special pool works, let's put it to use.
There are two methods to enable special pool. Driver verifier allows special pool to be enabled on specific drivers. The PoolTag registry value described in KB188831 allows special pool to be enabled for a particular pool tag. Starting in Windows Vista and Windows Server 2008, driver verifier captures additional information for special pool allocations so this is typically the recommended method.
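For the registry method, KB188831 describes values under the Memory Management key. The sketch below shows what the commands might look like; the value names and the 0x2a ('*') wildcard should be confirmed against the KB article before use, and a reboot is required for the change to take effect:

```shell
:: Hypothetical example based on KB188831 - verify the exact value names
:: in the KB article before use. 0x2a is the '*' wildcard; a specific
:: four-character tag can be supplied as its hex value instead.
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v PoolTag /t REG_DWORD /d 0x2a /f

:: PoolTagOverruns: 1 = place allocations at the end of the page to
:: catch overruns, 0 = at the start of the page to catch underruns.
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v PoolTagOverruns /t REG_DWORD /d 1 /f
```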
To enable special pool using driver verifier, use the following command line or choose the option from the verifier GUI. Use the /driver flag to specify the drivers you want to verify; this is the place to list drivers you suspect as the cause of the problem, drivers you have written and want to test, or drivers you have recently updated on the system. In the command line below I am only verifying myfault.sys. A reboot is required to enable special pool.
verifier /flags 1 /driver myfault.sys
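After the reboot you can confirm what verifier is doing, and later remove the settings when troubleshooting is complete:

```shell
:: Display the currently active verifier settings and verified drivers.
verifier /querysettings

:: When finished, clear all verifier settings.
:: Another reboot is required for this to take effect.
verifier /reset
```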
After enabling verifier and rebooting the system, repeat the activity that causes the crash. For some problems the activity may just be to wait for a period of time. For our demonstration we are running NotMyFault (see Part 1 for details).
The crash resulting from a buffer overflow in special pool will be a stop 0xD6, DRIVER_PAGE_FAULT_BEYOND_END_OF_ALLOCATION.
kd> !analyze -v
* Bugcheck Analysis *
N bytes of memory was allocated and more than N bytes are being referenced.
This cannot be protected by try-except.
When possible, the guilty driver's name (Unicode string) is printed on
the bugcheck screen and saved in KiBugCheckDriver.
Arg1: fffff9800b5ff000, memory referenced
Arg2: 0000000000000001, value 0 = read operation, 1 = write operation
Arg3: fffff88004f834eb, if non-zero, the address which referenced memory.
Arg4: 0000000000000000, (reserved)
We can debug this crash and determine that myfault.sys wrote beyond its pool buffer.
The call stack shows that myfault.sys accessed invalid memory and this generated a page fault.
Child-SP RetAddr Call Site
fffff880`04822658 fffff803`721333f1 nt!KeBugCheckEx
fffff880`04822660 fffff803`720acacb nt! ?? ::FNODOBFM::`string'+0x33c2b
fffff880`04822700 fffff803`7206feee nt!MmAccessFault+0x55b
fffff880`04822840 fffff880`04f834eb nt!KiPageFault+0x16e
fffff880`048229d0 fffff880`04f83727 myfault+0x14eb
fffff880`04822b20 fffff803`72658a4a myfault+0x1727
fffff880`04822b80 fffff803`724476c7 nt!IovCallDriver+0xba
fffff880`04822bd0 fffff803`7245c8a6 nt!IopXxxControlFile+0x7e5
fffff880`04822d60 fffff803`72071453 nt!NtDeviceIoControlFile+0x56
fffff880`04822dd0 000007fc`4fe22c5a nt!KiSystemServiceCopyEnd+0x13
00000000`004debb8 00000000`00000000 0x000007fc`4fe22c5a
The !pool command shows that the address being referenced by myfault.sys is special pool.
kd> !pool fffff9800b5ff000
Pool page fffff9800b5ff000 region is Special pool
fffff9800b5ff000: Unable to get contents of special pool block
The page table entry shows that the address is not valid. This is the guard page used by special pool to catch overruns.
kd> !pte fffff9800b5ff000
PXE at FFFFF6FB7DBEDF98 PPE at FFFFF6FB7DBF3000 PDE at FFFFF6FB7E6002D0 PTE at FFFFF6FCC005AFF8
contains 0000000001B8F863 contains 000000000138E863 contains 000000001A6A1863 contains 0000000000000000
pfn 1b8f ---DA--KWEV pfn 138e ---DA--KWEV pfn 1a6a1 ---DA--KWEV not valid
The allocation prior to this memory is an 800 byte block of non-paged pool tagged as “Wrap”. “Wrap” is the tag used by verifier when pool is allocated without a tag; it is the equivalent of the “None” tag we saw in Part 1.
kd> !pool fffff9800b5ff000-1000
Pool page fffff9800b5fe000 region is Special pool
*fffff9800b5fe000 size: 800 data: fffff9800b5fe800 (NonPaged) *Wrap
Owning component : Unknown (update pooltag.txt)
Special pool is an effective mechanism for tracking down buffer overflow pool corruption. It can also catch other types of pool corruption, which we will discuss in future articles.
It may be worth mentioning the verifier volatile flag and this blog post (published today):
One question: Is it possible to track down pool corruption via special pool in scenarios where some pool allocations were not successful (as can sometimes be seen in the verifier statistics)?
[The volatile flag causes verifier to be disabled after the next reboot. This can be problematic in some environments and risks collecting data without special pool enabled. Such data would be a waste of a troubleshooting opportunity. This flag may be valuable in a scenario where the server cannot be rebooted immediately (such as a critical production system).
If the pool allocations are not made from special pool, you need to try your repro again. Note that special pool only works for allocations smaller than one page (4KB).]