Understanding Pool Corruption Part 1 – Buffer Overflows

Understanding Pool Corruption Part 1 – Buffer Overflows

  • Comments 6

Before we can discuss pool corruption we must understand what pool is.  Pool is kernel mode memory used as a storage space for drivers.  Pool is organized in a similar way to how you might use a notepad when taking notes from a lecture or a book.  Some notes may be 1 line, others may be many lines.  Many different notes are on the same page.

 

Memory is also organized into pages, typically a page of memory is 4KB.  The Windows memory manager breaks up this 4KB page into smaller blocks.  One block may be as small as 8 bytes or possibly much larger.  Each of these blocks exists side by side with other blocks.

 

The !pool command can be used to see the pool blocks stored in a page.

 

kd> !pool fffffa8003f42000

Pool page fffffa8003f42000 region is Nonpaged pool

*fffffa8003f42000 size:  410 previous size:    0  (Free)      *Irp

            Pooltag Irp  : Io, IRP packets

fffffa8003f42410 size:   40 previous size:  410  (Allocated)  MmSe

fffffa8003f42450 size:  150 previous size:   40  (Allocated)  File

fffffa8003f425a0 size:   80 previous size:  150  (Allocated)  Even

fffffa8003f42620 size:   c0 previous size:   80  (Allocated)  EtwR

fffffa8003f426e0 size:   d0 previous size:   c0  (Allocated)  CcBc

fffffa8003f427b0 size:   d0 previous size:   d0  (Allocated)  CcBc

fffffa8003f42880 size:   20 previous size:   d0  (Free)       Free

fffffa8003f428a0 size:   d0 previous size:   20  (Allocated)  Wait

fffffa8003f42970 size:   80 previous size:   d0  (Allocated)  CM44

fffffa8003f429f0 size:   80 previous size:   80  (Allocated)  Even

fffffa8003f42a70 size:   80 previous size:   80  (Allocated)  Even

fffffa8003f42af0 size:   d0 previous size:   80  (Allocated)  Wait

fffffa8003f42bc0 size:   80 previous size:   d0  (Allocated)  CM44

fffffa8003f42c40 size:   d0 previous size:   80  (Allocated)  Wait

fffffa8003f42d10 size:  230 previous size:   d0  (Allocated)  ALPC

fffffa8003f42f40 size:   c0 previous size:  230  (Allocated)  EtwR

 

Because many pool allocations are stored in the same page, it is critical that every driver only use the space they have allocated.  If DriverA uses more space than it allocated they will write into the next driver’s space (DriverB) and corrupt DriverB’s data.  This overwrite into the next driver’s space is called a buffer overflow.  Later either the memory manager or DriverB will attempt to use this corrupted memory and will encounter unexpected information.  This unexpected information typically results in a blue screen.

 

The NotMyFault application from Sysinternals has an option to force a buffer overflow.  This can be used to demonstrate pool corruption.  Choosing the “Buffer overflow” option and clicking “Crash” will cause a buffer overflow in pool.  The system may not immediately blue screen after clicking the Crash button.  The system will remain stable until something attempts to use the corrupted memory.  Using the system will often eventually result in a blue screen.

 

NotMyFault

 

Often pool corruption appears as a stop 0x19 BAD_POOL_HEADER or stop 0xC2 BAD_POOL_CALLER.  These stop codes make it easy to determine that pool corruption is involved in the crash.  However, the results of accessing unexpected memory can vary widely, as a result pool corruption can result in many different types of bugchecks.

 

As with any blue screen dump analysis the best place to start is with !analyze -v.  This command will display the stop code and parameters, and do some basic interpretation of the crash.

 

kd> !analyze -v

*******************************************************************************

*                                                                             *

*                        Bugcheck Analysis                                    *

*                                                                             *

*******************************************************************************

 

SYSTEM_SERVICE_EXCEPTION (3b)

An exception happened while executing a system service routine.

Arguments:

Arg1: 00000000c0000005, Exception code that caused the bugcheck

Arg2: fffff8009267244a, Address of the instruction which caused the bugcheck

Arg3: fffff88004763560, Address of the context record for the exception that caused the bugcheck

Arg4: 0000000000000000, zero.

 

In my example the bugcheck was a stop 0x3B SYSTEM_SERVICE_EXCEPTION.  The first parameter of this stop code is c0000005, which is a status code for an access violation.  An access violation is an attempt to access invalid memory (this error is not related to permissions).  Status codes can be looked up in the WDK header ntstatus.h.

 

The !analyze -v command also provides a helpful shortcut to get into the context of the failure.

 

CONTEXT:  fffff88004763560 -- (.cxr 0xfffff88004763560;r)

 

Running this command shows us the registers at the time of the crash.

 

kd> .cxr 0xfffff88004763560

rax=4f4f4f4f4f4f4f4f rbx=fffff80092690460 rcx=fffff800926fbc60

rdx=0000000000000000 rsi=0000000000001000 rdi=0000000000000000

rip=fffff8009267244a rsp=fffff88004763f60 rbp=fffff8009268fb40

r8=fffffa8001a1b820  r9=0000000000000001 r10=fffff800926fbc60

r11=0000000000000011 r12=0000000000000000 r13=fffff8009268fb48

r14=0000000000000012 r15=000000006374504d

iopl=0         nv up ei pl nz na po nc

cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206

nt!ExAllocatePoolWithTag+0x442:

fffff800`9267244a 4c8b4808        mov     r9,qword ptr [rax+8] ds:002b:4f4f4f4f`4f4f4f57=????????????????

 

From the above output we can see that the crash occurred in ExAllocatePoolWithTag, which is a good indication that the crash is due to pool corruption.  Often an engineer looking at a dump will stop at this point and conclude that a crash was caused by corruption, however we can go further.

 

The instruction that we failed on was dereferencing rax+8.  The rax register contains 4f4f4f4f4f4f4f4f, which does not fit with the canonical form required for pointers on x64 systems.  This tells us that the system crashed because the data in rax is expected to be a pointer but it is not one.

 

To determine why rax does not contain the expected data we must examine the instructions prior to where the failure occurred.

 

kd> ub .

nt!KzAcquireQueuedSpinLock [inlined in nt!ExAllocatePoolWithTag+0x421]:

fffff800`92672429 488d542440      lea     rdx,[rsp+40h]

fffff800`9267242e 49875500        xchg    rdx,qword ptr [r13]

fffff800`92672432 4885d2          test    rdx,rdx

fffff800`92672435 0f85c3030000    jne     nt!ExAllocatePoolWithTag+0x7ec (fffff800`926727fe)

fffff800`9267243b 48391b          cmp     qword ptr [rbx],rbx

fffff800`9267243e 0f8464060000    je      nt!ExAllocatePoolWithTag+0xa94 (fffff800`92672aa8)

fffff800`92672444 4c8b03          mov     r8,qword ptr [rbx]

fffff800`92672447 498b00          mov     rax,qword ptr [r8]

 

The assembly shows that rax originated from the data pointed to by r8.  The .cxr command we ran earlier shows that r8 is fffffa8001a1b820.  If we examine the data at fffffa8001a1b820 we see that it matches the contents of rax, which confirms this memory is the source of the unexpected data in rax.

 

kd> dq fffffa8001a1b820 l1

fffffa80`01a1b820  4f4f4f4f`4f4f4f4f

 

To determine if this unexpected data is caused by pool corruption we can use the !pool command.

 

kd> !pool fffffa8001a1b820

Pool page fffffa8001a1b820 region is Nonpaged pool

 fffffa8001a1b000 size:  810 previous size:    0  (Allocated)  None

 

fffffa8001a1b810 doesn't look like a valid small pool allocation, checking to see

if the entire page is actually part of a large page allocation...

 

fffffa8001a1b810 is not a valid large pool allocation, checking large session pool...

fffffa8001a1b810 is freed (or corrupt) pool

Bad previous allocation size @fffffa8001a1b810, last size was 81

 

***

*** An error (or corruption) in the pool was detected;

*** Attempting to diagnose the problem.

***

*** Use !poolval fffffa8001a1b000 for more details.

 

 

Pool page [ fffffa8001a1b000 ] is __inVALID.

 

Analyzing linked list...

[ fffffa8001a1b000 --> fffffa8001a1b010 (size = 0x10 bytes)]: Corrupt region

 

 

Scanning for single bit errors...

 

None found

 

The above output does not look like the !pool command we used earlier.   This output shows corruption to the pool header which prevented the command from walking the chain of allocations.

 

The above output shows that there is an allocation at fffffa8001a1b000 of size 810.  If we look at this memory we should see a pool header.  Instead what we see is a pattern of 4f4f4f4f`4f4f4f4f.

 

kd> dq fffffa8001a1b000 + 810

fffffa80`01a1b810  4f4f4f4f`4f4f4f4f 4f4f4f4f`4f4f4f4f

fffffa80`01a1b820  4f4f4f4f`4f4f4f4f 4f4f4f4f`4f4f4f4f

fffffa80`01a1b830  4f4f4f4f`4f4f4f4f 00574f4c`46524556

fffffa80`01a1b840  00000000`00000000 00000000`00000000

fffffa80`01a1b850  00000000`00000000 00000000`00000000

fffffa80`01a1b860  00000000`00000000 00000000`00000000

fffffa80`01a1b870  00000000`00000000 00000000`00000000

fffffa80`01a1b880  00000000`00000000 00000000`00000000

 

At this point we can be confident that the system crashed because of pool corruption.

 

Because the corruption occurred in the past, and a dump is a snapshot of the current state of the system, there is no concrete evidence to indicate how the memory came to be corrupted.  It is possible the driver that allocated the pool block immediately preceding the corruption is the one that wrote to the wrong location and caused this corruption.  This pool block is marked with the tag “None”, we can search for this tag in memory to determine which drivers use it.

 

kd> !for_each_module s -a @#Base @#End "None"

fffff800`92411bc2  4e 6f 6e 65 e9 45 04 26-00 90 90 90 90 90 90 90  None.E.&........

kd> u fffff800`92411bc2-1

nt!ExAllocatePool+0x1:

fffff800`92411bc1 b84e6f6e65      mov     eax,656E6F4Eh

fffff800`92411bc6 e945042600      jmp     nt!ExAllocatePoolWithTag (fffff800`92672010)

fffff800`92411bcb 90              nop

 

The file Pooltag.txt lists the pool tags used for pool allocations by kernel-mode components and drivers supplied with Windows, the associated file or component (if known), and the name of the component. Pooltag.txt is installed with Debugging Tools for Windows (in the triage folder) and with the Windows WDK (in \tools\other\platform\poolmon).  Pooltag.txt shows the following for this tag:

 

None - <unknown>    - call to ExAllocatePool

 

Unfortunately what we find is that this tag is used when a driver calls ExAllocatePool, which does not specify a tag.  This does not allow us to determine what driver allocated the block prior to the corruption.  Even if we could tie the tag back to a driver it may not be sufficient to conclude that the driver using this tag is the one that corrupted the memory.

 

The next step should be to enable special pool and hope to catch the corruptor in the act.  We will discuss special pool in our next article.

Leave a Comment
  • Please add 1 and 8 and type the answer here:
  • Post
  • This is a really nice article, thanks!

  • Thanks, I was just thinking about a presentation about pool corruptions for my colleagues as the number of crashes related to this topic is quite high. Now I can point everyone here :) Enabling special pool in verifier is a bit cumbersome process and not all customers are willing to wait for next crash. So in many cases we go for  possible driver updates as a first step and enable verifier only in case of reoccurrence.

    [In the next article we will cover several different techniques to enable special pool.  There is a command line option for verifier which is less cumbersome and easier to provide to customers without deep technical experience.]

  • Wow!!! really cool one .. waiting for the next article :)

  • Super Article.  thank you so much !

  • Nice article, thanks !

    I can't wait for your next article, which will probably help me a lot with understanding a crashing VM I got (I suspect a malware).

  • nice and well written.

    Is it possible to figure out the culprit driver. in our case myfault.sys. when i ran the tool i get different BSOD on analyzing those them, i am able to figure out  issue is caused due to pool corruption but not able to figure out exact driver.

    [Hi Ram.  We just released part 2 of this article, it should answer your question.  http://blogs.msdn.com/b/ntdebugging/archive/2013/08/22/understanding-pool-corruption-part-2-special-pool-for-buffer-overruns.aspx]

Page 1 of 1 (6 items)