Corrupt Page Table Pages Caught in the MDL

Hello all, Scott Olson here again to share another interesting issue I worked on a while back. The issue was that after upgrading to Windows XP Service Pack 2, the system would experience random bug checks caused by memory corruption. Interestingly, there was a very specific pattern to the corruption: it looked like a page frame number (PFN) and its flags had been written into the process's page table pages in several places. The memory manager would never do this type of thing, so I suspected that a driver was editing user page table pages, which should never be done.

Let's take a look at the stack:

kd> kb
  *** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr  Args to Child
f15b1308 80523096 c00862d8 10c5b000 00000000 nt!MiDeletePte+0x198
f15b13d0 80519776 000001d8 10d20fff 00000000 nt!MiDeleteVirtualAddresses+0x164
f15b13ec 805b1d74 10c20000 10d20fff f15b14a4 nt!MiDeleteFreeVm+0x20
f15b148c 8054060c ffffffff 049c6aa8 049c6ab0 nt!NtFreeVirtualMemory+0x42e
f15b148c 7c90eb94 ffffffff 049c6aa8 049c6ab0 nt!KiFastCallEntry+0xfc
03e4a398 7c90da54 7c8209b3 ffffffff 049c6aa8 ntdll!KiFastSystemCallRet
03e4a39c 7c8209b3 ffffffff 049c6aa8 049c6ab0 ntdll!NtFreeVirtualMemory+0xc

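Here is the page table entry for the virtual address passed to MiDeletePte (10c5b000, the second argument in the stack above; the first argument, c00862d8, is the address of its PTE):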

kd> !pte 10c5b000
               VA 10c5b000
PDE at 00000000C0600430    PTE at 00000000C00862D8
contains 000000003FC6F867  contains 0000000015E0086F
pfn 3fc6f ---DA--UWEV    pfn 15e00 ---DA-TUWEV
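
The addresses and flags in this !pte output can be reproduced by hand. The sketch below is a minimal Python model, assuming the 32-bit x86 PAE layout that is in use when DEP is on: page tables self-mapped at 0xC0000000 with 8-byte entries, page directories self-mapped at 0xC0600000, and the no-execute bit at bit 63. The flag letters follow the order !pte prints them (C G L D A N T U W E V); the bit assigned to each letter, in particular copy-on-write as software bit 9, is an assumption of this sketch rather than something taken from the debugger.

```python
# Minimal model of x86 PAE self-map arithmetic and !pte-style flag decoding.
# Assumed layout: 8-byte PTEs mapped at 0xC0000000, 8-byte PDEs at 0xC0600000.

PTE_BASE = 0xC0000000
PAE_PDE_BASE = 0xC0600000
PFN_MASK = 0x000FFFFFFFFFF000   # PAE physical address bits 12..51
NX_BIT = 1 << 63                # PAE no-execute bit

def pae_pte_address(va: int) -> int:
    """Address of the 8-byte PTE that maps va (one PTE per 4 KB page)."""
    return PTE_BASE + (va >> 12) * 8

def pae_pde_address(va: int) -> int:
    """Address of the 8-byte PDE that maps va (one PDE per 2 MB region)."""
    return PAE_PDE_BASE + (va >> 21) * 8

def nonpae_pte_address(va: int) -> int:
    """For contrast: a non-PAE kernel uses 4-byte PTEs at the same base."""
    return PTE_BASE + (va >> 12) * 4

def pfn(pte: int) -> int:
    """Page frame number stored in a PAE PTE."""
    return (pte & PFN_MASK) >> 12

# (letter, bit) pairs in the order !pte prints them; bit 9 as copy-on-write
# is an assumption about the Windows software bits.
_FLAGS = [("C", 9), ("G", 8), ("L", 7), ("D", 6), ("A", 5), ("N", 4), ("T", 3)]

def flag_string(pte: int) -> str:
    """Approximate the !pte flag string, e.g. '---DA-TUWEV'."""
    s = "".join(letter if pte & (1 << bit) else "-" for letter, bit in _FLAGS)
    s += "U" if pte & (1 << 2) else "K"   # owner: user or kernel
    s += "W" if pte & (1 << 1) else "R"   # writable or read-only
    s += "-" if pte & NX_BIT else "E"     # executable unless NX is set
    s += "V" if pte & 1 else "-"          # valid
    return s

if __name__ == "__main__":
    va = 0x10C5B000
    print(f"PDE at {pae_pde_address(va):08x}  PTE at {pae_pte_address(va):08x}")
    for value in (0x3FC6F867, 0x15E0086F):
        print(f"pfn {pfn(value):x} {flag_string(value)}")
```

For 10c5b000 this yields the same PDE and PTE addresses, PFNs, and flag strings that !pte printed; the non-PAE helper shows where the PTE would live if entries were only 4 bytes wide.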

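This shows that the value 15e0086f has been written into a page table page where it does not belong. This bad value corresponds to a write-through mapping (note the T flag) of a range allocated via a call to MmAllocatePagesForMdl. Dumping the page table page shows the same value in several other slots: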

c00862d0  00000000 00000000 15e0086f 00000000
c00862e0  00000000 00000000 00000000 00000000
c00862f0  00000000 00000000 00000000 00000000
c0086300  00000000 00000000 00000000 00000000
c0086310  00000000 00000000 00000000 00000000
c0086320  00000000 00000000 00000000 00000000
c0086330  00000000 00000000 00000000 00000000
c0086340  00000000 00000000 00000000 00000000
c0086350  00000000 00000000 15e0086f 00000000
c0086360  00000000 00000000 00000000 00000000
c0086370  00000000 00000000 00000000 00000000
c0086380  00000000 00000000 00000000 00000000
c0086390  00000000 00000000 00000000 00000000
c00863a0  00000000 00000000 00000000 00000000
c00863b0  00000000 00000000 00000000 00000000
c00863c0  00000000 00000000 00000000 00000000
c00863d0  00000000 00000000 00000000 00000000
c00863e0  00000000 00000000 00000000 00000000
c00863f0  00000000 00000000 00000000 00000000
c0086400  00000000 00000000 00000000 00000000
c0086410  00000000 00000000 00000000 00000000
c0086420  00000000 00000000 00000000 00000000
c0086430  00000000 00000000 00000000 00000000
c0086440  00000000 00000000 00000000 00000000
c0086450  00000000 00000000 00000000 00000000
c0086460  15e0086f 00000000 00000000 00000000
c0086470  00000000 00000000 00000000 00000000
c0086480  00000000 00000000 00000000 00000000
c0086490  00000000 00000000 00000000 00000000
c00864a0  00000000 00000000 00000000 00000000
c00864b0  00000000 00000000 00000000 00000000
c00864c0  00000000 00000000 00000000 00000000
c00864d0  00000000 00000000 00000000 00000000
c00864e0  15e0086f 00000000 00000000 00000000
c00864f0  00000000 00000000 00000000 00000000
c0086500  00000000 00000000 00000000 00000000
c0086510  00000000 00000000 00000000 00000000
c0086520  00000000 00000000 00000000 00000000
c0086530  00000000 00000000 00000000 00000000
c0086540  00000000 00000000 00000000 00000000
c0086550  00000000 00000000 00000000 00000000
c0086560  15e0086f 00000000 00000000 00000000
c0086570  00000000 00000000 00000000 00000000
c0086580  00000000 00000000 00000000 00000000
c0086590  00000000 00000000 00000000 00000000
c00865a0  00000000 00000000 00000000 00000000
c00865b0  00000000 00000000 00000000 00000000
c00865c0  00000000 00000000 00000000 00000000
c00865d0  00000000 00000000 00000000 00000000
c00865e0  00000000 00000000 00000000 00000000
c00865f0  00000000 00000000 00000000 00000000
c0086600  00000000 00000000 00000000 00000000
c0086610  00000000 00000000 00000000 00000000
c0086620  00000000 00000000 00000000 00000000
c0086630  00000000 00000000 00000000 00000000
c0086640  00000000 00000000 00000000 00000000
c0086650  00000000 00000000 00000000 00000000
c0086660  00000000 00000000 15e0086f 00000000
c0086670  00000000 00000000 00000000 00000000
c0086680  00000000 00000000 00000000 00000000
c0086690  00000000 00000000 00000000 00000000
c00866a0  00000000 00000000 00000000 00000000
c00866b0  00000000 00000000 00000000 00000000
c00866c0  00000000 00000000 00000000 00000000
c00866d0  00000000 00000000 00000000 00000000
c00866e0  00000000 00000000 15e0086f 00000000
c00866f0  00000000 00000000 00000000 00000000
c0086700  00000000 00000000 00000000 00000000
c0086710  00000000 00000000 00000000 00000000
c0086720  00000000 00000000 00000000 00000000
c0086730  00000000 00000000 00000000 00000000
c0086740  00000000 00000000 00000000 00000000
c0086750  00000000 00000000 00000000 00000000
c0086760  00000000 00000000 15e0086f 00000000
c0086770  00000000 00000000 00000000 00000000
c0086780  00000000 00000000 00000000 00000000
c0086790  00000000 00000000 00000000 00000000
c00867a0  00000000 00000000 00000000 00000000
c00867b0  00000000 00000000 00000000 00000000
c00867c0  00000000 00000000 00000000 00000000
c00867d0  00000000 00000000 00000000 00000000
c00867e0  00000000 00000000 00000000 00000000
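
Each corrupt slot can be translated back into the user address it describes by inverting the PAE self-map arithmetic. The sketch below assumes 8-byte PTEs mapped at 0xC0000000; the eight PTE addresses are read off the dump above by hand, and the freed-range bounds are the first two arguments shown for MiDeleteFreeVm in the stack (10c20000 and 10d20fff), which appear to bound the region NtFreeVirtualMemory was releasing.

```python
# Sketch: find a repeated value in dd-style rows and map each hit back to the
# 4 KB virtual page it describes. Assumes 8-byte PAE PTEs mapped at 0xC0000000.

PTE_BASE = 0xC0000000
PTE_SIZE = 8

# PTE addresses holding 15e0086f, read off the dump above by hand.
CORRUPT_PTES = [0xC00862D8, 0xC0086358, 0xC0086460, 0xC00864E0,
                0xC0086560, 0xC0086668, 0xC00866E8, 0xC0086768]

# Region bounds taken from the MiDeleteFreeVm arguments in the stack.
FREED_START, FREED_END = 0x10C20000, 0x10D20FFF

def find_dword(rows, value):
    """rows: (row_address, [dw0, dw1, dw2, dw3]) tuples as printed by dd."""
    hits = []
    for row_address, dwords in rows:
        for i, dword in enumerate(dwords):
            if dword == value:
                hits.append(row_address + 4 * i)
    return hits

def va_mapped_by_pte(pte_address: int) -> int:
    """Inverse of the PAE self-map: the virtual page this PTE slot maps."""
    offset = pte_address - PTE_BASE
    if offset < 0 or offset % PTE_SIZE:
        raise ValueError("not the start of an 8-byte PTE in the PTE window")
    return (offset // PTE_SIZE) << 12

if __name__ == "__main__":
    for pte_address in CORRUPT_PTES:
        va = va_mapped_by_pte(pte_address)
        inside = FREED_START <= va <= FREED_END
        print(f"PTE {pte_address:08x} -> VA {va:08x} (inside freed range: {inside})")
```

Every hit lands on an 8-byte boundary, so each one overwrote the low half of a whole PTE, and all of the resulting virtual addresses fall inside the region being freed.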

kd> !pfn 15e00
    PFN 00015E00 at address 81BCA800
    flink       00000000  blink / share count 00000001  pteaddress 000AF001
    reference count 0002   Cached     color 0
    restore pte 00000080  containing page        FFEDCB  Active       RW
        ReadInProgress WriteInProgress

The reference count of 2 indicates that the driver also has an outstanding MmProbeAndLockPages call on this page. Thinking that this PFN value was incorrect, I searched kernel memory for it to see what I could find:

kd> s -d 80000000 l?7fffffff 00015e00
8022d534  00015e00 0001f190 00041d50 0001f140  .^......P...@...
86cacbf4  00015e00 0000cd1c 0000cc27 0000cc08  .^......'.......
86e25cdc  00015e00 0a130005 e56c6946 00000000  .^......Fil.....

The search found a few entries, but the middle one looked like it could be an MDL allocation, so I verified this:

kd> !pool 86cacbf4 2
Pool page 86cacbf4 region is Nonpaged pool
*86cacbd0 size:   80 previous size:   28  (Allocated) *Mdl
                Pooltag Mdl  : Io, Mdls

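Yes, this is an MDL. The pool block starts at 86cacbd0, and skipping the 8-byte pool header puts the MDL itself at 86cacbd8. Let's inspect it: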

kd> dt nt!_MDL 86cacbd8
   +0x000 Next             : (null)
   +0x004 Size             : 32
   +0x006 MdlFlags         : 138
   +0x008 Process          : (null)
   +0x00c MappedSystemVa   : 0x00004000
   +0x010 StartVa          : 0xf7baa000
   +0x014 ByteCount        : 0xfff
   +0x018 ByteOffset       : 0

Notice that page 15e00 is the first entry in the MDL's page list, which begins right after the 0x1c-byte MDL header:

kd> dd 86cacbd8+1c
86cacbf4  00015e00 0000cd1c 0000cc27 0000cc08
86cacc04  0000cc09 0000cc0a 0000cc0b 0000cbec
86cacc14  0000cbed 0000cbee 0000cbef 0000cbd0
86cacc24  0000cbd1 0000cbd2 0000cbd3 0000cbd4
86cacc34  0000cbd5 0000cbd6 00000000 00000000
86cacc44  00000000 00000000 00000000 00010010
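
The +1c in that dd command follows from the x86 MDL layout: the eight header fields shown by dt occupy 0x1c bytes, and the array of page frame numbers starts immediately after them. The sketch below decodes an MDL from raw bytes. It assumes dt printed Size and MdlFlags in decimal, which is consistent with Size = 0x1c + 4 describing exactly one page and with ByteCount 0xfff. Under that assumption MdlFlags 138 is 0x8a, which includes MDL_PAGES_LOCKED; the flag table is a subset of the values in the WDK headers.

```python
import struct

# x86 MDL header as shown by dt: Next, Size, MdlFlags, Process, MappedSystemVa,
# StartVa, ByteCount, ByteOffset (4+2+2+4+4+4+4+4 = 0x1c bytes). The array of
# 4-byte page frame numbers follows immediately.
MDL_HEADER = struct.Struct("<IHHIIIII")
PAGE_SIZE = 0x1000

# Subset of MDL flag values from the WDK headers.
MDL_FLAG_NAMES = {
    0x0001: "MDL_MAPPED_TO_SYSTEM_VA",
    0x0002: "MDL_PAGES_LOCKED",
    0x0004: "MDL_SOURCE_IS_NONPAGED_POOL",
    0x0008: "MDL_ALLOCATED_FIXED_SIZE",
    0x0010: "MDL_PARTIAL",
    0x0020: "MDL_PARTIAL_HAS_BEEN_MAPPED",
    0x0040: "MDL_IO_PAGE_READ",
    0x0080: "MDL_WRITE_OPERATION",
}

def decode_flags(flags):
    """Names of the known MDL flags set in flags, lowest bit first."""
    return [name for bit, name in sorted(MDL_FLAG_NAMES.items()) if flags & bit]

def parse_mdl(raw):
    """Decode an x86 MDL from raw bytes that start at the MDL (not the pool header)."""
    (_next, size, flags, _process, _mapped_va,
     start_va, byte_count, byte_offset) = MDL_HEADER.unpack_from(raw, 0)
    # Size counts the header plus one PFN entry per described page.
    page_count = (size - MDL_HEADER.size) // 4
    pfns = list(struct.unpack_from(f"<{page_count}I", raw, MDL_HEADER.size))
    # Pages the buffer spans, computed like ADDRESS_AND_SIZE_TO_SPAN_PAGES.
    first_byte = (start_va + byte_offset) % PAGE_SIZE
    spanned = (first_byte + byte_count + PAGE_SIZE - 1) // PAGE_SIZE
    return {"flags": decode_flags(flags), "page_count": page_count,
            "spanned_pages": spanned, "pfns": pfns}

if __name__ == "__main__":
    # Header values from the dt output (Size and MdlFlags read as decimal),
    # followed by the first entry of the page list from dd.
    raw = MDL_HEADER.pack(0, 32, 138, 0, 0x4000, 0xF7BAA000, 0xFFF, 0)
    raw += struct.pack("<I", 0x15E00)
    print(parse_mdl(raw))
```

Reading Size and ByteCount both ways and checking that they agree is a cheap sanity test that the fields were interpreted in the right radix.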

Next, I wanted to see whether a driver held references to this MDL, so I searched for its address and found two:

kd> s -d 80000000 l?7fffffff 86cacbd8
86f9c6a0  86cacbd8 0000003d 00000000 0000636a  ....=.......jc..
86fc7e68  86cacbd8 00000001 00000001 00000000  ................

Now let's see who owns these:

kd> !pool 86f9c6a0 2
Pool page 86f9c6a0 region is Nonpaged pool
*86f9c618 size:   d8 previous size:   30  (Allocated) *Crpt
                Pooltag Crpt   : Memory corruption driver

kd> !pool 86fc7e68 2
Pool page 86fc7e68 region is Nonpaged pool
*86fc7e00 size:   98 previous size:   40  (Allocated) *Crpt
                Pooltag Crpt   : Memory corruption driver
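
The two steps used here, searching kernel memory for a value and then asking !pool which allocation contains each hit, can be modeled in a few lines. This is a simplified sketch: it scans a hypothetical byte snapshot for 4-byte-aligned little-endian matches, and it attributes a hit using the block start and size that !pool printed, whereas the real extension walks the pool page's headers itself.

```python
import struct

def search_dwords(memory: bytes, base: int, value: int) -> list:
    """Addresses of 4-byte-aligned little-endian DWORDs in memory equal to value."""
    needle = struct.pack("<I", value)
    return [base + off for off in range(0, len(memory) - 3, 4)
            if memory[off:off + 4] == needle]

def owning_block(address: int, blocks):
    """blocks: (start, size, tag) tuples as printed by !pool (start includes the pool header).
    Returns (start, offset_into_block, tag), or None if no block contains address."""
    for start, size, tag in blocks:
        if start <= address < start + size:
            return start, address - start, tag
    return None

# Allocations reported by !pool above.
BLOCKS = [
    (0x86CACBD0, 0x80, "Mdl"),
    (0x86F9C618, 0xD8, "Crpt"),
    (0x86FC7E00, 0x98, "Crpt"),
]

if __name__ == "__main__":
    # Hypothetical 16-byte snapshot at 0x1000 with the MDL address at offset 8.
    snapshot = bytes(8) + struct.pack("<I", 0x86CACBD8) + bytes(4)
    print([hex(a) for a in search_dwords(snapshot, 0x1000, 0x86CACBD8)])
    for hit in (0x86CACBF4, 0x86F9C6A0, 0x86FC7E68):
        start, offset, tag = owning_block(hit, BLOCKS)
        print(f"{hit:08x} is {tag} block {start:08x} + {offset:#x}")
```

With the blocks reported above, the MDL pointer sits 0x88 and 0x68 bytes into the two Crpt blocks (counting from the start of each pool header).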

This makes a pretty convincing case that this driver is at fault. So now you may ask, "Why did this problem only start after applying Service Pack 2?" By default, when you install Service Pack 2, Data Execution Prevention (DEP) is enabled on systems that support it. DEP support is in the PAE kernel, which uses extra bits to describe the page table entries (PAE entries are 64 bits wide, and the no-execute bit is the top bit). The driver was using the memory mappings incorrectly: it ignored the extra bits in the page number and caused the memory corruption by writing to the wrong page.

In this crash, the solution was to disable DEP until the driver could be corrected. For more information on the default DEP settings and on enabling or disabling DEP in Windows, see the following article:

899298  The "Understanding Data Execution Prevention" help topic incorrectly states the default setting for DEP in Windows Server 2003 Service Pack 1
http://support.microsoft.com/default.aspx?scid=kb;EN-US;899298
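
The root cause described above, a driver ignoring the extra bits that PAE adds to a page table entry, can be illustrated with a small sketch. It is hypothetical: the PTE value below is made up, and the sketch does not reproduce the driver's actual code, which is not shown here. It only shows that a 64-bit PAE entry read as a 32-bit value loses both the no-execute bit and any page frame number bits above bit 31, so the truncated value names a different physical page.

```python
# Hypothetical illustration: what is lost when a 64-bit PAE PTE is treated
# as a 32-bit value. The PTE value used below is made up for the example.

PFN_MASK_PAE = 0x000FFFFFFFFFF000   # PAE: physical address bits 12..51
PFN_MASK_32 = 0xFFFFF000            # 32-bit (non-PAE) PTE: bits 12..31
NX_BIT = 1 << 63                    # PAE only: no-execute

def decode_pae(pte: int):
    """Return (pfn, no_execute) for a full 64-bit PAE entry."""
    return (pte & PFN_MASK_PAE) >> 12, bool(pte & NX_BIT)

def decode_truncated(pte: int):
    """Decode the same entry after keeping only its low 32 bits."""
    low = pte & 0xFFFFFFFF
    return (low & PFN_MASK_32) >> 12, False   # the NX bit is gone

if __name__ == "__main__":
    pte = 0x8000000123456867   # hypothetical: NX set, PFN above 4 GB
    full_pfn, full_nx = decode_pae(pte)
    trunc_pfn, trunc_nx = decode_truncated(pte)
    print(f"PAE view:       pfn={full_pfn:#x} nx={full_nx}")
    print(f"truncated view: pfn={trunc_pfn:#x} nx={trunc_nx}")
```

In the full view the entry names physical page 0x123456 and forbids execution; in the truncated view it names page 0x23456 and looks executable, so any access made through the truncated value would touch the wrong page.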
