Part 1: Got Stack? No. We ran out of Kernel Mode Stack and kv won’t tell me why!


My name is Ryan Mangipano (ryanman) and I am a Sr. Support Escalation Engineer at Microsoft. This two-part blog is a complete walkthrough of a bugcheck that occurred due to an overflowed kernel-mode stack. What is unique about this situation is that the stack backtrace wasn’t being displayed. As we proceed with the walkthrough of the dump analysis, I will provide demonstrations and background information relating to Task States and Double Faults.

 

I began my review with the command !analyze -v.

 

!analyze -v

UNEXPECTED_KERNEL_MODE_TRAP (7f)

Arg1: 00000008, EXCEPTION_DOUBLE_FAULT

 

You can see from the output above that an unexpected kernel mode trap has taken place. Arg1 in the !analyze -v output indicates that the trap was a double fault. A double fault indicates that an exception occurred during a call to the handler for a prior exception. Although a double fault can have other causes (faulty hardware or a corrupt stack pointer value), we most commonly observe this bugcheck when the drivers executing on the system have exhausted all of the available 12k of kernel-mode stack space.

 

Threads on a 32-bit system are given 12k of kernel-mode stack space. 16k of kernel virtual address space is actually consumed, because an additional 4k of virtual address space is occupied by an invalid PTE. This guard PTE protects the virtual address range just below the kernel stack limit (the stack grows toward lower memory addresses). The 4k guard page is placed in this location to catch stack overflows. The 12k stack size is not configurable because it is hard coded into the kernel. For more information, please refer to “Windows Internals, Fifth Edition, page 786, Kernel Stacks”.

 

If the 12k of kernel stack space is all used up and drivers attempt to use stack space beyond the valid range, a page fault exception will occur as the invalid virtual addresses related to the guard PTE are referenced.   

 

When this page fault exception occurs, the CPU will automatically attempt to push some data onto the stack before transferring control to the page fault handler (thank you to one of our readers for correcting this information). More details on what data is pushed to the stack are available in the Intel Processor Family Developer’s Manual, Vol. 3 Chapter 14 (Protected-Mode Exceptions and Interrupts). However, when the CPU tries to push this data, another fault will occur because the stack pointer still holds an invalid address. This causes a double exception (AKA EXCEPTION_DOUBLE_FAULT).

 

So how can the OS handle this type of situation in order to write out the dump file?  The code associated with TRAP 0x8 (EXCEPTION_DOUBLE_FAULT) will perform a task state segment switch and obtain a new stack pointer which is valid. Task State Segment switching is a CPU provided mechanism that allows us to switch to a new task state and store a link to the previously executed task state. The information that is needed to restore a task is stored in a task-state segment (TSS).  The debugger command .tss can later be used to switch back to the previous task state to examine the context at the time of failure. More information regarding Task-State Segment (TSS) is available in the Intel Processor Manual Set (volume 3, Chapter 6).
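
To make the sequence above a little more concrete, here is a rough Python sketch of it. This is only a toy model and not how the processor or the kernel is implemented: the ToyCpu class, its method names, and the double-fault stack range (0xf773e000/0xf773d000) are all made up for illustration. The four pushes stand in for the eflags, cs, eip and error code values that the processor saves when it delivers a page fault without a privilege change.

```python
class PageFault(Exception):
    """The modeled stack pointer left the valid stack range."""


class ToyCpu:
    """Toy model of one processor with a downward-growing kernel stack."""

    def __init__(self, base, limit, esp):
        self.base, self.limit, self.esp = base, limit, esp

    def push(self):
        new_esp = self.esp - 4
        if not (self.limit <= new_esp < self.base):
            raise PageFault(hex(new_esp))
        self.esp = new_esp

    def deliver_page_fault(self):
        # The processor itself pushes eflags, cs, eip and an error code
        # on the current stack before the handler gets control.
        for _ in range(4):
            self.push()

    def task_switch(self, base, limit):
        # Double-fault path: continue on a separate, known-good stack.
        self.base, self.limit, self.esp = base, limit, base


def touch_unmapped_address(cpu, df_base, df_limit):
    """Model what happens when the running code references an invalid address."""
    try:
        cpu.deliver_page_fault()
        return "page fault handler runs"
    except PageFault:
        # A fault while delivering a fault is a double fault.
        cpu.task_switch(df_base, df_limit)
        return "double fault handler runs on a new stack"


# Hypothetical double-fault stack range, for illustration only.
DF_BASE, DF_LIMIT = 0xF773E000, 0xF773D000

healthy = ToyCpu(0xB8AE9000, 0xB8AE6000, esp=0xB8AE8000)
exhausted = ToyCpu(0xB8AE9000, 0xB8AE6000, esp=0xB8AE6000)
print(touch_unmapped_address(healthy, DF_BASE, DF_LIMIT))
print(touch_unmapped_address(exhausted, DF_BASE, DF_LIMIT))
```

With room left on the stack, the page fault is delivered normally. With the stack pointer already at the limit, the very first push fails and the model falls through to the task switch, which is the situation this dump is in.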

 

 

In addition to the bugcheck data listed above, the output from the command !analyze -v has also provided me with the .tss command that I needed to type into the debugger:

 

TSS:  00000028 -- (.tss 0x28)

 

You can type .tss 0x28 in the command window, but I simply clicked the DML (Debugger Markup Language) hyperlink, which entered the .tss command for me. As discussed above, this command accepts the address of the saved Task State Segment (TSS) information for the current processor. This command will set the appropriate context just like the .trap or .cxr commands.

 

The processor provides a Task Register which holds a 16-bit segment selector. The register is actually larger than that: it also holds a hidden portion in which the processor caches the segment descriptor, but that portion is only visible to the processor itself. Windbg’s r command can be used to dump out the usable portion of this register.

 

3: kd> rtr

tr=00000050
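
As a side note, values such as 0x50 and 0x28 are x86 segment selectors, so they can be split into a descriptor table index, a table indicator bit and a requested privilege level. The small Python sketch below does that split. The function name decode_selector is made up for this illustration; the bit layout (bits 0-1 RPL, bit 2 TI, bits 3-15 index) is the standard selector format from the Intel manuals.

```python
def decode_selector(selector):
    """Split a 16-bit x86 segment selector into its three fields."""
    return {
        "index": selector >> 3,      # entry number in the descriptor table
        "ti": (selector >> 2) & 1,   # 0 = GDT, 1 = LDT
        "rpl": selector & 3,         # requested privilege level
    }


for name, value in (("tr", 0x50), ("tss from !analyze -v", 0x28)):
    print(name, hex(value), decode_selector(value))
```

Both selectors decode to GDT entries with RPL 0: 0x50 is index 10 and 0x28 is index 5.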

 

So the task register was pointing to a different task (.tss 0x50) at the time of the second exception. But where did !analyze -v get this .tss 0x28 value from?

 

Let’s do some digging. You can get the address of the TSS for the current processor by using the !pcr command.

 

3: kd> !pcr

KPCR for Processor 3 at f7737000:

.
.

(omitted several fields for this blog)

 

              TSS: f773a2e0

 

Extensions like !pcr are great, but I also like to understand how the values were obtained. So instead of just getting the value from !pcr, how else can we find it?

The fs register holds the selector for the memory segment in which the _KPCR for the current processor is stored. The structure is stored at the base of that segment, offset 0x0.

 

3: kd> rfs

Last set context:

     fs=00000030       <-- points to the segment that has the nt!_KPCR stored at its base.

 

Let’s see where the _KTSS pointer is stored within the KPCR structure.

 

3: kd> dt nt!*PCR*

          ntkrpamp!_KPCR

 

3: kd> dt ntkrpamp!_KPCR TSS

   +0x040 TSS : Ptr32 _KTSS    <-- 0x40 is the offset at which the pointer to the TSS is stored.

Let’s use those two values to dump this out. The 0030: prefix represents the memory segment. Note that I have added 0x40 to the base and dumped out this location.

 

3: kd> dd 0030:00000040 L1   

0030:00000040  f773a2e0    <-- pointer to the nt!_KTSS

 

3: kd> dt nt!_KTSS f773a2e0 Backlink

 +0x000 Backlink : 0x28    <-- And here is our pointer to the previous task state.

 

This is why !analyze -v has directed us to type in .tss 0x28.

 

But where did !pcr get the address of the KPCR itself? !pcr is listing f7737000. We can find that out also.

 

3: kd> dt ntkrpamp!_KPCR SelfPcr

   +0x01c SelfPcr : Ptr32 _KPCR   <-- so the pointer is stored at offset 0x1c

This command demonstrates the use of fs: instead of 0030: (I then provided the offset of 0x1c to get the pointer).

 

3: kd> dd fs:0x1c L1                 

0030:0000001c  f7737000         <-- there it is, we found it

 

To demonstrate that both addresses reference the same data, let’s  dump it out using the size given below.

 

3: kd> dd f7737000 L0x54         

f7737000  b8ae60dc 00000000 00000000 f7737fe0

f7737010  19d5c42c 00000008 7ff9c000 f7737000

f7737020  f7737120 0000001f 00000000 00000000

f7737030  ffffffff 00000000 f773d800 f773d400

f7737040  f773a2e0 00010001 00000008 00000e56

f7737050  08000300 00000000 00000000 00000000

f7737060  00000000 00000000 00000000 00000000

f7737070  00000000 00000000 00000000 00000000

f7737080  00000000 00000000 00000000 00000000

f7737090  00100000 00000003 09f15190 00000000

f77370a0  09f15190 dabc6620 00000000 334e730f

f77370b0  00000000 00000000 00000000 00000000

f77370c0  00000000 00000000 00000000 00000000

f77370d0  00000000 00000000 00000000 00000000

f77370e0  00000000 00000000 00000000 00000000

f77370f0  00000000 00000000 00000000 00000000

f7737100  00000000 00000000 00000000 00000000

f7737110  00000000 00000000 00000000 00000000

f7737120  00010001 87d68438 00000000 f773a090

f7737130  00000003 00000008 0401010f 00000000

f7737140  00000000 00000000 00000000 00000000

 

3: kd> dd fs:0 L0x54

0030:00000000  b8ae60dc 00000000 00000000 f7737fe0

0030:00000010  19d5c42c 00000008 7ff9c000 f7737000

0030:00000020  f7737120 0000001f 00000000 00000000

0030:00000030  ffffffff 00000000 f773d800 f773d400

0030:00000040  f773a2e0 00010001 00000008 00000e56

0030:00000050  08000300 00000000 00000000 00000000

0030:00000060  00000000 00000000 00000000 00000000

0030:00000070  00000000 00000000 00000000 00000000

0030:00000080  00000000 00000000 00000000 00000000

0030:00000090  00100000 00000003 09f15190 00000000

0030:000000a0  09f15190 dabc6620 00000000 334e730f

0030:000000b0  00000000 00000000 00000000 00000000

0030:000000c0  00000000 00000000 00000000 00000000

0030:000000d0  00000000 00000000 00000000 00000000

0030:000000e0  00000000 00000000 00000000 00000000

0030:000000f0  00000000 00000000 00000000 00000000

0030:00000100  00000000 00000000 00000000 00000000

0030:00000110  00000000 00000000 00000000 00000000

0030:00000120  00010001 87d68438 00000000 f773a090

0030:00000130  00000003 00000008 0401010f 00000000

0030:00000140  00000000 00000000 00000000 00000000
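
If you prefer to check this kind of thing with a script rather than by eye, the two dumps can be compared mechanically. The Python sketch below is only an illustration: parse_dd and dword_at are helper names I made up, and only the first five rows of each dump shown above are pasted in. It parses the dd output, confirms that both views return the same dwords, and then reads the SelfPcr and TSS fields at the offsets that dt reported (0x1c and 0x40).

```python
DUMP_LINEAR = """\
f7737000  b8ae60dc 00000000 00000000 f7737fe0
f7737010  19d5c42c 00000008 7ff9c000 f7737000
f7737020  f7737120 0000001f 00000000 00000000
f7737030  ffffffff 00000000 f773d800 f773d400
f7737040  f773a2e0 00010001 00000008 00000e56
"""

DUMP_FS = """\
0030:00000000  b8ae60dc 00000000 00000000 f7737fe0
0030:00000010  19d5c42c 00000008 7ff9c000 f7737000
0030:00000020  f7737120 0000001f 00000000 00000000
0030:00000030  ffffffff 00000000 f773d800 f773d400
0030:00000040  f773a2e0 00010001 00000008 00000e56
"""


def parse_dd(text):
    """Return the dwords of a 'dd' dump in order, ignoring the address column."""
    dwords = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            dwords.extend(int(field, 16) for field in fields[1:])
    return dwords


def dword_at(dwords, offset):
    """Read the dword stored at a byte offset from the start of the dump."""
    return dwords[offset // 4]


linear = parse_dd(DUMP_LINEAR)
via_fs = parse_dd(DUMP_FS)
print("same data:", linear == via_fs)
print("SelfPcr (+0x1c):", hex(dword_at(linear, 0x1C)))
print("TSS     (+0x40):", hex(dword_at(linear, 0x40)))
```

The values read back are f7737000 for SelfPcr and f773a2e0 for the TSS pointer, matching what !pcr displayed.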

 

 

Now that you have an idea of what a task state is, let’s examine the stack output of the two states. First, we shall use .tss 0x50 to examine the stack backtrace associated with this state, followed by the kC command to dump the stack. Notice that we have used the ; separator to enter multiple commands on one line.

 

3: kd> .tss 0x50;kC

eax=00000000 ebx=00000000 ecx=00000000 edx=00000000 esi=00000000 edi=00000000

eip=8088b702 esp=f773d3c0 ebp=00000000 iopl=0         nv up di pl nz na po nc

cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000000

nt!_KiTrap08:

8088b702 fa              cli

  *** Stack trace for last set context - .thread/.cxr resets it

 

nt!_KiTrap08  

nt!_KiTrap0E       

 

The stack backtrace shows two trap handlers and nothing else. According to this stack output, we were first attempting to handle a Trap 0x0E, which is a page fault. The page fault handler was invoked in an attempt to handle the invalid address that we accessed in the guard page when we overflowed the stack. You can see that after the page fault, we encountered another exception, represented by KiTrap08. This is the EXCEPTION_DOUBLE_FAULT, indicating that an exception was encountered while handling the page fault. This matches what is listed as Arg1 in the bugcheck data that !analyze -v has output. So, the stack backtrace for .tss 0x50 shows that we were first executing the task referenced by 0x28 for Trap0E/Page Fault when a task state switch occurred and we switched to .tss 0x50 to handle the Trap08/DoubleFault.

 

Next, we will use the command .tss 0x28 and dump the stack backtrace associated with that task state.

 

3: kd> .tss 0x28;kC

eax=b8ae0023 ebx=b8ae60ec ecx=87d68438 edx=87758bd8 esi=b8ae6068 edi=808813d8

eip=8088c718 esp=b8ae5fe4 ebp=b8ae5fe4 iopl=0         nv up di pl zr na pe nc

cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010046

nt!_KiTrap0E+0x5c:

8088c718 89550c          mov     dword ptr [ebp+0Ch],edx ss:0010:b8ae5ff0=????????

  *** Stack trace for last set context - .thread/.cxr resets it

 nt!_KiTrap0E       

 

You can see in the output above that the stack backtrace has only displayed KiTrap0E. We should see multiple stack frames listed. I’m a bit concerned about the fact that I do not see a valid stack backtrace listed in the output above. Nevertheless, let's proceed with our examination.

 

 

Now that we have set the proper task state using .tss 0x28 which loaded the registers with the appropriate context, our next step will be to determine where the stack related registers are pointing and how they relate to the 12k range of valid stack addresses for the current thread. This will help us to validate that we did in fact overflow the stack. The easiest way to examine the stack range that this thread was given is to use !thread

 

!thread   

Owning Process            874c6800       Image:         StackHog.exe

...

Base b8ae9000 Limit b8ae6000

 

 

3: kd> resp;rebp

Last set context:

esp=b8ae5fe4   <-- Notice that this is outside of the Base and Limit range listed above.

Last set context:

ebp=b8ae5fe4

 

Since the stack grows toward lower addresses, overflowing the b8ae6000 limit results in a stack pointer that is below the address of the limit. You can see that esp has fallen out of the valid range of stack space.
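
The same check can be written down as simple arithmetic. The following Python sketch uses the Base, Limit and esp values from the output above; the function name classify_esp is made up for illustration, and the guard page is assumed to be the single 4k page immediately below the limit, as described earlier.

```python
PAGE_SIZE = 0x1000  # 4k


def classify_esp(base, limit, esp):
    """Describe where esp sits relative to a downward-growing kernel stack."""
    return {
        "stack_kb": (base - limit) // 1024,
        "in_stack": limit <= esp < base,
        "in_guard_page": limit - PAGE_SIZE <= esp < limit,
        "bytes_below_limit": max(0, limit - esp),
    }


# Values taken from the !thread and r output above.
print(classify_esp(base=0xB8AE9000, limit=0xB8AE6000, esp=0xB8AE5FE4))
```

Base minus Limit is 0x3000, which is the 12k stack, and esp sits 0x1c (28) bytes below the limit, inside the guard page.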

 

3: kd> dd b8ae5fe4 L1

b8ae5fe4  ????????

 

Let’s look at this memory range in more detail. The invalid ranges displayed by ????? represent the guard page. The range of valid stack addresses starts (or ends depending on how you look at it) at b8ae6000.

 

3: kd> dd b8ae5fe0 L10

b8ae5fe0  ???????? ???????? ???????? ????????

b8ae5ff0  ???????? ???????? ???????? ????????

b8ae6000  00000000 00000000 00000000 00000000

b8ae6010  00000000 b8ae0000 b8ae0023 00000023

 

Also, note that we are running in trap handler 0x0E.  This is the page fault handler on x86 (refer to your Intel Processor Manuals for more details).

 

3: kd> u . L1

nt!_KiTrap0E+0x5c

8088c718 89550c          mov     dword ptr [ebp+0Ch],edx

 

The address we were attempting to access may be in cr2. Let’s dump it out.

 

3: kd> rcr2

Last set context:

cr2=b8ae5fe0     <-- This address is just beyond the stack limit for this thread

 

What is the present instruction in the trap handler doing?

 

3: kd> u . L1

nt!_KiTrap0E+0x5c

8088c718 89550c          mov     dword ptr [ebp+0Ch],edx

 

Ok, so we’re dereferencing ebp plus an offset of 0x0C. What does that add up to be?

 

3: kd> ? ebp+0x0c

Evaluate expression: -1196531728 = b8ae5ff0

 

3: kd> dd b8ae5ff0 L1

b8ae5ff0  ????????
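
For anyone wondering why the ? command printed a negative number, the debugger displays the result as a signed value alongside the hex. The short Python sketch below reproduces the arithmetic using the register values shown above; to_signed32 is just a helper name I chose for this example. It also shows that the cr2 value seen earlier is exactly esp minus 4.

```python
def to_signed32(value):
    """Interpret a 32-bit value as a signed (two's complement) integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


ebp = 0xB8AE5FE4
esp = 0xB8AE5FE4
cr2 = 0xB8AE5FE0

target = (ebp + 0x0C) & 0xFFFFFFFF
print(hex(target), to_signed32(target))   # 0xb8ae5ff0 -1196531728
print(hex(esp - 4), esp - 4 == cr2)       # 0xb8ae5fe0 True
```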

 

Once the stack overflowed, we can see there were many access attempts to addresses which are not in the valid stack range. This led us to the 7f bugcheck with the double fault parameter.

 

When the system bugchecks because the entire 12k range of a thread’s kernel-mode stack space has been filled up, there can be a few causes. Drivers on the stack may have made very large allocations on the stack instead of using other methods of obtaining memory such as calling ExAllocatePoolWithTag(). Sometimes this is done because it is quicker to use the stack than to allocate and free memory from the operating system pools. Other times a driver will have made calls in a manner that causes too many other calls to be made, filling up the stack. Functions that call themselves recursively (directly or through other functions) without ever reaching an exit condition will also exhaust the stack. Often a system will have software from many different vendors that all install heavy stack-consuming drivers into the I/O path. Each driver uses a portion of the stack space, and with so many drivers installed those portions add up. For example, if a system has too many file system filter drivers installed in the file system stack and they use more than the minimum amount of stack space possible, it’s not uncommon for all of them put together to cause a stack overflow.
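
To illustrate how individually modest consumers add up, here is a small Python sketch. The driver names and byte counts are entirely hypothetical and are not taken from this dump; first_overflow is a made-up helper that walks the call chain from the outermost caller inward and reports the frame at which the running total first exceeds the 12k budget.

```python
KERNEL_STACK_BYTES = 12 * 1024


def first_overflow(frames, budget=KERNEL_STACK_BYTES):
    """Return (name, bytes_used) for the frame that first exceeds the budget, or None."""
    used = 0
    for name, size in frames:
        used += size
        if used > budget:
            return name, used
    return None


# Hypothetical call chain: no single component uses more than 3k.
frames = [
    ("io_manager", 1024),
    ("filter_a", 3072),
    ("filter_b", 3072),
    ("filter_c", 3072),
    ("file_system", 2560),
]
print(first_overflow(frames))        # ('file_system', 12800)
print(first_overflow(frames[:-1]))   # None
```

In this made-up chain the component that happens to be running when the limit is crossed is not the largest consumer, which is why the whole stack has to be reviewed rather than just the top frame.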

 

Sometimes when dealing with this error, we need to realize that there may not be any one product to blame. A stop 7f sometimes isn’t about identifying the faulting component as it often is in other areas of troubleshooting. It is more about understanding that stack space isn’t an unlimited resource and developing a clear picture of what led up to the stack space filling up. Sometimes this will result in the need to engage multiple vendors for assistance when there is a combination of drivers on the stack that are all using a large amount of stack space. Sometimes vendors will provide newer drivers that have been optimized to use less stack space. Other times, we simply have too much I/O related software installed and the only answer is to remove some of the drivers by uninstalling the product.

 

NTFS and some 3rd party file system filter drivers employ a technique to avoid a stack overflow. They probe the stack by calling IoGetRemainingStackSize() and, if there is not enough stack space left, they offload the remainder of the work to a dedicated kernel thread that they created just for that purpose. On Vista or later (or 2003 x64), developers can call KeExpandKernelStackAndCallout, which will allow chaining to another 16k stack. For more information, see http://msdn.microsoft.com/en-us/library/aa906738.aspx.
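
The probe-and-offload idea can be sketched in user-mode Python. This is only a simulation of the pattern and not driver code: the stack budget is a plain number passed along the call chain, FRAME_BYTES and MIN_REMAINING are hypothetical values, and a Python thread stands in for the dedicated kernel thread. A real driver would call IoGetRemainingStackSize() instead of tracking the budget itself.

```python
import threading

STACK_BYTES = 12 * 1024     # simulated budget of each stack
FRAME_BYTES = 1024          # hypothetical cost of one nested call
MIN_REMAINING = 2 * 1024    # hypothetical threshold for offloading


def descend(levels, remaining=STACK_BYTES, hops=0):
    """Make `levels` nested calls; return how many times work was offloaded."""
    if levels == 0:
        return hops
    if remaining - FRAME_BYTES < MIN_REMAINING:
        # Not enough simulated stack left: finish the work on a fresh
        # thread (with a fresh budget) and wait for its result.
        result = []
        worker = threading.Thread(
            target=lambda: result.append(descend(levels, STACK_BYTES, hops + 1))
        )
        worker.start()
        worker.join()
        return result[0]
    return descend(levels - 1, remaining - FRAME_BYTES, hops)


print(descend(5))    # 0
print(descend(25))   # 2
```

With these numbers each simulated stack accommodates ten nested calls before the remaining space drops to the threshold, so 25 levels of work are spread across three stacks.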

 

The easiest way to figure out why we have overflowed the stack is to dump it out and examine the stack backtrace. Therefore, this is typically the first and sometimes the only step necessary to perform when reviewing an EXCEPTION_DOUBLE_FAULT memory dump.  We will now proceed to dump out the stack and examine the stack usage of the different drivers and the calls that they made to determine if further investigation is needed. So, let’s do that now.  I will use the L200 option; otherwise the debugger will only display the default number of frames which won’t display the entire stack. It doesn’t make much sense to review only the top of the stack since the entire stack is full.  I dumped the stack and only got one stack frame listed.

 

3: kd> kfL200

  Memory  ChildEBP RetAddr 

          b8ae5fe4 00000000 nt!_KiTrap0E+0x5c

 

This is not what I was hoping to see. We don’t have a stack. Let me try using kv to see if there is a trap frame.

 

3: kd> kvL200

ChildEBP RetAddr  Args to Child             

b8ae5fe4 00000000 00000000 00000000 00000000 nt!_KiTrap0E+0x5c (FPO: [0,0] TrapFrame-EDITED @ 00000000)

 

So, I don’t see a valid trap frame either. I went back to my !analyze -v output and verified that it had also displayed only this one frame. Without the stack, how will we see what filled it up so that we can provide analysis and recommendations to the customer? In part two of this blog, we will review how to manually reconstruct the stack and pass values into the kf command in order to get a useful stack backtrace to display.

 

 

  • Thanks for the post.

    >If [..] drivers attempt to use stack space beyond the valid range,

    >a page fault exception will occur [..].

    >[..]

    >when the exception handler [KiTrap0E] tries to push this trap frame

    >onto the stack, another fault will occur due to the stack pointer still

    >providing an invalid address. This causes a double exception.

    You probably just assumed that, but we believe this should be clarified: the double fault happens not when the _exception handler_ tries to save something in the invalid page per se, but when the _processor_ tries to save the initial portion of the trap frame in the invalid page. Something like this:

    "When stack space is all used up and there's an attempt to use stack space beyond the valid range, a page fault exception shall occur. The processor will try to save cs, rip, etc. on the kernel stack, but it won't be able to, since rsp points to an invalid page".

    This sequence, btw, can be emphasized by the following observation about the information presented here: when the initial stack fault in KiTrap0E occurred, esp and ebp had the value b8ae5fe4, and the invalid address accessed was @ebp+0C == b8ae5ff0. But later on, we can see b8ae5fe0 in cr2 (not b8ae5ff0), and indeed, the last access was made to @esp-4 by the processor's "internal routine" when it tried to save execution state info on the stack (eflags, afawr).
