1. Introduction
A stack fault occurs when a thread’s stack is almost used up. The Windows CE kernel then reports the problem with a message like this:
9381584 PID:47c40bda TID:24ccc7ce [Stack fault]: Thread=85fa3a40 Proc=8060e640 'device.exe'
9381584 PID:47c40bda TID:24ccc7ce AKY=00020013 PC=01a23be0(dummy.dll+0x00003be0) RA=03ed33a0(devmgr.dll+0x000033a0) BVA=24081d24 FSR=00000007
The fields in this message are the key to understanding the failure, whether it is a data abort, stack fault, page fault, etc.
- AKY "Access Key": Process slot bitmask corresponding to the processes the excepting thread has access to. Platform Builder can show the Access Key of each process in the “Processes” window.
- PC "Program Counter": The address of the instruction being executed when the exception occurred. On ARM platforms this is the current value of the PC register; on x86 platforms it is EIP (Instruction Pointer). If symbols are available, the exception handler will attempt to provide the offset into the DLL that caused the exception.
- RA "Return Address": The address of the instruction in the function that called the current function. Had the current function NOT caused an exception, this is where it would have returned to. To find the function an address falls into for a DLL on Windows CE, take the last two bytes of the address, add the preferred load address of the module (0x10000000), and search for the result in the Rva+Base column of the corresponding map file. For an EXE, add the preferred load address 0x00010000 instead. The same calculation applies to BVA below.
- BVA "Base Virtual Address": The contents of BVA depend on the type of exception. If the exception is a Prefetch Abort, the value is the same as the PC register (the execution point). If the exception is a Data Abort, the value identifies what caused the exception: it is the Virtual Memory base of the module plus the value that caused the exception.
- FSR "Fault Status Register": The FSR represents several flags that will help you understand the nature of your exception. For ARM devices the following flags can be set:
#define FSR_ALIGNMENT 0x01
#define FSR_PAGE_ERROR 0x02
#define FSR_TRANSLATION 0x05
#define FSR_DOMAIN_ERROR 0x09
#define FSR_PERMISSION 0x0D
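The map-file lookup described for RA and the FSR values listed above can be sketched in C. This is only an illustration: the helper names map_lookup_address() and fsr_fault_class() are hypothetical, and the 0x0D mask is an assumption based on the classic ARM fault-status encoding, in which bit 1 only distinguishes a page fault from a section fault. The kernel's own decoding may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Preferred load addresses quoted in the text above. */
#define DLL_PREFERRED_BASE 0x10000000u
#define EXE_PREFERRED_BASE 0x00010000u

/* FSR values listed in the text above. */
#define FSR_ALIGNMENT    0x01u
#define FSR_PAGE_ERROR   0x02u
#define FSR_TRANSLATION  0x05u
#define FSR_DOMAIN_ERROR 0x09u
#define FSR_PERMISSION   0x0Du

/* Hypothetical helper: value to search for in the Rva+Base column.
   Takes the last two bytes of the runtime address and adds the
   preferred load address of the module. */
static uint32_t map_lookup_address(uint32_t runtime_addr, uint32_t preferred_base)
{
    /* Both bases quoted in the text are 64 KB aligned. */
    assert((preferred_base & 0xFFFFu) == 0);
    return preferred_base + (runtime_addr & 0xFFFFu);
}

/* Hypothetical helper: fault class of an FSR value.
   ASSUMPTION: only bits 0, 2 and 3 select the class; bit 1
   (FSR_PAGE_ERROR) marks page versus section faults. */
static uint32_t fsr_fault_class(uint32_t fsr)
{
    return fsr & 0x0Du;
}
```

With the values from the message above, RA=03ed33a0 in devmgr.dll gives 0x100033A0 to search for in the devmgr map file. Under the stated assumption, FSR=00000007 falls into the FSR_TRANSLATION class.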
To confirm a stack fault, you need at least the system dump taken when the problem occurred. Better yet, use a KITL-enabled device on which you can reproduce the problem.
2. A Real Example
The following is a real-world example of a stack fault that occurred while running Hopper with focus on tmail.exe.
9381584 PID:47c40bda TID:24ccc7ce [Stack fault]: Thread=85fa3a40 Proc=8060e640 'device.exe'
9381584 PID:47c40bda TID:24ccc7ce AKY=00020013 PC=01a23be0(dummy.dll+0x00003be0) RA=03ed33a0(devmgr.dll+0x000033a0) BVA=24081d24 FSR=00000007
Failed to initialize bug tagger!
9383377 PID:47c40bda TID:24ccc7ce RaiseException: Thread=85fa3a40 Proc=8060e640 'device.exe'
9383463 PID:47c40bda TID:24ccc7ce AKY=00020013 PC=03f6c3c4(coredll.dll+0x0001e3c4) RA=8030a514(NK.EXE+0x0000a514) BVA=00000001 FSR=00000001
9384183 PID:a79698b2 TID:a5f6ac92 OEMIoControl: Unsupported Code 0x1010058 - device 0x0101 func 22
Assembly Dump:
DUM_IOControl:
01A23BA4 E1A0C00D mov r12, sp
01A23BA8 E92D5FF0 stmdb sp!, {r4 - r12, lr}
01A23BAC E28DB028 add r11, sp, #0x28 <<r11 points back to last sp (0x28 accounts for 40 bytes for those saved registers)
01A23BB0 E59FCD48 ldr r12, [pc, #0xD48] << Load value into r12 from address 0x01A24900. Note that the value is negative. (FFFF7EFC or -33028, the size of the local variables)
01A23BB4 E08DD00C add sp, sp, r12 << Sp = 24081D44 and only 7K (1D44) left on the stack after this.
$L36859:
01A23BB8 E1A06002 mov r6, r2
01A23BBC E1A04001 mov r4, r1
01A23BC0 E59F3D34 ldr r3, [pc, #0xD34]
01A23BC4 E5933000 ldr r3, dwLenIni
01A23BC8 E50B302C str r3, [r11, #-0x2C]
01A23BCC E59F0D24 ldr r0, [pc, #0xD24]
01A23BD0 E3A08001 mov r8, #1
01A23BD4 E3A03000 mov r3, #0
01A23BD8 E3A07057 mov r7, #0x57
01A23BDC E24BC902 sub r12, r11, #2, 18 << r12 = r11 - (2 rotated right by 18 bits) = r11 - 0x8000 = 0x24081e70
01A23BE0 E50C7114 str r7, [r12, #-0x114] << Crash here since a page in request failed due to stack overflow
01A23BE4 E24BC902 sub r12, r11, #2, 18
…
01A24900 FFFF7EFC ???
Registers Dump:
R0 = 01A23138 R1 = 0001000D R2 = 24089F80 R3 = 00000000
R4 = 0001000D R5 = 0005E180 R6 = 24089F80 R7 = 00000057
R8 = 00000001 R9 = 24089F80 R10 = 00000000 R11 = 24089E70
R12 = 24081E70 Sp = 24081D44 Lr = 03ED33A0 Pc = 01A23BE0
Cpsr = 4000001F
Negative=0 Zero=1 Carry=0 Overflow=0 Q=0
Call Stack Dump:
Call Stack: tmail.exe: 0x24CCC7CE 11:36:54 12/07/2007 Taipei Standard Time
0x24081d44 DUMMY!DUM_IOControl(void * 0x00000000, unsigned long 0x00000000, unsigned long * 0x00000000) dummy.cpp line 491 + 20 bytes
0x24089e70 DEVMGR!DM_DevDeviceIoControl(void * 0x00000000, unsigned long 0x00000000, unsigned long * 0x00000000, _OVERLAPPED * 0x00000000 {Internal=??? InternalHigh=??? Offset=??? ...}) devfile.c line 464 + 44 bytes
0x24089eb4 NK!SC_DeviceIoControl(void * 0x00000000, unsigned long 0x0001000d, unsigned long * 0x80321034, _OVERLAPPED * 0x0ba6169c {Internal=0x00550044 InternalHigh=0x0031004d Offset=0x0000003a ...}) kmisc.c line 2860 + 52 bytes
0x24089f28 COREDLL!xxx_DeviceIoControl(void * 0x00000000, unsigned long 0x00000000, unsigned long * 0x00000000, _OVERLAPPED * 0x00000000 {Internal=??? InternalHigh=??? Offset=??? ...}) twinbase.cpp line 49 + 52 bytes
0x24089f60 CAMERA_MAINSTONEII!SetCameraPresentFlag() cameradriver.cpp line 419
0x24089f90 CAMERA_MAINSTONEII!CAM_Open() cameradriver.cpp line 461
0x24089fb4 DEVMGR!I_CreateDeviceHandle(void * 0x0005e020) devfile.c line 98 + 24 bytes
0x2408a004 DEVMGR!DM_CreateDeviceHandle() devfile.c line 182 + 24 bytes
0x2408a084 COREDLL!xxx_CreateDeviceHandle() tdevice.c line 122
0x2408a08c FILESYS!FS_CreateFileW(unsigned long 0x00075c08, unsigned long 0xc0000000, void * 0x00000001) fsmain.c line 2275 + 28 bytes
0x2408a5ac COREDLL!xxx_CreateFileW(unsigned long 0x00000003, unsigned long 0x00000080, void * 0x00000000) twinbase.cpp line 100 + 52 bytes
0x2408a5e0 QUARTZ!CCaptureAdapter::Load() adapter.cpp line 69 + 44 bytes
End Call Stack: tmail.exe: 0x24CCC7CE 11:36:54 12/07/2007 Taipei Standard Time
Combining the call stack and the registers, we can draw the following figure of the thread stack:

The local variables of the function DUM_IOControl() occupy 33028 bytes (note that FFFF7EFC = -33028). After the prologue the thread stack is almost used up, with only 7492 (0x1D44) bytes left. The problem would not have occurred had the OEM checked every warning message when compiling the code.
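The arithmetic behind these numbers can be replayed with a few lines of C. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the 64 KB stack region of this thread starts at 0x24080000.

```c
#include <assert.h>
#include <stdint.h>

/* Size of the locals: the prologue adds a negative literal to sp. */
static uint32_t locals_size(uint32_t literal)
{
    assert(literal & 0x80000000u);   /* the literal must be negative */
    return 0u - literal;             /* two's-complement negation */
}

/* sp after the prologue: r11 = sp + saved_bytes, then sp = sp + literal. */
static uint32_t sp_after_prologue(uint32_t r11, uint32_t saved_bytes,
                                  uint32_t literal)
{
    return r11 - saved_bytes + literal;   /* wraps modulo 2^32 like the CPU */
}

/* Bytes between sp and the lowest address of the stack region. */
static uint32_t stack_bytes_left(uint32_t sp, uint32_t stack_base)
{
    assert(sp >= stack_base);
    return sp - stack_base;
}
```

Feeding in the dump values (r11 = 0x24089E70, 0x28 bytes of saved registers, literal 0xFFFF7EFC) reproduces the locals size of 33028 bytes, Sp = 0x24081D44, and 7492 bytes left above 0x24080000.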
The following instruction results in a device hang:
01A23BE0 E50C7114 str r7, [r12, #-0x114] << Crash here since a page in request failed due to stack overflow
At this point r12 is 0x24081E70, so r7 will be stored at 0x24081E70 - 0x114 = 0x24081D5C. Note that there are two 4K guard pages for a 64 KB thread stack (in this case the thread stack is 0x24080000 ~ 0x2408FFFF).
When you build applications within Platform Builder or the Windows Mobile Adaptation Kit, the default thread stack is 64 KB of reserved virtual memory (the physical RAM is committed one page at a time). This is a reasonable limit for most things on an embedded device. However, when you build your applications in Visual Studio, the default thread stack is 1 MB of reserved virtual memory, which is often much more than needed. That is why experienced Windows Mobile application developers set the default thread stack to 64 KB in Visual Studio (in Project properties -> Linker -> System -> Stack Reserve Size) and then allocate larger stacks only when necessary.
Back to the stack fault above, the two guard pages are Page A: 0x24082000-0x24083FFF and Page B: 0x24080000-0x24081FFF. Because the stack grows downward, Page A is hit first; when that happens, the kernel throws an exception for the application to handle. If the thread goes on to hit Page B, the system terminates the thread. The instruction above hits Page B, so the thread is terminated immediately. For more on thread stack guard pages, see http://blogs.msdn.com/hopperx/archive/2006/02/03/524170.aspx
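The target address of the faulting store, and the guard range it lands in, can be checked with a short C sketch. The function names are hypothetical. The guard ranges are the ones quoted above, reading their end addresses as 0x24083FFF and 0x24081FFF, and the ARM operand "#2, 18" is interpreted as the value 2 rotated right by 18 bits.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit rotate right, as used by ARM immediate operands. */
static uint32_t ror32(uint32_t value, unsigned rot)
{
    assert(rot < 32);
    if (rot == 0)
        return value;
    return (value >> rot) | (value << (32 - rot));
}

/* Address written by the faulting instruction pair:
     sub r12, r11, #2, 18
     str r7, [r12, #-0x114]                                   */
static uint32_t store_address(uint32_t r11)
{
    uint32_t r12 = r11 - ror32(2u, 18);   /* 2 ror 18 = 0x8000 */
    return r12 - 0x114u;
}

/* Returns 'A' or 'B' for the guard ranges quoted in the text,
   or 0 if the address lies in neither. */
static char guard_page(uint32_t addr)
{
    if (addr >= 0x24082000u && addr <= 0x24083FFFu)
        return 'A';
    if (addr >= 0x24080000u && addr <= 0x24081FFFu)
        return 'B';
    return 0;
}
```

With r11 = 0x24089E70 from the register dump, store_address() yields 0x24081D5C, which guard_page() places in Page B.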
3. A Faulty Sample Driver
Now let’s take a look at a sample driver that demonstrates the same problem. Let’s first look at the failure log and call stack.
Debug messages:
920555 PID:a377f6ca TID:624727c2 [Stack fault]: Thread=83451bb4 Proc=80314240 'device.exe'
920557 PID:a377f6ca TID:624727c2 AKY=00000411 PC=019b14dc(baddrv.dll+0x000014dc) RA=019b151c(baddrv.dll+0x0000151c) BVA=161b19a4 FSR=00000005
Call stack:
0x161b19cc BADDRV!StackOverflow(unsigned long 0x00000010) line 25
0x161b59d8 BADDRV!StackOverflow(unsigned long 0x00000010) line 34
0x161b99e4 BADDRV!StackOverflow(unsigned long 0x00000011) line 34
0x161bd9f0 BADDRV!Stack_Fault(unsigned char * 0x00000000, unsigned long 0x00000000, unsigned char * 0x00000000, unsigned long 0x00000000, unsigned long * 0x161bdb6c) line 20
0x161bda0c BADDRV!Launch_Test_case(unsigned long 0x00000300, unsigned char * 0x00000000, unsigned long 0x00000000, unsigned char * 0x00000000, unsigned long 0x00000000, unsigned long * 0x161bdb6c) line 40
0x161bda38 BADDRV!BAD_IOControl(unsigned long 0x00000066, unsigned long 0x00000300, unsigned char * 0x00000000, unsigned long 0x00000000, unsigned char * 0x00000000, unsigned long 0x00000000, unsigned long * 0x161bdb6c) line 78 + 36 bytes
0x161bda68 DEVMGR!DM_DevDeviceIoControl(void * 0x00000000, unsigned long 0x161bdb6c, unsigned long * 0x00000000, _OVERLAPPED * 0x161c5970) line 464 + 44 bytes
NK!80036520()
Registers:
R0 = 00000010 R1 = 00000000
R2 = 00000000 R3 = 00000001
R4 = 019B14A0 R5 = 00000003
R6 = 00000300 R7 = 00000000
R8 = 00000000 R9 = 00000000
R10 = 00000000 R11 = 161BDAA8
R12 = 161B59D8 Sp = 161B19CC
Lr = 019B151C Pc = 019B14DC
Cpsr = 2000001F
Negative=0 Zero=0 Carry=1 Overflow=0
Q=0
IRQ=0 FIQ=0 Thumb=0
M4=1 M3=1 M2=1 M1=1 M0=1
Disassembly:
StackOverflow:
019B14CC mov r12, sp
019B14D0 stmdb sp!, {r0}
019B14D4 stmdb sp!, {r12, lr}
019B14D8 sub sp, sp, #1, 18
$M26864:
019B14DC add r3, sp, #1, 18
019B14E0 ldr r3, [r3, #8]
019B14E4 cmp r3, #0
019B14E8 bhi |$M26864+14h (019b14f0)|
019B14EC b |$M26864+40h (019b151c)|
019B14F0 add r3, sp, #1, 18
019B14F4 ldr r3, [r3, #8]
019B14F8 sub r3, r3, #1
019B14FC add r12, sp, #1, 18
019B1500 str r3, [r12, #8]
019B1504 mov r3, #1
019B1508 add r12, sp, #3, 20
019B150C str r3, [r12, #0xFFC]
019B1510 add r0, sp, #1, 18
019B1514 ldr r0, [r0, #8]
019B1518 bl |StackOverflow (019b14cc)|
019B151C add sp, sp, #1, 18
019B1520 ldmia sp, {sp, lr}
019B1524 bx lr
The faulting instruction is at PC=019B14DC, and at that point SP is 161B19CC. Each thread’s stack ranges from xxxx0000 to xxxxffff, so 161B19CC falls into the topmost guard page (please refer to the stack figure in the example above). Therefore, when the kernel sees this address as one of the operands of the instruction, it generates the stack fault.
The code that creates big arrays on the thread stack is shown below:
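// Declared up front because Stack_Fault() calls it before its definition.
void StackOverflow(DWORD dwDepthCount);

void Stack_Fault( PBYTE pBufIn,
                  DWORD dwLenIn,
                  PBYTE pBufOut,
                  DWORD dwLenOut,
                  PDWORD pdwActualOut )
{
    // Request far more recursion levels than a 64 KB stack can hold
    DWORD dwStackDepth = 18;
    StackOverflow(dwStackDepth);
}

void StackOverflow(DWORD dwDepthCount)
{
    // DWORD is unsigned, so this is true only when the count reaches zero
    if(0 >= dwDepthCount )
        return;
    dwDepthCount --;
    // Create an array on the stack (4096 DWORDs = 16 KB per call)
    DWORD buf[4096];
    buf[4095] = 1;
    StackOverflow(dwDepthCount);
}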
Stack_Fault() calls StackOverflow(), which then calls itself recursively. Each call allocates a 4096-element DWORD array, that is 16 KB, on the thread stack. This explains why the thread stack is used up after only three calls of StackOverflow(), as shown in the call stack dump above.
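The frame addresses in the call stack dump can be cross-checked against the source with a small C sketch. It is an illustration under stated assumptions: the names are hypothetical, the 12 bytes of per-call overhead are read off the two stmdb instructions in the disassembly (r0, r12 and lr), and the frame address 0x161bd9f0 reported for Stack_Fault() is taken as the starting point.

```c
#include <assert.h>
#include <stdint.h>

#define BUF_ELEMENTS 4096u   /* DWORD buf[4096] */
#define DWORD_BYTES  4u      /* a DWORD is 32 bits wide */
#define SAVED_WORDS  3u      /* r0, r12 and lr pushed by the two stmdb */

/* Stack consumed by one call of StackOverflow(). */
static uint32_t frame_bytes(void)
{
    return BUF_ELEMENTS * DWORD_BYTES + SAVED_WORDS * 4u;
}

/* Frame address after `calls` nested calls, starting from start_sp. */
static uint32_t sp_after_calls(uint32_t start_sp, uint32_t calls)
{
    uint64_t used = (uint64_t)calls * frame_bytes();
    assert(used <= start_sp);   /* cannot go below address 0 */
    return start_sp - (uint32_t)used;
}
```

Each frame comes to 0x400C bytes, which is exactly the spacing of the StackOverflow() entries in the call stack. Three such frames take the address from 0x161BD9F0 down to 0x161B19CC, the Sp value in the register dump, leaving 0x19CC (6604) bytes above 0x161B0000.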
4. Conclusion
This article discusses the stack fault, a common problem with device drivers on the Windows Mobile platform. The problem occurs when the guard pages of the 64 KB thread stack are hit. When you see this problem while running Hopper or other tests, make sure you have the call stack and registers, as well as the assembly code, so that you can verify the cause.