Debugging Stack Fault of Hopper Failure on Windows Mobile devices


1.  Introduction

A stack fault occurs when a thread’s stack is almost used up. The Windows CE kernel then reports the problem with a message like this:

9381584 PID:47c40bda TID:24ccc7ce [Stack fault]: Thread=85fa3a40 Proc=8060e640 'device.exe'

9381584 PID:47c40bda TID:24ccc7ce AKY=00020013 PC=01a23be0(dummy.dll+0x00003be0) RA=03ed33a0(devmgr.dll+0x000033a0) BVA=24081d24 FSR=00000007

 

The data in the message is important for understanding the failure, which can be a data abort, stack fault, page fault, and so on.

  • AKY "Access Key": Process slot bitmask corresponding to the processes the excepting thread has access to. Platform Builder can show the Access Key of each process in the “Processes” window.
  • PC "Program Counter": The address of the instruction being executed. On ARM platforms this is the current value of the PC register; on x86 platforms it is EIP (Instruction Pointer). If symbols are available, the exception handler will attempt to provide an offset into the DLL that caused the exception.
  • RA "Return Address": The address in the calling function to which the current function would return. Had the current function NOT caused an exception, this is where execution would continue. For a DLL on Windows CE, take the last two bytes of the address, add the preferred load address of the module (0x10000000), and search for the result in the Rva+Base column of the corresponding map file to find the function the address falls into. For an EXE, do the same with the preferred load address 0x00010000. The same calculation applies to BVA below.
  • BVA "Base Virtual Address": The contents of BVA depend on the type of exception. For a Prefetch Abort, the value is the same as the PC register (the execution point). For a Data Abort, the value indicates what caused the exception: it is a combination of the virtual memory base of the module plus the value that caused the exception.
  • FSR "Fault Status Register": The FSR represents several flags that will help you understand the nature of your exception. For ARM devices the following flags can be set:

    #define FSR_ALIGNMENT       0x01
    #define FSR_PAGE_ERROR      0x02
    #define FSR_TRANSLATION     0x05
    #define FSR_DOMAIN_ERROR    0x09
    #define FSR_PERMISSION      0x0D
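As a rough illustration of how the fields above can be used, the following C sketch classifies an FSR value and computes the address to search for in a map file. The function names are hypothetical. The decoding assumes that only the low four bits of FSR matter and that bit 1 (FSR_PAGE_ERROR) distinguishes a page-level fault from a section-level one, as in the classic ARM fault status encoding; the kernel's own checks may differ.

```c
#include <stdint.h>

/* FSR values quoted in the article. */
#define FSR_ALIGNMENT    0x01
#define FSR_PAGE_ERROR   0x02
#define FSR_TRANSLATION  0x05
#define FSR_DOMAIN_ERROR 0x09
#define FSR_PERMISSION   0x0D

/* Preferred load addresses quoted in the article. */
#define DLL_PREFERRED_BASE 0x10000000u
#define EXE_PREFERRED_BASE 0x00010000u

/* Assumption: bit 1 only selects page-level vs. section-level, so it is
 * masked off before the value is compared with the named fault types. */
const char *fsr_fault_kind(uint32_t fsr)
{
    switch (fsr & 0x0Du) {
    case FSR_ALIGNMENT:    return "alignment";
    case FSR_TRANSLATION:  return "translation";
    case FSR_DOMAIN_ERROR: return "domain";
    case FSR_PERMISSION:   return "permission";
    default:               return "other";
    }
}

/* Only meaningful for translation, domain and permission faults. */
int fsr_is_page_level(uint32_t fsr)
{
    return (fsr & FSR_PAGE_ERROR) != 0;
}

/* Value to search for in the Rva+Base column of the module's map file. */
uint32_t map_lookup_address(uint32_t module_offset, int is_dll)
{
    return (is_dll ? DLL_PREFERRED_BASE : EXE_PREFERRED_BASE) + module_offset;
}
```

For the sample message above, fsr_fault_kind(0x07) reports a translation fault and fsr_is_page_level(0x07) is true, while map_lookup_address(0x33a0, 1) gives 0x100033A0 as the value to look up in the devmgr.dll map file.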

To confirm a stack fault, you should at least have the system dump taken when the problem occurred. Better yet, use a KITL-enabled device on which you can reproduce the problem.

2.  A Real Example

The following is a real-world example of a stack fault that occurred while running Hopper with focus on tmail.exe.

9381584 PID:47c40bda TID:24ccc7ce [Stack fault]: Thread=85fa3a40 Proc=8060e640 'device.exe'

9381584 PID:47c40bda TID:24ccc7ce AKY=00020013 PC=01a23be0(dummy.dll+0x00003be0) RA=03ed33a0(devmgr.dll+0x000033a0) BVA=24081d24 FSR=00000007

Failed to initialize bug tagger!

9383377 PID:47c40bda TID:24ccc7ce RaiseException: Thread=85fa3a40 Proc=8060e640 'device.exe'

9383463 PID:47c40bda TID:24ccc7ce AKY=00020013 PC=03f6c3c4(coredll.dll+0x0001e3c4) RA=8030a514(NK.EXE+0x0000a514) BVA=00000001 FSR=00000001

9384183 PID:a79698b2 TID:a5f6ac92 OEMIoControl: Unsupported Code 0x1010058 - device 0x0101 func 22

 

Assembly Dump:

DUM_IOControl:

01A23BA4 E1A0C00D             mov         r12, sp

01A23BA8 E92D5FF0             stmdb       sp!, {r4 - r12, lr}

01A23BAC E28DB028             add         r11, sp, #0x28 <<r11 points back to last sp (0x28 accounts for 40 bytes for those saved registers)

01A23BB0 E59FCD48             ldr         r12, [pc, #0xD48]   << Load value into r12 from address 0x01A24900. Note that the value is negative. (FFFF7EFC or -33028, the size of the local variables)

01A23BB4 E08DD00C             add         sp, sp, r12         <<  Sp = 24081D44 and only 7K (1D44)  left on the stack after this.

$L36859:

01A23BB8 E1A06002             mov         r6, r2

01A23BBC E1A04001             mov         r4, r1

01A23BC0 E59F3D34             ldr         r3, [pc, #0xD34]

01A23BC4 E5933000             ldr         r3, dwLenIni

01A23BC8 E50B302C             str         r3, [r11, #-0x2C]

01A23BCC E59F0D24             ldr         r0, [pc, #0xD24]

01A23BD0 E3A08001             mov         r8, #1

01A23BD4 E3A03000             mov         r3, #0

01A23BD8 E3A07057             mov         r7, #0x57

01A23BDC E24BC902             sub         r12, r11, #2, 18 << r12 = r11 - (2 rotated right by 18) = r11 - 0x8000 = 0x24081E70

01A23BE0 E50C7114             str         r7, [r12, #-0x114]  << Crash here since a page in request failed due to stack overflow

01A23BE4 E24BC902             sub         r12, r11, #2, 18

01A24900 FFFF7EFC             ???

 

Registers Dump:

 R0 = 01A23138 R1 = 0001000D R2 = 24089F80 R3 = 00000000

 R4 = 0001000D R5 = 0005E180 R6 = 24089F80 R7 = 00000057

 R8 = 00000001 R9 = 24089F80 R10 = 00000000 R11 = 24089E70

 R12 = 24081E70 Sp = 24081D44 Lr = 03ED33A0 Pc = 01A23BE0

 Cpsr = 4000001F

 

 Negative=0 Zero=1 Carry=0 Overflow=0 Q=0

 

Call Stack Dump:

Call Stack: tmail.exe: 0x24CCC7CE  11:36:54 12/07/2007 Taipei Standard Time

    0x24081d44 DUMMY!DUM_IOControl(void * 0x00000000, unsigned long 0x00000000, unsigned long * 0x00000000) dummy.cpp line 491 + 20 bytes

    0x24089e70 DEVMGR!DM_DevDeviceIoControl(void * 0x00000000, unsigned long 0x00000000, unsigned long * 0x00000000, _OVERLAPPED * 0x00000000 {Internal=??? InternalHigh=??? Offset=??? ...}) devfile.c line 464 + 44 bytes

    0x24089eb4 NK!SC_DeviceIoControl(void * 0x00000000, unsigned long 0x0001000d, unsigned long * 0x80321034, _OVERLAPPED * 0x0ba6169c {Internal=0x00550044 InternalHigh=0x0031004d Offset=0x0000003a ...}) kmisc.c line 2860 + 52 bytes

    0x24089f28 COREDLL!xxx_DeviceIoControl(void * 0x00000000, unsigned long 0x00000000, unsigned long * 0x00000000, _OVERLAPPED * 0x00000000 {Internal=??? InternalHigh=??? Offset=??? ...}) twinbase.cpp line 49 + 52 bytes

    0x24089f60 CAMERA_MAINSTONEII!SetCameraPresentFlag() cameradriver.cpp line 419

    0x24089f90 CAMERA_MAINSTONEII!CAM_Open() cameradriver.cpp line 461

    0x24089fb4 DEVMGR!I_CreateDeviceHandle(void * 0x0005e020) devfile.c line 98 + 24 bytes

    0x2408a004 DEVMGR!DM_CreateDeviceHandle() devfile.c line 182 + 24 bytes

    0x2408a084 COREDLL!xxx_CreateDeviceHandle() tdevice.c line 122

    0x2408a08c FILESYS!FS_CreateFileW(unsigned long 0x00075c08, unsigned long 0xc0000000, void * 0x00000001) fsmain.c line 2275 + 28 bytes

    0x2408a5ac COREDLL!xxx_CreateFileW(unsigned long 0x00000003, unsigned long 0x00000080, void * 0x00000000) twinbase.cpp line 100 + 52 bytes

    0x2408a5e0 QUARTZ!CCaptureAdapter::Load() adapter.cpp line 69 + 44 bytes

End Call Stack: tmail.exe: 0x24CCC7CE  11:36:54 12/07/2007 Taipei Standard Time

 

Combining the call stack and the registers, we can draw the following figure of the thread stack:

The local variables in the function DUM_IOControl() take 33028 bytes (note that 0xFFFF7EFC = -33028). However, the thread stack is almost used up, with only 7492 (0x1D44) bytes left. The problem would not have occurred if the OEM had checked every warning message when compiling the code.
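The arithmetic behind these numbers can be reproduced with a short C sketch. The helper names are hypothetical, and the code assumes a 64 KB stack aligned on a 64 KB boundary, as in this example (0x24080000 ~ 0x2408FFFF).

```c
#include <stdint.h>

#define STACK_SLOT_SIZE 0x10000u   /* 64 KB thread stack, as in the article */

/* Size of the locals, given the negative adjustment loaded into r12. */
uint32_t frame_size_bytes(uint32_t frame_adjust)
{
    return 0u - frame_adjust;      /* two's complement negation */
}

/* SP after "add sp, sp, r12"; the sum wraps modulo 2^32 like the CPU's. */
uint32_t sp_after_prologue(uint32_t sp_before, uint32_t frame_adjust)
{
    return sp_before + frame_adjust;
}

/* Bytes between SP and the lowest address of its 64 KB-aligned stack. */
uint32_t bytes_left_on_stack(uint32_t sp)
{
    return sp & (STACK_SLOT_SIZE - 1u);
}
```

With r11 = 0x24089E70, SP before the adjustment is r11 - 0x28 = 0x24089E48. frame_size_bytes(0xFFFF7EFC) is 33028, sp_after_prologue(0x24089E48, 0xFFFF7EFC) is 0x24081D44, and bytes_left_on_stack(0x24081D44) is 7492.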

The following instruction results in a device hang:

01A23BE0 E50C7114             str         r7, [r12, #-0x114]  << Crash here since a page in request failed due to stack overflow

               

At this point r12 is 0x24081E70, so r7 is stored at 0x24081D5C (0x24081E70 - 0x114). Note that a 64 KB thread stack has two 4 KB guard pages (in this case the thread stack is 0x24080000 ~ 0x2408FFFF).

When you build applications within Platform Builder or the Windows Mobile Adaptation Kit, the default thread stack is 64 KB of reserved virtual memory (physical RAM is committed one page at a time). This is a reasonable limit for most things on an embedded device. However, when you build your applications in Visual Studio, the default thread stack is 1 MB of reserved virtual memory, which is often much more than needed. That is why experienced Windows Mobile application developers set the default thread stack to 64 KB in Visual Studio (in Project properties → Linker → System → Stack Reserve Size) and then allocate larger stacks only when necessary.
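The address written by the faulting store can be checked with a few lines of C. This is only a sketch with hypothetical helper names; it assumes that the disassembler's "#2, 18" notation means the 8-bit value 2 rotated right by 18 bits, which matches the r12 value in the register dump.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits. */
uint32_t ror32(uint32_t value, unsigned n)
{
    n &= 31u;
    return n ? (value >> n) | (value << (32u - n)) : value;
}

/* Address written by "str r7, [r12, #-0x114]" when r12 was set up by
 * "sub r12, r11, #2, 18". */
uint32_t faulting_store_address(uint32_t r11)
{
    uint32_t r12 = r11 - ror32(2u, 18u);   /* 2 ror 18 == 0x8000 */
    return r12 - 0x114u;
}
```

For r11 = 0x24089E70 this yields 0x24081D5C, which lies 0x1D5C bytes above the bottom of the stack at 0x24080000.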

Back to the stack fault above, the two guard pages are Page A: 0x24082000-0x24083FFF and Page B: 0x24080000-0x24081FFF. Because the stack grows downward, Page A is hit first, and the kernel throws an exception for the application to handle. If the application then hits Page B, the system terminates the thread. The above instruction hits Page B, so the thread is terminated immediately. For more on thread stack guard pages, see http://blogs.msdn.com/hopperx/archive/2006/02/03/524170.aspx

3.  A Faulty Sample Driver

Now let’s take a look at a sample driver that demonstrates the same problem. Let’s first look at the failure log and call stack.

Debug messages:

920555 PID:a377f6ca TID:624727c2 [Stack fault]: Thread=83451bb4 Proc=80314240 'device.exe'

 920557 PID:a377f6ca TID:624727c2 AKY=00000411 PC=019b14dc(baddrv.dll+0x000014dc) RA=019b151c(baddrv.dll+0x0000151c) BVA=161b19a4 FSR=00000005

 

Call stack:

0x161b19cc BADDRV!StackOverflow(unsigned long 0x00000010)  line 25

0x161b59d8 BADDRV!StackOverflow(unsigned long 0x00000010)  line 34

0x161b99e4 BADDRV!StackOverflow(unsigned long 0x00000011)  line 34

0x161bd9f0 BADDRV!Stack_Fault(unsigned char * 0x00000000, unsigned long 0x00000000, unsigned char * 0x00000000, unsigned long 0x00000000, unsigned long * 0x161bdb6c)  line 20

0x161bda0c BADDRV!Launch_Test_case(unsigned long 0x00000300, unsigned char * 0x00000000, unsigned long 0x00000000, unsigned char * 0x00000000, unsigned long 0x00000000, unsigned long * 0x161bdb6c)  line 40

0x161bda38 BADDRV!BAD_IOControl(unsigned long 0x00000066, unsigned long 0x00000300, unsigned char * 0x00000000, unsigned long 0x00000000, unsigned char * 0x00000000, unsigned long 0x00000000, unsigned long * 0x161bdb6c)  line 78 + 36 bytes

0x161bda68 DEVMGR!DM_DevDeviceIoControl(void * 0x00000000, unsigned long 0x161bdb6c, unsigned long * 0x00000000, _OVERLAPPED * 0x161c5970)  line 464 + 44 bytes

NK!80036520()

 

Registers:

R0 = 00000010 R1 = 00000000

 R2 = 00000000 R3 = 00000001

 R4 = 019B14A0 R5 = 00000003

 R6 = 00000300 R7 = 00000000

 R8 = 00000000 R9 = 00000000

 R10 = 00000000 R11 = 161BDAA8

 R12 = 161B59D8 Sp = 161B19CC

 Lr = 019B151C Pc = 019B14DC

 Cpsr = 2000001F

 

 Negative=0 Zero=0 Carry=1 Overflow=0

 Q=0

 

 IRQ=0 FIQ=0 Thumb=0

 

 M4=1 M3=1 M2=1 M1=1 M0=1

 

Disassembly:

StackOverflow:

019B14CC    mov         r12, sp

019B14D0    stmdb       sp!, {r0}

019B14D4    stmdb       sp!, {r12, lr}

019B14D8    sub         sp, sp, #1, 18

$M26864:

019B14DC    add         r3, sp, #1, 18

019B14E0    ldr         r3, [r3, #8]

019B14E4    cmp         r3, #0

019B14E8    bhi         |$M26864+14h (019b14f0)|

019B14EC    b           |$M26864+40h (019b151c)|

019B14F0    add         r3, sp, #1, 18

019B14F4    ldr         r3, [r3, #8]

019B14F8    sub         r3, r3, #1

019B14FC    add         r12, sp, #1, 18

019B1500    str         r3, [r12, #8]

019B1504    mov         r3, #1

019B1508    add         r12, sp, #3, 20

019B150C    str         r3, [r12, #0xFFC]

019B1510    add         r0, sp, #1, 18

019B1514    ldr         r0, [r0, #8]

019B1518    bl          |StackOverflow (019b14cc)|

019B151C    add         sp, sp, #1, 18

019B1520    ldmia       sp, {sp, lr}

019B1524    bx          lr

 

The faulting instruction is at PC=019B14DC, and at that point SP is 161B19CC. Each thread’s stack ranges from xxxx0000 ~ xxxxFFFF, so 161B19CC falls into the guard page area at the low end of the stack (please refer to the stack figure in the above example). Therefore, when the kernel sees this address used as an operand of the instruction, it generates the stack fault.
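The call stack and the disassembly can be cross-checked against each other. The following C sketch, with hypothetical helper names, compares the distance between adjacent StackOverflow() frames in the call stack with the frame size implied by the prologue. It assumes that the prologue pushes three registers (r0, then r12 and lr) and that "sub sp, sp, #1, 18" subtracts 1 rotated right by 18 bits, that is 0x4000.

```c
#include <stdint.h>

/* Stack consumed by one call: distance between two adjacent frames. */
uint32_t frame_stride(uint32_t inner_frame, uint32_t outer_frame)
{
    return outer_frame - inner_frame;
}

/* Frame size predicted from the StackOverflow() prologue. */
uint32_t predicted_frame_size(void)
{
    const uint32_t saved_registers = 3u * 4u;  /* r0, r12, lr */
    const uint32_t locals = 0x4000u;           /* sub sp, sp, #1, 18 */
    return saved_registers + locals;
}
```

Both frame_stride(0x161b19cc, 0x161b59d8) and frame_stride(0x161b59d8, 0x161b99e4) come to 0x400C (16396 bytes), which matches predicted_frame_size(). Each call therefore consumes a little over 16 KB of the 64 KB stack.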

The code that creates big arrays on the thread stack is shown below:

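// Forward declaration so that Stack_Fault() can call StackOverflow().
void StackOverflow(DWORD dwDepthCount);

void Stack_Fault( PBYTE pBufIn,
                  DWORD dwLenIn,
                  PBYTE pBufOut,
                  DWORD dwLenOut,
                  PDWORD pdwActualOut )
{
    // Request 18 levels of recursion, far more than a 64 KB stack can hold.
    DWORD dwStackDepth = 18;
    StackOverflow(dwStackDepth);
}

void StackOverflow(DWORD dwDepthCount)
{
    if (0 >= dwDepthCount)
        return;

    dwDepthCount--;

    // Create an array on the stack: 4096 DWORDs = 16 KB per call.
    DWORD buf[4096];
    buf[4095] = 1;
    StackOverflow(dwDepthCount);
}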

Stack_Fault() calls StackOverflow(), which then calls itself recursively. Each call allocates 16 KB (4096 DWORDs) on the thread stack, plus a few bytes for saved registers. This explains why the 64 KB thread stack is used up after three calls of StackOverflow(), as shown in the call stack dump above.
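One common way to avoid this kind of failure is to keep large buffers off the thread stack. The following is a minimal sketch in portable C, not the driver's actual code: heap_backed_recursion() is a hypothetical stand-in for StackOverflow(), and uint32_t stands in for DWORD. It performs the same recursion but allocates the 16 KB buffer on the heap, so each call costs only a small stack frame.

```c
#include <stdint.h>
#include <stdlib.h>

#define BUF_ELEMENTS 4096u

/* Same recursion pattern as StackOverflow(), with the buffer on the heap.
 * Returns 0 on success and -1 if an allocation fails. */
int heap_backed_recursion(uint32_t depth)
{
    if (depth == 0)
        return 0;

    uint32_t *buf = malloc(BUF_ELEMENTS * sizeof *buf);
    if (buf == NULL)
        return -1;

    buf[BUF_ELEMENTS - 1] = 1;
    int rc = heap_backed_recursion(depth - 1);

    free(buf);   /* released on the way back up the recursion */
    return rc;
}
```

Calling heap_backed_recursion(18) holds up to 18 heap buffers (288 KB in total) at the deepest point, yet adds only a few dozen bytes of stack per level. The trade-off is that the caller now has to handle allocation failure.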

4.  Conclusion

This article discusses the stack fault, a common problem with device drivers on the Windows Mobile platform. The problem occurs when the guard pages of the 64 KB thread stack are hit. When you see this problem while running Hopper or other tests, make sure you have the call stack and registers, as well as the assembly code, so that you can verify the cause.
