• Ntdebugging Blog

    Video: Citrix engineers Nicholas Vasile, Dmitry Vostokov, and Kapil Ramlal at the Microsoft GEC Conference

    • 0 Comments

    Our Global Escalation Services team at Microsoft hosted the first annual Global Engineering Conference on February 25th & 26th in 2009. This year’s theme was “Building the Engineering Community” with a focus on increased knowledge sharing, collaboration, and best practices throughout the engineering community. Several vendors attended the conference including our good friends at Citrix. In this video Citrix Escalation engineers Nicholas Vasile, Dmitry Vostokov, and Kapil Ramlal share tools they created to take advantage of the ETW tracing infrastructure in Windows, debugging scripts, and best practices for engaging issues that require multi-vendor support.

    Video Link 

    Video Table of Contents-

    Who is Citrix?

    Pattern-Drive Memory Dump Analysis (Presenter: Dmitry Vostokov)

    Citrix & ETW: A case study with CDFControl (Presenter: Kapil Ramlal)

    Working with Citrix Technical Support (Presenter:Nicholas Vasile)

    Citrix’s Public Symbol Server

     

    Citrix Engineers:

    image

    image

    image


    Feel free to send feedback!

    -Ronsto

  • Ntdebugging Blog

    Windows NT Debugging Blog Live Chat 2

    • 0 Comments

    Windows NT Debugging Blog Live Chat

    Microsoft Platform Global Escalation Services is hosting our second live group debug chat session for the debugging community on March 17, 2009 at 11 AM Pacific Time.  We will be focusing on debugging techniques and any questions you may have about anything we’ve previously blogged about. 

    Details about the “PGES-Windows NT Debugging Blog Live Chat” can be found here: http://www.microsoft.com/communities/chats/default.mspx

  • Ntdebugging Blog

    Next Week: Windows NT Debugging Blog Live Chat

    • 0 Comments

    Windows NT Debugging Blog Live Chat

    Microsoft Platform Global Escalation Services is hosting our second live group debug chat session for the debugging community on March 17, 2009 at 11 AM Pacific Time.  We will be focusing on debugging techniques and any questions you may have about anything we’ve previously blogged about. 

    Details about the “PGES-Windows NT Debugging Blog Live Chat” can be found here: http://www.microsoft.com/communities/chats/default.mspx

    Share this post :
  • Ntdebugging Blog

    The Compiler Did What?

    • 0 Comments

    I was recently investigating a crash in an application.  As I researched the issue I found a very old defect in the code that was only recently being exposed by the compiler.

     

    The crash occurred at the below instruction because the ebx register does not hold a valid pointer.

     

    0:001> r

    eax=d9050cf7 ebx=003078c0 ecx=6e2e0000 edx=00000000 esi=00000001 edi=0c334468

    eip=65637fbe esp=010eb408 ebp=010eb878 iopl=0         nv up ei pl nz na po nc

    cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202

    riched20!CTxtSelection::CreateCaret+0x429:

    65637fbe 8b4b1c          mov     ecx,dword ptr [ebx+1Ch] ds:002b:003078dc=????????

    0:001> dd 003078c0

    003078c0  ???????? ???????? ???????? ????????

    003078d0  ???????? ???????? ???????? ????????

    003078e0  ???????? ???????? ???????? ????????

    003078f0  ???????? ???????? ???????? ????????

    00307900  ???????? ???????? ???????? ????????

    00307910  ???????? ???????? ???????? ????????

    00307920  ???????? ???????? ???????? ????????

    00307930  ???????? ???????? ???????? ????????

     

    Examining the assembly leading up to the crash, ebx came from [ebp-40c].

     

    0:001> ub .

    riched20!CTxtSelection::CreateCaret+0x408:

    65637f9d 6a08            push    8

    65637f9f ff156cf06465    call    dword ptr [riched20!_imp__CreateBitmap (6564f06c)]

    65637fa5 898784000000    mov     dword ptr [edi+84h],eax

    65637fab eb06            jmp    riched20!CTxtSelection::CreateCaret+0x41e (65637fb3)

    65637fad 8bb5e4fbffff    mov     esi,dword ptr [ebp-41Ch]

    65637fb3 8b9df4fbffff    mov    ebx,dword ptr [ebp-40Ch]

    65637fb9 ff775c          push    dword ptr [edi+5Ch]

    65637fbc 6a01            push    1

    0:001> dd @ebp-40c l1

    010eb46c  003078c0

     

    Looking at the whole function, [ebp-40c] was populated at the beginning of the function as the contents of edi+1C. The contents of edi+1Ch were first moved into ecx and later the value of ecx was moved into [ebp-40Ch].    Further examination of the whole function showed the edi register is unchanged at the time of the crash, so I can use its current value to determine what [ebp-40c] should contain.

     

    0:001> uf riched20!CTxtSelection::CreateCaret

    riched20!CTxtSelection::CreateCaret:

    65637b95 8bff            mov     edi,edi

    65637b97 55              push    ebp

    65637b98 8bec            mov     ebp,esp

    65637b9a 81ec5c040000    sub     esp,45Ch

    65637ba0 a100e06465      mov     eax,dword ptr [riched20!__security_cookie (6564e000)]

    65637ba5 33c5            xor     eax,ebp

    65637ba7 8945fc          mov     dword ptr [ebp-4],eax

    65637baa 53              push    ebx

    65637bab 56              push    esi

    65637bac 57              push    edi

    65637bad 8bf9            mov     edi,ecx

    65637baf 8b4f1c          mov    ecx,dword ptr [edi+1Ch] <<< The value originates from [edi+1Ch]

    65637bb2 0fbf4740        movsx   eax,word ptr [edi+40h]

    65637bb6 898df4fbffff    mov    dword ptr [ebp-40Ch],ecx <<< Store the value on the stack

    <snip>

    65637fb3 8b9df4fbffff    mov    ebx,dword ptr [ebp-40Ch] <<< Read the value from the stack

    <snip>

    65637fbe 8b4b1c          mov    ecx,dword ptr [ebx+1Ch] <<< Crash here because ebx is invalid

    <snip>

     

    The expected value of [ebp-40C], and thus the expected value of the ebx register, is 091978c0 based on the value in [edi+1Ch] at the time of the crash.  This would be a valid pointer and is not what is currently in [ebp-40C] or ebx.  It is noteworthy that at the time of the crash, ebx is similar to what should be there, it differs only by the high word of the dword.

     

    0:001> r ebx

    ebx=003078c0

     

    0:001> dd @edi+1c l1

    0c334484  091978c0

     

    The expected value, 091978c0, is a valid pointer.

     

    0:001> dd 091978c0

    091978c0  091978c8 00000000 00000501 05000000

    091978d0  00000015 076c1a27 2a372f35 0c2e3998

    091978e0  000049aa 00000000 00000000 00000000

    091978f0  00000000 00000000 00000000 00000000

    09197900  00000000 00000000 00000000 00000000

    09197910  00000000 00000000 00000000 00000000

    09197920  1a3098a8 00000000 00000000 00000000

    09197930  00000000 00000000 00000000 00000000

     

    Somehow the value at ebp-40C was changed between instruction 65637bb6, where [ebp-40C] was set, and instruction 65637fb3 where [ebp-40C] was read.  Fortunately I had a mechanism to reproduce this crash so I was able to set a breakpoint and trace through how this happened.

     

    First I set a breakpoint on the instruction that populates [ebp-40C].

     

    0:003> bp 65637bb6

    0:003> g

    Breakpoint 0 hit

    eax=ffffffff ebx=0c334468 ecx=091978c0 edx=00000060 esi=091978c0 edi=0c334468

    eip=65637bb6 esp=010eb410 ebp=010eb878 iopl=0         nv up ei pl nz na pe nc

    cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206

    riched20!CTxtSelection::CreateCaret+0x21:

    65637bb6 898df4fbffff    mov     dword ptr [ebp-40Ch],ecx ss:002b:010eb46c=00000000

     

    Next I calculated ebp-40C and set a break on write access breakpoint.

     

    0:001> ?@ebp-40c

    Evaluate expression: 17740908 = 010eb46c

    0:001> ba w4 010eb46c

    0:001> g

    Breakpoint 1 hit

    eax=00000030 ebx=00000000 ecx=00000000 edx=00000020 esi=00000001 edi=0c334468

    eip=65637f67 esp=010eb40c ebp=010eb878 iopl=0         nv up ei pl zr na pe nc

    cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246

    riched20!CTxtSelection::CreateCaret+0x3d2:

    65637f67 66898475f4fbffff mov     word ptr [ebp+esi*2-40Ch],ax ss:002b:010eb46e=0919

     

    The write breakpoint hit at a location I was not expecting.  The instruction where the breakpoint hit is not modifying the variable that was stored at [ebp-40C].

     

    Although I cannot share the Windows source code on this blog, the code in question roughly resembles the below example.  Note that a proficient assembly language reader could figure out the code flow, this example is not sharing any magic.

     

    Struct1*    p1;

    WORD        array[512];

    p1 = GetStruct1();

    array[i-2] = 0x30;

    p1->p = variable2; // Crash here because p1 is not a valid pointer

     

    We are crashing because p1 is not a valid pointer.  The high word of p1 is being overwritten as 0030 by the line “array[i-2] = 0x30;” because i is 1, leading to an underflow of the array.  This underflow is corrupting the pointer in p1.

     

    0:001> r ebx

    ebx=003078c0

     

    Clearly there is a defect in the above code.  If it is legitimate for i to be 1 (and it is), then a check must be made to prevent an underflow of the array.  However further research found that this code has been consistent for many years and many releases of the product.  Why is this suddenly crashing now?  As the bank robber in Dirty Harry said, “I gots to know."

     

    In the above assembly we calculate that “array” starts at ebp-408 (assuming i is always 2 or greater, 2*2-40c is -408).  In the earlier assembly we see that p1 is placed at ebp-40c.  In this configuration an underflow of “array” will always corrupt p1.

     

    Examining the assembly on a system that does not crash, I found that the local variables are stored differently in a different version of this binary.  In the beginning of the function we see that p1 is stored in ebx.  In this version of the binary ebx is never stored on the stack, so it cannot be corrupted by an underflow.

     

    0:000> uf riched20!CTxtSelection::CreateCaret

    riched20!CTxtSelection::CreateCaret:

    74e75c53 8bff            mov     edi,edi

    74e75c55 55              push    ebp

    74e75c56 8bec            mov     ebp,esp

    74e75c58 81ec58040000    sub     esp,458h

    74e75c5e a19010e974      mov     eax,dword ptr [riched20!__security_cookie (74e91090)]

    74e75c63 53              push    ebx

    74e75c64 56              push    esi

    74e75c65 8bf1            mov     esi,ecx

    74e75c67 8b5e1c          mov    ebx,dword ptr [esi+1Ch]

     

    The code that populates array[i-2] with 0x30 is later in the function.  In this version, array is stored at ebp-404.  If there is an underflow it will corrupt ebp-408.

     

    riched20!CTxtSelection::CreateCaret+0x3e1:

    74e76034 66c7847df8fbffff3000 mov word ptr [ebp+edi*2-408h],30h

     

    The value stored at ebp-408 is used in several places in this function, however it is never used after instruction 74e76034 executes.  This means any underflow in the array only corrupts memory that is not used after the corruption, and as a result the corruption never results in a crash.  Although this defect has existed for a long time, the compiler has protected us until now.

     

    74e75d3f 0b85f8fbffff    or      eax,dword ptr [ebp-408h]

    74e75e51 ffb5f8fbffff    push    dword ptr [ebp-408h]

    74e75e8a 8b8df8fbffff    mov     ecx,dword ptr [ebp-408h]

    74e75f20 398df8fbffff    cmp     dword ptr [ebp-408h],ecx

    74e75fec 8b85f8fbffff    mov     eax,dword ptr [ebp-408h]

     

    The issue discussed in this article was addressed as part of KB2883200.

  • Ntdebugging Blog

    Another Who Done It

    • 0 Comments

    Hi my name is Bob Golding, I am an EE in GES. I want to share an interesting problem I recently worked on.  The initial symptom was the system bugchecked with a Stop 0xA which means there was an invalid memory reference.  The cause of the crash was a driver making I/O requests while Asynchronous Procedure Calls (APCs) were disabled.  The bugcheck caused by an invalid memory reference was the result of the problem and not the cause.

     

    An APC is queued to a thread during I/O completion. This is to guarantee the last phase of the I/O completion occurs in the same context as the process that issued the request.

     

    The stack of the trap is presented below.  The call stack shows that APCs are being enabled allowing queued APCs to run.

     

    Child-SP          RetAddr           Call Site

    fffff880`07bf3598 fffff800`030b85a9 nt!KeBugCheckEx

    fffff880`07bf35a0 fffff800`030b7220 nt!KiBugCheckDispatch+0x69

    fffff880`07bf36e0 fffff800`030d8b56 nt!KiPageFault+0x260

    fffff880`07bf3870 fffff800`030959ff nt!IopCompleteRequest+0xc73

    fffff880`07bf3940 fffff800`0306c0d9 nt!KiDeliverApc+0x1d7

    fffff880`07bf39c0 fffff800`033f8a1a nt!KiCheckForKernelApcDelivery+0x25

    fffff880`07bf39f0 fffff800`033cce2f nt!MiMapViewOfSection+0x2bafa

    fffff880`07bf3ae0 fffff800`030b8293 nt!NtMapViewOfSection+0x2be

    fffff880`07bf3bb0 00000000`772df93a nt!KiSystemServiceCopyEnd+0x13

    00000000`0015dea8 00000000`00000000 0x772df93a

     

    The reason the trap occurred is because when issuing requests to lower drivers it is common practice in drivers to implement code similar to:

     

    KEVENT event;

     

    status = IoCallDriver( DeviceObject, irp );

     

    //

    //  Wait for the event to be signaled if STATUS_PENDING is returned.

    //

    if (status == STATUS_PENDING) {

       (VOID)KeWaitForSingleObject( &event, // event is a local which is declared on the stack

                                    Executive,

                                    KernelMode,

                                    FALSE,

                                    NULL );

    }

     

    As you can see in the above code, if the return from IoCallDriver does not return pending the code continues and exits. Part of the last phase of I/O processing that takes place in the APC is signaling the event. If the call to IoCallDriver returns success, because the event is on the stack it is critical that the APC execute immediately before the stack unwinds. Since APCs where disabled, the execution of the APC was delayed and during this time the event became invalid. The APCs were delayed because the memory manager was in a critical area and APCs could not run.

     

    I needed to determine which driver did this so I enabled IRP logging in Driver Verifier to trace I/O requests.  With this enabled the next dump should contain a transaction log that will help identify what driver is performing I/O while APCs are disabled.  The command line to enable this is:

    verifier /flags 0x410 /all

     

    The new dump with verifier enabled also crashed after delivering an APC to the thread and completing the IRP.  From the debug output below I can find the IRPs that were issued and the thread that issued them, this is what I need to look for them in the log.

     

    1: kd> !thread

    THREAD fffffa80064c9b50 Cid 0200.0204  Teb: 000007fffffde000 Win32Thread: 0000000000000000 RUNNING on processor 1

    IRP List:

        fffff9800a33ec60: (0006,03a0) Flags: 40060070  Mdl: 00000000

        fffff9800a250c60: (0006,03a0) Flags: 40060070  Mdl: 00000000

        fffff9800a3f4ee0: (0006,0118) Flags: 40060070  Mdl: 00000000

    Not impersonating

    DeviceMap                 fffff8a000007890

    Owning Process            fffffa80064bbb30       Image:         csrss.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      1656           Ticks: 0

    Context Switch Count      25             IdealProcessor: 0

    UserTime                  00:00:00.000

    KernelTime                00:00:00.000

    Win32 Start Address 0x000000004a061540

    Stack Init fffff88003b21c70 Current fffff88003b20890

    Base fffff88003b22000 Limit fffff88003b1c000 Call 0

    Priority 14 BasePriority 13 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5

    Child-SP          RetAddr           Call Site

    fffff880`03b21428 fffff800`0307a54c nt!KeBugCheckEx

    fffff880`03b21430 fffff800`030d02ee nt!MmAccessFault+0xffffffff`fff9c15c

    fffff880`03b21590 fffff800`030c8db9 nt!KiPageFault+0x16e

    fffff880`03b21728 fffff800`030e6ab3 nt!memcpy+0x229

    fffff880`03b21730 fffff800`030c4bd7 nt!IopCompleteRequest+0x5a3

    fffff880`03b21800 fffff800`0307ba85 nt!KiDeliverApc+0x1c7

    fffff880`03b21880 fffff800`0331d96a nt!KiCheckForKernelApcDelivery+0x25

    fffff880`03b218b0 fffff800`033e742e nt!MiMapViewOfSection+0xffffffff`fff36baa

    fffff880`03b219a0 fffff800`030d1453 nt!NtMapViewOfSection+0x2bd

    fffff880`03b21a70 00000000`7761159a nt!KiSystemServiceCopyEnd+0x13

    00000000`0025f078 00000000`00000000 0x7761159a

     

    The command “!verifier 100” will dump the transaction log.  Below is the relevant portion of the log containing the IRPs for our thread.

     

    IRP fffff9800a3f4ee0, Thread fffffa80064c9b50, IRQL = 0, KernelApcDisable = -4, SpecialApcDisable = -1

    fffff80003573a68 nt!IovAllocateIrp+0x28

    fffff800033b20e2 nt!IoBuildDeviceIoControlRequest+0x32

    fffff8000356d72e nt!IovBuildDeviceIoControlRequest+0x4e

    fffff880010f8bcc fltmgr!FltGetVolumeGuidName+0x18c

    fffff88004e4fbe1 baddriver+0x12be1

    fffff88004e73523 baddriver +0x36523

    fffff88004e7300c baddriver +0x3600c

    fffff88004e72cce baddriver +0x35cce

    fffff88004e5f715 baddriver +0x22715

    fffff88004e4c6c7 baddriver +0xf6c7

    fffff88004e48342 baddriver +0xb342

    fffff88004e5e44e baddriver +0x2144e

    fffff88004e5e638 baddriver +0x21638

     

    IRP fffff9800a250c60, Thread fffffa80064c9b50, IRQL = 0, KernelApcDisable = -5, SpecialApcDisable = -1

    fffff80003573a68 nt!IovAllocateIrp+0x28

    fffff800033b20e2 nt!IoBuildDeviceIoControlRequest+0x32

    fffff8000356d72e nt!IovBuildDeviceIoControlRequest+0x4e

    fffff8800101eec7 mountmgr!MountMgrSendDeviceControl+0x73

    fffff88001010a6b mountmgr!QueryDeviceInformation+0x207

    fffff8800101986b mountmgr!QueryPointsFromMemory+0x57

    fffff88001019f86 mountmgr!MountMgrQueryPoints+0x36a

    fffff8800101ea71 mountmgr!MountMgrDeviceControl+0xe9

    fffff80003574c16 nt!IovCallDriver+0x566

    fffff880010f8bec fltmgr!FltGetVolumeGuidName+0x1ac

    fffff88004e4fbe1 baddriver +0x12be1

    fffff88004e73523 baddriver +0x36523

    fffff88004e7300c baddriver +0x3600c

     

    IRP fffff9800a33ec60, Thread fffffa80064c9b50, IRQL = 0, KernelApcDisable = -5, SpecialApcDisable = -1

    fffff80003573a68 nt!IovAllocateIrp+0x28

    fffff800033b20e2 nt!IoBuildDeviceIoControlRequest+0x32

    fffff8000356d72e nt!IovBuildDeviceIoControlRequest+0x4e

    fffff8800101eec7 mountmgr!MountMgrSendDeviceControl+0x73

    fffff88001010afd mountmgr!QueryDeviceInformation+0x299

    fffff8800101986b mountmgr!QueryPointsFromMemory+0x57

    fffff88001019f86 mountmgr!MountMgrQueryPoints+0x36a

    fffff8800101ea71 mountmgr!MountMgrDeviceControl+0xe9

    fffff80003574c16 nt!IovCallDriver+0x566

    fffff880010f8bec fltmgr!FltGetVolumeGuidName+0x1ac

    fffff88004e4fbe1 baddriver +0x12be1

    fffff88004e73523 baddriver +0x36523

    fffff88004e7300c baddriver +0x3600c

     

    From the IRP log in verifier I can see that baddriver.sys is calling FltGetVolumeGuidName while APCs are disabled. Further investigation found that baddriver.sys had registered a function for image load notification, and the memory manager has APCs disabled when it calls the image notification routine. The image notification routine in baddriver.sys called FltGetVolumeGuidName which issued the I/O.  From the log output I see KernelApcDisable and SpecialApcDisable, the issue is SpecialApcDisable being –1.  The I/O completion APCs are considered special APCs, so kernel APC disable would not affect them.

     

    The solution was for the driver to check for APCs disabled before issuing the FltGetVolumeGuidName and not make this call if APCs are disabled.

  • Ntdebugging Blog

    ResAvail Pages and Working Sets

    • 0 Comments

    Hello everyone, I'm Ray and I'm here to talk a bit about a dump I recently looked at and a little-referenced memory counter called ResAvail Pages (resident available pages).

     

    The problem statement was:  The server hangs after a while.

     

    Not terribly informative, but that's where we start with many cases. First some good housekeeping:

     

    0: kd> vertarget

    Windows 7 Kernel Version 7601 (Service Pack 1) MP (2 procs) Free x64

    Product: Server, suite: TerminalServer SingleUserTS

    Built by: 7601.18113.amd64fre.win7sp1_gdr.130318-1533

    Machine Name: "ASDFASDF1234"

    Kernel base = 0xfffff800`01665000 PsLoadedModuleList = 0xfffff800`018a8670

    Debug session time: Thu Aug  8 09:39:26.992 2013 (UTC - 4:00)

    System Uptime: 9 days 1:08:39.307

     

    Of course Windows 7 Server == Server 2008 R2.

     

    One of the basic things I check at the beginning of these hang dumps with vague problem statements is the memory information.

     

    0: kd> !vm 21

     

    *** Virtual Memory Usage ***

    Physical Memory:     2097038 (   8388152 Kb)

    Page File: \??\C:\pagefile.sys

      Current:  12582912 Kb  Free Space:  12539700 Kb

      Minimum:  12582912 Kb  Maximum:     12582912 Kb

    Available Pages:      286693 (   1146772 Kb)

    ResAvail Pages:          135 (       540 Kb)

     

    ********** Running out of physical memory **********

     

    Locked IO Pages:           0 (         0 Kb)

    Free System PTEs:   33526408 ( 134105632 Kb)

     

    ******* 12 system cache map requests have failed ******

     

    Modified Pages:         4017 (     16068 Kb)

    Modified PF Pages:      4017 (     16068 Kb)

    NonPagedPool Usage:   113241 (    452964 Kb)

    NonPagedPool Max:    1561592 (   6246368 Kb)

    PagedPool 0 Usage:     35325 (    141300 Kb)

    PagedPool 1 Usage:     28162 (    112648 Kb)

    PagedPool 2 Usage:     24351 (     97404 Kb)

    PagedPool 3 Usage:     24350 (     97400 Kb)

    PagedPool 4 Usage:     24516 (     98064 Kb)

    PagedPool Usage:      136704 (    546816 Kb)

    PagedPool Maximum:  33554432 ( 134217728 Kb)

     

    ********** 222 pool allocations have failed **********

     

    Session Commit:         6013 (     24052 Kb)

    Shared Commit:          6150 (     24600 Kb)

    Special Pool:              0 (         0 Kb)

    Shared Process:      1214088 (   4856352 Kb)

    Pages For MDLs:           67 (       268 Kb)

    PagedPool Commit:     136768 (    547072 Kb)

    Driver Commit:         15548 (     62192 Kb)

    Committed pages:     1648790 (   6595160 Kb)

    Commit limit:        5242301 (  20969204 Kb)

     

    So we're failing to allocate pool, but we aren't out of virtual memory for paged pool or nonpaged pool.  Let's look at the breakdown:

     

    0: kd> dd nt!MmPoolFailures l?9

    fffff800`01892160  000001be 00000000 00000000 00000002

    fffff800`01892170  00000000 00000000 00000000 00000000

    fffff800`01892180  00000000

     

    Where:

        yellow   = Nonpaged high/medium/low priority failures

        green    = Paged high/medium/low priority failures

        cyan      = Session paged high/medium/low priority failures

     

    So we actually failed both nonpaged AND paged pool allocations in this case.  Why?  We're "Running out of physical memory", obviously.  So where does this running out of physical memory message come from?  In the above example this is from the ResAvail Pages counter.

     

    ResAvail Pages is the amount of physical memory there would be if every working set was at its minimum size and only what needs to be resident in RAM was present (e.g. PFN database, system PTEs, driver images, kernel thread stacks, nonpaged pool, etc).

     

    Where did this memory go then?  We have plenty of Available Pages (Free + Zero + Standby) for use.  So something is claiming memory it isn't actually using.  In this type of situation one of the things I immediately suspect is process working set minimums. Working set basically means the physical memory used by a process.

     

    So let's check.

     

    0: kd> !process 0 1

     

    <a lot of processes in this output>.

     

    PROCESS fffffa8008f76060

        SessionId: 0  Cid: 0adc    Peb: 7fffffda000  ParentCid: 0678

        DirBase: 204ac9000  ObjectTable: 00000000  HandleCount:   0.

        Image: cscript.exe

        VadRoot 0000000000000000 Vads 0 Clone 0 Private 1. Modified 3. Locked 0.

        DeviceMap fffff8a000008a70

        Token                             fffff8a0046f9c50

        ElapsedTime                       9 Days 01:08:00.134

        UserTime                          00:00:00.000

        KernelTime                        00:00:00.015

        QuotaPoolUsage[PagedPool]         0

        QuotaPoolUsage[NonPagedPool]      0

        Working Set Sizes (now,min,max)  (5, 50, 345) (20KB, 200KB, 1380KB)

        PeakWorkingSetSize                1454

        VirtualSize                       65 Mb

        PeakVirtualSize                   84 Mb

        PageFaultCount                    1628

        MemoryPriority                    BACKGROUND

        BasePriority                      8

        CommitCharge                      0

     

    I have only shown one example process above for brevity's sake, but there were thousands returned.  241,423 to be precise.  None had abnormally high process working set minimums, but cumulatively their usage adds up.

     

    The “now” process working set is lower than the minimum working set.  How is that possible?  Well, the minimum and maximum are not hard limits, but suggested limits.  For example, the minimum working set is honored unless there is memory pressure, in which case it can be trimmed below this value.  There is a way to set the min and/or max as hard limits on specific processes by using the QUOTA_LIMITS_HARDWS_MIN_ENABLE flag via SetProcessWorkingSetSize.

     

    You can view if the minimum and maximum working set values are configured in the _EPROCESS->Vm->Flags structure.  Note these numbers are from another system as this structure was already torn down for the processes we were looking at.

     

    0: kd> dt _EPROCESS fffffa8008f76060 Vm

    nt!_EPROCESS

       +0x398 Vm : _MMSUPPORT

    0: kd> dt _MMSUPPORT fffffa8008f76060+0x398

    nt!_MMSUPPORT

       +0x000 WorkingSetMutex  : _EX_PUSH_LOCK

       +0x008 ExitGate         : 0xfffff880`00961000 _KGATE

       +0x010 AccessLog        : (null)

       +0x018 WorkingSetExpansionLinks : _LIST_ENTRY [ 0x00000000`00000000 - 0xfffffa80`08f3c410 ]

       +0x028 AgeDistribution  : [7] 0

       +0x044 MinimumWorkingSetSize : 0x32

       +0x048 WorkingSetSize   : 5

       +0x04c WorkingSetPrivateSize : 5

       +0x050 MaximumWorkingSetSize : 0x159

       +0x054 ChargedWslePages : 0

       +0x058 ActualWslePages  : 0

       +0x05c WorkingSetSizeOverhead : 0

       +0x060 PeakWorkingSetSize : 0x5ae

       +0x064 HardFaultCount   : 0x41

       +0x068 VmWorkingSetList : 0xfffff700`01080000 _MMWSL

       +0x070 NextPageColor    : 0x2dac

       +0x072 LastTrimStamp    : 0

       +0x074 PageFaultCount   : 0x65c

       +0x078 RepurposeCount   : 0x1e1

       +0x07c Spare            : [2] 0

       +0x084 Flags            : _MMSUPPORT_FLAGS

    0: kd> dt _MMSUPPORT_FLAGS fffffa8008f76060+0x398+0x84

    nt!_MMSUPPORT_FLAGS

       +0x000 WorkingSetType   : 0y000

       +0x000 ModwriterAttached : 0y0

       +0x000 TrimHard         : 0y0

       +0x000 MaximumWorkingSetHard : 0y0

       +0x000 ForceTrim        : 0y0

       +0x000 MinimumWorkingSetHard : 0y0

       +0x001 SessionMaster    : 0y0

       +0x001 TrimmerState     : 0y00

       +0x001 Reserved         : 0y0

       +0x001 PageStealers     : 0y0000

       +0x002 MemoryPriority   : 0y00000000 (0)

       +0x003 WsleDeleted      : 0y1

       +0x003 VmExiting        : 0y1

       +0x003 ExpansionFailed  : 0y0

       +0x003 Available        : 0y00000 (0)

     

    How about some more detail?

     

    0: kd> !process fffffa8008f76060

    PROCESS fffffa8008f76060

        SessionId: 0  Cid: 0adc    Peb: 7fffffda000  ParentCid: 0678

        DirBase: 204ac9000  ObjectTable: 00000000  HandleCount:  0.

        Image: cscript.exe

        VadRoot 0000000000000000 Vads 0 Clone 0 Private 1. Modified 3. Locked 0.

        DeviceMap fffff8a000008a70

        Token                             fffff8a0046f9c50

        ElapsedTime                       9 Days 01:08:00.134

        UserTime                          00:00:00.000

        KernelTime                        00:00:00.015

        QuotaPoolUsage[PagedPool]         0

        QuotaPoolUsage[NonPagedPool]      0

        Working Set Sizes (now,min,max)  (5, 50, 345) (20KB, 200KB, 1380KB)

        PeakWorkingSetSize                1454

        VirtualSize                       65 Mb

        PeakVirtualSize                   84 Mb

        PageFaultCount                    1628

        MemoryPriority                    BACKGROUND

        BasePriority                      8

        CommitCharge                      0

     

    No active threads

     

    0: kd> !object fffffa8008f76060

    Object: fffffa8008f76060  Type: (fffffa8006cccc90) Process

        ObjectHeader: fffffa8008f76030 (new version)

        HandleCount: 0  PointerCount: 1

     

    The highlighted information shows us that this process has no active threads left but the process object itself (and its 20KB working set use) were still hanging around because a kernel driver had a reference to the object that it never released.  Sampling other entries shows the server had been leaking process objects since it was booted.

     

    Unfortunately trying to directly track down pointer leaks on process objects is difficult and requires an instrumented kernel, so we tried to check the easy stuff first before going that route.  We know it has to be a kernel driver doing this (since it is a pointer and not a handle leak) so we looked at the list of 3rd party drivers installed.  Note: The driver names have been redacted.

     

    0: kd> lm

    start             end                 module name

    <snip>

    fffff880`04112000 fffff880`04121e00   driver1    (no symbols)   <-- no symbols usually means 3rd party       

    fffff880`04158000 fffff880`041a4c00   driver2    (no symbols)          

    <snip>

     

    0: kd> lmvm driver1   

    Browse full module list

    start             end                 module name

    fffff880`04112000 fffff880`04121e00   driver1    (no symbols)          

        Loaded symbol image file: driver1.sys

        Image path: \SystemRoot\system32\DRIVERS\driver1.sys

        Image name: driver1.sys

        Browse all global symbols  functions  data

        Timestamp:        Wed Dec 13 12:09:32 2006 (458033CC)

        CheckSum:         0001669E

        ImageSize:        0000FE00

        Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

    0: kd> lmvm driver2

    Browse full module list

    start             end                 module name

    fffff880`04158000 fffff880`041a4c00   driver2    (no symbols)          

        Loaded symbol image file: driver2.sys

        Image path: \??\C:\Windows\system32\drivers\driver2.sys

        Image name: driver2.sys

        Browse all global symbols  functions  data

        Timestamp:        Thu Nov 30 12:12:07 2006 (456F10E7)

        CheckSum:         0004FE8E

        ImageSize:        0004CC00

        Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

     

    Fortunately for both the customer and us we turned up a pair of drivers that predated Windows Vista (meaning they were designed for XP/2003) that raised an eyebrow.  Of course we need a more solid evidence link than just "it's an old driver", so I did a quick search of our internal KB.  This turned up several other customers who had these same drivers installed, experienced the same problem, then removed them and the problem went away.  That sounds like a pretty good evidence link. We implemented the same plan for this customer successfully.

  • Ntdebugging Blog

    Understanding Pool Corruption Part 3 – Special Pool for Double Frees

    • 0 Comments

    In Part 1 and Part 2 of this series we discussed pool corruption and how special pool can be used to identify the cause of such corruption.  In today’s article we will use special pool to catch a double free of pool memory.

     

    A double free of pool will cause a system to blue screen, however the resulting crash may vary.  In the most obvious scenario a driver that frees a pool allocation twice will cause the system to immediately crash with a stop code of C2 BAD_POOL_CALLER, and the first parameter will be 7 to indicate “Attempt to free pool which was already freed”.  If you experience such a crash, enabling special pool should be high on your list of troubleshooting steps.

     

    BAD_POOL_CALLER (c2)

    The current thread is making a bad pool request.  Typically this is at a bad IRQL level or double freeing the same allocation, etc.

    Arguments:

    Arg1: 0000000000000007, Attempt to free pool which was already freed

    Arg2: 00000000000011c1, (reserved)

    Arg3: 0000000004810007, Memory contents of the pool block

    Arg4: fffffa8001b10800, Address of the block of pool being deallocated

     

    A less obvious crash would be if the pool has been reallocated.  As we showed in Part 2, pool is structured so that multiple drivers share a page.  When DriverA calls ExFreePool to free its pool block the block is made available for other drivers.  If memory manager gives this memory to DriverF, and then DriverA frees it a second time, a crash may occur in DriverF when the pool allocation no longer contains the expected data.  Such a problem may be difficult for the developer of DriverF to identify without special pool.

     

     

    Special pool will place each driver’s allocation in a separate page of memory (as discussed in Part 2).  When a driver frees a pool block in special pool the whole page will be freed, and any access to a free page will cause an immediate bugcheck.  Additionally, special pool will place this page on the tail of the list of pages to be used again.  This increases the likelihood that the page will still be free when it is freed a second time, decreasing the likelihood of the DriverA/DriverF scenario shown above.

     

    To demonstrate this failure we will once again use the Sysinternals tool NotMyFault.  Choose the “Double free” option and click “Crash”.  Most likely you will get the stop C2 bugcheck mentioned above.  Enable special pool and reboot to get a more informative error.

     

    verifier /flags 1 /driver myfault.sys

     

    Choosing the “Double free” option with special pool enabled resulted in the following crash.  The bugcheck code PAGE_FAULT_IN_NONPAGED_AREA means some driver tried to access memory that was not valid.  This invalid memory was the freed special pool page.

     

    PAGE_FAULT_IN_NONPAGED_AREA (50)

    Invalid system memory was referenced.  This cannot be protected by try-except,

    it must be protected by a Probe.  Typically the address is just plain bad or it

    is pointing at freed memory.

    Arguments:

    Arg1: fffff9800a7fe7f0, memory referenced.

    Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.

    Arg3: fffff80060263888, If non-zero, the instruction address which referenced the bad memory address.

    Arg4: 0000000000000002, (reserved)

     

    Looking at the call stack we can see myfault.sys was freeing pool and ExFreePoolSanityChecks took a page fault that lead to the crash.

     

    kd> kn

    # Child-SP          RetAddr           Call Site

    00 fffff880`0419fe28 fffff800`5fd7e28a nt!DbgBreakPointWithStatus

    01 fffff880`0419fe30 fffff800`5fd7d8de nt!KiBugCheckDebugBreak+0x12

    02 fffff880`0419fe90 fffff800`5fc5b544 nt!KeBugCheck2+0x79f

    03 fffff880`041a05b0 fffff800`5fd1c5bc nt!KeBugCheckEx+0x104

    04 fffff880`041a05f0 fffff800`5fc95acb nt! ?? ::FNODOBFM::`string'+0x33e2a

    05 fffff880`041a0690 fffff800`5fc58eee nt!MmAccessFault+0x55b

    06 fffff880`041a07d0 fffff800`60263888 nt!KiPageFault+0x16e

    07 fffff880`041a0960 fffff800`6024258c nt!ExFreePoolSanityChecks+0xe8

    08 fffff880`041a09a0 fffff880`04c9b5d9 nt!VerifierExFreePoolWithTag+0x3c

    09 fffff880`041a09d0 fffff880`04c9b727 myfault!MyfaultDeviceControl+0x2fd

    0a fffff880`041a0b20 fffff800`60241a4a myfault!MyfaultDispatch+0xb7

    0b fffff880`041a0b80 fffff800`600306c7 nt!IovCallDriver+0xba

    0c fffff880`041a0bd0 fffff800`600458a6 nt!IopXxxControlFile+0x7e5

    0d fffff880`041a0d60 fffff800`5fc5a453 nt!NtDeviceIoControlFile+0x56

    0e fffff880`041a0dd0 000007fd`ea212c5a nt!KiSystemServiceCopyEnd+0x13

     

    Using the address from the bugcheck code, we can verify that the memory is in fact not valid:

     

    kd> dd fffff9800a7fe7f0

    fffff980`0a7fe7f0  ???????? ???????? ???????? ????????

    fffff980`0a7fe800  ???????? ???????? ???????? ????????

    fffff980`0a7fe810  ???????? ???????? ???????? ????????

    fffff980`0a7fe820  ???????? ???????? ???????? ????????

    fffff980`0a7fe830  ???????? ???????? ???????? ????????

    fffff980`0a7fe840  ???????? ???????? ???????? ????????

    fffff980`0a7fe850  ???????? ???????? ???????? ????????

    fffff980`0a7fe860  ???????? ???????? ???????? ????????

    kd> !pte fffff9800a7fe7f0

                                               VA fffff9800a7fe7f0

    PXE at FFFFF6FB7DBEDF98    PPE at FFFFF6FB7DBF3000    PDE at FFFFF6FB7E600298    PTE at FFFFF6FCC0053FF0

    contains 0000000002A91863  contains 0000000002A10863  contains 0000000000000000

    pfn 2a91      ---DA--KWEV  pfn 2a10      ---DA--KWEV  not valid

     

    So far we have enough evidence to prove that myfault.sys was freeing invalid memory, but how to we know this memory is being freed twice?  If there was a double free we need to determine if the first or second call to ExFreePool was incorrect.  To this so we need to determine what code freed the memory first.

     

    Driver Verifier special pool keeps track of the last 0x10000 calls to allocate and free pool.  You can dump this database with the !verifier 80 command.  To limit the data output you can also pass this command the address of the memory you suspect was double freed.

     

    Don’t assume the address in the bugcheck code is the address being freed, go get the address from the function that called VerifierExFreePoolWithTag.

     

    In the above call stack the call below VerifierExFreePoolWithTag is frame 9 (start counting with 0, or use kn).

     

    kd> .frame /r 9

    09 fffff880`041a09d0 fffff880`04c9b727 myfault+0x15d9

    rax=0000000000000000 rbx=fffff9800a7fe800 rcx=fffff9800a7fe800

    rdx=fffffa8001a37fa0 rsi=fffffa80035975e0 rdi=fffffa8003597610

    rip=fffff88004c9b5d9 rsp=fffff880041a09d0 rbp=fffffa80034568d0

    r8=fffff9800a7fe801  r9=fffff9800a7fe7f0 r10=fffff9800a7fe800

    r11=0000000000000000 r12=0000000000000000 r13=0000000000000000

    r14=fffff800600306c7 r15=fffffa8004381b80

    iopl=0         nv up ei ng nz na po nc

    cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000286

    myfault+0x15d9:

    fffff880`04c9b5d9 eb7a            jmp     myfault+0x1655 (fffff880`04c9b655)

     

    On x64 systems the first parameter is passed in rcx.  The below assembly shows that rcx originated from rbx.

     

    kd> ub fffff880`04c9b5d9

    myfault+0x15ba:

    fffff880`04c9b5ba ff15a80a0000    call    qword ptr [myfault+0x2068 (fffff880`04c9c068)]

    fffff880`04c9b5c0 33d2            xor     edx,edx

    fffff880`04c9b5c2 488bc8          mov     rcx,rax

    fffff880`04c9b5c5 488bd8          mov     rbx,rax

    fffff880`04c9b5c8 ff154a0a0000    call    qword ptr [myfault+0x2018 (fffff880`04c9c018)]

    fffff880`04c9b5ce 33d2            xor     edx,edx

    fffff880`04c9b5d0 488bcb          mov    rcx,rbx

    fffff880`04c9b5d3 ff153f0a0000    call    qword ptr [myfault+0x2018 (fffff880`04c9c018)]

     

    Run !verifier 80 using the address from rbx:

     

    kd> !verifier 80 fffff9800a7fe800

     

    Log of recent kernel pool Allocate and Free operations:

     

    There are up to 0x10000 entries in the log.

     

    Parsing 0x0000000000010000 log entries, searching for address 0xfffff9800a7fe800.

     

     

    ======================================================================

    Pool block fffff9800a7fe800, Size 0000000000000800, Thread fffffa80046ce4c0

    fffff80060251a32 nt!VfFreePoolNotification+0x4a

    fffff8005fe736c9 nt!ExFreePool+0x595

    fffff80060242597 nt!VerifierExFreePoolWithTag+0x47

    fffff88004c9b5ce myfault!MyfaultDeviceControl+0x2f2

    fffff88004c9b727 myfault!MyfaultDispatch+0xb7

    fffff80060241a4a nt!IovCallDriver+0xba

    fffff800600306c7 nt!IopXxxControlFile+0x7e5

    fffff800600458a6 nt!NtDeviceIoControlFile+0x56

    fffff8005fc5a453 nt!KiSystemServiceCopyEnd+0x13

    ======================================================================

    Pool block fffff9800a7fe800, Size 0000000000000800, Thread fffffa80046ce4c0

    fffff80060242a5d nt!VeAllocatePoolWithTagPriority+0x2d1

    fffff8006024b20e nt!XdvExAllocatePoolInternal+0x12

    fffff80060242f69 nt!VerifierExAllocatePool+0x61

    fffff88004c9b5c0 myfault!MyfaultDeviceControl+0x2e4

    fffff88004c9b727 myfault!MyfaultDispatch+0xb7

    fffff80060241a4a nt!IovCallDriver+0xba

    fffff800600306c7 nt!IopXxxControlFile+0x7e5

    fffff800600458a6 nt!NtDeviceIoControlFile+0x56

    fffff8005fc5a453 nt!KiSystemServiceCopyEnd+0x13

     

    The above output shows the pool block being allocated by myfault.sys and then freed by myfault.sys.  If we combine this information with the call stack leading up to our bugcheck we can conclude that the pool was freed once in MyfaultDeviceControl at offset 0x2f2, then freed again in MyfaultDeviceControl at offset 0x2fd.

     

    Now we know which driver is causing the problem, and if this is our driver we know which area of the code to investigate.

  • Ntdebugging Blog

    Debugging a Windows 8.1 Store App Crash Dump

    • 0 Comments

    Quality reports on the App Summary page

    Microsoft provides triage dumps of your Windows Store application’s crashes and hangs through the Quality section of the App Summary page on the Dev Center - Windows Store apps portal.

     

    Back in June 2012, the Windows Store team posted an article on this feature and the basics of debugging the dumps provided.  Improving apps with Quality reports,

    http://blogs.msdn.com/b/windowsstore/archive/2012/06/27/improving-apps-with-quality-reports.aspx.

     

    This article digs further into the debugging of Windows Store application crash dump files, and explains the recent changes made to exception reporting in Windows 8.1.

     

    The files being debugged can be obtained from the Quality page or by collecting them yourself using Windows Error Reporting (WER) or the AeDebug feature of Windows.

     

    An example AeDebug tool is Sysinternals ProcDump. To configure crash dumping, execute the following from an elevated command prompt:

     

    C:\>md c:\dumps

    C:\>procdump.exe -ma -i c:\dumps

     

    Windows Runtime Architecture

    The Windows Runtime (WinRT API) is at the core of all Windows Store applications. Similar to how Win32 and.NET sit between the Desktop app and the kernel, the WinRT API sits between the Windows Store app and the kernel.

     

     

    In between the WinRT API and app is a layer called the Language Projection layer. This layer projects the C++ centric concepts of WinRT, into language specific concepts.

     

    The projection of errors through the Language Projection layer is the focus of this article.

    • In WinRT, errors are modeled as IErrorInfo and IRestrictedErrorInfo interfaces.
    • In CLR languages, errors are modeled as exceptions and are represented as class objects derived from System.Exception.
    • In JavaScript, errors are also modeled as exceptions and are represented as JavaScript Exception (JSE) objects.
    • In C/C++, errors are modeled as an interface or a pure HRESULT.

     

     

    Because each language has a different concept on how errors are handled, the projection layer needs to use a least common denominator. For errors, that means that just an HRESULT (Error Code) and HSTRING (Error Message) are sent through the projection layer. Any addition information held by WinRT’s interface is not available in the receiving language. And conversely, any additional information held by the language’s object is not available to WinRT.

     

    If the error becomes unhandled, the HRESULT becomes the Exception Code reported in the Exception Record (of a live debug session or dump file).

     

    Visual Studio 2013

    Opening a dump file in Visual Studio allows you to see the Exception Record via the MiniDump File Summary. The Exception Code is listed in the Dump Summary section.

     

    MinidumpFileSummary

     

    If you Debug the application, the Exception Code‘s Description will be listed in the Output window.

     

    OutputWindow

     

    The call stack of the Exception Record’s context is viewable in the Call Stack window. Depending on the dump’s state, the $exceptionstack pseudo variable can be used in a Watch (or Locals) window to see the stack.

     

    VisualStudio-CallStack

     

    Note, having the Private PDBs of the application will make the stack output more complete/accurate.

     

    Debugging Tools for Windows

    Using the Debugging Tools for Windows, the Exception Record can be displayed using the .exr -1 command. The Exception Code’s description can (sometimes) be looked up using the !error <code> command. The context of the exception is changed to with the .ecxr command. The stack is displayed with the k command (knL adds frame numbers and omits source line information).

     

    0:004> .exr -1

    ExceptionAddress: 722248e8 (msvcr110!Concurrency::details::_ReportUnobservedException+0x00000022)

       ExceptionCode: c0000409 (Security check failure or stack buffer overrun)

      ExceptionFlags: 00000001

    NumberParameters: 1

       Parameter[0]: 00000005

     

    0:004> !error c0000409

    Error code: (NTSTATUS) 0xc0000409 (3221226505) - The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application.

     

    0:004> .ecxr

    eax=00000001 ebx=ffffffff ecx=00000005 edx=0a6ee048 esi=13672424 edi=0546c33c

    eip=722248e8 esp=02ceeef8 ebp=02ceef14 iopl=0         nv up ei pl nz na po nc

    cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202

    msvcr110!Concurrency::details::_ReportUnobservedException+0x22:

    722248e8 cd29            int     29h

     

    0:004> knL

      *** Stack trace for last set context - .thread/.cxr resets it

    # ChildEBP RetAddr 

    00 02ceeef4 00f2f6cb msvcr110!Concurrency::details::_ReportUnobservedException+0x22

    WARNING: Stack unwind information not available. Following frames may be wrong.

    01 02ceef14 00f2fad7 MyBadApp+0xef6cb

    02 02ceef40 01122720 MyBadApp+0xefad7

    03 02ceef64 011228eb MyBadApp+0x2e2720

    04 02ceef70 00f3960a MyBadApp+0x2e28eb

    05 02ceefb0 010cc804 MyBadApp+0xf960a

    06 02ceefc0 0112108e MyBadApp+0x28c804

    07 02ceeff8 72422b61 MyBadApp+0x2e108e

    08 02cef034 72427e27 Microsoft_Xbox!DllGetClassObject+0x61352

    09 02cef040 76095c3e Microsoft_Xbox!DllGetClassObject+0x66618

    0a 02cef060 7610f497 rpcrt4!Invoke+0x2a

    0b 02cef6ec 75c241f8 rpcrt4!NdrStubCall2+0x33c

    0c 02cef734 75c1f58a combase!CStdStubBuffer_Invoke+0xc1

    0d 02cef7c0 75b24617 combase!SyncStubInvoke+0x144

    0e (Inline) -------- combase!StubInvoke+0x9a

    0f 02cef8e8 75b97d8d combase!CCtxComChnl::ContextInvoke+0x222

    10 02cef90c 75c24cc9 combase!DefaultInvokeInApartment+0x30

    11 (Inline) -------- combase!ASTAInvokeInApartment+0x35

    12 02cef9b4 75c1fdc7 combase!AppInvoke+0x5ae

    13 02cefb00 75c24c71 combase!ComInvokeWithLockAndIPID+0x5ed

    14 02cefb20 75b93118 combase!ComInvoke+0x153

    15 02cefb30 75b97b11 combase!ThreadDispatch+0x23

    16 02cefb44 75be53b5 combase!CComApartment::ASTAHandleMessage+0xe6

    17 02cefb68 75ba8f22 combase!ASTAWaitContext::DispatchCallsOnExitNonBlockingProcessEventsIfAppropriate+0x9e

    18 02cefb8c 75b5917e combase!ASTAWaitContext::~ASTAWaitContext+0x1a9

    19 02cefb98 74acb13d combase!CoEndProcessEvents+0x37

    1a 02cefbf4 00f733d7 windows_ui!Windows::UI::Core::CDispatcher::ProcessEvents+0x29ac1

    1b 02cefc64 00f77f46 MyBadApp+0x1333d7

    1c 02cefc94 74f6f45e MyBadApp+0x137f46

    1d 02cefca0 74f6f322 twinapi_appcore!Windows::ApplicationModel::Core::CoreApplicationView::Run+0x27

    1e 02cefcc0 74b1008a twinapi_appcore!<lambda_f0454c86bc54370cf843d844d6c13e00>::operator()+0xb2

    1f 02cefd44 75f4a534 SHCore!_WrapperThreadProc+0xe2

    20 02cefd50 77dd8f8b kernel32!BaseThreadInitThunk+0xe

    21 02cefd94 77dd8f61 ntdll!__RtlUserThreadStart+0x20

    22 02cefda4 00000000 ntdll!_RtlUserThreadStart+0x1b

     

    Simple so far...

    Using Visual Studio or the Debugging Tools for Windows is relatively simple when the Exception Record is associated with the call stack of the issue. This is not however always the case. It depends on what side of the projection layer the issue occurred. If the error (exception) was not handled on the language side, the exception is marshaled (projected) to the WinRT side for its exception handling. When this occurs, it starts getting very, very tricky indeed, to see what stack caused the issue...

     

    Language Exceptions - Error Code 0xC000027B

    In the initial design of the WinRT API, the projection of errors was done though a call to the RoOriginateError function. This function takes a HRESULT and HSTRING. Of note, there is no call stack captured. The limits of the RoOriginateError function were recognized and a new (associated) function was created for Windows 8.1.

     

    The RoOriginateLanguageException function takes a HRESULT, HSTRING and a marshalable interface pointer. When RoOriginateLanguageException in called, the current call stack is captured and is passed as part of the error.

     

    The purpose of RoOriginateLanguageException is to marshal the interface pointer so that additional language information is available on the other side of the projection layer. This behavior is achieved by using a specific exception code. Instead of the using the (user defined) HRESULT, a value of 0xC000027B is used. This error code indicates to the receiver that there is data to unmarshal. The data includes the HRESULT and HSTRING, and also the Interface pointer.

     

    The important point to understand here is that all async exceptions raised in Windows 8.1 Windows Store apps now result in a 0xC000027B error code in the Exception Record, not the error code passed by the caller.

     

    Debugging Language Exceptions

    The Exception Record of a Language Exception contains (of note) the exception code (0xC000027B) and two parameters.

     

    0:004> .exr -1

    ExceptionAddress: 73034fec (Windows_UI_Xaml!DirectUI::ErrorHelper::ProcessUnhandledError+0x000000b8)

       ExceptionCode: c000027b

      ExceptionFlags: 00000001

    NumberParameters: 2

       Parameter[0]: 0bdaf240

       Parameter[1]: 00000001

     

    0:006> !error c000027b

    Error code: (NTSTATUS) 0xc000027b (3221226107) - An application-internal exception has occurred.

     

    The first parameter is the address of a pointer array (of unmarshalled data). The second parameter is the count of pointers in the pointer array.

     

    So why is the need for a count?  Since applications have multiple threads, it is possible for multiple threads to call RoOriginateLanguageException simultaneously. Equally, there can be a cyclic nature to the experience, where exceptions are caught and then re-thrown. Since WinRT processes the errors asynchronously, multiple errors exist regularly. The first exception in the array should be the focus of the investigation.

     

    [Tip] Even though Microsoft publishes the private symbols for combase.dll (allowing you to view the local variables of combase!RoFailFast* functions), these locals regularly resolve to invalid addresses due to register reuse and other code flow optimizations. The pointer array in Parameter[0] is the correct place to get the address of the language exception pointer array.

     

    Original Error Code

    The first step when debugging a Language Exception is to determine the actual error code of the caller, instead of the 0xC000027B error code.

     

    Casting an address to a pointer array of a specific type in Visual Studio is, put simply, too difficult to undertake. The easiest option is to use the Debugging Tools for Windows. Even though these tools are all command-line driven and use an obscure syntax, it is relatively easy to follow the following commands to get to the important information.

     

    If not done already, set your symbol path to the Microsoft Public Symbol server:

     

    0:004> .sympath SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols

    Symbol search path is: SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols

    Expanded Symbol search path is: srv*c:\Symbols*http://msdl.microsoft.com/download/symbols

    ************* Symbol Path validation summary **************

    Response                         Time (ms)     Location

    Deferred                                       SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols

     

    Force the load of the symbols using the .reload /f command:

     

    0:004> .reload /f

    ...

     

    The next step is to display the pointer array as the original structure type. First, we need to know what structure to cast the pointer array to. Using the Parameter[0] value from .exr -1, we will generate a dt command that will display the header of the first record. We use Parameter[0] as the address in this command.

     

    dt <Parameter[0]> combase!_STOWED_EXCEPTION_INFORMATION_HEADER*

     

    Here’s an example:

     

    0:004> .exr -1

    ExceptionAddress: 73034fec (Windows_UI_Xaml!DirectUI::ErrorHelper::ProcessUnhandledError+0x000000b8)

       ExceptionCode: c000027b

      ExceptionFlags: 00000001

    NumberParameters: 2

       Parameter[0]: 0180cf90

       Parameter[1]: 00000003

     

    0:004> dt 0180cf90 combase!_STOWED_EXCEPTION_INFORMATION_HEADER*

    0x070884a4

       +0x000 Size             : 0x20

       +0x004 Signature        : 0x53453031

     

    The value of the Signature member (0x53453031) is converted to a string using .formats <value>.

     

    0:006> .formats 0x53453031

    Evaluate expression:

      Hex:     53453031

      Decimal: 1397043249

      Octal:   12321230061

      Binary:  01010011 01000101 00110000 00110001

      Chars:   SE01

      Time:    Wed Apr 09 04:34:09 2014

      Float:   low 8.46917e+011 high 0

      Double:  6.90231e-315

     

    The chars “SE01” map to a structure name of combase!_STOWED_EXCEPTION_INFORMATION_V1. It can be assumed that v2 uses a signature of “SE02” and a structure name of combase!_STOWED_EXCEPTION_INFORMATION_V2, and so on…

     

    Now we know the type, we can again use the values from .exr -1 to generate a dt command that will display each record. We use the Parameter[0] as the address, and Parameter[1] as the count in the command. We add an “*” to the end of the type as this is an array of pointers to the type, not structures packed next to each other.

     

    In this example, there are 3 pointers, so 3 records are displayed:

     

    dt -a<Parameter[1]> <Parameter[0]> combase!_STOWED_EXCEPTION_INFORMATION_V1*

     

    Note, there is no space between the -a and <Parameter[1]>.

     

    0:004> .exr -1

    ExceptionAddress: 73034fec (Windows_UI_Xaml!DirectUI::ErrorHelper::ProcessUnhandledError+0x000000b8)

       ExceptionCode: c000027b

      ExceptionFlags: 00000001

    NumberParameters: 2

       Parameter[0]: 0180cf90

       Parameter[1]: 00000003

     

    0:004> dt -a3 0180cf90 combase!_STOWED_EXCEPTION_INFORMATION_V1*

    [0] @ 0180cf90

    ---------------------------------------------

    0x070884a4

       +0x000 Header           : _STOWED_EXCEPTION_INFORMATION_HEADER

       +0x008 ResultCode       : 80131500

       +0x00c ExceptionForm    : 0y01

       +0x00c ThreadId         : 0y000000000000000000010100111100 (0x53c)

       +0x010 ExceptionAddress : 0x7721ea23 Void

       +0x014 StackTraceWordSize : 4

       +0x018 StackTraceWords  : 5

       +0x01c StackTrace       : 0x06f48418 Void

       +0x010 ErrorText        : 0x7721ea23  "?????"

     

    [1] @ 0180cf94

    ---------------------------------------------

    0x071ca274

       +0x000 Header           : _STOWED_EXCEPTION_INFORMATION_HEADER

       +0x008 ResultCode       : 80131500

       +0x00c ExceptionForm    : 0y01

       +0x00c ThreadId         : 0y000000000000000000010100111100 (0x53c)

       +0x010 ExceptionAddress : (null)

       +0x014 StackTraceWordSize : 4

       +0x018 StackTraceWords  : 0x19

       +0x01c StackTrace       : 0x071c926c Void

       +0x010 ErrorText        : (null)

     

    [2] @ 0180cf98

    ---------------------------------------------

    0x071c922c

       +0x000 Header           : _STOWED_EXCEPTION_INFORMATION_HEADER

       +0x008 ResultCode       : 80131534

       +0x00c ExceptionForm    : 0y01

       +0x00c ThreadId         : 0y000000000000000000010100111100 (0x53c)

       +0x010 ExceptionAddress : (null)

       +0x014 StackTraceWordSize : 4

       +0x018 StackTraceWords  : 9

       +0x01c StackTrace       : 0x071c8224 Void

       +0x010 ErrorText        : (null)

     

    The ResultCode member is 80131500 in the first two records, and 80131534 in the third record. A quick use of the !error <code> command looks up the descriptions:

     

    0:007> !error 80131500

    Error code: (HRESULT) 0x80131500 (2148734208) - <Unable to get error code text>

     

    0:007> !error 80131534

    Error code: (HRESULT) 0x80131534 (2148734260) - <Unable to get error code text>

     

    In this case, both aren‘t well-known error codes. This is common as API specific error codes aren’t in the OS error lookup routines.

     

    Here are some examples of known error codes, found by looking at a random selection of dumps. Some are quite common (80004003, 80004005 and 80070057) while others are quite rare:

     

    0:004> !error 80004003

    Error code: (HRESULT) 0x80004003 (2147500035) - Invalid pointer

     

    0:004> !error 80004005

    Error code: (HRESULT) 0x80004005 (2147500037) - Unspecified error

     

    0:005> !error 8000ffff

    Error code: (HRESULT) 0x8000ffff (2147549183) - Catastrophic failure

     

    0:004> !error 80070057

    Error code: (HRESULT) 0x80070057 (2147942487) - The parameter is incorrect.

     

    0:006> !error 80073db8

    Error code: (HRESULT) 0x80073db8 (2147958200) - Loading the state store failed.

     

    0:005> !error 800f1000

    Error code: (HRESULT) 0x800f1000 (2148470784) - No installed components were detected.

     

    0:006> !error 88985004

    Error code: (HRESULT) 0x88985004 (2291683332) - A font file exists but could not be opened due to access denied, sharing violation, or similar error.

     

    Original Call Stack

    Regardless of whether the error code is known or unknown, it is useful to determine the location of the issue by viewing the call stack.

     

    Symbol Pointers

    If the ExceptionForm member has a value of 0y01, the structure’s union represents a call stack.

     

    Unlike call stacks associated with threads, where the symbol pointers are placed throughout the stack next to local variables, these symbols pointers are packed tightly at the address specified in the StackTrace member. The dpS command is used to display the call stack.

    • It is important to include a limit (L) as the call stack is regularly longer than the default 10 rows displayed by dpS. The limit’s value is in the StackTraceWords member.
    • Note that capital S is used (dps vs dpS) because we want to omit the first column normally displayed by dps; the location of the symbol pointer is irrelevant.
    • If you aren‘t using the same bitness debugger as the target’s bitness, use ddS for StackTraceWordSize = 4, and dqS for StackTraceWordSize = 8.

     

    0:004> dt -a3 0180cf90 combase!_STOWED_EXCEPTION_INFORMATION_V1*

    [0] @ 0180cf90

    ---------------------------------------------

    0x070884a4

       +0x000 Header           : _STOWED_EXCEPTION_INFORMATION_HEADER

       +0x008 ResultCode       : 80131500

       +0x00c ExceptionForm    : 0y01

       +0x00c ThreadId         : 0y000000000000000000010100111100 (0x53c)

       +0x010 ExceptionAddress : 0x7721ea23 Void

       +0x014 StackTraceWordSize : 4

       +0x018 StackTraceWords  : 5

       +0x01c StackTrace       : 0x06f48418 Void

       +0x010 ErrorText        : 0x7721ea23  "?????"

    ...

     

    0:007> dpS 0x06f48418 L5

    7723f217 combase!RoOriginateLanguageException+0x3b

    72e29bfd clr!SetupErrorInfo+0x1e1

    72ef27e1 clr!MarshalNative::GetHRForException_WinRT+0x7d

    71981170 Windows_UI_Xaml_ni+0x291170

    72b02a36 clr!COMToCLRDispatchHelper+0x28

     

    Unicode String Pointer

    If the ExceptionForm member has a value of 0y10, the structure’s union represents an error message.

     

    The call stack is (hopefully) contained within the Unicode string pointed at by the ErrorText member. As the text is defined by the caller, the existence of a call stack text isn’t guaranteed.

     

    0:005> dt –a1 13f117e0 combase!_STOWED_EXCEPTION_INFORMATION_V1*

    [0] @ 13f117e0

    ---------------------------------------------

    0x0471f3c0

       +0x000 Header           : _STOWED_EXCEPTION_INFORMATION_HEADER

       +0x008 ResultCode       : 8000ffff

       +0x00c ExceptionForm    : 0y10

       +0x00c ThreadId         : 0y000000000000000000010101110100 (0x574)

       +0x010 ExceptionAddress : 0x0de38f7c Void

       +0x014 StackTraceWordSize : 0

       +0x018 StackTraceWords  : 0

       +0x01c StackTrace       : (null)

       +0x010 ErrorText        : 0x0de38f7c  "System.Exception..   at Windows.UI.Xaml.VisualStateManager.GoToState(Control control, String stateName, Boolean useTransitions)..   at MyBadApp.Common.LayoutAwarePage.InvalidateVisualState()..   at MyBadApp.Common.LayoutAwarePage.WindowSizeChanged(Object sender, WindowSizeChangedEventArgs e)"

     

    CLR - Last Exception Object

    Sometimes, the call stack retrieved from the record isn’t that useful. It may just be the call stack leading up to RoOriginateLanguageException function call, or it just might not relate to any of the code that the application author has written. In these cases, the CLR provides one more chance to understand the issue.

     

    When the CLR throws an exception on a managed thread, the address of the exception object is kept in an (internal) per-thread variable. This address is what the !sos.pe (print exception) command reads to display the CLR Last Exception of a thread.

     

    Note, if you use the Windows 8.1 SDK version of the Debugging Tools for Windows, SOS will be automatically loaded for you, including the download of any required DLLs. As such, it is highly suggested that you use the Windows 8.1 version.

     

    Example #1

    Looking at this example, we can see that there is a single record with an "Invalid pointer" error.

     

    0:006> .exr -1

    ExceptionAddress: 00007ffb87c46960 (twinapi_appcore!Microsoft::WRL::ComPtr<Windows::ApplicationModel::Core::UnhandledErrorDetectedEventArgs>::{dtor})

       ExceptionCode: c000027b

      ExceptionFlags: 00000001

    NumberParameters: 2

       Parameter[0]: 0000003fbc80c8a0

       Parameter[1]: 0000000000000001

     

    0:006> dt -a1 0000003fbc80c8a0 combase!_STOWED_EXCEPTION_INFORMATION_V1*

    [0] @ 0000003f`bc80c8a0

    ---------------------------------------------

    0x0000003f`bfc3c5b8

       +0x000 Header           : _STOWED_EXCEPTION_INFORMATION_HEADER

       +0x008 ResultCode       : 80004003

       +0x00c ExceptionForm    : 0y01

       +0x00c ThreadId         : 0y000000000000000000100111001000 (0x9c8)

       +0x010 ExceptionAddress : 0x00007ffb`981e1f1c Void

       +0x018 StackTraceWordSize : 8

       +0x01c StackTraceWords  : 0x18

       +0x020 StackTrace       : 0x0000003f`bd7ac9c0 Void

       +0x010 ErrorText        : 0x00007ffb`981e1f1c  "???"

     

    0:006> !error 80004003

    Error code: (HRESULT) 0x80004003 (2147500035) - Invalid pointer

     

    This is a common call stack. A CLR exception is being marshaling to the unhandled error reporting sub-system of WinRT.

     

    0:006> dpS 0x0000003f`bd7ac9c0 L18

    00007ffb`98238d27 combase!RoOriginateLanguageException+0x57

    00007ffb`71e0f926 mscorlib_ni!DomainNeutralILStubClass.IL_STUB_PInvoke(Int32, System.String, IntPtr)+0xe6

    00007ffb`71ff7084 mscorlib_ni!System.Runtime.InteropServices.WindowsRuntime.WindowsRuntimeMarshal.RoOriginateLanguageException(Int32, System.String, IntPtr)+0x44

    00007ffb`71ff6b8d mscorlib_ni!System.Runtime.InteropServices.WindowsRuntime.WindowsRuntimeMarshal.ReportUnhandledError(System.Exception)+0x12d

    00007ffb`885042f4 System_Runtime_WindowsRuntime_ni!System.Threading.WinRTSynchronizationContext+Invoker.InvokeCore()+0x73e04

    00007ffb`7ee6b915 clr!ExceptionTracker::CallHandler+0xc5

    00007ffb`7ee6b80b clr!ExceptionTracker::CallCatchHandler+0x7f

    00007ffb`7ee6b728 clr!ProcessCLRException+0x2e6

    00007ffb`9a30a7fd ntdll!RtlpExecuteHandlerForUnwind+0xd

    00007ffb`9a2b36ba ntdll!RtlUnwindEx+0x366

    00007ffb`7ee6d1c0 clr!ClrUnwindEx+0x40

    00007ffb`7ee6d174 clr!ProcessCLRException+0x2b2

    00007ffb`9a30a77d ntdll!RtlpExecuteHandlerForException+0xd

    00007ffb`9a2b29fb ntdll!RtlDispatchException+0x19b

    00007ffb`9a2b2668 ntdll!RtlRaiseException+0xf0

    00007ffb`976c8384 KERNELBASE!RaiseException+0x68

     

    There is a CLR Last Exception object and the exception code of it matches the record’s code:

     

    0:006> !sos.pe

    Exception object: 0000003fa2be4830

    Exception type:   System.NullReferenceException

    Message:          Object reference not set to an instance of an object.

    InnerException:   <none>

    StackTrace (generated):

        SP               IP               Function

        0000003FBC80D190 00007FFB1F72FC18 MyBadApp!MyBadApp.Utilities.Authentication.GetAliasFromSecurityToken()+0x18

        0000003FBC80D1D0 00007FFB1F72FAC2 MyBadApp! MyBadApp.MainPage.MainPage_AuthenticateUserCompleted(System.Object, System.EventArgs)+0x82

        0000003FBC80D210 00007FFB1F72F5E5 MyBadApp! MyBadApp.MainPage+<AuthenticateUser_Async>d__0.MoveNext()+0x305

        0000003FBC80EE60 00007FFB724F0B31 mscorlib_ni!System.Runtime.CompilerServices.AsyncMethodBuilderCore.<ThrowAsync>b__4(System.Object)+0x4d61d1

        0000003FBC80EE90 00007FFB88490523 System_Runtime_WindowsRuntime_ni!System.Threading.WinRTSynchronizationContext+Invoker.InvokeCore()+0x33

     

    StackTraceString: <none>

    HResult: 80004003

     

    In this case, you can surmise that the System.NullReferenceException exception was thrown within the MyBadApp!MyBadApp.Utilities.Authentication.GetAliasFromSecurityToken() function, and that it was unhandled.

     

    Extraction of the CLR Last Exception object can also sometimes be done in Visual Studio. When you do a Debug with Managed Only on the dump file, the Locals window sometimes contains a pseudo variable called $exception that represents the exception.

     

    VisualStudio-Locals

     

    The Text Visualizer of the StackTrace member allows you to see the call stack.

     

    TextVisualizer

     

    Example #2

    Looking at another example, we can see that again there is a single record, this time with an "Unspecified error" exception code.

     

    0:004> .exr -1

    ExceptionAddress: 73034fec (Windows_UI_Xaml!DirectUI::ErrorHelper::ProcessUnhandledError+0x000000b8)

       ExceptionCode: c000027b

      ExceptionFlags: 00000001

    NumberParameters: 2

       Parameter[0]: 0bdaf240

       Parameter[1]: 00000001

     

    0:004> dt -a1 0bdaf240 combase!PSTOWED_EXCEPTION_INFORMATION_V1

    [0] @ 0bdaf240

    ---------------------------------------------

    0x0af64034

       +0x000 Header           : _STOWED_EXCEPTION_INFORMATION_HEADER

       +0x008 ResultCode       : 80004005

       +0x00c ExceptionForm    : 0y01

       +0x00c ThreadId         : 0y000000000000000000010010011110 (0x49e)

       +0x010 ExceptionAddress : (null)

       +0x014 StackTraceWordSize : 4

       +0x018 StackTraceWords  : 5

       +0x01c StackTrace       : 0x0af6302c Void

       +0x010 ErrorText        : (null)

     

    0:006> !error 80004005

    Error code: (HRESULT) 0x80004005 (2147500037) - Unspecified error

     

    The call stack of the record suggests that this is associated with GetNavigationState:

     

    0:004> dpS 0x0af6302c L5

    72ec4e7d Windows_UI_Xaml!DirectUI::NavigationHistory::WritePageStackEntryToString+0x1f7fde

    72ec4ef0 Windows_UI_Xaml!DirectUI::NavigationHistory::GetNavigationState+0x1f7ddf

    72ccd0fa Windows_UI_Xaml!DirectUI::Frame::GetNavigationStateImpl+0x3a

    72cccced Windows_UI_Xaml!DirectUI::FrameGenerated::GetNavigationState+0x2f

    737d00eb Windows_UI_Xaml_ni+0x2400eb

     

    But the CLR Last Exception object doesn’t have the same exception code as the record:

     

    0:004> !sos.pe

    Exception object: 02cb3cb8

    Exception type:   <Unknown>

    Message:          <Invalid Object>

    InnerException:   System.Runtime.InteropServices.COMException, Use !PrintException e2a09bc6 to see more.

    StackTrace (generated):

        SP       IP       Function

        052DF6C0 07141094 MyBadApp!UNKNOWN+0x544

        052DF8A4 73E0D17A mscorlib_ni!System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(System.Threading.Tasks.Task)+0x5e

        052DF8B4 73E0D115 mscorlib_ni!System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)+0x35

        052DF8C0 071409DB MyBadApp!UNKNOWN+0xb3

        052DF8CC 7458458F mscorlib_ni!System.Runtime.CompilerServices.AsyncMethodBuilderCore.<ThrowAsync>b__4(System.Object)+0x33

        052DF8D4 6F9EF994 System_Runtime_WindowsRuntime_ni!System.Threading.WinRTSynchronizationContext+Invoker.InvokeCore()+0x24

     

    StackTraceString: <none>

    HResult: 80131500

     

    0:006> !error 80131500

    Error code: (HRESULT) 0x80131500 (2148734208) - <Unable to get error code text>

     

    It does however have a nested CLR Exception object that does have the same exception code as the record. It too has a call stack that indicates GetNavigationState is having an issue.

     

    0:004> !PrintException /d 02caf968

    Exception object: 02caf968

    Exception type:   System.Runtime.InteropServices.COMException

    Message:          <Invalid Object>

    InnerException:   <none>

    StackTrace (generated):

        SP       IP       Function

        00000000 00000001 Windows_UI_Xaml_ni!Windows.UI.Xaml.Controls.Frame.GetNavigationState()+0x2

        052DF778 071411B5 MyBadApp!UNKNOWN+0x1d

        052DF788 07140BF3 MyBadApp!UNKNOWN+0xa3

     

    StackTraceString: <none>

    HResult: 80004005

     

    Summary

    The asynchronous and projected nature of Windows Store applications makes them significantly harder to debug than desktop applications. Knowing the error code and call stack is just the first step in understanding the root cause of a crash in a Store application. Hopefully this blog post has made those first steps easier to undertake, and that those first steps have pointed you in the right direction.

     

    The solutions to some of the more common issues have been talked about on episodes of Channel 9 Defrag Tools. These episodes show the code changes required to avoid the hang or crash:

  • Ntdebugging Blog

    NTFS Misreports Free Space (Part 3)

    • 0 Comments

    It’s been a while since my last post on this topic, and I wanted to take some time to update everyone on a cool new feature in Windows Server 2012 R2 and Windows 8.1.  Today we declare part 1 and part 2 of this blog as obsolete - at least for Windows Server 2012 R2 and Windows 8.1 users.

     

    The latest fsutil.exe now allows for the creation of an allocation report which summarizes how all of your disk space is being used by NTFS.  This new fsutil.exe functionality is implemented though some new file system controls that only exist on Windows Server 2012 R2 and Windows 8.1, so the binary is not portable to previous versions of Windows.

     

    USAGE: fsutil volume allocationreport X:

    X: is the drive letter of an NTFS volume on your system.

     

    Allocation Report

    The allocation report gives a summary of total reserved, free, and allocated clusters.  Reserved clusters are clusters that NTFS reserves just in case it needs to allocate space for a critical operation (like expanding a compressed file or extending the $MFT).  If you’re experiencing insufficient disk space errors on a volume that has plenty of free space, the issue could be caused by opening many compressed NTFS files at the same time.  Please refer to Understanding Ntfs Compression for more information on how to troubleshoot this.

     

    Allocation report:
    Total clusters              : 244100351 (999835037696 bytes)
    Free clusters               : 232507563 (952350978048 bytes)
    Reserved clusters           : 18352 (75169792 bytes)
    Total allocated             : 47484059648 bytes

     

    System Files

    If you suspect that there’s something you can’t see that’s taking up disk space, check the System Files section to see how much disk space is used by the system.  In this example, I have 884,703,232 bytes in use by NTFS metadata, and the breakdown of each system file’s usage is outlined below.  For details on each system file type, refer to http://blogs.technet.com/b/askcore/archive/2009/12/30/ntfs-metafiles.aspx.

     

    System files                : Count: 29. Total allocated: 884703232 bytes.
        $Mft                    : File ID 0x0001000000000000. Total allocated: 238063616 bytes.
        $MftMirr                : File ID 0x0001000000000001. Total allocated: 4096 bytes.
        $LogFile                : File ID 0x0002000000000002. Total allocated: 67108864 bytes.
        $Volume                 : File ID 0x0003000000000003. Total allocated: 0 bytes.
        $AttrDef                : File ID 0x0004000000000004. Total allocated: 4096 bytes.
        Root folder             : File ID 0x0005000000000005. Total allocated: 8192 bytes.
        $Bitmap                 : File ID 0x0006000000000006. Total allocated: 30515200 bytes.
        $Boot                   : File ID 0x0007000000000007. Total allocated: 8192 bytes.
        $BadClus                : File ID 0x0008000000000008. Total allocated: 0 bytes.
        $Secure                 : File ID 0x0009000000000009. Total allocated: 1855488 bytes.
        $UpCase                 : File ID 0x000a00000000000a. Total allocated: 131072 bytes.
        $Extend                 : File ID 0x000b00000000000b. Total allocated: 0 bytes.
        $ObjId                  : File ID 0x0001000000000019. Total allocated: 24576 bytes.
        $Quota                  : File ID 0x0001000000000018. Total allocated: 0 bytes.
        $Reparse                : File ID 0x000100000000001a. Total allocated: 786432 bytes.
        $UsnJrnl                : File ID 0x0002000000012f66. Total allocated: 34144256 bytes.
        $RmMetadata             : File ID 0x000100000000001b. Total allocated: 0 bytes.
        $Repair                 : File ID 0x000100000000001c. Total allocated: 94371840 bytes.
        $Txf                    : File ID 0x000100000000001e. Total allocated: 4096 bytes.
        $TxfLog                 : File ID 0x000100000000001d. Total allocated: 4096 bytes.
        $Tops                   : File ID 0x000100000000001f. Total allocated: 396623872 bytes.
        $TxfLog.blf             : File ID 0x0001000000000020. Total allocated: 65536 bytes.
        Other system files      : Count: 4. Total allocated: 0 bytes.
        Other system files under $Txf folder:
            Count               : 1
            Total allocated     : 8192 bytes.
        Other system files under $TxfLog folder:
            Count               : 2
            Total allocated     : 20971520 bytes.

     

    System Volume Information 

    If the usage in System Volume Information is higher than expected, the issue is likely to be caused by storage of diff areas for VSS volume shadow copies.  Deleting the volume shadow copies with VSSAdmin or Diskshadow will return the free space.  System Volume Information is also the home of the chunk store used by NTFS deduplication.

     

    System Volume Information   : Total allocated: 5366915072 bytes.
        Files                   : Count: 18. Total allocated: 5366882304 bytes.
        Folders                 : Count: 7. Total allocated: 32768 bytes.

     

    User Folders

    It costs something to maintain the folder structure of a volume, and the user folders section summarizes the overall cost.  Within this section is also a summary of how many NTFS compressed folders exist.  As you can see below, I have 145 folders with a compressed attribute flag but the total number of compressed bytes is zero.  I puzzled over the idea of zero compressed bytes until I discovered that this measurement is of how many compressed bytes exist in the context of folder indexes, and indexes are never compressed.  Only user data streams are compressed natively by NTFS.

     

    User folders                : Count: 23101. Total allocated: 77889536 bytes.
        Default streams         : 4689
            Allocated           : 4689
            Total allocated     : 77885440 bytes.
        Named streams           : 7
            Allocated           : 0
            Total allocated     : 0 bytes.
        Local metadata streams  : 95566
            Allocated           : 1
            Total allocated     : 4096 bytes.
    Within these folders there are:
        Compressed              : 145
            Total allocated     : 0 bytes
            Total size          : 0 bytes.
            Savings             : 0.00 %
        Sparse                  : 0
            Total allocated     : 0 bytes
            Total size          : 0 bytes.
            Savings             : 0.00 %
        Encrypted               : 0
            Total allocated     : 0 bytes

        With named streams      : 7
            Compressed          : 0
            Sparse              : 0
            Encrypted           : 0
        With no allocation      : 18412

     

    User Files

    In the user files section, we have a total of all user files and the compression statistics to show how much space is being saved by native NTFS compression.  There is also a nice summary of alternate named stream usage (ANS).  ANS allocations do not show up in DIR or Explorer, so this is a quick and easy way to see exactly how your named streams are affecting overall disk usage.  On my volume, I had 3115 files with named streams and zero bytes were allocated.  This seems to be another paradox, but there’s a logical explanation for what’s happening.  If a file has a named stream and the stream size is small enough for it to be resident, then the stream lives in the file’s MFT record (which is accounted in this report as part of $Mft                    : File ID 0x0001000000000000. Total allocated: 238063616 bytes.).

     

    User files                  : Count: 94128. Total allocated: 41154551808 bytes.
        Default streams         : 94128
            Allocated           : 72123
            Total allocated     : 41087229952 bytes.
        Named streams           : 4637
            Allocated           : 4562
            Total allocated     : 66740224 bytes.
        Local metadata streams  : 333248
            Allocated           : 142
            Total allocated     : 581632 bytes.
    Within these files there are:
        Compressed              : 2006
            Total allocated     : 374972416 bytes
            Total size          : 816416626 bytes.
            Savings             : 54.07 %
        Sparse                  : 1519
            Total allocated     : 1572864 bytes
            Total size          : 273374082 bytes.
            Savings             : 99.42 %
        Encrypted               : 0
            Total allocated     : 0 bytes

     

        With named streams      : 3115
            Compressed          : 0
            Sparse              : 0
            Encrypted           : 0
        With no allocation      : 20485

     

    As you can see, this new functionality in fsutil makes it easier and faster to determine what is using space on an NTFS volume.

  • Ntdebugging Blog

    Understanding ARM Assembly Part 2

    • 0 Comments

    My name is Marion Cole, and I am a Sr. Escalation Engineer in Microsoft Platforms Serviceability group.  This is Part 2 of my series of articles about ARM assembly.  In part 1 we talked about the processor that is supported.  Here we are going to talk about how Windows utilizes that ARM processor.

     

    As we discussed in part 1 Windows runs on the ARMV7-A with NEON.  We discussed the CPSR register in part 1.  There are a few bits that are important in the CPSR.  The first one is the Endian State bit:

    31

    30

    29

    28

    27

    26

    25

    24

    23

    22

    21

    20

    19

    18

    17

    16

    15

    14

    13

    12

    11

    10

    9

    8

    7

    6

    5

    4

    3

    2

    1

    0

    N

    Z

    C

    V

    Q

    IT

    J

    Reserved

    GE

    IT

    E

    A

    I

    F

    T

    M

     

    Bit 9 (the E bit) indicates the EndianState.  This bit should always be a 0 because Windows only runs in Little-Endian state.  So if you get a dump, and see the CPSR bit 9 is set then you have a problem.  Here is an example from the debugger:

    1: kd> r

    r0=00000001  r1=00000001  r2=00000000  r3=00000000  r4=e1074044  r5=c555b580

    r6=00000001  r7=e104ca39  r8=00000001  r9=00000000 r10=e9bf06c7 r11=d5f1ea08

    r12=e16b213c  sp=d5f1e9b0  lr=e0f0fe2f  pc=e0fdebd0 psr=00000133 ----- Thumb

    nt!DbgBreakPointWithStatus:

    e0fdebd0 defe     __debugbreak

     

    1: kd> .formats 00000133

    Evaluate expression:

      Hex:     00000133

      Decimal: 307

      Octal:   00000000463

      Binary:  00000000 00000000 00000001 00110011  ßBit 9 is 0.  Note first bit is Bit 0. 

      Chars:   ...3

      Time:    Wed Dec 31 18:05:07 1969

      Float:   low 4.30199e-043 high 0

      Double:  1.51678e-321

     

    So how could Bit 9 ever be a 1?  The SETEND instruction in the ARM ISA allows even user mode code to change the current endianness, doing so will be dangerous for an application and is discouraged.  If an exception is generated while in big-endian mode the behavior is unpredictable, but may lead to an application fault (user mode) or bugcheck (kernel mode).

     

    The next bit we are going to discuss is bit 5, the Thumb bit (the T bit).  This should be a 1 if executing Thumb instructions.  So let’s discuss the different instruction sets the ARM processor has.

     

    ARMv7 has four different ISA's for programming. 

    • ARM - basic ARM instruction set including conditional execution.
    • Thumb - This mode uses a 16 bit instruction encoding to reduce code footprint.  It has limitations with respect to register access and some system instructions aren't implemented for Thumb.
    • Thumb2 - This extension of the Thumb instruction set adds 32 bit opcode encodings and adds enough facilities to author an entire OS.  Support for Thumb2 is guaranteed in the ARMv7 architecture revision.
    • Jazelle - Java code interpretation.
    • ThumbEE - a limited version of Thumb2 intended as a code generation target for JIT scenarios.

     

    Windows requires Thumb2 support.  The advantage of using Thumb2 is that the combination of 16 and 32 bit opcodes along with some other ISA improvements allows for saving 20-30% code footprint at a 1-2% performance loss.  In addition the cache hit rate is improved due to increased density of the code.

     

    CPSR Bit 5 should always be 1 as Windows only runs in Thumb2 mode.  Also note that this bit is combined with bit 24, the Java state bit (the J bit).  Bit 24 should always be 0 when running Windows.

     

    The next bits to discuss are the CPU Mode bits 4-0 (M).  Windows only runs in two modes.  They are User Mode (10000) and Supervisor Mode (10011).  If Bits 4-0 are anything other than the indicated values given an exception will be raised.  Kernel will run in Supervisor Mode, and applications will run in User Mode.

     

    That brings up another point.  How does the processor switch between Supervisor Mode and User Mode?  It is called the SVC call.  In the x86 processor this was done via SYSENTER/SYSEXIT.  In x64 processor this was done via SYSCALL/SYSRET.  In ARM this is done via the SVC or Supervisor Call.  This call is made to have the kernel provide a service.  When invoked in ntdll.dll the service number is in r12.  Here is an example:

    1: kd> u ntdll!ZwQueryVolumeInformationFile

    771e8674    f04f0c8d    mov   r12,#0x8D
    771e8678    df01        svc   #1
    771e867a    4770        bx    lr

     

    When SVC is called the previous CPSR register is saved in the SPSR register (the Saved Program Status Register), and pc register is saved in lr register (the Link Register).  The processor then changes to kernel mode (0x13) with interrupts disabled.  The lr and SPSR values are used to generate a return from the SVC call.  When an exception is taken the stack is untouched, the previous mode's SP and LR are left alone, new modes SP becomes active, exception address is stored in the new mode's LR, and the previous CPSR is copied into the new mode's SPSR.  When returning from the exception the SPSR is copied back into the CPSR, and it returns to LR.

     

    Data Types

    ARMv7 processors support four data types from 8 bits to 64 bits, but the definitions are different than the ones in Windows.  In Windows 16 bits are defined as a word, on ARM a word is 32 bits.

    Byte

    8 bits

    HalfWord

    16 bits

    Word

    32 bits

    DoubleWord

    64 bits

     

    These can be signed or unsigned.

    • Unsigned 32 bit integer
    • Signed 32 bit integer
    • Unsigned 16 bit integer (zero extended)
    • Signed 16 bit register (sign extended)
    • Unsigned 8 bit integer (zero extended)
    • Signed 8 bit register (sign extended)
    • Two 16 bit integers
    • Four 8 bit integers
    • The upper or lower 32 bits of a 64 bit signed value whose other half is in another register
    • The upper or lower 32 bits of a 64 bit unsigned value whose other half is in another register

     

    Memory Model

    The ARM memory model is much like other architectures that we have supported.  ARM has a "weak ordering" memory model.  This means that two memory operations that occur in program order, may be observed from another processor or DMA controller in any order.  When an instruction stalls because it is waiting for the result of a preceding instruction, the core can continue executing subsequent instructions that do not need to wait for the unmet dependencies.  There are three instructions that allow you to configure memory barriers:

    • ISB - Instruction Synchronization Barrier
    • DMB - Data Memory Barrier
    • DSB - Data Synchronization Barrier

     

    An excellent blog article on this topic with an explanation of these three instructions is available at:

    http://blogs.arm.com/software-enablement/594-memory-access-ordering-part-3-memory-access-ordering-in-the-arm-architecture/

     

    Alignment and Atomicity

    Windows enables the ARM hardware to handle misaligned integer accesses transparently; however, there are still several situations where alignment faults may be generated on misaligned accesses. Follow the rules below:

    • Halfword and word-sized integer loads and stores do NOT need to be aligned (hardware will handle them efficiently and transparently)
    • Floating-point loads and stores SHOULD be aligned (the kernel will handle them transparently, but with significant overhead)
    • Load/store double (LDRD/STRD) and multiple (LDM/STM) operations SHOULD be aligned (the kernel will handle most of them transparently, but with significant overhead)
    • All uncached memory accesses MUST be aligned, even for integer accesses (you will get an alignment fault)

     

    Note that the memcpy() implementation provided by the Windows CRT presumes the copies are to/from cached memory, and thus leverages the hardware’s support for transparently handling misaligned integer reads and writes with little penalty. This means that memcpy() CANNOT be used when the source or destination is uncached memory. Instead, use the separate function _memcpy_strict_align(), which only performs aligned accesses.

     

    There are two types of atomicity supported.  Single-copy and Multi-copy.

     

    Single-copy atomicity

    There are rules around atomicity that are intended to specify the cases where memory access behavior in relation to program order can be guaranteed.  So certain access (aligned word accesses) are guaranteed by the architecture to return sensible results even if other threads are accessing the same memory.  These rules are necessary in order to guarantee that the programmer (and compiler) can rely on correct behavior of memory in the majority of the cases.

     

    Multi-copy atomicity

    These rules are similar, but relate specifically to multi-processing environments in which several observers may be using a particular item in memory.  To be able to guarantee correct behavior you need to be able to assume that memory behaves in a consistent way.

     

    More on Single-Copy and Multi-Copy atomicity in the ARM Architecture Reference Manual available from http://infocenter.arm.com/help/index.jsp.

     

    Common Assembly Instructions

    We are going to cover some common Thumb2 instructions.

    • ldr           r0, [r4]                  (ldrex, ldrh ldrb, ldrd, ldrexd, etc.)

      This is the Load Register instruction.  In the above example r0 is the destination register, and r4 is the base register.  This will take the address that is in r4, go to that memory location and copy the contents of that memory location into r0.

    • str           r2, [r4, #0x08]                    (strex, strh, strexh, strd, etc.)

      This is the Store Register instruction.  In the above example r2 is the source register, and r4 is the base register.  This will take the address in r4 and add 8 to that address.  It will take the value that is in r2, and store it at the address pointed to by r4 plus 8.

    • mov       r1, r4                                      (movs – sets the condition codes)

      This is the Move instruction.  In the above example r1 is the destination register, and r4 is the source register.  It will do the same thing as x86 in that it just copies what is in r4 to r1.  It can optionally updated the condition flags based on the value.

    • adds      r1, r5, #0                              (add)

      This is the Add instruction.  In the above example r1 is the destination register.  This will take the value that is in r5 and add 0 to it.  It will store the result in r1.  Because this has an (s) at the end of add it will update the flags.

    • sub         sp, sp, #0x14                      (subs)

      This is the Subtract instruction.  In the above example sp is the destination.  This will take the value that is in sp, subtract 14h from it, and store the result in sp. Because this does not have an (s) at the end it will not update the flags.

    • push      {r4-r9, r11, lr}

      This is the Push instruction.  It can push multiple registers to the stack in one instruction.  You can separate a full series of register with the beginning register "-" and ending register like seen above.  You can also list them all, and just separate them by ",".  This operates the same as an x86 processor in that it subtracts 4 from the stack pointer for each push.

    • pop        {r4-r9, r11, lr}

      This is the Pop instruction.  It pulls values from the stack back into the registers you list.  The registers work just like the push instruction.  This operates the same as an x86 processor in that it adds 4 to the stack pointer for each pop.

    • b??         |MyApp!main+0x60 (00b81348)|

      This is the Branch instruction.  This is equivalent to the jmp instruction in x86.  However it has several conditional variants such as "beq, bge, and etc.".

    • bx           r3

      This is the Branch and Exchange instruction.  This causes a branch to an address and instruction set specified by a register (r3 here).  This can do a long branch anywhere in the 32-bit address range.

    • bl            |MyApp!Function (00b815c4)|

      This is the Branch with Link instruction.  This calls a subroutine at a PC-relative address.  This will update the lr register.

    • blx          r3

      This is the Branch with Link and Exchange.  This calls a subroutine at an address and instruction set specified by a register (r3 here).  This will do a long branch anywhere in the 32-bit address range, and update the lr register.

    • dmb      

      This is the Data Memory Barrier instruction.  It is a memory barrier that ensures the ordering of observations of memory accesses.

    • cmp       r3, #0

      This is the Compare instruction.  It will subtract 0 from the value in r3, and set the flags accordingly. 

     

    In ARM addressing the base register points to memory being referenced.  The offset can be an immediate or an index register.  The memory stored at the base register`s address plus the offset is accessed.  The base register remains unchanged.  Example:

    Ldr r5,[r9,#0x1c]

     

    This will take the value that is in r9 and add 0x1C to it, go to that memory location, and retrieve the value there and store it in r5.  R9 will remain the same value.

     

    ARM also has some interesting thing about indexing.  They have Pre-Indexed addressing, Offset Addressing, and Post-Indexed Addressing.

     

    Pre-Indexed addressing the value of the base register is first modified by the offset then the memory pointed to by the modified base register is accessed.  Example:

    Str r2,[r4,#0x4]!

     

    The "!" at the end of the instruction is not a mistake.  This is how you tell it is a Pre-Indexed address. 

     

    Offset Addressing.  The value is added to the base register, and that is used as the address for memory access.  If the "!" was not there then this would just be Offset addressing.  Example:

    Str r2,[r4,#0x4]

     

    Post-Index addressing the memory address in the base register is accessed then afterwards the base register is modified by the offset value.  Example:

    Ldr pc,[sp],0x1c

     

    Notice the "!" is missing here.  Also notice the offset is outside the "[ ]".  That is how you can find a Post-Index.

     

    Part 3 of this series will cover Calling Conventions, Prolog/Epilog, and Rebuilding the stack.

Page 22 of 24 (235 items) «2021222324