• Ntdebugging Blog

    x64 Manual Stack Reconstruction and Stack Walking

    • 2 Comments

    My name is Trey Nash and I am an Escalation Engineer on the Core OS team.  My experience is as a software developer, and therefore my blog posts tend to be slanted in the direction of helping developers during the feature development, testing and the support phases.

    In this installment I would like to expand a bit on a previous post of mine called Challenges of Debugging Optimized x64 Code.  In that post I discussed the nuances of the x64 calling convention (thankfully of which there is only one) and how it is used in optimized builds of software.  The calling convention is sometimes referred to as the Application Binary Interface (ABI).  In this post, I would like to discuss the x64 unwind metadata and how you can use it in the debugger to manually walk a stack.

    In some cases, you may have a corrupted stack that the debugger simply cannot effectively walk for you.  This often happens because the debugger walks a stack from the top down (assuming the stack grows upwards as if it were a stack of places on a table), and if the stack is sufficiently trashed then the debugger cannot find its bearing.  In the x86 world, a large percentage of the time, you can spot the stack frames by following the chain of base pointers and then build a crafty stack backtrace command to display the stack at some point in time.  But in the x64 calling convention there is no base pointer.  In fact, once a function’s prolog code has executed the rsp register generally never changes until the epilog code.  To read more about x64 prolog and epilog code conventions, go here.

    Moreover, the syntax for creating a crafty stack backtrace command in the x64 environment is currently undocumented, and I aim to shed some light on that near the end of that blog post. J

    The Example Code

    For this blog post I have used the following example C# code that requires the .NET 4.0 framework and can be easily built from a Visual Studio 2010 command prompt.  You can find the code below:

    using System;
    using System.Numerics;
    using System.Threading;
    using System.Threading.Tasks;
    using System.Collections.Concurrent;

    class EntryPoint
    {
        const int FactorialsToCompute = 2000;

        static void Main() {
            var numbers = new ConcurrentDictionary<BigInteger, BigInteger>(4, FactorialsToCompute);

            // Create a factorial delegate.
            Func<BigInteger, BigInteger> factorial = null;
            factorial = (n) => ( n == 0 ) ? 1 : n * factorial(n-1);

            // Now compute the factorial of the list
            // concurrently.
            Parallel.For( 0,
                          FactorialsToCompute,
                          (i) => {
                            numbers[i] = factorial(i);
                          } );
        }
    }

    The spirit of this code is to concurrently compute the first 2000 factorials and store the results in a dictionary.  This code uses the new Task Parallel Library to distribute this work evenly across the multiple cores on the system.  To compile the example (assuming the code is stored in test.cs), you can execute the following command from a Visual Studio 2010 command prompt:

    csc /r:system.numerics.dll test.cs

    Note:    If you are using a 64bit platform, be sure to use the x64 command prompt shortcut installed by the Visual Studio 2010 installer.

    You can download a free evaluation of Visual Studio 2010
    here.

    x64 Unwind Metadata

    So how does the debugger and functions such as RtlVirtualUnwind know how to walk the x64 stack if it cannot find a base pointer?  The secret is that it uses unwind metadata that is typically baked into the Portable Executable (PE) file at link time. You can inspect this information using the /UNWINDINFO option of the command line tool dumpbin.  For example, I went to the directory on my machine which contains clr.dll (c:\Windows\Microsoft.NET\Framework\v4.0.30319) and dumped the unwind info looking for CLREvent::WaitEx, which I have pasted below:

      00013F20 000DFDB0 000DFE3C 007267D8  ?WaitEx@CLREvent@@QEAAKKW4WaitMode@@PEAUPendingSync@@@Z (public: unsigned long __cdecl CLREvent::WaitEx(unsigned long,enum WaitMode,struct PendingSync *))
        Unwind version: 1
        Unwind flags: UHANDLER
        Size of prologue: 0x20
        Count of codes: 10
        Unwind codes:
          20: SAVE_NONVOL, register=rbp offset=0xB0
          1C: SAVE_NONVOL, register=rbx offset=0xA8
          0F: ALLOC_SMALL, size=0x70
          0B: PUSH_NONVOL, register=r14
          09: PUSH_NONVOL, register=r13
          07: PUSH_NONVOL, register=r12
          05: PUSH_NONVOL, register=rdi
          04: PUSH_NONVOL, register=rsi
        Handler: 0020ADF0 __CxxFrameHandler3
        EH Handler Data: 007B3F54

    I’ll get into what all of this means shortly.

    Note:    The dumpbin.exe functionality is also exposed via the linker.  For example, the command “dumpbin.exe /?” is identical to “link.exe /dump /?”.

     

    Within the debugger, you can find this same information for a particular function using the .fnent command.  To demonstrate, I executed the example code within a windbg instance and broke in at some random point and chose one of the threads to look at which has a stack looking like the following:

      12  Id: f80.7f0 Suspend: 1 Teb: 000007ff`fffa0000 Unfrozen
     # Child-SP          RetAddr           Call Site
    00 00000000`04a51e18 000007fe`fd4e10ac ntdll!NtWaitForSingleObject+0xa
    01 00000000`04a51e20 000007fe`f48bffc7 KERNELBASE!WaitForSingleObjectEx+0x79
    02 00000000`04a51ec0 000007fe`f48bff70 clr!CLREvent::WaitEx+0x170
    03 00000000`04a51f00 000007fe`f48bfe23 clr!CLREvent::WaitEx+0xf8
    04 00000000`04a51f60 000007fe`f48d51d8 clr!CLREvent::WaitEx+0x5e
    05 00000000`04a52000 000007fe`f4995249 clr!SVR::gc_heap::wait_for_gc_done+0x98
    06 00000000`04a52030 000007fe`f48aef28 clr!SVR::GCHeap::Alloc+0xb4
    07 00000000`04a520a0 000007fe`f48aecc9 clr!FastAllocatePrimitiveArray+0xc5
    08 00000000`04a52120 000007fe`f071244c clr!JIT_NewArr1+0x389
    09 00000000`04a522f0 000007fe`f07111b5 System_Numerics_ni+0x2244c
    0a 00000000`04a52330 000007ff`00150acf System_Numerics_ni+0x211b5
    0b 00000000`04a523d0 000007ff`0015098c 0x7ff`00150acf
    0c 00000000`04a52580 000007ff`0015098c 0x7ff`0015098c
    0d 00000000`04a52730 000007ff`0015098c 0x7ff`0015098c
    0e 00000000`04a528e0 000007ff`0015098c 0x7ff`0015098c
    0f 00000000`04a52a90 000007ff`0015098c 0x7ff`0015098c
    10 00000000`04a52c40 000007ff`0015098c 0x7ff`0015098c
    11 00000000`04a52df0 000007ff`0015098c 0x7ff`0015098c
    12 00000000`04a52fa0 000007ff`0015098c 0x7ff`0015098c
    13 00000000`04a53150 000007ff`0015098c 0x7ff`0015098c

    At first glance, it may appear that this stack is already trashed since there is no symbol information for the bottom frames in the display.  Before jumping to this conclusion, recall that this is a managed application and therefore contains JIT compiled code.  To verify that the addresses without symbol information are JIT’ed code, you can do a couple of things.

    First, use the !EEHeap extension in the SOS extension to determine if these addresses reside in the JIT code heap.  Below, you can see the commands I used to both load the SOS extension and then display the Execution Engine (EE) Heap information:

    0:014> .loadby sos clr

    0:014> !EEHeap -loader
    Loader Heap:
    --------------------------------------
    System Domain:     000007fef50955a0
    LowFrequencyHeap:  000007ff00020000(2000:1000) Size: 0x1000 (4096) bytes.
    HighFrequencyHeap: 000007ff00022000(8000:1000) Size: 0x1000 (4096) bytes.
    StubHeap:          000007ff0002a000(2000:2000) Size: 0x2000 (8192) bytes.
    Virtual Call Stub Heap:
      IndcellHeap:     000007ff000d0000(6000:1000) Size: 0x1000 (4096) bytes.
      LookupHeap:      000007ff000dc000(4000:1000) Size: 0x1000 (4096) bytes.
      ResolveHeap:     000007ff00106000(3a000:1000) Size: 0x1000 (4096) bytes.
      DispatchHeap:    000007ff000e0000(26000:1000) Size: 0x1000 (4096) bytes.
      CacheEntryHeap:  Size: 0x0 (0) bytes.
    Total size:        Size: 0x8000 (32768) bytes.
    --------------------------------------
    Shared Domain:     000007fef5095040
    LowFrequencyHeap:  000007ff00020000(2000:1000) Size: 0x1000 (4096) bytes.
    HighFrequencyHeap: 000007ff00022000(8000:1000) Size: 0x1000 (4096) bytes.
    StubHeap:          000007ff0002a000(2000:2000) Size: 0x2000 (8192) bytes.
    Virtual Call Stub Heap:
      IndcellHeap:     000007ff000d0000(6000:1000) Size: 0x1000 (4096) bytes.
      LookupHeap:      000007ff000dc000(4000:1000) Size: 0x1000 (4096) bytes.
      ResolveHeap:     000007ff00106000(3a000:1000) Size: 0x1000 (4096) bytes.
      DispatchHeap:    000007ff000e0000(26000:1000) Size: 0x1000 (4096) bytes.
      CacheEntryHeap:  Size: 0x0 (0) bytes.
    Total size:        Size: 0x8000 (32768) bytes.
    --------------------------------------
    Domain 1:          00000000003e73c0
    LowFrequencyHeap:  000007ff00030000(2000:1000) 000007ff00140000(10000:5000) Size: 0x6000 (24576) bytes total, 0x1000 (4096) bytes wasted.
    HighFrequencyHeap: 000007ff00032000(8000:5000) Size: 0x5000 (20480) bytes.
    StubHeap:          Size: 0x0 (0) bytes.
    Virtual Call Stub Heap:
      IndcellHeap:     000007ff00040000(4000:1000) Size: 0x1000 (4096) bytes.
      LookupHeap:      000007ff0004b000(2000:1000) Size: 0x1000 (4096) bytes.
      ResolveHeap:     000007ff0007c000(54000:1000) Size: 0x1000 (4096) bytes.
      DispatchHeap:    000007ff0004d000(2f000:1000) Size: 0x1000 (4096) bytes.
      CacheEntryHeap:  Size: 0x0 (0) bytes.
    Total size:        Size: 0xf000 (61440) bytes total, 0x1000 (4096) bytes wasted.
    --------------------------------------
    Jit code heap:
    LoaderCodeHeap:    000007ff00150000(40000:2000) Size: 0x2000 (8192) bytes.
    Total size:        Size: 0x2000 (8192) bytes.

    --------------------------------------
    Module Thunk heaps:
    Module 000007fee5581000: Size: 0x0 (0) bytes.
    Module 000007ff000330d8: Size: 0x0 (0) bytes.
    Module 000007fef06f1000: Size: 0x0 (0) bytes.
    Total size:      Size: 0x0 (0) bytes.
    --------------------------------------
    Module Lookup Table heaps:
    Module 000007fee5581000: Size: 0x0 (0) bytes.
    Module 000007ff000330d8: Size: 0x0 (0) bytes.
    Module 000007fef06f1000: Size: 0x0 (0) bytes.
    Total size:      Size: 0x0 (0) bytes.
    --------------------------------------
    Total LoaderHeap size:   Size: 0x21000 (135168) bytes total, 0x1000 (4096) bytes wasted.

    =======================================

    I have highlighted the JIT heap information and you can see that the JIT’ed code instruction pointers in the stack fall within this range.

    The second sanity check you can perform is to use a variant of the u instruction to confirm that there is a call instruction just prior to that address as shown below:

    0:012> ub 0x7ff`0015098c
    000007ff`0015095e 488b01          mov     rax,qword ptr [rcx]
    000007ff`00150961 48898424b0000000 mov     qword ptr [rsp+0B0h],rax
    000007ff`00150969 488b4108        mov     rax,qword ptr [rcx+8]
    000007ff`0015096d 48898424b8000000 mov     qword ptr [rsp+0B8h],rax
    000007ff`00150975 4c8d8424b0000000 lea     r8,[rsp+0B0h]
    000007ff`0015097d 488b5308        mov     rdx,qword ptr [rbx+8]
    000007ff`00150981 488d8c24c0000000 lea     rcx,[rsp+0C0h]
    000007ff`00150989 ff5318          call    qword ptr [rbx+18h]

    So at this point we have verified that we probably have a valid stack.  But how does the debugger so effectively walk this stack for us if there is no stack frame pointer?  The answer, of course, is that it uses the unwind information.

    To explore the answer to that question, let’s focus on a particular frame within the stack such as frame 4 in the stack above.  The code at that frame is inside the function clr!CLREvent::WaitEx, and if we pass that to .fnent, we get the following output:

    0:012> .fnent clr!CLREvent::WaitEx
    Debugger function entry 00000000`04075e40 for:
    (000007fe`f48bfdb0)   clr!CLREvent::WaitEx   |  (000007fe`f48bfe3c)   clr!CLREvent::Set
    Exact matches:
        clr!CLREvent::WaitEx = <no type information>

    BeginAddress      = 00000000`000dfdb0
    EndAddress        = 00000000`000dfe3c
    UnwindInfoAddress = 00000000`007267d8

    Unwind info at 000007fe`f4f067d8, 20 bytes
      version 1, flags 2, prolog 20, codes a
      frame reg 0, frame offs 0
      handler routine: clr!_CxxFrameHandler3 (000007fe`f49eadf0), data 7b3f54
      00: offs 20, unwind op 4, op info 5    UWOP_SAVE_NONVOL FrameOffset: b0
      02: offs 1c, unwind op 4, op info 3    UWOP_SAVE_NONVOL FrameOffset: a8
      04: offs f, unwind op 2, op info d     UWOP_ALLOC_SMALL
      05: offs b, unwind op 0, op info e     UWOP_PUSH_NONVOL
      06: offs 9, unwind op 0, op info d     UWOP_PUSH_NONVOL
      07: offs 7, unwind op 0, op info c     UWOP_PUSH_NONVOL
      08: offs 5, unwind op 0, op info 7     UWOP_PUSH_NONVOL
      09: offs 4, unwind op 0, op info 6     UWOP_PUSH_NONVOL

     Notice that this output is virtually identical to the same information provided by dumpbin using the /UNWINDINFO option.

    I have highlighted two interesting values above.  The value highlighted in green is a relative virtual address (RVA) to the unwind info that is baked into the PE file by the linker.  The value highlighted in yellow is the actual virtual address of the unwind info and can be computed by adding the module base address shown below to the RVA for UnwindInfoAddress.

    0:012> lmnm clr

    start             end                 module name

    000007fe`f47e0000 000007fe`f5145000   clr

    By examining the PE header using !dh you can confirm that the unwind information resides in the .rdata section of the module, which I have shown below:

    0:012> !dh clr

    File Type: DLL
    FILE HEADER VALUES
        8664 machine (X64)
           6 number of sections
    4BA21EEB time date stamp Thu Mar 18 07:39:07 2010

    <snip>
    SECTION HEADER #2
      .rdata name
      1FC8EC virtual size
      67F000 virtual address
      1FCA00 size of raw data
      67E200 file pointer to raw data
           0 file pointer to relocation table
           0 file pointer to line numbers
           0 number of relocations
           0 number of line numbers
    40000040 flags
             Initialized Data
             (no align specified)
             Read Only
    <snip>

    Using the Unwind Info

    Now let’s take a look at the unwind info and compare it to the prolog code of the function with which it is associated.  For convenience, I have reprinted the .fnent output for the function:

    0:012> .fnent clr!CLREvent::WaitEx
    Debugger function entry 00000000`04075e40 for:
    (000007fe`f48bfdb0)   clr!CLREvent::WaitEx   |  (000007fe`f48bfe3c)   clr!CLREvent::Set
    Exact matches:
        clr!CLREvent::WaitEx = <no type information>

    BeginAddress      = 00000000`000dfdb0
    EndAddress        = 00000000`000dfe3c
    UnwindInfoAddress = 00000000`007267d8

    Unwind info at 000007fe`f4f067d8, 20 bytes
      version 1, flags 2, prolog 20, codes a
      frame reg 0, frame offs 0
      handler routine: clr!_CxxFrameHandler3 (000007fe`f49eadf0), data 7b3f54
      00: offs 20, unwind op 4, op info 5    UWOP_SAVE_NONVOL FrameOffset: b0
      02: offs 1c, unwind op 4, op info 3    UWOP_SAVE_NONVOL FrameOffset: a8
      04: offs f, unwind op 2, op info d     UWOP_ALLOC_SMALL
      05: offs b, unwind op 0, op info e     UWOP_PUSH_NONVOL
      06: offs 9, unwind op 0, op info d     UWOP_PUSH_NONVOL
      07: offs 7, unwind op 0, op info c     UWOP_PUSH_NONVOL
      08: offs 5, unwind op 0, op info 7     UWOP_PUSH_NONVOL
      09: offs 4, unwind op 0, op info 6     UWOP_PUSH_NONVOL

    The yellow highlighted value tells us that the prolog code for the function is 0x20 bytes in length. Using that information we can dump out the prolog code for the function:

    0:012> u clr!CLREvent::WaitEx clr!CLREvent::WaitEx+20
    clr!CLREvent::WaitEx:
    000007fe`f48bfdb0 488bc4          mov     rax,rsp
    000007fe`f48bfdb3 56              push    rsi
    000007fe`f48bfdb4 57              push    rdi
    000007fe`f48bfdb5 4154            push    r12
    000007fe`f48bfdb7 4155            push    r13
    000007fe`f48bfdb9 4156            push    r14
    000007fe`f48bfdbb 4883ec70        sub     rsp,70h
    000007fe`f48bfdbf 48c7442440feffffff mov   qword ptr [rsp+40h],0FFFFFFFFFFFFFFFEh
    000007fe`f48bfdc8 48895810        mov     qword ptr [rax+10h],rbx
    000007fe`f48bfdcc 48896818        mov     qword ptr [rax+18h],rbp

    The list of operations in the unwind info is listed in the reverse order of the operations in the assembly code.  Each of the UWOP_PUSH_NONVOL operations in the unwind info maps to a nonvolatile register that is pushed onto the stack for safe keeping in the prolog code.  I have highlighted the sections within the prolog and the .fnent output such that highlighting with like colors indicates related information.  Now, let’s take a look at the raw stack and tie all of this information together. 

    Below is the stack with the frame we are focusing on highlighted in yellow:

    0:012> kn
     # Child-SP          RetAddr           Call Site
    00 00000000`04a51e18 000007fe`fd4e10ac ntdll!NtWaitForSingleObject+0xa
    01 00000000`04a51e20 000007fe`f48bffc7 KERNELBASE!WaitForSingleObjectEx+0x79
    02 00000000`04a51ec0 000007fe`f48bff70 clr!CLREvent::WaitEx+0x170
    03 00000000`04a51f00 000007fe`f48bfe23 clr!CLREvent::WaitEx+0xf8
    04 00000000`04a51f60 000007fe`f48d51d8 clr!CLREvent::WaitEx+0x5e
    05 00000000`04a52000 000007fe`f4995249 clr!SVR::gc_heap::wait_for_gc_done+0x98
    06 00000000`04a52030 000007fe`f48aef28 clr!SVR::GCHeap::Alloc+0xb4
    07 00000000`04a520a0 000007fe`f48aecc9 clr!FastAllocatePrimitiveArray+0xc5
    08 00000000`04a52120 000007fe`f071244c clr!JIT_NewArr1+0x389
    09 00000000`04a522f0 000007fe`f07111b5 System_Numerics_ni+0x2244c
    0a 00000000`04a52330 000007ff`00150acf System_Numerics_ni+0x211b5
    0b 00000000`04a523d0 000007ff`0015098c 0x7ff`00150acf
    0c 00000000`04a52580 000007ff`0015098c 0x7ff`0015098c
    0d 00000000`04a52730 000007ff`0015098c 0x7ff`0015098c
    0e 00000000`04a528e0 000007ff`0015098c 0x7ff`0015098c
    0f 00000000`04a52a90 000007ff`0015098c 0x7ff`0015098c
    10 00000000`04a52c40 000007ff`0015098c 0x7ff`0015098c
    11 00000000`04a52df0 000007ff`0015098c 0x7ff`0015098c
    12 00000000`04a52fa0 000007ff`0015098c 0x7ff`0015098c
    13 00000000`04a53150 000007ff`0015098c 0x7ff`0015098c

    Note:    The symbols above look a little weird and may lead you to believe that WaitEx is calling itself recursively, but it is not.  It only appears that way because you need the private symbols for clr.dll to be able to see the real function name.  Only public symbols are available outside of Microsoft.

     

    And below is the raw stack relevant to this frame with some highlighting and annotations that I have added:

    0:012> dps 00000000`04a51f60-10 L20
    00000000`04a51f50  00000000`00000001
    00000000`04a51f58  000007fe`f48bfe23 clr!CLREvent::WaitEx+0x5e
    00000000`04a51f60  00000000`c0402388
    00000000`04a51f68  00000000`c0402500
    00000000`04a51f70  000007fe`f48afaf0 clr!SystemNative::ArrayCopy
    00000000`04a51f78  00000000`00000182
    00000000`04a51f80  00000000`04a521d0
    00000000`04a51f88  000007fe`00000001
    00000000`04a51f90  00000000`00000057
    00000000`04a51f98  00000000`c0402398
    00000000`04a51fa0  ffffffff`fffffffe
    00000000`04a51fa8  007f0000`04a521d0
    00000000`04a51fb0  fffff880`009ca540
    00000000`04a51fb8  000007fe`f483da5b clr!SVR::heap_select::select_heap+0x1c
    00000000`04a51fc0  fffff880`009ca540
    00000000`04a51fc8  000007fe`fd4e18aa KERNELBASE!ResetEvent+0xa
    00000000`04a51fd0  00000000`0043dc60
    00000000`04a51fd8  00000000`00000178
    00000000`04a51fe0  00000000`00493c10
    00000000`04a51fe8  00000000`0043dc60    
    ß saved rdi
    00000000`04a51ff0  00000000`00000001

       *** call into clr!CLREvent::WaitEx ***

    00000000`04a51ff8  000007fe`f48d51d8 clr!SVR::gc_heap::wait_for_gc_done+0x98
    00000000`04a52000  00000000`00493ba0
    00000000`04a52008  00000000`00493ba0    
    ß saved rbx
    00000000`04a52010  00000000`00000058    
    ß saved rbp
    00000000`04a52018  000007fe`f0711e0f System_Numerics_ni+0x21e0f
    00000000`04a52020  00000000`00000178
    00000000`04a52028  000007fe`f4995249 clr!SVR::GCHeap::Alloc+0xb4
    00000000`04a52030  00000000`0043a140
    00000000`04a52038  00000000`0043dc60
    00000000`04a52040  00000000`00000000
    00000000`04a52048  00000000`04a522e0

    In the stack listing I have used the same color highlighting scheme as before to show how the data on the raw stack correlates to the unwind data.  And, using green highlighting, I have shown how the Child-SP value correlates to the stack frame.

    The cyan highlighting represents nonvolatile registers that are pushed onto the stack in the prolog code.  The blue highlighting represents stack space reserved for locals and for register home space allocated for calling sub routines.  In the unwind data the stack reservation is represented by a UWOP_ALLOC_SMALL operation.  And the red highlighting represents nonvolatile registers that are stored in the home space of the previous stack frame and represented by a UWOP_SAVE_NONVOL operation stored in the unwind information.

    As you can see, we have all of the information we need in the unwind data to determine which slots on the stack are used for what.  The only thing we don’t know is the partitioning of the reserved stack space for locals, which is described by the private symbol information for the clr.dll module.

    Tying it all Together

    .fnent produces its output directly from parsing the definition of the UNWIND_INFO structure and it even gives you the address of where that structure lives in memory.  The UNWIND_INFO structure also contains a variable amount of UNWIND_CODE structures.  You can find details of the structure definitions for UNWIND_INFO and UNWIND_CODE here.  Each parsed line of unwind information in the .fnent output is backed by at least one of these structures.  In fact, you can see the correlation between the structure fields for UNWIND_INFO and the data in the .fnent output as shown below:

    From UNWIND_CODE:

    UBYTE

    Offset in prolog

    UBYTE: 4

    Unwind operation code

    UBYTE: 4

    Operation info

    From .fnent:

      05: offs b, unwind op 0, op info e     UWOP_PUSH_NONVOL

    The meaning of the OpInfo (operation info) field is dependent on the UnwindOp (unwind operation code) field and is spelled out in the documentation for UNWIND_CODE.  For example, for the UWOP_PUSH_NONVOL operation shown above, the OpInfo field is an index into the following table, which indicates which nonvolatile register this push is associated with.  Note that the values in the below table are in decimal, while the .fnent values are in hex:

    0

    RAX

    1

    RCX

    2

    RDX

    3

    RBX

    4

    RSP

    5

    RBP

    6

    RSI

    7

    RDI

    8 to 15

    R8 to R15

    Therefore, the previous line from the .fnent output represents a push operation for the r14 register (05: offs b, unwind op 0, op info e   UWOP_PUSH_NONVOL).  Looking at the assembly above, we see that the topmost UWOP_PUSH_NONVOL operation in the .fnent output correlates to the last nonvolatile register push in the prolog code (push    r14).

    Note:    Remember, the push operations in the .fnent output are listed in the reverse order of where they are in the actual prolog code.  This helps the unwind code easily calculate offsets of where they should live in the stack.

     

    One thing that you will notice in the x64 calling convention is that once the prolog code has executed, the value for rsp will very rarely change.  The Child-SP value in the stack displayed by the k commands is the value of rsp for that frame after the prolog code has executed.  So the offsets to access these nonvolatile registers are then applied to the Child-SP value (previously highlighted in green) to find where they live on the stack.  So, in a way, the Child-SP value acts like the base pointer we are used to on the x86 platform.

    In the .fnent output above, you will also see the following:

      00: offs 20, unwind op 4, op info 5    UWOP_SAVE_NONVOL FrameOffset: b0

    For UWOP_SAVE_NONVOL, you see that the .fnent output shows us the offset where we can find this register, and the register in question is represented by the OpInfo value that equates to rbp.  The offset above is applied to the Child-SP value (00000000`04a51f60 in this case) to produce the address 00000000`04a52010, which indicates that’s where we can find a saved copy of rbp.  I have also annotated where it lives in the raw stack output shown previously.

    Note:    If you’re wondering why rbp is stored in the previous stack frame, check out my previous post on this topic where I describe how in optimized builds, the compiler can use the home space from the previous stack frame to save nonvolatile registers thus saving them with a MOV operation as opposed to a PUSH operation.  This is possible because in optimized builds the home space is not necessarily used to store parameters.

     

    So how does all of This Work for CLR JIT Code?

    If you have asked this question, then you are definitely paying attention!  As we have shown, the compiler and linker are responsible for placing unwind info in the Portable Executable file at build time.  But what about dynamic code that is generated at runtime?  Certainly there must be unwind information for dynamically compiled code as well, otherwise there would be no way to walk the stack or unwind the stack after an exception.

    As it turns out, APIs exist for this very situation, including RtlAddFunctionTable  and RtlInstallFunctionTableCallback.  In fact, the CLR uses RtlInstallFunctionTableCallback.  The generated unwind information is then rooted in a linked list where the head is at ntdll!RtlpDynamicFunctionTable.  The format of the linked list items is undocumented as it is an implementation detail, but using dbghelp.dll you can find the unwind information for a given instruction pointer if you so desire by calling SymFunctionTableAccess64.

    In fact, if you want to see the CLR adding dynamic unwind info in action you can run the test code above under the debugger, and then at the initial breakpoint, before the application starts running, set the following breakpoint:

    bu ntdll!RtlInstallFunctionTableCallback

    When you let the application run you should then end up with a call stack at the breakpoint that looks like the following, which clearly shows the JIT compiler adding the unwind info to the table dynamically:

    0:000> kn
     # Child-SP          RetAddr           Call Site
    00 00000000`0017dca8 000007fe`f4832cc6 ntdll!RtlInstallFunctionTableCallback
    01 00000000`0017dcb0 000007fe`f4831422 clr!InstallEEFunctionTable+0x77
    02 00000000`0017df60 000007fe`f4828ca8 clr!StubLinker::EmitUnwindInfo+0x492
    03 00000000`0017e050 000007fe`f4832c1a clr!StubLinker::EmitStub+0xe8
    04 00000000`0017e0b0 000007fe`f48328e5 clr!StubLinker::LinkInterceptor+0x1ea
    05 00000000`0017e160 000007fe`f4831e40 clr!CTPMethodTable::CreateStubForNonVirtualMethod+0xa35
    06 00000000`0017e300 000007fe`f4832926 clr!CRemotingServices::GetStubForNonVirtualMethod+0x50
    07 00000000`0017e3c0 000007fe`f48223f3 clr!MethodDesc::DoPrestub+0x38b
    08 00000000`0017e4d0 000007fe`f47e2d07 clr!PreStubWorker+0x1df
    09 00000000`0017e590 000007fe`f48210b4 clr!ThePreStubAMD64+0x87
    0a 00000000`0017e660 000007fe`f48211c9 clr!CallDescrWorker+0x84
    0b 00000000`0017e6d0 000007fe`f4821245 clr!CallDescrWorkerWithHandler+0xa9
    0c 00000000`0017e750 000007fe`f4823cf1 clr!MethodDesc::CallDescr+0x2a1
    0d 00000000`0017e9b0 000007fe`f49cdc3d clr!MethodDescCallSite::Call+0x35
    0e 00000000`0017e9f0 000007fe`f4999f0d clr!AppDomain::InitializeDomainContext+0x1ac
    0f 00000000`0017ebf0 000007fe`f49212a1 clr!SystemDomain::InitializeDefaultDomain+0x13d
    10 00000000`0017f0c0 000007fe`f4923dd6 clr!SystemDomain::ExecuteMainMethod+0x191
    11 00000000`0017f670 000007fe`f4923cf3 clr!ExecuteEXE+0x43
    12 00000000`0017f6d0 000007fe`f49a7365 clr!CorExeMainInternal+0xc4
    13 00000000`0017f740 000007fe`f8ad3309 clr!CorExeMain+0x15

    But there is one more wrinkle to this picture.  We now know that by using RtlInstallFunctionTableCallback the CLR, or any other JIT engine, can register a callback that provides the unwind information at runtime.  But how does the debugger access this information?  When the debugger is broken into the process or if you are debugging a dump, it cannot execute the callback function registered with RtlInstallFunctionTableCallback.

    This is where the sixth and final parameter to RtlInstallFunctionTableCallback comes into play.  By providing the OutOfProcessCallbackDll parameter, the CLR is providing a dll which the debugger can use to effectively parse through the JITer’s unwind information statically.  When inspecting which path the CLR passes for OutOfProcessCallbackDll on my machine, I see the following string:

    0:000> du /c 80 000007fe`f5916160
    000007fe`f5916160  "C:\Windows\Microsoft.NET\Framework64\v4.0.30319\mscordacwks.dll"

    So, the debugger uses mscordacwks.dll to statically examine the unwind info while the process is broken in the debugger or while inspecting a dump.

    Note:    This is one of the many reasons why you must have a complete process dump to effectively post-mortem debug managed applications.

    Using the ‘k =’ Command to Dump the Stack

    If you look at the documentation for the k command, you’ll see that there is a way to override the base pointer when walking the stack.  However, the documentation leaves it a complete mystery as to how to apply this in the x64 world.  To demonstrate what I mean, consider the following stack from earlier:

    0:012> kn
     # Child-SP          RetAddr           Call Site
    00 00000000`04a51e18 000007fe`fd4e10ac ntdll!NtWaitForSingleObject+0xa
    01 00000000`04a51e20 000007fe`f48bffc7 KERNELBASE!WaitForSingleObjectEx+0x79

    02 00000000`04a51ec0 000007fe`f48bff70 clr!CLREvent::WaitEx+0x170
    03 00000000`04a51f00 000007fe`f48bfe23 clr!CLREvent::WaitEx+0xf8
    04 00000000`04a51f60 000007fe`f48d51d8 clr!CLREvent::WaitEx+0x5e
    05 00000000`04a52000 000007fe`f4995249 clr!SVR::gc_heap::wait_for_gc_done+0x98
    06 00000000`04a52030 000007fe`f48aef28 clr!SVR::GCHeap::Alloc+0xb4
    07 00000000`04a520a0 000007fe`f48aecc9 clr!FastAllocatePrimitiveArray+0xc5
    08 00000000`04a52120 000007fe`f071244c clr!JIT_NewArr1+0x389
    09 00000000`04a522f0 000007fe`f07111b5 System_Numerics_ni+0x2244c
    0a 00000000`04a52330 000007ff`00150acf System_Numerics_ni+0x211b5
    0b 00000000`04a523d0 000007ff`0015098c 0x7ff`00150acf
    0c 00000000`04a52580 000007ff`0015098c 0x7ff`0015098c
    0d 00000000`04a52730 000007ff`0015098c 0x7ff`0015098c
    0e 00000000`04a528e0 000007ff`0015098c 0x7ff`0015098c
    0f 00000000`04a52a90 000007ff`0015098c 0x7ff`0015098c
    10 00000000`04a52c40 000007ff`0015098c 0x7ff`0015098c
    11 00000000`04a52df0 000007ff`0015098c 0x7ff`0015098c
    12 00000000`04a52fa0 000007ff`0015098c 0x7ff`0015098c
    13 00000000`04a53150 000007ff`0015098c 0x7ff`0015098c

    Now, imagine the top of the stack is corrupted, which I have “simulated” by blacking out the top few frames in this stack dump.  Furthermore, let’s assume that we identified a frame where the stack starts to look sane again by looking at the raw stack below:

    0:012> dps 00000000`04a51e90 
    00000000`04a51e90  00000000`00000000
    00000000`04a51e98  00000000`04a52130
    00000000`04a51ea0  00000000`ffffffff
    00000000`04a51ea8  00000000`ffffffff
    00000000`04a51eb0  00000000`00000108
    00000000`04a51eb8  000007fe`f48bffc7 clr!CLREvent::WaitEx+0x170
    00000000`04a51ec0  00000000`00000000
    00000000`04a51ec8  00000000`00000108
    00000000`04a51ed0  000007fe`00000000
    00000000`04a51ed8  00000000`00000108
    00000000`04a51ee0  ffffffff`fffffffe
    00000000`04a51ee8  00000000`00000001
    00000000`04a51ef0  00000000`00000000
    00000000`04a51ef8  000007fe`f48bff70 clr!CLREvent::WaitEx+0xf8
    00000000`04a51f00  00000000`00000000
    00000000`04a51f08  00000000`00493ba0

    From looking at this stack, we can see the typical pattern of stack frames because the return addresses resolve to symbols of sorts.

    To dump out the corrupted stack, here is the undocumented syntax for the x64 platform:

    k = <rsp> <rip> <frame_count>

    <rsp> is the stack pointer to start with.  You want to use the stack pointer that would have been in rsp when that function was active.  Remember, typically rsp does not change after the function prolog code completes.  Therefore, if you pick the stack pointer just below the return address, you should be good.

    <rip> should be an instruction pointer from within the function that was executing at the time the <rsp> value above was in play.  In this case, the return address directly above <rsp> comes from that function and I have highlighted it in green.  This piece of information is critical so that the debugger can find the unwind metadata for the function that was current at this point in the stack.  Without it, the debugger cannot walk the stack.

    Armed with this information, you can construct a k command to dump the stack starting from this frame as shown below:

    0:012> kn = 00000000`04a51ec0 000007fe`f48bffc7 10
     # Child-SP          RetAddr           Call Site
    00 00000000`04a51ec0 000007fe`f48bff70 clr!CLREvent::WaitEx+0x170
    01 00000000`04a51f00 000007fe`f48bfe23 clr!CLREvent::WaitEx+0xf8
    02 00000000`04a51f60 000007fe`f48d51d8 clr!CLREvent::WaitEx+0x5e
    03 00000000`04a52000 000007fe`f4995249 clr!SVR::gc_heap::wait_for_gc_done+0x98
    04 00000000`04a52030 000007fe`f48aef28 clr!SVR::GCHeap::Alloc+0xb4
    05 00000000`04a520a0 000007fe`f48aecc9 clr!FastAllocatePrimitiveArray+0xc5
    06 00000000`04a52120 000007fe`f071244c clr!JIT_NewArr1+0x389
    07 00000000`04a522f0 000007fe`f07111b5 System_Numerics_ni+0x2244c
    08 00000000`04a52330 000007ff`00150acf System_Numerics_ni+0x211b5
    09 00000000`04a523d0 000007ff`0015098c 0x7ff`00150acf
    0a 00000000`04a52580 000007ff`0015098c 0x7ff`0015098c
    0b 00000000`04a52730 000007ff`0015098c 0x7ff`0015098c
    0c 00000000`04a528e0 000007ff`0015098c 0x7ff`0015098c
    0d 00000000`04a52a90 000007ff`0015098c 0x7ff`0015098c
    0e 00000000`04a52c40 000007ff`0015098c 0x7ff`0015098c
    0f 00000000`04a52df0 000007ff`0015098c 0x7ff`0015098c

    Note:    The frame count in the above k expression is required.  That is the way the debugger engine distinguishes between this variant of the command (with an overridden rip) and the documented form of k that does not provide an overridden rip.

    Conclusion

    Since the x64 calling convention does not utilize a base pointer (among other things), we need some extra information to effectively walk the stack.  That extra information comes in the form of unwind metadata and is generated by the compiler and linker for static code and baked into the portable executable file.  If you happen to code in assembly language, there are various macros that you must use to decorate your assembly code so that the assembler can generate the proper unwind metadata.  For dynamically compiled code, that information is instead provided at runtime by registering a callback with the system.  Knowing this information is critical if you encounter a corrupted stack and must piece it together manually.  In such situations you’ll need to know how to dig out the unwind metadata manually and use it to effectively reconstruct the call stack.

    That said, you could spare yourself some effort and use the undocumented variant of the k command described above to dump the stack starting at any frame. J

    Happy debugging everyone!

    "The example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious.  No association with any real company, organization, product, domain name, email address, logo, person, places, or events is intended or should be inferred."

  • Ntdebugging Blog

    Uncovering How Workspaces Work in WinDbg

    • 1 Comments

    Author - Jason Epperly

    Workspaces have always been a little confusing to me. I knew how to bend them to do what I needed to get the job done, however they still remained a bit mysterious. Recently I decided to sort this out, just so I knew how they worked under the hood. But before I show you my investigation let's discuss the different types of workspaces. Windbg uses several built-in types including Base, User, Kernel, Remote, Processor Architecture, Per Dump, and Per Executable. It also uses named workspaces (or user defined workspaces). When you perform a particular type of debugging (e.g. live user-mode, post-mortem dump analysis etc.) these workspaces are combined into the final environment. Here's a diagram to illustrate the possible combination of workspaces.

    image

    • The green line is the case where WinDbg is used to open a dump file. In this scenario the Base workspace + per dump workspace is used. Note: per dump simply means each individual dump file opened gets its own workspace.
    • The blue line is the scenario where WinDbg is used to live debug a running application, using the Base+User-mode workspaces.
    • The orange line is an example of WinDbg used to perform a live kernel debug on an x86 machine. In this case windbg is using Base+Kernel+x86 workspaces.

    From the diagram you can see windbg typically uses a combination of two workspaces. While live kernel debugging it uses three workspaces.

    So what is in a workspace?

    • Session Information
      • All break point (bp's) and handling information for exceptions and events (sx settings).
      • All open source files.
      • All user-defined aliases.
    • Configuration settings
      • The symbol path.
      • The executable image path.
      • The source path.
      • The current source options that were set with l+, l- (Set Source Options).
      • Log file settings.
      • The COM or 1394 kernel connection settings, if the connection was started by using the graphical interface.
      • The most recent paths in each Open dialog box (except for the workspace file and text file paths, which are not saved).
      • The current .enable_unicode, .force_radix_output, and .enable_long_status settings.
    • WinDbg graphical interface.
      • The title of the WinDbg window
      • The Automatically Open Disassembly setting
      • The default font
      • The size and position of the WinDbg window on the desktop.
      • Which debugging information windows are open.
      • The size and position of each open window, including the window's size, its floating or docked status, whether it is tabbed with other windows, and all of the related settings in its shortcut menu.
      • The location of the pane boundary in the Debugger Command window and the word wrap setting in that window.
      • Whether the toolbar and status bar, and the individual toolbars on each debugging information window, are visible.
      • The customization of the Registers window.
      • The flags in the Calls window, Locals window, and Watch window.
      • The items that were viewed in the Watch window.
      • The cursor location in each Source window.

    All of these settings (except for the blue ones) are applied cumulatively (Base first, then the next workspace, etc). The blue items above are only loaded from the last workspace in the chain. To show this in action I created a simple walk through to illustrate the use of workspaces the debugger.

    First I opened windbg without the use of any command line options. When it opens in this dormant state (not attached to anything and has nothing opened) its using the Base workspace. If I don't change anything (e.g. window placement) I am not prompted with any workspace dialogs when I start debugging. However if I moved the debugger's main window to any location (we will call this position 1) followed by executing any of the highlighted operations below -

    image

    I am prompted with this dialog-

    image

    Choosing "Yes" on the dialog above integrates my changes into the "Base" workspace so window position 1 is now part of my Base workspace.

    Now I'm going to select "Open Executable" and browse to our old faithful target binary notepad.exe. Once the binary is opened, windbg uses Base+Notepad (per Executable file). Now I'll move the debugger's main window again (we will call this position 2) and choose the option Debug > Stop debugging. Because of the window location change, I am prompted with the following-

    image

    If I choose 'Yes', windbg will use window position 2 for anytime I open the notepad executable in the future. After closing the notepad.exe executable, windbg reverts back to using the Base workspace.

    This time I'll actually launch notepad (not from the debugger) and attach to the running notepad.exe process with the debugger. We are now at Base+User-mode. I moved the debugger window (new position 3), selected Debug > Stop Debugging and get prompted with this dialog-

    image

    Choosing "Yes" will store WinDbg window position 3 in the User-mode workspace. Once I have completed this step, Windbg is again using the Base workspace because we stopped debugging.

    To futher illustrate workspaces I'll attach a to a target Virtual Machine for Kernel Debugging but not break in. Windbg is using Base+Kernel now. I moved the window again and as soon as I break-in I get this dialog-

    image

    I chose 'No' on the dialog because I'm getting the hang of things. If I move the window again and type qd (quit and detach) to end the current kernel debug session, I will see this dialog-

    image

    So before we ended the session, we were at Base+Kernel+AMD64.

    Running through this exercise helped me understand why I usually create a named workspace, change all my settings and use the command line option -W to open my workspace. Hopefully this will clear up some of the complexities involved with workspaces. This is why the debugger help file recommends making all the changes you need at the lowest possible level (i.e. Base first, then the others).

    Hope this helps...

Page 1 of 1 (2 items)