My name is Trey Nash and I am an Escalation Engineer on the Core OS team. My experience is as a software developer, and therefore my blog posts tend to be slanted in the direction of helping developers during the feature development, testing and the support phases.
In this installment I would like to expand a bit on a previous post of mine called Challenges of Debugging Optimized x64 Code. In that post I discussed the nuances of the x64 calling convention (of which, thankfully, there is only one) and how it is used in optimized builds of software. The calling convention is sometimes referred to as the Application Binary Interface (ABI). In this post, I would like to discuss the x64 unwind metadata and how you can use it in the debugger to manually walk a stack.
In some cases, you may have a corrupted stack that the debugger simply cannot effectively walk for you. This often happens because the debugger walks a stack from the top down (assuming the stack grows upwards as if it were a stack of plates on a table), and if the stack is sufficiently trashed then the debugger cannot find its bearings. In the x86 world, a large percentage of the time, you can spot the stack frames by following the chain of base pointers and then build a crafty stack backtrace command to display the stack at some point in time. But in the x64 calling convention there is no base pointer. In fact, once a function’s prolog code has executed, the rsp register generally never changes until the epilog code. To read more about x64 prolog and epilog code conventions, go here.
Moreover, the syntax for creating a crafty stack backtrace command in the x64 environment is currently undocumented, and I aim to shed some light on that near the end of this blog post.
For this blog post I have used the following example C# code that requires the .NET 4.0 framework and can be easily built from a Visual Studio 2010 command prompt. You can find the code below:
using System;
using System.Numerics;
using System.Threading;
using System.Threading.Tasks;
using System.Collections.Concurrent;

class EntryPoint
{
    const int FactorialsToCompute = 2000;

    static void Main()
    {
        var numbers = new ConcurrentDictionary<BigInteger, BigInteger>(4, FactorialsToCompute);

        // Create a factorial delegate.
        Func<BigInteger, BigInteger> factorial = null;
        factorial = (n) => ( n == 0 ) ? 1 : n * factorial(n-1);

        // Now compute the factorial of the list
        // concurrently.
        Parallel.For( 0, FactorialsToCompute, (i) => {
            numbers[i] = factorial(i);
        } );
    }
}
The spirit of this code is to concurrently compute the first 2000 factorials and store the results in a dictionary. This code uses the new Task Parallel Library to distribute this work evenly across the multiple cores on the system. To compile the example (assuming the code is stored in test.cs), you can execute the following command from a Visual Studio 2010 command prompt:
csc /r:system.numerics.dll test.cs
Note: If you are using a 64-bit platform, be sure to use the x64 command prompt shortcut installed by the Visual Studio 2010 installer. You can download a free evaluation of Visual Studio 2010 here.
So how do the debugger and functions such as RtlVirtualUnwind know how to walk the x64 stack if they cannot find a base pointer? The secret is that they use unwind metadata that is typically baked into the Portable Executable (PE) file at link time. You can inspect this information using the /UNWINDINFO option of the command line tool dumpbin. For example, I went to the directory on my machine which contains the x64 clr.dll (c:\Windows\Microsoft.NET\Framework64\v4.0.30319) and dumped the unwind info looking for CLREvent::WaitEx, which I have pasted below:
  00013F20 000DFDB0 000DFE3C 007267D8  ?WaitEx@CLREvent@@QEAAKKW4WaitMode@@PEAUPendingSync@@@Z (public: unsigned long __cdecl CLREvent::WaitEx(unsigned long,enum WaitMode,struct PendingSync *))
    Unwind version: 1
    Unwind flags: UHANDLER
    Size of prologue: 0x20
    Count of codes: 10
    Unwind codes:
      20: SAVE_NONVOL, register=rbp offset=0xB0
      1C: SAVE_NONVOL, register=rbx offset=0xA8
      0F: ALLOC_SMALL, size=0x70
      0B: PUSH_NONVOL, register=r14
      09: PUSH_NONVOL, register=r13
      07: PUSH_NONVOL, register=r12
      05: PUSH_NONVOL, register=rdi
      04: PUSH_NONVOL, register=rsi
    Handler: 0020ADF0 __CxxFrameHandler3
    EH Handler Data: 007B3F54
I’ll get into what all of this means shortly.
Note: The dumpbin.exe functionality is also exposed via the linker. For example, the command “dumpbin.exe /?” is identical to “link.exe /dump /?”.
Within the debugger, you can find this same information for a particular function using the .fnent command. To demonstrate, I executed the example code within a windbg instance and broke in at some random point and chose one of the threads to look at which has a stack looking like the following:
  12  Id: f80.7f0 Suspend: 1 Teb: 000007ff`fffa0000 Unfrozen
 # Child-SP          RetAddr           Call Site
00 00000000`04a51e18 000007fe`fd4e10ac ntdll!NtWaitForSingleObject+0xa
01 00000000`04a51e20 000007fe`f48bffc7 KERNELBASE!WaitForSingleObjectEx+0x79
02 00000000`04a51ec0 000007fe`f48bff70 clr!CLREvent::WaitEx+0x170
03 00000000`04a51f00 000007fe`f48bfe23 clr!CLREvent::WaitEx+0xf8
04 00000000`04a51f60 000007fe`f48d51d8 clr!CLREvent::WaitEx+0x5e
05 00000000`04a52000 000007fe`f4995249 clr!SVR::gc_heap::wait_for_gc_done+0x98
06 00000000`04a52030 000007fe`f48aef28 clr!SVR::GCHeap::Alloc+0xb4
07 00000000`04a520a0 000007fe`f48aecc9 clr!FastAllocatePrimitiveArray+0xc5
08 00000000`04a52120 000007fe`f071244c clr!JIT_NewArr1+0x389
09 00000000`04a522f0 000007fe`f07111b5 System_Numerics_ni+0x2244c
0a 00000000`04a52330 000007ff`00150acf System_Numerics_ni+0x211b5
0b 00000000`04a523d0 000007ff`0015098c 0x7ff`00150acf
0c 00000000`04a52580 000007ff`0015098c 0x7ff`0015098c
0d 00000000`04a52730 000007ff`0015098c 0x7ff`0015098c
0e 00000000`04a528e0 000007ff`0015098c 0x7ff`0015098c
0f 00000000`04a52a90 000007ff`0015098c 0x7ff`0015098c
10 00000000`04a52c40 000007ff`0015098c 0x7ff`0015098c
11 00000000`04a52df0 000007ff`0015098c 0x7ff`0015098c
12 00000000`04a52fa0 000007ff`0015098c 0x7ff`0015098c
13 00000000`04a53150 000007ff`0015098c 0x7ff`0015098c
At first glance, it may appear that this stack is already trashed since there is no symbol information for the bottom frames in the display. Before jumping to this conclusion, recall that this is a managed application and therefore contains JIT compiled code. To verify that the addresses without symbol information are JIT’ed code, you can do a couple of things.
First, use the !EEHeap extension in the SOS extension to determine if these addresses reside in the JIT code heap. Below, you can see the commands I used to both load the SOS extension and then display the Execution Engine (EE) Heap information:
0:014> .loadby sos clr
0:014> !EEHeap -loader
Loader Heap:
--------------------------------------
System Domain: 000007fef50955a0
LowFrequencyHeap: 000007ff00020000(2000:1000) Size: 0x1000 (4096) bytes.
HighFrequencyHeap: 000007ff00022000(8000:1000) Size: 0x1000 (4096) bytes.
StubHeap: 000007ff0002a000(2000:2000) Size: 0x2000 (8192) bytes.
Virtual Call Stub Heap:
  IndcellHeap: 000007ff000d0000(6000:1000) Size: 0x1000 (4096) bytes.
  LookupHeap: 000007ff000dc000(4000:1000) Size: 0x1000 (4096) bytes.
  ResolveHeap: 000007ff00106000(3a000:1000) Size: 0x1000 (4096) bytes.
  DispatchHeap: 000007ff000e0000(26000:1000) Size: 0x1000 (4096) bytes.
  CacheEntryHeap: Size: 0x0 (0) bytes.
Total size: Size: 0x8000 (32768) bytes.
--------------------------------------
Shared Domain: 000007fef5095040
LowFrequencyHeap: 000007ff00020000(2000:1000) Size: 0x1000 (4096) bytes.
HighFrequencyHeap: 000007ff00022000(8000:1000) Size: 0x1000 (4096) bytes.
StubHeap: 000007ff0002a000(2000:2000) Size: 0x2000 (8192) bytes.
Virtual Call Stub Heap:
  IndcellHeap: 000007ff000d0000(6000:1000) Size: 0x1000 (4096) bytes.
  LookupHeap: 000007ff000dc000(4000:1000) Size: 0x1000 (4096) bytes.
  ResolveHeap: 000007ff00106000(3a000:1000) Size: 0x1000 (4096) bytes.
  DispatchHeap: 000007ff000e0000(26000:1000) Size: 0x1000 (4096) bytes.
  CacheEntryHeap: Size: 0x0 (0) bytes.
Total size: Size: 0x8000 (32768) bytes.
--------------------------------------
Domain 1: 00000000003e73c0
LowFrequencyHeap: 000007ff00030000(2000:1000) 000007ff00140000(10000:5000) Size: 0x6000 (24576) bytes total, 0x1000 (4096) bytes wasted.
HighFrequencyHeap: 000007ff00032000(8000:5000) Size: 0x5000 (20480) bytes.
StubHeap: Size: 0x0 (0) bytes.
Virtual Call Stub Heap:
  IndcellHeap: 000007ff00040000(4000:1000) Size: 0x1000 (4096) bytes.
  LookupHeap: 000007ff0004b000(2000:1000) Size: 0x1000 (4096) bytes.
  ResolveHeap: 000007ff0007c000(54000:1000) Size: 0x1000 (4096) bytes.
  DispatchHeap: 000007ff0004d000(2f000:1000) Size: 0x1000 (4096) bytes.
  CacheEntryHeap: Size: 0x0 (0) bytes.
Total size: Size: 0xf000 (61440) bytes total, 0x1000 (4096) bytes wasted.
--------------------------------------
Jit code heap:
LoaderCodeHeap: 000007ff00150000(40000:2000) Size: 0x2000 (8192) bytes.
Total size: Size: 0x2000 (8192) bytes.
--------------------------------------
Module Thunk heaps:
Module 000007fee5581000: Size: 0x0 (0) bytes.
Module 000007ff000330d8: Size: 0x0 (0) bytes.
Module 000007fef06f1000: Size: 0x0 (0) bytes.
Total size: Size: 0x0 (0) bytes.
--------------------------------------
Module Lookup Table heaps:
Module 000007fee5581000: Size: 0x0 (0) bytes.
Module 000007ff000330d8: Size: 0x0 (0) bytes.
Module 000007fef06f1000: Size: 0x0 (0) bytes.
Total size: Size: 0x0 (0) bytes.
--------------------------------------
Total LoaderHeap size: Size: 0x21000 (135168) bytes total, 0x1000 (4096) bytes wasted.
=======================================
The entry of interest is the "Jit code heap" section (the LoaderCodeHeap at 000007ff00150000), and you can see that the JIT’ed code instruction pointers in the stack, such as 0x7ff`0015098c, fall within this range.
The second sanity check you can perform is to use the ub variant of the u (unassemble) command to confirm that there is a call instruction just prior to that address as shown below:
0:012> ub 0x7ff`0015098c
000007ff`0015095e 488b01           mov     rax,qword ptr [rcx]
000007ff`00150961 48898424b0000000 mov     qword ptr [rsp+0B0h],rax
000007ff`00150969 488b4108         mov     rax,qword ptr [rcx+8]
000007ff`0015096d 48898424b8000000 mov     qword ptr [rsp+0B8h],rax
000007ff`00150975 4c8d8424b0000000 lea     r8,[rsp+0B0h]
000007ff`0015097d 488b5308         mov     rdx,qword ptr [rbx+8]
000007ff`00150981 488d8c24c0000000 lea     rcx,[rsp+0C0h]
000007ff`00150989 ff5318           call    qword ptr [rbx+18h]
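The range check described above is easy to do by eye, but here is a minimal Python sketch of it. The constant and function names are my own, and I am assuming the "Size: 0x2000" figure printed for the LoaderCodeHeap is the extent of the heap that currently holds code.

```python
# Values copied from the !EEHeap and k output in this post.
JIT_HEAP_BASE = 0x000007FF00150000   # LoaderCodeHeap start address
JIT_HEAP_SIZE = 0x2000               # "Size: 0x2000 (8192) bytes" (assumed extent)

# Return addresses from the stack that had no symbol information.
UNRESOLVED_FRAMES = [0x000007FF00150ACF, 0x000007FF0015098C]


def in_jit_heap(address: int) -> bool:
    """Return True if the address lies inside the JIT code heap range."""
    return JIT_HEAP_BASE <= address < JIT_HEAP_BASE + JIT_HEAP_SIZE


for address in UNRESOLVED_FRAMES:
    print(hex(address), in_jit_heap(address))
```

Both addresses report True, which is consistent with them being JIT compiled code rather than garbage on the stack.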
So at this point we have verified that we probably have a valid stack. But how does the debugger so effectively walk this stack for us if there is no stack frame pointer? The answer, of course, is that it uses the unwind information.
To explore the answer to that question, let’s focus on a particular frame within the stack such as frame 4 in the stack above. The code at that frame is inside the function clr!CLREvent::WaitEx, and if we pass that to .fnent, we get the following output:
0:012> .fnent clr!CLREvent::WaitEx
Debugger function entry 00000000`04075e40 for:
(000007fe`f48bfdb0)   clr!CLREvent::WaitEx   |  (000007fe`f48bfe3c)   clr!CLREvent::Set
Exact matches:
    clr!CLREvent::WaitEx = <no type information>

BeginAddress      = 00000000`000dfdb0
EndAddress        = 00000000`000dfe3c
UnwindInfoAddress = 00000000`007267d8

Unwind info at 000007fe`f4f067d8, 20 bytes
  version 1, flags 2, prolog 20, codes a
  frame reg 0, frame offs 0
  handler routine: clr!_CxxFrameHandler3 (000007fe`f49eadf0), data 7b3f54
  00: offs 20, unwind op 4, op info 5   UWOP_SAVE_NONVOL FrameOffset: b0
  02: offs 1c, unwind op 4, op info 3   UWOP_SAVE_NONVOL FrameOffset: a8
  04: offs f, unwind op 2, op info d    UWOP_ALLOC_SMALL
  05: offs b, unwind op 0, op info e    UWOP_PUSH_NONVOL
  06: offs 9, unwind op 0, op info d    UWOP_PUSH_NONVOL
  07: offs 7, unwind op 0, op info c    UWOP_PUSH_NONVOL
  08: offs 5, unwind op 0, op info 7    UWOP_PUSH_NONVOL
  09: offs 4, unwind op 0, op info 6    UWOP_PUSH_NONVOL
Notice that this output is virtually identical to the same information provided by dumpbin using the /UNWINDINFO option.
Two values in the output above are worth a closer look. The UnwindInfoAddress value (00000000`007267d8) is a relative virtual address (RVA) to the unwind info that is baked into the PE file by the linker. The address on the "Unwind info at" line (000007fe`f4f067d8) is the actual virtual address of the unwind info and can be computed by adding the module base address shown below to the RVA for UnwindInfoAddress.
0:012> lmnm clr
start end module name
000007fe`f47e0000 000007fe`f5145000 clr
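To make the addition concrete, here is a small Python sketch of the RVA to virtual address calculation. The helper names are mine; the numbers are copied from the lm and .fnent output in this post.

```python
MODULE_BASE = 0x000007FEF47E0000       # start address of clr from lm
UNWIND_INFO_RVA = 0x007267D8           # UnwindInfoAddress from .fnent


def rva_to_va(module_base: int, rva: int) -> int:
    """Convert a relative virtual address into a virtual address."""
    return module_base + rva


def fmt(address: int) -> str:
    """Format a 64-bit address the way windbg prints it."""
    return f"{address >> 32:08x}`{address & 0xFFFFFFFF:08x}"


unwind_info_va = rva_to_va(MODULE_BASE, UNWIND_INFO_RVA)
print(fmt(unwind_info_va))  # 000007fe`f4f067d8
```

The result matches the address printed on the "Unwind info at" line of the .fnent output.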
By examining the PE header using !dh you can confirm that the unwind information resides in the .rdata section of the module, which I have shown below:
0:012> !dh clr

File Type: DLL
FILE HEADER VALUES
    8664 machine (X64)
       6 number of sections
4BA21EEB time date stamp Thu Mar 18 07:39:07 2010
<snip>
SECTION HEADER #2
  .rdata name
  1FC8EC virtual size
  67F000 virtual address
  1FCA00 size of raw data
  67E200 file pointer to raw data
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
40000040 flags
         Initialized Data
         (no align specified)
         Read Only
<snip>
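If you want to convince yourself that the unwind info RVA really does land in .rdata, the check is a simple range comparison. The following Python sketch uses the section values from the !dh output above; the function name is hypothetical.

```python
RDATA_VIRTUAL_ADDRESS = 0x67F000   # "virtual address" of .rdata from !dh
RDATA_VIRTUAL_SIZE = 0x1FC8EC      # "virtual size" of .rdata from !dh
UNWIND_INFO_RVA = 0x7267D8         # UnwindInfoAddress from .fnent


def rva_in_section(rva: int, section_va: int, section_size: int) -> bool:
    """Return True if the RVA falls inside [section_va, section_va + size)."""
    return section_va <= rva < section_va + section_size


print(rva_in_section(UNWIND_INFO_RVA, RDATA_VIRTUAL_ADDRESS, RDATA_VIRTUAL_SIZE))  # True
```

The section spans RVAs 0x67F000 up to (but not including) 0x87B8EC, and 0x7267D8 sits comfortably inside it.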
Now let’s take a look at the unwind info and compare it to the prolog code of the function with which it is associated, referring back to the .fnent output shown earlier for the function.
The "prolog 20" value in that output tells us that the prolog code for the function is 0x20 bytes in length. Using that information we can dump out the prolog code for the function:
0:012> u clr!CLREvent::WaitEx clr!CLREvent::WaitEx+20
clr!CLREvent::WaitEx:
000007fe`f48bfdb0 488bc4             mov     rax,rsp
000007fe`f48bfdb3 56                 push    rsi
000007fe`f48bfdb4 57                 push    rdi
000007fe`f48bfdb5 4154               push    r12
000007fe`f48bfdb7 4155               push    r13
000007fe`f48bfdb9 4156               push    r14
000007fe`f48bfdbb 4883ec70           sub     rsp,70h
000007fe`f48bfdbf 48c7442440feffffff mov     qword ptr [rsp+40h],0FFFFFFFFFFFFFFFEh
000007fe`f48bfdc8 48895810           mov     qword ptr [rax+10h],rbx
000007fe`f48bfdcc 48896818           mov     qword ptr [rax+18h],rbp
The list of operations in the unwind info is listed in the reverse order of the operations in the assembly code. Each of the UWOP_PUSH_NONVOL operations in the unwind info maps to a nonvolatile register that is pushed onto the stack for safe keeping in the prolog code. I have highlighted the sections within the prolog and the .fnent output such that highlighting with like colors indicates related information. Now, let’s take a look at the raw stack and tie all of this information together.
Below is the stack again. The frame we are focusing on is frame 04:
0:012> kn
 # Child-SP          RetAddr           Call Site
00 00000000`04a51e18 000007fe`fd4e10ac ntdll!NtWaitForSingleObject+0xa
01 00000000`04a51e20 000007fe`f48bffc7 KERNELBASE!WaitForSingleObjectEx+0x79
02 00000000`04a51ec0 000007fe`f48bff70 clr!CLREvent::WaitEx+0x170
03 00000000`04a51f00 000007fe`f48bfe23 clr!CLREvent::WaitEx+0xf8
04 00000000`04a51f60 000007fe`f48d51d8 clr!CLREvent::WaitEx+0x5e
05 00000000`04a52000 000007fe`f4995249 clr!SVR::gc_heap::wait_for_gc_done+0x98
06 00000000`04a52030 000007fe`f48aef28 clr!SVR::GCHeap::Alloc+0xb4
07 00000000`04a520a0 000007fe`f48aecc9 clr!FastAllocatePrimitiveArray+0xc5
08 00000000`04a52120 000007fe`f071244c clr!JIT_NewArr1+0x389
09 00000000`04a522f0 000007fe`f07111b5 System_Numerics_ni+0x2244c
0a 00000000`04a52330 000007ff`00150acf System_Numerics_ni+0x211b5
0b 00000000`04a523d0 000007ff`0015098c 0x7ff`00150acf
0c 00000000`04a52580 000007ff`0015098c 0x7ff`0015098c
0d 00000000`04a52730 000007ff`0015098c 0x7ff`0015098c
0e 00000000`04a528e0 000007ff`0015098c 0x7ff`0015098c
0f 00000000`04a52a90 000007ff`0015098c 0x7ff`0015098c
10 00000000`04a52c40 000007ff`0015098c 0x7ff`0015098c
11 00000000`04a52df0 000007ff`0015098c 0x7ff`0015098c
12 00000000`04a52fa0 000007ff`0015098c 0x7ff`0015098c
13 00000000`04a53150 000007ff`0015098c 0x7ff`0015098c
Note: The symbols above look a little weird and may lead you to believe that WaitEx is calling itself recursively, but it is not. It only appears that way because you need the private symbols for clr.dll to be able to see the real function name. Only public symbols are available outside of Microsoft.
And below is the raw stack relevant to this frame with some highlighting and annotations that I have added:
0:012> dps 00000000`04a51f60-10 L20
00000000`04a51f50  00000000`00000001
00000000`04a51f58  000007fe`f48bfe23 clr!CLREvent::WaitEx+0x5e
00000000`04a51f60  00000000`c0402388
00000000`04a51f68  00000000`c0402500
00000000`04a51f70  000007fe`f48afaf0 clr!SystemNative::ArrayCopy
00000000`04a51f78  00000000`00000182
00000000`04a51f80  00000000`04a521d0
00000000`04a51f88  000007fe`00000001
00000000`04a51f90  00000000`00000057
00000000`04a51f98  00000000`c0402398
00000000`04a51fa0  ffffffff`fffffffe
00000000`04a51fa8  007f0000`04a521d0
00000000`04a51fb0  fffff880`009ca540
00000000`04a51fb8  000007fe`f483da5b clr!SVR::heap_select::select_heap+0x1c
00000000`04a51fc0  fffff880`009ca540
00000000`04a51fc8  000007fe`fd4e18aa KERNELBASE!ResetEvent+0xa
00000000`04a51fd0  00000000`0043dc60
00000000`04a51fd8  00000000`00000178
00000000`04a51fe0  00000000`00493c10
00000000`04a51fe8  00000000`0043dc60   <- saved rdi
00000000`04a51ff0  00000000`00000001
*** call into clr!CLREvent::WaitEx ***
00000000`04a51ff8  000007fe`f48d51d8 clr!SVR::gc_heap::wait_for_gc_done+0x98
00000000`04a52000  00000000`00493ba0
00000000`04a52008  00000000`00493ba0   <- saved rbx
00000000`04a52010  00000000`00000058   <- saved rbp
00000000`04a52018  000007fe`f0711e0f System_Numerics_ni+0x21e0f
00000000`04a52020  00000000`00000178
00000000`04a52028  000007fe`f4995249 clr!SVR::GCHeap::Alloc+0xb4
00000000`04a52030  00000000`0043a140
00000000`04a52038  00000000`0043dc60
00000000`04a52040  00000000`00000000
00000000`04a52048  00000000`04a522e0
The stack listing shows how the data on the raw stack correlates to the unwind data. The Child-SP value for this frame (00000000`04a51f60) is the lowest address in the frame.
The nonvolatile registers that are pushed onto the stack in the prolog code (rsi, rdi, r12, r13 and r14) occupy the five slots from 00000000`04a51fd0 through 00000000`04a51ff0, at the addresses just below the return address at 00000000`04a51ff8. The 0x70 bytes starting at 00000000`04a51f60 are the stack space reserved for locals and for register home space allocated for calling subroutines. In the unwind data the stack reservation is represented by a UWOP_ALLOC_SMALL operation. Finally, rbx and rbp are nonvolatile registers that are stored in the home space of the previous stack frame, at 00000000`04a52008 and 00000000`04a52010, and each is represented by a UWOP_SAVE_NONVOL operation in the unwind information.
As you can see, we have all of the information we need in the unwind data to determine which slots on the stack are used for what. The only thing we don’t know is the partitioning of the reserved stack space for locals, which is described by the private symbol information for the clr.dll module.
.fnent produces its output directly from parsing the UNWIND_INFO structure and it even gives you the address of where that structure lives in memory. The UNWIND_INFO structure also contains a variable number of UNWIND_CODE structures. You can find details of the structure definitions for UNWIND_INFO and UNWIND_CODE here. Each parsed line of unwind information in the .fnent output is backed by at least one of these structures. In fact, you can see the correlation between the structure fields for UNWIND_CODE and the data in the .fnent output as shown below:
From UNWIND_CODE:
UBYTE       Offset in prolog
UBYTE: 4    Unwind operation code
UBYTE: 4    Operation info
From .fnent:
05: offs b, unwind op 0, op info e UWOP_PUSH_NONVOL
The meaning of the OpInfo (operation info) field is dependent on the UnwindOp (unwind operation code) field and is spelled out in the documentation for UNWIND_CODE. For example, for the UWOP_PUSH_NONVOL operation shown above, the OpInfo field is an index into the following table, which indicates which nonvolatile register this push is associated with. Note that the values in the table below are in decimal, while the .fnent values are in hex:
Value      Register
0          RAX
1          RCX
2          RDX
3          RBX
4          RSP
5          RBP
6          RSI
7          RDI
8 to 15    R8 to R15
Therefore, the previous line from the .fnent output represents a push operation for the r14 register (05: offs b, unwind op 0, op info e UWOP_PUSH_NONVOL). Looking at the assembly above, we see that the topmost UWOP_PUSH_NONVOL operation in the .fnent output correlates to the last nonvolatile register push in the prolog code (push r14).
Note: Remember, the push operations in the .fnent output are listed in the reverse order of where they are in the actual prolog code. This helps the unwind code easily calculate offsets of where they should live in the stack.
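To tie the structure layout and the register table together, here is a minimal Python sketch that decodes one two-byte UNWIND_CODE slot. The two raw bytes (0x0B, 0xE0) are my own reconstruction of what the r14 push would look like in memory, assuming the unwind operation code occupies the low four bits of the second byte and the operation info the high four bits; I did not dump them from the process. The function and table names are also mine.

```python
# Register numbering from the table above (index = OpInfo value).
REGISTERS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"] + [
    f"r{n}" for n in range(8, 16)
]

# Only the operation codes that appear in this post.
UWOP_NAMES = {0: "UWOP_PUSH_NONVOL", 2: "UWOP_ALLOC_SMALL", 4: "UWOP_SAVE_NONVOL"}


def decode_unwind_code(slot: bytes):
    """Split a 2-byte UNWIND_CODE slot into (prolog offset, op, op info)."""
    prolog_offset = slot[0]
    unwind_op = slot[1] & 0x0F   # low nibble (assumed bit-field layout)
    op_info = slot[1] >> 4       # high nibble
    return prolog_offset, unwind_op, op_info


# Hypothetical raw bytes for "05: offs b, unwind op 0, op info e".
offset, op, info = decode_unwind_code(bytes([0x0B, 0xE0]))
print(f"offs {offset:x}, unwind op {op:x}, op info {info:x} "
      f"{UWOP_NAMES[op]} -> {REGISTERS[info]}")
# offs b, unwind op 0, op info e UWOP_PUSH_NONVOL -> r14
```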
One thing that you will notice in the x64 calling convention is that once the prolog code has executed, the value for rsp will very rarely change. The Child-SP value in the stack displayed by the k commands is the value of rsp for that frame after the prolog code has executed. So the offsets to access these nonvolatile registers are then applied to the Child-SP value (previously highlighted in green) to find where they live on the stack. So, in a way, the Child-SP value acts like the base pointer we are used to on the x86 platform.
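Because rsp stays put after the prolog, the unwind codes are enough to step from one frame to the next by hand. The Python sketch below does that arithmetic for the WaitEx frame. It is only an illustration covering the three operation codes seen in this post: each UWOP_PUSH_NONVOL accounts for 8 bytes, UWOP_ALLOC_SMALL accounts for OpInfo * 8 + 8 bytes, and UWOP_SAVE_NONVOL does not move rsp. The names are mine.

```python
UWOP_PUSH_NONVOL = 0
UWOP_ALLOC_SMALL = 2
UWOP_SAVE_NONVOL = 4

# (unwind op, op info) pairs copied from the .fnent output for WaitEx.
UNWIND_CODES = [
    (4, 0x5), (4, 0x3),                                  # save rbp, rbx
    (2, 0xD),                                            # sub rsp,70h
    (0, 0xE), (0, 0xD), (0, 0xC), (0, 0x7), (0, 0x6),    # pushes
]

CHILD_SP = 0x04A51F60  # Child-SP of frame 04


def frame_size(codes) -> int:
    """Bytes the prolog moved rsp by, according to the unwind codes."""
    size = 0
    for op, info in codes:
        if op == UWOP_PUSH_NONVOL:
            size += 8
        elif op == UWOP_ALLOC_SMALL:
            size += info * 8 + 8
        elif op == UWOP_SAVE_NONVOL:
            pass  # a mov into existing stack space; rsp is unchanged
        else:
            raise ValueError(f"operation {op} is not handled in this sketch")
    return size


size = frame_size(UNWIND_CODES)
return_address_slot = CHILD_SP + size
caller_child_sp = return_address_slot + 8

print(hex(size))                 # 0x98
print(hex(return_address_slot))  # 0x4a51ff8
print(hex(caller_child_sp))      # 0x4a52000
```

The return address slot matches the raw stack dump (it holds clr!SVR::gc_heap::wait_for_gc_done+0x98), and the caller's Child-SP matches frame 05 in the k output.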
In the .fnent output above, you will also see the following:
00: offs 20, unwind op 4, op info 5 UWOP_SAVE_NONVOL FrameOffset: b0
For UWOP_SAVE_NONVOL, you see that the .fnent output shows us the offset where we can find this register, and the register in question is represented by the OpInfo value that equates to rbp. The offset above is applied to the Child-SP value (00000000`04a51f60 in this case) to produce the address 00000000`04a52010, which indicates that’s where we can find a saved copy of rbp. I have also annotated where it lives in the raw stack output shown previously.
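The same offset arithmetic works for both UWOP_SAVE_NONVOL entries. Here is a short Python sketch, with names of my own choosing, that applies the FrameOffset values printed by .fnent to the Child-SP of the frame:

```python
CHILD_SP = 0x04A51F60  # Child-SP for the clr!CLREvent::WaitEx frame

# FrameOffset values from the two UWOP_SAVE_NONVOL lines in .fnent.
SAVED_REGISTER_OFFSETS = {"rbp": 0xB0, "rbx": 0xA8}


def saved_register_address(child_sp: int, frame_offset: int) -> int:
    """Address on the stack where the prolog stored the register."""
    return child_sp + frame_offset


for register, frame_offset in SAVED_REGISTER_OFFSETS.items():
    address = saved_register_address(CHILD_SP, frame_offset)
    print(register, hex(address))
# rbp 0x4a52010
# rbx 0x4a52008
```

Both addresses line up with the "saved rbp" and "saved rbx" annotations in the raw stack output.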
Note: If you’re wondering why rbp is stored in the previous stack frame, check out my previous post on this topic where I describe how in optimized builds, the compiler can use the home space from the previous stack frame to save nonvolatile registers thus saving them with a MOV operation as opposed to a PUSH operation. This is possible because in optimized builds the home space is not necessarily used to store parameters.
If you have been wondering where the unwind information for JIT compiled code comes from, then you are definitely paying attention! As we have shown, the compiler and linker are responsible for placing unwind info in the Portable Executable file at build time. But what about dynamic code that is generated at runtime? Certainly there must be unwind information for dynamically compiled code as well, otherwise there would be no way to walk the stack or unwind the stack after an exception.
As it turns out, APIs exist for this very situation, including RtlAddFunctionTable and RtlInstallFunctionTableCallback. In fact, the CLR uses RtlInstallFunctionTableCallback. The generated unwind information is then rooted in a linked list where the head is at ntdll!RtlpDynamicFunctionTable. The format of the linked list items is undocumented as it is an implementation detail, but using dbghelp.dll you can find the unwind information for a given instruction pointer if you so desire by calling SymFunctionTableAccess64.
In fact, if you want to see the CLR adding dynamic unwind info in action you can run the test code above under the debugger, and then at the initial breakpoint, before the application starts running, set the following breakpoint:
bu ntdll!RtlInstallFunctionTableCallback
When you let the application run you should then end up with a call stack at the breakpoint that looks like the following, which clearly shows the JIT compiler adding the unwind info to the table dynamically:
0:000> kn
 # Child-SP          RetAddr           Call Site
00 00000000`0017dca8 000007fe`f4832cc6 ntdll!RtlInstallFunctionTableCallback
01 00000000`0017dcb0 000007fe`f4831422 clr!InstallEEFunctionTable+0x77
02 00000000`0017df60 000007fe`f4828ca8 clr!StubLinker::EmitUnwindInfo+0x492
03 00000000`0017e050 000007fe`f4832c1a clr!StubLinker::EmitStub+0xe8
04 00000000`0017e0b0 000007fe`f48328e5 clr!StubLinker::LinkInterceptor+0x1ea
05 00000000`0017e160 000007fe`f4831e40 clr!CTPMethodTable::CreateStubForNonVirtualMethod+0xa35
06 00000000`0017e300 000007fe`f4832926 clr!CRemotingServices::GetStubForNonVirtualMethod+0x50
07 00000000`0017e3c0 000007fe`f48223f3 clr!MethodDesc::DoPrestub+0x38b
08 00000000`0017e4d0 000007fe`f47e2d07 clr!PreStubWorker+0x1df
09 00000000`0017e590 000007fe`f48210b4 clr!ThePreStubAMD64+0x87
0a 00000000`0017e660 000007fe`f48211c9 clr!CallDescrWorker+0x84
0b 00000000`0017e6d0 000007fe`f4821245 clr!CallDescrWorkerWithHandler+0xa9
0c 00000000`0017e750 000007fe`f4823cf1 clr!MethodDesc::CallDescr+0x2a1
0d 00000000`0017e9b0 000007fe`f49cdc3d clr!MethodDescCallSite::Call+0x35
0e 00000000`0017e9f0 000007fe`f4999f0d clr!AppDomain::InitializeDomainContext+0x1ac
0f 00000000`0017ebf0 000007fe`f49212a1 clr!SystemDomain::InitializeDefaultDomain+0x13d
10 00000000`0017f0c0 000007fe`f4923dd6 clr!SystemDomain::ExecuteMainMethod+0x191
11 00000000`0017f670 000007fe`f4923cf3 clr!ExecuteEXE+0x43
12 00000000`0017f6d0 000007fe`f49a7365 clr!CorExeMainInternal+0xc4
13 00000000`0017f740 000007fe`f8ad3309 clr!CorExeMain+0x15
But there is one more wrinkle to this picture. We now know that by using RtlInstallFunctionTableCallback the CLR, or any other JIT engine, can register a callback that provides the unwind information at runtime. But how does the debugger access this information? When the debugger is broken into the process or if you are debugging a dump, it cannot execute the callback function registered with RtlInstallFunctionTableCallback.
This is where the sixth and final parameter to RtlInstallFunctionTableCallback comes into play. By providing the OutOfProcessCallbackDll parameter, the CLR is providing a dll which the debugger can use to effectively parse through the JITer’s unwind information statically. When inspecting which path the CLR passes for OutOfProcessCallbackDll on my machine, I see the following string:
0:000> du /c 80 000007fe`f5916160
000007fe`f5916160  "C:\Windows\Microsoft.NET\Framework64\v4.0.30319\mscordacwks.dll"
So, the debugger uses mscordacwks.dll to statically examine the unwind info while the process is broken in the debugger or while inspecting a dump.
Note: This is one of the many reasons why you must have a complete process dump to effectively post-mortem debug managed applications.
If you look at the documentation for the k command, you’ll see that there is a way to override the base pointer when walking the stack. However, the documentation leaves it a complete mystery as to how to apply this in the x64 world. To demonstrate what I mean, consider the stack shown earlier for thread 12.
Now, imagine the top of the stack is corrupted so that the top few frames cannot be trusted. Furthermore, let’s assume that we identified a frame where the stack starts to look sane again by looking at the raw stack below:
0:012> dps 00000000`04a51e90
00000000`04a51e90  00000000`00000000
00000000`04a51e98  00000000`04a52130
00000000`04a51ea0  00000000`ffffffff
00000000`04a51ea8  00000000`ffffffff
00000000`04a51eb0  00000000`00000108
00000000`04a51eb8  000007fe`f48bffc7 clr!CLREvent::WaitEx+0x170
00000000`04a51ec0  00000000`00000000
00000000`04a51ec8  00000000`00000108
00000000`04a51ed0  000007fe`00000000
00000000`04a51ed8  00000000`00000108
00000000`04a51ee0  ffffffff`fffffffe
00000000`04a51ee8  00000000`00000001
00000000`04a51ef0  00000000`00000000
00000000`04a51ef8  000007fe`f48bff70 clr!CLREvent::WaitEx+0xf8
00000000`04a51f00  00000000`00000000
00000000`04a51f08  00000000`00493ba0
From looking at this stack, we can see the typical pattern of stack frames because the return addresses resolve to symbols of sorts.
To dump out the corrupted stack, here is the undocumented syntax for the x64 platform:
k = <rsp> <rip> <frame_count>
<rsp> is the stack pointer to start with. You want to use the stack pointer that would have been in rsp when that function was active. Remember, typically rsp does not change after the function prolog code completes. Therefore, if you pick the stack pointer just below the return address, you should be good.
<rip> should be an instruction pointer from within the function that was executing at the time the <rsp> value above was in play. In this case, the return address directly above <rsp> in the dump (000007fe`f48bffc7, stored at 00000000`04a51eb8) comes from that function. This piece of information is critical so that the debugger can find the unwind metadata for the function that was current at this point in the stack. Without it, the debugger cannot walk the stack.
Armed with this information, you can construct a k command to dump the stack starting from this frame as shown below:
0:012> kn = 00000000`04a51ec0 000007fe`f48bffc7 10
 # Child-SP          RetAddr           Call Site
00 00000000`04a51ec0 000007fe`f48bff70 clr!CLREvent::WaitEx+0x170
01 00000000`04a51f00 000007fe`f48bfe23 clr!CLREvent::WaitEx+0xf8
02 00000000`04a51f60 000007fe`f48d51d8 clr!CLREvent::WaitEx+0x5e
03 00000000`04a52000 000007fe`f4995249 clr!SVR::gc_heap::wait_for_gc_done+0x98
04 00000000`04a52030 000007fe`f48aef28 clr!SVR::GCHeap::Alloc+0xb4
05 00000000`04a520a0 000007fe`f48aecc9 clr!FastAllocatePrimitiveArray+0xc5
06 00000000`04a52120 000007fe`f071244c clr!JIT_NewArr1+0x389
07 00000000`04a522f0 000007fe`f07111b5 System_Numerics_ni+0x2244c
08 00000000`04a52330 000007ff`00150acf System_Numerics_ni+0x211b5
09 00000000`04a523d0 000007ff`0015098c 0x7ff`00150acf
0a 00000000`04a52580 000007ff`0015098c 0x7ff`0015098c
0b 00000000`04a52730 000007ff`0015098c 0x7ff`0015098c
0c 00000000`04a528e0 000007ff`0015098c 0x7ff`0015098c
0d 00000000`04a52a90 000007ff`0015098c 0x7ff`0015098c
0e 00000000`04a52c40 000007ff`0015098c 0x7ff`0015098c
0f 00000000`04a52df0 000007ff`0015098c 0x7ff`0015098c
Note: The frame count in the above k expression is required. That is the way the debugger engine distinguishes between this variant of the command (with an overridden rip) and the documented form of k that does not provide an overridden rip.
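If you find yourself doing this often, the command is easy to assemble from the one slot you trust on the raw stack. The Python sketch below is just a convenience of my own (the function names are hypothetical): given the address of a return address slot and the value stored in it, it builds the k expression using the rule described above, where <rsp> is the slot immediately following the return address.

```python
def fmt(address: int) -> str:
    """Format a 64-bit address the way windbg prints it."""
    return f"{address >> 32:08x}`{address & 0xFFFFFFFF:08x}"


def build_k_command(return_slot: int, return_address: int, frames: int = 0x10) -> str:
    """Build 'kn = <rsp> <rip> <frame_count>' from a return address slot.

    return_slot    -- stack address where the return address is stored
    return_address -- the value in that slot (an rip inside the caller)
    frames         -- frame count; required by this form of the command
    """
    rsp = return_slot + 8  # rsp of the function being returned into
    return f"kn = {fmt(rsp)} {fmt(return_address)} {frames:x}"


# Values taken from the dps output above.
print(build_k_command(0x04A51EB8, 0x000007FEF48BFFC7))
# kn = 00000000`04a51ec0 000007fe`f48bffc7 10
```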
Since the x64 calling convention does not utilize a base pointer (among other things), we need some extra information to effectively walk the stack. That extra information comes in the form of unwind metadata and is generated by the compiler and linker for static code and baked into the portable executable file. If you happen to code in assembly language, there are various macros that you must use to decorate your assembly code so that the assembler can generate the proper unwind metadata. For dynamically compiled code, that information is instead provided at runtime by registering a callback with the system. Knowing this information is critical if you encounter a corrupted stack and must piece it together manually. In such situations you’ll need to know how to dig out the unwind metadata manually and use it to effectively reconstruct the call stack.
That said, you could spare yourself some effort and use the undocumented variant of the k command described above to dump the stack starting at any frame.
Happy debugging everyone!
"The example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious. No association with any real company, organization, product, domain name, email address, logo, person, places, or events is intended or should be inferred."
Author - Jason Epperly
Workspaces have always been a little confusing to me. I knew how to bend them to do what I needed to get the job done, however they still remained a bit mysterious. Recently I decided to sort this out, just so I knew how they worked under the hood. But before I show you my investigation let's discuss the different types of workspaces. Windbg uses several built-in types including Base, User, Kernel, Remote, Processor Architecture, Per Dump, and Per Executable. It also uses named workspaces (or user defined workspaces). When you perform a particular type of debugging (e.g. live user-mode, post-mortem dump analysis etc.) these workspaces are combined into the final environment. Here's a diagram to illustrate the possible combination of workspaces.
From the diagram you can see windbg typically uses a combination of two workspaces. While live kernel debugging it uses three workspaces.
So what is in a workspace?
All of these settings (except for the blue ones) are applied cumulatively (Base first, then the next workspace, etc). The blue items above are only loaded from the last workspace in the chain. To show this in action I created a simple walkthrough to illustrate the use of workspaces in the debugger.
First I opened windbg without the use of any command line options. When it opens in this dormant state (not attached to anything and with nothing opened) it's using the Base workspace. If I don't change anything (e.g. window placement) I am not prompted with any workspace dialogs when I start debugging. However if I moved the debugger's main window to any location (we will call this position 1) followed by executing any of the highlighted operations below -
I am prompted with this dialog-
Choosing "Yes" on the dialog above integrates my changes into the "Base" workspace so window position 1 is now part of my Base workspace.
Now I'm going to select "Open Executable" and browse to our old faithful target binary notepad.exe. Once the binary is opened, windbg uses Base+Notepad (per Executable file). Now I'll move the debugger's main window again (we will call this position 2) and choose the option Debug > Stop debugging. Because of the window location change, I am prompted with the following-
If I choose 'Yes', windbg will use window position 2 any time I open the notepad executable in the future. After closing the notepad.exe executable, windbg reverts back to using the Base workspace.
This time I'll actually launch notepad (not from the debugger) and attach to the running notepad.exe process with the debugger. We are now at Base+User-mode. I moved the debugger window (new position 3), selected Debug > Stop Debugging and get prompted with this dialog-
Choosing "Yes" will store WinDbg window position 3 in the User-mode workspace. Once I have completed this step, Windbg is again using the Base workspace because we stopped debugging.
To further illustrate workspaces I'll attach to a target Virtual Machine for Kernel Debugging but not break in. Windbg is using Base+Kernel now. I moved the window again and as soon as I break in I get this dialog-
I chose 'No' on the dialog because I'm getting the hang of things. If I move the window again and type qd (quit and detach) to end the current kernel debug session, I will see this dialog-
So before we ended the session, we were at Base+Kernel+AMD64.
Running through this exercise helped me understand why I usually create a named workspace, change all my settings and use the command line option -W to open my workspace. Hopefully this will clear up some of the complexities involved with workspaces. This is why the debugger help file recommends making all the changes you need at the lowest possible level (i.e. Base first, then the others).
Hope this helps...