• Ntdebugging Blog

    Remote Debugging connecting to a Remote Stub using the Microsoft Debugging Tools for Windows


    The Microsoft Debugging Tools for Windows provide a couple ways to create a remote debugging connection including "Connecting to a remote session" and "Connecting to a remote stub". Connecting to a remote session is the most commonly used of the two, and is all you typically need, but it does not allow the client to access debugging symbols missing on the remote.

    Jeff’s blog post here has more information on some of the intricacies of debugging a remote session. Often a remote debugging session will be done within a corporate network where all users have access to symbol information, but if we are debugging outside of the secure environment where we cannot share symbols freely, debugging becomes more difficult .

    The Microsoft debugging tools package has two server side command line applications to help us do this by connecting to a remote stub: Dbgsrv.exe for user mode debugging and Kdsrv.exe for kernel mode debugging. They provide a thin server layer for the debugger to attach with full use of local symbols and source code.

    Conventional remote debug session

    Let’s illustrate the benefits of using remote stubs by showing the problems encountered when connecting to a remote session. 


    This diagram shows the conventional remote debug session. Both the debugging host and target are running windbg, though we could use KD or CDB. The goal is to debug a buggy application called Getif.exe which runs on the debugging target. We have symbols for the application on the debugging host however for security reasons the debugging target does not have symbol access to Getif.exe. On both machines we have Windows internet symbols loaded from the http://msdl.microsoft.com/download/symbols.


    Here is the symbol path on the debugging target machine. Because this system is outside our secure environment it does not have symbol access for the Getif.exe application. This will be problematic for debugging.



    Here is the symbol path on the debugging host which includes a path to the symbols for the application because it’s inside the secure location.



    To create the remote debugging server on the target machine, I specify the local port to use for the connection by using the .server command:

    .server tcp:port=<localport>



    Next I select “Connect to Remote Session” on the debugging host.



    Then enter the port and server name:



    And this is where we run into the headache! Because we don’t have Getif.exe symbol access on the machine located outside of the secure environment, the stacks for getif.exe don’t display symbolic information. It makes it hard to debug the application because we can’t dump data types or view other symbolic info.


    If we want to debug Getif.exe with symbols, we need to copy the symbol files to the debug target machine and add the location to the target’s symbol path. In our scenario this may not be possible because symbols are often restricted to in-house use and not allowed to be copied freely.



    Dbgsrv to the rescue!!


    By activating Dbgsrv on the debugging target machine, the debugger on the debugging host connects with full symbol access to the user mode process on the target. The diagram above illustrates the process.

    Let's run through an example using dbgsrv on the debugging target. On the debugging target enter

     <path to debugging tools>\dbgsrv -t tcp:port=<localport>

    Dbgsrv returns to a command prompt with no output in the command window and the debug port can be any open port on the system. It’s a good idea to confirm we are listening on the port we selected by using netstat. Here’s an example -  


    Now let’s move our focus to the debugging host computer. Instead of choosing the typical “Connect a Remote Session”, we use the option, “Connect to a Remote Stub” from the File menu.


    Next enter the port and server name.


    After selecting OK, we are back to the empty Command window as expected. We won’t see any text until connected to the target process:


    Next select File-Attach to a Process option.


    It is important to note the “Attach to Process” dialog shows the list of processes running on the debugging target machine.


    Now let’s attach to the process. The title bar confirms that we connected to the process on the kdsrv target:


    Here you can see the benefit of connecting with a remote stub. The symbols for getif.exe are working for the application running on a machine located outside of the secure environment.




    But what if I need to Kernel debug?

    You can do the same thing with Kernel debugging using Kdsrv.


    If we need to attach to a kernel debugger we can use kdsrv. Much like the previous example we enter the command:

    kdsrv -t tcp:port=<localport>

    We can see that we are listening on the local port we selected:


    We must now start our debugger on the debugging host machine with a command line that includes the information needed to connect to the KdSrv host. As of Windbg we cannot attach to the kdsrv remote stub from the Windbg graphical interface and then make the kernel connection to the debugging target .


    Here is the string I used in the graphic above-

    windbg -k kdsrv:server=@{tcp:server=rodolpho,port=5004},trans=@{com:pipe,port=\\mimi\pipe\2008target}

    Note that we are debugging to a named pipe on a kernel debug target running under Microsoft Hyper-V! This is a great way to debug without the hassle of setting up a second machine and attaching with a null modem cable.

    If we wanted to attach to COM1 with a null-modem cable we would use:

    windbg -k kdsrv:server=@{tcp:server=rodolpho,port=5004},trans=@{com:port=com1,baud=11500}

    For more information on client and server syntax see http://msdn.microsoft.com/en-us/library/cc266440.aspx.

    After Windbg starts the command window will look just like if we attached with a local debugger. The title bar will confirm that we are attaching through the debug server:


    We can also change context to the Getif.exe process after connecting to the remote stub. When we look at the application stack we have symbols for our application:


    After we have our remote stub connection, we can then share our client as a remote session, but we will always handle the symbols on the remote stub client.

    Once you setup a remote stub a couple of times it’s as simple as setting up a remote session, and can make a remote debug go a lot smoother. For more information check out the debugging tools help, available outside of windbg as debugger.chm in the debugger directory, as well as http://msdn.microsoft.com/en-us/library/cc266434.aspx

    Security Note: These examples use TCP/IP or named pipes and are not password protected. The debugger remotes will allow SSL and Secure pipe connections with password and certificate authentication if you require greater security. See the online help or this link for more information.

  • Ntdebugging Blog

    Microsoft Windows Dynamic Cache Service


    Excessive cached read I/O is a growing problem.  For over one year we have been working on this problem with several companies.  You can read more about it in the original blog post:


    On 32 bit systems, the kernel could address at most 2GB of virtual memory.  This address range is shared and divided up for the many resources that the system needs; one of which is the System File Cache's working set.  On 32 bit systems the theoretical limit is almost 1GB for the cache’s working set; however, when a page is removed from the working set it will end up on the standby page list.  Therefore the system can cache more than the 1 GB limit if there is available memory. The working set; however, is just limited to what can be allocated within the Kernel's 2GB virtual address range.  Since most modern systems have more than 1 GB of physical RAM, the System File Cache's working set's size on a 32 bit system typically isn't a problem.

    With 64 bit systems, the kernel virtual address space is very large and is typically larger than physical RAM on most systems.  On these systems the System File Cache's working set can be very large and is typically about equal to the size of physical RAM.  If applications or file sharing performs a lot of sustained cached read I/O, the System File Cache's working set can grow to take over all of physical RAM.  If this happens, then process working sets are paged out and everyone starts fighting for physical pages and performance suffers.

    The only way to mitigate this problem is to use the provided APIs of GetSystemFileCacheSize() and SetSystemFileCacheSize().  The previous blog post "Too Much Cache" contains sample code and a compiled utility that can be used to manually set the System File Cache's working set size.

    The provided APIs, while offering one mitigation strategy, has a couple of limitations:

    1) There is no conflict resolution between multiple applications.  If you have two applications trying to set the System File Cache's working set size, the last one to call SetSystemFileCacheSize() will win.  There is no centralized control of the System File Cache's working set size.

    2) There is no guidance on what to set the System File Cache's working set size to.  There is no one size fits all solution.  A high cache working set size is good for file servers, but bad for large memory application and a low working set size could hurt everyone's I/O performance.  It is essentially up to 3rd party developers or IT administrators to determine what is best for their server and often times, the limits are determined by a best guesstimate backed by some testing.

    We fully understand that while we provide one way to mitigate this problem, the solution is not ideal. We spent a considerable amount of time reviewing and testing other options.  The problem is that there are so many varied scenarios on how users and applications rely on the System File Cache.  Some strategies worked well for the majority of usage scenarios, but ended up negatively impacting others.  We could not release any code change that would knowingly hurt several applications.

    We also investigated changing some memory manager architecture and algorithms to address these issues with a more elegant solution; however the necessary code changes are too extensive.  We are experimenting with these changes in Windows 7 and there is no way that we could back port them to the current operating systems.  If we did, we would be changing the underlying infrastructure that everyone has been accustomed to.  Such a change would require stress tests of all applications that run on Windows. The test matrix and the chance of regression are far too large.

    So that brings us back to the only provided solution - use the provided APIs.  While this isn't an ideal solution, it does work, but with the limitations mentioned above.  In order to help address these limitations, I've updated the SetCache utility to the Microsoft Windows Dynamic Cache Service.  While this service does not completely address the limitations above, it does provide some additional relief.

    The Microsoft Windows Dynamic Cache Service uses the provided APIs and centralizes the management of the System File Cache's working set size.  With this service, you can define a list of processes that you want to prioritize over the System File Cache by monitoring the working set sizes of your defined processes and back off the System File Cache's working set size accordingly.  It is always running in the background monitoring and dynamically adjusting the System File Cache's working set size.  The service provides you with many options such as adding additional slack space for each process' working set or to back off during a low memory event.

    Please note that this service is experimental and includes sample source code and a compiled binary.  Anyone is free to re-use this code in their own solution.  Please note that you may experience some performance side effects while using this service as it cannot possibly address all usage scenarios. There may be some edge usage scenarios that are negatively impacted.  The service only attempts to improve the situation given the current limitations.  Please report any bugs or observations here to this blog post.  While we may not be able to fix every usage problem, we will try to offer a best effort support.

    Side Effects may include:

    Cache page churn - If the System File Cache's working set is too low and there is sustained cached read I/O, the memory manager may not be able to properly age pages.  When forced to remove some pages in order to make room for new cache pages, the memory manager may inadvertently remove the wrong pages.  This could result in cached page churn and decreased disk performance for all applications.

    Version 1.0.0 - Initial Release


    NOTE: The memory management algorithms in Windows 7 and Windows Server 2008 R2 operating systems were updated to address many file caching problems found in previous versions of Windows. There are only certain unique situations when you need to implement the Dynamic Cache service on computers that are running Windows 7 or Windows Server 2008 R2.  For more information on how to determine if you are experiencing this issue and how to resolve it, please see the More Information section of Microsoft Knowledge Base article 976618 - You experience performance issues in applications and services when the system file cache consumes most of the physical RAM.

  • Ntdebugging Blog

    Challenges of Debugging Optimized x64 Code


    If you have not had the luxury of debugging optimized x64 code as of yet, don’t wait much longer and fall behind the times!  Due to the x64 fastcall-like calling convention coupled with the abundance of general purpose registers, finding variable values at arbitrary points in a call stack can be very tricky indeed.

    In this article, I’d like to detail some of my favorite techniques for debugging optimized x64 code.  But before digging into these techniques, let’s first have a quick overview of the x64 calling convention.

    The x64 Calling Convention

    Those of you familiar with the fastcall calling convention on x86 platforms will recognize the similarities to the x64 calling convention.  Whereas you typically have to maintain knowledge of multiple calling conventions on x86 platforms, on x64 platforms there is currently just one.  (Of course, I’m excluding the case of no calling convention which one can achieve with __declspec(naked) or by coding in straight assembly.)

    I won’t go into all of the various nuances of the x64 calling convention, therefore I recommend you check out the following link (http://msdn.microsoft.com/en-us/library/ms794533.aspx).  But commonly, the first four parameters into a function are passed via the registers rcx, rdx, r8, and r9.  If the function accepts more than four parameters, those parameters are passed on the stack.  (Those of you familiar with the x86 fastcall calling convention where the first two parameters are passed in ecx and edx will recognize the similarities).

    To help illustrate how the x64 calling convention works, I have created some simple example code.  Although the code is contrived and far from real-world code, it demonstrates some scenarios that are likely to encounter in the real word.  The code is shown below.

    #include <stdlib.h>

    #include <stdio.h>

    #include <windows.h>



    FunctionWith4Params( int param1, int param2, int param3,

                         int param4 )


        size_t lotsOfLocalVariables1 = rand();

        size_t lotsOfLocalVariables2 = rand();

        size_t lotsOfLocalVariables3 = rand();

        size_t lotsOfLocalVariables4 = rand();

        size_t lotsOfLocalVariables5 = rand();

        size_t lotsOfLocalVariables6 = rand();


        printf( "Entering FunctionWith4Params( %X, %X, %X, %X )\n",

                param1, param2, param3, param4 );

        printf( "Local variables: %X, %X, %X, %X, %X, %X \n",

                lotsOfLocalVariables1, lotsOfLocalVariables2,

                lotsOfLocalVariables3, lotsOfLocalVariables4,

                lotsOfLocalVariables5, lotsOfLocalVariables6 );




    FunctionWith5Params( int param1, int param2, int param3,

                         int param4, int param5 )


        FunctionWith4Params( param5, param4, param3, param2 );

        FunctionWith4Params( rand(), rand(), rand(), rand() );




    FunctionWith6Params( int param1, int param2, int param3,

                         int param4, int param5, int param6 )


        size_t someLocalVariable1 = rand();

        size_t someLocalVariable2 = rand();

        printf( "Entering %s( %X, %X, %X, %X, %X, %X )\n",


                param1, param2, param3, param4, param5, param6 );

        FunctionWith5Params( rand(), rand(), rand(),

                             param1, rand() );

        printf( "someLocalVariable1 = %X, someLocalVariable2 = %X\n",

                someLocalVariable1, someLocalVariable2 );



    main( int /*argc*/, TCHAR** /*argv*/ )


        // I use the rand() function throughout this code to keep

        // the compiler from optimizing too much.  If I had used

        // constant values, the compiler would have optimized all

        // of these away.

        int params[] = { rand(), rand(), rand(),

                         rand(), rand(), rand() };

        FunctionWith6Params( params[0], params[1], params[2],

                             params[3], params[4], params[5] );

        return 0;



    Cut and paste this code into a cpp file (such as example.cpp).  I used the Windows SDK (specifically the Windows SDK CMD Shell) to compile this code as C++ code by using the following command line:

    cl /EHa /Zi /Od /favor:INTEL64 example.cpp /link /debug

    Notice the /Od switch.  This disables all optimizations.  Later on, I’ll enable maximum optimization and that’s when the fun begins!

    Once you have the executable module built (mine is named example.exe), then you can fire it up in the debugger as follows:

    windbg -Q -c "bu example!main;g;" example.exe

    The command above will launch the application in windbg, set a breakpoint on the main() routine, and then go to that breakpoint.

    Now, let’s have a look at a diagram of what the stack looks like when FunctionWith6Params() gets called.  The diagram shown below illustrates the stack when the instruction pointer is at the beginning of the code for FunctionWith6Params() but before the prolog code has executed:


    Notice that the caller, in this case main(), allocated enough space on the stack for all six parameters to FunctionWith6Params() even though the first four parameters are passed in via registers.  The extra space on the stack is commonly referred to as the “home space” for the register parameters.  In the previous diagram, I have shown those slots filled with xxxxxxxx to indicate that the values within there are virtually random at this point.  That’s because the caller, main(), does not initialize these slots.  The called function, at its discretion, may store the first four parameters in this space for safe keeping.  This is exactly what happens in non-optimized builds and is a huge debugging convenience since you can easily find the contents of the first four parameters on the stack if you need to.  Additionally, windbg stack commands such as kb and kv which show these first few parameters will report true results.

    With all of that said, here is what the stack looks like after the prolog code in FunctionWith6Params() executes:


    The prolog assembly code for FunctionWith6Params() is shown below:

    0:000> uf .

    example!FunctionWith6Params [c:\temp\blog_entry\sample_code\example.cpp @ 28]:

       41 00000001`40015900 mov     dword ptr [rsp+20h],r9d
       41 00000001`40015905 mov     dword ptr [rsp+18h],r8d
       41 00000001`4001590a mov     dword ptr [rsp+10h],edx
       41 00000001`4001590e mov     dword ptr [rsp+8],ecx
       41 00000001`40015912 push    rbx
       41 00000001`40015913 push    rsi
       41 00000001`40015914 push    rdi
       41 00000001`40015915 sub     rsp,50h

    You can see that the first four instructions save the first four parameters on the stack in the home space allocated by main().  Then, the prolog code saves any non-volatile registers that FunctionWith6Params() plans to use during its execution.  The saved registers’ states are restored in the function epilog code prior to returning to the caller.  Finally the prolog code reserves some space on the stack, in this case, for 0x50 bytes.

    What is this space reserved on the top of the stack for?  First, space is created for any local variables.  In this case, FunctionWith6Params() has two.  However, those two local variables only account for 0x10 bytes.  What’s the deal with the rest of the space created on the top of the stack?

    On the x64 platform, when code prepares the stack for calling another function, it does not use push instructions to put the parameters on the stack as is commonly the case in x86 code.  Instead, the stack pointer typically remains fixed for a particular function.  The compiler looks at all of the functions the code in the current function calls, it finds the one with the maximum number of parameters, and then creates enough space on the stack to accommodate those parameters.  In this example, FunctionWith6Params() calls printf() passing it 8 parameters.  Since that is the called function with the maximum number of parameters, the compiler creates 8 slots on the stack.  The top four slots on the stack will then be the home space used by any functions FunctionWith6Params() calls.

    A handy side effect of the x64 calling convention is that once you are inside the bracket of the prolog and epilog of a function, the stack pointer does not change while the instruction pointer is in that function.  This eliminates the need for a base pointer which is common in x86 calling conventions.  When the code in FunctionWith6Params() prepares to call a child function, it simply puts the first four parameters into the required registers and, if there are more than 4 parameters, it uses mov instructions to place the remaining parameters in the allocated stack space but making sure to skip the first four parameter slots on the stack.

    Debugging Optimized x64 code (The Nightmare Begins)

    Why is debugging x64 optimized code so tricky?  Well, remember that home space that the caller creates on the stack for the callee to save the first four parameters?  It turns out that the calling convention does not require the callee to use that space!  And you can certainly bet that optimized x64 code will not use that space unless it is necessary and convenient for its optimization purposes.  Moreover, when optimized code does use the home space, it could use it to store non-volatile registers rather than the first four parameters to the function.

    Go ahead and recompile the example code using the following command line:

    cl /EHa /Zi /Ox /favor:INTEL64 example.cpp /link /debug

    Notice the use of the /Ox switch.  This turns on maximum optimization.  Debug symbols are still turned on so we can debug the optimized code easily.  Always build your release product with debug information turned on so you can debug your release builds!

    Let’s look at how the prolog assembly code for FunctionWith6Params() has changed:

       41 00000001`400158e0 mov     qword ptr [rsp+8],rbx
       41 00000001`400158e5 mov     qword ptr [rsp+10h],rbp
       41 00000001`400158ea mov     qword ptr [rsp+18h],rsi
       41 00000001`400158ef push    rdi
       41 00000001`400158f0 push    r12
       41 00000001`400158f2 push    r13
       41 00000001`400158f4 sub     rsp,40h
       41 00000001`400158f8 mov     ebx,r9d
       41 00000001`400158fb mov     edi,r8d
       41 00000001`400158fe mov     esi,edx
       41 00000001`40015900 mov     r12d,ecx

    The optimized code is significantly different!  Let’s itemize the changes below:

    ·         The function uses the home space on the stack, however, it does not store the first four parameters there.  Instead it uses the space to store some non-volatile registers it must restore later in the epilog code.  This optimized code is going to make use of more processor registers, therefore it must save more of the non-volatile registers.

    ·         It still pushes three non-volatile registers onto the stack for safe keeping along with the other three it stored in the home space.

    ·         It then creates space on the stack.  However, it’s less space than in the non-optimized code, and is only 0x40 bytes.  That’s because the optimized code uses registers to represent the local variables someLocalVariable1 and someLocalVariable2.  Therefore, it only has to create space for the 8 slots needed to call the function with the maximum number of parameters, printf().

    ·         It then stores the first four parameters into non-volatile registers rather than in the home space. (Don’t count on this behavior.  An optimized function may make no copies of the contents of rcx, rdx, r8, and r9.  It all depends on the structure of the code)

    Now step through FunctionWith6Params() to the source line just after the first printf() call.  The output generated from the printf() call on my machine is as follows:

    Entering FunctionWith6Params( 29, 4823, 18BE, 6784, 4AE1, 3D6C )

    A common version of the stack command in windbg is kb, which also displays the first few parameters to each function in the frame.  In reality, it is displaying the first few positions of the stack.  The output for the kb command is as follows:

    0:000> kb
    RetAddr           : Args to Child                                                           : Call Site
    00000001`4001593b : 00000000`00004ae1 00000000`00004823 00000000`000018be 00000000`007e3570 : example!FunctionWith6Params+0x6a [c:\temp\blog_entry\sample_code\example.cpp @ 37]
    00000001`40001667 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000001 : example!main+0x5b [c:\temp\blog_entry\sample_code\example.cpp @ 57]
    00000000`76d7495d : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : example!__tmainCRTStartup+0x15b
    00000000`76f78791 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
    00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d

    Notice that not all of the first four parameters of FunctionWith6Params() match what is shown by the kb command!  Of course, this is a side effect of the optimization.  You simply cannot trust the output displayed by kb and kv in optimized code.  This is the biggest reason why optimized x64 code is so difficult to debug.  Trust me when I say that it’s just pure luck that the second and third slots in the kb output above match the actual parameter values to FunctionWith6Params().  It’s because FunctionWith6Params() stores non-volatile registers in those slots and it just so happens that main() put those values in those non-volatile registers prior to calling FunctionWith6Params().

    Parameter Sleuthing -- Technique 1 (Down the Call Graph)

    Now, let’s look at some techniques for finding elusive function parameters to functions in the call stack while running x64 code.  I have placed a DebugBreak() call in FunctionWith4Params() to illustrate.  Go ahead and let the code run in windbg until it hits this breakpoint.  Now, imagine what you are looking at is actually not a live debugging scenario but rather a dump file from a customer of yours and this is the point where your application has crashed.  So, you take a look at the stack and it looks like the following:

    0:000> kL
    Child-SP          RetAddr           Call Site
    00000000`0012fdc8 00000001`40015816 ntdll!DbgBreakPoint
    00000000`0012fdd0 00000001`400158a0 example!FunctionWith4Params+0x66
    00000000`0012fe50 00000001`40015977 example!FunctionWith5Params+0x20
    00000000`0012fe80 00000001`40015a0b example!FunctionWith6Params+0x97
    00000000`0012fee0 00000001`4000168b example!main+0x5b
    00000000`0012ff20 00000000`7733495d example!__tmainCRTStartup+0x15b
    00000000`0012ff60 00000000`77538791 kernel32!BaseThreadInitThunk+0xd
    00000000`0012ff90 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

    Now, let’s say that in order for you to figure out what went wrong, you need to know the first parameter to FunctionWith6Params().  Assume you have not seen the first parameter in the console output. No fair cheating!

    The first technique I would like to illustrate involves digging downward into the call graph  to find out what has happened to the contents of rcx (the first parameter) after entering FunctionWith6Params().  In this case, since the parameters are 32bit integers, we’ll be attempting to follow the contents of ecx, which is the lower half of rcx.

    Let’s start by looking at the assembly code within FunctionWith6Params() starting from the beginning up to the call into FunctionWith5Params()::

    0:000> u example!FunctionWith6Params example!FunctionWith6Params+0x97
    example!FunctionWith6Params [c:\temp\blog_entry\sample_code\example.cpp @ 41]:
    00000001`400158e0 mov     qword ptr [rsp+8],rbx
    00000001`400158e5 mov     qword ptr [rsp+10h],rbp
    00000001`400158ea mov     qword ptr [rsp+18h],rsi
    00000001`400158ef push    rdi
    00000001`400158f0 push    r12
    00000001`400158f2 push    r13
    00000001`400158f4 sub     rsp,40h
    00000001`400158f8 mov     ebx,r9d
    00000001`400158fb mov     edi,r8d
    00000001`400158fe mov     esi,edx
    00000001`40015900 mov     r12d,ecx
    00000001`40015903 call    example!rand (00000001`4000148c)
    00000001`40015908 movsxd  r13,eax
    00000001`4001590b call    example!rand (00000001`4000148c)
    00000001`40015910 lea     rdx,[example!`string'+0x68 (00000001`40020d40)]
    00000001`40015917 movsxd  rbp,eax
    00000001`4001591a mov     eax,dword ptr [rsp+88h]
    00000001`40015921 lea     rcx,[example!`string'+0x80 (00000001`40020d58)]
    00000001`40015928 mov     dword ptr [rsp+38h],eax
    00000001`4001592c mov     eax,dword ptr [rsp+80h]
    00000001`40015933 mov     r9d,esi
    00000001`40015936 mov     dword ptr [rsp+30h],eax
    00000001`4001593a mov     r8d,r12d
    00000001`4001593d mov     dword ptr [rsp+28h],ebx
    00000001`40015941 mov     dword ptr [rsp+20h],edi
    00000001`40015945 call    example!printf (00000001`400012bc)
    00000001`4001594a call    example!rand (00000001`4000148c)
    00000001`4001594f mov     edi,eax
    00000001`40015951 call    example!rand (00000001`4000148c)
    00000001`40015956 mov     esi,eax
    00000001`40015958 call    example!rand (00000001`4000148c)
    00000001`4001595d mov     ebx,eax
    00000001`4001595f call    example!rand (00000001`4000148c)
    00000001`40015964 mov     r9d,r12d
    00000001`40015967 mov     r8d,esi
    00000001`4001596a mov     edx,ebx
    00000001`4001596c mov     ecx,eax
    00000001`4001596e mov     dword ptr [rsp+20h],edi
    00000001`40015972 call    example!ILT+5(?FunctionWith5ParamsYAXHHHHHZ) (00000001`4000100a)

    FunctionWith6Params() copies ecx into r12d to preserve it for later use since the contents must be passed to multiple functions within the body of FunctionWith6Params().  Notice at the point where FunctionWith5Params() is called, the contents of ecx have been copied into both r12d and r9d, however, r9d is volatile so we must be careful with it since it could get overwritten prior to the next function call when FunctionWith5Params() calls FunctionWith4Params().  Armed with this information, let’s dig into the assembly code for FunctionWith5Params() that has executed up to this point:

    0:000> u example!FunctionWith5Params example!FunctionWith5Params+0x20
    example!FunctionWith5Params [c:\temp\blog_entry\sample_code\example.cpp @ 32]:
    00000001`40015880 mov     qword ptr [rsp+8],rbx
    00000001`40015885 mov     qword ptr [rsp+10h],rsi
    00000001`4001588a push    rdi
    00000001`4001588b sub     rsp,20h
    00000001`4001588f mov     ecx,dword ptr [rsp+50h]
    00000001`40015893 mov     eax,r9d
    00000001`40015896 mov     r9d,edx
    00000001`40015899 mov     edx,eax
    00000001`4001589b call    example!ILT+10(?FunctionWith4ParamsYAXHHHHZ) (00000001`4000100f)

    At the point where FunctionWith4Params() is called, the value we are after is now in eax, edx, and r12d.  Again, be careful with eax and edx as they are volatile.  However, since FunctionWith5Params() did not touch r12d, the contents of the parameter we are still after are still in r12d

    Now, let’s look at the code in FunctionWith4Params() that has executed so far:

    0:000> u example!FunctionWith4Params example!FunctionWith4Params+0x66
    example!FunctionWith4Params [c:\temp\blog_entry\sample_code\example.cpp @ 9]:
    00000001`400157b0 48895c2408      mov     qword ptr [rsp+8],rbx
    00000001`400157b5 48896c2410      mov     qword ptr [rsp+10h],rbp
    00000001`400157ba 4889742418      mov     qword ptr [rsp+18h],rsi
    00000001`400157bf 57              push    rdi
    00000001`400157c0 4154            push    r12
    00000001`400157c2 4155            push    r13
    00000001`400157c4 4156            push    r14
    00000001`400157c6 4157            push    r15
    00000001`400157c8 4883ec50        sub     rsp,50h
    00000001`400157cc 458be1          mov     r12d,r9d
    00000001`400157cf 458be8          mov     r13d,r8d
    00000001`400157d2 448bf2          mov     r14d,edx
    00000001`400157d5 448bf9          mov     r15d,ecx
    00000001`400157d8 e8afbcfeff      call    example!rand (00000001`4000148c)
    00000001`400157dd 4898            cdqe
    00000001`400157df 4889442448      mov     qword ptr [rsp+48h],rax
    00000001`400157e4 e8a3bcfeff      call    example!rand (00000001`4000148c)
    00000001`400157e9 4898            cdqe
    00000001`400157eb 4889442440      mov     qword ptr [rsp+40h],rax
    00000001`400157f0 e897bcfeff      call    example!rand (00000001`4000148c)
    00000001`400157f5 4863e8          movsxd  rbp,eax
    00000001`400157f8 e88fbcfeff      call    example!rand (00000001`4000148c)
    00000001`400157fd 4863f0          movsxd  rsi,eax
    00000001`40015800 e887bcfeff      call    example!rand (00000001`4000148c)
    00000001`40015805 4863f8          movsxd  rdi,eax
    00000001`40015808 e87fbcfeff      call    example!rand (00000001`4000148c)
    00000001`4001580d 4863d8          movsxd  rbx,eax
    00000001`40015810 ff15a24b0100    call    qword ptr [example!_imp_DebugBreak (00000001`4002a3b8)]

    We just found what we are looking for!  The red highlighted line shows r12 being saved on the stack because FunctionWith4Params() wants to reuse r12.  Since r12 is a non-volatile register, it must save the contents somewhere so it can restore the contents before the function exits.  All we have to do is locate that slot on the stack, and assuming that the stack has not been corrupted, we’ll have our prize.

    One technique for finding the slot is to start with the Child-SP value associated with the FunctionWith4Params() frame in the stack dump shown previously, which is 00000000`0012fdd0 in my build.  Using that value, let’s dump the stack content using the dps command:

    0:000> dps 00000000`0012fdd0 L10
    00000000`0012fdd0  00000001`00000001
    00000000`0012fdd8  00000001`40024040 example!_iob+0x30
    00000000`0012fde0  00000000`00000000
    00000000`0012fde8  00000001`40002f9e example!_getptd_noexit+0x76
    00000000`0012fdf0  00000000`00261310
    00000000`0012fdf8  00000001`40001a92 example!_unlock_file2+0x16
    00000000`0012fe00  00000000`00000001
    00000000`0012fe08  00000000`00004823
    00000000`0012fe10  00000000`000041bb
    00000000`0012fe18  00000000`00005af1
    00000000`0012fe20  00000000`00000000
    00000000`0012fe28  00000000`00000000
    00000000`0012fe30  00000000`00002cd6
    00000000`0012fe38  00000000`00000029
    00000000`0012fe40  00000000`00006952
    00000000`0012fe48  00000001`400158a0 example!FunctionWith5Params+0x20 [c:\temp\blog_entry\sample_code\example.cpp @ 34]

    I have highlighted the position that rsp points to when we enter FunctionWith4Params() in red.  Based on the prolog code shown for FunctionWith4Params() above, we can find the slot where our prize is stored.  I have highlighted it in green above and you can see the value on my machine is 0x29, which matches the value printf() sent to the console.  Additionally, I highlighted r14d in green in the assembly code for FunctionWith4Params() to indicate where the contents of edx (the second parameter) were copied to.  Since FunctionWith4Params() is virtually the top function on the stack (due to the fact that DebugBreak() takes no parameters), then r14d should also contain the value we are after.  Dumping the contents of r14 proves this as shown below:

    0:000> r r14

    To sum up, when you are chasing register-passed parameter values down through a call graph, look for places where the value is copied into.  Specifically, if the value is copied into a non-volatile register, that can be a good thing.  If a downstream function wants to reuse that non-volatile register, it must first save the contents (usually on the stack) so it can restore it when it is done.  If you’re not that lucky, you may be able to trace a register it was copied into which has not been changed at the breakpoint.  Both conditions were shown above.

    Parameter Sleuthing -- Technique 2 (Up the Call Graph)

    The second technique I would like to demonstrate is very similar to the first technique except that we walk the stack/call-graph in the opposite direction as before, that is, up the call graph.  Unfortunately, none of these techniques are fool proof and guaranteed to bear fruit.  So, it’s nice to have multiple techniques to employ even though all of them may strike out.

    We know that when FunctionWith6Params() gets called, ecx contains the value we are after.  Therefore, if we look at the code for main(), maybe we can find the source from which the ecx register was filled prior to the function call.  Let’s have a look as the assembly code in main():

    0:000> u example!main example!main+0x5b
    example!main [c:\temp\blog_entry\sample_code\example.cpp @ 58]:
    00000001`400159b0 48895c2408      mov     qword ptr [rsp+8],rbx
    00000001`400159b5 48896c2410      mov     qword ptr [rsp+10h],rbp
    00000001`400159ba 4889742418      mov     qword ptr [rsp+18h],rsi
    00000001`400159bf 48897c2420      mov     qword ptr [rsp+20h],rdi
    00000001`400159c4 4154            push    r12
    00000001`400159c6 4883ec30        sub     rsp,30h
    00000001`400159ca e8bdbafeff      call    example!rand (00000001`4000148c)
    00000001`400159cf 448be0          mov     r12d,eax
    00000001`400159d2 e8b5bafeff      call    example!rand (00000001`4000148c)
    00000001`400159d7 8be8            mov     ebp,eax
    00000001`400159d9 e8aebafeff      call    example!rand (00000001`4000148c)
    00000001`400159de 8bf0            mov     esi,eax
    00000001`400159e0 e8a7bafeff      call    example!rand (00000001`4000148c)
    00000001`400159e5 8bf8            mov     edi,eax
    00000001`400159e7 e8a0bafeff      call    example!rand (00000001`4000148c)
    00000001`400159ec 8bd8            mov     ebx,eax
    00000001`400159ee e899bafeff      call    example!rand (00000001`4000148c)
    00000001`400159f3 448bcf          mov     r9d,edi
    00000001`400159f6 89442428        mov     dword ptr [rsp+28h],eax
    00000001`400159fa 448bc6          mov     r8d,esi
    00000001`400159fd 8bd5            mov     edx,ebp
    00000001`400159ff 418bcc          mov     ecx,r12d
    00000001`40015a02 895c2420        mov     dword ptr [rsp+20h],ebx
    00000001`40015a06 e8fab5feff      call    example!ILT+0(?FunctionWith6ParamsYAXHHHHHHZ) (00000001`40001005)

    We see that ecx was copied from the contents of r12d.  This is helpful since r12d is a non-volatile register, and if it is reused by a function further down the call stack, it must be preserved and that preservation usually means putting a copy on the stack.  It would have been nice if ecx were filled with a value from the stack, at which point we would be virtually done.  But in this case, we just need to start our journey back downwards again.

    We don’t have to look very far.  Let’s have another look at the prolog code for FunctionWith6Params():

    example!FunctionWith6Params [c:\temp\blog_entry\sample_code\example.cpp @ 41]:
       41 00000001`400158e0 mov     qword ptr [rsp+8],rbx
       41 00000001`400158e5 mov     qword ptr [rsp+10h],rbp
       41 00000001`400158ea mov     qword ptr [rsp+18h],rsi
       41 00000001`400158ef push    rdi
       41 00000001`400158f0 push    r12
       41 00000001`400158f2 push    r13
       41 00000001`400158f4 sub     rsp,40h
       41 00000001`400158f8 mov     ebx,r9d
       41 00000001`400158fb mov     edi,r8d
       41 00000001`400158fe mov     esi,edx
       41 00000001`40015900 mov     r12d,ecx

    r12 is reused in FunctionWith6Params(), which means that our prize will be on the stack.  Let’s start by looking at the Child-SP for this frame which is at 00000000`0012fe80 by using the dps command:

    0:000> dps 00000000`0012fe80 L10
    00000000`0012fe80  00000000`00001649
    00000000`0012fe88  00000000`00005f90
    00000000`0012fe90  00000000`00000029
    00000000`0012fe98  00000000`00004823
    00000000`0012fea0  00000000`00006952
    00000000`0012fea8  00000001`00006784
    00000000`0012feb0  00000000`00004ae1
    00000000`0012feb8  00000001`00003d6c
    00000000`0012fec0  00000000`00000000
    00000000`0012fec8  00000000`00000029
    00000000`0012fed0  00000000`00006784
    00000000`0012fed8  00000001`4000128b example!main+0x5b [c:\temp\blog_entry\sample_code\example.cpp @ 72]

    I have highlighted in red the slot rsp points to when we enter FunctionWith6Params().  At this point, it is a simple matter to walk the assembly code and find the slot where the value is stored.  I have highlighted it in green above.

    Parameter Sleuthing -- Technique 3 (Inspecting Dead Space)

    The final technique I would like to demonstrate involves a little more trickery and involves looking at “dead” or previously used slots on the stack that are not used by the current function call.  To demonstrate, let’s say that after the DebugBreak() is hit, we need to know what the contents of param4 that were passed to FunctionWith6Params().  Let’s have another look at the assembly that has executed for FunctionWith6Params() and this time, let’s follow r9d, the fourth parameter:

    0:000> u example!FunctionWith6Params example!FunctionWith6Params+0x97
    example!FunctionWith6Params [c:\temp\blog_entry\sample_code\example.cpp @ 41]:
    00000001`400158e0 mov     qword ptr [rsp+8],rbx
    00000001`400158e5 mov     qword ptr [rsp+10h],rbp
    00000001`400158ea mov     qword ptr [rsp+18h],rsi
    00000001`400158ef push    rdi
    00000001`400158f0 push    r12
    00000001`400158f2 push    r13
    00000001`400158f4 sub     rsp,40h
    00000001`400158f8 mov     ebx,r9d
    00000001`400158fb mov     edi,r8d
    00000001`400158fe mov     esi,edx
    00000001`40015900 mov     r12d,ecx
    00000001`40015903 call    example!rand (00000001`4000148c)
    00000001`40015908 movsxd  r13,eax
    00000001`4001590b call    example!rand (00000001`4000148c)
    00000001`40015910 lea     rdx,[example!`string'+0x68 (00000001`40020d40)]
    00000001`40015917 movsxd  rbp,eax
    00000001`4001591a mov     eax,dword ptr [rsp+88h]
    00000001`40015921 lea     rcx,[example!`string'+0x80 (00000001`40020d58)]
    00000001`40015928 mov     dword ptr [rsp+38h],eax
    00000001`4001592c mov     eax,dword ptr [rsp+80h]
    00000001`40015933 mov     r9d,esi
    00000001`40015936 mov     dword ptr [rsp+30h],eax
    00000001`4001593a mov     r8d,r12d
    00000001`4001593d mov     dword ptr [rsp+28h],ebx
    00000001`40015941 mov     dword ptr [rsp+20h],edi
    00000001`40015945 call    example!printf (00000001`400012bc)
    00000001`4001594a call    example!rand (00000001`4000148c)
    00000001`4001594f mov     edi,eax
    00000001`40015951 call    example!rand (00000001`4000148c)
    00000001`40015956 mov     esi,eax
    00000001`40015958 call    example!rand (00000001`4000148c)
    00000001`4001595d mov     ebx,eax
    00000001`4001595f call    example!rand (00000001`4000148c)
    00000001`40015964 mov     r9d,r12d
    00000001`40015967 mov     r8d,esi
    00000001`4001596a mov     edx,ebx
    00000001`4001596c mov     ecx,eax
    00000001`4001596e mov     dword ptr [rsp+20h],edi
    00000001`40015972 call    example!ILT+5(?FunctionWith5ParamsYAXHHHHHZ) (00000001`4000100a)

    Notice that r9d is first moved into ebx.  But also, notice that it copied the contents into a slot on the stack at rsp+0x28.  What is this slot?  It’s the sixth parameter to the following printf() call.  Remember that the compiler looks at all of the functions the code calls and finds the function with the maximum number of parameters and then allocates enough space for that function.  As the code prepares to call printf(), it is moving the value we are after into the sixth parameter slot in that reserved stack space.  But what use is this information?

    If you examine FunctionWith6Params(), you see that every function called after printf() takes less than six parameters.  Specifically, the call to FunctionWith5Params() only uses five of those slots and just leaves the remaining three with junk in them.  This junk is actually our treasure!  From examining the code, it’s guaranteed that nobody has overwritten the slot represented by rsp+28.

    To find this slot, let’s again start by getting the Child-SP value for the frame we’re talking about as shown below:

    0:000> kL
    Child-SP          RetAddr           Call Site
    00000000`0012fdc8 00000001`40015816 ntdll!DbgBreakPoint
    00000000`0012fdd0 00000001`400158a0 example!FunctionWith4Params+0x66
    00000000`0012fe50 00000001`40015977 example!FunctionWith5Params+0x20
    00000000`0012fe80 00000001`40015a0b example!FunctionWith6Params+0x97
    00000000`0012fee0 00000001`4000168b example!main+0x5b
    00000000`0012ff20 00000000`7733495d example!__tmainCRTStartup+0x15b
    00000000`0012ff60 00000000`77538791 kernel32!BaseThreadInitThunk+0xd
    00000000`0012ff90 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

    We can then take the highlighted value above and use the same offset in the code to find our value:

    0:000> dd 000000000012fe80+28 L1
    00000000`0012fea8  00006784

    As expected, the “dead” slot on the stack contains the value we are after.  You can compare the value to the output shown on the console to verify.

    A Non-volatile Register Shortcut

    Now that I have shown you the theory behind finding these elusive values passed around in registers, let me show you a shortcut that will make life a little bit easier.  The shortcut relies upon the /r option of the .frame command.  When using .frame /r, the debugger has the smarts to track non-volatile registers.  But as with any technique, always have multiple tools in your pocket in case you need to use all of them to verify a result.

    To demonstrate, let’s consider Technique 2 described previously where we look up the call graph and we want to know what r12 was prior to main() calling FunctionWith6Params().  Go ahead and re-launch the application in windbg and let it run until it hits the DebugBreak().  Now, let’s take a look at the stack including the frame numbers:

    0:000> knL
     # Child-SP          RetAddr           Call Site
    00 00000000`0012fdc8 00000001`40015816 ntdll!DbgBreakPoint
    01 00000000`0012fdd0 00000001`400158a0 example!FunctionWith4Params+0x66
    02 00000000`0012fe50 00000001`40015977 example!FunctionWith5Params+0x20
    03 00000000`0012fe80 00000001`40015a0b example!FunctionWith6Params+0x97
    04 00000000`0012fee0 00000001`4000168b example!main+0x5b
    05 00000000`0012ff20 00000000`7748495d example!__tmainCRTStartup+0x15b
    06 00000000`0012ff60 00000000`775b8791 kernel32!BaseThreadInitThunk+0xd
    07 00000000`0012ff90 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

    Based on our previous analysis of the assembly in main(), we know that the first parameter to FunctionWith6Params() was also stored in the non-volatile register r12 in main() prior to calling FunctionWith6Params().  Now, check out what we get when we use the .frame /r command to set the current frame to 4.

    0:000> .frame /r 4
    04 00000000`0012fee0 00000001`4000168b example!main+0x5b [c:\temp\blog_entry\sample_code\example.cpp @ 70]
    rax=0000000000002ea6 rbx=0000000000004ae1 rcx=0000000000002ea6
    rdx=0000000000145460 rsi=00000000000018be rdi=0000000000006784
    rip=0000000140015a0b rsp=000000000012fee0 rbp=0000000000004823
     r8=000007fffffdc000  r9=0000000000001649 r10=0000000000000000
    r11=0000000000000246 r12=0000000000000029 r13=0000000000000000
    r14=0000000000000000 r15=0000000000000000
    iopl=0         nv up ei pl nz na pe nc
    cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
    00000001`40015a0b 488b5c2440      mov     rbx,qword ptr [rsp+40h] ss:00000000`0012ff20=0000000000000000

    As you can see, .frame /r shows the register contents as they were in main() prior to calling FunctionWith6Params().  Beware!  You can only trust the non-volatile registers when using this command!  Be sure to check out the following link to see which registers are considered volatile: Register Usage for x64 64-Bit.

    .frame /r can spare you the time spent manually digging around on the stack to find saved volatile registers.  In my experiments, .frame /r even works where there is no symbol information available.  However, it never hurts to know how to do it manually in case you’re faced with a situation where .frame /r breaks down.


    The x64 calling convention and the abundance of general purpose registers in the processor bring many opportunities for optimization to the table.  However, when all of those optimizations are in play, they can certainly make debugging difficult.  After giving a brief overview of the x64 calling convention, I demonstrated three techniques one can use to find parameter values to various functions in the call stack.  I also showed you a shortcut you can use to see non-volatile registers for a particular frame in the call stack.  I hope you find these techniques useful in your debugging adventures.  Additionally, I urge you to become more familiar with all of the nuances of the x64 calling convention.


  • Ntdebugging Blog

    Debug Fundamentals Exercise 3: Calling conventions



    Today’s exercise will focus on x86 function calling conventions.  The calling convention of a function describes the following:


    ·         The order in which parameters are passed

    ·         Where parameters are placed (pushed on the stack or placed in registers)

    ·         Whether the caller or the callee is responsible for unwinding the stack on return


    While debugging, an understanding of calling conventions is helpful when you need to determine why certain values are placed in registers or on the stack before a function call.


    Standard x86 calling convention on Windows:



    Unwinds stack

    Win32 (Stdcall)

    pushed onto stack from right to left


    Native C++ (Thiscall)

    pushed onto stack from right to left, "this" pointer in ecx


    COM (Stdcall for C++)

    pushed onto stack from right to left, then "this" is pushed



    arg1 in ecx, arg2 in edx, remaining args pushed onto stack from right to left



    pushed onto stack from right to left





    Below are calls to 5 functions.  Each function takes two DWORD parameters.  Based on the code that calls each function, identify the calling convention used.


    // Call to Function1

    01002ffe 8b08            mov     ecx,dword ptr [eax]

    01003000 53              push    ebx

    01003001 687c2c0001      push    offset 01002c7c

    01003006 50              push    eax

    01003007 ff11            call    dword ptr [ecx]


    // Call to Function2

    01002490 50              push    eax

    01002491 688c110001      push    offset 0100118c

    01002496 e82a020000      call    dbgex4!Function2 (010026c5)

    0100249b 59              pop     ecx

    0100249c 59              pop     ecx


    // Call to Function3

    0100248e 8bd0            mov     edx,eax

    01002490 8bcf            mov     ecx,edi

    01002492 e8aeffffff      call    dbgex4!Function3 (01002445)


    // Call to Function4

    00413586 8b450c          mov     eax,dword ptr [ebp+0Ch]

    00413589 50              push    eax

    0041358a 8b4d08          mov     ecx,dword ptr [ebp+8]

    0041358d 51              push    ecx

    0041358e 8b4dec          mov     ecx,dword ptr [ebp-14h]

    00413591 e86fdfffff      call    dbgex4!Function4 (00411505)


    // Call to Function5

    01003540 56              push    esi

    01003541 8d85d4f9ffff    lea     eax,[ebp-62Ch]

    01003547 50              push    eax

    01003548 ff1558100001    call    dbgex4!Function5 (01001058)]



    Bonus: describe the calling convention used for x64.



    [Update: our answer. Posted 12/18/2008]


    Function1 - COM (Stdcall for C++)


    Function2 - cdecl


    Function3 - fastcall


    Function4 - Native C++ (Thiscall)


    Function5 - Win32 (Stdcall)



    Bonus: describe the calling convention used for x64:  



  • Ntdebugging Blog

    Debug Fundamentals Exercise 2: Some reverse engineering for Thanksgiving



    Continuing our series on “Fundamentals Exercises”, we have some more reverse engineering for you!  Again, these exercises are designed more as learning experiences rather than simply puzzlers.  We hope you find them interesting and educational!  Feel free to post your responses here, but we won’t put them on the site until after we post the “official” responses, to avoid spoilers.


    Examine the following code, registers, and stack values to determine the following:


    1.       What is the return value from DemoFunction2?

    2.       What is the purpose of DemoFunction2?

    3.       Bonus: Both the last exercise and this week’s exercise involved accessing data at ebp+8.  Why ebp+8?



    1.       You probably don’t want to manually walk through every instruction that executes in the loop.  Instead, walk through a few iterations to determine the intent of the code.

    2.       The bracket notation [] in the assembly means to treat the value in brackets as a memory address, and access the value at that address.

    3.       32-bit integer return values are stored in eax



    0:000> uf 010024d0


    010024d0 55              push    ebp

    010024d1 8bec            mov     ebp,esp

    010024d3 8b5508          mov     edx,dword ptr [ebp+8]

    010024d6 33c0            xor     eax,eax

    010024d8 b920000000      mov     ecx,20h

    010024dd d1ea            shr     edx,1

    010024df 7301            jnc     asmdemo2!DemoFunction2+0x12 (010024e2)

    010024e1 40              inc     eax

    010024e2 e2f9            loop    asmdemo2!DemoFunction2+0xd (010024dd)

    010024e4 5d              pop     ebp

    010024e5 c3              ret


    0:000> r

    eax=80002418 ebx=7ffd7000 ecx=00682295 edx=00000000 esi=80002418 edi=00000002

    eip=010024d0 esp=0006fe98 ebp=0006fea8 iopl=0         nv up ei pl zr na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246


    010024d0 55              push    ebp


    0:000> dps esp

    0006fe98  0100251c asmdemo2!main+0x20

    0006fe9c  80002418

    0006fea0  00000002

    0006fea4  00000000

    0006fea8  0006ff88

    0006feac  01002969 asmdemo2!_mainCRTStartup+0x12c

    0006feb0  00000002

    0006feb4  00682270

    0006feb8  006822b8

    0006febc  f395c17d

    0006fec0  00000000

    0006fec4  00000000

    0006fec8  7ffd7000

    0006fecc  00000000

    0006fed0  00000000

    0006fed4  00000000

    0006fed8  00000094

    0006fedc  00000006

    0006fee0  00000000

    0006fee4  00001771

    0006fee8  00000002

    0006feec  76726553

    0006fef0  20656369

    0006fef4  6b636150

    0006fef8  00003120

    0006fefc  00000000

    0006ff00  00000000

    0006ff04  00000000

    0006ff08  00000000

    0006ff0c  00000000

    0006ff10  00000000

    0006ff14  00000000


    [Update: our answer. Posted 12/04/2008]

    We had a great response to this exercise!  It was good to see so many of you going through this.  There were some readers that found this a good exercise for beginners, and others were looking for a return to Puzzlers.  As an FYI, we may do more Puzzlers in the future, but for now we are going to continue on the “Fundamentals Exercise” track to help all our readers build up a solid foundation for debugging.


    It was interesting to read how several of you not only gave the answers, but made suggestions for how the code could be optimized!  I want to point out that the code we post for these exercises isn’t intended to be the optimal solution; it is written as a learning tool.   That said, keep that in-depth feedback coming; I think everyone will benefit from a discussion of optimization.


    Answers to exercise 2:


    1. DemoFunction2 returns 5, which is the number of bits set in 80002418, the value at ebp+8.
    2. DemoFunction2 finds the hamming weight of the 32-bit value passed to the function (it returns the count of bits that are equal to 1).
    3. ebp points to the base of the stack frame (the value stored there points to the previous frame's ebp), ebp+4 points to the return address, and ebp+8 points to the first parameter passed to the function.  Note that the value of ebp changes in the function prolog, at instruction 010024d1.  At this point ebp is set to 0006fe94, so at instruction 010024d3, ebp+8 is 0006fe9c, and [ebp+8] = 80002418.
  • Ntdebugging Blog

    How to modify an application behavior when you don't have the source



    From time to time we need to help customers change the way an application interacts with the operating system or SDKs.  The challenge is often the access to the code.  Sometimes neither party may own the application in question and none of the parties have access to the source.   Luckily, the Microsoft Research team came up with the Detours SDK to address this problem a number of years ago and the latest version makes it easy to implement a solution to a situation like this.  In short, Detours allows you to create a DLL that hooks one or more operating system functions, so that when that function is called, the caller will actually invoke your custom Detours code instead.


    The process is very simple:


    ·         Download the detours SDK http://research.microsoft.com/sn/detours/ and build it.

    ·         You can start with the SIMPLE Sample or our included sample that builds in the Visual Studio command-line environment.

    ·         Create a function pointer prototype for the  API you want to detour (TrueCreateFile in the example below).   It should have the same parameters and return value as the function you will detour.  As part of the declaration set the function pointer value to the real API Address.  In the following sample we will detour the CreateFile API.

    ·         You will also need to create your own version of the API you are detouring (ModifyCreateFile below).   In this case we are creating our own Createfile, which will call the original CreateFile with the FILE_FLAG_WRITE_THROUGH flag OR’d into the dwFlagsAndAttributes parameter.


    static HANDLE (WINAPI * TrueCreateFile)(LPCTSTR lpFileName, DWORD dwDesiredAccess, DWORD dwShareMode, LPSECURITY_ATTRIBUTES lpSecurityAttributes, DWORD dwCreationDisposition, DWORD dwFlagsAndAttributes, HANDLE hTemplateFile) = CreateFile;


    HANDLE WINAPI ModifyCreateFile(LPCTSTR lpFileName,  DWORD dwDesiredAccess, DWORD dwShareMode,

        LPSECURITY_ATTRIBUTES   lpSecurityAttributes, DWORD dwCreationDisposition, DWORD dwFlagsAndAttributes, HANDLE hTemplateFile)


        dwFlagsAndAttributes |= FILE_FLAG_WRITE_THROUGH;

        return TrueCreateFile(lpFileName, dwDesiredAccess,  dwShareMode, lpSecurityAttributes, 

               dwCreationDisposition,      dwFlagsAndAttributes,  hTemplateFile);     



    ·         You will need to write your detour code in the DLLmain of your dll.   This should be executed when your DLL loads and dwReason is == DLL_PROCESS_ATTACH.   In our call to DetourAttach we pass our TrueCreateFile pointer (the real CreateFile address),  and the address of ModifyCreateFile (our custom create file api).  The detour API handles the intercept for us. 




            DetourAttach(&(PVOID&)TrueCreateFile, ModifyCreateFile);



    ·         When the DLL_PROCESS_DETACH happens you will need to clean up the detour and unhook the real API.




            DetourDetach(&(PVOID&)TrueCreateFile, ModifyCreateFile);



    So how do you get the DLL loaded into the target process?  There are a couple ways.  I recommend using the setdll tool that comes as part of the Detour SDK.  In the following case we are modifying NTBackup to automatically load our detoured DLL when NTbackup runs. 


    C:\test>setdll /d:nocache.dll ntbackup.exe

    Adding nocache.dll to binary files.



        MFC42u.dll -> MFC42u.dll

        msvcrt.dll -> msvcrt.dll

        ADVAPI32.dll -> ADVAPI32.dll

        KERNEL32.dll -> KERNEL32.dll

        GDI32.dll -> GDI32.dll

        USER32.dll -> USER32.dll

        ntdll.dll -> ntdll.dll

        COMCTL32.dll -> COMCTL32.dll

        SHELL32.dll -> SHELL32.dll

        MPR.dll -> MPR.dll

        comdlg32.dll -> comdlg32.dll

        NETAPI32.dll -> NETAPI32.dll

        RPCRT4.dll -> RPCRT4.dll

        ole32.dll -> ole32.dll

        SETUPAPI.dll -> SETUPAPI.dll

        USERENV.dll -> USERENV.dll

        NTMSAPI.dll -> NTMSAPI.dll

        CLUSAPI.dll -> CLUSAPI.dll

        query.dll -> query.dll

        sfc_os.dll -> sfc_os.dll

        SYSSETUP.dll -> SYSSETUP.dll

        OLEAUT32.dll -> OLEAUT32.dll



    Note that if you modify a binary that is protected by Windows File Protection the modified binary will be replaced by the OS with the original binary.  I recommend keeping your modified version in another directory so it does not get replaced.


    Click here for the sample source code and Makefile, which will build from a Visual Studio command prompt.


    Jeff Dailey

  • Ntdebugging Blog

    Debug Fundamentals Exercise 1: Reverse engineer a function



    Hello ntdebuggers!  We’ve seen a lot of interest in our Puzzlers, and we’ve also seen requests and interest in topics covering debugging fundamentals.  So we’ve decided to combine the two topics and post a series of “Fundamentals Exercises”.  These exercises will be designed more as learning experiences rather than simply puzzlers.  We hope you find them interesting and educational!


    Feel free to post your responses here, but we won’t put them on the site until after we post the “official” responses, so as to avoid spoilers.



    Examine the following code, registers, and stack values to determine the following:


    1.       When the function “DoTheWork” returns, what is the return value from that function?

    2.       Bonus: what is the mathematical operation that “DoTheWork” performs?



    1.       The bracket notation [] in the assembly means to treat the value in brackets as a memory address, and access the value at that address.

    2.       32-bit integer return values are stored in eax



    // Code

    0:000> uf eip


    0040101c 55              push    ebp

    0040101d 8bec            mov     ebp,esp

    0040101f 8b4d08          mov     ecx,dword ptr [ebp+8]

    00401022 8bc1            mov     eax,ecx

    00401024 49              dec     ecx

    00401025 0fafc1          imul    eax,ecx

    00401028 83f902          cmp     ecx,2

    0040102b 7ff7            jg      demo2!DoTheWork+0x8 (00401024)

    0040102d 5d              pop     ebp

    0040102e c3              ret


    // Current register state

    0:000> r

    eax=00000007 ebx=7ffd9000 ecx=ffffffff edx=00000007 esi=00001771 edi=00000000

    eip=0040101c esp=0012fe9c ebp=0012feac iopl=0         nv up ei pl nz na po nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000202


    0040101c 55              push    ebp


    // Current stack values for this thread

    0:000> dps esp

    0012fe9c  00406717 demo2!main+0x27

    0012fea0  00000007

    0012fea4  82059a87

    0012fea8  00000007

    0012feac  0012ff88

    0012feb0  004012b2 demo2!mainCRTStartup+0x170

    0012feb4  00000002

    0012feb8  00980e48

    0012febc  00980e80

    0012fec0  00000094

    0012fec4  00000006

    0012fec8  00000000

    0012fecc  00001771

    0012fed0  00000002

    0012fed4  76726553

    0012fed8  20656369

    0012fedc  6b636150

    0012fee0  00003120

    0012fee4  00000000

    0012fee8  00000000

    0012feec  00000000

    0012fef0  00000000

    0012fef4  00000000

    0012fef8  00000000

    0012fefc  00000000

    0012ff00  00000000

    0012ff04  00000000

    0012ff08  00000000

    0012ff0c  00000000

    0012ff10  00000000

    0012ff14  00000000

    0012ff18  00000000


    [Update: our answer. Posted 11/19/2008]

    Wow - what a great response from our readers on this exercise!  It is great to see the various approaches to reverse engineering this code.  As for the answer, the numerical result (stored in eax) is 5040, and the corresponding mathematical operation is a factorial.  So 7! is the result calculated, given that 7 was passed to the function.  Congratulations to all of you that got it right! 


    Many of you posted some C code to give an idea of what the original source of DoTheWork() might have looked like.  The original source was actually written in assembly!  However, it was written to be called from C, and it uses ebp in the same way that compiled C code might.  This function wasn’t written with optimal performance in mind, but rather for learning about reverse engineering.



  • Ntdebugging Blog

    Ntfs Misreporting Free Space (Part 2)


    Continuing our discussion on the internals of disk usage, we will now shift our focus to internal metadata usage.

    …….. KB in …. Indexes.


    Consider for a moment a world without indexes…  The $MFT is a database containing records that are accessed via FRS (file record segment) numbers.  This FRS number includes an embedded sequence number that is updated anytime a file record is deleted & re-used.  A file record must have a new identity once it has been deleted & re-used, so the sequence number is part of this unique identity.  Without indexes, you would have to find files by remembering their FRS / Sequence numbers.  It would be like remembering your favorite web sites by remembering the IP addresses.  In this way, file indexes are like a DNS database so we don’t have to find files using FRS numbers.  The folder structure has a reserved FRS number for the root.  On all NTFS volumes, the root folder is FRS 0x5.  Since the root is in a well known location, it can be accessed without doing any index lookups.


    Each folder is like a set of DNS records containing information about a domain in the name space.  The records contain information about the names and FRS numbers for the files “in” the folder.  I put “in” in quotes because the folder itself does not actually contain any files (just records containing basic information about the files).  The files are actually records in the MFT that are accessed by FRS number, so the index entries map names to FRS numbers.  Folders use two types of metadata streams: $INDEX_ROOT:”$I30” and $INDEX_ALLOCATION:”$I30” to track the names that exist in their namespace.  The streams have an attribute type code and a name.  For example, $INDEX_ALLOCATION is the attribute type, and the attribute name is “$I30”.  The “$I30” name is a short tag indicating that the stream contains file name indexes (as opposed to security indexes, reparse indexes, etc.) 


    Why “$I30”?  Filenames are largely alphanumeric, and the first alphanumeric character in the UNICODE table is 0x30 (48 for those who are hexadecimally challenged). “$I30” is a shorthand method for saying “Index that’s alphanumeric”.


    When a file is created, the $FILENAME information is packaged into a name index record which is stored in the parent folder’s $I30 index.  With the exception of having one or two $I30 related streams, there is very little difference between a file and a folder.


    Now, let’s create a new folder "NewFolder" in the root, and look at the $I30 index entry created in the root for "NewFolder".


    D:\>md d:\NewFolder


    D:\>dir d:\

     Volume in drive D is d

     Volume Serial Number is 4447-4F88


     Directory of d:\


    10/08/2008  02:42 PM    <DIR>          NewFolder

                   0 File(s)              0 bytes

                   1 Dir(s)      68,620,288 bytes free


    If you open up an NTFS exploration tool and read the $INDEX_ALLOCATION:$I30 for the root folder, you will find an index entry in the root folder containing the filename “NewFolder”.  In addition to NFI.EXE, there are some data recovery utilities that can be used to examine NTFS metadata, but I am not able to give any brand names on the blog.  NFI.EXE is a useful tool for drilling down into NTFS, and it’s FREE in the OEM Support Tools Phase 3 Service Release 2.  Since NFI is free, it is also not an officially “supported” utility.  Despite this, NFI can tell you a lot of information about the allocated ranges of any file.  Also, you can give it a logical sector number and it will find the file that owns the sector.  For the purpose of this demonstration though, we will be using the standard command line interface “NFI.EXE C:” or “NFI.EXE [drive\path]”.


    C:\Windows\system32>c:\shared\Disktools\nfi.exe d:\

    NTFS File Sector Information Utility.

    Copyright (C) Microsoft Corporation 1999. All rights reserved.


    Root Directory

        $STANDARD_INFORMATION (resident)

        $FILE_NAME (resident)

        $SECURITY_DESCRIPTOR (resident)

        $INDEX_ROOT $I30 (resident)

        $INDEX_ALLOCATION $I30 (nonresident)

            logical sectors 78920-78927 (0x13448-0x1344f)

        $BITMAP $I30 (resident)

        Attribute Type 0x100 $TXF_DATA (resident)


    Here is the sector in the root directory $I30 index allocation that contains our “NewFolder” index entry.


    LBN 78922


    0x0000   c6 06 3b 47 75 29 c9 01-c6 06 3b 47 75 29 c9 01   ╞.;Gu)╔.╞.;Gu)╔.

    0x0010   c6 06 3b 47 75 29 c9 01-c6 06 3b 47 75 29 c9 01   ╞.;Gu)╔.╞.;Gu)╔.

    0x0020   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

    0x0030   06 00 00 20 00 00 00 00-07 00 24 00 53 00 65 00   ... ......$.S.e.

    0x0040   63 00 75 00 72 00 65 00-0a 00 00 00 00 00 0a 00   c.u.r.e.........

    0x0050   60 00 50 00 00 00 00 00-05 00 00 00 00 00 05 00   `.P.............

    0x0060   c6 06 3b 47 75 29 c9 01-c6 06 3b 47 75 29 c9 01   ╞.;Gu)╔.╞.;Gu)╔.

    0x0070   c6 06 3b 47 75 29 c9 01-c6 06 3b 47 75 29 c9 01   ╞.;Gu)╔.╞.;Gu)╔.

    0x0080   00 00 02 00 00 00 00 00-00 00 02 00 00 00 00 00   ................

    0x0090   06 00 00 00 00 00 00 00-07 03 24 00 55 00 70 00   ..........$.U.p.

    0x00a0   43 00 61 00 73 00 65 00-03 00 00 00 00 00 03 00   C.a.s.e.........

    0x00b0   60 00 50 00 00 00 00 00-05 00 00 00 00 00 05 00   `.P.............

    0x00c0   c6 06 3b 47 75 29 c9 01-c6 06 3b 47 75 29 c9 01   ╞.;Gu)╔.╞.;Gu)╔.

    0x00d0   c6 06 3b 47 75 29 c9 01-c6 06 3b 47 75 29 c9 01   ╞.;Gu)╔.╞.;Gu)╔.

    0x00e0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

    0x00f0   06 00 00 00 00 00 00 00-07 03 24 00 56 00 6f 00   ..........$.V.o.

    0x0100   6c 00 75 00 6d 00 65 00-05 00 00 00 00 00 05 00   l.u.m.e.........

    0x0110   58 00 44 00 00 00 00 00-05 00 00 00 00 00 05 00   X.D.............

    0x0120   c6 06 3b 47 75 29 c9 01-e7 f8 69 2e d8 38 c9 01   ╞.;Gu)╔.τ°i.╪8╔.

    0x0130   e7 f8 69 2e d8 38 c9 01-e7 f8 69 2e d8 38 c9 01   τ°i.╪8╔.τ°i.╪8╔.

    0x0140   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

    0x0150   06 00 00 10 00 00 00 00-01 03 2e 00 00 00 00 00   ...►............

    0x0160   25 00 00 00 00 00 01 00-68 00 54 00 00 00 00 00   %.......h.T.....

    0x0170   05 00 00 00 00 00 05 00-36 61 f9 9f 75 29 c9 01   ........6a∙ƒu)╔.

    0x0180   8c 08 e1 8d 61 38 c9 01-8c 08 e1 8d 61 38 c9 01   î.ßìa8╔.î.ßìa8╔.

    0x0190   8c 08 e1 8d 61 38 c9 01-00 00 00 00 00 00 00 00   î.ßìa8╔.........

    0x01a0   00 00 00 00 00 00 00 00-00 00 00 10 00 00 00 00   ...........►....

    0x01b0   09 00 4e 00 65 00 77 00-46 00 6f 00 6c 00 64 00   ..N.e.w.F.o.l.d.

    0x01c0   65 00 72 00 6c 00 75 00-23 00 00 00 00 00 01 00   e.r.l.u.#.......

    0x01d0   88 00 74 00 00 00 00 00-05 00 00 00 00 00 05 00   ê.t.............

    0x01e0   f6 7d 8e 47 75 29 c9 01-a6 06 ab 47 75 29 c9 01   ÷}ÄGu)╔.ª.½Gu)╔.

    0x01f0   a6 06 ab 47 75 29 c9 01-a6 06 ab 47 75 29 44 00   ª.½Gu)╔.ª.½Gu)D.


    Below is the $I30 index entry in human readable format.  Notice that it has everything needed to populate a WIN32_FIND_DATA  structure, and most importantly, the FRS number of our newly created folder.  The complete index record contains a duplicate copy of the $FILE_NAME attribute from the file record, and this allows FindFirstFile()/FindNextFile() to get all pertinent information about our found file without actually opening the file.


    FileReference            FRS,SEQ <0x25, 0x1> // FRS and Sequence number for "NewFolder"

    ParentDirectory          FRS,SEQ <0x5, 0x5>  // FRS and Sequence number for the root folder.

    CreationTime           : 10/08/2008 LCL 14:40:04.520

    LastModificationTime   : 10/08/2008 LCL 14:40:04.707

    LastChangeTime         : 10/08/2008 LCL 14:40:04.707

    LastAccessTime         : 10/08/2008 LCL 14:40:04.707

    Allocated Length       : 0

    File Size              : 0

    File Attribute Flags   : 0x10000000          // Attribute flags

    File Name              : "NewFolder"


    Now let’s do a pop quiz on indexes to see if everyone is on the same page…


    Suppose that you write a fancy new application and you call FindFirstFile() / FindNextFile() in a loop.  The cFilename string returned during one of the iterations is “MyFile.txt” (you also have the WIN32_FIND_DATA for the same file).


    1.       Where did the name “MyFile.txt” come from?


    2.       If you call FindFirstFile()/FindNextFile() with a wildcard “*.*”, is it necessary to open each found file to retrieve the WIN32_FIND_DATA?


    3.       When you call FindFirstFile(), what is NTFS doing behind the scenes?


    4.       What happens when you close the search handle?



    1.         If you said “from the file’s parent folder’s $I30 index”, then you are correct.

    2.         If you said “no”, then you are correct.  The WIN32_FIND_DATA is also retrieved from the $I30 index.  There is no need to open the individual files to get this information.

    3.         When you call FindFirstFile, NTFS opens the $I30 index stream(s) for the target folder and scans through the index entries for the first record that matches the specified wild card.  A search context is also created to keep track of the current search location in the index stream.

    4.         The target folder’s index handle is closed and the search context is freed.


    If you passed the quiz (or at least understand the answers), you're ready to read on…


    In short, high index usage is the result of having a large number of index entries.  Common sense would dictate that you probably have the same number of indexes as you have files - Right?  Well....the answer is not quite that simple.  Suppose that you have 8.3 names turned on and you create a file called "tiny.txt".  This file is both 8.3 and LFN compliant, so there will be exactly one index entry created for this file.  Now consider what happens when you create a file named "MyFileHasAReallyLongName.txt".  This is NOT 8.3 compliant, so NTFS will create an 8.3 name ("MyFile~1.txt").  Now NTFS has to maintain an 8.3 index entry, AND an LFN index entry for a single file.  This effectively doubles index usage (plus, long filenames have to be stored in the index and that also makes the LFN filename index larger than normal).  If you plan to create a large number of files on a volume, then it is a good practice to either use 8.3 compliant names, or disable 8.3 name creation altogether.


    If you have a large folder and want to see how many bytes are in use by indexes, then use contig.exe (from http://technet.microsoft.com/en-us/sysinternals/bb545046.aspx) to find out the allocated length of the folder's $INDEX_ALLOCATION.  Then divide this number by how many files are in the folder.  That will give you bytes per index entry. 


    Below is an example of how to determine how to determine index stream size for a folder.

    In my “System32” folder, I had a $I30 index allocation which was 536,576 bytes long.  It contained records for 2,460 files, so this averages out to 218 bytes per index.  The presence of 8.3 names can be discovered by running “DIR /X”.  On my systems, I don’t have a need for 8.3, so I turned off 8.3 via the registry (refer to KB121007).

    Whenever possible, try to distribute large numbers of files across several volumes.  If you have to put millions of files on a single volume, try to keep your filenames short to save space and improve performance. 

    …….. KB in …. bad sectors.


    When a bad sector is detected by CHKDSK /R or if a write occurs because of a bad sector on disk, the cluster that contains the bad sector will be added to the allocated range of $Badclus.  If $Badclus contains any allocated ranges, then it's time to consider replacing the hard drive.


    IMPORTANT:  If you have a software mirrored volume, and one hard disk has bad sectors, then it is likely that one of the drives in the mirror is going to fail soon.  If this happens, keep in mind that when you replace the failing drive, the regenerated mirror set will still still have sectors marked in the $Badclus file even though the mirror is healthy.  Since a mirror is a perfect block-by-block copy of the volume, all information for all files is duplicated between the members (including $Badclus).  For this reason, the $Badclus information is mirrored to the working drive as well as the failing drive.



    …….. KB in use by the system.


    System usage is comprised of $MFT, $Logfile, $Secure, and all other supporting structures in the MFT.  If you are looking for system usage, you will need to drill down into the NTFS metadata files.


    In most cases, high system usage cannot be “fixed”, but it can be kept under control by proper configuration and user education.  NFI will give you the information about the size of the various internal metadata files, and you can research the details on how each of the internal system files work, but there simply isn’t enough room in a blog post to talk about them all.  However, we will discuss the two most common problems that we see:  1. High $MFT usage, and 2. Bloated Security Stream in the $Secure File.


    1.       High $MFT Usage  Every file on the volume is defined by ONE OR MORE file records that are exactly 1KB in size.  If the MFT is large, it's because you have a large number of file records in the MFT (free records are also included in the total MFT size).  Below are two different ways to view the MFT information.  FSUTIL will show you the valid data length, while NFI will give a view of where the fragments of the $MFT:$DATA attribute are laid out on disk.

    Unfortunately, if your $MFT is too big and you want it to be smaller, you will have to reformat the drive.  Just keep in mind that once you restore your files, you will have 1KB of MFT allocated for each file on the drive (lots of extra file records are needed to restore your 20GB compressed files), but I will assume that everyone read part 1 and they are not going to do that.  C|;3)


    2.       Bloated Security Stream in the $Secure File

    Following good development practices will save you lots of headaches with your $Secure file.  If you write applications that ACL & re-ACL files over and over and over, your $Secure file probably looks like mine (love those logon scripts from the IT department)…

    To those of you who are savvy with file system internals, you probably noticed something was missing in the picture above.  The $Secure file shows $SII/$SDH INDEX_ROOT(s)/INDEX_ALLOCATION(s), but where is the actual security stream $DATA:$SDS?


    There simply was no more room in the base file record for the $DATA:$SDS attribute, so it was moved to a child record.  To find the child record, we can read the $ATTRIBUTE_LIST (via sector editor) and find the pointer to the file record(s) that hold the $SDS stream metadata.  To keep the legal department happy, I can’t give you the data types, but I can tell you that my $DATA:$SDS stream (shown below) is split between two child records because the $SDS stream is heavily fragmented.  The first child record is FRS 0x1888, and the other is FRS 0x12d7e.  If you were to read those two file records, they would each contain the mapping information for approximately 400 fragments of my security stream.


    LBN 3537832


    0x0000   10 00 00 00 20 00 00 1a-00 00 00 00 00 00 00 00   ... ..→........

    0x0010   09 00 00 00 00 00 09 00-00 00 d4 00 09 00 00 00   ...............

    0x0020   30 00 00 00 20 00 00 1a-00 00 00 00 00 00 00 00   0... ..→........

    0x0030   09 00 00 00 00 00 09 00-07 00 24 04 53 65 53 63   ..........$.SeSc

    0x0040   80 00 00 00 28 00 04 1a-00 00 00 00 00 00 00 00   Ç...(..→........

    0x0050   88 18 00 00 00 00 37 00-00 00 24 00 53 00 44 00   ê↑....7...$.S.D.

    0x0060   53 00 02 00 01 01 00 00-80 00 00 00 28 00 04 1a   S.......Ç...(..→

    0x0070   f9 01 00 00 00 00 00 00-7e 2d 01 00 00 00 0f 00   ∙.......~-......

    0x0080   00 00 24 00 53 00 44 00-53 00 00 00 00 00 00 00   ..$.S.D.S.......

    0x0090   90 00 00 00 28 00 04 1a-00 00 00 00 00 00 00 00   É...(..→........

    0x00a0   09 00 00 00 00 00 09 00-60 17 24 00 53 00 44 00   ........`$.S.D.

    0x00b0   48 00 00 00 00 00 00 00-90 00 00 00 28 00 04 1a   H.......É...(..→


    My security stream may look scary because it has 400 fragments, but it is only about 3.3MB plus the size of the $SII & SDH streams.  If it were to grow past the 1GB range, I would start looking for the cause of the growth. 


    In theory, you can bloat your $SDS stream by creating lots of unique security descriptors, but this is usually not the cause of bloating.  Instead, most mischief is caused by application developers who call SetFileSecurity() without properly preparing their security descriptor buffer.


    Most applications:


    1.       Allocate some heap memory.

    2.       Init the SD via InitializeSecurityDescriptor(). 

    3.       Set up the ACE’s.

    4.       Assign security to the target object.


    The problem is that heap memory is like recycled paper.  When you call InitializeSecurityDescriptor() the first few bytes of your buffer will say “I’m a security descriptor”, but the ending bytes will have some text from an e-mail you decided not to send to your boss.  As the SD is filled in with ACE’s, the letter to the boss is overwritten with the ACE’s.  At that point, your buffer looks like a valid SD to the system, but there’s still some slack space at the end that says “Porsche destroyed in the fire.  Yours truly, Larry”.  When you send this buffer to SetFileSecurity(), NTFS takes this buffer and computes a hash value to determine whether this SD is unique (the salutation to your boss is also included in the hash).  If the hash is identical to a hash value in the $SDH stream, then we do a comparison between the new & existing SD’s.  If they match a byte-per-byte comparison, then the existing SD is used.  If not, your new SD is added to the stream (along with the bad news about the boss’ car).  To prevent this, always zero your entire security descriptor buffer prior to calling InitializeSecurityDescriptor().  You will prevent $SDS bloating and your boss will never know about the car.




    I hope you all find this information useful in your sleuthing efforts. 


    Best regards,


    Dennis Middleton “The NTFS Doctor”




  • Ntdebugging Blog

    Remote kernel or user mode debugging of dumps or live systems



    GES (Global Escalation Services) is not only responsible for helping our external customers, but we spend a great deal of time collaborating with engineers and developers around the world at our support and development sites.  We often look at large dump files, but in some cases we perform a live debug to determine root cause of a server failure.  In the case of a memory dump, the files are usually very large, so copying the files over the network, even on the fastest WAN connection, can take a LONG time.

    The solution is remote debugging.    

    Here is how you do it!

    First of all it takes two to tango: a remote person and the expert who will help the remote person by debugging the process using the debugger installed on their machine.

    Let’s say you are the expert who is helping the remote person. Here’s how the process works:

    1.       The remote person opens a dump file, debugs a process, or kernel debugs a machine at some remote location using windbg.

    2.       The remote person decides, “I NEED HELP!”

    3.       The remote person simply types in .server tcp:port=9999 at the windbg prompt.



    Notice the following output.


    Server started.  Client can connect with any of these command lines

    0: <debugger> -remote tcp:Port=9999,Server=MyServerName



    1.       The remote person sends email or IM to the person they want help from with the connection string <debugger> -remote tcp:port=9999,Server=MyServerName

    2.       The expert runs WINDBG -remote tcp:port=9999,Server=MyServerName from the debuggers directory.

    3.       At this point the remote person should see the following message at the remote debugger site.


    EXPERTMACHINE\expert (tcp connected at Tue Mar 25 15:36:53 2008


    Once connected, the expert can issue any command to debug the dump or target machine remotely.   The great part is many people can connect to the remote debugger session if needed.  It’s a great collaboration tool, and something we use every day at Microsoft.   


    When remote debugging, I find it useful to save the debug session in a log file.  It’s as easy as typing .logopen C:\mydebuggersession.log in windbg before the remote debug session starts.  This gives everyone the opportunity to look at the debug session later if necessary.


    One last thing to keep in mind about remote debugging is security. I recommend using the .noshell command to prevent the execution of remote shell commands. Without the .noshell command, it is possible for people connected to your session to use the .shell (Command Shell) command to execute an application or a Microsoft MS-DOS command directly from the debugger.


    Thanks Jeff-



  • Ntdebugging Blog

    Windows Hotfixes and Updates - How do they work?


    Today I would like to talk about some of the work the Windows Serviceability (WinSE) team does regarding servicing Windows and releasing updates.

    The operating system is divided into multiple components. Each component can consist of one or more files, registry keys, configuration settings, etc.  WinSE releases updates based on components rather than the entire operating system. This reduces a lot of overhead with having to install updates to components that have not changed. Depending on the severity and applicability of the problem, there are different kinds of release mechanisms. Keep in mind, though, the actual fix still remains the same.

    1.       Updates and Security Updates

    These Updates are typically available on Windows Update. They frequently contain security fixes, and from time to time also contain reliability rollup packages. These updates are thoroughly tested and Microsoft highly recommends that you update your computer with these releases. In fact, most are automatically downloaded to your machine if you have Windows Update turned on. In most cases, Update releases are also available as standalone downloads from the download center.


    2.       Hotfixes

    When an individual customer reports a bug to Microsoft for a specific scenario, the WinSE team releases Hotfixes to address these problems. Hotfixes are not meant to be widely distributed and go through a limited amount of testing due to the customer's need for an urgent fix.  Hotfixes are developed in a separate environment than the regular Updates.  This allows Microsoft to release Updates that do not include the Hotfix files, thereby minimizing risk for the customer.

    Once the Hotfix is ready and packaged by WinSE, a KB article is written describing the problem, with instructions on how to obtain the Hotfix.  Microsoft recommends that only customers experiencing the particular problem install the Hotfix for that problem.

    Note: Hotfixes are also sometimes referred to as LDRs, or QFE's (Quick Fix Engineering). The term QFE is an old term that is mostly no longer used in reference to current versions of Windows.


    3.       SP  - Service Pack

    The service pack is a major update in the life of an OS. It contains a wide variety of fixes as well as all the GDR and LDR fixes that were released since the previous service pack was shipped. This is a thoroughly tested release and highly recommended by Microsoft for installation. This is usually available as a standalone release, and is then released through Windows Update as well.



    GDR vs. LDR branches

    Now that we have described the different kinds of updates, let's take a deeper look into how these fixes are built. When a new OS or service pack is released, 2 branches are created from the release code tree -a GDR (general distribution release) branch and a LDR (limited distribution release) branch. Hotfixes are built solely from the LDR branch, while Updates for broad release are built from the GDR branch.

    Service Packs are built from a third branch that contains all Updates , Hotfixes and additional fixes.  This way the new service pack is shipped with all the fixes from both branches.

    Note – Once the new service pack is shipped, the code branches from the previous release are still active and serviced as necessary.

    Installing a Hotfix

    By default, all components on Windows systems start on the GDR branch following each major release. When you install updates from Windows Update for a GDR component, it gets upgraded with the GDR version.

    When you install a specific Hotfix, the files and components in the Hotfix package are migrated to the LDR branch. At this point, that particular component is marked as a LDR component. If you install a newer Update over this component, the Windows servicing technology will automatically install the appropriate latest version from the LDR branch for you. This is possible because each Update package ships with both the GDR and LDR versions of the component.

    Once a component is marked as a LDR component, the only way to move back to the GDR branch is to uninstall all Hotfixes for that component, or move to the next available service pack.


    What would happen if a user installed a Hotfix, and then sometime later installed the next service pack? Well, in that case it depends on the Hotfix and when it was built.

    1.       If the Hotfix was built before the service pack, then the component will be moved to the GDR version contained in the service pack.

    2.       If the Hotfix was built after the service pack, the component will be migrated to the post-service pack version of the component, and will stay on the same branch that it was originally on.


    In order to make this work, these packages contain both the RTM GDR version, the RTM Hotfix branch, and the SP1 Hotfix and GDR version of each binary.


    All fixes built for Windows are cumulative in nature by branch, i.e. a new update will contain the new fix, as well as all the previous fixes for that branch. Referencing the chart above, installing fix #4 can get you fixes #2 and #4 on the GDR branch. If the component is on the LDR branch, then the user would get fixes #1-4.


    Finally, the servicing technology has to handle the case where you need the functionality of an older Hotfix (e.g. “Fix #1” in the diagram above) but you may already have installed “Fix #4” which might be a critical security update.  What happens is that when the GDR branch of a fix is installed, it also places a copy of the Hotfix version of the same fix on the system.  When you run the installer for Hotfix #1, it detects that a newer version of the file is already installed, but it also detects that it needs to migrate it to the Hotfix version of the binary that was previously stored on the system. The result is that you end up with the Hotfix binary for Fix #4, which has both the Hotfix you need plus the cumulative set of security fixes.


    Stay tuned for more, in the next blog entry, I will talk about the staging mechanism that Windows uses to install Updates and Hotfixes as well as the uninstall process. Also, I will talk about how to determine the branch a file is built from.


    - Omer 


    More Information

    Description of the standard terminology that is used to describe Microsoft software updates

    Description of the contents of Windows XP Service Pack 2 and Windows Server 2003 software update packages


Page 16 of 24 (239 items) «1415161718»