• Ntdebugging Blog

    Join us on Facebook


    We’ve set up a new Facebook group named “Windows Debugging” as a community for Windows Debugging industry experts.  If Windows debugging is an interest of yours, please consider joining the group!  Many of the regular contributors to this blog will be there as well.  Just login to Facebook and search for “Windows Debugging” to find us, or go here:


  • Ntdebugging Blog

    How it Works: DLL Injection





    Hi everyone, this is Bob again.  I recently worked on an issue where the interaction of two threads in Winlogon led to a bugcheck.  One thread was a Winlogon thread initializing GDI.  The interesting thing about this scenario is how the other thread ended up in this process.




    What was the thread doing?


    Below is the user half of the thread stack.  The thread attempted to load a DLL.  


    ChildEBP RetAddr  Args to Child

    0058eaec 773901ad 773901d9 0058eafc 00240022 ntdll!KiFastSystemCallRet

    0058eb0c 775d96f3 775d1808 00000000 77e6f032 USER32!NtUserRegisterWindowMessage+0xc

    0058ed24 775e4755 00000000 00000001 7c837512 comctl32!InitGlobalMetrics+0x44

    0058ed3c 775e426a 00000031 0058ed68 7763490c comctl32!_ProcessAttach+0x98

    0058ed48 7763490c 775d0000 00000001 00000000 comctl32!DllMain+0x21

    0058ed68 7c81a352 775d0000 00000001 00000000 comctl32!_DllMainCRTStartup+0x52

    0058ed88 7c833465 776348ba 775d0000 00000001 ntdll!LdrpCallInitRoutine+0x14

    0058ee90 7c834311 00000000 00000000 7c8e2e58 ntdll!LdrpRunInitializeRoutines+0x367

    0058f124 7c834065 00000000 00080e98 0058f3ec ntdll!LdrpLoadDll+0x3cd

    0058f3a0 77e41bf3 00080e98 0058f3ec 0058f3cc ntdll!LdrLoadDll+0x198

    0058f408 77e5c70b 7c8e2e58 00000000 00000000 kernel32!LoadLibraryExW+0x1b2

    0058f41c 7c92a6a1 7c8e2e58 00000000 7c8e2e58 kernel32!LoadLibraryW+0x11

    0058f454 7c92a65f 7c8e2e58 7c8d0000 7c9297b6 SHELL32!SHFusionLoadLibrary+0x2a

    0058f460 7c9297b6 00000020 00000008 0058f6a8 SHELL32!DelayLoadCC+0x15

    0058f694 7c929728 0058f6a8 0000007c 00000001 SHELL32!SHFusionInitializeIDCC+0x92

    0058f8b4 7c92966f 7c8d0000 0000007c 00000001 SHELL32!SHFusionInitializeFromModuleID+0x3a

    0058f8c8 7c92962c 7c8d0000 00000001 0058f8f8 SHELL32!_ProcessAttach+0x34

    0058f8d8 7c92bb63 7c8d0000 00000001 00000000 SHELL32!DllMain+0x27

    0058f8f8 7c81a352 7c8d0000 00000001 00000000 SHELL32!_DllMainCRTStartup+0x52

    0058f918 7c833465 7c92bb1b 7c8d0000 00000001 ntdll!LdrpCallInitRoutine+0x14

    0058fa20 7c834311 00000000 00000000 00000004 ntdll!LdrpRunInitializeRoutines+0x367    ß This function is

          loading and calling init

          functions of dependent DLL’s

    0058fcb4 7c834065 00000000 00080760 0058ff7c ntdll!LdrpLoadDll+0x3cd

    0058ff30 77e41bf3 00080760 0058ff7c 0058ff5c ntdll!LdrLoadDll+0x198        ß 0058ff5c is the Unicode string

       pointer to DLL name

    0058ff98 77e5c70b 00570254 00000000 00000000 kernel32!LoadLibraryExW+0x1b2

    0058ffac 0057017e 00570254 00000000 00200008 kernel32!LoadLibraryW+0x11

    WARNING: Frame IP not in any known module. Following frames may be wrong.

    0058fff4 00000000 00570228 00905a4d 00000003 0x57017e



    The DLL being loaded depends on other DLLs.  These DLLs are loaded and initialized when the first DLL is loaded.  So if DLL ‘A’ has a call to DLL ‘B’, then DLL ‘B’ is loaded by the loader when DLL ‘A’ is loaded.



    What is so unusual about this thread?

    If you examine the IP for that start address taken from !thread and where it is calling LoadLibraryW, the IP is not in a range of any loaded module.


    1: kd> !thread

    THREAD 86edd020  Cid 7884.7528  Teb: 7ffdc000 Win32Thread: bc1adb48 RUNNING on processor 1

    Not impersonating

    DeviceMap                 e10018c0

    Owning Process            87c51d88       Image:         winlogon.exe

    Wait Start TickCount      2567944        Ticks: 0

    Context Switch Count      4                 LargeStack

    UserTime                  00:00:00.015

    KernelTime                00:00:00.000

    Start Address 0x00570000   ß Starting address.  This is not in any module displayed by “!peb”


    The !PEB extension will display the loaded module list and address range for the process.  It is not shown here because of space.  However this address is not in any loaded module.


    Let’s look at the function:


    00570000 55              push    ebp

    00570001 8bec            mov     ebp,esp

    00570003 83ec3c          sub     esp,3Ch

    00570006 8365e800        and     dword ptr [ebp-18h],0

    0057000a 8365ec00        and     dword ptr [ebp-14h],0

    0057000e 8365f800        and     dword ptr [ebp-8],0

    00570012 8365dc00        and     dword ptr [ebp-24h],0

    00570016 8365f000        and     dword ptr [ebp-10h],0


    1: kd> u

    0057001a 8365e000        and     dword ptr [ebp-20h],0

    0057001e 8365f400        and     dword ptr [ebp-0Ch],0

    00570022 6a01            push    1

    00570024 8b4508          mov     eax,dword ptr [ebp+8]        ß The 1st argument is a pointer to a list of functions.

    00570027 ff5004          call    dword ptr [eax+4]

    0057002a 8945fc          mov     dword ptr [ebp-4],eax

    0057002d 8b4508          mov     eax,dword ptr [ebp+8]        ß Function block.

    00570030 ff5010          call    dword ptr [eax+10h]


    1: kd> u

    00570033 8945e4          mov     dword ptr [ebp-1Ch],eax

    00570036 837de400        cmp     dword ptr [ebp-1Ch],0

    0057003a 0f84c0010000    je      00570200


    The first argument is a block of functions.  That is what was passed as the initial parameter.  What functions were passed?


    1: kd> dds 570228 l 5

    00570228  77e5c6fa kernel32!LoadLibraryW

    0057022c  77e6c2dc kernel32!SetErrorMode

    00570230  77e70531 kernel32!GetCurrentDirectoryW

    00570234  77e70d67 kernel32!SetCurrentDirectoryW

    00570238  77e63ec7 kernel32!GetProcessHeap


    These functions are standard kernel32 calls.  So, the question is why is it doing that?



    What is this thread doing?

    Based on the fact that the IP is not in any module, the IP is page aligned, and the thread was passed function addresses as it initial parameter, it looks like this thread was “injected” into this process. 



    How was this thread injected?

    Another process, (I do not know which one) allocated a block of memory in the Winlogon process via VirtualAllocEx.  This function takes a process handle as input and it can be a different process.  Then code can be moved into the process via WriteProcessMemory.  Then a thread can be created using the starting address of the memory address returned from VirtualAllocEx.



    Are we done?

    Nope – remember the block of addresses?  This is needed because the module was not loaded by the loader.  So the functions did not get resolved by the linker.  So the addresses of functions outside of the code moved are unknown.  Since in Windows Server 2003 the functions of certain DLLs stay at the same address, they can be passed to another process.  Vista and beyond does not do this, so this method will not work.




    Hope you found this interesting.  More on DLL injection can be found here: http://www.codeproject.com/dll/DLL_Injection_tutorial.asp and here: http://en.wikipedia.org/wiki/DLL_injection.


    AppInit_DLLs is another method of getting a DLL to load (in all process that rely on user32.dll), documented in KB 197571.  This can lead to problems though, as Raymond Chen discussed here.


  • Ntdebugging Blog

    Designing the Perfect Breakpoint



    Written by Jeff Dailey.



    When it comes to live debugging, the breakpoint is king.  Oftentimes solving a very complex problem in a production environment involves doing a local, non-production debug one of my own test machines.  I’ll typically debug the process or code in question to get a good idea of how it works and what data I need to collect when I break in.   This normally involves some reverse engineering and doing code review of the process or module in question.    We always want to minimize impact on production environments when it’s time to do the live debug.  I call it designing the perfect breakpoint.



    There are several types of breakpoints we typically use. 



    The most common breakpoint is bp, and setting it in windbg is simple:  bp 0XADDRESS.   Note that when you break into a user mode process it halts execution of the process.  When you break into the kernel debugger it halts the entire machine.  This being the case, we like to figure out what commands we would issue once the breakpoint hits.  You can then set the breakpoint to automatically issue the commands at the time of break in.   



    The syntax for running commands upon hitting a breakpoint is straightforward:  bp 0XADDRESS “comand1;command2;…”


    As an example - bp 0xMYADDRESS “kv;dd g_myGlobal;g”    This command will break on 0xMYADDRESS, dump the callstack via kv; dump memory as DWORDs starting at the g_MyGlobal address; and then continue execution with the g command.



    Another cool trick is to have a breakpoint enable or disable another breakpoint or disable itself.     Say you have a breakpoint that is hit far too often to break in EVERY single time that code executes.    However, once a particular state or a condition is met, you may want every call stack associated with that break point.    All you need do is set a breakpoint that is waiting on your key conditional code to execute.   This breakpoint can then enable the more expensive breakpoint.



    Here is the example:


    bp ntdll!RtlAllocateHeap “kv;g”    << This will be breakpoint 1, set on a very common call.

    bd 1    << Disable break point 1, as we don’t want to see its output until breakpoint 2 executes.

    bp notepad!OpenFile “be 1;g”  << This will enable breakpoint 1 when NotePad!OpenFile is executed. 



    The most expensive type of break point is a ba - Break on access.  This break point allows you to break on access to either memory or I/O ports.  There are several variations : ba w (break on write), ba r (break on read), ba e (break on execute), and ba i  (for I/O ports)



    Let’s use break on write for our sample.   The syntax is simple.  ba w4 0xNNNNnnnn.  This means break on access, access type WRITE, with a width of 4 bytes, followed by the target address.   However, because the processor has to watch for access to this memory, it generally slows down execution time.  This might be a breakpoint that you enable conditionally using BE during another condition check. 



    Ba type breakpoints are very handy when you have a resource changing and you don’t know what code path is touching it.    Imagine you had a corrupted critical section.  It’s not likely one of the critical section APIs is causing  the corruption.  It’s more likely that something overwrote that memory.  That being the case you could simply do a ba w4 on the critical section address followed by a “kv;g” and let the process run.  You will see all the places the critical section APIs touches the critical section; however, you will also see any offender that writes over that memory.



    Refer to the debugger.chm file in the debugger tools directory for details on the BA variations.



    The conditional break


    You can also do conditional breaks or checks during the breakpoint execution in the debugger by using the j command in the section following the breakpoint: bp Address “j EXPRESSION ‘IfTrueCommands1;IfTruecommand2’  ; ‘ElseCommand1;ElseCommand2’”



    The following is an example of checking a call to a file open operation.  The breakpoint is set on the instruction following a CALL.

    bp notepad!SomeNotePadCodeOpeningAFile+0x132 "j @eax >0 '.echo all is good file opened ok;gc';'!gle;kv;lsa eip;gc'“



    If eax (the default register used to store function return values) is nonzero, we echo ‘all is good, file opened ok’ to the command output window.  If eax is zero (indicating failure), we display the last error with !gle, dump the call stack via kv, and list the source at the eip address using LSA EIP.



    Good luck and Happy debugging.

  • Ntdebugging Blog

    Windbg Tip: KN, .Frame , DV, and DT - It's so easy


    Written by Jeff Dailey.

    Hello NTDebuggers, many of us take for granted some of the simple commands in the debugger that make life easy. I was thinking of several in particular that go great together.  The first command would be kn.  Kn will show the current call stack and includes the stack frame number to the far left of the individual call.  

    I loaded up a copy of BadWindow.EXE in Windbg, set my thread to thread Zero via ~0s.  Then I simply type kn.  This dumps out the call stack, and as you can see, to the far left each call there is the stack frame number.

    Using stack frame 2 from the above stack let’s look at the locals it uses.  First we must get in context using .frame and the frame number.

    Now to dump out the locals we can just use the dv (Display Local Variables) command.  (Note you need private symbols for the binary you are debugging for this to work).

    As you can see, we have several locals here.  Now, because I have private symbols for this binary, the debugger already understands the structures and data types for each variable.

    Furthermore because I’m already in the stack frames context I can simply dt (display type) with the name of the variable and the debugger figures out the address.  Let’s dt msg and see what data it contains.

    As you can see  have a couple of structures that are part of this structure.    Look at the right in the above output.  You will notice a data type name of HWND_ and tagPOINT to the right.  These are structures.  The good news is that dt will work recursively and will dump out structures under your current structure.   All you need to do is issue a DT –r.  You can specify the depth by appending a number to –r.

    For more information about the dt command and others mentioned in this blog make sure you take a look at the debugger.chm file that comes with the Windows Debugging Tools.

    Good luck and happy debugging.

  • Ntdebugging Blog

    How to debug WOW64 applications?



    Hello, my name is Venkatesh Ganga, and I’m an Escalation Engineer on the Microsoft Platform Global Escalation Services team.  Recently, I worked on an issue where registry reflection was not happening while installing 32bit Office under the system account. This required looking into both the 32bit Office code and the Wow64 code where the registry reflection is implemented.  When attaching to the Wow64 process using the 32bit debugger it’s like debugging a 32bit process on the 32bit machine; there are no 64bit binaries in the process. However, we needed to debug Wow64 to debug the registry reflection code. To do this we attached to the Wow64 process using the 64bit debugger which allows you to see the Wow64 binaries.

    Ø  lm

                Base TimeStamp                     Module

              400000 42435b2a Mar 24 18:28:26 2005 C:\Program Files (x86)\Internet Explorer\IEXPLORE.EXE

            77ec0000 45d6cc72 Feb 17 03:35:46 2007 C:\WINDOWS\system32\ntdll.dll

            6b000000 45d6943d Feb 16 23:35:57 2007 C:\WINDOWS\system32\wow64.dll

            6b280000 45d695f3 Feb 16 23:43:15 2007 C:\WINDOWS\system32\wow64win.dll

            78b80000 42438b7a Mar 24 21:54:34 2005 C:\WINDOWS\system32\wow64cpu.dll


    There are 2 options for debugging Wow64 applications.

    1.       Using the 64bit debugger and Wow64 debugger extension (Wow64exts.dll)

    2.       Using the 32bit debugger

    Using 64bit debugger and Wow64 debugger extension (Wow64exts.dll)

    I ran the 32bit version of Internet Explorer on a 64bit machine and attached to it using the 64bit debugger. Here is the thread 0 call stack when viewed from 64bit debugger.

    0:000> kL

    Child-SP          RetAddr           Call Site

    00000000`0013edf8 00000000`78b8428e wow64cpu!CpupSyscallStub+0x9

    00000000`0013ee00 00000000`6b006a5a wow64cpu!Thunk0Arg+0x5

    00000000`0013ee70 00000000`6b005e0d wow64!RunCpuSimulation+0xa

    00000000`0013eea0 00000000`77ed8030 wow64!Wow64LdrpInitialize+0x2ed

    00000000`0013f6d0 00000000`77ed582f ntdll!LdrpInitializeProcess+0x1538

    00000000`0013f9d0 00000000`77ef30a5 ntdll!_LdrpInitialize+0x18f

    00000000`0013fab0 00000000`77d59620 ntdll!KiUserApcDispatch+0x15

    00000000`0013ffa8 00000000`00000000 0x77d59620

    00000000`0013ffb0 00000000`00000000 0x0

    00000000`0013ffb8 00000000`00000000 0x0

    00000000`0013ffc0 00000000`00000000 0x0

    00000000`0013ffc8 00000000`00000000 0x0

    00000000`0013ffd0 00000000`00000000 0x0

    00000000`0013ffd8 00000000`00000000 0x0

    00000000`0013ffe0 00000000`00000000 0x0

    00000000`0013ffe8 00000000`00000000 0x0

    00000000`0013fff0 00000000`00000000 0x0

    00000000`0013fff8 00000000`00000000 0x0

    00000000`00140000 00000020`78746341 0x0

    00000000`00140008 00005370`00000001 0x20`78746341


    The above stack only shows the 64 bit calls; we cannot see what the 32 bit calls are doing. To get the 32bit stack you must use one of the below methods.

    Ø  Option 1 : Run “!wow64exts.k”

    0:000> !wow64exts.k

    Walking 64bit Stack…

    Child-SP          RetAddr           Call Site

    00000000`0013edf8 00000000`78b8428e wow64cpu!CpupSyscallStub+0x9

    00000000`0013ee00 00000000`6b006a5a wow64cpu!Thunk0Arg+0x5

    00000000`0013ee70 00000000`6b005e0d wow64!RunCpuSimulation+0xa

    00000000`0013eea0 00000000`77ed8030 wow64!Wow64LdrpInitialize+0x2ed

    00000000`0013f6d0 00000000`77ed582f ntdll!LdrpInitializeProcess+0x1538

    00000000`0013f9d0 00000000`77ef30a5 ntdll!_LdrpInitialize+0x18f

    00000000`0013fab0 00000000`77d59620 ntdll!KiUserApcDispatch+0x15

    00000000`0013ffa8 00000000`00000000 0x77d59620

    00000000`0013ffb0 00000000`00000000 0x0

    00000000`0013ffb8 00000000`00000000 0x0

    00000000`0013ffc0 00000000`00000000 0x0

    00000000`0013ffc8 00000000`00000000 0x0

    00000000`0013ffd0 00000000`00000000 0x0

    00000000`0013ffd8 00000000`00000000 0x0

    00000000`0013ffe0 00000000`00000000 0x0

    00000000`0013ffe8 00000000`00000000 0x0

    00000000`0013fff0 00000000`00000000 0x0

    00000000`0013fff8 00000000`00000000 0x0

    00000000`00140000 00000020`78746341 0x0

    00000000`00140008 00005370`00000001 0x20`78746341

    Walking 32bit Stack...

    ChildEBP          RetAddr          

    002ded98 75ec1c83 USER32!NtUserWaitMessage+0x15

    002dee24 75ec61ef BROWSEUI!BrowserProtectedThreadProc+0x44

    002dfea8 779ba3a6 BROWSEUI!SHOpenFolderWindow+0x22c

    002dfec8 0040243d SHDOCVW!IEWinMain+0x129

    002dff1c 00402748 IEXPLORE!WinMain+0x316

    002dffc0 7d4e7d2a IEXPLORE!WinMainCRTStartup+0x186

    002dfff0 00000000 KERNEL32!BaseProcessStart+0x28


    Ø  Option 2 : Switch to x86 mode (using “!wow64exts.sw”) and do KB.


    0:000> !wow64exts.sw

    Switched to 32bit mode

    0:000:x86> kb

    ChildEBP          RetAddr           Args to Child                                        

    002ded98 75ec1c83 002f1be8 002dee50 002f1be8 USER32!NtUserWaitMessage+0x15

    002dee24 75ec61ef 002f1be8 002f1be8 00000000 BROWSEUI!BrowserProtectedThreadProc+0x44

    002dfea8 779ba3a6 002f1be8 00000001 00000000 BROWSEUI!SHOpenFolderWindow+0x22c

    002dfec8 0040243d 002e2508 00000001 ffffffff SHDOCVW!IEWinMain+0x129

    002dff1c 00402748 00400000 00000000 002e2508 IEXPLORE!WinMain+0x316

    002dffc0 7d4e7d2a 00000000 00000000 7efdf000 IEXPLORE!WinMainCRTStartup+0x186

    002dfff0 00000000 004025c2 00000000 000000c8 KERNEL32!BaseProcessStart+0x28


    The easiest way to see all of the 32bit call stacks is by switching to  32bit mode (!wow64exts.sw) and doing ~*k.  In addition, you can set breakpoints in 32bit or 64 bit binaries using the 64bit debugger.  Also note  “!peb” will show both the 64bit and 32bit PEB.


    Using the 32bit debugger

    As mentioned earlier there is nothing wrong with using the 32 bit debugger.  If you just need to debug the application’s 32bit code, using it is probably the simplest approach.  However, if you need to view Wow64 code or binaries, you must use the 64bit debugger.  Note that these techniques apply to debugging Wow64 dumps and live processes.

    You can find more information about WoW64 applications at http://msdn.microsoft.com/en-us/library/aa384249(VS.85).aspx 


  • Ntdebugging Blog

    NTDebugging Puzzler 0x00000007: Interlocked functions



    Today, we will have some fun with interlocked functions.


    The following section of code is reentrant.  A “well meaning” developer used interlocked functions to avoid serializing on a global table lock.


    Initial smoke testing shows that the code works fine.  Sometimes things are not as they appear when doing initial code review.  After several hours of heavy stress testing, the developer finds the machine has bugchecked.  Analysis of the dump showed that the caller of this function had steamrolled through nonpaged pool writing stacks on top of everyone’s pool memory.


    The goal of today’s puzzler is to find the bug and describe how it could be fixed with minimal timing impact.


    Here are a few details before you begin.


    1.       The variable gIndex is an unsigned long global.

    2.       The gLogArray memory was allocated from nonpaged pool and the size of this allocation is correct.




    00:   PLOG_ENTRY GetNextLogEntry ()


    01:         ULONG IterationCount = MAX_RECORDS;


    02:         PLOG_ENTRY pEntry;


    03:         do


    04:               if (InterlockedCompareExchange(&gIndex, 0, MAX_RECORDS) == MAX_RECORDS)


    05:                     pEntry = &gLogArray[0];


    06:               else


    07:                     pEntry = &gLogArray[InterlockedIncrement(&gIndex)];


    08:               --IterationCount;


    09:         } while(InterlockedCompareExchange(&pEntry->Active, 1, 0) != 0 && (IterationCount > 0));


    0a:         return (IterationCount > 0) ? pEntry : NULL;





    Happy hunting,


    Dennis Middleton “The NTFS Doctor”


      [Update: our answer. Posted 6/10/2008]


    Thanks for all the great posts!  I saw many interesting answers, and a few unique solutions that I hadn’t considered.


    Bug Description


    A slight time gap exists between the InterlockedCompareExchange and the InterlockedIncrement.  For this reason, several threads could pass the check for MAX_RECORDS prior to doing the increment.


    Assume that N is the number of threads that pass the check for MAX_RECORDS while gIndex is at a particular value.

    If N or more threads are able to pass the check while gIndex is equal to MAX_RECORDS-(N-1), then gIndex would be incremented beyond MAX_RECORDS.


    For example, let’s assume that 3 threads passed the check while gIndex was at MAX_RECORDS-2.  Then after the three increments occur, gIndex would be equal to MAX_RECORDS+1.  From that point, invalid pointers would be passed out to the caller.


    Possible Solutions


    There are several ways to solve this problem.  Some are more efficient than others.  I would avoid doing checks for MAX_RECORDS-1, MAX_RECORDS, or MAX_RECORDS+1 (interlocked or not) since there could potentially be more than two threads involved in the race condition.  Such a solution would only reduce the likelihood of an overflow.


    There were a few posts suggesting a lock or critical section for the section of code between the compare and increment.  That would be one solution, but it would also do away with the benefits of using interlocked functions.


    In keeping with the philosophy of keeping the code fast and simple, here’s a solution that gives a very good result with minimal impact.


    1.       I removed the if () {} else {} and simply allowed gIndex to increment unchecked.  With this change, gIndex can approach its max unsigned long value and possibly wrap - we need to keep the array index in check.

    2.       The modulus operator (added to line 4 below) will divide the incremented gIndex by MAX_RECORDS and use the remainder as the array index.  When dividing, the resultant remainder is always smaller than the divisor (MAX_RECORDS).  For this reason, the array index is guaranteed to be smaller than MAX_RECORDS.  As even multiples of MAX_RECORDS are reached, the array index resets back to zero mathemagically and no interlocked compare is even necessary.



    00:   PLOG_ENTRY GetNextLogEntry ()


    01:         ULONG IterationCount = MAX_RECORDS;


    02:         PLOG_ENTRY pEntry;


    03:         do



    04:               pEntry = &gLogArray[InterlockedIncrement(&gIndex) % MAX_RECORDS];


    05:               --IterationCount;


    06:         } while(InterlockedCompareExchange(&pEntry->Active, 1, 0) != 0 && (IterationCount > 0));


    07:         return (IterationCount > 0) ? pEntry : NULL;




    With the fix in place, the code is smaller, faster, easier to read, and most of all - the bug is gone.  When developing code, always try to think small.             


    Best Regards,

    NTFS Doctor




  • Ntdebugging Blog

    Beefier Breakpoints Using Debugger Scripts



    Written by Ron Stock

    I recently collaborated with a third-party vendor which required me to track down a problem during the Vista upgrade process. The issue came down to catching a Microsoft component opening a specific registry service key during the setup process. Issues like this frequently call for live debugging to catch registry access in real time. I could have set a breakpoint on RegOpenKeyExW() and examined every requested key passed into the function, but this method is very time consuming considering RegOpenKeyExW()  is a highly used code path. This methodology involves setting a break point on the function, waiting for the break in, checking the second parameter (lpSubKey) for a match to the desired registry key (string), and hitting “g” if the key doesn’t match. I could have potentially repeated these steps a hundred times before finding the call containing the key I needed. I wanted some way to set a ‘break on access’ only when my key was touched. We ran into a similar issue requiring the ability to catch a file system function working with a specific file when possibly hundreds of files were in use. So how do you gain the automation needed for this type of debugging? The answer is debugger scripts.

    Let’s run through an example with Windows Explorer.exe using this method. In this scenario we’ll attempt to catch Explorer.exe opening the HKEY_LOCAL_MACHINE\SYSTEM\Setup key.  Windows Explorer opens several keys per second, so the focus is to ‘break  in’ when this specific key is passed to RegOpenKeyEx() without having to manually wade through hundreds of possible keys passed into the function.

    1.       Identify the function in question.


    Per MSDN, RegOpenKeyEx is defined as the following, and we’re interested in the second parameter because it contains the name of the key.

    LONG WINAPI RegOpenKeyEx(

      __in        HKEY hKey,

      __in_opt    LPCTSTR lpSubKey,

      __reserved  DWORD ulOptions,

      __in        REGSAM samDesired,

      __out       PHKEY phkResult


    2.       Create the debugger script

    The script logic is straight forward. Our debugger script will break in every time RegOpenKeyEx() is called.  It then checks the second parameter for the string SYSTEM\Setup. If it finds a match it stays broken in, otherwise it returns command to the running thread. Our script will look like this -


    .if (@$t0 != 0) { as /mu ${/v:_str} @$t0 } .else { ad /q ${/v:_str} }

    .if ($spat("${_str}x", "*system\Setup*") == 0) { .echo _str;g } .else { .echo _str }


    r$t0 is considered a pseudo-register and is used as a variable in this script. Since the second parameter of RegOpenKeyEx() is passed via esp+8, our debugger command program will dereference esp+8 and move it into r$t0. Our pseudo-register now holds the memory address of lpSubKey. The second line of the program simply checks to see if r$t0 is NULL, and will setup our alias named _str at the address held in r$t0 if r$t0 is not NULL. The third line uses the MASM string operator, $spat, looking for a string pattern which matches system\setup. If it doesn’t find a match it prints the string, and returns command back to the executing thread. It’s the equivalent of typing ‘g’ in the debugger. If the string pattern system\setup is found, the string is printed out on the screen, and the debugger remains broken in. Then we can debug the underlying problem in the appropriate context. Now save them into a txt located at c:\debuggers\strings.txt so we can reference it from our breakpoint.


    3.       Set the break point.


    Next we break into the Explorer process from the kernel debugger


    kd> !process 0 0 Explorer.exe

    PROCESS 8340d6a8  SessionId: 1  Cid: 0270    Peb: 7ffd5000  ParentCid: 01b0

        DirBase: 2d48f000  ObjectTable: 90171348  HandleCount: 559.

        Image: explorer.exe



    kd> !bpid 0270

    Finding wininit.exe (1)...

    Finding winlogon.exe (1)...

    Waiting for winlogon.exe to break.  This can take a couple of minutes...

    Break instruction exception - code 80000003 (first chance)

    Break into process 270 set.  The next break should be in the desired process.

    Break instruction exception - code 80000003 (first chance)


    77f17dfe cc              int     3



    Then we set a breakpoint on ADVAPI32!RegOpenKeyExW instructing it to use our script to evaluate whether SYSTEM\Setup is the key passed in at the second parameter. If it finds a match to our key it stays broken in otherwise execution is passed back to the running Explorer thread.


    kd> bp ADVAPI32!RegOpenKeyExW "$$<c:\\debuggers\\strings.txt"

    kd> bl

    0 e 76e8f09d     0001 (0001) ADVAPI32!RegOpenKeyExW "$$<c:\debuggers\strings.txt"


    This is the output we would expect to see for all of the non-matches.


    kd> g

    kd> r$t0=poi(esp+8)

    kd> .if (@$t0 != 0) { as /mu ${/v:_str} @$t0 } .else { ad /q ${/v:_str} }

    kd> .if ($spat("${_str}x", "*system\Setup*") == 0) { .echo _str;g } .else { .echo _str }


    kd> r$t0=poi(esp+8)

    kd> .if (@$t0 != 0) { as /mu ${/v:_str} @$t0 } .else { ad /q ${/v:_str} }

    kd> .if ($spat("${_str}x", "*system\Setup*") == 0) { .echo _str;g } .else { .echo _str }

    Software\Policies\Microsoft\Windows NT\DnsClient


    (Several entries have been removed here to conserve space in the blog)


    kd> .if (@$t0 != 0) { as /mu ${/v:_str} @$t0 } .else { ad /q ${/v:_str} }

    kd> .if ($spat("${_str}x", "*system\Setup*") == 0) { .echo _str;g } .else { .echo _str }


    kd> r$t0=poi(esp+8)

    kd> .if (@$t0 != 0) { as /mu ${/v:_str} @$t0 } .else { ad /q ${/v:_str} }

    kd> .if ($spat("${_str}x", "*system\Setup*") == 0) { .echo _str;g } .else { .echo _str }

    Software\Policies\Microsoft\Windows NT\DnsClient

    kd> r$t0=poi(esp+8)

    kd> .if (@$t0 != 0) { as /mu ${/v:_str} @$t0 } .else { ad /q ${/v:_str} }

    kd> .if ($spat("${_str}x", "*system\Setup*") == 0) { .echo _str;g } .else { .echo _str }


    kd> r$t0=poi(esp+8)

    kd> .if (@$t0 != 0) { as /mu ${/v:_str} @$t0 } .else { ad /q ${/v:_str} }

    kd> .if ($spat("${_str}x", "*system\Setup*") == 0) { .echo _str;g } .else { .echo _str }



    76e8f09d 8bff            mov     edi,edi


    Bingo! It detected that System\Setup was the string passed in and remained broken in allowing us to start any necessary debugging.



    kd> kv

    ChildEBP RetAddr  Args to Child             

    077aeae0 760dfb75 80000002 760dfb94 00000000 ADVAPI32!RegOpenKeyExW (FPO: [Non-Fpo])

    WARNING: Frame IP not in any known module. Following frames may be wrong.

    077aed18 760dfbb7 00000004 00000000 0000003a 0x760dfb75

    077aee20 7678c1b2 00000000 00000000 00000000 0x760dfbb7

    077aee24 00000000 00000000 00000000 00000000 kernel32!WaitForSingleObject+0x12 (FPO: [Non-Fpo])


    kd> dc 760dfb94

    760dfb94  00790053 00740073 006d0065 0053005c  S.y.s.t.e.m.\.S.

    760dfba4  00740065 00700075 ff530000 4ce81075  e.t.u.p...S.u..L

    760dfbb4  8bffffff 0ffb3bf8 0032c284 89fb3b00  .....;....2..;..

    760dfbc4  840ff47d 0000012b f875ff56 013ee857  }...+...V.u.W.>.

    760dfbd4  f08b0000 840ff33b 000000f2 0000c3e9  ....;...........

    760dfbe4  90909000 ff8b9090 83ec8b55 458b1cec  ........U......E

    760dfbf4  db335308 fc458956 e8f45d89 fffff583  .S3.V.E..]......

    760dfc04  ff18758b 1e891475 fff5a9e8 89c33bff  .u..u........;..


    Debugger scripts are another cool tool to add to your debugging arsenal. I’ve personally benefited from this type of scripting many times. Again check the debugger help files for more information on the available commands.


    Share this post :
  • Ntdebugging Blog

    Corrupt Page Table Pages Caught in the MDL


    Hello all, Scott Olson here again to share another interesting issue I worked on a while back.  The issue was that after upgrading to Windows XP Service Pack 2 the system would experience random bug checks with memory corruption.  Interestingly, there was a very specific pattern to the corruption - it looked like a PFN address and flags were randomly placed into the page table page in several places in the process.  The memory manager would never do this type of thing and I suspected that a driver was editing user page table pages, which should never be done.

    Let's take a look at the stack:

    kd> kb
      *** Stack trace for last set context - .thread/.cxr resets it
    ChildEBP RetAddr  Args to Child
    f15b1308 80523096 c00862d8 10c5b000 00000000 nt!MiDeletePte+0x198
    f15b13d0 80519776 000001d8 10d20fff 00000000 nt!MiDeleteVirtualAddresses+0x164
    f15b13ec 805b1d74 10c20000 10d20fff f15b14a4 nt!MiDeleteFreeVm+0x20
    f15b148c 8054060c ffffffff 049c6aa8 049c6ab0 nt!NtFreeVirtualMemory+0x42e
    f15b148c 7c90eb94 ffffffff 049c6aa8 049c6ab0 nt!KiFastCallEntry+0xfc
    03e4a398 7c90da54 7c8209b3 ffffffff 049c6aa8 ntdll!KiFastSystemCallRet
    03e4a39c 7c8209b3 ffffffff 049c6aa8 049c6ab0 ntdll!NtFreeVirtualMemory+0xc

    Here is the page table entry for the virtual address:

    kd> !pte 10c5b000
                   VA 10c5b000
    PDE at 00000000C0600430    PTE at 00000000C00862D8
    contains 000000003FC6F867  contains 0000000015E0086F
    pfn 3fc6f ---DA--UWEV    pfn 15e00 ---DA-TUWEV

    This shows that the value 15e0086f is incorrectly put into the page table pages.  This bad value corresponds to a write-through mapping to a range allocated via a call to MmAllocatePagesForMdl.

    c00862d0  00000000 00000000 15e0086f 00000000
    c00862e0  00000000 00000000 00000000 00000000
    c00862f0  00000000 00000000 00000000 00000000
    c0086300  00000000 00000000 00000000 00000000
    c0086310  00000000 00000000 00000000 00000000
    c0086320  00000000 00000000 00000000 00000000
    c0086330  00000000 00000000 00000000 00000000
    c0086340  00000000 00000000 00000000 00000000
    c0086350  00000000 00000000 15e0086f 00000000
    c0086360  00000000 00000000 00000000 00000000
    c0086370  00000000 00000000 00000000 00000000
    c0086380  00000000 00000000 00000000 00000000
    c0086390  00000000 00000000 00000000 00000000
    c00863a0  00000000 00000000 00000000 00000000
    c00863b0  00000000 00000000 00000000 00000000
    c00863c0  00000000 00000000 00000000 00000000
    c00863d0  00000000 00000000 00000000 00000000
    c00863e0  00000000 00000000 00000000 00000000
    c00863f0  00000000 00000000 00000000 00000000
    c0086400  00000000 00000000 00000000 00000000
    c0086410  00000000 00000000 00000000 00000000
    c0086420  00000000 00000000 00000000 00000000
    c0086430  00000000 00000000 00000000 00000000
    c0086440  00000000 00000000 00000000 00000000
    c0086450  00000000 00000000 00000000 00000000
    c0086460  15e0086f 00000000 00000000 00000000
    c0086470  00000000 00000000 00000000 00000000
    c0086480  00000000 00000000 00000000 00000000
    c0086490  00000000 00000000 00000000 00000000
    c00864a0  00000000 00000000 00000000 00000000
    c00864b0  00000000 00000000 00000000 00000000
    c00864c0  00000000 00000000 00000000 00000000
    c00864d0  00000000 00000000 00000000 00000000
    c00864e0  15e0086f 00000000 00000000 00000000
    c00864f0  00000000 00000000 00000000 00000000
    c0086500  00000000 00000000 00000000 00000000
    c0086510  00000000 00000000 00000000 00000000
    c0086520  00000000 00000000 00000000 00000000
    c0086530  00000000 00000000 00000000 00000000
    c0086540  00000000 00000000 00000000 00000000
    c0086550  00000000 00000000 00000000 00000000
    c0086560  15e0086f 00000000 00000000 00000000
    c0086570  00000000 00000000 00000000 00000000
    c0086580  00000000 00000000 00000000 00000000
    c0086590  00000000 00000000 00000000 00000000
    c00865a0  00000000 00000000 00000000 00000000
    c00865b0  00000000 00000000 00000000 00000000
    c00865c0  00000000 00000000 00000000 00000000
    c00865d0  00000000 00000000 00000000 00000000
    c00865e0  00000000 00000000 00000000 00000000
    c00865f0  00000000 00000000 00000000 00000000
    c0086600  00000000 00000000 00000000 00000000
    c0086610  00000000 00000000 00000000 00000000
    c0086620  00000000 00000000 00000000 00000000
    c0086630  00000000 00000000 00000000 00000000
    c0086640  00000000 00000000 00000000 00000000
    c0086650  00000000 00000000 00000000 00000000
    c0086660  00000000 00000000 15e0086f 00000000
    c0086670  00000000 00000000 00000000 00000000
    c0086680  00000000 00000000 00000000 00000000
    c0086690  00000000 00000000 00000000 00000000
    c00866a0  00000000 00000000 00000000 00000000
    c00866b0  00000000 00000000 00000000 00000000
    c00866c0  00000000 00000000 00000000 00000000
    c00866d0  00000000 00000000 00000000 00000000
    c00866e0  00000000 00000000 15e0086f 00000000
    c00866f0  00000000 00000000 00000000 00000000
    c0086700  00000000 00000000 00000000 00000000
    c0086710  00000000 00000000 00000000 00000000
    c0086720  00000000 00000000 00000000 00000000
    c0086730  00000000 00000000 00000000 00000000
    c0086740  00000000 00000000 00000000 00000000
    c0086750  00000000 00000000 00000000 00000000
    c0086760  00000000 00000000 15e0086f 00000000
    c0086770  00000000 00000000 00000000 00000000
    c0086780  00000000 00000000 00000000 00000000
    c0086790  00000000 00000000 00000000 00000000
    c00867a0  00000000 00000000 00000000 00000000
    c00867b0  00000000 00000000 00000000 00000000
    c00867c0  00000000 00000000 00000000 00000000
    c00867d0  00000000 00000000 00000000 00000000
    c00867e0  00000000 00000000 00000000 00000000

    kd> !pfn 15e00
        PFN 00015E00 at address 81BCA800
        flink       00000000  blink / share count 00000001  pteaddress 000AF001
        reference count 0002   Cached     color 0
        restore pte 00000080  containing page        FFEDCB  Active       RW
            ReadInProgress WriteInProgress

    The driver also has an outstanding call MmProbeAndLockPages call on the pages indicated by the reference count of 2.  Thinking that this pfn value is incorrect I decided to search for this value and see what I could find.

    kd> s -d 80000000 l?7fffffff 00015e00
    8022d534  00015e00 0001f190 00041d50 0001f140  .^......P...@...
    86cacbf4  00015e00 0000cd1c 0000cc27 0000cc08  .^......'.......
    86e25cdc  00015e00 0a130005 e56c6946 00000000  .^......Fil.....

    I found a few entries but the middle one looks like it could be an MDL allocation.  So I verified this:

    kd> !pool 86cacbf4 2
    Pool page 86cacbf4 region is Nonpaged pool
    *86cacbd0 size:   80 previous size:   28  (Allocated) *Mdl
                    Pooltag Mdl  : Io, Mdls

    Yes this is an MDL, let's inspect it:

    kd> dt nt!_MDL 86cacbd8
       +0x000 Next             : (null)
       +0x004 Size             : 32
       +0x006 MdlFlags         : 138
       +0x008 Process          : (null)
       +0x00c MappedSystemVa   : 0x00004000
       +0x010 StartVa          : 0xf7baa000
       +0x014 ByteCount        : 0xfff
       +0x018 ByteOffset       : 0

    Notice that the page 15e00 is in the MDL’s page list.

    kd> dd 86cacbd8+1c
    86cacbf4  00015e00 0000cd1c 0000cc27 0000cc08
    86cacc04  0000cc09 0000cc0a 0000cc0b 0000cbec
    86cacc14  0000cbed 0000cbee 0000cbef 0000cbd0
    86cacc24  0000cbd1 0000cbd2 0000cbd3 0000cbd4
    86cacc34  0000cbd5 0000cbd6 00000000 00000000
    86cacc44  00000000 00000000 00000000 00010010

    Next I wanted to see if I could find a driver that may have references to this MDL and I found two:

    kd> s -d 80000000 l?7fffffff 86cacbd8
    86f9c6a0  86cacbd8 0000003d 00000000 0000636a  ....=.......jc..
    86fc7e68  86cacbd8 00000001 00000001 00000000  ................

    Now let's see who owns these

    kd> !pool 86f9c6a0 2
    Pool page 86f9c6a0 region is Nonpaged pool
    *86f9c618 size:   d8 previous size:   30  (Allocated) *Crpt
                    Pooltag Crpt   : Memory corruption driver

    kd> !pool 86fc7e68 2
    Pool page 86fc7e68 region is Nonpaged pool
    *86fc7e00 size:   98 previous size:   40  (Allocated) *Crpt
                    Pooltag Crpt   : Memory corruption driver

    This gives us a pretty convincing probability that this driver is at fault.  So now you may ask, "Why did this problem only start after applying Service Pack 2?"  By default when you install Server Pack 2, Data Execution Prevention (DEP) is enabled on systems that support it.  The support for DEP is in the PAE kernel which uses extra bits to describe the page table entries.  In this crash the solution was to disable DEP until the driver could be corrected.  The driver was incorrectly using the memory mappings by ignoring the extra bits in the page number and causing the memory corruption by writing to the wrong page.  For more information on default DEP settings and enabling/disabling it in Windows see the following article.

    899298  The "Understanding Data Execution Prevention" help topic incorrectly states the default setting for DEP in Windows Server 2003 Service Pack 1

  • Ntdebugging Blog

    Understanding NTFS Compression


    As our group’s file systems expert, I see most of the common problems associated with the use/misuse of NTFS compression.  Before you check the “Compress contents to save disk space” checkbox, it might be good to understand how this affects the happy little bits, bytes, & nibbles running around on the disk.

    Often backup applications receive ERROR_DISK_FULL errors attempting to back up compressed files and this causes quite a bit of confusion when there are still several gigabytes of free space on the drive.  Other issues may also occur when copying compressed files.  The goal of this blog is to give the reader a more thorough understanding of what really happens when you compress NTFS files.

    Compression Units…

    NTFS uses a parameter called the “compression unit” to define the granularity and alignment of compressed byte ranges within a data stream.  The size of the compression unit is based entirely on NTFS cluster size (refer to the table below for details).  In the descriptions below, the abbreviation "CU" is used to describe a Compression Unit and/or its size.

    The default size of the CU is 16 clusters, although the actual size of the CU really depends on the cluster size of the disk.  Below is a chart showing the CU sizes that correspond to each of the valid NTFS cluster sizes.

    Cluster Size Compression Unit Compression Unit (hex bytes)
    512 Bytes 8 KB 0x2000
    1 KB 16 KB 0x4000
    2 KB 32 KB 0x8000
    4 KB 64 KB 0x10000
    8 KB 64 KB 0x10000
    16 KB 64 KB 0x10000
    32 KB 64 KB 0x10000
    64 KB 64 KB 0x10000

    Native NTFS compression does not function on volumes where the cluster size is greater than 4KB, but sparse file compression can still be used.

    NTFS Sparse Files…

    The Sparse Files features of NTFS allow applications to create very large files consisting mostly of zeroed ranges without actually allocating LCNs (logical clusters) for the zeroed ranges.

    For the code-heads in the audience, this can be done by calling DeviceIoControl with the FSCTL_SET_SPARSE IO control code as shown below.

    BOOL SetSparse(HANDLE hFile)
               DWORD Bytes;
               return DeviceIoControl(hFile, FSCTL_SET_SPARSE, NULL, 0, NULL, 0, &Bytes, NULL);

    To specify a zeroed range, your application must then call the DeviceIoControl with the FSCTL_SET_ZERO_DATA IO control code.

    BOOL ZeroRange(HANDLE hFile, LARGE_INTEGER RangeStart, LONGLONG RangeLength)
                FILE_ZERO_DATA_INFORMATION  FileZeroData;
                DWORD Bytes;
                FileZeroData.FileOffset.QuadPart = RangeStart.QuadPart;
                FileZeroData.BeyondFinalZero.QuadPart = RangeStart.QuadPart + RangeLength + 1;
                return DeviceIoControl(  hFile,

    Because sparse files don't actually allocate space for zeroed ranges, a sparse file can be larger than the parent volume.  To do this, NTFS creates a placeholder VCN (virtual cluster number) range with no logical clusters mapped to them. 

    Any attempt to access a “sparsed” range would result in NTFS returning a buffer full of zeroes.  Accessing an allocated range would result in a normal read of the allocated range.  When data is written to a sparse file, an allocated range is created which is exactly aligned with the compression unit boundaries containing the byte(s) written.  Refer to the example below.  If a single byte write occurs for virtual cluster number VCN 0x3a, then all of Compression Unit 3 (VCN 0x30 - 0x3f) would become an allocated LCN (logical cluster number) range.  The allocated LCN range would be filled with zeroes and the single byte would be written to the appropriate byte offset within the target LCN.

    [...] - ALLOCATED
    (,,,) - Compressed
    {   } - Sparse (or free) range
       / 00000000000000000000000000000000000000000000000000000000000000000000000000000000
    VCN  00000000000000001111111111111111222222222222222233333333333333334444444444444444
       \ 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
                CU0             CU1            CU2             CU3             CU4
         {                                              }[..............]{              }
    Extents  {
        VCN = 0x0 LEN = 0x30                        CU0 - CU2
        VCN = 0x30 LEN = 0x10: LCN = 0x2a21f        CU3
        VCN = 0x10 LEN = 0x10                       CU4

    Below is a screen shot of a 2GB file that was created using the sparse file API’s.


    Notice that this volume is only 76.9 MB, yet it has a 2 GB file in the root folder.


    If we attempt to duplicate this sparse file with the COPY command, it fails.  This is because COPY doesn’t know how to duplicate the sparse attributes on the file, so it attempts to create a real 2GB file in the root of D:.  This can occur in production when attempting to move a large database file from one volume to another.  If you have a database application that uses sparse attributes, then it is a good practice to use the database software’s backup / restore features when moving the database to a different volume.


    Now let’s look at the file’s properties with the COMPACT utility.  Notice that the file shows up as compressed, and it has a huge compression ratio.


    If you go back and look at the file properties from EXPLORER, you will notice that there is no compression checkbox (or any other indication that the file is compressed).  This is because the shell does not check for the sparse bit on the file.


    In short, use caution when moving sparse files from one location to another.  Applications tell the file system the offsets of zeroed ranges, so you should always leave the management of sparse files to the application that created them.  Moving or copying the sparse files manually may cause unexpected results.

    NTFS Compression…

    Now that we have discussed sparse files, we will move on to conventional NTFS compression.

    NTFS compresses files by dividing the data stream into CU’s (this is similar to how sparse files work).  When the stream contents are created or changed, each CU in the data stream is compressed individually.  If the compression results in a reduction by one or more clusters, the compressed unit will be written to disk in its compressed format.  Then a sparse VCN range is tacked to the end of the compressed VCN range for alignment purposes (as shown in the example below).  If the data does not compress enough to reduce the size by one cluster, then the entire CU is written to disk in its uncompressed form.

    This design makes random access very fast since only one CU needs to be decompressed in order to access any single VCN in the file.  Unfortunately, large sequential access will be relatively slower since decompression of many CU’s is required to do sequential operations (such as backups).

    In the example below, the compressed file consists of six sets of mapping pairs (encoded file extents).  Three allocated ranges co-exist with three sparse ranges.  The purpose of the sparse ranges is to maintain VCN alignment on compression unit boundaries.  This prevents NTFS from having to decompress the entire file if a user wants to read a small byte range within the file.  The first compression unit (CU0) is compressed by 12.5% (which makes the allocated range smaller by 2 VCNs).  An additional free VCN range is added to the file extents to act as a placeholder for the freed LCNs at the tail of the CU.  The second allocated compression unit (CU1) is similar to the first except that the CU compressed by roughly 50%.

    NTFS was unable to compress CU2 and CU3, but part of CU4 was compressible by 69%.  For this reason, CU2 & CU3 are left uncompressed while CU4 is compressed from VCNs 0x40 to 0x44.  Thus, CU2, CU3, and CU4 are a single run, but the run contains a mixture of compressed & uncompressed VCNs.

    NOTE:  Each set of brackets represents a contiguous run of allocated or free space.  One set of NTFS mapping pairs describes each set of brackets.

    [...] - ALLOCATED
    (,,,) - Compressed
    {   } - Sparse (or free) range
       / 00000000000000000000000000000000000000000000000000000000000000000000000000000000
    VCN  00000000000000001111111111111111222222222222222233333333333333334444444444444444
       \ 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
               CU0             CU1             CU2             CU3             CU4
         (,,,,,,,,,,,,){}(,,,,,,){      }[...............................,,,,){         }
    Extents  {
        VCN = 0x0 LEN = 0xe : LCN = 0x29e32d        CU0
        VCN = 0xe LEN = 0x2                         CU0
        VCN = 0x10 LEN = 0x8 : LCN = 0x2a291f       CU1
        VCN = 0x18 LEN = 0x8                        CU1
        VCN = 0x20 LEN = 0x25 : LCN = 0x39dd49      CU2 - CU4
        VCN = 0x45 LEN = 0xb                        CU4

    Now we come to the part where we describe limitations of this design.  Below are some examples of what happens when things go wrong while reading / writing compressed files.

    Disk full error during a backup read operation or file copy…

    1. NTFS determines which compression unit is being accessed.
    2. The compression unit’s entire allocated range is read.
    3. If the unit is not compressed, then we skip to step 5.  Otherwise, NTFS would attempt to reserve (but not allocate) the space required to write the decompressed CU back to disk.  If insufficient free space exists on the disk, then the application might get an ERROR_DISK_FULL during the read.
    4. The CU would be decompressed in memory.
    5. The decompressed byte range would be mapped into cache and returned to the requesting application.
    6. If part of the CU is altered in cache…
      1. The reserved disk space from step 3 would become allocated space.
      2. The CU contents would be compressed and flushed back to the newly allocated region (the LCN location will usually not change).
      3. Any recoverable disk space within the CU would be freed.

    Failure to copy a large file to a compressed folder…

    This is the most common problem seen with compression, and currently the solution is to educate users about limitations.  NTFS compression creates approximately one file fragment for every 16 clusters of data.  Because the max cluster size allowed for standard compression is 4K, the largest compression unit allowed is 64KB.  In order to compress a 100 GB file into 64KB sections, you could potentially end up with 1,638,400 fragments.  Encoding 1,638,400 fragments becomes problematic for the file system and can cause a failure to create the compressed file.  On Vista and later, the file copy will fail with STATUS_FILE_SYSTEM_LIMITATION.  On earlier releases, the status code would be STATUS_INSUFFICIENT_RESOURCES.  If possible, avoid using compression on files that are large, or critical to system performance.

    I received feedback from the NTFS Principal Development Lead about this blog.  Fortunately, most of the feedback was good, but he asked that I add a maximum size recommendation.  According to our development team’s research, 50-60 GB is a “reasonable size” for a compressed file on a volume with a 4KB cluster size.  This “reasonable size” goes down sharply for volumes with smaller cluster sizes.

    Compressing .VHD (Virtual Hard Disk) files causes slow virtual machine performance…

    A popular hammer company received a call from a customer complaining “It sure hurts when I hit my thumb with your hammers!”  Likewise, if you compress .VHD files, it is going to be a painful experience, so please compress responsibly.

    Best Regards,

    Dennis Middleton “The NTFS Doctor”

  • Ntdebugging Blog

    NTDebugging Puzzler 0x00000006: Invalid Handle - can you handle it?


    Hi NTDebuggers, this week’s puzzler just so happens to match its number: 0x000000006 = ERROR_INVALID_HANDLE.   That said, let me give you a scenario and the challenge will be to provide the best action plan to isolate the problem.  This should include an explanation of what types of code problems cause invalid handles.  


    Scenario 1 :  You have a customer or client that is getting invalid handle errors.  This is causing unhandled exceptions and the application is crashing.   What action plan do you give the customer?  In scenario 1, you don’t have access to the remote machine.  Your customer will execute any action plan you give them, gather the data, and provide you with the results. 


    Scenario 2: You have full access to the process and you can debug it live on your own personal machine; you even have source access.  What do you do?


    Good luck and happy debugging.



    [Update: our answer. Posted 5/27/2008]


    Hi NTDebuggers, if you’ve been debugging for a while I’m sure you’ve run into an Invalid Handle Error 6 (aka ERROR_INVALID_HANDLE or “The handle is invalid”).  If you see it in your debugger output it indicates you have tried to pass an invalid handle to a call that is either expecting a different handle type or a the value for the handle its self is not valid.   To understand this error in more detail you have to understand what handles are and where they live in the operating system.  Each process has a handle table in the kernel.  Think of this as an array or table of objects in kernel mode.   The handle values are actually multiples of four.  Review http://blogs.msdn.com/oldnewthing/archive/2005/01/21/358109.aspx to see for what the bottom 3 bits are used for.  For this illustration we will be numbering them 1,2,3 etc.



    When you open a file you will get back a simple pointer-sized value that in the handle table points to a file object.  In this case let’s say you opened a file and you received a handle value of 2.  If you envision the handle table you might see something like this.


    ------------- USER ADDRESS SPACE FOR YOUR PROCESS -------------


    Your handle for your open file is 2


    ------------- KERNEL ADDRESS SPACE FOR YOUR PROCESS -------------


    Handle table


    Handle [1]=Pointer to ->Reg

    Handle [2]=Pointer to ->File      <<<<<< Your handle is 2  It points to a file object

    Handle [3]=Pointer to ->File

    Handle [4]=Pointer to ->Reg

    Handle [5]=Pointer to ->Process

    Handle [6]=Pointer to ->Thread

    Handle [7]=Pointer to ->Thread

    Handle [8]=Pointer to ->File

    Handle [9]=Pointer to ->File


    In the debugger, you can see the handle table is referenced from the EPROCESS object for each process.  In the !process output it is called the ObjectTable.


    0: kd> !process 85df8da0 

    PROCESS 85df8da0  SessionId: 0  Cid: 0114    Peb: 7ffda000  ParentCid: 0268

        DirBase: 0acc07e0  ObjectTable: e4b6a4c8  HandleCount:  49



    Before we get into troubleshooting let’s look at a couple of scenarios for bad handles.


    In the first bad handle scenario, let’s say your application closes handle 2 using CloseHandle(), however your application keeps the value 2 in a variable somewhere. In this scenario we will assume no other code has come along and opened a new object that is occupying element 2 of the handle table.  If you go to make a file I/O call using handle 2 the object manager will catch this and your app will get an invalid handle exception.  This is somewhat harmless but your app my crash if you don’t catch the exception and your I/O will obviously not complete.


    In the second scenario we’ll say that your file I/O code closes the handle, and holds on to that handle value.  Meanwhile another part of your application opens a registry key or some other object that is not a file object.  Now your file I/O code goes to use handle 2 again.  The problem is it’s not a handle to a file now.  When the registry code opened the registry key element 2 was available so we now have a registry object in that location.  If you go to do a file I/O on handle 2 you will get an invalid handle message.


    Now in our final scenario, we close a file handle in our file I/O code and keep the handle around.  In this case though we have different code that is also doing file I/O, It opens a file and gets handle value 2.  Now you have big trouble because if the first file I/O code now writes to handle 2, and it should not be because it closed it!  This handle now belongs to a different file.  This means you have code basically doing a write into an unknown region for that code’s context and file format.   This will result in file corruption and no invalid handle error.


    So this really comes down to best practice.  If you’re writing code and close a handle NULL IT OUT!  That way you guarantee it will not be reused accidently at some other point.



    MyHandle = NULL;



    On to debugging:  So how do you track this down? For the remote scenario you could give your customer the following action plan:



    1.       Start gflags and under the image file tab, enter your application name in the image edit control.


    2.       Check Create User mode stack trace database.


    3.       Start the process under windbg.   “WinDBG processname.exe”


    4.       In windbg Run !htrace -enable


    5.       Do a sxe ch.  This will cause windbg to break on an invalid handle exception.


    6.       Run sx to confirm  “ch - Invalid handle – break” should be set.


    7.       Enter g for go.


    8.       Let the process run until you get an invalid handle message.  The debugger should break in.


    9.       Now all you have to do is run !htrace



    Htrace will dump out all the call stacks for each handle that was opened or closed.

    You need to take the invalid handle value and search backward to see where it was last opened, and where it was closed before that.  It’s likely that the close before the last open is your culprit code path.  Make sure that suspect close nulls that handle value out so it does not reuse it. 


    In the live debug scenario following the same procedure.  In this case you have the luxury of going back and setting a breakpoint in the offending call stack that freed the handle before the reuse.  You can then figure out why it was not zeroed out.


    Good luck and happy debugging.



Page 19 of 25 (241 items) «1718192021»