• Ntdebugging Blog

    Understanding File System Minifilter and Legacy Filter Load Order


    Hello, my name is Fred Jeng from the Global Escalation Services team. For today’s post, I want to go over how Windows 7 and Windows Server 2008 R2 load file system mini-filters in a mixed environment when legacy filters are also present. I recently came across an issue where the filters were being loaded out of order based on their altitudes. This can cause all sorts of problems with a filter driver’s functionality if they are incorrectly positioned on the stack.  Take for example the following filter stack, obtained using the fltmc command from the cmd prompt:




    Filter Name                     Num Instances    Altitude    Frame

    ------------------------------  -------------  ------------ -----

    AVLegacy                                        389998.99   <Legacy>

    EncryptionLegacy                                149998.99   <Legacy>

    AVMiniFilter                           3        328000         0

    luafv                                  1        135000         0

    FileInfo                               13       45000          0


    At first glance it looks like there is a problem causing the legacy encryption filter to be loaded above the antivirus minifilter, which has a higher altitude. This may cause issues with AVMiniFilter as the IOs that it receives are still encrypted. Due to limitations in how the filter drivers attach to the driver stack, this is actually the intended behavior. However, there is a solution to manipulate the load order to load the legacy filters correctly based on their altitude.


    First some background information regarding legacy filters and minifilters.

    In the old days before minifilters, legacy drivers can only attach at the top of the driver stack so the load order also controlled the attachment order. The earlier a legacy driver loads, the lower it can attach on the file system stack. Minifilters on the other hand can load at any time, but their positions relative to other minifilters are controlled by their altitude. When a minifilter loads, it needs to register with an appropriate frame created by fltmgr. Each frame is a fltmgr device object and represents a range of altitudes. There can be more than one frame on the file system stack but the range of altitudes that each frame represents cannot overlap with the altitude range of another frame. For interoperability with legacy drivers, minifilters must still maintain a load order group. The frames are created and managed by fltmgr, which itself is a legacy driver. The ramification of this is that fltmgr must follow the old legacy filter driver rules and attach only at the top of the stack.


    From the above example, let’s walk through how the legacy and minifilters above are loaded to get us into the state such that the altitude appears to be out of order.

    First, here are the details for the 5 drivers.


    Driver Name


    Load Order

    Start Type




    FsFilter Anti-Virus





    FsFilter Anti-Virus





    FsFilter Encryption





    FsFilter Virtualization





    FSFilter Bottom





    MSDN has an article that describes load order groups and altitudes for minifilters: http://msdn.microsoft.com/en-us/library/windows/hardware/ff549689%28v=vs.85%29.aspx.

    Referencing this article regarding load order groups and altitudes for minifilter drivers, we can determine that our filters will load in the following order.







    On system bootup, when fltmgr.sys loads it will create Frame 0 with a default altitude range of 0 to 49999. When FileInfo loads with an altitude of 45000, it will fit into the default Frame 0. Next to load is EncryptionLegacy. Since this is a legacy driver, it will attach on top of the legacy driver fltmgr.sys. So this is how our file system stack looks right now.




    Next up is the AVLegacy driver.  This is a legacy driver, so it has to attach above EncryptionLegacy.




    Now the AVMinifilter will load with an altitude of 328000. The OS will check if it will fit in the Frame 0 Fltmgr, but this frame only supports an altitude of 0-45000. Before deciding to create a new fltmgr Frame instance, it will check if there are any legacy filters attached above Frame 0 and adjust Frame 0’s altitude if there are. So in our case, we do have legacy filters on the file system stack at this point and so we go up the list of legacy drivers. First we see EncryptionLegacy with an altitude of 149998.99 so we adjust Frame 0 to cover from 0 to 149998.99. We continue up the list and see AVLegacy with an altitude of 329998.99 so we again adjust the altitude of Frame 0 to now cover 0-329998.99. The reason we do this is because Frame 0 now must handle all minifilters below 329998.99. Since we can only attach legacy filters to the top of the stack, if we add an additional FltMgr frame instance, it has to sit above AVLegacy and can only support minifilters with altitude of 329998.99 or higher. Now that Frame 0 supports 0-329998.99, we can register AVMinifilter with Frame 0.




    At this point, you can already see that AVMinifilter which has a higher altitude then EncryptionLegacy will be loaded below EncryptionLegacy. The last driver to load is the luafv minifilter, and it will fit into Frame 0.




    A couple of things to point out.

    Why can’t we insert a Frame between AVLegacy and EncryptionLegacy when ACMinifilter loads?

    This is due to how the file system stack is constructed with legacy drivers only being able to attach to the top of the stack. Since FltMgr is a legacy driver, it has to conform to these rules.


    Why do we adjust the altitude in Frame 0 to cover 0-329998.99? Why not stop at 149998.99?

    If Frame 0 only adjusts its altitude to the legacy filter directly attached above it and not all the way to the highest attached legacy filter, we won’t be able to handle some range of mini-filters. For example, assume we only adjust Frame 0 to cover 0-149998.99, then when the AVMinifilter with an altitude of 328000 comes along, it won’t fit in Frame 0, and we’re unable to insert a Frame between AVLegacy (329998.99) and EncryptionLegacy(149998.99) so we would either be unable to load AVMinifilter, or we would have to create Frame 1 above AVLegacy and load AVMinifilter there. In which case we would again be faced with the altitude disordering issue.


    If this is the expected behavior, how do we resolve the problem of EncryptionLegacy being loaded above AVMinifilter? The solution is to inject a dummy minifilter that loads at the appropriate time to force fltmgr to create a Frame between the legacy filters. For our case above, I used the DDK to create the NullFilter minifilter driver and changed the load order to FSFilter Compression and gave it an altitude of 160030 which is within the assigned altitude for FSFilterCompression and set the start type to SERVICE_BOOT_START.  Please note that I only used this driver in a test environment, production minfilter drivers must use an altitude assigned by Microsoft.


    For information on minifilter load order groups and altitude, reference http://msdn.microsoft.com/en-us/windows/hardware/gg462963.aspx.


    Our list of filters are as follows:


    Driver Name


    Load Order

    Start Type




    FsFilter Anti-Virus





    FsFilter Anti-Virus





    FsFilter Encryption





    FsFilter Virtualization





    FSFilter Bottom





    FSFilter Compression




    So with the new NullFilter dummy driver, our filter load order should be as follows:








    After FileInfo and EncryptionLegacy loads, the stack is the same as what we had earlier.





    Now when the NullFilter minifilter loads with an altitude of 160030, we see that it doesn’t fit in Frame 0. As before, we check for any attached legacy filter drivers and see EncryptionLegacy so we adjust Frame 0 to cover 0-149998.99. Since NullFilter still does not fit in Frame 0, we will create a new Frame and attach it above the EncryptionLegacy driver.







    The AVLegacy driver will load next, and since it is a legacy driver, it will attach above the Frame 1 instance of FltMgr.




    The last two minifilters to load are AVMiniFilter and luafv. When AVMinifilter loads into Frame 1 with an altitude of 328000, it will see that Frame 1 at the time only supports 149998.99-160030. It follows the same algorithm to check if there are any legacy filters attached above the frame. In this case, we have AVLegacy attached above Frame 1 so we adjust Frame 1 to cover 149998.99-329998.99 before inserting AVMinifilter into Frame 1.




    By strategically injecting a dummy minifilter driver, we can get the legacy and minifilter drivers to all load at the correct altitude.

  • Ntdebugging Blog

    Don't Believe Everything You Read


    Recently, I was contacted by a customer who was advised by an ISV to set a registry value under one of the sub keys in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\.  Let's call it UseQuantumComputing = 1 (value name has been changed to protect the ISV).  The customer wanted to know what this value actually did and no one could find any documentation explaining it.  These issues often come to our team because we have access to the Windows source code. I did a bit of code review to find out what this value does.  As it turns out, nowhere in Windows source code between Windows 2000 and Windows Server 2012 do we ever check for or set UseQuantumComputing.


    I can think of a few reasons the ISV would suggest setting this value.  Perhaps they were under the impression this did something but got confused about the value name.  It's possible they hoped making a registry change would have a placebo effect.  Or, perhaps their software actually checks this value, not Windows.


    The latter of these possibilities is probably the worst case scenario. An ISV should not create a registry value inside of keys used for Windows’ own internal use. Why? The first reason is that there's no guarantee that Microsoft won't end up coincidentally using that same value name later. This would cause a conflict between the two users of the value.  Second, we have to consider the scenario where two different ISVs both decide to use the same value. That would be bad too.  Lastly, there's no guarantee that the key in use will still exist in later versions, or that it will be writeable or readable by the ISV due to permission changes.  In addition to all these reasons, there is the common sense issue that it is just confusing. Now the ISV's software and uninstaller needs to look all over the registry, not just in their own keys.


    On a similar note, I also recently had a case where a "Windows Tips" blog (not created, endorsed, or run by Microsoft) suggested using a registry value that was implemented in Windows but was not documented by Microsoft.  It turns out this value wasn't thoroughly tested (because it was undocumented and wasn't intended to be used in production), and using it would cause server hangs under certain conditions.  These hangs were only discovered after a large customer decided to implement the undocumented value across their enterprise. 


    Here are a few tips for IT Pros, developers, and users alike:

    • Don't implement random registry settings if you can't find documentation for that setting on an official Microsoft site, like MSDN, TechNet, or support.microsoft.com(information on forums or answer boards (e.g. social.*.microsoft.com or answers.*.microsoft.com) is not official documentation).  At best these unknown registry settings they will do nothing, at worst they will cause you headaches later.
    • If a key/value isn't documented, changes to it likely are not tested, and could put your machine in a state that makes it difficult or impossible to support.
    • If you are a developer, keep any of your registry settings in your own key. Don't pollute in others' keys.
    • If an ISV or Microsoft suggests you implement a setting, make sure you understand the implications of that setting.


    I'll leave you with the warning displayed in many of our KBs - it's there for a reason!


    WARNING: If you use Registry Editor incorrectly, you may cause serious problems that may require you to reinstall your operating system. Microsoft cannot guarantee that you can solve problems that result from using Registry Editor incorrectly. Use Registry Editor at your own risk.



  • Ntdebugging Blog

    Leaving the Do Not Disturb Sign on the Door Will Cause the KERNEL_APC_PENDING_DURING_EXIT Bugcheck


    This is Ron Stock from the Global Escalation Services team and I recently worked with a customer to determine which misbehaving driver was crashing their critical server. This particular crash was a STOP 0x00000020 which maps to KERNEL_APC_PENDING_DURING_EXIT.


    The KERNEL_APC_PENDING_DURING_EXIT bugcheck type indicates the APC disable count for a thread was not equal to zero when the thread exited. The APC disable count is a field in the _KTHREAD structure and it is decremented when drivers disable APCs by calling functions such as KeEnterCriticalRegion, FsRtlEnterFileSystem or by acquiring a mutex. Disabling APC delivery to a thread is the equivalent of hanging the “Do Not Disturb” sign on your door. When drivers need to perform a critical operation they ‘hang the sign on the door’ to prevent interruption from APCs. When the same driver fails to ‘take the sign off the door’ by calling KeLeaveCriticalRegion, FsRtlExitFileSystem or KeReleaseMutex, the APC disable count is never incremented back to its original value. This forgetful behavior causes a bugcheck because the APC disable count is checked when the thread is exiting. The OS expects this value to be zero on thread exit.


    In my case the value was 0xffff (negative 1) indicating a driver had forgot to remove the ‘Do Not Disturb’ sign.


    kd> !analyze –v



    The key data item is the thread's APC disable count.

    If this is non-zero, then this is the source of the problem.


    Arg1: 0000000000000000, The address of the APC found pending during exit.

    Arg2: 000000000000ffff, The thread's APC disable count

    Arg3: 0000000000000000, The current IRQL

    Arg4: 0000000000000001


    Because the value is decremented earlier in time the current call stack is not particularly useful. It merely shows the thread exiting under normal conditions.


    0: kd> !thread -1 e

    THREAD fffffa8049f04b50  Cid 0004.0998  Teb: 0000000000000000 Win32Thread: 0000000000000000 RUNNING on processor 0

    Not impersonating

    DeviceMap                 fffff8a000007ee0

    Owning Process            fffffa8048cad9e0       Image:         System

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      11503325       Ticks: 0

    Context Switch Count      185715         IdealProcessor: 0

    UserTime                  00:00:00.000

    KernelTime                00:00:06.078

    Win32 Start Address srv2!SrvProcWorkerThread(0xfffff88003c4b400)

    Stack Init fffff88005078db0 Current fffff880050789b0

    Base fffff88005079000 Limit fffff88005073000 Call 0

    Priority 15 BasePriority 15 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5

    Child-SP          RetAddr           Call Site

    fffff880`05078b08 fffff800`01984bd9 nt!KeBugCheckEx

    fffff880`05078b10 fffff800`019a1a3d nt!PspExitThread+0xffffffff`fffe3ae9

    fffff880`05078c10 fffff800`0195bc8a nt!PspTerminateThreadByPointer+0x4d

    fffff880`05078c60 fffff880`03c56769 nt!PsTerminateSystemThread+0x22

    fffff880`05078c90 fffff880`03c4b5b6 srv2!SrvProcTerminateWorkerThreadInternal+0x99

    fffff880`05078cc0 fffff800`01966e5a srv2!SrvProcWorkerThread+0x1b6

    fffff880`05078d40 fffff800`016c0d26 nt!PspSystemThreadStartup+0x5a

    fffff880`05078d80 00000000`00000000 nt!KxStartSystemThread+0x16


    Driver Verifier is the ideal tool for this type of bugcheck. It has a feature called Critical Region logging which tracks the call stack and KTRHEAD value for each call to either KeEnterCriticalRegion() and KeLeaveCriticalRegion(). I had the customer enable this logging by selecting the “Miscellaneous checks” option in Driver Verifier using these steps-

    • Run Verifier.exe
    • Select “Create custom settings (For code developers)”
    • Select individual settings from a full list
    • Select Miscellaneous checks
    • Select Driver Names from a list
    • Manually choose all of the third-party drivers.
    • Reboot after making the changes. 


    After running through the steps above, we gathered another STOP 0x00000020 dump. I confirmed the “Miscellaneous checks” option was enabled by using the !verifier command


    0: kd> !verifier


    Verify Level 800 ... enabled options are:

          Miscellaneous checks enabled


    The stack in this new dump was in the same SrvProcWorkerThread thread exit path so we had a consistent pattern. The thread with the negative APC Disable count was fffffa804b5be040.


    0: kd> !thread -1 e

    THREAD fffffa804b5be040 Cid 0004.082c  Teb: 0000000000000000 Win32Thread: 0000000000000000 RUNNING on processor 0

    Not impersonating

    DeviceMap                 fffff8a000007ee0

    Owning Process            fffffa8048cad9e0       Image:         System

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      4458237        Ticks: 0

    Context Switch Count      36067          IdealProcessor: 0            

    UserTime                  00:00:00.000

    KernelTime                00:00:01.218

    Win32 Start Address srv2!SrvProcWorkerThread(0xfffff88004827400)

    Stack Init fffff88005cc6db0 Current fffff88005cc69b0

    Base fffff88005cc7000 Limit fffff88005cc1000 Call 0

    Priority 15 BasePriority 15 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5

    Child-SP          RetAddr           Call Site

    fffff880`05cc6b08 fffff800`0198dbd9 nt!KeBugCheckEx

    fffff880`05cc6b10 fffff800`019aaa3d nt!PspExitThread+0xffffffff`fffe3ae9

    fffff880`05cc6c10 fffff800`01964c8a nt!PspTerminateThreadByPointer+0x4d

    fffff880`05cc6c60 fffff880`048326d9 nt!PsTerminateSystemThread+0x22

    fffff880`05cc6c90 fffff880`048275b6 srv2!SrvProcTerminateWorkerThreadInternal+0x99

    fffff880`05cc6cc0 fffff800`0196fe5a srv2!SrvProcWorkerThread+0x1b6

    fffff880`05cc6d40 fffff800`016c9d26 nt!PspSystemThreadStartup+0x5a

    fffff880`05cc6d80 00000000`00000000 nt!KxStartSystemThread+0x16


    I dumped the Critical Region log by using the !verifier 200 command. The Critical Region log has enough room for 128 stacks. After dumping the log, the first thing to do is to find the KTHREAD value of the thread with the non-zero APC disable count. Unfortunately in my case thread fffffa804b5be040 didn’t appear in the log. In fact all 128 stacks had a driver named Suspect.sys calling KeEnterCriticalRegion or KeLeaveCriticalRegion.  Note: To protect our vendor friends, I renamed the actual sys file in this article to suspect.sys.


    The customer disabled the suspect.sys driver hoping this was the driver forgetting to re-enable APCs. If nothing else, this would perhaps remove the noisy suspect.sys from the log in the next dump.


    0: kd> !verifier 200


    Enter/Leave Critical Region log:

    There are up to 0x80 entries in the log.


    Displaying all the log entries.



    Thread fffffa8048ce4b50

    fffff80001b74293 nt!VerifierKeLeaveCriticalRegion+0xc3

    fffff8800100aafa Suspect.sys+0xaafa

    fffff88001001e30 Suspect.sys+0x1e30

    fffff80001abc68c nt!IopLoadUnloadDriver+0x1c

    fffff800016e1641 nt!ExpWorkerThread+0x111

    fffff8000196ee5a nt!PspSystemThreadStartup+0x5a

    fffff800016c8d26 nt!KiStartSystemThread+0x16


    Thread fffffa8048ce4b50

    fffff80001b6b0a2 nt!VerifierKeEnterCriticalRegion+0x92

    fffff880010062a3 Suspect.sys+0x62a3

    fffff8800100a7e2 Suspect.sys+0xa7e2

    fffff88001001e30 Suspect.sys+0x1e30

    fffff80001abc68c nt!IopLoadUnloadDriver+0x1c

    fffff800016e1641 nt!ExpWorkerThread+0x111

    fffff8000196ee5a nt!PspSystemThreadStartup+0x5a


    Unfortunately, the system continued to crash and in the next dump the critical region log was completely empty. My guess is the complier was optimizing the KeEnterCriticalRegion and KeLeaveCriticalRegion calls in the driver, causing them to be inlined and skipping the call to VerifierKeLeaveCriticalRegion/VerifierKeEnterCriticalRegion. I needed another attack plan.


    There is another Verifier option called I/O Verification and it works in a similar way to the steps below.  Please note that this functionality is not documented and may be subject to change at any time.

    1. A call to IoCallDriver() is made to send an IO packet to a driver associated with a device.
    2. Verifier hooks the call.
    3. Verifier creates a structure to record state info.
    4. Verifier fills in the structure with data including the thread’s APC Disable Count.
    5. Next Verifier calls the normal IoCallDriver() routine to “continue” the call made in step 1.
    6. The driver does its work (disables and re-enables APCs as needed)
    7. The call to IoCallDriver() returns when the driver is finished.
    8. Verifier checks the real APC count in the thread. Next it compares the value to the recorded value in the structure from step 4. If the two values do not match, then Verifier crashes the machine so we can pull the bad driver out of the dump.


    I had the customer enable I/O Verification using these steps-

    • Run Verifier.exe
    • Select “Create custom settings (For code developers)”
    • Select individual settings from a full list
    • Select I/O Verification
    • Select Driver Names from a list
    • Manually choose all of the third-party drivers.
    • Reboot after making the changes. 


    As we expected, the machine crashed again because of the APC Disable issue.  Because we enabled I/O Verification, the bugcheck type changed to DRIVER_VERIFIER_DETECTED_VIOLATION and now we have a smoking gun.


    I used the !verifier command to review the Verifier settings. The output below shows “I\O subsystem checking enabled” which confirms I/O Verification was been enabled.


    0: kd> !verifier


    Verify Level 810 ... enabled options are:

          Io subsystem checking enabled


    The parameters to KeBugcheck reconfirmed the APC disable count was -1 (ffff). And this time we have an additional breadcrumb, the driver dispatch routine address.



    A device driver attempting to corrupt the system has been caught.  This is

    because the driver was specified in the registry as being suspect (by the

    administrator) and the kernel has enabled substantial checking of this driver.

    If the driver attempts to corrupt the system, bugchecks 0xC4, 0xC1 and 0xA will

    be among the most commonly seen crashes.


    Arg1: 00000000000000c5, Thread APC disable count changed by driver dispatch routine.

    Arg2: fffff88001345610, Driver dispatch routine address.

    Arg3: 000000000000ffff, Current thread APC disable count.

    Arg4: 0000000000000000, Thread APC disable count before calling driver dispatch routine.

          The APC disable count is decremented each time a driver calls

          KeEnterCriticalRegion, FsRtlEnterFileSystem, or acquires a mutex. The APC

          disable count is incremented each time a driver calls KeLeaveCriticalRegion,

          FsRtlExitFileSystem, or KeReleaseMutex. Since these calls should always be in

          pairs, this value should be zero when a thread exits. A negative value

          indicates that a driver has disabled APC calls without re-enabling them. A

          positive value indicates that the reverse is true.


    Notice the Verifier functions on the call stack which we I leveraged for the “saved state” information I discussed above in the I/O Verification architecture.


    0: kd> kn

    # Child-SP          RetAddr           Call Site

    00 fffff880`05db1b08 fffff800`0174a9c0 nt!KeBugCheckEx

    01 fffff880`05db1b10 fffff800`01b66b4ent!VfBugCheckNoStackUsage+0x30

    02 fffff880`05db1b50 fffff800`01b6cc2e nt!VfAfterCallDriver+0x33e

    03 fffff880`05db1ba0 fffff880`04054756 nt!IovCallDriver+0x57e

    04 fffff880`05db1c00 fffff880`0404b7b0 srv2!Smb2ExecuteRead+0x9a6

    05 fffff880`05db1c80 fffff880`0404b6fb srv2!SrvProcessPacket+0xa0

    06 fffff880`05db1cc0 fffff800`01960e5a srv2!SrvProcWorkerThread+0x2fb

    07 fffff880`05db1d40 fffff800`016bad26nt!PspSystemThreadStartup+0x5a

    08 fffff880`05db1d80 00000000`00000000nt!KiStartSystemThread+0x16


    Next I dumped the driver dispatch routine noted in the KeBugCheckoutput above using the ln command. This points to fltmgr!FltpDispatch which tells me we have a filter manager minifilter driver making calls to disable APCs but rudely forgetting to re-enable them. As I noted above we save the state info before the call to IoCallDriver().


    0: kd> ln fffff88001345610

    (fffff880`01345610)   fltmgr!FltpDispatch   |  (fffff880`01345710)   fltmgr!FltReleasePushLock

    Exact matches:

        fltmgr!FltpDispatch (<no parameter info>)


    Now the goal was to determine which minifilter is leaving the “Do Not Disturb” sign on the door and forgetting to remove it. We can find this using the fltmgr device object.  The “saved state” structure is passed to VfAfterCallDriver as the first parameter so I switched to the VfAfterCallDriverframe (second frame) to dig it out. I used the /r flag to show the original values of the registers for this frame.


    0: kd> .frame /r 2

    02 fffff880`05db1b50 fffff800`01b6cc2e nt!VfAfterCallDriver+0x33e

    rax=0000000000000000 rbx=fffffa804b729790 rcx=00000000000000c4

    rdx=00000000000000c5 rsi=fffffa804a3b0000 rdi=fffff8000183ce80

    rip=fffff80001b66b4e rsp=fffff88005db1b50 rbp=fffffa804de8c290

    r8=fffff88001345610   r9=000000000000ffff r10=fffff80001b7a640

    r11=0000000000000000 r12=000000004de8c290 r13=0000000000000000

    r14=0000000000000000 r15=0000000000000000

    iopl=0         nv up ei ng nz na pe nc

    cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000282


    fffff800`01b66b4e cc              int     3


    Parameter 1 on x64 is always passed via rcx. I dumped the assembly for VfAfterCallDriver and confirmed the value in rcx (the base of the saved state structure) is moved to rbx.


    0: kd> u nt!VfAfterCallDriver


    fffff800`01b66810 48895c2410      mov     qword ptr[rsp+10h],rbx

    fffff800`01b66815 48896c2418      mov     qword ptr[rsp+18h],rbp

    fffff800`01b6681a 4889742420      mov     qword ptr[rsp+20h],rsi

    fffff800`01b6681f 57              push    rdi

    fffff800`01b66820 4154            push    r12

    fffff800`01b66822 4155            push    r13

    fffff800`01b66824 4883ec30        sub     rsp,30h

    fffff800`01b66828 488bfa          mov     rdi,rdx

    fffff800`01b6682b 488bd9          mov     rbx,rcx


    The device object is stored in the save state information at offset 0xa0.


    0: kd> ? fffffa804b729790 + 0xa0

    Evaluate expression: -6046048151504 = fffffa80`4b729830


    0: kd> dq fffffa80`4b729830 l1

    fffffa80`4b729830  fffffa80`4a3b0060


    0: kd> !devobj fffffa80`4a3b0060

    Device object (fffffa804a3b0060) is for:

      \FileSystem\FltMgr DriverObject fffffa80491fb7c0

    Current Irp 00000000 RefCount 0 Type 00000008 Flags 00040000

    DevExt fffffa804a3b01b0 DevObjExt fffffa804a3b0208

    ExtensionFlags (0x80000800)  DOE_DEFAULT_SD_PRESENT, DOE_DESIGNATED_FDO

    Characteristics (0000000000) 

    AttachedTo (Lower) fffffa804a3b1030 \FileSystem\Ntfs

    Device queue is not busy.


    0: kd> !devstack fffffa80`4a3b0060

      !DevObj   !DrvObj            !DevExt   ObjectName

    > fffffa804a3b0060  \FileSystem\FltMgr fffffa804a3b01b0 

      fffffa804a3b1030  \FileSystem\Ntfs   fffffa804a3b1180 


    As http://msdn.microsoft.com/en-us/library/ff541610(v=vs.85).aspxexplains – “The filter manager is installed with Windows, but it becomes active only when a minifilter driver is loaded. The filter manager attaches to the file system stack for a target volume. A minifilter driver attaches to the file system stack indirectly, by registering with the filter manager for the I/O operations the minifilter driver chooses to filter.”


    Using the power of the fltkd extension, I dumped the volume information associated with this device object. From the output below, we can extract the name of the filter attached to the volume. The culprit is named BadDriver.sys. The customer removed the driver and the problem went away long enough for the vendor to create an update for BadDriver.sys. Happy Ending!


    0: kd> !fltkd.volume fffffa80`4a3b0060


    FLT_VOLUME: fffffa804a3b0800 "\Device\HarddiskVolume3"

       FLT_OBJECT: fffffa804a3b0800  [04000000] Volume

          RundownRef               : 0x000000000000020a (261)

          PointerCount             : 0x00000001

          PrimaryLink              : [fffffa804ae06810-fffffa804a2b16f0]

       Frame                    : fffffa8049fcd420 "Frame 0"

       Flags                    : [00000064] SetupNotifyCalledEnableNameCaching FilterAttached

       FileSystemType           : [00000002] FLT_FSTYPE_NTFS

       VolumeLink               : [fffffa804ae06810-fffffa804a2b16f0]

       DeviceObject             : fffffa804a3b0060

       DiskDeviceObject         : fffffa804a1c0350

       FrameZeroVolume          : fffffa804a3b0800

       VolumeInNextFrame        : 0000000000000000

       Guid                     : "\??\Volume{552791b0-455d-11de-b7b9-00145eed6acc}"

       CDODeviceName            : "\Ntfs"

       CDODriverName            : "\FileSystem\Ntfs"

       TargetedOpenCount        : 258

       Callbacks                : (fffffa804a3b0910)

       ContextLock              : (fffffa804a3b0cf8)

       VolumeContexts           : (fffffa804a3b0d00)  Count=0

       StreamListCtrls          : (fffffa804a3b0d08)  rCount=2871

       FileListCtrls            : (fffffa804a3b0d88)  rCount=0

       NameCacheCtrl            : (fffffa804a3b0e08)

       InstanceList             : (fffffa804a3b0890)

          FLT_INSTANCE: fffffa804b5b1010 "BadDriver.sys Instance" "189600"

  • Ntdebugging Blog

    Debugging a Debugger to Debug a Dump


    Recently I came across an instance where my debugger did not do what I wanted.  Rarely do computers disobey me, but this one was unusually stubborn.  There was no other option; I had to bend the debugger to my will.


    There are many ways to make a computer program do what you want.  If you have the source code you can rewrite and recompile the program.  If you have a hex editor you can edit the code of the binary.  A shim can be used to modify a program at runtime.  In this instance I was in a hurry and I was ok with a temporary solution, so I used a debugger to change the execution of the debugger while it ran.


    Debug a debugger? Can you do such a thing?  Of course you can.


    In this example a memory dump was captured of a system and I was asked to determine if the system had run out of desktop heap.  Usually the !dskheap command can be used to determine how much heap has been used by each desktop. Unfortunately, this command failed me.


    23: kd> !dskheap

    Error Reading TotalFreeSize from nt!_HEAP @ fffffa8019c65c00

    Failed to GetHeapInfo for desktop @fffffa8019c65c00

    EnumDsktps failed on Winsta: 19c4f090FillWinstaArray failed


    The error indicates that the command couldn’t read from the _HEAP structure at fffffa8019c65c00 for desktop fffffa8019c65c00.  Further investigation found that reason I got this error is that the heap for the desktop in question is not valid memory.  Because the memory is described by a prototype PTE I assume that the heap has not been initialized (Note: See Windows Internals’ Memory Management chapter for more information about proto PTEs).


    23: kd> dt win32k!tagDESKTOP fffffa8019c65c00

       +0x000 dwSessionId      : 0

       +0x008 pDeskInfo        : 0xfffff900`c05e0a70 tagDESKTOPINFO

       +0x010 pDispInfo        : 0xfffff900`c0581e50 tagDISPLAYINFO

       +0x018 rpdeskNext       : 0xfffffa80`19c6ef20 tagDESKTOP

       +0x020 rpwinstaParent   : 0xfffffa80`19c4f090 tagWINDOWSTATION

       +0x028 dwDTFlags        : 0x110

       +0x030 dwDesktopId      : 0x19c65c00`00000003

       +0x038 spmenuSys        : (null)

       +0x040 spmenuDialogSys  : (null)

       +0x048 spmenuHScroll    : (null)

       +0x050 spmenuVScroll    : (null)

       +0x058 spwndForeground  : (null)

       +0x060 spwndTray        : (null)

       +0x068 spwndMessage     : 0xfffff900`c05e0d90 tagWND

       +0x070 spwndTooltip     : 0xfffff900`c05e0fa0 tagWND

       +0x078 hsectionDesktop  : 0xfffff8a0`00ef08e0 Void

       +0x080 pheapDesktop     : 0xfffff900`c05e0000 tagWIN32HEAP

       +0x088 ulHeapSize       : 0x18000

       +0x090 cciConsole       : _CONSOLE_CARET_INFO

       +0x0a8 PtiList          : _LIST_ENTRY [ 0xfffffa80`19c65ca8 - 0xfffffa80`19c65ca8 ]

       +0x0b8 spwndTrack       : (null)

       +0x0c0 htEx             : 0n0

       +0x0c4 rcMouseHover     : tagRECT

       +0x0d4 dwMouseHoverTime : 0

       +0x0d8 pMagInputTransform : (null)

    23: kd> dd 0xfffff900`c05e0000

    fffff900`c05e0000  ???????? ???????? ???????? ????????

    fffff900`c05e0010  ???????? ???????? ???????? ????????

    fffff900`c05e0020  ???????? ???????? ???????? ????????

    fffff900`c05e0030  ???????? ???????? ???????? ????????

    fffff900`c05e0040  ???????? ???????? ???????? ????????

    fffff900`c05e0050  ???????? ???????? ???????? ????????

    fffff900`c05e0060  ???????? ???????? ???????? ????????

    fffff900`c05e0070  ???????? ???????? ???????? ????????

    23: kd> !pte fffff900`c05e0000

                                               VA fffff900c05e0000

    PXE at FFFFF6FB7DBEDF90    PPE at FFFFF6FB7DBF2018    PDE at FFFFF6FB7E403010    PTE at FFFFF6FC80602F00

    contains 000000076245C863  contains 0000000762569863  contains 000000045FA17863  contains F8A000F4F9780400

    pfn 76245c    ---DA--KWEV  pfn 762569    ---DA--KWEV  pfn45fa17    ---DA--KWEV   not valid

                                                                                     Proto: FFFFF8A000F4F978


    There are many desktops in this session and I wanted to know about the usage of the other desktops, but the !dskheap command stopped after just one error.  I needed to force it to continue after this error, so I launched a debugger to debug my debugger.  There is a command to do this, just run .dbgdbg.


    23: kd> .dbgdbg

    Debugger spawned, connect with



    For clarity I will call the original debugger where I ran !dskheap debugger1, and the new debugger spawned by .dbgdbg debugger2 .


    Before switching to debugger2 I need to know what I am going to debug.  The error message gives a hint about where to set a breakpoint, I am looking for a failure from GetHeapInfo.


    23: kd> !dskheap

    Error Reading TotalFreeSize from nt!_HEAP @ fffffa8019c65c00

    Failed to GetHeapInfo for desktop @fffffa8019c65c00

    EnumDsktps failed on Winsta: 19c4f090FillWinstaArray failed


    I need to know which module GetHeapInfo is in, the .extmatch match command indicates which module contains the !dskheap command.


    23: kd> .extmatch dskheap



    Switching to debugger2 I set a breakpoint on kdexts!GetHeapInfo.  Use Ctrl+C to trigger a debug break in cdb (this is the same as a Ctrl+Break in windbg).


    0:004> bp kdexts!GetHeapInfo

    0:004> g


    Switch back to debugger1 and run the !dskheap command.


    23: kd> !dskheap


    In debugger2 I have hit the breakpoint.


    Breakpoint 0 hit


    000007f9`4237b9b0 4055            push    rbp


    The error says GetHeapInfo failed, so I am interested in what this function returns.  To see what GetHeapInfo returns I go up one level in the stack and set a new breakpoint on the code just after it returns.  This new breakpoint will also dump the return value of GetHeapInfo (return values are always in the rax register).


    0:000> gu


    000007f9`4237b483 4885c0          test    rax,rax

    0:000> r rax


    0:000> bc *

    0:000> bp 000007f9`4237b483 "r rax"

    0:000> g


    The next time the breakpoint hit the return value was 1, which in this instance means GetHeapInfo failed.  This is where I exerted my control over the computer: I forced the return value to 0.




    000007f9`4237b483 4885c0          test    rax,rax

    0:000> r rax=0


    I ran through the other breakpoints and changed rax as necessary.


    0:000> g



    000007f9`4237b483 4885c0          test    rax,rax

    0:000> g



    000007f9`4237b483 4885c0          test    rax,rax

    0:000> g



    000007f9`4237b483 4885c0          test    rax,rax

    0:000> g



    000007f9`4237b483 4885c0          test    rax,rax

    0:000> g



    000007f9`4237b483 4885c0          test    rax,rax

    0:000> r rax=0

    0:000> g



    000007f9`4237b483 4885c0          test    rax,rax

    0:000> g


    Everything was going well, until the computer defied me again.  The !dskheap output computes the percentage of heap usage by dividing the bytes used by the size of the heap.  Unfortunately, the size of the heap was left at 0 for the two heaps where I changed the return value.  It is well known that only Chuck Norris can divide by zero; to prevent a roundhouse kick to your computer the processor generates an exception.


    (2d0.928): Integer divide-by-zero - code c0000094 (first chance)

    First chance exceptions are reported before any exception handling.

    This exception may be expected and handled.


    000007f9`4237b90e 49f7f3          div     rax,r11

    0:000> r r11


    0:000> r rax


    0:000> g


    Fortunately debugger1 handles the divide by zero exception and it is easy to run !dskheap again.


    23: kd> !dskheap


    Back in debugger2 I set a new breakpoint on the div instruction that outputs the divisor.  When the divisor (r11) is 0 I changed it to a non-zero value to avoid the wrath of Mr. Norris.




    000007f9`4237b483 4885c0          test    rax,rax

    0:000> bp 000007f9`4237b90e

    0:000> bp 000007f9`4237b90e "r r11"

    breakpoint 1 redefined

    0:000> g



    000007f9`4237b483 4885c0          test    rax,rax

    0:000> r rax=0

    0:000> g



    000007f9`4237b483 4885c0          test    rax,rax

    0:000> g



    000007f9`4237b483 4885c0          test    rax,rax

    0:000> g



    000007f9`4237b483 4885c0          test    rax,rax

    0:000> g



    000007f9`4237b483 4885c0          test    rax,rax

    0:000> g



    000007f9`4237b483 4885c0          test    rax,rax

    0:000> r rax=0

    0:000> g



    000007f9`4237b483 4885c0          test    rax,rax

    0:000> g



    000007f9`4237b90e 49f7f3          div     rax,r11

    0:000> g



    000007f9`4237b90e 49f7f3          div     rax,r11

    0:000> r r11=1

    0:000> g



    000007f9`4237b90e 49f7f3          div     rax,r11

    0:000> g



    000007f9`4237b90e 49f7f3          div     rax,r11

    0:000> g



    000007f9`4237b90e 49f7f3          div     rax,r11

    0:000> g



    000007f9`4237b90e 49f7f3          div     rax,r11

    0:000> g



    000007f9`4237b90e 49f7f3          div     rax,r11

    0:000> r r11=1

    0:000> g



    000007f9`4237b90e 49f7f3          div     rax,r11

    0:000> g


    Finally, back in debugger1 I now have complete output for !dskheap.  After a few strategic modifications of the program’s execution I got it to output the data I wanted.  As it turns out we aren’t out of desktop heap after all.


    23: kd> !dskheap

    Error Reading TotalFreeSize from nt!_HEAP @ fffffa8019c65c00

    Error Reading TotalFreeSize from nt!_HEAP @ fffffa801a53ea30

      Winstation\Desktop            Heap Size(KB)   Used Rate(%)


      WinSta0\Default                  20480                 0%

      WinSta0\Disconnect                   0                 0%

      WinSta0\Winlogon                   192                 2%

      Service-0x0-3e7$\Default          7680                 1%

      Service-0x0-3e4$\Default          7680                 0%

      Service-0x0-3e5$\Default          7680                 0%

      Service-0x0-26f46a$\Default          0                 0%

      Service-0x0-2706f2$\Default       7680                 0%


                    Total Desktop: (   51392 KB -   8 desktops)

                    Session ID:  0


  • Ntdebugging Blog

    How to Setup a Debug Crash Cart to Prevent Your Server from Flat Lining


    This is Ron Stock from the Global Escalation Services team and I recently had the task of live debugging a customer’s remote server.  In debug circles we use what is known as a crash cart to live debug production servers. The phrase conjures up visions of a wheeled cabinet containing an emergency defibrillator, a heart monitor and latex gloves. Luckily for our purposes, the term merely denotes a machine setup with the Debugging Tools for Windows. This life saving machine is attached to the ailing production server for debugging and no medical degree is required.


    The ailing production server is referred to as the Target Computer and the Debugging Tools for Windows are installed on the Host computer. The machines are attached with either a null-modem cable, 1394 cable, a special USB cable, or an ethernet cable (network debugging was added in Windows 8). Below I outline serial debugging because this is the most common technique.  In future articles I plan to discuss configuring the other methods.




    Serial Connection Setup


    A null-modem cable is a serial cable used to send data between two serial ports and it can be cheaply purchased at most electronics stores. Be aware these are different from standard serial cables because the transmit and receive lines are cross linked.


    Plug the null-modem cable into a serial port on each of the computers.  The serial port on the target computer must be built into the system, add on components such as PCI cards will not work for serial debugging on the target computer.


    Target Computer setup


    1. To enable debugging enter the following command from an elevated command prompt.

    bcdedit /debug on


    2. In most systems the default debug settings are sufficient. The current settings can be verified with the below command.

    bcdedit /dbgsettings


    3. Use the below command if you need to change the debug settings, where x is the number of the COM port connected to the null modem cable on the target machine and rate is the baud rate desired for debugging. The rate is usually 115200.

    bcdedit /dbgsettings serial debugport:x baudrate:115200


    5. Reboot the target computer.


    Host Computer setup

    1. First install the Windows Debugging Tools on the host computer. Navigate to the Windows Software Development Kit (SDK) currently located at this link http://msdn.microsoft.com/en-US/windows/hardware/hh852363 and choose the download option.

          a. If you are not able to install the SDK on the host computer, the debugging tools for windows can be installed to a different system and the debugger directory can be copied to the host computer.


    2. Click Next until you see the Select the features you want to install screen.


    3. Select only the option named Debugging Tools for Windows and click the Install button. I typically install the tools to a directory named C:\debugger




    4. After the Windows Debugging Tools are installed I set my symbol path by setting the environment variable _NT_SYMBOL_PATH. I recommend setting it to the public symbol server SRV*c:\localsymbols*http://msdl.microsoft.com/download/symbols. If you prefer, you can specify any path in place of ‘c:\localsymbols’.


    5. Open the debugger by running windbg.exe from the c:\debugger folder.


    6. On the File menu, choose Kernel Debug.


    7. In the Kernel Debugging dialog box, open the COM tab.


    8. In the Baud rate box, enter the same rate you selected for the Target Machine in the steps above. This is usually 115200.


    9. In the Port box, enter COMx where x is the COM port connected to the null modem cable on this computer. In my example I plugged my null modem cable to com port 1 so I typed com1 in the field.

          a. It is not necessary to use the same port number on both the target and the host.  For example, it is possible to use com1 on the target and com2 on the host.




    10. Click OK and you’ll receive a message indicating the Host computer is waiting to connect.



    11. From the Debug menu select Break will cause the debugger to break into the target machine and give you the opportunity to debug your ailing production server. Good Luck!!

  • Ntdebugging Blog

    Case of the Unexplained Services exe Termination


    Hello Debuggers! This is Ron Stock from the Global Escalation Services team and I recently worked an interesting case dispatched to our team because Services.exe was terminating. Nothing good ever happens when Services.exe exits. In this particular case, client RDP sessions were forcibly disconnected from the server and the server machine was shutting down unexpectedly. This is the message encountered at the console on the server.




    The customer was able to trigger the crash by changing anything to do with a particular non-Microsoft service in the Services MMC (e.g. changing the Startup Type, changing the dependencies, stopping the service etc...) To protect our vendor friend, I refer to this service as FriendlyService throughout this article. We could have stopped the investigation right here and implicated the FriendlyService vendor, however as you will see, this service was merely a victim.



    When investigating a process termination, I routinely gather a process dump to start my investigation. The frustrating thing about this instance was how none of our debugging tools were generating a dump file when Services.exe terminated. I tried the usual toolset including AdPlus, ProcDump and DebugDiag to no avail. Despite the lack of data from a memory dump, I was still able to piece together a stack and attack the problem through a creative approach. Debugging is an art.


    First I reviewed the application log which was loaded with entries like this one.


    11/29/2012 04:29:05 PM  Information   HE2NTSP208       1004    Application Error  N/A Faulting application services.exe, version 5.2.3790.4455, faulting module msvcr80.dll, version 8.0.50727.6195, fault address 0x000000000001df67


    From the event log entry above I was able to determine the faulting module, msvcr80.dll, and the relative offset, 0x000000000001df67. This address is basically the return address of the faulting call. As you can see from the ln output below, the function name was msvcr80!wcscpy_s. As I noted above, I wasn’t able to gather a dump during process termination so I had the customer use ProcDump to snap a dump of Services.exe during normal process operation (not a crash dump).


    0:000> ln 00000000`78130000 + 0x000000000001df67
    (00000000`7814ded0)   msvcr80!wcscpy_s+0x97   |  (00000000`7814df80)   msvcr80!wcsncpy_s


    Next I needed to determine which DLLs in the Services.exe process were calling msvcr80!wcscpy_s by reviewing the import tables of the binaries loading in the process. I used the !peb command to dump out the Process Environment Block (PEB). This gave me the list of base addresses for each loaded DLL. I focused mainly on non-Microsoft binaries. To protect our vendor friends, I renamed the DLL to ThirdPartyServiceMonitor.Dll in this article.


    This is the output from the !peb command with the base addresses in the left column.


    0:000> !peb

    PEB at 000007fffffde000

        InheritedAddressSpace:    No

        ReadImageFileExecOptions: No

        BeingDebugged:            Yes

        ImageBaseAddress:         0000000100000000

        Ldr                       0000000077fa9f20

        Ldr.Initialized:          Yes

        Ldr.InInitializationOrderModuleList: 00000000000d2df0 . 00000000001a3600

        Ldr.InLoadOrderModuleList:           00000000000d2d20 . 000000000019cbc0

        Ldr.InMemoryOrderModuleList:         00000000000d2d30 . 000000000019cbd0

                Base TimeStamp                     Module

           100000000 49882047 Feb 03 04:45:27 2009 C:\WINDOWS\system32\services.exe

            77ec0000 4ecbcd57 Nov 22 10:27:03 2011 C:\WINDOWS\system32\ntdll.dll

            77d40000 49c51cdd Mar 21 11:59:09 2009 C:\WINDOWS\system32\kernel32.dll

         7ff7fc00000 45d6ccae Feb 17 03:36:46 2007 C:\WINDOWS\system32\msvcrt.dll

         7ff7fee0000 4a61f064 Jul 18 10:55:16 2009 C:\WINDOWS\system32\ADVAPI32.dll

         7ff7fd30000 4c6ba77a Aug 18 04:27:22 2010 C:\WINDOWS\system32\RPCRT4.dll

         7ff7e9c0000 4a37438e Jun 16 02:02:38 2009 C:\WINDOWS\system32\Secur32.dll

            77c20000 45e7c5c2 Mar 02 00:35:46 2007 C:\WINDOWS\system32\USER32.dll

         7ff7fc90000 490062ac Oct 23 06:40:28 2008 C:\WINDOWS\system32\GDI32.dll

         7ff7c680000 45d6ccab Feb 17 03:36:43 2007 C:\WINDOWS\system32\USERENV.dll

         7ff7c450000 45d6cc90 Feb 17 03:36:16 2007 C:\WINDOWS\system32\SCESRV.dll

         7ff7e490000 45d6cc04 Feb 17 03:33:56 2007 C:\WINDOWS\system32\AUTHZ.dll

         7ff77370000 4fedd464 Jun 29 11:14:28 2012 C:\WINDOWS\system32\NETAPI32.dll

         7ff7c410000 45d6cca5 Feb 17 03:36:37 2007 C:\WINDOWS\system32\umpnpmgr.dll

         7ff7d4d0000 45d6ccb8 Feb 17 03:36:56 2007 C:\WINDOWS\system32\WINSTA.dll

         7ff65470000 45d6cc5c Feb 17 03:35:24 2007 E:\Program Files\ThirdPartyDirectory\ThirdParty2.dll

              400000 424360e9 Mar 24 19:52:57 2005 C:\WINDOWS\system32\msvcp60.dll

         7ff7d500000 45d6cc3b Feb 17 03:34:51 2007 C:\WINDOWS\system32\IMM32.DLL

              e50000 4df9462e Jun 15 18:54:22 2011 E:\Program Files\ThirdPartyDirectory\ThirdPartyServiceMonitor.dll


    Using the base address of ThirdPartyServiceMonitor, I dumped the header to find the Import Address Table Directory.


    0:000> !dh 00000000`00e50000


    1E000 [     758] address [size] of Import Address Table Directory


    Using the dps command I dumped all of the functions in the import table of ThirdPartyServiceMonitor.dll. I found msvcr80!wcscpy_s in the function list. This indicates ThirdPartyServiceMonitor.dll makes calls to the msvcr80!wcscpy_s.


    0:000> dps 00000000`00e50000+ 1E000 l758/@$ptrsize




    00000000`00e6e458  00000000`7814d890 msvcr80!_wcsnicmp

    00000000`00e6e460  00000000`7814db20 msvcr80!_wcsicmp

    00000000`00e6e498  00000000`7814ded0 msvcr80!wcscpy_s




    Since this was the only third party DLL with msvcr80!wcscpy_s in its import table, I was able to continue piecing together my stack. ThirdPartyServiceMonitor.dll was calling msvcr80!wcscpy_s and causing Services.exe to crash. At this point in the investigation, the stack looks like this-





    In my quest to continue building my “conceptual” stack without a crash dump file, I reviewed the status code from the “System Shutdown” dialog displayed when Services.exe terminated. Notice the status code -1073741811 in the error. What the heck does that mean?



    Well I easily resolved the cryptic status code by passing 0n-1073741811 to the !error command in the debugger. The “0n” prefix indicates the value should be interpreted as decimal rather than hex by the debugger. I also included the negative symbol “-“ because this also appears in status code.


    0:030> !error 0n-1073741811

    Error code: (NTSTATUS) 0xc000000d (3221225485) - An invalid parameter was passed to a service or function.


    Armed with the status code information, I reviewed the msvcr80!wcscpy_s assembly code to determine if this status code was returned at any point. In the assembly I found a call to a function named msvcr80!_invalid_parameter. In context of the message, “an invalid parameter was passed to a service or function”, this certainly sounds like the code path taken.


    0:000> uf msvcr80!wcscpy_s



       18 00000000`7814deea e88184feff      call    msvcr80!_errno(00000000`78136370)

       18 00000000`7814deef 4533c9          xor     r9d,r9d

       18 00000000`7814def2 4533c0          xor     r8d,r8d

       18 00000000`7814def5 33d2            xor     edx,edx

       18 00000000`7814def7 33c9            xor     ecx,ecx

       18 00000000`7814def9 48c744242000000000 mov   qword ptr [rsp+20h],0

       18 00000000`7814df02 c70016000000    mov     dword ptr[rax],16h

       18 00000000`7814df08 e873d1feff      call    msvcr80!_invalid_parameter (00000000`7813b080)

       18 00000000`7814df0d b816000000      mov     eax,16h

       34 00000000`7814df12 4883c438        add     rsp,38h

       34 00000000`7814df16 c3              ret


    It was reasonable to add this call to my conceptual stack because the call tree makes sense.






    Because I’m a curious kind of guy, I unassembled msvcr80!_invalid_parameter to peel back another layer of the onion. To my surprise I found a call to msvcr80!_imp_TerminateProcess. BOOM! This explains why the debugger wasn’t catching the process crash. The process was terminating ‘organically’ through a TerminateProcess call rather than crashing due to an exception, however it was unexpectedly terminating. In other words, all of the services running on the machine were not expecting Services.exe to terminate.




       88 00000000`7813b155 ff156d200900    call    qword ptr [msvcr80!_imp_GetCurrentProcess (00000000`781cd1c8)]

       88 00000000`7813b15b ba0d0000c0      mov     edx,0C000000Dh

       88 00000000`7813b160 488bc8          mov     rcx,rax

       88 00000000`7813b163 ff1557200900    call    qword ptr [msvcr80!_imp_TerminateProcess(00000000`781cd1c0)]


    Now I was able to cobble together a fairly accurate stack without a dump file. At this point I could tell ThirdPartyServiceMonitor.dll was passing bad parameters to msvcr80!wcscpy_s. However, this didn’t explain how FriendlyService (mentioned in the blog introduction) was triggering the issue. I needed to go deeper with a live debug by leveraging the stack information I devised.







    On the customer’s server I attached Windbg to the Services.exe process and set a breakpoint on msvcr80!_invalid_parameter.


    0:001> bp msvcr80!_invalid_parameter


    Then I had the customer reproduce the issue by changing the startup type on FriendlyService in the Services MMC. As I mentioned above, this was one way to trigger the issue. BOOM! My breakpoint hit and looked exactly like the conceptual stack I pieced together. Now I was able to determine what ThirdPartyServiceMonitor was passing to msvcr80!wcscpy_s while broken in with the debugger.


    0:035> k

    Child-SP          RetAddr           Call Site

    00000000`105deb00 00000000`7814df67 msvcr80!_invalid_parameter+0xe3

    00000000`105df0c0 00000000`00e53045 msvcr80!wcscpy_s+0x97

    00000000`105df100 00000000`00e5947e ThirdPartyServiceMonitor+0x3045

    00000000`105df130 00000000`00e58405 ThirdPartyServiceMonitor+0x947e

    00000000`105df180 000007ff`7fd69c75 ThirdPartyServiceMonitor+0x8405

    00000000`105df1b0 000007ff`7fe9ccc9 rpcrt4!Invoke+0x65

    00000000`105df200 000007ff`7fe9d58d rpcrt4!NdrStubCall2+0x54d


    I reviewed the assembly of ThirdPartyServiceMonitor at the point in which it calls msvcr80!wcscpy_s. I discovered that the vendor hardcoded the size of the destination string buffer with 200 hex (512 decimal) while the size of the source string buffer was greater than 512 decimal.


    0:035> ub 00000000`00e53045


    00000000`00e53020 898324040000    mov     dword ptr [rbx+424h],eax

    00000000`00e53026 488b4738        mov     rax,qword ptr [rdi+38h]

    00000000`00e5302a 4c8b4010        mov     r8,qword ptr [rax+10h] <<<< Source Buffer

    00000000`00e5302e 4d85c0          test    r8,r8

    00000000`00e53031 7412            je      ThirdPartyServiceMonitor+0x3045 (00000000`00e53045)

    00000000`00e53033 488d8b28040000  lea     rcx,[rbx+428h] <<<< Destination Buffer

    00000000`00e5303a ba00020000      mov     edx,200h <<<<< Buffer Size hardcoded to 200 hex

    00000000`00e5303f ff1553b40100    call    qword ptr [ThirdPartyServiceMonitor!PlugControl+0xff38 (00000000`00e6e498)] <<<< call to msvcr80!wcscpy_s

    00000000`00e53045 eb16            jmp     ThirdPartyServiceMonitor +0x305d (00000000`00e5305d) <<<< return from msvcr80!wcscpy_s


    0:035> .formats 200

    Evaluate expression:

      Hex:     00000000`00000200

      Decimal: 512


    Here is the definition of wcscpy_s from MSDN

    errno_t wcscpy_s(

       wchar_t *strDestination,         // Location of destination string buffer

       size_t numberOfElements,         // Size of the destination string buffer.

       const wchar_t *strSource         // Null-terminated source string buffer.



    Here are the parameters passed to msvcr80!wcscpy_s. As a reminder, the x64 compiler uses rcx to pass the first parameter, rdx for the second, and r8 for the third. In this case the buffer size was the second parameter in rdx and the source buffer was in r8, the third parameter.


    0:035> dq 00000000`00e6e498 l1

    00000000`00e6e498  00000000`7814ded0


    0:035> ln 00000000`7814ded0

    (00000000`7814ded0)   msvcr80!wcscpy_s   |  (00000000`7814df80)   msvcr80!wcsncpy_s

    Exact matches:



    Dumping the source string in r8 showed the string was clearly longer than 512 characters.


    0:035> dc 00000000`0103b438

    00000000`0103b438  003a0000 0049005c 0074006e 00670065  ..:.\.F.r.e.i.n.

    00000000`0103b448  00610072 00690074 006e006f 00670041  d.l.y.S.e.r.v.i.

    00000000`0103b458  006e0065 005c0074 0069006c 005c0062  c.e.\.l.i.b.\..



    0:035> dc

    00000000`0103b738  00650070 002e0072 00700061 002e0070  p.e.r...a.p.p...

    00000000`0103b748  00610070 00610072 0065006d 00650074  p.a.r.a.m.e.t.e.

    00000000`0103b758  002e0072 003d0031 00770020 00610072  r...1.=. .w.r.a.

    00000000`0103b768  00700070 00720065 0061002e 00700070  p.p.e.r...a.p.p.

    00000000`0103b778  0070002e 00720061 006d0061 00740065  ..p.a.r.a.m.e.t.

    00000000`0103b788  00720065 0032002e 0020003d 00720077  e.r...2.=. .w.r.

    00000000`0103b798  00700061 00650070 002e0072 00700061  a.p.p.e.r...a.p.

    00000000`0103b7a8  002e0070 00610070 00610072 0065006d  p...p.a.r.a.m.e.

    0:035> dc

    00000000`0103b7b8  00650074 002e0072 003d0033 00770020  t.e.r...3.=. .w.

    00000000`0103b7c8  00610072 00700070 00720065 0061002e  r.a.p.p.e.r...a.

    00000000`0103b7d8  00700070 0070002e 00720061 006d0061  p.p...p.a.r.a.m.

    00000000`0103b7e8  00740065 00720065 0034002e 0020003d  e.t.e.r...4.=. .

    00000000`0103b7f8  00720077 00700061 00650070 002e0072  w.r.a.p.p.e.r...

    00000000`0103b808  00700061 002e0070 00610070 00610072  a.p.p...p.a.r.a.

    00000000`0103b818  0065006d 00650074 002e0072 003d0035  m.e.t.e.r...5.=.

    00000000`0103b828  00770020 00610072 00700070 00720065  .w.r.a.p.p.e.r.


    It was not hard to figure out this string was the imagefile path located at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\FriendlyService. You can see from the string that service takes parameters which increased the size of the image path beyond 512 characters.


    FriendlyService\lib\friendly.3\sbin\friendlywindows-x86-32.exe-s E:\ FriendlyService\conf\wrapper.conf set.DATA_APP=apia "set.DATA_APP_LONG= FriendlyServiceVendor" set.DATA_EXE=E:\ FriendlyService\bin\..\lib\friendly.3\bin set.DATA_HOME=E:\ FriendlyService\bin\..\lib\friendly.3 set.INSTALL_DIR=E:\ FriendlyService\bin\.. wrapper.working.dir=E:\ FriendlyService\bin\.. wrapper.app.parameter.1= wrapper.app.parameter.2= wrapper.app.parameter.3= wrapper.app.parameter.4= wrapper.app.parameter.5= wrapper.app.parameter.6= wrapper.app.parameter.7= wrapper.app.parameter.8= wrapper.app.parameter.9=


    The ThirdPartyServiceMonitor hooks any changes made to a service (e.g. changing the Startup Type, changing the dependencies, stopping the service etc...). After hooking the change, ThirdPartyServiceMonitor.dll performs a string copy of the service’s image file path. In most cases this works like a champ, however in this instance, the FriendlyService image path from vendor 1 is really long and the ThirdPartyServiceMonitor from vendor 2 doesn’t account for service image paths exceeding 512 characters. This is the perfect storm! ThirdPartyServiceMonitor needs to remove the hardcoded buffer size.


    This example occurred on Windows Server 2003.  Starting in Windows 7 and Windows Server 2008 R2, Windows has added functionality for catching processes that exit ‘organically’ through a TerminateProcess call.  You can find information on this added functionality on MSDN.

  • Ntdebugging Blog

    Use Caution When Implementing IPC for Performance Counters



    Recently I was working with a developer who had created performance counters that work in Performance Monitor but are never collected in a user defined data collector set.  The customer explained that their counters update named shared memory inside the application which should be read by perfmon or the data collector set.


    Putting counter data in shared memory is a common technique for performance counter developers.  A programmer can update performance data in a block of shared memory in their application and then use a performance extension dll (aka an “extensible counter”) to read from the shared memory.


    Shared memory is created by calling CreateFileMapping and MapViewOfFile.  This memory is then accessed by another application by calling OpenFileMapping.  All applications which use this shared memory must pass the same lpName to CreateFileMapping or OpenFileMapping.  An example of using these APIs to implement shared memory is available on MSDN.


    Based on the customer’s explanation that they are populating shared memory in their application, and their counters work in Performance Monitor but do not work in a user defined data collector set, I suspected that OpenFileMapping was failing for the data collector set.


    User defined data collector sets run in a rundll32.exe process.  If you have multiple rundll32.exe processes you may need to identify which one is related to your data collector set.  The relevant process has a command line similar to “rundll32.exe C:\Windows\system32\pla.dll,PlaHost”.  There are several tools that can be used to identify the command line of the process such as tlist.exe, which is included with the Debugging Tools for Windows.


    After attaching a debugger to rundll32.exe, I wanted to break on the ret instruction at the end of the OpenFileMappingW function.  This would allow me to determine if the function succeeds or fails.  According to MSDN “If the function fails, the return value is NULL. To get extended error information, call GetLastError.”


    The uf command is an easy way to unassemble a function and find the ret instruction to break on.


    0:001> uf kernelbase!OpenFileMappingW


    75b88e0d 8bff            mov     edi,edi



    75b88e79 c9              leave

    75b88e7a c20c00          ret     0Ch

    0:001> bp 75b88e7a

    0:001> g

    Breakpoint 0 hit

    eax=00000000 ebx=00008022 ecx=7ffd8000 edx=00000002 esi=05abf03c edi=00000000

    eip=75b88e7a esp=05abeb20 ebp=05abeb3c iopl=0         nv up ei pl zr na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246


    75b88e7a c20c00          ret     0Ch


    In the above output we can see that the eax register is NULL, indicating that the call to OpenFileMapping failed.  The !gle command will show the last error and last status.


    0:008> !gle

    LastErrorValue: (Win32) 0x2 (2) - The system cannot find the file specified.

    LastStatusValue: (NTSTATUS) 0xc0000034 - Object Name not found.


    The failure is that OpenFileMapping cannot find the file.  The file name is the third parameter to OpenFileMapping.  We can get the first three parameters from the kb command.


    0:008> kb

    ChildEBP RetAddr  Args to Child             

    05abf0d0 6abae355 0002001f 00000000 05abeb7c kernelbase!OpenFileMappingW+0x90

    WARNING: Stack unwind information not available. Following frames may be wrong.

    05abf0f8 7784fe67 02a7ae90 05abf224 05abf254 ninjaprf+0x10edb

    05abf110 7784fc97 00472158 02a7ae90 05abf224 advapi32!CallExtObj+0x17

    05abf270 7784efaf 05abf2bc 60fcfa02 05abf778 advapi32!QueryExtensibleData+0x735

    05abf654 75ff0468 80000004 05abf778 00000000 advapi32!PerfRegQueryValue+0x5da

    05abf748 75ffd505 80000004 05abf778 05abf790 kernel32!LocalBaseRegQueryValue+0x366

    05abf7b4 61247dc5 80000004 02a7ae90 00000000 kernel32!RegQueryValueExW+0xb7

    05abf830 61250595 80000004 02a7ae58 02a7ae90 pdh!GetSystemPerfData+0x92

    05abf89c 6124c753 02a407d0 05abf8e8 61241928 pdh!GetQueryPerfData+0xa4

    05abf8b8 61254463 02a407d0 05abf8e8 60fcf32f pdh!PdhiCollectQueryData+0x32

    05abf90c 611c6d04 02a58f08 00000000 75ffc3e0 pdh!PdhUpdateLogW+0xa2

    05abf9bc 611be128 0045c968 00000000 00000000 pla!HPerformanceCounterDataCollector::Cycle+0x48

    05abf9bc 00000000 0045c968 00000000 00000000 pla!PlaiCollectorControl+0x3b7

    0:008> da 05abeb7c

    05abeb7c  "Local\NINJAPERF_S-1-5-18"


    The user defined data collector set is failing to open the file "Local\NINJAPERF_S-1-5-18".  This is the name that the performance extension dll ninjaprf.dll has given to its shared memory.


    Based on the customer’s description this operation works in Performance Monitor.  Next, I attached a debugger to perfmon and set the same breakpoint.


    Breakpoint 0 hit

    eax=000009f8 ebx=00008022 ecx=a7330000 edx=080ee678 esi=06798070 edi=00000000

    eip=760be9bb esp=0a84e564 ebp=0a84e580 iopl=0         nv up ei pl zr na pe nc

    cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246


    760be9bb c20c00          ret     0Ch


    In the above output we can see that eax is a handle number, indicating that the function succeeded.


    Looking at the file being opened we can see why this works in one scenario and not in another.


    0:016> kb 1

    ChildEBP RetAddr  Args to Child             

    0a84e580 698e4ab9 0002001f 00000000 0a84e5c0 kernelbase!OpenFileMappingW+0x90

    0:016> da 0a84e5c0

    0a84e5c0  "Local\NINJAPERF_S-1-5-21-123578"

    0a84e5e0  "095-571698237-1598563147-18961"


    The file name used is unique for each user.  The ninjaprf dll has chosen a file name which includes the SID of the current user.  This works for Performance Monitor because the user who starts the application is the same as the user who runs Performance Monitor.  However, a user defined data collector set runs in rundll32.exe as the Local System account.


    It is possible to run the user defined data collector set as a different user, however the file name being used will not work in that scenario either.  By using the “Local\” prefix, the file is created in the local session namespace.  The application runs in the user’s session, while rundll32.exe is started by the Task Scheduler service and runs in session 0.  This prevents the user defined data collector set from seeing the file created by the application.


    If your performance counter uses shared memory to communicate, be aware that your performance extension dll may be run in the context of a different user account and a different user session.  Your inter process communication techniques must account for this, or your customers will ask you why your counters do not work outside of Performance Monitor.

  • Ntdebugging Blog

    Determining the source of Bug Check 0x133 (DPC_WATCHDOG_VIOLATION) errors on Windows Server 2012


    What is a bug check 0x133?

    Starting in Windows Server 2012, a DPC watchdog timer is enabled which will bug check a system if too much time is spent in DPC routines. This bug check was added to help identify drivers that are deadlocked or misbehaving.  The bug check is of type "DPC_WATCHDOG_VIOLATION" and has a code of 0x133.  (Windows 7 also included a DPC watchdog but by default, it only took action when a kernel debugger was attached to the system.)  A description of DPC routines can be found at http://msdn.microsoft.com/en-us/library/windows/hardware/ff544084(v=vs.85).aspx.


    The DPC_WATCHDOG_VIOLATION bug check can be triggered in two ways. First, if a single DPC exceeds a specified number of ticks, the system will stop with 0x133 with parameter 1 of the bug check set to 0.  In this case, the system's time limit for single DPC will be in parameter 3, with the number of ticks taken by this DPC in parameter 2.  Alternatively, if the system exceeds a larger timeout of time spent cumulatively in all DPCs since the IRQL was raised to DPC level, the system will stop with a 0x133 with parameter 1 set to 1.  Microsoft recommends that DPCs should not run longer than 100 microseconds and ISRs should not run longer than 25 microseconds, however the actual timeout values on the system are set much higher.


    How to debug a 0x133 (0, …

    In the case of a stop 0x133 with the first parameter set to 0, the call stack should contain the offending driver.  For example, here is a debug of a 0x133 (0,…) kernel dump:


    0: kd> .bugcheck

    Bugcheck code 00000133

    Arguments 00000000`00000000 00000000`00000283 00000000`00000282 00000000`00000000  


    Per MSDN, we know that this DPC has run for 0x283 ticks, when the limit was 0x282.


    0: kd> k

    Child-SP          RetAddr           Call Site

    fffff803`08c18428 fffff803`098525df nt!KeBugCheckEx

    fffff803`08c18430 fffff803`09723f11 nt! ??::FNODOBFM::`string'+0x13ba4

    fffff803`08c184b0 fffff803`09724d98 nt!KeUpdateRunTime+0x51

    fffff803`08c184e0 fffff803`09634eba nt!KeUpdateTime+0x3f9

    fffff803`08c186d0 fffff803`096f24ae hal!HalpTimerClockInterrupt+0x86

    fffff803`08c18700 fffff803`0963dba2 nt!KiInterruptDispatchLBControl+0x1ce

    fffff803`08c18898 fffff803`096300d0 hal!HalpTscQueryCounter+0x2

    fffff803`08c188a0 fffff880`04be3409 hal!HalpTimerStallExecutionProcessor+0x131

    fffff803`08c18930 fffff880`011202ee ECHO!EchoEvtTimerFunc+0x7d                //Here is our driver, and we can see it calls into StallExecutionProcessor

    fffff803`08c18960 fffff803`097258b4 Wdf01000!FxTimer::TimerHandler+0x92

    fffff803`08c189a0 fffff803`09725ed5 nt!KiProcessExpiredTimerList+0x214

    fffff803`08c18ae0 fffff803`09725d88 nt!KiExpireTimerTable+0xa9

    fffff803`08c18b80 fffff803`0971fe76 nt!KiTimerExpiration+0xc8

    fffff803`08c18c30 fffff803`0972457a nt!KiRetireDpcList+0x1f6

    fffff803`08c18da0 00000000`00000000 nt!KiIdleLoop+0x5a


    Let’s view the driver’s unassembled DPC routine and see what it is doing


    0: kd> ub fffff880`04be3409


    fffff880`04be33e0 448b4320        mov     r8d,dword ptr[rbx+20h]

    fffff880`04be33e4 488b0d6d2a0000  mov     rcx,qword ptr [ECHO!WdfDriverGlobals (fffff880`04be5e58)]

    fffff880`04be33eb 4883631800      and     qword ptr [rbx+18h],0

    fffff880`04be33f0 488bd7          mov     rdx,rdi

    fffff880`04be33f3 ff150f260000    call    qword ptr [ECHO!WdfFunctions+0x838(fffff880`04be5a08)]

    fffff880`04be33f9 bbc0d40100      mov     ebx,1D4C0h

    fffff880`04be33fe b964000000      mov     ecx,64h

    fffff880`04be3403 ff15f70b0000    call    qword ptr[ECHO!_imp_KeStallExecutionProcessor (fffff880`04be4000)]   //Its Calling KeStallExecutionProcessor with 0x64 (decimal 100) as a parameter

    0: kd> u fffff880`04be3409


    fffff880`04be3409 4883eb01        sub     rbx,1

    fffff880`04be340d 75ef            jne     ECHO!EchoEvtTimerFunc+0x72 (fffff880`04be33fe)     //Here we can see it is jumping back to call KeStallExecutionProcessor in a loop

    fffff880`04be340f 488b5c2430      mov     rbx,qword ptr[rsp+30h]

    fffff880`04be3414 4883c420        add     rsp,20h

    fffff880`04be3418 5f              pop     rdi

    fffff880`04be3419 c3              ret

    fffff880`04be341a cc              int     3

    fffff880`04be341b cc              int     3


    0: kd> !pcr

    KPCR for Processor 0 at fffff80309974000:

        Major 1 Minor 1

          NtTib.ExceptionList: fffff80308c11000

              NtTib.StackBase: fffff80308c12080

             NtTib.StackLimit: 000000d70c7bf988

           NtTib.SubSystemTib: fffff80309974000

                NtTib.Version: 0000000009974180

            NtTib.UserPointer: fffff803099747f0

                NtTib.SelfTib: 000007f7ab80c000


                      SelfPcr: 0000000000000000

                         Prcb: fffff80309974180

                         Irql: 0000000000000000

                          IRR: 0000000000000000

                          IDR: 0000000000000000

                InterruptMode: 0000000000000000

                          IDT: 0000000000000000

                          GDT: 0000000000000000

                          TSS: 0000000000000000


                CurrentThread: fffff803099ce880

                   NextThread: fffffa800261cb00

                   IdleThread: fffff803099ce880


                    DpcQueue:  0xfffffa80020ce790 0xfffff880012e4e9c [Normal] NDIS!NdisReturnNetBufferLists

                               0xfffffa800185f118 0xfffff88000c0ca00 [Normal] ataport!AtaPortInitialize

                               0xfffff8030994fda0 0xfffff8030972bc30 [Normal] nt!KiBalanceSetManagerDeferredRoutine

                               0xfffffa8001dbc118 0xfffff88000c0ca00 [Normal] ataport!AtaPortInitialize

                               0xfffffa8002082300 0xfffff88001701df0 [Normal] USBPORT


    The !pcr output shows us queued DPCs for this processor. If you want to see more information about DPCs and the DPC Watchdog, you could dump the PRCB listed in the !pcr output like this:


    dt nt!_KPRCB fffff80309974180 Dpc*


    Often the driver will be calling into a function like KeStallExecutionProcessor in a loop, as in our example debug.  To resolve this problem, contact the driver vendor to request an updated driver version that spends less time in its DPC Routine.


    How to troubleshoot a 0x133 (1, …

    Determining the cause of a stop 0x133 with a first parameter of 1 is a bit more difficult because the problem is a result of DPCs running from multiple drivers, so the call stack is insufficient to determine the culprit.  To troubleshoot this stop, first make sure that the NT Kernel Logger or Circular Kernel Context Logger ETW traces are enabled on the system.  (For directions on setting this up, see http://blogs.msdn.com/b/ntdebugging/archive/2009/12/11/test.aspx.)


    Once the logging is enabled and the system bug checks, dump out the list of ETW loggers using !wmitrace.strdump. Find the ID of the NT Kernel logger or the Circular logger.  You can then use !wmitrace.logsave (ID) (path to ETL) to write out the ETL log to a file.  Load it up with Windows Performance Analyzer and add the DPC or DPC/ISR Duration by Module, Function view (located in the Computation group) to your current analysis window:



    Next, make sure the table is also shown by clicking the box in the upper right of the view:



    Ensure that the Address column is added on the left of the gold bar, then expand each address entry to see individual DPC enters/exits for each function.  Using this data, you can determine which DPC routines took the longest by looking at the inclusive duration column, which should be added to the right of the gold bar: 


    In this case, these DPCs took 1 second, which is well over the recommended maximum of 100 us.  The module column (and possible the function column, if you have symbols) will show which driver is responsible for that DPC routine.  Since our ECHO driver was based on WDF, that is the module named here.


    For an example of doing this type of analysis in xperf, see http://blogs.msdn.com/b/ntdebugging/archive/2008/04/03/windows-performance-toolkit-xperf.aspx.


    More Information

    For additional information about Stop 0x133 errors, see this page on MSDN: http://msdn.microsoft.com/en-us/library/windows/hardware/jj154556(v=vs.85).aspx.


    For DPC timing recommendations and for advice on capturing DPC timing information using tracelog, see http://msdn.microsoft.com/en-us/library/windows/hardware/ff545764(v=vs.85).aspx.


    Guidelines for writing DPC routines can be found at http://msdn.microsoft.com/en-us/library/windows/hardware/ff546551(v=vs.85).aspx.



    -Matt Burrough

  • Ntdebugging Blog

    Troubleshooting Pool Leaks Part 7 – Windows Performance Toolkit


    In Part 1 of this series we identified a pool leak in non paged pool.  In Part 2 and Part 3 of this series we identified what pool tag was leaking.  In Part 5 and Part 6 we got call stacks showing the memory being allocated.  In this article we are going to discuss a tool that combines this information into one piece of data.


    Starting with Windows 7 and Windows Server 2008 R2, Windows has new functionality to track pool allocations and frees using the Windows Performance Toolkit, commonly referred to as xperf.  For this example we will be using the WPT from the Windows 8 ADK.  When installing the ADK select only the Windows Performance Toolkit option to minimize download time.


    ADK Install


    Before collecting pool usage data on a 64-bit system, you must disable the paging of data such as drivers and call stacks.  The first time you run the Windows Performance Recorder UI and click the Start button you will be prompted that Disable Paging Executive is not set.  If you click OK to this dialog WPR will set DisablePagingExecutive and ask that you reboot.  To set this ahead of time run the following command from an elevated command prompt, and reboot afterwards:


    wpr –disablepagingexecutive on


    There are three methods to collect data from a pool leak using the Windows Performance Toolkit.  The WPR UI, WPR command line, and xperf command line each provide different methods to collect this data.


    Windows Performance Recorder UI:

    The easiest way to record a trace is with the Windows Performance Recorder, which presents a GUI to make it easy for a user to record a trace.  To collect data regarding a pool leak simply check the “Pool usage” checkbox and click the Start button.  Reproduce the leak and after a few minutes click the Save button.  Use Cancel to stop the trace after your log is saved.  Note that by default the WPR will use a circular log in memory, if you record for a long period of time the log will wrap and data will be lost.  A sequential file log will be captured by selecting File as the “Logging mode”, however these logs can become very large in a short period of time on a busy system so it is not recommended to leave the log running for longer than a few minutes.


    WPR Pool usage


    Because these traces can quickly become large, it can be helpful to trace just one pool tag.  This can be done in WPR using a custom WPR recording profile.  Below is a sample profile that collects pool usage information only for the tag “Leak”, which we identified as the leaking tag in previous articles (Note that you should change the pool tag Leak to the tag you are investigating).  Save this text in a file ending in .wprp (ie PoolTagLeak.wprp) and load it in WPR using the Add Profiles button.  Check the “Pool usage Tag ‘Leak’” option under Custom measurements.  Use the Start button to begin collecting data, reproduce the leak, and use the Save button to save the log.  After you have collected the log click Cancel to stop collecting data.


    <?xml version="1.0" encoding="utf-8"?>

    <WindowsPerformanceRecorder Version="1.0" Author="NTDebugging Blog" Copyright="Microsoft Corporation" Company="Microsoft Corporation">


        <SystemCollector Id="SystemCollector" Name="NT Kernel Logger">

          <BufferSize Value="1024" />

          <Buffers Value="1024" />


        <SystemProvider Id="SystemProvider">


            <Keyword Value="Pool" />



            <Stack Value="PoolAllocation" />

            <Stack Value="PoolAllocationSession" />



            <PoolTag Value="Leak"/>



        <Profile Id="PoolUsage.Verbose.File" Name="PoolUsage" Description="Pool usage Tag 'Leak' " LoggingMode="File" DetailLevel="Verbose">


            <SystemCollectorId Value="SystemCollector">

              <SystemProviderId Value="SystemProvider" />




        <Profile Id="PoolUsage.Verbose.Memory" Name="PoolUsage" Description="Pool usage Tag 'Leak'" Base="PoolUsage.Verbose.File" LoggingMode="Memory" DetailLevel="Verbose" />




    WPR Custom Profile


    Windows Performance Recorder command line:

    WPR can also be run from a command line if you need to script its operation, or if you prefer typing text over clicking buttons.  By default WPR will be installed at C:\Program Files (x86)\Windows Kits\8.0\Windows Performance Toolkit\wpr.exe.


    To start a trace, run the following from a command line:


    wpr -start GeneralProfile -start Pool


    To save the trace run:


    wpr –stop pool.etl “pool leak”


    Cancel the trace with:


    wpr -cancel


    Optionally, you can use the custom profile defined earlier in this article to trace just one pool tag.  This command assumes PoolTagLeak.wprp is in the same folder as wpr.exe, use the full path if your custom profile is located elsewhere.  Save and cancel the trace using the above steps.


    wpr -start PoolTagLeak.wprp


    Xperf command line:

    The third way to enable this tracing is with xperf.  This is an older tool which has been replaced by WPR, however xperf provides a circular logging functionality that isn’t available in WPR.  Circular logging can be useful if you need to run xperf over a longer period of time.  The below command enables tracing with xperf for the pool tag Leak using a circular buffer.  Note that you should change the pool tag Leak to the tag you are investigating.


    xperf -on Base+CSwitch+POOL -stackwalk PoolAlloc+PoolAllocSession –PoolTag Leak -BufferSize 1024 -MaxBuffers 1024 -MaxFile 1024 –FileMode Circular


    To save and cancel the xperf trace in one command:


    xperf -d pool.etl


    Analyzing data with Windows Performance Analyzer:

    After you have collected a trace using the method that works best for your scenario, open the etl file in the Windows Performance Analyzer.  The below output is from a trace collected with WPR.


    WPA Graph Explorer


    Pool analysis will require symbols.  Configure the symbol path using the option in the Trace menu.  Often the symbol path will be pre-populated, if it is not use srv*c:\symbols*http://msdl.microsoft.com/download/symbols.  Click the Load Symbols option from the Trace menu, and be patient while WPA downloads symbols from the symbol server.


    In the Graph Explorer click the + next to Memory to drop down the available memory graphs.  Right click the Pool Total Allocation Size graph and choose Add graph to New Analysis View.  If you are working with a small resolution screen you may want to click the X at the top of the Graph Explorer to close it, the Graph Explorer can be restored from the Window menu.


    The key to effective xperf analysis is to sort the data by the appropriate columns.  Columns can be added to the chart at the bottom of the view by right clicking the header and choosing the appropriate fields.  To perform pool analysis the Type, Paged, Pool Tag, and Stack columns are necessary.  Drag each of these columns to the left of the yellow line and sort them in the order shown below.  Click the Size column to sort it as the primary.


    WPA Pool Graphs


    The Type column indicates when the pool memory was allocated and when it was freed.  The term “AIFO” means the pool was Allocated Inside the timeframe of the trace and it was Freed Outside the timeframe of the trace (or perhaps it was never freed at all).  The term “AIFI” means the pool was Allocated Inside the timeframe of the trace, and it was also Freed Inside the timeframe of the trace (this memory was not leaked).  Because we are interested in memory that was not freed, start by clicking the + next to AIFO.


    The Paged column indicates if the pool allocations recorded are Paged or NonPaged.  From the perfmon analysis in Part 1 we know that the leak we are troubleshooting in this example is in NonPaged pool.  If a perfmon log is not available, the Size column is an indicator of what type of pool was leaked.  Click the + next to whichever type of pool is largest in your trace.


    The Pool Tag column displays the pool tag associated with each pool allocation.  Again, the Size column is an indicator of which tag is leaking.  Click the + next to the largest pool tag in your trace.


    The Stack column displays the call stack leading up to the allocation.  This is the information we are most interested in, it will indicate what driver is allocating the pool and it may indicate why.  Click the + next to the largest Stack in your trace.  Depending on how many times a particular code path is repeated, your Stack may only partially display and there may be more + options, you can use the right arrow key as a shortcut to open each of these until you see the call to ExAllocatePool.  When the complete stack has been displayed the right arrow key will stop expanding stacks.


    WPA Tag 'Leak' Call Stack


    In the above output we can see that there was a NonPaged pool leak in the tag Leak.  The call stack shows that the allocations were made by myfault.sys.  A driver developer would have a great use for this information.  If this was data from an actual leak the developer would use this output to determine that the leak is occurring due to an IOCTL sent from NotMyfault.exe!WinMain which leads to an allocation made in myfault.sys!MyfaultDeviceControl.  A developer can use this information to perform a code review and identify under what conditions MyfaultDeviceControl allocates this pool, under what conditions it should be expected to free it, and why it may not free the memory.


    This article concludes our series on troubleshooting pool leaks.  We have demonstrated various techniques which each have their own strengths and weaknesses.  Each of these techniques has a place in your debugging toolkit and are applicable to different circumstances depending on what your scenario is and what data you have available.

  • Ntdebugging Blog

    Troubleshooting Pool Leaks Part 6 – Driver Verifier


    In part 5 we used poolhittag to get call stacks of pool being allocated and freed.  This information is often essential to identifying the cause of a memory leak; however it is not always feasible to configure a live kernel debug to obtain this information.  Fortunately there are alternative methods to get such call stacks.


    Driver verifier has an option to enable pool tracking for a specific driver, or for multiple drivers.  This functionality was first introduced in Windows Vista and Windows Server 2008.  This information is also captured when driver verifier is used to enable special pool, however for the purposes of this article we will focus on using pool tracking.


    The data stored by driver verifier requires a debugger to view.  Any method of debugging can be used for this.  You can use a live kernel debug as we described in part 4, you can get a memory dump (kernel or complete, a small dump is insufficient), or you can use livekd.


    If you have used the steps from Part 1, Part 2, or Part 3, you likely have an idea which drivers are likely involved in creating the pool leak.  In this example we are generating the leak using notmyfault, the same tool we have been using in prior examples.  As seen in Part 2, the relevant driver is myfault.sys.


    Although driver verifier has GUI, the easiest way to enable this functionality is with the below command from an elevated command prompt:


    Verifier /flags 8 /driver myfault.sys


    The above command will provide the following output, allowing you to confirm that the expected settings are enabled:


    New verifier settings:


    Special pool: Disabled

    Pool tracking: Enabled

    Force IRQL checking: Disabled

    I/O verification: Disabled

    Deadlock detection: Disabled

    DMA checking: Disabled

    Security checks: Disabled

    Force pending I/O requests: Disabled

    Low resources simulation: Disabled

    IRP Logging: Disabled

    Miscellaneous checks: Disabled


    Verified drivers:





    You must restart this computer for the changes to take effect.


    After rebooting the system, reproduce the memory leak and attach a debugger or generate a memory dump after the memory has been leaked.


    Break in with the debugger (Ctrl+Break or Ctrl+C) or load the dump in windbg (File – Open Crash Dump).


    Set the symbol path and reload symbols.


    1: kd> .symfix c:\symbols

    1: kd> .reload

    Loading Kernel Symbols



    The !verifier command has various options to view information about driver verifier.  To view the pool allocations which have been tracked by verifier for notmyfault.sys, use the following:


    0: kd> !verifier 3 myfault.sys


    Verify Level 8 ... enabled options are:

          All pool allocations checked on unload


    Summary of All Verifier Statistics


    RaiseIrqls                             0x0

    AcquireSpinLocks                       0x0

    Synch Executions                       0x0

    Trims                                  0x0


    Pool Allocations Attempted             0xb

    Pool Allocations Succeeded             0xb

    Pool Allocations Succeeded SpecialPool 0xa

    Pool Allocations With NO TAG           0x1

    Pool Allocations Failed                0x0

    Resource Allocations Failed Deliberately   0x0


    Current paged pool allocations         0x0 for 00000000 bytes

    Peak paged pool allocations            0x1 for 00000080 bytes

    Current nonpaged pool allocations      0xa for 009CE000 bytes

    Peak nonpaged pool allocations         0xa for 009CE000 bytes


    Driver Verification List


    Entry     State           NonPagedPool   PagedPool   Module


    fffffa80031b5830 Loaded           009ce000       00000000    myfault.sys


    Current Pool Allocations  0000000a    00000000

    Current Pool Bytes        009ce000    00000000

    Peak Pool Allocations     0000000a    00000001

    Peak Pool Bytes           009ce000    00000080


    PoolAddress  SizeInBytes    Tag       CallersAddress

    fffffa8005400000     0x000fb000     Leak      fffff8800447d634

    fffffa80052fb000     0x000fb000     Leak      fffff8800447d634

    fffffa8005200000     0x000fb000     Leak      fffff8800447d634

    fffffa80050fb000     0x000fb000     Leak      fffff8800447d634

    fffffa8005000000     0x000fb000     Leak      fffff8800447d634

    fffffa8004efb000     0x000fb000     Leak      fffff8800447d634

    fffffa8004e00000     0x000fb000     Leak      fffff8800447d634

    fffffa8004cfb000     0x000fb000     Leak      fffff8800447d634

    fffffa8004c00000     0x000fb000     Leak      fffff8800447d634

    fffffa8004a66000     0x000fb000     Leak      fffff8800447d634


    At the bottom of the above output is the list of allocations made by notmyfault.sys.  For our purposes we are going to assume that these allocations have been leaked, as opposed to just being normal allocations that were not yet freed when the debugger was attached.


    The !verifier command has an option to view call stacks for one of the tracked allocations.  Keep in mind that the size of the database is limited and only more recent allocations will be kept in the database.


    0: kd> !verifier 80 fffffa8005400000


    Log of recent kernel pool Allocate and Free operations:


    There are up to 0x10000 entries in the log.


    Parsing 0x0000000000010000 log entries, searching for address 0xfffffa8005400000.




    Pool block fffffa8005400000, Size 00000000000fa000, Thread fffffa80044ceb60

    fffff80001927cc6 nt!VeAllocatePoolWithTagPriority+0x2b6

    fffff80001927d3d nt!VerifierExAllocatePoolEx+0x1d

    fffff8800447d634 myfault!MyfaultDeviceControl+0x358

    fffff8800447d727 myfault!MyfaultDispatch+0xb7

    fffff80001932750 nt!IovCallDriver+0xa0

    fffff800017a7a97 nt!IopXxxControlFile+0x607

    fffff800017a82f6 nt!NtDeviceIoControlFile+0x56

    fffff8000148bed3 nt!KiSystemServiceCopyEnd+0x13

    Parsed entry 0000000000010000/0000000000010000...

    Finished parsing all pool tracking information.


    The above output shows the call stack leading to the pool allocation.  This is the same information we had seen in Part 5, however we are able to obtain this information using a dump or livekd, whereas the steps from Part 5 required an invasive debug and extended system downtime.


    When you have completed troubleshooting, disable driver verifier with the following command and reboot:

    verifier /reset

Page 4 of 24 (240 items) «23456»