• Ntdebugging Blog

    Windows Troubleshooting – Special Pool

    • 0 Comments

    The Windows Support team has a new YouTube channel, “Windows Troubleshooting”.  The first set of videos cover debugging blue screens.

    In this video, Bob Golding, Senior Escalation Engineer, describes how the Special Pool Windows diagnostics tool catches drivers that corrupt memory. Bob also introduces how memory is organized in the system for allocating memory for drivers.

  • Ntdebugging Blog

    We Need Your XPERF Feedback

    • 0 Comments

    Hi,

     

    This is Tate asking for your direct and valuable feedback on XPerf (The Windows Performance Toolkit).  We are knee deep in working with the creators of the tool (The Windows Fundamentals Team) to provide end user feedback which will drive features for the next version. Now that you as our readers have seen several blog posts on the utility and hopefully have used it for a few of your own debugs, what would you like to see in terms of new features, fixes, data collection steps, etc?

     

    Particular areas of interest to the team are:

     

    -        Data Collection – easy or difficult, what would you change?

    -        Usage Scenarios – stack walking is powerful, what would you like the tool to expose that it does not?

    -        Any blocking factors keeping you from using the toolkit, what are they, how could we remedy that for you?

    -        Data Analysis – what would make your life as a debugger using the toolkit better?

     

    Please send us your feedback below and thanks in advance.

     

    -Tate

  • Ntdebugging Blog

    Commitment Failures, Not Just a Failed Love Story

    • 1 Comments

    I was working on a debug the other day when I ran the “!vm” command and saw that the system had some 48,000 commit requests that failed. This was strange as the system was not out of memory and the page file was not full. I was left scratching my head and thinking “I wish I knew where !vm got that information from.” So I went on a quest to find out, here is what I found.

     

    13: kd> !vm 1

     

    *** Virtual Memory Usage ***

          Physical Memory:    12580300 (  50321200 Kb)

          Page File: \??\C:\pagefile.sys

            Current:  50331648 Kb  Free Space:  50306732 Kb

            Minimum:  50331648 Kb  Maximum:     50331648 Kb

          Available Pages:     4606721 (  18426884 Kb)

          ResAvail Pages:     12189247 (  48756988 Kb)

          Locked IO Pages:           0 (         0 Kb)

          Free System PTEs:   33460257 ( 133841028 Kb)

          Modified Pages:        20299 (     81196 Kb)

          Modified PF Pages:      6154 (     24616 Kb)

          NonPagedPool 0 Used:   19544 (     78176 Kb)

          NonPagedPool 1 Used:   22308 (     89232 Kb)

          NonPagedPool Usage:    53108 (    212432 Kb)

          NonPagedPool Max:    9408956 (  37635824 Kb)

          PagedPool 0 Usage:    168921 (    675684 Kb)

          PagedPool 1 Usage:   4149241 (  16596964 Kb)

          PagedPool 2 Usage:     17908 (     71632 Kb)

          PagedPool Usage:     4336070 (  17344280 Kb)

          PagedPool Maximum:  33554432 ( 134217728 Kb)

          Session Commit:         3438 (     13752 Kb)

          Shared Commit:          6522 (     26088 Kb)

          Special Pool:              0 (         0 Kb)

          Shared Process:        53597 (    214388 Kb)

          PagedPool Commit:    4336140 (  17344560 Kb)

          Driver Commit:          5691 (     22764 Kb)

          Committed pages:     5565215 (  22260860 Kb)

          Commit limit:       25162749 ( 100650996 Kb)

     

          ********** 48440 commit requests have failed  **********

     

    It turns out that this calculation is from a global ULONG array named “nt!MiChargeCommitmentFailures”.  The array has 4 members and they are used to trace the types of commit failures that have taken place. This is done by first calculating the new commit size NewCommitValue = CurrentCommitValue + SystemReservedMemory. Based on this calculation commit errors are tracked in a few different ways, which are listed below with the corresponding member in the array that is incremented.

     

    MiChargeCommitmentFailures[0] - If the system failed a commit request and an expansion of the pagefile has failed.

    MiChargeCommitmentFailures[1] - If the system failed a commit and we have already reached the maximum pagefile size.

    MiChargeCommitmentFailures[2] - If the system failed a commit while the pagefile lock is held.

    MiChargeCommitmentFailures[3] - If the system failed a commit and the NewCommitValue is less than or equal to CurrentCommitValue.

     

    In order to calculate the count of failures, "!vm" adds up the values stored in each array member of the array. Members 0 and 1 are always counted, member 2 is counted if the OS version is Windows 2003/XP and member 3 is counted if the build version is newer than Windows 2003/XP. 

     

    Let's look at the array in the dump I was debugging:

     

    13: kd> dc nt!MiChargeCommitmentFailures L4

    fffff800`01e45ce0  00000000 0000bd38 00000000 00000000  ....8...........

     

     

    Converting this to decimal we find the 48000+ commit failures I was seeing the in output of !VM.

     

    13: kd> ?0000bd38

    Evaluate expression: 48440 = 00000000`0000bd38

     

    Since I now had my answer, “where does the number come from?”, I was left wanting to know a bit more about the overall flow of why a VirtualAlloc fails to commit.

     

    When memory is allocated by VirtualAlloc the newly allocated memory is not committed to physical memory. Only when the memory is accessed by a read or write is it backed by physical memory.

     

    When this newly allocated memory is accessed for the first time it will need to be backed by commit space. Under normal conditions this is a smooth process, however when the system hits what’s called the commit limit and can’t expand this limit we see commit failures.

     

    So how is the commit limit calculated? Let’s say we have a system with 4GB of physical memory and a pagefile that is 6GB in size. To determine the commit limit we add physical memory and the pagefile size together - in this example the commit limit would be 10GB. Since memory manger will not let any user mode allocation consume every last morsel of commit space it keeps a small amount of the commit space for the system to avoid hangs. When the limit is reached the system tries to grow the page file. If there is no more room to grow the pagefile or the pagefile has reached its configured maximum size, the system will try and free some committed memory to make room for more requests. If expansion of the page file or the attempt to free memory do not allow the allocation to complete, the allocation fails and MiChargeCommitmentFailures is incremented.

     

     

    To sum it all up, commit limit is RAM + pagefile, commit failures happen when we hit the commit limit and the system is unable to grow the pagefile because it is already at its max.  It’s that simple, well almost.

     

    For those that will want to know more about how memory manger works please see the post from Somak: The Memory Shell Game.

     

    Randy Monteleone

  • Ntdebugging Blog

    How to have a colorful relationship with your dump files

    • 3 Comments

    Hello NTDebuggers…  I look at a lot of dump files every day.  This being the case I like to take full advantage of the customizable look and feel of windbg.   I actually have an association setup between DMP files and a CMD file that loads my customized COLOR workspace each time I double click on a dump file.  I like a black background with colorized source and debug command window output.   

    Here is a snapshot of my typical debug session.

    clip_image001[12]

    Here is how you set it up.

    1) Create the following CMD file and put it in your path.  It’s called D.CMD on my system..

    echo off
    Title kd.exe -z %1
    C:
    CD\debuggers
    start C:\Debuggers\windbg.exe -z %1 -W color

    2) Load windbg and under view \ options configure your color options.  Note: you will want to set every option in the list box.

    clip_image002 clip_image003

    3) Save the workspace by selecting FILE \ Save Workspace, I saved mine as COLOR

    4) If you don’t have a dump file create one by opening a process in the debugger and doing a .dump /ma C:\test\test.dmp (I’ll assume you have a directory called C:\test)

    5) Select the C:\test\test.dmp file in explorer and right click on the properties. (note I used badwindow.dmp)

    clip_image005

    6) Change the dump file association via drilling down to Change \ Browse \ (Select D.CMD)

    clip_image006

    7) That’s all there is to it.  Now you can simply double click on any DMP file and windbg will start and load with your COLOR workspace.

    Good luck and happy debugging.

    Jeff Dailey-

  • Ntdebugging Blog

    WMI Nugget: How to Gather the Provider Binary from a WMI Class Name

    • 0 Comments

    It's Venkatesh with a WMI nugget. While troubleshooting or debugging WMI issues you may come across WMI queries wherein you don’t know which provider implemented the WMI class used in the query. You may want to know the binary and the product that implemented the provider so you can contact the vendor or upgrade the binary to see if it resolves the issue you are investigating.

    To help make the investigation easier, Windows Vista and all post operating systems have the cool new 'Analytic and Debug log' for WMI tracing. To get similar tracing in Windows 2003 and earlier versions, WMI has a “Verbose Logging” option enabled through the WMI control (WmiMgmt.msc). Check out this blog for detailed steps to enable the WMI Analytic and Debug -

    http://blogs.msdn.com/wmi/archive/2009/05/27/is-wmiprvse-a-real-villain.aspx

    After enabling WMI logging you will see events like the ones pictured below. There may be a particular WMI query relevant to the issue you're troubleshooting requiring you to know the provider binary for the WMI class in the query. Apart from the class name (InstProvSamp) in the query you need the WMI namespace to which this class belongs to such as '\\.\root\default' as shown below.

    image

    Fig 1: Event ID 1 in the Windows Vista’s Event Viewer\Windows Logs\Applications and Services Logs\Microsoft\Windows\WMI-Activity\Trace log

    Now we have the WMI Namespace and the class name so let’s find the provider for it.

    1. Get the provider name from the Class Name and Namespace:

    Every WMI class has a qualifier “provider” containing the name of the provider to which it belongs. Using wbemtest.exe connect to the namespace we mentioned above(\\.\root\default). Then get the class information for our class, InstProvSamp using the “Open Class…” button. After clicking "OK" the Object editor for InstrProvSamp displays the class information for InstProvSamp including the name of the provider it belongs to (CIM_STRING qualifier “provider” as shown below).

    image

    image

    2. Get the class identifier (CLSID) of the provider from the provider name:

    For every WMI provider WMI maintains an instance of system class __Win32Provider which contains information about the physical implementation of the provider like CLSID which contains the GUID that represents the class identifier (CLSID) of the provider COM object. Now query the repository to get the provider information for “InstProvSamp” provider as I did below.

    image

    image 

    image

    3. Get the provider binary from the provider’s CLSID:

    During registration every COM object adds an entry under HKLM\SOFTWARE\Classes\CLSID. Using regedit.exe you can browse to provider’s COM registration information using the CLSID obtained from Step 2 as shown below.

    image

    This should help you connect the dots while debugging your WMI issue. Let me know if you have additional questions!

    Share this post :
  • Ntdebugging Blog

    ResAvail Pages and Working Sets

    • 0 Comments

    Hello everyone, I'm Ray and I'm here to talk a bit about a dump I recently looked at and a little-referenced memory counter called ResAvail Pages (resident available pages).

     

    The problem statement was:  The server hangs after a while.

     

    Not terribly informative, but that's where we start with many cases. First some good housekeeping:

     

    0: kd> vertarget

    Windows 7 Kernel Version 7601 (Service Pack 1) MP (2 procs) Free x64

    Product: Server, suite: TerminalServer SingleUserTS

    Built by: 7601.18113.amd64fre.win7sp1_gdr.130318-1533

    Machine Name: "ASDFASDF1234"

    Kernel base = 0xfffff800`01665000 PsLoadedModuleList = 0xfffff800`018a8670

    Debug session time: Thu Aug  8 09:39:26.992 2013 (UTC - 4:00)

    System Uptime: 9 days 1:08:39.307

     

    Of course Windows 7 Server == Server 2008 R2.

     

    One of the basic things I check at the beginning of these hang dumps with vague problem statements is the memory information.

     

    0: kd> !vm 21

     

    *** Virtual Memory Usage ***

    Physical Memory:     2097038 (   8388152 Kb)

    Page File: \??\C:\pagefile.sys

      Current:  12582912 Kb  Free Space:  12539700 Kb

      Minimum:  12582912 Kb  Maximum:     12582912 Kb

    Available Pages:      286693 (   1146772 Kb)

    ResAvail Pages:          135 (       540 Kb)

     

    ********** Running out of physical memory **********

     

    Locked IO Pages:           0 (         0 Kb)

    Free System PTEs:   33526408 ( 134105632 Kb)

     

    ******* 12 system cache map requests have failed ******

     

    Modified Pages:         4017 (     16068 Kb)

    Modified PF Pages:      4017 (     16068 Kb)

    NonPagedPool Usage:   113241 (    452964 Kb)

    NonPagedPool Max:    1561592 (   6246368 Kb)

    PagedPool 0 Usage:     35325 (    141300 Kb)

    PagedPool 1 Usage:     28162 (    112648 Kb)

    PagedPool 2 Usage:     24351 (     97404 Kb)

    PagedPool 3 Usage:     24350 (     97400 Kb)

    PagedPool 4 Usage:     24516 (     98064 Kb)

    PagedPool Usage:      136704 (    546816 Kb)

    PagedPool Maximum:  33554432 ( 134217728 Kb)

     

    ********** 222 pool allocations have failed **********

     

    Session Commit:         6013 (     24052 Kb)

    Shared Commit:          6150 (     24600 Kb)

    Special Pool:              0 (         0 Kb)

    Shared Process:      1214088 (   4856352 Kb)

    Pages For MDLs:           67 (       268 Kb)

    PagedPool Commit:     136768 (    547072 Kb)

    Driver Commit:         15548 (     62192 Kb)

    Committed pages:     1648790 (   6595160 Kb)

    Commit limit:        5242301 (  20969204 Kb)

     

    So we're failing to allocate pool, but we aren't out of virtual memory for paged pool or nonpaged pool.  Let's look at the breakdown:

     

    0: kd> dd nt!MmPoolFailures l?9

    fffff800`01892160  000001be 00000000 00000000 00000002

    fffff800`01892170  00000000 00000000 00000000 00000000

    fffff800`01892180  00000000

     

    Where:

        yellow   = Nonpaged high/medium/low priority failures

        green    = Paged high/medium/low priority failures

        cyan      = Session paged high/medium/low priority failures

     

    So we actually failed both nonpaged AND paged pool allocations in this case.  Why?  We're "Running out of physical memory", obviously.  So where does this running out of physical memory message come from?  In the above example this is from the ResAvail Pages counter.

     

    ResAvail Pages is the amount of physical memory there would be if every working set was at its minimum size and only what needs to be resident in RAM was present (e.g. PFN database, system PTEs, driver images, kernel thread stacks, nonpaged pool, etc).

     

    Where did this memory go then?  We have plenty of Available Pages (Free + Zero + Standby) for use.  So something is claiming memory it isn't actually using.  In this type of situation one of the things I immediately suspect is process working set minimums. Working set basically means the physical memory used by a process.

     

    So let's check.

     

    0: kd> !process 0 1

     

    <a lot of processes in this output>.

     

    PROCESS fffffa8008f76060

        SessionId: 0  Cid: 0adc    Peb: 7fffffda000  ParentCid: 0678

        DirBase: 204ac9000  ObjectTable: 00000000  HandleCount:   0.

        Image: cscript.exe

        VadRoot 0000000000000000 Vads 0 Clone 0 Private 1. Modified 3. Locked 0.

        DeviceMap fffff8a000008a70

        Token                             fffff8a0046f9c50

        ElapsedTime                       9 Days 01:08:00.134

        UserTime                          00:00:00.000

        KernelTime                        00:00:00.015

        QuotaPoolUsage[PagedPool]         0

        QuotaPoolUsage[NonPagedPool]      0

        Working Set Sizes (now,min,max)  (5, 50, 345) (20KB, 200KB, 1380KB)

        PeakWorkingSetSize                1454

        VirtualSize                       65 Mb

        PeakVirtualSize                   84 Mb

        PageFaultCount                    1628

        MemoryPriority                    BACKGROUND

        BasePriority                      8

        CommitCharge                      0

     

    I have only shown one example process above for brevity's sake, but there were thousands returned.  241,423 to be precise.  None had abnormally high process working set minimums, but cumulatively their usage adds up.

     

    The “now” process working set is lower than the minimum working set.  How is that possible?  Well, the minimum and maximum are not hard limits, but suggested limits.  For example, the minimum working set is honored unless there is memory pressure, in which case it can be trimmed below this value.  There is a way to set the min and/or max as hard limits on specific processes by using the QUOTA_LIMITS_HARDWS_MIN_ENABLE flag via SetProcessWorkingSetSize.

     

    You can view if the minimum and maximum working set values are configured in the _EPROCESS->Vm->Flags structure.  Note these numbers are from another system as this structure was already torn down for the processes we were looking at.

     

    0: kd> dt _EPROCESS fffffa8008f76060 Vm

    nt!_EPROCESS

       +0x398 Vm : _MMSUPPORT

    0: kd> dt _MMSUPPORT fffffa8008f76060+0x398

    nt!_MMSUPPORT

       +0x000 WorkingSetMutex  : _EX_PUSH_LOCK

       +0x008 ExitGate         : 0xfffff880`00961000 _KGATE

       +0x010 AccessLog        : (null)

       +0x018 WorkingSetExpansionLinks : _LIST_ENTRY [ 0x00000000`00000000 - 0xfffffa80`08f3c410 ]

       +0x028 AgeDistribution  : [7] 0

       +0x044 MinimumWorkingSetSize : 0x32

       +0x048 WorkingSetSize   : 5

       +0x04c WorkingSetPrivateSize : 5

       +0x050 MaximumWorkingSetSize : 0x159

       +0x054 ChargedWslePages : 0

       +0x058 ActualWslePages  : 0

       +0x05c WorkingSetSizeOverhead : 0

       +0x060 PeakWorkingSetSize : 0x5ae

       +0x064 HardFaultCount   : 0x41

       +0x068 VmWorkingSetList : 0xfffff700`01080000 _MMWSL

       +0x070 NextPageColor    : 0x2dac

       +0x072 LastTrimStamp    : 0

       +0x074 PageFaultCount   : 0x65c

       +0x078 RepurposeCount   : 0x1e1

       +0x07c Spare            : [2] 0

       +0x084 Flags            : _MMSUPPORT_FLAGS

    0: kd> dt _MMSUPPORT_FLAGS fffffa8008f76060+0x398+0x84

    nt!_MMSUPPORT_FLAGS

       +0x000 WorkingSetType   : 0y000

       +0x000 ModwriterAttached : 0y0

       +0x000 TrimHard         : 0y0

       +0x000 MaximumWorkingSetHard : 0y0

       +0x000 ForceTrim        : 0y0

       +0x000 MinimumWorkingSetHard : 0y0

       +0x001 SessionMaster    : 0y0

       +0x001 TrimmerState     : 0y00

       +0x001 Reserved         : 0y0

       +0x001 PageStealers     : 0y0000

       +0x002 MemoryPriority   : 0y00000000 (0)

       +0x003 WsleDeleted      : 0y1

       +0x003 VmExiting        : 0y1

       +0x003 ExpansionFailed  : 0y0

       +0x003 Available        : 0y00000 (0)

     

    How about some more detail?

     

    0: kd> !process fffffa8008f76060

    PROCESS fffffa8008f76060

        SessionId: 0  Cid: 0adc    Peb: 7fffffda000  ParentCid: 0678

        DirBase: 204ac9000  ObjectTable: 00000000  HandleCount:  0.

        Image: cscript.exe

        VadRoot 0000000000000000 Vads 0 Clone 0 Private 1. Modified 3. Locked 0.

        DeviceMap fffff8a000008a70

        Token                             fffff8a0046f9c50

        ElapsedTime                       9 Days 01:08:00.134

        UserTime                          00:00:00.000

        KernelTime                        00:00:00.015

        QuotaPoolUsage[PagedPool]         0

        QuotaPoolUsage[NonPagedPool]      0

        Working Set Sizes (now,min,max)  (5, 50, 345) (20KB, 200KB, 1380KB)

        PeakWorkingSetSize                1454

        VirtualSize                       65 Mb

        PeakVirtualSize                   84 Mb

        PageFaultCount                    1628

        MemoryPriority                    BACKGROUND

        BasePriority                      8

        CommitCharge                      0

     

    No active threads

     

    0: kd> !object fffffa8008f76060

    Object: fffffa8008f76060  Type: (fffffa8006cccc90) Process

        ObjectHeader: fffffa8008f76030 (new version)

        HandleCount: 0  PointerCount: 1

     

    The highlighted information shows us that this process has no active threads left but the process object itself (and its 20KB working set use) were still hanging around because a kernel driver had a reference to the object that it never released.  Sampling other entries shows the server had been leaking process objects since it was booted.

     

    Unfortunately trying to directly track down pointer leaks on process objects is difficult and requires an instrumented kernel, so we tried to check the easy stuff first before going that route.  We know it has to be a kernel driver doing this (since it is a pointer and not a handle leak) so we looked at the list of 3rd party drivers installed.  Note: The driver names have been redacted.

     

    0: kd> lm

    start             end                 module name

    <snip>

    fffff880`04112000 fffff880`04121e00   driver1    (no symbols)   <-- no symbols usually means 3rd party       

    fffff880`04158000 fffff880`041a4c00   driver2    (no symbols)          

    <snip>

     

    0: kd> lmvm driver1   

    Browse full module list

    start             end                 module name

    fffff880`04112000 fffff880`04121e00   driver1    (no symbols)          

        Loaded symbol image file: driver1.sys

        Image path: \SystemRoot\system32\DRIVERS\driver1.sys

        Image name: driver1.sys

        Browse all global symbols  functions  data

        Timestamp:        Wed Dec 13 12:09:32 2006 (458033CC)

        CheckSum:         0001669E

        ImageSize:        0000FE00

        Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

    0: kd> lmvm driver2

    Browse full module list

    start             end                 module name

    fffff880`04158000 fffff880`041a4c00   driver2    (no symbols)          

        Loaded symbol image file: driver2.sys

        Image path: \??\C:\Windows\system32\drivers\driver2.sys

        Image name: driver2.sys

        Browse all global symbols  functions  data

        Timestamp:        Thu Nov 30 12:12:07 2006 (456F10E7)

        CheckSum:         0004FE8E

        ImageSize:        0004CC00

        Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

     

    Fortunately for both the customer and us we turned up a pair of drivers that predated Windows Vista (meaning they were designed for XP/2003) that raised an eyebrow.  Of course we need a more solid evidence link than just "it's an old driver", so I did a quick search of our internal KB.  This turned up several other customers who had these same drivers installed, experienced the same problem, then removed them and the problem went away.  That sounds like a pretty good evidence link. We implemented the same plan for this customer successfully.

  • Ntdebugging Blog

    Where the rubber meets the road, or in this case the hardware meets the probe.

    • 1 Comments

    Hi my name is Bob, I’m an Escalation engineer with the Microsoft critical problem resolution team.  We had one of our readers ask how much we deal with hardware.  Well in response we recently worked on an interesting problem I thought I would share with you.  In this case it was interesting because it demonstrated an issue specific to multi-processor machines and something that probably sounded innocuous to the driver writer who caused the  problem.

     

    What was the problem?

     

    The problem was that everything worked except the time in the system was not being updated. The RTC had stopped.   We spent time reviewing the semantics of how the RTC should function and even hooked up an oscilloscope to the RTC on the motherboard and were able to turn it off and on with the debugger by writing out the correct port.  The trace on the scope validated our understanding of what had to be written to the port to turn the clock off.    One we had a clear understanding of this we understood what we were looking for in a driver that might cause the problem.  Note the clock typically fires every 10ms so you do not need a fast scope to do this.

     

     

    Special keyboard driver written

     

    In order to catch a dump in state we had to modify the keyboard driver.  It would cause an “Int 3” in its ISR instead of calling “bug check” for an E2 stop.    Because the RTC was not running the idle thread was not getting quantums and as a result a normal break in would not work.  However the system would respond to ISRs. 

     

    What was found

     

    All RTC interrupts were stopped - the clock was not running.  We checked all the obvious places to see if the RTC was disabled.

     

    We looked at the ICR in the I/O APIC.  This is the interrupt configuration register. There is a register for every interrupt pin on the APIC. These registers are used to tell the APIC what vector to send the processor so the processor can service the interrupt.  It also has configuration information about level and if it edge triggered and a mask bit.  The mask bit was not set.

    Below is a reenactment.

     

    0: kd> ed ffd01000

    ffd01000 00000034 20 ß Select register 20 which is pin 8.

     

     

    0: kd> ed ffd01010

    ffd01010 000008d1    ß Contents ß Vect D1 Bit 16 the interrupt mask bit is not set so it is OK.

     

     

    Next check the RTC status register which are I/O ports 70 and 71.  Port 70 is the address port.  Port 71 is the data port.  This information is from an old BIOS book.

     

    0: kd> ob 70 b       ß ‘B’ is a control register.   

    0: kd> ib 71

    00000071: 42         ß The value 42 means that the RTC is enabled.  Bit 6 is the enable.

     

     

    So what was it?

     

    The way the RTC works is it will interrupt at a certain interval.  When the interrupt is serviced, the status register has to be read to start it again.

     

    We discovered another driver that was reading  the clock, this was done by disassembly various drivers in the dump and looking for the I/O operation to ports 70 or 71.  The lower addresses selected by port 70 will yield the time when read.  That is what the driver was doing.

     

    You would think that simply reading the time in this way would not hurt anything.  However, in a multi-processor system, access has to be serialized.  There is only one set of I/O ports for the system.

     

    Since it takes two accesses to perform an operation on the clock, one address & one data, a collision between two processors can cause undetermined results.

     

    Below is a timing diagram of the issue;

     

     

    Proc 0 running OS RTC handler                         Proc 1 running XYZ driver

     

      T1          Set register select to status register

     

      T2                                                                                          Set register select to read time

     

      T3          Read status register to restart clock

     

     

    So at T3 the OS RTC handler reads the wrong register so the clock does not start. 

     

     

    Conclusion

     

    I thought this was an interesting problem that illustrates the need for serialization.  And it demonstrates what to look out for in a multi-proc environment.  You always have to think “What happens if the other processor does…….”

     

    For more information consult any BIOS developer manuals you may have lying around or this link we found http://www.geocities.com/SiliconValley/Campus/1671/docs/rtc.htm

    See the “Status Register C” section, “All future interrupts are disabled until this register is read - your interrupt handler *must* do it.”

  • Ntdebugging Blog

    Use Caution When Implementing IPC for Performance Counters

    • 0 Comments

     

    Recently I was working with a developer who had created performance counters that work in Performance Monitor but are never collected in a user defined data collector set.  The customer explained that their counters update named shared memory inside the application which should be read by perfmon or the data collector set.

     

    Putting counter data in shared memory is a common technique for performance counter developers.  A programmer can update performance data in a block of shared memory in their application and then use a performance extension dll (aka an “extensible counter”) to read from the shared memory.

     

    Shared memory is created by calling CreateFileMapping and MapViewOfFile.  This memory is then accessed by another application by calling OpenFileMapping.  All applications which use this shared memory must pass the same lpName to CreateFileMapping or OpenFileMapping.  An example of using these APIs to implement shared memory is available on MSDN.

     

    Based on the customer’s explanation that they are populating shared memory in their application, and their counters work in Performance Monitor but do not work in a user defined data collector set, I suspected that OpenFileMapping was failing for the data collector set.

     

    User defined data collector sets run in a rundll32.exe process.  If you have multiple rundll32.exe processes you may need to identify which one is related to your data collector set.  The relevant process has a command line similar to “rundll32.exe C:\Windows\system32\pla.dll,PlaHost”.  There are several tools that can be used to identify the command line of the process such as tlist.exe, which is included with the Debugging Tools for Windows.

     

    After attaching a debugger to rundll32.exe, I wanted to break on the ret instruction at the end of the OpenFileMappingW function.  This would allow me to determine if the function succeeds or fails.  According to MSDN “If the function fails, the return value is NULL. To get extended error information, call GetLastError.”

     

    The uf command is an easy way to unassemble a function and find the ret instruction to break on.

     

    0:001> uf kernelbase!OpenFileMappingW

    kernelbase!OpenFileMappingW:

    75b88e0d 8bff            mov     edi,edi

    <snip>

    kernelbase!OpenFileMappingW+0x8f:

    75b88e79 c9              leave

    75b88e7a c20c00          ret     0Ch

    0:001> bp 75b88e7a

    0:001> g

    Breakpoint 0 hit

    eax=00000000 ebx=00008022 ecx=7ffd8000 edx=00000002 esi=05abf03c edi=00000000

    eip=75b88e7a esp=05abeb20 ebp=05abeb3c iopl=0         nv up ei pl zr na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

    kernelbase!OpenFileMappingW+0x90:

    75b88e7a c20c00          ret     0Ch

     

    In the above output we can see that the eax register is NULL, indicating that the call to OpenFileMapping failed.  The !gle command will show the last error and last status.

     

    0:008> !gle

    LastErrorValue: (Win32) 0x2 (2) - The system cannot find the file specified.

    LastStatusValue: (NTSTATUS) 0xc0000034 - Object Name not found.

     

    The failure is that OpenFileMapping cannot find the file.  The file name is the third parameter to OpenFileMapping.  We can get the first three parameters from the kb command.

     

    0:008> kb

    ChildEBP RetAddr  Args to Child             

    05abf0d0 6abae355 0002001f 00000000 05abeb7c kernelbase!OpenFileMappingW+0x90

    WARNING: Stack unwind information not available. Following frames may be wrong.

    05abf0f8 7784fe67 02a7ae90 05abf224 05abf254 ninjaprf+0x10edb

    05abf110 7784fc97 00472158 02a7ae90 05abf224 advapi32!CallExtObj+0x17

    05abf270 7784efaf 05abf2bc 60fcfa02 05abf778 advapi32!QueryExtensibleData+0x735

    05abf654 75ff0468 80000004 05abf778 00000000 advapi32!PerfRegQueryValue+0x5da

    05abf748 75ffd505 80000004 05abf778 05abf790 kernel32!LocalBaseRegQueryValue+0x366

    05abf7b4 61247dc5 80000004 02a7ae90 00000000 kernel32!RegQueryValueExW+0xb7

    05abf830 61250595 80000004 02a7ae58 02a7ae90 pdh!GetSystemPerfData+0x92

    05abf89c 6124c753 02a407d0 05abf8e8 61241928 pdh!GetQueryPerfData+0xa4

    05abf8b8 61254463 02a407d0 05abf8e8 60fcf32f pdh!PdhiCollectQueryData+0x32

    05abf90c 611c6d04 02a58f08 00000000 75ffc3e0 pdh!PdhUpdateLogW+0xa2

    05abf9bc 611be128 0045c968 00000000 00000000 pla!HPerformanceCounterDataCollector::Cycle+0x48

    05abf9bc 00000000 0045c968 00000000 00000000 pla!PlaiCollectorControl+0x3b7

    0:008> da 05abeb7c

    05abeb7c  "Local\NINJAPERF_S-1-5-18"

     

    The user defined data collector set is failing to open the file "Local\NINJAPERF_S-1-5-18".  This is the name that the performance extension dll ninjaprf.dll has given to its shared memory.

     

    Based on the customer’s description this operation works in Performance Monitor.  Next, I attached a debugger to perfmon and set the same breakpoint.

     

    Breakpoint 0 hit

    eax=000009f8 ebx=00008022 ecx=a7330000 edx=080ee678 esi=06798070 edi=00000000

    eip=760be9bb esp=0a84e564 ebp=0a84e580 iopl=0         nv up ei pl zr na pe nc

    cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246

    kernelbase!OpenFileMappingW+0x90:

    760be9bb c20c00          ret     0Ch

     

    In the above output we can see that eax is a handle number, indicating that the function succeeded.

     

    Looking at the file being opened we can see why this works in one scenario and not in another.

     

    0:016> kb 1

    ChildEBP RetAddr  Args to Child             

    0a84e580 698e4ab9 0002001f 00000000 0a84e5c0 kernelbase!OpenFileMappingW+0x90

    0:016> da 0a84e5c0

    0a84e5c0  "Local\NINJAPERF_S-1-5-21-123578"

    0a84e5e0  "095-571698237-1598563147-18961"

     

    The file name used is unique for each user.  The ninjaprf dll has chosen a file name which includes the SID of the current user.  This works for Performance Monitor because the user who starts the application is the same as the user who runs Performance Monitor.  However, a user defined data collector set runs in rundll32.exe as the Local System account.

     

    It is possible to run the user defined data collector set as a different user, however the file name being used will not work in that scenario either.  By using the “Local\” prefix, the file is created in the local session namespace.  The application runs in the user’s session, while rundll32.exe is started by the Task Scheduler service and runs in session 0.  This prevents the user defined data collector set from seeing the file created by the application.

     

    If your performance counter uses shared memory to communicate, be aware that your performance extension dll may be run in the context of a different user account and a different user session.  Your inter process communication techniques must account for this, or your customers will ask you why your counters do not work outside of Performance Monitor.

  • Ntdebugging Blog

    Uncover the mystery of a bugcheck 0x24 (NTFS_FILE_SYSTEM)

    • 0 Comments

     

    My name is Nan, I am an Escalation Engineer in Platforms Global Escalation Services in GCR. Today I’d like to share an interesting sample case with respect to filter manager to showcase the powerful debugging extension fltkd.dll (it is included in the public debugging tools) and talk about Filter Manager/Minifilter architecture briefly through the investigation.

     

    Recently I worked on an interesting server bugcheck case where Windows Server 2012 R2 would crash with bugcheck code 0x24 (NTFS_FILE_SYSTEM) after mounting a ReFS volume.

     

    3: kd> .bugcheck

    Bugcheck code 00000024

    Arguments 000000b5`00190637 ffffd000`23f17e58 ffffd000`23f17660 fffff800`00cf95c7

     

    The 2nd and 3rd parameters are the exception record and context record. Do a .cxr on the 3rd parameter and we obtain a more informative stack.

     

    3: kd> .cxr ffffd00023f17660

    rax=ffffe0003c8dae20 rbx=ffffc00003dba2c0 rcx=0000000000000018

    rdx=ffffe0003c84bc60 rsi=ffffd00023f18210 rdi=ffffc00003dba120

    rip=fffff80000cf95c7 rsp=ffffd00023f18090 rbp=0000000000000000

    r8=0000000100000000  r9=0000000000000001 r10=0000000000000004

    r11=ffffd00023f18168 r12=0000000000000000 r13=ffffe0003c84bf70

    r14=0000000000000005 r15=0000000000000000

    iopl=0         nv up ei ng nz na pe nc

    cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010282

    Ntfs!NtfsCommonQueryInformation+0xa7:

    fffff800`00cf95c7 418b4104        mov    eax,dword ptr [r9+4] ds:002b:00000000`00000005=???????? <<<< R9 has invalid data.

    3: kd> knL

      *** Stack trace for last set context - .thread/.cxr resets it

    # Child-SP          RetAddr           Call Site

    00 ffffd000`23f18090 fffff800`00d00813 Ntfs!NtfsCommonQueryInformation+0xa7

    01 ffffd000`23f18170 fffff800`00d01156 Ntfs!NtfsFsdDispatchSwitch+0xd3

    02 ffffd000`23f181f0 fffff800`00402f3e Ntfs!NtfsFsdDispatchWait+0x47

    03 ffffd000`23f18440 fffff800`00406986 FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x25e

    04 ffffd000`23f184c0 fffff800`0042ef2a FLTMGR!FltPerformSynchronousIo+0x2b6

    05 ffffd000`23f18570 fffff800`01041843 FLTMGR!FltQueryInformationFile+0x52

    06 ffffd000`23f185b0 fffff800`010417bb av+0x3843

    07 ffffd000`23f18600 fffff800`00f1c06d av+0x37bb

    08 ffffd000`23f18630 fffff800`00401eea fltdrv1+0xc06d

    09 ffffd000`23f18720 fffff800`00405c13 FLTMGR!FltpPerformPreCallbacks+0x31a

    0a ffffd000`23f18830 fffff800`0042bcf2 FLTMGR!FltpPassThroughFastIo+0x73

    0b ffffd000`23f18890 fffff801`6c9b62b1 FLTMGR!FltpFastIoQueryStandardInfo+0x132

    0c ffffd000`23f18920 fffff801`6c7684b3 nt!NtQueryInformationFile+0x3e1

    0d ffffd000`23f18a90 00007ff9`e632668a nt!KiSystemServiceCopyEnd+0x13

    0e 00000000`09bcd828 00000000`00000000 0x00007ff9`e632668a

     

    An exception happens in Ntfs.sys because Ntfs!NtfsCommonQueryInformation attempts to access an invalid memory address in kernel mode.  We look back a few instructions to figure how R9 obtains the bogus value.

     

    3: kd> ub fffff80000cf95c7

    Ntfs!NtfsCommonQueryInformation+0x81:

    fffff800`00cf95a1 4c8b8fa8000000  mov    r9,qword ptr [rdi+0A8h] <<<< where R9 value is obtained.

    fffff800`00cf95a8 4d894bb0        mov     qword ptr [r11-50h],r9

    fffff800`00cf95ac 4d894ba0        mov     qword ptr [r11-60h],r9

    fffff800`00cf95b0 488b5820        mov     rbx,qword ptr [rax+20h]

    fffff800`00cf95b4 49895ba8        mov     qword ptr [r11-58h],rbx

    fffff800`00cf95b8 4c8b87a0000000  mov     r8,qword ptr [rdi+0A0h]

    fffff800`00cf95bf 4d894388        mov     qword ptr [r11-78h],r8

    fffff800`00cf95c3 4d894398        mov     qword ptr [r11-68h],r8

    3: kd> r rdi

    Last set context:

    rdi=ffffc00003dba120

     

    Data in R9 is obtained from memory being referenced by RDI+0A8h. Typically it is supposed to be an opaque structure pertaining to NTFS. Before giving up and engaging Microsoft support, we can give another shot by our old friend !pool extension.

     

    3: kd> !pool ffffc00003dba120 2

    Pool page ffffc00003dba120 region is Paged pool

    *ffffc00003dba000 size:  450 previous size:    0  (Allocated) *Reff

                Pooltag Reff : FCB_DATA, Binary : refs.sys

     

    To my surprise, the pool tag points to ReFS. Let’s revisit the call stack above, it crashes in Ntfs path clearly. How the query operation against Ntfs end up referencing something in ReFS?

     

    Instead of doing the bottom-up analysis, we can take the top-down approach in effort to figure out more information. Let’s revisit the call stack again. The top entry on the stack is nt!NtQueryInformationFile that enters Filter Manager and winds up calling FltpPerformPreCallbacks function, which calls into 3rd party driver fltdrv1.sys.

     

    3: kd> knL

      *** Stack trace for last set context - .thread/.cxr resets it

    # Child-SP          RetAddr           Call Site

    00 ffffd000`23f18090 fffff800`00d00813 Ntfs!NtfsCommonQueryInformation+0xa7

    01 ffffd000`23f18170 fffff800`00d01156 Ntfs!NtfsFsdDispatchSwitch+0xd3

    02 ffffd000`23f181f0 fffff800`00402f3e Ntfs!NtfsFsdDispatchWait+0x47

    03 ffffd000`23f18440 fffff800`00406986 FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x25e

    04 ffffd000`23f184c0 fffff800`0042ef2a FLTMGR!FltPerformSynchronousIo+0x2b6

    05 ffffd000`23f18570 fffff800`01041843 FLTMGR!FltQueryInformationFile+0x52

    06 ffffd000`23f185b0 fffff800`010417bb av+0x3843

    07 ffffd000`23f18600 fffff800`00f1c06d av+0x37bb

    08 ffffd000`23f18630 fffff800`00401eea fltdrv1+0xc06d

    09 ffffd000`23f18720 fffff800`00405c13 FLTMGR!FltpPerformPreCallbacks+0x31a

    0a ffffd000`23f18830 fffff800`0042bcf2 FLTMGR!FltpPassThroughFastIo+0x73

    0b ffffd000`23f18890 fffff801`6c9b62b1 FLTMGR!FltpFastIoQueryStandardInfo+0x132

    0c ffffd000`23f18920 fffff801`6c7684b3 nt!NtQueryInformationFile+0x3e1

    0d ffffd000`23f18a90 00007ff9`e632668a nt!KiSystemServiceCopyEnd+0x13

    0e 00000000`09bcd828 00000000`00000000 0x00007ff9`e632668a

     

    As implied by the function name FLTMGR!FltpPerformPreCallbacks, we can suppose it is calling some call back routine in 3rd party driver fltdrv1.sys which is a minifilter registered in Filter Manager.

     

    In the Filter Manager architecture a minifilter driver can filter IRP-based I/O operations as well as fast I/O and file system filter (FSFilter) callback operations. For each of the I/O operations it chooses to filter, a minifilter driver can register a preoperation callback routine, a postoperation callback routine, or both. When handling an I/O operation, the filter manager calls the appropriate callback routine for each minifilter driver that registered for that operation. When that callback routine returns, the filter manager calls the appropriate callback routine for the next minifilter driver that registered for the operation.

     

    So FLTMGR!FltpPerformPreCallbacks is supposed to invoke the preoperation call back inside fltdrv1.sys, Here is how preoperation call back is declared.

     

    typedef FLT_PREOP_CALLBACK_STATUS ( *PFLT_PRE_OPERATION_CALLBACK)(

      _Inout_  PFLT_CALLBACK_DATA Data,

      _In_     PCFLT_RELATED_OBJECTS FltObjects,

      _Out_    PVOID *CompletionContext

    );

     

    The FLT_CALLBACK_DATA structure represents an I/O operation. The Filter Manager and minifilters use this structure to initiate and process I/O operations. In this structure Iopb pointers to an FLT_IO_PARAMETER_BLOCK structure that contains the parameters for the I/O operation.

     

    So let’s decode the assembly code FLTMGR!FltpPerformPreCallbacks and figure out the parameters it passes into the 3rd party callback first.

     

    3: kd> ub fffff800`00401eea

    FLTMGR!FltpPerformPreCallbacks+0x2ff:

    fffff800`00401ecf 7810            js     FLTMGR!FltpPerformPreCallbacks+0x311 (fffff800`00401ee1)

    fffff800`00401ed1 000f            add     byte ptr [rdi],cl

    fffff800`00401ed3 85d2            test    edx,edx

    fffff800`00401ed5 c00000          rol     byte ptr [rax],0

    fffff800`00401ed8 4c8d45bf        lea     r8,[rbp-41h]

    fffff800`00401edc 488d55d7        lea     rdx,[rbp-29h]

    fffff800`00401ee0 488d8bd8000000  lea     rcx,[rbx+0D8h] <<<< this has the first parameter Data

    fffff800`00401ee7 ff5618          call    qword ptr [rsi+18h]

    3: kd> .frame /r 09

    09 ffffd000`23f18720 fffff800`00405c13 FLTMGR!FltpPerformPreCallbacks+0x31a

    rax=ffffe0003c8dae20 rbx=ffffe0003c8d2c80 rcx=0000000000000018

    rdx=ffffe0003c84bc60 rsi=ffffe0000165b560 rdi=ffffe0003c884890

    rip=fffff80000401eea rsp=ffffd00023f18720 rbp=ffffd00023f187c9

    r8=0000000100000000  r9=0000000000000001 r10=0000000000000004

    r11=ffffd00023f18168 r12=ffffe0003c8d2d00 r13=ffffe0000165b010

    r14=ffffe0003c8d2df8 r15=0000000000000004

    iopl=0         nv up ei ng nz na pe nc

    cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010282

    FLTMGR!FltpPerformPreCallbacks+0x31a:

    fffff800`00401eea 8bf8            mov     edi,eax

    3: kd> dt ffffe0003c8d2c80+d8 _FLT_CALLBACK_DATA

    FLTMGR!_FLT_CALLBACK_DATA

       +0x000 Flags            : 2

       +0x008 Thread           : 0xffffe000`0375c880 _KTHREAD

       +0x010 Iopb             : 0xffffe000`3c8d2db0 _FLT_IO_PARAMETER_BLOCK

    3: kd> dt 0xffffe000`3c8d2db0 _FLT_IO_PARAMETER_BLOCK

    FLTMGR!_FLT_IO_PARAMETER_BLOCK

       +0x000 IrpFlags         : 0

       +0x004 MajorFunction    : 0x5 ''

       +0x005 MinorFunction    : 0 ''

       +0x006 OperationFlags   : 0 ''

       +0x007 Reserved         : 0 ''

       +0x008 TargetFileObject : 0xffffe000`3c884890 _FILE_OBJECT

       +0x010 TargetInstance   : 0xffffe000`0165b010 _FLT_INSTANCE

     

    What we are concerned about are parameters TargetFileObject and TargetInstance.

    • TargetFileObject - A file object pointer for the file or directory that is the target for this I/O operation.
    • TargetInstance - An opaque instance pointer for the minifilter that is the target for this I/O operation.

     

    Let’s dump the TargetFileObject first, it is a file open on HarddiskVolume4.

     

    3: kd> !fileobj 0xffffe000`3c884890

     

    \AV-REDIRECT-F3392236-BA8C-4AD9-8AEA-48F099A8EE9C\ffffe0003c8dae20.ini

     

    Device Object: 0xffffe000011009b0   \Driver\volmgr

    Vpb is NULL

    Event signalled

     

    Flags:  0x40002

          Synchronous IO

          Handle Created

     

    File Object is currently busy and has 0 waiters.

     

    FsContext: 0x00000000   FsContext2: 0x46527641

    CurrentByteOffset: 0

     

    3: kd> !devstack 0xffffe000011009b0  

      !DevObj           !DrvObj            !DevExt          ObjectName

      ffffe00001143040  \Driver\volsnap    ffffe00001143190 

    > ffffe000011009b0  \Driver\volmgr     ffffe00001100b00  HarddiskVolume4

    !DevNode ffffe00001148a50 :

      DeviceInst is "STORAGE\Volume\{19df1540-2f74-11e4-80bb-0050568127c8}#0000000000100000"

      ServiceName is "volsnap"

     

    The target File open is \AV-REDIRECT-F3392236-BA8C-4AD9-8AEA-48F099A8EE9C\ffffe0003c8dae20.ini on HarddiskVolume4. Next, what is the target filter instance?

     

    In short, a minifilter driver attaches to the file system stack indirectly, by registering with the filter manager for the I/O operations the minifilter driver chooses to filter. An instance represents an attachment of a filter on a volume. Here is an excellent article to elaborate how filter attach/altitudes works.

     

    Public symbols for fltmgr.sys does include the definitions of various key filter manager structures. We don’t bother to browse the structure manually. There is a public debug extension fltkd.dll and we can use it to view the instance in a very neat way. Here is how we print out the filter instance in question.

     

    3: kd> !fltkd.instance 0xffffe000`0165b010

     

    FLT_INSTANCE: ffffe0000165b010 "FLTDRV1" "388920" <<<< instance name and altitude #

       FLT_OBJECT: ffffe0000165b010  [01000000] Instance

          RundownRef               : 0x0000000000000002 (1)

          PointerCount             : 0x00000001

          PrimaryLink              : [ffffe0000165dab0-ffffe00002501108]

       OperationRundownRef      : ffffe000025035b0

    Could not read field "Number" of fltmgr!_EX_RUNDOWN_REF_CACHE_AWARE from address: ffffe000025035b0

       Flags                    : [00000000]

       Volume                   : ffffe00002501010 "\Device\HarddiskVolume4"  <<<< the underlying volume filter instance attaches to.

       Filter                   : ffffe000011f1010 "fltdrv1"  <<<< filter name

       TrackCompletionNodes     : ffffe0000165bda0

       ContextLock              : (ffffe0000165b080)

       Context                  : 0000000000000000

       CallbackNodes            : (ffffe0000165b0a0)

       VolumeLink               : [ffffe0000165dab0-ffffe00002501108]

       FilterLink               : [ffffe00001650070-ffffe0000164c340]

     

    In !fltkd.instance output, we can see the instance passed into 3rd party pre-operation call back is attached to the volume "\Device\HarddiskVolume4”.  Filter in the output describes a minifilter.  We can run !fltkd.filter to print out the details of filter.

     

    3: kd> !fltkd.filter ffffe000011f1010

     

    FLT_FILTER: ffffe000011f1010 "fltdrv1" "388920"

       FLT_OBJECT: ffffe000011f1010  [02000000] Filter

          RundownRef               : 0x0000000000000026 (19)

          PointerCount             : 0x00000002

          PrimaryLink              : [ffffe00000a9ac60-ffffe00000ab50c0]

       Frame                    : ffffe00000ab5010 "Frame 0"

       Flags                    : [00000002] FilteringInitiated

       DriverObject             : ffffe0000119f9c0

       FilterLink               : [ffffe00000a9ac60-ffffe00000ab50c0] *** ERROR: Module load completed but symbols could not be loaded for fltdrv1.sys

     

       PreVolumeMount           : fffff80000f1bd78  fltdrv1+0xbd78

       PostVolumeMount          : fffff80000f1c354  fltdrv1+0xc354

       FilterUnload             : 0000000000000000  (null)

       InstanceSetup            : fffff80000f1c528  fltdrv1+0xc528

       InstanceList             : (ffffe000011f1068) <<<< all filter instances this filter creates.

          FLT_INSTANCE: ffffe000011e6640 "FLTDRV1" "388920"

          …

          FLT_INSTANCE: ffffe0000173d010 "FLTDRV1" "388920"

          FLT_INSTANCE: ffffe00001793010 "FLTDRV1" "388920"

          FLT_INSTANCE: ffffe00001787010 "FLTDRV1" "388920"

     

    A side note about Filter Frame, for interoperability with legacy filter drivers the filter manager can attach filter device objects to a file system I/O stack in more than one location. Each of the filter manager's filter device objects is called a frame. From the perspective of a legacy filter driver, each filter manager frame is just another legacy filter driver. If there are no legacy filters present, there is only one frame, "Frame 0". If there are some legacy filters present on the system (attached on top of Frame 0), FltMgr might create more frames. We can view all frames Filter Manager manages by running !fltkd.frames

     

    In !fltkd.instance output,  it also displays the volume object filter instance is attached to. We can also dump out the Filter Volume and see all filter instances attached to it. All filter instances are displayed in order of altitude (number next to the filter instance name). Filter manager will process the I/O from filter with higher altitude to one with lower altitude. The filter I figured out just now is the top one and its filter instance altitude implies FLTDRV1 is a sort of FSFilter Activity Monitor.

     

    3: kd> !fltkd.volume ffffe00002501010

     

    FLT_VOLUME: ffffe00002501010 "\Device\HarddiskVolume4"

       FLT_OBJECT: ffffe00002501010  [04000000] Volume

          RundownRef               : 0x000000000000000a (5)

          PointerCount             : 0x00000001

          PrimaryLink              : [ffffe0000165a020-ffffe0000254f020]

       Frame                    : ffffe00000ab5010 "Frame 0"

       Flags                    : [00000064] SetupNotifyCalled EnableNameCaching FilterAttached

       FileSystemType           : [00000002] FLT_FSTYPE_NTFS

       VolumeLink               : [ffffe0000165a020-ffffe0000254f020]

       DeviceObject             : ffffe00002501df0

       DiskDeviceObject         : ffffe000011009b0

       FrameZeroVolume          : ffffe00002501010

       VolumeInNextFrame        : 0000000000000000

       Guid                     : "\??\Volume{19df169b-2f74-11e4-80bb-0050568127c8}"

       CDODeviceName            : "\Ntfs"

       CDODriverName            : "\FileSystem\Ntfs"

       TargetedOpenCount        : 0

       Callbacks                : (ffffe00002501120)

       ContextLock              : (ffffe00002501508)

       VolumeContexts           : (ffffe00002501510)  Count=0

       StreamListCtrls          : (ffffe00002501518)  rCount=4

       FileListCtrls            : (ffffe00002501598)  rCount=0

       NameCacheCtrl            : (ffffe00002501618)

       InstanceList             : (ffffe000025010a0) <<<< all filter instances attached to particular volume.

          FLT_INSTANCE: ffffe0000165b010 "FLTDRV1" "388920"

          FLT_INSTANCE: ffffe0000165daa0 "Fltdrv2 Instance" "366992"

          FLT_INSTANCE: ffffe00001661600 "V-locity cache Instance" "366789"

          FLT_INSTANCE: ffffe0000343c010 "DKRtWrt Instance" "137100"

    3: kd> !devstack ffffe00002501df0

      !DevObj           !DrvObj            !DevExt          ObjectName

    > ffffe00002501df0  \FileSystem\FltMgr ffffe00002501f40 

      ffffe00000244030  \FileSystem\Ntfs   ffffe00000244180  <<<< this volume is a NTFS volume.

     

    Let’s recap a bit before proceeding, we know the original query is all about a file open against \AV-REDIRECT-F3392236-BA8C-4AD9-8AEA-48F099A8EE9C\ffffe0003c8dae20.ini on HarddiskVolume4 which happens to be NTFS volume.

     

    It appears to be a normal query from I/O manager and nothing stands out yet.  Let’s carry on the investigation.

     

    On the call stack, 3rd party functions are black box, but the third party driver calls filter manager API FLTMGR!FltQueryInformationFile to retrieve some file information. So we pick up the analysis from that call.

     

    3: kd> knL

      *** Stack trace for last set context - .thread/.cxr resets it

    # Child-SP          RetAddr           Call Site

    00 ffffd000`23f18090 fffff800`00d00813 Ntfs!NtfsCommonQueryInformation+0xa7

    01 ffffd000`23f18170 fffff800`00d01156 Ntfs!NtfsFsdDispatchSwitch+0xd3

    02 ffffd000`23f181f0 fffff800`00402f3e Ntfs!NtfsFsdDispatchWait+0x47

    03 ffffd000`23f18440 fffff800`00406986 FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x25e

    04 ffffd000`23f184c0 fffff800`0042ef2a FLTMGR!FltPerformSynchronousIo+0x2b6

    05 ffffd000`23f18570 fffff800`01041843 FLTMGR!FltQueryInformationFile+0x52

    06 ffffd000`23f185b0 fffff800`010417bb av+0x3843

    07 ffffd000`23f18600 fffff800`00f1c06d av+0x37bb

    08 ffffd000`23f18630 fffff800`00401eea fltdrv1+0xc06d

    09 ffffd000`23f18720 fffff800`00405c13 FLTMGR!FltpPerformPreCallbacks+0x31a

     

    Here is how function FltQueryInformationFile is defined.

     

    NTSTATUS FltQueryInformationFile(

      _In_       PFLT_INSTANCE Instance,

      _In_       PFILE_OBJECT FileObject,

      _Out_      PVOID FileInformation,

      _In_       ULONG Length,

      _In_       FILE_INFORMATION_CLASS FileInformationClass,

      _Out_opt_  PULONG LengthReturned

    );

     

    First parameter represents an attachment to specific volume and second parameter FileObject is the target file to be queried. We need to figure out what instance and file object 3rd party driver were passed into FltQueryInformationFile. As there is no public symbol information for FltQueryInformationFile, we can decode the assembly code directly.

     

    3: kd> u FltQueryInformationFile

    FLTMGR!FltQueryInformationFile:

    fffff800`0042eed8 48895c2408      mov     qword ptr [rsp+8],rbx

    fffff800`0042eedd 4889742410      mov     qword ptr [rsp+10h],rsi

    fffff800`0042eee2 57              push    rdi

    fffff800`0042eee3 4883ec30        sub     rsp,30h

    fffff800`0042eee7 498bf0          mov     rsi,r8

    fffff800`0042eeea 4c8d442420      lea     r8,[rsp+20h]

    fffff800`0042eeef 418bf9          mov     edi,r9d

    fffff800`0042eef2 e8b970fdff      call    FLTMGR!FltAllocateCallbackData (fffff800`00405fb0)

     

    Because FltQueryInformationFile does not populate the rcx and rdx registers (used for the first to parameters), we can assume that it passes its first two parameters through to FltAllocateCallbackData. Let’s see what FltAllocateCallbackData does and what we may obtain from this hint.

     

    NTSTATUS FltAllocateCallbackData(

      _In_      PFLT_INSTANCE Instance,

      _In_opt_  PFILE_OBJECT FileObject,

      _Out_     PFLT_CALLBACK_DATA *RetNewCallbackData

    );

     

    It takes Instance and FileObject as the first two parameters and an instance of PFLT_CALLBACK_DATA as the third parameter. Keep in mind the FLT_CALLBACK_DATA structure represents an I/O operation. The Filter Manager and minifilters use this structure to initiate and process I/O operations, so it should contain the information pertaining to the query a 3rd party driver issues.

     

    Looking back the assembly code before calling FltAllocateCallbackData, 3rd output parameter RetNewCallbackData is obtained by this instruction:

     

    fffff800`0042eeea 4c8d442420      lea     r8,[rsp+20h]

     

    So let’s dump out the PFLT_CALLBACK_DATA from the stack pointer.

     

    3: kd> .frame /r 5

    05 ffffd000`23f18570 fffff800`01041843 FLTMGR!FltQueryInformationFile+0x52

    rax=ffffe0003c8dae20 rbx=ffffe0003c8d20e8 rcx=0000000000000018

    rdx=ffffe0003c84bc60 rsi=0000000009bcd870 rdi=0000000000000018

    rip=fffff8000042ef2a rsp=ffffd00023f18570 rbp=ffffd00023f18690

    r8=0000000100000000  r9=0000000000000001 r10=0000000000000004

    r11=ffffd00023f18168 r12=0000000000000000 r13=0000000000000000

    r14=0000000003000000 r15=0000000000000000

    iopl=0         nv up ei ng nz na pe nc

    cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010282

    FLTMGR!FltQueryInformationFile+0x52:

    fffff800`0042ef2a 488b4c2468      mov     rcx,qword ptr [rsp+68h] ss:0018:ffffd000`23f185d8=ffffd00023f18618

    3: kd> dt poi(ffffd00023f18570+20) _FLT_CALLBACK_DATA

    FLTMGR!_FLT_CALLBACK_DATA

       +0x000 Flags            : 0x10009

       +0x008 Thread           : 0xffffe000`0375c880 _KTHREAD

       +0x010 Iopb             : 0xffffe000`3c8d2140 _FLT_IO_PARAMETER_BLOCK

    3: kd> dt 0xffffe000`3c8d2140 _FLT_IO_PARAMETER_BLOCK

    FLTMGR!_FLT_IO_PARAMETER_BLOCK

       +0x000 IrpFlags         : 0

       +0x004 MajorFunction    : 0x5 ''

       +0x005 MinorFunction    : 0 ''

       +0x006 OperationFlags   : 0 ''

       +0x007 Reserved         : 0 ''

       +0x008 TargetFileObject : 0xffffe000`3c8dae20 _FILE_OBJECT

       +0x010 TargetInstance   : 0xffffe000`0165daa0 _FLT_INSTANCE

     

    Now we’ve got the target instance and file object the 3rd party driver hopes is querying. Let’s look at the target file object first.

     

    3: kd> !fileobj 0xffffe000`3c8dae20

     

    \$RECYCLE.BIN\S-1-5-21-589703330-573742762-725482543-110873\desktop.ini

     

    Device Object: 0xffffe000010f1ca0   \Driver\volmgr

    Vpb: 0xffffe000010cc9a0

    Access: Read SharedRead SharedWrite SharedDelete

     

    Flags:  0x62

          Synchronous IO

          Sequential Only

          Cache Supported

     

    FsContext: 0xffffc00003dba120 FsContext2: 0xffffc00003dba2c0

    CurrentByteOffset: 0

    Cache Data:

      Section Object Pointers: ffffe0003c887380

      Shared Cache Map: 00000000

    3: kd> !devstack 0xffffe000010f1ca0  

      !DevObj           !DrvObj            !DevExt          ObjectName

      ffffe00001142040  \Driver\volsnap    ffffe00001142190 

    > ffffe000010f1ca0  \Driver\volmgr     ffffe000010f1df0  HarddiskVolume5 <<<< volume changes.

    !DevNode ffffe00001148770 :

      DeviceInst is "STORAGE\Volume\{19df16e3-2f74-11e4-80bb-0050568127c8}#0000000000100000"

      ServiceName is "volsnap"

     

    The biggest difference between the original query and current query is not only the file name but also the target volume, now it is against the file \$RECYCLE.BIN\S-1-5-21-589703330-573742762-725482543-110873\desktop.ini on HarddiskVolume5.

     

    Another note, is about the FsContext 0xffffc00003dba120 in the file object, that is the address I ran !pool with and it is from ReFS. It seems we’ve nailed down where the ReFS things come from.

     

    Let’s continue to dump all of the volumes managed by Filter Manager. Apparently hardiskvolume5 is an ReFS volume instead of NTFS.

     

    3: kd> !fltkd.volumes

     

    Volume List: ffffe00000ab5140 "Frame 0"

       FLT_VOLUME: ffffe000010e5010 "\Device\Mup"

          FLT_INSTANCE: ffffe000011e6640 "FLTDRV1" "388920"

    ..

       FLT_VOLUME: ffffe00002501010 "\Device\HarddiskVolume4"

          FLT_INSTANCE: ffffe0000165b010 "FLTDRV1" "388920"

          FLT_INSTANCE: ffffe0000165daa0 "Fltdrv2 Instance" "366992"

          FLT_INSTANCE: ffffe00001661600 "V-locity cache Instance" "366789"

          FLT_INSTANCE: ffffe0000343c010 "DKRtWrt Instance" "137100"

       FLT_VOLUME: ffffe0000169e010 "\Device\HarddiskVolume5"

          FLT_INSTANCE: ffffe00001694190 "FLTDRV1" "388920"

    3: kd> !fltkd.volume ffffe0000169e010

     

    FLT_VOLUME: ffffe0000169e010 "\Device\HarddiskVolume5"

       FLT_OBJECT: ffffe0000169e010  [04000000] Volume

          RundownRef               : 0x0000000000000004 (2)

          PointerCount             : 0x00000001

          PrimaryLink              : [ffffe00001693020-ffffe000016aa020]

       Frame                    : ffffe00000ab5010 "Frame 0"

       Flags                    : [00000064] SetupNotifyCalled EnableNameCaching FilterAttached

       FileSystemType           : [0000001c]

       VolumeLink               : [ffffe00001693020-ffffe000016aa020]

       DeviceObject             : ffffe000016a1df0

       DiskDeviceObject         : ffffe000010f1ca0

       FrameZeroVolume          : ffffe0000169e010

       VolumeInNextFrame        : 0000000000000000

       Guid                     : "\??\Volume{19df173a-2f74-11e4-80bb-0050568127c8}"

       CDODeviceName            : "\Refs"

       CDODriverName            : "\FileSystem\ReFS"

       TargetedOpenCount        : 0

       Callbacks                : (ffffe0000169e120)

       ContextLock              : (ffffe0000169e508)

       VolumeContexts           : (ffffe0000169e510)  Count=0

       StreamListCtrls          : (ffffe0000169e518)  rCount=2

       FileListCtrls            : (ffffe0000169e598)  rCount=0

       NameCacheCtrl            : (ffffe0000169e618)

       InstanceList             : (ffffe0000169e0a0)

          FLT_INSTANCE: ffffe00001694190 "FLTDRV1" "388920"

    3: kd> !devstack ffffe000016a1df0

      !DevObj           !DrvObj            !DevExt          ObjectName

    > ffffe000016a1df0  \FileSystem\FltMgr ffffe000016a1f40 

      ffffe00002259030  \FileSystem\ReFS   ffffe00002259180 

     

    So the target file object that 3rd party driver would like to query is on a ReFS volume. Let’s dump out the target instance now.

     

    3: kd> !fltkd.instance 0xffffe000`0165daa0

     

    FLT_INSTANCE: ffffe0000165daa0 "Fltdrv2 Instance" "366992" <<<< instance changes.

       FLT_OBJECT: ffffe0000165daa0  [01000000] Instance

          RundownRef               : 0x0000000000000000 (0)

          PointerCount             : 0x00000002

          PrimaryLink              : [ffffe00001661610-ffffe0000165b020]

       OperationRundownRef      : ffffe0000165f140

    Could not read field "Number" of fltmgr!_EX_RUNDOWN_REF_CACHE_AWARE from address: ffffe0000165f140

       Flags                    : [00000000]

       Volume                   : ffffe00002501010 "\Device\HarddiskVolume4" <<<< volume remains the same

       Filter                   : ffffe00000a9ac50 "DKTLFSMF" <<<< another filter.

       TrackCompletionNodes     : ffffe0000165ceb0

       ContextLock              : (ffffe0000165db10)

       Context                  : ffffe00000d096d0

       CallbackNodes            : (ffffe0000165db30)

       VolumeLink               : [ffffe00001661610-ffffe0000165b020]

       FilterLink               : [ffffe00000a9ad10-ffffe00002553cc0]

     

    We can notice that instance changes from original FLT_INSTANCE: ffffe0000165b010 "FLTDRV1" "388920" to FLT_INSTANCE:  0xffffe000`0165daa0 "Fltdrv2 Instance" "366992".  However, the instance is attached to HarddiskVolume4 which is bound to NTFS. After initializing the parameters of the callback data structure returned by FltAllocateCallbackData, FltQueryInformationFile initiates the I/O operation by passing the structure to FltPerformSynchronousIo or FltPerformAsynchronousIo. These routines send the I/O operation only to the minifilter driver instances attached below the initiating instance (specified in the Instance parameter) and to the file system i.e. NTFS.

     

    So the whole story is, the original query is all about a file open against \AV-REDIRECT-F3392236-BA8C-4AD9-8AEA-48F099A8EE9C\ffffe0003c8dae20.ini on HarddiskVolume4 which happens to be NTFS volume. The query is passed to 3rd party preoperation call back routine. The 3rd party driver continues the query by calling FltQueryInformationFile for a different file open that is on a ReFS volume but the filter instance attached to original NTFS volume.

     

    It is not hard to image the potential corruption if NTFS is instructed to operate on something pertaining to ReFS system. Bugcheck is one of the payoffs.

     

    Actually the 3rd party drivers’ behavior breaks the minifilter callback rule. Here is except from MSDN:

     

    A minifilter driver can modify the parameters for an I/O operation. For example, a minifilter driver's preoperation callback routine can redirect an I/O operation to a different volume by changing the target instance for the operation. The new target instance must be an instance of the same minifilter driver at the same altitude on another volume.

     

    Conclusion:

     

    The fltdrv1.sys driver is violating the minifilter callback rule by redirecting I/O to an instance of a different minifilter driver.

     

    In this post, we go through the dump analysis to look for the useful information about filter manager and minifilter, and solve a case with public debugging extension fltkd.dll and public filter manger symbol. Hope you enjoyed it!

  • Ntdebugging Blog

    Debugging Without a Debugger

    • 3 Comments

    Hi, my name is Todd Webb, I am an Escalation Engineer from the Global Escalation Services OEM Team.

     

    I recently worked a case where we were trying to troubleshoot an issue and the system would hang with USB debugging enabled. This was a problem since USB debugging was the only kernel debugging option available on this particular hardware setup. I needed a quick and easy way to find the source of the problem without using many of the standard methods.  Basically I had to debug a debugger issue with no access to a debugger.  I knew the problem was likely USB related, but had no other information.  The method I used to troubleshoot this issue was to write some macros that would make a call to HalMakeBeep to generate audible indications of where we were in the code.  The primary reason for using this method was I could quickly and easily move the location in the code where I made the calls as I narrowed down where the problem occurred. 

     

    The alternative would have been to add a bunch of logging and find some way to force a dump.  My initial assessment was this would be more time consuming and less flexible then the beep code method.  Others may have other methods to do this, but this just gives people another method of debugging driver start-up problems on systems that are not debuggable.

     

    The main beep macro allows you to pass in a frequency and duration in milliseconds for a single beep.  Then based on the example below you can extend the single beep macro to create a series of beeps.

     

    Here is an example of the macros I used:

     

    //

    // beep macros for troubleshooting without debugger

    //

     

    #define EHCI_BEEP1(f, d)                                              \

        {                                                                 \

            LARGE_INTEGER duration;                                       \

            duration.QuadPart = Int32x32To64(d, -10000);                  \

            HalMakeBeep(f);                                               \

            KeDelayExecutionThread(KernelMode, FALSE, &duration);         \

            HalMakeBeep(0);                                               \

            duration.QuadPart = Int32x32To64(500, -10000);                \

            KeDelayExecutionThread(KernelMode, FALSE, &duration);         \

        }

     

    #define EHCI_BEEP2(f, d)                                              \

        {                                                                 \

            EHCI_BEEP1(f, d);                                             \

            EHCI_BEEP1(f, d);                                             \

        }

         

    #define EHCI_BEEP3(f, d)                                              \

        {                                                                 \

            EHCI_BEEP2(f, d);                                             \

            EHCI_BEEP1(f, d);                                             \

        }

     

    Here is an example of a long beep followed by three short beeps:

     

        //

        // debug - 1 long : 3 short

        //

        EHCI_BEEP1(600, 1000);

        EHCI_BEEP3(600, 500);

     

    Share this post :
Page 21 of 24 (240 items) «1920212223»