• Ntdebugging Blog

    Missing System Writer Case Explained

    • 4 Comments

    I worked on a case the other day where all I had was a procmon log and event logs to troubleshoot a problem where the System Writer did not appear in the VSSADMIN LIST WRITERS output. This might be review for the folks that know this component pretty well but I figured I would share how I did it for those that are not so familiar with the System Writer.

     

    WHAT WE KNOW:

    1. System State Backups fail
    2. Running a VSS List Writers does not list the system writer

     

    Looking at the event logs I found the error shown below. This error indicates there was a failure while “Writer Exposing its Metadata Context”. Each writer is responsible for providing a list of files, volumes, and other resources it is designed to protect. This list is called metadata and is formatted as XML. In the example we are working with the error is “Unexpected error calling routine XML document is too long”.  While helpful, this message alone does not provide us with a clear reason why the XML document is too long.

     

    Event Type: Error

    Event Source: VSS

    Event ID: 8193

    Description: Volume Shadow Copy Service error: Unexpected error calling routine XML document is too long. hr = 0x80070018, The program issued a command but the command length is incorrect. . Operation: Writer Exposing its Metadata Context: Execution Context: Requestor Writer Instance ID: {636923A0-89C2-4823-ADEF-023A739B2515} Writer Class Id: {E8132975-6F93-4464-A53E-1050253AE220} Writer Name: System Writer

     

    The second event that was logged was also not very helpful as it only indicates that the writer did have a failure. It looks like we are going to need to collect more data to figure this out.

     

    Event Type: Error

    Event Source: VSS

    Event ID: 8193

    Description: Volume Shadow Copy Service error: Unexpected error calling routine CreateVssExamineWriterMetadata. hr = 0x80042302, A Volume Shadow Copy Service component encountered an unexpected error. Check the Application event log for more information. . Operation: Writer Exposing its Metadata Context: Execution Context: Requestor Writer Instance ID: {636923A0-89C2-4823-ADEF-023A739B2515} Writer Class Id: {E8132975-6F93-4464-A53E-1050253AE220} Writer Name: System Writer

     

    From the error above we learned that there was an issue with the metadata file for the System Writer. These errors are among some of the most common issues seen with this writer. There are some not so well documented limitations within the writer due to some hard set limits on path depth and the number of files in a given path. These limitations are frequently exposed by the C:\Windows\Microsoft.Net\ path. Often, this path is used by development software like Visual Studio as well as server applications like IIS. Below I have listed a few known issues that should help provide some scope when troubleshooting System Writer issues.

     

    Known limitations and common points of failure:

    • More than 1,000 folders in a folder causes writer to fail during OnIdentify
    • More than 10,000 files in a folder causes writer to fail during OnIdentify (frequently C:\Windows\Microsoft.Net)
    • Permissions issues (frequently in C:\Windows\WinSXS and C:\Windows\Microsoft.Net)
    • Permissions issues with COM+ Event System Service
      • This service needs to be running and needs to have Network Service with Service User Rights

     

    What data can I capture to help me find where the issue is?

     

    The best place to start is with a Process Monitor (Procmon) capture. To prepare for this capture you will need to download Process Monitor, open the Services MMC snap-in, as well as open an administrative command prompt which will be used in a later step of the process.

     

    You should first stop the Cryptographic Services service using the Services MMC.

     

     

    Once stopped you will want to open Procmon, note that by default Procmon will start capturing when opened. Now that you have Procmon open and capturing data you will start the cryptographic service. This will allow you to capture any errors during service initialization. Once the service is started you will use the command prompt opened earlier to run “vssadmin.exe list writers”. This will signal the writers on the system to capture their metadata, which is a common place we see failures with the System Writer. When the vssadmin command completes, stop the Procmon capture and save this data to disk.

     

    Now that we have data how do we find the errors?

     

    Open your newly created Procmon file. First, add a new filter for the path field that contains “VSS\Diag”.

     

     

    We do this because this is the registry path that all writers will log to when entering and leaving major functions. Now that we have our filtered view we need to look for an entry from the System Writer. You can see the highlighted line below shows the “IDENTIFY” entry for the System Writer. From here we can ascertain the PID of the svchost.exe that the system writer is running in. We now want to include only this svchost. To accomplish this you can right click on the PID for the svchost.exe and select “Include ‘(PID)’”. 

     

     

    Now that we have found our System Writers svchost we will want to remove the filter for “VSS\Diag”; to do that you can return to the filter tool in Procmon and uncheck the box next to the entry.

     

     

    We now have a complete view of what this service was doing at the time it started and completed the OnIdentify. Our next step is to locate the IDENTIFY (Leave) entry as this is often a great marker for where your next clue will be. While in most cases we can’t directly see the error the writer hit we can make some educated connections based on the common issues we spoke about above. If we take a look at the events that took place just before the IDENTIFY (Leave) we can see that we were working in the C:\Windows\Microsoft.NET\assembly\ directory. This is one of the paths that the System Writer is responsible for protecting. As mentioned above, there are some known limitations to the depth of paths and number of files in the “C:\Windows\Microsoft.NET” folder. This is a great example of that limitation as seen in our procmon capture. The example below shows the IDENTIFY (Leave) with the line before that being where the last work was taking place. Meaning this is what the writer was touching when it failed.

     

     

    What does this tell us and what should we do next?

     

    Given the known path limitations, we need to check out the number of files and folders in the C:\Windows\Microsoft.Net\ path and see where the bloat is. Some of these files can be safely removed, however only files located in the Temp locations (Temporary ASP.NET Files) are safe to delete.

     

    Recently we released KB2807849 which addresses the issue shown above.

     

    There are other possible causes of the event log errors mentioned above, such as issues with file permissions. For those problems follow the same steps as above and you are likely to see the IDENTIFY (Leave) just after file access error is displayed in your procmon log. For these failures you will need to investigate the permissions on the file we failed on. Likely the file is missing permissions for the writer’s service account Network Service or Local System. All that is needed here is to add back the missing permissions for your failed file.

     

    While these issues can halt your nightly backups, it is often fairly easy to find the root cause. It just takes time and a little bit of experience with Process Monitor. 

     

    Good luck and successful backups!

  • Ntdebugging Blog

    Understanding Pool Corruption Part 2 – Special Pool for Buffer Overruns

    • 1 Comments

    In our previous article we discussed pool corruption that occurs when a driver writes too much data in a buffer.  In this article we will discuss how special pool can help identify the driver that writes too much data.

     

    Pool is typically organized to allow multiple drivers to store data in the same page of memory, as shown in Figure 1.  By allowing multiple drivers to share the same page, pool provides for an efficient use of the available kernel memory space.  However this sharing requires that each driver be careful in how it uses pool, any bugs where the driver uses pool improperly may corrupt the pool of other drivers and cause a crash.

     

    Figure 1 – Uncorrupted Pool

     

    With pool organized as shown in Figure 1, if DriverA allocates 100 bytes but writes 120 bytes it will overwrite the pool header and data stored by DriverB.  In Part 1 we demonstrated this type of buffer overflow using NotMyFault, but we were not able to identify which code had corrupted the pool.

     

    Figure 2 – Corrupted Pool

     

    To catch the driver that corrupted pool we can use special pool.  Special pool changes the organization of the pool so that each driver’s allocation is in a separate page of memory.  This helps prevent drivers from accidentally writing to another driver’s memory.  Special pool also configures the driver’s allocation at the end of the page and sets the next virtual page as a guard page by marking it as invalid.  The guard page causes an attempt to write past the end of the allocation to result in an immediate bugcheck.

     

    Special pool also fills the unused portion of the page with a repeating pattern, referred to as “slop bytes”.  These slop bytes will be checked when the page is freed, if any errors are found in the pattern a bugcheck will be generated to indicate that the memory was corrupted.  This type of corruption is not a buffer overflow, it may be an underflow or some other form of corruption.

     

    Figure 3 – Special Pool

     

    Because special pool stores each pool allocation in its own 4KB page, it causes an increase in memory usage.  When special pool is enabled the memory manager will configure a limit of how much special pool may be allocated on the system, when this limit is reached the normal pools will be used instead.  This limitation may be especially pronounced on 32-bit systems which have less kernel space than 64-bit systems.

     

    Now that we have explained how special pool works, we should use it.

     

    There are two methods to enable special pool.  Driver verifier allows special pool to be enabled on specific drivers.  The PoolTag registry value described in KB188831 allows special pool to be enabled for a particular pool tag.  Starting in Windows Vista and Windows Server 2008, driver verifier captures additional information for special pool allocations so this is typically the recommended method.

     

    To enable special pool using driver verifier use the following command line, or choose the option from the verifier GUI.  Use the /driver flag to specify drivers you want to verify, this is the place to list drivers you suspect as the cause of the problem.  You may want to verify drivers you have written and want to test or drivers you have recently updated on the system.  In the command line below I am only verifying myfault.sys.  A reboot is required to enable special pool.

     

    verifier /flags 1 /driver myfault.sys

     

    After enabling verifier and rebooting the system, repeat the activity that causes the crash.  For some problems the activity may just be to wait for a period of time.  For our demonstration we are running NotMyFault (see Part 1 for details).

     

    The crash resulting from a buffer overflow in special pool will be a stop 0xD6, DRIVER_PAGE_FAULT_BEYOND_END_OF_ALLOCATION.

     

    kd> !analyze -v

    *******************************************************************************

    *                                                                             *

    *                        Bugcheck Analysis                                    *

    *                                                                             *

    *******************************************************************************

     

    DRIVER_PAGE_FAULT_BEYOND_END_OF_ALLOCATION (d6)

    N bytes of memory was allocated and more than N bytes are being referenced.

    This cannot be protected by try-except.

    When possible, the guilty driver's name (Unicode string) is printed on

    the bugcheck screen and saved in KiBugCheckDriver.

    Arguments:

    Arg1: fffff9800b5ff000, memory referenced

    Arg2: 0000000000000001, value 0 = read operation, 1 = write operation

    Arg3: fffff88004f834eb, if non-zero, the address which referenced memory.

    Arg4: 0000000000000000, (reserved)

     

    We can debug this crash and determine that notmyfault.sys wrote beyond its pool buffer.

     

    The call stack shows that myfault.sys accessed invalid memory and this generated a page fault.

     

    kd> k

    Child-SP          RetAddr           Call Site

    fffff880`04822658 fffff803`721333f1 nt!KeBugCheckEx

    fffff880`04822660 fffff803`720acacb nt! ?? ::FNODOBFM::`string'+0x33c2b

    fffff880`04822700 fffff803`7206feee nt!MmAccessFault+0x55b

    fffff880`04822840 fffff880`04f834eb nt!KiPageFault+0x16e

    fffff880`048229d0 fffff880`04f83727 myfault+0x14eb

    fffff880`04822b20 fffff803`72658a4a myfault+0x1727

    fffff880`04822b80 fffff803`724476c7 nt!IovCallDriver+0xba

    fffff880`04822bd0 fffff803`7245c8a6 nt!IopXxxControlFile+0x7e5

    fffff880`04822d60 fffff803`72071453 nt!NtDeviceIoControlFile+0x56

    fffff880`04822dd0 000007fc`4fe22c5a nt!KiSystemServiceCopyEnd+0x13

    00000000`004debb8 00000000`00000000 0x000007fc`4fe22c5a

     

    The !pool command shows that the address being referenced by myfault.sys is special pool.

     

    kd> !pool fffff9800b5ff000

    Pool page fffff9800b5ff000 region is Special pool

    fffff9800b5ff000: Unable to get contents of special pool block

     

    The page table entry shows that the address is not valid.  This is the guard page used by special pool to catch overruns.

     

    kd> !pte fffff9800b5ff000

                                               VA fffff9800b5ff000

    PXE at FFFFF6FB7DBEDF98    PPE at FFFFF6FB7DBF3000    PDE at FFFFF6FB7E6002D0    PTE at FFFFF6FCC005AFF8

    contains 0000000001B8F863  contains 000000000138E863  contains 000000001A6A1863  contains 0000000000000000

    pfn 1b8f      ---DA--KWEV  pfn 138e      ---DA--KWEV  pfn 1a6a1     ---DA--KWEV  not valid

     

    The allocation prior to this memory is an 800 byte block of non paged pool tagged as “Wrap”.  “Wrap” is the tag used by verifier when pool is allocated without a tag, it is the equivalent to the “None” tag we saw in Part 1.

     

    kd> !pool fffff9800b5ff000-1000

    Pool page fffff9800b5fe000 region is Special pool

    *fffff9800b5fe000 size:  800 data: fffff9800b5fe800 (NonPaged) *Wrap

                Owning component : Unknown (update pooltag.txt)

     

    Special pool is an effective mechanism to track down buffer overflow pool corruption.  It can also be used to catch other types of pool corruption which we will discuss in future articles.

  • Ntdebugging Blog

    Understanding Pool Corruption Part 1 – Buffer Overflows

    • 6 Comments

    Before we can discuss pool corruption we must understand what pool is.  Pool is kernel mode memory used as a storage space for drivers.  Pool is organized in a similar way to how you might use a notepad when taking notes from a lecture or a book.  Some notes may be 1 line, others may be many lines.  Many different notes are on the same page.

     

    Memory is also organized into pages, typically a page of memory is 4KB.  The Windows memory manager breaks up this 4KB page into smaller blocks.  One block may be as small as 8 bytes or possibly much larger.  Each of these blocks exists side by side with other blocks.

     

    The !pool command can be used to see the pool blocks stored in a page.

     

    kd> !pool fffffa8003f42000

    Pool page fffffa8003f42000 region is Nonpaged pool

    *fffffa8003f42000 size:  410 previous size:    0  (Free)      *Irp

                Pooltag Irp  : Io, IRP packets

    fffffa8003f42410 size:   40 previous size:  410  (Allocated)  MmSe

    fffffa8003f42450 size:  150 previous size:   40  (Allocated)  File

    fffffa8003f425a0 size:   80 previous size:  150  (Allocated)  Even

    fffffa8003f42620 size:   c0 previous size:   80  (Allocated)  EtwR

    fffffa8003f426e0 size:   d0 previous size:   c0  (Allocated)  CcBc

    fffffa8003f427b0 size:   d0 previous size:   d0  (Allocated)  CcBc

    fffffa8003f42880 size:   20 previous size:   d0  (Free)       Free

    fffffa8003f428a0 size:   d0 previous size:   20  (Allocated)  Wait

    fffffa8003f42970 size:   80 previous size:   d0  (Allocated)  CM44

    fffffa8003f429f0 size:   80 previous size:   80  (Allocated)  Even

    fffffa8003f42a70 size:   80 previous size:   80  (Allocated)  Even

    fffffa8003f42af0 size:   d0 previous size:   80  (Allocated)  Wait

    fffffa8003f42bc0 size:   80 previous size:   d0  (Allocated)  CM44

    fffffa8003f42c40 size:   d0 previous size:   80  (Allocated)  Wait

    fffffa8003f42d10 size:  230 previous size:   d0  (Allocated)  ALPC

    fffffa8003f42f40 size:   c0 previous size:  230  (Allocated)  EtwR

     

    Because many pool allocations are stored in the same page, it is critical that every driver only use the space they have allocated.  If DriverA uses more space than it allocated they will write into the next driver’s space (DriverB) and corrupt DriverB’s data.  This overwrite into the next driver’s space is called a buffer overflow.  Later either the memory manager or DriverB will attempt to use this corrupted memory and will encounter unexpected information.  This unexpected information typically results in a blue screen.

     

    The NotMyFault application from Sysinternals has an option to force a buffer overflow.  This can be used to demonstrate pool corruption.  Choosing the “Buffer overflow” option and clicking “Crash” will cause a buffer overflow in pool.  The system may not immediately blue screen after clicking the Crash button.  The system will remain stable until something attempts to use the corrupted memory.  Using the system will often eventually result in a blue screen.

     

    NotMyFault

     

    Often pool corruption appears as a stop 0x19 BAD_POOL_HEADER or stop 0xC2 BAD_POOL_CALLER.  These stop codes make it easy to determine that pool corruption is involved in the crash.  However, the results of accessing unexpected memory can vary widely, as a result pool corruption can result in many different types of bugchecks.

     

    As with any blue screen dump analysis the best place to start is with !analyze -v.  This command will display the stop code and parameters, and do some basic interpretation of the crash.

     

    kd> !analyze -v

    *******************************************************************************

    *                                                                             *

    *                        Bugcheck Analysis                                    *

    *                                                                             *

    *******************************************************************************

     

    SYSTEM_SERVICE_EXCEPTION (3b)

    An exception happened while executing a system service routine.

    Arguments:

    Arg1: 00000000c0000005, Exception code that caused the bugcheck

    Arg2: fffff8009267244a, Address of the instruction which caused the bugcheck

    Arg3: fffff88004763560, Address of the context record for the exception that caused the bugcheck

    Arg4: 0000000000000000, zero.

     

    In my example the bugcheck was a stop 0x3B SYSTEM_SERVICE_EXCEPTION.  The first parameter of this stop code is c0000005, which is a status code for an access violation.  An access violation is an attempt to access invalid memory (this error is not related to permissions).  Status codes can be looked up in the WDK header ntstatus.h.

     

    The !analyze -v command also provides a helpful shortcut to get into the context of the failure.

     

    CONTEXT:  fffff88004763560 -- (.cxr 0xfffff88004763560;r)

     

    Running this command shows us the registers at the time of the crash.

     

    kd> .cxr 0xfffff88004763560

    rax=4f4f4f4f4f4f4f4f rbx=fffff80092690460 rcx=fffff800926fbc60

    rdx=0000000000000000 rsi=0000000000001000 rdi=0000000000000000

    rip=fffff8009267244a rsp=fffff88004763f60 rbp=fffff8009268fb40

    r8=fffffa8001a1b820  r9=0000000000000001 r10=fffff800926fbc60

    r11=0000000000000011 r12=0000000000000000 r13=fffff8009268fb48

    r14=0000000000000012 r15=000000006374504d

    iopl=0         nv up ei pl nz na po nc

    cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206

    nt!ExAllocatePoolWithTag+0x442:

    fffff800`9267244a 4c8b4808        mov     r9,qword ptr [rax+8] ds:002b:4f4f4f4f`4f4f4f57=????????????????

     

    From the above output we can see that the crash occurred in ExAllocatePoolWithTag, which is a good indication that the crash is due to pool corruption.  Often an engineer looking at a dump will stop at this point and conclude that a crash was caused by corruption, however we can go further.

     

    The instruction that we failed on was dereferencing rax+8.  The rax register contains 4f4f4f4f4f4f4f4f, which does not fit with the canonical form required for pointers on x64 systems.  This tells us that the system crashed because the data in rax is expected to be a pointer but it is not one.

     

    To determine why rax does not contain the expected data we must examine the instructions prior to where the failure occurred.

     

    kd> ub .

    nt!KzAcquireQueuedSpinLock [inlined in nt!ExAllocatePoolWithTag+0x421]:

    fffff800`92672429 488d542440      lea     rdx,[rsp+40h]

    fffff800`9267242e 49875500        xchg    rdx,qword ptr [r13]

    fffff800`92672432 4885d2          test    rdx,rdx

    fffff800`92672435 0f85c3030000    jne     nt!ExAllocatePoolWithTag+0x7ec (fffff800`926727fe)

    fffff800`9267243b 48391b          cmp     qword ptr [rbx],rbx

    fffff800`9267243e 0f8464060000    je      nt!ExAllocatePoolWithTag+0xa94 (fffff800`92672aa8)

    fffff800`92672444 4c8b03          mov     r8,qword ptr [rbx]

    fffff800`92672447 498b00          mov     rax,qword ptr [r8]

     

    The assembly shows that rax originated from the data pointed to by r8.  The .cxr command we ran earlier shows that r8 is fffffa8001a1b820.  If we examine the data at fffffa8001a1b820 we see that it matches the contents of rax, which confirms this memory is the source of the unexpected data in rax.

     

    kd> dq fffffa8001a1b820 l1

    fffffa80`01a1b820  4f4f4f4f`4f4f4f4f

     

    To determine if this unexpected data is caused by pool corruption we can use the !pool command.

     

    kd> !pool fffffa8001a1b820

    Pool page fffffa8001a1b820 region is Nonpaged pool

     fffffa8001a1b000 size:  810 previous size:    0  (Allocated)  None

     

    fffffa8001a1b810 doesn't look like a valid small pool allocation, checking to see

    if the entire page is actually part of a large page allocation...

     

    fffffa8001a1b810 is not a valid large pool allocation, checking large session pool...

    fffffa8001a1b810 is freed (or corrupt) pool

    Bad previous allocation size @fffffa8001a1b810, last size was 81

     

    ***

    *** An error (or corruption) in the pool was detected;

    *** Attempting to diagnose the problem.

    ***

    *** Use !poolval fffffa8001a1b000 for more details.

     

     

    Pool page [ fffffa8001a1b000 ] is __inVALID.

     

    Analyzing linked list...

    [ fffffa8001a1b000 --> fffffa8001a1b010 (size = 0x10 bytes)]: Corrupt region

     

     

    Scanning for single bit errors...

     

    None found

     

    The above output does not look like the !pool command we used earlier.   This output shows corruption to the pool header which prevented the command from walking the chain of allocations.

     

    The above output shows that there is an allocation at fffffa8001a1b000 of size 810.  If we look at this memory we should see a pool header.  Instead what we see is a pattern of 4f4f4f4f`4f4f4f4f.

     

    kd> dq fffffa8001a1b000 + 810

    fffffa80`01a1b810  4f4f4f4f`4f4f4f4f 4f4f4f4f`4f4f4f4f

    fffffa80`01a1b820  4f4f4f4f`4f4f4f4f 4f4f4f4f`4f4f4f4f

    fffffa80`01a1b830  4f4f4f4f`4f4f4f4f 00574f4c`46524556

    fffffa80`01a1b840  00000000`00000000 00000000`00000000

    fffffa80`01a1b850  00000000`00000000 00000000`00000000

    fffffa80`01a1b860  00000000`00000000 00000000`00000000

    fffffa80`01a1b870  00000000`00000000 00000000`00000000

    fffffa80`01a1b880  00000000`00000000 00000000`00000000

     

    At this point we can be confident that the system crashed because of pool corruption.

     

    Because the corruption occurred in the past, and a dump is a snapshot of the current state of the system, there is no concrete evidence to indicate how the memory came to be corrupted.  It is possible the driver that allocated the pool block immediately preceding the corruption is the one that wrote to the wrong location and caused this corruption.  This pool block is marked with the tag “None”, we can search for this tag in memory to determine which drivers use it.

     

    kd> !for_each_module s -a @#Base @#End "None"

    fffff800`92411bc2  4e 6f 6e 65 e9 45 04 26-00 90 90 90 90 90 90 90  None.E.&........

    kd> u fffff800`92411bc2-1

    nt!ExAllocatePool+0x1:

    fffff800`92411bc1 b84e6f6e65      mov     eax,656E6F4Eh

    fffff800`92411bc6 e945042600      jmp     nt!ExAllocatePoolWithTag (fffff800`92672010)

    fffff800`92411bcb 90              nop

     

    The file Pooltag.txt lists the pool tags used for pool allocations by kernel-mode components and drivers supplied with Windows, the associated file or component (if known), and the name of the component. Pooltag.txt is installed with Debugging Tools for Windows (in the triage folder) and with the Windows WDK (in \tools\other\platform\poolmon).  Pooltag.txt shows the following for this tag:

     

    None - <unknown>    - call to ExAllocatePool

     

    Unfortunately what we find is that this tag is used when a driver calls ExAllocatePool, which does not specify a tag.  This does not allow us to determine what driver allocated the block prior to the corruption.  Even if we could tie the tag back to a driver it may not be sufficient to conclude that the driver using this tag is the one that corrupted the memory.

     

    The next step should be to enable special pool and hope to catch the corruptor in the act.  We will discuss special pool in our next article.

  • Ntdebugging Blog

    Another Who Done It

    • 0 Comments

    Hi my name is Bob Golding, I am an EE in GES. I want to share an interesting problem I recently worked on.  The initial symptom was the system bugchecked with a Stop 0xA which means there was an invalid memory reference.  The cause of the crash was a driver making I/O requests while Asynchronous Procedure Calls (APCs) were disabled.  The bugcheck caused by an invalid memory reference was the result of the problem and not the cause.

     

    An APC is queued to a thread during I/O completion. This is to guarantee the last phase of the I/O completion occurs in the same context as the process that issued the request.

     

    The stack of the trap is presented below.  The call stack shows that APCs are being enabled allowing queued APCs to run.

     

    Child-SP          RetAddr           Call Site

    fffff880`07bf3598 fffff800`030b85a9 nt!KeBugCheckEx

    fffff880`07bf35a0 fffff800`030b7220 nt!KiBugCheckDispatch+0x69

    fffff880`07bf36e0 fffff800`030d8b56 nt!KiPageFault+0x260

    fffff880`07bf3870 fffff800`030959ff nt!IopCompleteRequest+0xc73

    fffff880`07bf3940 fffff800`0306c0d9 nt!KiDeliverApc+0x1d7

    fffff880`07bf39c0 fffff800`033f8a1a nt!KiCheckForKernelApcDelivery+0x25

    fffff880`07bf39f0 fffff800`033cce2f nt!MiMapViewOfSection+0x2bafa

    fffff880`07bf3ae0 fffff800`030b8293 nt!NtMapViewOfSection+0x2be

    fffff880`07bf3bb0 00000000`772df93a nt!KiSystemServiceCopyEnd+0x13

    00000000`0015dea8 00000000`00000000 0x772df93a

     

    The reason the trap occurred is because when issuing requests to lower drivers it is common practice in drivers to implement code similar to:

     

    KEVENT event;

     

    status = IoCallDriver( DeviceObject, irp );

     

    //

    //  Wait for the event to be signaled if STATUS_PENDING is returned.

    //

    if (status == STATUS_PENDING) {

       (VOID)KeWaitForSingleObject( &event, // event is a local which is declared on the stack

                                    Executive,

                                    KernelMode,

                                    FALSE,

                                    NULL );

    }

     

    As you can see in the above code, if the return from IoCallDriver does not return pending the code continues and exits. Part of the last phase of I/O processing that takes place in the APC is signaling the event. If the call to IoCallDriver returns success, because the event is on the stack it is critical that the APC execute immediately before the stack unwinds. Since APCs where disabled, the execution of the APC was delayed and during this time the event became invalid. The APCs were delayed because the memory manager was in a critical area and APCs could not run.

     

    I needed to determine which driver did this so I enabled IRP logging in Driver Verifier to trace I/O requests.  With this enabled the next dump should contain a transaction log that will help identify what driver is performing I/O while APCs are disabled.  The command line to enable this is:

    verifier /flags 0x410 /all

     

    The new dump with verifier enabled also crashed after delivering an APC to the thread and completing the IRP.  From the debug output below I can find the IRPs that were issued and the thread that issued them, this is what I need to look for them in the log.

     

    1: kd> !thread

    THREAD fffffa80064c9b50 Cid 0200.0204  Teb: 000007fffffde000 Win32Thread: 0000000000000000 RUNNING on processor 1

    IRP List:

        fffff9800a33ec60: (0006,03a0) Flags: 40060070  Mdl: 00000000

        fffff9800a250c60: (0006,03a0) Flags: 40060070  Mdl: 00000000

        fffff9800a3f4ee0: (0006,0118) Flags: 40060070  Mdl: 00000000

    Not impersonating

    DeviceMap                 fffff8a000007890

    Owning Process            fffffa80064bbb30       Image:         csrss.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      1656           Ticks: 0

    Context Switch Count      25             IdealProcessor: 0

    UserTime                  00:00:00.000

    KernelTime                00:00:00.000

    Win32 Start Address 0x000000004a061540

    Stack Init fffff88003b21c70 Current fffff88003b20890

    Base fffff88003b22000 Limit fffff88003b1c000 Call 0

    Priority 14 BasePriority 13 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5

    Child-SP          RetAddr           Call Site

    fffff880`03b21428 fffff800`0307a54c nt!KeBugCheckEx

    fffff880`03b21430 fffff800`030d02ee nt!MmAccessFault+0xffffffff`fff9c15c

    fffff880`03b21590 fffff800`030c8db9 nt!KiPageFault+0x16e

    fffff880`03b21728 fffff800`030e6ab3 nt!memcpy+0x229

    fffff880`03b21730 fffff800`030c4bd7 nt!IopCompleteRequest+0x5a3

    fffff880`03b21800 fffff800`0307ba85 nt!KiDeliverApc+0x1c7

    fffff880`03b21880 fffff800`0331d96a nt!KiCheckForKernelApcDelivery+0x25

    fffff880`03b218b0 fffff800`033e742e nt!MiMapViewOfSection+0xffffffff`fff36baa

    fffff880`03b219a0 fffff800`030d1453 nt!NtMapViewOfSection+0x2bd

    fffff880`03b21a70 00000000`7761159a nt!KiSystemServiceCopyEnd+0x13

    00000000`0025f078 00000000`00000000 0x7761159a

     

    The command “!verifier 100” will dump the transaction log.  Below is the relevant portion of the log containing the IRPs for our thread.

     

    IRP fffff9800a3f4ee0, Thread fffffa80064c9b50, IRQL = 0, KernelApcDisable = -4, SpecialApcDisable = -1

    fffff80003573a68 nt!IovAllocateIrp+0x28

    fffff800033b20e2 nt!IoBuildDeviceIoControlRequest+0x32

    fffff8000356d72e nt!IovBuildDeviceIoControlRequest+0x4e

    fffff880010f8bcc fltmgr!FltGetVolumeGuidName+0x18c

    fffff88004e4fbe1 baddriver+0x12be1

    fffff88004e73523 baddriver +0x36523

    fffff88004e7300c baddriver +0x3600c

    fffff88004e72cce baddriver +0x35cce

    fffff88004e5f715 baddriver +0x22715

    fffff88004e4c6c7 baddriver +0xf6c7

    fffff88004e48342 baddriver +0xb342

    fffff88004e5e44e baddriver +0x2144e

    fffff88004e5e638 baddriver +0x21638

     

    IRP fffff9800a250c60, Thread fffffa80064c9b50, IRQL = 0, KernelApcDisable = -5, SpecialApcDisable = -1

    fffff80003573a68 nt!IovAllocateIrp+0x28

    fffff800033b20e2 nt!IoBuildDeviceIoControlRequest+0x32

    fffff8000356d72e nt!IovBuildDeviceIoControlRequest+0x4e

    fffff8800101eec7 mountmgr!MountMgrSendDeviceControl+0x73

    fffff88001010a6b mountmgr!QueryDeviceInformation+0x207

    fffff8800101986b mountmgr!QueryPointsFromMemory+0x57

    fffff88001019f86 mountmgr!MountMgrQueryPoints+0x36a

    fffff8800101ea71 mountmgr!MountMgrDeviceControl+0xe9

    fffff80003574c16 nt!IovCallDriver+0x566

    fffff880010f8bec fltmgr!FltGetVolumeGuidName+0x1ac

    fffff88004e4fbe1 baddriver +0x12be1

    fffff88004e73523 baddriver +0x36523

    fffff88004e7300c baddriver +0x3600c

     

    IRP fffff9800a33ec60, Thread fffffa80064c9b50, IRQL = 0, KernelApcDisable = -5, SpecialApcDisable = -1

    fffff80003573a68 nt!IovAllocateIrp+0x28

    fffff800033b20e2 nt!IoBuildDeviceIoControlRequest+0x32

    fffff8000356d72e nt!IovBuildDeviceIoControlRequest+0x4e

    fffff8800101eec7 mountmgr!MountMgrSendDeviceControl+0x73

    fffff88001010afd mountmgr!QueryDeviceInformation+0x299

    fffff8800101986b mountmgr!QueryPointsFromMemory+0x57

    fffff88001019f86 mountmgr!MountMgrQueryPoints+0x36a

    fffff8800101ea71 mountmgr!MountMgrDeviceControl+0xe9

    fffff80003574c16 nt!IovCallDriver+0x566

    fffff880010f8bec fltmgr!FltGetVolumeGuidName+0x1ac

    fffff88004e4fbe1 baddriver +0x12be1

    fffff88004e73523 baddriver +0x36523

    fffff88004e7300c baddriver +0x3600c

     

    From the IRP log in verifier I can see that baddriver.sys is calling FltGetVolumeGuidName while APCs are disabled. Further investigation found that baddriver.sys had registered a function for image load notification, and the memory manager has APCs disabled when it calls the image notification routine. The image notification routine in baddriver.sys called FltGetVolumeGuidName which issued the I/O.  From the log output I see KernelApcDisable and SpecialApcDisable, the issue is SpecialApcDisable being –1.  The I/O completion APCs are considered special APCs, so kernel APC disable would not affect them.

     

    The solution was for the driver to check for APCs disabled before issuing the FltGetVolumeGuidName and not make this call if APCs are disabled.

  • Ntdebugging Blog

    Remoting Your Debug Crash Cart With KDNET

    • 2 Comments

    This is Christian Sträßner from the Global Escalation Services team based in Munich, Germany.

     

    Back in January, my colleague Ron Stock posted an interesting article about Kernel Debugging using a serial cable: How to Setup a Debug Crash Cart to Prevent Your Server from Flat Lining

     

    Today we look at a new kernel debugging transport introduced in Windows 8 and Windows Server 2012 that makes the cabling much easier, now a network cable can be used as a debug cable. The new KDNET transport utilizes a PCI Ethernet network card in the Target. Most major NIC Vendors have compatible NICs. You can find a list of supported NICs here:

    http://msdn.microsoft.com/en-us/library/windows/hardware/hh830880.aspx

     

    Note that this will not work with Wireless or USB attached NICs in the Target.

     

    In the example below, we utilized an Acer AC 100 Server as the Target. It ships with an onboard Intel 82579LM Gigabit NIC:

     

    Network Adapters

     

    The great thing about KDNET is that the NIC can still be used for normal network activity. The “Microsoft Kernel Debug Network Adapter” driver is the magic behind this. When KDNET.DLL is active, the NIC’s driver will be “banged out” and KDNET takes control of the NIC.

     

    BCD Configuration

    To configure KDNET, you first need to determine the IPv4 Address of the machine with the debugger. In our example, ipconfig.exe tells us that it is 192.168.1.35:

     

    ipconfig

     

    Next go to your Target machine.

     

    The kernel debug settings used to configure KDNET are stored globally in the BCD Store in the {dbgsettings} area. The kernel debug settings apply to all boot entries.

     

    Use bcdedit.exe /dbgsettings net hostip:<addr>port:<port> to set the transport to KDNET, the IP Address of the debugger and the port. You can connect multiple targets to the same debug host by using a different port for each target.

     

    BCD will generate a cryptographic key for you automatically the first time. You can generate a new cryptographic key by appending the ‘newkey’ keyword. Copy the ‘Key’ to a secure location - you will need it in the debugger.

     

    You can display the debug settings using: bcdedit /dbgsettings

     

    Next, for safety, copy the {current} entry to a new entry (bcdedit /copy {current} /d <description>).

     

    Then enable kernel debugging on the copy (bcdedit.exe /debug {new-guid} on).

     

    If required, also use this (new) entry to enable the checked kernel (bcdedit /set {new-guid} hal <path> and bcdedit.exe /set {new-guid} kernel <path>).

     

    bcdedit

     

    Debugger

    On your Debugger Machine open WinDbg->File->Kernel Debugging (Ctrl-K) and choose the NET tab:

     

    Copy and paste the ‘Key’ here and set the port to the value specified on the Target (the default is 50000):

     

     Kernel Debugging

     

    Next a dialog from Windows Firewall might pop up (depending on your configuration). You want to allow access at this point.

     

    Windows Firewall

     

    You need to make sure that your debug host machine allows inbound UDP traffic on the configured port (50000 in this example, and by default) for the network type in use.

     

    If your company has implemented IPSec Policies, make sure you have exceptions in place that allow unsecured communication on the port used (KDNET does not talk IPSec).

     

    The Debugger Window will now look like this:

     

    windbg waiting to reconnect

     

    The Debugger is now set up and ready to go.

     

    Reboot the target system now.

     

    When the target comes back online, it will try to connect to the IP Address and Port that was configured with the bcdedit.exe command. The Debugger Command Window will look something like the screenshot below.

     

    windbg connected

     

    You now can break in as usual. This is a good time to fix your symbol setup if you have not done it yet.

     

    Operation

    You still can communicate normally over the NIC and IP that you use on the target. You do not need an additional NIC in the target to use KDNET. When debugging production servers with heavy traffic, we recommend using a dedicated NIC for debugging (note, 10GigE NICs are currently not supported).

     

    If you don’t want the NIC to be used by the OS as well, it can be disabled via: bcdedit.exe -set loadoptions NO_KDNIC

     

    Normal Network IO

     

    Although you can use KDNET to debug power state transitions (in particular Connected Standby), it is best avoided. The KDNET protocol polls on a regular basis and as such, many systems will not drop to a lower power state. Instead, use USB, 1394, or serial.

     

    Disconnecting the NIC from media (unplugging the NIC in the target machine) is not supported and will most likely blue screen the target machine.

     

    Note 1:

    If you have more than one NIC in your target, please read the following (copied from the debugger help):

    If there is more than one network adapter in the target computer, use Device Manager to determine the PCI bus, device, and function numbers for the adapter you want to use for debugging. Then in an elevated Command Prompt window, enter the following command, where b, d, and f are the bus number, device number, and function number of the adapter:

    bcdedit /set {dbgsettings} busparams b.d.f

     

    Note 2:

    If you use the Windows NIC Teaming (LBFO) in Server 2012: KDNET is not compatible with NIC Teaming as indicated by the Whitepaper:

    http://download.microsoft.com/download/F/6/5/F65196AA-2AB8-49A6-A427-373647880534/[Windows%20Server%202012%20NIC%20Teaming%20(LBFO)%20Deployment%20and%20Management].docx

     

    How does it look on the network?

    This is a packet sent from the target to the debug host machine.

     

    Network Packet

     

    The TTL of the packets sent from the target to the debug host is currently set to 16 (this is not configurable).

     

    This screenshot shows that your connection can only run over 16 IP hops max. This is a theoretical limitation, but it highlights some important facts. Your host is not talking to the Windows IP stack on the target, instead it talks to a basic IPv4/UDP implementation in KDNET. The transport is UDP/IPv4 based, so there is not much tolerance for poor network conditions aside from retry operations at the Debugger Transport Protocol Level.

     

    A few words on performance.

    The performance is generally limited by the latency of the link between the host and target. Therefore, even with a LAN like latency (<=1ms), you will not be able to get even close to wire speed of a 1GigE Connection. Expect to see speeds between 1.5 – 2.5Mbytes/s.

     

    Keep this in mind when you plan to pull large portions of memory from the target over KDNET (like the .dump command). This screenshot was taken while executing the .dump /f command (Full Kernel Dump):

     

    Network Activity

     

    Even with the performance restrictions mentioned, KDNET is a valuable extension of the existing debugging methods.  It allows you to debug a Windows machine without the need for special hardware (1394) or legacy ports (serial) that not every machine has today (especially tablets and notebooks).  It also saves you from using USB2 debugging - which requires special cables and a good amount of hope that the machine’s vendor has attached the single debug capable USB port to an external port on the chassis.

     

    Also, there is no need for you to physically enter the Datacenter where the target is located.  You can do all these steps from your convenient office chair. J

     

    To see network kernel debugging in action, watch Episode #27 of Defrag Tools on Channel 9.

     

    Thanks to Andrew Richards and Joe Ballantyne for their help in writing this article.

  • Ntdebugging Blog

    Our Bangalore Team is Hiring - Windows Server Escalation Engineer

    • 0 Comments

    Would you like to join the world’s best and most elite debuggers to enable the success of Microsoft solutions?

     

    As a trusted advisor to our top customers you will be working with to the most experienced IT professionals and developers in the industry. You will influence our product teams in sustained engineering efforts to drive improvements in our products.

     

    This role involves deep analysis of product source code and debugging to solve problems in multi-million dollar configurations and will give you an opportunity to stretch your critical thinking skills. During the course of debugging, you will uncover opportunities to improve the customer experience while influencing the current and future design of our products.

     

    In addition to providing support to customers while being the primary interface to our sustained engineering teams, you will also have the opportunity to work with new technologies and unreleased software. Through our continuous investment in depth training and hands-on experience with tough customer challenges you will become the world’s best in this area. Expect to partner with many various roles at Microsoft launching a very successful career!

     

    This position is located is at the Microsoft Global Technical Support Center in Bangalore, India.

     

    Learn more about what an Escalation Engineer does at:

    Profile: Ron Stock, CTS Escalation Engineer - Microsoft Customer Service & Support - What is CSS?

    Microsoft JobsBlog JobCast with Escalation Engineer Jeff Dailey

    Microsoft JobsBlog JobCast with Escalation Engineer Scott Oseychik

     

    Apply here:

    https://careers.microsoft.com/jobdetails.aspx?ss=&pg=0&so=&rw=1&jid=109989&jlang=en&pp=ss

  • Ntdebugging Blog

    Interpreting Event 153 Errors

    • 13 Comments

    Hello my name is Bob Golding and I would like to share with you a new event that you may see in the system event log.  Event ID 153 is an error associated with the storage subsystem. This event was new in Windows 8 and Windows Server 2012 and was added to Windows 7 and Windows Server 2008 R2 starting with hot fix KB2819485.

     

    An event 153 is similar to an event 129.  An event 129 is logged when the storport driver times out a request to the disk; I described event 129 messages in a previous article.  The difference between a 153 and a 129 is that a 129 is logged when storport times out a request, a 153 is logged when the storport miniport driver times out a request.  The miniport driver may also be referred to as an adapter driver or HBA driver, this driver is typically written the hardware vendor.

     

    Because the miniport driver has a better knowledge of the request execution environment, some miniport drivers time the request themselves instead of letting storport handle request timing.  This is because the miniport driver can abort the individual request and return an error rather than storport resetting the drive after a timeout.  Resetting the drive is disruptive to the I/O subsystem and may not be necessary if only one request has timed out.  The error returned from the miniport driver is bubbled up to the class driver who can log an event 153 and retry the request.

     

    Below is an example event 153:

     

    Event 153 Example

     

    This error means that a request failed and was retried by the class driver.  In the past no message would be logged in this situation because storport did not timeout the request.  The lack of messages resulted in confusion when troubleshooting disk errors because timeouts would occur but there would be no evidence of the error.

     

    The details section of the event the log record will present what error caused the retry and whether the request was a read or write. Below is the details output:

     

    Event 153 Details

     

    In the example above at byte offset 29 is the SCSI status, at offset 30 is the SRB status that caused the retry, and at offset 31 is the SCSI command that is being retried.  In this case the SCSI status was 00 (SCSISTAT_GOOD), the SRB status was 09 (SRB_STATUS_TIMEOUT), and the command was 28 (SCSIOP_READ). 

     

    The most common SCSI commands are:

    SCSIOP_READ - 0x28

    SCSIOP_WRITE - 0x2A

     

    The most common SRB statuses are below:

    SRB_STATUS_TIMEOUT - 0x09

    SRB_STATUS_BUS_RESET - 0x0E

    SRB_STATUS_COMMAND_TIMEOUT - 0x0B

     

    A complete list of SCSI operations and statuses can be found in scsi.h in the WDK.  A list of SRB statuses can be found in srb.h.

     

    The timeout errors (SRB_STATUS_TIMEOUT and SRB_STATUS_COMMAND_TIMEOUT) indicate a request timed out in the adapter. In other words a request was sent to the drive and there was no response within the timeout period.  The bus reset error (SRB_STATUS_BUS_RESET) indicates that the device was reset and that the request is being retried due to the reset since all outstanding requests are aborted when a drive receives a reset.

     

    A system administrator who encounters event 153 errors should investigate the health of the computer’s disk subsystem.  Although an occasional timeout may be part of the normal operation of a system, the frequent need to retry requests indicates a performance issue with the storage that should be corrected.

  • Ntdebugging Blog

    Commitment Failures, Not Just a Failed Love Story

    • 1 Comments

    I was working on a debug the other day when I ran the “!vm” command and saw that the system had some 48,000 commit requests that failed. This was strange as the system was not out of memory and the page file was not full. I was left scratching my head and thinking “I wish I knew where !vm got that information from.” So I went on a quest to find out, here is what I found.

     

    13: kd> !vm 1

     

    *** Virtual Memory Usage ***

          Physical Memory:    12580300 (  50321200 Kb)

          Page File: \??\C:\pagefile.sys

            Current:  50331648 Kb  Free Space:  50306732 Kb

            Minimum:  50331648 Kb  Maximum:     50331648 Kb

          Available Pages:     4606721 (  18426884 Kb)

          ResAvail Pages:     12189247 (  48756988 Kb)

          Locked IO Pages:           0 (         0 Kb)

          Free System PTEs:   33460257 ( 133841028 Kb)

          Modified Pages:        20299 (     81196 Kb)

          Modified PF Pages:      6154 (     24616 Kb)

          NonPagedPool 0 Used:   19544 (     78176 Kb)

          NonPagedPool 1 Used:   22308 (     89232 Kb)

          NonPagedPool Usage:    53108 (    212432 Kb)

          NonPagedPool Max:    9408956 (  37635824 Kb)

          PagedPool 0 Usage:    168921 (    675684 Kb)

          PagedPool 1 Usage:   4149241 (  16596964 Kb)

          PagedPool 2 Usage:     17908 (     71632 Kb)

          PagedPool Usage:     4336070 (  17344280 Kb)

          PagedPool Maximum:  33554432 ( 134217728 Kb)

          Session Commit:         3438 (     13752 Kb)

          Shared Commit:          6522 (     26088 Kb)

          Special Pool:              0 (         0 Kb)

          Shared Process:        53597 (    214388 Kb)

          PagedPool Commit:    4336140 (  17344560 Kb)

          Driver Commit:          5691 (     22764 Kb)

          Committed pages:     5565215 (  22260860 Kb)

          Commit limit:       25162749 ( 100650996 Kb)

     

          ********** 48440 commit requests have failed  **********

     

    It turns out that this calculation is from a global ULONG array named “nt!MiChargeCommitmentFailures”.  The array has 4 members and they are used to trace the types of commit failures that have taken place. This is done by first calculating the new commit size NewCommitValue = CurrentCommitValue + SystemReservedMemory. Based on this calculation commit errors are tracked in a few different ways, which are listed below with the corresponding member in the array that is incremented.

     

    MiChargeCommitmentFailures[0] - If the system failed a commit request and an expansion of the pagefile has failed.

    MiChargeCommitmentFailures[1] - If the system failed a commit and we have already reached the maximum pagefile size.

    MiChargeCommitmentFailures[2] - If the system failed a commit while the pagefile lock is held.

    MiChargeCommitmentFailures[3] - If the system failed a commit and the NewCommitValue is less than or equal to CurrentCommitValue.

     

    In order to calculate the count of failures, "!vm" adds up the values stored in each array member of the array. Members 0 and 1 are always counted, member 2 is counted if the OS version is Windows 2003/XP and member 3 is counted if the build version is newer than Windows 2003/XP. 

     

    Let's look at the array in the dump I was debugging:

     

    13: kd> dc nt!MiChargeCommitmentFailures L4

    fffff800`01e45ce0  00000000 0000bd38 00000000 00000000  ....8...........

     

     

    Converting this to decimal we find the 48000+ commit failures I was seeing the in output of !VM.

     

    13: kd> ?0000bd38

    Evaluate expression: 48440 = 00000000`0000bd38

     

    Since I now had my answer, “where does the number come from?”, I was left wanting to know a bit more about the overall flow of why a VirtualAlloc fails to commit.

     

    When memory is allocated by VirtualAlloc the newly allocated memory is not committed to physical memory. Only when the memory is accessed by a read or write is it backed by physical memory.

     

    When this newly allocated memory is accessed for the first time it will need to be backed by commit space. Under normal conditions this is a smooth process, however when the system hits what’s called the commit limit and can’t expand this limit we see commit failures.

     

    So how is the commit limit calculated? Let’s say we have a system with 4GB of physical memory and a pagefile that is 6GB in size. To determine the commit limit we add physical memory and the pagefile size together - in this example the commit limit would be 10GB. Since memory manger will not let any user mode allocation consume every last morsel of commit space it keeps a small amount of the commit space for the system to avoid hangs. When the limit is reached the system tries to grow the page file. If there is no more room to grow the pagefile or the pagefile has reached its configured maximum size, the system will try and free some committed memory to make room for more requests. If expansion of the page file or the attempt to free memory do not allow the allocation to complete, the allocation fails and MiChargeCommitmentFailures is incremented.

     

     

    To sum it all up, commit limit is RAM + pagefile, commit failures happen when we hit the commit limit and the system is unable to grow the pagefile because it is already at its max.  It’s that simple, well almost.

     

    For those that will want to know more about how memory manger works please see the post from Somak: The Memory Shell Game.

     

    Randy Monteleone

  • Ntdebugging Blog

    Debugging a Network Connectivity Issue - TrackNblOwner to the Rescue

    • 0 Comments

    Hello Debug community this is Karim Elsaid again.  Today I’m going to discuss a recent interesting case where intermittently the server is losing access to the network.  No communication (even pings) can be done from / to the server when the issue hits.

     

    We went through the normal exercise and asked the customer to obtain a Kernel memory dump from the machine while it was in the problematic state, hoping that we will find some data to help us to demystify the issue.

     

    One of the very first commands we run upon receiving a hang dump is the very famous “!locks” command.  This yielded the following:

     

    8: kd> !locks

    **** DUMP OF ALL RESOURCE OBJECTS ****

    KD: Scanning for held locks..

     

    Resource @ nt!IopDeviceTreeLock (0xfffff80001a81c80)    Shared 1 owning threads

         Threads: fffffa800cd8a040-01<*>

    KD: Scanning for held locks.

     

    Resource @ nt!PiEngineLock (0xfffff80001a81b80)    Exclusively owned

        Contention Count = 6

         Threads: fffffa800cd8a040-01<*>

    KD: Scanning for held locks

    84372 total locks, 2 locks currently held

     

    What I’m looking for is Locks with exclusive owners and waiters.  From the above output we can see that thread fffffa800cd8a040 exclusively owns a Plug and Play (Pi prefix) lock and shared owns an I/O Manager (Io prefix) device tree lock.

     

    There are no waiters for the exclusive lock, however PnP locks always worth investigating.  While debugging I always treat everything a possible suspect unless proven otherwise, so let’s dump this thread:

     

    8: kd> !thread fffffa800cd8a040 e

    THREAD fffffa800cd8a040  Cid 0004.005c  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertable

        fffff88002b0f118  SynchronizationEvent

    IRP List:

        fffffa8016527510: (0006,0310) Flags: 00000000  Mdl: 00000000

    Not impersonating

    DeviceMap                 fffff8a000006100

    Owning Process            fffffa800cd56040       Image:        System

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      14791337       Ticks: 15577 (0:00:04:03.002)

    Context Switch Count      835317         IdealProcessor: 2            

    UserTime                  00:00:00.000

    KernelTime                00:00:26.863

    Win32 Start Address nt!ExpWorkerThread (0xfffff8000188f530)

    Stack Init fffff88002b0fc70 Current fffff88002b0ee30

    Base fffff88002b10000 Limit fffff88002b0a000 Call 0

    Priority 12 BasePriority 12 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5

    *** ERROR: Module load completed but symbols could not be loaded for myfault.sys

    Child-SP          RetAddr            Call Site

    fffff880`02b0ee70 fffff800`0187ba32 nt!KiSwapContext+0x7a

    fffff880`02b0efb0 fffff800`0188cd8f nt!KiCommitThreadWait+0x1d2

    fffff880`02b0f040 fffff800`018e1816 nt!KeWaitForSingleObject+0x19f

    fffff880`02b0f0e0 fffff880`01618fcd nt! ??::FNODOBFM::`string'+0x12ff6

    fffff880`02b0f150 fffff880`0173f54e tcpip!FlPnpEvent+0x17d

    fffff880`02b0f1c0 fffff880`00f87b2f tcpip!Fl48PnpEvent+0xe

    fffff880`02b0f1f0 fffff880`00f884b7 NDIS!ndisPnPNotifyBinding+0xbf

    fffff880`02b0f280 fffff880`00fa1911 NDIS!ndisPnPNotifyAllTransports+0x377

    fffff880`02b0f3f0 fffff880`00fa2c5b NDIS!ndisCloseMiniportBindings+0x111

    fffff880`02b0f500 fffff880`00f3bbc2 NDIS!ndisPnPRemoveDevice+0x25b

    fffff880`02b0f6a0 fffff880`00fa5b69 NDIS!ndisPnPRemoveDeviceEx+0xa2

    fffff880`02b0f6e0 fffff800`01aec8d9 NDIS!ndisPnPDispatch+0x609

    fffff880`02b0f780 fffff800`01c6c1e1 nt!IopSynchronousCall+0xc5

    fffff880`02b0f7f0 fffff800`0197f733 nt!IopRemoveDevice+0x101

    fffff880`02b0f8b0 fffff800`01c6bd34 nt!PnpRemoveLockedDeviceNode+0x1a3

    fffff880`02b0f900 fffff800`01c6be40 nt!PnpDeleteLockedDeviceNode+0x44

    fffff880`02b0f930 fffff800`01cfcd04 nt!PnpDeleteLockedDeviceNodes+0xa0

    fffff880`02b0f9a0 fffff800`01cfd35c nt!PnpProcessQueryRemoveAndEject+0xc34

    fffff880`02b0fae0 fffff800`01be65ce nt!PnpProcessTargetDeviceEvent+0x4c

    fffff880`02b0fb10 fffff800`0188f641 nt! ?? ::NNGAKEGL::`string'+0x5ab9b

    fffff880`02b0fb70 fffff800`01b1ce5a nt!ExpWorkerThread+0x111

    fffff880`02b0fc00 fffff800`01876d26 nt!PspSystemThreadStartup+0x5a

    fffff880`02b0fc40 00000000`00000000 nt!KiStartSystemThread+0x16

     

    Interesting, by looking at the stack above we can see that thread is doing some NDIS PnP stuff.  This thread has been waiting for more than 4 minutes, but hold on,  what is “ nt! ?? ::FNODOBFM::`string”?  This doesn’t seem to be a useful function name, no its not!  This is a side effect of Basic Block Tools optimization (BBT).  Using public symbols the debugger will find it hard to get to the right symbol, there is a nice a trick you can use in order to get to the right function.

     

    P.S for a nice x64 Deep Dive please refer to our archive.

     

    Let’s display the function data for the return address fffff800`018e1816:

     

    8: kd> .fnent fffff800`018e1816

    Debugger function entry 000000e8`f28f14f8 for:

    (fffff800`018c4790)   nt! ?? ::FNODOBFM::`string'+0x12ff6   |  (fffff800`018c47c8)   nt!vDbgPrintExWithPrefixInternal

     

    BeginAddress      = 00000000`000da7d0

    EndAddress        = 00000000`000da81c

    UnwindInfoAddress = 00000000`001c8a54

     

    Unwind info at fffff800`019cfa54, 10 bytes

      version 1, flags 4, prolog 0, codes 0

     

    Chained info:

    BeginAddress      = 00000000`000182f0

    EndAddress        = 00000000`00018358

    UnwindInfoAddress = 00000000`001bf910

     

    Unwind info at fffff800`019c6910, 6 bytes

      version 1, flags 0, prolog 4, codes 1

      00: offs 4, unwind op 2, op info c      UWOP_ALLOC_SMALL. 

     

    For optimized binaries, you will find a section “Chained Info”.  Add the BeginAddress to the start address of the module and you should hit the correct function so:

     

    8: kd> ln nt+000182f0

    (fffff800`0181f2f0)   nt!ExWaitForRundownProtectionReleaseCacheAware  |  (fffff800`0181f358)   nt!KeGetRecommendedSharedDataAlignment

    Exact matches:

        nt!ExWaitForRundownProtectionReleaseCacheAware (<no parameter info>)

     

    Bingo!  You got the function.  So tcpip!FlPnpEvent was calling ExWaitForRundownProtectionReleaseCacheAware.  This function will basically wait for the rundown protection to drop down to 0.

     

    A thread can call ExAcquireRundownProtectionEx against a shared object for safe access.  Rundown Protection provides a way to protect an object from being deleted unless all outstanding access has been finished (Run Down).  The “ExWaitForRundownProtectionReleaseCacheAware” will do exactly the same; it will wait for all rundown protection calls to be completed.

     

    The question is which structure are we waiting for its rundown to drain, that will depend on what we are dealing with.  Because of code optimization the debugger is not showing you the full picture.  Through code review I found that in this particular dump there is an inline call to function “FlpUninitializePacketProviderInterface”.

     

    So the stack in reality should look like this:

     

    Child-SP          RetAddr           Call Site

    fffff880`02b0ee70 fffff800`0187ba32 nt!KiSwapContext+0x7a

    fffff880`02b0efb0 fffff800`0188cd8f nt!KiCommitThreadWait+0x1d2

    fffff880`02b0f040 fffff800`018e1816 nt!KeWaitForSingleObject+0x19f

    fffff880`02b0f0e0 fffff880`01618fcd nt!ExWaitForRundownProtectionReleaseCacheAware

    ----inline function----             tcpip!FlpUninitializePacketProviderInterface

    fffff880`02b0f150 fffff880`0173f54e tcpip!FlPnpEvent+0x17d

    fffff880`02b0f1c0 fffff880`00f87b2f tcpip!Fl48PnpEvent+0xe

     

    So we need to un-initialize a network interface but before doing that we need to make sure that there are no outstanding references to packets and that there are no outstanding packets pending.  When we say packets, starting in NDIS 6 we basically mean “NET_BUFFER” and “Net_Buffer_List” structures.  So we need to check for any outstanding Net_Buffer_Lists (NBLs) that are pending, one reference will correspond to one pending NBL.

     

    To the rescue, the “NDISKD” debugger extension has a very nice and handy command to display all pending NBLS and their owners, it is “!pendingnbls”.  For the command to work it you must first enable “TrackNblOwner” through the registry.  By default, this registry key is not enabled on server SKUs as it may cause a performance hit.  On client SKUs this is enabled by default.

     

    When you run !pendingnbls on a clean Windows 2008 R2 install you get:

     

    8: kd> !ndiskd.pendingnbls

        This command requires NBL tracking to be enabled on the debugee target

        machine.  (By default, client operating systems have level 1, and servers

        have level 0).  To enable, set this REG_DWORD value to a nonzero value on

        the target machine and reboot the target machine:

       

        HKLM\SYSTEM\CurrentControlSet\Services\NDIS\Parameters ! TrackNblOwner

        Possible Values (features are cumulative)

        * 0:  Disable all tracking.

        * 1:  Track the most recent owner of each NBL (enables !ndiskd.pendingnbls)

     

        Show me all allocated NBLs so I can manually find the one I want

     

    You can find all allocated NBLs with the command “!ndiskd.nblpool -force -find ((@$extin.Flags)&0x108)==0x100)”, but still you don’t get any owner.

     

    So I asked the customer to turn on “TrackNblOwner” and reboot, wait for the next occurrence of the issue and get a new memory dump.

     

    Two days later we received the memory dump file.  I verified that they are having the same issue I found in the last dump and that TrackNblOwner is configured correctly:

     

    23: kd> dp NDIS!ndisTrackNblOwner L1

    fffff880`00ef1a30  00000000`00000001

     

    Then I immediately checked all pending NBLs to claim the prize, and it was not surprising to see why the NIC card was not un-initializing:

     

    23: kd> !ndiskd.pendingnbls

     

    PHASE 1/3: Found 20 NBL pool(s).                

    PHASE 2/3: Found 550 freed NBL(s).                                   

     

        Pending Nbl        Currently held by                                       

        fffffa801dc559f0   fffffa80142d31a0 - My Ethernet 1Gb 4-port Adapter  [Miniport]                   

        fffffa801dc81680   fffffa80142d31a0 - My Ethernet 1Gb 4-port Adapter  [Miniport]                   

        fffffa80131d2aa0   fffffa80142d31a0 - My Ethernet 1Gb 4-port Adapter  [Miniport]

    ……………………………….

    Ret of the repeated output omitted

     

    PHASE 3/3: Found 1854 pending NBL(s) of 3005 total NBL(s).                     

    Search complete.

     

    So we currently have 1854 NBLs pending on the NIC miniport driver “fffffa80142d31a0”.  This is the Miniport that currently holding all NBLs:

     

    23: kd> !ndiskd.miniport fffffa80142d31a0

     

     

    MINIPORT

     

        My Ethernet 1Gb 4-port Adapter 

     

        Ndis handle        fffffa80142d31a0

        Ndis API version   v6.20

        Adapter context    fffffa80138cc000

        Miniport driver    fffffa800d4f7530 - MyMiniPortDriver  v1.0

        Network interface  fffffa800d25e870

     

        Media type         802.3

        Device instance    PCI\VEN_1111&DEV_1111&SUBSYS_169D103C&REV_01\4&2263a140&0&0010

        Device object      fffffa80142d3050    More information

        MAC address        xx-xx-xx-xx-xx-xx

     

     

    STATE

     

        Miniport           Running

        Device PnP         QUERY_REMOVED

        Datapath           Normal

        Operational status DORMANT

        Operational flags  DORMANT_PAUSED

        Admin status       ADMIN_UP

        Media              Connected

        Power              D0

        References         9

        Total resets       0

        Pending OID        None

        Flags              BUS_MASTER, 64BIT_DMA, SG_DMA, DEFAULT_PORT_ACTIVATED,

                           SUPPORTS_MEDIA_SENSE, DOES_NOT_DO_LOOPBACK,

                           MEDIA_CONNECTED

        PnP flags          PM_SUPPORTED, DEVICE_POWER_ENABLED, RECEIVED_START,

                           HARDWARE_DEVICE

     

    What you notice from the above that the device received a “Query_Removed” PNP and is currently in a Dormant_Paused state.

     

    From: http://msdn.microsoft.com/en-us/library/ff566737.aspx:

    NET_IF_OPER_STATUS_DORMANT_PAUSED

    The operational status is set to NET_IF_OPER_STATUS_DORMANT because the miniport adapter is in the paused or pausing state.

     

    NDIS 6.0 and up allow miniport adapters to be paused and the documentation here shows what the miniport driver should do when it receives a pause request.

     

    Because the adapter was in a pause state, basic network commads like “ping” ceased to work as described earlier in the symptoms.  The next action is definitely to involve the miniport adapter vendor to trace this further and find out why all these pending NBLs were not completed.

     

    Until a next adventure!

    Best Regards,

    Karim

  • Ntdebugging Blog

    Understanding File System Minifilter and Legacy Filter Load Order

    • 2 Comments

    Hello, my name is Fred Jeng from the Global Escalation Services team. For today’s post, I want to go over how Windows 7 and Windows Server 2008 R2 load file system mini-filters in a mixed environment when legacy filters are also present. I recently came across an issue where the filters were being loaded out of order based on their altitudes. This can cause all sorts of problems with a filter driver’s functionality if they are incorrectly positioned on the stack.  Take for example the following filter stack, obtained using the fltmc command from the cmd prompt:

     

    C:\Windows\system32>fltmc

     

    Filter Name                     Num Instances    Altitude    Frame

    ------------------------------  -------------  ------------ -----

    AVLegacy                                        389998.99   <Legacy>

    EncryptionLegacy                                149998.99   <Legacy>

    AVMiniFilter                           3        328000         0

    luafv                                  1        135000         0

    FileInfo                               13       45000          0

     

    At first glance it looks like there is a problem causing the legacy encryption filter to be loaded above the antivirus minifilter, which has a higher altitude. This may cause issues with AVMiniFilter as the IOs that it receives are still encrypted. Due to limitations in how the filter drivers attach to the driver stack, this is actually the intended behavior. However, there is a solution to manipulate the load order to load the legacy filters correctly based on their altitude.

     

    First some background information regarding legacy filters and minifilters.

    In the old days before minifilters, legacy drivers can only attach at the top of the driver stack so the load order also controlled the attachment order. The earlier a legacy driver loads, the lower it can attach on the file system stack. Minifilters on the other hand can load at any time, but their positions relative to other minifilters are controlled by their altitude. When a minifilter loads, it needs to register with an appropriate frame created by fltmgr. Each frame is a fltmgr device object and represents a range of altitudes. There can be more than one frame on the file system stack but the range of altitudes that each frame represents cannot overlap with the altitude range of another frame. For interoperability with legacy drivers, minifilters must still maintain a load order group. The frames are created and managed by fltmgr, which itself is a legacy driver. The ramification of this is that fltmgr must follow the old legacy filter driver rules and attach only at the top of the stack.

     

    From the above example, let’s walk through how the legacy and minifilters above are loaded to get us into the state such that the altitude appears to be out of order.

    First, here are the details for the 5 drivers.

     

    Driver Name

    Type

    Load Order

    Start Type

    Altitude

    AVLegacy

    Legacy

    FsFilter Anti-Virus

    SERVICE_BOOT_START

    329998.99

    AVMiniFilter

    Minifilter

    FsFilter Anti-Virus

    SERVICE_BOOT_START

    328000

    EncryptionLegacy

    Legacy

    FsFilter Encryption

    SERVICE_BOOT_START

    149998.99

    Luafv

    Minifilter

    FsFilter Virtualization

    SERVICE_AUTO_START

    135000

    FileInfo

    Minifilter

    FSFilter Bottom

    SERVICE_BOOT_START

    45000

     

     

    MSDN has an article that describes load order groups and altitudes for minifilters: http://msdn.microsoft.com/en-us/library/windows/hardware/ff549689%28v=vs.85%29.aspx.

    Referencing this article regarding load order groups and altitudes for minifilter drivers, we can determine that our filters will load in the following order.

    FileInfo

    EncryptionLegacy

    AVLegacy

    AVMiniFilter

    luafv

     

    On system bootup, when fltmgr.sys loads it will create Frame 0 with a default altitude range of 0 to 49999. When FileInfo loads with an altitude of 45000, it will fit into the default Frame 0. Next to load is EncryptionLegacy. Since this is a legacy driver, it will attach on top of the legacy driver fltmgr.sys. So this is how our file system stack looks right now.

     

    image001

     

    Next up is the AVLegacy driver.  This is a legacy driver, so it has to attach above EncryptionLegacy.

     

    image002

     

    Now the AVMinifilter will load with an altitude of 328000. The OS will check if it will fit in the Frame 0 Fltmgr, but this frame only supports an altitude of 0-45000. Before deciding to create a new fltmgr Frame instance, it will check if there are any legacy filters attached above Frame 0 and adjust Frame 0’s altitude if there are. So in our case, we do have legacy filters on the file system stack at this point and so we go up the list of legacy drivers. First we see EncryptionLegacy with an altitude of 149998.99 so we adjust Frame 0 to cover from 0 to 149998.99. We continue up the list and see AVLegacy with an altitude of 329998.99 so we again adjust the altitude of Frame 0 to now cover 0-329998.99. The reason we do this is because Frame 0 now must handle all minifilters below 329998.99. Since we can only attach legacy filters to the top of the stack, if we add an additional FltMgr frame instance, it has to sit above AVLegacy and can only support minifilters with altitude of 329998.99 or higher. Now that Frame 0 supports 0-329998.99, we can register AVMinifilter with Frame 0.

     

     image003

     

    At this point, you can already see that AVMinifilter which has a higher altitude then EncryptionLegacy will be loaded below EncryptionLegacy. The last driver to load is the luafv minifilter, and it will fit into Frame 0.

     

    image004

     

    A couple of things to point out.

    Why can’t we insert a Frame between AVLegacy and EncryptionLegacy when ACMinifilter loads?

    This is due to how the file system stack is constructed with legacy drivers only being able to attach to the top of the stack. Since FltMgr is a legacy driver, it has to conform to these rules.

     

    Why do we adjust the altitude in Frame 0 to cover 0-329998.99? Why not stop at 149998.99?

    If Frame 0 only adjusts its altitude to the legacy filter directly attached above it and not all the way to the highest attached legacy filter, we won’t be able to handle some range of mini-filters. For example, assume we only adjust Frame 0 to cover 0-149998.99, then when the AVMinifilter with an altitude of 328000 comes along, it won’t fit in Frame 0, and we’re unable to insert a Frame between AVLegacy (329998.99) and EncryptionLegacy(149998.99) so we would either be unable to load AVMinifilter, or we would have to create Frame 1 above AVLegacy and load AVMinifilter there. In which case we would again be faced with the altitude disordering issue.

     

    If this is the expected behavior, how do we resolve the problem of EncryptionLegacy being loaded above AVMinifilter? The solution is to inject a dummy minifilter that loads at the appropriate time to force fltmgr to create a Frame between the legacy filters. For our case above, I used the DDK to create the NullFilter minifilter driver and changed the load order to FSFilter Compression and gave it an altitude of 160030 which is within the assigned altitude for FSFilterCompression and set the start type to SERVICE_BOOT_START.  Please note that I only used this driver in a test environment, production minfilter drivers must use an altitude assigned by Microsoft.

     

    For information on minifilter load order groups and altitude, reference http://msdn.microsoft.com/en-us/windows/hardware/gg462963.aspx.

     

    Our list of filters are as follows:

     

    Driver Name

    Type

    Load Order

    Start Type

    Altitude

    AVLegacy

    Legacy

    FsFilter Anti-Virus

    SERVICE_BOOT_START

    329998.99

    AVMiniFilter

    Minifilter

    FsFilter Anti-Virus

    SERVICE_BOOT_START

    328000

    EncryptionLegacy

    Legacy

    FsFilter Encryption

    SERVICE_BOOT_START

    149998.99

    Luafv

    Minifilter

    FsFilter Virtualization

    SERVICE_AUTO_START

    135000

    FileInfo

    Minifilter

    FSFilter Bottom

    SERVICE_BOOT_START

    45000

    NullFilter

    Minifilter

    FSFilter Compression

    SERVICE_BOOT_START

    160030

     

    So with the new NullFilter dummy driver, our filter load order should be as follows:

    FileInfo

    EncryptionLegacy

    NullFilter

    AVLegacy

    AVMiniFilter

    luafv

     

    After FileInfo and EncryptionLegacy loads, the stack is the same as what we had earlier.

     

     

    image001

     

    Now when the NullFilter minifilter loads with an altitude of 160030, we see that it doesn’t fit in Frame 0. As before, we check for any attached legacy filter drivers and see EncryptionLegacy so we adjust Frame 0 to cover 0-149998.99. Since NullFilter still does not fit in Frame 0, we will create a new Frame and attach it above the EncryptionLegacy driver.

     

     

    image005

     

     

           
         

    The AVLegacy driver will load next, and since it is a legacy driver, it will attach above the Frame 1 instance of FltMgr.

     

    image006 

     

    The last two minifilters to load are AVMiniFilter and luafv. When AVMinifilter loads into Frame 1 with an altitude of 328000, it will see that Frame 1 at the time only supports 149998.99-160030. It follows the same algorithm to check if there are any legacy filters attached above the frame. In this case, we have AVLegacy attached above Frame 1 so we adjust Frame 1 to cover 149998.99-329998.99 before inserting AVMinifilter into Frame 1.

     

    image007

     

    By strategically injecting a dummy minifilter driver, we can get the legacy and minifilter drivers to all load at the correct altitude.

Page 3 of 24 (239 items) 12345»