• Ntdebugging Blog

    Troubleshooting Pool Leaks Part 3 – Debugging

    • 1 Comments

    In our previous articles we discussed identifying a pool leak with perfmon, and narrowing the source of the leak with poolmon.  These tools are often preferred because they are easy to use, provide verbose information, and can be run on a system without forcing downtime.  However, it is not always possible to get perfmon and poolmon data.  If a system is experiencing poor performance you may have a business need to get the system up and running as quickly as possible without allowing time to troubleshoot.  It is also possible to completely exhaust memory through a pool leak, leaving the system in a state where tools such as perfmon and poolmon will not work.  In these scenarios it may be possible to troubleshoot the poor performance by forcing a bugcheck, gathering a memory dump, and performing a post mortem analysis.

     

    Although a dump is not the ideal data to troubleshoot a leak, it can be done.  I say less than ideal because a dump is a snapshot of the system memory, and does not provide the historical data which perfmon would provide.  The lack of historical data makes it difficult to differentiate between high memory usage due to load and high memory usage due to a leak.  It is up to you, as the troubleshooter, to determine if the dump is sufficient evidence of a leak.  Sometimes identifying the tag and the driver will help you identify a known issue that causes a leak, or your knowledge of the driver architecture may allow you to determine if the memory usage is normal or not.  In some scenarios you may decide to start monitoring with perfmon and collect additional data for a future occurrence.

     

    The first step to debug a pool leak using a dump is to load the dump in windbg, set the symbol path, and reload symbols.

     

    1: kd> .symfix c:\symbols

    1: kd> .reload

    Loading Kernel Symbols

    ...............................................................

    ................................................................

    .....

     

    The !vm command will show memory utilization, the 1 flag will limit the verbosity of this command.  For the scenario of a pool leak, the significant values are “NonPagedPool Usage:” and “PagedPool Usage:”.  If the debugger identifies a value that is out of the normal range it will flag it, and we can see here that the debugger has flagged excessive nonpaged pool usage.  This is similar to the information we obtained in Part 1 using perfmon, but unlike perfmon we do not have trend data to indicate if this is temporary high pool usage due to load or if this is a leak.

     

    1: kd> !vm 1

     

    *** Virtual Memory Usage ***

            Physical Memory:      403854 (   1615416 Kb)

            Page File: \??\C:\pagefile.sys

              Current:   1048576 Kb  Free Space:   1015644 Kb

              Minimum:   1048576 Kb  Maximum:      4194304 Kb

            Available Pages:      106778 (    427112 Kb)

            ResAvail Pages:       225678 (    902712 Kb)

            Locked IO Pages:           0 (         0 Kb)

            Free System PTEs:   33533355 ( 134133420 Kb)

            Modified Pages:         4844 (     19376 Kb)

            Modified PF Pages:      4838 (     19352 Kb)

            NonPagedPool Usage:   155371 (    621484 Kb)

            NonPagedPool Max:     191078 (    764312 Kb)

            ********** Excessive NonPaged Pool Usage *****

            PagedPool 0 Usage:     27618 (    110472 Kb)

            PagedPool 1 Usage:      3848 (     15392 Kb)

            PagedPool 2 Usage:       299 (      1196 Kb)

            PagedPool 3 Usage:       283 (      1132 Kb)

            PagedPool 4 Usage:       344 (      1376 Kb)

            PagedPool Usage:       32392 (    129568 Kb)

            PagedPool Maximum:  33554432 ( 134217728 Kb)

            Session Commit:         7764 (     31056 Kb)

            Shared Commit:          6371 (     25484 Kb)

            Special Pool:              0 (         0 Kb)

            Shared Process:         5471 (     21884 Kb)

            PagedPool Commit:      32394 (    129576 Kb)

            Driver Commit:          2458 (      9832 Kb)

            Committed pages:      326464 (   1305856 Kb)

            Commit limit:         665998 (   2663992 Kb)

     

    The debugger can parse the pool tagging database and present similar information as poolmon.  The !poolused command will do this, the /t5 option will limit output to the top 5 consumers, the 2 flag will sort by nonpaged pool usage (use the 4 flag if your leak is in paged pool).

     

    1: kd> !poolused /t5 2

    ..

    Sorting by NonPaged Pool Consumed

     

                   NonPaged                  Paged

    Tag     Allocs         Used     Allocs         Used

     

    Leak       601    615424000          0            0    UNKNOWN pooltag 'Leak', please update pooltag.txt

    Pool         6      1717840          0            0    Pool tables, etc.

    nVsC       664      1531552          0            0    UNKNOWN pooltag 'nVsC', please update pooltag.txt

    netv      4369      1172224          1          144    UNKNOWN pooltag 'netv', please update pooltag.txt

    Thre       607       774048          0            0    Thread objects , Binary: nt!ps

     

    TOTAL     43424    634209952      63565    126487760

     

    The above output shows that the tag “Leak” is associated with almost all of the nonpaged pool usage.  This is the same information we obtained in Part 2 using poolmon.

     

    Now we must identify what drivers use the pool tag “Leak”. Because we have a snapshot of the system memory we can search the dump for this tag.  We can match each address to a module using the command lm a.

     

    1: kd> !for_each_module s -a @#Base @#End "Leak"

    fffff880`044b63aa  4c 65 61 6b 3b c1 0f 42-c1 41 8d 49 fd 8b d0 ff  Leak;..B.A.I....

    fffff880`044b6621  4c 65 61 6b 3b c1 0f 42-c1 33 c9 8b d0 ff 15 cc  Leak;..B.3......

    1: kd> lm a fffff880`044b63aa

    start             end                 module name

    fffff880`044b5000 fffff880`044bc000   myfault    (no symbols)

    1: kd> lm a fffff880`044b6621

    start             end                 module name

    fffff880`044b5000 fffff880`044bc000   myfault    (no symbols)

     

    The tag and driver name can be used to search the internet for known problems.  If a known issue is found a driver update may be available, and installing this update may prevent a future memory leak.

     

    If there are no updates available for the driver, or if this is your driver and you need to identify the cause of the leak, don’t panic.  In future articles we will show techniques for getting call stacks of pool allocations, these call stacks be used to identify under what conditions the driver leaks memory.

  • Ntdebugging Blog

    Troubleshooting Pool Leaks Part 2 – Poolmon

    • 8 Comments

    In our previous article we discussed how to identify a pool leak using perfmon.  Although it may be interesting to know that you have a pool leak, most customers are interested in identifying the cause of the leak so that it can be corrected.  In this article we will begin the process of identifying what kernel mode driver is leaking pool, and possibly identify why.

     

    Often when we are collecting data for a poor performance scenario there are two pieces of data that we collect.  Perfmon log data is one, as we discussed in our previous article.  The other piece of data is poolmon logs.  The memory manager tracks pool usage according to the tag associated with the pool allocations, using a technique called pool tagging.  Poolmon gathers this data and displays it in an easy to use format.  Poolmon can also be configured to dump data to a log, and in some scenarios it is beneficial to schedule poolmon to periodically collect such logs.  There are several available techniques to schedule poolmon, however that is beyond the scope of this article.

     

    Poolmon has shipped with many different packages over the years; it is currently available with the Windows Driver Kit.  If you install the WDK to the default folder, poolmon will be in “C:\Program Files (x86)\Windows Kits\8.0\Tools\x64”.  Poolmon does not have dependencies on other modules in this folder; you can copy it to your other computers when you need to investigate pool usage.

     

    How does pool tagging work?  When a driver allocates pool it calls the ExAllocatePoolWithTag API.  This API accepts a tag - a four-letter string - that will be used to label the allocation.  It is up to a driver developer to choose this tag.  Ideally each developer will choose a tag that is unique to their driver and use a different tag for each code path which calls ExAllocatePoolWithTag.  Because each tag should be unique to each driver, if we can identify the tag whose usage corresponds with the leak we can then begin to identify the driver which is leaking the memory.  The tag may also give the driver developer clues as to why the memory is being leaked, if they use a unique tag for each code path.

     

    To view the pool usage associated with each tag run “poolmon -b” from a command prompt.  This will sort by the number of bytes associated with each tag.  If you are tracking pool usage over a period of time, you can log the data to a file with “poolmon -b -n poolmonlog1.txt”, replacing 1 with increasing numbers to obtain a series of logs.  Once you have a series of logs you may be able to view usage increasing for a specific tag, in a corresponding fashion to what you see in perfmon.

     

    When analyzing poolmon the important data is at the top.  Typically the tag with the largest usage in bytes is the cause of the leak.

     

    Poolmon -b

     

    In the above data we can see that the tag with the most pool usage is “Leak”.  Now that we know what tag is leaking we need to identify what driver is using this tag.  Techniques for associating a leak with a tag vary, but findstr is often effective.  Most drivers are located in c:\windows\system32\drivers, so that is a good starting point when looking for the driver.  If you don’t find a result in that folder, go up a folder and try again, repeating until you get to the root of the drive.

     

    C:\>findstr /s Leak *.sys

    Users\Administrator\Desktop\myfault.sys:

    AìIALeak;┴☼BAì

    ·∟   §£ Θ─  A╗☻   E"Θ╡☺ Hï♣╔♂  $Θª  Hï:Hc ê\@ë

      └δ_Aï ╞♣@  ë♣▓  Aï@ë¼←  δCAâ∙uAïAìIALeak;┴☼B3ï §

     

    In the above output we can see that “Leak” is used in myfault.sys.  If we hadn’t forced this leak with notmyfault, the next step in troubleshooting would be an internet search for the tag and the driver.  Often such a search will allow you to identify a specific fault within the driver and a solution.

     

    Don’t panic if findstr doesn’t find your tag, or if you find the tag but it is not unique to one driver.  In future articles we will cover additional techniques for associating drivers with tags, and for associating allocations with specific code within a driver.

  • Ntdebugging Blog

    Troubleshooting Pool Leaks Part 1 – Perfmon

    • 7 Comments

    Over the years the NTDebugging Blog has published several articles about pool memory and pool leaks.  However, we haven’t taken a comprehensive approach to understanding and troubleshooting pool memory usage.  This upcoming series of articles is going to tackle pool leaks from the basics to advanced troubleshooting techniques.  Most of the examples will use the Windows Sysinternals tool NotMyFault to generate a leak so our readers will be able to reproduce the described behavior and repeat the troubleshooting steps.

     

    We need to start by understanding what pool is and how it is used.  Pool is virtual memory that is used by drivers in much the same way user mode applications use heap.  A driver developer calls ExAllocatePoolWithTag to get a block of memory that can be used in much the same way a user mode programmer would use memory returned by HeapAlloc or malloc.  The memory manager, which is responsible for managing pool, is able to efficiently handle small allocations by taking a page of virtual memory (typically 4KB) and breaking it up into smaller blocks.  The memory manager is also able to allocate pool in blocks larger than a page.  There are two types of pool a developer can request from ExAllocatePoolWithTag, paged pool and nonpaged pool.  As the names suggest one type of pool memory can be paged out, and the other cannot be paged.  Paged pool is used for most allocations, nonpagedpool is used for memory that will be written or read at an IRQL of DISPATCH_LEVEL or above.

     

    Pool leaks happen when a driver calls ExAllocatePoolWithTag but never calls the corresponding ExFreePool or ExFreePoolWithTag routine.  A leak is different than just high memory utilization, which may happen in normal conditions as load increases.  For example, the srv.sys driver creates work items for incoming requests, and when there is a large amount of SMB traffic to a server the pool usage from srv.sys may increase to handle this traffic.  Typically the differentiation between a leak and high memory usage due to load is that a leak never decreases.  Memory usage that is load related should decrease when the load is reduced.  Monitoring is required to differentiate between these two scenarios.  Performance Monitor (aka perfmon) is typically the most effective tool to begin such an investigation.

     

    The symptom of a pool leak is often poor system performance when the system runs out of pool, or on 64-bit systems the pool may begin to consume a large amount of the available memory.  This symptom makes perfmon an ideal tool to begin troubleshooting as it can be used to identify a wide variety of potential causes of poor performance.  Perfmon is most useful when it is started before a system enters a state of poor performance so that trend data can be analyzed leading up to the problem.

     

    You can use the below commands from an elevated command prompt to collect perfmon data from such a scenario.

     

    First create the data collector.  This command collects data from a variety of counters at a 5 minute interval and is designed to be run for several hours prior to and during a the time a system experiences poor performance (shorter intervals can be used for leaks that happen faster than several hours). We often recommend collecting these counters to perform general performance troubleshooting because we usually don’t know that there is a memory leak until after this data is collected and analyzed.

    Logman.exe create counter PerfLog-Long -o "c:\perflogs\\%computername%_PerfLog-Long.blg" -f bincirc -v mmddhhmm -max 300 -c "\LogicalDisk(*)\*" "\Memory\*" "\Cache\*" "\Network Interface(*)\*" "\Paging File(*)\*" "\PhysicalDisk(*)\*" "\Processor(*)\*" "\Processor Information(*)\*" "\Process(*)\*" "\Redirector\*" "\Server\*" "\System\*" "\Server Work Queues(*)\*" "\Terminal Services\*" –si 00:05:00

     

    Then start collecting data:

    Logman.exe start PerfLog-Long

     

    When the performance problem is being experienced, stop collecting data:

    Logman.exe stop PerfLog-Long

     

    After you have collected the data, open the .blg file in the Performance Monitor MMC snap-in.  Browse to the Memory object, and add the counters “Pool Nonpaged Bytes” and “Pool Paged Bytes”.  This should leave you with a view similar to the below screenshot.

     

    image001

     

    The steadily increasing line in the above screenshot, without a substantial decrease in the line, is an indicator that nonpaged pool memory is being leaked.  If we look at the maximum count we see that nonpaged pool has consumed 540MB.  The significance of this value depends on the amount of RAM in the system.  In this instance the system has 1GB of RAM so nonpaged pool is consuming 54% of the available memory.  We can now conclude that the cause of the performance problem is a nonpaged pool memory leak, which is consuming a large amount of RAM and preventing other components from using this RAM.

     

    Next we need to start investigating which driver has allocated the most pool.  We will begin that in part 2.

  • Ntdebugging Blog

    How To Deadlock Yourself (Don’t Do This)

    • 1 Comments

    Some APIs should come with a warning in big red letters saying “DANGER!”, or perhaps more subtly “PROCEED WITH CAUTION”.  One such API is ExSetResourceOwnerPointer. Although the documentation contains an explanation of what limited activity you can do with the resource after making this call, its warning is not very strongly worded.

     

    You may see evidence of a call to ExSetResourceOwnerPointer in a debug.  A lock in !locks will have an unusual owner field, such as the one shown below:

    2: kd> !locks
    **** DUMP OF ALL RESOURCE OBJECTS ****
    KD: Scanning for held locks...

    Resource @ 0xfffffa8011efede8 Exclusively owned
    Contention Count = 20
    NumberOfSharedWaiters = 16
    Threads: fffff88007fab7f3-02<*> *** Unknown owner, possibly FileSystem
    fffffa80169538a0-01 fffffa801ea69b60-01 fffffa8017dfd430-01 fffffa800cd76b60-01
    fffffa801512a410-01 fffffa801279b340-01 fffffa8016d079a0-01 fffffa8015452aa0-01
    fffffa801607bb60-01 fffffa8012f79b60-01 fffffa8013b4e040-01 fffffa801b03e300-01
    fffffa800cd77040-01 fffffa8013a8e040-01 fffffa800cd76040-01 fffffa80172d7490-01

     

    The error “*** Unknown owner, possibly FileSystem” is an indicator that the owner field of this eresource has likely been modified by ExSetResourceOwnerPointer.  Fortunately for us debuggers, programmers often point the owner field to a location on the thread stack.  You can pass an address to the !thread command and it will interpret the address as a stack value.

    2: kd> !thread fffff88007fab7f3 e
    fffff88007fab7f3 is not a thread object, interpreting as stack value...
    THREAD fffffa80169538a0 Cid 0004.0638 Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrResource) KernelMode Non-Alertable
    fffffa80139ea3e0 Semaphore Limit 0x7fffffff
    IRP List:
    fffffa8016fd1010: (0006,0310) Flags: 00000884 Mdl: 00000000
    Not impersonating
    DeviceMap fffff8a000008aa0
    Owning Process fffffa800cd6a5f0 Image: System
    Attached Process N/A Image: N/A
    Wait Start TickCount 27606952 Ticks: 141 (0:00:00:02.199)
    Context Switch Count 90787
    UserTime 00:00:00.000
    KernelTime 00:00:02.496
    Win32 Start Address nt!ExpWorkerThread (0xfffff80002293a50)
    Stack Init fffff88007fabdb0 Current fffff88007fab3a0
    Base fffff88007fac000 Limit fffff88007fa6000 Call 0
    Priority 14 BasePriority 13 UnusualBoost 1 ForegroundBoost 0 IoPriority 2 PagePriority 5
    Child-SP RetAddr Call Site
    fffff880`07fab3e0 fffff800`0228da52 nt!KiSwapContext+0x7a
    fffff880`07fab520 fffff800`0228fbaf nt!KiCommitThreadWait+0x1d2
    fffff880`07fab5b0 fffff800`0224ec9e nt!KeWaitForSingleObject+0x19f
    fffff880`07fab650 fffff800`022ad98c nt!ExpWaitForResource+0xae
    fffff880`07fab6c0 fffff880`0140fc10 nt!ExAcquireSharedStarveExclusive+0x1bc
    fffff880`07fab720 fffff880`0140f8e2 sis!SipDereferenceCSFile+0x40
    fffff880`07fab750 fffff880`0140f608 sis!SipDereferencePerLink+0x62
    fffff880`07fab780 fffff880`014102e7 sis!SipDereferenceScb+0x184
    fffff880`07fab7c0 fffff800`025796e6 sis!SiFilterContextFreedCallback+0xaf
    fffff880`07fab7f0 fffff880`016b9bcc nt!FsRtlTeardownPerStreamContexts+0xe2
    fffff880`07fab840 fffff880`016b98d5 Ntfs!NtfsDeleteScb+0x108
    fffff880`07fab880 fffff880`0162ccb4 Ntfs!NtfsRemoveScb+0x61
    fffff880`07fab8c0 fffff880`016b72dc Ntfs!NtfsPrepareFcbForRemoval+0x50
    fffff880`07fab8f0 fffff880`01635882 Ntfs!NtfsTeardownStructures+0xdc
    fffff880`07fab970 fffff880`016ce813 Ntfs!NtfsDecrementCloseCounts+0xa2
    fffff880`07fab9b0 fffff880`016a838f Ntfs!NtfsCommonClose+0x353
    fffff880`07faba80 fffff880`016cd7ef Ntfs!NtfsFspClose+0x15f
    fffff880`07fabb50 fffff880`01635c0d Ntfs!NtfsCommonCreate+0x193f
    fffff880`07fabd30 fffff800`0227e787 Ntfs!NtfsCommonCreateCallout+0x1d
    fffff880`07fabd60 fffff800`0227e741 nt!KySwitchKernelStackCallout+0x27 (TrapFrame @ fffff880`07fabc20)
    fffff880`085fffe0 fffff800`0229620a nt!KiSwitchKernelStackContinue
    fffff880`08600000 fffff880`01635b2f nt!KeExpandKernelStackAndCalloutEx+0x29a
    fffff880`086000e0 fffff880`016d29c0 Ntfs!NtfsCommonCreateOnNewStack+0x4f
    fffff880`08600140 fffff880`013330b6 Ntfs!NtfsFsdCreate+0x1b0
    fffff880`086002f0 fffff800`0258d717 fltmgr!FltpCreate+0xa6
    fffff880`086003a0 fffff800`0258379f nt!IopParseDevice+0x5a7
    fffff880`08600530 fffff800`02588b16 nt!ObpLookupObjectName+0x32f
    fffff880`08600630 fffff800`0258f827 nt!ObOpenObjectByName+0x306
    fffff880`08600700 fffff800`02599438 nt!IopCreateFile+0x2b7
    fffff880`086007a0 fffff880`01405bcf nt!NtCreateFile+0x78
    fffff880`08600830 fffff880`01405fbf sis!SipOpenBackpointerStream+0x10b
    fffff880`086008f0 fffff880`0140657d sis!SipOpenCSFileWork+0x3bf
    fffff880`08600c70 fffff800`02293b61 sis!SipOpenCSFile+0x21
    fffff880`08600cb0 fffff800`0252ea26 nt!ExpWorkerThread+0x111
    fffff880`08600d40 fffff800`02264866 nt!PspSystemThreadStartup+0x5a
    fffff880`08600d80 00000000`00000000 nt!KxStartSystemThread+0x16

     

    Looking at the call stack for the above thread we can see that sis.sys is trying to acquire the eresource shared.  Ordinarily, if a thread already owns an eresource exclusive, it can obtain it shared without first releasing the exclusive ownership.  In this scenario the kernel will compare the eresource’s owner field to the current thread and if they match the thread will be allowed to take shared ownership of the eresource.  This is where the danger of ExSetResourceOwnerPointer comes into play.  If you change the owner field with ExSetResourceOwnerPointer then this check fails because the owner field doesn’t match the current thread.

     

    The result of this scenario is that the thread waits for the exclusive owner to release the lock so this thread can get shared access.  Unfortunately this thread is the exclusive owner, and it is the shared waiter.  The thread has deadlocked on itself.

     

    Even if you are careful in your handling of the resource after calling ExSetResourceOwnerPointer, there is often a risk that your driver may be re-entered in the same thread and you may end up in a scenario you didn’t initially anticipate.  This is why using this API is dangerous, and should be avoided when not absolutely necessary.

     

    This issue demonstrated in this article was addressed in KB2608658 (issue 3), which is available for download from the Microsoft Download Center.

  • Ntdebugging Blog

    What Did Storport Do With My I/O?

    • 1 Comments

    In a previous article I showed how to track an I/O request from the filesystem, through the class driver, and to the storage driver.  In that article I concluded with "From this data we can usually assume that the request has been sent to the disk drive and we are waiting for the disk to respond" and "There may be conditions where the request is stuck in storport, or in the miniport".  In this article I want to show what happens to the I/O request when gets to storport.sys.

     

    You will find that the structures needed to investigate a request inside of storport can be dumped with the dt command using publicly available storport.sys symbols.  However, until now those structures have not been useful to most debuggers because we have not described how an IRP goes through storport, or how to find an IRP once it has been sent to storport.

     

    In the previous article I showed how to find the IRP that was sent to the storport miniport driver.  Below is the example IRP we will be using today.

     

    48: kd> !irp fffffa81`4135a010

    Irp is active with 5 stacks 3 is current (= 0xfffffa814135a170)

    Mdl=fffffa8232ad1100: No System Buffer: Thread 00000000:  Irpstack trace.

         cmd  flg cl Device   File     Completion-Context

    [  0, 0]   0  0 00000000 00000000 00000000-00000000

     

                            Args: 00000000 00000000 00000000 00000000

    [  0, 0]   0  0 00000000 00000000 00000000-00000000

     

                            Args: 00000000 00000000 00000000 00000000

    >[  f, 0]   0 e1 fffffa813ba0c060 00000000 fffff880011651a0-fffffa8141609cb0 Success Error Cancel pending

                   \Driver\stormini mpio!MPIOPdoCompletion

                            Args: fffffa8231a15c10 00000000 00000000 fffffa813ba0c1b0

    [  f, 0]   0 e1 fffffa80c329f400 00000000 fffff88001487f8c-fffffa81414bc9c0 Success Error Cancel pending

                   \Driver\mpio     EmcpBase!PowerPlatformIoEnqueue

                            Args: fffffa8231a15c10 00000000 00000000 fffffa8141609cb0

    [  f, 0]   0 e1 fffffa80c34ab550 00000000 fffff88001dbaa00-fffffa8231a15af0 Success Error Cancel pending

                   \Driver\EmcpBase CLASSPNP!TransferPktComplete

                            Args: fffffa8231a15c10 00000000 00000000 00000000

     

    Although the device object in the IRP is named \Driver\stormini, the dispatch routines for the driver all point to storport.  Storport is the port driver that handles queuing the request, stormini is the miniport.

     

    48: kd> !drvobj \Driver\stormini 2

    Driver object (fffffa80c19d5bf0) is for:

    \Driver\stormini

    DriverEntry:   fffff88001729008 stormini!StateHandler

    DriverStartIo: 00000000

    DriverUnload:  fffff8800168f28c stormini

    AddDevice:     fffff88001416dc0 storport!DriverAddDevice

     

    Dispatch routines:

    [00] IRP_MJ_CREATE                      fffff880014582a0        storport!RaDriverCreateIrp

    [01] IRP_MJ_CREATE_NAMED_PIPE           fffff80001a721d4        nt!IopInvalidDeviceRequest

    [02] IRP_MJ_CLOSE                       fffff880014581c0        storport!RaDriverCloseIrp

    [03] IRP_MJ_READ                        fffff80001a721d4        nt!IopInvalidDeviceRequest

    [04] IRP_MJ_WRITE                       fffff80001a721d4        nt!IopInvalidDeviceRequest

    [05] IRP_MJ_QUERY_INFORMATION           fffff80001a721d4        nt!IopInvalidDeviceRequest

    [06] IRP_MJ_SET_INFORMATION             fffff80001a721d4        nt!IopInvalidDeviceRequest

    [07] IRP_MJ_QUERY_EA                    fffff80001a721d4        nt!IopInvalidDeviceRequest

    [08] IRP_MJ_SET_EA                      fffff80001a721d4        nt!IopInvalidDeviceRequest

    [09] IRP_MJ_FLUSH_BUFFERS               fffff80001a721d4        nt!IopInvalidDeviceRequest

    [0a] IRP_MJ_QUERY_VOLUME_INFORMATION    fffff80001a721d4        nt!IopInvalidDeviceRequest

    [0b] IRP_MJ_SET_VOLUME_INFORMATION      fffff80001a721d4        nt!IopInvalidDeviceRequest

    [0c] IRP_MJ_DIRECTORY_CONTROL           fffff80001a721d4        nt!IopInvalidDeviceRequest

    [0d] IRP_MJ_FILE_SYSTEM_CONTROL         fffff80001a721d4        nt!IopInvalidDeviceRequest

    [0e] IRP_MJ_DEVICE_CONTROL              fffff88001453010        storport!RaDriverDeviceControlIrp

    [0f] IRP_MJ_INTERNAL_DEVICE_CONTROL     fffff880014016c0        storport!RaDriverScsiIrp

    [10] IRP_MJ_SHUTDOWN                    fffff80001a721d4        nt!IopInvalidDeviceRequest

    [11] IRP_MJ_LOCK_CONTROL                fffff80001a721d4        nt!IopInvalidDeviceRequest

    [12] IRP_MJ_CLEANUP                     fffff80001a721d4        nt!IopInvalidDeviceRequest

    [13] IRP_MJ_CREATE_MAILSLOT             fffff80001a721d4        nt!IopInvalidDeviceRequest

    [14] IRP_MJ_QUERY_SECURITY              fffff80001a721d4        nt!IopInvalidDeviceRequest

    [15] IRP_MJ_SET_SECURITY                fffff80001a721d4        nt!IopInvalidDeviceRequest

    [16] IRP_MJ_POWER                       fffff8800141a9d0        storport!RaDriverPowerIrp

    [17] IRP_MJ_SYSTEM_CONTROL              fffff88001458f40        storport!RaDriverSystemControlIrp

    [18] IRP_MJ_DEVICE_CHANGE               fffff80001a721d4        nt!IopInvalidDeviceRequest

    [19] IRP_MJ_QUERY_QUOTA                 fffff80001a721d4        nt!IopInvalidDeviceRequest

    [1a] IRP_MJ_SET_QUOTA                   fffff80001a721d4        nt!IopInvalidDeviceRequest

    [1b] IRP_MJ_PNP                         fffff88001459b00        storport!RaDriverPnpIrp

     

    To find what storport has done with the IRP we must look at the unit.  The device extension contains the information regarding the state of the unit.

     

    48: kd> !devobj fffffa813ba0c060

    Device object (fffffa813ba0c060) is for:

    000000aa \Driver\stormini DriverObject fffffa80c19d5bf0

    Current Irp 00000000 RefCount 0 Type 00000007 Flags 00001050

    Dacl fffff9a10049abb0 DevExt fffffa813ba0c1b0 DevObjExt fffffa813ba0c630 DevNode fffffa80c19dad90

    ExtensionFlags (0x00000800)

                                 Unknown flags 0x00000800

    AttachedDevice (Upper) fffffa80c31ef9d0 \Driver\Disk

    Device queue is not busy.

    48: kd> dt storport!_RAID_UNIT_EXTENSION fffffa813ba0c1b0

       +0x000 ObjectType       : 1 ( RaidUnitObject )

       +0x008 DeviceObject     : 0xfffffa81`3ba0c060 _DEVICE_OBJECT

       +0x010 Adapter          : 0xfffffa80`c25a61b0 _RAID_ADAPTER_EXTENSION

       +0x018 SlowLock         : 0

       +0x020 DeviceState      : 1 ( DeviceStateWorking )

       +0x028 NextUnit         : _LIST_ENTRY [ 0xfffffa80`c25941d8 - 0xfffffa80`c25a6228 ]

       +0x038 UnitTableLink    : _LIST_ENTRY [ 0xfffffa80`c25a3320 - 0xfffffa80`c25e7b48 ]

       +0x048 Address          : _STOR_SCSI_ADDRESS

       +0x050 Identity         : _STOR_SCSI_IDENTITY

       +0x070 AlignmentInfo    : _STOR_ALIGNMENT_INFO

       +0x080 Flags            : <unnamed-tag>

       +0x088 ZeroOutstandingEvent : _KEVENT

       +0x0a0 OutstandingCount : 0n30

       +0x0a4 PagingPathCount  : 0

       +0x0a8 CrashDumpPathCount : 0

       +0x0ac HiberPathCount   : 0

       +0x0b0 TagList          : _QUEUE_TAG_LIST

       +0x0d0 IoQueue          : _IO_QUEUE

       +0x140 MaxQueueDepth    : 0xff

       +0x144 Power            : _RAID_POWER_STATE

       +0x150 PendingQueue     : _STOR_EVENT_QUEUE

       +0x170 PendingTimer     : _KTIMER

       +0x1b0 PendingDpc       : _KDPC

       +0x1f0 PauseTimer       : _KTIMER

       +0x230 PauseTimerDpc    : _KDPC

       +0x270 RestartDpc       : _KDPC

       +0x2b0 CommonBufferVAs  : (null)

       +0x2b8 CommonBufferSize : 0

       +0x2bc CommonBufferBlocks : 0

       +0x2c0 UnitExtension    : 0xfffffa80`c185dda0 Void

       +0x2c8 DefaultTimeout   : 0xa

       +0x2d0 DeferredList     : <unnamed-tag>

       +0x410 ResetCount       : 0n1

       +0x418 ResetResources   : _RAID_IO_RESOURCES

       +0x438 ResetResourcesAcquired : 0n1

       +0x43c SenseInfoSize    : 0x14 ''

       +0x440 IdleCounter      : (null)

       +0x448 PowerUpRequired  : 0n0

       +0x450 HierarchicalResetWorkItem : (null)

       +0x458 HierarchicalResetWorkItemTimeoutCountdown : 0n0

       +0x45c HierarchicalResetAbortCount : 1

       +0x460 HierarchicalResetWorkItemSpinLock : 0

       +0x468 RegistryParameters : _RAID_UNIT_REGISTRY_PARAMETERS

     

    Two parts of the device extension are relevant to finding the I/O request, the PendingQueue and the IoQueue.  Under most conditions, when a request is sent to storport it will be sent to the miniport driver and put on the PendingQueue.  The PendingQueue has a list of outstanding requests and a timer that is decremented every second.  The timer is refreshed when the request on the head of the queue is completed.  A request is completed when the miniport calls StorPortNotification with RequestComplete.  If the timer is decremented to 0, the request has timed out and storport will do a hierarchical reset to attempt to correct the problem.

     

    In the below PendingQueue we can see that there is one entry in the queue, and the Timeout is set to -2.  Under most conditions the Timeout will be a value between 0 and 0x3c (0n60), indicating the number of seconds left in the timer.  A Timeout value of -2 indicates that there was a timeout and it is being handled.  A Timeout value of -1 indicates that the queue is empty.

     

    48: kd> dt storport!_RAID_UNIT_EXTENSION fffffa813ba0c1b0 PendingQueue.

       +0x150 PendingQueue  :

          +0x000 List          : _LIST_ENTRY [ 0xfffff880`05906040 - 0xfffff880`05906040]

          +0x010 Lock          : 0

          +0x018 Timeout       : 0xfffffffe

     

    Items in the queue will be in the form of an XRB, or Extended Request Block.  There is one request in the pending queue (we know there is one request because the Flink and Blink point to the same address).  The Irp for this XRB is not our IRP, so we need to go look somewhere else for our request.

     

    48: kd> dt storport!_EXTENDED_REQUEST_BLOCK 0xfffff880`05906040-30

       +0x000 Signature        : 0x1f2e3d4c

       +0x008 Pool             : (null)

       +0x010 OwnedMdl         : 0y0

       +0x010 RemoveFromEventQueue : 0y1

       +0x010 State            : 0y011

       +0x010 RemappedSenseInfo : 0y1

       +0x012 InitiatingProcessor : _PROCESSOR_NUMBER

       +0x018 InitiatingToken  : (null)

       +0x020 CompletedLink    : _SLIST_ENTRY

       +0x030 PendingLink      : _STOR_EVENT_QUEUE_ENTRY

       +0x048 Mdl              : (null)

       +0x050 SgList           : (null)

       +0x058 RemappedSgListMdl : (null)

       +0x060 RemappedSgList   : (null)

       +0x068 Irp              : 0xfffffa82`35142ee0 _IRP

       +0x070 Srb              : 0xfffffa81`41cf54a0 _SCSI_REQUEST_BLOCK

       +0x078 SrbData          : <unnamed-tag>

       +0x098 Adapter          : 0xfffffa80`c25a61b0 _RAID_ADAPTER_EXTENSION

       +0x0a0 Unit             : 0xfffffa81`3ba0c1b0 _RAID_UNIT_EXTENSION

       +0x0a8 ScatterGatherBuffer : [424]  ""

       +0x250 CompletionRoutine : 0xfffff880`014199d0     void  storport!RaidUnitCompleteResetRequest+0

       +0x258 u                : <unnamed-tag>

       +0x270 RequestStartTimeStamp : 0

     

    If the unit is in a state where the request cannot be sent to the miniport, the requests will be queued on the DeviceQueue in the IoQueue. Storport may not be able to send the request to the miniport because the queue is full, the queue is paused, or because an untagged request is outstanding and tagged requests cannot be sent until the untagged request is completed.

     

    In the below DeviceQueue we can see that the unit is paused (the PauseCount is 1).  This would explain why our request is not in the PendingQueue.  It should be in the DeviceQueue.

     

    48: kd> dt storport!_RAID_UNIT_EXTENSION fffffa813ba0c1b0 IoQueue.DeviceQueue.

       +0x0d0 IoQueue              :

          +0x010 DeviceQueue          :

             +0x000 Type                 : 0n258

             +0x002 Size                 : 0n88

             +0x008 Gateway              : 0xfffffa80`c25a6470_STOR_IO_GATEWAY

             +0x010 DeviceList           : _LIST_ENTRY [ 0xfffffa81`4232d3a8 - 0xfffffa81`4172c768 ]

             +0x020 ByPassList           : _LIST_ENTRY [ 0xfffffa81`3ba0c2b0 - 0xfffffa81`3ba0c2b0 ]

             +0x030 Lock                 : 0

             +0x038 Depth                : 0n32

             +0x03c OutstandingCount     : 0n0

             +0x040 DeviceCount          : 0n27

             +0x044 ByPassCount          : 0n0

             +0x048 ByPassPowerCount     : 0n0

             +0x04c PauseCount           : 0n1

             +0x050 BusyCount            : 0n0

             +0x054 Frozen               : 0 ''

             +0x055 Locked               : 0 ''

             +0x056 Untagged             : 0 ''

             +0x057 PowerLocked          : 0 ''

     

    Requests from the DeviceQueue are in the DeviceList.  We can dump the DeviceList with !list and look for our IRP.

     

    48: kd> !list "-t nt!_IRP.Tail.Overlay.DeviceQueueEntry.DeviceListEntry.Flink -x\"??@$extret\" 0xfffffa814232d3a8-78"

    unsigned int64 0xfffffa81`4232d330

     

    unsigned int64 0xfffffa81`bea59630

     

    unsigned int64 0xfffffa81`41dad420

     

    unsigned int64 0xfffffa82`32e90770

     

    unsigned int64 0xfffffa81`4135a010

    <snip the end of the list>

     

    Now that we have found our request in the DeviceQueue's DeviceList we may want to look into why the request has not been sent to the disk and put on the PendingQueue.  Earlier I mentioned that the Timeout value in the PendingQueue is -2, indicating that a request in the PendingQueue has timed out.  If we take another look at the XRB from the PendingQueue, we see that the CompletionRoutine is RaidUnitCompleteResetRequest.  This indicates that a LUN reset request was sent to the disk, and it timed out. When a LUN reset times out storport calls the miniport’s bus reset function. When the bus reset function is called the miniport is required to complete all requests issued to it.  In this instance the reset was issued, but for some reason the reset has not yet completed and so all of the outstanding requests are stuck.  For more information on storport timeouts, see our article on Event 129 Errors.

  • Ntdebugging Blog

    Hotfix to Enable Mini-Filter Performance Diagnostics With XPerf for Windows Server 2008R2

    • 2 Comments

    Greetings ntdebugging community, Bob here again and today I would like to let everyone know about a new feature implemented in Windows Server 2008 R2’s kernel and filter manager binaries released in knowledge base article 2666390.

     

    Beginning with this update, a minifilter that is adversely affecting system performance can be identified in Windows 2008 R2. These measurements are taken within the filter manager and measure the execution time of the minifilter’s pre and post operations.  On every file access file system minifilters are given the chance to perform pre and post modifications to IO operations within the driver stack.

     

    To start recording the operations and to view the results, the latest version of the Windows Performance Toolkit from the Windows 8 beta ADK needs to be installed on the system. How to install the package can be found in the KB article.

     

    The command to start recording and to record stack traces is (note that these commands may wrap depending on your screen resolution, they are intended to be typed in one line):

    xperf -on  PROC_THREAD+LOADER+FLT_IO_INIT+FLT_IO+FLT_FASTIO+FLT_IO_FAILURE+FILENAME -stackwalk MiniFilterPreOpInit+MiniFilterPostOpInit

     

    The command above starts a trace with most of the buffering done in memory. The command below specifies a file used as a backing store to use less memory:

    xperf -start -f <traceFile> -on  PROC_THREAD+LOADER+FLT_IO_INIT+FLT_IO+FLT_FASTIO+FLT_IO_FAILURE+FILENAME -stackwalk MiniFilterPreOpInit+MiniFilterPostOpInit

     

    Note: <traceFile> is the path of the file to store the backing file (i.e. trace.etl).  It is recommended that the backing file be placed on a drive other than the one experiencing the problem, to avoid xperf operations interfering with or contributing to the scenario you are measuring.  You can add the +FILENAME switch if you want to log names of the files that the minifilter is operating on.

     

    You may not know if a minifilter is causing the problem until after you begin data analysis, so you can add the filter manager tracing to the Driver Delays command presented in my previous blog, Using Xperf to investigate slow I/O issues.

    xperf -on PROC_THREAD+LOADER+CSWITCH+FILENAME+FILE_IO+FILE_IO_INIT+DRIVERS+FLT_IO_INIT+FLT_IO+FLT_FASTIO+FLT_IO_FAILURE -f kernel.etl -stackwalk CSwitch+DiskReadInit+DiskWriteInit+DiskFlushInit+FileCreate+FileCleanup+FileClose+FileRead+FileWrite+MiniFilterPreOpInit+MiniFilterPostOpInit -BufferSize 1024 -MaxBuffers 1024 -MaxFile 1024 -FileMode Circular

     

    To stop the trace and flush data to the trace file, type the following command:

    xperf –stop –d mymergedtrace.etl

     

    The file specified must be a different file name than the one provided with the –f switch if that option was used when starting the trace.

     

    Before we begin looking at the trace, let’s talk a little about what we are tracing. We are tracing fltmgr calling into the minifilter.  The fltmgr manages minifilters.  When a file operation is done, each minifilter has a chance to process and optionally modify an operation on a file before and after the call goes into the file system.  For example, in this case we are going to see results of the time it took for the minifilter to do a “pre create” operation. That is the function that is called before the “create file” function is sent to the file system.  This tracing can also track the time it took for the minifilter to do a “post create” operation.  That is the minifilter function that gets called after the create file is sent to the file system.  So the minifilter is called before and after every file system operation. The minifilter may not have functions to process before and after, however it has the option.

     

    Once the problem is traced, the data can be viewed by clicking on the merged etl file and you will see a similar presentation as below.

     image001 

     

    The Mini-Filter Delays are in the Storage display. So click on the triangle on the left side of Storage and you will see:

    image002 

     

    To get more detail, right click on the “Mini-Filter Delays” section and select “Add To New Analysis View”and that will bring up minifilter delays detail as below.

    Image3

     

    The Duration is the time in micro seconds of the total time delay in the minifilter. So in the example above the scanner minifilter did 4,068 requests and it took 30 seconds total. 

     

    We can expand the display by clicking on the triangle next to scanner.sys to get a more detailed view, I chose to investigate scanner.sys because it has the longest Duration.  I dragged the “Major Function Name (Count)” and “File Name” columns to the left of the yellow bar to get detail and used the triangles to expand the ProcessID with the longest Duration, and then I expanded the File Name with the longest Duration.   To get more usable space on your monitor, you can click the X to close the Graph Explorer (to get it back, from the Window menu choose Graph Explorer).

    Image4

     

    On the display above it shows the scanner.sys filter was operating on the file “C:\users\Administrator\Desktop\scanner\test2.txt”   The green 6 is the total operations. So there were three cleanups and three creates.  So the display is telling us that the filter did 3 “create” operations that took close to 15 seconds and 3 cleanup operations that took 5 milliseconds on the file.

     

    If you prefer operating from a command line rather than a GUI, this same data can be extracted using the below command.  Note that -min specifies the time in ns, so this command will show delays longer than 100 ms.

    xperf -i mymergedtrace.etl -a minifilterdelay -min 100000

     

    This will give you output similar to the below information:

    [1/2]    100.0%

    [2/2]    100.0%

    There are 3 MiniFilter Delay completion events in given time range.

       Process Name,     Module Name,       Call (us),     Return (us),  Time (us), Major Func Name,                       Filename

       explorer.exe,     scanner.sys,        18649145,        23645502,    4996357,          CREATE, C:\Users\Administrator\Desktop\scanner\test2.txt

       explorer.exe,     scanner.sys,        23646355,        28644638,    4998282,          CREATE, C:\Users\Administrator\Desktop\scanner\test2.txt

       explorer.exe,     scanner.sys,        28645194,        33645158,    4999964,          CREATE, C:\Users\Administrator\Desktop\scanner\test2.txt

     

    It seems there is a problem in the create file handling of the scanner.sys minifilter.  For some reason the create file function in the minifiltertook 5 seconds for each operation.  The vendor of this filter would need to do additional investigation to identify why the filter has this delay.

  • Ntdebugging Blog

    Debugging a Crash, Found a Trojan

    • 8 Comments

    Hi, I'm Manish from Global Escalation Services. I would like to present a multiple random bug check issue, which was caused by malicious code (trojan) running on the machine. This is the walkthrough of how we found the virus on the server.


    In this particular dump, the machine crashed with Bugcheck 0xA (IRQL_NOT_LESS_OR_EQUAL) because we got a Page Fault at dispatch level, also known as DPC Level (IRQL 2). Windows system architecture governs that we cannot have a page fault at dispatch level because paging requires I/O, I/O requires a wait, and we cannot wait while the IRQL is above dispatch level.  So when this anomaly happens Windows will intentionally crash the machine.

    We trapped in CcMapData because we touched the address c226d800 which was not valid because it’s paged out. If we look at implementation of CcMapData on MSDN http://msdn.microsoft.com/en-us/library/windows/hardware/ff539155(v=vs.85).aspx we see that this function can only be called below Dispatch Level, but the current CPU IRQL is 2.


    So how did this happen?  The most likely possibility is that some driver on the stack raised the IRQL by calling KeRaiseIrql and then forgot to lower it by calling KeLowerIrql.  There are many drivers on this stack and anyone could be the culprit. It is difficult to track when one of these may have done it as it’s long gone.


    We could have enabled IRQL Checking using Driver Verifier to find the culprit but we have to enable it on all drivers listed in the stack (and possibly others), which could cause some performance issues.  Also this was not the only crash we were seeing; there were various stop codes.

    So before going the verifier route I decided to dig more into this dump. I noticed there is an address on the stack for which the module name is not getting resolved. This looked odd (suspicious) as most of the time the debugger does a great job of finding the module.

    I started investigating what is this module.  It turned out to be a Trojan “TrojanDropper:Win32/Sirefef.B”.
    http://www.microsoft.com/security/portal/Threat/Encyclopedia/Entry.aspx?Name=TrojanDropper%3AWin32%2FSirefef.B

     

    0: kd> !analyze -v

    *******************************************************************************

    *                                                                             *

    *                        Bugcheck Analysis                                    *

    *                                                                             *

    *******************************************************************************

    IRQL_NOT_LESS_OR_EQUAL (a)

    An attempt was made to access a pageable (or completely invalid) address at an

    interrupt request level (IRQL) that is too high.  This is usually

    caused by drivers using improper addresses.

    If a kernel debugger is available get the stack backtrace.

    Arguments:

    Arg1: c226d800, memory referenced

    Arg2: d0000002, IRQL

    Arg3: 00000000, bitfield :

          bit 0 : value 0 = read operation, 1 = write operation

          bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)

    Arg4: 808b64a6, address which referenced memory

     

    0: kd> kv

    ChildEBP RetAddr  Args to Child             

    f78ae41c 808b64a6 badb0d00 00000000 00000001 nt!KiTrap0E+0x2a7 (FPO: [0,0] TrapFrame @ f78ae41c)

    f78ae4cc f71a6f2d 8b22d520 f78ae4fc 00000400 nt!CcMapData+0x8c (FPO: [Non-Fpo])

    f78ae4ec f71a4494 f78ae7ec 8b64c150 01c6d800 Ntfs!NtfsMapStream+0x4b (FPO: [Non-Fpo])

    f78ae560 f71a6df0 f78ae7ec 8b3a7100 e7c50ce0 Ntfs!NtfsReadMftRecord+0x86 (FPO: [Non-Fpo])

    f78ae598 f71a6fac f78ae7ec 8b3a7100 e7c50ce0 Ntfs!NtfsReadFileRecord+0x7a (FPO: [Non-Fpo])

    f78ae5d0 f718e46d f78ae7ec e7c50cd8 e7c50ce0 Ntfs!NtfsLookupInFileRecord+0x37 (FPO: [Non-Fpo])

    f78ae67c f718e541 f78ae7ec e7c50cd8 f718e2cb Ntfs!NtfsWalkUpTree+0xbe (FPO: [Non-Fpo])

    f78ae6d8 f718e263 f78ae7ec e7c50cd8 00000000 Ntfs!NtfsBuildNormalizedName+0x44 (FPO: [Non-Fpo])

    f78ae704 f7196c2e f78ae7ec 8801d600 e7c50da0 Ntfs!NtfsQueryNameInfo+0x4b (FPO: [Non-Fpo])

    f78ae774 f71a0ff6 f78ae7ec 88081cb0 8b5f9260 Ntfs!NtfsCommonQueryInformation+0x291 (FPO: [Non-Fpo])

    f78ae7d8 f71a102f f78ae7ec 88081cb0 00000001 Ntfs!NtfsFsdDispatchSwitch+0x12a (FPO: [Non-Fpo])

    f78ae8f4 8081df85 8b3a7020 88081cb0 88081cb0 Ntfs!NtfsFsdDispatchWait+0x1c (FPO: [Non-Fpo])

    f78ae908 f721fd28 8b2fd220 8b60d308 8b54b020 nt!IofCallDriver+0x45 (FPO: [Non-Fpo])

    f78ae934 8081df85 8b5f9260 88081cb0 88081cb0 fltMgr!FltpDispatch+0x152 (FPO: [Non-Fpo])

    f78ae948 f721fd28 88081ed0 8b60d308 c000000d nt!IofCallDriver+0x45 (FPO: [Non-Fpo])

    f78ae974 8081df85 8b54b020 88081cb0 88081cb0 fltMgr!FltpDispatch+0x152 (FPO: [Non-Fpo])

    f78ae988 f5c62bec 88081cb0 8b54a980 88081cb0 nt!IofCallDriver+0x45 (FPO: [Non-Fpo])

    WARNING: Stack unwind information not available. Following frames may be wrong.

    f78ae99c f5c5e3ee 8b03b690 00000000 f78ae9c0 CtxSbx+0x5bec

    f78ae9ac 8081df85 8b03b690 88081cb0 8b0528e0 CtxSbx+0x13ee

    f78ae9c0 f61074e1 8b0528e0 8801d600 f78aea00nt!IofCallDriver+0x45 (FPO: [Non-Fpo])

    f78ae9e4 f61075d0 8b0a9b80 00081cb0 f78aea08 CtxAltStr+0x44e1

    f78ae9f4 8081df85 8b0a9b80 88081cb0 88081ed0 CtxAltStr+0x45d0

    f78aea08 8b5cfc89 00000000 87ef0000 87ef9002 nt!IofCallDriver+0x45 (FPO: [Non-Fpo])

    f78aec90 8081df85 8b5ef610 8b08c110 8b08c2a8 0x8b5cfc89 <------------ This looks odd what is this module?

    f78aeca4 f7241607 8b08c2a8 00000000 f78aece8nt!IofCallDriver+0x45 (FPO: [Non-Fpo])

    f78aecb4 f72412b2 8b08c2a8 8b392b70 87f11974 CLASSPNP!SubmitTransferPacket+0xbb (FPO: [Non-Fpo])

    f78aece8 f7241533 00000000 00000e00 87f11808 CLASSPNP!ServiceTransferRequest+0x1e4 (FPO: [Non-Fpo])

    f78aed0c 8081df85 8b392ab8 00000000 8b2a2670 CLASSPNP!ClassReadWrite+0x159 (FPO: [Non-Fpo])

    f78aed20 f74c80cf 8b2b7d80 87f11998 f78aed44 nt!IofCallDriver+0x45 (FPO: [Non-Fpo])

    f78aed30 8081df85 8b6e4020 87f11808 87f119bc PartMgr!PmReadWrite+0x95 (FPO: [Non-Fpo])

    f78aed44 f7317053 87f119d8 8b6e8148 882b9888 nt!IofCallDriver+0x45 (FPO: [Non-Fpo])

    f78aed60 8081df85 8b2b7cc8 87f11808 87f119fc ftdisk!FtDiskReadWrite+0x1a9 (FPO: [Non-Fpo])

    f78aed74 f72cf4f5 885360cc 88536098 87f26810 nt!IofCallDriver+0x45 (FPO: [Non-Fpo])

    f78aed90 f72d1517 87f11808 885360cc 8b60c470 volsnap!VspDecrementIrpRefCount+0x14f (FPO: [Non-Fpo])

    f78aeda8 f72c0398 87f26810 87f26810 88536098 volsnap!VspWriteVolumePhase22+0x3d (FPO: [Non-Fpo])

    f78aee30 f72d15d3 00f26810 885360cc 00000000 volsnap!VspAcquireNonPagedResource+0xc6 (FPO: [Non-Fpo])

    f78aee50 8081e123 00000000 87f215e8 88536098 volsnap!VspWriteVolumePhase2+0x59 (FPO: [Non-Fpo])

    f78aee80 f7241829 f78aeeb0 f72413ec 8b392ab8 nt!IopfCompleteRequest+0xcd (FPO: [Non-Fpo])

    f78aee88 f72413ec 8b392ab8 87f215e8 00000001 CLASSPNP!ClassCompleteRequest+0x11 (FPO: [Non-Fpo])

    f78aeeb0 8081e123 00000000 8b08c428 8b08c5c0 CLASSPNP!TransferPktComplete+0x1fd (FPO: [Non-Fpo])

    f78aeee0 f7266545 8b6e30e8 8b08c428 f78aef24 nt!IopfCompleteRequest+0xcd (FPO: [Non-Fpo])

    f78aeef0 f7265a8a 87f856a8 00000001 00000000 SCSIPORT!SpCompleteRequest+0x5e (FPO: [Non-Fpo])

    f78aef24 f7265130 8b6e30e8 87f856a8 f78aef9b SCSIPORT!SpProcessCompletedRequest+0x6a7 (FPO: [Non-Fpo])

    f78aef9c 8083211c 8b6e30a4 8b6e3030 00000000 SCSIPORT!ScsiPortCompletionDpc+0x2bd (FPO: [Non-Fpo])

    f78aeff4 8088dba7 f3f4f92c 00000000 00000000 nt!KiRetireDpcList+0xca (FPO: [Non-Fpo])

    f3f4f948 80a603d9 ffdffa02 f3f4f980 f3f4f980 nt!KiDispatchInterrupt+0x37 (FPO: [Uses EBP] [0,0,1])

    f3f4f964 80a60577 8b6c226c f3f4f980 8088d91d hal!HalpCheckForSoftwareInterrupt+0x81 (FPO: [Non-Fpo])

    f3f4f970 8088d91d 8b304c00 000001a3 f3f4fa04 hal!HalEndSystemInterrupt+0x67 (FPO: [Non-Fpo])

    f3f4f970 809395e7 8b304c00 000001a3 f3f4fa04 nt!KiInterruptDispatch+0x5d (FPO: [0,2] TrapFrame @ f3f4f980)

    f3f4fa04 80939c37 88139178 88028301 00000008 nt!ObpAllocateObject+0x199 (FPO: [Non-Fpo])

    f3f4fa38 808f8d28 00000000 8b76aad0 f3f4fa70 nt!ObCreateObject+0x129 (FPO: [Non-Fpo])

    f3f4fb44 80937a40 8b6afd10 00000000 88028398 nt!IopParseDevice+0x710 (FPO: [Non-Fpo])

    f3f4fbc4 80933b74 00000000 f3f4fc04 00000042 nt!ObpLookupObjectName+0x5b0 (FPO: [Non-Fpo])

    f3f4fc18 808eaee7 00000000 00000000 dffa7c01 nt!ObOpenObjectByName+0xea (FPO: [Non-Fpo])

    f3f4fc94 808ec181 077ef198 c0140000 077ef14c nt!IopCreateFile+0x447 (FPO: [Non-Fpo])

    f3f4fcf0 808eec10 077ef198 c0140000 077ef14c nt!IoCreateFile+0xa3 (FPO: [Non-Fpo])

    f3f4fd30 808897ec 077ef198 c0140000 077ef14c nt!NtCreateFile+0x30 (FPO: [Non-Fpo])

    f3f4fd30 7c82847c 077ef198 c0140000 077ef14c nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ f3f4fd64)

    077ef22c 00000000 00000000 00000000 00000000 0x7c82847c

       

    The current CPU IRQL is 2, at which pagefaults cannot be serviced, hence windows crashed the machine.

    0: kd> !irql

    Debugger saved IRQL for processor 0x0 -- 2 (DISPATCH_LEVEL)

     

    Un-assembling from the return address of this unknown driver, to try to confirm that this is really driver code and that it really does belong in this call stack:

    0: kd> ub 8b5cfc89

    8b5cfc62 8bd7            mov     edx,edi

    8b5cfc64 c60605          mov     byte ptr [esi],5

    8b5cfc67 895e18          mov     dword ptr [esi+18h],ebx

    8b5cfc6a c7460400020000  mov     dword ptr [esi+4],200h

    8b5cfc71 c7460809000000  mov     dword ptr [esi+8],9

    8b5cfc78 c7461c70fb5c8b  mov     dword ptr [esi+1Ch],8B5CFB70h

    8b5cfc7f c64603e0        mov     byte ptr [esi+3],0E0h

    8b5cfc83 ff158c805d8b    call    dword ptr ds:[8B5D808Ch]


    Sure does call IofCallDriver

    0: kd> dps 8B5D808Ch l1

    8b5d808c  8081df40 nt!IofCallDriver

     

    So what is this driver? In Windows every image starts with a PE header which starts with letters “MZ”. So I started looking for PE header. I took the offset of the module on the stack and went back to its page boundary, then I started moving 1 page back at a time looking for the PE header.

    0: kd> dc 8b5cf000

    8b5cf000  0010b9f2 f02b0000 3b063c8b 83137538  ......+..<.;8u..

    8b5cf010  c08304e9 04f98304 b85fee73 00000001  ........s._.....

    8b5cf020  9cb8c35e b98b5d86 00000010 ff8bd02b  ^....]......+...

    8b5cf030  3b02348b 83137530 c08304e9 04f98304  .4.;0u..........

    8b5cf040  b85fee73 00000001 335fc35e ccc35ec0  s._.....^._3.^..

    8b5cf050  83ec8b55 78a04cec 538b5da4 c0b60f56  U....L.x.].SV...

    8b5cf060  827ae857 ff330000 33f46589 f845c7db  W.z...3..e.3..E.

    8b5cf070  00000400 8b084d8b 52510c55 50b4458d  .....M..U.QR.E.P

     

    0: kd> dc 8b5cf000-1000

    8b5ce000  01c73024 00000000 c70cc483 00800002  $0..............

    8b5ce010  845e5f00 b10874db 0815ff01 8b8b5d80  ._^..t.......]..

    8b5ce020  6a0c2444 15ff5000 8b5d8028 14c25b5d  D$.j.P..(.].][..

    8b5ce030  cccccc00 cccccccc cccccccc cccccccc  ................

    8b5ce040  6a306a56 2415ff00 8b8b5d80 74f685f0  Vj0j...$.].....t

    8b5ce050  6a006a53 68026a01 8b5d0892 5d08ae68  Sj.j.j.h..].h..]

    8b5ce060  ff006a8b 5d80ac15 ff56508b 5d80a815  .j.....].PV....]

    8b5ce070  000d8b8b ff8b5da2 5d80b015 6a006a8b  .....].....].j.j


    0: kd> dc 8b5cf000-1000*2

    8b5cd000  0689c033 89044689 46890846 1046890c  3....F..F..F..F.

    8b5cd010  89144689 ec831846 1c46891c 57204689  .F..F.....F..F W

    8b5cd020  85244689 8d0574f6 02eb284e c033c933  .F$..t..N(..3.3.

    8b5cd030  41890189 08418904 8b34468b 4e891048  ...A..A..F4.H..N

    8b5cd040  14508b10 8b145689 4e891848 1c508b18  ..P..V..H..N..P.

    8b5cd050  8b1c5689 4e891848 1c508b20 2424448d  .V..H..N .P..D$$

    8b5cd060  244c8d50 186a510c 8b245689 6a302454  P.L$.Qj..V$.T$0j

    8b5cd070  46c65205 46c70005 00000008 0c46c700  .R.F...F......F.

     

    Finally I found the PE header for this image.

    0: kd> dc 8b5cf000-1000*3

    8b5cc000  00905a4d 00000003 00000004 0000ffff  MZ..............

    8b5cc010  000000b8 00000000 00000040 00000000  ........@.......

    8b5cc020  00000000 00000000 00000000 00000000  ................

    8b5cc030  00000000 00000000 00000000 000000d0  ................

    8b5cc040  0eba1f0e cd09b400 4c01b821 685421cd  ........!..L.!Th

    8b5cc050  70207369 72676f72 63206d61 6f6e6e61  is program canno

    8b5cc060  65622074 6e757220 206e6920 20534f44  t be run in DOS

    8b5cc070  65646f6d 0a0d0d2e 00000024 00000000  mode....$.......

     

    Using the built-in debugger extension !dh I dumped the header of this image to find the name.  Unfortunately there is no name for this image. This address 8b5cc000 is not in the loaded module list, which raised further suspicion. Either it is hiding its load address or this driver was not loaded by standard loading mechanism. This module date shows it’s pretty recent build.

    0: kd> !dh 8b5cc000

     

    File Type: DLL

    FILE HEADER VALUES

         14C machine (i386)

           4 number of sections

    4EA3461E time date stamp Sun Oct 23 04:09:26 2011

     

           0 file pointer to symbol table

           0 number of symbols

          E0 size of optional header

        2102 characteristics

                Executable

                32 bit word machine

                DLL

     

    OPTIONAL HEADER VALUES

         10B magic #

        9.00 linker version

        A400 size of code

        2000 size of initialized data

           0 size of uninitialized data

        46C0 address of entry point

        1000 base of code

             ----- new -----

    10000000 image base

        1000 section alignment

         200 file alignment

           1 subsystem (Native)

        5.00 operating system version

        0.00 image version

        5.00 subsystem version

       10000 size of image

         400 size of headers

        DF1F checksum

    00100000 size of stack reserve

    00001000 size of stack commit

    00100000 size of heap reserve

    00001000 size of heap commit

           0  DLL characteristics

           0 [       0] address [size] of Export Directory

        C91C [      3C] address [size] of Import Directory

           0 [       0] address [size] of Resource Directory

           0 [       0] address [size] of Exception Directory

           0 [       0] address [size] of Security Directory

        F000 [     3FC] address [size] of Base Relocation Directory

           0 [       0] address [size] of Debug Directory

           0 [       0] address [size] of Description Directory

           0 [       0] address [size] of Special Directory

           0 [       0] address [size] of Thread Storage Directory

           0 [       0] address [size] of Load Configuration Directory

           0 [       0] address [size] of Bound Import Directory

        C000 [     1A8] address [size] of Import Address Table Directory

           0 [       0] address [size] of Delay Import Directory

           0 [       0] address [size] of COR20 Header Directory

           0 [       0] address [size] of Reserved Directory

     

     

    SECTION HEADER #1

       .text name

        A354 virtual size

        1000 virtual address

        A400 size of raw data

         400 file pointer to raw data

           0 file pointer to relocation table

           0 file pointer to line numbers

           0 number of relocations

           0 number of line numbers

    60000020 flags

             Code

             (no align specified)

             Execute Read

     

    SECTION HEADER #2

      .rdata name

        13AC virtual size

        C000 virtual address

        1400 size of raw data

        A800 file pointer to raw data

           0 file pointer to relocation table

           0 file pointer to line numbers

           0 number of relocations

           0 number of line numbers

    40000040 flags

             Initialized Data

             (no align specified)

             Read Only

     

    SECTION HEADER #3

       .data name

         4B0 virtual size

        E000 virtual address

         200 size of raw data

        BC00 file pointer to raw data

           0 file pointer to relocation table

           0 file pointer to line numbers

           0 number of relocations

           0 number of line numbers

    C0000040 flags

             Initialized Data

             (no align specified)

             Read Write

     

    SECTION HEADER #4

      .reloc name

         576 virtual size

        F000 virtual address

         600 size of raw data

        BE00 file pointer to raw data

           0 file pointer to relocation table

           0 file pointer to line numbers

           0 number of relocations

           0 number of line numbers

    42000040 flags

             Initialized Data

             Discardable

             (no align specified)

             Read Only


    0: kd> !lmi 8b5cc000 

    Loaded Module Info: [8b5cc000]

    ffffffff8b5cc000 is not a valid address

     

    0: kd> lmvm 8b5cc000 

    start    end        module name


    Checking the import table it does have calls to raise and lower the IRQL. We get to the import table by taking the base address plus the offset to the Import Address Table Directory.  The below output is just a snippet of the whole table.

    0: kd> dps 8b5cc000+c000

    8b5d8000  80a603f4 hal!KfLowerIrql

    8b5d8004  80a5ff00 hal!KeGetCurrentIrql

    8b5d8008  80a600b4 hal!KfRaiseIrql

     

    Dumping the entire image contents in memory to find more clues about this driver.  Again, I am only showing snippets of the whole output.

    0: kd> dc 8b5cc000  L?10000/4

    8b5cc000  00905a4d 00000003 00000004 0000ffff  MZ..............

    8b5cc010  000000b8 00000000 00000040 00000000  ........@.......

    8b5cc020  00000000 00000000 00000000 00000000  ................

    8b5cc030  00000000 00000000 00000000 000000d0  ................

    8b5cc040  0eba1f0e cd09b400 4c01b821 685421cd  ........!..L.!Th

    8b5cc050  70207369 72676f72 63206d61 6f6e6e61  is program canno

    8b5cc060  65622074 6e757220 206e6920 20534f44  t be run in DOS

    8b5cc070  65646f6d 0a0d0d2e 00000024 00000000  mode....$.......

    8b5d5e20  76000000 66697265 77252079 00000a5a  ...verify %wZ...

    8b5d5e30  31000000 35343332 39383736 33323130  ...1234567890123

    8b5d5e40  37363534 31303938 35343332 39383736  4567890123456789

    8b5d5e50  33323130 37363534 31303938 35343332  0123456789012345

    8b5d5e60  39383736 33323130 37363534 31303938  6789012345678901

    8b5d5e70  00343332 66000000 646e756f 67697320  234....found sig

    8b5d5e80  7574616e 3d206572 0a752520 b4000000  nature = %u.....   

     

    Interestingly this image has other images (modules) embedded in it.  We can see other PE headers, which again have no name.  This looks highly suspicious and resembles behavior used by malicious software.

    8b5d0970  f775c085 4848c78b 4dc35f5e 0300905a  ..u...HH^_.MZ...

    8b5d0980  04000000 ff000000 b80000ff 00000000  ................

    8b5d0990  40000000 00000000 00000000 00000000  ...@............

    8b5d09a0  00000000 00000000 00000000 00000000  ................

    8b5d09b0  00000000 c8000000 0e000000 000eba1f  ................

    8b5d09c0  21cd09b4 cd4c01b8 69685421 72702073  ...!..L.!This pr

    8b5d09d0  6172676f 6163206d 746f6e6e 20656220  ogram cannot be

    8b5d09e0  206e7572 44206e69 6d20534f 2e65646f  run in DOS mode.

    8b5d2200  4d00004e 0300905a 04000000 ff000000  N..MZ...........

    8b5d2210  b80000ff 00000000 40000000 00000000  ...........@....

    8b5d2220  00000000 00000000 00000000 00000000  ................

    8b5d2230  00000000 00000000 00000000 f0000000  ................

    8b5d2240  0e000000 000eba1f 21cd09b4 cd4c01b8  ...........!..L.

    8b5d2250  69685421 72702073 6172676f 6163206d  !This program ca

    8b5d2260  746f6e6e 20656220 206e7572 44206e69  nnot be run in D

    8b5d2270  6d20534f 2e65646f 240a0d0d 00000000  OS mode....$....

     

    After ensuring my AV definitions were up to date, I decided to dump this memory contents into a file onto my machine. The moment contents were written my AV Microsoft Forefront Endpoint Protection caught a Trojan in this file.

    0: kd> .writemem c:\temp\drv.sys 8b5cc000   L?10000
    Writing 10000 bytes................................

       
    image001

     

    image003


    Before any further trouble shooting we asked the customer to clean the Trojan. Once we did that bug checks disappeared.


    Hope you enjoyed reading this post.
  • Ntdebugging Blog

    Updated Archive of the NtDebugging Twitter Debug Tips

    • 0 Comments

    Every Wednesday (usually) we post a debug tip to our twitter page at https://twitter.com/#!/ntdebugging. This blog is an archive of these tips to allow our readers to find this information easily. Periodically we post an updated blog with the current archive. Follow us on twitter if you want to see the new tips as we post them.

    The goal of these tips is to share debug commands, and forms of commands (parameters, flags, etc) that we in Platforms Global Escalation Services find useful. I hope you can add these commands to your toolkit and they will help you debug more efficiently.

    Tips:

    !thread/!process [address] e - on x64 will not show you the meaningless Args to Child information.

    .frame /c [FrameNumber] - sets context to specificied stack frame. On x64 provides more reliable register information than .trap.

    kn - Dumps call stack with frame numbers, easier than counting stacks for .frame.

    .frame /r [FrameNumber] - same as .frame /c, but shows registers without changing context.

    Note: With .frame /c or /r you can only trust the nonvolatile registers. See http://bit.ly/dik4OR for vol/nonvol regs.

    k=rbp rip FrameCount - Dumps call stack starting at rbp/rip on x64. Useful when the stack is corrupt.

    .process/.thread /p /r [address] - sets new process context, sets .cache forcedecodeuser, and reloads user symbols.

    !process [address] 17 - Sets the context for this command, avoids the need for .process to see user stacks. Try !process 0 17

    ~~[ThreadID]s - Changes threads in user mode. Use Thread ID number from output such as !locks. Ex: ~~[1bd4]s

    runas /netonly /u:<account> windbg.exe - Launch windbg with domain account. Use when dbg computer isn't in domain and symbol server is.

    !heap -p -a <address> - Shows information about the heap block containing <address>, even if you aren't using pageheap.

    ub - Unassembles starting at a location prior to your address. Accepts l<number> to specify how many instructions to go back. ub . l20

    !stacks 2 [FilterString] - Finds kernel mode call stacks that contain the FilterString in a symbol.

    !thread [address] 17 (or 1e on x64) - Sets context for this command, avoids the need for .thread/.process for user stacks.

    .hh [Text] - Opens the debugger help. [Text] is the topic to lookup in the index. Example: .hh !pte

    ?? can dump structs using C++ style expressions. Ex: ??((nt!_KTHREAD*)(0xfffffa800ea43bb0))->ApcState

    bp /t EThread - Sets a kernel mode breakpoint that only triggers when hit in the context of this thread.

    bp /p EProcess - Sets a kernel mode breakpoint that only triggers when hit in the context of this process.

    gc - If you run 'p' and hit a breakpoint, gc takes you where p would have gone if you had not hit the bp.

    gu - Go until the current function returns. Effectively this unwinds one stack frame. #windbg

    pc - Steps through until the next 'call' instruction. Combine with other commands to find who returned your error> pc;p;r eax

    pt - Steps through until the next 'ret' instruction. Similar to gu, but pt stops on the ret and gu stops after the ret.

    .ignore_missing_pages 1 - supresses the error: "Page 2a49 not present in the dump file. Type ".hh dbgerr004" for details"

    .exr -1 shows the most recent exception. Useful in user dumps of crashes, especially for no execute crashes (NX/DEP).

    wt - Trace calls until they return to the current address. More useful with -or to get return values. Use -l for depth.

    .thread /w - Changes to the WOW64 32-bit context from 64-bit kernel mode. Wow64exts doesn't work in kernel mode.

    ??sizeof(structure) - Gets the size of a structure, it's easier than counting.

    sxe ld:module.dll - Enables an exception which will break into the debugger when module.dll is loaded.

    vertarget - Shows OS version of the debug target. Also shows machine name, uptime, and session time (when the dump was taken).

    !vm 1 - In a kernel debugger, shows basic information about memory usage. Available, committed, pagefile, pool, sysptes, etc.

    .time - Shows session time (when dump was taken) and system uptime. In user mode shows process uptime, kernel/user time.

    ba w size [address] - Break on write access only. Replace size with the num bytes you want to watch. Ex: ba w 4 005d5f10

    .process -i <address> - Make the process active and break into the debugger. Use in live kernel debugs to get into process context.

    .reload /f /o - Overwrites cached files in your downstream symbol store. Useful when your store has corrupt pdbs.

    ->* - Use with dt to dump pointers. Example: dt _EPROCESS [Address] ObjectTable->*

    !for_each_module s -a @#Base @#End "PTag" - Find the drivers using pool tag "PTag".

    .unload [DllName] - Unloads the debug extension you didn't intend to load. Omit DllName to unload the last dll loaded.

    !exqueue dumps the executive worker queues. Use flags 7f to dump the worker threads and the queues.

    lmvm <module> - Dumps information about the module. Remember to use <module> and not <module.dll>.

    !thread -t TID - Dump a thread using thread ID rather than thread address. Useful when working with a critical section.

    !list - Walks a linked list and displays informatino for each element in a list. See blog later today for an example.

    .time -h # - Shows the debug session time using the timezone offset of #. Ex: .time -h 0 shows when a dump was taken in UTC.

    !session - Lists all of the user session IDs. A quick way to list the active sessions from a dump of a terminal server.

    !session -s SessionID - Changes the current session context to SessionID. Useful when looking at GDI, or other per session data.

    | ProcNum s - Switches to process number ProcNum. Use when debugging multiple dumps, or processes, in one windbg.

    !! - Launches a shell process and redirects its output to the debugger. The same as .shell, but "bang bang" sounds cooler.

    uf Function - Dumps the assembly for Function (name or address). Useful for optimized code that is not contiguous in memory.

    uf /c Function - Shows all of the calls made by Function (can be function name or address).

    !wow64exts.sw - switches between x64 and x86 contexts. Often used to reverse .thread /w context switch.

  • Ntdebugging Blog

    Troubleshooting Memory Leaks With Just a Dump

    • 3 Comments

    Hello debuggers, the debug ninja is back again.  Sometimes we have a scenario where a process is using a lot of memory, and the only data we are able to get at the moment is a user dump.  Ordinarily data from tools such as umdh or xperf would be preferable because they provide memory usage data over a period of time and can include call stack information. However, umdh requires restarting the process (which loses the state of high memory usage), and xperf requires the installation of the Windows Performance Toolkit which may not always be an immediate option.

     

    When we have such a dump we may not be able to specifically identify what piece of code is generating the high memory usage, but we may be able to narrow the scope of troubleshooting to a specific dll.

     

    The first thing we need to do is identify what type of memory is using most of the address space.  The debugger command !address –summary allows us to do this: 

    0:000> !address -summary

     

    --- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal

    Free                                    489      7fe`6ff5a000 (   7.994 Tb)           99.92%

    Heap                                   9094        1`75ed1000 (   5.843 Gb)  93.47%    0.07%

    <unknown>                               275        0`12e41000 ( 302.254 Mb)  4.72%    0.00%

    Image                                   937        0`05a6a000 (  90.414 Mb)  1.41%    0.00%

    Stack                                   138        0`01700000 (  23.000 Mb)  0.36%    0.00%

    Other                                    14        0`001bd000 (   1.738 Mb)  0.03%    0.00%

    TEB                                      46        0`0005c000 ( 368.000 kb)   0.01%   0.00%

    PEB                                       1        0`00001000 (   4.000 kb)  0.00%    0.00%

     

    From this example we can see that most of the memory is used by heap.  A process will usually have multiple heaps, each created by a call to HeapCreate.  We can examine the size of each of these heaps with !heap –s: 

    0:000> !heap -s

    LFH Key                   : 0x0000006c1104d280

    Termination on corruption : ENABLED

              Heap     Flags  Reserv  Commit  Virt  Free  List   UCR Virt  Lock  Fast

                                (k)     (k)   (k)     (k) length      blocks cont. heap

    -------------------------------------------------------------------------------------

    0000000000100000 00000002  16384  12824  16384  1180   254     5   0      3   LFH

    0000000000010000 00008000     64      4     64     1     1     1   0      0     

    00000000003d0000 00001002   1088    708   1088   121    20     2   0      0   LFH

    0000000003080000 00001002   1536    700   1536     4     4     2   0      0   LFH

    00000000033a0000 00001002 5229696 1377584 5229696 414244  4039  3059   0     2c   LFH

        External fragmentation  30 % (4039 free blocks)

        Virtual address fragmentation  73 % (3059 uncommited ranges)

    0000000003380000 00001002     64      8     64     3     1     1   0      0     

    0000000003600000 00001002    512     56    512     3     1     1   0      0     

    0000000003c20000 00001002    512      8    512     3     1     1   0      0     

    0000000003220000 00001002    512      8    512     3     1     1   0      0     

    0000000003e50000 00001002    512      8    512     3     1     1   0      0     

    0000000003d00000 00001002    512    148    512     5     3     1   0      0   LFH

     

    From the above output we can see that most of the memory is being used by heap 00000000033a0000.

     

    At this point we need to try to identify what this heap is used for.  A brute force method to do this is to search memory with the ‘s’ command. 

    0:000> s -q 0 l?7fffffffffffffff 00000000033a0000

    <snip>

    000007fe`f21810a0  00000000`033a0000 00000000`00000001

     

    The output of the ‘s’ command may be verbose.  You will need to manually examine the addresses where ‘s’ finds hits.  Most of these addresses will probably be in heap memory, we are looking for an address that matches a module.  I snipped the above output to just show the relevant hit, an address inside of a loaded module.

     

    The search of memory revealed that the heap 00000000033a0000 is used by the module useheap.dll, specifically it is part of the global ‘Blob’ class. 

    0:000> ln 000007fe`f21810a0

    (000007fe`f21810a0)  useheap!Blob::m_Heap

     

    At this point we don’t know specifically what code in useheap.dll has allocated a lot of heap, however we have significantly narrowed the scope of the problem.  We can now determine if there is a known issue with heap usage in useheap.dll that is addressed in a later version.  We may also know from experience that this module uses a lot of memory under specific circumstances, such as a high volume of work sent to this service.

     

    I hope this example helps the next time you have high memory usage and only have a user dump to troubleshoot with.  Good luck!

  • Ntdebugging Blog

    How the Clipboard Works, Part 2

    • 1 Comments

    Last time, we discussed how applications place data on the clipboard, and how to access that data using the debugger. Today, we'll take a look at how an application can monitor the clipboard for changes.  Understanding this is important because it is a place where Windows allows 3rd-party code to "hook" into the system.  If you experience unexpected behavior with copying and pasting, a program using these hooks may be misbehaving.  We’ll start by covering the hooking mechanisms for clipboard, and then review how to identify what applications, if any, are using these hooks in the debugger.

     

    There are three ways to monitor the clipboard for changes - clipboard viewers, clipboard format listeners, and querying the clipboard sequence number.  We will focus on the first two as these allow an application to register for notifications whenever the clipboard is updated.  The third method simply allows an application to check and see if a change has occurred, and should not be used in a polling loop.

     

    The Clipboard Viewer functionality has been around since Windows 2000, if not earlier.  The way it works is pretty simple - an application interested in receiving clipboard change notifications calls SetClipboardViewer and passes a handle to its window.  Windows then stores that handle in a per-session win32k global, and anytime the clipboard is changed Windows sends a WM_DRAWCLIPBOARD message to the registered window.

     

    Of course, multiple applications are allowed to register their windows as clipboard viewers - so how does Windows handle that?  Well, if an application calls SetClipboardViewer and another clipboard viewer was already registered, Windows returns the handle value of the previous viewer's window to the new viewer.  It is then the responsibility of the new viewer to call SendMessage every time it receives a WM_DRAWCLIPBOARD to notify the next viewer in the chain. Each clipboard viewer also needs to handle the WM_CHANGECBCHAIN message, which notifies  all viewers when one of the viewers in the chain is removed, and specifies what the next viewer in the chain is.  This allows the chain to be maintained.

     

    An obvious problem with this design is it relies on each clipboard viewer application to behave correctly, not to terminate unexpectedly, and to generally be a good citizen.  If any viewer decided not to be friendly, it could simply skip notifying the next viewer in line about an update, rendering the next viewer and all subsequent viewers impotent.

     

    To address these problems, the Clipboard Format Listener mechanism was added in Windows Vista. This works in much the same way as the clipboard viewer functionality except in this case Windows maintains the list of listeners, instead of depending on each application to preserve a chain.

     

    If an application wishes to become a clipboard format listener,it calls the AddClipboardFormatListener function and passes in a handle to its window. After that, its window message handler will receive WM_CLIPBOARDUPDATE messages.  When the application is ready to exit or no longer wishes to receive notifications, it can call RemoveClipboardFormatListener.

     

    Now that we've covered the ways to register a viewer/listener, let's take a look at how to identify them using the debugger.  First, you'll need to identify a process in the session you are interested in checking for clipboard monitors.  It can be any win32 process in that session -we just need to use it to locate a pointer to the Window Station.  In this case, I'll use the Notepad window I used in part 1: 

    kd> !process 0 0 notepad.exe

    PROCESS fffff980366ecb30

        SessionId: 1  Cid: 0374   Peb: 7fffffd8000  ParentCid: 0814

        DirBase: 1867e000  ObjectTable: fffff9803d28ef90  HandleCount: 52.

        Image: notepad.exe

     

    If you are doing this in a live kernel debug, you'll need to change context into the process interactively (using .process /I< address> then hit g and wait for the debugger to break back in).  Now DT the process address as an _EPROCESS and look for the Win32Process field: 

    kd> dt _EPROCESS fffff980366ecb30 Win32Process

    nt!_EPROCESS

       +0x258 Win32Process : 0xfffff900`c18c0ce0 Void

     

    Now DT the Win32Process address as a win32k!tagPROCESSINFO and identify the rpwinsta value: 

    kd> dt win32k!tagPROCESSINFO 0xfffff900`c18c0ce0 rpwinsta

       +0x258 rpwinsta : 0xfffff980`0be2af60 tagWINDOWSTATION

     

    This is our Window Station. Dump it using dt: 

    kd> dt 0xfffff980`0be2af60 tagWINDOWSTATION

    win32k!tagWINDOWSTATION

       +0x000 dwSessionId      : 1

       +0x008 rpwinstaNext     : (null)

       +0x010 rpdeskList       : 0xfffff980`0c5e2f20 tagDESKTOP

       +0x018 pTerm            : 0xfffff960`002f5560 tagTERMINAL

       +0x020 dwWSF_Flags      : 0

       +0x028 spklList         : 0xfffff900`c192cf80 tagKL

       +0x030 ptiClipLock      : (null)

       +0x038 ptiDrawingClipboard: (null)

       +0x040 spwndClipOpen    : (null)

       +0x048 spwndClipViewer  : 0xfffff900`c1a4ca70 tagWND

       +0x050 spwndClipOwner   : 0xfffff900`c1a3ef70 tagWND

       +0x058 pClipBase        : 0xfffff900`c5512fa0 tagCLIP

       +0x060 cNumClipFormats  : 4

       +0x064 iClipSerialNumber : 0x16

       +0x068 iClipSequenceNumber : 0xc1

       +0x070 spwndClipboardListener : 0xfffff900`c1a53440 tagWND

       +0x078 pGlobalAtomTable: 0xfffff980`0bd56c70 Void

       +0x080 luidEndSession   : _LUID

       +0x088 luidUser         : _LUID

       +0x090 psidUser         : 0xfffff900`c402afe0 Void

     

    Note the spwndClipViewer, spwndClipboardListener, and spwndClipOwnerfields.  spwndClipViewer is the most-recently-registered window in the clipboard viewer chain.  Similarly, spwndClipboardListener is the most recent listener in our Clipboard Format Listener list.  spwndClipOwner is the window that set the content in the clipboard.

     

    Given the window, it is just a few steps to determine the process.  This would work forspwndClipViewer, spwndClipboardListener, and spwndClipOwner.  First, dt the value as a tagWND.  We'll use the spwndClipViewer for this demonstration: 

    kd> dt 0xfffff900`c1a4ca70 tagWND

    win32k!tagWND

       +0x000 head             : _THRDESKHEAD

       +0x028 state            : 0x40020008

       +0x028 bHasMeun         : 0y0

       +0x028 bHasVerticalScrollbar : 0y0

     

    We only care about the head - so since it is at offset 0, dt the same address as a _THRDESKHEAD: 

    kd> dt 0xfffff900`c1a4ca70 _THRDESKHEAD

    win32k!_THRDESKHEAD

       +0x000 h                : 0x00000000`000102ae Void

       +0x008 cLockObj         : 6

       +0x010 pti              : 0xfffff900`c4f26c20tagTHREADINFO

       +0x018 rpdesk           : 0xfffff980`0c5e2f20 tagDESKTOP

       +0x020 pSelf            : 0xfffff900`c1a4ca70  "???"

     

    Now, dt the address in pti as a tagTHREADINFO: 

    kd> dt 0xfffff900`c4f26c20 tagTHREADINFO

    win32k!tagTHREADINFO

       +0x000 pEThread         : 0xfffff980`0ef6cb10 _ETHREAD

       +0x008 RefCount         : 1

       +0x010 ptlW32           : (null)

       +0x018 pgdiDcattr       : 0x00000000`000f0d00 Void

     

    Here, we only care about the value of pEThread, which we can pass to !thread: 

    kd> !thread 0xfffff980`0ef6cb10 e

    THREAD fffff9800ef6cb10 Cid 087c.07ec  Teb: 000007fffffde000 Win32Thread: fffff900c4f26c20 WAIT: (WrUserRequest) UserModeNon-Alertable

        fffff9801c01efe0  SynchronizationEvent

    Not impersonating

    DeviceMap                 fffff980278a0fc0

    Owning Process            fffff98032e18b30       Image:         viewer02.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount     5435847        Ticks: 33 (0:00:00:00.515)

    Context Switch Count     809            IdealProcessor: 0                 LargeStack

    UserTime                  00:00:00.000

    KernelTime                00:00:00.062

    Win32 Start Address 0x000000013f203044

    Stack Init fffff880050acdb0 Current fffff880050ac6f0

    Base fffff880050ad000 Limit fffff880050a3000 Call 0

    Priority 11 BasePriority 8 UnusualBoost 0 ForegroundBoost 2IoPriority 2 PagePriority 5

    Child-SP          RetAddr           Call Site

    fffff880`050ac730 fffff800`01488f32 nt!KiSwapContext+0x7a

    fffff880`050ac870 fffff800`0148b74f nt!KiCommitThreadWait+0x1d2

    fffff880`050ac900 fffff960`000dc8e7 nt!KeWaitForSingleObject+0x19f

    fffff880`050ac9a0 fffff960`000dc989 win32k!xxxRealSleepThread+0x257

    fffff880`050aca40 fffff960`000dafc0 win32k!xxxSleepThread+0x59

    fffff880`050aca70 fffff960`000db0c5 win32k!xxxRealInternalGetMessage+0x7dc

    fffff880`050acb50 fffff960`000dcab5 win32k!xxxInternalGetMessage+0x35

    fffff880`050acb90 fffff800`01482ed3 win32k!NtUserGetMessage+0x75

    fffff880`050acc20 00000000`77929e6a nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`050acc20)

    00000000`002ffb18 00000000`00000000 0x77929e6a

     

    As you can see, we have a clipboard viewer registered from process viewer02.exe.  Because of viewer's process-maintained chain architecture, it isn't easy to see the next process in the chain.  However, we can do this for clipboard listeners.  Let's look back at our window station: 

    kd> dt 0xfffff980`0be2af60 tagWINDOWSTATION

    win32k!tagWINDOWSTATION

       +0x000 dwSessionId      : 1

       +0x008 rpwinstaNext     : (null)

       +0x010 rpdeskList       : 0xfffff980`0c5e2f20 tagDESKTOP

       +0x018 pTerm            : 0xfffff960`002f5560 tagTERMINAL

       +0x020 dwWSF_Flags      : 0

       +0x028 spklList         : 0xfffff900`c192cf80 tagKL

       +0x030 ptiClipLock      : (null)

       +0x038 ptiDrawingClipboard : (null)

       +0x040 spwndClipOpen    : (null)

       +0x048 spwndClipViewer  : 0xfffff900`c1a4ca70tagWND

       +0x050 spwndClipOwner   : 0xfffff900`c1a3ef70tagWND

       +0x058 pClipBase        : 0xfffff900`c5512fa0 tagCLIP

       +0x060 cNumClipFormats  : 4

       +0x064 iClipSerialNumber : 0x16

       +0x068 iClipSequenceNumber : 0xc1

       +0x070 spwndClipboardListener: 0xfffff900`c1a53440 tagWND

       +0x078 pGlobalAtomTable: 0xfffff980`0bd56c70 Void

       +0x080 luidEndSession   : _LUID

       +0x088 luidUser         : _LUID

       +0x090 psidUser         : 0xfffff900`c402afe0 Void

     

    If we dt the spwndClipboardListener, there is a field that shows the next listener named spwndClipboardListenerNext: 

    kd> dt 0xfffff900`c1a53440 tagWND spwndClipboardListenerNext

    win32k!tagWND

       +0x118 spwndClipboardListenerNext : 0xfffff900`c1a50080 tagWND

     

    When you reach the last clipboard format listener's tagWND, its spwndClipboardListenerNext value will be null: 

    kd> dt 0xfffff900`c1a50080 tagWND spwndClipboardListenerNext

    win32k!tagWND

       +0x118 spwndClipboardListenerNext : (null)

     

    Using this window address, we can go through the same steps as above to identify this listener's process name. As mentioned earlier, since tagWND is a kernel structure, the OS is maintaining these spwndClipboardListener/spwndClipboardListenerNext pointers, so they aren't susceptible to the chain problems of clipboard viewers.

     

    That wraps up our clipboard coverage. I hope you found it informative.  Want to learn more about monitoring the clipboard?  This MSDN article is a good resource.

     

    -Matt Burrough

Page 5 of 24 (238 items) «34567»