• Ntdebugging Blog

    NTDebuggers Debug Puzzler 0x00000001 "Where did my process go?"

    • 10 Comments

    Hello NTDebuggers, in the spirit of Click and Clack (The Tappet brothers), a favorite troubleshooting show of mine, we thought it would be fun to offer up some Debug puzzlers for our readers.

     

    That said, this week’s Debug Puzzler is in regard to Dr. Watson.  I’m sure most of you have seen Dr. Watson errors.  This typically means your application has crashed due to an unhandled exception.  Sometimes however the process just seems to disappear.  The Just-in-Time (JIT) debugging options configured via the AEDebug key does not catch the crash…  Does anyone know why this may happen?  

     

    We will post reader’s comments as they respond during the week, and next Monday will post our answer and recognize some of the best answers we received from our readers.

     

    Good luck and happy debugging!

     

    -  Jeff Dailey


    [Update: our answer, posted 4/11/2008]

    Hello NTDebuggers.  Let me start of by saying that we were very impressed by our reader’s answers.    Our two favorite answers were submitted by Skywing and molotov.

    When a thread starts, the ntdll Run Time Library (RTL) for the process inserts an exception hander before it calls the BaseThreadInit code to hand control over to the executable or DLL running in the process (notepad in the example below).  If anything goes wrong with the chain of exception handlers, the process can’t make it back to the RTL exception handler and the process will simply terminate.  See http://www.microsoft.com/msj/0197/Exception/Exception.aspx for details.

    ChildEBP RetAddr 

    000ef7ac 75fbf837 ntdll!KiFastSystemCallRet

    000ef7b0 75fbf86a USER32!NtUserGetMessage+0xc

    000ef7cc 00b21418 USER32!GetMessageW+0x33

    000ef80c 00b2195d notepad!WinMain+0xec

    000ef89c 76e24911 notepad!_initterm_e+0x1a1

    000ef8a8 7704e4b6 kernel32!BaseThreadInitThunk+0xe

    000ef8e8 7704e489 ntdll!__RtlUserThreadStart+0x23  << Exception Handler is inserted here.

    000ef900 00000000 ntdll!_RtlUserThreadStart+0x1b

     

    Secondly, the process that crashes is actually responsible for starting the debugger via the RTL exception handler.   The debugger is registered under the AeDebug registry key.    Even if you are able to unwind to the RTL exception handler you may still run into trouble.  If the computer is low on system resources such as desktop heap, you may not be able to create a new process and thus will not be able to launch the debugger.   As SkyWing stated, it’s a relatively heavyweight operation.   Applications may also call TerminateProcess from within their own code based on an error condition.    If we have a customer that sees this symptom on a regular basis we typically recommend having them attach a debugger to monitor the process.   Simply run via ADPLUS -crash  -p (PROCESSID).

    Good work folks!  We’ll have another puzzler ready next Monday.

    Good Luck and happy debugging!

    - Jeff

  • Ntdebugging Blog

    Windows Performance Toolkit - Xperf

    • 12 Comments

    I recently came across a very interesting profiling tool that is available in Vista SP1 and Server 08 called the Windows Performance Analyzer. You can use this tool to profile and diagnose different kinds of symptoms that the machine is experiencing. This tool is built on top off the Event Tracing for Windows (ETW) infrastructure. It uses the ETW providers to record kernel events and then display them in a graphical format.

    Performance Analyzer provides many different graphical views of trace data including:

    1. CPU Sampling
    2. CPU and Disk utilization by process and thread
    3. Interrupt service routine and deferred procedure call
    4. Hard faults
    5. Disk I/O Detail

    What you will need

    Download the latest version of the Windows Performance Tools Kit, and install it on your machine. (http://www.microsoft.com/whdc/system/sysperf/perftools.mspx : Windows Performance Tools Kit, v.4.1.1 (QFE)) You will need to find the toolkit that corresponds to your processor architecture. Currently there are 3 versions available i.e. X86, IA64, X64.

    After installation you should be able to see 2 new tools. The first one is Xperf, which is a command line tool that is used to capture the trace. The second is called XperfView, which graphically interprets the trace that has been collected by Xperf.

    You will need to run the Xperf and XperfView from an elevated command prompt for all functionality.

    Starting a Trace

    For many tasks all you need for effective analysis is a kernel trace.  For this example, we'll use the –on DiagEasy parameter to enable several kernel events including: image loading; disk I/O; process and thread events; hard faults; deferred procedure calls; interrupts; context switches; and, and performance counters.  From an elevated command prompt launch xperf –on DiagEasy.

    This starts the kernel logger in sequential mode to the default file "\kernel.etl"; uses a default buffer size of 64K, with a minimum of 64 and a maximum of 320 buffers. 

    Stopping a Trace

    To stop a trace, type xperf –d <filename>.etl at the command line. This will stop the trace and output the file.

    Viewing a Trace

    There are 2 ways to view the trace. From an Elevated Command prompt, launch xperf <filename>.etl, or
    launch the XperfView tool and open the file manually. When you open the trace file, you should see something similar like this.

    NOTE - While you need to run xperf from an elevated command prompt in order to record a trace you do not need an elevated command prompt in order to *analyze* a trace.  

    Using the Chart Selector tab, you can select all the graphs that you want to look at. To drill down in each chart, you can select the Summary table. For instance, in the CPU Sampling chart, the summary table gets you the summary of the processes that were running, with information like the amount of CPU time, CPU %, stacks (if the stacks were collected in the trace, see below). When looking at the Summary table for the Disk I/O chart, you can see which processes were writing files (the filename too!) to disk, as well as how much time it took.

    You also have the ability to zoom in on a selected area. Another really cool feature is the ability to overlay multiple graphs on one frame. This way you can correlate different pieces of data together very easily.

    Also, you select which counter instances you want to see in each specific chart. On the top right corner of each chart is a drop down box from where you can select the counter instances. For instance on the Disk I/O chart, you can select Disk 0, Disk 1, or a combination as well.

    You can also view detailed information about the system that the trace was taken on. Click on the Trace menu item, and select System Configuration.

    Xperf Kernel Flags and Groups

    In the first sample Xperf command we ran, xperf –on DiagEasy. I am sure many of you were wondering what DiagEasy means. DiagEasy is a group of kernel events that are predefined by the Windows Performance Toolkit. This group includes Process, Threads, Kernel and User Image Load/Unload, Disk I/O, DPCs and Context Switch events.

    When we used the xperf –on DiagEasy command, we did not specify an individual provider, so we enabled the kernel events for all the ETW providers on the system. If you want to enable events for a specific provider, you can the following format xperf -on: (GUID|KnownProviderName)[:Flags[:Level]]. For more information about ETW providers, Kernel Flags and Groups, you can run the xperf –help providers command.

    Stack Tracing

    One of the most powerful features in Performance Analyzer is the ability to visualize stacks. It's important to note that this requires no special instrumentation in the code – only that you have symbols for the binary components you are interested in analyzing.

    When the trace is setup to collect the stacks, Performance Analyzer will display call stack summary information for the events that had stack walking enabled. Here is an example that takes a trace (with stack tracing enabled) of the entire system while running a "find string" utility.. We can use the Stack Tracing feature of Xperf to record a stack when certain events happen, or take sample at regular intervals over time. See xperf –help stackwalk output for more info.

    Below, we will use the Stack Tracking feature of Xperf to take stack samples at regular intervals. With this output, we will be able to determine where the CPU is spending most of its time within a process.

    xperf -on latency -stackwalk Profile

    Latency is the kernel group to enable certain events, including the profile event which records the CPUs' activity every millisecond. The "-stackwalk Profile" flag tells Xperf to record stack walks on every profile event, which makes the profile information much more useful. In other words, in order to get profile information with stack walks you need to turn on the profile event, and turn on stack walking for that event.

    Note that decoding of stacks requires that symbol decoding be configured. However stacks can be recorded without symbols, and can even be viewed without symbols, although they are much less useful without symbols. I only mention this in the event you're trying to record a trace of a problematic machine with little time to mess around with _NT_SYMBOL_PATH.

    To get a trace with the stack information, do the following:

    1. Run the command, xperf -on latency -stackwalk Profile.
    2. Run the application that you wish to analyze, then
    3. You can end the trace by using the regular xperf –d <filename>.etl command.
    4. Load the trace in the viewer and then select the Load Symbols from the Trace drop down menu.
    5. Once the symbols are loaded, select an interesting region on the CPU sampling graph.
    6. Right click and select Summary Table.

    Click on the selector tab to bring up the column chooser list. Then select "Process name", "Process", "Stack", "Weight" and "%Weight". These are the most useful columns when looking at stacks from the sample profile event. You should get a view similar to this.

     

    At this point I need to mention a few of the restrictions with stack walking coupled with when and how it works.

    ·                Xperf stack walking is not available on XP

    ·                On Vista stack walking is available for x86, and is available for x64 as of Vista SP1.

    ·                On Windows 7 stack walking is available.

    ·                Stack walking on x64 is complicated. You have to set DisablePagingExecutive in the registry, as documented here:

    http://social.msdn.microsoft.com/Forums/en-US/wptk_v4/thread/282e5beb-0afd-411b-9a81-a0bbbf569dfe/

    REG ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x1 -t REG_DWORD –f

    Sample Case

    I recently came across a case where the customer was complaining that DPC processing was taking up too much CPU time. We ran Xperf on the machine and drilled down into the DPC activity on the machine.

    From the Xperf graph, I was able to confirm that the customer was actually seeing high DPC usage. I selected the Summary for this chart, and got the list of drivers that were actually taking up CPU time.

    Right off the bat, I could identify the driver that had a lot of DPC activity. I also noted that the average duration for each DPC from that driver was taking 788 microseconds. This is way too high. Each DPC should be taking a maximum of 100 microseconds.

     

    - Omer

      

    Reference

    Performance.Analyzer.QuickStart.xps – This is shipped with the performance toolkit.

    From an elevated command prompt, launch xperf -help

    http://www.microsoft.com/whdc/system/sysperf/perftools.mspx

    http://blogs.gotdotnet.com/pigscanfly/archive/2008/02/16/using-xperf-to-take-a-trace.aspx

    http://www.microsoft.com/whdc/Driver/tips/DPC_ISR.mspx

  • Ntdebugging Blog

    The Case of the Low Hanging Filter Driver Fruit

    • 1 Comments

     

    Written By Jeff Dailey:

     

    Not all our cases are crashes, leaks, or high CPU.  Sometimes the problems we are faced with are purely a question of why a given application runs slow on a particular version of Windows versus another version of windows.  In other cases an application may just start running slow for no reason.   OK, not likely.  There is ALWAYS SOME RESASON.  Something changed!  In this case, the customer reported that an application started running slow when booted into “Normal Mode”, but when the OS was booted in safe mode, the application would run fast.  In this particular case the customer reported that a given operation went from taking just a few seconds (safe mode) to several minutes (normal mode).   Further research found that the problem was related to accessing the registry and registry performance in general.  At this point I’m already thinking, “Registry Access?” and “Safe Mode”.  What could affect registry access that does not run in safe mode?  Well lots of services DO NOT start in safe mode.  What kind of services could affect registry calls?  Antivirus?  Maybe…  Let’s look deeper.

     

    One of the first things I typically do in such cases is to ask for a kernrate log of the slow and fast scenario. http://download.microsoft.com/download/8/e/c/8ec3a7d8-05b4-440a-a71e-ca3ee25fe057/rktools.exe  Kernrate is a sampling profiler.   It basically checks the location of the instruction pointer at regular intervals and stores the results in a hash table.   We can then get a breakdown of the %time spent in each module that is executing.    Even better you can zoon in to each module.  Zooming in shows utilization at a function level within the module and requires symbols to be present in a flat symbol directory on the machine being profiled.  I recommend downloading the symbol pack for this (http://www.microsoft.com/whdc/devtools/debugging/symbolpkg.mspx) or use symchk.exe (included in the debugging tools) to download the symbols.  We’ll talk more about symbols and symchk.exe in an upcoming post.

     

    In a lot of cases kernrate data is only a starting point.   We will find some code that is running a lot longer in one case verses another and that in turn requires a follow up code review and multiple debugs to further isolate the problem.  This case however was different.  The following is output from beyond compare that shows a comparison between the module execution time in kernel.   The slow test run is on the right, and the fast test run is on the left.  Keeping in mind that I was looking for something different between safe mode and normal mode, I simply started by looking at the modules listed on the slow side (Right) there were not on the fast side (Left).   What was loaded during the normal run that was not a factor during the safe mode run.  Right away FILTERDRVXYZ just above the HAL jumped off the page.  (Some names were changed to protect the innocent. J)  I did a http://www.live.com search to find out what this driver was.  It was a file system filter driver for an antivirus program.

     

    To understand why a filter driver can slow things down you need to understand that a filter driver basically installs itself between the application and the underlying file system and intercepts nearly every call being made to the underlying I/O subsystem.   Most of the time filter drivers are just fine and run with no problems at all.   However, what do they do?  An antivirus filter driver may have several tasks that it needs to do for each I/O.  This may involve looking up information about a process or file when it’s being accessed.   If the data source associated with this information changes it can change the amount of time that these operations take.   The interesting thing is that the timing change may only be a hundred or so extra milliseconds per call to the registry or file system.  Under most cases a user would not even notice this.  If however you have some aspect of your application needs to make many thousands of calls and they each take 250+ milliseconds vs. 10 milliseconds, all of a sudden the problem compounds.  The classic example of this is a VIRUS PROFILE gets uploaded or pushed to servers and all of a sudden the customer starts seeing performance problems because the data set or comparison for the filter driver changes.   In this case I simply recommended that the customer remove the suspect filter driver to see if it was the root of the problem.  IT WAS…  They contacted the product vendor to get an update and the case was resolved.

     

    Good luck and happy debugging….

  • Ntdebugging Blog

    Debugging a bluescreen at home

    • 15 Comments

     

    Hi, my name is Chad. I work as an escalation engineer for Microsoft’s OEM support team.

     

    A while back, I encountered an interesting crash on one of my computers at home, and I thought I’d post about how I debugged it.

     

    This particular machine had been humming along quite happily for some time, but one day while I was scanning some photos, it bluescreened. Naturally, I hoped it was just a fluke, but after it happened a few more times while doing the same thing, I decided to debug it.

     

    Ordinarily, if a machine crashes when performing a specific activity, like scanning photos, my first inclination would be to suspect a bug in one of the drivers involved in that activity, like the scanner driver or the USB driver. But in this case, I had been using this scanner for a long time, with the same drivers, and never had this problem, so this sudden crashing was kind of mysterious.

     

    Let's see what we can tell from the dump!

     

    The first order of business when looking at a crash dump is the !analyze -v command. I've trimmed some of it here for brevity, but it goes something like this:

     

    kd> !analyze -v

    *******************************************************************************

    *                                                                             *

    *                        Bugcheck Analysis                                    *

    *                                                                             *

    *******************************************************************************

     

    KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)

    This is a very common bugcheck.  Usually the exception address pinpoints

    the driver/function that caused the problem.  Always note this address

    as well as the link date of the driver/image that contains this address.

    Some common problems are exception code 0x80000003.  This means a hard

    coded breakpoint or assertion was hit, but this system was booted

    /NODEBUG.  This is not supposed to happen as developers should never have

    hardcoded breakpoints in retail code, but ...

    If this happens, make sure a debugger gets connected, and the

    system is booted /DEBUG.  This will let us see why this breakpoint is

    happening.

    Arguments:

    Arg1: c0000005, The exception code that was not handled

    Arg2: 8738e300, The address that the exception occurred at

    Arg3: b9b3dc7c, Trap Frame

    Arg4: 00000000

     

    Debugging Details:

    ------------------

     

     

    EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".

     

    FAULTING_IP:

    +ffffffff8738e300

    8738e300 0000             add     [eax],al

     

    TRAP_FRAME:  b9b3dc7c -- (.trap ffffffffb9b3dc7c)

    ErrCode = 00000002

    eax=00000001 ebx=bc514c68 ecx=0001065e edx=bc510000 esi=00000955 edi=b9b3dd64

    eip=8738e300 esp=b9b3dcf0 ebp=b9b3dd08 iopl=0         nv up ei pl zr na po nc

    cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010246

    8738e300 0000             add     [eax],al                ds:0023:00000001=??

    Resetting default scope

     

    DEFAULT_BUCKET_ID:  DRIVER_FAULT

     

    BUGCHECK_STR:  0x8E

     

    LAST_CONTROL_TRANSFER:  from 8051d6a7 to 8053331e

     

    STACK_TEXT: 

    WARNING: Frame IP not in any known module. Following frames may be wrong.

    b9b3dcec bf801619 b9b3dd64 0012eee4 bf81551f 0x8738e300

    b9b3dd08 bf81553c b9b3dd64 0012eee4 bf81551f win32k!ValidateHwnd+0x5c

    b9b3dd50 804de7ec 0001065e 012eb244 00000000 win32k!NtUserInvalidateRect+0x1d

    b9b3dd50 7c90eb94 0001065e 012eb244 00000000 nt!KiFastCallEntry+0xf8

    0012eecc 77d4b601 0105a943 0001065e 012eb244 ntdll!KiFastSystemCallRet

    0012eed0 0105a943 0001065e 012eb244 00000000 USER32!NtUserInvalidateRect+0xc

    77d4b601 90909090 001134b8 0300ba00 12ff7ffe NikonScan4!GetSource+0x21e93

    ...

     

    From the stack trace, we can tell that NikonScan4.dll made a call into User32, which ultimately ended up calling into win32k.sys down in kernel mode. Win32k was in a function called ValidateHwnd() and then tried to call some function at address 0x8738e300, at which point we tried to dereference an invalid pointer and crashed.

     

    What’s at 0x8738e300?

     

    kd> dc 8738e300

    8738e300  00000000 00000001 00000000 87360350  ............P.6.

    8738e310  00000000 00000001 f71af9fe f71b0030  ............0...

    8738e320  f71afb0e f71afbb4 f71b0098 f71b0214  ................

    8738e330  f71afef6 f71aff8e 07fef800 00000000  ................

    8738e340  f71afffc 00000000 0a0e000a 644c6d4d  ............MmLd

    8738e350  8732ea58 870303e0 ffffffff ffffffff  X.2.............

    8738e360  00000012 00000000 f797f000 f7989905  ................

    8738e370  0000c000 00500050 e1971458 00160016  ....P.P.X.......

     

    kd> !pool 8738e300 2

    Pool page 8738e300 region is Nonpaged pool

    *8738e2f8 size:   50 previous size:    8  (Allocated) *NV 

                  Owning component : Unknown (update pooltag.txt)

     

    Well, that’s bad. 0x8738e300 isn’t actually a valid address of a function. That location contains some user data (specifically, some nonpaged pool).

     

    So, that's why we blew up: the ValidateHwnd() function in win32k made a call to this bad address which contains data instead of code! Let's see if we can figure out why it did this. We can find the return address in win32k!ValidateHwnd on the stack, and unassemble the instructions leading up to the point where we ran off into the weeds.

     

    kd> kv L8

    ChildEBP RetAddr  Args to Child             

    WARNING: Frame IP not in any known module. Following frames may be wrong.

    b9b3dcec bf801619 b9b3dd64 0012eee4 bf81551f 0x8738e300

    b9b3dd08 bf81553c b9b3dd64 0012eee4 bf81551f win32k!ValidateHwnd+0x5c (FPO: [Non-Fpo])

    b9b3dd50 804de7ec 0001065e 012eb244 00000000 win32k!NtUserInvalidateRect+0x1d (FPO: [Non-Fpo])

    b9b3dd50 7c90eb94 0001065e 012eb244 00000000 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ b9b3dd64)

    0012eecc 77d4b601 0105a943 0001065e 012eb244 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    0012eed0 0105a943 0001065e 012eb244 00000000 USER32!NtUserInvalidateRect+0xc

    77d4b601 90909090 001134b8 0300ba00 12ff7ffe NikonScan4!GetSource+0x21e93

    77d4b605 001134b8 0300ba00 12ff7ffe 900008c2 0x90909090

     

    The return address from where we made the bad call is bf801619. Let’s unassemble a few instructions backwards from that address (using the handy but often-overlooked “ub” command) and see what the code was doing:

     

    kd> ub bf801619

    win32k!ValidateHwnd+0x2f:

    bf8015ff 8d1c82           lea     ebx,[edx+eax*4]

    bf801602 8bc1             mov     eax,ecx

    bf801604 c1e810           shr     eax,0x10

    bf801607 663b430a         cmp     ax,[ebx+0xa]

    bf80160b 75ad             jnz     win32k!ValidateHwnd+0x3d (bf8015ba)

    bf80160d 807b0801         cmp     byte ptr [ebx+0x8],0x1

    bf801611 7573             jnz     win32k!ValidateHwnd+0xff (bf801686)

    bf801613 ff15e0b298bf call dword ptr [win32k!_imp__PsGetCurrentThread (bf98b2e0)]

     

    OK, so it's pretty simple. Win32k decided which function address to call by reading it from a pointer stored at a hardcoded location (in other words, in a global variable) within Win32k itself. That pointer is located at bf98b2e0. The debugger helpfully tells us that this pointer is intended to contain the address of a function called PsGetCurrentThread, but let’s double-check this and make sure it actually does. At this point, the working assumption would be that this pointer had gotten corrupted somehow.

     

    kd> dd bf98b2e0 L1

    bf98b2e0  804e4a15

     

    Interesting. bf98b2e0 contains the value 804e4a15. This is not even close to the bad address the processor actually called! Remember, from the stack trace, we jumped to 8738e300 instead. What does this pointer actually point to?

     

    kd> ln 804e4a15

    (804e4a15)   nt!PsGetCurrentThread   |  (804e4a51)   nt!KeSetEventBoostPriority

    Exact matches:

        nt!PsGetCurrentThread = <no type information>

     

    Sure enough, 804e4a15 is the address of nt!PsGetCurrentThread.

     

    So, basically, win32k was trying to call PsGetCurrentThread(), and even had the correct pointer in memory to get there, but the processor instead jumped to a bogus address located in the middle of some user data.

     

    At this point it's pretty safe to say that this is a hardware problem, and furthermore, since the relevant memory looks fine, it looks like a problem within the CPU itself. (I’m not an expert on CPU architecture, but if I were to take a wild guess I’d say that maybe the processor had some problem when reading from its L2 cache.)

     

    Upon discovering this, I decided to crack open my case and take a look. It didn’t take long to spot the problem:

    image

    Yes, that's my CPU fan. Notice that the heatsink is completely clogged up with dust!

     

    The resolution to this story: I took a can of compressed air and blew all the dust out of the heatsink. This took care of the problem, and the computer happily scanned hundreds of photos and hasn’t crashed again since. Why was it only crashing when I was scanning photos? Most likely because this was causing the CPU to run at 100% utilization for extended periods of time, and it was simply overheating!

     

  • Ntdebugging Blog

    NonPagedPool Depletion

    • 1 Comments

    I recently was engaged on an issue where a server was depleting NonPagedPool over a period of a few days.  Ordinarily, we would just use a tool like PoolMon to identify the offending pool tag and then find the driver that uses that pool tag using the method in this article

     

    However, what made this case interesting was the pool tag and that we were unable to identify the driver using the normal methodology.  You’ll see what I mean in a moment.  The engineer supplied me with a kernel dump of the server while it was in-state and this is what I found.

     

    Let’s start by taking a look at the virtual memory usage:

     

    2: kd> !vm

     

    *** Virtual Memory Usage ***

           Physical Memory:      851420 (   3405680 Kb)

           Page File: \??\C:\pagefile.sys

             Current:   3584000 Kb  Free Space:   3568552 Kb

             Minimum:   3584000 Kb  Maximum:      3584000 Kb

           Available Pages:      573277 (   2293108 Kb)

           ResAvail Pages:       800628 (   3202512 Kb)

           Locked IO Pages:        1067 (      4268 Kb)

           Free System PTEs:      25102 (    100408 Kb)

           Free NP PTEs:            335 (      1340 Kb)

           Free Special NP:           0 (         0 Kb)

           Modified Pages:           22 (        88 Kb)

           Modified PF Pages:        22 (        88 Kb)

           NonPagedPool Usage:    31369 (    125476 Kb)    ß Very high

           NonPagedPool Max:      31986 (    127944 Kb)

           ********** Excessive NonPaged Pool Usage *****

           PagedPool 0 Usage:     19071 (     76284 Kb)

           PagedPool 1 Usage:       735 (      2940 Kb)

           PagedPool 2 Usage:       747 (      2988 Kb)

           PagedPool 3 Usage:       720 (      2880 Kb)

           PagedPool 4 Usage:       746 (      2984 Kb)

           PagedPool Usage:       22019 (     88076 Kb)

           PagedPool Maximum:     38912 (    155648 Kb)

     

           ********** 3 pool allocations have failed **********

     

    So we can see that NPP usage is very high given the server is using the /3GB switch which limits NPP to 128MB by default.  We need to identify what pool tag is associated with the high NPP usage:

     

    2: kd> !poolused /t2 2

       Sorting by  NonPaged Pool Consumed

     

      Pool Used:

                NonPaged            Paged

     Tag    Allocs     Used    Allocs     Used

     None   246479 50827424         0        0      call to ExAllocatePool

     MmCm     1198 18462512         0        0      Calls made to MmAllocateContiguousMemory , Binary: nt!mm

     

    Interesting, so the offending tag is “None”.  This means that these allocations were made by calling the function ExAllocatePool instead of ExAllocatePoolWithTag.  ExAllocatePool is obsolete and should no longer be used. 

     

    Now, I need to find out which driver is calling this function.  First, I need to know where ExAllocatePool lives:

     

    2: kd> x nt!ExAllocatePool

    e0894d1f nt!ExAllocatePool

     

    Next, I need to search all the drivers to see which one is importing this function:

     

    2: kd> !for_each_module s-d @#Base @#End e0894d1f

    f50b8058  e0894d1f e0828e04 e089b708 e084011b  .M..............

     

    Hmm, looks suspiciously like an import table, let’s see:

     

    2: kd> dps f50b8058

    f50b8058  e0894d1f nt!ExAllocatePool

    f50b805c  e0828e04 nt!_wcsnicmp

    f50b8060  e089b708 nt!ExFreePoolWithTag

    f50b8064  e083e30a nt!KeInitializeEvent

    <SNIP>

     

    Yep, that’s an import table.  You can also verify that this is the import table of a particular module by checking the header (!dh on the module’s base address and look for “Import Address Table Directory”).

     

    As you can see, we have only one driver that imports ExAllocatePool.  Let’s see which driver this is:

     

    2: kd> !lmi f50b8058

    Loaded Module Info: [f50b8058]

             Module: {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}

       Base Address: f50b3000

         Image Name: {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}.sys

           <SNIP>

     

    I’ve removed the incriminating identifiers from module information displayed above to protect the guilty.  It is interesting to note that the driver name was a GUID and that this driver did not exist on the disk.  This was because the driver is dynamically created when its parent program loads.  

     

    The software package was removed and the server was happy again. 

     

    -         David

  • Ntdebugging Blog

    Wanted: Windows Internals subject matter experts

    • 3 Comments

    Microsoft is looking for five Windows Internals subject matter experts to come work on a very special five to eight day project on the Redmond campus during the month of May 2008.  Candidates must have good communications skills, be non Microsoft employees,  have 5+ years experience with windows, be familiar with the Windows Internals book, have kernel and user mode debugging experience, and be C literate.  

     

    We prefer candidates strongly represent the Windows IT Professional or Windows development industry.    These typically would be 2nd or 3rd tier escalation resources or Windows developers (developing Win32 code, or device drivers) that are working for Gold Partners, ISVs or are independent consultants or MVPs.

     

    A Microsoft NDA (Non Disclosure Agreement) will be required.

     

    Please contact Jeff Dailey at jeffda@microsoft.com with the subject line “Windows Internals Project”

    Please include a copy of your resume and details explaining your Windows background, along with your contact info, and your location.

     

    We will cover air travel, lodging and food for non local candidates attending from within the US.

     

    Thank you.

     

    More Info:

    The Widows Internals experts selected for this project will be brought to Redmond to aid in developing a new certification program based on the Windows Internals.  This is not a hiring effort or product testing session.  Though the candidates should have C literacy they will not be writing or reviewing any Microsoft source code. 

     

  • Ntdebugging Blog

    The default interactive desktop heap size has been increased on 32-bit Vista SP1

    • 5 Comments

     

    This is going to be a short blog post, but considering the amount of feedback we’ve received on the our two previous desktop heap posts, I think this is worth blogging about.  32-bit Vista SP1 and 32-bit Windows Server 2008 both have a new value for the default size of interactive desktop heaps.  Previously, this heap size was 3 MB by default, but on Vista SP1 and Server 2008 the default interactive heap size is 12 MB.  We’ve heard your comments about the problems associated with the heap size, and this new default size should go a long way towards avoiding interactive desktop heap depletion.

     

    On 32-bit systems, running Vista SP1 or Windows Server 2008, you’ll now see that the relevant portion of the registry value looks like this (by default)...

     

    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows

    SharedSection=1024,12288,512

     

    64-bit Windows still has a default interactive desktop heap size of 20MB.  Keep in mind that in Vista, session view space (the memory range that desktop heaps are allocated from) is now a dynamic address range.  This means that in Vista and Server 2008 the individual desktop heap sizes can be increased to larger sizes with less concern over exhausting session view space.

     

    - Matthew Justice

  • Ntdebugging Blog

    How to Find the Owner of a Named Pipe

    • 1 Comments

    This is a follow-up on the LPC hang blog. The same hang troubleshooting techniques apply to this, but when a named pipe is involved you’ll have to use a slightly different method to following the chain from a client application to the server application. For the purpose of this exercise I’ll use the named pipe server (http://msdn2.microsoft.com/en-us/library/aa365588.aspx) and client (http://msdn2.microsoft.com/en-us/library/aa365592.aspx) given by MSDN.

     

     

    Here is a diagram of our scenario.

    named_pipes

     

    Here we have the client application that is waiting on a read operation to a named pipe. We need to determine what process will be putting data into the named pipe which will let the client application move forward.

    Here is the thread in the client application.

     

     

    THREAD 81e70858  Cid 06bc.06c0  Teb: 7ffdf000 Win32Thread: 00000000 WAIT: (Executive) UserMode Non-Alertable

        81baaf7c  NotificationEvent

    Not impersonating

    DeviceMap                 e25176b8

    Owning Process            81b3a020       Image:         client.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      73194          Ticks: 1599 (0:00:00:24.984)

    Context Switch Count      21            

    UserTime                  00:00:00.000

    KernelTime                00:00:00.000

    Win32 Start Address client (0x00401376)

    Start Address kernel32!BaseProcessStartThunk (0x77e617f8)

    Stack Init f7450000 Current f744fc04 Base f7450000 Limit f744d000 Call 0

    Priority 10 BasePriority 8 PriorityDecrement 2

    ChildEBP RetAddr  Args to Child             

    f744fc1c 808202b6 81e70858 81e70900 00000700 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])

    f744fc34 8081fb6e 8207dac0 00000000 81baaf20 nt!KiSwapThread+0x83 (FPO: [Non-Fpo])

    f744fc78 8092deb3 81baaf7c 00000000 81f93601 nt!KeWaitForSingleObject+0x2e0 (FPO: [Non-Fpo])

    f744fca0 80924ca9 81f936e0 00000103 81baaf20 nt!IopSynchronousServiceTail+0x180 (FPO: [Non-Fpo])

    f744fd38 8082350b 00000024 00000000 00000000 nt!NtReadFile+0x5d5 (FPO: [Non-Fpo])

    f744fd38 7c8285ec 00000024 00000000 00000000 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ f744fd64)

    0012eed4 7c82776b 77e418b2 00000024 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    0012eed8 77e418b2 00000024 00000000 00000000 ntdll!NtReadFile+0xc (FPO: [9,0,0])

    0012ef40 004010ab 00000024 0012ef60 00001000 kernel32!ReadFile+0x16c (FPO: [Non-Fpo])

    0012ff78 0040131f 00000001 00323038 00323078 client+0x10ab

    0012ffc0 77e6f23b 00000000 00000000 7ffd8000 client+0x131f

    0012fff0 00000000 00401376 00000000 78746341 kernel32!BaseProcessStart+0x23 (FPO: [Non-Fpo])

     

    We have highlighted the first parameter to the ReadFile function. This parameter is the handle to the file we are trying to read.

    Knowing this we can use the !handle extension to display details on this handle. The !handle extension takes three parameters: 1. The handle to get details on, 2: a flag controlling the level of verbosity the extension will display (f being the most verbose level), and 3: the process address.

     

    kd> !handle 00000024 f 81b3a020

    processor number 0, process 81b3a020

    PROCESS 81b3a020  SessionId: 0  Cid: 06bc    Peb: 7ffd8000  ParentCid: 0f2c

        DirBase: 1d486000  ObjectTable: e252bbc8  HandleCount:  10.

        Image: client.exe

    Handle table at e2845000 with 10 Entries in use

    0024: Object: 81baaf20  GrantedAccess: 0012019f Entry: e2845048

    Object: 81baaf20  Type: (823ceca0) File

        ObjectHeader: 81baaf08 (old version)

            HandleCount: 1  PointerCount: 3

            Directory Object: 00000000  Name: \mynamedpipe {NamedPipe}

    We have highlighted the object address (yellow) and the type of the object (green) above. The object is of type file which is defined by the FILE_OBJECT structure. So we can examine that using the dt (Display Type) command. The dt command can take 3 parameters: 1. The structure you want to cast the data as, 2: the address of the object, and 3: the field in the structure to be displayed.

     

    kd> dt nt!_FILE_OBJECT 81baaf20 FsContext2

       +0x010 FsContext2       : 0x81b5cc90

     

    The FsContext2 field points to an NPFS CCB structure. The importance of this structure is that it is charged to the server process which created it. The !pool extension can be used to display information about the ownership of the memory the file object is stored in.

     

    kd> !pool 0x81b5cc90  2

    Pool page 81b5cc90 region is Nonpaged pool

    *81b5cc88 size:   50 previous size:  140  (Allocated) *NpFc Process: 81be0d88

                  Pooltag NpFc : CCB, client control block, Binary : npfs.sys

    PROCESS 81be0d88  SessionId: 0  Cid: 07b4    Peb: 7ffd4000  ParentCid: 0160

        DirBase: 02b6f000  ObjectTable: e234a530  HandleCount:  10.

        Image: server.exe

     

     

    Now you have the established the link from the client application and the server application. Now you will be able to move forward in your debugging and figure out why the server process isn’t writing data back to the client.

     

    - Bryan

  • Ntdebugging Blog

    Disassemble the MBR

    • 1 Comments

     

    UPDATE:  This is why we *LOVE* our readers!  After posting a 32-bit disassembly of 16-bit MBR code, I was gently informed of my misstep by three sharp-eyed readers.  I’ve corrected the original post below.  Thanks go out to Ramon Sola, Jeroen Frijters and 'pedantic gnome' for keeping us honest.  Thanks guys!

     

     

    Hi Everyone,

     

    One of our readers, Pete, asked a very interesting question about disassembling the MBR and I'd like to take a moment to show you the quick & dirty way to do this.

     

    First, dump the MBR to a file using a low-level sector editor such as the Microsoft Resource Kit utility DiskProbe.  Once you've saved the file, launch a program (like Notepad) and attach to it using Windbg (the Windows Debugger). 

     

    Next, find a valid, but unoccupied range of memory (the default heap is a good candidate).  The memory range needs to be at least 512 (0x200) bytes.  For example, here's where I found mine:

     

    000120a0  00000000 00000000 00000000 00000000

    000120b0  00000000 00000000 00000000 00000000

    000120c0  00000000 00000000 00000000 00000000

    000120d0  00000000 00000000 00000000 00000000

    000120e0  00000000 00000000 00000000 00000000

    000120f0  00000000 00000000 00000000 00000000

    00012100  00000000 00000000 00000000 00000000

    00012110  00000000 00000000 00000000 00000000

    00012120  00000000 00000000 00000000 00000000

    <snip>...

     

    Once you've found a good memory range in your process, read the file contents into that memory location using the following command:

     

    0:001> .readmem c:\<<path>>\sector00.bin 120a0 120a0+0x1ff

     

    Tada!  You now have the MBR in memory and it is fully examinable just like any other assembly code. 

     

    0:001> db 000120a0 

    000120a0  33 c0 8e d0 bc 00 7c 8e-c0 8e d8 be 00 7c bf 00  3.....|......|..

    000120b0  06 b9 00 02 fc f3 a4 50-68 1c 06 cb fb b9 04 00  .......Ph.......

    000120c0  bd be 07 80 7e 00 00 7c-0b 0f 85 10 01 83 c5 10  ....~..|........

    000120d0  e2 f1 cd 18 88 56 00 55-c6 46 11 05 c6 46 10 00  .....V.U.F...F..

    000120e0  b4 41 bb aa 55 cd 13 5d-72 0f 81 fb 55 aa 75 09  .A..U..]r...U.u.

    000120f0  f7 c1 01 00 74 03 fe 46-10 66 60 80 7e 10 00 74  ....t..F.f`.~..t

    00012100  26 66 68 00 00 00 00 66-ff 76 08 68 00 00 68 00  &fh....f.v.h..h.

    00012110  7c 68 01 00 68 10 00 b4-42 8a 56 00 8b f4 cd 13  |h..h...B.V.....

     

    Since this is 16-bit assembly, you can unassemble using ‘ur’:

    0:001> ur 120a0
    000120a0 33c0            xor     ax,ax
    000120a2 8ed0            mov     ss,ax
    000120a4 bc007c          mov     sp,7C00h
    000120a7 8ec0            mov     es,ax
    000120a9 8ed8            mov     ds,ax
    000120ab be007c          mov     si,7C00h
    000120ae bf0006          mov     di,600h
    000120b1 b90002          mov     cx,200h
    <snip>...

     

    Have fun Pete!

     

    DiskProbe:       http://technet.microsoft.com/en-us/library/bb457122.aspx

    WinDbg:           http://www.microsoft.com/whdc/devtools/debugging/default.mspx

     

      - David

  • Ntdebugging Blog

    Kernel Stack Overflows

    • 5 Comments

    Hello, this is Omer, and today I would like to talk about a common error that we see in a lot of cases reported to us by customers. It involves drivers taking too much space on the kernel stack that results in a kernel stack overflow, which will then crash the system with one of the following bugchecks:

    1. STOP 0x7F: UNEXPECTED_KERNEL_MODE_TRAP with Parameter 1 set to EXCEPTION_DOUBLE_FAULT, which is caused by running off the end of a kernel stack.

    2. STOP 0x1E: KMODE_EXCEPTION_NOT_HANDLED, 0x7E: SYSTEM_THREAD_EXCEPTION_NOT_HANDLED, or 0x8E: KERNEL_MODE_EXCEPTION_NOT_HANDLED, with an exception code of STATUS_ACCESS_VIOLATION, which indicates a memory access violation.

    3. STOP 0x2B: PANIC_STACK_SWITCH, which usually occurs when a kernel-mode driver uses too much stack space.

    Kernel Stack Overview

    Each thread in the system is allocated with a kernel mode stack. Code running on any kernel-mode thread (whether it is a system thread or a thread created by a driver) uses that thread's kernel-mode stack unless the code is a DPC, in which case it uses the processor's DPC stack on certain platforms.  Stack grows negatively.  This means that the beginning (bottom) of the stack has a higher address than the end (top) of the stack.  For example, let's stay the beginning of your stack is 0x80f1000 and this is where your stack pointer (ESP) is pointing.  If you push a DWORD value onto the stack, its address would be 0x80f0ffc.  The next DWORD value would be stored at 0x80f0ff8 and so on up to the limit (top) of the allocated stack.  The top of the stack is bordered by a guard-page to detect overruns.

    The size of the kernel-mode stack varies among different hardware platforms. For example:

    · On x86-based platforms, the kernel-mode stack is 12K.

    · On x64-based platforms, the kernel-mode stack is 24K. (x64-based platforms include systems with processors using the AMD64 architecture and processors using the Intel EM64T architecture).

    · On Itanium-based platforms, the kernel-mode stack is 32K with a 32K backing store. (If the processor runs out of registers from its register file, it uses the backing store to hold the contents of registers until the allocating function returns. This doesn't affect stack allocations directly, but the operating system uses more registers on Itanium-based platforms than on other platforms, which makes relatively more stack available to drivers.)

    The stack sizes listed above are hard limits that are imposed by the system, and all drivers need to use space conservatively so that they can coexist.

    Exception Overview

    So, now that we have discussed the kernel stack, let’s dive into how the double fault actually happens.

    When we reach the top of the stack, one more push instruction is going to cause an exception. This could be either a simple push instruction, or something along the lines of a call instruction which also pushes the return address onto the stack, etc.

    The push instruction is going to cause the first exception. This will cause the exception handler to kick in, which will then try to allocate the trap frame and other variables on the stack. This causes the second exception.

    This time around, the operating system takes advantage a special x86 structure called the Task State Segment(TSS).  The OS stores the state of the registers in the TSS and then stops.  The TSS can be accessed via an entry in the global descriptor table, and can be used to debug the memory dump that is created.

    The Usual Suspects

    Rogue drivers are usually guilty of one or more of the following design flaws:

    1. Using the stack liberally. Instead of passing large amounts of data on the stack, driver writers should design functions to accept pointers to data structures. These data structures should be allocated out of system space memory(paged or non-paged pool). If you need to pass large number of parameters from one function to another, then group the parameters into a structure and then pass a pointer to that structure.

    2. Calling functions recursively. Heavily nested or recursive functions that are passing large amounts of data on the stack will use too much space and will overflow. Try to design drivers that use a minimal number of recursive calls and nested functions.

    Since the size of the stack is much smaller on x86 machines, you will run into these problems with x86 machines more frequently than any other platform.

    For a more detailed description, please visit

    http://www.microsoft.com/whdc/Driver/tips/KMstack.mspx.

    Debugging Kernel Stack Overflows

    Full kernel dumps are usually enough to find the offending driver. The most common bugcheck code that appears in these dumps is UNEXPECTED_KERNEL_MODE_TRAP (0x7f), with the first argument being EXCEPTION_DOUBLE_FAULT (0x8).

    When you get this dump, the first command that you should run is !analyze-v.

    0: kd> !analyze -v

    *******************************************************************************

    * Bugcheck Analysis *

    *******************************************************************************

    UNEXPECTED_KERNEL_MODE_TRAP (7f)

    This means a trap occurred in kernel mode, and it's a trap of a kind that the kernel isn't allowed to have/catch (bound trap) or that

    is always instant death (double fault). The first number in the bugcheck params is the number of the trap (8 = double fault, etc)

    Consult an Intel x86 family manual to learn more about what these traps are. Here is a *portion* of those codes:

    If kv shows a taskGate

    use .tss on the part before the colon, then kv.

    Else if kv shows a trapframe

    use .trap on that value

    Else

    .trap on the appropriate frame will show where the trap was taken(on x86, this will be the ebp that goes with the procedure KiTrap)

    Endif

    kb will then show the corrected stack.

    Arguments:

    Arg1: 00000008, EXCEPTION_DOUBLE_FAULT

    Arg2: 80042000

    Arg3: 00000000

    Arg4: 00000000

    Debugging Details:

    ------------------

    BUGCHECK_STR: 0x7f_8

    TSS: 00000028 -- (.tss 0x28)

    eax=87b90328 ebx=87b90328 ecx=8aa3d8c0 edx=87b90328 esi=b8cb7138 edi=8084266a

    eip=f7159c53 esp=b8cb7000 ebp=b8cb7010 iopl=0 nv up ei pl nz na po nc

    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010202

    Ntfs!NtfsInitializeIrpContext+0xc:

    f7159c53 57 push edi

    Resetting default scope

    DEFAULT_BUCKET_ID: DRIVER_FAULT

    PROCESS_NAME: System

    CURRENT_IRQL: 1

    TRAP_FRAME: b8cb8620 -- (.trap 0xffffffffb8cb8620)

    ErrCode = 00000000

    eax=c1587000 ebx=0000000e ecx=0000000f edx=00000000 esi=87dca350 edi=00000000

    eip=8093837b esp=b8cb8694 ebp=b8cb86d0 iopl=0 nv up ei ng nz ac po cy

    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010293

    nt!CcMapData+0x8c:

    8093837b 8a10 mov dl,byte ptr [eax] ds:0023:c1587000=??

    Resetting default scope

    LAST_CONTROL_TRANSFER: from f7158867 to f7159c53

    Let's follow the instructions that the debugger is giving us. Since the debugger gave us a .tss command, lets run that. After that, run a !thread to get the thread summary:

    0: kd> .tss 0x28

    eax=87b90328 ebx=87b90328 ecx=8aa3d8c0 edx=87b90328 esi=b8cb7138 edi=8084266a

    eip=f7159c53 esp=b8cb7000 ebp=b8cb7010 iopl=0 nv up ei pl nz na po nc

    cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010202

    Ntfs!NtfsInitializeIrpContext+0xc:

    f7159c53 57 push edi

    0: kd> !thread

    THREAD 87dca350 Cid 0420.0990 Teb: 7ffdf000 Win32Thread: efdbe430 RUNNING on processor 0

    IRP List:

    89cba088: (0006,01fc) Flags: 00000404 Mdl: 00000000

    Not impersonating

    DeviceMap e10008d8

    Owning Process 8ab8e238 Image: System

    Wait Start TickCount 7260638 Ticks: 0

    Context Switch Count 17 LargeStack

    UserTime 00:00:00.000

    KernelTime 00:00:00.015

    Start Address 0x4a6810ea

    Stack Init b8cba000 Current b8cb7c64 Base b8cba000 Limit b8cb7000 Call 0

    Priority 14 BasePriority 13 PriorityDecrement 0

    We are looking for the kernel stack limits(above in red). For this particular stack we see that the stack starts at b8cba000, and ends at b8cb7000. If you look at the ESP register in the .tss output above, you will see that we have reached the stack limit. The current instruction being attempted is a push which overflows the stack and causes the bugcheck.

    Now that we have determined we do have a stack overflow, let’s find out what caused this, and who the offending driver is.

    The first thing that I do is dump the stack. You might need to increase the number of frames displayed to see the whole stack.

    0: kd> kb

    *** Stack trace for last set context - .thread/.cxr resets it

    ChildEBP RetAddr

    b8cb7010 f7158867 Ntfs!NtfsInitializeIrpContext+0xc

    b8cb71bc 8083f9c0 Ntfs!NtfsFsdRead+0xb7

    b8cb71d0 f7212c53 nt!IofCallDriver+0x45

    b8cb71f8 8083f9c0 fltmgr!FltpDispatch+0x6f

    b8cb720c ba547bcc nt!IofCallDriver+0x45

    WARNING: Stack unwind information not available. Following frames may be wrong.

    b8cb7214 8083f9c0 tmpreflt!TmpAddRdr+0x7b8

    b8cb7228 ba4e08be nt!IofCallDriver+0x45

    b8cb7430 ba4e09d3 DRIVER_A+0x28be

    b8cb7450 b85fa306 DRIVER_A+0x29d3

    b8cb763c b85fa50d DRIVER_B+0x8306

    b8cb765c 8082f0d7 DRIVER_B+0x850d

    b8cb7674 8082f175 nt!IoPageRead+0x109

    b8cb76f8 80849cd5 nt!MiDispatchFault+0xd2a

    b8cb7754 80837d0a nt!MmAccessFault+0x64a

    b8cb7754 8093837b nt!KiTrap0E+0xdc

    b8cb781c f718c0ac nt!CcMapData+0x8c

    b8cb783c f718c6e6 Ntfs!NtfsMapStream+0x4b

    b8cb78b0 f718c045 Ntfs!NtfsReadMftRecord+0x86

    b8cb78e8 f718c0f4 Ntfs!NtfsReadFileRecord+0x7a

    b8cb7920 f7155c3c Ntfs!NtfsLookupInFileRecord+0x37

    b8cb7a30 f715746a Ntfs!NtfsLookupAllocation+0xdd

    b8cb7bfc f7157655 Ntfs!NtfsPrepareBuffers+0x25d

    b8cb7dd8 f715575e Ntfs!NtfsNonCachedIo+0x1ee

    b8cb7ec4 f71588de Ntfs!NtfsCommonRead+0xaf5

    b8cb8070 8083f9c0 Ntfs!NtfsFsdRead+0x113

    b8cb8084 f7212c53 nt!IofCallDriver+0x45

    b8cb80ac 8083f9c0 fltmgr!FltpDispatch+0x6f

    b8cb80c0 ba547bcc nt!IofCallDriver+0x45

    b8cb80c8 8083f9c0 tmpreflt!TmpAddRdr+0x7b8

    b8cb80dc ba4e08be nt!IofCallDriver+0x45

    b8cb82e4 ba4e09d3 DRIVER_A+0x28be

    b8cb8304 b85fa306 DRIVER_A+0x29d3

    b8cb84f0 b85fa50d DRIVER_B+0x8306

    b8cb8510 8082f0d7 DRIVER_B+0x850d

    b8cb8528 8082f175 nt!IoPageRead+0x109

    b8cb85ac 80849cd5 nt!MiDispatchFault+0xd2a

    b8cb8608 80837d0a nt!MmAccessFault+0x64a

    b8cb8608 8093837b nt!KiTrap0E+0xdc

    b8cb86d0 f718c0ac nt!CcMapData+0x8c

    b8cb86f0 f718ef1b Ntfs!NtfsMapStream+0x4b

    b8cb8720 f7186aa7 Ntfs!ReadIndexBuffer+0x8f

    b8cb8894 f7187042 Ntfs!NtfsUpdateFileNameInIndex+0x62

    b8cb8990 f7186059 Ntfs!NtfsUpdateDuplicateInfo+0x2b0

    b8cb8b98 f7186302 Ntfs!NtfsCommonCleanup+0x1e82

    b8cb8d08 8083f9c0 Ntfs!NtfsFsdCleanup+0xcf

    b8cb8d1c f7212c53 nt!IofCallDriver+0x45

    b8cb8d44 8083f9c0 fltmgr!FltpDispatch+0x6f

    b8cb8d58 ba54809a nt!IofCallDriver+0x45

    b8cb8d80 ba54d01d tmpreflt!TmpQueryFullName+0x454

    b8cb8d90 8083f9c0 tmpreflt!TmpQueryFullName+0x53d7

    b8cb8da4 ba4e08be nt!IofCallDriver+0x45

    b8cb8fac ba4e09d3 DRIVER_A+0x28be

    b8cb8fcc b85fa306 DRIVER_A+0x29d3

    b8cb91b8 b85fa50d DRIVER_B+0x8306

    b8cb91d8 80937f75 DRIVER_B+0x850d

    b8cb9208 8092add4 nt!IopCloseFile+0x2ae

    b8cb9238 8092af7a nt!ObpDecrementHandleCount+0x10a

    b8cb9260 8092ae9e nt!ObpCloseHandleTableEntry+0x131

    b8cb92a4 8092aee9 nt!ObpCloseHandle+0x82

    b8cb92b4 80834d3f nt!NtClose+0x1b

    b8cb92b4 8083c0fc nt!KiFastCallEntry+0xfc

    b8cb9330 bf835765 nt!ZwClose+0x11

    b8cb9608 bf8aa2dd win32k!bCreateSection+0x2ad

    b8cb9660 bf826b45 win32k!EngMapFontFileFDInternal+0xc6

    b8cb96c0 bf82784a win32k!PUBLIC_PFTOBJ::bLoadFonts+0x17f

    b8cb991c bf9bcb67 win32k!PUBLIC_PFTOBJ::bLoadAFont+0x77

    b8cb9af0 bf9bcb16 win32k!bInitOneStockFontInternal+0x42

    b8cb9b0c bf9bb0e8 win32k!bInitOneStockFont+0x3f

    b8cb9cf4 bf9ba845 win32k!bInitStockFontsInternal+0x12a

    b8cb9cfc bf8246ad win32k!bInitStockFonts+0xa

    b8cb9d48 bf8242d5 win32k!InitializeGreCSRSS+0x149

    b8cb9d50 80834d3f win32k!NtUserInitialize+0x66

    b8cb9d50 7c82ed54 nt!KiFastCallEntry+0xfc

    0015fdb0 00000000 0x7c82ed54

    The next step is to calculate how much space each frame is taking up. This can be done by walking the stack manually. Just subtract the subsequent EBP from the current EBP for each frame and add up the space used by all the modules.

    Module Stack Usage Percentage
    Ntfs 4152 36%
    DRIVER_A 1572 14%
    win32k 2592 22%
    DRIVER_B 1656 14%
    tmpreflt 72 1%
    fltmgr 120 1%
    nt 1420 12%

    It would be easy to blame NTFS since it is the top stack consumer, but look closer.  Even though NTFS is using the most of space in our example, this is due to both DRIVER_A and DRIVER_B making repeated calls into NTFS to access data.  Alone, it is likely that neither driver would have caused a problem, but both drivers combined resulted in a bugcheck.  Conscientious driver writing and efficient stack usage would have prevented this problem.  Both drivers need to optimize the number of calls they make to NTFS.

    Further reading

    http://www.microsoft.com/whdc/driver/kernel/mem-mgmt.mspx

    http://www.microsoft.com/whdc/Driver/tips/KMstack.mspx

    http://support.microsoft.com/kb/186775

    For more information on Task State Segments, please see the Intel and AMD Processor manuals.

Page 21 of 24 (235 items) «1920212223»