• Ntdebugging Blog

    The default interactive desktop heap size has been increased on 32-bit Vista SP1



    This is going to be a short blog post, but considering the amount of feedback we’ve received on the our two previous desktop heap posts, I think this is worth blogging about.  32-bit Vista SP1 and 32-bit Windows Server 2008 both have a new value for the default size of interactive desktop heaps.  Previously, this heap size was 3 MB by default, but on Vista SP1 and Server 2008 the default interactive heap size is 12 MB.  We’ve heard your comments about the problems associated with the heap size, and this new default size should go a long way towards avoiding interactive desktop heap depletion.


    On 32-bit systems, running Vista SP1 or Windows Server 2008, you’ll now see that the relevant portion of the registry value looks like this (by default)...


    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows



    64-bit Windows still has a default interactive desktop heap size of 20MB.  Keep in mind that in Vista, session view space (the memory range that desktop heaps are allocated from) is now a dynamic address range.  This means that in Vista and Server 2008 the individual desktop heap sizes can be increased to larger sizes with less concern over exhausting session view space.


    - Matthew Justice

  • Ntdebugging Blog

    Debugging a bluescreen at home



    Hi, my name is Chad. I work as an escalation engineer for Microsoft’s OEM support team.


    A while back, I encountered an interesting crash on one of my computers at home, and I thought I’d post about how I debugged it.


    This particular machine had been humming along quite happily for some time, but one day while I was scanning some photos, it bluescreened. Naturally, I hoped it was just a fluke, but after it happened a few more times while doing the same thing, I decided to debug it.


    Ordinarily, if a machine crashes when performing a specific activity, like scanning photos, my first inclination would be to suspect a bug in one of the drivers involved in that activity, like the scanner driver or the USB driver. But in this case, I had been using this scanner for a long time, with the same drivers, and never had this problem, so this sudden crashing was kind of mysterious.


    Let's see what we can tell from the dump!


    The first order of business when looking at a crash dump is the !analyze -v command. I've trimmed some of it here for brevity, but it goes something like this:


    kd> !analyze -v


    *                                                                             *

    *                        Bugcheck Analysis                                    *

    *                                                                             *




    This is a very common bugcheck.  Usually the exception address pinpoints

    the driver/function that caused the problem.  Always note this address

    as well as the link date of the driver/image that contains this address.

    Some common problems are exception code 0x80000003.  This means a hard

    coded breakpoint or assertion was hit, but this system was booted

    /NODEBUG.  This is not supposed to happen as developers should never have

    hardcoded breakpoints in retail code, but ...

    If this happens, make sure a debugger gets connected, and the

    system is booted /DEBUG.  This will let us see why this breakpoint is



    Arg1: c0000005, The exception code that was not handled

    Arg2: 8738e300, The address that the exception occurred at

    Arg3: b9b3dc7c, Trap Frame

    Arg4: 00000000


    Debugging Details:




    EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".




    8738e300 0000             add     [eax],al


    TRAP_FRAME:  b9b3dc7c -- (.trap ffffffffb9b3dc7c)

    ErrCode = 00000002

    eax=00000001 ebx=bc514c68 ecx=0001065e edx=bc510000 esi=00000955 edi=b9b3dd64

    eip=8738e300 esp=b9b3dcf0 ebp=b9b3dd08 iopl=0         nv up ei pl zr na po nc

    cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010246

    8738e300 0000             add     [eax],al                ds:0023:00000001=??

    Resetting default scope






    LAST_CONTROL_TRANSFER:  from 8051d6a7 to 8053331e



    WARNING: Frame IP not in any known module. Following frames may be wrong.

    b9b3dcec bf801619 b9b3dd64 0012eee4 bf81551f 0x8738e300

    b9b3dd08 bf81553c b9b3dd64 0012eee4 bf81551f win32k!ValidateHwnd+0x5c

    b9b3dd50 804de7ec 0001065e 012eb244 00000000 win32k!NtUserInvalidateRect+0x1d

    b9b3dd50 7c90eb94 0001065e 012eb244 00000000 nt!KiFastCallEntry+0xf8

    0012eecc 77d4b601 0105a943 0001065e 012eb244 ntdll!KiFastSystemCallRet

    0012eed0 0105a943 0001065e 012eb244 00000000 USER32!NtUserInvalidateRect+0xc

    77d4b601 90909090 001134b8 0300ba00 12ff7ffe NikonScan4!GetSource+0x21e93



    From the stack trace, we can tell that NikonScan4.dll made a call into User32, which ultimately ended up calling into win32k.sys down in kernel mode. Win32k was in a function called ValidateHwnd() and then tried to call some function at address 0x8738e300, at which point we tried to dereference an invalid pointer and crashed.


    What’s at 0x8738e300?


    kd> dc 8738e300

    8738e300  00000000 00000001 00000000 87360350  ............P.6.

    8738e310  00000000 00000001 f71af9fe f71b0030  ............0...

    8738e320  f71afb0e f71afbb4 f71b0098 f71b0214  ................

    8738e330  f71afef6 f71aff8e 07fef800 00000000  ................

    8738e340  f71afffc 00000000 0a0e000a 644c6d4d  ............MmLd

    8738e350  8732ea58 870303e0 ffffffff ffffffff  X.2.............

    8738e360  00000012 00000000 f797f000 f7989905  ................

    8738e370  0000c000 00500050 e1971458 00160016  ....P.P.X.......


    kd> !pool 8738e300 2

    Pool page 8738e300 region is Nonpaged pool

    *8738e2f8 size:   50 previous size:    8  (Allocated) *NV 

                  Owning component : Unknown (update pooltag.txt)


    Well, that’s bad. 0x8738e300 isn’t actually a valid address of a function. That location contains some user data (specifically, some nonpaged pool).


    So, that's why we blew up: the ValidateHwnd() function in win32k made a call to this bad address which contains data instead of code! Let's see if we can figure out why it did this. We can find the return address in win32k!ValidateHwnd on the stack, and unassemble the instructions leading up to the point where we ran off into the weeds.


    kd> kv L8

    ChildEBP RetAddr  Args to Child             

    WARNING: Frame IP not in any known module. Following frames may be wrong.

    b9b3dcec bf801619 b9b3dd64 0012eee4 bf81551f 0x8738e300

    b9b3dd08 bf81553c b9b3dd64 0012eee4 bf81551f win32k!ValidateHwnd+0x5c (FPO: [Non-Fpo])

    b9b3dd50 804de7ec 0001065e 012eb244 00000000 win32k!NtUserInvalidateRect+0x1d (FPO: [Non-Fpo])

    b9b3dd50 7c90eb94 0001065e 012eb244 00000000 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ b9b3dd64)

    0012eecc 77d4b601 0105a943 0001065e 012eb244 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    0012eed0 0105a943 0001065e 012eb244 00000000 USER32!NtUserInvalidateRect+0xc

    77d4b601 90909090 001134b8 0300ba00 12ff7ffe NikonScan4!GetSource+0x21e93

    77d4b605 001134b8 0300ba00 12ff7ffe 900008c2 0x90909090


    The return address from where we made the bad call is bf801619. Let’s unassemble a few instructions backwards from that address (using the handy but often-overlooked “ub” command) and see what the code was doing:


    kd> ub bf801619


    bf8015ff 8d1c82           lea     ebx,[edx+eax*4]

    bf801602 8bc1             mov     eax,ecx

    bf801604 c1e810           shr     eax,0x10

    bf801607 663b430a         cmp     ax,[ebx+0xa]

    bf80160b 75ad             jnz     win32k!ValidateHwnd+0x3d (bf8015ba)

    bf80160d 807b0801         cmp     byte ptr [ebx+0x8],0x1

    bf801611 7573             jnz     win32k!ValidateHwnd+0xff (bf801686)

    bf801613 ff15e0b298bf call dword ptr [win32k!_imp__PsGetCurrentThread (bf98b2e0)]


    OK, so it's pretty simple. Win32k decided which function address to call by reading it from a pointer stored at a hardcoded location (in other words, in a global variable) within Win32k itself. That pointer is located at bf98b2e0. The debugger helpfully tells us that this pointer is intended to contain the address of a function called PsGetCurrentThread, but let’s double-check this and make sure it actually does. At this point, the working assumption would be that this pointer had gotten corrupted somehow.


    kd> dd bf98b2e0 L1

    bf98b2e0  804e4a15


    Interesting. bf98b2e0 contains the value 804e4a15. This is not even close to the bad address the processor actually called! Remember, from the stack trace, we jumped to 8738e300 instead. What does this pointer actually point to?


    kd> ln 804e4a15

    (804e4a15)   nt!PsGetCurrentThread   |  (804e4a51)   nt!KeSetEventBoostPriority

    Exact matches:

        nt!PsGetCurrentThread = <no type information>


    Sure enough, 804e4a15 is the address of nt!PsGetCurrentThread.


    So, basically, win32k was trying to call PsGetCurrentThread(), and even had the correct pointer in memory to get there, but the processor instead jumped to a bogus address located in the middle of some user data.


    At this point it's pretty safe to say that this is a hardware problem, and furthermore, since the relevant memory looks fine, it looks like a problem within the CPU itself. (I’m not an expert on CPU architecture, but if I were to take a wild guess I’d say that maybe the processor had some problem when reading from its L2 cache.)


    Upon discovering this, I decided to crack open my case and take a look. It didn’t take long to spot the problem:


    Yes, that's my CPU fan. Notice that the heatsink is completely clogged up with dust!


    The resolution to this story: I took a can of compressed air and blew all the dust out of the heatsink. This took care of the problem, and the computer happily scanned hundreds of photos and hasn’t crashed again since. Why was it only crashing when I was scanning photos? Most likely because this was causing the CPU to run at 100% utilization for extended periods of time, and it was simply overheating!


  • Ntdebugging Blog

    Disassemble the MBR



    UPDATE:  This is why we *LOVE* our readers!  After posting a 32-bit disassembly of 16-bit MBR code, I was gently informed of my misstep by three sharp-eyed readers.  I’ve corrected the original post below.  Thanks go out to Ramon Sola, Jeroen Frijters and 'pedantic gnome' for keeping us honest.  Thanks guys!



    Hi Everyone,


    One of our readers, Pete, asked a very interesting question about disassembling the MBR and I'd like to take a moment to show you the quick & dirty way to do this.


    First, dump the MBR to a file using a low-level sector editor such as the Microsoft Resource Kit utility DiskProbe.  Once you've saved the file, launch a program (like Notepad) and attach to it using Windbg (the Windows Debugger). 


    Next, find a valid, but unoccupied range of memory (the default heap is a good candidate).  The memory range needs to be at least 512 (0x200) bytes.  For example, here's where I found mine:


    000120a0  00000000 00000000 00000000 00000000

    000120b0  00000000 00000000 00000000 00000000

    000120c0  00000000 00000000 00000000 00000000

    000120d0  00000000 00000000 00000000 00000000

    000120e0  00000000 00000000 00000000 00000000

    000120f0  00000000 00000000 00000000 00000000

    00012100  00000000 00000000 00000000 00000000

    00012110  00000000 00000000 00000000 00000000

    00012120  00000000 00000000 00000000 00000000



    Once you've found a good memory range in your process, read the file contents into that memory location using the following command:


    0:001> .readmem c:\<<path>>\sector00.bin 120a0 120a0+0x1ff


    Tada!  You now have the MBR in memory and it is fully examinable just like any other assembly code. 


    0:001> db 000120a0 

    000120a0  33 c0 8e d0 bc 00 7c 8e-c0 8e d8 be 00 7c bf 00  3.....|......|..

    000120b0  06 b9 00 02 fc f3 a4 50-68 1c 06 cb fb b9 04 00  .......Ph.......

    000120c0  bd be 07 80 7e 00 00 7c-0b 0f 85 10 01 83 c5 10  ....~..|........

    000120d0  e2 f1 cd 18 88 56 00 55-c6 46 11 05 c6 46 10 00  .....V.U.F...F..

    000120e0  b4 41 bb aa 55 cd 13 5d-72 0f 81 fb 55 aa 75 09  .A..U..]r...U.u.

    000120f0  f7 c1 01 00 74 03 fe 46-10 66 60 80 7e 10 00 74  ....t..F.f`.~..t

    00012100  26 66 68 00 00 00 00 66-ff 76 08 68 00 00 68 00  &fh....f.v.h..h.

    00012110  7c 68 01 00 68 10 00 b4-42 8a 56 00 8b f4 cd 13  |h..h...B.V.....


    Since this is 16-bit assembly, you can unassemble using ‘ur’:

    0:001> ur 120a0
    000120a0 33c0            xor     ax,ax
    000120a2 8ed0            mov     ss,ax
    000120a4 bc007c          mov     sp,7C00h
    000120a7 8ec0            mov     es,ax
    000120a9 8ed8            mov     ds,ax
    000120ab be007c          mov     si,7C00h
    000120ae bf0006          mov     di,600h
    000120b1 b90002          mov     cx,200h


    Have fun Pete!


    DiskProbe:       http://technet.microsoft.com/en-us/library/bb457122.aspx

    WinDbg:           http://www.microsoft.com/whdc/devtools/debugging/default.mspx


      - David

  • Ntdebugging Blog

    NonPagedPool Depletion


    I recently was engaged on an issue where a server was depleting NonPagedPool over a period of a few days.  Ordinarily, we would just use a tool like PoolMon to identify the offending pool tag and then find the driver that uses that pool tag using the method in this article


    However, what made this case interesting was the pool tag and that we were unable to identify the driver using the normal methodology.  You’ll see what I mean in a moment.  The engineer supplied me with a kernel dump of the server while it was in-state and this is what I found.


    Let’s start by taking a look at the virtual memory usage:


    2: kd> !vm


    *** Virtual Memory Usage ***

           Physical Memory:      851420 (   3405680 Kb)

           Page File: \??\C:\pagefile.sys

             Current:   3584000 Kb  Free Space:   3568552 Kb

             Minimum:   3584000 Kb  Maximum:      3584000 Kb

           Available Pages:      573277 (   2293108 Kb)

           ResAvail Pages:       800628 (   3202512 Kb)

           Locked IO Pages:        1067 (      4268 Kb)

           Free System PTEs:      25102 (    100408 Kb)

           Free NP PTEs:            335 (      1340 Kb)

           Free Special NP:           0 (         0 Kb)

           Modified Pages:           22 (        88 Kb)

           Modified PF Pages:        22 (        88 Kb)

           NonPagedPool Usage:    31369 (    125476 Kb)    ß Very high

           NonPagedPool Max:      31986 (    127944 Kb)

           ********** Excessive NonPaged Pool Usage *****

           PagedPool 0 Usage:     19071 (     76284 Kb)

           PagedPool 1 Usage:       735 (      2940 Kb)

           PagedPool 2 Usage:       747 (      2988 Kb)

           PagedPool 3 Usage:       720 (      2880 Kb)

           PagedPool 4 Usage:       746 (      2984 Kb)

           PagedPool Usage:       22019 (     88076 Kb)

           PagedPool Maximum:     38912 (    155648 Kb)


           ********** 3 pool allocations have failed **********


    So we can see that NPP usage is very high given the server is using the /3GB switch which limits NPP to 128MB by default.  We need to identify what pool tag is associated with the high NPP usage:


    2: kd> !poolused /t2 2

       Sorting by  NonPaged Pool Consumed


      Pool Used:

                NonPaged            Paged

     Tag    Allocs     Used    Allocs     Used

     None   246479 50827424         0        0      call to ExAllocatePool

     MmCm     1198 18462512         0        0      Calls made to MmAllocateContiguousMemory , Binary: nt!mm


    Interesting, so the offending tag is “None”.  This means that these allocations were made by calling the function ExAllocatePool instead of ExAllocatePoolWithTag.  ExAllocatePool is obsolete and should no longer be used. 


    Now, I need to find out which driver is calling this function.  First, I need to know where ExAllocatePool lives:


    2: kd> x nt!ExAllocatePool

    e0894d1f nt!ExAllocatePool


    Next, I need to search all the drivers to see which one is importing this function:


    2: kd> !for_each_module s-d @#Base @#End e0894d1f

    f50b8058  e0894d1f e0828e04 e089b708 e084011b  .M..............


    Hmm, looks suspiciously like an import table, let’s see:


    2: kd> dps f50b8058

    f50b8058  e0894d1f nt!ExAllocatePool

    f50b805c  e0828e04 nt!_wcsnicmp

    f50b8060  e089b708 nt!ExFreePoolWithTag

    f50b8064  e083e30a nt!KeInitializeEvent



    Yep, that’s an import table.  You can also verify that this is the import table of a particular module by checking the header (!dh on the module’s base address and look for “Import Address Table Directory”).


    As you can see, we have only one driver that imports ExAllocatePool.  Let’s see which driver this is:


    2: kd> !lmi f50b8058

    Loaded Module Info: [f50b8058]


       Base Address: f50b3000




    I’ve removed the incriminating identifiers from module information displayed above to protect the guilty.  It is interesting to note that the driver name was a GUID and that this driver did not exist on the disk.  This was because the driver is dynamically created when its parent program loads.  


    The software package was removed and the server was happy again. 


    -         David

  • Ntdebugging Blog

    How to Find the Owner of a Named Pipe


    This is a follow-up on the LPC hang blog. The same hang troubleshooting techniques apply to this, but when a named pipe is involved you’ll have to use a slightly different method to following the chain from a client application to the server application. For the purpose of this exercise I’ll use the named pipe server (http://msdn2.microsoft.com/en-us/library/aa365588.aspx) and client (http://msdn2.microsoft.com/en-us/library/aa365592.aspx) given by MSDN.



    Here is a diagram of our scenario.



    Here we have the client application that is waiting on a read operation to a named pipe. We need to determine what process will be putting data into the named pipe which will let the client application move forward.

    Here is the thread in the client application.



    THREAD 81e70858  Cid 06bc.06c0  Teb: 7ffdf000 Win32Thread: 00000000 WAIT: (Executive) UserMode Non-Alertable

        81baaf7c  NotificationEvent

    Not impersonating

    DeviceMap                 e25176b8

    Owning Process            81b3a020       Image:         client.exe

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      73194          Ticks: 1599 (0:00:00:24.984)

    Context Switch Count      21            

    UserTime                  00:00:00.000

    KernelTime                00:00:00.000

    Win32 Start Address client (0x00401376)

    Start Address kernel32!BaseProcessStartThunk (0x77e617f8)

    Stack Init f7450000 Current f744fc04 Base f7450000 Limit f744d000 Call 0

    Priority 10 BasePriority 8 PriorityDecrement 2

    ChildEBP RetAddr  Args to Child             

    f744fc1c 808202b6 81e70858 81e70900 00000700 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])

    f744fc34 8081fb6e 8207dac0 00000000 81baaf20 nt!KiSwapThread+0x83 (FPO: [Non-Fpo])

    f744fc78 8092deb3 81baaf7c 00000000 81f93601 nt!KeWaitForSingleObject+0x2e0 (FPO: [Non-Fpo])

    f744fca0 80924ca9 81f936e0 00000103 81baaf20 nt!IopSynchronousServiceTail+0x180 (FPO: [Non-Fpo])

    f744fd38 8082350b 00000024 00000000 00000000 nt!NtReadFile+0x5d5 (FPO: [Non-Fpo])

    f744fd38 7c8285ec 00000024 00000000 00000000 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ f744fd64)

    0012eed4 7c82776b 77e418b2 00000024 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    0012eed8 77e418b2 00000024 00000000 00000000 ntdll!NtReadFile+0xc (FPO: [9,0,0])

    0012ef40 004010ab 00000024 0012ef60 00001000 kernel32!ReadFile+0x16c (FPO: [Non-Fpo])

    0012ff78 0040131f 00000001 00323038 00323078 client+0x10ab

    0012ffc0 77e6f23b 00000000 00000000 7ffd8000 client+0x131f

    0012fff0 00000000 00401376 00000000 78746341 kernel32!BaseProcessStart+0x23 (FPO: [Non-Fpo])


    We have highlighted the first parameter to the ReadFile function. This parameter is the handle to the file we are trying to read.

    Knowing this we can use the !handle extension to display details on this handle. The !handle extension takes three parameters: 1. The handle to get details on, 2: a flag controlling the level of verbosity the extension will display (f being the most verbose level), and 3: the process address.


    kd> !handle 00000024 f 81b3a020

    processor number 0, process 81b3a020

    PROCESS 81b3a020  SessionId: 0  Cid: 06bc    Peb: 7ffd8000  ParentCid: 0f2c

        DirBase: 1d486000  ObjectTable: e252bbc8  HandleCount:  10.

        Image: client.exe

    Handle table at e2845000 with 10 Entries in use

    0024: Object: 81baaf20  GrantedAccess: 0012019f Entry: e2845048

    Object: 81baaf20  Type: (823ceca0) File

        ObjectHeader: 81baaf08 (old version)

            HandleCount: 1  PointerCount: 3

            Directory Object: 00000000  Name: \mynamedpipe {NamedPipe}

    We have highlighted the object address (yellow) and the type of the object (green) above. The object is of type file which is defined by the FILE_OBJECT structure. So we can examine that using the dt (Display Type) command. The dt command can take 3 parameters: 1. The structure you want to cast the data as, 2: the address of the object, and 3: the field in the structure to be displayed.


    kd> dt nt!_FILE_OBJECT 81baaf20 FsContext2

       +0x010 FsContext2       : 0x81b5cc90


    The FsContext2 field points to an NPFS CCB structure. The importance of this structure is that it is charged to the server process which created it. The !pool extension can be used to display information about the ownership of the memory the file object is stored in.


    kd> !pool 0x81b5cc90  2

    Pool page 81b5cc90 region is Nonpaged pool

    *81b5cc88 size:   50 previous size:  140  (Allocated) *NpFc Process: 81be0d88

                  Pooltag NpFc : CCB, client control block, Binary : npfs.sys

    PROCESS 81be0d88  SessionId: 0  Cid: 07b4    Peb: 7ffd4000  ParentCid: 0160

        DirBase: 02b6f000  ObjectTable: e234a530  HandleCount:  10.

        Image: server.exe



    Now you have the established the link from the client application and the server application. Now you will be able to move forward in your debugging and figure out why the server process isn’t writing data back to the client.


    - Bryan

  • Ntdebugging Blog

    Wanted: Windows Internals subject matter experts


    Microsoft is looking for five Windows Internals subject matter experts to come work on a very special five to eight day project on the Redmond campus during the month of May 2008.  Candidates must have good communications skills, be non Microsoft employees,  have 5+ years experience with windows, be familiar with the Windows Internals book, have kernel and user mode debugging experience, and be C literate.  


    We prefer candidates strongly represent the Windows IT Professional or Windows development industry.    These typically would be 2nd or 3rd tier escalation resources or Windows developers (developing Win32 code, or device drivers) that are working for Gold Partners, ISVs or are independent consultants or MVPs.


    A Microsoft NDA (Non Disclosure Agreement) will be required.


    Please contact Jeff Dailey at jeffda@microsoft.com with the subject line “Windows Internals Project”

    Please include a copy of your resume and details explaining your Windows background, along with your contact info, and your location.


    We will cover air travel, lodging and food for non local candidates attending from within the US.


    Thank you.


    More Info:

    The Widows Internals experts selected for this project will be brought to Redmond to aid in developing a new certification program based on the Windows Internals.  This is not a hiring effort or product testing session.  Though the candidates should have C literacy they will not be writing or reviewing any Microsoft source code. 


Page 1 of 1 (6 items)