• Ntdebugging Blog

    System Won't Power Down


    Hi All. Recently I had a Windows 2000 case where the machine wouldn’t shut down. After initiating the shutdown process, we saw the user get logged off, and on the console we watched the services shut down. The final “Windows is shutting down…” message was displayed on the screen, and then the screen would go blank. At that point we expected the machine to power down, but it continued running. Even after waiting 10 minutes the machine stayed powered up. The same behavior was exhibited after a restart as well. We tried the same thing in safe mode, but the machine still wouldn’t shut down.

    We set up the machine for a full memory dump and crashed the box once the screen went blank. The customer uploaded the dump, and I started digging.

    The first thing I noticed was a bunch of processes still running. I expected csrss.exe, smss.exe, winlogon.exe and the System process, but to my surprise I found other processes such as svchost.exe and services.exe still lingering. Why were they still running?

     

    0: kd> !process 0 0

    **** NT ACTIVE PROCESS DUMP ****

    PROCESS f95ed480  SessionId: 0  Cid: 0008    Peb: 00000000  ParentCid: 0000

        DirBase: 00030000  ObjectTable: f95edf68  TableSize:  62.

        Image: System

     

    PROCESS f94de520  SessionId: 0  Cid: 00fc    Peb: 7ffff000  ParentCid: 0008

        DirBase: 2465c000  ObjectTable: f94def08  TableSize:  34.

        Image: SMSS.EXE

     

    PROCESS f94b01c0  SessionId: 0  Cid: 0118    Peb: 7ffff000  ParentCid: 00fc

        DirBase: 26515000  ObjectTable: f94b29e8  TableSize: 103.

        Image: CSRSS.EXE

     

    PROCESS f94a5d60  SessionId: 0  Cid: 0130    Peb: 7ffff000  ParentCid: 00fc

        DirBase: 2671a000  ObjectTable: f94b18a8  TableSize: 114.

        Image: WINLOGON.EXE

     

    PROCESS f9495a60  SessionId: 0  Cid: 014c    Peb: 7ffff000  ParentCid: 0130

        DirBase: 26814000  ObjectTable: f94b2848  TableSize: 204.

        Image: SERVICES.EXE

     

    PROCESS f9494860  SessionId: 0  Cid: 0158    Peb: bffdf000  ParentCid: 0130

        DirBase: 2681d000  ObjectTable: f9499608  TableSize: 247.

        Image: LSASS.EXE

     

    PROCESS f9476500  SessionId: 0  Cid: 01e8    Peb: 7ffff000  ParentCid: 014c

        DirBase: 272a1000  ObjectTable: f94833a8  TableSize: 136.

        Image: svchost.exe

     

     

    Let’s find out.

    How Shutdown Is Performed

    At first I thought some service was stuck and not shutting down, causing the entire system to stall. I pulled out my copy of Windows Internals and thumbed through the Startup and Shutdown chapter. I found that to initiate a shutdown, Csrss sends a shutdown message to Winlogon. Winlogon then calls ExitWindowsEx, which in turn sends a message to each session’s Csrss. Inside each session, Csrss sends a shutdown message to every thread that owns a window. If a thread does not respond, Csrss waits for the timeout specified by HKCU\Control Panel\Desktop\HungAppTimeout. Once all the threads that own windows have exited, Csrss terminates their processes as well. If Csrss finds a console application, it sends it a shutdown message and waits for the timeout specified by HKCU\Control Panel\Desktop\WaitToKillAppTimeout; if the timeout expires, it displays a message on the desktop.

    If you want Windows to terminate the processes immediately after the timeout period instead, you can set HKCU\Control Panel\Desktop\AutoEndTasks to 1.
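    For quick reference, these knobs all live in the registry. Here is a sketch of how they might look exported to a .reg file. The timeout values shown are commonly cited defaults (stored as strings, in milliseconds) and may differ by Windows version, so treat them as an illustration only; AutoEndTasks is set to 1 as described above:

    ```
    Windows Registry Editor Version 5.00

    [HKEY_CURRENT_USER\Control Panel\Desktop]
    "HungAppTimeout"="5000"
    "WaitToKillAppTimeout"="20000"
    "AutoEndTasks"="1"

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control]
    "WaitToKillServiceTimeout"="20000"
    ```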

    At this point all the processes in the interactive user’s session have been shut down. Csrss in session 0 then sends a message to the Service Control Manager (SCM) to shut down all the services. When a service is registered, it also specifies a wait hint; during shutdown, the SCM uses this hint to decide how long to wait for the service to terminate. Prior to Vista, Csrss waited for the SCM to shut down, but if that did not happen within the timeout specified by HKLM\SYSTEM\CurrentControlSet\Control\WaitToKillServiceTimeout, Csrss would simply kill the SCM and proceed with the shutdown. Since this could leave services unable to complete their shutdown routines, Vista and later versions of Windows implement preshutdown notifications. Services that want these notifications register for them using the SetServiceStatus API.

    Once Csrss has finished its pass notifying system processes that the system is shutting down, Winlogon in session 0 finishes the shutdown process by calling NtShutdownSystem. Note that many system processes, such as Csrss, Smss, Wininit, Services, and Lsass, are still running when the machine actually powers down; they are ready to be shut down, but the processes still exist. NtShutdownSystem calls PoSetSystemPowerState to shut down all the drivers and the rest of the executive subsystems.

    One of the things that PoSetSystemPowerState does is call the I/O manager to send notifications to all the drivers that have requested shutdown notification. Once the drivers receive these notifications, they can clean up and perform any special steps their devices might need. Other subsystems also perform cleanup, and the memory manager clears the page file if that option is turned on. The I/O manager is then called a second time so the file system drivers can be notified that the system is shutting down. The system actually shuts down in the power manager, which tells the BIOS to turn off power to the machine.

    The Memory Dump Analysis

    I started by looking at the Winlogon process in session 0 and found the main thread had called NtShutdownSystem as we expected. I always start with this thread when troubleshooting shutdown problems.

     

    0: kd> !process f94a5d60 17

    PROCESS f94a5d60  SessionId: 0  Cid: 0130    Peb: 7ffff000  ParentCid: 00fc

        DirBase: 2671a000  ObjectTable: f94b18a8  TableSize: 114.

        Image: WINLOGON.EXE

        VadRoot f93eb748 Clone 0 Private 515. Modified 396. Locked 0.

        DeviceMap f95c8448

        Token                             e246cdf0

        ElapsedTime                        0:34:42.0937

        UserTime                          0:00:00.0109

        KernelTime                        0:00:01.0015

        QuotaPoolUsage[PagedPool]         30112

        QuotaPoolUsage[NonPagedPool]      31264

        Working Set Sizes (now,min,max)  (1172, 50, 345) (4688KB, 200KB, 1380KB)

        PeakWorkingSetSize                1866

        VirtualSize                       29 Mb

        PeakVirtualSize                   31 Mb

        PageFaultCount                    3603

        MemoryPriority                    FOREGROUND

        BasePriority                      13

        CommitCharge                      888

     

            THREAD f94a5ae0  Cid 130.114  Teb: 7fffe000  Win32Thread: e002e328 WAIT: (Suspended) KernelMode Non-Alertable

                f93c2010  SynchronizationEvent

                f94a5bc8  NotificationTimer

            Not impersonating

            Owning Process f94a5d60

            Wait Start TickCount    138381        Elapsed Ticks: 842

            Context Switch Count    1914                   LargeStack

            UserTime                  0:00:00.0078

            KernelTime                0:00:00.0875

            Start Address winlogon!_setargv (0x0100ae2c)

            Stack Init f1d00000 Current f1cffa60 Base f1d00000 Limit f1cfb000 Call 0

            Priority 15 BasePriority 15 PriorityDecrement 0 DecrementCount 0

     

            ChildEBP RetAddr  Args to Child

            f1cffa78 dd42d893 dd0655c0 f93c2008 dd0654f0 nt!KiSwapThread+0x1b1

            f1cffaa0 dd490be9 f93c2010 00000005 00000000 nt!KeWaitForSingleObject+0x1a3

            f1cffae0 dd4908c5 f93c2000 00000001 00000001 nt!PopWaitForSystemPowerIrp+0xf2

            f1cffb04 dd4903db 00000000 f1cffbf4 f1cffc78 nt!PopSetDevicesSystemState+0xfb

            f1cffbe0 dd468389 00000005 00000004 c0000004 nt!NtSetSystemPowerState+0x329

            f1cffbe0 dd432197 00000005 00000004 c0000004 nt!_KiSystemService+0xc9

            f1cffc64 dd490122 00000005 00000004 c0000004 nt!ZwSetSystemPowerState+0xb

            f1cffd48 dd494db4 00000005 00000004 c0000004 nt!NtSetSystemPowerState+0x70

            f1cffd58 dd468389 00000001 00000000 00000000 nt!NtShutdownSystem+0x2e

            f1cffd58 77f88e07 00000001 00000000 00000000 nt!_KiSystemService+0xc9

            0006fe90 01014c3d 00000001 00000001 000767b8 ntdll!NtShutdownSystem+0xb

            0006fea8 01014f6c 00000060 0000000b 0000000b winlogon!ShutdownMachine+0x165

            0006ff04 0100e20f 000767b8 0000000b 5ffa0000 winlogon!Logoff+0x216

            0006ff20 01007e8c 000767b8 00000005 000735dc winlogon!MainLoop+0x1fb

            0006ff58 0100af70 00071fc8 00000000 000735dc winlogon!WinMain+0x37a

            0006fff4 00000000 7ffff000 000000c8 00000100 winlogon!WinMainCRTStartup+0x156

     

    Looks like the I/O manager has sent shutdown notifications to all the devices, and this Winlogon thread is waiting for them to complete. Dumping out the IRP, I found it had been completed, but for some reason the completion routine had never been called. That is why we are waiting forever. Digging further, it looked like someone had corrupted the current I/O stack location in the IRP, which resulted in the IRP completion routine never being called.

    Let’s take a look at the IRP.

    Based on the thread stack you can see a system power IRP was sent down to the devices from Winlogon’s thread, and the thread is waiting for the action to complete. Since there is no obvious way to get a pointer to the IRP from the stack, we’ll need to do some digging. We see that nt!PopWaitForSystemPowerIrp is waiting for some type of dispatcher object to be signaled. According to MSDN, the first parameter passed to KeWaitForSingleObject() is the object. Here I’m dumping the raw memory for that Object parameter.

    0: kd> dc f93c2010

    f93c2010  00040001 00000000 f94a5b4c f94a5b4c  ........L[J.L[J.

    f93c2020  00000000 f94a5ae0 00000000 00000063  .....ZJ.....c...

    f93c2030  00000028 00000005 dd47bd20 f93c203c  (....... .G.< <.

    f93c2040  f93c203c f93c2044 f93c2044 f938fea8  < <.D <.D <...8.

    f93c2050  00000000 00000000 01010000 00000100  ................

    f93c2060  f946c860 f946c860 f93c23ac 00000000  `.F.`.F..#<.....

    f93c2070  00000000 f93c2360 f93c2360 f93c207c  ....`#<.`#<.| <.

    f93c2080  f93c207c f93c2084 f93c2084 f93c208c  | <.. <.. <.. <.

     

    0: kd> !pool f93c2010 2

    *f93c2000 size:  420 previous size:    0  (Allocated) *PDss

     

    You may ask yourself, “are any of these values in the PDss pool pointers to IRP pool blocks?”  If you wanted to know that, you could iterate over each one of them passing them to !pool or !address.  That sounds tedious to do manually.  But thankfully, the debugger has some nice command tokens such as .foreach that make this a breeze.  If you want to pass each of the above values to !pool, you can perform the following in the debugger:

    0: kd>  .foreach /pS 1 /ps 1 ( value { dp /c 1 f93c2000 L 420/@$ptrsize } ) { .if( value != 0 ) {.printf "**** %p ****\n", ${value}; !pool ${value} 0x2} }

     

    **** 21018000 ****

    21018000: Unable to get contents of pool block

    **** 73734450 ****

    73734000: Unable to get contents of pool block

    **** 00000002 ****

    00000000: Unable to get contents of pool block

    **** 00000006 ****

    00000000: Unable to get contents of pool block

    .

    . (I truncated this output for the blog. There were many more entries)

    .

    **** f946c860 ****

    **** f946c860 ****

    *f946c800 size:  1e0 previous size:   40  (Allocated) *Irp

    **** f946c860 ****

    *f946c800 size:  1e0 previous size:   40  (Allocated) *Irp

    **** f93c23ac ****

     

    **** f93c207c ****

    *f93c2000 size:  420 previous size:    0  (Allocated) *PDss

     

    Bingo! There is a pointer to an IRP block in our PDss pool. Let’s take a look at f946c800. I’m dumping its pool.

     

            0: kd> !pool f946c800

     f946c000 size:   80 previous size:    0  (Allocated)  MmCa

     f946c080 size:   40 previous size:   80  (Free)       ....

     f946c0c0 size:   40 previous size:   40  (Allocated)  Ntfn

     f946c100 size:   a0 previous size:   40  (Allocated)  File (Protected)

     f946c1a0 size:   40 previous size:   a0  (Allocated)  Ntfr.

    .

    .

    .

    *f946c800 size:  1e0 previous size:   40  (Allocated) *Irp

     

    Here you see the IRP tag in this raw memory output. The first two DWORDs contain the POOL_HEADER.

     

    0: kd> dc f946c800

    f946c800  0f018002 20707249 01d80006 00000000  ....Irp ........

    f946c810  00000000 00000000 f946c818 f946c818  ..........F...F.

    f946c820  c00000bb 00000000 0b0a0000 04000000  ................

    f946c830  00000000 00000000 00000000 00000000  ................

    f946c840  00000000 00000000 00000000 00000000  ................

    f946c850  00000000 00000000 00000000 00000000  ................

    f946c860  f93c2060 f93c2060 f946c9e0 00000000  ` <.` <...F.....

    f946c870  00000000 00000000 00000000 00000000  ................

     

    Since the pool header is 8 bytes long we need to add 8 to f946c800 to skip over the header and get to the actual IRP.

     

    0: kd> ? f946c800 + 8

    Evaluate expression: -112801784 = f946c808
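    As a quick sanity check, the 4-byte tag sits in the second DWORD of the 32-bit POOL_HEADER, stored as ASCII in little-endian byte order, and the body of the allocation follows the 8-byte header. A sketch of that decoding (Python is used here purely for illustration):

    ```python
    import struct

    def pool_tag(dword: int) -> str:
        # The tag DWORD holds four ASCII characters, least-significant byte first.
        return struct.pack("<I", dword).decode("ascii")

    # Second DWORD of the pool block dumped above.
    print(pool_tag(0x20707249))   # "Irp " (a trailing space pads the tag)

    # Skip the 8-byte POOL_HEADER to reach the IRP itself.
    irp = 0xF946C800 + 8
    print(hex(irp))               # 0xf946c808
    ```
    
    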

     

    0: kd> !irp f946c808

    Irp is active with 10 stacks 11 is current (= 0xf946c9e0)

     No Mdl Thread 00000000:  Irp is completed. 

         cmd  flg cl Device   File     Completion-Context

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                        Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                        Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                        Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                        Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                        Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                        Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                        Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                        Args: 00000000 00000000 00000000 00000000

     [  0, 0]   0  0 00000000 00000000 00000000-00000000   

     

                        Args: 00000000 00000000 00000000 00000000

     [ 16, 2]   0 e1 f9533ba0 00000000 dd491028-f93c2354 Success Error Cancel pending

                  \Driver\Diskperf    nt!PopCompleteSystemPowerIrp

                        Args: 00000000 00000000 00000006 00000005

     

    The major function is 16 (hex for IRP_MJ_POWER, 0x16), which syncs up with the power management activity we saw in the stack. You can find all the major function codes for IRPs in the standard header files, such as WDM.H or NTDDK.H, available with the WDK. There is an excellent chance this is the power IRP sent down from Winlogon’s thread. Now that we have the IRP in question, looking at its I/O stack locations we can see that the pending status is set.

     

                         Args: 00000000 00000000 00000000 00000000

     [ 16, 2]   0 e1 f9533ba0 00000000 dd491028-f93c2354 Success Error Cancel pending

                  \Driver\Diskperf    nt!PopCompleteSystemPowerIrp

                         Args: 00000000 00000000 00000006 00000005
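    For reference, the [ 16, 2] pair in the !irp output is the major and minor function code, displayed in hex. A small lookup sketch (the values come from the public WDM headers; only a handful of codes are shown):

    ```python
    # Subset of IRP major/minor function codes from WDM.H (hex values).
    IRP_MJ = {0x00: "IRP_MJ_CREATE", 0x03: "IRP_MJ_READ",
              0x04: "IRP_MJ_WRITE", 0x10: "IRP_MJ_SHUTDOWN",
              0x16: "IRP_MJ_POWER", 0x1B: "IRP_MJ_PNP"}
    IRP_MN_POWER = {0x00: "IRP_MN_WAIT_WAKE", 0x02: "IRP_MN_SET_POWER",
                    0x03: "IRP_MN_QUERY_POWER"}

    major, minor = 0x16, 0x02   # the [ 16, 2] pair from the !irp output
    print(IRP_MJ[major], IRP_MN_POWER[minor])
    # IRP_MJ_POWER IRP_MN_SET_POWER: a set-power request, as expected at shutdown
    ```
    
    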

     

     

    It appears to be stuck in \Driver\Diskperf, which is the device stack for the device associated with Diskperf. At this point I determined it was most likely due to one of the devices in its devstack.

    0: kd> !devstack f9533ba0

      !DevObj   !DrvObj            !DevExt   ObjectName

      f94d5660  \FileSystem\Foo      f94d5718 

      f95338a0  \Driver\VSP          f9533958 

    > f9533ba0  \Driver\Diskperf     f9533c58 

      f9539150  \Driver\Ftdisk       f9539208  HarddiskVolume1

    !DevNode f953aba8 :

      DeviceInst is "STORAGE\Volume\1&30a96598&0&Signature80Offset7E00Length2730C00"

      ServiceName is "DiskPerf"

     

    Doing a Bing search for the drivers in the list above (and shutdown), I found a file system filter driver known to have caused this issue in the past. Updating to the latest version of the driver fixed the issue, and the machine would shut down normally.

    BTW - For more information on how Windows performs a shutdown check out this past article from Ntdebugging - http://blogs.msdn.com/ntdebugging/archive/2007/06/09/how-windows-shuts-down.aspx

     


     


    How to Track Leaky Pool


    Today I want to talk about tracking down leaking pool. Back in Server 2003 and before, leaking pool was a major issue because pool was a limited resource. In Vista and beyond it isn’t as much of an issue, since pool is sized dynamically, but a leak can still cause system performance problems if a component uses too much of it. Because all nonpaged pool allocations are locked in RAM, a leak leaves the system with less RAM for everything else.

    I recently worked a case involving leaking nonpaged pool that was easy to troubleshoot, so I thought I would walk through the steps I used to narrow down the issue. Let's get started. Note: the following can be done on a live machine or with a kernel memory dump.

    Running the !vm command, you can get statistics about the virtual memory usage on the system.

      

           Virtual Memory Usage

     

     

           Physical Memory:   655219   ( 2620876 Kb)

     

           Page File: \??\C:\pagefile.sys

              Current:   4190208Kb Free Space:   4115148Kb

              Minimum:   4190208Kb Maximum:      4190208Kb

     

           Available Pages:   303760   ( 1215040 Kb)

           ResAvail Pages:    571159   ( 2284636 Kb)

           Modified Pages:       677   (    2708 Kb)

           NonPagedPool Usage: 59613   (  238452 Kb)

           NonPagedPool Max:   69377   (  277508 Kb)

                                                              NonPaged Pool Usage :  85%

           ********** Excessive NonPaged Pool Usage *****

           PagedPool 0 Usage:  31684   (  126736 Kb)

           PagedPool 1 Usage:   1824   (    7296 Kb)

           PagedPool 2 Usage:   1821   (    7284 Kb)

           PagedPool 3 Usage:   1824   (    7296 Kb)

           PagedPool 4 Usage:   1827   (    7308 Kb)

           PagedPool Usage:    38980   (  155920 Kb)

           PagedPool Maximum:  86016   (  344064 Kb)

                                                              Paged Pool Usage :  45%

     

    Hmm… looks like something is using a lot of nonpaged pool. To determine what is consuming the most nonpaged pool, run the !poolused command.

    0: kd> !poolused 2

       Sorting by NonPaged Pool Consumed

     

      Pool Used:

                NonPaged            Paged

     Tag    Allocs     Used    Allocs     Used

     Thre   326514 208968960         0       0

     MmCm      647  2830624         0        0

     LSwi        1  2293760         0        0

     Npfs     1681  1276672      1202   208320

     File     5863   945472         0        0

     Dmio      131   525760         0        0

     Irp      1158   508800         0        0

     

    Adding a value of 2 to the end of the command instructs the debugger to sort by nonpaged pool used. For more information on the command, please review the debugger help file.
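    A quick sanity check on the Thre numbers above: dividing the bytes used by the allocation count gives the per-allocation size. Sketch:

    ```python
    # "Thre" row from the !poolused output above.
    thre_allocs = 326514
    thre_bytes = 208968960

    avg = thre_bytes // thre_allocs
    print(avg, hex(avg))   # 640 0x280 -- each Thre block is 0x280 bytes
    ```

    That 0x280 figure matches the size column of the individual Thre pool blocks dumped later in this post, which is a good sign the usage really is one fixed-size object repeated a few hundred thousand times.
    
    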

    According to the output, the “Thre” tag is the highest user of nonpaged pool. To get a rough idea of which modules might be allocating pool with this tag, we can search all the modules to see if they contain the string; this can help narrow down the problem to a few drivers. To search all the modules, just use the !for_each_module command.

          !for_each_module s -a @#Base @#End "Thre"
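    Under the covers this is just a byte-string search across each loaded module's image range. The idea can be sketched as follows (illustration only; a hit still needs to be confirmed as a real pool-allocation call site):

    ```python
    def find_tag(image: bytes, tag: bytes = b"Thre") -> list[int]:
        """Return every offset at which the 4-byte pool tag literal appears."""
        offsets, i = [], 0
        while (i := image.find(tag, i)) != -1:
            offsets.append(i)
            i += 1
        return offsets

    # Hypothetical module bytes with two embedded "Thre" literals.
    blob = b"\x90\x90Thre\x90\x90\x90Thre\x90"
    print(find_tag(blob))   # [2, 9]
    ```
    
    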

     

    Look at David Butler’s blog post if you want to narrow it down even further by examining each driver’s import table to see if it imports the ExAllocatePool or ExAllocatePoolWithTag functions. In this case, though, I already knew the “Thre” tag is used by the kernel to allocate thread objects. Could the kernel be leaking thread objects? Most likely not; otherwise we would have seen a lot of customers reporting this issue. I wasn’t ruling it out, but to start off I focused on the third-party software installed on the machine.

    Let’s dump out a few sample pool allocations to see exactly what they are. I was sure there were going to be a ton of them, so I just hit the Break button once the output started flying by.

    0: kd> !poolfind Thre 0

    Scanning large pool allocation table for Tag: Thre

    Searching NonPaged pool (81093000 : 89093000) for Tag: Thre

     

    81098000 size:  280 previous size:    0  (Allocated) Thre (Protected)

    81098380 size:  280 previous size:  100  (Allocated) Thre (Protected)

    81098600 size:  280 previous size:  280  (Allocated) Thre (Protected)

    81098880 size:  280 previous size:  280  (Allocated) Thre (Protected)

    81098b00 size:  280 previous size:  280  (Allocated) Thre (Protected)

    81098d80 size:  280 previous size:  280  (Allocated) Thre (Protected)

    81099000 size:  280 previous size:    0  (Allocated) Thre (Protected)

    810992c0 size:  280 previous size:   40  (Allocated) Thre (Protected)

    810996e0 size:  280 previous size:  1a0  (Allocated) Thre (Protected)

    81099960 size:  280 previous size:  280  (Allocated) Thre (Protected)

    81099d80 size:  280 previous size:  1a0  (Allocated) Thre (Protected)

    8109a000 size:  280 previous size:    0  (Allocated) Thre (Protected)

    8109a380 size:  280 previous size:  100  (Allocated) Thre (Protected)

    8109a600 size:  280 previous size:  280  (Allocated) Thre (Protected)

    8109a880 size:  280 previous size:  280  (Allocated) Thre (Protected)

    .

    .

    .

     

    It would be interesting to see the thread stacks for the KTHREAD structures in these pool blocks. To dump out a thread using the !thread command, I needed to figure out where the KTHREAD structure starts within the pool block. To find the offset, I dumped out all the threads in the System process, picked a random thread, and then calculated the thread’s offset from the start of its pool block. You can use any thread in the entire system because the offset will always be the same.

    0: kd> !thread  8904a180

    THREAD 8904a180  Cid 8.18  Teb: 00000000  Win32Thread: 00000000 WAIT: (WrEventPairLow) KernelMode Non-Alertable

        80478d40  Unknown

    Not impersonating

    Owning Process 890754a0

    Wait Start TickCount    3831270       Elapsed Ticks: 209711

    Context Switch Count    28238              

    UserTime                  0:00:00.0000

    KernelTime                0:00:00.0875

    Start Address nt!ExpWorkerThread (0x80417a9a)

    Stack Init f6444000 Current f6443d34 Base f6444000 Limit f6441000 Call 0

    Priority 13 BasePriority 13 PriorityDecrement 0 DecrementCount 0

    ChildEBP RetAddr  Args to Child

    f6443d4c 8042ef5f bfec0068 80478d40 8904a180 nt!KiSwapThread+0x1b1

    f6443d70 80417b0d 00000000 80442f00 00000000 nt!KeRemoveQueue+0x197

    f6443da8 804578c2 00000000 00000000 00000000 nt!ExpWorkerThread+0x73

    f6443ddc 8046c966 80417a9a 00000000 00000000 nt!PspSystemThreadStartup+0x54

    00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

     

    0: kd> !pool 8904a180 

     8904a080 size:   e0 previous size:   20  (Allocated)  Dire (Protected)

    *8904a160 size:  280 previous size:   e0  (Allocated) *Thre (Protected)

     8904a3e0 size:  280 previous size:  280  (Allocated)  Thre (Protected)

     

    0: kd> ?8904a180-8904a160

    Evaluate expression: 32 = 00000020

     

    So based on my calculations, the KTHREAD structure starts 32 bytes (0x20) into the Thre pool block.
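    That offset arithmetic, and how it converts a !poolfind block address into a !thread argument, can be sketched as:

    ```python
    # Thread address minus the start of its pool block, from the output above.
    KTHREAD_OFFSET = 0x8904A180 - 0x8904A160
    print(KTHREAD_OFFSET)                      # 32 (0x20)

    def thread_from_pool(pool_block: int) -> int:
        # Skip past the pool header and object header to reach the KTHREAD.
        return pool_block + KTHREAD_OFFSET

    # One of the blocks from the !poolfind output above.
    print(hex(thread_from_pool(0x81098600)))   # 0x81098620
    ```
    
    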

    Now I can use this offset to dump thread stacks from the pool blocks returned by the !poolfind command. Some of the threads are going to be valid, but most of them are going to be doing something fishy; I basically needed a large sample to figure out what was causing the Thre tag to leak. After a few tries, I was able to find a lot of threads that looked like this:

    kd> !Thread 81098620 

    THREAD 81098620  Cid 8e8.a391c  Teb: 00000000  Win32Thread: 00000000 TERMINATED

    Not impersonating

    Owning Process 886f5020

    Wait Start TickCount    1949088       Elapsed Ticks: 2091893

    Context Switch Count    2              

    UserTime                  0:00:00.0000

    KernelTime                0:00:00.0000

    Start Address 0x7c57b740

    Win32 Start Address 0x20003474

    Stack Init 0 Current bca4fc44 Base bca50000 Limit bca4d000 Call 0

    Priority 16 BasePriority 8 PriorityDecrement 0 DecrementCount 0

     

     

    kd> !PROCESS 886f5020 0

    PROCESS 886f5020  SessionId: 0  Cid: 08e8    Peb: 7ffdf000  ParentCid: 016c

        DirBase: 694bc000  ObjectTable: 88b3ec68  TableSize: 391189.

        Image: xxxxxx.EXE

     

    Looks like the thread is terminated. But why is it still hanging around?

    0: kd> !object 81098620

    Object: 81098620  Type: (89075900) Thread

        ObjectHeader: 81098608

        HandleCount: 1  PointerCount: 1

     

    Aha! Since the handle count and pointer count are not 0, the system has not freed this Thre allocation. Most likely something has an open handle to this object. Dumping out all the processes using the !process command, I found a particular process with 388,386 open handles. This is the culprit. Interestingly enough, it is the same process listed in the thread’s owning-process field.

           1  Token

           1  Desktop

           1  Section

           2  WindowStation

           3  Directory

           4  IoCompletion

           5  Mutant

           7  Port

           9  Semaphore

          15  File

          73  Event

       65110  Key

      323155  Thread

     

      388386  Handles in process
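    The per-type counts do add up to the reported total, and a quick tally makes it obvious where the bulk sits:

    ```python
    # Handle counts by object type, from the output above.
    handles = {"Token": 1, "Desktop": 1, "Section": 1, "WindowStation": 2,
               "Directory": 3, "IoCompletion": 4, "Mutant": 5, "Port": 7,
               "Semaphore": 9, "File": 15, "Event": 73,
               "Key": 65110, "Thread": 323155}

    total = sum(handles.values())
    print(total)                                # 388386
    print(round(handles["Thread"] / total, 2))  # 0.83 -- thread handles dominate
    ```
    
    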

     

    Running !handle 0 2 886f5020 and analyzing the output, I found a ton of thread handles. Most likely the process is leaking thread handles and preventing the system from cleaning up the thread allocations. This is resulting in high usage of nonpaged pool, which is causing the performance issues seen by the customer.

    We contacted the software vendor responsible for the process, and it turned out to be a known bug in their software. An update was provided to the customer, and it fixed the issue immediately.

     



    Part 1: Got Stack? No. We ran out of Kernel Mode Stack and kv won’t tell me why!


    My name is Ryan Mangipano (ryanman) and I am a Sr. Support Escalation Engineer at Microsoft. This two-part blog will consist of a complete walkthrough of a bugcheck that occurred due to an overflowed stack condition. What is unique about this situation is that the stack backtrace wasn’t being displayed. As we proceed with the walkthrough of the dump analysis, I will provide demonstrations and background information relating to task states and double faults.

     

    I began my review with the command !analyze -v

     

    !analyze -v

    UNEXPECTED_KERNEL_MODE_TRAP (7f)

    Arg1: 00000008, EXCEPTION_DOUBLE_FAULT

     

    You can see from the output above that an unexpected kernel mode trap has taken place. Arg1 reported by !analyze -v indicates that the type of trap was a double fault. A double fault indicates that an exception occurred during the call to the handler for a prior exception. Although a double fault can have other causes (hardware problems or a corrupt stack pointer value), we most commonly observe this bugcheck when drivers executing on the system have exhausted all of the available 12k of kernel-mode stack space.

     

    Threads on a 32-bit system are given 12k of kernel-mode stack space, though 16k of kernel virtual address space is actually consumed because an additional 4k of virtual address space is occupied by an invalid PTE. This guard PTE covers the virtual address range just below the kernel stack limit (the stack grows toward lower memory addresses), and the 4k guard page is placed there to catch stack overflows. The 12k stack size is not configurable because it is hard-coded into the kernel. For more information, please refer to Windows Internals, Fifth Edition, page 786, “Kernel Stacks”.
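    Using hypothetical addresses, the layout described above can be sketched as follows (the stack grows downward, so the guard page sits immediately below the limit):

    ```python
    PAGE = 0x1000                        # 4k page size

    stack_base = 0xF6444000              # hypothetical base (highest address)
    stack_limit = stack_base - 3 * PAGE  # 12k of usable kernel-mode stack
    guard_page = stack_limit - PAGE      # invalid PTE that catches overflows

    print(hex(stack_limit), hex(guard_page))     # 0xf6441000 0xf6440000
    assert stack_base - stack_limit == 12 * 1024
    assert stack_base - guard_page == 16 * 1024  # 16k of VA space consumed total
    ```
    
    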

     

    If the 12k of kernel stack space is all used up and a driver attempts to use stack space beyond the valid range, a page fault exception will occur as the invalid virtual addresses covered by the guard PTE are referenced.

     

    When this page fault exception occurs, the CPU will automatically attempt to push some data onto the stack before transferring control to the page fault handler (thank you to one of our readers for correcting this information). More details on what data is pushed onto the stack are available in the Intel Processor Family Developer’s Manual, Vol. 3, Chapter 14 (Protected-Mode Exceptions and Interrupts). However, when the CPU tries to push this data, another fault occurs because the stack pointer still holds an invalid address. This causes a double fault (EXCEPTION_DOUBLE_FAULT).

     

    So how can the OS handle this situation in order to write out the dump file? The code associated with trap 0x8 (EXCEPTION_DOUBLE_FAULT) performs a task state segment switch and obtains a new, valid stack pointer. Task state segment switching is a CPU-provided mechanism that allows us to switch to a new task state and store a link to the previously executing task state. The information needed to restore a task is stored in a task-state segment (TSS). The debugger command .tss can later be used to switch back to the previous task state and examine the context at the time of failure. More information regarding the task-state segment (TSS) is available in the Intel Processor Manual Set (Volume 3, Chapter 6).

     

     

    In addition to the bugcheck data listed above, the output from !analyze -v also provided me with the .tss command that I needed to type into the debugger:

     

    TSS:  00000028 -- (.tss 0x28)

     

    You can type .tss 0x28 in the command window, but I simply clicked the DML (Debugger Markup Language) hyperlink, which entered the .tss command for me. As discussed above, this command accepts the address of the saved task state segment (TSS) information for the current processor and sets the appropriate context, just like the .trap or .cxr commands.

     

    The processor provides a Task Register which contains a 16-bit segment selector. The register is actually larger, but the extra portion caches the segment descriptor and is visible only to the processor. WinDbg’s r command can be used to dump out the usable portion of this register.

     

    3: kd> rtr

    tr=00000050

     

    So the task register was pointing to a different task (.tss 0x50) at the time of the second exception. But where did !analyze -v get this .tss 0x28 value from?

     

    Let’s do some digging. You can get the address of the TSS for the current processor by using the !pcr command.

     

    3: kd> !pcr

    KPCR for Processor 3 at f7737000:

    .
    .

    (omitted several fields for this blog)

     

                  TSS: f773a2e0

     

    Extensions like !pcr are great, but I also like to understand how the values were obtained. So instead of just taking the value from !pcr, how else can we find it?

    The fs register points to the memory segment in which the _KPCR for the current processor is stored. The structure sits at the base of that segment, offset 0x0.

     

    3: kd> rfs

    Last set context:

         fs=00000030       <-- points to the segment at whose base the nt!_KPCR is stored.

     

    Let’s see where the _KTSS pointer is stored within the KPCR structure.

     

    3: kd> dt nt!*PCR*

              ntkrpamp!_KPCR

     

    3: kd> dt ntkrpamp!_KPCR TSS

       +0x040 TSS : Ptr32 _KTSS  <-- 0x40 is the offset at which the pointer to the TSS is stored.

     Let’s use those two values to dump this out. The 0030: prefix represents the memory segment. Note that I have added 0x40 to the base and dumped out that location:

     

    3: kd> dd 0030:00000040 L1   

    0030:00000040  f773a2e0    <-- pointer to the nt!_KTSS

     

    3: kd> dt nt!_KTSS f773a2e0 Backlink

     +0x000 Backlink : 0x28    <-- and here is our pointer to the previous task state.

     

    This is why !analyze –v directed us to type in .tss 0x28.

     

    But where did !pcr get the address of the KPCR itself? !pcr lists f7737000. We can find that out as well.

     

    3: kd> dt ntkrpamp!_KPCR SelfPcr

       +0x01c SelfPcr : Ptr32 _KPCR   <-- so the pointer is stored at offset 0x1c

    This command demonstrates the use of fs: instead of 0030: (note that I then provided the offset of 0x1c to get the pointer):

     

    3: kd> dd fs:0x1c L1                 

    0030:0000001c  f7737000         <-- there it is; we found it.

     

    To demonstrate that both addresses reference the same data, let’s dump it out using the size given below.

     

    3: kd> dd f7737000 L0x54         

    f7737000  b8ae60dc 00000000 00000000 f7737fe0

    f7737010  19d5c42c 00000008 7ff9c000 f7737000

    f7737020  f7737120 0000001f 00000000 00000000

    f7737030  ffffffff 00000000 f773d800 f773d400

    f7737040  f773a2e0 00010001 00000008 00000e56

    f7737050  08000300 00000000 00000000 00000000

    f7737060  00000000 00000000 00000000 00000000

    f7737070  00000000 00000000 00000000 00000000

    f7737080  00000000 00000000 00000000 00000000

    f7737090  00100000 00000003 09f15190 00000000

    f77370a0  09f15190 dabc6620 00000000 334e730f

    f77370b0  00000000 00000000 00000000 00000000

    f77370c0  00000000 00000000 00000000 00000000

    f77370d0  00000000 00000000 00000000 00000000

    f77370e0  00000000 00000000 00000000 00000000

    f77370f0  00000000 00000000 00000000 00000000

    f7737100  00000000 00000000 00000000 00000000

    f7737110  00000000 00000000 00000000 00000000

    f7737120  00010001 87d68438 00000000 f773a090

    f7737130  00000003 00000008 0401010f 00000000

    f7737140  00000000 00000000 00000000 00000000

     

    3: kd> dd fs:0 L0x54

    0030:00000000  b8ae60dc 00000000 00000000 f7737fe0

    0030:00000010  19d5c42c 00000008 7ff9c000 f7737000

    0030:00000020  f7737120 0000001f 00000000 00000000

    0030:00000030  ffffffff 00000000 f773d800 f773d400

    0030:00000040  f773a2e0 00010001 00000008 00000e56

    0030:00000050  08000300 00000000 00000000 00000000

    0030:00000060  00000000 00000000 00000000 00000000

    0030:00000070  00000000 00000000 00000000 00000000

    0030:00000080  00000000 00000000 00000000 00000000

    0030:00000090  00100000 00000003 09f15190 00000000

    0030:000000a0  09f15190 dabc6620 00000000 334e730f

    0030:000000b0  00000000 00000000 00000000 00000000

    0030:000000c0  00000000 00000000 00000000 00000000

    0030:000000d0  00000000 00000000 00000000 00000000

    0030:000000e0  00000000 00000000 00000000 00000000

    0030:000000f0  00000000 00000000 00000000 00000000

    0030:00000100  00000000 00000000 00000000 00000000

    0030:00000110  00000000 00000000 00000000 00000000

    0030:00000120  00010001 87d68438 00000000 f773a090

    0030:00000130  00000003 00000008 0401010f 00000000

    0030:00000140  00000000 00000000 00000000 00000000

     

     

    Now that you have an idea of what a task state is, let’s examine the stack output of the two states. First, we shall use .tss 0x50 to examine the stack backtrace associated with that state, following the .tss command with kC to dump the stack. Notice that we have used the semicolon separator to enter multiple commands on one line.

     

    3: kd> .tss 0x50;kC

    eax=00000000 ebx=00000000 ecx=00000000 edx=00000000 esi=00000000 edi=00000000

    eip=8088b702 esp=f773d3c0 ebp=00000000 iopl=0         nv up di pl nz na po nc

    cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00000000

    nt!_KiTrap08:

    8088b702 fa              cli

      *** Stack trace for last set context - .thread/.cxr resets it

     

    nt!_KiTrap08  

    nt!_KiTrap0E       

     

    The stack backtrace shows two trap handlers and nothing else. According to this stack output, we were first attempting to handle a trap 0x0E, which is a page fault. The page fault handler was invoked in an attempt to handle the invalid address that we accessed in the guard page when we overflowed the stack.  You can see that after the page fault, we encountered another exception, represented by KiTrap08. This is the EXCEPTION_DOUBLE_FAULT, indicating that the page fault handler itself encountered an exception. It matches what is listed as arg1 in the bugcheck data that !analyze –v output. So, the stack backtrace for .tss 0x50 shows that we were executing the task referenced by 0x28 for the trap 0x0E page fault when a task state switch occurred, and we switched to .tss 0x50 to handle the trap 0x08 double fault. 

     

    Next, we will use the command .tss 0x28 and dump the stack backtrace associated with that task state:

     

    3: kd> .tss 0x28;kC

    eax=b8ae0023 ebx=b8ae60ec ecx=87d68438 edx=87758bd8 esi=b8ae6068 edi=808813d8

    eip=8088c718 esp=b8ae5fe4 ebp=b8ae5fe4 iopl=0         nv up di pl zr na pe nc

    cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010046

    nt!_KiTrap0E+0x5c:

    8088c718 89550c          mov     dword ptr [ebp+0Ch],edx ss:0010:b8ae5ff0=????????

      *** Stack trace for last set context - .thread/.cxr resets it

     nt!_KiTrap0E       

     

    You can see in the output above that the stack backtrace displayed only KiTrap0E; we should see multiple stack frames listed. I’m a bit concerned that I do not see a valid stack backtrace in the output above. Nevertheless, let's proceed with our examination.

     

     

    Now that we have set the proper task state using .tss 0x28, which loaded the registers with the appropriate context, our next step is to determine where the stack-related registers are pointing and how they relate to the 12k range of valid stack addresses for the current thread. This will help us validate that we did in fact overflow the stack. The easiest way to examine the stack range this thread was given is to use !thread:

     

    !thread   

    Owning Process            874c6800       Image:         StackHog.exe

    ...

    Base b8ae9000 Limit b8ae6000

     

     

    3: kd> resp;rebp

    Last set context:

    esp=b8ae5fe4   <-- Notice that this is outside of the Base and Limit range listed above.

    Last set context:

    ebp=b8ae5fe4

     

    Since the stack grows toward lower addresses, an overflow of the b8ae6000 limit results in a value below the limit. You can see that esp has fallen out of the valid range of stack space:

     

    3: kd> dd b8ae5fe4 L1

    b8ae5fe4  ????????

     

    Let’s look at this memory range in more detail. The invalid ranges displayed as ???????? represent the guard page. The range of valid stack addresses starts (or ends, depending on how you look at it) at b8ae6000:

     

    3: kd> dd b8ae5fe0 L10

    b8ae5fe0  ???????? ???????? ???????? ????????

    b8ae5ff0  ???????? ???????? ???????? ????????

    b8ae6000  00000000 00000000 00000000 00000000

    b8ae6010  00000000 b8ae0000 b8ae0023 00000023

     

    Also, note that we are running in trap handler 0x0E.  This is the page fault handler on x86 (refer to your Intel Processor Manuals for more details).

     

    3: kd> u . L1

    nt!_KiTrap0E+0x5c

    8088c718 89550c          mov     dword ptr [ebp+0Ch],edx

     

    The address we were attempting to access is recorded in cr2. Let’s dump it out:

     

    3: kd> rcr2

    Last set context:

    cr2=b8ae5fe0     <-- This address is just beyond the stack limit for this thread.

     

    What is the present instruction in the trap handler doing?

     

    3: kd> u . L1

    nt!_KiTrap0E+0x5c

    8088c718 89550c          mov     dword ptr [ebp+0Ch],edx

     

    OK, so we're dereferencing ebp plus an offset of 0x0C. What does that add up to?

     

    3: kd> ? ebp+0x0c

    Evaluate expression: -1196531728 = b8ae5ff0

     

    3: kd> dd b8ae5ff0 L1

    b8ae5ff0  ????????

     

    Once the stack overflowed, we can see there were many attempts to access addresses outside the valid stack range. This led us to the 0x7F bugcheck with the double-fault parameter.

     

    When the system bugchecks because the entire 12k range of a thread’s kernel-mode stack space has filled up, there can be a few causes. Drivers on the stack may have made very large allocations on the stack instead of using other methods of obtaining memory, such as calling ExAllocatePoolWithTag(). Sometimes this is done because it is quicker to use the stack than to allocate and free memory from the operating system pools. Other times a driver makes calls in a manner that causes too many further calls to be made, filling up the stack; nested functions that never reach an exit condition can call themselves until the stack is exhausted. Often a system has software from many different vendors that all install stack-heavy drivers into the I/O path. Each driver uses a portion of stack space, and with many drivers installed it adds up quickly.  For example, if a system has too many file system filter drivers installed in the file system stack and each uses more than the minimum amount of stack space possible, it’s not uncommon for all of them together to cause a stack overflow.

     

    Sometimes when dealing with this error, we need to realize that there may not be any one product to blame. A stop 0x7F sometimes isn’t about identifying a single faulting component, as it often is in other areas of troubleshooting. It is more about understanding that stack space isn’t an unlimited resource and developing a clear picture of what led up to the stack space filling up. Sometimes this results in the need to engage multiple vendors for assistance when a combination of drivers on the stack are all using large amounts of stack space.  Sometimes vendors will provide newer, updated drivers that have been optimized to use less stack space. Other times, we simply have too much I/O-related software installed, and the only answer is to remove some of the drivers by uninstalling the product.

     

    NTFS and some third-party file system filter drivers employ a technique to avoid a stack overflow: they probe the stack by calling IoGetRemainingStackSize(), and if there is not enough stack space left, they offload the remainder of the work to a dedicated kernel thread created just for that purpose. On Vista or later (or Windows Server 2003 x64), developers can call KeExpandKernelStackAndCallout, which allows chaining to another 16k stack.  For more information, see http://msdn.microsoft.com/en-us/library/aa906738.aspx.

     

    The easiest way to figure out why we have overflowed the stack is to dump it out and examine the stack backtrace. This is typically the first, and sometimes the only, step necessary when reviewing an EXCEPTION_DOUBLE_FAULT memory dump.  We will now dump out the stack and examine the stack usage of the different drivers and the calls they made to determine whether further investigation is needed.  I will use the L200 option; otherwise the debugger will only display the default number of frames, which won’t show the entire stack. It doesn’t make much sense to review only the top of the stack when the entire stack is full.  I dumped the stack and got only one stack frame:

     

    3: kd> kfL200

      Memory  ChildEBP RetAddr 

              b8ae5fe4 00000000 nt!_KiTrap0E+0x5c

     

    This is not what I was hoping to see; we don’t have a stack. Let me try kv to see if there is a trap frame:

     

    3: kd> kvL200

    ChildEBP RetAddr  Args to Child             

    b8ae5fe4 00000000 00000000 00000000 00000000 nt!_KiTrap0E+0x5c (FPO: [0,0] TrapFrame-EDITED @ 00000000)

     

    So, I don’t see a valid trap frame either.  I went back to my !analyze –v output and verified that it had also displayed only this one frame. Without the stack, how will we see what filled it up so we can provide recommendations to the customer or an analysis of what happened? In part two of this blog, we will review how to manually reconstruct the stack and pass values into the kf command in order to get a useful stack backtrace to display.

     

     

