How to Track Leaky Pool


Today I want to talk about tracking down leaking pool. Back in Server 2003 and earlier, leaking pool was a major issue because pool was a limited resource. In Vista and later it isn’t as much of an issue since pool is sized dynamically, but a component that uses too much of it can still cause system performance problems. Because every nonpaged pool allocation is locked into RAM, a leak leaves the system with less RAM for everything else.

I recently worked a case involving leaking nonpaged pool that was straightforward to troubleshoot, so I thought I would walk through the steps I used to narrow down the issue. Let's get started. Note: the following can be done against a live machine or against a kernel memory dump.

Running the !vm command gives you statistics about virtual memory usage on the system.

  

       Virtual Memory Usage

 

 

       Physical Memory:   655219   ( 2620876 Kb)

 

       Page File: \??\C:\pagefile.sys

          Current:   4190208Kb Free Space:   4115148Kb

          Minimum:   4190208Kb Maximum:      4190208Kb

 

       Available Pages:   303760   ( 1215040 Kb)

       ResAvail Pages:    571159   ( 2284636 Kb)

       Modified Pages:       677   (    2708 Kb)

       NonPagedPool Usage: 59613   (  238452 Kb)

       NonPagedPool Max:   69377   (  277508 Kb)

                                                          NonPaged Pool Usage :  85%

       ********** Excessive NonPaged Pool Usage *****

       PagedPool 0 Usage:  31684   (  126736 Kb)

       PagedPool 1 Usage:   1824   (    7296 Kb)

       PagedPool 2 Usage:   1821   (    7284 Kb)

       PagedPool 3 Usage:   1824   (    7296 Kb)

       PagedPool 4 Usage:   1827   (    7308 Kb)

       PagedPool Usage:    38980   (  155920 Kb)

       PagedPool Maximum:  86016   (  344064 Kb)

                                                          Paged Pool Usage :  45%

 

Hmm… looks like something is using a lot of nonpaged pool. To determine which pool tag is consuming the most nonpaged pool, run the !poolused command.

0: kd> !poolused 2

   Sorting by NonPaged Pool Consumed

 

  Pool Used:

            NonPaged            Paged

 Tag    Allocs     Used    Allocs     Used

 Thre   326514 208968960         0       0

 MmCm      647  2830624         0        0

 LSwi        1  2293760         0        0

 Npfs     1681  1276672      1202   208320

 File     5863   945472         0        0

 Dmio      131   525760         0        0

 Irp      1158   508800         0        0

 

Adding a value of 2 to the end of the command instructs the debugger to sort the output by nonpaged pool used. For more information on the command, please review the debugger help file.

According to the output, the “Thre” tag is the largest consumer of nonpaged pool. To get a rough idea of which modules might be allocating pool with this tag, we can search every loaded module's image for the string; this can help narrow the problem down to a few drivers. To search all the modules, use the !for_each_module command.

      !for_each_module s -a @#Base @#End "Thre"

 

Look at David Butler’s blog post if you want to narrow it down even further by examining each driver’s import table to see whether it imports the ExAllocatePool or ExAllocatePoolWithTag functions. In this case, though, I already know the “Thre” tag is used by the kernel to allocate thread objects. Could the kernel be leaking thread objects? Most likely not; otherwise we would have seen a lot of customers reporting this issue. I am not ruling it out, but to start off I am going to focus on the third-party software installed on the machine.
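For background, the reason a pool tag points you at candidate drivers at all is that every caller passes a four-character tag into the pool allocation routines, and that tag is what !poolused and !poolfind report. Below is a minimal sketch, with a made-up tag and made-up function names, of how a hypothetical driver might tag its nonpaged pool allocations. It is purely illustrative and is not how the kernel's own Thre allocations are made.

#include <ntddk.h>

// Hypothetical tag; the characters are written reversed so the debugger
// displays the tag as "Leak" in !poolused and !poolfind output.
#define LEAK_TAG 'kaeL'

PVOID AllocateTrackedBuffer(SIZE_T Size)
{
    // Nonpaged pool is locked into RAM until it is explicitly freed.
    return ExAllocatePoolWithTag(NonPagedPool, Size, LEAK_TAG);
}

VOID FreeTrackedBuffer(PVOID Buffer)
{
    // Every allocation must eventually be matched by a free with the same
    // tag, or the tag's usage in !poolused climbs over time.
    ExFreePoolWithTag(Buffer, LEAK_TAG);
}

Because the tag is recorded with every allocation, pool consumption can be attributed to a tag, and matching the tag string (or the pool allocation imports) back to driver images gives you a short list of suspects.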

Let’s dump out a few sample pool allocations to see exactly what they are. I was sure there were going to be a ton of them, so I just hit the Break button once the output started flying by.

0: kd> !poolfind Thre 0

Scanning large pool allocation table for Tag: Thre

Searching NonPaged pool (81093000 : 89093000) for Tag: Thre

 

81098000 size:  280 previous size:    0  (Allocated) Thre (Protected)

81098380 size:  280 previous size:  100  (Allocated) Thre (Protected)

81098600 size:  280 previous size:  280  (Allocated) Thre (Protected)

81098880 size:  280 previous size:  280  (Allocated) Thre (Protected)

81098b00 size:  280 previous size:  280  (Allocated) Thre (Protected)

81098d80 size:  280 previous size:  280  (Allocated) Thre (Protected)

81099000 size:  280 previous size:    0  (Allocated) Thre (Protected)

810992c0 size:  280 previous size:   40  (Allocated) Thre (Protected)

810996e0 size:  280 previous size:  1a0  (Allocated) Thre (Protected)

81099960 size:  280 previous size:  280  (Allocated) Thre (Protected)

81099d80 size:  280 previous size:  1a0  (Allocated) Thre (Protected)

8109a000 size:  280 previous size:    0  (Allocated) Thre (Protected)

8109a380 size:  280 previous size:  100  (Allocated) Thre (Protected)

8109a600 size:  280 previous size:  280  (Allocated) Thre (Protected)

8109a880 size:  280 previous size:  280  (Allocated) Thre (Protected)

.

.

.

 

It would be interesting to see the thread stacks for the KTHREAD structures in these pool blocks. To dump out a thread using the !thread command, I needed to figure out where the base of the KTHREAD structure starts within the pool block. To figure out the offset, I dumped out all the threads in the System process, picked a random thread, and then calculated the thread’s offset from the start of its pool block. You can use any thread in the entire system because the offset will always be the same.

0: kd> !thread  8904a180

THREAD 8904a180  Cid 8.18  Teb: 00000000  Win32Thread: 00000000 WAIT: (WrEventPairLow) KernelMode Non-Alertable

    80478d40  Unknown

Not impersonating

Owning Process 890754a0

Wait Start TickCount    3831270       Elapsed Ticks: 209711

Context Switch Count    28238              

UserTime                  0:00:00.0000

KernelTime                0:00:00.0875

Start Address nt!ExpWorkerThread (0x80417a9a)

Stack Init f6444000 Current f6443d34 Base f6444000 Limit f6441000 Call 0

Priority 13 BasePriority 13 PriorityDecrement 0 DecrementCount 0

ChildEBP RetAddr  Args to Child

f6443d4c 8042ef5f bfec0068 80478d40 8904a180 nt!KiSwapThread+0x1b1

f6443d70 80417b0d 00000000 80442f00 00000000 nt!KeRemoveQueue+0x197

f6443da8 804578c2 00000000 00000000 00000000 nt!ExpWorkerThread+0x73

f6443ddc 8046c966 80417a9a 00000000 00000000 nt!PspSystemThreadStartup+0x54

00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

 

0: kd> !pool 8904a180 

 8904a080 size:   e0 previous size:   20  (Allocated)  Dire (Protected)

*8904a160 size:  280 previous size:   e0  (Allocated) *Thre (Protected)

 8904a3e0 size:  280 previous size:  280  (Allocated)  Thre (Protected)

 

0: kd> ?8904a180-8904a160

Evaluate expression: 32 = 00000020

 

So based on this calculation, the KTHREAD structure starts 32 bytes (0x20) into the Thre pool block; those 32 bytes are the pool header and the object header that precede the thread object’s body.

Now I can use this offset to dump thread stacks from the pool blocks returned by the !poolfind command; for example, the block at 81098600 in the output above puts a KTHREAD at 81098600 + 0x20 = 81098620. Some of the threads will be legitimate, but if there is a leak, many of them should be doing something fishy, so I need a reasonably large sample to figure out what is causing the Thre allocations to pile up. After a few tries, I found a lot of threads that looked like this -

kd> !Thread 81098620 

THREAD 81098620  Cid 8e8.a391c  Teb: 00000000  Win32Thread: 00000000 TERMINATED

Not impersonating

Owning Process 886f5020

Wait Start TickCount    1949088       Elapsed Ticks: 2091893

Context Switch Count    2              

UserTime                  0:00:00.0000

KernelTime                0:00:00.0000

Start Address 0x7c57b740

Win32 Start Address 0x20003474

Stack Init 0 Current bca4fc44 Base bca50000 Limit bca4d000 Call 0

Priority 16 BasePriority 8 PriorityDecrement 0 DecrementCount 0

 

 

kd> !PROCESS 886f5020 0

PROCESS 886f5020  SessionId: 0  Cid: 08e8    Peb: 7ffdf000  ParentCid: 016c

    DirBase: 694bc000  ObjectTable: 88b3ec68  TableSize: 391189.

    Image: xxxxxx.EXE

 

Looks like the thread is terminated. But why is it still hanging around?

0: kd> !object 81098620

Object: 81098620  Type: (89075900) Thread

    ObjectHeader: 81098608

    HandleCount: 1  PointerCount: 1

 

Aha! Since the handle count and pointer count are not 0, the system has not freed this Thre allocation; most likely something still has an open handle to this object. Dumping out all the processes using the !process command, I found a particular process with 388,386 open handles. This is the culprit. Interestingly enough, it is the same process listed as the owner of the terminated thread above.

       1  Token

       1  Desktop

       1  Section

       2  WindowStation

       3  Directory

       4  IoCompletion

       5  Mutant

       7  Port

       9  Semaphore

      15  File

      73  Event

   65110  Key

  323155  Thread

 

  388386  Handles in process

 

Running !handle 0 2 886f5020 and analyzing the output, I found a ton of thread handles. Most likely the process is leaking thread handles, which prevents the system from cleaning up the terminated threads' allocations. The result is high nonpaged pool usage, which is what caused the performance issues the customer was seeing.
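To make the failure mode concrete, here is a minimal user-mode sketch of this class of bug; the code and function names are hypothetical and are not the vendor's actual code. A process that creates threads but never closes the handles keeps each terminated thread's kernel object, and therefore its nonpaged Thre allocation, alive:

#include <windows.h>

DWORD WINAPI ShortLivedWork(LPVOID Context)
{
    UNREFERENCED_PARAMETER(Context);
    return 0;   // the thread terminates almost immediately
}

void LeakThreadObjects(void)
{
    for (int i = 0; i < 100000; i++) {
        HANDLE hThread = CreateThread(NULL, 0, ShortLivedWork, NULL, 0, NULL);
        if (hThread == NULL) {
            break;
        }
        WaitForSingleObject(hThread, INFINITE);
        // BUG: no CloseHandle(hThread). The open handle keeps the
        // terminated thread's kernel object referenced, so its "Thre"
        // nonpaged pool allocation is never freed. The fix is simply:
        // CloseHandle(hThread);
    }
}

Every pass through the loop leaves behind a TERMINATED thread like the ones found with !thread above, with a handle count of at least 1, so the object manager never releases the allocation. Closing the handle once it is no longer needed lets the thread object be freed after the thread exits.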

We contacted the software vendor responsible for the process, and it turned out to be a known bug in their software. An update was provided to the customer, and it fixed the issue immediately.

 

Comments

  • Would those 388,386 open handles have shown up in Task Manager?

  • @Ed

    If you added the Handles column to the Processes tab in Task Manager, I would expect the handle count to accurately reflect the number of handles.
