In our previous article we discussed how to identify a pool leak using perfmon. Although it may be interesting to know that you have a pool leak, most customers are interested in identifying the cause of the leak so that it can be corrected. In this article we will begin the process of identifying which kernel-mode driver is leaking pool, and possibly why.
Often when we are collecting data for a poor performance scenario there are two pieces of data that we collect. Perfmon log data is one, as we discussed in our previous article. The other piece of data is poolmon logs. The memory manager tracks pool usage according to the tag associated with each pool allocation, using a technique called pool tagging. Poolmon gathers this data and displays it in an easy-to-use format. Poolmon can also be configured to dump data to a log, and in some scenarios it is beneficial to schedule poolmon to periodically collect such logs. There are several techniques available for scheduling poolmon, but they are beyond the scope of this article.
Poolmon has shipped with many different packages over the years; it is currently available with the Windows Driver Kit. If you install the WDK to the default folder, poolmon will be in “C:\Program Files (x86)\Windows Kits\8.0\Tools\x64”. Poolmon does not have dependencies on other modules in this folder; you can copy it to your other computers when you need to investigate pool usage.
How does pool tagging work? When a driver allocates pool it calls the ExAllocatePoolWithTag API. This API accepts a tag - a four-letter string - that will be used to label the allocation. It is up to a driver developer to choose this tag. Ideally each developer will choose a tag that is unique to their driver and use a different tag for each code path which calls ExAllocatePoolWithTag. Because each tag should be unique to each driver, if we can identify the tag whose usage corresponds with the leak we can then begin to identify the driver which is leaking the memory. The tag may also give the driver developer clues as to why the memory is being leaked, if they use a unique tag for each code path.
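To make the tag format concrete, here is a minimal Python sketch of how a four-character tag maps to the 32-bit value stored with an allocation. It assumes a little-endian system (x86 or x64) and the common convention of writing the tag as a reversed character literal in driver source; the helper names are ours and are not part of any Windows API.

```python
def tag_to_ulong(tag: str) -> int:
    """Pack a four-character pool tag into its 32-bit in-memory value."""
    raw = tag.encode("ascii")
    if len(raw) != 4:
        raise ValueError("pool tags are exactly four characters")
    # Little-endian: the first character of the tag is the lowest byte.
    return int.from_bytes(raw, "little")


def ulong_to_tag(value: int) -> str:
    """Recover the readable tag from the 32-bit value."""
    return value.to_bytes(4, "little").decode("ascii")


def source_literal(tag: str) -> str:
    """Reversed character literal, as a driver would typically write it."""
    return "'" + tag[::-1] + "'"


print(hex(tag_to_ulong("Leak")))  # 0x6b61654c
print(source_literal("Leak"))     # 'kaeL'
```

This is why a tag that poolmon displays as Leak may appear as 'kaeL' when you read driver source code.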
To view the pool usage associated with each tag run “poolmon -b” from a command prompt. This will sort by the number of bytes associated with each tag. If you are tracking pool usage over a period of time, you can log the data to a file with “poolmon -b -n poolmonlog1.txt”, replacing 1 with increasing numbers to obtain a series of logs. Once you have a series of logs you may be able to view usage increasing for a specific tag, in a corresponding fashion to what you see in perfmon.
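Comparing a series of logs by hand gets tedious, so a small script can help. The Python sketch below assumes you have already extracted the per-tag byte totals from each log into a dictionary; parsing the log file itself is not shown here. The function name and the sample numbers are hypothetical and only illustrate the comparison.

```python
def fastest_growing(snapshots, top=3):
    """Rank tags by byte growth between the first and last snapshot.

    snapshots: list of dicts mapping tag -> bytes in use, oldest first.
    """
    first, last = snapshots[0], snapshots[-1]
    growth = {tag: used - first.get(tag, 0) for tag, used in last.items()}
    ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Hypothetical per-tag byte totals taken from three successive logs.
snapshots = [
    {"Leak": 102400000, "Pool": 1717840, "Thre": 770000},
    {"Leak": 307200000, "Pool": 1717840, "Thre": 781000},
    {"Leak": 615424000, "Pool": 1717840, "Thre": 774048},
]

for tag, delta in fastest_growing(snapshots):
    print(f"{tag}: {delta:+d} bytes")
```

A tag whose usage climbs steadily across every snapshot is a better leak candidate than one that is merely large, because large but stable usage can be normal for a busy driver.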
When analyzing poolmon output the important data is at the top. Typically the tag with the largest usage in bytes is the cause of the leak.
In the above data we can see that the tag with the most pool usage is “Leak”. Now that we know which tag is leaking we need to identify which driver is using this tag. Techniques for associating a tag with a driver vary, but findstr is often effective. Most drivers are located in c:\windows\system32\drivers, so that is a good starting point when looking for the driver. If you don’t find a result in that folder, go up a folder and try again, repeating until you get to the root of the drive.
C:\>findstr /s Leak *.sys
·∟ §£♂ Θ─☺ A╗☻ E☼"├Θ╡☺ Hï♣╔♂ ╞ $Θª☺ Hï♣:Hc┴ ┴ê\♦@ë
└δ_Aï ╞♣@∟ ☺ë♣▓← Aï@♦ë♣¼← δCAâ∙♦u╓AïAìI♦A╕Leak;┴☼B┴3╔ï╨ §
In the above output we can see that “Leak” is used in myfault.sys. If we hadn’t forced this leak with notmyfault, the next step in troubleshooting would be an internet search for the tag and the driver. Often such a search will allow you to identify a specific fault within the driver and a solution.
Don’t panic if findstr doesn’t find your tag, or if you find the tag but it is not unique to one driver. In future articles we will cover additional techniques for associating drivers with tags, and for associating allocations with specific code within a driver.
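If the raw findstr output is hard to read, the same search can be sketched in Python so that only the matching file names are printed. This is a rough equivalent under a few assumptions: the tag is stored as plain ASCII bytes in the driver image, the function name is ours, and the folder shown is the default drivers folder mentioned above.

```python
import os


def find_tag_in_drivers(root, tag, extension=".sys"):
    """Return paths under root whose contents contain the tag bytes."""
    needle = tag.encode("ascii")
    matches = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(extension):
                continue
            path = os.path.join(folder, name)
            try:
                with open(path, "rb") as handle:
                    # Case-sensitive byte search over the whole file.
                    if needle in handle.read():
                        matches.append(path)
            except OSError:
                # Skip files that cannot be opened (locked or access denied).
                continue
    return matches


if __name__ == "__main__":
    for path in find_tag_in_drivers(r"C:\Windows\System32\drivers", "Leak"):
        print(path)
```

If more than one driver is listed, the tag is not unique and you will need additional evidence before blaming a specific driver.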
In our previous articles we discussed identifying a pool leak with perfmon, and narrowing the source of the leak with poolmon. These tools are often preferred because they are easy to use, provide verbose information, and can be run on a system without forcing downtime. However, it is not always possible to get perfmon and poolmon data. If a system is experiencing poor performance you may have a business need to get the system up and running as quickly as possible without allowing time to troubleshoot. It is also possible to completely exhaust memory through a pool leak, leaving the system in a state where tools such as perfmon and poolmon will not work. In these scenarios it may be possible to troubleshoot the poor performance by forcing a bugcheck, gathering a memory dump, and performing a post mortem analysis.
Although a dump is less than ideal for troubleshooting a leak, it can be used. I say less than ideal because a dump is a snapshot of the system memory, and does not provide the historical data which perfmon would provide. The lack of historical data makes it difficult to differentiate between high memory usage due to load and high memory usage due to a leak. It is up to you, as the troubleshooter, to determine if the dump is sufficient evidence of a leak. Sometimes identifying the tag and the driver will help you identify a known issue that causes a leak, or your knowledge of the driver architecture may allow you to determine if the memory usage is normal or not. In some scenarios you may decide to start monitoring with perfmon and collect additional data for a future occurrence.
The first step to debug a pool leak using a dump is to load the dump in windbg, set the symbol path, and reload symbols.
1: kd> .symfix c:\symbols
1: kd> .reload
Loading Kernel Symbols
The !vm command shows memory utilization; the 1 flag limits the verbosity of this command. For the scenario of a pool leak, the significant values are “NonPagedPool Usage:” and “PagedPool Usage:”. If the debugger identifies a value that is out of the normal range it will flag it, and we can see here that the debugger has flagged excessive nonpaged pool usage. This is similar to the information we obtained in Part 1 using perfmon, but unlike perfmon we do not have trend data to indicate whether this is temporary high pool usage due to load or a leak.
1: kd> !vm 1
*** Virtual Memory Usage ***
Physical Memory: 403854 ( 1615416 Kb)
Page File: \??\C:\pagefile.sys
Current: 1048576 Kb Free Space: 1015644 Kb
Minimum: 1048576 Kb Maximum: 4194304 Kb
Available Pages: 106778 ( 427112 Kb)
ResAvail Pages: 225678 ( 902712 Kb)
Locked IO Pages: 0 ( 0 Kb)
Free System PTEs: 33533355 ( 134133420 Kb)
Modified Pages: 4844 ( 19376 Kb)
Modified PF Pages: 4838 ( 19352 Kb)
NonPagedPool Usage: 155371 ( 621484 Kb)
NonPagedPool Max: 191078 ( 764312 Kb)
********** Excessive NonPaged Pool Usage *****
PagedPool 0 Usage: 27618 ( 110472 Kb)
PagedPool 1 Usage: 3848 ( 15392 Kb)
PagedPool 2 Usage: 299 ( 1196 Kb)
PagedPool 3 Usage: 283 ( 1132 Kb)
PagedPool 4 Usage: 344 ( 1376 Kb)
PagedPool Usage: 32392 ( 129568 Kb)
PagedPool Maximum: 33554432 ( 134217728 Kb)
Session Commit: 7764 ( 31056 Kb)
Shared Commit: 6371 ( 25484 Kb)
Special Pool: 0 ( 0 Kb)
Shared Process: 5471 ( 21884 Kb)
PagedPool Commit: 32394 ( 129576 Kb)
Driver Commit: 2458 ( 9832 Kb)
Committed pages: 326464 ( 1305856 Kb)
Commit limit: 665998 ( 2663992 Kb)
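The relationship between the two columns in the !vm output above can be checked with a little arithmetic. The Python sketch below assumes 4 KB pages, which is consistent with the page counts and Kb values shown; the variable names are ours.

```python
PAGE_SIZE_KB = 4  # assumption: 4 KB pages, matching the output above


def pages_to_kb(pages):
    """Convert a page count from !vm to kilobytes."""
    return pages * PAGE_SIZE_KB


nonpaged_usage_pages = 155371  # NonPagedPool Usage
nonpaged_max_pages = 191078    # NonPagedPool Max
paged_usage_pages = 32392      # PagedPool Usage
paged_max_pages = 33554432     # PagedPool Maximum

print(pages_to_kb(nonpaged_usage_pages))  # 621484
print(pages_to_kb(nonpaged_max_pages))    # 764312

nonpaged_percent = 100 * nonpaged_usage_pages / nonpaged_max_pages
paged_percent = 100 * paged_usage_pages / paged_max_pages
print(f"Nonpaged pool in use: {nonpaged_percent:.1f}% of maximum")  # 81.3%
print(f"Paged pool in use: {paged_percent:.2f}% of maximum")
```

Nonpaged pool is at roughly 81% of its maximum while paged pool is far below its limit, which is why we focus on nonpaged pool in the next step.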
The debugger can parse the pool tagging database and present information similar to what poolmon shows. The !poolused command does this. The /t5 option limits output to the top 5 consumers, and the 2 flag sorts by nonpaged pool usage (use the 4 flag if your leak is in paged pool). In the output, the first Allocs and Used columns report nonpaged pool and the second pair report paged pool.
1: kd> !poolused /t5 2
Sorting by NonPaged Pool Consumed
Tag Allocs Used Allocs Used
Leak 601 615424000 0 0 UNKNOWN pooltag 'Leak', please update pooltag.txt
Pool 6 1717840 0 0 Pool tables, etc.
nVsC 664 1531552 0 0 UNKNOWN pooltag 'nVsC', please update pooltag.txt
netv 4369 1172224 1 144 UNKNOWN pooltag 'netv', please update pooltag.txt
Thre 607 774048 0 0 Thread objects , Binary: nt!ps
TOTAL 43424 634209952 63565 126487760
The above output shows that the tag “Leak” is associated with almost all of the nonpaged pool usage. This is the same information we obtained in Part 2 using poolmon.
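To put a number on “almost all”, the nonpaged figures from the !poolused output can be compared directly. This short Python sketch uses only values from the output above; the variable names are ours.

```python
leak_allocs = 601             # nonpaged Allocs for tag Leak
leak_bytes = 615424000        # nonpaged Used for tag Leak
total_nonpaged = 634209952    # nonpaged Used on the TOTAL line

share = 100 * leak_bytes / total_nonpaged
average = leak_bytes / leak_allocs

print(f"Leak holds {share:.1f}% of nonpaged pool")       # 97.0%
print(f"Average allocation: {average:.0f} bytes")        # 1024000 bytes
```

A small number of very large, uniformly sized allocations like this is worth noting when you search for known issues, since the allocation size may help identify the code path responsible.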
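Now we must identify which drivers use the pool tag “Leak”. Because we have a snapshot of the system memory we can search the dump for this tag. The command below searches the address range of every loaded module for the ASCII string “Leak”. We can then match each address it returns to a module using the command lm a.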
1: kd> !for_each_module s -a @#Base @#End "Leak"
fffff880`044b63aa 4c 65 61 6b 3b c1 0f 42-c1 41 8d 49 fd 8b d0 ff Leak;..B.A.I....
fffff880`044b6621 4c 65 61 6b 3b c1 0f 42-c1 33 c9 8b d0 ff 15 cc Leak;..B.3......
1: kd> lm a fffff880`044b63aa
start end module name
fffff880`044b5000 fffff880`044bc000 myfault (no symbols)
1: kd> lm a fffff880`044b6621
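The matching that lm a performs is a range check: an address belongs to the module whose start and end addresses surround it. The Python sketch below repeats that check for both search hits, using the myfault range from the output above. The helper names are ours, and treating the end address as exclusive is an assumption.

```python
def parse_address(text):
    """Convert a debugger-style address such as fffff880`044b63aa to an int."""
    return int(text.replace("`", ""), 16)


def module_for(address, modules):
    """Return the name of the module whose range contains address, if any."""
    for name, start, end in modules:
        if start <= address < end:  # assumption: end is exclusive
            return name
    return None


# Module range taken from the lm a output above.
modules = [
    ("myfault", parse_address("fffff880`044b5000"), parse_address("fffff880`044bc000")),
]

for hit in ("fffff880`044b63aa", "fffff880`044b6621"):
    address = parse_address(hit)
    name = module_for(address, modules)
    offset = address - modules[0][1]
    print(f"{hit} -> {name}+{offset:#x}")
```

Both hits fall inside the myfault range, at offsets 0x13aa and 0x1621 from the start of the module.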
The tag and driver name can be used to search the internet for known problems. If a known issue is found a driver update may be available, and installing this update may prevent a future memory leak.
If there are no updates available for the driver, or if this is your driver and you need to identify the cause of the leak, don’t panic. In future articles we will show techniques for getting call stacks of pool allocations. These call stacks can be used to identify under what conditions the driver leaks memory.