Performance Monitor (perfmon) is the preferred tool to measure the performance of Windows systems. The perfmon tool provides an analysis view with a chart and metrics of the Last, Average, Minimum, and Maximum values.
There are scenarios where the line in the chart is the most valuable piece of information, such as when tracking down a memory leak. Other times we are not looking for a trend, and the Last, Average, Minimum, and Maximum metrics are what matter; one example is evaluating average disk latency over a period of time. In this article we are going to use disk latency counters to illustrate how metrics are calculated for performance counters. The concepts we illustrate with disk latency apply to all performance counters. This article is not a deep dive into understanding disk latency; there are already many sources of information on that topic.
Most performance counter metrics are straightforward. The minimum and maximum metrics are self-explanatory, and the last metric is simply the last entry in the data. The metric that causes confusion is the average. When calculating averages it is important to consider the cardinality of the data. This is especially important when working with data that is already an average, such as the Avg. Disk sec/Read counter, which displays the average time per read from a disk.
Perfmon logs are gathered at a specific time interval, such as every 15 seconds. At every interval the counters are read and an entry is written to the log. During that interval there may have been many reads from the disk, a few reads, or none at all. The number of reads performed is a critical aspect of the average calculation; this is the cardinality of the data.
Consider the following 10 entries in a perfmon log:
1 read took 150 ms
0 reads took 0 ms (this entry repeats for the remaining nine intervals)
Often, averages are calculated by adding a column of numbers and dividing by the number of entries. However, this calculation does not work for the above data. If we simply add and divide we get an average latency of 15 ms (150 / 10) per read, but this is clearly incorrect. Only 1 read was performed and it took 150 ms, therefore the average latency is 150 ms per read. Depending on the system configuration, an average read latency of less than 20 ms may be considered fast and more than 20 ms may be considered slow. If we perform the calculation incorrectly we may believe the disk is performing adequately, while the correct calculation shows the disk is actually very slow.
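The difference between the two calculations can be sketched in a few lines. This is an illustrative calculation only, using the example's numbers (1 read of 150 ms followed by nine idle intervals); it is not perfmon code.

```python
# Sketch: naive vs. cardinality-weighted averaging of per-interval
# disk read latency. Each tuple is (reads, total_ms) for one interval:
# 1 read took 150 ms, then nine intervals with no reads.
intervals = [(1, 150)] + [(0, 0)] * 9

# Naive approach: sum the per-interval latencies and divide by the
# number of entries, ignoring how many reads each interval contained.
naive = sum(ms for _, ms in intervals) / len(intervals)

# Correct approach: weight by the number of reads (the cardinality).
total_reads = sum(reads for reads, _ in intervals)
total_ms = sum(ms for _, ms in intervals)
weighted = total_ms / total_reads if total_reads else 0.0

print(naive)     # 15.0  -- looks like a fast disk, but is misleading
print(weighted)  # 150.0 -- the actual latency per read
```

The naive result of 15 ms would suggest a healthy disk; the weighted result of 150 ms shows the disk is very slow.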
What data is used to calculate averages?
Let’s take a look at the data perfmon is working with. Perfmon stores data in two different structures. Formatted values are stored as PDH_FMT_COUNTERVALUE. Raw values are stored as PDH_RAW_COUNTER.
Formatted values are just plain numbers. They contain only the result of calculating the average of one or more raw values, not the raw data used to obtain that calculation. Data stored in a perfmon CSV or TSV file is already formatted, which means the file contains a column of floating point numbers. If our previous example was stored in a CSV or TSV we would have the following data:
0.15
0 (repeated for each of the remaining nine entries)
The above numbers contain no information about how many reads were performed over the course of this log. Therefore it is impossible to calculate an accurate average from these numbers. That is not to say CSV and TSV files are worthless; there are many performance scenarios (such as memory leaks) where the average is not important.
Raw counters contain the raw performance information, as delivered by the performance counter to pdh.dll. In the case of Avg. Disk sec/Read the FirstValue contains the total time for all reads and the SecondValue contains the total number of reads performed. This information can be used to calculate the average while taking into consideration the cardinality of the data.
Again using the above example, the raw data would look like this:
FirstValue = 2147727, SecondValue = 1
FirstValue = 2147727, SecondValue = 1 (unchanged for the remaining entries, as no further reads occurred)
At first glance the above raw data does not resemble our formatted data at all. In order to calculate the average we need to know the correct algorithm. The Avg. Disk sec/Read counter is of type PERF_AVERAGE_TIMER, whose average calculation is ((Nx - N0) / F) / (Dx - D0). N refers to FirstValue in the raw counter data, F refers to the number of ticks per second, and D refers to SecondValue. Ticks per second can be obtained from the PerformanceFrequency parameter of KeQueryPerformanceCounter; in my example it is 14318180.
Using the algorithm for PERF_AVERAGE_TIMER the calculation for the formatted values would be:
((2147727 - 0) / 14318180) / (1 - 0) = 0.15
((2147727 - 2147727) / 14318180) / (1 - 1) = 0*
*If the denominator is 0 there is no new data and the result is 0.
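The two formatted values above can be reproduced with a short sketch of the PERF_AVERAGE_TIMER formula. This is illustrative arithmetic only, assuming the FirstValue/SecondValue pairs and PerformanceFrequency from the example; it is not PDH code.

```python
# Sketch of the PERF_AVERAGE_TIMER formula from the article:
# ((Nx - N0) / F) / (Dx - D0), where N is FirstValue (elapsed ticks),
# D is SecondValue (number of reads), and F is ticks per second.
TICKS_PER_SECOND = 14318180  # PerformanceFrequency from the example

def perf_average_timer(n0, d0, nx, dx, freq=TICKS_PER_SECOND):
    if dx - d0 == 0:
        return 0.0  # no new operations in this interval
    return ((nx - n0) / freq) / (dx - d0)

# First interval: one read that took 2147727 ticks.
print(perf_average_timer(0, 0, 2147727, 1))        # 0.15 seconds
# Second interval: counters unchanged, so no new reads.
print(perf_average_timer(2147727, 1, 2147727, 1))  # 0.0
```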
Because the raw counter contains both the number of reads performed during each interval and the time it took for these reads to complete, we can accurately calculate the average for many entries.
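Because the raw values are cumulative, a whole-log average needs only the first and last raw samples, not every intermediate entry. The samples below are hypothetical, reusing the example's numbers.

```python
# Sketch: averaging over an entire log using only the first and last
# raw samples, since FirstValue (ticks) and SecondValue (reads) are
# cumulative totals. All sample values here are hypothetical.
TICKS_PER_SECOND = 14318180

# (FirstValue, SecondValue) pairs as they might appear across a log.
samples = [(0, 0), (2147727, 1), (2147727, 1), (2147727, 1)]

n0, d0 = samples[0]    # first raw sample
nx, dx = samples[-1]   # last raw sample
reads = dx - d0
avg_seconds = ((nx - n0) / TICKS_PER_SECOND) / reads if reads else 0.0
print(avg_seconds)  # 0.15 seconds per read over the whole log
```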
If you’ve read this far you may be wondering why I have taken the time to explain such a mundane topic. It is important to explain how this works because many performance tools do not use the correct average calculation, and many users try to calculate averages from data that is not appropriate for such calculations (such as CSV and TSV files). Programmers should use PdhComputeCounterStatistics to calculate averages; they should not sum and divide by the count, nor duplicate the calculations described in MSDN.
Recently we have found that under some conditions perfmon will use the incorrect algorithm to calculate averages. When reading from log files perfmon has been formatting the values, summing them, and dividing by the number of entries. This issue has been corrected in perfmon for Windows 8/Server 2012 with KB2877211 and for Windows 8.1/Server 2012 R2 as part of KB2883200. We recommend using these fixes when analyzing perfmon logs to determine the average of a performance counter. Note that KB2877211/KB2883200 only change the behavior when analyzing logs, there is no change when the data is collected. This means you can collect performance logs from any version of Windows and analyze them on a system with these fixes installed.
Hello everyone, I'm Ray and I'm here to talk a bit about a dump I recently looked at and a little-referenced memory counter called ResAvail Pages (resident available pages).
The problem statement was: The server hangs after a while.
Not terribly informative, but that's where we start with many cases. First some good housekeeping:
0: kd> vertarget
Windows 7 Kernel Version 7601 (Service Pack 1) MP (2 procs) Free x64
Product: Server, suite: TerminalServer SingleUserTS
Built by: 7601.18113.amd64fre.win7sp1_gdr.130318-1533
Machine Name: "ASDFASDF1234"
Kernel base = 0xfffff800`01665000 PsLoadedModuleList = 0xfffff800`018a8670
Debug session time: Thu Aug 8 09:39:26.992 2013 (UTC - 4:00)
System Uptime: 9 days 1:08:39.307
Of course Windows 7 Server == Server 2008 R2.
One of the basic things I check at the beginning of these hang dumps with vague problem statements is the memory information.
0: kd> !vm 21
*** Virtual Memory Usage ***
Physical Memory: 2097038 ( 8388152 Kb)
Page File: \??\C:\pagefile.sys
Current: 12582912 Kb Free Space: 12539700 Kb
Minimum: 12582912 Kb Maximum: 12582912 Kb
Available Pages: 286693 ( 1146772 Kb)
ResAvail Pages: 135 ( 540 Kb)
********** Running out of physical memory **********
Locked IO Pages: 0 ( 0 Kb)
Free System PTEs: 33526408 ( 134105632 Kb)
******* 12 system cache map requests have failed ******
Modified Pages: 4017 ( 16068 Kb)
Modified PF Pages: 4017 ( 16068 Kb)
NonPagedPool Usage: 113241 ( 452964 Kb)
NonPagedPool Max: 1561592 ( 6246368 Kb)
PagedPool 0 Usage: 35325 ( 141300 Kb)
PagedPool 1 Usage: 28162 ( 112648 Kb)
PagedPool 2 Usage: 24351 ( 97404 Kb)
PagedPool 3 Usage: 24350 ( 97400 Kb)
PagedPool 4 Usage: 24516 ( 98064 Kb)
PagedPool Usage: 136704 ( 546816 Kb)
PagedPool Maximum: 33554432 ( 134217728 Kb)
********** 222 pool allocations have failed **********
Session Commit: 6013 ( 24052 Kb)
Shared Commit: 6150 ( 24600 Kb)
Special Pool: 0 ( 0 Kb)
Shared Process: 1214088 ( 4856352 Kb)
Pages For MDLs: 67 ( 268 Kb)
PagedPool Commit: 136768 ( 547072 Kb)
Driver Commit: 15548 ( 62192 Kb)
Committed pages: 1648790 ( 6595160 Kb)
Commit limit: 5242301 ( 20969204 Kb)
So we're failing to allocate pool, but we aren't out of virtual memory for paged pool or nonpaged pool. Let's look at the breakdown:
0: kd> dd nt!MmPoolFailures l?9
fffff800`01892160 000001be 00000000 00000000 00000002
fffff800`01892170 00000000 00000000 00000000 00000000
values 1-3 = Nonpaged high/medium/low priority failures
values 4-6 = Paged high/medium/low priority failures
values 7-9 = Session paged high/medium/low priority failures
So we actually failed both nonpaged AND paged pool allocations in this case. Why? We're "Running out of physical memory", obviously. So where does this running out of physical memory message come from? In the above example this is from the ResAvail Pages counter.
ResAvail Pages is the amount of physical memory there would be if every working set was at its minimum size and only what needs to be resident in RAM was present (e.g. PFN database, system PTEs, driver images, kernel thread stacks, nonpaged pool, etc).
Where did this memory go then? We have plenty of Available Pages (Free + Zero + Standby) for use. So something is claiming memory it isn't actually using. In this type of situation one of the things I immediately suspect is process working set minimums. Working set basically means the physical memory used by a process.
So let's check.
0: kd> !process 0 1
<a lot of processes in this output>.
SessionId: 0 Cid: 0adc Peb: 7fffffda000 ParentCid: 0678
DirBase: 204ac9000 ObjectTable: 00000000 HandleCount: 0.
VadRoot 0000000000000000 Vads 0 Clone 0 Private 1. Modified 3. Locked 0.
ElapsedTime 9 Days 01:08:00.134
Working Set Sizes (now,min,max) (5, 50, 345) (20KB, 200KB, 1380KB)
VirtualSize 65 Mb
PeakVirtualSize 84 Mb
I have only shown one example process above for brevity's sake, but there were thousands returned: 241,423 to be precise. None had an abnormally high process working set minimum, but cumulatively their usage added up.
The “now” process working set is lower than the minimum working set. How is that possible? Well, the minimum and maximum are not hard limits, but suggested limits. For example, the minimum working set is honored unless there is memory pressure, in which case the process can be trimmed below this value. There is a way to set the min and/or max as hard limits on specific processes by passing the QUOTA_LIMITS_HARDWS_MIN_ENABLE flag to SetProcessWorkingSetSizeEx.
You can check whether the minimum and maximum working set values are configured as hard limits in the _EPROCESS->Vm->Flags structure. Note these numbers are from another system, as this structure was already torn down for the processes we were looking at.
0: kd> dt _EPROCESS fffffa8008f76060 Vm
+0x398 Vm : _MMSUPPORT
0: kd> dt _MMSUPPORT fffffa8008f76060+0x398
+0x000 WorkingSetMutex : _EX_PUSH_LOCK
+0x008 ExitGate : 0xfffff880`00961000 _KGATE
+0x010 AccessLog : (null)
+0x018 WorkingSetExpansionLinks : _LIST_ENTRY [ 0x00000000`00000000 - 0xfffffa80`08f3c410 ]
+0x028 AgeDistribution :  0
+0x044 MinimumWorkingSetSize : 0x32
+0x048 WorkingSetSize : 5
+0x04c WorkingSetPrivateSize : 5
+0x050 MaximumWorkingSetSize : 0x159
+0x054 ChargedWslePages : 0
+0x058 ActualWslePages : 0
+0x05c WorkingSetSizeOverhead : 0
+0x060 PeakWorkingSetSize : 0x5ae
+0x064 HardFaultCount : 0x41
+0x068 VmWorkingSetList : 0xfffff700`01080000 _MMWSL
+0x070 NextPageColor : 0x2dac
+0x072 LastTrimStamp : 0
+0x074 PageFaultCount : 0x65c
+0x078 RepurposeCount : 0x1e1
+0x07c Spare :  0
+0x084 Flags : _MMSUPPORT_FLAGS
0: kd> dt _MMSUPPORT_FLAGS fffffa8008f76060+0x398+0x84
+0x000 WorkingSetType : 0y000
+0x000 ModwriterAttached : 0y0
+0x000 TrimHard : 0y0
+0x000 MaximumWorkingSetHard : 0y0
+0x000 ForceTrim : 0y0
+0x000 MinimumWorkingSetHard : 0y0
+0x001 SessionMaster : 0y0
+0x001 TrimmerState : 0y00
+0x001 Reserved : 0y0
+0x001 PageStealers : 0y0000
+0x002 MemoryPriority : 0y00000000 (0)
+0x003 WsleDeleted : 0y1
+0x003 VmExiting : 0y1
+0x003 ExpansionFailed : 0y0
+0x003 Available : 0y00000 (0)
How about some more detail?
0: kd> !process fffffa8008f76060
DirBase: 204ac9000 ObjectTable: 00000000 HandleCount: 0.
No active threads
0: kd> !object fffffa8008f76060
Object: fffffa8008f76060 Type: (fffffa8006cccc90) Process
ObjectHeader: fffffa8008f76030 (new version)
HandleCount: 0 PointerCount: 1
The output above shows that this process has no active threads left, but the process object itself (and its 20 KB of working set) was still hanging around because a kernel driver held a reference to the object that it never released (PointerCount: 1 with HandleCount: 0). Sampling other entries showed the server had been leaking process objects since it was booted.
Unfortunately trying to directly track down pointer leaks on process objects is difficult and requires an instrumented kernel, so we tried to check the easy stuff first before going that route. We know it has to be a kernel driver doing this (since it is a pointer and not a handle leak) so we looked at the list of 3rd party drivers installed. Note: The driver names have been redacted.
0: kd> lm
start end module name
fffff880`04112000 fffff880`04121e00 driver1 (no symbols) <-- no symbols usually means 3rd party
fffff880`04158000 fffff880`041a4c00 driver2 (no symbols)
0: kd> lmvm driver1
Browse full module list
fffff880`04112000 fffff880`04121e00 driver1 (no symbols)
Loaded symbol image file: driver1.sys
Image path: \SystemRoot\system32\DRIVERS\driver1.sys
Image name: driver1.sys
Browse all global symbols functions data
Timestamp: Wed Dec 13 12:09:32 2006 (458033CC)
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4
0: kd> lmvm driver2
Loaded symbol image file: driver2.sys
Image path: \??\C:\Windows\system32\drivers\driver2.sys
Image name: driver2.sys
Timestamp: Thu Nov 30 12:12:07 2006 (456F10E7)
Fortunately for both the customer and us we turned up a pair of drivers that predated Windows Vista (meaning they were designed for XP/2003) that raised an eyebrow. Of course we need a more solid evidence link than just "it's an old driver", so I did a quick search of our internal KB. This turned up several other customers who had these same drivers installed, experienced the same problem, then removed them and the problem went away. That sounds like a pretty good evidence link. We implemented the same plan for this customer successfully.