Hello everyone, I'm Ray and I'm here to talk a bit about a dump I recently looked at and a little-referenced memory counter called ResAvail Pages (resident available pages).
The problem statement was: The server hangs after a while.
Not terribly informative, but that's where we start with many cases. First some good housekeeping:
0: kd> vertarget
Windows 7 Kernel Version 7601 (Service Pack 1) MP (2 procs) Free x64
Product: Server, suite: TerminalServer SingleUserTS
Built by: 7601.18113.amd64fre.win7sp1_gdr.130318-1533
Machine Name: "ASDFASDF1234"
Kernel base = 0xfffff800`01665000 PsLoadedModuleList = 0xfffff800`018a8670
Debug session time: Thu Aug 8 09:39:26.992 2013 (UTC - 4:00)
System Uptime: 9 days 1:08:39.307
Of course Windows 7 Server == Server 2008 R2.
One of the basic things I check at the beginning of these hang dumps with vague problem statements is the memory information.
0: kd> !vm 21
*** Virtual Memory Usage ***
Physical Memory: 2097038 ( 8388152 Kb)
Page File: \??\C:\pagefile.sys
Current: 12582912 Kb Free Space: 12539700 Kb
Minimum: 12582912 Kb Maximum: 12582912 Kb
Available Pages: 286693 ( 1146772 Kb)
ResAvail Pages: 135 ( 540 Kb)
********** Running out of physical memory **********
Locked IO Pages: 0 ( 0 Kb)
Free System PTEs: 33526408 ( 134105632 Kb)
******* 12 system cache map requests have failed ******
Modified Pages: 4017 ( 16068 Kb)
Modified PF Pages: 4017 ( 16068 Kb)
NonPagedPool Usage: 113241 ( 452964 Kb)
NonPagedPool Max: 1561592 ( 6246368 Kb)
PagedPool 0 Usage: 35325 ( 141300 Kb)
PagedPool 1 Usage: 28162 ( 112648 Kb)
PagedPool 2 Usage: 24351 ( 97404 Kb)
PagedPool 3 Usage: 24350 ( 97400 Kb)
PagedPool 4 Usage: 24516 ( 98064 Kb)
PagedPool Usage: 136704 ( 546816 Kb)
PagedPool Maximum: 33554432 ( 134217728 Kb)
********** 222 pool allocations have failed **********
Session Commit: 6013 ( 24052 Kb)
Shared Commit: 6150 ( 24600 Kb)
Special Pool: 0 ( 0 Kb)
Shared Process: 1214088 ( 4856352 Kb)
Pages For MDLs: 67 ( 268 Kb)
PagedPool Commit: 136768 ( 547072 Kb)
Driver Commit: 15548 ( 62192 Kb)
Committed pages: 1648790 ( 6595160 Kb)
Commit limit: 5242301 ( 20969204 Kb)
So we're failing to allocate pool, but we aren't out of virtual memory for paged pool or nonpaged pool. Let's look at the breakdown:
0: kd> dd nt!MmPoolFailures l?9
fffff800`01892160 000001be 00000000 00000000 00000002
fffff800`01892170 00000000 00000000 00000000 00000000
Values 1-3 = Nonpaged high/medium/low priority failures
Values 4-6 = Paged high/medium/low priority failures
Values 7-9 = Session paged high/medium/low priority failures
So we actually failed both nonpaged AND paged pool allocations in this case. Why? We're "Running out of physical memory", obviously. So where does this running out of physical memory message come from? In the above example this is from the ResAvail Pages counter.
ResAvail Pages is the amount of physical memory there would be if every working set was at its minimum size and only what needs to be resident in RAM was present (e.g. PFN database, system PTEs, driver images, kernel thread stacks, nonpaged pool, etc).
Where did this memory go then? We have plenty of Available Pages (Free + Zero + Standby) for use. So something is claiming memory it isn't actually using. In this type of situation one of the things I immediately suspect is process working set minimums. Working set basically means the physical memory used by a process.
So let's check.
0: kd> !process 0 1
<a lot of processes in this output>.
SessionId: 0 Cid: 0adc Peb: 7fffffda000 ParentCid: 0678
DirBase: 204ac9000 ObjectTable: 00000000 HandleCount: 0.
VadRoot 0000000000000000 Vads 0 Clone 0 Private 1. Modified 3. Locked 0.
ElapsedTime 9 Days 01:08:00.134
Working Set Sizes (now,min,max) (5, 50, 345) (20KB, 200KB, 1380KB)
VirtualSize 65 Mb
PeakVirtualSize 84 Mb
I have only shown one example process above for brevity's sake, but there were thousands returned. 241,423 to be precise. None had abnormally high process working set minimums, but cumulatively their usage adds up.
The “now” process working set is lower than the minimum working set. How is that possible? Well, the minimum and maximum are not hard limits, but suggested limits. For example, the minimum working set is honored unless there is memory pressure, in which case it can be trimmed below this value. There is a way to set the min and/or max as hard limits on specific processes by using the QUOTA_LIMITS_HARDWS_MIN_ENABLE flag via SetProcessWorkingSetSize.
You can check whether hard minimum and maximum working set limits are configured by looking at the _EPROCESS->Vm->Flags structure. Note these numbers are from another system, as this structure was already torn down for the processes we were looking at.
0: kd> dt _EPROCESS fffffa8008f76060 Vm
+0x398 Vm : _MMSUPPORT
0: kd> dt _MMSUPPORT fffffa8008f76060+0x398
+0x000 WorkingSetMutex : _EX_PUSH_LOCK
+0x008 ExitGate : 0xfffff880`00961000 _KGATE
+0x010 AccessLog : (null)
+0x018 WorkingSetExpansionLinks : _LIST_ENTRY [ 0x00000000`00000000 - 0xfffffa80`08f3c410 ]
+0x028 AgeDistribution :  0
+0x044 MinimumWorkingSetSize : 0x32
+0x048 WorkingSetSize : 5
+0x04c WorkingSetPrivateSize : 5
+0x050 MaximumWorkingSetSize : 0x159
+0x054 ChargedWslePages : 0
+0x058 ActualWslePages : 0
+0x05c WorkingSetSizeOverhead : 0
+0x060 PeakWorkingSetSize : 0x5ae
+0x064 HardFaultCount : 0x41
+0x068 VmWorkingSetList : 0xfffff700`01080000 _MMWSL
+0x070 NextPageColor : 0x2dac
+0x072 LastTrimStamp : 0
+0x074 PageFaultCount : 0x65c
+0x078 RepurposeCount : 0x1e1
+0x07c Spare :  0
+0x084 Flags : _MMSUPPORT_FLAGS
0: kd> dt _MMSUPPORT_FLAGS fffffa8008f76060+0x398+0x84
+0x000 WorkingSetType : 0y000
+0x000 ModwriterAttached : 0y0
+0x000 TrimHard : 0y0
+0x000 MaximumWorkingSetHard : 0y0
+0x000 ForceTrim : 0y0
+0x000 MinimumWorkingSetHard : 0y0
+0x001 SessionMaster : 0y0
+0x001 TrimmerState : 0y00
+0x001 Reserved : 0y0
+0x001 PageStealers : 0y0000
+0x002 MemoryPriority : 0y00000000 (0)
+0x003 WsleDeleted : 0y1
+0x003 VmExiting : 0y1
+0x003 ExpansionFailed : 0y0
+0x003 Available : 0y00000 (0)
How about some more detail?
0: kd> !process fffffa8008f76060
DirBase: 204ac9000 ObjectTable: 00000000 HandleCount: 0.
No active threads
0: kd> !object fffffa8008f76060
Object: fffffa8008f76060 Type: (fffffa8006cccc90) Process
ObjectHeader: fffffa8008f76030 (new version)
HandleCount: 0 PointerCount: 1
The output above shows that this process has no active threads left, but the process object itself (and its 20KB working set use) was still hanging around because a kernel driver had a reference to the object that it never released. Sampling other entries showed the server had been leaking process objects since it was booted.
Unfortunately trying to directly track down pointer leaks on process objects is difficult and requires an instrumented kernel, so we tried to check the easy stuff first before going that route. We know it has to be a kernel driver doing this (since it is a pointer and not a handle leak) so we looked at the list of 3rd party drivers installed. Note: The driver names have been redacted.
0: kd> lm
start end module name
fffff880`04112000 fffff880`04121e00 driver1 (no symbols) <-- no symbols usually means 3rd party
fffff880`04158000 fffff880`041a4c00 driver2 (no symbols)
0: kd> lmvm driver1
Browse full module list
fffff880`04112000 fffff880`04121e00 driver1 (no symbols)
Loaded symbol image file: driver1.sys
Image path: \SystemRoot\system32\DRIVERS\driver1.sys
Image name: driver1.sys
Browse all global symbols functions data
Timestamp: Wed Dec 13 12:09:32 2006 (458033CC)
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4
0: kd> lmvm driver2
Loaded symbol image file: driver2.sys
Image path: \??\C:\Windows\system32\drivers\driver2.sys
Image name: driver2.sys
Timestamp: Thu Nov 30 12:12:07 2006 (456F10E7)
Fortunately for both the customer and us we turned up a pair of drivers that predated Windows Vista (meaning they were designed for XP/2003) that raised an eyebrow. Of course we need a more solid evidence link than just "it's an old driver", so I did a quick search of our internal KB. This turned up several other customers who had these same drivers installed, experienced the same problem, then removed them and the problem went away. That sounds like a pretty good evidence link. We implemented the same plan for this customer successfully.
Hello folks, this is Pushkar and I recently worked an interesting case dealing with high CPU usage. What made the case particularly interesting was that it was not a business-critical application consuming the CPU: I was tasked with identifying why a third-party performance monitoring tool was causing a performance issue on the server by consuming 100% of the CPU cycles. The irony of the situation made this case immensely interesting to me, and although issues with third-party products are typically addressed by the application vendor, I decided to investigate further.
In this case the monitoring tool, Monitor.exe (disclaimer: the identity of the actual product has been removed), was consistently consuming 100% of the CPU cycles, and as expected, stopping the tool returned CPU usage to normal. As I didn't have information about how the tool worked, I decided to gather data that would reveal the underlying APIs the tool was calling, which would give me an understanding of its behavior.
To begin troubleshooting I gathered Kernrate.exe logs from the server along with a memory dump. On Windows Vista and later you can use the Windows Performance Toolkit (aka Xperf), a better alternative to Kernrate.
Note: To learn more about kernrate.exe check here
A quick review of the Kernrate log showed a lot of CPU time in the kernel, and the most frequently sampled function was NtQueryDirectoryObject().
TOTAL K 0:07:35.343 (49.2%) U 0:02:40.734 (17.4%) I 0:05:09.171 (33.4%) DPC 0:00:52.984 (5.7%) Interrupt 0:00:20.312 (2.2%)
Total Interrupts= 4208713, Total Interrupt Rate= 9098/sec.
Results for Kernel Mode:
OutputResults: KernelModuleCount = 106
Percentage in the following table is based on the Total Hits for the Kernel
ProfileTime 116300 hits, 65536 events per hit --------
Module Hits msec %Total Events/Sec
PROCESSR 45160 462654 38 % 6397017
NTOSKRNL 43573 462654 37 % 6172215
SYMMPI 18258 462654 15 % 2586287
----- Zoomed module NTOSKRNL.EXE (Bucket size = 16 bytes, Rounding Down) --------
Percentage in the following table is based on the Total Hits for this Zoom Module
ProfileTime 43573 hits, 65536 events per hit --------
NtQueryDirectoryObject 8433 462654 19 % 1194553
memmove 6584 462654 14 % 932638
KeInsertQueueDpc 5593 462654 12 % 792261
KeFlushMultipleTb 3118 462654 7 % 441671
Further investigation of the dump indicated the function listed above was invoked to parse through the \BaseNamedObjects namespace in the kernel. BaseNamedObjects is a folder in the kernel object namespace where various named kernel objects (events, semaphores, mutexes, waitable timers, file-mapping objects, and job objects) are created. The purpose of this global namespace is to enable processes in multiple client sessions to communicate with a service application. It also lets applications that use named objects detect that an instance of the application is already running somewhere in the system, across all sessions.
I started looking under the \BaseNamedObjects namespace in the kernel and found over 2,900 objects under that namespace. You can use the Sysinternals tool WinObj.exe to browse the kernel object namespace and the objects within it. In my case I had no prior idea of the cause of the issue, so I had to look at the dump file. I pasted a snapshot of the WinObj.exe output below along with a stack from the kernel dump.
kd> !object \BaseNamedObjects
Object: fffffa8000d72ec0 Type: (fffffadfe7acb1e0) Directory
ObjectHeader: fffffa8000d72e90 (old version)
HandleCount: 36 PointerCount: 10275 <- This indicates the number of objects under this namespace
Directory Object: fffffa8000004060 Name: BaseNamedObjects
Hash Address Type Name
---- ------- ---- ----
00 fffffadfe5d79660 Job TestJobObj_9920
fffffadfe5d7a8e0 Job TestJobObj_9913
fffffadfe5d7b8e0 Job TestJobObj_9907
fffffadfe5d84060 Job TestJobObj_9850
fffffadfe5d863e0 Job TestJobObj_9843
fffffadfe5d873e0 Job TestJobObj_9837
fffffadfe5d8e060 Job TestJobObj_9790
fffffadfe5d903e0 Job TestJobObj_9783
fffffadfe5d913e0 Job TestJobObj_9777
fffffadfe5dad660 Job TestJobObj_9611
fffffadfe5dae660 Job TestJobObj_9605
fffffadfe5db7660 Job TestJobObj_9551
fffffadfe5db8660 Job TestJobObj_9545
fffffadfe5db98e0 Job TestJobObj_9538
In my case the third-party performance monitoring tool was experiencing high CPU when it queried the “Job Objects” and “Job Object Details” performance counters.
Since this was happening with a third party tool, I tried to recreate the issue in-house, and wrote a small utility to create a lot of Named Job Objects under \BaseNamedObjects namespace using CreateJobObject() API. Then I tried to use the plain old Perfmon tool built into Windows. The moment I tried to add the “Job Objects” & “Job Object Details” counters, the CPU utilization from MMC.EXE hit 90% (Perfmon runs as an MMC). Now I could reproduce the issue in-house so I investigated what happened when we try to query those counters.
Here is what I found:
When we attempt to query the Job Object or Job Object Details performance counter, Perfproc.dll invokes functions that call NtQueryDirectoryObject() to locate the objects under the \BaseNamedObjects namespace, and then calls QueryInformationJobObject to gather the performance data for each of those objects. One of the thread stacks is given below to show the flow.
RetAddr : Args to Child : Call Site
fffff800`01040abd : 00000000`000ad490 fffff800`01385b47 00000000`00002990 00000000`00000000 : nt!NtQueryInformationJobObject
00000000`77ef171a : 00000000`77d423cb 00000000`000ad498 00000000`00000000 00000000`c0000135 : nt!KiSystemServiceCopyEnd+0x3
00000000`77d423cb : 00000000`000ad498 00000000`00000000 00000000`c0000135 00000000`00000000 : ntdll!ZwQueryInformationJobObject+0xa
000007ff`5a794a16 : 00000000`000ad990 00000000`00000008 00000000`00000400 00000000`000addf8 : kernel32!QueryInformationJobObject+0x77
000007ff`5a7932b9 : 00000000`0000dd20 000007ff`5a79aa84 00000000`00000001 00000000`00000040 : perfproc!CollectJobObjectData+0x356
000007ff`7feeb497 : 00000000`00000000 00000000`000adee0 00000000`00000000 00000000`00000001 : perfproc!CollectSysProcessObjectData+0x1f9
000007ff`7fef09d1 : 00000000`000ae1d0 00000000`00000000 00000000`0000007a 00000000`00000000 : ADVAPI32!QueryExtensibleData+0x951
000007ff`7fef0655 : 0039002d`00350038 00000000`77c43cbd 00000000`00000000 00000000`00000000 : ADVAPI32!PerfRegQueryValue+0x66d
000007ff`7ff0b787 : 00000000`00000000 00000000`00000000 00000000`000ae8a0 00000000`000ae888 : ADVAPI32!LocalBaseRegQueryValue+0x356
000007ff`5b17ba27 : 00009ae8`b8ee62a9 00000000`000c46a0 00000000`00200000 00000000`00000000 : ADVAPI32!RegQueryValueExW+0xe9
You can determine the type of performance data queried for each Job Object from the Win32_PerfFormattedData_PerfProc_JobObject class. Although this documents a WMI interface for querying the performance data, it represents the same data set the native API queries.
Once I understood what was happening under the hood, it was easy to deduce that every iteration of the query is a CPU-intensive operation: for each object we perform a recursive query for the performance data described above. If the number of objects is large, and we run a broad query of the nature “All Counters” and “All Instances”, then every single iteration consumes a lot of CPU. To validate this, I got help from one of my colleagues, Steve Heller, who modified sample code received from the vendor of the monitoring tool (which demonstrated what they were doing). We noticed they were querying this information every 2.5 seconds, and the sample tool demonstrated that with about 10,000 job objects a single iteration of the query took roughly 12.5 seconds to complete. No wonder the CPU usage remained high: before a single iteration of the query for all the job objects finished, four additional queries were already queued, and the CPU usage continued to remain at 90% or more.
The conclusion that can be drawn from this test is that querying \BaseNamedObjects namespace with a large number of objects under it will invariably result in a fair amount of CPU usage. This could get worse if the query is performed for complex objects e.g., Job Objects for which we would run nested queries for individual Performance Data for each Job Object, and if the data is queried too frequently.
The two aspects responsible for the high CPU usage are:
1. The frequency at which the Performance Data was being queried
2. The number of objects under \BaseNamedObjects namespace.
Though there is no documented limit on the number of objects that can be created under the various kernel object namespaces, the absence of such a limit doesn't mean we should simply create a lot of objects. Applications should be designed carefully to use such kernel resources judiciously.
Similarly, just because the interfaces for querying performance data are publicly available doesn't mean we should query them at a very high frequency. Certain operations can be CPU intensive, as we have seen here, so we should identify what we really need to query and at what frequency.
Quoting from the MSDN article Job Objects:
“A job object allows groups of processes to be managed as a unit. Job objects are namable, securable, sharable objects that control attributes of the processes associated with them. Operations performed on the job object affect all processes associated with the job object.”
The above point clearly indicates that the job object framework is intended for processes participating in a job to monitor and control each other's resource usage.
There could be different reasons a system administrator needs to gather information about the resource usage of a particular process. One of them could be meeting a Service Level Agreement. In that case we can monitor individual processes running on the server with the counters available under “Performance Object -> Process”.
Equipped with the knowledge that querying performance data for job objects may result in excessive CPU usage on the server, an administrator should evaluate, or consult the application owner if needed, whether they really need to gather this performance data.
The following documents are available for application developers to assist with the creation of application specific namespaces. Creation of a private namespace and creating objects under that namespace would ensure that application specific objects are not exposed to the Performance Library because they won’t appear under \BaseNamedObjects namespace.
This problem would predominantly be noticed on older systems. To mitigate it to a certain extent, Windows Vista (and all later operating systems) have a per-session BaseNamedObjects namespace (i.e., application-specific objects are created under a namespace of the form \Sessions\<SessionID>\BaseNamedObjects). Hence these objects are not exposed to the performance libraries and are not available to query via performance monitoring tools. Unfortunately the kernel changes that make this work are too intricate to be back-ported to Windows 2003 SP2.
This brings me to the end of this post. I hope this information gives you better insight into how an oversight in an application's design can break other components at a later stage.
In this video Michael introduces the sample-based profiling tool named Xperf. He also demonstrates how to use the tool to uncover high DPC and disk I/O activity. If you would like to see more Xperf blogs on ntdebugging, please send your feedback using the email link at the top of the blog site. Feel free to leave comments too.
Location of the video: http://ittv.net/DesktopModules/UltraVideoGallery/uvg.swf?vId=419&portalId=0
BTW - I’m working on making a downloadable version for off-line viewing. Stay Tuned.
Link to Microsoft Windows Performance Toolkit
Link to XPerf version 4.5. This ships with the Win7 SDK below.
We look forward to your feedback!
It's Venkatesh with a WMI nugget. While troubleshooting or debugging WMI issues you may come across WMI queries wherein you don’t know which provider implemented the WMI class used in the query. You may want to know the binary and the product that implemented the provider so you can contact the vendor or upgrade the binary to see if it resolves the issue you are investigating.
To help make the investigation easier, Windows Vista and all later operating systems have the cool new 'Analytic and Debug log' for WMI tracing. To get similar tracing on Windows 2003 and earlier versions, WMI has a “Verbose Logging” option enabled through the WMI control (WmiMgmt.msc). Check out this blog for detailed steps to enable the WMI Analytic and Debug logs -
After enabling WMI logging you will see events like the ones pictured below. A particular WMI query relevant to the issue you're troubleshooting may require you to know the provider binary for the WMI class in the query. Apart from the class name (InstProvSamp) in the query, you need the WMI namespace to which this class belongs, such as '\\.\root\default' as shown below.
Fig 1: Event ID 1 in the Windows Vista’s Event Viewer\Windows Logs\Applications and Services Logs\Microsoft\Windows\WMI-Activity\Trace log
Now we have the WMI Namespace and the class name so let’s find the provider for it.
1. Get the provider name from the Class Name and Namespace:
Every WMI class has a qualifier “provider” containing the name of the provider to which it belongs. Using wbemtest.exe, connect to the namespace we mentioned above (\\.\root\default). Then get the class information for our class, InstProvSamp, using the “Open Class…” button. After clicking "OK" the object editor displays the class information for InstProvSamp, including the name of the provider it belongs to (the CIM_STRING qualifier “provider”, as shown below).
2. Get the class identifier (CLSID) of the provider from the provider name:
For every WMI provider, WMI maintains an instance of the system class __Win32Provider, which contains information about the physical implementation of the provider, including a CLSID property holding the GUID that identifies the provider's COM object. Now query the repository to get the provider information for the InstProvSamp provider as I did below.
3. Get the provider binary from the provider’s CLSID:
During registration every COM object adds an entry under HKLM\SOFTWARE\Classes\CLSID. Using regedit.exe you can browse to the provider's COM registration information using the CLSID obtained in Step 2, as shown below.
This should help you connect the dots while debugging your WMI issue. Let me know if you have additional questions!
Would you like to join the world’s best and most elite debuggers to enable the success of Microsoft solutions?
As a trusted advisor to our top customers you will be working with the most experienced IT professionals and developers in the industry. You will influence our product teams in sustained engineering efforts to drive improvements in our products.
This role involves deep analysis of product source code and debugging to solve problems in multi-million dollar configurations and will give you an opportunity to stretch your critical thinking skills. During the course of debugging, you will uncover opportunities to improve the customer experience while influencing the current and future design of our products.
In addition to providing support to customers while being the primary interface to our sustained engineering teams, you will also have the opportunity to work with new technologies and unreleased software. Through our continuous investment in in-depth training and hands-on experience with tough customer challenges, you will become the world's best in this area. Expect to partner with many different roles at Microsoft, launching a very successful career!
This position is located at the Microsoft Global Technical Support Center in Bangalore, India.
Learn more about what an Escalation Engineer does at:
Profile: Ron Stock, CTS Escalation Engineer - Microsoft Customer Service & Support - What is CSS?
Microsoft JobsBlog JobCast with Escalation Engineer Jeff Dailey
Microsoft JobsBlog JobCast with Escalation Engineer Scott Oseychik
Xperf was created by the Windows Fundamentals team to analyze system and application performance. Tate walks through the use of the new tool and shows how to analyze the data for specific scenarios. The slide deck is available from the attachment link below.
Recently I was working with a developer who had created performance counters that work in Performance Monitor but are never collected in a user defined data collector set. The customer explained that their counters update named shared memory inside the application which should be read by perfmon or the data collector set.
Putting counter data in shared memory is a common technique for performance counter developers. A programmer can update performance data in a block of shared memory in their application and then use a performance extension dll (aka an “extensible counter”) to read from the shared memory.
Shared memory is created by calling CreateFileMapping and MapViewOfFile. This memory is then accessed by another application by calling OpenFileMapping. All applications which use this shared memory must pass the same lpName to CreateFileMapping or OpenFileMapping. An example of using these APIs to implement shared memory is available on MSDN.
Based on the customer’s explanation that they are populating shared memory in their application, and their counters work in Performance Monitor but do not work in a user defined data collector set, I suspected that OpenFileMapping was failing for the data collector set.
User defined data collector sets run in a rundll32.exe process. If you have multiple rundll32.exe processes you may need to identify which one is related to your data collector set. The relevant process has a command line similar to “rundll32.exe C:\Windows\system32\pla.dll,PlaHost”. There are several tools that can be used to identify the command line of the process such as tlist.exe, which is included with the Debugging Tools for Windows.
After attaching a debugger to rundll32.exe, I wanted to break on the ret instruction at the end of the OpenFileMappingW function. This would allow me to determine if the function succeeds or fails. According to MSDN “If the function fails, the return value is NULL. To get extended error information, call GetLastError.”
The uf command is an easy way to unassemble a function and find the ret instruction to break on.
0:001> uf kernelbase!OpenFileMappingW
75b88e0d 8bff mov edi,edi
75b88e79 c9 leave
75b88e7a c20c00 ret 0Ch
0:001> bp 75b88e7a
Breakpoint 0 hit
eax=00000000 ebx=00008022 ecx=7ffd8000 edx=00000002 esi=05abf03c edi=00000000
eip=75b88e7a esp=05abeb20 ebp=05abeb3c iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
In the above output we can see that the eax register is NULL, indicating that the call to OpenFileMapping failed. The !gle command will show the last error and last status.
LastErrorValue: (Win32) 0x2 (2) - The system cannot find the file specified.
LastStatusValue: (NTSTATUS) 0xc0000034 - Object Name not found.
The failure is that OpenFileMapping cannot find the file. The file name is the third parameter to OpenFileMapping. We can get the first three parameters from the kb command.
ChildEBP RetAddr Args to Child
05abf0d0 6abae355 0002001f 00000000 05abeb7c kernelbase!OpenFileMappingW+0x90
WARNING: Stack unwind information not available. Following frames may be wrong.
05abf0f8 7784fe67 02a7ae90 05abf224 05abf254 ninjaprf+0x10edb
05abf110 7784fc97 00472158 02a7ae90 05abf224 advapi32!CallExtObj+0x17
05abf270 7784efaf 05abf2bc 60fcfa02 05abf778 advapi32!QueryExtensibleData+0x735
05abf654 75ff0468 80000004 05abf778 00000000 advapi32!PerfRegQueryValue+0x5da
05abf748 75ffd505 80000004 05abf778 05abf790 kernel32!LocalBaseRegQueryValue+0x366
05abf7b4 61247dc5 80000004 02a7ae90 00000000 kernel32!RegQueryValueExW+0xb7
05abf830 61250595 80000004 02a7ae58 02a7ae90 pdh!GetSystemPerfData+0x92
05abf89c 6124c753 02a407d0 05abf8e8 61241928 pdh!GetQueryPerfData+0xa4
05abf8b8 61254463 02a407d0 05abf8e8 60fcf32f pdh!PdhiCollectQueryData+0x32
05abf90c 611c6d04 02a58f08 00000000 75ffc3e0 pdh!PdhUpdateLogW+0xa2
05abf9bc 611be128 0045c968 00000000 00000000 pla!HPerformanceCounterDataCollector::Cycle+0x48
05abf9bc 00000000 0045c968 00000000 00000000 pla!PlaiCollectorControl+0x3b7
0:008> da 05abeb7c
The user defined data collector set is failing to open the file "Local\NINJAPERF_S-1-5-18". This is the name that the performance extension dll ninjaprf.dll has given to its shared memory.
Based on the customer’s description this operation works in Performance Monitor. Next, I attached a debugger to perfmon and set the same breakpoint.
eax=000009f8 ebx=00008022 ecx=a7330000 edx=080ee678 esi=06798070 edi=00000000
eip=760be9bb esp=0a84e564 ebp=0a84e580 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246
760be9bb c20c00 ret 0Ch
In the above output we can see that eax is a handle number, indicating that the function succeeded.
Looking at the file being opened we can see why this works in one scenario and not in another.
0:016> kb 1
0a84e580 698e4ab9 0002001f 00000000 0a84e5c0 kernelbase!OpenFileMappingW+0x90
0:016> da 0a84e5c0
The file name used is unique for each user. The ninjaprf dll has chosen a file name which includes the SID of the current user. This works for Performance Monitor because the user who starts the application is the same as the user who runs Performance Monitor. However, a user defined data collector set runs in rundll32.exe as the Local System account.
It is possible to run the user defined data collector set as a different user, however the file name being used will not work in that scenario either. By using the “Local\” prefix, the file is created in the local session namespace. The application runs in the user’s session, while rundll32.exe is started by the Task Scheduler service and runs in session 0. This prevents the user defined data collector set from seeing the file created by the application.
If your performance counter uses shared memory to communicate, be aware that your performance extension dll may be run in the context of a different user account and a different user session. Your inter process communication techniques must account for this, or your customers will ask you why your counters do not work outside of Performance Monitor.
I was working on a debug the other day when I ran the “!vm” command and saw that the system had some 48,000 commit requests that had failed. This was strange, as the system was not out of memory and the page file was not full. I was left scratching my head and thinking “I wish I knew where !vm got that information.” So I went on a quest to find out; here is what I found.
13: kd> !vm 1
Physical Memory: 12580300 ( 50321200 Kb)
Page File: \??\C:\pagefile.sys
Current: 50331648 Kb Free Space: 50306732 Kb
Minimum: 50331648 Kb Maximum: 50331648 Kb
Available Pages: 4606721 ( 18426884 Kb)
ResAvail Pages: 12189247 ( 48756988 Kb)
Locked IO Pages: 0 ( 0 Kb)
Free System PTEs: 33460257 ( 133841028 Kb)
Modified Pages: 20299 ( 81196 Kb)
Modified PF Pages: 6154 ( 24616 Kb)
NonPagedPool 0 Used: 19544 ( 78176 Kb)
NonPagedPool 1 Used: 22308 ( 89232 Kb)
NonPagedPool Usage: 53108 ( 212432 Kb)
NonPagedPool Max: 9408956 ( 37635824 Kb)
PagedPool 0 Usage: 168921 ( 675684 Kb)
PagedPool 1 Usage: 4149241 ( 16596964 Kb)
PagedPool 2 Usage: 17908 ( 71632 Kb)
PagedPool Usage: 4336070 ( 17344280 Kb)
PagedPool Maximum: 33554432 ( 134217728 Kb)
Session Commit: 3438 ( 13752 Kb)
Shared Commit: 6522 ( 26088 Kb)
Special Pool: 0 ( 0 Kb)
Shared Process: 53597 ( 214388 Kb)
PagedPool Commit: 4336140 ( 17344560 Kb)
Driver Commit: 5691 ( 22764 Kb)
Committed pages: 5565215 ( 22260860 Kb)
Commit limit: 25162749 ( 100650996 Kb)
********** 48440 commit requests have failed **********
It turns out that this count comes from a global ULONG array named “nt!MiChargeCommitmentFailures”. The array has four members, each tracking a different type of commit failure. The memory manager first calculates the new commit size, NewCommitValue = CurrentCommitValue + SystemReservedMemory. Based on this calculation, a different member of the array is incremented depending on how the commit failed:
- The system failed a commit request and an expansion of the pagefile also failed.
- The system failed a commit and the pagefile has already reached its maximum size.
- The system failed a commit while the pagefile lock was held.
- The system failed a commit and NewCommitValue is less than or equal to CurrentCommitValue.
To calculate the count of failures, "!vm" adds up the values stored in the array members: members 0 and 1 are always counted, member 2 is counted only if the OS is Windows XP/Server 2003, and member 3 is counted only on builds newer than Windows XP/Server 2003.
Let's look at the array in the dump I was debugging:
13: kd> dc nt!MiChargeCommitmentFailures L4
fffff800`01e45ce0 00000000 0000bd38 00000000 00000000 ....8...........
Converting this to decimal, we find the 48,000+ commit failures I was seeing in the output of !vm.
13: kd> ?0000bd38
Evaluate expression: 48440 = 00000000`0000bd38
Now that I had the answer to “where does the number come from?”, I was left wanting to know a bit more about the overall flow of why a VirtualAlloc commit fails.
When memory is allocated by VirtualAlloc, the newly allocated memory is not immediately backed by physical memory. Only when the memory is accessed by a read or write is it backed by physical memory.
When this newly allocated memory is accessed for the first time, it needs to be backed by commit space. Under normal conditions this is a smooth process; however, when the system hits what’s called the commit limit and can’t expand that limit, we see commit failures.
So how is the commit limit calculated? Let’s say we have a system with 4 GB of physical memory and a pagefile that is 6 GB in size. The commit limit is physical memory plus pagefile size, so in this example it would be 10 GB. Because the memory manager will not let user mode allocations consume every last morsel of commit space, it keeps a small amount of it in reserve for the system to avoid hangs. When the limit is reached, the system tries to grow the pagefile. If there is no more room to grow the pagefile, or the pagefile has reached its configured maximum size, the system will try to free some committed memory to make room for more requests. If neither expanding the pagefile nor freeing memory allows the allocation to complete, the allocation fails and MiChargeCommitmentFailures is incremented.
To sum it all up: the commit limit is RAM + pagefile, and commit failures happen when we hit the commit limit and the system is unable to grow the pagefile because it is already at its maximum. It’s that simple, well, almost.
For those who want to know more about how the memory manager works, please see the post from Somak: The Memory Shell Game.
This is Tate asking for your direct and valuable feedback on XPerf (The Windows Performance Toolkit). We are knee deep in working with the creators of the tool (The Windows Fundamentals Team) to provide end user feedback which will drive features for the next version. Now that you as our readers have seen several blog posts on the utility and hopefully have used it for a few of your own debugs, what would you like to see in terms of new features, fixes, data collection steps, etc?
Particular areas of interest to the team are:
- Data Collection – easy or difficult, what would you change?
- Usage Scenarios – stack walking is powerful, what would you like the tool to expose that it does not?
- Any blocking factors keeping you from using the toolkit, what are they, how could we remedy that for you?
- Data Analysis – what would make your life as a debugger using the toolkit better?
Please send us your feedback below and thanks in advance.
Hello NTDebuggers… I look at a lot of dump files every day. This being the case, I like to take full advantage of the customizable look and feel of windbg. I actually have an association set up between DMP files and a CMD file that loads my customized COLOR workspace each time I double-click a dump file. I like a black background with colorized source and debug command window output.
Here is a snapshot of my typical debug session.
Here is how you set it up.
1) Create the following CMD file and put it in your path. It’s called D.CMD on my system.
echo off
Title kd.exe -z %1
C:
CD\debuggers
start C:\Debuggers\windbg.exe -z %1 -W color
2) Load windbg and under view \ options configure your color options. Note: you will want to set every option in the list box.
4) Save the workspace by selecting FILE \ Save Workspace. I saved mine as COLOR.
4) If you don’t have a dump file create one by opening a process in the debugger and doing a .dump /ma C:\test\test.dmp (I’ll assume you have a directory called C:\test)
5) Select the C:\test\test.dmp file in Explorer, right-click it, and open Properties. (Note: I used badwindow.dmp.)
6) Change the dump file association by drilling down to Change \ Browse \ (Select D.CMD).
7) That’s all there is to it. Now you can simply double click on any DMP file and windbg will start and load with your COLOR workspace.
Good luck and happy debugging.