• Ntdebugging Blog

    ResAvail Pages and Working Sets

    • 0 Comments

    Hello everyone, I'm Ray and I'm here to talk a bit about a dump I recently looked at and a little-referenced memory counter called ResAvail Pages (resident available pages).

     

    The problem statement was:  The server hangs after a while.

     

    Not terribly informative, but that's where we start with many cases. First some good housekeeping:

     

    0: kd> vertarget

    Windows 7 Kernel Version 7601 (Service Pack 1) MP (2 procs) Free x64

    Product: Server, suite: TerminalServer SingleUserTS

    Built by: 7601.18113.amd64fre.win7sp1_gdr.130318-1533

    Machine Name: "ASDFASDF1234"

    Kernel base = 0xfffff800`01665000 PsLoadedModuleList = 0xfffff800`018a8670

    Debug session time: Thu Aug  8 09:39:26.992 2013 (UTC - 4:00)

    System Uptime: 9 days 1:08:39.307

     

    Of course Windows 7 Server == Server 2008 R2.

     

    One of the basic things I check at the beginning of these hang dumps with vague problem statements is the memory information.

     

    0: kd> !vm 21

     

    *** Virtual Memory Usage ***

    Physical Memory:     2097038 (   8388152 Kb)

    Page File: \??\C:\pagefile.sys

      Current:  12582912 Kb  Free Space:  12539700 Kb

      Minimum:  12582912 Kb  Maximum:     12582912 Kb

    Available Pages:      286693 (   1146772 Kb)

    ResAvail Pages:          135 (       540 Kb)

     

    ********** Running out of physical memory **********

     

    Locked IO Pages:           0 (         0 Kb)

    Free System PTEs:   33526408 ( 134105632 Kb)

     

    ******* 12 system cache map requests have failed ******

     

    Modified Pages:         4017 (     16068 Kb)

    Modified PF Pages:      4017 (     16068 Kb)

    NonPagedPool Usage:   113241 (    452964 Kb)

    NonPagedPool Max:    1561592 (   6246368 Kb)

    PagedPool 0 Usage:     35325 (    141300 Kb)

    PagedPool 1 Usage:     28162 (    112648 Kb)

    PagedPool 2 Usage:     24351 (     97404 Kb)

    PagedPool 3 Usage:     24350 (     97400 Kb)

    PagedPool 4 Usage:     24516 (     98064 Kb)

    PagedPool Usage:      136704 (    546816 Kb)

    PagedPool Maximum:  33554432 ( 134217728 Kb)

     

    ********** 222 pool allocations have failed **********

     

    Session Commit:         6013 (     24052 Kb)

    Shared Commit:          6150 (     24600 Kb)

    Special Pool:              0 (         0 Kb)

    Shared Process:      1214088 (   4856352 Kb)

    Pages For MDLs:           67 (       268 Kb)

    PagedPool Commit:     136768 (    547072 Kb)

    Driver Commit:         15548 (     62192 Kb)

    Committed pages:     1648790 (   6595160 Kb)

    Commit limit:        5242301 (  20969204 Kb)

     

    So we're failing to allocate pool, but we aren't out of virtual memory for paged pool or nonpaged pool.  Let's look at the breakdown:

     

    0: kd> dd nt!MmPoolFailures l?9

    fffff800`01892160  000001be 00000000 00000000 00000002

    fffff800`01892170  00000000 00000000 00000000 00000000

    fffff800`01892180  00000000

     

    Where:

    first three values (000001be 00000000 00000000) = Nonpaged high/medium/low priority failures

    next three values (00000002 00000000 00000000)  = Paged high/medium/low priority failures

    last three values (00000000 00000000 00000000)  = Session paged high/medium/low priority failures

     

    So we actually failed both nonpaged AND paged pool allocations in this case.  Why?  We're "Running out of physical memory", obviously.  So where does this running out of physical memory message come from?  In the above example this is from the ResAvail Pages counter.

     

    ResAvail Pages is the amount of physical memory there would be if every working set was at its minimum size and only what needs to be resident in RAM was present (e.g. PFN database, system PTEs, driver images, kernel thread stacks, nonpaged pool, etc).

     

    Where did this memory go then?  We have plenty of Available Pages (Free + Zero + Standby) for use.  So something is claiming memory it isn't actually using.  In this type of situation one of the things I immediately suspect is process working set minimums. A working set is essentially the set of a process's pages that are resident in physical memory.

     

    So let's check.

     

    0: kd> !process 0 1

     

    <a lot of processes in this output>.

     

    PROCESS fffffa8008f76060

        SessionId: 0  Cid: 0adc    Peb: 7fffffda000  ParentCid: 0678

        DirBase: 204ac9000  ObjectTable: 00000000  HandleCount:   0.

        Image: cscript.exe

        VadRoot 0000000000000000 Vads 0 Clone 0 Private 1. Modified 3. Locked 0.

        DeviceMap fffff8a000008a70

        Token                             fffff8a0046f9c50

        ElapsedTime                       9 Days 01:08:00.134

        UserTime                          00:00:00.000

        KernelTime                        00:00:00.015

        QuotaPoolUsage[PagedPool]         0

        QuotaPoolUsage[NonPagedPool]      0

        Working Set Sizes (now,min,max)  (5, 50, 345) (20KB, 200KB, 1380KB)

        PeakWorkingSetSize                1454

        VirtualSize                       65 Mb

        PeakVirtualSize                   84 Mb

        PageFaultCount                    1628

        MemoryPriority                    BACKGROUND

        BasePriority                      8

        CommitCharge                      0

     

    I have only shown one example process above for brevity's sake, but there were thousands returned.  241,423 to be precise.  None had abnormally high process working set minimums, but cumulatively their usage adds up.

     

    The “now” process working set is lower than the minimum working set.  How is that possible?  Well, the minimum and maximum are not hard limits, but suggested limits.  For example, the minimum working set is honored unless there is memory pressure, in which case the working set can be trimmed below this value.  There is a way to make the minimum and/or maximum a hard limit for a specific process by passing the QUOTA_LIMITS_HARDWS_MIN_ENABLE flag to SetProcessWorkingSetSizeEx.
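    For illustration, here is a minimal sketch (not taken from any customer code; the sizes and flags are arbitrary) of a process opting itself into a hard working set minimum:

    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        // Request a 4 MB minimum / 16 MB maximum working set for the current process
        // and ask the memory manager to enforce the minimum as a hard limit, so it
        // will not be trimmed below that value even under memory pressure.
        SIZE_T minWs = 4 * 1024 * 1024;
        SIZE_T maxWs = 16 * 1024 * 1024;

        if (!SetProcessWorkingSetSizeEx(GetCurrentProcess(), minWs, maxWs,
                                        QUOTA_LIMITS_HARDWS_MIN_ENABLE |
                                        QUOTA_LIMITS_HARDWS_MAX_DISABLE))
        {
            printf("SetProcessWorkingSetSizeEx failed: %lu\n", GetLastError());
            return 1;
        }

        printf("Hard working set minimum enabled.\n");
        return 0;
    }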

     

    You can check whether the minimum and maximum working set values are configured as hard limits in the _EPROCESS->Vm->Flags structure (the MinimumWorkingSetHard and MaximumWorkingSetHard bits).  Note these numbers are from another system, as this structure was already torn down for the processes we were looking at.

     

    0: kd> dt _EPROCESS fffffa8008f76060 Vm

    nt!_EPROCESS

       +0x398 Vm : _MMSUPPORT

    0: kd> dt _MMSUPPORT fffffa8008f76060+0x398

    nt!_MMSUPPORT

       +0x000 WorkingSetMutex  : _EX_PUSH_LOCK

       +0x008 ExitGate         : 0xfffff880`00961000 _KGATE

       +0x010 AccessLog        : (null)

       +0x018 WorkingSetExpansionLinks : _LIST_ENTRY [ 0x00000000`00000000 - 0xfffffa80`08f3c410 ]

       +0x028 AgeDistribution  : [7] 0

       +0x044 MinimumWorkingSetSize : 0x32

       +0x048 WorkingSetSize   : 5

       +0x04c WorkingSetPrivateSize : 5

       +0x050 MaximumWorkingSetSize : 0x159

       +0x054 ChargedWslePages : 0

       +0x058 ActualWslePages  : 0

       +0x05c WorkingSetSizeOverhead : 0

       +0x060 PeakWorkingSetSize : 0x5ae

       +0x064 HardFaultCount   : 0x41

       +0x068 VmWorkingSetList : 0xfffff700`01080000 _MMWSL

       +0x070 NextPageColor    : 0x2dac

       +0x072 LastTrimStamp    : 0

       +0x074 PageFaultCount   : 0x65c

       +0x078 RepurposeCount   : 0x1e1

       +0x07c Spare            : [2] 0

       +0x084 Flags            : _MMSUPPORT_FLAGS

    0: kd> dt _MMSUPPORT_FLAGS fffffa8008f76060+0x398+0x84

    nt!_MMSUPPORT_FLAGS

       +0x000 WorkingSetType   : 0y000

       +0x000 ModwriterAttached : 0y0

       +0x000 TrimHard         : 0y0

       +0x000 MaximumWorkingSetHard : 0y0

       +0x000 ForceTrim        : 0y0

       +0x000 MinimumWorkingSetHard : 0y0

       +0x001 SessionMaster    : 0y0

       +0x001 TrimmerState     : 0y00

       +0x001 Reserved         : 0y0

       +0x001 PageStealers     : 0y0000

       +0x002 MemoryPriority   : 0y00000000 (0)

       +0x003 WsleDeleted      : 0y1

       +0x003 VmExiting        : 0y1

       +0x003 ExpansionFailed  : 0y0

       +0x003 Available        : 0y00000 (0)

     

    How about some more detail?

     

    0: kd> !process fffffa8008f76060

    PROCESS fffffa8008f76060

        SessionId: 0  Cid: 0adc    Peb: 7fffffda000  ParentCid: 0678

        DirBase: 204ac9000  ObjectTable: 00000000  HandleCount:  0.

        Image: cscript.exe

        VadRoot 0000000000000000 Vads 0 Clone 0 Private 1. Modified 3. Locked 0.

        DeviceMap fffff8a000008a70

        Token                             fffff8a0046f9c50

        ElapsedTime                       9 Days 01:08:00.134

        UserTime                          00:00:00.000

        KernelTime                        00:00:00.015

        QuotaPoolUsage[PagedPool]         0

        QuotaPoolUsage[NonPagedPool]      0

        Working Set Sizes (now,min,max)  (5, 50, 345) (20KB, 200KB, 1380KB)

        PeakWorkingSetSize                1454

        VirtualSize                       65 Mb

        PeakVirtualSize                   84 Mb

        PageFaultCount                    1628

        MemoryPriority                    BACKGROUND

        BasePriority                      8

        CommitCharge                      0

     

    No active threads

     

    0: kd> !object fffffa8008f76060

    Object: fffffa8008f76060  Type: (fffffa8006cccc90) Process

        ObjectHeader: fffffa8008f76030 (new version)

        HandleCount: 0  PointerCount: 1

     

    The output above shows us that this process has no active threads left, but the process object itself (and its 20KB working set use) was still hanging around because a kernel driver had a reference to the object that it never released.  Sampling other entries showed the server had been leaking process objects since it was booted.

     

    Unfortunately trying to directly track down pointer leaks on process objects is difficult and requires an instrumented kernel, so we tried to check the easy stuff first before going that route.  We know it has to be a kernel driver doing this (since it is a pointer and not a handle leak) so we looked at the list of 3rd party drivers installed.  Note: The driver names have been redacted.

     

    0: kd> lm

    start             end                 module name

    <snip>

    fffff880`04112000 fffff880`04121e00   driver1    (no symbols)   <-- no symbols usually means 3rd party       

    fffff880`04158000 fffff880`041a4c00   driver2    (no symbols)          

    <snip>

     

    0: kd> lmvm driver1   

    Browse full module list

    start             end                 module name

    fffff880`04112000 fffff880`04121e00   driver1    (no symbols)          

        Loaded symbol image file: driver1.sys

        Image path: \SystemRoot\system32\DRIVERS\driver1.sys

        Image name: driver1.sys

        Browse all global symbols  functions  data

        Timestamp:        Wed Dec 13 12:09:32 2006 (458033CC)

        CheckSum:         0001669E

        ImageSize:        0000FE00

        Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

    0: kd> lmvm driver2

    Browse full module list

    start             end                 module name

    fffff880`04158000 fffff880`041a4c00   driver2    (no symbols)          

        Loaded symbol image file: driver2.sys

        Image path: \??\C:\Windows\system32\drivers\driver2.sys

        Image name: driver2.sys

        Browse all global symbols  functions  data

        Timestamp:        Thu Nov 30 12:12:07 2006 (456F10E7)

        CheckSum:         0004FE8E

        ImageSize:        0004CC00

        Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

     

    Fortunately for both the customer and us, we turned up a pair of drivers that predated Windows Vista (meaning they were designed for XP/2003), which raised an eyebrow.  Of course we needed a more solid evidence link than just "it's an old driver", so I did a quick search of our internal KB.  This turned up several other customers who had these same drivers installed, experienced the same problem, then removed them and saw the problem go away.  That is a pretty good evidence link, and we implemented the same plan for this customer successfully.

  • Ntdebugging Blog

    Ironical Case of a Performance Monitoring Tool causing a Performance Issue!

    • 1 Comments

    Hello folks, this is Pushkar and I recently worked an interesting case dealing with high CPU usage. What made it unusual was that it was not a business-critical application consuming the CPU: I was tasked with identifying why a third-party performance monitoring tool was causing a performance issue on the server by consuming 100% of the CPU cycles. Typically issues with third-party products are addressed by the application vendor, but the irony of the issue made this case immensely interesting for me and I decided to investigate it further.

    In this case the monitoring tool, Monitor.exe (disclaimer: the identity of the actual product has been removed), was consistently consuming 100% of the CPU cycles, and, as expected, if the tool was stopped the CPU usage returned to normal. As I didn’t have information about how the tool worked, I decided to gather data to give me an idea of the underlying APIs the tool was calling and an understanding of its behavior.

    To begin troubleshooting I gathered Kernrate.exe logs from the server along with a memory dump. On Windows Vista and later you can use the Windows Performance Toolkit (aka Xperf), a better alternative to Kernrate.

    Note: To learn more about kernrate.exe check here

    A quick review of the Kernrate log showed a lot of CPU time spent in the kernel, and the function with the most hits was NtQueryDirectoryObject():

     

     

    TOTAL K 0:07:35.343 (49.2%)  U 0:02:40.734 (17.4%)  I 0:05:09.171 (33.4%)  DPC 0:00:52.984 (5.7%)  Interrupt 0:00:20.312 (2.2%)

           Total Interrupts= 4208713, Total Interrupt Rate= 9098/sec.

     

    -----------------------------

     

    Results for Kernel Mode:

    -----------------------------

     

    OutputResults: KernelModuleCount = 106

    Percentage in the following table is based on the Total Hits for the Kernel

     

    ProfileTime   116300 hits, 65536 events per hit --------

     Module                                Hits   msec  %Total  Events/Sec

    PROCESSR                              45160     462654    38 %     6397017

    NTOSKRNL                              43573     462654    37 %     6172215

    SYMMPI                                18258     462654    15 %     2586287

     

     

    ----- Zoomed module NTOSKRNL.EXE (Bucket size = 16 bytes, Rounding Down) --------

    Percentage in the following table is based on the Total Hits for this Zoom Module

     

    ProfileTime   43573 hits, 65536 events per hit --------

     Module                                Hits   msec  %Total  Events/Sec

    NtQueryDirectoryObject                 8433     462654    19 %     1194553

    memmove                                6584     462654    14 %      932638

    KeInsertQueueDpc                       5593     462654    12 %      792261

    KeFlushMultipleTb                      3118     462654     7 %      441671

     

    Further investigation of the dump indicated the function listed above was invoked to parse through the \BaseNamedObjects namespace in the kernel. BaseNamedObjects is a folder in the kernel object namespace where various named kernel objects (events, semaphores, mutexes, waitable timers, file-mapping objects, and job objects) are created. The purpose of such a global namespace is to enable processes in multiple client sessions to communicate with a service application. Another use is for applications that use named objects to detect that an instance of the application is already running in the system, across all sessions.

    I started looking under the \BaseNamedObjects namespace in the kernel and found there were numerous objects, over 2,900, under that namespace. You can use the Sysinternals tool WinObj.exe to browse the kernel object namespace and the list of objects within it. In my case I had no prior idea of the cause of the issue so I had to look at the dump file.  I pasted a snapshot of the WinObj.exe output below along with output from the kernel dump.

    WinObj Screenshot

    kd> !object \BaseNamedObjects

    Object: fffffa8000d72ec0  Type: (fffffadfe7acb1e0) Directory

        ObjectHeader: fffffa8000d72e90 (old version)

    HandleCount: 36  PointerCount: 10275        <- This indicates the number of objects under this namespace

        Directory Object: fffffa8000004060  Name: BaseNamedObjects

     

        Hash Address          Type          Name

        ---- -------          ----          ----

         00  fffffadfe5d79660 Job           TestJobObj_9920

             fffffadfe5d7a8e0 Job           TestJobObj_9913

             fffffadfe5d7b8e0 Job           TestJobObj_9907

             fffffadfe5d84060 Job           TestJobObj_9850

             fffffadfe5d863e0 Job           TestJobObj_9843

             fffffadfe5d873e0 Job           TestJobObj_9837

             fffffadfe5d8e060 Job           TestJobObj_9790

             fffffadfe5d903e0 Job           TestJobObj_9783

             fffffadfe5d913e0 Job           TestJobObj_9777

             fffffadfe5dad660 Job           TestJobObj_9611

             fffffadfe5dae660 Job           TestJobObj_9605

             fffffadfe5db7660 Job           TestJobObj_9551

             fffffadfe5db8660 Job           TestJobObj_9545

             fffffadfe5db98e0 Job           TestJobObj_9538

    <Snipped>

    In my case the third-party performance monitoring tool was experiencing high CPU usage when it queried the “Job Objects” and “Job Object Details” performance counters.

    Since this was happening with a third-party tool, I tried to recreate the issue in-house and wrote a small utility that creates a lot of named job objects under the \BaseNamedObjects namespace using the CreateJobObject() API. Then I tried the plain old Perfmon tool built into Windows. The moment I added the “Job Objects” and “Job Object Details” counters, the CPU utilization from MMC.EXE hit 90% (Perfmon runs as an MMC snap-in). Now that I could reproduce the issue in-house, I investigated what happens when we query those counters.
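    The repro utility was essentially just a loop like the following sketch (simplified; error handling is minimal and the object count is arbitrary):

    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        const int kJobCount = 10000;

        // Create many named job objects and intentionally keep the handles open:
        // the objects must stay alive, and therefore remain visible under
        // \BaseNamedObjects, while the performance counters are queried.
        for (int i = 0; i < kJobCount; i++)
        {
            wchar_t name[64];
            swprintf_s(name, 64, L"TestJobObj_%d", i);

            if (CreateJobObjectW(NULL, name) == NULL)
            {
                wprintf(L"CreateJobObject failed for %s: %lu\n", name, GetLastError());
                break;
            }
        }

        wprintf(L"Job objects created; press Enter to exit and release them.\n");
        getchar();
        return 0;
    }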

    Here is what I found:

    When we attempt to query the Job Object or Job Object Details performance counters, Perfproc.dll invokes the following functions, respectively.  These call NtQueryDirectoryObject() to locate the objects under the \BaseNamedObjects namespace, and then QueryInformationJobObject to gather the performance data for each object.  One of the thread stacks is shown below to illustrate the flow:

     

    CollectJobDetailData()

    CollectJobObjectData()

     

    kd> kb

    RetAddr           : Args to Child                                                           : Call Site

    fffff800`01040abd : 00000000`000ad490 fffff800`01385b47 00000000`00002990 00000000`00000000 : nt!NtQueryInformationJobObject

    00000000`77ef171a : 00000000`77d423cb 00000000`000ad498 00000000`00000000 00000000`c0000135 : nt!KiSystemServiceCopyEnd+0x3

    00000000`77d423cb : 00000000`000ad498 00000000`00000000 00000000`c0000135 00000000`00000000 : ntdll!ZwQueryInformationJobObject+0xa

    000007ff`5a794a16 : 00000000`000ad990 00000000`00000008 00000000`00000400 00000000`000addf8 : kernel32!QueryInformationJobObject+0x77

    000007ff`5a7932b9 : 00000000`0000dd20 000007ff`5a79aa84 00000000`00000001 00000000`00000040 : perfproc!CollectJobObjectData+0x356

    000007ff`7feeb497 : 00000000`00000000 00000000`000adee0 00000000`00000000 00000000`00000001 : perfproc!CollectSysProcessObjectData+0x1f9

    000007ff`7fef09d1 : 00000000`000ae1d0 00000000`00000000 00000000`0000007a 00000000`00000000 : ADVAPI32!QueryExtensibleData+0x951

    000007ff`7fef0655 : 0039002d`00350038 00000000`77c43cbd 00000000`00000000 00000000`00000000 : ADVAPI32!PerfRegQueryValue+0x66d

    000007ff`7ff0b787 : 00000000`00000000 00000000`00000000 00000000`000ae8a0 00000000`000ae888 : ADVAPI32!LocalBaseRegQueryValue+0x356

    000007ff`5b17ba27 : 00009ae8`b8ee62a9 00000000`000c46a0 00000000`00200000 00000000`00000000 : ADVAPI32!RegQueryValueExW+0xe9

     

    You can see the type of performance data queried for each job object in the Win32_PerfFormattedData_PerfProc_JobObject class. Although this documents a WMI interface for querying the performance data, it represents the same data set the native API queries.
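    For reference, the native query that must be repeated for every job object boils down to something like this sketch (an approximation for illustration, not the Performance Library's actual code; the job name is one of the test objects from my repro):

    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        // Open one of the named job objects that lives under \BaseNamedObjects.
        HANDLE hJob = OpenJobObjectW(JOB_OBJECT_QUERY, FALSE, L"TestJobObj_9920");
        if (hJob == NULL)
        {
            printf("OpenJobObject failed: %lu\n", GetLastError());
            return 1;
        }

        // Pull the basic accounting counters for the job, the same kind of data
        // the Job Object performance counters expose.
        JOBOBJECT_BASIC_ACCOUNTING_INFORMATION info = {};
        if (QueryInformationJobObject(hJob, JobObjectBasicAccountingInformation,
                                      &info, sizeof(info), NULL))
        {
            printf("Active processes in job: %lu\n", info.ActiveProcesses);
            printf("Total page faults:       %lu\n", info.TotalPageFaultCount);
        }

        CloseHandle(hJob);
        return 0;
    }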

    Once I understood what was happening under the hood, it was easy to deduce that every iteration of the query is a CPU-intensive operation. For each object we run a nested query for the performance data as described in the document above. If the number of items is large, and we are running a broad query such as “All Counters” and “All Instances”, then every single iteration consumes a lot of CPU. To validate this, I got help from one of my colleagues, Steve Heller, who modified the sample code received from the vendor of the monitoring tool (which demonstrated what they were doing). We noticed they were querying this information every 2.5 seconds, and the sample tool demonstrated that with about 10,000 job objects a single iteration of the query took roughly 12.5 seconds to complete. No wonder the CPU usage remained high: before a single iteration of the query for all the job objects finished, four additional queries had already been queued, and the CPU usage stayed at 90% or more.

    The conclusion that can be drawn from this test is that querying the \BaseNamedObjects namespace when a large number of objects exist under it will invariably result in a fair amount of CPU usage. This gets worse if the query is performed for complex objects such as job objects, for which we run nested queries for the individual performance data of each one, and if the data is queried too frequently.

    The two aspects responsible for the high CPU usage are:

    1. The frequency at which the performance data was being queried.

    2. The number of objects under the \BaseNamedObjects namespace.

     

    Though there is no documented limit on the number of objects that can be created under the various kernel object namespaces, the absence of such documentation doesn’t mean we should simply create a lot of objects. Applications should be designed carefully to use such kernel resources judiciously.

    At the same time, just because the interfaces are publicly available and we can query a lot of performance data doesn’t mean we should query it at a very high frequency. Certain operations can be CPU intensive, as we have seen here, and we should identify what we really need to query and at what frequency.

    Quoting from the MSDN article Job Objects:

     “A job object allows groups of processes to be managed as a unit. Job objects are namable, securable, sharable objects that control attributes of the processes associated with them. Operations performed on the job object affect all processes associated with the job object.”  

    The above clearly indicates that the job object framework is intended for processes participating in a job to monitor and control each other’s resource usage.

    There could be different reasons why a system administrator needs to gather information about the resource usage of a particular process. One of them could be to meet a Service Level Agreement. In that case we can monitor individual processes running on the server with the counters available under “Performance Object -> Process”.

    Equipped with the knowledge that querying performance data for job objects may result in excessive CPU usage on the server, an administrator should evaluate, or consult the application owner if needed, whether they really need to gather this performance data.

    The following documents are available to help application developers create application-specific namespaces. Creating a private namespace and creating objects under it ensures that application-specific objects are not exposed to the Performance Library, because they won’t appear under the \BaseNamedObjects namespace. A minimal code sketch follows the links below.

    Object Namespaces

    http://msdn.microsoft.com/en-us/library/ms684295(VS.85).aspx  

     

    CreatePrivateNamespace Function

    http://msdn.microsoft.com/en-us/library/ms682419(VS.85).aspx

     

    Job Objects

    http://msdn.microsoft.com/en-us/library/ms684161(VS.85).aspx

     

    CreateJobObjects Function

    http://msdn.microsoft.com/en-us/library/ms682409(VS.85).aspx
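    As a rough sketch of the approach those documents describe (the boundary name, alias prefix, object name, and SDDL string below are illustrative placeholders, and error handling is trimmed):

    #include <windows.h>
    #include <sddl.h>
    #include <stdio.h>

    int main()
    {
        // Boundary descriptor: here only members of the local Administrators
        // group may create or open the namespace through this boundary.
        HANDLE boundary = CreateBoundaryDescriptorW(L"MyAppBoundary", 0);

        BYTE sid[SECURITY_MAX_SID_SIZE];
        DWORD sidSize = sizeof(sid);
        CreateWellKnownSid(WinBuiltinAdministratorsSid, NULL, sid, &sidSize);
        AddSIDToBoundaryDescriptor(&boundary, (PSID)sid);

        // Security descriptor for the namespace itself (full access for admins).
        SECURITY_ATTRIBUTES sa = { sizeof(sa) };
        ConvertStringSecurityDescriptorToSecurityDescriptorW(
            L"D:(A;;GA;;;BA)", SDDL_REVISION_1, &sa.lpSecurityDescriptor, NULL);

        // Objects created under a private namespace do not appear in
        // \BaseNamedObjects, so the Performance Library never enumerates them.
        HANDLE ns = CreatePrivateNamespaceW(&sa, boundary, L"MyAppNamespace");
        if (ns == NULL)
        {
            printf("CreatePrivateNamespace failed: %lu\n", GetLastError());
            return 1;
        }

        // Named objects are then created relative to the namespace alias prefix.
        HANDLE hEvent = CreateEventW(NULL, FALSE, FALSE, L"MyAppNamespace\\ReadyEvent");
        printf("Event handle: %p\n", hEvent);

        // ... use the object, then clean up ...
        CloseHandle(hEvent);
        ClosePrivateNamespace(ns, 0);
        DeleteBoundaryDescriptor(boundary);
        LocalFree(sa.lpSecurityDescriptor);
        return 0;
    }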

     

    This problem would be predominantly noticed on older systems. To mitigate it to a certain extent, Windows Vista (and all later operating systems) has a per-session BaseNamedObjects namespace (i.e., application-specific objects are created under a namespace of the form \Sessions\<SessionID>\BaseNamedObjects). Hence these objects will not be exposed to the Performance Libraries and will not be available to query via performance monitoring tools. Unfortunately the changes made to the kernel to support this are too intricate to be back-ported to Windows 2003 SP2.

    This brings me to the end of this post. I hope this information gives you better insight into how an oversight in an application's design can lead to breaking other components at a later stage.

     

     

  • Ntdebugging Blog

    XPERF DEMO: Michael Morales demonstrates the Windows Profiling Tool

    • 1 Comments

    In this video Michael introduces the sample-based profiling tool named XPERF. He also demonstrates how to use the tool to uncover high DPC activity and disk I/O. If you would like to see more XPERF blogs on ntdebugging please send your feedback using the email link at the top of the blog site. Feel free to leave comments too.

    Location of the video: http://ittv.net/DesktopModules/UltraVideoGallery/uvg.swf?vId=419&portalId=0

    BTW  - I’m working on making a downloadable version for off-line viewing. Stay Tuned.

    Link to Microsoft Windows Performance Toolkit

    http://msdn.microsoft.com/en-us/performance/default.aspx

     

    Link to XPerf version 4.5. This ships with the Win7 SDK below.

    http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=6db1f17f-5f1e-4e54-a331-c32285cdde0c

     

    We look forward to your feedback!

    Ron Stock

  • Ntdebugging Blog

    WMI Nugget: How to Gather the Provider Binary from a WMI Class Name

    • 0 Comments

    It's Venkatesh with a WMI nugget. While troubleshooting or debugging WMI issues you may come across WMI queries wherein you don’t know which provider implemented the WMI class used in the query. You may want to know the binary and the product that implemented the provider so you can contact the vendor or upgrade the binary to see if it resolves the issue you are investigating.

    To help make the investigation easier, Windows Vista and all later operating systems have the cool new 'Analytic and Debug' log for WMI tracing. To get similar tracing in Windows 2003 and earlier versions, WMI has a “Verbose Logging” option that is enabled through the WMI control (WmiMgmt.msc). Check out this blog for detailed steps to enable the WMI Analytic and Debug log:

    http://blogs.msdn.com/wmi/archive/2009/05/27/is-wmiprvse-a-real-villain.aspx

    After enabling WMI logging you will see events like the one pictured below. There may be a particular WMI query relevant to the issue you're troubleshooting for which you need to know the provider binary behind the WMI class used in the query. Apart from the class name (InstProvSamp) in the query, you need the WMI namespace to which the class belongs, such as '\\.\root\default', as shown below.

    [Screenshot: WMI-Activity trace event]

    Fig 1: Event ID 1 in Windows Vista’s Event Viewer\Windows Logs\Applications and Services Logs\Microsoft\Windows\WMI-Activity\Trace log

    Now we have the WMI Namespace and the class name so let’s find the provider for it.

    1. Get the provider name from the Class Name and Namespace:

    Every WMI class has a qualifier, “provider”, containing the name of the provider to which it belongs. Using wbemtest.exe, connect to the namespace we mentioned above (\\.\root\default). Then get the class information for our class, InstProvSamp, using the “Open Class…” button. After clicking "OK", the Object editor for InstProvSamp displays the class information, including the name of the provider it belongs to (the CIM_STRING qualifier “provider”, as shown below).

    [Screenshots: wbemtest Object editor for InstProvSamp showing the “provider” qualifier]

    2. Get the class identifier (CLSID) of the provider from the provider name:

    For every WMI provider, WMI maintains an instance of the system class __Win32Provider, which contains information about the physical implementation of the provider, including the CLSID, the GUID that identifies the provider's COM object. Now query the repository to get the provider information for the “InstProvSamp” provider (for example, SELECT * FROM __Win32Provider WHERE Name = "InstProvSamp"), as I did below.

    [Screenshots: wbemtest query of __Win32Provider returning the provider's CLSID]

    3. Get the provider binary from the provider’s CLSID:

    During registration every COM object adds an entry under HKLM\SOFTWARE\Classes\CLSID. Using regedit.exe you can browse to the provider’s COM registration information, which includes the path to the provider binary, using the CLSID obtained in step 2, as shown below.

    [Screenshot: regedit showing the provider's registration under HKLM\SOFTWARE\Classes\CLSID]

    This should help you connect the dots while debugging your WMI issue. Let me know if you have additional questions!

  • Ntdebugging Blog

    Our Bangalore Team is Hiring - Windows Server Escalation Engineer

    • 0 Comments

    Would you like to join the world’s best and most elite debuggers to enable the success of Microsoft solutions?

     

    As a trusted advisor to our top customers you will be working with the most experienced IT professionals and developers in the industry. You will influence our product teams in sustained engineering efforts to drive improvements in our products.

     

    This role involves deep analysis of product source code and debugging to solve problems in multi-million dollar configurations and will give you an opportunity to stretch your critical thinking skills. During the course of debugging, you will uncover opportunities to improve the customer experience while influencing the current and future design of our products.

     

    In addition to providing support to customers while being the primary interface to our sustained engineering teams, you will also have the opportunity to work with new technologies and unreleased software. Through our continuous investment in in-depth training and hands-on experience with tough customer challenges, you will become the world’s best in this area. Expect to partner with many different roles at Microsoft, launching a very successful career!

     

    This position is located at the Microsoft Global Technical Support Center in Bangalore, India.

     

    Learn more about what an Escalation Engineer does at:

    Profile: Ron Stock, CTS Escalation Engineer - Microsoft Customer Service & Support - What is CSS?

    Microsoft JobsBlog JobCast with Escalation Engineer Jeff Dailey

    Microsoft JobsBlog JobCast with Escalation Engineer Scott Oseychik

     

    Apply here:

    https://careers.microsoft.com/jobdetails.aspx?ss=&pg=0&so=&rw=1&jid=109989&jlang=en&pp=ss

  • Ntdebugging Blog

    Part 1: Tate Calhoun presents Xperf at the 2009 Microsoft GEC

    • 5 Comments

     


    Xperf was created by the Windows Fundamentals team to analyze system and application performance. Tate walks through the use of the new tool and shows how to analyze the data for specific scenarios. The slide deck is available from the attachment link below.


  • Ntdebugging Blog

    Use Caution When Implementing IPC for Performance Counters

    • 0 Comments

     

    Recently I was working with a developer who had created performance counters that work in Performance Monitor but are never collected in a user defined data collector set.  The customer explained that their counters update named shared memory inside the application which should be read by perfmon or the data collector set.

     

    Putting counter data in shared memory is a common technique for performance counter developers.  A programmer can update performance data in a block of shared memory in their application and then use a performance extension dll (aka an “extensible counter”) to read from the shared memory.

     

    Shared memory is created by calling CreateFileMapping and MapViewOfFile.  This memory is then accessed by another application by calling OpenFileMapping.  All applications which use this shared memory must pass the same lpName to CreateFileMapping or OpenFileMapping.  An example of using these APIs to implement shared memory is available on MSDN.
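    A condensed sketch of that pattern might look like the following (the mapping name and size are placeholders; a real implementation would also set a security descriptor that allows the account the collector runs under to open the mapping):

    #include <windows.h>

    #define SHMEM_NAME  L"Global\\MyCounterData"   // the exact same string on both sides
    #define SHMEM_SIZE  4096

    // In the application that publishes the counter data:
    LPVOID CreateCounterSharedMemory(HANDLE* phMapping)
    {
        // Backed by the paging file; a "Global\\" name is visible in every
        // session but may require the SeCreateGlobalPrivilege to create.
        *phMapping = CreateFileMappingW(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                        0, SHMEM_SIZE, SHMEM_NAME);
        if (*phMapping == NULL)
            return NULL;
        return MapViewOfFile(*phMapping, FILE_MAP_ALL_ACCESS, 0, 0, SHMEM_SIZE);
    }

    // In the performance extension dll that reads the data:
    LPVOID OpenCounterSharedMemory(HANDLE* phMapping)
    {
        // Fails with ERROR_FILE_NOT_FOUND if the name does not resolve in the
        // caller's session or security context, the failure debugged below.
        *phMapping = OpenFileMappingW(FILE_MAP_READ, FALSE, SHMEM_NAME);
        if (*phMapping == NULL)
            return NULL;
        return MapViewOfFile(*phMapping, FILE_MAP_READ, 0, 0, SHMEM_SIZE);
    }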

     

    Based on the customer’s explanation that they are populating shared memory in their application, and their counters work in Performance Monitor but do not work in a user defined data collector set, I suspected that OpenFileMapping was failing for the data collector set.

     

    User defined data collector sets run in a rundll32.exe process.  If you have multiple rundll32.exe processes you may need to identify which one is related to your data collector set.  The relevant process has a command line similar to “rundll32.exe C:\Windows\system32\pla.dll,PlaHost”.  There are several tools that can be used to identify the command line of the process such as tlist.exe, which is included with the Debugging Tools for Windows.

     

    After attaching a debugger to rundll32.exe, I wanted to break on the ret instruction at the end of the OpenFileMappingW function.  This would allow me to determine if the function succeeds or fails.  According to MSDN “If the function fails, the return value is NULL. To get extended error information, call GetLastError.”

     

    The uf command is an easy way to unassemble a function and find the ret instruction to break on.

     

    0:001> uf kernelbase!OpenFileMappingW

    kernelbase!OpenFileMappingW:

    75b88e0d 8bff            mov     edi,edi

    <snip>

    kernelbase!OpenFileMappingW+0x8f:

    75b88e79 c9              leave

    75b88e7a c20c00          ret     0Ch

    0:001> bp 75b88e7a

    0:001> g

    Breakpoint 0 hit

    eax=00000000 ebx=00008022 ecx=7ffd8000 edx=00000002 esi=05abf03c edi=00000000

    eip=75b88e7a esp=05abeb20 ebp=05abeb3c iopl=0         nv up ei pl zr na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

    kernelbase!OpenFileMappingW+0x90:

    75b88e7a c20c00          ret     0Ch

     

    In the above output we can see that the eax register is NULL, indicating that the call to OpenFileMapping failed.  The !gle command will show the last error and last status.

     

    0:008> !gle

    LastErrorValue: (Win32) 0x2 (2) - The system cannot find the file specified.

    LastStatusValue: (NTSTATUS) 0xc0000034 - Object Name not found.

     

    The failure is that OpenFileMapping cannot find the file.  The file name is the third parameter to OpenFileMapping.  We can get the first three parameters from the kb command.

     

    0:008> kb

    ChildEBP RetAddr  Args to Child             

    05abf0d0 6abae355 0002001f 00000000 05abeb7c kernelbase!OpenFileMappingW+0x90

    WARNING: Stack unwind information not available. Following frames may be wrong.

    05abf0f8 7784fe67 02a7ae90 05abf224 05abf254 ninjaprf+0x10edb

    05abf110 7784fc97 00472158 02a7ae90 05abf224 advapi32!CallExtObj+0x17

    05abf270 7784efaf 05abf2bc 60fcfa02 05abf778 advapi32!QueryExtensibleData+0x735

    05abf654 75ff0468 80000004 05abf778 00000000 advapi32!PerfRegQueryValue+0x5da

    05abf748 75ffd505 80000004 05abf778 05abf790 kernel32!LocalBaseRegQueryValue+0x366

    05abf7b4 61247dc5 80000004 02a7ae90 00000000 kernel32!RegQueryValueExW+0xb7

    05abf830 61250595 80000004 02a7ae58 02a7ae90 pdh!GetSystemPerfData+0x92

    05abf89c 6124c753 02a407d0 05abf8e8 61241928 pdh!GetQueryPerfData+0xa4

    05abf8b8 61254463 02a407d0 05abf8e8 60fcf32f pdh!PdhiCollectQueryData+0x32

    05abf90c 611c6d04 02a58f08 00000000 75ffc3e0 pdh!PdhUpdateLogW+0xa2

    05abf9bc 611be128 0045c968 00000000 00000000 pla!HPerformanceCounterDataCollector::Cycle+0x48

    05abf9bc 00000000 0045c968 00000000 00000000 pla!PlaiCollectorControl+0x3b7

    0:008> da 05abeb7c

    05abeb7c  "Local\NINJAPERF_S-1-5-18"

     

    The user defined data collector set is failing to open the file "Local\NINJAPERF_S-1-5-18".  This is the name that the performance extension dll ninjaprf.dll has given to its shared memory.

     

    Based on the customer’s description this operation works in Performance Monitor.  Next, I attached a debugger to perfmon and set the same breakpoint.

     

    Breakpoint 0 hit

    eax=000009f8 ebx=00008022 ecx=a7330000 edx=080ee678 esi=06798070 edi=00000000

    eip=760be9bb esp=0a84e564 ebp=0a84e580 iopl=0         nv up ei pl zr na pe nc

    cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246

    kernelbase!OpenFileMappingW+0x90:

    760be9bb c20c00          ret     0Ch

     

    In the above output we can see that eax is a handle number, indicating that the function succeeded.

     

    Looking at the file being opened we can see why this works in one scenario and not in another.

     

    0:016> kb 1

    ChildEBP RetAddr  Args to Child             

    0a84e580 698e4ab9 0002001f 00000000 0a84e5c0 kernelbase!OpenFileMappingW+0x90

    0:016> da 0a84e5c0

    0a84e5c0  "Local\NINJAPERF_S-1-5-21-123578"

    0a84e5e0  "095-571698237-1598563147-18961"

     

    The file name used is unique for each user.  The ninjaprf dll has chosen a file name which includes the SID of the current user.  This works for Performance Monitor because the user who starts the application is the same as the user who runs Performance Monitor.  However, a user defined data collector set runs in rundll32.exe as the Local System account.

     

    It is possible to run the user defined data collector set as a different user; however, the file name being used will not work in that scenario either.  By using the “Local\” prefix, the file mapping is created in the local session namespace.  The application runs in the user’s session, while rundll32.exe is started by the Task Scheduler service and runs in session 0.  This prevents the user defined data collector set from seeing the file mapping created by the application.

     

    If your performance counter uses shared memory to communicate, be aware that your performance extension dll may be run in the context of a different user account and a different user session.  Your inter process communication techniques must account for this, or your customers will ask you why your counters do not work outside of Performance Monitor.

  • Ntdebugging Blog

    Commitment Failures, Not Just a Failed Love Story

    • 1 Comments

    I was working on a debug the other day when I ran the “!vm” command and saw that the system had some 48,000 commit requests that failed. This was strange, as the system was not out of memory and the page file was not full. I was left scratching my head and thinking “I wish I knew where !vm got that information from.” So I went on a quest to find out; here is what I found.

     

    13: kd> !vm 1

     

    *** Virtual Memory Usage ***

          Physical Memory:    12580300 (  50321200 Kb)

          Page File: \??\C:\pagefile.sys

            Current:  50331648 Kb  Free Space:  50306732 Kb

            Minimum:  50331648 Kb  Maximum:     50331648 Kb

          Available Pages:     4606721 (  18426884 Kb)

          ResAvail Pages:     12189247 (  48756988 Kb)

          Locked IO Pages:           0 (         0 Kb)

          Free System PTEs:   33460257 ( 133841028 Kb)

          Modified Pages:        20299 (     81196 Kb)

          Modified PF Pages:      6154 (     24616 Kb)

          NonPagedPool 0 Used:   19544 (     78176 Kb)

          NonPagedPool 1 Used:   22308 (     89232 Kb)

          NonPagedPool Usage:    53108 (    212432 Kb)

          NonPagedPool Max:    9408956 (  37635824 Kb)

          PagedPool 0 Usage:    168921 (    675684 Kb)

          PagedPool 1 Usage:   4149241 (  16596964 Kb)

          PagedPool 2 Usage:     17908 (     71632 Kb)

          PagedPool Usage:     4336070 (  17344280 Kb)

          PagedPool Maximum:  33554432 ( 134217728 Kb)

          Session Commit:         3438 (     13752 Kb)

          Shared Commit:          6522 (     26088 Kb)

          Special Pool:              0 (         0 Kb)

          Shared Process:        53597 (    214388 Kb)

          PagedPool Commit:    4336140 (  17344560 Kb)

          Driver Commit:          5691 (     22764 Kb)

          Committed pages:     5565215 (  22260860 Kb)

          Commit limit:       25162749 ( 100650996 Kb)

     

          ********** 48440 commit requests have failed  **********

     

    It turns out that this count comes from a global ULONG array named “nt!MiChargeCommitmentFailures”.  The array has 4 members, used to track the types of commit failures that have taken place.  When a commit is charged, the memory manager first calculates the new commit size: NewCommitValue = CurrentCommitValue + SystemReservedMemory.  Based on this calculation, commit failures are tracked in a few different ways, listed below with the corresponding array member that is incremented.

     

    MiChargeCommitmentFailures[0] - If the system failed a commit request and an expansion of the pagefile has failed.

    MiChargeCommitmentFailures[1] - If the system failed a commit and we have already reached the maximum pagefile size.

    MiChargeCommitmentFailures[2] - If the system failed a commit while the pagefile lock is held.

    MiChargeCommitmentFailures[3] - If the system failed a commit and the NewCommitValue is less than or equal to CurrentCommitValue.

     

    In order to calculate the count of failures, "!vm" adds up the values stored in each member of the array.  Members 0 and 1 are always counted; member 2 is counted if the OS version is Windows 2003/XP, and member 3 is counted if the build version is newer than Windows 2003/XP.

     

    Let's look at the array in the dump I was debugging:

     

    13: kd> dc nt!MiChargeCommitmentFailures L4

    fffff800`01e45ce0  00000000 0000bd38 00000000 00000000  ....8...........

     

     

    Converting this to decimal, we find the 48,000+ commit failures I was seeing in the output of !vm.

     

    13: kd> ?0000bd38

    Evaluate expression: 48440 = 00000000`0000bd38

     

    Since I now had my answer, “where does the number come from?”, I was left wanting to know a bit more about the overall flow of why a VirtualAlloc fails to commit.

     

    When memory is allocated by VirtualAlloc the newly allocated memory is not committed to physical memory. Only when the memory is accessed by a read or write is it backed by physical memory.

     

    When this newly allocated memory is accessed for the first time it will need to be backed by commit space. Under normal conditions this is a smooth process, however when the system hits what’s called the commit limit and can’t expand this limit we see commit failures.

     

    So how is the commit limit calculated? Let’s say we have a system with 4GB of physical memory and a pagefile that is 6GB in size. To determine the commit limit we add physical memory and the pagefile size together - in this example the commit limit would be 10GB. Since memory manger will not let any user mode allocation consume every last morsel of commit space it keeps a small amount of the commit space for the system to avoid hangs. When the limit is reached the system tries to grow the page file. If there is no more room to grow the pagefile or the pagefile has reached its configured maximum size, the system will try and free some committed memory to make room for more requests. If expansion of the page file or the attempt to free memory do not allow the allocation to complete, the allocation fails and MiChargeCommitmentFailures is incremented.

     

     

    To sum it all up: the commit limit is RAM + pagefile, and commit failures happen when we hit the commit limit and the system is unable to grow the pagefile because it is already at its maximum.  It’s that simple, well, almost.

     

    For those who want to know more about how the memory manager works, please see the post from Somak: The Memory Shell Game.

     

    Randy Monteleone

  • Ntdebugging Blog

    We Need Your XPERF Feedback

    • 0 Comments

    Hi,

     

    This is Tate asking for your direct and valuable feedback on XPerf (The Windows Performance Toolkit).  We are knee deep in working with the creators of the tool (The Windows Fundamentals Team) to provide end user feedback which will drive features for the next version. Now that you as our readers have seen several blog posts on the utility and hopefully have used it for a few of your own debugs, what would you like to see in terms of new features, fixes, data collection steps, etc?

     

    Particular areas of interest to the team are:

     

    -        Data Collection – easy or difficult, what would you change?

    -        Usage Scenarios – stack walking is powerful, what would you like the tool to expose that it does not?

    -        Any blocking factors keeping you from using the toolkit, what are they, how could we remedy that for you?

    -        Data Analysis – what would make your life as a debugger using the toolkit better?

     

    Please send us your feedback below and thanks in advance.

     

    -Tate

  • Ntdebugging Blog

    How to have a colorful relationship with your dump files

    • 3 Comments

    Hello NTDebuggers…  I look at a lot of dump files every day.  This being the case, I like to take full advantage of the customizable look and feel of windbg.   I actually have an association set up between DMP files and a CMD file that loads my customized COLOR workspace each time I double-click on a dump file.  I like a black background with colorized source and debug command window output.

    Here is a snapshot of my typical debug session.

    [Screenshot: WinDbg session using the customized color workspace]

    Here is how you set it up.

    1) Create the following CMD file and put it in your path.  It’s called D.CMD on my system.

    echo off
    Title kd.exe -z %1
    C:
    CD\debuggers
    start C:\Debuggers\windbg.exe -z %1 -W color

    2) Load windbg and under view \ options configure your color options.  Note: you will want to set every option in the list box.

    [Screenshots: WinDbg View \ Options color settings]

    3) Save the workspace by selecting FILE \ Save Workspace, I saved mine as COLOR

    4) If you don’t have a dump file create one by opening a process in the debugger and doing a .dump /ma C:\test\test.dmp (I’ll assume you have a directory called C:\test)

    5) Select the C:\test\test.dmp file in Explorer, right-click it, and open Properties.  (Note: I used badwindow.dmp.)

    [Screenshot: dump file Properties dialog]

    6) Change the dump file association by drilling down to Change \ Browse and selecting D.CMD.

    [Screenshot: selecting D.CMD as the program to open DMP files]

    7) That’s all there is to it.  Now you can simply double click on any DMP file and windbg will start and load with your COLOR workspace.

    Good luck and happy debugging.

    Jeff Dailey-
