‘Hidden’ ETW Stack Trace Feature – Get Stacks All Over the Place!

‘Hidden’ ETW Stack Trace Feature – Get Stacks All Over the Place!

Rate This
  • Comments 0

 

I’ve heard Mark R. say that “sometimes Microsoft gave me a gift” when coming across an interesting feature not in mainstream documentation.  How true that often the odd find can prove ultimately very valuable.  Such was the case when I read Bruce Dawson’s Stack Walking in Xperf blog entry.

Here’s the excerpt that blew me away:

You can also record call stacks for manifest-based ETW (Event Tracing for Windows) events, but the syntax is quite different, and it only works on Windows 7 and above. When you specify your ETW provider to xperf after “-on” you can specify extra parameters after the provider name, separated by colons. These are flags, a level, and, for manifest-based providers, a list of extra data to record, which can include call stacks. You can leave the flags and level fields blank and just specify ‘stack’ (in single quotes) after three colons like this:

     xperf -on Latency -stackwalk profile -start browse -on Microsoft-IE:::'stack'  
     rem Your scenario goes here...     
     xperf -stop browse -stop -d mytrace.etl

This sounds rather ordinary at first,  but looking at the syntax you may realize that this means for just about every manifest based provider (Microsoft-*) one can generate a stack trace on all ETW events therein.  This is really cool because if you are only interested in stacks, you don’t have to monkey around with any other trace level or flags which is often the most frustrating part about manual ETW tracing.  Recall from Ivan’s blog that there are over 600+ inbox providers to explore so we can go crazy here exploring components!  Let’s demonstrate.

First, remember to use the Win7 SDK version of XPerf.  The XPerf(WPT) install *.msi files are found in the Bin directory after installing Win32 Development Tools, so you have to do a two-step at present to get the tools installed.

Let’s list all the providers…

C:\utils\xperf7sdk>xperf -providers | findstr "Microsoft-"
      .
       .
       .
       93c05d69-51a3-485e-877f-1806a8731346:0x0001000000000000:0x4       : Microsoft-Windows-LUA
       9580d7dd-0379-4658-9870-d5be7d52d6de:0x0001000000000000:0x4       : Microsoft-Windows-WLAN-AutoConfig
       959f1fac-7ca8-4ed1-89dc-cdfa7e093cb0:0x0001000000000000:0x4       : Microsoft-Windows-HealthCenterCPL
       96ac7637-5950-4a30-b8f7-e07e8e5734c1:0x0001000000000000:0x4       : Microsoft-Windows-Kernel-BootDiagnostics
       96f4a050-7e31-453c-88be-9634f4e02139:0x0001000000000000:0x4       : Microsoft-Windows-UserPnp
       98583af0-fc93-4e71-96d5-9f8da716c6b8:0x0001000000000000:0x4       : Microsoft-Windows-Dwm-Udwm
       a50b09f8-93eb-4396-84c9-dc921259f952:0x0001000000000000:0x4       : Microsoft-Windows-Diagnosis-MSDE<
       ad5162d8-daf0-4a25-88a7-01cbeb33902e:0x0001000000000000:0x4       : Microsoft-Windows-WPDClassInstaller
       ae4bd3be-f36f-45b6-8d21-bdd6fb832853:0x0001000000000000:0x4       : Microsoft-Windows-Audio
       af2e340c-0743-4f5a-b2d3-2f7225d215de:0x0001000000000000:0x4       : Microsoft-Windows-Netshell
       b03d4051-3564-4e93-93db-3c34f1b5b503:0x0001000000000000:0x4       : Microsoft-Windows-Diagnosis-TaskManager
       b1f90b27-4551-49d6-b2bd-dfc6453762a6:0x0001000000000000:0x4       : Microsoft-Windows-PowerCpl
       b2a40f1f-a05a-4dfd-886a-4c4f18c4334c:0x0001000000000000:0x4       : Microsoft-PerfTrack-IEFRAME
       c89b991e-3b48-49b2-80d3-ac000dfc9749:0x0001000000000000:0x4       : Microsoft-Windows-Documents
       c9bdb4eb-9287-4c8e-8378-6896f0d1c5ef:0x0001000000000000:0x4       : Microsoft-Windows-HomeGroup-ProviderService
       dbe9b383-7cf3-4331-91cc-a3cb16a3b538:0x0001000000000000:0x4       : Microsoft-Windows-Winlogon
       de513a55-c345-438b-9a74-e18cac5c5cc5:0x0001000000000000:0x4       : Microsoft-Windows-Help
       ded165cf-485d-4770-a3e7-9c5f0320e80c:0x0001000000000000:0x4       : Microsoft-Windows-DeviceUx
       e978f84e-582d-4167-977e-32af52706888:0x0001000000000000:0x4       : Microsoft-Windows-TabletPC-InputPanel
       ed6b3ba8-95b2-4cf5-a317-d4af7003884c:0x0001000000000000:0x4       : Microsoft-Windows-Sidebar
       f3f14ff3-7b80-4868-91d0-d77e497b025e:0x0001000000000000:0x4       : Microsoft-Windows-WMP
       fbcfac3f-8459-419f-8e48-1f0b49cdb85e:0x0001000000000000:0x4       : Microsoft-Windows-NetworkProfile
       ffdb9886-80f3-4540-aa8b-b85192217ddf:0x0001000000000000:0x4       : Microsoft-PerfTrack-MSHTML

We have a lot to play with as you can see, but let’s say you are trying to track down a socket leak.  Since we know sockets are implemented in afd.sys, let’s search for that provider…

C:\utils\xperf7sdk>xperf -providers | findstr "AFD"        
e53c6823-7bb8-44bb-90dc3f86090d48a6                               : Microsoft-Windows-Winsock-AFD       
Microsoft-Windows-Winsock-AFD


Now, we can form the XPerf syntax to trace socket operations during the leak.  The “AFDSession” is just the session name we are making up, the rest is the prescriptive syntax. Just to explain the syntax we are starting multiple sessions at the time here – kernel logger session + AFDSession. Similarly we are stopping both sessions at the same time and then merging both into the one file AFD.etl file.

xperf -on Latency -stackwalk profile -start AFDSession -on Microsoft-Windows-Winsock-AFD:::'stack'
<reproduce leak>
xperf -stop AFDSession -stop -d AFD.etl

Run xperfview AFD.etl and switch on “Stack counts by type graph”. If you don’t see this frame, make sure you are on the 7SDK version

clip_image001

The beauty here is the Stack Counts by Type frame which allows us to explore which stack traces where collected, notice all the AFD stack count sources!

clip_image002

Let’s say we have the simple case of a massive number of Create’s happening and thus handle counts rocketing up (leaking) for a given process.  Perhaps exhausting all ephemeral ports on the system, ultimately resulting in any number of hang symptoms.

Simply filter the Events for AfdCreate as follows.

clip_image003

Now, we have a filtered Frame of just those stack counts….

clip_image004

You can imagine that if there were a lot more creates here that the frequency distribution shown above would be much higher, but for the sake of just exploring the feature, let’s just look at these small number of creates anyway.  In fact, let’s select them all by just choosing Select View in the context menu which will get all of these events over the trace sample.

clip_image005

The frame area is now highlighted in light purple indicating the selected area.

clip_image006

Because we are interested in Stacks, make sure you have configured your symbol path and have checked Load Symbols

(be patient, this could take a while to build up your symbol cache)

clip_image007

clip_image008

Next, invoke the Summary Table

clip_image009

Click the flyout control on the left to select only the Creates again.

clip_image010

Next, make it easy on yourself by Filtering on the interesting process only.

clip_image011

After, much better.

clip_image012

Now, here is the fruit of our labor in the full stack trace inside iexplore.exe allocating the socket handles.

clip_image013

Continued…

clip_image001[5]

 

Consider that what we see above is read from the top down.  Iexplore.exe is doing this work through wininet, jsproxy, jscript, etc. all the way down through the calling of ws2_32.dll!socket which ultimately creates the socket and returned our file handle.  I’ve seen so many cases of ephemeral port exhaustion which is essentially a handle leak against sockets, these could have been solved in minutes with this tracing!  Keep in mind that this data isn’t perfectly suited for tracking handle leaks, especially small ones since there’s no accounting for outstanding handles.  However, sometimes the leak is egregious enough that you just need to look at activity over time to be pretty certain of the culprit.  Really, what we are doing here is just probing a particular ETW instrumented API, which is incredibly powerful.

Typically, on 64-bit computers, you cannot capture the kernel stack in certain contexts when page faults are not allowed. To enable walking the kernel stack on AMD64, set the DisablePagingExecutive Memory Management registry value to 1 and reboot. The DisablePagingExecutive registry value is located under the following registry key: HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management

In summary, this isn’t really a hidden feature since it’s documented in EnableTraceEx2’s Enable Parameters parameter, but I hope with this additional context you see what powerful analysis is possible.  Previously, this would have required live debugging a process, so now we have another example of the power of XPerf to decode and view key system activity, non-invasively!

-Tate

Leave a Comment
  • Please add 3 and 5 and type the answer here:
  • Post