<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Pigs Can Fly : xperf</title><link>http://blogs.msdn.com/pigscanfly/archive/tags/xperf/default.aspx</link><description>Tags: xperf</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Stack Walking in Xperf</title><link>http://blogs.msdn.com/pigscanfly/archive/2009/08/06/stack-walking-in-xperf.aspx</link><pubDate>Thu, 06 Aug 2009 20:52:25 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9859435</guid><dc:creator>rgr</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/pigscanfly/comments/9859435.aspx</comments><wfw:commentRss>http://blogs.msdn.com/pigscanfly/commentrss.aspx?PostID=9859435</wfw:commentRss><wfw:comment>http://blogs.msdn.com/pigscanfly/rsscomments.aspx?PostID=9859435</wfw:comment><description>&lt;p&gt;Bruce Dawson is a performance analyst on the client performance team.&amp;#160; He has written this guest post on enabling stack walking using xperf for both 32-bit and 64-bit Windows systems (Vista and Win7).&amp;#160; For more posts on xperf see &lt;a href="http://blogs.msdn.com/pigscanfly/pages/xperf-articles.aspx" target="_blank"&gt;this page&lt;/a&gt;.&lt;/p&gt;  &lt;h1&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/h1&gt;  &lt;p&gt;When I first started working with xperf I was confused by the many gotchas and settings surrounding the recording of call stacks. It seemed like there were many bits of crucial information needed in order to successfully record call stacks, and these bits were never gathered in one place. 
In order to save future generations from this complexity (and to give me a convenient reference to look at) I decided to write up what I have learned, while the lessons are still fresh in my memory. &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;font color="#ff0000"&gt;Warning: &lt;/font&gt;&lt;/strong&gt;The syntax examples below make use of the hyphen character (-) and the single quote character ('). Both of these characters are often altered by word processing software to visually similar but actually different characters, which can lead to obscure syntax errors. If the example syntax doesn’t work, try typing the command in manually instead of copying and pasting it, to rule this out as a factor. &lt;/p&gt;  &lt;h1&gt;&lt;strong&gt;Recording Stack Walks&lt;/strong&gt;&lt;/h1&gt;  &lt;p&gt;To enable call stacks in xperf you need to choose what type of event you want call stacks for, make sure that event is being recorded, and then enable call stacks for that type of event. The sampling profile events from the kernel provider (which interrupt the CPU every millisecond and record what it is doing) are one of the main uses of call stacks. The profile events can be enabled with the “PROFILE” kernel flag, or by using one of the kernel groups, such as “Base” or “Latency”, that include the PROFILE flag. For a list of all the kernel flags and kernel groups, use: &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;xperf -providers k&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;In addition to enabling the sampling profiler, we need to enable stack walking on that event with the -stackwalk option. We also need to make sure that the PROC_THREAD and LOADER kernel flags are enabled, so that the xperf tools can identify which modules the code addresses are in. 
&lt;/p&gt;  &lt;p&gt;Putting this together, the following example commands tell xperf to record a trace with sampling profiler call stacks and save it to mytrace.etl: &lt;/p&gt;  &lt;blockquote&gt;   &lt;table border="0" cellspacing="0" cellpadding="2" width="400"&gt;&lt;tbody&gt;       &lt;tr&gt;         &lt;td valign="top" width="400"&gt;&lt;font face="Courier New"&gt;xperf -on PROC_THREAD+LOADER+PROFILE -stackwalk profile&lt;/font&gt;&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="400"&gt;&lt;font face="Courier New"&gt;rem Your scenario goes here...&lt;/font&gt;&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="400"&gt;&lt;font face="Courier New"&gt;xperf -d mytrace.etl&lt;/font&gt;&lt;/td&gt;       &lt;/tr&gt;     &lt;/tbody&gt;&lt;/table&gt; &lt;/blockquote&gt;  &lt;p align="left"&gt;Alternatively, the following commands will also work, because the “Latency” kernel group includes all of the necessary kernel flags.&lt;/p&gt;  &lt;blockquote&gt;   &lt;table border="0" cellspacing="0" cellpadding="2" width="400"&gt;&lt;tbody&gt;       &lt;tr&gt;         &lt;td valign="top" width="400"&gt;&lt;font face="Courier New"&gt;xperf -on Latency -stackwalk profile&lt;/font&gt;&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="400"&gt;&lt;font face="Courier New"&gt;rem Your scenario goes here...&lt;/font&gt;&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="400"&gt;&lt;font face="Courier New"&gt;xperf -d mytrace.etl&lt;/font&gt;&lt;/td&gt;       &lt;/tr&gt;     &lt;/tbody&gt;&lt;/table&gt; &lt;/blockquote&gt;  &lt;p&gt;Stack walking can be enabled for many other events using this same basic method, so you can also get call stacks for context switches, registry operations, and so on. 
To see the list of events for which call stacks can be enabled, use: &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;xperf -help stackwalk&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;You can also record call stacks for manifest-based ETW (Event Tracing for Windows) events, but the syntax is quite different, and it only works on Windows 7 and above. When you specify your ETW provider to xperf after “-on”, you can add extra parameters after the provider name, separated by colons. These are flags, a level, and, for manifest-based providers, a list of extra data to record, which can include call stacks. You can leave the flags and level fields blank and just specify 'stack' (in single quotes) after three colons, like this: &lt;/p&gt;  &lt;blockquote&gt;   &lt;table border="0" cellspacing="0" cellpadding="2" width="495"&gt;&lt;tbody&gt;       &lt;tr&gt;         &lt;td valign="top" width="493"&gt;&lt;font face="Courier New"&gt;xperf -on Latency -stackwalk profile -start browse -on Microsoft-IE:::'stack'&lt;/font&gt;&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="493"&gt;&lt;font face="Courier New"&gt;rem Your scenario goes here...&lt;/font&gt;&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="493"&gt;&lt;font face="Courier New"&gt;xperf -stop browse -stop -d mytrace.etl&lt;/font&gt;&lt;/td&gt;       &lt;/tr&gt;     &lt;/tbody&gt;&lt;/table&gt; &lt;/blockquote&gt;  &lt;p&gt;The syntax above starts the kernel session with the “Latency” set of kernel flags, enables stack walking on the profile events, starts a user session called “browse”, and enables the “Microsoft-IE” ETW provider for that session, with call stacks recorded for all of its events. Then, when your scenario is finished, the last command stops the user session, stops the kernel session, and merges the kernel and user traces into “mytrace.etl”. It’s a bit of a mouthful, but it works. 
&lt;/p&gt;  &lt;h3&gt;&lt;strong&gt;Operating System Support for Xperf Stack Walks&lt;/strong&gt;&lt;/h3&gt;  &lt;p&gt;Recording stack walks from xperf is not supported on all versions of Windows, and it has some prerequisites. Note also that stack walking for classic ETW events is not supported. Owners of classic ETW providers must convert them to manifest-based providers to take advantage of stack walking on their custom events.&lt;/p&gt;  &lt;p&gt;The following chart summarizes the situation for stack walking on kernel and manifest-based events:&lt;/p&gt;  &lt;table border="2" cellspacing="0" cellpadding="2" width="400"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="133"&gt;&amp;#160;&lt;/td&gt;        &lt;td valign="top" width="133"&gt;         &lt;p align="center"&gt;&lt;strong&gt;32-bit&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="133"&gt;         &lt;p align="center"&gt;&lt;strong&gt;64-bit&lt;/strong&gt;&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;&lt;strong&gt;XP and below&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="133"&gt;         &lt;p align="center"&gt;No&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="133"&gt;         &lt;p align="center"&gt;No&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;&lt;strong&gt;Vista RTM*&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="133"&gt;         &lt;p align="center"&gt;Yes&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="133"&gt;         &lt;p align="center"&gt;No&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;&lt;strong&gt;Vista SP1*&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="133"&gt;         &lt;p align="center"&gt;Yes&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="133"&gt;         &lt;p 
align="center"&gt;Yes**&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="133"&gt;&lt;strong&gt;Windows 7&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="133"&gt;         &lt;p align="center"&gt;Yes&lt;/p&gt;       &lt;/td&gt;        &lt;td valign="top" width="133"&gt;         &lt;p align="center"&gt;Yes**&lt;/p&gt;       &lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;* Stack walking is not supported for manifest-based ETW events.&lt;/p&gt;  &lt;p&gt;** Stack walking on 64-bit Windows requires that the DisablePagingExecutive registry key be set. &lt;/p&gt;  &lt;h3&gt;&lt;strong&gt;Disable Paging Executive&lt;/strong&gt;&lt;/h3&gt;  &lt;p&gt;In order for stack walking to work on 64-bit Windows, you need to set the DisablePagingExecutive registry key. This tells the operating system not to page kernel-mode drivers and system code out to disk. That is a prerequisite for getting 64-bit call stacks with xperf, because 64-bit stack walking depends on metadata in the executable images, and in some situations the xperf stack walk code is not allowed to touch paged-out pages. Running the following command from an elevated command prompt will set this registry key for you. &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;REG ADD &amp;quot;HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management&amp;quot; -v DisablePagingExecutive -d 0x1 -t REG_DWORD -f&lt;/font&gt; &lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;After setting this registry key, you will need to reboot your system before you can record call stacks. Because this setting makes the Windows kernel lock more pages into RAM, it will probably consume about 10 MB of additional physical memory. &lt;/p&gt;  &lt;h1&gt;&lt;strong&gt;Looking at Stack Walks&lt;/strong&gt;&lt;/h1&gt;  &lt;p&gt;The call stacks that result from the stack walks are visible in xperfview. 
You can run xperfview and load your trace or you can invoke it from the command line with: &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;xperf mytrace.etl&lt;/font&gt; &lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;If you enabled the PROFILE kernel flag then you should see one or more CPU Sampling graphs – if they aren’t enabled then use the selector on the left sidebar, or the Graphs menu, to enable one of them. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/StackWalkinginXperf_12FF9/image_2.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/StackWalkinginXperf_12FF9/image_thumb.png" width="540" height="247" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;From one of the CPU sampling graphs you can select an area of interest and then right-click and select Summary Table. Use the selector on the left sidebar of the summary table, or the Columns menu, to make sure that the Stack column is enabled and visible – this is where your call stacks will be shown. Xperfview summarizes multiple stack walks together, collapsing them down where they are identical. 
The Count field shows how many stacks were collapsed together on each row (how many times that partial call stack was hit), the Weight column shows an estimate of how many milliseconds of CPU time were spent in that call stack, and the %Weight column shows that as a percentage of total CPU time available.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/StackWalkinginXperf_12FF9/image_4.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/StackWalkinginXperf_12FF9/image_thumb_1.png" width="546" height="250" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;h3&gt;&lt;strong&gt;Sampling Implies Estimation&lt;/strong&gt;&lt;/h3&gt;  &lt;p&gt;CPU sampling is a statistical profiling process. If the number of samples is small, or if the code execution is correlated with the sampling interval, then the results can be off by a large amount. It is important to be aware of the limits of sampling in order to avoid reading too much meaning into a small number of counts.&lt;/p&gt;  &lt;h3&gt;&lt;strong&gt;Symbols&lt;/strong&gt;&lt;/h3&gt;  &lt;p&gt;If you haven’t loaded symbols then your call stacks will show DLL names followed by question marks, as shown below. 
You can tell that you’ve recorded a call stack, but the details of what code was executing are still hidden.&lt;/p&gt;  &lt;table border="0" cellspacing="0" cellpadding="0"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td width="165"&gt;Stack&lt;/td&gt;        &lt;td width="90"&gt;Count&lt;/td&gt;        &lt;td width="93"&gt;Weight&lt;/td&gt;        &lt;td width="141"&gt;% Weight&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="165"&gt;[Root]&lt;/td&gt;        &lt;td width="90"&gt;4973&lt;/td&gt;        &lt;td width="93"&gt;4972.513&lt;/td&gt;        &lt;td width="141"&gt;93.51&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="165"&gt;|- ntdll.dll!?&lt;/td&gt;        &lt;td width="90"&gt;4950&lt;/td&gt;        &lt;td width="93"&gt;4949.49&lt;/td&gt;        &lt;td width="141"&gt;93.08&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="165"&gt;|&amp;#160; kernel32.dll!?&lt;/td&gt;        &lt;td width="90"&gt;4950&lt;/td&gt;        &lt;td width="93"&gt;4949.49&lt;/td&gt;        &lt;td width="141"&gt;93.08&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="165"&gt;|&amp;#160; |- MSVCR90.DLL!?&lt;/td&gt;        &lt;td width="90"&gt;4904&lt;/td&gt;        &lt;td width="93"&gt;4903.486&lt;/td&gt;        &lt;td width="141"&gt;92.21&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="165"&gt;|&amp;#160; |&amp;#160;&amp;#160;&amp;#160; MSVCR90.DLL!?&lt;/td&gt;        &lt;td width="90"&gt;4904&lt;/td&gt;        &lt;td width="93"&gt;4903.486&lt;/td&gt;        &lt;td width="141"&gt;92.21&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="165"&gt;|&amp;#160; |&amp;#160;&amp;#160;&amp;#160; MFC90.DLL!?&lt;/td&gt;        &lt;td width="90"&gt;4904&lt;/td&gt;        &lt;td width="93"&gt;4903.486&lt;/td&gt;        &lt;td width="141"&gt;92.21&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="165"&gt;|&amp;#160; |&amp;#160;&amp;#160;&amp;#160; FractalX.EXE!?&lt;/td&gt;        &lt;td width="90"&gt;4904&lt;/td&gt;   
     &lt;td width="93"&gt;4903.486&lt;/td&gt;        &lt;td width="141"&gt;92.21&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="165"&gt;|&amp;#160; |&amp;#160;&amp;#160;&amp;#160; FractalX.EXE!?&lt;/td&gt;        &lt;td width="90"&gt;4904&lt;/td&gt;        &lt;td width="93"&gt;4903.486&lt;/td&gt;        &lt;td width="141"&gt;92.21&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="165"&gt;|&amp;#160; |&amp;#160;&amp;#160;&amp;#160; FractalX.EXE!?&lt;/td&gt;        &lt;td width="90"&gt;4904&lt;/td&gt;        &lt;td width="93"&gt;4903.486&lt;/td&gt;        &lt;td width="141"&gt;92.21&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="165"&gt;|&amp;#160; |&amp;#160; |- FractalX.EXE!?&lt;/td&gt;        &lt;td width="90"&gt;4900&lt;/td&gt;        &lt;td width="93"&gt;4899.485&lt;/td&gt;        &lt;td width="141"&gt;92.14&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="165"&gt;|&amp;#160; |&amp;#160; |- user32.dll!?&lt;/td&gt;        &lt;td width="90"&gt;3&lt;/td&gt;        &lt;td width="93"&gt;3.001464&lt;/td&gt;        &lt;td width="141"&gt;0.06&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="165"&gt;|&amp;#160; |&amp;#160; |- MSVCR90.DLL!?&lt;/td&gt;        &lt;td width="90"&gt;1&lt;/td&gt;        &lt;td width="93"&gt;0.999918&lt;/td&gt;        &lt;td width="141"&gt;0.02&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="165"&gt;|&amp;#160; |- FractalX.EXE!?&lt;/td&gt;        &lt;td width="90"&gt;46&lt;/td&gt;        &lt;td width="93"&gt;46.00391&lt;/td&gt;        &lt;td width="141"&gt;0.87&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td width="165"&gt;|- ntkrnlmp.exe!?&lt;/td&gt;        &lt;td width="90"&gt;23&lt;/td&gt;        &lt;td width="93"&gt;23.0229&lt;/td&gt;        &lt;td width="141"&gt;0.43&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;&amp;#160; &lt;/p&gt;  &lt;p&gt;A single question mark in the stack column means that no call stack is available 
at all. If “?!?” is displayed, xperf doesn’t know what executable image was at the address. &lt;/p&gt;  &lt;p&gt;If the DLLs are from Microsoft, point your symbol path to Microsoft’s symbol servers (using the _NT_SYMBOL_PATH environment variable or Configure Symbol Paths from the Trace menu). You can have multiple symbol paths, separated by semicolons, so that you can point at Microsoft’s symbols and your own simultaneously. Then select Load Symbols from the Trace menu or a graph context menu. &lt;/p&gt;  &lt;h3&gt;&lt;strong&gt;Exploring Call Stacks&lt;/strong&gt;&lt;/h3&gt;  &lt;p&gt;The call stack display is a bit unusual in that the root of the call stack (conveniently labeled [Root]) is displayed at the top and you expand downwards from there. The layout is also subtle and non-obvious at first. When you see multiple functions above each other at the same indentation with no minus signs in front of them (for example &lt;strong&gt;endthreadex&lt;/strong&gt; and &lt;strong&gt;_AfxThreadEntry&lt;/strong&gt; below), it means that in that section of the call stack all of the collapsed stacks took the same path: &lt;strong&gt;endthreadex&lt;/strong&gt; called &lt;strong&gt;_AfxThreadEntry&lt;/strong&gt; in all of the samples, and these functions have a parent/child relationship. 
This layout helps to keep the call stack display as compact as possible.&lt;/p&gt;  &lt;table border="0" cellspacing="0" cellpadding="0"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td width="297"&gt;Stack&lt;/td&gt;        &lt;td width="64"&gt;Count&lt;/td&gt;        &lt;td width="64"&gt;Weight&lt;/td&gt;        &lt;td width="64"&gt;% Weight&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;[Root]&lt;/td&gt;        &lt;td&gt;4973&lt;/td&gt;        &lt;td&gt;4972.513&lt;/td&gt;        &lt;td&gt;93.51&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;|- ntdll.dll!?&lt;/td&gt;        &lt;td&gt;4950&lt;/td&gt;        &lt;td&gt;4949.49&lt;/td&gt;        &lt;td&gt;93.08&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;|&amp;#160; kernel32.dll!?&lt;/td&gt;        &lt;td&gt;4950&lt;/td&gt;        &lt;td&gt;4949.49&lt;/td&gt;        &lt;td&gt;93.08&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;|&amp;#160; |- MSVCR90.DLL!endthreadex&lt;/td&gt;        &lt;td&gt;4904&lt;/td&gt;        &lt;td&gt;4903.486&lt;/td&gt;        &lt;td&gt;92.21&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;|&amp;#160; |&amp;#160;&amp;#160;&amp;#160;&amp;#160; MSVCR90.DLL!endthreadex&lt;/td&gt;        &lt;td&gt;4904&lt;/td&gt;        &lt;td&gt;4903.486&lt;/td&gt;        &lt;td&gt;92.21&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;|&amp;#160; |&amp;#160;&amp;#160;&amp;#160;&amp;#160; MFC90.DLL!_AfxThreadEntry&lt;/td&gt;        &lt;td&gt;4904&lt;/td&gt;        &lt;td&gt;4903.486&lt;/td&gt;        &lt;td&gt;92.21&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;|&amp;#160; |&amp;#160;&amp;#160;&amp;#160;&amp;#160; FractalX.EXE!PrimaryCalculationProc&lt;/td&gt;        &lt;td&gt;4904&lt;/td&gt;        &lt;td&gt;4903.486&lt;/td&gt;        &lt;td&gt;92.21&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;|&amp;#160; |&amp;#160;&amp;#160;&amp;#160;&amp;#160; FractalX.EXE!HomeWork::CalculateNextBit&lt;/td&gt;        
&lt;td&gt;4904&lt;/td&gt;        &lt;td&gt;4903.486&lt;/td&gt;        &lt;td&gt;92.21&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;|&amp;#160; |&amp;#160; |- FractalX.EXE!Calculator::Calculate&lt;/td&gt;        &lt;td&gt;4899&lt;/td&gt;        &lt;td&gt;4898.484&lt;/td&gt;        &lt;td&gt;92.12&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;|&amp;#160; |&amp;#160; |&amp;#160; |- FractalX.EXE!Calculator::Calculate&amp;lt;itself&amp;gt;&lt;/td&gt;        &lt;td&gt;2743&lt;/td&gt;        &lt;td&gt;2742.873&lt;/td&gt;        &lt;td&gt;51.58&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;|&amp;#160; |&amp;#160; |&amp;#160; |- FractalX.EXE!Calculator::CalculateHelper&lt;/td&gt;        &lt;td&gt;2156&lt;/td&gt;        &lt;td&gt;2155.611&lt;/td&gt;        &lt;td&gt;40.54&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;|&amp;#160; |&amp;#160; |- FractalX.EXE!IterDataImp&amp;lt;int&amp;gt;::Render&lt;/td&gt;        &lt;td&gt;5&lt;/td&gt;        &lt;td&gt;5.002155&lt;/td&gt;        &lt;td&gt;0.09&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td&gt;|&amp;#160; |- FractalX.EXE!__tmainCRTStartup&lt;/td&gt;        &lt;td&gt;46&lt;/td&gt;        &lt;td&gt;46.00391&lt;/td&gt;        &lt;td&gt;0.87&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;&amp;#160; &lt;/p&gt;  &lt;p&gt;When you see multiple functions above each other at the same indentation &lt;i&gt;with&lt;/i&gt; minus signs in front of them (for instance &lt;strong&gt;Calculator::Calculate&lt;/strong&gt; and &lt;strong&gt;IterDataImp&amp;lt;int&amp;gt;::Render&lt;/strong&gt;) it means that all of these functions are called by the function above them. In other words those functions are siblings, all called by &lt;strong&gt;HomeWork::CalculateNextBit&lt;/strong&gt;, and the collapsed call stacks took multiple paths at this point. 
&lt;/p&gt;  &lt;p&gt;When you see a line that ends with &amp;lt;itself&amp;gt;, that line represents samples that occurred in the function itself, as opposed to in its descendants. This is often described as exclusive time, as opposed to inclusive time, which includes descendants. The &amp;lt;itself&amp;gt; line for a function, if present, will show up as a child node of the function so that the child nodes’ counts sum up to the parent’s count. In the example above, Calculator::Calculate itself was running for 2743 samples and its child function, Calculator::CalculateHelper, was running for 2156 samples, for a total of 4899. &lt;/p&gt;  &lt;p&gt;The best way to get a feel for how this works is to play around with it. If you select a call stack and repeatedly press the right arrow key, the call stack will keep expanding down the hottest path. This can be a very effective way to drill down into what is probably the most important call stack. &lt;/p&gt;  &lt;p&gt;To see call stacks for manifest-based ETW events (assuming you enabled them), go to the Generic Events graph and select Summary Table from its context menu. From there, make sure that the Stack column is enabled, and then you can explore the stacks. The Count column is very useful for seeing how many times the various events were hit from various call stacks. When enabled, the stacks of manifest-based events are also available in the tooltips of the corresponding event markers in the Generic Events graph itself. 
&lt;/p&gt;  &lt;h1&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;/h1&gt;  &lt;p&gt;Call stacks can be recorded with: &lt;/p&gt;  &lt;blockquote&gt;   &lt;table border="0" cellspacing="0" cellpadding="2" width="400"&gt;&lt;tbody&gt;       &lt;tr&gt;         &lt;td valign="top" width="400"&gt;&lt;font face="Courier New"&gt;xperf -on Latency -stackwalk profile&lt;/font&gt;&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="400"&gt;&lt;font face="Courier New"&gt;rem Your scenario goes here...&lt;/font&gt;&lt;/td&gt;       &lt;/tr&gt;        &lt;tr&gt;         &lt;td valign="top" width="400"&gt;&lt;font face="Courier New"&gt;xperf -d mytrace.etl&lt;/font&gt;&lt;/td&gt;       &lt;/tr&gt;     &lt;/tbody&gt;&lt;/table&gt; &lt;/blockquote&gt;  &lt;p&gt;If you are tracing on 64-bit Windows, don’t forget to set DisablePagingExecutive and then reboot before recording call stacks. And remember that Windows 7 has better ETW call stack recording support than Windows Vista, and Windows XP has no ETW call stack recording support. &lt;/p&gt;  &lt;p&gt;To analyze call stacks, load the trace into xperfview and look at the Stack column in the appropriate summary table. &lt;/p&gt;  &lt;p&gt;Happy tracing! &lt;/p&gt;  &lt;p&gt;Bruce Dawson, Windows Client Performance Team &lt;/p&gt;</description><category domain="http://blogs.msdn.com/pigscanfly/archive/tags/Performance/default.aspx">Performance</category><category domain="http://blogs.msdn.com/pigscanfly/archive/tags/xperf/default.aspx">xperf</category></item><item><title>So just what is in a trace? 
Using the xperf trace dumper</title><link>http://blogs.msdn.com/pigscanfly/archive/2008/03/16/so-just-what-is-in-a-trace-using-the-xperf-trace-dumper.aspx</link><pubDate>Sun, 16 Mar 2008 19:04:10 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8261959</guid><dc:creator>rgr</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/pigscanfly/comments/8261959.aspx</comments><wfw:commentRss>http://blogs.msdn.com/pigscanfly/commentrss.aspx?PostID=8261959</wfw:commentRss><wfw:comment>http://blogs.msdn.com/pigscanfly/rsscomments.aspx?PostID=8261959</wfw:comment><description>&lt;p&gt;There is a lot of information in a typical kernel trace.&amp;#160; While the Performance Analyzer tool is quite powerful and makes it easy to view a trace graphically, sometimes you just need to see what is in the trace directly.&amp;#160; Xperf makes this easy.&lt;/p&gt;  &lt;p&gt;First, it's important to understand that a trace file (.ETL) is simply the buffers produced by a trace session, written to a file.&amp;#160; The data in an ETL file isn't pre-processed, summarized, or otherwise annotated with metadata as it comes out of the OS.&amp;#160; It is just the raw data that comes from an ETW session.&amp;#160; This is because ETW is designed for log-time efficiency: ETW does the absolute minimum amount of work needed to get the trace data to a file or other consumer. &lt;/p&gt;  &lt;p&gt;This means that all the heavy lifting of post-processing trace data happens later. 
With the xperf tools, there are two places where this occurs:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;In the merge step, xperf takes the kernel trace and any other trace files and merges them into a single trace file.&amp;#160; Xperf also merges (adds) metadata into the trace during this step (I have another post in the works that covers merging in detail).&amp;#160; The result of merging is a single trace file that can be analyzed by the tools directly on the target machine, or copied to another system for analysis.&amp;#160;&amp;#160; Note that the merge step &lt;u&gt;must&lt;/u&gt; happen on the system where the trace was taken (the target system).       &lt;br /&gt;&lt;/li&gt;    &lt;li&gt;When a trace is processed by xperf using actions, or loaded into Performance Analyzer, the core trace processing components do a lot of work on the raw trace data.&amp;#160; This includes things like mapping process IDs (PIDs) to file and process names, mapping addresses to filenames, loading symbols for addresses, unifying stacks, and handling 64-bit and 32-bit differences. 
&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;As you've seen in the &lt;a href="http://blogs.msdn.com/pigscanfly/pages/xperf-articles.aspx" target="_blank"&gt;other posts&lt;/a&gt;, once a trace is merged it can be viewed in the Performance Analyzer.&amp;#160; But xperf also lets you see what is in the trace using the dumper action.&amp;#160; This is easy to do:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;&lt;strong&gt;xperf -i fs.etl -a dumper &amp;gt;fs.csv&lt;/strong&gt;&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;The &lt;strong&gt;&lt;font face="Courier New"&gt;-i fs.etl&lt;/font&gt;&lt;/strong&gt; parameter specifies that the input file is FS.ETL.&amp;#160;&amp;#160;&amp;#160; The &lt;font face="Courier New"&gt;&lt;strong&gt;-a dumper&lt;/strong&gt;&lt;/font&gt; parameter tells xperf to execute the dumper action.&amp;#160;&amp;#160; The output goes to standard output, which the command above redirects to fs.csv.&lt;/p&gt;  &lt;p&gt;There is a shortcut for this as well: the dumper action is the default action, so if you specify only an input file, xperf simply dumps it.&amp;#160;&amp;#160; For example, the following command does the same thing as the one above:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;&lt;strong&gt;xperf -i fs.etl&amp;#160; &amp;gt;fs.csv&lt;/strong&gt;&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;The resulting file is an ANSI text file where each line is one record.&amp;#160;&amp;#160; Each record consists of a comma-delimited set of fields.&amp;#160; The first field of each line is the name (or type) of the record.&amp;#160; &lt;/p&gt;  &lt;p&gt;There are some special lines and sections at the front of the file.&amp;#160;&amp;#160; Each record type is described by a header line.&amp;#160;&amp;#160; The header lines are delimited by the &lt;strong&gt;'BeginHeader'&lt;/strong&gt; and &lt;strong&gt;'EndHeader'&lt;/strong&gt; lines.&amp;#160;&amp;#160; Note that the line immediately after the 'EndHeader' line is unique: it 
doesn't have a header line.&amp;#160; This line describes some of the characteristics of the trace, such as its duration and the pointer size.&lt;/p&gt;  &lt;p&gt;The first field of each header line is the name (or type) of the ETW record that the header line describes.&amp;#160; The remaining fields are the names of the record type's fields.&amp;#160; Here is an example of the process start event header (P-Start) and a P-Start event. &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New"&gt;P-Start,&amp;#160; TimeStamp, Process Name ( PID),&amp;#160; ParentPID,&amp;#160; SessionID,&amp;#160; UniqueKey, UserSid, Command Line&lt;/font&gt;&lt;/p&gt;    &lt;p&gt;&lt;font face="Courier New"&gt;P-Start, 1017280, fs.exe (3004), 3608, 1, 0x86661508,        &lt;br /&gt;S-1-5-21-626881126-397955417-188441333-3225678, c:\coding\fs\Release\fs\fs.exe&amp;#160; blflargorg c:\coding\*.cpp *.h -s&lt;/font&gt; &lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;This event describes the start of a process.&amp;#160; There is also a corresponding P-End event. For processes that are already running when the trace begins, the kernel logger includes a pseudo P-Start event.&amp;#160; &lt;/p&gt;  &lt;p&gt;This means that every PID seen in other events has a corresponding P-Start event earlier in the trace. &lt;/p&gt;  &lt;p&gt;Also note that xperf will dump events that you add to your own applications, as long as you include an &lt;a href="http://msdn2.microsoft.com/en-us/library/aa384043(VS.85).aspx" target="_blank"&gt;event manifest&lt;/a&gt; in your app.&amp;#160; So you can add your own events and use xperf to dump them in the context of all the other events you include in the trace. 
&lt;/p&gt;</description><category domain="http://blogs.msdn.com/pigscanfly/archive/tags/Performance/default.aspx">Performance</category><category domain="http://blogs.msdn.com/pigscanfly/archive/tags/xperf/default.aspx">xperf</category></item><item><title>Using the Windows Sample Profiler with Xperf</title><link>http://blogs.msdn.com/pigscanfly/archive/2008/03/02/using-the-windows-sample-profiler-with-xperf.aspx</link><pubDate>Sun, 02 Mar 2008 20:06:12 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7991482</guid><dc:creator>rgr</dc:creator><slash:comments>6</slash:comments><comments>http://blogs.msdn.com/pigscanfly/comments/7991482.aspx</comments><wfw:commentRss>http://blogs.msdn.com/pigscanfly/commentrss.aspx?PostID=7991482</wfw:commentRss><wfw:comment>http://blogs.msdn.com/pigscanfly/rsscomments.aspx?PostID=7991482</wfw:comment><description>&lt;p&gt;Using the xperf tools, ETW, and the kernel sample profile interrupt together provides a very effective and easy-to-use sample profiler for analyzing both application and system-wide performance.&amp;#160; At each sample interrupt, the ETW subsystem captures the instruction pointer and the stack.&amp;#160; This data is lazily and efficiently logged to an ETL file.&amp;#160; Once the data is saved, it can be analyzed with Performance Analyzer.&lt;/p&gt;  &lt;p align="center"&gt;&lt;em&gt;The 
next article in this series is&amp;#160; &lt;/em&gt;&lt;a href="http://blogs.msdn.com/pigscanfly/archive/2008/03/16/so-just-what-is-in-a-trace-using-the-xperf-trace-dumper.aspx"&gt;So just what is in a trace? Using the xperf trace dumper&lt;/a&gt;&lt;/p&gt;  &lt;p align="center"&gt;&lt;em&gt;&lt;u&gt;Note:&lt;/u&gt;&lt;/em&gt; &lt;em&gt;the examples in this post only work on 32-bit Vista or Server 2008; earlier operating systems do not support taking stack traces.&amp;#160; Taking stack traces on 64-bit platforms will be the topic of another post.&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;Here is an example of profiling FS.EXE, a grep-like utility I've written.&amp;#160; I use this tool for experimenting with various topics such as efficient I/O, well-performing string matching algorithms, and instrumenting applications with ETW. &lt;/p&gt;  &lt;p&gt;For this test, I put the following commands in a CMD file: &lt;/p&gt;  &lt;ul&gt;   &lt;ul&gt;     &lt;li&gt;&lt;font face="Courier New" size="2"&gt;xperf -on PROC_THREAD+LOADER+INTERRUPT+DPC+PROFILE          &lt;br /&gt;-stackwalk profile           &lt;br /&gt;-minbuffers 16 -maxbuffers 1024 -flushtimer 0           &lt;br /&gt;-f e:\tmp.etl           &lt;br /&gt;&lt;/font&gt;&lt;/li&gt;      &lt;li&gt;&lt;font face="Courier New" size="2"&gt;fs.exe farglenorgin c:\coding\*.cpp *.h -s          &lt;br /&gt;&lt;/font&gt;&lt;/li&gt;      &lt;li&gt;&lt;font face="Courier New" size="2"&gt;xperf -d profile.etl&lt;/font&gt; &lt;/li&gt;   &lt;/ul&gt; &lt;/ul&gt;  &lt;p&gt;Since the commands are a bit long, I've separated them above and added line breaks to make them readable.&amp;#160; Each command above should be on one line in your command file.&lt;/p&gt;  &lt;p&gt;The first command turns on the kernel logger and enables the following events:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;The PROC_THREAD flag enables the &lt;a href="http://msdn2.microsoft.com/en-us/library/aa364092(VS.85).aspx" target="_blank"&gt;process&lt;/a&gt; and &lt;a 
href="http://msdn2.microsoft.com/en-us/library/aa364132(VS.85).aspx" target="_blank"&gt;thread&lt;/a&gt; events.&amp;#160; These mark the beginning and end of each process and thread.&amp;#160; The kernel provider guarantees that there will be a begin/end pair for every process and thread during the trace.&amp;#160; Processes and threads that already exist when the trace starts, or are still running when it stops, also have these events. &lt;/li&gt;    &lt;li&gt;The LOADER flag enables the &lt;a href="http://msdn2.microsoft.com/en-us/library/aa364068(VS.85).aspx" target="_blank"&gt;loader events&lt;/a&gt;, which log when the kernel loads an image (an EXE or DLL). &lt;/li&gt;    &lt;li&gt;The INTERRUPT and DPC flags enable the ETW &lt;a href="http://msdn2.microsoft.com/en-us/library/aa964780(VS.85).aspx" target="_blank"&gt;interrupt&lt;/a&gt; and &lt;a href="http://msdn2.microsoft.com/en-us/library/aa964748(VS.85).aspx" target="_blank"&gt;DPC&lt;/a&gt; events, which mark each interrupt and each deferred procedure call (DPC).&amp;#160; DPCs are routines that run at DISPATCH_LEVEL. &lt;/li&gt;    &lt;li&gt;The PROFILE flag does two things: it turns on the system's sample profile interrupt, and it enables the kernel's &lt;a href="http://msdn2.microsoft.com/en-us/library/aa964806(VS.85).aspx" target="_blank"&gt;sample profile ETW event&lt;/a&gt;. 
&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The other parameters are important as well.&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;The &lt;strong&gt;-stackwalk profile&lt;/strong&gt; parameter turns on ETW's stack walking feature for the sample profile event.&amp;#160; Every time a sample profile event is triggered by the sample profile interrupt, ETW will capture the stack and save the data in the trace buffers.&amp;#160; &lt;/li&gt;    &lt;li&gt;The &lt;strong&gt;-minbuffers 16&lt;/strong&gt; parameter sets the minimum number of buffers that ETW will allocate for storing events.&amp;#160; Note that you need at least two for each processor in your system. &lt;/li&gt;    &lt;li&gt;The &lt;strong&gt;-maxbuffers 1024&lt;/strong&gt; parameter sets the maximum number of buffers ETW will allocate to 1024, a total of 64MB with the default 64KB buffer size. &lt;/li&gt;    &lt;li&gt;The &lt;strong&gt;-flushtimer 0&lt;/strong&gt; parameter tells ETW never to flush the buffers based on a timer; buffers are only written to disk when they are full. &lt;/li&gt;    &lt;li&gt;The &lt;strong&gt;-f e:\tmp.etl&lt;/strong&gt; parameter tells ETW to lazily write the full ETW buffers to e:\tmp.etl.&amp;#160; This puts the log file on a different physical drive from the one on which the experiment is running, so the writes that ETW uses to save the trace data do not occur on the interesting drive. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;The second command simply runs the experiment.&amp;#160; It searches for the string '&lt;font face="Courier New"&gt;farglenorgin&lt;/font&gt;' in all my .CPP and .H files.&amp;#160; I'm using a string that doesn't exist so that the application executes its worst-case code paths.&amp;#160; Replace this command with one that runs your experiment, or with a pause instruction so you can dork around with a graphical program. 
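&lt;/p&gt;  &lt;p&gt;If you would rather drive the experiment from a script than from a CMD file, the three steps translate directly.&amp;#160; Here is a minimal sketch in Python.&amp;#160; It assumes that xperf.exe is on the PATH and that the script is run from an elevated prompt.&amp;#160; The helper names &lt;font face="Courier New"&gt;profile_commands&lt;/font&gt; and &lt;font face="Courier New"&gt;run_profile&lt;/font&gt; are hypothetical; the xperf arguments are the ones from the CMD file above.&lt;/p&gt;  &lt;pre&gt;
```python
import subprocess

def profile_commands(experiment, merged="profile.etl", raw=r"e:\tmp.etl"):
    """Build the three command lines of the CMD file as argument lists."""
    start = [
        "xperf", "-on", "PROC_THREAD+LOADER+INTERRUPT+DPC+PROFILE",
        "-stackwalk", "profile",
        "-minbuffers", "16", "-maxbuffers", "1024", "-flushtimer", "0",
        "-f", raw,
    ]
    stop = ["xperf", "-d", merged]
    return start, list(experiment), stop

def run_profile(experiment):
    """Start the kernel logger, run the experiment, then stop and merge."""
    start, work, stop = profile_commands(experiment)
    subprocess.run(start, check=True)      # turn on the kernel logger
    try:
        subprocess.run(work, check=False)  # the experiment; its exit code is ignored
    finally:
        subprocess.run(stop, check=True)   # stop the logger, merge, save profile.etl
```
&lt;/pre&gt;  &lt;p&gt;For example, &lt;font face="Courier New"&gt;run_profile(["fs.exe", "farglenorgin", r"c:\coding\*.cpp", "*.h", "-s"])&lt;/font&gt; performs the same three steps as the CMD file.&amp;#160; The try/finally block ensures that the kernel logger is stopped even if the experiment fails to launch.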
&lt;/p&gt;  &lt;p&gt;The third command simply stops the kernel logger, merges the data and saves it in &lt;strong&gt;profile.etl&lt;/strong&gt;. &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;em&gt;&lt;strong&gt;&lt;font color="#ff8000"&gt;NOTE: &lt;/font&gt;&lt;/strong&gt;These commands need to be run from an elevated command prompt.&amp;#160; Controlling ETW tracing requires administrative privileges.&lt;/em&gt; &lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;There is one other thing to do before examining the data: setting the symbol path.&amp;#160; Here is how I set the symbol path for this example:&amp;#160; &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;font face="Courier New" size="2"&gt;set _NT_SYMBOL_PATH=c:\coding\fs\release\fs;SRV*c:\symbols*http://msdl.microsoft.com/download/symbols&lt;/font&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Type this as a single line with no space before the '='.&amp;#160; If you include one, cmd.exe treats the space as part of the variable name.&amp;#160; This setting tells the symbol decoder to look for symbols in the release build directory for FS.EXE and on the Windows public symbol server, caching the served symbols in c:\symbols.&amp;#160; The xperf tools use the symbol decoding libraries from the &lt;a href="http://www.microsoft.com/whdc/DevTools/Debugging/default.mspx" target="_blank"&gt;debugging tools for Windows&lt;/a&gt;.&amp;#160; You can find more information on using symbols &lt;a href="http://www.microsoft.com/whdc/devtools/debugging/debugstart.mspx" target="_blank"&gt;here&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Once the trace is taken and the symbol path is set, simply open the trace in Performance Analyzer with the command &amp;quot;&lt;strong&gt;&lt;font face="Courier New" size="2"&gt;xperf profile.etl&lt;/font&gt;&lt;/strong&gt;&amp;quot;.&amp;#160; &lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_4.png"&gt;&lt;img 
style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; margin: 0px 0px 0px 5px; border-right-width: 0px" height="112" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_thumb_1.png" width="235" align="right" border="0" /&gt;&lt;/a&gt;The CPU Sampling by Process graph is the most interesting graph for this example.&amp;#160; To select the visible graphs, click on the flyout control on the left of the window, then select the CPU Sampling by CPU and CPU Sampling by Process graphs.&lt;/p&gt;  &lt;p&gt;For this experiment, the CPU Sampling by Process graph looks like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_36.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="126" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_thumb_17.png" width="538" border="0" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;By default, all processes running during the trace are shown except the idle task (as seen above).&amp;#160; You can change which processes are displayed or hidden by using the check boxes in the legend drop-down, as shown above.&lt;/p&gt;  &lt;p&gt;This graph illustrates an important concept about the kernel event provider and the xperf tools in general: they are specifically designed to analyze both system-wide and application performance data and events.&amp;#160; For example, in the legend above, there are many processes listed, but only a few of them actually used any CPU time during the experiment. 
&lt;/p&gt;  &lt;p&gt;Using the legend, you can eliminate all processes except the interesting one.&amp;#160; Here is what the graph of CPU utilization looks like for FS.EXE alone:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_10.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="209" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_thumb_4.png" width="523" border="0" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;This is pretty cool, as it provides a nice overview of the CPU utilization of FS.EXE, but it doesn't tell us much about where time is being spent within the process itself.&lt;/p&gt;  &lt;p&gt;The real power in Performance Analyzer is in its summary tables.&amp;#160; These are tabular displays of data about a specific chart, or a region in a chart.&amp;#160; For this experiment, I looked at the sample profile data for the entire trace.&amp;#160; To do this, right-click the CPU Sampling by Process chart, make sure that the load symbols option is set, then select the Summary table view. 
&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_12.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="212" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_thumb_5.png" width="529" border="0" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Note that it can take 10 to 20 seconds for the summary table to show up, because Performance Analyzer is loading symbols while this is happening (putting symbol loading on a background thread is on our to-do list...).&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_14.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; margin: 0px 0px 0px 5px; border-right-width: 0px" height="103" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_thumb_6.png" width="108" align="right" border="0" /&gt;&lt;/a&gt;After the summary table pops up, click on the flyout and select the columns for display.&amp;#160; In this case you will want the process, stack and % Weight columns (feel free to experiment with other columns).&amp;#160; &lt;/p&gt;  &lt;p&gt;Next, arrange the columns as follows.&amp;#160; The columns to the left of the gold column are grouping columns.&amp;#160; You can change the order of the columns, and move them to the left or right of the gold column, by dragging.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_16.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="72" alt="image" 
src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_thumb_7.png" width="529" border="0" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Now you can expand the stacks for FS.EXE and see where it is spending its time.&amp;#160; Note that this isn't by function, as in some profilers, but by &lt;u&gt;call stack&lt;/u&gt;.&amp;#160; This is much more powerful than simply knowing which functions the time is spent in, because it also shows you how the time-consuming functions were called.&lt;/p&gt;  &lt;p&gt;It's no surprise that my find string utility spends most of its time in the following stack:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_20.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="361" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_thumb_9.png" width="536" border="0" /&gt;&lt;/a&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;As with other sample profilers, you can look &amp;quot;up&amp;quot; and &amp;quot;down&amp;quot; the stacks from any particular point. 
This is commonly called a butterfly view.&amp;#160; Right-click any item in the stack column and experiment with the &lt;em&gt;callers/callees&lt;/em&gt; and &lt;em&gt;innermost/outermost&lt;/em&gt; options, like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_22.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="206" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_thumb_10.png" width="452" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;This trace has very simple call stacks, so it isn't very useful for looking at butterfly views.&amp;#160; But try one of your own programs and look at a butterfly stack view of a function that is called often from multiple places.&amp;#160; Or use the butterfly view to look at an intermediate function and see all the functions it calls, and their stacks. 
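&lt;/p&gt;  &lt;p&gt;Conceptually, a callers/callees view just regroups the sampled stacks around one function: every sample that contains the function is attributed to its immediate caller and to its immediate callee.&amp;#160; The toy Python sketch below illustrates that idea only; it is not how Performance Analyzer is implemented.&amp;#160; The function &lt;font face="Courier New"&gt;butterfly&lt;/font&gt; and the frame names are made up for illustration, and each sample is assumed to be a list of frames, outermost first.&lt;/p&gt;  &lt;pre&gt;
```python
from collections import Counter

def butterfly(samples, target):
    """Count samples by the immediate caller and callee of target.

    Each sample is a call stack given as a list of function names,
    outermost frame first.
    """
    callers, callees = Counter(), Counter()
    for stack in samples:
        if target not in stack:
            continue
        i = stack.index(target)  # outermost occurrence, so each sample counts once
        callers[stack[i - 1] if i else "(root)"] += 1
        # "(self)" means the sample landed in target itself (innermost frame).
        callees[stack[i + 1] if i + 1 != len(stack) else "(self)"] += 1
    return callers, callees

samples = [
    ["main", "Search", "Match"],
    ["main", "Search", "Match"],
    ["main", "Search"],
    ["Worker", "Search", "Match"],
]
callers, callees = butterfly(samples, "Search")
print(dict(callers))  # {'main': 3, 'Worker': 1}
print(dict(callees))  # {'Match': 3, '(self)': 1}
```
&lt;/pre&gt;  &lt;p&gt;In a real profile the samples come from the sample profile events and their stack walks, and the counts become the % Weight shown in the summary table.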
&lt;/p&gt;  &lt;p&gt;The above screen shots and summary table views contain the data from the entire trace.&amp;#160; This works well enough for short traces.&amp;#160; But for longer traces, or even short traces with a lot of detail, we often need to look at specific time spans.&amp;#160; &lt;/p&gt;  &lt;p&gt;For example, there are some time spans in my experiment where FS is using very little CPU time.&amp;#160; I'd like to see what FS is up to in those spans.&amp;#160; This is easily done by using the left mouse button to select a time span on the X axis, and then either zooming the graph to that span or looking at the summary table for it, as in this example.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_30.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="228" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_thumb_14.png" width="531" border="0" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_32.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; margin: 0px 0px 0px 5px; border-right-width: 0px" height="118" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_thumb_15.png" width="163" align="right" border="0" /&gt;&lt;/a&gt;Once the interesting region is selected, I simply use the right mouse button to pop up the context menu and select summary table.&amp;#160; &lt;/p&gt;  &lt;p&gt;Note that you can open multiple summary tables, each from a different region of a graph, or even from different graphs.&amp;#160; This is great for making comparisons.&lt;/p&gt;  &lt;p&gt;The new summary table window now 
only shows the data for the selected time span in the trace.&amp;#160; It is not surprising that FS is spending the little CPU time it uses in user-mode asynchronous procedure calls. &lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_34.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="288" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UsingWindowsSampleProfilerwithXperf_9E19/image_thumb_16.png" width="536" border="0" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;This post illustrates some key concepts:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;The xperf tools are designed for both system-wide and application-specific analysis. &lt;/li&gt;    &lt;li&gt;Profiling with ETW is very, very lightweight.&amp;#160; While the experiment is running, the xperf tools are not even loaded; the kernel itself collects the data.&amp;#160; All analysis is done as post-processing. &lt;/li&gt;    &lt;li&gt;OS-based sample profiling collects both user-mode and kernel-mode stacks.&amp;#160; &lt;/li&gt;    &lt;li&gt;So long as you have symbols, production code can be profiled; no special debug or instrumented builds are required. &lt;/li&gt;    &lt;li&gt;In this example, I started and stopped FS.exe (the experiment) between starting and stopping the trace.&amp;#160; But since this is ETW based, sample profiling can be started and stopped at any time, without stopping or restarting even a single process.&amp;#160; You can profile anything at any time on any system. &lt;/li&gt;    &lt;li&gt;Stack views provide a very powerful method for analyzing where time is spent in a process. &lt;/li&gt;    &lt;li&gt;The general technique with Performance Analyzer is to use the graphs to identify interesting time spans in the trace, then use the summary tables to look at the data in detail. 
&lt;/li&gt; &lt;/ul&gt;</description><category domain="http://blogs.msdn.com/pigscanfly/archive/tags/Performance/default.aspx">Performance</category><category domain="http://blogs.msdn.com/pigscanfly/archive/tags/xperf/default.aspx">xperf</category></item><item><title>Xperf support for XP</title><link>http://blogs.msdn.com/pigscanfly/archive/2008/02/24/xperf-support-for-xp.aspx</link><pubDate>Sun, 24 Feb 2008 20:56:35 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7880729</guid><dc:creator>rgr</dc:creator><slash:comments>5</slash:comments><comments>http://blogs.msdn.com/pigscanfly/comments/7880729.aspx</comments><wfw:commentRss>http://blogs.msdn.com/pigscanfly/commentrss.aspx?PostID=7880729</wfw:commentRss><wfw:comment>http://blogs.msdn.com/pigscanfly/rsscomments.aspx?PostID=7880729</wfw:comment><description>&lt;p&gt;&amp;quot;Do the xperf tools support XP or Windows Server 2003?&amp;quot; is a frequently asked question.&amp;#160; The answer is mostly no, but yes for a few things.&amp;#160; &lt;/p&gt;  &lt;p align="center"&gt;&lt;em&gt;The next article in this series is &lt;/em&gt;&lt;a href="http://blogs.msdn.com/pigscanfly/archive/2008/03/02/using-the-windows-sample-profiler-with-xperf.aspx"&gt;&lt;em&gt;Using the Windows Sample Profiler with 
Xperf&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;xperf.exe&lt;/strong&gt; can be used on Windows XP SP2 and Windows Server 2003 for turning tracing on and off, and for merging kernel trace data with user-mode traces into a single ETL file.&amp;#160; These operations are simply called &amp;quot;trace control&amp;quot;.&amp;#160; Note that the '&lt;strong&gt;&lt;font face="Courier New"&gt;-stackwalk&lt;/font&gt;&lt;/strong&gt;' switch is not supported on XP because its kernel doesn't support capturing the stack on events; this is a new feature in the Vista kernel.&lt;/p&gt;  &lt;p&gt;However, all operations that require trace decoding (and that's almost everything else) must be done on Vista or Windows Server 2008.&amp;#160; This includes viewing traces in the Windows Performance Analyzer tool (&lt;strong&gt;xperfview.exe&lt;/strong&gt;).&lt;/p&gt;  &lt;p&gt;The next question is this: &amp;quot;The xperf toolkit installer doesn't install the tools on XP or WS2003; how do I get the tools on those systems?&amp;quot;&lt;/p&gt;  &lt;p&gt;The answer is simple: from a Vista or WS2008 installation, copy &lt;strong&gt;&lt;font face="Courier New"&gt;xperf.exe&lt;/font&gt; &lt;/strong&gt;and &lt;strong&gt;&lt;font face="Courier New"&gt;perfctrl.dll&lt;/font&gt;&lt;/strong&gt; to the target system. 
This is all xperf needs to support trace control.&lt;/p&gt;  &lt;p&gt;After you have generated an ETL file, you can then copy it to a Vista or WS2008 system for trace decoding.&amp;#160; &lt;/p&gt;  &lt;p&gt;For those of you interested in the long story....&amp;#160; &lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&amp;lt;boooring&amp;gt;      &lt;br /&gt;&lt;/strong&gt;Event Tracing for Windows was first introduced in Windows 2000.&amp;#160; Back then, the OS only supported a small number of events; very few other Windows components used ETW.&amp;#160; In those days, event logging with ETW was in its infancy, and the people who wrote event consumers generally also wrote the code that produced the events, or worked closely with those who did.&lt;/p&gt;  &lt;p&gt;Back in the day, many event providers and consumers simply used the same C/C++ data structures to produce and consume events.&amp;#160; While simple, this sometimes broke because people wouldn't version the events correctly when the event structure changed.&amp;#160; In short, if the producer and consumer code weren't kept in sync, things were busted.&amp;#160; This became a real problem as ETW was used more broadly.&lt;/p&gt;  &lt;p&gt;This problem was solved by using metadata to describe events.&amp;#160; This allowed event consumers to decode events without built-in knowledge of each event's binary format.&amp;#160; This worked much better; it allowed the event provider author to change an event's binary format without breaking the consumer.&amp;#160; In the XP time frame, &lt;a href="http://www.wbemsolutions.com/tutorials/CIM/cim-mof.html" target="_blank"&gt;MOF files&lt;/a&gt; were used to describe events.&amp;#160; For example, you can find the description of the kernel's context switch event &lt;a href="http://msdn2.microsoft.com/en-us/library/aa964744(VS.85).aspx" target="_blank"&gt;here&lt;/a&gt;. 
&lt;/p&gt;  &lt;p&gt;Three things changed for Vista:&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;The entire Windows build system was updated so that every component was described by an XML-based &lt;a href="http://msdn2.microsoft.com/en-us/library/aa375632.aspx" target="_blank"&gt;manifest&lt;/a&gt;.&amp;#160; This included describing ETW events.&amp;#160; We deprecated the MOF format, and all new events were authored with XML-based descriptions in their manifests using the &lt;a href="http://msdn2.microsoft.com/en-us/library/aa384043(VS.85).aspx" target="_blank"&gt;Event Manifest Schema&lt;/a&gt;.&amp;#160; &lt;br /&gt;&lt;/li&gt;    &lt;li&gt;The use of ETW became very prevalent: many teams added event providers to their components and used them for &lt;a href="http://msdn2.microsoft.com/en-us/library/aa385780(VS.85).aspx" target="_blank"&gt;Windows Event Logging&lt;/a&gt; (which is ETW based), performance work, diagnostics, and testing.&amp;#160; For example, on my laptop, there are 985 registered ETW event providers.&amp;#160; Use the &amp;quot;xperf -providers&amp;quot; command to see what is registered on your system.       &lt;br /&gt;&lt;/li&gt;    &lt;li&gt;Our team decided to make a major investment in ETW-based tools, as did other teams around Windows.&amp;#160; This meant that metadata for events was very important, as it enabled event providers and consumers to be more decoupled and cohesive. 
&lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;But this posed one problem for us: should we fully support trace decoding on both Vista and XP, or just on Vista?&amp;#160; It was technically possible to keep trace decoding working on XP, but this would have required shipping some Vista components with the tools, because the required trace decoding infrastructure is only present on Vista.&amp;#160; Unfortunately, this wasn't possible for a variety of business, legal, and technical reasons.&amp;#160; It would also have doubled our test matrix.&lt;/p&gt;  &lt;p&gt;After much discussion, we decided that an easily workable compromise was to support trace collection on XP and require Vista or WS2008 for all trace decoding operations.    &lt;br /&gt;&lt;strong&gt;&amp;lt;/boooring&amp;gt;&lt;/strong&gt;&lt;/p&gt;</description><category domain="http://blogs.msdn.com/pigscanfly/archive/tags/Performance/default.aspx">Performance</category><category domain="http://blogs.msdn.com/pigscanfly/archive/tags/xperf/default.aspx">xperf</category></item><item><title>Using Xperf to take a Trace (updated)</title><link>http://blogs.msdn.com/pigscanfly/archive/2008/02/16/using-xperf-to-take-a-trace.aspx</link><pubDate>Sat, 16 Feb 2008 05:32:20 GMT</pubDate><guid 
isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7727028</guid><dc:creator>rgr</dc:creator><slash:comments>9</slash:comments><comments>http://blogs.msdn.com/pigscanfly/comments/7727028.aspx</comments><wfw:commentRss>http://blogs.msdn.com/pigscanfly/commentrss.aspx?PostID=7727028</wfw:commentRss><wfw:comment>http://blogs.msdn.com/pigscanfly/rsscomments.aspx?PostID=7727028</wfw:comment><description>&lt;p&gt;Let's get to it!&amp;#160; Here is how to take a basic trace and then look at CPU and disk utilization.&amp;#160; It's really simple: just three commands to turn on tracing, turn it off, and then view the trace. &lt;/p&gt;  &lt;p align="center"&gt;&lt;em&gt;The next article in this series is &lt;a href="http://blogs.msdn.com/pigscanfly/archive/2008/02/24/xperf-support-for-xp.aspx"&gt;Xperf support for XP&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;First, from an elevated command prompt window, enable a basic set of the kernel events using this command:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;&lt;font face="Courier New" size="2"&gt;xperf -on PROC_THREAD+LOADER+DISK_IO+HARD_FAULTS+INTERRUPT+DPC+CSWITCH -maxbuffers 1024&lt;/font&gt;&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;This command enables a set of events in the kernel and sets the maximum number of buffers to 1024.&amp;#160; The default size for each buffer is 64K.&amp;#160; So for this session, ETW will use up to 64MB of memory (1024 buffers of 64KB each) for its buffers.&amp;#160; As buffers are filled with events, they are written to the log file in the background and then made available again for accepting events.&amp;#160; By default, xperf sets the minimum number of buffers to 64.&amp;#160; ETW will start with this many buffers and only allocate more buffers if needed.&amp;#160; Events will only be lost if ETW cannot allocate more buffers and/or keep up with the event rate by writing data to the disk.&amp;#160; By default, the kernel events are written to \kernel.etl on the current drive. 
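&lt;/p&gt;  &lt;p&gt;The buffer arithmetic above is easy to redo for other settings.&amp;#160; Here is a small Python sketch that assumes the 64KB default buffer size quoted above.&amp;#160; The helper names are hypothetical.&amp;#160; The two-buffers-per-processor rule of thumb comes from the sample profiler post in this series.&lt;/p&gt;  &lt;pre&gt;
```python
import os

DEFAULT_BUFFER_KB = 64  # default ETW buffer size quoted in this post

def max_buffer_memory_mb(max_buffers, buffer_kb=DEFAULT_BUFFER_KB):
    """Worst-case memory ETW may use for buffers: count x size, in MB."""
    return max_buffers * buffer_kb / 1024

def enough_min_buffers(min_buffers, cpus=None):
    """Rule of thumb: at least two buffers for each processor."""
    cpus = cpus or os.cpu_count() or 1
    return min_buffers >= 2 * cpus

print(max_buffer_memory_mb(1024))      # 64.0, the 64MB mentioned above
print(enough_min_buffers(64, cpus=8))  # True: xperf's default of 64 covers 8 CPUs
```
&lt;/pre&gt;  &lt;p&gt;Raising -maxbuffers raises only the upper bound.&amp;#160; As described above, ETW starts with the minimum number of buffers and allocates more only when it needs them.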
&lt;/p&gt;  &lt;p&gt;Next, do something interesting.&amp;#160; It can be anything from opening a web page in Internet Explorer or compiling a program with Visual Studio, to something more complex like opening three or four Microsoft Office applications and doing some work.&lt;/p&gt;  &lt;p&gt;Run the following command when your interesting thing is done: &lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;strong&gt;&lt;font face="Courier New" size="2"&gt;xperf -d foo.etl&lt;/font&gt;&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p dir="ltr" style="margin-right: 0px"&gt;This simple command will take 10 to 30 seconds (or possibly longer) because it is merging the raw kernel event data with metadata and doing some other post-processing.&amp;#160; We call this 'stop and merge'.&amp;#160; Here is what this command does:&lt;/p&gt;  &lt;ol dir="ltr"&gt;   &lt;li&gt;     &lt;p&gt;Performs a 'run down', during which the kernel logs a set of events that describe the state of the system.&lt;/p&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;div dir="ltr" style="margin-right: 0px"&gt;Turns off the kernel logger.&lt;/div&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;div dir="ltr" style="margin-right: 0px"&gt;Interleaves data from multiple trace files and the kernel trace.&lt;/div&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;div dir="ltr" style="margin-right: 0px"&gt;Adds some metadata to the trace that is needed for processing the trace on other systems. 
This data is saved in the trace as a set of synthetic events.&lt;/div&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;div dir="ltr" style="margin-right: 0px"&gt;Saves the trace data to the file foo.etl (or the file name of your choice).&lt;/div&gt;   &lt;/li&gt; &lt;/ol&gt;  &lt;p dir="ltr" style="margin-right: 0px"&gt;Finally, load the trace in the Performance Analyzer with the following command:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p dir="ltr" style="margin-right: 0px"&gt;&lt;strong&gt;&lt;font face="Courier New" size="2"&gt;xperf foo.etl&lt;/font&gt;&lt;/strong&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p dir="ltr" style="margin-right: 0px"&gt;For this example, I took a trace while using Visual Studio 2008 to compile a program.&amp;#160; Here are screenshots of the CPU Usage by CPU and disk I/O count graphs.&lt;/p&gt;  &lt;p dir="ltr" style="margin-right: 0px"&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UseXperftotakeaTrace_D92D/image_2.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="167" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UseXperftotakeaTrace_D92D/image_thumb.png" width="494" border="0" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p dir="ltr" style="margin-right: 0px"&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UseXperftotakeaTrace_D92D/image_4.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="167" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UseXperftotakeaTrace_D92D/image_thumb_1.png" width="493" border="0" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p dir="ltr" style="margin-right: 0px"&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UseXperftotakeaTrace_D92D/image_8.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; margin: 0px 5px 
0px 0px; border-right-width: 0px" height="244" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UseXperftotakeaTrace_D92D/image_thumb_3.png" width="186" align="left" border="0" /&gt;&lt;/a&gt; Those are pretty interesting, but lots of things are running in the system, and I'd like to see just the CPU usage for Visual Studio itself.&lt;/p&gt;  &lt;p dir="ltr" style="margin-right: 0px"&gt;The CPU Usage by Process graph makes this easy: just click the flyout control on the left of the window and select the &lt;em&gt;CPU Usage by Process &lt;/em&gt;graph.&amp;#160; &lt;/p&gt;  &lt;p dir="ltr" style="margin-right: 0px"&gt;The flyout frame lists the graphs available for the events in the trace.&amp;#160; If the trace doesn't contain the events needed for a particular graph, that graph is not shown.&lt;/p&gt;  &lt;p dir="ltr" style="margin-right: 0px"&gt;Performance Analyzer automatically saves the graphs you have selected.&amp;#160; You can change them at any time. &lt;/p&gt;  &lt;p dir="ltr" style="margin-right: 0px"&gt;For my trace, the CPU usage for the DEVENV.EXE process and the two CL.EXE processes looked like this:&lt;/p&gt;  &lt;p dir="ltr" style="margin-right: 0px"&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UseXperftotakeaTrace_D92D/image_10.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="180" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/UseXperftotakeaTrace_D92D/image_thumb_4.png" width="532" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p dir="ltr" style="margin-right: 0px"&gt;DEVENV is the Visual Studio 2008 environment itself.&amp;#160;&amp;#160; The CL.EXE processes are the two compiler sessions it started, one for each CPU on my laptop. 
&lt;/p&gt;  &lt;p dir="ltr" style="margin-right: 0px"&gt;This simple example illustrates some key points:&lt;/p&gt;  &lt;ol dir="ltr"&gt;   &lt;li&gt;     &lt;div dir="ltr" style="margin-right: 0px"&gt;The kernel events can be enabled and disabled at any time.&amp;#160; There is no need to reboot the system, log out and back in, or restart processes to use the kernel events or any other ETW event provider.&amp;#160; ETW events from any source can be dynamically controlled at run time.        &lt;br /&gt;&lt;/div&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;div dir="ltr" style="margin-right: 0px"&gt;The xperf tools are designed for a post-processing model, in which a trace is captured and then analyzed later.&amp;#160;&amp;#160; This is in contrast to an observational model, in which you watch dynamic charts, graphics, or tabular data as something occurs.&amp;#160; The reason for this model is that ETW and the tools are designed to be efficient at logging time.        &lt;br /&gt;&lt;/div&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;div dir="ltr" style="margin-right: 0px"&gt;This model is also specifically designed for taking traces on one machine and then analyzing them on another.&amp;#160;&amp;#160; This ability is critical for running performance tests in a lab setting.        
&lt;br /&gt;&lt;/div&gt;   &lt;/li&gt;    &lt;li&gt;     &lt;div dir="ltr" style="margin-right: 0px"&gt;The tools let you look at both system-wide activity and process-specific activity.&lt;/div&gt;   &lt;/li&gt; &lt;/ol&gt;</description><category domain="http://blogs.msdn.com/pigscanfly/archive/tags/Performance/default.aspx">Performance</category><category domain="http://blogs.msdn.com/pigscanfly/archive/tags/xperf/default.aspx">xperf</category></item><item><title>Xperf Tools Landing Page and Update</title><link>http://blogs.msdn.com/pigscanfly/archive/2008/02/11/xperf-tools-landing-page-and-update.aspx</link><pubDate>Mon, 11 Feb 2008 04:26:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7597045</guid><dc:creator>rgr</dc:creator><slash:comments>3</slash:comments><comments>http://blogs.msdn.com/pigscanfly/comments/7597045.aspx</comments><wfw:commentRss>http://blogs.msdn.com/pigscanfly/commentrss.aspx?PostID=7597045</wfw:commentRss><wfw:comment>http://blogs.msdn.com/pigscanfly/rsscomments.aspx?PostID=7597045</wfw:comment><description>&lt;p&gt;The &lt;a href="http://www.microsoft.com/whdc/default.mspx" target="_blank" mce_href="http://www.microsoft.com/whdc/default.mspx"&gt;WHDC&lt;/a&gt; folks now have a web page set up 
for the &lt;a class="" href="http://www.microsoft.com/whdc/system/sysperf/perftools.mspx" target="_blank" mce_href="http://www.microsoft.com/whdc/system/sysperf/perftools.mspx"&gt;Windows Performance Toolkit&lt;/a&gt; (aka the 'xperf tools').&amp;#160; The page includes downloads for updates to the versions that ship in the SDK.&amp;#160; In the near future, this page will include pointers to updated documentation and discussion forums.&lt;/p&gt;  &lt;p align="center"&gt;&lt;em&gt;The next article in this series is &lt;a href="http://blogs.msdn.com/pigscanfly/archive/2008/02/16/using-xperf-to-take-a-trace.aspx"&gt;Using Xperf to take a Trace (updated)&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;This page includes downloads for V4.1.1 of the Windows Performance Toolkit.&amp;#160; This download is an update to the SDK version and includes fixes for two small bugs in the version included in the Windows SDK.&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;The COM plug-in that handles the power management events is not named correctly, so the power state transition analysis feature doesn't work. 
&lt;/li&gt; &lt;/ol&gt;</description><category domain="http://blogs.msdn.com/pigscanfly/archive/tags/Performance/default.aspx">Performance</category><category domain="http://blogs.msdn.com/pigscanfly/archive/tags/xperf/default.aspx">xperf</category></item><item><title>Xperf, a new tool in the Windows SDK</title><link>http://blogs.msdn.com/pigscanfly/archive/2008/02/09/xperf-a-new-tool-in-the-windows-sdk.aspx</link><pubDate>Sat, 09 Feb 2008 04:59:47 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7550033</guid><dc:creator>rgr</dc:creator><slash:comments>15</slash:comments><comments>http://blogs.msdn.com/pigscanfly/comments/7550033.aspx</comments><wfw:commentRss>http://blogs.msdn.com/pigscanfly/commentrss.aspx?PostID=7550033</wfw:commentRss><wfw:comment>http://blogs.msdn.com/pigscanfly/rsscomments.aspx?PostID=7550033</wfw:comment><description>&lt;p&gt;The SDK team just &lt;a href="http://blogs.msdn.com/windowssdk/archive/2008/02/07/windows-sdk-rtms.aspx" target="_blank"&gt;shipped the latest version of the Windows SDK&lt;/a&gt;, which supports Windows Server 2008 and Vista SP1.&amp;#160; The SDK now includes an important new tool: the &lt;strong&gt;Windows Performance Toolkit&lt;/strong&gt; from the Windows performance team (we call them the xperf tools for 
short).&lt;/p&gt;  &lt;p align="center"&gt;&lt;em&gt;This is the first article in the xperf series; the next one is      &lt;br /&gt;&lt;/em&gt;&lt;a href="http://blogs.msdn.com/pigscanfly/archive/2008/02/11/xperf-tools-landing-page-and-update.aspx"&gt;&lt;em&gt;Xperf Tools Landing Page and Update&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The xperf tools have long been used internally by our team, and widely throughout Windows, for system-wide performance analysis.&amp;#160; Xperf got its start many years ago as a set of command-line tools that produce reports based on the &lt;a href="http://msdn.microsoft.com/msdnmag/issues/07/04/ETW/default.aspx" target="_blank"&gt;ETW&lt;/a&gt; instrumentation in the kernel[1]. Many other components and applications in Windows are instrumented with ETW, and xperf can enable these events, dump them, and analyze them. &lt;/p&gt;  &lt;p&gt;Xperf is an important tool for anyone doing system performance work on Windows because it's specifically designed to give you a complete system-wide view of performance over long periods of time (tens of seconds to minutes)[2].&amp;#160; It's also the only tool that knows how to fully process all the events from the kernel and correlate them into something that makes sense.&amp;#160; &lt;/p&gt;  &lt;p&gt;For example, here is a detailed graph of all the disk I/O to the system drive on my laptop while I opened this post, edited it a bit, and then closed Live Writer.&amp;#160; &lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/NewPerformanceToolintheWindowsSDKXPerf_11405/screen-capture%5B5%5D.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="180" alt="screen-capture[5]" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/NewPerformanceToolintheWindowsSDKXPerf_11405/screen-capture%5B5%5D_thumb.png" width="517" border="0" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Here 
is an example of the CPU and disk utilization during an Outlook 2007 launch:&lt;/p&gt;  &lt;p&gt;&amp;#160;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/NewPerformanceToolintheWindowsSDKXPerf_11405/image_10.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="215" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/NewPerformanceToolintheWindowsSDKXPerf_11405/image_thumb_4.png" width="516" border="0" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Here is the same view, but with the data from all processes visible:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/NewPerformanceToolintheWindowsSDKXPerf_11405/image_14.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="216" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/NewPerformanceToolintheWindowsSDKXPerf_11405/image_thumb_6.png" width="522" border="0" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/NewPerformanceToolintheWindowsSDKXPerf_11405/image_4.png"&gt;&lt;img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; margin: 0px 0px 0px 5px; border-right-width: 0px" height="298" alt="image" src="http://blogs.msdn.com/blogfiles/pigscanfly/WindowsLiveWriter/NewPerformanceToolintheWindowsSDKXPerf_11405/image_thumb_1.png" width="137" align="right" border="0" /&gt;&lt;/a&gt;In addition to graphical displays, the tools can also display tabular data (what we call &amp;quot;summary data&amp;quot;). 
The screen capture to the right is a table of sample profile events for a 6.5-second period during a find-string operation over a tree of source code.&amp;#160; For that period, 73.93% of the total CPU time was in the idle thread, 6.78% was in the find-string utility, and the rest of the time was distributed among services, the system, xperf itself (at 3%), and other processes. As you start playing with the summary tables, try rearranging the columns to get different views of the data; for example, grouping I/Os by process, I/O type (read/write/...), I/O size, I/O service time, and so forth.&lt;/p&gt;  &lt;p&gt;These simple examples barely scratch the surface of the data that the tools can gather and the richness of the information they can display.&amp;#160; The tools have several other important features, including:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Full support for symbol decoding.&amp;#160; This uses the same mechanism as the &lt;a href="http://www.microsoft.com/whdc/devtools/debugging/default.mspx" target="_blank"&gt;Debugging Tools for Windows&lt;/a&gt;, including full support for the &lt;a href="http://www.microsoft.com/whdc/devtools/debugging/debugstart.mspx" target="_blank"&gt;public Windows symbols&lt;/a&gt; and for your own symbols. &lt;/li&gt;    &lt;li&gt;The ability to dump all the events from a trace file to a CSV file.&amp;#160; If the summary tables don't display what you want, you can write your own trace processing tools on top of the text dump or the (generally XML-based) output of the command-line actions. &lt;/li&gt;    &lt;li&gt;Windows Vista supports collecting stack traces on all the kernel events.&amp;#160;&amp;#160; One of the most useful things to do is to collect stack traces on the sample profile event. This is an extremely powerful way to understand where and why a program is spending time. 
&lt;/li&gt;    &lt;li&gt;The xperf command-line tool can be used to control all the ETW trace providers in a system, including all the kernel events.   &lt;/li&gt;    &lt;li&gt;The xperf distribution also contains a quick start guide and a basic reference manual.&amp;#160; Just look for the document &lt;strong&gt;Performance.Analyzer.QuickStart.docx&lt;/strong&gt;; it's also available in XPS format. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;In the coming weeks, I'll blog more about the tools, how to use them, and the kernel ETW events.&amp;#160;&amp;#160; We'll also soon have a web page up for the tools.&amp;#160;&amp;#160; That is where you will find updates, additional documentation, and a message forum.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;u&gt;&lt;font color="#008000"&gt;Now!&amp;#160;&amp;#160; Here is how you can get the tools!&lt;/font&gt; &lt;/u&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;Install the SDK by &lt;a href="http://www.microsoft.com/downloads/details.aspx?FamilyId=F26B1AA4-741A-433A-9BE5-FA919850BDBF&amp;amp;displaylang=en" target="_blank"&gt;downloading the ISO image&lt;/a&gt;, or &lt;a href="http://www.microsoft.com/downloads/details.aspx?FamilyId=E6E1C3DF-A74F-4207-8586-711EBE331CDC&amp;amp;displaylang=en" target="_blank"&gt;using the Web-based installer&lt;/a&gt;. 
&lt;/li&gt;    &lt;li&gt;Find the xperf MSI in the SDK's &amp;quot;bin&amp;quot; directory.&amp;#160;&amp;#160; It will be named &lt;strong&gt;xperf_x86.msi&lt;/strong&gt;, &lt;strong&gt;xperf_x64.msi&lt;/strong&gt;, or &lt;strong&gt;xperf_ia64.msi&lt;/strong&gt;, depending on the architecture for which you install the SDK.&amp;#160;&amp;#160; &lt;/li&gt;    &lt;li&gt;You can then install the xperf tools from the MSI directly, or copy the xperf MSI file to another location and install it from there.&amp;#160; For example, you could keep the MSI files on a USB key. &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;We'll soon have a web page up for the tools on the MSDN site... stay tuned!&lt;/p&gt;  &lt;p&gt;[1]&amp;#160; You can see the events supported by the kernel in the docs for the &lt;strong&gt;EnableFlags&lt;/strong&gt; field of the &lt;a href="http://msdn2.microsoft.com/en-us/library/aa363784(VS.85).aspx" target="_blank"&gt;&lt;strong&gt;EVENT_TRACE_PROPERTIES&lt;/strong&gt;&lt;/a&gt; structure.&amp;#160; I'm going to blog more about these...&lt;/p&gt;  &lt;p&gt;[2] The xperf tools from the Windows Performance Toolkit are very complementary to the &lt;a href="http://technet.microsoft.com/en-us/sysinternals/default.aspx" target="_blank"&gt;SysInternals tools&lt;/a&gt;. 
&lt;/p&gt;</description><category domain="http://blogs.msdn.com/pigscanfly/archive/tags/Performance/default.aspx">Performance</category><category domain="http://blogs.msdn.com/pigscanfly/archive/tags/xperf/default.aspx">xperf</category></item></channel></rss>