<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Funny, It Worked Last Time : Performance</title><link>http://blogs.msdn.com/ryanmy/archive/tags/Performance/default.aspx</link><description>Tags: Performance</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Event Tracing for Windows (ETW) -- Part 2</title><link>http://blogs.msdn.com/ryanmy/archive/2005/06/09/427520.aspx</link><pubDate>Fri, 10 Jun 2005 00:42:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:427520</guid><dc:creator>ryanmy</dc:creator><slash:comments>3</slash:comments><comments>http://blogs.msdn.com/ryanmy/comments/427520.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ryanmy/commentrss.aspx?PostID=427520</wfw:commentRss><description>&amp;nbsp;&amp;nbsp;&amp;nbsp; So, there were two major groups of comments on the last post, and I'll try to address them.&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; The first was a question about &lt;b&gt;managed support&lt;/b&gt;
for ETW.&amp;nbsp; I talked to the ETW team, and the current state is that
there is no official managed interface for ETW.&amp;nbsp; Being a standard
Win32 API, it is possible to &lt;a href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpconconsumingunmanageddllfunctions.asp"&gt;PInvoke&lt;/a&gt;
the functions involved, and several internal teams have written their
own managed wrappers around ETW.&amp;nbsp; This isn't expected to change up
through Whidbey (Visual Studio 2005); for Orcas (the version of VS
after that!) an official managed interface is on the table.&amp;nbsp; The
second is one of &lt;b&gt;backwards compatibility&lt;/b&gt;
-- ETW is only available in Win2K and later OSes.&amp;nbsp; Users will
expect that software works similarly on all OSes; thus, if you want to
support 9x-era OSes, you have to write your own logging code
anyways.&amp;nbsp; So, if you are already putting in old-style logging, why
use ETW?&amp;nbsp; I'll try to answer that with this entry.&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; One of the gains of ETW is that it's fast; you can
spit out thousands of events per second while using relatively little
CPU, far faster than you can fprintf() a string to disk.&amp;nbsp; The
biggest gain, though, is combining multiple sources -- including
information outside your own process.&amp;nbsp; And the most notable
external source is the kernel.&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; The XP and Codename Longhorn kernels are &lt;a href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/perfmon/base/nt_kernel_logger_constants.asp"&gt;extremely extensive providers&lt;/a&gt;.&amp;nbsp; They can be enabled to log any or all of the following event groups, for which we publish decoding information:&lt;br&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;i&gt;Hardware configuration events&lt;/i&gt; -- notes on the system's CPUs, hard drives, NICs, video card, and ACPI power states&lt;/li&gt;
  &lt;li&gt;&lt;i&gt;Disk-level I/O&lt;/i&gt; -- every I/O on the system, including IRP flags, operation time in microseconds, number of bytes, and target disk&lt;/li&gt;
  &lt;li&gt;&lt;i&gt;File-level I/O&lt;/i&gt; -- every access to every file on the system (including information to tie it to the disk I/Os above)&lt;/li&gt;
  &lt;li&gt;&lt;i&gt;Image layouts&lt;/i&gt; -- filenames, locations in memory, and PIDs for every image in the system&lt;/li&gt;
  &lt;li&gt;&lt;i&gt;Page faults&lt;/i&gt; -- pointers
to instructions and pages whenever a fault occurs (including COWs,
demand-zero faults, hard page faults, transition faults, and guard
pages)&lt;/li&gt;
  &lt;li&gt;&lt;i&gt;Network I/O&lt;/i&gt; -- all TCP and UDP actions, including connects/accepts, transmits (and retransmits!), receives, etc.&lt;/li&gt;
  &lt;li&gt;&lt;i&gt;Registry I/O&lt;/i&gt; -- all Registry key/value creations/deletions/changes, registry flushes, etc.&lt;/li&gt;
  &lt;li&gt;&lt;i&gt;Process and thread info&lt;/i&gt; -- all creations/deletions of processes and threads&lt;/li&gt;
&lt;/ul&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; And Codename Longhorn adds even more events -- most notably,
the ability to trace extremely fine-grained high-frequency events such
as individual context switches, interrupts (ISRs and DPCs), etc.&amp;nbsp;&amp;nbsp;  &lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; (FYI, since all of the above information is
exceedingly detailed, you can only enable the kernel provider if you
have Administrator privileges, are part of the Performance Log Users
group, or are a service running as LocalSystem, LocalService, or
NetworkService.)&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; Thus, ETW can be used as an effective debugging
tool.&amp;nbsp; By allowing ETW to pull from, sort, and combine events from
multiple providers, you can get a powerful log of everything the system
was doing, probably the most accurate log available (save for running
the entire OS in a debugger).&amp;nbsp; It's an incredible tool for
noticing "hey, things act strangely when X, Y, and Z, but not W, are
happening" at a system level, as well as a code level, and it takes far
less time than getting symbols and attaching a debugger/profiler to the
system.&amp;nbsp; And it's all available to devs -- and even to users,
given a generic ETW tool such as tracelog in Server 2003!&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; Even if you don't specifically use the kernel as an
information source, ETW's ability to combine providers is useful for
mixing and matching information from multiple DLLs, EXEs, etc. in a
system.&amp;nbsp; ETW events are timestamped by the kernel to extremely
high resolution (RDTSC on stable machines, converted to microsecond
intervals; MM timers on others) and are automatically sorted at process
time, so you don't have to write or parse plaintext date/time formats.&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; The goal I'm driving at is that when you have more
than one DLL or EXE providing information, individually implementing
logging for each component means that you usually need a third app to
read in the logs from each component and combine them into a single log
with coherent event ordering, and this can be difficult -- especially
if you have to tie it to some event.&amp;nbsp; ETW allows you to automate
all that, and it's exceedingly efficient at it as well.&amp;nbsp; Even if
you are only personally maintaining one component, ETW can log very
quickly, and it can be shipped in retail builds -- and if you publish
the structures of some or all of the events you provide, you can give
valuable information to your consumers and to future devs without ever
needing to work with them.&lt;br&gt;
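&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; For comparison, here is a minimal Python sketch of the homebrew "third app" described above, the piece that ETW spares you from writing.&amp;nbsp; It is only an illustration, not anything ETW does internally: the plaintext timestamp format, the component names, and the helpers parse_line and merge_logs are all assumptions made up for this example.&amp;nbsp; It assumes each component's log is already in time order.&lt;br&gt;
&lt;pre&gt;
```python
import heapq
from datetime import datetime

def parse_line(line, component):
    # Hypothetical plaintext format: "2005-06-09 17:42:00.123456 message text"
    stamp, message = line[:26], line[27:]
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
    return (when, component, message)

def merge_logs(logs):
    # logs maps a component name to its lines, each list already in time order.
    streams = [
        [parse_line(line, name) for line in lines]
        for name, lines in logs.items()
    ]
    # heapq.merge interleaves the sorted streams into one ordered stream.
    return list(heapq.merge(*streams))
```
&lt;/pre&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; Even this toy version has to agree on a timestamp format with every component it reads, which is exactly the coordination problem a shared kernel timestamp removes.&lt;br&gt;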
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; Next entry, I'll start discussing how providers are
written, starting with thread structures and common ways of publishing
event structs.&lt;br&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=427520" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ryanmy/archive/tags/Performance/default.aspx">Performance</category></item><item><title>Event Tracing for Windows (ETW)</title><link>http://blogs.msdn.com/ryanmy/archive/2005/05/27/422772.aspx</link><pubDate>Sat, 28 May 2005 05:13:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:422772</guid><dc:creator>ryanmy</dc:creator><slash:comments>10</slash:comments><comments>http://blogs.msdn.com/ryanmy/comments/422772.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ryanmy/commentrss.aspx?PostID=422772</wfw:commentRss><description>&amp;nbsp;&amp;nbsp;&amp;nbsp; A lot of work in performance tuning is
organizational.&amp;nbsp; There's only so much work one can do with a
profiler and a single module.&amp;nbsp; A good example is the Registry --
we can attach profilers to the Registry access routines and optimize
them until they run as smooth as silk, but performance will still be
impacted if you do thousands of Registry accesses per second.&amp;nbsp; For
many problems, the cause is systemic: several components in a chain of
command that are individually well-tuned, but didn't expect to call
each other in a huge chain.&amp;nbsp; A good example of that is DirectShow
-- no matter how skillfully crafted an individual filter is, if the
mean path in a DirectShow graph is ten filters deep (with memory
management between each one for passing buffers of audio or video
around), latency is going to be high.&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; More often than not, the best solution is simply
logging.&amp;nbsp; Log when filters are instantiated or connected, log when
Registry accesses are made, etc.&amp;nbsp; You want to mark high-level
concepts, and try to get a picture of what's going on with the system
as a whole.&amp;nbsp; This works fine if you only have one application that
has to log... but more often than not, these systemic problems have
hundreds of files involved, most of which aren't coded by you!&amp;nbsp; If
every programmer performs their logging in a different way, it can be a
nightmare to combine all those logs together, mixing different types of
timestamps and different methods of delivery, and get a single ordered
log of what happened over time.&amp;nbsp; Of course, that's exactly what we
need... and that's where Event Tracing for Windows, or ETW for short,
comes in.&lt;br&gt;
&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; ETW is, at its core, a unified system for one-way
packetized I/O managed by the Windows kernel, built for logging.&amp;nbsp;
Every use of ETW has three participants in it -- the controller, the
provider, and the consumer:&lt;br&gt;
&lt;ul&gt;
  &lt;li&gt;
A &lt;b&gt;provider&lt;/b&gt; is a module (DLL/EXE) doing something worth logging.&amp;nbsp;
Most of the time, it runs without logging; it can, however, be
"enabled" by a controller, at which point it receives a handle from the
kernel and starts logging "events" to that handle.&amp;nbsp; An event is
an arbitrary struct (binary block) of data, the only condition being
that it start with a 48-byte header.&amp;nbsp; This header contains a
timestamp and identifying information.&lt;br&gt;
    &lt;br&gt;
  &lt;/li&gt;
  &lt;li&gt;
A &lt;b&gt;controller&lt;/b&gt; controls the actual act of logging.&amp;nbsp; The controller
can ask the kernel to start a logging session, creating a handle and
specifying that the kernel should take any events delivered to that
handle and save them to a file.&amp;nbsp; (That file is usually on a hard
drive, although we occasionally save them to RAM drives to ensure
minimal interference.)&amp;nbsp; The controller can also enable and disable
logging by providers, passing them a handle to log to.&lt;br&gt;
    &lt;br&gt;
  &lt;/li&gt;
  &lt;li&gt;
A &lt;b&gt;consumer&lt;/b&gt; reads events out of a file created by a logging session and
parses them.&amp;nbsp; (It is also technically possible to have a consumer
directly attach to a logging session's handle and retrieve events in
real-time, but this is rare.)&lt;/li&gt;
&lt;/ul&gt;
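&amp;nbsp;&amp;nbsp;&amp;nbsp; To make the three roles concrete, here is a toy, in-process Python model of them.&amp;nbsp; It is emphatically not the ETW API: the class names Session, Provider, and Controller and the consume function are invented for this sketch, and a plain list stands in for the kernel-managed session and its log file.&lt;br&gt;
&lt;pre&gt;
```python
class Session:
    # Stands in for a kernel logging session: an ordered buffer of events.
    def __init__(self):
        self.events = []

class Provider:
    def __init__(self, name):
        self.name = name
        self.session = None  # disabled until a controller enables it

    def log(self, payload):
        # While disabled, logging is a cheap no-op.
        if self.session is not None:
            self.session.events.append((self.name, payload))

class Controller:
    def start_session(self):
        return Session()

    def enable(self, provider, session):
        # Hand the provider something to log to.
        provider.session = session

    def disable(self, provider):
        provider.session = None

def consume(session):
    # A consumer reads the recorded events back and parses them.
    return [name + ": " + payload for name, payload in session.events]
```
&lt;/pre&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp; The point of the model is the division of labor: the provider never decides whether or where to log, it only reacts to being enabled.&lt;br&gt;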



&amp;nbsp;&amp;nbsp;&amp;nbsp; So, why use this system over your own homebrew system?&lt;br&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;b&gt;Uniformity&lt;/b&gt;.&amp;nbsp; If you're debugging systemic problems involving
multiple components, and all the involved components use ETW, you can
have them all deliver their information to a single log file with
uniform, steady timestamps, and write a single application that parses
them all.&lt;br&gt;
    &lt;br&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;b&gt;Speed&lt;/b&gt;.&amp;nbsp; ETW is extremely fast for providers to use, since all the
I/O is handled by the kernel instead of by your module.&amp;nbsp; It
typically takes &lt;b&gt;only 1500-2000 cycles&lt;/b&gt;, depending on settings, to
deliver an event and return to your code.&amp;nbsp; One can easily deliver
thousands of events per second even on ancient machines.&amp;nbsp; We've
achieved &lt;b&gt;20,000 events per second while only using 5% CPU load
on a P3 500MHz&lt;/b&gt;!&amp;nbsp; &lt;i&gt;(Yes, we have machines that old in our
perf testing labs -- not everyone who uses Longhorn will be using a
modern machine!)&lt;/i&gt;&lt;br&gt;
    &lt;br&gt;
  &lt;/li&gt;&lt;li&gt;
    &lt;b&gt;Consistency&lt;/b&gt;.&amp;nbsp; With fprintf() or other homebrew systems, logging
tends to be very slow and intrusive and is thus usually compiled
out of shipping builds.&amp;nbsp; With ETW, logging is extremely fast; furthermore, since
logging is turned on by a controller and is usually off by default, you
can actually leave the ETW events in final shipping code!&amp;nbsp; If
problems are found in the field, send the tester an app that starts a
trace and turns on the provider, then read it later.&amp;nbsp; Many, many
components in Longhorn will ship as ETW providers.&lt;br&gt;
    &lt;br&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;b&gt;
Reliability&lt;/b&gt;.&amp;nbsp; ETW isn't a new thing -- it's actually been in the
OS and actively used since Win2K, and has been constantly refined since
then.&amp;nbsp; Furthermore, ETW is available in both user-mode apps and
kernel components.&amp;nbsp; (The latter access it through an
IRP_MJ_SYSTEM_CONTROL IRP.)&amp;nbsp; This leads to...&lt;br&gt;
    &lt;br&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;b&gt;
OS cooperation&lt;/b&gt;.&amp;nbsp; The Windows kernel can provide many highly useful
events via ETW for diagnosing performance problems.&amp;nbsp; Find out when
and where disk I/Os, registry accesses, hard faults, and other
performance problems happen!&amp;nbsp; More on this later...&lt;/li&gt;
&lt;/ul&gt;





&amp;nbsp;&amp;nbsp;&amp;nbsp; I'll start discussing the actual APIs in the next
entry -- those whose curiosity has been piqued can jump into the &lt;a href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/perfmon/base/event_tracing.asp"&gt;MSDN
documentation&lt;/a&gt;, which is not very good IMO but better than
nothing.&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=422772" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ryanmy/archive/tags/Performance/default.aspx">Performance</category></item><item><title>Misinformation and the Prefetch Flag</title><link>http://blogs.msdn.com/ryanmy/archive/2005/05/25/421882.aspx</link><pubDate>Thu, 26 May 2005 02:27:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:421882</guid><dc:creator>ryanmy</dc:creator><slash:comments>24</slash:comments><comments>http://blogs.msdn.com/ryanmy/comments/421882.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ryanmy/commentrss.aspx?PostID=421882</wfw:commentRss><description>&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;Hello!&amp;nbsp; I haven't updated this blog in a
while; work and other events have conspired to keep me from
writing.&amp;nbsp; Also, blogs.msdn.com moved internally from .Text to
Telligent Community Server, and my CSS markup was an unfortunate
casualty of the move, so I'm working on redesigning the blog's visual
appearance.&amp;nbsp; More entries will be coming eventually.&amp;nbsp; :)&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;In the meantime, I want to defuse a long-standing controversy -- the /prefetch flag.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;With modern computing, the absolute worst thing
you can ever do for performance is touch the hard drive -- or
any non-memory storage for that matter.&amp;nbsp; The fastest hard drives
on earth are still horridly slow compared to a PC's main memory; even
with solid state drives, in order to access the drive,&amp;nbsp;one has to
jump into system code and drivers, and this will push your own
program's code out of the CPU's L2 cache.&amp;nbsp; (This is called a
locality loss.)&amp;nbsp; There are two typical reasons one has to touch the
disk --&amp;nbsp;the first is when the application requests it explicitly
(Word asks the OS to load blog.doc into memory), and the other is a
"hard fault" -- when the application tries to use memory that has been
paged out to disk via "virtual memory" and needs to be paged back in.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;Now, imagine that a DVD player program always
starts playback by loading a DLL to decode MPEG-2 video.&amp;nbsp; Wouldn't
it be nice if we could attempt to pre-load the MPEG-2 DLL whenever we
loaded the DVD player's EXE?&amp;nbsp; That way, when it tries to run code
in that DLL, one doesn't have to hard fault and go to disk for
it!&amp;nbsp;&amp;nbsp; This&amp;nbsp;is what a prefetcher does: it tracks what
code pages are used by an application, and&amp;nbsp;the next time that
application loads, it loads those pages in advance as soon as it's got
some idle time.&amp;nbsp; A prefetcher was added to Windows in XP, and is
vastly improved in Windows Longhorn.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;XP systems have a Prefetch directory underneath
the windows root directory,&amp;nbsp;full of .pf files -- these are lists
of pages to load.&amp;nbsp; The file names are generated from hashing the
EXE to load -- whenever you load the EXE, we hash, see if there's a
matching &lt;strong&gt;(exename)&lt;/strong&gt;-(&lt;strong&gt;hash).pf&lt;/strong&gt;
file in the prefetch directory, and if so we load those pages.&amp;nbsp;
(If it doesn't exist, we track what pages it loads, create that file,
and pick a handful of them to save to it.)&amp;nbsp; So, first off, &lt;em&gt;it is a&amp;nbsp;bad idea to periodically clean out that folder&lt;/em&gt;
as some tech sites suggest.&amp;nbsp; For one thing, XP will just re-create
that data anyways; for another, it already trims the files if there are ever
more than 128 of them, so that they don't needlessly consume space.&amp;nbsp; So not only is deleting the directory
totally unnecessary, but you're also putting a temporary dent in your
PC's performance.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;Secondly, one can specify a &lt;strong&gt;/prefetch:#&lt;/strong&gt;
flag when launching an app.&amp;nbsp; Many&amp;nbsp;people have noticed that
auto-generated shortcuts to Windows Media Player do this, and the
number varies depending on what it does.&amp;nbsp; For example, the
shortcut used by the shell when you double-click a WMV file to play it
has one prefetch number; the auto-run shortcuts to play or rip music
that appear when you insert a music CD have other numbers.&amp;nbsp; Some
sites have guessed that this switch turns on prefetching, and suggest
that you add that to every executable you care about -- this has
appeared on &lt;a href="http://www.iamnotageek.com/a/67-p1.php"&gt;so&lt;/a&gt; &lt;a href="http://www.pcmech.com/show/optimize/677/7"&gt;many&lt;/a&gt;, &lt;a href="http://www.winguides.com/forums/showflat.php?Cat=&amp;amp;Board=brdNewTweaks&amp;amp;Number=90351&amp;amp;page=6&amp;amp;view=collapsed&amp;amp;sb=5&amp;amp;part=1"&gt;many&lt;/a&gt;, &lt;a href="http://www.softwaretipsandtricks.com/windowsxp/articles/416/1/Load-Applications-Faster"&gt;&lt;em&gt;many&lt;/em&gt;&lt;/a&gt; &lt;a href="http://www.tweakguides.com/Firefox_12.html"&gt;sites&lt;/a&gt; that it has become an urban legend.&amp;nbsp; &lt;a href="http://www.edbott.com/weblog/archives/000621.html"&gt;Other sites&lt;/a&gt;
write this off as garbage and guess that it's a switch specific to
Media Player, guessing from references to prefetching in the Windows
driver subsystem.&amp;nbsp; &lt;em&gt;Both guesses are incorrect.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;The &lt;strong&gt;/prefetch:#&lt;/strong&gt; flag is looked at
by the OS when we create the process -- however, it has one (and only
one) purpose.&amp;nbsp; We add the passed number to the hash.&amp;nbsp;
Why?&amp;nbsp; WMP is a multipurpose application and may do many different
things.&amp;nbsp; The DLLs and code that it touches will be very different
when&amp;nbsp;playing a WMV than when playing a DVD, or when ripping a CD,
or when listening to a Shoutcast stream, or any of the other things
that WMP can do.&amp;nbsp; If we only had one hash for WMP, then the
prefetch would only be correct for one such use.&amp;nbsp; Having incorrect
prefetch data would not be a fatal error -- it'd just load pages into
memory that'd never get used, and then get swapped back out to disk as
soon as possible.&amp;nbsp; Still, it's counterproductive.&amp;nbsp; By
specifying a &lt;strong&gt;/prefetch:#&lt;/strong&gt; flag with a different number
for each "mode" that WMP can do, each mode gets its own separate hash
file, and thus we properly prefetch.&amp;nbsp; (This behavior isn't specific to WMP -- it does the same for any app.)&lt;br&gt;
&lt;/p&gt;
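&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;As a rough illustration of that idea, here is a Python sketch.&amp;nbsp; It is not the real algorithm: the function prefetch_file_name is hypothetical, CRC-32 is only a stand-in for the actual hash (which isn't described here), hashing the path string is an assumption, and the 32-bit wraparound is an assumption too.&amp;nbsp; The only thing it is meant to show is that adding a different number yields a different .pf file name for the same EXE.&lt;/p&gt;
&lt;pre&gt;
```python
import zlib

def prefetch_file_name(exe_path, prefetch_number=0):
    # Stand-in hash: CRC-32 of the upper-cased path. The real XP hash is
    # not described in the post; this only illustrates the idea.
    base_hash = zlib.crc32(exe_path.upper().encode("utf-8"))
    # Per the post, the number passed via /prefetch:# is added to the hash.
    combined = (base_hash + prefetch_number) % (2 ** 32)
    exe_name = exe_path.replace("\\", "/").split("/")[-1].upper()
    return "{}-{:08X}.pf".format(exe_name, combined)
```
&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;Calling it with the same path and the numbers 1 and 2 gives two distinct file names, one per "mode", which is all the flag buys you.&lt;/p&gt;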
&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;This flag is looked at when we create the first thread in the process, but it is &lt;em&gt;not&lt;/em&gt;
removed by CreateProcess from the command line, so any app that chokes
on unrecognized command line parameters will not work with it.&amp;nbsp;
This is why so many people notice that Kazaa and other apps crash or
otherwise refuse to start when it's added.&amp;nbsp; Of course, WMP knows
that it may be there, and just silently ignores its existence.&lt;/p&gt;
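&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;If you maintain an app and want it to tolerate the flag, one possible approach is to filter it out before your own argument parsing runs.&amp;nbsp; The sketch below is in Python purely for illustration; strip_prefetch_args is a made-up helper, and this is not a description of how WMP handles it.&lt;/p&gt;
&lt;pre&gt;
```python
def strip_prefetch_args(argv):
    # Drop any "/prefetch:N" argument (case-insensitive) and keep the rest,
    # so a strict argument parser never sees the flag.
    return [arg for arg in argv if not arg.lower().startswith("/prefetch:")]
```
&lt;/pre&gt;
&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;A program would typically call this on its argument list (for example sys.argv[1:]) and hand the result to whatever parser it already uses.&lt;/p&gt;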
&lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;I suspect that the "add /prefetch:1 to make rocket
go now" urban legend will never die, though.&amp;nbsp; I know that at least
one major company ships products with it in their shortcuts, without
ever asking us... just for good measure, I guess.&amp;nbsp; :-P&amp;nbsp; All
it does is change your hash number -- the OS is doing exactly the same
thing it did before, and just saving the prefetch pages to a different
file.&lt;br&gt;
&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;(ATTENTION: This is merely an informative
article; this information is completely unsupported, and the
functionality may change or disappear entirely in future versions of
Windows or service packs.&amp;nbsp; Furthermore, it is merely a hint for
the XP prefetcher, and it may choose to ignore it if it wishes.)&lt;/em&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=421882" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ryanmy/archive/tags/Performance/default.aspx">Performance</category></item></channel></rss>