A lot of work in performance tuning is organizational.  There's only so much work one can do with a profiler and a single module.  A good example is the Registry -- we can attach profilers to the Registry access routines and optimize them until they run as smooth as silk, but performance will still be impacted if you do thousands of Registry accesses per second.  For many problems, the cause is systemic: several components in a chain of command that are individually well-tuned, but didn't expect to call each other in a huge chain.  A good example of that is DirectShow -- no matter how skillfully crafted an individual filter is, if the mean path in a DirectShow graph is ten filters deep (with memory management between each one for passing buffers of audio or video around), latency is going to be high.

    More often than not, the best solution is simply logging.  Log when filters are instantiated or connected, log when Registry accesses are made, etc..  You want to mark high-level concepts, and try to get a picture for what's going on with the system as a whole.  This works fine if you only have one application that has to log... but more often that not, these systemic problems have hundreds of files involved, most of which aren't coded by you!  If every programmer performs their logging in a different way, it can be a nightmare to combine all those logs together, mixing different types of timestamps and different methods of delivery, and get a single ordered log of what happened over time.  Of course, that's exactly what we need... and that's where Event Tracing for Windows, or ETW for short, comes in.

    ETW is, at its core, a unified system for one-way packetized I/O managed by the Windows kernel, built for logging.  Every use of ETW has three participants in it -- the controller, the provider, and the consumer:
  • A provider is an module (DLL/EXE) doing something worth logging.  Most of the time, it runs without logging; it can, however, be "enabled" by a controller, at which point it recieves a handle from the kernel and starts logging "events" to that handle.  An event is an arbitrary struct (binary block) of data, the only condition being that it start with a 48-byte header.  This header contains a timestamp and identifying information.

  • A controller controls the actual act of logging.  The controller can ask the kernel to start a logging session, creating a handle and specifying that the kernel should take any events delivered to that handle and save them to a file.  (That file is usually on a hard drive, although we occasionally save them to RAM drives to ensure minimal interference.)  The controller can also enable and disable logging by providers, passing them a handle to log to.

  • A consumer reads events out of a file created by a logging session and parses them.  (It is also technically possible to have a consumer directly attach to a logging session's handle and retrive events in real-time, but this is rare.)
    So, why use this system over your own homebrew system?
  • Uniformity.  If you're debugging systemic problems involving multiple components, and all the involved components use ETW, you can have them all deliver their information to a single log file with uniform, steady timestamps, and write a single application that parses them all.

  • Speed.  ETW is extremely fast for providers to use, since all the I/O is handled by the kernel instead of by your module.  It typically takes only 1500-2000 cycles, depending on settings, to deliver an event and return to your code.  One can easily deliver thousands of events per second even on ancient machines.  We've achieved 20,000 events per second while only using 5% CPU load on a P3 500MHz(Yes, we have machines that old in our perf testing labs -- not everyone who uses Longhorn will be using a modern machine!)

  • Consistency.  With fprintf() or other homebrew systems, logging tends to be very slow and intrusive and is thus usually compiled in.  With ETW, logging is extremely fast; furthermore, since logging is turned on by a controller and is usually off by default, you can actually leave the ETW events in final shipping code!  If problems are found in the field, send the tester an app that starts a trace and turns on the provider, then read it later.  Many, many components in Longhorn will ship as ETW providers.

  • Reliability.  ETW isn't a new thing -- it's actually been in the OS and actively used since Win2K, and has been constantly refined since then.  Furthermore, ETW is available in both user-mode apps and kernel components.  (The latter access it through a MJ_SYSTEM_CONTROL IRP.)  This leads to...

  • OS cooperation.  The Windows kernel can provide many highly useful events via ETW for diagnosing performance problems.  Find out when and where disk I/Os, registry accesses, hard faults, and other performance problems happen!  More on this later...
    I'll start discussing the actual APIs in the next entry -- those whose curiosity has been piqued can jump into the MSDN documentation, which is not very good IMO but better than nothing.