
Pigs Can Fly

Windows performance, development, and related issues
(and maybe some not so related...)
StackOverflow answer – why learn multi-core programming?

I must admit, I’m addicted to StackOverflow.  It’s a great site, both interesting and easy to use.

Recently, I ran across the question “Are you concerned about multicore”.   HenryR, a PhD candidate at Cambridge, is asking whether the “developer on the street” needs to concern him/herself with multi-core development practices.

Henry’s question has a few answers, including the accepted one from dmckee (a particle physicist), which I’ll focus on here.

There are many types of programs, such as:

  • A simple script one might write to do some work and then throw away
  • High performance, highly parallel scientific applications
  • Mainstream operating systems
  • Line of business applications
  • Web based apps
  • Command line based applications
  • Applications with graphical user interfaces

This is just the tip of the iceberg – I’m confident that with a little brainstorming, this list would become large.

Dmckee’s answer is correct – but only for a small set of things.  Indeed, for a very simple application that is not CPU bound, making it multi-threaded may be more work than it is worth.  For example, I have utilities that are not multi-threaded.

However, even utilities can take advantage of multi-threading, and I argue that it’s de rigueur for a production program to do so.

The astute reader may say “But wait a minute!   Henry asked about multi-core, not multi-threading…”.

Yup, but to leverage multi-core, you need to write multi-threaded programs.   And using more than one thread has advantages beyond parallelizing CPU bound operations for better throughput.

To pre-empt the next comment: “But if it’s not CPU bound, then why bother with multi-threading?  That’s just a waste of time!”

In my mind, there are three reasons for this:

  1. It can make your program more responsive from a user perspective.
  2. Even for things that are not CPU bound, it makes sense to do them in parallel.
  3. It can break up complex single threaded state machines into simpler, more procedural code.

In Windows, the user interface is message based: Windows sends a UI program a message for each UI event the program needs to handle.   A program with any kind of graphical UI uses the GetMessage() family of functions (GetMessage(), PeekMessage()) to get them.  I don’t know about Linux or OSX, but I suspect they are similar in this regard.

This is conventionally done in a message loop, where a program loops on getting and then processing messages.   If there are no messages, the program can wait (block) for a new message, or do some background or idle processing.

Messages include all kinds of things, but most importantly user input – mouse moves and clicks, and keyboard events.  These are often translated into windowing operations (moves, resizes, etc.) and control actions (button clicks, scrolling, etc.).  In short, this means that the code handling the UI is a big state machine, and that this state is not kept on the stack.  It has to be kept explicitly somewhere else.

This has an important implication – for an application to remain responsive, it must be able to quickly and consistently service the message loop.   This is done on a single thread; indeed, many applications only have a single thread that does everything.

Windows programs have been written this way for a long time.  Some people might argue that this is a design deficiency.  That’s a topic for a different discussion about application compatibility – do remember, this pattern was set back in the very early ’90s – Windows 3.1 was released in 1992.

In any case, delays in the message loop that the user can perceive are called hangs.  There are two ways to keep the message loop from hanging.

  1. Break up all operations into pieces that execute quickly enough so that the message loop can always be serviced quickly.   Indeed, lots of developers have tackled the problem just this way.
  2. Move longer running operations to another thread, or threads.

The short story is that only #2 works effectively. Why?  Simply put – I/O.  Specifically, disk and network I/O.  But other I/O, such as GPU operations, can also be an issue.

The problem is that any disk or network I/O operation can potentially be long – hundreds or thousands of milliseconds long.  Programs that do their file I/O and networking I/O on the UI thread will have hangs.  It is unavoidable.

Some may point out that “Hey!  That doesn’t matter, making it multi-threaded doesn’t speed things up!”.

Right…. and wrong…

Right in the sense that it doesn’t increase an application’s throughput – the work still has to get done.   But wrong in the sense that it absolutely, positively benefits the user.

So for point #1 multi-threading can make your program more responsive from a user perspective because you can move potentially long running operations off the UI thread – keeping the UI “alive” and responsive.   Trust me, your users will love you for this.
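To make point #1 concrete, here is a minimal sketch in Python rather than Windows code.  The names slow_io and run_message_loop are made up for illustration, a plain for loop stands in for a real message loop, and time.sleep stands in for a slow disk or network call.  It measures the longest time any single message takes to handle, with and without a worker thread.

```python
import queue
import threading
import time

IO_SECONDS = 0.2  # assumed duration of the simulated I/O call


def slow_io():
    """Stand-in for a disk or network call that blocks the calling thread."""
    time.sleep(IO_SECONDS)
    return "io result"


def run_message_loop(messages, use_worker):
    """Handle messages in order.

    Returns (longest time spent handling one message, result of the I/O).
    """
    results = queue.Queue()  # the I/O result is posted here, by either thread
    workers = []
    worst = 0.0
    for message in messages:
        start = time.perf_counter()
        if message == "load":
            if use_worker:
                # Hand the slow call to another thread and return to the loop.
                worker = threading.Thread(target=lambda: results.put(slow_io()))
                worker.start()
                workers.append(worker)
            else:
                # Do the slow call on the "UI" thread; the loop stalls meanwhile.
                results.put(slow_io())
        # Any other message ("paint", "click") is treated as cheap to handle.
        worst = max(worst, time.perf_counter() - start)
    for worker in workers:
        worker.join()
    return worst, results.get()


if __name__ == "__main__":
    for use_worker in (False, True):
        worst, result = run_message_loop(["paint", "load", "click"], use_worker)
        print(f"use_worker={use_worker}: worst delay {worst:.3f}s, got {result!r}")
```

Both runs do the same amount of work, which is the "right" half of the argument above; only the worst-case delay seen by the loop differs.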

In my next post, I’ll discuss the source code for a C# program that shows how simple it is to move operations off the UI thread in WPF applications.   This doesn’t require any complex knowledge of fine grained synchronization, or complex multi-threaded programming.  The code is also easy to understand, not state machine driven, and easy to debug – all things that conventional wisdom holds to be untrue of multi-threaded programming.

In future posts, I’ll talk more about that example source, points #2 and #3, and some general multi-core and multi-threaded topics.

In summary – multi-core and multi-threaded programming is about much more than simply speeding up (parallelizing) compute bound operations.  Efficient threading can bring other benefits to your programs as well.

So just what is in a trace? Using the xperf trace dumper

There is a lot of information in a typical kernel trace.  While the Performance Analyzer tool is quite powerful and makes it easy to view a trace graphically, sometimes you just need to see what is in the trace directly.  Xperf makes this easy.

First, it's important to understand that a trace file (.ETL) is simply the buffers produced by a trace session, written to a file.  The data in an ETL file isn't pre-processed, summarized, or otherwise annotated with metadata as it comes out of the OS.  It is just the raw data that comes from an ETW session.  This is because ETW is designed for log time efficiency - ETW does the absolute minimum amount of work needed to get the trace data to a file, or other consumer.

This means that all the heavy lifting of post processing trace data happens later. With the xperf tools, there are two places where this occurs:

  1. In the merge step, xperf takes the kernel trace and user mode trace files and merges them into a single trace file.  Xperf also merges (adds) metadata to the trace (I've got another post in the works that provides all the details on merging...).  The result of merging is a single trace file that can be analyzed by the tools directly on the target machine, or copied to another system for analysis.   Note that the merge step must happen on the system where the trace was taken (the target system).
  2. When a trace is processed using xperf actions, or loaded into Performance Analyzer, the core trace processing components do a lot of work on the raw trace data.  This includes things like mapping process IDs (PIDs) to file and process names, mapping addresses to filenames, loading symbols for addresses, unifying stacks, and handling 64-bit and 32-bit differences.

As you've seen in the other posts, once a trace is merged it can be viewed in the Performance Analyzer.  But, xperf also allows you to see what is in the trace using the dumper action.  This is easy to do:

xperf -i fs.etl -a dumper >fs.csv

The -i fs.etl parameter specifies that the input file is FS.ETL.    The -a dumper parameter tells xperf to execute the dumper action.   The output goes to standard output, which this command redirects to fs.csv.

There is a shortcut for this as well: the dumper action is the default action, so if you only specify an input file then xperf simply dumps it.   For example, the following command does the same thing as the one above:

xperf -i fs.etl  >fs.csv

The resulting file is an ANSI text file where each line is one record.   Each record consists of a comma delimited set of fields.  The first field of each line is the name (or type) of the record. 

There are some special lines and sections at the front of the file.   Each record type is described by a header line.   The header lines are delimited by the 'BeginHeader' and 'EndHeader' lines.   Note that the line immediately after the 'EndHeader' line is unique: it doesn't have a header line.  This line describes some of the characteristics of the trace, such as its duration and the pointer size.

The first field of each header line is the name (or type) of the ETW record that the header line describes.  The rest of the fields are the names of each of the fields for the record type.  Here is an example of the process start event header (P-Start) and a P-Start event.

P-Start,  TimeStamp, Process Name ( PID),  ParentPID,  SessionID,  UniqueKey, UserSid, Command Line

P-Start, 1017280, fs.exe (3004), 3608, 1, 0x86661508, S-1-5-21-626881126-397955417-188441333-3225678, c:\coding\fs\Release\fs\fs.exe  blflargorg c:\coding\*.cpp *.h -s
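If you want to process the dump yourself, the layout described above is easy to read.  Here is a rough Python sketch, not anything that ships with xperf: the function name parse_dump is made up, it assumes 'BeginHeader' and 'EndHeader' each sit alone on a line, and it assumes that only the last field of a record (such as a command line) may contain commas.  The line after 'EndHeader' in the sample is a placeholder, not real xperf output.

```python
def parse_dump(lines):
    """Split dumper text into (headers, records).

    headers maps a record type to its list of field names;
    records is a list of (record_type, {field_name: value}) tuples.
    """
    headers = {}
    records = []
    in_header = False
    skip_trace_info = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "BeginHeader":
            in_header = True
            continue
        if line == "EndHeader":
            in_header = False
            skip_trace_info = True  # the next line has no header line of its own
            continue
        if skip_trace_info:
            skip_trace_info = False
            continue
        if in_header:
            name, *fields = [part.strip() for part in line.split(",")]
            headers[name] = fields
            continue
        name = line.split(",", 1)[0].strip()
        fields = headers.get(name)
        if fields is None:
            continue  # no header line for this record type; ignored in this sketch
        # Split at most len(fields) times so commas in the last field survive.
        parts = [part.strip() for part in line.split(",", len(fields))]
        records.append((name, dict(zip(fields, parts[1:]))))
    return headers, records


SAMPLE = r"""BeginHeader
P-Start,  TimeStamp, Process Name ( PID),  ParentPID,  SessionID,  UniqueKey, UserSid, Command Line
EndHeader
(placeholder for the trace characteristics line)
P-Start, 1017280, fs.exe (3004), 3608, 1, 0x86661508, S-1-5-21-626881126-397955417-188441333-3225678, c:\coding\fs\Release\fs\fs.exe  blflargorg c:\coding\*.cpp *.h -s
"""

if __name__ == "__main__":
    headers, records = parse_dump(SAMPLE.splitlines())
    for record_type, fields in records:
        print(record_type, fields["Process Name ( PID)"], fields["Command Line"])
```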

This event describes the start of a process.  There is also a corresponding P-End event. For processes that are already running when the trace begins, the kernel logger includes a pseudo P-Start event.

This means that every PID seen in other events will have a corresponding P-Start event in the trace before it is seen in an event.
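Because of that guarantee, a PID-to-name table can be built from the P-Start records alone.  Here is a small Python sketch; the helper name build_pid_table is hypothetical, and the "name (PID)" layout it expects is taken from the single P-Start example above, so treat the pattern as an assumption.

```python
import re

# "fs.exe (3004)" -> name "fs.exe", pid 3004 (layout assumed from the P-Start example).
PROCESS_FIELD = re.compile(r"^(?P<name>.*?)\s*\(\s*(?P<pid>\d+)\)$")


def build_pid_table(process_fields):
    """Map PID -> process name from 'Process Name ( PID)' field values."""
    table = {}
    for field in process_fields:
        match = PROCESS_FIELD.match(field.strip())
        if match:
            table[int(match.group("pid"))] = match.group("name")
    return table


if __name__ == "__main__":
    print(build_pid_table(["fs.exe (3004)"]))
```

Later events that carry only a PID can then be labeled by looking the PID up in this table.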

Also note that xperf will dump events that you add to your own applications so long as you include an event manifest in your app.  So, you can add your own events and use xperf to dump them in the context of all the other events you include in the trace.

Using the Windows Sample Profiler with Xperf

Using the xperf tools, ETW, and the kernel sample profile interrupt all together provides a very effective and easy to use sample profiler for the analysis of both application and system wide performance.  At each sample interrupt, the ETW sub-system captures the instruction pointer and the stack.  This data is lazily and efficiently logged to an ETL file.  Once the data is saved, it can be analyzed with Performance Analyzer.

The next article in this series is So just what is in a trace? Using the xperf trace dumper

Note: the examples in this post only work on 32-bit Vista or Server 2008; prior operating systems do not support taking stack traces.  Taking stack traces on 64-bit platforms will be the topic of another post.

Here is an example of profiling FS.EXE, a grep-like utility I've written.  I use this tool for experimenting with various topics such as efficient I/O, well performing string matching algorithms, and instrumenting applications with ETW.

For this test, I put the following commands in a CMD file:

    • xperf -on PROC_THREAD+LOADER+INTERRUPT+DPC+PROFILE
      -stackwalk profile
      -minbuffers 16 -maxbuffers 1024 -flushtimer 0
      -f e:\tmp.etl
    • fs.exe farglenorgin c:\coding\*.cpp *.h -s
    • xperf -d profile.etl

Since the commands are a bit long, I've separated them above and added line breaks to make them readable.  Each command above should be on one line in your command file.

The first command turns on the kernel logger and enables the following events:

  • The PROC_THREAD flag enables the process and thread events. These mark the beginning and ending of each process and thread.  The kernel provider guarantees that there will be a begin/end pair for every process and thread during the trace.   Processes and threads that exist before the trace was started or are still running when the trace is stopped also have these events.
  • The LOADER flag enables the loader events that log when the kernel loads an image (an EXE or DLL).
  • The INTERRUPT and DPC flags enable the ETW interrupt and DPC events, which mark each interrupt and each deferred procedure call (DPCs are routines that run at DISPATCH_LEVEL).
  • The PROFILE flag does two things: it turns on the system's sample profile interrupt and it enables the kernel's sample profile ETW event.

The other flags are important as well.

  • The -stackwalk profile parameter turns on ETW's stack walking feature for the sample profile event.   Every time a sample profile event is triggered by the sample profile interrupt, ETW will capture the stack and save the data in the trace buffers.  
  • The -minbuffers 16 parameter sets the minimum number of buffers that ETW will allocate for storing events.  Note, you need at least two for each processor in your system.
  • The -maxbuffers 1024 parameter sets the maximum number of buffers ETW will allocate to 1024 - a total of 64MB.
  • The -flushtimer 0 parameter tells ETW to never flush the buffers based on a timer; buffers will only be written to disk when they are full.
  • The -f e:\tmp.etl parameter tells ETW to lazily write the full ETW buffers to e:\tmp.etl.   This puts the log file on a different physical drive than the drive on which the experiment is running.  This means that the writes that ETW uses to save the trace data do not occur on the interesting drive.
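The pieces described above have to end up on a single line in the command file.  As an illustration only, this Python sketch joins them and checks the two-buffers-per-processor note; the function names are made up, and it uses only the switches already shown in this post.

```python
def xperf_start_command(kernel_flags, stackwalk, min_buffers, max_buffers,
                        flush_timer, log_file):
    """Join the pieces of the 'xperf -on' command into one line for a CMD file."""
    parts = [
        "xperf", "-on", "+".join(kernel_flags),
        "-stackwalk", stackwalk,
        "-minbuffers", str(min_buffers),
        "-maxbuffers", str(max_buffers),
        "-flushtimer", str(flush_timer),
        "-f", log_file,
    ]
    return " ".join(parts)


def enough_min_buffers(min_buffers, processors):
    """Apply the note above: at least two buffers for each processor."""
    return min_buffers >= 2 * processors


command = xperf_start_command(
    ["PROC_THREAD", "LOADER", "INTERRUPT", "DPC", "PROFILE"],
    stackwalk="profile", min_buffers=16, max_buffers=1024,
    flush_timer=0, log_file=r"e:\tmp.etl",
)
print(command)
```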

The second command simply runs the experiment.  It searches for the string 'farglenorgin' in all my .CPP and .H files.  I'm using a string that doesn't exist so I execute the worst case code paths in the application.  Replace this command with a command to run your experiment, or a pause instruction so you can dork around with a graphical program.

The third command simply stops the kernel logger, merges the data and saves it in profile.etl.

NOTE: These commands need to be run from an elevated command prompt.   Controlling ETW tracing requires administrative privileges.

There is now one other thing to do before examining the data - setting the symbol path.   Here is how I set the symbol path for this example: 

set _NT_SYMBOL_PATH=c:\coding\fs\release\fs;SRV*c:\symbols*http://msdl.microsoft.com/download/symbols

This tells the symbol decoder to look for symbols in the release build directory for FS.EXE and in the Windows public symbol server, caching the served symbols in c:\symbols.  The xperf tools use the symbol decoding libraries from the Debugging Tools for Windows.  You can find more information on using symbols here.
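The symbol path is just a semicolon separated list, where a symbol server entry has the form SRV*cache*server.  Here is a tiny Python sketch of how the value for this example is put together; the function name symbol_path is made up for illustration.

```python
def symbol_path(local_dirs, cache_dir, server_url):
    """Build an _NT_SYMBOL_PATH value: local directories, then a cached symbol server."""
    entries = list(local_dirs)
    entries.append(f"SRV*{cache_dir}*{server_url}")
    return ";".join(entries)


path = symbol_path(
    [r"c:\coding\fs\release\fs"],
    r"c:\symbols",
    "http://msdl.microsoft.com/download/symbols",
)
print(path)
```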

Once the trace is taken and the symbol path is set, then simply open the trace in Performance Analyzer with the command "xperf profile.etl". 

image

The CPU Sampling by Process graph is the most interesting graph for this example.  To select the visible graphs, click on the flyout control on the left of the window, then select the CPU Sampling by CPU and CPU Sampling by Process graphs.

For this experiment, the CPU Sampling by Process graph looks like this:

image

By default, all processes running during the trace are shown except the idle task (as seen above).  You can change which processes are displayed or hidden by using the check boxes in the legend drop down as shown above.

This graph illustrates an important concept about the kernel event provider and the xperf tools in general - they are specifically designed to analyze system wide and application performance data and events.  For example, in the legend above, there are many processes listed, but only a few of them actually used any CPU time during the experiment.

Using the legend, you can eliminate all processes except the interesting one.   Here is what the graph of CPU utilization for only FS.EXE looks like:

image

This is pretty cool as it provides a nice overview of the CPU utilization of FS.EXE, but it really doesn't tell us much about where time is being spent in the process itself.

The real power in Performance Analyzer is in its summary tables.  These are tabular displays of data about a specific chart, or a region in a chart.  For this experiment, I looked at the sample profile data for the entire trace.  To do this, right mouse click on the CPU Sampling by Process chart, make sure that the load symbols option is set, then select the Summary table view.

image

Note, it will take 10 to 20 seconds for the summary table to show up.  Performance Analyzer is loading symbols while this is happening.  (putting symbol loading on a background thread is on our to do list...).

image

After the summary table pops up, click on the flyout and select the columns for display.  In this case you will want the process, stack and % Weight columns (feel free to experiment with other columns).

Next, arrange the columns as follows.  The columns to the left of the gold column are grouping columns.   You can change the order of columns and put them to the left or right of the gold column by dragging.

image

Now, you can expand the stacks for FS.EXE and see where it is spending its time.  Note that this isn't by function as in some profilers, but by call stack.  This is much more powerful than simply knowing the functions where time is spent, as it also shows you how the time consuming functions were called.

It's no surprise that my find string utility spends most of its time in the following stack:

image 

As with other sample profilers, you can look "up" and "down" the stacks from any particular point. This is commonly called a butterfly view.  Right mouse click on any item in the stack column and experiment with the callers/callees and innermost/outermost options, like this:

image

This stack trace has very simple call stacks so it isn't very useful for looking at butterfly views.  But try one of your own programs and look at a butterfly stack view of a function that is called often from multiple places.   Or, use the butterfly view to look at an intermediate function and see all the functions it calls, and their stacks.

The above screen shots and summary table views contain the data from the entire trace.   This works OK for short traces.  But for longer traces, or even short traces with a lot of detail, we often need to look at specific time spans.

For example, there are some time spans in my experiment where FS is using very little CPU time.  I'd like to see what FS is up to in those time spans.  This is easily done by using the left mouse button to select a time span on the X axis, then zooming the graph to that view or looking at the summary table for that span, as in this example.

image

image

Once the interesting region is selected, I simply use the right mouse button to pop up the context menu and select summary table.

Note that you can open up multiple summary tables, each from different regions of a graph, or even different graphs.  This is great for making comparisons.

The new summary table window now only shows the data for the selected time span in the trace.  It is not surprising that FS is spending the little CPU time it is using in user mode asynchronous procedure calls.

image

This post illustrates some key concepts:

  • The xperf tools are designed for both system wide and application specific analysis.
  • Profiling with ETW is very, very light weight.  While the experiment is running, the xperf tools are not even loaded - the kernel itself is collecting the data.   All analysis is done as post processing tasks.
  • OS based sample profiling collects both user and kernel mode stacks.  
  • So long as you have symbols, production code can be profiled - no special debug or instrumented builds are required.
  • In this example, I started and stopped FS.exe (the experiment) between the tracing start and stop.  But, since this is ETW based, sample profiling can be started and stopped at any time, without stopping or restarting even a single process.  You can profile anything at any time on any system.
  • Stack views provide a very powerful method for analyzing where time is spent in a process.
  • The general technique with Performance Analyzer is to use the graphics to identify interesting time spans in the trace, then use the summary tables to look at the data in detail.
Xperf support for XP

"Do the xperf tools support XP or Windows Server 2003?" is a frequently asked question.  The answer is mostly no, but yes for a few things.

The next article in this series is Using the Windows Sample Profiler with Xperf

xperf.exe can be used on Windows XP SP2 and Windows Server 2003 for turning tracing on and off, and for merging kernel trace data with user mode traces into a single ETL file.   These operations are collectively called "trace control".   Note that the '-stackwalk' switch is not supported on XP because its kernel doesn't support capturing the stack on events; this is a new feature in the Vista kernel.

However, all operations that require trace decoding (and that's almost everything else), must be done on Vista or Windows Server 2008.  This includes viewing traces in the Windows Performance Analyzer tool (xperfview.exe).

The next question is this "The xperf tool kit installer doesn't install the tools on XP or WS2003; how do I get the tools on those systems?"

The answer is simple: From a Vista or WS2008 installation copy xperf.exe and perfctrl.dll to the target system. This is all xperf needs to support trace control.

After you have generated an ETL file, you can then copy it to a Vista or WS2008 system for trace decoding. 

For those of you interested in the long story....  

<boooring>
Event Tracing for Windows was first introduced in Windows 2000.  Back then, the OS only supported a small number of events; very few other Windows components used ETW.   In those days, event logging with ETW was in its infancy and the people that wrote event consumers generally also wrote the code that produced the events, or worked closely with those that did.

Back in the day, many event providers and consumers simply used the same C/C++ data structures to produce and consume events.  While simple, this sometimes broke because people wouldn't version the events correctly when the event structure changed.  In short, if the producer and the consumer code weren't kept in sync then things were busted.  This got to be a real problem as ETW was used more broadly.

This problem was solved by using metadata to describe events.  This allowed event consumers to decode events without knowledge of the event's binary format.  This worked much better; it allowed the event provider author to change an event's binary format without breaking the consumer.  In the XP time frame MOF files were used to describe events.   For example, you can find the kernel's context switch event here.

Three things changed for Vista:

  1. The entire Windows build system was updated so that every component was described by an XML based manifest.  This included describing ETW events.  We deprecated the MOF format and all new events were authored with XML based descriptions in their manifests using the Event Manifest Schema.  
  2. The use of ETW became very prevalent - many teams added event providers to their components and used them for Windows Event Logging (which is ETW based), performance work, diagnostics, and testing.  For example, on my laptop, there are 985 registered ETW event providers.  Use the "xperf -provider" command to see what is registered on your system.
  3. Our team decided to make a major investment in ETW based tools as did other teams around Windows.  This meant that meta information for events was very important as it enabled event providers and the consumers to be more decoupled and cohesive.

But, this posed one problem for us: do we fully support trace decoding on both Vista and XP?  Or just on Vista?  It was technically possible to keep trace decoding working on XP, but this would require shipping some Vista components with the tools because the required trace decoding infrastructure is only present on Vista.  Unfortunately, this isn't possible for all kinds of business, legal, and some technical reasons.  It would have also doubled our test matrix.

After much discussion, we decided it was an easily workable compromise to support trace collection on XP, and require Vista or WS2008 for all trace decoding operations.
</boooring>

Using Xperf to take a Trace (updated)

Let's get to it!  Here is how to take a basic trace, then look at CPU and disk utilization.    It's really simple: just three commands to turn on tracing, turn it off, and then view the trace.

The next article in this series is Xperf support for XP

First, from an elevated command prompt window, enable a basic set of the kernel events using this command:

xperf -on PROC_THREAD+LOADER+DISK_IO+HARD_FAULTS+INTERRUPT+DPC+CSWITCH -maxbuffers 1024

This command enables a set of events in the kernel and sets the maximum number of buffers to 1024.  The default size for each buffer is 64K.  So for this session, ETW will use up to 64MB of memory for ETW buffers.  As buffers are filled with events, they are written to the log file in the background and then made available again for accepting events.  By default, xperf sets the minimum number of buffers to 64.  ETW will start with this many buffers and only allocate more buffers if needed.  Events will only be lost if ETW cannot allocate more buffers and/or keep up with the event rate by writing data to the disk. By default, the kernel events are written to \kernel.etl on the current drive.
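The buffer arithmetic above is simple enough to check by hand, but here it is as a short Python sketch.  The 64K buffer size and the default minimum of 64 buffers come from the paragraph above; the function name is made up.

```python
BUFFER_SIZE_KB = 64  # default ETW buffer size stated above


def buffer_memory_mb(buffer_count, buffer_size_kb=BUFFER_SIZE_KB):
    """Memory used by a given number of ETW buffers, in MB."""
    return buffer_count * buffer_size_kb / 1024


print(buffer_memory_mb(1024))  # -maxbuffers 1024 -> 64.0 MB
print(buffer_memory_mb(64))    # default minimum of 64 buffers -> 4.0 MB
```

So this session starts at 4MB of buffers and can grow to 64MB if the event rate demands it.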

Next, do something interesting - it can be anything from opening Internet Explorer and a web page, or compiling a program with Visual Studio, to something more complex like opening three or four Microsoft Office applications and doing some work.

Run the following command when your interesting thing is done:

xperf -d foo.etl

This simple command will take 10 to 30 seconds (or possibly longer) because it is merging the raw kernel event data with metadata and doing some other post processing.  We call this 'stop and merge'.  Here is what this command does:

  1. Performs a 'run down', during which the kernel logs a set of events that describe the state of the system.
  2. Turns off the kernel logger.
  3. Interlaces data from multiple trace files and the kernel trace.
  4. Adds some meta info to the trace needed for processing the trace on other systems. This data is saved in the trace as a set of synthetic events.
  5. Saves the trace data into the file foo.etl (or the file name of your choice).
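Conceptually, the interlacing in step 3 is a merge of time-ordered event streams.  The Python sketch below illustrates that idea only; it is not how xperf is implemented, and the (timestamp, label) tuples and their labels are invented for the example.

```python
import heapq


def interlace(*streams):
    """Merge event streams, each already sorted by timestamp, into one sorted list."""
    return list(heapq.merge(*streams, key=lambda event: event[0]))


# Hypothetical events as (timestamp, label) pairs.
kernel_events = [(10, "kernel-1"), (40, "kernel-2")]
user_events = [(25, "user-1"), (55, "user-2")]

merged = interlace(kernel_events, user_events)
print(merged)
```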

Finally, load the trace in the Performance Analyzer with the following command

xperf foo.etl

For this example, I took a trace of using Visual Studio 2008 to compile a program.  Here are screen shots of the CPU Usage by CPU and for disk I/O counts.

image

image

image

Those are pretty interesting, but lots of things are running in the system, and I'd like to see just the CPU usage for Visual Studio itself.

The CPU usage by process graph makes this easy, just click on the fly out control on the left of the window and select the CPU Usage by Process graph. 

The fly out frame lists the graphs available for the events in the trace.  If the trace doesn't contain events that are needed for a particular graph, then the graph is not shown.

Performance Analyzer will automatically save the graphs you have selected.  You can change them at any time.

For my trace, the CPU usage for the DEVENV.EXE process and two CL.EXE processes looked like this.

image

DEVENV is the Visual Studio 2008 environment itself.   The CL.EXE processes are the two compiler sessions it started, one for each CPU on my laptop.

This is a simple example that illustrates some key points

  1. The kernel events can be enabled and disabled at any time.  There is no need to re-boot the system, log-out/log-in, or restart processes to use the kernel events, or any ETW event provider.  ETW events from any source can be dynamically controlled at run time.
  2. The xperf tools are designed for a post processing model, one where a trace is captured, then later analyzed.   This is in contrast to an observational model where you watch dynamic charts, graphics, or tabular data as something occurs.  The reason for this model is that ETW and the tools are designed for log time efficiency.
  3. This model is also specifically designed for taking traces on one machine, then analyzing them on another machine.   This ability is critical for running performance tests in a lab setting.
  4. The tools let you look at both system wide activity and process specific activity.

Advertisements are now Music Videos

I just love this ad for the Sony Ericsson XPERIA X1.... it's just barely an ad.  It's really a pretty cool music video...

Xperf Tools Landing Page and Update

The WHDC folks now have a web page set up for the Windows Performance Toolkit (aka the 'xperf tools').  The page includes downloads for updates to the versions that ship in the SDK.  In the near future, this page will include pointers to updated documentation and discussion forums.

The next article in this series is Using Xperf to take a Trace (updated)

This page includes downloads for V4.1.1 of the Windows Performance Toolkit.  This download is an update to the SDK version and includes fixes for two small bugs in the version included in the Windows SDK.

  1. The COM plug in that handles the power management events is not named correctly, so the power state transition analysis feature doesn't work.

Xperf, a new tool in the Windows SDK

The SDK team just shipped the latest version of the Windows SDK, which supports Windows Server 2008 and Vista SP1.  The SDK now includes an important new tool: the Windows Performance Tool Kit from the Windows performance team (we call them the xperf tools for short...).

This is the first article in the xperf series; the next one is Xperf Tools Landing Page and Update

The xperf tools have long been an internal tool used by our team, and widely throughout Windows, for system-wide performance analysis.  Xperf got its start many years ago as a set of command-line tools that produce reports based off the ETW instrumentation in the kernel[1]. Many other components and applications in Windows are instrumented with ETW and xperf can enable these events, dump them, and analyze them.

Xperf is an important tool for anyone doing system performance work on Windows because it's specifically designed to give you a complete system-wide view of performance over long periods of time (10's of seconds, to minutes)[2].  It's also the only tool that knows how to fully process all the events from the kernel and correlate them into something that makes sense. 

For example, here is a detail graph of all the disk I/O to the system drive on my laptop for opening this post, editing it a bit, and then closing Live Writer. 

image

Here is an example of the CPU and disk utilization for Outlook 2007 launch:

 image

Here is the same view, but with the data from all processes visible:

image

image

In addition to graphical displays, the tools can also display tabular data (what we call "summary data"). The screen capture to the right is a table of sample profile events during a 6.5 second period during a find string operation over a tree of source code.  For that period, 73.93% of the total CPU time was in the idle thread, 6.78% was in the find string utility and the rest of the time was distributed around services, the system, xperf itself (at 3%) and other processes. As you start playing with the summary tables, try shifting around the columns to get different types of views on the data; for example, grouping IOs per process, IO type (read/write/...), IO size, IO service time, and so forth.

These simple examples barely scratch the surface of the data that the tools can gather and the richness of the information they can display.  The tools have several other important features including:

  • Full support for symbol decoding.  This uses the same mechanism as the Debugging Tools for Windows.  This includes full support for the public Windows symbols, and for your own symbols.
  • The ability to dump all the events from a trace file to a CSV file.  If the summary tables don't display what you want, then you can write your own trace processing tools on top of the text dump, or the (generally XML-based) output of the command-line actions.
  • Windows Vista supports collecting stack traces on all the kernel events.   One of the most useful things to do is collecting stack traces on the sample profile event. This is an extremely powerful tool for understanding where and why a program is spending time.
  • The xperf command-line tool can be used to control all the ETW trace providers in a system, including all the kernel events.
  • The xperf distribution also contains a quick start guide and basic reference manual.  Just look for the document Performance.Analyzer.QuickStart.docx; it's in XPS format as well.

In the coming weeks, I'll blog more about the tools, how to use them, and the kernel ETW events.   We'll also soon have a web page up for the tools.   This is where you will soon find updates, additional documentation, and a message forum.

Now!   Here is how you can get the tools!

  1. Install the SDK by downloading the ISO image, or using the Web based installer.
  2. Find the xperf MSI in the SDK's "bin" directory.   It will be named xperf_x86.msi, xperf_x64.msi, or xperf_ia64.msi, depending on the architecture for which you install the SDK.  
  3. You can then install the xperf tools from the MSI directly, or copy the xperf MSI file to another location and install it from there.  For example, you could keep the MSI files on a USB key.

We'll soon have a web page up for the tools on the MSDN site... stay tuned!


[1]  You can see the events supported by the kernel in the docs for the EnableFlags field of the EVENT_TRACE_PROPERTIES structure.  I'm going to blog more about these...

[2] The xperf tools from the Windows Performance Toolkit are very complementary to the SysInternals tools.

Really, we don't interview this way... really...

One of my favorite blogs is Worse Than Failure (WTF).   Many of the articles are very interesting.  But gee, you just can't believe everything you read on the web.    Recently, Alex Papadimoulis posted an article titled Job Interview 2.0: Now With Riddles!  I'm not sure where he got his information, but I know the Windows organization (it's pretty big, you know) at Microsoft hasn't interviewed this way in many, many years.

I know because I'm a hiring manager in the core operating system division (COSD) and I've done about a hundred interviews for developers over the last three and a half years.  I've also been through all the interview training.  

Really, we don't interview this way... really... 

Nit Pickers Note: yes, of course some oddball might ask a brain teaser every now and then, and your very own personal and genuine experience might have included something you thought was a brain teaser, but the norm is just a regular, normal interview.

Handy WPF Tool - Style Snooper

Walt Ritscher has a really handy tool on his blog called 'Style Snooper'.  This utility will display the style of any WPF control from its assembly.   It does this by parsing all the public, concrete, non-generic classes in the assembly that derive from FrameworkElement.  Even better, he's provided the source code.

Beyond Hello World - Update 5, TreeMap Control Working, Perf Issues

CLCV V5 now has a fully working TreeMap control that zooms, supports mouse over events and looks pretty good.  The regions are laid out with the Squarified TreeMap algorithm.  Even better, the tree map itself scales to large numbers of nodes - easily hundreds of thousands, and on my laptop, it will handle a couple of million nodes relatively well.   But there are two major performance problems:  1) WPF rendering seems to be very expensive.  2) Tree Map nodes are relatively expensive in terms of memory size.  This limits the number of nodes to about 2 million.  You can find a link to the source code at the end of this post.

Tree Map Screen Shot

Note - all performance numbers (number of objects, times and profile data) are gathered from my new Acer Ferrari 5000 laptop.

From a memory usage perspective the control performs acceptably for even a few hundred thousand nodes.  Tens of thousands of nodes are no problem.  But, it does begin to struggle with about a million, with two million being an empirical limit.  A future post on this WPF version of CLCV will focus on memory optimizations and usage in detail.

For this post, I'll focus on WPF rendering performance.  In summary, the primary performance issue that limits the scaling of my treemap control isn't memory or CPU time to compute or render the layout - it is actual physical rendering of the rectangles and text by WPF itself.  WPF cannot keep up with more than about 750 rectangles and a small number of scaled text items.  

My treemap control is pretty well optimized (there is still some work to do) - but the computational performance of the control itself is not the limiting factor.  The most important aspect of my treemap implementation is that it is smart about how much work it does - limiting its computations and rendering to only the nodes that are reasonably large.

When learning about treemaps, I looked at some existing code and read all the web  material I could find.  It looks like most implementations are pretty basic - they compute the size and location of each rectangle for the entire tree.  I thought this was a bit odd.  It seemed to me that it would be more efficient to only process the part of the tree that would be visible - e.g. computing and rendering tiny or invisible rectangles didn't seem like a good design decision to me.

Tiny Rectangles

My first implementation took the straightforward approach as well.   As I suspected, for larger trees - even those with only a few thousand nodes - computing and rendering tiny rectangles was a waste of computation and rendering time.  As you can see in the image to the right, rectangles that are smaller than some limit are too small to effectively mouse over or to meaningfully discern their relative area with respect to their parent.

So, I modified both the layout code and the rendering code to stop when the area of the rectangles became less than some threshold.  This worked very well both from a visual perspective and performance standpoint.  It also allows the data tree to become very large, because once the number of nodes surpasses the number of rectangles that the algorithm decides to actually render, neither the layout nor the rendering costs grow with the overall tree size.

Experiment: CLCV V5 includes a unit test for the treemap control that allows you to control the size of the tree and set the treemap parameters such as the Area Limit.

The area limit is the minimum size of the rectangles that will be rendered.  Assuming there are enough nodes, the larger this number, the fewer rectangles will be visible.  

From empirical testing I determined that a value of 64 works pretty well.

For trees over one or two thousand nodes, there are usually more nodes than can be displayed.   In these scenarios, the scaling limitations are inherent to WPF itself, not the layout computations or the rendering in the treemap control.

It's important to understand how the control works and when WPF actually does the visible rendering.  The treemap control has two main steps:   the first is the actual layout computations of the tree, and the second is the rendering of the tree.   Once the control is rendered, then WPF will take the rendering data generated in the control's OnRender function and actually turn that into DirectX commands that become visible on the screen.   It's this third step that is the limiting factor.

The layout computations are handled by a function called ComputeAllNodeBoundingRects.  This  function starts at the root node and recursively computes the size of each visible rectangle for the nodes in the tree.  This function stops its recursion when it reaches nodes that have visible areas less than the Area Limit.  You can experiment with area limit values using the unit test (see above).  If you set the area limit to zero, then the layout computations will compute a rectangle for every node in the tree.
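The pruning idea can be sketched in a few lines of C#. This is not the ComputeAllNodeBoundingRects code from CLCV: the MapNode type and the PrunedLayout.Compute name are made up for illustration, and the sketch uses simple slice-and-dice splitting rather than the Squarified TreeMap algorithm so that the area-limit check is easy to see.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type; the real CLCV classes are not shown in the post.
public sealed class MapNode
{
    public double Weight;                         // e.g. a line count or a file size
    public List<MapNode> Children = new List<MapNode>();
    public double X, Y, Width, Height;            // bounding rectangle, valid only if HasRect
    public bool HasRect;
}

public static class PrunedLayout
{
    // Assigns rectangles top-down and returns how many were computed.
    // Slice-and-dice splitting is used for brevity, not the squarified algorithm.
    public static int Compute(MapNode node, double x, double y,
                              double width, double height, double areaLimit)
    {
        // Too small to see or mouse over: skip this node and its entire subtree.
        if (width * height < areaLimit)
        {
            node.HasRect = false;
            return 0;
        }

        node.X = x; node.Y = y; node.Width = width; node.Height = height;
        node.HasRect = true;
        int count = 1;

        double total = 0;
        foreach (MapNode child in node.Children)
            total += child.Weight;
        if (total <= 0)
            return count;                          // leaf, or nothing to divide up

        bool splitAlongWidth = width >= height;    // cut across the longer side
        double offset = 0;
        foreach (MapNode child in node.Children)
        {
            double share = child.Weight / total;
            if (splitAlongWidth)
            {
                count += Compute(child, x + offset, y, width * share, height, areaLimit);
                offset += width * share;
            }
            else
            {
                count += Compute(child, x, y + offset, width, height * share, areaLimit);
                offset += height * share;
            }
        }
        return count;
    }
}
```

With an area limit of zero the check never fires, so every node gets a rectangle, which matches the behavior described above. A renderer that walks the same tree and stops at nodes where HasRect is false does the same bounded amount of work.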

Computing the bounding rectangles only needs to happen when the treemap control receives a new data tree (by setting the TreeMapData property), or when the render size changes (handled by the OnRenderSizeChanged event).

When WPF determines that a control needs to be rendered, it calls the control's OnRender method.  This control's OnRender method recursively paints each of the rectangles computed in ComputeAllNodeBoundingRects.   It also renders the text for the root nodes and first level nodes.

Both of these functions are relatively fast - here are some statistics for the initial random tree.  LQ means 'lower quartile', UQ means 'upper quartile', IQ range is the inter quartile range divided by the median value.   Range is the total range (max-min) divided by the median.  Times are in milliseconds.

          Count   Min   LQ    Median   Mean   UQ    Max    IQ Range %   Range %
Compute   100     2.6   2.8   3.0      3.4    3.4   7.7    18.4%        171.7%
Render    100     5.8   6.2   6.7      7.3    7.7   19.6   21.8%        204.9%
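For readers who want to reproduce these columns from their own timings, here is a minimal sketch of the calculation. The TimingSummary and TimingStats names are hypothetical, and the quartiles use linear interpolation between nearest ranks, which is an assumption: the post does not say which quartile convention was used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class TimingSummary
{
    public int Count;
    public double Min, LowerQuartile, Median, Mean, UpperQuartile, Max;
    public double IqRangePercent;   // (UQ - LQ) / median * 100
    public double RangePercent;     // (max - min) / median * 100
}

public static class TimingStats
{
    // Percentile by linear interpolation between the two nearest ranks.
    // Other quartile conventions exist and give slightly different values.
    private static double Percentile(double[] sorted, double p)
    {
        double pos = p * (sorted.Length - 1);
        int lower = (int)Math.Floor(pos);
        int upper = (int)Math.Ceiling(pos);
        return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
    }

    public static TimingSummary Summarize(IEnumerable<double> samplesMs)
    {
        double[] sorted = samplesMs.OrderBy(v => v).ToArray();
        if (sorted.Length == 0)
            throw new ArgumentException("At least one sample is required.");

        var s = new TimingSummary();
        s.Count = sorted.Length;
        s.Min = sorted[0];
        s.Max = sorted[sorted.Length - 1];
        s.Mean = sorted.Average();
        s.LowerQuartile = Percentile(sorted, 0.25);
        s.Median = Percentile(sorted, 0.50);
        s.UpperQuartile = Percentile(sorted, 0.75);
        s.IqRangePercent = (s.UpperQuartile - s.LowerQuartile) / s.Median * 100.0;
        s.RangePercent = (s.Max - s.Min) / s.Median * 100.0;
        return s;
    }
}
```

Plugging the rounded Compute row into the same formulas gives (3.4 - 2.8) / 3.0 = 20% and (7.7 - 2.6) / 3.0 = 170%, close to the 18.4% and 171.7% in the table; the small differences are presumably because the table shows values rounded to one decimal place.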

These times are for computing and rendering from 1,316 to 1,539 visible nodes.   As you can see from the times, neither compute nor rendering times in the control itself should inhibit the control from rendering smoothly.  This is especially true when the control is rendering due to events other than size changes.

But, even with the times above, the control still struggles to render - you can see this by just rolling the mouse over the control and watching the lagging mouse-over node changes, or by resizing the control.  In both cases, if the control is painting more than 700 rectangles or so, then it can't keep up.

While relatively anecdotal, the CPU utilization for the control also seems on the high side.  Just continually moving the mouse over the control uses considerable CPU, 70 to 80%.  This is high, especially considering that my laptop has a dual core AMD Turion processor with a WinEI rating of 4.8.

While there do seem to be some material performance and scaling issues with WPF, there is a lot to like about the programming model.   I'll talk about this more in subsequent posts.

Source Code for this post:  CLCV-BLOG-5.ZIP contains the source code to this post.   I didn't include the data files in this zip file as they haven't changed - you can simply use the ones from CLCV-BLOG-3.ZIP.

Riffing on Rico

Rico's recent post is interesting as it hints at things some developers do that sound like a good idea, but really pose some big performance problems and can sometimes become hard-to-repair disasters of ship-stopping proportions.  Rico, here are some other possibilities for your list...

1) An architect decided that the file system would be a cheap database by storing large numbers of items as individual files where the name is the key.

3) The driver developer who decided to poll for monitor presence in a DPC.   Polling time was a couple of hundred milliseconds, but it only polls once every three seconds. No problem!

4) The hard drive that occasionally decides to spend 1.5 seconds flushing its cache (and nothing else...)

Beyond Hello World - Update 4, File loading 27 times faster!

CLCV V4 now loads files about 27 times faster than V3 when running on my laptop.   The tree view is also about 5 times faster.  This comes from changing my initial naive implementation to a smarter one where I minimize the inter-thread communication and handle the tree view much more efficiently.  More information about the source code is at the bottom of this post.  Here is what I learned in working on V4.

  • Updating WPF UI elements can be expensive for large numbers of items.  An effective and simple strategy is to only load data into WPF UI elements that the user actually needs to see - don't use WPF objects as the store for large numbers of objects.  
  • Updating pixels on the screen is expensive - even with relatively high performance hardware.  Changing pixels (e.g. updating UI controls) at a rate faster than the monitor refresh rate is a waste of system resources, CPU time, GPU time, and memory bandwidth.  UI updates that are related to (or driven from) high throughput tasks should be paced to no more than the monitor refresh rate.

V3 was the first functional version of CLCV and while it worked, it loaded files very slowly - taking 90 seconds or so (on my laptop) to load the largest data file (about 8.75MB in size).  Given that the ultimate goal is to handle CSV files that hold data for the entire Windows Vista source code tree, I really needed to improve CLCV's performance.

 

The cause of the performance problem was easy to find using the Visual Studio 2005 profiler.  I simply enabled instrumented profiling and the offending code was obvious.

The most time consuming operation in V3 wasn't the actual file read time, it was the overhead of building the tree of tree view items.  In V3, the file reading thread sent a message to the UI thread for every new file and directory from the CSV file.  The UI thread then added that information to the tree of tree view control items as the file was read.  This was very time consuming and took much longer than actually reading the file.

It turns out that building a tree of tree view items is simply expensive.  The most expensive part being setting the string that is displayed as the item's header.  This is closely followed by actually making a new item the child of another (ItemCollection.Add).  I don't yet know why this is expensive, only that it is.  It may be because I'm building the tree bottom up instead of top down - Kiran Kumar discusses this as a potential issue here.  I need to chat with my friend Tim Cahill the WPF performance guy about this a bit more.

 

For V4, I took a different approach: I re-factored the code so the file loading thread now does all the work of constructing a tree of objects that represent the original directory structure and its files.  I also moved the logic that updates the progress bar from the UI thread to the file loader thread - only sending progress messages to the UI thread every 33 milliseconds (this is configurable). 

The results were dramatically better.  The file loader thread could read the file and re-construct the directory tree much faster than the previous version.  On my laptop, the 8.75MB file could be loaded in about 1.47 seconds - not bad at all.  In this version, the file loader thread sends a message to the UI thread to update the progress bar no more frequently than every 33 milliseconds (about 30 times per second).  This is more than often enough so that the progress bar is visibly smooth, but not so much that updating the progress bar takes too long.
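The pacing logic itself is small. Below is a minimal sketch of one way to do it, not the CLCV code: the ProgressThrottle class is a hypothetical name, and the timestamp is passed in (for example from Stopwatch.ElapsedMilliseconds on the loader thread) so the behavior is easy to check without a UI.

```csharp
using System;

// Forwards progress to a callback at most once per interval.
public sealed class ProgressThrottle
{
    private readonly long intervalMs;
    private readonly Action<long> report;
    private long lastReportMs;
    private bool reportedOnce;

    public ProgressThrottle(long intervalMs, Action<long> report)
    {
        this.intervalMs = intervalMs;
        this.report = report;
    }

    // Called by the loader for every record; nowMs is the loader's clock in milliseconds.
    public void Update(long bytesRead, long nowMs)
    {
        if (reportedOnce && nowMs - lastReportMs < intervalMs)
            return;                 // too soon since the last report: drop this update

        reportedOnce = true;
        lastReportMs = nowMs;
        report(bytesRead);          // in a WPF app this callback would post to the UI thread
    }
}
```

In a WPF application the callback would typically wrap a Dispatcher.BeginInvoke call that sets the progress bar value. With an interval of 0 every update is forwarded, which corresponds to the flood of per-record messages that a zero update interval produces. A final report when the file completes should bypass the throttle so the bar always reaches 100%.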

Experiment: Updating the progress bar is expensive - way too expensive to do this for every line read from the file.  You can see how expensive this can be by using CLCV V4 and setting the "Progress Update Interval" in the options dialog box to 0.   When this is zero, the file reading thread will send a message to the UI thread for every record read from the CSV file.   The messages get queued up in the UI thread's dispatcher object and processed as fast as the UI can handle them.   Processing all these messages and updating the progress bar takes a much longer time than processing the entire file!  The lesson here is that the overall process of updating the UI (changing pixels on the screen) is simply expensive.  There is really no need to update the UI faster than the monitor refresh rate.  Often about 1/2 to even 1/3 of the monitor refresh rate is just fine.

Note that for most things, a UI update rate of 30 or so updates per second will present a smooth animation to the user.   If needed, updating the UI faster -- at or close to the monitor refresh rate -- will work well.

Next, I needed a strategy for minimizing the costs of building the tree of tree view items that display the directory tree and files. Given that the file reader thread builds a tree structure, the solution was simple:  CLCV V4 only populates the tree view control when it needs to, and only populates the items the user needs to see.  After the file is loaded, the UI code updates the tree view with the top level elements.  When the user expands an element, the UI thread then populates the data for that item's children.  

In this way, CLCV only pays the costs for populating elements the user needs to see; only tree view items that will actually be displayed are created and added to the tree view item tree.  This also amortizes the costs of the items displayed over the time the user actually manipulates the control.  So even if the user browses the entire tree, he never experiences a sluggish tree view control.
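The populate-on-expand strategy can be sketched without any WPF types. DirNode and ViewItem below are hypothetical stand-ins for the loader's data tree and for the tree view items; in a real WPF application the Expand call would presumably be driven by something like the TreeViewItem.Expanded event, though the post does not say which hook CLCV uses.

```csharp
using System;
using System.Collections.Generic;

// Plain data node built by the file loader thread (hypothetical type).
public sealed class DirNode
{
    public string Name;
    public List<DirNode> Children = new List<DirNode>();
}

// Stand-in for an expensive UI item such as a tree view item.
public sealed class ViewItem
{
    public readonly DirNode Source;
    public readonly List<ViewItem> Children = new List<ViewItem>();
    public bool IsPopulated { get; private set; }

    public ViewItem(DirNode source)
    {
        Source = source;
    }

    // Creates view items for the direct children only, and only once.
    public void Expand()
    {
        if (IsPopulated)
            return;
        foreach (DirNode child in Source.Children)
            Children.Add(new ViewItem(child));
        IsPopulated = true;
    }

    // Counts the view items that actually exist under this item, including itself.
    public int CountCreated()
    {
        int count = 1;
        foreach (ViewItem child in Children)
            count += child.CountCreated();
        return count;
    }
}
```

However large the data tree is, the number of view items tracks only what has been expanded so far. One practical wrinkle in a real tree view is that an item with no child items shows no expander, so implementations commonly add a single placeholder child until the first expansion.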

Experiment: This works quite well.  If you use CLCV version 3 to load the large test file (test-l.clc.csv), the first click on the bottom entry ("GSFD") will take a noticeably, and annoyingly, long time to expand (about 2.5 seconds on my laptop).   With V4, this happens quickly: it takes about 150ms to populate 922 child items (~6,000 items per second, or ~166 µs per item).  It looks like it takes about another 300 milliseconds to draw the new items: it's hard to measure this because WPF doesn't have begin and end events for item expansion (I just used a stop watch).

With V4, the profiler shows a very different picture: The bulk of the CPU time during file loading is spent actually parsing the text data from the file, primarily converting text fields to decimal values using UInt32.TryParse().

This is as expected.  It probably isn't worth optimizing the conversion operations by writing a hand crafted text to UInt32 conversion routine as such a routine would need to handle localization and other issues.  I expect it would result in only modest performance improvements.  The primary file loading bottleneck in V4 isn't CPU time, but I/O efficiency.

Loading an 8.75MB file in 1.47 seconds is pretty good - it's about 6 MB/s, and it's plenty fast enough for my intended scenario.   But, it is still slow compared to other applications.  I've developed other tools (such as CLC itself) that are completely I/O bound and use only one thread for the entire application.  These tools easily drive the disk at its maximum sequential read rate of about 35 MB/s while handling much more complex string operations than merely parsing some CSV data.  For example, CLC is an order of magnitude faster than CLCV.  The reason is the stream reader class - it doesn't do asynchronous or un-buffered I/O and it's paying the cost of transcoding ANSI or UTF8 text into UTF16 for storing in .NET string objects.

In contrast, my natively implemented tools read files fully asynchronously in un-buffered mode.  This makes I/O operations happen in parallel with data processing - not sequentially.  They also parse the text data directly from the buffers populated by the operating system file read operation - no data copies are needed.  Next, they handle ANSI and UNICODE directly - they don't transcode all input text to UTF-16.
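To make the overlap idea concrete, here is a rough C# sketch of double-buffered reading, where the next read is already in flight while the current buffer is scanned. It is only an illustration of the pattern, not the native library described above: OverlappedReader is a hypothetical name, it goes through the ordinary .NET Stream.ReadAsync API rather than un-buffered native I/O, and counting newline bytes stands in for real parsing.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class OverlappedReader
{
    // Counts '\n' bytes straight from the read buffers, with no text transcoding.
    public static async Task<long> CountLinesAsync(Stream input, int bufferSize)
    {
        byte[] current = new byte[bufferSize];
        byte[] next = new byte[bufferSize];
        long lines = 0;

        int bytesRead = await input.ReadAsync(current, 0, current.Length);
        while (bytesRead > 0)
        {
            // Start the next read before touching the data already in hand.
            Task<int> pending = input.ReadAsync(next, 0, next.Length);

            // Process the current buffer while that read is in flight.
            for (int i = 0; i < bytesRead; i++)
            {
                if (current[i] == (byte)'\n')
                    lines++;
            }

            bytesRead = await pending;

            // Swap buffers: the freshly filled one becomes current.
            byte[] swap = current;
            current = next;
            next = swap;
        }
        return lines;
    }
}
```

Only one read is ever outstanding, so the two buffers are never written and scanned at the same time. How much real overlap this achieves depends on the stream: a FileStream opened for asynchronous I/O can overlap with the scan loop, while a MemoryStream completes each read immediately.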

Now this isn't to say that the StreamReader class is slow - it isn't.  Using off the shelf I/O libraries in native C++ results in comparable performance to the .NET StreamReader class.  The difference comes from using hand crafted I/O routines specifically designed for sequential text line reading and maximizing parallelism between disk I/O and computation.

One of my future tasks is writing a COM wrapper for my hand crafted C++ I/O libraries so they can be used efficiently from .NET languages.  I expect this will allow CLCV to come very close to the file read performance of my native tools.

Source Code for this post:  CLCV-BLOG-4.ZIP contains the source code to this post.   I didn't include the data files in this zip file as they haven't changed - you can simply use the ones from CLCV-BLOG-3.ZIP.

 

 

 

 

Beyond Hello World - Update 3, Control Templates, Multithreading, and more... (with source)

I've learned a lot in working on my first real WPF application, such as implementing multi-threaded file reading, how to use the dispatcher object, how to use control templates to customize controls, the basics of application configuration, using abstract C# classes, and using anonymous methods.

I've posted the source code that goes along with this post.  It is in the file CLCV-Blog-3.ZIP at the end of this post.  This version of CLCV has some new features:

  • It has an option dialog box for setting the user's preferences.  This mechanism uses the application properties wizard in Visual Studio - this was surprisingly easy to do, but it does have one problem; I can't control where the data is saved! (yet).
  • File I/O is multi-threaded.  One thread reads the lines from the CSV input file and sends the resulting data to the UI thread using the UI's Dispatcher.
  • File reading can be quickly canceled - an important attribute for a program that is designed to load large files.
  • CLCV displays an accurate progress bar as it loads the file.  
  • The recently used file list now works correctly
    • It discards files that no longer exist when CLCV starts
    • It keeps the list sorted in order of use (most recent at the top).
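The recently used file list behavior described in the last bullet is simple enough to sketch. RecentFileList is a hypothetical class, not the one in the CLCV source; the existence check is passed in as a delegate so an application can supply File.Exists and a test can supply a fake.

```csharp
using System;
using System.Collections.Generic;

public sealed class RecentFileList
{
    private readonly List<string> files = new List<string>();
    private readonly int capacity;   // assumed to be at least 1

    public RecentFileList(int capacity)
    {
        this.capacity = capacity;
    }

    // Most recently used file first.
    public IList<string> Files
    {
        get { return files.AsReadOnly(); }
    }

    // Moves (or adds) the path to the top and trims the list to its capacity.
    public void MarkUsed(string path)
    {
        files.RemoveAll(p => string.Equals(p, path, StringComparison.OrdinalIgnoreCase));
        files.Insert(0, path);
        if (files.Count > capacity)
            files.RemoveAt(files.Count - 1);
    }

    // Drops entries whose files are gone; call once at startup.
    public void DiscardMissing(Func<string, bool> exists)
    {
        files.RemoveAll(p => !exists(p));
    }
}
```

At startup an application would call DiscardMissing(System.IO.File.Exists) after loading the saved list. The case-insensitive comparison is an assumption that suits Windows paths.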

There are many new concepts for a native C++ developer to learn when ramping up on WPF and C#.  One of the most important is how to customize the look and feel of your applications - this is a Raison D’être for WPF.  In my earlier versions of CLCV, I couldn't figure out how to customize the look and feel of buttons so I rolled my own using a rectangle and some animations.  It looked ok, but this was completely the wrong way to go about it.

One of my friends (Joe Laughlin) pointed me in the right direction: Customizing controls is easily done using Control Templates, which are designed to do exactly what I needed: completely controlling the look and feel of any WPF control while maintaining its semantics.  Even better, with WPF resources, this can be easily done for an entire application.  Control templates made it straightforward for me to set the look and feel of buttons in my application to a blue colored theme.  So far, I've just scratched the surface of control templates, but you can see what I did by looking in resources\button.xaml.  It is tied into the application in App.xaml like this:

    <Application.Resources>
        <ResourceDictionary>
            <ResourceDictionary.MergedDictionaries>
                <ResourceDictionary Source="Resources\Button.xaml" />
            </ResourceDictionary.MergedDictionaries>
        </ResourceDictionary>
    </Application.Resources>

The XAML above applies my customized Button control template to every button in the application without touching the XAML for the other windows or pages. This is very cool. More than cool, it's a great example of how WPF separates design (look and feel) from an application's semantics.

An important feature of CLCV V3 is its options dialog box.  This makes it easy to tweak and tune key parameters without recompiling.  Here is some info on the options:

  • Show Console: when checked, CLCV will display its debug/diagnostic console.
  • Synchronous Line Reads: this forces the file reader thread to post messages to the UI thread synchronously using Invoke() instead of BeginInvoke() (using Invoke() is slow).
  • Animate Directory Loading: when checked, this forces CLCV to visibly populate the tree view control as items are read (this is slow).
  • Maximum Outstanding Messages: this is the maximum number of outstanding messages the file reader thread will have with the UI thread.
  • Message priority: this is the DispatcherPriority used by the reader thread to send messages to the UI thread (in the calls to Invoke() or BeginInvoke()).

V3 of CLCV is still naively implemented in many regards (this is my first WPF app and I only started with C# and WPF in late December '06).  But I knew from the get-go that I'd have to handle file reading in a thread separate from the UI thread to avoid UI hangs and sluggishness during file loading.

Of course, it is possible for single threaded applications to efficiently read files and keep their UI responsive - I have a native C++ class that provides I/O support for this.  My native class is extremely efficient and can easily drive the disk at its maximum sequential read rate with one thread. But, this is more complex in .NET 3.0.  While .NET does provide the fundamental support for asynchronous I/O, I'd essentially have to re-implement my Native C++ classes in C# and I'm not ready for that yet. 

This approach would also be just as messy in WPF as it would be in native code: mixing I/O and processing UI events in one thread requires a state machine approach to handle issuing asynchronous reads, processing UI messages, and handling completed read events.  This is prone to complexity and can be difficult to debug and maintain.

Fortunately, it is very easy to create a file reading thread in .NET and for that thread to send its data to the UI thread asynchronously.  It takes surprisingly little code to do this:

    public FileLoader( DataViewWindowClass dvw )
    {
        MyDataViewWindow = dvw;

        // Cache the user's preference for Invoke() versus BeginInvoke().
        UseSyncronousLineReads      = TheApp.UserProperties.SyncronousLineReadsFlag;

        // Hook the UI-side handlers up to the loader's delegates.
        StartFileLoadHandler       += MyDataViewWindow.StartFileLoad;
        NewFileHandler             += MyDataViewWindow.AddNewFileHandler;
        NewDirHandler              += MyDataViewWindow.AddNewDirHandler;
        NewDirTreeHandler          += MyDataViewWindow.AddNewDirTreeHandler;
        FileCompletedSignalHandler += MyDataViewWindow.FileCompletedSignal;

        // Create (but do not start) the reader thread; the semaphore caps outstanding messages.
        FileLoaderThreadEntryPoint  = new ThreadStart( LoaderThread );
        MyThread                    = new Thread( FileLoaderThreadEntryPoint );
        WaterMarkSemaphore          = new Semaphore( TheApp.UserProperties.MaxOutstandingMessageCount,
                                                     TheApp.UserProperties.MaxOutstandingMessageCount );
    }

The code above is from DataView.Xaml.cs.  It creates the FileLoader object which owns the file reading thread.  The reading thread uses five delegates to communicate with the UI thread:

  1. One to signal that the file loading operation has actually started.  This message is used to send the file size to the UI thread so it can setup the progress bar.
  2. One message each for files, directories, and directory trees.   Each of these messages also includes the number of bytes read from the file so far.  This allows the UI thread to keep the progress bar up to date.
  3. And finally, one to signal that the file read is completed.
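The constructor above also creates WaterMarkSemaphore, sized by the maximum outstanding message count. The post does not show how it is used, so the following is only a guess at the usual pattern: the producer takes a permit before posting each message and the consumer returns it after handling the message, which bounds how far the producer can run ahead. WaterMarkDemo and its queue are hypothetical; a BlockingCollection stands in for the dispatcher's queue.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class WaterMarkDemo
{
    // Sends messageCount messages and returns the peak number in flight at once.
    public static int Run(int messageCount, int maxOutstanding)
    {
        int outstanding = 0;
        int peak = 0;

        using (var waterMark = new Semaphore(maxOutstanding, maxOutstanding))
        using (var queue = new BlockingCollection<int>())
        {
            // Consumer: plays the role of the UI thread handling posted messages.
            var consumer = new Thread(() =>
            {
                foreach (int message in queue.GetConsumingEnumerable())
                {
                    Interlocked.Decrement(ref outstanding);
                    waterMark.Release();          // hand the permit back to the producer
                }
            });
            consumer.Start();

            // Producer: plays the role of the file reader thread.
            for (int i = 0; i < messageCount; i++)
            {
                waterMark.WaitOne();              // blocks once maxOutstanding are in flight
                int now = Interlocked.Increment(ref outstanding);
                if (now > peak)
                    peak = now;                   // only the producer writes peak
                queue.Add(i);
            }

            queue.CompleteAdding();
            consumer.Join();
        }
        return peak;
    }
}
```

Because a permit is taken before the counter goes up and returned only after it comes down, the peak can never exceed maxOutstanding, no matter how much faster the producer is than the consumer.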

The reading thread does all the work to read the file, which is in comma separated value (CSV) format.  The CSV file contains all the data necessary for CLCV to reconstruct the directory tree scanned by CLC.  (Note: in the TestData directory from the ZIP file, I've included three CSV files, one small, one medium sized, and one large - this is actual data from some of my source code trees.)

The reader thread handles the following work:

  • file reads using a StreamReader object
  • detecting header lines and blank lines
  • splitting each line into the comma separated fields
  • parsing and converting the text data to binary data and putting that data in objects
  • sending those objects to the UI thread
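The first four steps in the list above might look roughly like the sketch below. The two-column layout (path, line count) and the FileRecord and CsvRecords names are invented for illustration, since the real CLC file format is not described in the post; the sketch also skips any line whose numeric field fails to parse, which is one simple way to ignore header rows.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical record type; the real CLC columns are not shown in the post.
public sealed class FileRecord
{
    public string Path;
    public uint LineCount;
}

public static class CsvRecords
{
    public static List<FileRecord> Parse(TextReader reader)
    {
        var records = new List<FileRecord>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim().Length == 0)
                continue;                               // blank line

            // Naive split: does not handle quoted fields that contain commas.
            string[] fields = line.Split(',');
            if (fields.Length < 2)
                continue;                               // malformed line

            uint lineCount;
            if (!UInt32.TryParse(fields[1].Trim(), out lineCount))
                continue;                               // header row or bad number

            records.Add(new FileRecord { Path = fields[0].Trim(), LineCount = lineCount });
        }
        return records;
    }
}
```

In an application the reader would be a StreamReader over the input file, and each FileRecord would then be handed to the UI thread through the dispatcher rather than collected in a list.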

All in all, this works quite well for a first attempt: the UI stays alive (doesn't hang), the file read operation can be quickly canceled, and it's all done with straightforward code.

Note that getting the dispatch priority correct is very important.  On a single CPU system (like my laptop), using too low a priority simply causes the entire process to drag out.  Using too high a priority causes the I/O thread to starve the UI thread.  For example, if you set the dispatch priority to "input", then the UI thread may need to work through large numbers of input messages before it processes input events, such as a cancel request.   Setting the dispatch priority above "input" (to "loaded" or "render") will keep input events from being processed.

Going higher, to "render", will interfere with the actual rendering of the UI, causing the progress bar to be jerky.  Going higher than "render" to "databind", "normal", or "send" completely stops UI rendering and blocks all input, thus hanging the UI; this is the specific problem that multi-threaded I/O is intended to handle.

The "Background" dispatch priority seems to work acceptably well.  Using this priority, the UI remains responsive while consuming input events from the reader thread relatively smoothly.

Note that using dispatcher priorities is orthogonal to setting thread priorities - the dispatcher priority is simply the priority at which the dispatcher removes input events (delegates) from its input queues.

But, there are some performance issues in this initial naive implementation:

  1. The progress bar doesn't terminate nicely - it gets updated to 100% complete in the UI file completion routine (see the function FileCompletedSignal() ), but WPF spends a lot of time in this function before it re-renders the UI.  I need to figure this out.
  2. The biggest problem is the time it takes to actually build the set of tree view items that are used to populate the list of directories and files.   This takes 10 to 20 times longer than the actual file I/O, which is very surprising given that in native code, I have tools that do significantly more text processing while still remaining I/O bound - all in one thread.
  3. For small trees (just a few hundred items) this is fast enough not to be annoying (try loading the small and medium size test files).  However, for the 83,000+ elements in the largest file, processing all the data into TreeView items is excruciatingly slow.

  4. The performance of the tree view control itself also bogs down with the largest file.   It's noticeably sluggish when the selection changes.

That being said, my first implementation is certainly naive - I simply build a set of tree view items that mirror the entire file and directory structure.  It doesn't look like too much trouble to be a lot smarter about this - populating the tree view as needed from another data structure.  I'm going to try this next.

In upcoming posts, I'll explore better ways to handle the TreeView control (only populating it as necessary), do some profiling to see if I can speed up data processing, add some more advanced control templates, and explain why it's important to throttle the number of messages from the file reading thread (a producer) to the UI thread (the consumer).

CES - Car Audio

Ok, I'm officially an old fogey - I had no idea how big the car audio business is... it's HUGE.  Most of the 409 thousand square feet of the CES north hall was packed full of car audio companies selling anything and everything for car audio. The array of amplifiers, speakers, thick cables, batteries, small video screens, DVD players, navigation devices, and stereos was jaw dropping in its breadth.

There were super tricked out cars everywhere.  For example, this Scion is probably double its original weight due to all the speakers, amplifiers and LCD displays (it has 16 displays total).

Some cars were so customized I couldn't even determine their original make and model. This one didn't even have a steering wheel.

 

 

Many of these cars were just for show, but they also had some other incredibly cool - and fully drivable - show cars.  This Audi has some kickin' intercooled turbos.

 

 

 

 

This black coupe is both subtle and gorgeous.  It has a flawless matte black finish worthy of a stealth fighter.  It was my favorite of the show.

When I was building cars (many moons ago) speakers were important, but the world revolved around amplifiers.  Today, the situation is just the opposite: amplifiers are important, but man-o-man, speakers are KING, especially the big bass speakers.  The big ones are beyond big, they are gi-normous.  This one is hanging from a cherry picker...

However, the single most  amazing thing at the show was the 5.0 Farad 18 volt capacitor.  Yes Lucy, that's a full five Farads.    Holy electrons Bat Man
