Been a bit quiet on this blog lately, as most of my postings have ended up on the AppFabric CAT team blog. As part of a recent effort to build a canned LINQPad data context to use with StreamInsight, I went through a few rounds of performance testing to create an application that could (in order of priority):
The intent for this system was to be able to store and replay “canned” data sets for StreamInsight applications. The performance metrics, in order of priority, were:
The “tricks” (constraints, really) I was able to take advantage of were:
The reference data set consisted of 733 MB of data downloaded from Environment Canada’s National Climate Archives web site, in XML format. My first pass through performance tuning consisted of evaluating the following techniques:
In order to perform the testing I needed a benchmark framework which could execute a deterministic test set and capture associated performance data. This framework needed to:
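A minimal sketch of such a harness – the names here are my own, not the actual framework’s – timing each test with Stopwatch and measuring the output file with FileInfo:

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class Benchmark
{
    // Runs a single deterministic test and captures the two core metrics:
    // elapsed time (via Stopwatch) and the size of the file the test
    // produced (via FileInfo).
    public static (long ElapsedMs, long FileBytes) Run(string outputPath, Action test)
    {
        var sw = Stopwatch.StartNew();
        test();
        sw.Stop();

        long bytes = File.Exists(outputPath) ? new FileInfo(outputPath).Length : 0;
        return (sw.ElapsedMilliseconds, bytes);
    }
}
```

Each serialization candidate then becomes one Action passed to Run, so every format is measured the same way.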
Results from the first pass with the full test set were pretty clear – compressed CSV and compressed XML files look to be the clear winners for size (I threw in compressed XML just to be fair). We execute the tests by opening up and executing the PerformanceTests project.
Or, if you prefer the raw data:
Now that we’ve winnowed down the field to two candidates, let’s go and test them for read performance. We’ll run four tests here:
Note that we expect the XmlSerializer read performance to suck, as it is designed (and intended) to pull the entire file into memory and provide random access as an in-memory object. Remember one of the constraints we can take advantage of – forward-only access to data – we don’t have to pay the price of loading the entire data set into memory for random access. I could have written a forward-only parser using XmlTextReader or XElement, but to be honest, making it generic was more work than I wanted to bother with (remember – this has to be readily applicable to all forms of StreamInsight-compatible data).
Note that we only lose ~13% of our read throughput by leveraging the GZipStream class. That’s pretty awesome!
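For context, the compressed read path is little more than a StreamReader layered over a GZipStream – a simplified sketch (the framework’s actual code differs):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class CompressedCsv
{
    // Forward-only read: decompress and parse one line at a time, so the
    // full data set is never materialized in memory.
    public static IEnumerable<string[]> ReadLines(string path)
    {
        using (var file = File.OpenRead(path))
        using (var gzip = new GZipStream(file, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line.Split(',');
        }
    }
}
```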
We’re now pretty close to our goals, but have plenty of room to push the performance envelope. Up to this point we’ve been using relatively simple measures and metrics to quantify and evaluate performance (file sizes via the FileInfo class, and elapsed time via the Stopwatch class). Now that we’re diving a bit deeper into performance tuning, it’s time to pull out the big guns. There are a number of excellent performance and profiling tools available for .NET (including the Visual Studio Profiler). My personal favorite is Red Gate’s excellent ANTS Performance Profiler, part of their line of .NET development tools.
One of the reasons this is my go-to application for performance investigations is that when I’m working with customers who don’t have (or can’t provide) an instrumented build, or cannot install Visual Studio on a live box – it’s a great lightweight experience for understanding how your application is burning CPU and memory.
To follow along through this next section, download an evaluation copy of the ANTS Performance Profiler.
The first thing we’ll do is turn off the other tests in the framework so we can focus on the compressed CSV path, and disable the file writing (i.e. we just want to execute the file reading code path). To streamline this, we fork the framework into a separate project, PerformanceTests_Profile.
Next, we run the test framework in the ANTS Performance Profiler:
The profiler will then start up the framework application, and read back the compressed CSV file. Wait until the profiler completes to get a full data set (note that this will take longer than normal due to the overhead of profiling).
Let’s start by looking at the Wall time view – ensure that the 3rd drop-down box is set to wall-clock time. This shows two hot paths:
To focus on just our code paths, switch back to CPU time. Waiting for synchronization disappears, and we see MoveNext() – our enumerator method – popping to the top with 97% of time with children (remember that the .Count() method forces the enumerator to run through the entire file). This is where we’ll focus our efforts.
Download the profile trace to walk through this section. We see the performance culprits being threefold:
In order to get our performance up to where it needs to be, let’s focus on each of these in order (biggest bang for the buck first).
Our first approach will be to remove the use of reflection and Convert.ChangeType for assigning values. Since we need to maintain flexibility, we need another “generic” method of assigning values to variables. Since functional programming is cool, let’s approach this problem by defining a function delegate, Action&lt;string[], T&gt;, that is responsible for assigning values to an object of type T given an array of strings.
This involves updating the signature for ReadEvents<T> as below (the signature change on lines 1-2, and invoking the function delegate on line 25).
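A simplified sketch of the reshaped reader (the framework version also deals with headers and stream options, and the exact signature may differ):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class CsvReader
{
    // The reader stays generic: it only splits lines, while the caller-supplied
    // delegate assigns the parsed fields to the event object. No reflection,
    // no Convert.ChangeType in the hot loop.
    public static IEnumerable<T> ReadEvents<T>(string path, Action<string[], T> assign)
        where T : new()
    {
        using (var file = File.OpenRead(path))
        using (var gzip = new GZipStream(file, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                var item = new T();
                assign(line.Split(','), item);
                yield return item;
            }
        }
    }
}
```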
The function definition is fairly straightforward:
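Sketched against a hypothetical weather event type (the real schema has more fields), the assignment function is just a lambda of strongly-typed parses:

```csharp
using System;
using System.Globalization;

// Hypothetical event type and field order – the actual weather schema differs.
public class WeatherReading
{
    public DateTime StartTime;
    public double Temperature;
    public int StationId;
}

public static class Parsers
{
    // Straight-line assignment: one strongly-typed parse per field,
    // no reflection, no Convert.ChangeType.
    public static readonly Action<string[], WeatherReading> Assign = (fields, evt) =>
    {
        evt.StartTime = DateTime.Parse(fields[0], CultureInfo.InvariantCulture);
        evt.Temperature = Double.Parse(fields[1], CultureInfo.InvariantCulture);
        evt.StationId = Int32.Parse(fields[2], CultureInfo.InvariantCulture);
    };
}
```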
This preserves our need for the Csv reading code to be generic (as the parsing code is externally defined and passed in), as well as keeping complexity reasonable. Let’s have a look at our updated performance results.
Much better! Download the profile trace to walk through this section. We see the performance culprits being threefold:
Based on our profiler results, parsing DateTime values accounts for 30% of our current CPU time. We have a number of potential performance fixes we can apply here:
Let’s start by investigating the use of a more optimized DateTime parsing routine – the one provided by the XmlConvert class. As XML serialization has been heavily optimized as .NET has evolved, it stands to reason that there may be some optimal parsing algorithms buried in there. Let’s find out.
In order to use XmlConvert.ToDateTime, we also need to change the format in which we encode date time values by altering the WriteEvents method and recreating the data file:
Note the use of XmlConvert on line 3 (using ToString(DateTimeOffset)) and on line 11 (using ToString(DateTime, XmlDateTimeSerializationMode)).
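For reference, the two XmlConvert.ToString overloads involved, wrapped as small helpers (names are assumed, not the post’s):

```csharp
using System;
using System.Xml;

public static class TimestampFormat
{
    // DateTimeOffset overload – emits a round-trippable XML date-time
    // that XmlConvert.ToDateTimeOffset can parse back.
    public static string Encode(DateTimeOffset start) =>
        XmlConvert.ToString(start);

    // DateTime overload – RoundtripKind preserves the Kind (UTC/local/unspecified).
    public static string Encode(DateTime ts) =>
        XmlConvert.ToString(ts, XmlDateTimeSerializationMode.RoundtripKind);
}
```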
Next, we need to update our parsing methods in ReadEvents<T>
And in the assignment function:
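The changed parse call, isolated as a helper so the before/after is clear (names assumed):

```csharp
using System;
using System.Xml;

public static class StartTimeFields
{
    // Before: DateTime.Parse(field)
    // After:  XmlConvert's optimized parser, matching the RoundtripKind
    //         format the writer now emits.
    public static DateTime ParseStartTime(string field) =>
        XmlConvert.ToDateTime(field, XmlDateTimeSerializationMode.RoundtripKind);
}
```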
Let’s go ahead and run this and observe the results.
Now we’re getting there! Download the profile trace to walk through this section. We see the performance culprits being twofold:
Weird. Apparently the format in which we’re storing the StartTime values is more expensive to parse (it stands to reason they’re different – the StartTime value is being parsed from a DateTimeOffset record). Wait – we wrote out a DateTimeOffset, but are reading in a DateTime. Could that make a difference?
Going back to our change in ReadEvents<T>(), let’s modify it to use XmlConvert.ToDateTimeOffset instead.
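The fix, sketched as a helper (names assumed): read back the same type we wrote out, so no conversion between DateTimeOffset and DateTime is needed.

```csharp
using System;
using System.Xml;

public static class StartTimeParser
{
    // StartTime was written out as a DateTimeOffset, so parse it back
    // as one – XmlConvert.ToDateTimeOffset handles the round-trip format.
    public static DateTimeOffset Parse(string field) =>
        XmlConvert.ToDateTimeOffset(field);
}
```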
Wow. Huge difference! Download the profile trace to walk through this section. We now see the performance culprits being centered in the assign function:
Almost there. Let’s make a few small changes to finish this out – let’s switch in the XmlConvert functions for Double.Parse() and Int32.Parse().
Ok, let’s modify the assignment method to use XmlConvert instead of Double.Parse:
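Sketched as helpers, the swapped-in calls look like this (XmlConvert always parses with the invariant culture, so the values come back identical to an invariant Double.Parse/Int32.Parse):

```csharp
using System;
using System.Xml;

public static class NumericParsers
{
    // Functionally equivalent to Double.Parse / Int32.Parse with
    // CultureInfo.InvariantCulture, routed through XmlConvert.
    public static double ParseDouble(string s) => XmlConvert.ToDouble(s);
    public static int ParseInt(string s) => XmlConvert.ToInt32(s);
}
```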
Uh-oh – performance regression! Looks like Double.Parse() is the better function here. Download the profile trace to walk through this section.
Looks like our version 3 is the most performant reader. Having achieved all of our performance goals – let’s call it done!
Now that we’ve finished our performance exploration, let’s run the full test suite and review our results and key learnings.
Or, for the raw data:
The key learnings are:
For more reading on Red Gate’s ANTS Performance profiler check out: