And now the exciting solution to Performance Quiz #3!
There's actually an enourmous amount of philosophy that can be learned from this little example in terms of good habits but I'd like to anchor this in the numbers so let's begin with those.
I did 4 runs with everything cached, and jitting everything, this actually gives the rad slower parser an advantage in terms of reported time because the input file isn't so terribly large that load time can be ignored. These results were on my own desktop machine, I'm sure you'll get different numbers if you try the test but the ratio should be similar.
Rad Parser: 1.092sClassic Parser: 0.149sRatio: 7.366
So the RAD parser is over 7 times slower and if I was to subtract out the cost of starting an empty program from both (about 50ms) then you'd get more like a factor of 10 for just the code we wanted to test.
Space AnalysisThese statistics were gathered by running the programs under CLRProfiler which tracks every managed allocation.
Rad ParserTotal Allocations 48307 Objects 2357857 Bytes
Top Ten Allocated Types
Count Bytes Type 42813 1408708 System.String 2514 713712 System.Int32  2515 200608 System.String  14 10564 System.Object  8 6183 System.Byte  25 5690 System.Char  171 3420 System.Text.StringBuilder 15 2160 System.Collections.Hashtable.bucket  15 780 System.Collections.Hashtable 34 680 System.RuntimeType
Total Allocations 539 Objects 46454 Bytes
Count Bytes Type 253 15262 System.String 14 10564 System.Object  5 4864 System.Byte  14 3712 System.Int32  21 3456 System.Char  14 2016 System.Collections.Hashtable.bucket  14 728 System.Collections.Hashtable 14 564 System.String  6 384 System.Globalization.CultureInfo 16 320 System.RuntimeType
That result is simply stunning, it's 50 times the number of allocated bytes, most of which obviously were just thrown away. Lucky for us the GC was able to reguarly reclaim that memory so the impact on total workingset was considerably less (it should end up being near the size of your L2 cache because of GC hueristics). Nonetheless allocating all that memory was not a good thing for performance.
Let's look at the allocations in a bit more detail.
I extracted the top 10 allocating stacks from both runs out of the CLRProfiler log and I've got them here in text format.
Let's do the classic parser first this time because we can dispense with it fairly quickly. I'm only going to show the first two of the top ten stacks.
Stack 1 allocates 7590 bytes 6 6240 System.Object  1 72 System.OutOfMemoryException 1 72 System.StackOverflowException 1 72 System.ExecutionEngineException 1 72 System.Threading.ThreadAbortException 1 20 System.SharedStatics 1 108 System.AppDomain 11 782 System.String 1 20 System.String  4 80 System.Security.Permissions.HostProtectionAttribute 1 24 System.Reflection.Assembly 1 28 Microsoft.Win32.Win32Native.InputRecord
Stack 2 allocates 4702 bytesClassicParser::Main static void (String) 1 76 System.IO.FileStream 1 4108 System.Byte  6 506 System.String 1 12 System.Int32
Now what's interesting here is the main function is only stack #2, you can see the byte array there and a few string literals, on boxed integer, and of course the file stream itself. Some of those must be coming from inlining because they don't appear directly in the source code for main (might make another interesting blog entry to run all these down actually). The biggest source of allocations had no managed stack associated with them at all -- these are objects created by the runtime at startup time to set up the enviroment, open standard file streams, and get that first appdomain running. There are actually more of those for even small numbers of bytes further down and all of those actually appear in the RAD parser as well.
So what's interesting here is that of the 46454 bytes allocated in total, our parser is directly responsible for a mere 4702 of them. Wow.
Let's look now at the RAD Parser stacks, there's a lot more meat on those ones.
Stack 1 allocates 985000 bytesRadParser::Main static void (String)System.String::Split String (wchar int32 System.StringSplitOptions)System.String::InternalSplitKeepEmptyEntries String (int32 int32 int32 int32)System.String::Substring String (int32 int32) 40000 985000 System.String
Stack 2 allocates 710034 bytesRadParser::Main static void (String)System.String::Split String (wchar int32 System.StringSplitOptions) 2500 710000 System.Int32  1 34 System.String
Stack 3 allocates 359436 bytesRadParser::Main static void (String)System.IO.StreamReader::ReadLine String ()System.String::CtorCharArrayStartLength String (wchar int32 int32) 2334 359436 System.String
Stack 4 allocates 200000 bytesRadParser::Main static void (String)System.String::Split String (wchar int32 System.StringSplitOptions)System.String::InternalSplitKeepEmptyEntries String (int32 int32 int32 int32) 2500 200000 System.String 
Stack 5 allocates 41208 bytesRadParser::Main static void (String)System.IO.StreamReader::ReadLine String ()System.Text.StringBuilder::.ctor void (String int32)System.Text.StringBuilder::.ctor void (String int32 int32 int32)System.String::GetStringForStringBuilder static String (String int32 int32 int32) 166 41208 System.String
Stack 6 allocates 7590 bytes 6 6240 System.Object  1 72 System.OutOfMemoryException 1 72 System.StackOverflowException 1 72 System.ExecutionEngineException 1 72 System.Threading.ThreadAbortException 1 20 System.SharedStatics 1 108 System.AppDomain 11 782 System.String 1 20 System.String  4 80 System.Security.Permissions.HostProtectionAttribute 1 24 System.Reflection.Assembly 1 28 Microsoft.Win32.Win32Native.InputRecordAll I can say is wow. We have to get down to stack 6 before we see the 7590 bytes of root allocations. You can see there's significant cost associated with splitting the strings -- making the arrays and the string pieces. Another hidden cost leaps right out here as well -- since we don't know in advance how many string parts there are there's a pass over the string to find the delimeter offsets, those are stored in an temporary integer array.
Cost Per Byte
Another way of thinking about these parsers is to consider how many times any byte of the input file has to be touched, you can see in the RAD parser we have to do newline parsing, and then scanning for delimeters and splitting, and then numeric conversions. So it's readily apparent that there will be many memory touches for each byte of the file. Consider just this counting the CRLF there are 70 bytes per line of input and 2500 such lines, that's 175,000 bytes to read. But we allocated over 13 times that in bytes to process the file the rad way.
The classic parser is very economical by comparison. Even if the data had not been in the disk cache we would only add a few milliseconds of overhead to both algorithms for reading such a small file. The memory management costs of the rad way are easily greater than any reasonable I/O cost.
Warnings and Good Practice
So should you all abandon your RAD practices and go study Knuth's parsing techniques? Well... I dunno about that. But there are definitely some lessons here. I think my message is this: If you choose the RAD parser you have made your coding that much easier but you still have a responsibility to measure the quality of your solution against some test data and see what your performance is going to be like. Every professional piece of software should have performance goals, even if they are very simple ones that are very easy to meet. Sometimes goals like “Yeah, 10 seconds is fine” is all it takes, other times we have to have strict time and space budgets. If you were in “Yeah 10 seconds is fine” mode then I think the RAD parser is a fine choice.
And here’s the 2nd half of the message. If you decide to do something fancier than the RAD way, you can easily find yourself in a world of hurt on correctness issues. Even this fairly simple parsing problem has much potential for mistakes – I put in error checks for three cases but did you notice that there’s no error check for an input token that is just a “-” – some people noticed… It’s easy to make mistakes when you are reaching for sharper tools, which is all the more reason to understand your performance goals so that you can choose a plan that is going to economically meet those goals. Is the “classic” parser approach doomed to be full of errors? No I think not. We just have to be more careful, there are well understood techniques for creating provably correct parsers but we’ll obviously be investing more in our algorithm to get there.
So there’s your balance – don’t blindly choose the easiest approach, it may be much too awful. Don’t naively choose a complex approach, the effort required to achieve correctness may be daunting.
Set quantitative goals so that you can objectively make the necessary plans to meet them.