Performance Quiz #6 -- Looking at the sixth cut
Well, it's time for me to surrender. Sort of :)
Raymond pulls out all the stops in his sixth version by painting a big bullseye on his biggest remaining source of slowness, operator new. He turns in an excellent result here. On my benchmark machine the time drops from 124ms to 62ms, a full 2x faster from start to finish. And, per the footnote in my previous posting, the runtime of his application is now comparable to the CLR's startup overhead... I can't beat this time.
Let's look at the results table now to see how we ended up:
| Version | Execution Time (seconds) |
|---|---|
| Unmanaged v1 | 1.328 |
| Unmanaged v2 | 0.828 |
| Unmanaged v3 | 0.343 |
| Unmanaged v4 | 0.187 |
| Unmanaged v5 With Bug | 0.296 |
| Unmanaged v5 Corrected | 0.124 |
| Unoptimized Managed port of v1 | 0.124 |
| Optimized Managed port of v1 | 0.093 |
| Unmanaged v6 | 0.062 |
Six versions and quite a bit of work later, we've been soundly trumped. But before I discuss that, let me put up the internal profile of Raymond's version 6.
I've applied my usual filters to the call tree (nothing lower than 5% inclusive) and I also pruned out a couple of functions below HeapAlloc because they have long names and are boring :)
| Function Name (Sanitized) | Exclusive Percent | Inclusive Percent |
|---|---|---|
| _mainCRTStartup | 0 | 97.826 |
| _main | 0 | 97.826 |
| Dictionary::Dictionary(void) | 5.435 | 96.739 |
| MultiByteToWideChar | 19.565 | 25 |
| GetCPHashNode | 5.435 | 5.435 |
| operator new(unsigned int) | 1.087 | 16.304 |
| .. | 0 | 14.13 |
| .. | 0 | 14.13 |
| AllocateHeap | 4.348 | 13.043 |
| _free | 0 | 8.696 |
| FreeHeap | 2.174 | 8.696 |
| DictionaryEntry::Parse(...) | 1.087 | 33.696 |
| StringPool::AllocString(...) | 2.174 | 27.174 |
| _lstrcpynW | 19.565 | 25 |
| __SEH_prolog | 5.435 | 5.435 |
You can see that memory allocation time is way down as a percentage, and of course that's a smaller percentage of a smaller total time. I think he also gets a lot of raw speed from improved locality thanks to that new allocator. Interestingly, SEH overhead has risen to a significant level in this run (over 5% for the first time). Still nothing to be worried about.
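To make the allocator and locality point concrete, here is a minimal sketch of a bump-pointer string pool in C++. This is my own illustration, not Raymond's code: it assumes a design where strings are copied into large contiguous chunks and are never freed individually. The `StringPool` class and its `AllocString` signature here are hypothetical and only share a name with the profile entry above.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <memory>
#include <vector>

// Hypothetical bump-pointer string pool (a sketch, not Raymond's actual class).
// Strings are copied into large contiguous chunks, so consecutive allocations sit
// next to each other in memory and there is no per-string heap call or free.
class StringPool {
public:
    explicit StringPool(std::size_t chunkChars = 64 * 1024) : chunkChars_(chunkChars) {}

    // Copies len characters from src, appends L'\0', and returns the pooled copy.
    const wchar_t* AllocString(const wchar_t* src, std::size_t len) {
        assert(src != nullptr || len == 0);
        std::size_t need = len + 1;
        if (chunks_.empty() || used_ + need > capacity_) {
            // Start a new chunk; an oversized string gets a chunk of its own size.
            capacity_ = need > chunkChars_ ? need : chunkChars_;
            chunks_.push_back(std::make_unique<wchar_t[]>(capacity_));
            used_ = 0;
        }
        wchar_t* dst = chunks_.back().get() + used_;  // "bump" allocation
        std::wmemcpy(dst, src, len);
        dst[len] = L'\0';
        used_ += need;
        return dst;
    }

private:
    std::size_t chunkChars_;
    std::size_t capacity_ = 0;
    std::size_t used_ = 0;
    std::vector<std::unique_ptr<wchar_t[]>> chunks_;  // all freed at once on destruction
};
```

Each allocation is just a bounds check, a pointer bump and a copy, and teardown releases a handful of chunks rather than thousands of individual strings. That is the general kind of saving that pushes operator new and _free down the profile, though Raymond's actual allocator may differ in the details.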
So am I ashamed by my crushing defeat? Hardly. The managed code got a very good result for hardly any effort. To defeat the managed version, Raymond had to:
- Write his own file I/O routines
- Write his own string class
- Write his own allocator
- Write his own international mapping
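As an illustration of what "his own string class" can buy, here is a minimal sketch of a non-owning string reference in C++. It is a hypothetical example of the general technique (a pointer plus a length into an existing buffer, so copies never allocate), not the class Raymond actually wrote; `WStrRef` and its members are names I made up. In modern C++17, `std::wstring_view` plays a similar role.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical minimal string type (a sketch of the idea, not Raymond's class).
// It does not own its characters; it only records where they live (e.g. inside a
// file buffer or a string pool), so constructing or copying one never allocates.
struct WStrRef {
    const wchar_t* ptr = nullptr;
    std::size_t len = 0;

    WStrRef() = default;
    WStrRef(const wchar_t* p, std::size_t n) : ptr(p), len(n) {
        assert(p != nullptr || n == 0);
    }

    // Lexicographic comparison by wchar_t value; a shorter prefix sorts first.
    int Compare(const WStrRef& other) const {
        std::size_t n = len < other.len ? len : other.len;
        int r = n ? std::wmemcmp(ptr, other.ptr, n) : 0;  // skip call for empty ranges
        if (r != 0) return r;
        return len < other.len ? -1 : (len > other.len ? 1 : 0);
    }
    bool operator<(const WStrRef& other) const { return Compare(other) < 0; }
    bool operator==(const WStrRef& other) const { return Compare(other) == 0; }
};
```

Because each reference is just two words, a container of these can be sorted or searched without touching the heap, which is the kind of overhead a general-purpose `std::wstring` would otherwise add per entry.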
Of course he used available lower-level libraries to do this, but that's still a lot of work. Can you still call what's left an STL program? I don't think so. I think he kept the std::vector class, which ultimately was never a problem, and he kept the find function. Pretty much everything else is gone.
So, yup, you can definitely beat the CLR. And I think Raymond could make his program go even faster.
Interestingly, the time to parse the file as reported by both programs' internal timers is about the same: 30ms for each. The difference is in the overhead.
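To show why an internal timer can agree while end-to-end times differ, here is a minimal C++ sketch of such a timer. It is a hypothetical helper, not the timing code either program actually uses: it measures only the work between its two clock reads, so process creation, runtime (or CLR) startup, static initialization and teardown at exit never show up in the number it reports.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: reports only the time spent inside work().
// Anything outside this window (process start, runtime/CLR startup,
// static initialization, cleanup at exit) is invisible to it.
template <typename F>
double TimeMs(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // steady_clock is monotonic
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A program would wrap only its parse step, for example `TimeMs([&] { LoadDictionary(); })` where `LoadDictionary` is a hypothetical stand-in for the parsing code. Two programs can then report the same parse time while their total wall-clock times differ by whatever happens outside that lambda.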
Tomorrow I'm going to talk about the space used by these programs, and that will wrap it up. I think Raymond is going to go on and do some actual UI and so forth with this series, though. That should be fun to watch.