<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Performance Quiz #7 -- Generics Improvements and Costs -- Solution</title><link>http://blogs.msdn.com/b/ricom/archive/2005/08/26/performance-quiz-7-generics-improvements-and-costs-solution.aspx</link><description>Believe it or not I actually spend a good amount of time thinking about my little quizzes hoping to come up with a small piece of code that illustrates a bunch of different things in a simple enough way that many people feel they can jump right in and</description><dc:language>en-US</dc:language><generator>Telligent Evolution Platform Developer Build (Build: 5.6.50428.7875)</generator><item><title>re: Performance Quiz #7 -- Generics Improvements and Costs -- Solution</title><link>http://blogs.msdn.com/b/ricom/archive/2005/08/26/performance-quiz-7-generics-improvements-and-costs-solution.aspx#457975</link><pubDate>Tue, 30 Aug 2005 17:34:40 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:457975</guid><dc:creator>ricom</dc:creator><description>&amp;gt;&amp;gt;While List&amp;lt;&amp;gt; enumerator is better than ArrayList it's still non-optimal for enumerating a collection. &lt;br&gt;&lt;br&gt;I actually measured times that were very similar to the above, I was planning to write something about that when I got back from vacation.  I think maybe I still will.  &lt;br&gt;&lt;br&gt;However I still urge caution in interpreting the results of Micro-benchmarks in any kind of broad way.  This particular one was designed to magnify enumeration overheads which is why I didn't find it fair comment to say something like foreach is so and so % slower than for.&lt;br&gt;&lt;br&gt;If you were to dig deeper in this you'd find that the reasons it being slower break down into two categories:&lt;br&gt;&lt;br&gt;1) It has safety checks to help you if the collection is modified out from under you(generally a good tradeoff)&lt;br&gt;&lt;br&gt;2) It doesn't get fully inlined (partly due to the above)&lt;br&gt;&lt;br&gt;With ArrayList the problem, as I've mentioned, is that the implementation choice was not especially good.&lt;br&gt;&lt;br&gt;In neither case is there any find of &amp;quot;foreach&amp;quot; issue -- it's question of how the enumerator is implemented and what choices were made there.  You could even write your own arraylist wrapper value type (no storage overhead) that makes a different choice and get different behavior.  You could have the for behavior if you want it.&lt;br&gt;&lt;br&gt;Interestingly, if you do the same experiment for arrays you'll find (well I found) that the foreach pattern is FASTER with /o+ why?  The C# compiler generatates better IL for that pattern and the jit does a better job stripping bounds checks.  I tried to rewrite my for loop in such a way as to get rid of every bit of overhead but I couldn't figure out the secret sauce.  With foreach I got it without even trying.&lt;br&gt;&lt;br&gt;This data doesn't support a general conclusion like foreach is always worse on collections.  Sometimes foreach is the only choice as far as that goes.  High performance collections can and do make difference enumeration design choices that can give great performance.  Wrapper classes on our generics can easily make different choices -- ones you like better.&lt;br&gt;&lt;br&gt;That said it's important to know what choice was made in our own basic collection types and it doesn't happen to be the very fastest one.  Although it is much better.  Remember the enumeration overhead is very much magnified in this example -- by design.&lt;br&gt;&lt;br&gt;&amp;gt;&amp;gt;In real life application you have allocations of different kinds of objects not only enumerators. The huge number of allocated enumerators will increase the pressure on the GC.&lt;br&gt;&lt;br&gt;As you indicated (in context of the quote above) my comments are specific to this benchmark but of course I'm the one whose cautioning not to interpret the results of this analysis too broadly.  It is interesting to note that the raw cost of the allocations isn't really the problem, because if it was you would see it even in this benchmark.&lt;br&gt;&lt;br&gt;The true cost is just as you indicated, if there were other operations you would have side-effects on them (to some extent, you would have to measure to see just how much impact).  You would affect the overall locality and age the objects faster than they otherwise would potentially increasing the overall cost of collection.&lt;br&gt;&lt;br&gt;While allocations are cheap, not allocating at all is still cheaper :)&lt;br&gt;&lt;br&gt;Locality issues can be, as you say, difficult to find because the tax can be spread about your code.  However you would see the enumerators in this case in the CLRProfiler, even though they are short lived.  I often go fishing for locality issues by looking at overall allocation volume using CLRProfiler.  My LogDump tool (recently posted) is also helpful for this because sometimes the logs get very large.&lt;br&gt;&lt;br&gt;Good comments, thank you!&lt;br&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=457975" width="1" height="1"&gt;</description></item><item><title>allocation/collection cost</title><link>http://blogs.msdn.com/b/ricom/archive/2005/08/26/performance-quiz-7-generics-improvements-and-costs-solution.aspx#457712</link><pubDate>Tue, 30 Aug 2005 02:43:04 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:457712</guid><dc:creator>Alex</dc:creator><description>&amp;gt; What about all those allocations, many people noticed that the arraylist version is allocating enumerators like crazy.  That must be the problem.  10,000,000 enumerator allocations can't be good right?  Surely that's the dominant cost?  Well, it turns out it isn't.  This guy here JIT_NewFast tells the tale, only 1.87% of the run time was spent in allocations.  Now since this guy is single threaded, if any collections happened they would be triggered by that call so the collections are part of that 1.87%.  In fact if you look at the performance counters while this program is running you'll see that the average time in the garbage collector is only really about one half of one percent (0.5%) so most of that time is just doing the allocations and not collecting. &lt;br&gt;&lt;br&gt;This argument is applicable only to microbenchmarks.  In real life application you have allocations of different kinds of objects not only enumerators.  The huge number of allocated enumerators will increase the pressure on the GC.  And GC will spend considerably more time doing collections, and collections will happen more frequently.&lt;br&gt;&lt;br&gt;The worst thing is that because enumerators are temporary objects the cost of their allocation and collection is very difficult to find in a real life application.  You won't see them in the CLR Profiler, because they are short-lived.  Yet they contribute to the time spent in GC.&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=457712" width="1" height="1"&gt;</description></item><item><title>foreach argument</title><link>http://blogs.msdn.com/b/ricom/archive/2005/08/26/performance-quiz-7-generics-improvements-and-costs-solution.aspx#457700</link><pubDate>Tue, 30 Aug 2005 02:21:53 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:457700</guid><dc:creator>Alex</dc:creator><description>Here it comes :)&lt;br&gt;&lt;br&gt;foreach loop:&lt;br&gt;Generic List: 1.80703441359167 sec&lt;br&gt;ArrayList: 6.64174133863382 sec&lt;br&gt;&lt;br&gt;for loop:&lt;br&gt;Generic List: 0.603019657526306 sec&lt;br&gt;ArrayList: 2.61797574831438 sec&lt;br&gt;&lt;br&gt;While List&amp;lt;&amp;gt; enumerator is better than ArrayList it's still non-optimal for enumerating a collection.&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=457700" width="1" height="1"&gt;</description></item><item><title>re: Performance Quiz #7 -- Generics Improvements and Costs -- Solution</title><link>http://blogs.msdn.com/b/ricom/archive/2005/08/26/performance-quiz-7-generics-improvements-and-costs-solution.aspx#457555</link><pubDate>Mon, 29 Aug 2005 20:49:45 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:457555</guid><dc:creator>Steven Padfield</dc:creator><description>What tool did you use to obtain the trace you presented in answer to Q3?&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=457555" width="1" height="1"&gt;</description></item><item><title>re: Performance Quiz #7 -- Generics Improvements and Costs -- Solution</title><link>http://blogs.msdn.com/b/ricom/archive/2005/08/26/performance-quiz-7-generics-improvements-and-costs-solution.aspx#457489</link><pubDate>Mon, 29 Aug 2005 17:02:44 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:457489</guid><dc:creator>Michael</dc:creator><description>Great work, Rico!&lt;br&gt;&lt;br&gt;Just another reminder that I'll be studying every day for the rest of my life in an attempt to learn a fraction of what I think I need to know (and that's not including what I *want* to know... sigh... ).&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=457489" width="1" height="1"&gt;</description></item><item><title>re: Performance Quiz #7 -- Generics Improvements and Costs -- Solution</title><link>http://blogs.msdn.com/b/ricom/archive/2005/08/26/performance-quiz-7-generics-improvements-and-costs-solution.aspx#457057</link><pubDate>Sat, 27 Aug 2005 10:29:29 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:457057</guid><dc:creator>ricom</dc:creator><description>&amp;gt;&amp;gt;It's disappointing that when discussing performance of the CLR the answer always seems to be don't use virtual functions (and interfaces?). &lt;br&gt;&lt;br&gt;Well that's a bit of a leap.  I am the first to say don't use features you don't need -- that's just being smart but don't use virtuals?  No I wouldn't go there.  &lt;br&gt;&lt;br&gt;All we can really conclude here is that the implementation choices in the original ArrayList were pretty unfortunate for raw performance and the advice we now give for how to implement enumerators (which we followed in the List&amp;lt;T&amp;gt; class) is helping a great deal.&lt;br&gt;&lt;br&gt;I wouldn't conclude anything broad from this micro-benchmark.  Comparisons between platforms have been done an notoriously give mixed results.  &lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=457057" width="1" height="1"&gt;</description></item><item><title>re: Performance Quiz #7 -- Generics Improvements and Costs -- Solution</title><link>http://blogs.msdn.com/b/ricom/archive/2005/08/26/performance-quiz-7-generics-improvements-and-costs-solution.aspx#457020</link><pubDate>Sat, 27 Aug 2005 07:17:44 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:457020</guid><dc:creator>Jeffrey Sax</dc:creator><description>I guess I considerably underestimated the impact of the virtual function calls.&lt;br&gt;&lt;br&gt;To distinguish this effect from the memory usage, I tested the ArrayList option again, but this time took the enumerator construction out of the loop, and did the enumerating manually using MoveNext and Current.&lt;br&gt;&lt;br&gt;The result: this code is roughly 10% faster than the foreach version. Relative to the generic version, the allocations cause a slow-down of roughly 30%, much more than the 5% or so from the raw allocations.&lt;br&gt;&lt;br&gt;In other words, about 1/8 of the difference is caused by the allocations and their side effects - significant, but not dominant.&lt;br&gt;&lt;br&gt;The rest is due to virtual call overhead, divided roughly 55/45 between dispatch stubs and Current not being inlined.&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=457020" width="1" height="1"&gt;</description></item><item><title>re: Performance Quiz #7 -- Generics Improvements and Costs -- Solution</title><link>http://blogs.msdn.com/b/ricom/archive/2005/08/26/performance-quiz-7-generics-improvements-and-costs-solution.aspx#456958</link><pubDate>Sat, 27 Aug 2005 02:36:30 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:456958</guid><dc:creator>Brien</dc:creator><description>It's disappointing that when discussing performance of the CLR the answer always seems to be don't use virtual functions (and interfaces?).&lt;br&gt;&lt;br&gt;It would be nice if high performance and polymorphism were not mutually exclusive.&lt;br&gt;&lt;br&gt;Especially since the Java guys have seemed to figure out how to make this work.&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=456958" width="1" height="1"&gt;</description></item></channel></rss>