<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Developer Division Performance Engineering blog : Performance testing</title><link>http://blogs.msdn.com/ddperf/archive/tags/Performance+testing/default.aspx</link><description>Tags: Performance testing</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>PDC2008 preConference Workshop</title><link>http://blogs.msdn.com/ddperf/archive/2008/10/22/pdc2008-preconference-workshop.aspx</link><pubDate>Wed, 22 Oct 2008 18:36:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9011219</guid><dc:creator>MarkBFriedman</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/ddperf/comments/9011219.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ddperf/commentrss.aspx?PostID=9011219</wfw:commentRss><description>&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;Over the past several weeks, I have been working overtime developing a presentation on web application performance to be given at the upcoming Professional Developer’s Conference (PDC), which is next week in Los Angeles. This is partly why I have been remiss about blogging this month. At least, that is my excuse, and I am sticking to it.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;The presentation is entitled “Performance by design using the .NET Framework” and I am presenting jointly with two colleagues in the Developer Division, Rico Mariani and Vance Morrison. It is one of ten PreConference sessions that are scheduled to run all day on Sunday. My portion of the session is an extended discussion of optimization &amp;amp; scaling strategies for web applications. The scope encompasses ASP.NET, AJAX, Silverlight, WPF &amp;amp; WCF. Information about the upcoming event is &lt;/FONT&gt;&lt;A href="http://www.microsoftpdc.com/Agenda/Preconference.aspx#performance-by-design-using-the-net-framework" mce_href="http://www.microsoftpdc.com/Agenda/Preconference.aspx#performance-by-design-using-the-net-framework"&gt;&lt;FONT color=#0000ff size=3 face=Calibri&gt;here&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=3 face=Calibri&gt;.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;I have attended several PDCs in the past as a customer, and found them to be amazing events. Before the days of widespread blogging, the “Ask the Experts” sessions at the PDC were often the only way to get an authoritative answer to your question. The actual Conference sessions emphasize imminently arriving technology and future directions, aimed at the professional developer who needs to be able to anticipate and plan. The technical sessions run the gamut from Windows 7, the Windows Live cloud computing initiatives, and IE8 to Surface and Windows Workflow Foundation. There will be previews of the next version of the .NET Framework, Visual Studio, and the Visual Studio Team System. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;This is the first time I will be on the other side of the podium for the event. In our preCon session, Rico, Vance and I will focus on facilities available in the Framework today, including the best practices and tools we recommend to help you design &amp;amp; build an application that meets its performance and scalability requirements. The intended audience is experienced .NET developers. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;If you are reading this blog &amp;amp; coming to my session, be sure to say hello. I’d like to get the chance to meet you in person. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;-- Mark Friedman&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9011219" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ddperf/archive/tags/Performance+Engineering/default.aspx">Performance Engineering</category><category domain="http://blogs.msdn.com/ddperf/archive/tags/.NET/default.aspx">.NET</category><category domain="http://blogs.msdn.com/ddperf/archive/tags/Scalability/default.aspx">Scalability</category><category domain="http://blogs.msdn.com/ddperf/archive/tags/Performance/default.aspx">Performance</category><category domain="http://blogs.msdn.com/ddperf/archive/tags/Performance+testing/default.aspx">Performance testing</category><category domain="http://blogs.msdn.com/ddperf/archive/tags/PDC2008/default.aspx">PDC2008</category></item><item><title>Lessons from the test lab: investigating a pleasant surprise</title><link>http://blogs.msdn.com/ddperf/archive/2008/06/18/lessons-from-the-test-lab-investigating-a-pleasant-surprise.aspx</link><pubDate>Thu, 19 Jun 2008 00:33:20 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8618468</guid><dc:creator>jonathanh</dc:creator><slash:comments>8</slash:comments><comments>http://blogs.msdn.com/ddperf/comments/8618468.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ddperf/commentrss.aspx?PostID=8618468</wfw:commentRss><description>&lt;p&gt;This post describes our recent investigation into an interesting performance problem: benchmarks that we were surprised to find running significantly faster than we expected on new hardware. Along the way we discuss useful benchmarking tools, how to validate results, and why it pays to know exactly what hardware you're running on.&lt;/p&gt;  &lt;p&gt;This all started in our performance test lab. During the development of Visual Studio, each new build undergoes a suite of automated performance tests, running in a lab full of identical machines. 
These performance tests allow us to track Visual Studio's performance over time, and &lt;a href="http://blogs.msdn.com/ddperf/archive/2008/05/20/visual-studio-performance-testing-noise-is-enemy-1.aspx"&gt;detect performance regressions&lt;/a&gt; (when something gets unexpectedly worse). We recently added a batch of new machines in our lab, and that's when the fun started.&lt;/p&gt;  &lt;p&gt;&lt;b&gt;Pop Quiz: How Much Faster?&lt;/b&gt;&lt;/p&gt;  &lt;p&gt;Old machine: dual-core Intel Pentium D 830 processor, running at 3 GHz, with 1 GB of RAM.&lt;/p&gt;  &lt;p&gt;New machine: quad-core Intel Xeon 5355 processor, running at 2.66 GHz, with 4 GB of RAM. &lt;/p&gt;  &lt;p&gt;Given the differences in the two hardware configurations above, how much faster would you expect the new machine to be when running a Visual Studio performance test? Lower than, same as, twice, three times or four times the performance of the older machine? &lt;/p&gt;  &lt;p&gt;One line of reasoning might look at the relative clock frequencies of the processors on the two machines. This might lead you to expect the newer processor cores to perform slower than the older cores, since their clock frequency is 11% lower. By this reasoning you might conclude that single-threaded applications would perform poorly on the new machine. &lt;/p&gt;  &lt;p&gt;Another line of reasoning would factor in the number of cores in the two systems. Since the new machine has twice the number of cores, you might expect it to have about twice the performance on multi-threaded applications. (If you also accounted for the lower clock frequency, you'd end up with a figure of 1.78 times the performance of the old machine.) &lt;/p&gt;  &lt;p&gt;A third approach might estimate the impact of RAM size. We’ve quadrupled the amount of RAM, so maybe any benchmarks that used to page to disk can now execute entirely in memory and hence will be orders of magnitude faster. 
[We'll cheat here and tell you that our benchmarks are generally not memory constrained]. &lt;/p&gt;  &lt;p&gt;So far, all these options seem plausible. What's your guess? &lt;/p&gt;  &lt;p&gt;What we naively expected to find lay somewhere between the first two lines of reasoning - that the new machines would be 1-2 times faster than the old machines, depending on the particular benchmark.&lt;/p&gt;  &lt;p&gt;What we actually found is that many of our single-threaded CPU-bound benchmarks run about &lt;strong&gt;twice as fast&lt;/strong&gt; on the new machine, while scalable multi-threaded benchmarks run up to &lt;strong&gt;four times as fast&lt;/strong&gt;. This was a pleasant surprise, because it significantly reduces the overall time to run all the benchmarks. But it did leave us wondering why we were getting much greater speedups than our naive explanations would suggest. The rest of this post explores that question.&lt;/p&gt;  &lt;p&gt;&lt;b&gt;Using WinSAT and SPEC to Validate Benchmark Results&lt;/b&gt;&lt;/p&gt;  &lt;p&gt;To make sure this wasn't a fluke result, we used the &lt;a href="http://msdn.microsoft.com/en-us/library/ms737378(VS.85).aspx"&gt;Windows System Assessment Tool&lt;/a&gt; (winsat.exe). This is a built-in tool that can quickly give a representative view of a machine's performance. It is multi-threaded, taking full advantage of all the cores on a machine. 
Here are the WinSAT CPU results: &lt;/p&gt;  &lt;table cellspacing="0" cellpadding="2" width="422" border="0"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="189"&gt;&lt;strong&gt;Benchmark&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="89"&gt;&lt;strong&gt;Old Machine&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="95"&gt;&lt;strong&gt;New Machine&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="47"&gt;&lt;strong&gt;Speedup&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="189"&gt;CPU – Compression (MB/s)&lt;/td&gt;        &lt;td valign="top" width="89"&gt;70.5&lt;/td&gt;        &lt;td valign="top" width="95"&gt;262.0&lt;/td&gt;        &lt;td valign="top" width="47"&gt;3.7&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="189"&gt;CPU – Encryption (MB/s)&lt;/td&gt;        &lt;td valign="top" width="89"&gt;52.3&lt;/td&gt;        &lt;td valign="top" width="95"&gt;139.3&lt;/td&gt;        &lt;td valign="top" width="47"&gt;2.7&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;We also wanted to validate our results against other real-world benchmarks. For this we turned to the &lt;a href="http://www.spec.org/"&gt;SPEC website&lt;/a&gt;. SPEC produces a series of benchmark suites, plus a very formal process that ensures results are reproducible and can fairly be applied across different manufacturers. More importantly for our purposes, SPEC posts all reported benchmark results on their web site. You won’t always be able to find your exact machine listed, but after using results from a tool like CPU-Z you can generally find results from a machine with the same CPU configuration and clock speed. &lt;/p&gt;  &lt;p&gt;We used the &amp;quot;CINT2006&amp;quot; benchmarks – this is a widely-used benchmark suite concentrating on integer performance. 
We compared results for both CINT2006, which is a good test of single-threaded performance, and CINT2006 Rate, which tests the ability of a system to execute multiple copies of CINT2006, and is therefore a better test of multi-threaded performance. For two representative machines that are similar to our old and new hardware, here are the results:&lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;table cellspacing="0" cellpadding="2" width="422" border="0"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="189"&gt;&lt;strong&gt;Benchmark&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="89"&gt;&lt;strong&gt;Old Machine&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="95"&gt;&lt;strong&gt;New Machine&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="47"&gt;&lt;strong&gt;Speedup&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="189"&gt;CINT2006&lt;/td&gt;        &lt;td valign="top" width="89"&gt;9.85&lt;/td&gt;        &lt;td valign="top" width="95"&gt;15.5&lt;/td&gt;        &lt;td valign="top" width="47"&gt;1.6&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="189"&gt;CINT2006 Rate&lt;/td&gt;        &lt;td valign="top" width="89"&gt;18.0&lt;/td&gt;        &lt;td valign="top" width="95"&gt;44.4&lt;/td&gt;        &lt;td valign="top" width="47"&gt;2.5&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;The WinSAT and SPEC results confirm that the new machines are much faster than our naive expectations, even for benchmarks such as CINT2006 that cannot take advantage of the extra cores. So what were we missing? &lt;/p&gt;  &lt;p&gt;&lt;b&gt;Using CPU-Z to Examine Machine Configurations&lt;/b&gt;&lt;/p&gt;  &lt;p&gt;To answer this, we need a deeper understanding of the configurations of the two systems. &lt;/p&gt;  &lt;p&gt;Unfortunately, finding detailed configuration information isn't always straightforward. 
For example, we know that level two (L2) cache size impacts performance, but Windows doesn't report it, and it's not easy to reboot into the BIOS to take a look at cache size when the machine is located in a remote test lab. This is where machine reporting tools like &lt;a href="http://www.cpuid.com/cpuz.php"&gt;CPU-Z&lt;/a&gt; come in. You can run CPU-Z remotely on an unknown machine and get back a nicely formatted HTML report showing exactly what the hardware is. Here's a deeper look at our old and new systems:&lt;/p&gt;  &lt;table cellspacing="0" cellpadding="2" width="408" border="0"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="155"&gt;&lt;strong&gt;Feature&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="118"&gt;&lt;strong&gt;Old Machine&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="141"&gt;&lt;strong&gt;New Machine&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="155"&gt;CPU name&lt;/td&gt;        &lt;td valign="top" width="118"&gt;Pentium D 830          &lt;br /&gt;(“Smithfield”)&lt;/td&gt;        &lt;td valign="top" width="141"&gt;Xeon X5355          &lt;br /&gt;(“Clovertown”)&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="155"&gt;CPU speed&lt;/td&gt;        &lt;td valign="top" width="118"&gt;3.00 GHz&lt;/td&gt;        &lt;td valign="top" width="141"&gt;2.66 GHz&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="155"&gt;Number of cores&lt;/td&gt;        &lt;td valign="top" width="118"&gt;2&lt;/td&gt;        &lt;td valign="top" width="141"&gt;4&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="155"&gt;L1 cache (per core)&lt;/td&gt;        &lt;td valign="top" width="118"&gt;16 KB&lt;/td&gt;        &lt;td valign="top" width="141"&gt;32 KB&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="155"&gt;L2 cache (total)&lt;/td&gt;        &lt;td valign="top" width="118"&gt;2 
MB&lt;/td&gt;        &lt;td valign="top" width="141"&gt;8 MB&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="155"&gt;System RAM&lt;/td&gt;        &lt;td valign="top" width="118"&gt;1 GB DDR2&lt;/td&gt;        &lt;td valign="top" width="141"&gt;4 GB DDR2&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;&lt;b&gt;Using BCDEdit to Disable Cores&lt;/b&gt;&lt;/p&gt;  &lt;p&gt;Now we can try to tease out the relative impacts of the many changes from the old configuration to the new configuration. The first and easiest step is to disable two out of four cores on a new machine, to enable a fairer &amp;quot;apples to apples&amp;quot; comparison of cores between old and new machines.&lt;/p&gt;  &lt;p&gt;To do this we used the Windows BCDEdit tool, which replaces the old method of editing BOOT.INI by hand. Here we were particularly concerned with the order in which cores are disabled. This is important because the 8 MB of L2 cache in the Xeon “Clovertown” processors is divided: two of the four cores share 4 MB, and the other two cores share the other 4 MB. To keep our benchmark comparisons as fair as possible, we wanted to make sure that only one of the L2 caches was in use after disabling two cores. We used CPU-Z again after rebooting to confirm this.&lt;/p&gt;  &lt;p&gt;Now we were in a position to do a fairer “cores to cores” comparison between the old and new machines. 
Here's a summary from WinSAT: &lt;/p&gt;  &lt;table cellspacing="0" cellpadding="2" width="422" border="0"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="189"&gt;&lt;strong&gt;Benchmark&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="89"&gt;&lt;strong&gt;Old Machine&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="95"&gt;&lt;strong&gt;New (2 cores)&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="47"&gt;&lt;strong&gt;Speedup&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="189"&gt;CPU – Compression (MB/s)&lt;/td&gt;        &lt;td valign="top" width="89"&gt;70.5&lt;/td&gt;        &lt;td valign="top" width="95"&gt;131.9&lt;/td&gt;        &lt;td valign="top" width="47"&gt;1.9&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="189"&gt;CPU – Encryption (MB/s)&lt;/td&gt;        &lt;td valign="top" width="89"&gt;52.3&lt;/td&gt;        &lt;td valign="top" width="95"&gt;69.7&lt;/td&gt;        &lt;td valign="top" width="47"&gt;1.3&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="189"&gt;Memory Bandwidth (MB/s)&lt;/td&gt;        &lt;td valign="top" width="89"&gt;4,041&lt;/td&gt;        &lt;td valign="top" width="95"&gt;3,360&lt;/td&gt;        &lt;td valign="top" width="47"&gt;0.8&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;Now we can really see the advantage of the latest processors – on a core-for-core basis, they are 1.3-1.9x faster on the CPU-intensive WinSAT benchmarks, despite having lower clock frequencies.&lt;/p&gt;  &lt;p&gt;Good, now on to the next… wait a second. Look at that memory bandwidth result. Our new machines have &lt;i&gt;less&lt;/i&gt; memory bandwidth than the old machines? That doesn't look right: although memory performance hasn't been keeping pace with CPU speeds, it &lt;i&gt;has&lt;/i&gt; been improving over time. 
Compared to a three-year-old machine, we'd expect these new machines to have slightly better memory bandwidth, and definitely not worse. What gives?&lt;/p&gt;  &lt;p&gt;&lt;b&gt;Memory Channels&lt;/b&gt;&lt;/p&gt;  &lt;p&gt;A primary limiting factor to memory bandwidth is the number of memory channels that are in use. And this turns out to be the problem here: although the new machines have four memory channels and eight memory slots, only two of those slots are filled, because the vendor supplied us with two 2 GB memory modules per machine. This maximizes future expansion potential – we can take the machine up to 16 GB without throwing away any of our initial investment in memory. But in the meantime using two memory slots limits us to two memory channels in use. If instead we had four 1 GB memory modules we'd have four memory channels in use, improving memory interleaving from 2:1 to 4:1 and increasing memory bandwidth. To confirm this, we populated four memory slots on one of the new machines (going from 4 GB to 8 GB) and reran WinSAT:&lt;/p&gt;  &lt;table cellspacing="0" cellpadding="2" width="422" border="0"&gt;&lt;tbody&gt;     &lt;tr&gt;       &lt;td valign="top" width="189"&gt;&lt;strong&gt;Benchmark&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="89"&gt;&lt;strong&gt;2 channels&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="95"&gt;&lt;strong&gt;4 channels&lt;/strong&gt;&lt;/td&gt;        &lt;td valign="top" width="47"&gt;&lt;strong&gt;Speedup&lt;/strong&gt;&lt;/td&gt;     &lt;/tr&gt;      &lt;tr&gt;       &lt;td valign="top" width="189"&gt;Memory Bandwidth (MB/s)&lt;/td&gt;        &lt;td valign="top" width="89"&gt;3,360&lt;/td&gt;        &lt;td valign="top" width="95"&gt;4,134&lt;/td&gt;        &lt;td valign="top" width="47"&gt;1.2&lt;/td&gt;     &lt;/tr&gt;   &lt;/tbody&gt;&lt;/table&gt;  &lt;p&gt;&lt;b&gt;Conclusions&lt;/b&gt;&lt;/p&gt;  &lt;p&gt;It's always possible to run more experiments to further isolate and explain 
benchmark results, but after a while you reach a point of diminishing returns. With the results we have so far, we can already draw some useful conclusions. &lt;/p&gt;  &lt;p&gt;The first conclusion is that our naive explanations greatly underestimated just how much better the newer processors are at executing real benchmarks, despite their slower clock speeds. The results from WinSAT and SPEC clearly show this, with core-to-core performance that is 1.3-1.9x faster on the new machines, depending on the benchmark. &lt;/p&gt;  &lt;p&gt;This is perhaps the most important lesson for developers to learn: clock speeds are no longer a good indicator of true performance. Although clock speeds have plateaued, processor designers continue to find ways to make each new generation significantly faster than the last. In our case, the old machines have Pentium D processors (“Smithfield”), while the new machines have Xeon 5-series processors (“Clovertown”).&amp;#160; And while the newer processors have slightly slower clock speeds, their micro-architecture executes more instructions per clock cycle. &lt;/p&gt;  &lt;p&gt;The second conclusion is that it's very hard to perform fair comparisons. The two machines have several configuration differences, including clock frequency, number of cores, core micro-architecture, cache sizes, bus speed, memory size and speed, and so on. We showed an example of isolating the effect of just one of these differences, the number of cores, using the BCDEdit tool. Isolating the effect of every single difference would require much more effort.&lt;/p&gt;  &lt;p&gt;Indeed, some of these differences are interrelated, and it is hard to change one without affecting another. For example, CPU architects make their micro-architecture design decisions based on cache sizes. Now imagine a hypothetical experiment that tried to isolate the effect of L2 cache size by giving each core just 1 MB of cache. 
This would be especially hard on the newer processors, which have been designed on the assumption that they have 2 MB of L2 cache per core&lt;a href="#_ftn1_6097" name="_ftnref1_6097"&gt;[1]&lt;/a&gt;. In trying to perform a fairer comparison, we would have actually handicapped one system!&lt;/p&gt;  &lt;p&gt;Our final conclusion is that it truly pays to benchmark and compare systems. In our case, the simplest possible benchmark (WinSAT) showed an unexpected memory bandwidth loss, which we then traced back to a machine mis-configuration. So that was the final pleasant surprise: if we hadn't gotten curious about why the new machines were so much faster, we would never have found that they could be faster still!&lt;/p&gt;  &lt;p&gt;David Berg    &lt;br /&gt;Sunny Egbo     &lt;br /&gt;Jonathan Hardwick     &lt;br /&gt;Peter Okonski&lt;/p&gt;  &lt;hr align="left" width="33%" size="1" /&gt;  &lt;p&gt;&lt;a href="#_ftnref1_6097" name="_ftn1_6097"&gt;[1]&lt;/a&gt; Because two cores share a single 4 MB L2 cache on the Clovertown processors, the exact size of the cache that is used by each core is not fixed at 2 MB per core; the use will vary during program execution. Cache hungry threads might get more of the cache, while less cache hungry threads get less. 
Even when two cache hungry threads run on the two cores, their memory hotspots are asynchronous; thus, the net effect is that each thread gets more of the cache when they need it and less when they don’t need it.&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8618468" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ddperf/archive/tags/Performance/default.aspx">Performance</category><category domain="http://blogs.msdn.com/ddperf/archive/tags/Visual+Studio/default.aspx">Visual Studio</category><category domain="http://blogs.msdn.com/ddperf/archive/tags/Performance+testing/default.aspx">Performance testing</category></item><item><title>Visual Studio Performance Testing -- Noise is Enemy #1</title><link>http://blogs.msdn.com/ddperf/archive/2008/05/20/visual-studio-performance-testing-noise-is-enemy-1.aspx</link><pubDate>Tue, 20 May 2008 03:25:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8520260</guid><dc:creator>MarkBFriedman</dc:creator><slash:comments>3</slash:comments><comments>http://blogs.msdn.com/ddperf/comments/8520260.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ddperf/commentrss.aspx?PostID=8520260</wfw:commentRss><description>&lt;P class=MsoNormal&gt;Performance testing is essential to our quest to make Visual Studio provide a highly responsive user experience.&lt;/P&gt;
&lt;P class=MsoNormal mce_keep="true"&gt;We do performance testing early and often. Before a new feature is checked into the main branch, a test build is created, and 100 to 200 tests are run to assess performance. These tests include scenarios for start-up, debugging, project build, editor interactions, and much more.&lt;/P&gt;
&lt;P class=MsoNormal mce_keep="true"&gt;This sounds like a lot of work. Well, it is. But there’s even more work that takes place before a single bit is tested. &lt;/P&gt;
&lt;P class=MsoNormal mce_keep="true"&gt;Here’s a brief summary of what we do. First, we work with Visual Studio product units (e.g., C#, debugger, XML editor, and about 30 others) to identify performance-sensitive user scenarios and to set response time goals for these scenarios. The product teams develop test cases for their scenarios. Our team, Developer Division Performance Engineering, reviews these tests to see if they meet certain acceptance criteria. Once a test has been accepted, it is incorporated into the division-wide suite that is used to assess if performance has degraded on a check-in. We call this a performance regression.&lt;/P&gt;
&lt;P class=MsoNormal mce_keep="true"&gt;Of course, we need a test infrastructure as well. We have a hundred machines dedicated to performance testing. We also have automation software that runs tests, analyzes the results, and notifies product teams of performance regressions. Initially, our perspective was that automating performance testing has unique requirements that demand a unique infrastructure. Now, we recognize that performance testing has much in common with functional testing. Further, we want to use features such as load testing and test case management that are part of Visual Studio Team Test, a direction that is being pursued by Developer Division as a whole (in part to “dog food” Visual Studio features).&lt;/P&gt;
&lt;P class=MsoNormal mce_keep="true"&gt;During a performance test, we collect a large number of PerfMon counters, CPU profiles, ETW traces, and more. However, the decision whether there is a regression (and hence whether the changes can be checked in) currently depends on just two metrics – CPU Time and Elapsed Time. If either of these is “significantly” higher than it is supposed to be, the check-in is rejected. (We deal with memory and I/O in other ways.)&lt;/P&gt;
&lt;P class=MsoNormal mce_keep="true"&gt;But what constitutes significantly higher CPU Time and Elapsed Time? This question has been a source of animated discussions with developers eager to check in their changes.&lt;/P&gt;
&lt;P class=MsoNormal mce_keep="true"&gt;In the past, we used an adaptive but somewhat opaque approach to establish regression thresholds. The adaptive part allowed us to automatically reset performance baselines for regression thresholds, but it was done in a complex way. As a result, developers often contested a reported performance regression because the performance baseline was unclear. One of the lessons we learned is that effective performance testing requires a transparent methodology for setting regression thresholds.&lt;/P&gt;
&lt;P class=MsoNormal mce_keep="true"&gt;Our latest approach is to periodically set regression thresholds based on the results of multiple runs of the same baseline build. This provides a clear basis for comparing performance results on check-in.&lt;/P&gt;
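&lt;P class=MsoNormal&gt;As a rough illustration of a check-in gate built on baseline runs, here is a minimal Python sketch. This post does not spell out how the thresholds are computed, so the rule used here (mean of the baseline runs plus a 10% margin), the function names, the metric names, and the sample numbers are all assumptions made for illustration only.&lt;/P&gt;

```python
from statistics import mean


def regression_threshold(baseline_runs, margin=0.10):
    # Hypothetical rule: the mean of the baseline runs plus a fixed margin.
    return mean(baseline_runs) * (1 + margin)


def is_regression(baseline, candidate, margin=0.10):
    """Return True if any metric of the test build exceeds its threshold.

    `baseline` maps a metric name to a list of baseline-build measurements;
    `candidate` maps the same metric names to one test-build measurement.
    """
    return any(
        candidate[metric] > regression_threshold(runs, margin)
        for metric, runs in baseline.items()
    )


# Invented numbers, in milliseconds, for the two metrics named in the post.
baseline = {
    "cpu_time_ms": [500, 510, 490, 505, 495],
    "elapsed_time_ms": [1200, 1180, 1220, 1210, 1190],
}
ok_build = {"cpu_time_ms": 520, "elapsed_time_ms": 1250}
slow_build = {"cpu_time_ms": 515, "elapsed_time_ms": 1400}

print(is_regression(baseline, ok_build))    # both metrics under threshold
print(is_regression(baseline, slow_build))  # elapsed time over threshold
```

&lt;P class=MsoNormal&gt;Because the threshold is derived from a fixed, visible set of baseline runs, anyone can recompute it by hand, which is the kind of transparency described above.&lt;/P&gt;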
&lt;P class=MsoNormal mce_keep="true"&gt;This approach has another advantage as well – it allows us to eliminate certain sources of noise. For example, variations in the chip sets used on our test machines caused some machines to have very long I/O times (in excess of 20% longer). As a result, we developed a machine acceptance procedure whereby we eliminate these anomalous machines. There are concerns about tests as well. For example, some tests use very little CPU and so a seemingly significant increase in CPU Time can be due to very small changes in the way context switching takes place during the run. Indeed, even for CPU bound work, we have noticed CPU Time can vary by 5% to 10% due to changes in the way context switching occurs.&lt;/P&gt;
&lt;P class=MsoNormal mce_keep="true"&gt;These are all examples of test noise. Properly addressing test noise is critical to the effectiveness of the performance engineering process. Nothing is more frustrating to a developer than to spend several days pursuing a performance bug that turns out to be noise.&lt;/P&gt;
&lt;P class=MsoNormal mce_keep="true"&gt;Test noise is any characteristic of a test that causes CPU Time and/or Elapsed Time to vary so much that we can’t reliably determine whether performance has improved or degraded. To assess whether a test is too noisy, we run it multiple times on the same machine with a baseline build and check whether the results are similar. By similar, we mean that no result varies by more than 10% from the mean of the run.&lt;/P&gt;
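&lt;P class=MsoNormal&gt;The 10% criterion can be written down in a few lines. The following Python sketch is only an illustration: the function name and the sample CPU Time values are invented, and the automation software used in the lab is not shown in this post.&lt;/P&gt;

```python
from statistics import mean


def is_too_noisy(samples, tolerance=0.10):
    """Return True if any sample differs from the mean by more than `tolerance`."""
    center = mean(samples)
    allowed = tolerance * center
    return any(abs(value - center) > allowed for value in samples)


# Invented CPU Time measurements (milliseconds) from repeated iterations.
quiet_run = [1000, 1020, 985, 1010, 995]
noisy_run = [1000, 1020, 985, 1400, 995]

print(is_too_noisy(quiet_run))  # every value is within 10% of the mean
print(is_too_noisy(noisy_run))  # the 1400 ms iteration falls outside the band
```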
&lt;P class=MsoNormal mce_keep="true"&gt;Below are results from a test with little noise for CPU Time. The horizontal axis shows the run, and the vertical axis is CPU Time in milliseconds. The asterisks are test executions or iterations that are done repeatedly within a run. In most cases, we do not include the first iteration because it often differs a great deal from the others due to initialization effects. (Of course, for start-up scenarios, we only use the first iteration!)&lt;/P&gt;
&lt;P class=MsoNormal mce_keep="true"&gt;&lt;IMG style="WIDTH: 560px; HEIGHT: 420px" height=420 src="http://5l3vgw.bay.livefilestore.com/y1pzbrt-s_HGnnVxDeLI_bb0hHTHsWSQwj4-NLU5DhVrHxp2lisCpXA7_SlBxUDKNasPS2bZdp5Z__lOEJfbWlL6A/noisytest-image001.gif" width=560 align=baseline vspace=3 mce_src="http://5l3vgw.bay.livefilestore.com/y1pzbrt-s_HGnnVxDeLI_bb0hHTHsWSQwj4-NLU5DhVrHxp2lisCpXA7_SlBxUDKNasPS2bZdp5Z__lOEJfbWlL6A/noisytest-image001.gif"&gt;&lt;/P&gt;
&lt;P class=MsoNormal&gt;We look for several things to see if a test is too noisy. As in the figure above, we want all of the performance counters to lie within the range of plus or minus 10% of the mean. Also, we look for a fairly even scatter of performance counters below and above the mean, but with no discernable pattern. If there is a pattern, it suggests that something is systematically biasing the test and needs to be fixed.&lt;/P&gt;
&lt;P class=MsoNormal mce_keep="true"&gt;Now, let’s look at some noisy tests. One case is that the test may actually be addressing more than one scenario with very different performance characteristics. Here’s an example of a test that appears to have this multiple scenario character:&lt;/P&gt;
&lt;P class=MsoNormal mce_keep="true"&gt;&lt;IMG style="WIDTH: 560px; HEIGHT: 420px" height=420 src="http://5l3vgw.bay.livefilestore.com/y1pzbrt-s_HGnnEmoOhaPSlOjwjp2ZqcqX-lFuuzFzYXHKkrvTaOHrE7Ht60uuD0nmiDGVDWMG_J9CCJkRDLhfN-w/noisytest-image002.gif" width=560 align=baseline vspace=3 mce_src="http://5l3vgw.bay.livefilestore.com/y1pzbrt-s_HGnnEmoOhaPSlOjwjp2ZqcqX-lFuuzFzYXHKkrvTaOHrE7Ht60uuD0nmiDGVDWMG_J9CCJkRDLhfN-w/noisytest-image002.gif"&gt;&lt;/P&gt;
&lt;P class=MsoNormal&gt;Another possibility is that the test may occasionally do something that takes a lot longer, or the test may be so short that it is subject to intermittent activities occurring in the OS (e.g., lazy writes) or the CLR (e.g., garbage collection). Here’s an example of a test result that appears to have this kind of problem:&lt;/P&gt;
&lt;P class=MsoNormal mce_keep="true"&gt;&lt;IMG style="WIDTH: 560px; HEIGHT: 420px" height=420 src="http://5l3vgw.bay.livefilestore.com/y1pzbrt-s_HGnkae2XNCAk38p-4RsGAHecjTZHm5JVeL5Jyl_Lz0DEjpCJqQt69Qwwg9yUacbI_tGte61y8ghZzMg/noisytest-image003.gif" width=560 align=baseline vspace=3 mce_src="http://5l3vgw.bay.livefilestore.com/y1pzbrt-s_HGnkae2XNCAk38p-4RsGAHecjTZHm5JVeL5Jyl_Lz0DEjpCJqQt69Qwwg9yUacbI_tGte61y8ghZzMg/noisytest-image003.gif"&gt;&lt;/P&gt;
&lt;P class=MsoNormal&gt;In this case, there are two extreme values. At first glance, it may seem that there is another problem as well since several of the iterations have values below the “mean-10%” line. However, this is an artifact of having two large values that make the mean larger.&lt;/P&gt;
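&lt;P class=MsoNormal&gt;A small numeric example makes this artifact easier to see. In the Python sketch below, the sample values are invented and the helper name is hypothetical. The comparison against the median is added purely for contrast; it is not described in this post as part of the test methodology.&lt;/P&gt;

```python
from statistics import mean, median


def outside_band(samples, center, tolerance=0.10):
    """Return the samples below and above center +/- tolerance."""
    low = center * (1 - tolerance)
    high = center * (1 + tolerance)
    below = [s for s in samples if low > s]
    above = [s for s in samples if s > high]
    return below, above


# Invented CPU Time values: eight ordinary iterations and two extreme ones.
samples = [100, 102, 98, 101, 99, 100, 180, 190, 97, 103]

# The two extreme values pull the mean up, so the ordinary iterations
# end up under the "mean-10%" line.
below_mean, above_mean = outside_band(samples, mean(samples))
print(mean(samples), len(below_mean), above_mean)

# Centering the band on the median flags only the two extreme values.
below_median, above_median = outside_band(samples, median(samples))
print(median(samples), len(below_median), above_median)
```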
&lt;P class=MsoNormal&gt;It’s hard work to do all of this. But performance testing was central to performance gains in Visual Studio 2008 compared with previous releases. And, with the help of changes in our handling of test noise, more performance improvements are on the way. &lt;/P&gt;
&lt;P class=MsoNormal&gt;We will share the best practices for performance testing that we develop for Visual Studio (and other Developer Division products) with our customers. Longer term, we hope to incorporate these best practices into Visual Studio tools so that you can be more effective with your performance testing.&lt;/P&gt;
&lt;P class=MsoNormal&gt;by Joe Hellerstein, Ph.D.&lt;/P&gt;
&lt;P class=MsoNormal&gt;Senior Architect, Developer Division Performance Engineering&lt;/P&gt;
&lt;P class=MsoNormal&gt;Joe Hellerstein is the author of &lt;EM&gt;Feedback Control of Computer Systems&lt;/EM&gt;, published by Wiley-IEEE Press in 2004. He had a long and distinguished career at IBM Research prior to joining Microsoft in 2006.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8520260" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ddperf/archive/tags/Performance+Engineering/default.aspx">Performance Engineering</category><category domain="http://blogs.msdn.com/ddperf/archive/tags/Visual+Studio/default.aspx">Visual Studio</category><category domain="http://blogs.msdn.com/ddperf/archive/tags/Performance+testing/default.aspx">Performance testing</category></item></channel></rss>