<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>David Kline : Testing</title><link>http://blogs.msdn.com/davidklinems/archive/tags/Testing/default.aspx</link><description>Tags: Testing</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Microsoft Tester Center open for business</title><link>http://blogs.msdn.com/davidklinems/archive/2007/10/22/microsoft-tester-center-open-for-business.aspx</link><pubDate>Mon, 22 Oct 2007 23:31:02 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:5608462</guid><dc:creator>DavidKlineMS</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/davidklinems/comments/5608462.aspx</comments><wfw:commentRss>http://blogs.msdn.com/davidklinems/commentrss.aspx?PostID=5608462</wfw:commentRss><description>&lt;p&gt;&lt;a href="http://blogs.msdn.com/alanpa/"&gt;Alan Page&lt;/a&gt; announced today that the new &lt;a href="http://msdn2.microsoft.com/en-us/testing/default.aspx"&gt;Microsoft Tester Center&lt;/a&gt; site was live.&amp;#xA0; This is very cool.&amp;#xA0; What I love about this site is summed up in three words... it's about &lt;a href="http://blogs.msdn.com/davidklinems/archive/tags/Testing/default.aspx"&gt;testing&lt;/a&gt;.    &lt;br /&gt;    &lt;br /&gt;I encourage anyone interested in learning more about testing and what testers do to check out the new &lt;a href="http://msdn2.microsoft.com/en-us/testing/default.aspx"&gt;site&lt;/a&gt; -- especially the whiteboard videos.    &lt;br /&gt;    &lt;br /&gt;Enjoy!    
&lt;br /&gt;-- DK    &lt;br /&gt;    &lt;br /&gt;&lt;font face="Arial" size="1"&gt;Disclaimer(s):     &lt;br /&gt;This posting is provided &amp;quot;AS IS&amp;quot; with no warranties, and confers no rights.&lt;/font&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=5608462" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/davidklinems/archive/tags/Testing/default.aspx">Testing</category></item><item><title>Knowing when to Ship: Part 2 - Interpreting Quality Metrics</title><link>http://blogs.msdn.com/davidklinems/archive/2007/08/27/knowing-when-to-ship-part-2-interpreting-quality-metrics.aspx</link><pubDate>Mon, 27 Aug 2007 23:39:40 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4597346</guid><dc:creator>DavidKlineMS</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/davidklinems/comments/4597346.aspx</comments><wfw:commentRss>http://blogs.msdn.com/davidklinems/commentrss.aspx?PostID=4597346</wfw:commentRss><description>&lt;p&gt;In the first installment of this series, I detailed a list of key &lt;a href="http://blogs.msdn.com/davidklinems/archive/2007/07/30/knowing-when-to-ship-part-1-quality-metrics.aspx"&gt;metrics&lt;/a&gt; used to determine when a product is ready to ship.&amp;nbsp; Today, I'm going to talk about how I interpret these metrics.&lt;br&gt;&lt;br&gt;&lt;strong&gt;Avoid out of context metrics&lt;/strong&gt;&lt;br&gt;The first thing to remember when looking at quality metrics is to look at the data in groups (at least pairs).&amp;nbsp; Individual metrics can be misleading if taken out of context.&amp;nbsp; Let's consider pass rate, for example.&amp;nbsp; Deciding to ship a product, based on pass rate only is a scary proposition.&amp;nbsp; Without understanding how much of the product was covered by the tests, it is easy to have a false sense of security about the product's quality and ship it too soon.&amp;nbsp; The same is true for code 
coverage.&amp;nbsp; Just because code was called does not mean it behaved correctly.&amp;nbsp; Always consider the pass rate as well.&lt;br&gt;&lt;br&gt;Warning sign(s):&lt;/p&gt; &lt;ul&gt; &lt;li&gt;High pass rate (~100%) and low code coverage (&amp;lt;50%)&lt;/li&gt; &lt;li&gt;High code coverage (ex: 90%) and low pass rate&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;&lt;strong&gt;Beware of large amounts of code volatility / churn&lt;/strong&gt;&lt;br&gt;To me, the exception that proves the rule of "avoid out of context metrics" is code volatility / churn.&amp;nbsp; Let me take that back...&amp;nbsp; there is still contextual data to consider when looking at code volatility: time.&amp;nbsp; When looking at code volatility, be mindful of where the product is in its schedule.&amp;nbsp; Is it in the initial development phase?&amp;nbsp;&amp;nbsp;In stabilization?&amp;nbsp; The middle of the final test pass?&amp;nbsp; Code changes.&amp;nbsp;&amp;nbsp;It's not a bad thing;&amp;nbsp;it's how bugs get fixed.&amp;nbsp; The important thing to remember is that the more change in the code, the more testing is required to ensure the product meets the quality requirements.&lt;br&gt;&lt;br&gt;Warning sign(s):&lt;/p&gt; &lt;ul&gt; &lt;li&gt;Large amounts of code change close to release (including betas)&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;&lt;strong&gt;Beware of low code coverage percentage&lt;br&gt;&lt;/strong&gt;Though I spoke briefly about code coverage in the "avoid out of context metrics" section, it deserves some additional (pardon the pun) coverage here.&amp;nbsp; In my &lt;a href="http://blogs.msdn.com/davidklinems/archive/2007/07/30/knowing-when-to-ship-part-1-quality-metrics.aspx"&gt;earlier post&lt;/a&gt;, I mentioned that I consider code coverage to be a measurement of risk.&amp;nbsp; It is for this reason that I advise careful tracking of the code coverage of your product.&lt;br&gt;&lt;br&gt;At MEDC, I said that code that has not been covered cannot have its quality 
quantified.&amp;nbsp; I personally consider uncovered code to likely be buggy.&amp;nbsp; Even if the code has been reviewed carefully by a group of very talented developers, there may be bugs hiding in the code.&amp;nbsp; Race conditions are a prime example of a bug that arises in code that "looks right", but does not actually behave correctly.&lt;br&gt;&lt;br&gt;Additionally, as &lt;a href="http://blogs.msdn.com/ryanms"&gt;Ryan Chapman&lt;/a&gt; has stated in his MEDC performance sessions, the larger a binary is, the longer it takes to load.&amp;nbsp; If functionality is not necessary for the application, as possibly evidenced by the code not being covered, it acts as "excess baggage" and can lead to longer load times.&lt;br&gt;&lt;br&gt;&lt;strong&gt;Performance data&lt;br&gt;&lt;/strong&gt;Interpreting performance data is, to me, an interesting topic.&amp;nbsp; When I approach performance, I have the following questions in mind:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;On what device was this data collected&lt;/li&gt; &lt;li&gt;How long did the test run / how many iterations of the scenario&lt;/li&gt; &lt;li&gt;What reporting technique was used (fastest or "gymnastics" style)&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;In my mind, the most important of these questions is the first one (what device).&amp;nbsp; When comparing performance data, it is very important to compare results on the &lt;strong&gt;&lt;em&gt;same&lt;/em&gt;&lt;/strong&gt; device.&amp;nbsp; For comparing performance, I define "device" to be the pairing of hardware plus operating system.&amp;nbsp; The same hardware running different versions of the operating system (ex: Windows Mobile 5 and Windows Mobile 6) can display noticeable performance differences.&lt;br&gt;&lt;br&gt;I talk in more detail about performance testing, including reporting techniques,&amp;nbsp;&lt;a href="http://blogs.msdn.com/davidklinems/archive/2007/06/08/testing-performance.aspx"&gt;here&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;It is also useful to couple 
code coverage data with performance results.&amp;nbsp; By running the tests a second time, under code coverage, it helps to verify that the intended scenario is actually being measured.&lt;br&gt;&lt;br&gt;Warning sign(s):&lt;/p&gt; &lt;ul&gt; &lt;li&gt;Code coverage data does not match intended scenario&lt;/li&gt; &lt;li&gt;Measuring micro-benchmarks using too few (&amp;lt; 10000) iterations or too short (&amp;lt; 1 sec) total run time&lt;/li&gt;&lt;/ul&gt; &lt;p&gt;Take care,&lt;br&gt;-- DK&lt;br&gt;&lt;br&gt;&lt;font face="Arial" size="1"&gt;Disclaimer(s):&lt;br&gt;This posting is provided "AS IS" with no warranties, and confers no rights.&lt;/font&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=4597346" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/davidklinems/archive/tags/Testing/default.aspx">Testing</category></item><item><title>Knowing when to Ship: Part 1 - Quality Metrics</title><link>http://blogs.msdn.com/davidklinems/archive/2007/07/30/knowing-when-to-ship-part-1-quality-metrics.aspx</link><pubDate>Mon, 30 Jul 2007 23:22:19 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4135676</guid><dc:creator>DavidKlineMS</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.msdn.com/davidklinems/comments/4135676.aspx</comments><wfw:commentRss>http://blogs.msdn.com/davidklinems/commentrss.aspx?PostID=4135676</wfw:commentRss><description>&lt;p&gt;One of the most common questions I hear towards the end of a product cycle is "are we ready to ship?".&amp;nbsp; The answer to the question depends on many factors, most importantly the status of the quality metrics.&lt;br&gt;&lt;br&gt;Today, I would like to spend some time talking about some of the key quality metrics.&amp;nbsp; In later parts of this series (I expect a couple more), I will go into some more detail including: interpretation of quality metrics and how to identify warning signs.&lt;br&gt;&lt;br&gt;&lt;strong&gt;Pass 
Rate&lt;br&gt;&lt;/strong&gt;Pass rate is one of the most straightforward metrics.&amp;nbsp; This metric measures the percentage of tests that passed during the test run.&lt;br&gt;&lt;br&gt;&lt;strong&gt;Bug / Defect Counts&lt;br&gt;&lt;/strong&gt;When looking at bug counts as a quality metric, it is important to include resolved as well as active defects.&amp;nbsp; Active bugs (the number of known defects in the product) are a well understood metric.&amp;nbsp; &lt;br&gt;&lt;br&gt;The more interesting number, for me, is the count of resolved (not yet closed) bugs -- especially those resolved as "fixed".&amp;nbsp; For each of the bugs that have been fixed, a change has been made to the product.&amp;nbsp; Every change introduces a risk of some part of the product being broken.&amp;nbsp; It is very important to keep a close eye on the resolved bugs and to make sure each one of them is re-tested to ensure that the bug was indeed fixed and that no other issues arose due to the change(s).&lt;br&gt;&lt;br&gt;One related metric, which I consider to be part of the Bug Counts, is the percentage of reactivated bugs.&amp;nbsp; These are the ones that, once resolved, were reactivated because the original issue was not completely fixed.&amp;nbsp; As a rule, I only reactivate a bug if the issue described within was not fixed.&amp;nbsp; Any additional issues related to the fix, such as a new bug being introduced, should be tracked in a separate entry in the bug database.&lt;br&gt;&lt;br&gt;&lt;strong&gt;Mean Time To Failure (MTTF) / Mean Time Between Failures (MTBF)&lt;br&gt;&lt;/strong&gt;One of the most important aspects of testing a product is placing it under stress.&amp;nbsp; A while back, I described my two classifications of &lt;a href="http://blogs.msdn.com/davidklinems/archive/2007/07/10/stress-testing-for-fun-and-profit.aspx"&gt;stress tests&lt;/a&gt; (long haul and short haul) and mentioned that the results of long haul stress testing tend toward the product's MTTF/MTBF 
metric.&amp;nbsp;&amp;nbsp; These results should be tracked between stress test passes (looking for changes that cause robustness to decrease) and measured against the product specification.&lt;br&gt;&lt;br&gt;&lt;strong&gt;Performance&lt;br&gt;&lt;/strong&gt;As with stress results, it is important to track performance data between test passes and measure against the product specification.&amp;nbsp; I discuss measuring performance &lt;a href="http://blogs.msdn.com/davidklinems/archive/2007/06/08/testing-performance.aspx"&gt;here&lt;/a&gt;.&lt;br&gt;&lt;br&gt;The above metrics directly measure the quality of a product: how well it works, how much work is left to do, how long it will run and how fast.&amp;nbsp; The next two I will talk about are often discussed in quality meetings and status reports.&amp;nbsp; While not direct measurements of quality, they are still very important as they are measurements of&amp;nbsp;risk.&lt;br&gt;&lt;br&gt;&lt;strong&gt;Code Coverage&lt;br&gt;&lt;/strong&gt;Code coverage is one of the best measurements of the testing being performed on a product.&amp;nbsp; If the code coverage data is very low (below 50%) the product is not&amp;nbsp;being adequately&amp;nbsp;tested.&amp;nbsp; Any portion of a product that is not being covered by testing is a risk.&amp;nbsp; Blocks of code that are not being tested cannot have their quality properly assessed.&amp;nbsp; If the code has not been exercised, there is no way to be sure of the quality.&amp;nbsp; Even when &lt;a href="http://blogs.msdn.com/davidklinems/archive/2006/01/24/517073.aspx"&gt;code is reviewed&lt;/a&gt; carefully (and appears correct), it does not always do what the developer intended.&lt;br&gt;&lt;br&gt;&lt;strong&gt;Code Volatility / Code Churn&lt;br&gt;&lt;/strong&gt;By measuring the amount of change in the product from one build to the next, the amount of testing required to validate quality can be assessed.&amp;nbsp; The larger the amount of change, the more testing is 
required to ensure the quality of the product.&amp;nbsp; As a product gets closer to the target ship date, the changes in the code should decrease, leading to a stable product.&amp;nbsp; I like to think of the weeks leading up to releasing a product as being like watching an airplane&amp;nbsp;land.&amp;nbsp; A smooth, gradual descent leads to a very comfortable landing.&amp;nbsp; A steep descent is significantly less comforting.&amp;nbsp; &lt;br&gt;&lt;br&gt;Take care!&lt;br&gt;-- DK&lt;br&gt;&lt;br&gt;&lt;font face="Arial" size="1"&gt;Disclaimer(s):&lt;br&gt;This posting is provided "AS IS" with no warranties, and confers no rights.&lt;/font&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=4135676" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/davidklinems/archive/tags/Testing/default.aspx">Testing</category></item><item><title>Stress Testing for Fun and Profit</title><link>http://blogs.msdn.com/davidklinems/archive/2007/07/10/stress-testing-for-fun-and-profit.aspx</link><pubDate>Tue, 10 Jul 2007 21:20:46 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3801622</guid><dc:creator>DavidKlineMS</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.msdn.com/davidklinems/comments/3801622.aspx</comments><wfw:commentRss>http://blogs.msdn.com/davidklinems/commentrss.aspx?PostID=3801622</wfw:commentRss><description>&lt;p&gt;For me, one of the most enjoyable parts of being a test developer is writing stress tests.&amp;nbsp; Stress testing can be a lot of fun and, contrary to its name, a good stress reliever (for the test developer).&amp;nbsp; During the &lt;a href="http://blogs.msdn.com/davidklinems/archive/2007/01/26/medc-2007-real-world-testing-of-managed-smart-device-applications.aspx"&gt;most recent MEDC&lt;/a&gt;, I spoke briefly about stress testing and defined it as coming in two forms: "Long Haul" and "Short Haul".&lt;br&gt;&lt;br&gt;&lt;strong&gt;Long Haul 
Stress&lt;/strong&gt;&lt;br&gt;In long haul stress, the test is designed to emulate customer scenarios.&amp;nbsp; If a scenario is expected to be executed three times per second under normal operation, a long haul stress test will perform that scenario on multiple threads for a long time (days/weeks).&amp;nbsp; By using multiple threads and long run times, the Mean-Time-Between-Failures (MTBF) / Mean-Time-To-Failure (MTTF) for the component can be approximated.&lt;br&gt;&lt;br&gt;In addition to simultaneous threads repeating the target scenario, long haul stress tests often up the ante by manipulating other aspects of the system.&amp;nbsp; Available storage (hard disk / flash) and RAM are common targets for increasing the stress level of a long haul test.&lt;br&gt;&lt;br&gt;&lt;strong&gt;Short Haul Stress&lt;/strong&gt;&lt;br&gt;Short haul stress, or "load testing", is my personal favorite type of stress testing.&amp;nbsp; Short haul tests are designed to be extremely stressful and to run for a relatively brief amount of time (24 hours).&amp;nbsp; If a long haul test exercises a scenario three times per second on, say, 10 threads, a short haul version of that same test may perform the scenario 30000 times per second on more than 1000 threads.&amp;nbsp;&amp;nbsp; The same system resource tweaking options used in long haul testing are often used in short haul testing as well.&lt;br&gt;&lt;br&gt;To me, short haul stress is not really scenario based, however.&amp;nbsp; I tend to treat short haul testing as an opportunity to be as hard on the component I am testing as humanly possible.&amp;nbsp; I have been known to throw "everything but the kitchen sink" into my stress tests: every functional area of the component, multiple copies of data files, dozens of threads performing similar tasks, forced errors, etc.&lt;br&gt;&lt;br&gt;At the start of this post, I asserted that stress testing can be fun.&amp;nbsp; While writing short haul stress tests, I have occasionally noticed 
some odd glances into my office (maybe I should close the door more often) when I watch the app run and let that maniacal laugh creep out.&amp;nbsp; Of course, the best reward of stress testing is to bring a component's developer into the office to show off the latest bug and hear a sound rivaling the most beautiful music... "Your app does &lt;em&gt;&lt;strong&gt;WHAT&lt;/strong&gt;&lt;/em&gt;?!?!".&amp;nbsp; YES!&amp;nbsp; I got one again. :)&lt;br&gt;&lt;br&gt;Take care!&lt;br&gt;-- DK&lt;br&gt;&lt;br&gt;&lt;font face="Arial" size="1"&gt;Disclaimers:&lt;br&gt;This posting is provided "AS IS" with no warranties, and confers no rights.&lt;/font&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=3801622" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/davidklinems/archive/tags/Testing/default.aspx">Testing</category></item><item><title>Testing performance</title><link>http://blogs.msdn.com/davidklinems/archive/2007/06/08/testing-performance.aspx</link><pubDate>Fri, 08 Jun 2007 22:41:43 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3167951</guid><dc:creator>DavidKlineMS</dc:creator><slash:comments>4</slash:comments><comments>http://blogs.msdn.com/davidklinems/comments/3167951.aspx</comments><wfw:commentRss>http://blogs.msdn.com/davidklinems/commentrss.aspx?PostID=3167951</wfw:commentRss><description>&lt;p&gt;A couple of weeks ago, I described the &lt;a href="http://blogs.msdn.com/davidklinems/archive/2007/05/24/testing-priorities.aspx"&gt;test development priorities&lt;/a&gt; for the .NET Compact Framework team.&amp;nbsp; As part of that discussion, I stated that performance should be tested in parallel with the other forms of testing (unit, customer scenarios, etc).&amp;nbsp; Today, I would like to spend some time talking about performance testing.&lt;br&gt;&lt;br&gt;&lt;strong&gt;General considerations&lt;br&gt;&lt;/strong&gt;When testing performance, there are a few key things to keep in 
mind.&amp;nbsp; &lt;br&gt;&lt;br&gt;First, it is important to measure the intended scenario.&amp;nbsp; In the past, I have seen a number of tests that were intended to time a data processing operation&amp;nbsp;(ex: sort algorithm) that&amp;nbsp;inadvertently&amp;nbsp;included the time to load the data from the file system.&amp;nbsp;&amp;nbsp;As a result, the&amp;nbsp;performance data being reported was quite inaccurate.&lt;br&gt;&lt;br&gt;Second, since today's operating systems are multi-tasking, it is recommended to test performance on as clean a system (fewest running processes) as possible and to run the performance test multiple times to ensure accurate data.&amp;nbsp; Depending on the scenario being timed, the jitter (variance in results) can be significant and important to take into account when reporting results.&amp;nbsp; I will talk a bit more about performance reporting shortly.&lt;br&gt;&lt;br&gt;Lastly, I have found that the most meaningful performance measurements for applications I have written come when the application is in sustained operation.&amp;nbsp; What I mean by this is that any Just-In-Time compilation has already occurred and I am measuring the typical run time performance of my scenario.&amp;nbsp; Most often, code is Just-In-Time compiled once per instance of the application, so the performance impact is felt only the first time the method is called.&amp;nbsp; There are some exceptions to this rule of thumb, and I have found that in those cases, even when pre-JIT compiling&amp;nbsp;the scenario, the results are reasonably accurate.&amp;nbsp; What are these exceptions?&amp;nbsp; Memory pressure and moving the application to the background.&amp;nbsp; In each of these cases, the .NET Compact Framework's Garbage Collector will run and may discard JIT compiled code, forcing a recompile when the situation is resolved (memory has been freed, the application is brought to the foreground).&amp;nbsp; In both of these cases, the timing loop will 
reflect the time spent freeing memory and re-JIT compiling.&lt;br&gt;&lt;br&gt;&lt;strong&gt;Macro-benchmarks&lt;br&gt;&lt;/strong&gt;Macro-benchmarks are performance tests for long duration scenarios.&amp;nbsp; By "long", I mean that the scenario takes a second or more to complete.&amp;nbsp; When testing macro-benchmarks, I typically run the scenario a handful (10-50) times in a loop.&amp;nbsp; Because the scenario takes a significant amount of time to run, the time spent looping does not significantly impact the measurements.&amp;nbsp; When the timing loop is complete, dividing the time spent (number of elapsed ticks) by the number of loop iterations results in the scenario's performance measurement, as shown in the following example.&lt;br&gt;&lt;code&gt;&lt;br&gt;// run the scenario once (to avoid timing the initial Just-In-Time compile)&lt;br&gt;Scenario(data);&lt;br&gt;&lt;br&gt;// timing loop&lt;br&gt;Int32 iterations = 10;&lt;br&gt;Int32 start_ms = Environment.TickCount;&lt;br&gt;for(Int32 i = 0; i &amp;lt; iterations; i++)&lt;br&gt;{&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Scenario(data);&lt;br&gt;}&lt;br&gt;Int32 finish_ms = Environment.TickCount;&lt;br&gt;&lt;br&gt;// calculate the performance&lt;br&gt;Double average_ms= (Double)(finish_ms - start_ms) / (Double)(iterations);&lt;br&gt;&lt;/code&gt;&lt;br&gt;Macro-benchmarks are often based on customer scenarios and improving their performance can often lead to significant gains in customer satisfaction.&amp;nbsp; At times, making improvements to macro-benchmarks requires changes to the scenario implementation.&amp;nbsp; To address this, I highly recommend measuring your scenarios as early as possible in the development of the product.&amp;nbsp; That way, if the performance does not meet the requirements, there is time to write and test the new implementation.&lt;br&gt;&lt;br&gt;&lt;strong&gt;Micro-benchmarks&lt;br&gt;&lt;/strong&gt;Micro-benchmarks are performance tests on the small scale, measuring very 
short duration operations.&amp;nbsp; To get an accurate measurement, performance tests need to run the scenario a large number of times (I typically run my micro-benchmark scenarios 10,000 or more times).&amp;nbsp; Also, it is important to keep in mind that, unlike macro-benchmarks, the time spent looping can significantly impact the performance measurements.&amp;nbsp; To minimize these impacts, I recommend what I call "playing optimizing compiler" and partially unrolling your timing loop.&amp;nbsp; The example below updates the macro-benchmark example for micro-benchmark testing.&lt;br&gt;&lt;code&gt;&lt;br&gt;// run the scenario once (to avoid timing the initial Just-In-Time compile)&lt;br&gt;Scenario(data);&lt;br&gt;&lt;br&gt;Int32 iterations = 1000;&lt;br&gt;Int32 callsInLoop = 10;&lt;br&gt;&lt;br&gt;// timing loop&lt;br&gt;Int32 start_ms = Environment.TickCount;&lt;br&gt;for(Int32 i = 0; i &amp;lt; iterations; i++)&lt;br&gt;{&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Scenario(data);&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Scenario(data);&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Scenario(data);&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Scenario(data);&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Scenario(data);&lt;br&gt;&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Scenario(data);&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Scenario(data);&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Scenario(data);&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Scenario(data);&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Scenario(data);&lt;br&gt;}&lt;br&gt;Int32 finish_ms = Environment.TickCount;&lt;br&gt;&lt;br&gt;// calculate the performance&lt;br&gt;Double average_ms= (Double)(finish_ms - start_ms) / (Double)(iterations * callsInLoop);&lt;br&gt;&lt;/code&gt;&lt;br&gt;Since the scenario is very short, this test runs the code 10,000 times and partially unrolls the loop (1,000 iterations of 10 calls).&amp;nbsp; This significantly minimizes the impact of the time spent looping.&amp;nbsp; In one example scenario I 
tested, the time measured&amp;nbsp;in a&amp;nbsp;tight loop (10,000&amp;nbsp;iterations of 1 call)&amp;nbsp;was reported as 1 millisecond.&amp;nbsp; When&amp;nbsp;using the timing loop shown above,&amp;nbsp;that time was reduced to 0.95 milliseconds - 5% faster than previously reported.&amp;nbsp; With additional loop unrolling&amp;nbsp;(ex: 500 iterations of 20 calls), we can further improve the accuracy of the measurement.&amp;nbsp; Of course, there is a point of diminishing returns when continued unrolling becomes unrealistic and the improved measurement accuracy is no longer significant.&lt;br&gt;&lt;br&gt;&lt;strong&gt;Reporting performance results&lt;br&gt;&lt;/strong&gt;I mentioned results reporting earlier.&amp;nbsp; When I report performance results, I use one of two methods:&amp;nbsp;raw speed reporting and what&amp;nbsp;I call "gymnastics" reporting.&lt;br&gt;&lt;br&gt;In raw speed reporting, I run my performance test multiple times (the exact number depending on whether I am timing a micro- or macro-benchmark) and keep only the fastest result.&amp;nbsp; This approach helps to factor out the sometimes subtle differences in results when running on multi-tasking operating systems (ex: the scheduler runs a background task) and is closer to the maximum throughput of the code.&lt;br&gt;&lt;br&gt;In "gymnastics" reporting, I again run my performance test multiple times, but this time, I discard the extreme results (fastest and slowest) and average the remaining data.&amp;nbsp; The resulting data is closer to the typical performance that the customer will see during everyday use of the product.&lt;br&gt;&lt;br&gt;Take care!&lt;br&gt;-- DK&lt;br&gt;&lt;br&gt;&lt;font face="Arial" size="1"&gt;Disclaimers:&lt;br&gt;This posting is provided "AS IS" with no warranties, and confers no rights.&lt;/font&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=3167951" width="1" height="1"&gt;</description><category 
domain="http://blogs.msdn.com/davidklinems/archive/tags/Performance/default.aspx">Performance</category><category domain="http://blogs.msdn.com/davidklinems/archive/tags/Testing/default.aspx">Testing</category></item><item><title>Alan Page on Root Cause Analysis</title><link>http://blogs.msdn.com/davidklinems/archive/2007/06/04/alan-page-on-root-cause-analysis.aspx</link><pubDate>Mon, 04 Jun 2007 20:44:26 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3083989</guid><dc:creator>DavidKlineMS</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/davidklinems/comments/3083989.aspx</comments><wfw:commentRss>http://blogs.msdn.com/davidklinems/commentrss.aspx?PostID=3083989</wfw:commentRss><description>&lt;p&gt;I have worked with &lt;a href="http://blogs.msdn.com/alanpa/"&gt;Alan&lt;/a&gt; off and on for a very long time now.&amp;nbsp; When I heard that he gave a Lightning Talk at &lt;a href="http://www.sqe.com/stareast"&gt;STAR East&lt;/a&gt;, the first thing that came to mind was "I wish I could have seen that".&amp;nbsp; Last week, he posted what he talked about (and more?).&amp;nbsp; I very much recommend giving his "&lt;a href="http://blogs.msdn.com/alanpa/archive/2007/05/29/bug-prevention-in-five-minutes.aspx"&gt;Bug prevention in five minutes&lt;/a&gt;" post a read.&amp;nbsp; He presents a great technique for root cause analysis (RCA) -- including one of my favorite analyses (I don't want to be a spoiler here, but it involves lack of sleep).&lt;br&gt;&lt;br&gt;Take care,&lt;br&gt;-- DK&lt;br&gt;&lt;br&gt;&lt;font face="Arial" size="1"&gt;Disclaimers:&lt;br&gt;This posting is provided "AS IS" with no warranties, and confers no rights.&lt;/font&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=3083989" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/davidklinems/archive/tags/Testing/default.aspx">Testing</category></item><item><title>Testing 
Priorities</title><link>http://blogs.msdn.com/davidklinems/archive/2007/05/24/testing-priorities.aspx</link><pubDate>Fri, 25 May 2007 01:47:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:2852810</guid><dc:creator>DavidKlineMS</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/davidklinems/comments/2852810.aspx</comments><wfw:commentRss>http://blogs.msdn.com/davidklinems/commentrss.aspx?PostID=2852810</wfw:commentRss><description>&lt;P&gt;&lt;A href="http://blogs.msdn.com/davidklinems/archive/2007/05/14/what-is-testing.aspx"&gt;Last time&lt;/A&gt;, I defined testing as the 'art of mitigating pain'.&amp;nbsp; What I did not talk about is how to prioritize your testing (pain mitigation) efforts.&lt;BR&gt;&lt;BR&gt;On the .NET Compact Framework team, we use the following set of test development priorities.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Unit Tests&lt;/LI&gt;
&lt;LI&gt;Build Verification Tests (BVTs)&lt;/LI&gt;
&lt;LI&gt;Customer Scenarios / Integration Tests&lt;/LI&gt;
&lt;LI&gt;Conformance Tests&lt;/LI&gt;&lt;/OL&gt;
&lt;P&gt;Performance, stress and security testing should be performed in parallel with the testing listed above.&amp;nbsp; My basic rules are to start performance, stress and security testing as soon as there is something to test and continue testing throughout the product cycle.&amp;nbsp; The phrase I keep in mind is "measure early, measure often".&amp;nbsp; I will discuss performance and stress testing in the near future.&amp;nbsp; Security testing I will leave to the experts and would like to reiterate my &lt;A href="http://blogs.msdn.com/davidklinems/archive/2006/12/07/recommended-reading-iv.aspx"&gt;recommendation to read &lt;EM&gt;Hunting Security Bugs&lt;/EM&gt;&lt;/A&gt;, by Tom Gallagher, Bryan Jeffries and Lawrence Landauer (Microsoft Press, ISBN: 0-7356-2187).&lt;BR&gt;&lt;BR&gt;&lt;STRONG&gt;Unit Tests&lt;BR&gt;&lt;/STRONG&gt;What are unit tests?&amp;nbsp; Unit tests are the simple tests that developers perform to ensure, at the most basic level, that their code behaves correctly.&amp;nbsp; These tests typically consist of executing some code once with valid data, once with invalid data and verifying the proper behavior.&amp;nbsp; A simple example is to pass 12 and 4 into a function that divides the first value by the second and verify that the result is 3, then call the same function with 7 and 0 to verify that the proper exception (DivideByZero) is thrown.&amp;nbsp; &lt;BR&gt;&lt;BR&gt;Some developers perform unit testing using the debugger to step through their code and monitor the state of the system.&amp;nbsp; While this is a great approach while developing, I much prefer formalizing the unit tests into scripts or executables that can be automated and run as part of the daily acceptance testing.&lt;BR&gt;&lt;BR&gt;To help make this easier, Visual Studio&amp;nbsp;2005 (Visual Studio 'Orcas', for Smart Device projects)&amp;nbsp;provides a very 
cool &lt;A href="http://msdn2.microsoft.com/en-us/library/ms182515(vs.80).aspx" mce_href="http://msdn2.microsoft.com/en-us/library/ms182515(vs.80).aspx"&gt;Unit Test development feature&lt;/A&gt; which examines your project and generates test methods to which developers add the appropriate test code.&lt;BR&gt;&lt;BR&gt;&lt;STRONG&gt;Build Verification Tests (BVTs)&lt;/STRONG&gt;&lt;BR&gt;Build Verification Tests are designed to validate that the basic functionality of a new product build works correctly.&amp;nbsp; BVTs are not exhaustive tests; they are the front-line tests that must pass before additional testing can be performed.&lt;BR&gt;&lt;BR&gt;Build Verification Tests are more complex than Unit Tests and cover more of the code.&amp;nbsp; Wherever possible (always?), BVTs should be automated so that they run consistently each and every time.&lt;BR&gt;&lt;BR&gt;&lt;STRONG&gt;Customer Scenarios and Integration Tests&lt;/STRONG&gt;&lt;BR&gt;If a product cannot be used by customers, the product will likely fail in the market.&amp;nbsp; With that in mind, I assert that Customer Scenarios are the most important tests to write.&amp;nbsp; Bugs found by customer scenario tests should be given very high priority.&lt;BR&gt;&lt;BR&gt;What are customer scenario tests?&amp;nbsp; These are the tests that are written to behave as much like the target customer as possible.&amp;nbsp; For example, if the product is a class library (like the .NET Compact Framework), customer scenario tests are applications that use the class library to solve a 'real world' problem.&amp;nbsp; Most often, these tests are also integration tests, bringing together two or more features (ex: web services and the file system) and verifying that they behave correctly when used together.&amp;nbsp; It is entirely possible for each feature's tests to pass when run in isolation and fail when other features are added to the test application due to resource contention, memory pressure, 
etc.&amp;nbsp;&lt;BR&gt;&lt;BR&gt;&lt;STRONG&gt;Conformance Tests&lt;/STRONG&gt;&lt;BR&gt;Conformance tests are feature-specific tests that expand upon the unit and build verification tests.&amp;nbsp; Conformance tests come in three flavors: positive, negative and boundary.&lt;BR&gt;&lt;BR&gt;Positive tests exercise the product using "acceptable" data.&amp;nbsp; The result of a positive test is that the product correctly performs the requested task and returns the proper response (ex: 4 + 12 = 16).&lt;BR&gt;&lt;BR&gt;Negative tests (also called error condition tests) exercise the product using "unacceptable" data.&amp;nbsp; These tests are designed to ensure that the product correctly handles erroneous situations (invalid data, server not responding, etc.).&amp;nbsp; To me, these are some of the most enjoyable tests to write.&lt;BR&gt;&lt;BR&gt;Boundary tests exercise the boundaries of a product.&amp;nbsp; If a product's specification states that values between 15 and 173 are allowed, boundary tests cover values such as 14, 15 and 16 to ensure that behavior with data around the lower boundary is correct, and 172, 173 and 174 to do the same for the upper boundary.&amp;nbsp; Boundary tests are a mixture of positive and negative tests.&lt;/P&gt;
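&lt;P&gt;To make the three flavors concrete, here is a minimal sketch in Python.&amp;nbsp; The function validate_setting and the helper is_rejected are hypothetical names invented for illustration (they do not come from any real product); the only detail taken from the discussion above is the allowed range of 15 through 173, which this sketch assumes to be inclusive and to apply to whole numbers.&lt;/P&gt;

```python
# Hypothetical example: validate_setting is not part of any real product.
# It stands in for a feature whose specification allows values 15 through 173.
LOWER_BOUND = 15
UPPER_BOUND = 173


def validate_setting(value):
    """Return the value if the specification allows it, otherwise raise ValueError."""
    if value not in range(LOWER_BOUND, UPPER_BOUND + 1):
        raise ValueError("value out of range: %d" % value)
    return value


def is_rejected(value):
    """Return True when validate_setting raises ValueError for the value."""
    try:
        validate_setting(value)
    except ValueError:
        return True
    return False


def run_conformance_tests():
    """Run positive, negative and boundary checks and return the results."""
    return {
        # Positive test: acceptable data is processed correctly.
        "positive": validate_setting(100) == 100,
        # Negative test: unacceptable data produces the expected error.
        "negative": is_rejected(1000),
        # Boundary tests: one value outside, one on, and one inside each edge.
        "lower_boundary": [is_rejected(14), is_rejected(15), is_rejected(16)],
        "upper_boundary": [is_rejected(172), is_rejected(173), is_rejected(174)],
    }


if __name__ == "__main__":
    for name, outcome in run_conformance_tests().items():
        print(name, outcome)
```

&lt;P&gt;Running the file prints one line per flavor.&amp;nbsp; Note that the boundary checks are simply positive and negative checks placed deliberately at the edges of the allowed range.&lt;/P&gt;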
&lt;P&gt;&lt;BR&gt;Take care,&lt;BR&gt;-- DK&lt;BR&gt;&lt;BR&gt;&lt;FONT face=Arial size=1&gt;Disclaimers:&lt;BR&gt;This posting is provided "AS IS" with no warranties, and confers no rights.&lt;BR&gt;The information contained within this post is in relation to beta software.&amp;nbsp; Any and all details are subject to change.&lt;/FONT&gt;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=2852810" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/davidklinems/archive/tags/Testing/default.aspx">Testing</category></item><item><title>What is testing?</title><link>http://blogs.msdn.com/davidklinems/archive/2007/05/14/what-is-testing.aspx</link><pubDate>Tue, 15 May 2007 02:02:58 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:2635392</guid><dc:creator>DavidKlineMS</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/davidklinems/comments/2635392.aspx</comments><wfw:commentRss>http://blogs.msdn.com/davidklinems/commentrss.aspx?PostID=2635392</wfw:commentRss><description>&lt;p&gt;At MEDC 2007, I had the pleasure of delivering a session titled 'Real World Testing of Managed Smart Device Applications'.&amp;nbsp; While developing the presentation, I spent some time thinking about a key theme of the talk -- what is testing?&lt;br&gt;&lt;br&gt;I define testing as the art of mitigating pain.&amp;nbsp; Whenever a bug is found after a product has shipped, it hurts.&amp;nbsp; It hurts the customer through lost productivity (while the bug is investigated / worked around) and it hurts the developer (the cost of investigating and fixing in the field delays work on new versions).&amp;nbsp; It costs much, much less to identify and fix bugs during development.&lt;br&gt;&lt;br&gt;When I think of testing as pain mitigation, I think of the auto industry and their crash tests.&amp;nbsp; Crash testing is designed to ensure that the least amount of harm comes to the car's passengers in the event of an 
accident.&amp;nbsp; When I keep this image in mind, it drives home (no pun intended) the need for quality tests and high levels of product coverage during testing.&lt;br&gt;&lt;br&gt;Take care,&lt;br&gt;-- DK&lt;br&gt;&lt;br&gt;&lt;font face="Arial" size="1"&gt;Disclaimers:&lt;br&gt;This posting is provided "AS IS" with no warranties, and confers no rights.&lt;/font&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=2635392" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/davidklinems/archive/tags/Testing/default.aspx">Testing</category></item><item><title>Recommended Reading IV</title><link>http://blogs.msdn.com/davidklinems/archive/2006/12/07/recommended-reading-iv.aspx</link><pubDate>Fri, 08 Dec 2006 09:28:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:1236926</guid><dc:creator>DavidKlineMS</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/davidklinems/comments/1236926.aspx</comments><wfw:commentRss>http://blogs.msdn.com/davidklinems/commentrss.aspx?PostID=1236926</wfw:commentRss><description>It's been a while since I &lt;A class="" href="http://blogs.msdn.com/davidklinems/archive/2005/09/26/474224.aspx" mce_href="http://blogs.msdn.com/davidklinems/archive/2005/09/26/474224.aspx"&gt;last recommended a book&lt;/A&gt;... 
in fact, it's been more than a year.&amp;nbsp; &lt;BR&gt;&lt;BR&gt;There are a number of good books on how to write secure code; now there's one on how to make sure that developers have written secure software: Hunting Security Bugs (Microsoft Press, ISBN: 0-7356-2187) by Tom Gallagher, Bryan Jeffries and Lawrence Landauer.&lt;BR&gt;&lt;BR&gt;If, like me, you test software for a living, this book is a must-read.&amp;nbsp; If you write production code, I recommend you also read this book (it'll help you prepare for what your test developers are going to do to your product).&lt;BR&gt;&lt;BR&gt;Time for me to get back to reading.&amp;nbsp; Enjoy!&lt;BR&gt;-- DK&lt;BR&gt;&lt;BR&gt;&lt;FONT size=1&gt;Disclaimer(s):&lt;BR&gt;This posting is provided "AS IS" with no warranties, and confers no rights.&lt;/FONT&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=1236926" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/davidklinems/archive/tags/Books/default.aspx">Books</category><category domain="http://blogs.msdn.com/davidklinems/archive/tags/Testing/default.aspx">Testing</category></item></channel></rss>