[This note was first published to a Microsoft internal blog in October 2008]


It is common to include code coverage goals as key criteria for deciding if it is time to exit various milestones in the software development cycle.  This suggests that code coverage numbers must provide value; why else would so many teams use them?  There appear to be three reasons that code coverage metrics are used as exit criteria:

  • Code coverage is relatively easy to measure and provides a quantitative value that can be compared against a numeric criterion to determine success or failure.
  • Code coverage is an indication of the quality of the test suite and/or testing efforts. 
  • There is an underlying assumption that code coverage is correlated in some tangible way with product quality.  (That is, teams use code coverage as a surrogate for the admittedly harder-to-measure product quality.)

Management likes an unambiguous way to make decisions, and code coverage numbers seem to provide exactly the kind of hard data that is useful for making a critical ship/no-ship decision.  However, don’t be seduced by code coverage numbers.  They must be used with extreme caution: low code coverage values are associated with low-quality test suites, but higher code coverage values do not automatically imply that a test suite is of high quality.  And the relationship between code coverage and product quality is rarely, if ever, validated.  Without that validation, code coverage numbers are effectively useless as a measure of product quality. 


Only very low code coverage values (at or near 0%) have any meaning without additional investigation:  these values are a clear indication that the test suite is woefully inadequate.  Larger values, up to and including 100% coverage, require additional investigation before a useful interpretation can be made.  Code coverage measures the quality of the test suite along a single dimension:  how much of the product code is exercised.  It does not measure product quality. 
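To see why high coverage says so little on its own, consider a minimal sketch in Python (the function, its boundary bug, and the test are all hypothetical).  The test below executes every line and both branches of the product code, so a coverage tool would report 100%, yet it verifies nothing:

```python
# Hypothetical product code with a boundary bug: the discount is
# supposed to apply at exactly 100 items, but the comparison excludes it.
def bulk_discount(quantity):
    if quantity > 100:  # bug: the spec calls for >= 100
        return 0.10
    return 0.0

# This test exercises both branches, so line and branch coverage
# are 100% -- but it asserts nothing, so the bug goes undetected.
def test_bulk_discount():
    bulk_discount(150)  # discount branch
    bulk_discount(5)    # no-discount branch

if __name__ == "__main__":
    test_bulk_discount()
    print("Coverage: 100%.  Bugs found: 0.")
```

Coverage counts execution, not verification; an assertion-free test suite can score perfectly while finding nothing.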


Product quality is how well the software meets the user’s needs and expectations.  For the sake of illustration, imagine that product quality can be evaluated and assigned a numeric value from 0 to 100 (worst to best, respectively).  Consider two extreme cases.  In one case, the tests fully exercise the product code and verify that the code works as specified, but the product was improperly specified and fulfills none of the end user’s expectations (coverage is 100% and quality is 0).  In the other case, the product does exactly what the customer wants it to do, but the tests are poorly designed and exercise none of the product code (coverage is 0% and quality is 100). 
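The first of these extremes is easy to sketch (again, the specification, function, and figures are hypothetical).  Here the tests achieve full coverage and genuinely verify the specified behavior, but the specification itself misses what users need:

```python
# Hypothetical spec: "report order totals rounded to the nearest dollar."
# Suppose users actually need totals to the cent.  The code and its test
# both follow the flawed spec faithfully.
def displayed_total(amount):
    return round(amount)  # exactly what the spec asks for

# 100% coverage and real assertions -- verified against the wrong requirement.
def test_displayed_total():
    assert displayed_total(10.4) == 10
    assert displayed_total(10.6) == 11

if __name__ == "__main__":
    test_displayed_total()
    print("All tests pass at 100% coverage; users still get the wrong totals.")
```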


We do not have a clear understanding of the relationship between code coverage and product quality.  If the two are correlated at all, there is no reason to assume the relationship is linear, or that it is consistent from one portion of the product to another.  Assuming otherwise would require that all teams and developers produce code of similar quality and complexity, and that all test teams are equally adept at creating tests that provide the same coverage per unit of code complexity.  There are simply too many variables to have confidence in the numbers.


If code coverage cannot be used as a surrogate for product quality, and correlates with test suite quality only at low coverage values, when does it add value?  Next time I will discuss some of the scenarios in which code coverage is a useful tool in the software development process.