In my experience, many testers identify with certain measurements and metrics which, on the surface, seem desirable to maximize.  These statistics provide feedback to the tester that they are making progress and getting something done through the course of their work.  However, when one looks below the surface, “overachieving” on these statistics may in fact be an indicator of an unhealthy software development system.  Let’s take a look at two tester-related statistics which I believe are worth scrutinizing:  Number of Tests, and Number of Bugs Opened.

 

  1. Number of Tests

 

I think many will agree it is good to provide regression coverage by authoring stable automated tests – they provide a way to repeatedly check that the system has not deviated from a well-known “good” state.  Writing test automation is part of the daily job of the SDET, something most of us do on a regular basis.  And heck, if writing one test case to provide coverage is good, then writing 100 cases must be better, right?  This line of thought unfortunately seems to be quite common.  I have seen multiple instances of testers being proud of the sheer volume of tests they added to the automation system:

 

“As part of my feature, I added 450 tests to the BVTs, isn’t it great?”

 

Not necessarily :)   One should consider the ramifications before claiming that “more is always better”:

 

  • Tests written as part of automation suites are usually long-lived and passed down to subsequent owners, and the TCO (total cost of ownership) of each test case is usually (much) greater than the initial cost to author the case.  Each test written has maintenance costs that compound over the entire lifetime of the test, including such items as:
    • Triage of test failures
    • Changes of test code to accommodate design changes
    • Fixing unreliable/unstable test variations
    • Reporting test results
    • Etc...
  • When multiple tests provide duplicate coverage, the time spent maintaining the duplicates is a sunk cost which prevents that time from being spent on other areas with low (or zero) coverage; a rough cost sketch follows this list
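To make the compounding concrete, here is a back-of-the-envelope sketch.  The hour figures and milestone count are purely hypothetical assumptions, not measured data:

```python
# Hypothetical numbers for illustration only -- plug in your team's own.
AUTHORING_HOURS = 4        # one-time cost to write a test
MAINTENANCE_HOURS = 1.5    # triage, reliability fixes, design churn: per test, per milestone
MILESTONES = 8             # lifetime of the suite

def suite_tco(num_tests: int) -> float:
    """Total cost of ownership: authoring plus maintenance over the suite's lifetime."""
    return num_tests * (AUTHORING_HOURS + MAINTENANCE_HOURS * MILESTONES)

print(suite_tco(2))    # 32.0 hours
print(suite_tco(100))  # 1600.0 hours -- for the same coverage, if the 100 are redundant
```

Even with generous assumptions, the ongoing maintenance term dwarfs the authoring term, which is why redundant tests keep costing long after they are written.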

 

What then should be the tester’s goal when writing automation?  In my opinion, testers should shoot for the “Minimal Maximal Set” – a term I use for the smallest set of test cases which provides the maximum amount of coverage.  If you can generate a set of 2 robust test cases which provides the same coverage as another set of 100, the TCO of the minimal set will likely be far less over the lifetime of the cases, a net positive for team efficiency.
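One way to approximate such a set is classic greedy set cover: repeatedly pick the test that adds the most not-yet-covered items, and stop once nothing new is gained.  A minimal sketch follows; the test names and coverage data are hypothetical, and greedy selection yields a good approximation rather than a guaranteed minimum:

```python
def minimal_maximal_set(coverage: dict[str, set[str]]) -> list[str]:
    """Greedily pick tests until no remaining test adds new coverage."""
    covered: set[str] = set()
    chosen: list[str] = []
    remaining = dict(coverage)
    while remaining:
        # Pick the test that covers the most not-yet-covered items.
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # every remaining test is pure duplication
        covered |= gain
        chosen.append(best)
    return chosen

tests = {
    "test_login_roundtrip": {"auth", "session", "ui_login"},
    "test_login_basic":     {"auth"},      # subsumed by the roundtrip test
    "test_session_expiry":  {"session"},   # subsumed as well
    "test_checkout_flow":   {"cart", "payment"},
}
print(minimal_maximal_set(tests))  # ['test_login_roundtrip', 'test_checkout_flow']
```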

 

  2. Number of Bugs Opened

 

Testers find bugs; it is part of their DNA.  On the surface, it is great to see testers find many defects that would otherwise make their way to the customer.  It is also great to see testers take a certain pride in being able to uncover bugs in the course of their work.  However, this is again a case of “more is [not always] better”.  Some types of bugs, found again and again, can indicate intrinsic problems in the way that we develop and test the system.  For example, consider the following:

 

  • A large quantity of BVT/Pri0 defects being found repeatedly can be an indicator that defect prevention procedures are not sufficient to catch low-hanging fruit before developers check in their new code.  In essence, teams should never be in a situation where BVT blockers occur that could have been easily prevented by a check-in test or a code review.  We should push to find and prevent issues as early as possible in the software development process, as that is the most efficient way to ship a quality product
  • A pattern of low-quality bugs (for example, a preponderance of "won't fix" or "not repro" resolutions) may indicate that the test team is wasting the feature team's time with non-issues; a small sketch after this list shows one way to measure this.  Sometimes it can also indicate that the test team has the wrong priorities in mind when testing the product
  • Large numbers of base-case functional bugs found late in the milestone may indicate a disconnect between the functional requirements and assumptions that the test team is making
  • Patterns of similar defects being found milestone after milestone can indicate that the feature team is not learning from past mistakes.  It can also indicate that defect patterns are not being analyzed at the end of each milestone to prevent the same problems from recurring in subsequent ones
  • Categories of bugs being missed may indicate an overall lack of attention to fundamentals like security, stress, performance, globalization, etc.
  • Large numbers of integration bugs being found late in the cycle can hint at design problems between intersecting components
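As a minimal sketch of that kind of analysis – assuming a hypothetical list of bug records with resolution and area fields, since bug trackers vary – a couple of passes over the data already say more than a raw count:

```python
from collections import Counter

# Hypothetical bug records; in practice these would come from your bug tracker.
bugs = [
    {"id": 101, "resolution": "fixed",     "area": "security"},
    {"id": 102, "resolution": "wont_fix",  "area": "ui"},
    {"id": 103, "resolution": "not_repro", "area": "ui"},
    {"id": 104, "resolution": "fixed",     "area": "perf"},
    {"id": 105, "resolution": "wont_fix",  "area": "ui"},
]

# Ratio of "won't fix" / "not repro" resolutions flags low-quality bug reports.
resolutions = Counter(b["resolution"] for b in bugs)
low_quality = resolutions["wont_fix"] + resolutions["not_repro"]
print(f"low-quality bug ratio: {low_quality / len(bugs):.0%}")  # 60% here

# An area breakdown can expose missed categories (e.g. no globalization bugs at all).
print(Counter(b["area"] for b in bugs))
```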

 

Again, what should the goal be here?  While good testers certainly find lots of bugs, I believe that we should look far beyond the numbers and ask ourselves "What do these bugs tell us about the product?" and "What do these bugs tell us about the way we test the product?"  Many folks gravitate towards simple bug statistics because they are easy to measure, but in isolation they don't tell us much.

 

Only when we look at what the statistics are really telling us can we begin to make meaningful progress...

 

-Liam Price, Microsoft