“Any idea how many bugs may still be in the product?”  A question I've been asked, and heard others get asked, on many occasions.  It drives me nuts.  Not because it's a dumb question; we'd all certainly love to have the answer to it.  It's just that it's pretty much impossible to answer.  But saying it's impossible isn't a good answer either; it's pretty much a cop-out.  Those who ask the question are well aware you're not going to respond with an exact number of bugs.  All they're trying to get at is a feel for your confidence in the quality of the product at that specific point in time.

There are several metrics you can use to come up with a somewhat acceptable partial answer.  Things like bug trends and code coverage data come to mind.  But nothing out there gives you a ton of confidence in your answer.  And man does that suck.
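For what it's worth, the simplest bug-trend metric I know of is just the running count of open bugs: cumulative bugs found minus cumulative bugs fixed.  Here's a minimal sketch in Python, with entirely made-up weekly numbers:

```python
def open_bug_trend(opened_per_week, fixed_per_week):
    """Running count of open bugs: cumulative found minus cumulative fixed.

    A curve still climbing late in the cycle means bugs are being found
    faster than they're fixed; a curve bending down toward zero is the
    (weak) signal people hope to see before shipping.
    """
    open_bugs, trend = 0, []
    for found, fixed in zip(opened_per_week, fixed_per_week):
        open_bugs += found - fixed
        trend.append(open_bugs)
    return trend

# Hypothetical weekly counts over an eight-week milestone.
print(open_bug_trend([12, 15, 9, 7, 5, 4, 2, 1],
                     [5, 8, 10, 9, 6, 5, 4, 3]))
# [7, 14, 13, 11, 10, 9, 7, 5]
```

Of course, a downward trend might just mean testing slowed down, which is exactly why this kind of metric only gives a partial answer.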

I decided to spend a few hours today looking for any information describing what those who depend on software in mission-critical environments do to feel okay with their answers to this much-loved question.  Google kept leading me to NASA documents, which were definitely pretty interesting but still left me searching for more.  It turns out that “process” is quite big at NASA, which is far from a surprise as it's just expected in environments like theirs.  They've written lots of documents that describe processes for processes.  They also seem to do a pretty good job of analyzing past projects and have collected some interesting data points.  Some of the most interesting I found were in this slide deck.  It's not always clear whether the source of the data is the Standish Group's CHAOS Report or NASA's own, but it's interesting either way.  Here's an example of some of the fun facts they mention:

  • 53% of 8,380 software projects were over budget by 189%, late by 222%, and missing 39% of capabilities.  31% of the projects were cancelled and the remaining 16% were successful (i.e. on budget, on time, etc.).

There's another slide titled “Where are software errors introduced?” which gave “requirements specification” as the answer 68% of the time, “design and implementation” 23% of the time, and “installation and commissioning” the remaining 9%.  It's not a big surprise that specs are leading this race, but it makes you wonder what can be done to improve in this area.  Obviously, writing clearer, more complete specs is what needs to happen, but it's definitely easier said than done.

On the projects I've worked on, we sometimes have features that are really well spec'd out (e.g. the C# language in general) and others that aren't spec'd out nearly as well (e.g. Alink, a tool Grant has posted about a few times recently).  Even when a solid spec exists for a feature, it often isn't a “solid spec” until way later in the product cycle.  This can lead to developers implementing a feature whose design has many open questions, and to testers verifying against a not-so-clear notion of correctness.  It's clear that if we got the spec right from the beginning, before developers wrote code and testers tested, we'd probably end up with fewer bugs, but it's just not feasible.  Plans and feature designs change significantly over a product cycle for a variety of reasons (customer feedback, design issues, changes in priorities, etc.).  We need to find ways to keep things as dynamic as they need to be but as concrete as possible throughout.  Fun.  (Going to try to stop rambling now.)

Anyway, another site I ended up at was testing.com, a site run by Brian Marick that is incredibly rich in content related to software testing.  I've read several of Brian's papers over time and been to a couple of his talks at conferences.  He's one of the few testing industry leaders I've really been able to agree with most of the time and relate to in general, as he seems to speak from real experience and is able to communicate it clearly.

I read two of his “writings” today.  The first was Classic Testing Mistakes and the other was A Manager's Guide to Evaluating Test Suites (written with James Bach and Cem Kaner).  An interesting section of the latter was appendix A, where they describe the approaches for evaluating test suites that they rejected: error seeding and mutation testing, both of which can be used, to a degree, to estimate the defects remaining in a program.  Both involve deliberately adding bugs to your code to get a feel for what percentage of bugs your tests are catching and missing.  Clearly this approach has several problems, which they do a good job of pointing out, but oh how I wish something along these lines could really work.  I might actually play around with something in these areas if I come up with anything that's even remotely promising.
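For the curious, the classic error-seeding estimate (often attributed to Harlan Mills) works like capture-recapture: if your testing catches a known fraction of the bugs you planted, assume it catches real bugs at the same rate.  That assumption is exactly what the appendix criticizes, but the arithmetic itself is simple.  A quick sketch with hypothetical numbers:

```python
def seeded_bug_estimate(seeded_total, seeded_found, real_found):
    """Mills-style error-seeding estimate of real bugs still remaining.

    Assumes seeded bugs are found at the same rate as indigenous ones,
    which is the shaky assumption that makes this hard to trust in
    practice (seeded bugs tend to be easier to find).
    """
    if seeded_found == 0:
        raise ValueError("no seeded bugs found; estimate is undefined")
    # If testing caught seeded_found/seeded_total of the planted bugs,
    # scale the real bugs found up by the inverse of that detection rate.
    estimated_real_total = real_found * seeded_total / seeded_found
    return estimated_real_total - real_found  # estimated bugs still lurking

# Hypothetical: seed 50 bugs, testing finds 40 of them plus 120 real bugs.
# Detection rate 80%, so ~150 real bugs total, ~30 still out there.
print(seeded_bug_estimate(50, 40, 120))  # 30.0
```

Even setting aside the detection-rate assumption, you have to remember to remove every seeded bug afterward, which is one more reason the authors rejected the approach.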

Anyone have any thoughts/ideas on ways that can lead to answers to my favorite question?