I recently had some flight/airport time to catch up on a couple of articles I've been meaning to read.

What Is a Good Test Case? by Cem Kaner
This is a good summary for people new to software testing. I'm a strong believer in using multiple "test styles" and test activities as part of an overall testing strategy. This paper breaks black box testing into Function, Domain, Specification, Risk-based, Stress, Regression, User, Scenario, State-model based, High volume automated and Exploratory testing. Internally we may use different terms, but we hit most of these categories in some form. For example, in addition to functional testing, which is probably the dominant style, we also do stress, capacity, security, specification (feature specs as well as others like Logo requirements and Accessibility/Section 508, etc.), scenario and exploratory testing. We do "User" testing both by dogfooding our product and through betas and early adopter programs.

My team has a few pilot projects with State-model based testing, but it's limited right now. We have recently done a lot more high volume automated testing on our IDE features, where we take arbitrary code and apply generic tests to it. This has been pretty successful. We've taken our compiler test suite of 20,000+ language tests and run it through this engine and have found a number of bugs we didn't find with traditional methods.
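
To make the high volume idea concrete, here's a rough sketch of the pattern (in Python, with hypothetical names, not our actual engine): sweep a large corpus of existing test inputs through a small set of generic oracles that should hold for any input, such as "the tool never crashes" and "the output is deterministic."

import glob

def generic_checks(source_text, compile_fn):
    # Generic oracles that should hold for *any* input program.
    # compile_fn is a stand-in for the tool under test.
    result = compile_fn(source_text)            # must not crash
    assert result is not None, "tool returned nothing"
    assert compile_fn(source_text) == result    # must be deterministic

def run_high_volume_pass(corpus_glob, compile_fn):
    # Sweep an existing corpus (e.g. a language test suite) through the checks;
    # any exception, including a crash in the tool itself, counts as a failure.
    failures = []
    for path in glob.glob(corpus_glob):
        with open(path, encoding="utf-8", errors="replace") as f:
            source = f.read()
        try:
            generic_checks(source, compile_fn)
        except Exception as e:
            failures.append((path, repr(e)))
    return failures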

A Critique of Software Defect Prediction Models by Norman E. Fenton and Martin Neil
A Decision-Analytic Stopping Rule for Validation of Commercial Software Systems by Tom Chávez
Both papers relate to predicting bugs and had interesting approaches, which I want to look into further when I have more time. I appreciate the point in Fenton and Neil's paper about testing effort and testability needing to be part of the equation. They make a good point that variables such as developer skill are very important and can matter more than module size or complexity for bug rates. If those are held constant the models are more valid, but a magic number like the "Goldilocks Conjecture," which says there's an ideal module size, is suspect.

Chávez's stopping rule paper also sounds promising, although in general I think we are still a long way from being able to rely on numbers alone. I think we will still rely a lot on people's experience and gut feel for setting schedules and determining when to ship. We do something internally (and I'm sure many other software projects out there do the same) which we call "bake time": we might have zero ship stoppers at a given point in time, but we still want to wait before we ship because of the possibility of unfound bugs. Historically, last-minute ship stoppers have popped up from hard-to-predict activities (if we knew what they were, we'd do those tests ahead of time). Typically it's the result of someone intentionally doing creative testing or performing a real-world scenario no one had thought of yet. It's a challenge to anticipate what these are, but it's this hunt that makes for some very exciting and creative testing.
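
Just to sketch the flavor of a decision-analytic stopping rule (a toy heuristic of my own, not Chávez's actual model): compare the expected cost of the bugs you would otherwise ship against the cost of another day of testing, based on the current find rate.

def should_keep_testing(bugs_found_last_week, cost_per_escaped_bug,
                        cost_per_day_of_testing, escape_fraction=0.5):
    # Toy stopping heuristic, not the paper's model. escape_fraction is an
    # assumed guess at how many of next week's finds would otherwise reach
    # customers; the cost and rate inputs come from your own project data.
    expected_finds_per_day = bugs_found_last_week / 7.0
    expected_daily_benefit = expected_finds_per_day * escape_fraction * cost_per_escaped_bug
    return expected_daily_benefit > cost_per_day_of_testing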

Internally, we use a low-tech but relatively accurate method of predicting bug rates with historical data. We look at the shape of active and incoming bug graphs from previous product cycles, along with incoming rates from test passes and fix rates, to help predict our dates. One interesting observation: while we are driving to hit zero bugs for a given milestone, the total number of bugs stays roughly constant if you add the current milestone's bugs to the bugs pushed off to the next milestone; however, the make-up of these bugs shifts over time toward the lower priority and lower severity variety.
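
For what it's worth, the arithmetic behind the date prediction is nothing fancier than a linear burn-down projection (a simplified sketch, not our actual spreadsheet):

def days_to_zero_active_bugs(active_bugs, incoming_per_day, fixed_per_day):
    # Naive linear projection from recent incoming and fix rates.
    # Returns None if the count is flat or growing, i.e. no projected date.
    net_burn_down = fixed_per_day - incoming_per_day
    if net_burn_down <= 0:
        return None
    return active_bugs / net_burn_down

# Example: 120 active bugs, 4 new per day from test passes, 10 fixed per day
# gives roughly 20 days to zero, before any "bake time" is added on top.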

I'd be interested to know if other people have implemented any bug prediction or stopping rules.