A key attribute of test quality is effectiveness: tests must have a high probability of detecting defects in the product. Conceptually this is simple, but it is harder to devise simple metrics that show how effective existing tests are, or whether process improvements aimed at test effectiveness are making progress.
The fields of epidemiology and medical diagnostics use a few simple quantitative measures (sensitivity, specificity, and positive predictive value) to describe how well diagnostic procedures detect the presence of disease (see University of Iowa, Department of Pathology 2007 for additional details).
In medical diagnostics, sensitivity is the percentage of patients with the disease who are correctly identified as having the disease. Specificity is the percentage of patients without the disease who are correctly identified as not having the disease. Positive Predictive Value (PPV) is the percentage of patients correctly identified as having the disease among all patients identified as having the disease. Good diagnostic tests have high sensitivity, specificity, and positive predictive value.
Parallels can be drawn between medical tests for the diagnosis of disease and software testing for the diagnosis of product defects. Conceptually, sensitivity and specificity should translate well to software testing, because we want tests that are likely to detect product defects when they are present (high sensitivity) and that do not produce a large number of false failures (high specificity).
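The matrix of possible software test outcomes is:

|                  | Test Fails      | Test Passes     |
|------------------|-----------------|-----------------|
| Product Issue    | True Fail (TF)  | False Pass (FP) |
| No Product Issue | False Fail (FF) | True Pass (TP)  |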
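Each test execution lands in exactly one cell of this matrix, determined by two facts: whether the product really has an issue and whether the test failed. The sketch below encodes that mapping. The names `Outcome` and `classify` are hypothetical, and it assumes the "product has an issue" flag is known, which in practice usually comes only from root-cause analysis after the fact.

```python
from enum import Enum


class Outcome(Enum):
    TRUE_FAIL = "TF"   # product issue present, test fails
    FALSE_PASS = "FP"  # product issue present, test passes
    FALSE_FAIL = "FF"  # no product issue, test fails
    TRUE_PASS = "TP"   # no product issue, test passes


def classify(product_has_issue: bool, test_failed: bool) -> Outcome:
    """Place one test execution in the 2x2 outcome matrix.

    Assumes product_has_issue is ground truth (hypothetical input; in
    real projects it is typically established by triage after the run).
    """
    if product_has_issue:
        return Outcome.TRUE_FAIL if test_failed else Outcome.FALSE_PASS
    return Outcome.FALSE_FAIL if test_failed else Outcome.TRUE_PASS


if __name__ == "__main__":
    print(classify(product_has_issue=False, test_failed=True))  # Outcome.FALSE_FAIL
```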
Sensitivity = (TF / (TF + FP)) * 100
Specificity = (TP / (FF + TP)) * 100
Positive Predictive Value = (TF / (FF + TF)) * 100
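The three formulas translate directly into code. The following is a minimal sketch that assumes all four counts are known; the `OutcomeCounts` type and the sample numbers are invented for illustration only. A zero denominator raises an error rather than returning a misleading percentage.

```python
from dataclasses import dataclass


@dataclass
class OutcomeCounts:
    tf: int  # true fails: product issue present, test failed
    fp: int  # false passes: product issue present, test passed
    ff: int  # false fails: no product issue, test failed
    tp: int  # true passes: no product issue, test passed


def _pct(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage."""
    if denominator == 0:
        raise ValueError("metric is undefined when the denominator is zero")
    return 100.0 * numerator / denominator


def sensitivity(c: OutcomeCounts) -> float:
    return _pct(c.tf, c.tf + c.fp)  # TF / (TF + FP) * 100


def specificity(c: OutcomeCounts) -> float:
    return _pct(c.tp, c.ff + c.tp)  # TP / (FF + TP) * 100


def ppv(c: OutcomeCounts) -> float:
    return _pct(c.tf, c.ff + c.tf)  # TF / (FF + TF) * 100


if __name__ == "__main__":
    counts = OutcomeCounts(tf=40, fp=10, ff=20, tp=930)  # hypothetical counts
    print(round(sensitivity(counts), 1))  # 80.0
    print(round(specificity(counts), 1))  # 97.9
    print(round(ppv(counts), 1))          # 66.7
```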
However, accurate values for sensitivity and specificity are nearly impossible to calculate in software testing, because there is no “gold standard” test that can serve as a reference to distinguish tests that pass because nothing is wrong with the product (True Pass) from tests that pass but fail to detect a defect that is present (False Pass). (Here, sensitivity can be described as the percentage of tests that detect a product issue among all tests that could have detected a product issue. The practical problem is determining how many tests could have detected the problem.)
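To see why the missing reference matters, note that triage can split failures into TF and FF, but passes arrive only as a total equal to TP + FP. Since FP can be anything from 0 to that total, the same observed data are consistent with a wide range of sensitivities. The sketch below computes that range; the function name and the example numbers are hypothetical.

```python
from typing import Tuple


def sensitivity_bounds(tf: int, n_pass: int) -> Tuple[float, float]:
    """Range of sensitivity (%) consistent with the observable data.

    tf:     failures traced to real product defects (true fails)
    n_pass: total passing tests, i.e. TP + FP with an unknown split
    """
    if tf <= 0:
        raise ValueError("bounds are only informative when tf > 0")
    # Worst case: every pass was a false pass (FP = n_pass).
    lower = 100.0 * tf / (tf + n_pass)
    # Best case: no pass was a false pass (FP = 0), so TF / TF = 100%.
    upper = 100.0
    return lower, upper


if __name__ == "__main__":
    lo, hi = sensitivity_bounds(tf=40, n_pass=940)  # hypothetical counts
    print(f"sensitivity is somewhere between {lo:.1f}% and {hi:.1f}%")
    # sensitivity is somewhere between 4.1% and 100.0%
```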
Positive predictive value (PPV) is relatively easy to calculate for software testing, provided we have good data on the root causes of test failures, because it depends only on failing tests. Test quality improvement efforts should track PPV and include activities designed to improve it over time. Any activity that reduces the number of false failures will help to drive up PPV, and although specificity cannot be measured directly, an increase in PPV will tend to correlate with an increase in test specificity. Reducing false failures also makes it easier to diagnose quickly and accurately when the product contains a defect.
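Because PPV needs only the root cause of each failure, it can be tracked from an ordinary failure-triage log. The sketch below computes PPV per release, assuming a hypothetical log format of `(release, root_cause)` pairs where `"product"` marks a failure traced to a real defect and any other cause counts as a false failure. The log entries are invented for illustration and are not measured data.

```python
from collections import defaultdict

# Hypothetical triage log: one (release, root cause) entry per test failure.
# "product" = true fail; anything else (test bug, environment, ...) = false fail.
TRIAGE_LOG = [
    ("1.0", "product"), ("1.0", "test"), ("1.0", "environment"), ("1.0", "test"),
    ("1.1", "product"), ("1.1", "product"), ("1.1", "test"),
    ("1.2", "product"), ("1.2", "product"), ("1.2", "product"), ("1.2", "environment"),
]


def ppv_by_release(log):
    """Return {release: PPV %}, where PPV = TF / (TF + FF) * 100."""
    true_fails = defaultdict(int)
    all_fails = defaultdict(int)
    for release, cause in log:
        all_fails[release] += 1
        if cause == "product":
            true_fails[release] += 1
    return {r: 100.0 * true_fails[r] / all_fails[r] for r in all_fails}


if __name__ == "__main__":
    for release, value in ppv_by_release(TRIAGE_LOG).items():
        print(f"{release}: {value:.1f}%")
    # 1.0: 25.0%
    # 1.1: 66.7%
    # 1.2: 75.0%
```

A rising series like this one is the kind of trend the improvement activities above would aim for; note that it says nothing about defects the tests missed, since passing tests never enter the calculation.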
University of Iowa, Department of Pathology. 2007. Online Laboratory Services Handbook (Appendix). http://www.medicine.uiowa.edu/path_handbook/Appendix/Chem/pred_value_theory.html