Let's think about this for a second: why can a developer readily estimate implementation time, while a tester always scratches his (or her) head when asked how long it will take to stabilize the software?  Why is a test estimate so difficult and elusive to pin down?  In my opinion, accurately predicting software testing effort is one of the most complex problems in software development.  Alan Page has a great post about test guesstimation, which I think summarizes this problem very well.  But I think this topic deserves more scrutiny, hence this post.

Why is that so?

First, let's think about what's being asked of a tester when (s)he is involved in a typical software development project:

  1. Think about test methodologies
  2. Write up a test plan
  3. Document test scenarios and test cases
  4. Implement test automation and test tools
  5. Execute tests, find bugs
  6. Investigate bugs
  7. Resolve bugs
  8. Regress bugs
  9. Repeat steps 4-8 until one of two things happens: time runs out, or no new bugs are found.

Please note that this list is not exhaustive.
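Step 9's stopping rule can be sketched as a loop. This is a minimal sketch, assuming every test cycle takes the same fixed number of days; `run_until_stable`, `run_cycle`, `cycle_days`, and `budget_days` are hypothetical names for illustration only. What it shows is that the number of cycles, and therefore the total test time, is only known after the loop finishes.

```python
from typing import Callable, List


def run_until_stable(run_cycle: Callable[[int], int],
                     cycle_days: float,
                     budget_days: float) -> List[int]:
    """Repeat test cycles (steps 4-8) until the time budget would be exceeded
    or a cycle finds no new bugs. Returns new-bug counts per completed cycle.

    Hypothetical helper: assumes every cycle costs exactly `cycle_days`.
    """
    history: List[int] = []
    elapsed = 0.0
    cycle = 0
    # Stop condition 1: not enough time left for another full cycle
    while elapsed + cycle_days <= budget_days:
        new_bugs = run_cycle(cycle)  # execute, investigate, resolve, regress
        history.append(new_bugs)
        elapsed += cycle_days
        cycle += 1
        # Stop condition 2: this cycle found no new bugs
        if new_bugs == 0:
            break
    return history


if __name__ == "__main__":
    # Made-up bug counts per cycle, purely for illustration
    bugs_per_cycle = [12, 7, 3, 0, 0]
    print(run_until_stable(lambda i: bugs_per_cycle[i], cycle_days=5, budget_days=30))
    print(run_until_stable(lambda i: bugs_per_cycle[i], cycle_days=5, budget_days=10))
```

With a 30-day budget the loop stops because a cycle finds nothing new; with a 10-day budget it stops because time runs out while bugs are still being found. Neither outcome, on its own, says the software is stable.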

Quite a few of the tasks above are not concrete.  Take #1, think about test methodologies: which test methodologies exist, and which of them apply to this software?  Should the tester always run performance tests?  Stress tests?  Security tests?  It is a gray area, and it is hard to determine which methodologies are absolute musts and which are nice-to-haves.

Next up are the external dependencies involved in testing, most notably the developer, once test starts filing bugs.  It is easy to say that once a bug is found, it will take X amount of dev time to fix it.  However, it is much more difficult for test to say that the fix truly works and that the fix itself does not break other parts of the code.  The worst-case scenario here: a tester files a bug -> dev fixes it -> test finds that the fix is broken and causes 3 new bugs -> dev fixes the new bugs -> test finds that those fixes are also broken and cause 5 more new bugs -> ...  This becomes the inevitable circle of death.
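The circle of death can be made concrete with a simple geometric model. This is an assumption for illustration, not a measured result: suppose test initially files $n_0$ bugs, and each fix introduces on average $r$ new bugs, so round $k$ produces about $n_k$ bugs and the expected total number of fixes is $N_{\text{fixes}}$:

```latex
n_k = n_0 \, r^k, \qquad
N_{\text{fixes}} = \sum_{k=0}^{\infty} n_0 \, r^k =
\begin{cases}
  \dfrac{n_0}{1 - r}, & 0 \le r < 1,\\[4pt]
  \infty, & r \ge 1.
\end{cases}
```

For example, with $n_0 = 10$ and $r = 0.5$, the expected total is $10 / (1 - 0.5) = 20$ fixes. In the worst-case scenario above, each round produces more bugs than the one before, i.e. the ratio between rounds is above 1, so there is no finite estimate at all until the quality of the fixes improves.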

Moving on to an even grayer area: #9, repeat steps 4-8.  This is essentially executing a new test cycle.  However, the fundamental question is: how many test cycles does a product really need?  If the tester runs out of time (e.g. the deadline arrives) and the software still contains lots of bugs, does this mean that the testing phase is complete?  (Hopefully not.)  And if no new bugs are identified, does this mean that the software is stable?  The answer to both questions really is: it depends.  There are a lot of "influencing" variables that can alter the outlook of the software: business goals, product stage (prototype, beta, maintenance release, etc.), product impact (is this medical software, a navigation system, a banking app, or simply a game, a cool add-on, a widget, etc.), product complexity, resource capability, and many other factors.

I hope by now you see where this is heading.  There are simply too many moving parts involved in software testing.  As a result, estimating the actual time becomes little more than guesswork.

Oh, by the way, how do we actually know that once testing has signed off, testing is truly complete?  What about the quality level of the software?  Everybody I have worked with, past and present, has said pretty much the same thing: everybody wants high-quality software.  But how on earth does one go about measuring quality?  (Quality measurement probably deserves its own blog post.  Hopefully, some other time.)

There are a few things a tester can do to gauge the quality level: determine test coverage, look at the test execution pass/fail rate, bug reports, customer complaints, etc.  However, none of these is an absolute measure.  For example, high test coverage does not mean that software quality is high.  What if we have test cases covering every possible UI operation, but customers find the UI unintuitive?

As you can see, there is no single, definitive way to measure software quality.


Okay, enough with the ranting.  Now, let's think about what a tester can do to obtain a better estimate.

I am a big believer in collecting historical data, deriving conclusions from those statistics, and applying what we learn to the current project.  We can easily start tracking the estimates for every project we work on and compare them against the actual time taken.  Using this very basic form of math, we can spot a trend and see how accurate we were.  If we were accurate, we can find out which factors contributed to that accuracy.  If we were not, we can find out which areas to improve.
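Tracking estimates against actuals can start as simply as this. It is a minimal sketch, assuming we record one estimated and one actual duration (in days) per past project; `ProjectRecord`, `calibration_factor`, and `adjusted_estimate` are hypothetical names, and the project data is made up. The ratio of actual to estimated time shows how far off each estimate was, and the average ratio can be used to scale a new raw estimate.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class ProjectRecord:
    """One past project: estimated vs. actual test time (hypothetical fields)."""
    name: str
    estimated_days: float  # must be > 0
    actual_days: float

    @property
    def ratio(self) -> float:
        # > 1.0 means testing took longer than estimated; < 1.0 means it finished early
        return self.actual_days / self.estimated_days


def calibration_factor(history: List[ProjectRecord]) -> float:
    """Average actual/estimate ratio across past projects."""
    if not history:
        raise ValueError("need at least one past project")
    return mean(p.ratio for p in history)


def adjusted_estimate(raw_estimate_days: float, history: List[ProjectRecord]) -> float:
    """Scale a new raw estimate by how far off past estimates were on average."""
    return raw_estimate_days * calibration_factor(history)


if __name__ == "__main__":
    # Made-up numbers, purely for illustration
    history = [
        ProjectRecord("Project A", estimated_days=10, actual_days=15),
        ProjectRecord("Project B", estimated_days=20, actual_days=25),
        ProjectRecord("Project C", estimated_days=8, actual_days=8),
    ]
    for p in history:
        print(f"{p.name}: actual/estimate = {p.ratio:.2f}")
    print(f"Calibration factor: {calibration_factor(history):.2f}")
    print(f"Raw estimate 12 days -> adjusted {adjusted_estimate(12, history):.1f} days")
```

With these made-up numbers the ratios are 1.50, 1.25, and 1.00, so a raw 12-day estimate becomes 15 days. A plain average is the simplest choice; weighting recent projects more heavily, or comparing only projects at the same product stage, would be natural refinements.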

One great example of using statistics is baseball.  Players and coaches use statistics and probability quite successfully to evaluate the matchup between a pitcher and a hitter.  A pitcher's ERA, a hitter's batting average and slugging percentage, strikeout rates, and other stats are constantly monitored to gain an edge.

If baseball can have such success with statistics, then I think the software testing field can easily adopt the same approach.