We have occasional debates here at Microsoft on the value of ‘white-box’ testing.  (For the sake of this discussion, ‘white-box’ testing means testing that takes advantage of internal product knowledge, while ‘black-box’ testing relies solely on the published API and behavior.)

On the one hand, if the person who developed the code writes the test for it, or if the tester was intimately involved in the design of the feature, then the test design is often blinded by the dev design.  For example, if the developer of the feature went to great lengths to ensure that the threading and locking behavior is correct, then the tester may spend the majority of their test effort there.  This is a problem if, for example, there are major flaws in the way the feature interacts with its config files.  You don’t want a certain focus in product development dictating a similar focus in test development.

On the other hand, there are some kinds of bugs that can *only* be found with white-box testing.  If code has an off-by-one error that only kicks in at the exact buffer size that switches between the fast-path algorithm and the slow-path algorithm, there is every possibility that even well-designed equivalence classes against the published API won’t find the problem.  A quick peek at the code, however, can tell you exactly what buffer size to use, if you want to give the product a conniption fit.
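To make that concrete, here is a minimal sketch of what such a white-box boundary test might look like.  Everything in it is invented for illustration: Encode() stands in for the API under test, and 4096 stands in for the internal fast-path/slow-path threshold you found by reading the code.

    // Sketch of a white-box boundary test.  Encode() and kFastPathThreshold are
    // stand-ins; the real names and the threshold come from reading the source.
    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Stand-in for the real API under test, stubbed so the sketch compiles.
    int Encode(const unsigned char* /*data*/, std::size_t size) {
        return size > 0 ? 0 : -1;
    }

    int main() {
        const std::size_t kFastPathThreshold = 4096;  // from the source, not the docs

        // Probe just below, at, and just above the internal switchover point.
        for (std::size_t size : { kFastPathThreshold - 1,
                                  kFastPathThreshold,
                                  kFastPathThreshold + 1 }) {
            std::vector<unsigned char> buffer(size, 0xAB);
            assert(Encode(buffer.data(), buffer.size()) == 0);
        }
        return 0;
    }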

However, there is one kind of white-box testing that is pretty much universally seen as useful – fault injection testing.  This is where you inject code at compile- or run-time to force faults that would not normally occur.  These are great!  Once you have a fault-injection test framework set up, a fault scenario that might ordinarily require an IBM mainframe database and a female goat can be duplicated in a couple of minutes, with a few simple lines of code.  If you are hard-core about testing error handling in your program, you owe it to yourself to investigate fault-injection schemes.  It is similarly good for increasing code-coverage numbers, if you use that metric.
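For a flavor of how simple the plumbing can be, here is one rough way to build an allocation seam a test can force to fail.  TestAlloc, g_allocCount, and g_failAfter are made-up names; the idea is just that the product allocates through a wrapper the test harness controls.

    // Minimal sketch of an allocation seam for fault injection.
    #include <cstdlib>

    long g_allocCount = 0;
    long g_failAfter  = -1;   // -1 means "never inject a failure"

    void* TestAlloc(std::size_t bytes) {
        ++g_allocCount;
        if (g_failAfter >= 0 && g_allocCount > g_failAfter) {
            return nullptr;   // injected out-of-memory fault
        }
        return std::malloc(bytes);
    }

    // A test might then say:  g_failAfter = 3;  // fail every allocation after the third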

Now, there are a lot of different ways you can do fault injection.  Recently I’ve seen some work done injecting faults at the protocol level during SOAP message transmission.  It is interesting to send SOAP messages from one place to another, and have a little bitty guy who sits in between and drops, alters, or otherwise interferes with the messages flowing by.  (Would that be an example of MaXMLwell’s Demon?)  Going further back, I’ve seen a system that systematically generated every possible memory allocation fault while running our test suite.
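Circling back to that little guy in the middle for a moment, here is a toy sketch of what he might do to each message.  The Fate enum and Interfere() are invented; a real harness would wire something like this into a proxy that relays the actual SOAP traffic.

    // Toy sketch of the interloper: decide whether each message passes
    // through, disappears, or gets mangled on its way by.
    #include <optional>
    #include <random>
    #include <string>

    enum class Fate { Pass, Drop, Corrupt };

    std::optional<std::string> Interfere(std::string message) {
        static std::mt19937 rng{std::random_device{}()};
        switch (static_cast<Fate>(std::uniform_int_distribution<int>(0, 2)(rng))) {
            case Fate::Drop:
                return std::nullopt;                       // the message silently vanishes
            case Fate::Corrupt:
                if (!message.empty()) {
                    message[message.size() / 2] ^= 0x20;   // flip a bit mid-message
                }
                return message;
            default:
                return message;                            // pass through untouched
        }
    }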

Now, that memory allocation example brings up the ‘dark side’ of fault injection testing.  You see, there were a couple of problems with that memory allocation testing.  Here is how it worked (I’ll sketch it in code right after the list):

1. Pick a single test variation from your test suite, and run it.  Keep a count of the number of memory allocations executed.

2. Run the variation again.  Fail the first memory allocation, and all subsequent ones.

3. Run the variation again.  Fail the second memory allocation, and all subsequent ones.

4. Keep doing this until you’ve gone through the entire set of memory allocations that a successful variation run does.

5. Go to the next test variation.
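Reusing the hypothetical allocation seam from earlier, the whole scheme boils down to something like this.  RunVariation() is an invented name; in the real system, "failed" simply meant the run crashed, and that was detected outside this loop.

    // Rough sketch of the scheme above, driving the hypothetical TestAlloc seam.
    extern long g_allocCount;                   // from the allocation seam
    extern long g_failAfter;
    extern void RunVariation(int variationId);  // runs one test variation

    void ExhaustAllocationFailures(int variationId) {
        // Step 1: a clean run, just to count the allocations the variation makes.
        g_failAfter  = -1;
        g_allocCount = 0;
        RunVariation(variationId);
        const long totalAllocs = g_allocCount;

        // Steps 2-4: re-run, failing allocation N and every allocation after it.
        for (long n = 0; n < totalAllocs; ++n) {
            g_allocCount = 0;
            g_failAfter  = n;        // n == 0 fails the very first allocation
            RunVariation(variationId);
        }

        // Step 5 (moving on to the next variation) happens in the caller.
    }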

Now, the most obvious problem is that this test takes *forever* to run.  It literally took weeks to run through a simple set of COM tests.  (Oops, did I accidentally reveal the product in question?)  You really have to look at the bugs you find with this method, and ask yourself “is finding these bugs worth the cost of the testing?”  There might be other testing you could be doing that would find better bugs, or find bugs more efficiently.  So if you are doing ‘high-volume’ fault injection, you really need to balance the cost of the testing against its value to you.

Another problem, really more of a nit, is that while this method seems comprehensive, there are many cases it doesn’t hit.  What if memory allocations fail only intermittently?  If something breaks only on the pattern “allocation failed, allocation succeeded, allocation succeeded again, allocation now fails again”, then the test above won’t find it.  I don’t have an answer to this one, except to give the sad truth that you can always think of more things to test than you could possibly write (or run) tests for.
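If you did want to chase intermittent patterns, one approach is to drive the seam from an explicit fail/succeed pattern instead of a single cutoff.  Again, the names here are invented for illustration.

    // Pattern-driven variant of the hypothetical allocation seam.
    #include <cstdlib>
    #include <string>

    std::string g_pattern;          // 'F' = fail this allocation, 'S' = let it succeed
    std::size_t g_patternIndex = 0;

    void* TestAllocPatterned(std::size_t bytes) {
        char fate = 'S';                            // past the end of the pattern, succeed
        if (g_patternIndex < g_pattern.size()) {
            fate = g_pattern[g_patternIndex++];
        }
        return (fate == 'F') ? nullptr : std::malloc(bytes);
    }

    // g_pattern = "FSSF";  // fail, succeed, succeed, fail again: the case above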

Another, much more significant, problem is this: how do you know if it failed?  We took the easy way out, and said “if it AVs, blue-screens, or otherwise crashes – it failed.”  So if it appears to succeed, or returns a wrong error code, or the behavior of the program is just wrong in a non-crashing way – we would not detect it.  Why did we do this?  Well, remember our discussion of costs and tradeoffs.  I’ll give you a million lines of test code, developed over ten years by 40 different people.  Are you gonna track through that and fix it so it reports errors correctly *for every possible memory allocation failure path*?  Note that I’m not claiming that level of detailed testing isn’t useful – it just wasn’t feasible in that case, on the scale of the entire test suite.  Picking a set of core functionality and writing some exceptionally robust test code specifically for this type of fault injection scenario might be a useful endeavor.  We’ll see.  <grin>
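For the curious, here is roughly what that “exceptionally robust” checking might look like for one core scenario: an injected allocation failure should come back as a real error, with no half-built object and no leaked state.  CreateWidget(), WidgetCount(), Widget, and the expected results are all invented.

    // Sketch of robust checking under an injected allocation failure,
    // reusing the hypothetical g_failAfter seam from earlier.
    #include <cassert>

    extern long g_failAfter;                 // from the allocation seam
    struct Widget;
    extern int  CreateWidget(Widget** out);  // assumed to return nonzero on failure
    extern int  WidgetCount();               // how many widgets the component tracks

    void TestCreateWidgetUnderAllocFailure() {
        const int before = WidgetCount();

        g_failAfter = 0;                     // fail the very first allocation
        Widget* w = nullptr;
        const int result = CreateWidget(&w);
        g_failAfter = -1;                    // stop injecting before we check anything

        assert(result != 0);                 // must report the failure, not fake success
        assert(w == nullptr);                // must not hand back a half-built object
        assert(WidgetCount() == before);     // must not leave partial state behind
    }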

For those concerned about Microsoft code quality, note that these days we do also have some static analysis tools that will churn through a reasonable subset of possible call graphs in our programs, and report possible problems in error paths.  It even files bugs automatically – the Windows folks love that, I’m sure!

Some of my readers may be familiar with Microsoft’s stress-testing efforts, where we often hammer a machine with tests to the point of program failure.  While stress testing is useful, don’t be fooled into thinking that it is an adequate replacement for fault-injection testing.  The biggest problem with stress testing is the “early exit” problem.  If you are crushing a machine to the point that memory allocations are failing, then most programs are just going to die immediately when you run them.  You end up testing only the first 10% of the program, which presumably is not the intent.  Another issue is that stress test failures tend to be non-deterministic; you don’t necessarily know when (or if) a particular failure will occur, and it can be very difficult to determine the actions that led up to the failure.  This makes debugging a stress failure much more, well, stressful than debugging an equivalent failure in a fault injection test.

Lastly, note that (as with so many of the topics I write about) there is much, much more to it than I have written.  For example, there are many different kinds of faults that you can inject.  We talked about memory faults and protocol faults, but there are also file access faults, security faults (a biggie!), system object access faults (events, mutexes, etc.), and registry faults.  You can even inject faults into your own internal product code: what happens when your lower-level stuff throws an exception up to the higher-level stuff?
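That last one can be as simple as giving a lower-level component a switch the test can flip.  A sketch, with invented names (LowLevelStore and g_injectThrow are made up):

    // Sketch of an internal fault switch: a lower-level component the test can
    // tell to throw, so you can watch how the layer above reacts.
    #include <stdexcept>

    bool g_injectThrow = false;

    class LowLevelStore {
    public:
        void Write(int key, int value) {
            if (g_injectThrow) {
                throw std::runtime_error("injected storage failure");
            }
            // ... the normal write path would go here ...
            (void)key;
            (void)value;
        }
    };

    // A test flips g_injectThrow to true, exercises the higher-level code, and
    // checks whether it catches, translates, or simply leaks the exception.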

It is really not that hard to get started with, though, and I highly recommend it.