Reader Geth writes to ask that I talk about what he refers to as NOK - Not OK - testing.

The basic premise is to determine how robust the application is to failures, how gracefully it handles problems. For example, when the database is unavailable, the table is unavailable, remote resources are unavailable, etc.

I wrote Geth back to let him know a) thanks for the post idea (which I'm always glad to receive!) and b) that I quite like the term "NOK" and am going to appropriate it. (One can never have enough TLAs donchaknow! <g/>)

NOK testing is skipped about as often as it is important - which is to say, very!

Verifying that your application does what its specification (whatever form that takes for your team) says it should do is important, but not sufficient. Checking that your application does not do what its specification says it should not do is also important but also not sufficient. It's easy to forget to check that your application does not do anything untoward, unexpected, or unnecessary, but this is also important - unless you like learning from your customers that your product did something that was most definitely Not OK!

A different aspect of Not OK testing is to understand how your application reacts when resources it uses are Not OK. What happens if the file to which it tries to write is read-only? What happens if the database goes down or takes a long time to respond? What happens if the network is under heavy load, like first thing in the morning when everyone in your entire company is checking email at the same time? What happens if your hard disk fills up? What happens when you run out of memory? What happens if your application tries to load a third-party control that crashes on initialization? What happens if a user tries to inject SQL code into your web app? And on and on and on.
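To make a couple of those questions concrete, here is a minimal sketch in Python of checking how code reacts when a file is read-only or the disk is full. Everything named here is hypothetical and just for illustration: `save_text`, its status strings, and the `outcome_when_open_fails` helper are my own inventions, not anything from a real product. The failures are injected with the standard library's `unittest.mock` rather than by actually filling a disk, and for simplicity both are raised from `open()`, even though a real disk-full error usually shows up later, on write or flush.

```python
import errno
from unittest import mock


def save_text(path, text):
    """Hypothetical application function: reports a status instead of crashing."""
    try:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
    except PermissionError:
        return "read-only"
    except OSError as error:
        if error.errno == errno.ENOSPC:
            return "disk-full"
        raise  # anything we did not plan for should still surface loudly
    return "saved"


def outcome_when_open_fails(error):
    """Make open() raise the given error and report how save_text reacts."""
    with mock.patch("builtins.open", side_effect=error):
        return save_text("report.txt", "hello")


# Simulated Not OK resources: a read-only file and a full disk.
read_only_outcome = outcome_when_open_fails(
    PermissionError(errno.EACCES, "Permission denied")
)
disk_full_outcome = outcome_when_open_fails(
    OSError(errno.ENOSPC, "No space left on device")
)
```

Injecting the fault keeps the check repeatable on any machine; whether "report a status" is the right reaction is exactly the kind of decision your team has to make.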

Don't be surprised if each person on your team has a different opinion as to what the answer to each of these questions should be! There is no one correct answer, in fact; "correct" will vary from product to product and perhaps even across releases of a single product. Version One products are often less stable than subsequent releases, since the focus for V1 tends to be "Get it done and into the market to trump our competitors" whereas later releases have an established install base to take care of. Microsoft OneNote makes it a point to never lose your work; your web browser likely doesn't care so much if you have to manually reopen web pages after it crashes. An application may choose to handle some error cases by retrying - a failure to connect to the database for example; other problems by ignoring them - as might be the case when an email program cannot connect to a master server to download the latest-and-greatest address book; and yet other errors by failing fast and hard - such as in out-of-memory conditions.
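The three reactions just described (retry, ignore, fail fast) might look something like the following in Python. This is only a sketch under my own assumptions: the names `DatabaseUnavailable`, `connect_with_retry`, `refresh_address_book`, `build_index`, and `FlakyDatabase` are all made up for illustration, and the retry count and delay are arbitrary defaults.

```python
import time


class DatabaseUnavailable(Exception):
    """Hypothetical error raised when the database cannot be reached."""


def connect_with_retry(connect, attempts=3, delay_seconds=0.0):
    """Retry policy: try a bounded number of times, then give up."""
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except DatabaseUnavailable:
            if attempt == attempts:
                raise  # out of attempts, let the caller see the failure
            time.sleep(delay_seconds)


def refresh_address_book(download, cached):
    """Ignore policy: keep the cached copy if the server is unreachable."""
    try:
        return download()
    except ConnectionError:
        return cached


def build_index(allocate):
    """Fail-fast policy: stop immediately rather than limp along."""
    try:
        return allocate()
    except MemoryError:
        raise SystemExit("out of memory; stopping")


class FlakyDatabase:
    """Test double that fails a fixed number of times before connecting."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def connect(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise DatabaseUnavailable("database is down")
        return "connection"


def unreachable_server():
    raise ConnectionError("master server unreachable")


flaky = FlakyDatabase(failures=2)
connection = connect_with_retry(flaky.connect)
address_book = refresh_address_book(unreachable_server, cached=["alice"])
```

A test double like `FlakyDatabase` lets you decide exactly how Not OK the resource is, so you can check that the third attempt succeeds and that a fourth failure would be reported rather than swallowed.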

What exactly happens in any particular situation is less important than knowing what will happen, and having that happen consistently and deterministically. Stress testing, load testing, performance testing, security testing, and risk testing each cover a specific type of NOK testing; the degree to which each matters is very application-specific. If you have a standalone application that does not interface with any networks or the Internet, for example, then security testing may not matter so much. As is the case with all the other testing you do, you must prioritize your work, and some things will be underdone or even omitted entirely.

Spend some time thinking about and testing around these issues and you will reduce the number of customers who tell you "Not OK!"


*** Want a fun job on a great team? I need a tester! Interested? Let's talk: Michael dot J dot Hunter at microsoft dot com. Great coding skills required.