In a paper titled "Incremental Compositional Dynamic Test Generation," researchers from the University of Wisconsin–Madison report that, during the development of Windows 7, whitebox fuzzing found one-third of all of the security bugs in Windows 7. The technique is built on a Satisfiability Modulo Theories (SMT) solver, such as Microsoft's Z3.

Because these bugs were found before release, the Windows 7 code held up well, as shown by the small number of vulnerabilities reported over the past 3 months in the NIST.GOV National Vulnerability Database, as of April 20, 2011:

  • Windows 7     75 security vulnerabilities
  • Mac OS X     494 security vulnerabilities
  • Linux        101 security vulnerabilities
  • Red Hat      285 security vulnerabilities
  • FreeBSD        6 security vulnerabilities

Apple states that Mac OS X Snow Leopard is “The world’s most advanced OS,” yet from these numbers one could conclude that, as of April 20, 2011, OS X has some security issues. That is especially striking since OS X is loosely based on FreeBSD, which has only 6 security vulnerabilities.

Of course, no OS is secure; all operating systems have vulnerabilities, and any assumption that your OS is secure is not a valid one. Still, when you are designing, managing, or maintaining software, don't you want your software to be secure? One way to get there is whitebox fuzzing.

So how does this work?

The way it works is that the whitebox fuzzer is a software program that runs dynamically while the program being tested is also run. The image below shows graphically how it works. Actually, it is a pretty poor schematic.

Classic fuzz testing is a form of blackbox testing, so called because in aerospace the aircraft use blackboxes and line technicians have to test them without opening them. A blackbox fuzzer randomly mutates well-formed inputs and runs tests on the resulting data; grammars are used to generate the well-formed inputs and to encode application-specific knowledge and test heuristics. Random mutation, however, can produce low code coverage: if a branch checks a specific 32-bit input value, a random mutation has roughly a one in four billion chance of taking it.

Whitebox fuzzing instead uses a concept called systematic dynamic test generation, which is generally based on a search algorithm that uses constraints collected during earlier testing. This models a good human tester, who over time will repeat known tests that produce breaking scenarios. For all of this to make sense you would have to read the paper, which I suggest that you do.
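As a concrete (and entirely made-up) illustration of that coverage problem, here is a Python sketch: a toy program guarded by a 32-bit magic value, a blackbox fuzzer that mutates one byte of a seed input at a time, and a "whitebox-style" input built by solving the observed branch condition directly. None of this code is from the paper; the magic value and functions are invented for illustration.

```python
import random

# Hypothetical program under test: it only reaches the buggy branch
# when the first four input bytes equal a specific 32-bit magic value.
MAGIC = b"\xde\xad\xbe\xef"

def program_under_test(data: bytes) -> str:
    if data[:4] == MAGIC:
        return "BUG"   # the interesting (buggy) path
    return "ok"        # the common path

def blackbox_fuzz(seed: bytes, trials: int) -> int:
    """Randomly mutate a well-formed seed; count how often the
    mutated input reaches the buggy branch."""
    hits = 0
    for _ in range(trials):
        data = bytearray(seed)
        pos = random.randrange(len(data))
        data[pos] = random.randrange(256)   # flip one random byte
        if program_under_test(bytes(data)) == "BUG":
            hits += 1
    return hits

# A whitebox fuzzer observes the comparison data[:4] == MAGIC during a
# run, turns it into a constraint, and solves for an input that
# satisfies it, reaching the branch in one step:
def whitebox_style_input(seed: bytes) -> bytes:
    return MAGIC + seed[4:]   # "solve" the collected constraint

seed = b"\x00" * 8
# 0: flipping a single byte can never match all four magic bytes at once
print(blackbox_fuzz(seed, 10_000))
print(program_under_test(whitebox_style_input(seed)))   # BUG
```

The point of the sketch: random mutation essentially never passes the four-byte check, while one solved constraint does.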

If you want to learn more about Z3 or SMT, take a look at this link. Bear in mind that Microsoft is making this tool available, but it only supports the open source community language set; see the link:
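To give a flavor of the kind of query an SMT solver like Z3 answers, here is a toy path constraint solved by brute force in pure Python. A real SMT solver handles constraints like this symbolically, over full integer and bit-vector theories, rather than by enumeration; the constraint below is invented for illustration.

```python
# Suppose symbolic execution collected this path constraint from a run:
#   (x + 3 == 10) and (x * 2 > y) and (y >= 5)
# An SMT solver returns a satisfying assignment, i.e. a concrete test
# input that drives execution down that same path. We brute-force a
# tiny domain just to show the idea.

def satisfy(domain):
    for x in domain:
        for y in domain:
            if x + 3 == 10 and x * 2 > y and y >= 5:
                return x, y
    return None

print(satisfy(range(32)))   # (7, 5): a new input that takes this path
```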

System Architecture

SAGE performs a generational search by repeating four different types of tasks.

  1. Tester
    • The Tester task implements the function Run&Check by executing the program under test on a test input and looking for unusual events such as access-violation exceptions and extreme memory consumption. The subsequent tasks proceed only if the Tester task did not encounter any such errors; if Tester detects an error, it saves the test case and performs automated triage.
  2. Tracer
    • The Tracer task runs the target program on the same input file again, this time recording a log of the run, which the following tasks use to replay the program execution offline. This task uses the iDNA framework [3] to collect complete execution traces at the machine-instruction level.
  3. CoverageCollector
    • The CoverageCollector task replays the recorded execution to compute which basic blocks were executed during the run. SAGE uses this information to implement the function Score discussed in the previous section.
  4. SymbolicExecutor
    • The SymbolicExecutor task implements the function ExpandExecution by replaying the recorded execution once again, this time to collect input-related constraints and generate new inputs using the constraint solver Disolver (yes, it is spelled Disolver, not Dissolver).
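Assuming stub implementations for the four tasks, the generational search loop they plug into might be sketched like this. All names, scoring, and the toy "program" are illustrative stand-ins, not SAGE's actual code; the real tasks wrap heavyweight machinery (iDNA tracing, Disolver).

```python
# Minimal sketch of a generational search over the four task types.

def run_and_check(inp):            # 1. Tester: Run&Check
    return "BAD" in inp            # "crash" on inputs containing BAD

def trace(inp):                    # 2. Tracer: record the execution
    return ["block_A"] + (["block_BAD"] if "BAD" in inp else ["block_B"])

def coverage(trace_log, seen):     # 3. CoverageCollector: Score = new blocks
    return len(set(trace_log) - seen)

def expand_execution(inp):         # 4. SymbolicExecutor: negate path
    return [inp + "BAD"]           # constraints to derive children (stubbed)

def generational_search(seed, budget=4):
    seen, crashes = set(), []
    worklist = [(0, seed)]         # (score, input)
    while worklist and budget:
        worklist.sort(reverse=True)          # try highest-scoring input first
        _, inp = worklist.pop(0)
        budget -= 1
        if run_and_check(inp):               # Tester found an error:
            crashes.append(inp)              # save test case, triage,
            continue                         # and skip the later tasks
        log = trace(inp)                     # Tracer
        score = coverage(log, seen)          # CoverageCollector
        seen.update(log)
        for child in expand_execution(inp):  # SymbolicExecutor
            worklist.append((score, child))
    return crashes

print(generational_search("seed"))   # ['seedBAD']
```

The loop mirrors the task ordering above: later tasks run only on inputs the Tester survived, and each expansion feeds new candidate inputs back into the search.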

What kind of bugs were found?

  • Most of the bugs found were buffer overruns, and this new technique dynamically tested BILLIONS of instructions using a whitebox fuzzer called SAGE.
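To make the bug class concrete, here is a hedged Python sketch of the length-field pattern behind many buffer overruns. The record format is made up for illustration; in C, copying the declared number of bytes out of a smaller buffer would read or write past it, while this sketch detects the mismatch explicitly.

```python
# A parser that trusts a length field taken from the input itself,
# the classic setup for a buffer overrun. (Hypothetical format.)

def parse_record(data: bytes) -> bytes:
    declared_len = data[0]               # attacker-controlled length field
    payload = data[1:1 + declared_len]   # Python truncates; C would overrun
    if len(payload) != declared_len:     # the bounds check a buggy parser omits
        raise ValueError("declared length exceeds buffer")
    return payload

print(parse_record(b"\x03abc"))          # b'abc' for a well-formed record
# Mutating the length byte, as a fuzzer would, trips the check:
try:
    parse_record(b"\xffabc")
except ValueError as err:
    print("caught:", err)
```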

Paper summary

The reference paper is a tough read. So here it is in five sentences or less:

  1. Ask this question: given a set S of symbolic test summaries for a program Prog and a new version Prog′ of Prog, which summaries in S are still valid must summaries for Prog′?
  2. The paper presents three algorithms, of differing precision, that statically check which old test summaries are still valid for a new program version.
  3. The first is a simple impact analysis of code changes on the static control-flow and call graphs of the program.
  4. The second is a more precise, predicate-sensitive refinement that uses verification-condition generation and automated theorem proving.
  5. The third checks the validity of a symbolic test summary directly against the new program, regardless of what changed.

There, 5 sentences. But really, read the paper; it is worth the time.
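The simple impact analysis from the summary above could be sketched roughly as follows. The code versions, call graph, and function names are invented for illustration: a summary for a function is conservatively kept only if neither the function nor anything it transitively calls has changed.

```python
# Toy impact analysis: invalidate summaries for changed functions and
# for every function that (transitively) calls a changed one.

def changed_functions(old_code, new_code):
    return {f for f in old_code if old_code[f] != new_code.get(f)}

def transitively_impacted(call_graph, changed):
    impacted = set(changed)
    grew = True
    while grew:                      # propagate up the call graph
        grew = False
        for caller, callees in call_graph.items():
            if caller not in impacted and impacted & set(callees):
                impacted.add(caller)
                grew = True
    return impacted

old = {"parse": "v1", "check": "v1", "main": "v1"}
new = {"parse": "v2", "check": "v1", "main": "v1"}   # only parse changed
calls = {"main": ["parse", "check"], "parse": [], "check": []}

impacted = transitively_impacted(calls, changed_functions(old, new))
summaries = {"parse", "check", "main"}               # functions with summaries
print(sorted(summaries - impacted))   # ['check']: main calls parse, so both invalid
```

This is the cheap, coarse end of the spectrum; the theorem-proving variants in the paper keep more summaries by reasoning about predicates instead of whole functions.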

Schematic

image

And finally, just in case I mentioned Alfred Thompson, or even thought about him while blogging, see his blog at http://blogs.msdn.com/alfredth

He had nothing to do with this blog if it goes badly.