Adaptive Fault Injection
One of the topics I covered in Ilias' and my joint presentation at WinHEC this year was new requirements coming for getting a WHQL signature for drivers using KMDF (UMDF as well, but this discussion doesn't currently apply to them). One part of those new requirements is that your driver survives a rigorous fault injection from the WdfTester tool. The method used for that is what I intend to discuss briefly in this post.
By "survives", I mean no bugchecks, hangs, or leaks- graceful failure, not fault tolerance, is the goal here.
The method used is one I call adaptive fault injection. The term "adaptive fault injection" is admittedly my invention (and I'm probably the only person that uses it)- but I thought it fit. So this is an attempt to define what I mean by the term.
The Problem
For readily available fault injection, I had two tools available to me in 2006, when I began looking into ways to improve code coverage in the KMDF loader in an automated way.
- Driver Verifier Low Resource Simulation- this did boost our numbers, but it took a long time, and since it was random, it took extensive analysis to find out what was covered in terms of code paths, and the only way to cover a particular fault was to run until you finally hit it. Not very predictable. To me, it was a shotgun handed to me when what I really wanted was a needle.
- The always-on but programmable threshold mechanism in the KMDF Verifier. For one thing, it didn't help me with the loader (although I was also looking for generic solutions); for another, I wanted to be able to say- "I want just this one fault, period". So it was still a bludgeon, and not the precise instrument I desired.
Now, we had experimented with IAT hooks and had even tried them out for fault injection [the tool just failed pool allocations]. I could easily see that I could use them for precision, but the result could be a very high-maintenance mechanism- if I had to individually code each fault, the price for the technique would be way too high.
I wanted something that could examine the system under test, find all the failure points, and then cover each and every one of them. Something that would adapt to the system's behavior as it evolved.
Hence "adaptive fault injection". So perhaps the basic idea is less of a mystery?
Feedback is almost always your friend
If you have an IAT hook, you can log the activity through the intercepted call- both the data going into the called routine, and the values returned. You can't see state changes occurring on either side of the hook, but at least you've got a point you can begin at [and this situation reminds me a lot of the days when we were testing circuit boards via their external connectors].
So the basic idea for the first iteration was quite simple:
- Monitor all the points you can, and record the activity on them.
- Analyze the recorded activity, and determine where inputs going back into the system can be modified to simulate failures.
- Count the number of such failures you have identified.
- Repeat the original activity as many times as needed, and on each cycle, inject each failure in turn. Log this activity much as was done in the first step [and the logging mechanism should report the injection activity, of course].
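The four steps above can be sketched in a few lines. This is a toy model in Python, not kernel code and not how either tool is actually built: the Hook class stands in for an IAT hook, the scenario function stands in for the system under test, and every name in it is made up for illustration. It also assumes every intercepted call is a candidate failure point, which a real analysis step would have to decide case by case.

```python
class InjectedFault(Exception):
    """Raised by the hook to simulate a failure at an intercepted call."""


class Hook:
    """Hypothetical stand-in for an IAT hook: counts and logs intercepted calls."""

    def __init__(self, fail_at=None):
        self.fail_at = fail_at  # 1-based index of the call to fail, or None
        self.count = 0
        self.log = []

    def call(self, name, func, *args):
        self.count += 1
        if self.count == self.fail_at:
            # Report the injection in the same log as normal activity.
            self.log.append((self.count, name, "INJECTED"))
            raise InjectedFault(name)
        result = func(*args)
        self.log.append((self.count, name, "ok"))
        return result


def scenario(hook):
    """Toy system under test: three fallible calls, cleaned up on failure."""
    acquired = []
    try:
        for name in ("alloc_a", "alloc_b", "open_device"):
            acquired.append(hook.call(name, lambda n=name: n))
        return "success"
    except InjectedFault:
        return "failed gracefully"
    finally:
        acquired.clear()  # nothing leaks, whichever path was taken


def adaptive_run(scenario):
    baseline = Hook()
    scenario(baseline)              # step 1: monitor and record
    total = baseline.count          # steps 2-3: identify and count failure points
    results = []
    for n in range(1, total + 1):   # step 4: repeat, one fault per cycle
        hook = Hook(fail_at=n)
        results.append((n, scenario(hook), hook.log[-1]))
    return total, results
```

Calling adaptive_run(scenario) runs the scenario once to count the calls, then once more per call with exactly that call failing. Nothing in the loop names a specific fault, so if the scenario grows a fourth call, the next baseline run picks it up without any reprogramming.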
WdfTester was under development at the same time as my tool [which I named "SanAndreas" after the infamous fault line running up the West Coast of the USA], and utilized the same concept, although in a different form. My trigger was a simple integral counter, making for a simple loop. WdfTester instead injects a fault on a specific count of a given DDI call. Neither method is perfect [I'll get back to that in a bit], but in highly repeatable traces, there's no real difference between them. By the way, I have no idea if we both had the same idea, or if we discussed this [although we probably did, because when I first began that task I said I wanted this sort of injection mechanism, long before I did the implementing].
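To make the difference between the two triggers concrete, here is a small Python sketch. The trace is invented for the example and the helper names are hypothetical; it only models the counting, not either tool's real implementation.

```python
from collections import Counter


def global_trigger(target_index):
    """Fire on the Nth intercepted call overall (simple counter style)."""
    state = {"n": 0}

    def should_fail(name):
        state["n"] += 1
        return state["n"] == target_index

    return should_fail


def per_call_trigger(target_name, target_count):
    """Fire on the Nth call of one particular routine (per-DDI count style)."""
    seen = Counter()

    def should_fail(name):
        seen[name] += 1
        return name == target_name and seen[name] == target_count

    return should_fail


def firing_position(trigger, trace):
    """Return the 1-based position in the trace where the trigger fires."""
    for position, name in enumerate(trace, start=1):
        if trigger(name):
            return position
    return None


# Made-up trace of intercepted calls, as recorded in the baseline run.
recorded = ["WdfMemoryCreate", "WdfRequestCreate", "WdfMemoryCreate"]
# The same calls arriving in a different order on a later run.
reordered = ["WdfMemoryCreate", "WdfMemoryCreate", "WdfRequestCreate"]
```

On the recorded trace, global_trigger(3) and per_call_trigger("WdfMemoryCreate", 2) both fire at position 3, so they pick out the same fault. On the reordered trace the global counter fires at position 3, which is now a different routine, while the per-call trigger fires at position 2. Neither is wrong; they just disagree once the trace stops being repeatable.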
So, this provides more precise injection, at the cost of potentially longer run times [roughly one repetition of the original activity per failure point identified], but the times are deterministic and computable once the analysis step has completed. It also requires little programming or reprogramming, as it adapts itself to the observed behavior of the system under test.
Where this still falls short
A few places come quickly to mind:
- In multithreaded cases, observed behavior may not be that deterministic. In that case, we at least try to inject something, but there is potential for things to be missed, or even for the same fault to be injected multiple times.
- The state that cannot be observed may matter- I believe this is primarily a concern when designing the repeatable case you want to inject, or at least this risk can be mitigated by considering the state when deciding what sequence to apply this technique to.
- The system may adapt to the faults you inject [for instance a failed I/O may be retried]. This is bad only in the sense that it may prevent you from reaching paths you would still like to reach. One could address this by recursively repeating the analysis and subsequent phases [and then using a more complicated injector- first inject this {series of} fault{s}, then in turn inject these]. Such a mechanism may need to have runtime bounds placed on it to prevent endless recursion...
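The recursive idea in that last point might look something like the Python sketch below. It is purely illustrative: run_with_faults is a made-up system that retries a failed I/O, and max_depth is the assumed runtime bound. I have not built this; it is just the shape I have in mind.

```python
def run_with_faults(fail_set, retries=2):
    """Toy system: one I/O, retried up to `retries` times on failure.

    fail_set holds the 1-based call indices to fail.
    Returns (outcome, number_of_calls_observed).
    """
    calls = 0
    for _attempt in range(retries + 1):
        calls += 1
        if calls in fail_set:
            continue  # injected failure; the system adapts by retrying
        return "success", calls
    return "gave up", calls


def explore(fail_set=frozenset(), depth=0, max_depth=3, seen=None):
    """Re-analyze after each injection, adding one more fault per level."""
    seen = {} if seen is None else seen
    outcome, calls = run_with_faults(fail_set)
    seen[fail_set] = outcome
    if depth == max_depth:  # runtime bound: stop recursing here
        return seen
    for n in range(1, calls + 1):
        candidate = fail_set | {n}
        if candidate not in seen:
            explore(candidate, depth + 1, max_depth, seen)
    return seen
```

With max_depth=1, only single faults are tried and the retry hides the failure path entirely: every run ends in "success". With max_depth=3, the fault series {1, 2, 3} is eventually injected and the "gave up" path is finally reached. The bound is what keeps a system that retries forever from turning this into an endless recursion.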
Still, even with these flaws, I've found the technique even in this most primitive of forms to be a step forward in having a more precise fault injection method available to me in my bag of tools.
As long as I'm typing...
Gee, Patrick- I've played all three [but I preorder at GameStop, even though the price is higher]- which reminds me that I need to pick up Call of Duty: World At War [and a day later- World of Warcraft: Wrath of the Lich King]. Didn't get far in Fallout 3, but one play-through each on Fable II and Gears of War 2.
I've been pulling tunes off some really old cassette tapes [bands I played in during the 70's and 80's- recorded in mono on hand-helds, for the most part, along with some solo practice recordings]- think I'll use that nice USB stick I got at WinHEC to bring a few of the livable ones to the office- see if anyone can figure out which tunes I'm the guitarist or bass player and / or vocalist on... Of course, the poor quality of the audio ought to be a clue. If I find one I can live with, maybe I'll link it somewhere [I have a few that I wrote myself, that are public domain tunes, or that are just instrumental noodling, so I can avoid the copyright bogeyman].
Also, since things are now disclosed- this is what the new 1.9 controls in WdfVerifier will look like.
Now Playing: Grateful Dead Album: Built To Last- Victim or the Crime- seem to end on Dead tunes of late, not that I don't listen to plenty else.