The Black Bug

Published 19 April 08 05:11 PM | carlcs 

Right now I'm reading The Black Swan. I'm almost finished. Here's my book report.

The Black Swan is one of the rare books that seems to express and clarify my own thoughts. It gave me a description and a framework for an outlook that I like to think I already had. The author, Nassim Taleb, calls this outlook "skeptical empiricism." I heard about this book some time ago, but at the time I was only mildly interested. The high-level descriptions that tend to accompany it just don't do it justice. The blurb usually says something like "Taleb posits that large unlikely events dominate most phenomena." Then they add in a couple of examples from stock market crashes and 9-11. While this is not an inaccurate description, the real point is more subtle. The key is how we assess the likelihood of unlikely events, both positive and negative.

Much of how we think about risk is predicated on the idea of 'normal' distributions. This is true both when we are trying to be rigorous and use sophisticated mathematical tools, and when we casually make estimates, try to predict something about the future, or simply make a plan. The normal distribution is implied anytime you discuss such things as averages or standard deviations. Taleb claims, and makes a convincing argument, that normal distributions hardly ever exist in the real world. Instead, power laws dominate. One example he uses is wealth distribution. If you take any two people whose net worth sums to one billion dollars, you are far more likely to have one person with a net worth of nearly one billion and another person of more modest means than you are to have two people with a net worth of 500 million each. Taleb refers to the unlikely events that dominate the overall distribution as 'black swans.'

Other manifestations of this behavior can be seen in large-scale projects, whether it is a construction project or a software project. Vast cost and schedule overruns are commonplace. The reason is that the problems you face in a complex system don't follow normal distributions. Enormous schedule-slipping issues don't get exponentially less common the more severe they are. Issues that can delay a project by years may be uncommon, but they aren't unheard of. The difference may not seem significant, but the implications are enormous. For example, if your project faces a 20% risk of slipping ten days, with a normal distribution it's essentially impossible that you would slip a year. With a more realistic power-law distribution, the chance of slipping a year could be 10%. That's unlikely, but you can be sure to run into problems of that order a couple of times in your career. Anyone who has shipped a reasonable amount of software can spin a couple of stories about the 'black bugs' that created these serious distortions.
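To put rough numbers on that comparison, here is a minimal Python sketch. It rests on assumptions that are mine, not Taleb's: slips are modeled as a zero-mean normal in one case and as a Pareto-style tail in the other, both calibrated so that the chance of slipping more than ten days is 20%. The tail exponents (alpha) are hypothetical values picked for illustration.

```python
import math
from statistics import NormalDist

P_TEN_DAYS = 0.20   # from the example: 20% chance of slipping more than 10 days
TEN_DAYS = 10.0
YEAR = 365.0

def normal_tail(x, sigma):
    """P(slip > x) under an assumed zero-mean normal model."""
    return 0.5 * math.erfc(x / (sigma * math.sqrt(2.0)))

def power_law_tail(x, alpha):
    """P(slip > x) for x >= 10 days, an assumed Pareto-style tail
    anchored so that P(slip > 10 days) = 20%."""
    return P_TEN_DAYS * (x / TEN_DAYS) ** (-alpha)

# Pick sigma so the normal model also gives P(slip > 10 days) = 20%.
z = NormalDist().inv_cdf(1.0 - P_TEN_DAYS)
sigma = TEN_DAYS / z

print(f"normal:               P(slip > 1 year) = {normal_tail(YEAR, sigma):.3g}")
for alpha in (1.0, 0.5, 0.2):  # hypothetical exponents, not fitted to any data
    print(f"power law, alpha={alpha}: P(slip > 1 year) = {power_law_tail(YEAR, alpha):.3g}")
```

Under these assumptions the normal model makes a one-year slip vanishingly improbable, while a heavy enough power-law tail (alpha around 0.2 here) puts it near the 10% figure. The exponent is not a measurement; it only shows how much the answer depends on which kind of tail you assume.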

This understanding of risk really helped me understand what bothers me about typical schedule creation and end-game mechanics. There is often an element that just doesn't seem to appreciate the enormous difficulty of shipping software. You often hear people estimate the incoming bug rate and the average fix rate and then project these rates linearly until they reach some stabilization period. What's really dangerous is that after people make these estimates and realize they have some number of "dev days" left over, they start putting in more features based on how many days they expect to have. Thirty years after The Mythical Man-Month, this calculation persists. They make the classic mistake of assuming that all unknowns are 'known unknowns' and that their difficulty is normally distributed and possibly even declining as you get closer to ship. They may never say 'normally distributed' or 'Gaussian,' but just by projecting a fix rate they are implicitly making that claim, whether they know it or not.

Just for fun, I ran a query against a bug database and looked at the distribution of the time it took to resolve a bug. For the developers among you, I queried for code-change bugs resolved as fixed and calculated the difference between the open date and the resolved date. While it's tough to prove that a distribution actually follows a power law, the resulting plot sure looks like one. There are lots of bugs that get resolved in just a day or so, but the long tail just goes on and on.
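For anyone who wants to try the same thing, here is a rough Python sketch of the calculation. The records below are invented for illustration; they are not the data behind my plot, and the helper names are hypothetical. It computes days-to-resolve for each bug and then the fraction of bugs that took longer than a given threshold.

```python
from datetime import date

# Hypothetical (opened, resolved) pairs for code-change bugs resolved as fixed.
# Invented for illustration only; substitute the output of your own query.
bugs = [
    (date(2008, 1, 2), date(2008, 1, 3)),
    (date(2008, 1, 2), date(2008, 1, 3)),
    (date(2008, 1, 5), date(2008, 1, 6)),
    (date(2008, 1, 7), date(2008, 1, 9)),
    (date(2008, 1, 8), date(2008, 1, 11)),
    (date(2008, 1, 10), date(2008, 1, 18)),
    (date(2008, 1, 12), date(2008, 2, 20)),
    (date(2007, 6, 1), date(2008, 3, 15)),
]

def days_to_resolve(records):
    """Sorted list of resolution times in whole days."""
    return sorted((resolved - opened).days for opened, resolved in records)

def tail_fraction(durations, threshold):
    """Fraction of bugs that took longer than `threshold` days."""
    return sum(d > threshold for d in durations) / len(durations)

durations = days_to_resolve(bugs)
for threshold in (1, 3, 10, 30, 100):
    print(f"> {threshold:>3} days: {tail_fraction(durations, threshold):.3f}")
```

If you plot the tail fraction against the threshold on log-log axes, a power law shows up as a roughly straight line. That is a visual hint rather than a proof, which is why I only say the plot looks like one.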

The critique of all of this is obvious, and Taleb addresses it in his book. The problem is that power laws are not very good for prediction. Just deriving the exponent is enormously problematic because the extrapolation is very sensitive to the initial data. But even if you have the real distribution, what should you do with this information? You can't just add a year to a project because there is a ten percent chance that it might slip that long. You can't even multiply the probability by the length and pad the schedule by a month. That padding is either too long or far too short, depending on whether the bug exists. And yet you have to plan.
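The arithmetic behind that last point is easy to sketch in Python. The ten percent chance and the one-year slip come from the example above; the function name is hypothetical.

```python
P_BLACK_BUG = 0.10   # the ten percent chance from the example
SLIP_DAYS = 365      # a one-year slip if the black bug exists

# "Multiply the probability by the length": about a month of padding.
expected_padding = P_BLACK_BUG * SLIP_DAYS

def schedule_error(padding_days, bug_exists):
    """Padding minus the days actually needed (positive means over-padded)."""
    needed = SLIP_DAYS if bug_exists else 0
    return padding_days - needed

print(f"expected-value padding: {expected_padding:.1f} days")
print(f"no black bug: off by {schedule_error(expected_padding, False):+.1f} days")
print(f"black bug:    off by {schedule_error(expected_padding, True):+.1f} days")
```

The expected value is right on average over many projects, but any single project lands in one of the two cases, and the padding is wrong in both.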

The first step is to acknowledge that there is a problem. Black bugs do exist and you will get bitten by them at some point. You've seen it happen to other teams, and it's folly to think you are just better than they are. This realization might be enough for some people; the fear of destabilizing changes might seep into your system deeply enough that you make the right decisions. But Taleb has some more concrete strategies that he applies to the stock market. His strategy is that for areas that are subject to negative black swan events, be enormously conservative. Don't trust any risk assessments, because the assessments are all derived through models that use the normal distribution. At the same time, try to take advantage of positive black swans with small investments you can afford to lose. In the stock market this means that the majority of your wealth (say 80%) should be invested in the most secure investment you can find, treasury bonds for example. The rest should be invested in small amounts in highly risky, enormously leveraged instruments. It only takes one of these to pay off for you to be able to retire in comfort.

For software, this means that once the feature set is locked and you are marching towards ship, you should be as conservative and hard core as possible. You need to be even more conservative than you think you need to be. Go ahead and make schedule estimates. Feel free to plot glide paths. Just don't believe them. And don't look for extra work to fill up the gaps. Any change, any code churn, both generates and obscures black bugs. It obscures them because when you check in a new feature, you'll fix a lot of obvious issues for a while before the mist clears and you get to the deeper, harder-to-find problems. It's like looking into a pool of water. If the water is still churning, you might not see the barracuda just below the surface. Don't assume that just because a bug is hard to find it isn't serious.

Raise the bug bar slowly. If you raise the bug bar to exclude trivial issues early, then you've stilled the waters and increased the amount of time you have to look for the one bug that could cause you to slip. Conversely, if you keep putting in features and fixing unimportant bugs right until the final lockdown, you miss much of the focused test pass necessary to ferret out possible black bugs. Don't assume that a declining incoming rate means you are starting to scrape the bottom of the bucket. You may only have gotten past the surface issues. No matter how long you test and how many customers run your product, you can never be sure that you won't still run into a black bug.

If there is some design change that you are just itching to make and you are convinced that you just can't ship without this feature, try to think of the worst thing that could happen. If you decide to fix this in code, you have traded a known bug for an unknown number of bugs. Perhaps you have traded a bug you can alleviate with documentation for a black bug that tanks the product in the press. Since this came in with a DCR (design change request), it's late and it's much less likely that you'll find it in time. Most importantly, don't schedule to the hilt. Even if you are sure you can squeeze more into the schedule, and some feature would really make customers happy, just say no. Unless you know you will lose sales, it can wait until the next version. You may think the risk is low, but never forget that you just can't calculate risk. All you know is that the risk is a lot higher than you think it is.

 

If you actually do find some slack in the schedule, don't add features. Start experimenting with ideas for the next version. Look for opportunities for positive black swans. These are higher-risk, speculative features that could reap enormous benefits if they pan out. These features are your chance for real innovation. While I am also a big fan of careful value proposition work, the real game-changing features always come from left field, from some individual contributor with an idea. As you stop checking in changes and start looking for black swans, don't worry about your team not having enough to do. Set up a system to let them experiment and create something truly unique. You'll have time to plan a product that aligns to the market's needs and moves forward key strategic initiatives, but you'll also create something unexpected. Most of this experimental work probably won't make it into the product. But the level of commitment and passion going into it will far exceed anything a top-down strategy can achieve.

This conservative approach to shipping may sound boring, like a recipe for non-innovative products, but it's not. Remember that what you have is probably a whole lot greater than you think it is. You've probably lived with your product in one way or another for years now. When you start your application, all you can see is what it could have been. You see cut features and opportunities lost. This is natural, but that's not what your customers will see. They will see the features that did get in. And if you executed well and didn't sully the project with last-minute changes that didn't have time to integrate cleanly, the product will be a joy to use and customers will love it. Just keep reminding yourself of that.

About carlcs

I've been working at Microsoft since the beginning of 1998. I have been both a developer and a program manager and have worked on COM+, Enterprise Scalability, Core File Services, and Terminal Services. I am currently a program manager on the Windows Essential Business Server team.