Eric S. Raymond wrote, “Given enough eyeballs, all bugs are shallow.” He calls this Linus’ law.
The open source community uses this argument to assert that open source software is more secure than proprietary software. Advocates of proprietary software attack this argument on a variety of grounds, but here’s a little secret: Raymond was right. One cannot deny the logic. In fact, it is a tautology. If you assume that all individuals have a non-zero probability of finding and fixing a bug, then all you need is “enough” individuals. A million monkeys banging on a million keyboards will eventually produce Twelfth Night. Mathematically, the many-eyeballs argument and the million-monkeys argument are equivalent.
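The tautology can be made concrete with a small probability sketch. Assume, purely for illustration (this simplified model is not Raymond’s), that each of n reviewers independently finds a given bug with the same probability p. The chance that at least one of them finds it, and the number of reviewers needed to reach a target probability q, are:

$$
P(\text{found}) = 1 - (1 - p)^n \;\longrightarrow\; 1 \quad \text{as } n \to \infty, \qquad 0 < p \le 1
$$

$$
1 - (1 - p)^n \ge q \iff n \ge \frac{\ln(1 - q)}{\ln(1 - p)}, \qquad 0 < p < 1,\; 0 < q < 1
$$

The inequality flips in the second line because ln(1 - p) is negative. For example, with p = 0.001 and q = 0.99, the bound is about 4,603 independent reviewers, so under this model everything hinges on whether “enough” eyeballs actually show up.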
The unstated implication in the many-eyeballs argument is a syllogism. Let me state it explicitly.
Code review makes software more secure
Open source software is reviewed more than proprietary software
Therefore, open source software is more secure than proprietary software
Unless you’re prepared to upend thousands of years of philosophy, you have to accept the argument as valid. But just because the form of an argument is valid doesn’t mean it reaches a sound conclusion; the premises could be false. And the emotional appeal of this argument is so compelling that it has often been accepted without a close examination of its merits.
So, allow me to attack the argument on its merits.
I will concede the major premise, that code review makes software more secure, so the only thing left to attack is the minor premise: open source software is reviewed more than proprietary software. Raymond wrote, “the closed-source world cannot win an evolutionary arms race with open-source communities that can put orders of magnitude more skilled time into a problem.” Oh, it’s easy to believe. After all, the source is available to millions and millions of people worldwide. It must be reviewed by more people than proprietary systems like Windows, right?
It’s not at all clear. The key word in Raymond’s argument is “can.” I’ll concede that open source software can be reviewed by more people than proprietary software, but I don’t think it is reviewed by more people.
Have you heard the old story about four people named Everybody, Somebody, Anybody, and Nobody? There was an important job to be done and Everybody was sure that Somebody would do it. Anybody could have done it, but Nobody did it. Somebody got angry about that because it was Everybody's job. Everybody thought that Anybody could do it, but Nobody realized that Everybody wouldn't do it. It ended up that Everybody blamed Somebody when Nobody did what Anybody could have done.
There’s a lot of that going on in the open source community these days.
Speaking of code review and fixing bugs, Neil Gunton said, “In reality, it's generally very, very difficult to fix real bugs in anything but the most trivial Open Source software. I know that I have rarely done it, and I am an experienced developer.”
Examining open source security, Jeremy Zawodny said, “In the past few years, I've seen little evidence to support Eric's many-eyeballs theory.”
John Viega wrote, “[…] the fact that many eyeballs are looking at a piece of software is not likely to make it more secure. It is likely, however, to make people believe that it is secure. The result is an open source community that is probably far too trusting when it comes to security.”
Gene Spafford wrote, “the nature of whether code is produced in an open or proprietary manner is largely orthogonal to whether the code […] should be highly trusted.”
Okay, maybe these are isolated quotes. Maybe if there were someplace where people could record the work they’ve done to audit the security of code, the open source community could demonstrate the many-eyeballs effect.
Sardonix was a project sponsored by DARPA to put the many-eyeballs theory into practice, and document the results, through a web site. Crispin Cowan formed the project in 2001 and created the www.sardonix.org web site. By 2002, a mailing list was active, along with a portal on the web site to record auditing activity. As Sardonix was getting started, Cowan said, “No one is doing auditing,” and added, “reviewing old code is tedious and boring and no one wants to do it.”
When David Wagner, a professor at Berkeley, heard of Sardonix, he assigned some of his students to do reviews. In March 2003, Cowan wrote, “There are over 300 people on this mailing list. Since the site went active in September 2002, the only audits submitted have been from [Wagner’s] students. It seems as if everyone is waiting for someone else to do the audit work.”
In 2004, after the project failed to achieve its goals, Cowan wrote, “I got a great deal of participation from people who had opinions on how the studliness ranking should work, and then squat from anybody actually reviewing code.” The ‘studliness ranking’ he refers to was the system of “points” given to reviewers in the Sardonix project. According to Cowan, who is now a Security Program Manager for Windows, “the scientific conclusion of Sardonix is that auditing is both demanding of high skill and tedious, and so karma/reputation/good will is not enough to motivate people to do it. You must pay them to do it, precisely as Microsoft does.”
So, Raymond was right in his assertion that many eyeballs help. But where are the many eyeballs? I submit that Microsoft has more eyeballs doing code review than the open source community. True, there are potentially more eyeballs in the open source community than in the proprietary community, but the proprietary community has an advantage: the salary participation program. Microsoft has people who are paid to review code. Lots of them: trained professionals with tooling support, direct access to the people who originally wrote the code, and strong executive support.
And it’s not like Microsoft source code is restricted to Microsoft personnel. There are more than a dozen different programs through which organizations and individuals can gain access to Microsoft source code.
But Raymond was wrong when he wrote, “the closed-source world cannot win an evolutionary arms race with open-source communities that can put orders of magnitude more skilled time into a problem.”
The open source community simply doesn’t have ‘orders of magnitude’ more skilled time to devote to a problem. According to Stephen R. Schach, Linus’ law is not applicable to open-source software development because, “most members of the periphery [those outside the core developer group] do not have the necessary debugging skills” and “the vast numbers of ‘eyeballs’ apparently do not exist.”
I conceded the major premise earlier: that code review can make software more secure. But code review is hardly all that makes software more secure. Getting software right is very, very difficult. For a great example of why software is really hard, despite being subject to tons of review, please see the excellent post by Joshua Bloch of Google.
Bloch’s post points to binary search, a crucial algorithm that goes back 60 years and is still not quite right. The implementation he describes had been subject to tons of code review, and the algorithm it was based on came with a formal proof of correctness. To quote Bloch, “It is not sufficient merely to prove a program correct; you have to test it too.”
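Bloch’s post concerns the midpoint computation in binary search: the widely copied form that averages low and high overflows once their sum exceeds the largest int, which in C is undefined behavior. The sketch below is a C illustration written for this discussion, not the code from Bloch’s post (which was about Java), and the function names are hypothetical. It contrasts the overflowing midpoint with the overflow-safe form low + (high - low) / 2.

```c
#include <assert.h>
#include <stddef.h>

/* The commonly copied form: low + high can exceed INT_MAX for very large
 * arrays, which is signed overflow (undefined behavior in C). */
int midpoint_naive(int low, int high)
{
    return (low + high) / 2;
}

/* Overflow-safe form, valid whenever 0 <= low <= high. */
int midpoint_safe(int low, int high)
{
    return low + (high - low) / 2;
}

/* Returns the index of key in the sorted array a[0..n-1], or -1 if absent. */
int binary_search(const int *a, int n, int key)
{
    assert(n >= 0);
    assert(a != NULL || n == 0);

    int low = 0;
    int high = n - 1;
    while (low <= high) {
        int mid = midpoint_safe(low, high);
        if (a[mid] < key)
            low = mid + 1;
        else if (a[mid] > key)
            high = mid - 1;
        else
            return mid;
    }
    return -1;
}
```

With low = 2000000000 and high = 2100000000, the naive sum overflows a 32-bit int, while midpoint_safe returns 2050000000. The bug only appears with arrays of more than about a billion elements, which is exactly the kind of input that reviewers and small tests rarely exercise.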
Code review alone is not sufficient. Testing is not sufficient. Tools are not sufficient. Features are not sufficient. None of the things we do in isolation are sufficient. To get software truly correct, especially to get it secure, you have to address all phases of the software development lifecycle, and integrate security into the day-to-day activities.
Raymond said, “It's not that open sourcing is perfect, it's not that the many-eyeballs effect is in some theoretical sense necessarily the best possible way to do things, the problem is that we don't know anything that works as well.”
But we do. Good software engineering processes can yield bigger benefits than code review alone. Code review is an important part of good software engineering, but code review is not a substitute for good software engineering.
Indeed, we’ve found that code review is a relatively inefficient way to find many security-related bugs. Some types of bugs can only be found through code review, but those bugs are relatively rare. In most cases, static analysis, dynamic analysis, and fuzz testing are more efficient than code review at catching implementation-level bugs. And once manual code review uncovers a new type of bug, a check for that type should be added to automated code review tools anyway.
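As a crude illustration of turning a reviewer’s finding into an automated check, the hypothetical C function below counts calls to a named function (for example strcpy, one of the unsafe string functions on the SDL’s banned list) in a string of source text. It is purely lexical and is an assumption made for this sketch, not how any real tool works: it does not parse C, so it cannot see through macros or function pointers, and it will be fooled by comments and string literals. Real static analysis tools model the code the way a compiler does.

```c
#include <ctype.h>
#include <string.h>

/* Returns nonzero if c can appear in a C identifier. */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical toy checker: counts occurrences of `name` used as a call,
 * i.e. a whole identifier followed by optional blanks and '('.
 * Purely textual: comments, string literals, and macros are not understood. */
int count_calls(const char *src, const char *name)
{
    size_t len = strlen(name);
    int count = 0;

    if (len == 0)
        return 0;

    for (const char *p = strstr(src, name); p != NULL; p = strstr(p + 1, name)) {
        const char *after = p + len;

        /* Skip matches inside longer identifiers such as my_strcpy or strcpy_s. */
        if ((p != src && is_ident_char(p[-1])) || is_ident_char(*after))
            continue;

        /* Allow blanks between the name and the opening parenthesis. */
        while (*after == ' ' || *after == '\t')
            after++;
        if (*after == '(')
            count++;
    }
    return count;
}
```

A build rule could, in principle, fail whenever count_calls(source, "strcpy") is nonzero. That is the sense in which a bug class found once by a human can be checked for automatically from then on.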
The Microsoft Security Development Lifecycle, or SDL, includes code review as a component, but code review is not our primary weapon in securing our code. Our weapon is the entire SDL process, which includes mandatory engineer training, security design reviews, threat modeling, fuzz testing, static and dynamic analysis, the identification of high-risk practices, and measurable criteria and requirements for each of the various phases in the software lifecycle, including servicing and support, user experience, user education, and marketing.
The SDL has been shown to be an effective method for producing more secure software, and it works at commercial scale with a far more comprehensive approach to security than code review alone. But could “enough” code review, which might happen for popular projects like Linux and the Apache Web Server, compensate for the SDL’s more comprehensive approach? It’s not a priori unreasonable to believe that a distributed project with many actors working in their own self-interest could produce better results than an engineered approach like the SDL.
But there’s evidence that it does not.
Coverity asks, “Would you like to know about 0day defects months in advance?” They ask that to promote their work in scanning open source projects for security vulnerabilities. Quoting from Coverity’s 2009 report:
“In January 2006, Coverity, Inc., was awarded a contract from the U.S. Department of Homeland Security […] to improve the security and quality of open source software[…] Since 2006 [Coverity] scanned over 60 million unique lines of code on a recurring basis from more than 280 open source popular source projects.”
We think that’s great. The work that Coverity is doing falls into a category of analysis known as “static analysis,” which Coverity defines as “a set of techniques for examining a software system and making determinations about what its behavior will be at run time, using information collected without running the code.” Microsoft and the SDL are big proponents of static analysis.
In some ways, you can think of static analysis as automated code review. But a static analysis tool looks at source code the same way a compiler does, so it has much more knowledge of what will happen at run time than a human reviewer does. Our primary tools for performing static analysis are PreFast and FxCop, but some products at Microsoft use Coverity or other static analysis tools in certain situations.
In addition to static analysis, the SDL also requires various kinds of dynamic analysis, which is the kind of analysis you perform when the code is actually running. Fuzz testing is a kind of dynamic analysis, but so is the use of tools like AppVerifier.
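To make fuzz testing concrete, here is a deliberately tiny C sketch. Everything in it is hypothetical: a made-up format consisting of one length byte followed by a payload, two made-up parsers (one that trusts the length byte and one that validates it), and a random-input loop driven by a hand-rolled linear congruential generator so that runs are reproducible. Real fuzzers are far more sophisticated and are typically paired with run-time checkers, but the shape is the same: generate inputs, run the code, and check an invariant.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format: buf[0] is a payload length, followed by the payload.
 * A parser returns how many payload bytes it would read, or -1 on error. */
typedef int (*parse_fn)(const uint8_t *buf, size_t len);

/* BUGGY: trusts the length byte without comparing it to the input size. */
int parse_trusting(const uint8_t *buf, size_t len)
{
    if (len < 1)
        return -1;
    return buf[0];
}

/* FIXED: rejects inputs whose claimed length exceeds the bytes present. */
int parse_checked(const uint8_t *buf, size_t len)
{
    if (len < 1)
        return -1;
    if ((size_t)buf[0] > len - 1)
        return -1;
    return buf[0];
}

/* Simple deterministic PRNG (LCG) so fuzzing runs are reproducible. */
static uint32_t next_rand(uint32_t *state)
{
    *state = *state * 1103515245u + 12345u;
    return *state >> 16;
}

/* Feeds random inputs of 0..16 bytes to parse and counts results that
 * would read past the end of the input (the invariant being checked). */
int fuzz(parse_fn parse, int iterations, uint32_t seed)
{
    uint8_t buf[16];
    uint32_t state = seed;
    int violations = 0;

    for (int i = 0; i < iterations; i++) {
        size_t len = next_rand(&state) % (sizeof buf + 1);
        for (size_t j = 0; j < len; j++)
            buf[j] = (uint8_t)next_rand(&state);

        int n = parse(buf, len);
        if (n >= 0 && (size_t)n > len - 1)
            violations++;
    }
    return violations;
}
```

Running fuzz(parse_trusting, 1000, 1) reports many violations, while fuzz(parse_checked, 1000, 1) reports none. Notice that the fuzzer needs no understanding of why the trusting parser is wrong; it only needs an invariant to check.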
Now, let’s take a look at a portion of Coverity’s results from their 2009 report:
Coverity Scan Report Data (2009 report)
Total lines of code scanned: over 60 million
Total open source projects analyzed: more than 280
Total defects found: 38,453
Total defects fixed: about 29% of defects found
You may wonder what it means to find more than 38,000 defects in open source software. Truthfully, that number has no context on its own. A count of 38,000 defects might sound like a lot, but when you run static analysis tools you tend to find a lot of bugs, and some of these defects are surely false positives or low-impact bugs. I’m not trying to use this 38,000 number to suggest that open source software is unreasonably buggy.
However, I will note that code review missed at least 38,453 defects that static analysis found. I guess we have not yet reached “enough” eyeballs.
I’ll also note that the SDL requires Microsoft software to be “PreFast clean” and “FxCop clean,” meaning that every defect those tools report is fixed or confirmed as a false positive. If you look at the fix rate of the defects identified by Coverity, in 2008, 8,500 of 27,752 defects were fixed, or about 31%. In 2009, that figure dropped to about 29%. Finding bugs is great, but you have to do something with them to accrue any benefit.
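Working through the 2008 figures quoted above shows how much was left unfixed:

$$
\frac{8500}{27752} \approx 0.306 \approx 31\%, \qquad \frac{27752 - 8500}{27752} = \frac{19252}{27752} \approx 0.694 \approx 69\%
$$

In other words, roughly two out of every three defects the scans reported in 2008 were still open.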
And finally, if you compare the results from the Sardonix project to the work Coverity is doing, I think you will have to agree that code review is a relatively inefficient means of finding bugs.
Now, let me return to the question that Coverity asked, “Would you like to know about 0day defects months in advance?”
In the post that asks this question, Coverity highlights an interesting bug that is difficult to find through code review because it depends on a particular compiler optimization. The code doesn’t behave as written, because the compiler optimizes some of it away, and the result is a security bug. It’s a classic problem, but you will only recognize it if you understand the optimizations a compiler might make; code review isn’t likely to catch that, and in this case it didn’t. Static analysis, on the other hand, is excellent at finding bugs like this because it uses the same techniques that compilers use to analyze code. Plus, a static analysis tool doesn’t get tired of reviewing code.
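The exact code from the Coverity post is not reproduced here. A well-known member of this class of bug, shown below as a hypothetical C sketch (struct device and both functions are invented for illustration), dereferences a pointer before checking it for NULL. Because dereferencing a null pointer is undefined behavior, an optimizing compiler is entitled to assume the pointer is non-null and delete the later check; GCC, for example, does this under its -fdelete-null-pointer-checks optimization. A reviewer reading the source sees a null check, but the compiled code may not contain one.

```c
#include <stddef.h>

/* Hypothetical structure, for illustration only. */
struct device {
    int flags;
};

/* BUGGY: dev is dereferenced before it is checked. Since dereferencing NULL
 * is undefined behavior, an optimizing compiler may conclude dev != NULL
 * and remove the check below as dead code. */
int get_flags_buggy(const struct device *dev)
{
    int flags = dev->flags;   /* dereference happens first */
    if (dev == NULL)          /* may be optimized away */
        return -1;
    return flags;
}

/* FIXED: check the pointer first, then dereference it. */
int get_flags_fixed(const struct device *dev)
{
    if (dev == NULL)
        return -1;
    return dev->flags;
}
```

The fix is simply to test the pointer before using it. A static analysis tool can flag the buggy ordering (a check that comes after a dereference) without needing to know what any particular compiler will do with it.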
But just as some bugs are easier to find with static analysis than with code review, other types of bugs are easier to find with dynamic analysis than with static analysis. And some bugs are better prevented through threat modeling, fuzz testing, attack surface reduction, and so on.
I think code review is great; I think static analysis is great. But I also think dynamic analysis, fuzz testing, training, servicing, threat modeling, and all the other pieces of the SDL are great too.
You might argue that Coverity’s ability to do this work is just another set of eyeballs. But I reject that argument entirely. This is a government subsidy to do some hard and useful work, not a magic property of the fact that these are open source projects. The real beneficiaries of the subsidy are not Coverity (which is providing a fine service), but other companies whose business model is primarily about services rather than software.
I think those companies are big enough that they ought to be able to do some of this themselves.
In product after product, Microsoft continues to ship fewer vulnerabilities than our competitors. Look at the results on Jeff Jones’s blog: http://blogs.technet.com/security/. Jeff is a Microsoft guy, of course, and thus not an entirely impartial source. But conduct your own research, use your own methodology, and I think you’ll see: in product after product, the Microsoft offering is usually more secure than the competition’s. We achieved those results through long-term, sustained application of the SDL.
I have great respect for our competitors in the open source world. They produce some very competitive products, and are often very committed to security, as are we. We don’t discount the competition at all. But the many-eyeballs slogan is an implicit assertion that code review is the only thing that matters, and that the open source community does more code review than the proprietary community.
On the contrary: through the SDL, Microsoft’s many eyeballs cover the entire gamut of the software lifecycle with trained, experienced professionals. Nobody knows exactly how many people perform security reviews of open source software, but there’s no evidence to support a claim that it’s “orders of magnitude” more than in the proprietary community, and the best available evidence suggests that the actual number is very small indeed. Moreover, except for extremely simple or extremely complex bugs, code review is generally less efficient than other means of finding bugs.
Hope is not a security strategy. By contrast, the Security Development Lifecycle is a proven strategy. The many eyeballs argument is neat, tidy, compelling, and wrong.