December, 2007

  • The Security Development Lifecycle

    Common Criteria and answering the question 'Is it Safe'


    Hi all, Eric Bidstrup here.

     

    One of the areas our group is also involved in is industry standards for security assurance. Common Criteria (aka ISO 15408) is the standard in this area, internationally recognized by 24 governments (including the US, UK, Germany, Japan, and others). It’s interesting to consider that while all consumers of computer software want both confidence in and detailed information about the security of the software they want to purchase (or have already purchased), Common Criteria (CC) has failed to gain broad acceptance and recognition in the private sector or in any community beyond government agencies. Microsoft has been very vocal in the CC community with suggestions as to why that is and how to modify CC for broader commercial acceptance, so I thought I’d share some of those thoughts here. Currently, Common Criteria fails to meet customer needs as a useful indicator of the likelihood of security vulnerabilities in software.

     

    At a very fundamental level, when someone in either the private sector or a government agency considers purchasing or using a software product, one of the questions that may come up is “Is it Safe?” (Apologies for the lame and over-used “Marathon Man” movie reference.) I chose this imprecise notion of “safe” because most people don’t think deeply about what it means beyond “I don’t want bad things to happen to me or to people/property/data I care about.” In terms of software security, most people would consider all of the following “bad”: viruses, worms, malware, hackers, criminals, and espionage. These items have one thing in common: all of them require a weakness (a “vulnerability”) in the software being used, and a way to exploit that vulnerability for a nefarious purpose. Security professionals have various frameworks for defining “safe” that usually factor in some of the following considerations:

    1)      Value of protected assets

    2)      Assumptions about the sophistication of, and level of resources available to, an attacker. “Attacker” can cover a spectrum ranging from a well-intentioned but misguided employee, to people we commonly think of as “hackers,” to employees of a hostile intelligence service.

    3)      Level of confidence/assurance that is sought by people responsible for protecting the assets noted in #1 from the attackers noted in #2.

     

    Obviously, different customers will have different criteria for determining “Is it Safe?” Small businesses will have different needs from large multinational corporations, which in turn will have different needs from government security agencies. To answer that question, security professionals need time (usually at substantial cost) not only to analyze the considerations above, but also to examine in depth the software itself, its intended use, the environment in which it will be used, and a variety of other factors. Consumers who are not security savvy will likely make judgments based on media sound bites and intuition rather than on specific data or analysis. The Internet can be a dangerous place, and a computer running vulnerable software is an easier target than one that is not.

     

    Potential software vulnerabilities fall into three general categories:

    1)      Design vulnerabilities – software that was not designed adequately to meet security requirements, needs, or expectations.

    2)      Implementation vulnerabilities – software that exposes risk through implementation deficiencies.

    3)      Deployment vulnerabilities – software that was misconfigured in deployment so as to expose risk that other configurations might have prevented.
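To make the line between the first two categories concrete, here is a minimal sketch in which the design is identical (look up a user's secret by name) and only the implementation differs. The table, the data and both function names are hypothetical illustrations, not anything from a real product; the flaw shown is classic SQL injection, chosen as one familiar example of an implementation vulnerability.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database with a hypothetical "users" table, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn: sqlite3.Connection, name: str):
    # Implementation flaw: untrusted input is spliced directly into the SQL text,
    # so a crafted name can change the meaning of the query.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def lookup_safe(conn: sqlite3.Connection, name: str):
    # Same design, sound implementation: the driver binds the value as data.
    return conn.execute("SELECT secret FROM users WHERE name = ?", (name,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    attack = "x' OR '1'='1"
    print(lookup_unsafe(conn, attack))  # leaks every user's secret
    print(lookup_safe(conn, attack))    # no user has that literal name
```

With the input `x' OR '1'='1`, `lookup_unsafe` returns both secrets while `lookup_safe` returns an empty list. A design review of the feature would look the same for both versions; only an examination of the implementation tells them apart.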

     

    Let’s talk about each of these in the context of Common Criteria.

     

    For classes of products where protection profiles (PPs) have been defined, CC arguably does a reasonable job in addressing design vulnerabilities. A protection profile outlines customers’ interests and needs in terms of security features/functionality. Smart cards are a great example where the threats and risks to a class of products have been well defined and reflected in the protection profiles. Operating systems and DBMSs are other examples where useful protection profiles have been created. CC as currently applied is arguably deficient in two ways: 1) PPs don’t currently exist for many categories of products (mobile devices and instant messaging applications, for example); 2) internationally, an evaluation of a given product is not “required” to be conducted against a PP (although the US has such policies). The former would be a solvable problem if industry were willing to step in and help lead the creation of protection profiles where none currently exist, as the smart card vendors have done. Solving the latter would require more fundamental policy changes by the governing bodies of Common Criteria, and presumes a solution to the former.

     

    Where Common Criteria arguably does NOT do a reasonable job is in addressing implementation vulnerabilities. While CC does have some limited provisions that attempt to address this concern, real-world experience offers ample evidence that CC fails to meet customers’ (both government and private sector) needs and expectations for assurance that a given product does not contain implementation vulnerabilities that expose them to risk. In our experience, customers typically don’t care whether they are exposed to risk from a design vulnerability or an implementation vulnerability; they care that they are exposed to risk. Period. When customers ask “Is it Safe?” they expect software that can be deployed and maintained to operate securely in the face of adversarial activity. The chairman of the Common Criteria Development Board (David Martin) agreed with these points in his presentation at the ICCC in Rome this year. It’s not that CC can’t do this; it’s just that it currently doesn’t. This is the area where Steve Lipner, others, and I have pointed out repeatedly (maybe too repeatedly) that CC needs to improve.

     

    As I mentioned above, Common Criteria also falls short in producing useful information that addresses deployment vulnerabilities. A CC evaluation is conducted against a specific configuration of a product known as the “Target of Evaluation” (aka TOE). Information about the TOE is expressed using CC language and syntax, which is typically not digestible by average IT personnel. The TOE is defined by the vendor, and may or may not reflect the product’s default installation configuration or other common configurations that reflect how the product is deployed in the real world. In many cases, the guidance on deploying software securely is at odds with how the software is used in the real world. For example, as I recall, a few years ago an operating system was evaluated under the US Controlled Access Protection Profile in a configuration that had only an FTP server (configured for anonymous access) enabled. This sort of fiction doesn’t meet customer needs.

     

    One of the other key challenges of Common Criteria today is the timeliness of CC evaluations. It typically takes 12 to 24 months or longer to complete an evaluation at the highest assurance level that general-purpose commercial software products can attain (EAL4). Since software vendors typically release new major versions of their products at 18- to 36-month intervals, customers face a dilemma: CC evaluation results typically lag about one version behind the currently available version of a given product. Hence, adding time and effort to address current CC deficiencies to a process that is already too slow to meet customer needs creates a real quandary.
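The lag can be checked with back-of-the-envelope arithmetic. The sketch below assumes (this is our simplification, not a claim from the post) that an evaluation starts on the day a version ships and that the next major version ships exactly one release interval later; under that assumption, the share of a version's time as the current release during which its evaluation is complete is max(0, R − E) / R, where E is the evaluation duration and R is the release interval.

```python
def certified_fraction(eval_months: float, release_interval_months: float) -> float:
    """Share of a version's time as the current release during which its CC
    evaluation is already complete.

    Simplifying assumption: the evaluation starts on the ship date, and the
    next major version ships exactly one release interval later.
    """
    if release_interval_months <= 0:
        raise ValueError("release interval must be positive")
    covered = max(0.0, release_interval_months - eval_months)
    return covered / release_interval_months


if __name__ == "__main__":
    # Ranges quoted in the post: 12-24 month evaluations, 18-36 month release cycles.
    for e in (12, 24):
        for r in (18, 36):
            print(f"evaluation {e} mo, release every {r} mo: "
                  f"{certified_fraction(e, r):.0%} of the window is certified")
```

At the pessimistic end (a 24-month evaluation against an 18-month release cycle) the fraction is zero: by the time a certificate exists, customers are already being offered the next version. Even at the optimistic end (12 months against 18), only one third of the window is covered.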

     

    This all leads to some fundamental questions about the goals and purpose of Common Criteria. If CC simply validates conformance to a set of documented security feature requirements, then CC needs to communicate this limited scope to its customers more clearly, setting the expectation that it will “help keep honest people honest” but is incomplete or inadequate as assurance of the security of assets on a system. (CC works well in some bounded scenarios such as smart cards, but much less well in scenarios with larger-scale, more complex software.) If CC aspires to truly meet customer needs by answering the question “Is it Safe?”, then CC needs to consider the real-world evidence on vulnerability rates in CC-evaluated products, which shows that it is currently failing to meet customer needs in that regard. Microsoft has had several products evaluated under CC (Microsoft Internet Security and Acceleration Server (ISA), Microsoft SQL Server 2005 SP1, Microsoft Exchange Server 2003, and several versions of Microsoft Windows). However, CC has been an insufficient answer to the question our customers ask: “Is it Safe?” The Security Development Lifecycle is what has made the difference in enabling Microsoft to successfully reduce vulnerabilities in our products.

     

    If customers expect a real-world answer to the question “Is it Safe?” to be answered by Common Criteria, then Common Criteria must change.

     


    Security is not all about Security Updates

    Hi, Michael here.

    I'm always asked, "How can you claim the SDL is working when Microsoft still issues security updates?" So I want to make sure people understand the goals of the SDL and, perhaps more importantly, its non-goals.

    There are three major security-related disciplines here at Microsoft and people outside the company often confuse the three.

    1. Security feature development
    2. Security response
    3. Secure software engineering

    The first is all about building security features such as authentication technologies, firewalls and the like. This is not the SDL, although at Microsoft the SDL obviously shapes the design and code that go into these security features.

    Next is the response process. All software has security vulnerabilities at some stage, and it's important that quality updates for all supported versions of the software, in all supported languages, be available as soon as possible. But no sooner! You can't rush a security fix out with minimal testing, or for only a subset of supported platforms or languages, because you run the risk of releasing sub-quality fixes, or of protecting some customers but not all.
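The "all supported versions, all supported languages" rule behaves like a release gate: a fix is ready only when every combination has a tested update. The sketch below is a hypothetical illustration of that gate. The product versions, locale tags and the `release_ready` helper are invented for the example and do not describe Microsoft's actual release tooling.

```python
from itertools import product


def release_ready(versions, languages, tested):
    """Return (ready, missing).

    `ready` is True only if a tested fix exists for every supported
    (version, language) pair; `missing` lists the pairs still lacking one.
    `tested` is any iterable of (version, language) tuples.
    """
    required = set(product(versions, languages))
    missing = sorted(required - set(tested))
    return (not missing, missing)


if __name__ == "__main__":
    versions = ["ProductX 1.0", "ProductX 2.0"]  # hypothetical supported versions
    languages = ["en-US", "de-DE", "ja-JP"]      # hypothetical supported languages
    tested = {(v, "en-US") for v in versions}     # only English builds tested so far
    ready, missing = release_ready(versions, languages, tested)
    print(ready)    # False: shipping now would protect some customers, but not all
    print(missing)  # the four non-English combinations still awaiting a tested fix
```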

    Finally, we come to secure software engineering. When we set out on the SDL journey, we realized that we needed to achieve two main objectives. The first is to reduce the number of vulnerabilities that creep into the software's design and code. I want to emphasize this point because it is the single most important goal of the SDL: to reduce the number of vulnerabilities in software products. This is not about who can fix bugs faster; the SDL is about reducing the chance that vulnerabilities are added to the software in the first place. Writing lots of code quickly, shipping it, and then racing to fix security bugs later is not engineering; it's chaos, and it's not good for customers. A question I like to ask software developers outside of Microsoft is, "What are you doing to reduce the chance that an engineer will add a new security bug to the system?" The answer to this question must be holistic and include:

    • Education
    • Secure design and attack surface reduction
    • Threat modeling
    • Secure coding requirements (note the word, "requirements" not "best practices")
    • Static analysis tools
    • Testing requirements
    • End-user security documentation
    • Response planning

    In a nutshell, this is a high-level view of the SDL process.
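The difference between a secure coding *requirement* and a best practice is enforcement: a requirement is something a tool can check and a build can fail on. The sketch below is a deliberately naive illustration of that idea. It scans C source text for calls to a small deny-list of unbounded string functions. The list here (`strcpy`, `strcat`, `sprintf`, `gets`) is an illustrative assumption, not the SDL's official banned-API list, and a plain text scan will also flag matches inside comments and string literals, which the static analysis tools mentioned above handle properly by parsing the code.

```python
import re

# Illustrative deny-list (an assumption for this sketch, not an official list):
# classic C functions that write to a buffer without a caller-supplied bound.
BANNED = ("strcpy", "strcat", "sprintf", "gets")

# Match a banned name as a whole word followed by "(" (optionally after spaces),
# so strcpy_s(, fgets( and vsprintf( are not flagged.
CALL = re.compile(r"\b(" + "|".join(BANNED) + r")\s*\(")


def find_banned_calls(source: str):
    """Return (line_number, function_name) for each banned call in C source text."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = (
        "char buf[8];\n"
        "strcpy(buf, input);\n"
        "size_t n = strlen(input);\n"
        "gets(line);\n"
    )
    for lineno, name in find_banned_calls(sample):
        print(f"line {lineno}: call to banned function {name}")
```

Wired into a build, a non-empty result would fail the check. That is what turns "please avoid these functions" into a requirement.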

    The next goal of the SDL is to reduce the impact of security vulnerabilities missed during the software development process. Security is an ongoing arms race in which attackers constantly devise new attacks to get around the defender's defenses, which means you can never hope for zero security vulnerabilities. Reducing the impact of the vulnerabilities that slip through calls for forward-looking defenses, and we have seen many of these defenses in action in Windows Vista, IIS6, SQL Server 2005 and Office 2007.

    Look carefully at the list of products I just mentioned: they are all products that had a full release after Microsoft implemented its security process improvements. They are not service packs, and this is where I need to make a critically important point about the SDL. To gain the full impact and benefit of the SDL, you must apply the SDL to a product at its inception. With the exception of Windows XP SP2 (which was a security-focused release, but predates the SDL), service packs at Microsoft include fixes and perhaps some opportunistic feature enhancements requested by customers. Such releases cannot get the full benefit of the SDL, because security is not just about bug fixes; it is a holistic property that goes beyond fixing implementation vulnerabilities to encompass sound design and defense in depth.

    Ultimately, this means that newer Microsoft code is more secure than older Microsoft code, and that is the trend we're seeing across the board. Don't expect to see a marked drop in the vulnerability count in older code. You won't see it, because we can't dramatically improve the security of an already released product.

    Reliability Vs. Security


    James Whittaker here.

    At the International Symposium on Software Reliability Engineering (ISSRE 07, Trollhattan, Sweden), one would expect the security versus reliability debate to be very one-sided. After all, reliability is the attendees’ mainstay, and if there is one group of folks on the planet who would see security as a subset or subsidiary concern, it might be the industry and academic experts who attend this prestigious IEEE conference.

    I gave the ‘industry keynote’ to open the second day of ISSRE 07 this past November, and started this debate by focusing on the topic that consumes my days: security. I painted a picture of the disaster scenarios we spend a heroic amount of effort trying to avoid and talked about the technical and organizational challenges of getting it right. But the discussion that followed centered on a broader topic: is security more difficult to achieve than reliability? A gaggle of professors from five continents and practitioners from Saab, Ericsson, Microsoft, Cisco, IBM and Google debated the matter from the halls of the conference to the pubs in the Trollhattan city center.

    Here are two points discussed at length during the debate:

    1.       Reliability folks are lucky – they have a clear definition of what a bug is: a deviation between the application and the spec. Having a spec means knowing which behaviors are bugs and which are by design; it’s an unerring guide to testing. Security folks have no such oracle, since we have no way of specifying all the ways in which an application might be exploited (a threat model might represent our best effort). Without such a spec, concepts such as coverage and completeness have little meaning for security folks, and testing is much harder because we don’t know what we are looking for.

     

    This is a nice state of affairs for reliability until you realize that specs are not all they are cracked up to be. Written in natural language, as most specs traditionally are, they are notoriously ambiguous and have an annoying tendency to fall out of date because the code evolves and they do not! Sorry, but I refuse to score any advantage to reliability on this point. The state of our collective design documentation and specs won’t allow it.

     

    2.       Security folks are lucky – they only have to deal with a subset of the entire bug space. Their only concern is the components that consume untrusted input, and within those, only the subset of issues that might be exploitable. The rest of the issues can be ignored. Reliability people, on the other hand, must deal with the entire application because reliability bugs can be anywhere. Reliability folks deal with this by weighting their tests according to an operational profile (a model of how often users exercise each part of the application), an unwieldy proposition at best and one that security folks can safely ignore (because hackers don’t follow an operational profile).

     

    As a security guy, I find this pleasing: I have a smaller problem to deal with! But the solar system is a lot smaller than the galaxy, and it isn’t particularly more ‘explorable’ because of its smaller size. It’s only recently, after centuries of study, that we realized there are Pluto-sized rocks out there. Let’s face it: even after reducing the places we have to explore, there are still too many to have any hope of covering them all. For practical purposes, the solar system and the galaxy are the same size, because both are too big to be adequately explored with our current methods. Advantage to security? Nope.
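Point 2 mentions weighting tests by an operational profile. Here is a minimal sketch of that idea, assuming a hypothetical document-editing application whose operations and usage probabilities are made up for illustration. Test cases are drawn in proportion to how often users invoke each operation.

```python
import random
from collections import Counter

# Hypothetical operational profile: probability that a user invokes each operation.
PROFILE = {
    "open_document": 0.50,
    "save_document": 0.30,
    "print_document": 0.15,
    "export_pdf": 0.05,
}


def sample_test_plan(profile, n_tests, seed=0):
    """Draw n_tests operations, weighted by how often users exercise them."""
    rng = random.Random(seed)  # seeded so a test plan is reproducible
    ops = list(profile)
    weights = [profile[op] for op in ops]
    return rng.choices(ops, weights=weights, k=n_tests)


if __name__ == "__main__":
    plan = sample_test_plan(PROFILE, 1000)
    print(Counter(plan).most_common())
```

The design choice is the point of the argument. Rarely used paths such as the hypothetical `export_pdf` receive only a small share of the tests. That is sensible when the goal is failures per hour of normal use, but it offers no protection when an attacker deliberately drives input through exactly those paths.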

    The one thing we both have in common is an unqualified ability to cause pain to our users. There are exceptions, of course, but with security the pain is extreme and concentrated in the short period during which the exploit runs undetected (plus the recovery that follows). With reliability, the pain is often less intense but occurs more frequently and over longer periods of time; it’s those annoying little bugs that waste time and force awkward workarounds. You can pull the band-aid off all at once or endure it a little at a time. The pain is equally unacceptable.

    There is one point I will readily cede to the reliability community: they can teach the security community a thing or two about analyzing data. Metrics are an often-used, if still imprecise, reliability tool. The use of Bayesian statistics, stochastic processes and reliability modeling is well developed and has been proven time and again on real software development data. Reliability analysis is predictive and can be used to monitor the development process. In security, by contrast, we rely on simple counts of vulnerabilities and metrics such as ‘days of risk.’ Security measures are more often used to place blame and point fingers than to estimate or predict anything. Security learning tends more toward Pavlov than Markov: when it keeps on hurting, eventually we stop doing it.
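The contrast between predictive reliability modeling and simple security counting can be sketched in a few lines. The post does not name a specific model, so this example swaps in the Goel-Okumoto reliability growth model, one well-known stochastic-process (non-homogeneous Poisson process) model. Its mean value function is m(t) = a(1 − e^(−bt)), where a is the expected total number of faults and b the per-fault detection rate. For the security side, the example assumes the common definition of days of risk: the number of days between public disclosure of a vulnerability and the availability of a fix. All parameter values and dates are hypothetical.

```python
import math
from datetime import date


def go_expected_failures(t: float, a: float, b: float) -> float:
    """Goel-Okumoto mean value function m(t) = a * (1 - exp(-b * t)):
    expected cumulative faults found by test time t."""
    return a * (1.0 - math.exp(-b * t))


def go_remaining_faults(t: float, a: float, b: float) -> float:
    """Expected faults still latent after test time t: a - m(t) = a * exp(-b * t).
    This is the kind of forward-looking estimate reliability modeling provides."""
    return a * math.exp(-b * t)


def days_of_risk(disclosed: date, fixed: date) -> int:
    """Assumed definition: days from public disclosure to fix availability.
    A backward-looking count; it predicts nothing about future vulnerabilities."""
    return (fixed - disclosed).days


if __name__ == "__main__":
    # Hypothetical parameters: a = 120 expected total faults, b = 0.05 per test week.
    for week in (10, 20, 40):
        print(f"week {week}: ~{go_remaining_faults(week, 120, 0.05):.1f} faults remain")
    # Hypothetical disclosure and fix dates.
    print(days_of_risk(date(2007, 11, 1), date(2007, 11, 13)))  # 12
```

In practice, a and b would be fitted to observed failure data from testing. The fitted model then yields estimates such as how much more testing is needed to reach a target number of remaining faults. A days-of-risk figure, however carefully tallied, only describes what has already happened.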

    But there is also one point the reliability community must cede: security folks are more proactive with corrective action. We spend far more time acting on data than analyzing it. In security, we’ve managed to mitigate, and even drive to near-extinction, entire classes of vulnerabilities. Despite our inability to measure security, we are very good at driving development and testing process change. The SDL is a perfect example of this: it’s been proven in practice on some of the most complex software on the planet. Yes, we get it wrong from time to time, but we learn from those mistakes.

    Security and reliability are different aspects of the general problem of protecting our customers. Our communities have much to learn by working together and sharing solutions that will make our software work better and more securely. ISSRE convinced me that we in the security community are missing out on decades of research in fault and failure analysis that would serve us well. And I think the reverse is true too: by following our example, the reliability community can embed reliability more deeply into the development lifecycle to drive improvements and better protect customers.

    I look forward to ISSRE 08, enough so that I’ve helped convince Microsoft to host it. See you next November in Redmond.
