June, 2007

  • The Security Development Lifecycle

    Lessons Learned from MS07-029: The DNS RPC Interface Buffer Overrun


    Hi, Michael Howard here (again).

    Before I get started on this post, I want to set some expectations. My plan is to blog occasionally about our root cause analyses, but I will not blog about every vulnerability we fix, simply because I don't have the time. While we analyze each and every vulnerability we address in a bulletin, I want to highlight only those that can help customers better understand particular elements of how this works with the SDL. With that in mind, I will blog about vulnerabilities that interest me, and most will be critical in one or more of our products. Finally, I will outline specifically how the vulnerability in question relates to potential SDL changes or improvements.

    As you are probably aware, Microsoft recently issued a security bulletin that fixed a security vulnerability in the DNS server code in Windows Server components. I want to spend a couple of minutes explaining the vulnerability, how various versions of Windows Server were affected, and, most importantly, what we learned from the vulnerability.

    Which products are affected?

    It's important to note that this vulnerability is in the DNS server code and is only in the Windows server platforms. Windows XP and Windows Vista users are not affected at all because the client operating systems do not include DNS server code, only DNS client code.

    The DNS server is not enabled by default, except when Small Business Server is installed, or the computer is configured as an Active Directory domain controller.

    Windows 2000 and Windows Server 2003 systems running DNS are affected, as is "Longhorn" Server beta 2, but we fixed the vulnerability in time for beta 3. In fact, to protect customers, we chose to run the real risk of slipping beta 3 to include this change.

    The Nature of the Vulnerability

    The vulnerability is in code listening on ephemeral RPC ports used for DNS server management; it is not in the mainline port 53 DNS processing code. Since Windows XP SP2, RPC communication has been authenticated by default; this was a direct outcome of lessons learned from the Blaster worm. What's interesting is that this RPC end-point is anonymously accessible.

    In and of itself, anonymous access is not a security vulnerability; it's an attack surface issue, because an anonymously accessible entry point has a larger attack surface than an authenticated entry point.

    The Code

    The vulnerability is a stack-based buffer overrun in a structure; the structure that's overrun is:

    typedef struct _CountName {
        UCHAR   Length;
        UCHAR   LabelCount;
        CHAR    RawName[ DNS_MAX_NAME_LENGTH+1 ];
    } COUNT_NAME, *PCOUNT_NAME;

    A pointer to this structure is passed to the following function, along with the DNS name to crack. The last element of the structure, PCOUNT_NAME->RawName, is overwritten by untrusted data. Untrusted code and data constructs appear in red.

    DNS_STATUS Name_ConvertFileNameToCountName(
        PCOUNT_NAME     pCountName,
        PCHAR           pchName,
        DWORD           cchNameLength) {

        PCHAR       pch;
        UCHAR       ch;
        PCHAR       pchstartLabel;      // ptr to start of label
        PCHAR       pchend;             // ptr to end of name
        PCHAR       presult;
        PCHAR       presultLabel;
        PCHAR       presultMax;
        WORD        charType = 0;
        WORD        maskDowncase;
        DNS_STATUS  status;
        INT         labelLength;        // length of current label
        UCHAR       labelCount = 0;

        //  result buffer, leave space for label
        presultLabel = presult = pCountName->RawName;
        presultMax = presult + DNS_MAX_NAME_LENGTH;
        presult++;

        //  Character selection mask
        //      '\' slash quote
        //      '.' dot label separator are special chars
        //      upper case must be downcased
        //      everything else is copied
        maskDowncase = B_UPPER;

        //  setup start and end ptrs and verify length
        pchstartLabel = pch = pchName;
        if ( !cchNameLength )
            cchNameLength = strlen( pch );

        pchend = pch + cchNameLength;

        while ( pch ) {

            if ( pch >= pchend ) {
                ch = 0;
                charType = FC_NULL;
            }

            ...

            //  downcase upper case
            if ( charType & maskDowncase ) {
                //  if name exceeds DNS name max => invalid
                if ( presult >= presultMax )
                    goto InvalidName;

                *presult++ = DOWNCASE_ASCII(ch);
                continue;
            }

            if ( charType & B_DOT ) {
                //  verify label length
                labelLength = (int)(presult - presultLabel - 1);

                if ( labelLength > DNS_MAX_LABEL_LENGTH )
                    goto InvalidName;

                //  set label count in result name
                *presultLabel = (CHAR)labelLength;
                presultLabel = presult++;

                if ( pch >= pchend ) {
                    if ( labelLength != 0 ) {
                        labelCount++;
                        *presultLabel = 0;
                        break;
                    }

                    presult--;
                    break;
                }

                //  set up for next label
                if ( labelLength != 0 ) {
                    labelCount++;
                    continue;
                }

                goto InvalidName;
            }

            //  quoted character
            //      - single quote just get next char
            //      - octal quote read up to three octal characters
            else if ( ch == SLASH_CHAR ) {
                //  if name exceeds DNS name max => invalid
                if ( presult >= presultMax )
                    goto InvalidName;

                << length of presult is not constrained >>
                << extractQuotedChar overwrites pch >>
                pch = extractQuotedChar(
                        presult++,
                        pch,
                        pchend );
            }
        }

    As you can see, there is a lot of boundary checking in the code, but the critical check that constrains presult to pch is missing, and so we have this vulnerability. The lesson to learn here is that it only takes one missing check!
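One way to make a missing check harder to write is to route every store into the result buffer through a single helper that refuses to advance past the limit. The sketch below only illustrates that idea; it is not the fix that shipped for MS07-029. The names `bounded_put` and `copy_escaped` are hypothetical, and the escape handling is deliberately simplified to a backslash followed by one literal character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: store one byte only while the cursor is still
 * below `limit`. Returns false instead of writing out of bounds. */
static bool bounded_put(char **cursor, const char *limit, char value)
{
    assert(cursor != NULL && *cursor != NULL && limit != NULL);
    if (*cursor >= limit)
        return false;
    *(*cursor)++ = value;
    return true;
}

/* Simplified illustration: copy src into dst, treating "\x" as the
 * literal character x. Every write goes through bounded_put, so no
 * input path can skip the bounds check.
 * Returns the number of bytes written, or -1 if the input is
 * malformed or does not fit in dst. */
static long copy_escaped(char *dst, size_t dst_size,
                         const char *src, size_t src_len)
{
    char *out = dst;
    const char *limit = dst + dst_size;
    const char *in = src;
    const char *end = src + src_len;

    while (in < end) {
        char ch = *in++;
        if (ch == '\\') {
            if (in >= end)
                return -1;      /* dangling backslash */
            ch = *in++;         /* take the quoted character literally */
        }
        if (!bounded_put(&out, limit, ch))
            return -1;          /* name too long for the buffer */
    }
    return (long)(out - dst);
}
```

With a four-byte destination, the input `a\.b` copies three bytes, while five escaped characters are rejected rather than written past the end.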

    Analysis Tools

    Our static analysis tools didn't find this because of the nature of the structure that was overrun:

    typedef struct _CountName {
        UCHAR   Length;
        UCHAR   LabelCount;
        CHAR    RawName[ DNS_MAX_NAME_LENGTH+1 ];
    } COUNT_NAME, *PCOUNT_NAME;

    Look at the last element: it's a buffer. There are a number of structures that follow the "put a buffer at the end" pattern, for example the security identifier (SID) structure:

    typedef struct _SID {
       BYTE  Revision;
       BYTE  SubAuthorityCount;
       SID_IDENTIFIER_AUTHORITY IdentifierAuthority;
       DWORD SubAuthority[ANYSIZE_ARRAY];
    } SID, *PSID;

    The buffer at the end of the SID structure is a series of DWORDs whose declared size, ANYSIZE_ARRAY, is set to 1. These are often referred to as variable-length arrays, and our static analysis tools are tuned to look for some of these constructs. But PCOUNT_NAME->RawName is not a variable-length array; it's a fixed array, and right now our tools do not analyze such constructs. We've proposed changes to our static analysis tools to allow us to find this vulnerability type.

    In short, the combination of missing constraints on data and a fixed-size array at the end of a structure, a construct our tools do not yet analyze, foiled our static analyzers.
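To make the difference between the two layouts concrete, here is a minimal sketch. The type names `VarRecord` and `FixedRecord` and the constants `ANYSIZE` and `MAX_NAME` are hypothetical stand-ins, not the Windows definitions. In the variable-length pattern the real capacity is chosen at allocation time; in the fixed pattern the capacity is a compile-time constant, so writes to it could in principle be checked against that constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ANYSIZE  1      /* stand-in for ANYSIZE_ARRAY */
#define MAX_NAME 255    /* hypothetical stand-in for DNS_MAX_NAME_LENGTH */

/* Variable-length pattern: the declared size of 1 is only a placeholder. */
typedef struct {
    uint8_t  count;
    uint32_t items[ANYSIZE];
} VarRecord;

/* Fixed pattern: the trailing array really has MAX_NAME + 1 bytes. */
typedef struct {
    uint8_t length;
    uint8_t label_count;
    char    raw_name[MAX_NAME + 1];
} FixedRecord;

/* Bytes needed for a VarRecord holding `count` items (never less than
 * the size of the structure itself). */
static size_t var_record_size(uint8_t count)
{
    size_t extra = count > ANYSIZE ? (size_t)(count - ANYSIZE) : 0;
    return sizeof(VarRecord) + extra * sizeof(uint32_t);
}

/* Allocate a zeroed VarRecord with room for `count` items; caller frees. */
static VarRecord *var_record_alloc(uint8_t count)
{
    size_t bytes = var_record_size(count);
    assert(bytes >= sizeof(VarRecord));
    VarRecord *rec = calloc(1, bytes);
    if (rec != NULL)
        rec->count = count;
    return rec;
}

/* Capacity of the fixed trailing array, known at compile time. */
static size_t fixed_record_capacity(void)
{
    return sizeof(((FixedRecord *)0)->raw_name);
}
```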

    Fuzzing

    We performed minimal RPC fuzz testing on this interface because we understood this to be an administrator-only-accessible interface. We didn't discover this vulnerability because previously our process did not include tooling to verify whether an RPC end-point is authenticated or not. It's important to understand that given a set of interfaces into a system, analysis and testing are prioritized based on accessibility. For example, a remotely and anonymously accessible network interface will get much more scrutiny than a local-admin-only interface.

    Keep in mind that fuzzing can only do so much, and in this case, the string required to trigger the vulnerable code path was a series of backslashes followed by either an octal number or a single character. The probability of this specific string being generated at random is very small. We're working to make our fuzzers smarter in this regard. Fuzzing is a practice that constantly evolves. Clearly, there is contextual information from the code and from the input domain that can be used for the next iteration of our fuzzing tools. This really shows that fuzz-testing is not a security testing panacea; it's very effective, but it's not going to uncover all vulnerabilities.
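As an illustration of what a more context-aware test case generator might look like, the sketch below emits only quoted sequences instead of uniformly random bytes. It is not one of our internal fuzzers. The function names are hypothetical, and it assumes each quoted sequence is a backslash followed by either one letter or three octal digits, which is one reading of the input described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tiny deterministic PRNG (xorshift32) so a failing case can be replayed
 * from its seed. Not suitable for anything security-sensitive. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill out[0..out_size) with up to `escapes` quoted sequences, each
 * either "\c" (one letter) or "\ooo" (three octal digits). Stops early
 * if the next sequence would not fit. Returns the number of bytes
 * written; the output is counted data and is not NUL-terminated. */
static size_t make_escaped_name(char *out, size_t out_size,
                                size_t escapes, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u;  /* xorshift state must be nonzero */
    size_t used = 0;

    assert(out != NULL || out_size == 0);
    for (size_t i = 0; i < escapes; i++) {
        int octal = (next_random(&state) & 1u) != 0;
        size_t need = octal ? 4 : 2;
        if (out_size - used < need)
            break;
        out[used++] = '\\';
        if (octal) {
            for (int d = 0; d < 3; d++)
                out[used++] = (char)('0' + next_random(&state) % 8u);
        } else {
            out[used++] = (char)('a' + next_random(&state) % 26u);
        }
    }
    return used;
}
```

Because every generated byte belongs to a quoted sequence, each test case exercises the quoted-character branch of the parser, which uniformly random input would rarely reach.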

    Operating Systems Defenses

    Of great interest are the defenses that come into play. Remember, the goals of SDL are two-fold:

    • The first is to reduce the number of security vulnerabilities in the code
    • The second is to reduce the severity of the vulnerabilities that are not found by the current SDL process.

    The rest of this section outlines which versions of Windows have which defenses. If you are not familiar with /GS, /SafeSEH, Address Space Layout Randomization (ASLR) and data execution prevention (DEP/NX), then you should read one of my previous blog posts on the subject.

    In Windows 2000 there are no defenses at all because Windows 2000 pre-dates SDL. In Windows 2000 there is no firewall, no /GS, no DEP/NX and no ASLR, so exploit code runs predictably and easily.

    Windows Server 2003 is compiled with /GS and DEP/NX is available, but there is no ASLR. There is a firewall in Windows Server 2003, but it is not enabled by default. For some coding constructs, real-world exploits have circumvented the /GS implementation in Windows Server 2003 in the past.

    Windows "Longhorn" Server (now Windows Server 2008) is compiled with /GS and linked with /SafeSEH; DEP/NX and ASLR are enabled by default, and the stack is randomized too (the heap is randomized as well, but that's not relevant to this vulnerability). The firewall in Windows "Longhorn" Server and later is enabled by default. This combination of defenses raises the bar substantially. In fact, /GS and /SafeSEH alone would have been good enough to mitigate attacks because all known /GS bypass techniques are mitigated by code created by the VC++ 2005 compiler. Of course, new exploit techniques to bypass /GS could be found in the future.

    Another interesting point is that in Windows Server 2003, it is possible in some cases for an exploit to disable DEP/NX (http://uninformed.org/?v=2&a=4). This attack is not effective in Windows Vista and Windows "Longhorn" Server, because we won't allow user-mode code to change the status of its DEP/NX bit once it is set.

    Another critical defense is service restart policy. This combined with ASLR makes exploits less viable by limiting the number of attacks possible. At the same time, ASLR increases the number of attempts required by increasing the entropy of the attack surface.

    Simply put, when a service crashes, Windows can take action such as running another process or restarting the service. In the case of the DNS service on Windows "Longhorn" Server, the service will restart at most twice within 24 hours. Thus, within 24 hours, if there are three crashes, the following sequence of events will happen:

    • Attack. Crash. Restart.
    • Attack. Crash. Restart.
    • Attack. Crash. Dead.

    In other words, in an attempt to circumvent ASLR, an attacker has only three tries to get critical addresses right, after which they're prevented from additional tries because the service is no longer running. Also, there is a two-minute delay between restarts, substantially slowing down the attacker and foiling automated attempts.

    For architectural and performance reasons ASLR uses 8 bits of entropy, so an attacker has to conduct at most 256 attacks to defeat ASLR. We had a great deal of discussion internally about how to make the tradeoff between the extra reliability afforded by services restarting, and the extent to which ASLR is undermined when a service does restart. We decided that 2 restarts a day was a reasonable tradeoff - if a service is crashing so frequently that it needs to restart more than twice a day, there are only two possibilities: 1) the service is under attack, and the smart thing to do is to stay down, or 2) the service is buggy and should be fixed. Customers can control the restart behavior on a service-by-service basis, but we don't recommend it.
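To put rough numbers on that tradeoff, the sketch below computes the chance that a blind attacker guesses the right layout within a given number of tries. It assumes each attempt is an independent guess among 2^bits equally likely layouts, which is a simplification; the function name is hypothetical.

```c
#include <assert.h>

/* Probability that at least one of `attempts` independent guesses hits
 * the one correct layout out of 2^bits equally likely layouts. */
static double guess_success_probability(unsigned bits, unsigned attempts)
{
    assert(bits < 32);
    double miss = 1.0 - 1.0 / (double)(1u << bits);
    double all_miss = 1.0;
    for (unsigned i = 0; i < attempts; i++)
        all_miss *= miss;
    return 1.0 - all_miss;
}
```

With 8 bits of entropy and the three tries allowed by the restart policy, `guess_success_probability(8, 3)` comes to roughly 0.012, or a little over 1%.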

    Note that this doesn't create a DoS threat where one didn't exist before. Instead, we've limited the number of times that a service will allow itself to remain under attack.

    Summary

    There is a lot to learn from the DNS RPC vulnerability. As an outcome of this vulnerability, we are more carefully scrubbing all RPC end-points to verify whether they should really be anonymously accessible. We have also updated our fuzzers to add more context-centric test cases and these updates are now in use. Our static analysis tools will be updated to accommodate more variable-length array variants.

    We have also updated our internal training to make sure people understand that they must validate their assumptions about network end-point accessibility.

    On the good news front (I am an optimist, after all!), the defenses in Windows "Longhorn" Server beta 2 looked solid; even though it had the vulnerability, the combination of...

    • DEP/NX
    • ASLR
    • GS
    • Stack randomization
    • SafeSEH
    • Firewall
    • Service Restart Policy

    ... are good mitigations as they make it substantially harder for exploit code to run reliably.

    - Michael

    (Thanks to all the folks who helped put this analysis together: Chris Walker, Chris Budd, Eric Bidstrup, Nitin Kumar Goel, Steve Lipner, Shawn Hernan, Adam Shostack, James Whittaker and Dave Ladd.)

  • The Security Development Lifecycle

    A Security Lesson that Transcends Programming Language and Operating System Religion


    Hi, Michael here.

    A few weeks ago, my boss, Steve Lipner, placed a copy of eWeek on my desk opened to an article entitled “Java Security Traps Getting Worse.” In summary, the article, which is also available online (http://www.eweek.com/article2/0,1895,2128071,00.asp), lamented the state of security in applications written using Java. I’ll let you read the article and draw your own conclusions, but I have my own thoughts on the problem, and I want to explain my position by way of example.

    In 2003, I presented a paper on the SDL at the Workshop on Software Security (http://dimacs.rutgers.edu/Workshops/Software/abstracts.html) and the topic of conversation turned quickly to how bad the C and C++ programming languages are from a security perspective. Someone from the audience shouted out that “We should just ban C and C++ and replace it with Java.” I replied that we could ban <insert programming language of choice> and replace it with <another programming language of choice> and we’d still have lots of security bugs. I said this because programming languages by themselves don’t make secure code, but I believe that a number of people believe they will. I think a good many people believe that so long as the language is not C or C++, then the code is inherently secure.

    This belief in a programming security silver bullet is very dangerous, and I believe we address this nicely in the SDL. In fact, a lack of a silver bullet is a core principle of the SDL. We teach people, actually, no we don’t teach people: we ram it down engineers’ throats: the number one skill a person designing, building or testing software should learn is that input should never be trusted. Regardless of OS or programming language. Ever.

    Don’t get me wrong, I’m not bagging the data in the article; I’ve seen people believe that .NET code is a security panacea, and it’s not. In all of our SDL education we stress the point that .NET code is not a security cure-all, and we make sure that developers understand that if you choose to ignore the golden rule about never trusting input, no language or tool will save you. Using safe string functions can help mitigate risks of buffer overruns in C/C++, but won’t help with integer arithmetic, XSS, SQL injection, canonicalization or crypto issues.
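As one example of an issue that safe string functions do not cover, consider an allocation size computed from an untrusted element count. The sketch below shows one common way to check the multiplication before using it; `checked_mul` and `alloc_array` are hypothetical names, not part of any particular library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Multiply count * size, reporting overflow instead of silently wrapping. */
static bool checked_mul(size_t count, size_t size, size_t *result)
{
    assert(result != NULL);
    if (size != 0 && count > SIZE_MAX / size)
        return false;
    *result = count * size;
    return true;
}

/* Allocate an array of `count` elements of `size` bytes each.
 * Returns NULL on arithmetic overflow or allocation failure. */
static void *alloc_array(size_t count, size_t size)
{
    size_t bytes;
    if (!checked_mul(count, size, &bytes))
        return NULL;
    return malloc(bytes ? bytes : 1);   /* avoid a zero-byte request */
}
```

Without the check, a large untrusted count could wrap the product to a small number, and later writes would run past a buffer that every string function believes is correctly sized.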

    The key point is that languages are just tools; anyone using a tool needs to understand the strengths and limitations of any given tool in order to make informed use of the tool.

    One final thought; in my opinion, well-educated software developers using C/C++ who follow secure programming practices and use appropriate tools will deliver more secure software than developers using Java or C# who do not follow sound security discipline.

  • The Security Development Lifecycle

    SDL Training at the Microsoft Security Response and Safety Summit


    Hi – Dave here.

    If you have read Michael Howard’s blog for a while, you may recall that our team held a two-and-a-half day SDL training session back in November for fifty senior engineers from a number of the hardware OEMs and some of their component suppliers.  At that time, we heard strong feedback from the group that they wanted us to work with other sectors of the IT community (particularly with ISVs) to get more people attuned to the importance of security development practices. Since that time we have been doing mostly individual engagements – informally briefing and working with customers, partners and others, usually on a one-to-one basis.
     
    Yesterday, we held a somewhat shorter version of the November session for an interesting mix of security-focused groups as part of the Microsoft Security Response and Safety Summit (MSRSS).  MSRSS is an event focused on sharing information with a number of different security groups that comprise the Microsoft Security Response Alliance and the Secure IT Alliance.  Our track was one of two running concurrently – we had forty or so in attendance from government, ISPs, AV vendors, CERTs, security ISVs and the like.

    Scott Charney opened the event with a discussion about the evolving threat environment – he emphasized the need for cooperation and information sharing from all the players in the IT security space. After Scott’s call to action, we moved into the training portion of the day.  Most of the SDL blog gang presented – Eric Bidstrup provided an overview of the SDL; Adam Shostack discussed the threat modeling process and how it is evolving; Michael Howard did a variant of his “Writing Secure Code” talk; James Whittaker talked about security testing and the final session focused on privacy, with Tina Knutson and Sue Glueck.  In November, I gave a talk on the intersection of security policy and development practice – given the time constraint, this time I gracefully accepted the role of emcee and checked hats and coats for the attendees.  : ^ )

    At first I was a little concerned about the potential for “impedance mismatch” – after all, we were talking about the various facets of security development methodology and this was an event traditionally focused on how to collaborate effectively in a time of security crisis.  It occurred to me about half way through the opening talk that it was okay if this wasn’t a developer to developer talk – everyone in the room had “a dog in the security fight” and as a result, it’s incumbent upon us to have as many conversations with folks in the community (regardless of discipline) as we can.

    Exposure to these partner groups provided us with interesting perspectives – the attendees asked good questions (a few of them pointy) and many of us were corralled during the breaks for robust discussions on a variety of subjects.
     
    Some of you may be thinking “So what? Microsoft had another security event – whoopee!!”  Fair enough.  However, in our defense, I’d like to make two points.  First, I think it’s mildly amusing that the notion of Microsoft hosting a security collaboration event has become so commonplace – it wasn’t so long ago that Microsoft and security couldn’t be uttered in the same sentence without fits of laughter – it’s interesting how times change.  Second (and of far greater importance), we strongly believe that it’s in the best interest of protecting our mutual customers to work with these groups and share our knowledge. Conversely, it’s an opportunity for us to learn about emerging issues and concerns that we typically aren’t exposed to on a day-to-day basis.

    Bottom Line: It was a great opportunity and a practice that we will no doubt carry forward.

  • The Security Development Lifecycle

    The Making of a Privacy Savvy Test Team


    Rob Roberts here.

    Software test engineers have a lot of things to consider when testing their products: performance, security, accessibility, reliability, usability, and a whole bunch of other “-ilities.” And now to address our increasingly interconnected world, they have yet another important area to evaluate - privacy.

    It’s hard to imagine a testing topic more arcane than security but privacy may be just that. Just like software security, privacy isn’t taught in universities, there are few engineering-oriented books on the subject, and just what it means to do privacy testing is still being worked out. You can’t have privacy without security, but there are many other things to consider around privacy besides keeping data secure.  To help narrow what privacy behaviors testers should consider, we provide an approach for test teams as well as some basic tools they can use to understand the specific privacy scenarios relevant to their application. 

    The definitive source for all things privacy in the SDL is our internal privacy guidelines for developing products and services (the public version is posted here).   This document considers a wide range of privacy scenarios from storing customer information in the enterprise to privacy considerations around developing and publishing Web sites.  It includes quite a bit of detail – something our privacy Subject Matter Experts (SMEs) depend on to do thoughtful and thorough privacy reviews.  While some privacy testers may want to understand every nuance in this material, many would prefer to be given something that was more focused so they can concentrate on what they do best: breaking software.  To streamline the development of privacy “test” scenarios, we distilled the guidance and identified these three key steps for testers:

    Step 1 Understand what data is collected and how it is used.

    Test Engineers examine documentation (threat models, specs, design documents and so forth), meet with feature experts, and review source code to determine what data is being collected, stored and/or transferred by their application. This “data” includes all input, local storage (temporary or persistent) or remote storage that the application can access. Once they know what data is in play, they must classify it by type (anonymous, PII, sensitive PII), context (local storage vs. transfer off the system), and visibility (e.g., hidden metadata).  If necessary the tester can then validate these classifications with a privacy SME to determine what user information is potentially at risk, and how that risk is mitigated by the application.

    When testers know what the data is and exactly how it flows through the system being tested, they identify the privacy impacting scenarios of their application.  Based on our privacy guidelines we created a “Privacy Bug Bar” to describe potential privacy bugs and their severity. 

    A tool we are prototyping for select internal groups is an interactive form for test engineers to identify privacy impacting behaviors and receive a summary of the applicable privacy scenarios (from the privacy guidelines) along with pre-defined bug bar information for those scenarios.  This tool is expected to save test engineers time in assessing privacy issues by giving them only the relevant privacy scenarios that affect their project.

    Step 2 Review the application’s privacy statement.

    During development privacy SMEs work with teams to create privacy statements for projects with privacy impacting behaviors.  An effective privacy statement informs users how the product, service, or web site behavior impacts their privacy, and what controls are available to change those behaviors.  Part of the tester’s responsibility is to ensure the published privacy statement accurately represents what their application does.

    Step 3 Verify data use.

    Armed with the knowledge of relevant privacy behaviors from steps 1 and 2, testers can now enhance their test cases by including privacy-aware scenarios throughout their functional testing.  If their testing discovers a violation of the guidelines then the predefined bug bar will help them rate its severity.

    One of our biggest concerns in the privacy space is off-system communication, or what we call “phoning home.” An example of phoning home is software that goes online to check for newer versions or security updates.  Applications often rely on the ability to communicate across networks, but as data moves further away from the local system, risk of data exposure increases. When applications phone home, care needs to be taken to ensure that this communication happens only with the customer’s permission, that they know what data is being sent, that the data is sent securely, and that they know how it will be used upon reaching its destination.  Currently many test teams rely on very basic network monitoring to validate when their application is phoning home.  This is a manual and time-consuming process that we are working to automate.
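The four conditions in the paragraph above lend themselves to a simple checklist that a test harness could apply to each observed transmission. The sketch below is only an illustration of that idea, not the tooling we are building; the structure, flag names and function are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What a tester recorded about one off-system transmission. */
typedef struct {
    bool user_consented;    /* sent only with the customer's permission */
    bool data_disclosed;    /* customer knows what data is being sent */
    bool sent_securely;     /* transport protects the data in transit */
    bool usage_disclosed;   /* customer knows how the data will be used */
} PhoneHomeEvent;

/* One bit per unmet condition, so several problems can be reported at once. */
enum {
    PH_NO_CONSENT         = 1 << 0,
    PH_DATA_UNDISCLOSED   = 1 << 1,
    PH_INSECURE_TRANSPORT = 1 << 2,
    PH_USAGE_UNDISCLOSED  = 1 << 3
};

/* Returns a bitmask of unmet conditions; 0 means all four checks passed. */
static unsigned phone_home_violations(const PhoneHomeEvent *event)
{
    unsigned flags = 0;
    assert(event != NULL);
    if (!event->user_consented)  flags |= PH_NO_CONSENT;
    if (!event->data_disclosed)  flags |= PH_DATA_UNDISCLOSED;
    if (!event->sent_securely)   flags |= PH_INSECURE_TRANSPORT;
    if (!event->usage_disclosed) flags |= PH_USAGE_UNDISCLOSED;
    return flags;
}
```

A nonzero result would then be rated for severity against the bug bar, which this sketch does not attempt to model.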

    With this approach and our ongoing commitment to automation, we are working to embed privacy into the testing process so that testers understand the relevant privacy scenarios for the software they are testing and enhance their ability to discover problems before the software ships.

    Privacy is important to users, and is a major component of building trustworthy software.  Testers play a key role in meeting the privacy demands that the SDL places on Microsoft products and services.
