Lessons Learned from MS07-029: The DNS RPC Interface Buffer Overrun


Hi, Michael Howard here (again).

Before I get started on this post, I want to set some expectations. My plan is to blog occasionally about our root cause analyses, but I will not blog about every vulnerability we fix, simply because I don't have the time. While we analyze each and every vulnerability we address in a bulletin, I want to highlight only those that can help customers better understand particular elements of how the SDL works. With that in mind, I will blog about vulnerabilities that interest me, and most will be critical in one or more of our products. Finally, I will outline specifically how the vulnerability in question relates to a potential SDL change or improvement.

As you are probably aware, Microsoft recently issued a security bulletin that fixed a security vulnerability in the DNS server code in Windows Server components. I want to spend a couple of minutes explaining the vulnerability, how various versions of Windows Server were affected and, most importantly, what we learned from the vulnerability.

Which products are affected?

It's important to note that this vulnerability is in the DNS server code and exists only on the Windows server platforms. Windows XP and Windows Vista users are not affected at all because the client operating systems do not include DNS server code, only DNS client code.

The DNS server is not enabled by default, except when Small Business Server is installed, or the computer is configured as an Active Directory domain controller.

Windows 2000 and Windows Server 2003 systems running DNS are affected, as is "Longhorn" Server beta 2, but we fixed the vulnerability in time for beta 3. In fact, to protect customers, we chose to run the real risk of slipping beta 3 to include this change.

The Nature of the Vulnerability

The vulnerability is in code listening on ephemeral RPC ports used for DNS server management; it is not in the mainline port 53 DNS processing code. Since Windows XP SP2, we have made RPC communication authenticated by default; this was a direct outcome of lessons learned from the Blaster worm. But what's interesting is that this RPC end-point is anonymously accessible.

In and of itself, anonymous access is not a security vulnerability. It's an attack surface issue, because an anonymously accessible entry point has a larger attack surface than an authenticated entry point.

The Code

The vulnerability is a stack-based buffer overrun in a structure; the structure that's overrun is:

typedef struct _CountName {
    UCHAR   Length;
    UCHAR   LabelCount;
    CHAR    RawName[ DNS_MAX_NAME_LENGTH+1 ];
} COUNT_NAME, *PCOUNT_NAME;

A pointer to this structure is passed to the following function, along with the DNS name to crack. The last element of the structure, pCountName->RawName, is overwritten by untrusted data. Untrusted code and data constructs appeared in red in the original formatting.

DNS_STATUS Name_ConvertFileNameToCountName(
    PCOUNT_NAME     pCountName,
    PCHAR           pchName,
    DWORD           cchNameLength) {

    PCHAR       pch;
    UCHAR       ch;
    PCHAR       pchstartLabel;      // ptr to start of label
    PCHAR       pchend;             // ptr to end of name
    PCHAR       presult;
    PCHAR       presultLabel;
    PCHAR       presultMax;
    WORD        charType = 0;
    WORD        maskDowncase;
    DNS_STATUS  status;
    INT         labelLength;        // length of current label
    UCHAR       labelCount = 0;

    //  result buffer, leave space for label
    presultLabel = presult = pCountName->RawName;
    presultMax = presult + DNS_MAX_NAME_LENGTH;
    presult++;

    //  Character selection mask
    //      '\' slash quote
    //      '.' dot label separator are special chars
    //      upper case must be downcased
    //      everything else is copied
    maskDowncase = B_UPPER;

    //  setup start and end ptrs and verify length
    pchstartLabel = pch = pchName;
    if ( !cchNameLength )
        cchNameLength = strlen( pch );

    pchend = pch + cchNameLength;

    while ( pch ) {

        if ( pch >= pchend ) {
            ch = 0;
            charType = FC_NULL;
        }

        ...

        //  downcase upper case
        if ( charType & maskDowncase ) {
            //  if name exceeds DNS name max => invalid
            if ( presult >= presultMax )
                goto InvalidName;

            *presult++ = DOWNCASE_ASCII(ch);
            continue;
        }

        if ( charType & B_DOT ) {
            //  verify label length
            labelLength = (int)(presult - presultLabel - 1);

            if ( labelLength > DNS_MAX_LABEL_LENGTH )
                goto InvalidName;

            //  set label count in result name
            *presultLabel = (CHAR)labelLength;
            presultLabel = presult++;

            if ( pch >= pchend ) {
                if ( labelLength != 0 ) {
                    labelCount++;
                    *presultLabel = 0;
                    break;
                }

                presult--;
                break;
            }

            //  set up for next label
            if ( labelLength != 0 ) {
                labelCount++;
                continue;
            }

            goto InvalidName;
        }

        //  quoted character
        //      - single quote just get next char
        //      - octal quote read up to three octal characters
        else if ( ch == SLASH_CHAR ) {
            //  if name exceeds DNS name max => invalid
            if ( presult >= presultMax )
                goto InvalidName;

            << length of presult is not constrained >>
            << extractQuotedChar overwrites pch >>
            pch = extractQuotedChar(
                    presult++,
                    pch,
                    pchend );
        }
    }

As you can see, there is a lot of boundary checking in the code, but the critical check, the one that constrains presult to pch, is missing, and so we have this vulnerability. The lesson to learn here is that it only takes one missing check!
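To illustrate the "one missing check" point, here is a minimal sketch of an escape decoder in which every store into the destination goes through a single bounds check. It is not the shipped fix: decode_escaped, its parameters, and its return convention are hypothetical names chosen for illustration, and the escape rules are simplified to the two forms named in the code comments above (a single quoted character, or up to three octal digits).

```c
#include <stddef.h>

/* Hypothetical helper, not the shipped fix: decodes "\c" and "\ooo" escapes
 * from src into dst, refusing any write at or past dst + dst_cap.
 * Returns the number of bytes written, or -1 if the decoded name does not
 * fit or the input ends in a dangling backslash. */
static int decode_escaped(char *dst, size_t dst_cap,
                          const char *src, size_t src_len)
{
    size_t in = 0, out = 0;

    while (in < src_len) {
        unsigned char ch = (unsigned char)src[in++];

        if (ch == '\\') {
            if (in >= src_len)
                return -1;                      /* dangling backslash */
            if (src[in] >= '0' && src[in] <= '7') {
                /* octal quote: read up to three octal digits */
                unsigned value = 0;
                int digits = 0;
                while (digits < 3 && in < src_len &&
                       src[in] >= '0' && src[in] <= '7') {
                    value = value * 8 + (unsigned)(src[in++] - '0');
                    digits++;
                }
                ch = (unsigned char)value;
            } else {
                ch = (unsigned char)src[in++];  /* single quoted character */
            }
        }

        /* The one check that guards every store, quoted or not. */
        if (out >= dst_cap)
            return -1;
        dst[out++] = (char)ch;
    }
    return (int)out;
}
```

With an 8-byte destination, the 4-character input a\.b decodes to 3 bytes, while an input of nine quoted characters is rejected with -1 instead of writing a ninth byte.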

Analysis Tools

Our static analysis tools didn't find this because of the nature of the structure that was overrun:

typedef struct _CountName {
    UCHAR   Length;
    UCHAR   LabelCount;
    CHAR    RawName[ DNS_MAX_NAME_LENGTH+1 ];
} COUNT_NAME, *PCOUNT_NAME;

Look at the last element: it's a buffer. There are a number of structures that follow the "put a buffer at the end" pattern, for example the security identifier (SID) structure:

typedef struct _SID {
   BYTE  Revision;
   BYTE  SubAuthorityCount;
   SID_IDENTIFIER_AUTHORITY IdentifierAuthority;
   DWORD SubAuthority[ANYSIZE_ARRAY];
} SID, *PSID;

The buffer at the end of the SID structure is a series of DWORDs whose declared size, ANYSIZE_ARRAY, is set to 1. These are often referred to as variable-length arrays, and our static analysis tools are tuned to look for some of these constructs. But PCOUNT_NAME->RawName is not a variable-length array; it's a fixed-size array, and right now our tools do not analyze such constructs. We've proposed changes to our static analysis tools to allow us to find this vulnerability type.

In short, the combination of missing constraints on data and a fixed-size buffer at the end of a structure, rather than a true variable-length array, foiled our static analyzers.
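The difference between the two trailing-buffer patterns can be sketched as follows. This is an illustration only: count_name_t and sid_like_t are simplified stand-ins for COUNT_NAME and SID, MAX_NAME_LENGTH is an assumed value for DNS_MAX_NAME_LENGTH, and the two helper functions are hypothetical. For the fixed array the capacity is available from the declaration itself; for the variable-length array it depends on a runtime count.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NAME_LENGTH 255   /* assumed stand-in for DNS_MAX_NAME_LENGTH */

/* Fixed trailing buffer: the capacity is known at compile time. */
typedef struct {
    unsigned char Length;
    unsigned char LabelCount;
    char          RawName[MAX_NAME_LENGTH + 1];
} count_name_t;

/* Variable-length trailing array: declared with one element and
 * allocated with as many elements as the count field says. */
typedef struct {
    uint8_t  Revision;
    uint8_t  SubAuthorityCount;
    uint8_t  IdentifierAuthority[6];
    uint32_t SubAuthority[1];
} sid_like_t;

/* Capacity of the fixed buffer, taken straight from the declaration. */
static size_t raw_name_capacity(void)
{
    return sizeof(((count_name_t *)0)->RawName);
}

/* Bytes needed for a sid_like_t holding sub_authority_count entries. */
static size_t sid_like_size(uint8_t sub_authority_count)
{
    return offsetof(sid_like_t, SubAuthority)
         + (size_t)sub_authority_count * sizeof(uint32_t);
}
```

Under these assumptions raw_name_capacity() is 256 regardless of input, whereas sid_like_size(5) is 28 and changes with the count, which is why the two patterns call for different analysis rules.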

Fuzzing

We performed minimal RPC fuzz testing on this interface because we understood this to be an administrator-only-accessible interface. We didn't discover this vulnerability because previously our process did not include tooling to verify whether an RPC end-point is authenticated or not. It's important to understand that given a set of interfaces into a system, analysis and testing are prioritized based on accessibility. For example, a remotely and anonymously accessible network interface will get much more scrutiny than a local-admin-only interface.

Keep in mind that fuzzing can only do so much, and in this case, the string required to trigger the vulnerable code path was a series of backslashes, each followed by either an octal number or a single character. The probability of this specific string being generated at random is very small. We're working to make our fuzzers smarter in this regard. Fuzzing is a practice that constantly evolves. Clearly, there is contextual information from the code and from the input domain that can be used for the next iteration of our fuzzing tools. This really shows that fuzz testing is not a security testing panacea; it's very effective, but it's not going to uncover all vulnerabilities.
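As a rough sketch of what "context-centric" generation could look like for this input shape, the hypothetical generator below emits only escape sequences, each a backslash followed by either three octal digits or one letter. It is an assumption-laden illustration, not a description of Microsoft's fuzzers; gen_escaped_name and its parameters are invented for the example, and rand() is used only for brevity.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical structure-aware generator: fills out with `count` escape
 * sequences, each either "\ooo" (three octal digits, value <= 0377) or
 * "\c" (one lowercase letter), then NUL-terminates the string.
 * Returns the string length, or 0 (with out set to "") if out_cap is
 * too small to hold the requested sequences. */
static size_t gen_escaped_name(char *out, size_t out_cap,
                               size_t count, unsigned seed)
{
    size_t n = 0;

    if (out_cap == 0)
        return 0;
    srand(seed);

    for (size_t i = 0; i < count; i++) {
        /* Worst case is 4 bytes for "\ooo" plus 1 for the final NUL. */
        if (n + 4 >= out_cap) {
            out[0] = '\0';
            return 0;
        }
        out[n++] = '\\';
        if (rand() % 2) {
            out[n++] = (char)('0' + rand() % 4);   /* keeps value <= 0377 */
            out[n++] = (char)('0' + rand() % 8);
            out[n++] = (char)('0' + rand() % 8);
        } else {
            out[n++] = (char)('a' + rand() % 26);
        }
    }
    out[n] = '\0';
    return n;
}
```

A purely random byte generator would have to stumble on long runs of well-formed escapes; a generator like this produces nothing else, so every test case reaches the quoted-character branch.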

Operating System Defenses

Of great interest are the defenses that come into play. Remember, the goals of SDL are two-fold:

  • The first is to reduce the number of security vulnerabilities in the code
  • The second is to reduce the severity of the vulnerabilities that are not found by the current SDL process.

The rest of this section outlines which versions of Windows have which defenses. If you are not familiar with /GS, /SafeSEH, Address Space Layout Randomization (ASLR) and Data Execution Prevention (DEP/NX), you should read one of my previous blog posts on the subject.

In Windows 2000 there are no defenses at all because Windows 2000 pre-dates SDL. In Windows 2000 there is no firewall, no /GS, no DEP/NX and no ASLR, so exploit code runs predictably and easily.

Windows Server 2003 is compiled with /GS and DEP/NX is available, but there is no ASLR. There is a firewall in Windows Server 2003, but it is not enabled by default. For some coding constructs, real-world exploits have circumvented the /GS implementation in Windows Server 2003 in the past.

Windows "Longhorn" Server, now named Windows Server 2008, is compiled with /GS and linked with /SafeSEH. DEP/NX and ASLR are both enabled by default, and the stack is randomized too (the heap is randomized as well, but that's not relevant to this vulnerability). The firewall in Windows "Longhorn" Server and later is enabled by default. This combination of defenses raises the bar substantially. In fact, /GS and /SafeSEH alone would have been good enough to mitigate attacks because all known /GS bypass techniques are mitigated by code created by the VC++ 2005 compiler. Of course, new exploit techniques to bypass /GS could be found in the future.

Another interesting point is that in Windows Server 2003 it is possible in some cases for an exploit to disable DEP/NX <link: http://uninformed.org/?v=2&a=4>. This attack is not effective in Windows Vista and Windows "Longhorn" Server, because we won't allow user-mode code to change the status of its DEP/NX bit once it is set.

Another critical defense is service restart policy. Combined with ASLR, this makes exploits less viable by limiting the number of attacks possible. At the same time, ASLR increases the number of attempts required by adding entropy to the addresses an exploit depends on.

Simply put, when a service crashes, Windows can take action such as running another process or restarting the service. In the case of the DNS service on Windows "Longhorn" Server, the service will restart at most twice within 24 hours. Thus, within 24 hours, if there are three crashes, the following sequence of events will happen:

  • Attack. Crash. Restart.
  • Attack. Crash. Restart.
  • Attack. Crash. Dead.

In other words, in an attempt to circumvent ASLR, an attacker has only three tries to get critical addresses right, after which they're prevented from additional tries because the service is no longer running. Also, there is a two-minute delay between restarts, substantially slowing down the attacker and foiling automated attempts.
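The attack/crash/restart sequence above can be modeled in a few lines. This is a toy model of the policy as described in this post, not the Service Control Manager's implementation: service_model_t and both functions are hypothetical, and the 24-hour window and two-minute delay are deliberately left out so that only the restart budget is shown.

```c
#include <stdbool.h>

/* Toy model of a restart budget within a single 24-hour window. */
typedef struct {
    int  restarts_allowed;   /* 2 for the DNS service, per the text above */
    int  restarts_used;
    bool running;
} service_model_t;

static service_model_t service_model_start(int restarts_allowed)
{
    service_model_t s = { restarts_allowed, 0, true };
    return s;
}

/* Records one crash. Returns true if the service comes back up,
 * false if the restart budget is spent and the service stays down. */
static bool service_model_crash(service_model_t *s)
{
    if (!s->running)
        return false;
    if (s->restarts_used < s->restarts_allowed) {
        s->restarts_used++;
        return true;
    }
    s->running = false;
    return false;
}
```

Starting with a budget of two, the first two calls to service_model_crash return true and the third returns false, matching the three tries an attacker gets.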

For architectural and performance reasons, ASLR uses 8 bits of entropy, so an attacker needs at most 256 attempts to defeat ASLR. We had a great deal of discussion internally about how to make the tradeoff between the extra reliability afforded by services restarting and the extent to which ASLR is undermined when a service does restart. We decided that two restarts a day was a reasonable tradeoff: if a service is crashing so frequently that it needs to restart more than twice a day, there are only two possibilities: 1) the service is under attack, and the smart thing to do is to stay down, or 2) the service is buggy and should be fixed. Customers can control the restart behavior on a service-by-service basis, but we don't recommend it.
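To put rough numbers on that tradeoff, the sketch below computes an attacker's chance of guessing correctly within a given number of tries. It assumes the 256 positions are equally likely, which is a simplification; the function names are hypothetical, and the two functions cover the two possible cases (the layout stays put between tries, or it is re-randomized before each one).

```c
/* Chance that at least one of `tries` distinct guesses hits, when the
 * target sits in one of `slots` equally likely positions that do not
 * move between tries. */
static double p_hit_fixed_layout(unsigned tries, unsigned slots)
{
    if (slots == 0)
        return 0.0;
    return tries >= slots ? 1.0 : (double)tries / (double)slots;
}

/* Same question, but the position is re-randomized independently
 * before every try, so earlier misses reveal nothing. */
static double p_hit_rerandomized(unsigned tries, unsigned slots)
{
    double miss = 1.0;

    if (slots == 0)
        return 0.0;
    for (unsigned i = 0; i < tries; i++)
        miss *= (double)(slots - 1) / (double)slots;
    return 1.0 - miss;
}
```

With 3 tries and 256 positions, both models give a little under 1.2 percent per 24-hour window (3/256 in the first case), compared with certainty for an attacker allowed all 256 attempts.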

Note that this doesn't create a denial-of-service (DoS) threat where one didn't exist before. Instead, we've limited the number of times that a service will allow itself to remain under attack.

Summary

There is a lot to learn from the DNS RPC vulnerability. As an outcome of this vulnerability, we are more carefully scrubbing all RPC end-points to verify whether they should really be anonymously accessible. We have also updated our fuzzers to add more context-centric test cases and these updates are now in use. Our static analysis tools will be updated to accommodate more variable-length array variants.

We have also updated our internal training to make sure people understand that they must validate their assumptions about network end-point accessibility.

On the good news front (I am an optimist, after all!), the defenses in Windows "Longhorn" Server beta 2 looked solid; even though it had the vulnerability, the combination of...

  • DEP/NX
  • ASLR
  • /GS
  • Stack randomization
  • /SafeSEH
  • Firewall
  • Service Restart Policy

... are good mitigations as they make it substantially harder for exploit code to run reliably.

- Michael

(Thanks to all the folks who helped put this analysis together: Chris Walker, Chris Budd, Eric Bidstrup, Nitin Kumar Goel, Steve Lipner, Shawn Hernan, Adam Shostack, James Whittaker and Dave Ladd.)

Comments
  • Can you comment on two items?

    - Do you think other static analyzers would have caught this item?  

    - Were the threat models correct and this was missed during the testing phase with respect to attack surface, or was the threat model and other documentation incorrect hence the lower priority given this interface?

  • asteingruebl,

    i don't know if other analyzers would find this - it would be interesting to try other tools.

    the interface was given lower pri because it was deemed admin-only

    -michael


  • Thank you!
