Lessons learned from the Animated Cursor Security Bug

Lessons learned from the Animated Cursor Security Bug

Rate This
  • Comments 22

Michael Howard here.

A core tenet of the SDL is to take and incorporate lessons learned when we issue a security update, and there is a great deal to learn from the recent animated cursor bug, MS07-017, so I want to spend a few minutes to go over some of the things we have learned from this bug. 

First of all, this code is pretty old; is in Windows 2000, and predates the SDL. The SDL has parts (i.e., design review, threat modeling, testing, and security push) that focus on the product as a whole, and parts (i.e., code review and use of tools) that are focused on code.  In the Windows Vista process, we banned certain APIs, like strcpy and strncpy, and changed well over 140,000 calls to use safer calls.  memcpy wasn’t on that list. We also built in a lot of defense-in-depth measures because we know that the SDL can’t catch everything.  Let’s start by looking at some of the defense-in-depth measures we have in place that didn’t stop the threat:

-GS

The bug is a stack-based buffer overrun in code that looks like this:

HICON LoadAniIcon() {

    ...
    ANIHEADER myANIheader;

    memcpy(&myANIheader, untrustedData->headerdata, untrustedData->headerlength);

 

Because there are no candidate buffers on the function’s stack, there is no -GS cookie added to the stack, even though the code is compiled with -GS. This is not the first time we’ve seen code with no cookie, and this has made us rethink the heuristics used by the compiler when it determines whether to place a cookie on the stack or not. But, changing the compiler is a long-term task. In the short-term, we have a new compiler pragma that forces the compiler to be much more aggressive, and we will start using this pragma on new code.

ASLR & SafeSEH

The next issue is that the code is wrapped in an exception handler that catches code failures. This can usually be good for reliability, but it has an interesting security side-effect:  Windows Vista includes address space layout randomization (ASLR), and the goal of ASLR is to reduce the likelihood that an attacker can determine the address of critical functions.  This makes it harder to make exploits run correctly. But, if the vulnerable code is wrapped in an exception handler that catches many errors, a failed attempt will not crash the component and the attacker can try again with a different set of addresses. David LeBlanc mentions the dangers of exception handling in a recent blog post.

Note that by itself the ”catch everything” construct is not a security bug, but it can aid an attacker if the exception handler wraps vulnerable code.

Now, let’s look at analysis tools.

Analysis tools

Interestingly, our static analysis tools do not flag this construct as a security bug because it’s a very low-priority warning, and is presently not an SDL “must-fix” warning. Compiling the following code with the C++ compiler in Visual Studio 2005 or the Windows Development Kit...

void Foo() {

      __try {

      }

      __except( 1 ) {

      }

}

...yields the warning:

 

1>c:\code\demo\demo.cpp(36) : warning C6320: Exception-filter expression is the constant EXCEPTION_EXECUTE_HANDLER. This might mask exceptions that were not intended to be handled

 

We’re investigating this issue further to determine ways of finding exception handlers that may wrap potentially vulnerable code.

The next question is:  why did our static analysis tools not find this bug? Code that uses calls such as memcpy is hard to flag as vulnerable without generating a great many false positives. This is a research problem that no one has solved, here or elsewhere.  Another angle is to replace calls to memcpy with memcpy_s which forces the developer to think about the destination buffer size. We may ban memcpy for new code, but we still need to analyze this further. Stay tuned.

SAL

Another interesting note is that this code is really hard to annotate with SAL because the function takes a PVOID buffer and the buffer length is buried inside a structure that is passed by reference!  SAL can help find many bugs, however. Take the following code:

char Sample(div_t *i, size_t n) {

   char c[32];

   memcpy(c,(void*)i,n*sizeof div_t);

   return c[0];

}

char y = Sample((div_t*)dwArray,21);

 

This would normally not be flagged as an error, but when you add SAL annotations:

char Sample(__in_ecount(n) div_t *i, size_t n)

The compiler generates the following warnings:

1>c:\code\demo\demo.cpp(55) : warning C6385: Invalid data: accessing 'argument 1', the readable size is '84' bytes, but '168' bytes might be read: Lines: 50, 55

1>c:\ code\demo\demo.cpp(55) : warning C6202: Buffer overrun for 'dwArray', which is possibly stack allocated, in call to 'Sample': length '168' exceeds buffer size '84'

Fuzz Testing

Next:  why did our fuzz tests not find the bug? The animated cursor code was fuzz-tested extensively per the SDL requirements. It turns out none of the .ANI fuzz templates had a second “anih” record.  This is now addressed, and we are continually enhancing our fuzzing tools to make sure they add manipulations that duplicate arbitrary object elements better.

A Silver Lining: IE Defenses

There is one silver lining: on Windows Vista, known malicious exploits that affected real customers were stopped dead because of UAC and Protected Mode IE; that is not to say exploits could not be written that are UAC/Protected Mode aware.

Our goals are to code well, catch bugs, and reduce the chance that new bugs enter the code. We have a set of features in place to catch things the process misses.  We have a learning process and layers of defense in depth because we don’t expect perfect code -- ever.

Summary

In summary, SDL is not perfect, nor will it ever ever be perfect.  We still have work to do, and this bug shows that. We have a new -GS pragma that adds more stack cookies; we’ve updated our fuzz tools; we will pay closer attention to exception handlers that could mask vulnerabilities, and we will investigate the impact of banning memcpy for new code. Finally, we will update our education as necessary with lessons learned from this bug.

-Michael

Comments
  • I just posted an analysis over on the SDL blog of the lessons we learned from the recent animated curoser

  • Any time you catch an exception you don't know how to handle, you've got a bug. And it can sometimes even be a security bug, as shown here. Any code that executes while the system is in an unknown state is potentially dangerous code. If there is an access violation, the system should nearly always be considered to be in an unknown state. Something has happened that shouldn't have happened and all bets are off. Trying to recover is just scary.

    The sad part about this is that /EHa allows access violations to be caught accidentally. C# allows access violations to be caught accidentally (in fact, the design of the exception hierarchy makes it VERY HARD to avoid catching them).

  • My team at work announced the launch of "The Security Development Lifecycle" blog today. After the intro post, Michael Howard leads off with "Lessons Learned from the Animated Cursor Security Bug." I'm pretty excited. We're focused on transparency around

  • Thursday, April 26, 2007 5:11 PM by MSRCTEAM SDL Lessons learned from MS07-017 Hi everyone this is Adrian

  • The current solutions still fail.  Even when they catch an cookie, they result in an application failing.  When an intrustion fails, it still functions as a denial of service.

    If you don't bolt security in from the moment you type main(, you don't have any.

    Dropping functions like strcpy and memcpy make it easier to avoid bad patterns, but the problem starts even earlier - you MUST check the data you accepting to see if it is valid.  Does it fit into a buffer?  Does the size value provided match the size of the data provided?  What is the extra bit of data hanging off the end?  Is a 200K URL reasonable?  How many headers does an HTTP request really need?

    Should an app really have to crash and burn just because data arrives that can easily be identified as bogus instead of tossing it out, and possibly disconnecting a session it is involved with?

    Security has to be written right into the code, you can't add it in later with a bit of compiler magic.

    Pretending you can "fix" old code with a few compiler flags is going to result in problems like this repeating endlessly.  At some point, you have to stop, go back and FIX the foundation, or you might as well be building on quicksand.

  • Xepol, that's one of the reasons we're seriously looking at memcpy --> memcpy_s.

    -GS is not 'compiler magic' it's there as a backup, "just in case."

  • You didn't answer the most interesting question: Why did Microsoft fail to find the MS07-017 vulnerability when you were working on the MS05-002 patch?

    All you had to do was search the source code for all other places where the ReadChunk function was used (that's the function that actually does the memcpy). There are only three callers to it - one of them is safe, one of them was the cause of MS05-002 and the third one of MS07-017. This is exactly how I found the the MS07-017 bug - I was actually looking at the changes in the MS05-002 patch and decided to look at the other callers to ReadChunk.

    The SDL process certainly requires you to audit the code for any related vulnerabilities (page 197, Secure Development Lifecycle, Howard and Lipner). Why did it fail in this case?

  • First off I would like to give Microsoft kudos for launching this blog and using it to educate developers on secure coding using real world examples.  I always find real world problems sink in much deeper than contrived examples.

    In the memcpy example above you show that the size is coming from untrustedData->headerlength.  Couldn't you do taint analysis on this value and see that this size is coming from untrusted data. A very dangerous situation. If that is the case the static analyzer should flag it and have the developer review that the size was properly validated.

    Sure this is noisy but not as noisy as flagging all memcpy's that have a non-static size.  Then of course you have to weigh the work of weeding out that noise vs. better techniques for finding this problem such as better file fuzzing.

    -Chris

  • Chris, thanks for the kind words. As you can see, we still have more work to do wrt tainting, but the problem, as you point out is noise. One thing we've done in the past is run noisy tools and have experts hand triage the output. But it's very slow going!

    FWIW, we found the bug pretty quickly with the updated fuzzer!

    -Michael

  • In place of "But, changing the compiler is a long-term task." please read "We don't want to do this because we'd have to recompile the whole system and ship SP1 on a new DVD."

    "Another angle is to replace calls to memcpy with memcpy_s which forces the developer to think about the destination buffer size."

    Since when ?  A simple script could change most of the uses of memcpy automatically.  While such a change may have prevented this bug, it wouldn't solve the real issue.

    C.Wysopal's response on "taint analysis" is spot on.  The core issue here is failure to validate that tainted data before using it.  Everything else discussed is just a band-aid to cover for that error.

  • TimF,

    >>A simple script could change most of the uses of memcpy automatically

    yeah, and probably get it wrong, too!

  • Michael Howard provides a very insightful look at how the animated cursor bug bypassed the numerous security

  • Last month, a critical security bug was found in most versions of Windows. This bug generated a lot of

  • A very interesting blog: The Security Development Lifecycle It has recent posts on Security Education

  • Ciekawa analiza tego, dlaczego błąd w plikach ANI znalazł się również w VISTA mimo zastosowania SDL. Błędy zawsze się zdarzały i będą się zdarzać. Najważniejsze jest to, by z błędów wyciągnąć lekcję na przyszłość. Microsoft zacz

Page 1 of 2 (22 items) 12
Leave a Comment
  • Please add 4 and 1 and type the answer here:
  • Post