ATL, MS09-035 and the SDL

ATL, MS09-035 and the SDL

Rate This
  • Comments 17

Hello, Michael here.

<updated: 7/31 - changed the compiler 'warning' to 'error'>

Today, the Microsoft Security Response Center (MSRC) released two out-of-band security bulletins, MS09-034 and MS09-035, and a Security Advisory, to address security bugs in the Active Template Library (ATL) and I think it’s appropriate that I explain why the SDL did not find these bugs and what we learned.

I’ve said this many times, but I’ll say it again, because I think it bears repeating. A bug of any kind is an opportunity to learn and then adjust your development practices if appropriate. In this post I will only outline the bugs fixed in ATL, not any of the defense-in-depth mechanisms added to Internet Explorer as part of MS09-034. You can find more information about the IE update in Dave Ross’s blog post at Security Research & Defense.

But before I explain the bugs, I want to spend a couple of minutes to explain ATL. The Active Template Library (ATL) is a set of lightweight C++ classes originally designed to make creating COM objects easier. ATL handles all the object reference counting and handles common COM tasks with ease. But ATL is not restricted to COM; there are classes to handle smart pointers, images, the registry and ACLs and more.

When a developer creates a C++ project in Visual Studio, they are given the option to create an ATL project and if the developer opts to do so, the most important headers are automatically added to the project.

One final point before I discuss the bugs, the ATL source code is available for you to review; in the case of Visual Studio 2008, in the %ProgramFiles%\Microsoft Visual Studio 9.0\vc\atlmfc folder.

Now let’s dig into the bugs.

Bug #1: A Typo!

This is the core issue in the MSVidCtl ActiveX control. The bug is in a modified version of an older version of ATL, and is not in the public ATL code, but in a privately updated version of the ATL code.

Sidebar: How do ActiveX and COM differ?

Skip this section if you want to focus on the core security issues; I added this to answer to a question I get a lot. The Component Object Model (COM) is a binary specification that defines how objects can interact. An ActiveX object is a COM object. The major feature that characterizes ActiveX objects is their ability to be used from scripting languages. Doing so is often called ‘automation’. ActiveX objects use a COM interface named IDispatch which allows the script engine to resolve and call methods in the object at runtime. This is often called ‘late binding.’

The bug is simply a typo, can you spot it? I have removed extraneous code and error checking to make it easier to spot, and removed references to the psa variable (it’s a SAFEARRAYBOUND if you need to know)

__int64 cbSize;
hr = pStream->Read((void*) &cbSize, sizeof(cbSize), NULL);
BYTE *pbArray;
HRESULT hr = SafeArrayAccessData(psa, reinterpret_cast<LPVOID *>(&pbArray));
hr = pStream->Read((void*)&pbArray, (ULONG)cbSize, NULL);

I’ll give you one more clue – it’s a one character typo.

Give up? Look at the last line. The first argument is incorrect. It should be:

hr = pStream->Read((void*)pbArray, (ULONG)cbSize, NULL);

The extra ‘&’ character in the vulnerable code causes the code to write potentially untrusted data, of size cbSize, to the address of the pointer to the array, pbArray, rather than write the data into the array, and the pointer is on the stack. This is a stack-based buffer overrun vulnerability.

I contend that this would be very difficult to spot in a code review, and is not picked up by the C/C++ compiler owing to the (void*) cast. If the cast is removed, the compiler issues an error like this:

C2664: '<function>' : cannot convert parameter 1 from 'BYTE **' to 'BYTE *'

I despise C-style casting because it’s utterly unsafe; C++ casting is safer, although the reinterpret_cast operator is almost as bad as C-style casting.

So why did we miss this?

Our static analysis tools don’t flag this one because the cast tells the compiler and tools, “I know what I’m doing!” I looked over a few dozen instances of casting code like this in various code bases and they were all correct, so adding a rule to flag this kind of code would be prone to false positives and I would not want to subject anyone to a potentially massive amount of noise.

In the SDL we require that teams fuzz their controls, but our fuzzing tools didn’t find this because the method in question requires a specially formed input stream that includes many sentinel bytes. I explain the weaknesses of fuzzing here. We are in the process of adding more heuristics to our fuzzing engine so it can include these COM-specific bytes if needed.

Our banned API removal doesn’t find this because there is no banned API in play.

Some of the defenses such as ASLR and DEP in Windows might come into play, depending on the component in question. That seems like a vague answer, but I say “depending” because ATL is a source code template library that is used to build software, and it is up to the developers to use these defenses. Customers using Internet Explorer 8 on Windows Vista SP1 and later are better protected because ASLR and DEP are enabled by default.

The code is compiled with /GS, but there is no stack cookie for the vulnerable function because there are no local variables to protect, so /GS protection is ineffective in this instance.

Bug #2: Using ATL Property Maps to Instantiate a COM object

ATL allows COM objects to easily persist their properties to a stream of bytes and that byte-stream can then be re-constituted by the object at a later time. ATL does this using a ‘property map.’ The stream can be comprised of a series of tuples. When using tuples, the first portion of the tuple is the data type and, depending on the data type, a size (for example, an n-byte string [VT_BSTR]) and the second portion is the data itself.

If the data type in the stream is VT_DISPATCH or VT_UNKNOWN, then the control might be vulnerable.

The vulnerable code is in the shipping ATL source code, it’s in the CComVariant::ReadFromStream() method.

So how did we miss this? The SDL offers no requirements or recommendations about using ATL property maps; in fact, the SDL offers few practices about hosting COM containers, mainly because there are so few of them, the most well-known COM container is Internet Explorer. We do require that teams use tools to identify their Safe-for-Scripting and Safe-for-Instantiation controls, however.

In theory fuzzing should have found this, but our fuzzing engine does not build the correct stream and the stream is rejected. See the previous bug.

What We’re Doing

I want to point out that this is all very fluid right now owing to our rapid turn-around getting the bulletin out and I want to make sure we do the right thing in the SDL rather than rushing things and getting it wrong.

First and foremost, we are updating our fuzzing tools to help find COM stream-related issues quickly, and we will update the SDL to tell teams to fuzz any COM object they have using any of the risky interfaces (like IPersistStream*, IPersistStorage, etc.)

Second, we’re going to tell teams they must use the new ATL libraries. Today we have a “minimum compiler and linker toolset” requirement, but we don’t explicitly tell people which ATL to use. We’re going to change that!

Finally, I want to drill a little deeper into casting issues. This will be a side project for me over the next few months, as I wade through bug databases and code to see if there are other related issues. I’ll also speak to various static analysis and C/C++ language experts here at Microsoft and across the industry to get their views and insight. If you have a professional opinion on casting issues, please feel free to let me know through this blog.

Comments
  • "The extra ‘&’ character in the vulnerable code"

    In x86 assembly, this would cause the compiler to use the lea instruction to calculate the pointer to the data instead of using the mov instruction to load the data itself. lea instruction is opcode 8D while mov instruction is opcode 8B. In MSVIDCTL version 6.5.6001.18000, the first byte of the lea instruction is at file offset 0x465CC. Using a binary editor, you can change byte 8D at this file offset to byte 8B.

  • The compiler could and/or static analysis tool could be a bit smarter and recognize that casting foo** to bar* is not quite the same as casting foo* to bar* and that the former may deserve more attention...

  • Michael, that compiler message is no warning. It is an outright error as it should be.

    COM streams take void*, which I agree with because any single indirection will bind to [const] void*.

    http://msdn.microsoft.com/en-us/library/aa380011(VS.85).aspx

    The fact that this code did any pointer casting is a WTF. It was unnecessary, made the code uglier, and let this bug slip through. Epic fail.

  • VS 2008 SP1: 365 MB VS 2005 SP1: 250 MB  VS.NET 2003: 72 MB. BIG FAIL MS.

  • Michael,

    Based on the samples of relatively new Microsoft code I've seen (ATL, Rotor, Detours library), it's not casts that are causing problems but the refusal to wrap unsafe code into safe libraries.

    A couple of examples, off the top of my head:

    1. I wrote a library that's a rough equivalent of MS Research Detours library (minus the problems that Detours has). My library is completely type safe so you cannot redirect a function with one signature to a function with a different signature or calling convention. Detours on the other hand has a wonderful type system inspired by C. ;)

    2. Rotor (does that really share a lot of code with the CLR?) does things like manual try-catch blocks instead of offloading its job to RIIA.

    3. The fact that, for example, CString has a Lock/Unlock buffer methods instead of using RIIA through another type, or the fact that anybody is using CComPtr's & operator instead of the same pattern, clearly tells me that the only way to figure if there's a catastrophic problem with that code is to inspect it. Ditto for implicit conversions.

    4. Why didn't you wrap various COM interfaces with type-safe wrappers that can be optimized out at compile time? The interfaces you mentioned have been with us for ten years.

    Etc.

    I don't think I've written C++ code that cause an access violation in a long, long time. And you personally obviously know safe C++ coding practices. Why not implement them so that the compiler can catch them instead of requiring testing?

    Thanks,

    Dejan

  • I don't understand the subtleties of SxS and the ATL DLLs.  I ship MSI's that contain the .msp of a particular ATL version..

    If my customers install the MS update that SxS will automatically choose the new DLLs when my code requests the old, vulnerable version?

  • Yeah, I spotted the extraneous '&' as soon as I saw that code.  I guess I'm just used to seeing this sort of thing in C++ and I'm attuned to it.

    I agree that the best way to stop writing these errors is to wrap unsafe code in a typesafe library and minimize the amount of code in the wrapper.

    There are a few things in C++ that I consider to be code smells: void pointers, C-style casts and use of reinterpret_cast<>.  These things are almost always a source of errors in C++.  Oh, and reinterpret_cast<> *is* just as dangerous as a C-style cast when it comes to pointers.

  • Just had the same epiphany as h4x above, just two days later. :)

    Why the (void*) cast? Doesn't every non-const C++ pointer implicitly convert to void*? It's the line noise there that's obscuring the bug in a code inspection.

    Reply to legalize, above:

    reinterpret_cast is probably a smidge better because if I remember correctly it doesn't cast away const.

  • Your biggest problem was not the static analysis but the fact that there was a bug in the library code which provided some functionality but there was obviously nowhere a test case for that same functionality!

    I'm not saying that it's possible to have the test case for all possible code, but the library code directly included in a lot of other highly exposed projects surely should have special treatment.

    Am I right that any test which would try to reasonably use the code above (reading significantly more than 4 or 8 bytes) would crash the testing application and the bug would be discovered early enough?

  • yuhong2, the problem is simply looking for LEA vs MOV doesn't indicate vulnerability. Clearly, there's work to be done in this area!

  • kliu0x52, you are totally correct - but i looked at a bunch of code that exhibited this behaviour, and it's all safe. so we need to a way to do this without too much noise.

  • h4x, you are 100% correct, i will update the text. thanks.

  • djelovic, you could argue that adding a C-style cast to a C++ app is poor form, and i would agree. in many cases our tools do catch many of these bugs, but the C-style cast bascially tells the compiler to back off, "i know what i'm doing."

  • legalize, part of what i'm doing is going over old bugs to see if there are patterns. i think you have a point with (void*) and reinterpret_cast etc. these are on my radar!

  • acq, we do test this stuff, but it appears the test case never picked up the bug. as i point out in the post, we're updating our fuzz tests to *explicitley* find this kind of bug in the interfaces affected.

Page 1 of 2 (17 items) 12
Leave a Comment
  • Please add 6 and 1 and type the answer here:
  • Post