• The Old New Thing

    The arms race between programs and users


    There is a constant struggle between people who write programs and the people who actually use them. For example, you often see questions like, "How do I make my program so the user can't kill it?"

    Now, imagine if there were a way to do this. Ask yourself, "What would the world be like if this were possible?"

    Well, then there would be some program, say, xyz.exe, that is unkillable. Now suppose you're the user. There's this program xyz.exe that has gone haywire, so you want to exit it. But it won't let you exit. So you try to kill it, but you can't kill it either.

    This is just one of several arms races that you can imagine.

    • "I don't want anybody to kill my process." vs. "How do I kill this runaway process?"
    • "I want to shove this critical dialog in the user's face." vs. "How do I stop programs from stealing focus?"
    • "I don't want anybody to delete this file." vs. "How do I delete this file that refuses to be deleted?"
    • "How do I prevent this program from showing up in Task Manager?" vs. "How can I see all the programs that are running on my computer?"

    Eventually you have to decide which side wins, and Windows has decided to keep users in control of their own programs and data, and keep administrators in control of their own computer. So users can kill any process they want (given sufficient privileges), they can stop any program from stealing focus, and they can delete any file they want (again, given sufficient privileges).

    Programs can try to make themselves more difficult to kill (deny PROCESS_TERMINATE access, deny PROCESS_CREATE_THREAD access so people can't CreateRemoteThread(ExitProcess), deny PROCESS_VM_WRITE so people can't scribble into your stack and make you doublefault, deny PROCESS_SUSPEND_RESUME so they can't suspend you), but eventually you just can't stop them from, say, elevating to Debug privilege, debugging your process, and moving EIP to "ExitProcess".

    Notice that you can kill CSRSS.EXE and WINLOGON.EXE if you like. Your computer will get very angry at you, but you can do it. (Save your work first!)

    Another useful question to ask yourself: "What's to prevent a virus from doing the same thing?" If there were a way to do these things, then a virus could take advantage of them and make itself invisible to Task Manager, undeletable, and unkillable. Clearly you don't want that, do you?

    Bad version number checks

    Version numbers. Very important. And so many people check them wrong.

    This is why Windows 95's GetVersion function returned 3.95 instead of 4.0. A lot of code checked the version number like this:

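      UINT uVer = GetVersion();
      UINT MajorVersion = LOBYTE(uVer); // low byte: major version
      UINT MinorVersion = HIBYTE(uVer); // next byte: minor version
      // Buggy: the minor version is tested even when the major is bigger
      if (MajorVersion < 3 || MinorVersion < 10) {
       Error("This program requires Windows 3.1");
      }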

    Now consider what happens when the version number is reported as 4.0. The major version check passes, but the minor version check fails since 0 is less than 10.
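    A minimal sketch of the difference, written as portable C++ rather than against the real GetVersion so that it runs anywhere. BuggyRejects reproduces the check above, and IsAtLeast is a corrected comparison that looks at the minor version only when the major versions tie. Both function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper reproducing the flawed check: returns true when the
// program would refuse to run ("This program requires Windows 3.1").
bool BuggyRejects(unsigned major, unsigned minor) {
    return major < 3 || minor < 10;
}

// Corrected comparison: compare major versions first; the minor version
// matters only when the major versions are equal.
bool IsAtLeast(unsigned major, unsigned minor,
               unsigned reqMajor, unsigned reqMinor) {
    if (major != reqMajor) {
        return major > reqMajor;
    }
    return minor >= reqMinor;
}

// Usage: the cases discussed in the text.
void CheckVersionExamples() {
    assert(BuggyRejects(4, 0));       // 4.0 is wrongly rejected
    assert(!BuggyRejects(3, 95));     // reporting 3.95 slips past the bug
    assert(IsAtLeast(4, 0, 3, 10));   // the corrected check accepts 4.0
    assert(!IsAtLeast(3, 0, 3, 10));  // and still rejects Windows 3.0
}
```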

    This bug was so rife that we gave up shimming every app that had the problem and just decided, "Fine. If anybody asks, say that the Windows version is 3.95."

    I suspect this is also why DirectX always reports its version as 4.x.

    Sure, we do that

    The DirectX video driver interface for Windows 95 had a method that each driver exposed called something like "DoesDriverSupport(REFGUID guidCapability)" where we handed it a capability GUID and it said whether or not that feature was supported.

    There were various capability GUIDs defined, things like GUID_CanStretchAlpha to ask the driver whether it was capable of stretching a bitmap with an alpha channel.

    There was one driver that returned TRUE when you called DoesDriverSupport(GUID_XYZ), but when DirectDraw tried to use that capability, it failed, and in a pretty spectacular manner.

    So one of the DirectDraw developers called the vendor and asked them, "So does your card do XYZ?"

    Their response: "What's XYZ?"

    Turns out that their driver's implementation of DoesDriverSupport was something like this:

    BOOL DoesDriverSupport(REFGUID guidCapability)
    {
      return TRUE; // says yes without ever looking at guidCapability
    }

    In other words, whenever DirectX asked, "Can you do this?" they answered, "Sure, we do that," without even checking what the question was.

    (The driver must have been written by the sales department.)

    So the DirectDraw folks changed the way they queried for driver capabilities. One of the developers went into his boss's office, took a network card, extracted the MAC address, and then smashed the card with a hammer.

    You see, this last step was important: The GUID generation algorithm is based on a combination of time and space. When you ask CoCreateGuid to create a new GUID, it encodes the time of your request in the first part of the GUID and, in the last part, information that uniquely identifies your machine (the network card's MAC address, which the standards that apply to network cards require to be unique).

    By smashing the network card with a hammer, he prevented that network card from ever being used to generate a GUID.
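    To make "time and space" concrete, here is a sketch of the standard time-based (version 1) GUID layout from the DCE UUID specification, also published as RFC 4122: the 60-bit timestamp fills the first three fields and the MAC address fills the last six bytes. Treat it as an illustration of the idea, not as CoCreateGuid's actual code; whether a particular version of Windows built GUIDs exactly this way is an assumption here, and MakeTimeBasedGuid and the sample MAC address are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Field layout of a GUID as Windows declares it.
struct Guid {
    std::uint32_t Data1;
    std::uint16_t Data2;
    std::uint16_t Data3;
    std::uint8_t  Data4[8];
};

// Sketch of a version 1 GUID: time in the front, machine identity at the end.
Guid MakeTimeBasedGuid(std::uint64_t timestamp60, std::uint16_t clockSeq,
                       const std::uint8_t mac[6]) {
    Guid g{};
    g.Data1 = static_cast<std::uint32_t>(timestamp60 & 0xFFFFFFFFu);        // time_low
    g.Data2 = static_cast<std::uint16_t>((timestamp60 >> 32) & 0xFFFF);     // time_mid
    g.Data3 = static_cast<std::uint16_t>(((timestamp60 >> 48) & 0x0FFF)     // time_hi
                                         | 0x1000);                         // version 1
    g.Data4[0] = static_cast<std::uint8_t>(((clockSeq >> 8) & 0x3F) | 0x80); // variant bits
    g.Data4[1] = static_cast<std::uint8_t>(clockSeq & 0xFF);
    for (int i = 0; i < 6; ++i) {
        g.Data4[2 + i] = mac[i];  // node field: the network card's MAC address
    }
    return g;
}

// Usage with a hypothetical MAC address for the smashed card.
void GuidExample() {
    const std::uint8_t smashedCardMac[6] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55};
    Guid g = MakeTimeBasedGuid(0x0123456789ABCDEFull, 0x1234, smashedCardMac);
    assert(g.Data1 == 0x89ABCDEFu);  // low 32 bits of the timestamp
    assert(g.Data3 == 0x1123);       // version nibble 1 in the top bits
    assert(g.Data4[7] == 0x55);      // MAC occupies the last six bytes
}
```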

    Next, he added code to DirectDraw so that when it starts up, it manufactures a random GUID based on that network card (which - by its having been destroyed - can never be validly created) and passes it to DoesDriverSupport. If the driver says, "Sure, we do that", DirectDraw says, "Aha! Caught you! I will not believe anything you say from now on."
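    The trap can be sketched in portable C++. The Driver interface, the two sample drivers, DriverIsTrustworthy, and both GUID values are hypothetical; the only idea taken from the story is to ask about a capability GUID that cannot legitimately exist and to distrust any driver that answers yes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Guid {
    std::uint32_t Data1;
    std::uint16_t Data2, Data3;
    std::uint8_t  Data4[8];
};
static_assert(sizeof(Guid) == 16, "no padding, so memcmp is safe");

// Hypothetical driver interface modeled on DoesDriverSupport.
struct Driver {
    virtual ~Driver() = default;
    virtual bool DoesDriverSupport(const Guid& capability) const = 0;
};

// The "sales department" driver: says yes to everything.
struct YesManDriver : Driver {
    bool DoesDriverSupport(const Guid&) const override { return true; }
};

// An honest driver that recognizes exactly one capability.
struct HonestDriver : Driver {
    Guid known;
    explicit HonestDriver(const Guid& g) : known(g) {}
    bool DoesDriverSupport(const Guid& g) const override {
        return std::memcmp(&g, &known, sizeof(Guid)) == 0;
    }
};

// Ask about a capability that cannot exist; any "yes" is a lie.
bool DriverIsTrustworthy(const Driver& d, const Guid& impossible) {
    return !d.DoesDriverSupport(impossible);
}

// Usage
void ProbeExample() {
    const Guid realCap{0x11111111, 0x2222, 0x3333, {1, 2, 3, 4, 5, 6, 7, 8}};
    const Guid impossible{0x89ABCDEF, 0x4567, 0x1123,
                          {0x92, 0x34, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55}};
    assert(!DriverIsTrustworthy(YesManDriver{}, impossible));
    assert(DriverIsTrustworthy(HonestDriver(realCap), impossible));
}
```

    Once a driver fails the probe, a real implementation would presumably ignore its answers for every other capability as well, which is what "I will not believe anything you say from now on" amounts to.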

    Adjustor thunks


    Yesterday we learned about the layout of COM objects and I hinted at "adjustor thunks".

    If you find yourself debugging in disassembly, you'll sometimes find strange little functions called "adjustor thunks". Let's take another look at the object we laid out last time:

    class CSample : public IPersist, public IServiceProvider
    {
    public:
      // *** IUnknown ***
      STDMETHODIMP QueryInterface(REFIID riid, void** ppv);
      STDMETHODIMP_(ULONG) AddRef();
      STDMETHODIMP_(ULONG) Release();

      // *** IPersist ***
      STDMETHODIMP GetClassID(CLSID* pClassID);

      // *** IServiceProvider ***
      STDMETHODIMP QueryService(REFGUID guidService,
                      REFIID riid, void** ppv);
    private:
      LONG m_cRef;
    };

                            IPersist vtable       IServiceProvider vtable
        p -> lpVtbl  -----> QueryInterface (1)
        q -> lpVtbl  ---------------------------> QueryInterface (2)
             m_cRef         AddRef (1)            AddRef (2)
             ...            Release (1)           Release (2)
                            GetClassID (1)        QueryService (2)

    In the diagram, p is the pointer returned when the IPersist interface is needed, and q is the pointer for the IServiceProvider interface.

    Now, there is only one QueryInterface method, but there are two entries, one for each vtable. Remember that each function in a vtable receives the corresponding interface pointer as its "this" parameter. That's just fine for QueryInterface (1); its interface pointer is the same as the object's interface pointer. But that's bad news for QueryInterface (2), since its interface pointer is q, not p.

    This is where the adjustor thunks come in.

    The entry for QueryInterface (2) is a stub function that changes q to p, and then lets QueryInterface (1) do the rest of the work. This stub function is the adjustor thunk.

      sub     DWORD PTR [esp+4], 4 ; this -= sizeof(lpVtbl)
      jmp     CSample::QueryInterface

    The adjustor thunk takes the "this" pointer and subtracts 4, converting q into p, then it jumps to the QueryInterface (1) function to do the real work.

    Whenever you have multiple inheritance and a virtual function is implemented on multiple base classes, you will get an adjustor thunk for the second and subsequent base class methods in order to convert the "this" pointer into a common format.
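    A C++ compiler generates adjustor thunks for you, but the mechanism can be sketched by hand in C-style C++. In this hypothetical two-interface object (all names invented for illustration), Sample_GetValue_Adjustor plays the role of the thunk: it subtracts the offset of the second vtable pointer, which is sizeof(void*) here rather than the hard-coded 4 of the 32-bit listing, and forwards to the one real implementation.

```cpp
#include <cassert>
#include <cstddef>

struct IFirst;
struct ISecond;
struct IFirstVtbl  { int (*GetValue)(IFirst* This); };
struct ISecondVtbl { int (*GetValue)(ISecond* This); };
struct IFirst  { const IFirstVtbl*  lpVtbl; };
struct ISecond { const ISecondVtbl* lpVtbl; };

// Two vtable pointers followed by the object's data (compare CSample).
struct Sample {
    IFirst  first;   // p points here
    ISecond second;  // q points here, one pointer further along
    int     value;
};

// The real implementation expects p.
static int Sample_GetValue(IFirst* p) {
    return reinterpret_cast<Sample*>(p)->value;
}

// Adjustor thunk: convert q back into p, then forward to the real method.
static int Sample_GetValue_Adjustor(ISecond* q) {
    char* p = reinterpret_cast<char*>(q) - offsetof(Sample, second);
    return Sample_GetValue(reinterpret_cast<IFirst*>(p));
}

static const IFirstVtbl  kFirstVtbl  = { Sample_GetValue };
static const ISecondVtbl kSecondVtbl = { Sample_GetValue_Adjustor };

void Sample_Init(Sample* s, int value) {
    s->first.lpVtbl  = &kFirstVtbl;
    s->second.lpVtbl = &kSecondVtbl;
    s->value = value;
}

// Usage
void ThunkExample() {
    Sample s;
    Sample_Init(&s, 42);
    IFirst*  p = &s.first;
    ISecond* q = &s.second;
    assert(p->lpVtbl->GetValue(p) == 42);  // direct call
    assert(q->lpVtbl->GetValue(q) == 42);  // goes through the adjustor
}
```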

    The layout of a COM object


    The Win32 COM calling convention specifies the layout of the virtual method table (vtable) of an object. If a language/compiler wants to support COM, it must lay out its object in the specified manner so other components can use it.

    It is no coincidence that the Win32 COM object layout closely matches the C++ object layout. Even though COM was originally developed when C was the predominant programming language, the designers saw fit to "play friendly" with the up-and-coming new language C++.

    The layout of a COM object is made explicit in the header files for the various interfaces. For example, here's IPersist from objidl.h, after cleaning up some macros.

    typedef struct IPersistVtbl
    {
        HRESULT ( STDMETHODCALLTYPE *QueryInterface )(
            IPersist * This,
            /* [in] */ REFIID riid,
            /* [iid_is][out] */ void **ppvObject);
        ULONG ( STDMETHODCALLTYPE *AddRef )(
            IPersist * This);
        ULONG ( STDMETHODCALLTYPE *Release )(
            IPersist * This);
        HRESULT ( STDMETHODCALLTYPE *GetClassID )(
            IPersist * This,
            /* [out] */ CLSID *pClassID);
    } IPersistVtbl;

    struct IPersist
    {
        const struct IPersistVtbl *lpVtbl;
    };

    This corresponds to the following memory layout:

    p -> lpVtbl  -----> QueryInterface
                        AddRef
                        Release
                        GetClassID

    What does this mean?

    A COM interface pointer is a pointer to a structure that consists of just a vtable. The vtable is a structure that contains a bunch of function pointers. Each function in the list takes that interface pointer (p) as its first parameter ("this").

    The magic to all this is that since your function gets p as its first parameter, you can "hang" additional stuff onto that vtable:

    p -> lpVtbl  -----> QueryInterface
         other stuff    AddRef
                        Release
                        GetClassID

    The functions in the vtable can use offsets relative to the interface pointer to access its other stuff.
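    Here is a small sketch of that arrangement in portable C++, written C-style the way the header files spell it out. ICounter and CounterObject are hypothetical names, and the STDMETHODCALLTYPE (__stdcall) annotation is omitted so the code compiles on any platform. Each vtable function receives p and reaches the reference count at a fixed offset from it.

```cpp
#include <cassert>

// A hypothetical interface in the COM layout: the interface pointer points
// at a structure whose first member is the vtable pointer.
struct ICounter;
struct ICounterVtbl {
    unsigned long (*AddRef)(ICounter* This);
    unsigned long (*Release)(ICounter* This);
};
struct ICounter { const ICounterVtbl* lpVtbl; };

// The object "hangs" its own state after the vtable pointer.
struct CounterObject {
    ICounter      iface;  // p points here
    unsigned long cRef;   // "other stuff" at a fixed offset from p
};

static unsigned long Counter_AddRef(ICounter* This) {
    // This == p, which is also the start of the whole object.
    auto* self = reinterpret_cast<CounterObject*>(This);
    return ++self->cRef;
}

static unsigned long Counter_Release(ICounter* This) {
    auto* self = reinterpret_cast<CounterObject*>(This);
    return --self->cRef;  // a real object would destroy itself at zero
}

static const ICounterVtbl kCounterVtbl = { Counter_AddRef, Counter_Release };

void Counter_Init(CounterObject* obj) {
    obj->iface.lpVtbl = &kCounterVtbl;
    obj->cRef = 1;
}

// Usage: callers see only the interface pointer p.
void CounterExample() {
    CounterObject obj;
    Counter_Init(&obj);
    ICounter* p = &obj.iface;
    assert(p->lpVtbl->AddRef(p) == 2);
    assert(p->lpVtbl->Release(p) == 1);
}
```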

    If an object implements multiple interfaces but they are all descendants of each other, then a single vtable can be used for all of them. For example, the object above is already set to be used either as an IUnknown or as an IPersist, since IUnknown is a subset of IPersist.

    On the other hand, if an object implements multiple interfaces that are not descendants of each other, then you get multiple inheritance, in which case the object is typically laid out in memory like this:

                        first vtable          second vtable
    p -> lpVtbl  -----> QueryInterface (1)
    q -> lpVtbl  ---------------------------> QueryInterface (2)
         other stuff    AddRef (1)            AddRef (2)
                        Release (1)           Release (2)
                        ...                   ...

    If you are using an interface that comes from the first vtable, then the interface pointer is p. But if you're using an interface that comes from the second vtable, then the interface pointer is q.

    Hang onto that diagram, because tomorrow we will learn about those mysterious "adjustor thunks".

    The white flash

    If you had a program that didn't process messages for a while, but it needed to be painted for whatever reason (say, somebody uncovered it), Windows would eventually lose patience with you and paint your window white.

    Or at least, that's what people would claim. Actually, Windows is painting your window with your class background brush. Since most people use COLOR_WINDOW and since COLOR_WINDOW is white in most color schemes, the end result is a flash of white.

    Why paint the window white? Why not just leave it alone?

    Well, that's what it used to do, but the result was that the previous contents of the screen would be shown where the window "would be". So suppose you were looking at Explorer, and then you restored a program that stopped responding. Inside the program's main window would be... a picture of Explorer. And then people would try to double-click on what they thought was Explorer but was really a hung program.

    In Windows XP, the behavior for a window that has stopped painting is different. Now, the system captures the pixels of the unresponsive window and just redraws those pixels if the window is unable to draw anything itself. Note, however, that if the system can't capture all of the pixels - say because the window was partially covered - then the parts that it couldn't get are filled in with the class brush.

    Which is usually white.

    What happened to DirectX 4?

    If you go through the history of DirectX, you'll see that there is no DirectX 4. It went from DirectX 3 straight to DirectX 5. What's up with that?

    After DirectX 3 was released, development on two successor products took place simultaneously: a shorter-term release called DirectX 4 and a more substantial longer-term release called DirectX 5.

    But based on the feedback we were getting from the game development community, they didn't really care about the small features in DirectX 4; what they were much more interested in were the features of DirectX 5. So it was decided to cancel DirectX 4 and roll all of its features into DirectX 5.

    So why wasn't DirectX 5 renamed to DirectX 4?

    Because there were already hundreds upon hundreds of documents that referred to the two projects as DirectX 4 and DirectX 5. Documents that said things like "Feature XYZ will not appear until DirectX 5". Changing the name of the projects mid-cycle was going to create even more confusion. You would end up with headlines like "Microsoft removes DirectX 5 from the table - kiss good-bye to feature XYZ" and conversations reminiscent of Who's on First:

    "I have some email from you saying that feature ABC won't be ready until DirectX 5. When do you plan on releasing DirectX 5?"

    "We haven't even started planning DirectX 5; we're completely focused on DirectX 4, which we hope to have ready by late spring."

    "But I need feature XYZ and you said that won't be ready until DirectX 5."

    "Oh, that email was written two weeks ago. Since then, DirectX 5 got renamed to DirectX 4, and DirectX 4 was cancelled."

    "So when I have a letter from you talking about DirectX 5, I should pretend it says DirectX 4, and when it says DirectX 4, I should pretend it says 'a project that has since been cancelled'?"

    "Right, but check the date at the top of the letter, because if it's newer than last week, then when it says DirectX 4, it really means the new DirectX 4."

    "And what if it says DirectX 5?"

    "Then somebody screwed up and didn't get the memo."

    "Okay, thanks. Clear as mud."

    The history of calling conventions, part 5: amd64

    The last architecture I'm going to cover in this series is the AMD64 architecture (also known as x86-64).

    The AMD64 takes the traditional x86 and expands the registers to 64 bits, naming them rax, rbx, etc. It also adds eight more general purpose registers, named simply R8 through R15.

    • The first four parameters to a function are passed in rcx, rdx, r8 and r9. Any further parameters are passed on the stack. Furthermore, the caller reserves 32 bytes of stack space for the four register parameters, in case the called function wants to spill them; this is important if the function is variadic.

    • Parameters that are smaller than 64 bits are not zero-extended; the upper bits are garbage, so remember to zero them explicitly if you need to. Parameters that are larger than 64 bits are passed by address.

    • The return value is placed in rax. If the return value is larger than 64 bits, then a secret first parameter is passed which contains the address where the return value should be stored.

    • All registers must be preserved across the call, except for rax, rcx, rdx, r8, r9, r10, and r11, which are scratch.

    • The callee does not clean the stack. It is the caller's job to clean the stack.

    • The stack must be kept 16-byte aligned. Since the "call" instruction pushes an 8-byte return address, this means that every non-leaf function is going to adjust the stack by a value of the form 16n+8 in order to restore 16-byte alignment.
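    The alignment rule can be turned into arithmetic. The sketch below is a simplified model, not the compiler's actual algorithm: it assumes the function pushes no nonvolatile registers and allocates its locals and the outgoing parameter area with a single sub rsp, N. FrameAdjustment is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Simplified model: locals plus outgoing-parameter area in one "sub rsp, N".
std::uint32_t FrameAdjustment(std::uint32_t localBytes, std::uint32_t maxCalleeArgs) {
    // Every call needs home space for at least the four register parameters.
    std::uint32_t outgoing = 8 * std::max<std::uint32_t>(4, maxCalleeArgs);
    std::uint32_t total = localBytes + outgoing;
    // On entry RSP is 8 mod 16 (call pushed an 8-byte return address),
    // so the adjustment must have the form 16n+8 to realign the stack.
    std::uint32_t pad = (8 + 16 - total % 16) % 16;
    return total + pad;
}

// Usage
void FrameExample() {
    // CallThatFunction: no locals, largest callee takes five parameters.
    assert(FrameAdjustment(0, 5) == 0x28);
    // A callee with no parameters still gets the 32-byte home area.
    assert(FrameAdjustment(0, 0) == 0x28);
}
```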

    Here's a sample:

    void SomeFunction(int a, int b, int c, int d, int e);
    void CallThatFunction()
    {
        SomeFunction(1, 2, 3, 4, 5);
        SomeFunction(6, 7, 8, 9, 10);
    }

    On entry to CallThatFunction, the stack looks like this:

    xxxxxxx0 .. rest of stack ..
    xxxxxxx8 return address <- RSP

    Due to the presence of the return address, the stack is misaligned. CallThatFunction sets up its stack frame, which might go like this:

        sub    rsp, 0x28

    Notice that the local stack frame size is 16n+8, so that the result is a realigned stack.

    xxxxxxx0 .. rest of stack ..
    xxxxxxx8 return address
    xxxxxxx0   (arg5)
    xxxxxxx8   (arg4 spill)
    xxxxxxx0   (arg3 spill)
    xxxxxxx8   (arg2 spill)
    xxxxxxx0   (arg1 spill) <- RSP
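    The diagram implies a simple rule for where each parameter lives, sketched below with hypothetical helper names: at the moment of the call, parameter i (counting from 1) occupies [rsp + 8*(i-1)], whether that slot is spill space for a register parameter or a real stack parameter. Once inside the callee, the return address pushed by call shifts everything up by 8.

```cpp
#include <cassert>
#include <cstdint>

// Offset of 1-based parameter `index` from the caller's RSP at the call:
// home slots for rcx, rdx, r8, r9 first, then stack parameters, 8 bytes each.
std::uint32_t CallerArgOffset(std::uint32_t index) {
    return 8 * (index - 1);
}

// Inside the callee the return address sits at [rsp], so every slot is 8
// bytes further away (the rcx home slot becomes [rsp+8]).
std::uint32_t CalleeArgOffset(std::uint32_t index) {
    return CallerArgOffset(index) + 8;
}

// Usage
void ArgOffsetExample() {
    assert(CallerArgOffset(5) == 0x20);  // matches "mov dword ptr [rsp+0x20], 5"
    assert(CalleeArgOffset(1) == 0x08);
}
```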

    Now we can set up for the first call:

            mov     dword ptr [rsp+0x20], 5     ; output parameter 5
            mov     r9d, 4                      ; output parameter 4
            mov     r8d, 3                      ; output parameter 3
            mov     edx, 2                      ; output parameter 2
            mov     ecx, 1                      ; output parameter 1
            call    SomeFunction                ; Go Speed Racer!

    When SomeFunction returns, the stack is not cleaned, so it still looks like it did above. To issue the second call, then, we just shove the new values into the space we already reserved:

            mov     dword ptr [rsp+0x20], 10    ; output parameter 5
            mov     r9d, 9                      ; output parameter 4
            mov     r8d, 8                      ; output parameter 3
            mov     edx, 7                      ; output parameter 2
            mov     ecx, 6                      ; output parameter 1
            call    SomeFunction                ; Go Speed Racer!

    CallThatFunction is now finished and can clean its stack and return.

            add     rsp, 0x28
            ret

    Notice that you see very few "push" instructions in amd64 code, since the paradigm is for the caller to reserve parameter space and keep re-using it.

    [Updated 11:00am: Fixed some places where I said "ecx" and "edx" instead of "rcx" and "rdx"; thanks to Mike Dimmick for catching it.]

    The history of calling conventions, part 4: ia64

    The ia-64 architecture (Itanium) and the AMD64 architecture (AMD64) are comparatively new, so it is unlikely that many of you have had to deal with their calling conventions, but I include them in this series because, who knows, you may end up buying one someday.

    Intel provides the Intel® Itanium® Architecture Software Developer's Manual which you can read to get extraordinarily detailed information on the instruction set and processor architecture. I'm going to describe just enough to explain the calling convention.

    The Itanium has 128 integer registers, 32 of which (r0 through r31) are global and do not participate in function calls. The function declares to the processor how many registers of the remaining 96 it wants to use for purely local use ("local region"), the first few of which are used for parameter passing, and how many are used to pass parameters to other functions ("output registers").

    For example, suppose a function takes two parameters, requires four registers for local variables, and calls a function that takes three parameters. (If it calls more than one function, take the maximum number of parameters used by any called function.) It would then declare at function entry that it wants six registers in its local region (numbered r32 through r37) and three output registers (numbered r38, r39 and r40). Registers r41 through r127 are off-limits.

    Note to pedants: This isn't actually how it works, I know. But it's much easier to explain this way.

    When the function wants to call that child function, it puts the first parameter in r38, the second in r39, the third in r40, then calls the function. The processor shifts the caller's output registers so they can act as the input registers for the called function. In this case r38 moves to r32, r39 moves to r33 and r40 moves to r34. The old registers r32 through r37 are saved in a separate register stack, different from the "stack" pointed to by the sp register. (In reality, of course, these "spills" are deferred, in the same way that SPARC register windows don't spill until needed. Actually, you can look at the whole ia64 parameter passing convention as the same as SPARC register windows, just with variable-sized windows!)

    When the called function returns, the registers move back to their previous positions and the original values of r32 through r37 are restored from the register stack.
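    The renumbering can be modeled with a few lines of arithmetic. This follows the simplified picture described above (which, as noted, is not exactly how the hardware works), and RegisterFrame and its helper functions are hypothetical names.

```cpp
#include <cassert>

// Simplified model of one function's register window.
struct RegisterFrame {
    int locals;   // size of the local region, starting at r32
    int outputs;  // number of output registers after the local region
};

// First output register, in the function's own numbering.
int FirstOutput(const RegisterFrame& f) { return 32 + f.locals; }

// Highest register number the function may use.
int LastUsable(const RegisterFrame& f) { return 32 + f.locals + f.outputs - 1; }

// On a call, the caller's output register `reg` becomes callee input r32, r33, ...
int CalleeNameFor(const RegisterFrame& caller, int reg) {
    return reg - FirstOutput(caller) + 32;
}

// Usage: the example function (2 params + 4 locals, 3 outputs).
void WindowExample() {
    RegisterFrame f{6, 3};
    assert(FirstOutput(f) == 38);        // outputs are r38..r40
    assert(LastUsable(f) == 40);         // r41..r127 are off-limits
    assert(CalleeNameFor(f, 39) == 33);  // second parameter arrives in r33
}
```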

    This creates some surprising answers to the traditional questions about calling conventions.

    What registers are preserved across calls? Everything in your local region (since it is automatically pushed and popped by the processor).

    What registers contain parameters? Well, they go into the output registers of the caller, which vary depending on how many registers the caller needs in its local region, but the callee always sees them as r32, r33, etc.

    Who cleans the parameters from the stack? Nobody. The parameters aren't on the stack to begin with.

    What register contains the return value? Well that's kind of tricky. Since the caller's registers aren't accessible from the called function, you'd think that it would be impossible to pass a value back! That's where the 32 global registers come in. One of the global registers (r8, as I recall) is nominated as the "return value register". Since global registers don't participate in the register window magic, a value stored there stays there across the function call transition and the function return transition.

    The return address is typically stored in one of the registers in the local region. This has the neat side-effect that a buffer overflow of a stack variable cannot overwrite a return address since the return address isn't kept on the stack in the first place. It's kept in the local region, which gets spilled onto the register stack, a chunk of memory separate from the stack.

    A function is free to subtract from the sp register to create temporary stack space (for string buffers, for example), which it of course must clean up before returning.

    One curious detail of the stack convention is that the first 16 bytes on the stack (the first two quadwords) are always scratch. (Peter Lund calls it a "red zone".) So if you need some memory for a short time, you can just use the memory at the top of the stack without asking for permission. But remember that if you call out to another function, then that memory becomes scratch for the function you called! So if you need the value of this "free scratchpad" preserved across a call, you need to subtract from sp to reserve it officially.

    One more curious detail about the ia64: A function pointer on the ia64 does not point to the first byte of code. Instead, it points to a structure that describes the function. The first quadword in the structure is the address of the first byte of code, and the second quadword contains the value of the so-called "gp" register. We'll learn more about the gp register in a later blog entry.

    (This "function pointer actually points to a structure" trick is not new to the ia64. It's common on RISC machines. I believe the PPC used it, too.)
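    A rough simulation of calling through such a descriptor, in portable C++: a global variable stands in for the gp register, and CallThrough loads gp from the second quadword before invoking the code pointer from the first. All names are hypothetical; real ia64 code performs these loads inline at the call site rather than through a helper function.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the gp register in this simulation.
static std::uint64_t g_gp = 0;

// Two quadwords: the code address, then the gp value that code expects.
struct FunctionDescriptor {
    std::uint64_t (*code)(std::uint64_t arg);
    std::uint64_t gp;
};

// An indirect call: load gp from the descriptor, then jump to the code.
std::uint64_t CallThrough(const FunctionDescriptor* fp, std::uint64_t arg) {
    g_gp = fp->gp;
    return fp->code(arg);
}

// Hypothetical function whose result depends on the gp it was called with.
std::uint64_t AddGp(std::uint64_t arg) { return arg + g_gp; }

// Usage: same code bytes, different gp, to show gp travels with the pointer.
void DescriptorExample() {
    const FunctionDescriptor a{AddGp, 1000};
    const FunctionDescriptor b{AddGp, 2000};
    assert(CallThrough(&a, 1) == 1001);
    assert(CallThrough(&b, 1) == 2001);
}
```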

    Okay, this was a really boring entry, I admit. But believe it or not, I'm going to come back to a few points in this entry, so it won't have been for naught.

    Why do member functions need to be "static" to be used as a callback?

    As we learned yesterday, nonstatic member functions take a secret "this" parameter, which makes them incompatible with the function signature required by Win32 callbacks. Fortunately, nearly all callbacks provide some way of providing context. You can shove the "this" pointer into the context so you can reconstruct the source object. Here's an example:

    class SomeClass {
     static DWORD CALLBACK s_ThreadProc(LPVOID lpParameter) // no hidden "this"
     {
      return ((SomeClass*)lpParameter)->ThreadProc(); // recover "this"
     }
     DWORD ThreadProc()
     {
      ... fun stuff ...
     }
    };
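    The same pattern can be sketched in portable C++ without any Windows headers. RunCallback is a hypothetical stand-in for an API such as CreateThread that accepts only a plain function pointer plus a void* context, and Worker is a hypothetical class. The static member s_Callback has no hidden "this" parameter, so it fits that signature, and it recovers the object from the context.

```cpp
#include <cassert>

// Hypothetical C-style API: it can only call a plain function and hand it
// a void* context.
using Callback = int (*)(void* context);

int RunCallback(Callback fn, void* context) {
    return fn(context);  // a real API might make this call later, on another thread
}

class Worker {
public:
    int Start() { return RunCallback(&Worker::s_Callback, this); }
    int runs = 0;

private:
    // Static: no hidden "this" parameter, so it matches Callback exactly.
    static int s_Callback(void* context) {
        return static_cast<Worker*>(context)->OnCallback();  // recover "this"
    }
    // The real work, with full access to the object's members.
    int OnCallback() { return ++runs; }
};

// Usage
void WorkerExample() {
    Worker w;
    assert(w.Start() == 1);
    assert(w.Start() == 2);
    assert(w.runs == 2);
}
```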

    Some callback function signatures place the context parameter (also known as "reference data") as the first parameter. How convenient, for the secret "this" parameter is also the first parameter. Looking at the various calling conventions available to us, it sure looks like the __stdcall calling convention for member functions matches our desired stack layout rather well. Let's take WAITORTIMERCALLBACK for example:

    __stdcall callback     __stdcall method call   thiscall method call
    .. rest of stack ..    .. rest of stack ..     .. rest of stack ..
    TimerOrWaitFired       TimerOrWaitFired        TimerOrWaitFired <- ESP
    lpParameter <- ESP     this <- ESP             (this is in ECX)

    Well, "thiscall" doesn't match, but the two "__stdcall"s do. Fortunately the compiler is smart enough to recognize this and can optimize the s_ThreadProc static method to nothing if you just give it enough of a nudge:

    class SomeClass {
     static DWORD CALLBACK s_ThreadProc(LPVOID lpParameter)
     {
      return ((SomeClass*)lpParameter)->ThreadProc();
     }
     DWORD __stdcall ThreadProc() // now matches s_ThreadProc's stack layout
     {
      ... fun stuff ...
     }
    };

    If you look at the code generation for the s_ThreadProc function, you'll see that it has been reduced to nothing but a jump instruction, since the compiler has realized that the two calling conventions coincide here so there is no actual translation to do.

    ?s_ThreadProc@SomeClass@@SGKPAX@Z PROC NEAR
      jmp     ?ThreadProc@SomeClass@@QAGKXZ
    ?s_ThreadProc@SomeClass@@SGKPAX@Z ENDP

    Now some people would take this one step further and just cast the third parameter to CreateThread to LPTHREAD_START_ROUTINE and get rid of the helper s_ThreadProc function entirely. I strongly advise against this. I have seen too many people cause trouble by miscasting function pointers; more on this in a future entry.

    Although we took advantage above of a coincidence between the two __stdcall calling conventions, we did not rely on it. If the coincidence in calling conventions fails to occur, the code is still correct. This is important when it comes time to port this code to another architecture, one where the coincidence may no longer be true!
