January, 2004

  • The Old New Thing

    What can go wrong when you mismatch the calling convention?

    Believe it or not, calling conventions are one of the things that programs frequently get wrong. The compiler yells at you when you mismatch a calling convention, but lazy programmers will just stick a cast in there to get the compiler to "shut up already".

    And then Windows is stuck having to support your buggy code forever.

    The window procedure

    So many people misdeclare their window procedures (usually by declaring them as __cdecl instead of __stdcall), that the function that dispatches messages to window procedures contains extra protection to detect incorrectly-declared window procedures and perform the appropriate fixup. This is the source of the mysterious 0xdcbaabcd on the stack. The function that dispatches messages to window procedures checks whether this value is on the stack in the correct place. If not, then it checks whether the window procedure popped one dword too much off the stack (if so, it fixes up the stack; I have no idea how a window procedure this messed up could have existed), or whether the window procedure was mistakenly declared as __cdecl instead of __stdcall (if so, it pops the parameters off the stack, which the window procedure was supposed to do).
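
    Here's a rough model of that check, run against a simulated stack rather than real x86 state. Everything named here (SimStack, dispatch, Fixup, the three sample procedures) is made up for illustration. It is a sketch of the logic described above under those assumptions, not the actual dispatcher code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stack model: `top` counts the dwords in use, and popping
// only moves `top`, so stale values stay in memory just as on a real stack.
struct SimStack {
    std::vector<std::uint32_t> mem = std::vector<std::uint32_t>(64, 0);
    std::size_t top = 0;
    void push(std::uint32_t value) { assert(top < mem.size()); mem[top++] = value; }
    void pop(std::size_t count) { assert(count <= top); top -= count; }
};

constexpr std::uint32_t kSentinel = 0xdcbaabcd;
constexpr std::size_t kParamCount = 4; // hwnd, uMsg, wParam, lParam

enum class Fixup { None, PoppedOneTooMany, WasCdecl, Unrecognized };

using SimWndProc = void (*)(SimStack&);

// Simulated window procedures: each one only models how much it pops.
void stdcallProc(SimStack& s) { s.pop(kParamCount); }     // correct
void cdeclProc(SimStack&) {}                               // pops nothing
void overPopProc(SimStack& s) { s.pop(kParamCount + 1); }  // one dword too many

Fixup dispatch(SimStack& s, SimWndProc proc)
{
    s.push(kSentinel);
    for (std::size_t i = 0; i < kParamCount; ++i) s.push(0);
    proc(s);

    Fixup fixup = Fixup::Unrecognized;
    if (s.top >= 1 && s.mem[s.top - 1] == kSentinel) {
        fixup = Fixup::None;                 // sentinel is on top, as expected
    } else if (s.top < s.mem.size() && s.mem[s.top] == kSentinel) {
        s.top += 1;                          // un-pop the extra dword
        fixup = Fixup::PoppedOneTooMany;
    } else if (s.top > kParamCount && s.mem[s.top - 1 - kParamCount] == kSentinel) {
        s.pop(kParamCount);                  // pop the parameters for the callee
        fixup = Fixup::WasCdecl;
    }
    if (fixup != Fixup::Unrecognized) s.pop(1); // discard the sentinel
    return fixup;
}
```

    Calling dispatch with each of the three sample procedures leaves the simulated stack at the same depth it started with, which is the whole point of the fixup.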

    DirectX callbacks

    Many DirectX functions use callbacks, and people once again misdeclared their callbacks as __cdecl instead of __stdcall, so the DirectX enumerators have to do special stack cleanup for those bad functions.


    IShellFolder::CreateViewObject

    I remember there was one program that decided to declare their CreateViewObject function incorrectly, and somehow they managed to trick the compiler into accepting it!

    class BuggyFolder : public IShellFolder ... {
     // wrong function signature!
     HRESULT CreateViewObject(HWND hwnd) { return S_OK; }

    Not only did they get the function signature wrong, they returned S_OK even though they failed to do anything! I had to add extra code to clean up the stack after calling this function, as well as verify that the return value wasn't a lie.

    Rundll32.exe entry points

    The function signature required for functions called by rundll32.exe is documented in this Knowledge Base article. That hasn't stopped people from using rundll32 to call random functions that weren't designed to be called by rundll32, like user32 LockWorkStation or user32 ExitWindowsEx.

    Let's walk through what happens when you try to use rundll32.exe to call a function like ExitWindowsEx:

    The rundll32.exe program parses its command line and calls the ExitWindowsEx function on the assumption that the function is written like this:

    void CALLBACK ExitWindowsEx(HWND hwnd, HINSTANCE hinst,
           LPSTR pszCmdLine, int nCmdShow);

    But it isn't. The actual function signature for ExitWindowsEx is

    BOOL WINAPI ExitWindowsEx(UINT uFlags, DWORD dwReserved);

    What happens? Well, on entry to ExitWindowsEx, the stack looks like this:

    .. rest of stack ..
    nCmdShow
    pszCmdLine
    hinst
    hwnd
    return address <- ESP

    However, the function is expecting to see

    .. rest of stack ..
    dwReserved
    uFlags
    return address <- ESP

    What happens? The hwnd passed by rundll32.exe gets misinterpreted as uFlags and the hinst gets misinterpreted as dwReserved. Since window handles are pseudorandom, you end up passing random flags to ExitWindowsEx. Maybe today it's EWX_LOGOFF, tomorrow it's EWX_FORCE, the next time it might be EWX_POWEROFF.

    Now suppose that the function manages to return. (For example, the exit fails.) The ExitWindowsEx function cleans two parameters off the stack, unaware that it was passed four. The resulting stack is

    .. rest of stack ..
    nCmdShow (garbage not cleaned up)
    pszCmdLine <- ESP (garbage not cleaned up)
    Now the stack is corrupted and really fun things happen. For example, suppose the thing at ".. rest of stack .." is a return address. Well, the original code is going to execute a "return" instruction to return through that return address, but with this corrupted stack, the "return" instruction will instead return to a command line and attempt to execute it as if it were code.
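
    If you want to convince yourself of the bookkeeping, here's a toy model that tracks the stack as a list of labels. The names (LabeledStack, pushCall, stdcallReturn, simulateRundll32Mismatch) are invented for this sketch, and it only models who pops what, not actual machine state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of an x86 stack: back() is the top of the stack.
using LabeledStack = std::vector<std::string>;

// Push the parameters right to left, then the return address,
// the way a caller of a __stdcall function would.
void pushCall(LabeledStack& stack, const std::vector<std::string>& params)
{
    for (auto it = params.rbegin(); it != params.rend(); ++it) {
        stack.push_back(*it);
    }
    stack.push_back("return address");
}

// Return the way a __stdcall callee does when it believes it was
// passed `paramCount` parameters.
void stdcallReturn(LabeledStack& stack, std::size_t paramCount)
{
    assert(stack.size() >= paramCount + 1);
    stack.pop_back();                         // consume the return address
    stack.resize(stack.size() - paramCount);  // discard "its" parameters
}

LabeledStack simulateRundll32Mismatch()
{
    LabeledStack stack{".. rest of stack .."};
    // rundll32 pushes four parameters...
    pushCall(stack, {"hwnd", "hinst", "pszCmdLine", "nCmdShow"});
    // ...but ExitWindowsEx only knows about uFlags and dwReserved.
    stdcallReturn(stack, 2);
    return stack;
}
```

    The stack that comes back has nCmdShow and pszCmdLine still sitting on it, with pszCmdLine on top.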

    Random custom functions
    An anonymous commenter exported a function as __cdecl but treated it as if it were __stdcall. This will seem to work, but on return, the stack will be corrupted (because the caller is expecting a __stdcall function that cleans the stack, but what it gets is a __cdecl function that doesn't), and bad things will happen as a result.

    Okay, enough with the examples; I think you get the point. Here are some questions I'm sure you're asking:

    Why doesn't the compiler catch all these errors?

    It does. (Well, not the rundll32 one.) But people have gotten into the habit of just inserting the function cast to get the compiler to shut up.

    Here's a random example I found:

       WPARAM wParam, LPARAM lParam);

    This is the incorrect function signature for a dialog procedure. The correct signature is

    INT_PTR CALLBACK DialogProc(HWND hwndDlg, UINT uMsg,
        WPARAM wParam, LPARAM lParam);

    You start with

              hWnd, DlgProc);
    but the compiler rightly spits out the error message
    error C2664: 'DialogBoxParamA' : cannot convert parameter 4
    so you fix it by slapping a cast in to make the compiler shut up:
              hWnd, reinterpret_cast<DLGPROC>(DlgProc));

    "Aw, come on, who would be so stupid as to insert a cast to make an error go away without actually fixing the error?"

    Apparently everyone.

    I stumbled across this page that does exactly the same thing, and this one in German which gets not only the return value wrong, but also misdeclares the third and fourth parameters, and this one in Japanese. It's as easy to fix (incorrectly) as 1-2-3.

    How did programs with these bugs ever work at all? Certainly these programs worked to some degree or people would have noticed and fixed the bug. How can the program survive a corrupted stack?

    I'll answer this question tomorrow.

  • The Old New Thing

    Words I'd like to ban in 2004

    It seems to be fashionable to do a "top words" list this time of year. We have Google 2003 Zeitgeist, Top Yahoo! Searches 2003, Merriam-Webster's Words of the Year for 2003, YourDictionary.com's Top Ten Words of 2003, Lake Superior State University's Banished Words List for 2004; still waiting for the American Dialect Society's choice for Word of the Year for 2003.

    I like LSSU's approach, so here's my list of words I'd like to ban.


    Thank goodness this has faded, but there are still some citations out there. Please don't use it to describe my work. It makes me sound like a dog in a show. (No offense to dogs in shows!)


    Everybody is "the leading this" or "the leading that". Here's my rule: If you say you're the leading XYZ or (even dodgier) "among the leading XYZs", then you have to list at least three companies that are not leaders in the XYZ market. Because if nobody is following you, then you're not really "leading", now, are you.

    And the word I most would like to banish from the English language:

    Ask (as a noun)

    This has taken over Microsoft-speak in the past year or so and it drives me batty. "What are our key asks here?", you might hear in a meeting. Language tip: The thing you are asking for is called a "request". Plus, of course, the thing that is an "ask" is usually more of a "demand" or "requirement". But those are such unfriendly words, aren't they? Why not use a warm, fuzzy word like "ask" to take the edge off?

    Answer: Because it's not a word.

    I have yet to find any dictionary which sanctions this usage. Indeed, the only definition for "ask" as a noun is A water newt [Scot. & North of Eng.], and that was from 1913!

    Answer 2: Because it's passive-aggressive.

    These "asks" are really "demands". So don't guilt-trip me with "Oh, you didn't meet our ask. We had to cut half our features. But that's okay. We'll just suffer quietly, you go do your thing, don't mind us."

  • The Old New Thing

    Fixing security holes in other programs

    Any crash report that involves a buffer overrun quickly escalates in priority. The last few that came my way were actually bugs in other programs that were detected by Windows.

    For example, there were a few programs that responded to the LVN_GETDISPINFO notification by overflowing the LVITEM.pszText buffer, writing more than LVITEM.cchTextMax characters.

    Another responded to IContextMenu::GetCommandString by overflowing the pszName buffer, writing more than cchMax characters.

    Fortunately, in both cases, the overflow was only one character, so we were able to fix it by over-allocating the buffer by one and underreporting its size. That way, if the program overflows the buffer by one, it doesn't corrupt anything.
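
    The trick looks roughly like this. This is a portable sketch with invented names (FillTextFn, callWithGuardSlot, offByOneFill), not the actual shell code: the caller allocates one more character than it admits to, so a callee that runs one character past the reported size lands in the spare slot.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical callback type: fills `buffer`, which it is told holds `cch` characters.
using FillTextFn = void (*)(wchar_t* buffer, std::size_t cch);

// Call `fill` with a buffer that secretly has one extra slot.
std::wstring callWithGuardSlot(FillTextFn fill, std::size_t cchReported)
{
    std::vector<wchar_t> buffer(cchReported + 1, L'\0'); // over-allocate by one
    fill(buffer.data(), cchReported);                    // underreport the size
    buffer[cchReported] = L'\0';                         // guarantee termination
    return std::wstring(buffer.data());
}

// A buggy callee of the kind described: it writes `cch` characters
// and then a terminator, which is one character too many.
void offByOneFill(wchar_t* buffer, std::size_t cch)
{
    for (std::size_t i = 0; i < cch; ++i) buffer[i] = L'x';
    buffer[cch] = L'\0'; // one past the reported size
}
```

    With a reported size of 4, offByOneFill writes five characters, all of which stay inside the real allocation.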

    Another one overflows one of its own stack buffers if you right-click on a file whose name is longer than MAX_PATH. (These files are legal but are hard to create or manipulate.) Not much we can do to prevent that one.

    So remember folks, watch those buffer sizes and don't overflow them. Security is everybody's job. We're all in this together.
  • The Old New Thing

    The history of calling conventions, part 3

    Okay, here we go: The 32-bit x86 calling conventions.

    (By the way, in case people didn't get it: I'm only talking in the context of calling conventions you're likely to encounter when doing Windows programming or which are used by Microsoft compilers. I do not intend to cover calling conventions for other operating systems or that are specific to a particular language or compiler vendor.)

    Remember: If a calling convention is used for a C++ member function, then there is a hidden "this" parameter that is the implicit first parameter to the function.


    The 32-bit x86 calling conventions all preserve the EDI, ESI, EBP, and EBX registers, using the EDX:EAX pair for return values.

    C (__cdecl)

    The same constraints apply to the 32-bit world as in the 16-bit world. The parameters are pushed from right to left (so that the first parameter is nearest to top-of-stack), and the caller cleans the parameters. Function names are decorated by a leading underscore.


    __stdcall

    This is the calling convention used for Win32, with exceptions for variadic functions (which necessarily use __cdecl) and a very few functions that use __fastcall. Parameters are pushed from right to left [corrected 10:18am] and the callee cleans the stack. Function names are decorated by a leading underscore and a trailing @-sign followed by the number of bytes of parameters taken by the function.


    __fastcall

    The first two parameters are passed in ECX and EDX, with the remainder passed on the stack as in __stdcall. Again, the callee cleans the stack. Function names are decorated by a leading @-sign and a trailing @-sign followed by the number of bytes of parameters taken by the function (including the register parameters).


    thiscall

    The first parameter (which is the "this" parameter) is passed in ECX, with the remainder passed on the stack as in __stdcall. Once again, the callee cleans the stack. Function names are decorated by the C++ compiler in an extraordinarily complicated mechanism that encodes the types of each of the parameters, among other things. This is necessary because C++ permits function overloading, so a complex decoration scheme must be used so that the various overloads have different decorated names.
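
    The decoration rules for __cdecl, __stdcall, and __fastcall are simple enough to write down. Here's a small sketch. The Convention enum and decorate function are names I made up for the example, and the C++ scheme is deliberately left out because it is far too complicated for a snippet.

```cpp
#include <cassert>
#include <string>

enum class Convention { Cdecl, Stdcall, Fastcall };

// Decorated name of a C function under the 32-bit x86 rules described above.
// `paramBytes` is the total size of the parameters in bytes, counting any
// passed in registers. It is ignored for __cdecl.
std::string decorate(const std::string& name, Convention cc, unsigned paramBytes)
{
    switch (cc) {
    case Convention::Cdecl:
        return "_" + name;
    case Convention::Stdcall:
        return "_" + name + "@" + std::to_string(paramBytes);
    case Convention::Fastcall:
        return "@" + name + "@" + std::to_string(paramBytes);
    }
    return name; // not reached for valid enum values
}
```

    For example, ExitWindowsEx takes two 4-byte parameters and is __stdcall, so decorate("ExitWindowsEx", Convention::Stdcall, 8) produces "_ExitWindowsEx@8".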

    There are some nice diagrams on MSDN illustrating some of these calling conventions.

    Remember that a calling convention is a contract between the caller and the callee. For those of you crazy enough to write in assembly language, this means that your callback functions need to preserve the registers mandated by the calling convention because the caller (the operating system) is relying on it. If you corrupt, say, the EBX register across a call, don't be surprised when things fall apart on you. More on this in a future entry.

  • The Old New Thing

    The history of calling conventions, part 4: ia64

    The ia-64 architecture (Itanium) and the AMD64 architecture (AMD64) are comparatively new, so it is unlikely that many of you have had to deal with their calling conventions, but I include them in this series because, who knows, you may end up buying one someday.

    Intel provides the Intel® Itanium® Architecture Software Developer's Manual which you can read to get extraordinarily detailed information on the instruction set and processor architecture. I'm going to describe just enough to explain the calling convention.

    The Itanium has 128 integer registers, 32 of which (r0 through r31) are global and do not participate in function calls. The function declares to the processor how many registers of the remaining 96 it wants to use for purely local use ("local region"), the first few of which are used for parameter passing, and how many are used to pass parameters to other functions ("output registers").

    For example, suppose a function takes two parameters, requires four registers for local variables, and calls a function that takes three parameters. (If it calls more than one function, take the maximum number of parameters used by any called function.) It would then declare at function entry that it wants six registers in its local region (numbered r32 through r37) and three output registers (numbered r38, r39 and r40). Registers r41 through r127 are off-limits.

    Note to pedants: This isn't actually how it works, I know. But it's much easier to explain this way.

    When the function wants to call that child function, it puts the first parameter in r38, the second in r39, the third in r40, then calls the function. The processor shifts the caller's output registers so they can act as the input registers for the called function. In this case r38 moves to r32, r39 moves to r33 and r40 moves to r34. The old registers r32 through r37 are saved in a separate register stack, different from the "stack" pointed to by the sp register. (In reality, of course, these "spills" are deferred, in the same way that SPARC register windows don't spill until needed. Actually, you can look at the whole ia64 parameter passing convention as the same as SPARC register windows, just with variable-sized windows!)

    When the called function returns, the registers then move back to their previous position and the original values of r32 through r37 are restored from the register stack.
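
    The renumbering is just arithmetic. Here's a sketch of it under the same simplified model used in this entry. The function name calleeRegister is hypothetical, and the pedants' caveat above still applies.

```cpp
#include <cassert>

// Given how many registers the caller declared in its local region, return
// the register number under which the callee sees a value that the caller
// placed in one of its output registers.
unsigned calleeRegister(unsigned callerLocalCount, unsigned callerOutputReg)
{
    const unsigned firstOutput = 32 + callerLocalCount; // outputs follow the local region
    assert(callerOutputReg >= firstOutput && callerOutputReg <= 127);
    return 32 + (callerOutputReg - firstOutput);        // callee's window starts at r32
}
```

    With six registers in the caller's local region, calleeRegister(6, 38) is 32 and calleeRegister(6, 40) is 34, matching the example.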

    This creates some surprising answers to the traditional questions about calling conventions.

    What registers are preserved across calls? Everything in your local region (since it is automatically pushed and popped by the processor).

    What registers contain parameters? Well, they go into the output registers of the caller, which vary depending on how many registers the caller needs in its local region, but the callee always sees them as r32, r33, etc.

    Who cleans the parameters from the stack? Nobody. The parameters aren't on the stack to begin with.

    What register contains the return value? Well that's kind of tricky. Since the caller's registers aren't accessible from the called function, you'd think that it would be impossible to pass a value back! That's where the 32 global registers come in. One of the global registers (r8, as I recall) is nominated as the "return value register". Since global registers don't participate in the register window magic, a value stored there stays there across the function call transition and the function return transition.

    The return address is typically stored in one of the registers in the local region. This has the neat side-effect that a buffer overflow of a stack variable cannot overwrite a return address since the return address isn't kept on the stack in the first place. It's kept in the local region, which gets spilled onto the register stack, a chunk of memory separate from the stack.

    A function is free to subtract from the sp register to create temporary stack space (for string buffers, for example), which it of course must clean up before returning.

    One curious detail of the stack convention is that the first 16 bytes on the stack (the first two quadwords) are always scratch. (Peter Lund calls it a "red zone".) So if you need some memory for a short time, you can just use the memory at the top of the stack without asking for permission. But remember that if you call out to another function, then that memory becomes scratch for the function you called! So if you need the value of this "free scratchpad" preserved across a call, you need to subtract from sp to reserve it officially.

    One more curious detail about the ia64: A function pointer on the ia64 does not point to the first byte of code. Instead, it points to a structure that describes the function. The first quadword in the structure is the address of the first byte of code, and the second quadword contains the value of the so-called "gp" register. We'll learn more about the gp register in a later blog entry.

    (This "function pointer actually points to a structure" trick is not new to the ia64. It's common on RISC machines. I believe the PPC used it, too.)

    Okay, this was a really boring entry, I admit. But believe it or not, I'm going to come back to a few points in this entry, so it won't have been for naught.

  • The Old New Thing

    Why does the copy dialog give such horrible estimates?

    Because the copy dialog is just guessing. It can't predict the future, but it is forced to try. And at the very beginning of the copy, when there is very little history to go by, the prediction can be really bad.

    Here's an analogy: Suppose somebody tells you, "I am going to count to 100, and you need to give continuous estimates as to when I will be done." They start out, "one, two, three...". You notice they are going at about one number per second, so you estimate 100 seconds. Uh-oh, now they're slowing down. "Four... ... ... five... ... ..." Now you have to change your estimate to maybe 200 seconds. Now they speed up: "six-seven-eight-nine" You have to update your estimate again.

    Now somebody who is listening only to your estimates and not to the person counting thinks you are off your rocker. Your estimate went from 100 seconds to 200 seconds to 50 seconds; what's your problem? Why can't you give a good estimate?

    File copying is the same thing. The shell knows how many files and how many bytes are going to be copied, but it doesn't know how fast the hard drive or network or internet is going to be, so it just has to guess. If the copy throughput changes, the estimate needs to change to take the new transfer rate into account.
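
    To make the guessing concrete, here's about the simplest estimator you could write: extrapolate from the average throughput so far. This is only an illustration with a made-up function name. I'm not claiming it is the formula the shell uses.

```cpp
#include <cassert>

// Estimated seconds remaining, extrapolated from the average throughput
// observed so far. Returns a negative value when there is no history yet.
double estimateSecondsRemaining(double bytesTotal, double bytesDone,
                                double secondsElapsed)
{
    assert(bytesDone <= bytesTotal);
    if (bytesDone <= 0 || secondsElapsed <= 0) return -1.0; // nothing to go on
    const double bytesPerSecond = bytesDone / secondsElapsed;
    return (bytesTotal - bytesDone) / bytesPerSecond;
}
```

    Three units done in three seconds out of 100 gives 97 seconds to go. If the rate later halves, the same function doubles its answer, which is exactly the jumpiness people complain about.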

  • The Old New Thing

    The format of string resources

    Unlike the other resource formats, where the resource identifier is the same as the value listed in the *.rc file, string resources are packaged in "bundles". There is a rather terse description of this in Knowledge Base article Q196774. Today we're going to expand that terse description into actual code.

    The strings listed in the *.rc file are grouped together in bundles of sixteen. So the first bundle contains strings 0 through 15, the second bundle contains strings 16 through 31, and so on. In general, bundle N contains strings (N-1)*16 through (N-1)*16+15.

    The strings in each bundle are stored as counted UNICODE strings, not null-terminated strings. If there are gaps in the numbering, null strings are used. So for example if your string table had only strings 16 and 31, there would be one bundle (number 2), which consists of string 16, fourteen null strings, then string 31.

    (Note that this means there is no way to tell the difference between "string 20 is a string that has length zero" and "string 20 doesn't exist".)

    The LoadString function is rather limiting in a few ways:

    • You can't pass a language ID. If your resources are multilingual, you can't load strings from a nondefault language.
    • You can't query the length of a resource string.

    Let's write some functions that remove these limitations.

    LPCWSTR FindStringResourceEx(HINSTANCE hinst,
     UINT uId, UINT langId)
    {
     // Convert the string ID into a bundle number
     LPCWSTR pwsz = NULL;
     HRSRC hrsrc = FindResourceEx(hinst, RT_STRING,
                         MAKEINTRESOURCE(uId / 16 + 1),
                         static_cast<WORD>(langId));
     if (hrsrc) {
      HGLOBAL hglob = LoadResource(hinst, hrsrc);
      if (hglob) {
       // "lock" the resource to get a pointer to the bundle
       pwsz = reinterpret_cast<LPCWSTR>
                  (LockResource(hglob));
       if (pwsz) {
        // okay now walk the string table, skipping the
        // counted strings that come before the one we want
        for (UINT i = 0; i < (uId & 15); i++) {
         pwsz += 1 + (UINT)*pwsz;
        }
       }
      }
     }
     return pwsz;
    }

    After converting the string ID into a bundle number, we find the bundle, load it, and lock it. (That's an awful lot of paperwork just to access a resource. It's a throwback to the Windows 3.1 way of managing resources; more on that in a future entry.)

    We then walk through the table skipping over the desired number of strings until we find the one we want. The first WCHAR in each string entry is the length of the string, so adding 1 skips over the count and adding the count skips over the string.

    When we finish walking, pwsz is left pointing to the counted string.
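
    If you don't have a resource handy, you can try the walk on a hand-built bundle. The following is a portable sketch with invented helper names (bundleNumber, indexInBundle, makeBundle, stringAt). It models a bundle as an array of 16-bit values laid out as described above and does not touch the resource APIs at all.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Bundle number (1-based) and position within the bundle for a string ID.
constexpr unsigned bundleNumber(unsigned id) { return id / 16 + 1; }
constexpr unsigned indexInBundle(unsigned id) { return id & 15; }

// Build an in-memory bundle from sixteen strings. Each entry is a length
// followed by that many characters, with no null terminator.
std::vector<char16_t> makeBundle(const std::vector<std::u16string>& strings)
{
    assert(strings.size() == 16);
    std::vector<char16_t> bundle;
    for (const auto& s : strings) {
        bundle.push_back(static_cast<char16_t>(s.size()));
        bundle.insert(bundle.end(), s.begin(), s.end());
    }
    return bundle;
}

// Skip over `index` counted strings, then copy out the one we land on.
std::u16string stringAt(const std::vector<char16_t>& bundle, unsigned index)
{
    std::size_t pos = 0;
    for (unsigned i = 0; i < index; ++i) {
        pos += 1 + bundle[pos];            // skip the count and the characters
    }
    const std::size_t length = bundle[pos];
    return std::u16string(bundle.begin() + pos + 1,
                          bundle.begin() + pos + 1 + length);
}
```

    A bundle holding only strings 16 and 31 has sixteen count words, fourteen of them zero, and stringAt returns an empty string for every slot in between.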

    With this basic function we can create fancier functions.

    The function FindStringResource is a simple wrapper that searches for the string in the default thread language.

    LPCWSTR FindStringResource(HINSTANCE hinst, UINT uId)
    {
     return FindStringResourceEx(hinst, uId,
                  MAKELANGID(LANG_NEUTRAL, SUBLANG_NEUTRAL));
    }

    The function GetStringResourceLengthEx returns the length of the corresponding string, including the null terminator.

    UINT GetStringResourceLengthEx(HINSTANCE hinst,
     UINT uId, UINT langId)
    {
     LPCWSTR pwsz = FindStringResourceEx
                           (hinst, uId, langId);
     return 1 + (pwsz ? *pwsz : 0);
    }

    And the function AllocStringFromResourceEx loads the entire string resource into a heap-allocated memory block.

    LPWSTR AllocStringFromResourceEx(HINSTANCE hinst,
     UINT uId, UINT langId)
    {
     LPCWSTR pwszRes = FindStringResourceEx
                           (hinst, uId, langId);
     if (!pwszRes) pwszRes = L"";
     LPWSTR pwsz = new WCHAR[(UINT)*pwszRes+1];
     if (pwsz) {
       pwsz[(UINT)*pwszRes] = L'\0';
       CopyMemory(pwsz, pwszRes+1,
                  *pwszRes * sizeof(WCHAR));
     }
     return pwsz;
    }

    (Writing the non-Ex functions GetStringResourceLength and AllocStringFromResource is left as an exercise.)

    Note that we must explicitly null-terminate the string since the string in the resource is not null-terminated. Note also that the string returned by AllocStringFromResourceEx must be freed with delete[]. For example:

    LPWSTR pwsz = AllocStringFromResource(hinst, uId);
    if (pwsz) {
      ... use pwsz ...
      delete[] pwsz;
    }

    Mismatching vector "new[]" and scalar "delete" is an error I'll talk about in a future entry.

    Exercise: Discuss how the /n flag to rc.exe affects these functions.
  • The Old New Thing

    The history of calling conventions, part 5: amd64

    The last architecture I'm going to cover in this series is the AMD64 architecture (also known as x86-64).

    The AMD64 takes the traditional x86 and expands the registers to 64 bits, naming them rax, rbx, etc. It also adds eight more general purpose registers, named simply R8 through R15.

    • The first four parameters to a function are passed in rcx, rdx, r8 and r9. Any further parameters are pushed on the stack. Furthermore, space for the register parameters is reserved on the stack, in case the called function wants to spill them; this is important if the function is variadic.

    • Parameters that are smaller than 64 bits are not zero-extended; the upper bits are garbage, so remember to zero them explicitly if you need to. Parameters that are larger than 64 bits are passed by address.

    • The return value is placed in rax. If the return value is larger than 64 bits, then a secret first parameter is passed which contains the address where the return value should be stored.

    • All registers must be preserved across the call, except for rax, rcx, rdx, r8, r9, r10, and r11, which are scratch.

    • The callee does not clean the stack. It is the caller's job to clean the stack.

    • The stack must be kept 16-byte aligned. Since the "call" instruction pushes an 8-byte return address, this means that every non-leaf function is going to adjust the stack by a value of the form 16n+8 in order to restore 16-byte alignment.

    Here's a sample:

    void SomeFunction(int a, int b, int c, int d, int e);
    void CallThatFunction()
    {
        SomeFunction(1, 2, 3, 4, 5);
        SomeFunction(6, 7, 8, 9, 10);
    }

    On entry to CallThatFunction, the stack looks like this:

    xxxxxxx0 .. rest of stack ..
    xxxxxxx8 return address <- RSP

    Due to the presence of the return address, the stack is misaligned. CallThatFunction sets up its stack frame, which might go like this:

        sub    rsp, 0x28

    Notice that the local stack frame size is 16n+8, so that the result is a realigned stack.

    xxxxxxx0 .. rest of stack ..
    xxxxxxx8 return address
    xxxxxxx0   (arg5)
    xxxxxxx8   (arg4 spill)
    xxxxxxx0   (arg3 spill)
    xxxxxxx8   (arg2 spill)
    xxxxxxx0   (arg1 spill) <- RSP

    Now we can set up for the first call:

            mov     dword ptr [rsp+0x20], 5     ; output parameter 5
            mov     r9d, 4                      ; output parameter 4
            mov     r8d, 3                      ; output parameter 3
            mov     edx, 2                      ; output parameter 2
            mov     ecx, 1                      ; output parameter 1
            call    SomeFunction                ; Go Speed Racer!

    When SomeFunction returns, the stack is not cleaned, so it still looks like it did above. To issue the second call, then, we just shove the new values into the space we already reserved:

            mov     dword ptr [rsp+0x20], 10    ; output parameter 5
            mov     r9d, 9                      ; output parameter 4
            mov     r8d, 8                      ; output parameter 3
            mov     edx, 7                      ; output parameter 2
            mov     ecx, 6                      ; output parameter 1
            call    SomeFunction                ; Go Speed Racer!

    CallThatFunction is now finished and can clean its stack and return.
            add     rsp, 0x28
            ret

    Notice that you see very few "push" instructions in amd64 code, since the paradigm is for the caller to reserve parameter space and keep re-using it.
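
    You can check the 0x28 in the sample with a little arithmetic. The helper below is a sketch with a made-up name (frameSize). It assumes the function pushes no registers of its own, and it relies on the rule that the four register-parameter slots are reserved even when the callee takes fewer than four parameters.

```cpp
#include <cassert>

// Bytes a non-leaf function subtracts from rsp on entry.
//   maxCallArgs: the largest number of parameters passed to any callee
//   localBytes:  extra bytes of local storage the function wants
unsigned frameSize(unsigned maxCallArgs, unsigned localBytes)
{
    // Space for the four register parameters is always reserved.
    const unsigned slots = maxCallArgs < 4 ? 4 : maxCallArgs;
    const unsigned needed = slots * 8 + localBytes;
    // Round up to the next value of the form 16n+8, which restores
    // 16-byte alignment after the 8-byte return address.
    return ((needed + 7) / 16) * 16 + 8;
}
```

    For CallThatFunction, frameSize(5, 0) is 0x28: five 8-byte slots, which already has the form 16n+8.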

    [Updated 11:00am: Fixed some places where I said "ecx" and "edx" instead of "rcx" and "rdx"; thanks to Mike Dimmick for catching it.]

  • The Old New Thing

    If you know Swedish, the world is funnier

    As I was driving through Seattle the other day, I saw a sign for a personal storage company called "Stor-More".

    I then had to laugh because in Swedish, "Stor-Mor" means "Big Momma".

    It's not restricted to Swedish. On my trip to Germany last year, my travelling companions found several German signs amusing:

    • "Ausfahrt" ("highway exit")
    • "Schmuck" ("jewelry")
    • "Bad Kissing" (a town's name; more accurately, "Bad Kissingen", but never let the truth get in the way of a good joke. "Bad" in German means "bath" or "spa")

    When one of them told some German colleagues about this hilarious town name, they just looked at him as if to say, "What about Bad Kissingen? It's a nice town. What's so funny about it?" Only when he suggested that they look at it in English did they see the joke.

    For some reason I love multilingual jokes.

  • The Old New Thing

    Ikea walk-through

    Jeff Davis tipped me off to this Ikea walk-through. Frustratingly, the walk-through doesn't include any cheat codes.

    Even though Ikea was founded by a Swede, its company colors match the Swedish national colors, all its product names are Swedish, and it is clearly associated with Sweden in the minds of everyone, it is in fact headquartered in Denmark. Probably for tax reasons.

    The name of the plastic stool Förby has provided me much entertainment. At first, it was amusingly similar to the name of the annoying toy Furby. And it turns out it's also amusingly similar to the Swedish word "förbi" which means "gone past". (German "vorbei".)

    During an evening of sore hands trying to drive screws into wood using a tiny little hex wrench, some friends and I came up with new Ikea product names. Here are a few:

    • Frusträt
    • Flims
    • Instabil
    • Brøkken (okay fine this is Norwegian, not Swedish; I hadn't started studying Swedish yet)
    • Kräp

    And, of course, the complimentary twine for tying your packages to the top of your car:

    • Strink

    Some of these might even be better than the actual name they came up with for a children's bed.
