• The Old New Thing

    The history of calling conventions, part 1

    The great thing about calling conventions on the x86 platform is that there are so many to choose from!

    In the 16-bit world, part of the calling convention was fixed by the instruction set: The BP register defaults to the SS selector, whereas the other registers default to the DS selector. So the BP register was necessarily the register used for accessing stack-based parameters.

    The registers for return values were also chosen automatically by the instruction set. The AX register acted as the accumulator and therefore was the obvious choice for passing the return value. The 8086 instruction set also has special instructions which treat the DX:AX pair as a single 32-bit value, so that was the obvious choice to be the register pair used to return 32-bit values.

    That left SI, DI, BX and CX.

    (Terminology note: Registers that do not need to be preserved across a function call are often called "scratch".)

    When deciding which registers should be preserved by a calling convention, you need to balance the needs of the caller against the needs of the callee. The caller would prefer that all registers be preserved, since that removes the need for the caller to worry about saving/restoring the value across a call. The callee would prefer that no registers be preserved, since that removes the need to save the value on entry and restore it on exit.

    If you require too few registers to be preserved, then callers become filled with register save/restore code. But if you require too many registers to be preserved, then callees become obligated to save and restore registers that the caller might not have really cared about. This is particularly important for leaf functions (functions that do not call any other functions).

    The non-uniformity of the x86 instruction set was also a contributing factor. The CX register could not be used to access memory, so you wanted to have some register other than CX be scratch, so that a leaf function can at least access memory without having to preserve any registers. So BX was chosen to be scratch, leaving SI and DI as preserved.

    So here's the rundown of 16-bit calling conventions:

    All calling conventions in the 16-bit world preserve registers BP, SI, DI (others scratch) and put the return value in DX:AX or AX, as appropriate for size.

    C (__cdecl)
    Functions with a variable number of parameters constrain the C calling convention considerably. It pretty much requires that the stack be caller-cleaned and that the parameters be pushed right to left, so that the first parameter is at a fixed position relative to the top of the stack. The classic (pre-prototype) C language allowed you to call functions without telling the compiler what parameters the function requested, and it was common practice to pass the wrong number of parameters to a function if you "knew" that the called function wouldn't mind. (See "open" for a classic example of this. The third parameter is optional if the second parameter does not specify that a file should be created.)

    In summary: Caller cleans the stack, parameters pushed right to left.

    Function name decoration consists of a leading underscore. My guess is that the leading underscore prevented a function name from accidentally colliding with an assembler reserved word. (Imagine, for example, if you had a function called "call".)

    Pascal (__pascal)
    Pascal does not support functions with a variable number of parameters, so it can use the callee-clean convention. Parameters are pushed from left to right, because, well, it seemed the natural thing to do. Function name decoration consists of conversion to uppercase. This is necessary because Pascal is not a case-sensitive language.

    Nearly all Win16 functions are exported as Pascal calling convention. The callee-clean convention saves three bytes at each call point, with a fixed overhead of two bytes per function. So if a function is called ten times, you save 3*10 = 30 bytes for the call points, and pay 2 bytes in the function itself, for a net savings of 28 bytes. It was also fractionally faster. On Win16, saving a few hundred bytes and a few cycles was a big deal.

    Fortran (__fortran)
    The Fortran calling convention is the same as the Pascal calling convention. It got a separate name probably because Fortran has strange pass-by-reference behavior.

    Fastcall (__fastcall)
    The Fastcall calling convention passes the first parameter in the DX register and the second in the CX register (I think). Whether this was actually faster depended on your call usage. It was generally faster since parameters passed in registers do not need to be spilled to the stack, then reloaded by the callee. On the other hand, if significant computation occurs between the computation of the first and second parameters, the caller has to spill it anyway. To add insult to injury, the called function often spilled the register into memory because it needed to spare the register for something else, which in the "significant computation between the first two parameters" case means that you get a double-spill. Ouch!

    Consequently, __fastcall was typically faster only for short leaf functions, and even then it might not be.

    Okay, those are the 16-bit calling conventions I remember. Part 2 will discuss 32-bit calling conventions, if I ever get around to writing it.

  • The Old New Thing

    Why not just block the apps that rely on undocumented behavior?

    Because every app that gets blocked is another reason for people not to upgrade to the next version of Windows. Look at all these programs that would have stopped working when you upgraded from Windows 3.0 to Windows 3.1.
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Compatibility

    Actually, this list is only partial. Many times, the compatibility fix is made inside the core component for all programs rather than targetting a specific program, as this list does.

    (The Windows 2000-to-Windows XP list is stored in your C:\WINDOWS\AppPatch directory, in a binary format to permit rapid scanning. Sorry, you won't be able to browse it easily. I think the Application Compatibility Toolkit includes a viewer, but I may be mistaken.)

    Would you have bought Windows XP if you knew that all these programs were incompatible?

    It takes only one incompatible program to sour an upgrade.

    Suppose you're the IT manager of some company. Your company uses Program X for its word processor and you find that Program X is incompatible with Windows XP for whatever reason. Would you upgrade?

    Of course not! Your business would grind to a halt.

    "Why not call Company X and ask them for an upgrade?"

    Sure, you could do that, and the answer might be, "Oh, you're using Version 1.0 of Program X. You need to upgrade to Version 2.0 for $150 per copy." Congratulations, the cost of upgrading to Windows XP just tripled.

    And that's if you're lucky and Company X is still in business.

    I recall a survey taken a few years ago by our Setup/Upgrade team of corporations using Windows. Pretty much every single one has at least one "deal-breaker" program, a program which Windows absolutely must support or they won't upgrade. In a high percentage of the cases, the program in question was developed by their in-house programming staff, and it's written in Visual Basic (sometimes even 16-bit Visual Basic), and the person who wrote it doesn't work there any more. In some cases, they don't even have the source code any more.

    And it's not just corporate customers. This affects consumers too.

    For Windows 95, my application compatibility work focused on games. Games are the most important factor behind consumer technology. The video card that comes with a typical computer has gotten better over time because games demand it. (Outlook certainly doesn't care that your card can do 20 bajillion triangles a second.) And if your game doesn't run on the newest version of Windows, you aren't going to upgrade.

    Anyway, game vendors are very much like those major corporations. I made phone call after phone call to the game vendors trying to help them get their game to run under Windows 95. To a one, they didn't care. A game has a shelf life of a few months, and then it's gone. Why would they bother to issue a patch for their program to run under Windows 95? They already got their money. They're not going to make any more off that game; its three months are over. The vendors would slipstream patches and lose track of how many versions of their program were out there and how many of them had a particular problem. Sometimes they wouldn't even have the source code any more.

    They simply didn't care that their program didn't run on Windows 95. (My favorite was the one that tried to walk me through creating a DOS boot disk.)

    Oh, and that Application Compatibility Toolkit I mentioned above. It's a great tool for developers, too. One of the components is the Verifier: If you run your program under the verifier, it will monitor hundreds of API calls and break into the debugger when you do something wrong. (Like close a handle twice or allocate memory with GlobalAlloc but free it with LocalAlloc.)

    The new application compatibility architecture in Windows XP carries with it one major benefit (from an OS development perspective): See all those DLLs in your C:\WINDOWS\AppPatch directory? That's where many of the the compatibility changes live now. The compatibility workarounds no longer sully the core OS files. (Not all classes of compatibility workarounds can be offloaded to a compatibility DLL, but it's a big help.)
  • The Old New Thing

    When programs grovel into undocumented structures...


    Three examples off the top of my head of the consequences of grovelling into and relying on undocumented structures.

    Defragmenting things that can't be defragmented
    In Windows 2000, there are several categories of things that cannot be defragmented. Directories, exclusively-opened files, the MFT, the pagefile... That didn't stop a certain software company from doing it anyway in their defragmenting software. They went into kernel mode, reverse-engineered NTFS's data structures, and modified them on the fly. Yee-haw cowboy! And then when the NTFS folks added support for defragmenting the MFT to Windows XP, these programs went in, modified NTFS's data structures (which changed in the meanwhile), and corrupted your disk.

    Of course there was no mention of this illicit behavior in the documentation. So when the background defragmenter corrupted their disks, Microsoft got the blame.

    Parsing the Explorer view data structures
    A certain software company decided that they wanted to alter the behavior of the Explorer window from a shell extension. Since there is no way to do this (a shell extension is not supposed to mess with the view; the view belongs to the user), they decided to do it themselves anyway.

    From the shell extension, they used an undocumented window message to get a pointer to one of the internal Explorer structures. Then they walked the structure until they found something they recognized. Then they knew, "The thing immediately after the thing that I recognize is the thing that I want."

    Well, the thing that they recognize and the thing that they want happened to be base classes of a multiply-derived class. If you have a class with multiple base classes, there is no guarantee from the compiler which order the base classes will be ordered. It so happened that they appeared in the order X,Y,Z in all the versions of Windows this software company tested against.

    Except Windows 2000.

    In Windows 2000, the compiler decided that the order should be X,Z,Y. So now they grovelled in, saw the "X" and said "Aha, the next thing must be a Y" but instead they got a Z. And then they crashed your system some time later.

    So I had to create a "fake X,Y" so when the program went looking for X (so it could grab Y), it found the fake one first.

    This took the good part of a week to figure out.

    Reaching up the stack
    A certain software company decided that it was too hard to take the coordinates of the NM_DBLCLK notification and hit-test it against the treeview to see what was double-clicked. So instead, they take the address of the NMHDR structure passed to the notification, add 60 to it, and dereference a DWORD at that address. If it's zero, they do one thing, and if it's nonzero they do some other thing.

    It so happens that the NMHDR is allocated on the stack, so this program is reaching up into the stack and grabbing the value of some local variable (which happens to be two frames up the stack!) and using it to control their logic.

    For Windows 2000, we upgraded the compiler to a version which did a better job of reordering and re-using local variables, and now the program couldn't find the local variable it wanted and stopped working.

    I got tagged to investigate and fix this. I had to create a special NMHDR structure that "looked like" the stack the program wanted to see and pass that special "fake stack".

    I think this one took me two days to figure out.

    I hope you understand why I tend to go ballistic when people recommend relying on undocumented behavior. These weren't hobbyists in their garage seeing what they could do. These were major companies writing commercial software.

    When you upgrade to the next version of Windows and you experience (a) disk corruption, (b) sporadic Explore crashes, or (c) sporadic loss of functionality in your favorite program, do you blame the program or do you blame Windows?

    If you say, "I blame the program," the first problem is of course figuring out which program. In cases (a) and (b), the offending program isn't obvious.

  • The Old New Thing

    Sometimes, an app just wants to crash

    I think it was Internet Explorer 5.0, when we discovered that a thirdparty browser extension had a serious bug, the details of which aren't important. The point was that this bug was so vicious, it crashed IE pretty frequently. Not good. To protect the users from this horrible fate, we marked the object as "bad" so IE wouldn't load it.

    And then we got an angry letter from the company that wrote this browser extension. They demanded that we remove the marking from their object and let IE crash in flames every time the user wanted to surf the web. Why? Because they also wanted us to hook up Windows Error Reporting to detect this crash and put up a dialog that says, "A fix for the problem you experienced is available. Click here for more information," and the "more information" was a redirect to the company's web site (where you could upgrade to version x.y of Program ABC for a special price of only $nnn!). (Actually I forget whether the upgrade was free or not, but the story is funnier if you had to pay for it.)

    In other words, they were crashing on purpose in order to drive upgrade revenue.

    (Astute readers may have noticed an additional irony: If the plug-in crashed IE, then how could the user view the company's web page so they could purchase and download the latest version?)
  • The Old New Thing

    The unsafe device removal dialog


    In a comment, somebody asked what the deal was with the unsafe device removal dialog in Windows 2000 and why it's gone in Windows XP.

    I wasn't involved with that dialog, but here's what I remember: The device was indeed removed unsafely. If it was a USB storage device, for example, there may have been unflushed I/O buffers. If it were a printer, there may have been an active print job. The USB stack doesn't know for sure (those are concepts at a higher layer that the stack doesn't know about) - all it knows is that it had an active channel with the device and now the device is gone, so it gets upset and yells at you.

    In Windows XP, it still gets upset but it now keeps its mouth shut. You're now on your honor not to rip out your USB drive before waiting two seconds for all I/O to flush, not to unplug your printer while a job is printing, etc. If you do, then your drive gets corrupted / print job is lost / etc. and you're on your own.
  • The Old New Thing

    Why are structure sizes checked strictly?


    You may have noticed that Windows as a general rule checks structure sizes strictly. For example, consider the MENUITEMINFO structure:

    typedef struct tagMENUITEMINFO {
      UINT    cbSize; 
      UINT    fMask; 
      UINT    fType; 
      UINT    fState; 
      UINT    wID; 
      HMENU   hSubMenu; 
      HBITMAP hbmpChecked; 
      HBITMAP hbmpUnchecked; 
      ULONG_PTR dwItemData; 
      LPTSTR  dwTypeData; 
      UINT    cch; 
    #if(WINVER >= 0x0500)
      HBITMAP hbmpItem; // available only on Windows 2000 and higher

    Notice that the size of this structure changes depending on whether WINVER >= 0x0500 (i.e., whether you are targetting Windows 2000 or higher). If you take the Windows 2000 version of this structure and pass it to Windows NT 4, the call will fail since the sizes don't match.

    "But the old version of the operating system should accept any size that is greater than or equal to the size it expects. A larger value means that the structure came from a newer version of the program, and it should just ignore the parts it doesn't understand."

    We tried that. It didn't work.

    Consider the following imaginary sized structure and a function that consumes it. This will be used as the guinea pig for the discussion to follow:

    typedef struct tagIMAGINARY {
      UINT cbSize;
      BOOL fDance;
      BOOL fSing;
      // v2 added new features
      IServiceProvider *psp; // where to get more info
    // perform the actions you specify
    STDAPI DoImaginaryThing(const IMAGINARY *pimg);
    // query what things are currently happening
    STDAPI GetImaginaryThing(IMAGINARY *pimg);

    First, we found lots of programs which simply forgot to initialize the cbSize member altogether.

    IMAGINARY img;
    img.fDance = TRUE;
    img.fSing = FALSE;

    So they got stack garbage as their size. The stack garbage happened to be a large number, so it passed the "greater than or equal to the expected cbSize" test and the code worked. Then the next version of the header file expanded the structure, using the cbSize to detect whether the caller is using the old or new style. Now, the stack garbage is still greater than or equal to the new cbSize, so version 2 of DoImaginaryThing says, "Oh cool, this is somebody who wants to provide additional information via the IServiceProvider field." Except of course that it's stack garbage, so calling the IServiceProvider::QueryService method crashes.

    Now consider this related scenario:

    IMAGINARY img;

    The next version of the header file expanded the structure, and the stack garbage happened to be a large number, so it passed the "greater than or equal to the expected cbSize" test, so it returned not just the fDance and fSing flags, but also returned an psp. Oops, but the caller was compiled with v1, so its structure doesn't have a psp member. The psp gets written past the end of the structure, corrupting whatever came after it in memory. Ah, so now we have one of those dreaded buffer overflow bugs.

    Even if you were lucky and the memory that came afterwards was safe to corrupt, you still have a bug: By the rules of COM reference counts, when a function returns an interface pointer, it is the caller's responsibility to release the pointer when no longer needed. But the v1 caller doesn't know about this psp member, so it certainly doesn't know that it needs to be psp->Release()d. So now, in addition to memory corruption (as if that wasn't bad enough), you also have a memory leak.

    Wait, I'm not done yet. Now let's see what happens when a program written in the future runs on an older system.

    Suppose somebody is writing their program intending it to be run on v2. They set the cbSize to the larger v2 structure size and set the psp member to a service provider that performs security checks before allowing any singing or dancing to take place. (E.g., makes sure everybody paid the entrance fee.) Now somebody takes this program and runs it on v1. The new v2 structure size passes the "greater than or equal to the v1 structure size" test, so v1 will accept the structure and Do the ImaginaryThing. Except that v1 didn't support the psp field, so your service provider never gets called and your security module is bypassed. Now everybody is coming into your club without paying.

    Now, you might say, "Well those are just buggy programs. They deserve to lose." If you stand by that logic, then prepare to take the heat when you read magazine articles like "Microsoft intentionally designed <Product X> to be incompatible with <software from a major competitor>. Where is the Justice Department when you need them?"
  • The Old New Thing

    Scoble's rant on UI defaults


    Robert Scoble posted an entry in his Longhorn blog on the subject of what the UI defaults should be. It sure has stirred up a lot of controvery. I may pick at the remarks over the upcoming days, but for now I posted responses to two of the comments he kicked up.

    We recently did a survey of users of all abilities. Beginners, intermediates, experts: The number one complaint all of them had about the user interface - 30% of all respondents mentioned this, evenly spread across all categories - was "Too many icons on the desktop." So it's not just beginners. Experts also don't like the clutter. (Yes, I was surprised by the results, too.)
  • The Old New Thing

    If FlushInstructionCache doesn't do anything, why do you have to call it?


    If you look at the implementation of FlushInstructionCache on Windows 95, you'll see that it's just a return instruction. It doesn't actually do anything. So why do you have to call it?

    Because the act of calling it is the whole point. The control transfers implicit in calling a function suffice to flush the instruction cache on a Pentium. The function doesn't have to do anything else; it is fact that you called a function that is important.
  • The Old New Thing

    Why do I have to return this goofy value for WM_DEVICECHANGE?


    To deny a device removal query, you must return the special value BROADCAST_QUERY_DENY, which has the curious value 0x424D5144. What's the story behind that?

    Well, we first tried following the pattern set by WM_QUERYENDSESSION, where returning TRUE allows the operation to proceed and returning FALSE causes the operation to fail. But when we did this, we found that lots of programs were denying all Plug and Play removal requests - programs that were written for Windows 3.1 which didn't have Plug and Play! How could this be?

    These programs decided, "Well, I have the Windows 3.1 SDK right here in front of me and I looked at all the messages. The ones I care about, I handled, and for all the others, I will just return zero instead of calling DefWindowProc." And they managed to get this to work in Windows 3.1 because they read the SDK carefully and found the five or six messages that require a nonzero return value and made sure to return that nonzero value. The rest got zero.

    And then when we added a new message that required a nonzero return value (which DefWindowProc provided), these programs continued to return zero and caused all device removal queries to fail.

    So we had to change the "cancel" return value to something that wasn't zero. To play it extra safe, we also made the "cancel" return value something other than 1, since we suspected that there would be lots of programs who were just returning TRUE to all messages and we didn't want to have to rewrite the specification twice.

  • The Old New Thing

    What did the letters "NT" originally stand for?

    Finally, the real story of what the letters "N" and "T" originally stood for is now public, so I can stop being coy about it. (Via LockerGnome.)
Page 43 of 48 (471 items) «4142434445»