• The Old New Thing

    What tools should I assume everybody has?

    My code samples assume you are using the latest header files from the Platform SDK (free download), the one that includes support for Win64. If you have an older SDK then you won't have various new data types like UINT_PTR and INT_PTR and should just use UINT and INT.

    I write code that is Win64-compliant as a matter of course since all code in Windows must be Win64-compliant. Writing noncompliant code is as foreign to me as it would be for a chess player to consider the ramifications of an illegal move. It doesn't even enter my mind.

    The question for readers: Should I assume that everybody has the latest header files? Or should I write old-style code (that won't run on Win64)?
  • The Old New Thing

    ia64 - misdeclaring near and far data

    As I mentioned yesterday, the ia64 is a very demanding architecture. Today I'll discuss another way that lying to the compiler will come back and bite you.

    The ia64 does not have an absolute addressing mode. Instead, you access your global variables through the r1 register, nicknamed "gp" (global pointer). This register always points to your global variables. For example, if you had three global variables, one of them might be kept at [gp+0], the second at [gp+8] and the third at [gp+16].

    (I believe the Win32 MIPS calling convention also used this technique.)

    On the ia64, there is a limitation in the "addl" instruction: You can only add constants up to 22 bits, which comes out to 4MB. So you can have only 4MB of global variables.

    Well, it turns out that some people want more than 4MB of global variables. Fortunately, these people don't have one million DWORD variables. Rather, they have a few really big global arrays.

    The ia64 compiler solves this problem by splitting global variables into two categories, "small" and "large". (The boundary between "small" and "large" can be set by a compiler flag. I believe the default is to treat anything larger than 8 bytes as "large".)

    The code to access a "small" variable goes like this:

            addl    r30 = -205584, gp;; // r30 -> global variable
            ld4     r30 = [r30]         // load a DWORD from the global variable

    (The gp register actually points into the middle of your global variables, so that both positive and negative offsets can be used. In this case, the variable happened to live at a negative offset from gp.)

    By comparison, "large" global variables are accessed through a two-step process. First, the variable itself is allocated in a separate section of the file. Second, a pointer to the variable is placed into the "small" global variables section of the module. As a result, accessing a "large" global variable requires an added level of indirection.

            addl    r30 = -205584, gp;; // r30 -> global variable forwarder
            ld8     r30 = [r30];;       // r30 -> global variable
            ld4     r30 = [r30]         // load a DWORD from the global variable

    If you leave the size of an object unspecified, like

    extern BYTE b[];
    then the compiler plays it safe and assumes the variable is large. If it turns out that the variable is small, the forwarder pointer will still be there, and the code will do the double-indirection to fetch something that could have been accessed with a single indirection. The code is slightly less efficient, but at least it still works.

    On the other hand, if you misdeclare the object as being small when it is actually large, then you end up in trouble. For example, if you write

        extern BYTE b;
    in one file, and
        extern BYTE b[256];
    in another, then files that include the first header will think the object is small and generate "small" code to access it, but files that include the second header will think it is large. And if the object turns out to be large after all, the code that used the first header file will fail pretty spectacularly.

    So don't do that. When you declare a variable, make sure to declare it accurately. Otherwise the ia64 will catch you in a lie, and you will pay.

  • The Old New Thing

    Uninitialized garbage on ia64 can be deadly

    On Friday, we talked about some of the bad things that can happen if you call a function with the wrong signature. The ia64 introduces yet another possible bad consequence of a mismatched function signature which you may have thought was harmless.

    The CreateThread function accepts a LPTHREAD_START_ROUTINE, which has the function signature

    DWORD CALLBACK ThreadProc(LPVOID lpParameter);
    One thing that people seem to like to do is to take a function that returns void and cast it to a LPTHREAD_START_ROUTINE. The theory is, "I don't care what the return value is, so I may as well use a function that doesn't have a return value. The caller will get garbage, but that's okay; garbage is fine." Here's one web page that contains this mistake:
    void MyCritSectProc(LPVOID /*nParameter*/)  
    { ... }
    hMyThread = CreateThread(NULL, 0,
                 (LPTHREAD_START_ROUTINE) MyCritSectProc,  
                 NULL, 0, &MyThreadID);  
    This is hardly the only web page that supplies buggy sample code. Here's sample code from a course at Old Dominion University that makes the same mistake, and sample code from Long Island University. It's like shooting fish in a barrel. Just google for CreateThread LPTHREAD_START_ROUTINE and pretty much all the hits are people calling CreateThread incorrectly. Even sample code in MSDN gets this wrong. Here's a whitepaper that misdeclares both the return value and the input parameter in a manner that will crash on Win64.

    And it's all fun until somebody gets hurt.

    On the ia64, each 64-bit register is actually 65 bits. The extra bit is called "NaT" which stands for "not a thing". The bit is set when the register does not contain a valid value. Think of it as the integer version of the floating point NaN.

    The NaT bit gets set most commonly from speculative execution. There is a special form of load instruction on the ia64 which attempts to load the value from memory, but if the load fails (because the memory is paged out or the address is invalid), then instead of raising a page fault, all that happens is that NaT bit gets set, and execution continues.

    All mathematical operations on NaT just produce NaT again.

    The load is called "speculative" because it is intended for speculative execution. For example, consider the following imaginary function:

    void SomeClass::Sample(int *p)
    {
      if (m_ready) {
        DoSomething(*p);
      }
    }

    The assembly for this function might go like this:

          alloc r35=ar.pfs, 2, 2, 1 // 2 input, 2 locals, 1 output
      mov r34=rp                // save return address
          ld4 r32=[r32]             // fetch m_ready
          ld4.s r36=[r33];;         // speculative load of *p
          cmp.eq p14, p15=r0, r32   // m_ready == 0?
    (p15) chk.s r36=[r33]           // if not, validate r36
    (p15) br.call rp=DoSomething    //         call
          mov rp=r34;;              // return return address
          mov.i ar.pfs=r35          // clean registers
          br.ret rp;;               // return

    I suspect most of you haven't seen ia64 assembly before. Since this isn't an article on ia64 assembly, I'll gloss over the details that aren't relevant to my point.

    After setting up the register frame and saving the return address, we load the value of m_ready and also perform a speculative load of *p into the r36 register. Notice that we are starting to execute the "true" branch of the "if" statement before we even know whether the condition is true! That's why this is known as speculative execution.

    (Why do this? Because memory access is slow. It is best to issue memory accesses as far in advance of their use as possible, so you don't sit around stalled on RAM.)

    We then check the value of m_ready, and if it is nonzero, we execute the two lines marked with (p15). The first is a "chk.s" instruction which says, "If the r36 register is NaT, then perform a nonspeculative load from [r33]; otherwise, do nothing."

    So if the speculative load of *p had failed, the chk.s instruction will attempt to load it for real, raising the page fault and allowing the memory manager to page the memory back in (or to let the exception dispatcher raise the STATUS_ACCESS_VIOLATION).

    Once the value of the r36 register has been settled once and for all, we call DoSomething. (Since we have two input registers [r32, r33] and two local registers [r34, r35], the output register is r36.)

    After the call returns, we clean up and return to our own caller.

    Notice that if it turns out that m_ready was FALSE, and the access of *p had failed for whatever reason, then the r36 register would have been left in a NaT state.

    And that's where the danger lies.

    For you see, if you have a register whose value is NaT and you so much as breathe on it the wrong way (for example, try to save its value to memory), the processor will raise a STATUS_REG_NAT_CONSUMPTION exception.

    (There do exist some instructions that do not raise an exception when handed a NaT register. For example, all arithmetic operations support NaT; they just produce another NaT as the result. And there is a special "store to memory, even if it is NaT" instruction, which is handy when dealing with variadic functions.)

    Okay, maybe you can see where I'm going with this. (It sure is taking me a long time.)

    Suppose you're one of the people who take a function returning void and cast it to a LPTHREAD_START_ROUTINE. Suppose that function happens to leave the r8 register as NaT, because it ended with a speculative load that didn't pan out. You now return back to kernel32's thread dispatcher with NaT as the return value. Kernel32 then tries to save this value as the thread exit code and raises a STATUS_REG_NAT_CONSUMPTION exception.

    Your program dies deep inside kernel and you have no idea why. Good luck debugging this one!

    There's an analogous problem with passing too few parameters to a function. If you pass too few parameters to a function, the extra parameters might be NaT. And the great thing is, even if the function is careful not to access that parameter until some other conditions are met, the compiler may find that it needs to spill the parameter, thereby raising the STATUS_REG_NAT_CONSUMPTION exception.

    I've actually seen it happen. Trust me, you don't want to get tagged to debug it.

    The ia64 is a very demanding architecture. In tomorrow's entry, I'll talk about some other ways the ia64 will make you pay the penalty when you take shortcuts in your code and manage to skate by on the comparatively error-forgiving i386.

  • The Old New Thing

    How can a program survive a corrupted stack?

    Continuing from yesterday:

    The x86 architecture traditionally uses the EBP register to establish a stack frame. A typical function prologue goes like this:

      push ebp       ; save old ebp
      mov  ebp, esp  ; establish new ebp
      sub  esp, nn*4 ; local variables
      push ebx       ; must be preserved for caller
      push esi       ; must be preserved for caller
      push edi       ; must be preserved for caller
    This establishes a stack frame that looks like this, for, say, a __stdcall function that takes two parameters.

    .. rest of stack ..
    param2
    param1
    return address
    saved EBP <- EBP
    local1
    local2
    ...
    local nn
    saved EBX
    saved ESI
    saved EDI <- ESP

    Parameters can be accessed with positive offsets from EBP; for example, param1 is [ebp+8]. Local variables have negative offsets from EBP; for example, local2 is [ebp-8].

    Now suppose that a calling convention or function declaration mismatch occurs and extra garbage is left on the stack:

    .. rest of stack ..
    param2
    param1
    return address
    saved EBP <- EBP
    local1
    local2
    ...
    local nn
    saved EBX
    saved ESI
    saved EDI
    garbage <- ESP

    The function doesn't really feel any damage yet. The parameters are still accessible at the same positive offsets and the local variables are still accessible at the same negative offsets.

    The real damage doesn't occur until it's time to clean up. Look at the function epilogue:

      pop  edi       ; restore for caller
      pop  esi       ; restore for caller
      pop  ebx       ; restore for caller
      mov  esp, ebp  ; discard locals
      pop  ebp       ; restore for caller
      ret  8         ; return and clean stack

    In a normal stack, the three "pop" instructions match the actual values on the stack and nobody gets hurt. But on the garbage stack, the "pop edi" actually loads the garbage into the EDI register, the "pop esi" loads the caller's original EDI value into ESI, and the "pop ebx" - which thinks it's restoring the original value of EBX - actually loads the original value of the ESI register into EBX. But then the "mov esp, ebp" instruction fixes the stack back up, so the "pop ebp" and "ret 8" are executed with a repaired stack.

    What happened here? Things sort of got put back on their feet. Well, except that the ESI, EDI, and EBX registers got corrupted. If you're lucky, the values in ESI, EDI and EBX weren't important and could have survived corruption. Or all that was important was whether the value was zero or not, and you were lucky and replaced one nonzero value with another. For whatever reason, the corruption of those three registers is not immediately apparent, and you end up never realizing what you did wrong.

    Maybe the corruption has a subtle effect (say, you changed a value from zero to nonzero, causing the caller to go down the wrong codepath), but it's subtle enough that you don't notice, so you ship it, throw a party, and start the next project.

    But then a new compiler comes along, say one that does FPO optimization.

    FPO stands for "frame pointer omission"; the function dispenses with the EBP register as a frame register and instead just uses it like any other register. On the x86, which has comparatively few registers, an extra arithmetic register goes a long way.

    With FPO, the function prologue goes like this:

      sub  esp, nn*4 ; local variables
      push ebp       ; must be preserved for caller
      push ebx       ; must be preserved for caller
      push esi       ; must be preserved for caller
      push edi       ; must be preserved for caller
    The resulting stack frame looks like this:

    .. rest of stack ..
    return address
    local1
    local2
    ...
    local nn
    saved EBP
    saved EBX
    saved ESI
    saved EDI <- ESP

    Everything is now accessed relative to the ESP register. For example, local-nn is [esp+0x10].

    Under these conditions, garbage on the stack is much more fatal. The function epilogue goes like this:

      pop  edi       ; restore for caller
      pop  esi       ; restore for caller
      pop  ebx       ; restore for caller
      pop  ebp       ; restore for caller
      add  esp, nn*4 ; discard locals
      ret  8         ; return and clean stack

    If there is garbage on the stack, the four "pop" instructions will restore the wrong values, as before, but this time, the cleanup of local variables won't fix anything. The "add esp, nn*4" will adjust the stack by what the function believes to be the correct amount, but since there was garbage on the stack, the stack pointer will be off.

    .. rest of stack ..
    return address
    local2 <- ESP (oops)

    The "ret 8" instruction now attempts to return to the caller, but instead it returns to whatever is in local2, which is probably not valid code.

    So this is an example of where optimizing your code reveals other people's bugs.

    Monday, I'll give a much more subtle example of something that can go wrong if you use the wrong function signature for a callback.

  • The Old New Thing

    Aw, poor guy, he's so depressed

    I suspect Tanzi isn't going to get much sympathy from, well, anybody.
    Parmalat's Tanzi is "Depressed"

    Lawyers for Calisto Tanzi, the jailed head of now-bankrupt European food and dairy group Parmalat, claim that he is "depressed" in prison, constantly asking about his family. The lawyers have suggested that Tanzi be released from prison and placed under house arrest. Apparently, Tanzi was well enough in the last few weeks to travel to Ecuador, but the stress of prison is simply too much.

  • The Old New Thing

    Google just keeps adding stuff

    ResearchBuzz pointed out still more google search keywords like area codes, UPC, and whois. I'm still waiting for PLU, those code numbers on the food in the produce aisle. Here's a brief history of PLU codes for those geeky enough to care (like me).
  • The Old New Thing

    What can go wrong when you mismatch the calling convention?

    Believe it or not, calling conventions are one of the things that programs frequently get wrong. The compiler yells at you when you mismatch a calling convention, but lazy programmers will just stick a cast in there to get the compiler to "shut up already".

    And then Windows is stuck having to support your buggy code forever.

    The window procedure

    So many people misdeclare their window procedures (usually by declaring them as __cdecl instead of __stdcall) that the function that dispatches messages to window procedures contains extra protection to detect incorrectly-declared window procedures and perform the appropriate fixup. This is the source of the mysterious 0xdcbaabcd on the stack. The dispatcher checks whether this value is on the stack in the correct place. If not, then it checks whether the window procedure popped one dword too much off the stack (if so, it fixes up the stack; I have no idea how such a messed-up window procedure could have existed), or whether the window procedure was mistakenly declared as __cdecl instead of __stdcall (if so, it pops the parameters off the stack that the window procedure was supposed to pop).

    DirectX callbacks

    Many DirectX functions use callbacks, and people once again misdeclared their callbacks as __cdecl instead of __stdcall, so the DirectX enumerators have to do special stack cleanup for those bad functions.


    I remember there was one program that decided to declare its CreateViewObject function incorrectly, and somehow the authors managed to trick the compiler into accepting it!

    class BuggyFolder : public IShellFolder ... {
     // wrong function signature!
     HRESULT CreateViewObject(HWND hwnd) { return S_OK; }

    Not only did they get the function signature wrong, they returned S_OK even though they failed to do anything! I had to add extra code to clean up the stack after calling this function, as well as verify that the return value wasn't a lie.

    Rundll32.exe entry points

    The function signature required for functions called by rundll32.exe is documented in this Knowledge Base article. That hasn't stopped people from using rundll32 to call random functions that weren't designed to be called by rundll32, like user32 LockWorkStation or user32 ExitWindowsEx.

    Let's walk through what happens when you try to use rundll32.exe to call a function like ExitWindowsEx:

    The rundll32.exe program parses its command line and calls the ExitWindowsEx function on the assumption that the function is written like this:

    void CALLBACK ExitWindowsEx(HWND hwnd, HINSTANCE hinst,
           LPSTR pszCmdLine, int nCmdShow);
    But it isn't. The actual function signature for ExitWindowsEx is
    BOOL WINAPI ExitWindowsEx(UINT uFlags, DWORD dwReserved);
    What happens? Well, on entry to ExitWindowsEx, the stack looks like this:

    .. rest of stack ..
    nCmdShow
    pszCmdLine
    hinst
    hwnd
    return address <- ESP

    However, the function is expecting to see

    .. rest of stack ..
    dwReserved
    uFlags
    return address <- ESP

    What happens? The hwnd passed by rundll32.exe gets misinterpreted as uFlags and the hinst gets misinterpreted as dwReserved. Since window handles are pseudorandom, you end up passing random flags to ExitWindowsEx. Maybe today it's EWX_LOGOFF, tomorrow it's EWX_FORCE, the next time it might be EWX_POWEROFF.

    Now suppose that the function manages to return. (For example, the exit fails.) The ExitWindowsEx function cleans two parameters off the stack, unaware that it was passed four. The resulting stack is

    .. rest of stack ..
    nCmdShow (garbage not cleaned up)
    pszCmdLine <- ESP (garbage not cleaned up)
    Now the stack is corrupted and really fun things happen. For example, suppose the thing at ".. rest of the stack .." is a return address. Well, the original code is going to execute a "return" instruction to return through that return address, but with this corrupted stack, the "return" instruction will instead return to a command line and attempt to execute it as if it were code.

    Random custom functions

    An anonymous commenter exported a function as __cdecl but treated it as if it were __stdcall. This will seem to work, but on return, the stack will be corrupted (because the caller is expecting a __stdcall function that cleans the stack, but what it gets is a __cdecl function that doesn't), and bad things will happen as a result.

    Okay, enough with the examples; I think you get the point. Here are some questions I'm sure you're asking:

    Why doesn't the compiler catch all these errors?

    It does. (Well, not the rundll32 one.) But people have gotten into the habit of just inserting the function cast to get the compiler to shut up.

    Here's a random example I found:

       BOOL CALLBACK DlgProc(HWND hWnd, UINT uMsg,
           WPARAM wParam, LPARAM lParam);

    This is the incorrect function signature for a dialog procedure. The correct signature is

    INT_PTR CALLBACK DialogProc(HWND hwndDlg, UINT uMsg,
        WPARAM wParam, LPARAM lParam);

    You start with

        DialogBox(hInstance, MAKEINTRESOURCE(...),
                  hWnd, DlgProc);
    but the compiler rightly spits out the error message
    error C2664: 'DialogBoxParamA' : cannot convert parameter 4
    so you fix it by slapping a cast in to make the compiler shut up:
        DialogBox(hInstance, MAKEINTRESOURCE(...),
                  hWnd, reinterpret_cast<DLGPROC>(DlgProc));

    "Aw, come on, who would be so stupid as to insert a cast to make an error go away without actually fixing the error?"

    Apparently everyone.

    I stumbled across this page that does exactly the same thing, and this one in German, which not only gets the return value wrong but also misdeclares the third and fourth parameters, and this one in Japanese. It's as easy to fix (incorrectly) as 1-2-3.

    How did programs with these bugs ever work at all? Certainly these programs worked to some degree or people would have noticed and fixed the bug. How can the program survive a corrupted stack?

    I'll answer this question tomorrow.

  • The Old New Thing

    The history of calling conventions, part 5: amd64

    The last architecture I'm going to cover in this series is the AMD64 architecture (also known as x86-64).

    The AMD64 takes the traditional x86 and expands the registers to 64 bits, naming them rax, rbx, etc. It also adds eight more general purpose registers, named simply R8 through R15.

    • The first four parameters to a function are passed in rcx, rdx, r8 and r9. Any further parameters are pushed on the stack. Furthermore, space for the register parameters is reserved on the stack, in case the called function wants to spill them; this is important if the function is variadic.

    • Parameters that are smaller than 64 bits are not zero-extended; the upper bits are garbage, so remember to zero them explicitly if you need to. Parameters that are larger than 64 bits are passed by address.

    • The return value is placed in rax. If the return value is larger than 64 bits, then a secret first parameter is passed which contains the address where the return value should be stored.

    • All registers must be preserved across the call, except for rax, rcx, rdx, r8, r9, r10, and r11, which are scratch.

    • The callee does not clean the stack. It is the caller's job to clean the stack.

    • The stack must be kept 16-byte aligned. Since the "call" instruction pushes an 8-byte return address, this means that every non-leaf function is going to adjust the stack by a value of the form 16n+8 in order to restore 16-byte alignment.

    Here's a sample:

    void SomeFunction(int a, int b, int c, int d, int e);
    void CallThatFunction()
    {
        SomeFunction(1, 2, 3, 4, 5);
        SomeFunction(6, 7, 8, 9, 10);
    }

    On entry to CallThatFunction, the stack looks like this:

    xxxxxxx0 .. rest of stack ..
    xxxxxxx8 return address <- RSP

    Due to the presence of the return address, the stack is misaligned. CallThatFunction sets up its stack frame, which might go like this:

        sub    rsp, 0x28

    Notice that the local stack frame size is 16n+8, so that the result is a realigned stack.

    xxxxxxx0 .. rest of stack ..
    xxxxxxx8 return address
    xxxxxxx0   (arg5)
    xxxxxxx8   (arg4 spill)
    xxxxxxx0   (arg3 spill)
    xxxxxxx8   (arg2 spill)
    xxxxxxx0   (arg1 spill) <- RSP

    Now we can set up for the first call:

            mov     dword ptr [rsp+0x20], 5     ; output parameter 5
            mov     r9d, 4                      ; output parameter 4
            mov     r8d, 3                      ; output parameter 3
            mov     edx, 2                      ; output parameter 2
            mov     ecx, 1                      ; output parameter 1
            call    SomeFunction                ; Go Speed Racer!

    When SomeFunction returns, the stack is not cleaned, so it still looks like it did above. To issue the second call, then, we just shove the new values into the space we already reserved:

            mov     dword ptr [rsp+0x20], 10    ; output parameter 5
            mov     r9d, 9                      ; output parameter 4
            mov     r8d, 8                      ; output parameter 3
            mov     edx, 7                      ; output parameter 2
            mov     ecx, 6                      ; output parameter 1
            call    SomeFunction                ; Go Speed Racer!

    CallThatFunction is now finished and can clean its stack and return.
            add     rsp, 0x28

    Notice that you see very few "push" instructions in amd64 code, since the paradigm is for the caller to reserve parameter space and keep re-using it.

    [Updated 11:00am: Fixed some places where I said "ecx" and "edx" instead of "rcx" and "rdx"; thanks to Mike Dimmick for catching it.]

  • The Old New Thing

    If you know Swedish, the world is funnier

    As I was driving through Seattle the other day, I saw a sign for a personal storage company called "Stor-More".

    I then had to laugh because in Swedish, "Stor-Mor" means "Big Momma".

    It's not restricted to Swedish. On my trip to Germany last year, my travelling companions found several German signs amusing:

    • "Ausfahrt" ("highway exit")
    • "Schmuck" ("jewelry")
    • "Bad Kissing" (a town's name; more accurately, "Bad Kissingen", but never let the truth get in the way of a good joke. "Bad" in German means "bath" or "spa")

    When he told some German colleagues about this hilarious town name, they just looked at him as if to say, "What about Bad Kissingen? It's a nice town. What's so funny about it?" Only when he suggested that they look at it in English did they see the joke.

    For some reason I love multilingual jokes.

  • The Old New Thing

    The history of calling conventions, part 4: ia64

    The ia64 architecture (Itanium) and the AMD64 architecture (x86-64) are comparatively new, so it is unlikely that many of you have had to deal with their calling conventions, but I include them in this series because, who knows, you may end up buying one someday.

    Intel provides the Intel® Itanium® Architecture Software Developer's Manual which you can read to get extraordinarily detailed information on the instruction set and processor architecture. I'm going to describe just enough to explain the calling convention.

    The Itanium has 128 integer registers, 32 of which (r0 through r31) are global and do not participate in function calls. The function declares to the processor how many registers of the remaining 96 it wants to use for purely local use ("local region"), the first few of which are used for parameter passing, and how many are used to pass parameters to other functions ("output registers").

    For example, suppose a function takes two parameters, requires four registers for local variables, and calls a function that takes three parameters. (If it calls more than one function, take the maximum number of parameters used by any called function.) It would then declare at function entry that it wants six registers in its local region (numbered r32 through r37) and three output registers (numbered r38, r39 and r40). Registers r41 through r127 are off-limits.

    Note to pedants: This isn't actually how it works, I know. But it's much easier to explain this way.

    When the function wants to call that child function, it puts the first parameter in r38, the second in r39, the third in r40, then calls the function. The processor shifts the caller's output registers so they can act as the input registers for the called function. In this case r38 moves to r32, r39 moves to r33 and r40 moves to r34. The old registers r32 through r37 are saved in a separate register stack, different from the "stack" pointed to by the sp register. (In reality, of course, these "spills" are deferred, in the same way that SPARC register windows don't spill until needed. Actually, you can look at the whole ia64 parameter passing convention as the same as SPARC register windows, just with variable-sized windows!)

    When the called function returns, the registers then move back to their previous positions and the original values of r32 through r37 are restored from the register stack.

    This creates some surprising answers to the traditional questions about calling conventions.

    What registers are preserved across calls? Everything in your local region (since it is automatically pushed and popped by the processor).

    What registers contain parameters? Well, they go into the output registers of the caller, which vary depending on how many registers the caller needs in its local region, but the callee always sees them as r32, r33, etc.

    Who cleans the parameters from the stack? Nobody. The parameters aren't on the stack to begin with.

    What register contains the return value? Well that's kind of tricky. Since the caller's registers aren't accessible from the called function, you'd think that it would be impossible to pass a value back! That's where the 32 global registers come in. One of the global registers (r8, as I recall) is nominated as the "return value register". Since global registers don't participate in the register window magic, a value stored there stays there across the function call transition and the function return transition.

    The return address is typically stored in one of the registers in the local region. This has the neat side-effect that a buffer overflow of a stack variable cannot overwrite a return address since the return address isn't kept on the stack in the first place. It's kept in the local region, which gets spilled onto the register stack, a chunk of memory separate from the stack.

    A function is free to subtract from the sp register to create temporary stack space (for string buffers, for example), which it of course must clean up before returning.

    One curious detail of the stack convention is that the first 16 bytes on the stack (the first two quadwords) are always scratch. (Peter Lund calls it a "red zone".) So if you need some memory for a short time, you can just use the memory at the top of the stack without asking for permission. But remember that if you call out to another function, then that memory becomes scratch for the function you called! So if you need the value of this "free scratchpad" preserved across a call, you need to subtract from sp to reserve it officially.

    One more curious detail about the ia64: A function pointer on the ia64 does not point to the first byte of code. Instead, it points to a structure that describes the function. The first quadword in the structure is the address of the first byte of code, and the second quadword contains the value of the so-called "gp" register. We'll learn more about the gp register in a later blog entry.

    (This "function pointer actually points to a structure" trick is not new to the ia64. It's common on RISC machines. I believe the PPC used it, too.)

    Okay, this was a really boring entry, I admit. But believe it or not, I'm going to come back to a few points in this entry, so it won't have been for naught.
