January, 2004

  • The Old New Thing

    Why does the x86 have so few registers?

    One of the comments to my discussion of 16-bit calling conventions wondered why the 8086 had so few registers.

    The 8086 was a 16-bit version of the even older 8080 processor, which had six 8-bit registers, named A, B, C, D, E, H, and L. The registers could be used in pairs to products three 16-bit pseudo-registers, BC, DE, and HL. What's more, you could put a 16-bit address into the HL register and use the pseudo-register "M" to deference it. So, for example, you could write "MOV B, M" and this meant to load the 8-bit value pointed to by the HL register pair into the B register.

    The 8086 took these 8080 registers and mapped them sort of like this:

    • A -> AL
    • H -> BH, L -> BL; HL -> BX; M -> [BX]
    • B -> CH, C -> CL; BC -> CX
    • D -> DH, E -> DL; DE -> DX

    This is why the 8086 instruction set can only dereference through the [BX] register and not the [CX] or [DX] registers: On the original 8080, you could not dereference through [BC] or [DE], only thorugh M=[HL].

    This much so far is pretty official. The instruction set for the 8086 was chosen to be upwardly-compatible with the 8080, so as to facilitate machine translation of existing 8-bit code to this new 16-bit processor. Even the MS-DOS function calls were designed so as to faciliate machine translation.

    What about the SI and DI registers? I suspect they were inspired by the IX and IY registers available on the Z-80, a competitor to the 8080 which took the 8080 instruction set and extended it with more registers. The Z-80 allowed you to dereference through [IX] and [IY], so the 8086 lets you dereference through [SI] and [DI].

    And what about the BP register? I suspect that was invented on the fly in order to facilitate stack-based parameter passing. Notice that the BP register is the only 8086 register that defaults to the SS segment register and which can be used to access memory directly.

    Why not add even more registers, like today's processors with their palette of 16 or even 128 registers? Why limit the 8086 to only eight registers (AX, BX, CX, DX, SI, DI, BP, SP)? Well, that was then and this is now. At that time, processors did not have lots of registers. The 68000 had a whopping sixteen registers, but if you look more closely, only half of them were general purpose arithmetic registers; the other half were used only for accessing memory.

  • The Old New Thing

    The history of calling conventions, part 1

    The great thing about calling conventions on the x86 platform is that there are so many to choose from!

    In the 16-bit world, part of the calling convention was fixed by the instruction set: The BP register defaults to the SS selector, whereas the other registers default to the DS selector. So the BP register was necessarily the register used for accessing stack-based parameters.

    The registers for return values were also chosen automatically by the instruction set. The AX register acted as the accumulator and therefore was the obvious choice for passing the return value. The 8086 instruction set also has special instructions which treat the DX:AX pair as a single 32-bit value, so that was the obvious choice to be the register pair used to return 32-bit values.

    That left SI, DI, BX and CX.

    (Terminology note: Registers that do not need to be preserved across a function call are often called "scratch".)

    When deciding which registers should be preserved by a calling convention, you need to balance the needs of the caller against the needs of the callee. The caller would prefer that all registers be preserved, since that removes the need for the caller to worry about saving/restoring the value across a call. The callee would prefer that no registers be preserved, since that removes the need to save the value on entry and restore it on exit.

    If you require too few registers to be preserved, then callers become filled with register save/restore code. But if you require too many registers to be preserved, then callees become obligated to save and restore registers that the caller might not have really cared about. This is particularly important for leaf functions (functions that do not call any other functions).

    The non-uniformity of the x86 instruction set was also a contributing factor. The CX register could not be used to access memory, so you wanted to have some register other than CX be scratch, so that a leaf function can at least access memory without having to preserve any registers. So BX was chosen to be scratch, leaving SI and DI as preserved.

    So here's the rundown of 16-bit calling conventions:

    All calling conventions in the 16-bit world preserve registers BP, SI, DI (others scratch) and put the return value in DX:AX or AX, as appropriate for size.

    C (__cdecl)
    Functions with a variable number of parameters constrain the C calling convention considerably. It pretty much requires that the stack be caller-cleaned and that the parameters be pushed right to left, so that the first parameter is at a fixed position relative to the top of the stack. The classic (pre-prototype) C language allowed you to call functions without telling the compiler what parameters the function requested, and it was common practice to pass the wrong number of parameters to a function if you "knew" that the called function wouldn't mind. (See "open" for a classic example of this. The third parameter is optional if the second parameter does not specify that a file should be created.)

    In summary: Caller cleans the stack, parameters pushed right to left.

    Function name decoration consists of a leading underscore. My guess is that the leading underscore prevented a function name from accidentally colliding with an assembler reserved word. (Imagine, for example, if you had a function called "call".)

    Pascal (__pascal)
    Pascal does not support functions with a variable number of parameters, so it can use the callee-clean convention. Parameters are pushed from left to right, because, well, it seemed the natural thing to do. Function name decoration consists of conversion to uppercase. This is necessary because Pascal is not a case-sensitive language.

    Nearly all Win16 functions are exported as Pascal calling convention. The callee-clean convention saves three bytes at each call point, with a fixed overhead of two bytes per function. So if a function is called ten times, you save 3*10 = 30 bytes for the call points, and pay 2 bytes in the function itself, for a net savings of 28 bytes. It was also fractionally faster. On Win16, saving a few hundred bytes and a few cycles was a big deal.

    Fortran (__fortran)
    The Fortran calling convention is the same as the Pascal calling convention. It got a separate name probably because Fortran has strange pass-by-reference behavior.

    Fastcall (__fastcall)
    The Fastcall calling convention passes the first parameter in the DX register and the second in the CX register (I think). Whether this was actually faster depended on your call usage. It was generally faster since parameters passed in registers do not need to be spilled to the stack, then reloaded by the callee. On the other hand, if significant computation occurs between the computation of the first and second parameters, the caller has to spill it anyway. To add insult to injury, the called function often spilled the register into memory because it needed to spare the register for something else, which in the "significant computation between the first two parameters" case means that you get a double-spill. Ouch!

    Consequently, __fastcall was typically faster only for short leaf functions, and even then it might not be.

    Okay, those are the 16-bit calling conventions I remember. Part 2 will discuss 32-bit calling conventions, if I ever get around to writing it.

  • The Old New Thing

    Don't trust the return address

    Sometimes people ask, "So I know how to get my return address [use the _ReturnAddress() intrinsic]; how do I figure out what DLL that return address belongs to?"


    Even if you figure out which DLL the return address belongs to [use Get­Module­Handle­Ex(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS)], that doesn't mean that that is actually the DLL that called you.

    A common trick is to search through a "trusted" DLL for some code bytes that coincidentally match ones you (the attacker) want to execute. This can be something as simple as a "retd" instruction, which are quite abundant. The attacker then builds a stack frame that looks like this, for, say, a function that takes two parameters.

    hacked parameter 1
    hacked parameter 2

    After building this stack frame, the attacker then jumps to the start of the function being attacked.

    The function being attacked looks at the return address and sees trusted_retd, which resides in a trusted DLL. It then foolishly trusts the caller and allows some unsafe operation to occur, using hacked parameters 1 and 2. The function being attacked then does a "retd 8" to return and clean the parameters. This transfers control to the trusted_retd, which performs a simple retd, which now gives control to the hacker_code_addr, and the hacker can use the result to continue his nefarious work.

    This is why you should be concerned if somebody says, "This code verifies that its caller is trusted..." How do they know who the caller really is?

Page 5 of 5 (43 items) 12345