(This is an older post, with some mild cleanup, and fixed links)
Before I start: ABI = Application Binary Interface. This is the spec that describes
how to call functions, pass parameters, unwind the stack, handle exceptions, etc.
It’s also sometimes called the ‘Calling Convention’.
There is a persistent misconception among people who are implementing x64 compilers
and code generators, and among folks who write x64 ASM code, when they already have a
working solution for x86. Too often they assume they can keep most of their code, change
ESP to RSP, and things will 'just work'. This is fundamentally not true.
When we started working on the x64 ABI, we decided to clean up the way that exception
handling & general function invocation worked. We had a brand new architecture, and we
wanted to cut out all of the legacy junk that continually prevents, or at least
overcomplicates, achieving great performance on the x86 platform. With this in mind,
x64 was given a single calling convention: no __cdecl / __stdcall / __fastcall /
__thiscall mess. There was also a dramatic change in the way that x64 unwinds the
stack, compared to how x86 does it.
Unwinding the stack is used in a wide variety of places, including handling exceptions,
garbage collection, and displaying the call stack from a debugger. On x86, every
function that needs some sort of attention due to an exception must add an element
to a thread-global linked list upon entry, and remove it upon exit. For non-exception
unwinds, whatever is doing the unwind must grok through some nasty meta-data that
tries to describe what the compiler does to set up the stack frame, and when. This
meta-data was only implemented after about 1999, and is primarily supported in debuggers.
I’ll ignore this junk and instead focus on how exception handling works, since this is
the primary way your code will break if you don’t get this right (undebuggable code is
still broken, but you won’t notice until you try to get a stack trace).
So, the x86 thread-global linked list is a list of structures. Each element contains
a function pointer to call in the event of an exception, plus some data that the
function will consume. That’s why you see fs:[0] references scattered throughout C++
code: every function with a local object whose destructor must be invoked if an
exception occurs must have one of these things. When your code creates a new object,
a small bit of code is executed to update the data in this thread-global list element.
When that object is destroyed, more code is executed. Because of this, x86 can catch
hundreds of thousands of exceptions per second, but even if you never actually take
an exception, your CPU executes lots of blobs of code whose only purpose is to make
sure the rare exception is handled properly. Finding the first function that needs to
handle an exception, or destroy an object, is an O(1) operation: it’s a single lookup
of FS:[0]. VERY fast. Unfortunately, that optimizes for a scenario that should not be
common. Exceptions should be exceptional, not common! In addition, this linked list
really resides on the stack, so there is a function pointer sitting right below the
return address on your stack! Buffer overruns, anyone? There is now an extra blob of
information that indicates which functions are valid to be invoked as exception
handlers (link /SAFESEH, if you’re curious), but it is only used if every contribution
to your .exe or .dll has this information.
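To make that cost model concrete, here’s a rough C model of the list. It’s a sketch, not the real thing: `Registration`, `seh_head` (standing in for FS:[0]), and all of the helper functions are hypothetical names, and real SEH dispatch uses a separate second pass to run cleanups. The point is where the work happens: `seh_push`/`seh_pop` and the `state` updates run on every call, while raising only has to read the head of the list.

```c
#include <stddef.h>

/* Hypothetical model of an x86 SEH registration record. The real record
   sits in the function's stack frame and starts with a "next" link and a
   handler pointer; compiler-specific state (here, `state`) follows it. */
typedef struct Registration {
    struct Registration *next;               /* older record, further up the stack */
    int (*handler)(struct Registration *);   /* returns 1 if it handled the exception */
    int state;                               /* which locals are currently constructed */
} Registration;

/* Stand-in for FS:[0], the head of this thread's list. */
static _Thread_local Registration *seh_head = NULL;

/* Executed on every function entry, exception or not. */
static void seh_push(Registration *r, int (*handler)(Registration *)) {
    r->next = seh_head;
    r->handler = handler;
    r->state = 0;
    seh_head = r;
}

/* Executed on every function exit, exception or not. */
static void seh_pop(Registration *r) {
    seh_head = r->next;
}

/* Raising: the innermost record is found in O(1) by reading the head,
   then the list is walked outward until some handler claims it.
   (Simplified: real SEH uses a separate second pass for unwinding.) */
static int seh_raise(void) {
    for (Registration *r = seh_head; r != NULL; r = r->next)
        if (r->handler(r))
            return 1;
    return 0;   /* nobody handled it */
}

/* Cleanup-only handler: destroys the frame's live object, keeps searching. */
static int destroy_if_live(Registration *r) {
    if (r->state == 1)
        r->state = 0;   /* the destructor would run here */
    return 0;
}

/* A function with one destructible local pays this bookkeeping on every call. */
static void function_with_a_destructor(void) {
    Registration reg;
    seh_push(&reg, destroy_if_live);   /* entry overhead */
    reg.state = 1;                     /* object constructed */
    /* ... body that might raise ... */
    reg.state = 0;                     /* object destroyed normally */
    seh_pop(&reg);                     /* exit overhead */
}
```

Note that `function_with_a_destructor` executes four bookkeeping steps even though nothing in it ever raises.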
With this in mind, on x64 every function has a very strict structure that must be
properly described in static data. The prolog is the only place in which you can set
up your stack frame, and it can be at most 255 bytes long. All modifications of the
stack pointer, as well as all saves of nonvolatile registers (RBX, RBP, RSI, RDI,
R12-R15, XMM6-XMM15), must be described in this static data, so that they can be
restored correctly if an exception occurs. If your function is missing this static
data and an exception is raised, the thread will be terminated by the OS.
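By contrast, on x64 the unwinder finds a function’s unwind data by looking up the instruction pointer in the image’s .pdata table, an array of RUNTIME_FUNCTION entries sorted by start address (on Windows, RtlLookupFunctionEntry does this for you). Here’s a minimal sketch of that lookup, assuming the addresses have already been converted to RVAs; `find_function_entry` is a hypothetical name, and only the three field names come from the documented entry layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the documented layout of a .pdata entry (RUNTIME_FUNCTION);
   all three fields are RVAs relative to the image base. */
typedef struct {
    uint32_t BeginAddress;
    uint32_t EndAddress;        /* one past the last byte of the function */
    uint32_t UnwindInfoAddress; /* where this function's xdata lives */
} RuntimeFunction;

/* Hypothetical helper: binary search of a table sorted by BeginAddress.
   Returns NULL if no entry covers `rva`; the unwinder then assumes a leaf
   function whose return address is at [RSP]. */
static const RuntimeFunction *find_function_entry(const RuntimeFunction *table,
                                                  size_t count, uint32_t rva) {
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < table[mid].BeginAddress)
            hi = mid;                  /* target is in an earlier function */
        else if (rva >= table[mid].EndAddress)
            lo = mid + 1;              /* target is in a later function */
        else
            return &table[mid];        /* BeginAddress <= rva < EndAddress */
    }
    return NULL;
}
```

This is O(log n) instead of O(1), but none of it runs until an exception is actually raised.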
In an effort to get stuff into people’s hands, here’s an excerpt that I’ve prepended
to the ABI document, which has not yet seen the light of day. Rather than posting the
whole thing, I’ve just put in my description, with the links updated to point back to
the corresponding sections of the slightly older version on MSDN. The current version
has some more detail, but nothing you couldn’t figure out through, ahem, trial &
error...
<snip>
Overview
The in-depth nature of an ABI document doesn’t lend itself to ‘easy reading’. However,
a detailed knowledge of the entire ABI is rarely necessary to accomplish most
programming tasks. This section is simply a quick overview of the ABI, with pointers
to the sections that describe the various aspects in more detail. It also points out
particular ‘gotchas’: rules that must be strictly followed to avoid problems.
Calling convention
The x64 Windows ABI is a four-register ‘fast-call’ calling convention, with stack
backing for those registers. There is a strict one-to-one correspondence between the
arguments of a function and the registers used for those arguments. Any argument that
doesn’t fit in 8 bytes, or isn’t exactly 1, 2, 4, or 8 bytes, must be passed by
reference. No attempt is made to spread a single argument across multiple registers.
The x87 register stack is not used by the calling convention; code may use it, but
must consider it volatile across function calls. All floating point operations are
done using the 16 XMM registers.
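The size rule is easy to get wrong for small structs, so here’s a sketch of it. `passed_by_value` and the three example structs are hypothetical; the rule encoded is just the one stated above: exactly 1, 2, 4, or 8 bytes travels in a register (or its stack slot), and anything else is passed as a pointer to a caller-allocated copy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical example types. */
struct Rgb  { unsigned char r, g, b; };  /* 3 bytes: not a register size */
struct Pair { int a, b; };               /* 8 bytes: fits one register */
struct Quad { int a, b, c, d; };         /* 16 bytes: too big for one register */

/* True if an argument of this size is passed by value in a single register
   or stack slot; false means the caller passes a pointer to a copy instead. */
static bool passed_by_value(size_t size) {
    return size == 1 || size == 2 || size == 4 || size == 8;
}
```

So `passed_by_value(sizeof(struct Pair))` is true, while both `struct Rgb` and `struct Quad` go by reference, even though one is smaller than 8 bytes.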
The arguments are passed in registers RCX, RDX, R8, and R9. Float/double arguments
are instead passed in XMM0L, XMM1L, XMM2L, and XMM3L (the low parts of XMM0-XMM3),
according to their position. 16-byte arguments are passed by reference. Parameter
passing is described in detail at http://tinyurl.com/q2exu. In addition to these
registers, RAX, R10, R11, XMM4, and XMM5 are volatile. All other registers are
non-volatile. Register usage is documented in detail at http://tinyurl.com/pe7ec
and http://tinyurl.com/fjkd5.
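Because of the strict one-to-one correspondence, an argument’s register depends only on its position and whether it’s floating point. For `void f(int a, double b, int c, float d)`, that puts a in RCX, b in XMM1, c in R8, and d in XMM3; RDX, XMM0, R9, and XMM2 go unused. Here’s a small sketch of that rule plus the volatile-register list from the paragraph above. The function names are hypothetical, and only scalar int/float/double arguments are modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Argument i (0-based) uses slot i of either the integer or the XMM
   sequence, never "the next free register". Arguments 4 and up go on
   the stack. */
static const char *arg_location(int index, bool is_floating) {
    static const char *const gpr[4] = {"RCX", "RDX", "R8", "R9"};
    static const char *const xmm[4] = {"XMM0", "XMM1", "XMM2", "XMM3"};
    if (index < 0 || index >= 4)
        return "stack";
    return is_floating ? xmm[index] : gpr[index];
}

/* Registers a callee may clobber without saving. Anything not listed
   (RBX, RBP, RSI, RDI, RSP, R12-R15, XMM6-XMM15) must be preserved. */
static bool is_volatile(const char *reg) {
    static const char *const volatile_regs[] = {
        "RAX", "RCX", "RDX", "R8", "R9", "R10", "R11",
        "XMM0", "XMM1", "XMM2", "XMM3", "XMM4", "XMM5",
    };
    for (size_t i = 0; i < sizeof volatile_regs / sizeof volatile_regs[0]; i++)
        if (strcmp(reg, volatile_regs[i]) == 0)
            return true;
    return false;
}
```

For the example above, `arg_location(1, true)` gives "XMM1" and `arg_location(2, false)` gives "R8".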
The caller is responsible for allocating space for the callee’s parameters, and must
always allocate enough space for the 4 register parameters, even if the callee doesn’t
take that many. This simplifies support for K&R C’s unprototyped functions and for
‘vararg’ C/C++ functions. For vararg or unprototyped functions, any float values must
be duplicated in the corresponding general-purpose register. Any parameters beyond the
first 4 must be stored on the stack, above the backing store for the first 4, prior to
the call. Vararg function details can be found at http://tinyurl.com/s6deb.
Unprototyped function information is detailed at
http://tinyurl.com/ha4w9.
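Here’s a sketch of the stack arithmetic that paragraph implies. It rests on a few simplifying assumptions: every argument takes one 8-byte slot, locals need no more than 8-byte alignment, the function makes at least one call, and there is no frame pointer. The helper names are hypothetical. On entry to any function RSP is 8 mod 16 (the call pushed the return address), so a function that pushes nothing and calls something with up to four arguments ends up with 32 bytes of home space plus 8 bytes of padding. That is why `sub rsp, 28h` shows up so often at the top of x64 functions.

```c
#include <stddef.h>

/* Size of the outgoing parameter area a caller reserves: one 8-byte slot
   per argument, but never fewer than the four home slots (32 bytes). */
static size_t outgoing_area(size_t nargs) {
    return 8 * (nargs < 4 ? 4 : nargs);
}

/* Offset of argument i (0-based) from RSP at the call instruction. Slots
   0-3 are the home area for RCX/RDX/R8/R9 (or XMM0-XMM3); slot 4 and up
   hold the stack-passed arguments. Inside the callee, right after entry,
   add 8 for the pushed return address. */
static size_t arg_offset_at_call(size_t i) {
    return 8 * i;
}

/* Hypothetical helper: bytes a prolog must subtract from RSP after pushing
   `pushes` nonvolatile registers, to hold `locals` bytes of locals plus the
   outgoing area for its largest call, keeping RSP 16-byte aligned at call
   sites. Assumes the function makes at least one call. RSP is 8 mod 16 on
   entry because of the return address. */
static size_t prolog_alloc(size_t pushes, size_t locals, size_t max_call_args) {
    size_t need = locals + outgoing_area(max_call_args);
    size_t used = 8 + 8 * pushes + need;     /* return address + pushes + allocation */
    size_t pad = (16 - used % 16) % 16;      /* round up to a multiple of 16 */
    return need + pad;
}
```

With one pushed register (say RBX) the padding disappears: `prolog_alloc(1, 0, 4)` is 32, i.e. `sub rsp, 20h`.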
Alignment
Most structures are aligned to their natural alignment. The primary exceptions
are the stack pointer and malloc/alloca memory, which are aligned to 16 bytes in
order to aid performance. Alignment above 16 bytes must be done manually, but since
16 bytes is a common alignment size for XMM operations, this should suffice for
most code. For more information about structure layout and alignment see
http://tinyurl.com/j9qf3. For information about the stack layout, see
http://tinyurl.com/rjbfx.
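For alignment above 16 bytes, ‘manually’ usually means over-allocating and rounding the pointer up. Here’s a minimal sketch, assuming `align` is a power of two no smaller than `sizeof(void *)`. `alloc_aligned` and `free_aligned` are hypothetical names; in practice C11’s aligned_alloc, or _aligned_malloc/_aligned_free with MSVC, do the same job.

```c
#include <stdint.h>
#include <stdlib.h>

/* Over-allocate, round the address up to `align`, and stash the pointer
   malloc returned just below the aligned block so it can be freed later.
   `align` must be a power of two >= sizeof(void *). */
static void *alloc_aligned(size_t size, size_t align) {
    void *raw = malloc(size + align - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;
    uintptr_t start = (uintptr_t)raw + sizeof(void *);            /* room for the stash */
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
    ((void **)aligned)[-1] = raw;                                  /* remember the original */
    return (void *)aligned;
}

/* Releases a block obtained from alloc_aligned. */
static void free_aligned(void *p) {
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

Every block from `alloc_aligned` must be released with `free_aligned`, never plain `free`, because the pointer handed out is not the one malloc returned.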
Unwindability
All non-leaf functions [a leaf function is one that neither calls a function, nor
allocates any stack space, nor modifies any nonvolatile register] must be annotated with
data [referred to as xdata or ehdata, which is pointed to from pdata] that describes to
the operating system how to properly unwind them, to recover nonvolatile registers.
Prologs & epilogs are highly restricted, so that they can be properly described in
xdata. In any region of code that isn’t part of a prolog or epilog, the stack pointer
must be aligned to 16 bytes (leaf functions are the exception). For details about the
proper structure of function prologs & epilogs, see http://tinyurl.com/z2mge.
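To give a feel for what the xdata looks like, here’s a sketch that decodes a version 1 UNWIND_INFO block and adds up how much the prolog took off RSP. The byte layout follows the documented UNWIND_INFO/UNWIND_CODE format. `prolog_stack_bytes` is a hypothetical helper, and it ignores chained unwind info, XMM saves, and machine frames. For a prolog of `push rbx` followed by `sub rsp, 20h`, the codes appear in reverse order: UWOP_ALLOC_SMALL at prolog offset 5, then UWOP_PUSH_NONVOL (RBX, register 3) at offset 1.

```c
#include <stdint.h>

/* Unwind opcodes as documented for UNWIND_CODE (subset). */
enum {
    UWOP_PUSH_NONVOL = 0,   /* OpInfo = register number; RSP -= 8 */
    UWOP_ALLOC_LARGE = 1,   /* size in the next 1 or 2 slots */
    UWOP_ALLOC_SMALL = 2,   /* size = OpInfo * 8 + 8 (8..128 bytes) */
    UWOP_SET_FPREG   = 3,   /* establishes the frame register; no RSP change */
    UWOP_SAVE_NONVOL = 4,   /* mov into already-allocated space; 1 extra slot */
};

/* Reads the little-endian 16-bit slot k of the unwind code array. */
static unsigned slot16(const uint8_t *code, unsigned k) {
    return (unsigned)code[2 * k] | (unsigned)code[2 * k + 1] << 8;
}

/* Hypothetical decoder for a version 1 UNWIND_INFO block. Header bytes:
   [0] Version (low 3 bits) | Flags, [1] SizeOfProlog, [2] CountOfCodes,
   [3] FrameRegister | FrameOffset. Then CountOfCodes 2-byte slots, each
   [0] prolog offset, [1] UnwindOp (low 4 bits) | OpInfo (high 4 bits),
   listed in reverse prolog order. Returns the bytes the prolog took off
   RSP, or -1 for anything this sketch does not handle. */
static long prolog_stack_bytes(const uint8_t *info) {
    if ((info[0] & 0x7) != 1)
        return -1;
    unsigned count = info[2];
    const uint8_t *code = info + 4;
    long total = 0;
    for (unsigned i = 0; i < count; i++) {
        unsigned op = code[2 * i + 1] & 0x0f;
        unsigned opinfo = code[2 * i + 1] >> 4;
        switch (op) {
        case UWOP_PUSH_NONVOL:
            total += 8;
            break;
        case UWOP_ALLOC_SMALL:
            total += (long)opinfo * 8 + 8;
            break;
        case UWOP_ALLOC_LARGE:
            if (opinfo == 0 && i + 1 < count) {          /* size / 8 in one slot */
                total += 8L * slot16(code, i + 1);
                i += 1;
            } else if (opinfo == 1 && i + 2 < count) {   /* unscaled size in two slots */
                total += (long)(slot16(code, i + 1) | (unsigned long)slot16(code, i + 2) << 16);
                i += 2;
            } else {
                return -1;
            }
            break;
        case UWOP_SET_FPREG:
            break;
        case UWOP_SAVE_NONVOL:
            i += 1;   /* skip the slot holding the save offset */
            break;
        default:
            return -1;   /* XMM saves, machine frames, etc. not handled */
        }
    }
    return total;
}
```

Feeding it the bytes for that prolog (01 05 02 00 05 32 01 30) gives 40: 32 bytes allocated plus 8 for the pushed RBX.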
For more information about exception handling, and the exception handling/unwinding
pdata & xdata see http://tinyurl.com/gd2x9.
</snip>
Anyway, I hope that helps. I could not believe how long it took to find this
stuff on MSDN. For some (lame) reason, blogs are better indexed than MSDN in both
MSN Search and Google. Hopefully this entry will make finding this information
easier.
Here’s a link to the official x64 ABI documentation, which goes into excruciating
detail about this stuff. Sorry if it’s not very readable: I wrote a few parts,
along with a few other people, and we’re primarily engineers, not writers. We’ve
had a great UE writer take over this document, and it should be slightly improved
when it sees the light of day as part of the VS 2005 documentation, but until then,
this is it:
http://tinyurl.com/qbadc
Hopefully, that link will continue to work for a while. MSDN likes to occasionally
restructure the way it references information, just to keep us all on our toes.