I've had a fairly large number of e-mails with various people both inside & outside Microsoft explaining the AMD64 unwind data. I generally point them at the ABI documentation (which I've linked to in this entry). But the ABI documentation requires a complete reading before you can really understand how the unwind information all fits together. So, in an attempt to make dealing with unwind information easier, I'm going to try a more 'chatty' explanation of unwind data.
The 2 phase exception model
Both Win32 & Win64 have a 2 phase exception model. The first phase walks the list of functions that _might_ handle an exception, asking them if they will _actually_ handle the particular exception that occurred (this is done by calling filter functions). During this phase, the filter functions return either EXCEPTION_CONTINUE_SEARCH (0), which indicates that this handler will not handle the exception, or EXCEPTION_EXECUTE_HANDLER (1), which indicates that this handler will handle the exception. One other option is EXCEPTION_CONTINUE_EXECUTION (-1), which I won't go into here (look it up in MSDN if you're curious). The second phase then unwinds the stack back to the function that agreed to handle the exception, running termination handlers (__finally blocks, C++ destructors) along the way before the handler itself runs.
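To make the first phase concrete, here is a minimal sketch of the search loop in Python. It is only a model, not how the OS implements it: the list of (frame name, filter) pairs and the find_handler helper are hypothetical names made up for illustration. The three constants use the names and values from the Windows SDK headers.

```python
# Filter results, using the names and values from the Windows SDK headers.
EXCEPTION_CONTINUE_EXECUTION = -1
EXCEPTION_CONTINUE_SEARCH = 0
EXCEPTION_EXECUTE_HANDLER = 1


def find_handler(filters, exception):
    """Phase one: ask each candidate frame's filter, innermost frame first.

    `filters` is a hypothetical list of (frame_name, filter_fn) pairs.
    Returns (frame_name, disposition) for the first filter that does not
    answer EXCEPTION_CONTINUE_SEARCH, or (None, EXCEPTION_CONTINUE_SEARCH)
    if no frame claims the exception.
    """
    for frame_name, filter_fn in filters:
        disposition = filter_fn(exception)
        if disposition != EXCEPTION_CONTINUE_SEARCH:
            return frame_name, disposition
    return None, EXCEPTION_CONTINUE_SEARCH


def middle_filter(exc):
    # This frame only handles divide-by-zero; everything else keeps searching.
    if exc == "div0":
        return EXCEPTION_EXECUTE_HANDLER
    return EXCEPTION_CONTINUE_SEARCH


# Made-up call chain: leaf was called by middle, which was called by outer.
filters = [
    ("leaf", lambda exc: EXCEPTION_CONTINUE_SEARCH),
    ("middle", middle_filter),
    ("outer", lambda exc: EXCEPTION_EXECUTE_HANDLER),
]
```

With this chain, a "div0" exception is claimed by middle, while anything else falls through to outer. Only after phase one has picked a frame does the second phase start unwinding.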
Why do you need unwind data?
First, you should probably understand WHY the Windows AMD64 ABI requires unwind data. In Win32 x86 land, when an exception occurs, the OS just looks at a pointer at fs:0 which is the head of a linked list of information regarding what to do in the event of an exception. Each function that requires any sort of attention when an exception occurs needs to create a node on this linked list. This node must contain all information necessary to handle the two phase exception mechanism of Win32. For Windows for AMD64, the functions that need to be invoked during exception handling are discovered not by walking a linked list, but by crawling the stack. Thus, the stack must _always_ be in a state that can be statically walked. To accomplish this, two fundamental problems must be solved: how to discover the stack frame size of a function, and how to recover non-volatile (aka callee saved) register values. This is exactly what AMD64 unwind data describes. The trouble most people run into, though, is that even if your function doesn't need to do anything if an exception occurs, the function may be called by a function that does, and it may then call a function that will throw an exception. If this is the case, when the exception is thrown, the function's stack frame must be fully described. As an added bonus, because all stack frames are accurately described, there's never any reason to use a frame pointer unless absolutely necessary due to something like _alloca or __declspec(align(>16)).
So, even if you just have a tiny little function that only calls another function, you still need unwind data, or when an exception occurs, your process will simply be terminated.
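Here is a rough Python model of why the crawl needs a description of every frame. Everything in it is an assumption made for illustration: the function table stands in for real unwind data, the addresses and frame sizes are invented, and the walk assumes each function is past its prologue (the real unwinder also has to cope with being partway through a prologue or epilogue, and has to restore saved registers).

```python
# Hypothetical function table standing in for unwind data:
# (start address, end address, name, frame size in bytes).
# "Frame size" here means everything the prologue took off RSP
# (pushes plus local allocation), not counting the return address.
FULL_TABLE = [
    (0x1000, 0x1100, "main", 0x28),
    (0x2000, 0x2080, "middle", 0x38),
    (0x3000, 0x3040, "leaf", 0x18),
]

# Fake stack memory: address -> 8-byte value. Only return-address slots matter.
STACK = {
    0x7018: 0x2040,  # leaf returns into middle
    0x7058: 0x1050,  # middle returns into main
    0x7088: 0x0000,  # main returns to an address with no table entry
}


def lookup(table, rip):
    """Find the table entry whose address range contains rip, if any."""
    for start, end, name, frame_size in table:
        if start <= rip < end:
            return name, frame_size
    return None


def walk_stack(table, rip, rsp, memory):
    """Return the names of the frames found, innermost first."""
    names = []
    while True:
        entry = lookup(table, rip)
        if entry is None:
            break  # nothing describes this frame: this sketch simply stops
        name, frame_size = entry
        names.append(name)
        return_slot = rsp + frame_size  # return address sits just above the frame
        if return_slot not in memory:
            break
        rip = memory[return_slot]       # caller's instruction pointer
        rsp = return_slot + 8           # caller's RSP after the return address
    return names


# Same stack, but "middle" was written without any unwind data.
TABLE_WITHOUT_MIDDLE = [entry for entry in FULL_TABLE if entry[2] != "middle"]
```

With the full table the walk finds leaf, middle, and main. With middle missing, the walk finds leaf and then loses the trail, so main's handler is never even considered, which is the situation described above.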
OK, What is unwind data?
Simply put, it's a meta-language that describes what, when, and how a function's frame is built. There are opcodes that indicate when & by how much the stack pointer has been adjusted, when & where a non-volatile register has been saved, or when, where, and to what offset a frame pointer has been set. When you're writing assembly code for ML64, there are predefined macros to help describe this stuff:
PROC FRAME [:optional handler address]
.PUSHREG reg
.ALLOCSTACK size
.SAVEREG reg, offset
.SAVEXMM128 reg, offset
.SETFRAME reg, offset
.ENDPROLOG
Note that all those offsets are restricted to be properly aligned. The .SAVEREG offset must be a multiple of 8, while the .SAVEXMM128 offset must be a multiple of 16. The .SETFRAME offset must be a multiple of 16, and must be between 0 and 240. In addition, all frame manipulation must be completed in the first 254 bytes of the function. If you want to push register saves & restores further into the function, you must use chained unwind info, which will have to be the subject of another blog entry...
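The offset rules are easy to get wrong by hand, so here is a small Python sketch that checks them. The check_directive helper is a hypothetical name, not part of ML64 or any tool. It accepts .SETFRAME offsets from 0 through 240, on the assumption that the frame offset is stored divided by 16 in a 4 bit field of the unwind info.

```python
def check_directive(name, offset):
    """Return None if the offset is acceptable for the directive,
    otherwise a short string saying what is wrong.

    Hypothetical helper for illustration; the assembler does its own checking.
    """
    name = name.upper()
    if offset < 0:
        return "offset must not be negative"
    if name == ".SAVEREG":
        # 8-byte integer register save slot
        if offset % 8 != 0:
            return "offset must be a multiple of 8"
    elif name == ".SAVEXMM128":
        # 16-byte XMM register save slot
        if offset % 16 != 0:
            return "offset must be a multiple of 16"
    elif name == ".SETFRAME":
        if offset % 16 != 0:
            return "offset must be a multiple of 16"
        # Assumption: stored as offset / 16 in a 4-bit field, so at most 15 * 16.
        if offset > 240:
            return "offset must be at most 240"
    else:
        return "unknown directive"
    return None
```

For example, check_directive(".SAVEREG", 0x20) returns None, while check_directive(".SETFRAME", 0x100) reports that the offset is too large.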
Looking up the directives on MSDN will give you examples of how they're all used. If you feel that authoring code that conforms to the prologue unwind descriptors is restrictive, you'll love the epilogue requirements. All function epilogues must look like this:
(optional) lea rsp, [frame ptr + frame size] or add rsp, frame size
pop reg (zero or more)
ret (or jmp)
No other instructions may occur between the first lea/add and the final jmp or ret. At first glance, this may seem like you can't restore any XMM registers. The trick is that all non-volatile registers except those that you want to restore using a pop must be restored prior to entry to the epilogue. The reason this works is that if the OS has to unwind an epilogue, it already has the correct values in all the registers except the ones that are restored via pop, so things really do work out well.
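The epilogue shape described above is simple enough to check mechanically. Here is a minimal Python sketch that does so over a list of instruction strings. It is a toy: is_legal_epilogue is a hypothetical helper, the matching is purely textual, and it does not verify that the lea really uses the frame pointer or that the sizes are right.

```python
import re

# Optional first instruction: lea rsp, [...] or add rsp, <size>
_RSP_ADJUST = re.compile(r"^(lea\s+rsp\s*,\s*\[.+\]|add\s+rsp\s*,\s*\S+)$")
# Zero or more register pops
_POP = re.compile(r"^pop\s+\w+$")
# Final instruction: ret, or a jmp to somewhere else
_EXIT = re.compile(r"^(ret|jmp\b.*)$")


def is_legal_epilogue(instructions):
    """True if the list has the shape: [lea/add rsp], pop*, ret-or-jmp."""
    # Lowercase and collapse runs of whitespace so matching is forgiving.
    lines = [" ".join(text.lower().split()) for text in instructions]
    if not lines:
        return False
    i = 0
    if _RSP_ADJUST.match(lines[i]):
        i += 1
    while i < len(lines) and _POP.match(lines[i]):
        i += 1
    # Exactly one instruction must remain, and it must be the ret or jmp.
    return i == len(lines) - 1 and bool(_EXIT.match(lines[i]))
```

So ["add rsp, 28h", "pop rbx", "ret"] passes, while ["add rsp, 28h", "movaps xmm6, [rsp]", "ret"] fails because the XMM restore sits inside the epilogue rather than before it.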
One other note: if the final jmp isn't an ip-relative jmp, but an indirect jmp, it must be preceded by the REX prefix. This tells the OS unwind routines that the jump is headed outside of the function. Otherwise, the OS assumes it's a jump to a different location inside the same function.
From here, I think I might see what kind of questions pop up, and go from there. I'll also add a future entry to describe how to use chained unwind information to allow you to save registers later in the function than the first 254 bytes (although I _think_ this requires that you author your own .pdata & .xdata, which is a whole lot more complicated...)
It is a very good writing!
And, how about in Linux? I am meeting the trouble now.
Or, any suggestion?
Sorry, I know nothing about how stack unwinding works for Linux. I've heard that it involves Dwarf2 debug info somehow, but have never bothered to confirm that fact...
How do you determine the return IP when the epilogue has a JMP instruction?
@Bino: The return IP is whatever is sitting at the top of the stack immediately before (and after) the JMP instruction is executed, same as an epilog that ends in a RET.