I noticed another fun thing about structured exception handling the other day; it's probably old news to the compiler team, but I thought it was interesting.

Imagine you have code like this:

DoSomething(); // This may raise an exception.
try {
   *x = NULL;  // This may raise an exception, too.
   ...
}

On x86, this is pretty straightforward.  There are exception registration records on the stack, chained together, with pointers to functions to be called to filter the exception and unwind the stack.  The exception handling code just walks a linked list.
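
As a rough illustration of that list walk, here is a minimal sketch in C.  The names (registration_record, dispatch_exception, the two sample handlers) and the exception code 5 are made up for the example, and the record layout is simplified: the real records live on the stack, the real handlers take more arguments, and the real chain does not end in NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified handler signature: returns nonzero if the
   handler accepts the exception. Real handlers receive more context. */
typedef int (*handler_fn)(int exception_code);

/* Hypothetical stand-in for an exception registration record. */
typedef struct registration_record {
    const struct registration_record *next; /* enclosing frame's record, NULL at the end (simplification) */
    handler_fn handler;                     /* function asked whether it handles the exception */
} registration_record;

/* Walk the chain from the innermost record outward. Returns how many
   links were followed before a handler accepted, or -1 if none did. */
int dispatch_exception(const registration_record *head, int exception_code)
{
    int depth = 0;
    for (const registration_record *r = head; r != NULL; r = r->next, ++depth) {
        assert(r->handler != NULL);
        if (r->handler(exception_code))
            return depth;
    }
    return -1;
}

/* Sample handlers for the sketch; the code 5 is arbitrary. */
int handles_nothing(int exception_code) { (void)exception_code; return 0; }
int handles_code_five(int exception_code) { return exception_code == 5; }
```

If an inner record using handles_nothing points at an outer record using handles_code_five, dispatching code 5 from the inner record stops at the outer one, and any other code falls off the end of the chain.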

On amd64, the stack itself needs to be examined.  There are no exception registration records; instead, the exception handling code walks the return addresses on the stack to reconstruct the call stack, and uses each return address to look up the exception handling context for that call frame.

So if you're a compiler, you must not generate code like this:

  0000000100001C6E: FF 15 A4 F4 FF FF  call        qword ptr [__imp_DoSomething]
  0000000100001C74: 66 C7 03 00 00     mov         word ptr [rbx],0

... because then the return address of the call would be the address of the mov instruction, which is inside the try block.  An exception raised in DoSomething() would then appear to come from the same context as one raised by the mov, even though the call itself is outside the try block.

So the compiler adds a nop:

  0000000100001C6E: FF 15 A4 F4 FF FF  call        qword ptr [__imp_DoSomething]
  0000000100001C74: 90                 nop
  0000000100001C75: 66 C7 03 00 00     mov         word ptr [rbx],0

That nop's sole purpose in life (so far as I can determine, anyway) is to provide a return address from the call.
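
To make the arithmetic concrete, here is a small sketch in C of the check I imagine the exception handling code doing.  The try_scope type and both function names are invented for the example, and I'm assuming the protected region is described as a half-open address range that begins at the mov; I haven't verified how the real tables are laid out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a try block: the half-open range
   [begin, end) of instruction addresses it protects. */
typedef struct {
    uint64_t begin;
    uint64_t end;
} try_scope;

/* The return address of a call is the address of the next instruction. */
uint64_t return_address(uint64_t call_address, unsigned call_length)
{
    return call_address + call_length;
}

/* Nonzero if the address falls inside the protected range. */
int address_in_scope(uint64_t address, const try_scope *scope)
{
    assert(scope != NULL && scope->begin <= scope->end);
    return address >= scope->begin && address < scope->end;
}
```

With the addresses from the listings above, the 6-byte call at 0000000100001C6E returns to 0000000100001C74.  Without the nop, that is the first byte of the mov, so the check puts the return address inside the try block.  With the nop, the mov (and, under my assumption, the start of the range) moves to 0000000100001C75, and the return address lands on the nop, outside the try block.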

Incidentally, ia64 doesn't appear to require this--at least, the compiler doesn't generate it. I'd assume that since ia64 instructions are more consistently sized, it's easier to figure out the address of the actual call instruction for a given return pointer (and this appears to be what the exception handling code actually does).