Ever since v1, corprof.idl has contained the following ominous comment above the typedefs for FunctionEnter/Leave/Tailcall.

* NOTE!!!
*
* It is VERY IMPORTANT to note that these function implementations must be
* __declspec(naked), since the EE is not saving any registers before calling
* any of them. YOU MUST SAVE ALL REGISTERS YOU USE, INCLUDING FPU REGISTERS
* IF THE FPU STACK IS NOT EMPTY AND YOU INTEND TO USE IT.

But what does it mean, exactly? And what black magic must one perform to avoid the presumably dire consequences of ignoring it?

Calling Conventions and Register Preservation

The most important part of the comment is that bit in all caps—you must save all registers you use, including floating point registers. The reason for this lies in how functions get hooked, and more specifically in the calling convention used.

Raymond Chen has a good blog post on the x86 calling conventions, and there are some diagrams on MSDN that get into a bit of it.

When you ask the JIT to hook a given function, it strives to do so as inexpensively as it can. In the prologue, it puts the simplest possible call to FunctionEnter—just push the args and call—and does the same in the epilogue with FunctionLeave and in any tailcall location with FunctionTailcall. So on x86 you get code like this (comments are mine):

push 0325BFE8h    ; Push the FunctionID
call 797EAC98h    ; Call the FunctionEnter hook
...               ; JITted code for the managed function

This looks a lot like stdcall—parameters on the stack, pushed from right to left, and removed by the callee—except the JIT isn’t saving any of the caller-saved registers (EAX, ECX, EDX, and floating-point registers).

So if the JIT puts anything interesting in those registers (which it does), and your hook scribbles on them without restoring their original values afterwards (which you practically can’t avoid if your hook is written in C), you’ve just committed the Cardinal Sin of Diagnostic Tools and seriously horked a process you’re supposed to just be monitoring.

64-bit

CLR 2.0 added support for x64 and IA64. On these platforms the Microsoft compilers have only a single calling convention, which is effectively fastcall: the first several arguments are passed in registers (once again Raymond Chen comes to the rescue with a post on the x64 calling convention). The 64-bit JITs take the same approach as the x86 JIT: they use the standard calling convention, except that they don't save some of the registers a caller would normally save.
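For reference, here is a small C++ summary of the x64 register roles that matter when writing a hook. It encodes facts about the Microsoft x64 ABI (argument, return, and volatile registers), not anything CLR-specific; the names kIntArgRegs, kVolatileRegs, IsVolatile and so on are hypothetical, and IA64 is not covered.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Register roles under the Microsoft x64 calling convention (ABI facts, not
// CLR-specific). Argument slots 1-4 use these registers when the argument in
// that slot is an integer or pointer...
constexpr std::array<std::string_view, 4> kIntArgRegs = {"rcx", "rdx", "r8", "r9"};
// ...and these registers when the argument in that slot is floating point.
constexpr std::array<std::string_view, 4> kFpArgRegs = {"xmm0", "xmm1", "xmm2", "xmm3"};

// Integer and pointer results come back in RAX, floating-point results in XMM0.
constexpr std::string_view kIntReturnReg = "rax";
constexpr std::string_view kFpReturnReg = "xmm0";

// Volatile (caller-saved) registers: an ordinary callee may overwrite any of
// these without restoring them.
constexpr std::array<std::string_view, 13> kVolatileRegs = {
    "rax", "rcx", "rdx", "r8", "r9", "r10", "r11",
    "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5"};

// True if an ordinary C function is allowed to clobber `reg`.
bool IsVolatile(std::string_view reg) {
    for (std::string_view r : kVolatileRegs) {
        if (r == reg) return true;
    }
    return false;
}
```

Under this convention a hook's FunctionID arrives in RCX rather than on the stack, and an ordinary C function is free to overwrite everything in kVolatileRegs, including both return registers. That is consistent with the x64 FunctionLeave2 entry in the table below, which asks you to save the integer and FP return registers.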

Writing a Proper Function Hook

Given the above, all you have to do to have a proper function hook is:

  • Implement the callee’s half of the calling convention
  • Also save the caller-saved registers if you happen to use them

There are many ways to skin this particular cat. You can use just about any combination of the following:

  • Pure assembly code
  • Inline assembly code (x86 only)
  • __declspec(naked) (x86 only)
  • Normal C functions

Just a few of the ways you can put these things together are:

  1. Write your hook purely in assembly code. This is great if you’re an assembly whiz, have a really small amount of work to do in the hook, and want it to be as fast as possible.
  2. Write a little stub in assembly code that saves the caller-saved registers and then chains the call to a normal C function to do the real work. The normal function will of course save the callee-saved registers for you. This is a good option if you need to write the bulk of your hook in C.
  3. Write a __declspec(naked) function that uses some inline assembly to save all the registers and then does the work inline. __declspec(naked) just tells the compiler that you’re going to do all the work of implementing the calling convention yourself. This option is only available on x86, and on that platform it saves you the call overhead of #2. (I suppose it’s technically possible that you might save some registers you didn’t need to with this option, but it’s not likely; on x86 the C compiler quickly runs out of registers.)
  4. Write a normal function that uses some inline assembly to save the caller-saved registers. I mention this one only for completeness. I don’t recommend it, because the interactions between C code and inline assembly code in non-naked functions can be weird, and it’s tough to get this right, especially if you’re changing the C part of things at all. #3 is just as fast and gives you a lot more control over what’s going on.
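The difference between a plain C hook and option 2 can be modeled in portable C++. This is a simulation, not working hook code: Machine, CHookBody, PlainHook, SavingStub, JitPrologueSurvives and g_lastFunctionId are hypothetical names, the "registers" are struct fields, and the stack is a std::vector. It mimics the x86 sequence shown earlier: the JIT pushes a FunctionID, calls the hook, and expects the callee to remove the argument (as with stdcall) and to leave EAX, ECX and EDX untouched.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of the x86 state the JIT cares about across a hook call.
// Simulation only: real hooks deal with actual CPU registers in assembly.
struct Machine {
    std::uint32_t eax = 0, ecx = 0, edx = 0;  // caller-saved integer registers
    std::vector<std::uint32_t> stack;         // top of stack is back()
};

using HookFn = void (*)(Machine&);

static std::uint32_t g_lastFunctionId = 0;  // hypothetical profiler state

// The "real work", as an ordinary C function would do it: the compiler is
// free to use EAX/ECX/EDX as scratch without restoring them.
void CHookBody(Machine& m, std::uint32_t functionId) {
    g_lastFunctionId = functionId;
    m.eax = functionId;     // scratch
    m.ecx = 0xDEADBEEF;     // scratch
    m.edx = m.eax ^ m.ecx;  // scratch
}

// A hook written as a plain C function: it removes its argument correctly
// but leaves the caller-saved registers clobbered.
void PlainHook(Machine& m) {
    std::uint32_t id = m.stack.back();
    m.stack.pop_back();  // callee removes its argument, as with stdcall
    CHookBody(m, id);
}

// Option 2: a stub that saves the caller-saved registers, chains to the C
// body, restores the registers, and removes the argument before returning.
void SavingStub(Machine& m) {
    const std::uint32_t eax = m.eax, ecx = m.ecx, edx = m.edx;  // a real stub pushes these
    CHookBody(m, m.stack.back());
    m.eax = eax;  // a real stub pops them in reverse order
    m.ecx = ecx;
    m.edx = edx;
    m.stack.pop_back();  // like "ret 4"
}

// Models the JIT-injected "push FunctionID / call hook", with a live value in
// ECX that the managed code expects to survive the call.
bool JitPrologueSurvives(HookFn hook) {
    Machine m;
    m.ecx = 42;                     // live value owned by the JIT
    m.stack.push_back(0x0325BFE8);  // push FunctionID
    hook(m);                        // call the FunctionEnter hook
    return m.ecx == 42 && m.stack.empty();
}
```

Both hooks record the same FunctionID; only the stub returns to "JIT code" with ECX intact. In a real process the plain hook's failure would show up as corrupted state in the managed code, not as a tidy false return value.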

So the whole thing about “these functions MUST be __declspec(naked)” isn’t quite true; it’s just arguably the easiest way of doing it on x86.

Sample Code

By popular demand, I’ve cooked up some sample code for how to do this on the three platforms (thanks to Josh Williams for the 64-bit code and to Sean Selitrennikoff for the x86 FP-saving code).

Download: FunctionHooks.zip (Updated 9/11/2005)

In this zip file you’ll find three code files: leave_x86.cpp, leave_x64.asm, and leave_ia64.s. Each file shows the conservative way of implementing a FunctionLeave2 hook (saving all registers), with instructions for how to transform that code for use with the other hooks. These hooks have a really good chance of working on future versions of the CLR.
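The C half of such a hook can be an ordinary function. The sketch below shows one with the FunctionLeave2 parameter list. To stay self-contained it declares stand-ins for UINT_PTR, ULONG, FunctionID, COR_PRF_FRAME_INFO and COR_PRF_FUNCTION_ARGUMENT_RANGE instead of including corprof.h; a real profiler should use the header's definitions, including any Windows-specific annotations. FunctionLeave2Impl, LeaveStats and g_leaveStats are hypothetical. Unless one of the flags in the footnotes below is in effect, a function like this should be reached through a register-saving stub (option 2 above) rather than registered directly.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for types declared in corprof.h, so this compiles on its own.
// In a real profiler, include corprof.h instead of declaring these.
using UINT_PTR = std::uintptr_t;
using ULONG = std::uint32_t;
using FunctionID = UINT_PTR;
using COR_PRF_FRAME_INFO = UINT_PTR;
struct COR_PRF_FUNCTION_ARGUMENT_RANGE {
    UINT_PTR startAddress;  // where the value lives
    ULONG length;           // size of the value in bytes
};

// Hypothetical profiler state updated by the hook.
struct LeaveStats {
    unsigned long calls = 0;
    FunctionID lastFunction = 0;
    ULONG lastRetvalBytes = 0;
};
static LeaveStats g_leaveStats;

// The "real work" half of a FunctionLeave2 hook: an ordinary C++ function
// with the FunctionLeave2 parameter list, meant to be chained to by a stub.
void FunctionLeave2Impl(FunctionID funcId, UINT_PTR /*clientData*/,
                        COR_PRF_FRAME_INFO /*func*/,
                        COR_PRF_FUNCTION_ARGUMENT_RANGE* retvalRange) {
    ++g_leaveStats.calls;
    g_leaveStats.lastFunction = funcId;
    // Return-value info is only available if COR_PRF_ENABLE_FUNCTION_RETVAL
    // was set; treat a null pointer as "no information".
    g_leaveStats.lastRetvalBytes = retvalRange ? retvalRange->length : 0;
}
```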

If you don’t mind sacrificing some compatibility in exchange for speed, there are some shortcuts you can take based on the current implementation of these hooks. The following table outlines what you need to do beyond what a standard C function would do for each case.

v1.x hooks: FunctionEnter, FunctionTailcall, FunctionLeave
v2.0 hooks: FunctionEnter2*, FunctionTailcall2*, FunctionLeave2*

  • v1.x: FunctionEnter and FunctionTailcall: save caller-saved integer regs. FunctionLeave: save caller-saved integer and FP regs. v2.0 hooks: not applicable (these hooks didn't exist in 1.x).
  • v2.0 x86: v1.x hooks: nothing; standard C function OK**. FunctionEnter2 and FunctionTailcall2: save caller-saved integer regs. FunctionLeave2: save caller-saved integer and FP regs.
  • v2.0 x64: FunctionLeave2: save integer and FP return regs. All other hooks: nothing; standard C function OK.
  • v2.0 IA64: save FP argument regs.

* - If you use COR_PRF_ENABLE_FRAME_INFO, COR_PRF_ENABLE_FUNCTION_ARGS, COR_PRF_ENABLE_FUNCTION_RETVAL, or COR_PRF_ENABLE_STACK_SNAPSHOT, the v2.0 hooks will not be direct calls to the profiler; the CLR will put some code in the middle that gathers the requested data. In that case, you can just use regular functions for the hooks.

** - In 2.0, the old-style enter/leave callbacks are never direct calls to the profiler; there’s some CLR code in there doing a translation. So you can just use regular functions for these hooks, but you’re really paying a perf penalty to do so. If all you need is the FunctionID, best to use the Enter2 family with all the special flags turned off.
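The table and both footnotes can be folded into a single lookup, which may be handy if a profiler chooses its hook implementation at startup. This is a sketch: ExtraWorkFor, IsV2Hook, Runtime, Hook and ExtraWork are hypothetical names; the IA64 row is left out because it does not break its entry down by hook; and it assumes the x64 "nothing" entry covers every hook except FunctionLeave2. The flag values are mirrored from the COR_PRF_ENABLE_* constants in corprof.h so the code builds on its own; check them against your SDK header. Like the table, the result describes the current CLR implementation and may not hold for future versions.

```cpp
#include <cassert>
#include <cstdint>

// Event-mask bits mirrored from corprof.h; verify against your SDK header.
constexpr std::uint32_t kEnableFunctionArgs   = 0x02000000;  // COR_PRF_ENABLE_FUNCTION_ARGS
constexpr std::uint32_t kEnableFunctionRetval = 0x04000000;  // COR_PRF_ENABLE_FUNCTION_RETVAL
constexpr std::uint32_t kEnableFrameInfo      = 0x08000000;  // COR_PRF_ENABLE_FRAME_INFO
constexpr std::uint32_t kEnableStackSnapshot  = 0x10000000;  // COR_PRF_ENABLE_STACK_SNAPSHOT

enum class Runtime { V1x, V2_x86, V2_x64 };  // IA64 row intentionally omitted
enum class Hook { Enter, Tailcall, Leave, Enter2, Tailcall2, Leave2 };

enum class ExtraWork {
    None,                    // a standard C function is OK
    SaveIntRegs,             // save caller-saved integer registers
    SaveIntAndFpRegs,        // save caller-saved integer and FP registers
    SaveIntAndFpReturnRegs,  // save integer and FP return registers
    NotApplicable            // hook does not exist on this runtime
};

bool IsV2Hook(Hook h) {
    return h == Hook::Enter2 || h == Hook::Tailcall2 || h == Hook::Leave2;
}

// Extra work a hook needs beyond what a standard C function does.
ExtraWork ExtraWorkFor(Runtime rt, Hook h, std::uint32_t eventMask) {
    if (rt == Runtime::V1x) {
        if (IsV2Hook(h)) return ExtraWork::NotApplicable;
        return h == Hook::Leave ? ExtraWork::SaveIntAndFpRegs : ExtraWork::SaveIntRegs;
    }
    // Footnote **: on 2.0, old-style hooks always go through CLR translation code.
    if (!IsV2Hook(h)) return ExtraWork::None;
    // Footnote *: any of these flags puts CLR data-gathering code in the middle.
    const std::uint32_t indirect = kEnableFunctionArgs | kEnableFunctionRetval |
                                   kEnableFrameInfo | kEnableStackSnapshot;
    if (eventMask & indirect) return ExtraWork::None;
    if (rt == Runtime::V2_x86) {
        return h == Hook::Leave2 ? ExtraWork::SaveIntAndFpRegs : ExtraWork::SaveIntRegs;
    }
    // V2_x64: per the table, only FunctionLeave2 needs extra work.
    return h == Hook::Leave2 ? ExtraWork::SaveIntAndFpReturnRegs : ExtraWork::None;
}
```

For example, an x86 FunctionLeave2 hook with no special flags needs to save the caller-saved integer and FP registers, but the same hook with COR_PRF_ENABLE_FRAME_INFO set can be a regular function.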

The sample code for x86 and x64 shows in comments where you can remove code in order to make some of these optimizations. Again, these assumptions have a really good chance of being invalid in future versions of the CLR, so you make them at the risk of breaking in the future.

Feel free to take this code and copy it into your own. As usual, it’s provided as-is with no warranties. :-) (Though do let me know if there are any flaws in it.)

Feedback

I'm still trying to decide what kind of blog I want to have, so let me know what you think. Is this the kind of post you want to see coming from a guy who's working on the things I'm working on? Is it helpful to you, at the right level, etc.? What else would you like to see me writing about?