Ever since v1, corprof.idl has contained the following ominous comment above the typedefs for FunctionEnter/Leave/Tailcall.
 * NOTE!!!
 *
 * It is VERY IMPORTANT to note that these function implementations must be
 * __declspec(naked), since the EE is not saving any registers before calling
 * any of them. YOU MUST SAVE ALL REGISTERS YOU USE, INCLUDING FPU REGISTERS
 * IF THE FPU STACK IS NOT EMPTY AND YOU INTEND TO USE IT.
But what does it mean, exactly? And what black magic must one perform to avoid the presumably dire consequences of ignoring it?
The most important part of the comment is that bit in all caps—you must save all registers you use, including floating point registers. The reason for this lies in how functions get hooked, and more specifically in the calling convention used.
Raymond Chen has a good blog post on the x86 calling conventions, and there are some diagrams on MSDN that get into a bit of it.
When you ask the JIT to hook a given function, it strives to do so as inexpensively as it can. In the prologue, it puts the simplest possible call to FunctionEnter—just push the args and call—and does the same in the epilogue with FunctionLeave and in any tailcall location with FunctionTailcall. So on x86 you get code like this (comments are mine):
push 0325BFE8h     // Push the FunctionID
call 797EAC98h     // Call the FunctionEnter hook
...                // JITted code for the managed function
This looks a lot like stdcall—parameters on the stack, pushed from right to left, and removed by the callee—except the JIT isn’t saving any of the caller-saved registers (EAX, ECX, EDX, and floating-point registers).
So if the JIT puts anything interesting in those registers (which it does), and your hook scribbles on them without restoring their original values afterwards (which you practically can’t avoid if your hook is written in C), you’ve just committed the Cardinal Sin of Diagnostic Tools and seriously horked a process you’re supposed to just be monitoring.
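To make the failure mode concrete, here is a toy model in portable C++. It is only a sketch: the Registers struct stands in for the real EAX/ECX/EDX, and the names CarelessHook, PreservingHook, and RunManagedPrologue are made up for illustration, not part of the profiling API. It assumes the JITted code has a live value sitting in ECX when the hook is called.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the x86 caller-saved registers. This is NOT real machine
// state; it only illustrates what "scribbling" on a register means.
struct Registers {
    std::uint32_t eax = 0, ecx = 0, edx = 0;
};

// A hook compiled as ordinary C is free to use the scratch registers.
// The model shows that by overwriting all three.
void CarelessHook(Registers& r) {
    r.eax = 0xDEADBEEF;
    r.ecx = 0xDEADBEEF;
    r.edx = 0xDEADBEEF;
}

// A well-behaved hook: snapshot the registers, do the work, put them back.
void PreservingHook(Registers& r) {
    const Registers saved = r;  // save
    CarelessHook(r);            // work (still clobbers)
    r = saved;                  // restore
}

// Models JITted code that keeps a live value in ECX across the hook call
// (assumption for this sketch) and returns what the managed function sees.
std::uint32_t RunManagedPrologue(void (*hook)(Registers&)) {
    Registers r;
    r.ecx = 42;   // something the managed function still needs
    hook(r);      // stands in for "push FunctionID / call hook"
    return r.ecx;
}
```

Calling RunManagedPrologue(PreservingHook) hands back the 42 the managed code expected, while RunManagedPrologue(CarelessHook) hands back garbage, which in a real process means a corrupted argument rather than a failed assert.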
CLR 2.0 added support for x64 and IA64. On these platforms the Microsoft compilers only have a single calling convention, which is effectively fastcall (lots of arguments passed in registers; once again Raymond Chen comes to the rescue with a post on the x64 calling convention). The 64-bit JITs take a similar approach to the x86 one, namely using the standard calling convention except without saving some of the normally caller-saved registers.
Given the above, all you have to do to have a proper function hook is:
There are many ways to skin this particular cat. You can use just about any combination of the following:
Just a few of the ways you can put these things together are:
So the whole thing about “these functions MUST be declspec(naked)” isn’t quite true; it’s just arguably the easiest way of doing it on x86.
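For the declspec(naked) flavor on x86, a minimal sketch might look like the following. Treat it as an illustration under assumptions rather than the code from the zip file: FunctionID is replaced by a pointer-sized stand-in, EnterStub and EnterWorker are hypothetical names, and the floating-point state is deliberately not saved (which is only acceptable if the worker never touches the FPU). The naked stub is compiled only by the Microsoft compiler targeting x86; elsewhere a plain fallback is used so the sketch still builds, and that fallback is not a safe hook.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the profiling API's FunctionID (assumption for this sketch).
using FunctionID = std::uintptr_t;

#if defined(_MSC_VER) && defined(_M_IX86)
#define HOOK_STDCALL __stdcall
#else
#define HOOK_STDCALL
#endif

static unsigned long g_enterCount = 0;
static FunctionID g_lastFunction = 0;

// The real work lives in an ordinary function. The compiler may use
// EAX/ECX/EDX freely here because the stub preserves them around the call.
extern "C" void HOOK_STDCALL EnterWorker(FunctionID id) {
    ++g_enterCount;
    g_lastFunction = id;
}

#if defined(_MSC_VER) && defined(_M_IX86)
// Naked stub: no compiler-generated prologue or epilogue, so every
// instruction that runs is one written here.
extern "C" __declspec(naked) void __stdcall EnterStub(FunctionID /*id*/) {
    __asm {
        push ebp
        mov  ebp, esp
        pushad                  ; save all general-purpose registers
        mov  eax, [ebp + 8]     ; the FunctionID the JIT pushed
        push eax
        call EnterWorker        ; stdcall: the worker pops its own argument
        popad                   ; restore all general-purpose registers
        pop  ebp
        ret  4                  ; callee removes the FunctionID from the stack
    }
}
#else
// Fallback so the sketch compiles on other toolchains. NOT a safe hook.
extern "C" void EnterStub(FunctionID id) { EnterWorker(id); }
#endif

// Usage check: one call through the stub records exactly one enter event.
inline void DemoEnter() {
    EnterStub(0x1234);
    assert(g_enterCount == 1 && g_lastFunction == 0x1234);
}
```

Keeping the stub tiny and pushing all the real logic into EnterWorker means the only hand-written assembly is the save, call, restore, and return sequence.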
By popular demand, I’ve cooked up some sample code for how to do this on the three platforms (thanks to Josh Williams for the 64-bit code and to Sean Selitrennikoff for the x86 FP-saving code).
Download: FunctionHooks.zip (Updated 9/11/2005)
In this zip file you’ll find three code files: leave_x86.cpp, leave_x64.asm, and leave_ia64.s. Each file shows the conservative way of implementing a FunctionLeave2 hook (saving all registers), with instructions for how to transform that code to be used for the other hooks. These hooks have a really good chance of working on future versions of the CLR.
If you don’t mind sacrificing some compatibility in exchange for speed, there are some shortcuts you can take based on the current implementation of these hooks. The following table outlines what you need to do beyond what a standard C function would do for each case.
v1.x Hooks
v2.0 Hooks
* - If you use COR_PRF_ENABLE_FRAME_INFO, COR_PRF_ENABLE_FUNCTION_ARGS, COR_PRF_ENABLE_FUNCTION_RETVAL, or COR_PRF_ENABLE_STACK_SNAPSHOT, the v2.0 hooks will not be direct calls to the profiler; the CLR will put some code in the middle that gathers the requested data. If that is the case, you can just use regular functions for the hooks.
** - In 2.0, the old-style enter/leave callbacks are never direct calls to the profiler; there’s some CLR code in there doing a translation. So you can just use regular functions for these hooks, but you’re really paying a perf penalty to do so. If all you need is the FunctionID, best to use the Enter2 family with all the special flags turned off.
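The two footnotes boil down to a small decision rule for CLR 2.0, sketched below. The DEMO_* bit values are placeholders I picked for the example, not the real numbers; a real profiler should use the COR_PRF_* constants from corprof.h. NeedsRegisterSavingStub is likewise a made-up helper name.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder bits for illustration only. They mirror the four
// COR_PRF_ENABLE_* flags named above but do NOT carry the real values.
enum DemoMask : std::uint32_t {
    DEMO_ENABLE_FUNCTION_ARGS   = 1u << 0,
    DEMO_ENABLE_FUNCTION_RETVAL = 1u << 1,
    DEMO_ENABLE_FRAME_INFO      = 1u << 2,
    DEMO_ENABLE_STACK_SNAPSHOT  = 1u << 3,
};

// Any of these makes the CLR insert its own data-gathering code
// between the JITted function and the profiler's hook.
constexpr std::uint32_t kIndirectFlags =
    DEMO_ENABLE_FUNCTION_ARGS | DEMO_ENABLE_FUNCTION_RETVAL |
    DEMO_ENABLE_FRAME_INFO | DEMO_ENABLE_STACK_SNAPSHOT;

// Returns true when, per the footnotes for CLR 2.0, the hook is called
// directly from JITted code and so must save the registers itself.
//   usesV2Hooks == false : old-style hooks, always reached via CLR code.
//   usesV2Hooks == true  : direct only if none of the four flags is set.
constexpr bool NeedsRegisterSavingStub(bool usesV2Hooks, std::uint32_t mask) {
    return usesV2Hooks && (mask & kIndirectFlags) == 0;
}
```

So the fast path (Enter2 family, no special flags) is exactly the case where the hand-written stub is required, and every other combination trades speed for being able to use a regular function. This only describes the 2.0 behavior the footnotes talk about.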
The sample code for x86 and x64 shows in comments where you can remove code in order to make some of these optimizations. Again, these assumptions have a really good chance of being invalid in future versions of the CLR, so you make them at the risk of breaking in the future.
Feel free to take this code and copy it into your own. As usual, it’s provided as-is with no warranties. :-) (Though do let me know if there are any flaws in it.)
I'm still trying to decide what kind of blog I want to have, so let me know what you think. Is this the kind of post you want to see coming from a guy who's working on the things I'm working on? Is it helpful to you, at the right level, etc.? What else would you like to see me writing about?