12/6/2007 Update: The bug described below has been fixed in CLR 2.0 SP1. (See this post for more info.) However, you may still find this info useful as a tutorial on performing IL rewriting.
Holger, a helpful reader of this blog, has pointed out a bug in the CLR Profiling API, in how it enables profilers to inspect certain kinds of return values. I'm going to explain the bug and offer a way for you to work around it. Although we do plan to release a fix for this bug, it's still unclear when that fix will be available, and it could be a long, long time away. (Don't delay shipping your profiler until this gets fixed!)
You can use SetEnterLeaveFunctionHooks2 to tell the CLR to call special hooks in your profiler whenever a managed function is entered and left. In these hooks you may opt in to receiving information about the parameters (on Enter) and return value (on Leave). Jonathan Keljo has the definitive blog entry on how you set this up.
In particular, you may opt in to receiving return values by using the profiler flags COR_PRF_MONITOR_ENTERLEAVE and COR_PRF_ENABLE_FUNCTION_RETVAL, and calling SetEnterLeaveFunctionHooks2(). In your FunctionLeave2 hook:
void FunctionLeave2(FunctionID funcId, UINT_PTR clientData, COR_PRF_FRAME_INFO func, COR_PRF_FUNCTION_ARGUMENT_RANGE *retvalRange);
you may use retvalRange to explore the return value. retvalRange is a struct with two fields: startAddress and length.
If all of the following are true, startAddress may be wrong:
And even then, you may not hit the bug, as that brings us to #3. #3 talks about an invariant in our code base that was unfortunately broken... some of the time. The profiling API made an assumption that the return value will always be copied to the leaf-most end of the native stack by the time a managed function returns. Unfortunately, the JIT/NGEN compiler doesn't always generate code that does this. Here are some examples of when this assumption is invalid:
I'm trying to give you as much information as I can about this, but a key point here is you will never really know for sure when the JIT/NGEN compiler set up the stack frame in an unexpected way (i.e., when #3 of the 3 numbered items above is true). The JIT is a complicated beast, and it would be foolish of me to try to list out every possible way the stack layout will cause this bug to occur. Although these examples may give you an idea of a minimum bound on how often #3 is true, you should always assume that #3 will be true. Your profiler will know at run-time when #1 and #2 are true. So if either of them is false, it's safe for you to inspect return values. But your profiler cannot know with 100% certainty when #3 is true. So if #1 & #2 are true, you must assume #3 is true as well, or else your profiler is doomed to report bogus data at seemingly random times.
We tossed around some ideas on how your profilers can work around this until we can officially release a fix. Indeed, I could probably write an interesting blog entry just on one of the ideas we dismissed. Eventually we decided that our best idea was for the profiler to use IL rewriting to get at the return values in a safe way.
Please don't cry. IL rewriting isn't that bad. :-) In the future, I plan to write some entries with tips on doing IL rewriting safely. For now, here is a general idea for how a profiler might use IL rewriting to inspect return values.
Intercept JITCompilationStarted callbacks, waiting for functions whose return values you're interested in. One gotcha to look out for here would be if the app you're profiling is loading NGENd images. If so, you won't be getting JITCompilationStarted messages for those functions that have already been compiled to native code. At least, not without a little help from you. You'll also want to intercept JITCachedFunctionSearchStarted(), which gets called for each function before the CLR chooses to use the NGENd version of it. If you set *pbUseCachedFunction to FALSE inside this callback, the CLR will ignore the NGENd version of the function, and will JIT the function instead. Then, you'll get your JITCompilationStarted callback.
But wait! You might not even get the JITCachedFunctionSearchStarted callback! The comment in corprof.idl warns us that the only reliable way to get JITCachedFunctionSearchStarted is if the profiled app is using the ngen /Profile flavor of the NGENd modules. You can ensure this happens if you set any of the special profiler event flags that force the CLR to use the /Profile flavor of all NGENd modules. COR_PRF_USE_PROFILE_IMAGES is the most obvious of these flags. But COR_PRF_MONITOR_ENTERLEAVE will do the trick just as well, and you were probably already setting that one if you were trying to inspect return values in your Leave2 hook. If your profiler sets any of the flags that force the CLR to load the /Profile-flavored NGEN modules, but some of the /Profile-flavored NGEN modules aren't available on the box running the profiled app, the CLR will just default back to JITting the functions in those modules (in which case you'll get the JITCompilationStarted callbacks anyway).
OK, so finally we're reliably getting JITCompilationStarted callbacks. When you receive JITCompilationStarted for a function whose return value interests you, grab its IL: GetFunctionInfo lets you go from FunctionID to ModuleID and the function's mdToken; GetILFunctionBody() takes the ModuleID and mdToken and gives you the IL.
The trick we'll use to rewrite the IL is based on the fact that the return value will be loaded onto the IL stack just before returning. As a brief aside, note that the IL "stack" is really a logical stack. When the JIT generates native code, pushes and pops against the IL stack don't necessarily translate to pushes and pops on the native stack. Before you go too deep into IL rewriting you will definitely want at least a basic understanding of how IL works. Free information can be found by running ildasm on any assembly (to view the IL that results from compiling C#, VB.NET, etc.), and by reading through the ECMA CLI Specification (currently on msdn at http://msdn.microsoft.com/netframework/ecma/). Partition III goes into the details of the IL instructions, and Partition II talks about metadata.
OK, back to the trick. By the time you come across a ret instruction, you know the return value should be sitting at the leaf of the IL stack. What we'd like to do is peek at this value somehow. Perhaps the easiest way to do this is to just write a managed assembly that contains a function to receive the value as a parameter, and then perform operations on it (e.g., calling its ToString() implementation). Let's call this function "MyInspector()". You can write MyInspector using your favorite managed language, editor, and compiler. You can write MyInspector at design time, and you don't need to use IL directly (that's what the compiler is for :-)). The dynamic IL you'll rewrite at run-time will just call MyInspector with the return value.
Let's start with the dynamic IL. You've found a ret instruction. Now what? You know you want to take the return value that was pushed onto the IL stack, and pass it to your MyInspector() function. Fortunately IL passes parameters by loading them onto the stack, so it would seem the return value is right where we want it! The catch is that your MyInspector() function will automatically pop the parameter off of the stack, leaving an empty stack by the time your ret instruction gets called. We need to push that return value again somehow. Luckily, "dup" comes to the rescue. It just takes the leaf-most item on the stack and pushes it again--a handy shortcut for doing this manually.
Now you have a choice. Do you pass the value directly to MyInspector or do you box it first? That depends on the type of parameter you design MyInspector to accept. If you pass the value directly, then you'll need many different inspector functions, each taking a parameter with a specific value type. You could write many such MyInspector functions for well-known non-primitive value types, like System.Decimal. Alternatively, you could have a single MyInspector that just takes an Object. To call this function, your dynamic IL would need to box the return value before calling MyInspector. Probably you'll want to do a combination of both approaches: have specific MyInspectors for known types, and default to the Object version for everything else. In this example, I'll just show a single Object-typed MyInspector():
namespace MyProfiler { public class Helper { public static void MyInspector(Object o) { Console.WriteLine("MyProfiler.Helper.MyInspector: {0}", o.ToString()); } } }
Original:
ret
dup box (TypeRef / TypeDef token) call MyProfiler.Helper.MyInspector ret
Be careful which functions you perform IL rewriting on! Here are some things to keep in mind:
For those of you who don't want to write any of your profiler's inspection logic using managed code, I'll blog soon about how you can rewrite IL to call into unmanaged code. That way, instead of authoring your own managed assembly to do inspection, you can have the rewritten IL call directly into your profiler's unmanaged code to do this inspection.
To wrap up: we have a bug in the CLR, and the fix may be a long time coming. [12/6/2007 Update: This bug has been fixed in CLR 2.0 SP1. (See this post for more info.)] So in the meantime you'll want to get familiar with IL rewriting if you need to inspect non-primitive, value-type return values on x86. Also, I want to give a quick shout out to a couple members of the CLR team who've helped considerably with this content: thanks, Jan Kotas and Rick Winter! And of course, thanks again to Holger for bringing this to our attention.