
CLR V4: Load your profiler without using the registry

One of the new features in CLR V4 is the ability to load your profiler without needing to register it first.  In V2, we would look at the following environment variables:

COR_ENABLE_PROFILING=1

COR_PROFILER={CLSID of profiler}

and look up the CLSID from COR_PROFILER in the registry to find the full path to your profiler's DLL.  Just like with any COM server DLL, we look for your profiler's CLSID under HKEY_CLASSES_ROOT, which merges the classes from HKLM and HKCU.

We mostly follow the same algorithm in V4, so you can continue registering your profiler if you wish.  However, in V4 we look for one more environment variable first:

COR_PROFILER_PATH=full path to your profiler's DLL

If that environment variable is present, we skip the registry lookup altogether, and just use the path from COR_PROFILER_PATH to load your DLL.  A couple of things to note about this:

  • COR_PROFILER_PATH is purely optional.  If you don't specify COR_PROFILER_PATH, we use the old procedure of looking up your profiler's CLSID in the registry to find its path.
  • If you specify COR_PROFILER_PATH and register your profiler, then COR_PROFILER_PATH always wins.  Even if COR_PROFILER_PATH points to an invalid path, we will still use COR_PROFILER_PATH, and just fail to load your profiler.
  • COR_PROFILER is always required.  If you specify COR_PROFILER_PATH, we skip the registry lookup; however, we still need to know your profiler's CLSID, so we can pass it to your class factory's CreateInstance call.
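
The lookup order described above can be modeled in a few lines.  This is only a sketch of the decision, not the CLR's actual code: the ResolveProfiler function, the PathSource enum, and the Environment map are hypothetical names invented for illustration.  It also assumes that only the literal value 1 enables profiling and that variable names are matched case-sensitively, both of which are simplifications.

```cpp
#include <cassert>
#include <map>
#include <string>

// Where the runtime would take the profiler DLL path from (illustrative model only).
enum class PathSource { ProfilingDisabled, EnvironmentPath, RegistryLookup };

struct ProfilerRequest {
    PathSource source;
    std::string clsid;  // always comes from COR_PROFILER
    std::string path;   // only filled in when source == EnvironmentPath
};

using Environment = std::map<std::string, std::string>;

// Hypothetical helper that mirrors the V4 rules from the list above.
ProfilerRequest ResolveProfiler(const Environment& env) {
    const auto enabled = env.find("COR_ENABLE_PROFILING");
    const auto clsid = env.find("COR_PROFILER");
    // COR_PROFILER is always required, with or without COR_PROFILER_PATH.
    if (enabled == env.end() || enabled->second != "1" || clsid == env.end()) {
        return ProfilerRequest{PathSource::ProfilingDisabled, "", ""};
    }
    const auto path = env.find("COR_PROFILER_PATH");
    if (path != env.end()) {
        // COR_PROFILER_PATH always wins.  There is no fallback to the registry,
        // even if the path later turns out to be invalid.
        return ProfilerRequest{PathSource::EnvironmentPath, clsid->second, path->second};
    }
    // No COR_PROFILER_PATH: use the old procedure and look the CLSID up in the registry.
    return ProfilerRequest{PathSource::RegistryLookup, clsid->second, ""};
}
```

Note that the model never validates the path.  That matches the second bullet: a bad COR_PROFILER_PATH results in a failed load, not a registry lookup.
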
Posted by davbr | 0 Comments

What does Dave look like?

Find out on channel 9 as Jon Langdon, Thomas Lai, and I discuss some of the new diagnostics features in CLR V4.

Posted by davbr | 1 Comments

Run your V2 profiler binary on CLR V4

Ok, you've installed VS 2010 beta 1, along with .NET FX 4.0 beta 1, and you're wondering--can you run your profiler against this new .NET framework without recompiling the profiler?

Yes, you can!  Though not by default.  Although CLR V4 is much more compatible with CLR V2 than CLR V2 was with CLR V1.1, there are still some differences that can mess with your profiler's mind.  So by default, the CLR V4 runtime refuses to activate V2 profilers.  If you try, you'll see that the CLR will LoadLibrary the profiler, use the class factory to generate an instance of the callback object, try to QI for the new ICorProfilerCallback3, and then when that fails, the CLR will release the callback object and log an event to the event log explaining that you need to set COMPLUS_ProfAPI_ProfilerCompatibilitySetting appropriately as your way of "opting in" to running the older profiler binary against the newer CLR.  This message is meant for your users, as a way of telling them that the profiler vendor (you) may not yet have tested the profiler against CLR V4.

So, how to use COMPLUS_ProfAPI_ProfilerCompatibilitySetting?  Set it to one of the following 3 values:

  • EnableV2Profiler
    • It enables the V2 profiler to be activated by V4 CLR.
  • DisableV2Profiler (default)
    • V4 CLR refuses to activate the V2 profiler, and logs an event to the event log.
  • PreventLoad
    • V4 CLR does not load the profiler, regardless of the profiler’s version.  This is useful for preventing problems in certain in-process side-by-side CLR scenarios.  More on that in an upcoming post.
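
As a rough model of how the three values interact with the profiler's version, consider the sketch below.  DecideActivation and the Activation enum are hypothetical names, and the supportsCallback3 flag stands in for "the QI for ICorProfilerCallback3 succeeded".  Treating an unset or unrecognized value as the default, and comparing the value case-sensitively, are assumptions of this sketch.

```cpp
#include <cassert>
#include <string>

enum class Activation { Activated, RefusedAndLogged, NotLoaded };

// Hypothetical model of COMPLUS_ProfAPI_ProfilerCompatibilitySetting.
Activation DecideActivation(const std::string& setting, bool supportsCallback3) {
    if (setting == "PreventLoad") {
        return Activation::NotLoaded;  // applies regardless of the profiler's version
    }
    if (supportsCallback3) {
        return Activation::Activated;  // a V4 profiler needs no opt-in
    }
    if (setting == "EnableV2Profiler") {
        return Activation::Activated;  // the user opted in to running a V2 profiler
    }
    // DisableV2Profiler, which this sketch also assumes for an unset value.
    return Activation::RefusedAndLogged;
}
```
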
Posted by davbr | 1 Comments

CLR V4 Beta 1 Released!

Now is the time to try out your profiler against the new .NET FX 4.0 Beta 1 bits.  I'll be writing about some gotchas, and how to take advantage of the new features.  But first, get started downloading:

Visual Studio 2010 Product Page

You can find some reference documentation on the new profiling interfaces here:

http://msdn.microsoft.com/en-us/library/ms404386(VS.100).aspx

Posted by davbr | 1 Comments

FunctionHooks.zip re-uploaded

Jonathan Keljo's blog entry on the enter/leave/tailcall function hooks had a link to sample code that's been broken for a while.  You can now find the sample code here: FunctionHooks.zip

Posted by davbr | 1 Comments

Why we have CORPROF_E_UNSUPPORTED_CALL_SEQUENCE

What follows is a long-lost blog entry that Jonathan Keljo had been working on.  I brushed off some of the dust and am posting it here for your enjoyment.  Thank you, Jonathan!

 

In CLR 2.0 we added a new HRESULT, CORPROF_E_UNSUPPORTED_CALL_SEQUENCE.  This HRESULT is returned from ICorProfilerInfo methods when called in an "unsupported way".  This "unsupported way" is primarily an issue with those nasty beasts, hijacking profilers (though read on for cases where non-hijacking profilers can see this HRESULT, too).  Hijacking profilers are those profilers that forcibly reset a thread's register context at completely arbitrary times to enter profiler code, and then usually to re-enter the CLR via ICorProfilerInfo.  Why is that so bad?  Well, for the sake of performance, lots of the IDs the profiling API gives out are just pointers to relevant data structures within the CLR. So lots of ICorProfilerInfo calls just rip information out of those data structures and pass them back. Of course, the CLR might be changing things in those structures as it runs, maybe (or maybe not) taking locks to do so.  Imagine the CLR was already holding (or attempting to acquire) such locks at the time the profiler hijacked the thread.  Now, the thread re-enters the CLR, trying to take more locks or inspect structures that were in the process of being modified, and are thus in an inconsistent state.  Deadlocks and AVs are easy to come by in such situations.

In general, if you're a non-hijacking profiler sitting inside an ICorProfilerCallback method and you're calling into ICorProfilerInfo, you're fine. For example, you get a ClassLoadFinished and you start asking for information about the class. You might be told that information isn't available yet (CORPROF_E_DATAINCOMPLETE) but the program won't deadlock or AV.  This class of calls into ICorProfilerInfo is called "synchronous", because they are made from within an ICorProfilerCallback method.

On the other hand, if you're hijacking or otherwise calling ICorProfilerInfo functions on a managed thread but not from within an ICorProfilerCallback method, that is considered an "asynchronous" call.  In v1.x you never knew what would happen in an asynchronous call. It might deadlock, it might crash, it might give a bogus answer, or it might give the right answer.

In 2.0 we've added some simple checks to help you avoid this problem. If you call an unsafe ICorProfilerInfo function asynchronously, instead of crossing its fingers and trying, it will fail with CORPROF_E_UNSUPPORTED_CALL_SEQUENCE.  The general rule of thumb is, nothing is safe to call asynchronously.  But here are the exceptions that are safe, and that we specifically allow to be called asynchronously:

  • GetEventMask/SetEventMask
  • GetCurrentThreadID
  • GetThreadContext
  • GetThreadAppDomain
  • GetFunctionFromIP
  • GetFunctionInfo/GetFunctionInfo2
  • GetCodeInfo/GetCodeInfo2
  • GetModuleInfo
  • GetClassIDInfo/GetClassIDInfo2
  • IsArrayClass
  • ForceGC
  • SetFunctionIDMapper
  • DoStackSnapshot

There are also a few things to keep in mind:

  1. ICorProfilerInfo calls made from within the fast-path Enter/Leave callbacks are considered asynchronous.  (Though ICorProfilerInfo calls made from within the slow-path Enter/Leave callbacks are considered synchronous.)  See the blog entries here and here for more info on fast / slow path.
  2. ICorProfilerInfo calls made from within instrumented code (i.e., IL you've rewritten to call into your profiler and then into ICorProfilerInfo) are considered asynchronous.
  3. Calls made inside your FunctionIDMapper hook are considered to be synchronous.
  4. Calls made on threads created by your profiler are always considered to be synchronous.  (This is because there's no danger of conflicts resulting from interrupting and then re-entering the CLR on that thread, since a profiler-created thread was not in the CLR to begin with.)
  5. Calls made inside a StackSnapshotCallback are considered to be synchronous iff the call to DoStackSnapshot was synchronous.
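
The rules above boil down to two questions: was the call synchronous, and if not, is the method on the allowed list?  The sketch below encodes just that.  The CallSite enum and all function names are made up for illustration.  The CLR performs its own checks internally, and nothing here reflects how they are implemented.

```cpp
#include <cassert>
#include <set>
#include <string>

// Where an ICorProfilerInfo call is made from (names are illustrative).
enum class CallSite {
    CallbackMethod,         // inside an ICorProfilerCallback method
    SlowPathEnterLeave,
    FunctionIDMapperHook,
    ProfilerCreatedThread,
    FastPathEnterLeave,
    InstrumentedCode,       // rewritten IL that calls into the profiler
    HijackedThread,
};

bool IsSynchronous(CallSite site) {
    switch (site) {
        case CallSite::CallbackMethod:
        case CallSite::SlowPathEnterLeave:
        case CallSite::FunctionIDMapperHook:
        case CallSite::ProfilerCreatedThread:
            return true;
        case CallSite::FastPathEnterLeave:
        case CallSite::InstrumentedCode:
        case CallSite::HijackedThread:
            return false;
    }
    return false;  // anything unrecognized is treated as asynchronous
}

// A StackSnapshotCallback is synchronous iff the DoStackSnapshot call that triggered it was.
bool IsSnapshotCallbackSynchronous(CallSite doStackSnapshotSite) {
    return IsSynchronous(doStackSnapshotSite);
}

// The exceptions listed above that may be called asynchronously.
bool IsAsyncSafe(const std::string& method) {
    static const std::set<std::string> kAllowed = {
        "GetEventMask", "SetEventMask", "GetCurrentThreadID", "GetThreadContext",
        "GetThreadAppDomain", "GetFunctionFromIP", "GetFunctionInfo", "GetFunctionInfo2",
        "GetCodeInfo", "GetCodeInfo2", "GetModuleInfo", "GetClassIDInfo", "GetClassIDInfo2",
        "IsArrayClass", "ForceGC", "SetFunctionIDMapper", "DoStackSnapshot"};
    return kAllowed.count(method) != 0;
}

// True when the call would be expected to fail with CORPROF_E_UNSUPPORTED_CALL_SEQUENCE.
bool ExpectUnsupportedCallSequence(CallSite site, const std::string& method) {
    return !IsSynchronous(site) && !IsAsyncSafe(method);
}
```

A table like this can be handy in a debug build of your profiler, where you assert before each ICorProfilerInfo call instead of discovering the failing HRESULT later.
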
Posted by davbr | 0 Comments

New stuff in Profiling API for upcoming CLR 4.0

Now that we've finally announced at PDC many of the new features coming up in the next major release of Visual Studio and CLR, I can elaborate some on what's coming up for the profiling API.  Also, see Rick Byers's blog entry which also talks about debugging improvements.

What the CLR will do for you

Our upcoming profiling API-specific features are inspired by a vision to improve production troubleshooting, and happily such features will improve the developer desktop experience as well.

Attach / detach

We will now allow profilers to attach to and detach from managed processes that are already running.  You no longer need to set environment variables and load the profiler when the managed app starts up.  (However, if you would like your profiler to load when the process starts up, you would still use the existing activation mechanism with environment variables.)  Attach / detach works with only a limited set of scenarios--see below.

You initiate the attach from a separate executable that we call the "trigger process".  If you already have a shell for your profiler, then your shell will typically serve as the trigger process.  We will provide you an API that your trigger process will call, which takes parameters that describe the target app to profile and details about your profiler.  That API will cause the target app to load your profiler into the target app's process space using the same code that we currently use to load your profiler from startup.

When your profiler is ready to detach from the process, it calls a method right on ICorProfilerInfo3 (the new Info interface for CLR 4.0).  That will cause the CLR to stop issuing profiling callbacks, and slip a bit until the profiler is provably off of all the threads' call stacks.  The profiler will then be unloaded from the process space, and another profiler may be attached again if the end user wishes.

Due to the nature of the profiling APIs, only a subset of the APIs will be available to profilers that attach to a live process, as opposed to those profilers that load on startup.  Specifically, attaching profilers will be able to use the APIs that enable sampling and memory profiling.  This includes such operations as walking the stack, mapping instruction pointers to managed methods and their metadata, receiving most GC callbacks, and inspecting statics and object instances on the heap, along with their type information.

APIs that will not be supported for attaching profilers include the ObjectAllocated callback and APIs that enable instrumentation (IL rewriting) of methods.  Those scenarios require the ability to "rejit" a method that has already been JITted or loaded from an NGENd module, and unfortunately we are unable to provide that functionality in CLR 4.

"Still no rejit?!  Are you kidding me?"

I wish I were.  Given customer demand, I personally rate rejit as more important than most of the other profiling API features we're doing combined.  However, rejit is very expensive in terms of development and testing.  After doing the math, it became apparent that even if we cut most of the other new profiling API features for CLR 4.0, rejit would still not fit.  But given the high demand, we're certainly still looking to deliver rejit in a future release of the CLR, just not 4.0.

Registry-free activation

An obstacle to getting profilers installed into production data centers is that operations managers distrust "impactful" installations that modify machine state.  So we're providing a way where you no longer need to register your profiler when it's installed or used.  (I'm talking here about the COM registration that uses the Windows registry to map your profiler's CLSID to the full path to the profiler's DLL.)  It also turns out that, even in developer desktop scenarios, relying on the registry can be a common cause of failures.  For example, perhaps your profiler tries to regsvr32 itself under HKLM, but the user does not have administrative privileges.  Or maybe the user does have administrative privileges, but is using Vista in non-elevated mode.  So registry-free activation should help with all of those scenarios.  Note that registry-free activation is optional.  Your profiler may continue to use traditional COM registration if you like.

Profiler Backward Compatibility

CLR 2.0 saw significant enough changes from CLR 1.1 that we refused to load 1.1 profilers into CLR 2.0.  However, CLR 4.0 is compatible enough with CLR 2.x that we will allow 2.x profilers to load into CLR 4.0 applications.  This behavior would not be the default, however.  If end users try to load their 2.x profiler into a CLR 4.0 application, the load will fail, and they will see an event log entry telling them either to upgrade their profiler, or to set a special environment variable to explicitly allow the older profiler to load.  By forcing the end users to opt in to this behavior, we set the expectation that it is not guaranteed or tested that 2.x profilers will still work, though we believe it is likely they will work in many scenarios.

Enter/Leave/Tailcall Enhancements

We have made some enhancements to the Enter/Leave/Tailcall interface to cut down on overhead when your profiler does not care about getting parameter or return value information. 

Other random stuff

You will also find several minor enhancements and bug fixes to the profiling API.  Not worth listing them all out here, but the forthcoming documentation on ICorProfilerCallback3 / ICorProfilerInfo3 will describe them.

What you must do for the CLR

Here are some of your responsibilities for playing nicely with CLR 4.0 applications.

In-process side-by-side CLR instances

Probably the biggest impact to your profiler as you upgrade it to CLR 4.0 will be supporting in-process side-by-side CLR instances.  This is actually a CLR-wide feature for 4.0 (not profiling API), but it has impact on profiling API tools.  Certain scenarios will now result in multiple instances of the CLR loaded into a single process, primarily to support backward-compatibility for managed components that load into a host.  (Imagine one old (2.x) CLR instance alongside a new (4.0) CLR instance in the same process.)  From the profiler’s point of view, it will be loaded multiple times, once per CLR instance.  This means your DLL gets LoadLibrary’d multiple times and you’ll receive multiple “CreateInstance” calls to your class factory object, to generate multiple instances of your ICorProfilerCallback implementation.  You can deal with this by:

  • Returning failure from all but one of your CreateInstance() calls.  This allows you to “pick” which CLR instance you wish to interact with.  OR
  • Succeeding many or all of your CreateInstance() calls.  This allows you to examine multiple CLRs simultaneously.

Pick 1 / Pick first: With this first approach, your profiler will choose to return success from only one CreateInstance() call.  "Pick 1" implies you allow your user to specify "which" CLR to profile, usually specified in terms of the version number of the CLR of interest.  "Pick First" implies you don't even ask your user--you just simplistically succeed the first CreateInstance() call, and fail the rest.  First CLR wins.  The advantages of these approaches are they are fairly easy to implement, and either one qualifies your profiler as being "side-by-side aware".
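
A minimal sketch of the "pick first" idea follows, assuming a process-wide gate that your class factory consults before it creates a callback object.  PickFirstGate and the Result enum are hypothetical stand-ins.  A real class factory would return HRESULTs from its CreateInstance method.

```cpp
#include <atomic>
#include <cassert>

// Stand-in for COM result codes; real code would return S_OK or a failure HRESULT.
enum class Result { Ok, Fail };

class PickFirstGate {
public:
    // Returns Ok for the first caller only.  Every later caller gets Fail,
    // even if several CLR instances call in from different threads at once.
    Result TryClaim() {
        bool expected = false;
        return claimed_.compare_exchange_strong(expected, true) ? Result::Ok : Result::Fail;
    }

private:
    std::atomic<bool> claimed_{false};
};
```

Your CreateInstance would call TryClaim() first and return failure when it gets Fail.  This sketch never releases the claim, so it does not cover the case where the first CLR instance goes away and you would like to profile a later one.
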

Pick many / Pick all: With this approach, your profiler collects data on multiple CLRs, with the intent of presenting that data to the user in some unified way.  You will have to be careful to manage multiple instances of your ICorProfilerCallback implementation and probably eliminate much of your global state.  For example, if you call into an ICorProfilerInfo from one CLR with IDs (e.g., AppDomainID) of the other CLR, that will likely cause an AV.  Many profilers are implemented with a global pointer to the "one and only" instance of their ICorProfilerCallback implementation.  This would no longer work, as you will now have multiple instances of your ICorProfilerCallback implementation, and each one must keep track of the corresponding ICorProfilerInfo interface to call into.  There will be enhancements to the profiling API to make this management easier, most notably improvements to the Enter/Leave/Tailcall and FunctionIDMapper interfaces.
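
One way to avoid mixing IDs between runtimes is to drop the global pointer and have each callback instance tag the IDs it receives with the runtime they came from.  The sketch below shows only that bookkeeping.  Runtime, TaggedId, and CallbackInstance are hypothetical types, with Runtime standing in for the ICorProfilerInfo pointer that each CLR instance hands to your profiler.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in for the per-CLR ICorProfilerInfo pointer (illustrative only).
struct Runtime {
    std::string version;
};

// An ID together with the runtime that issued it.
struct TaggedId {
    const Runtime* owner;
    std::uintptr_t value;
};

class CallbackInstance {
public:
    explicit CallbackInstance(const Runtime* runtime) : runtime_(runtime) {}

    // Wrap a raw ID (e.g., an AppDomainID) as soon as a callback delivers it.
    TaggedId Tag(std::uintptr_t rawId) const { return TaggedId{runtime_, rawId}; }

    // Check this before passing an ID back into this instance's runtime.
    bool Owns(const TaggedId& id) const { return id.owner == runtime_; }

private:
    const Runtime* runtime_;  // per-instance state instead of a global
};
```
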

I cannot stress the following enough, so I will state it twice.  When you update your profiler to work with CLR 4.0, you must update your profiler to become side-by-side aware.  This means you must do some amount of work, even if it is the simple "pick first" approach.  The CLR determines whether your profiler is "updated for CLR 4.0" by QI'ing for the new ICorProfilerCallback3 defined in the CLR 4.0 corprof.idl file.  If your profiler successfully returns a pointer to your ICorProfilerCallback3 implementation, then your profiler is considered a 4.0 profiler.  So to restate: If your profiler provides an ICorProfilerCallback3 implementation, then your profiler must be side-by-side aware.  The reason for this rule is that the CLR puts certain safeguards in place to protect older (2.x) profilers when they might load into scenarios that involve in-process side-by-side CLR instances.  If you claim your profiler is updated for 4.0, those safeguards are lifted, and you really don't want that to happen unless you're side-by-side aware.

If you're curious to learn more about this "in-process side-by-side CLR instances" feature, unfortunately the blogs and documentation are still pretty thin for the moment (though I imagine that will change in the coming months).  You can take a look at the PDC talk on CLR futures, which discussed this feature at a high level.  Go to the PDC 2008 site, and find the session called "PC49 Microsoft .NET Framework: CLR Futures".

Profiler Backward Compatibility

Not much extra to state here, but just to be explicit, you of course have a choice.  If you have a profiler that works just fine against CLR 2.0, you may either update it to work with CLR 4.0, or not update it.  If you choose to update it, that means you must implement ICorProfilerCallback3 and provide that implementation to the CLR when the CLR QI's for ICorProfilerCallback3.  And, due to the contract stated above, you must also ensure your profiler is side-by-side aware (pick 1, pick first, pick many, pick all, it's up to you).  The alternative is, don't update your profiler!  You will miss out on the new profiling API features listed above.  And your profiler may also not work well in scenarios that load in-process side-by-side CLR instances.  But maybe you don't care, or maybe you just want a temporary stopgap for your users until you've had the time to update and test your profiler for CLR 4.0.  Just remember that CLR 4.0 will not activate your 2.x profiler by default.  You will need to tell your users about the special environment variable mentioned above to get your 2.0 profiler to load into CLR 4.0.  Since this is all new stuff, the environment variable has not yet been documented at the time of this blog entry, but you can expect more info on it in MSDN when we release, and possibly info on this blog sooner.

 

I hope you've found this overview useful.  Since this is new and not documented yet, there's not much you can do to start preparing for CLR 4.0 yet.  However, I'd recommend you take a look through your code and see what it would take to become side-by-side aware.

Posted by davbr | 3 Comments

Visual Studio 2008 SP1 and .NET Framework 3.5 SP1 Released

See Soma's blog entry for more information.  Also, I updated the table that maps Visual Studio versions, .NET Framework versions, and CLR versions here.

Posted by davbr | 0 Comments

BUG: GetILFunctionBody returns wrong size

In case you missed it, there was a post on our forum here:

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1366051&SiteID=1

and a comment on my blog here:

http://blogs.msdn.com/davbr/archive/2007/03/06/creating-an-il-rewriting-profiler.aspx#1881536

about an issue with GetILFunctionBody returning the wrong size.  This is indeed a bug in CLR 2.x, and it is recommended you follow the ECMA spec to help you parse the function header to determine the actual size.  More details are in the forum post linked above.  This bug will be fixed in a future release of the CLR.
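
For reference, here is a minimal sketch of reading the method header as laid out in ECMA-335 (Partition II, method header formats).  It is not the workaround from the forum post.  ParseILHeader and ILBodyLayout are hypothetical names, and the sketch only reports the header size, the IL code size, and whether extra data sections follow the code.  It does not walk those extra sections, so it does not give you the full body size when they are present.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct ILBodyLayout {
    std::size_t headerSize;  // bytes occupied by the method header
    std::size_t codeSize;    // bytes of IL following the header
    bool hasMoreSections;    // extra data sections (e.g., EH clauses) follow the code
};

// Returns false if the buffer is too short or the format bits are unrecognized.
bool ParseILHeader(const std::uint8_t* body, std::size_t available, ILBodyLayout* out) {
    if (body == nullptr || out == nullptr || available < 1) {
        return false;
    }
    const std::uint8_t first = body[0];
    if ((first & 0x3) == 0x2) {
        // Tiny format: one byte, with the code size in the upper six bits.
        out->headerSize = 1;
        out->codeSize = static_cast<std::size_t>(first >> 2);
        out->hasMoreSections = false;
        return true;
    }
    if ((first & 0x3) == 0x3) {
        // Fat format: 12 bits of flags, 4 bits of header size (in 4-byte units),
        // 2 bytes of MaxStack, 4 bytes of CodeSize, 4 bytes of LocalVarSigTok.
        if (available < 12) {
            return false;
        }
        const std::uint32_t flagsAndSize =
            static_cast<std::uint32_t>(body[0]) | (static_cast<std::uint32_t>(body[1]) << 8);
        const std::uint32_t codeSize =
            static_cast<std::uint32_t>(body[4]) | (static_cast<std::uint32_t>(body[5]) << 8) |
            (static_cast<std::uint32_t>(body[6]) << 16) | (static_cast<std::uint32_t>(body[7]) << 24);
        out->headerSize = static_cast<std::size_t>(flagsAndSize >> 12) * 4;
        out->codeSize = static_cast<std::size_t>(codeSize);
        out->hasMoreSections = (flagsAndSize & 0x8) != 0;  // the "more sections" flag
        return true;
    }
    return false;
}
```
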

Posted by davbr | 3 Comments

Debugging Your Profiler II: SOS and IDs

In this debugging post, I'll talk about the various IDs the profiling API exposes to your profiler, and how you can use SOS to give you more information about the IDs.  As usual, this post assumes you're using CLR 2.x.

S.O.What Now?

SOS.DLL is a debugger extension DLL that ships with the CLR.  You'll find it sitting alongside mscorwks.dll.  While originally written as an extension to the windbg family of debuggers, Visual Studio can also load and use SOS.  If you search the MSDN blogs for "SOS" you'll find lots of info on it.  I'm not going to repeat all that's out there, but I'll give you a quick primer on getting it loaded.

In windbg, you'll need mscorwks.dll to load first, and then you can load SOS.  Often, I don't need SOS until well into my debugging session, at which point mscorwks.dll has already been loaded anyway.  However, there are some cases where you'd like SOS loaded at the first possible moment, so you can use some of its commands early (like !bpmd to set a breakpoint on a managed method).  So a surefire way to get SOS loaded ASAP is to have the debugger break when mscorwks gets loaded (e.g., "sxe ld mscorwks").  Once mscorwks is loaded, you can load SOS using the .loadby command:

0:000> sxe ld mscorwks
0:000> g
ModLoad: 79e70000 7a3ff000   C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorwks.dll
eax=00000000 ebx=00000000 ecx=00000000 edx=00000000 esi=7efdd000 edi=20000000
eip=77a1a9fa esp=002fea38 ebp=002fea78 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
ntdll!NtMapViewOfSection+0x12:
77a1a9fa c22800          ret     28h
0:000> .loadby sos mscorwks

With SOS loaded, you can now use its commands to inspect the various IDs that the profiling API passes to your profiler.

Note: The following contains implementation details of the runtime.  While these details are useful as a debugging aid, your profiler code cannot make assumptions about them.  These implementation details are subject to change at whim.

FunctionID Walkthrough

For starters, take a look at FunctionIDs.  Your profiler receives a FunctionID anytime you hit a callback that needs to, well, identify a function!  For example, when it's time to JIT, the CLR issues JITCompilationStarted (assuming your profiler subscribed to that callback), and one of the parameters to the callback is a FunctionID.  You can then use that FunctionID in later calls your profiler makes back into the CLR, such as GetFunctionInfo2.

As far as your profiler is concerned, a FunctionID is just an opaque number.  It has no meaning in itself; it's merely a handle you can pass back into the CLR to refer to the function.  Under the covers, however, a FunctionID is actually a pointer to an internal CLR data structure called a MethodDesc.  I must warn you again that you cannot rely on this information when coding your profiler.  The CLR team reserves the right to change the underlying meaning of a FunctionID to be something radically different in later versions.  This info is for entertainment and debugging purposes only!

Ok, so FunctionID = (MethodDesc *).  How does that help you?  SOS just so happens to have a command to inspect MethodDescs: !dumpmd.  So if you're in a debugger looking at your profiler code that's operating on a FunctionID, it can be beneficial to you to find out which function that FunctionID actually refers to.  In the example below, the debugger will break in my profiler's JITCompilationStarted callback and look at the FunctionID.  It's assumed that you've already loaded SOS as per above.

0:000> bu UnitTestSampleProfiler!SampleCallbackImpl::JITCompilationStarted
0:000> g
...
Breakpoint 0 hit
eax=00c133f8 ebx=00000000 ecx=10001218 edx=00000001 esi=002fec74 edi=00000000
eip=10003fc0 esp=002fec64 ebp=002feca4 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
UnitTestSampleProfiler!SampleCallbackImpl::JITCompilationStarted:
10003fc0 55              push    ebp

The debugger is now sitting at the beginning of my profiler's JITCompilationStarted callback.  Let's take a look at the parameters.

0:000> dv
           this = 0x00c133f8
     functionID = 0x1e3170
 fIsSafeToBlock = 1

Aha, that's the FunctionID about to get JITted.  Now use SOS to see what that function really is.

0:000> !dumpmd 0x1e3170
Method Name: test.Class1.Main(System.String[])
Class: 001e1288
MethodTable: 001e3180
mdToken: 06000001
Module: 001e2d8c
IsJitted: no
m_CodeOrIL: ffffffff

Lots of juicy info here, though the Method Name typically is what helps me the most in my debugging sessions.  mdToken tells us the metadata token for this method.  MethodTable tells us where another internal CLR data structure is stored that contains information about the class containing the function.  In fact, the profiling API's ClassID is simply a MethodTable *.  [Note: the "Class: 001e1288" in the output above is very different from the MethodTable, and thus different from the profiling API's ClassID.  Don't let the name fool you!]  So we could go and inspect a bit further by dumping information about the MethodTable:

0:000> !dumpmt 0x001e3180
EEClass: 001e1288
Module: 001e2d8c
Name: test.Class1
mdToken: 02000002  (C:\proj\HelloWorld\Class1.exe)
BaseSize: 0xc
ComponentSize: 0x0
Number of IFaces in IFaceMap: 0
Slots in VTable: 6

And of course, !dumpmt can be used anytime you come across a ClassID and want more info on it.

IDs and their Dumpers

Now that you see how this works, you'll need to know how the profiling IDs relate to the various SOS commands that dump info on them:

ID          | Internal CLR Structure             | SOS command
AssemblyID  | Assembly *                         | !DumpAssembly
AppDomainID | AppDomain *                        | !DumpDomain
ModuleID    | Module *                           | !DumpModule
ClassID     | MethodTable *                      | !DumpMT
ThreadID    | Thread *                           | !Threads (see note)
FunctionID  | MethodDesc *                       | !DumpMD
ObjectID    | Object * (i.e., a managed object)  | !DumpObject

Note:  !Threads takes no arguments, but simply dumps info on all threads that have ever run managed code.  If you use "!Threads -special" you get to see other special threads separated out explicitly, including threads that perform GC in server-mode, the finalizer thread, and the debugger helper thread.

More Useful SOS Commands

It would probably be quicker to list what isn't useful!  I encourage you to do a !help to see what's included. Here's a sampling of what I commonly use:

!u is a nice SOS analog to the windbg command "u". While the latter gives you a no-frills disassembly, !u works nicely for managed code, including spanning the disassembly from start to finish, and converting metadata tokens to names.

!bpmd lets you place a breakpoint on a managed method. Just specify the module name and the fully-qualified method name. For example:

!bpmd MyModule.exe MyNamespace.MyClass.Foo 

If the method hasn't jitted yet, no worries. A "pending" breakpoint is placed.  If your profiler performs IL rewriting, then using !bpmd on startup to set a managed breakpoint can be a handy way to break into the debugger just before your instrumented code will run (which, in turn, is typically just after your instrumented code has been jitted). This can help you in reproducing and diagnosing issues your profiler may run into when instrumenting particular functions (due to something interesting about the signature, generics, etc.).

!PrintException: If you use this without arguments you get to see a pretty-printing of the last outstanding managed exception on the thread; or specify a particular Exception object's address.

 

Ok, that about does it for SOS. Hopefully this info can help you track down problems a little faster, or better yet, perhaps this can help you step through and verify your code before problems arise.

Posted by davbr | 2 Comments

Debugging Your Profiler I: Activation

This is the first of some tips to help you debug your profiler.  Note that these tips assume you're using CLR 2.x (see this entry for info on how CLR version numbers map to .NET Framework version numbers).  In today's post, I address a frequent question from profiler developers and users: "Why didn't my profiler load?".

Event log

In the Application event log, you'll see entries if the CLR attempts, but fails, to load and initialize your profiler.  So this is a nice and easy place to look first, as the message may well make it obvious what went wrong.

Weak link in the chain?

The next step is to carefully retrace this chain to make sure everything is registered properly:

Environment variables --> Registry --> Profiler DLL on File system.

The first link in this chain is to check the environment variables inside the process that should be profiled.  If you're running the process from a command prompt, you can just try a "set co" from that prompt:

C:\>set co
(blah blah, other vars beginning with "co")
Cor_Enable_Profiling=0x1
COR_PROFILER={C5F90153-B93E-4138-9DB7-EB7156B07C4C}

If your scenario doesn't allow you to just run the process from a command prompt, like say an ASP.NET scenario, you may want to attach a debugger to the process that's supposed to be profiled, or use IFEO (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options) to force a debugger to start when the worker process starts.  In the debugger, you can then use "!peb" to view the environment block, which will include the environment variables.

Once you verify Cor_Enable_Profiling and COR_PROFILER are ok, it's time to search the registry for the very same GUID set in your COR_PROFILER environment variable.  You should find it at a path like this:

HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{C5F90153-B93E-4138-9DB7-EB7156B07C4C}

Note!  If you're on a 64-bit box, be aware of the WOW64 redirectors, and ensure you're looking at the proper view of the environment and registry!

If the registry has the GUID value, it's finally time to check out your file system.  Go under the InprocServer32 subkey under the GUID:

HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{C5F90153-B93E-4138-9DB7-EB7156B07C4C}\InprocServer32

and look at the default value data.  It should be a full path to your profiler's DLL.  Verify it's accurate.  If not, perhaps you didn't properly run regsvr32 against your profiler, or maybe your profiler's DllRegisterServer had problems.
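
If you want to script the first two links of this chain, the string handling is simple.  The sketch below checks that a COR_PROFILER value has the usual braced GUID shape and builds the key path shown above.  LooksLikeClsid and InprocServerKey are hypothetical helpers.  Actually opening the key (under HKEY_LOCAL_MACHINE, in the right WOW64 view) is left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True if text has the shape {8-4-4-4-12 hex digits}, braces included.
bool LooksLikeClsid(const std::string& text) {
    static const char kPattern[] = "{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}";
    if (text.size() != sizeof(kPattern) - 1) {
        return false;
    }
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (kPattern[i] == 'x') {
            if (!std::isxdigit(static_cast<unsigned char>(text[i]))) {
                return false;
            }
        } else if (text[i] != kPattern[i]) {
            return false;  // brace or dash in the wrong place
        }
    }
    return true;
}

// Builds the subkey whose default value should hold the full path to the profiler DLL.
std::string InprocServerKey(const std::string& clsid) {
    return "SOFTWARE\\Classes\\CLSID\\" + clsid + "\\InprocServer32";
}
```
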

Time for a debugger

If the above investigation indicates everything's ok, then your profiler is properly registered and your environment is properly set up, but something bad must be happening at run time.  You'll want symbols for the CLR, which are freely available via Microsoft's symbol server.  If you set this environment variable, you can ensure windbg will always use the symbol server:

set _NT_SYMBOL_PATH=srv*C:\MySymbolCache*http://msdl.microsoft.com/download/symbols

Feel free to add more paths (separate them via ";") so you can include your profiler's symbols as well.  Now, from a command-prompt that has your Cor_Enable_Profiling and COR_PROFILER variables set, run windbg against the executable you want profiled.  The debuggee will inherit the environment, so the profiling environment variables will be propagated to the debuggee.

Note: The following contains implementation details of the runtime.  While these details are useful as a debugging aid, your profiler code cannot make assumptions about them.  These implementation details are subject to change at whim.

Once windbg is running, try setting this breakpoint:

bu mscordbc!EEToProfInterfaceImpl::CreateProfiler

Now go!  If you hit that breakpoint, that verifies the CLR has determined that a profiler has been requested to load from the environment variables, but the CLR has yet to read the registry.  Let's see if your DLL actually gets loaded.  You can use

sxe ld NameOfYourProfiler.dll

or even set a breakpoint inside your Profiler DLL's DllMain.  Now go, and see if your profiler is getting loaded.  If you can verify your profiler's DLL is getting loaded, then you now know your registry is pointing to the proper path, and any static dependencies your profiler has on other DLLs have been resolved.  But will your profiler COM object get instantiated properly?  Set breakpoints in your class factory (DllGetClassObject) and your profiler COM object's QueryInterface to see if you can spot problems there.  For example, if your profiler only works against CLR 1.x, then the CLR's call into your QueryInterface will fail, since you don't implement ICorProfilerCallback2.

If you're still going strong, set a breakpoint in your profiler's Initialize() callback.  Failures here are actually a popular cause for activation problems.  Inside your Initialize() callback, your profiler is likely calling QueryInterface for the ICorProfilerInfoX interface of your choice, and then calling SetEventMask, and doing other initialization-related tasks, like calling SetEnterLeaveFunctionHooks(2).  Do any of these fail?  Is your Initialize() callback returning a failure HRESULT?

Hopefully by now you've isolated the failure point.  If not, and your Initialize() is happily returning S_OK, then your profiler is apparently loading just fine.  At least it is when you're debugging it.  :-)

Posted by davbr | 7 Comments

Versions of Microsoft .NET Framework, CLR, and Your Profiler

[Updated 8/13/2008 with the release of Visual Studio 2008 SP1.]

With the many releases of the Microsoft .NET Frameworks and their service packs, it might not be obvious what versions of the Common Language Runtime (CLR) come alongside them and whether your profiler should care.  So I'm posting this to help clarify the various versions.

Note: I'm not talking about the Microsoft .NET Compact Framework or Microsoft .NET Micro Framework in this post.

Check it out!

Before I go any further, I invite you to learn more about the recently released Visual Studio 2008 SP1 and Microsoft .NET Framework 3.5 SP1.  Some great places to start:

Visual Studio 2008 and Microsoft .NET Framework 3.5:

this post from ScottGu's blog, this post from Soma's blog, or directly at the Visual Studio page http://www.microsoft.com/vstudio.

Visual Studio 2008 SP1 and Microsoft .NET Framework 3.5 SP1:

this post from Soma's blog.

Definitions

Ok, so what's the difference between "Common Language Runtime (CLR)" and "Microsoft .NET Framework"?  I think of it as:

CLR + managed libraries and tools = Microsoft .NET Framework

The CLR is the low-level technology (much of it written in unmanaged, native code) that includes the garbage collector, security subsystem, just-in-time compiler, type system, the profiling API (of course :-)), and other similar stuff.  Much of this tends to reside in mscorwks.dll.

If you then add onto that list the many rich managed libraries for implementing graphical user interfaces, web services, accessing Windows OS functionality, etc., as well as the managed language compilers and tools, you get the Microsoft .NET Framework.

Disclaimer: These definitions are how I, geek-Dave, keep things straight in my head.  I'm not in marketing so the names I'm using might not be perfectly accurate (e.g., I'm probably missing terms like "SDK" or "redistributable package").  Please don't take these as Microsoft Official Definitions.

Versions

You can get the various Microsoft .NET Framework versions via Windows Update, by searching on http://download.microsoft.com, or by installing the corresponding version of Visual Studio.  Also, Windows Vista comes with Microsoft .NET Framework 3.0 as an optionally installable component.

Here's how the Microsoft .NET Framework versions >= 2.0 correlate with the CLR versions:

Microsoft .NET Framework version   CLR version   Ships with Visual Studio version
2.0                                2.0           2005
2.0 SP1                            2.0 SP1       2008 (via .NET 3.5 install)
3.0                                2.0           (comes with Vista)
3.0 SP1                            2.0 SP1       2008 (via .NET 3.5 install)
3.5                                2.0 SP1       2008
3.5 SP1                            2.0 SP2       2008 SP1

Note that the latest CLR version at the time of writing this post is CLR 2.0 SP2 (no such thing as CLR 3.x!).

One exciting note here is that Visual Studio 2008 will actually let you target all three of .NET 2.0, .NET 3.0, and .NET 3.5.

Your Profiler

CLR Versions

Your profiler interacts with the CLR Profiling API, so you likely care which CLR version you're interacting with.  Knowing that the .NET Framework 2.x/3.x versions are all based on CLR 2.x simplifies the picture for you, as you're assured that the profiling API remains compatible across those versions.  However, it's fair to expect that CLR service packs might bring small changes or bug fixes.  CLR 2.0 SP1, for example, contains the following profiling API fixes:

  • SetEnterLeaveFunctionHooks2 returned E_INVALIDARG if any of the specified callback pointers were NULL.  For example, one might wish to specify an Enter probe (non-NULL) but not specify a Leave probe (NULL); this was disallowed.  Fix: SetEnterLeaveFunctionHooks2 now allows one or more of the callback parameters to be NULL.
  • Various fixes around IMethodMalloc::Alloc
  • When a profiler was monitoring Leave calls in order to inspect non-primitive, value-type return values, the profiling API would sometimes specify an invalid value.  This has been fixed.  (More info about this bug was posted here and here.)

CLR 2.0 SP2 had no fixes or changes of note in the profiling API.

Microsoft .NET Framework Versions

Your profiler may also (optionally) take dependencies on the libraries.  For example, if you have an instrumenting profiler that performs IL rewriting on some framework library code, then your profiler should deal gracefully with the cases that the libraries themselves may be different in the various .NET Framework versions.  Other than that, your profiler probably doesn't need to care which version of the libraries is in use.

Long Story Short

Although there are several Microsoft .NET Framework versions out there, your CLR 2.x-based profiler should still be fine.  Of course, the CLR is under active development (those libraries teams aren't the only ones having fun), so look forward to exciting things for your profiler to take advantage of in the future.

Posted by davbr | 13 Comments

Test my Code! (And get paid for it.)

We've got a new job opening listed here.  If your idea of fun is writing debuggers and profilers that automate the testing of the CLR APIs on which they're built, then you'll want to check out this opening.  Be the first on your block to discover the latest and greatest CLR features (and then write code to pound the heck out of it!).  As a bonus, you get to open bugs and assign them to me.  :-).

Posted by davbr | 1 Comments

Enter, Leave, Tailcall Hooks Part 2: Tall tales of tail calls

For most people the idea of entering or returning from a function seems straightforward. Your profiler's Enter hook is called at the beginning of a function, and its Leave hook is called just before the function returns. But the idea of a tail call and exactly what that means for the Profiling API is less straightforward.

In Part 1 I talked about the basics of the Enter / Leave / Tailcall hooks and generally how they work. You may want to review that post first if you haven't seen it yet. This post builds on that one by talking exclusively about the Tailcall hook, how it works, and what profilers should do inside their Tailcall hooks.

Tail calling in general

Tail calling is a compiler optimization that saves execution of instructions and saves reads and writes of stack memory. When the last thing a function does is call another function (and other conditions are favorable), the compiler may consider implementing that call as a tail call, instead of a regular call.

    static public void Main()
    {
        Helper();
    }

    static public void Helper()
    {
        One();
        Two();
        Three();
    }

    static public void Three()
    {
        ...
    }

In the code above, the compiler may consider implementing the call from Helper() to Three() as a tail call. What does that mean, and why would that optimize anything? Well, imagine this is compiled without a tail call optimization. By the time Three() is called, the stack looks like this (my stacks grow UP):

Three
Helper
Main

Each of those functions causes a separate frame to be allocated on the stack. All the usual contents of a frame, including locals, parameters, the return address, saved registers, etc., get stored in each of those frames. And when each of those functions returns, the return address is read from the frame, and the stack pointer is adjusted to collapse the frame of the returning function. That's just the usual overhead associated with making a function call.

Now, if the call from Helper() to Three() were implemented as a tail call, we'd avoid that overhead, and Three() would just "reuse" the stack frame that had been set up for Helper(). While Three() is executing, the call stack would look like this:

Three
Main

And when Three() returns, it returns directly to Main() without popping back through Helper() first.

Folks who live in functional programming languages like Scheme use recursion at least as often as C++ or C# folks use while and for loops. Such functional programming languages depend on tail call optimizations (in particular tail recursion) to avoid overflowing the stack. While imperative languages like C++ or C# don't have such a vital need for tail call optimizations, it's still pretty handy as it reduces the number of instructions executed and the writes to the stack.  Also, it's worth noting that the amount of stack space used for a single frame can be more than you'd expect.  For example, in CLR x64, each regular call (without the tail call optimization) uses a minimum of 48 bytes of stack space, even if it takes no arguments, has no locals, and returns nothing.  So for small functions, the tail call optimization can provide a significant overhead reduction in terms of stack space.
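To make "the last thing a function does" concrete, here is a small sketch in C++ rather than C#.  The two functions are hypothetical examples of mine, and whether a given compiler actually turns the second one into a jump depends on the compiler and its optimization settings.

```cpp
#include <cstdint>

// Not a tail call: after the recursive call returns, this frame still has
// work to do (the addition), so the frame must stay alive during the call.
std::uint64_t SumTo(std::uint64_t n) {
    if (n == 0) return 0;
    return n + SumTo(n - 1);
}

// Tail call: the recursive call is the very last thing the function does and
// its result is returned unchanged, so a compiler may reuse the current frame.
std::uint64_t SumToTail(std::uint64_t n, std::uint64_t acc = 0) {
    if (n == 0) return acc;
    return SumToTail(n - 1, acc + n);
}
```

Both compute the same sum; only the second leaves nothing for the caller to do after the call, which is the precondition for the optimization described above.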

The CLR and tail calls

When you're dealing with languages managed by the CLR, there are two kinds of compilers in play. There's the compiler that goes from your language's source code down to IL (C# developers know this as csc.exe), and then there's the compiler that goes from IL to native code (the JIT 32/64 bit compilers that are invoked at run time or NGEN time). Both the source->IL and IL->native compilers understand the tail call optimization. But the IL->native compiler--which I'll just refer to as JIT--has the final say on whether the tail call optimization will ultimately be used. The source->IL compiler can help to generate IL that is conducive to making tail calls, including the use of the "tail." IL prefix (more on that later). In this way, the source->IL compiler can structure the IL it generates to persuade the JIT into making a tail call. But the JIT always has the option to do whatever it wants.

When does the JIT make tail calls?

I asked Fei Chen and Grant Richins, neighbors down the hall from me who happen to work on the JIT, under what conditions the various JITs will employ the tail call optimization. The full answer is rather detailed. The quick summary is that the JITs try to use the tail call optimization whenever they can, but there are lots of reasons why the tail call optimization can't be used. Some reasons why tail calling is a non-option:

  • Caller doesn't return immediately after the call (duh :-))
  • Stack arguments between caller and callee are incompatible in a way that would require shifting things around in the caller's frame before the callee could execute
  • Caller and callee return different types
  • We inline the call instead (inlining is way better than tail calling, and opens the door to many more optimizations)
  • Security gets in the way
  • The debugger / profiler turned off JIT optimizations

Here are their full, detailed answers.

Note that how the JIT decides whether to use the tail calling optimization is an implementation detail that is prone to change at whim.  You must not take dependencies on this behavior. Use this information for your own personal entertainment only.

Your Profiler's Tailcall hook

I'm assuming you've already read through Part 1 and are familiar with how your profiler sets up its Enter/Leave/Tailcall hooks, so I'm not repeating any of those details here. I will focus on what kind of code you will typically want to place inside your Tailcall hook:

typedef void FunctionTailcall2(
                FunctionID funcId,
                UINT_PTR clientData,
                COR_PRF_FRAME_INFO func);

Tip: More than once I've seen profiler writers make the following mistake. They will take their naked assembly-language wrapper for their Enter2 and Leave2 hooks, and paste it again to use as the Tailcall2 assembly-language wrapper. The problem is they forget that the Tailcall2 hook takes a different number of parameters than the Enter2 / Leave2 hooks (or, more to the point, a different number of bytes is passed on the stack to invoke the Tailcall2 hook). So, they'll take the "ret 16" at the end of their Enter2/Leave2 hook wrappers and stick that into their Tailcall2 hook wrapper, forgetting to change it to a "ret 12". Don't make the same mistake!
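The arithmetic behind "ret 16" versus "ret 12" can be sketched at compile time.  This is an illustration only: it assumes 32-bit x86, where each of these parameters occupies one 4-byte stack slot, and it uses stand-in typedefs instead of including corprof.h.  ParamCount and StackArgBytes are hypothetical helpers, not part of the profiling API.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-ins for the profiling API types (the real ones come from corprof.h).
using FunctionID = std::uintptr_t;
using UINT_PTR = std::uintptr_t;
using COR_PRF_FRAME_INFO = std::uintptr_t;
struct COR_PRF_FUNCTION_ARGUMENT_INFO;   // opaque here
struct COR_PRF_FUNCTION_ARGUMENT_RANGE;  // opaque here

typedef void FunctionEnter2(FunctionID, UINT_PTR, COR_PRF_FRAME_INFO,
                            COR_PRF_FUNCTION_ARGUMENT_INFO*);
typedef void FunctionLeave2(FunctionID, UINT_PTR, COR_PRF_FRAME_INFO,
                            COR_PRF_FUNCTION_ARGUMENT_RANGE*);
typedef void FunctionTailcall2(FunctionID, UINT_PTR, COR_PRF_FRAME_INFO);

// Counts the parameters of a function type.
template <typename T> struct ParamCount;
template <typename R, typename... Args>
struct ParamCount<R(Args...)> {
    static constexpr std::size_t value = sizeof...(Args);
};

// Assumption: every parameter takes one 4-byte slot on x86.
constexpr std::size_t kX86SlotBytes = 4;

// Bytes your naked wrapper has to pop when it returns ("ret N").
template <typename Fn>
constexpr std::size_t StackArgBytes() {
    return ParamCount<Fn>::value * kX86SlotBytes;
}

static_assert(StackArgBytes<FunctionEnter2>() == 16, "Enter2 wrapper: ret 16");
static_assert(StackArgBytes<FunctionLeave2>() == 16, "Leave2 wrapper: ret 16");
static_assert(StackArgBytes<FunctionTailcall2>() == 12, "Tailcall2 wrapper: ret 12");
```

Four parameters give 16 bytes for Enter2 and Leave2; Tailcall2 has only three, hence 12.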

It's worth noting what these parameters mean. With the Enter and Leave hooks it's pretty obvious that the parameters your hook is given (e.g., funcId) apply to the function being Entered or Left. But what about the Tailcall hook? Do the Tailcall hook's parameters describe the caller (function making the tail call) or the callee (function being tail called into)?

Answer: the parameters refer to the tail caller.

The way I remember it is that the Tailcall hook is like an "Alternative Leave" hook. A function ends either by returning (in which case the CLR invokes your Leave hook) or a function ends by tail calling out to somewhere else (in which case the CLR invokes your Tailcall hook). In either case (Leave hook or Tailcall hook) the hook's parameters tell you about the function that's ending. If a function happens to end by making a tail call, your profiler is not told the target of that tail call. (The astute reader will realize that actually your profiler is told what the target of the tail call is--you need only wait until your Enter hook is called next, and that function will be the tail call target, or "tail callee". (Well, actually, this is true most of the time, but not all! (More on that later, but consider this confusing, nested series of afterthoughts a hint to a question I pose further down in this post.)))

Did you just count the number of closing parentheses to ensure I got it right? If so, I'd like to make fun of you but I won't--I'd have counted the parentheses, too. My house is glass.

Ok, enough dilly-dallying. What should your profiler do in its Tailcall hook? Two of the more common reasons profilers use Enter/Leave/Tailcall hooks in the first place are to keep shadow stacks or to maintain call traces (sometimes with timing information).

Shadow stacks

The CLRProfiler is a great example of using Enter/Leave/Tailcall hooks to maintain shadow stacks. A shadow stack is your profiler's own copy of the current stack of function calls on a given thread at any given time. Upon Enter of a function, you push that FunctionID (and whatever other info interests you, such as arguments) onto your data structure that represents that thread's stack. Upon Leave of a function, you pop that FunctionID. This gives you a live list of managed calls in play on the thread. The CLRProfiler uses shadow stacks so that whenever the managed app being profiled chooses to allocate a new object, the CLRProfiler can know the managed call stack that led to the allocation. (Note that an alternate way of accomplishing this would be to call DoStackSnapshot at every allocation point instead of maintaining a shadow stack. Since objects are allocated so frequently, however, you'd end up calling DoStackSnapshot extremely frequently and will often see worse performance than if you had been maintaining shadow stacks in the first place.)

 

OK, so when your profiler maintains a shadow stack, it's clear what your profiler should do on Enter or Leave, but what should it do on Tailcall? There are a couple ways one could imagine answering that question, but only one of them will work! Taking the example from the top of this post, imagine the stack looks like this:

Helper
Main

and Helper is about to make a tail call into Three().  What should your profiler do?

Method 1: On tailcall, pop the last FunctionID.  (In other words, treat Tailcall just like Leave.)

So, in this example, when Helper() calls Three(), we'd pop Helper().  As soon as Three() is called, our profiler would receive an Enter for Three(), and our shadow stack would look like this:

Three
Main

This approach mirrors reality, because this is what the actual physical stack will look like.  Indeed, if one attaches a debugger to a live process, and breaks in while the process is inside a tail call, the debugger will show a call stack just like this, where you see the tail callee, but not the tail caller.  However, it might be a little confusing to a user of your profiler who looks at his source code and sees that Helper() (not Main()) calls Three().  He may have no idea that when Helper() called Three(), the JIT chose to turn that into a tail call. In fact, your user may not even know what a tail call is. You might therefore be tempted to try this instead:

Method 2: On tailcall, "mark" the FunctionID at the top of your stack as needing a "deferred pop" when its callee is popped, but don't pop yet.

With this strategy, for the duration of the call to Three(), the shadow stack will look like this:

Three
Helper (marked for deferred pop)
Main

which some might consider more user-friendly. And as soon as Three() returns, your profiler will sneakily do a double-pop leaving just this:

Main

So which should your profiler use: Method 1 or Method 2? Before I answer, take some time to think about this, invoking that hint I cryptically placed above in nested parentheses. And no, the fact that the parentheses were nested is not part of the actual hint.

The answer: Method 1. In principle, either method should be fine. However, the behavior of the CLR under certain circumstances will break Method 2. Those "certain circumstances" are what I alluded to when I mentioned "this is true most of the time, but not all" above.  These mysterious "certain circumstances" involve a managed function tail calling into a native helper function inside the runtime. Here's an example:

static public void Main()
{
    Thread.Sleep(44);
    Helper();
}

It just so happens that the implementation of Thread.Sleep makes a call into a native helper function in the bowels of the runtime. And that call happens to be the last thing Thread.Sleep does. So the JIT may helpfully optimize that call into a tail call. Here are the hook calls your profiler will see in this case:

(1) Enter (into Main)
(2) Enter (into Thread.Sleep)
(3) Tailcall (from Thread.Sleep)
(4) Enter (into Helper)
(5) Leave (from Helper)
(6) Leave (from Main)

Note that after you get a Tailcall telling you that Thread.Sleep is done (in (3)), the very next Enter you get (in (4)) is NOT the Enter for the function being tail called. This is because the CLR only provides Enter/Leave/Tailcall hooks for managed functions, and the very next managed function being entered is Helper().  So, how will Method 1 and Method 2 fare in this example?

Method 1: Shadow stack works

By popping on every Tailcall hook, your shadow stack stays up to date.
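Here is a minimal sketch of Method 1 in C++.  The ShadowStack class is hypothetical, FunctionID is a stand-in typedef rather than the one from corprof.h, and a real profiler would keep one such stack per thread and call these methods from its hook wrappers.

```cpp
#include <cstdint>
#include <vector>

using FunctionID = std::uintptr_t;  // stand-in for the profiling API's FunctionID

// One instance per managed thread (how you find it per thread is up to you).
class ShadowStack {
public:
    void OnEnter(FunctionID id) { frames_.push_back(id); }

    // Leave: the function on top of the stack is done.
    void OnLeave(FunctionID) { Pop(); }

    // Method 1: a Tailcall also means the function on top is done,
    // so it gets exactly the same treatment as a Leave.
    void OnTailcall(FunctionID) { Pop(); }

    // Bottom of the stack first, most recent call last.
    const std::vector<FunctionID>& Frames() const { return frames_; }

private:
    void Pop() {
        if (!frames_.empty()) frames_.pop_back();
    }
    std::vector<FunctionID> frames_;
};
```

Feeding it the six hook calls listed above leaves Main and Helper on the stack at step (4), and an empty stack after step (6).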

Method 2: Shadow stack fails

At stage (4), the shadow stack looks like this:

Helper
Thread.Sleep (marked for "deferred pop")
Main

If you think it might be complicated to explain tail calls to your users so they can understand the Method 1 form of shadow stack presentation, just try explaining why it makes sense to present to them that Thread.Sleep() is calling Helper()!
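If you want to see Method 2 go wrong for yourself, here is a small C++ sketch of it.  DeferredPopStack and Frame are hypothetical names, FunctionID is a stand-in typedef, and the sketch only implements the simple rule described so far (deferred frames get popped when the frame above them is popped).

```cpp
#include <cstdint>
#include <vector>

using FunctionID = std::uintptr_t;  // stand-in for the profiling API's FunctionID

struct Frame {
    FunctionID id;
    bool deferredPop;  // set when this function ended with a tail call
};

class DeferredPopStack {
public:
    void OnEnter(FunctionID id) {
        Frame f = {id, false};
        frames_.push_back(f);
    }

    // Method 2: don't pop on Tailcall, just mark the top frame.
    void OnTailcall(FunctionID) {
        if (!frames_.empty()) frames_.back().deferredPop = true;
    }

    // Pop the leaving frame, then any marked frames it was sitting on.
    void OnLeave(FunctionID) {
        if (!frames_.empty()) frames_.pop_back();
        while (!frames_.empty() && frames_.back().deferredPop) frames_.pop_back();
    }

    const std::vector<Frame>& Frames() const { return frames_; }

private:
    std::vector<Frame> frames_;
};
```

Replay hooks (1) through (4) from the Thread.Sleep example and the stack holds Main, Thread.Sleep (marked), and Helper, which claims that Thread.Sleep called Helper.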

And of course, this can get arbitrarily nasty:

static public void Main()
{
    Thread.Sleep(44);
    Thread.Sleep(44);
    Thread.Sleep(44);
    Thread.Sleep(44);
    Helper();
}

would yield:

Helper
Thread.Sleep (marked for "deferred pop")
Thread.Sleep (marked for "deferred pop")
Thread.Sleep (marked for "deferred pop")
Thread.Sleep (marked for "deferred pop")
Main

And things get more complicated if you start to think about when you actually pop a frame marked for "deferred pop". In all the above examples, you would do so as soon as the frame above it gets popped. So once Helper() is popped (due to Leave()), you'd cascade-pop all the Thread.Sleeps. But what if there is no frame above the frames marked for "deferred pop"?  To wit:

static public void Main()
{
    Helper();
}

static public void Helper()
{
    Thread.Sleep(44);
    Thread.Sleep(44);
    Thread.Sleep(44);
    Thread.Sleep(44);
}

would yield:

Thread.Sleep (marked for "deferred pop")
Thread.Sleep (marked for "deferred pop")
Thread.Sleep (marked for "deferred pop")
Thread.Sleep (marked for "deferred pop")
Helper
Main

until you get a Leave hook for Helper(). At this point, you need to pop Helper() from your shadow stack, but he's not at the top-- he's buried under all your "deferred pop" frames. So your profiler would need to perform the deferred pops if a frame above OR below them gets popped. Hopefully, the yuckiness of this implementation will scare you straight.  But the confusion of presenting crazy stacks to the user is the real reason to abandon Method 2 and go with Method 1.

Call tracing

The important lesson to learn from the above section is that sometimes a Tailcall hook will match up with the next Enter hook (i.e., the tail call you're notified of in your Tailcall hook will have as its callee the very function you're notified of in the next Enter hook), and sometimes the Tailcall hook will NOT match with the next Enter hook (in particular when the Tailcall hook refers to a tail call into a native helper in the runtime). And the sad fact is that the Enter/Leave/Tailcall hook design does not currently allow you to predict whether a Tailcall will match the next Enter.

As an illustration, consider two simple tail call examples:

Matching Example

    static public void Main()
    {
        One();
        ...(other code here)...
    }

    static public void One()
    {
        Two();
    }

Non-matching Example

    static public void Main()
    {
        Thread.Sleep(44);
        Two();
    }

In either case, your profiler will see the following hook calls:

(1) Enter (into Main)
(2) Enter (into One / Thread.Sleep)
(3) Tailcall (from One / Thread.Sleep)
(4) Enter (into Two)
...

In the first example, (3) and (4) match (i.e., the tail call really does call into Two()). But in the second example, they do not (the tail call does NOT call into Two()).

Since you don't know when Tailcall will match the next Enter, your implementation of call tracing, like shadow stack maintenance, must treat a Tailcall hook just like a Leave. If you're logging when functions begin and end, potentially with the amount of time spent inside the function, then your Tailcall hook should basically do the same thing as your Leave hook. A call to your Tailcall hook indicates that the specified function is over and done with, just like a call to your Leave hook.
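A rough sketch of that idea in C++ follows.  CallTracer and CallRecord are hypothetical names, FunctionID is a stand-in typedef, and the timestamp is passed in as a parameter purely to keep the example simple; a real profiler would read whatever clock it trusts inside the hook.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using FunctionID = std::uintptr_t;  // stand-in for the profiling API's FunctionID

struct CallRecord {
    FunctionID id;
    std::uint64_t begin;
    std::uint64_t end;
    std::size_t depth;  // nesting level at the time of the Enter
};

// One instance per thread.
class CallTracer {
public:
    void OnEnter(FunctionID id, std::uint64_t now) {
        CallRecord r = {id, now, 0, open_.size()};
        open_.push_back(records_.size());
        records_.push_back(r);
    }

    // Leave and Tailcall both mean "this function is over and done with".
    void OnLeave(FunctionID, std::uint64_t now) { Close(now); }
    void OnTailcall(FunctionID, std::uint64_t now) { Close(now); }

    const std::vector<CallRecord>& Records() const { return records_; }

private:
    // Stamps the end time on the most recently opened call.
    void Close(std::uint64_t now) {
        if (open_.empty()) return;
        records_[open_.back()].end = now;
        open_.pop_back();
    }

    std::vector<CallRecord> records_;
    std::vector<std::size_t> open_;  // indexes into records_ of calls in progress
};
```

With the "Matching Example", One and Two both end up recorded at depth 1 under Main, and One's time stops at the Tailcall instead of absorbing the time spent in Two.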

As with shadow stacks, this will sometimes lead to call graphs that could be confusing. "Matching Example" had One tail call Two, but your graph will look like this:

Main
 |
 |-- One
 |-- Two

But at least this effect is explainable to your users, and is self-correcting after the tail call is complete, while yielding graphs that are consistent with your timing measurements. If instead you try to outsmart this situation and assume Tailcalls match the following Enter, the errors can snowball into incomprehensible graphs (see the nasty examples from the shadow stack section above).

How often does this happen?

So when does a managed function in the .NET Framework tail call into a native helper function inside the CLR? In the grand scheme of things, not a lot. But it's a pretty random and fragile list that depends on which JIT is in use (x86, x64, ia64), and can easily change as parts of the runtime are rev'd, or even as JIT compilation flags are modified by debuggers, profilers, and other environmental factors while a process is active. So you should not try to guess this list and take dependencies on it.

Can't I just turn tail calling off?!

If all this confusion is getting you down, you might be tempted to just avoid the problem in the first place. And yes, there is a way to do so, but I wouldn't recommend it in general. If you call SetEventMask, specifying COR_PRF_DISABLE_OPTIMIZATIONS inside your mask, that will tell the JIT to turn off the tail call optimization. But the JIT will also turn off ALL optimizations. Profilers that shouldn't perturb the behavior of the app should definitely not do this, as the code generation will be very different.

Watching CLR tail calls in action

If you're writing a profiler with Enter/Leave/Tailcall hooks, you'll want to make sure you exercise all your hooks so they're properly tested. It's easy enough to make sure your Enter/Leave hooks are called--just make sure the test app your profiler runs against has a Main()! But how to make sure your Tailcall hook is called?

The surest way is to have a simple managed app that includes an obvious tail call candidate, and make sure the "tail." IL prefix is in place. You can use ilasm / ildasm to help build such an assembly. Here's an example I tried on x86 using C#.

Start with some simple code that makes a call that should easily be optimized into a tail call:

using System;
class Class1
{
    static int Main(string[] args)
    {
        return Helper(4);
    }

    static int Helper(int i)
    {
        Random r = new Random();
        i = (i / 1000) + r.Next();
        i = (i / 1000) + r.Next();
        return MakeThisATailcall(i);
    }

    static int MakeThisATailcall(int i)
    {
        Random r = new Random();
        i = (i / 1000) + r.Next();
        i = (i / 1000) + r.Next();
        return i;
    }
}

You'll notice there's some extra gunk, like calls to Random.Next(), to make the functions big enough that the JIT won't inline them. There are other ways to avoid inlining (including from the profiling API itself), but padding your test functions is one of the easier ways to get started without impacting the code generation of the entire process. Now, compile that C# code into an IL assembly:

    csc /o+ Class1.cs

(If you're wondering why I specified /o+, I've found that if I don't, then suboptimal IL gets generated, and some extraneous instructions appear inside Helper between the call to MakeThisATailcall and the return from Helper. Those extra instructions would prevent the JIT from making a tail call.)

Run ildasm to get at the generated IL:

    ildasm Class1.exe

Inside ildasm, use File > Dump to generate a text file that contains a textual representation of the IL from Class1.exe.  Call it Class1WithTail.il.  Open up that file and add the tail. prefix just before the call you want optimized into a tail call.  The changes are marked with comments below (ilasm treats the IL_xxxx labels as plain names and ignores the code size comment, so they need not match the real byte offsets):

  .method private hidebysig static int32
          Helper(int32 i) cil managed
  {
    // Code size       46 (0x2e)    (was 45 (0x2d) before the edit)
    .maxstack  2
    .locals init (class [mscorlib]System.Random V_0)
    IL_0000:  newobj     instance void [mscorlib]System.Random::.ctor()
    IL_0005:  stloc.0
    IL_0006:  ldarg.0
    IL_0007:  ldc.i4     0x3e8
    IL_000c:  div
    IL_000d:  ldloc.0
    IL_000e:  callvirt   instance int32 [mscorlib]System.Random::Next()
    IL_0013:  add
    IL_0014:  starg.s    i
    IL_0016:  ldarg.0
    IL_0017:  ldc.i4     0x3e8
    IL_001c:  div
    IL_001d:  ldloc.0
    IL_001e:  callvirt   instance int32 [mscorlib]System.Random::Next()
    IL_0023:  add
    IL_0024:  starg.s    i
    IL_0026:  ldarg.0
    // Removed:
    //   IL_0027:  call       int32 Class1::MakeThisATailcall(int32)
    //   IL_002c:  ret
    // Added:
    IL_0027:  tail.
    IL_0028:  call       int32 Class1::MakeThisATailcall(int32)
    IL_002d:  ret
  } // end of method Class1::Helper
  

Now you can use ilasm to recompile your modified textual IL back into an executable assembly.

    ilasm /debug=opt Class1WithTail.il

You now have Class1WithTail.exe that you can run!  Hook up your profiler and step through your Tailcall hook.

You Can Wake Up Now

If you didn't learn anything, I hope you at least got some refreshing sleep thanks to this post. Here's a quick recap of what I wrote while you were napping:

  • If the last thing a function does is call another function, that call may be optimized into a simple jump (i.e., "tail call").  Tail calling is an optimization to save the time of stack manipulation and the space of generating an extra call frame.
  • In the CLR, the JIT has the final say on when it employs the tail call optimization. The JIT does this whenever it can, except for a huge list of exceptions. Note that the x86, x64, and ia64 JITs are all different, and you'll see different behavior on when they'll use the tail call optimizations.
  • Since some managed functions may tail call into native helper functions inside the CLR (for which you won't get an Enter hook notification), your Tailcall hook should treat the tail call as if it were a Leave, and not depend on the next Enter hook correlating to the target of the last tail call.  With shadow stacks, for example, this means you should simply pop the calling function off your shadow stack in your Tailcall hook.
  • Since tail calls can be elusive to find in practice, it's well worth your while to use ildasm/ilasm to manufacture explicit tail calls so you can step through your Tailcall hook and test its logic.
Posted by davbr | 6 Comments

New version of CLRProfiler released (for CLR 2.x only)

If you're having trouble with CLRProfiler crashing the aspnet_wp process, along the lines of this forum post:

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1039428&SiteID=1

then you'll want to know that we have officially uploaded a fixed version of CLR Profiler here: 

http://www.microsoft.com/downloads/details.aspx?FamilyID=a362781c-3870-43be-8926-862b40aa0cd0&DisplayLang=en

So please feel free to go there directly to get the latest and greatest.  (The update is for CLR 2.x only).

Posted by davbr | 1 Comments