If you care about performance at a very low level, at one point you’ve asked yourself why the compiler, JIT, or runtime did or did not inline a certain method.  Unless you worked on the compiler, JIT, or runtime, you really had no way of telling, other than trial and error (sort of like asking how a magic 8 ball comes up with its answers).  We, on the JIT team, decided to fix that in CLR 4.  Now you the end-user or programmer can see why the JIT, and in some cases the runtime decided to disallow inlining or tail calls.  We also tell you when the JIT succeeded in inlining or tail calling a certain method.  All of this wonderful information is made available by Event Tracing for Windows.

 

First a disclaimer, I’m not an expert on ETW events, and this post isn’t about ETW events in general, it is only about a couple of new events exposed by the JIT to enable performance junkies to hurt themselves. J

 

For Windows XP, Widows Server 2003, Windows Vista and Windows 7, you already have everything you need on your machine.  For Windows 2000, you’ll need to download the Microsoft Windows 2000 Server Resource Kit.  According to the gurus around here previous to Windows Vista registering an ETW provider was not cheap, so the runtime only does it when requested to.  On Vista and newer OSes, it is cheap enough to do all the time, so you only need to request it on pre-Vista OSes.You request it by setting the following environment variable before running the application you are interested in:

SET COMPLUS_ETWENABLED=1

To start logging ETW events do this:

logman start clrevents -p {e13c0d23-ccbc-4e12-931b-d9cc2eee27e4} 0x1000 5 -ets

There are lots more options to tweak here, but the important part is the GUID (the CLR ETW provider GUID), the mask 0x1000 (the JitTracingKeyword), and the level 5 (everything).  More information about logman.exe can be found at http://technet.microsoft.com/en-us/library/bb490956.aspx.  After you’ve started ETW, run your scenario, and then stop ETW as follows:

logman stop clrevents -ets

This will create clrevents.etl.

 

[Begin Edit 1/27/2010]

Several people have reported that on Vista and Win7 (and their 2008 server counter parts), one additional step is needed so that the events can be properly decoded.  It is my understanding this only needs to be done once, but I am no expert here on ETW.  It is also my understanding that this needs to be run from an elevated command prompt.  I am merely placing this here to help others.  Also you will need to replace the path with the equivalent path on your machine/setup/configuration.

wevtutil im c:\windows\microsoft.net\framework\v4.0.21006\clr-etw.man

[End Edit 1/27/2010]

 

To decode it further run this:

tracerpt clrevents.etl

This will create 2 files: dumpfile.csv and summary.txt. The former has all the events, the latter gives a nice summary of the events. On Vista, tracerpt will generate dumpfile.xml instead of dumpfile.csv.  I ran this on a Vista machine so I’m going to deal with the XML format, but the csv format is similar.

 

Here is a sample MethodJitInliningSucceeded event:

 

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">

      <System>

                  <Provider Name="Microsoft-Windows-DotNETRuntime" Guid="{e13c0d23-ccbc-4e12-931b-d9cc2eee27e4}" />

                  <EventID>185</EventID>

                  <Version>0</Version>

                  <Level>5</Level>

                  <Task>9</Task>

                  <Opcode>83</Opcode>

                  <Keywords>0x1000</Keywords>

                  <TimeCreated SystemTime="2009-04-14T14:31:52.168851900Z" />

                  <Correlation ActivityID="{00000000-0000-0000-0000-000000000000}" />

                  <Execution ProcessID="15476" ThreadID="16936" ProcessorID="3" KernelTime="90" UserTime="60" />

                  <Channel />

                  <Computer />

      </System>

      <UserData>

                  <MethodJitInliningSucceeded xmlns='myNs'>

                        <MethodBeingCompiledNamespace>Factorial</MethodBeingCompiledNamespace>

                        <MethodBeingCompiledName>Main</MethodBeingCompiledName>

                              <MethodBeingCompiledNameSignature>void  (class System.String[])</MethodBeingCompiledNameSignature>

                              <InlinerNamespace>Factorial</InlinerNamespace>

                              <InlinerName>Main</InlinerName>

                              <InlinerNameSignature>void  (class System.String[])</InlinerNameSignature>

                              <InlineeNamespace>Factorial</InlineeNamespace>

                              <InlineeName>fact</InlineeName>

                              <InlineeNameSignature>unsigned int32  (unsigned int32)</InlineeNameSignature>

                              <ClrInstanceID>13</ClrInstanceID>

                  </MethodJitInliningSucceeded>

      </UserData>

      <RenderingInfo Culture="en-US">

                  <Level>Verbose </Level>

                  <Opcode>JitInliningSucceeded </Opcode>

                  <Keywords>

                              <Keyword>JitTracingKeyword </Keyword>

                  </Keywords>

                  <Task>CLR Method </Task>

                  <Message>MethodBeingCompiledNamespace=Factorial;

MethodBeingCompiledName=Main;

MethodBeingCompiledNameSignature=void  (class System.String[]);

InlinerNamespace=Factorial;

InlinerName=Main;

InlinerNameSignature=void  (class System.String[]);

InlineeNamespace=Factorial;

InlineeName=fact;

InlineeNameSignature=unsigned int32  (unsigned int32);

ClrInstanceID=13 </Message>

      </RenderingInfo>

</Event>

 

We’ll focus in on the UserData element because it looks the prettiest.  First you have 3 sets of triples: MethodBeingCompiled, Inliner, and Inlinee crossed with Namespace, Name, and Signature.  I think the namespace, name and signature are fairly obvious, but the reason they are separate instead of pretty printed into a simple element is a matter of performance.  ETW is supposed to be lightweight.  Computing these strings is not lightweight, but we felt we made a nice trade-off (just enough information to be useful, but don’t spend time on making it prettier than it needs to be).  MethodBeingCompiled is the method the VM asked the JIT to generate code for.  Inliner refers to the method that the JIT is trying to generate code for.  If the Inliner is not the same as the MethodBeingCompiled it is because the JIT decided to try and inline the code for Inliner into the code for MethodBeingCompiled rather than generate a call to Inliner.  However, just because you see a method showing up as the Inliner, it doesn’t mean the JIT has actually succeeded in inlining it.  Finally the Inlinee refers to the method that the JIT is trying to inline (rather than generate a call to).

 

If this was a MethodJitInliningFailed event it would have 2 additional elements: FailAlways and FailReason.  If FailAlways is true, it is a hint back to the JIT and VM that inlining will always fail for the given Inlinee, regardless of the situation, and so subsequent compiles will be able to abort the inlining attempt faster (this is what FailReason of “It is marked as 'INLINE_NEVER'” means).  The FailReason is a free-form text field.  Originally this came from some internal code we had to help us in our debugging.  The text is often cryptic, and is not localized.  Part of the reason we did not localize it was performance.  The other part is that these strings are often so cryptic many English speakers will have a hard time with them.  For us we often end up treating them like a GUID: we search the sources for the places where that reason is returned to figure out why a particular inline failed.  So if English is not your native language, don’t fret too much, you’re probably not missing much.  That covers the 2 most common events: MethodJitInliningSucceeded and MethodJitInliningFailed.

 

Now we move onto the only other 2 events for this keyword: MethodJitTailCallSucceeded and MethodJitTailCallFailed.  The first 9 elements are the same (except replace Inliner with Caller, and Inlinee with Callee).  The next element is the TailPrefix element.  This element tells whether the tail. IL prefix was present.  For x86 this currently will always be true because the x86 JIT does not generate tail calls as an optimization.  See other blog posts for why tail calls are a good optimization that the x64 and IA64 JIT often performs even without the tail. il prefix.  We find this useful because, although some compilers emit the tail. prefix in IL, they do not have any source syntax to expose it, so the programmer does not know looking only at the source what was in the IL.

 

For MethodJitTailCallSucceded, you get a TailCallType.  This is an enum, which has names that should be localized down in the Message element.  Due to a bug in Beta 1 you will only get the numeric values.  Also the pre-Vista versions of tracerpt do not do the translation from value to string.  So on post Beta 1 builds with Vista or newer, you should the pretty names, otherwise expect the numbers.  I will list both, especially because the number still appear in the UserData element.  OptimizedTailCall, 0, means a typical tail call, where the outgoing arguments are pushed into the incoming parameter slots, the epilog is executed but instead of returning, it ends with a jump to the new method.  RecursiveLoop, 1, means a recursive tail call, where the JIT sees that the method calls itself, and replaces the call with a jump back to the start of the method (thus skipping even the prolog and epilog). HelperAssistedTailCall, 2, means that the tail call cannot be accomplished directly and must go through a helper function call.  These helper-assisted tail calls are slower than normal calls and only used as a last resort when the program needs to do a tail call in order to prevent stack overflows.  Thus currently you should never see a HelperAssistedTailCall with TailPrefix=false.

 

For MethodJitTailCallFailed, you don’t get TailCallType element, but you do get FailReason, which is similar to MethodJitInliningFailed’s FailReason.  It is also cryptic english-only text.  In future posts we hope to explain some of the more common reasons.

 

So for the few people that really need to squeeze out every last bit of performance from your code, you can now see a little bit further into how the JIT compiles your code.  Occasionally this will enable you to refactor a method in such a way as to improve inlining or tail calls.  If you do this remember though that every runtime has different heuristics and algorithms, and just because a methods was or wasn’t inlined in one version doesn’t mean the same will be true on a different version.  If a method absolutely cannot be inlined, use MethodImplOptions.NoInlining.  If it absolutely must be inlined then inline it manually (i.e. copy and paste the code).  That recommendation should never change.  If however you’re looking for some simple performance boosts, you can try these events and possibly learn something.  One example we found internally is that moving a try/catch or try/finally up or down the call chain can often have a big performance impact because it might enable/disable inlining inside a very important loop.

 

Grant Richins

CLR Codegen Team

Update 10/8/2009 - replace em-dash in logman command line with a normal dash that you can paste into a command window and it will work.