RyuJIT CTP2: Getting Ready for Prime-time

This post announces an updated preview of the .NET team’s new 64-bit Just-In-Time (JIT) compiler. It was written by Mani Ramaswamy, Program Manager for the .NET Dynamic Code Execution Team.

Note: RyuJIT CTP3 is available here: http://blogs.msdn.com/b/dotnet/archive/2014/04/03/the-next-generation-of-net.aspx.

The developer preview of RyuJIT, CTP1, received a thunderous response (so much so that we had to post a FAQ soon after). Two commonly asked questions were when there would be an update and when RyuJIT would support feature X or Y found in the existing 64-bit .NET JIT compiler (JIT64). CTP2 answers both questions. This release of RyuJIT has functionality equivalent to the existing JIT64: at this point, there aren’t any feature differences between the two. On average, RyuJIT generates better code than the existing JIT64, while it continues to maintain its 2X throughput advantage over JIT64.

Improvements: Features, Reliability, Performance

The two main features that weren’t supported in CTP1 were “opportunistic” tail calls and Edit & Continue. With CTP2, both of these features are supported. Additionally, a host of other features have been added to achieve functional parity with JIT64. Along the way, we (the .NET Code Generation team) have also added a number of performance tweaks and optimizations so that RyuJIT generates code fast (the throughput metric) and the generated code runs fast (the code quality metric).

But why stop there? We have thrown every test at our disposal at RyuJIT and it has come through with flying colors, whether running common server software using IKVM.NET (a Java Virtual Machine implemented in .NET), complex ASP.NET workloads, or even simple Windows Store apps. Thanks to everyone who tried out the first CTP of RyuJIT and filed bug reports: we’ve fixed every single one of them, and at this point, RyuJIT doesn’t have any known bugs.

We continue to look at ways to improve the overall quality of RyuJIT, and will likely discover a few more bugs along the way. Given the enthusiastic response to the first CTP, we expect to hear about a few more from our early adopters, that is, you.

When it comes to performance, CTP1 demonstrated that RyuJIT handily beats JIT64 on throughput (how fast the compiler generates code), by a factor of 2X. We’ve been careful to maintain that lead, and this CTP should yield similar throughput numbers. With CTP1, the focus was on throughput and early feedback rather than on code quality (how fast the generated code executes).

With CTP1, we were in the same ballpark as JIT64 but still 10-20% slower on code quality, with some outliers. With CTP2, we’ve addressed that: on average, we should now be on par with or better than JIT64 on code quality. If during your evaluation you find a benchmark where RyuJIT trails JIT64 significantly, please reach out to us; by the time we’re done, RyuJIT should be producing better code than JIT64 did. This is not to say that there won’t be a few micro-benchmarks where JIT64 produces more optimal code, but rather that on average RyuJIT should be on par or better, and in the few (rare) cases where it does trail JIT64, it trails by only a few percentage points. We ran many common code quality benchmark suites internally and found that RyuJIT’s code quality is, on average, better than that of the existing .NET JIT64 compiler, so if you do find an outlier, we’re very interested.

The chart below shows our performance relative to JIT64 across a number of benchmarks, some very small, others fairly large. Positive numbers indicate RyuJIT performing better than JIT64; negative numbers indicate the opposite. The gray band marks the range of “statistical noise” for each benchmark, so any bar within the gray area indicates effectively identical performance. Check the CodeGen blog within a day or two for a detailed description of the methodology and specifics about the benchmarks we’re running. Overall, we’re doing quite well, with only a handful of losses and some very nice wins!

Our first job was to line up all the functionality and quality metrics and reach parity with JIT64 on performance (code quality); we’re already 2X faster on throughput, in case you forgot. With that done, our re-architecture puts us in a great position to optimize .NET dynamic code execution scenarios. Over the next few months, you will continue to hear from us as we move forward on quality and performance.

Evaluating the Preview

The process to install RyuJIT remains simple and is the same as for CTP1. RyuJIT is not ready for production code yet, but we look forward to hearing from you on functionality, quality and performance. Send feedback and questions to ryujit@microsoft.com. Even if you’ve figured out the issue or a workaround yourself, we want to hear from you.

You can download the RyuJIT installer now. RyuJIT only works on 64-bit editions of Windows 8.1 or Windows Server 2012 R2. Also, to keep your system isolated, RyuJIT doesn’t affect NGen: native images are still compiled by JIT64.

After installation, there are two ways to turn on RyuJIT. If you just want to enable RyuJIT for one application, set the environment variable COMPLUS_AltJit=* in the environment from which you launch that application. If you want to enable RyuJIT for your entire machine, set the registry value AltJit under HKLM\SOFTWARE\Microsoft\.NETFramework to the string "*". Both methods cause the 64-bit CLR to use RyuJIT instead of JIT64, and both are temporary settings: installing RyuJIT doesn’t make any permanent changes to your machine (aside from installing the RyuJIT files in a directory).
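As a rough sketch, the two options might look like this from a Windows command prompt (cmd). MyApp.exe is a hypothetical application name, and the registry commands assume that AltJit is a string (REG_SZ) value under the .NETFramework key; /reg:64 targets the 64-bit registry view that the 64-bit CLR reads, and writing to HKLM requires an elevated prompt. The final command lists running processes that have the RyuJIT module, protojit.dll, loaded, which is a quick way to confirm RyuJIT is actually in use.

```bat
rem Option 1: enable RyuJIT for one application only.
rem "set" affects only this console session and the processes started from it.
set COMPLUS_AltJit=*
MyApp.exe

rem Option 2: enable RyuJIT machine-wide (run from an elevated prompt).
rem Assumption: AltJit is a REG_SZ value under the .NETFramework key.
reg add "HKLM\SOFTWARE\Microsoft\.NETFramework" /v AltJit /t REG_SZ /d "*" /f /reg:64

rem Undo option 2 when you are finished evaluating.
reg delete "HKLM\SOFTWARE\Microsoft\.NETFramework" /v AltJit /f /reg:64

rem List running processes that have the RyuJIT module loaded.
tasklist /m protojit.dll
```

These are separate fragments rather than one script to run top to bottom. Option 1 is the easier one to undo: closing the console window discards the variable.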

Stay tuned to this blog for more on RyuJIT as it progresses from a CTP to become the One True .NET JIT Compiler. If you want to dive more deeply into the geeky details behind RyuJIT’s development, you can do that at the .NET CodeGen blog. For example, here's more detail on performance numbers. We’ll answer questions and explain design decisions there. And remember to send us mail at ryujit@microsoft.com after you download and install the CTP. We want to hear from you!

Leave a Comment
  • If NGen is not impacted, what happens when RyuJIT is enabled for the entire machine and the user runs an NGen-ed program?  Is the NGen-ed image ignored, or is RyuJIT bypassed?

    A micro-benchmark question: how much is gained for the startup time of PowerShell ISE?

    Do you have advice on optimizing code for RyuJIT (as in: code constructs that should be avoided)?

  • @Lionel: If NGen is not impacted...: RyuJIT is bypassed completely for NGen. The "AltJit" mechanism is only observed when the JIT is loaded in a normal context. When the JIT is loaded for NGen, the setting is ignored and the default JIT (JIT64 for a 64-bit .NET runtime) is used.

    I haven't looked at the PowerShell ISE, but it would be easy enough to measure: set the COMPLUS_AltJit variable to * and launch it, then launch it normally and compare. You can check whether RyuJIT is in use by looking at the loaded modules to see if 'protojit.dll' is in the process.

    As for advice on optimizing for RyuJIT, it's still in flux. We're not done with optimizations, so any advice would be premature. I'd much rather have that question flipped around: what code constructs would you like us to optimize for?

  • Is there any chance we'll see improved performance for rectangular arrays? I am quite annoyed that they are actually slower than jagged arrays, although they should be significantly faster. I know that this is low priority and unimportant, but somehow it bugs me, even though I have never had to use rectangular arrays in a high-performance scenario.

  • "I'd much rather have that question flipped around: what code constructs would you like us to optimize for?"

    Well... the list example I posted a while ago still generates horrible code :)

           static int Sum(List<int> list) {
               int sum = 0;
               foreach (int x in list)
                   sum += x;
               return sum;
           }

  • I sure would like to see some SIMD options soon.

  • - It looks like plain safe code is optimized much better than unsafe code. Any plans to address that?

    - When I wrapped a volatile variable in a struct, everything was perfectly inlined, and even though the JIT generated more instructions (instead of just a single MOV), perf was better. I don't know why; maybe the CPU had less work to do translating the MOV into micro-ops, or maybe it has to do with the chunks in which it reads from the instruction cache.

    Do you take such effects on modern processors into account?

    - It would be nice if the following comparison were optimized to just compare the type handle values in the objects' headers:

    bool sameType = obj1.GetType() == obj2.GetType();

    - Getting a RuntimeTypeHandle's value could be optimized to just reading the type handle:

    IntPtr typeHandleValue = obj.GetType().TypeHandle.Value;

    That's it, for now :)

  • It would be nice to be able to tell the JIT to prioritize code quality over throughput. We don't always care that much about startup time, but we do care very much about performance once the application is up.

  • Is RyuJIT going to replace the current JIT in the next framework version?

  • How can the environment variable COMPLUS_AltJit=* affect only a single application? Do I replace * with the application name / path?

  • @Wolfgang: Set the environment variable only in the context you run the application from, not for the whole system. So skip Control Panel and use cmd.

  • @Stilgar: We've made some headway here, but I'm not sure if it's quite there yet. I'll poke around a bit more and post back.

  • @Mike Danes: I know. And any excuse I might try and make will just sound whiny, so I'll just leave it at that :-)

  • @LKeene: Continue to vote it up on UserVoice!

  • @OmariO:

    1- Unsafe code just isn't that high a priority for us right now. That said, it shouldn't be terrible: we probably won't do much worse than JIT64. If you've got some code that we're not doing too well on, send it along!

    2- To be honest, modeling what 'modern processors' do is pretty difficult, because they all behave quite differently. At a high level, we tend to favor smaller code over "faster" code because, in the real world, if smaller code lets your hot path fit in one fewer page, most of the cycles saved by a larger code sequence get eaten by the page fault.

    3 & 4- I'll poke around in what we generate today and post back.

  • @André Slupik: We've gotten the RyuJIT compiler to a point where we actually have some optimizations that we can "turn up to 11". Knowing when & where to do so is the hard part (see previous comment about code size), but we're definitely looking in this direction.
