An easy solution for improving app launch performance

Over the last ten years of building the .NET runtime, quite a number of assumptions have changed. Early on we could assume that most computer users had only one processor. Today, the assumption is that you have at least two. While adding parallelism to an app to improve performance is a challenge for most developers, what if that parallelism came for free? That's exactly what we've done with our newest CLR performance feature. Today, Dan Taylor, a program manager on the CLR performance team, shares how Multicore JIT can make your app start faster. The best part -- you just have to include two lines of code to try it out. Super easy! --Brandon

In this post, I will provide an in-depth review of how the Multicore JIT technology works, and then show you how easy it is to use in your .NET Framework 4.5 apps.

App launch is faster with Multicore JIT

On the .NET Framework performance team, we spend a lot of time looking at the launch performance of managed applications. Large managed applications require JIT (just-in-time) compilation at launch time, so improving launch performance can be challenging. .NET Framework developers have been able to use Ngen.exe (Native Image Generator) to move code generation from application startup time to installation time. However, for the most part, this pre-compilation option is available only for large .NET Framework applications that also happen to have an installer.

As developers continue to take advantage of the great productivity benefits of the .NET Framework, they are using managed code in places where there is no installer and where Ngen is not available. To address the needs of these developers and to round out our portfolio of performance technologies in the .NET Framework 4.5, we have introduced Multicore JIT, which uses parallelization to reduce the JIT compilation time during application startup.

With Multicore JIT, methods are compiled on two cores in parallel. The more code you execute on your startup path, the more effective Multicore JIT will be at reducing startup time. Improvements of 20%-50% are typical, which is great news for anyone developing medium to large .NET Framework applications that cannot take advantage of Ngen. You can improve the startup time of your application by up to 50% with very little work, even if it runs off a USB stick.

Real-world benefit of Multicore JIT

Let’s take a look at how this works in practice with a few real-world applications. Bing.com recently moved to Windows Server 2012 and the .NET Framework 4.5. Because of Multicore JIT, their ASP.NET-based services now start up 50% faster, going from an average of around 155 seconds to just under 80 seconds. You can read more about the Bing.com results with Multicore JIT in their recent blog post.
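
As a quick sanity check on the Bing.com figures, here is the arithmetic behind "50% faster", using the reported averages and treating "just under 80 seconds" as 80 seconds:

```latex
\text{reduction} = \frac{155\,\text{s} - 80\,\text{s}}{155\,\text{s}} = \frac{75}{155} \approx 0.484 \approx 48\%,
\qquad
\text{speedup} = \frac{155\,\text{s}}{80\,\text{s}} \approx 1.94\times
```

In other words, startup time is roughly halved. Because the new time is slightly under 80 seconds, the true reduction is slightly above 48%.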

Multicore JIT can also yield significant improvements to desktop Windows Presentation Foundation (WPF) applications. The graph below shows startup times with and without Multicore JIT for three desktop WPF applications. In terms of code executed on startup, these applications are small to medium-sized, and are certainly much smaller than Bing.com’s ASP.NET applications.

A comparison of startup time with and without Multicore JIT

Even though these applications are small to medium-sized, the startup improvements from Multicore JIT range from 16% to 35%. In the case of Windows Performance Analyzer, the startup path included loading a performance trace and displaying a number of graphs. This shows that even with a large amount of non-JIT related work, Multicore JIT can result in a big improvement in the overall startup time of the application.

Let’s take a look at the CPU characteristics of Multicore JIT by looking at Paint.NET. This application normally uses Ngen, but for this analysis the native images were removed and the application’s source was modified to enable Multicore JIT. The following graph compares CPU usage during the launch of Paint.NET with and without Multicore JIT.

Paint.NET startup improvements from Multicore JIT

In the Multicore JIT scenario, two CPUs were used to JIT-compile methods instead of one, so the application reached the end of its startup execution more quickly. An 8-core computer was used, so 12.5% on the % CPU axis is one CPU at full utilization, and 25% is two CPUs at full utilization.

Take a look at the following Channel 9 interview with a developer from the Windows team who used Multicore JIT in the apps we looked at above.

How Multicore JIT works

Multicore JIT uses two modes of operation: recording mode and playback mode. During recording mode, the JIT compiler records every method it is asked to compile. Once the CLR determines that startup is complete, it saves to disk a profile of all the methods that were executed.

Multicore JIT recording mode

When Multicore JIT is enabled, recording mode is used the first time your application is launched. For subsequent launches, playback mode is used. Playback mode loads the profile from disk and uses the information to compile methods in the background, before they are needed by the main thread.

 

Multicore JIT playback mode

As a result, the main thread doesn't need to do as much compilation, and your application launches faster. The recording and playback features are turned on only for multicore machines, since single-core machines do not benefit from parallelization.
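
To make the two modes concrete, here is a user-level toy model written as a modern C# top-level program. It is not the CLR's implementation: the "profile" is a hypothetical text file of method names, LoadSettings, BuildMainWindow and OpenRecentFiles are placeholder startup methods, and RuntimeHelpers.PrepareMethod stands in for the background JIT thread by forcing recorded methods to be compiled ahead of use. Like the behavior reported in the comments, the model rewrites the profile on every launch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical startup methods standing in for an app's startup path.
static int LoadSettings() => 1;
static int BuildMainWindow() => 2;
static int OpenRecentFiles() => 3;

var startupMethods = new List<Func<int>> { LoadSettings, BuildMainWindow, OpenRecentFiles };

// Simulates one launch and returns the mode that was used.
string Launch(string profilePath, int processorCount)
{
    // Mirrors the rule that single-core machines get no recording or playback.
    if (processorCount < 2) return "disabled";

    bool hasProfile = File.Exists(profilePath);
    Task background = Task.CompletedTask;

    if (hasProfile)
    {
        // Playback: JIT-compile the recorded methods on a background thread,
        // ideally before the main thread reaches them.
        var recorded = new HashSet<string>(File.ReadAllLines(profilePath));
        background = Task.Run(() =>
        {
            foreach (var m in startupMethods.Select(f => f.Method).Where(mi => recorded.Contains(mi.Name)))
                RuntimeHelpers.PrepareMethod(m.MethodHandle); // forces JIT compilation
        });
    }

    // Main thread: run the startup path and record every method it needs.
    var seen = new List<string>();
    foreach (var step in startupMethods)
    {
        seen.Add(step.Method.Name);
        step();
    }

    background.Wait();
    File.WriteAllLines(profilePath, seen); // profile for the next launch
    return hasProfile ? "playback" : "recording";
}

string demoProfile = Path.Combine(Path.GetTempPath(), "toy-startup.profile");
Console.WriteLine(Launch(demoProfile, Environment.ProcessorCount));
```

On a multicore machine with no profile present, the first run prints "recording" and later runs print "playback". The main thread never waits for the background task before calling a method, which matches the idea that playback is a head start rather than a prerequisite.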

Using Multicore JIT

We've made it simple for you to use Multicore JIT from your application.

In a .NET Framework desktop application, all you need to do is use the System.Runtime.ProfileOptimization class to start profiling at the entry point of your application—the rest happens automatically. The following code shows how you can enable Multicore JIT by inserting two method calls in your application constructor:

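using System.Runtime; // ProfileOptimization lives here

public App()
{
    // Tell the runtime where to store JIT profiles.
    ProfileOptimization.SetProfileRoot(@"C:\MyAppFolder");
    // Enable Multicore JIT: records on the first launch, plays back afterwards.
    ProfileOptimization.StartProfile("Startup.Profile");
}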

Starting Multicore JIT in an app constructor

The SetProfileRoot call tells the runtime where to store JIT profiles, and the StartProfile call enables Multicore JIT by using the provided profile name. The first time your application is launched, the profile does not exist, so Multicore JIT operates in recording mode and writes out a profile to the specified location. The second time your application launches, the CLR loads the profile from the previous launch, and Multicore JIT operates in playback mode.
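
The hard-coded C:\MyAppFolder above is only an example. As one of the replies in the comments suggests, a folder inside the user profile avoids UAC problems when the app is installed under Program Files. The sketch below uses the same two calls at the top of a console app's entry point, written as a C# top-level program. The MyApp\JitProfiles folder name is hypothetical, and the folder is created up front on the assumption that the runtime will not create it for you.

```csharp
using System;
using System.IO;
using System.Runtime;

// Per-user folder for JIT profiles ("MyApp" and "JitProfiles" are hypothetical names).
string profileRoot = Path.Combine(
    Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
    "MyApp", "JitProfiles");

// Create the folder first; we assume SetProfileRoot does not create it.
Directory.CreateDirectory(profileRoot);

ProfileOptimization.SetProfileRoot(profileRoot);
ProfileOptimization.StartProfile("Startup.Profile");

// ... the rest of the application's startup code runs here ...
Console.WriteLine($"JIT profile folder: {profileRoot}");
```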

If your application has a multi-stage startup, you can call StartProfile at any point in your application to take advantage of parallel compilation. For example, after your initial startup sequence, you might display a menu that enables the user to navigate into different parts of your app, with each navigation loading new code paths and causing more JIT compilation. In this case, you could use one JIT profile for the main menu and another profile for the various items in the main menu.
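
Here is a sketch of that multi-stage pattern as a C# top-level program: one profile covers startup up to the main menu, and StartProfile is called again when the user opens a menu item. The folder, the profile names, and the helper functions ShowMainMenu and OpenMenuItem are hypothetical, and using one profile per menu item (rather than one shared profile for all items) is an assumption.

```csharp
using System;
using System.IO;
using System.Runtime;

// Hypothetical profile folder; created up front in case it does not exist.
string profileRoot = Path.Combine(Path.GetTempPath(), "MyApp.JitProfiles");
Directory.CreateDirectory(profileRoot);
ProfileOptimization.SetProfileRoot(profileRoot);

// Stage 1: initial startup up to the main menu.
ProfileOptimization.StartProfile("MainMenu.Profile");
ShowMainMenu();

// Stage 2: the user picks an item, which loads a new code path.
OpenMenuItem("Reports");

void ShowMainMenu() => Console.WriteLine("Main menu ready");

void OpenMenuItem(string item)
{
    // A separate profile lets this code path be recorded and played back on its own.
    ProfileOptimization.StartProfile(item + ".Profile");
    Console.WriteLine($"Opened {item}");
}
```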

When we developed Multicore JIT, we took into consideration that ASP.NET applications run in a hosted environment, so we turned on Multicore JIT for these applications automatically. If you're running ASP.NET 4.5, you don't have to do any extra work to turn on Multicore JIT. To make your ASP.NET application start up faster, simply upgrade your server to ASP.NET 4.5.

If you want to turn Multicore JIT off in your ASP.NET 4.5 applications, set the new profileGuidedOptimizations attribute in the web.config file as follows:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <!-- ... -->
  <system.web>
    <compilation profileGuidedOptimizations="None" />
    <!-- ... -->
  </system.web>
</configuration>

XML code for turning off Multicore JIT in ASP.NET applications

Wrapping up

Multicore JIT is an easy-to-use performance feature for applications that do not use Ngen. You can use this feature to speed up your application launch time by up to 50% with very little work. If you are developing an ASP.NET application, you will see automatic benefits by moving to ASP.NET 4.5. If you are developing a desktop application, you can turn Multicore JIT on with only a few lines of code. If you’d like to know more about Multicore JIT, you can take a look at our in-depth Channel 9 interview above or see the System.Runtime.ProfileOptimization class topic in the MSDN Library.

We provide performance improvements with each version of the .NET Framework, so you should always try out your applications on the latest version. To read about some of the other improvements we have made in the latest version, be sure to check out Ashwin Kamath’s article Overview of Performance Improvements in .NET 4.5.

Follow us or talk to us on Twitter -- http://twitter.com/dotnet.

Comments
  • What is the effect of UAC when the application is installed in a subdirectory of "Program Files"?

  • Might Multicore JIT eventually become the default?

  • @DM, the profile directory where you store the data should be something that does not interfere with UAC. Using the Application Data store within the user profile is a good idea. Otherwise, there should not be an issue with UAC.

    @Anonymous Coward, we'll consider making something like this a default based on future learning. It is already the default for ASP.NET applications and Silverlight 5 apps. We'd love to hear how Multicore JIT improves your app.

  • Is it possible to enable Multicore JIT for a legacy MFC application that has significant .NET components embedded within it? The entry point isn't managed, so I'm not sure how one could go about enabling profiling. Perhaps call into a managed component from the native entry point to enable profiling?

  • What makes this happen automatically in ASP.NET? Just release an app that is targeting 4.5? Does the server need to be IIS8?

  • @Anonymous Coward, if you want to know why it is not currently the default I have posted an answer on StackOverflow: stackoverflow.com/.../why-is-multicore-jit-not-on-by-default-in-net-4-5

    @Greg D, if you load a managed component that targets .NET 4.5, you should be able to turn on profiling for any .dlls using the same instance of the CLR. Since .NET 4.5 is an in-place update, this means it's possible to turn on Multicore JIT for .NET 4.0 assemblies as well. We may not have tested your exact scenario, however, so your mileage may vary.

    @Chris Maristic, simply installing .NET 4.5 should turn on Multicore JIT for ASP.NET 4.0+ applications automatically. When you install .NET 4.5 the ASP.NET runtime will be updated in-place and will call ProfileOptimization on your behalf.

  • If an app already uses NGEN but jitting still occurs (due to generics etc.), then will this optimization help, or will the use of NGEN for some images effectively disable it?

    thanks!

    don

  • @dsyme, you can use Multicore JIT with NGEN to reduce the JIT time left over from generics, etc., but this "leftover" JIT time is usually not significant, so I did not discuss this combination in the post.

  • Brandon thank you for taking the time to respond to all of the questions on here. Microsoft making the commitment to users of their technology to be involved in the community like this is one of the strongest reasons to stay with the Microsoft platform. A few years ago this was entirely the opposite and Microsoft has made dramatic improvements in regards to the community.

  • What about Auto-Parallelization and Auto-Vectorization (available in VC++ 2012)?

    msdn.microsoft.com/.../hh872235.aspx

  • @John, we do not auto-vectorize or auto-parallelize, but we have provided the Task Parallel Library (TPL) to make it easy for .NET developers to parallelize code. See: msdn.microsoft.com/.../dd460717.aspx

  • This sounds great. Thanks for this post.

    Your post says that the app runs in recording mode the first time and in playback mode on every subsequent launch. I have seen that it always runs in recording mode and updates the profile every time (see the modified date/time of the profile file). The good thing about this behavior is that I don’t need to worry about updating the profile when a new app version is deployed.

    My question: because recording mode runs every time, how does this affect performance?

  • @jbe, recording mode uses a very lightweight token format, so the runtime and file I/O costs are negligible (the file size is usually 10-20 KB). As you point out, the benefit is that the profile always self-corrects: if there is a major change in the application's behavior, Multicore JIT adapts automatically.

  • I can't use it for my WPF application. No file is created after StartProfile.

    I am using Windows 7 Pro 64-bit and VS 2012 RC. The CPU is a 4-core Xeon.

    Any ideas?

  • interesting.

    thanks for posting.
