Jason Zander is Corporate Vice President of Development for the Windows Azure team at Microsoft.
I thought it would be useful to provide a primer on the NGen tool and pre-jitting your code for performance reasons. In particular, there are some gotchas you must be aware of when authoring your product. In this entry, I'm going to cover some background material on paging (which you can skip if you are an expert already). Then we'll cover the workings of the NGen tool, some servicing implications, and finally some future directions.
Before we get started, let me keep up a Microsoft tradition and include the key takeaways right here. If you get nothing else out of this topic or can't read the whole thing, make sure you absorb the following:
Always measure to make sure it is a win for your application
Make sure your application is well behaved in the face of brittleness and servicing
Bottom line recommendation: keep your eye on the technology, experiment with it, but plan to wait for a future version before really pulling it into your application.
Windows uses a virtual address space on your machine, so for a 32-bit system you get from 0 to 4GB of addressable memory for each process. Windows code is typically compiled into a Portable Executable file (PE file), which contains sections of code and data marked with page attributes like read, write, and execute. When the OS loads such a file into a process, it maps the memory from your file into physical pages that can be addressed by the process. So far so good?
On the x86, calls to methods are typically in the form of "call address", where address is an absolute value from 0 to 4GB, and tells the CPU the precise location it should transfer to. This poses a problem for the compiler, because it means that when the user's file is loaded, it needs to know precisely where all of the methods it will call inside that file live (not just relative to the start of the file, but the absolute address in the entire process). There are two things that kick in here to aid you:
Base Address
This is the address you specify as a developer, either through your compiler (e.g. /baseaddress in VB.Net or C#) or using the rebase tool, where you want your executable to be loaded. The compiler will now assume the file will get loaded there, and can now predict the absolute address of every method in the file.
Relocs
Just in case your file can't be loaded at that base address, say if another module is already loaded there, the compiler will emit a set of relocs in the file that tell the OS where absolute addresses are located in the image. If the file gets relocated to a new place in the process, the OS will fix up the addresses -- essentially adjust them to the new home of the code or data. This allows flexibility, but is also expensive; keep reading to find out why.
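To make the fix-up step concrete, here is a minimal sketch of what applying relocs amounts to. It is a toy model, not the actual Windows loader: the function name, the flat list of offsets, and the addresses are all made up for illustration, and the real PE reloc table is packed differently.

```python
import struct

def apply_relocs(image, reloc_offsets, preferred_base, actual_base):
    """Adjust every 32-bit absolute address listed in reloc_offsets.

    image is a bytearray holding the file contents; reloc_offsets is a
    hypothetical flat list of byte offsets where absolute addresses live.
    """
    delta = actual_base - preferred_base
    for off in reloc_offsets:
        (addr,) = struct.unpack_from("<I", image, off)
        # Shift the address by the distance the image moved.
        struct.pack_into("<I", image, off, (addr + delta) & 0xFFFFFFFF)
    return image

# Toy image: two absolute addresses computed for a base of 0x10000000.
image = bytearray(struct.pack("<II", 0x10001000, 0x10002040))

# The preferred base was taken, so the image landed at 0x20000000 instead.
apply_relocs(image, [0, 4], preferred_base=0x10000000, actual_base=0x20000000)
patched = struct.unpack("<II", image)
```

Notice that the fix-up has to write into the image, which is exactly what makes it expensive for pages that would otherwise be shared.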
Besides allowing the compiler to stitch together your program, a base address gives you a predictable location for your file to get loaded every time it is executed. This is important, because if you have sections of the file (say all of your executable code) that are read only, then we'd like to be as efficient as possible on the machine and share those pages between processes. The OS accomplishes this if your pages are marked read only and sharable. So if you have the code for strcpy from msvcrt.dll at location 0x70124800, then the one page of physical memory where that code lives can be viewed in all of the processes on the machine that also need it, provided those processes have loaded msvcrt.dll at the same address.
See the advantage? Overall system memory pressure goes down with shared pages because only one physical page is used no matter how many times you load it. Also, speed of loading code goes up, because chances are the system already has that file loaded in some other process on the machine. This is typically referred to as a "warm startup", because the OS has already loaded many of the pages you need, and doesn't have to go out to disk to get them. So bottom line, sharing of pages between processes is a GOOD THING.
I mentioned that having to relocate a file away from its base address is a BAD THING for your shareable data. This loss of sharing is the reason. If you cannot load at your preferred base address, then the addresses in those otherwise sharable pages are now wrong. So the OS has to make a copy of the page for your process, mark it writable, and then fix up all of the invalid values. This is bad because it takes both more time (slower load times) and more space (for the extra unshared pages).
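You can estimate the space cost with a bit of arithmetic: every page that contains at least one fix-up becomes a private copy. The sketch below assumes 4 KB pages (the x86 page size) and a hypothetical list of fix-up offsets; it is a back-of-the-envelope model, not how the OS tracks pages.

```python
PAGE_SIZE = 4096  # x86 page size in bytes

def pages_needing_private_copy(reloc_offsets, loaded_at_preferred_base):
    """Return the set of page indexes that can no longer be shared."""
    if loaded_at_preferred_base:
        return set()  # nothing to fix up, every page stays shared
    pages = set()
    for off in reloc_offsets:
        pages.add(off // PAGE_SIZE)
        # A 32-bit fix-up that starts near the end of a page spills
        # into the next one, so both pages get copied.
        pages.add((off + 3) // PAGE_SIZE)
    return pages

offsets = [0x10, 0x20, 0x1004, 0x2FFE]  # made-up fix-up locations
private = pages_needing_private_copy(offsets, loaded_at_preferred_base=False)
extra_bytes = len(private) * PAGE_SIZE  # per process that had to relocate
```

Here four small fix-ups cost four private pages in each relocated process, while the same image loaded at its preferred base costs none.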
I should point out that some pages are, of course, intended to be per-process. Your global data for example wouldn't make much sense if you were sharing it with another running instance of your application! But in general we try very hard to reduce the number of pages in the system because of the high cost of the extra memory pressure.
Ok, all of this background is interesting, but what does this have to do with Managed code and the CLR? First, we also use the PE file format for managed code, so your VB.Net application will be stored in the same file format as kernel32.dll. This allows managed executables to appear anywhere you would normally expect. For example if you want to do a CoCreateInstance on your managed code, or do a LoadLibrary directly, you can do so. This file format choice means we have to follow the same rules for assigning base addresses. And guess what? We made the metadata and IL your compiler generates read only + sharable so we could use the same memory management benefits you get with unmanaged code.
Now think about what the JIT compiler does for a minute. It just-in-time compiles your program one method at a time. That means we allocate, on the fly, some memory and write the necessary native code for your program out to that location. When we need to call a method, we know where we put it in the absolute address range, so we can do the same "call address" you saw for unmanaged code. The advantage of the JIT is that it can literally stitch your program together as you go, and it only compiles the code that you actually execute. But since this is happening on the fly, all of those pages where this code is allocated are for that process only. We get none of the sharing advantages you got with unmanaged code in read only + sharable pages, and it also takes time to run that compiler. We did some experiments early on in the Runtime as proof of concept for our managed C++ compiler which included recompiling Word as an IL image. It worked great! But it was slow. Office is a big application, and using the JIT for this case didn't put our best foot forward.
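If it helps, here is a toy model of that compile-on-first-call behavior. It is only a sketch: ToyJit and its members are hypothetical names, and Python's eval stands in for the real IL-to-native compiler.

```python
class ToyJit:
    """Compiles a method the first time it is called, then reuses it."""

    def __init__(self, il_methods):
        self.il = il_methods      # method name -> "IL" (here, source text)
        self.native = {}          # compiled code, private to this process
        self.compile_count = 0

    def _compile(self, name):
        self.compile_count += 1
        return eval(self.il[name])  # stand-in for generating native code

    def call(self, name, *args):
        if name not in self.native:
            self.native[name] = self._compile(name)
        return self.native[name](*args)

jit = ToyJit({
    "add": "lambda a, b: a + b",
    "mul": "lambda a, b: a * b",
    "never_called": "lambda: 0",
})
first = jit.call("add", 1, 2)
second = jit.call("add", 3, 4)  # reuses the code compiled above
```

Only the methods you actually call get compiled, but the native dictionary lives inside one process. A second process running the same program would have to build its own copy from scratch.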
Wouldn't it be great if you could get the same page sharing advantage as unmanaged code, and not have to run the JIT every time for a big application like Office? That's the NGen tool, and we'll drill into that in the next section.
NGen stands for "Native Image Generator". The tool allows us to run the JIT compiler on all of your IL in an assembly (a PE file) at one sitting, and cache the results out to disk. Now when you want to load and run that assembly, we can find it in the cache and load it just like an unmanaged image. Because the code is read only + sharable, you get the same benefits of page sharing.
So what precisely is in that image that gets created? Let's look at the contents:
Header
All PE files contain the standard set of headers, and an NGen image is no different.
Even with some of the trade offs mentioned here, we've seen some remarkable performance wins from this technique (and it only gets better each new release). There are, however, some considerations you need to make before you jump on board the NGen bandwagon. We'll cover those now.
Performance Win?
Measure, measure, measure. You should always verify that this is a win for you. First, you should be writing either a shared library (like the BCL itself) or a client application that would really benefit from this kind of win. You must go try your app with and without to make sure it is worth the effort. It may not always be. For example in a Server scenario, where the application runs a long time, you can amortize the cost of jitting over the run of your server. Combine that with lack of sharing across AppDomains and NGen isn't a win.
To Cache or Not to Cache
Generating an NGen image takes time. You will be compiling all of your code at once into the final binary. The larger the file, the longer this takes. We do this for the .NET Framework during installation, and you can see the pause. In our case it makes sense: all of your applications will run that much faster because of this. You should decide if your application can handle this kind of wait. You may not want to do this for dynamic web content in a browser for example. Who wants to wait for the compile to finish for it to come up once? And will you ever run the same program as is again?
Brittleness
The MSDN documentation gives you the command line arguments and usage of the tool (which comes with the distribution). You should read very carefully through the section on brittleness. As an example, the ngen'd image is tightly coupled to the version of the Framework you compiled against. If that version is serviced (we ship a Service Pack for example), then your image will not be loaded, and your application will automatically fall back to jitting. To be clear: your code will still run, but it will not take advantage of the speed improvements you measured. This is something we are spending a lot of time addressing in the next version of the product.
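The fallback behavior can be pictured roughly as follows. This is a simplified sketch, not the CLR's actual binding logic: the NativeImage record, the cache dictionary, and the version strings are hypothetical, and the real check involves more than one version field.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NativeImage:
    assembly: str
    framework_version: str  # Framework version the image was compiled against

def choose_code_source(assembly, runtime_version, cache):
    """Use the cached native image only if it matches the running Framework."""
    image = cache.get(assembly)
    if image is not None and image.framework_version == runtime_version:
        return "native image"
    return "jit"  # the code still runs, just without the NGen speedup

# Hypothetical cache contents and version labels.
cache = {"MyApp": NativeImage("MyApp", framework_version="1.1-rtm")}

before_sp = choose_code_source("MyApp", "1.1-rtm", cache)
after_sp = choose_code_source("MyApp", "1.1-sp1", cache)
```

Nothing fails loudly after the service pack is installed, which is why you need your own way to notice that the images have gone stale.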
So you've decided to ngen your image. Now what? This section contains some steps you should be taking:
Picking a Base Address
Pick a good set of base addresses for your PE files. The NGen'd image will get placed right behind your IL image in the process. You need to allocate enough space between your IL PE files for this image to be loaded. The general guideline is to allocate at least 3x the original size of the IL image (so for example if your IL assembly is 1 MB, you should allocate a 3 MB total range for that assembly plus its ngen'd image). You should take a look at the size of your NGen'd images and verify you have enough space, not only for what you ship, but for some reasonable amount of growth if you ship a bug fix release of the file.
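As a rough illustration of the 3x guideline, the sketch below lays out base addresses for a set of assemblies. The assembly names, sizes, and starting address are made up, and the 64 KB rounding reflects the alignment Windows requires of image base addresses. Treat the output as a starting point and check it against the measured size of your real NGen'd images.

```python
ALIGN = 0x10000  # image base addresses must be multiples of 64 KB

def round_up(value, align=ALIGN):
    return (value + align - 1) // align * align

def plan_base_addresses(il_sizes, start, factor=3):
    """Give each assembly a range of factor * IL size for IL plus native image."""
    plan = {}
    next_base = round_up(start)
    for name, il_size in il_sizes.items():
        plan[name] = next_base
        next_base = round_up(next_base + factor * il_size)
    return plan

# Hypothetical assemblies: a 1 MB library and a 300 KB one.
sizes = {"Core.dll": 1024 * 1024, "UI.dll": 300 * 1024}
plan = plan_base_addresses(sizes, start=0x60000000)
for name, base in plan.items():
    print(f"{name}: 0x{base:08X}")
```

The 1 MB assembly reserves a 3 MB range, so the next assembly starts 3 MB later. Raise the factor if you want extra room for bug fix releases.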
When to NGen?
You need to pick when you want to invoke the tool. For the distribution, we invoke ngen as a final step during setup. This is the best approach in most cases, because your application will start fast from the first time it is run. However, this will consume space on the user's machine, so if you think a particular application or component that you ship may not be run often (or at all), then you might consider deferring ngen until after the application starts for the first time. For example, you could schedule a Windows scheduled task to compile it at night after the first time the code is run.
Servicing
When you release bug fixes to customers in your managed code, you will need to regenerate the ngen'd images as well. This is pretty simple to do: just run the ngen command again. But you need to make sure it is covered by the setup/patching feature you are shipping.
Uninstall
Remember to use ngen /delete to remove your unneeded assemblies from the cache when you uninstall your application. Currently the CLR will remove all assemblies tied to a version of the framework on uninstall of the .NET FX, but it doesn't try to figure out when you've uninstalled just your application.
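One way to keep install, servicing, and uninstall consistent is to generate the ngen command lines from a single list of assemblies. The sketch below only builds the argument lists and does not run anything. The "ngen /delete" form comes from the text above; the bare "ngen <assembly>" form for generating an image is my assumption about the V1.x command line, so check it against the MSDN documentation. The assembly name and path are hypothetical.

```python
def ngen_generate_args(assembly_path):
    # Assumed V1.x form: "ngen <assembly>" compiles and caches the image.
    return ["ngen", assembly_path]

def ngen_delete_args(assembly_name):
    # "ngen /delete" removes the cached image for the named assembly.
    return ["ngen", "/delete", assembly_name]

def commands_for(event, assemblies):
    """Map a setup event to the ngen commands it should run."""
    if event in ("install", "patch"):
        # A patch re-runs ngen so the cached image matches the new IL.
        return [ngen_generate_args(path) for path in assemblies.values()]
    if event == "uninstall":
        return [ngen_delete_args(name) for name in assemblies]
    raise ValueError(f"unknown setup event: {event}")

# Hypothetical product: assembly name -> installed path.
assemblies = {"MyApp": r"C:\Program Files\MyApp\MyApp.exe"}

install_cmds = commands_for("install", assemblies)
uninstall_cmds = commands_for("uninstall", assemblies)
```

Your setup program would hand each list to whatever it uses to launch processes. Driving all three events from the same table makes it harder to forget the uninstall step.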
As mentioned above, there are brittleness issues with ngen in V1.0 and V1.1 (aka Everett). So you need to plan out what you will do in the face of those things changing. As an example, we will release a service pack of the CLR at some point, and your cached ngen images will no longer load. Your code will still work, but it will run under the jitter which will be slower (you did measure to verify you needed ngen, right?).
Right now fixing this is tricky. Expect us to improve this situation in the future, but for now, here are some ideas on how you can address this:
Setup/Patching
Make sure your setup and patching programs are doing the right thing. If you ship a fixed version of your IL code, you need to re-run ngen on those files for it to be up to date.
At this point you've probably looked through the list of Servicing Hints and thought to yourself: "Wow that's kinda ugly!" And you're right. NGen for Version 1.0 and 1.1 was primarily designed and engineered for internal use by the CLR itself. When we install service packs of our stuff, we force a re-ngen of all of the core components, which keeps that part of your app running fast.
Going forward, NGen is still a key foundation for our performance story. It gives you the working set wins (better page sharing, quicker loading) that are required for starting your application faster. It also allows for more aggressive optimizations in the compiler. If we tried doing really aggressive optimizations every time you ran the JIT, you'd actually run slower just waiting for the compiler to finish.
Expect in the future that we will be addressing the clumsiness and the servicing issues so your life is easier. Here are just a few things we're thinking about:
ngen /repair
We'll be talking about a feature called "ngen /repair" at the October 2003 PDC in LA next month, which dramatically simplifies fixing up the cached images.
And finally in closing, make sure to re-read those key takeaways.
There are some important links you may be interested in reading:
MSDN Documentation
Gregor's Perf Talk
Jan's Perf Talk
Rico's Perf Talk