A first hand look from the .NET engineering teams
This post was authored by Xy Ziemba, Program Manager on the .NET Native team.
At BUILD, we announced the .NET Native Preview. .NET Native is a compilation technology and a small runtime that allow .NET applications to start up to 60% faster and have a smaller memory footprint. We've previously discussed at a high level how .NET Native provides these performance benefits. Here, we'll talk about how the .NET Native tool-chain works.
.NET Native ships as a single SDK that lets you easily convert a .NET application from Microsoft Intermediate Language (MSIL) to native code. While we've made this experience just a few clicks, there are a number of steps involved in converting MSIL to native code. We'll go through some of those steps and give an overview of how .NET Native converts your app into native code.
There are seven major steps in building a .NET Native application:
Step 1 is how applications are built today. .NET Native adds steps 2-7, and the process is automated by the IL Compiler (ILC). These subsequent steps work on the MSIL portion of an application.
Before we start, please note that .NET Native is under active development. This means that many of the details are changing, and some of these major steps might change too! Let's look at each of these steps in detail.
All .NET applications start as source code, including .NET Native apps. The source code is compiled to MSIL binaries (EXEs and/or DLLs) using a language compiler, such as the C# compiler. There are a few other tools that are used in a typical build process, such as packaging an app into an APPX package. In the case of existing Windows Store apps, an APPX package containing MSIL binaries is the final app artifact that can be uploaded to the Windows Store.
The .NET Native tool-chain inserts multiple additional steps in between source code compilation and packaging. Those additional steps are described below.
The .NET Native tool-chain starts with the IL Compiler, or ILC. ILC begins by pre-generating additional code needed to make the application run. On other .NET platforms, this code is generated as needed at runtime. ILC starts with two tools that generate code for marshaling and serialization.
The first of these tools is the Marshaling Code Generator, or MCG. Like its name implies, MCG generates marshaling code for native code interoperability scenarios such as Windows Runtime calls, P/Invoke calls, and COM interop.
MCG scans the entire application looking for any call to native code and any possible entry point for native code to call managed code. For each callee, three things happen:
All the code for this is generated as C# in appname.interop.g.cs. This is compiled into appname.WinMDInterop.dll and is added to the application.
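The idea behind pre-generating marshaling stubs can be sketched as follows. This is a Python illustration of the concept only, not MCG itself (which emits C#); the function names and the stand-in "native" callee are invented for the example:

```python
# Illustrative sketch (not MCG itself): pre-generating a marshaling stub
# for a native callee at build time rather than figuring out the
# conversions reflectively at runtime.

def native_strlen(buf: bytes) -> int:        # stand-in for a native export
    return len(buf)

def generate_marshaling_stub(native_fn):
    """'Build-time' step: emit a wrapper that converts managed-style
    arguments (str) into native-style ones (bytes) and back."""
    def stub(s: str) -> int:
        buf = s.encode("utf-8")              # marshal managed -> native
        result = native_fn(buf)              # the actual native call
        return result                        # marshal native -> managed (trivial here)
    return stub

# The stub is generated once at 'build time'; at runtime we just call it.
strlen = generate_marshaling_stub(native_strlen)
print(strlen("hello"))                       # -> 5
```

The point is that all signature analysis happens before the app ships; the runtime only ever executes straight-line conversion code.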
Second is the Serialization Generator, or SG. This tool generates code to assist .NET serializers such as DataContractSerializer or DataContractJsonSerializer. It scans the application to identify the types of objects an app will serialize. SG analyzes these types and produces the serialization and deserialization functions used at runtime.
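SG's approach can be sketched in the same spirit. Again this is a Python illustration of the concept, not SG's actual output; the type and field names are invented:

```python
# Illustrative sketch (not SG itself): generating a serializer for a
# known type ahead of time, so no field discovery happens at runtime.

def generate_serializer(type_name, field_names):
    """'Build-time' step: bake the analyzed field list into a closure."""
    def serialize(obj) -> dict:
        data = {"__type": type_name}
        for name in field_names:             # field set fixed at build time
            data[name] = getattr(obj, name)
        return data
    return serialize

class Person:
    def __init__(self, name, age):
        self.name, self.age = name, age

# SG-style analysis would discover that Person gets serialized and emit:
serialize_person = generate_serializer("Person", ["name", "age"])
print(serialize_person(Person("Ada", 36)))
# -> {'__type': 'Person', 'name': 'Ada', 'age': 36}
```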
Why does .NET Native do this? In short, these tools provide performance wins. Tools like SG and MCG move the analysis and generation of code from application users' computers to the developer's computer. This leads to simpler apps and a simpler runtime with fewer moving parts, both of which perform better. We've used this design principle throughout the product. We refer to it as static compilation.
ILC next 'merges' the application. It gathers almost all code the application needs to run – the application EXE itself, generated code from SG and MCG, referenced managed WinMDs and DLLs, and referenced parts of the .NET Framework. These MSIL binaries are all combined into a single EXE that contains all types and data for the application. For example, the new EXE contains its own copy of System.Object and System.String. This is analogous to static linking when you build a C++ application.
A couple of extra things happen to make everything work. First, references to external assemblies are rewritten to reference the current assembly. This makes the application one big self-contained unit. Second, prefixes are added to type names to identify the original assembly that contained the type. .NET Native internally understands these prefixes and maintains a lookup table to identify the originating assembly for any piece of code. The debugger and reflection also use the same lookup table, so you never see these prefixes.
By unifying the application, the merge step ensures that all the subsequent tooling only has to deal with a single artifact.
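A minimal sketch of this merge-with-prefixes idea, in Python purely for illustration (the actual prefix scheme and lookup mechanism are internal to ILC; the names and `::` separator here are invented):

```python
# Illustrative sketch of the merge step: types from many assemblies are
# folded into one module, with a prefix recording their origin and a
# lookup table so the debugger and reflection can map a merged name
# back to its source assembly.

def merge_assemblies(assemblies):
    merged, origin = {}, {}
    for asm_name, types in assemblies.items():
        for type_name, body in types.items():
            prefixed = f"{asm_name}::{type_name}"   # hypothetical prefix scheme
            merged[prefixed] = body
            origin[prefixed] = asm_name             # the lookup table
    return merged, origin

assemblies = {
    "System.Runtime": {"Object": "...", "String": "..."},
    "MyApp":          {"MainPage": "..."},
}
merged, origin = merge_assemblies(assemblies)
print(origin["System.Runtime::Object"])             # -> System.Runtime
```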
Merging all the framework DLLs with an application adds a lot of unnecessary code and size to an application. So, ILC performs 'dependency reduction' using an engine called the Dependency Reducer, or DR. (Compiler folks sometimes call this 'tree-shaking'.)
The Dependency Reducer works by identifying all code that could execute and throwing out the rest. To do this, the Dependency Reducer performs a variety of tasks including:
The last item is especially important. Visual Studio automatically adds a default Runtime Directive policy when you migrate an application to .NET Native. You can find this policy in default.rd.xml. By modifying the Runtime Directives, you can tune the behavior of the Dependency Reducer and often eliminate even more unused code from an application. This will make the application smaller and allow it to build faster.
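For reference, the policy Visual Studio generates in default.rd.xml looks roughly like the following. This is a sketch of the common default (the exact contents may differ between releases); its effect is to make all application code available for reflection:

```xml
<Directives xmlns="http://schemas.microsoft.com/netfx/2013/01/metadata">
  <Application>
    <!-- Keep reflection metadata and code for everything in the app -->
    <Assembly Name="*Application*" Dynamic="Required All" />
  </Application>
</Directives>
```

Narrowing these directives gives the Dependency Reducer more freedom to discard code, at the cost of having to opt specific types back in if they are used via reflection.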
Steps 2-4 were all 'MSIL transformations'. We also perform dozens of smaller transformations in addition to those detailed above.
Many of these transformations move runtime work into ILC. For example, other .NET runtimes include the ability to generate GetHashCode and Equals implementations for value types. In .NET Native, these implementations are generated in advance of runtime. In another example, .NET Native generates implementations for calls to Delegate.Invoke at compile time whereas other .NET runtimes generate these implementations at runtime. By not doing this work at runtime, .NET Native gets multiple small performance wins that add up to big wins that your users will notice.
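The value-type equality example can be sketched like this. The Python below only illustrates the idea of generating these members ahead of time from a known field list (it is not how .NET Native actually emits them):

```python
# Illustrative sketch: pre-generating equality and hashing for a value
# type from its field list, instead of deriving them reflectively at
# runtime as other .NET runtimes do.

def generate_equality(cls, field_names):
    """'Compile time': bake field-by-field comparison into the class."""
    def eq(self, other):
        return (type(self) is type(other) and
                all(getattr(self, f) == getattr(other, f) for f in field_names))
    def hash_(self):
        return hash(tuple(getattr(self, f) for f in field_names))
    cls.__eq__, cls.__hash__ = eq, hash_
    return cls

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

generate_equality(Point, ["x", "y"])         # done once, 'at compile time'
print(Point(1, 2) == Point(1, 2))            # -> True
```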
Other transformations exist to persist information that would otherwise be lost when an application is turned into native code. For example, type and member names are not present in native code, but reflection requires that information. Transforms collect data that your application needs for reflection. It encodes the data in a format that can persist through the compilation.
After all of these transformations are complete, ILC emits appname.ilexe. This is the last step that directly modifies MSIL.
At this point, there is a merged and reduced EXE containing MSIL that is ready to be compiled from high-level MSIL into machine code. ILC invokes a modified version of the Microsoft VC++ compiler called NUTC.
NUTC is modified to import MSIL and to understand the .NET type system. The optimizations and analysis are all powered by the C++ backend. This brings the best of C++ to .NET. This means that .NET Native applications take advantage of Microsoft Visual C++ inlining, dead code removal, and vectorization. This is the step that really provides the "native" in .NET Native.
NUTC doesn't actually output a binary that's ready to run. It outputs MDIL, or Machine Dependent Intermediate Language. MDIL isn't a high-level language like MSIL. Instead, MDIL includes a given platform's assembly language with some additional tokens to avoid hard-coding certain addresses and pointers. These additional tokens create a looser coupling between NUTC and the .NET Native runtime. They are resolved in the next step through a process called 'binding'.
If you're interested in MDIL, check out this Channel 9 talk.
The binder converts MDIL into machine code that a given architecture (e.g. ARM, x64) can run. This tool resolves the MDIL instructions in the file and hard-codes them to the .NET Native runtime. For example, the binder connects object and array allocations in the application to the garbage collector in the runtime. You can think of this step as analogous to linking a traditional C application.
The binder's final output is an optimized DLL that contains the app code. Of course, an app can't run from just a DLL. So, the binder also emits a small stub application to load the DLL and start the application's execution.
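The token-resolution half of binding can be sketched as a patch pass over pseudo-instructions. This Python is purely illustrative; the token names, addresses, and instruction shapes are invented, and real MDIL is far richer:

```python
# Illustrative sketch of 'binding': MDIL-style code carries symbolic
# tokens instead of hard-coded addresses; the binder patches each token
# with the concrete runtime entry point.

RUNTIME_TABLE = {                       # hypothetical runtime exports
    "TOKEN_GC_ALLOC_OBJECT": 0x1000,
    "TOKEN_GC_ALLOC_ARRAY":  0x1040,
}

mdil_code = [
    ("call", "TOKEN_GC_ALLOC_OBJECT"),  # 'new Foo()' becomes a GC allocation
    ("mov",  "r0, r1"),                 # plain instruction, no token
]

def bind(code, runtime_table):
    bound = []
    for op, arg in code:
        if arg in runtime_table:        # token -> concrete address
            bound.append((op, runtime_table[arg]))
        else:
            bound.append((op, arg))     # pass plain instructions through
    return bound

bound = bind(mdil_code, RUNTIME_TABLE)
print(bound)
# -> [('call', 4096), ('mov', 'r0, r1')]
```

Because the tokens stay symbolic until this step, NUTC's output does not have to change every time the runtime's internal layout does.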
As you can see, the current .NET Native tool-chain is composed of a lot of parts. However, this has all been done to move logic and complexity from the runtime and into the tool-chain. This means that applications start faster, are simpler, and can be better optimized.
In the coming weeks, we'll have additional blog posts to explore various parts of the tool-chain.
Any chance you can expand on the role of cloud compilation done in the two stores (Windows & Windows Phone) ... What exactly is their role in the toolchain?
Interesting article! It was cool to learn more about how this works under the hood. One thing you didn't cover though: Why does the binder emit both an EXE and DLL? What purpose does the stub serve that prevents the app code from being in a single EXE?
How does this handle apps using dynamic methods or linq expression runtime compilation, and loading assemblies at runtime? I guess there is still *some* dependency on the Framework proper for these corner cases?
Great post! Can't wait to see where .NET Native will go.
@Xy: I think LINQ is already compiled to MSIL (transformed into foreach) and the type of a dynamic variable can be determined from static analysis (all possible paths that lead to the dynamic variable's assignment). But I would also be interested to know the official answer.
It is one of the things that seemed impossible to have at .NET land. But here it is!! Thank you for your great work. The commitment to .NET really means a lot to us who make a living from using it!
I really like the .NET native project and hope it succeeds!
I have a couple of concerns though when reading this article. You mention that all dependent DLLs are merged into the EXE. So if I have 50 MB of 3rd party .NET components and dependent assemblies, are these all merged into one big 50+ MB executable? And is that really faster for startup? Isn't it possible to keep dependent assemblies as separate files?
> Analyzing data bindings in XAML documents
Does that work reliably? What if the analyzer fails to associate a XAML binding with a property of an object that is only used within XAML? Will it remove that seemingly unnecessary property and then fail at runtime?
@LiquidBoy - there's some more information as to how compile in the cloud works on the phone here: channel9.msdn.com/.../Mani-Ramaswamy-and-Peter-Sollich-Inside-Compiler-in-the-Cloud-and-MDIL
@Cory/Vincent - LINQ expression trees are handled via an expression tree interpreter. Interestingly enough, we found that for some reasonably large number of invocations, the interpreter beats out JITing a dynamic method since you don't wind up paying the overhead of compilation at runtime. The dynamic keyword works by keeping around enough reflection information to be able to invoke back into your application at runtime. By default, Visual Studio emits a file (rd.xml) which tells the .NET Native tool chain to make your application code reflectable at runtime. You can certainly tune these settings, at which point you'll need to be careful to make sure to enable reflection invocation on types that will use dynamic invocation at runtime.
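[Editor's note: the interpreter approach described above can be sketched with a toy expression tree. This Python is purely illustrative; real LINQ expression trees and the .NET Native interpreter are far more elaborate.]

```python
# Illustrative sketch: evaluating a LINQ-style expression tree with an
# interpreter, so no code needs to be generated at runtime.

def interpret(node, env):
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "param":
        return env[node[1]]
    if kind == "add":
        return interpret(node[1], env) + interpret(node[2], env)
    if kind == "mul":
        return interpret(node[1], env) * interpret(node[2], env)
    raise ValueError(f"unknown node kind: {kind}")

# The tree for x * 2 + 1, evaluated at x = 20.
tree = ("add", ("mul", ("param", "x"), ("const", 2)), ("const", 1))
print(interpret(tree, {"x": 20}))            # -> 41
```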
@flöle - we do merge everything together at build time, however we also do tree shaking which means that code that is not used at runtime is thrown out of the final image. Due to this, we've found that either (a) you need the full 50MB of dependent code and therefore your package is larger due to your application being large or (b) you don't need most of that code, so your application doesn't include it. This isn't to say that we'll never have a story for having multiple files at runtime - that's one of those items which we will tune as we continue to develop .NET Native.
Note that for startup costs, you don't need to page the full executable into memory - it does need to have address space reserved, but the binary itself only has to have the pages which are used at startup loaded in. So, sufficiently clever binary layout can help to reduce startup cost over having to read multiple modules.
As for XAML, we have heuristics baked into the tool chain and we get an assist from the XAML compiler in identifying which items need to be bindable. While we believe it to be unlikely, if you start playing around with default reflection policy (by tweaking your .rd.xml) it would theoretically be possible to defeat our heuristics and expose a situation where something you tried to bind to was missing. If you get into that situation, however, we'd love to hear from you so that we can make our heuristics better. If you do find yourself in that boat, you can let us know either via email, a connect bug, or the .NET Native forums and we can see what we can do to help you out.
Out of curiosity, does this compiler tool-chain support PGO?
@Adam - currently we don't have a way to pass PGO data to the tool chain, but that is something which we will be considering as we continue development of .NET Native.
@Shawn Thanks for the additional information!
I still don't understand how your XAML parsing is supposed to work though.
> As for XAML, we have heuristics baked into the tool chain [...] identifying which items need to be bindable.
How can you ever reliably know which types are used as data source for bindings? Are you tracking which objects are assigned to the DataContext property within C# code or something like that?
Great article. So, in a way, the tool chain works like LLVM.
One question regarding tree shaking: what is your approach for the cases of API development? The rationale is great when you are developing an app, but in the case of APIs (dll), the process would lead to missing code.
@flöle - they're heuristics so they're subject to change over time, but for now we do things like notice that you've data bound to a property named "AddCustomerCommand" for instance. Then, when analyzing your program we notice that you've got a class, perhaps AddCustomerViewModel with a property of type AddCustomerCommand. We can then assume that this might be the object being data bound to and mark it as necessary for your program.
@Ultrahead - right now .NET Native targets application packages, so we don't currently have a model that lets you, for instance, expose a class library surface out of a native binary.
This scenario works fairly naturally due to the design of the tree shaking algorithm. The inputs to the tree shaker are a set of known roots, and the output is a set of necessary types and members given those known roots. The input roots come from a variety of sources - the obvious one is the application's Main method, but they also include XAML data bindings and input from the .rd.xml file that controls what types will be available for reflection for instance. In the case of libraries, the seed roots would also include the public surface area that is exposed out (for WinRT activation, this boils down to the public surface area of the activatable classes listed in the APPX manifest).
Given this input set of roots, the tree shaking algorithm can then walk the program to produce the necessary methods and types for program execution, and the tool chain can then use this information to discard the remaining code which is now unnecessary. Since the algorithm is expressed in terms of roots it generalizes quite naturally to support library surface area.
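[Editor's note: the roots-based algorithm described above can be sketched as a simple reachability walk. This Python is illustrative only; the member names and dependency map are invented, and the real tree shaker tracks far more than a flat graph.]

```python
# Illustrative sketch of the roots-based tree shaker: start from known
# roots (Main, XAML bindings, .rd.xml entries, public library surface)
# and keep only what is reachable from them.

from collections import deque

def shake(dependencies, roots):
    """dependencies: member -> set of members it uses."""
    kept, worklist = set(roots), deque(roots)
    while worklist:
        member = worklist.popleft()
        for dep in dependencies.get(member, ()):
            if dep not in kept:
                kept.add(dep)
                worklist.append(dep)
    return kept

deps = {
    "App.Main":       {"ViewModel.Load", "String.Format"},
    "ViewModel.Load": {"String.Format"},
    "Unused.Helper":  {"String.Format"},    # never rooted, so discarded
}
print(sorted(shake(deps, {"App.Main"})))
# -> ['App.Main', 'String.Format', 'ViewModel.Load']
```

Note how changing the root set (an app's Main vs. a library's public surface) changes the output without changing the algorithm, which is the generality described above.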
Sounds very exciting... perhaps someone can expand on how the native tool chain facilitates runtime code generation (such as reflection, emit, instantiation of generic types, linq expression compilation, etc.)
Also, is source-level debugging affected?