A first-hand look from the .NET engineering teams
This post was authored by Xy Ziemba, Program Manager on the .NET Native team.
At BUILD, we announced the .NET Native Preview. .NET Native is a compilation technology and a small runtime that allow .NET applications to start up to 60% faster and have a smaller memory footprint. We've previously discussed at a high level how .NET Native provides these performance benefits. Here, we'll talk about how the .NET Native tool-chain works.
.NET Native ships as a single SDK that lets you easily convert a .NET application from Microsoft Intermediate Language (MSIL) to native code. While we've made this experience just a few clicks, there are a number of steps involved in converting MSIL to native code. We'll go through some of those steps and give an overview of how .NET Native converts your app into native code.
There are seven major steps in building a .NET Native application:

1. Compiling source code to MSIL binaries
2. Generating marshaling and serialization code (MCG and SG)
3. Merging the application into a single MSIL binary
4. Reducing dependencies
5. Other MSIL transformations
6. Compiling MSIL to MDIL with NUTC
7. Binding MDIL into machine code

Step 1 is how applications are built today. .NET Native adds steps 2-7, and the process is automated by the IL Compiler (ILC). These subsequent steps work off the MSIL part of an application.
Before we start, please note that .NET Native is under active development. This means that many of the details are changing, and some of these major steps might change too! Let's look at each of these steps in detail.
All .NET applications start as source code, including .NET Native apps. The source code is compiled to MSIL binaries (EXEs and/or DLLs) using a language compiler, such as the C# compiler. There are a few other tools that are used in a typical build process, such as packaging an app into an APPX package. In the case of existing Windows Store apps, an APPX package containing MSIL binaries is the final app artifact that can be uploaded to the Windows Store.
The .NET Native tool-chain inserts multiple additional steps in between source code compilation and packaging. Those additional steps are described below.
The .NET Native tool-chain starts with the IL Compiler, or ILC. ILC begins by pre-generating additional code needed to make the application run. On other .NET platforms, this code is generated as needed at runtime. ILC starts with two tools that generate code for marshaling and serialization.
The first of these tools is the Marshaling Code Generator, or MCG. Like its name implies, MCG generates marshaling code for native code interoperability scenarios such as Windows Runtime calls, P/Invoke calls, and COM interop.
MCG scans the entire application looking for any call to native code and any possible entry point for native code to call managed code. For each callee, three things happen:
All the code for this is generated as C# in appname.interop.g.cs. This is compiled into appname.WinMDInterop.dll and is added to the application.
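As a rough sketch, the stubs MCG emits follow a familiar interop pattern: convert managed arguments to native representations, call the raw entry point, and clean up afterwards. The example below is illustrative only; the names are hypothetical, the "native" function is faked in managed code so the sketch is self-contained, and the real generated code in appname.interop.g.cs is considerably more elaborate.

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative only: a hand-written approximation of the kind of stub MCG
// generates for a call that passes a managed string to native code.
public static class InteropSketch
{
    // Stand-in for the raw native entry point. In generated code this would
    // be a [DllImport] extern taking only blittable types; here we fake it
    // in managed code so the sketch runs without a native DLL.
    private static int NativeGetLength(IntPtr text)
    {
        return Marshal.PtrToStringUni(text).Length;
    }

    // The marshaling stub a managed caller actually invokes: convert the
    // string to a native UTF-16 buffer, call the "native" function, then
    // free the buffer even if the call throws.
    public static int GetLength(string text)
    {
        IntPtr native = Marshal.StringToHGlobalUni(text);
        try
        {
            return NativeGetLength(native);
        }
        finally
        {
            Marshal.FreeHGlobal(native);
        }
    }
}
```

The key point is that all of this conversion logic is fixed at build time, so the runtime never has to compute a marshaling strategy on the fly.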
Second is the Serialization Generator, or SG. This tool generates code to assist .NET serializers such as DataContractSerializer or DataContractJsonSerializer. It scans the application to identify the types of objects an app will serialize. SG analyzes these types and produces the serialization and deserialization functions used at runtime.
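This is the kind of serialization use SG scans for. On other .NET runtimes, a call like the one below causes the serializer to inspect the Person type at runtime; under .NET Native, SG produces the equivalent read/write functions at compile time. (Person and ToJson are example names for this sketch, not anything SG itself defines.)

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

// A data contract type that an app might serialize. SG finds types like
// this and pre-generates the serialization functions they need.
[DataContract]
public class Person
{
    [DataMember] public string Name { get; set; }
    [DataMember] public int Age { get; set; }
}

public static class SerializationExample
{
    // Serialize a Person to a JSON string using DataContractJsonSerializer.
    public static string ToJson(Person p)
    {
        var serializer = new DataContractJsonSerializer(typeof(Person));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, p);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```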
Why does .NET Native do this? In short, these tools provide performance wins. Tools like SG and MCG move the analysis and generation of code from application users' computers to the developer's computer. This leads to simpler apps and a simpler runtime with fewer moving parts that perform better. We've used this design principle throughout the product. We refer to it as static compilation.
ILC next 'merges' the application. It gathers almost all code the application needs to run – the application EXE itself, generated code from SG and MCG, referenced managed WinMDs and DLLs, and referenced parts of the .NET Framework. These MSIL binaries are all combined into a single EXE that contains all types and data for the application. For example, the new EXE contains its own copy of System.Object and System.String. This is analogous to static linking when you build a C++ application.
A couple of extra things happen to make everything work. First, references to external assemblies are rewritten to reference the current assembly. This makes the application one big self-contained unit. Second, prefixes are added to type names to identify the original assembly that contained the type. .NET Native internally understands these prefixes and maintains a lookup table to identify the originating assembly for any piece of code. The debugger and reflection also use the same lookup table so you don't ever see these prefixes.
By unifying the application, the merge step ensures that all the subsequent tooling only has to deal with a single artifact.
Merging all the framework DLLs with an application adds a lot of unnecessary code and size to an application. So, ILC performs 'dependency reduction' using an engine called the Dependency Reducer, or DR. (Compiler folks sometimes call this 'tree-shaking'.)
The Dependency Reducer works by identifying all code that could execute and throwing out the rest. To do this, the Dependency Reducer performs a variety of tasks including:
The last item is especially important. Visual Studio automatically adds a default Runtime Directive policy when you migrate an application to .NET Native. You can find this policy in default.rd.xml. By modifying the Runtime Directives, you can tune the behavior of the Dependency Reducer and often eliminate even more unused code from an application. This will make the application smaller and allow it to build faster.
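For reference, the default policy that Visual Studio adds looks similar to the following (this is the shape of the default.rd.xml file; consult the Runtime Directives documentation for the full element and attribute set):

```xml
<Directives xmlns="http://schemas.microsoft.com/netfx/2013/01/metadata">
  <Application>
    <!-- An Assembly element with Name="*Application*" applies directives to
         all the assemblies in the application package. "Dynamic: Required All"
         is a conservative default that keeps types available for reflection;
         tightening it lets the Dependency Reducer remove more code. -->
    <Assembly Name="*Application*" Dynamic="Required All" />
  </Application>
</Directives>
```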
Steps 2-4 were all 'MSIL transformations'. We also perform dozens of smaller transformations in addition to those detailed above.
Many of these transformations move runtime work into ILC. For example, other .NET runtimes include the ability to generate GetHashCode and Equals implementations for value types. In .NET Native, these implementations are generated in advance of runtime. In another example, .NET Native generates implementations for calls to Delegate.Invoke at compile time whereas other .NET runtimes generate these implementations at runtime. By not doing this work at runtime, .NET Native gets multiple small performance wins that add up to big wins that your users will notice.
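To make the value-type example concrete, here is a sketch of the kind of field-by-field equality that has to exist for a struct that doesn't override Equals itself. The Point type and the standalone PointEquals method are illustrative stand-ins; the actual generated implementation is internal to the tool-chain.

```csharp
// A value type that does not override Equals or GetHashCode.
public struct Point
{
    public int X;
    public int Y;
}

public static class GeneratedEqualsSketch
{
    // Roughly the field-by-field comparison that other .NET runtimes build
    // on demand at runtime, and that .NET Native generates at compile time.
    public static bool PointEquals(Point a, Point b)
    {
        return a.X == b.X && a.Y == b.Y;
    }

    // A matching hash combine over the same fields.
    public static int PointHash(Point p)
    {
        return (p.X * 397) ^ p.Y;
    }
}
```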
Other transformations exist to persist information that would otherwise be lost when an application is turned into native code. For example, type and member names are not present in native code, but reflection requires that information. Transforms collect the data that your application needs for reflection and encode it in a format that can persist through the compilation.
appname.ilexe is emitted after these transformations are all completed. This is the last step that directly modifies MSIL.
At this point, there is a merged and reduced EXE containing MSIL that is ready to be compiled from high-level MSIL down to native code. To do this, ILC invokes NUTC, a modified version of the Microsoft VC++ compiler.
NUTC is modified to import MSIL and to understand the .NET type system. The optimizations and analysis are all powered by the C++ backend. This brings the best of C++ to .NET. This means that .NET Native applications take advantage of Microsoft Visual C++ inlining, dead code removal, and vectorization. This is the step that really provides the "native" in .NET Native.
NUTC doesn't actually output a binary that's ready to run. It outputs MDIL, or Machine Dependent Intermediate Language. MDIL isn't a high-level language like MSIL. Instead, MDIL includes a given platform's assembly language with some additional tokens to avoid hard-coding certain addresses and pointers. These additional tokens create a looser coupling between NUTC and the .NET Native runtime. They are resolved in the next step through a process called 'binding'.
If you're interested in MDIL, check out this Channel 9 talk.
The binder converts MDIL into machine code that a given architecture (e.g. ARM, x64) can run. This tool resolves the MDIL instructions in the file and hard-codes them to the .NET Native runtime. For example, the binder connects object and array allocations in the application to the garbage collector in the runtime. You can also think of this like linking a traditional C application.
The binder's final output is an optimized DLL that contains the app code. Of course, an app can't run from just a DLL. So, the binder also emits a small stub application to load the DLL and start the application's execution.
As you can see, the current .NET Native tool-chain is composed of a lot of parts. However, this has all been done to move logic and complexity from the runtime and into the tool-chain. This means that applications start faster, are simpler, and can be better optimized.
In the coming weeks, we'll have additional blog posts to explore various parts of the tool-chain.
@Shawn: thanks for the explanation.
@Shawn Thanks, that is exactly what I wanted to know.
Everything will be merged and dependency-free, but what about code reuse? Every .NET Framework app uses common things that now sit in the .NET Framework. If I understand right, every app will have a copy of some part of .NET, and this leads to massive code in memory that could otherwise be shared?
This is interesting stuff. Would this process support compilation of a mixed managed application as a fully native application via C++/CLI?
@dnf - think of the current developer previews a lot like you would libraries such as the C++ STL or Boost. In those types of libraries, the code is pulled into the application and is therefore optimized together with the application. This type of whole program analysis allows for optimizations such as devirtualization for example. Additionally, the tree shaking allows your application's copy of System.String to have only the String members you use, while my application's copy may have a disjoint set of members that I use.
There is a tradeoff here to be sure - and it's one knob you'll probably see us dial a bit as we continue the developer previews of .NET Native.
Another way to think of things is that with the current Windows Store application model, each application package brings with it all of the dependencies it has - so every application that needs a library such as SharpDX already contains the code for SharpDX. .NET Native provides the ability to pull the portions of that code that the application uses into the application's package and discard the rest.
@DaveCunningham - right now we don't support C++/CLI assemblies with .NET Native. We do support applications which are partially managed and partially native (across separate module boundaries). In those cases you'll wind up with a .NET Native module that contains the native code generated for your managed modules along with your original native modules as the output from the build process.
Is it true that Microsoft do not have the expertise (the smart people) to come up with a new version of Visual Basic 6.0?
Great post! It is very interesting and helpful to know how .NET Native works under the hood. One question: Why is this article not part of the documentation in the MSDN (msdn.microsoft.com/.../dn807190(v=vs.110).aspx)? Would be a great overview article about the fundamentals of .NET Native like you have e.g. for WPF (msdn.microsoft.com/.../ms750441(v=vs.100).aspx).