If you develop .NET client applications that are deployed to users over Terminal Services (TS) or Citrix then this is the post for you. Why? – well, there’s a bit of an issue that not a lot of people know about, and it can really ruin your day. First off some preamble about how we got here and why it’s an issue. I’ll then present a solution.
When a process runs on Windows it typically has some executable code and some data. To vastly simplify this let’s just say that the memory space taken up by my application includes different pages, some code - some data. There may well be other types of stuff in memory but that’s not important to this discussion at the moment. My application runs, loads up code into the pages allocated for code, and loads up data into the pages allocated for data. We’ll call these ‘Private’ memory pages.
Then another instance of my application is started on the same machine. This could be me running a new instance on my own desktop, so I now have two windows open doing the same thing, or (and this is the more critical one) it could be another user on the same server running the application over TS or Citrix.
Now, memory is a scarce resource as there’s never enough of it. (As an aside, that’s a bit of a lie to anyone who has been tinkering with computers for nearly 30 years and who started with 1K and still got it to do something, but what the heck).
With multiple instances of my application running, it stands to reason that I’m using up more memory. And as you’ll know if you found this post after the event, running out of memory can happen on a Terminal Server/Citrix farm when you have a boatload of users logged in.
As memory is so scarce, it would be nice to squeeze as much use as possible out of what you have.
Windows has a facility whereby memory pages can be marked as ‘shareable’. What this means in practice is that some pages from a given process can be shared by another process, and indeed could be shared with many processes. And what’s typically in these shared pages you might ask – well code of course. As code doesn’t change (more on that lie in a moment!) then it makes sense for the operating system to share pages between instances of the same application, which would minimise the total number of pages used in a multi-user environment such as TS or Citrix. Cool!
Now, back to that lie about code not changing. When you write a .NET application and compile it up, you get an executable that contains code, right? Wrong – you get an executable that contains IL, which is in effect code, but until someone brings out a general purpose CPU that understands IL it’s not executable – it needs to be converted into native code by the JIT compiler.
Now then, guess what happens when the JIT gets hold of your ‘code’. Well, it compiles it of course – but here’s the critical thing, it compiles your code into Private pages. And what’s special about Private pages – well the clue is in the name, they’re private to your process, meaning that JITted code cannot be shared between processes. Eek!
So, your swanky .NET client application may be easy to write and debug, but its memory footprint on a TS/Citrix box is less than stellar. Never fear though, there is a way out of this conundrum.
You may never have found a use for NGEN (the Native Image Generator), and indeed you might not even know such a beast exists – so for those of you who don’t know, what NGEN does is pre-compile all of the IL into native x86/x64 machine code and store the compiled image on disk to be used when needed.
A lot has been written about NGEN and how it can affect the startup performance of your application, but a less well known feature is that using NGEN you can also affect the amount of memory used by multiple instances of your application.
Note: If you’re thinking “Hey, I can save memory on all my .NET apps using NGEN” you would be wrong. This stuff really only makes sense if your app is being run over TS/Citrix.
The critical thing to understand about NGEN is that whilst it compiles your code, it also marks the code pages as shareable, so that multiple instances of your application can share parts of the memory space used by the first instance. And that’s really useful if you’re running under Terminal Services.
In the image on the left I have two executables (Instance A & B) containing code and data pages. There’s a 1:1 correspondence between how big my .exe is and how much physical memory is used up overall. With two processes running I use double the space of running one instance of the process.
On the right hand side however there are only half as many code pages in memory, as the code is shared between my two application instances (C & D). Note – this is a vast oversimplification, and you are unlikely to be able to share all of the code pages from any given .exe, but you will be able to share a lot of them so it’s worth trying.
The Windows Resource Kit includes a tool called vadump (Virtual Address Dump) which is available for download here. It’s a cool tool but has recently been joined by the most excellent VMMap, available from the Sysinternals site. This is a new tool by Mark Russinovich and Bryce Cogswell and shows the same sort of information as vadump but in a window (rather than at a command prompt à la vadump).
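The arithmetic behind the two diagrams can be sketched in a few lines of Python. This is a simplified model rather than a measurement: the page counts are made-up numbers, the function name is mine, and it assumes every code page is shareable, which (as noted above) is optimistic.

```python
def total_pages(instances, code_pages, data_pages, code_shared):
    """Physical pages used by `instances` copies of one application.

    Simplified model: data pages are always private to each process;
    code pages are either all private (JIT) or all shared (NGEN).
    """
    if instances == 0:
        return 0
    if code_shared:
        # One copy of the code for everyone, plus private data per process.
        return code_pages + instances * data_pages
    # Every process carries its own copy of both code and data.
    return instances * (code_pages + data_pages)


# Hypothetical figures: 6 code pages and 4 data pages per instance.
jit_total = total_pages(2, code_pages=6, data_pages=4, code_shared=False)
ngen_total = total_pages(2, code_pages=6, data_pages=4, code_shared=True)
print(jit_total, ngen_total)
```

The gap between the two totals grows with every extra instance, which is why the effect matters far more on a multi-user server than on a single desktop.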
To provide some examples from vmmap I created a simple .NET application which contains a fair bit of code. As I don’t like writing any more code than I need to I wrote a program to write a program that contained a load of code. I needed a decent amount of code in the application so that the figures would make some sense – if I had only a small amount of code in my .exe then the amount of pages used by the JITter vs. NGEN wouldn’t be significant.
If you compare the first row from the shareable WS column you’ll see that the NGEN assembly has roughly 9 MB more shareable code than the original image. I know, 9 MB isn’t a huge amount, but it is a sizeable chunk if you run several users off the same box, and this was from one (admittedly large) .NET assembly. My app was about 3.5 MB on disk – an application I’ve been working on for one of my customers is just over 40 MB as it includes a bunch of controls from 3rd parties and a whole host of other code.
Assuming I could get the same sort of benefit by running NGEN over my customer’s application then I might save 100 MB (!) per running instance. It doesn’t take a lot of users to make that significant – the 11th user would bring us to a saving of around 1 GB. Now we’re talking.
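Here is that back-of-an-envelope estimate as a small Python sketch. It leans on one big assumption, namely that shareable code scales linearly with the size of the assembly on disk, and the function names are just mine for illustration. The first instance still has to load the pages, so only the second and subsequent users count towards the saving.

```python
def shareable_estimate_mb(app_size_mb, sample_size_mb=3.5, sample_shareable_mb=9.0):
    """Scale the sample result (9 MB shareable from a 3.5 MB app) to another app.

    Assumption: shareable code grows linearly with on-disk size.
    """
    return app_size_mb * sample_shareable_mb / sample_size_mb


def farm_saving_mb(users, shareable_mb):
    """Memory saved across a server when `users` instances share code pages.

    The first instance loads the pages; each additional one reuses them.
    """
    return max(users - 1, 0) * shareable_mb


per_instance = shareable_estimate_mb(40)   # the 40 MB customer application
saving = farm_saving_mb(11, 100)           # rounded to 100 MB per instance
print(round(per_instance), saving)
```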
When you run NGEN it finds referenced assemblies (i.e. those that have a hard assembly reference from the main .exe). If you are dynamically loading assemblies then NGEN won’t find these so that’s another place where VMMap can be of help. If you run it up on your application then you can see which images are loaded that are using NGEN (well, you can when you know what you are looking for).
If you run VMMap the main part of the screen shows a list view. The .exe I’m running was called ShowNGENMemory.exe, so you need to scroll down the display to look for ShowNGENMemory.ni.exe. The .ni part indicates that this is a native image. If there’s an assembly with no corresponding .ni image then exit the app, run NGEN over the offending assembly and try again.
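If you have a lot of assemblies, checking for the .ni counterpart by eye gets tedious. As a rough sketch, assuming you have copied the image names out of VMMap into a list by hand, something like the following Python would flag the ones still being JITted. The function name and the ThirdParty.Controls.dll entry are hypothetical; only the .ni naming convention comes from the tool.

```python
def missing_native_images(loaded_images, assemblies):
    """Return the assemblies that have no matching .ni image loaded.

    loaded_images: image names as listed for the running process.
    assemblies:    the managed assemblies you expect to be NGENed.
    """
    loaded = {name.lower() for name in loaded_images}
    missing = []
    for assembly in assemblies:
        # Foo.exe -> Foo.ni.exe, Bar.dll -> Bar.ni.dll
        stem, _, ext = assembly.rpartition(".")
        native_name = f"{stem}.ni.{ext}".lower()
        if native_name not in loaded:
            missing.append(assembly)
    return missing


# Hypothetical list of images seen in the process.
loaded = ["ShowNGENMemory.ni.exe", "mscorlib.ni.dll", "ThirdParty.Controls.dll"]
wanted = ["ShowNGENMemory.exe", "ThirdParty.Controls.dll"]
print(missing_native_images(loaded, wanted))
```

Anything this reports is a candidate for running NGEN over before you try again.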
One other thing you should do to all of your assemblies is set a base address. What’s that I hear you say? Well, when Windows loads any image file (.exe, .dll) it looks for a base address – this is effectively the ‘preferred address’ in virtual memory where that image would like to load. The loader reads the base address, checks that address is free, and if so loads up your image. If however that address is already taken (i.e. something else loaded at your base address, or there’s an image loaded whose footprint goes over your area of memory) then we have some extra work to do. And as you can guess, this work takes time (we’re essentially going through the module and changing addresses). For a thorough examination of rebase please see the article here – it may be old but still a great read. This is less important with managed assemblies but still relevant.
Setting a base address is also a great idea due to the way pages are shared by the operating system. In short, if we need to swap out a given image then if that image was loaded at its preferred base address it can simply be thrown away. If an image is not at its preferred base address then the image will be stored in the page file, which obviously takes time to write to and subsequently read from.
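To make the collision rule concrete, here is a toy Python model of the check described above. It is not how the Windows loader is implemented: the names, addresses and sizes are invented, and it ignores where a relocated image actually ends up. It only answers the question "would this image get its preferred address?".

```python
def find_relocations(images):
    """Return the names of images that cannot load at their preferred base.

    images: list of (name, preferred_base, size) tuples in load order.
    """
    occupied = []   # (start, end) address ranges already claimed
    relocated = []
    for name, base, size in images:
        end = base + size
        # Two ranges clash if each one starts before the other ends.
        clash = any(base < taken_end and taken_start < end
                    for taken_start, taken_end in occupied)
        if clash:
            relocated.append(name)
        else:
            occupied.append((base, end))
    return relocated


# Hypothetical DLLs: two left at the same default, one given its own base.
images = [
    ("Lib1.dll", 0x00400000, 0x80000),
    ("Lib2.dll", 0x00400000, 0x40000),
    ("Lib3.dll", 0x62000000, 0x40000),
]
print(find_relocations(images))
```

In this made-up example Lib2.dll loses out because Lib1.dll got to the shared default address first, which is exactly the situation that giving each DLL its own base address avoids.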
To set a base address, just go to the ‘Build’ tab of your project properties window and click on the ‘Advanced’ button. There you’ll see the following dialog and at the bottom is the base address.
You really only need to set base addresses on DLLs – the default is 0x400000 but you should change this for all of your assemblies. You can test relocations using another Sysinternals tool – ProcessExplorer. When you run this for the first time, click on Options –> Configure Highlighting, and make sure you check ‘Relocated DLLs’ at the bottom (it’s off by default)…
I always use red as it makes them stand out. Then run your application and look for relocations – you’ll need to be viewing DLLs (Ctrl+D) and have the lower pane view on (Ctrl+L).
Note: On Windows Vista, Windows Server 2008 and above we have a new feature called ASLR (Address Space Layout Randomisation) which is primarily there to make the job of a hacker a little less fun. In essence we move the locations of images in a random manner, so that exploits targeting a given memory address have much less chance of being effective. So, on a system with ASLR you’ll notice few if any Relocated DLLs. I’ve kept this section in as I know plenty of people still running Windows Server 2003 in production environments, so rebasing is still a good idea.
Hopefully this post has provided you with enough information to go out and grab some memory back on your TS/Citrix boxes. There’s another upside to running NGEN on your code – it’ll start up faster! This is because with a regular .NET application we have to JIT the code as we call it. With NGEN this has already been done, which generally means you get snappier application startup.
The ideas presented here are really only necessary when running your application under Terminal Services or Citrix. For regular client applications they’re unnecessary, but when running under TS/Citrix I’d say they’re not just important, they’re imperative.
One last thing for application vendors out there – why not add another page to your installation application that asks the user if they are running in a TS/Citrix environment? Then you can do the right thing and NGEN the application and all dependent assemblies as part of the install and save everyone a load of grief.
And lastly, don’t be tempted to use Task Manager to work out how much memory your application is using. It lies. Well, that’s a bit unfair really – it reports a reasonable number but doesn’t include any information about shared pages, so might report you are using 200 MB when you’re actually sharing 100 MB of that with several other users. To get a real picture of how much RAM a given application uses, give vadump or VMMap a go.
Originally posted by Morgan Skinner on Saturday, March 07, 2009 here.