I received several requests to write a little something on using managed code in a cold startup case – by which I mean immediately, or at least soon, after a reboot.  I guess before I get into that I should give my usual disclaimer that I’m not going to try to be perfectly correct in my exposition in the name of being remotely brief.

So here goes.

It’s often said that in the performance world “space is king” – meaning that if you make your code small it will naturally tend to be fast because of locality improvements and whatnot.  However certain or doubtful of this “fact” you may be for normal “warm” cases, it’s certainly true in the cold cases.

In cold startup it is not likely that any of the CLR components are yet in the operating system’s disk cache – you’re going to be doing real I/O to bring those in.  Some of that will be batched up nicely by the operating system, but once some core set of pages is loaded the rest will start to fault in.  Those faults will be “hard faults” meaning you’re going to go to the disk.

This is a peculiar situation because in such a world, processing costs tend to almost vanish in comparison to all of the disk i/o you will be doing.  Even seemingly daunting tasks like jitting up a goodly bit of code may be overwhelmed by the disk i/o.

It’s a whole new ball-game.

The best way to get a handle on your costs, then, is to switch from measuring processor time, or even wall-clock time, to measuring the i/o you are doing.  Which pages do you need to bring into memory and from which dlls will they come?  What other files will you read?  All of the I/O for those files will also cost you.
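A quick way to get a rough picture of what you’re dragging in is to dump the modules your process has loaded once it’s up and running.  Here’s a minimal sketch of that idea – nothing below is specific to cold start, it just tells you which DLLs (and roughly how many bytes of them) you ended up mapping:

    using System;
    using System.Diagnostics;

    class ModuleDump
    {
        static void Main()
        {
            // Every module listed here is a file you may have to fault in
            // from disk on a cold start.
            Process me = Process.GetCurrentProcess();
            foreach (ProcessModule m in me.Modules)
            {
                Console.WriteLine("{0,-40} {1,12:n0} bytes", m.ModuleName, m.ModuleMemorySize);
            }

            // The working set is a crude upper bound on the pages you actually touched.
            Console.WriteLine("Working set: {0:n0} bytes", Environment.WorkingSet);
        }
    }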

Seemingly innocuous reads of configuration files can move the disk head around, slowing down other reads and potentially dragging in a lot of parsing code (more reads).  Remember that in cold start i/o is at a premium, so anything you can defer until after startup, when the disk has otherwise settled, is a great idea – give the disk scheduler every chance to get the right pages in the right order.  If you defer the initialization of some subsystems you save directly by not touching the code and indirectly by not looking at any registry entries it might need (that’s I/O too).
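The simplest way to defer that kind of work is plain lazy initialization: don’t open or parse the configuration file until somebody actually asks for a setting.  A minimal sketch of the pattern – the file name and XML layout here are made up purely for illustration:

    using System.Xml;

    static class AppSettings
    {
        static volatile XmlDocument settings;          // stays null until first use
        static readonly object initLock = new object();

        public static string Get(string key)
        {
            // Neither the settings file nor the XML parsing code is touched
            // at startup -- the cost is paid on the first Get call instead.
            if (settings == null)
            {
                lock (initLock)
                {
                    if (settings == null)
                    {
                        XmlDocument doc = new XmlDocument();
                        doc.Load("app.settings.xml"); // hypothetical config file
                        settings = doc;
                    }
                }
            }
            XmlNode node = settings.SelectSingleNode("//setting[@name='" + key + "']");
            return node == null ? null : node.InnerText;
        }
    }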

Now let me get back to the JIT phenomenon.  In warm startup cases, if you are loading from ngen’d images those would be coming from the disk cache, and the cost of loading those pages into your process is fairly low compared to jitting.  Furthermore many of those pages can be shared across processes, so we like to encourage putting sharable code into ngen’d images – jitted code can’t be shared.  In cold startup things are different.  The IL is smaller than the native code, so it may in fact be cheaper to load the IL and JIT it than it would be to do disk i/o for the prejitted code.  Of course you probably don’t want to do this for code that is likely to be needed in other processes (like mscorlib) because the cost of loading it is amortized across those processes.  But if your process has a significant amount of application-specific code and cold startup is paramount to you, then jitting may be more attractive than it first seems.

Of course… you have to measure to be sure of anything.
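One easy number to capture is how much jitting you are actually doing: the CLR reports it through the “.NET CLR Jit” performance counter category.  Here’s a sketch of reading it – the counter instance is normally just the process name, so adjust for your own application:

    using System;
    using System.Diagnostics;

    class JitCounter
    {
        static void Main(string[] args)
        {
            // The instance name is typically the process name (no .exe extension);
            // pass a name on the command line to look at some other process.
            string instance = args.Length > 0 ? args[0] : Process.GetCurrentProcess().ProcessName;

            using (PerformanceCounter jitted =
                new PerformanceCounter(".NET CLR Jit", "# of Methods Jitted", instance))
            {
                Console.WriteLine("Methods jitted: {0}", jitted.NextValue());
            }
        }
    }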

Recap:

  • Cold startup time will be dominated by disk i/o
  • Consider the size of code you are loading – defer what you can
  • Consider initialization files and the code to parse them – defer what you can
  • Consider collateral operating system resources like the registry – avoid what you can

Managed code startup tuning isn’t really all that different from unmanaged tuning – the real issue is that in managed code everything is easier, even dragging in some huge DLL with a couple of lines of C#.  So be careful out there.

See also: http://blogs.msdn.com/ricom/archive/2004/04/22/118422.aspx for some general Performance Planning tips.

Update: As it happens, a colleague of mine just did some experiments with a test application, using a recent build of Whidbey and some of the Avalon dlls, showing the kinds of effects you could see if you choose to ngen more or less.  And yes, I thanked him profusely :)

Environment: These tests were run on a 1GHz PIII with 512MB RAM on XP/SP2.  The machine was off the network, anti-virus software was turned off, and some other services that seemed unnecessary were also turned off.  This is a fairly clean situation.

Scenario: The test application loads three FX assemblies (mscorlib, system, and system.xml), five Avalon assemblies and two application specific assemblies.

Scenario                    Average Time   Methods Jitted   Samples (s)
1. Nothing ngen'd           43.302s        6279             44.183, 43.262, 42.461
2. Only mscorlib ngen'd     48.356s        4882             48.559, 48.199, 48.309
3. All three FX ngen'd      43.733s        4646             43.292, 44.063, 43.843
4. Everything ngen'd        29.174s        0                31.124, 28.350, 28.050

This data indicates ngening can hurt or help cold startup depending on the particular scenario, so it's rather hard to give general guidance -- you'll have to measure your own specific scenario.  Notice how ngening mscorlib alone actually hurts cold startup, and ngening all three FX assemblies seems to be the break-even point for this scenario, after which things start to improve.  Even armed with this data, though, the problem is more subtle: there is generally not just one thing happening at startup, and you might want to look ahead to what comes after -- preloading some of the framework will have collateral benefits for things that may run shortly afterwards (or not so shortly), and those effects should not be underestimated.
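If you want to gather numbers like these for your own application, a crude harness that launches the app a few times and averages the wall-clock times is enough to get started.  The sketch below assumes a hypothetical MyApp.exe; it's up to you to make each run genuinely cold (reboot, or otherwise flush the disk cache, between runs) -- back-to-back runs give you the warm numbers instead:

    using System;
    using System.Diagnostics;

    class StartupTimer
    {
        static void Main()
        {
            const int runs = 3;
            const string exe = "MyApp.exe";   // hypothetical test application

            double total = 0;
            for (int i = 0; i < runs; i++)
            {
                DateTime start = DateTime.Now;
                using (Process p = Process.Start(exe))
                {
                    p.WaitForExit();
                }
                double seconds = (DateTime.Now - start).TotalSeconds;
                total += seconds;
                Console.WriteLine("Run {0}: {1:f3}s", i + 1, seconds);
            }
            Console.WriteLine("Average: {0:f3}s over {1} runs", total / runs, runs);
        }
    }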

And don't forget... the warm startup case will have totally different dynamics...

Putting me further in debt, here's another update with the equivalent warm numbers (same setup): 

Scenario (WARM)             Average Time   Methods Jitted   Samples (s)
1. Nothing ngen'd           13.071s        6279             13.098, 13.048, 13.068
2. Only mscorlib ngen'd     11.546s        4882             11.546, 11.536, 11.556
3. All three FX ngen'd      10.735s        4646             10.725, 10.745, 10.735
4. Everything ngen'd         2.543s        0                 2.553, 2.553, 2.543

Big difference...