I whipped up this guide to help people on internal teams take some of our larger goals and think about them as they apply to their own area. I thought a lot of the advice was generally applicable to managed library authors so I rewrote it for a wider audience and I offer it here with my usual disclaimers...
How to think about major performance themes in your feature/library
When considering new features or significant upgrades to existing features it’s important to understand how these items will impact the overall perceived performance of the system. We have several classes of goals to help us achieve a good customer experience. They are:
The first two of these are fairly easy for teams to act on; most of the discussion focuses on the third point.
As always, the discussions below are necessarily abbreviated and are only intended to provide a helpful and practical model to think about the problems. Providing a 100% correct discussion of any of these issues would consume entire volumes and not be terribly helpful in any case so no attempt at perfection is made here.
It’s easy enough to understand a no regression bar; the trick is to define the appropriate scenarios to use for measuring meaningful regressions. There are two things you should do to achieve this goal:
It’s comparatively simple to get performance tests running regularly once they are authored; however, it is a bit of an art to produce tests that give reasonably repeatable numbers that can be trended. It’s in everyone’s interest to invest in these tests and get them into the normal performance battery so that specific changes causing regressions can be identified.
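To make the "repeatable numbers" point concrete, here is a minimal sketch of a timing helper. The name PerfProbe and its parameters are my own invention for illustration, not part of any framework or of the performance battery mentioned above. It assumes that discarding a few warm-up runs and reporting the median of several samples is an acceptable way to damp noise for the scenario being measured.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not a framework type): runs an action several times
// and reports the median elapsed time, which is less sensitive to outliers
// than the mean.
public static class PerfProbe
{
    public static double MedianMilliseconds(Action action, int warmups, int samples)
    {
        if (action == null) throw new ArgumentNullException("action");
        if (samples < 1) throw new ArgumentOutOfRangeException("samples");

        // Warm-up runs absorb one-time costs such as JIT compilation.
        for (int i = 0; i < warmups; i++) action();

        double[] times = new double[samples];
        for (int i = 0; i < samples; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            times[i] = sw.Elapsed.TotalMilliseconds;
        }

        Array.Sort(times);
        int mid = samples / 2;
        // An even sample count averages the two middle values.
        return (samples % 2 == 1) ? times[mid] : (times[mid - 1] + times[mid]) / 2.0;
    }
}
```

Note that warming up deliberately hides first-run costs, so a helper like this suits steady-state scenarios only. For startup or cold scenarios you would want each sample taken in a fresh process instead.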
From time to time we get specific requests for corrections in key scenarios that block partners. Some of these we take on as must-fix issues and they are assigned to a particular milestone.
Particular teams are assigned performance bugs to track this work; this is the second source of performance bugs. Naturally creation and assignment of these bugs is done in cooperation with the appropriate teams as it would otherwise be highly randomizing.
These are the high level goals set by the performance team, our management, and our partners, to drive certain key performance themes. Because of their broad nature it is sometimes difficult for individual teams to understand how they might help or hinder this process. The main purpose of this entry is to provide some advice on how to best internalize these goals.
Working Set refers to the number of pages of virtual memory committed to a given process, both shared and private. Managed Module Working Set refers specifically to those pages whose origin can be traced directly to a DLL that contains managed code (e.g. mscorlib.dll, system.dll, system.xml.dll).
The easiest way to help reduce the size of module working sets is of course to reduce the size of the modules, or at least grow the modules as slowly as possible. While it isn’t always the case that adding new code will affect the working set (because the code might not run in scenarios that are working set sensitive), overall code growth is nonetheless the leading indicator of expected working set growth. In contrast, removing or consolidating code doesn’t generally cause working set issues (though it’s possible in exotic cases).
Don’t forget the hidden costs associated with the presence of managed classes – these are things like metadata, vtables and so forth. These can end up rivaling the code itself for size. You can make this situation worse by having large numbers of attributes or other metadata, especially if they are frequently consulted.
Lastly, virtually any use of reflection will cause some otherwise cold metadata to be forced into the process working set; it’s only a question of how much metadata and how often that data is shared with other processes. Avoiding reflection where it is practical to do so means you will never face that particular problem.
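As a small illustration of the reflection advice, the sketch below reads the same property two ways. The types INamed, Widget and NameReader are hypothetical names made up for this example. The first method looks the property up by name through reflection; the second gets the same answer through an ordinary interface call.

```csharp
using System;
using System.Reflection;

// Hypothetical types, for illustration only.
public interface INamed
{
    string Name { get; }
}

public sealed class Widget : INamed
{
    public string Name { get { return "widget"; } }
}

public static class NameReader
{
    // Reflection-based lookup: finds the property by name at run time,
    // which requires the runtime to consult the type's metadata.
    public static string ViaReflection(object target)
    {
        PropertyInfo prop = target.GetType().GetProperty("Name");
        return (string)prop.GetValue(target, null);
    }

    // Statically typed lookup: an ordinary interface call with no
    // by-name metadata search.
    public static string ViaInterface(INamed target)
    {
        return target.Name;
    }
}
```

Both methods return the same string for a Widget. Where you control the types involved, the interface form is usually available and sidesteps the metadata question altogether.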
Per AppDomain overhead includes state maintained on a per AppDomain basis by the CLR, such as loader data structures, security policy, evidence, grant-sets etc. In addition to CLR overhead, each library can have per AppDomain overhead.
Interestingly it turns out that the bulk of the time/space we spend initializing managed state isn’t general startup but actually has to do with creating the first app domain. This is noteworthy because of course we’ll be paying roughly that same cost on the second and subsequent AppDomains.
If you’re writing managed code the primary sources of per-application domain data are the static members of your classes. The cost may appear in different places (e.g. primitive types don’t end up on the GC heap) but it’s nonetheless per AppDomain memory.
To reduce your per AppDomain working set be sure to defer as much initialization as possible to a time when you’re sure the initialization is necessary (this saves both code and data), reduce the static data (and of course the objects to which the static data refers), and simplify the construction path of that data as much as possible. Where data is shareable between AppDomains, consider plans that would facilitate a central copy – this can be worth the complexity for largish data items.
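The deferral advice can be sketched roughly as follows. The ErrorCodes class and its contents are hypothetical, and the sketch assumes a runtime that provides System.Lazy<T> (.NET Framework 4 or later); on older runtimes a hand-written null check would play the same role. The table is not built until the first lookup, so an AppDomain that never calls Lookup never pays for it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup table, for illustration only.
public static class ErrorCodes
{
    // Counts how many times the table was built (illustration only).
    public static int BuildCount;

    // Lazy<T> defers construction until Value is first read.
    // Assumption: .NET Framework 4 or later, where Lazy<T> exists.
    private static readonly Lazy<Dictionary<string, int>> table =
        new Lazy<Dictionary<string, int>>(Build);

    // True once the table has actually been constructed.
    public static bool IsLoaded
    {
        get { return table.IsValueCreated; }
    }

    // Returns the code for a name, or -1 if the name is unknown.
    public static int Lookup(string name)
    {
        int code;
        return table.Value.TryGetValue(name, out code) ? code : -1;
    }

    private static Dictionary<string, int> Build()
    {
        BuildCount++;
        Dictionary<string, int> d = new Dictionary<string, int>(StringComparer.Ordinal);
        d["NotFound"] = 404;
        d["Timeout"] = 408;
        return d;
    }
}
```

The Lazy<T> wrapper is itself a small static object, so this only pays off when the deferred data is meaningfully larger than the wrapper or expensive to construct.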
Private memory is defined as memory allocated for a process which cannot be shared by other processes. This memory is more expensive than shared memory when multiple such processes execute on a machine. Private memory in (traditional) unmanaged DLLs usually consists of C++ statics and is of the order of 5% of the total working set of the DLL. NGEN'ed assemblies, on the other hand, contain more information. Apart from the module statics, they contain key CLR data structures required to support managed code execution at run time. Some of these data structures are private to the process.
Private bytes in the native images are predominantly caused by the application of “fixups” to get access to data whose location could not be known at ngen time. The biggest source of these by far is the fixups for string literals. But in any case it’s generally a bad idea to think too much about fixup problems when writing managed code, because fixups are routinely targeted for eradication at the mscorwks level and control over them is very limited at best.
Instead of thinking about per module private bytes, you should be thinking about reducing private bytes more generally, and of course the main source of private bytes will be objects on the GC heap, especially long-lived objects. General reduction of long lived managed objects is as valuable, if not more so, than private page reduction in the modules.
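If you want a rough way to watch these numbers from inside a process, something like the sketch below can help. MemorySnapshot and Reading are hypothetical names introduced for this example. The helper reads the process working set and private bytes from System.Diagnostics.Process and the managed heap size from GC.GetTotalMemory. Treat the results as coarse indicators rather than precise accounting; exact values depend on the operating system and runtime.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, for illustration only.
public static class MemorySnapshot
{
    public struct Reading
    {
        public long WorkingSetBytes;   // shared + private pages in the working set
        public long PrivateBytes;      // memory that cannot be shared with other processes
        public long ManagedHeapBytes;  // bytes currently thought to be allocated on the GC heap
    }

    public static Reading Take()
    {
        using (Process self = Process.GetCurrentProcess())
        {
            Reading r;
            r.WorkingSetBytes = self.WorkingSet64;
            r.PrivateBytes = self.PrivateMemorySize64;
            // Passing true lets the GC collect first, so the figure is
            // closer to the size of live (long-lived) objects.
            r.ManagedHeapBytes = GC.GetTotalMemory(true);
            return r;
        }
    }
}
```

Taking one reading before and one after a scenario gives a crude view of how much long-lived managed state the scenario leaves behind.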
Startup time here refers to the CPU time to load and initialize the CLR and .NET Framework libraries, load user libraries, and get to the Main entry point.
As previously discussed, many of the costs associated with startup actually pertain to the creation of the first AppDomain – this is true for both space and time costs. Most if not all of the points discussed in the reduction of per AppDomain space are entirely applicable here, so they won’t be repeated.
Startup time generally is consumed in two big ways: soft faults, and I/O, and they are deeply intertwined. A soft fault occurs when a page that is required in the process’s working set is not present in the process but is present elsewhere in physical memory. Three reasons this happens are:
The remaining cases involve actual disk I/O and are significantly (e.g. 1000x) more expensive than the above. They are:
The above 6 categories will tend to dominate the cost.