Recently I was working on diagnosing a performance issue with a customer’s web site with a colleague (this is one of our favourite engagement types so if you need some help let me know J), and we found that items were being trimmed very regularly from the ASP.NET Cache, causing excessive back-end work, and in turn reduced scalability.
* a “cache trim” is when ASP.NET looks for unused (based on a LRU algorithm) cache entries and deletes them.
So if you’re using the ASP.NET Cache API (or indeed any cache provider) to store custom application data, the moral of the story is to make sure you size your cache appropriately, and monitor it during test and live to see how it behaves.
It is worth knowing that an un-configured ASP.NET Cache takes a memory limit of the minimum of either 60% of physical RAM, or a fixed size RAM. This fixed size is different depending upon your processor architecture - see Thomas Marquardt's post here for some more detail. The idea is that when the limits imposed by configuration are approached, the cache will start trimming.
I’ve attached a very simple ASP.NET application that you can play with to study the cache behaviour while you’re reading the next section. Host it in IIS for the best results.
It is possible to set limits on how much memory an ASP.NET Application Pool can consume in IIS (see here), and you can also configure behaviour using settings on the cache web.config element. It is important to understand exactly what these mean;
· privateBytesLimit (in web.config, on the cache element): This is the maximum number of bytes that the host process is permitted to consume before the cache is trimmed. In other words, it is not the memory to dedicate to the cache, nor a limit to the memory that the worker process can use... but rather the total memory usage by the w3wp process (in IIS7) at which cache trimming will occur.
· percentagePhysicalMemoryUsedLimit (in web.config, on the cache element): This is the maximum percentage of RAM that the machine can use before the cache is trimmed. So on a 4gb machine with this set to 80%, the cache would be trimmed if Task Manager reported over 3.2gb (i.e. 80% of 4gb) was in use. Note that this is not the percentage memory used by ASP.NET or caching – it is the memory used machine-wide.
· Private Memory Limit (in IIS7 manager, as an advanced property of the App Pool): This is the maximum number of private bytes a worker process can consume before it is recycled (which will of course also empty the cache). If you set this limit lower than privateBytesLimit in web.config, you’ll always get a recycle before a cache trim... which sounds unlikely to be what you would want.
The remaining content of this post itemises things you should consider monitoring to assess your cache’s performance. Some of it is obvious – but it is surprising how easy it is to overlook the obvious when your boss is breathing down your neck about your slow application J
The best indicator that something might be going wrong is if your application is performing badly. Or maybe it was performing fine, but you’ve changed something or gotten more users, and suddenly it’s not so fast. Make sure you collect timing data in your IIS logs, and use LogParser to parse and analyse them. Also use standard performance counters from ASP.NET, WCF, and whatever other technologies you’re using. Get a feel for where and when things are slow or failing. This isn’t specific to caching, but it is an essential first step!
Do some simple maths on the number of hits your back-end resource is getting. For example, if you have a “GetCities” web service, the results of which are cached for 2 hours, and you have 4 front-end load balanced web servers each with their own cache, you should expect a maximum of 4 hits to that web service every 2 hours. If you’re seeing more than that, alarm bells should be ringing.
There are some great performance counters for the ASP.NET Cache API, so use those to monitor the state of the cache. Specifically the “ASP.NET Apps\Cache API Entries” counter is the number of items that are in the ASP.NET Cache right now. It should be broadly stable over longer periods with approximately the same load. If you cache an item per user, per region, or anything similar, be aware that this can dramatically affect your cache behaviour and memory consumption... in which case Cache Entries can be quite revealing.
The “ASP.NET Apps\Cache API Trims” counter is the number of cache items that have been removed due to a memory limit being hit (i.e. they were trimmed). Obviously this should ideally be very low or zero under normal operation. Too many trims could indicate you need to revisit your caching strategy, or manually configure cache memory limits.
The “ASP.NET Apps\Cache API Turnover Rate” counter shows the number of newly added and removed/expired/trimmed cache items per second. A target for this number depends on the behaviour of your application, but generally it should be fairly low – ideally items should be cached for quite some time before they expire or are removed. A high turnover rate could imply frequent trims, frequent explicit removals in application code, dependencies firing frequently (e.g. a SqlCacheDependency), or a sudden increase in user load (if, for example, you’re caching something per user).
Monitoring the “Process\Private Bytes” counter for the w3wp process (assuming you’re using IIS7) tells you how much memory IIS is using for your ASP.NET application. A cache trim is likely to show up as a sudden drop in consumed bytes, and equally you should be able to see how close it is to memory limits for your configuration.
It is initially a little easy to confuse the cause of a drop in consumed Private Bytes between heavy cache trimming and an Application Pool recycle, so you should also watch the “ASP.NET\Worker Process Restarts” performance counter to ensure you know which happened.
When you add items to the ASP.NET cache you can optionally specify a CacheItemRemovedCallback. This means that when the item is removed from the cache the call-back you’ve specified is called, passing in a CacheItemRemovedReason. This is great for adding some debugging instrumentation – if your item was trimmed due to memory pressure then the reason will be “underused”. Other reasons are Expired (you specified a time limit), Removed (you called Cache.Remove) and DependencyChanged (a file, SQL table, or other dependency setup with a CacheDependency was fired).
If you decide to add some logging using this approach I’d recommend adding a switch that enables or disables setting a call-back, as there is a small overhead in dealing with it.
There are two slightly mysterious sounding cache counters too. I must admit, at first it took me a few minutes to get my head around exactly what these counters meant.
The first is “ASP.NET Apps\Cache % Machine Memory Limit Used”. This is literally the current physical memory usage by the whole machine as a percentage of the configured (or default) maximum physical memory usage. What?! Well, if you have edited the “percentagePhysicalMemoryUsedLimit” setting to 60%, this means your machine can use up to 60% of its physical memory before cache trimming occurs (not a very realistic example I know!). This counter reports the current usage as a percentage of the maximum... so if your machine is using 40% of available physical RAM, and the limit is 60%, this counter would report 66% (40 divided by 60, multiplied by 100 to get a percentage). It is important to note that this is memory consumed across the whole machine – not just by ASP.NET.
The second is “ASP.NET Apps\Cache % Process Memory Limit Used”. This is the total Private Bytes in use by the current ASP.NET process as a percentage of the limit specified in “privateBytesLimit”. So if you set the limit at 400mb, and your process is currently using 350mb, that would be reported as 87.5% (350 divided by 400, multiplied by 100 to get a percentage).
If either of these counters hits 100% ASP.NET will almost immediately trim 50% of your cache by picking the least recently used items... so obviously you don’t want to be hitting these limits often.
Phew. Well I hope that’s useful stuff. The best advice I can give is to do some real sums. Many customers cache data items per user, plus huge lists of reference data that can grow over time. The end result can be caches that are crippled over a fixed concurrent user level, or take a long time to reload large reference data sets when they have been trimmed. It is more than possible to do some rough maths to work out how much memory your application is using for the cache, and how this changes according to user numbers, regions, languages, locations, roles, user types, base offices, or other parameters – and the results can be very illuminating.
Finally, remember that you should always test using hardware, configuration, and simulated user loads that are as close to live as you can possibly afford, as this gives you the best possible chance of catching problems early.