If broken it is, fix it you should

Using the powers of the debugger to solve the problems of the world - and a bag of chips    by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

ASP.NET Quiz Answers: Does Page.Cache leak memory?


A few days ago I posted a question I had gotten on email (look here for complete post):

 

"We use Page.Cache to store temporary data, but we have recently discovered that it causes high memory consumption. The bad thing is that the memory never goes down even though the cache items have expired, and we suspect a possible memory leak in its implementation.

We have created this simple page:

protected void Page_Load(object sender, EventArgs e){
     this.Page.Cache.Add(Guid.NewGuid().ToString(), Guid.NewGuid().ToString(), null, DateTime.MaxValue, TimeSpan.FromMinutes(1),
                                       CacheItemPriority.NotRemovable, new CacheItemRemovedCallback(this.OnRemoved));
}

public void OnRemoved(string key, object value, CacheItemRemovedReason r)
{
          value = null;
}

Which we stress with ACT (Application Center Test) for 5 minutes. Memory usage peaks at 450 MB, and after some time it decreases to 253 MB, but never goes down completely even though we waited for 10 minutes after the stress test. Our expectation is that the memory should go down to about 50-60 MB.

The question is does the above scenario fall into the category of memory leaks?"

I had thoroughly enjoyed the quizzes that Rico Mariani and Mike Stall had on their blogs, so I did a copycat... and I have to say I really liked the results, both because the answers were really good and contained details that I would probably have missed, and because they brought up some new questions that I hadn't thought of.

If you haven't looked at the quiz yet, I would recommend you do, and especially the comments...

Answers/Summary:

Before I start, I want to say that I am not so naive as to claim that our products are completely bug free; no software ever is... but when I get a proof and don't agree with the results (especially when it concerns such a commonly used feature as Cache) I get as suspicious as Dr. House :) and start scrutinizing the test.  This is partially why I brought this question up as a post to begin with, i.e. because whenever you do a stress test or a proof of concept, it is very important that you know the underlying platform in order to interpret the results correctly.

As I mentioned, there were a lot of good comments on the quiz, as well as many good questions, so I will divide the points into different sections.

1. The CacheItemRemovedCallback
2. The stress test and garbage collection
3. Sliding Expiration and Absolute Expiration 
4. CacheItemPriority.NotRemovable
5. Page.Cache vs. Cache vs. Application
6. Real-life scenario vs. Stress test for repro
7. A small comment on timers

The CacheItemRemovedCallback

The first thing I noticed when looking at the results was that 450 MB seemed like an insane amount for this test.  Even if nothing was removed from cache, the items stored in cache (GUIDs) are relatively small and would never amount to that much, so something smells very fishy.  As Matt correctly pointed out in this comment, we are running into an issue where we are connecting an instance event handler to a cache object. The callback delegate holds a reference to the page instance, which effectively causes us to cache the whole page and all its contents.

To get more info about this see my earlier post about "the eventhandlers that made the memory balloon".  In this particular case it is not necessary to set the value to null (that only clears the callback's own parameter anyway). The GUID will be un-rooted when it is removed.  If you have a situation where you do need to dispose the object that is stored in cache, you would have to use a static event handler of some kind.

With this minor change the same stress test peaks at 50 MB instead of 450 MB, which is a major improvement. But the objects are still not removed from memory when the cache entries expire... so on to the next point...
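As a rough sketch of what a static handler could look like (the class and method names here are made up for illustration, they are not from the original sample): because the callback is a static method, the delegate stored with the cache entry has no target instance, so it should not keep the page alive.

```csharp
using System;
using System.Web.Caching;

// Hypothetical helper class, not part of the original sample.
public static class CacheCallbacks
{
    // Static callback: the delegate has no target object, so the cache
    // entry does not hold a reference to a Page instance.
    public static void OnRemoved(string key, object value, CacheItemRemovedReason reason)
    {
        // Only needed if the cached value owns resources. Assumes nobody
        // else is still using the object when it leaves the cache.
        IDisposable disposable = value as IDisposable;
        if (disposable != null)
        {
            disposable.Dispose();
        }
    }

    // Same Cache.Add call as in the quiz sample, minus the instance callback.
    public static string AddGuid(Cache cache)
    {
        string key = Guid.NewGuid().ToString();
        cache.Add(key, Guid.NewGuid().ToString(), null,
                  Cache.NoAbsoluteExpiration, TimeSpan.FromMinutes(1),
                  CacheItemPriority.NotRemovable,
                  new CacheItemRemovedCallback(OnRemoved));
        return key;
    }
}
```

From a page you would then call something like CacheCallbacks.AddGuid(this.Cache) in Page_Load instead of passing this.OnRemoved.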

The stress test and garbage collection

When doing a stress test like this it is very important to understand a few things.  The first is how to interpret the results and the second is to understand the platform we are working with and the behavior of the garbage collector.

In this case the data that we looked at was memory usage for the process in Task Manager, or alternatively Private Bytes in Performance Monitor.  The question doesn't really say which, but in my own tests I was looking at Private Bytes in Performance Monitor.

What I would be really interested in is
a) are the objects removed properly from cache?
b) does the size of the managed heaps decrease (i.e. are the .net objects stored in cache actually collected)? and
c) what happens with the size of the process and what will happen if we run the test a second time, i.e. will it increase by the same amount or will memory be reused etc.

So I added the counter ASP.NET Apps v2.0.50727/Cache Total Entries, and saw that it increased gradually with the test, and then every so often I got a dip that created a sawtooth pattern in the Cache Total Entries, indicating that cache entries were being released and new ones came in.  Then I stopped the test and waited, and after approximately 1 minute there was a huge dip, and then after another minute there was another huge dip, and after about 5 minutes my Cache Total Entries count was down to 0.

Conclusion #1: My objects are no longer rooted by the cache and should be available for garbage collection and removal, but... they are never collected. Why oh why?

Petros pointed out that this is because after the stress test we had no activity, so nothing caused a GC, and thus nothing gets collected even if it is available for collection.  This is the most common mistake when performing a stress test (see my post about why un-rooted objects are not garbage collected here for more info).

So, what can we do? Well, if I am trying to stress a leak and want to verify whether it really is a leak, I usually introduce a page with the following code:

GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

And run this after the stress test.  Caution!  I wouldn't recommend calling this in production just to clean up memory, unless you have a really good reason and can make sure that it is not called so often that it throws off the GC's own mechanism for determining generation limits etc., or causes high CPU issues.  Think twice, and three and four times, before putting this in production.  It is great after a stress test though... to simulate the next full GC without having to create allocations to cause it.

With this in place I could see that my .NET CLR Memory/# Bytes in all Heaps went all the way down to 1.5 MB after the collection, which means that all the objects I cached etc. were gone, and everything was successful, so we don't have a leak in the Cache mechanism. Yay!!!

Conclusion #2: The objects would have been gone if a full collection had occurred.
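To see the same effect outside of ASP.NET, here is a small console sketch (all names are invented for the example, and a plain static list stands in for the cache). It roots about 50 MB, un-roots it, and compares the approximate managed heap size before and after a forced full collection. The exact numbers will vary from machine to machine.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical demo class; the list only stands in for the ASP.NET cache.
public static class GcAfterStressDemo
{
    private static readonly List<byte[]> FakeCache = new List<byte[]>();

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Fill(int count)
    {
        for (int i = 0; i < count; i++)
        {
            FakeCache.Add(new byte[1024]);
        }
    }

    // Returns { bytes before the forced collection, bytes after it }.
    public static long[] Run()
    {
        Fill(50000);        // roughly 50 MB rooted by the list
        FakeCache.Clear();  // un-rooted, like expired cache entries

        // Nothing has triggered a GC yet, so the dead arrays still count.
        long before = GC.GetTotalMemory(false);

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        long after = GC.GetTotalMemory(false);
        return new long[] { before, after };
    }

    public static void Main()
    {
        long[] result = Run();
        Console.WriteLine("Managed heap before: {0} bytes, after: {1} bytes",
                          result[0], result[1]);
    }
}
```

As with the page above, this is for test runs only; it is not something to sprinkle into production code.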

But, wait a minute... my private bytes didn't go down as much as I would have thought... hmmm...

JCornwell brought up an interesting point... Even if we do collect and the bytes in all heaps are nicely down to a minimum, we don't necessarily decommit all memory and return it to the OS.  If we had to do this all the time, performance would decrease significantly... but... it is not really a problem for us, it is just food for thought when looking at the results.  See, if I run the same test again, we will reuse this memory, so it is by no means leaked...

Conclusion #3: It is important to know how the garbage collector works and what the important counters and values are to correctly interpret a result.

Sliding Expiration and Absolute Expiration 

No one touched on this, but I wanted to bring it up because it was something that caught my eye when I saw the sample.  In this case we had a sliding expiration of 1 minute and an absolute expiration of DateTime.MaxValue... 

Ok, I have to admit, I don't know all the method signatures by heart, but just seeing the sample I was a bit confused about whether the cache items should expire after 1 minute or after whatever insanely long amount of time DateTime.MaxValue might be.  And I got even more confused when I looked it up in MSDN to see which took precedence, and found that I was supposed to get an ArgumentException if I used both a sliding and an absolute expiration, but I clearly didn't get an exception...

I even got so perturbed that I had to hook up windbg (surprise, surprise) and make super sure that I didn't get an exception, and even then I didn't trust it...

So I went to the code and found that DateTime.MaxValue is the same as Cache.NoAbsoluteExpiration... I later found out that this was documented in MSDN :) but I had more fun finding it out using Reflector.

Soooo... we were using the sliding expiration but I wanted to bring it up, since it confused me, and perhaps would confuse other people too...
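To make the intent obvious to the next reader, it may help to spell out the "no expiration" constants rather than DateTime.MaxValue. A small sketch of the two variants (the helper class and method names are made up for this example):

```csharp
using System;
using System.Web.Caching;

// Hypothetical helper class for illustration.
public static class ExpirationExamples
{
    // Sliding expiration: the item is removed once it has gone
    // one minute without being accessed.
    public static void InsertSliding(Cache cache, string key, object value)
    {
        cache.Insert(key, value, null,
                     Cache.NoAbsoluteExpiration, TimeSpan.FromMinutes(1));
    }

    // Absolute expiration: the item is removed one minute after it was
    // inserted, no matter how often it is read.
    public static void InsertAbsolute(Cache cache, string key, object value)
    {
        cache.Insert(key, value, null,
                     DateTime.UtcNow.AddMinutes(1), Cache.NoSlidingExpiration);
    }
}
```

Passing a real absolute date together with a non-zero sliding expiration is the combination that MSDN says throws the ArgumentException.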

CacheItemPriority.NotRemovable

This one caused a bit of discussion when Scott said he thought that it is better to not use NotRemovable and build in logic to re-populate the cache when needed.  I think it is a good point, but NotRemovable has its benefits too in some cases.

Just to clarify: NotRemovable means that the cache will not throw the item out before it has expired. If the cache item is removable, it is eligible for removal before its expiration if memory usage is high.

One specific location where it has a benefit is in ASP.NET's session implementation.  As you are probably all aware by now (from my ramblings in previous posts), InProc session state is stored in cache.  The session objects are stored with a CacheItemPriority of NotRemovable since there is no way to repopulate these if they are deleted.   I believe you should choose CacheItemPriority based on the cost and possibility of re-populating the cache.  But do feel free to disagree with me:)
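For data that can be rebuilt, the re-populate logic Scott described could look roughly like this. It is only a sketch: the key, the 10 minute timeout and LoadProductNames are placeholders I made up to stand in for whatever expensive lookup you are caching.

```csharp
using System;
using System.Web.Caching;

// Hypothetical example class; names and data are placeholders.
public static class ProductNameCache
{
    private const string Key = "ProductNames";

    public static string[] GetProductNames(Cache cache)
    {
        // The item may have been scavenged or expired, so always check.
        string[] names = cache[Key] as string[];
        if (names == null)
        {
            names = LoadProductNames();
            cache.Insert(Key, names, null,
                         Cache.NoAbsoluteExpiration, TimeSpan.FromMinutes(10),
                         CacheItemPriority.Normal, null);
        }
        return names;
    }

    // Placeholder for a database or web service call.
    private static string[] LoadProductNames()
    {
        return new string[] { "apple", "pear" };
    }
}
```

Session data has no equivalent of LoadProductNames to fall back on, which is why NotRemovable makes sense there.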

Page.Cache vs. Cache vs. Application

What I really wanted to get out of the question about the difference between Page.Cache and Cache was that there really is no difference. They are pointing to the same object.  The cache is application specific (appdomain specific), and in the event of an appdomain recycle it is emptied out.

The Application is very similar to the Cache, in that it is a static object with a dictionary like structure.  This is saved as a legacy from ASP, and I have yet to find a reason to use it instead of just using Cache.   

Real-life scenario vs. Stress test for repro

A few people mentioned that the test didn't seem realistic. I agree, but I also don't think the intent of the sample above was to be realistic, rather I think the person who wrote the email wrote the sample this way to quickly and easily determine if there was a memory leak, since it is a lot faster and cleaner than trying to repro with the full application. 

A small comment on timers

Finally, I just want to comment on a timers issue that I have mentioned before.  

If you run on .NET 1.1 and you see that your cache items aren't expiring (watch Cache Total Entries), you may be running into this problem: http://support.microsoft.com/kb/900822/en-us where timers are not firing properly. But that is not the case in my stress test.

Laters y'all.





  • Very nice Follow up. Keep em coming.

    -blake

  • Excellent. Thanks for this.
  • the 2.0 Framework is also facing the same issue.

  • Any version of the framework will.  It is not really a problem in the framework as such but a problem with hooking up event handlers and delegates to static/cache objects.

  • In August I wrote about how you could cause a nasty high memory situation by using CacheItemRemovedCallbacks

  • Can you advise what objects are stored in the asp.net "internal cache". I.e. the performance counter "Cache Total Entries" is a total of the counter "Cache API entries" and some other entries - "internal asp.net" cache entries.

    In our application, we have about 1000 entries in the API cache - some compiled xslt transforms (with 3 hour expiries) and some other smaller stuff, but our "Cache Total Entries" is peaking at about 15000.

    If we turn off "disableMemoryCollection" in machine/web.config then scavenging decides to cull everything in the cache.

    If we turn it on, privatebytes bloats up until it runs out of memory and we have to recycle our web app's worker process.

    It's all pointing to a bug in .net 2's implementation of cache scavenging.

    Any ideas what .net uses the cache for internally?

  • I was having a similar issue and stumbled across your quiz and consequently these answers.  The articles and comments were all a great read and I did learn a lot from this.  So thank you.

    My issue was that I was storing objects in cache, which, when expired, still consumed memory.

    I was under the assumption that the framework would automatically call the Dispose() method on any object which implemented IDisposable, when the object was expired from cache.

    This seems like a legitimate expectation, doesn't it?  Is there some reason why this isn't done?

    I ended up solving my problem by implementing a Shared sub in a global class, and passing AddressOf -sub- into the Cache.Insert, as the onRemove callback function.

    I would have assumed that we didn't have to go to all this trouble, if the object implemented the IDisposable interface.  I would have assumed that the framework would have been responsible enough to dispose of the object itself when the object was expired from cache.

    The work-around did the trick, but don't you think the framework should have handled this?

    I'd be interested to know if not, why?

  • I don't have enough information about why it was implemented this way to make any type of judgement on that, and I can see how people might jump to the conclusion that IDisposable objects are disposed/closed when thrown out of cache.

    However, if you think about it from a broader perspective, there really is no difference between an object going out of scope when a method ends, or when it gets thrown out of cache. For example if you use a connection in a method you would have to manually close it or use a using statement to get it auto disposed/closed.  If you don't it will stick around until finalized...

    I could see one problem with autodisposing items when they are no longer cached, if for example someone else has a reference to it you would end up with a situation where you would have a reference to an already disposed object, without knowing it... Something like that might be possible to check for, but there is always a tradeoff between perf and getting all the frills.

    Don't get me wrong, I see your point but I think that very careful consideration would have to be taken before making such a decision. Since off the top of my head I can already come up with one issue with it, I personally think the current solution, where the developer is given a choice to do something with the object when it goes out of scope may just be the cleanest one...

    Anyways, just my 2 cents, in reality I don't know enough about the decisions around this to talk intelligently about it.

  • I was helping a colleague out with an OOM (OutOfMemory) situation he was dealing with. Problem description:

  • Can you please reply to the comment "Friday, March 02, 2007 9:30 AM by dasveed"?  I have noticed the same types of numbers in my application.

  • Regarding dasveed's comment.

    The Cache Total entries includes both Cache API entries i.e. Cache.Add(...) or Cache["..."] = ...  , output cache, sessions etc.

    There are two counters that you can use to determine when the cache will be scavenged.

    Cache % Machine Memory Limit used = "The amount of physical memory used by the machine divided by the physical memory limit for the cache, as a percentage.  When this reaches 100%, half of the cache entries will be forcibly removed.  The __Total__ instance is the average of all instances, and therefore cannot be used to determine when cache entries will be forcibly removed."

    and

    Cache % Process Memory Limit Used = "The value of private bytes for the worker process divided by the private bytes memory limit for the cache, as a percentage.  When this reaches 100%, half of the cache entries will be forcibly removed.  The __Total__ instance is the average of all instances, and therefore cannot be used to determine when cache entries will be forcibly removed."

    You can partially decide when cache will be scavenged by setting the Memory limit settings on the application pool.  These values are used to determine if the cache should be scavenged or not. At that point it will scavenge as many cached items as it needs to recover memory.  

    To make sure a particular cached object stays cached until it expires you can set it to non-removable (which is done for the session for example).  All other cached objects should be seen as items that could potentially be removed if needed and should be tested for their existence when used.

    Re. the 2nd question about how the cache is stored internally.  Simplified, the cache is a static variable with a list of cacheitems referencing the actual cached items.  

    In-proc session state uses the cache by caching a session item per session with a dictionary of session vars.  The cached session is set to non-removable and has the expiration time set to a sliding expiration of for example 20 mins if that is the session timeout.

    hope that helps
