If broken it is, fix it you should

Using the powers of the debugger to solve the problems of the world - and a bag of chips    by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

.NET Debugging Demos Lab 7: Memory Leak - Review

.NET Debugging Demos Lab 7: Memory Leak - Review

Rate This
  • Comments 18

We have reached the end of the .NET Debugging Demos series.  And we are going to end it with a review of the last memory leak lab.

I am interested to get feedback, either in comments or through email in the contact me section, on what you liked, what you didn't like. If you felt that it was worth your while etc. so I know if there is an interest in possibly doing similar lab series in the future.  It will probably be a while before I would post a new series but if there is interest at least I can start thinking about it.  If that is the case, let me know what you thought was missing from this one so that I can put some good scenarios in the next one.

Previous demos and setup instructions

If you are new to the debugging labs, here you can find information on how to set up the labs as well as links to the previous labs in the series.

Information and setup instructions
Lab 1: Hang
Lab 1: Hang - review
Lab 2: Crash
Lab 2: Crash - review
Lab 3: Memory
Lab 3: Memory - review
Lab 4: High CPU hang
Lab 4: High CPU hang - review
Lab 5: Crash
Lab 5: Crash - review
Lab 6: Memory Leak
Lab 6: Memory Leak - review
Lab 7: Memory Leak

Problem description:

The problem description is very similar to Lab 6.

We have started getting out of memory exceptions on the buggy bits site and we have been able to determine a scenario in which we think we are leaking memory but we can't seem to figure out where the memory is going. 

The leak seems to be occurring on our News page for example

http://localhost/BuggyBits/News.aspx

and we can reproduce it by stress testing.  

It seems like it is leaking just a small bit every time but since it is something that customers look at a lot and over time the process will crash with an out of memory exception.  

Reproduce the issue and gather data:

1. Restart IIS (iisreset)

2. Browse to http://localhost/BuggyBits/News.aspx

3. Set up performance monitoring per Lab 3 and start monitoring the performance

4. Stress the application with tinyget (tinyget -srv:localhost -uri:/BuggyBits/News.aspx -threads:50 -loop:20)

5. After tinyget has finished, get a hangdump with adplus (adplus -hang -pn w3wp.exe -quiet)

6. Stop the performance monitor log

Review the performance monitor log to figure out what we are leaking:

1. Open up the performance monitor log in performance monitor and look at the following counters (set the scale appropriately so that you can see the graphs in the window)

Process\Private Bytes
Process\Virtual Bytes
.NET CLR Memory\# Bytes in all heaps
.NET CLR Memory\# Total committed bytes
.NET CLR Memory\# Total reserved bytes

2. Compare Private Bytes, Virtual Bytes and #Bytes in all heaps

Q: Do the graphs for these 3 counters follow eachother or do they diverge? Based on this, can you tell if the issue we are facing is a virtual bytes leak, a native leak or a .NET leak?

A: All the graphs follow eachother which means that the memory increase on the .NET heap is what is causing the overall memory growth.

Debug the memory dump

1. Open the memory dump, load up the symbols and load sos.dll (see information and setup instructions for more info)

Q: What is the size of the memory dump (on disk)

A: The size on disk in my case is 543 MB, which is pretty close to the amount of private bytes in the process.

2. Run !eeheap -gc and !dumpheap -stat

Q: What is the size of the .NET heap according to !eeheap -gc?

0:000> !eeheap -gc
Number of GC Heaps: 2
------------------------------
Heap 0 (001aa200)
generation 0 starts at 0x031b3f38
generation 1 starts at 0x02fc7b48
generation 2 starts at 0x02eb0038
ephemeral segment allocation context: none
 segment    begin allocated     size            reserved
001bd250 7a733370  7a754b98 0x00021828(137,256) 00004000
001b0798 790d8620  790f7d8c 0x0001f76c(128,876) 00004000
02eb0000 02eb0038  03650ff4 0x007a0fbc(7,999,420) 0385f000
Large object heap starts at 0x0aeb0038
 segment    begin allocated     size            reserved
0aeb0000 0aeb0038  0ce6acd8 0x01fbaca0(33,270,944) 00045000
11b30000 11b30038  13ada208 0x01faa1d0(33,202,640) 00055000
15c00000 15c00038  17baa208 0x01faa1d0(33,202,640) 00055000
19c00000 19c00038  1bbaa218 0x01faa1e0(33,202,656) 00055000
1dd90000 1dd90038  1fd3a218 0x01faa1e0(33,202,656) 00055000
23d90000 23d90038  25d3a218 0x01faa1e0(33,202,656) 00055000
25d90000 25d90038  2640c4d8 0x0067c4a0(6,800,544) 01983000
Heap Size  0xcc6b9d0(214,350,288)
------------------------------
Heap 1 (001ab218)
generation 0 starts at 0x070c5b1c
generation 1 starts at 0x06faf290
generation 2 starts at 0x06eb0038
ephemeral segment allocation context: none
 segment    begin allocated     size            reserved
06eb0000 06eb0038  0763eff4 0x0078efbc(7,925,692) 0386f000
Large object heap starts at 0x0ceb0038
 segment    begin allocated     size            reserved
0ceb0000 0ceb0038  0ee5a1e8 0x01faa1b0(33,202,608) 00055000
13c00000 13c00038  15baa208 0x01faa1d0(33,202,640) 00055000
17c00000 17c00038  19baa218 0x01faa1e0(33,202,656) 00055000
1bd90000 1bd90038  1dd3a218 0x01faa1e0(33,202,656) 00055000
1fd90000 1fd90038  21d3a218 0x01faa1e0(33,202,656) 00055000
21d90000 21d90038  238a6298 0x01b16260(28,402,272) 004e9000
Heap Size  0xc0f7b3c(202,341,180)
------------------------------
GC Heap Size  0x18d6350c(416,691,468)

A: The size of the .net heaps is ~417MB and comparing this to the size of the memory dump we can also see that most of the memory goes to the .net heaps 

Q: Is most of the memory stored on the regular heaps or on the large object heap?

A: From the !eeheap -gc output most of the memory goes to the large object heaps so it appears that we are creating quite a bit of large objects on here.

We saw from performance monitor that we appeared to be leaking .net memory, so the next step is to determine what the memory is used for.

3. Run !dumpheap -stat

0:000> !dumpheap -stat
Statistics:
      MT    Count    TotalSize Class Name
...
0ff61954    1,001      132,132 ASP.masterpage_master
6639e4c0    2,038      146,736 System.Web.Caching.CacheEntry
7a7582d8    8,074      161,480 System.Collections.Specialized.ListDictionary+DictionaryNode
663aa0a0    1,001      172,172 System.Web.HttpRequest
6639b220    9,388      187,760 System.Web.VirtualPath
663a9dac    1,001      188,188 System.Web.HttpContext
663a4adc    1,001      196,196 System.Web.HttpResponse
664245ac      591      196,212 System.Web.UI.HtmlTextWriter+RenderAttribute[]
663c7308    6,006      216,216 System.Web.UI.ControlCollection
663ad7c0    1,001      244,244 System.Web.Hosting.ISAPIWorkerRequestInProcForIIS6
7912dae8    3,404      258,168 System.Byte[]
6641f33c    6,006      264,264 System.Web.UI.Control+OccasionalFields
79104368   15,263      366,312 System.Collections.ArrayList
0ff60d2c    1,001      376,376 ASP.news_aspx
7912dd40    4,156      444,688 System.Char[]
79101fe4    8,016      448,896 System.Collections.Hashtable
664140a4    8,189      491,340 System.Web.UI.LiteralControl
7912d9bc    8,019    1,345,416 System.Collections.Hashtable+bucket[]
7912d8f8   25,495    2,448,752 System.Object[]
790fd8c4   56,124    4,122,820 System.String
7912d7c0    1,809  400,474,664 System.Int32[]
Total 303,433 objects, Total size: 416,611,124

 

Q: What types of objects seem to use up most of the memory?

A: out of the approximately 417 MB, 400 MB are locked up in Int32 arrays, so if we could just figure out what is causing these (or the largest of these arrays) to stick around we would probably resolve the issue.

Q: Looking at the 10-20 bottommost object types in !dumpheap -stat, can you see any patterns among the objects? I.e. can they be grouped in any way?
Normally you will be able to see patterns such as many data related objects, many ui related objects, many xml related objects etc. like in this memory investigation

A: Most of the 10-20 bottommost object types are UI or request related, i.e. objects that you would typically see hooked up to an asp.net page or an asp.net request.  I have marked these in maroon.

Q: Looking at the 10-20 bottom most object types in !dumpheap -stat, do the quantities of each of them seem normal or does anything seem out of the ordinary?

A: The most obvious one is the Int32 arrays, but the fact that we have 1001 news.aspx pages is also very curious.  We ran the test with 50*20 (1000) requests, and I had browsed to http://localhost/buggybits/news.aspx once before running the test to start up the w3wp.exe process so it looks like all the requests are still sticking around.  Normally you should expect to see only the requests that are currently executing stick around or perhaps a few more, but definitely not this many.  If I run ~* e !clrstack to see all .net stacks I have no requests currently executing.

4. Dump the large objects using !dumpheap -min 85000

Q: What type of objects are stored on the LOH?

0:000> !dumpheap -min 85000
Using our cache to search the heap.
   Address         MT     Size  Gen
...
235372f8 7912d7c0  400,012    3 System.Int32[] 
23598d98 7912d7c0  400,012    3 System.Int32[] 
235fa838 7912d7c0  400,012    3 System.Int32[] 
2365c2d8 7912d7c0  400,012    3 System.Int32[] 
236bdd78 7912d7c0  400,012    3 System.Int32[] 
2371f818 7912d7c0  400,012    3 System.Int32[] 
237812b8 7912d7c0  400,012    3 System.Int32[] 
237e2d58 7912d7c0  400,012    3 System.Int32[] 
238447f8 7912d7c0  400,012    3 System.Int32[] 
Statistics:
      MT    Count    TotalSize Class Name
7912d7c0    1,001  400,412,012 System.Int32[]
Total 1,001 objects, Total size: 400,412,012

A: Most (or all in this case) LOH objects are Int32 arrays and they all seem to have approximately the same size. What is even more interesting is that there are
1001 of them (same number as our aspx pages) so they could be connected somehow.

5. Take any of the objects on the LOH and run !gcroot <object address> to find out where they are rooted

0:000> !gcroot 236bdd78 
...
DOMAIN(001CCE68):HANDLE(Strong):2c510a8:Root:  06eb6238(System.Threading._TimerCallback)->
  06eb61f0(System.Threading.TimerCallback)->
  06eb52a0(System.Web.Caching.CacheExpires)->
  06eb52c0(System.Object[])->
  06eb5ab4(System.Web.Caching.ExpiresBucket)->
  02f7a698(System.Web.Caching.ExpiresPage[])->
  0761c9b0(System.Web.Caching.ExpiresEntry[])->
  0762a0e0(System.Web.Caching.CacheEntry)->
  0762a0c0(System.Web.Caching.CacheItemRemovedCallback)->
  07628624(ASP.news_aspx)->
  236bdd78(System.Int32[])

GCRoot will check all "roots" in the process and if the object you are looking for is in any of the "root chains" it will list this root chain.

Q: What does DOMAIN(001CCA88):HANDLE(Strong) stand for?

A: Roots exist in two places, there are roots on threads (basically if the object is in use on a stack), and then there is the domain specific roots.  In this case we have Domain(001CCE68):Handle(Strong)... the strong handle indicates that the root object (in this case a TimerCallback) is a static variable in the domain 001CCE68.  You can find out more about the domain by running !dumpdomain 001CCE68

Q: What types of roots would you normally see when running !gcroot and what do they mean?

Type Explanation
ESP ESP=Extended Stack Pointer, Object is in use on a stack
DOMAIN(001CCE68):HANDLE(Strong) Strong reference, Typically a static variable
DOMAIN(001CCE68):HANDLE(WeakLn) Weak Long Handle, A weak reference that is tracked through finalization (can be resurrected) 
DOMAIN(001CCE68):HANDLE(WeakSh) Weak Short Handle, A weak reference, can't be resurrected
DOMAIN(001CCE68):HANDLE(Pinned) Pinned object, pinned at a specific address, can't move around during garbage collection.
DOMAIN(001CCE68):HANDLE(RefCnt) Reference count, referenced as long as the reference count is > 0.

An example of a rootchain would look like this (if we are looking to see why the string at address 02ebf628 is still around)

!gcroot 02ebf628
...
DOMAIN(001CCA88):HANDLE(Strong):2c41188:Root: 02ec05c0(MyNamespace.Person)->
02ec0578(MyNamespace.Address)->
02ebf628(System.String)

In this case we have a strongly rooted (static) object of type MyNamespace.Person at 02ec05c0.  Person has a member variable of type Address at 02ec0578 and Address has a membervariable of type string at 02ebf628, so the reason that the string is still around is because it is linked to an Address that is linked to a Person and that Person is stored as a static variable.  Unless the chain is broken the string will not be available for collection until the application domain is recycled since static objects never go out of scope.

6. Look at the rootchain of the object that you did !gcroot on. 

Q: Why is it sticking around?  How could the root chain be broken?

Note: If you walk it from the bottom up and reach an object type that you don't recognize, look it up on the msdn help to get more info about it in order to understand why there is a link.

A: The Int32[] is a member variable of a news_aspx page, and this in turn is a member variable of a CacheItemRemovedCallback which is referenced by a CacheEntry so as long as the cacheentry exists and the link is not severed the object will stay around.  The question is really what is a CacheItemRemovedCallback and why is the page referenced there...

Let's look at the objects involved in a bit more detail...

The CacheEntry has the cache key "aaf95c8a-6f8a-4def-a945-e94877360508" and the value "<I>New site launched 2008-02-02</I>".  

0:000> !do 0762a0e0
Name: System.Web.Caching.CacheEntry
...
      MT    Field   Offset                 Type VT     Attr    Value Name
790fd8c4  40013ef        4        System.String  0 instance 0762a064 _key
79101e20  40013f0        c          System.Byte  0 instance       34 _bits
79102290  40013f1        8         System.Int32  0 instance 1539970699 _hashCode
790fd0f0  40013f8       10        System.Object  0 instance 06ee9b98 _value
...
6639ebb8  4001401       14 ...g.CacheDependency  0 instance 00000000 _dependency
790fd0f0  4001402       18        System.Object  0 instance 0762a0c0 _onRemovedTargets

0:000> !do 0762a064 
Name: System.String
...
String: aaf95c8a-6f8a-4def-a945-e94877360508
...

0:000> !do 06ee9b98 
Name: System.String
...
String: New site launched 2008-02-02
...

If we look at news.aspx.cs we will find that this is cached in the Page_Load event

    protected void Page_Load(object sender, EventArgs e)
    {
        string news = "<I>New site launched 2008-02-02</I>";
        string key = Guid.NewGuid().ToString();
        Cache.Add(key, news, null, Cache.NoAbsoluteExpiration, new TimeSpan(0, 5, 0), CacheItemPriority.NotRemovable, new CacheItemRemovedCallback(this.RemovedCallback)); 
        lblNews.Text = ((string)Cache[key]);
    }

The next item in the root chain was the CacheItemRemovedCallback and that is linked to the CacheEntry as the _onRemovedTargets,  looking at the code above we see that we do in fact add a new CacheItemRemovedCallback in the Cache.Add method, so that explains why it is linked like this...

0:000> !do 0762a0c0 
Name: System.Web.Caching.CacheItemRemovedCallback
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790fd0f0  40000ff        4        System.Object  0 instance 07628624 _target
7910ebc8  4000100        8 ...ection.MethodBase  0 instance 00000000 _methodBase
791016bc  4000101        c        System.IntPtr  0 instance 46383656 _methodPtr
791016bc  4000102       10        System.IntPtr  0 instance        0 _methodPtrAux
790fd0f0  400010c       14        System.Object  0 instance 00000000 _invocationList
791016bc  400010d       18        System.IntPtr  0 instance        0 _invocationCount

The target membervariable points to a news_aspx object.  The _target member variable for a delegate/eventhandler is the object instance that should be used to call the eventhandler.  
Going back to the code above, the method that handles the CacheItemRemovedCallback (i.e. the function to be called when the cached object is removed from cache) is this.RemovedCallback ("new CacheItemRemovedCallback(this.RemovedCallback)").  In other words, our target is _this (which is the nwes_aspx page itself, and the _methodPtr should point to RemovedCallback.  So the page will be linked to the cacheEntry as long as it is still in cache in order to be able to call the RemovedCallback when it goes out of scope.

0:000> !do 07628624 
Name: ASP.news_aspx
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790fd8c4  4002144        4        System.String  0 instance 02f1bd08 _id
790fd8c4  4002145        8        System.String  0 instance 00000000 _cachedUniqueID
...
663bd0a0  4000017      16c ...WebControls.Label  0 instance 07629e88 lblNews
7912d7c0  4000018      170       System.Int32[]  0 instance 236bdd78 bits
7910be50  4000019       38       System.Boolean  0   static        1 __initialized
790fd0f0  400001a       1c        System.Object  0   static 02f1d36c __fileDependencies

Finally the Int32[] is linked to the news_aspx object as the membervariable bits.

public partial class News : System.Web.UI.Page
{
    int[] bits = new int[100000]; 
0:000> !do 236bdd78 
Name: System.Int32[]
MethodTable: 7912d7c0
EEClass: 7912d878
Size: 400012(0x61a8c) bytes
Array: Rank 1, Number of elements 100000, Type Int32
Element Type: System.Int32
Fields:
None

Putting it all together and determining the cause of the memory leak

1. Look at the code for news.aspx.cs and use the knowledge you gathered from the debugging session to figure out how the leak was generated.

Q: How long will the objects stick around?

A: Until the object is dropped from cache, in this case 5 minutes

Q: What can you do to avoid this type of eventhandler leak?

A: Any time you hook up an eventhandler to a static object or a long lived object it is better to hook up a static method as the eventhandler rather than an instance method.  That way there is no object that needs to stick around for the eventhandler to still be valid.

In this particular case, everytime we hit the news.aspx page and add the item to cache we will "store" an instance of the news_aspx page in cache and everything it references.  The easy solution here would be to make the RemovedCallback static and call News.RemovedCallback rather than this.RemovedCallback.

I have described this particular issue and another similar one in the following articles:

.NET Memory Leak Case Study: The Event Handlers That Made The Memory Baloon

ASP.NET Quiz Answers: Does Page.Cache leak memory?

You should also be careful to not store web controls or any other objects in cache that may have references that you are not aware of.  Storing a web control in cache for example will also store a page in cache since the web control usually has a _parent property that points to the aspx page.

Laters everyone,

Tess

  • according to what you say above: "A: Any time you hook up an eventhandler to a static object or a long lived object it is better to hook up a static method as the eventhandler rather than an instance method.  That way there is no object that needs to stick around for the eventhandler to still be valid.

    In this particular case, everytime we hit the news.aspx page and add the item to cache we will "store" an instance of the news_aspx page in cache and everything it references.  The easy solution here would be to make the RemovedCallback static and call News.RemovedCallback rather than this.RemovedCallback.

    "

    The following code posted in MSDN it is wrong?

    public void AddItemToCache(Object sender, EventArgs e) {

       itemRemoved = false;

       onRemove = new CacheItemRemovedCallback(this.RemovedCallback);

       if (Cache["Key1"] == null)

         Cache.Add("Key1", "Value 1", null, DateTime.Now.AddSeconds(60), Cache.NoSlidingExpiration, CacheItemPriority.High, onRemove);

    }

  • Thats correct elcarlos14,  

    If you look at the bottom of http://msdn.microsoft.com/en-us/library/system.web.caching.cacheitemremovedcallback.aspx  you will note my comment at the bottom explaining the issue.

  • Hi Tess,

    your labs are really greats and help me to discover and solve issues found in my applications.

    I would ask you if you have some triks to suggest me in order to analyze a large memory dump file of about 1GB.

    The main problem I'm facing out is due to the time taken by "dumpheap -mt" command to analyze and return about 765k MT objects.

Page 2 of 2 (18 items) 12
Leave a Comment
  • Please add 8 and 7 and type the answer here:
  • Post