If broken it is, fix it you should

Using the powers of the debugger to solve the problems of the world - and a bag of chips    by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

.NET Debugging Demos Lab 7: Memory Leak - Review

.NET Debugging Demos Lab 7: Memory Leak - Review

Rate This
  • Comments 18

We have reached the end of the .NET Debugging Demos series.  And we are going to end it with a review of the last memory leak lab.

I am interested to get feedback, either in comments or through email in the contact me section, on what you liked, what you didn't like. If you felt that it was worth your while etc. so I know if there is an interest in possibly doing similar lab series in the future.  It will probably be a while before I would post a new series but if there is interest at least I can start thinking about it.  If that is the case, let me know what you thought was missing from this one so that I can put some good scenarios in the next one.

Previous demos and setup instructions

If you are new to the debugging labs, here you can find information on how to set up the labs as well as links to the previous labs in the series.

Information and setup instructions
Lab 1: Hang
Lab 1: Hang - review
Lab 2: Crash
Lab 2: Crash - review
Lab 3: Memory
Lab 3: Memory - review
Lab 4: High CPU hang
Lab 4: High CPU hang - review
Lab 5: Crash
Lab 5: Crash - review
Lab 6: Memory Leak
Lab 6: Memory Leak - review
Lab 7: Memory Leak

Problem description:

The problem description is very similar to Lab 6.

We have started getting out of memory exceptions on the buggy bits site and we have been able to determine a scenario in which we think we are leaking memory but we can't seem to figure out where the memory is going. 

The leak seems to be occurring on our News page for example

http://localhost/BuggyBits/News.aspx

and we can reproduce it by stress testing.  

It seems like it is leaking just a small bit every time but since it is something that customers look at a lot and over time the process will crash with an out of memory exception.  

Reproduce the issue and gather data:

1. Restart IIS (iisreset)

2. Browse to http://localhost/BuggyBits/News.aspx

3. Set up performance monitoring per Lab 3 and start monitoring the performance

4. Stress the application with tinyget (tinyget -srv:localhost -uri:/BuggyBits/News.aspx -threads:50 -loop:20)

5. After tinyget has finished, get a hangdump with adplus (adplus -hang -pn w3wp.exe -quiet)

6. Stop the performance monitor log

Review the performance monitor log to figure out what we are leaking:

1. Open up the performance monitor log in performance monitor and look at the following counters (set the scale appropriately so that you can see the graphs in the window)

Process\Private Bytes
Process\Virtual Bytes
.NET CLR Memory\# Bytes in all heaps
.NET CLR Memory\# Total committed bytes
.NET CLR Memory\# Total reserved bytes

2. Compare Private Bytes, Virtual Bytes and #Bytes in all heaps

Q: Do the graphs for these 3 counters follow eachother or do they diverge? Based on this, can you tell if the issue we are facing is a virtual bytes leak, a native leak or a .NET leak?

A: All the graphs follow eachother which means that the memory increase on the .NET heap is what is causing the overall memory growth.

Debug the memory dump

1. Open the memory dump, load up the symbols and load sos.dll (see information and setup instructions for more info)

Q: What is the size of the memory dump (on disk)

A: The size on disk in my case is 543 MB, which is pretty close to the amount of private bytes in the process.

2. Run !eeheap -gc and !dumpheap -stat

Q: What is the size of the .NET heap according to !eeheap -gc?

0:000> !eeheap -gc
Number of GC Heaps: 2
------------------------------
Heap 0 (001aa200)
generation 0 starts at 0x031b3f38
generation 1 starts at 0x02fc7b48
generation 2 starts at 0x02eb0038
ephemeral segment allocation context: none
 segment    begin allocated     size            reserved
001bd250 7a733370  7a754b98 0x00021828(137,256) 00004000
001b0798 790d8620  790f7d8c 0x0001f76c(128,876) 00004000
02eb0000 02eb0038  03650ff4 0x007a0fbc(7,999,420) 0385f000
Large object heap starts at 0x0aeb0038
 segment    begin allocated     size            reserved
0aeb0000 0aeb0038  0ce6acd8 0x01fbaca0(33,270,944) 00045000
11b30000 11b30038  13ada208 0x01faa1d0(33,202,640) 00055000
15c00000 15c00038  17baa208 0x01faa1d0(33,202,640) 00055000
19c00000 19c00038  1bbaa218 0x01faa1e0(33,202,656) 00055000
1dd90000 1dd90038  1fd3a218 0x01faa1e0(33,202,656) 00055000
23d90000 23d90038  25d3a218 0x01faa1e0(33,202,656) 00055000
25d90000 25d90038  2640c4d8 0x0067c4a0(6,800,544) 01983000
Heap Size  0xcc6b9d0(214,350,288)
------------------------------
Heap 1 (001ab218)
generation 0 starts at 0x070c5b1c
generation 1 starts at 0x06faf290
generation 2 starts at 0x06eb0038
ephemeral segment allocation context: none
 segment    begin allocated     size            reserved
06eb0000 06eb0038  0763eff4 0x0078efbc(7,925,692) 0386f000
Large object heap starts at 0x0ceb0038
 segment    begin allocated     size            reserved
0ceb0000 0ceb0038  0ee5a1e8 0x01faa1b0(33,202,608) 00055000
13c00000 13c00038  15baa208 0x01faa1d0(33,202,640) 00055000
17c00000 17c00038  19baa218 0x01faa1e0(33,202,656) 00055000
1bd90000 1bd90038  1dd3a218 0x01faa1e0(33,202,656) 00055000
1fd90000 1fd90038  21d3a218 0x01faa1e0(33,202,656) 00055000
21d90000 21d90038  238a6298 0x01b16260(28,402,272) 004e9000
Heap Size  0xc0f7b3c(202,341,180)
------------------------------
GC Heap Size  0x18d6350c(416,691,468)

A: The size of the .net heaps is ~417MB and comparing this to the size of the memory dump we can also see that most of the memory goes to the .net heaps 

Q: Is most of the memory stored on the regular heaps or on the large object heap?

A: From the !eeheap -gc output most of the memory goes to the large object heaps so it appears that we are creating quite a bit of large objects on here.

We saw from performance monitor that we appeared to be leaking .net memory, so the next step is to determine what the memory is used for.

3. Run !dumpheap -stat

0:000> !dumpheap -stat
Statistics:
      MT    Count    TotalSize Class Name
...
0ff61954    1,001      132,132 ASP.masterpage_master
6639e4c0    2,038      146,736 System.Web.Caching.CacheEntry
7a7582d8    8,074      161,480 System.Collections.Specialized.ListDictionary+DictionaryNode
663aa0a0    1,001      172,172 System.Web.HttpRequest
6639b220    9,388      187,760 System.Web.VirtualPath
663a9dac    1,001      188,188 System.Web.HttpContext
663a4adc    1,001      196,196 System.Web.HttpResponse
664245ac      591      196,212 System.Web.UI.HtmlTextWriter+RenderAttribute[]
663c7308    6,006      216,216 System.Web.UI.ControlCollection
663ad7c0    1,001      244,244 System.Web.Hosting.ISAPIWorkerRequestInProcForIIS6
7912dae8    3,404      258,168 System.Byte[]
6641f33c    6,006      264,264 System.Web.UI.Control+OccasionalFields
79104368   15,263      366,312 System.Collections.ArrayList
0ff60d2c    1,001      376,376 ASP.news_aspx
7912dd40    4,156      444,688 System.Char[]
79101fe4    8,016      448,896 System.Collections.Hashtable
664140a4    8,189      491,340 System.Web.UI.LiteralControl
7912d9bc    8,019    1,345,416 System.Collections.Hashtable+bucket[]
7912d8f8   25,495    2,448,752 System.Object[]
790fd8c4   56,124    4,122,820 System.String
7912d7c0    1,809  400,474,664 System.Int32[]
Total 303,433 objects, Total size: 416,611,124

 

Q: What types of objects seem to use up most of the memory?

A: out of the approximately 417 MB, 400 MB are locked up in Int32 arrays, so if we could just figure out what is causing these (or the largest of these arrays) to stick around we would probably resolve the issue.

Q: Looking at the 10-20 bottommost object types in !dumpheap -stat, can you see any patterns among the objects? I.e. can they be grouped in any way?
Normally you will be able to see patterns such as many data related objects, many ui related objects, many xml related objects etc. like in this memory investigation

A: Most of the 10-20 bottommost object types are UI or request related, i.e. objects that you would typically see hooked up to an asp.net page or an asp.net request.  I have marked these in maroon.

Q: Looking at the 10-20 bottom most object types in !dumpheap -stat, do the quantities of each of them seem normal or does anything seem out of the ordinary?

A: The most obvious one is the Int32 arrays, but the fact that we have 1001 news.aspx pages is also very curious.  We ran the test with 50*20 (1000) requests, and I had browsed to http://localhost/buggybits/news.aspx once before running the test to start up the w3wp.exe process so it looks like all the requests are still sticking around.  Normally you should expect to see only the requests that are currently executing stick around or perhaps a few more, but definitely not this many.  If I run ~* e !clrstack to see all .net stacks I have no requests currently executing.

4. Dump the large objects using !dumpheap -min 85000

Q: What type of objects are stored on the LOH?

0:000> !dumpheap -min 85000
Using our cache to search the heap.
   Address         MT     Size  Gen
...
235372f8 7912d7c0  400,012    3 System.Int32[] 
23598d98 7912d7c0  400,012    3 System.Int32[] 
235fa838 7912d7c0  400,012    3 System.Int32[] 
2365c2d8 7912d7c0  400,012    3 System.Int32[] 
236bdd78 7912d7c0  400,012    3 System.Int32[] 
2371f818 7912d7c0  400,012    3 System.Int32[] 
237812b8 7912d7c0  400,012    3 System.Int32[] 
237e2d58 7912d7c0  400,012    3 System.Int32[] 
238447f8 7912d7c0  400,012    3 System.Int32[] 
Statistics:
      MT    Count    TotalSize Class Name
7912d7c0    1,001  400,412,012 System.Int32[]
Total 1,001 objects, Total size: 400,412,012

A: Most (or all in this case) LOH objects are Int32 arrays and they all seem to have approximately the same size. What is even more interesting is that there are
1001 of them (same number as our aspx pages) so they could be connected somehow.

5. Take any of the objects on the LOH and run !gcroot <object address> to find out where they are rooted

0:000> !gcroot 236bdd78 
...
DOMAIN(001CCE68):HANDLE(Strong):2c510a8:Root:  06eb6238(System.Threading._TimerCallback)->
  06eb61f0(System.Threading.TimerCallback)->
  06eb52a0(System.Web.Caching.CacheExpires)->
  06eb52c0(System.Object[])->
  06eb5ab4(System.Web.Caching.ExpiresBucket)->
  02f7a698(System.Web.Caching.ExpiresPage[])->
  0761c9b0(System.Web.Caching.ExpiresEntry[])->
  0762a0e0(System.Web.Caching.CacheEntry)->
  0762a0c0(System.Web.Caching.CacheItemRemovedCallback)->
  07628624(ASP.news_aspx)->
  236bdd78(System.Int32[])

GCRoot will check all "roots" in the process and if the object you are looking for is in any of the "root chains" it will list this root chain.

Q: What does DOMAIN(001CCA88):HANDLE(Strong) stand for?

A: Roots exist in two places, there are roots on threads (basically if the object is in use on a stack), and then there is the domain specific roots.  In this case we have Domain(001CCE68):Handle(Strong)... the strong handle indicates that the root object (in this case a TimerCallback) is a static variable in the domain 001CCE68.  You can find out more about the domain by running !dumpdomain 001CCE68

Q: What types of roots would you normally see when running !gcroot and what do they mean?

Type Explanation
ESP ESP=Extended Stack Pointer, Object is in use on a stack
DOMAIN(001CCE68):HANDLE(Strong) Strong reference, Typically a static variable
DOMAIN(001CCE68):HANDLE(WeakLn) Weak Long Handle, A weak reference that is tracked through finalization (can be resurrected) 
DOMAIN(001CCE68):HANDLE(WeakSh) Weak Short Handle, A weak reference, can't be resurrected
DOMAIN(001CCE68):HANDLE(Pinned) Pinned object, pinned at a specific address, can't move around during garbage collection.
DOMAIN(001CCE68):HANDLE(RefCnt) Reference count, referenced as long as the reference count is > 0.

An example of a rootchain would look like this (if we are looking to see why the string at address 02ebf628 is still around)

!gcroot 02ebf628
...
DOMAIN(001CCA88):HANDLE(Strong):2c41188:Root: 02ec05c0(MyNamespace.Person)->
02ec0578(MyNamespace.Address)->
02ebf628(System.String)

In this case we have a strongly rooted (static) object of type MyNamespace.Person at 02ec05c0.  Person has a member variable of type Address at 02ec0578 and Address has a membervariable of type string at 02ebf628, so the reason that the string is still around is because it is linked to an Address that is linked to a Person and that Person is stored as a static variable.  Unless the chain is broken the string will not be available for collection until the application domain is recycled since static objects never go out of scope.

6. Look at the rootchain of the object that you did !gcroot on. 

Q: Why is it sticking around?  How could the root chain be broken?

Note: If you walk it from the bottom up and reach an object type that you don't recognize, look it up on the msdn help to get more info about it in order to understand why there is a link.

A: The Int32[] is a member variable of a news_aspx page, and this in turn is a member variable of a CacheItemRemovedCallback which is referenced by a CacheEntry so as long as the cacheentry exists and the link is not severed the object will stay around.  The question is really what is a CacheItemRemovedCallback and why is the page referenced there...

Let's look at the objects involved in a bit more detail...

The CacheEntry has the cache key "aaf95c8a-6f8a-4def-a945-e94877360508" and the value "<I>New site launched 2008-02-02</I>".  

0:000> !do 0762a0e0
Name: System.Web.Caching.CacheEntry
...
      MT    Field   Offset                 Type VT     Attr    Value Name
790fd8c4  40013ef        4        System.String  0 instance 0762a064 _key
79101e20  40013f0        c          System.Byte  0 instance       34 _bits
79102290  40013f1        8         System.Int32  0 instance 1539970699 _hashCode
790fd0f0  40013f8       10        System.Object  0 instance 06ee9b98 _value
...
6639ebb8  4001401       14 ...g.CacheDependency  0 instance 00000000 _dependency
790fd0f0  4001402       18        System.Object  0 instance 0762a0c0 _onRemovedTargets

0:000> !do 0762a064 
Name: System.String
...
String: aaf95c8a-6f8a-4def-a945-e94877360508
...

0:000> !do 06ee9b98 
Name: System.String
...
String: New site launched 2008-02-02
...

If we look at news.aspx.cs we will find that this is cached in the Page_Load event

    protected void Page_Load(object sender, EventArgs e)
    {
        string news = "<I>New site launched 2008-02-02</I>";
        string key = Guid.NewGuid().ToString();
        Cache.Add(key, news, null, Cache.NoAbsoluteExpiration, new TimeSpan(0, 5, 0), CacheItemPriority.NotRemovable, new CacheItemRemovedCallback(this.RemovedCallback)); 
        lblNews.Text = ((string)Cache[key]);
    }

The next item in the root chain was the CacheItemRemovedCallback and that is linked to the CacheEntry as the _onRemovedTargets,  looking at the code above we see that we do in fact add a new CacheItemRemovedCallback in the Cache.Add method, so that explains why it is linked like this...

0:000> !do 0762a0c0 
Name: System.Web.Caching.CacheItemRemovedCallback
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790fd0f0  40000ff        4        System.Object  0 instance 07628624 _target
7910ebc8  4000100        8 ...ection.MethodBase  0 instance 00000000 _methodBase
791016bc  4000101        c        System.IntPtr  0 instance 46383656 _methodPtr
791016bc  4000102       10        System.IntPtr  0 instance        0 _methodPtrAux
790fd0f0  400010c       14        System.Object  0 instance 00000000 _invocationList
791016bc  400010d       18        System.IntPtr  0 instance        0 _invocationCount

The target membervariable points to a news_aspx object.  The _target member variable for a delegate/eventhandler is the object instance that should be used to call the eventhandler.  
Going back to the code above, the method that handles the CacheItemRemovedCallback (i.e. the function to be called when the cached object is removed from cache) is this.RemovedCallback ("new CacheItemRemovedCallback(this.RemovedCallback)").  In other words, our target is _this (which is the nwes_aspx page itself, and the _methodPtr should point to RemovedCallback.  So the page will be linked to the cacheEntry as long as it is still in cache in order to be able to call the RemovedCallback when it goes out of scope.

0:000> !do 07628624 
Name: ASP.news_aspx
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790fd8c4  4002144        4        System.String  0 instance 02f1bd08 _id
790fd8c4  4002145        8        System.String  0 instance 00000000 _cachedUniqueID
...
663bd0a0  4000017      16c ...WebControls.Label  0 instance 07629e88 lblNews
7912d7c0  4000018      170       System.Int32[]  0 instance 236bdd78 bits
7910be50  4000019       38       System.Boolean  0   static        1 __initialized
790fd0f0  400001a       1c        System.Object  0   static 02f1d36c __fileDependencies

Finally the Int32[] is linked to the news_aspx object as the membervariable bits.

public partial class News : System.Web.UI.Page
{
    int[] bits = new int[100000]; 
0:000> !do 236bdd78 
Name: System.Int32[]
MethodTable: 7912d7c0
EEClass: 7912d878
Size: 400012(0x61a8c) bytes
Array: Rank 1, Number of elements 100000, Type Int32
Element Type: System.Int32
Fields:
None

Putting it all together and determining the cause of the memory leak

1. Look at the code for news.aspx.cs and use the knowledge you gathered from the debugging session to figure out how the leak was generated.

Q: How long will the objects stick around?

A: Until the object is dropped from cache, in this case 5 minutes

Q: What can you do to avoid this type of eventhandler leak?

A: Any time you hook up an eventhandler to a static object or a long lived object it is better to hook up a static method as the eventhandler rather than an instance method.  That way there is no object that needs to stick around for the eventhandler to still be valid.

In this particular case, everytime we hit the news.aspx page and add the item to cache we will "store" an instance of the news_aspx page in cache and everything it references.  The easy solution here would be to make the RemovedCallback static and call News.RemovedCallback rather than this.RemovedCallback.

I have described this particular issue and another similar one in the following articles:

.NET Memory Leak Case Study: The Event Handlers That Made The Memory Baloon

ASP.NET Quiz Answers: Does Page.Cache leak memory?

You should also be careful to not store web controls or any other objects in cache that may have references that you are not aware of.  Storing a web control in cache for example will also store a page in cache since the web control usually has a _parent property that points to the aspx page.

Laters everyone,

Tess

  • PingBack from http://blogs.msdn.com/tess/pages/net-debugging-demos-information-and-setup-instructions.aspx

  • Thanks for the series of posts on debugging.

    I have more insight in this than before.

    Josh Coswell

    http://riverasp.net

  • Tess,

    Thanks for the great post! Just one question:

    This example works well because the cache key is always different (because it's a new guid each time). What would happen if the cache key were a hard-coded string? If the RemovedCallback method was static, wouldn't it get called once (for each time the handler was added) since cache item is added in the page load? Here's a pseudo-code example of what I'm talking about:

    private static void RemoveCallback(){}

    //Request 1

    CacheItem.Removed += RemoveCallback;

    //Request 2

    CacheItem.Removed += RemoveCallback;

    //Cache Item Expires

    //Does RemoveCallback is called twice???

    Thanks,

    -Mark

  • Thanks very much for the outstanding series on debugging .NET applications. The scenarios covered, troubleshooting techniques and the organization of content were excellent . Your labs along with "Debugging .NET 2.0 Applications" are a must read for .NET developers.  It would be interesting to see some more labs on debugging threading related issues like deadlocks & race conditions. Hope to see more great articles from you. Thanks again!

  • Hi Mark,

    Good question.

    There are two different answers to your question...

    1.

    if you just do CacheItem.Removed += RemovedCallback in a loop or keep doing it on each request you would add many eventhandlers to one CacheItem.Removed and then all the pages associated with those eventhandlers would stick around.

    2. if you do Cache.Add("myKey", ..., new CacheItemRemovedCallback...)  where "myKey" stays the same everytime only the last news_aspx page would have to stick around.  Why? because if you do Cache.Add on a key that already exists, you will remove the old value from cache and add the new one, so the old one no longer exists, and so the onRemove target for the old one has no reason to stick around anymore.

  • ASP.NET Client Side Templating with jQuery [Via: Rick Strahl ] WPF Unwanted Multiple ClientBin Sub...

  • During our ASP.NET debugging chat there were many questions around the GC and the different generations.

  • Whats the best way to debug IE memory leak? It is such a pain when you building AJAX application that work for IE. And there are no official tool from Microsoft that does it. Even the new IE 8 development tool. Any suggestions?

  • Dave,

    You can use the same techniques on inetinfo.exe as with other processes.  I.e. debugging memory leaks with debug diag or windbg.

  • Thank you for all the Great Posts and the Labs...Its Fun to learn Debugging from your posts... :). Wish to see more n more, over the time to come....

  • Tess,

    I've just finished your Labs series, it is one of the most valuable information source I've ever seen, thank you for your efforts.

    Do you plan to have something on debugging classic asp issues (memory, performance, hang, etc)?

  • Petter,  very happy to hear that.

    I wasnt planning on doing anything for native issues, but for hangs/perf issues and crashes you troubleshoot them pretty much the same way, and for mem issues you would normally use leaktracking in debug diag to get to the bottom of the issue.

  • Hi Tess,

    I see increasing private Bytes..So collected the dump. and followed the steps below.

    dumpheap-stat output gives....

    43a209c    34708   3533884 System.Object[]

    43a273c     6749   7338752 System.Byte[]

    79b946b0   830459  33038116 System.String

    205158      122  47250460      Free

    Total 1187018 objects

    -

    So selected Dumpheap -mt and ran GCroot on thread and get following output.

    0:000> !gcroot 054c5000

    Scan Thread 0 (10a0)

    Scan Thread 2 (16cc)

    Scan Thread 3 (13fc)

    Scan Thread 7 (e80)

    Scan Thread 4 (113c)

    Scan Thread 30 (1220)

    Scan Thread 32 (159c)

    Scan Thread 31 (ac0)

    Scan Thread 33 (118)

    Scan Thread 43 (17e4)

    Scan HandleTable 41b0048

    Scan HandleTable 2022a0

    HANDLE(RefCnt):4392238:Root:046a2210(System.Drawing.Internal.GPStream)->046a21ec(System.IO.MemoryStream)->054c5000(System.Byte[])

    from the above output can I infer that GC handle is keeping reference to System.Byte[], Beyond this step how can I

    narrow down on the function responsible to keep the reference alive.

  • For above question Dumpeap -mt was done  for system.byte[]

  • Hi, Tess!

    In first place, thank's by the posts, they are genials and give a way to pratice this kind of advanced analysis..

    I am with a complicated memory leak issue. I find that system.objects[] are the vilains, but i have dificult to understand de message below. I can't get identifiy the problematic source code. Could you give same tips to give a direction?

    0:006> !gcroot 21811000

    Note: Roots found on stacks may be false positives. Run "!help gcroot" for

    more info.

    ESP:33eeea8:Root:010b3968(System.Collections.Generic.List`1[[System.String, mscorlib]])->

    21811000(System.Object[])

    =========================================

    i stoped here

    0:006> !do 010b3968

    Name: System.Collections.Generic.List`1[[System.String, mscorlib]]

    MethodTable: 79137270

    EEClass: 7912f680

    Size: 24(0x18) bytes

    (C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)

    Fields:

         MT    Field   Offset                 Type VT     Attr    Value Name

    7912d8f8  40009c7        4      System.Object[]  0 instance 21811000 _items

    79102290  40009c8        c         System.Int32  1 instance 101575559 _size

    79102290  40009c9       10         System.Int32  1 instance 101575559 _version

    790fd0f0  40009ca        8        System.Object  0 instance 00000000 _syncRoot

    7912d8f8  40009cb        0      System.Object[]  0   shared   static _emptyArray

       >> Domain:Value dynamic statics NYI

    001599f8:NotInit  <<

    How could fiding more issues to identify the memory leak issue?

    Thank's so much

Page 1 of 2 (18 items) 12
Leave a Comment
  • Please add 1 and 8 and type the answer here:
  • Post