If broken it is, fix it you should

Using the powers of the debugger to solve the problems of the world - and a bag of chips    by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

.Net memory leak: Unblock my finalizer

.Net memory leak: Unblock my finalizer

Rate This
  • Comments 44

This time I figured I'd talk about an issue that spans all three debugging scenarios, hangs, memory issues and crashes... yummy:)

But first off let me just share a link i found on Mike Stall's debugging blog, the source for .net 2.0 (rotor). Very handy if you want to take a closer look at how things are implemented in the framework.

Edit 2010-03-08: Changed the link to the download for the CLI source

Problem statement:

The application runs fine for hours on end, then suddenly memory starts raising uncontrollably, eventually crashing with a "stopped unexpectedly" because of out of memory exceptions. Around the same time that memory starts raising CPU usage seems to go up and requests to the application star hanging, no new requests go through.

To get the application working the application needs to be restarted.

Gathering data:

Apart from gathering dumps (preferably multiple a few minutes apart for comparison), I would also recommend here to get some performance logs with counters like

  • Process/%Processor Time
  • Process/Virtual Bytes
  • Process/Private Bytes
  • .NET CLR Memory/All counters (To see how much time we are spending doing GC, how much memory is managed and how the ratio of Gen 0/Gen 1 and Gen 2 collections look)
  • ASP.NET counters to look for queueing, execution time etc.

The performance logs might not give you the answer but they can give a pretty good picture of what you are looking at and how the memory leak/hang/high CPU ties together.

Debugging the problem:

The dump we gathered was 845 MB, and from the performance log we saw that # bytes in all heaps followed private bytes very closely.

We can take two routes here (memory or hang) so we'll start with one...

Memory route:

If we run !eeheap –gc to see what the managed heaps look like we get the following output confirming that it is in fact a managed memory leak.

0:043> !eeheap -gc
Loaded Son of Strike data table version 5 from "C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\mscorsvr.dll"
Number of GC Heaps: 2
------------------------------
Heap 0 (0x000b9368)
generation 0 starts at 0x36f72cbc
generation 1 starts at 0x3687db60
generation 2 starts at 0x026f0030
ephemeral segment allocation context: (0x36f72cbc, 0x36f72cc8)
 segment    begin       allocated     size
0x26f0000 0x26f0030  0x66e2b94 0x3ff2b64(67,054,436)
0x11010000 0x11010030  0x15008544 0x3ff8514(67,077,396)
0x19010000 0x19010030  0x1cfce278 0x3fbe248(66,839,112)
0x21010000 0x21010030  0x25007050 0x3ff7020(67,072,032)
0x29010000 0x29010030  0x2cef6fec 0x3ee6fbc(65,957,820)
0x332c0000 0x332c0030  0x372bde34 0x3ffde04(67,100,164)
Large object heap starts at 0x0a6f0030
 segment    begin       allocated     size
0x0a6f0000 0x0a6f0030  0x0a725738 0x00035708(218,888)
Heap Size  0x17eba7a8(401,319,848)
------------------------------
Heap 1 (0x000dc930)
generation 0 starts at 0x3a244e9c
generation 1 starts at 0x39c85f9c
generation 2 starts at 0x066f0030
ephemeral segment allocation context: (0x3a244e9c, 0x3a244ea8)
 segment    begin       allocated     size
0x66f0000 0x66f0030  0xa6d8818 0x3fe87e8(67,012,584)
0x15010000 0x15010030  0x18ffd9b8 0x3fed988(67,033,480)
0x1d010000 0x1d010030  0x20ff8d4c 0x3fe8d1c(67,013,916)
0x25010000 0x25010030  0x28eee26c 0x3ede23c(65,921,596)
0x2d010000 0x2d010030  0x30e96034 0x3e86004(65,560,580)
0x372c0000 0x372c0030  0x3a599490 0x32d9460(53,318,752)
Large object heap starts at 0x0b6f0030
 segment    begin       allocated     size
0x0b6f0000 0x0b6f0030  0x0b7f6198 0x00106168(1,073,512)
Heap Size  0x17102694(386,934,420)
------------------------------
Reserved segments:
------------------------------
GC Heap Size  0x2efbce3c(788,254,268)

The GC heaps take up a total of around 788 MB (rounding freely:)), so the next natural step is to figure out what is on the heap... maybe a !dumpheap –stat will do the trick...

0:043> !dumpheap -stat
------------------------------
Heap 0
Bad MethodTable for Obj at 0x26f0030
total 21 objects
------------------------------
Heap 1
Bad MethodTable for Obj at 0x66f0030
total 2 objects
------------------------------
total 23 objects
Statistics:
        MT      Count     TotalSize Class Name
0x020a209c         12        36,960 System.Object[]
0x000dc488          9       488,672      Free
0x79b94638          2       766,768 System.String
Total 23 objects, Total size: 1,292,400

Hmm... what's up here? Managed heap corruption? On top of everything else? I mean we have the classic signs, bad method table (someone overwriting the method table for an object), and the size of the heap looks completely out of whack!!! But, would I really write another blog post about that?!? Of course not:) at least not this soon...

So when else would the heap be out of whack? Just bear with me for a moment...

Let's take a look at the !threads output

0:043> !threads
ThreadCount: 18
UnstartedThread: 0
BackgroundThread: 16
PendingThread: 0
DeadThread: 2
                                  PreEmptive   GC Alloc                     Lock     
        ID  ThreadOBJ       State     GC       Context           Domain     Count APT Exception
 15 0x109c 0x000dc260   0x180a220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Threadpool Worker) System.Threading.ThreadAbortException
 19 0x14c4 0x000dd3f8   0x200b220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Finalizer)
XXX      0 0x000f2748   0x1800820 Enabled  0x00000000:0x00000000 0x000d58c0     0 MTA (Threadpool Worker)
 21  0xaa0 0x024bebc0   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Threadpool Worker)
XXX      0 0x024bed88       0x820 Enabled  0x00000000:0x00000000 0x000d58c0     0 Ukn
 29 0x1314 0x024bd318   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Threadpool Worker)
 30 0x1668 0x024e0218   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA (Threadpool Worker)
 31  0xd48 0x024c0460   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     2 MTA (GC) (Threadpool Worker)
 32 0x16ec 0x025372e8   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Threadpool Worker)
 33 0x1758 0x025421e0       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA
 34 0x1354 0x025428f0       0x222 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA
 35  0x954 0x02543000       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA
 36 0x15c8 0x02543dc0       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA
 37  0xd1c 0x025444d0   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA (Threadpool Worker)
 38  0xbd0 0x0254a2d0       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA
 39  0x240 0x0254b600   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA (Threadpool Worker)
 40 0x10d8 0x0254ca58       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA
 41  0xd80 0x02572018       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA

Notice thread 31, in the exception column we see (GC), that means that we are in the middle of a garbage collection. Actually, it means that Thread 31 made an allocation that triggered a GC, alternatively, it could be that someone called GC.Collect on thread 31.

Taking a closer look (running kb 2000 on thread 31) we find that in fact someone made an allocation, causing the GC to have to allocate_more_space and triggering a GC.

  31  Id: 14b0.d48 Suspend: 1 Teb: 7ff59000 Unfrozen
ChildEBP RetAddr  Args to Child              
0ca7ecb4 7c822124 77e6baa8 000003e0 00000000 ntdll!KiFastSystemCallRet
0ca7ecb8 77e6baa8 000003e0 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0ca7ed28 77e6ba12 000003e0 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
0ca7ed3c 791fee7b 000003e0 ffffffff 000b9368 kernel32!WaitForSingleObject+0x12
0ca7ed5c 7920273d 00000001 00000000 00009c4c mscorsvr!GCHeap::GarbageCollectGeneration+0x1a9
0ca7ed8c 791c0ccd 024c0498 00009c4c 00000000 mscorsvr!gc_heap::allocate_more_space+0x181
0ca7efb4 791b6269 024c0498 00009c4c 00000000 mscorsvr!GCHeap::Alloc+0x7b
0ca7efc8 791b8873 00009c4c 00000000 00000000 mscorsvr!Alloc+0x3a
0ca7efe8 791b8814 020a26b0 00002710 00000000 mscorsvr!FastAllocatePrimitiveArray+0x45
0ca7f068 0caf0566 3b609d84 372bde28 0ca7f09c mscorsvr!JIT_NewArr1+0xbb
WARNING: Frame IP not in any known module. Following frames may be wrong.
0ca7f078 0caf0524 37090e70 37090184 0276b7ac 0xcaf0566
0ca7f09c 0cba25b4 02764610 37090184 3708f348 0xcaf0524
0ca7f1cc 799a67a2 3708f3b0 3708f6b4 799b4168 0xcba25b4
0ca7f1d8 799b4168 00000002 00000000 0c8f1a38 mscorlib_79990000+0x167a2
0ca7f260 79217188 00000000 00dc9160 02728fd8 mscorlib_79990000+0x24168
02432f60 24548bca 2c15ff10 8b020a95 8bcb8bd8 mscorsvr!ComCallMLStubCache::CompileMLStub+0x429
02432f60 00000000 2c15ff10 8b020a95 8bcb8bd8 0x24548bca

If this was a single processor machine, running the GC in workstation mode we would have seen the actual collection on this thread but this is a multi processor machine with 2 heaps (we could see this from the !eeheap –gc output), so we have 2 threads dedicated to doing garbage collection.

We can find out which ones they are by searching our stacks for the function gc_thread_stub which is the first function called on the GC threads (in our case it was thread 17 and 18)

  17  Id: 14b0.172c Suspend: 1 Teb: 7ffa6000 Unfrozen
ChildEBP RetAddr  Args to Child              
0225fdc4 792042e5 11fc4800 11fc67d0 0012eea8 mscorsvr!memcopy+0x12
0225fde4 792043da 00000001 0012eea8 0225fe24 mscorsvr!gc_heap::compact_plug+0x2d
0225fe08 792044cc 120f5694 00000000 026f0030 mscorsvr!gc_heap::compact_in_brick+0x59
0225fe40 791ff1bd 00000002 026f0030 00000001 mscorsvr!gc_heap::compact_phase+0x116
0225fed0 791ff43d 00000002 000b9368 00000000 mscorsvr!gc_heap::plan_phase+0x857
0225ff24 791ff065 00000000 000b9368 77e670b2 mscorsvr!gc_heap::gc1+0x78
0225ff88 791fe6c3 00000001 00000000 000b9368 mscorsvr!gc_heap::garbage_collect+0x22f
0225ffac 792356be 00000000 0225ffec 77e66063 mscorsvr!gc_heap::gc_thread_function+0x42
0225ffb8 77e66063 000b9368 00000000 00000000 mscorsvr!gc_heap::gc_thread_stub+0x1e
0225ffec 00000000 792356a0 000b9368 00000000 kernel32!BaseThreadStart+0x34

  18  Id: 14b0.76c Suspend: 1 Teb: 7ffa5000 Unfrozen
ChildEBP RetAddr  Args to Child              
0229fcc4 792042e5 1848d1b4 1848eae8 00092934 mscorsvr!memcopy+0x1c
0229fce4 792043da 00000001 00092934 0229fd24 mscorsvr!gc_heap::compact_plug+0x2d
0229fd08 792044cc 1852142c 00000000 066f0030 mscorsvr!gc_heap::compact_in_brick+0x59
0229fd40 791ff1bd 00000002 066f0030 00000001 mscorsvr!gc_heap::compact_phase+0x116
0229fdd0 791ff43d 00000002 000dc930 00000002 mscorsvr!gc_heap::plan_phase+0x857
0229fe24 791ff065 00000000 000dc930 77e670b2 mscorsvr!gc_heap::gc1+0x78
0229fe88 791fe6c3 00000001 00000000 000dc930 mscorsvr!gc_heap::garbage_collect+0x22f
0229feac 792356be 80a79bcc f773fa00 ef609be0 mscorsvr!gc_heap::gc_thread_function+0x42
0229ffb8 77e66063 000dc930 00000000 00000000 mscorsvr!gc_heap::gc_thread_stub+0x1e
0229ffec 00000000 792356a0 000dc930 00000000 kernel32!BaseThreadStart+0x34

And this folks, is the answer to why our heaps are in this funky state. The GC is in the middle of compacting the heaps which means that the heaps are "in motion" so the output from !dumpheap –stat really can't be relied on.

If we really want to know what is on the heaps we can either manually go through and run !dumpheap for all segments...

Ex.

Heap 1 (0x000dc930)
generation 0 starts at 0x3a244e9c
generation 1 starts at 0x39c85f9c
generation 2 starts at 0x066f0030
ephemeral segment allocation context: (0x3a244e9c, 0x3a244ea8)
 segment    begin       allocated     size
0x66f0000 0x66f0030  0xa6d8818 0x3fe87e8(67,012,584)
0x15010000 0x15010030  0x18ffd9b8 0x3fed988(67,033,480)
0x1d010000 0x1d010030  0x20ff8d4c 0x3fe8d1c(67,013,916)
...

0:043> !dumpheap 0x15010030 0x18ffd9b8
------------------------------
Heap 0
total 0 objects
------------------------------
Heap 1
0x15010030 0x00afc0fc       12    2 TestSession.MyClassWFinalizer 
0x1501003c 0x020a26b0   40,012    2 System.Int32[] 
0x15019c88 0x00afc0fc       12    2 TestSession.MyClassWFinalizer 
0x15019c94 0x020a26b0   40,012    2 System.Int32[] 
0x150238e0 0x00afc0fc       12    2 TestSession.MyClassWFinalizer 
0x150238ec 0x020a26b0   40,012    2 System.Int32[] 
0x1502d538 0x00afc0fc       12    2 TestSession.MyClassWFinalizer 
0x1502d544 0x020a26b0   40,012    2 System.Int32[]
...

and summarize the resulting output, keeping in mind that it wont work for some segments that are currently in motion... or get a new dump when we are not in a GC. But I'll leave that as an exercise for the reader:) (don't you just hate it when they say that in study books, it just means that the author either didn't know how to do it themselves, or was too lazy to:) in my case it's the latter, I promise:))

Hang/Performance route:

It was also mentioned, in hte problem description, that apart from that memory grew really rapidly, CPU usage was high, and requests appeared to be stuck... since we couldn't get much info about the managed heap from !dumpheap –stat, perhaps we should try the "hang" approach and see what that gives us.

Normally ~* kb and ~* e !clrstack (to see all stacks) would have been the way to go, and it is in this case too, but before we go for the dough, let's take a peak at the threadpool, to get a bit of info about our queue status, CPU usage and whether or not the threadpool is exhausted.

0:043> !threadpool
CPU utilization 81%
Worker Thread: Total: 16 Running: 16 Idle: 0 MaxLimit: 40 MinLimit: 2
Work Request in Queue: 68
0x9a3dc
CorThreadPoolWorkitemCallback Callback@0x01e84df0
Unknown 0x000e3f20
CorThreadPoolWorkitemCallback Callback@0x01eb55e8
CorThreadPoolWorkitemCallback Callback@0x01eb5578
CorThreadPoolWorkitemCallback Callback@0x01eb5df0
CorThreadPoolWorkitemCallback Callback@0x01eb5db8
CorThreadPoolWorkitemCallback Callback@0x01eb57a8
CorThreadPoolWorkitemCallback Callback@0x01eb5700
CorThreadPoolWorkitemCallback Callback@0x01e83e20
...
--------------------------------------
Number of Timers: 0
--------------------------------------
Completion Port Thread: Total: 2 Free: 2 MaxFree: 4 CurrentLimit: 2 MaxLimit: 40 MinLimit: 2

So we have 16 worker threads created out of a total of 40 (20*2 processors) where 20 is the number we've given in machine.config for maxWorkerThreads and all of them are in the middle of executing requests. We have 68 work requests in the queue, hmm... then why aren't we just creating new threads to process these requests??? It doesn't really appear like we exhausted the threadpool.

Ok, everybody who has ever been curious why CPU utilization was listed under threadpool, raise your hand:)

We know from previous case studies that the CPU utilization shows the CPU utilization of the whole system, not only the CPU utilization of this process, how could that possibly have anything to do with the threadpool? Well, surprise surprise, it does. This piece of information is stored in a variable by the framework because it is one of the factors that determine whether or not we will be creating new threads. In particular, if the CPU utilization is over 80% we don't create any new threads... and this is why we are not creating any new threads in this case.

As a matter of fact, if you take a look at any dump where we are in the middle of a garbage collection you will notice that the CPU utilization is 81%. Is this because the GC in some magical way manages to keep itself exactly at 81%? Don't think so:) But artificially setting this value during a GC to 81 has the nice side-effect that no new threads can be created that could otherwise be allocating memory or otherwise changing objects on the heap. So, the real CPU usage could be 20 or 100 or 59% for that matter, it will always be shown as 81 during a GC, so you can't really rely on this either in a GC dump.

Anyways, over to the threads... what are they all up to?

If we run ~* kb and look at all the stacks we will see that most of them are sitting in a stack similar to this one...

  32  Id: 14b0.16ec Suspend: 1 Teb: 7ff58000 Unfrozen
ChildEBP RetAddr  Args to Child              
0cbeeae0 7c822124 77e6baa8 000003ec 00000000 ntdll!KiFastSystemCallRet
0cbeeae4 77e6baa8 000003ec 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0cbeeb54 77e6ba12 000003ec ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
0cbeeb68 791fed5c 000003ec ffffffff 791fe901 kernel32!WaitForSingleObject+0x12
0cbeeb74 791fe901 00000001 025372e8 00000001 mscorsvr!GCHeap::WaitUntilGCComplete+0x4f
0cbeeb84 7926ec73 0000001c 02537320 0000001c mscorsvr!Thread::RareDisablePreemptiveGC+0xb5
0cbeebac 791c0ccd 02537320 0000001c 00000000 mscorsvr!gc_heap::allocate_more_space+0xe6
0cbeedd4 791b6269 02537320 0000001c 00000000 mscorsvr!GCHeap::Alloc+0x7b
0cbeede8 791c0dfd 0000001c 00000000 00000000 mscorsvr!Alloc+0x3a
0cbeee04 791b2a11 00000006 00000005 791ce390 mscorsvr!SlowAllocateString+0x26
0cbeee10 791ce390 00000005 39685aa4 0cbeee9c mscorsvr!JIT_TrialAlloc::GenAllocArray+0x29c
0cbeee20 791ce428 0cbeeea4 00000408 00000005 mscorsvr!COMString::NewString+0x20
0cbeee9c 026b2b0c 39340a14 00000408 0cbeef44 mscorsvr!COMString::StringInitCharArray+0xa4
WARNING: Frame IP not in any known module. Following frames may be wrong.
0cbeef80 79999f6d 39680c14 00000000 791b8bc6 0x26b2b0c
0cbeef8c 791b8bc6 00000000 0cbeeff4 026b75af mscorlib_79990000+0x9f6d
0cbeef98 026b75af 391c35c4 00000001 391c355c mscorsvr!COMString::EqualsString+0x3f
00000000 00000000 00000000 00000000 00000000 0x26b75af

Waiting for the GC to complete...

And if we were to look at some other dumps taken during this performance issue, we would see that most if not all would be in a GC. So it appears that our problem "hang"-wise seems to be that we are doing way many garbage collections or that the garbage collections are taking a loooong time.

What can make us spend a lot of our time in GC?

The short answer: Lots of garbage collections, specifically a lot of higher generation collections, in essence it is very bad if we constantly hit the limit for generation 2.

If you want the GC to work really hard, here are some good tips (short of manually calling GC.Collect often):

  • Have a high allocation rate / second
  • Let your objects reach gen 2 and then let them die
  • Create a lot of objects that get stored on the large object heap (since collecting the LOH means collecting all generations)
  • Block the finalizer so that no matter if objects are collected they still can't go away if they have a finalizer

A few examples...

High allocation rate

Let's say you have a recordset containing 10.000 people and you decide to dynamically create a page with an HTML table containing these. You start by storing the header of the page in a string myPageContents = "<HTML><TITLE>blabla</TITLE><BODY><TABLE>..." and so on. Then you have a loop that appends the record enclosed in the proper table tags for each record like this

myPageContents += "<TR><TD>" + firstname + "</TD><TD>" + lastname + "</TD></TR>";

This sounds like a fine idea, right?

Well, think again... The problem is that strings are immutable, i.e. that they can not be changed. This means that we will create 5 new copies of the string (one slightly larger than the previous), for each iteration if we use the code above. Even worse, when the strings get long enough, the later ones would be stored on the LOH. Now that is a pretty good way to get a high allocation rate / second and growing your LOH:)

Let objects reach gen 2 and then die

Let all your objects implement finalizers, or derive from objects that implement finalizers. Even if the finalizer doesn't contain any code the object will be "rooted" by the freacheable queue when it is created and if no-one has suppressed finalization (in a dispose method or similar) for it, it won't be free'd straight away when a garbage collection occurs, but rather it would be promoted one generation and then, only once the finalizer has run it's finalize method it can be collected. In other words, if you want your object to be promoted one generation for no good reason, create a finalize method and leave it empty:)

Speaking of finalize methods, please read http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconFinalizeDispose.asp on finalization and dispose patterns so you don't run into unnecessary issues such as the one just mentioned, or nullreference exceptions because you are accessing an objects member variable inside its finalizer.

Large object heap

I think this one is pretty obvious, pretty much, try to use the LOH as little as possible.

Blocking finalizers

A finalize method should never perform a blocking operation, the reason for this is that we only have one finalize thread in a .net process, and if we block it we won't be able to get rid of any objects that need finalization. This of course leads to high memory consumption, but also to a large number of garbage collections since we will reach the limits that cause us to GC over and over again because we never get rid of any data. Seems pretty much like what we are running into in our scenario.

So how do we find out which one we are running into?

If we have a high allocation rate we will see this in perfmon under .NET CLR Memory - Allocated Bytes/Sec, and technically the best way to figure out why we are allocating so much (if we can limit it to a specific page or small scenario) would be to run a profiler like the CLR Profiler. But we could get a pretty good idea by looking at the thread that caused the GC and see if perhaps it is doing something that could be allocating a lot of memory, and run !objsize on the objects on that stack (!dso output) to see if you find any particularly big ones. Remember though that what we see on that thread might just be the straw that broke the camels back.

To see which of your objects have finalizers you can run !finalizequeue. That will show you a list of all objects currently rooted by the freachable queue and can give you a good idea about which of your classes you should inspect for unnecessary finalizers. Remember here that it's not an exhaustive list of all classes that have finalizers, just the ones currently alive, and that noone has called SuppressFinalize on.

Finally, to see if you have a blocked finalizer, simply look at the finalizer thread and see what it is doing. In our case the finalizer is thread 19 (from the !threads output)

19 0x14c4 0x000dd3f8   0x200b220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Finalizer)

Kb 2000 (native stack) and !clrstack (managed stack) for this thread show

  19  Id: 14b0.14c4 Suspend: 1 Teb: 7ffa4000 Unfrozen
ChildEBP RetAddr  Args to Child              
022dfbb4 7c821364 77e42439 00000001 022dfbf8 ntdll!KiFastSystemCallRet
022dfbb8 77e42439 00000001 022dfbf8 00000000 ntdll!NtDelayExecution+0xc
022dfc20 7920a823 000186a0 00000001 022dfe00 kernel32!SleepEx+0x68
022dfc40 7920a89b 000186a0 022dfc78 022dfca8 mscorsvr!Thread::UserSleep+0x93
022dfc50 0247316a 022dfc5c 000186a0 022dfcb0 mscorsvr!ThreadNative::Sleep+0x30
WARNING: Frame IP not in any known module. Following frames may be wrong.
022dfca8 79204678 022dfd24 79214df6 022dfe00 0x247316a
022dfd34 7920473d 049db038 000dd3f8 000f2a68 mscorsvr!MethodTable::CallFinalizer+0xee
022dfd48 79203b2c 049db038 00171eb0 022dfe74 mscorsvr!CallFinalizer+0x84
022dfde0 79204806 049db038 00000000 000dd3f8 mscorsvr!CallFinalizer+0x255
022dfdf4 791cc3c8 022dfe74 791b3d64 022dfe5c mscorsvr!GCHeap::RestartEE+0x73
022dfe3c 792047d9 00171eb0 792047f3 022dfe74 mscorsvr!Thread::DoADCallBack+0x5c
022dfedc 791fed20 02792828 00000000 00000000 mscorsvr!CallFinalizer+0x2fa
022dff24 79245681 00000000 80915704 7ffa4000 mscorsvr!GCHeap::FinalizerThreadStart+0xc2
022dffb8 77e66063 000dd7d0 00000000 00000000 mscorsvr!Thread::intermediateThreadProc+0x44
022dffec 00000000 79245640 000dd7d0 00000000 kernel32!BaseThreadStart+0x34

Thread 19
ESP         EIP       
0x022dfc78  0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Void System.Threading.Thread.Sleep(I4)
0x022dfc88  0x0caf05cd [DEFAULT] [hasThis] Void MemoryIssues.MyClassWFinalizer.Finalize()
0x022dfe00  0x79204678 [FRAME: ContextTransitionFrame] 
0x022dfe5c  0x79204678 [FRAME: GCFrame]

Houston, we have a problem:)

We are definitely blocking the finalizer. The finalize method for MyClassWFinalizer calls sleep for some reason and thus blocks the finalizer.

This might not be a super common scenario (the sleep), but it is quite often you see the finalizer get stuck in a native critical section when trying to free some native resource, or even sometimes trying to create a connection to a database or similar.

The moral of the story is to avoid, as much as possible, anything that could be blocking the finalizer, better yet, avoid letting the finalizer clean up at all. If the object you are using has a native resource of some sort, call it's close, or dispose method, when you're done with it, or use the object inside a using statement.

Close or dispose methods, if correctly implemented will call SuppressFinalize which means that your object will not be unnecessarily promoted and will not have to pass through the single finalizer thread for cleanup.

Laters all... and remember 2+2=5 (for extremely large values of 2)

 

 





  • Hi Tess,

    I was wondering if something bad does happen under .NET 2.0 if I take a lock inside a finalizer. The rules seem to change during an application shutdown. I have found so far that during shutdown:

     - finalization is aborted if a lock is taken which the finalizer thread does not own

     - Thread.Abort has no effect

     - Thread.Join does return immediately

     - The function taking a lock seems to get skipped

       ~Finalizer()

       {

            Console.WriteLine("Before taking lock");

            try

            {

            lock(this)

                  .....

             }

              finally

              {

               Console.WriteLine("After taking lock");

              }

       }

       The first line is printed but the lock has no effect and the second line in the finally block is never executed. Could you elaborate a bit what can happen to my finalization code when I take locks?

    Yours,

      Alois Kraus

  • Hi Alois,

    There are two things that I am curious about in regards to your sample.

    1. You use "this" as the object to lock on, how would this work? The object would not be finalized if it is used on another thread so who would hold the lock for this?

    2. What is the purpose of this lock? A finalizer should not access any managed objects, so what is the lock for?

  • Hi Tess,

    it will be finalized during shutdown but not during run time.

    I know the sample looks rather synthetic but I have just done a code audit where I have found massive use of locks during finalization. So far I have found no clear guidance what code is really allowed inside a finalizer which does behave correctly at run time and during shutdown.

    I think the main confusion comes from the two fold part of "Release Managed Resources" and "Release Unmanaged" Resources in the Dispose(bool) method. What exactly are managed resources and what not? Is a thread object an unmanaged resource? Do only COM objects and OS handles count as unmanaged resources?

    To make the sample more realistic each instance could lock an object of an global static array to cause similar behaviour.

    class Program

    {

        public Program(int instance)

        {

            ...

       }

        void ThreadFunc(object instance)

       {

            lock( GlobalLocks[instance] )

            {

                  ....

            }

        }

       ~Program()

       {

             lock(GlobalLocks[myInstance])

             {

                ....

             }

       }

        static object [] GlobalLocks;

        const int N = 10;

        static public void CreateObjects()

        {

             GlobalLocks = new object[N];

             for(int i=0;i<N;i++)

                 GlobalLocks[i] = new object();

            for(int i=0;i<N;i++)

            {

                  Program p = new Program(i);

            }

        }

    }

    Yours,

    Alois Kraus

  • An unmanaged resource is any .net object. For example if you have instantiated a System.Threading.Thread object and dispose of it, that is a managed resource, while if you have a native handle that you have created with a native API, that is a native resource.

    A .net thread / filestream or database connection has it's own native resources in form of handles, but the object itself will take care of disposing of those when you close/abort them or when they get finalized because they have gotten garbage collected.

  • Thanks Tess for clarification of what should be considered a managed and unmanaged resource.

    Yours,

      Alois Kraus

  • PingBack from http://dotnetdebug.net/2006/03/27/unblock-my-finalizer/

  • It was really exciting to see that so many people answered the .NET GC PopQuiz , especially seeing that

  • A long time with no posts, and a short post this time as well. Tons of work, too little time. Please

  • The purpose of my presentation was to show some common pitfalls and of course to show off windbg and

  • "A finalize method should never perform a blocking operation"

    So, never use STA COM objects.

    But...

  • the link for .net 2.0 (rotor) is broken. Can you please fix it.

    Thanks

  • Thanks for the detailed information.

    It is indeed really helpful for freshers to learn debugging without the spoon fed Visual Studio.

  • We are having the same issue may be. I have done the analysis using your Hang Analysis code.

    The result is :

    ====================================================================================================================

                      Hang Analysis                          

    ====================================================================================================================

    ____________________________________________________________________________________________________________________

    GC RELATED INFORMATION

    ____________________________________________________________________________________________________________________

    The following threads are GC threads:

    8 9 10 11 12 13 14 15

    The Finalizer (Thread 16) is blocked

    .shell: Process exited

    There is no data shown in the other two part of this result. So what I am assuming is that my finalizer is blocked.

    So now it is the only thing I should concentrate on or its may be kind of thing.

    (Either I should move towards Profiles now for tracing down this issue to the code level. )

  • Hey I am running your hang analysis code on the different dump file and getting this as a result please help me understand this things.

    The Finalizer (Thread -1) is blocked

    There is no data shown in the other part of  result.

Page 3 of 3 (44 items) 123
Leave a Comment
  • Please add 6 and 3 and type the answer here:
  • Post