If broken it is, fix it you should

Using the powers of the debugger to solve the problems of the world - and a bag of chips    by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

.Net memory leak: Unblock my finalizer

.Net memory leak: Unblock my finalizer

Rate This
  • Comments 44

This time I figured I'd talk about an issue that spans all three debugging scenarios, hangs, memory issues and crashes... yummy:)

But first off let me just share a link i found on Mike Stall's debugging blog, the source for .net 2.0 (rotor). Very handy if you want to take a closer look at how things are implemented in the framework.

Edit 2010-03-08: Changed the link to the download for the CLI source

Problem statement:

The application runs fine for hours on end, then suddenly memory starts raising uncontrollably, eventually crashing with a "stopped unexpectedly" because of out of memory exceptions. Around the same time that memory starts raising CPU usage seems to go up and requests to the application star hanging, no new requests go through.

To get the application working the application needs to be restarted.

Gathering data:

Apart from gathering dumps (preferably multiple a few minutes apart for comparison), I would also recommend here to get some performance logs with counters like

  • Process/%Processor Time
  • Process/Virtual Bytes
  • Process/Private Bytes
  • .NET CLR Memory/All counters (To see how much time we are spending doing GC, how much memory is managed and how the ratio of Gen 0/Gen 1 and Gen 2 collections look)
  • ASP.NET counters to look for queueing, execution time etc.

The performance logs might not give you the answer but they can give a pretty good picture of what you are looking at and how the memory leak/hang/high CPU ties together.

Debugging the problem:

The dump we gathered was 845 MB, and from the performance log we saw that # bytes in all heaps followed private bytes very closely.

We can take two routes here (memory or hang) so we'll start with one...

Memory route:

If we run !eeheap –gc to see what the managed heaps look like we get the following output confirming that it is in fact a managed memory leak.

0:043> !eeheap -gc
Loaded Son of Strike data table version 5 from "C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\mscorsvr.dll"
Number of GC Heaps: 2
------------------------------
Heap 0 (0x000b9368)
generation 0 starts at 0x36f72cbc
generation 1 starts at 0x3687db60
generation 2 starts at 0x026f0030
ephemeral segment allocation context: (0x36f72cbc, 0x36f72cc8)
 segment    begin       allocated     size
0x26f0000 0x26f0030  0x66e2b94 0x3ff2b64(67,054,436)
0x11010000 0x11010030  0x15008544 0x3ff8514(67,077,396)
0x19010000 0x19010030  0x1cfce278 0x3fbe248(66,839,112)
0x21010000 0x21010030  0x25007050 0x3ff7020(67,072,032)
0x29010000 0x29010030  0x2cef6fec 0x3ee6fbc(65,957,820)
0x332c0000 0x332c0030  0x372bde34 0x3ffde04(67,100,164)
Large object heap starts at 0x0a6f0030
 segment    begin       allocated     size
0x0a6f0000 0x0a6f0030  0x0a725738 0x00035708(218,888)
Heap Size  0x17eba7a8(401,319,848)
------------------------------
Heap 1 (0x000dc930)
generation 0 starts at 0x3a244e9c
generation 1 starts at 0x39c85f9c
generation 2 starts at 0x066f0030
ephemeral segment allocation context: (0x3a244e9c, 0x3a244ea8)
 segment    begin       allocated     size
0x66f0000 0x66f0030  0xa6d8818 0x3fe87e8(67,012,584)
0x15010000 0x15010030  0x18ffd9b8 0x3fed988(67,033,480)
0x1d010000 0x1d010030  0x20ff8d4c 0x3fe8d1c(67,013,916)
0x25010000 0x25010030  0x28eee26c 0x3ede23c(65,921,596)
0x2d010000 0x2d010030  0x30e96034 0x3e86004(65,560,580)
0x372c0000 0x372c0030  0x3a599490 0x32d9460(53,318,752)
Large object heap starts at 0x0b6f0030
 segment    begin       allocated     size
0x0b6f0000 0x0b6f0030  0x0b7f6198 0x00106168(1,073,512)
Heap Size  0x17102694(386,934,420)
------------------------------
Reserved segments:
------------------------------
GC Heap Size  0x2efbce3c(788,254,268)

The GC heaps take up a total of around 788 MB (rounding freely:)), so the next natural step is to figure out what is on the heap... maybe a !dumpheap –stat will do the trick...

0:043> !dumpheap -stat
------------------------------
Heap 0
Bad MethodTable for Obj at 0x26f0030
total 21 objects
------------------------------
Heap 1
Bad MethodTable for Obj at 0x66f0030
total 2 objects
------------------------------
total 23 objects
Statistics:
        MT      Count     TotalSize Class Name
0x020a209c         12        36,960 System.Object[]
0x000dc488          9       488,672      Free
0x79b94638          2       766,768 System.String
Total 23 objects, Total size: 1,292,400

Hmm... what's up here? Managed heap corruption? On top of everything else? I mean we have the classic signs, bad method table (someone overwriting the method table for an object), and the size of the heap looks completely out of whack!!! But, would I really write another blog post about that?!? Of course not:) at least not this soon...

So when else would the heap be out of whack? Just bear with me for a moment...

Let's take a look at the !threads output

0:043> !threads
ThreadCount: 18
UnstartedThread: 0
BackgroundThread: 16
PendingThread: 0
DeadThread: 2
                                  PreEmptive   GC Alloc                     Lock     
        ID  ThreadOBJ       State     GC       Context           Domain     Count APT Exception
 15 0x109c 0x000dc260   0x180a220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Threadpool Worker) System.Threading.ThreadAbortException
 19 0x14c4 0x000dd3f8   0x200b220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Finalizer)
XXX      0 0x000f2748   0x1800820 Enabled  0x00000000:0x00000000 0x000d58c0     0 MTA (Threadpool Worker)
 21  0xaa0 0x024bebc0   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Threadpool Worker)
XXX      0 0x024bed88       0x820 Enabled  0x00000000:0x00000000 0x000d58c0     0 Ukn
 29 0x1314 0x024bd318   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Threadpool Worker)
 30 0x1668 0x024e0218   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA (Threadpool Worker)
 31  0xd48 0x024c0460   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     2 MTA (GC) (Threadpool Worker)
 32 0x16ec 0x025372e8   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Threadpool Worker)
 33 0x1758 0x025421e0       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA
 34 0x1354 0x025428f0       0x222 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA
 35  0x954 0x02543000       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA
 36 0x15c8 0x02543dc0       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA
 37  0xd1c 0x025444d0   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA (Threadpool Worker)
 38  0xbd0 0x0254a2d0       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA
 39  0x240 0x0254b600   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA (Threadpool Worker)
 40 0x10d8 0x0254ca58       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA
 41  0xd80 0x02572018       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA

Notice thread 31, in the exception column we see (GC), that means that we are in the middle of a garbage collection. Actually, it means that Thread 31 made an allocation that triggered a GC, alternatively, it could be that someone called GC.Collect on thread 31.

Taking a closer look (running kb 2000 on thread 31) we find that in fact someone made an allocation, causing the GC to have to allocate_more_space and triggering a GC.

  31  Id: 14b0.d48 Suspend: 1 Teb: 7ff59000 Unfrozen
ChildEBP RetAddr  Args to Child              
0ca7ecb4 7c822124 77e6baa8 000003e0 00000000 ntdll!KiFastSystemCallRet
0ca7ecb8 77e6baa8 000003e0 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0ca7ed28 77e6ba12 000003e0 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
0ca7ed3c 791fee7b 000003e0 ffffffff 000b9368 kernel32!WaitForSingleObject+0x12
0ca7ed5c 7920273d 00000001 00000000 00009c4c mscorsvr!GCHeap::GarbageCollectGeneration+0x1a9
0ca7ed8c 791c0ccd 024c0498 00009c4c 00000000 mscorsvr!gc_heap::allocate_more_space+0x181
0ca7efb4 791b6269 024c0498 00009c4c 00000000 mscorsvr!GCHeap::Alloc+0x7b
0ca7efc8 791b8873 00009c4c 00000000 00000000 mscorsvr!Alloc+0x3a
0ca7efe8 791b8814 020a26b0 00002710 00000000 mscorsvr!FastAllocatePrimitiveArray+0x45
0ca7f068 0caf0566 3b609d84 372bde28 0ca7f09c mscorsvr!JIT_NewArr1+0xbb
WARNING: Frame IP not in any known module. Following frames may be wrong.
0ca7f078 0caf0524 37090e70 37090184 0276b7ac 0xcaf0566
0ca7f09c 0cba25b4 02764610 37090184 3708f348 0xcaf0524
0ca7f1cc 799a67a2 3708f3b0 3708f6b4 799b4168 0xcba25b4
0ca7f1d8 799b4168 00000002 00000000 0c8f1a38 mscorlib_79990000+0x167a2
0ca7f260 79217188 00000000 00dc9160 02728fd8 mscorlib_79990000+0x24168
02432f60 24548bca 2c15ff10 8b020a95 8bcb8bd8 mscorsvr!ComCallMLStubCache::CompileMLStub+0x429
02432f60 00000000 2c15ff10 8b020a95 8bcb8bd8 0x24548bca

If this was a single processor machine, running the GC in workstation mode we would have seen the actual collection on this thread but this is a multi processor machine with 2 heaps (we could see this from the !eeheap –gc output), so we have 2 threads dedicated to doing garbage collection.

We can find out which ones they are by searching our stacks for the function gc_thread_stub which is the first function called on the GC threads (in our case it was thread 17 and 18)

  17  Id: 14b0.172c Suspend: 1 Teb: 7ffa6000 Unfrozen
ChildEBP RetAddr  Args to Child              
0225fdc4 792042e5 11fc4800 11fc67d0 0012eea8 mscorsvr!memcopy+0x12
0225fde4 792043da 00000001 0012eea8 0225fe24 mscorsvr!gc_heap::compact_plug+0x2d
0225fe08 792044cc 120f5694 00000000 026f0030 mscorsvr!gc_heap::compact_in_brick+0x59
0225fe40 791ff1bd 00000002 026f0030 00000001 mscorsvr!gc_heap::compact_phase+0x116
0225fed0 791ff43d 00000002 000b9368 00000000 mscorsvr!gc_heap::plan_phase+0x857
0225ff24 791ff065 00000000 000b9368 77e670b2 mscorsvr!gc_heap::gc1+0x78
0225ff88 791fe6c3 00000001 00000000 000b9368 mscorsvr!gc_heap::garbage_collect+0x22f
0225ffac 792356be 00000000 0225ffec 77e66063 mscorsvr!gc_heap::gc_thread_function+0x42
0225ffb8 77e66063 000b9368 00000000 00000000 mscorsvr!gc_heap::gc_thread_stub+0x1e
0225ffec 00000000 792356a0 000b9368 00000000 kernel32!BaseThreadStart+0x34

  18  Id: 14b0.76c Suspend: 1 Teb: 7ffa5000 Unfrozen
ChildEBP RetAddr  Args to Child              
0229fcc4 792042e5 1848d1b4 1848eae8 00092934 mscorsvr!memcopy+0x1c
0229fce4 792043da 00000001 00092934 0229fd24 mscorsvr!gc_heap::compact_plug+0x2d
0229fd08 792044cc 1852142c 00000000 066f0030 mscorsvr!gc_heap::compact_in_brick+0x59
0229fd40 791ff1bd 00000002 066f0030 00000001 mscorsvr!gc_heap::compact_phase+0x116
0229fdd0 791ff43d 00000002 000dc930 00000002 mscorsvr!gc_heap::plan_phase+0x857
0229fe24 791ff065 00000000 000dc930 77e670b2 mscorsvr!gc_heap::gc1+0x78
0229fe88 791fe6c3 00000001 00000000 000dc930 mscorsvr!gc_heap::garbage_collect+0x22f
0229feac 792356be 80a79bcc f773fa00 ef609be0 mscorsvr!gc_heap::gc_thread_function+0x42
0229ffb8 77e66063 000dc930 00000000 00000000 mscorsvr!gc_heap::gc_thread_stub+0x1e
0229ffec 00000000 792356a0 000dc930 00000000 kernel32!BaseThreadStart+0x34

And this folks, is the answer to why our heaps are in this funky state. The GC is in the middle of compacting the heaps which means that the heaps are "in motion" so the output from !dumpheap –stat really can't be relied on.

If we really want to know what is on the heaps we can either manually go through and run !dumpheap for all segments...

Ex.

Heap 1 (0x000dc930)
generation 0 starts at 0x3a244e9c
generation 1 starts at 0x39c85f9c
generation 2 starts at 0x066f0030
ephemeral segment allocation context: (0x3a244e9c, 0x3a244ea8)
 segment    begin       allocated     size
0x66f0000 0x66f0030  0xa6d8818 0x3fe87e8(67,012,584)
0x15010000 0x15010030  0x18ffd9b8 0x3fed988(67,033,480)
0x1d010000 0x1d010030  0x20ff8d4c 0x3fe8d1c(67,013,916)
...

0:043> !dumpheap 0x15010030 0x18ffd9b8
------------------------------
Heap 0
total 0 objects
------------------------------
Heap 1
0x15010030 0x00afc0fc       12    2 TestSession.MyClassWFinalizer 
0x1501003c 0x020a26b0   40,012    2 System.Int32[] 
0x15019c88 0x00afc0fc       12    2 TestSession.MyClassWFinalizer 
0x15019c94 0x020a26b0   40,012    2 System.Int32[] 
0x150238e0 0x00afc0fc       12    2 TestSession.MyClassWFinalizer 
0x150238ec 0x020a26b0   40,012    2 System.Int32[] 
0x1502d538 0x00afc0fc       12    2 TestSession.MyClassWFinalizer 
0x1502d544 0x020a26b0   40,012    2 System.Int32[]
...

and summarize the resulting output, keeping in mind that it wont work for some segments that are currently in motion... or get a new dump when we are not in a GC. But I'll leave that as an exercise for the reader:) (don't you just hate it when they say that in study books, it just means that the author either didn't know how to do it themselves, or was too lazy to:) in my case it's the latter, I promise:))

Hang/Performance route:

It was also mentioned, in hte problem description, that apart from that memory grew really rapidly, CPU usage was high, and requests appeared to be stuck... since we couldn't get much info about the managed heap from !dumpheap –stat, perhaps we should try the "hang" approach and see what that gives us.

Normally ~* kb and ~* e !clrstack (to see all stacks) would have been the way to go, and it is in this case too, but before we go for the dough, let's take a peak at the threadpool, to get a bit of info about our queue status, CPU usage and whether or not the threadpool is exhausted.

0:043> !threadpool
CPU utilization 81%
Worker Thread: Total: 16 Running: 16 Idle: 0 MaxLimit: 40 MinLimit: 2
Work Request in Queue: 68
0x9a3dc
CorThreadPoolWorkitemCallback Callback@0x01e84df0
Unknown 0x000e3f20
CorThreadPoolWorkitemCallback Callback@0x01eb55e8
CorThreadPoolWorkitemCallback Callback@0x01eb5578
CorThreadPoolWorkitemCallback Callback@0x01eb5df0
CorThreadPoolWorkitemCallback Callback@0x01eb5db8
CorThreadPoolWorkitemCallback Callback@0x01eb57a8
CorThreadPoolWorkitemCallback Callback@0x01eb5700
CorThreadPoolWorkitemCallback Callback@0x01e83e20
...
--------------------------------------
Number of Timers: 0
--------------------------------------
Completion Port Thread: Total: 2 Free: 2 MaxFree: 4 CurrentLimit: 2 MaxLimit: 40 MinLimit: 2

So we have 16 worker threads created out of a total of 40 (20*2 processors) where 20 is the number we've given in machine.config for maxWorkerThreads and all of them are in the middle of executing requests. We have 68 work requests in the queue, hmm... then why aren't we just creating new threads to process these requests??? It doesn't really appear like we exhausted the threadpool.

Ok, everybody who has ever been curious why CPU utilization was listed under threadpool, raise your hand:)

We know from previous case studies that the CPU utilization shows the CPU utilization of the whole system, not only the CPU utilization of this process, how could that possibly have anything to do with the threadpool? Well, surprise surprise, it does. This piece of information is stored in a variable by the framework because it is one of the factors that determine whether or not we will be creating new threads. In particular, if the CPU utilization is over 80% we don't create any new threads... and this is why we are not creating any new threads in this case.

As a matter of fact, if you take a look at any dump where we are in the middle of a garbage collection you will notice that the CPU utilization is 81%. Is this because the GC in some magical way manages to keep itself exactly at 81%? Don't think so:) But artificially setting this value during a GC to 81 has the nice side-effect that no new threads can be created that could otherwise be allocating memory or otherwise changing objects on the heap. So, the real CPU usage could be 20 or 100 or 59% for that matter, it will always be shown as 81 during a GC, so you can't really rely on this either in a GC dump.

Anyways, over to the threads... what are they all up to?

If we run ~* kb and look at all the stacks we will see that most of them are sitting in a stack similar to this one...

  32  Id: 14b0.16ec Suspend: 1 Teb: 7ff58000 Unfrozen
ChildEBP RetAddr  Args to Child              
0cbeeae0 7c822124 77e6baa8 000003ec 00000000 ntdll!KiFastSystemCallRet
0cbeeae4 77e6baa8 000003ec 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0cbeeb54 77e6ba12 000003ec ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
0cbeeb68 791fed5c 000003ec ffffffff 791fe901 kernel32!WaitForSingleObject+0x12
0cbeeb74 791fe901 00000001 025372e8 00000001 mscorsvr!GCHeap::WaitUntilGCComplete+0x4f
0cbeeb84 7926ec73 0000001c 02537320 0000001c mscorsvr!Thread::RareDisablePreemptiveGC+0xb5
0cbeebac 791c0ccd 02537320 0000001c 00000000 mscorsvr!gc_heap::allocate_more_space+0xe6
0cbeedd4 791b6269 02537320 0000001c 00000000 mscorsvr!GCHeap::Alloc+0x7b
0cbeede8 791c0dfd 0000001c 00000000 00000000 mscorsvr!Alloc+0x3a
0cbeee04 791b2a11 00000006 00000005 791ce390 mscorsvr!SlowAllocateString+0x26
0cbeee10 791ce390 00000005 39685aa4 0cbeee9c mscorsvr!JIT_TrialAlloc::GenAllocArray+0x29c
0cbeee20 791ce428 0cbeeea4 00000408 00000005 mscorsvr!COMString::NewString+0x20
0cbeee9c 026b2b0c 39340a14 00000408 0cbeef44 mscorsvr!COMString::StringInitCharArray+0xa4
WARNING: Frame IP not in any known module. Following frames may be wrong.
0cbeef80 79999f6d 39680c14 00000000 791b8bc6 0x26b2b0c
0cbeef8c 791b8bc6 00000000 0cbeeff4 026b75af mscorlib_79990000+0x9f6d
0cbeef98 026b75af 391c35c4 00000001 391c355c mscorsvr!COMString::EqualsString+0x3f
00000000 00000000 00000000 00000000 00000000 0x26b75af

Waiting for the GC to complete...

And if we were to look at some other dumps taken during this performance issue, we would see that most if not all would be in a GC. So it appears that our problem "hang"-wise seems to be that we are doing way many garbage collections or that the garbage collections are taking a loooong time.

What can make us spend a lot of our time in GC?

The short answer: Lots of garbage collections, specifically a lot of higher generation collections, in essence it is very bad if we constantly hit the limit for generation 2.

If you want the GC to work really hard, here are some good tips (short of manually calling GC.Collect often):

  • Have a high allocation rate / second
  • Let your objects reach gen 2 and then let them die
  • Create a lot of objects that get stored on the large object heap (since collecting the LOH means collecting all generations)
  • Block the finalizer so that no matter if objects are collected they still can't go away if they have a finalizer

A few examples...

High allocation rate

Let's say you have a recordset containing 10.000 people and you decide to dynamically create a page with an HTML table containing these. You start by storing the header of the page in a string myPageContents = "<HTML><TITLE>blabla</TITLE><BODY><TABLE>..." and so on. Then you have a loop that appends the record enclosed in the proper table tags for each record like this

myPageContents += "<TR><TD>" + firstname + "</TD><TD>" + lastname + "</TD></TR>";

This sounds like a fine idea, right?

Well, think again... The problem is that strings are immutable, i.e. that they can not be changed. This means that we will create 5 new copies of the string (one slightly larger than the previous), for each iteration if we use the code above. Even worse, when the strings get long enough, the later ones would be stored on the LOH. Now that is a pretty good way to get a high allocation rate / second and growing your LOH:)

Let objects reach gen 2 and then die

Let all your objects implement finalizers, or derive from objects that implement finalizers. Even if the finalizer doesn't contain any code the object will be "rooted" by the freacheable queue when it is created and if no-one has suppressed finalization (in a dispose method or similar) for it, it won't be free'd straight away when a garbage collection occurs, but rather it would be promoted one generation and then, only once the finalizer has run it's finalize method it can be collected. In other words, if you want your object to be promoted one generation for no good reason, create a finalize method and leave it empty:)

Speaking of finalize methods, please read http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconFinalizeDispose.asp on finalization and dispose patterns so you don't run into unnecessary issues such as the one just mentioned, or nullreference exceptions because you are accessing an objects member variable inside its finalizer.

Large object heap

I think this one is pretty obvious, pretty much, try to use the LOH as little as possible.

Blocking finalizers

A finalize method should never perform a blocking operation, the reason for this is that we only have one finalize thread in a .net process, and if we block it we won't be able to get rid of any objects that need finalization. This of course leads to high memory consumption, but also to a large number of garbage collections since we will reach the limits that cause us to GC over and over again because we never get rid of any data. Seems pretty much like what we are running into in our scenario.

So how do we find out which one we are running into?

If we have a high allocation rate we will see this in perfmon under .NET CLR Memory - Allocated Bytes/Sec, and technically the best way to figure out why we are allocating so much (if we can limit it to a specific page or small scenario) would be to run a profiler like the CLR Profiler. But we could get a pretty good idea by looking at the thread that caused the GC and see if perhaps it is doing something that could be allocating a lot of memory, and run !objsize on the objects on that stack (!dso output) to see if you find any particularly big ones. Remember though that what we see on that thread might just be the straw that broke the camels back.

To see which of your objects have finalizers you can run !finalizequeue. That will show you a list of all objects currently rooted by the freachable queue and can give you a good idea about which of your classes you should inspect for unnecessary finalizers. Remember here that it's not an exhaustive list of all classes that have finalizers, just the ones currently alive, and that noone has called SuppressFinalize on.

Finally, to see if you have a blocked finalizer, simply look at the finalizer thread and see what it is doing. In our case the finalizer is thread 19 (from the !threads output)

19 0x14c4 0x000dd3f8   0x200b220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Finalizer)

Kb 2000 (native stack) and !clrstack (managed stack) for this thread show

  19  Id: 14b0.14c4 Suspend: 1 Teb: 7ffa4000 Unfrozen
ChildEBP RetAddr  Args to Child              
022dfbb4 7c821364 77e42439 00000001 022dfbf8 ntdll!KiFastSystemCallRet
022dfbb8 77e42439 00000001 022dfbf8 00000000 ntdll!NtDelayExecution+0xc
022dfc20 7920a823 000186a0 00000001 022dfe00 kernel32!SleepEx+0x68
022dfc40 7920a89b 000186a0 022dfc78 022dfca8 mscorsvr!Thread::UserSleep+0x93
022dfc50 0247316a 022dfc5c 000186a0 022dfcb0 mscorsvr!ThreadNative::Sleep+0x30
WARNING: Frame IP not in any known module. Following frames may be wrong.
022dfca8 79204678 022dfd24 79214df6 022dfe00 0x247316a
022dfd34 7920473d 049db038 000dd3f8 000f2a68 mscorsvr!MethodTable::CallFinalizer+0xee
022dfd48 79203b2c 049db038 00171eb0 022dfe74 mscorsvr!CallFinalizer+0x84
022dfde0 79204806 049db038 00000000 000dd3f8 mscorsvr!CallFinalizer+0x255
022dfdf4 791cc3c8 022dfe74 791b3d64 022dfe5c mscorsvr!GCHeap::RestartEE+0x73
022dfe3c 792047d9 00171eb0 792047f3 022dfe74 mscorsvr!Thread::DoADCallBack+0x5c
022dfedc 791fed20 02792828 00000000 00000000 mscorsvr!CallFinalizer+0x2fa
022dff24 79245681 00000000 80915704 7ffa4000 mscorsvr!GCHeap::FinalizerThreadStart+0xc2
022dffb8 77e66063 000dd7d0 00000000 00000000 mscorsvr!Thread::intermediateThreadProc+0x44
022dffec 00000000 79245640 000dd7d0 00000000 kernel32!BaseThreadStart+0x34

Thread 19
ESP         EIP       
0x022dfc78  0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Void System.Threading.Thread.Sleep(I4)
0x022dfc88  0x0caf05cd [DEFAULT] [hasThis] Void MemoryIssues.MyClassWFinalizer.Finalize()
0x022dfe00  0x79204678 [FRAME: ContextTransitionFrame] 
0x022dfe5c  0x79204678 [FRAME: GCFrame]

Houston, we have a problem:)

We are definitely blocking the finalizer. The finalize method for MyClassWFinalizer calls sleep for some reason and thus blocks the finalizer.

This might not be a super common scenario (the sleep), but it is quite often you see the finalizer get stuck in a native critical section when trying to free some native resource, or even sometimes trying to create a connection to a database or similar.

The moral of the story is to avoid, as much as possible, anything that could be blocking the finalizer, better yet, avoid letting the finalizer clean up at all. If the object you are using has a native resource of some sort, call it's close, or dispose method, when you're done with it, or use the object inside a using statement.

Close or dispose methods, if correctly implemented will call SuppressFinalize which means that your object will not be unnecessarily promoted and will not have to pass through the single finalizer thread for cleanup.

Laters all... and remember 2+2=5 (for extremely large values of 2)

 

 





  • And figured that I can't seem to come up with any good content of my own I'll point some out.&amp;nbsp; Posts...
  • I've come across a finalization hang in testing some .NET CF code.  The issue happens when doing the following.

    command.ExecuteReader(CommandBehavior.CloseConnection)

    AND specifically when the reader is NOT closed properly after its use.

    The hang occurs in GC.WaitForPendingFinalizers
    Is this behavior expected and normal when not properly closing your object?  We are developing a framework and while this issue is of lower priority we would like to discern whether we should say that this is a issue and users just need to make sure they properly call reader.Close.

    Thanks!

     
  • Hi Chris,

    The users should call reader.Close to avoid for potential blocking on the finalizer but i am curious to see how the finalizer thread looks when this occurrs, both !clrstack and kb.  

    Also, i am curious why you are calling GC.WaitForPendingFinalizers,  are you manually calling GC.Collect?

    Thanks,
  • Nice post! Great to see !eeheap getting a workout :p
  • absolutely, couldnt live without it:)
  • Hi Tess,

    Nice post, thanks for that!

    I'm playing a bit with windbg and sos to try and understand the GC internals (framework 1.1), but one thing still isn't clear to me...

    I've built a simple application in which i:
    1) Create a object with code that blocks infinitely in its finalizer (=> the finization thread is blocked for sure)
    2) Create 1000 finalizable objects (with a non blocking finalizer)

    When i take a look in the finalize queue (!finalizequeue), i see the 1000 finalizable objects being listed.

    If i then force a garbage collection in the application (GC.Collect()), the 1000 objects disappear from the !finalizequeue dump list.  I would expect these 1000 objects to be moved to the freachable queue, but they don't (at least, !finalizequeue says they don't: "Ready for finalization 0 objects (0x001bba30->0x001bba30) - Freachable queue"

    The 1000 objects are still in the managed heap (!dumpheap -type...), but non of them seem to be rooted (!gcroot).  When i unblock the finalizer and force an addition GC, the 1000 objects get cleaned up?

    Any idea what might be going on?  Are the objects in the freachable queue, but is !finalizequeue not showing this?  Have the objects been in the freachable queue, but has some other thread already removed them? (I assumed that the finalize thread dequeued the freachable queue, and since this thread is blocked, i assumed the freachable queue could only grow)...
  • I am just going to make a guess here because i didnt have time to properly debug and research, but i think this might be what you are running into...

    When an object with a finalizer becomes unrooted it gets moved from the queue that you see with !finalizequeue and should get added to the freacheable queue as you mentioned, but this only happens at certain times, for example during a GC.  I think these objects are currently stuck in limbo... i.e. they have been removed from the "root" queue, but have yet to be put on the freacheable queue because we havent gone in to the next GC... which in turn is because we are blocking the finalizer...

    If on the other hand they had gotten "un-rooted" and put on the freacheable queue before your finalizer got stuck, and we hadnt had a finalization in between, you would see them on the f-reacheable queue.

    In other words, normally you would see this list full at that point but because of the timing of your test you dont.    I ran into the same thing, which is why i chose not to show the output from !finalizequeue in this post.

    Hope that explains the mystery...
  • I'm just wondering,
    does calling GC.SuppressFinalize lower an object's GC generation to 'normal'? If it does this could be useful.

    I'm currently using a variation of the Finalize/Dispose pattern except with "DisposeManaged()" and "DisposedUnmanaged()" methods, instead of 'bool disposing'.

    I'm thinking that if I do a reflection check on the type the first time an object of that type is used, and see if DisposedUnmanaged is ever overriden, then I can optinally call SuppressFinalize in the constructor..

    Something along the lines of

    MethodInfo mi = this.GetType().GetMethod("DisposedManaged");
    if (mi.GetBaseDefinition() == mi)
    GC.SuppressFinalize(this);

    I have not tested this idea however, and obviously caching the value would be a good idea
  • Tess, I've got a program that's getting blocked in the finalizer and I can't figure out why. The finalizer stack shows that it's sitting in GetToSTA() waiting for a context switch to the main UI thread (which is STA), so I assume it is trying to finalize an RCW. However, the main thread is not blocked or busy: it's sitting in its main message loop! To top it off, I'm not aware of any COM objects my program is using. Does the Framework use some wrapped COM objects for its own implementations? Do you have any idea how I can tell what managed object the finalizer is currently trying to finalize? I don't even know what class to look at.

    Any ideas you can give would be greatly appreciated.
  • The following links to .NET resources have been collated over time with the assistance of colleagues.&amp;nbsp;...
  • Hi Graham,

    not sure i understand what you mean by lowering an objects GC generation to 'normal'... what it does is that it takes it off the finalizable objects list so the finalizer no longer has to be called.  
  • Hi Bill,

    You shouldnt see this unless you are using an STA component in your app, directly or indirectly.  

    Check with !threads to see which threads are STA and use !clrstack to see if you can get an idea of what is running on them and if you can see any references to STA components.

    Specifically the !finalizer.

    I would also look throug lm to see if there are any 3rd party components on there that could be written in legacy VB or in C++ using MFC, in which case they will be STA.  
    Any ActiveX controls for example...

    Its a bit hard giving any more specific info without seeing the dump.

    Best of luck,
     
  • Excellent!! You saved me!! thank you again
  • Great article.  I'm getting a result that contradicts "...and that noone has called SuppressFinalize on"

    I have a Dispose routine from MSDN utilizing Idisposable.Dispose in which the Finalizer calls Dispose(false) - passing false to the Dispose method.  The Dispose method does indeed execute SuppressFinalize.   However, when I execute a  !finalizequeue my form still shows up.  

    Is this "...and that noone has called SuppressFinalize on" really true?

    Thanks!
  • Does it actually still show up after you have called SupressFinalize on it?  You can see if the dispose method has really been called by printing something out with Debug.Write (while you have the debugger attached).
Page 2 of 3 (44 items) 123
Leave a Comment
  • Please add 6 and 8 and type the answer here:
  • Post