If broken it is, fix it you should

Using the powers of the debugger to solve the problems of the world - and a bag of chips    by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

.Net memory leak: Unblock my finalizer

.Net memory leak: Unblock my finalizer

Rate This
  • Comments 44

This time I figured I'd talk about an issue that spans all three debugging scenarios, hangs, memory issues and crashes... yummy:)

But first off let me just share a link i found on Mike Stall's debugging blog, the source for .net 2.0 (rotor). Very handy if you want to take a closer look at how things are implemented in the framework.

Edit 2010-03-08: Changed the link to the download for the CLI source

Problem statement:

The application runs fine for hours on end, then suddenly memory starts raising uncontrollably, eventually crashing with a "stopped unexpectedly" because of out of memory exceptions. Around the same time that memory starts raising CPU usage seems to go up and requests to the application star hanging, no new requests go through.

To get the application working the application needs to be restarted.

Gathering data:

Apart from gathering dumps (preferably multiple a few minutes apart for comparison), I would also recommend here to get some performance logs with counters like

  • Process/%Processor Time
  • Process/Virtual Bytes
  • Process/Private Bytes
  • .NET CLR Memory/All counters (To see how much time we are spending doing GC, how much memory is managed and how the ratio of Gen 0/Gen 1 and Gen 2 collections look)
  • ASP.NET counters to look for queueing, execution time etc.

The performance logs might not give you the answer but they can give a pretty good picture of what you are looking at and how the memory leak/hang/high CPU ties together.

Debugging the problem:

The dump we gathered was 845 MB, and from the performance log we saw that # bytes in all heaps followed private bytes very closely.

We can take two routes here (memory or hang) so we'll start with one...

Memory route:

If we run !eeheap –gc to see what the managed heaps look like we get the following output confirming that it is in fact a managed memory leak.

0:043> !eeheap -gc
Loaded Son of Strike data table version 5 from "C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\mscorsvr.dll"
Number of GC Heaps: 2
------------------------------
Heap 0 (0x000b9368)
generation 0 starts at 0x36f72cbc
generation 1 starts at 0x3687db60
generation 2 starts at 0x026f0030
ephemeral segment allocation context: (0x36f72cbc, 0x36f72cc8)
 segment    begin       allocated     size
0x26f0000 0x26f0030  0x66e2b94 0x3ff2b64(67,054,436)
0x11010000 0x11010030  0x15008544 0x3ff8514(67,077,396)
0x19010000 0x19010030  0x1cfce278 0x3fbe248(66,839,112)
0x21010000 0x21010030  0x25007050 0x3ff7020(67,072,032)
0x29010000 0x29010030  0x2cef6fec 0x3ee6fbc(65,957,820)
0x332c0000 0x332c0030  0x372bde34 0x3ffde04(67,100,164)
Large object heap starts at 0x0a6f0030
 segment    begin       allocated     size
0x0a6f0000 0x0a6f0030  0x0a725738 0x00035708(218,888)
Heap Size  0x17eba7a8(401,319,848)
------------------------------
Heap 1 (0x000dc930)
generation 0 starts at 0x3a244e9c
generation 1 starts at 0x39c85f9c
generation 2 starts at 0x066f0030
ephemeral segment allocation context: (0x3a244e9c, 0x3a244ea8)
 segment    begin       allocated     size
0x66f0000 0x66f0030  0xa6d8818 0x3fe87e8(67,012,584)
0x15010000 0x15010030  0x18ffd9b8 0x3fed988(67,033,480)
0x1d010000 0x1d010030  0x20ff8d4c 0x3fe8d1c(67,013,916)
0x25010000 0x25010030  0x28eee26c 0x3ede23c(65,921,596)
0x2d010000 0x2d010030  0x30e96034 0x3e86004(65,560,580)
0x372c0000 0x372c0030  0x3a599490 0x32d9460(53,318,752)
Large object heap starts at 0x0b6f0030
 segment    begin       allocated     size
0x0b6f0000 0x0b6f0030  0x0b7f6198 0x00106168(1,073,512)
Heap Size  0x17102694(386,934,420)
------------------------------
Reserved segments:
------------------------------
GC Heap Size  0x2efbce3c(788,254,268)

The GC heaps take up a total of around 788 MB (rounding freely:)), so the next natural step is to figure out what is on the heap... maybe a !dumpheap –stat will do the trick...

0:043> !dumpheap -stat
------------------------------
Heap 0
Bad MethodTable for Obj at 0x26f0030
total 21 objects
------------------------------
Heap 1
Bad MethodTable for Obj at 0x66f0030
total 2 objects
------------------------------
total 23 objects
Statistics:
        MT      Count     TotalSize Class Name
0x020a209c         12        36,960 System.Object[]
0x000dc488          9       488,672      Free
0x79b94638          2       766,768 System.String
Total 23 objects, Total size: 1,292,400

Hmm... what's up here? Managed heap corruption? On top of everything else? I mean we have the classic signs, bad method table (someone overwriting the method table for an object), and the size of the heap looks completely out of whack!!! But, would I really write another blog post about that?!? Of course not:) at least not this soon...

So when else would the heap be out of whack? Just bear with me for a moment...

Let's take a look at the !threads output

0:043> !threads
ThreadCount: 18
UnstartedThread: 0
BackgroundThread: 16
PendingThread: 0
DeadThread: 2
                                  PreEmptive   GC Alloc                     Lock     
        ID  ThreadOBJ       State     GC       Context           Domain     Count APT Exception
 15 0x109c 0x000dc260   0x180a220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Threadpool Worker) System.Threading.ThreadAbortException
 19 0x14c4 0x000dd3f8   0x200b220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Finalizer)
XXX      0 0x000f2748   0x1800820 Enabled  0x00000000:0x00000000 0x000d58c0     0 MTA (Threadpool Worker)
 21  0xaa0 0x024bebc0   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Threadpool Worker)
XXX      0 0x024bed88       0x820 Enabled  0x00000000:0x00000000 0x000d58c0     0 Ukn
 29 0x1314 0x024bd318   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Threadpool Worker)
 30 0x1668 0x024e0218   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA (Threadpool Worker)
 31  0xd48 0x024c0460   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     2 MTA (GC) (Threadpool Worker)
 32 0x16ec 0x025372e8   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Threadpool Worker)
 33 0x1758 0x025421e0       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA
 34 0x1354 0x025428f0       0x222 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA
 35  0x954 0x02543000       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA
 36 0x15c8 0x02543dc0       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA
 37  0xd1c 0x025444d0   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA (Threadpool Worker)
 38  0xbd0 0x0254a2d0       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA
 39  0x240 0x0254b600   0x1800220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA (Threadpool Worker)
 40 0x10d8 0x0254ca58       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA
 41  0xd80 0x02572018       0x220 Enabled  0x00000000:0x00000000 0x000f2a68     1 MTA

Notice thread 31, in the exception column we see (GC), that means that we are in the middle of a garbage collection. Actually, it means that Thread 31 made an allocation that triggered a GC, alternatively, it could be that someone called GC.Collect on thread 31.

Taking a closer look (running kb 2000 on thread 31) we find that in fact someone made an allocation, causing the GC to have to allocate_more_space and triggering a GC.

  31  Id: 14b0.d48 Suspend: 1 Teb: 7ff59000 Unfrozen
ChildEBP RetAddr  Args to Child              
0ca7ecb4 7c822124 77e6baa8 000003e0 00000000 ntdll!KiFastSystemCallRet
0ca7ecb8 77e6baa8 000003e0 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0ca7ed28 77e6ba12 000003e0 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
0ca7ed3c 791fee7b 000003e0 ffffffff 000b9368 kernel32!WaitForSingleObject+0x12
0ca7ed5c 7920273d 00000001 00000000 00009c4c mscorsvr!GCHeap::GarbageCollectGeneration+0x1a9
0ca7ed8c 791c0ccd 024c0498 00009c4c 00000000 mscorsvr!gc_heap::allocate_more_space+0x181
0ca7efb4 791b6269 024c0498 00009c4c 00000000 mscorsvr!GCHeap::Alloc+0x7b
0ca7efc8 791b8873 00009c4c 00000000 00000000 mscorsvr!Alloc+0x3a
0ca7efe8 791b8814 020a26b0 00002710 00000000 mscorsvr!FastAllocatePrimitiveArray+0x45
0ca7f068 0caf0566 3b609d84 372bde28 0ca7f09c mscorsvr!JIT_NewArr1+0xbb
WARNING: Frame IP not in any known module. Following frames may be wrong.
0ca7f078 0caf0524 37090e70 37090184 0276b7ac 0xcaf0566
0ca7f09c 0cba25b4 02764610 37090184 3708f348 0xcaf0524
0ca7f1cc 799a67a2 3708f3b0 3708f6b4 799b4168 0xcba25b4
0ca7f1d8 799b4168 00000002 00000000 0c8f1a38 mscorlib_79990000+0x167a2
0ca7f260 79217188 00000000 00dc9160 02728fd8 mscorlib_79990000+0x24168
02432f60 24548bca 2c15ff10 8b020a95 8bcb8bd8 mscorsvr!ComCallMLStubCache::CompileMLStub+0x429
02432f60 00000000 2c15ff10 8b020a95 8bcb8bd8 0x24548bca

If this was a single processor machine, running the GC in workstation mode we would have seen the actual collection on this thread but this is a multi processor machine with 2 heaps (we could see this from the !eeheap –gc output), so we have 2 threads dedicated to doing garbage collection.

We can find out which ones they are by searching our stacks for the function gc_thread_stub which is the first function called on the GC threads (in our case it was thread 17 and 18)

  17  Id: 14b0.172c Suspend: 1 Teb: 7ffa6000 Unfrozen
ChildEBP RetAddr  Args to Child              
0225fdc4 792042e5 11fc4800 11fc67d0 0012eea8 mscorsvr!memcopy+0x12
0225fde4 792043da 00000001 0012eea8 0225fe24 mscorsvr!gc_heap::compact_plug+0x2d
0225fe08 792044cc 120f5694 00000000 026f0030 mscorsvr!gc_heap::compact_in_brick+0x59
0225fe40 791ff1bd 00000002 026f0030 00000001 mscorsvr!gc_heap::compact_phase+0x116
0225fed0 791ff43d 00000002 000b9368 00000000 mscorsvr!gc_heap::plan_phase+0x857
0225ff24 791ff065 00000000 000b9368 77e670b2 mscorsvr!gc_heap::gc1+0x78
0225ff88 791fe6c3 00000001 00000000 000b9368 mscorsvr!gc_heap::garbage_collect+0x22f
0225ffac 792356be 00000000 0225ffec 77e66063 mscorsvr!gc_heap::gc_thread_function+0x42
0225ffb8 77e66063 000b9368 00000000 00000000 mscorsvr!gc_heap::gc_thread_stub+0x1e
0225ffec 00000000 792356a0 000b9368 00000000 kernel32!BaseThreadStart+0x34

  18  Id: 14b0.76c Suspend: 1 Teb: 7ffa5000 Unfrozen
ChildEBP RetAddr  Args to Child              
0229fcc4 792042e5 1848d1b4 1848eae8 00092934 mscorsvr!memcopy+0x1c
0229fce4 792043da 00000001 00092934 0229fd24 mscorsvr!gc_heap::compact_plug+0x2d
0229fd08 792044cc 1852142c 00000000 066f0030 mscorsvr!gc_heap::compact_in_brick+0x59
0229fd40 791ff1bd 00000002 066f0030 00000001 mscorsvr!gc_heap::compact_phase+0x116
0229fdd0 791ff43d 00000002 000dc930 00000002 mscorsvr!gc_heap::plan_phase+0x857
0229fe24 791ff065 00000000 000dc930 77e670b2 mscorsvr!gc_heap::gc1+0x78
0229fe88 791fe6c3 00000001 00000000 000dc930 mscorsvr!gc_heap::garbage_collect+0x22f
0229feac 792356be 80a79bcc f773fa00 ef609be0 mscorsvr!gc_heap::gc_thread_function+0x42
0229ffb8 77e66063 000dc930 00000000 00000000 mscorsvr!gc_heap::gc_thread_stub+0x1e
0229ffec 00000000 792356a0 000dc930 00000000 kernel32!BaseThreadStart+0x34

And this folks, is the answer to why our heaps are in this funky state. The GC is in the middle of compacting the heaps which means that the heaps are "in motion" so the output from !dumpheap –stat really can't be relied on.

If we really want to know what is on the heaps we can either manually go through and run !dumpheap for all segments...

Ex.

Heap 1 (0x000dc930)
generation 0 starts at 0x3a244e9c
generation 1 starts at 0x39c85f9c
generation 2 starts at 0x066f0030
ephemeral segment allocation context: (0x3a244e9c, 0x3a244ea8)
 segment    begin       allocated     size
0x66f0000 0x66f0030  0xa6d8818 0x3fe87e8(67,012,584)
0x15010000 0x15010030  0x18ffd9b8 0x3fed988(67,033,480)
0x1d010000 0x1d010030  0x20ff8d4c 0x3fe8d1c(67,013,916)
...

0:043> !dumpheap 0x15010030 0x18ffd9b8
------------------------------
Heap 0
total 0 objects
------------------------------
Heap 1
0x15010030 0x00afc0fc       12    2 TestSession.MyClassWFinalizer 
0x1501003c 0x020a26b0   40,012    2 System.Int32[] 
0x15019c88 0x00afc0fc       12    2 TestSession.MyClassWFinalizer 
0x15019c94 0x020a26b0   40,012    2 System.Int32[] 
0x150238e0 0x00afc0fc       12    2 TestSession.MyClassWFinalizer 
0x150238ec 0x020a26b0   40,012    2 System.Int32[] 
0x1502d538 0x00afc0fc       12    2 TestSession.MyClassWFinalizer 
0x1502d544 0x020a26b0   40,012    2 System.Int32[]
...

and summarize the resulting output, keeping in mind that it wont work for some segments that are currently in motion... or get a new dump when we are not in a GC. But I'll leave that as an exercise for the reader:) (don't you just hate it when they say that in study books, it just means that the author either didn't know how to do it themselves, or was too lazy to:) in my case it's the latter, I promise:))

Hang/Performance route:

It was also mentioned, in hte problem description, that apart from that memory grew really rapidly, CPU usage was high, and requests appeared to be stuck... since we couldn't get much info about the managed heap from !dumpheap –stat, perhaps we should try the "hang" approach and see what that gives us.

Normally ~* kb and ~* e !clrstack (to see all stacks) would have been the way to go, and it is in this case too, but before we go for the dough, let's take a peak at the threadpool, to get a bit of info about our queue status, CPU usage and whether or not the threadpool is exhausted.

0:043> !threadpool
CPU utilization 81%
Worker Thread: Total: 16 Running: 16 Idle: 0 MaxLimit: 40 MinLimit: 2
Work Request in Queue: 68
0x9a3dc
CorThreadPoolWorkitemCallback Callback@0x01e84df0
Unknown 0x000e3f20
CorThreadPoolWorkitemCallback Callback@0x01eb55e8
CorThreadPoolWorkitemCallback Callback@0x01eb5578
CorThreadPoolWorkitemCallback Callback@0x01eb5df0
CorThreadPoolWorkitemCallback Callback@0x01eb5db8
CorThreadPoolWorkitemCallback Callback@0x01eb57a8
CorThreadPoolWorkitemCallback Callback@0x01eb5700
CorThreadPoolWorkitemCallback Callback@0x01e83e20
...
--------------------------------------
Number of Timers: 0
--------------------------------------
Completion Port Thread: Total: 2 Free: 2 MaxFree: 4 CurrentLimit: 2 MaxLimit: 40 MinLimit: 2

So we have 16 worker threads created out of a total of 40 (20*2 processors) where 20 is the number we've given in machine.config for maxWorkerThreads and all of them are in the middle of executing requests. We have 68 work requests in the queue, hmm... then why aren't we just creating new threads to process these requests??? It doesn't really appear like we exhausted the threadpool.

Ok, everybody who has ever been curious why CPU utilization was listed under threadpool, raise your hand:)

We know from previous case studies that the CPU utilization shows the CPU utilization of the whole system, not only the CPU utilization of this process, how could that possibly have anything to do with the threadpool? Well, surprise surprise, it does. This piece of information is stored in a variable by the framework because it is one of the factors that determine whether or not we will be creating new threads. In particular, if the CPU utilization is over 80% we don't create any new threads... and this is why we are not creating any new threads in this case.

As a matter of fact, if you take a look at any dump where we are in the middle of a garbage collection you will notice that the CPU utilization is 81%. Is this because the GC in some magical way manages to keep itself exactly at 81%? Don't think so:) But artificially setting this value during a GC to 81 has the nice side-effect that no new threads can be created that could otherwise be allocating memory or otherwise changing objects on the heap. So, the real CPU usage could be 20 or 100 or 59% for that matter, it will always be shown as 81 during a GC, so you can't really rely on this either in a GC dump.

Anyways, over to the threads... what are they all up to?

If we run ~* kb and look at all the stacks we will see that most of them are sitting in a stack similar to this one...

  32  Id: 14b0.16ec Suspend: 1 Teb: 7ff58000 Unfrozen
ChildEBP RetAddr  Args to Child              
0cbeeae0 7c822124 77e6baa8 000003ec 00000000 ntdll!KiFastSystemCallRet
0cbeeae4 77e6baa8 000003ec 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0cbeeb54 77e6ba12 000003ec ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
0cbeeb68 791fed5c 000003ec ffffffff 791fe901 kernel32!WaitForSingleObject+0x12
0cbeeb74 791fe901 00000001 025372e8 00000001 mscorsvr!GCHeap::WaitUntilGCComplete+0x4f
0cbeeb84 7926ec73 0000001c 02537320 0000001c mscorsvr!Thread::RareDisablePreemptiveGC+0xb5
0cbeebac 791c0ccd 02537320 0000001c 00000000 mscorsvr!gc_heap::allocate_more_space+0xe6
0cbeedd4 791b6269 02537320 0000001c 00000000 mscorsvr!GCHeap::Alloc+0x7b
0cbeede8 791c0dfd 0000001c 00000000 00000000 mscorsvr!Alloc+0x3a
0cbeee04 791b2a11 00000006 00000005 791ce390 mscorsvr!SlowAllocateString+0x26
0cbeee10 791ce390 00000005 39685aa4 0cbeee9c mscorsvr!JIT_TrialAlloc::GenAllocArray+0x29c
0cbeee20 791ce428 0cbeeea4 00000408 00000005 mscorsvr!COMString::NewString+0x20
0cbeee9c 026b2b0c 39340a14 00000408 0cbeef44 mscorsvr!COMString::StringInitCharArray+0xa4
WARNING: Frame IP not in any known module. Following frames may be wrong.
0cbeef80 79999f6d 39680c14 00000000 791b8bc6 0x26b2b0c
0cbeef8c 791b8bc6 00000000 0cbeeff4 026b75af mscorlib_79990000+0x9f6d
0cbeef98 026b75af 391c35c4 00000001 391c355c mscorsvr!COMString::EqualsString+0x3f
00000000 00000000 00000000 00000000 00000000 0x26b75af

Waiting for the GC to complete...

And if we were to look at some other dumps taken during this performance issue, we would see that most if not all would be in a GC. So it appears that our problem "hang"-wise seems to be that we are doing way many garbage collections or that the garbage collections are taking a loooong time.

What can make us spend a lot of our time in GC?

The short answer: Lots of garbage collections, specifically a lot of higher generation collections, in essence it is very bad if we constantly hit the limit for generation 2.

If you want the GC to work really hard, here are some good tips (short of manually calling GC.Collect often):

  • Have a high allocation rate / second
  • Let your objects reach gen 2 and then let them die
  • Create a lot of objects that get stored on the large object heap (since collecting the LOH means collecting all generations)
  • Block the finalizer so that no matter if objects are collected they still can't go away if they have a finalizer

A few examples...

High allocation rate

Let's say you have a recordset containing 10.000 people and you decide to dynamically create a page with an HTML table containing these. You start by storing the header of the page in a string myPageContents = "<HTML><TITLE>blabla</TITLE><BODY><TABLE>..." and so on. Then you have a loop that appends the record enclosed in the proper table tags for each record like this

myPageContents += "<TR><TD>" + firstname + "</TD><TD>" + lastname + "</TD></TR>";

This sounds like a fine idea, right?

Well, think again... The problem is that strings are immutable, i.e. that they can not be changed. This means that we will create 5 new copies of the string (one slightly larger than the previous), for each iteration if we use the code above. Even worse, when the strings get long enough, the later ones would be stored on the LOH. Now that is a pretty good way to get a high allocation rate / second and growing your LOH:)

Let objects reach gen 2 and then die

Let all your objects implement finalizers, or derive from objects that implement finalizers. Even if the finalizer doesn't contain any code the object will be "rooted" by the freacheable queue when it is created and if no-one has suppressed finalization (in a dispose method or similar) for it, it won't be free'd straight away when a garbage collection occurs, but rather it would be promoted one generation and then, only once the finalizer has run it's finalize method it can be collected. In other words, if you want your object to be promoted one generation for no good reason, create a finalize method and leave it empty:)

Speaking of finalize methods, please read http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconFinalizeDispose.asp on finalization and dispose patterns so you don't run into unnecessary issues such as the one just mentioned, or nullreference exceptions because you are accessing an objects member variable inside its finalizer.

Large object heap

I think this one is pretty obvious, pretty much, try to use the LOH as little as possible.

Blocking finalizers

A finalize method should never perform a blocking operation, the reason for this is that we only have one finalize thread in a .net process, and if we block it we won't be able to get rid of any objects that need finalization. This of course leads to high memory consumption, but also to a large number of garbage collections since we will reach the limits that cause us to GC over and over again because we never get rid of any data. Seems pretty much like what we are running into in our scenario.

So how do we find out which one we are running into?

If we have a high allocation rate we will see this in perfmon under .NET CLR Memory - Allocated Bytes/Sec, and technically the best way to figure out why we are allocating so much (if we can limit it to a specific page or small scenario) would be to run a profiler like the CLR Profiler. But we could get a pretty good idea by looking at the thread that caused the GC and see if perhaps it is doing something that could be allocating a lot of memory, and run !objsize on the objects on that stack (!dso output) to see if you find any particularly big ones. Remember though that what we see on that thread might just be the straw that broke the camels back.

To see which of your objects have finalizers you can run !finalizequeue. That will show you a list of all objects currently rooted by the freachable queue and can give you a good idea about which of your classes you should inspect for unnecessary finalizers. Remember here that it's not an exhaustive list of all classes that have finalizers, just the ones currently alive, and that noone has called SuppressFinalize on.

Finally, to see if you have a blocked finalizer, simply look at the finalizer thread and see what it is doing. In our case the finalizer is thread 19 (from the !threads output)

19 0x14c4 0x000dd3f8   0x200b220 Enabled  0x00000000:0x00000000 0x000f2a68     0 MTA (Finalizer)

Kb 2000 (native stack) and !clrstack (managed stack) for this thread show

  19  Id: 14b0.14c4 Suspend: 1 Teb: 7ffa4000 Unfrozen
ChildEBP RetAddr  Args to Child              
022dfbb4 7c821364 77e42439 00000001 022dfbf8 ntdll!KiFastSystemCallRet
022dfbb8 77e42439 00000001 022dfbf8 00000000 ntdll!NtDelayExecution+0xc
022dfc20 7920a823 000186a0 00000001 022dfe00 kernel32!SleepEx+0x68
022dfc40 7920a89b 000186a0 022dfc78 022dfca8 mscorsvr!Thread::UserSleep+0x93
022dfc50 0247316a 022dfc5c 000186a0 022dfcb0 mscorsvr!ThreadNative::Sleep+0x30
WARNING: Frame IP not in any known module. Following frames may be wrong.
022dfca8 79204678 022dfd24 79214df6 022dfe00 0x247316a
022dfd34 7920473d 049db038 000dd3f8 000f2a68 mscorsvr!MethodTable::CallFinalizer+0xee
022dfd48 79203b2c 049db038 00171eb0 022dfe74 mscorsvr!CallFinalizer+0x84
022dfde0 79204806 049db038 00000000 000dd3f8 mscorsvr!CallFinalizer+0x255
022dfdf4 791cc3c8 022dfe74 791b3d64 022dfe5c mscorsvr!GCHeap::RestartEE+0x73
022dfe3c 792047d9 00171eb0 792047f3 022dfe74 mscorsvr!Thread::DoADCallBack+0x5c
022dfedc 791fed20 02792828 00000000 00000000 mscorsvr!CallFinalizer+0x2fa
022dff24 79245681 00000000 80915704 7ffa4000 mscorsvr!GCHeap::FinalizerThreadStart+0xc2
022dffb8 77e66063 000dd7d0 00000000 00000000 mscorsvr!Thread::intermediateThreadProc+0x44
022dffec 00000000 79245640 000dd7d0 00000000 kernel32!BaseThreadStart+0x34

Thread 19
ESP         EIP       
0x022dfc78  0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Void System.Threading.Thread.Sleep(I4)
0x022dfc88  0x0caf05cd [DEFAULT] [hasThis] Void MemoryIssues.MyClassWFinalizer.Finalize()
0x022dfe00  0x79204678 [FRAME: ContextTransitionFrame] 
0x022dfe5c  0x79204678 [FRAME: GCFrame]

Houston, we have a problem:)

We are definitely blocking the finalizer. The finalize method for MyClassWFinalizer calls sleep for some reason and thus blocks the finalizer.

This might not be a super common scenario (the sleep), but it is quite often you see the finalizer get stuck in a native critical section when trying to free some native resource, or even sometimes trying to create a connection to a database or similar.

The moral of the story is to avoid, as much as possible, anything that could be blocking the finalizer, better yet, avoid letting the finalizer clean up at all. If the object you are using has a native resource of some sort, call it's close, or dispose method, when you're done with it, or use the object inside a using statement.

Close or dispose methods, if correctly implemented will call SuppressFinalize which means that your object will not be unnecessarily promoted and will not have to pass through the single finalizer thread for cleanup.

Laters all... and remember 2+2=5 (for extremely large values of 2)

 

 





  • Is it okay to call the Rollback() method of a System.Data.Common.DbTransaction object within the Finalizer or could this cause a problem?
  • Thanks for sharing this, great work!
    I enjoy and learn a lot from your blog.
  • Tess,
    do you have any non-CLR hang examples or links?

    Considering all the apps written in CLR and all the ones in Vista......
  • CuttingEdge: I wouldn't recommend calling a DbTransaction.Rollback() inside a finalizer since the finalizer is not deterministic.

    It runs on a different thread and you don't know when exactly its going to be called (unless you call GC.Collect(); and GC.WaitForPendingFinalizers() but I wouldn't recommend that.

    In general it is recommended to hold the transaction to the least amount of time needed, so the pattern usually looks like this:

    IDbTransaction transaction = connection.BeginTransaction(...);
    try {
     ... Add you database code here with the reference to the transaction ...

      transaction.Commit();
    }
    catch (Exception ex) {
       transaction.RollBack();
       throw ex; // Or log it or something
    }


    Mind that the commit is also within the Try..Catch block since you might run into an error while commit, at which point you should rollback as well.
  • Eran,

    Tess noted that when implementing Dispose(), we should also implement the finalizer. I've build a wrapper class around a Transaction, that looks basicly like this.

    public class DbTransactionWrapper : IDisposable
    {
    private bool _isCommitted;

    public DbTransaction Transaction { get; set; }

    public void Commit() {
    // Commit the Transaction
    _isCommitted = true;
    }

    public DbTransactionWrapper(string connectionString) {
    // Create a Connection and Transaction here
    }

    public void Dispose() { Dispose(true); GC.SuppressFinalize(this); }

    ~DbTransactionWrapper() { Dispose(false); }

    public void Dispose(bool disposing) {
    // When the Transaction is not committed, Rollback the Transaction and throw an exception
    }
    }


    and then let users call it like this:

    using (DbTransactionWrapper w = new DbTransactionWrapper("constr"))
    {
    SqlHelper.ExecuteNonQuery(w.Transaction, "query1");
    SqlHelper.ExecuteNonQuery(w.Transaction, "query2");
    w.Commit();
    }

    The nice thing about this is, that you don't have to call RollBack yourself, because it will be done for you by the wrapper and the code looks quite nice, doesn't it?

    So there is not really a problem when a user sticks to the use of a try-finally or using block, but simply when s/he doesn't use this guideline. In that situation -at some point- the finalizer WILL be called, and what should we do exactly at this point? Rollback the Transaction? Only Dispose the Transaction? Or is implementing a the whole transaction wrapper concept simply not a good idea?

    Thanks for any feedback.


    ps. Tess I love reading your blogs.
  • I would agree with Eran that it might not be great to have it in the finalizer since it's non deterministic. Specifically, in the finalizer you can not be sure that the transaction has not been finalized/cleaned up already.  

    "Tess noted that when implementing Dispose(), we should also implement the finalizer."  What i really meant was, if you implement finalize, you should implement dispose/close so that you can avoid having to go through the finalizer.

    Apart from that you cant sure that the transaction is still alive in the finalize method, i think if you do get to call RollBack you would talk to the DB, in which case you could create a potential block on the finalizer so i would definitely not recommend doing the rollback there.

    Since your class does not have a native resource (The transaction is a managed object) this class (DbTransactionWrapper) should not have a finalize method at all. Only objects that open and/or hold on to native resources through native API's should have finalize methods.  

    Hope this makes sense.
  • Jack,

    I don't have any posts planned about hangs in non .net applications (wanted to focus the blog on .net apps).  But in short the way to figure these out is the same as with .net hangs.  (Since no .net app is purely managed we can block in non .net code in a .net app too).  

    To figure out a hang, given a dump i usually check

    ~* kb (native stacks) and ~* e !clrstack (managed stacks)

    Try to look for patterns like AwareLock::Enter (threads waiting for managed locks) and WaitForCriticalSection (threads waiting in native critical sections)  and CorWaitOneNative (managed async "locks").   And then for the synchronous locks (lock and critical section) find the owner by running !syncblk (locks) and !critlist (from sieextpub.dll for critical sections)

    For async locks (wait one) you need to look at what the thread is doing since there isnt really a lock owner in the right sense of the word there.
  • Tess,

    this makes a lot of sense. Thank you!
  • To Jack:

    Open Windbg Help and read the
    "Debugging Deadlocks (No Ready Threads)" and
    "Tracking Down a Processor Hog" topics.

    You could also use "Application verifier" with locks and deadlocks monitoring (to get more fancy stuff).

    Tal.
  • Blocked finalizer threads have gotten some recent publicity on Tess Ferrandez's blog. I recently ran...
  • I was just going through some memory leak information and I stumbled across a newish posting from Tess:...
  • Поучительная история одного бага. Пишем в &quot;деструкторе&quot; Thread.Sleep — и приложе
  • Excellent investigation, Tess. Very good job!
  • ouch, my head hurts after that ;-)  but seriously ... good info
Page 1 of 3 (44 items) 123
Leave a Comment
  • Please add 2 and 2 and type the answer here:
  • Post