@TessFerrandez
I was looking at a memory dump recently for an issue where the process would grow to over 1 GB and return OutOfMemory Exceptions.
Debugging the issue:
A full user dump of the w3wp.exe process was taken with debug diag 1.1 when memory was really high.
If you are reading this and have no clue what debug diag is, how to get dumps with it, or use leak tracking, here is the download and the usage instructions.
The dump itself was 1.34 GB which is basically the memory we have committed (used) so it was pretty big, and the first thing we need to check for in this case is whether we are dealing with an issue with high .NET GC memory usage (lots of .net objects), .NET loader heap usage (lots of dlls) or native memory usage (native heaps or dlls)
I opened the dump in windbg and loaded sos (.loadby sos mscorwks) to take a peak at the managed memory
!eeheap –gc showed that the size of the .net GC heaps was only 268 MB so at first sight it didn’t really look like a .NET leak, which left us with a loader heap issue or a native leak.
!address –summary shows that most of the memory is used by RegionUsageIsVAD and RegionUsageHeap, i.e. virtual allocs and heap allocs. Now .NET memory goes into RegionUsageIsVAD but with this much RegionUsageHeap I suspected a native leak or high native memory usage. In order to figure out what was causing this high NT Heap usage I asked for a debug diag dump with leak tracking enabled.
Leak tracking in debug diag will basically track any allocations and de allocations you make, and at the end of the run you will end up with a dump containing information about what stacks allocated the memory currently in use in the dump.
In this case they had already ran debug diag with leak tracking enabled which was great, but I will show you a few gotchas with leak tracking that they had unfortunately ran into.
When you have used leak tracking, you need to run the dump through advanced analysis-Memory preassure analysis in debug diag to parse the leak tracking results. Debug diag will then give you information at the top about the biggest offenders, along with some heap usage information and sample stacks for allocations.
In this case debug diag was showing the following:
Ok, so 35 MB worth of GDI+ objects and 30 MB from some unknown module. But we are dealing with a ~1.4 GB dump so this seem like a tiny drop of water in a huge ocean.
A couple of questions may appear in your head here (at least they do in mine when I see this)
Why didn’t the report show more allocations? What is the GDI+ stuff? and what the heck is the Unknown module?
Why didn’t the report show more allocations?
If we look at the report we first have a section with machine name, operating system, process name etc. Here we also have data about the process uptime, which in this case is ~17 hrs. So this memory (1.4 GB) we have in the process was built up over 17 hours.
System Up-Time 3 day(s) 06:47:23 Process Up-Time 17:15:10
If we then scroll down to the Leak Analysis report section we find
Outstanding allocation summary
Number of allocations 14,959 allocations Total outstanding handle count 180 handles Total size of allocations 88.38 MBytes Tracking duration 00:07:01
So this means that leak tracking has only been turned on for 7 minutes out of the 17 hrs that the process was up. During the time leak tracking was turned on we allocated 88.38 MB and 180 handles. Debug diag will not be able to report “leaks” that happened before leak tracking was turned on, so these 35 MB for GDI+ should be compared to the 88 MB of total allocations.
In reality, given that leak tracking was turned on for such a short amount of time we can’t really affirmatively say much at all about the results here since the size of the allocations are so small compared to the size of the process. We would need to rerun debug diag leak tracking, preferably from the start of the process to say anything for sure.
But, if we ignore this and extrapolate freely, assuming that the allocation pattern is uniform, we can assume that a lot of the memory in the dump is GDI+ items, and a lot comes from this “Unknown module”. I wouldn’t recommend this as proof, but it’s worth a try spinning on this thread to see if we can get any more info, until we get a new memory dump with leak tracking turned on for a longer time.
What is the GDI+ stuff?
Well, looking at the message debug diag gave us, it seems like most of the GDI+ objects come from this method: gdiplus!GpMemoryBitmap::AllocBitmapData. The name implies that we are creating a bitmap, which can either be done explicitly from some native component, or indirectly by creating a System.Drawing.Bitmap.
Using System.Drawing objects from a service is unsupported because of the nature of such applications (i.e. running under the context of a user context that might not be logged, and the potential for desktop heap exhaustion)
From MSDN Caution: Classes within the System.Drawing namespace are not supported for use within a Windows or ASP.NET service. Attempting to use these classes from within one of these application types may produce unexpected problems, such as diminished service performance and run-time exceptions.
From MSDN
Caution:
Classes within the System.Drawing namespace are not supported for use within a Windows or ASP.NET service. Attempting to use these classes from within one of these application types may produce unexpected problems, such as diminished service performance and run-time exceptions.
But…, I am not going to harp on the unsuitability of System.Drawing objects in a server app. What I would say though, is that if you choose to go this unsupported route, you need to make extremely sure that you got your ducks in a row and dispose of your System.Drawing objects appropriately to avoid handle leaks, desktop heap exhaustion etc.
Going back to the dump in windbg, we can take a look at the heap to see what types and how many System.Drawing objects we have here
Among other things we can see that we have 356 Bitmaps, so since those are still in memory, unless they are disposed they will still hold on to a lot of native GDI+ mem. Well how much will of course depend on how big the respective bitmaps are.
We’ll get back to why these are sticking around in a moment.
What the heck is the Unknown module?
If you see unknown modules in debug diag, it means that debug diag couldn’t resolve the symbols for the module. This often happens if it is a .net method.
Clicking on the unknown module in the report header takes us to some sample stacks like this one. So some kind of data related stuff.
Let’s go back to windbg to figure out why these bitmaps are still around
The bitmap that I happened to pick and do !gcroot on (to figure out why it was not collected), was rooted in the finalize queue, meaning that it is ready for finalization but waiting for the finalizer to finalize it.
The other bitmaps follow the exact same pattern, so the next step is to figure out what the finalizer is doing and figure out what is stopping it from finalizing these bitmaps.
0:000> !threads ThreadCount: 279 UnstartedThread: 0 BackgroundThread: 118 PendingThread: 0 DeadThread: 161 Hosted Runtime: no PreEmptive GC Alloc Lock ID OSID ThreadOBJ State GC Context Domain Count APT Exception 1 1 190c 0015b598 8808220 Enabled 00000000:00000000 0014b380 0 Ukn (Threadpool Completion Port) 14 2 257c 0016c6d8 b220 Enabled 00000000:00000000 0019c920 0 MTA (Finalizer) ...
0:000> ~14s eax=00166878 ebx=00000000 ecx=00167208 edx=00000000 esi=40000ee0 edi=00000000 eip=77f88f13 esp=19a0f12c ebp=19a0f19c iopl=0 nv up ei pl nz na po nc cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000202 NTDLL!NtWaitForSingleObject+0xb: 77f88f13 c20c00 ret 0Ch
0:014> !clrstack OS Thread Id: 0x257c (14) ESP EIP 19a0f444 77f88f13 [InlinedCallFrame: 19a0f444] System.Data.Common.UnsafeNativeMethods.OCIHandleFree(IntPtr, HTYPE) 19a0f440 6589a74c System.Data.OracleClient.TracedNativeMethods.OCIHandleFree(IntPtr, HTYPE) 19a0f480 658717b7 System.Data.OracleClient.OciHandle.Release() 19a0f4b4 6587183d System.Data.OracleClient.OciHandle.ReleaseHandle() 19a0f6c8 79e88f63 [GCFrame: 19a0f6c8] 19a0f8c0 79e88f63 [GCFrame: 19a0f8c0] 19a0f92c 79e88f63 [HelperMethodFrame_1OBJ: 19a0f92c] System.Runtime.InteropServices.SafeHandle.InternalFinalize() 19a0f984 7934dc52 System.Runtime.InteropServices.SafeHandle.Dispose(Boolean) 19a0f988 7934db81 System.Runtime.InteropServices.SafeHandle.Finalize() 19a0fc38 79f7b546 [ContextTransitionFrame: 19a0fc38] 19a0fd14 79f7b546 [GCFrame: 19a0fd14]
0:014> kL ChildEBP RetAddr 19a0f128 77f8f295 NTDLL!NtWaitForSingleObject+0xb 19a0f19c 77f87f26 NTDLL!RtlpWaitForCriticalSection+0x9e 19a0f1a4 6111e091 NTDLL!RtlEnterCriticalSection+0x46 WARNING: Stack unwind information not available. Following frames may be wrong. 19a0f1b4 6067819d ORACORE9!sltsmna+0x11 19a0f3fc 60677e82 oraclient9!kpufhndl0+0x30d 19a0f410 6060121a oraclient9!kpufhndl+0x12 19a0f420 25f9c85f oraclient9!OCIHandleFree+0x1a 19a0f430 6589a74c oci!OCICharSetToUnicode+0xaaae 19a0f478 658717b7 System_Data_OracleClient_ni!_load_config_used+0x39554 19a0f4ac 6587183d System_Data_OracleClient_ni!_load_config_used+0x105bf 19a0f4c0 79e88ee4 System_Data_OracleClient_ni!_load_config_used+0x10645 19a0f540 79e88e31 mscorwks!CallDescrWorkerWithHandler+0xa3 19a0f678 79e88d19 mscorwks!MethodDesc::CallDescr+0x19c 19a0f690 79eeea68 mscorwks!MethodDesc::CallTargetWorker+0x20 19a0f6b0 79f01004 mscorwks!MethodDescCallSite::Call_RetBool+0x1c 19a0f7b0 79f00f8a mscorwks!SafeHandle::RunReleaseMethod+0x75 19a0f8b0 79f00eb6 mscorwks!SafeHandle::Release+0xf2 19a0f8dc 79f357b7 mscorwks!SafeHandle::Dispose+0x2b 19a0f97c 7934dc52 mscorwks!SafeHandle::Finalize+0xab 19a0f9ac 79f7b546 mscorlib_ni+0x28dc52
Although I don’t have the oracle symbols so the oracore9, oraclient9 and oci methods are incorrect, we can see that the finalizer is currently trying to finalize an OciHandle, and in order to do so it needs to enter a critical section which is currently locked.
Until it has finalized this OciHandle it can’t finalize anything else, so our Bitmaps can’t be garbage collected.
If we want to go a bit further we can find out who owns the critical section and have a look at what it’s doing.
!locks will give us a list of all the locks in the process and any locks that have a lock count higher than 0 are currently locked. In this case, the only one that fits that criteria is the one above, so this is the critsec that we are waiting for.
It is owned by the thread with the OSID 1ca0 so we can move to this thread to find out what it is working on.
Again, the oracle methods are incorrect in this stack since we don’t have symbols, but in essence we are getting some data from an Oracle db, waiting for it to return the data.
What do we have so far?
We know that we have high memory usage, and that there is a big potential for this high memory usage to come from bitmaps.
We also know that there are indeed some System.Drawing.Bitmaps in the process that are waiting to be finalized, which means that these bitmaps are not explicitly disposed in code after we are finished with them. If they were explicitly disposed they would not have to be finalized, and thus we would not have to wait for the finalizer in order to release the underlying GDI+ resources.
We also know that the finalizer is blocked while trying to release some Oracle OCIHandle and that in order to unblock the request on thread 69 has to finish.
This in turn also means that we know that the applications code is not explicitly closing Oracle connections, command objects etc. when it’s finished using them. As with the Bitmaps, if it was explicitly closing them, they would not have to be finalized on the finalizer.
In a nutshell:
To answer the 1 GB question, to dispose or not to dispose? If you create objects that have pointers to native resources, like connections, bitmaps, files etc. you need to dispose them explicitly when you are done with them, if you don’t want to end up in a situation like this one.
MSDN will usually tell you if you need to dispose or not, like in the case of the bitmap dispose method (derived from Image dispose)
Note: Always call Dispose before you release your last reference to the Image. Otherwise, the resources it is using will not be freed until the garbage collector calls the Image object's Finalize method.
Note:
Always call Dispose before you release your last reference to the Image. Otherwise, the resources it is using will not be freed until the garbage collector calls the Image object's Finalize method.
Until next time,
Tess
PingBack from http://www.anith.com/?p=4911
Note: Thanks Tess! ;-)   .NET Memory Leak: To dispose or not to dispose, that’s the 1 GB question
I was looking at a memory dump recently for an issue where the process would grow to over 1 GB and return
Stumbled over this blog a few days ago and I'm amazed of how good it is. I love the examples you are using and hope to be able to go through all the labs in the very near future.
Keep up the good work!!!!
What .NET drawing namespace *should* be used for creating and processing images in an ASP.NET application?
Great post as usual. Since I discovered the work of debugging I have been actively doing memory dumps of all the applications I have worked on. Thanks for the great insight in the debugging world , now windbg is my best friend :)
Regards
Shivam
thanks.it is a good pattern to dispose com objects!
Hi Tess,
Do we have any tools like code analysis and telling whatever area where the dispose is not done well ?
That will be of great help. Or Microsoft can add this feature in VS2008 code analysis feature.
Thanks,
Thani
Barrkel, unfortunately there are no .net drawing namespaces that are supported for image editing in asp.net or other services.
That doesn't mean that you can't use them, it just means that if you run into issues there will be no fixes etc. unless you can repro the issue in a winforms app. It also means that they are often inappropriate to use in service applications for the reasons mentioned above (user context, desktop heap exhaustion etc.)
I think there might be some 3rd party components though, and for graphing I believe there is now also an MS graph control.
Thanigainathan, not sure if fxcop will do this, but in esscence it's a bit hard to check as it is only at runtime you will know if it is disposed correctly or not, given that the scope of an object may be very long.
Great post, thanks Tess.
One thing though, we're doing a LOT of heavy bitmap work in one of our websites, motoprofessional.com, and have had no problems yet. Though we do take particular care over disposal and resource usage. I didn't know it wasn't recommended for ASP.net apps, nice to know :)
There's an alternative to !locks for determining what thread currently owns the critical section that your focus thread is waiting on. The critical section is the first parameter passed to ntdll!RtlEnterCriticalSection, so if you dump out the parameters:
0:061> kb 20
ChildEBP RetAddr Args to Child
2d90b81c 7c822124 7c83970f 0000200c 00000000 ntdll!KiFastSystemCallRet
2d90b820 7c83970f 0000200c 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
2d90b85c 7c839620 00000000 00000002 35255d10 ntdll!RtlpWaitOnCriticalSection+0x19c
2d90b87c 03e6507b 2954eff8 2d90b954 2954eff8 ntdll!RtlEnterCriticalSection+0xa8
you can use the first parameter as the address to examine the critical section itself:
0:061> dt _RTL_CRITICAL_SECTION 2954eff8
mymodule!_RTL_CRITICAL_SECTION
+0x000 DebugInfo : 0x097fde98 _RTL_CRITICAL_SECTION_DEBUG
+0x004 LockCount : -6
+0x008 RecursionCount : 1
+0x00c OwningThread : 0x00000248
+0x010 LockSemaphore : 0x0000200c
+0x014 SpinCount : 0
The "OwningThread" field stores the thread ID of the owning thread.
I've found this approach to be quicker than using !locks if all I care about is who owns the CS.
Wow, another great and practical tutorial, thanks!
In general, if I can't find any documentation on whether a .net object needs to be disposed, by default I will call dispose when I'm finished with it (using using() when possible). So isn't "yes" the broader answer to your question. I mean are there situations where you don't want to call dispose on IDisposable object, even if you're not sure that the object uses pointers to native resources?
good point Dan, you can actually use sieextpub and run !critlist as well.
Russel, I can't think of a place where it would be negative to run dispose, so yeah I would agree with you.
When to dispose: always.
You can try to get by with only disposing when you *know* that native resources are involved, but in doing so, your client code knows something about how the object is implemented. Suppose you swap out a completely managed class that uses no native resources for one which does use native resources - if you weren't previously disposing that object, then now you have to.