A first-hand look from the .NET engineering teams
Garbage collection is one of the premier features of the .NET managed coding platform. As the platform has become more capable, we’re seeing developers allocate more and more large objects. Since large objects are managed differently than small objects, we’ve heard a lot of feedback requesting improvement. Today’s post is by Surupa Biswas and Maoni Stephens from the garbage collection feature team. -- Brandon
The CLR manages two different heaps for allocation, the small object heap (SOH) and the large object heap (LOH). Any allocation of 85,000 bytes or more goes on the LOH. Copying large objects carries a performance penalty, so the LOH, unlike the SOH, is not compacted. Another defining characteristic is that the LOH is only collected during a generation 2 collection. Together, these characteristics build in the assumption that large object allocations are infrequent.
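A quick way to see the threshold in action. This is a sketch, not part of the post above: on the desktop CLR, `GC.GetGeneration` reports freshly allocated LOH objects as generation 2, precisely because the LOH is only collected with generation 2.

```csharp
using System;

class LohThresholdDemo
{
    static void Main()
    {
        // 84,999 bytes: just under the threshold, so this lands on the SOH.
        byte[] small = new byte[84999];

        // 85,000 bytes: meets the threshold, so this lands on the LOH.
        byte[] large = new byte[85000];

        // A fresh SOH object starts in generation 0; a fresh LOH object
        // is reported as generation 2.
        Console.WriteLine(GC.GetGeneration(small)); // typically 0
        Console.WriteLine(GC.GetGeneration(large)); // typically 2
    }
}
```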
Because the LOH is not compacted, memory management is more like a traditional allocator. The CLR keeps a free list of available blocks of memory. When allocating a large object, the runtime first looks at the free list to see whether any block can satisfy the allocation request. When the GC discovers adjacent objects that have died, it combines the space they used into one free block which can be used for allocation. Because a lot of interaction with the free list takes place at the time of allocation, there are tradeoffs between speed and optimal placement of memory blocks.
A condition known as fragmentation can occur when nothing on the free list can be used. This can result in an out-of-memory exception even though, collectively, there is enough free memory. For developers who work with a lot of large objects, this error condition may be familiar. We’ve received a lot of feedback requesting a solution to LOH fragmentation.
In .NET 4.5, we made two improvements to the large object heap. First, we significantly improved the way the runtime manages the free list, thereby making more effective use of fragments. Now the memory allocator will revisit memory fragments that earlier allocations couldn’t use. Second, when in server GC mode, the runtime balances LOH allocations across the heaps. Prior to .NET 4.5, we only balanced the SOH. We’ve observed substantial improvements in some of our LOH allocation benchmarks as a result of both changes.
We’re also starting to collect telemetry about how the LOH is used. We’re tracking how often out-of-memory conditions in managed applications are due to LOH fragmentation. We’ll use this data to measure and improve memory management of real-world applications.
We still recommend some traditional techniques for getting the best performance from the LOH. Many large objects are quite similar in nature, which creates the opportunity for object pooling. Frequently, types allocated on the LOH are byte-buffers that are filled by third-party libraries or devices. Rather than allocating and freeing the buffer, an object pool lets you reuse a previously-allocated buffer. Since fewer allocations and collections take place on the LOH, fragmentation is less likely to occur and the program’s performance is likely to improve.
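A minimal pooling sketch along those lines. The `BufferPool` type below is our own illustration, not a framework class (later framework versions ship `System.Buffers.ArrayPool<T>` for the same purpose):

```csharp
using System.Collections.Concurrent;

// A deliberately simple pool for fixed-size byte buffers. Renting a
// buffer reuses a previously allocated array when one is available,
// so the LOH sees far fewer allocations and frees.
class BufferPool
{
    private readonly ConcurrentBag<byte[]> _buffers = new ConcurrentBag<byte[]>();
    private readonly int _bufferSize;

    public BufferPool(int bufferSize) { _bufferSize = bufferSize; }

    public byte[] Rent()
    {
        byte[] buffer;
        return _buffers.TryTake(out buffer) ? buffer : new byte[_bufferSize];
    }

    public void Return(byte[] buffer)
    {
        // Only accept buffers of the expected size back into the pool.
        if (buffer != null && buffer.Length == _bufferSize)
            _buffers.Add(buffer);
    }
}
```

Typical usage is rent, fill from a device or library, then return in a `finally` block so the buffer is reused rather than left for the GC.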
There is no doubt amongst the dev community in large investment banks about the problems we are facing with LOH fragmentation OOM issues. I wonder why set up telemetry when the problem is quite evident and needs immediate fixing. To be honest, this issue drives businesses to move away from the .NET stack for crucial applications.
@Mani, as this article and the recent article on GC indicate, we have invested a lot in the .NET garbage collector. We have even made improvements to the LOH. I encourage you to try .NET 4.5 to see if you notice these improvements. If you are still facing the same problems with .NET 4.5, feel free to contact us at email@example.com. We would love to work with you.
I'll complete some concrete analysis and post back.
Opening 4 modules of our product uses around 1.2GB for a particular client. If the client refreshes the modules multiple times, the total memory level stays consistent; however, out-of-memory exceptions occur and the application must be terminated and restarted. This is confirmed to have not changed in .NET 4.5 and is a great frustration for our customers.
It is very hard to say to a customer "Yes, we can spend 40 hours to improve the memory usage by 200MB, but if you use the application all day it will probably crash anyway" and get a positive reaction.
I'm sorry if this is coming off as negative and frustrated but this is the reality of our situation. I cannot see why the large object heap is never defragmented under any circumstances. It just does not make sense to me. The performance hit of even a manually triggered event will never ever be anywhere near as impactful as the damage to our reputation that this platform has caused and is causing.
To deal with LOH fragmentation issues, here are some of the workarounds we put in place:
Moved to using the /3GB switch. This has its own nightmares, as it reduces the kernel memory space.
Implemented our own memory manager. It allocates large blocks of double and float arrays (the maximum sizes we would ever use in the system). This scheme kills your ability to do any parallel work on the blocks due to the possibility of stomping, and makes it hard to put constructs in place to improve performance using multiple cores.
Every release cycle we spend 2 to 3 months spinning on memory reducing efforts so we can support all of our customer requested features.
We are moving to a 64-bit environment, which just shifts the LOH problem to a larger memory space.
Object pooling DOES NOT WORK for us, as what enters the LOH are double and float arrays, not our C# objects. Secondly, the arrays are rapidly allocated and released in less than a few seconds. These arrays are NEVER the same size; they VARY in size and there is NO predictable pattern of allocations to come up with array caching strategies.
Indirectly customers are suffering because of the LOH issues. Customers who PAY for applications DO NOT WISH to see OOM exceptions and crashes due to OOM.
What does "Now the memory allocator will revisit the memory fragments that earlier allocation couldn’t use." mean?
In what situations couldn't the memory allocator use some fragments?
P.S. I would also vote for compacting the LOH. It is just really annoying to have to manage memory ourselves; isn't that the whole point of a managed language?
@dmitri The memory allocator may not have been able to use them because they were too small to satisfy the allocation requests.
-Deon - MSFT
You both bring up good issues that I and the garbage collection team would love to learn more about. Given the frustration to your development teams, we do want to understand the situation. It will help us understand the space better as we prioritize and design improvements to the GC. Additionally, it may help us give you advice on further improvements.
Can you reach out to us via the "contact" form located at the top right of the blog? We can then follow up to schedule a phone conversation.
How about just giving us a function called "CompactLargeObjectHeapNow()" for the cases where the developer decides that the tradeoff of spending time to compact the LOH is preferable to an OutOfMemoryException?
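For readers finding this thread later: .NET 4.5.1 subsequently added an opt-in setting along exactly these lines, `GCSettings.LargeObjectHeapCompactionMode`, which compacts the LOH on the next blocking full collection. A minimal sketch:

```csharp
using System;
using System.Runtime;

class CompactLohDemo
{
    static void Main()
    {
        // Opt in to compacting the LOH once, at a point the developer
        // chooses, accepting the pause in exchange for defragmentation.
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;

        // The next blocking generation 2 collection compacts the LOH;
        // afterwards the setting resets itself to Default.
        GC.Collect();
    }
}
```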
It's quite remarkable that something as mature as the .NET platform has such a fundamental issue. What are the guys at MS doing? I mean, it's all very well to race ahead, but if you've got a weakness in the foundations, then surely you sort out the foundations before you add extra levels.
I've started work at a company here where they regularly have to recycle their app pools, and I've tracked this down to large amounts of memory being wasted due to fragmentation. Specifically: 165MB allocated, of which 90MB was free, but due to fragmentation the biggest available fragment was 20MB.
Still searching for a workaround, as I'm not convinced that object pooling will work here.
What .NET Framework version do you use? Do you still have the high fragmentation problem when running on .NET 4.5?
This post covers some of the improvements in .NET 4.5.