A first-hand look from the .NET engineering teams
Garbage collection is one of the premier features of the .NET managed coding platform. As the platform has become more capable, we’re seeing developers allocate more and more large objects. Since large objects are managed differently than small objects, we’ve heard a lot of feedback requesting improvement. Today’s post is by Surupa Biswas and Maoni Stephens from the garbage collection feature team. -- Brandon
The CLR manages two different heaps for allocation, the small object heap (SOH) and the large object heap (LOH). Any allocation greater than or equal to 85,000 bytes goes on the LOH. Copying large objects has a performance penalty, so, unlike the SOH, the LOH is not compacted. Another defining characteristic is that the LOH is only collected during a generation 2 collection. Together, these design choices build in the assumption that large object allocations are infrequent.
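One way to observe the threshold from code is to ask the GC which generation a freshly allocated array belongs to. The following is a minimal sketch, not an official diagnostic: the class and method names are made up for illustration, and the array lengths are chosen well clear of 85,000 bytes because the array's own header also counts toward the object size.

```csharp
using System;

// Illustrative helper (hypothetical name, not part of the .NET Framework).
static class LohThresholdDemo
{
    // Allocates a byte array of the given length and reports its generation.
    public static int GenerationOfNewArray(int length)
    {
        byte[] buffer = new byte[length];
        return GC.GetGeneration(buffer);
    }

    static void Main()
    {
        // Well below the threshold: a new small object normally starts in generation 0.
        Console.WriteLine(GenerationOfNewArray(80000));

        // Well above the threshold: the array lives on the LOH, which the GC
        // reports as generation 2 because it is collected with generation 2.
        Console.WriteLine(GenerationOfNewArray(90000));
    }
}
```

A result of 2 for a brand-new object is a reasonable hint that it was placed on the LOH, since ordinary small objects only reach generation 2 after surviving earlier collections.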
Because the LOH is not compacted, memory management is more like a traditional allocator. The CLR keeps a free list of available blocks of memory. When allocating a large object, the runtime first looks at the free list to see if it will satisfy the allocation request. When the GC discovers adjacent objects that died, it combines the space they used into one free block which can be used for allocation. Because a lot of interaction with the free list takes place at the time of allocation, there are tradeoffs between speed and optimal placement of memory blocks.
A condition known as fragmentation can occur when nothing on the free list can be used. This can result in an out-of-memory exception despite the fact that collectively there is enough free memory. For developers who work with a lot of large objects, this error condition may be familiar. We’ve received a lot of feedback requesting a solution to LOH fragmentation.
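To make the free list and fragmentation concrete, here is a toy model in C#. It is only a sketch under simplifying assumptions (a fixed address range, first-fit placement, integer "addresses"); `ToyHeap` and its members are hypothetical names, and this is not how the CLR actually implements the LOH.

```csharp
using System;
using System.Collections.Generic;

// Toy first-fit allocator over a fixed address range (illustration only).
class ToyHeap
{
    // Free blocks as (start, size) pairs, kept sorted by start address.
    private readonly List<KeyValuePair<int, int>> free = new List<KeyValuePair<int, int>>();

    public ToyHeap(int size) { free.Add(new KeyValuePair<int, int>(0, size)); }

    // Returns the start address, or -1 if no single free block is large enough.
    public int Allocate(int size)
    {
        for (int i = 0; i < free.Count; i++)
        {
            KeyValuePair<int, int> block = free[i];
            if (block.Value < size) continue;
            if (block.Value == size) free.RemoveAt(i);
            else free[i] = new KeyValuePair<int, int>(block.Key + size, block.Value - size);
            return block.Key;
        }
        return -1;
    }

    // Puts a block back on the free list and merges it with adjacent free blocks.
    public void Free(int start, int size)
    {
        int i = 0;
        while (i < free.Count && free[i].Key < start) i++;
        free.Insert(i, new KeyValuePair<int, int>(start, size));
        if (i + 1 < free.Count && free[i].Key + free[i].Value == free[i + 1].Key)
        {
            free[i] = new KeyValuePair<int, int>(free[i].Key, free[i].Value + free[i + 1].Value);
            free.RemoveAt(i + 1);
        }
        if (i > 0 && free[i - 1].Key + free[i - 1].Value == free[i].Key)
        {
            free[i - 1] = new KeyValuePair<int, int>(free[i - 1].Key, free[i - 1].Value + free[i].Value);
            free.RemoveAt(i);
        }
    }

    public int TotalFree
    {
        get { int total = 0; foreach (KeyValuePair<int, int> b in free) total += b.Value; return total; }
    }
}

static class ToyHeapDemo
{
    static void Main()
    {
        ToyHeap heap = new ToyHeap(400);
        int a = heap.Allocate(100), b = heap.Allocate(100), c = heap.Allocate(100);
        heap.Allocate(100);                        // fourth block fills the heap

        heap.Free(a, 100);
        heap.Free(c, 100);                         // two separate 100-unit holes
        Console.WriteLine(heap.TotalFree);         // 200
        Console.WriteLine(heap.Allocate(150));     // -1: enough memory in total, but fragmented

        heap.Free(b, 100);                         // the three neighbours merge into one block
        Console.WriteLine(heap.Allocate(150));     // 0
    }
}
```

The failed 150-unit request with 200 units free is the situation described above: the space exists, but not in one contiguous piece.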
In .NET 4.5, we made two improvements to the large object heap. First, we significantly improved the way the runtime manages the free list, thereby making more effective use of fragments. Now the memory allocator will revisit the memory fragments that earlier allocations couldn’t use. Second, when in server GC mode, the runtime balances LOH allocations across the heaps. Prior to .NET 4.5, we only balanced the SOH. We’ve observed substantial improvements in some of our LOH allocation benchmarks as a result of both changes.
We’re also starting to collect telemetry about how the LOH is used. We’re tracking how often out-of-memory conditions in managed applications are due to LOH fragmentation. We’ll use this data to measure and improve memory management of real-world applications.
We still recommend some traditional techniques for getting the best performance from the LOH. Many large objects are quite similar in nature, which creates the opportunity for object pooling. Frequently, types allocated on the LOH are byte buffers that are filled by third-party libraries or devices. Rather than allocating and freeing the buffer, an object pool would let you reuse a previously allocated buffer. Since fewer allocations and collections take place on the LOH, fragmentation is less likely to occur and the program’s performance is likely to improve.
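As a rough illustration of the pooling idea, the sketch below keeps returned buffers in a thread-safe bag and hands them out again instead of allocating new ones. `LargeBufferPool`, `Rent`, and `Return` are hypothetical names for this example, not a .NET Framework type, and a production pool would likely also cap how many buffers it retains.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical fixed-size buffer pool (illustration only).
sealed class LargeBufferPool
{
    private readonly ConcurrentBag<byte[]> buffers = new ConcurrentBag<byte[]>();
    private readonly int bufferSize;

    public LargeBufferPool(int bufferSize) { this.bufferSize = bufferSize; }

    // Reuses a pooled buffer when one is available; allocates only when the pool is empty.
    public byte[] Rent()
    {
        byte[] buffer;
        return buffers.TryTake(out buffer) ? buffer : new byte[bufferSize];
    }

    // Clears the buffer and puts it back so the next caller can reuse it.
    public void Return(byte[] buffer)
    {
        if (buffer == null || buffer.Length != bufferSize)
            throw new ArgumentException("Buffer does not belong to this pool.", "buffer");
        Array.Clear(buffer, 0, buffer.Length);
        buffers.Add(buffer);
    }
}

static class LargeBufferPoolDemo
{
    static void Main()
    {
        LargeBufferPool pool = new LargeBufferPool(1024 * 1024);

        byte[] first = pool.Rent();    // pool is empty, so this allocates on the LOH
        pool.Return(first);

        byte[] second = pool.Rent();   // reuses the same array, no new LOH allocation
        Console.WriteLine(ReferenceEquals(first, second));
    }
}
```

Using a single buffer size keeps the sketch simple; callers that need varying sizes would have to round up to a few size classes, which is one reason pooling is not always easy to retrofit.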
@RobertWG While we continue to evaluate adding LOH compaction, the new allocator reduces the need for it. We’ve found that some scenarios that used to need compaction can now get by without it. It would be great to know if the new improvements in the 4.5 Developer Preview improve your scenario. If you still find that you need compaction, User Voice (visualstudio.uservoice.com/.../31481-net) is a good place to add the request.
Guys we are suffering with the platform here. LOH fragmentation is a major production issue for us. Why not allow the developer the option of compacting the LOH? There is absolutely NO REASON why you can't let us choose. The only reason I have heard is "it's slow". The alternative is "crash the application and have to reload hundreds of MBs of data from the database". Have it off by default sure, but preventing us even having the option is an absolute killer.
What if a program is open for a long time and continuously allocates and deallocates LOH objects? If you never defrag the LOH and you are randomly allocating and deallocating LOH and SOH, chances are you will ALWAYS get an OutOfMemoryException from LOH fragmentation. How is that possibly an acceptable scenario?
In the past we also faced LOH fragmentation issues. Furthermore, heavy GC load was eating up our performance (around 50%!). We provide numeric libraries for scientific computing and commonly handle large to very large arrays. In the end, the problems were solved completely by redesigning our classes and using pooling strategies. As @Walking Cat and @Jon pointed out, pooling is not always easy, but - at least in our case - it enabled us to get around the GC for large arrays completely! Our apps now run at the speed of C, without any Gen2 collections for hours, and save gigabytes upon gigabytes that would otherwise have to be reallocated from the managed heap.
I suggest carefully redesigning your memory management. It appears that the big advantage of a managed heap does not mean you no longer have to take care of your memory. Rather, one is even more strongly encouraged to implement, say, a custom pool for LOH objects than in C++. But from our point of view today, this is not a disadvantage! Resource-intensive applications written in C++ would eventually profit from pooling as well. And unlike in C++, if I 'miss' some objects the GC is still around, guarding against true memory leaks.
Still, there might be scenarios where LOH compaction could pay off. But in all the situations where we faced problems, the solution was to stop misusing the LOH and/or the GC. Preventing frequent large object allocations is most often much better than compacting. It not only prevents fragmentation but also gives <a href="ilnumerics.net/.../">high and predictable performance</a>!
Another feature that would be useful to us: an option to find out the limit between the LOH and the SOH. I know we can determine the limit with a brute-force method using GC.GetGeneration(). But since large objects and small objects MUST be handled differently (which is obvious not only from this page), the user should have a more reliable and convenient way of knowing this limit. Anyway - thanks for the great work so far! :)
I agree with @Jon. In the last 5/6 years the LOH has given us more grief than anything else I can think of in our system. We had to create a single large array that is never relinquished back to the system until we exit, and manage allocations on it ourselves. We abandoned the idea of pooling as it did not work in our situation. I don't know what it takes for you guys to provide user-controlled LOH collection and compaction, but I have to say that many new projects have to reconsider using the .NET Framework just to avoid this nightmare. These sorts of schemes reduce your ability to maximize the use of multiple cores to improve performance.
I'm with @Rama, @Jon et al. - provide us the ability to manually request an LOH compaction. We have embedded systems with long periods of idle time in which we can easily handle the time cost of an LOH compaction. However, when we're busy manipulating large data sets, we can't afford an OOM exception.
I'm just another voice requesting the ability to compact the LOH. Without this feature .NET is not ready for server-side applications, except simple web solutions where you just let the AppPool recycle when it has memory problems. It is a shame that .NET has existed for more than 10 years and this issue is still not solved.
Providing guidance to use object pooling and byte buffers pooling is not sufficient because core .NET classes also don't do that! We cannot control byte buffers allocated inside .NET and third party libraries and we are not going to write the whole platform from scratch.
At the moment I'm working on a product where LOH fragmentation causes extreme issues. If we run our product as an x86 application, our performance test with large data survives just a few minutes before we get an OOM. Using .NET 4.5 Beta added just a few more minutes, so the problem is still not solved.
I don't deny that we may have some naive architectural issues in our application and that memory could be managed better in some scenarios, but implementing object pooling for all cases is not an option. If we want to fix the problem now, we have to rewrite a huge part of the application, and even after that we are not certain that the rewrite will help us avoid OOM, because we don't have control over the whole code base (.NET, third-party libraries) and we cannot compact the LOH. Because of that it is almost certain that rewriting will also mean changing platform from .NET to Java, where this issue doesn't exist!
I don't know where LOH compaction sits in the .NET / GC feature backlog, but I know it should be at the top of the MSFT backlog, because server applications sell server OS and database licenses. If those applications are written in Java, they will most probably not run on Windows!
Sorry to hear that you see minimal improvement in your scenario using .NET 4.5 Beta. We would like to engage with you offline to better understand your test scenario and how we can improve on this scenario. Please feel free to reach out to me with more details at: Abhishek.Mondal@microsoft.com and we can take things forward from there.
Program Manager (Garbage Collector and CoreOS)
Common Language Runtime
Brandon, this is wonderful news. However, I would like to say that the GC is a real dark horse. The darkest in the whole of .NET... I really think it would make .NET much more attractive if you introduced some kind of web site or application suite or whatever, dedicated to spreading ideas, frameworks, tools, recommendations and walk-throughs for effectively measuring, monitoring and benchmarking different aspects of the GC in real-life applications.
I'll give you an example. Look what the other MS teams are doing - introducing PLINQ, TPL and await to involve the average Joe developer. Which is good (for Joe).
Now I am saying that we (the above-average Joes) need some push in certain areas. For me this is the GC.
I don't have enough time to spend a week learning and testing different approaches with, say, ETW. But I am sure for you guys it is a 20-minute job to post a nice case study - how to benchmark your app's GC latency....
So that I and others can sit down and conduct a few experimental lab tests with business-critical apps in one day. One week is a no-go. One day is brilliant.
Come on, throw us a bone :)
Thank you for your comments and you should be assured that we understand the importance of measuring GC latency (and other aspects of performance) in .NET applications. For this reason we have recently released PerfView, an ETW based tool for .NET performance analysis. You can find this tool on the Microsoft Download Center, just bing "PerfView" and it will be the first result.
Using this tool you can measure GC latency by collecting a profile of your application, and opening the "GCStats" view of that profile. This view contains a detailed list of all the collections, including the latency for each. There is also a summary of total GC latency for all of the collections in the profile. As long as you can run PerfView on the same machine you are running your application, then you should be able to use this to perform a series of quick experiments and gather the data you need.
I hope we can make this information easier to find in the future. If you want to know more, feel free to contact me personally at Daniel.J.Taylor@microsoft.com.
CLR Performance Team
Guys this is a real client relationship killer. LOH fragmentation really should not cause OOMEs at ALL. If a 32-bit application is running at 1GB of memory consumed, an OOME should not occur. Simple as that. "Reducing the probability" to me is great marketing speak but says you have not solved the problem.
We have 6 and 7 figure contracts at risk because of this platform limitation. Our clients use and report on a lot of data. We have had to say to them "scale back the amount of data you are recording because you won't be able to use so much". The question is: is Microsoft truly serious about .NET being a player in this market segment?
If we didn't have so much invested into the platform, we would seriously be considering an immediate move to an alternate platform where memory management on the managed platform is not an issue.
@Sam Thanks for sharing your concern. I'm curious if you've actually tried .NET 4.5 to see if your applications work under the conditions you've mentioned?
@Sam I am with Brandon on this one. You shouldn't let your emotions take over until you have tests. 'Reducing probability' might not be marketing speak, but it could be legal-safe speak for "we solved the problem but we were advised not to stick our neck all the way out for it" ;) Test it and come back to the community with the results. We will buy you a beer :)
@Dan thank you, I will write to you with an idea of how whatever you achieved can be put to much better use.
(one guy discovered penicillin and the other syringe to make intramuscular injections to deliver it faster)
I really don't understand how you can claim that a LOH compactor isn't needed. Yes, it's slow, probably very slow. Make it only run either upon command or when a LOH allocation would otherwise throw an out of memory exception. How many people really would prefer the program to blow up rather than simply pause during a compaction?
Of course you do what you can to avoid the problem but you don't always have a choice.
I have stumbled across this LOH nightmare with a major customer. The best MS have to say is
"We still recommend some traditional techniques"
Are they mad? They are effectively telling us not to use the LOH!
I need to be able to tell the customer the problem is fixed 100% - saying that the situation is improved is not going to cut it.
This is incredible.
I have not seen one rational reason why they can’t a) give us a tool to know when LOH fragmentation is becoming an issue and b) allow us to defrag when needed (e.g. in a maintenance mode).
Mike, I am sorry to hear that. I own the CLR GC. Would it be possible for us to take a look at the LOH problem your customer is seeing? I'd like to see if there's anything we can help with short of compacting the LOH (usually there is). If you could send some full memory dumps to email@example.com it would be great. Thanks.