If broken it is, fix it you should

Using the powers of the debugger to solve the problems of the world - and a bag of chips    by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

How does the GC work and what are the sizes of the different generations?


During our ASP.NET debugging chat there were many questions about the GC and the different generations. In this post I will try to explain the basics of how the GC works and what you should think about in relation to the GC when developing .NET applications.

First off, there is already a lot written about the .NET Garbage Collector. One of the best resources on how the GC works and how to program efficiently for it is Maoni's blog. She hasn't written anything since May 2007, but all the posts on her blog are still very relevant, since the GC hasn't changed enough to make a difference as far as .NET developers are concerned.

Maoni gave a presentation at the 2005 PDC about the GC, and unfortunately the link to it on her blog points to an invalid location, so if you are interested in looking at it I have attached it to this post. Most of what I discuss in this article is a mixture of her presentation and things I have learned along the way, and some of the pictures in the post are taken straight from her presentation.

Table of Contents

What are segments and heaps? How much is allocated for the GC?
What are generations and why do we use a generational GC?
When and how does a collection occur?
What are roots? What keeps an object alive?
What is the Large Object Heap? And why does it exist?
Which GC Flavor fits my application best?
What is the cost of a garbage collection? How can I keep this cost at a minimum?
Additional Resources

What are segments and heaps? How much is allocated for the GC?

When you first start up a .net application the GC will allocate memory to store your .net objects.

How much it allocates depends on the framework version you are using (including service packs or hotfixes), whether you are running on x64 or x86, and which GC flavor the application is using (workstation or server).

Here is an example of how the heaps and segments look for 2.0.50727.1433 (2.0 SP1), on a dual proc running ASP.NET (the server flavor of the GC).

We have two heaps (one per logical processor since we are running the server GC) and each heap initially has one “small object” segment and one large object segment.

The initial allocation here is 192 MB, because the GC reserves 64 MB for each small object segment and 32 MB for each large object segment, and there are two of each.

0:000> !eeheap -gc
Number of GC Heaps: 2
------------------------------
Heap 0 (001c3a88)
generation 0 starts at 0x0310d288
generation 1 starts at 0x030ee154
generation 2 starts at 0x03030038
ephemeral segment allocation context: none
segment   begin    allocated size                reserved
001c92f0  7a733370 7a754b98  0x00021828(137,256) 00004000
001c5428  790d8620 790f7d8c  0x0001f76c(128,876) 00004000

03030000 03030038 03115294 0x000e525c(938,588) 03d3f000
Large object heap starts at 0x0b030038
segment   begin     allocated  size                    reserved
0b030000 0b030038 0b4d5aa8 0x004a5a70(4,872,816) 01af8000
Heap Size 0x5cbc60(6,077,536)
------------------------------
Heap 1 (001c4a48)
generation 0 starts at 0x0712614c
generation 1 starts at 0x071014ac
generation 2 starts at 0x07030038
ephemeral segment allocation context: none
segment   begin     allocated  size                    reserved
07030000 07030038 07134158 0x00104120(1,065,248) 03d2f000
Large object heap starts at 0x0d030038
segment    begin    allocated  size                  reserved
0d030000 0d030038 0d0f3588 0x000c3550(800,080) 01f3c000
Heap Size 0x1c7670(1,865,328)
------------------------------
GC Heap Size 0x7932d0(7,942,864)

If you want to know how much the GC has reserved and committed for your particular version/flavor of the GC you can look at the performance counters # Total committed Bytes and # Total reserved Bytes under .NET CLR Memory.

You can also use !address to calculate your segment size from a dump. For example, this small object heap segment starting at 03030000…

03030000 03030038 03115294 0x000e525c(938,588) 03d3f000

…has 0x002c1000 bytes committed and an additional 0x03d3f000 bytes reserved, so the small object heap segment size for this version and GC flavor is 0x002c1000+0x03d3f000 bytes = 64 MB

0:000> !address 03030000
03030000 : 03030000 - 002c1000
Type     00020000 MEM_PRIVATE
Protect  00000004 PAGE_READWRITE
State   00001000 MEM_COMMIT
Usage    RegionUsageIsVAD

0:000> !address 03030000+002c1000
03030000 : 032f1000 - 03d3f000
Type     00020000 MEM_PRIVATE
State   00002000 MEM_RESERVE
Usage    RegionUsageIsVAD
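A quick way to sanity check that arithmetic (using the committed and reserved values from the !address output above):

```python
committed = 0x002C1000  # MEM_COMMIT region starting at 03030000
reserved = 0x03D3F000   # MEM_RESERVE region that follows it

total = committed + reserved
print(hex(total))                 # 0x4000000
print(total == 64 * 1024 * 1024)  # True - a 64 MB segment
```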

Since this is a number that is subject to change in any hotfix or service pack you shouldn’t rely on it, but if you are wondering how much you are allocating, that is the answer.

On x64, for example, 2.0 SP1 (2.0.50727.1433) has a segment size of 512 MB for the small object segments and 128 MB for the large object segments, so the initial allocation is a lot bigger on 64-bit, which causes generation 2 collections to occur much more seldom.

The number of heaps depends on the GC flavor: the server GC uses one heap per logical processor, while the workstation GC (whether concurrent or not) uses a single heap.

Once a segment is full a new segment is created within the same heap, so a heap can have many small object heap segments and many large object heap segments but the number of .net heaps will not change during the life of the process. The memory within the segments is committed and decommitted as needed and the segments are deleted when they are no longer needed.

The two small segments at the beginning of heap 0 are used to store string constants, and you can safely ignore them as they won't affect your application in any real sense.

What are generations and why do we use a generational GC?

In a generational GC objects are created in Gen 0 and if they are still alive when a collection happens they get promoted to Gen 1. If they are still alive when a Gen 1 collection happens they are promoted to Gen 2 etc. until they finally end up in their final resting place in the highest generation.

The idea behind a generational GC is that most objects are very temporary (locals, parameters etc.), i.e. they go out of scope while still in generation 0. If we can keep collecting just these objects, without having to go through all the memory, we save a lot of time and CPU power when cleaning up.

The longer an object has been alive, the more likely it is that the object will be around for a very long time. Think about it, most objects that survive a couple of collections are objects that are stored in cache, session scope or in other long term storage like static variables. If we know that this is the case then we don’t have to bother constantly searching through them all.

In the .NET GC there are 3 generations (0, 1, and 2), and then there are the large objects (objects over 85,000 bytes) that end up in a separate segment. LOH objects are different in the sense that even if they survive a collection they are not promoted, since they exist outside of Gen 0, 1, and 2.
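The promotion scheme can be sketched as a toy model (a deliberate simplification: real collections are driven by budgets, and this ignores the LOH entirely):

```python
def collect(gens, max_gen, alive):
    """Toy generational collection on a 3-generation heap.

    gens[0] is Gen 0, gens[2] is Gen 2. Collects Gen 0..max_gen;
    survivors are promoted one generation, Gen 2 survivors stay put."""
    for g in range(max_gen, -1, -1):   # higher generations first
        survivors = [o for o in gens[g] if o in alive]
        gens[g] = []
        dest = min(g + 1, 2)           # Gen 2 is the final resting place
        gens[dest] = survivors + gens[dest]
    return gens

gens = [["a", "b"], ["c"], ["d"]]                       # Gen 0, Gen 1, Gen 2
print(collect(gens, max_gen=1, alive={"a", "c", "d"}))  # [[], ['a'], ['c', 'd']]
```

Here "b" dies in Gen 0, "a" is promoted to Gen 1, and "c" joins "d" in Gen 2, where both will stay until a Gen 2 collection finds them unreachable.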

There are a few other benefits to a generational garbage collector because of how allocations are done.

If we look at our first heap:

Heap 0 (001c3a88)
generation 0 starts at 0x0310d288
generation 1 starts at 0x030ee154
generation 2 starts at 0x03030038
ephemeral segment allocation context: none
segment begin allocated size reserved

03030000 03030038 03115294 0x000e525c(938,588) 03d3f000

The small object segment would look like this, where the green part is Gen 2, blue is Gen 1 and orange is Gen 0:

 

When a new object is allocated it will be allocated right after the last object on the heap (in gen 0) at address 0x03115294 and it will continue like that, growing until Gen 0 has reached its budget at which point a garbage collection will occur.

Since objects are allocated sequentially in the segment, the cost of allocation is extremely small. It consists of taking a cheap lock (on single proc), moving a pointer forward, clearing the memory for the new object, and registering the new object with the finalize queue if it has a finalizer/destructor. The fact that objects are allocated sequentially also gives a few other benefits, such as locality of time and reference, which means that objects allocated in the same method at the same time are stored close together. Since they are allocated at the same time and place they are likely to be used together, and accessing them will be very quick.

Generation 1 and 0 live in something called the ephemeral segment (the first small object segment in each heap) and the size of Gen 1 and Gen 0 can never exceed the size of a segment. If a new segment is created that will become the new ephemeral segment. Gen 2 on the other hand can grow indefinitely (or until you run out of memory) so if you have high memory consumption a large amount of your objects will live in Gen 2.

The budgets for generations 1 and 0 vary over time based on the allocation pattern of the process and how much is actually collected during each collection. You can see the current budget for Gen 0 by looking at the .NET CLR Memory\Gen 0 heap size counter.

When and how does a collection occur?

A collection occurs:

  • When you allocate a new object and the generation 0 budget is reached, i.e. when the new object would cause it to go over budget
  • When someone calls GC.Collect (an induced GC)
  • When there is memory pressure

Contrary to popular belief, collections don't happen at set time intervals, so if your application is not allocating any memory and is not calling GC.Collect, no collections will occur.
It is also important to understand that collections of the higher generations occur only once their budgets are reached, and in 64-bit processes Gen 2 collections occur very seldom, which means that a lot of memory may be sitting around even though it is not in use, just because it made its way into Gen 2.

If you have a process (64- or 32-bit) that does not use a lot of .NET objects but does use a lot of native resources like threads, connections etc., and you are not cleaning those up properly, you may run out of native resources and handles because the wrapping objects have not been collected yet. It is therefore absolutely crucial that you dispose/close all objects that hold native resources as soon as you are finished using them.
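In C# the idiomatic way to guarantee this is a using block, which calls Dispose deterministically when the block exits (a sketch; the connection string is a placeholder):

```csharp
using System.Data.SqlClient;

class Example
{
    static void Query()
    {
        // Dispose() runs when the block exits - even if an exception is
        // thrown - so the native connection is released immediately instead
        // of waiting for a finalizer after some future garbage collection.
        using (var connection = new SqlConnection("Server=...;Database=...;"))
        {
            connection.Open();
            // ... use the connection ...
        }
    }
}
```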

A garbage collection simplified goes through the following sequence

1. Suspend all threads that are making .net calls (i.e. could be allocating objects or modifying the objects on the heap). Threads making native calls are suspended on their return to managed code.

2. Determine which objects in the current generation can be garbage collected. This is done by asking the JIT, the EE stack walker, the handle table and the finalize queue which objects are still accessible/in use. See Maoni's post "I am a happy janitor" for more info on this.

3. Delete all marked objects or add the empty spaces to a free list if the heap is not compacted.

4. Compact/move the surviving objects to the bottom of the heap (this is the most expensive part)

5. Resume all threads
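The sequence above can be sketched as a toy mark-and-compact pass (a simplified illustration, not the actual CLR algorithm: one flat heap, no generations, no pinning):

```python
def gc_pass(heap, roots, refs):
    """heap: object ids in address order; roots: ids reachable from
    stacks/statics/handles; refs: id -> list of ids it references."""
    # Steps 1-2. Mark: walk everything reachable from the roots.
    reachable, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in reachable:
            reachable.add(obj)
            stack.extend(refs.get(obj, []))
    # Steps 3-4. Sweep + compact: drop the garbage and slide the survivors
    # together at the start of the heap, preserving address order.
    return [obj for obj in heap if obj in reachable]

heap = ["a", "b", "c", "d", "e"]
print(gc_pass(heap, roots={"a", "e"}, refs={"a": ["c"]}))  # ['a', 'c', 'e']
```

"b" and "d" are unreachable and disappear; "c" survives only because "a" (a root) references it indirectly.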

Here is a collection, step by step (with pictures from Maoni's presentation):

Allocate new objects at the end of the heap in Gen 0

Determine which objects are still accessible

 

Sweep the garbage and add the free blocks to the free-list to store new objects there if non-compacting

Compact the heap

Move the start of generation 0 to the end of the objects that survived. The survivors are now in generation 1

New objects are allocated in generation 0

A Gen 1 collection can't occur without a Gen 0 collection, so any time a Gen 1 collection occurs it is really a Gen 1 + Gen 0 collection, and the same of course goes for Gen 2.

What are roots? What keeps an object alive?

If your object is rooted, that means that a root holds a reference (directly or indirectly) to your object. A root is either on a stack as a parameter or local, a static variable, or an entry on the finalizer queue, meaning that the object needs to be finalized before it can be released. See this post for a discussion of the different types of roots and what they mean.

An object is also considered alive if it is referenced by an object in an older generation, until that older object is collected of course.

What is the Large Object Heap? And why does it exist?

The large object heap is a special segment (or multiple segments) in a heap, specifically meant for objects over 85,000 bytes. As I have mentioned many times before in my posts, this 85,000 bytes refers to the size of the object itself, not the size of the object and all its children.

The example I always use is a large dataset. The dataset itself is merely a collection of a few links to different arrays, so the dataset object will never grow independently of the number of rows or columns it has, it will consistently stay at 80 bytes or 120 bytes etc. (different in different framework versions). In other words the dataset will never make it to the large object heap.

The objects that will be stored on the large object heap are usually strings and arrays of different kinds since a string is stored in one contiguous chunk rather than a linked list of the different characters. Same thing with an array, but again, here it is important to understand that it is just the size/length of the array that determines if it is a large object or not, not the total size of the objects that it references.

When you create a large object, for example a large string, it immediately goes on the large object heap segment so it is never even allocated in gen 0. As mentioned before the large object heap segment is not generational, if an object in the LOH is alive during a collection, it simply stays on the LOH.

The reasoning behind having a special heap for large objects is that it is very expensive to move them around, and for arrays in particular it is very expensive to update all the references. Therefore the LOH is not compacted; instead, any space left between objects when a garbage collection occurs is put on a free-list, so that a new object can be allocated in that free space. If multiple collections leave two or more free spaces next to each other, these are coalesced into one larger free space.
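The free-list bookkeeping can be sketched like this (a toy model; real LOH free-list entries track addresses and sizes inside the segment):

```python
def coalesce(free_spaces):
    """Merge contiguous (start, size) free ranges into larger ones, the way
    adjacent gaps on the LOH are combined after multiple collections."""
    merged = []
    for start, size in sorted(free_spaces):
        if merged and merged[-1][0] + merged[-1][1] == start:
            # This gap starts exactly where the previous one ends: extend it.
            merged[-1] = (merged[-1][0], merged[-1][1] + size)
        else:
            merged.append((start, size))
    return merged

print(coalesce([(100, 50), (150, 30), (300, 20)]))  # [(100, 80), (300, 20)]
```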

The large object heap is collected when a gen 2 collection occurs.

Which GC Flavor fits my application best?

At present there are three different versions / flavors of the GC, each optimized for different types of applications.

Server GC

The server GC is optimized for high throughput and high scalability in server applications where there is a consistent load and requests are allocating and deallocating memory at a high rate.
The server GC uses one heap and one dedicated GC thread per processor and tries to keep the heaps as balanced as possible. At the time of a garbage collection, the GC threads work on their respective heaps and rendezvous at certain points. Since they all work on their own heaps, minimal locking is needed, which makes the server GC very efficient in this type of situation.

The Server GC is only available on multi processor machines. If you try to set the server GC on a uni proc machine you will instead get the workstation version with non concurrent GC.

This flavor is what ASP.NET uses by default on multiproc machines, as well as a number of other server applications. If you want to use the server GC in a windows service you can do so by setting

<configuration>
 <runtime>
   <gcServer enabled="true" />
 </runtime>
</configuration>

in the application's config file.

Workstation GC – Concurrent

This is the default setting for win forms applications and windows services.

The workstation GC is optimized for interactive applications that can't afford to have the application paused, even for relatively short periods, since pausing the threads would cause flicker in the user interface or make the application feel unresponsive to button clicks etc.

This is done by trading CPU and memory usage for shorter pause time when doing generation 2 collections.

Workstation GC – Non Concurrent

The non-concurrent workstation GC mimics the server GC, except that collections are done on the thread that triggers the GC. This mode is recommended for server-type applications running on a single-proc box.

You can turn concurrency off in the application config file

<configuration>
 <runtime>
   <gcConcurrent enabled="false" />
 </runtime>
</configuration>

 

Concurrent Workstation GC
  Design goal: balance throughput and responsiveness for client apps with a UI
  Number of heaps: 1
  GC threads: the thread that performs the allocation that triggers the GC
  EE suspension: the EE is suspended several times during a GC, but for much shorter periods
  Config setting: <gcConcurrent enabled="true">

Non-Concurrent Workstation GC
  Design goal: maximize throughput on single-proc machines
  Number of heaps: 1
  GC threads: the thread that performs the allocation that triggers the GC
  EE suspension: the EE is suspended for the duration of the GC
  Config setting: <gcConcurrent enabled="false">

Server GC
  Design goal: maximize throughput on MP machines for server apps that create multiple threads to handle the same types of requests
  Number of heaps: 1 per processor (hyperthreading aware)
  GC threads: 1 dedicated GC thread per processor
  EE suspension: the EE is suspended for the duration of the GC
  Config setting: <gcServer enabled="true">

On a single-proc machine the non-concurrent workstation GC is used regardless of the server setting.

 

What is the cost of a garbage collection? How can I keep this cost at a minimum?

You can measure the GC cost for your application with a few different counters. Remember that all of these counters are updated at the end of a collection which means that if you use averages they may not be valid after a long time of inactivity.

.NET CLR Memory\% time in GC - This counter measures the amount of CPU time you spend in GC and it is calculated as (CPU time for GC/CPU time since last GC)

.NET CLR Memory\# Induced GC – This is the number of garbage collections that have occurred as a result of someone calling GC.Collect(). Ideally this should be 0 since inducing full collections means that you spend more time in the GC, and also because the GC continuously adapts itself to the allocation patterns in the application, and performing manual GCs skews this optimization.

.NET CLR Memory\# Gen X collections – This counter displays the number of collections that have been performed for a given generation. Since the cost of a Gen 2 collection is high compared to Gen 1 and Gen 0, you want as few Gen 2 collections relative to Gen 1 and Gen 0 collections as possible. A ratio of 1:10:100 (Gen 2 : Gen 1 : Gen 0) is pretty good.
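For example, given sampled counter values (hypothetical numbers), the ratio works out as:

```python
# Hypothetical samples of the # Gen 0/1/2 Collections counters.
gen0, gen1, gen2 = 12400, 1180, 96

# Normalize to a Gen2 : Gen1 : Gen0 ratio - close to the 1:10:100 rule of thumb.
print(f"1 : {gen1 / gen2:.0f} : {gen0 / gen2:.0f}")  # 1 : 12 : 129
```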

The most common causes for high CPU in GC or a high number of Gen 2 collections compared to 1 and 0 is high allocation of large objects and letting objects survive multiple generations because of improper use of finalizers or because finalizable objects are not disposed of correctly in the application.

Additional Resources:

Maoni's blog

Generational GC - A post-it analogy

.NET Hang Case Study: The GC-Loader Lock Deadlock (a story of mixed mode dlls)

ASP.NET Case Study: High CPU in GC - Large objects and high allocation rates

ASP.NET Case Study: Bad perf, high memory usage and high CPU in GC - Death By ViewState

.Net memory leak: Unblock my finalizer

.NET Memory: My object is not rooted, why wasn't it garbage collected?

Who is this OutOfMemory guy and why does he make my process crash when I have plenty of memory left?

.NET Garbage Collector PopQuiz - Followup

Attachment: FUN421_Stephens.ppt

  • Great article! I seem to remember a while back that the CLR would be sensitive to which OS edition it was running on. For instance, if it was running on a "workstation" OS (e.g. Windows XP) it would default to the Workstation-Concurrent GC, but if it was running on a "server" OS (e.g. Windows 2003 Server) it would default to the Server-GC.

    Is this still true under .NET 2.0 SP1?

    Thanks!

  • That has never been true,  the choice for when ws was used vs. server has always been the same except for that for 1.1 only asp.net (or perhaps also com+) used the server version by default.  Winforms and windows services have always defaulted to workstation unless they hosted the CLR and set it to server.

    The default setting is a choice of the hosting environment (eg. ASP.NET, BizTalk, SQL Server, or your own app if you are hosting the CLR).  

    The option to change which flavor to use in a config file came with a hotfix for 1.1 and it was introduced from the start in 2.0.


  • What a good explanation,

    Thank you.

  • Thanks for the info. I am always curious about "ephemeral segment".  From your statment "Generation 1 and 0 live in something called the ephemeral segment (the first small object segment in each heap) and the size of Gen 1 and Gen 0 can never exceed the size of a segment",

    1) Can we say that sizeof(gen1) + sizeof(gen0) <= segment size?

    2) Is the reason behind this locality, so that it is faster to access the location?

    Thanks

  • Hi NativeCpp,

    We can say that gen1+gen0 is <= segment size, but remember that there is one ephemeral segment per heap so if this is on a multiproc machine, that would mean that the total size of gen1+gen0 <= segment size*#Cpu.

    You can see the actual sizes at any given time in perfmon under the .net clr memory counters, just remember that the gen0 size listed there is the budget rather than the actual size since those counters are only updated after a GC and at that point gen 0 is always empty.

    Regarding whether the reason is locality, I don't honestly know.  What I can say is that the general purpose with a generational GC is to speed up access and garbage collections, for example Gen 0 typically fits in the L2 cache of the processor which is excellent for speedy access since you don't even have to hit RAM to get to the data you use frequently and have created recently.  

    Since Gen0 and Gen1 fit in the same segment, that means that when you page in data to RAM you will get a lot of adjacent objects, i.e. objects that are close to each other in terms of when/where they were allocated, and that of course in turn means that access is speedy (because of the locality). So yes, I think you can say that, but again don't take my word for it as I don't know the exact reasons behind it; I'm just extrapolating from the knowledge and experience I have.

  • I am having a rough time getting all RCWs collected. Especially RCWs created when a COM app fires an event to a .NET component when the component does not explicitly add an event handler for each specific event. Apparently behind the scene the .NET framework still wraps any com object passed to such an (unhandled) event with an RCW that in my view, should be immediately available for GC.

    When calling GC.Collect(GC.MaxGenerations), is one guaranteed that all objects that are candidates for GC get collected? If not, how do I force that to occur?

    I have tried to force all objects to be collected by collecting all gens, waiting for pending finalizers and then collecting all gens again (seems to be my best approach so far).

    From my experience the answer is no so I am having to add event handlers for any event that is fired that has a COM (RCW) object as an arg. In the event, I have one line of code to call ReleaseComObject on the arg (or args) that is a com object.

    Failure to do this is resulting in access violations in the com server when the server shuts down. These occur in mscorwrks as the RCWs are cleaned up. During shutdown, even though the com server DLLs can already be unloaded, GC is running when the .NET DLLs unload and an AV occurs whenever one of those RCWs try to release a COM object whose DLL is already unloaded.

    Which brings me to another question. Can I prevent GC when the com server terminates?

  • Hi RDH,

    To force a collection of everything you need to do a collect, WaitForPendingFinalizers, collect, so that any objects that have finalizers are also collected.

    I would strongly advise against this in production code though, unless you have some very special situation where it makes sense to do so, as this skews the optimization of the GC and causes higher CPU in GC than necessary.

    There is currently no way to prevent a GC, and you really wouldn't want to, as this would cause other objects not to be collected either.  The way to keep the object off the finalizer queue is, as you mention, to call ReleaseComObject when the object is no longer used.

  • Hi Tess

    I've a question as regards deciphering the boundaries for a particular GC segment.

    For eg. given the details below

    --------------------------------------------------------------

    02240000 - 00012000

     Type     00020000 MEM_PRIVATE

     Protect  00000004 PAGE_READWRITE

     State    00001000 MEM_COMMIT

     Usage    RegionUsageIsVAD

    02252000 - 00fee000

     Type     00020000 MEM_PRIVATE

     Protect  00000000

     State    00002000 MEM_RESERVE

     Usage    RegionUsageIsVAD

    -------------------------------------------------------------

    how do we know that the address regions starting at 02240000 & 02252000 addresses belong to the SAME GC segment?

    Because !eeheap -gc tells me only about the committed bytes in a segment and does not talk about the reserved bytes for the segment? (though !eeheap -gc in your post shows reserved bytes, it does not on my machine)

    Please clarify if I'm missing something here

    Many thanks for enlightening me about GC

  • Hi Tess,

    We have a high initial memory load on a 8 processors server with hyperthreading (which means we have 16 GC heaps).

    We expected that load to decrease when disabling the hyperthreading but it does not.

    Could you explain why?

    Thanks,

    Michal

  • michalka,

    I think, but don't quote me on this, that there is a cut off point at 8 processors, where if you have more than 8 processors the heap sizes will be cut in half to lower the initial vmem usage on systems with a high number of processors.

    You can try getting a dump both with hyper threading on and off and see if it changes from 8 to 16 heaps.

    if you want to know how much is reserved for a specific segment you can look at the results of !eeheap -gc

    segment    begin allocated     size            reserved

    0ed00000 0ed00038  0ee6b5cc 0x0016b594(1,488,276) 01b0f000

    In the above case for example, there is a total of

    1488276+28372992 = 29861268 bytes reserved for that segment

    0:000> ?0x0016b594

    Evaluate expression: 1488276 = 0016b594

    0:000> ?0x01b0f000

    Evaluate expression: 28372992 = 01b0f000

    0:000> ?0x0016b594+01b0f000

    Evaluate expression: 29861268 = 01c7a594

  • Hi Tess,

    Your theory sounds reasonable, as we do see a decrease in memory load when disabling hyperthreading on a 4 processor machine.

    In this case (8 processor machine with hyperthreading), we only have a memory dump with hyperthreading on (it is a customer environment so we do not have full access to it) and we do see 16 GC heap.

    I calculated their sizes by the way you show in this post (using the !address command):

    16MB for small segment and 8MB for large object heap.

    The .NET framework version is 2.0.50727.832

    Is it half a size?

    Do you know where can we confirm that theory?

    Thank you!

    Michal
