If broken it is, fix it you should

Using the powers of the debugger to solve the problems of the world - and a bag of chips    by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

.NET Garbage Collector PopQuiz - Followup


It was really exciting to see that so many people answered the .NET GC PopQuiz, especially seeing that so many had great answers. Perhaps the questions were too easy:)

The reason I posted the pop quiz in the first place is that, as opposed to Phil, who commented that none of this should really matter to the developer:), I do think that a good understanding of what happens behind the scenes is important when you are programming on top of a lot of code that you don't control, since it tells you a lot about how to design your app for best performance.  Granted, some of it might be of less importance than other things, but still...

Without further ado, here are my answers...

1. How many GC threads do we have in a .NET process running the Server version of the GC on a dual-core machine?

Two, one per processor, or rather one per logical processor, so as many of you pointed out it would have been 4 if it were hyper-threaded.  In a process running the workstation version of the GC we would have no dedicated GC threads; instead garbage collection runs on the thread initiating the GC, as there is no point in switching to a different thread for garbage collection when you only have one proc/one thread doing the GC.

Chris Lyon has a good writeup about GC modes and also an interesting post about GC latency modes coming in Orcas.

Why is this important to you? Since different GC modes are optimized for different things, your memory usage, GC latency etc. may vary a lot depending on what GC mode you are using. For example a Windows service by default gets the workstation GC, but if there is a lot of throughput (lots of short lived allocs), you are probably a lot better off running the server GC for memory usage and perf.
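
If you are not sure which GC flavor a process actually ended up with, you can ask the runtime from inside the process. The following is a minimal sketch, assuming .NET 2.0 or later; the class and method names (GcModeReport, Describe) are made up for illustration and are not part of the framework.

```csharp
using System;
using System.Runtime;

public static class GcModeReport
{
    // Describes the GC flavor the current process is running with.
    // GCSettings.IsServerGC is true only when the server GC is in use.
    public static string Describe()
    {
        string flavor = GCSettings.IsServerGC ? "server" : "workstation";
        return string.Format("{0} GC, {1} logical processor(s)",
                             flavor, Environment.ProcessorCount);
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Logging this once at startup in a Windows service or a test harness is a cheap way to confirm that the environment you measure in uses the same GC mode as the one you deploy to.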

Btw, I really enjoyed the fact that Brian used the debugger to find this out, that's the spirit:)

2. What GC mode is used in the web development server (cassini) on a quad proc machine? Why?  (you can choose from server, workstation or concurrent-workstation)

Nice catch for those of you who figured out that it was workstation GC because it is a winforms app. More specifically it is concurrent workstation meaning that most of the GC stuff will be done while other threads are executing to avoid pausing the process.  

Typically you wouldn't stress test against the web development server, but if you do any kind of memory investigation, looking at when memory is released etc. you need to know that what happens on your web dev server is probably not the same thing that will happen on your multiproc web server in terms of garbage collection.

3. How many finalizers do we have in a .NET process running the Server version of the GC on a quad proc machine?

This is probably one of the more important questions, and there were different bids on this but I think most people answered one per process which is correct.

Ok, so why is this important to you? Well, it's just a reminder that any objects you create that require finalization (i.e. that have a finalizer method or a destructor) will have to go through one single point (the finalizer thread) unless the object is disposed and GC.SuppressFinalize is called.

Here is a great long read about Dispose, Finalize and Resource Management guidelines from Joe Duffy.
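
To make the point about the single finalizer thread concrete, here is a minimal sketch of a class that cleans up in Dispose and then calls GC.SuppressFinalize so the object never has to queue up for finalization. ResourceHolder and its members are hypothetical names used only for illustration, and the actual resource release is left as a comment.

```csharp
using System;

// Hypothetical wrapper around some limited resource.
public sealed class ResourceHolder : IDisposable
{
    private bool _disposed;

    public bool IsDisposed { get { return _disposed; } }

    public void Dispose()
    {
        Cleanup();
        // Cleanup is already done, so this object no longer needs
        // a trip through the finalizer thread.
        GC.SuppressFinalize(this);
    }

    // Safety net: only runs if nobody called Dispose.
    ~ResourceHolder()
    {
        Cleanup();
    }

    private void Cleanup()
    {
        if (_disposed) return;
        // Release the underlying resource here.
        _disposed = true;
    }
}

public static class ResourceHolderDemo
{
    public static void Main()
    {
        ResourceHolder holder = new ResourceHolder();
        holder.Dispose();
        Console.WriteLine(holder.IsDisposed);
    }
}
```

If Dispose is never called, the finalizer still performs the cleanup, but only after the object has waited its turn on the one finalizer thread.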

4. When is an object garbage collected?

Now this is interesting, and there were a lot of different bids on this one, but I think Brian said it best

When a GC occurs which happens either when you allocate an object which makes gen 0 exceed its current capacity, or when GC.Collect is explicitly called and the object is no longer referenced.

A few clarifications here:

a) If we exclude GC.Collect calls, this means that a GC will only occur on allocation.  I have mentioned this before, but I think it is worth mentioning again... a classic mistake to make is to run a stress test and then come back 10 mins after the stress test and wonder why memory is not being released.  In other words, objects may well be ready to be released but no allocations are made, meaning no GCs will occur, so memory usage will stay flat.

b) There are places in the framework where GC.Collect is called.  NativeCPP mentioned that a GC would occur when there is memory pressure,  I suspect he meant when a gen limit was reached, but it will also occur in ASP.NET apps when we are closing in on the memory limit set in IIS/machine.config.  To see if the app is calling GC.Collect, you can look at .NET CLR Memory/# Induced GC.

Recently I also found another location where GC.Collect is called.  In some parts of the System.Drawing namespace GC.Collect is called to avoid having too many handles to brushes/fonts etc. in the process and getting a stale process because of this. Typically no-one should run into this since it is only likely to happen in a server app and System.Drawing is not supported in server apps according to the MSDN docs, but still it happens.

c) Regarding the object no longer being referenced, a reference in this case can be a lot of things, but in short, references are typically

strong - static/global objects, including cache or in proc sessions since they are rooted in static objects
reference counts like x.static suggested - mostly for com wrappers
stack - objects that are still alive on a thread
pinned objects - usually happens when there are native API calls, or remoting/webservices involved.
finalizer - if the object has a finalizer, the GC that finds it unreferenced will not free it right away; it will hang out on the freachable queue, waiting to be finalized, and its memory is only reclaimed by a later GC
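
Clarification a) above, that a GC is triggered by allocation, can be observed with GC.CollectionCount. The following is a minimal sketch; AllocationTrigger and its members are hypothetical names, and the number of allocations needed will vary with the GC mode and the current gen 0 budget.

```csharp
using System;

public static class AllocationTrigger
{
    // Static sink so the allocations are visible outside the loop.
    private static byte[] _sink;

    // Allocates small arrays until the gen 0 collection count changes
    // and returns how many allocations that took.
    public static int AllocateUntilGen0Collection()
    {
        int start = GC.CollectionCount(0);
        int allocations = 0;
        do
        {
            _sink = new byte[1024];
            allocations++;
        } while (GC.CollectionCount(0) == start);
        return allocations;
    }

    public static void Main()
    {
        int allocations = AllocateUntilGen0Collection();
        Console.WriteLine("Allocations before a gen 0 GC: " + allocations);
        Console.WriteLine("Gen 0/1/2 collections so far: "
            + GC.CollectionCount(0) + "/"
            + GC.CollectionCount(1) + "/"
            + GC.CollectionCount(2));
    }
}
```

Reading the same counters before and after an idle period is a quick way to check whether flat memory usage after a stress test simply means that no collections have happened.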
 

After some of the comments I realized that the wording of the question is a bit ambiguous.  What I meant with the question was basically "when does a collection occur", but I am adding in some comments from Maoni that would answer the question I actually posed "when is an object collected":)... plus a comment on what nativecpp had mentioned, just goes to show, you learn something new every day:)

"Tess, re question 4. When is an object garbage collected?

When a GC occurs which happens either when you allocate an object which makes gen 0 exceed its current capacity, or when GC.Collect is explicitly called and the object is no longer referenced.

This is a bit incorrect - the correct answer should be:

When a GC that collects the generation your object is in happens, your object, if is dead, is collected. If your object is in gen2 and we are only doing a gen0 collection, your object is not gonna be collected.

>>>NativeCPP mentioned that a GC would occur when there is memory preassure, I suspect he meant when a gen limit was reached, but it will also occur in ASP.NET apps when we are closing in on the memory limit set in IIS/machine.config.

Actually he/she is right - we do trigger GCs when the machine is under memory pressure. This is described in my first Using GC Efficiently blog entry."

5. What causes an object to move from Generation 0 to Generation 1 or to Generation 2?

If the object is still referenced during a garbage collection it will automatically move into the next generation; this includes objects referenced by the finalizer (freachable queue).

Brian mentioned that they would not be moved if they are pinned...  I don't want to make a categorical statement here since I might very well be wrong, but I don't see why pinning would make it not move into the next generation.  The term "move" here is somewhat figurative. The objects don't necessarily move, instead the generation lines move, so that at the end of each garbage collection Gen 0 will always be empty, meaning that if a pinned object was located in Gen 0, by the end of the GC it would have to be in Gen 1.
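
Promotion is easy to watch with GC.GetGeneration. The following is a minimal sketch that keeps one object referenced across two induced collections and records its generation each time; PromotionDemo and Observe are hypothetical names. On the Microsoft CLR the sequence is typically 0, 1, 2, but treat that as an observation rather than a guarantee.

```csharp
using System;

public static class PromotionDemo
{
    // Returns the generation of a still-referenced object before any GC,
    // after one induced GC and after two induced GCs.
    public static int[] Observe()
    {
        object survivor = new object();
        int[] generations = new int[3];

        generations[0] = GC.GetGeneration(survivor);
        GC.Collect();
        generations[1] = GC.GetGeneration(survivor);
        GC.Collect();
        generations[2] = GC.GetGeneration(survivor);

        // Keeps the object reachable up to this point so it is not
        // collected between the measurements.
        GC.KeepAlive(survivor);
        return generations;
    }

    public static void Main()
    {
        int[] g = Observe();
        Console.WriteLine(g[0] + " -> " + g[1] + " -> " + g[2]);
    }
}
```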

6. If you look at the GC sizes for Generation 0, 1 and 2 in perfmon, why is most of the memory in the process in Gen 2?

As Stefan and others mentioned gen 0 and 1 have fairly small sizes, and once these are reached the objects in there that are still referenced move into Gen 2.  Although the sizes are not "fixed" as Stefan suggests, but rather dynamic over the life of the process, in order to get the most value out of each GC, they are still limited, and objects will only stay there until the next Gen 0/Gen 1 GC as opposed to Gen 2 where referenced objects will stay forever.

In other words, given a limit x for Gen 0 and y for Gen 1, the rest of the .NET memory usage (for managed objects) has to be in either Gen 2 or the large object heap. No matter how good your allocation pattern is, there is no way that you can fit say 100 MB in Gen 0 and Gen 1:)   The trick is just to not let objects spill over to Gen 2 and then die immediately so that you have a lot of turnover in Gen 2.

7. How many heaps will you have at startup on a 4 proc machine running the server GC?  How many would you have if the same machine was running the workstation GC?  Will the memory used for these show up in private bytes or virtual bytes in perfmon or both?

In retrospect I should have specified this a little bit.  Some people mentioned the runtime heaps, NT heaps, loader heap etc. and I have to admit, I was just too snowed in in my own little .net object world when I wrote this question.  What I meant was, how many .NET GC heaps will you have.  Even there the question is debatable.  In an interview situation I would have said that 4 was ok, but what I really wanted the answer to be was 8.  4 small object heaps and 4 large object heaps.

Ok, so why am I such a stickler for this number of threads and number of heaps bladibladibla?  Well, a lot of people pose the question, why do I have so many virtual bytes at the startup of a .NET process and why do virtual bytes go up in chunks?  When you look at that it is important to know how much of that memory goes to these GC heaps and also to know that they will probably eventually be filled with .net objects, so a large variation of private bytes/virtual bytes at the startup of the process is not necessarily a sign of something really bad going on.

8. (Leading question:))  Is the fact that you have mscorwks.dll loaded in the process in 2.0 an indication that you are running the workstation version of the GC?

Ok, that was probably not one of the best questions:)  As pretty much all of you figured out, both workstation and server now live in one single dll called mscorwks.  You can check out !eeversion to see which one you are running and, in the server case, with how many heaps.

9. Can you manually switch GC modes for a process?  If so, how and under what circumstances?

Surprisingly, a lot of people talked about gcserver enabled=true, and then answered no to this question:)  For the correct answer, check dal's response

concurrent

<configuration>
 <runtime>
   <gcConcurrent enabled="true" />
 </runtime>
</configuration>

server

<configuration>
 <runtime>
   <gcServer enabled="true" />
 </runtime>
</configuration>

The restrictions here are

a) you can not run the server version on a single proc box, it will default to workstation

b) you can not run concurrent while also running server

c) if the runtime is hosted, the host's GC mode will override the configuration

10. Name at least 2 ways to make objects survive GC collections unnecessarily.

There are plenty of ways to do this and a lot of you had good answers on this one. To mention two... create an unnecessary finalize method, and write code that causes objects to have a mid-life crisis, for example create a function that sets up a lot of objects and then goes on to call a long running operation (database request or webservice call), which causes the objects to be rooted by the thread during the whole long running operation, giving the process a good chance to perform a GC in the meantime.

In the first case (finalizer), dispose the objects when you are done.  In the second case (mid-life crisis), set the objects to null if you are not planning on using them anymore so that the GC knows that they are ready for cleanup.
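
The mid-life crisis case can be sketched as follows. This is a minimal illustration, not a benchmark: MidLifeCrisisDemo, BuildSummary and SlowCall are hypothetical names, and SlowCall just sleeps to stand in for a database request or webservice call. Note as an assumption that in optimized builds the JIT may already treat a local as dead after its last use, so the explicit null matters most in debug builds or when the reference lives in a field.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class MidLifeCrisisDemo
{
    public static int BuildSummary()
    {
        // Set up a lot of temporary objects.
        List<byte[]> scratch = new List<byte[]>();
        for (int i = 0; i < 1000; i++)
        {
            scratch.Add(new byte[1024]);
        }

        // Keep only the small result we actually need.
        int summary = scratch.Count;

        // Drop the reference before the long running operation so a GC
        // that happens during the wait does not promote the scratch data.
        scratch = null;

        SlowCall();
        return summary;
    }

    // Stand-in for a database request or webservice call.
    private static void SlowCall()
    {
        Thread.Sleep(100);
    }

    public static void Main()
    {
        Console.WriteLine(BuildSummary());
    }
}
```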

11. Can a .NET application have a *real* memory leak?  In the C++ sense where we allocate a chunk of memory and throw away the handle/pointer to it?

Again there were a lot of good answers in the comments. Although you can't leak a .net object in the classic sense of the word, i.e. create an object and throw away the pointer, unless you are in unsafe mode, you can do plenty of things to create memory leaks.

Here are some examples:

a) leaking dynamic assemblies, like in the xmlserializer case study

b) blocking the finalizer thread - this is a bit borderline for a *real* memory leak, but it certainly causes ever increasing memory usage

c) calling native code that is leaking

Btw, there are also plenty of ways to create high and increasing memory usage in .NET apps by rooting objects without realizing that you are rooting them.  Take a look at the memory issues section of my blog for a few of the ways you can do this.

12. Why is it important to close database connections and dispose of objects? Doesn't the GC take care of that for me?  

I think pretty much all of you got this one:)  To paraphrase Arnaud, "The finalizer will eventually be called, after the object has been made available for garbage collection. Knowing that there may be quite some time until an object gets GC'ed, and that many resources are limited, you call Dispose yourself as soon as you're over with an object. It doesn't get GC'd when you call Dispose, but it releases its resources." 

And of course, you avoid dragging it through the Finalizer thread.
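
In C# the usual way to make sure Dispose is called as soon as you are done is the using statement. Here is a minimal sketch with a stand-in class; FakeConnection, OpenCount and UsingDemo are hypothetical names for illustration, where a real application would use an actual connection type.

```csharp
using System;

// Stand-in for a limited resource such as a database connection.
public sealed class FakeConnection : IDisposable
{
    // Number of connections currently open.
    public static int OpenCount;

    private bool _disposed;

    public FakeConnection()
    {
        OpenCount++;
    }

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        OpenCount--;
    }
}

public static class UsingDemo
{
    // Opens a connection, does some work and returns how many
    // connections are still open afterwards.
    public static int Run()
    {
        using (FakeConnection connection = new FakeConnection())
        {
            // Work with the connection here. Dispose runs when this
            // block is left, even if an exception is thrown.
        }
        return FakeConnection.OpenCount;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

The resource is released at the closing brace rather than whenever a later GC and the finalizer thread get around to it.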

 

Oh, btw, if you enjoy this kind of thing, and you live in the Seattle area, you may just want to check out Maoni's blog, I hear they have a job opening in the GC team, although I'm sure the interview questions there will be a bit harder than this quiz:)

 

Laters y'all

Tess





  • Probably one of the best information sources about .net garbage collection: http://blogs.msdn.com/tess/archive/2007/04/10/net-garbage-collector-popquiz-followup.aspx

  • Thanks for the much-awaited answers, Tess! That for sure was really interesting.

  • Awesome follow up Tess...gave me a few pointers I need to follow up on before my face to face w/ the PFE team on Friday. If I land the position I'll owe it all to you ;-).

  • For some reason, on the question about manually switching GC modes, I thought you meant after the process has already started up (hence my comment about doubting it but if it's even possible it would probably be in a hosting scenario).

    Thanks for the great quiz (and please keep 'em coming).  I think we all learned a lot (as we always do from your posts!)

    -Brian

  • A while ago, Tess posted a quiz about the garbage collector http://blogs.msdn.com/tess/archive/2007/04/02/net-garbage-collection-popquiz.aspx

  • My buddy Paul has a good source of info on the .NET Garbage Collection. This post just tripled...

  • I must agree with Phil.

    Most, if not just about all of these questions / answers really should not matter to the developer. They are specific to the implementation. There are Microsoft's (multiple) versions, Mono's version, and a few other minor versions out there as well. Garbage collection is done differently on each version. Threads are allocated differently as well.

    Saying that, you did specifically mention ".NET", so that would be (a) Microsoft implementation. The discussion is interesting, but it does not mean much when clients execute my applications under alternative implementations.

  • .NET Garbage Collector PopQuiz - Followup [Via: Tess ] Message oriented interoperability between WCF...

  • I find all this commenting about these questions not mattering to the developer a bit tiresome.  I read this blog precisely because I'm the kind of developer who cares about this stuff.  It's a great point that a fair number of implementation details vary depending on whether you're talking the CLR, SSCLI, Mono or any other implementation of the CLI (though it's a lousy reason to avoid understanding them!).  These kinds of issues should definitely be kept in mind (and explored) so you can keep a broader perspective when making design decisions.  I think the more a developer takes time to understand what's under the covers, the better off they'll be.  There's not enough time in one career to be the expert in every detail, but I'm gonna retire trying...

  • To chime in on what Brian said, if implementations of .Net on other platforms matters to you, then you should learn the equivalent paradigms from the MS implementation on the other ones. The GC is one of the most integral parts of .Net, and a good understanding of how it works (and more importantly, what NOT to do) can make or break an application.

    If you're a C/C++ developer, you certainly know how all the different memory allocation API's work, right? Point is that stuff like this touches every single area of an application, and therefore is integral in understanding.

  • Maybe I'm not an expert but the answer to question 12 is wrong: "The GC will eventually call Dispose on IDisposable objects".

    GC does not use IDisposable in any way. GC uses the Finalize method (also wrongly known as destructor).

    It is the disposable pattern that we implement that causes the Dispose method to be called by GC via the call to Finalize like this (C#):

    ~ClassName()

    {

     Dispose(false);

    }

  • Hi Michal,

    You are completely correct, and apologies for not catching this when I copied and pasted Arnaud's answer.  Dispose will not automatically be called, but in most cases as you mentioned the finalizer/destructor will call dispose.

  • > jayson knight

    Quote: "[...] if implementations of .Net on other platforms matters to you, then you should learn the equivalent paradigms from the MS implementation on the other ones."

    I care about how my applications run on all platforms, regardless of the implementation. Optimizing it for a particular implementation does not optimize it for all implementations.

    I agree with your point about C/C++. However, the crucial difference is that I must, at the very least, recompile and, at the most, port the code before running it on another platform. My .Net applications are compiled once and run on multiple platforms. There is a world of difference there.

    While I am interested in the inner workings, they are implementation details and I should not and can not rely on them being consistent on multiple platforms.

  • Tess, re question 4. When is an object garbage collected?

    >>>When a GC occurs which happens either when you allocate an object which makes gen 0 exceed its current capacity, or when GC.Collect is explicitly called and the object is no longer referenced.

    This is a bit incorrect - the correct answer should be:

    When a GC that collects the generation your object is in happens, your object, if is dead, is collected. If your object is in gen2 and we are only doing a gen0 collection, your object is not gonna be collected.

    >>>NativeCPP mentioned that a GC would occur when there is memory preassure,  I suspect he meant when a gen limit was reached, but it will also occur in ASP.NET apps when we are closing in on the memory limit set in IIS/machine.config.

    Actually he/she is right - we do trigger GCs when the machine is under memory pressure. This is described in my first Using GC Efficiently blog entry.
