If broken it is, fix it you should

Using the powers of the debugger to solve the problems of the world - and a bag of chips    by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

ASP.NET Case Study: High CPU in GC - Large objects and high allocation rates

ASP.NET Case Study: High CPU in GC - Large objects and high allocation rates

Rate This
  • Comments 24

Rather than spending a lot of time on explaining the details of the garbage collector, I'll refer you to Maoni's blog for some very interesting reading, but in
this case study I want to show you how to get to the bottom of a problem with high CPU in the garbage collector.

Problem description:

At times the application will be very slow to respond and the CPU usage is very high (70-80%).

Troubleshooting steps:

I have already tipped you off in the title that the issue will be High CPU in GC, but how do we even get to that conclusion?  The answer is performance monitor...

A high CPU issue is generally one of 3 things

·         An infinite loop

·         Too much load (i.e. too many requests doing lots of small things, so no one single thing is a culprit)

·         Too much churning in the Garbage Collector.

The first one is pretty easy to spot in a memory dump if you get it at the right time, and is usually caused by some kind of logic problem. The second type is really tough and often requires adding hardware and new machines along with some serious profiling to optimize the code that is running on each request.  For the purposes of this post, I'll skip both of those.

To find out if your performance issue might be GC related, your friend is the .NET CLR Memory performance counters

Let's first take a look at one of the most important performance counters used for these types of issues, .net CLR Memory / % Time in GC.

Paraphrasing from Maoni's blog and my favorite white paper on ASP.NET performance counters, the % Time in GC counter shows how much of the CPU time spent since the last GC was spent on the current GC.  In other words, if we have spent 1 million cycles since the last GC, and we spent 0.3 million cycles in the current GC, this number will be 30%.  

That means for short periods of times you can see huge spikes, but it is the average that is important for this counter.

In different articles people will declare the optimal average value to be 5% or 30% or 20%, so what is it?  Short answer: there is really no optimal value, or rather, the optimal value would be close to 0, but reducing this value is very hard, so I wouldn't even bother if you don't have an average value above 30% or so. There is simply too much pain and too little gain.

The common cause for most high CPU in GC issues is a very high allocation rate (.net CLR Memory / allocated bytes/sec) but really, if you spend all your GC time doing Gen 0 collections that is not going to cause a high CPU in GC since Gen 0 collections are very inexpensive. So basically what hurts you is a lot of Gen 2 collections, i.e. a lot of .net objects making it into Generation 2 and then getting released. The other big thing that causes a high number of Gen 2 collections is excessive use of the Large Object Heap (LOH).  The reason for this is because every time we fill up a LOH segment and need to create a new one to satisfy a request we need to garbage collect, and a large object heap collection triggers a Gen 2 collection, which in turn triggers a Gen 1 and a Gen 0 collection.

This brings us to the third group of counters that I usually look at (.net CLR Memory / # Gen 0 Collections, # Gen 1 Collections and # Gen 2 Collections).  Again, there is no hard and fast rule about what the rates should be between these. Some recommend a 100/10/1 ratio, i.e. for each Gen 2 garbage collection you should have about 10 Gen 1 collections and 100 Gen 0 collections if you have a good allocation pattern.  I think that is a good rule of thumb but deviances from this are common and a fairly small deviance wouldn't concern me too much, however if have the same value for Gen 0, Gen 1 and Gen 2 collections you can start worrying a little bit:)

Oh, while I remember, another thing (apart from the blocked Finalizer that I posted about earlier) that can cause a large number of Gen 2 collections is of course the human element, in other words if you call GC.Collect(3) or GC.Collect(2) manually in your application on a regular and frequent bases, or if you call GC.GetTotalMemory(true).  To see if this can be an issue, watch the (.net CLR Memory / # Induced GC). Ideally this one should be 0, and it is in most cases, but if you are closing in on the memory limit you might see this one go up since ASP.NET itself will induce a GC at that point, but only 1 or so...

 

For this specific issue I have the following values:

% Time in GC                        ~40 % average
allocated bytes / sec              400 MB average
# Induced GC                        0
# Gen 0 Collections                28.379
# Gen 1 Collections                28.378
# Gen 2 Collections                28.378

WOW! Now that is a massive amount of allocated bytes / sec :) but I definitely know that I am not allocating anything that huge in my application, especially not every second.

The other interesting point here is that our Gen 0, Gen 1 and Gen 2 are close to identical, which is a sure clue that we are either stressing the LOH pretty hard or we have a lot of objects that make it into the 2nd generation and then die immediately.  My personal vote goes to the LOH.

In a perfect world we know exactly what page or small set of pages are responsible for the high allocation rate and we can use the CLR profiler (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndotnet/html/highperfmanagedapps.asp) to catalogue the allocations and figure out a way to reduce them. Oh, and did I mention, of course in a perfect world we can repro the problem on a test environment.   But… need I say it? We don’t live in a perfect world…

However if you can repro the issue in test, I would highly recommend using the CLR profiler to look at your allocation pattern.


In this particular case, we have no clue what is causing the problem so we will resort to my favourite tool windbg
J

Debugging the problem:

GC problems are pretty hard to debug for a couple of reasons.

1. If you catch a -hang dump and it happens to be in the middle of a GC, that is no absolute indication that you have a high CPU in GC problem.  In a process with fairly high load and a lot of allocations, GC's happen frequently so it is common to catch it in a GC.
2. Even if you know you have a GC problem (based on performance counters), and you catch the process in a GC, it is hard to tell what actually caused it.

The best way to catch the offender with the hands in the cookie jar, is to take multiple dumps on different occasions when the CPU is high and if they all look pretty much the same, there is a good possibility that you found your CPU thief.

Step 1 - Determining if we are in a GC:

As I've mentioned before we have two different versions of the GC (the server version and the workstation version).  In the server version we have a dedicated GC thread per processor (or 2 if it is hyper threaded), in the workstation version we GC on the same thread that initiates the GC.  The dump output I am going to show you is from a dual-core machine with .net 2.0.

If we are not in a GC, we will have 2 threads (one per core) looking something like this

  18  Id: 134c.918 Suspend: 1 Teb: 7ffa8000 Unfrozen
ChildEBP RetAddr  Args to Child             
01e1fe68 7c822124 77e6baa8 00000398 00000000 ntdll!KiFastSystemCallRet
01e1fe6c 77e6baa8 00000398 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
01e1fedc 79e77fd1 00000398 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
01e1ff20 79e77f9a 00000398 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199
01e1ff70 79e77f50 ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
01e1ff80 79f3549b ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
01e1ffa8 79f6ece3 00000000 b28c067c 01e1ffec mscorwks!SVR::gc_heap::gc_thread_function+0x2e
01e1ffb8 77e66063 000e7660 00000000 00000000 mscorwks!SVR::gc_heap::gc_thread_stub+0x9b
01e1ffec 00000000 79f6ec79 000e7660 00000000 kernel32!BaseThreadStart+0x34

So this is a GC thread waiting for work...

In our dumps from the problem our GC threads look like this…

  16  Id: f28.1150 Suspend: 1 Teb: fff82000 Unfrozen
ChildEBP RetAddr  Args to Child             
0248fd28 7d4d8c46 000002dc 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15
0248fd98 79e77fd1 000002dc ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
0248fddc 79e77f9a 000002dc ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199
0248fe2c 79e77f50 ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
0248fe3c 79f35e1a ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
0248fe54 7a0d5a3b 001afbf0 0000000d 00000002 mscorwks!SVR::t_join::join+0x61
0248ff54 79f391bf 00000002 001afbf0 00000000 mscorwks!SVR::gc_heap::plan_phase+0xd78
0248ff70 79f39954 00000002 ffffffff 001afbf0 mscorwks!SVR::gc_heap::gc1+0x57
0248ff88 79f35422 00000000 00000000 001afbf0 mscorwks!SVR::gc_heap::garbage_collect+0x37c
0248ffa8 79f6ece3 00000000 914b1904 0248ffec mscorwks!SVR::gc_heap::gc_thread_function+0x68
0248ffb8 7d4e0729 001afbf0 00000000 00000000 mscorwks!SVR::gc_heap::gc_thread_stub+0x9b
0248ffec 00000000 79f6ec79 001afbf0 00000000 kernel32!BaseThreadStart+0x34

…which indicates that we are in the middle of a garbage collection.

So what triggered this particular garbage collection? To find that out we have to look for a thread that has GarbageCollectGeneration on the stack...

  21  Id: f28.1244 Suspend: 1 Teb: fff0f000 Unfrozen
ChildEBP RetAddr  Args to Child             
029eeeb4 7d4d8c46 000002f0 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15
029eef24 79e77fd1 000002f0 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
029eef68 79e77f9a 000002f0 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199
029eefb8 79e77f50 ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
029eefc8 79f3a3f3 ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17
029eefec 79f3a2f6 ffffffff 001afcf8 029ef0e4 mscorwks!SVR::gc_heap::wait_for_gc_done+0x99
029ef010 79f3a0c7 00000000 00000000 0004e210 mscorwks!SVR::GCHeap::GarbageCollectGeneration+0x267
029ef0a0 79f2d0aa 029ef0e4 0004e210 00000003 mscorwks!SVR::gc_heap::try_allocate_more_space+0x1bc
029ef0c0 79f32b14 029ef0e4 0004e210 00000003 mscorwks!SVR::gc_heap::allocate_more_space+0x2f
029ef100 7a0d5c9a 0004e20c 00000000 00000000 mscorwks!SVR::gc_heap::allocate_large_object+0x8c
029ef11c 79e74c1c 00211be8 0004e20c 00000000 mscorwks!SVR::GCHeap::Alloc+0xbc
029ef138 79e7e940 0004e20c 00000000 00000000 mscorwks!Alloc+0x60
029ef18c 79e7e9a0 791240f0 00013880 00000000 mscorwks!FastAllocatePrimitiveArray+0xbd
029ef24c 02920df3 00009c40 00002247 d768bc75 mscorwks!JIT_NewArr1+0x148
WARNING: Frame IP not in any known module. Following frames may be wrong.
029ef2a8 6881bdc6 02a60824 06a77b84 0aa3a218 0x2920df3
029ef2c0 6881bfbc 06a76df0 0aa3a218 06a7cb04 System_Web_ni+0x25bdc6
00000000 00000000 00000000 00000000 00000000 System_Web_ni+0x25bfbc

In SOS for 1.1 the threads output for this thread would have a (GC) in the exception column like this...

0:029> !threads
ThreadCount: 7
UnstartedThread: 0
BackgroundThread: 7
PendingThread: 0
DeadThread: 0
Hosted Runtime: no

                                      PreEmptive   GC Alloc           Lock
       ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception
  14    1 11b8 001aba50   1808220 Enabled  00000000:00000000 001a8868     0 Ukn (Threadpool Worker)
  18    2 11f8 001b1ef0      b220 Enabled  00000000:00000000 001a8868     0 MTA (Finalizer)
  19    3 1174 001c75e8    80a220 Enabled  00000000:00000000 001a8868     0 MTA (Threadpool Completion Port)
  20    4 133c 001ca250      1220 Enabled  00000000:00000000 001a8868     0 Ukn
  12    6 1178 001dcae8   880a220 Enabled  00000000:00000000 001a8868     0 MTA (Threadpool Completion Port)
  21    7 1244 00211ba8   180b220 Enabled  00000000:00000000 001ca990     1 MTA (GC)
  22    5  6f8 001dc450   880b220 Enabled  00000000:00000000 001a8868     0 MTA (Threadpool Completion Port)

…but that is not always true for sos.dll for 2.0, so then we have to resort to finding the thread that calls garbage collect generation...

Step 2 - Why are we garbage collecting:

In this case we can see that the GC happened because we tried to allocate a large object (allocate_large_object) and we needed more space on the LOH to do this...
now that is a pretty good indication that we hit the spot... particularly since we saw that the number of Gen 2 collections and Gen 1 collections and Gen 0 collections were pretty much the same, indicating that there is a lot of Gen 2, or LOH collections.

The other thing we can see from this stack before moving on to the managed stack is that we are trying to allocate an object of size 0x0004e20c (the first parameter to allocate_large_object) which equates to 320.012 bytes.

We can also see if we run !runaway (which gives us combined user mode CPU time for different threads) that the threads that have used up most CPU since the start of this process are the following:

0:029> !runaway
 User Mode Time
  Thread       Time
  14:11b8      0 days 0:00:55.687
  16:1150      0 days 0:00:45.500
  17:7a8       0 days 0:00:43.875
  21:1244      0 days 0:00:23.140
 
  0:ea0       0 days 0:00:00.046
  29:fc8       0 days 0:00:00.000
  28:11f0      0 days 0:00:00.000

The total up-time for the process is around 17 minutes so it has been idle for most of its lifetime.

0:021> .time
Debug session time: Tue Jun 20 09:46:04.000 2006 (GMT+2)
System Uptime: 4 days 6:36:51.031
Process Uptime: 0 days 0:16:53.000
  Kernel time: 0 days 0:00:45.000
  User time: 0 days 0:02:48.000

Threads 16 and 17 are our GC threads and they will pretty much always be on top of the list since a lot of the other threads (worker threads) will spend a lot of their time being idle waiting for requests.

Then we have thread 14 and 21 that are worker threads (based on the threads output above), and if we start with thread 14 we can see based on the stack output that this thread is currently idle…

  14  Id: f28.11b8 Suspend: 1 Teb: fff88000 Unfrozen
ChildEBP RetAddr  Args to Child             
020afc8c 7d4d8c46 00000224 00000000 020afcd0 ntdll!ZwWaitForSingleObject+0x15
020afcfc 79e77fd1 00000224 00009c40 00000000 kernel32!WaitForSingleObjectEx+0xac
020afd40 79e77f9a 00000224 00009c40 00000000 mscorwks!PEImage::LoadImage+0x199
020afd90 79e77f50 00009c40 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117
020afda0 79f5b69c 00009c40 00000000 00000000 mscorwks!CLREvent::Wait+0x17
020afe20 7a1121c3 001862f8 00009c40 00000000 mscorwks!ThreadpoolMgr::SafeWait+0x73
020afe94 79f71123 00000000 00000000 00000000 mscorwks!ThreadpoolMgr::WorkerThreadStart+0xf1
020affb8 7d4e0729 0019cab8 00000000 00000000 mscorwks!ThreadpoolMgr::intermediateThreadProc+0x49
020affec 00000000 79f710dd 0019cab8 00000000 kernel32!BaseThreadStart+0x34

...so it is not doing any work now, but some time during the 17 minutes that the process has been up it has spent 56 seconds using the CPU, dispersed on one or more calls (we really can't tell).

The other thread (thread 21) is the thread that was triggering the GC, and again, we can't tell if the 23 seconds of CPU usage is only on the current request or over multiple requests, so we shouldn't follow this output blindly, but it can often give a good indication of where a high CPU problem lies.

If we return to the "why we are doing a garbage collection", we can look at the managed callstack for thread 21 (the thread that triggered the GC).

0:021> !clrstack
OS Thread Id: 0x1244 (21)
ESP       EIP    
029ef1e8 7d61c824 [HelperMethodFrame: 029ef1e8]
029ef254 02920df3 LargeObjectHeap.GetLotsOfDatesXML(Int32)
029ef2a4 02920b20 LargeObjectHeap.Button1_Click(System.Object, System.EventArgs)
029ef2b4 6881bdc6 System.Web.UI.WebControls.Button.OnClick(System.EventArgs)
029ef2c8 6881bfbc System.Web.UI.WebControls.Button.RaisePostBackEvent(System.String)
029ef2dc 6881bf38 System.Web.UI.WebControls.Button.System.Web.UI.IPostBackEventHandler.RaisePostBackEvent(System.String)
029ef2e0 687d91e0 System.Web.UI.Page.RaisePostBackEvent(System.Web.UI.IPostBackEventHandler, System.String)
029ef2e8 687d912a System.Web.UI.Page.RaisePostBackEvent(System.Collections.Specialized.NameValueCollection)
029ef2f8 687dcbbf System.Web.UI.Page.ProcessRequestMain(Boolean, Boolean)
029ef4b0 687db521 System.Web.UI.Page.ProcessRequest(Boolean, Boolean)
029ef4e0 687db487 System.Web.UI.Page.ProcessRequest()
029ef518 687db3a7 System.Web.UI.Page.ProcessRequestWithNoAssert(System.Web.HttpContext)
029ef520 687db33a System.Web.UI.Page.ProcessRequest(System.Web.HttpContext)
029ef534 02920795 ASP.largeobjectheap_aspx.ProcessRequest(System.Web.HttpContext)
029ef538 686888df System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
029ef56c 6865a071 System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef)
029ef5ac 6865a39b System.Web.HttpApplication.ResumeSteps(System.Exception)
029ef5f4 6865912d System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object)
029ef610 6865e0bd System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest)
029ef644 6865dd72 System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest)
029ef650 6865c447 System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32)
029ef800 79f1ef33 [ContextTransitionFrame: 029ef800]
029ef850 79f1ef33 [GCFrame: 029ef850]
029ef9a8 79f1ef33 [ComMethodFrame: 029ef9a8]

So the last call on the call stack LargeObjectHeap.GetLotsOfDatesXML() is what created the large object and thus triggered the Gen 3 (large object heap) collection, which in turn triggers a Gen 2, and a Gen 1 and so on...

But... one large object won't cause tonnes of CPU in GC, so let's take a peek at what this function is doing and see if we can definitely pin it down as the bad guy... (as if the class name and function name doesn't give it away already:))

    String GetLotsOfDatesXML(int i)

    {

        String dates = "<Dates>";

        for (int j = 0; j < i; j++)

        {

            DateTime dt = DateTime.Now.Add(new TimeSpan(j, 0, 0, 0));

            dates += "<Date><Month>" + dt.Month + "</Month><Day>" + dt.Day + "</Day><DayOfWeek>" + dt.DayOfWeek + "</DayOfWeek><Date>";

            dates += "<Date><Month>" + dt.Month + "</Month><Day>" + dt.Day + "</Day><DayOfWeek>" + dt.DayOfWeek + "</DayOfWeek><Date>";

        }

        dates += "</Dates>";

        return dates;

    }

Hmm... very suspicious... depending on how big i is dates could become a very large string... we are already up to about 320000 bytes (160000 chars) based on the size of the large object allocation, which means that we have already passed through 1000 iterations give or take (if each count that the addition to the string in each iteration is around 140 chars, give or take a few characters).

In fact, if we wanted to know exactly how many iterations we are going to make, we need to find out what value the parameter i is set to.

From the callstack, this is passed in from LargeObjectHeap.Button1_Click

    protected void Button1_Click(object sender, EventArgs e)

    {

        String str = GetLotsOfDatesXML(Int32.Parse(txtNumIterations.Text));

    }

where txtNumIterations is a textbox.

Note: Please ignore the rest of the post all the way to conclusion if you are not really into drilling down to objects to find member variables:) but I thought it might be a nice exercise since it is sometimes necessary to drill very deep to find the whole truth...

To find the textbox and its value we can grab the aspx page from !dso

0:021> !dso
OS Thread Id: 0x1244 (21)
ESP/REG  Object   Name
029ef0f0 0b1a2270 System.String    <String is invalid or too large to print>
029ef1a8 0b1a2270 System.String    <String is invalid or too large to print>
029ef1bc 0b1a2270 System.String    <String is invalid or too large to print>
029ef228 06a759e4 System.String    </DayOfWeek><Date>
029ef28c 06a76df0 ASP.largeobjectheap_aspx
029ef2b4 06a77b84 System.ComponentModel.EventHandlerList
029ef2bc 06a77a84 System.Web.UI.WebControls.Button
029ef2c8 06a76df0 ASP.largeobjectheap_aspx
029ef304 06a76880 System.Web.HttpContext
...

0:021> !do 06a76df0
Name: ASP.largeobjectheap_aspx
MethodTable: 02746ccc
EEClass: 028d24cc
Size: 380(0x17c) bytes
 (C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\Temporary ASP.NET Files\debuggersamples\e7443224\5232f845\App_Web_h0ctkxwz.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790fa3e0  4001fe0        4        System.String  0 instance 02a7cba4 _id
790fa3e0  4001fe1        8        System.String  0 instance 00000000 _cachedUniqueID
68a2af44  4001fe2        c ...em.Web.UI.Control  0 instance 00000000 _parent
68a91070  4001fe3       2c         System.Int32  0 instance        5 _controlState
68a85ea0  4001fe4       10 ...m.Web.UI.StateBag  0 instance 00000000 _viewState
68a2af44  4001fe5       14 ...em.Web.UI.Control  0 instance 00000000 _namingContainer
68a273d0  4001fe6       18   System.Web.UI.Page  0 instance 06a76df0 _page
...
68a7d910  4000004      16c ...ebControls.Button  0 instance 06a77a84 Button1
68a95f40  4000005      170 ...bControls.TextBox  0 instance 06a77be4 txtNumIterations
68a2bc80  4000006      174 ...Controls.HtmlForm  0 instance 06a77688 form1

Then we can look at the textbox

0:021> !do 06a77be4
Name: System.Web.UI.WebControls.TextBox
MethodTable: 68a95f40
EEClass: 68a95ebc
Size: 80(0x50) bytes
 (C:\WINDOWS\assembly\GAC_32\System.Web\2.0.0.0__b03f5f7f11d50a3a\System.Web.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790fa3e0  4001fe0        4        System.String  0 instance 02a85a04 _id
790fa3e0  4001fe1        8        System.String  0 instance 00000000 _cachedUniqueID
68a2af44  4001fe2        c ...em.Web.UI.Control  0 instance 06a77688 _parent
68a91070  4001fe3       2c         System.Int32  0 instance        5 _controlState
68a85ea0  4001fe4       10 ...m.Web.UI.StateBag  0 instance 06a788f0 _viewState
68a2af44  4001fe5       14 ...em.Web.UI.Control  0 instance 06a76df0 _namingContainer
68a273d0  4001fe6       18   System.Web.UI.Page  0 instance 06a76df0 _page
68a92e2c  4001fe7       1c ...+OccasionalFields  0 instance 06a78974 _occasionalFields
68a2b378  4001fe8       20 ...I.TemplateControl  0 instance 00000000 _templateControl
...

And from the viewstate we can pull out the text value

0:021> !do 06a788f0
Name: System.Web.UI.StateBag
MethodTable: 68a85ea0
EEClass: 68a85e30
Size: 16(0x10) bytes
 (C:\WINDOWS\assembly\GAC_32\System.Web\2.0.0.0__b03f5f7f11d50a3a\System.Web.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
79113dfc  400235f        4 ...tions.IDictionary  0 instance 06a78900 bag
79104f64  4002360        8       System.Boolean  0 instance        1 marked
79104f64  4002361        9       System.Boolean  0 instance        0 ignoreCase

0:021> !do 06a78900
Name: System.Collections.Specialized.HybridDictionary
MethodTable: 7a747ad4
EEClass: 7a7aa890
Size: 20(0x14) bytes
 (C:\WINDOWS\assembly\GAC_MSIL\System\2.0.0.0__b77a5c561934e089\System.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
7a747bac  4001145        4 ...ed.ListDictionary  0 instance 06a78924 list
790fea70  4001146        8 ...ections.Hashtable  0 instance 00000000 hashtable
79104f64  4001147        c       System.Boolean  0 instance        0 caseInsensitive

0:021> !do 06a78924
Name: System.Collections.Specialized.ListDictionary
MethodTable: 7a747bac
EEClass: 7a7aa918
Size: 28(0x1c) bytes
 (C:\WINDOWS\assembly\GAC_MSIL\System\2.0.0.0__b77a5c561934e089\System.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
7a747c78  4001148        4 ...ry+DictionaryNode  0 instance 06a78940 head
790fed1c  4001149       10         System.Int32  0 instance        1 version
790fed1c  400114a       14         System.Int32  0 instance        1 count
791117c8  400114b        8 ...ections.IComparer  0 instance 00000000 comparer
790f9c18  400114c        c        System.Object  0 instance 00000000 _syncRoot

0:021> !do 06a78940
Name: System.Collections.Specialized.ListDictionary+DictionaryNode
MethodTable: 7a747c78
EEClass: 7a7aa9b8
Size: 20(0x14) bytes
 (C:\WINDOWS\assembly\GAC_MSIL\System\2.0.0.0__b77a5c561934e089\System.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790f9c18  4001158        4        System.Object  0 instance 06a66bd8 key
790f9c18  4001159        8        System.Object  0 instance 06a78914 value
7a747c78  400115a        c ...ry+DictionaryNode  0 instance 00000000 next

0:021> !do 06a66bd8
Name: System.String
MethodTable: 790fa3e0
EEClass: 790fa340
Size: 26(0x1a) bytes
 (C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: Text
...

So we have the Text property, now let's look at the value for it...

0:021> !do 06a78914
Name: System.Web.UI.StateItem
MethodTable: 68a131b4
EEClass: 68a13144
Size: 16(0x10) bytes
 (C:\WINDOWS\assembly\GAC_32\System.Web\2.0.0.0__b03f5f7f11d50a3a\System.Web.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790f9c18  4002362        4        System.Object  0 instance 06a78184 value
79104f64  4002363        8       System.Boolean  0 instance        1 isDirty

0:021> !do 06a78184
Name: System.String
MethodTable: 790fa3e0
EEClass: 790fa340
Size: 28(0x1c) bytes
 (C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: 40000
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790fed1c  4000096        4         System.Int32  0 instance        6 m_arrayLength
790fed1c  4000097        8         System.Int32  0 instance        5 m_stringLength
790fbefc  4000098        c          System.Char  0 instance       34 m_firstChar
790fa3e0  4000099       10        System.String  0   shared   static Empty
    >> Domain:Value  001a8868:790d6584 001ca990:790d6584 <<
79124670  400009a       14        System.Char[]  0   shared   static WhitespaceChars
    >> Domain:Value  001a8868:06a303f0 001ca990:06a34118 <<


Voila!!! the number of iterations is 40.000, now that will generate a huge string:)

Conclusion:

As I mentioned before, one huge string won't really make the GC go bananas, but... how about 560.000 give or take a couple of houndred:)...? now that just might do it:)

I think I mentioned it before, but it is worth mentioning again.  Strings are immutable, which means that a string can not be changed... so if you concatenate strings you don't really append to the string, instead you create a new bigger string.

For example, the following code...

                      String str = "Hello";
                      str += " w";
                      str
+= "o";
                      str += "r";
                      str += "l";
                      str += "d";

...will produce these strings: Hello, Hello w, Hello wo, Hello wor, Hello worl, Hello world...  you get the picture:)

(note: changed sample from str += " w" + "o" + "r" + "l" + "d"; based on Fernandos comment)

So what can you do about it?  For one, use the string builder, it pre allocates a char buffer used for the string instead of an actual string object, so that way you will just fill up the buffer instead of creating new strings.

Seoncdly, you should try to avoid code like the above, especially if there are many iterations, as the loop itself will cause pretty high CPU since all it does is CPU intensive (i.e. no I/O operations or other low CPU operations).

This is of course only one scenario for high CPU, but I hope you get the gist of how to troubleshoot them. 

1. Look at the perf counters and determine that you have a high cpu in GC problem
2. Grab a few hang dumps when CPU is high so you have something to compare with
3. Make sure that the dump is in GC
4. Look at the thread triggering the GC and see if you can identify anything in it that could be responsible for a large number of memory allocations, thus causing a large number of GC's.
5. Fix the problem:) 

Happy mid-summers day tomorrow!





  • I get enough issues relating to bad perf caused by large viewstate that I felt like it is time to dedicate

  • Excellent article!

    We are also in a similar situation. Our code utlise some wrappers to call some native C code and if we build the wrappers in C# ie in manaed code then the server hangs within a few minutes as they are too many alloc for the GC to keep up. If we build the wrappers using __pin and Marshall then the web servers are ok but %Time in GC over around 80% with a CPUI load at 75% in .NET2.0.

    The exact same app in .NET1.1 does not experience any of this! Any clues here?

    Your help would be greatly appreciated.

  • Ah, to have your understanding...  Currently our production app is experiencing almost exactly this issue.  Our GC 0,1, and 2 are all in lock step after all allocation seems to be done.  We're allocating a bunch (emphasis on bunch) of small things (around 400,000 a second) at different memory sizes but jumping to 30MB/sec at one point.  Having the most impossible time breaking down what you have here to steps we can take because you're so much smarter than me and assume I know things I don't.  I hope your enjoying your ambrosia.

  • Hi Chris,

    It's a bit hard to tell just from that description what is going on, but it does sound like one of 2 things if GC 0,1 and 2 increase by pretty much exactly the same numbers.

    1. you do have some large objects that you are perhaps not aware of

    2. someone is calling GC collect

    I would suggest that you check out the large object heap size in perfmon (under .net clr memory) and see if it goes up, if it does, look at the dump and run !dumpheap -min 85000,  perhaps all these small objects are stored in a large object array or similar that ends up on the LOH.

    The other option with the GC.Collect you can verify or disspell by checking out the number of induced gc's (also under .net clr memory).  

    Having said this, GC perf issues like this are super tricky to figure out.  I am working on one right now that I was planning on blogging on later (things have just been terribly busy lately), but my issue is caused by something really rare that i dont think applies to your situation.  It is caused by a library used in the asp.net app calling SetProcessorAffinityMask limiting the process to run on one process:) and thus causing all the gc threads to spin on that one process in turn causing major cpu usage...  a very ugly story...

  • I thought I'd share a support story with you from a very interesting case I have. My customer is running

  • PingBack from http://sanal.org/?p=319

  • Ah. Garbage Collection... how I love and hate thee. =P I think one sad thing about programming in .net

  • These are the articles (in no particular order) that I felt best showed a thorough use of the WinDbg

  • Thanks, this really helped me find one of the problems leading to high cpu usage on our server!

Page 2 of 2 (24 items) 12
Leave a Comment
  • Please add 1 and 7 and type the answer here:
  • Post