Rather than spending a lot of time on explaining the details of the garbage collector, I'll refer you to Maoni's blog for some very interesting reading, but in this case study I want to show you how to get to the bottom of a problem with high CPU in the garbage collector.
At times the application will be very slow to respond and the CPU usage is very high (70-80%).
I have already tipped you off in the title that the issue will be High CPU in GC, but how do we even get to that conclusion? The answer is performance monitor...
A high CPU issue is generally one of 3 things
· An infinite loop
· Too much load (i.e. too many requests doing lots of small things, so no one single thing is a culprit)
· Too much churning in the Garbage Collector.
The first one is pretty easy to spot in a memory dump if you get it at the right time, and is usually caused by some kind of logic problem. The second type is really tough and often requires adding hardware and new machines along with some serious profiling to optimize the code that is running on each request. For the purposes of this post, I'll skip both of those.
To find out if your performance issue might be GC related, your friend is the .NET CLR Memory performance counters
Let's first take a look at one of the most important performance counters used for these types of issues, .net CLR Memory / % Time in GC.
Paraphrasing from Maoni's blog and my favorite white paper on ASP.NET performance counters, the % Time in GC counter shows how much of the CPU time spent since the last GC was spent on the current GC. In other words, if we have spent 1 million cycles since the last GC, and we spent 0.3 million cycles in the current GC, this number will be 30%.
That means for short periods of times you can see huge spikes, but it is the average that is important for this counter.
In different articles people will declare the optimal average value to be 5% or 30% or 20%, so what is it? Short answer: there is really no optimal value, or rather, the optimal value would be close to 0, but reducing this value is very hard, so I wouldn't even bother if you don't have an average value above 30% or so. There is simply too much pain and too little gain.
The common cause for most high CPU in GC issues is a very high allocation rate (.net CLR Memory / allocated bytes/sec) but really, if you spend all your GC time doing Gen 0 collections that is not going to cause a high CPU in GC since Gen 0 collections are very inexpensive. So basically what hurts you is a lot of Gen 2 collections, i.e. a lot of .net objects making it into Generation 2 and then getting released. The other big thing that causes a high number of Gen 2 collections is excessive use of the Large Object Heap (LOH). The reason for this is because every time we fill up a LOH segment and need to create a new one to satisfy a request we need to garbage collect, and a large object heap collection triggers a Gen 2 collection, which in turn triggers a Gen 1 and a Gen 0 collection.
This brings us to the third group of counters that I usually look at (.net CLR Memory / # Gen 0 Collections, # Gen 1 Collections and # Gen 2 Collections). Again, there is no hard and fast rule about what the rates should be between these. Some recommend a 100/10/1 ratio, i.e. for each Gen 2 garbage collection you should have about 10 Gen 1 collections and 100 Gen 0 collections if you have a good allocation pattern. I think that is a good rule of thumb but deviances from this are common and a fairly small deviance wouldn't concern me too much, however if have the same value for Gen 0, Gen 1 and Gen 2 collections you can start worrying a little bit:)
Oh, while I remember, another thing (apart from the blocked Finalizer that I posted about earlier) that can cause a large number of Gen 2 collections is of course the human element, in other words if you call GC.Collect(3) or GC.Collect(2) manually in your application on a regular and frequent bases, or if you call GC.GetTotalMemory(true). To see if this can be an issue, watch the (.net CLR Memory / # Induced GC). Ideally this one should be 0, and it is in most cases, but if you are closing in on the memory limit you might see this one go up since ASP.NET itself will induce a GC at that point, but only 1 or so...
For this specific issue I have the following values:
% Time in GC ~40 % averageallocated bytes / sec 400 MB average# Induced GC 0# Gen 0 Collections 28.379# Gen 1 Collections 28.378# Gen 2 Collections 28.378
WOW! Now that is a massive amount of allocated bytes / sec :) but I definitely know that I am not allocating anything that huge in my application, especially not every second.
The other interesting point here is that our Gen 0, Gen 1 and Gen 2 are close to identical, which is a sure clue that we are either stressing the LOH pretty hard or we have a lot of objects that make it into the 2nd generation and then die immediately. My personal vote goes to the LOH.
In a perfect world we know exactly what page or small set of pages are responsible for the high allocation rate and we can use the CLR profiler (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndotnet/html/highperfmanagedapps.asp) to catalogue the allocations and figure out a way to reduce them. Oh, and did I mention, of course in a perfect world we can repro the problem on a test environment. But… need I say it? We don’t live in a perfect world…
However if you can repro the issue in test, I would highly recommend using the CLR profiler to look at your allocation pattern.
In this particular case, we have no clue what is causing the problem so we will resort to my favourite tool windbgJ
Debugging the problem:
GC problems are pretty hard to debug for a couple of reasons.
1. If you catch a -hang dump and it happens to be in the middle of a GC, that is no absolute indication that you have a high CPU in GC problem. In a process with fairly high load and a lot of allocations, GC's happen frequently so it is common to catch it in a GC. 2. Even if you know you have a GC problem (based on performance counters), and you catch the process in a GC, it is hard to tell what actually caused it.
The best way to catch the offender with the hands in the cookie jar, is to take multiple dumps on different occasions when the CPU is high and if they all look pretty much the same, there is a good possibility that you found your CPU thief.
Step 1 - Determining if we are in a GC:
As I've mentioned before we have two different versions of the GC (the server version and the workstation version). In the server version we have a dedicated GC thread per processor (or 2 if it is hyper threaded), in the workstation version we GC on the same thread that initiates the GC. The dump output I am going to show you is from a dual-core machine with .net 2.0.
If we are not in a GC, we will have 2 threads (one per core) looking something like this
18 Id: 134c.918 Suspend: 1 Teb: 7ffa8000 UnfrozenChildEBP RetAddr Args to Child 01e1fe68 7c822124 77e6baa8 00000398 00000000 ntdll!KiFastSystemCallRet01e1fe6c 77e6baa8 00000398 00000000 00000000 ntdll!NtWaitForSingleObject+0xc01e1fedc 79e77fd1 00000398 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac01e1ff20 79e77f9a 00000398 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x19901e1ff70 79e77f50 ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x11701e1ff80 79f3549b ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x1701e1ffa8 79f6ece3 00000000 b28c067c 01e1ffec mscorwks!SVR::gc_heap::gc_thread_function+0x2e01e1ffb8 77e66063 000e7660 00000000 00000000 mscorwks!SVR::gc_heap::gc_thread_stub+0x9b01e1ffec 00000000 79f6ec79 000e7660 00000000 kernel32!BaseThreadStart+0x34
So this is a GC thread waiting for work...
In our dumps from the problem our GC threads look like this…
16 Id: f28.1150 Suspend: 1 Teb: fff82000 UnfrozenChildEBP RetAddr Args to Child 0248fd28 7d4d8c46 000002dc 00000000 00000000 ntdll!ZwWaitForSingleObject+0x150248fd98 79e77fd1 000002dc ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac0248fddc 79e77f9a 000002dc ffffffff 00000000 mscorwks!PEImage::LoadImage+0x1990248fe2c 79e77f50 ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x1170248fe3c 79f35e1a ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x170248fe54 7a0d5a3b 001afbf0 0000000d 00000002 mscorwks!SVR::t_join::join+0x610248ff54 79f391bf 00000002 001afbf0 00000000 mscorwks!SVR::gc_heap::plan_phase+0xd780248ff70 79f39954 00000002 ffffffff 001afbf0 mscorwks!SVR::gc_heap::gc1+0x570248ff88 79f35422 00000000 00000000 001afbf0 mscorwks!SVR::gc_heap::garbage_collect+0x37c0248ffa8 79f6ece3 00000000 914b1904 0248ffec mscorwks!SVR::gc_heap::gc_thread_function+0x680248ffb8 7d4e0729 001afbf0 00000000 00000000 mscorwks!SVR::gc_heap::gc_thread_stub+0x9b0248ffec 00000000 79f6ec79 001afbf0 00000000 kernel32!BaseThreadStart+0x34
…which indicates that we are in the middle of a garbage collection.
So what triggered this particular garbage collection? To find that out we have to look for a thread that has GarbageCollectGeneration on the stack...
21 Id: f28.1244 Suspend: 1 Teb: fff0f000 UnfrozenChildEBP RetAddr Args to Child 029eeeb4 7d4d8c46 000002f0 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15029eef24 79e77fd1 000002f0 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac029eef68 79e77f9a 000002f0 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x199029eefb8 79e77f50 ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117029eefc8 79f3a3f3 ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x17029eefec 79f3a2f6 ffffffff 001afcf8 029ef0e4 mscorwks!SVR::gc_heap::wait_for_gc_done+0x99029ef010 79f3a0c7 00000000 00000000 0004e210 mscorwks!SVR::GCHeap::GarbageCollectGeneration+0x267029ef0a0 79f2d0aa 029ef0e4 0004e210 00000003 mscorwks!SVR::gc_heap::try_allocate_more_space+0x1bc029ef0c0 79f32b14 029ef0e4 0004e210 00000003 mscorwks!SVR::gc_heap::allocate_more_space+0x2f029ef100 7a0d5c9a 0004e20c 00000000 00000000 mscorwks!SVR::gc_heap::allocate_large_object+0x8c029ef11c 79e74c1c 00211be8 0004e20c 00000000 mscorwks!SVR::GCHeap::Alloc+0xbc029ef138 79e7e940 0004e20c 00000000 00000000 mscorwks!Alloc+0x60029ef18c 79e7e9a0 791240f0 00013880 00000000 mscorwks!FastAllocatePrimitiveArray+0xbd029ef24c 02920df3 00009c40 00002247 d768bc75 mscorwks!JIT_NewArr1+0x148WARNING: Frame IP not in any known module. Following frames may be wrong.029ef2a8 6881bdc6 02a60824 06a77b84 0aa3a218 0x2920df3029ef2c0 6881bfbc 06a76df0 0aa3a218 06a7cb04 System_Web_ni+0x25bdc600000000 00000000 00000000 00000000 00000000 System_Web_ni+0x25bfbc
In SOS for 1.1 the threads output for this thread would have a (GC) in the exception column like this...
0:029> !threadsThreadCount: 7UnstartedThread: 0BackgroundThread: 7PendingThread: 0DeadThread: 0Hosted Runtime: no
PreEmptive GC Alloc Lock ID OSID ThreadOBJ State GC Context Domain Count APT Exception 14 1 11b8 001aba50 1808220 Enabled 00000000:00000000 001a8868 0 Ukn (Threadpool Worker) 18 2 11f8 001b1ef0 b220 Enabled 00000000:00000000 001a8868 0 MTA (Finalizer) 19 3 1174 001c75e8 80a220 Enabled 00000000:00000000 001a8868 0 MTA (Threadpool Completion Port) 20 4 133c 001ca250 1220 Enabled 00000000:00000000 001a8868 0 Ukn 12 6 1178 001dcae8 880a220 Enabled 00000000:00000000 001a8868 0 MTA (Threadpool Completion Port) 21 7 1244 00211ba8 180b220 Enabled 00000000:00000000 001ca990 1 MTA (GC) 22 5 6f8 001dc450 880b220 Enabled 00000000:00000000 001a8868 0 MTA (Threadpool Completion Port)
…but that is not always true for sos.dll for 2.0, so then we have to resort to finding the thread that calls garbage collect generation...
Step 2 - Why are we garbage collecting:
In this case we can see that the GC happened because we tried to allocate a large object (allocate_large_object) and we needed more space on the LOH to do this... now that is a pretty good indication that we hit the spot... particularly since we saw that the number of Gen 2 collections and Gen 1 collections and Gen 0 collections were pretty much the same, indicating that there is a lot of Gen 2, or LOH collections.
The other thing we can see from this stack before moving on to the managed stack is that we are trying to allocate an object of size 0x0004e20c (the first parameter to allocate_large_object) which equates to 320.012 bytes.
We can also see if we run !runaway (which gives us combined user mode CPU time for different threads) that the threads that have used up most CPU since the start of this process are the following:
0:029> !runaway User Mode Time Thread Time 14:11b8 0 days 0:00:55.687 16:1150 0 days 0:00:45.500 17:7a8 0 days 0:00:43.875 21:1244 0 days 0:00:23.140 0:ea0 0 days 0:00:00.046 29:fc8 0 days 0:00:00.000 28:11f0 0 days 0:00:00.000
The total up-time for the process is around 17 minutes so it has been idle for most of its lifetime.
0:021> .timeDebug session time: Tue Jun 20 09:46:04.000 2006 (GMT+2)System Uptime: 4 days 6:36:51.031Process Uptime: 0 days 0:16:53.000 Kernel time: 0 days 0:00:45.000 User time: 0 days 0:02:48.000
Threads 16 and 17 are our GC threads and they will pretty much always be on top of the list since a lot of the other threads (worker threads) will spend a lot of their time being idle waiting for requests.
Then we have thread 14 and 21 that are worker threads (based on the threads output above), and if we start with thread 14 we can see based on the stack output that this thread is currently idle…
14 Id: f28.11b8 Suspend: 1 Teb: fff88000 UnfrozenChildEBP RetAddr Args to Child 020afc8c 7d4d8c46 00000224 00000000 020afcd0 ntdll!ZwWaitForSingleObject+0x15020afcfc 79e77fd1 00000224 00009c40 00000000 kernel32!WaitForSingleObjectEx+0xac020afd40 79e77f9a 00000224 00009c40 00000000 mscorwks!PEImage::LoadImage+0x199020afd90 79e77f50 00009c40 00000000 00000000 mscorwks!CLREvent::WaitEx+0x117020afda0 79f5b69c 00009c40 00000000 00000000 mscorwks!CLREvent::Wait+0x17020afe20 7a1121c3 001862f8 00009c40 00000000 mscorwks!ThreadpoolMgr::SafeWait+0x73020afe94 79f71123 00000000 00000000 00000000 mscorwks!ThreadpoolMgr::WorkerThreadStart+0xf1020affb8 7d4e0729 0019cab8 00000000 00000000 mscorwks!ThreadpoolMgr::intermediateThreadProc+0x49020affec 00000000 79f710dd 0019cab8 00000000 kernel32!BaseThreadStart+0x34
...so it is not doing any work now, but some time during the 17 minutes that the process has been up it has spent 56 seconds using the CPU, dispersed on one or more calls (we really can't tell).
The other thread (thread 21) is the thread that was triggering the GC, and again, we can't tell if the 23 seconds of CPU usage is only on the current request or over multiple requests, so we shouldn't follow this output blindly, but it can often give a good indication of where a high CPU problem lies.
If we return to the "why we are doing a garbage collection", we can look at the managed callstack for thread 21 (the thread that triggered the GC).
0:021> !clrstackOS Thread Id: 0x1244 (21)ESP EIP 029ef1e8 7d61c824 [HelperMethodFrame: 029ef1e8] 029ef254 02920df3 LargeObjectHeap.GetLotsOfDatesXML(Int32)029ef2a4 02920b20 LargeObjectHeap.Button1_Click(System.Object, System.EventArgs)029ef2b4 6881bdc6 System.Web.UI.WebControls.Button.OnClick(System.EventArgs)029ef2c8 6881bfbc System.Web.UI.WebControls.Button.RaisePostBackEvent(System.String)029ef2dc 6881bf38 System.Web.UI.WebControls.Button.System.Web.UI.IPostBackEventHandler.RaisePostBackEvent(System.String)029ef2e0 687d91e0 System.Web.UI.Page.RaisePostBackEvent(System.Web.UI.IPostBackEventHandler, System.String)029ef2e8 687d912a System.Web.UI.Page.RaisePostBackEvent(System.Collections.Specialized.NameValueCollection)029ef2f8 687dcbbf System.Web.UI.Page.ProcessRequestMain(Boolean, Boolean)029ef4b0 687db521 System.Web.UI.Page.ProcessRequest(Boolean, Boolean)029ef4e0 687db487 System.Web.UI.Page.ProcessRequest()029ef518 687db3a7 System.Web.UI.Page.ProcessRequestWithNoAssert(System.Web.HttpContext)029ef520 687db33a System.Web.UI.Page.ProcessRequest(System.Web.HttpContext)029ef534 02920795 ASP.largeobjectheap_aspx.ProcessRequest(System.Web.HttpContext)029ef538 686888df System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()029ef56c 6865a071 System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef)029ef5ac 6865a39b System.Web.HttpApplication.ResumeSteps(System.Exception)029ef5f4 6865912d System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object)029ef610 6865e0bd System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest)029ef644 6865dd72 System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest)029ef650 6865c447 System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32)029ef800 79f1ef33 [ContextTransitionFrame: 029ef800] 029ef850 79f1ef33 [GCFrame: 029ef850] 029ef9a8 79f1ef33 [ComMethodFrame: 029ef9a8]
So the last call on the call stack LargeObjectHeap.GetLotsOfDatesXML() is what created the large object and thus triggered the Gen 3 (large object heap) collection, which in turn triggers a Gen 2, and a Gen 1 and so on...
But... one large object won't cause tonnes of CPU in GC, so let's take a peek at what this function is doing and see if we can definitely pin it down as the bad guy... (as if the class name and function name doesn't give it away already:))
String GetLotsOfDatesXML(int i)
String dates = "<Dates>";
for (int j = 0; j < i; j++)
DateTime dt = DateTime.Now.Add(new TimeSpan(j, 0, 0, 0));
dates += "<Date><Month>" + dt.Month + "</Month><Day>" + dt.Day + "</Day><DayOfWeek>" + dt.DayOfWeek + "</DayOfWeek><Date>";
dates += "</Dates>";
Hmm... very suspicious... depending on how big i is dates could become a very large string... we are already up to about 320000 bytes (160000 chars) based on the size of the large object allocation, which means that we have already passed through 1000 iterations give or take (if each count that the addition to the string in each iteration is around 140 chars, give or take a few characters).
In fact, if we wanted to know exactly how many iterations we are going to make, we need to find out what value the parameter i is set to.
From the callstack, this is passed in from LargeObjectHeap.Button1_Click
protected void Button1_Click(object sender, EventArgs e)
String str = GetLotsOfDatesXML(Int32.Parse(txtNumIterations.Text));
where txtNumIterations is a textbox.
Note: Please ignore the rest of the post all the way to conclusion if you are not really into drilling down to objects to find member variables:) but I thought it might be a nice exercise since it is sometimes necessary to drill very deep to find the whole truth...
To find the textbox and its value we can grab the aspx page from !dso
0:021> !dsoOS Thread Id: 0x1244 (21)ESP/REG Object Name029ef0f0 0b1a2270 System.String <String is invalid or too large to print>029ef1a8 0b1a2270 System.String <String is invalid or too large to print>029ef1bc 0b1a2270 System.String <String is invalid or too large to print>029ef228 06a759e4 System.String </DayOfWeek><Date>029ef28c 06a76df0 ASP.largeobjectheap_aspx029ef2b4 06a77b84 System.ComponentModel.EventHandlerList029ef2bc 06a77a84 System.Web.UI.WebControls.Button029ef2c8 06a76df0 ASP.largeobjectheap_aspx029ef304 06a76880 System.Web.HttpContext...
0:021> !do 06a76df0Name: ASP.largeobjectheap_aspxMethodTable: 02746cccEEClass: 028d24ccSize: 380(0x17c) bytes (C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\Temporary ASP.NET Files\debuggersamples\e7443224\5232f845\App_Web_h0ctkxwz.dll)Fields: MT Field Offset Type VT Attr Value Name790fa3e0 4001fe0 4 System.String 0 instance 02a7cba4 _id790fa3e0 4001fe1 8 System.String 0 instance 00000000 _cachedUniqueID68a2af44 4001fe2 c ...em.Web.UI.Control 0 instance 00000000 _parent68a91070 4001fe3 2c System.Int32 0 instance 5 _controlState68a85ea0 4001fe4 10 ...m.Web.UI.StateBag 0 instance 00000000 _viewState68a2af44 4001fe5 14 ...em.Web.UI.Control 0 instance 00000000 _namingContainer68a273d0 4001fe6 18 System.Web.UI.Page 0 instance 06a76df0 _page...68a7d910 4000004 16c ...ebControls.Button 0 instance 06a77a84 Button168a95f40 4000005 170 ...bControls.TextBox 0 instance 06a77be4 txtNumIterations68a2bc80 4000006 174 ...Controls.HtmlForm 0 instance 06a77688 form1
Then we can look at the textbox
0:021> !do 06a77be4Name: System.Web.UI.WebControls.TextBoxMethodTable: 68a95f40EEClass: 68a95ebcSize: 80(0x50) bytes (C:\WINDOWS\assembly\GAC_32\System.Web\220.127.116.11__b03f5f7f11d50a3a\System.Web.dll)Fields: MT Field Offset Type VT Attr Value Name790fa3e0 4001fe0 4 System.String 0 instance 02a85a04 _id790fa3e0 4001fe1 8 System.String 0 instance 00000000 _cachedUniqueID68a2af44 4001fe2 c ...em.Web.UI.Control 0 instance 06a77688 _parent68a91070 4001fe3 2c System.Int32 0 instance 5 _controlState68a85ea0 4001fe4 10 ...m.Web.UI.StateBag 0 instance 06a788f0 _viewState68a2af44 4001fe5 14 ...em.Web.UI.Control 0 instance 06a76df0 _namingContainer68a273d0 4001fe6 18 System.Web.UI.Page 0 instance 06a76df0 _page68a92e2c 4001fe7 1c ...+OccasionalFields 0 instance 06a78974 _occasionalFields68a2b378 4001fe8 20 ...I.TemplateControl 0 instance 00000000 _templateControl...
And from the viewstate we can pull out the text value
0:021> !do 06a788f0 Name: System.Web.UI.StateBagMethodTable: 68a85ea0EEClass: 68a85e30Size: 16(0x10) bytes (C:\WINDOWS\assembly\GAC_32\System.Web\18.104.22.168__b03f5f7f11d50a3a\System.Web.dll)Fields: MT Field Offset Type VT Attr Value Name79113dfc 400235f 4 ...tions.IDictionary 0 instance 06a78900 bag79104f64 4002360 8 System.Boolean 0 instance 1 marked79104f64 4002361 9 System.Boolean 0 instance 0 ignoreCase
0:021> !do 06a78900 Name: System.Collections.Specialized.HybridDictionaryMethodTable: 7a747ad4EEClass: 7a7aa890Size: 20(0x14) bytes (C:\WINDOWS\assembly\GAC_MSIL\System\22.214.171.124__b77a5c561934e089\System.dll)Fields: MT Field Offset Type VT Attr Value Name7a747bac 4001145 4 ...ed.ListDictionary 0 instance 06a78924 list790fea70 4001146 8 ...ections.Hashtable 0 instance 00000000 hashtable79104f64 4001147 c System.Boolean 0 instance 0 caseInsensitive
0:021> !do 06a78924 Name: System.Collections.Specialized.ListDictionaryMethodTable: 7a747bacEEClass: 7a7aa918Size: 28(0x1c) bytes (C:\WINDOWS\assembly\GAC_MSIL\System\126.96.36.199__b77a5c561934e089\System.dll)Fields: MT Field Offset Type VT Attr Value Name7a747c78 4001148 4 ...ry+DictionaryNode 0 instance 06a78940 head790fed1c 4001149 10 System.Int32 0 instance 1 version790fed1c 400114a 14 System.Int32 0 instance 1 count791117c8 400114b 8 ...ections.IComparer 0 instance 00000000 comparer790f9c18 400114c c System.Object 0 instance 00000000 _syncRoot
0:021> !do 06a78940 Name: System.Collections.Specialized.ListDictionary+DictionaryNodeMethodTable: 7a747c78EEClass: 7a7aa9b8Size: 20(0x14) bytes (C:\WINDOWS\assembly\GAC_MSIL\System\188.8.131.52__b77a5c561934e089\System.dll)Fields: MT Field Offset Type VT Attr Value Name790f9c18 4001158 4 System.Object 0 instance 06a66bd8 key790f9c18 4001159 8 System.Object 0 instance 06a78914 value7a747c78 400115a c ...ry+DictionaryNode 0 instance 00000000 next
0:021> !do 06a66bd8 Name: System.StringMethodTable: 790fa3e0EEClass: 790fa340Size: 26(0x1a) bytes (C:\WINDOWS\assembly\GAC_32\mscorlib\184.108.40.206__b77a5c561934e089\mscorlib.dll)String: Text...
So we have the Text property, now let's look at the value for it...
0:021> !do 06a78914 Name: System.Web.UI.StateItemMethodTable: 68a131b4EEClass: 68a13144Size: 16(0x10) bytes (C:\WINDOWS\assembly\GAC_32\System.Web\220.127.116.11__b03f5f7f11d50a3a\System.Web.dll)Fields: MT Field Offset Type VT Attr Value Name790f9c18 4002362 4 System.Object 0 instance 06a78184 value79104f64 4002363 8 System.Boolean 0 instance 1 isDirty
0:021> !do 06a78184 Name: System.StringMethodTable: 790fa3e0EEClass: 790fa340Size: 28(0x1c) bytes (C:\WINDOWS\assembly\GAC_32\mscorlib\18.104.22.168__b77a5c561934e089\mscorlib.dll)String: 40000Fields: MT Field Offset Type VT Attr Value Name790fed1c 4000096 4 System.Int32 0 instance 6 m_arrayLength790fed1c 4000097 8 System.Int32 0 instance 5 m_stringLength790fbefc 4000098 c System.Char 0 instance 34 m_firstChar790fa3e0 4000099 10 System.String 0 shared static Empty >> Domain:Value 001a8868:790d6584 001ca990:790d6584 <<79124670 400009a 14 System.Char 0 shared static WhitespaceChars >> Domain:Value 001a8868:06a303f0 001ca990:06a34118 <<
Voila!!! the number of iterations is 40.000, now that will generate a huge string:)
As I mentioned before, one huge string won't really make the GC go bananas, but... how about 560.000 give or take a couple of houndred:)...? now that just might do it:)
I think I mentioned it before, but it is worth mentioning again. Strings are immutable, which means that a string can not be changed... so if you concatenate strings you don't really append to the string, instead you create a new bigger string.
For example, the following code...
String str = "Hello"; str += " w"; str += "o"; str += "r"; str += "l"; str += "d";
...will produce these strings: Hello, Hello w, Hello wo, Hello wor, Hello worl, Hello world... you get the picture:)
(note: changed sample from str += " w" + "o" + "r" + "l" + "d"; based on Fernandos comment)
So what can you do about it? For one, use the string builder, it pre allocates a char buffer used for the string instead of an actual string object, so that way you will just fill up the buffer instead of creating new strings.
Seoncdly, you should try to avoid code like the above, especially if there are many iterations, as the loop itself will cause pretty high CPU since all it does is CPU intensive (i.e. no I/O operations or other low CPU operations).
This is of course only one scenario for high CPU, but I hope you get the gist of how to troubleshoot them.
1. Look at the perf counters and determine that you have a high cpu in GC problem2. Grab a few hang dumps when CPU is high so you have something to compare with3. Make sure that the dump is in GC4. Look at the thread triggering the GC and see if you can identify anything in it that could be responsible for a large number of memory allocations, thus causing a large number of GC's.5. Fix the problem:)
Happy mid-summers day tomorrow!
I get enough issues relating to bad perf caused by large viewstate that I felt like it is time to dedicate
We are also in a similar situation. Our code utlise some wrappers to call some native C code and if we build the wrappers in C# ie in manaed code then the server hangs within a few minutes as they are too many alloc for the GC to keep up. If we build the wrappers using __pin and Marshall then the web servers are ok but %Time in GC over around 80% with a CPUI load at 75% in .NET2.0.
The exact same app in .NET1.1 does not experience any of this! Any clues here?
Your help would be greatly appreciated.
Ah, to have your understanding... Currently our production app is experiencing almost exactly this issue. Our GC 0,1, and 2 are all in lock step after all allocation seems to be done. We're allocating a bunch (emphasis on bunch) of small things (around 400,000 a second) at different memory sizes but jumping to 30MB/sec at one point. Having the most impossible time breaking down what you have here to steps we can take because you're so much smarter than me and assume I know things I don't. I hope your enjoying your ambrosia.
It's a bit hard to tell just from that description what is going on, but it does sound like one of 2 things if GC 0,1 and 2 increase by pretty much exactly the same numbers.
1. you do have some large objects that you are perhaps not aware of
2. someone is calling GC collect
I would suggest that you check out the large object heap size in perfmon (under .net clr memory) and see if it goes up, if it does, look at the dump and run !dumpheap -min 85000, perhaps all these small objects are stored in a large object array or similar that ends up on the LOH.
The other option with the GC.Collect you can verify or disspell by checking out the number of induced gc's (also under .net clr memory).
Having said this, GC perf issues like this are super tricky to figure out. I am working on one right now that I was planning on blogging on later (things have just been terribly busy lately), but my issue is caused by something really rare that i dont think applies to your situation. It is caused by a library used in the asp.net app calling SetProcessorAffinityMask limiting the process to run on one process:) and thus causing all the gc threads to spin on that one process in turn causing major cpu usage... a very ugly story...
I thought I'd share a support story with you from a very interesting case I have. My customer is running
PingBack from http://sanal.org/?p=319
Ah. Garbage Collection... how I love and hate thee. =P I think one sad thing about programming in .net
These are the articles (in no particular order) that I felt best showed a thorough use of the WinDbg
Thanks, this really helped me find one of the problems leading to high cpu usage on our server!