If broken it is, fix it you should

Using the powers of the debugger to solve the problems of the world - and a bag of chips    by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

Reader email: Need help troubleshooting perf/memory issues

Reader email: Need help troubleshooting perf/memory issues

Rate This
  • Comments 7

I get several emails every day (through the blog) with urgent requests to help troubleshoot memory issues or memory related performance issues.  Some just say “we have a memory issue, how do I troubleshoot it?” and others are far far more detailed with links to dumps and everything:)

I love getting emails, but since I have a day job too (troubleshooting such issues:)) I unfortunately have to ignore most such requests through the blog:( to have time to help the customers that contact me through the normal support channels, and therefore I would recommend that if you do have a pressing issue, that you create a support case on http://support.microsoft.com/ to make sure that you get a dedicated support engineer and that your issue gets the attention it deserves.

Having said all that, I wanted to point out one email I got today, both because I could see that the person writing it had put some effort into try troubleshooting the issue before contacting me, and because I think and hope that my answer might be able to give some starting points to other people in his situation.

  

Hi Tess,

We have a week to go for production and are stuck in one performance issue. Please guide me in resolving this.

We are running an ASP .Net (2.0) App on Windows Server 2003 (SP2), dual-processor, 4GB RAM. During stess-testing our app with 60 users, we see a drop in throughput from 17th minute to 22nd minute.
It rises and drops again around 40th minute.

I took 4 memory dumps for 8 mins, ran the perfmon counters. Here are the details:

        Dump1     Dump2     Dump3     Dump4
Size     1.16G     1.12G     1.04G     1.13G
GC       802MB     750MB     648MB     738MB
Loader   18MB     18MB     18MB     18MB Image 236MB     236MB     236MB     236MB
MEM_RESV     220MB     236MB     347MB     255MB
MEM_COMM     1.19GB    1.06GB    1021MB    1.08GB

Perfmon Counters (8 mins)
LOH:             307MB,
#Bytes in all heaps:     798MB,
%Time in GC:         49%(avg),
#Induced GC:        14 (avg).

So I can deduce the following:

1. We are stuck in GC for the 8 minutes.
2. Assembly leak is not the problem.
3. Am not sure if Finalizer thread is being blocked.

The kb reads as below:

0:017> kb
ChildEBP RetAddr     Args to Child
0e1afc74 7c827d0b     77e61d1e 00000334 00000000     ntdll!KiFastSystemCallRet
0e1afc78 77e61d1e     00000334 00000000 0e1afcbc     ntdll!NtWaitForSingleObject+0xc
0e1afce8 79e8c5f9     00000334 000007d0 00000000     kernel32!WaitForSingleObjectEx+0xac
0e1afd2c 79e8c52f     00000334 000007d0 00000000     mscorwks!PEImage::LoadImage+0x1af
0e1afd7c 79e8c54e     000007d0 00000000 00000000     mscorwks!CLREvent::WaitEx+0x117
0e1afd90 79fc2397     000007d0 00000000 00000000     mscorwks!CLREvent::Wait+0x17
0e1afdac 79fc24b4     000f2b28 0e1afeb0 000f3328     mscorwks!SVR::WaitForFinalizerEvent+0x4a
0e1afdc0 79e9845f     0e1afeb0 00000000 00000000     mscorwks!SVR::GCHeap::FinalizerThreadWorker+0x49
0e1afdd4 79e983fb     0e1afeb0 0e1afe5c 79f7759b     mscorwks!Thread::DoADCallBack+0x32a
0e1afe68 79e98321     0e1afeb0 9b91703f 00000000     mscorwks!Thread::ShouldChangeAbortToUnload+0xe3
0e1afea4 79eef6cc     0e1afeb0 00000000 0010f770     mscorwks!Thread::ShouldChangeAbortToUnload+0x30a
0e1afecc 79eef6dd     79fc2469 00000008 0e1aff14     mscorwks!ManagedThreadBase_NoADTransition+0x32
0e1afedc 79ef12ab     79fc2469 9b91718f 00000000     mscorwks!ManagedThreadBase::FinalizerBase+0xd
0e1aff14 79f92015     00000000 00000000 89244470     mscorwks!SVR::GCHeap::FinalizerThreadStart+0xbb
0e1affb8 77e64829     000f3328 00000000 00000000     mscorwks!Thread::intermediateThreadProc+0x49
0e1affec 00000000     79f91fcf 000f3328 00000000     kernel32!BaseThreadStart+0x34

4. !finalizequeue results
Heap 0
generation 0 has 158 finalizable objects (1d1d42c8->1d1d4540)
generation 1 has 269 finalizable objects (1d1d3e94->1d1d42c8)
generation 2 has 1961 finalizable objects (1d1d1ff0->1d1d3e94)
Ready for finalization 0 objects (1d1d4540->1d1d4540)
------------------------------
Heap 1
generation 0 has 830 finalizable objects (1d23d7c0->1d23e4b8)
generation 1 has 284 finalizable objects (1d23d350->1d23d7c0)
generation 2 has 1238 finalizable objects (1d23bff8->1d23d350)
Ready for finalization 0 objects (1d23e4b8->1d23e4b8)
 

How do I proceed from here to isolate the issue?

Thanks,

Looking at the above, these were the comments and suggestions for further troubleshooting that I sent him.

Hi,

I don’t agree that you are stuck in a GC for 8 minutes, you can check with # Gen 0/Gen 1/Gen 2 collections and will probably see them increasing.  The reason I say that you are not stuck is because the % time in GC counter will only update on completion of a GC so if you were stuck it would not update.

I do think however that your % Time in GC looks really high and I think there might be two reasons for this

  1. You are reaching the point where mem usage gets really close the OOM range (typically around 1 GB private bytes) so the process is probably working really hard to get rid of everything it can.  # Induced GC here is not 0 which means that someone called GC.Collect, and unless it is your app that calls GC.Collect it is probably ASP.NET doing this to clean up at an OOM.

    Each induced GC will be a full GC which can get pretty costly performance wise.

    Recommendation:
    Look at general memory usage, i.e. look at !dumpheap –stat etc. and see what is using all this .net memory and try to find places where you can reduce the memory usage, or alternatively, if you need to use all this memory, then you would need to move to a 64-bit environment.  (You might want to look at some of my posts on troubleshooting .net memory issues to narrow this down).  A potential also, if you are running multiple sites here, would be to separate them into different app pools.
  2. Your LOH looks pretty large (307 MB), if you are continuously allocating on the LOH you will also trigger a lot of full GC’s.

    Recommendation:  have a look at !dumpheap –min 85000 and see if there are any unexpected large objects, or if there are any that you can reduce in size or in number of objects. (See http://blogs.msdn.com/b/tess/archive/2006/06/22/asp-net-case-study-high-cpu-in-gc-large-objects-and-high-allocation-rates.aspx  for more details)

Regarding the finalizer, the finalizer is not stuck, it is sitting in mscorwks!SVR::WaitForFinalizerEvent+0x4a which means that it is just idling, waiting for the GC to tell it to finalize. 

I have a post with common asp.net threads and how they look in the “normal” idling state here http://blogs.msdn.com/b/tess/archive/2007/04/02/things-to-ignore-when-debugging-an-asp-net-hang-update-for-net-2-0.aspx that might be useful for further debugging.

You can further see that finalization is not a problem because !finalizequeue reports no objects to ready for finalization in either heap “Ready for finalization 0 objects”.  Here you can find an example of what it would look like if the finalizer was actually stuck.

Hope that helps,
Tess

  • The Concurrency Visualizer team had a recent post about how sometimes there can be a surprising number of WaitCallback items. blogs.msdn.com/.../reducing-memory-usage-to-increase-concurrency.aspx.

  • Hi Tess,

    Yes, your reply did help me in proceeding further with the issue! I figured out that all our objects were rooted in Session; the reason being that we are using SessionPageStatePersister for ViewState with a configurable depth. So we end up with holding really huge objects in "Session", which also explains why there used to be throughput drops every 20 minutes. That is when Sessions end and we lose reference to the variables, and GC can actually clean up some objects! We confirmed that the issue is indeed Session by moving Session to StateServer. Then w3wp process memory was constant at around 100MB and the StateServer's shot up to 800MB and stayed there!! So now we are in the process of cleaning up our ViewState mess!!! Thanks once again for your quick reply. Our App died due to Session-cum-ViewState; essentially, death by a thousand cuts!!

    - Bindu

  • Hi Tess,

    is there possibility in windbg/sos to determine memory usage per application domain?  Is there some way how to find this information?

    thanks

  • Nope, sorry, the heaps are not domain specific.

  • Very useful post and i always love to visit your blog... http://remotepccure.com

  • Fantastic goods from you, man. I have understand your stuff previous to and you are just too excellent. I really like what you have acquired here, really like what you’re saying and the way in which you say it. You make it entertaining and you still care for to keep it wise. I cant wait to read much more from you. This is really a great site. http://johnrleeperio.com

  • Recently puneet shah took one session for us and for references we directed us to your blog. I must the information out here are of real help and good learning. thanks a lot.

Page 1 of 1 (7 items)
Leave a Comment
  • Please add 2 and 6 and type the answer here:
  • Post