If broken it is, fix it you should

Using the powers of the debugger to solve the problems of the world - and a bag of chips    by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

.NET Debugging Demos Lab 4: High CPU hang

.NET Debugging Demos Lab 4: High CPU hang

  • Comments 13

So we finished the first round of performance issues, crashes and memory leaks.  This time we are going to dive in to a high CPU situation and I know that this is giving it away a little but before you go through the lab you might want to have a look at my GC Pop Quiz to familiarize yourself with how the GC works if you feel a bit shaky about it.  

Previous demos and setup instructions

If you are new to the debugging labs, here you can find information on how to set up the labs as well as links to the previous labs in the series.

Information and setup instructions
Lab 1: Hang
Lab 1: Hang - review
Lab 2: Crash
Lab 2: Crash - review
Lab 3: Memory
Lab 3: Memory - review

Reproducing the issue:

1. Recycle IIS (IISReset)

2. Browse to http://localhost/BuggyBits/AllProducts.aspx this should take about 5 seconds or more depending on what machine you're on

3. Watch the CPU usage in taskmanager while browsing to this page

Q: What is the CPU usage like for w3wp.exe during the processing of this request?

Set up a performance counter log:

1. Check out Lab 3: Memory and set up a performance counter log per the instructions in there and start the log

Prepare to gather a memory dump:

1. Open a command prompt, move to the debuggers directory and type the following command but don't hit enter

adplus -hang -pn w3wp.exe -quiet

Reproduce the issue again and gather the memory dump:

1. From a command prompt in the IIS Resources/Tinyget directory run the following tinyget command to spawn up 5 threads making one request each

tinyget -srv:localhost -uri:/BuggyBits/AllProducts.aspx -threads:5 -loop:1

2. When the CPU usage is high (in task manager), press enter in the adplus window to get the dump

3. Stop the performance monitor log

Open the dump to figure out what it is doing:

1. Open the dump (hang_mode with todays date and time in the debuggers directory) in windbg using file/open crash dump

2. Load up the symbols and sos

Verify that you took the dump at the right time:

1. Run !threadpool to see the CPU usage of the system to make sure that you got the dump at the right time

Q: What is the cpu usage?

Q: How does the CPU usage shown here correlate to the CPU usage of the process?

Q: If you ran tinyget you may see some requests queued up eg: Work Request in Queue: 4, why are requests queueing up? Why aren't they scheduled on threads? (Compare Running worker threads and Maxlimit for worker threads)

Determine which threads have been consuming most of the CPU time

1. Run .time to see the uptime and CPU user time of the process

2. Run !runaway to see the usertime for all the threads

Q: From !runaway which threads are consuming most of the CPU?

Q: How much usermode CPU time was consumed by the process in total?  (from .time)

Note: Looking at !runaway output can be a bit tricky for multiple reasons.

First off, on a multiproc machine you have to remember that the usermode time (i.e. clock cycles spent in usermode code) is CPU time spent on all processors, therefore the usermode time may add up to more than elapsed time (process uptime). 

Secondly, the !runaway output shows the total usermode time spent on that thread since it was started.  In asp.net for example a thread may be reused for multiple requests so a high usermode time on a thread does not neccesarily mean that the requests running on that thread is a big CPU consumer.  

And finally, some threads like the GC threads (in multiproc, serverGC processes) stay alive for the duration of the process, so they have a higher chance of accumulating a lot of usermode time than worker threads, therefore it is really more interesting to look at two consecutive dumps and compare the differences in usermode time.

3. Pick the threads that have the highest usermode time and look at their stacks

~#s                   (set thread context, note # should be replaced with the appropriate thread number from !runaway)
kb 2000             (look at the native stack) 
!clrstack             (look at the .net stack)

Q: What are they doing?  Can you make an assumption based on this as to what is causing the high CPU usage?

4. This is only applicable on multi-proc machines since they have dedicated GC threads...  Sum up the usermode time for the GC threads in !runaway and divide this by total usermode time from .time. 

Q: How much of the total usermode time is spent by the GC threads?

Examine the performance counter log:

1. Open up the performance counter log and add the counters for .NET CLR Memory/% Time in GC, # Gen 0 Collections, #Gen 1 Collections, #Gen 2 Collections and Large Object Heap Size.

Q: What does the % Time in GC counter measure?

Q: What is the average % Time in GC during the stress run?

Q: What is an appropriate number for % Time in GC?

Q: What is the ratio between Gen 0, Gen 1, and Gen 2 collections?  What should it be in an optimal scenario and why?

Q: What can cause a ratio like this?

Look at the memory dump to figure out what caused the high CPU in GC:

1. Run ~* kb 2000 to get all native stacks and search for the thread that triggered the GC (mscorwks!SVR::GCHeap::GarbageCollectGeneration)

Q: Why was a gc triggered?

Q: What type of object was it allocating?

Q: What is the thread doing?

Q: Could this be causing the bad Gen 0, Gen 1, Gen 2 ratio? Why?

2. Find out what is on the large object heap (note: if you unfortunate you may have caught the GC in the plan or relocate phase in which case the !dumpheap output may not be correct.  If that is the case try to make do with what you get from the stacks and !dumpheap -min 85000)

!dumpheap -min 85000

Q: What is on the large object heap?

Q: Is there any pattern in the sizes of the strings?

3. Dump out a few of the strings to see the contents

!do <address of string>

Q: Based on the stacks, the knowledge of the collection ratios and the string contents, can you determine what went wrong.  You might want to do some internet searches on some of the calls on the stacks if you don't know the answer immediately.

Verify your assumption in the code and modify the code to resolve the issue 

1. Open AllProducts.aspx.cs and look at the code for Page_Load

Q: What line of code is causing the problem?

Q: How do you resolve it?  Why does this resolve the problem?

2. Change the code so that it uses a stringbuilder instead of doing string concatenation and rerun the tests to see the results.

Note: Because there is a lot of looping to build the dataset the page may still take about half a second to a second to render but the CPU usage should be all gone and requests should not be queueing up.

Have fun,

Tess

  • !runaway doesn't seem to work on an XP machine, it throws the following error:

    The call to LoadLibrary(ext) failed, Win32 error 0n2

       "The system cannot find the file specified."

    Please check your debugger configuration and/or network access.

    The call to LoadLibrary(uext) failed, Win32 error 0n2

       "The system cannot find the file specified."

    Please check your debugger configuration and/or network access.

    could not get process information, 6

    Everything else (clrstack, dumpheap, do, etc) all work fine. Any ideas?

  • very curious, it should work... the problem seems to be that it doesnt find the extensions ext.dll and uext.dll that are supposed to be in the winext directory under the debuggers directory...

    !runaway is defined in uext.dll and ntsdexts.dll

    I haven't debugged on an XP machine for quite some time and I did notice that there was a winxp directory under the debuggers directory.  

    Try running .chain to see what extensions it has loaded and if it doesnt have ext.dll and uext.dll loaded you might want to try

    .load winext\ext.dll

    .load winext\uext.dll

    and try !runaway again...

  • ASP.NET Group Enabled Web Form Control Extensions [Via: andrewrea ] Sharepoint An easier way to Create...

  • Link Listing - February 22, 2008

  • Cause reason: string concat. Too many "new string" should be create and allocated in the heap. And then, strings will fill out the generation 0, then generation 1, at last is generation 2. So, gc should work to "move" some of them and collect .

    Code :

    ProductsTable += "<tr><td>" + dr[0] + "</td><td>" + dr[1] + "</td><td>" + dr[2] + "</td></tr>" ;

  • I was working through the High CPU Lab Review which is basically caused by high CPU in GC. To understand

  • Hope you have had a chance to go through the high CPU lab... Some people already commented with the solution

  • No luck finding uext.dll and I don't have a winext directory in the debugging tools directory.

    If I .load winxp\ntsdexts.dll then !runaway resolves but gives me:

    could not get process information, 6

    Any ideas?

  • I'm not 100% sure, but I can't see why there would be a difference between the installation on XP and other platforms.  If I were you I would just uninstall it and install/reinstall the latest version... if that doesnt work, try with an earlier version.

  • Here's some more information on troubleshooting a High CPU Issue. I'm using a Test Scenario which means

  • Following my post about how to troubleshoot a low CPU performance issue with VSTS Test, I will now do

  • Hi Tess,

    I was trying to apply the steps from you lab to my website, but couldn't get the clrstack command to work.

    I always get messages like that for all dlls:

    ** WARNING: Unable to verify checksum for App_Web_sa_cb2o9.dll

    *** ERROR: Module load completed but symbols could not be loaded for App_Web_sa_cb2o9.dll

    Can you give me a hint whats wrong?

    Thanks in Advance!!

    Ralf

  • if it starts loading symbols when you run a command like

    !clrstack that usually mean that it is misstyped somehow and windbg treats it like a symbol.  More specifically it will happen if you run clrstack instead of !clrstack

Page 1 of 1 (13 items)
Leave a Comment
  • Please add 4 and 7 and type the answer here:
  • Post