If broken it is, fix it you should

Using the powers of the debugger to solve the problems of the world - and a bag of chips    by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

.NET Debugging Demos Lab 2: Crash


It was nice to see that so many people have already downloaded the demo site and checked out the instructions for the first lab. Thanks to Pedro for pointing out that the original demo site required .NET Framework 3.5. I've changed it now, so the version you can download from the setup instructions page should not require .NET Framework 3.5.  (Even though I would encourage you to download 3.5 and play around with it anyway :))

Here comes lab 2, a crash scenario on the BuggyBits site.  

Previous demos and setup instructions

Information and setup instructions
Lab 1: Hang
Lab 1: Hang - review 

Reproduce the problem

1. Browse to the reviews page http://localhost/BuggyBits/Reviews.aspx. You should see a couple of bogus reviews for BuggyBits.

2. Click on the Refresh button in the reviews page. This will crash the w3wp.exe process (or aspnet_wp.exe on IIS 5) 

    Note: If you have Visual Studio installed, a Just-In-Time Debugger message may pop up (just click No for the purposes of this exercise).
    However, since this message box will sit there and wait for user input before the app shuts down, you may want to disable JIT debugging if you have Visual Studio installed on a test system.

Examine the eventlogs

1. Open the Application and System eventlogs. The information in the eventlogs will differ based on the OS and IIS version you are running. Among other events, you may have a System event looking something like this:

Event Type:	Warning
Event Source:	W3SVC
Event Category:	None
Event ID:	         1009
Date:		2008-02-08
Time:		10:12:06
User:		N/A
Computer:   	MYMACHINE
Description:
A process serving application pool 'DefaultAppPool' terminated unexpectedly. The process id was '4592'. 
The process exit code was '0xe0434f4d'.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Q: What events do you see?

Q: What does the exit code 0xe0434f4d mean?

Q: Can you tell from the eventlogs what it was that caused the crash? 
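
One way to start on the exit code question is to look at the individual bytes of the value. The following is a small Python sketch, not part of the lab or the debugger tooling (the helper name split_exit_code is made up for illustration), that splits the 32-bit code into its top byte and reads the remaining three bytes as ASCII text:

```python
from typing import Tuple


def split_exit_code(code: int) -> Tuple[int, str]:
    """Split a 32-bit exit code into its top byte and the ASCII text of its low three bytes."""
    raw = code.to_bytes(4, byteorder="big")
    # errors="replace" keeps this usable for codes whose low bytes are not ASCII.
    return raw[0], raw[1:].decode("ascii", errors="replace")


top_byte, text = split_exit_code(0xE0434F4D)
print(f"top byte: 0x{top_byte:02x}, low bytes as text: {text!r}")
```

Running it against the value from the event log gives you a short, readable string to search for.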

Get a memory dump

1. Browse to the reviews page http://localhost/BuggyBits/Reviews.aspx again, but don't click refresh

2. Open a command prompt and move to the debuggers directory and type in "adplus -crash -pn w3wp.exe" and hit enter

Q: A new window should appear on the toolbar, what is it? 

Q: What is the debugger waiting for? Hint: Check the help files for ADplus/crash mode in windbg

3. Reproduce the issue by clicking on the refresh button in the reviews page.

Q: What files got created in the dump folder?  Note: The dump folder will be located under your debuggers directory, with a name made up of crash_mode and today's date and time.
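
If you run the lab several times you will end up with several dump folders. As a convenience, here is a rough Python sketch for finding the newest one and listing its files. It assumes only what the note above says, namely that the folder name starts with crash_mode (matched case-insensitively). The function names and the DEBUGGERS_DIR environment variable are made up for this example and are not part of ADPlus.

```python
import os
from pathlib import Path
from typing import List, Optional


def newest_crash_folder(debuggers_dir: Path) -> Optional[Path]:
    """Return the most recently modified sub-folder whose name starts with 'crash_mode'."""
    candidates = [
        p for p in debuggers_dir.iterdir()
        if p.is_dir() and p.name.lower().startswith("crash_mode")
    ]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def list_dump_files(folder: Path) -> List[str]:
    """List every file under the folder as a relative path, sorted by name."""
    return sorted(str(p.relative_to(folder)) for p in folder.rglob("*") if p.is_file())


if __name__ == "__main__":
    # DEBUGGERS_DIR is a hypothetical variable; point it at your debuggers directory.
    debuggers = Path(os.environ.get("DEBUGGERS_DIR", "."))
    folder = newest_crash_folder(debuggers)
    if folder is None:
        print("No crash_mode folder found under", debuggers)
    else:
        print("Newest dump folder:", folder)
        for name in list_dump_files(folder):
            print("  ", name)
```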

Open the dump in windbg

1. Open the dump file labeled 2nd Chance CLR Exception in windbg (file/open crash dump).  Note that this dump got created just before the 1st chance process shutdown.

Note: If you throw an exception (.NET or other), you have a chance to handle it in a try/catch block.  The first time it is thrown it is a 1st chance exception and is non-fatal.  If you don't handle the exception, it becomes a 2nd chance exception (unhandled exception), and any 2nd chance exception will terminate the process.
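
The difference between a handled and an unhandled exception can be seen outside of .NET as well. The sketch below is only an analogy in Python: Python has no 1st/2nd chance terminology, and nothing here involves a debugger. It runs two tiny scripts in child processes, one that catches its exception and one that does not, and compares their exit codes.

```python
import subprocess
import sys

# The exception is caught, so it never escapes and the script ends normally.
HANDLED = """
try:
    raise ValueError("boom")
except ValueError:
    pass
"""

# Nothing catches the exception, so the interpreter terminates the process.
UNHANDLED = """
raise ValueError("boom")
"""


def exit_code_of(source: str) -> int:
    """Run the snippet in a separate Python process and return its exit code."""
    result = subprocess.run([sys.executable, "-c", source], capture_output=True)
    return result.returncode


print("handled  :", exit_code_of(HANDLED))
print("unhandled:", exit_code_of(UNHANDLED))
```

The handled case exits with code 0, while the unhandled one ends the child process with a non-zero exit code, which is the same general shape as the crash in this lab.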

2. Set up the symbol path and load sos (see the setup instructions for more info)

 

In a crash dump, the active thread is the one that caused the exception (since the dump is triggered on an exception).

Q: Which thread is active when you open the dump? Hint: check the command bar at the bottom of the windbg window.

Examine the callstacks and the exception

1. Examine the native and managed callstacks. 

kb 2000
!clrstack

Q: What type of thread is it?

Q: What is this thread doing?

2. Examine the exception thrown

!pe

Note: !pe/!PrintException will print out the current exception being thrown on this stack if no parameters are given

Q: What type of exception is it?

Note: In some cases, like this one where the exception has been rethrown, the original stacktrace may not be available in the exception.  In cases like this you may get more information if you find the original exception.

3. Look at the objects on the stack to find the address of the original exception

!dso

Q: What is the address of the original exception?

Hint: Look at your previous !pe output to see the address of the rethrown exception.  Compare this to the addresses of the objects on the stack.  You should have multiple exceptions, a few with the rethrown exception address, but one of the bottommost exceptions will be the original one (look for one with a different address).
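
If the stack is long, the comparison in the hint can be done mechanically on text copied out of the debugger. The Python sketch below is an illustration only: the sample text is made up (the addresses and type names are not from the lab dump), and it assumes each row has three columns, a stack location, an object address and a type name. The helper name other_exceptions is hypothetical.

```python
import re
from typing import List, Tuple

# Made-up sample rows: stack location, object address, type name.
SAMPLE_DSO = """\
ESP/REG  Object   Name
0110f5f4 0a1b2c40 System.Exception
0110f63c 0a1b2c40 System.Exception
0110f6a8 0a1b2c40 System.Exception
0110f7d0 0a1b2a10 System.Exception
"""

# Two hex columns followed by a type name ending in "Exception".
ROW = re.compile(r"^\s*([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+(\S*Exception)\s*$")


def other_exceptions(dso_text: str, rethrown_address: str) -> List[Tuple[str, str]]:
    """Return (object address, type) pairs for exception objects that are not the rethrown one."""
    rethrown = int(rethrown_address, 16)
    found: List[Tuple[str, str]] = []
    for line in dso_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header lines and non-exception objects are skipped
        address, type_name = match.group(2), match.group(3)
        if int(address, 16) != rethrown and (address, type_name) not in found:
            found.append((address, type_name))
    return found


for address, type_name in other_exceptions(SAMPLE_DSO, "0a1b2c40"):
    print(address, type_name)
```

Pass in the address from your own !pe output as the second argument; whatever is left over is a candidate for the original exception.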

4. Print out the original exception and look at the information and the callstack

!pe <original exception address>

Q: In what method is the exception thrown?

Q: What object is being finalized?

Note: you could actually have gotten this information by dumping out the _exceptionMethodString of the rethrown exception as well, but with !pe of the original exception you get the information in a cleaner way.

Q: Normally exceptions thrown in ASP.NET are handled with the global exception handler and an error page is shown to the user.  Why did this not occur here?  Why did it cause a crash?

Examine the code for verification

1. Open Review.cs to find the destructor/finalizer for the Review class

Q: Which line or method could have caused the exception?

 

As an extra exercise you can also examine the disassembly of the function to try to pinpoint more precisely where in the function the exception is caused

!u <IP shown in the exceptionstack>

 

Related posts

Creating dumps with Windbg and writing ADPlus Config files

ASP.NET 2.0 Crash case study: Unhandled exceptions

What on earth caused my process to crash?

.Net exceptions - Tracking down where in the code the exceptions occurred

 

Have fun debugging,

Tess


  • Thank you very much for the wonderful work. I was in big need of a tool like Windbg, it really helps with my work.

  • Hi Tess,

    Excellent labs, looking forward to more.

    I have one question though. On a server running multiple sites (hundreds in fact) in the same app pool, how do I identify the site that caused a hang or crash having identified the root cause?

    Or to put it differently, how do I match up threads to IIS sites/applications in windbg?

    As a hoster we can be running up to 1000 sites on a single server with those sites divided across say 5-10 app pools.

    Cheers

    Kev

  • Hi Kev,

    Although I probably wouldn't recommend running 200 apps per app pool because of how much memory usage there would be per process (likely OOMs just because of the dlls loaded alone), your question is very valid.

    The finalizer thread is common to all apps in the process but for all other threads you can check out the threads in !threads and check which appdomain the code is running in by running !dumpdomain on the domain in the domain column.

  • Hello!

    I have the following SOS output:

    0:000> !threadpool

    CPU utilization 100%

    Worker Thread: Total: 2 Running: 0 Idle: 2 MaxLimit: 25 MinLimit: 2

    Work Request in Queue: 0

    --------------------------------------

    Number of Timers: 3

    --------------------------------------

    Completion Port Thread:Total: 5 Free: 0 MaxFree: 4 CurrentLimit: 2 MaxLimit: 25 MinLimit: 2

    It's obvious that the 100% CPU utilization is a problem (that has been solved already). My question is if and how the Threadpool uses the current CPU utilization for scheduling WorkItems or to control how and if new Workerthreads are created (the Threadpool is primarily used for async Socket Operations (HttpListener) within this project).

    It seems like no new threads are started, even if the current number of Threadpool Threads is below the Max-Threads (which could make sense because there would be no resources available for the new thread).

    So ... how can the values (Total, Running, Idle, MaxLimit and MinLimit) be interpreted?

    Any thoughts?

  • Are you saying you are getting those numbers for this lab or in some other dump?  I'm just curious because you shouldn't get 100% CPU in this specific crash lab...

    The 100% is for the whole system, not only the w3wp.exe process so this would also include any CPU usage by other processes.

    Total = number of current worker threads started (running+idle)

    Running = executing a request or work item

    MaxLimit = max number of worker threads (as set in machine.config for asp.net or 1000 by default for winforms)

    MinLimit = 1 per logical CPU (min number of worker threads at any given time)

    The threadpool does take CPU usage into account, and currently it will not create new threads if the system's CPU usage is over 80%
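
The relationships described in this reply can be written down as a toy model. The Python sketch below only restates the rules quoted above (Total = running + idle, and no new threads while system CPU usage is over 80%). It is not the CLR's actual implementation, and the class and method names are made up for illustration.

```python
from dataclasses import dataclass

CPU_THRESHOLD_PERCENT = 80  # the figure quoted in the reply above


@dataclass
class WorkerPoolSnapshot:
    total: int        # worker threads currently started
    running: int      # threads executing a request or work item
    max_limit: int    # upper bound on worker threads
    min_limit: int    # lower bound on worker threads
    cpu_percent: int  # CPU usage of the whole system, not just one process

    @property
    def idle(self) -> int:
        # Total = running + idle, so idle is what remains.
        return self.total - self.running

    def may_start_new_thread(self) -> bool:
        # Room below the max limit AND system CPU not over the threshold.
        return self.total < self.max_limit and self.cpu_percent <= CPU_THRESHOLD_PERCENT


# Values taken from the !threadpool output posted earlier in this thread.
snapshot = WorkerPoolSnapshot(total=2, running=0, max_limit=25, min_limit=2, cpu_percent=100)
print("idle:", snapshot.idle)
print("may start new thread:", snapshot.may_start_new_thread())
```

With the posted numbers the model reports two idle threads and no permission to start new ones, which matches the behaviour the question describes.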

  • Hello!

    First of all: No. This is not related to this lab. I just read the Review posting where you said that questions - even not directly related to the lab - are welcome.

    Thank you for sharing this information! Is the 80% threshold hardcoded or configurable?

    Maybe you can also explain the Completion Port Thread values (Total, Free, MaxFree, CurrentLimit, MaxLimit and MinLimit) too? I've already searched the web over and over but didn't find anything about them.

    Thank you!

  • It's totally cool to ask questions not related to the lab :)  I just wanted to make sure that the lab didn't behave like that on your machine.

    The completion port threads are pretty much the same.  Completion ports are mostly used for callbacks but can be used for work items too if there are available completion port threads but no available worker threads.

    The 80% is hard coded but in reality there is no use to change it since you really can't do much with new threads at that CPU level anyways.

  • Hello Tess!

    Great post, gives very important knowledge needed for newbie .NET crash analyst :)

    I have a situation where the DFS management snap-in is crashing due to a null-reference exception (0x80004003), and I've gone a long way to identify at what level the exception occurs (let me know if you're curious enough to look at the dump; you should have my email somewhere, hopefully). Eventually I came to the clr thread stack where the exception occurs, but I'm stuck, because I want to observe what parameters are passed to each function in the stack. I went through your blog posts, and Johan's, and I can't seem to find a way to do this. Can you advise a little bit please? I've been banging my head against the table about this case for weeks now.

    Regards,

    Andrew

  • Hi Andrew,

    Yepp, I remember you:)  I can't really commit to look at any dumps but I can give you some pointers.

    0x80004003 is not really a clr exception, and I am not sure based on your comment if you are actually stopped at the exception or just see it on the heap.  If you got it from the heap you won't be able to inspect the parameters etc., so in that case you would have to set up debug diag or an adplus config file to get a full dump on 0x80004003.  Check the windbg help files for adplus config for more info on that...

    If you are stopped on the exception you can either use !clrstack -p to find the parameters or, if that doesn't help, you can try !dso to see the objects on the stack.

    Best of luck

    Tess


  • Thanks a lot for these labs, I'm learning a lot. The assembler stuff was really neat to learn.
