It may appear as a contradiction after my previous post, but the first thing to do to start analyzing a memory dump is ask yourself: do I really need a dump?!?
Let me explain: when you need to troubleshoot an error there are a number of things to do before really going down the dump path, simply because not all problems can be resolved in that way... keep in mind that a dump is nothing more than a snapshot of a process at a certain point in time, we can try to understand what happened in the past but with some limitations (as long as the details we are looking for are still in memory), and of course we can't know what happened to the process after the dump has been taken; exactly like a picture you can take with your digital camera. For example if you are having problems to remotely debug your application, I hardly think a dump can add any value to your troubleshooting... while in case of a memory leak a dump is one of the first things I ask the customer to provide me (but again there are some things before this step).
What's first, then? Well, as you can guess, the first step is to understand the problem and know the scenario where it reproduces. If you are troubleshooting your own application you should already know most of the details, but if you are a consultant and are helping one of your customers with a weird exception thrown one in a while, you must have an open an ongoing discussion with the people whom developed the application, and maybe with the IT pros whom are maintaining the application and the environment day by day.
Let's assume we have the information we need to start, and we decided we need to capture a dump. But which kind? How? When? Moreover, are you sure you and your customer are talking the same language and using the same terms to name things? I tell you because I learnt this lesson on my own in the hard way... the customer was describing a crash in his application so we configured adplus to run in crash mode, but some some reason we were unable to get a dump when the crash reproduced; we kept trying, but after 3-4 runs we gave up. Finally it turned out that the crash the customer was reporting was "just" an exception not handled in a try...catch block shown to the final user (do you know the yellow/orange ASP.NET error page?) but the worker process was still happily serving requests for other users...
So, here is some basic terminology: if your customer is not expert in this area, assure he understands those terms and stick to them to avoid confusion. This still apply if you'll ever need to raise a support call with Microsoft CSS, this is the terminology you can expect to be used
So, now that we gathered all those details about the problem, which is the right approach to capture the dump we need? It depends on the problem, of course. There may be some variations depending on the circumstances, but basically we can capture either a crash or hang dump. What's the difference?
We'll need a crash dump when we can't determine when the problem (typically a crash like the name implies, but that's not the only case) will happen, so we can configure the debugger in advance to monitor our target process and capture a dump when the process will be terminated, or when we need to capture a dump on a specific exception (as I'll discuss in another post). On the other hand, we can capture a hang dump after the problem has occurred but the process is still in memory, for example in a memory leak scenario but also when a process is burning our your CPU.
As the name suggests, exceptions should be the exception rather than the rule; so for example it's always a good idea to check if an object is valid before trying to use it, rather than let the runtime throw an exception and trap it in a try...catch block. Anyway what happens when you have a debugger attached to a process? The debugger gets the first chance to handle the exception; If it allows the execution to continue and does not handle the exception, the application will see the exception as usual. If the application does not handle the exception, the debugger gets a second chance to see the exception; in this case the application would normally crash if the debugger was not present.
This is more clear when you use adplus in crash mode: by default the debugger attaches to the target process and logs every exception thrown; if you try you'll very likely end up with quite a few minidumps (a few megabytes each) corresponding to every exception thrown and trapped in try...catch blocks in your code, and a second chance full dump (the same size as the process, the private bytes value you can see in TaskManager) when the process will crash.
I mean in terms of performance for your server which could potentially be a highly stressed production environment? Of course there is a cost, especially for a crash dump because you'll have a debugger attached to your worker process for the time needed to reproduce the problem, but it's hard to exactly tell how much; in my experience I just had one server where the debugger was really affecting the site and forced us to stop it. But it worth mentioning that the server was already beyond its capacity limit and was already performing badly, the debugger was just the last straw...
Having a repro in a test environment is the ideal situation, since we'll be able to capture dumps, run tests and do whatever needed to resolve the problem without causing additional pain to your poor users.
PingBack from http://msdnrss.thecoderblogs.com/2007/09/14/something-you-need-to-know-before-start-debugging/
Much like yourself I've recently been put into a position of becoming a support team member (read: debugging guy) rather than a regular programmer.
I've enjoyed reading these 'new to debugging' entries. My experience with 'crash' vs 'hang' and learning the toolsets of the trade matches yours. Looking forward to more entries!
I've just finished writing up an e-mail for some new people in my team about starting Debugging and the