Hello NTDebuggers, in the spirit of Click and Clack (the Tappet Brothers), a favorite troubleshooting show of mine, we thought it would be fun to offer up some Debug Puzzlers for our readers.
That said, this week’s Debug Puzzler is in regard to Dr. Watson. I’m sure most of you have seen Dr. Watson errors. This typically means your application has crashed due to an unhandled exception. Sometimes, however, the process just seems to disappear. The Just-in-Time (JIT) debugging options configured via the AeDebug key do not catch the crash… Does anyone know why this may happen?
We will post readers’ comments as they respond during the week, and next Monday we will post our answer and recognize some of the best answers we received.
Good luck and happy debugging!
- Jeff Dailey
[Update: our answer, posted 4/11/2008]
Hello NTDebuggers. Let me start off by saying that we were very impressed by our readers’ answers. Our two favorite answers were submitted by Skywing and molotov.
When a thread starts, the ntdll Run Time Library (RTL) for the process inserts an exception handler before it calls the BaseThreadInitThunk code to hand control over to the executable or DLL running in the process (notepad in the example below). If anything goes wrong with the chain of exception handlers, the process can’t make it back to the RTL exception handler and the process will simply terminate. See http://www.microsoft.com/msj/0197/Exception/Exception.aspx for details.
000ef7ac 75fbf837 ntdll!KiFastSystemCallRet
000ef7b0 75fbf86a USER32!NtUserGetMessage+0xc
000ef7cc 00b21418 USER32!GetMessageW+0x33
000ef80c 00b2195d notepad!WinMain+0xec
000ef89c 76e24911 notepad!_initterm_e+0x1a1
000ef8a8 7704e4b6 kernel32!BaseThreadInitThunk+0xe
000ef8e8 7704e489 ntdll!__RtlUserThreadStart+0x23 << Exception Handler is inserted here.
000ef900 00000000 ntdll!_RtlUserThreadStart+0x1b
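To make the "can't make it back to the RTL exception handler" point concrete, here is a toy model of an x86-style handler chain. It is only a sketch, not how ntdll is implemented: Registration, dispatch, the addresses, and the two handlers are all made-up names for illustration. The one real detail is that the x86 chain ends with the value 0xFFFFFFFF.

```python
from dataclasses import dataclass
from typing import Callable, Dict

END_OF_CHAIN = 0xFFFFFFFF  # x86 SEH marks the end of the handler list with -1


@dataclass
class Registration:
    """Toy stand-in for an exception registration record (hypothetical)."""
    next_addr: int                  # address of the next (outer) record
    handler: Callable[[str], bool]  # returns True if it handles the exception


def dispatch(memory: Dict[int, Registration], head: int, exception: str) -> str:
    """Walk the chain from the innermost record outward.

    Returns "handled" if a handler accepts the exception, or "terminated"
    if the walk runs into an address that holds no valid record.
    """
    addr = head
    while addr != END_OF_CHAIN:
        record = memory.get(addr)
        if record is None:
            # Corrupted link: the outermost handler can never be reached.
            return "terminated"
        if record.handler(exception):
            return "handled"
        addr = record.next_addr
    return "terminated"


launched = []  # records each time the top-level handler runs


def app_handler(exception: str) -> bool:
    return False  # the application's handler declines everything


def top_level_handler(exception: str) -> bool:
    launched.append(exception)  # stands in for starting the JIT debugger
    return True


# Intact chain: app handler at 0x1000 links to the top-level handler at 0x2000.
intact = {
    0x1000: Registration(0x2000, app_handler),
    0x2000: Registration(END_OF_CHAIN, top_level_handler),
}

# Same chain after the inner record's link has been overwritten with garbage.
corrupt = {
    0x1000: Registration(0x41414141, app_handler),
    0x2000: Registration(END_OF_CHAIN, top_level_handler),
}
```

With the intact chain the walk reaches the top-level handler, which is where the debugger would be launched. With the overwritten link the walk stops early and the top-level handler never runs, which is the "process just disappears" case.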
Secondly, the process that crashes is actually responsible for starting the debugger via the RTL exception handler. The debugger is registered under the AeDebug registry key. Even if you are able to unwind to the RTL exception handler, you may still run into trouble. If the computer is low on system resources such as desktop heap, you may not be able to create a new process and thus will not be able to launch the debugger. As Skywing stated, creating a process is a relatively heavyweight operation. Applications may also call TerminateProcess from within their own code based on an error condition. If we have a customer that sees this symptom on a regular basis, we typically recommend attaching a debugger to monitor the process. Simply run ADPLUS -crash -p (PROCESSID).
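As a rough illustration of why the launch can fail, the sketch below mimics the two steps the crashing process has to get through: fill in the Debugger command line registered under AeDebug, then create the debugger process. The Debugger value typically carries two %ld placeholders, for the process ID and an event handle. The install path, build_jit_command, launch_jit, and the spawn callback are all assumptions made up for this example.

```python
from typing import Callable

# Hypothetical AeDebug "Debugger" value; the install path is an assumption.
EXAMPLE_DEBUGGER_VALUE = r'"C:\Debuggers\windbg.exe" -p %ld -e %ld -g'


def build_jit_command(debugger_value: str, pid: int, event_handle: int) -> str:
    """Substitute the process ID and event handle into the two placeholders."""
    return debugger_value % (pid, event_handle)


def launch_jit(debugger_value: str, pid: int, event_handle: int,
               spawn: Callable[[str], object]) -> bool:
    """Try to start the debugger; report False if process creation fails.

    spawn stands in for whatever actually creates the process. It is a
    parameter so the failure case can be shown without a real debugger.
    """
    command = build_jit_command(debugger_value, pid, event_handle)
    try:
        spawn(command)
    except OSError:
        # No new process could be created, so no debugger ever attaches.
        return False
    return True


print(build_jit_command(EXAMPLE_DEBUGGER_VALUE, 1234, 5678))
# prints: "C:\Debuggers\windbg.exe" -p 1234 -e 5678 -g
```

If spawn raises because the machine is out of resources, launch_jit returns False and nothing is left to report the crash. That is the argument for attaching the debugger ahead of time.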
Good work folks! We’ll have another puzzler ready next Monday.
Good Luck and happy debugging!
My guess: a regular process termination by some obscure component in your application, or alternatively a regular TerminateProcess call from a global unhandled exception filter/handler.
bp ntdll!zwterminateprocess usually gets me a tiny little bit closer to the problem.
A thread exhausted its stack because stack expansion was disabled by dereferencing an invalid pointer in an exception handler.
Anything sufficiently broken or corrupted in the critical path for JIT debugging, which covers a lot of things (the process heap is a big one), can cause secondary failures in the JIT path (CreateProcessW is a fairly heavyweight operation). These secondary failures typically lead to the process getting hard killed instead of being able to create the JIT debugger process successfully.
Monitoring from an external process instead of doing things from the point of an AV in a program is much better as it avoids exposing the fault handling code to whatever corruption claimed the process and caused the AV in the first place.
This is probably not the answer but annoyed the heck out of me for a bit. On my Vista x64 machine, the JIT (windbg) was not launching properly because it was not configured correctly in both the Wow6432Node and the standard registry hive. So for 32-bit processes I would get the Visual Studio JIT and for 64-bit I would get WinDbg like I expected. Manually updating the registration in both locations fixed it.
In the absence of that, I would vote for Skywing's answer.
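For anyone who wants to check for the same mismatch, here is a small sketch. On 64-bit Windows the AeDebug key exists in both the native registry view and the 32-bit view (which appears under Wow6432Node), and each view has its own Debugger and Auto values. The helper names read_aedebug and compare_views are invented for this example, and read_aedebug only works on Windows because it relies on Python's winreg module.

```python
from typing import Dict, List, Optional

AEDEBUG_SUBKEY = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug"
VALUE_NAMES = ("Debugger", "Auto")


def read_aedebug(wow64: bool) -> Dict[str, Optional[str]]:
    """Read the AeDebug values from the 32-bit or 64-bit registry view.

    Windows only: winreg is imported here so the rest of the module still
    loads on other platforms.
    """
    import winreg
    view = winreg.KEY_WOW64_32KEY if wow64 else winreg.KEY_WOW64_64KEY
    values: Dict[str, Optional[str]] = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, AEDEBUG_SUBKEY, 0,
                        winreg.KEY_READ | view) as key:
        for name in VALUE_NAMES:
            try:
                values[name], _ = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                values[name] = None  # value not present in this view
    return values


def compare_views(native: Dict[str, Optional[str]],
                  wow64: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of the values that differ between the two views."""
    return [name for name in VALUE_NAMES if native.get(name) != wow64.get(name)]
```

On a 64-bit machine, compare_views(read_aedebug(False), read_aedebug(True)) returning a non-empty list would point to the situation described above, where 32-bit and 64-bit processes get different JIT debuggers.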
A stack overflow seems one case where Dr. Watson may not kick in.
In testing, unhandled C++ exceptions ("This application has requested the Runtime to terminate it in an unusual way.") may give one the impression that the good doctor is doing some work (you do get a visual/audio notification, if configured), but in fact no dump is generated and only the header is written to drwtsn32.log. Similar results occur when terminate() is called due to an exception being thrown from a destructor during stack unwinding. In these cases, it would seem to be an intentional behavior of Dr. Watson, for whatever reason.
Apologies for the addition...
> it would seem to be an intentional behavior of Dr. Watson, for whatever reason. <
"whatever reason" could include, as seems to be the case here, that the failing process that invoked Dr. Watson has exited by the time Dr. Watson is ready to go.
We had exactly this issue.
In our case, it turned out to be a third-party library calling abort() after encountering RTL heap memory corruption.
I think there are other cases:
- The VC RTL heap might pop up a dialog box, and the user might decide to terminate the application.
- Service failed to start, caused by some loader issues (missing DLL, missing API, or even a DllMain exception).
Well, I'm kind of new to NT development and I'm more into .NET development, but I faced a problem like that. What was happening was that some exception was raised and caught, and the catching code silently closed the app (which is a legal exit of an application).
Vista improves on this limitation by moving the "unhandled exception handling" code into kernel mode, using the WER service. We will finally not get silent process exits. I provided a reply in the link below:
To put a finer point on the stack overflow responses, this will specifically happen if you attempt to write to the hard guard page.