Hello NTDebuggers, in the spirit of Click and Clack (The Tappet brothers), a favorite troubleshooting show of mine, we thought it would be fun to offer up some Debug puzzlers for our readers.
That said, this week’s Debug Puzzler is in regard to Dr. Watson. I’m sure most of you have seen Dr. Watson errors. This typically means your application has crashed due to an unhandled exception. Sometimes, however, the process just seems to disappear. The Just-in-Time (JIT) debugging options configured via the AeDebug key do not catch the crash… Does anyone know why this may happen?
We will post readers’ comments as they respond during the week, and next Monday will post our answer and recognize some of the best answers we received from our readers.
Good luck and happy debugging!
- Jeff Dailey
[Update: our answer, posted 4/11/2008]
Hello NTDebuggers. Let me start off by saying that we were very impressed by our readers’ answers. Our two favorite answers were submitted by Skywing and molotov.
When a thread starts, the ntdll Run Time Library (RTL) for the process inserts an exception handler before it calls the BaseThreadInit code to hand control over to the executable or DLL running in the process (notepad in the example below). If anything goes wrong with the chain of exception handlers, the process can’t make it back to the RTL exception handler and the process will simply terminate. See http://www.microsoft.com/msj/0197/Exception/Exception.aspx for details.
000ef7ac 75fbf837 ntdll!KiFastSystemCallRet
000ef7b0 75fbf86a USER32!NtUserGetMessage+0xc
000ef7cc 00b21418 USER32!GetMessageW+0x33
000ef80c 00b2195d notepad!WinMain+0xec
000ef89c 76e24911 notepad!_initterm_e+0x1a1
000ef8a8 7704e4b6 kernel32!BaseThreadInitThunk+0xe
000ef8e8 7704e489 ntdll!__RtlUserThreadStart+0x23 << Exception Handler is inserted here.
000ef900 00000000 ntdll!_RtlUserThreadStart+0x1b
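To make the handler chain idea concrete, here is a toy model in Python. It is only a sketch and not how ntdll is implemented: the Frame class, the dispatch function and the handler names are all hypothetical. It shows that the outermost handler is only reached if every link in the chain is still intact.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Frame:
    """Hypothetical stand-in for one exception registration record."""
    name: str
    handles: Callable[[str], bool]   # returns True if this handler takes the exception
    next: Optional["Frame"]          # link to the next (older) handler in the chain


def dispatch(head: Optional[Frame], exception: str) -> Optional[str]:
    """Walk the chain from the newest handler toward the outermost one.

    Returns the name of the handler that took the exception, or None if the
    walk ran out of links before any handler accepted it.
    """
    frame = head
    while frame is not None:
        if frame.handles(exception):
            return frame.name
        frame = frame.next
    return None


# Outermost handler: models the one inserted at thread start. In this toy it
# accepts everything, which is where the JIT debugger would be launched.
top = Frame("rtl_top_level", lambda exc: True, None)

# An application handler that only deals with one expected error.
app = Frame("app_handler", lambda exc: exc == "expected_error", top)

# Intact chain: the unexpected exception falls through to the outermost handler.
intact = dispatch(app, "access_violation")

# Simulate corruption by destroying the link to the outermost handler.
app.next = None
broken = dispatch(app, "access_violation")

print(intact, broken)
```

In the intact case the unexpected exception reaches the outermost handler. In the broken case nobody handles it, which in the model corresponds to the process disappearing without Dr. Watson ever being started.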
Secondly, the process that crashes is actually responsible for starting the debugger via the RTL exception handler. The debugger is registered under the AeDebug registry key. Even if you are able to unwind to the RTL exception handler, you may still run into trouble. If the computer is low on system resources such as desktop heap, you may not be able to create a new process and thus will not be able to launch the debugger. As Skywing stated, creating a process is a relatively heavyweight operation. Applications may also call TerminateProcess from within their own code based on an error condition. If we have a customer who sees this symptom on a regular basis, we typically recommend attaching a debugger to monitor the process. Simply run ADPLUS -crash -p (PROCESSID).
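If you want to check what is actually registered under AeDebug, a small script can dump the values. The sketch below is a minimal example in Python using the standard winreg module. It assumes a 64-bit Python on 64-bit Windows, where 32-bit processes use a separate copy of the key under Wow6432Node, and it only looks at the Debugger and Auto values. The helper names read_aedebug and report are made up for this example.

```python
import sys

# AeDebug locations under HKEY_LOCAL_MACHINE. The second one is the view
# used by 32-bit processes on 64-bit Windows.
AEDEBUG_SUBKEYS = [
    r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug",
    r"SOFTWARE\Wow6432Node\Microsoft\Windows NT\CurrentVersion\AeDebug",
]


def read_aedebug(subkey):
    """Return the Debugger and Auto values for one AeDebug key.

    Returns None if the key cannot be opened, or if this is not Windows.
    """
    if sys.platform != "win32":
        return None
    import winreg  # only available on Windows
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
            values = {}
            for name in ("Debugger", "Auto"):
                try:
                    values[name] = winreg.QueryValueEx(key, name)[0]
                except FileNotFoundError:
                    values[name] = None  # value not set under this key
            return values
    except OSError:
        return None


def report():
    """Map each AeDebug subkey to its values (or None if unreadable)."""
    return {subkey: read_aedebug(subkey) for subkey in AEDEBUG_SUBKEYS}


if __name__ == "__main__":
    for subkey, values in report().items():
        print(subkey, "->", values)
```

If the two keys name different debuggers, or one of them is missing, 32-bit and 64-bit processes will not get the same JIT debugger.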
Good work folks! We’ll have another puzzler ready next Monday.
Good Luck and happy debugging!
My guess: a regular process termination by some obscure component in your application, or alternatively regular terminate process from a global unhandled exception filter/handler.
bp ntdll!zwterminateprocess usually gets me a tiny little bit closer to the problem.
A thread exhausted its stack because stack expansion was disabled by dereferencing an invalid pointer in an exception handler.
Anything sufficiently broken/corrupted in the critical path for JIT debugging, which is a lot of things (the process heap is a big one), can cause secondary failures in the JIT path (CreateProcessW is a fairly heavyweight operation). These secondary failures typically lead to the process getting hard killed instead of being able to create the JIT debugger process successfully.
Monitoring from an external process instead of doing things from the point of an AV in a program is much better as it avoids exposing the fault handling code to whatever corruption claimed the process and caused the AV in the first place.
This is probably not the answer but annoyed the heck out of me for a bit. On my Vista x64 machine, the JIT (windbg) was not launching properly because it was not configured correctly in both the Wow6432Node and the standard registry hive. So for 32-bit processes I would get the Visual Studio JIT and for 64-bit I would get WinDbg like I expected. Manually updating the registration in both locations fixed it.
In the absence of that, I would vote for Skywing's answer.
A stack overflow seems one case where Dr. Watson may not kick in.
In testing, unhandled C++ exceptions ("This application has requested the Runtime to terminate it in an unusual way.") may give one the impression that the good doctor is doing some work (you do get a visual/audio notification, if configured), but in fact no dump is generated and only the header is written to drwtsn32.log. The results are similar when terminate() is called due to an exception being thrown from a destructor during stack unwinding. In these cases, it would seem to be an intentional behavior of Dr. Watson, for whatever reason.
Apologies for the addition...
> it would seem to be an intentional behavior of Dr. Watson, for whatever reason. <
"whatever reason" could include, as seems to be the case here, that the failing process that invoked Dr. Watson has exited by the time Dr. Watson is ready to go.
We had exactly this issue.
In our case, it turned out to be a third-party library calling abort() after encountering RTL heap memory corruption.
I think there are other cases:
- The VC RTL heap might pop up a dialog box, and the user might decide to terminate the application.
- The service failed to start because of loader issues (a missing DLL, a missing API, or even a DllMain exception).
Well, I'm kind of new to NT development and I'm more into .NET development, but I faced a problem like that. What was happening was that some exception was raised and caught, and the catching code silently closed the app (which is a legal exit of an application).
Vista just improves this limitation (problem) and moves the "unhandled exception handling" code into kernel mode by using the WER service. We finally will not get silent process exit now. I provided a reply in the link below:
To put a finer point on the stack overflow responses, this will specifically happen if you attempt to write to the hard guard page.
I know this is an old thread, but I hope my comment will not go unnoticed here. I couldn't find any better way to get in contact with you than posting a comment on one of the posts (you guys don't have a visible group email address or contact form on the blog).
So this is the best matching topic I could find in the search results (although it does not fit 100%) for my question regarding Silent Process Exit support in Windows 7+.
On a corporate Win7 laptop I have a small utility running in the system tray that gets killed from time to time. I wanted to find out who is killing it. I used GFLAGS.exe from the Debug tools and marked my utility's executable name for Silent Process Exit monitoring (I am a member of the local admins group on the machine). I checked the registry keys, and the GFLAGS settings seem to be registered correctly. But I don't see any feedback when (for testing purposes only) I kill this exe from Task Manager. This is the issue on the one system I use primarily. However, on another Windows 7 machine (from a completely different domain, different GPOs etc.) I see the feedback immediately. I set up GFLAGS exactly the same way on both machines, so the only difference I can think of is that on the primary machine the IT dept have configured something (via GPOs maybe?) that blocks the Silent Process Exit feature. I was unable to find requirements on the MSDN site ("Monitoring Silent Process Exit" page) that I should check for this issue.
Are you guys aware of any extra setting / GPO / permission requirements that are needed for that feature to work? Or maybe McAfee, which runs only on my primary laptop, blocks this thing?
[If the process is exiting due to an exception you can try the steps given here, "adplus -crash -p (PROCESSID)". Alternatively, there is a good article on Silent Process Exit at the AskPerf site, http://blogs.technet.com/b/askperf/archive/2013/05/01/what-killed-my-process.aspx .]
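[When comparing two machines it can help to decode the registry values that GFLAGS writes rather than eyeballing them. The Python sketch below is only an illustration. The flag and key names are taken from the "Monitoring Silent Process Exit" documentation as we recall it, so treat them as assumptions and verify them against that page. The helper functions and the image name mytool.exe are hypothetical.

```python
# Bit in the per-image GlobalFlag value that turns on silent exit monitoring
# (assumed value; check the documentation).
FLG_MONITOR_SILENT_PROCESS_EXIT = 0x200

# Bits of the per-image ReportingMode value (assumed names and values).
REPORTING_MODES = {
    0x1: "LAUNCH_MONITORPROCESS",
    0x2: "LOCAL_DUMP",
    0x4: "NOTIFICATION",
}

# Parent keys under HKEY_LOCAL_MACHINE (assumed locations).
IFEO = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options"
SPE = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\SilentProcessExit"


def monitoring_enabled(global_flag):
    """True if the silent process exit bit is set in a GlobalFlag value."""
    return bool(global_flag & FLG_MONITOR_SILENT_PROCESS_EXIT)


def decode_reporting_mode(mode):
    """List the reporting actions selected by a ReportingMode value."""
    return [name for bit, name in sorted(REPORTING_MODES.items()) if mode & bit]


def keys_for(image):
    """Return the two per-image subkeys to compare between machines."""
    return (IFEO + "\\" + image, SPE + "\\" + image)


if __name__ == "__main__":
    for key in keys_for("mytool.exe"):
        print(key)
    print(monitoring_enabled(0x200), decode_reporting_mode(0x6))
```

If both machines decode to the same settings, the difference is more likely outside these keys, for example in policy or security software.]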
Thanks to whoever added the hint to my previous comment. I will try it to see whether it is a solution or not. Unfortunately the process exit happens at unexpected times, so I may need to run the debug process for a long time.
By the way, if you check the ASKPERF blog post comments, I came here actually from there. So that's an infinite loop then, the circle closed :(