Hello NTDebuggers, from time to time we see the following problem. It’s another access violation, and the debug notes below are from a minidump.
Here is what we need to know…
· Generally speaking what happened to cause this AV?
· What method you would use to isolate root cause of the failure?
There are a lot of ways to do this. We look forward to hearing your approach.
We will post our methods and answer at the end of the week. If you need anything please let us know.
Microsoft (R) Windows Debugger Version 6.8.0001.0
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [D:\test123.dmp]
User Mini Dump File: Only registers, stack and portions of memory are available
0:000> k 123
0017f93c 75e4edb5 ntdll!ZwWaitForMultipleObjects+0x15
0017f9d8 75e430c3 kernel32!WaitForMultipleObjectsEx+0x11d
0017f9f4 75ef2084 kernel32!WaitForMultipleObjects+0x18
0017fa60 75ef22b1 kernel32!WerpReportFaultInternal+0x16c
0017fa74 75ebbe60 kernel32!WerpReportFault+0x70
0017fb00 7732d15a kernel32!UnhandledExceptionFilter+0x1c1
0017fb08 773000c4 ntdll!_RtlUserThreadStart+0x6f
0017fb1c 77361d05 ntdll!_EH4_CallFilterFunc+0x12
0017fb44 772eb6d1 ntdll!_except_handler4+0x8e
0017fb68 772eb6a3 ntdll!ExecuteHandler2+0x26
0017fc10 772cee57 ntdll!ExecuteHandler+0x24
0017fc10 10011127 ntdll!KiUserExceptionDispatcher+0xf
*** ERROR: Module load completed but symbols could not be loaded for crash3.exe
WARNING: Frame IP not in any known module. Following frames may be wrong.
0017ff40 0040104a 0x10011127
0017ffa0 75eb19f1 crash3+0x104a
0017ffac 7732d109 kernel32!BaseThreadInitThunk+0xe
0017ffec 00000000 ntdll!_RtlUserThreadStart+0x23
start end module name
00400000 0040d000 crash3 (no symbols)
6c250000 6c288000 odbcint (deferred)
6c290000 6c2f5000 odbc32 (deferred)
72a00000 72a86000 comctl32 (deferred)
74820000 749b4000 comctl32_74820000 (deferred)
75240000 75251000 samlib (deferred)
75260000 75281000 ntmarta (deferred)
754b0000 75510000 secur32 (deferred)
75510000 75570000 imm32 (deferred)
75700000 75790000 gdi32 (deferred)
757a0000 75870000 user32 (deferred)
758a0000 758a6000 nsi (deferred)
758b0000 759f4000 ole32 (deferred)
75a00000 75aaa000 msvcrt (deferred)
75ab0000 75ba0000 rpcrt4 (deferred)
75ba0000 75c1d000 usp10 (deferred)
75c20000 75c75000 shlwapi (deferred)
75d60000 75e27000 msctf (deferred)
75e30000 75f40000 kernel32 (pdb symbols)
76140000 76189000 Wldap32 (deferred)
76190000 7624f000 advapi32 (deferred)
76250000 76d1e000 shell32 (deferred)
76d20000 76d94000 comdlg32 (deferred)
76da0000 76dcd000 ws2_32 (deferred)
77280000 77287000 psapi (deferred)
77290000 77299000 lpk (deferred)
772b0000 77400000 ntdll (pdb symbols)
Good luck and happy debugging.
[Update: our answer. Posted 5/13/2008]
We enjoyed seeing different people’s approaches on this week’s puzzler. This was a simple module unload. We loaded a lib, did a GetProcAddress, freed the lib, and called the function. The dump was a mini dump created via .dump /m C:\dump file. There are various ways this type of scenario may arise. Obviously someone could unload a lib, but why? In most cases I’ve seen, it was due to a ref count problem in a com object. Poor accounting leading to one too many decrements, and the dll will get unloaded causing a simple crash footprint.
There are quite a few ways to track this down. First of all, if you had the debugger attached and got a full dump or /ma dump you would have seen the loaded module list. This would have been a dead giveaway and part of why we did the .dump /m. There are other options you can enable that make tracking of module loads easy under the debugger. I personally like “loader snaps” if I’m trying to track down module load shenanigans. To enable this, just go into the image section of the gflags tool and enable loader snaps for the exe in question. Now attach a debugger and watch the mode load and GetProcAddress details scroll by.
Yet another popular approach is to use process monitor. This tool is not only easy to set up, but it also gives you great logs with call stacks and other details such as registry accesses.
This puzzler provided the bare minumum data required. We did not give you much to go on because sometimes in real debugging scenarios you have to work with a lack of data. I really liked how many people questioned the source of the dump file. It really shows how familiar you all are with the various dump types.
PingBack from http://windows.wawblog.info/?p=4153
Probably missmatched PDB's...
First lets check the executable version:
if this did't help...
after PDB's missmatch were resolved.
(Sorry if there are duplicates - I'm not getting the "Comment Submitted" confirmation message when I submit. Please remove any duplicates.)
It looks to me that crash3 called a function (0x10011127) but the module containing that function has been unloaded. The module does not appear to have been rebased. It would seem to be desirable to determine what module loaded at 0x10000000 in crash3's address space. If one is able to reproduce the problem by executing crash3 and doing live analysis, I can imagine that one may wish to profile the app using Dependency Walker; or use Process Explorer's DLL view (if the problem lends itself to this type of inspection); or use Process Monitor to look at the "Load Image" operations for crash3; or use a debugger to capture the LOAD_DLL_DEBUG_EVENTs. Or, I suppose one could do lma 0x10000000 in the dump. Once the module has been determined, it would seem that one could set a breakpoint on ntdll!LdrUnloadDll and look for 0x10000000 in Param1, and check the stack at the time of the unload.
The EIP points to 0x10011127 which is not in any modules in the process. I assume "0x10011127" is an invalid address which caused the AV.
Based on my experience, there are several instructiosn may change the EIP: ret, call, jmp.
I would first verify if it is an indirect call/jmp instruction that goes to the invalid address 0x10011127, we can verify tihs by using "ub 0040104a".
Then, the most likely possibility is the ret instruction execution transfer which is caused by stack overflow(the saved return address is overwritten and then popped by the ret instruction) or even SEH overwrite. I will determine this scenario by checking if the DWORD "0x10011127" exists on the stack before the ESP by using "dps esp-20".
From my initial response:
> Or, I suppose one could do lma 0x10000000 in the dump. <
It appears this type of minidump does not contain unloaded module information, so lma would not be helpful in this case.
this does not look like a call to an unloaded module since lm output is clear of that suspicion (it would have been listed under unloaded modules so far i can tell)
this is not a stack overflow either since when the return address gets overwritten the debugger has a hard time retrieving the stack trace.
issuing 'ub 0040104a' is a big step towards solving this problem, though. i wonder why this did not get posted in the first place? probably, because it would have solved it already? ;o)
doing the .formats on the address in question does not seem to indicate that it's text.
if it has been that simple, i'd say it's a deliberate call into invalid address, which does not make that much sense.
looking forward to the answer.
Reposting this again since it got eaten mysteriously the first time --
First off, typically with a new dump I would do the following (and, to be honest, it's a bit bothersome that this information isn't available on the challenges that get posted here; have had a couple of times where it would have helped much, IMO):
!analyze -v * Obvious enough
lm * For your challenges, prefer ``lmv'' so can grab the requisite .dlls and disassemble on my own as need be!
dps @esp l100 * Or dps @rsp, especially if we don't have unwind information
dps @ebp l100 * Or dps @rbp, especially if we don't have unwind information
My suspicion would be that this would have been an unloaded dll, although the lack of unloaded module info is strange, since (due to RtlUserThreadStart), we know we're on Vista (it would be nice, BTW, if that sort of thing were provided up front and didn't need to be guesswork - it wouldn't be in a normal debugging scenario! :) ).
Specifically, because 10000000 is the default base for a /dynamicbase:no dll that was not set with a sane base address at link time. Perhaps someone used GetModuleHandle on a dll that wasn't static imported, or there's a race or other problem resulting into a use of a dll function pointer after dll unload. For example, a threading issue where one thread uses the dll after the other unloads it, or any other bug that could lead to this condition.
The reason why there isn't any unloaded module info here is that I think the person who wrote this dump didn't use a WER dump, they just attached windbg while the "Please report a problem" dialog was up and used ".dump c:\users\myuser\test.dmp" (or the like), which doesn't save unloaded module information. Please don't do this and write a real dump file next time!
I think that you have to follow the data structure pointed to by EXCEPTION_POINTERS to eventually find your way back to where the exception happened. I keep forgetting how to do that and need to look it up each time but I just use the Debug Diagnostic Tool now which seems to have that logic baked in.
Only the WER in Vista SP1 can generate the dump files as stated in the link below:
So WER may not be an option if they are not using SP1 :-)
I am curious that if we specify "DumpType=1(Mini dump)" in Vista SP1 WER configuration, will the unloaded modules to saved? What is the default flags used in this scenario? I was surprised to hear that windbg ".dump test.dmp" default option did not include "/u" option. Thank you for pointing this out.
If you want to break when the given module in question unloads, and catch the guilty party, I'd use "sxe ul:modulename" (although this irritatingly has the side effect of turning off the otherwise nice module unload notifications in the debugger).
Although, in the case of a bad refcount problem, this may only show the "last victim" and not the smoking gun if there was >1 outstanding reference when the code that lost more references than it held was run.
On the subject of WER:
I'll have to admit that we actually don't use WER dumps here, much. Decided against them for a couple of reasons, and going off on a lil bit of a tangent here, but...:
- Incredibly stupid beaurocracy red tape in the way of trying to get access to this kind of data via winqual, the last I checked. Seems that everyone complains about this to MS (and probably not the right place to mention it here, but...), and it's not likely to get fixed any time soon, but worth saying that the situation kind of sucks badly from an external, non-multinational corporation developer perspective.
- Prior to Vista, there weren't, as far as I know, any real options for customizing the dump generated.
- Even with the version of WER in Vista, options for customizing the dump generated were still highly limited, unfortunately.
We ended up writing our own crash reporting system with a fair amount more data included in dumps than that WER dumps provide (certainly a tradeoff, but one that I think made sense for us).
The main things that we wanted were an easy, configurable capability to get full dumps if we needed to, as well as inclusion of address space layout information. We also wanted to do things like include specific, "important" heap variables in our dumps without having to includ all heap data (this capability is there in the Vista version of WER, to Microsoft's credit, but that still left us SOL on downlevel platforms. Not much that could be done about that other than write our own code in that case, although perhaps someday, 10 years from now, Vista may be considered the baseline system that people run with and we can rely on the builtin WER capabilities there.)
Another nice lil thing that isn't present in (any) version of WER that's come in handy in our internal crash reporting mechanism a number of times here is to include a small section of memory around each register in the thread that crashed, assuming that register referred to readable address space. For example, include +/- 128 bytes from reach register, if that register's value was a valid pointer. Relatively cheap in terms of dump file size increase, and it's proved useful in a number of cases where locals get optimized to registers and aren't otherwise inferrable by the state of the program when it died.