NTDebugging Puzzler 0x00000005 (Better late than never)

NTDebugging Puzzler 0x00000005 (Better late than never)

  • Comments 10

Hello NTDebuggers, from time to time we see the following problem.  It’s another access violation, and the debug notes below are from a minidump.   

 

Here is what we need to know…

 

·         Generally speaking what happened to cause this AV?

·         What method you would use to isolate root cause of the failure?

 

There are a lot of ways to do this.  We look forward to hearing your approach.

 

We will post our methods and answer at the end of the week.   If you need anything please let us know.

 

-------------------------------------------

 

Microsoft (R) Windows Debugger  Version 6.8.0001.0

Copyright (c) Microsoft Corporation. All rights reserved.

 

 

Loading Dump File [D:\test123.dmp]

User Mini Dump File: Only registers, stack and portions of memory are available

 

0:000> k 123

ChildEBP RetAddr 

0017f93c 75e4edb5 ntdll!ZwWaitForMultipleObjects+0x15

0017f9d8 75e430c3 kernel32!WaitForMultipleObjectsEx+0x11d

0017f9f4 75ef2084 kernel32!WaitForMultipleObjects+0x18

0017fa60 75ef22b1 kernel32!WerpReportFaultInternal+0x16c

0017fa74 75ebbe60 kernel32!WerpReportFault+0x70

0017fb00 7732d15a kernel32!UnhandledExceptionFilter+0x1c1

0017fb08 773000c4 ntdll!_RtlUserThreadStart+0x6f

0017fb1c 77361d05 ntdll!_EH4_CallFilterFunc+0x12

0017fb44 772eb6d1 ntdll!_except_handler4+0x8e

0017fb68 772eb6a3 ntdll!ExecuteHandler2+0x26

0017fc10 772cee57 ntdll!ExecuteHandler+0x24

0017fc10 10011127 ntdll!KiUserExceptionDispatcher+0xf

*** ERROR: Module load completed but symbols could not be loaded for crash3.exe

WARNING: Frame IP not in any known module. Following frames may be wrong.

0017ff40 0040104a 0x10011127

0017ffa0 75eb19f1 crash3+0x104a

0017ffac 7732d109 kernel32!BaseThreadInitThunk+0xe

0017ffec 00000000 ntdll!_RtlUserThreadStart+0x23

 

0:000> lm

start    end        module name

00400000 0040d000   crash3     (no symbols)          

6c250000 6c288000   odbcint    (deferred)            

6c290000 6c2f5000   odbc32     (deferred)            

72a00000 72a86000   comctl32   (deferred)            

74820000 749b4000   comctl32_74820000   (deferred)            

75240000 75251000   samlib     (deferred)            

75260000 75281000   ntmarta    (deferred)            

754b0000 75510000   secur32    (deferred)            

75510000 75570000   imm32      (deferred)            

75700000 75790000   gdi32      (deferred)            

757a0000 75870000   user32     (deferred)            

758a0000 758a6000   nsi        (deferred)            

758b0000 759f4000   ole32      (deferred)            

75a00000 75aaa000   msvcrt     (deferred)            

75ab0000 75ba0000   rpcrt4     (deferred)            

75ba0000 75c1d000   usp10      (deferred)            

75c20000 75c75000   shlwapi    (deferred)            

75d60000 75e27000   msctf      (deferred)            

75e30000 75f40000   kernel32   (pdb symbols)  

76140000 76189000   Wldap32    (deferred)            

76190000 7624f000   advapi32   (deferred)            

76250000 76d1e000   shell32    (deferred)             

76d20000 76d94000   comdlg32   (deferred)            

76da0000 76dcd000   ws2_32     (deferred)            

77280000 77287000   psapi      (deferred)            

77290000 77299000   lpk        (deferred)            

772b0000 77400000   ntdll      (pdb symbols)     

 

Good luck and happy debugging.

 

Jeff-

 

 


[Update: our answer. Posted 5/13/2008]

 

We enjoyed seeing different people’s approaches on this week’s puzzler.   This was a simple module unload.  We loaded a lib, did a GetProcAddress, freed the lib, and called the function.  The dump was a mini dump created via .dump /m C:\dump file.   There are various ways this type of scenario may arise.   Obviously someone could unload a lib, but why?    In most cases I’ve seen, it was due to a ref count problem in a com object.  Poor accounting leading to one too many decrements, and the dll will get unloaded causing a simple crash footprint.

 

There are quite a few ways to track this down.  First of all, if you had the debugger attached and got a full dump or /ma dump you would have seen the loaded module list.  This would have been a dead giveaway and part of why we did the .dump /m.  There are other options you can enable that make tracking of module loads easy under the debugger.  I personally like “loader snaps” if I’m trying to track down module load shenanigans.   To enable this, just go into the image section of the gflags tool and enable loader snaps for the exe in question.  Now attach a debugger and watch the mode load and GetProcAddress details scroll by.

 

Yet another popular approach is to use process monitor.  This tool is not only easy to set up, but it also gives you great logs with call stacks and other details such as registry accesses.  

 

This puzzler provided the bare minumum data required.  We did not give you much to go on because sometimes in real debugging scenarios you have to work with a lack of data.   I really liked how many people questioned the source of the dump file.  It really shows how familiar you all are with the various dump types.

 

Great work!

 





Leave a Comment
  • Please add 2 and 4 and type the answer here:
  • Post
  • PingBack from http://windows.wawblog.info/?p=4153

  • No symbols?

    Probably missmatched PDB's...

    First lets check the executable version:

    >lmv crash3

    if this did't help...

    >!sym noisy

    >.reload

    after PDB's missmatch were resolved.

    >.ecxr

    [You are right, symbols aren't loaded, but they aren't needed to debug the issue.]
  • (Sorry if there are duplicates - I'm not getting the "Comment Submitted" confirmation message when I submit. Please remove any duplicates.)

    It looks to me that crash3 called a function (0x10011127) but the module containing that function has been unloaded. The module does not appear to have been rebased.  It would seem to be desirable to determine what module loaded at 0x10000000 in crash3's address space.  If one is able to reproduce the problem by executing crash3 and doing live analysis, I can imagine that one may wish to profile the app using Dependency Walker; or use Process Explorer's DLL view (if the problem lends itself to this type of inspection); or use Process Monitor to look at the "Load Image" operations for crash3; or use a debugger to capture the LOAD_DLL_DEBUG_EVENTs.  Or, I suppose one could do lma 0x10000000 in the dump.  Once the module has been determined, it would seem that one could set a breakpoint on ntdll!LdrUnloadDll and look for 0x10000000 in Param1, and check the stack at the time of the unload.

  • The EIP points to 0x10011127 which is not in any modules in the process. I assume "0x10011127" is an invalid address which caused the AV.

    Based on my experience, there are several instructiosn may change the EIP: ret, call, jmp.

    I would first verify if it is an indirect call/jmp instruction that goes to the invalid address 0x10011127, we can verify tihs by using "ub 0040104a".

    Then, the most likely possibility is the ret instruction execution transfer which is caused by stack overflow(the saved return address is overwritten and then popped by the ret instruction) or even SEH overwrite. I will determine this scenario by checking if the DWORD "0x10011127" exists on the stack before the ESP by using "dps esp-20".

    Jeffrey Tan

  • From my initial response:

    > Or, I suppose one could do lma 0x10000000 in the dump. <

    It appears this type of minidump does not contain unloaded module information, so lma would not be helpful in this case.

  • this does not look like a call to an unloaded module since lm output is clear of that suspicion (it would have been listed under unloaded modules so far i can tell)

    this is not a stack overflow either since when the return address gets overwritten the debugger has a hard time retrieving the stack trace.

    issuing 'ub 0040104a' is a big step towards  solving this problem, though. i wonder why this did not get posted in the first place? probably, because it would have solved it already? ;o)

    doing the .formats on the address in question does not seem to indicate that it's text.

    if it has been that simple, i'd say it's a deliberate call into invalid address, which does not make that much sense.

    looking forward to the answer.

  • Reposting this again since it got eaten mysteriously the first time --

    First off, typically with a new dump I would do the following (and, to be honest, it's a bit bothersome that this information isn't available on the challenges that get posted here; have had a couple of times where it would have helped much, IMO):

    !analyze -v   * Obvious enough

    .ecxr

    lm            * For your challenges, prefer ``lmv'' so can grab the requisite .dlls and disassemble on my own as need be!

    !uniqstack -b

    dps @esp l100      * Or dps @rsp, especially if we don't have unwind information

    dps @ebp l100      * Or dps @rbp, especially if we don't have unwind information

    My suspicion would be that this would have been an unloaded dll, although the lack of unloaded module info is strange, since (due to RtlUserThreadStart), we know we're on Vista (it would be nice, BTW, if that sort of thing were provided up front and didn't need to be guesswork - it wouldn't be in a normal debugging scenario! :) ).

    Specifically, because 10000000 is the default base for a /dynamicbase:no dll that was not set with a sane base address at link time.  Perhaps someone used GetModuleHandle on a dll that wasn't static imported, or there's a race or other problem resulting into a use of a dll function pointer after dll unload.  For example, a threading issue where one thread uses the dll after the other unloads it, or any other bug that could lead to this condition.

    The reason why there isn't any unloaded module info here is that I think the person who wrote this dump didn't use a WER dump, they just attached windbg while the "Please report a problem" dialog was up and used ".dump c:\users\myuser\test.dmp" (or the like), which doesn't save unloaded module information.  Please don't do this and write a real dump file next time!

  • I think that you have to follow the data structure pointed to by EXCEPTION_POINTERS to eventually find your way back to where the exception happened. I keep forgetting how to do that and need to look it up each time but I just use the Debug Diagnostic Tool now which seems to have that logic baked in.

    http://www.microsoft.com/downloads/details.aspx?FamilyID=28bd5941-c458-46f1-b24d-f60151d875a3&displaylang=en

  • Hi Skywing,

    Only the WER in Vista SP1 can generate the dump files as stated in the link below:

    http://msdn.microsoft.com/en-us/library/bb787181(VS.85).aspx

    So WER may not be an option if they are not using SP1 :-)

    I am curious that if we specify "DumpType=1(Mini dump)" in Vista SP1 WER configuration, will the unloaded modules to saved? What is the default flags used in this scenario? I was surprised to hear that windbg ".dump test.dmp" default option did not include "/u" option. Thank you for pointing this out.

    Jeffrey Tan

  • If you want to break when the given module in question unloads, and catch the guilty party, I'd use "sxe ul:modulename" (although this irritatingly has the side effect of turning off the otherwise nice module unload notifications in the debugger).

    Although, in the case of a bad refcount problem, this may only show the "last victim" and not the smoking gun if there was >1 outstanding reference when the code that lost more references than it held was run.

    On the subject of WER:

    I'll have to admit that we actually don't use WER dumps here, much.  Decided against them for a couple of reasons, and going off on a lil bit of a tangent here, but...:

    - Incredibly stupid beaurocracy red tape in the way of trying to get access to this kind of data via winqual, the last I checked.  Seems that everyone complains about this to MS (and probably not the right place to mention it here, but...), and it's not likely to get fixed any time soon, but worth saying that the situation kind of sucks badly from an external, non-multinational corporation developer perspective.

    - Prior to Vista, there weren't, as far as I know, any real options for customizing the dump generated.

    - Even with the version of WER in Vista, options for customizing the dump generated were still highly limited, unfortunately.

    We ended up writing our own crash reporting system with a fair amount more data included in dumps than that WER dumps provide (certainly a tradeoff, but one that I think made sense for us).

    The main things that we wanted were an easy, configurable capability to get full dumps if we needed to, as well as inclusion of address space layout information.  We also wanted to do things like include specific, "important" heap variables in our dumps without having to includ all heap data (this capability is there in the Vista version of WER, to Microsoft's credit, but that still left us SOL on downlevel platforms.  Not much that could be done about that other than write our own code in that case, although perhaps someday, 10 years from now, Vista may be considered the baseline system that people run with and we can rely on the builtin WER capabilities there.)

    Another nice lil thing that isn't present in (any) version of WER that's come in handy in our internal crash reporting mechanism a number of times here is to include a small section of memory around each register in the thread that crashed, assuming that register referred to readable address space.  For example, include +/- 128 bytes from reach register, if that register's value was a valid pointer.  Relatively cheap in terms of dump file size increase, and it's proved useful in a number of cases where locals get optimized to registers and aren't otherwise inferrable by the state of the program when it died.

Page 1 of 1 (10 items)