If broken it is, fix it you should

Using the powers of the debugger to solve the problems of the world - and a bag of chips    by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

.NET Debugging Demos Lab 5: Debugging Challenge! (time to test your knowledge)

.NET Debugging Demos Lab 5: Debugging Challenge! (time to test your knowledge)

Rate This
  • Comments 18

We're about halfway through with the labs so to change it up a little bit I'm going to give you a debugging challenge. 

I will post the step-by-step instructions and review in about a week, but for now I will only post a problem description.

Your mission, should you choose to accept it, is to troubleshoot the problem with only that problem description.  Post intermediate findings, questions, potential solutions etc. in the comments, and don't worry if you go down the wrong path a couple of times (that's just part of the debugging experience:)). Try not to peak at the code unless you have identified a problematic function during the troubleshooting process...

Problem description:

We get lot's of customer reports that randomly our site will display an "Internet Explorer cannot display the webpage" page. 

The page suggests the following as likely causes:

  • You are not connected to the Internet.
  • The website is encountering problems.
  • There might be a typing error in the address.

If they refresh the page the site works just fine but the fact that the page is displayed give the customers very little confidence in our site, especially if the error is displayed when they get ready to order items.  We're loosing millions of dollars because of this issue so it is extremely urgent that we resolve it. March is usually the peak-season for our site so we definitely need the site to work smoothly by then.

We believe we have been able to reproduce the issue in testing because we see this error page when we browse to the company info page and submit feedback.

Any help is greatly appreciated!

 

The challenge is on:)
Tess

Btw, forgot to say, the site that is having the issue is of course the BuggyBits demo site (Information and setup instructions)

Previous demos and setup instructions

If you are new to the debugging labs, here you can find information on how to set up the labs as well as links to the previous labs in the series.

Information and setup instructions
Lab 1: Hang
Lab 1: Hang - review
Lab 2: Crash
Lab 2: Crash - review
Lab 3: Memory
Lab 3: Memory - review
Lab 4: High CPU hang
Lab 4: High CPU hang - review

  • XXXX    1 1598 001a0ae8   8808220 Disabled 01c4f004:01c50fe8 001d01e8     1 Ukn (Threadpool Completion Port) System.StackOverflowException (018700a4) (nested exceptions)

    Looks like exception handler is causing an exception. Sounds oddly familiar to a thing i've seen not-so-long-time ago.

    Digging in deeper.

  • going down the line and !DumpHeap -type Exception we have the following stats:

    Statistics:

         MT    Count    TotalSize Class Name

    79103ca4        1           12 System.Text.DecoderExceptionFallback

    79103c58        1           12 System.Text.EncoderExceptionFallback

    79118794        2           64 System.UnhandledExceptionEventHandler

    790fe17c        1           72 System.ExecutionEngineException

    790fe0e0        1           72 System.StackOverflowException

    790fe044        1           72 System.OutOfMemoryException

    790fdf04        1           72 System.Exception

    790fe284        2          144 System.Threading.ThreadAbortException

    79123c6c     5307       382104 System.UnauthorizedAccessException

    Eventually System.UnauthorizedAccessException is the one that we suspect.

    http://msdn2.microsoft.com/en-us/library/system.unauthorizedaccessexception.aspx tells us more about this exception...we learn the exception code, then we launch adplus to capture this exception when it happens...

    adplus -crash -iis -ce 80070005 -FullOnFirst

    digging in deeper atm....

  • AKA the culprit

    Guess what happened after last command....cdb started spawning dumpfiles like mad, and filled my hard drive with gigabytes of dumps :)

    Eventually after killing it, and opening a random dumpfile...guess what i see... we're trying to log an exception, apparently opening a file, when we get the exception...

    0:001> !clrstack

    OS Thread Id: 0x9f8 (1)

    ESP       EIP    

    00aae5b4 7c812aeb [HelperMethodFrame: 00aae5b4]

    00aae658 7948d980 System.IO.__Error.WinIOError(Int32, System.String)

    00aae684 79395557 System.IO.FileStream.Init(System.String, System.IO.FileMode, System.IO.FileAccess, Int32, Boolean, System.IO.FileShare, Int32, System.IO.FileOptions, SECURITY_ATTRIBUTES, System.String, Boolean)

    00aae77c 793983c8 System.IO.FileStream..ctor(System.String, System.IO.FileMode, System.IO.FileAccess, System.IO.FileShare, Int32, System.IO.FileOptions)

    00aae7a8 793984e3 System.IO.StreamWriter.CreateFile(System.String, Boolean)

    00aae7b8 79398550 System.IO.StreamWriter..ctor(System.String, Boolean, System.Text.Encoding, Int32)

    00aae7d8 7949959c System.IO.StreamWriter..ctor(System.String)

    00aae7e4 0efd10c1 Utility.WriteToLog(System.String, System.String)

    00aae830 0efd1052 ExceptionHandler.LogException(System.Exception)

    00aae83c 0efd117b Utility.WriteToLog(System.String, System.String)

    00aae900 0efd1052 ExceptionHandler.LogException(System.Exception)

    00aae90c 0efd117b Utility.WriteToLog(System.String, System.String)

    00aae9d0 0efd1052 ExceptionHandler.LogException(System.Exception)

    00aae9dc 0efd117b Utility.WriteToLog(System.String, System.String)

    With the !clrstack -a we can see the following:

    00aaf2cc 0efd117b Utility.WriteToLog(System.String, System.String)

       PARAMETERS:

           message = 0x01984770

           fileName = 0x01984840

       LOCALS:

           0x00aaf348 = 0x00000000

           0x00aaf344 = 0x05985190

           0x00aaf360 = 0x00000000

           0x00aaf35c = 0x00000000

    0:001> !dumpobj 0x01984840

    Name: System.String

    MethodTable: 790fd8c4

    EEClass: 790fd824

    Size: 38(0x26) bytes

    (C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)

    String: c:\log.txt

    This is truly great...exception logger apparently tries to log stuff into C:\log.txt where it dies and spawns another exception apparently....look into the code for more :)

    Great post, and great fun :)

  • Once you have the exception address, you can use !PrintException to get a stack trace.

  • Agreed - that simplifies the task greatly, but also steals a little bit of magic from it i guess. I wonder if we can wait for more things like this - this has been exciting :)

  • My troubleshooting goes like this (any comments or corrections are most welcome):

    First i tried to get a dump

    adplus -crahs -pn w3wp.exe

    But the only exception i got was a stack overflow and it was way to late.

    My second chance was

    adplus -crash -pn w3wp.exe -dumponfirst

    Then when i reproduced the error and dumps keep on showing

    I opened the first dump and saw

     13    1  6a4 000d9410   1808220 Enabled  00000000:00000000 000fe610     1 Ukn (Threadpool Worker)

     System.IO.DirectoryNotFoundException (023b6ab0)

     Then

     !do 023b6ab0 and got to see the message in the exception. Some directory  not found.

     But so far no clues

     Then i opened one of the last dumps and

    0:013> !threads

    ThreadCount: 7

    UnstartedThread: 1

    BackgroundThread: 6

    PendingThread: 1

    DeadThread: 0

    Hosted Runtime: no

                                         PreEmptive   GC Alloc           Lock

          ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception

     13    1  6a4 000d9410   1808220 Enabled  00000000:00000000 000fe610     1 Ukn (Threadpool Worker)

     System.IO.DirectoryNotFoundException (023b6ab0) (nested exceptions)

     15    2  c3c 000e51b0      b220 Enabled  00000000:00000000 000d83c8     0 MTA (Finalizer)

     16    3  4ac 000fa8d8    80a220 Enabled  00000000:00000000 000d83c8     0 MTA (Threadpool Completion Port)

     17    4  eb4 000fddb8      1220 Enabled  00000000:00000000 000d83c8     0 Ukn

     18    7  8b4 00152aa0   180b220 Enabled  00000000:00000000 000d83c8     0 MTA (Threadpool Worker)

     19    8  420 001285d8   880b220 Enabled  00000000:00000000 000d83c8     0 MTA (Threadpool Completion Port)

     22    5  8f8 00150a68      1400 Enabled  00000000:00000000 000d83c8     0 Ukn

     Now i can see some nested exceptions.

    !printexception -nested 023b6ab0

    Exception object: 023b6ab0

    Exception type: System.IO.DirectoryNotFoundException

    Message: Could not find a part of the path 'c:\files\log.txt'.

    InnerException: <none>

    StackTrace (generated):

    <none>

    StackTraceString: <none>

    HResult: 80070003

    Nested exception -------------------------------------------------------------

    Exception object: 023b5ad0

    Exception type: System.IO.DirectoryNotFoundException

    Message: Could not find a part of the path 'c:\files\log.txt'.

    InnerException: <none>

    StackTrace (generated):

       SP       IP       Function

       01A0D048 79653A45 mscorlib_ni!System.IO.__Error.WinIOError(Int32, System.String)+0x1ebd31

       01A0D0A8 7936F43B mscorlib_ni!System.IO.FileStream.Init(System.String, System.IO.FileMode, System.IO.FileAccess, Int32, Boolean, System.IO.FileShare, Int32, System.IO.FileOptions, SECURITY_ATTRIBUTES, System.String, Boolean)+0x3e7

       01A0D19C 7936EF30 mscorlib_ni!System.IO.FileStream..ctor(System.String, System.IO.FileMode, System.IO.FileAccess, System.IO.FileShare, Int32, System.IO.FileOptions)+0x74

       01A0D1C8 7936EEAB mscorlib_ni!System.IO.StreamWriter.CreateFile(System.String, Boolean)+0x3b

       01A0D1D8 7936EE54 mscorlib_ni!System.IO.StreamWriter..ctor(System.String, Boolean, System.Text.Encoding, Int32)+0x5c

       01A0D1F8 794752FC mscorlib_ni!System.IO.StreamWriter..ctor(System.String)+0x1c

       01A0D204 020D10C1 App_Code_gukcpxoe!Utility.WriteToLog(System.String, System.String)+0x59

    StackTraceString: <none>

    HResult: 80070003

    (...)

    Nested exception -------------------------------------------------------------

    Exception object: 023a64fc

    Exception type: System.IO.DirectoryNotFoundException

    Message: Could not find a part of the path 'c:\files\log.txt'.

    InnerException: <none>

    StackTrace (generated):

       SP       IP       Function

       01A0F0C8 79653A45 mscorlib_ni!System.IO.__Error.WinIOError(Int32, System.String)+0x1ebd31

       01A0F128 7936F43B mscorlib_ni!System.IO.FileStream.Init(System.String, System.IO.FileMode, System.IO.FileAccess, Int32, Boolean, System.IO.FileShare, Int32, System.IO.FileOptions, SECURITY_ATTRIBUTES, System.String, Boolean)+0x3e7

       01A0F21C 7936EF30 mscorlib_ni!System.IO.FileStream..ctor(System.String, System.IO.FileMode, System.IO.FileAccess, System.IO.FileShare, Int32, System.IO.FileOptions)+0x74

       01A0F248 7936EEAB mscorlib_ni!System.IO.StreamWriter.CreateFile(System.String, Boolean)+0x3b

       01A0F258 7936EE54 mscorlib_ni!System.IO.StreamWriter..ctor(System.String, Boolean, System.Text.Encoding, Int32)+0x5c

       01A0F278 794752FC mscorlib_ni!System.IO.StreamWriter..ctor(System.String)+0x1c

       01A0F284 020D10C1 App_Code_gukcpxoe!Utility.WriteToLog(System.String, System.String)+0x59

    StackTraceString: <none>

    HResult: 80070003

    Nested exception -------------------------------------------------------------

    Exception object: 023a5600

    Exception type: System.Exception

    Message: The email entered is not a valid email address

    InnerException: <none>

    StackTrace (generated):

       SP       IP       Function

       01A0F33C 020D1005 App_Code_gukcpxoe!BuggyMail.IsValidEmailAddress(System.String)+0x65

       01A0F354 020D0F4C App_Code_gukcpxoe!BuggyMail.SendEmail(System.String, System.String)+0x3c

    StackTraceString: <none>

    HResult: 80131500

    I see that my SendEmail is calling a IsValidEmailAddress that throws a DirectoryNotFoundException

    and then i see a loop of exceptions in Utility.WriteToLog. (Still don´t know the line that caused the loop)

    Then i will see what was IP pointing at the time of the exception

     01A0F284 020D10C1 App_Code_gukcpxoe!Utility.WriteToLog(System.String, System.String)+0x59

    0:013> !ip2md 020D10C1

    MethodDesc: 01ce8d58

    Method Name: Utility.WriteToLog(System.String, System.String)

    Class: 02085b20

    MethodTable: 01ce8d60

    mdToken: 06000018

    Module: 01ce6754

    IsJitted: yes

    m_CodeOrIL: 020d1068

    !dumpmt 01ce8d60

    EEClass: 02085b20

    Module: 01ce6754

    Name: Utility

    mdToken: 0200000a  (C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\Temporary ASP.NET Files\buggybits\0b50c844\fb864981\App_Code.gukcpxoe.dll)

    BaseSize: 0xc

    ComponentSize: 0x0

    Number of IFaces in IFaceMap: 0

    Slots in VTable: 6

    lmv mApp_Code_gukcpxoe

    start    end        module name

    02060000 0206a000   App_Code_gukcpxoe   (deferred)            

       Image path: C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\Temporary ASP.NET Files\buggybits\0b50c844\fb864981\App_Code.gukcpxoe.dll

       Image name: App_Code.gukcpxoe.dll

       Using CLR debugging support for all symbols

       Has CLR image header, track-debug-data flag not set

       Timestamp:        Sat Mar 01 11:17:44 2008 (47C93B58)

       CheckSum:         00000000

       ImageSize:        0000A000

       File version:     0.0.0.0

       Product version:  0.0.0.0

       File flags:       0 (Mask 3F)

       File OS:          4 Unknown Win32

       File type:        2.0 Dll

       File date:        00000000.00000000

       Translations:     0000.04b0

       InternalName:     App_Code.gukcpxoe.dll

       OriginalFilename: App_Code.gukcpxoe.dll

       ProductVersion:   0.0.0.0

       FileVersion:      0.0.0.0

       FileDescription:  

       LegalCopyright:    

    Then let me save the module that starts at

    !savemodule 02060000  c:\file.dll

    3 sections in file

    section 0 - VA=2000, VASize=2654, FileAddr=200, FileSize=2800

    section 1 - VA=6000, VASize=2c8, FileAddr=2a00, FileSize=400

    section 2 - VA=8000, VASize=c, FileAddr=2e00, FileSize=200

    After i open the module i get to see the code using Reflector

    public static void WriteToLog(string message, string fileName)

    {

       try

       {

           using (StreamWriter writer = new StreamWriter(fileName))

           {

               writer.WriteLine("--------------------------");

               writer.WriteLine(DateTime.Now.ToLongTimeString());

               writer.WriteLine("-------------------");

               writer.WriteLine(message);

           }

       }

       catch (Exception exception)

       {

           ExceptionHandler.LogException(exception);

       }

    }

    It looks like we are in a loop.

    Just to check the line that actually caused the exception

    !u 020D10C1

    Normal JIT generated code

    Utility.WriteToLog(System.String, System.String)

    Begin 020d1068, size 126

    (...)

    020d10b0 e8efcfdb77      call    mscorwks!JIT_NewCrossContext (79e8e0a4)

    020d10b5 8bf0            mov     esi,eax

    020d10b7 8b55c4          mov     edx,dword ptr [ebp-3Ch]

    020d10ba 8bce            mov     ecx,esi

    020d10bc e8af3e5c77      call    System.IO.StreamWriter..ctor(System.String) (79694f70)

    >>> 020d10c1 8975c0          mov     dword ptr [ebp-40h],esi

    (...)

    So prior to the exception we were initiallizing the StreamWriter.

    From this i can see where the exception took place, why (in previous exception messages)

    and following the code i see recursion (infinite) and thats why i got in my first

    dump stack overflow. (that is also in the event viewer )

    This actually happens because we are in framework 2.0 and because the

    exception occurs outside CLR the process goes down. Am i correct? If

    i was in 1.1 no shutdown would have occurred right ?

  • Nice to see the great troubleshooting.  Yes, the stackoverflow caused by the infinite recursion is causing the crash so nice work there...

    Bruno, the issue would have caused a stackoverflow and a crash in 1.1. as well... you can't recover from a stackoverflow, I think what you might be thinking of is how unhandled exceptions are handled differently in 1.1. and 2.0.

    I also have one question to both Bruno and Andrew... you both create full dumps on firstchance exceptions which seems pretty logical but did you notice how much impact this had in regards to CPU usage, time for shutdown and disk space of course?  Given the amount of 1st chance exceptions this must have yielded over 1000 dumps...  in a real-world scenario this would probably not have worked out so well...  

    could you have gotten the info you got without getting all these 1st chance exception dumps?

  • Hi,

    yes i was mixing the way how unhandled exceptions are handled.

    I could have just get a dump like

    adplus -crash -pn w3wp.exe

    And then even if i don´t have the managed threads i can get the stack on all exceptions using

    .foreach (ex {!dumpheap -type Exception -short}){!pe ${ex}}

    and after looking at the pattern and i have all the information to do the rest of the troubleshooting the same way i guess.

    I have one question: this dump i get i only have on thread and it´s not managed. Since the bug causes w3wp to end isn´t it possible to get a breakpoint in Kernel32!ExitProcess and in this way i would get more information about the threads ? Because i tried this and couldn´t get no dump.

  • If you look at the log file you will see that another exception (the stackoverflow exception) occurred right before the crash...  creating an adplus config file to get a dump on this exception would give you a dump at the right place

  • If i do the dump with adplus -crash -pn w3wp i can see the Overflow Exception, and using the foreach will get me to the right place.

    0:000> .loadby sos mscorwks

    0:000> !threads

    ThreadCount: 6

    UnstartedThread: 0

    BackgroundThread: 6

    PendingThread: 0

    DeadThread: 0

    Hosted Runtime: no

                                         PreEmptive   GC Alloc           Lock

          ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception

    XXXX    1  7ec 000d9560   1808220 Disabled 00000000:00000000 000fe6c8     1 Ukn (Threadpool Worker) System.StackOverflowException (0230106c) (nested exceptions)

    XXXX    2  870 000e52f8      b220 Enabled  00000000:00000000 000c2b68     0 Ukn (Finalizer)

    XXXX    3  224 000fa760    80a220 Enabled  00000000:00000000 000c2b68     0 Ukn (Threadpool Completion Port)

    XXXX    4  9a0 000fde70      1220 Enabled  00000000:00000000 000c2b68     0 Ukn

    XXXX    5  b78 0014a450   880a220 Enabled  00000000:00000000 000c2b68     0 Ukn (Threadpool Completion Port)

    XXXX    6  e3c 0014a920   880b220 Enabled  00000000:00000000 000c2b68     0 Ukn (Threadpool Completion Port)

    I tried to create the config file like you said

    <ADPlus>

    <Settings>

    <RunMode>CRASH</RunMode>

    <Option>Quiet</Option>

    </Settings>

    <Exceptions>

    <Option>

    NoDumpOnFirstChance

    </Option>

    <Config>

    <Code>

    clr

    </Code>

    <Actions1>

    Void

    </Actions1>

    <CustomActions1>

    .loadby sos mscorwks;

    !stoponexception System.StackOverflowException 3;

    .if @@(@$t3==1){.dump /ma /u C:\\SOE.dmp}

    </CustomActions1>

    <ReturnAction1>

    GN

    </ReturnAction1>

    </Config>

    </Exceptions>

    </ADPlus>

    but i see that after reproducing the bug, my VM gets really slow while it´s trying to get the dump, but then i don´t get any dump. Am i missing something in my conf file ?

  • Was trying to dig into more these crashes, but apparently since i installed vista with SP1 in it, i'm not getting anything with SOS.dll ...i get to

    0:000> !clrstack

    PDB symbol for mscorwks.dll not loaded

    Failed to load data access DLL, 0x80004005

    process monitor says this:

    39179 5:37:21.3735252 PM windbg.exe 4924 QueryOpen C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscordacwks.dll FAST IO DISALLOWED

    Apparently i need some more debugging of OS before i can get back to debugging of the buggy bits website. :)

    Regarding first chance exceptions, and the disk space, i presume that -FullOnFirstOver would fix the size of the created files issue (personally i just killed cdb in task manager when i saw several files were created).

    Regarding other ways to capture the bug, capturing at the moment of stack overflow .....yeah.....well..we're just learning, aren't we.

    P.S. do you have some good tips on how to read the logfile created by cdb?

  • absolutely right, its part of learning:) was just challenging you a little bit:)  

    Sounds by the message you are getting on vista that the symbols are simply not lined up right...

    I'll post the lab and walkthrough shortly, it'll have some info on going through the log file so I'll answer that one shortly

  • Bruno,  try looking at the cfg file and dumping on the native exception that occurred right before the crash rather than the managed stackoverflow

  • After some help :) ok i cheated about finding the native exception here it goes:

    it the log of my first dump i have

    Tue Mar  4 19:32:40.204 2008 (GMT+0): (fa4.c6c): Unknown exception - code e053534f (first chance)

    ---

    --- 1st chance UnknownException exception ----

    ---------------------------------------------------------------

    Then it was necessary to set a config file (or debugdiag) to set a breakpoint in this exception e053534f. Then i got the dump at the right time.

    Then looking at kb i see mscorwks!ReportStackOverflow in top so it´s easier from now on in this situation to do

    Bm mscorwks!ReportStackOverflow

    Thanks

    Thanks

  • Just to complete my previous comment here it is the config i created.

    <ADPlus>

    <Settings>

    <RunMode>CRASH</RunMode>

    <Option>Quiet</Option>

    </Settings>

    <Exceptions>

    <NewException>

              <Code> 0xe053534f</Code>

              <Name> My_Custom_Exception</Name>

          </NewException>

    <Config>

    <Code> 0xe053534f</Code>

              <Name> My_Custom_Exception </Name>

    <Actions1>

    FullDump

    </Actions1>

    </Config>

    </Exceptions>

    </ADPlus>

Page 1 of 2 (18 items) 12
Leave a Comment
  • Please add 2 and 4 and type the answer here:
  • Post