@TessFerrandez
We're about halfway through with the labs so to change it up a little bit I'm going to give you a debugging challenge.
I will post the step-by-step instructions and review in about a week, but for now I will only post a problem description.
Your mission, should you choose to accept it, is to troubleshoot the problem with only that problem description. Post intermediate findings, questions, potential solutions etc. in the comments, and don't worry if you go down the wrong path a couple of times (that's just part of the debugging experience:)). Try not to peak at the code unless you have identified a problematic function during the troubleshooting process...
Problem description:
We get lot's of customer reports that randomly our site will display an "Internet Explorer cannot display the webpage" page.
The page suggests the following as likely causes:
If they refresh the page the site works just fine but the fact that the page is displayed give the customers very little confidence in our site, especially if the error is displayed when they get ready to order items. We're loosing millions of dollars because of this issue so it is extremely urgent that we resolve it. March is usually the peak-season for our site so we definitely need the site to work smoothly by then.
We believe we have been able to reproduce the issue in testing because we see this error page when we browse to the company info page and submit feedback.
Any help is greatly appreciated!
The challenge is on:)Tess
Btw, forgot to say, the site that is having the issue is of course the BuggyBits demo site (Information and setup instructions)
Previous demos and setup instructions
If you are new to the debugging labs, here you can find information on how to set up the labs as well as links to the previous labs in the series.
Information and setup instructionsLab 1: HangLab 1: Hang - reviewLab 2: CrashLab 2: Crash - reviewLab 3: MemoryLab 3: Memory - reviewLab 4: High CPU hangLab 4: High CPU hang - review
XXXX 1 1598 001a0ae8 8808220 Disabled 01c4f004:01c50fe8 001d01e8 1 Ukn (Threadpool Completion Port) System.StackOverflowException (018700a4) (nested exceptions)
Looks like exception handler is causing an exception. Sounds oddly familiar to a thing i've seen not-so-long-time ago.
Digging in deeper.
going down the line and !DumpHeap -type Exception we have the following stats:
Statistics:
MT Count TotalSize Class Name
79103ca4 1 12 System.Text.DecoderExceptionFallback
79103c58 1 12 System.Text.EncoderExceptionFallback
79118794 2 64 System.UnhandledExceptionEventHandler
790fe17c 1 72 System.ExecutionEngineException
790fe0e0 1 72 System.StackOverflowException
790fe044 1 72 System.OutOfMemoryException
790fdf04 1 72 System.Exception
790fe284 2 144 System.Threading.ThreadAbortException
79123c6c 5307 382104 System.UnauthorizedAccessException
Eventually System.UnauthorizedAccessException is the one that we suspect.
http://msdn2.microsoft.com/en-us/library/system.unauthorizedaccessexception.aspx tells us more about this exception...we learn the exception code, then we launch adplus to capture this exception when it happens...
adplus -crash -iis -ce 80070005 -FullOnFirst
digging in deeper atm....
AKA the culprit
Guess what happened after last command....cdb started spawning dumpfiles like mad, and filled my hard drive with gigabytes of dumps :)
Eventually after killing it, and opening a random dumpfile...guess what i see... we're trying to log an exception, apparently opening a file, when we get the exception...
0:001> !clrstack
OS Thread Id: 0x9f8 (1)
ESP EIP
00aae5b4 7c812aeb [HelperMethodFrame: 00aae5b4]
00aae658 7948d980 System.IO.__Error.WinIOError(Int32, System.String)
00aae684 79395557 System.IO.FileStream.Init(System.String, System.IO.FileMode, System.IO.FileAccess, Int32, Boolean, System.IO.FileShare, Int32, System.IO.FileOptions, SECURITY_ATTRIBUTES, System.String, Boolean)
00aae77c 793983c8 System.IO.FileStream..ctor(System.String, System.IO.FileMode, System.IO.FileAccess, System.IO.FileShare, Int32, System.IO.FileOptions)
00aae7a8 793984e3 System.IO.StreamWriter.CreateFile(System.String, Boolean)
00aae7b8 79398550 System.IO.StreamWriter..ctor(System.String, Boolean, System.Text.Encoding, Int32)
00aae7d8 7949959c System.IO.StreamWriter..ctor(System.String)
00aae7e4 0efd10c1 Utility.WriteToLog(System.String, System.String)
00aae830 0efd1052 ExceptionHandler.LogException(System.Exception)
00aae83c 0efd117b Utility.WriteToLog(System.String, System.String)
00aae900 0efd1052 ExceptionHandler.LogException(System.Exception)
00aae90c 0efd117b Utility.WriteToLog(System.String, System.String)
00aae9d0 0efd1052 ExceptionHandler.LogException(System.Exception)
00aae9dc 0efd117b Utility.WriteToLog(System.String, System.String)
With the !clrstack -a we can see the following:
00aaf2cc 0efd117b Utility.WriteToLog(System.String, System.String)
PARAMETERS:
message = 0x01984770
fileName = 0x01984840
LOCALS:
0x00aaf348 = 0x00000000
0x00aaf344 = 0x05985190
0x00aaf360 = 0x00000000
0x00aaf35c = 0x00000000
0:001> !dumpobj 0x01984840
Name: System.String
MethodTable: 790fd8c4
EEClass: 790fd824
Size: 38(0x26) bytes
(C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: c:\log.txt
This is truly great...exception logger apparently tries to log stuff into C:\log.txt where it dies and spawns another exception apparently....look into the code for more :)
Great post, and great fun :)
Once you have the exception address, you can use !PrintException to get a stack trace.
Agreed - that simplifies the task greatly, but also steals a little bit of magic from it i guess. I wonder if we can wait for more things like this - this has been exciting :)
My troubleshooting goes like this (any comments or corrections are most welcome):
First i tried to get a dump
adplus -crahs -pn w3wp.exe
But the only exception i got was a stack overflow and it was way to late.
My second chance was
adplus -crash -pn w3wp.exe -dumponfirst
Then when i reproduced the error and dumps keep on showing
I opened the first dump and saw
13 1 6a4 000d9410 1808220 Enabled 00000000:00000000 000fe610 1 Ukn (Threadpool Worker)
System.IO.DirectoryNotFoundException (023b6ab0)
Then
!do 023b6ab0 and got to see the message in the exception. Some directory not found.
But so far no clues
Then i opened one of the last dumps and
0:013> !threads
ThreadCount: 7
UnstartedThread: 1
BackgroundThread: 6
PendingThread: 1
DeadThread: 0
Hosted Runtime: no
PreEmptive GC Alloc Lock
ID OSID ThreadOBJ State GC Context Domain Count APT Exception
System.IO.DirectoryNotFoundException (023b6ab0) (nested exceptions)
15 2 c3c 000e51b0 b220 Enabled 00000000:00000000 000d83c8 0 MTA (Finalizer)
16 3 4ac 000fa8d8 80a220 Enabled 00000000:00000000 000d83c8 0 MTA (Threadpool Completion Port)
17 4 eb4 000fddb8 1220 Enabled 00000000:00000000 000d83c8 0 Ukn
18 7 8b4 00152aa0 180b220 Enabled 00000000:00000000 000d83c8 0 MTA (Threadpool Worker)
19 8 420 001285d8 880b220 Enabled 00000000:00000000 000d83c8 0 MTA (Threadpool Completion Port)
22 5 8f8 00150a68 1400 Enabled 00000000:00000000 000d83c8 0 Ukn
Now i can see some nested exceptions.
!printexception -nested 023b6ab0
Exception object: 023b6ab0
Exception type: System.IO.DirectoryNotFoundException
Message: Could not find a part of the path 'c:\files\log.txt'.
InnerException: <none>
StackTrace (generated):
<none>
StackTraceString: <none>
HResult: 80070003
Nested exception -------------------------------------------------------------
Exception object: 023b5ad0
SP IP Function
01A0D048 79653A45 mscorlib_ni!System.IO.__Error.WinIOError(Int32, System.String)+0x1ebd31
01A0D0A8 7936F43B mscorlib_ni!System.IO.FileStream.Init(System.String, System.IO.FileMode, System.IO.FileAccess, Int32, Boolean, System.IO.FileShare, Int32, System.IO.FileOptions, SECURITY_ATTRIBUTES, System.String, Boolean)+0x3e7
01A0D19C 7936EF30 mscorlib_ni!System.IO.FileStream..ctor(System.String, System.IO.FileMode, System.IO.FileAccess, System.IO.FileShare, Int32, System.IO.FileOptions)+0x74
01A0D1C8 7936EEAB mscorlib_ni!System.IO.StreamWriter.CreateFile(System.String, Boolean)+0x3b
01A0D1D8 7936EE54 mscorlib_ni!System.IO.StreamWriter..ctor(System.String, Boolean, System.Text.Encoding, Int32)+0x5c
01A0D1F8 794752FC mscorlib_ni!System.IO.StreamWriter..ctor(System.String)+0x1c
01A0D204 020D10C1 App_Code_gukcpxoe!Utility.WriteToLog(System.String, System.String)+0x59
(...)
Exception object: 023a64fc
01A0F0C8 79653A45 mscorlib_ni!System.IO.__Error.WinIOError(Int32, System.String)+0x1ebd31
01A0F128 7936F43B mscorlib_ni!System.IO.FileStream.Init(System.String, System.IO.FileMode, System.IO.FileAccess, Int32, Boolean, System.IO.FileShare, Int32, System.IO.FileOptions, SECURITY_ATTRIBUTES, System.String, Boolean)+0x3e7
01A0F21C 7936EF30 mscorlib_ni!System.IO.FileStream..ctor(System.String, System.IO.FileMode, System.IO.FileAccess, System.IO.FileShare, Int32, System.IO.FileOptions)+0x74
01A0F248 7936EEAB mscorlib_ni!System.IO.StreamWriter.CreateFile(System.String, Boolean)+0x3b
01A0F258 7936EE54 mscorlib_ni!System.IO.StreamWriter..ctor(System.String, Boolean, System.Text.Encoding, Int32)+0x5c
01A0F278 794752FC mscorlib_ni!System.IO.StreamWriter..ctor(System.String)+0x1c
01A0F284 020D10C1 App_Code_gukcpxoe!Utility.WriteToLog(System.String, System.String)+0x59
Exception object: 023a5600
Exception type: System.Exception
Message: The email entered is not a valid email address
01A0F33C 020D1005 App_Code_gukcpxoe!BuggyMail.IsValidEmailAddress(System.String)+0x65
01A0F354 020D0F4C App_Code_gukcpxoe!BuggyMail.SendEmail(System.String, System.String)+0x3c
HResult: 80131500
I see that my SendEmail is calling a IsValidEmailAddress that throws a DirectoryNotFoundException
and then i see a loop of exceptions in Utility.WriteToLog. (Still don´t know the line that caused the loop)
Then i will see what was IP pointing at the time of the exception
0:013> !ip2md 020D10C1
MethodDesc: 01ce8d58
Method Name: Utility.WriteToLog(System.String, System.String)
Class: 02085b20
MethodTable: 01ce8d60
mdToken: 06000018
Module: 01ce6754
IsJitted: yes
m_CodeOrIL: 020d1068
!dumpmt 01ce8d60
EEClass: 02085b20
Name: Utility
mdToken: 0200000a (C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\Temporary ASP.NET Files\buggybits\0b50c844\fb864981\App_Code.gukcpxoe.dll)
BaseSize: 0xc
ComponentSize: 0x0
Number of IFaces in IFaceMap: 0
Slots in VTable: 6
lmv mApp_Code_gukcpxoe
start end module name
02060000 0206a000 App_Code_gukcpxoe (deferred)
Image path: C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\Temporary ASP.NET Files\buggybits\0b50c844\fb864981\App_Code.gukcpxoe.dll
Image name: App_Code.gukcpxoe.dll
Using CLR debugging support for all symbols
Has CLR image header, track-debug-data flag not set
Timestamp: Sat Mar 01 11:17:44 2008 (47C93B58)
CheckSum: 00000000
ImageSize: 0000A000
File version: 0.0.0.0
Product version: 0.0.0.0
File flags: 0 (Mask 3F)
File OS: 4 Unknown Win32
File type: 2.0 Dll
File date: 00000000.00000000
Translations: 0000.04b0
InternalName: App_Code.gukcpxoe.dll
OriginalFilename: App_Code.gukcpxoe.dll
ProductVersion: 0.0.0.0
FileVersion: 0.0.0.0
FileDescription:
LegalCopyright:
Then let me save the module that starts at
!savemodule 02060000 c:\file.dll
3 sections in file
section 0 - VA=2000, VASize=2654, FileAddr=200, FileSize=2800
section 1 - VA=6000, VASize=2c8, FileAddr=2a00, FileSize=400
section 2 - VA=8000, VASize=c, FileAddr=2e00, FileSize=200
After i open the module i get to see the code using Reflector
public static void WriteToLog(string message, string fileName)
{
try
using (StreamWriter writer = new StreamWriter(fileName))
writer.WriteLine("--------------------------");
writer.WriteLine(DateTime.Now.ToLongTimeString());
writer.WriteLine("-------------------");
writer.WriteLine(message);
}
catch (Exception exception)
ExceptionHandler.LogException(exception);
It looks like we are in a loop.
Just to check the line that actually caused the exception
!u 020D10C1
Normal JIT generated code
Utility.WriteToLog(System.String, System.String)
Begin 020d1068, size 126
020d10b0 e8efcfdb77 call mscorwks!JIT_NewCrossContext (79e8e0a4)
020d10b5 8bf0 mov esi,eax
020d10b7 8b55c4 mov edx,dword ptr [ebp-3Ch]
020d10ba 8bce mov ecx,esi
020d10bc e8af3e5c77 call System.IO.StreamWriter..ctor(System.String) (79694f70)
>>> 020d10c1 8975c0 mov dword ptr [ebp-40h],esi
So prior to the exception we were initiallizing the StreamWriter.
From this i can see where the exception took place, why (in previous exception messages)
and following the code i see recursion (infinite) and thats why i got in my first
dump stack overflow. (that is also in the event viewer )
This actually happens because we are in framework 2.0 and because the
exception occurs outside CLR the process goes down. Am i correct? If
i was in 1.1 no shutdown would have occurred right ?
Nice to see the great troubleshooting. Yes, the stackoverflow caused by the infinite recursion is causing the crash so nice work there...
Bruno, the issue would have caused a stackoverflow and a crash in 1.1. as well... you can't recover from a stackoverflow, I think what you might be thinking of is how unhandled exceptions are handled differently in 1.1. and 2.0.
I also have one question to both Bruno and Andrew... you both create full dumps on firstchance exceptions which seems pretty logical but did you notice how much impact this had in regards to CPU usage, time for shutdown and disk space of course? Given the amount of 1st chance exceptions this must have yielded over 1000 dumps... in a real-world scenario this would probably not have worked out so well...
could you have gotten the info you got without getting all these 1st chance exception dumps?
Hi,
yes i was mixing the way how unhandled exceptions are handled.
I could have just get a dump like
adplus -crash -pn w3wp.exe
And then even if i don´t have the managed threads i can get the stack on all exceptions using
.foreach (ex {!dumpheap -type Exception -short}){!pe ${ex}}
and after looking at the pattern and i have all the information to do the rest of the troubleshooting the same way i guess.
I have one question: this dump i get i only have on thread and it´s not managed. Since the bug causes w3wp to end isn´t it possible to get a breakpoint in Kernel32!ExitProcess and in this way i would get more information about the threads ? Because i tried this and couldn´t get no dump.
If you look at the log file you will see that another exception (the stackoverflow exception) occurred right before the crash... creating an adplus config file to get a dump on this exception would give you a dump at the right place
If i do the dump with adplus -crash -pn w3wp i can see the Overflow Exception, and using the foreach will get me to the right place.
0:000> .loadby sos mscorwks
0:000> !threads
ThreadCount: 6
UnstartedThread: 0
PendingThread: 0
XXXX 1 7ec 000d9560 1808220 Disabled 00000000:00000000 000fe6c8 1 Ukn (Threadpool Worker) System.StackOverflowException (0230106c) (nested exceptions)
XXXX 2 870 000e52f8 b220 Enabled 00000000:00000000 000c2b68 0 Ukn (Finalizer)
XXXX 3 224 000fa760 80a220 Enabled 00000000:00000000 000c2b68 0 Ukn (Threadpool Completion Port)
XXXX 4 9a0 000fde70 1220 Enabled 00000000:00000000 000c2b68 0 Ukn
XXXX 5 b78 0014a450 880a220 Enabled 00000000:00000000 000c2b68 0 Ukn (Threadpool Completion Port)
XXXX 6 e3c 0014a920 880b220 Enabled 00000000:00000000 000c2b68 0 Ukn (Threadpool Completion Port)
I tried to create the config file like you said
<ADPlus>
<Settings>
<RunMode>CRASH</RunMode>
<Option>Quiet</Option>
</Settings>
<Exceptions>
<Option>
NoDumpOnFirstChance
</Option>
<Config>
<Code>
clr
</Code>
<Actions1>
Void
</Actions1>
<CustomActions1>
.loadby sos mscorwks;
!stoponexception System.StackOverflowException 3;
.if @@(@$t3==1){.dump /ma /u C:\\SOE.dmp}
</CustomActions1>
<ReturnAction1>
GN
</ReturnAction1>
</Config>
</Exceptions>
</ADPlus>
but i see that after reproducing the bug, my VM gets really slow while it´s trying to get the dump, but then i don´t get any dump. Am i missing something in my conf file ?
Was trying to dig into more these crashes, but apparently since i installed vista with SP1 in it, i'm not getting anything with SOS.dll ...i get to
0:000> !clrstack
PDB symbol for mscorwks.dll not loaded
Failed to load data access DLL, 0x80004005
process monitor says this:
39179 5:37:21.3735252 PM windbg.exe 4924 QueryOpen C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscordacwks.dll FAST IO DISALLOWED
Apparently i need some more debugging of OS before i can get back to debugging of the buggy bits website. :)
Regarding first chance exceptions, and the disk space, i presume that -FullOnFirstOver would fix the size of the created files issue (personally i just killed cdb in task manager when i saw several files were created).
Regarding other ways to capture the bug, capturing at the moment of stack overflow .....yeah.....well..we're just learning, aren't we.
P.S. do you have some good tips on how to read the logfile created by cdb?
absolutely right, its part of learning:) was just challenging you a little bit:)
Sounds by the message you are getting on vista that the symbols are simply not lined up right...
I'll post the lab and walkthrough shortly, it'll have some info on going through the log file so I'll answer that one shortly
Bruno, try looking at the cfg file and dumping on the native exception that occurred right before the crash rather than the managed stackoverflow
After some help :) ok i cheated about finding the native exception here it goes:
it the log of my first dump i have
Tue Mar 4 19:32:40.204 2008 (GMT+0): (fa4.c6c): Unknown exception - code e053534f (first chance)
---
--- 1st chance UnknownException exception ----
---------------------------------------------------------------
Then it was necessary to set a config file (or debugdiag) to set a breakpoint in this exception e053534f. Then i got the dump at the right time.
Then looking at kb i see mscorwks!ReportStackOverflow in top so it´s easier from now on in this situation to do
Bm mscorwks!ReportStackOverflow
Thanks
Just to complete my previous comment here it is the config i created.
<NewException>
<Code> 0xe053534f</Code>
<Name> My_Custom_Exception</Name>
</NewException>
<Name> My_Custom_Exception </Name>
FullDump