Here’s another interesting crash problem I worked on a few days ago: the application pool was randomly crashing and the following errors were logged:
Event Type: Error Event Source: .NET Runtime 2.0 Error Reporting Event Category: None Event ID: 1000 Date: 21/08/2009 Time: 12:23:12 User: N/A Computer: <computername> Description: Faulting application w3wp.exe, version 6.0.3790.3959, stamp 45d691cc, faulting module kernel32.dll, version 5.2.3790.4062, stamp 462643a7, debug? 0, fault address 0x0000000000027d8d.
Event Type: Warning Event Source: W3SVC Event Category: None Event ID: 1011 Date: 21/08/2009 Time: 12:23:14 User: N/A Computer: <computername> Description: A process serving application pool <AppPoolName> suffered a fatal communication error with the World Wide Web Publishing Service. The process id was xxx. The data field contains the error number.
Even if with some difficulties (we had to disable Error Reporting as described here) we managed to capture a coupe of dumps for this crash; dumping the exception was not very helpful:
0:027> !gleLastErrorValue: (Win32) 0x3f0 (1008) - An attempt was made to reference a token that does not exist.LastStatusValue: (NTSTATUS) 0xc000007c - An attempt was made to reference a token that doesn't exist. This is typically done by referencing the token associated with a thread when the thread is not impersonating a client.0:027> !printexceptionThere is no current managed exception on this thread
0:027> !gleLastErrorValue: (Win32) 0x3f0 (1008) - An attempt was made to reference a token that does not exist.LastStatusValue: (NTSTATUS) 0xc000007c - An attempt was made to reference a token that doesn't exist. This is typically done by referencing the token associated with a thread when the thread is not impersonating a client.
0:027> !printexceptionThere is no current managed exception on this thread
But dumping the very long stack gave some good hint at least about a potential problem (a stack overflow):
0:027> kpLChild-SP RetAddr Call Site00000000`05fcae90 00000642`7f895333 kernel32!RaiseException+0x7300000000`05fcaf60 00000642`78390730 mscorwks!`string'+0x81ef300000000`05fcb130 00000642`780a9522 mscorlib_ni!System.OrdinalComparer.Equals(<HRESULT 0x80004001>)+0x7000000000`05fcb170 00000642`7809822a mscorlib_ni!System.Collections.Hashtable.KeyEquals(<HRESULT 0x80004001>)+0x4200000000`05fcb1a0 00000642`748f8976 mscorlib_ni!System.Collections.Hashtable.get_Item(<HRESULT 0x80004001>)+0x15a00000000`05fcb260 00000642`748ff9bc System_ni!System.Collections.Specialized.NameObjectCollectionBase.FindEntry(<HRESULT 0x80004001>)+0x1600000000`05fcb290 00000642`80580f48 System_ni!System.Collections.Specialized.NameValueCollection.Get(<HRESULT 0x80004001>)+0xc00000000`05fcb2c0 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x4800000000`05fcb300 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb340 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb380 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb3c0 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb400 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb440 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb480 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb4c0 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb500 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb540 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb580 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb5c0 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb600 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb640 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb680 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb6c0 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb700 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb740 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb780 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb7c0 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb800 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb840 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb880 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d00000000`05fcb8c0 00000642`8058108d MyTest!MyClass.Web.Pages.BasePage.get_Language(<HRESULT 0x80004001>)+0x18d[...]
Using the err.exe utility to lookup the exit code of the process when it crashed, we can confirm we had a stack overflow:
Event Type: WarningEvent Source: W3SVCEvent Category: NoneEvent ID: 1009Date: 21/08/2009Time: 13:44:11User: N/AComputer: <computername>Description: A process serving application pool <AppPoolName> terminated unexpectedly. The process id was '5392'. The process exit code was '0x800703e9'.C:\>err 0x800703e9# for hex 0x800703e9 / decimal -2147023895 COR_E_STACKOVERFLOW corerror.h# Is raised by the EE when the execution stack overflows as# it is attempting to ex# as an HRESULT: Severity: FAILURE (1), FACILITY_WIN32 (0x7), Code 0x3e9# for hex 0x3e9 / decimal 1001 ERROR_STACK_OVERFLOW winerror.h# Recursion too deep; the stack overflowed.# 2 matches found for "0x800703e9"
Event Type: WarningEvent Source: W3SVCEvent Category: NoneEvent ID: 1009Date: 21/08/2009Time: 13:44:11User: N/AComputer: <computername>Description: A process serving application pool <AppPoolName> terminated unexpectedly. The process id was '5392'. The process exit code was '0x800703e9'.
C:\>err 0x800703e9# for hex 0x800703e9 / decimal -2147023895 COR_E_STACKOVERFLOW corerror.h# Is raised by the EE when the execution stack overflows as# it is attempting to ex# as an HRESULT: Severity: FAILURE (1), FACILITY_WIN32 (0x7), Code 0x3e9# for hex 0x3e9 / decimal 1001 ERROR_STACK_OVERFLOW winerror.h# Recursion too deep; the stack overflowed.# 2 matches found for "0x800703e9"
In the IIS log the request that lead to the crash was like this:
GET
/MyApp/test.aspx?currentPreferences=2&ParentId=1&ClientDateTime=Wed%20Aug%2019%202009%2009:09:57%20GMT+0100%20(GMT%20Standard%20Time)&TimeZone=-60&Language=undefined&Currency=undefined&OrgUnitId=undefined
Basically some user was using a robot to scrape the web site and the Language parameter (along with a few others) was not correctly set; in this case the customer’s code was not properly validating those parameters (using a browser they are always valid) and this cause the endless recursion that ultimately exhausted the stack.
Needless to say, improving the parameter validation logic resolved the problem
Carlo
Err.exe cool! I got a new power tools...