@TessFerrandez
This is an issue I come across once in a while and since I am not a COM expert by any means I actually had to check in with our COM guys to see what was going on in this case, and as it turned out (this is embarrassing:)) I had already written about the root cause of this issue before (here). So what I am learning from this is that perhaps I should at least check my blog before asking questions:)
Problem description
Intermittently the application will stall, so all requests will hang until they run an IIS Reset. At this point the CPU usage for the process is very low.
Gathering memory dumps
A memory dump was gathered using Debug Diagnostics 1.1, by right clicking on the w3wp.exe process and choosing "Create Full Userdump" while the application was hanging.
Debugging the issue
I opened up the memory dump in windbg and ran the basic commands to set up the environment
.symfix c:\websymbols (to set up the symbol path) .reload (to reload the symbols) .loadby sos mscorwks (to load up the sos dll in order to view .net objects and stacks)
As always with hangs, the most interesting thing to look at is what the threads are doing, so the 3 standard commands I run are
~* kb 2000 (to see the native stacks) ~* e !clrstack (to see the .net stacks) !threads (to get information about the managed/.net threads)
When I browse through the native call stacks I noticed that plenty of them were sitting in this type of callstack
If you have followed the blog you will probably know that the fact that we are sitting in mscorwks!AwareLock::Enter means that we are waiting for a .net lock. i.e. on a line like the bolded line below, or alternatively in a Monitor.Enter()
lock(obj){ // do something }
If we switch to this thread (71) and run !clrstack on we get the .net stack so that we know where we are waiting for this lock/monitor
0:071> !clrstack OS Thread Id: 0x9740 (71) ESP EIP 10f2f38c 7c82ed54 [GCFrame: 10f2f38c] 10f2f4c4 7c82ed54 [HelperMethodFrame: 10f2f4c4] System.Threading.Monitor.Enter(System.Object) 10f2f518 0be7cbbb MyFramework.User.VerifyUser() 10f2f548 0be7c7d2 MyFramework.Http.HttpModules.HttpAuthenticationModule.Authenticate(System.Object, System.EventArgs) 10f2f654 6866b5d5 System.Web.HttpApplication+SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() 10f2f66c 6865a071 System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef) 10f2f6ac 6865a39b System.Web.HttpApplication.ResumeSteps(System.Exception) 10f2f6f4 6865912d System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object) 10f2f710 6865e0bd System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest) 10f2f744 6865dd72 System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest) 10f2f750 6865c447 System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32) 10f2f900 79f1ef33 [ContextTransitionFrame: 10f2f900] 10f2f950 79f1ef33 [GCFrame: 10f2f950] 10f2faa8 79f1ef33 [ComMethodFrame: 10f2faa8]
and in this case it is in the MyFramework.User.VerifyUser() method that we are waiting for the lock.
Another interesting point about this stack is that we can see that this request hasn't started executing yet, it is still in the Authenticate module which explains why the requests don't simply time out.
The next step is to figure out who is holding this lock to determine why we are stuck here, and we find the owner by running !syncblk
We have two different syncblocks (locks) in this process and it is a bit hard to guess which one is which. The first one locking on object 1d107964 is held by thread 41 and it has 3 monitors which means that it has one waiter. The way you calculate how many waiters a monitor has is by taking (MonitorHeld-1)/2. This is because the owner will take one monitor, and each waiter takes two monitors.
The second lock (1d107334) is owned by thread 40 and has 30 waiters ((61-1)/2).
Oh, btw, when I say owner, I mean the thread that has entered into the lock statement and is now doing something while holding the lock so that thread is essentially keeping the "waiters" waiting.
If we look at our native stack we had
10f2f510 0be7cbbb 1d107334 00000001 00000000 mscorwks!JITutil_MonContention+0xa3
The 1st parameter to JITUtil_MonContention (the 3rd column here) identifies the object we are waiting on, so with this information we now know that our thread (thread 71) is waiting for the lock held by thread 40.
Next we check what this thread is doing and we find that this thread is also waiting for a lock, but this time in ConfigManager.get_Instance(), and lower down in the call chain we can see our VerifyUser method which is why we are holding the lock that thread 71 is waiting for.
0:040> kb ChildEBP RetAddr Args to Child 0c82ed6c 7c822114 77e6711b 00000001 0c82edbc ntdll!KiFastSystemCallRet 0c82ed70 77e6711b 00000001 0c82edbc 00000001 ntdll!NtWaitForMultipleObjects+0xc 0c82ee18 79f8ead4 00000001 04b0e4f0 00000000 kernel32!WaitForMultipleObjectsEx+0x11a 0c82ee80 79f17522 00000001 04b0e4f0 00000000 mscorwks!WaitForMultipleObjectsEx_SO_TOLERANT+0x6f 0c82eea0 79f17493 00000001 04b0e4f0 00000000 mscorwks!Thread::DoAppropriateAptStateWait+0x3c 0c82ef24 79f1732f 00000001 04b0e4f0 00000000 mscorwks!Thread::DoAppropriateWaitWorker+0x144 0c82ef74 79f8ea4d 00000001 04b0e4f0 00000000 mscorwks!Thread::DoAppropriateWait+0x40 0c82efd0 79e77f50 ffffffff 00000001 00000000 mscorwks!CLREvent::WaitEx+0xf7 0c82efe0 7a0fd9c3 ffffffff 00000001 00000000 mscorwks!CLREvent::Wait+0x17 0c82f06c 7a0fdbbf 04c987c0 ffffffff 04b0e4dc mscorwks!AwareLock::EnterEpilog+0x94 0c82f088 7a0fdd2a 9f8962ea 04c987c0 04c987c0 mscorwks!AwareLock::Enter+0x61 0c82f0ec 7a094352 ffffffff 9f89638a 00000000 mscorwks!AwareLock::Contention+0x16c 0c82f18c 0be7d4b6 1d107964 00000000 00000000 mscorwks!JITutil_MonContention+0xa3 WARNING: Frame IP not in any known module. Following frames may be wrong. 0c82f1d0 79e74ebd 00000000 00000000 191240e4 0xbe7d4b6 0c82f1d8 00000000 191240e4 00000000 00000000 mscorwks!HelperMethodFrame::LazyInit+0x17
0:040> !clrstack OS Thread Id: 0x9da0 (40) ESP EIP 0c82f008 7c82ed54 [GCFrame: 0c82f008] 0c82f140 7c82ed54 [HelperMethodFrame: 0c82f140] System.Threading.Monitor.Enter(System.Object) 0c82f194 0be7d4b6 MyFramework.Configuration.ConfigManager.get_Instance() 0c82f2e4 0be7cd8a MyFramework.User..ctor() 0c82f398 0be7cbd1 MyFramework.User.VerifyUser() 0c82f3c8 0be7c7d2 MyFramework.Http.HttpModules.HttpAuthenticationModule.Authenticate(System.Object, System.EventArgs) 0c82f4d4 6866b5d5 System.Web.HttpApplication+SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() 0c82f4ec 6865a071 System.Web.HttpApplication.ExecuteStep(IExecutionStep, Boolean ByRef) 0c82f52c 6865a39b System.Web.HttpApplication.ResumeSteps(System.Exception) 0c82f574 6865912d System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(System.Web.HttpContext, System.AsyncCallback, System.Object) 0c82f590 6865e0bd System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest) 0c82f5c4 6865dd72 System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest) 0c82f5d0 6865c447 System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32) 0c82f780 79f1ef33 [ContextTransitionFrame: 0c82f780] 0c82f7d0 79f1ef33 [GCFrame: 0c82f7d0] 0c82f928 79f1ef33 [ComMethodFrame: 0c82f928]
We go through the same drill again, finding the owner in the !syncbkl output and this time our first parameter to JITUtil_MonContention is 1d107964 which was the one held by thread 41.
Thread 41 is a bit different than the other threads, we can see from the .net callstack that we are doing Marshal.ReleaseComObject from the OnConfigRemoveCallback method, and going into the native callstack we can see that this in turn has to make a transition to an STA thread to clean up (GetToSTA). The reason for this is because the component we are trying to release is an STA component.
Preferably we shouldn't be using STA components in ASP.NET because of the need for transitions between threads and potential waits which of course cause performance degradations. If this would have been an MTA component we could have simply released it on the same thread.
However, still it should not be sitting like this, waiting indefinitely. The appropriate STA thread should handle the request and this thread should return, releasing the lock and thus releasing the other locks as well.
We can grab the 1st parameter to GetToSTA to figure out what thread it is trying to get to
0a25ead0 77789905 049d75c0 0ca91600 00000000 ole32!GetToSTA+0x6f
If we dump it out with dc, the 2nd and 3rd dword will tell us what process it is trying to get to as well as the thread id (the reason we have to go in a little roundabout fashion to find these is because public symbols don't have type information, otherwise we could have dumped out the parameter with dt or looked at it in windbg, but now you will just have to rely on the fact that it is dword 2 and 3 (at least in this version of the dll)
The PID (process ID) here is 91d0 and the thread id is 9a04
Just to verify that we are going to a thread in this process we can run "|" to get basic process information to compare agains
And then we can dump out the threads using !threads to identify the thread with the OS thread ID 9a04
Looking at the 1st thread in the list (Thread 30), we can see that the OSID is 9a04 so this is our guy.
30 1 9a04 049a6c28 3808220 Enabled 00000000:00000000 04b0f008 1 STA (Threadpool Worker)
There are a few other things that are interesting about it as well.
1. It is an STA thread (as it should be since we are doing GetToSTA to get here) and 2. It is a ThreadPool Worker thread, meaning that it is a regular .net thread, so not really a thread that would be pumping messages like an STA thread should
If we would look at the stack for it we would see that this thread is actually also waiting for the lock held by thread 40 so we are stuck in a deadlock where
thread 30 is waiting for a lock held by thread 40 thread 40 is waiting for a lock held by thread 41 thread 41 needs to get to thread 30 in order to release an STA component
What about the solution?
The problem we need to solve here is that we need to make sure that thread 41 is not trying to get to a managed thread (or another thread not pumping messages) to release the STA component.
As it turns out, the reason that thread 30 is marked as an STA thread (it should really be MTA) is because the first thread created by the threadpool is created without specifying the appartment model. If this thread happens to handle a request where it uses an STA component this thread will then be marked as STA for the rest of it's life, but unfortunately, as opposed to proper STA threads, it will not be pumping messages so you may block while trying to get to it.
To resolve the issue, install the hotfix discussed in this article http://support.microsoft.com/default.aspx?scid=kb;EN-US;928569
Laters,
Tess
ASP.NET Disabling a User Interface Element During a Partial Page Postback [Via: 4 Guys from Rolla ]...
Debug ASP.NET Case Study: Deadlock waiting in GetToSTA SQLServer SQL Server Scale Out SQL Code Layout
DebugASP.NETCaseStudy:DeadlockwaitinginGetToSTASQLServerSQLServerScaleOutSQLCodeL...
Tess,
You mentioned that the 2nd & 3rd DWord in GetToSTA represent Process ID and thread ID and the first parameter to JITutil_MonContention represents the lock we are waiting on ... etc.
Where do we find documentation about what each of these parameters mean? I can't find this on MSDN or technet. Without this information it would be impossible to debug such problems.
Thanks
you wouldnt know unless you have private symbols. You can also use the sieextpub.dll and the method !comcalls to see what thread it is trying to go to.
Hello Tess,
I am working for a large government org. In Holland.
They host some websites.
I installed them with separate worker processes for the different applications.
Sometimes out of the blue a applicationpool hangs (crashes?) with Event ID 1010
I installed IIS Diagnostics tool from microsoft.
This tool produces DUMP files.
Can those be opened with WinDbg?
Do you have any experience with this?
Thanks in advance.
Mike
Mike, yes you can open them in windbg and debug them there...
Hi Tess,
Thanks for the great articles. It helps our issue, somewhat. Currently our customer is having issue with high memory and CPU usage. We opened a Microsoft case and the support guy told us that the root cause of our issue was not having Marshal.ReleaseComObject to discard COM object.
Per Microsoft, the way we can be sure if our codes have been fixed completely is to get dump file and analyze if there is "GetToSTA" line on Finalizer Thread (by executing kbL 100).
However, we generated several dump files (using DebugDiag) from the same app pool in 5/10 minutes interval and the results were inconsistent.
w3wp.exe__DefaultAppPool__PID__3996__Date__07_14_2009__Time_02_14_24PM__770__Manual Dump.dmp] no STA 1,587,339
w3wp.exe__DefaultAppPool__PID__3996__Date__07_14_2009__Time_02_10_20PM__798__Manual Dump.dmp] STA 1,579,070
w3wp.exe__DefaultAppPool__PID__3996__Date__07_14_2009__Time_02_07_52PM__355__Manual Dump.dmp] STA 1,598,120
w3wp.exe__DefaultAppPool__PID__3996__Date__07_14_2009__Time_02_05_35PM__570__Manual Dump.dmp] no STA 1,491,832
My questions:
- what is the best practice to generate a dump file?
- where should I get a "complete" (or decent) symbols? microsoft links? "http://msdl.microsoft.com/download/symbols"? I could not execute "!clrstack" command. any suggestions?