If broken it is, fix it you should

Using the powers of the debugger to solve the problems of the world - and a bag of chips    by Tess Ferrandez, ASP.NET Escalation Engineer (Microsoft)

ASP.NET Performance Case Study: Web Service calls taking forever

ASP.NET Performance Case Study: Web Service calls taking forever

Rate This
  • Comments 30

This issue has been pretty frequent since .net came out, and I am bringing it up for two reasons.

  1. It is something I think every web developer calling web services should be aware of... and
  2. It is a good example of how the product changes based on feedback from support incidents etc. to better suit our customers

Problem Description

We have an ASP.NET application acting as a GUI to our middle tier (Web Service) and at peak times the application is dead slow. Requests start timing out but some do go through so it doesn't appear that we are completely deadlocked.

After the peak in volume the application seems to work fine again.

During this time CPU utilization is reasonably low in the GUI process, and it is very low in the Web Service process. So low in fact that it looks like hardly anything is running in the web service application.

Gathering information

In a multi-tier application of any kind it is always hard to determine where to grab memory dumps, and often times the tiers are on different machines so it's not as easy as just setting the debugger up to grab a dump from all processes at once.

Let's call the GUI server "Server A", and the backend server (web services server) "Server B".

If we take a hang dump of Server A we temporarily stall Server A (even if just for a few seconds) and thus requests to Server B will be drained so we can't go in and take a dump of Server B and expect all requests to still be alive. If on the other hand we take a hang dump of Server B, stalling it momentarily, Server A's request might time out because we are debugging Server B. So we have to focus on one of them at a time.

There are a number of possible reasons we could be "hanging" here.

  1. The web service is slow executing the requests, or possibly even deadlocked.
  2. The GUI app is slow in parts not related to the web service
  3. The GUI app is slow getting requests through to the web service
  4. ...

If we focus on Server A (GUI) we can at least see if we are waiting for a response from Server B, in which case, next time we can take dumps from Server B, and find out what we are waiting for. So a hang dump of the asp.net process on Server A, or preferably two, taken about a minute or so apart so we can see if the threads are moving or completely stalled, would be good. Note: if you are taking multiple dumps, make sure that the first dump is ready before you take the next one so you don't get two identical dumps.

Debugging the issue

Drum roll please!... Introducing a command I haven't shown before:) !threadpool

0:065> !threadpool
CPU utilization 7%
Worker Thread: Total: 37 Running: 37 Idle: 1 MaxLimit: 40 MinLimit: 2
Work Request in Queue: 0
--------------------------------------
Number of Timers: 0
--------------------------------------
Completion Port Thread: Total: 2 Free: 2 MaxFree: 4 CurrentLimit: 2 MaxLimit: 40 MinLimit: 2

CPU Utilization is 7%. This is the complete CPU utilization on the box, so not specifically for the Asp.Net process, either way it is pretty low, just like it was mentioned in the problem description.

Currently there are no work requests in the queue and no timers waiting.

We have a maximum of 40 worker threads to execute our requests, and a minimum of 2 alive at any time (this comes from our machine.config settings). And currently we are executing on 37 of these, but it doesn't appear that the thread pool is exhausted. This jives well with that there are no work requests in the queue.

So what are all these threads doing?

Running ~* kb it appears that most of the threads are in some kind of wait. More specifically in a CorWaitOneNative...

  61  Id: 1164.a44 Suspend: 1 Teb: 7ff3a000 Unfrozen
ChildEBP RetAddr  Args to Child              
0d58f430 7c822114 77e6711b 00000001 0d58f480 ntdll!KiFastSystemCallRet
0d58f434 77e6711b 00000001 0d58f480 00000000 ntdll!NtWaitForMultipleObjects+0xc
0d58f4dc 791e0b3b 00000001 0d58f5bc 00000001 kernel32!WaitForMultipleObjectsEx+0x11a
0d58f50c 791e0bdd 00000001 0d58f5bc 00000001 mscorsvr!Thread::DoAppropriateWaitWorker+0xc1
0d58f560 791d5ba9 00000001 0d58f5bc 00000001 mscorsvr!Thread::DoAppropriateWait+0x46
0d58f5a8 024840c8 0d58f5b4 00000000 000186a0 mscorsvr!WaitHandleNative::CorWaitOneNative+0x6f
WARNING: Frame IP not in any known module. Following frames may be wrong.
0d58f614 0c9a5cb7 00000000 06c60b10 06c60b10 0x24840c8
0d58f644 0c9a00c2 06c60b10 0677c0c0 00000000 0xc9a5cb7
0d58f688 0c870721 06c60f9c 06c5fe04 06c60b10 0xc9a00c2
0d58f6ac 0c870532 06c60af0 06c5fe04 067592f0 0xc870721
0d58f6d0 0c952ea4 067592f0 06c5fe04 06c5f13c 0xc870532
0d58f800 799a67a2 06c5f1a4 06c5f4a8 799b4168 0xc952ea4
0d58f80c 799b4168 00000002 00000000 024a1710 mscorlib_79990000+0x167a2
0d58f890 791b3208 0d58f9a8 791b3ad7 0d58f8e4 mscorlib_79990000+0x24168
0d58f898 791b3ad7 0d58f8e4 00000000 0d58f8bc mscorsvr!CallDescrWorker+0x30
0d58f9a8 791d3ef0 00bf71bb 79b7c000 00000004 mscorsvr!MethodDesc::CallDescr+0x1b8
0d58fa64 791d3fa4 79bf71bb 79b7c000 79bd46de mscorsvr!MethodDesc::CallDescr+0x4f
0d58fa8c 791ebc5a 0d58faa4 00101ae0 0254c470 mscorsvr!MethodDesc::Call+0x97
0d58fab4 791cc3c8 02554bb0 791b3d64 ffffffff mscorsvr!RegisterWaitForSingleObjectCallback_Worker+0x139
0d58fafc 791ebb31 00101ae0 791ebbcc 02554bb0 mscorsvr!Thread::DoADCallBack+0x5c
0d58fb70 791d4096 02554bb0 02537ae8 793ecec8 mscorsvr!QueueUserWorkItemCallback+0xe3
0d58fb84 791d4055 02537ae8 00000000 791d3fe5 mscorsvr!ThreadpoolMgr::ExecuteWorkRequest+0x19
0d58fba4 791fdacc 00000000 79172262 79172270 mscorsvr!ThreadpoolMgr::WorkerThreadStart+0x129
0d58ffb8 77e66063 0254c108 00000000 00000000 mscorsvr!ThreadpoolMgr::intermediateThreadProc+0x44
0d58ffec 00000000 791fda8b 0254c108 00000000 kernel32!BaseThreadStart+0x34

...this means that someone from managed code used WaitHandle for synchronization, and calls WaitOne().

Checking the managed bit of this thread we can see that we are in WebForm1.Page_Load, calling into a function SlowStuff.seventoo.Service.DoSomethingSlow(), which in turn invokes a web service call, and now sits waiting in a function called GetRequestStream().

0:061> !clrstack
Thread 61
ESP         EIP       
0x0d58f5d8  0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Boolean System.Threading.WaitHandle.WaitOneNative(I,UI4,Boolean)
0x0d58f5ec  0x799e4bb1 [DEFAULT] [hasThis] Boolean System.Threading.WaitHandle.WaitOne(I4,Boolean)
0x0d58f620  0x0c9a5cb7 [DEFAULT] [hasThis] Class System.IO.Stream System.Net.HttpWebRequest.GetRequestStream()
0x0d58f64c  0x0c9a00c2 [DEFAULT] [hasThis] SZArray Object System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String,SZArray Object)
0x0d58f694  0x0c870721 [DEFAULT] [hasThis] String SlowStuff.seventoo.Service.DoSomethingSlow()
0x0d58f6b4  0x0c870532 [DEFAULT] [hasThis] Void SlowStuff.WebForm1.Page_Load(Object,Class System.EventArgs)
0x0d58f6dc  0x0c952ea4 [DEFAULT] [hasThis] Void System.Web.UI.Control.OnLoad(Class System.EventArgs)
0x0d58f6ec  0x0c952de4 [DEFAULT] [hasThis] Void System.Web.UI.Control.LoadRecursive()
0x0d58f700  0x0c952097 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequestMain()
0x0d58f744  0x0c950eef [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest()
0x0d58f780  0x0c95095b [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest(Class System.Web.HttpContext)
0x0d58f788  0x0c950934 [DEFAULT] [hasThis] Void System.Web.HttpApplication/CallHandlerExecutionStep.System.Web.HttpApplication+IExecutionStep.Execute()
0x0d58f798  0x0c76de50 [DEFAULT] [hasThis] Class System.Exception System.Web.HttpApplication.ExecuteStep(Class IExecutionStep,ByRef Boolean)
0x0d58f7e0  0x0c76d8ba [DEFAULT] [hasThis] Void System.Web.HttpApplication.ResumeSteps(Class System.Exception)
0x0d58f828  0x0c76d783 [DEFAULT] [hasThis] Class System.IAsyncResult System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(Class System.Web.HttpContext,Class System.AsyncCallback,Object)
0x0d58f844  0x02446897 [DEFAULT] [hasThis] Void System.Web.HttpRuntime.ProcessRequestInternal(Class System.Web.HttpWorkerRequest)
0x0d58f880  0x0cb43cf4 [DEFAULT] [hasThis] Void System.Web.RequestQueue.WorkItemCallback(Object)
0x0d58fac0  0x791b3208 [FRAME: ContextTransitionFrame]

If we run ~* e !clrstack to get the managed stack for all the threads we find that 33 requests are waiting like this in GetRequestStream and 2 requests are waiting in GetResponse.

0:042> kb 2000
ChildEBP RetAddr  Args to Child              
0d0cebc4 7c822114 77e6711b 00000001 0d0cec14 ntdll!KiFastSystemCallRet
0d0cebc8 77e6711b 00000001 0d0cec14 00000000 ntdll!NtWaitForMultipleObjects+0xc
0d0cec70 791e0b3b 00000001 0d0ced50 00000001 kernel32!WaitForMultipleObjectsEx+0x11a
0d0ceca0 791e0bdd 00000001 0d0ced50 00000001 mscorsvr!Thread::DoAppropriateWaitWorker+0xc1
0d0cecf4 791d5ba9 00000001 0d0ced50 00000001 mscorsvr!Thread::DoAppropriateWait+0x46
0d0ced3c 024840c8 0d0ced48 00000000 000186a0 mscorsvr!WaitHandleNative::CorWaitOneNative+0x6f
WARNING: Frame IP not in any known module. Following frames may be wrong.
0d0ceda8 0ca29fd5 00000000 02970580 02970580 0x24840c8
0d0ceddc 0ca29e49 02970580 02970580 0677c0c0 0xca29fd5
0d0cee0c 0ca29e06 0c9a0110 02970580 0677c0c0 0xca29e49
0d0cee54 0c870721 02970a0c 0296f868 02970580 0xca29e06
0d0cee78 0c870532 02970560 0296f868 067592f0 0xc870721
0d0cee9c 0c952ea4 067592f0 0296f868 0296c800 0xc870532
0d0cefcc 799a67a2 0296c868 0296cb6c 799b4168 0xc952ea4
0d0cefd8 799b4168 00000002 00000000 024a14c4 mscorlib_79990000+0x167a2
0d0cf060 79217188 00000000 00dc9160 0672a508 mscorlib_79990000+0x24168
02442f60 24548bca 2c15ff10 8b020b95 8bcb8bd8 mscorsvr!ComCallMLStubCache::CompileMLStub+0x429
02442f60 00000000 2c15ff10 8b020b95 8bcb8bd8 0x24548bca

Thread 42
ESP         EIP       
0x0d0ced6c  0x7c82ed54 [FRAME: ECallMethodFrame] [DEFAULT] Boolean System.Threading.WaitHandle.WaitOneNative(I,UI4,Boolean)
0x0d0ced80  0x799e4bb1 [DEFAULT] [hasThis] Boolean System.Threading.WaitHandle.WaitOne(I4,Boolean)
0x0d0cedb4  0x0ca29fd5 [DEFAULT] [hasThis] Class System.Net.WebResponse System.Net.HttpWebRequest.GetResponse()
0x0d0cede4  0x0ca29e49 [DEFAULT] [hasThis] Class System.Net.WebResponse System.Web.Services.Protocols.WebClientProtocol.GetWebResponse(Class System.Net.WebRequest)
0x0d0cee14  0x0ca29e06 [DEFAULT] [hasThis] Class System.Net.WebResponse System.Web.Services.Protocols.HttpWebClientProtocol.GetWebResponse(Class System.Net.WebRequest)
0x0d0cee18  0x0c9a0110 [DEFAULT] [hasThis] SZArray Object System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String,SZArray Object)
0x0d0cee60  0x0c870721 [DEFAULT] [hasThis] String SlowStuff.seventoo.Service.DoSomethingSlow()
0x0d0cee80  0x0c870532 [DEFAULT] [hasThis] Void SlowStuff.WebForm1.Page_Load(Object,Class System.EventArgs)
0x0d0ceea8  0x0c952ea4 [DEFAULT] [hasThis] Void System.Web.UI.Control.OnLoad(Class System.EventArgs)
0x0d0ceeb8  0x0c952de4 [DEFAULT] [hasThis] Void System.Web.UI.Control.LoadRecursive()
0x0d0ceecc  0x0c952097 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequestMain()
0x0d0cef10  0x0c950eef [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest()
0x0d0cef4c  0x0c95095b [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest(Class System.Web.HttpContext)
0x0d0cef54  0x0c950934 [DEFAULT] [hasThis] Void System.Web.HttpApplication/CallHandlerExecutionStep.System.Web.HttpApplication+IExecutionStep.Execute()
0x0d0cef64  0x0c76de50 [DEFAULT] [hasThis] Class System.Exception System.Web.HttpApplication.ExecuteStep(Class IExecutionStep,ByRef Boolean)
0x0d0cefac  0x0c76d8ba [DEFAULT] [hasThis] Void System.Web.HttpApplication.ResumeSteps(Class System.Exception)
0x0d0ceff4  0x0c76d783 [DEFAULT] [hasThis] Class System.IAsyncResult System.Web.HttpApplication.System.Web.IHttpAsyncHandler.BeginProcessRequest(Class System.Web.HttpContext,Class System.AsyncCallback,Object)
0x0d0cf010  0x02446897 [DEFAULT] [hasThis] Void System.Web.HttpRuntime.ProcessRequestInternal(Class System.Web.HttpWorkerRequest)
0x0d0cf04c  0x02446448 [DEFAULT] Void System.Web.HttpRuntime.ProcessRequest(Class System.Web.HttpWorkerRequest)
0x0d0cf058  0x02442fc5 [DEFAULT] [hasThis] I4 System.Web.Hosting.ISAPIRuntime.ProcessRequest(I,I4)
0x0d0cf120  0x79217188 [FRAME: ContextTransitionFrame] 
0x0d0cf200  0x79217188 [FRAME: ComMethodFrame]

Hmm... so what are these waiting for?

Well... when we are waiting in GetResponse we have sent the request to the web service server and we are waiting for it to return, in which case we can either be blocked/waiting on the server or we could be waiting on a callback thread to process the results on. In our case we have not exhausted the thread pool so we would have had free threads to use as callback threads. So... we are likely waiting on results back from the server on the two threads that are in GetResponse().

Although these two threads are interesting I find it even more interesting that we have 33 threads waiting in GetRequestStream(). This function is just supposed to get us a stream that we can send the request on. OK! So that seems like something that would be rather quick, however in doing so it also needs to get a connection.

So... what would be limiting us from getting a connection?

System.Net.Connections are managed by a class called ServicePoint, that contains information about the URI it services, the maximum number of connections, whether it expects 100-continue etc.

From !dso (!dumpstackobjects) on thread 61 (which is waiting on GetRequestStream() we can get the HttpWebRequest to the Web Service.

0:061> !dso
Thread 61
ESP/REG    Object     Name
...
0xd58f5f0 0x6c8b0d0 System.Net.HttpWebRequest
0xd58f5f4 0x6c8f23c System.Net.LazyAsyncResult
0xd58f5f8 0x6c95868 System.Threading.ManualResetEvent
0xd58f610 0x6c8b0d0 System.Net.HttpWebRequest
0xd58f620 0x6c60b10 SlowStuff.seventoo.Service
0xd58f624 0x6c60b10 SlowStuff.seventoo.Service
0xd58f628 0x677c0c0 System.String    DoSomethingSlow
0xd58f630 0x6c8b0d0 System.Net.HttpWebRequest
0xd58f634 0x6c8c2e0 System.String    http://tempuri.org/DoSomethingSlow
0xd58f638 0x677335c System.String    http://tempuri.org/DoSomethingSlow
0xd58f64c 0x6c60b10 SlowStuff.seventoo.Service
...

And if we dump this out we find that it has a member variable called _ServicePoint, so that looks pretty interesting...

0:061> !do 0x6c8b0d0 
Name: System.Net.HttpWebRequest
MethodTable 0x0c978b94
EEClass 0x0c96e798
Size 204(0xcc) bytes
GC Generation: 0
mdToken: 0x020001fa  (c:\windows\assembly\gac\system\1.0.5000.0__b77a5c561934e089\system.dll)
FieldDesc*: 0x0c977cf4
        MT      Field     Offset                 Type       Attr      Value Name
0x79b9788c 0x40000b6      0x4                CLASS   instance 0x00000000 __identity
0x0c9774d0 0x4000f0c        0                CLASS     shared   static s_PrefixList
    >> Domain:Value 0x000d5330:NotInit  0x001024b8:0x0274a580 <<
0x0c978b94 0x4000f30     0x10                CLASS   instance 0x06c8f23c _WriteAResult
0x0c978b94 0x4000f31     0x14                CLASS   instance 0x00000000 _ReadAResult
0x0c978b94 0x4000f32     0x18                CLASS   instance 0x00000000 _ContinueDelegate
0x0c978b94 0x4000f33     0x1c                CLASS   instance 0x0274e5c4 _ServicePoint
0x0c978b94 0x4000f34     0x20                CLASS   instance 0x06c91c58 _WriteEvent
0x0c978b94 0x4000f35     0x24                CLASS   instance 0x00000000 _WriteInProgressEvent
0x0c978b94 0x4000f36     0xb4       System.Boolean   instance 0 _HaveResponse
0x0c978b94 0x4000f37     0xb5       System.Boolean   instance 0 _OnceFailed
0x0c978b94 0x4000f38     0xb6       System.Boolean   instance 0 _WriteNotifed
0x0c978b94 0x4000f39     0x28                CLASS   instance 0x00000000 _HttpResponse
0x0c978b94 0x4000f3a     0x2c                CLASS   instance 0x00000000 _ResponseException
0x0c978b94 0x4000f3b     0x30                CLASS   instance 0x00000000 _CoreResponse
0x0c978b94 0x4000f3e     0x34                CLASS   instance 0x027447d0 _Verb
0x0c978b94 0x4000f3f     0x38                CLASS   instance 0x027447d0 _OriginVerb
0x0c978b94 0x4000f40     0x3c                CLASS   instance 0x06c8b6b4 _HttpRequestHeaders
0x0c978b94 0x4000f41     0x40                CLASS   instance 0x00000000 _WriteBuffer
0x0c978b94 0x4000f42     0x94         System.Int32   instance 2 _HttpWriteMode
0x0c978b94 0x4000f43     0xb7       System.Boolean   instance 0 _AllowAutoRedirect
0x0c978b94 0x4000f44     0x44                CLASS   instance 0x06c60b60 _Uri
0x0c978b94 0x4000f45     0x48                CLASS   instance 0x06c60b60 _OriginUri
0x0c978b94 0x4000f46     0x4c                CLASS   instance 0x00000000 _ChallengedUri
0x0c978b94 0x4000f47     0x50                CLASS   instance 0x00000000 _ChallengedSpn
0x0c978b94 0x4000f48     0x54                CLASS   instance 0x00000000 _MediaType
0x0c978b94 0x4000f49      0x8         System.Int64   instance -1 _ContentLength
0x0c978b94 0x4000f4a     0x58                CLASS   instance 0x0274c80c _Version 

Btw, we can verify that we have picked the right one by dumping out the Uri

0:061> !do 0x06c60b60
Name: System.Uri
MethodTable 0x020b4eb0
EEClass 0x02472e10
Size 76(0x4c) bytes
GC Generation: 0
mdToken: 0x020001c4  (c:\windows\assembly\gac\system\1.0.5000.0__b77a5c561934e089\system.dll)
FieldDesc*: 0x020b4804
        MT      Field     Offset                 Type       Attr      Value Name
0x79b9788c 0x40000b6      0x4                CLASS   instance 0x00000000 __identity
0x020b4eb0 0x4000e38      0x8                CLASS   instance 0x06c60e44 m_AbsoluteUri
0x020b4eb0 0x4000e39     0x3c       System.Boolean   instance 0 m_AlreadyEscaped
0x020b4eb0 0x4000e3a      0xc                CLASS   instance 0x066f0230 m_Fragment
0x020b4eb0 0x4000e3b     0x3d       System.Boolean   instance 0 m_FragmentEscaped
0x020b4eb0 0x4000e3c     0x2c         System.Int32   instance -41085911 m_Hash
...

0:061> !do 0x06c60e44
Name: System.String
MethodTable 0x79b94638
EEClass 0x79b94984
Size 108(0x6c) bytes
GC Generation: 0
mdToken: 0x0200000f  (c:\windows\microsoft.net\framework\v1.1.4322\mscorlib.dll)
String: http://seventoo/myslowwebservice/service.asmx
FieldDesc*: 0x79b949e8
        MT      Field     Offset                 Type       Attr      Value Name
0x79b94638 0x4000013      0x4         System.Int32   instance 46 m_arrayLength
0x79b94638 0x4000014      0x8         System.Int32   instance 45 m_stringLength
0x79b94638 0x4000015      0xc          System.Char   instance 0x68 m_firstChar
0x79b94638 0x4000016        0                CLASS     shared   static Empty
    >> Domain:Value 0x000d5330:0x066f0230 0x001024b8:0x066f0230 <<
0x79b94638 0x4000017      0x4                CLASS     shared   static WhitespaceChars
    >> Domain:Value 0x000d5330:0x066f0244 0x001024b8:0x027122b8 <<

Back to the service point

0:061> !do 0x0274e5c4 
Name: System.Net.ServicePoint
MethodTable 0x0c9c5d08
EEClass 0x0c9b6840
Size 92(0x5c) bytes
GC Generation: 1
mdToken: 0x02000247  (c:\windows\assembly\gac\system\1.0.5000.0__b77a5c561934e089\system.dll)
FieldDesc*: 0x0c9c5730
        MT      Field     Offset                 Type       Attr      Value Name
0x0c9c5d08 0x400112d     0x48       System.Boolean   instance 0 m_ProxyServicePoint
0x0c9c5d08 0x400112e     0x49       System.Boolean   instance 0 m_UserChangedLimit
0x0c9c5d08 0x400112f     0x2c         System.Int32   instance 2 m_ConnectionLimit
0x0c9c5d08 0x4001130     0x30         System.Int32   instance 2 m_ConnectionMode
0x0c9c5d08 0x4001131     0x4a       System.Boolean   instance 1 m_SupportsPipelining
0x0c9c5d08 0x4001132      0x4                CLASS   instance 0x0274e620 m_ConnectionGroupList
0x0c9c5d08 0x4001133      0x8                CLASS   instance 0x0677bf00 m_Address
0x0c9c5d08 0x4001134     0x34         System.Int32   instance 900000 m_MaxIdleTime
0x0c9c5d08 0x4001135     0x50            VALUETYPE   instance start at 0x0274e614 m_ExpiresAt
0x0c9c5d08 0x4001136      0xc                CLASS   instance 0x0677bf4c m_ConnectionName
0x0c9c5d08 0x4001137     0x10                CLASS   instance 0x0677e2e8 m_IPHostEntryInfos
0x0c9c5d08 0x4001138     0x38         System.Int32   instance 0 m_CurrentAddressInfoIndex
0x0c9c5d08 0x4001139     0x3c         System.Int32   instance 2 m_ConnectionsSinceDns
0x0c9c5d08 0x400113a     0x40         System.Int32   instance 2 m_CurrentConnections
0x0c9c5d08 0x400113b     0x14                CLASS   instance 0x00000000 m_ServerCertificate
0x0c9c5d08 0x400113c     0x18                CLASS   instance 0x00000000 m_ClientCertificate
0x0c9c5d08 0x400113d     0x4b       System.Boolean   instance 1 m_UseNagleAlgorithm
0x0c9c5d08 0x400113e     0x4c       System.Boolean   instance 1 m_Expect100Continue
0x0c9c5d08 0x400113f     0x1c                CLASS   instance 0x0274df08 m_LookupString
0x0c9c5d08 0x4001140     0x20                CLASS   instance 0x0677e248 m_CachedRemoteIPAddressInfo
0x0c9c5d08 0x4001141     0x24                CLASS   instance 0x027543e4 m_Version
0x0c9c5d08 0x4001142     0x4d       System.Boolean   instance 1 m_Understands100Continue
0x0c9c5d08 0x4001143     0x28                CLASS   instance 0x0274e72c DnsLock
0x0c9c5d08 0x4001144     0x44         System.Int32   instance 140 m_HashCode
0x0c9c5d08 0x4001145     0x4e       System.Boolean   instance 1 m_ComputedHashCode

Hmm, it has a connection limit of 2. That sorta rings a bell, we have two requests in GetResponse(), i.e. in the middle of some web service requests, and the rest waiting for a connection, so this makes total sense.

We could actually stop right here and conclude that this is the source of the problem. I.e. that we only have two available connections, but let's talk about that a little bit later and dig a bit further instead...

If we go back to thread 42 (GetResponse()) and run !dso there, we find a ConnectStream on the stack

0:042> !dso
Thread 42
ESP/REG    Object     Name
...
0xd0cee2c 0x6cef7b8 System.Net.ConnectStream
...

This connect stream has a connection (one of the two created by the service point)

0:042> !do 0x6cef7b8 
Name: System.Net.ConnectStream
MethodTable 0x0c9c44c0
EEClass 0x0c9b6270
Size 104(0x68) bytes
GC Generation: 0
mdToken: 0x02000272  (c:\windows\assembly\gac\system\1.0.5000.0__b77a5c561934e089\system.dll)
FieldDesc*: 0x0c9c3e5c
        MT      Field     Offset                 Type       Attr      Value Name
0x79b9788c 0x40000b6      0x4                CLASS   instance 0x00000000 __identity
0x79b9c140 0x4000b17        0                CLASS     shared   static Null
    >> Domain:Value 0x000d5330:0x066f1eec 0x001024b8:0x0674a16c <<
0x0c9c44c0 0x400122f     0x3c         System.Int32   instance 1 m_CallNesting
0x0c9c44c0 0x4001230     0x20                CLASS   instance 0x06cf0c28 m_BufferedData
0x0c9c44c0 0x4001231     0x58       System.Boolean   instance 1 m_WriteBufferEnable
0x0c9c44c0 0x4001232     0x59       System.Boolean   instance 1 m_BufferOnly
0x0c9c44c0 0x4001233      0x8         System.Int64   instance 293 m_TotalBytesToWrite
0x0c9c44c0 0x4001234     0x10         System.Int64   instance 0 m_BytesLeftToWrite
0x0c9c44c0 0x4001235     0x24                CLASS   instance 0x0274f03c m_Connection
0x0c9c44c0 0x4001236     0x5a       System.Boolean   instance 1 m_WriteStream
0x0c9c44c0 0x4001237     0x28                CLASS   instance 0x00000000 m_ReadBuffer
0x0c9c44c0 0x4001238     0x40         System.Int32   instance 0 m_ReadOffset
0x0c9c44c0 0x4001239     0x44         System.Int32   instance 0 m_ReadBufferSize

0:061> !do 0x0274f03c
Name: System.Net.Connection
MethodTable 0x0c9c52f0
EEClass 0x0c9b6454
Size 152(0x98) bytes
GC Generation: 1
mdToken: 0x02000270  (c:\windows\assembly\gac\system\1.0.5000.0__b77a5c561934e089\system.dll)
FieldDesc*: 0x0c9c4bb8
        MT      Field     Offset                 Type       Attr      Value Name
0x0c9c52f0 0x40011f3     0x80       System.Boolean   instance 1 m_CanPipeline
0x0c9c52f0 0x40011f4     0x81       System.Boolean   instance 0 m_Pipelining
0x0c9c52f0 0x40011f5     0x82       System.Boolean   instance 1 m_KeepAlive
0x0c9c52f0 0x40011f6     0x58         System.Int32   instance 0 m_Error
0x0c9c52f0 0x40011f7      0x4                CLASS   instance 0x027be068 m_ReadBuffer
0x0c9c52f0 0x40011f8     0x5c         System.Int32   instance 25 m_BytesRead
0x0c9c52f0 0x40011f9     0x60         System.Int32   instance 25 m_HeadersBytesUnparsed
0x0c9c52f0 0x40011fa     0x64         System.Int32   instance 25 m_BytesScanned
0x0c9c52f0 0x40011fb     0x68         System.Int32   instance 0 m_TotalResponseHeadersLength
0x0c9c52f0 0x40011fc     0x6c         System.Int32   instance 65536 m_MaximumResponseHeadersLength
0x0c9c52f0 0x40011fd      0x8                CLASS   instance 0x06d0f0bc m_ResponseData
0x0c9c52f0 0x40011fe     0x70         System.Int32   instance 1 m_ReadState
0x0c9c52f0 0x40011ff      0xc                CLASS   instance 0x027bf074 m_WaitList
0x0c9c52f0 0x4001200     0x10                CLASS   instance 0x027bf0dc m_WriteList
0x0c9c52f0 0x4001201     0x74         System.Int32   instance 25 m_CurrentRequestIndex
0x0c9c52f0 0x4001202     0x14                CLASS   instance 0x027bf1c0 m_StatusLineInts
0x0c9c52f0 0x4001203     0x18                CLASS   instance 0x066f0230 m_StatusDescription
0x0c9c52f0 0x4001204     0x78         System.Int32   instance 0 m_StatusState

The connection in turn has a waitlist that contains all the requests that are waiting for this particular connection

0:061> !do 0x027bf074
Name: System.Collections.ArrayList
MethodTable 0x79ba2ee4
EEClass 0x79ba3020
Size 24(0x18) bytes
GC Generation: 0
mdToken: 0x02000100  (c:\windows\microsoft.net\framework\v1.1.4322\mscorlib.dll)
FieldDesc*: 0x79ba3084
        MT      Field     Offset                 Type       Attr      Value Name
0x79ba2ee4 0x4000362      0x4                CLASS   instance 0x02a7bd44 _items
0x79ba2ee4 0x4000363      0xc         System.Int32   instance 17 _size
0x79ba2ee4 0x4000364     0x10         System.Int32   instance 35 _version
0x79ba2ee4 0x4000365      0x8                CLASS   instance 0x00000000 _syncRoot

In this case, 17 requests are waiting for this connection, WOW!!!

And if we would run

0:061> .foreach (req {!do -v 0x02a7bd44 -short}) {!gcroot ${req}}

To find out on what threads all these requests are rooted/used we would see them rooted on thread 24, 26, 28, 30, 32, 34, 36, 37, 38, 44, 46, 48, 50, 52, 54, 56, and 58

The roots look something like this:

Scan Thread 46 (0x1540)
ESP:d1ceb9c:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cebb8:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cebbc:Root:0x29aeb18(System.Net.HttpWebRequest)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cebc0:Root:0x68b7fb0(System.Net.LazyAsyncResult)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cebdc:Root:0x29aeb18(System.Net.HttpWebRequest)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cebec:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cebf0:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cebfc:Root:0x29aeb18(System.Net.HttpWebRequest)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cec18:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29aeb18(System.Net.HttpWebRequest)
ESP:d1cec30:Root:0x29afcb0(System.Web.Services.Protocols.SoapClientMessage)->0x29ae580(SlowStuff.seventoo.Service)
ESP:d1cec34:Root:0x29aeb18(System.Net.HttpWebRequest)->0x29ae580(SlowStuff.seventoo.Service)
ESP:d1cec38:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29ae580(SlowStuff.seventoo.Service)
ESP:d1cec50:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29ae580(SlowStuff.seventoo.Service)
ESP:d1cec64:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29ae580(SlowStuff.seventoo.Service)
ESP:d1cec68:Root:0x29ae580(SlowStuff.seventoo.Service)->0x29ae580(SlowStuff.seventoo.Service)

So all these threads are waiting for our thread 42 to finish. In other words, the last thread in the waitlist will have to wait for (16*Web Service execution time) + time for currently executing request, before it can even start with its request. Now, that can be a pretty long time:)

When a new request comes in it gets scheduled in on one of the two connections and stands in line awaiting its turn, just like a busy day at the DMV.

A little background

Why 2 connections??? that seems like an awfully small number...

Let's take a look at RFC 2616, the RFC for Hypertext Transfer Protocol -- HTTP/1.1. This is what it says about HTTP connections.

"Clients that use persistent connections SHOULD limit the number of simultaneous connections that they maintain to a given server. A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy. A proxy SHOULD use up to 2*N connections to another server or proxy, where N is the number of simultaneously active users. These guidelines are intended to improve HTTP response times and avoid congestion."

So now you know where the number 2 likely came from. But as you can see, in our case it presents a bit of a problem.

Solutions please!!!

In the <system.net> section of machine.config you can set the maxConnection value to indicate how many concurrent connections you want to allow. By default this is set to * and 2, which means that we will have a maximum of 2 simultaneous connections to any given address/port.

Each service point handles a particular URI, so in essence this means that any given service point will have a maximum of 2 connections.

	<connectionManagement>
		<add address="*" maxconnection="2"/>
	</connectionManagement>

You can either change the general maxconnection setting or add one for the particular ipaddress / server name where your web service resides.

Along with setting this up, you should also look over the numbers for maxWorkerThreads/maxIOThreads, minFreeThreads and minLocalRequestFreeThreads when calling web services, per this kb article http://support.microsoft.com/?id=821268, but the biggest contention point is definitely maxconnections.

Better yet, in 2.0 a new machine config setting was introduced, and made the default...

<ProcessModel>
  <autoconfig = "true">
</ProcessModel>

This automatically configures these values for you per the article above

Wrap up

I know that in this post in particular I might have made huge leaps of logic in some places, my college algebra teacher definitely wouldn't have been proud:) but I want to defend myself by saying that I am trying to show debugging in a step-by-step fashion, when really a lot of the time, when I am debugging, I use the "poke around until I find something interesting" technique that I mentioned earlier...

So far the posts have touched as distant topics as garbage collection, pinvokes, xmlserialization, asp.net intrinsics, and now connection management. It is impossible to know how everything is implemented which is why I am trying to post case studies on a lot of different areas so at least you will know bits and pieces of how things work and build upon that.

Have fun out there!

 

 





  • There are some blog posts with great technical information you just wouldn't get from a plain knowledge...
  • .NET 点滴 ASP.NET 2.0
  • i am really impressed about your article, but I am curious about performance issues, other than connection limit for using web services to connect to backend.

    also what is the definition of 'peak volume' in this article, (req/s) ?

    ps; i am happy to find such a cool blog about debugging

  • Thank you much:)

    Other performance issues would include

    a) the web service taking a long time for whatever reason, i.e. locks, slow database requests etc.
    b) dns issues (if its slow doing dns lookups)
    c) serialization and deserialization of parameters and return values. This can be pretty expensive both timewise and memorywise if for example you are passing large datasets back and forth.

    Tess
  • thanks :)  

    write more blog you should ;)
  • Hi Tess

    I've a question about the !threadpool command. The Completion Port Thread: line of the output from that command, as I understand it these threads are used to transport requests from IIS to ASP.NET (worker process), is that correct?

    I've a hang dump of a customers site which has the following output from the !threadpool command;

    0:000> !threadpool
    Worker Thread: Total: 4 Running: 4 Idle: 2 MaxLimit: 100 MinLimit: 4
    Work Request in Queue: 0
    --------------------------------------
    Number of Timers: 0
    --------------------------------------
    Completion Port Thread: Total: 0 Free: 0 MaxFree: 8 CurrentLimit: 0 MaxLimit: 1000 MinLimit: 4

    where the number of free completion port threads is 0. I would have expected the 'free' count to be the same as the MinLimit.

    At this point our app has ground to a hault, there's no threads waiting in any of the stacks, but there's a large amount of threadabortexceptions.

    My main question is how could the Free: count of completion threads ever be 0?

    Thanks

    Hope you're over the flu :)
  • Thank you Howard, starting to feel better:)

    The completion port threads are usually used to receive results from async requests.  Requests from IIS to ASP.NET are brought in on native threads before they are dispatched as work requests to the queue, so in your case it seems like the requests are stuck either before they come in to asp.net or they are stuck on the way back. If it is on IIS 5, i would take a look at the inetinfo process.

    The minlimit is usually set to one per CPU unless you specify something else, and this is used to determine wether or not the threadpool should grow in case of a
    new request for a completion port thread. Contrary to what the name suggests it doesnt tell you the minimum number of IO/Completion port threads in the process.  

    Here is a pretty good explanation on how it works for worker threads (pretty much the same goes work io threads)

    http://www.codeproject.com/csharp/threadtests.asp

    and your MinLimit is shown as "CPU's" in the ShouldGrowThreadPool diagram.

    (Quick note: The performance "problems" he mentions when doing a burst of requests can be fixed by configuring minWorkerThreads and minIOThreads as mentioned in http://support.microsoft.com/kb/810259/, however the value should be left at the default (0) if your application does not have sudden bursts of requests)

    The threadpool is grown and shrunk based on a number of different factors, such as CPU usage, # of incomming and queued requests etc.

    Hope this helps clarify the output.

    Tess
  • Hey Tess

    Thanks very much for the reply and the links, it does help a great deal. I'm still not sure I completely understand whats going on, but I'm sure it'll eventually sink in.

    So the value of 4 makes sense for the minLimit because I've four heaps in that dump. In the same vien as your answer does the currentLimit really _mean_ the currentLimit? If so how could this ever be 0, if the currentLimit is 0 surely that would mean no requests would ever be serviced, or am I just being thick? Unless it means that the threadpool doesn't need to grow anymore at the moment?. I know, so many questions. I guess I just need to dissect the code in the threadtests example on the codeproject.
  • To be perfectly honest i dont know what the currentlimit means, but i think (mayor guess here) that it could be the number of request that come in before we start looking at if we should grow the threadpool.  

    Either way, i know for sure, that it wont stop the threadpool from creating new such threads.

    I think the problem in your particular case is that requests get stuck either before they come in to the aspnet isapi or on their way back,  and its very hard to say why that would be.  It could be anything from a SSL accellerator thats acting up, to a bug of some sort to faulty content length causing requests to not be seen as completed.  
  • Hi Tess

    You were right in the fact that the requests were stuck outside of ASP.NET. It was a nasty deadlock situation involving Oracle & DCOM and an inappropriate critical section.

    I'm just posting this incase somebody gets a threadpool output like the one I posted above and might wonder where to start looking for the problem.

    There's still many things to learn about the threadpool though and what the numbers actually mean.

    Thanks
  • Very nice to know,  btw, how did you find out?

    (sorry, just too curious for my own good some times:))
  • Yeah you are, thought you'd be sick of crashdumps by now :)

    We reproduced the problem on a couple of test systems and grabbed the dumps in the hang state. !clrstack showed that the methods at the top of the stacks were the same in each case so we knew we probably had captured the problem. Each of the methods at the top of the stacks needs to enter a critical section, running !locks showed 1 thread entering the lock 3 times, sounds a bit fishy.

    On one of the unmanaged stacks there was an outgoing RPC call preceded by a call to FinalRelease() on a COM interface, so something is being destructed and needs to make a DCOM call to clear up.

    Putting 2 and 2 together and cutting a long story short this was the code attempting to kill a db query. So why would it hang attempting to kill a db query? ah well thats because the original query had timed out the frontend but was still running in the database. So when the frontend destructed the object containing the db COM intefaces it also grabbed the lock, but it couldn't destruct until the db interfaces released, but they couldn't release because the query was still running. You needed the same lock to start any db operations, oh dear.

    As more db querys were started they all were waiting for the lock to be released, eventually the threadpool was exhausted hence the output above. This is all in a mixed mode C++ dll, is it managed? is it not? who knows :)

    Sounds so simple now :)
  • nice catch:)
  • PingBack from http://www.ellisweb.net/2006/04/items-of-interest-20060403/
Page 1 of 2 (30 items) 12
Leave a Comment
  • Please add 1 and 3 and type the answer here:
  • Post