One of my customers complaints of an unstable WCF service on his production environment. I heard them saying: "The service is frozen", "Getting unknown error from the client side UI", "The service is unresponsive". The real challenge of debugging a production issues is to reproduce it in some other kind of environment. And this is really a BIG task. But I was somewhat lucky; using my unit tests from a VSTS test client , I ran it for almost 10/12 hours until I reached a situation where all the complaints seem to be true. I took a memory dump of my WCF service process.
I tried to launch the client GUI. But unfortunately, after sometime I saw "Unknown Error" form the GUI. I checked the client side log and found something very interesting: 'The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:09:59.7289458'.
I tried to visualize the launching of the GUI application. It will contact almost 5/6 endpoints of the back end WCF service before successfully launching it. So, I suspect it can be because of at least one of the endpoint which is timing out.
This great blog: http://blogs.msdn.com/mahjayar/archive/2006/12/12/debugging-wcf-applications.aspx really helped me to debug.
I started with dumping the ServiceHost types.
0:025> !dumpheap -type System.ServiceModel.ServiceHost
Statistics: MT Count TotalSize Class Name50e822f4 11 220 System.ServiceModel.ExtensionCollection`1[[System.ServiceModel.ServiceHostBase, System.ServiceModel]]50eaf9c8 11 264 System.Collections.Generic.List`1[[System.ServiceModel.IExtension`1[[System.ServiceModel.ServiceHostBase, System.ServiceModel]], System.ServiceModel]]50e62118 11 352 System.ServiceModel.ServiceHost+ReflectedContractCollection50e553e8 11 1496 System.ServiceModel.ServiceHost
I found 11 instances of System.ServiceModel.ServiceHost type. I checked the wcf servicehost config file and found there are 11 endpoints configured. So, it makes sense.
My particular interest was to see if any of the endpoints had reached it's throttle limit. To look into it I dumped System.ServiceModel.Dispatcher.FlowThrottle type, and found 33 objects.
0:025> !dumpheap -type System.ServiceModel.Dispatcher.FlowThrottleAddress MT Size00fe56f8 50e49ffc 36 00fe57b4 50e49ffc 36 0103bda8 50e49ffc 36 0110cc74 50e49ffc 36 0110cce4 50e49ffc 36 0112ef04 50e49ffc 36 01287bd0 50e49ffc 36 01287c40 50e49ffc 36 0129b290 50e49ffc 36 012d6a94 50e49ffc 36 012d6b04 50e49ffc 36 012e57b0 50e49ffc 36 01305bd8 50e49ffc 36 01305c48 50e49ffc 36 013114f4 50e49ffc 36 0132886c 50e49ffc 36 013288dc 50e49ffc 36 01332840 50e49ffc 36 01337950 50e49ffc 36 013379c0 50e49ffc 36 01357ba0 50e49ffc 36 013e873c 50e49ffc 36 013e87ac 50e49ffc 36 013f4744 50e49ffc 36 0150ebd8 50e49ffc 36 0150ec48 50e49ffc 36 01517d00 50e49ffc 36 01523768 50e49ffc 36 015237d8 50e49ffc 36 01530948 50e49ffc 36 015452b4 50e49ffc 36 01545324 50e49ffc 36 0154d508 50e49ffc 36 total 33 objectsStatistics: MT Count TotalSize Class Name50e49ffc 33 1188 System.ServiceModel.Dispatcher.FlowThrottleTotal 33 objects
I was little curious to know why 33 objects are present for 11 endpoints. I checked the endpoint behavior at the server side :
<behavior name="HermesServiceBehavior"> <serviceMetadata /> <serviceDebug includeExceptionDetailInFaults="true" /> <serviceThrottling maxConcurrentCalls="64" maxConcurrentSessions="64" maxConcurrentInstances="2147483647" /></behavior>
So, it answers my question; each endpoint has the three throttle behaviors. => 11*3= 33 FlowThrottles
My next step was to dump all these FlowThrottle and check for any saturation.
0.25> .foreach (name {!dumpheap -type System.ServiceModel.Dispatcher.FlowThrottle -short}) {.echo -------------; .echo name; !do name}
My formatted output looked like this:
....................................................................
-------------01287bd0Name: System.ServiceModel.Dispatcher.FlowThrottleMethodTable: 50e49ffcEEClass: 50e49f8cSize: 36(0x24) bytes(C:\WINDOWS\assembly\GAC_MSIL\System.ServiceModel\3.0.0.0__b77a5c561934e089\System.ServiceModel.dll)Fields: MT Field Offset Type VT Attr Value Name790fdb60 400327b 18 System.Int32 1 instance 64 capacity790fdb60 400327c 1c System.Int32 1 instance 0 count790f8a7c 400327d 4 System.Object 0 instance 01287bf4 mutex791118a4 400327e 8 ...ding.WaitCallback 0 instance 01287bb0 release00000000 400327f c 0 instance 01287c00 waiters790f9244 4003280 10 System.String 0 instance 00fe565c propertyName790f9244 4003281 14 System.String 0 instance 00fe56a0 configName-------------01287c40Name: System.ServiceModel.Dispatcher.FlowThrottleMethodTable: 50e49ffcEEClass: 50e49f8cSize: 36(0x24) bytes(C:\WINDOWS\assembly\GAC_MSIL\System.ServiceModel\3.0.0.0__b77a5c561934e089\System.ServiceModel.dll)Fields: MT Field Offset Type VT Attr Value Name790fdb60 400327b 18 System.Int32 1 instance 64 capacity790fdb60 400327c 1c System.Int32 1 instance 64 count790f8a7c 400327d 4 System.Object 0 instance 01287c64 mutex791118a4 400327e 8 ...ding.WaitCallback 0 instance 01287c20 release00000000 400327f c 0 instance 01287c70 waiters790f9244 4003280 10 System.String 0 instance 00fe5620 propertyName790f9244 4003281 14 System.String 0 instance 00fe5758 configName-------------0129b290Name: System.ServiceModel.Dispatcher.FlowThrottleMethodTable: 50e49ffcEEClass: 50e49f8cSize: 36(0x24) bytes(C:\WINDOWS\assembly\GAC_MSIL\System.ServiceModel\3.0.0.0__b77a5c561934e089\System.ServiceModel.dll)Fields: MT Field Offset Type VT Attr Value Name790fdb60 400327b 18 System.Int32 1 instance 2147483647 capacity790fdb60 400327c 1c System.Int32 1 instance 0 count790f8a7c 400327d 4 System.Object 0 instance 0129b2b4 mutex791118a4 400327e 8 ...ding.WaitCallback 0 instance 0129b270 release00000000 400327f c 0 instance 0129b2c0 waiters790f9244 4003280 10 System.String 0 instance 0103bd48 propertyName790f9244 4003281 14
..................................................................................
It was a great output, because of this text in RED. This proves that I was guessing on the right direction. There is actually one endpoint which has reached the throttle limit.
But which endpoint is this, we have 11?
Let's dump the related ServiceHost type:
0:025> !do 01287a3401287a34Name: System.ServiceModel.ServiceHostMethodTable: 50e553e8EEClass: 50e55368Size: 136(0x88) bytes(C:\WINDOWS\assembly\GAC_MSIL\System.ServiceModel\3.0.0.0__b77a5c561934e089\System.ServiceModel.dll)Fields: MT Field Offset Type VT Attr Value Name79103c00 4000915 28 System.Boolean 1 instance 0 aborted79103c00 4000916 29 System.Boolean 1 instance 0 closeCalled.................................................................................................................
.................................................................................................................50e04530 4002be2 3c ...erviceDescription 0 instance 01287cd0 description00000000 4002be3 40 0 instance 01287b2c extensions00000000 4002be4 44 0 instance 012d6834 externalBaseAddresses00000000 4002be5 48 0 instance 01293a80 implementedContracts50e124c0 4002be6 4c ...nceContextManager 0 instance 01287b58 instances7910b5d8 4002be7 6c System.TimeSpan 1 instance 01287aa0 openTimeout50df93e0 4002be8 50 ...rformanceCounters 0 instance 01295ef4 servicePerformanceCounters50e1272c 4002be9 54 ...r.ServiceThrottle 0 instance 01287b80 serviceThrottle50e321ac 4002bea 58 ...erviceCredentials 0 instance 00000000 readOnlyCredentials50e1f218 4002beb 5c ...orizationBehavior 0 instance 012d67ec readOnlyAuthorization00000000 4002bec 60 0 instance 00000000 UnknownMessageReceived7a751b00 4002bdd 74c System.Uri 0 static 00fe546c EmptyUri790f8a7c 4003075 74 System.Object 0 instance 00000000 singletonInstance790ffe7c 4003076 78 System.Type 0 instance 01057f28 serviceType50e62118 4003077 7c ...ontractCollection 0 instance 01287e00 reflectedContracts7910b450 4003078 80 System.IDisposable 0 instance 00000000 disposableInstance
The name of the endpoint can be found in this ServiceDescription
0:025> !do 01287cd0 Name: System.ServiceModel.Description.ServiceDescriptionMethodTable: 50e04530EEClass: 50e04478Size: 32(0x20) bytes(C:\WINDOWS\assembly\GAC_MSIL\System.ServiceModel\3.0.0.0__b77a5c561934e089\System.ServiceModel.dll)Fields: MT Field Offset Type VT Attr Value Name00000000 400306f 4 0 instance 01287cf0 behaviors790f9244 4003070 8 System.String 0 instance 01287d9c configurationName50dd0b68 4003071 c ...ndpointCollection 0 instance 01287d28 endpoints790ffe7c 4003072 10 System.Type 0 instance 01057f28 serviceType50dda3c4 4003073 14 ...scription.XmlName 0 instance 00000000 serviceName790f9244 4003074 18 System.String 0 instance 00fe7a2c serviceNamespace
It should be inside configurationName.
0:025> !do 01287d9c Name: System.StringMethodTable: 790f9244EEClass: 790f91a4Size: 100(0x64) bytes(C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)String: **********.********.ExternalServiceFields: MT Field Offset Type VT Attr Value Name790fdb60 4000096 4 System.Int32 1 instance 42 m_arrayLength790fdb60 4000097 8 System.Int32 1 instance 41 m_stringLength790fad38 4000098 c System.Char 1 instance 4d m_firstChar790f9244 4000099 10 System.String 0 shared static Empty >> Domain:Value 001475b8:790d57b4 <<79122994 400009a 14 System.Char[] 0 shared static WhitespaceChars >> Domain:Value 001475b8:00fc13f4 <<
Here we go. The endpoint which is saturated with the max throttle limit is ExternalService. When the GUI launches it actually contacts this endpoint to get some data.