Recently, I have worked with several internal and external customers that are hitting an odd error when using WCF's TCP Port Sharing Service or Tcp Activation Service. The symptoms include client's timing out during channel open, an event log entry on the server, and, in some cases, a server process that doesn't ever respond to new connection requests once this problem occurs.

The Symptoms:

The two symptoms that are consistently seen for each customer reporting this issue are:

  1. An entry in the Windows Event Log that looked something like this:

    Log Name:      System
    Source:        SMSvcHost 3.0.0.0
    Date:          2/3/2010 3:18:25 PM
    Event ID:      8
    Task Category: Sharing Service
    Level:         Error
    Keywords:      Classic
    User:          LOCAL SERVICE
    Computer:      Server1.Contoso.com
    Description:  An error occurred while dispatching a duplicated socket: this handle is now leaked in the process.
    ID: 6408
    Source: System.ServiceModel.Activation.TcpWorkerProcess/54166759
    Exception: System.TimeoutException: This request operation sent to http://schemas.microsoft.com/2005/12/ServiceModel/Addressing/Anonymous did not receive a reply within the configured timeout (00:01:00).  The time allotted to this operation may have been a portion of a longer timeout.  This may be because the service is still processing the operation or because the service was unable to send a reply message.  Please consider increasing the operation timeout (by casting the channel/proxy to IContextChannel and setting the OperationTimeout property) and ensure that the service is able to connect to the client.

    Server stack trace: 
       at System.ServiceModel.AsyncResult.End[TAsyncResult](IAsyncResult result)
       at System.ServiceModel.Channels.ServiceChannel.SendAsyncResult.End(SendAsyncResult result)
       at System.ServiceModel.Channels.ServiceChannel.EndCall(String action, Object[] outs, IAsyncResult result)
       at System.ServiceModel.Channels.ServiceChannelProxy.InvokeEndService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
       at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)
    Exception rethrown at [0]: 
       at System.ServiceModel.AsyncResult.End[TAsyncResult](IAsyncResult result)
       at System.ServiceModel.Activation.WorkerProcess.EndDispatchSession(IAsyncResult result)

    Process Name: SMSvcHost
    Process ID: 4056

  2. An entry in the WCF traces for the service (w3wp.exe or your process using port sharing service) showed the following error:

    System.ServiceModel.CommunicationException: There was an error reading from the pipe: The pipe has been ended. (109, 0x6d).
    --->System.IO.PipeException: There was an error reading from the pipe: The pipe has been ended. (109, 0x6d).
    at System.ServiceModel.Channels.PipeConnection.OnAsyncReadComplete(Boolean haveResult, Int32 error, Int32 numBytes)
    --- End of inner exception stack trace ---
    at System.ServiceModel.Channels.PipeConnection.EndRead()
    at System.ServiceModel.Channels.DelegatingConnection.EndRead()
    at System.ServiceModel.Channels.TracingConnection.EndRead()
    at System.ServiceModel.Channels.SessionConnectionReader.OnAsyncReadComplete(Object state)

What causes this to occur? 

 It turns out there are three known issues (there may be more) that can cause this problem:

  1. The CPU Spikes to 100% and last longer than 1 minute.
    • System CPU is too busy to process IO operations.   If the system is too busy, it will not be able to process incoming requests from SMSvcHost.exe and those requests will timeout.
    • The best tool for figuring out if this is what is causing your problem is going to be perfmon - capture CPU perfmon logs for your process and watch for spikes that last for 1 minute or longer.
  2. Long running service operations (> 1 minute) with service throttling enabled. 
    • In one customer’s case, they had configured the MaxConcurrentCalls throttle on the ServiceThrottlingBehavior to 16, then inside of their service operations they were calling out to a back end service.  The calls to the back end service were taking longer than they expected, causing 16 operations to be blocked within the service, which resulted in the MaxConcurrentCalls throttle being hit.   
    • You can detect this problem by turning on warning level trace for WCF.  Any time a throttle is hit, we will log a warning to the trace log.
  3. Service is hosted in IIS and it takes a long time to activate the service when the first message is received. 
    • For example, one customer hit this when the signature validation for the assemblies tried to check for revoked certificates.  The call to check the cert was timing out, which resulted in 15 seconds delay for each signed assembly.  Since they had multiple signed assemblies, the delay was larger than 1 minute. Here is a blog post about certification validation delays and they ways to workaround these problems.
    • In another case, the customer's application was taking longer than 1 minute for the code to compile.  Pre-compiling the application before deploying is one possible workaround in this instance.

Digging into the problem: 

When using TCP port sharing or activation in IIS, the listening TCP socket actually resides inside SMSvcHost.exe.  When a new connection arrives from a client, we read the “preamble” on the connection, which includes the via address which helps us know which service needs to handle this client connection.  Once we identify the target service that should handle the connection, we duplicate the socket handle and then send it over a named pipe connection to the target service.  In current releases, we have hard coded the timeout for the call to give the socket handle to the service to 1 minute.  If that call takes longer than 1 minute, we kill the named pipe connection and log the event log. 

The reason the “handle is now leaked in the process” is due to the way the socket handle gets duplicated.  When calling into windows to create a duplicate handle to that socket, we tell the API which process should own the duplicated handle.  Since the duplicated handle is owned by the target service process, SMSvcHost.exe cannot close the handle – only the target process can close it.  Given that we failed to let the other process know that it now owns that handle, it will not close that handle until that process is closed and the OS does the auto-closing of all opened handles.

Most of the customers I have worked with on this issue have been able to capture a memory dump of the service process (e.g. w3wp.exe) by breaking into the debugger on the first chance exception of type System.IO.PipeException.  This dump allows you to see what is happening inside the service process at the time the named pipe connection gets closed due to a timeout inside of SMSvcHost.exe. 

Unfortunately, there isn’t currently a fix available, but we will try to address some of these pain points in upcoming releases.  If you find that this issue is blocking you, please contact customer support.

Update: A fix is now available.