Kenny has a very good blog entry on this topic. I want to add a few more points from a performance perspective.

InstanceContextMode

The default value of InstanceContextMode for a WCF service is PerSession. This means different things for different bindings. For sessionful bindings such as NetTcpBinding and NetNamedPipeBinding with default settings, a single service instance serves all requests from the same client for the lifetime of the connection. For non-sessionful bindings such as BasicHttpBinding, no session is established, so one service instance is created per request. For the latter, PerSession therefore has the same effect as PerCall.
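As a minimal sketch of the behavior described above, the service below spells out the default instance mode explicitly. The contract ICounter and the class CounterService are hypothetical names used only for illustration; the attribute and enum come from System.ServiceModel.

```csharp
using System.ServiceModel;

// Hypothetical contract, used only for illustration.
[ServiceContract]
public interface ICounter
{
    [OperationContract]
    int Increment();
}

// PerSession is already the default; it is written out here for clarity.
[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerSession)]
public class CounterService : ICounter
{
    // Instance state: lives as long as the service instance does.
    private int _count;

    public int Increment()
    {
        // Sessionful binding (e.g. NetTcpBinding): the count grows across
        // calls made on the same client channel.
        // Non-sessionful binding (e.g. BasicHttpBinding): a new instance is
        // created for every call, so the count starts from zero each time.
        return ++_count;
    }
}
```

Hosting the same class over both kinds of binding and calling Increment twice from one client is a quick way to see whether the instance is being reused.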

InstanceContextMode Performance Impact

If you have a single client for the service, then for sessionful bindings you would observe slightly different throughput for the different modes. From fastest to slowest:

·         PerSession, Single, PerCall

Single is slower than PerSession because the service needs to maintain a global locking mechanism, no matter how many clients call the service. It is reasonable that PerCall is the slowest, because the overhead of creating a service instance on each call is fairly high. However, the order is different for non-sessionful bindings. Here it is, from fastest to slowest:

·         Single, PerCall or PerSession

This is the case because “PerSession” is treated as “PerCall” for non-sessionful bindings.

Scalability and Server CPU Utilization

What happens if multiple clients call the service concurrently? To answer that, we need to understand how the WCF runtime dispatches requests on the server side. As Kenny pointed out, this is controlled by another property, “ConcurrencyMode”, of the ServiceBehavior attribute on the service. The default value is Single, which means that WCF dispatches requests to each service instance sequentially, one at a time. For the PerCall and PerSession instance modes, this is very natural. For the Single instance mode, however, it means that all client requests are queued up and dispatched one after another to the same service instance. The advantage is that you don’t need to worry about synchronization in your service operation implementation. The downside is that the service cannot scale to handle many client requests concurrently.

I have a short story about this last case. I was once testing server throughput for a singleton service with BasicHttpBinding, with 24 clients calling the service at the same time. I noticed that the server throughput was low and the CPU utilization was only around 60%. When I debugged the service process, I saw that only one thread was actively processing requests while the other requests waited in the queue. Then I realized that I was using the default setting of ConcurrencyMode.

In conclusion, set ConcurrencyMode to Multiple if you want high throughput from your singleton service, and make sure to perform your own synchronization in your service operations!
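Here is a rough sketch of what that advice could look like in code. IHitCounter and HitCounterService are hypothetical names chosen for this example, and Interlocked is just one possible way to synchronize; a lock statement would work as well for more complex shared state.

```csharp
using System.ServiceModel;
using System.Threading;

// Hypothetical contract, used only for illustration.
[ServiceContract]
public interface IHitCounter
{
    [OperationContract]
    long Hit();
}

// One shared instance for all clients, with requests dispatched concurrently.
[ServiceBehavior(
    InstanceContextMode = InstanceContextMode.Single,
    ConcurrencyMode = ConcurrencyMode.Multiple)]
public class HitCounterService : IHitCounter
{
    // Shared by every caller, so access must be synchronized by hand.
    private long _hits;

    public long Hit()
    {
        // With ConcurrencyMode.Multiple, WCF no longer serializes calls,
        // so the increment is made atomic explicitly.
        return Interlocked.Increment(ref _hits);
    }
}
```

Without the atomic increment, concurrent calls could interleave and lose updates, which is exactly the kind of problem the default ConcurrencyMode hides from you.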