Customers reported the following WCF performance issue recently:

· A WCF client sends 10 requests concurrently, on 10 different threads, to a self-hosted WCF service. The service operation is trivial: it just sleeps for 1 second. Yet the latencies of the 10 requests, measured on the client side, ranged from 1 to 3.3 seconds. Why is this?
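The measurement in this scenario can be sketched roughly as follows. This is not the attached sample: the class name `ConcurrentLatencyProbe` and the `Measure` helper are made up for illustration, and the `Thread.Sleep(1000)` delegate in `Main` is only a stand-in for the real WCF proxy call.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class ConcurrentLatencyProbe
{
    // Hypothetical helper: runs `call` once on each of `count` threads,
    // releases all threads at the same moment, and returns each latency in ms.
    public static long[] Measure(int count, Action call)
    {
        var latencies = new long[count];
        var threads = new Thread[count];
        using (var start = new ManualResetEvent(false))
        {
            for (int i = 0; i < count; i++)
            {
                int id = i; // private copy so each closure sees its own index
                threads[i] = new Thread(() =>
                {
                    start.WaitOne();              // wait for the common start signal
                    var sw = Stopwatch.StartNew();
                    call();                       // in the real test: the WCF proxy call
                    sw.Stop();
                    latencies[id] = sw.ElapsedMilliseconds;
                    Console.WriteLine("Thread {0} takes: {1} ms", id, latencies[id]);
                });
                threads[i].Start();
            }
            start.Set();                          // fire all requests concurrently
            foreach (var t in threads) t.Join();  // all threads finish before disposal
        }
        return latencies;
    }

    public static void Main()
    {
        // Stand-in for the service call: sleeps locally for 1 second.
        Measure(10, () => Thread.Sleep(1000));
    }
}
```

With the local stand-in, every thread should report roughly 1 second. The spread described above only shows up when the delegate calls the actual service.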

First of all, WCF uses managed I/O threads to handle requests. The CLR ThreadPool keeps a certain number of idle I/O threads alive instead of destroying them. When more I/O threads are needed than that, the ThreadPool has to create them, which is relatively expensive and delays the requests waiting for them. The number of idle threads kept is specified by the “MinIOThreads” setting. You can call the ThreadPool.GetMinThreads() API to check the settings for your application; the I/O value is its second output parameter, completionPortThreads. By default, in a standalone application, this setting equals the number of CPUs on the machine. For example, on my 2-core laptop, this setting is 2. I observed the following latencies in the above scenario:

Thread 0 takes: 1009 ms

Thread 2 takes: 1286 ms

Thread 3 takes: 1799 ms

Thread 4 takes: 2016 ms

Thread 6 takes: 2241 ms

Thread 5 takes: 2256 ms

Thread 7 takes: 2752 ms

Thread 8 takes: 2766 ms

Thread 9 takes: 2967 ms

Thread 1 takes: 3315 ms
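To see which minimums your own process runs with, a small check along these lines can help. It is a minimal sketch; the class name `ThreadPoolMinimums` is arbitrary, and the numbers printed depend on the machine and host.

```csharp
using System;
using System.Threading;

public static class ThreadPoolMinimums
{
    public static void Main()
    {
        int minWorker, minIo;
        // First out value: worker threads. Second: I/O (completion port) threads.
        ThreadPool.GetMinThreads(out minWorker, out minIo);

        Console.WriteLine("Processors:          {0}", Environment.ProcessorCount);
        Console.WriteLine("Min worker threads:  {0}", minWorker);
        Console.WriteLine("Min I/O threads:     {0}", minIo);
    }
}
```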

To avoid these delays, raise the MinIOThreads setting on the service side with the ThreadPool.SetMinThreads() API so that it covers the expected number of concurrent requests, for example:

ThreadPool.SetMinThreads(Environment.ProcessorCount, 10);
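A slightly more defensive variant of that call is sketched below. It assumes you only want to raise the I/O minimum and leave the worker minimum at whatever it currently is. `EnsureMinIoThreads` is a hypothetical helper name, not a framework API.

```csharp
using System;
using System.Threading;

public static class IoThreadTuning
{
    // Hypothetical helper: raises the minimum I/O thread count to `desiredIo`
    // if it is currently lower, and keeps the worker minimum unchanged.
    public static bool EnsureMinIoThreads(int desiredIo)
    {
        int minWorker, minIo;
        ThreadPool.GetMinThreads(out minWorker, out minIo);
        if (minIo >= desiredIo)
        {
            return true; // already high enough, nothing to change
        }
        // SetMinThreads returns false when it rejects the values,
        // for example when they exceed the configured maximums.
        return ThreadPool.SetMinThreads(minWorker, desiredIo);
    }

    public static void Main()
    {
        bool applied = EnsureMinIoThreads(10);

        int worker, io;
        ThreadPool.GetMinThreads(out worker, out io);
        Console.WriteLine("Applied: {0}, min I/O threads now: {1}", applied, io);
    }
}
```

Call it once during service startup, before opening the ServiceHost. Checking the return value matters because SetMinThreads reports rejected values through it rather than by throwing.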

However, you may notice that on .NET 3.5 this still does not solve the problem. Why?

Eric, the CLR ThreadPool expert, told me that there is a bug in .NET 3.5: the ThreadPool does not honor the value passed to SetMinThreads(), so the above call has no effect. Fortunately, this has been fixed in .NET 4.0, and Microsoft has provided the following QFE for 3.5:

Once you have this QFE installed, you get much better results:

Thread 1 takes: 1026 ms

Thread 0 takes: 1028 ms

Thread 3 takes: 1081 ms

Thread 4 takes: 1074 ms

Thread 2 takes: 1084 ms

Thread 6 takes: 1072 ms

Thread 5 takes: 1073 ms

Thread 8 takes: 1029 ms

Thread 9 takes: 1021 ms

Thread 7 takes: 1154 ms

The sample code is attached.