Note: Cross posted from Sajay.
One of the performance improvements we made in WCF 4.0 was to enable concurrent receives. This greatly helps scenarios where the receive path has to do some kind of work, such as DB authentication of usernames and passwords, or custom channels that need to log something. In short, if any path during message receive requires some concurrency, this is the knob you need to consider.
Here are some common questions and answers about concurrent receives. They should also help you decide whether you need this and, if you do, what value would benefit you.
<dispatcherSynchronization maxPendingReceives="5" />
Why do we need this?
I have custom username/password validation that seems to be taking a long time, and requests are processed sequentially.
I have a custom channel that is taking a long time, and messages are not processed concurrently.
I have other extensibility points that cause my channel to block.
Rule of thumb: do not do blocking work in your channels!
Q: Then where can I do the work?
A: There are 2 areas where work can be done between your application and the transport: the channel layer and the dispatcher. You need to understand the receive loop mentioned here. Try to push your work to the dispatcher side as much as possible, using extensibility points like message inspectors, and avoid blocking work in the channel layer. The overhead is that you need to copy the message, since you cannot modify the message object, so memory is the resource you need to think about.
Q: What happens when we do some blocking work in the channel layer?
A: You end up blocking your receive loop! The next message will not be received until the current pass through the loop completes. By default the channel’s receive loop is single threaded, since its only duty is to take a message off the transport and hand it over to the dispatcher. It is a low-level message pump that does nothing but give work to the dispatcher, so any additional overhead slows down the main message loop and stops other messages from being picked up and dispatched. The usual symptom is client timeouts. Think of it like the Windows message pump: block it and your UI freezes up.
Q: I don’t have an option and need to do this work in the channel, what can I do?
A: Before 4.0, if you had to do blocking work at the channel layer, there was no way around it other than dealing with the issue directly in the channel, because the ideal channel doesn’t block. Until .NET Framework 3.5 the only options were moving the work up or writing a really wacky custom channel that does the work on another thread and continues the loop. If you really need a sample, ping me. In 4.0 we recognized that there are cases where you really have no option and are ready to take the performance penalty. For that reason we introduced a knob called MaxPendingReceives.
Q: What does MaxPendingReceives do?
A: This knob issues multiple receives to the underlying channel. You can configure a DispatcherSynchronizationBehavior and specify the MaxPendingReceives value indicating how many concurrent receives can be issued on the channel.
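As a rough sketch of where this setting lives in configuration, the fragment below attaches it as an endpoint behavior and references that behavior from an endpoint. The service name, contract, binding choice, and behavior name are hypothetical placeholders; only the dispatcherSynchronization element and its maxPendingReceives attribute come from this post.

```xml
<system.serviceModel>
  <behaviors>
    <endpointBehaviors>
      <!-- "concurrentReceives" is a hypothetical behavior name -->
      <behavior name="concurrentReceives">
        <!-- Allow up to 5 receives to be pending on the channel at once -->
        <dispatcherSynchronization maxPendingReceives="5" />
      </behavior>
    </endpointBehaviors>
  </behaviors>
  <services>
    <!-- Service name, contract, and binding are placeholders for illustration -->
    <service name="MyCompany.MyService">
      <endpoint address=""
                binding="netTcpBinding"
                contract="MyCompany.IMyService"
                behaviorConfiguration="concurrentReceives" />
    </service>
  </services>
</system.serviceModel>
```

Because this is an endpoint behavior, it only affects endpoints that name it in behaviorConfiguration, so you can turn it on for the one endpoint whose channel stack does blocking work and leave the rest alone.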
Q: What is the overhead for this?
A: The implementation is a simple queue, and the overhead is generally less than a percent for small values. The larger the queue, the more concurrent receives there are, and past a point your throughput stops improving because you get more context switches and contention rather than useful concurrency.
Q: Why/How do I configure MaxPendingReceives?
A: Consider this value only after you have properly configured the service throttling behavior, since those throttles are generally your first level of control. If you don’t understand the points in the earlier section about the receive loop, or you don’t have any of these extensibility points, then you probably don’t need this behavior and should take a harder look at your service performance before you even consider turning this knob on.
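For reference, service throttling is configured as a service behavior, separately from the endpoint behavior that carries maxPendingReceives. The fragment below is only a sketch: the behavior name is hypothetical and the numbers are illustrative values, not recommendations.

```xml
<system.serviceModel>
  <behaviors>
    <serviceBehaviors>
      <!-- "throttled" is a hypothetical behavior name -->
      <behavior name="throttled">
        <!-- Illustrative limits only; tune these for your own workload first -->
        <serviceThrottling maxConcurrentCalls="64"
                           maxConcurrentSessions="400"
                           maxConcurrentInstances="464" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>
```

A service picks this up by setting behaviorConfiguration="throttled" on its service element. Get these limits right before touching the receive side.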
Consider an 8 core system and assume you have some CPU-intensive operation, like computing a hash, that you need to do in the channel. You can then have about 8 messages being processed simultaneously and no more, since any additional message would just be accepted and then wait until a CPU is available (remember that your service operation also needs CPU time). So if the work is CPU bound, try to limit the value to the number of cores.
On the other hand, if the blocking work is something like disk access and the machine can churn out more messages while it waits, you need to tune this value to a sweet spot, and there is no magic number. Generally we have seen that staying close to the number of cores, or 2 times that, is optimal. Bottom line: CHECK YOUR CPU UTILIZATION. If it’s already maxed out, raising this value will only reduce your throughput.
<dispatcherSynchronization maxPendingReceives="4" />
Q: Are there any issues to turning on MaxPendingReceives?
A: There are 2 side effects. One is that multiple BeginTryReceive calls will be issued on your channel before an EndTryReceive completes; the second is the overhead mentioned above. Keep in mind that since you are enabling concurrent receives on your channel, your channel also needs to be concurrency aware. The channel stack’s BeginTryReceive will be invoked N times to enable multiple receives, which fills up the receive queue initially. This is a change in how channels have been used until now: EndTryReceive doesn’t happen immediately after the first BeginTryReceive, and completions might not happen in the same order. If you require completion ordering, your service should also guarantee ordering. That would be a topic by itself, so the point here is that if the service behavior allows concurrent processing, the channel also lets messages come through in FIFO completion order.