http://kennyw.com/indigo/51
I’m going to talk about the internals of WCF buffer pooling as it stands today. Kenny and Andrew already have posts on buffer pooling that cover it at a higher level. Here I dive into how the buffer pool was optimized further for working with the large object heap. This didn’t make it into .NET 4.0 RTM, but we created a hotfix that customers can download if they hit this specific issue of fragmentation and garbage collection - http://support.microsoft.com/kb/983182
This doesn’t impact most customers: under steady state, once the application has been running for a while, they generally won’t observe any issue at all.
For services that were transferring a large amount of data per operation, we noticed a large number of un-rooted buffers in the large object heap (LOH - http://msdn.microsoft.com/en-us/magazine/cc534993.aspx). The expectation is that the buffer manager would reuse and hold on to the buffers that were allocated. But there were a large number of objects in the large object heap that did not have any valid root, indicating that they were allocated and dropped. The number of such non-rooted objects would sometimes spike and cause the LOH to grow to about 1.8GB until a full GC occurred. This is generally not an issue once the application reaches a stable state, but until it does there can be very large GC churn.
Here is a snapshot from the debugger that shows the LOH; only 2 objects are rooted to the buffer pool.
You can see that the memory has a large number of 8MB chunks, of which only a few are rooted to the buffer pool; the rest are available for garbage collection, which can take a while. Such a spike in memory might be a concern for a memory-sensitive app. Remember these will be reclaimed once a full GC happens. For most applications this would not be an issue, since over time the pool is trained automatically and you would see the application's usage reflect the allocation patterns.
Internally we have a class called the PooledBufferManager, and to understand this issue we need an idea of how the PooledBufferManager uses the InternalBufferPool, where the actual implementation lives. The InternalBufferPool uses a type called SynchronizedPool<byte[]>. The main benefits of the SynchronizedPool are its lock-free nature and the training heuristics it provides. It can build thread affinity for buffers and tune quota depending on usage. This is one highly tuned piece of engineering. The PooledBufferManager holds a number of buffer pools whose sizes increase by powers of 2, plus one pool whose size equals the MaxBufferSize.
The SynchronizedPool itself contains a locked global pool, and it starts off with space for just a single entry. As the pool gets used more and more, the numbers of takes and returns are used to compute whether the number of items in a BufferPool should grow; the PooledBufferManager does this. The side effect is that until the pool grows large enough, buffers can’t be returned, so newly allocated buffers are dropped and the GC has to collect them. Concurrent requests obviously put a number of such buffers in flight, but only a few can be returned until the buffer pool grows enough to hold all the returns. The training does not allow very fast growth: the size of a pool grows by only one at a time.
This is not a problem for objects smaller than 85,000 bytes, since they can be collected by an ephemeral GC, mostly in gen0. The issue with objects larger than that is that they are allocated on the large object heap, which is not collected until a full GC actually happens. So during training, large buffers are dropped but not collected, and they wait for a full GC to be reclaimed.
The second part of this issue occurs when a particular buffer builds an affinity with a thread. Once a buffer is associated with a thread, it is moved from the global pool into the per-thread pool. This also means that the buffer can be taken and returned only by that particular thread until enough misses occur to promote another thread. If another thread tries to return a buffer but the total number of entries in the SynchronizedPool is already maxed out, that buffer is also dropped, and requests by other threads also cause allocations.
Because of these two major code paths in the SynchronizedPool, we noticed a very large number of drops and allocations.
To avoid hitting these code paths for large objects, we use a simpler pool that has no training and only a synchronized stack, like the GlobalPool in SynchronizedPool<T>. Quota tuning then happens much faster, a buffer is never dropped while there is available capacity, and since there is no thread affinity there are no drops on return as long as there is a free slot on the stack.
Hence we need the BufferPool to support two different kinds of behavior.
1. For buffers < 85000 we need the default behavior of the buffer pool, since scenarios with very small amounts of data are very sensitive to changes.
2. For buffers >= 85000 we use a new, simpler pool that doesn’t use SynchronizedPool<T> and saves GC cost.
This means that the buffer pool holds onto LOH objects much more aggressively. The result is that a pool of large buffers forms more quickly, which reduces GC work since there are fewer large buffer allocations.
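To make the second behavior concrete, here is a minimal sketch of what such a simple pool could look like. This is not the code that shipped in the hotfix: the class name LargeBufferPool, its constructor parameters and the fixed limit are hypothetical and chosen only for illustration. It shows just the two properties described above: a single locked stack with no training, and no thread affinity, so any thread's return succeeds whenever a slot is free.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; not the actual WCF InternalBufferPool implementation.
public sealed class LargeBufferPool
{
    private readonly int bufferSize;      // size of every buffer in this pool
    private readonly int limit;           // maximum number of buffers retained
    private readonly Stack<byte[]> items; // the only storage: one locked stack

    public LargeBufferPool(int bufferSize, int limit)
    {
        this.bufferSize = bufferSize;
        this.limit = limit;
        this.items = new Stack<byte[]>(limit);
    }

    // Reuse a pooled buffer if one exists; otherwise allocate a new one.
    public byte[] Take()
    {
        lock (items)
        {
            if (items.Count > 0)
            {
                return items.Pop();
            }
        }
        return new byte[bufferSize];
    }

    // Any thread may return a buffer; it is kept whenever a slot is free.
    public bool Return(byte[] buffer)
    {
        if (buffer == null || buffer.Length != bufferSize)
        {
            return false;
        }
        lock (items)
        {
            if (items.Count >= limit)
            {
                return false; // pool is full: the buffer is dropped for the GC
            }
            items.Push(buffer);
            return true;
        }
    }
}
```

With a pool like this, the only time a large buffer is dropped is when the stack is already at its limit, which is the behavior we want for LOH-sized buffers.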
netsh interface ipv4 show subinterfaces
Does a referenced assembly get loaded if no types in the assembly are “used”?
The term “used” is very subjective. For a developer it probably means that you never created an instance of a type or called a method on it. But this does not cover the whole story. You can instead consider the reasons an assembly load occurs. Suzanne’s blog on assembly loading failures gives you a good understanding of failures, if that is what you are interested in. This post focuses on how to identify exactly what is causing an assembly to load.
We in the WCF team are very cautious about introducing assembly dependencies and about how our code paths can cause assembly loads, since this impacts the reference set of your process. Images that get loaded during a WCF call can become the cause of slow startup, since every assembly is a potential disk lookup, and the larger the number, the higher the impact on startup. As guidance for quick app startup: you can keep a lot of unnecessary assemblies from being loaded if you refactor types properly.
To demonstrate, here is a simple application with 2 dependencies. For the simple program shown below, will “b.dll” be loaded?
namespace App
{
    using a;
    using b;

    class Program
    {
        static void Main(string[] args)
        {
            Invoke();
        }

        private static void Invoke()
        {
            TestClass.Helper();
        }
    }

    class TestClass
    {
        public static FromB testInstance = null;
        public static void Helper() { }
        public static void Helper(FromA b) { }
        public static void Helper(FromB b) { }
    }
}
YES!! Even though TestClass never instantiates a type from b.dll, it declares a static field of type FromB, and that causes the JIT to resolve the type when the static method is invoked on TestClass. However, a.dll is not loaded, because the JIT doesn’t need to compile all static methods on the type. Here is the stack when b.dll is loaded. You can see that the field is being resolved, which causes a module load. You can set a breakpoint on module load using sxeld <dllName>
0:000> sxeld b.dll
0:000> g
(1a18.1a64): Unknown exception - code 04242420 (first chance)
ModLoad: 5bb00000 5bb08000 b.dll
eax=00000000 ebx=00000000 ecx=00000002 edx=00000000 esi=766e4cc8 edi=0046b65c
eip=77a9fc02 esp=0039a668 ebp=0039a6a0 iopl=0 nv up ei pl nz na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000206
ntdll!ZwMapViewOfSection+0x12:
77a9fc02 83c404 add esp,4
0:000> k
ChildEBP RetAddr
0039a668 76b6dfae ntdll!ZwMapViewOfSection+0x12
0039a6a0 6e366ffc KERNELBASE!MapViewOfFileEx+0x81
0039a6e8 6e36703e clr!CLRMapViewOfFileEx+0x23
0039a708 6e37366a clr!CLRMapViewOfFile+0x19
0039a970 6e373745 clr!MappedImageLayout::MappedImageLayout+0x1ce
0039a9ac 6e3737b2 clr!PEImageLayout::Map+0x2d
0039a9e0 6e366467 clr!PEImage::GetLayoutInternal+0x7c
0039aa28 6e3641a8 clr!PEImage::GetLayout+0xdd
0039aac0 6e363ce2 clr!RuntimeOpenImageInternal+0x131
0039ab04 6e363d6c clr!GetAssemblyMDInternalImportEx+0x9c
0039ab1c 6e47dfda clr!CreateMetaDataImport+0x16
0039ab58 6e47e08c clr!CAssemblyManifestImport::InitAndLoadMetaData+0x4f
0039ab88 6e47e610 clr!CreateAssemblyManifestImport+0x67
0039b24c 6e47f5fa clr!CAsmDownloadMgr::CreateAssembly+0x208
0039b5a4 6e47eb8c clr!CAsmDownloadMgr::DoSetupRFS+0xb7
0039b828 6e47e973 clr!CAsmDownloadMgr::DoSetup+0x22d
0039b888 6e47cd09 clr!CAssemblyDownload::DoSetup+0xae
0039b8bc 6e47ce9e clr!CAssemblyDownload::DownloadComplete+0xb6
0039bb24 6e47cf03 clr!CAssemblyDownload::KickOffDownload+0x35e
0039bbb4 6e36ae31 clr!CAssemblyName::BindToObject+0x8be
0039bc48 6e36aff2 clr!FusionBind::RemoteLoad+0x229
0039bcd4 6e36b180 clr!FusionBind::LoadAssembly+0x116
0039bf84 6e369e3b clr!AssemblySpec::FindAssemblyFile+0xf4
0039ccb4 6e367d35 clr!AppDomain::BindAssemblySpec+0x969
0039cd58 6e367bc1 clr!PEFile::LoadAssembly+0xbf
0039ce10 6e367e63 clr!Module::LoadAssembly+0x137
0039d0a4 6e321f02 clr!Assembly::FindModuleByTypeRef+0x256
0039d0f4 6e30f8fc clr!ClassLoader::LoadTypeDefOrRefThrowing+0xfb
0039d218 6e330a2a clr!SigPointer::GetTypeHandleThrowing+0x960
0039d26c 6e330b7d clr!CEEInfo::getFieldTypeInternal+0x8b
0039d310 6f864b13 clr!CEEInfo::getFieldInfo+0x35c
0039da30 6f85277d clrjit!Compiler::impImportBlockCode+0x3641
0039da94 6f85291e clrjit!Compiler::impImportBlock+0x7a
0039daac 6f85296a clrjit!Compiler::impImport+0x1bf
0039dab8 6f8529a7 clrjit!Compiler::fgImport+0x20
0039dac8 6f853b02 clrjit!Compiler::compCompile+0x45
0039db04 6f853c1a clrjit!Compiler::compCompileHelper+0x2dd
0039db78 6f853d54 clrjit!Compiler::compCompile+0x1e2
0039dc58 6f8545d9 clrjit!jitNativeCode+0x154
0039dc7c 6e3334b6 clrjit!CILJit::compileMethod+0x25
0039dce0 6e333535 clr!invokeCompileMethodHelper+0x5c
0039dd28 6e33357b clr!invokeCompileMethod+0x31
0039dd90 6e3337e1 clr!CallCompileMethodWithSEHWrapper+0x2e
0039e0f8 6e33f817 clr!UnsafeJitFunction+0x3eb
0039e1d0 6e33f97f clr!MethodDesc::MakeJitWorker+0x288
0039e240 6e3264f6 clr!MethodDesc::DoPrestub+0x49d
0039e2b8 6e302dbf clr!PreStubWorker+0x134
0039e2e8 6e30297e clr!ThePreStub+0x16
0039e2f8 6e323bc9 clr!CallDescrWorker+0x34
0039e374 6e324dfb clr!CallDescrWorkerWithHandler+0x8d
0039e400 6e324e64 clr!DispatchCallDebuggerWrapper+0x6f
0039e43c 6e328198 clr!DispatchCallSimple+0x4a
0039e4b4 6e3280cc clr!MethodTable::RunClassInitEx+0xdb
0039ee00 6e3ad21d clr!MethodTable::DoRunClassInitThrowing+0x4e5
0039ee68 6e3264f6 clr!MethodDesc::DoPrestub+0x11b
0039eee0 6e302dbf clr!PreStubWorker+0x134
0039ef10 005900c8 clr!ThePreStub+0x16
0039ef18 0059008c ConsoleApplication1!App.Program.Invoke()+0x18
0039ef24 6e30297e ConsoleApplication1!App.Program.Main(System.String[])+0x1c
0039ef34 6e323bc9 clr!CallDescrWorker+0x34
0039efb0 6e3246ec clr!CallDescrWorkerWithHandler+0x8d
0039f0f4 6e32471f clr!MethodDesc::CallDescr+0x18c
0039f110 6e32473d clr!MethodDesc::CallTargetWorker+0x1f
0039f128 6e4a863d clr!MethodDescCallSite::Call_RetArgSlot+0x1a
0039f290 6e4a8582 clr!ClassLoader::RunMain+0x24c
0039f4f8 6e4a8b98 clr!Assembly::ExecuteMainMethod+0xbf
0039fa00 6e4a8c24 clr!SystemDomain::ExecuteMainMethod+0x568
0039fa58 6e4a8d5c clr!ExecuteEXE+0x58
0039faa4 6e452e85 clr!_CorExeMainInternal+0x19a
0039fadc 716ba791 clr!_CorExeMain+0x4d
0039faec 71727f16 mscoreei!_CorExeMain+0x4a
0039fafc 71724de3 MSCOREE!ShellShim__CorExeMain+0x99
0039fb04 766e3677 MSCOREE!_CorExeMain_Exported+0x8
0039fb10 77ab9d72 KERNEL32!BaseThreadInitThunk+0xe
0039fb50 77ab9d45 ntdll!__RtlUserThreadStart+0x70
0039fb68 00000000 ntdll!_RtlUserThreadStart+0x1b
0:000>
logman is your tool for this. Here is how you can query for all the sessions and also how to see values from a particular session.
c:\> logman -ets

Data Collector Set                      Type                          Status
-------------------------------------------------------------------------------
AITEventLog                             Trace                         Running
Audio                                   Trace                         Running
DiagLog                                 Trace                         Running
EventLog-Application                    Trace                         Running
EventLog-System                         Trace                         Running
NtfsLog                                 Trace                         Running
SQMLogger                               Trace                         Running
UBPM                                    Trace                         Running
WdiContextLog                           Trace                         Running
MpWppTracing                            Trace                         Running
FSysAgentTrace                          Trace                         Running
MSMQ                                    Trace                         Running
MSDTC_TRACE_SESSION                     Trace                         Running
test_trace                              Trace                         Running

The command completed successfully.

c:\> logman test_trace -ets

Name:                 test_trace
Status:               Running
Root Path:            C:\
Segment:              Off
Schedules:            On
Segment Max Size:     500 MB

Name:                 test_trace\test_trace
Type:                 Trace
Output Location:      C:\09_19_44.etl
Append:               Off
Circular:             On
Overwrite:            Off
Buffer Size:          8
Buffers Lost:         0
Buffers Written:      1
Buffer Flush Timer:   0
Clock Type:           Performance
File Mode:            File

Provider:
Name:                 Microsoft-Windows-Application Server-Applications
Provider Guid:        {C651F5F6-1C0D-492E-8AE1-B4EFD7C9D503}
Level:                5
KeywordsAll:          0x0
KeywordsAny:          0xffffffff
Properties:           0
Filter Type:          0

The command completed successfully.
WCF enables throttling the execution of operations but not their completions. This becomes an issue when a large number of outstanding operations complete almost simultaneously, overwhelming the callback on the client with completions. Generally we don’t expect the client to issue an unbounded number of pending operations, but if you do end up with very high CPU usage and suspect that all your operations are stuck in a callback method that takes a lock, then you need to throttle the callbacks yourself.
You could try setting minThreads, but this affects the whole app domain. The issue is due to the large number of callbacks that come in concurrently. The sample attached throttles the callbacks so that only one thread executes completions while 20 threads start the operations, all of which complete almost simultaneously. The idea is to wrap the AsyncResult of your operation and complete only the required number of results in parallel; this throttles the service operation Ends automatically.
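The attached sample is not reproduced here, so as a rough illustration of the same idea, here is a minimal sketch of a completion throttle. It is not the sample's AsyncResult wrapper: the class name CompletionThrottle and its Post method are hypothetical. Completions are queued, and at most maxConcurrent threads drain the queue at any time; the other callback threads return immediately instead of piling up on a lock.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; the original sample wraps the operation's AsyncResult instead.
public sealed class CompletionThrottle
{
    private readonly object sync = new object();
    private readonly Queue<Action> pending = new Queue<Action>();
    private readonly int maxConcurrent;
    private int active; // number of threads currently draining the queue

    public CompletionThrottle(int maxConcurrent)
    {
        this.maxConcurrent = maxConcurrent;
    }

    // Called from the async callback; runs the completion now or leaves it queued.
    public void Post(Action completion)
    {
        lock (sync)
        {
            pending.Enqueue(completion);
            if (active >= maxConcurrent)
            {
                return; // a thread that is already draining will pick this up
            }
            active++;
        }
        Drain();
    }

    private void Drain()
    {
        while (true)
        {
            Action next;
            lock (sync)
            {
                if (pending.Count == 0)
                {
                    active--;
                    return;
                }
                next = pending.Dequeue();
            }
            try
            {
                next();
            }
            catch
            {
                lock (sync) { active--; } // give the slot back before rethrowing
                throw;
            }
        }
    }
}
```

A callback would then do something like throttle.Post(() => client.EndDoWork(result)), where client and EndDoWork stand in for your own proxy and operation. Because the check for an empty queue and the decrement of active happen under the same lock as the enqueue, a queued completion is never left behind.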
With xperf being adopted more and more, and with its rich stackwalking capabilities, it's only natural to use it for finding bottlenecks and the causes of thread switch-outs.
Finding the ready-thread information, what causes threads to switch out, and the stack that readied a thread when it switches back in is one way to determine the offending stack that causes other threads to switch out. This helps us identify potential hot locks, really expensive locks, or issues due to false sharing.
You can run the following command to capture stack traces with ready thread information.
xperf -on base+cswitch+dispatcher -stackwalk cswitch+readythread
In case you are not sure how to debug managed code with a crash/hang dump, you most likely need to read this first. Once you have SOS and mscordacwks (.NET 3.5 and up) loaded, you first dump the heap to find out if you have any service hosts at all.
Some broker implementations require creating a copy of the message and forwarding it to the backend. The broker might also slightly modify things like addressing headers on the message for proper routing within the DMZ. The problem is that we see a very high CPU cost in creating this message copy, and this also results in lower throughput. Note: Streamed transfer mode is not in scope for this article.
For all performance issues we need to measure and profile. To investigate this issue we initially tried to simulate the pattern of the broker by just copying the message and then forwarding it to a dummy backend service. We then took profiles of this to understand the actual cost of copying.
21,530
8.5
0.1
0.29
0.03
An 8% cost for copying seems acceptable considering the value of making the copy and being able to do other things if required. But this was not what was being observed. In the profiles from the actual broker we noticed about a 40% cost for creating a copy. This means that almost half the time is spent creating a message copy, so effectively your throughput would drop to almost half when the broker is configured to create a copy of the message. This excludes costs like logging. Evidently our simulation was not accurate, so we needed to isolate this further. We took in more functionality from the broker so that we would hit this expensive path. One of the key observations was that the message is copied just before it is forwarded. This means a bunch of manipulations had already been done on the message, whereas our simulation didn't perform any. So to get closer we needed to change some things on the message. To keep it simple we removed a header and added another header to the message, since most brokers modify headers before forwarding.
int headerIndex = input.Headers.FindHeader(header.Name, header.Namespace);
if (headerIndex >= 0)
{
    input.Headers.RemoveAt(headerIndex);
}
input.Headers.Add(header);
Eureka!! We observed our throughput went down and this was in line with what we were seeing in our broker. So we can see that CreateMessage and CreateBufferedCopy have increased in cost quite a bit.
12,782
14.83
0.08
27.69
0.02
This is the performance data we collected:
Copy and Forward: 98.6 %, 11317.6545777148
Copy forwarding with new header: 98.7 %, 6854.97017102
Now that we have identified the root cause, we also need a solution so that the broker can achieve the same functionality without taking up so much CPU.
The solution is actually quite simple: "Modify your message after you create the buffered copy". I wanted to give the solution before the analysis since most of you would probably not be interested in the analysis, but if you are, the rest should be interesting.
The most common way to create a copy of your message is Message.CreateBufferedCopy(int).
If headers have been modified, CreateBufferedCopy takes an alternative path using the DefaultMessageBuffer. The reason is that a full copy of the message has to be created if any buffered header has been modified. An internal property called headers.ContainsOnlyBufferedMessageHeaders is used to decide whether the faster BufferedMessageBuffer can be used to create the buffered copy. If there are any modified headers, we need to ensure that the message is fully marshaled over, and the buffer itself cannot simply be copied (for example, the user can add a reference type to the header). So we fall back to a path that fully reparses the message and creates a fully deserialized copy of the modified message. The main point here is that a copy should always be a deep copy, and no kind of modification should result in a message with shallow-copied parts. When you copy and create a message from the original, your message object gets its own copy of the headers that it can play around with without affecting the original incoming message. Message copy by itself is a fast operation, as you can see from the profile above, but copying a modified message can be very CPU intensive when using buffered transfer mode.
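As a sketch of that advice, the header manipulation from the simulation can be moved so that it happens on a message created from the buffer rather than on the incoming message. The helper name MessageCopyHelper.CopyThenReplaceHeader is hypothetical, and passing int.MaxValue as the buffer size is only an assumption to keep the example short; use a limit that fits your quotas.

```csharp
using System.ServiceModel.Channels;

public static class MessageCopyHelper
{
    // Hypothetical helper: copies first, then edits headers on the copy.
    public static Message CopyThenReplaceHeader(
        Message input, MessageHeader header, out MessageBuffer buffer)
    {
        // Copy while the incoming headers are still untouched.
        // Note that this consumes 'input'; use buffer.CreateMessage() from here on.
        buffer = input.CreateBufferedCopy(int.MaxValue);

        // Each CreateMessage call returns an independent Message with its own headers.
        Message outgoing = buffer.CreateMessage();

        int headerIndex = outgoing.Headers.FindHeader(header.Name, header.Namespace);
        if (headerIndex >= 0)
        {
            outgoing.Headers.RemoveAt(headerIndex);
        }
        outgoing.Headers.Add(header);
        return outgoing;
    }
}
```

The caller owns the returned buffer and should Close it once it has created all the messages it needs from it.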
For greater flexibility our router can be something like a pass-through router. If we are just calling a backend service, then we can use a generic contract to receive and forward messages to the backend service, as shown below.
Here we create a copy of the message to consume locally on the broker in case we want to validate some parts of the message, log it, and so on. Ideally the fastest option would be to forward the message directly, but applications sometimes require all incoming messages to be logged or validated at the entry point of the DMZ.
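The original listing did not survive in this copy of the post, so the following is only a sketch of what a generic pass-through contract with a local copy could look like. The names IGenericContract, PassThroughRouter and Inspect are hypothetical, the backend channel is assumed to be created elsewhere, and the singleton instancing is just one way to let the host construct the service with a constructor argument.

```csharp
using System.ServiceModel;
using System.ServiceModel.Channels;

[ServiceContract]
public interface IGenericContract
{
    // Action = "*" and ReplyAction = "*" accept and return any message, untyped.
    [OperationContract(Action = "*", ReplyAction = "*")]
    Message ProcessMessage(Message request);
}

[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single,
                 ConcurrencyMode = ConcurrencyMode.Multiple)]
public class PassThroughRouter : IGenericContract
{
    private readonly IGenericContract backend; // channel to the backend service

    public PassThroughRouter(IGenericContract backend)
    {
        this.backend = backend;
    }

    public Message ProcessMessage(Message request)
    {
        // Buffer once so the broker can read one copy and forward another.
        MessageBuffer buffer = request.CreateBufferedCopy(int.MaxValue);
        try
        {
            using (Message local = buffer.CreateMessage())
            {
                Inspect(local); // e.g. logging or validation on the broker
            }
            return backend.ProcessMessage(buffer.CreateMessage());
        }
        finally
        {
            buffer.Close();
        }
    }

    // Hypothetical hook; replace with whatever the broker needs to check.
    private static void Inspect(Message message)
    {
    }
}
```

If the broker does not need to look at the message at all, ProcessMessage can simply return backend.ProcessMessage(request) and skip the buffered copy.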
Next – How to Optimize Message Copying using CreateBufferedCopy?
I use the term broker and router very loosely here since they follow very similar guidelines as described here - WCF Broker Overview. Apologies for not being very rigid with these terms.
I will dive into best practices of building a router by progressing from a very simple implementation to a robust one through different scenarios and varying degrees of complexities.
The easiest implementation is by using a strongly typed contract with message forwarding as shown below.
[ServiceContract]
public interface IOrderService
{
    [OperationContract]
    Order[] GetOrders(int numOrders);
}

class OrderService : IOrderService
{
    // backendProxy is a client channel to the backend IOrderService, created elsewhere.
    public Order[] GetOrders(int numOrders)
    {
        return backendProxy.GetOrders(numOrders);
    }
}
Next – Router Implementation – Message Forwarding – Copy/Pass through
A broker is usually a central point for message forwarding and pass-through between clients and backend services. There are many types of brokers that come to mind.
But they mostly fall into the below basic model unless the client manages to bypass the broker with a P2P communication with the backend itself.
Router Implementation – Strong Typed with Message Forwarding
Aug’5’10
Here are 2 really good articles on MSDN that I would recommend for the functional aspects of architecting your router.
I had never thought that I would actually write about a topic like this, but sometimes you want to organize your thoughts and have an opinion on things. Being on the performance team for WCF has got me used to a plethora of message exchange patterns, which we lovingly refer to as MEPs. There is a broad spectrum of coding and implementation styles that we see day in and day out. There are those that are extreme, elaborate and overwhelmingly flexible, and also those that are so convoluted and rigid that it's almost like writing assembly.
It’s good to know which message exchange patterns are best suited to your needs. I think it’s overkill to adopt a strategy where your application forces itself to use only a single message exchange pattern, for example an ideology like “We will do only REST-style request-reply throughout our system”. The number of layers we would need to add to align ourselves with a philosophy like this would probably outweigh the benefits it provides, specifically in scenarios that aren’t suited to such patterns.
There is a really nice article on MSDN listing 6 message exchange patterns (http://msdn.microsoft.com/en-us/library/aa751829.aspx). To quickly reiterate, they are Datagram, Request-Reply and Duplex, plus each of these 3 with sessions on top. You can think of a session as a logical abstraction that says a message is part of a conversation. This has nothing to do with ASP.NET sessions; it is a way for WCF to correlate messages.
In my experience it is generally more helpful to classify your problem, see which pattern really helps with it, and then slap on the contracts and protocols, rather than fixing on the protocol/MEP and then forming a solution around that. I choose not to be an advocate of any particular style, but I am against protocol fanatics who are inflexible and who believe that there is a fixed set of choices for certain types of scenario.
Systems are organic, so it's hard to freeze implementations. Patterns are similar. Today you might be OK with TCP, but there is nothing stopping you from switching to queues. As the system grows, you put solutions in place to facilitate this kind of change. Layers get added and MEPs change too.
If you have any questions on how to make a choice I would gladly try to help out.
To modify properties that are not exposed on the standard binding, we can create a CustomBinding from the provided standard binding. We can then find the required element on the CustomBinding and tweak it. Another option is to handcraft the full binding if you know exactly how to stack up the elements. Here is an example of how to tweak the IdleTimeout.
private static CustomBinding GetIdleBinding(int mins)
{
    NetTcpBinding tcpBinding = new NetTcpBinding();
    CustomBinding customBinding = new CustomBinding(tcpBinding);
    // Find<T>() needs the element type as its generic argument.
    TcpTransportBindingElement transport = customBinding.Elements.Find<TcpTransportBindingElement>();
    transport.ConnectionPoolSettings.IdleTimeout = TimeSpan.FromMinutes(mins);
    return customBinding;
}
ErrorLevel is not %ERRORLEVEL% . This is probably the first one you should read.
Next is the usage of the ERRORLEVEL statement. http://support.microsoft.com/kb/69576
The following table shows the effect of the statement when writing your batch script.
Statement | Algebraic Equivalent
IF ERRORLEVEL 5 ... | IF E = 5 OR E > 5 THEN ...
IF NOT ERRORLEVEL 6 | IF E < 6 THEN ...
Here is a sample using findstr and error level. Findstr returns 0 if it successfully finds any occurrence.
findstr -sip Failed log.txt > NUL
IF NOT ERRORLEVEL 1 (
    echo Found.
) else (
    echo Not Found.
)
If you ever wanted to copy over a really long multiline dos command or output, what would you usually do?
You could just try this little utility called clip.exe, if you haven’t tried it already.
Here is a script to quickly create and delete queues. This was based out of this post.
Usage : CreateQueue.ps1 <-c,d> <queuename> <Y/N - Private> <user> <all:restricted Permission> [T:Transactional]
WCF gives you a very rich set of standard bindings that you can use for your endpoints. However, you might need to tweak properties that are not exposed on the standard bindings. You can handcraft the whole binding, or you can start with a standard binding as a template. Here are some ways.
BasicHttpBinding httpBinding = new BasicHttpBinding();
httpBinding.SendTimeout = TimeSpan.FromSeconds(123);
Console.WriteLine(httpBinding.ToString());

// Option 1: build the CustomBinding from the binding elements.
BindingElementCollection bec = httpBinding.CreateBindingElements();
bec.Find<HttpTransportBindingElement>().KeepAliveEnabled = false;
CustomBinding copy1 = new CustomBinding(bec);
Console.WriteLine("SendTimeout = {0} KeepAliveEnabled = {1}", copy1.SendTimeout, copy1.Elements.Find<HttpTransportBindingElement>().KeepAliveEnabled);

// Option 2: build the CustomBinding from the standard binding itself.
CustomBinding copy2 = new CustomBinding(httpBinding);
copy2.Elements.Find<HttpTransportBindingElement>().KeepAliveEnabled = false;
Console.WriteLine("SendTimeout = {0} KeepAliveEnabled = {1}", copy2.SendTimeout, copy2.Elements.Find<HttpTransportBindingElement>().KeepAliveEnabled);
Here was an interesting set of questions comparing WCF & WWS
1) With .NET 4.0, are we going to see any improvement to close the gap?
2) There seems very little information about this except on Channel 9 or various blogs. Are we going to see more information?
3) Is there a way to get the best of both WWS and .NET in a typical application? For example, use WWS for the web service API for
Here is the reply from Bob (our Product Unit Manager)
WCF and Windows Web Services (WWS) are complementary technologies. WCF is the premier Web Services stack to use when writing managed applications; if you are writing native code and want a SOAP stack then definitely use the WWSAPI. Introducing a native SOAP stack underscores our commitment to interop and WS-*. WWSAPI supports a subset of WS-* and is not as full featured or extensible as WCF. It definitely has a smaller footprint than WCF and it also has higher throughput for the scenarios it supports. This is due to a reduced feature set and implementation in native code. It also interops on the wire with WCF.
To answer the specific questions below:
Note: Cross posted from Sajay.
We generally need a quick set of performance counters to identify a performance issue with a service. Shown below are three new counters that you will find with WCF 4.0. I also want to emphasize the CallsOutstanding counter here, since it is very useful and is indicative of calls getting queued up.
Counter | Reports | Details
PercentOfMaxCalls | # of active requests as % of max calls | The number of messages being processed plus those in the wait queue, as a percentage of the MaxConcurrentCalls throttle
PercentOfMaxSessions | # of active requests as % of max sessions | The number of messages in the wait queue due to the MaxConcurrentSessions throttle
PercentOfMaxInstances | # of active requests as % of max instances | The number of active instances plus calls in the wait queue waiting for an instance, as a percentage of the MaxConcurrentInstances throttle
CallsOutstanding (from 3.0) | # of calls waiting to be completed | The number of in-progress calls to this service
Here you see that the %Instance throttle has maxed out. What do you think we should do? You can guess what throttle we are hitting here.
Some common questions when troubleshooting performance problems:
% of MaxConcurrentCalls gives you an indication of how close you are to hitting your max throttle value. If you have a MaxConcurrentCalls of 10 and you have 10 outstanding calls, then you have utilized 100% of your throttle and there is no more work that your service can take on. So if you see that your % MaxConcurrentCalls is very high, it probably indicates you have a very low throttle value. Remember the default is 16 up to v3.5 and 16 * processor count for v4.0. Before 4.0 you needed to work this out by checking the calls outstanding counter against the throttle in your web.config. These new % performance counters help you identify whether you are hitting the max values.
We have bumped up the default number of sessions for the service throttling behavior to 100 per CPU. But then again, if you see that %MaxConcurrentSessions is very high, you have exhausted all your sessions. This is probably because your clients are not closing their proxies and terminating sessions (when you have few clients), or possibly because of a very small MaxConcurrentSessions configuration.
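If a counter shows that you are pinned at a throttle, the limit can be raised in config or in code before the host is opened. Here is a minimal sketch in code; the ThrottleSetup class and its method names are hypothetical, and the values you pass in are yours to choose. PercentOfThrottle only mirrors the arithmetic described above (10 of 10 calls is 100%) and is not how WCF itself computes the counters.

```csharp
using System.ServiceModel;
using System.ServiceModel.Description;

public static class ThrottleSetup
{
    // Call this before host.Open(); behaviors cannot be changed on an opened host.
    public static void ApplyThrottle(ServiceHost host, int maxCalls, int maxSessions, int maxInstances)
    {
        ServiceThrottlingBehavior throttle =
            host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
        if (throttle == null)
        {
            throttle = new ServiceThrottlingBehavior();
            host.Description.Behaviors.Add(throttle);
        }
        throttle.MaxConcurrentCalls = maxCalls;
        throttle.MaxConcurrentSessions = maxSessions;
        throttle.MaxConcurrentInstances = maxInstances;
    }

    // Illustrative arithmetic only: how much of a throttle is in use, as a percentage.
    public static double PercentOfThrottle(int inUse, int max)
    {
        return 100.0 * inUse / max;
    }
}
```

Raising a throttle only helps if the service has the capacity to do the extra work, so it is worth re-checking the counters after the change.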
This one was interesting, as the service was exposed as a fully typed service but the client wanted to modify some parts of the XML. Ideally you can plug into any part to perform these operations, but the general requirement here was that they needed a simple pointer to the Body and didn't care much about anything else.
For example you can get the XElement from the body using a simple operation and use the Message object on the client to get a pointer to the body.
[ServiceContract(Namespace="http://www.sajay.com/")]
public interface IClientProxy
{
    [OperationContract(Action="http://www.sajay.com/getdata", ReplyAction="http://www.sajay.com/getdataresponse")]
    Message DoWork();
}

IClientProxy proxy = cf.CreateChannel();
Message msg = proxy.DoWork();
XNode node = XElement.ReadFrom(msg.GetReaderAtBodyContents());
So the service can return a typed object, but the client doesn't have to handle the message in the same way. Understanding the Message class here helps you appreciate how the serializer and encoder are decoupled from the message representation. Basically WCF does not constrain the native format of your message on either the producer's or the consumer's side. It merely follows some rules for invoking a set of abstract classes, which you wire up by specifying the encoder and the serializer. The fact that we have XML semantics around, with abstract classes like XmlReader and XmlDictionaryWriter, doesn't actually mean that WCF converts everything to XML before it can process the message. It merely uses these to invoke the required parts to obtain things like the header, the body and other properties. The Message itself can be represented natively in any form, such as binary, JSON, Atom or any whacky type, as long as you wire things up properly and can retrieve the parts in some manner.
This should give you some insight into how much more powerful the message model is compared to simple RPC style contracts. http://msdn.microsoft.com/en-us/library/ms734675.aspx With 4.0 and workflow services, you will see this becoming more and more the default way of thinking about messages and services.