We have gotten a number of cases lately involving applications using Outlook 2010's MAPI failing when connecting to Exchange 2010 servers. The primary symptom observed by our customers is that random MAPI calls to Exchange return MAPI_E_CALL_FAILED. At first blush, there is no rhyme or reason as to which calls fail, except that all of the applications which see the problem have multiple threads using MAPI. It turns out the underlying problem is very complicated and we won't be able to fix it in a hotfix (though we do have a long term plan for a solution), so I wanted to get this information out there in case this problem is affecting you.
The root of this problem has to do with how the Exchange MAPI provider (henceforth called just MAPI here) packs remote operations, or rops, into the RPC buffer to send them to Exchange to be processed. In all versions up until Outlook 2003 (and including all versions of the MAPI download), most rops are sent synchronously to the server as soon as the client calls in to MAPI. So if you were to call GetProps and follow along in the debugger, you would see your call go in to emsmdb32, then to rpcrt4, at which point a packet would be sent on the wire. Your thread would then wait for Exchange to respond, then it would parse the response buffer and hand it back to the client. The MAPI_DEFERRED_ERRORS flag can cause multiple rops from the same thread to get packed together in the same buffer, but the key point here is that everything usually happens on the initial calling thread.
Outlook 2007 changed the model for how Outlook’s MAPI handles MAPI operations. Most MAPI operations are now issued asynchronously. MAPI maintains a worker thread which queues up pending MAPI operations from the various client threads and packs them into buffers as needed, resulting in fewer packets being sent between the client and server. So for instance, if one thread issued an operation to read a stream and another issued an operation to set properties at the same time, both operations could be packed into the same buffer and sent to the server. When the results come back, MAPI decodes them and sends them back to any client threads which were waiting on the results. This has the benefit of greatly reducing the amount of data sent across the wire, which is a primary concern for the developers of Outlook’s MAPI. The side effect of this change is that we can now see operations packed together in combinations that were never possible to see with Outlook 2003. This is the first change which leads to the random MAPI_E_CALL_FAILED results.
On the server side, Exchange 2010 made a series of fixes to prevent looping calls which can overwhelm networks and bring down Exchange servers. The key rop here is RopBufferTooSmall, which is the rop the Exchange server can send back to a MAPI client if the server needs to send more data back to the client than is available in the response buffer. This is a signal to the client that it retry the operation, but this time either use a larger buffer, pack fewer operations into the request buffer, or both. Through analysis of numerous hang dumps we had received over the years, the Exchange team identified a number of scenarios where the server would send this response back to the client and the client's response would be to issue the exact same request the server had just rejected. This would result in a loop, with client and server exchanging the exact same packets until one or both was shut down. The key characteristics in these loops were that RopReadStream was the operation for which the result could not fit in the response buffer, that it was the first and usually only operation in the request buffer, and that we were already using the maximum size response buffer. The fix then was to special case when RopReadStream was the first operation in the request buffer. Instead of returning the RopBufferTooSmall rop, we would return the error ecBufferTooSmall (0x47D, or 1149 in decimal). This change, being more picky about how Exchange 2010 will accept packed rops, is the second change leading to random MAPI_E_CALL_FAILED results.
We combine these changes to see how they interact: prior to Outlook 2007, if Outlook’s MAPI issued a request buffer with RopReadStream as the first operation in the buffer, it was most likely also the last operation. A single client thread was waiting synchronously for the response to this rop. If the response buffer was already maximized and the request still could not fit, re-issuing the request could not be expected to work, so Exchange’s error would be the best response, as it avoids the loop. With Outlook 2007 and higher though, this RopReadStream could be packed in the same buffer as rops for other threads. Some of these rops may carry a large payload, which complicates the math Exchange uses to calculate the remaining space in the response buffer. This calculation occurs before Exchange executes the rops (since it would do no good to execute the rops first and discover we have no room for the results) so Exchange has to assume that every rop in the buffer could potentially fail. The upshot is that RopReadStream gets the short end of the stick, and Exchange concludes it does not have enough space for its response. When RopReadStream is not the first operation in the buffer, we send RopBufferTooSmall back to the client so it can break up the operations and try again. But when it’s the first operation in the buffer, the entire packetful of rops fails outright.
And when they fail, since MAPI doesn’t have a targeted error code to match ecBufferTooSmall, everybody gets the default error of MAPI_E_CALL_FAILED.
The Exchange and Outlook development teams both take this issue very seriously and want to correct this. That said, the risk of regression when you start playing with how Outlook packs rops or how Exchange calculates response buffers is incredibly high. The amount of testing we can do for a hotfix does not reduce this risk enough for us to be comfortable approaching this as a hotfix. So the Outlook team has agreed to attempt a fix for this in the next version of Outlook, currently under development. By getting the fix in to Outlook this early in the development cycle, we’ll have plenty of opportunity to test the fix, and also to observe how it affects clients in both dogfood and beta deployments. We will evaluate our options for Outlook 2010 if/when it comes time to consider what goes in to SP2, depending heavily, of course, on how things go in the next version.
Given the nature of this problem, if your application is multithreaded, uses Outlook’s MAPI, and reads streams, there is a high likelihood you will run in to this problem at some point. Here are the options we’ve identified so far, in no particular order:
Hopefully one of the first three options is available to your application.
Given that this problem is one that is going to happen despite your best efforts to avoid it, it helps to be able to identify the issue. The best diagnostic here is to use Exchange’s RPC Client Access (aka RCA) logs, which are on by default. These logs contain an entry for every time Exchange failed a call with ecBufferTooSmall. The key word to look for in these logs is BufferTooSmallException. You should expect to find several entries with BufferTooSmall, without Exception. These are the normal retry results and can be ignored. Here are a couple of examples of the log entry you need to look for:
2011-03-17T08:23:58.188Z,2,8548,/o=org/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=User,, customerapp.exe,14.0.4760.1000,Classic,,,ncacn_ip_tcp,,,1149 (rpc::BufferTooSmall),00:00:00,, RpcDispatch: [BufferTooSmallException] Une exception de type 'Microsoft.Exchange.RpcClientAccess.BufferTooSmallException' a été levée.
2011-09-15T13:59:20.762Z,5,1351,/o=org/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=User,, customerapp.exe,14.0.4760.1000,Classic,,,ncacn_ip_tcp,,,,>ReadStream;>FastTransferDestinationPutBuffer, RpcDispatch: Exception of type 'Microsoft.Exchange.RpcClientAccess.BufferTooSmallException' was thrown.
You'll find these logs on the Client Access Server (CAS) for Exchange 2010 in the following directory: %ExchangeInstallPath%\Logging\RPC Client Access
Note that the second entry contains more information on the other operations which were included in the buffer. While you don’t need this information to identify the problem, it might be helpful in matching the logs to your own error logs. Here’s how to crank that logging higher (note that this must be done on every CAS for which you want greater logging:
Now Exchange will log the rops which were in the buffer. By the way, RCA logs are great for troubleshooting other issues as well. Any time Exchange 2010 isn’t returning the results you expect, these logs are a good place to start. Also, I should remind anyone looking at these logs that we do know at least one other reason you might see BufferTooSmallException. I discussed it in my article on Large Multivalued Properties.
This is a complicated issue that can present itself in multiple ways (due to the random nature of which calls fail). Even though I was the engineer who discovered it and figured out what was happening, I still worked two further issues where this issue was the root cause but did not recognize it. In one case I happened to get debug trace that reminded me to check for this issue, and in the other, in desperation when no other troubleshooting was working, I rolled the dice and had the customer check the RCA logs on the off chance this was their issue. It turned out it not only lined up perfectly, but the customer was able to apply the same troubleshooting to other sites experiencing issues they though were unconnected and confirmed the issue was happening there as well. So while we don’t have a fix for this yet, at least we can determine when the problem is happening and stop troubleshooting.
If development informs me of any plans to fix this earlier than the next version of Outlook, I will report them here. However, don’t ask me if or when SP2 is expected. For that, track the Office Sustained Engineering blog or the Update Center for Microsoft Office.
This issue was fixed on the Exchange 2010 side in SP2 RU3. Read the details of the update here, and get the rollup here.
Nice post, pretty much explains the weird errors we are getting with our application sometimes. Can you please shed some light on the CopyTo operation? Is it the one which produces the biggest buffer, so practically all CopyTo operations has to be serialized? It means that 2 messages can't be copied concurrently from different threads - they will always result in an error. The side effect is that there is no concurrency with Outlook 2010 MAPI?
They won't *always* result in an error. In fact, for most of the cases where I've looked at this, the error wasn't all that common (maybe a couple times a day). It requires a good bit of luck (or bad luck) for the rops to line up correctly to see the problem. However, a long running multithreaded process that does the right sort of operations is going to see the problem sooner or later.
Hi Stephen - Great informative blog entry. From what I can tell you saying this impacts both OL2007 and OL2010 MAPI - is that correct?
We have seen the problem with Outlook 2010. We have not yet seen the problem with Outlook 2007, but we don't know that it can't happen there.
Hi Stephen, since three weeks we also have Random MAPI_E_CALL_FAILED Errors With Exchange 2010, but we use CDO-MAPI 6.5.8244.0 (1-3 errors per day). We have no problems with Outlook-MAPI. It happens with Exchange 2010 SP1 and still with SP2 after an update. By uninstalling and installing CDO-MAPI, the problem will be fixed -until the next failure. Rpc logs do not show a "buffer to small"-exception. On the Exchange Server we did not find any significant error events. The throttling policy is set to unlimited.
- Is it possible to log MAPI-CDO-activity continously not on the Exchange server, but on the machine on which MAPI-CDO/mfcmapi is installed?
- Is there a current MAPI_E_CALL_FAILED-issue concerning MAPI-CDO and Exchange 2010?
George - what you're describing would be a different issue and one we haven't seen before. It concerns me that you indicate the only way to fix the problem is to uninstall and reinstall MAPICDO. That would imply that the problem has to do with files on disk and/or registry settings. Are you sure simply rebooting and/or deleting MAPI profiles isn't sufficient to fix the problem for a while?
If the *only* fix is to reinstall, you need to take a look at the files system/registry before and after you reinstall to see what changed.
Hi Stephen, next time the error occurs, we will reboot our system and check files/registry instead of repairing cdo-mapi. If the error occurs, we cannot delete the profiles because they are not accessible because of MAPI_E_CALL_FAILED :-\ Three days ago we did the following: 1.) Creating a new profile using mfcmapi's profile wizard 2.) Set this profile as default profile and 3.) Using the Exchange-Server's IP instead of Server name. Until now, there was no error again! We hope this is a stable workaround...
George - exactly what function are you getting MAPI_E_CALL_FAILED from?
Hi Stephen, we saw MAPI_E_CALL_FAILED in mfcmapi 126.96.36.1999 after the action "Profile | Show Profiles": "MAPIInitialize failed with error 0x80004005 == MAPI_E_CALL_FAILED. In file MapiObjects.cpp On line 113". We did not see the error for over a week now. It seems to be fixed!
George, that would be completely unrelated to the issue I'm discussing here.
In reference to letter 4.A & 4.B (under What you can do)
Does 'multi-threading' ONLY mean multiple threads in a single process?
What if you have multiple (single threaded) processes that are concurrently calling GetProps or CopyTo etc? Should those calls be synchronized regardless?
Traditionally, multithreading refers to multiple threads in a single process. MAPI does not consolidate RPCs from different procesess so there would no need (for this issue) to synchronize the calls.
we recognized BufferTooSmall with Exchange 2010 SP3 (14.3 Build 123.4) when MAPI calls are not synchronized... i thought the issue was fixed in 2010 SP2 RU3?
Is this a new issue?