We have gotten a number of cases lately involving applications using Outlook 2010's MAPI failing when connecting to Exchange 2010 servers. The primary symptom observed by our customers is that random MAPI calls to Exchange return MAPI_E_CALL_FAILED. At first blush, there is no rhyme or reason as to which calls fail, except that all of the applications which see the problem have multiple threads using MAPI. It turns out the underlying problem is very complicated and we won't be able to fix it in a hotfix (though we do have a long term plan for a solution), so I wanted to get this information out there in case this problem is affecting you.
The root of this problem has to do with how the Exchange MAPI provider (henceforth called just MAPI here) packs remote operations, or rops, into the RPC buffer to send them to Exchange to be processed. In all versions up until Outlook 2003 (and including all versions of the MAPI download), most rops are sent synchronously to the server as soon as the client calls in to MAPI. So if you were to call GetProps and follow along in the debugger, you would see your call go in to emsmdb32, then to rpcrt4, at which point a packet would be sent on the wire. Your thread would then wait for Exchange to respond, then it would parse the response buffer and hand it back to the client. The MAPI_DEFERRED_ERRORS flag can cause multiple rops from the same thread to get packed together in the same buffer, but the key point here is that everything usually happens on the initial calling thread.
Outlook 2007 changed the model for how Outlook’s MAPI handles MAPI operations. Most MAPI operations are now issued asynchronously. MAPI maintains a worker thread which queues up pending MAPI operations from the various client threads and packs them into buffers as needed, resulting in fewer packets being sent between the client and server. So for instance, if one thread issued an operation to read a stream and another issued an operation to set properties at the same time, both operations could be packed into the same buffer and sent to the server. When the results come back, MAPI decodes them and sends them back to any client threads which were waiting on the results. This has the benefit of greatly reducing the amount of data sent across the wire, which is a primary concern for the developers of Outlook’s MAPI. The side effect of this change is that we can now see operations packed together in combinations that were never possible to see with Outlook 2003. This is the first change which leads to the random MAPI_E_CALL_FAILED results.
On the server side, Exchange 2010 made a series of fixes to prevent looping calls which can overwhelm networks and bring down Exchange servers. The key rop here is RopBufferTooSmall, which is the rop the Exchange server can send back to a MAPI client if the server needs to send more data back to the client than is available in the response buffer. This is a signal to the client that it retry the operation, but this time either use a larger buffer, pack fewer operations into the request buffer, or both. Through analysis of numerous hang dumps we had received over the years, the Exchange team identified a number of scenarios where the server would send this response back to the client and the client's response would be to issue the exact same request the server had just rejected. This would result in a loop, with client and server exchanging the exact same packets until one or both was shut down. The key characteristics in these loops were that RopReadStream was the operation for which the result could not fit in the response buffer, that it was the first and usually only operation in the request buffer, and that we were already using the maximum size response buffer. The fix then was to special case when RopReadStream was the first operation in the request buffer. Instead of returning the RopBufferTooSmall rop, we would return the error ecBufferTooSmall (0x47D, or 1149 in decimal). This change, being more picky about how Exchange 2010 will accept packed rops, is the second change leading to random MAPI_E_CALL_FAILED results.
We combine these changes to see how they interact: prior to Outlook 2007, if Outlook’s MAPI issued a request buffer with RopReadStream as the first operation in the buffer, it was most likely also the last operation. A single client thread was waiting synchronously for the response to this rop. If the response buffer was already maximized and the request still could not fit, re-issuing the request could not be expected to work, so Exchange’s error would be the best response, as it avoids the loop. With Outlook 2007 and higher though, this RopReadStream could be packed in the same buffer as rops for other threads. Some of these rops may carry a large payload, which complicates the math Exchange uses to calculate the remaining space in the response buffer. This calculation occurs before Exchange executes the rops (since it would do no good to execute the rops first and discover we have no room for the results) so Exchange has to assume that every rop in the buffer could potentially fail. The upshot is that RopReadStream gets the short end of the stick, and Exchange concludes it does not have enough space for its response. When RopReadStream is not the first operation in the buffer, we send RopBufferTooSmall back to the client so it can break up the operations and try again. But when it’s the first operation in the buffer, the entire packetful of rops fails outright.
And when they fail, since MAPI doesn’t have a targeted error code to match ecBufferTooSmall, everybody gets the default error of MAPI_E_CALL_FAILED.
The Exchange and Outlook development teams both take this issue very seriously and want to correct this. That said, the risk of regression when you start playing with how Outlook packs rops or how Exchange calculates response buffers is incredibly high. The amount of testing we can do for a hotfix does not reduce this risk enough for us to be comfortable approaching this as a hotfix. So the Outlook team has agreed to attempt a fix for this in the next version of Outlook, currently under development. By getting the fix in to Outlook this early in the development cycle, we’ll have plenty of opportunity to test the fix, and also to observe how it affects clients in both dogfood and beta deployments. We will evaluate our options for Outlook 2010 if/when it comes time to consider what goes in to SP2, depending heavily, of course, on how things go in the next version.
Given the nature of this problem, if your application is multithreaded, uses Outlook’s MAPI, and reads streams, there is a high likelihood you will run in to this problem at some point. Here are the options we’ve identified so far, in no particular order:
Hopefully one of the first three options is available to your application.
Given that this problem is one that is going to happen despite your best efforts to avoid it, it helps to be able to identify the issue. The best diagnostic here is to use Exchange’s RPC Client Access (aka RCA) logs, which are on by default. These logs contain an entry for every time Exchange failed a call with ecBufferTooSmall. The key word to look for in these logs is BufferTooSmallException. You should expect to find several entries with BufferTooSmall, without Exception. These are the normal retry results and can be ignored. Here are a couple of examples of the log entry you need to look for:
2011-03-17T08:23:58.188Z,2,8548,/o=org/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=User,, customerapp.exe,14.0.4760.1000,Classic,,,ncacn_ip_tcp,,,1149 (rpc::BufferTooSmall),00:00:00,, RpcDispatch: [BufferTooSmallException] Une exception de type 'Microsoft.Exchange.RpcClientAccess.BufferTooSmallException' a été levée.
2011-09-15T13:59:20.762Z,5,1351,/o=org/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=User,, customerapp.exe,14.0.4760.1000,Classic,,,ncacn_ip_tcp,,,,>ReadStream;>FastTransferDestinationPutBuffer, RpcDispatch: Exception of type 'Microsoft.Exchange.RpcClientAccess.BufferTooSmallException' was thrown.
You'll find these logs on the Client Access Server (CAS) for Exchange 2010 in the following directory: %ExchangeInstallPath%\Logging\RPC Client Access
Note that the second entry contains more information on the other operations which were included in the buffer. While you don’t need this information to identify the problem, it might be helpful in matching the logs to your own error logs. Here’s how to crank that logging higher (note that this must be done on every CAS for which you want greater logging:
Now Exchange will log the rops which were in the buffer. By the way, RCA logs are great for troubleshooting other issues as well. Any time Exchange 2010 isn’t returning the results you expect, these logs are a good place to start. Also, I should remind anyone looking at these logs that we do know at least one other reason you might see BufferTooSmallException. I discussed it in my article on Large Multivalued Properties.
This is a complicated issue that can present itself in multiple ways (due to the random nature of which calls fail). Even though I was the engineer who discovered it and figured out what was happening, I still worked two further issues where this issue was the root cause but did not recognize it. In one case I happened to get debug trace that reminded me to check for this issue, and in the other, in desperation when no other troubleshooting was working, I rolled the dice and had the customer check the RCA logs on the off chance this was their issue. It turned out it not only lined up perfectly, but the customer was able to apply the same troubleshooting to other sites experiencing issues they though were unconnected and confirmed the issue was happening there as well. So while we don’t have a fix for this yet, at least we can determine when the problem is happening and stop troubleshooting.
If development informs me of any plans to fix this earlier than the next version of Outlook, I will report them here. However, don’t ask me if or when SP2 is expected. For that, track the Office Sustained Engineering blog or the Update Center for Microsoft Office.
This issue was fixed on the Exchange 2010 side in SP2 RU3. Read the details of the update here, and get the rollup here.
If you look around for it, you’ll find a lot of posts on a 0x81002746 error that you might get from ConfigureMsgService. But no one seems to know what the error really means or why it happens. I thought I’d fill in some details.
For starters, this error has nothing to do with WSAECONNRESET, which is 0x2746. Some error lookup tools will make this connection, assuming the value was built as a traditional HRESULT and reading the Code from within it. This has lead many people down the wrong path. Actually, the error is MAIL_E_NAMENOTFOUND. Once we know the real name of the error, we can make some sense of when and why it happens.
When you call ConfigureMsgService to configure the Exchange provider, you provide the name of an Exchange server and a mailbox. The Exchange provider uses these to query the GAL to get more information to finish configuring the profile. It talks to the GAL using NSPI. In particular, it binds to the server and issues an NspiGetMatches call, passing a filter built using the mailbox name you gave to ConfigureMsgService. If this call succeeds and returns a record, it parses the results and continues configuring the message service. If it fails, we bubble up some version of the error we got back. But if the call succeeds with no records? Normally, we’d put up a dialog asking the user to check the name. But if we’re running as a service or the client passed flags to disable UI? That’s when we return MAIL_E_NAMENOTFOUND.
Let’s examine the implications of this: We asked the server for any records matching a given name. The server responded with zero matches. This means it didn’t recognize the name. In other words, we looked in the GAL and did not find the mailbox we were looking for! We can now list some common causes:
I recently had a customer encounter this error. I was able to eliminate the first two possibilities, but I knew better than to try and personally chase all the whackadoodle permissions that could be set in their Exchange environment. I’m a debugger, not an administrator! So I enlisted Kevin Carker, from our Exchange Admin team, to work with the customer and figure this out. His ace sleuthing uncovered a reason we had not previously considered: the customer had multiple GALs! They must have created them while playing around in the management console, and since nothing broke for their users, didn’t clean them up. But the way they left things configured, the user they were running their application as defaulted to one of the other GALs, which was empty. So NSPIGetMatches looked in that GAL, did not find the mailbox, and we returned MAIL_E_NAMENOTFOUND. Now, the customer could have configured everything so the presence of this empty GAL didn’t cause a problem, but, seeing as it was empty, they chose to delete it instead, at which point their account defaulted back to the regular GAL and could now configure a profile.
Hopefully, now that we understand what this error means and why it happens, no one else will need to rip their hair out trying to troubleshoot it. If you’re getting MAIL_E_NAMENOUTFOUND from ConfigureMsgService, the mailbox you looked up could not be found in the GAL you looked in. By working through the various reasons why this may be, you should be able to resolve this without making a support call.
It’s been a while, but we just shipped a new version of the MAPICDO download. Per tradition, I’ve put together release notes: