I wrote these rules out while debugging a crash in another MS product:
I won't name the app, but it violated all 4 rules.
Known consequences of violating these rules:
This post was inspired by a case I worked recently. In this case, the customer was using the 5.5 Event Script service to autoaccept meeting requests. They weren't having any problems with their script or scalability. Their problem was that the service would run fine for days at a time, and then suddenly stop handling all incoming meeting requests. The error they would get in the event logs was this not especially helpful event 11:
Event ID: 11
Source: MSExchangeES
Description: A fatal error (0x80004005) occurred in an IExchangeEventSink while processing message [Subject = '<subject>']
Examining the script logs was no help - they didn't say anything about the error. This case had me stumped for a while. The latest fixes for events.exe were no good. Turning on internal tracing didn't take me too far either.
Time to pull out the big gun
When I finally got a debugger attached to the events.exe process, I found that OpenMsgStore was returning MAPI_E_CALL_FAILED (==0x80004005). Debugging into that, I found that during handling of OpenMsgStore, MAPI has to go to the registry to retrieve some properties from the profile. MAPI's call to RegQueryValueEx was returning ERROR_KEY_DELETED (==1018). The profile we were using in the Event Script service had been deleted out from under us! Why?
The simplistic answer is that right after MAPILogonEx, events.exe calls DeleteProfile. To understand that, we need to digress.
What does DeleteProfile do?
Well, for starters, it doesn't delete the profile. At least not always. Here's what the MSDN has to say on this function:
The IProfAdmin::DeleteProfile method deletes a profile. If the profile to delete is in use when DeleteProfile is called, DeleteProfile returns S_OK but does not delete the profile immediately. Instead, DeleteProfile marks the profile for deletion and deletes it after it is no longer being used, when all of its active sessions have ended.
Let's break this down:
Check if the profile is in use
MAPI maintains a shared memory object (this is CreateFileMapping, not .Shared sections) for interprocess communication and synchronization. In that shared memory object, we keep a linked list of all profiles currently in use. When we log onto a profile with MAPILogonEx, its ref count is bumped up. When the client logs out, the ref count is dropped back down. So checking if a profile is in use is equivalent to looking in the shared memory object to see if the profile has a ref count.
Mark a profile for deletion
MAPI uses a special reg key to mark a profile for deletion. The key is
HKEY_CURRENT_USER\Software\Microsoft\Windows NT\CurrentVersion\Windows Messaging Subsystem\Profiles\Deleted Profiles
To mark a profile for deletion, we write the profile name as a subkey. When MAPI is asked to present a list of existing profiles, such as in the MAPILogonEx dialog or in response to GetProfileTable, profiles listed under Deleted Profiles are left out.
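The check/mark/hide behavior described so far can be sketched as a toy model. This is not MAPI code: ProfileModel and all of its members are hypothetical names, the ref counts live in an ordinary map rather than shared memory, and the question of when a marked profile finally leaves the registry is deliberately left out.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only; none of these names are real MAPI APIs.
class ProfileModel {
public:
    void Create(const std::string& name) { profiles_.insert(name); }

    // Stands in for MAPILogonEx bumping the ref count in shared memory.
    void Logon(const std::string& name) {
        assert(profiles_.count(name) != 0);  // can only log onto an existing profile
        ++refCount_[name];
    }

    // A profile is "in use" if it has a ref count.
    bool InUse(const std::string& name) const {
        auto it = refCount_.find(name);
        return it != refCount_.end() && it->second > 0;
    }

    // Stands in for DeleteProfile: always reports success. An in-use profile
    // is only marked for deletion; an idle one is removed right away.
    bool DeleteProfile(const std::string& name) {
        if (InUse(name)) {
            markedForDeletion_.insert(name);
        } else {
            profiles_.erase(name);
        }
        return true;
    }

    // Stands in for GetProfileTable: marked profiles are left out.
    std::vector<std::string> VisibleProfiles() const {
        std::vector<std::string> visible;
        for (const std::string& name : profiles_) {
            if (markedForDeletion_.count(name) == 0) visible.push_back(name);
        }
        return visible;
    }

    bool StillInRegistry(const std::string& name) const {
        return profiles_.count(name) != 0;
    }

private:
    std::set<std::string> profiles_;           // profiles present in the registry
    std::set<std::string> markedForDeletion_;  // the "Deleted Profiles" key
    std::map<std::string, int> refCount_;      // the in-use list
};
```

In this model, deleting a profile that has been logged onto leaves it in the registry but drops it from VisibleProfiles, which is the "returns S_OK but does not delete the profile immediately" behavior from the documentation.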
Delete a profile after it is no longer being used
This is the hard one. How does MAPI determine the exact instant a profile is no longer in use? During logoff is not sufficient, since we also need to account for cases where the application using MAPI has crashed. So we have to be cleverer than that. We delete profiles from the registry when:
The first two cases handle most profile deletions. The third case handles the abnormal situations - an application has crashed or leaked a session pointer. The third case is also where bugs in the implementation cause cross process problems.
Here come the bugs, part 1
The first bug is completely our fault. Suppose two different processes run as the same user. Suppose also that one of the processes uses impersonation to run under the credentials of another user. Note that by default, this does not load the impersonated user's registry hive. Doing that requires a different call. (See KB 259301 for example code that does this). So, both processes will use the same registry hive when accessing HKEY_CURRENT_USER.
The first process to start creates a profile for MAPI. It then calls DeleteProfile after logging on so as not to clutter up the registry with temporary profiles. The second process (which has done the impersonation) now comes in and calls MAPIInitialize. We have a problem with our shared memory now - each process got their own block of shared memory. Our scheme now breaks down. The second process checks its shared memory to see what profiles are in use, finds that no profiles are being used, and proceeds to delete every profile listed under "Deleted Profiles". The first process is now what we call hosed.
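To make the failure concrete, here's a small simulation of the scenario. It's a sketch under assumptions, not MAPI's implementation: RegistryModel, SharedMemoryModel, LogonThenDelete and SweepDeletedProfiles are hypothetical names, and the two "processes" are just two SharedMemoryModel objects pointed at one RegistryModel.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// One registry hive, seen by every process that maps the same HKEY_CURRENT_USER.
struct RegistryModel {
    std::set<std::string> profiles;
    std::set<std::string> deletedProfiles;  // the "Deleted Profiles" key
};

// One block of "shared" memory. In the buggy case each process gets its own.
struct SharedMemoryModel {
    std::map<std::string, int> refCount;  // profiles currently in use
};

// A process creates a temporary profile, logs on, then calls DeleteProfile.
// The profile is in use, so it is only marked for deletion.
void LogonThenDelete(RegistryModel& reg, SharedMemoryModel& shm,
                     const std::string& name) {
    assert(!name.empty());
    reg.profiles.insert(name);
    ++shm.refCount[name];
    reg.deletedProfiles.insert(name);
}

// Cleanup pass: remove every marked profile that THIS shared memory block
// does not list as in use. Returns how many profiles were removed.
int SweepDeletedProfiles(RegistryModel& reg, const SharedMemoryModel& shm) {
    std::vector<std::string> doomed;
    for (const std::string& name : reg.deletedProfiles) {
        if (shm.refCount.count(name) == 0) doomed.push_back(name);
    }
    for (const std::string& name : doomed) {
        reg.profiles.erase(name);
        reg.deletedProfiles.erase(name);
    }
    return static_cast<int>(doomed.size());
}
```

A sweep that consults the first process's shared memory leaves the profile alone. A sweep that consults the second process's (empty) shared memory removes it, even though the first process still holds a ref count on it.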
When this bug was first reported, we weren't sure how to fix it without doing a total rewrite of this feature. That is, until one of our devs had the brilliant insight that each shared memory section was essentially tracking the profiles in use by a particular NT account. So all we needed to do was reflect that in our use of the registry by appending a CRC of the SID under which MAPI is running to the "Deleted Profiles" string. Now all processes which are running MAPI under the same NT account (therefore accessing the same shared memory) are looking at the same place in the registry, and processes running MAPI under different NT accounts look in different places in the registry.
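As an illustration of the idea, here's a sketch that derives a per-account key name from a SID string. The details are assumptions: the post doesn't say which CRC MAPI uses, how the SID is encoded before hashing, or how the value is appended, so this uses the common CRC-32 (reflected polynomial 0xEDB88320) over the SID's text form and tacks on eight hex digits. DeletedProfilesKeyName is a hypothetical helper, not a MAPI function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Standard bitwise CRC-32 (reflected polynomial 0xEDB88320).
// Assumption: the post does not say which CRC variant MAPI really uses.
std::uint32_t Crc32(const std::string& data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; XOR in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Hypothetical helper: builds a per-account name for the "Deleted Profiles"
// key. The "name + 8 hex digits" layout is made up for illustration.
std::string DeletedProfilesKeyName(const std::string& sidString) {
    char suffix[9];
    int written = std::snprintf(suffix, sizeof suffix, "%08X",
                                static_cast<unsigned>(Crc32(sidString)));
    assert(written == 8);
    (void)written;  // silence unused warning when asserts are compiled out
    return std::string("Deleted Profiles") + suffix;
}
```

Two processes running under the same account compute the same key name and so agree on which profiles are pending deletion. Processes running under different accounts compute different names and stay out of each other's way.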
Here come the bugs, part 2
And yet, even after putting in the fix, we still saw profiles being deleted out from under us. This one isn't completely MAPI's fault. Suppose two different processes are running on the same machine. Both run as the same user and neither uses impersonation. However, one runs in Terminal Services and the other runs in the console.
Same dance as the first bug. When the second process goes to create a shared memory object, it does not get the same shared memory as the first one. We don't get the same shared memory because sessions started under Terminal Services use a different namespace than sessions started from the console. In general this is a good thing, but it blocks MAPI from doing what it's trying to do.
The fix from the first bug isn't effective here because the SID for both processes is the same. Fortunately, the fix here is even simpler - it's even spelled out in the MSDN: prepend "Global\\" to the name of the shared memory. Now all processes running as the same user will use the same shared memory regardless of the Terminal Services session.
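Here's a toy model of why the prefix matters. It's a simplification, not how Windows resolves names internally: ObjectNamespaceModel is a hypothetical class, every session (console included) gets its own private namespace for unprefixed names, and "MapiShm" in the usage note is a made-up object name. In real code the name would be the lpName argument passed to CreateFileMapping.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of named-object lookup under Terminal Services.
class ObjectNamespaceModel {
public:
    // Returns an id for the named object as seen from the given session,
    // creating the object on first use. Equal ids mean "same object".
    int OpenOrCreate(int sessionId, const std::string& name) {
        assert(!name.empty());
        const std::string globalPrefix = "Global\\";
        std::string key;
        if (name.compare(0, globalPrefix.size(), globalPrefix) == 0) {
            key = name;  // one machine-wide namespace
        } else {
            // Unprefixed names land in a per-session namespace.
            key = "Session" + std::to_string(sessionId) + "\\" + name;
        }
        auto it = objects_.find(key);
        if (it != objects_.end()) return it->second;
        int id = nextId_++;
        objects_[key] = id;
        return id;
    }

private:
    std::map<std::string, int> objects_;
    int nextId_ = 1;
};
```

Opening "MapiShm" from two different sessions yields two different objects in this model, which is Bug 2. Opening "Global\\MapiShm" from both yields the same object, which is the fix.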
Where was I?
This started as an Event Script problem, and digressed into a discussion of DeleteProfile. The Event Script service is built around MAPI. It creates a profile on the fly and calls DeleteProfile immediately after logging on. The customer used Terminal Services to manage their server, and to top it off, Outlook was installed on the machine. So every few days, they'd encounter Bug 1 or Bug 2 and their script was toast. I had them remove Outlook, sent them the patch, and they haven't seen the problem since.
Hey Steve, where's this wonderful patch you speak of?
Thought you'd never ask. Here it is by product:
[8/22/04 8:09PM Minor edit - clarified some text]
[8/25/04 11:04AM Clarified impersonation and added links]
I'm working on a followup to my memory management article and a writeup on a MAPI deleted profile bug I ran across recently, but this takes precedence.
Exchange and Outlook on the same machine is bad. In my last post I waved my hands about some scenarios which could lead to a crash. I got a dump today which illustrates one. The issue originally manifested as heap corruption in MFCMAPI while opening a message store. We enabled pageheap on the process to see what was corrupting the heap. Here's the stack we got:
0:000> kL
ChildEBP RetAddr
0012f2bc 77fb44fb NTDLL!RtlpDphIsNormalHeapBlock+0x86
...
0012f4d0 35525bda NTDLL!RtlFreeHeap+0x85
0012f4e0 35525b6e MSMAPI32!LH_ExtHeapFree+0x19
0012f50c 62a5248b MSMAPI32!MAPIFreeBuffer+0x64
0012f778 62a530dd emsmdb32!HrSetupOffline+0x144
0012f834 62cd1755 emsmdb32!RMSP_Logon+0x2c4
0012f894 62cd1478 mapi32!HrIntDoOneClientLogon+0x8a
0012f8b0 62cd0f25 mapi32!HrIntClientStoreLogon+0x47
0012f920 62cd1422 mapi32!HrCommonOpenStore+0x3c3
0012f9a0 004310b6 mapi32!SESSOBJ_OpenMsgStore+0x56
0012f9d0 004321c7 MFCMapi!CallOpenMsgStore+0x66
I've trimmed the stack a little to get to the heart of the issue. We're corrupting the heap when we call RtlFreeHeap. Notice that mapi32 and msmapi32 both appear on this stack. Here are the versions of both (again, output is trimmed):
0:000> lmvm mapi32
Image path: C:\WINNT\system32\mapi32.dll
Timestamp: Sat Jun 12 00:17:32 2004 (40CA83DC)
File version: 6.0.6603.0
ProductName: Microsoft Exchange
0:000> lmvm MSMAPI32
Image path: C:\Program Files\Common Files\System\Mapi\1033\MSMAPI32.DLL
Timestamp: Tue Dec 16 20:39:51 2003 (3FDFB3E7)
File version: 10.0.6515.0
ProductName: MAPI32
So what's the problem? We have two competing versions of MAPI loaded into the same process!
Emsmdb32.dll made a call to GetProps to get some properties on a profile. The resulting LPSPropTagArray was allocated using Exchange's MAPI32.dll. Later, Emsmdb32.dll calls MAPIFreeBuffer to clean up this memory. Somehow, Outlook's MSMAPI32.dll ended up handling this call. Since Outlook's MSMAPI32.dll doesn't know anything about the heaps created by Exchange's MAPI32.dll, we end up corrupting the heap during this free. Without pageheap enabled, this corruption is silent, and doesn't rear its ugly head until later on when we try allocating some memory against the corrupted heap.
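The mismatch can be sketched with two independent allocators, each of which only knows about the blocks it handed out. This is a toy model, not either DLL's heap code: HeapModel is a hypothetical class, and where the real free corrupts the heap, the model simply reports that the pointer is foreign.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <set>

// Toy allocator that tracks the blocks it has handed out.
class HeapModel {
public:
    void* Alloc(std::size_t bytes) {
        void* block = std::malloc(bytes ? bytes : 1);
        if (block) live_.insert(block);
        return block;
    }

    // Returns false if this heap never allocated the pointer. The real
    // free has no such check, which is why the corruption is silent.
    bool Free(void* block) {
        if (live_.erase(block) == 0) return false;
        std::free(block);
        return true;
    }

    std::size_t LiveBlocks() const { return live_.size(); }

private:
    std::set<void*> live_;
};
```

With one HeapModel standing in for Exchange's MAPI32.dll and another for Outlook's MSMAPI32.dll, a block allocated by the first and freed through the second is rejected, and the first heap still counts it as live.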
After removing Outlook from the box, this problem went away.
Some of you may know that I wrote the MAPI utility/sample MFCMAPI. Someday I'll write some posts directly about it. This post, however, is about memory management in MAPI.
The latest build of MFCMAPI was just added to the internal dev tools collection for the next version of Office. This means a lot of the developers and testers are running MFCMAPI against their private, debug builds of Outlook. One of the developers e-mailed me last week to let me know that this build of MFCMAPI was causing debug assertions. They took a look at the stack and told me the function throwing the assertion was MAPIFreeBuffer. They also pointed out where I was making the call.
Horror of horrors! Had I somehow attempted to MAPIFreeBuffer memory that hadn't been allocated via MAPI? Or was I freeing the same memory twice? After all the preaching I do to my customers about the importance of good MAPI memory management, had I committed one of the sins I caution against?
Well, yes and no. I had indeed made an error, but it wasn't one I had seen before.
MFCMAPI displays a lot of data in list boxes. A row in a list box has a single data pointer. I have a structure that contains pointers to various buffers to hold things like Entry IDs. To simplify cleanup, I allocate the main structure with MAPIAllocateBuffer, then I allocate buffers that hang off of this structure with MAPIAllocateMore. That way, a single call to MAPIFreeBuffer can free the structure and all of the extra buffers. This makes my code very clean.
Sometimes I need to wipe out one of these buffers and replace the contents. This is where I got into trouble. I didn't want to leak any memory, so I called MAPIFreeBuffer on the buffer, set the pointer to NULL, then allocated a new buffer with MAPIAllocateMore. That call to MAPIFreeBuffer is the one that caused the assertion. You cannot free memory allocated with MAPIAllocateMore by calling MAPIFreeBuffer on it. The only way to free that memory is to call MAPIFreeBuffer on the 'parent' memory which you indicated in the call to MAPIAllocateMore.
So, lesson learned. Here are some other common MAPI memory leaks I see in customers' code (heck, I think I've seen every one of them in our code at one point or another):
So how did I fix my bug? The answer was quite simple - don't call MAPIFreeBuffer here! The memory I had allocated will be freed when the parent memory is freed, regardless of whether or not I still have a pointer to it. Since this scenario was rare, I can afford the memory hit of having a few extra buffers allocated for the lifetime of the structure.
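The parent/child rule is easy to see in a toy allocator. This is a sketch of the semantics only, not MAPI's implementation: LinkedAllocatorModel is a hypothetical class, and its methods return pointers and bools instead of using the out-parameters and SCODEs of the real MAPIAllocateBuffer, MAPIAllocateMore and MAPIFreeBuffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <vector>

// Toy model of linked allocations: children live and die with their parent.
class LinkedAllocatorModel {
public:
    // Models MAPIAllocateBuffer: creates a new parent block.
    void* AllocateBuffer(std::size_t bytes) {
        void* parent = std::malloc(bytes ? bytes : 1);
        if (parent) children_[parent];  // register with no children yet
        return parent;
    }

    // Models MAPIAllocateMore: the new block is chained to 'parent'.
    void* AllocateMore(std::size_t bytes, void* parent) {
        auto it = children_.find(parent);
        if (it == children_.end()) return nullptr;  // not a known parent
        void* child = std::malloc(bytes ? bytes : 1);
        if (child) it->second.push_back(child);
        return child;
    }

    // Models MAPIFreeBuffer: only valid on a parent. Frees the parent and
    // everything chained to it. Returns false for any other pointer,
    // which is where the debug build asserted.
    bool FreeBuffer(void* block) {
        auto it = children_.find(block);
        if (it == children_.end()) return false;
        for (void* child : it->second) std::free(child);
        std::free(block);
        children_.erase(it);
        return true;
    }

    std::size_t ChildCount(void* parent) const {
        auto it = children_.find(parent);
        return it == children_.end() ? 0 : it->second.size();
    }

private:
    std::map<void*, std::vector<void*>> children_;  // parent -> chained blocks
};
```

In this model, calling FreeBuffer on a chained block fails, while allocating a replacement with AllocateMore and letting the old block ride along until the parent is freed works fine. That matches the fix: the stale buffer costs a little memory for the lifetime of the structure, and one FreeBuffer on the parent cleans up everything.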
Put this together from a posting I made to the MAPI-L list and a couple cases I worked recently:
The problem is with MAPI store providers that just don't work when loaded under Outlook 2003. The developers of the providers, when they contact us, are usually convinced the problem is a bug in Outlook 2003 since the providers “worked perfectly” under Outlook XP. My experience though, is that the problem is that Outlook 2003 demands so much more of the MAPI spec than previous versions of Outlook that bugs which have always been in these providers are now exposed. What's very interesting though is that I'm seeing the same bugs over and over in these providers. I hope to highlight the source of these bugs and offer some ideas on how to correct them.
There is a book which is very much coveted in the MAPI development community: Inside MAPI. One of the samples in this book is MSLMS, the Microsoft Sample Local Message Store. For what it is, sample code demonstrating how to write a message store provider, it's great. I'm not aware of any other sample message store out there. So it does not surprise me that a number of people choose to base their provider on the code in MSLMS. As such, they tend to inherit the same bugs and design flaws present in the sample.
I recently spent a couple days hacking on the MSLMS sample so that it could load under Outlook 2003. Here's a short list of the problems I found and what I had to do to fix them:
Those are just the problems I fixed. Here are some I didn't:
Anyone who has based their message store provider on MSLMS (even if it was “What would MSLMS do?” and not just borrowing the code) needs to review their code and ensure they've addressed all of the above bugs before they approve their provider for support under Outlook 2003.
I was hoping to post something less controversial for my first blog entry, but this issue came up again recently and I felt I had to address it.
I'm frequently asked why we don't support putting Outlook and Exchange on the same machine. One response I typically hear is “But I run both on my machine and never have a problem”. In this post, I'll attempt to clarify a few things and explain what kinds of problems you can expect when you put Outlook and Exchange on the same server.
A couple definitions - shorthand mainly:
Admin: The Exchange Administrator (5.5) or the Exchange System Manager (2000 and 2003).
MAPI: Used by itself, refers to Extended MAPI. I'll spell out Simple MAPI when I need to make a distinction.
The main thing to clear up is that the warning against putting Outlook and Exchange on the same machine applies primarily to servers and other mission critical machines. Basically, if your server hosts an application which uses MAPI, you should never install Outlook on it.
So why do the articles make a point of saying not to combine the Admin and Outlook? The answer is that there are a large number of server applications out there which rely on MAPI for integration with Exchange. Examples include voicemail, PDA synchronization, workflow, archival, legal discovery and connectors (gateways) to third party mail/database systems. The recommended method for installing MAPI onto a server for one of these applications is to install the Admin.
Here's a (greatly simplified) synopsis of MAPI's architecture. The core file is mapi32.dll. After loading MAPI, applications then open message stores and address books. These are implemented by providers. The providers which allow MAPI to talk to Exchange and the directory are emsmdb32.dll and emsabp32.dll.
MAPI was designed so that anyone could implement the core component and providers. A few companies did choose to implement the Simple MAPI portion of the API in their own mapi32.dll, but AFAIK, the only currently available implementations of the Extended MAPI portion of the API are those shipped by Outlook and Exchange. A key flaw in the original MAPI design is that it did not allow for multiple implementations to coexist.
Life would be very simple if the mapi32, emsmdb32 and emsabp32 shipped by Outlook and Exchange were the same implementation, built out of the same code tree. If that were the case, the only concern would be keeping up with the latest builds.
However, life is not this simple. Outlook and Exchange have different needs from MAPI, and, as such, the code for their implementations has diverged a good deal. Exchange needs high stability and scalability out of MAPI, so a good portion of the design and testing is focused on eliminating potential deadlocks and memory leaks. Outlook needs a strong user experience, so the focus is on features like Cancel RPC, RPC over HTTP, Cached Mode, and server reconnect. Outlook also needed to solve the coexistence problem to allow for other mail clients, so enter the MAPI Stub.
The stub library works by acting as a central dispatcher for MAPI calls, proxying the calls out to the various implementations of MAPI which may be on the box. There's a performance penalty to the stub library's proxying mechanism though. This performance penalty was not acceptable in a server environment, so Exchange declined to support their implementation of MAPI with the stub. The effect of all this is that Exchange's providers expect to be loaded by Exchange's MAPI, and Outlook's providers expect to be loaded by Outlook's MAPI. We can't guarantee this will be the case if both are installed on the same box.
I began this post promising to list some concrete examples of the problems this can cause. These are real issues I've encountered as an Escalation Engineer:
Of course, most of these problems could be solved (any problem can be solved, right?), and the stub library is a good first step, but committing to support of this configuration would greatly expand the test matrices of both Exchange and Outlook, not to mention the increased test burden for third party server applications using MAPI. Unfortunately, we've not been able to justify the cost this would entail.
So, can you put Outlook and the Exchange Admin on the same box? If you're just talking about an administrator's desktop, one which can be rebooted on a whim or even rebuilt if needed, where downtime isn't a big issue, sure - it still won't be supported, but you might get away with it. If you're talking about your Exchange server or a server hosting an application integrated with Exchange, you do so at your own peril.