This post was inspired by a case I worked recently. In this case, the customer was using the 5.5 Event Script service to autoaccept meeting requests. They weren't having any problems with their script or scalability. Their problem was that the service would run fine for days at a time, and then suddenly stop handling all incoming meeting requests. The error they would get in the event logs was this not especially helpful event 11:
Event ID: 11Source: MSExchangeESDescription: A fatal error (0x80004005)occurred in an IExchangeEventSink while processing message [Subject = '<subject>']
Examining the script logs was no help - they didn't say anything about the error. This case had me stumped for a while. The latest fixes for events.exe were no good. Turning on internal tracing didn't take me too far either.
Time to pull out the big gun
When I finally got a debugger attached to the events.exe process, I found that OpenMsgStore was returning MAPI_E_CALL_FAILED (==0x80004005). Debugging into that, I found that during handling of OpenMsgStore, MAPI has to go to the registry to retrieve some properties from the profile. MAPI's call to RegQueryValueEx was returning ERROR_KEY_DELETED (==1018). The profile we were using in the Event Script service had been deleted out from under us! Why?
The simplistic answer is that right after MAPILogonEx, events.exe calls DeleteProfile. To understand that, we need to digress.
What does DeleteProfile do?
Well, for starters, it doesn't delete the profile. At least not always. Here's what the MSDN has to say on this function:
The IProfAdmin::DeleteProfile method deletes a profile. If the profile to delete is in use when DeleteProfile is called, DeleteProfile returns S_OK but does not delete the profile immediately. Instead, DeleteProfile marks the profile for deletion and deletes it after it is no longer being used, when all of its active sessions have ended.
Let's break this down:
Check if the profile is in use
MAPI maintains a shared memory object (this is CreateFileMapping, not .Shared sections) for interprocess communication and synchronization. In that shared memory object, we keep a linked list of all profiles currently in use. When we log onto a profile with MAPILogonEx, its ref count is bumped up. When the client logs out, the ref count is dropped back down. So checking if a profile is in use is equivalent to looking in the shared memory object to see if the profile has a ref count.
Mark a profile for deletion
MAPI uses a special reg key to mark a profile for deletion. The key is
HKEY_CURRENT_USER\Software\Microsoft\Windows NT\CurrentVersion\Windows Messaging Subsystem\Profiles\Deleted Profiles
To mark a profile for deletion, we write the profile name as a subkey. When MAPI is asked to present a list of existing profiles, such as in the MAPILogonEx dialog or in response to GetProfileTable, profiles listed under Deleted Profiles are left out.
Delete a profile after it is no longer being used
This is the hard one. How does MAPI determine the exact instance a profile is no longer in use? During logoff is not sufficient, since we also need to account for cases where the application using MAPI has crashed. So we have to be cleverer than that. We delete profiles from the registry when:
The first two cases handle most profile deletions. The third case handles the abnormal situations - an application has crashed or leaked a session pointer. The third case is also where bugs in the implementation cause cross process problems.
Here come the bugs, part 1
The first bug is completely our fault. Suppose two different processes run as the same user. Suppose also that one of the processes uses impersonation to run under the credentials of another user. Note that by default, this does not load the impersonated user's registry hive. Doing that requires a different call. (See KB 259301 for example code that does this). So, both processes will use the same registry hive when accessing HKEY_CURRENT_USER.
The first process to start creates a profile for MAPI. It then calls DeleteProfile after logging on so as not to clutter up the registry with temporary profiles. The second process (which has done the impersonation) now comes in and calls MAPIInitialize. We have a problem with our shared memory now - each process got their own block of shared memory. Our scheme now breaks down. The second process checks its shared memory to see what profiles are in use, finds that no profiles are being used, and proceeds to delete every profile listed under "Deleted Profiles". The first process is now what we call hosed.
When this bug was first reported, we weren't sure how to fix it without doing a total rewrite of this feature. That is, until one of our devs had the brilliant insight that each shared memory section was essentially tracking the profiles in use by a particular NT account. So all we needed to do was reflect that in our use of the registry by appending a CRC of the SID under which MAPI is running to the "Deleted Profiles" string. Now all processes which are running MAPI under the same NT account (therefore accessing the same shared memory) are looking at the same place in the registry, and processes running MAPI under different NT accounts look in different places in the registry.
Here come the bugs, part 2
And yet, even after putting in the fix, we still saw profiles being deleted out from under us. This one isn't completely MAPI's fault. Suppose two different processes are running on the same machine. Both run as the same user and neither uses impersonation. However, one runs in Terminal Services and the other runs in the console.
Same dance as the first bug. When the second process goes to create a shared memory object, it does not get the same shared memory as the first one. We don't get the same shared memory because sessions started under Terminal Service use a different namespace than sessions started from the console. In general this is a good thing, but it blocks MAPI from doing what it's trying to do.
The fix from the first bug isn't effective here because the SID for both processes is the same. Fortunately, the fix here is even simpler - it's even spelled out in the MSDN: prepend "Global\\" to the name of the shared memory. Now all processes running as the same user will use the same shared memory regardless of the Terminal Service session.
Where was I?
This started as an Event Script problem, and digressed into a discussion of DeleteProfile. The Event Script service is built around MAPI. It creates a profile on the fly and calls DeleteProfile immediately after logging on. The customer used Terminal Services to manage their server, and to top it off, Outlook was installed on the machine. So every few days, they'd encounter Bug 1 or Bug 2 and their script was toast. I had them remove Outlook, sent them the patch, and they haven't seen the problem since.
Hey Steve, where's this wonderful patch you speak of?
Thought you'd never ask. Here it is by product:
[8/22/04 8:09PM Minor edit - clarified some text]
[8/25/04 11:04AM Clarified impersonation and added links]
I'm working on a followup to my memory management article and a writeup on a MAPI deleted profile bug I ran across recently, but this takes precedence.
Exchange and Outlook on the same machine is bad. In my last post I waved my hands about some scenarios which could lead to a crash. I got a dump today which illustrates one. The issue originally manifested as heap corruption in MFCMAPI while opening a message store. We enabled pageheap on the process to see what was corrupting the heap. Here's the stack we got:
0:000> kL ChildEBP RetAddr 0012f2bc 77fb44fb NTDLL!RtlpDphIsNormalHeapBlock+0x86 ... 0012f4d0 35525bda NTDLL!RtlFreeHeap+0x85 0012f4e0 35525b6e MSMAPI32!LH_ExtHeapFree+0x19 0012f50c 62a5248b MSMAPI32!MAPIFreeBuffer+0x64 0012f778 62a530dd emsmdb32!HrSetupOffline+0x144 0012f834 62cd1755 emsmdb32!RMSP_Logon+0x2c4 0012f894 62cd1478 mapi32!HrIntDoOneClientLogon+0x8a 0012f8b0 62cd0f25 mapi32!HrIntClientStoreLogon+0x47 0012f920 62cd1422 mapi32!HrCommonOpenStore+0x3c3 0012f9a0 004310b6 mapi32!SESSOBJ_OpenMsgStore+0x56 0012f9d0 004321c7 MFCMapi!CallOpenMsgStore+0x66
I've trimmed the stack a little to get to the heart of the issue. We're corrupting the heap when we call RtlFreeHeap. Notice that mapi32 and msmapi32 both appear on this stack. Here's the versions of both (again, output is trimmed):
0:000> lmvm mapi32 Image path: C:\WINNT\system32\mapi32.dll Timestamp: Sat Jun 12 00:17:32 2004 (40CA83DC) File version: 6.0.6603.0 ProductName: Microsoft Exchange 0:000> lmvm MSMAPI32 Image path: C:\Program Files\Common Files\System\Mapi\1033\MSMAPI32.DLL Timestamp: Tue Dec 16 20:39:51 2003 (3FDFB3E7) File version: 10.0.6515.0 ProductName: MAPI32
So what's the problem? We have two competing versions of MAPI loaded into the same process!
Emsmdb32.dll made a call to GetProps to get some properties on a profile. The resulting LPSPropTagArray was allocated using Exchange's MAPI32.dll. Later, Emsmdb32.dll calls MAPIFreeBuffer to clean up this memory. Somehow, Outlook's MSMAPI32.dll ended up handling this call. Since Outlook's MSMAPI32.dll doesn't know anything about the heaps created by Exchange's MAPI32.dll, we end up corrupting the heap during this free. Without pageheap enabled, this corruption is silent, and doesn't rear it's ugly head until later on when we try allocating some memory against the corrupted heap.
After removing Outlook from the box, this problem went away.