Whenever I find myself repeating the same message over and over again, I have to ask why I haven't blogged it yet. This is one of those cases. :)
I've seen quite a few issues over the years with MSG files. The issues range from "it takes too long to write properties" to "the properties on the MSG don't match what I see in the store" to "I get such and such error trying to copy this message to an MSG". The root cause of most of these issues is one of expectations. People are trying to use MSG files as an archival format, and that's not their intended purpose. If you really want to archive mail, you should develop your own format for persisting the data. You'll gain advantages in versatility, speed, and fidelity.
To understand why I make this recommendation, we first need to realize that not all messages can be copied over to the MSG format. This is noted at the end of http://support.microsoft.com/kb/171907. Since MAPI is transacted, the underlying MSG file has to be opened with STGM_TRANSACTED, meaning nothing is committed to disk until SaveChanges is called on the message. Couple that with the quirk in the MAPI specification that pretty much forces you to create a new transaction each time you add a recipient or attachment, and you quickly run into the limit on open root storage files noted in http://support.microsoft.com/kb/163202. This OS-imposed limit on open root storage objects isn't likely to ever change, as it's an artifact of the implementation. Likewise, the need for a new transaction for each recipient and attachment won't ever change either. Neither the MSG format nor structured storage has seen active development in years. This limit is going to be hit whenever a message has a large number of recipients or attachments, or when it contains deeply nested embedded messages.
The next issue is speed. Writing a message to MSG can be quite slow. There is a huge performance penalty for working with a structured storage file in STGM_TRANSACTED mode, and this penalty is multiplied by the number of open root storage objects. So not only do you run into a limit trying to add all those recipients and attachments, but each subsequent recipient and attachment is that much slower to add. For instance, I recently worked on an issue where the repro required a message with 5000 recipients that I then copied over to MSG format. It took over an hour to write the file. And none of that delay was actually in the MAPI code - it was all at the COM level.
Next - not every MSG file you can write can be opened by Outlook. Over the years folks have tried various tricks to squeeze performance out of the code writing their MSG files. In many cases, they succeeded in writing the file faster, or in allowing more recipients and attachments on the message. But the downside was that they wrote a file Outlook didn't know how to open! One variation of this issue surfaced with Outlook 2007. Given the performance problems with MSG files, in Outlook 2007 we decided to check the number of recipients and attachments when opening the file. If either was over 2048, we refused to open the file at all. The main reason for this was a number of corrupt MSG files that had surfaced in the wild with astronomical counts of recipients and attachments - on the order of millions. But a side effect was to block Outlook 2007 from opening some MSG files that Outlook 2003 could open. Some customers have complained about this one, and a fix is in the works. I'll report here when it's done. However, that fix will only cover this one variation of the problem. It won't fix the many other scenarios in which a written MSG file can't be opened.
That covers the mechanics of reading and writing MSG files. Now let's discuss fidelity. This isn't about whether the MSG format is out partying with the EML format, but rather how faithfully the MSG represents the source message. This is where MSG being a MAPI-based format gets you in trouble. For instance, in archival scenarios, especially when the archive is used for legal discovery, properties such as PR_LAST_MODIFICATION_TIME and PR_LAST_MODIFIER_NAME are very important, as they indicate who modified the message and when. But since an MSG is itself a MAPI message and as such has to obey all the rules of MAPI, those properties will only reflect the time the MSG was written and the name of the account that wrote it, neither of which is likely to match the original message. This problem can extend to the body properties as well: no matter how you do it, you're likely to end up converting the body from one format to another when storing it in the MSG file. And every conversion carries with it the possibility of data loss. Perhaps some line spacing is subtly changed, or font choices aren't preserved exactly. In some messages, these subtle textual differences could have huge semantic ramifications.
Fidelity also figures in when discussing Unicode. In a large organization, messages will be written in a variety of languages. The only way to preserve these messages in MSG format without converting half the characters to question marks or boxes is to use the Unicode format. Unfortunately, this format is only understood by Outlook 2003 and Outlook 2007. Exchange's MAPI doesn't understand this format at all. So if you're relying on MSG files to save out Unicode data, your solution is stuck using Outlook's implementation of MAPI for all processing of the archive. This severely hampers your ability to build a server-based application.
Workaround
So, we've got messages that cannot be copied to the archive, a painfully slow API, messages that cannot be opened once archived, and a format that's not capable of representing the actual message being archived. Clearly, these are not the attributes we want in an archive.
Fortunately, the workaround is simple: don't use MSG to archive messages. Instead, develop your own file format to preserve the important properties on a message. Here's one approach using the file system and XML files:
The only really hard part about this format is determining how to store each of the possible MAPI property types. However, when we look closely, we see there are only 13 types to consider, most of which can be represented as just a simple number or string. Even binary data is easy to store if it's first converted to hex. Multivalued properties, large binary and string properties, and named properties all add additional wrinkles, but are easily addressed. I figure a junior programmer could complete a reasonable first draft of the required code to both read and write a MAPI message to and from XML in an afternoon. In fact, most of the code for writing the XML format is already present in MFCMAPI - check out dumpstore.cpp.
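To make the property-type discussion concrete, here is a minimal C++ sketch of writing a single property out as an XML element. It assumes a hypothetical, simplified SimpleProp structure standing in for MAPI's SPropValue, models only four property types (PT_LONG, PT_UNICODE, PT_BINARY and PT_SYSTIME), and assumes Unicode strings have already been converted to UTF-8. The property element layout is also made up for illustration; it is not the format written by dumpstore.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a MAPI property value (not SPropValue itself).
// Only four property types are modeled in this sketch.
struct SimpleProp {
    enum class Kind { Long, Unicode, Binary, SysTime };
    uint32_t tag = 0;              // property tag, e.g. 0x0037001F (PR_SUBJECT_W)
    Kind kind = Kind::Long;
    int32_t lVal = 0;              // PT_LONG
    std::string strVal;            // PT_UNICODE, assumed already converted to UTF-8
    std::vector<uint8_t> binVal;   // PT_BINARY
    uint64_t ftVal = 0;            // PT_SYSTIME as raw FILETIME ticks
};

// Convert raw bytes to an uppercase hex string so binary data survives in XML.
std::string ToHex(const std::vector<uint8_t>& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (uint8_t b : bytes) {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0F]);
    }
    return out;
}

// Escape the five XML special characters in string values.
std::string XmlEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += "&quot;"; break;
        case '\'': out += "&apos;"; break;
        default: out += c;
        }
    }
    return out;
}

// Emit one <property> element; the tag is written as 8 uppercase hex digits.
std::string PropToXml(const SimpleProp& p) {
    std::ostringstream os;
    os << "<property tag=\"0x" << std::hex << std::uppercase << std::setw(8)
       << std::setfill('0') << p.tag << std::dec << "\" type=\"";
    switch (p.kind) {
    case SimpleProp::Kind::Long:
        os << "PT_LONG\">" << p.lVal;
        break;
    case SimpleProp::Kind::Unicode:
        os << "PT_UNICODE\">" << XmlEscape(p.strVal);
        break;
    case SimpleProp::Kind::Binary:
        os << "PT_BINARY\">" << ToHex(p.binVal);
        break;
    case SimpleProp::Kind::SysTime:
        // Store the original FILETIME value untouched (decimal 64-bit ticks).
        os << "PT_SYSTIME\">" << p.ftVal;
        break;
    }
    os << "</property>";
    return os.str();
}
```

Binary data goes out as hex, and PT_SYSTIME is written as the raw 64-bit FILETIME tick count, so a value such as PR_LAST_MODIFICATION_TIME read from the source message is recorded exactly rather than being replaced by the time the archive was written. For example, a PR_ENTRYID (0x0FFF0102) holding the bytes 00 AB 0F comes out as <property tag="0x0FFF0102" type="PT_BINARY">00AB0F</property>.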
Objections
Hopefully I've convinced most of you not to use the MSG file format for archiving. Some of you might not be convinced though. You might think you've got that one special case that requires you to use MSG. I don't believe such a case exists. I've anticipated a few of the common objections:
The final objection is my favorite: "But I've never had a problem with MSG files" - Bully for you! This article isn't addressed to you then. However, I had one customer who also made this claim when I found they were using MSG to archive messages. Not quite believing them though, I outlined each of the problems listed above. It turns out they had encountered or were encountering every single one of them. They just hadn't connected the problems back to their choice to use MSG to archive their data.
[This is now documented here: http://msdn.microsoft.com/en-us/library/ff960239.aspx ]
We just had a customer whose wrapped-PST-based store wasn't getting rules to fire in Outlook 2007, even though everything worked in Outlook 2003. In the course of figuring out their issue, we found a few flags that we should have documented before.
Suppose you built a message store based on the wrapped PST provider. When new messages are delivered to your back-end database and you create matching messages in the PST, how do you inform Outlook that new mail has arrived? In Outlook 2003, the way you solve this is to use the IMAPISupport object that was passed to you during logon. From there, you would call Notify, passing fnevNewMail and the details of the new message. When Outlook gets this new mail notification, it passes the entry ID over to the rules engine and everything is hunky dory.
Unfortunately, this doesn't work in Outlook 2007. You fire your notification, Outlook sees it, then Outlook decides not to run rules on the message. Why? Because when Outlook 2007 looked at PR_STORE_SUPPORT_MASK on the wrapped PST, it found the STORE_ITEMPROC flag was set. This means the wrapped PST is an "Item Proc" store, and handles rules differently. Without getting into too many details about "Item Proc", the general idea is that such stores will feed new items into a pipeline where rules, junk mail, and spam processing can happen before listening clients will get a notification. This concept was introduced in Outlook 2007 as a way to streamline the processing.
When an Item Proc store knows it has a new message, it calls a special callback function in Outlook to run the message through the rules engine. In the PST, the concept of a "newly delivered message" is handled through internal flags. These flags may or may not be set depending on how the message arrives in the PST. For instance, if the PST is being used as an OST and new mail is synchronized from an Exchange server, the flags will be set. Similarly, when the PST is being used as the back end for a POP3 or IMAP account, the flags will be set when new mail is delivered.
For a wrapped PST, however, the way you set these flags is to pass ITEMPROC_FORCE in the SaveChanges call when the newly delivered message is committed to the store. This tells the PST to set its internal flags and mark the item as being eligible for the callback.
There's a twist though. If the mail being delivered to the PST came from Exchange, then rules would already have run on the server and the message has to be handled differently. And the default assumption is that the mail came from Exchange. So you have to indicate that Exchange was not the source of the mail by also passing the NON_EMS_XP_SAVE flag. When both flags are passed, the item will go through rules/junk/spam processing as soon as you call SaveChanges if Outlook is running, and shortly after Outlook starts if it wasn't.
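Put together, the save step for a newly delivered message in a wrapped PST store might look like the following C++ sketch. The numeric flag values are an assumption on my part, taken from the Outlook MAPI reference as I recall it; the #ifndef guards let the real MAPI headers take precedence when they define these names, so verify the values against your headers. The helper names IsItemProcStore and NewMailSaveFlags are hypothetical, and adding ITEMPROC_FORCE and NON_EMS_XP_SAVE only when STORE_ITEMPROC is set is a design choice of this sketch, meant to leave the Outlook 2003 path (where the fnevNewMail notification drives rules) unchanged.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMED flag values (see lead-in); real MAPI headers win if already defined.
#ifndef ITEMPROC_FORCE
#define ITEMPROC_FORCE 0x00000800u   // SaveChanges: mark item for rules/junk/spam callback
#endif
#ifndef NON_EMS_XP_SAVE
#define NON_EMS_XP_SAVE 0x00001000u  // SaveChanges: mail did not come from Exchange
#endif
#ifndef STORE_ITEMPROC
#define STORE_ITEMPROC 0x00200000u   // PR_STORE_SUPPORT_MASK: store is an "Item Proc" store
#endif

// Hypothetical helper: true if PR_STORE_SUPPORT_MASK marks the store as Item Proc,
// meaning a new mail notification alone will not get rules run in Outlook 2007.
bool IsItemProcStore(uint32_t storeSupportMask) {
    return (storeSupportMask & STORE_ITEMPROC) != 0;
}

// Hypothetical helper: combine the caller's usual SaveChanges flags with the two
// flags that make a non-Exchange delivery eligible for rules processing.
uint32_t NewMailSaveFlags(uint32_t baseFlags, bool itemProcStore) {
    uint32_t flags = baseFlags;
    if (itemProcStore) {
        flags |= ITEMPROC_FORCE | NON_EMS_XP_SAVE;
    }
    return flags;
}
```

A provider would read PR_STORE_SUPPORT_MASK from the wrapped PST once, then pass something like NewMailSaveFlags(KEEP_OPEN_READWRITE, IsItemProcStore(mask)) to SaveChanges on each newly delivered message. Passing ITEMPROC_FORCE without NON_EMS_XP_SAVE would leave the PST assuming the mail came from Exchange, with rules already run on the server.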
Definitions of the new flags:
Hey - look what I found on one of my old machines. It's the documentation from the Exchange 5.5 SDK / EDK!
Exchange_55_SDK_Docs.exe
We're not supporting development against 5.5 anymore, and a rather large portion of what's in these docs only applies to Exchange 5.5, but there are some interesting tidbits still. Some highlights:
Note - these are old .CHM files. If you can't read them I can't really help you. I'm only providing them here as a historical curiosity.
BTW - for some reason, we've never pulled the 5.5 samples and libraries. You can still get those from here:
http://www.microsoft.com/downloads/details.aspx?FamilyID=36a309c3-8c55-4476-8785-cafc59a2d075&DisplayLang=en
The January '08 refresh for the Outlook 2007 Auxiliary Reference is now live. In addition to a general scrub over most of the topics, there are some interesting new topics added, such as:
Timezone/rebasing docs: http://msdn2.microsoft.com/en-us/library/bb820976.aspx and http://msdn2.microsoft.com/en-us/library/cc160684.aspx
Protecting open PST files when you crash: http://msdn2.microsoft.com/en-us/library/cc160695.aspx
Rules processing in the wrapped PST: http://msdn2.microsoft.com/en-us/library/bb820947.aspx
Additionally, my update to the wrapped PST has been incorporated.
As promised, though a bit late, here are the change lists I put together for a couple of versions of MFCMAPI, along with a history lesson.
Versions 1-3 of MFCMAPI were essentially toys, built as I learned what could be done with MAPI. I can't even find them now - I wasn't very good at source control back then.
The oldest build of MFCMAPI I possess is 4.0.0.4, from 9/26/2001. At the time, it was for me just a collection of sample code that I would share with customers and other Dev Support engineers. Somewhere along the line, I found that Exchange and Outlook support folks wanted to use MFCMAPI as a tool. So I set up an internal site here at Microsoft where they could grab the latest stable build and use it. I didn't consider it ready for customers to use as a tool though, so I asked that I be kept in the loop any time it was sent out the door.
At some point (9/27/2002 to be exact) I bumped the major build number to 5.* to reflect the number of changes I had made recently. I also started work on the classic KB article: http://support.microsoft.com/kb/291794. It went live on 7/24/2003 with version 5.0.0.8. Minor build numbers were introduced along the way and I kept the article updated with builds and source. The last build I published in the KB was 5.0.18.1978 and it was built on 2/14/2005.
In August of 2004, a program manager for Exchange contacted me about having MFCMAPI officially replace MDBVU for Exchange 12, eventually known as Exchange 2007. Of course, I jumped at the chance. The experience of checking my code into the Exchange build tree was incredibly instructive. We ended up changing over half the code to get it up to the Exchange team's standards. We took our time getting the official Exchange build ready, so I kept up the KB releases for a while. Finally on 6/7/2006, we released the newly christened "MAPI Editor". Since this version was built out of the Exchange 2003 build tree, the version number was 06.05.7830.
As I stated at the time, my eventual goal was to get the source for this new build published. I also wanted to continually update the binary as I added features and bug fixes. But as we all know, when you hand control over to someone else - and their priorities aren't in sync with yours - what you want and what you get aren't always the same.
So - this past year, with the blessing of the Exchange team, I took control back. They can still publish an official build of "MAPI Editor" if they want, but I've put all the source and the releases for MFCMAPI up on CodePlex. The first release went live on 8/30/2007, and I've done three updates since then. Since I took over building the project, I resumed my old version numbering scheme, bumping the major build number to 6.
I've rambled enough - here are those diffs:
KB (5.0.18.1978) -> MAPI Editor Download (06.05.7830):
Features:
Cleanup:
MAPI Editor Download (06.05.7830) -> Initial CodePlex population (6.0.0.1000):
Fixes: