Whenever I find myself repeating the same message over and over again, I have to ask why I haven't blogged it yet. This is one of those cases. :)
I've seen quite a few issues over the years with MSG files. The issues range from "it takes too long to write properties" to "the properties on the MSG don't match what I see in the store" to "I get such and such error trying to copy this message to an MSG". The root cause of most of these issues is one of expectations. People are trying to use MSG files as an archival format, and that's not their intended purpose. If you really want to archive mail, you should develop your own format for persisting the data. You'll gain advantages in versatility, speed, and fidelity.
To understand why I make this recommendation, we first need to realize that not all messages can be copied over to the MSG format. This is noted at the end of http://support.microsoft.com/kb/171907. Since MAPI is transacted, the underlying MSG file has to be opened with STGM_TRANSACTED, meaning nothing is committed to disk until SaveChanges is called on the message. Couple that with the quirk in the MAPI specification that pretty much forces you to create a new transaction each time you add a recipient or attachment and you quickly run into the limit on open root storage files noted in http://support.microsoft.com/kb/163202. This OS-imposed limit on open root storage objects isn't likely to ever change, as it's an artifact of the implementation. Likewise, the need for a new transaction for each recipient and attachment also won't ever change. Neither the MSG format nor structured storage has seen active development in years. This limit is going to be hit whenever a message has a large number of recipients or attachments, or when there exists a deep level of embedded messages.
The next issue is speed. Writing a message to MSG can be quite slow. There is a huge performance penalty working with a structured storage file in STGM_TRANSACTED mode. And this penalty is multiplied by the number of open root storage objects. So not only do you run into a limit trying to add all those recipients and attachments, but each subsequent recipient and attachment is that much slower to add. For instance, I recently worked on an issue where the repro required that I have 5000 recipients on a message that I then copied over to MSG format. It took over an hour to write the file. And none of that delay was actually in the MAPI code - it was all at the COM level.
Next - not every MSG file you can write can be opened by Outlook. Over the years folks have tried various tricks to squeeze performance out of the code writing their MSG files. In many cases, they succeeded in writing the file faster, or allowing more recipients and attachments on the message. But the downside was they wrote a file that Outlook didn't know how to open! One variation of this issue surfaced with Outlook 2007. Given the performance problems working with MSG files, in Outlook 2007 we decided to check the number of recipients and attachments when opening the file. If either was over 2048, then we refused to open the file at all. The main reasoning for this was a number of corrupt MSG files that had surfaced in the wild with astronomical counts of recipients and attachments - on the order of millions. But a side effect was to block Outlook 2007 from opening some MSG files that Outlook 2003 could open. We've had some customers complain about this one and a fix is in the works. I'll report here when it's done. However, that fix will only cover this one variation of the problem. It won't fix the large number of other scenarios out there.
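If you're stuck writing MSG files anyway, you can at least flag the messages that would trip the Outlook 2007 check described above. Here's a minimal sketch in Python. The function name and the way the counts are passed in are my own invention for illustration; the only thing taken from the behavior above is the 2048 threshold, and that applies only to the Outlook 2007 check as it stands today.

```python
# Threshold used by the Outlook 2007 open-time check described above.
OUTLOOK_2007_ITEM_LIMIT = 2048


def exceeds_outlook_2007_limit(recipient_count, attachment_count,
                               limit=OUTLOOK_2007_ITEM_LIMIT):
    """Return True if either count is over the limit.

    Hypothetical helper: the caller is assumed to have already counted
    the recipients and attachments on the source message.
    """
    return recipient_count > limit or attachment_count > limit
```

Passing this check says nothing about the other failure modes, such as the open root storage limit. It only covers this one variation.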
That covers the mechanics of reading and writing to MSG. Now we discuss fidelity. This isn't about whether the MSG format is out partying with the EML format, but rather how faithfully the MSG represents the source message. This is where MSG being a MAPI-based format gets you in trouble. For instance, in archival scenarios, especially when the archive is used for legal discovery, properties such as PR_LAST_MODIFICATION_TIME and PR_LAST_MODIFIER_NAME are very important as they indicate who modified the message and when. But since MSG is itself a MAPI message and as such has to obey all the rules of MAPI, those properties will only reflect the time the MSG was written and the name of the account that wrote it, neither of which is likely to match the original message. This problem can extend to the body properties as well: no matter how you do it, you're likely to end up converting the body from one format to another when storing it in the MSG file. And every conversion carries with it the possibility of a loss of data. Perhaps some line spacing is subtly changed, or font choices aren't preserved exactly. In some messages, these subtle textual differences could have huge semantic ramifications.
Fidelity also figures in when discussing Unicode. In a large organization, messages will be written in a variety of languages. The only way to preserve these messages into MSG format without converting half the characters to question marks or boxes is to use the Unicode format. Unfortunately, this format is only understood by Outlook 2003 and Outlook 2007. Exchange's MAPI doesn't understand this format at all. So if you're relying on MSG files to save out Unicode data, your solution is stuck using Outlook's implementation of MAPI for all processing of the archive. This severely hampers your ability to build a server based application.
So, we've got messages that cannot be copied to the archive, a painfully slow API, messages that cannot be opened once archived, and a format that's not capable of representing the actual message being archived. Clearly, these are not the attributes we want in an archive.
Fortunately, the workaround is simple: don't use MSG to archive messages. Instead, develop your own file format to preserve the important properties on a message. Here's one approach using the file system and XML files:
The only really hard part about this format is determining how to store each of the possible MAPI property types. However, when we look closely, we see there are only 13 types to consider, most of which can be represented as just a simple number or string. Even binary data is easy to store if it's first converted to hex. Multivalued properties, large binary and string properties, and named properties all add additional wrinkles, but are easily addressed. I figure a junior programmer could complete a reasonable first draft of the required code to both read and write a MAPI message to and from XML in an afternoon. In fact, most of the code for writing the XML format is already present in MFCMAPI - check out dumpstore.cpp.
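To give a feel for how little code the basic idea takes, here's a rough sketch in Python that round-trips a handful of properties through XML. Treat everything about the layout as an assumption for illustration: the `message` and `property` element names, the `tag` attribute, and both function names are made up, and this is not the format MFCMAPI writes. It handles only three of the property types (PT_LONG, PT_UNICODE and PT_BINARY) and skips multivalued properties, named properties and streaming of large values.

```python
import xml.etree.ElementTree as ET

# MAPI property type codes (the low 16 bits of a property tag).
PT_LONG = 0x0003
PT_UNICODE = 0x001F
PT_BINARY = 0x0102


def message_to_xml(props):
    """Serialize a {property_tag: value} dict to an XML string."""
    root = ET.Element("message")  # hypothetical element name
    for tag, value in sorted(props.items()):
        prop_type = tag & 0xFFFF
        elem = ET.SubElement(root, "property", tag=f"0x{tag:08X}")
        if prop_type == PT_BINARY:
            elem.text = value.hex()  # binary is stored as hex text
        elif prop_type in (PT_LONG, PT_UNICODE):
            elem.text = str(value)
        else:
            raise ValueError(f"unhandled property type 0x{prop_type:04X}")
    return ET.tostring(root, encoding="unicode")


def xml_to_message(xml_text):
    """Rebuild the {property_tag: value} dict from the XML string."""
    props = {}
    for elem in ET.fromstring(xml_text).iter("property"):
        tag = int(elem.get("tag"), 16)
        prop_type = tag & 0xFFFF
        text = elem.text or ""
        if prop_type == PT_BINARY:
            props[tag] = bytes.fromhex(text)
        elif prop_type == PT_LONG:
            props[tag] = int(text)
        elif prop_type == PT_UNICODE:
            props[tag] = text
        else:
            raise ValueError(f"unhandled property type 0x{prop_type:04X}")
    return props


# Sample values are invented; the tags are PR_SUBJECT (Unicode),
# PR_IMPORTANCE and PR_SEARCH_KEY.
sample = {
    0x0037001F: "Quarterly report",
    0x00170003: 1,
    0x300B0102: b"\x00\xff\x10",
}
assert xml_to_message(message_to_xml(sample)) == sample
```

Keying everything off the type bits in the property tag means the reader needs no side table to know how to decode a value. One wrinkle this sketch ignores: strings containing characters that aren't legal in XML would need escaping or an encoding such as hex before they could round-trip.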
Hopefully I've convinced most of you not to use the MSG file format for archiving. Some of you might not be convinced though. You might think you've got that one special case that requires you to use MSG. I don't believe such a case exists. I've anticipated a few of the common objections:
The final objection is my favorite: "But I've never had a problem with MSG files" - Bully for you! This article isn't addressed to you then. However, I had one customer who also made this claim when I found they were using MSG to archive messages. Not quite believing them though, I outlined each of the problems listed above. It turns out they had encountered or were encountering every single one of them. They just hadn't connected the problems back to their choice to use MSG to archive their data.
"I need to be able to open the messages in Outlook" - Steve, you are missing the point. People want to be able to open a message in *Outlook*, not my super fast reliable viewer, which, unfortunately, does not get installed by Outlook :-)
A requirement to have Outlook installed is an easy one (have you ever seen a corporate PC without a copy of Office installed?); installing anything third-party is a PITA.
It does not have to be an MSG file; people simply want something that *Outlook* can open. If MS comes up with an XML schema that Outlook can natively open, I'll be the first one to use it.
EML format would be good, but I don't think you could handle EX type recipients (people would want to see the familiar GAL dialog, not a one-off SMTP address).
No - I don't think I missed the point - the point is they're trying to use this format for *archiving* and it's totally unsuitable for that purpose. When I talk about having Outlook installed, I'm speaking more about the server where the archiving is taking place. No server should have Outlook installed on it. That's just a bad idea.
The only scenario I can see where they would have access to the archive but NOT have any software (not even a web page) from the archive vendor is if the only interface the vendor presents to the end user is a file share. And I think we can agree a file share is a pretty poor interface for an enterprise ready product.
You seem to be proposing "Microsoft should fix this", which I already addressed. Even if we were to make some better format, it would only work with the newest version of Outlook, so for that reason most vendors would reject it.
I am with you when you talk about archiving - each and every property that is expected to be used later must be persisted explicitly.
What I am talking about however is UI - people are most comfortable with Outlook, they do not want to use any other app that does something that Outlook can do.
A use case: a user sends/receives a message to/from a customer. It gets parsed, and its most important properties (subject, body, attachments, etc.) are stored separately. The whole message is also stored in the MSG format in a blob in a DB (storage is cheap).
A user (the same or a different one) at a later time can simply look at a history view for a given contact and double click on the message. The message is extracted from the DB, saved as an MSG file, and opened by Outlook. A user can then reply/forward/etc. in the familiar Outlook environment. The "familiar Outlook environment" is the keyword; at this point I could not care less if an obscure MAPI property was not persisted correctly.
That's a valid use case. However, since in that case you already have code pulling the MSG file from the DB and saving it to disk, there's no reason you couldn't pull properties from the DB and construct a message on the fly. In fact, I even mentioned this option in the article. As long as you have code running on the client side there's no reason your UI needs to change.
I'm not concerned with the expense of storing an MSG file. What I *am* concerned with is the fact that so many messages cannot be represented in MSG at all.
I really, really do not want to deal with the stuff that I do not care about, especially the pretty formatting, be that HTML or RTF, or a combination of the two.
The user however does care about that a lot.
Plus if the extraction is done on the server, the MAPI system might not be installed, even if Outlook is locally available.
I just want to have a file format that Outlook can open *natively*.
Another option that you did not mention is that (since you own all the source code) you can just pull out the relevant pieces of the COM system from Windows and create a private MAPI function that can deal with any MSG file.
The function does not have to be real fancy and support simultaneous access from different processes (you cannot do that now with MSG files anyway).
I understand the technical limitations of the MSG format, but if a customer wants to have 10,000 recipients in a message, I can come up with an excuse why accessing such an MSG file takes a long time, but saying that he simply can't do that ain't gonna fly...
?? This isn't about formatting. It's about the choice of storage on the back end. You can present the data to the user however you like, including as a MAPI message.
You keep demanding a change to Outlook - that's not going to help most vendors who would still need to deal with older versions of Outlook. Same goes for your proposal for some new version of MSG - in order for that code to get on the box the user would have to install a new version of Outlook.
I've given a solution here that doesn't require explaining anything to the customer. I'm not sure why you're fighting it.
Steve, I am not trying to fight anything, I am just trying to highlight *why* people are using the MSG format and will be using it despite all of its shortcomings for the years to come: it is simply the format that Outlook can display.
If you come up with a different format in one of the next versions of Outlook, I will be able to at least give the customers an option to use it (remember when Outlook 98 was the latest and greatest?)
Or you can try to "fix" the MSG format (there is nothing really fundamentally wrong with it on the binary level).
Again, most of my customers would rather accept the current MSG file format limitations than lose the ability to open messages in Outlook.
I do realize that many people believe the statements "I need to display the message in Outlook" and "I must use MSG" to be equivalent. But they don't have to be. There's nothing stopping an archiving vendor from building a message on the fly. In fact, that's exactly what's happening under the covers when an MSG is opened.
I can see where you're going on this, Stephen, and you make very good points, but implementing an ability to reconstitute an exported object back into Outlook is going to require the use of MAPI, which is not an option for many who are stuck using the monstrosity called the "Outlook Object Model" or OOM.
It is impossible to recreate an object in the "sent" state using the OOM. Re-creating certain complex Outlook objects such as task requests and appointments is also impossible using the OOM because of Outlook's use of numerous undocumented MAPI properties which are not exposed in the OOM.
Another issue is more and more Outlook addins are being written in .NET which has no support for MAPI short of using libraries like MAPI33. The use of any MAPI with .NET is unsupported by Microsoft.
As developers we should be encouraging standardized open formats instead of everyone making their own.
Wouldn't it be better for everyone if Microsoft created a new open format that would be recognizable and supported by both MAPI and Outlook? One that IS suitable for archiving AND can be opened from Windows Explorer through a shell open/double click action?
Word, Excel, PowerPoint, etc. are all moving to an open format in Office 2007, why is Outlook not?
You're right that implementing the reconstitution does require MAPI, but not very much MAPI.
Here's an idea: Someone could start an open project to build a handler that knows how to open these XML files. All it would need to do is register in the file system to handle whatever extension is used, then when invoked log on to the default/current profile and build the message. It would then hand the message off to Outlook to display. That's really all that Outlook's MSG handler code is doing.
This would put us in a much better position for lobbying the Outlook team. Instead of saying "you should support a better format" you'd be saying "you should support the XYZ format".
BTW - I don't find the OOM or .Net observations to be relevant. The handler you use to open the files doesn't have to be tied into any other code. It doesn't even need to be an add-in. It can stand alone.
There is already MIME format, which Outlook itself can handle (EML files are currently handled by OE).
You already have IConverterSession used all over the place by Outlook; why not reset the EML file handler to outlook.exe?
The potential problem I see is the EX type addresses, while the RFC really expects SMTP.
Yeah - I considered discussing MIME in my post, but MIME's an even worse format fidelity-wise for storing MAPI messages. You could TNEF encode all the MAPI stuff, but that doesn't help on the indexing front. Plus, TNEF has its own problems.
Stephen, The .NET issue is relevant because Microsoft will not give support for a .NET program or addin using MAPI in any manner, that includes interop. Things like the MAPI33 library are unsupported.
Thus if you have a very large application written in .NET that works against MSG files solely using the OOM, you cannot give up using MSG files and do object reconstitution through MAPI without either giving up .NET and rewriting the application in C++, or giving up a large amount of Microsoft support.
In our case we broke this rule long ago by using MAPI33 and have had to do quite a bit of haggling to get some degree of support for an issue we had with Outlook crashing while a .NET addin that interfaced with MAPI was in use. The issue was finally resolved and had nothing to do with our use of MAPI.
I'm not outright rejecting your advice on this, we're seriously looking into giving up the saving/archiving of .MSG files.
Is there anything you can do on your end to remove the support barriers with .NET programs using MAPI through managed C++ like the MAPI33 library does? Managed C++ can bridge the gap with .NET and MAPI because the sensitive MAPI API calls that don't work from CLR interop can be made safely in the unmanaged portion of the code which is what the MAPI33 library does.
Once you're archiving to a text file MAPI doesn't need to be involved to process it. And there's no reason an application needs to be a single process. So I don't see the comments about .Net to be particularly relevant.
BTW, MAPI33 isn't removing any support barriers or gaps. It's doing exactly what we don't support, which is to use MAPI from managed code. The only thing MAPI33 gets you is the ability to be unsupportable faster.
The product teams are the ones that decided not to support MAPI with .Net. I don't see them changing their minds any time soon. There's no sense in lobbying me for a change. We're all well versed in all the arguments.
I think the biggest flaw in your argument for a user-defined format is that you seem to be under the impression that we developers are in control of our environment.
As a vendor, I supply MSG files to our clients. I have no control over their environment, which varies widely from client to client. I don't know if they're using a webapp to process and display these MSG files, or they're using Outlook, or any of a hundred other use cases I could come up with. Heck, I can't even guarantee that the client isn't running a *nix environment and has their own MSG parser.
The MSG format is one Outlook natively supports, so it is a loose file format from which hundreds of avenues of businesses have sprung.
As a "supplier" of data, we can't control the format. We can't tell every client "you need to design your own XML reader for this document type we invented for Outlook e-mails because MSG is unsuitable and we've no idea how or what you intend to do with it".
What you're proposing is a format war 100x the scale of Blu-Ray vs. HD-DVD.
The data suppliers can't get into the format wars, because we don't control the player. If and when Microsoft decides to replace MSG with something else, we will all happily march along. However, we the suppliers have no choice but to use it until then.