You've probably been hearing about the new Office 2007 file formats, and that being based on Open XML is a great benefit. But why? Why should you as developers care? Are they nothing more than a compressed file format? Actually, they're much, much more than that. They're an incredibly powerful building block that you can base your Office Business Applications on. Let me explain in more detail…
First of all, Word, Excel and PowerPoint 2007 files now act as containers: they are actually compressed zip packages (just try changing the extension to .zip and you'll see what I mean). To an end user, the file still looks like a single item, but to the developer, the file is a package of parts, segmented in a logical tree structure and tied together by relationships which you can navigate through. No longer do you have the black box of a binary file from previous file formats. So knowing this, what are some of the benefits?
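To make the "package of parts" idea concrete, here is a minimal Python sketch that lists the parts in a package and reads the package-level relationships. It only uses the standard library. The helper names (list_parts, root_relationships, build_sample_package) are made up for illustration, and the sample package is a tiny hand-built stand-in rather than a complete Word document, so treat this as a sketch and not a full Open XML reader.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by relationship parts (_rels/*.rels) in an Open XML package.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOC_REL = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                  "relationships/officeDocument")


def list_parts(package_bytes):
    """Return the sorted names of every entry in the zip package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        return sorted(pkg.namelist())


def root_relationships(package_bytes):
    """Return (type, target) pairs from the package-level _rels/.rels part."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        root = ET.fromstring(pkg.read("_rels/.rels"))
    return [(rel.get("Type"), rel.get("Target"))
            for rel in root.findall(f"{{{REL_NS}}}Relationship")]


def build_sample_package():
    """Hypothetical stand-in package, built in memory so the sketch runs as-is."""
    rels = (f'<Relationships xmlns="{REL_NS}">'
            f'<Relationship Id="rId1" Type="{OFFICE_DOC_REL}" '
            'Target="word/document.xml"/></Relationships>')
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as pkg:
        pkg.writestr("[Content_Types].xml", "<Types/>")  # placeholder content
        pkg.writestr("_rels/.rels", rels)
        pkg.writestr("word/document.xml", "<document/>")  # placeholder content
    return buf.getvalue()


if __name__ == "__main__":
    sample = build_sample_package()
    for name in list_parts(sample):
        print(name)
    for rel_type, target in root_relationships(sample):
        print(rel_type, "->", target)
```

To try this against a real file, read the bytes of a .docx, .xlsx or .pptx file and pass them to the same two functions.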
Interoperability – Because of the open standard of the ECMA Open XML file formats, you can do things like generate files from Office 2007 documents on a non-Microsoft platform like Linux. Take for example a partner called Sonata, who created an awesome solution for the Linux platform where they took an Office 2007 Word document and then using XSLT, converted it to HTML and published it.
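As a rough illustration of this kind of platform-neutral conversion, the Python sketch below turns the paragraphs of a WordprocessingML document part into very plain HTML. Note the assumptions: it uses the standard library's ElementTree instead of XSLT, it is not Sonata's solution, it ignores all formatting, and the function name and sample XML are invented for the example.

```python
import html
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical, hand-written stand-in for the content of word/document.xml.
SAMPLE_DOCUMENT_XML = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/'
    'wordprocessingml/2006/main"><w:body>'
    '<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> &amp; welcome</w:t></w:r></w:p>'
    '<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>'
    '</w:body></w:document>'
)


def document_xml_to_html(document_xml):
    """Convert each w:p paragraph into an HTML <p>, dropping all formatting."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for p in root.iter(W + "p"):
        # A paragraph's text is spread across w:t elements inside its runs.
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        paragraphs.append("<p>" + html.escape(text) + "</p>")
    return "<html><body>" + "".join(paragraphs) + "</body></html>"


if __name__ == "__main__":
    print(document_xml_to_html(SAMPLE_DOCUMENT_XML))
```

Nothing here depends on Office or Windows being installed, which is the point of the interoperability argument.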
Mitigation of File Corruption – Since we now have a segmented architecture, if one part gets corrupted, the other parts of the package should still be safe. For example, if your style part is corrupted, you will still be able to open your document; it just won't look as nice. Also, corruption tends to occur as a result of truncation. Because we are no longer working with a black box, you can mitigate data loss by putting the most important information at the top of the XML files in the package parts.
Security – Security is greatly improved as a result of the segmented architecture. Macros now have their own parts within the package, and it is easy to separate this portion from the content to better manage security issues. This is why we have the new macro-enabled file formats such as .xlsm, .pptm and .docm, as well as the macro-enabled template versions. These are the formats that can contain macros.
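Because macros live in their own part, a server-side check for them can be sketched in a few lines of Python. The sketch below looks in the package's [Content_Types].xml for the VBA project content type. The content type string is assumed here to be application/vnd.ms-office.vbaProject, and the function names and in-memory sample packages are hypothetical, so verify against real files before relying on it.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
# Assumption: content type that macro-enabled packages declare for the VBA part.
VBA_CONTENT_TYPE = "application/vnd.ms-office.vbaProject"


def has_macro_part(package_bytes):
    """Return True if [Content_Types].xml declares a VBA project part."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        types = ET.fromstring(pkg.read("[Content_Types].xml"))
    # Both <Default> and <Override> children carry a ContentType attribute.
    return any(el.get("ContentType") == VBA_CONTENT_TYPE for el in types)


def make_sample_package(with_macro):
    """Hypothetical stand-in package; the 'macro' bytes are just a placeholder."""
    entries = ['<Default Extension="xml" ContentType="application/xml"/>']
    if with_macro:
        entries.append(
            f'<Default Extension="bin" ContentType="{VBA_CONTENT_TYPE}"/>')
    content_types = f'<Types xmlns="{CT_NS}">' + "".join(entries) + "</Types>"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as pkg:
        pkg.writestr("[Content_Types].xml", content_types)
        pkg.writestr("word/document.xml", "<document/>")
        if with_macro:
            pkg.writestr("word/vbaProject.bin", b"placeholder")
    return buf.getvalue()


if __name__ == "__main__":
    print(has_macro_part(make_sample_package(with_macro=True)))
    print(has_macro_part(make_sample_package(with_macro=False)))
```

A check like this never has to open the document in Office or touch the document content itself.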
Digital Signatures – Now you have more power over what you sign within a document. You can digitally sign the packages using X.509 certificates, and you can sign all parts of the package, including even the digital signatures themselves. Imagine that as part of a workflow you were to sign that a certain portion of content has changed or has not changed, or perhaps you would want to digitally sign just the macro part. There are some very interesting scenarios here.
Developer Scenarios
Talking about scenarios, let's get into some developer scenarios. First of all, I would suggest that you couple these scenarios with a workflow in MOSS so that the solution drives itself and stays streamlined within the business framework.
Because styles are in their own separate parts, you can just go into the style part of the package and manipulate it. Imagine you are an enterprise which has decided to change your logo, and you have hundreds of thousands of documents in your repository. Since the files are no longer binary black boxes, you can now go in and easily swap out the logo in all the documents, since all of the Office documents have a common structure.
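A bulk update like the logo swap could be sketched in Python as follows. A zip entry cannot be overwritten in place, so the sketch copies every part into a new package and substitutes the bytes of one part along the way. The function names are hypothetical, and the part name word/media/image1.png is only an assumed example: in a real document, follow the relationships to find which part actually holds the logo.

```python
import io
import zipfile


def replace_part(package_bytes, part_name, new_data):
    """Return a copy of the package with the bytes of one part swapped out."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        if part_name not in src.namelist():
            raise KeyError(f"no such part: {part_name}")
        for name in src.namelist():
            # Copy every part unchanged except the one being replaced.
            data = new_data if name == part_name else src.read(name)
            dst.writestr(name, data)
    return out.getvalue()


def read_part(package_bytes, part_name):
    """Return the raw bytes of a single part."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        return pkg.read(part_name)


def make_sample_package():
    """Hypothetical stand-in package with a fake 'logo' part."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as pkg:
        pkg.writestr("word/document.xml", "<document/>")
        pkg.writestr("word/media/image1.png", b"old logo bytes")
    return buf.getvalue()


if __name__ == "__main__":
    updated = replace_part(make_sample_package(),
                           "word/media/image1.png", b"new logo bytes")
    print(read_part(updated, "word/media/image1.png"))
```

Running the same function over every file in a repository gives the batch update described above, with no Office installation required on the server.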
This is the most common Open XML development scenario. Imagine the following:
Note: In the business layer, I suggest using some kind of template to give yourself a head start.
VSTO and Open XML
VSTO abstracts working with Open XML so that you never have to go through the XML and navigate the tree structure yourself. It makes debugging and deployment of your solution on the server much easier. With "Orcas", the next version of Visual Studio Tools for Office, there is support for the Office 2007 file formats through the ServerDocument class, which allows you to manipulate the cached data in the file in an abstracted way. Rather than dealing with raw XML, you can work with business objects to manipulate the data within Office 2007 files for both your client-side and server-side applications, where debugging and deployment are simpler and more integrated.
Resources
This past week at TechEd Orlando, Microsoft announced the Open XML SDK CTP. It provides classes which offer higher levels of abstraction for working with Open XML, along with code samples and How-To articles. For a great overview of this, check out the Channel9 video with Chris Bryant.
There are some awesome How-To's on the Office Developer How-To Center with videos, articles and code samples. More to come as well.
Of course there is the non-Microsoft www.OpenXMLDeveloper.org which is a wealth of information from a vast Open XML community.
For you Java developers, check out the article on Using Java to crack Office 2007.
And here are some others:
MSDN XML in Office Developer Portal
How To Manipulate Office Open XML File Formats
Word 2007 Content Control Toolkit
Blogs:
http://blogs.msdn.com/brian_jones/
http://blogs.msdn.com/dmahugh/
http://blogs.infosupport.com/wouterv/
http://blogs.msdn.com/kevinboske
http://blogs.msdn.com/xps/