One of the main reasons we announced the new file formats so early was to give people a chance to start building different types of solutions on top of them. I’m pushing for an early release of the schemas (sometime before Beta 1), but that still leaves a few months before they would be out. So, in the meantime, the best way to start playing around with potential solutions is to use Office 2003. There is already a ton of XML support in that product.
While the announcement of these new default XML formats is a big deal, it is definitely not the first time we’ve worked with XML. In Office 2000 (which we started developing in 1997) we built an HTML format that leveraged XML for representing things like document properties and other application-specific information. This was done because HTML didn’t support all of our features and we didn’t want people to lose information when saving as a web page. It was unfortunate because it didn’t look like "pure" HTML, but it was necessary to preserve our customers’ data. Starting in 1999 we began building the SpreadsheetML format that shipped with Excel in Office XP. Then in 2001 we started working on the WordprocessingML format, which is now available in Word 2003. So, as you can see, we’ve been doing stuff with XML in Office for the past 8 years.
Why the brief history lesson, you ask? It’s important to understand that the new formats coming out with Office 12 are based on the work we’ve done up through Office 2003. So, if you build solutions on top of Word 2003’s XML, those will map fairly easily into the new file formats. For Word, the only big difference with the new format is that we break the single XML file into multiple files and wrap them all up in a ZIP package (we’ve actually designed a logical model for structuring documents from multiple pieces, which we then mapped into ZIP). Today I want to show an example of something you can do with WordprocessingML in Word 2003.
There have been a number of questions around support for other XML formats (there are tons of them out there). As I’ve described, since the formats are XML and fully documented, anyone can build transforms to go from our format into another (or vice versa). I decided I would post a really simple transform that runs against Word 2003 XML just to give folks an example. This transform gets rid of all the tracked changes and comments in a file. It does the exact same thing as if you were editing the file directly in Word and chose to accept all revisions.
This transform is something that people could leverage as part of a workflow process. Imagine you had documents you wanted to publish and you wanted to make sure there weren’t any deletions or comments in the files. I’m sure you’ve heard of people getting burned by posting documents on a server that still had deletions in them. Often the end user didn’t realize the deletions were still there, and there wasn’t an easy way for administrators to write an automated process to remove them. Well, using XML, it’s easy to write solutions that manipulate Office documents without having to run the applications themselves.
So, that’s just one example of writing a tool that manipulates a Word document. If you tried to do something like this with the binary formats, it would be extremely difficult. Most people who try to do this today end up writing code that automates the Office applications. The advantage with the XSLT is that you don’t need to have the Office applications involved (in the demo we had Word apply the XSLT, but you could have used any number of tools to do it).
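To make the "any number of tools" point concrete, here is a minimal sketch in Python rather than XSLT. It is not the transform from the demo, just an illustration of the same idea using only the standard library. It assumes that Word 2003 XML marks revisions and comments with `aml:annotation` elements whose `w:type` attribute is `Word.Insertion`, `Word.Deletion`, or one of the `Word.Comment` values, and that the namespace URIs below are the ones the file uses; check both against the Word 2003 schema documentation before relying on them. The function names `accept_all_revisions` and `_accept_in_place` are made up for this example.
```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for Word 2003 XML; verify against your own files.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
AML = "http://schemas.microsoft.com/aml/2001/core"
ET.register_namespace("w", W)
ET.register_namespace("aml", AML)

ANNOTATION = f"{{{AML}}}annotation"
CONTENT = f"{{{AML}}}content"
TYPE = f"{{{W}}}type"


def _accept_in_place(element):
    """Rewrite the children of `element`, depth first.

    Insertions are replaced by the content they wrap, while deletions
    and comment annotations are dropped entirely.
    """
    kept = []
    for child in list(element):
        _accept_in_place(child)  # handle nested annotations first
        if child.tag == ANNOTATION:
            kind = child.get(TYPE, "")
            if kind == "Word.Insertion":
                content = child.find(CONTENT)
                if content is not None:
                    kept.extend(list(content))  # keep the inserted runs
                continue
            if kind == "Word.Deletion" or kind.startswith("Word.Comment"):
                continue  # discard deleted text and comments
        kept.append(child)
    element[:] = kept


def accept_all_revisions(xml_text):
    """Return the document XML with insertions accepted and deletions/comments removed."""
    root = ET.fromstring(xml_text)
    _accept_in_place(root)
    return ET.tostring(root, encoding="unicode")
```
A batch job could call `accept_all_revisions` on each file's text before publishing. The sketch ignores tracked formatting changes and may leave behind an empty run where a comment used to be, so treat it as a starting point rather than a full equivalent of accepting all revisions in Word.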
Let me know if you guys have any questions or if the XSLT doesn’t work for you. I think in my next post I’ll talk more about the Word schema and how we designed it. At first glance it’s a fairly intimidating schema, but as you learn about it, it’s pretty basic and straightforward. There are just a ton of features in Word, so we had to create XML to represent them all. That doesn’t mean you need to deal with them all, though, if you’re just trying to do something simple. Also, does anyone feel like it would be useful to have some posts covering more of the basics of XML? Or does everyone feel like they are already up to speed on everything I’ve discussed and just want to see more technical posts?