I really don't have anything to do with Office XML formats so can't contribute much of substance to the debate over Massachusetts' draft Enterprise Technical Reference Model v 3.5 which mandates the OASIS Open Document Format. This has generated a lot of weblog posts, mostly from open source advocates or employees of Microsoft competitors fulsomely praising it and hoping that this political decision will give their preferred technologies more economic clout.

A couple of more independent assessments raise issues that I find more interesting. For example, Stephen O'Grady acknowledges that Microsoft (or third parties) could easily support the ODF format if there is a business need to do so. That gets to the question of whether Massachusetts' very real needs for document interoperability and longevity are best met by the XML family of technologies in general, or one specific format in particular. MS Office and OpenOffice / StarOffice have taken different paths here. Office 2003 can handle custom XML schemas and stylesheets quite dynamically, whereas OO has been hard coded to handle a specific schema, and the latest beta versions have evolved to support the OASIS ODF. It's not at all clear to me why supporting specific schema(s), whether or not they are endorsed by a standards body, is considered more "open" or "standards based" than support for the more fundamental XML, XML Schema, and XSLT standards that Office 2003 implements.

One technical note: one can't simply configure Office 2003 to handle OASIS ODF documents directly, because the OASIS Technical Committee chose to define that format using the RELAX NG schema spec (endorsed by OASIS and ISO) rather than the W3C XML Schema spec, which is a W3C Recommendation and is what Office 2003 supports. There are some plausible technical reasons for this, in that RELAX NG is simpler, based on a more solid formal underpinning, and somewhat better suited for defining complex textual document formats than is W3C XML Schema. Unfortunately, that advantage does not carry over into tool support (few mainstream XML editors currently support RELAX NG validation) or support for structured data within the text. For example, tools that support the popular data-oriented XML programming technique known as data binding (which allows XML to be parsed easily into instances of application-level programming objects rather than abstract node trees) almost all require W3C schemas as input.
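For readers who haven't run into data binding, here is a minimal hand-written sketch of the idea in Python, using only the standard library. The invoice fragment, its element and attribute names, and the `Invoice`, `Item`, and `bind_invoice` names are all invented for illustration. Real data binding tools typically generate classes like these from a schema instead of having someone write them by hand, which is exactly why the choice of schema language matters to them.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Hypothetical structured data embedded in a document; names are made up.
SAMPLE = """
<invoice number="1042">
  <customer>Commonwealth of Massachusetts</customer>
  <item sku="A1" quantity="3" price="19.50"/>
  <item sku="B7" quantity="1" price="5.00"/>
</invoice>
"""


@dataclass
class Item:
    sku: str
    quantity: int
    price: float


@dataclass
class Invoice:
    number: int
    customer: str
    items: List[Item]

    def total(self) -> float:
        # Application logic works on typed fields, not on XML nodes.
        return sum(item.quantity * item.price for item in self.items)


def bind_invoice(xml_text: str) -> Invoice:
    """Map the XML node tree onto typed application objects (hand-written binding)."""
    root = ET.fromstring(xml_text)
    items = [
        Item(
            sku=node.get("sku"),
            quantity=int(node.get("quantity")),
            price=float(node.get("price")),
        )
        for node in root.findall("item")
    ]
    return Invoice(
        number=int(root.get("number")),
        customer=root.findtext("customer"),
        items=items,
    )


invoice = bind_invoice(SAMPLE)
print(invoice.customer, invoice.total())
```

After binding, the rest of the program deals with `invoice.items` and `invoice.total()` and never touches the node tree. A schema-driven tool would derive the field names and types (`int`, `float`) from the schema's declarations, which this sketch hard codes.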

This gets to a second issue, raised by Joe Wilcox (and something I debated at length in previous O'Grady postings):

Considering the OpenDocument format is only truly supported by OpenOffice 2.0, which isn't even available yet, I'm at a loss to see how the XML-based format meets the Commonwealth's goals for openness or backward compatibility. Nobody's really using the format yet, right? How, uh, open is that?

My other problem is one of definition. Looks like the Commonwealth considers Adobe's PDF as open, because the spec is openly published. OK, I'm scratching my head, because if you download Corel's WordPerfect SDK the WPD specification is right there. As for Microsoft, while I'm grumbly about the company's liberal use of open, I have to say if PDF meets the Commonwealth's standard so should Office formats; at the least the XML-based formats coming with Office 12. Microsoft does publish its XML schemas and license them on a royalty-free basis.

There have been all sorts of inconclusive debates about the real meaning of terms such as "standard" and "open", and I don't want to go there. It's important, however, to appreciate that there is a very real distinction between a specification ratified by a committee or standards organization, and a "standard" that enables real-world interoperability. MS Word's binary format or PDF are usually considered "de facto standards" in the sense that one can reasonably expect a random correspondent to be able to read a document in one of those formats. Obviously we can do better, and almost all concerned believe that moving to some sort of openly documented XML format is a better way to achieve short term interoperability and long term usability of documents. In general, the more eyes that have helped debug a spec and the more organizations that have endorsed it, the more of a real standard it will be. But there are plenty of examples of standards organizations producing specifications that have not led to real-world interoperability of any significance. W3C XLink, OASIS WS-Reliability, and ISO HyTime are clear examples of this in the SGML/XML world.

The important thing to remember is that industries, not industry standards committees, are the ones who produce industry standards.  Knowing many of the people who helped produce OASIS ODF, I expect it to be a carefully crafted and good-quality specification, but it has to prove itself capable of solving real-world problems before it can legitimately be called an industry standard.  For example, its reliance on RELAX NG makes it pleasing to XML geekdom, but greatly complicates the processing task for most actual developers. Will  the mainstream be diverted by the need to support ODF documents, or will ODF remain in a backwater?  Good intentions don't often make for successful policies.

Bismarck supposedly said "People who enjoy eating sausage and obey the law should not watch either being made". That applies to industry standards, which we enjoy once they've been "cooked", but which get produced by a messy process at best. It's easy to sympathize with Massachusetts' desire to buy its document standard sausage only from nice clean kitchens that use wholesome cruelty-free ingredients, but I've spent too much time in the XML standards sausage factory to believe it until I taste it.

Tofu bratwurst for the Labor Day picnic, anyone?