This blog post covers the main presentation from our ODF workshop that took place in Redmond last week: Peter Amstein’s explanation of the guiding principles behind our support of ODF in Office 2007 SP2. I’ve added explanations of some of the details that were covered verbally in the workshop, but if anything’s not clear here, please let me know.
Why ODF 1.1?
We’re implementing ODF 1.1 in our initial release of ODF support. We chose this version because it is the most current approved ODF specification, and because it is the version of ODF that current release versions of most other applications such as OpenOffice also support. We will support ODF in Word, Excel and PowerPoint, using the file extensions .odt, .ods, and .odp. The exact release date for Office 2007 SP2 has not been announced yet, but we expect ODF support to be available sometime in the first half of 2009.
As we set out to build in support for ODF, we developed a set of principles to guide our implementation team. Those principles are:
Let’s take a look at each of these principles in more detail and with some examples.
Adhere to the ODF 1.1 Standard
Where the specification is clear and mapping between OOXML features and ODF features is straightforward, this is of course no problem. For example, OOXML’s italics property maps neatly to ODF’s italics property.
When we found the specification to be ambiguous, we decided to follow common practice as long as it adheres to the standard. We did not create extensions in the case of features supported by Office and OOXML that are not in ODF at all. For example, ODF doesn’t support the concept of multi-stop gradient fill for shapes, but Office supports this concept. So we chose not to write multi-stop gradient values when saving to ODF.
Extending the ODF spec might have been a pragmatic approach to addressing gaps in the spec in the short term. But we felt that it would not be good for the ODF ecosystem in the long term since other applications wouldn’t be able to read those extensions (unless those products also implemented the same extensions we do) – and we don’t see that approach as promoting interoperability or the best experience for ODF users. We also don’t want to be accused of “co-opting” ODF and “polluting” the cyberspace with many ODF files that don’t adhere to the standard. We think it is better to evolve ODF with the community in the OASIS Technical Committee and/or the appropriate SC34 Working Group.
On the flip side, Office does not have support for Gantt charts, but ODF does allow them. When we load an ODF file that contains a Gantt chart we leave the chart area blank rather than try to map it to some other type of chart. But we preserve the chart data so that the user can pick another chart type from the Excel UI if desired.
The principle here is that we want to do what an informed user would likely expect.
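The Gantt chart behavior described above can be sketched as follows. This is only an illustration, not Office's actual code: the `LoadedChart` type, the `load_chart` function, and the set of supported chart classes are all hypothetical. The `chart:gantt` and `chart:bar` names follow ODF's chart class naming.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Chart classes this hypothetical importer can render (an assumed set).
SUPPORTED_CHART_CLASSES = {"chart:bar", "chart:line", "chart:circle", "chart:scatter"}


@dataclass
class LoadedChart:
    chart_class: Optional[str]  # None means "leave the chart area blank"
    series: List[List[float]] = field(default_factory=list)


def load_chart(odf_chart_class: str, series: List[List[float]]) -> LoadedChart:
    """Always keep the data; keep the chart type only if it can be rendered."""
    if odf_chart_class in SUPPORTED_CHART_CLASSES:
        return LoadedChart(odf_chart_class, series)
    # Unsupported type (for example chart:gantt): do not guess a substitute,
    # but preserve the data so the user can pick another chart type later.
    return LoadedChart(None, series)


gantt = load_chart("chart:gantt", [[1.0, 3.0], [2.0, 5.0]])
bar = load_chart("chart:bar", [[4.0, 6.0]])
```

The point of the design is that nothing is silently reinterpreted: the unsupported chart type is dropped, while the data the user entered survives.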
Where ODF is a superset of OOXML, we can either ignore the ODF-only constructs, or map them to an OOXML construct where there is a logical way to do so.
When OOXML is a superset of ODF, we usually map the OOXML-only constructs to a default ODF value. For example, ODF does not support OOXML’s doubleWave border style, so when we save as ODF we map that style to the default border style.
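A minimal sketch of that fallback, assuming a simple lookup table. The table contents and the choice of `solid` as the default are illustrative assumptions, not the mapping Office actually ships; `map_border_style` is a hypothetical name.

```python
# OOXML border style -> ODF border style (illustrative subset, assumed).
OOXML_TO_ODF_BORDER = {
    "single": "solid",
    "double": "double",
    "dotted": "dotted",
    "dashed": "dashed",
}

# Assumed default used when ODF has no equivalent for the OOXML style.
DEFAULT_ODF_BORDER = "solid"


def map_border_style(ooxml_style: str) -> str:
    """Return the ODF border style, falling back to the default when unmapped."""
    return OOXML_TO_ODF_BORDER.get(ooxml_style, DEFAULT_ODF_BORDER)
```

With this table, `map_border_style("doubleWave")` falls through to the default, while `map_border_style("double")` keeps its direct equivalent.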
Preserve the user’s intent
In simple cases, it isn’t a problem for Word to preserve document structure and semantics when saving an ODF file. For example, a document heading can be saved with a heading style that has an associated outline level.
In more complex cases we preferred a neutral approach when saving to ODF rather than implying semantics that the user did not intend. For example, in Word one can color code the bullets in a bulleted list by applying a color attribute to the paragraph character for the list item. Word can persist that attribute when saving to OOXML, but ODF does not have the concept of paragraph characters with attributes.
If we were to apply the color attribute to the paragraph style that would cause the entire list item to take on the color, and this might imply more than the user meant. So we choose to drop the bullet color, rather than color the whole list item.
We want to preserve the user’s ability to edit the contents of their document even if they have used a feature that can’t be saved to ODF, so that what the user sees in the document and how the user interacts with the document will not be changed until the user saves and closes the file.
For example if you insert a table in a PowerPoint slide and save as ODF, you still have a table in your open presentation with all of the normal table editing behaviors – you can easily add a row or insert a column, for example. The table becomes a group of shapes only after the user closes and reopens the file. Or as another example, you can open an ODS spreadsheet with Excel and use the conditional formatting features to analyze trends in the data. But the conditional formatting will not be preserved when you save and close the file.
Preserve Visual Fidelity
Wherever possible we write the ODF in such a way as to preserve visual fidelity when the document is opened in another application.
Chart gap width (e.g., the space between bars in a bar chart) is a good example. If the gap width of a chart is not specified in the file, OpenOffice applies a different default than Microsoft Office and renders the gaps differently. So in this case, Office writes the chart gap width even when it is the default value, i.e., when we traditionally wouldn’t write it.
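As a rough sketch of the idea, the writer below always emits an explicit gap width instead of relying on the reader's default. The attribute name `chart:gap-width`, its placement on `style:chart-properties`, and the default value of 150 are assumptions made for illustration; check them against the ODF specification before relying on them.

```python
import xml.etree.ElementTree as ET

CHART_NS = "urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
ET.register_namespace("chart", CHART_NS)
ET.register_namespace("style", STYLE_NS)

# Assumed default of the writing application; a reader may use another value.
ASSUMED_WRITER_DEFAULT_GAP = 150


def chart_properties(gap_width=None):
    """Build chart properties that always carry an explicit gap width."""
    props = ET.Element(f"{{{STYLE_NS}}}chart-properties")
    effective = ASSUMED_WRITER_DEFAULT_GAP if gap_width is None else gap_width
    # Written even when it equals the default, so no reader has to guess.
    props.set(f"{{{CHART_NS}}}gap-width", str(effective))
    return props


xml_text = ET.tostring(chart_properties(), encoding="unicode")
```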
High Level Architectural View
Word, Excel and PowerPoint have a Model-View-Controller design. The in-memory representation of the document, or Model, is designed to facilitate document revision and display functions and includes concepts which are never saved to the file, such as the insertion point and the selection.
The persistence code converts this in-memory representation to and from some sort of disk-file-based representation. Office 2007 already had code to support a number of angle-brackety persistence formats including HTML and OOXML. When we built in support for ODF, we added it in that area of our code.
That’s a general overview of how we’ve approached ODF support in Office 2007 SP2. These topics were also the foundation of the roundtable discussions we had at the workshop; for a variety of perspectives on those discussions, see the blog posts by Dennis Hamilton, John Head, and Jesper Lund Stocholm.
This is very useful - sounds like a good approach in general.
I have a couple of questions.
In OOXML outline level is an attribute that can apply to multiple styles (ie "Heading 2" and "Heading 2 - numbered" could both have outline level 2) whereas in ODF there is one document outline with one style at each level. What are you planning in that area?
I'm also very interested in what you're planning to do with lists and particularly list styles. OOXML and ODF have different list models: when you load a Word doc into OpenOffice.org Writer, it creates styles even when they already existed in Word. This could be improved when saving to ODF from Word, but the models are quite different, so how do you propose to get reasonable interop?
Doug tells us about the motivation, architecture, and principles of the ODF implementation in the upcoming Office
@Peter - You might have better luck with your questions on the Interoperability Forums (http://forums.community.microsoft.com/en-US/tag/interoperability/forums/) in the Interoperability through Standards section. I'm not sure that the Microsoft Office team has their own place for this just yet.
Going through the various blog posts it seems there is very good coverage of the event from the combination of the bloggers' information.
I agree with John Head's perspective of "throwing the user a rope" when they get to the point of saving a "non-native" document with an application - in this case, Office 2007 SP2.
It may be a bit of a reach to play out all the scenarios in a preview, but certainly informing the users would be very useful in some cases.
However, you also have the scenario where you have an organization/user community that don't really care all that much, so drowning them in complex details about all the evils that may befall them if they continue this heinous act of saving a document just scares them into paralysis.
Perhaps you can have an app-level setting that is on by default (and could be group policy enforced for interop-zealot organizations) so that when a user first opens an ODF document a message pops up saying "I see you have opened an ODx file, do you want a vast amount of information on fidelity issues thrust at you when you save a file, plus choices on how you 'dither', plus a nice report you can store as a companion document (an interop scorecard if you will!), or do you not really care and would prefer not to be pestered when you save?"
Or something to that effect ;) Once the choice is made (unless enforced by GPO), the user gets "hawk" or "dove" interop feedback.
The other interesting case would be when Office is actually being used as the creation tool for ODx files. Would there be an opportunity to explain at design/creation time to a user just about to commit an interop sin such as the "doublewave" and say "careful with that border, Eugene".
Currently the only cue you have is the save choice, but you might want to consider using the template mechanism (New Blank ODT Document?).
Don't know if this fits in with the vision for the implementation specifics of the ODF support.
Would have been fun to be there, if only for the socializing.
Peter, these are great questions, and good examples of the complexities involved. On the outline styles, we don't extend the spec (per the principles above), so I think we probably map multi-style outline levels into a single level, but I need to confirm that. I'll follow up with the implementation team and get some more specific info on this topic and the lists question.
Gareth, the socializing was fun, and would have been even more so if you were there!
I've got a question about a specific feature mapping between the formats:
Many companies use a different paper tray for the first page of a document to print this on special paper with a company logo.
Can this page setup feature be correctly mapped to ODF and back to OOXML in all cases (i.e. including mailings, documents with multiple sections, etc.)? If not, what are the restrictions?
I'm having a little trouble with one aspect of your last diagram. Either I'm not understanding something or there is a small error in it (arrow problem).
The "Translator API" box is shown intercommunicating with the *entire* built-in file open & save box (blue). I thought it worked only through the RTF sub-block.
Can you enlighten me?
Microsoft recently announced its ambition to bring native support for the OpenDocument format directly
Doug, you are great at writing about such things.
We faced all of this while recently adding OpenDocument as a new format to our Aspose.Words and we essentially made all the same choices. But we have not written about it so well...
Where do I find and hire people who can do this for me? :)
Stefan, I'll have to look into that one in more detail, both our implementation and also what it says in the ODF spec. We use the paperSrc element in WordprocessingML to indicate paper source, but I don't know whether there's a corresponding element in ODF to be used for this purpose. If there is (anybody know for sure?) and the value spaces can be mapped in some way, then round-tripping as you describe should be possible.
Ian, that's a good catch on the diagram. You sure pay attention to the details. (That's a good thing!)
The diagram is incorrect for the current version of Office, because as you mentioned the translator API goes through RTF. But it's correct for SP2 -- beginning in SP2 there will be a new translator API that also supports Open XML as the intermediate file format (for all 3 applications Word, Excel and PowerPoint).
Roman, I worked closely with Peter Amstein on this material, so you're probably reacting to his words and not mine. :-) But it's good to hear that another implementer has made the same choices on these sorts of issues.
Stefan, it looks like the core team is still working on that feature (paper source mapping), so we don’t know yet if there are going to be any restrictions on how well it works when saving to ODF.
The way ODF does this is that each page has a page style, and each page style can have a paper-tray-name associated with it. So there could be a different paper tray/source for each page of a multipage document.
Because Word does not support more than 2 paper trays per document (one for the first page and one for all the others) if an ODF file has more than 2 paper trays specified we will have to use the “Be Predictable” principle to map those to the two we support.
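One way such a reduction could work is sketched below. The policy is purely hypothetical (keep the first page's tray, then use the most common tray among the remaining pages); the source does not say which rule Office will apply, and `collapse_trays` is an invented name.

```python
from collections import Counter
from typing import List, Optional, Tuple


def collapse_trays(page_trays: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Reduce per-page tray names to (first_page_tray, other_pages_tray)."""
    if not page_trays:
        return None, None
    first = page_trays[0]
    rest = page_trays[1:]
    if not rest:
        # Single-page document: both settings point at the same tray.
        return first, first
    # Hypothetical policy: the most frequent tray among the remaining pages
    # wins; on a tie, the one encountered first is chosen.
    other = Counter(rest).most_common(1)[0][0]
    return first, other
```

For example, `collapse_trays(["letterhead", "plain", "plain", "envelope"])` keeps `letterhead` for the first page and `plain` for the rest, and the `envelope` tray is lost.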
Doug, thank you for the insight into ODF. It seems like a bit of a challenge to make the paper source mapping predictable while also suitable for good automatic conversion in both directions.
What might add even more complexity is that Word actually supports applying different first-page and other-page trays in different sections of the document.
This is also used by Word in mailings: If you output a mailing letter with page 1 in first tray and other pages in second tray into a new document containing all the letters this is what Word will do:
Each letter will be in its own section with its own paper tray settings! If a single letter was already multi-sectioned, the scenario gets even more complex.
But I hope at least in the single-section letter scenario paper sources will be able to convert in both directions without need for manual adjustments by the user.
Please can I clarify something...
Suppose I create an ODF document using some other vendor's product. Without me realizing, this supports features which MS Office doesn't, including Gantt charts, and I decide to use them. Let's say I call my document "Product Plan (Pre-Release)".
When this is complete, my boss wants to email it to the whole company, but first she needs to edit it, to remove the word "(Pre-Release)" from the title, and she uses MS Office to make this tiny change.
Has she just trashed my Gantt charts?
Also what happens "the other way around", if I create my document using MS Office and (without realizing) MS-Only features, then she edits it using Open Office?
If this is a real problem, please can MS Office have an option to warn (or stop) users whenever they do something that might cause this type of problem.
Just a quick question, but is there someone liaising with the OASIS technical committee to have non-deprecated features added to the next version of ODF? Most of these features probably wouldn't be too hard to add to this specification, and then the ODF documents produced by Office in the future would be of higher quality. From memory, are there not a few people from Microsoft already on this committee? I'm sure most features, as long as they don't have a negative impact on the spec, would be accepted.
Doug Mahugh said: "Extending the ODF spec might have been a pragmatic approach to addressing gaps in the spec in the short term. But we felt that it would not be good for the ODF ecosystem in the long term since other applications wouldn’t be able to read those extensions (unless those products also implemented the same extensions we do) – and we don’t see that approach as promoting interoperability or the best experience for ODF users. We also don’t want to be accused of 'co-opting' ODF and 'polluting' the cyberspace with many ODF files that don’t adhere to the standard. We think it is better to evolve ODF with the community in the OASIS Technical Committee and/or the appropriate SC34 Working Group."
There is no logic to this statement. I'm not going to say that Microsoft isn't in a "damned if you do, damned if you don't" situation, because they clearly are. However, this decision is so obviously self-serving in so many ways that much of the flak they're going to get for this policy is entirely justified.
Proponents of ODF over OOXML have for quite some time challenged OOXML on the very premise that its features could be implemented on top of ODF using ODF-compliant extensions. Wouldn't an outcry regarding adding extensions in a manner consistent with the ODF specification actually serve to expose the hypocrisy of OOXML opponents? Ah, but if you did that, you'd be providing the very evidence your standards opponents need to kill OOXML, now wouldn't you?
At the same time, anyone who wants to use ODF as their primary document format is treated as a second-class citizen. In spite of the fact that valid and conforming ODF documents that support any Office 2007 feature can be created using extensions (that are fully documented and limited to a specific version of ODF), they're forced to discard Office-specific features and suffer constant software nagging regarding loss of data. Yet the only justification given is essentially that Microsoft fears bad PR. I appreciate Microsoft's fear of the Embrace And Extend(TM) label, but the "X" in XML does stand for extensible, and there is little room for criticism when your competitors have all agreed on a standard for ODF extensions.
It would seem that the only real reason for not using ODF extensions for Office 2007 features is to technically claim support for ODF while simultaneously discouraging its use. Microsoft is attempting to manipulate what formats its customers use under the guise of being too craven to do the right thing for its customers. I suspect that this will garner respect from neither its customers nor its critics.
"Where ODF is a superset of OOXML, we can either ignore the ODF-only constructs, or map them to an OOXML construct where there is a logical way to do so."
This really betrays the fact that the OOXML format was designed to reflect the internal data structures in MS Office 2007 rather than being developed as an interchange format. Otherwise, you wouldn't be using those terms interchangeably, and you'd take advantage of ODF constructs that more closely match the internal memory model than the constructs in OOXML.
"For example if you insert a table in a PowerPoint slide and save as ODF, you still have a table in your open presentation with all of the normal table editing behaviors – you can easily add a row or insert a column, for example. The table becomes a group of shapes only after the user closes and reopens the file. Or as another example, you can open an ODS spreadsheet with Excel and use the conditional formatting features to analyze trends in the data. But the conditional formatting will not be preserved when you save and close the file."
You can actually include a spreadsheet in a presentation "by reference" in ODF 1.1 like this:
<draw:object xlink:href="Link to OpenDocument spreadsheet table" />
Thus, an application can simply write the table elsewhere in the *.ODP archive and reference it. (There's been some discussion about a reference to the markup not being "native", but so long as the object can be identified as an OpenDocument spreadsheet table and parsed accordingly, I don't see what the problem is.) Ironically, ODF 1.2 should allow embedding of table markup directly, so this situation could have been avoided by targeting ODF 1.2 instead.
I do appreciate your methodology, though. It's a good reference for doing graceful degradation in a document editing context. However, this graceful degradation should have been coupled with a series of well-documented extensions that preserve information specific to Office 2007 while fully conforming to the ODF standard. Restricting your file output to the lowest common denominator of a version of ODF that will be officially out of date in a matter of months does not serve your customers. It only serves to encourage the use of your own proprietary file formats for a little while longer.
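To illustrate how a reader might pick up such a reference, here is a minimal sketch that lists the `xlink:href` targets of `draw:object` elements in a slide fragment. The fragment, the `./Object 1` path, and the `referenced_objects` helper are hypothetical; only the two namespace URIs are taken from the ODF and XLink specifications.

```python
import xml.etree.ElementTree as ET

DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical slide fragment; "./Object 1" is an assumed sub-document path
# pointing at a spreadsheet stored elsewhere in the same package.
fragment = f"""
<draw:frame xmlns:draw="{DRAW_NS}" xmlns:xlink="{XLINK_NS}">
  <draw:object xlink:href="./Object 1"/>
</draw:frame>
""".strip()


def referenced_objects(xml_text):
    """Return the href of every draw:object found in the fragment."""
    root = ET.fromstring(xml_text)
    return [
        obj.get(f"{{{XLINK_NS}}}href")
        for obj in root.iter(f"{{{DRAW_NS}}}object")
    ]


hrefs = referenced_objects(fragment)
```

A consuming application would then resolve each href inside the package and parse the target as a spreadsheet document.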
Bottom line: If you really want people to take you seriously you need to do the following...
1) Implement ODF 1.2 using the principles you outlined. (ODF 1.2 may still be in draft form, but there's at least one existing implementation, and it will likely be published in its final form before your ODF 1.1 support is even released.)
2) Implement all features not directly supported by ODF 1.2 as valid and conforming extensions for use with ODF 1.2 only. Document the heck out of these features so that other vendors can support these features if they wish to.
3) Work with OASIS to eliminate the need for most of these extensions in ODF 1.3.
4) OPTIONAL: Discontinue work on OOXML. (Once you do the three things above, it would largely be a token gesture, though. Once the customer has a fully supported format with true industry-wide support, no one will really care. It'll be like VML.)