Rob Weir posted on his blog a couple of days ago an Update on ODF Spreadsheet Interoperability. I think it’s great that he has brought up spreadsheet interoperability, and specifically the issue of formulas, which seems to be the main thrust of his post. I mentioned on the day of our SP2 release last week that “I’ll be doing some blog posts that get down into more of the technical details, to help explain some of the engineering decisions that we made in our implementation,” and Rob’s post is a good starting point for that conversation.
For those who only want to read the first two paragraphs of this very long post, here’s a summary:
Before I get into the details, I think it’s worth noting that there’s nothing new here. The challenges of ODF’s lack of formula specification have been around for a long time, and many people saw the current situation coming.
Nearly three years ago, Stephen McGibbon had a good post covering the situation, which is worthwhile reading for some background on how we got to where we are today. As you can see from the quotes in Stephen’s blog post, many people in the ODF community – and the broader standards community – were dismayed at the decision to not include formula syntax in ODF. Others outside of Microsoft also pointed out the problem years ago. Rob’s blog post, as well as this post, are excellent examples of the sorts of interoperability challenges that those people saw coming as a result of the decision to not include formula syntax in the ODF standard.
The first thing I did upon reading Rob’s post was to try to follow his steps for myself, so that I could understand the context of his findings. Here’s the methodology he describes in his post:
The test scenario I used was a simple wedding planner for a fictional user, Maya, who is getting married on August 15th. She wants to track how many days are left until her wedding, as well as track a simple ledger of wedding-related expenses. Nothing complicated here. I created this spreadsheet from scratch in each of the editors, by performing the following steps: Enter the title in A1 "May's Wedding Planner" and increased font size to 14 point. Enter formula = TODAY() in B3 and set US style MM/DD/YY date format/ Enter the date of the wedding as a constant in cell B4, also setting date format. Added simple calculations on cells B6-B8, to calculate days, weeks and months until the wedding. A11 through E16 is a simple ledger of the kind that is done thousands of times a day by spreadsheet users everywhere. Once you have the formula set up in column E (Balance = previous balance + credits - debits) then you can simply copy down the formula to the new row for each new entry
The test scenario I used was a simple wedding planner for a fictional user, Maya, who is getting married on August 15th. She wants to track how many days are left until her wedding, as well as track a simple ledger of wedding-related expenses. Nothing complicated here. I created this spreadsheet from scratch in each of the editors, by performing the following steps:
Sounds simple enough. So I fired up Open Office 3.0.1, and followed those steps. The resulting spreadsheet looked like this:
Then I saved the document, by clicking File/Save and then typing in a filename:
So far so good. Next step, I tried opening the document in IBM Lotus Symphony version 1.2.0. Here’s what I saw:
And then I opened the same document in Excel 2007 SP2, and here’s what I saw:
This is a great example of a common ODF spreadsheet interoperability challenge, and two different ways of dealing with it. The challenge is caused by the fact that Open Office writes formulas in a syntax that is unknown to Symphony and Excel. Open Office, for reasons I don’t understand, has decided to use as their default formula syntax the unfinished Open Formula specification, which is neither approved nor published by OASIS – not even out for public review yet.
So what do Symphony and Excel do about this challenge? The answer is that Symphony preserves the (unrecognized) formula markup, and Excel preserves the cached values. (A quick aside for those who don’t know: spreadsheets typically store both the formula and the value resulting from the most recent recalculation.)
Getting back to Rob’s initial premise of this being a typical wedding-planning exercise, if Maya were to send this spreadsheet to a person using Microsoft Excel SP2, that person would see the values as shown above. They’d know at a glance what day was ‘today’ when Maya made the file, and that the ledger balance that day was 5500.
But if Maya were to send this spreadsheet to a person running IBM Lotus Symphony, they’d see only the formulas. Perhaps an ODF markup expert like Rob would be able to massage that spreadsheet into something usable, but most people would find it a bit hard to conceive of what it means for a wedding to be “of:=B4-B3” days away, or for there to be a ledger balance of “of:=E15+C16-D16” dollars.
So what does Rob’s test matrix show for these two scenarios? Oddly, it labels Open Office to IBM Lotus Symphony interoperability for this scenario as “OK” and it labels Open Office to Microsoft Office SP2 interoperability as “Fail” (with a red background for added emphasis). Now, I know Rob works for IBM and probably wants to portray Symphony in the best possible light, but is that a reasonable assessment of the interoperability we’ve just seen above?
After further investigation, though, I think I see what Rob may have actually done to get the result in his table. He seems to have included some steps that aren’t documented in his blog post:
Here’s how it looks, for those who don’t have Open Office handy:
After following these steps, Rob was then able to create a spreadsheet that stores formulas in the undocumented non-standardized syntax that was used by old versions of Open Office. Symphony, being simply a fork of an older version of the Open Office code base, is able to understand those formulas, so it can load both the values and the formulas themselves.
It’s worth noting what the OpenOffice.org developers have to say about this option that Rob has apparently used for his interoperability testing:
So it sounds like there isn’t any single version of ODF that will provide compatibility across all versions of Open Office and Symphony. You can use the “may cause information to be lost” option if you want to do a demonstration of formula interoperability with ODF, but if you want to demonstrate text-bullet interoperability, you may need to use another option.
And what does Excel 2007 SP2 do with the document saved in this alternative format? Exactly the same thing it does with Open Office’s current default format: it displays the data, so that the document user can see the results of the last recalculation of the spreadsheet, and it ignores the formulas that are written in a non-standardized syntax that Excel doesn’t support. I think that’s a pretty reasonable approach, when a spreadsheet application comes across non-standardized formula syntax: show the last recalculated result, thus preserving the data, and don’t try to guess at the semantics of undocumented formula markup.
Why Doesn’t Office 2007 SP2 support Open Office formula syntax?
That’s a logical question to ask, in regard to how SP2 handles formulas. To answer it accurately and completely, we should distinguish between the two formula syntaxes that Open Office uses.
The first is the syntax they use in their non-recommended “1.0/1.1 (OpenOffice.org 2.x)” setting. This is an undocumented, deprecated syntax, and therefore not a reliable mechanism for formula interoperability. Despite what you may read on some blogs, it is not the same syntax as used by Excel 97/2000/2003. Open Office copied quite a bit of the feature set from Excel, and there are definitely similarities in the formula syntax, but there are also differences with regard to referencing, operators, data types, and function arguments.
The other formula syntax that Open Office supports is the Open Formula syntax, which will eventually appear in ODF 1.2. This syntax has not yet been approved by a standards body, nor has it undergone the 60-day public review period that OASIS requires prior to approval and publication. It may go to public review soon, but it won’t be a standard until later this year at the soonest, and the details may change as a result of the remaining TC work and the public review process. (According to recent discussions in the ODF TC, we may send the other parts of ODF 1.2 out for public review first, to allow more time to finish up Open Formula.)
In Office’s implementation, we haven’t chosen to support the draft Open Formula spec (as Open Office currently does), because we have certain obligations when we ship software that don’t apply to open-source projects like Open Office. We need to test and verify behavior to a degree that’s not possible without final, fixed documentation that is believed to be 100% complete and accurate. When Open Formula is completed, standardized, and published, we'll be looking at that as the future path for enabling formula interoperability in ODF spreadsheets. But we’re not there yet; ODF 1.2 is not done, and not even ready for public review.
It’s interesting to note that we have discussed this very issue at a DII workshop. Last July, we had a workshop in Redmond, with attendees including other ODF implementers, members of the ODF TC, standards professionals, and others. In the roundtable discussions, I brought up our approach to formula support as outlined above, and asked for feedback. Although it was not 100% unanimous, there was clear consensus among most of the participants in the discussion that they did not want us to implement a non-standard formula syntax in anticipation of it becoming a standard. “Putting the cart before the horse” in that manner was seen by many as a possible source of future interoperability problems, rather than a solution to them, and we took that feedback into consideration.
As I’ve covered before, we feel that thorough documentation of implementation details is a cornerstone of document format interoperability. We’ve published detailed implementer notes for our ODF implementation, and on the matter of formulas (which are stored in the table:formula element), here’s what our implementer notes have to say:
The standard defines the attribute table:formula, contained within the element <table:table-cell>, contained within the parent element <office:spreadsheet \ table:table-row> This attribute is supported in core Excel 2007. 1. When saving the table:formula attribute, Excel 2007 precedes its formula syntax with the "msoxl" namespace. 2. When loading the attribute table:formula, Excel 2007 first looks at the namespace. If the namespace is “msoxl”, Excel 2007 will load the value of table:formula as a formula in Excel. 3. When loading the table:formula attribute, if the namespace is missing or unknown, the table:formula attribute is not loaded, and the value “office:value” is used instead. If the result of the formula is an error, Excel 2007 loads the <text:p> element and maps the element to an Error data type. Error data types that Excel 2007 does not support are mapped to #VALUE!
The standard defines the attribute table:formula, contained within the element <table:table-cell>, contained within the parent element <office:spreadsheet \ table:table-row>
This attribute is supported in core Excel 2007.
1. When saving the table:formula attribute, Excel 2007 precedes its formula syntax with the "msoxl" namespace.
2. When loading the attribute table:formula, Excel 2007 first looks at the namespace. If the namespace is “msoxl”, Excel 2007 will load the value of table:formula as a formula in Excel.
3. When loading the table:formula attribute, if the namespace is missing or unknown, the table:formula attribute is not loaded, and the value “office:value” is used instead. If the result of the formula is an error, Excel 2007 loads the <text:p> element and maps the element to an Error data type. Error data types that Excel 2007 does not support are mapped to #VALUE!
And, as both Rob’s tests and mine show, that is exactly what Excel does. It would be great if there were a place implementers could go to see these sorts of details for all major ODF implementations.
I’m glad to see this sort of public scrutiny of the details of ODF interoperability and how the underlying challenges are handled by various implementations. As you can see, spreadsheet interoperability is a complicated topic, and in the specific case of ODF spreadsheets, there is even more complexity created by the lack of a defined formula syntax in any published version of ODF.
The good news, when it comes to formulas, is that the Open Formula specification will address this area soon. My colleague Eric Patterson represents the Excel team in the Open Formula SC, and the very capable David Wheeler leads that group. Much good work has been done already, and we look forward to seeing the final Open Formula spec go out for public review and then approval by OASIS. The nearly 400 pages of formula syntax documentation in ISO/IEC 29500 (Part 1, section 18.17) enables reliable formula interoperability in the Open XML community, and soon the ODF community will have a similar level of formula interoperability.
But formulas are not the only ODF interoperability challenge. As members of the ODF TC and also the OIC (ODF Interoperability and Conformance) TC, both Rob and I – and many others – will need to work together to enable better interoperability in areas including tracked changes, mail merge, application settings, and others. Will ODF 1.2 be the most interoperable version of ODF yet? I hope so, and there are signs that it will be. But our work is not nearly done.
So as I understand it, when Microsoft add extensions to an existing standard, as OOO does with formulas on top of ODF, it is called "Embrace, extend and extinguish" and is the vilest evil to ever have been visted on the earth, but when it's not Microsoft, they are merely "conventions" which should be supported by all other vendors in an ecosystem.
See "The Strategy" section in: http://en.wikipedia.org/wiki/Embrace,_extend_and_extinguish
Squirm in the irony of this when applied to OpenOffice + ODF.
I know this is expressed in a rather emotive way, but purely to illustrate the point.
One could say that one man's evil extensions are another man's conventions.
Of course I read your words carefully, and my response reflected that. I said standards + cruft, not cruft without any standards.
As I said before, maybe this is a harsh implementation, but this is the post-EU-slapdown Microsoft, they have to be wary. They don't want to be accused of being complicit in EEE by proxy for OpenOffice ;-)
And again, you can still use the Sun plug-in. They are the originators of said extension!
Another thing - any response to my first question?
"So I suppose the only difference is that the defaults are different in 1.3 ;-) If that's really the only difference then I think we are all allowed a bit of a laugh at your outrage. I hope there is a bit more to it than that. A quick overview of the 1.2-1.3 changes that explain the new behavior would be good."
It's funny to see your point: numbers only = data preserved (I bet that several economists will disagree with you, but ok).
No additional comments :)
dmahugh: "Well, our "divergent approach" as you call it is based on the one and only approved and published standard for formula markup that exists, as of today."
Irrelevant for the following reasons:
1) The OOXML formula language was not approved and published as a separate standard. Rather it is part of a larger specification. Thus, it was not approved as a general purpose formula language. It was approved as part of a larger specification.
2) The OOXML formulas do not match the syntax of the formula examples given in ODF 1.1 subsection 8.1.3. For instance, they exclude the brackets around the cell references. Thus the OOXML formula language is not in keeping with the spirit of the ODF 1.1 specification.
3) The OOXML formulas violate ODF 1.1 subsection 8.3.1 by failing to include a dot before the cell reference.
4) OOXML formulas are not a standard, defacto or otherwise, for other ODF 1.1 implementations.
From the main article: "Open Office, for reasons I don’t understand, has decided to use as their default formula syntax the unfinished Open Formula specification, which is neither approved nor published by OASIS – not even out for public review yet."
I beg to differ:
Latest draft is less than a week old. That's not public enough for you?
Gareth, this is not the time or place to announce the exciting new features in Lotus Symphony 1.3 You'll want to keep posted to http://symphony.lotus.com for that information.
@Dough, "Well, our "divergent approach" as you call it is based on the one and only approved and published standard for formula markup that exists, as of today."
Well there is at least a big issue with this approach: the "approved and published standard for formula markup that exists" is NOT conformant with ODF 1.1 (there is no "." in front of the cell references). So the OOXML formula syntax is NOT a possible choice if you want to respect the ODF standard.
You can argue as long as you want, it is clear to everybody (even to you if you want to be honest) that this is again a trick used by Microsoft to break interoperability.
But Microsoft making promises and not following them is no news: "the Commission notes that today's announcement follows at least four similar statements by Microsoft in the past on the importance of interoperability" http://europa.eu/rapid/pressReleasesAction.do?reference=MEMO/08/106
@Gareth, "this is the post-EU-slapdown Microsoft, they have to be wary."
Well, Microsoft should be more wary then: "In its ongoing antitrust investigation concerning interoperability with Microsoft Office, the Commission will investigate whether the announced support of ODF (OpenDocument format) in Office leads to better interoperability and allows consumers to process and exchange their documents with the software product of their choice." http://europa.eu/rapid/pressReleasesAction.do?reference=MEMO/08/324
This is clearly a very bad start...
@Doug, let's have a closer look at your "Guiding Principles":
* Adhere to the ODF 1.1 Standard: Failed. The formula syntax used in SP2 does not adhere to ODF 1.1
* Be Predictable: OK. You summarise this as "The principle here is that we want to do what an informed user would likely expect. " Indeed, a user informed of Microsoft usage would likely expect that you find a trick to break interoperability.
* Preserve User Intent: Failed. The obvious intent of the user is to use formula, not to silently have them erased.
* Preserve Editability: Failed. As the formula are gone, you can no longer edit the file successfully.
* Preserve Visual Fidelity: OK.
I would say this is not a bright score...
@Luc so you feel that the EU should deem extensions not included in a standard as a positive interoperability initiative that is to be (or forced to be?) supported in this case, but not say in Java or any other examples of EEE.
One can't bang on about interoperability with extensions now, when it has been decided that standards are the cure. Is interoperability the most important issue, or is it standards? Maybe a few vendors could get together in a smoky room and thrash out interoperability behind the scenes instead, would that be better?
I don't think the EU would base a legal challenge on this type of non-equitable basis.
Microsoft have been supportive of implementing any future, published standard in this area, so the point is moot.
The press release contains 2 key concepts: ODF (no formulas, so irrelevant w.r.t the standard) & better interoperability.
There is now native support, rather than just the old MS-sponsored plug-in. Whether it is better or not, I don't personally know, but there has certainly been a considerable effort in terms of the code and implementation notes.
I imagine that over a wide range of scenarios (excepting the ones targeted to raise the temperature) the interoperability is better.
I guess only real customers with no agenda will be the only ones to be able to verify that.
@Rob I am sure they will be beavering away at some features inspired by this very thread as we speak ;-)
@Gareth: the role of the EC here is not to validate conformance to whatever standards. It is to check if there is an abuse of a monopoly position that harms customers. The EC press release don't talk about standards. It talks about "better interoperability" and allowing consumers "to process and exchange their documents with the software product of their choice". And SP2 offers worse interoperability than the previous CleverAge plug-in, not better. Conclude yourself.
It is easy to see, even for the EC, that the recent claims of Microsoft to be the White Knight defending the sacred Standards against the Evil OpenOffice and its Dark Lord Rob Weir is purely an hypocritical position, resulting is less interoperability rather than more interoperability.
Doug, we all understand you are deeply troubled by the failure of your employer to meet the expectations on a product update that is an exact match with your job title "Office Interoperability".
However, I don't think it is a great idea to try and save face by telling lies and trying to pretend that nothing bad happened. Nothing you say can cover up the very obvious facts:
1) Excel 2007 does not produce conformant ODF 1.1 documents. (Small but important issues with missing brackets and the leading dot.) That is not interoperability.
2) Excel 2007 won't open any ODF 1.1 spreadsheet created by any other ODF-supporting software without silently crippling it by deleting all formulas. That is not interoperability.
3) No other ODF 1.1 spreadsheet application will open an (almost-conformant) ODF 1.1 spreadsheet saved from Excel 2007. That is not interoperability.
The actual product release certainly does not adhere to your own guidelines which you are quoting:
"Adhere to the ODF 1.1 Standard", "Be Predictable", "Preserve User Intent", "Preserve Editability"
and "Preserve Visual Fidelity". You need to admit that SP2 failed on at least three of five accounts here. (Deleting all formulas is bad but formally predictable behavior, and visual fidelity is preserved even by just importing a static table.)
Excel 2007 is not interoperable through the ODF format, and does not yet deliver on your promises. Admit that, and maybe we can get on with fixing it instead of squabbling over why who did what. If you stay on your current path of denial, you will send the message that your job is not in fact "Office Interoperability", but something rather less honorable.
The cell-reference issue that some of you have brought up is part of the broader issue of formula syntax in ODF. We believe we are completely conformant with the spec on this matter, and that we have implemented formulas in a way that is consistent with the intent and spirit of the ODF specification. I'll post a blog post on this topic next, since I think it deserves more thorough and specific explanation than a comment box can contain. (As Fermat put it, "Cuius rei demonstrationem mirabilem sane detexi. Hanc marginis exiguitas non caperet.")
I must admit, it's pretty fascinating to me to see the level of interest people have in discussing something (formulas) that doesn't even appear in the ODF specification at all. Imagine how much fun we'll have when we start discussing things that are actually there!
"Imagine how much fun we'll have when we start discussing things that are actually there!"
this is the problem here, you are arguing with a guy who has "fun" with standards. Standards are a serious thing, not a game nor a Public Relations statement.
But, but, but ... you guys are making my head asplode!
If standards are so serious, why are all of these people arguing that we should support things like OO.o's historical formula syntax (never was a standard, never will be) or Open Formula (not yet a standard, probably will be eventually)?
Which is it? Do you think standards important, or not?
In either case, here at Microsoft, we feel standards are important. And we have fun, too.
Doug, if you are not convinced by Rob, please have a look at what the implementers of the add-in translator sponsored by Microsoft are saying:
"In its first versions, the OpenXML/ODF translator simply did not translate formulas at all. Only the static values were preserved. Unfortunately, users are not really interested in what is part of a standard and what not. They actually want to get a good translation which means the preservation of all visual content and even of the functional content such as formulas. Consequently, we (the OpenXML/ODF Translator team of Sonata, DIaLOGIKa and Microsoft) decided to implement a decent formula translation approach in version 2.5 after having received a number of complains."
So, Microsoft already supports OO.o's historical formula syntax, despite the fact that it never was a standard, and never will be.
Can you explain this:
- Microsoft first made an attempt at not supporting the OOo formula syntax, received complaints and "decided to implement a decent formula translation approach" in the translator;
- at the same time, Microsoft decides that it is against the Microsoft serious approach to standards to support the same decent approach for SP2.
This hypocrisy is so blatant that I would like to thank you for showing to the world and to the European Commission that there is no "new Microsoft", and that you remain the same monopoly abuser playing dirty tricks to avoid interoperability at all cost.
I do seem to see a higher degree of conformance required by microsoft by others. If its not in the spec, it is not advisable to implement it.
Let me take a case... why don't all browser manufacturers mimic internet explorer 6 to give practical interoperability.. This is because it is not the standard.
Now u want microsoft to provide practical interoperability with ODF even though its not in the spec.???.. Seems pretty one sided to me..
@gk: "Now u want microsoft to provide practical interoperability with ODF"
Microsoft itself committed to this practical, real world interoperability in its press release on 21-May-08: "Microsoft recognizes that customers care most about real-world interoperability in the marketplace, so the company is committed to continuing to engage the IT community to achieve that goal when it comes to document format standards." http://www.microsoft.com/Presspass/press/2008/may08/05-21ExpandedFormatsPR.mspx
Many customers and ODF supporters *asked* for support of ODF in MS Office. Nobody *forced* Microsoft to support ODF. But if MS decides to provide support, it seems logical to require a working support, not a broken one aimed at fragmenting the corpus of ODF documents.
What we see here is the same old tricks already used to fragment Java and HTML, for which MS has been punished respectively by an US court and the European Commission.
I have no doubt that once again Microsoft will be punished by the EC for their current monopoly abuser behaviour.