Rob Weir posted on his blog a couple of days ago an Update on ODF Spreadsheet Interoperability. I think it’s great that he has brought up spreadsheet interoperability, and specifically the issue of formulas, which seems to be the main thrust of his post. I mentioned on the day of our SP2 release last week that “I’ll be doing some blog posts that get down into more of the technical details, to help explain some of the engineering decisions that we made in our implementation,” and Rob’s post is a good starting point for that conversation.
For those who only want to read the first two paragraphs of this very long post, here’s a summary:
Before I get into the details, I think it’s worth noting that there’s nothing new here. The challenges of ODF’s lack of formula specification have been around for a long time, and many people saw the current situation coming.
Nearly three years ago, Stephen McGibbon had a good post covering the situation, which is worthwhile reading for some background on how we got to where we are today. As you can see from the quotes in Stephen’s blog post, many people in the ODF community – and the broader standards community – were dismayed at the decision to not include formula syntax in ODF. Others outside of Microsoft also pointed out the problem years ago. Rob’s blog post, as well as this post, are excellent examples of the sorts of interoperability challenges that those people saw coming as a result of the decision to not include formula syntax in the ODF standard.
The first thing I did upon reading Rob’s post was to try to follow his steps for myself, so that I could understand the context of his findings. Here’s the methodology he describes in his post:
The test scenario I used was a simple wedding planner for a fictional user, Maya, who is getting married on August 15th. She wants to track how many days are left until her wedding, as well as track a simple ledger of wedding-related expenses. Nothing complicated here. I created this spreadsheet from scratch in each of the editors, by performing the following steps: Enter the title in A1 "May's Wedding Planner" and increased font size to 14 point. Enter formula = TODAY() in B3 and set US style MM/DD/YY date format/ Enter the date of the wedding as a constant in cell B4, also setting date format. Added simple calculations on cells B6-B8, to calculate days, weeks and months until the wedding. A11 through E16 is a simple ledger of the kind that is done thousands of times a day by spreadsheet users everywhere. Once you have the formula set up in column E (Balance = previous balance + credits - debits) then you can simply copy down the formula to the new row for each new entry
The test scenario I used was a simple wedding planner for a fictional user, Maya, who is getting married on August 15th. She wants to track how many days are left until her wedding, as well as track a simple ledger of wedding-related expenses. Nothing complicated here. I created this spreadsheet from scratch in each of the editors, by performing the following steps:
Sounds simple enough. So I fired up Open Office 3.0.1, and followed those steps. The resulting spreadsheet looked like this:
Then I saved the document, by clicking File/Save and then typing in a filename:
So far so good. Next step, I tried opening the document in IBM Lotus Symphony version 1.2.0. Here’s what I saw:
And then I opened the same document in Excel 2007 SP2, and here’s what I saw:
This is a great example of a common ODF spreadsheet interoperability challenge, and two different ways of dealing with it. The challenge is caused by the fact that Open Office writes formulas in a syntax that is unknown to Symphony and Excel. Open Office, for reasons I don’t understand, has decided to use as their default formula syntax the unfinished Open Formula specification, which is neither approved nor published by OASIS – not even out for public review yet.
So what do Symphony and Excel do about this challenge? The answer is that Symphony preserves the (unrecognized) formula markup, and Excel preserves the cached values. (A quick aside for those who don’t know: spreadsheets typically store both the formula and the value resulting from the most recent recalculation.)
Getting back to Rob’s initial premise of this being a typical wedding-planning exercise, if Maya were to send this spreadsheet to a person using Microsoft Excel SP2, that person would see the values as shown above. They’d know at a glance what day was ‘today’ when Maya made the file, and that the ledger balance that day was 5500.
But if Maya were to send this spreadsheet to a person running IBM Lotus Symphony, they’d see only the formulas. Perhaps an ODF markup expert like Rob would be able to massage that spreadsheet into something usable, but most people would find it a bit hard to conceive of what it means for a wedding to be “of:=B4-B3” days away, or for there to be a ledger balance of “of:=E15+C16-D16” dollars.
So what does Rob’s test matrix show for these two scenarios? Oddly, it labels Open Office to IBM Lotus Symphony interoperability for this scenario as “OK” and it labels Open Office to Microsoft Office SP2 interoperability as “Fail” (with a red background for added emphasis). Now, I know Rob works for IBM and probably wants to portray Symphony in the best possible light, but is that a reasonable assessment of the interoperability we’ve just seen above?
After further investigation, though, I think I see what Rob may have actually done to get the result in his table. He seems to have included some steps that aren’t documented in his blog post:
Here’s how it looks, for those who don’t have Open Office handy:
After following these steps, Rob was then able to create a spreadsheet that stores formulas in the undocumented non-standardized syntax that was used by old versions of Open Office. Symphony, being simply a fork of an older version of the Open Office code base, is able to understand those formulas, so it can load both the values and the formulas themselves.
It’s worth noting what the OpenOffice.org developers have to say about this option that Rob has apparently used for his interoperability testing:
So it sounds like there isn’t any single version of ODF that will provide compatibility across all versions of Open Office and Symphony. You can use the “may cause information to be lost” option if you want to do a demonstration of formula interoperability with ODF, but if you want to demonstrate text-bullet interoperability, you may need to use another option.
And what does Excel 2007 SP2 do with the document saved in this alternative format? Exactly the same thing it does with Open Office’s current default format: it displays the data, so that the document user can see the results of the last recalculation of the spreadsheet, and it ignores the formulas that are written in a non-standardized syntax that Excel doesn’t support. I think that’s a pretty reasonable approach, when a spreadsheet application comes across non-standardized formula syntax: show the last recalculated result, thus preserving the data, and don’t try to guess at the semantics of undocumented formula markup.
Why Doesn’t Office 2007 SP2 support Open Office formula syntax?
That’s a logical question to ask, in regard to how SP2 handles formulas. To answer it accurately and completely, we should distinguish between the two formula syntaxes that Open Office uses.
The first is the syntax they use in their non-recommended “1.0/1.1 (OpenOffice.org 2.x)” setting. This is an undocumented, deprecated syntax, and therefore not a reliable mechanism for formula interoperability. Despite what you may read on some blogs, it is not the same syntax as used by Excel 97/2000/2003. Open Office copied quite a bit of the feature set from Excel, and there are definitely similarities in the formula syntax, but there are also differences with regard to referencing, operators, data types, and function arguments.
The other formula syntax that Open Office supports is the Open Formula syntax, which will eventually appear in ODF 1.2. This syntax has not yet been approved by a standards body, nor has it undergone the 60-day public review period that OASIS requires prior to approval and publication. It may go to public review soon, but it won’t be a standard until later this year at the soonest, and the details may change as a result of the remaining TC work and the public review process. (According to recent discussions in the ODF TC, we may send the other parts of ODF 1.2 out for public review first, to allow more time to finish up Open Formula.)
In Office’s implementation, we haven’t chosen to support the draft Open Formula spec (as Open Office currently does), because we have certain obligations when we ship software that don’t apply to open-source projects like Open Office. We need to test and verify behavior to a degree that’s not possible without final, fixed documentation that is believed to be 100% complete and accurate. When Open Formula is completed, standardized, and published, we'll be looking at that as the future path for enabling formula interoperability in ODF spreadsheets. But we’re not there yet; ODF 1.2 is not done, and not even ready for public review.
It’s interesting to note that we have discussed this very issue at a DII workshop. Last July, we had a workshop in Redmond, with attendees including other ODF implementers, members of the ODF TC, standards professionals, and others. In the roundtable discussions, I brought up our approach to formula support as outlined above, and asked for feedback. Although it was not 100% unanimous, there was clear consensus among most of the participants in the discussion that they did not want us to implement a non-standard formula syntax in anticipation of it becoming a standard. “Putting the cart before the horse” in that manner was seen by many as a possible source of future interoperability problems, rather than a solution to them, and we took that feedback into consideration.
As I’ve covered before, we feel that thorough documentation of implementation details is a cornerstone of document format interoperability. We’ve published detailed implementer notes for our ODF implementation, and on the matter of formulas (which are stored in the table:formula element), here’s what our implementer notes have to say:
The standard defines the attribute table:formula, contained within the element <table:table-cell>, contained within the parent element <office:spreadsheet \ table:table-row> This attribute is supported in core Excel 2007. 1. When saving the table:formula attribute, Excel 2007 precedes its formula syntax with the "msoxl" namespace. 2. When loading the attribute table:formula, Excel 2007 first looks at the namespace. If the namespace is “msoxl”, Excel 2007 will load the value of table:formula as a formula in Excel. 3. When loading the table:formula attribute, if the namespace is missing or unknown, the table:formula attribute is not loaded, and the value “office:value” is used instead. If the result of the formula is an error, Excel 2007 loads the <text:p> element and maps the element to an Error data type. Error data types that Excel 2007 does not support are mapped to #VALUE!
The standard defines the attribute table:formula, contained within the element <table:table-cell>, contained within the parent element <office:spreadsheet \ table:table-row>
This attribute is supported in core Excel 2007.
1. When saving the table:formula attribute, Excel 2007 precedes its formula syntax with the "msoxl" namespace.
2. When loading the attribute table:formula, Excel 2007 first looks at the namespace. If the namespace is “msoxl”, Excel 2007 will load the value of table:formula as a formula in Excel.
3. When loading the table:formula attribute, if the namespace is missing or unknown, the table:formula attribute is not loaded, and the value “office:value” is used instead. If the result of the formula is an error, Excel 2007 loads the <text:p> element and maps the element to an Error data type. Error data types that Excel 2007 does not support are mapped to #VALUE!
And, as both Rob’s tests and mine show, that is exactly what Excel does. It would be great if there were a place implementers could go to see these sorts of details for all major ODF implementations.
I’m glad to see this sort of public scrutiny of the details of ODF interoperability and how the underlying challenges are handled by various implementations. As you can see, spreadsheet interoperability is a complicated topic, and in the specific case of ODF spreadsheets, there is even more complexity created by the lack of a defined formula syntax in any published version of ODF.
The good news, when it comes to formulas, is that the Open Formula specification will address this area soon. My colleague Eric Patterson represents the Excel team in the Open Formula SC, and the very capable David Wheeler leads that group. Much good work has been done already, and we look forward to seeing the final Open Formula spec go out for public review and then approval by OASIS. The nearly 400 pages of formula syntax documentation in ISO/IEC 29500 (Part 1, section 18.17) enables reliable formula interoperability in the Open XML community, and soon the ODF community will have a similar level of formula interoperability.
But formulas are not the only ODF interoperability challenge. As members of the ODF TC and also the OIC (ODF Interoperability and Conformance) TC, both Rob and I – and many others – will need to work together to enable better interoperability in areas including tracked changes, mail merge, application settings, and others. Will ODF 1.2 be the most interoperable version of ODF yet? I hope so, and there are signs that it will be. But our work is not nearly done.
I think there are actually two issues.
One is what is the most appropriate format for Office SP2 to save, in the interim new world of strongly-labelled formulas before Open Formula is released. The other is what formats Office SP2 should load successfully, in the interim new world of strongly-labelled formulas before Open Formula is released.
The ODF spec clearly says that the namespace prefix can select the syntax. I don't have much problem with Microsoft exporting in Excel with a proper label, as the implementation notes say; it is up to other importing applications to add filters to get interoperability. The lag is just a ramification of the ODF 1.1 spec.
But what is good for the goose has to be good for the gander. For load, the current approach in SP2 and the implementation notes is clearly deficient for interoperability. Instead of failing if the syntax is not Excel, trying it in the dot notation would be genuine interoperability.
This is nothing more than Postel's law: being conservative in what you send and generous in what you receive, which has proved itself a pretty good mindset for implementers.
You're asking me about OS/2?!? I was still at Harvard when OS/2 was being marketed. You'll need to ask someone much older than me about that.
To Doug's point on "typically", the question is when does the scope of that word end? I suggest that it ends when the verbal phrase ends. This is standard English usage. Anyone is welcome to suggest an alternative reading, but you will need to state where the effect "typically" ends, and on what principle you make that reading. Simply throwing your hands up and saying ,"Oh my, this doesn't follow ISO drafting guidelines for the preparation of standards, therefor it means anything I want, or maybe nothing at all, based on my whim" -- that doesn't cut it. You need to state your principles and interpret the text that you have, not the text that you wish you had.
Thank you Doug.
I don't expect Rob to answer my question.
Nonetheless, another question arises, has IBM extended the above incentives to publications such as Groklaw?
One fact that intrigues me is all the fuzz about document standards knowing Linux market share on the desktop at 1% unchanged during the past 8 years:
The second fact that intrigues about all the fuzz on standards is the lack of a word processor and spreadsheet equivalent to MS Word and MS Excel in the Linux world until pretty recently:
The last fact the intrigues me is the lack of good GUIs in the Linux world (which have 'borrowed' a lot from MS Windows) until only a few years ago:
Why all of the sudden so much fuzz for document standards?
Rob, a key paragraph is:
"IBM will offer employees incentives ranging from medals to IBM software or hardware to cash, depending on how much effort they put into OS/2. In return, says Lucy Baney, an IBM marketing executive, the company will ask employees to approach their neighbors, their dentists, their schoolboards. Armed with brochures and talking points, the IBMers will sing the praises of OS/2 as the solution to people's personal-computing needs."
Let me refrase the paragraph for you:
IBM will offer employees incentives ranging from medals to IBM software or hardware to cash, depending on how much effort they put into ODF. In return, says an IBM marketing executive, the company will ask employees to approach their neighbors, their dentists, their schoolboards. Armed with brochures and talking points, the IBMers will sing the praises of ODF as the solution to people's document needs.
I tried the formula-thing in Gnumeric creating and saving a simple spreadsheet. The result: Gnumeric saves formulas in the interoperable way ( contrary to what the Microsoft product does ):
~ $ gnumeric --version
gnumeric version '1.8.2'
Does the world reference implementation slip out side your minds.
How do you think bugs are going to be found in the new draft standard for formulator if its not implemented.
OpenOffice is the reference implementation so they do items from draft. Still provide option to save in prior format.
I am sorry but openoffice's formula syntax is documented for all versions even the old Staroffice file formats style is documented. OpenOffice changed over to the draft standard because it would be more future safe.
Yes gnumeric koffice IBM Lotus Symphony and others have managed to operate in a compatible way. OpenOffice syntax 1 of them.
MS choose to be incompatible so be it.
Correct Openoffice's spreedsheet formal are not compatible with Excels . Openoffice don't have rounding errors and has more functions. Excel support is a small subset. So excel is still short many functions to do 1.2 support when it gets released.
Doug Mahugh and a bunch of the standards crew (both in and out of Microsoft) have been having a great
I wrote a comment here that was moderated. Thank you.
ghomen, I just looked through the comment table and I don't see anything from you that hasn't been let through. Can you tell me the rough date/time? Or re-post? Thanks.
My comment was:
You are trying to justify a major fault on Microsoft's ODF implementation with problems on a very specific and unuseful scenario (summing strings!). That scenario hardly matters at all when compared to the state of interoperability you left ODF with.
I sincerely think that there's no possible excuse for what was done given so many existing reference implementations (including your own supported plugin).
Probably it's not up to you to decide, but this really looks like a trick to spread FUD on ODF portability. People will notice, you know? There's nothing that can be said to save face at this time.
Can something be done?
Suggestion: issue an urgent fix for this and get your credibility restored.
Why the hell a lot of people is talking about a standard not yet approved (ODF 1.2)?
Hey guys, I´m a Linux user (Debian), and I agree with Microsoft in this point of view.
And... Why the hell OOo is using ODF 1.2?
This is out of standard! Where is the interoperatibilty?
Why the hell OOo don´t follow the ISO/IEC/Whatever 26300 (ODF 1.0/1.1)?
If they follow ODF 1.2, so they are OUT of standard and interoperability!
And... Come one, help me with this bug:
Renato S. Yamane
I have to say I am not a fan of Microsoft (to not say that I hate it), but no one has the rights to say anything against MS for the reason explained in your post (there are other reasons to not like the company where you work, but this one is not one of them).
It makes me wonder, how could ODF 1.0/1.1 be approved as an ISO standard? The lack of specification on how to define formulae is just a joke.
Let's see if MS Office will follow the new ODF standard with this problem fixed when it is ready and aproved.
To all that say there are no standards for formulas. Did you actually _read_ Rob Weir's blog post? I will copy the link for you: http://www.robweir.com/blog/2009/05/follow-up-on-excel-2007-sp2s-odf.html
Carefully pay attention to the quotes of section 8.1.3 and 8.3.1 of the ODF 1.1 standard. Then compare the entries for the different spreadsheet products and how they encode a formula. If you still do not see that Excel 2007 SP2 is the one that breaks the formal standard, then well, go on living in your universe....
Carl, everyone is entitled to their own unique interpretation of the normative requirements of ODF 1.1, if they're so inclined. The interpretation you've linked to is one perspective, although it seems to be a perspective held by one member of the ODF TC and no others.
Here are a few other opinions you may find interesting or helpful:
From an ODF TC member and ODF implementer: http://ajg.math.concordia.ab.ca/?p=4
From the convenor of SC34 WG1: http://adjb.net/post/Notes-on-Document-Conformance-and-Portability-4.aspx
From a longtime document standards contributor: http://broadcast.oreilly.com/2009/05/odf-11-formula-support-in-office-sp2.html
From the ODF editor: http://www.durusau.net/publications/whats_to_like.pdf
It's also worth noting that Excel 2007 SP2's approach to formulas was publicly demonstrated last July (at an interoperability event that I invited all members of the ODF TC to attend). So there's no new information here in nearly a year. For a non-Microsoft perspective on the formula details that was posted last year, see here: http://idippedut.dk/post/2008/08/18/DII-ODF-workshop-the-good-stuff.aspx
Un article de fond qui montre bien comment les problèmes d’interopérabilité sont complexes. http://blogs