Rob Weir posted on his blog a couple of days ago an Update on ODF Spreadsheet Interoperability. I think it’s great that he has brought up spreadsheet interoperability, and specifically the issue of formulas, which seems to be the main thrust of his post. I mentioned on the day of our SP2 release last week that “I’ll be doing some blog posts that get down into more of the technical details, to help explain some of the engineering decisions that we made in our implementation,” and Rob’s post is a good starting point for that conversation.
For those who only want to read the first two paragraphs of this very long post, here’s a summary:
Before I get into the details, I think it’s worth noting that there’s nothing new here. The challenges of ODF’s lack of formula specification have been around for a long time, and many people saw the current situation coming.
Nearly three years ago, Stephen McGibbon had a good post covering the situation, which is worthwhile reading for some background on how we got to where we are today. As you can see from the quotes in Stephen’s blog post, many people in the ODF community – and the broader standards community – were dismayed at the decision to not include formula syntax in ODF. Others outside of Microsoft also pointed out the problem years ago. Rob’s blog post, as well as this post, are excellent examples of the sorts of interoperability challenges that those people saw coming as a result of the decision to not include formula syntax in the ODF standard.
The first thing I did upon reading Rob’s post was to try to follow his steps for myself, so that I could understand the context of his findings. Here’s the methodology he describes in his post:
The test scenario I used was a simple wedding planner for a fictional user, Maya, who is getting married on August 15th. She wants to track how many days are left until her wedding, as well as track a simple ledger of wedding-related expenses. Nothing complicated here. I created this spreadsheet from scratch in each of the editors, by performing the following steps: Enter the title in A1 "May's Wedding Planner" and increased font size to 14 point. Enter formula = TODAY() in B3 and set US style MM/DD/YY date format/ Enter the date of the wedding as a constant in cell B4, also setting date format. Added simple calculations on cells B6-B8, to calculate days, weeks and months until the wedding. A11 through E16 is a simple ledger of the kind that is done thousands of times a day by spreadsheet users everywhere. Once you have the formula set up in column E (Balance = previous balance + credits - debits) then you can simply copy down the formula to the new row for each new entry
The test scenario I used was a simple wedding planner for a fictional user, Maya, who is getting married on August 15th. She wants to track how many days are left until her wedding, as well as track a simple ledger of wedding-related expenses. Nothing complicated here. I created this spreadsheet from scratch in each of the editors, by performing the following steps:
Sounds simple enough. So I fired up Open Office 3.0.1, and followed those steps. The resulting spreadsheet looked like this:
Then I saved the document, by clicking File/Save and then typing in a filename:
So far so good. Next step, I tried opening the document in IBM Lotus Symphony version 1.2.0. Here’s what I saw:
And then I opened the same document in Excel 2007 SP2, and here’s what I saw:
This is a great example of a common ODF spreadsheet interoperability challenge, and two different ways of dealing with it. The challenge is caused by the fact that Open Office writes formulas in a syntax that is unknown to Symphony and Excel. Open Office, for reasons I don’t understand, has decided to use as their default formula syntax the unfinished Open Formula specification, which is neither approved nor published by OASIS – not even out for public review yet.
So what do Symphony and Excel do about this challenge? The answer is that Symphony preserves the (unrecognized) formula markup, and Excel preserves the cached values. (A quick aside for those who don’t know: spreadsheets typically store both the formula and the value resulting from the most recent recalculation.)
Getting back to Rob’s initial premise of this being a typical wedding-planning exercise, if Maya were to send this spreadsheet to a person using Microsoft Excel SP2, that person would see the values as shown above. They’d know at a glance what day was ‘today’ when Maya made the file, and that the ledger balance that day was 5500.
But if Maya were to send this spreadsheet to a person running IBM Lotus Symphony, they’d see only the formulas. Perhaps an ODF markup expert like Rob would be able to massage that spreadsheet into something usable, but most people would find it a bit hard to conceive of what it means for a wedding to be “of:=B4-B3” days away, or for there to be a ledger balance of “of:=E15+C16-D16” dollars.
So what does Rob’s test matrix show for these two scenarios? Oddly, it labels Open Office to IBM Lotus Symphony interoperability for this scenario as “OK” and it labels Open Office to Microsoft Office SP2 interoperability as “Fail” (with a red background for added emphasis). Now, I know Rob works for IBM and probably wants to portray Symphony in the best possible light, but is that a reasonable assessment of the interoperability we’ve just seen above?
After further investigation, though, I think I see what Rob may have actually done to get the result in his table. He seems to have included some steps that aren’t documented in his blog post:
Here’s how it looks, for those who don’t have Open Office handy:
After following these steps, Rob was then able to create a spreadsheet that stores formulas in the undocumented non-standardized syntax that was used by old versions of Open Office. Symphony, being simply a fork of an older version of the Open Office code base, is able to understand those formulas, so it can load both the values and the formulas themselves.
It’s worth noting what the OpenOffice.org developers have to say about this option that Rob has apparently used for his interoperability testing:
So it sounds like there isn’t any single version of ODF that will provide compatibility across all versions of Open Office and Symphony. You can use the “may cause information to be lost” option if you want to do a demonstration of formula interoperability with ODF, but if you want to demonstrate text-bullet interoperability, you may need to use another option.
And what does Excel 2007 SP2 do with the document saved in this alternative format? Exactly the same thing it does with Open Office’s current default format: it displays the data, so that the document user can see the results of the last recalculation of the spreadsheet, and it ignores the formulas that are written in a non-standardized syntax that Excel doesn’t support. I think that’s a pretty reasonable approach, when a spreadsheet application comes across non-standardized formula syntax: show the last recalculated result, thus preserving the data, and don’t try to guess at the semantics of undocumented formula markup.
Why Doesn’t Office 2007 SP2 support Open Office formula syntax?
That’s a logical question to ask, in regard to how SP2 handles formulas. To answer it accurately and completely, we should distinguish between the two formula syntaxes that Open Office uses.
The first is the syntax they use in their non-recommended “1.0/1.1 (OpenOffice.org 2.x)” setting. This is an undocumented, deprecated syntax, and therefore not a reliable mechanism for formula interoperability. Despite what you may read on some blogs, it is not the same syntax as used by Excel 97/2000/2003. Open Office copied quite a bit of the feature set from Excel, and there are definitely similarities in the formula syntax, but there are also differences with regard to referencing, operators, data types, and function arguments.
The other formula syntax that Open Office supports is the Open Formula syntax, which will eventually appear in ODF 1.2. This syntax has not yet been approved by a standards body, nor has it undergone the 60-day public review period that OASIS requires prior to approval and publication. It may go to public review soon, but it won’t be a standard until later this year at the soonest, and the details may change as a result of the remaining TC work and the public review process. (According to recent discussions in the ODF TC, we may send the other parts of ODF 1.2 out for public review first, to allow more time to finish up Open Formula.)
In Office’s implementation, we haven’t chosen to support the draft Open Formula spec (as Open Office currently does), because we have certain obligations when we ship software that don’t apply to open-source projects like Open Office. We need to test and verify behavior to a degree that’s not possible without final, fixed documentation that is believed to be 100% complete and accurate. When Open Formula is completed, standardized, and published, we'll be looking at that as the future path for enabling formula interoperability in ODF spreadsheets. But we’re not there yet; ODF 1.2 is not done, and not even ready for public review.
It’s interesting to note that we have discussed this very issue at a DII workshop. Last July, we had a workshop in Redmond, with attendees including other ODF implementers, members of the ODF TC, standards professionals, and others. In the roundtable discussions, I brought up our approach to formula support as outlined above, and asked for feedback. Although it was not 100% unanimous, there was clear consensus among most of the participants in the discussion that they did not want us to implement a non-standard formula syntax in anticipation of it becoming a standard. “Putting the cart before the horse” in that manner was seen by many as a possible source of future interoperability problems, rather than a solution to them, and we took that feedback into consideration.
As I’ve covered before, we feel that thorough documentation of implementation details is a cornerstone of document format interoperability. We’ve published detailed implementer notes for our ODF implementation, and on the matter of formulas (which are stored in the table:formula element), here’s what our implementer notes have to say:
The standard defines the attribute table:formula, contained within the element <table:table-cell>, contained within the parent element <office:spreadsheet \ table:table-row> This attribute is supported in core Excel 2007. 1. When saving the table:formula attribute, Excel 2007 precedes its formula syntax with the "msoxl" namespace. 2. When loading the attribute table:formula, Excel 2007 first looks at the namespace. If the namespace is “msoxl”, Excel 2007 will load the value of table:formula as a formula in Excel. 3. When loading the table:formula attribute, if the namespace is missing or unknown, the table:formula attribute is not loaded, and the value “office:value” is used instead. If the result of the formula is an error, Excel 2007 loads the <text:p> element and maps the element to an Error data type. Error data types that Excel 2007 does not support are mapped to #VALUE!
The standard defines the attribute table:formula, contained within the element <table:table-cell>, contained within the parent element <office:spreadsheet \ table:table-row>
This attribute is supported in core Excel 2007.
1. When saving the table:formula attribute, Excel 2007 precedes its formula syntax with the "msoxl" namespace.
2. When loading the attribute table:formula, Excel 2007 first looks at the namespace. If the namespace is “msoxl”, Excel 2007 will load the value of table:formula as a formula in Excel.
3. When loading the table:formula attribute, if the namespace is missing or unknown, the table:formula attribute is not loaded, and the value “office:value” is used instead. If the result of the formula is an error, Excel 2007 loads the <text:p> element and maps the element to an Error data type. Error data types that Excel 2007 does not support are mapped to #VALUE!
And, as both Rob’s tests and mine show, that is exactly what Excel does. It would be great if there were a place implementers could go to see these sorts of details for all major ODF implementations.
I’m glad to see this sort of public scrutiny of the details of ODF interoperability and how the underlying challenges are handled by various implementations. As you can see, spreadsheet interoperability is a complicated topic, and in the specific case of ODF spreadsheets, there is even more complexity created by the lack of a defined formula syntax in any published version of ODF.
The good news, when it comes to formulas, is that the Open Formula specification will address this area soon. My colleague Eric Patterson represents the Excel team in the Open Formula SC, and the very capable David Wheeler leads that group. Much good work has been done already, and we look forward to seeing the final Open Formula spec go out for public review and then approval by OASIS. The nearly 400 pages of formula syntax documentation in ISO/IEC 29500 (Part 1, section 18.17) enables reliable formula interoperability in the Open XML community, and soon the ODF community will have a similar level of formula interoperability.
But formulas are not the only ODF interoperability challenge. As members of the ODF TC and also the OIC (ODF Interoperability and Conformance) TC, both Rob and I – and many others – will need to work together to enable better interoperability in areas including tracked changes, mail merge, application settings, and others. Will ODF 1.2 be the most interoperable version of ODF yet? I hope so, and there are signs that it will be. But our work is not nearly done.
Allthough the OpenXML/ODF translator was funded by Microsoft it is OSS software created by 3rd pary software producers and as such not belonging to any Microsoft software prodcut and I think it is safe to asume that Microsoft will not add OSS software snippets to their core software product as that would create possible license issues especially if someone were to reuse some GPL code in those sources.
Luc, as I said in my post we have certain obligations when we ship software that don’t apply to open-source projects. I’ll cover some more details of this in my follow-up post on spreadsheet interoperability coming shortly.
Luc: "...I think it is safe to [assume] that Microsoft will not add OSS software snippets to their core software product as that would create possible license issues especially if someone were to reuse some GPL code in those sources."
You can't legally put GPL code into BSD at all, unless you're the author, in which case you'd probably be required to relicense the code under BSD in order to submit it to the project in the first place.
I think the whole issue of code reuse is beside the point, though. Microsoft has more than adequate resources to create whatever code they wish, especially when the development effort is directly associated with one of its primary sources of income: Microsoft Office. Of all parties involved, they are in the worst position to argue for choosing a formula language based on scarcity of resources.
doug, let me copy a snip of "code" from Weir's blog:
"So, going back to my test spreadsheets from all of the various ODF applications, how do these applications encode formulas with cell addresses:
* Symphony 1.3: =[.E12]+[.C13]-[.D13]
* Microsoft/CleverAge 3.0: =[.E12]+[.C13]-[.D13]
* KSpread 1.6.3: =[.E12]+[.C13]-[.D13]
* Google Spreadsheets: =[.E12]+[.C13]-[.D13]
* OpenOffice 3.01: =[.E12]+[.C13]-[.D13]
* Sun Plugin 3.0: [.E12]+[.C13]-[.D13]
* Excel 2007 SP2: =E12+C13-D13
I would like to know why Excel 2007 doesn't produce ODF 1.1 conformant documents, regarding this "cell address" thing.
carlos, a small correction:
* OOo 3.0.1: of:=E12+C13-D13
* Excel 2007 SP2: msoxl:=E12+C13-D13
* OOo 3.0.1: "of:=[.C4]+[.D4]+[.E4]"
@Clippy, carlos had it right the first time. In the Maya spreadsheet Rob Weir saved using OOo 3.0.1, the formula is "of:=[.E12]+[.C13]-[.D13]". This is a direct cut and paste.
@carlos, Microsoft interprets the word "Typically" in section 8.1.3 as being infectious and recursive, converting everything it touches into non-normative examples. Thus, the content that a formula, cell addresses, and even all of section 8.3.1 become non-normative simply by being within seven degrees of the word "Typically". Never mind the fact that the specification makes perfect sense if you interpret "Typically" as only applying to having a equal sign at the beginning of the formula, as this example in 6.5.3 would indicate:
text:formula='ooo-w:[address book file.address.FIRSTNAME] == "Julie"'
Note the use of the brackets used for addressing are present, but the preceding equal sign is missing.
Microsoft's interpretation of the ODF 1.1 specification is not necessarily a deliberate misreading, but even giving them that benefit of the doubt, they still ignored parts of the spec they considered informative to use the formula language that suited their interests rather than picking a widely used defacto standard that takes 8.1.3 and 8.3.1 to heart and promotes interoperability. The whole point of standards is to allow interchange. Anything else is just a subversion of the process.
Matthew, there's a long history how standards text is written and what's normative or not; ISO and other standards bodies have rules governing such details as well. I think it's in everyone's interest for us to follow those precedents and rules, rather than making up new rationales for such things. The entire reason such rules exist is so that we DON'T get into these little debates about things like "what the meaning of the word 'is' is."
As I've mentioned, I'll cover all of this in my next blog post. This discussion needs more facts and fewer opinions ... in my opinion. :-)
@Mathew: "The whole point of standards is to allow interchange. Anything else is just a subversion of the process."
The pity is that subversion of processes is a standard at Microsoft :-(
The reason for several applications using the same cell referencing system is that this is used by OOo and that these applications implemented the existing OOo spreadsheet variation in ODF whereas MS has been implementing the existing Excel spreadsheet variation in ODF and is using the cell referencing mechanism that Excel is using.
It seems that the ODF 1.1 specification does not define normative specifications on how to reference spreadsheet cell or in fact anything normative about specifying spreadsheet content in an ODF spreadsheet.
doug, is that true?
IMHO if it is, then this decision is totally inexcusable.
If there is no standard (yet) for formulas in ODF, i.e. not specified in ODF 1.1 and not published or finalized in ODF 1.2... then why do anything with formulas at all?
Why do SP2, period?
Or, if you prefer, if you wanted to do something useful, consider the document base.
How many ODF documents exist, world-wide? How does that compare to the empty set of documents in "ODF" format produced by SP2 (before there was an SP2)?
This is a _document_ format issue, not an _application_ issue. It doesn't matter as much how many installed versions of Vendor X software that read/write ODF - it matters how many of what "format" exist.
So, why not take the approach of, "How do we build an SP that allows us to read the existing ODF files, and write them out again, so that the original author's software could open the result?"
I don't think it is something a rocket scientist needs to suggest, at a DII meeting.
Don't forget, too, that the OOo source was and is available for reference in implementing SP2. I suggest the path of least resistance in the marketplace, would have been interoperating regardless of the state of standards -- if you wanted to achieve any degree of interoperability.
בימים האחרונים שוחררה חבילת השירות השניה של Office 2007. אחת התכונות הבולטות של חבילת השירות הזאת היא
@gk: "Let me take a case... why don't all browser manufacturers mimic internet explorer 6 to give practical interoperability.. This is because it is not the standard."
IE6 implements the HTML and CSS specifications. It simply does not conform to those standards. From what I can tell, Office 2007 SP2 can read/write ODF1.1 documents as specified.
What is in question here is that the formulas are not specified in the current standard, but *are* being standardised in a separate formula standard that is being drafted and agreed upon that is ratifying what existing implementations and existing documents do. Where existing implementations - such as Symphony - do not conform at the moment, upcoming versions do to make them interoperable.
I find it amusing that here Microsoft are saying "we can't do XYZ as it is not standardised, and is still in draft" whereas the upcoming Visual Studio 10 is supporting language features in the draft C++0x standard!
It seems that Microsoft is suffering from split-personality disorder and is choosing which standards and draft standards to conform to that best suite Microsoft. Like IE being one of the few vendors that don't support Scalable Vector Graphics or MathML, instead going for a butchered hybrid that no-one else supports.
Since Rob Weir moderates really nicely his blog, I assume he won't post my question there.
Rob, please tell me, what are today's incentives given by IBM to it workers to hawk agendas?
The following is just a precedent established by IBM:
IBM Is Offering Workers Prizes to Hawk OS/2
Paul B. Carroll, Staff Reporter
The Wall Street Journal
March 27, 1992
International Business Machines Corp.'s sales force is already bigger than many armies, but as IBM prepares to start shipping its much-maligned OS/2 operating system, it has decided it needs reinforcements.
So IBM is about to launch a program that will attempt to turn all its 344,000 employees into salesmen for the personal-computer software, which is in a fight for its life against Microsoft Corp.'s Windows juggernaut.
IBM will offer employees incentives ranging from medals to IBM software or hardware to cash, depending on how much effort they put into OS/2. In return, says Lucy Baney, an IBM marketing executive, the company will ask employees to approach their neighbors, their dentists, their schoolboards. Armed with brochures and talking points, the IBMers will sing the praises of OS/2 as the solution to people's personal-computing needs.
IBM is pulling out most of the stops in advertising and pricing, too, as it prepares for one of the stiffest marketing battles the personal-computer industry has seen. IBM must not only overcome Microsoft's considerable momentum but must also face a Microsoft marketing effort that, while very different and more low-key, is just as intense in support of Microsoft's latest version of Windows. In fact, the situation here is the reverse of the one IBM normally sees; Microsoft is the entrenched power and IBM the struggling competitor attempting to dislodge it.
"There's a very serious commitment to energize the whole company behind the product," says Fernand Sarrat, the top OS/2 marketing executive.
Although he declines to be specific on IBM's advertising plans, he says that "there isn't an IBM U.S. ad campaign that will receive anywhere near the dollars that OS/2 does" this year. That easily puts OS/2 advertising spending into the tens of millions of dollars, not counting the money IBM will spend on extensive international ads.
Mr. Sarrat says the campaign will be informational rather than the sort of macho advertising that has been rumored in the trade press; one slogan that was reportedly considered was "Curtains for Windows." But Mr. Sarrat adds: "It's not that we'll be namby-pamby. That's for damn sure."
The campaign will really start rolling next month. IBM's new version of OS/2 will be available to corporate customers next week, meeting IBM's commitment to deliver it by the end of March, but it won't be widely available in retail outlets until the second half of April. So even though Microsoft has already begun its campaign, in advance of its April 6 introduction of Windows 3.1, Mr. Sarrat says IBM has decided to wait a bit.
On pricing, he says that people who have the latest version of OS/2 will get the new version free. Many users of Microsoft's Windows and DOS will also get huge discounts off the list price of $195, but Mr. Sarrat declines to be specific, lest he tip his hand to Microsoft. (Windows 3.1 will have a list price of $150.)
IBM's pricing plan means it will be taking losses on many of its initial OS/2 sales. Software securities analysts have estimated that IBM must pay Microsoft a royalty of about $20 on each copy of OS/2, because it contains Windows software. They have said it also costs $30 or more to produce each copy of OS/2. And those figures don't include any of IBM's marketing expenses, any of the corporate overhead that eats up more than 30% of IBM's revenue or any of the OS/2 development expenses that have totaled hundreds of millions of dollars.
"This is a long-term war," Mr. Sarrat says.
He predicts that IBM will sell millions of copies of OS/2 this year, even though it has sold something less than one million copies in the five years since OS/2's introduction. Mr. Sarrat even goes so far as to predict that within a few years OS/2 will be outselling Windows, which Rick Sherlund, a securities analyst at Goldman Sachs, predicts will sell 11 million to 12 million copies in the fiscal year ending June 30 and 17 million copies the following year.
"It won't happen this year or next year," Mr. Sarrat says, "but after next year it's fair game."
In contrast to IBM, Microsoft will spend most of its effort "making sure people have a good experience" with its new version of Windows, says Steve Ballmer, a Microsoft executive vice president.
Microsoft will spend plenty of money on advertising, including its first television campaign. Mr. Ballmer says a published estimate of a $31 million marketing effort "is probably low even as a U.S. number." Microsoft will also be aggressive on pricing, offering upgrades to the new version for $50 initially.
But Mr. Ballmer says most of Microsoft's effort will go into a huge program to train computer dealers, to offer workshops to heavy Windows users and to help people get information on how to use the product. Patty Stonesifer, a Microsoft vice president in charge of customer support, says that Microsoft has 500 people available to answer telephone callers' questions on Windows, up from 70 when the prior version of Windows came out in May 1990. She says Microsoft has also invested heavily in an electronic bulletin board to keep users up to date on problems that surface with the software and to provide the latest tips on how to use Windows better.
"Making Windows easier to use will be a demand generator in itself," she says.
Microsoft has also mounted an aggressive public-relations campaign in recent weeks, having executives do waves of interviews to address OS/2. The executives have argued, in particular, that while OS/2 may make sense for IBM's traditional corporate users, it is too complex and requires too much memory to attract the broad mass of users who have been drawn to Windows.
Still, Mr. Ballmer acknowledges that many people in the computer industry and many users "are rooting for some healthy competition. People want a healthy, knockdown, drag-out fight. But we haven't shipped, and IBM hasn't shipped. In the next few weeks, we'll see what happens."
Copyright Dow Jones & Company Inc