WG4/WG5 Meetings
29 June 09 07:32 AM

WG4 meeting, Copenhagen

I returned last night from a week in Copenhagen, where I attended the SC34 WG4/WG5 meetings hosted by Danish Standards.  As usual, it was several days of non-stop document format discussions, in the meetings as well as over breakfast, lunch, dinner, and Carlsbergs.  A typical comment from one of the delegates on Wednesday afternoon: “let’s take a break from sitting down and continue this debate standing up for a while.”

Other attendees have already posted some thoughts about the meetings, and I expect we’ll see more discussion of the details on participants’ blogs going forward.  See the blogs of Alex Brown and Jesper Lund Stocholm for information about some of the topics we discussed, including boolean values, ISO 8601 dates, and other aspects of IS29500 maintenance.

WG4 (IS29500 Maintenance)

The main focus of WG4’s work, as always, was processing DRs (defect reports) submitted by member bodies.  As of the end of the meeting, we had closed 205 of the 284 defect reports submitted to date; watch the WG4 statistics page over the next few days for an update reflecting the latest status.  The biggest submitter of DRs to date has been the UK, although Japan plans to take the lead soon, according to a comment from WG4 convenor Murata Makoto on Alex Brown’s Flickr stream.

In addition to discussing proposed solutions and closing DRs, we discussed at some length two topics Alex raised in presentations to the group: the intent of IS29500’s division into Transitional and Strict at the BRM last year (including how they’re related and the long-term maintenance implied by this structure), and various approaches to conformance testing for IS29500 (and also IS26300).  We also reviewed the planned schedule for COR1 (the first set of technical corrigenda for IS29500) and AM1 (the first set of amendments).  Project editor Rex Jaeschke is already working on these documents, and WG4 hopes to be ready to approve them on the July 23 conference call, or on a July 30 call if needed.  After that, they’ll proceed to SC34 and JTC1 for balloting.  One other interesting topic we discussed was how we can implement a public email archive for all WG4 correspondence.

I’ll have more to say on all of these topics as we move forward with them in WG4.  For the next few weeks, however, the main focus of WG4’s work will be to get COR1 and AM1 ready for approval and publication, so that the IS29500 standard can be updated to reflect all of the work WG4 has done to date.

WG5 (ODF-OXML Translation)

After WG4 met on June 22-24, WG5 met on June 25.  WG5 is the working group on translation between the ODF and Open XML formats, and the main work item in WG5 right now is a TR (technical report) that will provide guidance on various details of the translation process.

Klaus-Peter Eckert (Fraunhofer FOKUS), the editor of the TR, provided an overview of the current status and outlined some of the scenarios the report is based upon.

Alex Brown presented on conformance and validation, covering the same topics he had covered for WG4 earlier in the week.  We discussed possible collaboration between WG5 members, Fraunhofer FOKUS, and others, to create a community-developed validator that would benefit from the expertise of WG1, WG4, and WG5.  These discussions are ongoing.

I presented a recap of the ODF Plugfest I had attended in The Hague, and explained how members of the OIC TC will be contributing to the plugfest wiki.  We discussed whether a similar wiki-based approach might be appropriate for testing translation scenarios as defined by WG5.  We also went through the schema that OIC TC chair Bart Hanssens has created for managing interoperability test cases, and looked at how that approach might be useful to WG5.  Although WG5 and the OIC TC have different missions – the OIC TC is all about ODF interoperability, whereas WG5 is about ODF-OXML translation – there is quite a bit of conceptual overlap in their work.  We decided that it would be good to keep WG5 informed of developments in the OIC TC going forward, and I’ll be playing that role for WG5.

As usual at these meetings, I got out and took some pictures around town in the evenings.  The fact that it was light out until after 23:00 every evening certainly extended photo-taking hours, and we were fortunate to be in town during the week of the solstice celebration (the witch-burning to which one attendee alluded earlier in the week), as well as the week of a crazy graduation ritual in which truckloads of young Danes in white hats cruise around the city yelling through bullhorns, sounding sirens, drinking beer, and generally carrying on.

celebrating graduation

Postedby dmahugh | 1 Comments    
ODF Plugfest, The Hague
16 June 09 11:09 PM

Over the last two days I’ve been attending the ODF Interoperability Workshop, a fascinating event that brought together ODF implementers from many countries to talk about the issues and collaborate on interoperability testing.  The workshop web site covers the details of the agenda, provides a variety of related content (including the presentations), and lists the objectives of the event:

The aim is to provide a low-level hands-on interoperability testing environment in which vendors and community members can fine tune the interoperability capabilities of their ODF implementations and make test cases, recommendations and create best practices for implementors.

The ultimate goal is to achieve full seamless interoperability for the entire feature set of ODF across all suppliers, platforms and supported technologies.

The workshop is meant for people who write and architect the code to handle the actual ODF in applications - desktop editors and viewers, online apps, mobile, etc. Participants should represent every major team behind the various competing ODF products, their direct (technical) management and community leaders, as well as the members of the ODF OIC committee.

It was a productive two days, both in terms of what was accomplished in the official activities of the event and also in terms of the networking opportunities it provided.

The first day started with a speech by Frank Heemskerk, the Minister of Foreign Trade for the Netherlands.  He discussed the Dutch government’s policy on the use of open standards, and made a direct appeal to the attendees to “go beyond compliancy and help achieve broad-based open standards.”

Mr. Heemskerk was followed by Ineke Schop, Program Manager for Netherlands in Open Connection.  Ms. Schop described her view of the goals and aspirations of open standards in general, as well as some of the specific steps being taken by her organization to deliver on those objectives.

After that we got into the details of working through ODF interoperability issues.  There were a variety of sessions by implementers and members of the ODF TC and OIC TC, and you can find all the details in the agenda posted on the workshop web site.  Note that the presentations are also included in the online agenda – several have already been posted, and the rest will be available soon.  Video interviews were recorded with many of the attendees, and those should be available soon as well.

It was great to see some old friends again, and I also met many people I knew before only through their voices on ODF TC calls or their online presence in the ODF community, including Oliver-Rainier Wittman (Sun), Mingfei Jai (IBM), Marc Maurer (AbiWord), Zaheda Bhorat (Google), and many others.  My colleague Peter Amstein, the chief architect of our ODF support, was also in attendance, and it was an opportunity for him to get to know the people behind many other ODF implementations.

During the afternoon of each day, we did interoperability testing and had many informal discussions about specific technical issues.  Some of the tests were based on issues that people already knew about; at other times we worked through scenarios that OIC TC chair Bart Hanssens had defined, as well as scenarios that attendees created.  This testing identified varying interpretations of the spec, bugs, and other issues that can now be resolved to improve the overall state of interoperability.  I won’t be discussing specific details of those tests, because we were asked to follow the Chatham House Rule and not name specific products in post-event blogging or reporting.  This policy was in place to ensure that the event could be productive and results-oriented, and I’d say it worked very well – all of the implementers were open and pragmatic about working through issues that came up in testing.

There were many bloggers and Twitterers in attendance, so I expect others will post their thoughts on the event after everyone gets back home; I noticed that Floschi already has a nice summary posted.

It was a very useful event, and I’d like to give special thanks to Fabrice Mous and Michiel Leenaars, who worked tirelessly to provide a great experience for the attendees.


Testing Office’s ODF Implementation
14 June 09 09:32 PM

In this blog post, I’m going to cover some of the details of how we approached the challenges of testing our ODF 1.1 implementation that was released in Office 2007 SP2.

Adding support for a new document format such as ODF to Office is a large and complex project.  Office has a very broad range of functionality, and we had to map that functionality to the structures defined in ODF.  This mapping then needed to be rigorously tested, both in isolation and in rich documents that reflect typical usage of various combinations of features, to ensure that our generated documents conform to the specification and to maximize interoperability with other implementations.

High-Level Planning

When we began work on our ODF 1.1 implementation, we started by developing a set of high-level guiding principles that we would follow.  I covered those in a blog post last year, as well as a recent post that explained how we see the relationship between standards and interoperability.

After we had reached agreement on these principles, the various feature teams began designing the details.  A “feature team” here at Microsoft is made up of three groups of people: program managers (PMs), developers, and testers.  In broad, simple terms, PMs are responsible for writing the specifications, developers are responsible for implementing those specifications, and testers are responsible for verifying that everything works as intended.  Since a specification for ODF was already in hand, the main job of the feature team was to write down the details of how we would implement it.  In this post I’ll be focusing on the work of the testers, although inevitably that will include some discussion of the work of the PMs and developers, because the three disciplines work very closely together in an iterative manner.

Most of the people who planned and executed our ODF implementation are members of the same teams that are responsible for other aspects of the design, development and testing of the Office clients.  We created an “ODF virtual team” that included specific individuals from each of the relevant product teams (primarily Word, Excel, PowerPoint, and graphics), and the v-team approached the project with the same management structure and business processes that we use for other work on Office.  Attendees of the DII workshop in Redmond last summer had a chance to meet several key members of the ODF v-team, who gave presentations and participated in the roundtable discussions at that event.

In addition to these people in Redmond, we have other teams that we can call on for projects like this one, and for the testing work on our ODF implementation we pulled in people from the Office group in four countries, as well as people who worked on Office years ago but have moved on to other roles (for their expertise in older features that we wanted to verify are supported correctly in our ODF implementation).

Mapping Between ODF and Open XML

Office’s internal representation of documents is very closely aligned with the Open XML formats, so one of the first steps in planning our ODF implementation was to do detailed mapping between the Open XML structures that Office already supported, and the ODF structures that we would be saving and loading to/from in ODF 1.1 documents.

The PMs had primary responsibility for this, and they created sets of spreadsheets to capture the mappings between every ODF and Open XML element and attribute.  This mapping needed to be defined in both directions: OXML->ODF for File/Save operations, and ODF->OXML for File/Open operations.

As a simple example of how that worked, here is part of the spreadsheet for the concept of bold text, as mapped from OXML to ODF:

[Image: excerpt of the OXML-to-ODF mapping spreadsheet for bold text]

This excerpt is just a subset of what was captured in the mapping; the PMs also identified required/optional status, default values, and other information.

And here’s the converse mapping for bold text, going from ODF to OXML:

[Image: excerpt of the ODF-to-OXML mapping spreadsheet for bold text]
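As a rough illustration of what a two-way mapping means in code, here is a minimal Python sketch for bold text. It assumes WordprocessingML’s `w:b` run property (with an optional `w:val` on/off attribute) on the Open XML side and the `fo:font-weight` attribute on the ODF side; the function names and the rule that numeric weights of 600 and above count as bold are illustrative assumptions, not the contents of our mapping spreadsheets.

```python
from typing import Optional

# ST_OnOff values that Open XML accepts for w:val; a bare <w:b/> means "on".
OXML_ON = {"true", "on", "1"}
OXML_OFF = {"false", "off", "0"}


def oxml_bold_to_odf(has_w_b: bool, w_val: Optional[str] = None) -> Optional[str]:
    """OXML -> ODF (File/Save): return an fo:font-weight value, or None to omit it."""
    if not has_w_b:
        return None  # no <w:b> on the run: write nothing and let styles decide
    if w_val is None or w_val.lower() in OXML_ON:
        return "bold"
    if w_val.lower() in OXML_OFF:
        return "normal"
    raise ValueError(f"unexpected w:val value {w_val!r}")


def odf_bold_to_oxml(font_weight: Optional[str]) -> Optional[bool]:
    """ODF -> OXML (File/Open): True means <w:b/>, False means <w:b w:val="0"/>, None means omit."""
    if font_weight is None:
        return None
    if font_weight == "bold":
        return True
    if font_weight == "normal":
        return False
    # ODF also allows numeric weights 100-900, but OXML bold is only on/off,
    # so this sketch assumes 600 and above counts as bold.
    return int(font_weight) >= 600
```

Even this small case shows why each direction had to be specified separately: ODF’s numeric weights collapse onto a single on/off flag, so a weight of 600 comes back as plain bold after a round trip.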

I’ve used a very simple example here, and yet as you can see there are many details involved.  There were thousands of details like this in the mapping spreadsheets, and collectively these spreadsheets served two roles:

  • they were the spec for the developers
  • they defined the scope of the test plan for the testers

The process of creating the mapping spreadsheets is interesting in itself, due to the many places where ODF and Open XML have different approaches or different capabilities.  I’ll cover the mapping spreadsheets in more detail in a future blog post.

Test Tools and Test Documents

Like any professional test team, the Office testers have a wide variety of tools they’ve built to help automate their work.  Here are a few examples of the tools that were used to test Office’s ODF implementation:

  • Verifying conformance to the schemas in the standard was a high priority, and we used Jing (called by an internal tool we call ODE) to validate against ODF’s RNG schemas.
  • The Excel team used an internal tool named Trippy to automate round-tripping.  They ran this tool against a test library of over 700,000 test documents, each of which was saved as an ODS file and then validated against the reference schemas.
  • The Word team used a tool called OHarness, which can be used to run the same operation on each one of a batch of files.  They used a library of over 100,000 documents, saving each one as an ODT file, logging bugs for the developers, and repeating the tests until they drove the number of non-conformant documents to zero.
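The batch tools themselves are internal, but the basic loop they automate can be sketched in a few lines of Python. This is a simplified stand-in using only the standard library: instead of full RELAX NG validation (which Jing performs), it checks that each ODF package lists `mimetype` as its first entry and that `content.xml` is well-formed XML, then reports which files fail. The function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def check_odf_package(data: bytes) -> list[str]:
    """Return the problems found in one ODF package (an empty list means it passed)."""
    problems = []
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = zf.namelist()
            # ODF recommends that the "mimetype" stream be the first entry.
            if not names or names[0] != "mimetype":
                problems.append("mimetype is not the first entry")
            if "content.xml" not in names:
                problems.append("content.xml missing")
            else:
                try:
                    ET.fromstring(zf.read("content.xml"))
                except ET.ParseError as exc:
                    problems.append(f"content.xml not well-formed: {exc}")
    except zipfile.BadZipFile:
        problems.append("not a ZIP package")
    return problems


def run_batch(corpus: dict[str, bytes]) -> dict[str, list[str]]:
    """Check every document and return only the failures, keyed by file name."""
    failures = {}
    for name, data in corpus.items():
        problems = check_odf_package(data)
        if problems:
            failures[name] = problems
    return failures


def make_package(content_xml: bytes) -> bytes:
    """Test helper: build a minimal in-memory ODF text package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", content_xml, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


failures = run_batch({
    "ok.odt": make_package(b"<doc/>"),
    "broken.odt": make_package(b"<doc>"),
    "junk.odt": b"not a zip file",
})
print(sorted(failures))  # ['broken.odt', 'junk.odt']
```

The "drive to zero" process described here amounts to re-running a loop like `run_batch` after each round of fixes until it returns an empty result.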

These tools, and others developed by the test teams, all work against large collections of documents.  These test documents came from a variety of sources:

  • Test documents that have been used in the past for the binary formats documentation and other purposes.
  • Real-world documents which have been given to us by customers for the purpose of helping us see how they use our products and seeing the problems they have run into.
  • Documents from test libraries created by other organizations, such as the test documents from the University of Central Florida atomic test suite and the test documents that Dialogika has created based on their work in developing the European Commission’s corporate style package for official and legislative documents.
  • Documents manually created by the testers to cover every element, attribute and attribute value defined in the ODF schemas.
  • Public documents collected from the internet.

Our libraries of test documents are dynamic and constantly growing.  As a recent example, we found that the latest Committee Draft of the ODF 1.2 specification uses styles in a way that exposed a bug in Word’s implementation.  (Rick Jelliffe has blogged about this bug.)  So we’ve added that document to our test library going forward.  (We’ve also fixed that bug and tested the fix, which will appear in a future update.)

Verifying Mapping

After the developers had written code to handle the mappings as defined in the spreadsheets (which were essentially the specs for their work), the testers got to work testing this code.

One aspect of testing involved small documents for verifying specific elements and attributes.  These were handled in an automated manner using tools such as Trippy and OHarness, as mentioned above.

Another aspect of this testing was the creation of complex “real-world documents” that contained combinations of functionality to test various scenarios that we’ve found typically occur in actual use of Word, Excel, or PowerPoint.

For example, many Excel users create spreadsheet documents that contain a large worksheet of raw data like this one:

[Image: worksheet of raw data]

… and that data is often summarized in pivot tables and/or formatted reports like these:

[Image: pivot table and formatted report summarizing the raw data]

The test team would create documents like this one, then manually verify that the document could be saved as either an ODS or XLSX file without change in appearance or functionality.  In this particular case, the test team verified that a variety of details were handled the same in Open XML and ODF, including:

  • Formatting of cell content, including conditional formatting
  • Data with Autofilter on data sheet
  • PivotTable in Pivot sheet based on above data
  • Results of formula calculations
  • Data validation

Verifying Conformance

As I mentioned earlier, the product teams each have a large corpus of test documents that are used for automated conformance testing.  Binary documents and Open XML documents are opened and then saved as ODF, and each of the resulting documents is validated against the ODF schemas.  By analyzing the results of these tests, the testers can identify problems that need to be corrected, and then the tests are re-run.

The goal of this process is simple: to drive the number of non-conformant documents to zero.  We reached that goal for the Office 2007 SP2 implementation of ODF, and as of this writing I don’t know of a way to make Word, Excel or PowerPoint write a non-conformant ODF document.  It may theoretically be possible to do so – and if anyone happens to come across such a scenario please let me know – but we have verified that the hundreds of thousands of documents in our test libraries can be saved as fully conformant ODF 1.1 files from Office 2007 SP2.  By conformant, I mean here fully schema-compliant and also conformant with our reading of the text of the ODF 1.1 spec.

Security Testing

When we add support for a new format, one area that requires intensive testing is security.  Does our implementation of the new format create any new security risks that need to be mitigated?  Is there any way that an ODF document can be corrupted (deliberately or accidentally) that could cause a security problem?  The test teams were responsible for answering these questions.

The key tool used for this aspect of the test plan was Distributed File Fuzzing (DFF).  The basic concept is that thousands of documents are corrupted in random ways, and these documents are opened on large numbers of PCs in a distributed environment.  Data is collected on the ways in which these corrupted files fail to open, and this data is used to verify that there are no security problems caused by bad error handlers, buffer overruns, integer overflows, or other issues.
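DFF is an internal system, but the core idea of mutation fuzzing fits in a short Python sketch. Assumptions: the "parser under test" here is only the standard library’s ZIP and XML readers, the mutation strategy (overwrite a few random bytes) is the simplest one available, and the function names are hypothetical. What the sketch shows is the classification step: a corrupted file should be rejected with a controlled, expected error, and anything else is a finding worth logging.

```python
import io
import random
import zipfile
import zlib
import xml.etree.ElementTree as ET

# Errors treated as "handled": the reader rejected the corrupted file cleanly.
EXPECTED_ERRORS = (zipfile.BadZipFile, zlib.error, ET.ParseError, KeyError, UnicodeDecodeError)


def mutate(data: bytes, rng: random.Random, flips: int = 8) -> bytes:
    """Return a copy of data with a few randomly chosen bytes overwritten."""
    buf = bytearray(data)
    for _ in range(flips):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def open_document(data: bytes) -> None:
    """Stand-in parser under test: open the package and parse content.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        ET.fromstring(zf.read("content.xml"))


def fuzz(seed_doc: bytes, iterations: int, seed: int = 0) -> dict[str, int]:
    """Open many corrupted copies of seed_doc and tally how each attempt ended."""
    rng = random.Random(seed)  # seeded so a failing case can be reproduced
    tally = {"opened": 0, "rejected": 0, "unexpected": 0}
    for _ in range(iterations):
        try:
            open_document(mutate(seed_doc, rng))
            tally["opened"] += 1
        except EXPECTED_ERRORS:
            tally["rejected"] += 1
        except Exception:  # anything else is a potential bug to log and triage
            tally["unexpected"] += 1
    return tally
```

A Python sketch can only surface unexpected exceptions; buffer overruns or integer overflows in native code only show up when the real applications open the files, which is why DFF runs them on many machines.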

When issues are found in security testing, the process is the same as in the other types of testing: the testers log bugs, the developers check whether the problem is in the design or the implementation, and based on those findings we either modify the design and re-code, or correct the code.  The tests are then repeated, and this process continues until the number of open security issues reaches zero.

Testing Interoperability

The final piece of the testing puzzle is interoperability testing: verifying that documents created in Office can be opened in other implementations, and vice versa.

This type of testing is nothing new for the test teams, because we do it every time we add a feature to Office.  In the past, we focused primarily on interoperability between various versions of Office, but now that test matrix has been expanded to include the latest versions of major ODF implementations.

To verify interoperability with other ODF implementations, the test teams created documents from scratch in OpenOffice.org and Symphony, and then opened those documents in Office.  They also created documents in Office and opened them in the other implementations.

In addition to these types of simple tests, we also wanted to verify that our implementation was not dependent on details of other implementations that aren’t actually standardized in the specification.

A good example of this sort of issue is the question of how parts are named and where they’re stored in the ZIP package that comprises an ODF document.  I’ve blogged in the past about this same issue in Open XML – an implementation of the Open XML standard shouldn’t assume that the document start part is word/document.xml, just because Word happens to use that name and location.
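The robust alternative to hard-coding word/document.xml is to follow the package relationships defined by the Open Packaging Conventions. Here is a minimal Python sketch, using only the standard library: it reads the package-level `_rels/.rels` part and returns the target of the officeDocument relationship, recognizing both the Transitional and the Strict relationship type URIs. The function name is hypothetical, and a complete reader would also handle relative targets more generally and external relationships.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
OFFICE_DOCUMENT_TYPES = {
    # Transitional (and ECMA-376 1st edition) relationship type
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument",
    # Strict relationship type
    "http://purl.oclc.org/ooxml/officeDocument/relationships/officeDocument",
}


def find_start_part(data: bytes) -> str:
    """Return the package path of the main document part, as declared in _rels/.rels."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("_rels/.rels"))
    for rel in root.iter(f"{RELS_NS}Relationship"):
        if rel.get("Type") in OFFICE_DOCUMENT_TYPES:
            # Package-level targets are relative to the package root.
            return rel.get("Target").lstrip("/")
    raise ValueError("no officeDocument relationship found")
```

With this approach, a package whose main part lives at, say, custom/main.xml opens just as well as one produced by Word.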

In ODF, some of those details are standardized – the start part is always named content.xml, for example – but others are not.  So the testers used ODE to manually modify documents that had been created by OpenOffice.org, to change certain details such as the name of the folder containing embedded images.  They then opened these documents in Office, to verify that our implementation will be able to interoperate with implementations that have made different design decisions within the range of options that the ODF standard allows.
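The same principle can be sketched for ODF: rather than assuming embedded images live in a particular folder (OpenOffice.org, for example, uses Pictures/), a reader can follow the `xlink:href` references on `draw:image` elements in content.xml. This is a minimal Python sketch using only the standard library; the function name is hypothetical, and it ignores images referenced from styles.xml or linked externally.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def embedded_images(data: bytes) -> dict[str, bytes]:
    """Map each image path referenced from content.xml to its bytes in the package."""
    images = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        content = ET.fromstring(zf.read("content.xml"))  # this name is fixed by ODF
        for image in content.iter(f"{{{DRAW_NS}}}image"):
            href = image.get(f"{{{XLINK_NS}}}href", "")
            path = href[2:] if href.startswith("./") else href
            # Use whatever path the producer chose; skip external links.
            if path in names:
                images[path] = zf.read(path)
    return images
```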

Summary

As you can see, there are many things to consider when creating and executing a test plan for support of a new document format in Office.  At an abstract level, it’s just another test plan – we design, then code, then test, with ongoing revisions to all three as needed to reach our design goals.  But the specifics of the ODF implementation test plan were geared toward the details of the ODF standard, as outlined above.

Due to the work our test teams did on the ODF 1.1 implementation in Office 2007 SP2, we are very confident that the implementation we produced adheres to the details of the design we had created, as documented on the implementer notes web site.  I realize that some people may disagree with some of the design decisions we made in our implementation, and we welcome constructive debate of those details.

I’m posting this from The Hague, where I will be attending the ODF plugfest today and tomorrow.  My colleague Peter Amstein – who led the technical work on our ODF implementation – is also here, and we’re looking forward to learning about how other implementers approach document format interoperability testing, and discussing how we can all work together on ODF interoperability going forward.

Parliament building, The Hague

Standards-Based Interoperability
05 June 09 07:02 AM

There has been quite a bit of discussion lately in the blogosphere about various approaches to document format interoperability.  It’s great to see all of the interest in this topic, and in this post I’d like to outline how we look at interoperability and standards on the Office team.  Our approach is based on a few simple concepts:

  • Interoperability is best enabled by a multi-pronged approach based on open standards, proactive maintenance of standards, transparency of implementation, and a collaborative approach to interoperability testing.
  • Standards conformance is an important starting point, because when implementations deviate from the standard they erode its long-term value.
  • Once implementers agree on the need for conformance to the standard, interoperability can be improved through supporting activities such as shared stewardship of the standard, community engagement, transparency, and collaborative testing.

It’s easy to get bogged down in the details when you start thinking through interoperability issues, so for this post I’m going to focus on a few simple diagrams that illustrate the basics of interoperability.  (These diagrams were inspired by a recent blog post by Wouter Van Vugt.)

Interoperability without Standards

First, let’s consider how software interoperability works when it is not standards-based.

Consider the various ways that four applications can share data, as shown in the diagram to the right.  There are six connections between these four applications, and each connection can be traversed in either direction, so there are 12 total types of interoperability involved.  (For example, Application A can consume a data file produced by Application B, or vice versa.)

As the number of applications increases, this complexity grows rapidly.  Double the number of applications to 8 total, and there will be 56 types of interoperability between them:

[Image: diagram of the interoperability connections among 8 applications]
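The arithmetic behind these numbers is simple: with n applications there are n(n-1)/2 connections, and because each connection can be traversed in both directions, n(n-1) directed interoperability scenarios. A few lines of Python confirm the figures for 4 and 8 applications; the function name is just for illustration.

```python
from itertools import permutations


def point_to_point(n: int) -> tuple[int, int]:
    """Return (connections, directed scenarios) when every app must talk to every other."""
    directed = len(list(permutations(range(n), 2)))  # ordered (producer, consumer) pairs
    return directed // 2, directed


for n in (4, 8):
    connections, directed = point_to_point(n)
    print(f"{n} apps: {connections} connections, {directed} directed scenarios")
```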

Let’s go back to the simple case of 4 applications that need to interoperate with one another, and take a look at another factor: software bugs.  All complex software has bugs, and some bugs can present significant challenges to interoperability.  Let’s consider the case that 3 of the 4 applications have bugs that affect interoperability, as shown in the diagram to the right.

The bugs will need to be addressed when data moves between these applications.  Some bugs can present unsolvable roadblocks to interoperability, but for purposes of this discussion let’s assume that every one of these bugs has a workaround.  That is, application A can take into account the known bug in application B and either implement the same buggy behavior itself, or try to fix up the problem when working with files that it knows came from application B.
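In ODF, one common way for application A to "know" where a file came from is the `meta:generator` element in meta.xml, which producers typically fill with their name and version. The sketch below shows the shape of such a workaround in Python, using only the standard library; the generator prefix "ApplicationB/", the version-parsing rule, and the function names are hypothetical placeholders. Note how the rule has to encode which releases still have the bug, a fragility the next paragraph comes back to.

```python
import xml.etree.ElementTree as ET

META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def generator_of(meta_xml: bytes) -> str:
    """Return the meta:generator string from an ODF meta.xml part ('' if absent)."""
    root = ET.fromstring(meta_xml)
    element = root.find(f".//{{{META_NS}}}generator")
    return (element.text or "").strip() if element is not None else ""


def needs_workaround(generator: str, fixed_in: tuple[int, int]) -> bool:
    """Hypothetical rule: apply the fix-up only for ApplicationB releases before fixed_in."""
    prefix = "ApplicationB/"
    if not generator.startswith(prefix):
        return False
    version = generator[len(prefix):].split()[0]       # e.g. "3.1" from "3.1 Build 42"
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < fixed_in
```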

Here’s where those workarounds will need to be implemented:

[Image: diagram showing where bug workarounds are needed between the 4 applications]

Note the complexity of this diagram.  There are 6 connections between these 4 applications, and every one of them has a different set of workarounds for bugs along the path.  Furthermore, any given connection may have different issues when data moves in different directions, leading to 12 interoperability scenarios, every one of which presents unique challenges.  And what happens if one of the implementers fixes one of their bugs in a new release?  That effectively adds yet another node to the diagram, increasing the complexity of the overall problem.

In the real world, interoperability is almost never achieved in this way.  Standards-based interoperability is a much better approach for everyone involved, whether that standard is an open one such as ODF (IS26300) or Open XML (IS29500), or a de-facto standard set by one popular implementation.

Standards-Based Interoperability

In the world of de-facto standards, one vendor ends up becoming the “reference implementation” that everyone else works to interoperate with.  In actual practice, this de-facto standard may or may not even be written down – engineers can often achieve a high degree of interoperability simply by observing the reference implementation and working to follow it.

De-facto standards often (but not always) eventually get written down and become public standards.  One simple example of this is the “Edison base” standard for screw-in light bulbs and sockets, which started as a proprietary approach but has long since been standardized by the IEC.  In fact, this is a much more common way for standards to become successful than the “green field” approach, in which the standard is written down first, before there are any implementations.

Once a standard becomes open and public, the process for maintaining it and the way that implementers achieve interoperability with one another changes a little.

The core premise of open standards-based interoperability is this: each application implements the published standard as written, and this provides a baseline for delivering interoperability.  Standards don’t address all interoperability challenges, but the existence of a standard addresses many of the issues involved, and the other issues can be addressed through standards maintenance, transparency of implementation details, and collaborative interoperability testing.

In the standards-based scenario, the standard itself is the central mechanism for enabling interoperability between implementations:

[Image: diagram of 8 implementations interoperating through the standard]

This diagram is much simpler than the earlier one showing 56 types of interoperability among 8 implementations.  The presence of the standard means that there are only 8 connections, and each connection only has to deal with the bugs in a single implementation.
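The difference in growth can be stated directly. Counting each link in both directions, as the 4- and 8-application examples above do, n applications need n(n-1) directed paths when they interoperate pairwise, but only 2n when each one writes to and reads from the standard (the 8 connections, each traversed both ways):

```latex
\underbrace{n(n-1)}_{\text{point-to-point}}
\quad\text{vs.}\quad
\underbrace{2n}_{\text{via the standard}},
\qquad
n = 8:\; 8 \cdot 7 = 56 \;\text{ vs. }\; 2 \cdot 8 = 16
```

The first count grows quadratically and the second linearly, which is why the gap widens with every implementation added.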

How this all applies to Office 2007 SP2

Last summer I covered the guiding principles we followed in our work to support ODF in Office 2007 SP2.  These principles were applied in a specific order, and I’d like to revisit the top two to explain how they support the view of interoperability that I’ve covered above.

Guiding Principle #1: Adhere to the ODF 1.1 Standard

In order to achieve the level of simplicity shown in the diagram above, the standard itself must be carefully written and implementers need to agree on the importance of adhering to the published version of the standard.  That’s why we made “Adhere to the ODF 1.1 Standard” our #1 guiding principle.  This is the starting point for enabling interoperability.

Recent independent tests have found that our implementation does in fact adhere to the ODF 1.1 standard, and I hope others will continue to conduct such tests and publish the results.

Guiding Principle #2: Be Predictable

The second guiding principle we followed in our ODF implementation was “Be Predictable.”  I’ve described this concept in the past as “doing what an informed user would likely expect,” but I’d like to explain this concept in a little more detail here, because it’s a very important aspect of our approach to interoperability in general.

Being predictable is also known as the principle of least astonishment.  The basic concept is that users don’t want to be surprised by inconsistencies and quirks in the software they use, and software designers should strive to minimize or eliminate any such surprises.

There are many ways that this concept comes into play when implementing a document format such as ODF or Open XML.  One general category is mapping one set of options to a different set of options, and I used an example of this in the blog post mentioned above:

When OOXML is a superset of ODF, we usually map the OOXML-only constructs to a default ODF value. For example, ODF does not support OOXML’s doubleWave border style, so when we save as ODF we map that style to the default border style.

Our other option in this case would have been to turn the text box and the border into a picture.  That would have made the border look nearly identical when the user opened the file again, but we felt that users would have been astonished (in a bad way) when they discovered that they could no longer edit the text after saving and reopening the file.
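The doubleWave example follows a simple pattern: map each Open XML border style to an ODF equivalent where one exists, and fall back to a default otherwise. Here is a minimal Python sketch of that pattern. The table is a small illustrative subset, not our actual mapping; it assumes ODF expresses border styles with XSL-FO keywords in `fo:border` (such as solid, double, dotted, dashed), and treating "solid" as the default is an illustrative assumption.

```python
# Illustrative subset of an OXML (ST_Border) -> ODF (fo:border style keyword) table.
OXML_TO_ODF_BORDER = {
    "single": "solid",
    "double": "double",
    "dotted": "dotted",
    "dashed": "dashed",
}
DEFAULT_ODF_BORDER = "solid"  # assumed default; OXML-only styles fall back to it


def odf_border(oxml_style: str, width_pt: float, color: str) -> str:
    """Build an fo:border value, mapping OXML-only styles such as doubleWave to the default."""
    style = OXML_TO_ODF_BORDER.get(oxml_style, DEFAULT_ODF_BORDER)
    return f"{width_pt}pt {style} #{color}"
```

A fallback table like this keeps the text box editable at the cost of the exact border style, which is the trade-off described above.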

What about Bugs and Deviations?


Of course, the existence of a published standard doesn’t prevent interoperability bugs from occurring.  These bugs may include deviations from the requirements of the standard.  In addition, they may include different interpretations of ambiguous sections of the standard.

The first step in addressing these sorts of real-world issues is transparency.  It’s hard to work around bugs and deviations if you’re not sure what they are, or if you have to resort to guesswork and reverse engineering to locate them.

Our approach to the transparency issue has been to document the details of our implementation through published implementer notes.  We’ve done that for our implementations of ODF 1.1 and ECMA-376, and going forward we’ll be doing the same for IS29500 and future versions of ODF when we support them.

Interoperability Testing

The final piece of the puzzle is hands-on testing, to identify areas where implementations need to be adjusted to enable reliable interoperability.

This is where the de facto standard approach meets the public standard.  If the written standard is unclear or allows for multiple approaches to something, but all of the leading implementations have already chosen one particular approach, then it is easy for a new entrant to the field to see how to be interoperable.  If other implementers have already chosen diverging approaches, however, then it is not so clear what to do.  Standards maintainers can help a great deal in this situation by clarifying and improving the written standard, and new implementers may want to wait on implementing that particular feature of the standard until the common approach settles out.

We did a great deal of interoperability testing for our ODF implementation before we released it, both internally and through community events such as the DII workshops.  We’ve also worked one-on-one with other implementers, and going forward we’ll be participating in a variety of interoperability events.  These are necessary steps in achieving the level of interoperability and predictability that customers expect these days.

In my next post, I’ll cover our testing strategy and methodology in more detail.  What else would you like to know about how Office approaches document format interoperability?

Posted by dmahugh | 20 Comments
Tracked Changes
13 May 09 09:52 PM

When I blogged about the release of SP2 with ODF support two weeks ago, I mentioned that I was planning to blog about a few of the tough decisions we faced in our SP2 implementation of ODF, such as the decision not to support tracked changes.  I’ve spent some time since then covering our approach to formulas in ODF, and now I’d like to move on to answering the question of why we aren’t supporting ODF tracked changes.

For those who just want the summary, here’s a high-level recap of what I’ll cover in more detail below:

  • Tracked changes is a very complex aspect of document format functionality; for example, the ECMA-376 specification devotes over 100 pages to describing tracked changes
  • Microsoft Word has a long history of supporting tracked changes, and this functionality is used by a large number of Word users
  • Due to its role in collaborative processes, tracked changes is often used for documents with legal, financial or technical implications that are reviewed and edited by multiple people; in such scenarios, accuracy and reliability are critical
  • ODF 1.1 has a very limited description of tracked changes, covered in only 4 pages of the specification.  ODF 1.1 does not explain how to implement change tracking for many of Word’s commonly used features, and in some cases it is not even clear whether the ODF mechanism makes it possible at all.
  • As a result of these differences, we found that it is not possible to implement robust and reliable tracked changes with ODF; even very simple concepts, such as deleting a row from a table, are not supported by any existing ODF implementation of tracked changes
  • There is almost no interoperability among the various non-Microsoft implementations of ODF when it comes to tracked changes.
  • To protect our customers from losing data when using tracked changes, and to avoid making an interoperability promise that would turn out to be hollow, we made the difficult decision to not support tracked changes at all in ODF

The rest of this post will cover the details of the points summarized above.  This is a long post, and it gets a little technical in places, because change tracking is inherently a complex topic.

State of Tracked Changes Interoperability

SP2 is a new implementation of ODF, but there are many existing implementations of ODF that are already in wide use.  I’ve done an informal review of them to try to understand existing practices around the use of tracked changes in ODF documents.

Here’s what I’ve found:

If anyone knows of additional information on these implementations, or any other ODF implementation that supports tracked changes, especially if you know of one which is not derived from the OpenOffice.org source code, please let me know and I’ll update that list.

To test interoperability between current ODF implementations of tracked changes, I created a simple document with some tracked changes, saved it in ODF, and then looked at what happened when I opened that document in other ODF implementations.

So the first step is to create a test document.  Using Symphony 1.2, I followed these steps:

  • Click on “Create a new Document”
  • Insert a table (Create/Table), and put some text in each cell to identify the rows
  • Add a paragraph of text, below the table, containing two sentences
  • Add a numbered list of four items, below the paragraph

The starting point for my document looks like this:

image

Then I added some change-tracking, as follows:

  • Turn on change tracking (Edit/Revisions/Record)
  • Delete the second row from the table (right-click, Row/Delete)
  • Highlight the last sentence of the paragraph and the first two items of the numbered list, up through the (DELETE) on the second item, and delete that region

My document now looks like this in Symphony:

image

One thing you’ll notice here is that the row I deleted from the table is simply gone, with no change tracking recorded.  This is due to an inherent limitation in ODF’s approach to change tracking, which does not allow table changes to be tracked in a standardized manner.

More on that later, but first let’s see what happens when I save this document as ODF 1.1.  After I click Save, here’s what I see:

image

Take a close look at the numbering of the list items, and you’ll see that the second list item has no numbering any longer.  Very strange.  And if I reject all changes in the document, the numbering of that item doesn’t come back – it disappeared somehow, the instant I saved my document as ODF 1.1.

I suppose some people might be tempted to suggest that I should use the latest OpenOffice.org release for this test, which came out a couple of weeks ago.  I tried that, and I get similar – but not identical – strange behavior by following the steps above.

Speaking of OpenOffice.org 3.1, let’s open this saved document in that implementation of ODF.  When I do, here’s what I see:

image

At first glance, it looks like all of the changes were accepted.  But in fact, the changes are still in the document, and you must go into Edit/Changes/Show to make the tracked changes appear.

In Google Docs, we see essentially the same thing that OpenOffice.org displayed by default:

image

Google Docs automatically accepts tracked changes in ODF documents, and then uses its own entirely different approach for managing change tracking.  Google Docs uses a Revision History feature to track changes to documents; for example, here’s what I see when I click on Tools, Revision History when viewing this document in Google Docs:

image

It appears that Google Docs is pretty committed to this approach to change tracking, based on this recent exchange on the Google Docs Help Center site:

Jcuesta: We need Track Changes.  When?

Gill (Google Docs Guru): Who knows?  Given that we already have Revisions, quite possibly never.

Moving on to another ODF 1.1 implementation, AbiWord 2.6.8 (which does not support tracked changes), here’s how my test document appears:

image

AbiWord doesn’t support tracked changes, so I would have expected to either see the document with no changes at all, or with all changes accepted.  Instead, I see what appears to be a random re-arrangement of the document content.  On closer inspection, I think this is due to ODF’s approach to handling deletions, which requires that deleted content be stored at a location separate from where it was deleted.  I’ll explain that in more detail below.

So far, we have two applications that seem to agree on how to display this document (OpenOffice.org 3.1 and Google Docs), and two others that each have a different way of displaying the document.  Sounds messy, but it gets even worse if you start varying which application creates the document in the first place.

For example, I followed the same steps outlined above, but started from OpenOffice.org 3.1 instead of Symphony 1.2.  Here’s the result:

image

But if I load this OO.o-created document in Google Docs, I see something quite different from what I saw when I loaded the Symphony-created document in Google Docs.  Instead of all tracked changes being accepted, and the deleted text gone, now I see all tracked changes being ignored, and the deleted text (except for the deleted table row) is present, although the list numbering skips over the second item:

image 

So we’ve seen that none of these implementations track changes to tables, and the behavior when loading tracked-changes documents into applications other than OpenOffice.org or Symphony varies among several possibilities, including accepting changes, ignoring changes, and restoring deleted content to a different position in the document.  Furthermore, this is only a simple test that includes nothing but deletions.  If you start combining deletions and insertions in the ways that people typically do while collaborating on documents, you’ll find even more surprising behavior when those documents are opened in applications other than the one that created them.  This is the state of ODF tracked-changes interoperability today.

The Cause of the Problem

The problems above are not just caused by bugs in these implementations.  Rather, they are the result of inadequate specification of change-tracking functionality in ODF 1.1, combined with a peculiar design decision in ODF’s approach to tracking deletions.

To get a feel for how thoroughly ODF specifies change tracking, it’s instructive to compare the size of the relevant sections of the ODF 1.1 and ECMA-376 specifications.  ECMA-376, which supports 100% of the change-tracking functionality that Word uses, devotes 121 pages to change tracking in Part 4, Section 2.13.5.  ODF 1.1, by comparison, devotes only 4 pages to change tracking, in section 4.6.

There are many areas where we found that ODF 1.1’s approach to tracked changes couldn’t provide the functionality and reliability that our customers have come to expect.

Where to put deleted content?

When you delete content with tracked changes on, the content remains in the document, marked as deleted by a particular user on a particular date/time.  But where in the document?  The answer is different for Open XML and ODF.

Let’s look at a simple example, and see how the two formats handle the deleted text.  Here’s the example we’ll use, a single sentence with a word deleted from it:

image

First let’s look at how Open XML handles this deletion.  Here’s the ECMA-376 markup that Word 2007 writes out for this sentence:

image

You can see that the deleted text is inline, right where it was before it was deleted, surrounded by a delText tag.

Now let’s look at the ODF markup that OpenOffice.org 3.1 writes for this deletion:

image

In this case, the deleted word does not appear inline.  Rather, there is a text:change element inline, with an ID of ct205721376.  Within the text:tracked-changes element (which occurs earlier in the body of the document), you can see where ID ct205721376 is defined as being a deletion by Doug Mahugh, containing the word “deletion” inside a text:p element.

There are two problems with this approach: one problem for implementations that don’t support tracked changes, and one problem for implementations that do support tracked changes.

To see the problem for implementations that don’t support tracked changes, refer above to the AbiWord screen shot.  AbiWord doesn’t know about tracked changes, but it does know about paragraphs (text:p elements), so it displays every paragraph it finds in the document, in the order that it finds them.  Since the deleted “paragraphs” appear first in the markup, they appear first in the displayed document.

I put paragraphs in quotes there for a reason: in the simple example we’re looking at here, I did not delete a paragraph, I deleted a word from inside a paragraph.  So why is the deleted text wrapped inside a paragraph element?

The answer is that the ODF spec requires deleted content (as contained in a text:deletion element) to be schema-compliant, regardless of whether the deleted region was a well-formed element or (as in this case) merely a fragment within some other structure, such as a word within a paragraph.

This is the source of the problem I alluded to above, for implementers who choose to support ODF tracked changes.  Each implementer must decide how to synthesize markup to make each piece of deleted content into well-formed XML, and then later – when it comes time to accept or reject the change – each implementer must make decisions about how to distinguish between the synthesized packaging and the deleted content itself.
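The AbiWord behavior described above can be reproduced with a minimal Python sketch (standard library only) of a consumer that ignores tracked changes but renders every text:p element in document order. The markup follows the ODF 1.1 structure discussed here (text:tracked-changes, text:changed-region, text:deletion, text:change), but the sentence, change ID, author and date are hypothetical, and the office:text fragment is not a complete content.xml.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical fragment: "The quick brown fox." with "brown " deleted while
# change tracking was on. ID, author and date are illustrative only.
ODT_BODY = """
<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
             xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
  <text:tracked-changes>
    <text:changed-region text:id="ct1">
      <text:deletion>
        <office:change-info>
          <dc:creator>A. Reviewer</dc:creator>
          <dc:date>2009-05-13T12:00:00</dc:date>
        </office:change-info>
        <text:p>brown </text:p>
      </text:deletion>
    </text:changed-region>
  </text:tracked-changes>
  <text:p>The quick <text:change text:change-id="ct1"/>fox.</text:p>
</office:text>
"""


def naive_paragraphs(xml: str) -> list[str]:
    """Mimic a consumer that knows nothing about tracked changes but
    displays every text:p element, in the order it appears in the markup."""
    root = ET.fromstring(xml.strip())
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


print(naive_paragraphs(ODT_BODY))  # ['brown ', 'The quick fox.']
```

The deleted word surfaces as its own paragraph ahead of the sentence it came from, because the synthesized text:p wrapper is the first paragraph in the markup.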

Unfortunately, the ODF specification doesn’t provide much guidance on this complex topic.  Here’s the guidance provided in ODF 1.1 (Section 4.6.4 Deletion):

To reconstruct the text before the deletion took place, do:

  • If the change mark is inside a paragraph, insert the text content of the <text:deletion> element as if the beginning <text:p> and final </text:p> tags were missing.
  • If the change mark is inside a header, proceed as above, except adapt the end tags to match their new counterparts.
  • Otherwise, simply copy the text content of the <text:deletion> element in place of the change mark.

This guidance works for very simple cases, but does not allow for complex situations such as deleting part of a table, as described below.  A specific implementer may come up with an approach that works within their application, but since the spec doesn’t say how to synthesize the markup for the shim, what shows up as a deletion in one application might show up as a different deletion, or not deleted at all, in a different application.
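A minimal sketch of the first quoted rule, for the simple case that the guidance does cover: a single-paragraph deletion whose change mark sits inside a paragraph. Assumptions: a hypothetical <doc> wrapper instead of a full ODF document, office:change-info omitted for brevity, text-only content, and no handling of headers, multi-paragraph deletions or tables, which is where the guidance runs out.

```python
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical, simplified markup (<doc> is not an ODF element).
DOC = f"""<doc xmlns:text="{TEXT}">
  <text:tracked-changes>
    <text:changed-region text:id="ct1">
      <text:deletion><text:p>brown </text:p></text:deletion>
    </text:changed-region>
  </text:tracked-changes>
  <text:p>The quick <text:change text:change-id="ct1"/>fox.</text:p>
</doc>"""


def restore_paragraph(doc: str) -> str:
    """Rebuild the first body paragraph as it was before the deletion."""
    root = ET.fromstring(doc)
    # Map each change ID to the content of its synthetic <text:p> wrapper,
    # i.e. the deletion "as if the beginning <text:p> and final </text:p>
    # tags were missing".
    deleted = {}
    for region in root.iter(f"{{{TEXT}}}changed-region"):
        wrapper = region.find(f"{{{TEXT}}}deletion/{{{TEXT}}}p")
        if wrapper is not None:
            deleted[region.get(f"{{{TEXT}}}id")] = "".join(wrapper.itertext())

    body_p = root.find(f"{{{TEXT}}}p")  # direct child only, not the wrapper
    parts = [body_p.text or ""]
    for child in body_p:
        if child.tag == f"{{{TEXT}}}change":
            # Splice the deleted text in at the change mark.
            parts.append(deleted.get(child.get(f"{{{TEXT}}}change-id"), ""))
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)


print(restore_paragraph(DOC))  # The quick brown fox.
```

Every step past this simple case (which wrapper elements were synthesized, how to split or merge the host paragraph) is a decision each implementer must make independently.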

The approach used by ECMA-376, as shown in the example above, keeps the deleted text inline where it was deleted, thus eliminating all of these issues.  There is no extra synthesized markup added when a deletion is saved, and therefore implementers don’t need to make decisions about how or whether to remove that markup when it comes time to accept or reject the changes.
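For comparison, here is a minimal sketch of accepting or rejecting an inline Open XML deletion. The WordprocessingML namespace and the w:del, w:r, w:t and w:delText elements are from ECMA-376; the sentence, revision ID, author and date are hypothetical. Because the deleted run stays in place, neither operation has to relocate content or strip a synthesized wrapper.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical paragraph: "The quick brown fox." with "brown " deleted inline.
PARA = f"""<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">The quick </w:t></w:r>
  <w:del w:id="1" w:author="A. Reviewer" w:date="2009-05-13T12:00:00Z">
    <w:r><w:delText xml:space="preserve">brown </w:delText></w:r>
  </w:del>
  <w:r><w:t>fox.</w:t></w:r>
</w:p>"""


def paragraph_text(xml: str, reject_deletions: bool) -> str:
    """Return the text with tracked deletions accepted (dropped) or
    rejected (kept in place), reading the paragraph's children in order."""
    p = ET.fromstring(xml)
    parts = []
    for child in p:
        if child.tag == f"{{{W}}}r":
            parts.extend(t.text or "" for t in child.iter(f"{{{W}}}t"))
        elif child.tag == f"{{{W}}}del" and reject_deletions:
            parts.extend(t.text or "" for t in child.iter(f"{{{W}}}delText"))
    return "".join(parts)


print(paragraph_text(PARA, reject_deletions=False))  # The quick fox.
print(paragraph_text(PARA, reject_deletions=True))   # The quick brown fox.
```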

Changes to Tables

The ODF 1.1 specification says (in section 8.11) that “Change tracking of tables is not supported for text documents.”

And indeed, no existing ODF implementation that I’m aware of attempts to track changes to tables, such as adding or deleting rows or cells, modifying table properties or grid layout, and so on.  Looking at Section 4.6, it’s easy to see why this is so: there is no information provided about how to track table changes, and it’s not at all obvious how one would do so within the current mechanism.

Deleted sections of tables would be especially problematic in ODF, because of the need to create a shim to make the relocated deleted content schema-valid.  The ODF spec provides some guidance on how to revert deleted paragraph content (as quoted above), but for tables, there is no such guidance.

So if a row of a table is deleted, what should an implementer do?  Store in <text:tracked-changes> a table with one row inside the deleted-content section?  And how would another implementation know whether that indicates a deleted row of a table, or a deleted one-row table?

In the ECMA-376 specification, on the other hand, there are defined mechanisms for tracking changes to tables.  As one example, consider the simple act of deleting a row from a table while change tracking is turned on.  In ODF, that row is simply gone, and reverting your tracked changes later will not recover the row.  But in Open XML, the <del> element can be applied to a table row, and as stated in Section 2.13.15.4, “This element specifies that the parent table row shall be treated as a deleted row whose deletion has been tracked as a revision. This setting shall not imply any revision state about the table cells in this row or their contents (which must be revision marked independently), and shall only affect the table row itself.”
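A minimal sketch of how a consumer might honor a tracked row deletion, assuming a simplified WordprocessingML table with hypothetical content (table and cell properties omitted). The row-level w:del sits in the row’s w:trPr; as the quoted section says, it implies nothing about the cell contents, so the row’s text is revision-marked separately here with w:del and w:delText.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical three-row table whose second row was deleted with tracking on.
TABLE = f"""<w:tbl xmlns:w="{W}">
  <w:tr><w:tc><w:p><w:r><w:t>Row 1</w:t></w:r></w:p></w:tc></w:tr>
  <w:tr>
    <w:trPr><w:del w:id="2" w:author="A. Reviewer" w:date="2009-05-13T12:00:00Z"/></w:trPr>
    <w:tc><w:p>
      <w:del w:id="3" w:author="A. Reviewer" w:date="2009-05-13T12:00:00Z">
        <w:r><w:delText>Row 2</w:delText></w:r>
      </w:del>
    </w:p></w:tc>
  </w:tr>
  <w:tr><w:tc><w:p><w:r><w:t>Row 3</w:t></w:r></w:p></w:tc></w:tr>
</w:tbl>"""


def visible_rows(xml: str, reject_deletions: bool) -> list[str]:
    """Return the text of each row left after accepting or rejecting."""
    tbl = ET.fromstring(xml)
    text_tags = {f"{{{W}}}t"}
    if reject_deletions:
        text_tags.add(f"{{{W}}}delText")  # restore deleted cell text too
    rows = []
    for tr in tbl.findall(f"{{{W}}}tr"):
        row_deleted = tr.find(f"{{{W}}}trPr/{{{W}}}del") is not None
        if row_deleted and not reject_deletions:
            continue  # accepting the tracked deletion removes the whole row
        rows.append("".join(e.text or "" for e in tr.iter() if e.tag in text_tags))
    return rows


print(visible_rows(TABLE, reject_deletions=False))  # ['Row 1', 'Row 3']
print(visible_rows(TABLE, reject_deletions=True))   # ['Row 1', 'Row 2', 'Row 3']
```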

Format Changes

Tracking changes also entails tracking changes to document formatting properties.

ECMA-376 has many elements dedicated to tracking formatting changes, including pPrChange, rPrChange, sectPrChange, tblPrChange, tblPrExChange, tcPrChange, and trPrChange.  These elements are described over 17 pages (pages 1015-1032 of Part 4).

ODF 1.1, on the other hand, has a single format-change element, which is documented as follows in Section 4.6.5, Format Change:

A format change element represents any change in formatting attributes. The region where the change took place is marked by a change start and a change end element.

Note: A format change element does not contain the actual changes that took place.

Much was made during the IS29500 standards process of the difference in the size of the ODF and Open XML specifications.  This is a good example of where that difference comes from: in this case, a concept glossed over in three vague sentences of the ODF spec gets 17 pages of documentation in the Open XML spec.
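To show what those extra pages provide, here is a minimal sketch that reads a tracked formatting change on a single run. In ECMA-376, the w:rPr child of w:rPrChange records the run properties as they were before the change, so rejecting the revision only requires restoring them. The run, revision ID, author and date are hypothetical, and only the w:b (bold) property is inspected. ODF 1.1’s format-change element, by contrast, carries no such previous values.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical run that was made bold with change tracking on; the inner
# w:rPr (empty here) holds the properties before the change.
RUN = f"""<w:r xmlns:w="{W}">
  <w:rPr>
    <w:b/>
    <w:rPrChange w:id="3" w:author="A. Reviewer" w:date="2009-05-13T12:00:00Z">
      <w:rPr/>
    </w:rPrChange>
  </w:rPr>
  <w:t>important</w:t>
</w:r>"""

OFF_VALUES = {"0", "false", "off"}


def is_bold(rpr) -> bool:
    """True if this w:rPr turns bold on (w:b with no w:val means on)."""
    if rpr is None:
        return False
    b = rpr.find(f"{{{W}}}b")  # direct children only, so rPrChange is skipped
    return b is not None and b.get(f"{{{W}}}val", "true") not in OFF_VALUES


def bold_before_and_after(xml: str) -> tuple[bool, bool]:
    run = ET.fromstring(xml)
    current = run.find(f"{{{W}}}rPr")
    previous = None
    if current is not None:
        previous = current.find(f"{{{W}}}rPrChange/{{{W}}}rPr")
    return is_bold(previous), is_bold(current)


print(bold_before_and_after(RUN))  # (False, True)
```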

Summary

This has been a long blog post, but I wanted to make sure that people understand why we made the difficult decision to not support tracked changes in our Office 2007 SP2 implementation of ODF.

When you load an ODF document containing tracked changes into Word 2007 SP2, all existing changes will be accepted, and you will not be able to save any further tracked changes in the document unless you save as DOCX.  This is an inconvenience, but a necessary one to protect users from unexpected surprises in the various scenarios outlined above.  Keep in mind that you can still use Word’s document compare feature to compare a previous version of an ODT file to a newer version, in order to see what changed.

Finally, there are a few questions that I anticipate some people may ask, so I’d like to address those here …

Couldn’t you have at least supported tracked changes for simple cases, as OpenOffice.org does?

Change tracking that handles “some” or even “most” of the changes a user makes would be extremely risky to use, because the user may be surprised to discover later that certain types of changes were not being tracked.  We’ve learned through clear feedback from our customers that a feature which works “most of the time” can be worse than no feature at all.  Users count on accurate, reliable change tracking for managing updates to their critical business documents.

We really wanted to make change tracking work for our ODF implementation in Office 2007 SP2. I’ve spoken to some of the developers on the Word team, who wrote a lot of code for this and really tried to solve the problems. But ultimately our test team pointed out that the feature was just not “ship quality” and there was no good way to make it better without extending ODF, which our first principle, “Adhere to the ODF 1.1 standard,” told us not to do.

Will change tracking be improved in ODF 1.2?

Unfortunately, it doesn’t look like it.  The current draft of ODF 1.2 contains no additions to Section 4.6 of ODF 1.1 (which is Section 4.5 in ODF 1.2 due to renumbering).  The only change is that the examples have been removed from the section.

Why didn’t Microsoft work to get this fixed in the ODF TC?

We joined the OASIS ODF TC last June, and we started slowly because some people had expressed concerns about Microsoft having too much influence on ODF’s direction.  The first proposal we made was a very simple proposal to add two optional attributes to indicate maximum grid size for spreadsheet applications, which would have addressed a specific real-world interoperability problem we encountered with a major ODF implementation.  Other TC members argued against this proposal, and after several such exchanges we decided not to push the matter.

We then continued submitting proposed solutions to specific interoperability issues, and by the time proposals for ODF 1.2 were cut off in December, we had submitted 15 proposals for consideration.  The TC voted on what to include in version 1.2, and none of the proposals we had submitted made it into ODF 1.2.

We look forward to contributing more to the ODF TC in the future, and we would welcome the opportunity to work with other TC members to improve ODF’s ability to handle tracked changes.

Posted by dmahugh | 33 Comments
1 + 2 = 1?
09 May 09 11:26 PM

Does 1 plus 2 equal 3?  After last week’s sometimes acrimonious discussion about formulas in ODF, you may be glad to hear that IBM and Microsoft appear to agree on the answer to this simple question.  But OpenOffice.org is not so certain – maybe the answer is just 1 sometimes – and the question itself turns out not to be so simple after all.  Let me explain.

The State of ODF Formula Interoperability Today

What is the current reality of ODF formula interoperability?  Understanding the status of the ODF ecosystem will help clarify the set of issues and options that we faced when making the tough decisions we had to make about how to best support formulas in ODF spreadsheets.

For this example, I’ll use the latest released versions of two well-known ODF implementations: IBM Lotus Symphony (version 1.2) and OpenOffice.org (version 3.1).  I want to talk about current reality, so I’m not using any outdated versions of software (the OpenOffice.org build I’m using, for example, was released in the last week).  I also stayed away from unreleased or private beta versions that might become available sometime in the future, and I used the default settings for each application.

First, I fired up Symphony 1.2, and followed these steps:

  • Enter a numeric value of 1 in cell A1.
  • Format cell A2 as text, right-justified, then enter a 2 in that cell.
  • In cell A3, enter the formula =A1+A2.

In Symphony 1.2, here’s what I see:

image

After saving this spreadsheet as an ODS file, I open it in OpenOffice.org 3.1 and see this:

image

Clearly this is a problem.  The exact same data, in the exact same spreadsheet, when operated on with the exact same formula, provides different results.

Some might be tempted to say that formatting a cell as text and then using it in a calculation is dumb.  And I’d agree that there are few people who ever do such a thing intentionally.  But in a large complex spreadsheet, with thousands of cells involved in complex calculations, it’s easy to make mistakes like this.  In fact, if you’ve spent any amount of time at all creating complex spreadsheets, I’ll bet that on more than one occasion you’ve wasted a bunch of time trying to debug a problem that turned out to be caused by such mistakes; I know I sure have.

Similar issues arise with boolean values – what does it mean to “sum” a column of cells that includes both numeric values and boolean values?  Not all spreadsheet implementations agree on the answer to that question, either. This can create interactions between formatting and calculating – change the format of some cells, and the totals change in your spreadsheet.  Most users find such behavior very confusing, to say the least.

One of the most interesting things I found in my testing of these two implementations was that although they write different markup for formulas, the exact same interoperability problem occurs regardless of which application is used to create the spreadsheet.

If you create the spreadsheet in Symphony 1.2, as I did, the table:table-cell element has a table:formula attribute with a value of "=[.A1]+[.A2]".  And this formula will yield a result of 3 in Symphony and 1 in OpenOffice.org, as described above.

If instead you create the same spreadsheet in OpenOffice.org 3.1, when you open it in Symphony 1.2 you'll see of:=A1+A2 in cell A3.  But after you manually correct the formula, this spreadsheet, too, will yield a result of 3 in Symphony and 1 in OpenOffice.org.

So these two ODF implementations do not have predictable formula interoperability, regardless of where you start.  And these are not obscure implementations – they are the latest released versions of the implementations from IBM and Sun, the two companies that together chair the ODF TC.  Even if both companies released fixes tomorrow, there will still be many copies of the current non-interoperable versions of these applications in use for a long time to come.  This is the state of formula interoperability among ODF spreadsheets today.

Fixing the Problem

This difference in behavior is a well-known issue among those who work with spreadsheet formulas.  As Rob Weir said three years ago “Automatic string conversions considered dangerous. They are the GOTO statements of spreadsheets.”  (One of the ODF TC members even has that line in his email auto-signature.)

How to manage string conversions is far from the only problem with spreadsheet interoperability across vendors (and even across versions of the same product in some cases). The current draft OpenFormula specification contains 254 notes (by my count) about other issues similar to this one.

The OpenFormula sub-committee of the ODF TC has worked hard to address this.  Here is an excerpt from the draft OpenFormula specification (emphasis added):

6.2.4 Conversion to Number

If the expected type is Number, then if value is of type:

  • Number, return it.
  • Logical, return 0 if FALSE, 1 if TRUE.
  • Text: The specific conversion is implementation-defined; an application may return 0, an error value, or the results of its attempt to convert the text value to a number (and fall back to 0 or error if it fails to do so). Applications may apply VALUE() or some other function to do this conversion, should they choose to do so. Conversion depends on the actual locale the application runs in, especially if group or decimal separators are involved. Note that portable spreadsheet files cannot depend on any particular conversion, and shall avoid implicit conversions from text to number.
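The implementation-defined Text case is exactly what produces the 1 + 2 = 1 result above. This minimal Python sketch models three of the permitted behaviors as a hypothetical policy parameter; it does not claim to reproduce the internal logic of any product, only outcomes that match the results described earlier. float() stands in for a real, locale-aware conversion, and FormulaError is a hypothetical stand-in for an error value such as #VALUE!.

```python
class FormulaError(Exception):
    """Hypothetical stand-in for a spreadsheet error value such as #VALUE!."""


def to_number(value, policy: str) -> float:
    """Conversion to Number, following the cases listed in 6.2.4."""
    if isinstance(value, bool):          # Logical: FALSE -> 0, TRUE -> 1
        return 1.0 if value else 0.0     # (checked first: bool is an int)
    if isinstance(value, (int, float)):  # Number: return it
        return float(value)
    # Text: implementation-defined; the policy names are hypothetical.
    if policy == "zero":
        return 0.0
    if policy == "error":
        raise FormulaError("#VALUE!")
    if policy == "convert":
        try:
            return float(value)
        except ValueError:
            return 0.0  # one permitted fallback; an error is also allowed
    raise ValueError(f"unknown policy: {policy}")


def add(a, b, policy: str) -> float:
    return to_number(a, policy) + to_number(b, policy)


# Cell A1 holds the number 1; cell A2 holds the *text* "2".
print(add(1, "2", policy="convert"))  # 3.0, matching the Symphony 1.2 result
print(add(1, "2", policy="zero"))     # 1.0, matching the OpenOffice.org 3.1 result
```

Both outcomes are allowed by the draft, which is why a portable spreadsheet file has to avoid the implicit conversion altogether.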

After OpenFormula is approved and published, this approach, with its explicitly defined concept of “portable spreadsheet files,” will allow more predictable and consistent interoperability for ODF spreadsheet users.

But in the current environment, with no standardization of formula markup across major ODF implementations, users who want to avoid interoperability problems need to stick to a very conservative strategy.  As Burton Group analyst Guy Creese said last week:

“… this in-between time (between the OpenOffice.org de facto standard and the wait for the officially approved 1.2 standard) means there isn't one way to handle this problem. The vendors would like you to believe that there is (their way), but in reality there isn't. Ultimately, this will resolve itself over time. ODF 1.2 will be approved, and there will finally be an approved standard that everyone--IBM, Microsoft, Sun (Sun/Oracle)--can follow.

Until then, if an enterprise does want to use ODF, the best strategy is to stick with one productivity suite as a way to avoid these interoperability problems. That way, even if formula support is idiosyncratic, it at least will be consistent within the enterprise.”

How Excel 2007 SP2 Handles ODF Formulas

The question of how to handle formulas in SP2’s ODF implementation was one of the tough decisions we faced in our ODF implementation.  We had made conformance to the ODF 1.1 specification a top priority, and yet the spec doesn’t specify a formula language. 

It seemed clear to us that we couldn’t simply omit the namespace, as the current version of Symphony does.  That would be in violation of Section 8.1.3 of the ODF specification, where it says “Every formula should begin with a namespace prefix specifying the syntax and semantics used within the formula.”

What about using the same of: namespace that OpenOffice.org 3.1 uses?  We saw a couple of pretty serious problems with that approach as well:

  • It would not be interoperable with some existing implementations, such as the widely used current version of IBM Lotus Symphony.
  • It is based on a draft specification that has not been finalized or approved as a standard, and therefore could still change.

What about using the oooc: namespace that OpenOffice.org 3.1 writes when you choose its ODF 1.1 compatibility mode? That syntax is on its way out for everyone, and we saw no point creating yet another new implementation of something that is clearly going to be deprecated soon.  And it doesn’t solve the problem: OpenOffice.org 3.1 writes the oooc: namespace prefix in its ODF 1.0/1.1 compatibility mode, and those spreadsheets still can yield different results in OpenOffice.org and Symphony.

After a robust internal debate on the topic, it became clear what we needed to do to apply the first two of our five prioritized guiding principles for Office’s ODF implementation:

  • Adhere to the ODF 1.1 standard
  • Be Predictable
  • Preserve User Intent
  • Preserve Editability
  • Preserve Visual Fidelity

As we discussed in several DII workshops starting back in July of 2008 (with multiple ODF implementers and multiple ODF TC members in attendance), these guiding principles are in priority order. When we could not achieve them all, we chose the top ones first.

To adhere to the ODF 1.1 standard, we begin formulas with “a namespace prefix specifying the syntax and semantics used within the formula.”  Excel 2007 SP2 uses an msoxl prefix and writes the formula attribute like this:

table:formula="msoxl:=A1+A2"

That fulfills our goal of adhering to the standard since ISO/IEC 29500 defines both the syntax and semantics of this namespace.  Then, to provide a predictable user experience across all spreadsheets, we elected to support this namespace, and only this namespace.

If I move my spreadsheet from one application to another, and then discover I can’t recalculate it any longer, that is certainly disappointing.  But the behavior is predictable: nothing recalculates, and no erroneous results are created.

But what if I move my spreadsheet and everything looks fine at first, and I can recalculate my totals, but only much later do I discover that the results are completely different than the results I got in the first application?

That will most definitely not be a predictable experience.  And in actual fact, that sort of variation in spreadsheet behavior can have serious consequences for some users.  Our customers expect and require accurate, predictable results, and so do we. That’s why we put so much time, money and effort into working through these difficult issues.

What Does Excel 2007 SP2 Do With the Example Above?

The answer is that we agree with IBM: 1 + 2 = 3.

Excel does the same thing Symphony 1.2 does, converting the text “2” to a numeric 2 and using that value in the calculation, so that the total is 3.  Excel does this because this type of automatic conversion – which has been a popular Excel feature for a very long time – is allowed by the semantics of the formula markup language Excel uses.

The formula markup that Excel uses is based on the formula language defined in ECMA-376 and ISO/IEC 29500, and here’s what it says about type conversion in Section 18.17.2.6 (Types and Values) of Part 1 of IS29500:

An implementation is permitted to provide an implicit conversion from string-constant to number. However, the rules by which such conversions take place are implementation-defined. [Example: An implementation might choose to accept "123"+10 by converting the string "123" to the number 123. Such conversions might be locale-specific in that a string-constant such as "10,56" might be converted to 10.56 in some locales, but not in others, depending on the radix point character. end example]
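The locale dependence in that example can be sketched as follows. The function name and its radix and group parameters are hypothetical; a real implementation would take the separators from the active locale rather than from arguments.

```python
def text_to_number(text: str, radix: str = ".", group: str = ",") -> float:
    """Hypothetical locale-aware conversion: drop group separators, then
    treat `radix` as the decimal point. Raises ValueError if not numeric."""
    normalized = text.strip().replace(group, "").replace(radix, ".")
    return float(normalized)


print(text_to_number("123") + 10)                     # 133.0
print(text_to_number("10,56", radix=",", group="."))  # 10.56
print(text_to_number("10,56"))                        # 1056.0
```

The same string yields 10.56 or 1056 depending on which characters the locale treats as radix and group separators, which is why the standard leaves these rules implementation-defined.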

Excel’s approach to formulas in ODF, as well as our approach to other difficult issues, is completely public and fully documented in the implementer notes for SP2.  As the note for this issue explains:

The standard defines the attribute table:formula, contained within the element <table:table-cell>, contained within the parent element <office:spreadsheet table:table-row>

This attribute is supported in core Excel 2007.

1. When saving the table:formula attribute, Excel precedes its formula syntax with the "msoxl" namespace.

2. When loading the attribute table:formula, Excel first looks at the namespace. If the namespace is "msoxl", Excel will load the value of table:formula as a formula in Excel.

3. When loading the table:formula attribute, if the namespace is missing or unknown, the table:formula attribute is not loaded, and the value "office:value" is used instead. If the result of the formula is an error, the element <text:p> will be loaded and mapped to an Error data type in Excel. Error types not supported by Excel are mapped to #VALUE!

The Question of Syntax

I’d like to also address the issue of cell reference syntax in the ODF 1.1 specification, since that was also a topic of much discussion on several blogs last week.  I’ll start with some quick background for those who don’t wallow in standards documents for a living.

The English language is inherently an ambiguous thing,  and great literature sometimes uses the ambiguity to good effect.  Words can have more than one meaning, and verb phrases might be intended to go with one noun or with another, as in famously ambiguous job references like “You will be very fortunate to get this person to work for you."

Writers of technical standards like to use rules and procedures that are designed to avoid this sort of problem.  These rules, which place requirements on the use of words like should, shall, must and may, tend to result in a stilted writing style which gets tedious fast, but reduces the need to agree on what is “obvious” or “implied” when interpreting the meaning of the text later.

A standards document is said to contain both normative language and informative language.  The things you must do to comply with a standard are supposed to be in the normative part, and things like examples and introductions are informative.  So that everyone can be sure about which parts are which, the normative parts use specific phrases like “shall” and “shall not” to clearly label the things the standard actually requires you to do.

So the debate about Excel 2007 SP2’s cell reference syntax comes down to whether the few sentences in the ODF 1.1 spec which cover this were meant to be informative or normative.  The section of ODF 1.1 in question does not use the words shall or must.  It introduces the topic with the phrases “typically” and “can include”.  In our reading of it, this language makes that part of the specification informative, stating no requirements for implementers.
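As a rough illustration of the drafting convention described above, the following Python sketch labels a sentence normative when it contains "shall" or "must", and informative when it uses descriptive phrases like "typically" or "can include". It's a toy: the keyword lists and labels are assumptions, the sample sentences are invented (not quoted from ODF 1.1), and real interpretation depends on the full drafting rules, not a word search.

```python
import re

# Hypothetical keyword lists: "shall"/"must" signal requirements,
# while "typically"/"can include" signal descriptive (informative) text.
REQUIREMENT_WORDS = re.compile(r"\b(shall|must)\b", re.IGNORECASE)
DESCRIPTIVE_WORDS = re.compile(r"\b(typically|can include)\b", re.IGNORECASE)


def classify(sentence: str) -> str:
    """Give a rough first-pass label for one sentence of a specification."""
    if REQUIREMENT_WORDS.search(sentence):
        return "normative"
    if DESCRIPTIVE_WORDS.search(sentence):
        return "informative"
    return "unclear"


if __name__ == "__main__":
    samples = [
        "A cell reference shall include a column and a row.",  # invented example
        "A cell reference typically includes a column and a row.",  # invented example
        "Cell references are described below.",  # invented example
    ]
    for s in samples:
        print(f"{classify(s):12} {s}")
```

The point of the sketch is only that the two sentence styles are mechanically distinguishable; whether a passage imposes requirements is ultimately a judgment about the whole text.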

The ODF 1.1 spec is casual about applying the rules of normative language, and as a result ODF 1.1 has more than its share of ambiguity.  The ODF 1.2 draft, however, is already much improved in this regard, mainly through the great work of ODF editor Patrick Durusau.  The OpenFormula draft specification is extremely careful in its use of normative language, and that will help implementers a great deal when they sit down to write their software.

When Will Office Support OpenFormula?

This question has come up on some blogs, so I’d like to address it here as well.

The real question is “when will Office support ODF 1.2,” since OpenFormula is simply a part of the ODF 1.2 specification.  And the answer is that we don’t know yet, because nobody knows yet when ODF 1.2 will be published as an OASIS or ISO standard.  As I said in the previous post, “we will look closely at Open Formula when it becomes a standard and make a decision then about how to best proceed.”  (It looks like IBM has committed to supporting ODF 1.2 and OpenFormula in late 2010.)

In the meantime, if you want to use Excel 2007 SP2 to edit documents that contain formulas from OpenOffice.org or Symphony, and preserve those formulas through editing sessions, and you understand the risk that the results might not be the same, you have a couple of free options.

The Open XML / ODF Translator Add-Ins for Office can be used with Office 2007 SP2, and as covered on the translator team blog, they support a variety of formula namespaces.

The Sun ODF Plugin provides yet another option, and apparently works with SP2.

Posted by dmahugh
ODF Spreadsheet Interoperability
05 May 09 02:58 PM

Rob Weir posted on his blog a couple of days ago an Update on ODF Spreadsheet Interoperability.  I think it’s great that he has brought up spreadsheet interoperability, and specifically the issue of formulas, which seems to be the main thrust of his post.  I mentioned on the day of our SP2 release last week that “I’ll be doing some blog posts that get down into more of the technical details, to help explain some of the engineering decisions that we made in our implementation,” and Rob’s post is a good starting point for that conversation.

For those who only want to read the first two paragraphs of this very long post, here’s a summary:

  • ODF’s lack of spreadsheet formula syntax creates some interoperability challenges: because ODF 1.0 and 1.1 do not define a formula syntax, formula support in every ODF spreadsheet implementation is application-dependent
  • We’ve worked hard to overcome these challenges in ways that provide accurate results and predictable interoperability
  • We have been fully transparent about the decisions we’ve made in our ODF implementation, both in terms of the guiding principles we’ve followed and also the specific details published in our implementer notes 
  • The Open Formula specification is not yet a standard, so we do not support it in its unfinished state, but we will look closely at Open Formula when it becomes a standard and make a decision then about how to best proceed

Before I get into the details, I think it’s worth noting that there’s nothing new here.  The challenges of ODF’s lack of formula specification have been around for a long time, and many people saw the current situation coming.

Nearly three years ago, Stephen McGibbon had a good post covering the situation, which is worthwhile reading for some background on how we got to where we are today.  As you can see from the quotes in Stephen’s blog post, many people in the ODF community – and the broader standards community – were dismayed at the decision to not include formula syntax in ODF.  Others outside of Microsoft also pointed out the problem years ago.  Rob’s blog post and this one are both excellent examples of the sorts of interoperability challenges that those people saw coming as a result of the decision to not include formula syntax in the ODF standard.

Testing Methodology

The first thing I did upon reading Rob’s post was to try to follow his steps for myself, so that I could understand the context of his findings.  Here’s the methodology he describes in his post:

The test scenario I used was a simple wedding planner for a fictional user, Maya, who is getting married on August 15th. She wants to track how many days are left until her wedding, as well as track a simple ledger of wedding-related expenses. Nothing complicated here. I created this spreadsheet from scratch in each of the editors, by performing the following steps:

  • Enter the title in A1 "Maya's Wedding Planner" and increased font size to 14 point.
  • Enter formula = TODAY() in B3 and set US style MM/DD/YY date format.
  • Enter the date of the wedding as a constant in cell B4, also setting date format.
  • Added simple calculations on cells B6-B8, to calculate days, weeks and months until the wedding.
  • A11 through E16 is a simple ledger of the kind that is done thousands of times a day by spreadsheet users everywhere. Once you have the formula set up in column E (Balance = previous balance + credits - debits) then you can simply copy down the formula to the new row for each new entry
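The planner's calculations are simple arithmetic, which is part of what makes it a useful interoperability test. Here is a minimal Python sketch of the same logic. The "today" date and the ledger amounts are hypothetical values, not the ones in Rob's file, and months are left out because "months until" depends on how an application chooses to count partial months.

```python
from datetime import date
from typing import List, Tuple


def days_until(today: date, wedding: date) -> int:
    """Days remaining, like subtracting the TODAY() cell from the wedding-date cell."""
    return (wedding - today).days


def weeks_until(today: date, wedding: date) -> float:
    """Weeks remaining, as a fractional number of weeks."""
    return days_until(today, wedding) / 7


def running_balances(opening: float, entries: List[Tuple[float, float]]) -> List[float]:
    """Apply Balance = previous balance + credit - debit to each (credit, debit) row,
    the same way the column E formula is copied down one row per entry."""
    balances: List[float] = []
    balance = opening
    for credit, debit in entries:
        balance = balance + credit - debit
        balances.append(balance)
    return balances


if __name__ == "__main__":
    # Hypothetical "today"; the wedding date is August 15 as in the scenario.
    print(days_until(date(2009, 5, 4), date(2009, 8, 15)))  # 103
    print(running_balances(0.0, [(1000.0, 0.0), (0.0, 250.0), (0.0, 100.0)]))  # [1000.0, 750.0, 650.0]
```

Each balance depends only on the row above it, which is why copying the formula down works; it also means one misinterpreted formula early in the column propagates to every later row.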

Sounds simple enough.  So I fired up Open Office 3.0.1, and followed those steps.  The resulting spreadsheet looked like this:

[Screenshot: the wedding planner spreadsheet in OpenOffice 3.0.1]

Then I saved the document, by clicking File/Save and then typing in a filename:

[Screenshot: saving the spreadsheet in OpenOffice]

So far so good.  Next step, I tried opening the document in IBM Lotus Symphony version 1.2.0.  Here’s what I saw:

[Screenshot: the document opened in IBM Lotus Symphony 1.2.0]

And then I opened the same document in Excel 2007 SP2, and here’s what I saw:

[Screenshot: the document opened in Excel 2007 SP2]

This is a great example of a common ODF spreadsheet interoperability challenge, and two different ways of dealing with it.  The challenge is caused by the fact that Open Office writes formulas in a syntax that is unknown to Symphony and Excel.  Open Office, for reasons I don’t understand, has decided to use as their default formula syntax the unfinished Open Formula specification, which is neither approved nor published by OASIS – not even out for public review yet.

So what do Symphony and Excel do about this challenge?  The answer is that Symphony preserves the (unrecognized)  formula markup, and Excel preserves the cached values.  (A quick aside for those who don’t know: spreadsheets typically store both the formula and the value resulting from the most recent recalculation.)
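Because a cell carries both pieces of information, an application that doesn't recognize the formula syntax still has a choice of what to keep. This Python sketch reads both from a single table:table-cell element with the standard library's ElementTree. The namespace URIs are the standard ODF ones; the cell markup itself is a hypothetical example, and its bracketed "[.B4]" reference form is an assumption about how OpenOffice 3 stores the formula in the file.

```python
import xml.etree.ElementTree as ET

NS = {
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Hypothetical cell: a formula in the "of" namespace plus its cached result.
CELL_XML = f"""<table:table-cell xmlns:table="{NS['table']}" xmlns:office="{NS['office']}" xmlns:text="{NS['text']}"
    table:formula="of:=[.B4]-[.B3]" office:value-type="float" office:value="103">
  <text:p>103</text:p>
</table:table-cell>"""


def q(prefix: str, local: str) -> str:
    """Build an ElementTree-style '{namespace-uri}local' name."""
    return f"{{{NS[prefix]}}}{local}"


def read_cell(xml_text: str) -> dict:
    """Return both things the cell stores: the formula and its last computed value."""
    cell = ET.fromstring(xml_text)
    formula = cell.get(q("table", "formula"))  # None if the cell holds a plain value
    namespace, body = None, None
    if formula is not None and ":" in formula:
        namespace, _, body = formula.partition(":")
    value = cell.get(q("office", "value"))
    p = cell.find("text:p", NS)
    return {
        "formula_namespace": namespace,
        "formula_body": body,
        "cached_value": float(value) if value is not None else None,
        "display_text": p.text if p is not None else None,
    }
```

An application that keeps `formula_body` but can't evaluate it shows formula text to the user; one that keeps `cached_value` shows the last recalculated number.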

Getting back to Rob’s initial premise of this being a typical wedding-planning exercise, if Maya were to send this spreadsheet to a person using Excel 2007 SP2, that person would see the values as shown above.  They’d know at a glance what day was ‘today’ when Maya made the file, and that the ledger balance that day was 5500.

But if Maya were to send this spreadsheet to a person running IBM Lotus Symphony, they’d see only the formulas.  Perhaps an ODF markup expert like Rob would be able to massage that spreadsheet into something usable, but most people would find it a bit hard to conceive of what it means for a wedding to be “of:=B4-B3” days away, or for there to be a ledger balance of “of:=E15+C16-D16” dollars.

So what does Rob’s test matrix show for these two scenarios?  Oddly, it labels Open Office to IBM Lotus Symphony interoperability for this scenario as “OK” and it labels Open Office to Microsoft Office SP2 interoperability as “Fail” (with a red background for added emphasis).  Now, I know Rob works for IBM and probably wants to portray Symphony in the best possible light, but is that a reasonable assessment of the interoperability we’ve just seen above?

After further investigation, though, I think I see what Rob may have actually done to get the result in his table.  He seems to have included some steps that aren’t documented in his blog post:

  • In Open Office Calc, he went into Tools, Options, Load/Save, General.
  • For “ODF format version” he changed the setting from “1.2 (recommended)” to “1.0/1.1 (OpenOffice.org 2.x)”
  • The dialog then warned him that “Not using ODF 1.2 may cause information to be lost.”
  • He clicked OK to save the change.

Here’s how it looks, for those who don’t have Open Office handy:

[Screenshot: the ODF format version setting in OpenOffice Calc’s Load/Save options]

After following these steps, Rob was then able to create a spreadsheet that stores formulas in the undocumented non-standardized syntax that was used by old versions of Open Office.  Symphony, being simply a fork of an older version of the Open Office code base, is able to understand those formulas, so it can load both the values and the formulas themselves.

It’s worth noting what the OpenOffice.org developers have to say about this option that Rob has apparently used for his interoperability testing:

[Screenshot: the OpenOffice.org developers’ description of this option]

So it sounds like there isn’t any single version of ODF that will provide compatibility across all versions of Open Office and Symphony.   You can use the “may cause information to be lost” option if you want to do a demonstration of formula interoperability with ODF, but if you want to demonstrate text-bullet interoperability, you may need to use another option.

And what does Excel 2007 SP2 do with the document saved in this alternative format?  Exactly the same thing it does with Open Office’s current default format: it displays the data, so that the document user can see the results of the last recalculation of the spreadsheet, and it ignores the formulas that are written in a non-standardized syntax that Excel doesn’t support.  I think that’s a pretty reasonable approach, when a spreadsheet application comes across non-standardized formula syntax: show the last recalculated result, thus preserving the data, and don’t try to guess at the semantics of undocumented formula markup.

Why Doesn’t Office 2007 SP2 Support Open Office Formula Syntax?

That’s a logical question to ask, in regard to how SP2 handles formulas.  To answer it accurately and completely, we should distinguish between the two formula syntaxes that Open Office uses.

The first is the syntax they use in their non-recommended “1.0/1.1 (OpenOffice.org 2.x)” setting.  This is an undocumented, deprecated syntax, and therefore not a reliable mechanism for formula interoperability.  Despite what you may read on some blogs, it is not the same syntax as used by Excel 97/2000/2003.  Open Office copied quite a bit of the feature set from Excel, and there are definitely similarities in the formula syntax, but there are also differences with regard to referencing, operators, data types, and function arguments.

The other formula syntax that Open Office supports is the Open Formula syntax, which will eventually appear in ODF 1.2.  This syntax has not yet been approved by a standards body, nor has it undergone the 60-day public review period that OASIS requires prior to approval and publication.  It may go to public review soon, but it won’t be a standard until later this year at the soonest, and the details may change as a result of the remaining TC work and the public review process.  (According to recent discussions in the ODF TC, we may send the other parts of ODF 1.2 out for public review first, to allow more time to finish up Open Formula.)

In Office’s implementation, we haven’t chosen to support the draft Open Formula spec (as Open Office currently does), because we have certain obligations when we ship software that don’t apply to open-source projects like Open Office.  We need to test and verify behavior to a degree that’s not possible without final, fixed documentation that is believed to be 100% complete and accurate.  When Open Formula is completed, standardized, and published, we'll be looking at that as the future path for enabling formula interoperability in ODF spreadsheets.  But we’re not there yet; ODF 1.2 is not done, and not even ready for public review.

It’s interesting to note that we have discussed this very issue at a DII workshop.  Last July, we had a workshop in Redmond, with attendees including other ODF implementers, members of the ODF TC, standards professionals, and others.  In the roundtable discussions, I brought up our approach to formula support as outlined above, and asked for feedback.  Although it was not 100% unanimous, there was clear consensus among most of the participants in the discussion that they did not want us to implement a non-standard formula syntax in anticipation of it becoming a standard.  “Putting the cart before the horse” in that manner was seen by many as a possible source of future interoperability problems, rather than a solution to them, and we took that feedback into consideration.

As I’ve covered before, we feel that thorough documentation of implementation details is a cornerstone of document format interoperability.  We’ve published detailed implementer notes for our ODF implementation, and on the matter of formulas (which are stored in the table:formula attribute), here’s what our implementer notes have to say:

The standard defines the attribute table:formula, contained within the element <table:table-cell>, contained within the parent element <office:spreadsheet \ table:table-row>

This attribute is supported in core Excel 2007.

1. When saving the table:formula attribute, Excel 2007 precedes its formula syntax with the "msoxl" namespace.

2. When loading the attribute table:formula, Excel 2007 first looks at the namespace. If the namespace is “msoxl”, Excel 2007 will load the value of table:formula as a formula in Excel.

3. When loading the table:formula attribute, if the namespace is missing or unknown, the table:formula attribute is not loaded, and the value “office:value” is used instead. If the result of the formula is an error, Excel 2007 loads the <text:p> element and maps the element to an Error data type. Error data types that Excel 2007 does not support are mapped to #VALUE!
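The three rules above amount to a small decision procedure. The Python sketch below restates them for a single cell; it is a reading of the implementer notes, not Excel's actual code. The function names, the tuple return values, and the `is_error` flag (the notes don't say how an error result is detected) are hypothetical, and the error list is the set of error values Excel supports.

```python
from typing import Optional, Tuple, Union

# Error values Excel supports; any other error text falls back to #VALUE!.
EXCEL_ERRORS = {"#NULL!", "#DIV/0!", "#VALUE!", "#REF!", "#NAME?", "#NUM!", "#N/A"}

CellResult = Tuple[str, Union[str, float, None]]


def save_table_formula(excel_formula: str) -> str:
    """Rule 1: prefix Excel's own formula syntax with the "msoxl" namespace."""
    return "msoxl:" + excel_formula


def load_table_formula(
    formula: Optional[str],
    office_value: Optional[str],
    text_p: Optional[str],
    is_error: bool = False,
) -> CellResult:
    """Rules 2 and 3: decide what to load from one cell.

    is_error is a hypothetical flag meaning "the cached result is an error".
    """
    namespace = formula.split(":", 1)[0] if formula and ":" in formula else None
    if namespace == "msoxl":
        # Rule 2: Excel's own namespace, so load the text after the prefix as a formula.
        return ("formula", formula[len("msoxl:"):])
    # Rule 3: missing or unknown namespace, so the formula itself is not loaded.
    if is_error:
        # Load <text:p> as an error value; unsupported error types become #VALUE!.
        return ("error", text_p if text_p in EXCEL_ERRORS else "#VALUE!")
    if office_value is not None:
        return ("value", float(office_value))
    return ("value", text_p)
```

For example, a cell with `of:=[.B4]-[.B3]` and a cached value of 103 loads as the plain number 103.0, while a round-tripped `msoxl:=SUM(A1:A3)` loads as a live formula.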

And, as both Rob’s tests and mine show, that is exactly what Excel does.  It would be great if there were a place implementers could go to see these sorts of details for all major ODF implementations.

Summary

I’m glad to see this sort of public scrutiny of the details of ODF interoperability and how the underlying challenges are handled by various implementations.  As you can see, spreadsheet interoperability is a complicated topic, and in the specific case of ODF spreadsheets, there is even more complexity created by the lack of a defined formula syntax in any published version of ODF.

The good news, when it comes to formulas, is that the Open Formula specification will address this area soon.  My colleague Eric Patterson represents the Excel team in the Open Formula SC, and the very capable David Wheeler leads that group.  Much good work has been done already, and we look forward to seeing the final Open Formula spec go out for public review and then approval by OASIS.  The nearly 400 pages of formula syntax documentation in ISO/IEC 29500 (Part 1, section 18.17) enable reliable formula interoperability in the Open XML community, and soon the ODF community will have a similar level of formula interoperability.

But formulas are not the only ODF interoperability challenge.  As members of the ODF TC and also the OIC (ODF Interoperability and Conformance) TC, both Rob and I – and many others – will need to work together to enable better interoperability in areas including tracked changes, mail merge, application settings, and others.  Will ODF 1.2 be the most interoperable version of ODF yet?  I hope so, and there are signs that it will be.  But our work is not nearly done.

Posted by dmahugh
Links for 05/04/2009
04 May 09 04:22 PM

PHPPowerPoint 0.1.0 was released last week, as an open-source PHP API for generating PPTX files, much like the PHPExcel API for XLSX files.  Maarten Balliauw has a blog post with more information, download links, and sample code.

A new post at OpenXMLDeveloper.org covers the Simple OOXML Library, a set of classes that sit on top of the Open XML SDK to help developers create word-processing documents and spreadsheets quickly with minimal programming required.  The abstractions of this library make it possible to be immediately productive with Open XML even if you haven’t studied the details of the spec yet.

Eric White has a guest post by Bob McClellan on how to use extension methods with Open XML SDK classes to manage changes to documents.  The technique Bob outlines – which is used in the Open XML Power Tools – includes cached XDocument objects that are added to parts as annotations, and semaphores to track where changes have been made.  The end result is a great example of how “a very small amount of carefully designed code can create very powerful functionality.”

Jesper Lund Stocholm’s post last week on Extending OOXML presents some good advice for implementers about how to manage extensions that are created via MCE (Markup Compatibility and Extensibility, defined in Part 3 of IS29500), as well as some thoughts on the future evolution of IS29500.  It’s a good example of the types of debates we have in WG4, and not coincidentally a couple of other WG4 members have participated in the comment thread.

Version 3.0 of the ODF translator add-in project was released last week, just four months after Version 2.5.  This release is focused on ODF 1.1 conformance and interoperability with Office 2007 SP2.  The ODF translator team blog has all the details.

Alex Brown’s Notes on Document Conformance and Portability #1 covers how Open Office, Google Docs, and Office 2007 SP2 handle a mixture of left-to-right and right-to-left text.  All three applications handle his example correctly, and as he says “Where we have a solid spec and observable behaviour we can compare the two and make a judgement.”  Noting the #1 in the post title, I’m guessing we’ll see more of these sorts of tests from Alex.

If you’ve ever played around with Ruby, you know that it’s the coolest programming language to come along in years.  (#LanguageTrollBait)  I came across a post that Tomas Varsavsky wrote a month ago on how to generate DOCX files from a template by embedding actual Ruby source code within fields.  Very cool.

Last week’s blog post on SP2’s ODF implementation is already my most-viewed post of 2009, so there is clearly much interest in this topic.  Oliver Bell has a good summary of that announcement and what it means going forward.

On a related note, I see that Rob Weir posted today regarding some ODF spreadsheet interoperability testing he’s been doing.  He raises a few interesting points, including some topics that could use further explanation and clarification, so I’m planning to respond in some detail in my next blog post later this week.  Stay tuned.

Posted by dmahugh
Working with ODF in Word 2007 SP2
28 April 09 05:00 AM

For those of us on the Office Interoperability team, as well as our colleagues throughout Office, today is a big day.  We’ve released SP2 (Service Pack 2 for Office 2007), which includes a bunch of updated features.  Gray Knowlton has a roundup of what’s new in SP2, but I think the feature of most interest to readers here is probably the built-in support for ODF 1.1.

I first mentioned our plans for ODF support in a blog post last year, and I’ve also blogged in the past about the guiding principles that we followed in our ODF implementation.  Our decision to support ODF is just one aspect of Office's broad commitment to choice and interoperability, as covered by Tom Robertson today on the Microsoft on the Issues blog.

For today’s post, I thought I’d put together a hands-on example of a typical user experience when working with ODF and Office 2007 SP2.  I’m going to focus on a typical document creation and editing scenario in Word.  Specifically, I’ll go through these steps:

  • Create a typical document in Word 2007 SP2, and save it as ODF.
  • Open that document in OpenOffice 3.0.
  • Back in Word, add some fancy styling and other typical enhancements to the document, then save the fancier version in ODF.
  • Open that fancier version in OpenOffice.

The starting point.  As a first step, I’ll create a document we can use as a starting point to try out some things.  So I select File/New in Word, add some text, insert a few of the things we all use regularly in documents (a title, headings of various levels, a numbered list, and a table), and do some simple formatting.  Here's how it looks:

[Screenshot: the starting document in Word 2007 SP2]

The next step is to save this as an ODT document.  That’s pretty simple: just click the Office Button, move your mouse to “Save As”, and then select “OpenDocument Text” from the menu.  Before I go any further, it’s worth noting a couple of things about this step:

  • You can make ODF the default document format if you’d like, and then you won’t need to select it from the dropdown list each time
  • I’ll get a message warning me that my document may contain features that aren’t compatible with this format, because ODF can’t represent 100% of the things we can do in Word

Now I’ll open this document in OpenOffice version 3.0.1.  In a future post I’ll look at differences between various existing ODF implementations, but for today’s post I’m just going to stick to OpenOffice 3.0.1 and Office 2007 SP2.

When I open my ODT document in OpenOffice Writer, here’s what it looks like:

[Screenshot: the ODT document opened in OpenOffice Writer 3.0.1]

As you can see, the document looks essentially the same in both applications.  The page break is the only obvious difference – it occurs at a different point in the document due to differences between the default line-spacing values used in Word and OpenOffice.  Other than that detail, the document looks the same in both applications, with the same fonts, formatting, headings and content.

The line-spacing variation is something you can see in other ODT documents and other ODF implementations as well.  For example, if you open the latest draft of the ODF 1.2 specification (OpenDocument-v1.2-cd01-rev06.odt) in IBM Lotus Symphony 1.2.0, it is 931 pages long, but if you open the same document in OpenOffice Writer 3.0.1, it’s 875 pages long.  These types of variations demonstrate a fundamental difference between a fixed-layout format (such as PDF or XPS) and a flow-oriented format such as ODF or Open XML.  Flow-oriented formats work well for dynamic editing activities, whereas fixed-layout formats rigidly pin down the layout of a document so that it will be rendered exactly the same on different devices.  For these reasons, most people prefer to use a flow-oriented format during document authoring and editing, and a fixed-layout format for published documents that are no longer being edited.
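A back-of-the-envelope Python model shows why small default differences add up in a flow-oriented format: the text is fixed, but how many lines fit on a page depends on line height, so every page break can move. The spacing values and page dimensions below are illustrative assumptions, not the actual defaults of Word, OpenOffice or Symphony, and real layout engines also account for paragraph spacing, widows and orphans, images and more.

```python
import math


def page_count(lines: int, font_size_pt: float, line_spacing: float,
               usable_height_pt: float) -> int:
    """Estimate pages for a flow layout: same content, different line spacing."""
    line_height = font_size_pt * line_spacing
    lines_per_page = max(1, int(usable_height_pt // line_height))
    return math.ceil(lines / lines_per_page)


if __name__ == "__main__":
    # Illustrative values only: 1000 lines of 11 pt text on a 9-inch (648 pt) text area.
    for spacing in (1.0, 1.15):
        print(spacing, page_count(1000, 11, spacing, 648))
```

In this toy model, moving from single spacing to 1.15 takes the same 1000 lines from 18 pages to 20, with no change to the content itself.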

Getting Fancier.  Now let’s move on to some fancier formatting and see how that works.  I’m going to open this document in Word and make a variety of changes:

  • I’ll switch to a different styleset, which will alter all of the styles in the document; I’ll choose the “Modern” styleset from Word’s built-in options
  • I’ll insert an image into the body of the document, with square text-wrapping around it
  • I’ll apply a table style to the table; I’ll use one with header-row and first-column formatting turned on, as well as row and column banding
  • I’ll insert a header and a footer, using Word’s “Annual” style for headers and footers
  • I’ll insert a table of contents, using the default settings

As a result of these changes, my document now looks like this in Word:

[Screenshot: the enhanced document in Word]

And if I save that version as an ODT file and open it in OpenOffice, I see this:

[Screenshot: the enhanced ODT document in OpenOffice]

You’ll notice that many things are identical in both Word and OpenOffice, and a few things look a little different in each application.  Here are some things that are the same in both applications:

  • All of the content is the same – nothing is missing in either application
  • All of the title/header/text styling is the same
  • The table styling is the same
  • The header and footer look the same
  • If you were to try clicking on the links in the table of contents, you’d find that these work the same in both applications (i.e., clicking on an entry takes you to that part of the document)

And here are some things that appear differently in the two applications:

  • The formatting of the hyperlinks in the Table of Contents is different, due to differences in Word and OpenOffice’s default styling for hyperlinks
  • The document is a little longer in OpenOffice than in Word, due to issues like the default line-spacing issue mentioned above
  • The text-wrap margins around the inserted image also differ slightly, again due to differences in application defaults

If you’d like to test these sample documents yourself, they’re in a ZIP file attached to this blog post (below).

Getting more information. This demonstration was just a simple example, for those who are curious about how the new built-in ODF support works in Office.  You can find more detailed information about SP2’s support for ODF 1.1, including which features are supported by Word, Excel and PowerPoint, at these links:

Going forward, I’ll be doing some blog posts that get down into more of the technical details, to help explain some of the engineering decisions that we made in our implementation.  For example, tracked changes functionality is of interest to many users, so I’m working on a post to cover why we decided to not implement tracked changes in ODF.

What else would you like to understand about our implementation of ODF?  Share your questions and thoughts in the comment thread, or email me (dmahugh at microsoft dot com) if you have suggestions for topics you’d like to see covered here.  I’m very proud of the work my colleagues on the Word, Excel and PowerPoint teams have done to add ODF support, and I’m looking forward to discussing the details now that SP2 has been released.

Posted by dmahugh
Attachment: SampleDocs.zip
Miscellaneous Links, 04/22/2009
22 April 09 01:13 PM

Catching up on links to blog posts I’ve found interesting this month …

I was on vacation the first week of April, and missed out on the announcement of the release of the Open XML SDK Version 2 April CTP.  Zeyad Rajabi has details of the SDK version 2 over on Brian Jones’s blog, including a great example of the validation technology that is being built into the SDK.  Zeyad also has a new post this week on how to remove comments from Excel and PowerPoint files.

Stephen Peront, who will be in London at the DII workshop on May 18, has posted an example of how to use the new External File Converter API in Office 2007 SP2.  This API gives developers the ability to add their own custom formats to Office’s list of supported formats, which appears in the dropdowns on the File/Open and File/Save As dialogs.

Eric White has an overview of the tradeoffs involved in choosing between the DocumentBuilder class in the Open XML Power Tools and the altChunk functionality in the Open XML spec, and another post on DocumentBuilder that covers using DocumentBuilder with content controls for document assembly.

Speaking of the Open XML Power Tools, there’s a new version out.  Eric also has a post on the Release of PowerTools for Open XML V1.1.1 and a guide to getting started with PowerTools development, and Lawrence Hodson has a post on OpenXMLDeveloper.org that covers how to generate form letters using the PowerTools for Open XML and PowerShell scripting.

Gray Knowlton has announced the OfficePalooza VBA contest, which includes challenges for every level of VBA developer, from casual users to advanced VBA experts.  If you know how to use VBA to make Office apps sing and dance, it’s a great opportunity to show off your skills.  More info on the OfficePalooza site.

Rick Jelliffe blogged about the makeup of the ODF TC while I was in Prague at the SC 34/WG4 meeting.  The comment thread has an interesting exchange of opinions between Rick, Rob Weir, and Bruce D’Arcus.

Jesper Lund Stocholm has a series of posts on various aspects of the proposed change to IS29500 namespaces (as submitted in a Swiss defect report) that has been the subject of discussion and debate within WG4.  First he talked about ISO 8601 dates, then document protection, and most recently MCE (Markup Compatibility & Extensibility).  These posts offer a good overview of the issues and options under consideration by WG4.

Finally, here’s a photo I took Monday evening of the 520 bridge, which connects the “east side” (the area around Microsoft’s Redmond campus) to downtown Seattle …

Evergreen Point Floating Bridge, Olympic Mountains

Posted by dmahugh
IS29500 is an American National Standard
17 April 09 01:50 PM

The INCITS Executive Board has approved the adoption of ISO/IEC IS29500 (Office Open XML) as an American National Standard this week (on April 15), and the document will be published soon by ANSI. 

This action taken by INCITS is a relatively routine occurrence, as the US typically adopts ISO/IEC standards as national standards.  As an INCITS V1 member, I was very excited to see this news.  It’s a positive step in the validation and global adoption of IS29500.

Posted by dmahugh
DII Workshop: London, May 18
09 April 09 10:17 AM

I’m pleased to announce that another DII workshop is coming soon.  The last two events took place in Brussels and Redmond, and this one will take place in London on Monday, May 18.

This is a free event that is part of the Document Interoperability Initiative (www.documentinteropinitiative.org), and the goal of the DII workshops is to share information with the developer community and solicit feedback on how we can work together to improve interoperability.  This particular workshop will include an emphasis on the IS29500 validator and document test library projects that are being led by Fraunhofer Fokus.  These projects were launched in response to community input at previous DII workshops, and at the London event Fraunhofer will be soliciting feedback on how these projects can best meet the needs of document format implementers.  Since both projects are in the initial planning and specification phases, this is an excellent opportunity for implementers and standards professionals to influence their future direction.

I won’t be able to attend this event myself, but Stephen Peront will be there representing our team, along with others from Redmond and our UK office.  We’re also expecting a variety of people representing other implementers, Fraunhofer Fokus, SC 34 WG4 (the working group responsible for maintenance of IS29500), and others.

We’re still working out the details of the agenda, but it will include presentations from Fraunhofer on the initial planning and structure of the validator and test library projects, as well as presentations from implementers and standards professionals on related aspects of document format interoperability and validation technologies.  And, as always at DII events, there will be significant time devoted to roundtable discussions of key topics.  If you have a topic you’d like to present, contact Stephen and he can help get you on the agenda if there’s still room.  This is a community event, and we'd like to see many voices and many perspectives involved.

The event will be an all-day affair on Monday, May 18, hosted at the Microsoft office on Cardinal Place in London, with a dinner afterward.  To register for the event, contact Stephen with your name, email, and company/organization, and he'll follow up with more details including hotel recommendations and related information.

Other upcoming DII events will take place in Berlin and Beijing, and we’ll have info on those in the weeks ahead.  Stephen and I will be sharing more info about the London event after we see who all may want to contribute and we get the agenda finalized, but I wanted to get this announcement out as soon as possible so that folks can make travel plans.  Any questions, please let me know.

Posted by dmahugh
WG4 meetings and SC 34 plenary, Prague
30 March 09 10:30 AM

We had three days of WG4 face-to-face meetings in Prague last week, followed by the SC34 plenary on Friday at the same location.  As Jesper has noted, it was a tough week and we made an enormous amount of progress.

For those who might not know, WG4 is the SC34 working group responsible for the maintenance of the Open XML spec, ISO/IEC 29500.  WG4 had its first meeting in Okinawa in January, and this time around in Prague we had more participants (31 the first day, as opposed to 22 in Okinawa) and worked through many more defect reports.  I’ve not seen project editor Rex Jaeschke’s official report yet, but we worked through over 30 defect reports ranging from simple editorial corrections to far-reaching changes such as the Swiss proposal to change the namespaces to distinguish them from ECMA-376 1st Edition.

We reached consensus on general guidelines for distinguishing corrections (COR) from amendments (AM), and based on that we asked SC34 to approve creation of amendment processes for parts 1, 2 and 4.  (We don’t happen to have any defect reports on the table for part 3.)

We also agreed on several other things that will be covered in the minutes of the meeting (which aren’t out yet), including procedural matters (dates of conference calls, the timeline for the amendment and corrigendum processes, and so on) as well as technical issues such as support for the principles that WG1 had come up with on Monday regarding XML 1.0 4th edition, and identification of two proposals for handling the Swiss namespace proposal.  On that last topic, we agreed to review those two proposals with our national bodies and reach a final decision within WG4 within 30 days.

The trend toward real-time coverage of these meetings on Twitter, which had started in Okinawa, continued with Alex, Jesper, Inigo and me covering WG4, and Lars and others covering other working group meetings.  It’s really cool to see reactions from others via Twitter while the conversation’s still taking place in the room, and this also allows informal/unofficial interaction by others who aren’t attending the meetings.  Gareth Horton, for example, made some contributions to the discussions even though he was back in the UK at the time.  On a related note, I posted some photos on Flickr, and Alex posted many more as well.

WG5 also met last week, although I didn’t personally attend their meeting.  I heard that they had a good turnout too, and worked through tightening up their scope of work, refining the approach they’re taking for the technical report (TR), and other details.  Watch for more information about all of the WG meetings soon.  In the case of WG4, all publicly accessible documents are being made available at http://www.itscj.ipsj.or.jp/sc34/wg4/.

It was a non-stop week, with breakfast discussions, meetings all day (and technical discussions right through lunch and breaks), then continued discussions in the evening over dinner or fine Prague beers.  I had the opportunity to spend time with Ken Holman and learn more about Genericode, CVA, and UBL, and I also met Mohamed Zergaoui for the first time.  Mohamed, who had participated in the DIN report last year as an invited expert, was very active in WG4 (representing France), and on Friday he was appointed the SC34 liaison to the W3C.

WG4 now has all the pieces in place to work through open defect reports in the calls and meetings ahead.  Conference calls will start on April 16, and we’ll have them every two weeks.  We’ll meet again in Copenhagen in June, and then at the next SC34 plenary in September (Seattle). Now is a great time to get involved in IS29500 maintenance, so if you’d like to contribute contact your national body and get involved.  See you in Copenhagen!

Posted by dmahugh
Links for 03-25-2009
25 March 09 08:32 AM

I’m here at the WG4 meeting in Prague this week, working on IS29500 maintenance with the other members.  I’ll be posting about the week’s activity after we’re done and I have some time, but for now I wanted to cover a few blog posts I’ve found interesting in the last few days …

I only showed up here Monday evening, but some people were around during the XML Prague conference over the weekend.  Alex Brown has blogged about Day 1 and Day 2 of that event, as well as the SC34 WG1 meeting on Monday.

Eric White has a post on the release of the Open XML Power Tools V1.1, with a video showing off some new features and plenty of details.  If you’ve not yet downloaded the Power Tools, now’s the time – they’re getting more and more polished, and are useful utilities as well as excellent code samples for Open XML developers.

Gray Knowlton has the latest on the Daisy translator, which can be used to save a Word document to MP3, among other things.

Gray also has a post on the availability of full source code for the TextGlow Silverlight Open XML Viewer.  Intergen has posted the code on the OpenXmlDeveloper.org web site.

Rich Quackenbush has a post on Using OPC to Store Your Own Data.  OPC (the Open Packaging Convention) is an ISO standard (as Part 2 of IS29500), and it offers a flexible mechanism for storing various content types in a single ZIP archive.
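To make the OPC idea a bit more concrete, here is a minimal sketch (not taken from Rich's post) of a package holding one custom data part, built with nothing but Python's standard zipfile module.  It assumes the usual OPC layout: a `[Content_Types].xml` part that maps each part to a content type, and a package-level `_rels/.rels` part that relates the package to the custom part.  The function name `build_opc_package`, the part name `/mydata/data.xml`, and the relationship type URI under `example.com` are all hypothetical placeholders chosen for illustration.

```python
import io
import zipfile

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Hypothetical relationship type for our own data; a real application would define its own URI.
MY_REL_TYPE = "http://example.com/relationships/mydata"


def build_opc_package(part_name: str, content_type: str, data: bytes) -> bytes:
    """Return the bytes of a minimal OPC package (a ZIP archive) with one custom part."""
    # [Content_Types].xml: a default for .rels parts, plus an override for our part.
    content_types = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<Types xmlns="{CT_NS}">'
        '<Default Extension="rels" '
        'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        f'<Override PartName="{part_name}" ContentType="{content_type}"/>'
        '</Types>'
    )
    # _rels/.rels: package-level relationship pointing at the custom part.
    rels = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<Relationships xmlns="{REL_NS}">'
        f'<Relationship Id="rId1" Type="{MY_REL_TYPE}" Target="{part_name}"/>'
        '</Relationships>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", content_types)
        zf.writestr("_rels/.rels", rels)
        # Part names are absolute URIs; ZIP item names have no leading slash.
        zf.writestr(part_name.lstrip("/"), data)
    return buf.getvalue()
```

For example, `build_opc_package("/mydata/data.xml", "application/xml", b"<data>hello</data>")` yields an archive whose items are `[Content_Types].xml`, `_rels/.rels`, and `mydata/data.xml`.  Consumers are expected to discover parts by following relationships rather than by hard-coding ZIP paths, which is what lets the same container carry many different kinds of content.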

Prague

Posted by dmahugh
Miscellaneous links for 03/18/2009
18 March 09 09:00 PM

It’s been a couple of weeks since I posted, and I’ve come across several interesting blog posts and articles in that time.  Here are a few favorites …

There’s some cool new content that has started appearing on the OpenXMLDeveloper.org site.  First was the OOXML Crawler, an application that crawls all the documents at a URL (a web site or SharePoint library, say) and retrieves all of the metadata properties from those documents.  And this week there’s a cool article on how to use XSLT to transform raw XML data into a DOCX.  The XML source data used in the example is based on HL7, the emerging standard for medical records.  Both articles include complete source code, of course.
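For readers curious about what "retrieving metadata properties" involves, here is a rough sketch in Python of the core step such a crawler performs for each document.  It is not the OOXML Crawler's actual code (which I haven't reproduced here).  It assumes a typical DOCX/XLSX/PPTX where the core properties part is found by following the package-level relationship of the standard core-properties type, and it returns a plain dictionary keyed by local element names such as `title` and `creator`.  The function name `read_core_properties` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
CORE_REL_TYPE = (
    "http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties"
)


def read_core_properties(source) -> dict:
    """Return core metadata (title, creator, ...) from an Open XML package.

    `source` may be a file path or a seekable binary file object.
    """
    with zipfile.ZipFile(source) as zf:
        # Locate the core-properties part via the package relationships,
        # rather than assuming it always lives at docProps/core.xml.
        rels = ET.fromstring(zf.read("_rels/.rels"))
        for rel in rels.iter(f"{REL_NS}Relationship"):
            if rel.get("Type") == CORE_REL_TYPE:
                part_name = rel.get("Target").lstrip("/")
                break
        else:
            return {}  # no core properties in this package
        root = ET.fromstring(zf.read(part_name))

    props = {}
    for child in root:
        # ElementTree tags look like "{namespace-uri}localname"; keep the local name.
        local_name = child.tag.split("}", 1)[-1]
        props[local_name] = (child.text or "").strip()
    return props
```

Calling `read_core_properties("report.docx")` on a hypothetical Word file would return something like a dictionary with `title`, `creator`, and `lastModifiedBy` entries.  A crawler would simply run this over every document it finds at the target URL.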

Speaking of HL7, Altova’s MapForce product now supports the standard, and Alexander Falk has a blog post about how it works, entitled Electronic health records, HL7, and XML data mapping.  Additional information about how to use MapForce with HL7 can be found over on the Altova blog as well.

Zeyad continues to crank out great examples of how to use the Open XML SDK over on Brian Jones’s blog.  His latest include How to Assemble Multiple PowerPoint Decks and Importing Charts from Spreadsheets to Wordprocessing Documents.

Eric White has some new posts on Open XML SDK topics, including Creating a Template Open XML Document in Memory and a brief overview of Cathal Coffey’s “DocX” library (which currently handles string replacement and property-setting).

I recently attended a web meeting that used Slideshare to share a presentation, and was impressed with how easy it was to use.  I didn’t know at the time that they support Open XML – you can upload a PPTX to Slideshare and share it with others.  Very cool.

Rick Jelliffe’s Concentration at the ODF TC presents some thoughts and questions about how to best assure a balance of power among competing vendors who are working together on standards development, and Gray Knowlton shares some additional thoughts on related interoperability topics.  In anticipation of the question, I can tell you that our approach has been to take care to avoid having more than two Microsoft voting members on the ODF TC.  This means I have to sleep in on some Mondays (the calls are at 7AM our time) to be sure to not accidentally acquire voting rights, a personal sacrifice I’m glad to make.

Have you ever wondered how the ODF translator deals with formulas?  Annerose Hümbert explains the details of how it works in V3.0 over on the translator team blog.  I couldn’t agree more with this quote: “users are not really interested in what is part of a standard and what not.”

Finally, Peter Sefton has a thought-provoking post on custom XML and interoperability.  It’s worth reading carefully, and you can follow the links to find several other perspectives on the topic, comprising a web of conversations that started with Glyn Moody’s post.  Is custom XML an insidious plot to limit document interoperability?  I don’t think so (shocker, eh?), but I think it’s great to see people with so much experience and expertise debating document interoperability topics.

By the way, this is my first post written with Windows Live Writer.  I like it – simple, straightforward install and configuration, and everything is intuitive and quick.  I’ve always been a Notepad kind of guy, and it’s still my all-time favorite productivity application, but I’ve recently moved to Windows 7 and decided it just doesn’t look right to be writing blog posts in Notepad on Windows 7.  A slave to fashion, I am.

Posted by dmahugh