Well, a few days have turned into a few weeks. The joy of technology, travel and catching up on things. You can read a recap of the event and download the presentations here. I will just take a few minutes to provide some salient points from my perspective.

We held the event in the Microsoft offices at Cardinal Place in London, UK. We had an excellent turnout with participants from Fraunhofer FOKUS, Workshare, PowerPoint Alchemy, Griffin Brown Digital Publishing, FEDICT, Dialogika, Gama System, Genisoft, Datalucid Limited, and RealDolmen, as well as independent experts from SC34 and the OASIS ODF/OIC technical committees.

The focus of this event was the new Fraunhofer FOKUS IS29500 Validator and Document Library project. Members of the Fraunhofer team presented the project and received feedback from industry experts including Alex Brown (convener of SC34 WG1 and member of WG4), Bart Hanssens (Chair of the OASIS OIC TC), and Dennis Hamilton (Secretary of the OASIS OIC TC). This broad expertise across document formats led to a wide-ranging conversation about managing document format standards.

Stephanie Krieger, Julien Chable and John Wilson spoke up quite often at the event, raising interoperability and standards conformance concerns drawn from real-world customer situations. Stephanie is an accomplished author who has written a number of books, including Advanced Microsoft Office Documents 2007 Edition Inside Out.

Some of the attendees have taken the time to write up their thoughts on the event; here are links to their posts:

In addition to the introduction of the Fraunhofer FOKUS project, there were a number of presentations shared by attendees. I have included a brief description of each of the presentations.

Introduction and Interoperability @ Microsoft UK

Paul Lorimer (left) is the Group Manager of the Office Interoperability team. Paul kicked off the event by talking about the value of the Fraunhofer FOKUS project, along with some goals that Microsoft is looking to achieve. Giampiero Nanni (right) is the Director of Interoperability for Microsoft in the UK; he presented on what Microsoft in the UK is doing around interoperability. It was great to have Paul and Giampiero at the event, as they were able to answer a number of questions that came up during the day, sharing Microsoft’s goals and efforts. You can download Paul’s presentation here, and Giampiero’s presentation here. You can read more about interoperability at Microsoft here.

Standards-based validation of IEC/ISO 29500 XML resources

Alex Brown discussed how the word “valid” has a very specific meaning within a standard, and that when people use the word validation, they generally mean “schema-valid”. Alex explained that validation in the fuller sense goes much deeper and calls for distinct terms such as conformant, valid, interoperable and portable. Alex provided a history of ODF going through the standards process and explained where IS29500 is in the process, along with the current set of activities there. Alex then explained the differences between “application” conformance and “document” conformance. He finished his presentation with a demonstration of a new W3C technology, XProc, showing how XML Pipelines can be used to test all of the previously mentioned validation terms in a succinct and manageable way. You can read more about XML Pipelines in this post on his blog. You can download Alex’s presentation here.

High Fidelity Programmatic Access to Document Content

Matevž Gačnik explained the definition of “original” content as defined by the Slovenian government and European Union legislation. According to these regulations, a document can be considered “original” if it is signed by the author, stored and archived by a certified software solution, and stored in a preferred document format. Matevž explained the challenge that IS29500 is not currently a preferred format because, when the CTD was approved, IS29500 had not yet been approved as a standard. Matevž shared some feedback from their organization about Office and IS29500; one point that stood out to me was his comment that parsing Office documents as XML is “2000x faster”. You can download Matevž’s presentation here. (Note: to view this presentation, you may need to right-click on the link and select “Save Target As…”, then download and open from your local computer.)

PHP PowerPoint Project on CodePlex

Maarten Balliauw, at lightning speed, introduced the group to a new PHP project on CodePlex, called PHPPowerPoint. The PHPPowerPoint project provides a set of PHP classes for reading and writing the PresentationML file formats. The PHPPowerPoint project originated from the PHPExcel project. Maarten demonstrated the PHPPowerPoint and Slide classes, then showed us how the PHPPowerPoint_Reader_IReader and PHPPowerPoint_Writer_IWriter interfaces are used for persisting the document. Maarten concluded his presentation by generating a document using PHPPowerPoint. You can download Maarten’s presentation here.

Interoperability by Community

Gerd Schürmann started by sharing a little history about Fraunhofer, introducing us to the late Joseph von Fraunhofer (1787 – 1826). Joseph was a scientist, discovering the “Fraunhofer Lines” in the sun spectrum; an inventor, creating a new manufacturing method for lenses; and an entrepreneur, being a director and associate of a glassworks. Gerd explained the breadth of offerings that Fraunhofer provides, including: research and development projects, advance studies and consultancies, services, standardization and fora activities, academic education and teaching and prototype development. Gerd concluded by introducing the IS29500 Validator and Document Library project. You can download Gerd’s presentation here.

PLANETS & Doc Conversion Tools

Wolfgang Keber (left) and Natasa Milic-Frayling (right) introduced us to the PLANETS project, which focuses on preserving digital assets. The four-year project is co-funded by the European Union; PLANETS is an acronym for Preservation and Long-term Access through Networked Services. Natasa is from the Microsoft Research labs, which have contributed to this project. Wolfgang explained the challenges in going between different document formats, for example converting a binary MS Office document to ODF or UOF. Wolfgang then explained that by creating a wrapper around each format, they have been able to convert documents from many formats to many other formats. Wolfgang concluded his presentation by showing us a demo. Stephanie Krieger and Julien Chable posed some difficult questions about the formatting, which spurred some lively and interesting discussion about the interoperability of some document formats with others. You can download Wolfgang’s presentation here.

Extensibility within Standards

I then had the privilege of presenting on the topic of extensibility within standards. I started the presentation with a discussion, asking people what they thought of when they heard the terms extensibility and standards together. From there I moved into the extensibility mechanisms defined in Part 3 of the IS29500:2008 standard, showing how custom elements and attributes can be added to the markup of the document, and how an implementer can use alternate content blocks (ACB) to allow a consumer to gracefully render a previous version of the markup. I followed this with a demo where I added custom elements and attributes to the markup of a PresentationML document and opened the document in PowerPoint 2007. I concluded my session with another discussion, asking people whether they think extensibility mechanisms are a healthy, object-oriented way of advancing standards. This led to a lively discussion, but in general, I think people agreed that extensibility within standards has value. You can download my presentation here.
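To make the alternate content idea a little more concrete, here is a minimal Python sketch of how a consumer might pick between a Choice and a Fallback inside an AlternateContent block. It is not the code from my demo. The markup compatibility namespace is the real one, but the v2 namespace, the element names and the function name are invented for illustration. As a simplification, the sketch matches on namespace prefixes; a real consumer would resolve each prefix to its namespace name before deciding whether it understands it.

```python
import xml.etree.ElementTree as ET

MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"

# Hypothetical document: the v2 namespace and both shape elements are made up.
SAMPLE = f"""
<root xmlns:mc="{MC_NS}" xmlns:v2="http://example.com/hypothetical/v2">
  <mc:AlternateContent>
    <mc:Choice Requires="v2"><v2:fancyShape/></mc:Choice>
    <mc:Fallback><plainShape/></mc:Fallback>
  </mc:AlternateContent>
</root>
"""


def resolve_alternate_content(xml_text, understood_prefixes):
    """Return the tags a consumer would keep from each AlternateContent block.

    Simplification (assumption): Requires values are compared as prefixes,
    not as resolved namespace names.
    """
    root = ET.fromstring(xml_text)
    kept = []
    for block in root.iter(f"{{{MC_NS}}}AlternateContent"):
        selected = None
        # Take the first Choice whose required prefixes are all understood.
        for choice in block.findall(f"{{{MC_NS}}}Choice"):
            required = choice.get("Requires", "").split()
            if required and all(p in understood_prefixes for p in required):
                selected = choice
                break
        # Otherwise fall back to the older markup, if a Fallback is present.
        if selected is None:
            selected = block.find(f"{{{MC_NS}}}Fallback")
        if selected is not None:
            kept.extend(child.tag for child in selected)
    return kept


if __name__ == "__main__":
    print(resolve_alternate_content(SAMPLE, {"v2"}))  # newer consumer
    print(resolve_alternate_content(SAMPLE, set()))   # older consumer
```

A consumer that understands v2 keeps the content of the Choice, while one that does not keeps the Fallback, which is the graceful rendering behaviour described above.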

Fraunhofer – Validator and Test Document Library Project

Jan Ziesing (left) and Ucheoma “Uche” Ishionwu (right) picked up from where Gerd’s presentation left off by officially introducing the Fraunhofer FOKUS IS29500 Validator and Document Library project. Jan was the presenter, and he turned to Uche for three specific demos. Jan started by explaining that the purpose of the document library is to create a suite of documents for testing and verifying IS29500 interoperability. Fraunhofer will maintain a web site for a document repository where people can upload and download documents. Jan shared with us their research on the complexity of categorizing documents, explaining how automation and validation can be used to categorize documents into specific domains when they are uploaded. You can download Jan’s presentation here.
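One way to picture programmatic categorization is as a weighted scoring of document attributes. The Python sketch below is only an illustration under that assumption; it is not Fraunhofer's implementation. The category names, attribute names and weights are all invented.

```python
# Hypothetical categories, attributes and weights, chosen only for illustration.
CATEGORY_WEIGHTS = {
    "photo_book": {"picture_count": 3.0, "slide_count": 0.5, "text_run_count": 0.1},
    "text_report": {"picture_count": 0.2, "slide_count": 0.5, "text_run_count": 2.0},
}


def categorize(attributes, category_weights=CATEGORY_WEIGHTS):
    """Score a document against each category and return (best_category, scores).

    `attributes` maps attribute names to counts extracted from the document;
    attributes missing from the document count as zero.
    """
    scores = {
        category: sum(weight * attributes.get(name, 0) for name, weight in weights.items())
        for category, weights in category_weights.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    picture_heavy = {"picture_count": 40, "slide_count": 20, "text_run_count": 10}
    print(categorize(picture_heavy)[0])
```

Raising or lowering a weight changes how much that attribute counts towards a category, so a picture-heavy presentation ends up filed with the photo books rather than with the text-heavy reports.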

Uche provided demos for categorizing documents, building semantic rules for a photo book, and semantic validation. In the first demo, Uche showed how categorizing documents is something that can be done programmatically. He identified different attributes of the presentation, then added weighted values to some attributes to reflect their relative importance in the categorization. You can download Uche’s first demo here. In the second demo, Uche described how this categorization can be applied to a real-world photo book document when it is uploaded to the Document Library. By applying these attributes, Jan and Uche demonstrated how the programmatic categorization of the document allows it to be easily found within the Document Library. You can download Uche’s second demo here. In the third demo, Uche showed how an XSD schema is not enough to completely validate a document against a standard. Uche manually modified a document so that it violated the standard but remained compliant with the XSDs; he then ran schema validation on the document, which passed despite the violation. Uche then used Schematron to add semantic validation rules (i.e. rules that are only specified in the text of the standard) to more accurately validate his file against the standard. You can download Uche’s third demo here.
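The gap between schema-valid and actually conformant can be illustrated with a small Python sketch. This is not Schematron and not Uche's demo; it only mimics the idea of an assertion-style rule. The markup and the rule (a range's first value must not exceed its last) are invented, and were picked because a cross-attribute comparison like this cannot be expressed in XSD 1.0.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup. Both slideRange elements would satisfy a schema that
# only requires "first" and "last" to be integers.
DOC = """
<album>
  <slideRange first="1" last="4"/>
  <slideRange first="9" last="6"/>
</album>
"""


def check_ranges(xml_text):
    """Return one message per slideRange whose first value exceeds its last."""
    failures = []
    ranges = ET.fromstring(xml_text).iter("slideRange")
    for index, rng in enumerate(ranges, start=1):
        first, last = int(rng.get("first")), int(rng.get("last"))
        # The semantic rule: stated in prose, invisible to the schema.
        if first > last:
            failures.append(
                f"slideRange #{index}: first ({first}) is greater than last ({last})"
            )
    return failures


if __name__ == "__main__":
    for message in check_ranges(DOC):
        print(message)
```

The second range is the kind of error that slips through schema validation and is only caught once the prose rule is written down as an explicit check.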

You can email Jan Ziesing to learn how to sign up and contribute to the project.

Roundtable Discussion

The event concluded with a roundtable discussion, led by Fraunhofer. Many topics were discussed; attendees provided feedback about the Validator and Document Library project and shared their thoughts about which validation scenarios are important to them. Here is some of the feedback that was shared:

      • Some people expressed the opinion that they would like SC34 to contribute to the validator; defining/validating the rules needed to validate IS29500 files
      • Some people expressed the opinion that they would like the OASIS OIC TC to coordinate efforts between the Validator and Document Library project and the work the OIC TC is doing with ODF interoperability and validation
      • Some people expressed that they would like to see the validator be made available as a web service
      • Some people shared that some organizations may consider their documents as proprietary, and want to know if the validator could be made available to these organizations in such a way that they could either a) securely pass documents to the validator without fear of the document being made available to others, and/or b) have a copy of the validator that they can run privately within their own infrastructure
      • Some people expressed that they would like the Document Library to have a mechanism by which the intellectual property rights (IPR) of the document and/or owner can be verified, thereby protecting the IPR of the document and/or owner. They felt that this would make the library more valid and useful to users

Fraunhofer noted this feedback and hopes to incorporate it into their work.