Well, a few days have turned into a few weeks. The joy of technology, travel and catching up on things. You can read a recap of the event and download the presentations here. I will just take a few minutes to provide some salient points from my perspective.
We held the event in the Microsoft offices at Cardinal Place in London, UK. We had an excellent turnout with participants from Fraunhofer FOKUS, Workshare, PowerPoint Alchemy, Griffin Brown Digital Publishing, FEDICT, Dialogika, Gama System, Genisoft, Datalucid Limited, and RealDolmen, as well as independent experts from SC34 and the OASIS ODF/OIC technical committees.
The focus of this event was the new Fraunhofer FOKUS IS29500 Validator and Document Library project. Members of the Fraunhofer team presented the project and received feedback from industry experts including Alex Brown (convener of SC34 WG1 and member of WG4), Bart Hanssens (Chair of the OASIS OIC TC), and Dennis Hamilton (Secretary of the OASIS OIC TC). This broad expertise across document formats led to a wide-ranging conversation about managing document format standards.
Stephanie Krieger, Julien Chable and John Wilson spoke up quite often at the event, raising interoperability and standards conformance concerns drawn from real-world customer situations. Stephanie is an accomplished author who has written a number of books, including Advanced Microsoft Office Documents 2007 Edition Inside Out.
Some of the attendees have taken the time to write up their thoughts on the event; here are links to their posts:
In addition to the introduction of the Fraunhofer FOKUS project, there were a number of presentations shared by attendees. I have included a brief description of each of the presentations.
Paul Lorimer (left) is the Group Manager of the Office Interoperability team. Paul kicked off the event by talking about the value of the Fraunhofer FOKUS project, along with some goals that Microsoft is looking to achieve. Giampiero Nanni (right) is the Director of Interoperability for Microsoft in the UK; he presented on what Microsoft in the UK is doing around interoperability. It was great to have Paul and Giampiero at the event, as they were able to answer a number of questions that came up during the event, sharing Microsoft’s goals and efforts. You can download Paul’s presentation here, and Giampiero’s presentation here. You can read more about interoperability at Microsoft here.
Alex Brown discussed how the word “valid” has a very specific meaning within a standard, and that when people use the word validation, they generally mean “schema-valid”. Alex explained how validation requires a much deeper meaning, requiring terms such as: conformant, valid, interoperable and portable. Alex provided a history of ODF going through the standards process and explained where IS29500 is in the process along with the current set of activities there. Alex then explained the differences between “application” conformance and “document” conformance. He finished his presentation with a demonstration of using a new W3C technology, XProc, to show how XML Pipelines can be used to test all of the previously mentioned validation terms in a succinct and manageable way. You can read more about XML Pipelines in this post on his blog. You can download Alex’s presentation here.
Matevž Gačnik explained the definition of “original” content as defined by the Slovenian government and European Union legislature. According to these regulations, a document can be considered “original” if it is signed by the author, stored and archived by a certified software solution, and stored in a preferred document format. Matevž explained the challenge that IS29500 is not currently a preferred format because when the CTD was approved, IS29500 had not yet been approved as a standard. Matevž shared some feedback from their organization about Office and IS29500; one point that stood out to me was his comment that parsing Office documents as XML is “2000x faster”. You can download Matevž’s presentation here. (Note: to view this presentation, you may need to right-click on the link and select “Save Target As…”, then download and open from your local computer.)
Maarten Balliauw, at lightning speed, introduced the group to a new PHP project on CodePlex, called PHPPowerPoint. The PHPPowerPoint project provides a set of PHP classes for reading and writing the PresentationML file formats. The PHPPowerPoint project originated from the PHPExcel project. Maarten demonstrated the PHPPowerPoint and Slide classes, then showed us how the PHPPowerPoint_Reader_IReader and PHPPowerPoint_Writer_IWriter interfaces are used for persisting the document. Maarten concluded his presentation by generating a document using PHPPowerPoint. You can download Maarten’s presentation here.
Gerd Schürmann started by sharing a little history about Fraunhofer, introducing us to the late Joseph von Fraunhofer (1787 – 1826). Joseph was a scientist, discovering the “Fraunhofer Lines” in the sun spectrum; an inventor, creating a new manufacturing method for lenses; and an entrepreneur, being a director and associate of a glassworks. Gerd explained the breadth of offerings that Fraunhofer provides, including: research and development projects, advance studies and consultancies, services, standardization and fora activities, academic education and teaching and prototype development. Gerd concluded by introducing the IS29500 Validator and Document Library project. You can download Gerd’s presentation here.
Wolfgang Keber (left) and Natasa Milic-Frayling (right) introduced us to the PLANETS project, which focuses on preserving digital assets. The four-year project is co-funded by the European Union, and PLANETS is an acronym which stands for Preservation and Long-term Access through Networked Services. Natasa is from the Microsoft Research labs, which have contributed to this project. Wolfgang explained the challenges in going between different document formats, for example, converting a binary MS Office document to ODF or UOF. Wolfgang then explained that by creating a wrapper around each format, they have been able to convert documents from many formats to many other formats. Wolfgang concluded his presentation by showing us a demo. Stephanie Krieger and Julien Chable posed some difficult questions about the formatting, which spurred some lively and interesting discussion about the interoperability of some document formats with others. You can download Wolfgang’s presentation here.
I then had the privilege of presenting on the topic of extensibility within standards. I started the presentation with a discussion, asking people what they thought of when they heard the terms extensibility and standards together. I then moved into showing the extensibility mechanisms defined in Part 3 of the IS29500:2008 standard. I showed how custom elements and attributes can be added to the markup of the document. I then showed how an implementer can use alternate content blocks (ACB) to allow a consumer to gracefully render a previous version of the markup. I then provided a demo where I added custom elements and attributes to the markup of a PresentationML document, and opened the document in PowerPoint 2007. I concluded my session with another discussion, asking people whether they think extensibility mechanisms are a healthy object oriented way of advancing standards. This led to a lively discussion, but in general, I think people agreed that extensibility within standards has value. You can download my presentation here.
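To make the alternate content idea a little more concrete, here is a rough Python sketch of how a consumer might choose between the branches of an alternate content block. It is not the code from my demo and it is far from a complete Markup Compatibility processor: it ignores attributes such as Ignorable and MustUnderstand, and the sample markup, the `v2` extension namespace and the function names are all made up for illustration. The only piece taken from the standard is the markup compatibility namespace URI.

```python
import io
import xml.etree.ElementTree as ET

# Markup Compatibility namespace used by AlternateContent / Choice / Fallback.
MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"

# Hypothetical document: the "v2" namespace and element names are invented.
SAMPLE = """<doc xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
     xmlns:v2="urn:example:hypothetical-extension">
  <mc:AlternateContent>
    <mc:Choice Requires="v2"><v2:fancyShape/></mc:Choice>
    <mc:Fallback><plainShape/></mc:Fallback>
  </mc:AlternateContent>
</doc>"""


def collect_prefixes(xml_text):
    """Map namespace prefixes to URIs (simplified: ignores prefix scoping)."""
    prefixes = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        prefixes[prefix] = uri
    return prefixes


def select_branch(alternate_content, understood_uris, prefixes):
    """Return the children of the first Choice whose required namespaces
    are all understood by this consumer, else the Fallback children."""
    fallback = []
    for child in alternate_content:
        if child.tag == f"{{{MC_NS}}}Choice":
            required = child.get("Requires", "").split()
            if required and all(prefixes.get(p) in understood_uris for p in required):
                return list(child)
        elif child.tag == f"{{{MC_NS}}}Fallback":
            fallback = list(child)
    return fallback


prefixes = collect_prefixes(SAMPLE)
root = ET.fromstring(SAMPLE)
block = root.find(f"{{{MC_NS}}}AlternateContent")

# A newer consumer that understands the extension gets the Choice content;
# an older consumer that does not gets the Fallback content.
newer = select_branch(block, {"urn:example:hypothetical-extension"}, prefixes)
older = select_branch(block, set(), prefixes)
print([e.tag for e in newer], [e.tag for e in older])
```

The point of the design is that the producer ships both representations, so an older consumer never has to guess what the newer markup meant.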
Jan Ziesing (left) and Ucheoma “Uche” Ishionwu (right) picked up from where Gerd’s presentation left off by officially introducing the Fraunhofer FOKUS IS29500 Validator and Document Library project. Jan was the presenter, and he turned to Uche for three specific demos. Jan started by explaining that the purpose of the document library is to create a suite of documents for testing and verifying IS29500 interoperability. Fraunhofer will maintain a web site for a document repository where people can up/download documents. Jan shared with us their research on the complexity of categorizing documents, explaining how automation and validation can be used to categorize documents into specific domains when they are uploaded. You can download Jan’s presentation here.
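As a rough illustration of what automated categorization on upload could look like, here is a small Python sketch that scores a document against a few categories using weighted attributes. This is only my own guess at the general shape of such a scheme, not Fraunhofer’s implementation: the category names, the attribute names and the weights are all hypothetical.

```python
# Hypothetical categories and weights; none of these come from the project.
CATEGORY_WEIGHTS = {
    "photo_book": {
        "images_per_slide": 3.0,
        "text_runs_per_slide": -0.5,
    },
    "business_report": {
        "charts_per_slide": 2.0,
        "tables_per_slide": 1.5,
        "text_runs_per_slide": 0.5,
    },
}


def score(features, weights):
    """Weighted sum of the attributes a category cares about."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def categorize(features):
    """Pick the category with the highest weighted score."""
    return max(CATEGORY_WEIGHTS, key=lambda c: score(features, CATEGORY_WEIGHTS[c]))


# Attribute values that a real system would extract from the uploaded file.
holiday_album = {"images_per_slide": 4, "text_runs_per_slide": 1}
quarterly_review = {"charts_per_slide": 2, "tables_per_slide": 1, "text_runs_per_slide": 3}

print(categorize(holiday_album))
print(categorize(quarterly_review))
```

Raising or lowering a weight changes how much that attribute counts toward a category, which is the lever you would tune as more documents are uploaded.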
Uche provided demos for categorizing documents, building semantic rules for a photo book and semantic validation. In the first demo, Uche showed how categorizing documents is something that can be done programmatically. He identified different attributes of the presentation, then added weighted values to some attributes to level their importance in the categorization. You can download Uche’s first demo here. In the second demo, Uche described how this categorization can be applied to a real-world photo book document when it is uploaded to the Document Library. By applying these attributes, Jan and Uche demonstrated how the programmatic categorization of the document allows it to be easily found within the Document Library. You can download Uche’s second demo here. In the third demo, Uche showed how an XSD schema is not enough to completely validate a document against a standard. Uche manually modified a document so that it violated the standard but remained compliant with the XSDs; he then ran schema validation on the document, which it still passed. Uche then used Schematron to add semantic validation rules (i.e. rules that are only specified in the text of the standard) to more accurately validate his file against the standard. You can download Uche’s third demo here.
You can email Jan Ziesing to learn how to sign up and contribute to the project.
The event concluded with a roundtable discussion, led by Fraunhofer. Many topics were discussed; attendees provided feedback about the Validator and Document Library project, also sharing their thoughts about what validation scenarios are important to them. Here is some of the feedback that was shared:
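To show the kind of rule that sits outside a structural schema check, here is a small Python sketch of a semantic rule in the spirit of a Schematron assertion. It is not Uche’s demo and it does not use Schematron itself: the document below is an invented format rather than real PresentationML, and the rule is a hypothetical example of a constraint described only in prose, namely that every slide reference must point at a slide that actually exists.

```python
import xml.etree.ElementTree as ET

# Invented markup: well-formed and structurally plausible, but the second
# reference points at a slide that is never defined.
DOC = """<deck>
  <slideList><slideRef target="s1"/><slideRef target="s9"/></slideList>
  <slides><slide id="s1"/><slide id="s2"/></slides>
</deck>"""


def check_slide_refs(root):
    """Report every slideRef whose target does not match a slide id."""
    known = {s.get("id") for s in root.findall("./slides/slide")}
    return [
        f"slideRef target '{r.get('target')}' has no matching slide"
        for r in root.findall("./slideList/slideRef")
        if r.get("target") not in known
    ]


errors = check_slide_refs(ET.fromstring(DOC))
for message in errors:
    print(message)
```

A validator built this way runs the schema check first and then a list of rule functions like this one, so each rule taken from the text of the standard can be added and tested on its own.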
Fraunhofer noted this feedback and hopes to incorporate it into their work.