-
I'm in London this week for meetings with SC 34 Ad-Hoc Group 1 (AHG1) and Ecma TC45. I suppose London is an appropriate place for standards bodies to meet, having had a newspaper called the Standard for nearly 200 years.
SC 34 AHG1 Meeting
The AHG1 meeting took place on Monday and Tuesday at the British Library. The goal of this meeting was to come up with a set of recommendations for SC 34 regarding the structure and activities of WG4, the new working group to be created for maintaining IS29500. Alex Brown, the AHG1 convenor, led the discussion and wrote the recommendations on the screen while the other attendees all discussed and debated various possibilities. At the end of the second day we reviewed the resulting document to tighten it up a bit, and the final outcome of the meeting is available for all to see on the SC 34 web site.
Jesper Lund Stocholm, one of two attendees representing Denmark, has posted his commentary on the meeting, and I don't have much to add to what he had to say. It was a productive two days, and we had the benefit of several deep experts in the JTC 1 Directives that govern this type of work, who kept us all focused on the task at hand.
IS29500 maintenance will proceed under the ISO jurisdiction in WG4 of SC 34, with participation by NB members and liaison organizations. The pace will be brisk, with conference calls taking place "as often as necessary (e.g. weekly)." I've participated in TC45's weekly calls for some time, and have recently started participating in the OASIS ODF TC's weekly calls. That sort of rhythm really keeps things moving, and I suspect it will keep the project editor and editing team very busy.
TC45 will play an active role in the maintenance process, and we all agreed that “SC 34 and Ecma will make best efforts to cooperate to ensure versions of ISO/IEC 29500 and Ecma 376 are kept synchronized.” Ecma will publish updated versions of the specification, as is their practice with jointly developed standards, but we all agreed that nobody benefits from having the latest version of the standard available from only one organization and not the other. That's why we included specific text about the goal of synchronization in our recommendations to SC 34.
Ecma TC45 Meeting
After two days of AHG1 meetings, today I attended the Ecma TC45 meeting, also hosted by the British Library. (Thank you, Adam.) We couldn't get started on maintenance, of course, but several of us were here for the AHG1 meeting anyway, so it was a good opportunity to review our plans for participating in SC 34's maintenance plan going forward.
The next step in IS29500 maintenance will take place at the SC 34 plenary in October (Jeju, Republic of Korea), where SC 34 members will make the final decisions on how WG4 will be structured, based on the recommendations from AHG1.
And now I'm off to find a pub. The Evening Standard unfortunately stopped selecting a Pub of the Year last year (why?), and the pubs on Weir Road are a bit too far away (hi Rob :-)), so I'll just wander the King's Cross area and see what I find. As they say in these parts, Cheers!
-
Eric White on Linq and Open XML. Eric White has been posting some great code samples lately, including a sample of functional programming with the Open XML SDK, how to transform flat data into a hierarchy with Linq, and a set of blog posts on Writing robust Linq to XML code that performs well.
Orcmid on Interop. Like Eric, Dennis Hamilton has been blogging prolifically this summer, with a series of posts in the last week on interop topics: Interoperability: The Experience of It, Interoperability: No Code Need Apply?, and Interoperability: What's the Self-Interest?.
PowerShell and Open XML. A few recent posts on working with Open XML documents from PowerShell scripts:
ODF translator version 2.0, Milestone 1 released. Functionality in the latest release of the ODF translator project is covered on the translator project blog. The next release is planned for August.
Office Binary Translator release. The M2 version of the binary translator project was released last week, and Wolfgang Keber has some additional info on the B2X blog.
Erika's Latest Project. Erika Ehrli is going to take a few months off to become a Mom! She'll be missed around here, but we're all very happy for her. I've heard rumors that Erika and Wouter (another new parent) have a bet going on whose kid will learn .NET development first.
-
We’re holding a workshop in Redmond on July 30 to talk about our ODF support in Office 2007 SP2. This is a free workshop that we’re doing as part of our Document Interoperability Initiative, to share information with the developer community and solicit feedback on how we can work together to improve interoperability.
We initially invited the members of the ODF TC, but it looks like we may have a few more slots open, so I thought I’d extend the invitation more broadly. I can’t guarantee that we’ll have room for everyone who would like to attend, but please let me know if you’re interested and I’ll see if we can get you in. Priority will be given to those who are working with the ODF format, including developers who are implementing ODF, organizations that are using ODF, and persons who are contributing to the evolution and maintenance of the ODF standard.
Here’s a general outline of what the workshop will include:
- Presentations from members of the Office product team, to explain our approach to ODF support and demonstrate some of the specific functionality we’re planning.
- Hands-on lab time, to give you an opportunity to try out a pre-release version of our ODF support.
- Discussion time, so that we can hear your feedback on topics of interest to document format implementers.
- Hosted dinner event at a nearby location.
This event will provide a unique opportunity to get an early look at how we’re planning to implement ODF support in Office. If you’d like to attend, please let me know via email (dmahugh at Microsoft dot com) as soon as possible. For those who aren’t able to attend, I’ll be covering some of the content from the workshop here on my blog afterward.
-
Brian Jones mentioned on his blog yesterday that he’s going to be pretty focused on Office 14 going forward, and that people should look here on my blog for information about Office’s approach to interoperability and file formats. I’d like to add a few more details regarding my own plans.
Over the next few weeks, I'll be transitioning to a PM position on our Office interoperability team, and I'm going to get much more involved in a variety of standards groups and activities including Ecma TC45 and the OASIS ODF TC, as well as INCITS V1 (the US SC34 mirror, where I've been an active member since January 2007).
Here's an overview of what I'll be doing:
- Through Ecma TC45, I'll be involved in Ecma's ongoing role in SC34's maintenance of the IS29500 spec.
- As implementers of ODF, we're getting involved in the ODF TC to stay on top of the latest changes to ODF and to help improve interoperability. I'll be our representative in that group, which I joined three weeks ago.
- I'll continue to be active in INCITS V1, where I'm looking forward to learning more about topic maps and other emerging standards.
- I'll be working with my colleagues to put on interop workshops and related events, to help developers create solutions that interoperate with Office and related Microsoft products.
- Here on my blog, Open XML will continue to be a major focus, but I'll also be covering our work with ODF, XPS, the binary formats, and related technologies.
It's exciting to be part of Office's commitment to interoperability, and I'm looking forward to working with the people I've met through the IS29500 process and many other interesting people I'll be meeting as we move forward. There are so many smart and passionate members of the standards groups I'm getting involved with, and we also have many smart and passionate people working on future versions of Office here at Microsoft. It’s an exciting time to be part of this great work.
-
We have announced today the release of version 1.0 of technical documentation for a variety of protocols used by Office, SharePoint and Exchange. This brings the total amount of protocol documentation available for free download on MSDN to about 50,000 pages, including 5,000 pages of new documentation for the binary file formats alone.
The binary format documentation is thorough and well-organized, and will be useful to anyone who needs to write code that reads existing binary documents. Here, for example, is a diagram showing the basic structure of tables in the .DOC format:
And here's a diagram that shows how metadata is organized in a spreadsheet file:
Speaking of spreadsheets, this new batch of documentation also covers the XLSB format that is used in Office 2007 to optimize performance in unusually large and complex spreadsheets. For a full list of what formats are covered, see the MSDN site.
The documentation is supported on a group of User Forums that are organized by general topic. If you have specific questions about the details, that's the place to get them answered. For more information about the things we're doing to enable interoperability with Office, see today's press release or the Microsoft interop site.
-
Here are some interesting blog posts from the last week, as well as a few items I had missed while I was at TechEd but discovered while catching up on my RSS feeds today ...
Wouter Van Vugt has an excellent post on "Embedding repeating elements in a schema-mapped document" that shows how to use the <customXml> element with repeating data structures in WordprocessingML tables.
Eric White had two posts last week of interest to users of the Open XML Power Tools, including "PowerTools Script that Generates a Table in an Open XML Document" and "Much Improved Approach for Automatic Document Generation using PowerTools." The latter introduces the work of Doug Finke, a new contributor to the Power Tools project on Codeplex.
Peter O'Kelly and Guy Creese have updated the Burton Group whitepaper "What’s Up, .DOC? ODF, Open XML, and the Revolutionary Implications of XML in Productivity Applications" to include information and perspective on recent events that have taken place since the approval of IS29500. The paper will be presented at Burton's Catalyst Conference in San Diego next week.
Rick Jelliffe's "A new test for objectivity" prompted an interesting exchange with Alex Brown on the matter of Britain's High Court's rejection of an application for judicial review of BSI's position on IS29500, and Alex's post "OOXML Hit into the Long Grass" adds more details on the matter.
Another recent Rick Jelliffe post worth reading is "the era of closed formats is dead," which includes some interesting perspective from South African standards activist Bob Jolliffe and prolific XML/Java author Elliotte Rusty Harold. (By the way, if you're responsible for maintaining lots of HTML content, and who isn't these days, check out ERH's handy new book "Refactoring HTML," which he's serializing on his The Cafes blog.)
Alex Brown has information about XML UK's XML in the Office conference, scheduled for this Thursday at Reading Town Hall. The event will include a full day of hands-on presentations from Alex, Inigo Surguy, and many others, including my UK colleague Matt Deacon.
-
When learning about Open XML or developing Open XML solutions, it's very common to find yourself wondering "what's the difference between these two documents?" For example, you may see something in a document that you'd like to recreate programmatically, so you want to know what markup would be required. Or perhaps you've modified a document manually (using Word, say) and you want to know what markup changes were caused by your edits.
In those situations, a diff utility can save a lot of time. I'll cover two good options for comparing Open XML documents below: Eric White's free command-line tool OpenXmlDiff, which comes with source code and can be useful in automated workflows, and Altova's commercial GUI tool DiffDog, which offers a variety of interactive capabilities for analyzing the differences between Open XML documents.
Eric White's OpenXmlDiff
Eric White recently had a need for an Open XML diff utility, and he decided to create a tool from scratch. The result was OpenXmlDiff, a simple and straightforward command-line tool that generates a report of all the differences between two Open XML documents. The diff report is written to console output, so you can easily redirect it to a text file or another program. Eric has put together a screencast that provides a concise 3-minute overview of how to download and use OpenXmlDiff.
OpenXmlDiff uses the XML Diff and Patch Utility (a free download on MSDN) to analyze the differences between the same XML part within two different Open XML documents. That tool identifies the specific changes that would be needed to transform one XML document (i.e., OPC part) into another, and OpenXmlDiff handles the details of the OPC package and generates a well-organized output report that summarizes differences at the package level and then shows the specific details for parts that differ.
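To make the two-level idea concrete (package summary first, then per-part details), here's a minimal Python sketch. It is not Eric's code and it doesn't use the XML Diff and Patch Utility; I've swapped in a plain line-based text diff from the standard library, and the function names (read_parts, diff_packages) are just ones I made up for illustration.

```python
import difflib
import zipfile


def read_parts(package):
    """Return {part name: bytes} for every part in an OPC (ZIP) package."""
    with zipfile.ZipFile(package) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


def diff_packages(old, new):
    """Summarize differences at the package level, then per changed part.

    Assumption: a line-based text diff stands in for a real XML diff.
    """
    old_parts, new_parts = read_parts(old), read_parts(new)
    report = {
        "added": sorted(set(new_parts) - set(old_parts)),
        "removed": sorted(set(old_parts) - set(new_parts)),
        "changed": {},
    }
    for name in sorted(set(old_parts) & set(new_parts)):
        if old_parts[name] == new_parts[name]:
            continue  # identical bytes, nothing to report
        before = old_parts[name].decode("utf-8", errors="replace").splitlines()
        after = new_parts[name].decode("utf-8", errors="replace").splitlines()
        report["changed"][name] = list(
            difflib.unified_diff(before, after, name, name, lineterm="")
        )
    return report
```

You'd call it as diff_packages("before.docx", "after.docx") (hypothetical file names). One caveat: Office usually writes each part as a single long line, so a line-based diff is much coarser than what a true XML diff tool reports.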
OpenXmlDiff is a good option if you want to study source code or extend a tool on your own, and it's also free. For those who want more of a slick GUI tool for comparing Open XML documents, there's another good option ...
Altova's DiffDog
I had the pleasure of meeting Alexander Falk in person at TechEd two weeks ago, and we had lunch and talked about our mutual interests including XML standards, Open XML tools, and — most of all — photography. Ironically, we got so busy talking about photography that I forgot to take a picture of Alex, but I did snap a couple of photos of their booth, where a variety of Altova employees (including Tara and Erin, pictured) were on hand to answer questions and do demos.
Altova's suite of XML tools has been evolving rapidly, and one of the areas where they've added quite a bit of functionality lately is Open XML support. For example, Alex blogged recently about how to use Altova's MapForce to auto-generate C# code that creates an Open XML spreadsheet, and their XMLSpy and StyleVision products also provide built-in support for the Open XML formats.
Another Altova tool that can be very useful to Open XML developers is DiffDog, a full-featured general-purpose diff/merge utility that supports any type of text file and also offers XML-aware differencing and support for Open XML documents (i.e., OPC packages) and ZIP files.
DiffDog's "XML-aware" approach means that it's smart about how to organize differences in XML documents for various visualizations (text view, grid view), and it also provides options for how to handle whitespace, CDATA, ordering of attributes (semantically meaningless, but sometimes important to a developer) and many other XML-specific details. And with full support for parts in ZIP packages, you can easily use DiffDog on Open XML documents. Download the free 30-day trial version and check it out.
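If you're wondering what "XML-aware" buys you over a plain text comparison, here's a small Python sketch of the general idea. This is not how DiffDog works internally (I don't know its implementation); it simply uses the canonicalization support in Python's standard library (3.8 or later) so that attribute order and, optionally, indentation whitespace don't count as differences. The function name xml_equivalent is my own.

```python
from xml.etree.ElementTree import canonicalize


def xml_equivalent(a, b, ignore_whitespace=True):
    """Compare two XML strings after canonicalization (C14N 2.0).

    Canonical form sorts attributes, so attribute order never matters.
    With ignore_whitespace=True, leading and trailing whitespace in text
    content is stripped too, so pretty-printing doesn't matter either.
    """
    return (
        canonicalize(a, strip_text=ignore_whitespace)
        == canonicalize(b, strip_text=ignore_whitespace)
    )
```

Two copies of the same part that differ only in attribute order or indentation compare as equal here, while a changed text value still shows up as a difference.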

-
Hot on the heels of the release of Version 1 of the Open XML SDK earlier this week, Eric White has posted information about a cool project that is built with the SDK: the Open XML Power Tools project.
The Open XML Power Tools are a set of PowerShell cmdlets that can be used to automate various document management tasks. Each cmdlet does one thing, such as removing metadata or adding a chart or digital signature, and the cmdlets can be strung together in PowerShell scripts. This enables creation of simple custom solutions entirely in PowerShell.

In addition, the Power Tools are great examples of how to use the Open XML Formats SDK for common tasks such as creating a chart in a spreadsheet or adding a watermark to a word-processing document. Full source code is available on the Codeplex site, and the first release includes all of these cmdlets:
- Accept-OpenXmlChange
- Add-OpenXmlContent
- Add-OpenXmlDigitalSignature
- Add-OpenXmlDocumentIndex
- Add-OpenXmlDocumentTOA
- Add-OpenXmlDocumentTOC
- Add-OpenXmlDocumentTOF
- Add-OpenXmlPicture
- Export-OpenXmlSpreadsheet
- Export-OpenXmlToHtml
- Export-OpenXmlWordprocessing
- Get-OpenXmlBackground
- Get-OpenXmlComment
- Get-OpenXmlCustomXmlData
- Get-OpenXmlDigitalSignature
- Get-OpenXmlDocument
- Get-OpenXmlFooter
- Get-OpenXmlHeader
- Get-OpenXmlStyle
- Get-OpenXmlTheme
- Get-OpenXmlWatermark
- Lock-OpenXmlDocument
- Remove-OpenXmlComment
- Remove-OpenXmlDigitalSignature
- Set-OpenXmlBackground
- Set-OpenXmlContentFormat
- Set-OpenXmlContentStyle
- Set-OpenXmlCustomXmlData
- Set-OpenXmlFooter
- Set-OpenXmlHeader
- Set-OpenXmlStyle
- Set-OpenXmlTheme
- Set-OpenXmlWatermark
The fastest way to learn more about the Open XML Power Tools project is to watch Eric White's screencast of the Power Tools in action. In just a few minutes you'll see exactly how to install and use the Power Tools, and some examples of what they can do.
TechEd/IT Pro
We've had some interesting discussions with IT pros this week at TechEd/IT Pro. I've worked an Open XML booth at quite a few developer events, but this is my first IT pro event, and there are some consistent differences in perspective.
The attendees this week are very interested in deployment, security, and file size issues, whereas last week (during TechEd/Developer) the attendees were most interested in API and tool support for the Open XML formats. So last week we had many questions about the Open XML SDK, and this week we've talked more about the Compatibility Pack and the Office Migration Planning Manager.
The release of the Open XML Power Tools this week is good timing for the IT pro crowd, and several people have left the booth today with plans to download the first release and put it immediately to work. We've also heard good feedback on some of the things that would be useful in future power tool cmdlets. It's an open-source project, so anybody can sign up and make contributions.
And in between Open XML discussions, I've been auditioning for a possible new job as you can see below. :-) Wish me luck!

-
Version 1 of the Open XML SDK is now available for free download from MSDN. Here are the links:
This is the final "go-live" Version 1, so you can deploy solutions that you build using this version.
Going forward, the next step for the SDK will be the CTP of Version 2 planned for this summer, as covered in the Open XML SDK roadmap. We'd love to have even more developers providing feedback on the CTPs through the Connect program; here are the steps for signing up:
- Sign up and register on the Microsoft Connect site. (You will need a Windows Live ID, which you can get here if you don't already have one.)
- Email your Live ID account name (not password :-)) to osdkhelp@microsoft.com.
- The folks running the program will then send you an invitation to join with more information.
Open XML Developer Resources
In addition to the links above, don't forget about all of the great Open XML content on MSDN. These resources can help you get up to speed quickly on Open XML development topics:
And if you're at TechEd this week in Orlando, come on by the Open XML booth, where we're giving away brand-new hot-off-the-presses Open XML developer posters, recently updated by Erika. We're in the building on International Drive with colossal clamshells above both ends ...

-
This afternoon at TechEd, Zeyad Rajabi demonstrated some of the ways developers can use the Open XML SDK to read and write Open XML documents.
He started with the basics, such as how to create a hello-world document (as shown on MSDN), then progressed to real-world scenarios such as a web site for creating customized sales contracts. That demo (attached) builds on the template that Tristan Davis created in his post on the Word team blog that first described how content controls work.
One cool thing in the attached demo is the button that creates 100 sales contracts at once, to demonstrate the level of performance you can get by creating Open XML documents with the SDK instead of automating the Office clients. On Zeyad's laptop it took about 1.5 seconds to create 100 documents.
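For readers who aren't on .NET, here's a rough Python sketch of the same general approach: write the package parts directly instead of automating Word. This is not Zeyad's demo and it doesn't use the Open XML SDK; it builds what should be a minimal three-part WordprocessingML package with the standard library only. The function names and the contract text are hypothetical, and I'm making no performance claims for it.

```python
import io
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

PACKAGE_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    "</Relationships>"
)

DOCUMENT_TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<w:document xmlns:w="' + W_NS + '"><w:body>'
    "<w:p><w:r><w:t>{text}</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


def make_docx(text):
    """Return the bytes of a one-paragraph .docx containing the given text."""
    document = DOCUMENT_TEMPLATE.format(text=escape(text))
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", PACKAGE_RELS)
        package.writestr("word/document.xml", document)
    return buffer.getvalue()


def make_batch(customers):
    """Build one document per customer name, keyed by a made-up file name."""
    return {
        f"contract-{index:03d}.docx": make_docx(f"Sales contract for {name}")
        for index, name in enumerate(customers, start=1)
    }
```

Calling make_batch with a list of 100 names gives you 100 in-memory documents that you can write to disk or stream from a web page, all without an Office client in the loop.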
Zeyad also showed off some of the upcoming functionality in Version 2 of the SDK, including a high-level API that provides simple access to the content of XML parts in an Open XML package. I'll post those samples when we start releasing CTPs of Version 2 this summer.
-
If you're at TechEd in Orlando this week, come on by the Open XML booth to get a copy of the recently updated Open XML developer poster (thanks Erika!). I'll be here for TechEd/Developer and TechEd/IT Pro, so hope to see you here.
What's new with the Open XML SDK. Tomorrow at 1:00, Zeyad Rajabi will be doing a session on the Open XML SDK, showing off some of the new functionality that will be available in the next CTP (which will be released any day now) as well as a few other things coming in Version 2. It's session OFC08-TLC, in Green Interactive Theater 1, which is right next to the Open XML booth. Don't let the long title scare you off ("Content-Level Document Editing and Manipulation Using Open XML SDK" or some such thing); it's going to be a fast-paced overview of some very cool technology for Open XML developers.
I took some time off last week, so am a bit behind in blogging and email (OK, more than a "bit"), but here are a few Open XML links from the last week or so that I've found interesting ...
Embedding simple values in a schema-mapped document. Wouter Van Vugt continues his exploration of how the <customXml> tag can be used to add semantic markup to documents, this time walking through some code samples and creating a simple method that can update data values in a schema-mapped document.
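To give a flavor of what such an update method can look like, here's a minimal Python sketch. It is not Wouter's code (his samples are in .NET), and the function name is my own. It assumes each mapped value is a simple leaf: a w:customXml element whose w:element attribute names the field and which contains ordinary runs of text.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)  # keep the familiar w: prefix on output


def set_custom_xml_value(document_xml, element_name, value):
    """Set the text inside every <w:customXml w:element="element_name">.

    Assumption: the element holds a simple value (no nested customXml).
    Returns the updated XML string and the number of elements updated.
    """
    root = ET.fromstring(document_xml)
    updated = 0
    for node in root.iter(f"{{{W}}}customXml"):
        if node.get(f"{{{W}}}element") != element_name:
            continue
        texts = list(node.iter(f"{{{W}}}t"))
        if not texts:
            continue  # nothing to update in an empty element
        texts[0].text = value
        for extra in texts[1:]:
            extra.text = ""  # clear any additional runs of old text
        updated += 1
    return ET.tostring(root, encoding="unicode"), updated
```

The input is the contents of the main document part (word/document.xml), and the result can be written back into the package in its place.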
Caroline Arms and Jon Udell. Few organizations are more involved in document archival considerations than the Library of Congress. LOC's Caroline Arms was recently interviewed by Jon Udell, and it's a wide-ranging discussion of the issues and options for document storage. (FYI, Caroline was one of the most active participants in Ecma TC45, and attended the IS29500 BRM with the Ecma delegation.)
Thought-provoking posts. A few recent posts on Open XML-related topics have made for interesting reading ...
That's all I have time for today, but more to come soon. We're having some great conversations with developers, so if you're here at TechEd come on down to the Open XML booth.
-
If you're an Office 2007 user, the image above probably looks pretty familiar. But look closely, and you'll see some Save-As options you've not seen here before: OpenDocument, and (unless you have the existing add-in) PDF & XPS.
This is a screen shot of a pre-release copy of SP2 (Service Pack 2) for the 2007 Microsoft Office System, showing the new document format standards that we'll be supporting starting with SP2. We've made an announcement of this and several related things today, and you can get all the details in the press release, and watch for additional perspective that will be provided by Gray Knowlton and Jason Matusow on their blogs today. I'll provide a few details here on our technical goals in implementing ODF, the planned user experience, and a few aspects that I think will be of particular interest to developers.
It's exciting to be announcing built-in support for these standards, but I think it's worth noting that this isn't a new direction for Office, but rather the continuation of a long tradition of adding support for additional formats. Office supports a variety of document formats, including the legacy binary formats, the Office Open XML formats, HTML, RTF, text, and many others. Support for multiple document formats provides many benefits to Office users, including the ability to choose the format that best meets each customer’s needs, whether those needs are interoperability, archiving, performance, or standards support. The addition of built-in ODF, PDF and XPS support is a logical step to address the evolving needs of Office users.
ODF, PDF and XPS as built-in file formats. We're making these new formats work just like the other formats Office supports, in a seamless and integrated fashion. When you click the Save As Type dropdown, for instance, you'll see a list which includes ODT, PDF and XPS in the same list where you'll find DOCX, DOC, and many other formats.
And of course users can set ODF to be the default format if they wish, the same way they would for other Word, Excel or PowerPoint formats.
What about the SourceForge translator projects? Microsoft has helped launch open-source translators on SourceForge that can be used for translating between Open XML formats and ODF, UOF or DAISY XML formats. We will continue to invest in these projects, because they enable scenarios that our built-in ODF support in Office doesn't address.
For example, the XSLTs from the SourceForge translator projects can be used by developers working on any platform, in any language. This provides many benefits:
- The translators have been used to enable interoperability on multiple platforms, including Novell (SUSE Linux) and Ubuntu redistributions.
- The translators enable domain-specific heterogeneous scenarios, such as batch processing through a command-line interface and web service/mail server/portal integration scenarios.
- The open-source XSLT architecture provides fundamental mapping information between ODF and Open XML that is useful for implementers.
- The ODF translator project will continue to enable ODF read/write in Office XP and Office 2003.
There is new information today about the planned release of v2.0 of the ODF translator on the ODF translator team blog. The SourceForge translator projects will continue to move forward, and Microsoft will continue to be an active participant in these projects.
Third-party translators. We anticipate that some developers may want to take over the default ODF load and save paths, so that they can plug in their own translators for ODF, and we'll be providing an API in SP2 that enables this scenario. This means that if a developer disagrees with the details of our approach and would like to implement ODF for Office in a different way, they're free to do so and can set it up such that when a user opens an ODT attached to an email or from their desktop, it will be loaded through their ODF code path.
That's a first look at what we're planning for ODF support in Office, and of course we'll have much more to say as we get closer to the release of SP2. In the meantime, I'm very interested in what other developers and implementers feel is most important in our support of ODF and other standards. How can we work together to improve document format interoperability for the Microsoft Office system? What can we do better?
-
Open XML can help you skip school. I've covered in the past how ISVs, corporate developers, information workers, and public-sector decision makers can benefit from Open XML standardization, but my colleague Pranav Wagh has addressed a hitherto-ignored segment of the market with his "Open XML for primary school goers." On a more serious note, Pranav has posted some useful content recently on how to create a PPTX with the packaging API, and then how to merge selected slides into a PPTX.
Interop interviews in Information Week. Information Week has published a recent interview with Sam Ramji (Sr. Director, Platform Strategy) and Tom Robertson (GM, Standards/Interoperability) that covers various aspects of Microsoft's approach to interoperability, standards, open source, and related topics.
Lost in translation. Rob Weir posted some performance tests for translation between document formats, prompting Jesper Lund Stocholm to exult "When Rob is right, he's right!" Both posts include interesting thoughts, and have prompted a good discussion of some of the core issues in translating between various document formats. On a related note, Sun announced recently the availability of version 1.2 of the Sun ODF plugin for Microsoft Office.
Is Jesper on a binge? Speaking of translation, I'm not sure how to interpret Jesper's latest post. It's written in Danish, and the only words I understand are beer and pilsner. Well, "alkoholisk" is a word I might be willing to guess at. Skål!
-
I was traveling on Tuesday/Wednesday this week and wasn't around when the news came out, but the DAISY translator plugin for Word and an update to the DAISY Pipeline were announced on Wednesday. Here are a few places to learn more about how it works and how it's being used ...
BLOGS:
PRESS COVERAGE:
LINKS:
-
The beta code for OpenOffice 3.0 has been released, and it includes Open XML read support:
Behind the scenes, OpenOffice.org 3.0 will support the upcoming OpenDocument Format (ODF) 1.2 standard, and is capable of opening files created with MS-Office 2007 or MS-Office 2008 for Mac OS X (.docx, .xlsx, .pptx, etc.). This is in addition to read and write support for the MS-Office binary file formats (.doc, .xls, .ppt, etc.).
Full text of the announcement: http://marketing.openoffice.org/3.0/announcementbeta.html