Transforming Open XML documents using XSLT is an interesting scenario, but before we can do so, we need to convert the Open XML document into the Flat OPC format. We then perform the XSLT transform, producing a new file in the Flat OPC format, and then convert back to Open XML (OPC) format. This post is one in a series of four posts that present this approach to transforming Open XML documents using XSLT. The four posts are:
Transforming Open XML Documents using XSLT
Presents an overview of the transformation process of Open XML documents using XSLT, and why this is important. Also presents the ‘Hello World’ XSLT transform of an Open XML document.
Transforming Open XML Documents to Flat OPC Format
Describes how to convert an Open XML (OPC) document into a Flat OPC document, and presents the C# function, OpcToFlat.
Transforming Flat OPC Format to Open XML Documents
Describes how to convert a Flat OPC file back to an Open XML document, and presents the C# function, FlatToOpc.
The Flat OPC Format (This Post)
Presents a description and examples of the Flat OPC format.
All of the parts in the OPC package are there in the Flat OPC XML document, but the parts are not files in a ZIP file; they are instead child elements of other XML elements, which contain information about the part such as its URI and content type. If the part is a binary part in the OPC document, the binary data is encoded in a base 64 string. All of the relations between parts are also stored as XML within the containing Flat OPC XML document.
The following snippet contains the first few lines (reformatted a bit to make it somewhat more readable) of a DOCX Open XML document that has been saved in this format. The elements that contain the parts of the original OPC file are in the http://schemas.microsoft.com/office/2006/xmlPackage namespace.
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
  <pkg:part pkg:name="/docProps/app.xml"
            pkg:contentType="application/vnd.openxmlformats-officedocument.extended-properties+xml">
    <pkg:xmlData>
      <Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"
                  xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
        <Template>Normal.dotm</Template>
        <TotalTime>1</TotalTime>
If the part is a binary part, then the XML (if formatted) will look something like this:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
  <!-- parts elided -->
  <pkg:part pkg:name="/word/media/image1.png"
            pkg:contentType="image/png"
            pkg:compression="store">
    <pkg:binaryData>iVBORw0KGgoAAAANSUhEUgAAAC8AAAAwCAIAAAAOxbS1AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA
B3RJTUUH2AkZFyYqE7SxgQAAAAd0RVh0QXV0aG9yAKmuzEgAAAAMdEVYdERlc2NyaXB0aW9uABMJ
ISMAAAAKdEVYdENvcHlyaWdodACsD8w6AAAADnRFWHRDcmVhdGlvbiB0aW1lADX3DwkAAAAJdEVY
dFNvZnR3YXJlAF1w/zoAAAALdEVYdERpc2NsYWltZXIAt8C0jwAAAAh0RVh0V2FybmluZwDAG+aH
nAupHOW4Uipc9/jjp+xskiue31xJkDGpnHUTxs8pRPTe8P9HxQL+H6KBS/qb/3X5f5ory38B6Ji6
BcSn9wYAAAAASUVORK5CYII=</pkg:binaryData>
  </pkg:part>
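To make the structure concrete, here is a minimal sketch in Python (not the C# code from this series) that walks a Flat OPC document and reports each part's name, content type, and whether it holds XML or binary data. The function name `list_parts` and the tiny sample document are assumptions made for illustration; only the `pkg` namespace and the element and attribute names come from the snippets above.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace of the Flat OPC wrapper elements (from the snippets above).
PKG = "http://schemas.microsoft.com/office/2006/xmlPackage"


def list_parts(flat_opc_xml):
    """Return (name, content_type, kind) for every pkg:part in the document."""
    root = ET.fromstring(flat_opc_xml)
    parts = []
    for part in root.findall(f"{{{PKG}}}part"):
        name = part.get(f"{{{PKG}}}name")
        content_type = part.get(f"{{{PKG}}}contentType")
        # A part carries either pkg:xmlData or pkg:binaryData.
        is_binary = part.find(f"{{{PKG}}}binaryData") is not None
        parts.append((name, content_type, "binary" if is_binary else "xml"))
    return parts


def read_binary_part(flat_opc_xml, part_name):
    """Decode the base 64 payload of the named binary part into bytes."""
    root = ET.fromstring(flat_opc_xml)
    for part in root.findall(f"{{{PKG}}}part"):
        if part.get(f"{{{PKG}}}name") == part_name:
            data = part.find(f"{{{PKG}}}binaryData")
            # b64decode skips the line breaks inside the encoded text.
            return base64.b64decode(data.text)
    raise KeyError(part_name)


# Hypothetical two-part document, much smaller than a real DOCX.
SAMPLE = """<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
  <pkg:part pkg:name="/word/document.xml" pkg:contentType="application/xml">
    <pkg:xmlData><doc/></pkg:xmlData>
  </pkg:part>
  <pkg:part pkg:name="/word/media/image1.png" pkg:contentType="image/png">
    <pkg:binaryData>iVBORw0KGgo=</pkg:binaryData>
  </pkg:part>
</pkg:package>"""

for entry in list_parts(SAMPLE):
    print(entry)
```

The sample's binary payload is just the eight-byte PNG signature, which keeps the example short while still exercising the base 64 decoding path.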
There is an interesting characteristic of the binary data that is encoded into a base 64 string: the string must be broken into lines of 76 characters, and there must not be a line break at the beginning or end of the data. No big deal, but we must take this into consideration when converting OPC to Flat OPC and back again.
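The line-breaking rule is easy to get wrong, so here is a small Python sketch of it. The helper names `encode_binary_part` and `decode_binary_part` are hypothetical, and the choice of a bare newline as the line separator is an assumption; the C# functions in the companion posts may do this differently.

```python
import base64


def encode_binary_part(data, width=76):
    """Base 64 encode bytes, broken into lines of `width` characters.

    The result has no line break before the first line or after the last.
    """
    encoded = base64.b64encode(data).decode("ascii")
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return "\n".join(lines)


def decode_binary_part(text):
    """Reverse encode_binary_part; line breaks in the text are ignored."""
    return base64.b64decode(text)


payload = bytes(range(100))           # 100 arbitrary bytes
text = encode_binary_part(payload)
print([len(line) for line in text.split("\n")])
print(decode_binary_part(text) == payload)
```

Joining the chunks, rather than appending a separator after each one, is what guarantees there is no trailing line break.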
You can save documents in the Flat OPC format using Word 2007. In the ‘Save As’ dialog box, select ‘Word XML Document’ from the ‘Save as type’ drop-down list.
When you save in this format with Word 2007, Word adds the following XML processing instruction to the XML document:
<?mso-application progid="Word.Document"?>
This allows you to double-click on the file in Windows, and Word 2007 will open the document.
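If you need to check which application a Flat OPC file is associated with, you can read the mso-application processing instruction back out of the document. The following Python sketch does this with the standard library; the function name `application_progid` is hypothetical, and the sample uses the Word progid that appears in the snippets earlier in this post.

```python
import re
from xml.dom import minidom


def application_progid(flat_opc_xml):
    """Return the progid from the mso-application instruction, or None."""
    doc = minidom.parseString(flat_opc_xml)
    # Processing instructions in the prolog are direct children of the document.
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "mso-application"):
            # node.data holds the raw text: progid="Word.Document"
            match = re.search(r'progid="([^"]*)"', node.data)
            return match.group(1) if match else None
    return None


SAMPLE = (
    '<?mso-application progid="Word.Document"?>'
    '<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage"/>'
)

print(application_progid(SAMPLE))
```

A file without the instruction yields None, which is what you would expect for a Flat OPC file produced by your own conversion code unless that code adds the instruction itself.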
PowerPoint 2007 has an identical feature. You can save as type ‘PowerPoint XML Presentation’ to save in the Flat OPC format. PowerPoint adds the following processing instruction to the XML document:
<?mso-application progid="PowerPoint.Show"?>
Excel 2007 does not have the feature to allow you to save in the Flat OPC format. However, the approach of converting an OPC file to Flat OPC, transforming using XSLT, and then converting back still works. For consistency, we’ll call an XLSX that has been converted to Flat OPC an ‘Excel XML Spreadsheet’ document.
To summarize, there are three XML document formats that are varieties of the Flat OPC format:
Word XML Document
PowerPoint XML Presentation
Excel XML Spreadsheet
Note that the Flat OPC format is not the same as the ‘Word 2003 XML Document’ format. Those documents have a schema that is very different from the Flat OPC format.
The relations between parts are also stored in the Flat OPC XML document. The XML that contains the relations looks like this (reformatted):
<pkg:part pkg:name="/_rels/.rels"
          pkg:contentType="application/vnd.openxmlformats-package.relationships+xml">
  <pkg:xmlData>
    <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
      <Relationship Id="rId3"
                    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties"
                    Target="docProps/app.xml" />
      <Relationship Id="rId2"
                    Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties"
                    Target="docProps/core.xml" />
      <Relationship Id="rId1"
                    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
                    Target="word/document.xml" />
    </Relationships>
  </pkg:xmlData>
</pkg:part>
There are two varieties of relations:
Relations from the package to a part
Relations from one part to another part
The above snippet of XML contains the relations from the package to parts within the package. The following snippet shows some relations between parts:
<pkg:part pkg:name="/word/_rels/document.xml.rels"
          pkg:contentType="application/vnd.openxmlformats-package.relationships+xml">
  <pkg:xmlData>
    <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
      <Relationship Id="rId3"
                    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings"
                    Target="webSettings.xml" />
      <Relationship Id="rId2"
                    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings"
                    Target="settings.xml" />
      <!-- some relations elided -->
    </Relationships>
  </pkg:xmlData>
</pkg:part>
When there is a relation between parts, the relation is defined from one part to another part. To determine the URI of the ‘from’ part, we need to parse the pkg:name attribute of the pkg:part element. For example, a pkg:name attribute with the value of /word/_rels/document.xml.rels indicates that this part contains the relations from the /word/document.xml part to other parts.
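The parsing rule described above can be sketched in a few lines of Python. The function name `relationship_source` is hypothetical, and returning "/" for the package-level /_rels/.rels part is a convention chosen for this example, not something the format mandates.

```python
import posixpath


def relationship_source(rels_part_name):
    """Map the name of a .rels part to the URI of the 'from' part.

    Returns "/" when the part holds the package-level relations.
    """
    directory, filename = posixpath.split(rels_part_name)
    parent, rels_dir = posixpath.split(directory)
    if rels_dir != "_rels" or not filename.endswith(".rels"):
        raise ValueError(f"not a relationships part: {rels_part_name}")
    source_name = filename[:-len(".rels")]
    if source_name == "":
        # "/_rels/.rels" has no file name before the extension:
        # these are relations from the package itself.
        return "/"
    return posixpath.join(parent, source_name)


print(relationship_source("/word/_rels/document.xml.rels"))
print(relationship_source("/_rels/.rels"))
```

In other words, the 'from' part lives in the folder that contains the _rels folder, and its file name is the name of the relationships part with the trailing .rels removed.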
External relations are also stored in the relevant pkg:part element, indicated by a TargetMode attribute with the value of ‘External’:
<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="file:///C:\Users\ericwhit\Documents\08-09-25-Word-Xml-Document\WordXmlDocument\bin\Debug\OfficeButton.png" TargetMode="External" />
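When processing a relationships part, you will usually want to treat external relations differently from internal ones, because an external target is not a part in the package. Here is a minimal Python sketch that separates the two by looking at the TargetMode attribute. The function name `split_relationships`, the sample document, and its file path are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Relationships element (from the snippets above).
RELS = "http://schemas.openxmlformats.org/package/2006/relationships"


def split_relationships(relationships_xml):
    """Return (internal, external) lists of (Id, Target) pairs."""
    root = ET.fromstring(relationships_xml)
    internal, external = [], []
    for rel in root.findall(f"{{{RELS}}}Relationship"):
        entry = (rel.get("Id"), rel.get("Target"))
        # A missing TargetMode attribute means the target is inside the package.
        if rel.get("TargetMode") == "External":
            external.append(entry)
        else:
            internal.append(entry)
    return internal, external


# Hypothetical relationships content with one internal and one external target.
SAMPLE = """<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId2" Type="http://example.invalid/settings" Target="settings.xml" />
  <Relationship Id="rId4" Type="http://example.invalid/image"
                Target="file:///C:/example/OfficeButton.png" TargetMode="External" />
</Relationships>"""

internal, external = split_relationships(SAMPLE)
print(internal)
print(external)
```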
Thanks so much for this post. It helped improve my knowledge of the flat OPC format a great deal.
Question for you: do you know if PowerPoint 2008 for the Mac can understand the Flat OPC format at all? I have a .xml file that can be opened as a presentation in PowerPoint 2007 on the PC but can't be opened in PowerPoint 2008 on the Mac. I don't see the "Save as PowerPoint XML Presentation" option on the Mac like the PC has, so I was wondering if the Mac handles the pptx formats only.
Thanks in advance,
I don't know very much about Mac support for Open XML or the Flat OPC format, but if it won't open the file and doesn't offer the option to save as PowerPoint XML, then I would guess it's not currently supported. As far as I know, the product team hasn't announced anything regarding this. Wish I had more info for you, but unfortunately, I don't.
Could you please tell me what mime type I should use when serving a Flat OPC XML document?
thanks .. Jason
You said - Excel 2007 does not have the feature to allow you to save in the Flat OPC format. However, the approach of converting an XLSX file to Flat OPC, transforming using XSLT, and then converting back still works.
Can you explain how?
Are you trying to directly display the flat OPC in a browser window? If so, I haven't done any work in this area, but I think that you would have to take an approach similar to the Open XML Viewer (codeplex.com/openxmlviewer).
The difference between Excel and the other two apps (Word and PowerPoint) is that Word and PowerPoint have the capability of directly reading the XML. But converting an OPC file to Flat OPC, and then back to OPC, results in a normal Open XML document, which Excel can then open. Excel doesn't know that you've done a transformation to Flat OPC and back, nor does it care. (In the article, I misspoke, and wrote it as "XLSX to flat OPC and then back to OPC". I was thinking OPC, but wrote XLSX. :P )
I'm still tripping over one thing: what is the relationship between an OpenXmlPackage and a Package? It seems to me that they are both the entire Flat OPC XML document.
??? - thanks - dave