Transforming Open XML documents using XSLT is an interesting scenario, but before we can do so, we need to convert the Open XML document into the Flat OPC format. We then perform the XSLT transform, producing a new file in the Flat OPC format, and then convert back to Open XML (OPC) format. This post is one in a series of four posts that present this approach to transforming Open XML documents using XSLT. The four posts are:
Transforming Open XML Documents using XSLT
Presents an overview of the transformation process of Open XML documents using XSLT, and why this is important. Also presents the ‘Hello World’ XSLT transform of an Open XML document.
Transforming Open XML Documents to Flat OPC Format
Describes how to convert an Open XML (OPC) document into a Flat OPC document, and presents the C# function, OpcToFlat.
Transforming Flat OPC Format to Open XML Documents
Describes how to convert a Flat OPC file back to an Open XML document, and presents the C# function, FlatToOpc.
The Flat OPC Format (This Post)
Presents a description and examples of the Flat OPC format.
All of the parts in the OPC package are present in the Flat OPC XML document, but the parts are not files in a ZIP file; they are instead child elements of other XML elements, which contain information about the part such as its URI and content type. If the part is a binary part in the OPC document, the binary data is encoded as a base 64 string. All of the relations between parts are also stored as XML within the containing Flat OPC XML document.
The following snippet contains the first few lines (reformatted a bit to make it somewhat more readable) of a DOCX Open XML document that has been saved in this format. The elements that contain the parts of the original OPC file are in the http://schemas.microsoft.com/office/2006/xmlPackage namespace.
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
  <pkg:part pkg:name="/docProps/app.xml"
            pkg:contentType="application/vnd.openxmlformats-officedocument.extended-properties+xml">
    <pkg:xmlData>
      <Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"
                  xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
        <Template>Normal.dotm</Template>
        <TotalTime>1</TotalTime>
If the part is a binary part, then the XML (if formatted) will look something like this:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
  <!-- parts elided -->
  <pkg:part pkg:name="/word/media/image1.png"
            pkg:contentType="image/png"
            pkg:compression="store">
    <pkg:binaryData>iVBORw0KGgoAAAANSUhEUgAAAC8AAAAwCAIAAAAOxbS1AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA
B3RJTUUH2AkZFyYqE7SxgQAAAAd0RVh0QXV0aG9yAKmuzEgAAAAMdEVYdERlc2NyaXB0aW9uABMJ
ISMAAAAKdEVYdENvcHlyaWdodACsD8w6AAAADnRFWHRDcmVhdGlvbiB0aW1lADX3DwkAAAAJdEVY
dFNvZnR3YXJlAF1w/zoAAAALdEVYdERpc2NsYWltZXIAt8C0jwAAAAh0RVh0V2FybmluZwDAG+aH
nAupHOW4Uipc9/jjp+xskiue31xJkDGpnHUTxs8pRPTe8P9HxQL+H6KBS/qb/3X5f5ory38B6Ji6
BcSn9wYAAAAASUVORK5CYII=</pkg:binaryData>
  </pkg:part>
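To make the structure concrete, here is a minimal Python sketch that walks the pkg:part elements of a Flat OPC document and reports whether each part is stored as XML or as binary data. It is an illustration only, not the code from this series (which is in C#). The function name list_parts and the tiny two-part sample package are made up for this example.

```python
import base64
import xml.etree.ElementTree as ET

PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage"


def list_parts(flat_opc_xml):
    """Return (name, content_type, kind, payload) tuples, one per pkg:part.

    kind is "xml" (payload is the part's root Element) or
    "binary" (payload is the decoded bytes).
    """
    root = ET.fromstring(flat_opc_xml)
    parts = []
    for part in root.findall(f"{{{PKG_NS}}}part"):
        name = part.get(f"{{{PKG_NS}}}name")
        content_type = part.get(f"{{{PKG_NS}}}contentType")
        xml_data = part.find(f"{{{PKG_NS}}}xmlData")
        binary_data = part.find(f"{{{PKG_NS}}}binaryData")
        if xml_data is not None:
            # The part's own root element is the first child of pkg:xmlData.
            payload = next(iter(xml_data), None)
            parts.append((name, content_type, "xml", payload))
        elif binary_data is not None:
            # b64decode ignores the line breaks inside the encoded text.
            payload = base64.b64decode(binary_data.text or "")
            parts.append((name, content_type, "binary", payload))
    return parts


# Hypothetical sample: a two-part package, far smaller than a real document.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
SAMPLE = (
    '<?mso-application progid="Word.Document"?>'
    f'<pkg:package xmlns:pkg="{PKG_NS}">'
    '<pkg:part pkg:name="/docProps/app.xml" '
    'pkg:contentType="application/vnd.openxmlformats-officedocument.extended-properties+xml">'
    '<pkg:xmlData>'
    '<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">'
    '<TotalTime>1</TotalTime>'
    '</Properties>'
    '</pkg:xmlData>'
    '</pkg:part>'
    '<pkg:part pkg:name="/word/media/image1.png" pkg:contentType="image/png">'
    f'<pkg:binaryData>{base64.b64encode(PNG_SIGNATURE).decode("ascii")}</pkg:binaryData>'
    '</pkg:part>'
    '</pkg:package>'
)

for name, content_type, kind, _ in list_parts(SAMPLE):
    print(name, content_type, kind)
```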
There is an interesting characteristic of the binary data that is encoded as a base 64 string: the string must be broken into lines of 76 characters (the final line can be shorter), and there must not be a line break at the beginning or end of the data. No big deal, but we must take this into consideration when converting OPC to Flat OPC and back again.
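The line-breaking rule is easy to get subtly wrong, so here is a small Python sketch of one way to satisfy it. The function names are hypothetical, and the use of a bare "\n" as the line separator is an assumption made for this example, not something this post specifies.

```python
import base64


def encode_binary_part(data: bytes, width: int = 76) -> str:
    """Base 64 encode data and break it into lines of `width` characters.

    The result has no line break at the beginning or at the end.
    """
    encoded = base64.b64encode(data).decode("ascii")
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    # Assumption: "\n" is used as the separator between lines.
    return "\n".join(lines)


def decode_binary_part(text: str) -> bytes:
    """Reverse encode_binary_part: drop all whitespace, then decode."""
    return base64.b64decode("".join(text.split()))


# Round trip with some arbitrary bytes.
original = bytes(range(256)) * 2
text = encode_binary_part(original)
restored = decode_binary_part(text)
print(len(text.split("\n")), "lines, round trip ok:", restored == original)
```

Joining the chunks, rather than appending a line break after each chunk, is what keeps a stray line break from ending up at the end of the data.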
You can save documents in the Flat OPC format using Word 2007. In the ‘Save As’ dialog box, select ‘Word XML Document’ from the ‘Save as type’ drop-down list.
When you save in this format with Word 2007, Word adds the following XML processing instruction to the XML document:
<?mso-application progid="Word.Document"?>
This allows you to double-click on the file in Windows, and Word 2007 will open the document.
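If you need to check which application a Flat OPC file is associated with, you can read the mso-application processing instruction that appears before the root element (it is visible at the top of the snippets earlier in this post). The following Python sketch does this with the standard library's minidom. The function name get_mso_progid is made up for this example.

```python
import re
from xml.dom import minidom


def get_mso_progid(xml_text):
    """Return the progid from the mso-application processing instruction.

    Returns None if the document has no such processing instruction.
    """
    doc = minidom.parseString(xml_text)
    # Processing instructions that precede the root element are
    # children of the document node.
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "mso-application"):
            # node.data holds the text after the target, e.g. progid="Word.Document"
            match = re.search(r'progid="([^"]*)"', node.data)
            return match.group(1) if match else None
    return None


# Hypothetical, empty package used only to exercise the function.
SAMPLE = (
    '<?xml version="1.0" encoding="utf-8" standalone="yes"?>'
    '<?mso-application progid="Word.Document"?>'
    '<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage"/>'
)

print(get_mso_progid(SAMPLE))
```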
PowerPoint 2007 has an identical feature. You can save as type ‘PowerPoint XML Presentation’ to save in the Flat OPC format. PowerPoint adds the following processing instruction to the XML document:
<?mso-application progid="PowerPoint.Show"?>
Excel 2007 does not have the feature to allow you to save in the Flat OPC format. However, the approach of converting an OPC file to Flat OPC, transforming using XSLT, and then converting back still works. For consistency, we’ll call an XLSX that has been converted to Flat OPC an ‘Excel XML Spreadsheet’ document.
To summarize, there are three XML document formats that are varieties of the Flat OPC format:
Word XML Document
PowerPoint XML Presentation
Excel XML Spreadsheet
Note that the Flat OPC format is not the same as the ‘Word 2003 XML Document’ format. Those documents have a schema that is very different from the Flat OPC format.
The relations between parts are also stored in the Flat OPC XML document. The XML that contains the relations looks like this (reformatted):
<pkg:part pkg:name="/_rels/.rels"
          pkg:contentType="application/vnd.openxmlformats-package.relationships+xml">
  <pkg:xmlData>
    <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
      <Relationship Id="rId3"
                    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties"
                    Target="docProps/app.xml" />
      <Relationship Id="rId2"
                    Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties"
                    Target="docProps/core.xml" />
      <Relationship Id="rId1"
                    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
                    Target="word/document.xml" />
    </Relationships>
  </pkg:xmlData>
</pkg:part>
There are two varieties of relations:
Relations from the package to parts within the package
Relations from one part to another part
The above snippet of XML contains the relations from the package to parts within the package. The following snippet shows some relations between parts:
<pkg:part pkg:name="/word/_rels/document.xml.rels"
          pkg:contentType="application/vnd.openxmlformats-package.relationships+xml">
  <pkg:xmlData>
    <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
      <Relationship Id="rId3"
                    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings"
                    Target="webSettings.xml" />
      <Relationship Id="rId2"
                    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings"
                    Target="settings.xml" />
      <!-- some relations elided -->
    </Relationships>
  </pkg:xmlData>
</pkg:part>
When there is a relation between parts, the relation is defined from one part to another part. To determine the URI of the ‘from’ part, we need to parse the pkg:name attribute of the pkg:part element. For example, a pkg:name attribute with the value of /word/_rels/document.xml.rels indicates that this part contains the relations from the /word/document.xml part to other parts.
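The parsing described above can be sketched in a few lines of Python. The function name relationship_source is hypothetical, and returning "/" to stand for the package itself (for /_rels/.rels) is a convention chosen for this example.

```python
import posixpath


def relationship_source(rels_part_name):
    """Map the pkg:name of a relations part to the URI of its 'from' part.

    "/word/_rels/document.xml.rels" -> "/word/document.xml"
    "/_rels/.rels"                  -> "/"  (the package itself)
    """
    directory, filename = posixpath.split(rels_part_name)
    parent, rels_dir = posixpath.split(directory)
    if rels_dir != "_rels" or not filename.endswith(".rels"):
        raise ValueError(f"not a relations part: {rels_part_name!r}")
    source_name = filename[:-len(".rels")]
    if source_name == "":
        # Package-level relations have no source part.
        return "/"
    return posixpath.join(parent, source_name)


for name in ("/word/_rels/document.xml.rels", "/_rels/.rels"):
    print(name, "->", relationship_source(name))
```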
External relations are also stored in the relevant pkg:part element, indicated by a TargetMode attribute with the value of ‘External’:
<Relationship Id="rId4"
              Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
              Target="file:///C:\Users\ericwhit\Documents\08-09-25-Word-Xml-Document\WordXmlDocument\bin\Debug\OfficeButton.png"
              TargetMode="External" />
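When processing a Flat OPC document, it can be handy to separate external relations from internal ones. Here is a short Python sketch that reads a Relationships element and flags the external entries. The function name read_relationships and the sample targets (including the example.com URL) are invented for this illustration.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def read_relationships(relationships_xml):
    """Return one dict per Relationship element, with an 'external' flag."""
    root = ET.fromstring(relationships_xml)
    relations = []
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        relations.append({
            "id": rel.get("Id"),
            "type": rel.get("Type"),
            "target": rel.get("Target"),
            # A relation is external only when TargetMode says so.
            "external": rel.get("TargetMode") == "External",
        })
    return relations


# Hypothetical sample with one internal and one external relation.
SAMPLE = (
    f'<Relationships xmlns="{REL_NS}">'
    '<Relationship Id="rId2" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings" '
    'Target="settings.xml" />'
    '<Relationship Id="rId4" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" '
    'Target="http://example.com/OfficeButton.png" TargetMode="External" />'
    '</Relationships>'
)

for relation in read_relationships(SAMPLE):
    print(relation["id"], relation["target"], relation["external"])
```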