The Flat OPC Format


Transforming Open XML documents using XSLT is an interesting scenario, but an XSLT processor works on a single XML document, not on a ZIP package of parts, so we first need to convert the Open XML document into the Flat OPC format.  We then perform the XSLT transform, which produces a new file in the Flat OPC format, and finally convert that file back to Open XML (OPC) format.  This post is one of a series of four that present this approach.  The four posts are:

Transforming Open XML Documents using XSLT

Presents an overview of the transformation process of Open XML documents using XSLT, and why this is important.  Also presents the ‘Hello World’ XSLT transform of an Open XML document.

Transforming Open XML Documents to Flat OPC Format

This post describes the process of conversion of an Open XML (OPC) document into a Flat OPC document, and presents the C# function, OpcToFlat.

Transforming Flat OPC Format to Open XML Documents

This post describes the process of conversion of a Flat OPC file back to an Open XML document, and presents the C# function, FlatToOpc.

The Flat OPC Format (This Post)

Presents a description and examples of the Flat OPC format.

 

The Flat OPC format stores an entire OPC package in a single XML document.  Every part in the OPC package appears in the Flat OPC document, but instead of being a file in a ZIP archive, each part is wrapped in a pkg:part element whose attributes record the part's URI and content type.  If the part is a binary part in the OPC package, its data is encoded as a Base64 string.  All of the relations between parts are also stored as XML within the same Flat OPC document.

The following snippet contains the first few lines (reformatted a bit to make it somewhat more readable) of a DOCX Open XML document that has been saved in this format.  The elements that contain the parts of the original OPC file are in the http://schemas.microsoft.com/office/2006/xmlPackage namespace.

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<pkg:package
  xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
  <pkg:part
    pkg:name="/docProps/app.xml"
    pkg:contentType="application/vnd.openxmlformats-officedocument.extended-properties+xml">
    <pkg:xmlData>
      <Properties
        xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"
        xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
        <Template>Normal.dotm</Template>
        <TotalTime>1</TotalTime>

 

If the part is a binary part, then the XML (if formatted) will look something like this:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
  <!-- parts elided -->
  <pkg:part pkg:name="/word/media/image1.png"
            pkg:contentType="image/png"
            pkg:compression="store">
    <pkg:binaryData>iVBORw0KGgoAAAANSUhEUgAAAC8AAAAwCAIAAAAOxbS1AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA
B3RJTUUH2AkZFyYqE7SxgQAAAAd0RVh0QXV0aG9yAKmuzEgAAAAMdEVYdERlc2NyaXB0aW9uABMJ
ISMAAAAKdEVYdENvcHlyaWdodACsD8w6AAAADnRFWHRDcmVhdGlvbiB0aW1lADX3DwkAAAAJdEVY
dFNvZnR3YXJlAF1w/zoAAAALdEVYdERpc2NsYWltZXIAt8C0jwAAAAh0RVh0V2FybmluZwDAG+aH
nAupHOW4Uipc9/jjp+xskiue31xJkDGpnHUTxs8pRPTe8P9HxQL+H6KBS/qb/3X5f5ory38B6Ji6
BcSn9wYAAAAASUVORK5CYII=</pkg:binaryData>
  </pkg:part>
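To make the structure concrete, here is a minimal Python sketch (Python is used only for illustration; the other posts in this series use C#) that lists the parts in a Flat OPC document and reports whether each one stores its content as XML or as Base64 text.  The function name list_parts and the tiny SAMPLE document are hypothetical, and the Base64 payload in the sample is a placeholder, not a real PNG.

```python
import xml.etree.ElementTree as ET

# Namespace of the pkg:package / pkg:part wrapper elements.
PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage"


def list_parts(flat_opc_xml):
    """Return a (name, content_type, kind) tuple for each pkg:part.

    kind is "xml" for parts with a pkg:xmlData child, "binary" for parts
    with a pkg:binaryData child, and "unknown" otherwise.
    """
    root = ET.fromstring(flat_opc_xml)
    parts = []
    for part in root.findall(f"{{{PKG_NS}}}part"):
        # The attributes are namespace-qualified (pkg:name, pkg:contentType).
        name = part.get(f"{{{PKG_NS}}}name")
        content_type = part.get(f"{{{PKG_NS}}}contentType")
        if part.find(f"{{{PKG_NS}}}xmlData") is not None:
            kind = "xml"
        elif part.find(f"{{{PKG_NS}}}binaryData") is not None:
            kind = "binary"
        else:
            kind = "unknown"
        parts.append((name, content_type, kind))
    return parts


# Hypothetical two-part document used only to exercise the function.
SAMPLE = """<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
  <pkg:part pkg:name="/docProps/app.xml"
            pkg:contentType="application/vnd.openxmlformats-officedocument.extended-properties+xml">
    <pkg:xmlData>
      <Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">
        <Template>Normal.dotm</Template>
      </Properties>
    </pkg:xmlData>
  </pkg:part>
  <pkg:part pkg:name="/word/media/image1.png"
            pkg:contentType="image/png"
            pkg:compression="store">
    <pkg:binaryData>aGVsbG8=</pkg:binaryData>
  </pkg:part>
</pkg:package>"""

if __name__ == "__main__":
    for name, content_type, kind in list_parts(SAMPLE):
        print(kind, name, content_type)
```

Note that pkg:name and pkg:contentType are attributes in the pkg namespace, so they have to be looked up by their qualified names rather than as plain "name" and "contentType".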

 

 

The binary data that is encoded as a Base64 string has an interesting characteristic: the string must be broken into lines of 76 characters, and there must not be a line break at the beginning or end of the data.  No big deal, but we must take this into consideration when converting OPC to Flat OPC and back again.
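The line-breaking rule can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the function names are hypothetical, and the use of "\n" as the line separator is my assumption, since the text above only says that the lines are 76 characters long.

```python
import base64


def encode_binary_part(data, width=76):
    """Base64-encode bytes and break the text into lines of `width` characters.

    The result has no line break at the beginning or at the end.
    """
    encoded = base64.b64encode(data).decode("ascii")
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    # Joining (rather than appending a separator to every line) guarantees
    # that the text neither starts nor ends with a line break.
    return "\n".join(lines)


def decode_binary_part(text):
    """Decode line-broken Base64 text back to bytes."""
    # With the default validate=False, b64decode skips characters outside the
    # Base64 alphabet, which includes the line breaks.
    return base64.b64decode(text)


if __name__ == "__main__":
    payload = bytes(range(256))
    text = encode_binary_part(payload)
    print(text)
    print(decode_binary_part(text) == payload)
```

Python's own base64.encodebytes also emits 76-character lines, but it appends a trailing newline, which is why the sketch joins the chunks by hand.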

You can save documents in the Flat OPC format using Word 2007.  In the ‘Save As’ dialog box, select ‘Word XML Document’ from the ‘Save as type’ drop-down list.

When you save in this format with Word 2007, Word adds the following XML processing instruction to the XML document:

<?mso-application progid="Word.Document"?>

 

With this processing instruction in place, double-clicking the file in Windows opens the document in Word 2007.

PowerPoint 2007 has an identical feature.  You can save as type ‘PowerPoint XML Presentation’ to save in the Flat OPC format.  PowerPoint adds the following processing instruction to the XML document:

<?mso-application progid="PowerPoint.Show"?>
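If you need to find out which application a Flat OPC file is associated with, the processing instruction can be read programmatically.  The following Python sketch uses xml.dom.minidom, which keeps processing instructions that appear before the root element; the function name mso_application_progid is hypothetical.

```python
import re
from xml.dom import minidom


def mso_application_progid(flat_opc_xml):
    """Return the progid from the mso-application processing instruction.

    Returns None if the document has no such instruction.
    """
    doc = minidom.parseString(flat_opc_xml)
    # Top-level processing instructions are children of the document node.
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "mso-application"):
            # node.data is the raw text after the target: progid="..."
            match = re.search(r'progid="([^"]*)"', node.data)
            return match.group(1) if match else None
    return None


if __name__ == "__main__":
    sample = (
        '<?xml version="1.0" encoding="utf-8" standalone="yes"?>\n'
        '<?mso-application progid="PowerPoint.Show"?>\n'
        '<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage"/>'
    )
    print(mso_application_progid(sample))
```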

 

Excel 2007 cannot save in the Flat OPC format.  However, the approach of converting an XLSX file to Flat OPC, transforming it using XSLT, and then converting back still works.  For consistency, we’ll call an XLSX file that has been converted to Flat OPC an ‘Excel XML Spreadsheet’ document.

To summarize, there are three XML document formats that are varieties of the Flat OPC format:

  • Word XML Document
  • PowerPoint XML Presentation
  • Excel XML Spreadsheet

Note that the Flat OPC format is not the same as the ‘Word 2003 XML Document’ format.  Those documents have a schema that is very different from the Flat OPC format.

The relations between parts are also stored in the Flat OPC XML document.  The XML that contains the relations looks like this (reformatted):

<pkg:part pkg:name="/_rels/.rels"
          pkg:contentType="application/vnd.openxmlformats-package.relationships+xml">
  <pkg:xmlData>
    <Relationships
        xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
      <Relationship
        Id="rId3"
        Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties"
        Target="docProps/app.xml" />
      <Relationship
        Id="rId2"
        Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties"
        Target="docProps/core.xml" />
      <Relationship
        Id="rId1"
        Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
        Target="word/document.xml" />
    </Relationships>
  </pkg:xmlData>
</pkg:part>

 

There are two varieties of relations:

  • The package in the OPC file has relations to parts.  If the pkg:name attribute of the pkg:part element has the value "/_rels/.rels", the part contains the relations from the OPC package to parts within the package.
  • Parts have relations to other parts.  If a relations part (one whose pkg:contentType is application/vnd.openxmlformats-package.relationships+xml) has a pkg:name other than "/_rels/.rels", it contains the relations from one part to other parts.
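The two cases in the list above can be told apart from the pkg:name and pkg:contentType attributes alone.  Here is a small Python sketch of that check; the function name relationship_scope and its return values are hypothetical conventions chosen for this example.

```python
# Content type that marks a pkg:part as holding relations.
RELS_CONTENT_TYPE = "application/vnd.openxmlformats-package.relationships+xml"

# Name of the part that holds the relations from the package itself.
PACKAGE_RELS_NAME = "/_rels/.rels"


def relationship_scope(part_name, content_type):
    """Classify a pkg:part by its pkg:name and pkg:contentType values.

    Returns "package" for relations from the package to parts, "part" for
    relations from one part to other parts, and None if the pkg:part does
    not hold relations at all.
    """
    if content_type != RELS_CONTENT_TYPE:
        return None
    if part_name == PACKAGE_RELS_NAME:
        return "package"
    return "part"


if __name__ == "__main__":
    print(relationship_scope("/_rels/.rels", RELS_CONTENT_TYPE))
    print(relationship_scope("/word/_rels/document.xml.rels", RELS_CONTENT_TYPE))
    print(relationship_scope("/word/media/image1.png", "image/png"))
```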

The above snippet of XML contains the relations from the package to parts within the package.  The following snippet shows some relations between parts:

<pkg:part pkg:name="/word/_rels/document.xml.rels"
          pkg:contentType="application/vnd.openxmlformats-package.relationships+xml">
  <pkg:xmlData>
    <Relationships
        xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
      <Relationship Id="rId3"
        Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings"
        Target="webSettings.xml" />
      <Relationship Id="rId2"
        Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings"
        Target="settings.xml" />
      <!-- some relations elided -->
    </Relationships>
  </pkg:xmlData>
</pkg:part>

 

When there is a relation between parts, the relation is defined from one part to another part.  To determine the URI of the ‘from’ part, we need to parse the pkg:name attribute of the pkg:part element.  For example, a pkg:name attribute with the value of /word/_rels/document.xml.rels indicates that this part contains the relations from the /word/document.xml part to other parts.
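The parsing described above amounts to removing the _rels folder and the .rels suffix from the name.  A minimal Python sketch, assuming part names always use forward slashes as in the examples in this post; the function name source_part_for_rels is hypothetical, and returning "/" for the package-level relations part is a convention chosen for this sketch.

```python
import posixpath


def source_part_for_rels(rels_part_name):
    """Map the name of a relations part to the URI of its 'from' part.

    "/word/_rels/document.xml.rels" maps to "/word/document.xml".
    "/_rels/.rels" maps to "/", standing for the package itself.
    """
    directory, filename = posixpath.split(rels_part_name)
    parent, rels_folder = posixpath.split(directory)
    if rels_folder != "_rels" or not filename.endswith(".rels"):
        raise ValueError(f"not a relations part name: {rels_part_name!r}")
    source_file = filename[:-len(".rels")]
    if not source_file:
        # "/_rels/.rels" has no file name in front of ".rels": the source
        # is the package, not a part.
        return "/"
    return posixpath.join(parent, source_file)


if __name__ == "__main__":
    print(source_part_for_rels("/word/_rels/document.xml.rels"))
    print(source_part_for_rels("/_rels/.rels"))
```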

External relations are also stored in the relevant pkg:part element, indicated by a TargetMode attribute with the value of ‘External’:

<Relationship
  Id="rId4"
  Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
  Target="file:///C:\Users\ericwhit\Documents\08-09-25-Word-Xml-Document\WordXmlDocument\bin\Debug\OfficeButton.png"
  TargetMode="External" />
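As an illustration, external relations can be picked out of a Relationships element by checking the TargetMode attribute.  In this Python sketch the function name external_targets and the sample XML (including its shortened file URI) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Relationships and Relationship elements.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def external_targets(relationships_xml):
    """Return (Id, Target) pairs for relations marked TargetMode="External"."""
    root = ET.fromstring(relationships_xml)
    return [
        (rel.get("Id"), rel.get("Target"))
        for rel in root.findall(f"{{{RELS_NS}}}Relationship")
        if rel.get("TargetMode") == "External"
    ]


# Hypothetical Relationships element with one internal and one external relation.
SAMPLE_RELS = """<Relationships
    xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId2"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings"
    Target="settings.xml" />
  <Relationship Id="rId4"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
    Target="file:///C:/example/OfficeButton.png"
    TargetMode="External" />
</Relationships>"""

if __name__ == "__main__":
    for rel_id, target in external_targets(SAMPLE_RELS):
        print(rel_id, target)
```

Unlike pkg:name, the Id, Target, and TargetMode attributes are not namespace-qualified, so they are read by their plain names.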

 

Posted: Monday, September 29, 2008 1:36 PM by EricWhite