Transforming Flat OPC Format to Open XML Documents

Transforming Open XML documents using XSLT is an interesting scenario, but before we can do so, we need to convert the Open XML document into the Flat OPC format.  We then perform the XSLT transform, producing a new file in the Flat OPC format, and then convert back to Open XML (OPC) format.  This post is one in a series of four posts that present this approach to transforming Open XML documents using XSLT.  The four posts are:

Transforming Open XML Documents using XSLT

Presents an overview of the transformation process of Open XML documents using XSLT, and why this is important.  Also presents the ‘Hello World’ XSLT transform of an Open XML document.

Transforming Open XML Documents to Flat OPC Format

This post describes the process of conversion of an Open XML (OPC) document into a Flat OPC document, and presents the C# function, OpcToFlat.

Transforming Flat OPC Format to Open XML Documents (This Post)

This post describes the process of conversion of a Flat OPC file back to an Open XML document, and presents the C# function, FlatToOpc.

The Flat OPC Format

Presents a description and examples of the Flat OPC format.

About the Code

The code presented in this post uses LINQ to XML and System.IO.Packaging to perform the conversion from Flat OPC to an Open XML (OPC) document.

The signature of the function to convert from Flat OPC to an Open XML document is:

static void FlatToOpc(XDocument doc, string docxPath)

You pass in an XDocument object and the path to the new Open XML document.  The method creates an Open XML document at the specified path.

The code to convert a base64 string to binary uses the System.Convert.FromBase64CharArray method.  Before converting the string to binary, the code strips the line breaks (carriage returns and line feeds) that were added when producing the Flat OPC file.
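As a small, standalone illustration of that step, here is a sketch that strips the line breaks from a chunked base64 string and decodes it. The class name Base64Demo, the helper DecodeChunkedBase64, and the sample string are hypothetical and only serve the example; they are not part of FlatToOpc.

```csharp
using System;
using System.Linq;
using System.Text;

static class Base64Demo
{
    // Hypothetical helper: removes the CR/LF characters that were inserted
    // when the base64 text was split into lines, then decodes the result.
    public static byte[] DecodeChunkedBase64(string base64InChunks)
    {
        char[] chars = base64InChunks
            .Where(c => c != '\r' && c != '\n')
            .ToArray();
        return Convert.FromBase64CharArray(chars, 0, chars.Length);
    }

    static void Main()
    {
        // "SGVsbG8=" is the base64 encoding of the ASCII text "Hello",
        // split here across two lines to imitate a chunked binaryData value.
        byte[] bytes = DecodeChunkedBase64("SGVs\r\nbG8=");
        Console.WriteLine(Encoding.ASCII.GetString(bytes)); // prints Hello
    }
}
```

As far as I know, the .NET base64 decoding methods also ignore white space in their input, so the explicit stripping may be redundant, but it is harmless and makes the intent clear.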

Here is the code to perform the transform (also attached):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;
using System.IO;
using System.IO.Packaging;
using System.Xml;
using System.Xml.Schema;

class Program
{
    static void FlatToOpc(XDocument doc, string docxPath)
    {
        XNamespace pkg =
            "http://schemas.microsoft.com/office/2006/xmlPackage";
        XNamespace rel =
            "http://schemas.openxmlformats.org/package/2006/relationships";

        using (Package package = Package.Open(docxPath, FileMode.Create))
        {
            // add all parts (but not relationships)
            foreach (var xmlPart in doc.Root
                .Elements()
                .Where(p =>
                    (string)p.Attribute(pkg + "contentType") !=
                    "application/vnd.openxmlformats-package.relationships+xml"))
            {
                string name = (string)xmlPart.Attribute(pkg + "name");
                string contentType = (string)xmlPart.Attribute(pkg + "contentType");
                if (contentType.EndsWith("xml"))
                {
                    Uri u = new Uri(name, UriKind.Relative);
                    PackagePart part = package.CreatePart(u, contentType,
                        CompressionOption.SuperFast);
                    using (Stream str = part.GetStream(FileMode.Create))
                    using (XmlWriter xmlWriter = XmlWriter.Create(str))
                        xmlPart.Element(pkg + "xmlData")
                            .Elements()
                            .First()
                            .WriteTo(xmlWriter);
                }
                else
                {
                    Uri u = new Uri(name, UriKind.Relative);
                    PackagePart part = package.CreatePart(u, contentType,
                        CompressionOption.SuperFast);
                    using (Stream str = part.GetStream(FileMode.Create))
                    using (BinaryWriter binaryWriter = new BinaryWriter(str))
                    {
                        string base64StringInChunks =
                            (string)xmlPart.Element(pkg + "binaryData");
                        char[] base64CharArray = base64StringInChunks
                            .Where(c => c != '\r' && c != '\n').ToArray();
                        byte[] byteArray =
                            System.Convert.FromBase64CharArray(base64CharArray,
                            0, base64CharArray.Length);
                        binaryWriter.Write(byteArray);
                    }
                }
            }

            // add relationships, now that every part they can refer to exists
            foreach (var xmlPart in doc.Root.Elements())
            {
                string name = (string)xmlPart.Attribute(pkg + "name");
                string contentType = (string)xmlPart.Attribute(pkg + "contentType");
                if (contentType ==
                    "application/vnd.openxmlformats-package.relationships+xml")
                {
                    // add the package level relationships
                    if (name == "/_rels/.rels")
                    {
                        foreach (XElement xmlRel in
                            xmlPart.Descendants(rel + "Relationship"))
                        {
                            string id = (string)xmlRel.Attribute("Id");
                            string type = (string)xmlRel.Attribute("Type");
                            string target = (string)xmlRel.Attribute("Target");
                            string targetMode =
                                (string)xmlRel.Attribute("TargetMode");
                            if (targetMode == "External")
                                package.CreateRelationship(
                                    new Uri(target, UriKind.Absolute),
                                    TargetMode.External, type, id);
                            else
                                package.CreateRelationship(
                                    new Uri(target, UriKind.Relative),
                                    TargetMode.Internal, type, id);
                        }
                    }
                    else
                    {
                        // add part level relationships
                        string directory = name.Substring(0, name.IndexOf("/_rels"));
                        string relsFilename = name.Substring(name.LastIndexOf('/'));
                        string filename =
                            relsFilename.Substring(0, relsFilename.IndexOf(".rels"));
                        PackagePart fromPart = package.GetPart(
                            new Uri(directory + filename, UriKind.Relative));
                        foreach (XElement xmlRel in
                            xmlPart.Descendants(rel + "Relationship"))
                        {
                            string id = (string)xmlRel.Attribute("Id");
                            string type = (string)xmlRel.Attribute("Type");
                            string target = (string)xmlRel.Attribute("Target");
                            string targetMode =
                                (string)xmlRel.Attribute("TargetMode");
                            if (targetMode == "External")
                                fromPart.CreateRelationship(
                                    new Uri(target, UriKind.Absolute),
                                    TargetMode.External, type, id);
                            else
                                fromPart.CreateRelationship(
                                    new Uri(target, UriKind.Relative),
                                    TargetMode.Internal, type, id);
                        }
                    }
                }
            }
        }
    }

    static void Main(string[] args)
    {
        XDocument doc;
        doc = XDocument.Load("Test.xml");
        FlatToOpc(doc, "Test-new.docx");
        doc = XDocument.Load("Test2.xml");
        FlatToOpc(doc, "Test2-new.pptx");
    }
}
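The part level relationship branch in FlatToOpc derives the name of the owning part from the name of its .rels part. The following sketch isolates just that string manipulation so it can be tried on its own. The class name RelsNameDemo, the helper GetSourcePartName, and the sample part names are hypothetical, and the sketch passes StringComparison.Ordinal to IndexOf, which the original code does not.

```csharp
using System;

static class RelsNameDemo
{
    // Hypothetical helper: maps the name of a relationships part to the
    // name of the part that owns those relationships.
    public static string GetSourcePartName(string relsPartName)
    {
        // everything before "/_rels" is the directory of the owning part
        string directory = relsPartName.Substring(
            0, relsPartName.IndexOf("/_rels", StringComparison.Ordinal));
        // the last path segment, including its leading slash
        string relsFilename = relsPartName.Substring(
            relsPartName.LastIndexOf('/'));
        // drop the ".rels" suffix to recover the owning part's file name
        string filename = relsFilename.Substring(
            0, relsFilename.IndexOf(".rels", StringComparison.Ordinal));
        return directory + filename;
    }

    static void Main()
    {
        // prints /word/document.xml
        Console.WriteLine(GetSourcePartName("/word/_rels/document.xml.rels"));
        // prints /ppt/slides/slide1.xml
        Console.WriteLine(GetSourcePartName("/ppt/slides/_rels/slide1.xml.rels"));
    }
}
```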

Attachment: FlatToOpc.zip

  • Hello,

    Thanks for these great posts about Flat OPC. Very helpful.

    I have a problem with the FlatToOpc method; maybe it is my fault.

    I extracted a custom XML node from a Word document using the XMLNode.WordOpenXML function in interop. This gives me a Flat OPC structure (w:pkg..)

    Then I save it to my database. What I'm trying to do is transform this content into a document using the FlatToOpc method. But when I try to open the docx built by the function, it doesn't open.

    I looked inside the package and saw that it is not complete. I guess that is because the pkg I have is just a fragment of my document.

    Is there a way to transform it into a document? What is the best way?

    Hope you can help me. Thanks in advance.

  • Hi Alberic,

    I'm not an expert in Office interop.  I believe that when you get your custom XML node, you are only getting the custom XML part, not the complete document.

    However, I believe that you can use interop to 'Save as XML', which is the Flat OPC format (perhaps with a GUID as a filename).  You can then load that document and save it to your database.  I don't know of any way to get the complete Flat OPC document using Word interop (not to say that there isn't a way - I just don't know it).

    Be aware that if you are saving documents into your database as Flat OPC, your storage requirements (particularly for documents with images) will increase.

    Does this help?

    -Eric

  • Thanks for this quick answer Eric!

    Actually it is working. My test wasn't working because in my fragment I had a content control that was data-bound to an XML part that wasn't present in my Flat XML.

    Thanks for the help and all your helpful posts.

    Regards

    Albéric

  • Hi Eric,

    Just wanted to say great post. I have been trying to find an example like this for 2 days straight. We also use Word interop to save our documents. Alberic - you can use "Microsoft.Interop.Word._Document.WordOpenXML" to get the document's XML, but it seems that this property does not comply with your package standards, Eric. If I cannot find a way around this, I will simply convert all the document XML content to conform to your flattening standards.

    One thing that does concern me, though, is that my original document is 372 KB, and after flattening it and then packaging it again it is now 505 KB. Could you explain why this happens?

    Thanks again,

    Amykins

  • Hi Amy,

    Glad you like the post.  Regarding file sizes, I suspect it might have to do with differences in compression methods that my code uses and that Word uses.  I didn't do much research around this.  There may be other options than CompressionOption.SuperFast which will result in smaller file sizes, but I'm not sure.  There may be other factors that impact file size.  Currently, I'm deeply involved in another project that I must finish in the next couple of weeks.  I'll take a look at this as soon as I'm able.

    -Eric

  • I found a bug. Please take a look at http://www.windwardreports.com/temp/BadTransform.zip - it has a legit flat XML file (got it by calling Range.WordOpenXML on an InlineShape that is a chart).

    Problem 1) It has an external link that is a relative URI.

    Problem 2) If that is switched to relative, it generates a package that has no MainDocumentPart.

    Any ideas?

    thanks - dave

  • ps - the Range.WordOpenXML was called on the chart in the attached docx file. That chart was created in the attached xlsx file and then pasted into Word.

    As all values in the chart are literals, there is no embedded or linked spreadsheet.

    thanks - dave

  • Hi David,

    I took a look at the DOCX file, and it is invalid, as it has the same external relationship in it with an invalid external URI.  How did you create that DOCX?  I tried copying and pasting a chart from excel, and that properly created a valid external relationship for the external data for the chart.  I would be very interested to know which operation creates that invalid URI for external relationship.

    -Eric

  • I did a select of the chart in the xlsx file and pasted it to the docx file. If you look at the docx file you will see it has this.

    What is special here is that the chart data is all literals, so there is no need for an embedded or external XLSX file for the chart data.

    ps - for a faster response, you are welcome to email me at david@windward.net

  • Hi;

    Have you had a chance to look at the literal case for an embedded chart?

    thanks - dave

  • Hi David,

    Yes, I have.  It appears that in certain circumstances, when copying/pasting a chart, the external relationship is created with an invalid external URI.  I've seen a few of these around, and until now didn't know how these were created.  I was able to get DocumentBuilder to work properly by surrounding the creation of the external link with a try/catch block, and then if it threw an exception, finding the w:externalData element in chart1.xml, and removing it.  That element isn't necessary.  The resulting document is then valid.

    Which version of Office are you using?

    -Eric

    I'm using the last SP of Office 2007. I just removed the link from the document altogether, and that seems to work OK too. So it looks like we both did the same thing, but in different ways.

    If you say not having that is ok, then I'm happy.

    thanks - dave

    ps - to get this you create a chart in Excel where all the chart data is literals - so there is no XLSX file for the data.
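The external relationship problem discussed in the comments above comes down to a Target value that cannot be turned into an absolute URI. Eric describes wrapping the creation of the link in a try/catch block. The sketch below shows a different, hypothetical alternative that checks the target with Uri.TryCreate before any relationship is created. The class name ExternalTargetDemo, the helper TryGetExternalTarget, and both sample targets (including the "NULL" stand-in for an invalid value) are assumptions made for illustration, and what to do with a rejected target is left to the caller.

```csharp
using System;

static class ExternalTargetDemo
{
    // Hypothetical helper: returns true and the parsed URI when the target
    // of an external relationship is usable as an absolute URI.
    public static bool TryGetExternalTarget(string target, out Uri uri)
    {
        return Uri.TryCreate(target, UriKind.Absolute, out uri);
    }

    static void Main()
    {
        // "NULL" is a made-up stand-in for an invalid external target
        string[] targets = { "http://example.com/data.xlsx", "NULL" };
        foreach (string target in targets)
        {
            Uri uri;
            if (TryGetExternalTarget(target, out uri))
                Console.WriteLine("would create external relationship to " + uri);
            else
                Console.WriteLine("skipping invalid external target: " + target);
        }
    }
}
```

A check like this could sit in front of the two CreateRelationship calls that use TargetMode.External in FlatToOpc, so that an invalid target is skipped or reported instead of throwing.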
