Transforming Flat OPC Format to Open XML Documents

Transforming Open XML documents using XSLT is an interesting scenario, but before we can do so, we need to convert the Open XML document into the Flat OPC format.  We then perform the XSLT transform, producing a new file in the Flat OPC format, and then convert back to Open XML (OPC) format.  This post is one in a series of four posts that present this approach to transforming Open XML documents using XSLT.  The four posts are:

Transforming Open XML Documents using XSLT

Presents an overview of the transformation process of Open XML documents using XSLT, and why this is important.  Also presents the ‘Hello World’ XSLT transform of an Open XML document.

Transforming Open XML Documents to Flat OPC Format

This post describes how to convert an Open XML (OPC) document into a Flat OPC document, and presents the C# function OpcToFlat.

Transforming Flat OPC Format to Open XML Documents (This Post)

This post describes how to convert a Flat OPC file back to an Open XML document, and presents the C# function FlatToOpc.

The Flat OPC Format

Presents a description and examples of the Flat OPC format.

 

About the Code

The code presented in this post uses LINQ to XML and System.IO.Packaging to perform the conversion from Flat OPC to an Open XML (OPC) document.

The signature of the function that converts from Flat OPC to an Open XML document is:

static void FlatToOpc(XDocument doc, string docxPath)

 

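You pass in an XDocument object containing the Flat OPC markup and the path of the new Open XML document.  The method creates an Open XML document at the specified path.  Because the package is opened with FileMode.Create, an existing file at that path is overwritten.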
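
To make the input concrete, here is a minimal sketch of inspecting a Flat OPC XDocument before handing it to FlatToOpc.  It assumes the root element has pkg:part children that carry pkg:name and pkg:contentType attributes, which is the shape the code in this post reads.  The FlatOpcInspector class and its ListParts method are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of the original post.
static class FlatOpcInspector
{
    static readonly XNamespace pkg =
        "http://schemas.microsoft.com/office/2006/xmlPackage";

    // Returns "name (contentType)" for every pkg:part child of the root.
    public static List<string> ListParts(XDocument doc)
    {
        return doc.Root
            .Elements(pkg + "part")
            .Select(p => string.Format("{0} ({1})",
                (string)p.Attribute(pkg + "name"),
                (string)p.Attribute(pkg + "contentType")))
            .ToList();
    }
}
```

Printing this list is a quick way to check that the parts you expect are present in the Flat OPC file before converting it.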

The code to convert a base 64 string to binary uses the System.Convert.FromBase64CharArray method.  Before converting the string to binary, the code strips the new lines that were added when producing the Flat OPC file.
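
The base 64 step can be sketched in isolation as follows.  This is a minimal example that mirrors the approach described above (strip carriage returns and line feeds, then call Convert.FromBase64CharArray); the Base64Chunks class and Decode method are hypothetical names used only for this sketch.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of the original post.
static class Base64Chunks
{
    // Decodes base 64 text that was broken into lines when the
    // Flat OPC file was produced.
    public static byte[] Decode(string base64InChunks)
    {
        // Drop the line breaks so only base 64 characters remain.
        char[] chars = base64InChunks
            .Where(c => c != '\r' && c != '\n')
            .ToArray();
        return Convert.FromBase64CharArray(chars, 0, chars.Length);
    }
}
```

For instance, Base64Chunks.Decode("SGVs\r\nbG8=") yields the five ASCII bytes of "Hello".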

Here is the code to perform the transform (also attached):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;
using System.IO;
using System.IO.Packaging;
using System.Xml;
using System.Xml.Schema;

class Program
{
    static void FlatToOpc(XDocument doc, string docxPath)
    {
        XNamespace pkg =
            "http://schemas.microsoft.com/office/2006/xmlPackage";
        XNamespace rel =
            "http://schemas.openxmlformats.org/package/2006/relationships";

        using (Package package = Package.Open(docxPath, FileMode.Create))
        {
            // add all parts (but not relationships)
            foreach (var xmlPart in doc.Root
                .Elements()
                .Where(p =>
                    (string)p.Attribute(pkg + "contentType") !=
                    "application/vnd.openxmlformats-package.relationships+xml"))
            {
                string name = (string)xmlPart.Attribute(pkg + "name");
                string contentType = (string)xmlPart.Attribute(pkg + "contentType");
                if (contentType.EndsWith("xml"))
                {
                    Uri u = new Uri(name, UriKind.Relative);
                    PackagePart part = package.CreatePart(u, contentType,
                        CompressionOption.SuperFast);
                    using (Stream str = part.GetStream(FileMode.Create))
                    using (XmlWriter xmlWriter = XmlWriter.Create(str))
                        xmlPart.Element(pkg + "xmlData")
                            .Elements()
                            .First()
                            .WriteTo(xmlWriter);
                }
                else
                {
                    Uri u = new Uri(name, UriKind.Relative);
                    PackagePart part = package.CreatePart(u, contentType,
                        CompressionOption.SuperFast);
                    using (Stream str = part.GetStream(FileMode.Create))
                    using (BinaryWriter binaryWriter = new BinaryWriter(str))
                    {
                        string base64StringInChunks =
                            (string)xmlPart.Element(pkg + "binaryData");
                        char[] base64CharArray = base64StringInChunks
                            .Where(c => c != '\r' && c != '\n').ToArray();
                        byte[] byteArray =
                            System.Convert.FromBase64CharArray(base64CharArray,
                            0, base64CharArray.Length);
                        binaryWriter.Write(byteArray);
                    }
                }
            }

            foreach (var xmlPart in doc.Root.Elements())
            {
                string name = (string)xmlPart.Attribute(pkg + "name");
                string contentType = (string)xmlPart.Attribute(pkg + "contentType");
                if (contentType ==
                    "application/vnd.openxmlformats-package.relationships+xml")
                {
                    // add the package level relationships
                    if (name == "/_rels/.rels")
                    {
                        foreach (XElement xmlRel in
                            xmlPart.Descendants(rel + "Relationship"))
                        {
                            string id = (string)xmlRel.Attribute("Id");
                            string type = (string)xmlRel.Attribute("Type");
                            string target = (string)xmlRel.Attribute("Target");
                            string targetMode =
                                (string)xmlRel.Attribute("TargetMode");
                            if (targetMode == "External")
                                package.CreateRelationship(
                                    new Uri(target, UriKind.Absolute),
                                    TargetMode.External, type, id);
                            else
                                package.CreateRelationship(
                                    new Uri(target, UriKind.Relative),
                                    TargetMode.Internal, type, id);
                        }
                    }
                    else
                    // add part level relationships
                    {
                        string directory = name.Substring(0, name.IndexOf("/_rels"));
                        string relsFilename = name.Substring(name.LastIndexOf('/'));
                        string filename =
                            relsFilename.Substring(0, relsFilename.IndexOf(".rels"));
                        PackagePart fromPart = package.GetPart(
                            new Uri(directory + filename, UriKind.Relative));
                        foreach (XElement xmlRel in
                            xmlPart.Descendants(rel + "Relationship"))
                        {
                            string id = (string)xmlRel.Attribute("Id");
                            string type = (string)xmlRel.Attribute("Type");
                            string target = (string)xmlRel.Attribute("Target");
                            string targetMode =
                                (string)xmlRel.Attribute("TargetMode");
                            if (targetMode == "External")
                                fromPart.CreateRelationship(
                                    new Uri(target, UriKind.Absolute),
                                    TargetMode.External, type, id);
                            else
                                fromPart.CreateRelationship(
                                    new Uri(target, UriKind.Relative),
                                    TargetMode.Internal, type, id);
                        }
                    }
                }
            }
        }
    }

    static void Main(string[] args)
    {
        XDocument doc;
        doc = XDocument.Load("Test.xml");
        FlatToOpc(doc, "Test-new.docx");
        doc = XDocument.Load("Test2.xml");
        FlatToOpc(doc, "Test2-new.pptx");
    }
}
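
The part-level branch of FlatToOpc works out which part owns a set of relationships from the name of the .rels part.  The following is a minimal sketch of that string manipulation on its own, mirroring the Substring calls in the listing.  It assumes the name contains "/_rels" and ends in ".rels"; the RelsNames class and SourcePartName method are hypothetical names used only for this sketch.

```csharp
using System;

// Hypothetical helper for illustration; not part of the original post.
static class RelsNames
{
    // Maps the name of a part-level .rels part to the name of the part
    // that owns those relationships.
    public static string SourcePartName(string relsPartName)
    {
        // Everything before "/_rels" is the owning part's directory.
        string directory = relsPartName.Substring(0,
            relsPartName.IndexOf("/_rels", StringComparison.Ordinal));
        // The last path segment, including its leading slash.
        string relsFilename =
            relsPartName.Substring(relsPartName.LastIndexOf('/'));
        // Drop the ".rels" suffix to recover the owning part's file name.
        string filename = relsFilename.Substring(0,
            relsFilename.IndexOf(".rels", StringComparison.Ordinal));
        return directory + filename;
    }
}
```

With this sketch, "/word/_rels/document.xml.rels" maps to "/word/document.xml", which is the name FlatToOpc passes to package.GetPart.  This is also why the parts are all created in the first loop: the owning part must already exist when its relationships are added.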

 

Posted: Monday, September 29, 2008 1:46 PM by EricWhite

Attachment(s): FlatToOpc.zip

Comments

Alberic said:

Hello,

Thanks for these great posts about Flat OPC. Very helpful.

I have a problem with the FlatToOpc method; maybe it is my fault.

I extracted a CustomXML node from a Word document using the XMLNode.WordOpenXML function in interop. This gives me a Flat OPC structure (w:pkg..)

Then I save it to my database. What I'm trying to do is transform this content into a document using the FlatToOpc method. But when I try to open the docx built by the function, it doesn't open.

I looked inside the package and saw that the package is not complete. I guess it is because the pkg I have is just a fragment of my document.

Is there a way to transform it into a document? What is the best way?

Hope you can help me. Thanks in advance.

# December 4, 2008 10:52 AM

EricWhite said:

Hi Alberic,

I'm not an expert in Office interop.  I believe that when you get your custom XML node, you are only getting the custom XML part, not the complete document.

However, I believe that you can use interop to 'Save as XML', which is the Flat OPC format (perhaps with a GUID as a filename).  You can then load that document and save it to your database.  I don't know of any way to get the complete Flat OPC document using Word interop (not to say that there isn't a way - I just don't know it).

Be aware that if you are saving documents into your database as Flat OPC, your storage requirements (particularly for documents with images) will increase.

Does this help?

-Eric

# December 4, 2008 1:51 PM

Alberic said:

Thanks for this quick answer Eric!

Actually it is working. My test wasn't working because my fragment had a content control that was data-bound to an XML part that wasn't present in my Flat XML.

Thanks for the help and all your helpful posts.

Regards

Albéric

# December 5, 2008 3:28 AM

Amy Kins said:

Hi Eric,

Just wanted to say great post. I have been trying to find an example like this for 2 days straight. We also use Word interop to save our documents. Alberic - you can use "Microsoft.Interop.Word._Document.WordOpenXML" to get the document's XML, but it seems that this property does not comply with your package standards, Eric. If I cannot find a way around this I will simply convert all the document XML content to conform to your flattening standards.

One thing that does concern me though is that my original document is 372 KB, and after flattening it and then packaging it again it is now 505 KB. Could you explain why this happens?

Thanks again,

Amykins

# October 15, 2009 2:51 AM

EricWhite said:

Hi Amy,

Glad you like the post.  Regarding file sizes, I suspect it might have to do with differences in compression methods that my code uses and that Word uses.  I didn't do much research around this.  There may be other options than CompressionOption.SuperFast which will result in smaller file sizes, but I'm not sure.  There may be other factors that impact file size.  Currently, I'm deeply involved in another project that I must finish in the next couple of weeks.  I'll take a look at this as soon as I'm able.

-Eric

# October 15, 2009 3:10 AM