October, 2008

  • Eric White's Blog

    How to Use altChunk for Document Assembly

    Merging multiple word processing documents into a single document is something that many people want to do.  An application built for attorneys might assemble selected standard clauses into a contract.  An application built for book publishers can assemble chapters of a book into a single document.  This post explains the semantics of the altChunk element, and provides some code using the Open XML SDK that shows how to use altChunk.

    Instead of using altChunk, you could write a program to merge the Open XML markup for documents.  You would need to deal with a number of issues, including merging style sheets and resolving conflicting styles, merging the comments from all of the documents, merging bookmarks, and more.  This is doable, but it’s a lot of work.  You can use altChunk to let Word 2007 do the heavy lifting for you.

    altChunk is a powerful technique.  It’s a tool that should be in every Open XML developer’s toolbox.  In an upcoming post, I’ll show an example of the use of altChunk in a SharePoint application.  You can create compelling document assembly solutions in SharePoint using altChunk.

    Overview of the altChunk Markup

    The altChunk markup tells the consuming application to import content into the document.  This behavior is not required for a conforming application – a conforming application is free to ignore the altChunk markup.  However, the standard recommends that if the application ignores the altChunk markup, it should notify the user.  Word 2007 supports altChunk.

    To use altChunk, you do the following:

    • You create a new part in the package.  The part can have a number of content types, listed below.  When you create the part, you assign a unique ID to the part.
    • You store the content that you want to import into the part.  You can import a variety of types of content, including another Open XML word processing document, HTML, or text.
    • You create a relationship from the main document part to the alternative format part.  The unique ID from the first step is the ID of this relationship.
    • You add a w:altChunk element at the location where you want to import the alternative format content.  The r:id attribute of the w:altChunk element identifies the chunk to import.  The w:altChunk element is a sibling to paragraph elements (w:p).  You can add an altChunk element at any point in the markup that can contain a paragraph element.
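
    At the package level, the relationship in the third step is an entry in the relationships part that belongs to the main document part.  The following is a minimal sketch, using only LINQ to XML, of what adding that entry involves.  The class and method names are hypothetical, and in practice the Open XML SDK writes this entry for you when you add the alternative format import part.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class AltChunkRelationshipSketch
{
    // Namespace used by relationships parts (for example, word/_rels/document.xml.rels).
    static readonly XNamespace pkg =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    // Relationship type that marks the target as an alternative format import part.
    const string AfChunkType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk";

    // Hypothetical helper: adds a relationship from the main document part
    // to the chunk part identified by target.
    public static void AddRelationship(XDocument rels, string id, string target)
    {
        // The ID must be unique among the relationships of the main document part.
        bool idInUse = rels.Root
            .Elements(pkg + "Relationship")
            .Any(e => (string)e.Attribute("Id") == id);
        if (idInUse)
            throw new ArgumentException("Relationship ID already in use: " + id);

        rels.Root.Add(
            new XElement(pkg + "Relationship",
                new XAttribute("Id", id),
                new XAttribute("Type", AfChunkType),
                new XAttribute("Target", target)));
    }
}
```

    The ID passed to this helper is the same value that later appears in the r:id attribute of the w:altChunk element.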

    A few options for content types that can be imported into a document are:

    • application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml

    The alternative format content part contains an Open XML document in binary form.

    • application/xhtml+xml

    The alternative format content part contains an XHTML document.

    • text/plain

    The alternative format content part contains text.

    There are more than these three options; the code presented in this post shows how to implement altChunk for these three types of content.
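
    When an assembly program accepts several kinds of source files, it has to pick the content type for each new part.  The following is a minimal sketch of one way to do that for the three types listed above.  The class name, the method name, and the mapping from file extensions to content types are assumptions made for illustration; they are not part of the standard or of the Open XML SDK.

```csharp
using System;
using System.IO;

static class AltChunkContentTypeSketch
{
    // Hypothetical helper: chooses one of the three content types discussed
    // in this post, based on the extension of the file to import.
    public static string FromFileName(string fileName)
    {
        switch (Path.GetExtension(fileName).ToLowerInvariant())
        {
            case ".docx":
                return "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml";
            case ".xhtml":
                return "application/xhtml+xml";
            case ".txt":
                return "text/plain";
            default:
                // Other content types exist, but they are outside this sketch.
                throw new NotSupportedException("No content type mapped for: " + fileName);
        }
    }
}
```

    The returned string can be passed as the content type argument when creating the alternative format import part.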

    The altChunk markup in the document looks like this:

    <w:p>
      <w:r>
        <w:t>Paragraph before.</w:t>
      </w:r>
    </w:p>
    <w:altChunk r:id="AltChunkId1" />
    <w:p>
      <w:r>
        <w:t>Paragraph after.</w:t>
      </w:r>
    </w:p>

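    If you build this markup with LINQ to XML, a small helper can create the element and place it at the end of the body.  The following is a minimal sketch; the class and method names are hypothetical.  It relies on the fact that a section properties element (w:sectPr), when present as a child of the body, must remain the last child, so the altChunk element goes in front of it.

```csharp
using System.Xml.Linq;

static class AltChunkMarkupSketch
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace r =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Hypothetical helper: builds <w:altChunk r:id="..."/>.
    public static XElement CreateAltChunk(string relationshipId)
    {
        return new XElement(w + "altChunk",
            new XAttribute(r + "id", relationshipId));
    }

    // Hypothetical helper: places the altChunk after all other body content.
    public static void AppendToBody(XDocument mainDocument, string relationshipId)
    {
        XElement body = mainDocument.Root.Element(w + "body");
        XElement altChunk = CreateAltChunk(relationshipId);

        // Keep w:sectPr last if the body has one; otherwise append.
        XElement sectPr = body.Element(w + "sectPr");
        if (sectPr != null)
            sectPr.AddBeforeSelf(altChunk);
        else
            body.Add(altChunk);
    }
}
```

    Unlike code that looks for the last paragraph, this approach also works when the body ends with a table or contains no paragraphs at all.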

    altChunk: Import Only

    One important note about altChunk – it is used only for importing content.  If you open the document using Word 2007 and save it, the newly saved document will not contain the alternative format content part, nor the altChunk markup that references it.  Word saves all imported content as paragraph (w:p) elements.  The standard requires this behavior from a conforming application.
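
    Because the altChunk markup disappears when a consuming application such as Word saves the document, the presence of w:altChunk elements indicates that the imported content has not yet been converted to paragraphs.  The following is a minimal sketch, using only LINQ to XML, that lists the relationship IDs of any chunks still waiting to be imported.  The class and method names are hypothetical, and the sketch assumes that you have already loaded the main document part into an XDocument, for example with a helper such as the GetXDocument method used in the examples in this post.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class AltChunkInspectionSketch
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace r =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Hypothetical helper: returns the r:id value of every altChunk element
    // that is still present in the main document markup.
    public static List<string> PendingChunkIds(XDocument mainDocument)
    {
        return mainDocument
            .Descendants(w + "altChunk")
            .Select(e => (string)e.Attribute(r + "id"))
            .ToList();
    }
}
```

    An empty list means that the markup contains no altChunk elements, which is what you would expect after the document has been opened and saved in Word.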

    Using altChunk

    The first source document is a simple word processing document.  It has a heading, a paragraph styled as Normal, and a comment.

    The second source document contains the content that we want to insert into the first document.

    After running the example program included with this post, the resulting document contains the content of both documents.  Notice that it also has the comments from both of the source documents.

    The following example shows how to merge two Open XML documents using altChunk.  It uses V1 of the Open XML SDK, and LINQ to XML:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.IO;
    using DocumentFormat.OpenXml.Packaging;
    using System.Xml;
    using System.Xml.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            XNamespace w =
                "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            XNamespace r =
                "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

            using (WordprocessingDocument myDoc =
                WordprocessingDocument.Open("Test.docx", true))
            {
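                // The ID names the relationship to the new part.  It must match
                // the r:id attribute of the altChunk element created below.
                string altChunkId = "AltChunkId1";
                MainDocumentPart mainPart = myDoc.MainDocumentPart;
                AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                  "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
                  altChunkId);
                // Copy the document to import into the new part.
                using (FileStream fileStream =
                    File.Open("TestInsertedContent.docx", FileMode.Open))
                    chunk.FeedData(fileStream);
                // Build the <w:altChunk r:id="AltChunkId1" /> element.
                XElement altChunk = new XElement(w + "altChunk",
                    new XAttribute(r + "id", altChunkId)
                );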
                XDocument mainDocumentXDoc = GetXDocument(myDoc);
                // Add the altChunk element after the last paragraph.
                mainDocumentXDoc.Root
                    .Element(w + "body")
                    .Elements(w + "p")
                    .Last()
                    .AddAfterSelf(altChunk);
                SaveXDocument(myDoc, mainDocumentXDoc);
            }
        }

        private static void SaveXDocument(WordprocessingDocument myDoc,
            XDocument mainDocumentXDoc)
        {
            // Serialize the XDocument back into the part
            using (Stream str = myDoc.MainDocumentPart.GetStream(
                FileMode.Create, FileAccess.Write))
            using (XmlWriter xw = XmlWriter.Create(str))
                mainDocumentXDoc.Save(xw);
        }

        private static XDocument GetXDocument(WordprocessingDocument myDoc)
        {
            // Load the main document part into an XDocument
            XDocument mainDocumentXDoc;
            using (Stream str = myDoc.MainDocumentPart.GetStream())
            using (XmlReader xr = XmlReader.Create(str))
                mainDocumentXDoc = XDocument.Load(xr);
            return mainDocumentXDoc;
        }
    }


    To use altChunk with HTML, the code looks like this:

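    // This fragment uses the w and r namespaces and the GetXDocument and
    // SaveXDocument helper methods from the previous example.
    using (WordprocessingDocument myDoc =
        WordprocessingDocument.Open("Test3.docx", true))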
    {
        string html =
          @"<html>
                <head/>
                <body>
                    <h1>Html Heading</h1>
                    <p>This is an html document in a string literal.</p>
                </body>
            </html>";
        string altChunkId = "AltChunkId1";
        MainDocumentPart mainPart = myDoc.MainDocumentPart;
        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
            "application/xhtml+xml", altChunkId);
        using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
        using (StreamWriter stringStream = new StreamWriter(chunkStream))
            stringStream.Write(html);
        XElement altChunk = new XElement(w + "altChunk",
            new XAttribute(r + "id", altChunkId)
        );
        XDocument mainDocumentXDoc = GetXDocument(myDoc);
        mainDocumentXDoc.Root
            .Element(w + "body")
            .Elements(w + "p")
            .Last()
            .AddAfterSelf(altChunk);
        SaveXDocument(myDoc, mainDocumentXDoc);
    }


    The same merge of two Open XML documents, using V2 of the Open XML SDK, looks like this:

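    // Requires: using DocumentFormat.OpenXml.Wordprocessing;
    // (for the AltChunk and Paragraph classes).
    using (WordprocessingDocument myDoc =
        WordprocessingDocument.Open("Test1.docx", true))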
    {
        string altChunkId = "AltChunkId1";
        MainDocumentPart mainPart = myDoc.MainDocumentPart;
        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
            AlternativeFormatImportPartType.WordprocessingML, altChunkId);
        using (FileStream fileStream = File.Open("TestInsertedContent.docx", FileMode.Open))
            chunk.FeedData(fileStream);
        AltChunk altChunk = new AltChunk();
        altChunk.Id = altChunkId;
        mainPart.Document
            .Body
            .InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
        mainPart.Document.Save();
    }


    The attached code shows examples of placing an Open XML document, HTML, and text into an alternative format import part.  I’ve provided two versions of the example: one using V1 of the Open XML SDK (and LINQ to XML), and another using V2 of the Open XML SDK.
