How to Use altChunk for Document Assembly

Merging multiple word processing documents into a single document is something that many people want to do.  An application built for attorneys might assemble selected standard clauses into a contract.  An application built for book publishers can assemble chapters of a book into a single document.  This post explains the semantics of the altChunk element, and provides some code using the Open XML SDK that shows how to use altChunk.

Instead of using altChunk, you could write a program to merge the Open XML markup for documents.  You would need to deal with a number of issues, including merging style sheets and resolving conflicting styles, merging the comments from all of the documents, merging bookmarks, and more.  This is doable, but it’s a lot of work.  You can use altChunk to let Word 2007 do the heavy lifting for you.

altChunk is a powerful technique.  It’s a tool that should be in every Open XML developer’s toolbox.  In an upcoming post, I’ll show an example of the use of altChunk in a SharePoint application.  You can create compelling document assembly solutions in SharePoint using altChunk.

Overview of the altChunk Markup

The altChunk markup tells the consuming application to import content into the document.  This behavior is not required for a conforming application – a conforming application is free to ignore the altChunk markup.  However, the standard recommends that if the application ignores the altChunk markup, it should notify the user.  Word 2007 supports altChunk.

To use altChunk, you do the following:

  • You create a new part in the package.  The part can have a number of content types, listed below.  When you create the part, you assign a unique ID to the part.
  • You store the content that you want to import into the part.  You can import a variety of types of content, including another Open XML word processing document, HTML, or text.
  • You add a relationship from the main document part to the alternative format part.  The unique ID that you assigned when creating the part is the ID of this relationship.
  • You add a w:altChunk element at the location where you want to import the alternative format content.  The r:id attribute of the w:altChunk element identifies the chunk to import.  The w:altChunk element is a sibling to paragraph elements (w:p).  You can add an altChunk element at any point in the markup that can contain a paragraph element.

A few options for content types that can be imported into a document are:

  • application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml

The alternative format content part contains a complete Open XML document (the .docx package) in binary form.

  • application/xhtml+xml

The alternative format content part contains an XHTML document.

  • text/plain

The alternative format content part contains text.

There are more than these three options; the code in this post and its attachment shows how to implement altChunk for these three types of content.

The altChunk markup in the document looks like this:

<w:p>
  <w:r>
    <w:t>Paragraph before.</w:t>
  </w:r>
</w:p>
<w:altChunk r:id="AltChunkId1" />
<w:p>
  <w:r>
    <w:t>Paragraph after.</w:t>
  </w:r>
</w:p>
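
The markup above can be produced with LINQ to XML alone.  The following is a minimal sketch rather than code from the attachment: the class name AltChunkMarkupSketch and the helper InsertAltChunkAfter are hypothetical, and the sketch works on an in-memory body element instead of a real package, so no part or relationship is created here.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class AltChunkMarkupSketch
{
    // Namespaces for WordprocessingML elements and for relationship IDs.
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    public static readonly XNamespace R =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Hypothetical helper: adds <w:altChunk r:id="..."/> as the next sibling
    // of the given paragraph and returns the new element.
    public static XElement InsertAltChunkAfter(XElement paragraph, string relationshipId)
    {
        XElement altChunk = new XElement(W + "altChunk",
            new XAttribute(R + "id", relationshipId));
        paragraph.AddAfterSelf(altChunk);
        return altChunk;
    }

    // Builds a two-paragraph body and places an altChunk between the paragraphs.
    public static XElement BuildSampleBody()
    {
        XElement body = new XElement(W + "body",
            new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
            new XAttribute(XNamespace.Xmlns + "r", R.NamespaceName),
            Paragraph("Paragraph before."),
            Paragraph("Paragraph after."));
        InsertAltChunkAfter(body.Elements(W + "p").First(), "AltChunkId1");
        return body;
    }

    private static XElement Paragraph(string text)
    {
        return new XElement(W + "p",
            new XElement(W + "r",
                new XElement(W + "t", text)));
    }
}
```

Calling Console.WriteLine(AltChunkMarkupSketch.BuildSampleBody()) prints the body element, with the altChunk sitting between the two paragraphs as a sibling of w:p.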


altChunk: Import Only

One important note about altChunk – it is used only for importing content.  If you open the document using Word 2007 and save it, the newly saved document will contain neither the alternative format content part nor the altChunk markup that references it.  Word saves all imported content as paragraph (w:p) elements.  The standard requires this behavior from a conforming application.
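
If you want to check from code whether a document still carries altChunk markup that has not been imported, one possible approach is to read the main document part directly from the package.  This is only a sketch under stated assumptions: the class name AltChunkCheckSketch is hypothetical, it uses System.IO.Compression rather than the Open XML SDK, and it assumes the main part is stored at word/document.xml, which is conventional but not guaranteed.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class AltChunkCheckSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the number of w:altChunk elements in the main document part.
    // Assumption: the main part lives at word/document.xml inside the package.
    public static int CountAltChunks(Stream docxStream)
    {
        // leaveOpen: true, so the caller keeps ownership of the stream.
        using (ZipArchive zip = new ZipArchive(docxStream, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found.");
            using (Stream part = entry.Open())
                return XDocument.Load(part).Descendants(W + "altChunk").Count();
        }
    }
}
```

A document that has just been assembled with altChunk should report at least one element; after Word has opened and saved it, the count would be expected to drop to zero.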

Using altChunk

The following screen-clipping shows a simple word processing document.  It has a heading, a paragraph styled as Normal, and a comment:

The following screen-clipping shows another word processing document, with content that we want to insert into the first document.

After running the example program included with this post, the resulting document looks like the following.  Notice that the resulting document has comments from both of the source documents:

The following example shows how to merge two Open XML documents using altChunk.  It uses V1 of the Open XML SDK, and LINQ to XML:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using System.Xml;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        XNamespace w =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        XNamespace r =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

        using (WordprocessingDocument myDoc =
            WordprocessingDocument.Open("Test.docx", true))
        {
            string altChunkId = "AltChunkId1";
            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
              "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
              altChunkId);
            using (FileStream fileStream =
                File.Open("TestInsertedContent.docx", FileMode.Open))
                chunk.FeedData(fileStream);
            XElement altChunk = new XElement(w + "altChunk",
                new XAttribute(r + "id", altChunkId)
            );
            XDocument mainDocumentXDoc = GetXDocument(myDoc);
            // Add the altChunk element after the last paragraph.
            mainDocumentXDoc.Root
                .Element(w + "body")
                .Elements(w + "p")
                .Last()
                .AddAfterSelf(altChunk);
            SaveXDocument(myDoc, mainDocumentXDoc);
        }
    }

    private static void SaveXDocument(WordprocessingDocument myDoc,
        XDocument mainDocumentXDoc)
    {
        // Serialize the XDocument back into the part
        using (Stream str = myDoc.MainDocumentPart.GetStream(
            FileMode.Create, FileAccess.Write))
        using (XmlWriter xw = XmlWriter.Create(str))
            mainDocumentXDoc.Save(xw);
    }

    private static XDocument GetXDocument(WordprocessingDocument myDoc)
    {
        // Load the main document part into an XDocument
        XDocument mainDocumentXDoc;
        using (Stream str = myDoc.MainDocumentPart.GetStream())
        using (XmlReader xr = XmlReader.Create(str))
            mainDocumentXDoc = XDocument.Load(xr);
        return mainDocumentXDoc;
    }
}


To use altChunk with HTML, the code looks like this:

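// Uses the w and r namespace declarations and the GetXDocument and
// SaveXDocument helpers from the previous listing.
using (WordprocessingDocument myDoc =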
    WordprocessingDocument.Open("Test3.docx", true))
{
    string html =
      @"<html>
            <head/>
            <body>
                <h1>Html Heading</h1>
                <p>This is an html document in a string literal.</p>
            </body>
        </html>";
    string altChunkId = "AltChunkId1";
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
        "application/xhtml+xml", altChunkId);
    using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
    using (StreamWriter stringStream = new StreamWriter(chunkStream))
        stringStream.Write(html);
    XElement altChunk = new XElement(w + "altChunk",
        new XAttribute(r + "id", altChunkId)
    );
    XDocument mainDocumentXDoc = GetXDocument(myDoc);
    mainDocumentXDoc.Root
        .Element(w + "body")
        .Elements(w + "p")
        .Last()
        .AddAfterSelf(altChunk);
    SaveXDocument(myDoc, mainDocumentXDoc);
}


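Using V2 of the Open XML SDK, merging two Open XML documents looks like this:

// Requires using directives for System.IO, System.Linq,
// DocumentFormat.OpenXml.Packaging, and DocumentFormat.OpenXml.Wordprocessing.
using (WordprocessingDocument myDoc =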
    WordprocessingDocument.Open("Test1.docx", true))
{
    string altChunkId = "AltChunkId1";
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
        AlternativeFormatImportPartType.WordprocessingML, altChunkId);
    using (FileStream fileStream = File.Open("TestInsertedContent.docx", FileMode.Open))
        chunk.FeedData(fileStream);
    AltChunk altChunk = new AltChunk();
    altChunk.Id = altChunkId;
    mainPart.Document
        .Body
        .InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
    mainPart.Document.Save();
}
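
Plain text can presumably be handled with the same V2 pattern.  The sketch below is not the code from the attachment: the class TextChunkSketch and the method AppendTextChunk are hypothetical names, and it assumes the AlternativeFormatImportPartType.TextPlain member of the V2 SDK and a document body that contains at least one paragraph.  How Word interprets the encoding of the text part has not been verified here.

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TextChunkSketch
{
    // Hypothetical helper: stores plain text in an alternative format import
    // part and references it from an altChunk after the last paragraph.
    public static void AppendTextChunk(string docxPath, string text, string altChunkId)
    {
        using (WordprocessingDocument myDoc = WordprocessingDocument.Open(docxPath, true))
        {
            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.TextPlain, altChunkId);

            // Write the text into the new part.
            using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
            using (StreamWriter writer = new StreamWriter(chunkStream))
                writer.Write(text);

            AltChunk altChunk = new AltChunk();
            altChunk.Id = altChunkId;

            // Assumes the body has at least one paragraph; Last() throws otherwise.
            mainPart.Document.Body.InsertAfter(
                altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
            mainPart.Document.Save();
        }
    }
}
```

A call such as TextChunkSketch.AppendTextChunk("Test2.docx", "Some plain text.", "AltChunkId1") would add the part and the altChunk element; the file name here is only a placeholder.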


The attached code shows examples of placing an Open XML document, HTML, and text into an alternative content part.  I’ve provided two versions of the example – one using V1 of the Open XML SDK (and LINQ to XML), and another using V2 of the Open XML SDK.

Attachment: altChunk.zip

Comments

  • AltChunk rocks! However, I can't get it to work inside a headerpart. Inside the maindocumentpart it works perfect. Is this supported at all?

    You can check the broken document here: http://blogs.infosupport.com/cfs-file.ashx/__key/CommunityServer.Blogs.Components.WeblogFiles/porint/order.docx

  • Beautiful work Eric.  Worked like a charm.

    I have captured content using InfoPath forms in MOSS and wished to export the content to a Word document.  The content controls do not allow you to map the content directly by including the custom XML parts.  This option sure worked.

    Cheers

  • Thanks for the awesome post!

    I'm also trying to assemble different types of office documents (excel, word, powerpoint) into a single document.  Your post really helped me with combining word documents, but I'm not sure how to proceed with the other types (excel, powerpoint).  Any suggestions?

  • What is best practice for assembling a document from a database source?

    Is it to use content controls? To use AltChunks? To use content controls replaced at runtime with AltChunks? To use custom markup replaced at runtime with database content, e.g. http://geekswithblogs.net/DanBedassa/archive/2009/01/16/dynamically-generating-word-2007-.docx-documents-using-.net.aspx or http://msdn.microsoft.com/en-us/library/cc850835(office.14).aspx?

    I need to create a .docx file from an HTML document that includes image tags with a src attribute pointing to a URL. Is there any way to make sure that images contained in the HTML document are put into the Word package, so that Word will render the images without internet connectivity? When I read your "altChunk: Import Only" section, I was hopeful that Word might actually accomplish this for me. I can't get that to happen, however. Am I missing something?

    Also, you mention in that section that you must "open the file and save it" for the altChunk markup to be removed. Is there any way to do that programmatically (rather than requiring the user to open and save)? In fact, I've had to actually EDIT and save before the "auto-pruning" would take place. Any suggestions?

    Thanks!

  • hi Kmote,

    I don't know of any way to accomplish what you're trying to do.  As far as I know, you are correct - you must make a small change in the file and save it.

    Ultimately, the solution to this is to have an html => open xml converter in code that you can modify for your specific needs.  I have this on my todo list - but it will be some time before I can get to this.

    I wish I had a better answer for you, but I don't.

    -Eric

  • Eric,

    Thanks for this post! Any advice on merging Excel documents into a Word document? I have been searching the internet up and down and your blog is by far the best.

    Thanks,

    Matt

  • Hi, really good post, I tried to import a mht file, but it doesn't work. Do you know the content type for this?

  • Fantastic Eric. Just what I needed. I can't tell you how much time and grief you probably saved me... Thanks!

  • Hi Eric,

    I was planning to use altChunk to insert HTML text into a Word template that has custom XML tags (pink tags from a schema). My requirement is that the user will create a template with XML tags. I will read the tag name using XPath or LINQ to XML and replace the node with an altChunk. But since we cannot use custom XML tags (pink tags) as per Gray's blog, what is the alternative solution? How can I map the content control to my XML schema tag so that I can insert an altChunk? There would be several such tags, and each will have different HTML text.

    Here is the link from Gray's blog

    http://blogs.technet.com/gray_knowlton/archive/2009/12/23/what-is-custom-xml-and-the-impact-of-the-i4i-judgment-on-word.aspx?CommentPosted=true#commentmessage

    Any help is appreciated.

  • Hi Eric,

    Looks like it's been a while since your last post. Like many others, I am trying to merge the headers in as well. I can merge the documents with no problems, but only the first document's header and footer get saved to the final document. Is there a way around this?

    Chris

  • Hi Chris, If you want to control sections and headers, have you taken a look at using DocumentBuilder instead of altChunk?

    http://blogs.msdn.com/ericwhite/archive/2010/01/08/how-to-control-sections-when-using-openxml-powertools-documentbuilder.aspx

    For a comparison of the two approaches:

    http://blogs.msdn.com/ericwhite/archive/2009/04/19/comparison-of-altchunk-to-the-documentbuilder-class.aspx

    -Eric

  • Hi Eric,

    Yes we have looked at DocumentBuilder but as we are using xml documents, streams and office documents it is not really suitable.

    I have decided to take the longer route of copying everything manually into each document (we loop through them, depending on how many are selected), making sure that all style references, header references, footer references, and so on are preserved.

    I have been using the Reflector tool to see how this is created but cannot seem to find the rsid values for each paragraph to add to the properties. Below is what I have so far:

    Dim paraRef = mainPart.GetIdOfPart(mainPart.Document.MainDocumentPart)
    Dim para As Paragraph = New Paragraph With {.RsidParagraphAddition = paraRef}
    Dim parid As String = mainPart.Document.MainDocumentPart.GetIdOfPart(mainPart.Document.MainDocumentPart)
    Dim headid As String = mainPart.Document.MainDocumentPart.GetIdOfPart(mainPart.Document.MainDocumentPart.HeaderParts).ToString
    Dim footid As String = mainPart.Document.MainDocumentPart.GetIdOfPart(mainPart.Document.MainDocumentPart.FooterParts).ToString
    Dim headRef As HeaderReference = New HeaderReference With {.Id = headid, .Type = HeaderFooterValues.First}
    Dim footRef As FooterReference = New FooterReference With {.Id = footid, .Type = HeaderFooterValues.First}
    Dim title As TitlePage = New TitlePage()
    Dim paraProp As ParagraphProperties = New ParagraphProperties
    Dim sectionProperty As SectionProperties = New SectionProperties
    sectionProperty.Append(headRef)
    sectionProperty.Append(footRef)
    sectionProperty.Append(title)
    paraProp.Append(sectionProperty)
    para.Append(paraProp)

    What am I missing?

  • Hi Chris,

    First thing - I modified DocumentBuilder a while ago so that it works just fine with streams and in-memory documents.  I'm not quite clear why you can't use it.

    Regarding Rsid elements and attributes, you really don't need to add those.  Those are only used for a fairly obscure scenario where I pass a single document to two people, who separately edit it, and then the results are merged back into a single document.  If you are programmatically assembling a document, then almost by definition, you don't care about Rsid elements and attributes.  You can discard those in the generated document.

    Regarding your example, it is not clear to me what is missing.  In general, I take the approach of creating the resulting document exactly as I want it using Word, and then looking at the resulting markup.

    -Eric

  • Hi Eric,

    Sorry, I've just realised this is the case. I misread the error we received and gave up on it a little too quickly. I've managed to get the DocumentBuilder working as we like it; however, it only works with DocumentFormat.openxml.dll v:1.0.1825.0.

    We need to use v:2.0.5022.0 to make use of some other features that we use. If we use the v2 with the DocumentBuilder then we get a null value for part here:

    public static XDocument GetXDocument(this OpenXmlPart part)
    {
        XDocument xdoc = part.Annotation<XDocument>();

    This same code works with v:1.0.18

    Stack Trace as follows

    StackTrace:
       at OpenXml.PowerTools.DocumentExtensions.GetXDocument(OpenXmlPart part) in C:\Users\cbertrand\Desktop\Open_XML_PowerTools\Classes\DocumentExtensions.cs:line 36
       at OpenXml.PowerTools.Source..ctor(WordprocessingDocument source, Boolean keepSections) in C:\Users\cbertrand\Desktop\Open_XML_PowerTools\Classes\DocumentBuilder.cs:line 39
       at DocBuildTest._Default.mergedocs() in C:\Projects\DocBuildTest\DocBuildTest\Default.aspx.vb:line 36

    Is there a resolution for this?

    Thanks for all your help so far

    Chris
