How to Use altChunk for Document Assembly

Merging multiple word processing documents into a single document is something that many people want to do.  An application built for attorneys might assemble selected standard clauses into a contract.  An application built for book publishers can assemble chapters of a book into a single document.  This post explains the semantics of the altChunk element, and provides some code using the Open XML SDK that shows how to use altChunk.

Instead of using altChunk, you could write a program to merge the Open XML markup for documents.  You would need to deal with a number of issues, including merging style sheets and resolving conflicting styles, merging the comments from all of the documents, merging bookmarks, and more.  This is doable, but it’s a lot of work.  You can use altChunk to let Word 2007 do the heavy lifting for you.

altChunk is a powerful technique.  It’s a tool that should be in every Open XML developer’s toolbox.  In an upcoming post, I’ll show an example of the use of altChunk in a SharePoint application.  You can create compelling document assembly solutions in SharePoint using altChunk.

Overview of the altChunk Markup

The altChunk markup tells the consuming application to import content into the document.  This behavior is not required for a conforming application – a conforming application is free to ignore the altChunk markup.  However, the standard recommends that if the application ignores the altChunk markup, it should notify the user.  Word 2007 supports altChunk.

To use altChunk, you do the following:

  • You create a new part in the package.  The part can have a number of content types, listed below.  When you create the part, you assign a unique ID to the part.
  • You store the content that you want to import into the part.  You can import a variety of types of content, including another Open XML word processing document, HTML, or text.
  • You add a relationship from the main document part to the alternative format part.  The relationship ID is the unique ID that you assigned to the part.
  • You add a w:altChunk element at the location where you want to import the alternative format content.  The r:id attribute of the w:altChunk element identifies the chunk to import.  The w:altChunk element is a sibling to paragraph elements (w:p).  You can add an altChunk element at any point in the markup that can contain a paragraph element.
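
Because the w:altChunk element can go anywhere a paragraph can, you are not limited to appending it at the end of the body.  The following is a minimal sketch, using LINQ to XML, of placing the element after the first paragraph that contains some marker text.  The class name, the method name, and the idea of locating the position by marker text are assumptions made for illustration; they are not part of the Open XML SDK.

```csharp
using System.Linq;
using System.Xml.Linq;

static class AltChunkPlacement
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace r =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Hypothetical helper: inserts <w:altChunk r:id="..."/> after the first
    // paragraph whose concatenated run text contains markerText.
    // Returns false (and changes nothing) if no paragraph matches.
    public static bool InsertAfterMarker(XDocument mainDocumentXDoc,
        string markerText, string altChunkId)
    {
        XElement anchor = mainDocumentXDoc
            .Descendants(w + "p")
            .FirstOrDefault(p => string.Concat(
                p.Descendants(w + "t").Select(t => t.Value))
                .Contains(markerText));
        if (anchor == null)
            return false;
        anchor.AddAfterSelf(new XElement(w + "altChunk",
            new XAttribute(r + "id", altChunkId)));
        return true;
    }
}
```

The XDocument passed in would be the contents of the main document part, and the ID must match the ID of an alternative format part that you have already added to the package.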

A few options for content types that can be imported into a document are:

  • application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml

The alternative format content part contains a complete Open XML document (the .docx package) in binary form.

  • application/xhtml+xml

The alternative format content part contains an XHTML document.

  • text/plain

The alternative format content part contains text.

There are more than these three options; the code presented in this post shows how to implement altChunk for these three types of content.

The altChunk markup in the document looks like this:

<w:p>
  <w:r>
    <w:t>Paragraph before.</w:t>
  </w:r>
</w:p>
<w:altChunk r:id="AltChunkId1" />
<w:p>
  <w:r>
    <w:t>Paragraph after.</w:t>
  </w:r>
</w:p>


altChunk: Import Only

One important note about altChunk – it is used only for importing content.  If you open the document using Word 2007 and save it, the newly saved document will not contain the alternative format content part, nor the altChunk markup that references it.  Word saves all imported content as paragraph (w:p) elements.  The standard requires this behavior from a conforming application.
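
Since Word replaces the altChunk markup with ordinary paragraphs when it saves, looking for remaining w:altChunk elements is one way to tell whether a generated document has been opened and saved by Word yet.  The following is a minimal sketch using LINQ to XML; the class and method names are hypothetical, and the XDocument is assumed to hold the contents of the main document part.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class AltChunkInspector
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace r =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Hypothetical helper: returns the r:id of every w:altChunk element
    // still present in the markup.  An empty list means there is no
    // altChunk markup left in this part.
    public static List<string> PendingAltChunkIds(XDocument mainDocumentXDoc)
    {
        return mainDocumentXDoc
            .Descendants(w + "altChunk")
            .Select(e => (string)e.Attribute(r + "id"))
            .Where(id => id != null)
            .ToList();
    }
}
```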

Using altChunk

The following screen-clipping shows a simple word processing document.  It has a heading, a paragraph styled as Normal, and a comment:

The following screen-clipping shows another word processing document, with content that we want to insert into the first document.

After running the example program included with this post, the resulting document looks like the following.  Notice that the resulting document has comments from both of the source documents:

The following example shows how to merge two Open XML documents using altChunk.  It uses V1 of the Open XML SDK, and LINQ to XML:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using System.Xml;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        XNamespace w =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        XNamespace r =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

        using (WordprocessingDocument myDoc =
            WordprocessingDocument.Open("Test.docx", true))
        {
            string altChunkId = "AltChunkId1";
            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
              "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
              altChunkId);
            using (FileStream fileStream =
                File.Open("TestInsertedContent.docx", FileMode.Open))
                chunk.FeedData(fileStream);
            XElement altChunk = new XElement(w + "altChunk",
                new XAttribute(r + "id", altChunkId)
            );
            XDocument mainDocumentXDoc = GetXDocument(myDoc);
            // Add the altChunk element after the last paragraph.
            mainDocumentXDoc.Root
                .Element(w + "body")
                .Elements(w + "p")
                .Last()
                .AddAfterSelf(altChunk);
            SaveXDocument(myDoc, mainDocumentXDoc);
        }
    }

    private static void SaveXDocument(WordprocessingDocument myDoc,
        XDocument mainDocumentXDoc)
    {
        // Serialize the XDocument back into the part
        using (Stream str = myDoc.MainDocumentPart.GetStream(
            FileMode.Create, FileAccess.Write))
        using (XmlWriter xw = XmlWriter.Create(str))
            mainDocumentXDoc.Save(xw);
    }

    private static XDocument GetXDocument(WordprocessingDocument myDoc)
    {
        // Load the main document part into an XDocument
        XDocument mainDocumentXDoc;
        using (Stream str = myDoc.MainDocumentPart.GetStream())
        using (XmlReader xr = XmlReader.Create(str))
            mainDocumentXDoc = XDocument.Load(xr);
        return mainDocumentXDoc;
    }
}


To use altChunk with HTML, the code looks like this:

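// w and r are the XNamespace values defined in the previous example;
// GetXDocument and SaveXDocument are the helper methods from that example.
using (WordprocessingDocument myDoc =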
    WordprocessingDocument.Open("Test3.docx", true))
{
    string html =
      @"<html>
            <head/>
            <body>
                <h1>Html Heading</h1>
                <p>This is an html document in a string literal.</p>
            </body>
        </html>";
    string altChunkId = "AltChunkId1";
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
        "application/xhtml+xml", altChunkId);
    using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
    using (StreamWriter stringStream = new StreamWriter(chunkStream))
        stringStream.Write(html);
    XElement altChunk = new XElement(w + "altChunk",
        new XAttribute(r + "id", altChunkId)
    );
    XDocument mainDocumentXDoc = GetXDocument(myDoc);
    mainDocumentXDoc.Root
        .Element(w + "body")
        .Elements(w + "p")
        .Last()
        .AddAfterSelf(altChunk);
    SaveXDocument(myDoc, mainDocumentXDoc);
}


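Using V2 of the Open XML SDK, the code to merge two Open XML documents looks like this (the AltChunk and Paragraph classes are in the DocumentFormat.OpenXml.Wordprocessing namespace):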

using (WordprocessingDocument myDoc =
    WordprocessingDocument.Open("Test1.docx", true))
{
    string altChunkId = "AltChunkId1";
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
        AlternativeFormatImportPartType.WordprocessingML, altChunkId);
    using (FileStream fileStream = File.Open("TestInsertedContent.docx", FileMode.Open))
        chunk.FeedData(fileStream);
    AltChunk altChunk = new AltChunk();
    altChunk.Id = altChunkId;
    mainPart.Document
        .Body
        .InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
    mainPart.Document.Save();
}
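
For document assembly you typically import more than one document, and each alternative format part needs its own unique ID.  The following is a rough sketch of how the V2 example above might be extended to a list of source documents.  The class and method names and the "AltChunkId" + counter naming scheme are assumptions; the sketch also assumes that the target document does not already contain relationships with those IDs and that its body has at least one paragraph.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class DocumentAssembler
{
    // Hypothetical helper: appends each source document to the target
    // as its own altChunk, in the order given, starting after the last
    // paragraph of the body.
    public static void AppendDocuments(string targetPath,
        IEnumerable<string> sourcePaths)
    {
        using (WordprocessingDocument myDoc =
            WordprocessingDocument.Open(targetPath, true))
        {
            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            OpenXmlElement insertAfter =
                mainPart.Document.Body.Elements<Paragraph>().Last();
            int n = 1;
            foreach (string sourcePath in sourcePaths)
            {
                // Assumed naming scheme; any ID unique within the part works.
                string altChunkId = "AltChunkId" + n++;
                AlternativeFormatImportPart chunk =
                    mainPart.AddAlternativeFormatImportPart(
                        AlternativeFormatImportPartType.WordprocessingML,
                        altChunkId);
                using (FileStream fileStream =
                    File.Open(sourcePath, FileMode.Open))
                    chunk.FeedData(fileStream);
                AltChunk altChunk = new AltChunk();
                altChunk.Id = altChunkId;
                // Chain each altChunk after the previous one to keep order.
                insertAfter = mainPart.Document.Body
                    .InsertAfter(altChunk, insertAfter);
            }
            mainPart.Document.Save();
        }
    }
}
```

Each new altChunk is inserted after the previous one rather than after the last paragraph, so the imported documents appear in the same order as the list of source paths.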


The attached code shows examples of placing an Open XML document, HTML, and text into an alternative format content part.  I’ve provided two versions of the example – one using V1 of the Open XML SDK (and LINQ to XML), and another using V2 of the Open XML SDK.
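
For reference, a plain-text import with V2 of the SDK might look like the following.  This is a sketch written for this discussion, not the code from the attachment; the class and method names are hypothetical, and the choice of UTF-8 for the part's bytes is an assumption (how Word interprets the encoding of a text/plain part is not covered in this post).

```csharp
using System.IO;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class TextChunkExample
{
    // Hypothetical helper: imports a string as a text/plain altChunk
    // after the last paragraph of the document body.
    public static void AppendText(string docPath, string text)
    {
        using (WordprocessingDocument myDoc =
            WordprocessingDocument.Open(docPath, true))
        {
            string altChunkId = "AltChunkId1";
            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            AlternativeFormatImportPart chunk =
                mainPart.AddAlternativeFormatImportPart(
                    AlternativeFormatImportPartType.TextPlain, altChunkId);
            // Assumption: the text is stored as UTF-8 bytes.
            using (MemoryStream textStream =
                new MemoryStream(Encoding.UTF8.GetBytes(text)))
                chunk.FeedData(textStream);
            AltChunk altChunk = new AltChunk();
            altChunk.Id = altChunkId;
            mainPart.Document.Body.InsertAfter(altChunk,
                mainPart.Document.Body.Elements<Paragraph>().Last());
            mainPart.Document.Save();
        }
    }
}
```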

Attachment: altChunk.zip
  • Hi Eric,

    Interesting post.

    I have just been reading up on OpenXML and it looks like a great solution to my document assembly problem.

    Is it possible to combine Excel tables/charts and PowerPoint slides into a Word document using Open XML?

    Clearly altChunk wouldn't be the method, as it only works with Word/XML/XHTML files, but would it work for Excel/PowerPoint elements embedded into Word?

    Ed

  • Hi Eric,  Great bit of code - nearly exactly what I was looking for.  I seem to have a problem though if each sub-document has a different header - the headers seem to get lost.  Any ideas?

    Terry

  • Hi Eric,

    Thanks for the code sample.

    I am facing a problem with the bullets & numbering when using altChunk to merge two word documents (office 2003 .doc documents converted to .docx using OFC.exe). The code I am using is given below.

        string oriDoc = @"C:\Final.docx";
        string mergedDocPath = @"C:\A.docx";
        using (WordprocessingDocument doc = WordprocessingDocument.Open(oriDoc, true))
        {
            IEnumerator<Locked> enumerator = doc.MainDocumentPart.StyleDefinitionsPart.Styles.Descendants<Locked>().GetEnumerator();
            while (enumerator.MoveNext() == true)
                enumerator.Current.Val = BooleanValues.True; //Tried using False as well, but it doesn't make sense here.
            doc.MainDocumentPart.Document.Save();
            Paragraph paragraph = doc.MainDocumentPart.Document.Descendants<Paragraph>().Last();
            AlternativeFormatImportPart importPart = doc.MainDocumentPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML);
            using (StreamReader reader = new StreamReader(mergedDocPath, true))
                importPart.FeedData(reader.BaseStream);
            AltChunk altChunk = new AltChunk();
            altChunk.AltChunkProperties = new AltChunkProperties();
            altChunk.AltChunkProperties.MatchSource = new MatchSource();
            altChunk.AltChunkProperties.MatchSource.Val = BooleanValues.True; //Tried using False as well
            altChunk.Id = doc.MainDocumentPart.GetIdOfPart(importPart);
            paragraph.InsertAfterSelf(altChunk);
            doc.MainDocumentPart.Document.Save();
        }

    A.docx originally looks like this,

    ----------------------------------------------------------------------------------------------------

    Diggity dog

    ffdgfgfdg

    first time dfvidjgldgdgm

    dfsfsdfgdgdfgghfgghfh:

    1.       Zoom vroom

    2.       doom boom

    3.       dhgfhfghgfhfghgfhgfhgfh dsfsfsddfsfdsfgdsfdgffg fgffdgdghfdh fsfgf fdsfsfsdfdsf:

    4.       Sweeetdfvggdggf

    a.       dfsfvcff

                                                                  i.      why Go

                                                                ii.      jeremy

                                                               iii.      black

    ----------------------------------------------------------------------------------------------------

    After merging the formatting becomes like this,

    ----------------------------------------------------------------------------------------------------

    Diggity dog

    ffdgfgfdg

    first time dfvidjgldgdgm

    dfsfsdfgdgdfgghfgghfh:

    • Zoom vroom

    • doom boom

    • dhgfhfghgfhfghgfhgfhgfh dsfsfsddfsfdsfgdsfdgffg fgffdgdghfdh fsfgf fdsfsfsdfdsf:

    • Sweeetdfvggdggf

    • dfsfvcff

    • why Go

    • jeremy

    • black

    ----------------------------------------------------------------------------------------------------

    Something similar happens to bullets too. The bullets style changes to the bullets styling of "Final.Docx".

    On checking the afchunk the bullet & numbering were correct, which indicates that the parent document superimposes its bulleting & numbering on the chunk.

    I thought about setting DocumentProtection.Enforcement and DocumentProtection.Formatting to false. Also I tried setting AutoFormatOverride.Val to false. But I couldn't find a way to do that. Also will setting these help?

    Also does setting AltChunk.Id manually rather than by using MainDocumentPart.GetIdOfPart cause a difference?

    If the above method does not work, should I instead take all the Styles from the second document and merge them into the first document? Although this does not look like the right way to go about doing things.

    Thanks,

    Anand.

  • Stephen McGibbon has screenshots of the Open XML and ODF support coming in Windows 7 WordPad, as announced

  • Hi, Ed, Terry, and Anand,

    Thanks for the great questions.  I'll be responding to these, but it may be as late as the end of next week, due to schedule constraints.  Thanks for your patience.

    -Eric

  • Following PDC 2008 and the Open XML workshop given by Microsoft in Redmond (Doug, a thousand apologies once again

  • I received this message privately, but the question and the response are relevant to many, so including it here.

    Question:

    I'm attempting to merge multiple documents (which contain rows of a table) into a single document.  When the merge process happens, I get what looks to be a paragraph marker between my table rows (so there's visual separation between the rows of the table, which isn't what I want).

    Any thoughts on how to modify altChunk's behavior to not include the document delimiter between the documents that it merges?

    My response:

    I've seen this same behavior, and as far as I know, this is behavior that is not configurable in Word.  I'll check, but would guess that this can't be changed.

    The solution to this is to write some utility that can move content between docs (not using altChunk).  I'm starting on the prep work for this.  See this post:

    http://blogs.msdn.com/ericwhite/archive/2008/11/03/inserting-deleting-moving-paragraphs-in-open-xml-wordprocessing-documents.aspx

    -Eric

  • Eric, I am having the exact same problem as Anand. Maybe you have a good solution for this.

    Rather strange that it is not possible to do inline numbering type in the document.xml itself.

  • Hi,

    I have a few questions:

    1. Is it possible to view an altChunk from Word 2007, or can it be viewed only in XML format?

    2. Can we insert the contents in between the documents?

  • One of the most common requests we hear related to word processing documents is the ability to merge

  • Hi,

    I am trying to merge documents and then do some string replacement using this code: http://www.codeproject.com/KB/office/OfficeTokenReplacement.aspx

    It does some regex matching on the whole XML,

    but when I use a chunk, the embedded content is stored zipped under AltChunk1.docx and I have to unzip it afterwards.

    I first tried with the PDC source code:

        //Find all content controls in document
        List<SdtBlock> sdtList = mainPart.Document
            .Descendants<SdtBlock>().Where(s => sourceFile
                .Contains(s.SdtProperties
                    .GetFirstChild<Alias>().Val.Value)).ToList();
        //Go through all the content controls
        if (sdtList.Count != 0)
        {
            string altChunkId = "AltChunkId" + id;
            id++;
            //Add altchunk into document
            AlternativeFormatImportPart chunk =
                mainPart.AddAlternativeFormatImportPart(
                "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
                altChunkId);
            //stream data from source file into altchunk
            chunk.FeedData(File.Open(sourceFile, FileMode.Open));
            //Create new altchunk element
            AltChunk altChunk = new AltChunk();
            altChunk.Id = altChunkId;
            //Swap out content control for altchunk
            foreach (SdtBlock sdt in sdtList)
            {
                OpenXmlElement parent = sdt.Parent;
                parent.InsertAfter(altChunk, sdt);
                sdt.Remove();
            }
            //Save
            mainPart.Document.Save();
        }

    But I only have paragraphs and no SdtBlock.

    Could you please help me?

  • Hi Eric ...Can this be used with word 2003?

  • How can I insert an AltChunk at a specific place?

  • Hi Eric, nice code sample. I am using some HTML as an altChunk. It works for plain HTML, but if the HTML contains images, the images do not come through. I understand the problem: the images are not in the scope of the document. As you mentioned, when a document with an altChunk is saved by Word 2007, it converts all the altChunk content to WordML. My question is whether we can do the same (converting HTML to WordML).

    It will be a great help for my project.

  • Resolution ================ Step 1: Open a new Microsoft Word 2007 document and type A B C Save the document
