
How to Use altChunk for Document Assembly


Merging multiple word processing documents into a single document is something that many people want to do.  An application built for attorneys might assemble selected standard clauses into a contract.  An application built for book publishers can assemble chapters of a book into a single document.  This post explains the semantics of the altChunk element, and provides some code using the Open XML SDK that shows how to use altChunk.

Instead of using altChunk, you could write a program to merge the Open XML markup for documents.  You would need to deal with a number of issues, including merging style sheets and resolving conflicting styles, merging the comments from all of the documents, merging bookmarks, and more.  This is doable, but it’s a lot of work.  You can use altChunk to let Word 2007 do the heavy lifting for you.

altChunk is a powerful technique.  It’s a tool that should be in every Open XML developer’s toolbox.  In an upcoming post, I’ll show an example of the use of altChunk in a SharePoint application.  You can create compelling document assembly solutions in SharePoint using altChunk.

Overview of the altChunk Markup

The altChunk markup tells the consuming application to import content into the document.  This behavior is not required for a conforming application – a conforming application is free to ignore the altChunk markup.  However, the standard recommends that if the application ignores the altChunk markup, it should notify the user.  Word 2007 supports altChunk.

To use altChunk, you do the following:

  • You create a new part in the package.  The part can have one of a number of content types, listed below.  When you create the part, you assign it a unique relationship ID.
  • You store the content that you want to import in the part.  You can import a variety of types of content, including another Open XML word processing document, HTML, or text.
  • The main document part has a relationship to the alternative format import part.  The ID of that relationship is the unique ID you assigned.
  • You add a w:altChunk element at the location where you want to import the alternative format content.  The r:id attribute of the w:altChunk element identifies the chunk to import.  The w:altChunk element is a sibling to paragraph elements (w:p).  You can add an altChunk element at any point in the markup that can contain a paragraph element, but not inside a paragraph or a run.
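To make these steps concrete, here is a minimal LINQ to XML sketch of the three pieces of markup that wire up one chunk: the content type declaration in [Content_Types].xml, the relationship in word/_rels/document.xml.rels, and the w:altChunk element itself. The Open XML SDK writes the first two for you when you call AddAlternativeFormatImportPart. The AltChunkMarkup class is hypothetical, and any part names and IDs you pass to it are illustrative, not necessarily what the SDK generates.

```csharp
using System.Xml.Linq;

// Sketch of the markup behind the altChunk steps. Part names and IDs passed
// in are illustrative; the Open XML SDK chooses its own part name when you
// call AddAlternativeFormatImportPart.
public static class AltChunkMarkup
{
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    public static readonly XNamespace R =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    static readonly XNamespace PackageRels =
        "http://schemas.openxmlformats.org/package/2006/relationships";
    static readonly XNamespace ContentTypes =
        "http://schemas.openxmlformats.org/package/2006/content-types";

    const string AfChunkRelationshipType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk";

    // Step 1: declare the new part's content type in [Content_Types].xml.
    public static XElement ContentTypeOverride(string partName, string contentType) =>
        new XElement(ContentTypes + "Override",
            new XAttribute("PartName", partName),
            new XAttribute("ContentType", contentType));

    // Step 3: the relationship from the main document part, stored in
    // word/_rels/document.xml.rels. Target is relative to the word/ folder.
    public static XElement Relationship(string id, string target) =>
        new XElement(PackageRels + "Relationship",
            new XAttribute("Id", id),
            new XAttribute("Type", AfChunkRelationshipType),
            new XAttribute("Target", target));

    // Step 4: the block-level element that points at the part by relationship ID.
    public static XElement AltChunk(string id) =>
        new XElement(W + "altChunk", new XAttribute(R + "id", id));
}
```

Note that the same ID string appears in both the Relationship element and the r:id attribute of w:altChunk; that shared ID is the only link between the two.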

A few of the content types that can be imported into a document are:

  • application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml

The alternative format content part contains an Open XML document in binary form.

  • application/xhtml+xml

The alternative format content part contains an XHTML document.

  • text/plain

The alternative format content part contains text.

There are other options beyond these three; the code that accompanies this post shows how to implement altChunk for each of these three types of content.
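An application that imports user-selected files needs to choose one of these content types for each file. Here is a minimal sketch, assuming the file extension is a reliable guide to the content. AltChunkContentTypes is a hypothetical helper, not part of the Open XML SDK, and it maps only the three content types listed above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: pick an altChunk content type from a file's extension.
// Only the three content types described in this post are mapped.
public static class AltChunkContentTypes
{
    static readonly Dictionary<string, string> ByExtension =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { ".docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml" },
            { ".xhtml", "application/xhtml+xml" },
            { ".txt", "text/plain" },
        };

    public static string ForFile(string path)
    {
        string extension = Path.GetExtension(path);
        if (ByExtension.TryGetValue(extension, out string contentType))
            return contentType;
        throw new NotSupportedException(
            $"No altChunk content type is mapped for '{extension}'.");
    }
}
```

The returned string can be passed as the content type argument to AddAlternativeFormatImportPart, as in the V1 examples in this post.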

The altChunk markup in the document looks like this:

<w:p>
  <w:r>
    <w:t>Paragraph before.</w:t>
  </w:r>
</w:p>
<w:altChunk r:id="AltChunkId1" />
<w:p>
  <w:r>
    <w:t>Paragraph after.</w:t>
  </w:r>
</w:p>
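Because w:altChunk must be a sibling of w:p, placing it inside a paragraph or a run (an easy mistake when editing document.xml by hand) can cause Word to report the file as corrupt. Below is a minimal sanity check you might run before saving, assuming the main document part is loaded into an XDocument as in the V1 example in this post. AltChunkPlacement is a hypothetical name. The search stops at text boxes (w:txbxContent), because they hold their own block content even though they sit inside a run.

```csharp
using System.Linq;
using System.Xml.Linq;

// Sketch: find w:altChunk elements nested inside a paragraph (for example,
// inside a w:r). altChunk imports block-level content, so it belongs beside
// w:p elements, not within one.
public static class AltChunkPlacement
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static XElement[] FindInsideParagraphs(XDocument mainDocument) =>
        mainDocument.Descendants(W + "altChunk")
            // Look at the nearest enclosing paragraph or text box; the chunk
            // is misplaced only if that container is a paragraph.
            .Where(chunk => chunk.Ancestors()
                .FirstOrDefault(a => a.Name == W + "p" || a.Name == W + "txbxContent")
                ?.Name == W + "p")
            .ToArray();
}
```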


altChunk: Import Only

One important note about altChunk – it is used only for importing content.  If you open the document using Word 2007 and save it, the newly saved document will not contain the alternative format content part, nor the altChunk markup that references it.  Word saves all imported content as paragraph (w:p) elements.  The standard requires this behavior from a conforming application.
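One practical consequence is that code that post-processes assembled documents cannot rely on the altChunk markup still being present. Below is a minimal sketch that reports which imports are still pending, meaning which w:altChunk elements Word has not yet expanded. AltChunkStatus is a hypothetical name, and the sketch assumes the main document part has been loaded into an XDocument.

```csharp
using System.Linq;
using System.Xml.Linq;

// Sketch: list the relationship IDs of altChunk elements still present in
// the main document part. Word replaces altChunks with ordinary paragraphs
// when it saves, so an empty result is expected after a Word save.
public static class AltChunkStatus
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace R =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    public static string[] PendingChunkIds(XDocument mainDocument) =>
        mainDocument.Descendants(W + "altChunk")
            .Select(chunk => (string)chunk.Attribute(R + "id"))
            .ToArray();
}
```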

Using altChunk

The following screen-clipping shows a simple word processing document.  It has a heading, a paragraph styled as Normal, and a comment:

The following screen-clipping shows another word processing document, with content that we want to insert into the first document.

After running the example program included with this post, the resulting document looks like the following.  Notice that the resulting document has comments from both of the source documents:

The following example shows how to merge two Open XML documents using altChunk.  It uses V1 of the Open XML SDK and LINQ to XML:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using System.Xml;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        XNamespace w =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        XNamespace r =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

        using (WordprocessingDocument myDoc =
            WordprocessingDocument.Open("Test.docx", true))
        {
            string altChunkId = "AltChunkId1";
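            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            // Create the import part; altChunkId becomes the ID of the
            // relationship from the main document part to the new part.
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
              "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
              altChunkId);
            // Copy the document to import into the new part.
            using (FileStream fileStream =
                File.Open("TestInsertedContent.docx", FileMode.Open))
                chunk.FeedData(fileStream);
            // Build <w:altChunk r:id="AltChunkId1" />, which refers to the
            // part by that relationship ID.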
            XElement altChunk = new XElement(w + "altChunk",
                new XAttribute(r + "id", altChunkId)
            );
            XDocument mainDocumentXDoc = GetXDocument(myDoc);
            // Add the altChunk element after the last paragraph.
            mainDocumentXDoc.Root
                .Element(w + "body")
                .Elements(w + "p")
                .Last()
                .AddAfterSelf(altChunk);
            SaveXDocument(myDoc, mainDocumentXDoc);
        }
    }

    private static void SaveXDocument(WordprocessingDocument myDoc,
        XDocument mainDocumentXDoc)
    {
        // Serialize the XDocument back into the part
        using (Stream str = myDoc.MainDocumentPart.GetStream(
            FileMode.Create, FileAccess.Write))
        using (XmlWriter xw = XmlWriter.Create(str))
            mainDocumentXDoc.Save(xw);
    }

    private static XDocument GetXDocument(WordprocessingDocument myDoc)
    {
        // Load the main document part into an XDocument
        XDocument mainDocumentXDoc;
        using (Stream str = myDoc.MainDocumentPart.GetStream())
        using (XmlReader xr = XmlReader.Create(str))
            mainDocumentXDoc = XDocument.Load(xr);
        return mainDocumentXDoc;
    }
}


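To use altChunk with HTML, the code looks like this.  It reuses the w and r namespaces and the GetXDocument and SaveXDocument methods from the previous example: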

using (WordprocessingDocument myDoc =
    WordprocessingDocument.Open("Test3.docx", true))
{
    string html =
      @"<html>
            <head/>
            <body>
                <h1>Html Heading</h1>
                <p>This is an html document in a string literal.</p>
            </body>
        </html>";
    string altChunkId = "AltChunkId1";
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
        "application/xhtml+xml", altChunkId);
    using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
    using (StreamWriter stringStream = new StreamWriter(chunkStream))
        stringStream.Write(html);
    XElement altChunk = new XElement(w + "altChunk",
        new XAttribute(r + "id", altChunkId)
    );
    XDocument mainDocumentXDoc = GetXDocument(myDoc);
    mainDocumentXDoc.Root
        .Element(w + "body")
        .Elements(w + "p")
        .Last()
        .AddAfterSelf(altChunk);
    SaveXDocument(myDoc, mainDocumentXDoc);
}


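Using V2 of the Open XML SDK, importing a document looks like this.  AltChunk and Paragraph are the strongly typed classes in the DocumentFormat.OpenXml.Wordprocessing namespace: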

using (WordprocessingDocument myDoc =
    WordprocessingDocument.Open("Test1.docx", true))
{
    string altChunkId = "AltChunkId1";
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
        AlternativeFormatImportPartType.WordprocessingML, altChunkId);
    using (FileStream fileStream = File.Open("TestInsertedContent.docx", FileMode.Open))
        chunk.FeedData(fileStream);
    AltChunk altChunk = new AltChunk();
    altChunk.Id = altChunkId;
    mainPart.Document
        .Body
        .InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
    mainPart.Document.Save();
}
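The examples above insert a single chunk after the last w:p.  When you assemble several documents, such as the chapters of a book, each part needs its own relationship ID (AltChunkId1, AltChunkId2, and so on).  Repeating the "after the last paragraph" insertion also places each new chunk ahead of the ones already added, because an altChunk is not a w:p.  Below is a minimal sketch in the LINQ to XML style of the V1 example that keeps the intended order.  It assumes the import parts and IDs were already created with AddAlternativeFormatImportPart, and AltChunkAssembly is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Sketch: append several altChunk elements after the last paragraph of the
// body, keeping them in the order given. Each ID is assumed to match a
// relationship already created with AddAlternativeFormatImportPart.
public static class AltChunkAssembly
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace R =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    public static void AppendInOrder(XDocument mainDocument, IEnumerable<string> chunkIds)
    {
        XElement anchor = mainDocument.Root
            .Element(W + "body")
            .Elements(W + "p")
            .Last();
        foreach (string id in chunkIds)
        {
            var altChunk = new XElement(W + "altChunk", new XAttribute(R + "id", id));
            anchor.AddAfterSelf(altChunk);
            // Insert the next chunk after this one; inserting every chunk after
            // the same last paragraph would reverse their order.
            anchor = altChunk;
        }
    }
}
```

Because the first chunk goes directly after the last paragraph, a trailing w:sectPr in the body stays at the end.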


The attached code shows examples of placing an Open XML document, HTML, and text into an alternative format import part.  I’ve provided two versions of the example: one using V1 of the Open XML SDK (and LINQ to XML), and another using V2 of the Open XML SDK.

Attachment: altChunk.zip
  • Hi Eric,

    I just noticed I haven't stated the exact error:

    Object reference not set to an instance of an object.

    In addition, the Imports DocumentFormat.OpenXml.Wordprocessing statement is not available with the previous DLL, which means docx.MainDocumentPart.Document cannot be found and the type SimpleField is not declared.

    We use these as we have mergefields in each document which get populated upon creation.

    Chris

  • Hi Eric,

    After merging multiple .docx files into a single document, how can I update the source .docx files if the user modifies the content of the assembled document?

  • Hi Eric,

    I'm merging HTML documents into Word documents via altChunk, but I'd like the styles from the Word documents to be applied to the HTML documents. I've tried putting the altChunk inside a paragraph and inside a run, and I've even tried it without the surrounding sdt tags, but I still can't get it to work. Do you have any suggestions? Here is some sample markup:

    <w:sdt>
      <w:sdtPr>
        <w:alias w:val="description" />
        <w:tag w:val="description" />
      </w:sdtPr>
      <w:sdtContent>
        <w:p w:rsidR="00275992" w:rsidRPr="00275992" w:rsidRDefault="00275992" w:rsidP="00EC68D3">
          <w:pPr>
            <w:pStyle w:val="LineItemTable" />
          </w:pPr>
          <w:r>
            <w:altChunk r:id="raac4be36-f977-4735-9ffc-a5cbf35dd6d5">
              <w:altChunkPr>
                <w:matchSrc w:val="false" />
              </w:altChunkPr>
            </w:altChunk>
          </w:r>
        </w:p>
      </w:sdtContent>
    </w:sdt>

  • Hi,

    I am trying to merge Word documents in a SharePoint document library. Some pages in the documents are in portrait and some are in landscape. After merging the documents, all the pages are displayed in portrait mode. How can I retain the page orientation programmatically?

    I think we can do it by inserting section properties after each page or each document.

    Here is my code.

    I'd appreciate your help.

    foreach (SPFile item in listitem.Folder.Files)
    {
        //  SPFile inputFile = item.File;
        SPFile inputFile = item;
        string altChunkId = "AltChunkId" + id;
        id++;
        byte[] byteArray = inputFile.OpenBinary();
        AlternativeFormatImportPart chunk = outputDoc.MainDocumentPart.AddAlternativeFormatImportPart(
            AlternativeFormatImportPartType.WordprocessingML, altChunkId);
        using (MemoryStream mem = new MemoryStream())
        {
            mem.Write(byteArray, 0, (int)byteArray.Length);
            mem.Seek(0, SeekOrigin.Begin);
            chunk.FeedData(mem);
        }
        AltChunk altChunk = new AltChunk();
        altChunk.Id = altChunkId;
        outputDoc.MainDocumentPart.Document.Body.InsertAfter(altChunk,
            outputDoc.MainDocumentPart.Document.Body.Elements<Paragraph>().Last());
        outputDoc.MainDocumentPart.Document.Save();
    }
    outputDoc.Close();
    memOut.Seek(0, SeekOrigin.Begin);
    ClientContext clientContext = new ClientContext(SPContext.Current.Site.Url);
    ClientOM.File.SaveBinaryDirect(clientContext, outputPath, memOut, true);
    // Conversion

  • Hi Eric,

    How about support for altChunk in Office 2003 with the Compatibility Pack installed? I created a very simple Word document using the Open XML SDK and added an altChunk to the body with a stream of simple HTML content. It fails to open in Office 2003.

    Any thoughts?

    John

  • Hi John,

    Yes, you are right, altChunk is not supported in Office 2003.  There are other features of Open XML not supported in 2003, such as content controls.  This has to do with the actual functionality in that version of Office.  There is no code to do the conversion and import for altChunk, nor to handle content controls, therefore those are not supported.

    -Eric

  • Hi Eric,

    I am converting HTML content to Word using altChunk. The problem is that spacing is being added between the lines of a paragraph. In the HTML there is no space between the lines, but in Word, space is automatically added between each line.

    I used the HTML below.

    <html>
      <head/>
      <body>
        <div>
          <div>sdsdsd</div>
          <div><strong>sdsdsdsdsd sdsdsdsdsd sdsdsdsdsd</strong></div>
          <div><strong>erterterttr</strong></div>
          <div>Sample <em>Text</em></div>
          <div><font color="#ff0000">ACCCC</font></div>
          <div><font color="#ff0000">sdsdsd</font></div>
          <div><font color="#ff9900">Test Doc</font></div>
          <div><a href="http://sgehmoss01:9005//sites/Conversion/default.aspx">Default</a></div>
          <div> </div>
          <div><a href="www.google.com/">All Items</a></div>
          <div><font color="#ff0000"></font></div>
          <div>AAA</div>
          <div><img alt="Home Page" src="http://sgehmoss01:9005/Sites/Conversion/_layouts/images/homepage.gif"></div>
          <div> </div>
          <div><img alt="Second One" src="http://sgehmoss01:9005/Sites/Conversion/_layouts/images/homepage.gif"></div>
          <div> </div>
          <div>Saasasas</div>
          <div>asas</div>
          <div>as</div>
          <div> </div>
          <div>asas</div>
        </div>
      </body>
    </html>

    Hi Eric, I sure hope you will respond to me. I'm really stuck. I've been adding altChunks, and sometimes I get a corrupt file and other times I don't. I haven't been able to find any pattern. I've unzipped the .docx file and I'm manually editing the document.xml file, and I can move

    <w:altChunk r:id="somechunkId" />

    to different locations within the root of the body. Some locations work; others produce this error:

    Unspecified error: location part:/word/document.xml line 176, column 0

    It's very frustrating. I need to put HTML into a Word template at certain points. Word asks if I would like to try to recover the file, and then it's fine, but I can't automate that on our server.

    You seem to be the expert on altChunk. Any thoughts? Anything would be helpful. Thank you.

    -Matt

  • @Matt,

    Yes, you are correct, there are certain places where you can put altChunk, and other places where you can't.  altChunk imports block content (i.e. siblings of paragraphs and tables) so the altChunk element needs to go there, not within a paragraph.  Beyond that, I'm not sure.

    I'd be happy to take a look at one of your corrupted docs and I can probably tell you what is wrong.  If you would be good enough to submit the question on the forums at OpenXmlDeveloper.org, it would be super easy for me to respond.  Also, by answering the question there, others can take advantage of the answer.

    Cheers, Eric
