How to Use altChunk for Document Assembly

Merging multiple word processing documents into a single document is something that many people want to do.  An application built for attorneys might assemble selected standard clauses into a contract.  An application built for book publishers can assemble chapters of a book into a single document.  This post explains the semantics of the altChunk element, and provides some code using the Open XML SDK that shows how to use altChunk.

Instead of using altChunk, you could write a program to merge the Open XML markup for documents.  You would need to deal with a number of issues, including merging style sheets and resolving conflicting styles, merging the comments from all of the documents, merging bookmarks, and more.  This is doable, but it’s a lot of work.  You can use altChunk to let Word 2007 do the heavy lifting for you.

altChunk is a powerful technique.  It’s a tool that should be in every Open XML developer’s toolbox.  In an upcoming post, I’ll show an example of the use of altChunk in a SharePoint application.  You can create compelling document assembly solutions in SharePoint using altChunk.

Overview of the altChunk Markup

The altChunk markup tells the consuming application to import content into the document.  This behavior is not required for a conforming application – a conforming application is free to ignore the altChunk markup.  However, the standard recommends that if the application ignores the altChunk markup, it should notify the user.  Word 2007 supports altChunk.

To use altChunk, you do the following:

  • You create a new part in the package.  The part can have a number of content types, listed below.  When you create the part, you assign a unique ID to the part.
  • You store the content that you want to import into the part.  You can import a variety of types of content, including another Open XML word processing document, HTML, or text.
  • The main document part has a relationship to the alternative format part.
  • You add a w:altChunk element at the location where you want to import the alternative format content.  The r:id attribute of the w:altChunk element identifies the chunk to import.  The w:altChunk element is a sibling to paragraph elements (w:p).  You can add an altChunk element at any point in the markup that can contain a paragraph element.
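
The unique ID in the first step matters most when you add chunks to an existing document, or add several chunks in a loop: a relationship ID that is already in use is rejected when the part is added. The following is a minimal sketch, not part of the original sample. AltChunkIds and Next are hypothetical names, and the caller is assumed to supply the relationship IDs that are already in use on the main document part.

```csharp
using System;
using System.Collections.Generic;

static class AltChunkIds
{
    // Hypothetical helper: returns the first "AltChunkIdN" (N = 1, 2, 3, ...)
    // that does not appear in the set of IDs already in use.
    public static string Next(IEnumerable<string> existingIds)
    {
        var used = new HashSet<string>(existingIds, StringComparer.Ordinal);
        int n = 1;
        while (used.Contains("AltChunkId" + n))
            n++;
        return "AltChunkId" + n;
    }
}
```

With the Open XML SDK, one plausible source for the existing IDs is mainPart.Parts.Select(p => p.RelationshipId). Hyperlink and external relationships share the same ID space, so include their IDs too if there is any chance of a clash.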

A few options for content types that can be imported into a document are:

  • application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml

The alternative format content part contains an Open XML document in binary form.

  • application/xhtml+xml

The alternative format content part contains an XHTML document.

  • text/plain

The alternative format content part contains text.

There are more than these three options; the code presented in this post shows how to implement altChunk for these three types of content.

The altChunk markup in the document looks like this:

<w:p>
  <w:r>
    <w:t>Paragraph before.</w:t>
  </w:r>
</w:p>
<w:altChunk r:id="AltChunkId1" />
<w:p>
  <w:r>
    <w:t>Paragraph after.</w:t>
  </w:r>
</w:p>


altChunk: Import Only

One important note about altChunk – it is used only for importing content.  If you open the document using Word 2007 and save it, the newly saved document will not contain the alternative format content part, nor the altChunk markup that references it.  Word saves all imported content as paragraph (w:p) elements.  The standard requires this behavior from a conforming application.
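
Because a successful import and save removes the altChunk markup, the presence of w:altChunk elements indicates that a document has not yet been opened and saved by an application that imports them. The following is a minimal sketch using LINQ to XML, assuming the main document part is already loaded into an XDocument. AltChunkCheck and PendingChunkIds are hypothetical names.

```csharp
using System.Linq;
using System.Xml.Linq;

static class AltChunkCheck
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace R =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Returns the r:id value of every w:altChunk element in the document,
    // including chunks nested inside tables or other containers.
    // An entry is null if a w:altChunk element has no r:id attribute.
    public static string[] PendingChunkIds(XDocument mainDocument)
    {
        return mainDocument
            .Descendants(W + "altChunk")
            .Select(e => (string)e.Attribute(R + "id"))
            .ToArray();
    }
}
```

An empty result means the main document part contains no altChunk markup, either because none was ever added or because the content has already been imported and saved.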

Using altChunk

The following screen-clipping shows a simple word processing document.  It has a heading, a paragraph styled as Normal, and a comment:

The following screen-clipping shows another word processing document, with content that we want to insert into the first document.

After running the example program included with this post, the resulting document looks like the following.  Notice that the resulting document has comments from both of the source documents:

The following example shows how to merge two Open XML documents using altChunk.  It uses V1 of the Open XML SDK, and LINQ to XML:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using System.Xml;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        XNamespace w =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        XNamespace r =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

        using (WordprocessingDocument myDoc =
            WordprocessingDocument.Open("Test.docx", true))
        {
            string altChunkId = "AltChunkId1";
            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
              "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
              altChunkId);
            using (FileStream fileStream =
                File.Open("TestInsertedContent.docx", FileMode.Open))
                chunk.FeedData(fileStream);
            XElement altChunk = new XElement(w + "altChunk",
                new XAttribute(r + "id", altChunkId)
            );
            XDocument mainDocumentXDoc = GetXDocument(myDoc);
            // Add the altChunk element after the last paragraph.
            mainDocumentXDoc.Root
                .Element(w + "body")
                .Elements(w + "p")
                .Last()
                .AddAfterSelf(altChunk);
            SaveXDocument(myDoc, mainDocumentXDoc);
        }
    }

    private static void SaveXDocument(WordprocessingDocument myDoc,
        XDocument mainDocumentXDoc)
    {
        // Serialize the XDocument back into the part
        using (Stream str = myDoc.MainDocumentPart.GetStream(
            FileMode.Create, FileAccess.Write))
        using (XmlWriter xw = XmlWriter.Create(str))
            mainDocumentXDoc.Save(xw);
    }

    private static XDocument GetXDocument(WordprocessingDocument myDoc)
    {
        // Load the main document part into an XDocument
        XDocument mainDocumentXDoc;
        using (Stream str = myDoc.MainDocumentPart.GetStream())
        using (XmlReader xr = XmlReader.Create(str))
            mainDocumentXDoc = XDocument.Load(xr);
        return mainDocumentXDoc;
    }
}
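
The program above assumes that the body contains at least one paragraph, because Last() throws on an empty sequence. It also places the chunk after the last top-level paragraph, which is not necessarily the end of the body. The following sketch shows a more defensive alternative: it appends the chunk at the end of the body, but before the body-level w:sectPr element if one exists, since that element must remain the last child. AltChunkInsertion and AppendAltChunk are hypothetical names, not part of the original sample.

```csharp
using System.Xml.Linq;

static class AltChunkInsertion
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace R =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Adds <w:altChunk r:id="..."/> as the last content element of w:body,
    // keeping the body-level w:sectPr (if present) as the final child.
    public static XElement AppendAltChunk(XDocument mainDocument, string relationshipId)
    {
        XElement body = mainDocument.Root.Element(W + "body");
        XElement altChunk = new XElement(W + "altChunk",
            new XAttribute(R + "id", relationshipId));

        XElement sectPr = body.Element(W + "sectPr");
        if (sectPr != null)
            sectPr.AddBeforeSelf(altChunk);
        else
            body.Add(altChunk);
        return altChunk;
    }
}
```

In the example program, a call such as AltChunkInsertion.AppendAltChunk(mainDocumentXDoc, altChunkId) could take the place of the Elements(w + "p").Last().AddAfterSelf(altChunk) chain.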


To use altChunk with HTML, the code looks like this:

using (WordprocessingDocument myDoc =
    WordprocessingDocument.Open("Test3.docx", true))
{
    string html =
      @"<html>
            <head/>
            <body>
                <h1>Html Heading</h1>
                <p>This is an html document in a string literal.</p>
            </body>
        </html>";
    string altChunkId = "AltChunkId1";
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
        "application/xhtml+xml", altChunkId);
    using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
    using (StreamWriter stringStream = new StreamWriter(chunkStream))
        stringStream.Write(html);
    XElement altChunk = new XElement(w + "altChunk",
        new XAttribute(r + "id", altChunkId)
    );
    XDocument mainDocumentXDoc = GetXDocument(myDoc);
    mainDocumentXDoc.Root
        .Element(w + "body")
        .Elements(w + "p")
        .Last()
        .AddAfterSelf(altChunk);
    SaveXDocument(myDoc, mainDocumentXDoc);
}


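Using V2 of the Open XML SDK, merging two Open XML documents looks like this.  The snippet needs using directives for System.Linq (for the Last extension method) and DocumentFormat.OpenXml.Wordprocessing (for AltChunk and Paragraph):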

using (WordprocessingDocument myDoc =
    WordprocessingDocument.Open("Test1.docx", true))
{
    string altChunkId = "AltChunkId1";
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
        AlternativeFormatImportPartType.WordprocessingML, altChunkId);
    using (FileStream fileStream = File.Open("TestInsertedContent.docx", FileMode.Open))
        chunk.FeedData(fileStream);
    AltChunk altChunk = new AltChunk();
    altChunk.Id = altChunkId;
    mainPart.Document
        .Body
        .InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
    mainPart.Document.Save();
}


The attached code shows examples of placing an Open XML document, HTML, and text into an alternative format content part.  I’ve provided two versions of the example: one using V1 of the Open XML SDK (and LINQ to XML), and another using V2 of the Open XML SDK.

Attachment: altChunk.zip
  • Hi Eric, I'm getting the following error when trying to execute the above code in an aspx page. May I know what is causing this issue?

    Thanks,

    Rama

    Server Error in '/TMS' Application.

    'AltChunkId14' ID conflicts with the ID of an existing relationship for the specified source.

    Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

    Exception Details: System.Xml.XmlException: 'AltChunkId14' ID conflicts with the ID of an existing relationship for the specified source.

    Source Error:

    Line 78:                             string altChunkId = "AltChunkId" + loop;

    Line 79:                             MainDocumentPart mainPart = myDoc.MainDocumentPart;

    Line 80:                             AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(

    Line 81:                               "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",

    Line 82:                               altChunkId);

    Source File: c:\Inetpub\wwwroot\tms\trunk\web\Tariff\Tariff.aspx.cs    Line: 80

    Stack Trace:

    [XmlException: 'AltChunkId14' ID conflicts with the ID of an existing relationship for the specified source.]

      MS.Internal.IO.Packaging.InternalRelationshipCollection.ValidateUniqueRelationshipId(String id) +634905

      MS.Internal.IO.Packaging.InternalRelationshipCollection.Add(Uri targetUri, TargetMode targetMode, String relationshipType, String id, Boolean parsing) +210

      System.IO.Packaging.PackagePart.CreateRelationship(Uri targetUri, TargetMode targetMode, String relationshipType, String id) +62

      DocumentFormat.OpenXml.Packaging.OpenXmlPart.CreateRelationship(Uri targetUri, TargetMode targetMode, String relationshipType, String id) +36

      DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.AttachChild(OpenXmlPart part, String rId) +88

      DocumentFormat.OpenXml.Packaging.OpenXmlPartContainer.InitPart(T newPart, String contentType, String id) +246

      DocumentFormat.OpenXml.Packaging.MainDocumentPart.AddAlternativeFormatImportPart(String contentType, String id) +47

      Systrends.TMS.Web.Tariff.Tariff.<Page_Load>b__1(<>f__AnonymousType1`3 mergingdocuments) in c:\Inetpub\wwwroot\tms\trunk\web\Tariff\Tariff.aspx.cs:80

      System.Array.ForEach(T[] array, Action`1 action) +47

      Systrends.TMS.Web.Tariff.Tariff.Page_Load(Object sender, EventArgs e) in c:\Inetpub\wwwroot\tms\trunk\web\Tariff\Tariff.aspx.cs:66

      System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr fp, Object o, Object t, EventArgs e) +14

      System.Web.Util.CalliEventHandlerDelegateProxy.Callback(Object sender, EventArgs e) +35

      System.Web.UI.Control.OnLoad(EventArgs e) +99

      System.Web.UI.Control.LoadRecursive() +50

      System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +627

    Version Information: Microsoft .NET Framework Version:2.0.50727.3053; ASP.NET Version:2.0.50727.3053

  • Hi Rama,

    Somehow you are getting duplicate rIDs for the altChunk that you are adding.  rIDs need to be unique, and there are a variety of ways to enforce this.  It isn't a problem when creating a document from scratch, but when modifying an existing document, you need to take care that you only add new parts with unique rIDs.  Does this help you with your issue?

    -Eric

  • Hi Eric,

    I've really appreciated your article and I have one question: regarding the "altChunk: Import Only" section, is there any way to avoid this peculiar behaviour?

    In other words, is there an altChunk property or other markup that can be used to embed external sources (e.g. HTML files) so that they are not erased from the package after the first save?

    Many thanks,

    Kulio.

  • Hi Kulio,

    Unfortunately, the behavior can't be changed.  When you open the document in Word, the embedded external source is removed from the package.

    -Eric

  • There are two ways to assemble multiple Open XML word processing documents into a single document: altChunk,

  • DocumentBuilder is an example class that’s part of the PowerTools for Open XML project that enables you

  • I have used your V2 code to merge the documents, and I added my page to a SharePoint site, but SharePoint is not recognizing the Last() method in the following line:

    InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());

    I am getting the following exception.

    'System.Collections.Generic.IEnumerable<DocumentFormat.OpenXml.Wordprocessing.Paragraph>' does not contain a definition for 'Last'   at System.Web.Compilation.AssemblyBuilder.Compile()

  • Hi Ramesh, you need to include a "using System.Linq;" using statement.

    -Eric

  • hi Eric,

    Thanks for your response.

    But I used the System.Xml.Linq namespace.

    I used the System.Xml.Linq and DocumentFormat.OpenXml DLLs to merge the Office documents, which works fine with the 3.5 framework. When I bind my page to the SharePoint site, I get an exception saying

    'System.Collections.Generic.IEnumerable<DocumentFormat.OpenXml.Wordprocessing.Paragraph>' does not contain a definition for 'Last'   at System.Web.Compilation.AssemblyBuilder.Compile()

    Code snippet:

    using (WordprocessingDocument myDoc =

                       WordprocessingDocument.Open("Desc.docx", true))

                   {

                       string altChunkId = "AltChunkId" + i;

                       MainDocumentPart mainPart = myDoc.MainDocumentPart;

                       AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(

                           AlternativeFormatImportPartType.WordprocessingML, altChunkId);

                       using (FileStream fileStream = File.Open("Temp.docx", FileMode.Open))

                           chunk.FeedData(fileStream);

                       AltChunk altChunk = new AltChunk();

                       altChunk.Id = altChunkId;

                       mainPart.Document

                           .Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());

                       mainPart.Document.Save();

                   }

    Note: when I change my application's framework version to 3.0, I get the same exception locally that I got in SharePoint.

    Does this mean that SharePoint doesn't support 3.5 framework DLLs?

    Please advise.

  • Hi Ramesh, the Enumerable.Last extension method is in the System.Linq namespace, not System.Xml.Linq.  By default, the SharePoint project doesn't include a using for System.Linq.  So to use the Last extension method, you need to add that using statement.

    In general, when you get build errors like this, take a look at the MSDN docs on the class/method/type.  The docs always tell you which assembly the class is in, and what namespace the class is in.  Then, you can add appropriate references and using statements.  Make sense?

    -Eric

  • Good stuff, very helpful.   I have a slight twist to this I am working on, maybe someone can help.  

    Instead of opening existing files and merging them, I am programmatically creating WordProcessingDocuments using C# in .NET.  Then based on various conditions I may or may not want to combine them and then stream them out as a single document.    

    So instead of adding data to the stream in the form of:

    Stream fileStream = System.IO.File.Open(fileName, FileMode.Open);

    chunk.FeedData(fileStream);

    I tried to do this:

    Stream stream = wordDoc.MainDocumentPart.GetStream();

    chunk.FeedData(stream);

    This compiles, but when you try to open the final document, it gives me a message that the docx can't be opened because of problems with the contents.    Any ideas?

  • Hi rmagill,  quick question - are you properly disposing of all of your streams?  That could very well cause this problem.  Another debugging technique for a situation like this - read streams to byte arrays - as necessary, you can create a non-resizable memory stream from a byte array using one of the MemoryStream constructors (I believe that the memory stream uses the passed in byte array as its backing store).  You can then examine this byte array to see what's different.

    It's best to always use a 'using' block for every object that implements IDisposable:

    private static void SaveXDocument(WordprocessingDocument myDoc,

        XDocument mainDocumentXDoc)

    {

        // Serialize the XDocument back into the part

        using (Stream str = myDoc.MainDocumentPart.GetStream(FileMode.Create, FileAccess.Write))

        using (XmlWriter xw = XmlWriter.Create(str))

            mainDocumentXDoc.Save(xw);

    }

    -Eric

  • Hi Eric,

    I thank you for your last answer.

    I've solved the problem creating the html files 'on the fly' and using a kind of 'custom marker' in the document that is replaced at runtime with the proper altchunk reference tag.

    Now I am wondering if there is a way to embed also a css stylesheet for the html files.

    The stylesheet file is placed in a directory "word/html".

    I've found out that if I insert "<Default Extension="css" ContentType="text/css" />"

    into [Content_Types].xml I get no error message when opening the docx.

    However the CSS is ignored in the docx file.

    By contrast, if I put the styles in a <style> tag inside the html file, the proper style is displayed correctly inside the docx file.

    Many thanks,

    Kulio.

  • Hi Kulio,

    From the dev team:

    Word only supports a few content types for altChunks. Word does support HTML and MHT, which is why putting them in a <style> tag worked. For HTML, Word only reads the HTML file itself and not any supporting files in the package. So if you have any external stylesheets, images, etc. MHT might be the best route.

    -Eric

  • I've had no trouble getting this working.  However, the one difficulty I'm having is this:

    If I have a hyperlink in my source HTML that looks like this: <a href="myimage.jpeg">, and I have added myimage.jpeg, is there any way that I can get my hyperlink to refer to that image?  Currently the URL is resolved to "directoryTheDocumentIsIn/myimage.jpeg". I'm not sure whether the HyperlinkBase extended property could be used for this...I can't figure out a way.

    Also, I'm a little confused. I was under the impression that with altchunk, Word does a one-time conversion of the content and does away with the source altchunk file.  However, I find that even after opening the docx file several times, the altchunk file remains, and document.xml still contains the <altchunk> tag, rather than any imported html.
