This blog is inactive. New blog: EricWhite.com/blog
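There are two ways to assemble multiple Open XML word processing documents into a single document: embed each source document by using the altChunk element, or assemble the document with the DocumentBuilder class.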
This post compares and contrasts these two approaches.
The biggest difference between the two approaches is that the altChunk technique relies on the consuming application to merge the documents. With altChunk, you embed the entire source document (whether it is another Open XML document, an HTML document, or simply text) as a binary part in the new document. You then insert altChunk markup that refers to that part at the point where you want the inserted document to appear. Compliant consuming applications are not required to recognize altChunk. Novell OpenOffice, for example, can consume Open XML documents but doesn't process altChunk. In contrast, DocumentBuilder assembles a new document that contains basic Open XML word processing markup, which a wide variety of consuming applications can handle, such as Novell OpenOffice, the Open XML Document Viewer, and more. Likewise, the many Open XML SDK / LINQ to XML examples I've written that consume Open XML word processing documents don't process the altChunk element.
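To make the altChunk mechanics concrete, here is a minimal sketch in Python using only the standard library (it is not the Open XML SDK code from the earlier posts). It performs the three steps described above: store the source document as its own part, relate that part to the main document part, and insert a `w:altChunk` element that refers to it. The function name `add_alt_chunk` and the default part name are hypothetical. The sketch assumes that `rel_id` is unique in the package, that `document.xml` already declares the `w` and `r` prefixes, and that the body either ends with a `<w:sectPr>...</w:sectPr>` element (as documents saved by Word do, with no tracked section changes) or has no body-level `sectPr` at all.

```python
import io
import zipfile

# Relationship type for an alternative format import (altChunk) part.
AFCHUNK_REL = ("http://schemas.openxmlformats.org/officeDocument/"
               "2006/relationships/aFChunk")
# Assumed content type for an embedded .docx chunk.
DOCX_CT = ("application/vnd.openxmlformats-officedocument."
           "wordprocessingml.document.main+xml")


def add_alt_chunk(docx: bytes, chunk: bytes, rel_id: str,
                  part_name: str = "word/afchunk1.docx",
                  content_type: str = DOCX_CT) -> bytes:
    """Return a copy of `docx` that embeds `chunk` as a binary part and
    refers to it with a w:altChunk element at the end of the body."""
    with zipfile.ZipFile(io.BytesIO(docx)) as src:
        files = {name: src.read(name) for name in src.namelist()}

    # 1. Store the source document, unchanged, as its own part.
    files[part_name] = chunk

    # 2. Declare the part's content type.
    ct = files["[Content_Types].xml"].decode("utf-8")
    override = f'<Override PartName="/{part_name}" ContentType="{content_type}"/>'
    files["[Content_Types].xml"] = ct.replace(
        "</Types>", override + "</Types>").encode("utf-8")

    # 3. Relate the part to the main document part; the relationship
    #    target is relative to the word/ folder.
    rels_name = "word/_rels/document.xml.rels"
    rels = files[rels_name].decode("utf-8")
    target = part_name.split("/", 1)[1]
    rel = f'<Relationship Id="{rel_id}" Type="{AFCHUNK_REL}" Target="{target}"/>'
    files[rels_name] = rels.replace(
        "</Relationships>", rel + "</Relationships>").encode("utf-8")

    # 4. Insert <w:altChunk r:id="..."/> before the body-level sectPr
    #    (which must stay the last child of the body), else before </w:body>.
    doc = files["word/document.xml"].decode("utf-8")
    body_end = doc.index("</w:body>")
    head = doc[:body_end]
    if head.rstrip().endswith("</w:sectPr>"):
        pos = head.rfind("<w:sectPr")
    else:
        pos = body_end
    alt = f'<w:altChunk r:id="{rel_id}"/>'
    files["word/document.xml"] = (doc[:pos] + alt + doc[pos:]).encode("utf-8")

    # Write every part back out into a new package.
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name, data in files.items():
            dst.writestr(name, data)
    return out.getvalue()
```

Calling `add_alt_chunk(target_bytes, source_bytes, "AltChunkId1")` adds exactly one element to the body of the target document. Nothing is merged until a consuming application that understands altChunk opens the file, which is why consumers that ignore altChunk see none of the inserted content. For an HTML chunk you would pass an HTML part name and an HTML content type such as `text/html`.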
Another difference is that altChunk can only insert an entire document at a specified point. You can’t pick and choose a subset of a source document to insert into the newly assembled document. In contrast, DocumentBuilder allows you to specify a range of content from the source document to be inserted into the newly assembled document. For example, you can assemble a new document from paragraphs 1-3 from one document, paragraphs 5-8 from a second document, and the entire contents of a third document. Technically, with DocumentBuilder, you are not specifying paragraphs from the source document – you specify a range of child elements of the <w:body> element. Child elements can include tables and content controls, for instance.
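To illustrate what "a range of child elements of the `<w:body>` element" means, here is a minimal Python sketch (standard library only) that selects such a range from a `document.xml` part. The function names `body_range` and `local_names` and the `(start, count)` parameters are hypothetical, chosen for illustration; they are not DocumentBuilder's API. The sketch also assumes that indices should skip the body-level `w:sectPr`. It only selects elements; it does not copy them, or any of the markup they depend on, into a new document.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def body_range(document_xml: bytes, start: int, count: int) -> list:
    """Return `count` child elements of <w:body>, starting at index `start`.

    The body-level <w:sectPr> is skipped so that indices refer only to
    content elements (paragraphs, tables, content controls, ...).
    """
    root = ET.fromstring(document_xml)
    body = root.find(f"{{{W}}}body")
    content = [el for el in body if el.tag != f"{{{W}}}sectPr"]
    return content[start:start + count]


def local_names(elements) -> list:
    # Strip the namespace URI from each tag, e.g. "{...main}tbl" -> "tbl".
    return [el.tag.split("}", 1)[1] for el in elements]
```

For a body containing a paragraph, a paragraph, a table, a content control, and a paragraph, `body_range(xml, 1, 3)` selects the second paragraph, the table, and the content control, which shows why a range can include more than just paragraphs.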
Inserting the entire document can be a problem when you need fine-grained control over the newly assembled document. For instance, in Word 2007 you can't create a document that contains just a table; the document always contains an empty paragraph after the table. This means that if you merge two documents that each contain just a table, the resulting document will contain a blank paragraph between the two tables. Several people who use altChunk raised this issue in comments on my altChunk post.
As you can see from the above-mentioned blog posts, the code that you need to write to take advantage of altChunk is pretty simple. In contrast, DocumentBuilder contains about 1000 lines of source code to resolve issues of interrelated markup, such as references to styles, numbering definitions, and images that must be carried over from each source document.
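One concrete source of interrelated markup is styles: a paragraph, run, or table refers to a style by ID (`w:pStyle`, `w:rStyle`, `w:tblStyle`), while the style definition itself lives in a separate styles part. The minimal Python sketch below (standard library only; the function name `referenced_style_ids` is hypothetical) just collects the style IDs that a document body uses, which an assembler would need to know before copying the matching style definitions into the new document. Numbering, images, and other relationship-based references need similar handling, which helps explain the difference in code size.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Elements whose w:val attribute holds the ID of a style in the styles part.
STYLE_REFS = ("pStyle", "rStyle", "tblStyle")


def referenced_style_ids(document_xml: bytes) -> set:
    """Collect the style IDs referenced by paragraphs, runs, and tables.

    Any of these IDs that the destination document lacks would have to be
    copied, with its w:style definition, from the source styles part.
    """
    root = ET.fromstring(document_xml)
    ids = set()
    for name in STYLE_REFS:
        for el in root.iter(f"{{{W}}}{name}"):
            # In WordprocessingML the val attribute is namespace-qualified.
            ids.add(el.get(f"{{{W}}}val"))
    return ids
```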
You can use altChunk to convert HTML to Open XML word processing documents. In contrast, DocumentBuilder has no capabilities for working with HTML markup – you can only assemble multiple Open XML word processing documents into a new document.
If you assemble a large number of documents using altChunk, creating the new document is very fast, and the main document part of the assembled document is very small: the body consists of one altChunk element for each document. However, Microsoft Word 2007 takes a bit longer than normal to open it. If the documents are large enough to take time to import, you can see a progress bar in the status bar at the bottom of the Word window run from 0% to 100% for each imported document. When you assemble a document using DocumentBuilder, assembly takes a bit longer, because processing time is proportional to the total size of the imported chunks. But when Word opens the result, normal opening times apply.
Eric, thanks for the posts. I have learned a great deal from them. One additional comment regarding the "Consuming Application" item: choosing the altChunk method may (currently) limit your ability to further process the document. For example, I recently had to update some code used in a MOSS Document Conversion solution from altChunk to the DocumentBuilder style because a PDF converter didn't support the altChunk style of document assembly.
Hi Pete, glad you like the posts. Yes, I would only use altChunk if I were producing documents to be consumed by Office and I could live with the limitations that altChunk imposes. For converting HTML to Open XML, altChunk is my preferred approach, for now.
Thanks Eric, I learned a lot from the altChunk posts as well. One note on the cons for altChunk. You say that you can only merge in entire documents, but I was able to merge in certain pieces of a second document. This might only work in Word 2007, but if the source you are merging in comes from the WordOpenXML property of a Content Control, then I was able to successfully merge that into a destination content control.
Is SharePoint 2010's Word Automation Services the only way to convert docx files that contain altChunks into 'normal' docx files? I ask because I have a SharePoint workflow that performs this conversion, but it only processes one file per minute. Is there a different application or method of conversion that doesn't require SharePoint, or should SharePoint process files much faster and my configuration is simply off? Any suggestions are greatly appreciated.
I created an email in a Word document and converted it to HTML, but after editing it I couldn't convert it back to a Word document.