This blog is inactive. New blog: EricWhite.com/blog
DocumentBuilder is a small API (part of the PowerTools for Open XML project, an open source project on CodePlex) that allows you to merge the contents of documents while retaining document integrity and resolving issues of markup interdependence. Separate posts contain detailed information on the interdependence of Open XML WordprocessingML markup, introduce DocumentBuilder with a few examples of its use, and discuss how to control sections (and headers) when using DocumentBuilder.
DocumentBuilder is licensed under the Microsoft Public License (Ms-PL), which gives you wide latitude in how you use the code. To get DocumentBuilder, go to PowerTools for Open XML, click on the Downloads tab, and download DocumentBuilder.zip.
I've updated DocumentBuilder with a couple of minor bug fixes:
There are upcoming tasks for DocumentBuilder, and the PowerTools for Open XML in general:
Hi, I've run into a problem with your tool when parsing my master document. At the body level, I only have p and tbl elements.
For the paragraphs, I haven't had any trouble and it's very fast (thanks for that), but for the tables, my sub-doc usually has to be placed in a cell. After the replacement, the whole table is replaced instead of just the cell content. Can you help me, please?
// project collection of tuples containing element and type
var q2 = q1
    .Select(
        e =>
        {
            string keyForGroupAdjacent = ".NonContentControl";
            if (e.Name == w + "tbl")
                keyForGroupAdjacent = null;
            if (e.Value.StartsWith("{$C") && mokeDic.ContainsKey(e.Value))
                keyForGroupAdjacent = e.Value;
            ...
I tried a more specialized LINQ query to find the cell, but I think I would have to create a new collection (e.g. q2_table)?
Hi Ernest,
At this point, you configure DocumentBuilder by specifying, as sources, elements that are children of the w:body element. This means that, as currently written, DocumentBuilder can't do what you want. I would like to update DocumentBuilder so that you can specify a list of block-level content elements, block-level content containers, or content controls. That would allow you to do what you are trying to do.
One possible work-around for now is to import your sub-doc as ordinary paragraphs, and then go in after the fact and surround those paragraphs with a table, placing them in a cell. This would still allow DocumentBuilder to do all of the resolution of interrelated markup.
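To make the "surround those paragraphs with a table" step more concrete, here is a minimal sketch using only LINQ to XML. It is not part of DocumentBuilder: the class name TableCellWrapper, the method WrapInSingleCellTable, and the start/count parameters are hypothetical names chosen for illustration, and the sketch assumes you already have the w:body element of the merged document loaded as an XElement and know which consecutive child elements came from the sub-doc.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only; not part of DocumentBuilder.
public static class TableCellWrapper
{
    private static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Moves `count` consecutive children of w:body, starting at index `start`,
    // into the single cell of a new one-row table placed where they used to be.
    public static XElement WrapInSingleCellTable(XElement body, int start, int count)
    {
        var blocks = body.Elements().Skip(start).Take(count).ToList();
        if (blocks.Count == 0)
            throw new ArgumentException("The requested range contains no elements.");
        if (blocks.Any(b => b.Name == w + "sectPr"))
            throw new ArgumentException("w:sectPr cannot be moved into a table cell.");

        var cell = new XElement(w + "tc");
        var table = new XElement(w + "tbl",
            new XElement(w + "tblPr",
                new XElement(w + "tblW",
                    new XAttribute(w + "w", "0"),
                    new XAttribute(w + "type", "auto"))),
            new XElement(w + "tblGrid", new XElement(w + "gridCol")),
            new XElement(w + "tr", cell));

        // Put the table in place first, then detach the blocks and re-attach
        // them inside the cell. Detaching first means LINQ to XML moves the
        // nodes instead of cloning them.
        blocks[0].AddBeforeSelf(table);
        foreach (var block in blocks)
            block.Remove();
        cell.Add(blocks);

        // A table cell has to end with a paragraph, so add an empty one
        // when the last moved block is something else (for example a table).
        if (blocks[blocks.Count - 1].Name != w + "p")
            cell.Add(new XElement(w + "p"));

        return table;
    }
}
```

Because the blocks are moved as they are, any relationship IDs, styles, and numbering references that DocumentBuilder already resolved stay intact. The table properties here are deliberately bare (auto width, one grid column); a real document would probably want borders and widths that match its other tables.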
-Eric
Thanks for your prompt reply. I'm really looking forward to your update, because DocumentBuilder is incredibly faster than my own code. Furthermore, I'm using altChunk but with a deep level of nesting, and Word 2003 with the 2007 plug-in is unable to display the chunks. What I do is find some tokens and keep their positions (I wasn't using XDocument), then I call a method that does this:
using (WordprocessingDocument myDoc =
    WordprocessingDocument.Open(destinationFile, true))
{
    string altChunkId = "AltChunkId" + Guid.NewGuid();
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
        AlternativeFormatImportPartType.WordprocessingML, altChunkId);
    bool finish = false;
    int I = 0;
    DateTime dtStart = DateTime.Now;
    // keep retrying until the source file can be opened (there is no retry limit)
    while (!finish)
    {
        try
        {
            using (FileStream fileStream = File.Open(sourceFile, FileMode.Open))
            {
                chunk.FeedData(fileStream);
            }
            finish = true;
        }
        catch
        {
            //System.Threading.Thread.Sleep(100);
            I++;
        }
    }
    // insert after
    AltChunk altChunk = new AltChunk();
    altChunk.Id = altChunkId;
    OpenXmlElement xWorking = null;
    if (String.IsNullOrEmpty(positionLabel) || "???" == positionLabel)
        xWorking = mainPart.Document.Body.Elements<OpenXmlElement>().Last();
    else
        xWorking = mainPart.Document.Descendants<OpenXmlElement>().Where(o => o.InnerText == positionLabel).FirstOrDefault();
    if (null != xWorking)
    {
        if (xWorking is Body)
        {
            xWorking.Descendants<Paragraph>().Where(o => o.InnerText == positionLabel).FirstOrDefault().InsertAfterSelf(altChunk);
            xWorking.Descendants<Paragraph>().Where(o => o.InnerText == positionLabel).FirstOrDefault().Remove();
        }
        else if (xWorking is Paragraph)
        {
            /*
            mainPart.Document
            .Body
            .InsertAfter(altChunk, xWorking);
            mainPart.Document.Body.RemoveChild<OpenXmlElement>(xWorking);
            */
            if (xWorking.Ancestors<TableCell>().Count()>0)
                xWorking.Ancestors<TableCell>().First().ChildElements.OfType<TableCellProperties>().Last().InsertAfterSelf(altChunk);
            else if (xWorking.Ancestors<Body>().Count()>0)
                xWorking.InsertBeforeSelf(altChunk);
            //xWorking.Ancestors<Body>().First().InsertAfterSelf(altChunk);
            xWorking.Remove();
        }
        else if (xWorking is Table)
        {
            if (xWorking.Descendants<TableCell>().Count()>0)
                xWorking.Descendants<TableCell>().First().ChildElements.OfType<TableCellProperties>().Last().InsertAfterSelf(altChunk);
        }
        //else if (xWorking is TableCell)
        //{
        // xWorking.Descendants<TableCellProperties>().Last().InsertAfterSelf(altChunk);
        // xWorking.Descendants<Paragraph>().Last().Remove();
        //}
        else if (xWorking is TableRow || xWorking is TableCell)
        {
            //
            xWorking.Descendants<Text>().Where(o => o.InnerText == positionLabel).FirstOrDefault().Ancestors<TableCell>().First().ChildElements.OfType<TableCellProperties>().Last().InsertAfterSelf(altChunk);
        }
        else
        {
            // element types not handled by the branches above
            throw new NotImplementedException(xWorking.ToString());
        }
    }
}
I was wondering if you could help. I see you have noted these changes at the top:
Fixed a bug where images were copied one byte too short.
Enhance DocumentBuilder so that if multiple documents contain the same header, reuse the header instead of duplicating headers in the destination document.
We are using PowerTools version 2.0.8.0. When we try to open the document, we get an error stating that there are problems with the content.
Parts are missing or invalid: Location /word/headerc.xml Line 1, Column 1597
This happens when the headers contain images: the first image gets pulled through, but for any subsequent documents that contain the same header, the images are removed.
All keep section attributes are set to True.
Are there any known issues with images placed in the header sections?