This blog is inactive. New blog: EricWhite.com/blog
This is one in a series of posts on transforming Open XML WordprocessingML to XHtml. You can find the complete list of posts here.
Revision tracking markup in Open XML word-processing documents is one of the more complex areas of the standard. Accepting tracked revisions first makes subsequent processing of the text in a word-processing document much simpler. As an example, in my current project of transforming Open XML word-processing documents to XHtml, I accept tracked revisions in an in-memory WordprocessingDocument object before doing the conversion, and then transform that in-memory document to XHtml. This means that my transformation algorithm can completely disregard all of the revision tracking elements and attributes, and all of the complexities associated with them. If you want to know in exacting detail how tracked revision markup works in Open XML, then this post will tell you. It also describes in detail the algorithms for accepting tracked revisions.
I've written an MSDN article, Accepting Revisions in Open XML Word-Processing Documents. That article presents the markup semantics behind the code that I've written to accept tracked revisions for the PowerTools for Open XML project. You can find the code under the Downloads tab at CodePlex.com/PowerTools.
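To make the idea concrete, here is a minimal sketch of accepting the two simplest kinds of tracked revisions, run-level insertions (w:ins) and deletions (w:del). It is not the PowerTools code and covers only a small part of the full algorithm: it ignores moves, property changes, tracked paragraph marks, and table revisions. It uses Python's standard library rather than the Open XML SDK, and the sample paragraph and the function name accept_simple_revisions are made up for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
INS = f"{{{W}}}ins"
DEL = f"{{{W}}}del"

def accept_simple_revisions(parent):
    """Accept w:ins and w:del revisions in place (simplified sketch)."""
    kept = []
    for child in list(parent):
        if child.tag == DEL:
            # Accepting a deletion removes the deleted content entirely.
            continue
        accept_simple_revisions(child)
        if child.tag == INS:
            # Accepting an insertion keeps the content and drops the wrapper.
            kept.extend(list(child))
        else:
            kept.append(child)
    parent[:] = kept

# Hypothetical paragraph: "quick " was deleted and "slow " was inserted.
SAMPLE = (
    f'<w:p xmlns:w="{W}">'
    '<w:r><w:t xml:space="preserve">The </w:t></w:r>'
    '<w:del w:id="1" w:author="A"><w:r>'
    '<w:delText xml:space="preserve">quick </w:delText></w:r></w:del>'
    '<w:ins w:id="2" w:author="A"><w:r>'
    '<w:t xml:space="preserve">slow </w:t></w:r></w:ins>'
    '<w:r><w:t>fox</w:t></w:r>'
    '</w:p>'
)

paragraph = ET.fromstring(SAMPLE)
before = "".join(paragraph.itertext())   # naive text, includes deleted text
accept_simple_revisions(paragraph)
after = "".join(paragraph.itertext())

print(before)  # The quick slow fox
print(after)   # The slow fox
```

Once the revisions are accepted, later processing sees only ordinary runs, which is the simplification described above.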
Thanks so much for a great series of articles on Open XML. I'm currently writing some code using the Open XML SDK 2.0 that will scrape data from Word tables and place it into Excel, using Zeyad Rajabi's code found here:
However, his code does not take into account multiple runs inside of a cell, particularly with regard to embedded breaks (w:br). I don't need to worry about the style of each individual run, but when I get the InnerText of each cell as in his example, it (perhaps rightly so) ignores these break tags.
I noticed in post #3, you said you would be handling cr and br tags appropriately, but looking through the series I can't find any other reference. Can you point me to the appropriate article, or perhaps point me in the right direction towards how to properly deal with these?
Hi Shawn, I haven't yet posted any code around cr and br tags. It's coming, but it takes time, as you might guess. A couple of key points for you. You must accept tracked revisions (or disallow documents with tracked changes), because in certain circumstances the InnerText will contain deleted text. The question to answer is: how are you going to transform those w:br elements? What is the SpreadsheetML markup that you will be generating? Then you need to write a transform from the WordprocessingML to that specific SpreadsheetML. If you don't need to allow for content controls in cells, then your code is simpler. Feel free to post the specifics of your transform here in a comment, and I'll answer here, or feel free to email me directly via the email button here on the blog.
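As a rough sketch of one possible approach, the following walks the runs of a table cell and turns each w:br or w:cr into a newline instead of dropping it, which is what a plain concatenation of the text does. It assumes that a newline inside a single string is an acceptable target for the spreadsheet cell, that tracked revisions have already been accepted, and it uses Python's standard library rather than the Open XML SDK. The sample cell and the names run_text and cell_text are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
P, R, T = f"{{{W}}}p", f"{{{W}}}r", f"{{{W}}}t"
BR, CR, TAB = f"{{{W}}}br", f"{{{W}}}cr", f"{{{W}}}tab"

def run_text(run):
    """Text of one run, mapping breaks to newlines and tabs to tab characters."""
    parts = []
    for child in run:
        if child.tag == T:
            parts.append(child.text or "")
        elif child.tag in (BR, CR):
            parts.append("\n")
        elif child.tag == TAB:
            parts.append("\t")
        # Anything else (w:rPr, w:delText, ...) contributes no text here.
    return "".join(parts)

def cell_text(cell):
    """Text of a w:tc element, one line per paragraph and per break."""
    paragraphs = []
    for p in cell.iter(P):
        # iter() also finds runs nested in wrappers such as content controls.
        paragraphs.append("".join(run_text(r) for r in p.iter(R)))
    return "\n".join(paragraphs)

# Hypothetical cell: one paragraph with an embedded break, then a second one.
SAMPLE = (
    f'<w:tc xmlns:w="{W}">'
    '<w:p><w:r><w:t>Line 1</w:t><w:br/><w:t>Line 2</w:t></w:r></w:p>'
    '<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>'
    '</w:tc>'
)

cell = ET.fromstring(SAMPLE)
print(repr("".join(cell.itertext())))  # 'Line 1Line 2Second paragraph'
print(repr(cell_text(cell)))           # 'Line 1\nLine 2\nSecond paragraph'
```

The first result shows the problem described in the question: the break and the paragraph boundary both vanish. How the newline should finally be represented in SpreadsheetML is a separate decision that this sketch does not make.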
Hi Eric, do you have a link to any C# examples of how to implement a vertical cell merge in a Word document?
Hi Philip, I don't have any examples, but it is pretty easy. I've added it to my blog-post list for the next week or two. In the meantime, here is an easy way to learn about it: create a doc with a table, make a copy, vertically merge some cells in the copy, and use the Open XML SDK V2 tool to compare the two. You can then see the markup that you need to create to vertically merge cells.
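For reference, the markup that such a comparison shows is a w:vMerge element in the cell properties (w:tcPr): the first cell of the merged group carries w:val="restart", and each cell below it carries w:vMerge with no value, which means "continue". The sketch below builds that markup with Python's standard library rather than the Open XML SDK, so the helper names make_cell and make_row are hypothetical, and the table properties and grid (w:tblPr, w:tblGrid) that a complete table needs are left out.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)

def w(name):
    """Qualified name in the WordprocessingML namespace."""
    return f"{{{W}}}{name}"

def make_cell(text, v_merge=None):
    """Build a w:tc; v_merge is None, "restart", or "continue"."""
    tc = ET.Element(w("tc"))
    if v_merge is not None:
        tc_pr = ET.SubElement(tc, w("tcPr"))
        merge = ET.SubElement(tc_pr, w("vMerge"))
        if v_merge == "restart":
            # The top cell of the merged group starts the merge.
            merge.set(w("val"), "restart")
        # A w:vMerge without w:val continues the merge from the cell above.
    p = ET.SubElement(tc, w("p"))  # every cell needs at least one paragraph
    if text:
        r = ET.SubElement(p, w("r"))
        ET.SubElement(r, w("t")).text = text
    return tc

def make_row(*cells):
    tr = ET.Element(w("tr"))
    tr.extend(cells)
    return tr

# Two rows, two columns; the first column is merged vertically.
table = ET.Element(w("tbl"))
table.append(make_row(make_cell("Merged", "restart"), make_cell("A")))
table.append(make_row(make_cell("", "continue"), make_cell("B")))

print(ET.tostring(table, encoding="unicode"))
```

Keeping the merge decision in a single parameter of the cell builder is one way to keep table generation modular; the same element structure applies when building the cells with the SDK's classes.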
Hi Eric, thanks for your comments. I'm trying to break the generation of the Open XML down so I can make it modular, or at least as modular as it can be. Using the SDK produces very verbose code. Is this really the only way to figure out the structures? BTW, I can see how the merge works, but if I'm creating my table like this:
how do I append the merge cell properties? Is this the best syntax to use to generate a table? I'd appreciate any thoughts and would like to thank you for getting some examples out on the web, because there's not a lot of stuff to guide people on this topic. Thanks.