Using the Open XML SDK and LINQ to XML to Remove Comments from an Open XML Wordprocessing Document

This post presents a snippet of code to remove comments from an Open XML Wordprocessing document.

Note: This post may also interest LINQ to XML developers, because it shows how to write queries that perform better.  For very large documents, the approach described below performs much better than the two alternatives shown later in the post.

The approach is simple: remove all w:commentRangeStart, w:commentRangeEnd, and w:commentReference elements from the main document part, and then remove the comment part.

The following code removes those elements.

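// the main WordprocessingML namespace, bound to the w: prefix in the markup
XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

// pre-atomize the XName objects so that they are not atomized for every item in the collection
XName commentRangeStart = w + "commentRangeStart";
XName commentRangeEnd = w + "commentRangeEnd";
XName commentReference = w + "commentReference";

// one pass over the descendants; Remove() takes a snapshot of the matches before removing them
mainDocumentXDoc.Descendants()
    .Where(x => x.Name == commentRangeStart ||
                x.Name == commentRangeEnd ||
                x.Name == commentReference)
    .Remove();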
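To try the query without opening a real package, here is a minimal, self-contained sketch that runs it against a small hand-written fragment.  The class name CommentMarkupRemover, the method RemoveCommentMarkup, and the SampleXml fragment are made up for illustration and are not taken from the attached code; the sketch also does not touch the comment part, which still has to be removed from the package separately.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; not the code attached to this post.
public static class CommentMarkupRemover
{
    // Main WordprocessingML namespace (the w: prefix).
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Pre-atomized names, created once instead of once per element.
    private static readonly XName CommentRangeStart = W + "commentRangeStart";
    private static readonly XName CommentRangeEnd = W + "commentRangeEnd";
    private static readonly XName CommentReference = W + "commentReference";

    // Hand-written sample: one paragraph with a single comment anchored on "Hello".
    public const string SampleXml =
        @"<w:document xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>
  <w:body>
    <w:p>
      <w:commentRangeStart w:id='0'/>
      <w:r><w:t>Hello</w:t></w:r>
      <w:commentRangeEnd w:id='0'/>
      <w:r><w:commentReference w:id='0'/></w:r>
    </w:p>
  </w:body>
</w:document>";

    // Removes the comment range markers and references in a single pass
    // over the descendants and returns how many elements were removed.
    public static int RemoveCommentMarkup(XDocument mainDocumentXDoc)
    {
        XElement[] matches = mainDocumentXDoc.Descendants()
            .Where(x => x.Name == CommentRangeStart ||
                        x.Name == CommentRangeEnd ||
                        x.Name == CommentReference)
            .ToArray();

        // LINQ to XML's Remove extension method detaches every element in the array.
        matches.Remove();
        return matches.Length;
    }
}
```

Calling CommentMarkupRemover.RemoveCommentMarkup(XDocument.Parse(CommentMarkupRemover.SampleXml)) removes the three comment-related elements and leaves the run containing "Hello" in place.  The run that held the w:commentReference stays behind as an empty run.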

There are two other ways that you could write this code.  A first attempt might be as follows:

mainDocumentXDoc
    .Descendants(w + "commentRangeStart")
    .Remove();

mainDocumentXDoc
    .Descendants(w + "commentRangeEnd")
    .Remove();

mainDocumentXDoc
    .Descendants(w + "commentReference")
    .Remove();

Of course, this iterates over all of the descendants three times, which is undesirable for large documents.

So, keeping this in mind, you might write it like this:

mainDocumentXDoc.Descendants()
    .Where(x => x.Name == w + "commentRangeStart" ||
        x.Name == w + "commentRangeEnd" ||
        x.Name == w + "commentReference")
    .Remove();

This iterates the Descendants axis only once.  However, there is a subtler performance issue here: the names (as expressed by w + "commentRangeStart", etc.) are atomized again for every item in the Descendants axis.  To make the code perform as well as possible, we pre-atomize the XName objects and then use them in the call to the Where extension method:

XName commentRangeStart = w + "commentRangeStart";
XName commentRangeEnd = w + "commentRangeEnd";
XName commentReference = w + "commentReference";

mainDocumentXDoc.Descendants()
    .Where(x =>
       x.Name == commentRangeStart ||
       x.Name == commentRangeEnd ||
       x.Name == commentReference)
    .Remove();
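As a small illustration of what atomization means here, the sketch below checks that building the same name twice yields the same XName object.  The class and method names (AtomizationDemo, NamesAreAtomized, ExpandedNameMatches) are invented for this example.  The point is that comparing two XName values is cheap, so the repeated cost in the previous version comes from looking the name up again each time w + "..." is evaluated inside the lambda.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical demo class for illustration; not part of the attached code.
public static class AtomizationDemo
{
    // Main WordprocessingML namespace (the w: prefix).
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds the same qualified name twice and checks whether both
    // variables refer to one and the same XName instance.
    public static bool NamesAreAtomized()
    {
        XName first = W + "commentRangeStart";
        XName second = W + "commentRangeStart";
        return object.ReferenceEquals(first, second);
    }

    // Same check, but the second name is obtained through XName.Get
    // with the local name and namespace passed as strings.
    public static bool ExpandedNameMatches()
    {
        XName viaOperator = W + "commentRangeStart";
        XName viaGet = XName.Get("commentRangeStart", W.NamespaceName);
        return object.ReferenceEquals(viaOperator, viaGet);
    }
}
```

Both methods are expected to return true while the names are still referenced, which is why hoisting the three XName values out of the lambda changes only the cost of the query and not its result.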

For more detailed information about atomization and LINQ to XML performance, see Performance of LINQ to XML.

The attached code also has a bool method that indicates whether the document contains comments.
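The attachment is not reproduced in this post, so the following is only a guess at what such a check could look like when written against the main document part's XML.  The names CommentDetector and HasCommentMarkup are hypothetical, and the attached method may well work differently, for example by checking whether the package has a comment part at all.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sketch; the method in the attached RemoveComments.cs may differ.
public static class CommentDetector
{
    // Main WordprocessingML namespace (the w: prefix).
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Pre-atomized names, as in the removal query.
    private static readonly XName CommentRangeStart = W + "commentRangeStart";
    private static readonly XName CommentRangeEnd = W + "commentRangeEnd";
    private static readonly XName CommentReference = W + "commentReference";

    // Returns true if the main document XML contains any comment range
    // marker or comment reference. Any() stops at the first match, so a
    // document with comments is not walked all the way to the end.
    public static bool HasCommentMarkup(XDocument mainDocumentXDoc)
    {
        return mainDocumentXDoc.Descendants()
            .Any(x => x.Name == CommentRangeStart ||
                      x.Name == CommentRangeEnd ||
                      x.Name == CommentReference);
    }
}
```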

Code is attached.

Posted: Monday, July 14, 2008 4:35 AM by EricWhite
Attachment(s): RemoveComments.cs

Comments

Eric White's Blog said:

In the last three posts, in addition to the information regarding how we want to alter the markup in

# July 14, 2008 12:18 AM

Julien Chable said:

Here they are: PowerTools: Using System.IO.Packaging in PowerTools to modify properties (Doug

# July 18, 2008 4:30 AM

Eric White's Blog said:

In the next series of blog posts, I’ll be exploring some interesting aspects of SharePoint development.

# July 18, 2008 12:49 PM

Ashok Pai said:

Just installed the OpenXML SDK v1.0.  Forgive me if my question is not directly related.  I'm trying to select all Tables in a Word document and write them out as Worksheets in an Excel workbook.  I can collect Word tables with VSTO but can't easily write them out to Excel.  Can I do that with OpenXML SDK?

# July 20, 2008 8:15 PM

Eric White's Blog said:

This post presents a custom application page in SharePoint that uses Open XML, the Open XML SDK and LINQ

# July 23, 2008 2:57 AM

Julien Chable said:

This post refused to go out on either Thursday or Friday, so here it is! Never-ending updates

# August 18, 2008 3:49 AM

Brian Jones: Office Extensibility said:

One of the more common scenarios related to a Wordprocessing document is the need to sanitize a document

# February 6, 2009 3:03 PM