Using the Open XML SDK and LINQ to XML to Remove Comments from an Open XML Wordprocessing Document

This post presents a snippet of code to remove comments from an Open XML Wordprocessing document.

Note: This post may be of interest to LINQ to XML developers, as it shows how to write queries that perform better. For very large documents, the final approach shown below performs much better than the more obvious alternatives.

The code is very simple: remove all w:commentRangeStart, w:commentRangeEnd, and w:commentReference elements in the main document part, and then remove the comment part.

The following code removes these elements.

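// the WordprocessingML namespace, conventionally bound to the w prefix
XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

// pre-atomize the XName objects so that they are not atomized for every item in the collection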
XName commentRangeStart = w + "commentRangeStart";
XName commentRangeEnd = w + "commentRangeEnd";
XName commentReference = w + "commentReference";
mainDocumentXDoc.Descendants()
    .Where(x => x.Name == commentRangeStart ||
                x.Name == commentRangeEnd ||
                x.Name == commentReference)
    .Remove();
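
To try the query without a real document, the following is a minimal sketch that runs it against a small in-memory tree. The class name CommentMarkupRemover, the helper BuildSample, and the sample markup are hypothetical and only for illustration; in a real application the XDocument would be loaded from the main document part.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class CommentMarkupRemover
{
    // WordprocessingML main namespace, conventionally bound to the "w" prefix.
    public static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Removes the three comment-related element types in a single pass
    // over the Descendants axis.
    public static void RemoveCommentMarkup(XDocument mainDocumentXDoc)
    {
        XName commentRangeStart = w + "commentRangeStart";
        XName commentRangeEnd = w + "commentRangeEnd";
        XName commentReference = w + "commentReference";

        mainDocumentXDoc.Descendants()
            .Where(x => x.Name == commentRangeStart ||
                        x.Name == commentRangeEnd ||
                        x.Name == commentReference)
            .Remove();
    }

    // Hypothetical sample: one paragraph with a single commented run.
    public static XDocument BuildSample()
    {
        return new XDocument(
            new XElement(w + "document",
                new XAttribute(XNamespace.Xmlns + "w", w.NamespaceName),
                new XElement(w + "body",
                    new XElement(w + "p",
                        new XElement(w + "commentRangeStart",
                            new XAttribute(w + "id", "0")),
                        new XElement(w + "r",
                            new XElement(w + "t", "Hello")),
                        new XElement(w + "commentRangeEnd",
                            new XAttribute(w + "id", "0")),
                        new XElement(w + "r",
                            new XElement(w + "commentReference",
                                new XAttribute(w + "id", "0")))))));
    }

    public static void Main()
    {
        XDocument doc = BuildSample();
        RemoveCommentMarkup(doc);
        Console.WriteLine(doc);
    }
}
```

Note that the run that held the w:commentReference element is left behind as an empty w:r element; the query removes only the three element types named above.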

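To see why it is written this way, consider a more obvious version that removes each element type with a separate query:

mainDocumentXDoc
    .Descendants(w + "commentRangeStart")
    .Remove();

mainDocumentXDoc
    .Descendants(w + "commentRangeEnd")
    .Remove();

mainDocumentXDoc
    .Descendants(w + "commentReference")
    .Remove();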

Of course, this iterates over all of the descendants three times, which is undesirable for large documents.

So, keeping this in mind, you might write it like this:

mainDocumentXDoc.Descendants()
    .Where(x => x.Name == w + "commentRangeStart" ||
        x.Name == w + "commentRangeEnd" ||
        x.Name == w + "commentReference")
    .Remove();

This iterates over the Descendants axis only once. However, there is a subtler performance issue here: the names (as expressed by w + "commentRangeStart", etc.) are atomized over and over again for every item in the Descendants axis, and each atomization has to look up the existing XName object. To make the code perform as well as possible, we pre-atomize the XName objects, and then use them in the call to the Where extension method:

XName commentRangeStart = w + "commentRangeStart";
XName commentRangeEnd = w + "commentRangeEnd";
XName commentReference = w + "commentReference";
mainDocumentXDoc.Descendants()
    .Where(x =>
       x.Name == commentRangeStart ||
       x.Name == commentRangeEnd ||
       x.Name == commentReference)
    .Remove();

For more detailed information about atomization and LINQ to XML performance, see Performance of LINQ to XML.
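
As a quick check on the atomization behavior discussed above, the following small sketch confirms that building the same name twice yields the same XName object. This is why x.Name == commentRangeStart is a cheap reference comparison, while writing w + "commentRangeStart" inside the lambda repeats the lookup for every element. The class name AtomizationDemo and the method SameInstance are hypothetical.

```csharp
using System;
using System.Xml.Linq;

public static class AtomizationDemo
{
    public static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds the same qualified name twice and reports whether both
    // variables refer to one and the same XName object.
    public static bool SameInstance()
    {
        XName first = w + "commentRangeStart";
        XName second = w + "commentRangeStart";
        return object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        // XName objects are atomized, so this prints True.
        Console.WriteLine(SameInstance());
    }
}
```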

The attached code also has a bool method that indicates whether the document contains comments.
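
The attached method is not reproduced here. As a rough idea of what such a check could look like, here is a hypothetical sketch that looks for comment markup in the main document part with the same pre-atomized names. The names CommentDetector and HasCommentMarkup are assumptions, and the attached code may well do this differently, for example by checking whether the comment part exists.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class CommentDetector
{
    public static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns true if the main document part contains
    // any of the three comment-related elements. Any() stops at the first match.
    public static bool HasCommentMarkup(XDocument mainDocumentXDoc)
    {
        XName commentRangeStart = w + "commentRangeStart";
        XName commentRangeEnd = w + "commentRangeEnd";
        XName commentReference = w + "commentReference";

        return mainDocumentXDoc.Descendants()
            .Any(x => x.Name == commentRangeStart ||
                      x.Name == commentRangeEnd ||
                      x.Name == commentReference);
    }

    public static void Main()
    {
        string ns = w.NamespaceName;

        // Illustrative markup only, not taken from a real document.
        XDocument withComment = XDocument.Parse(
            "<w:document xmlns:w='" + ns + "'><w:body><w:p>" +
            "<w:commentRangeStart w:id='0'/><w:r><w:t>Hi</w:t></w:r>" +
            "<w:commentRangeEnd w:id='0'/></w:p></w:body></w:document>");
        XDocument withoutComment = XDocument.Parse(
            "<w:document xmlns:w='" + ns + "'><w:body><w:p>" +
            "<w:r><w:t>Hi</w:t></w:r></w:p></w:body></w:document>");

        Console.WriteLine(HasCommentMarkup(withComment));
        Console.WriteLine(HasCommentMarkup(withoutComment));
    }
}
```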

Code is attached.

Attachment: RemoveComments.cs