December, 2007

  • Eric White's Blog

    The WordprocessingML Class: A refinement of the approach of using LINQ to XML to access Open XML


    (July 10, 2008 - I've written a new blog post on a better way to accomplish this.) 

    This post presents a refinement of the OpenXmlDocument class: a new class, WordprocessingML, that derives from OpenXmlDocument. The WordprocessingML class adds functionality that is specific to WordprocessingML documents, including:

    ·         Some constant strings that contain the DocumentRelationshipType, the StylesRelationshipType, and the CommentsRelationshipType.


    ·         An XNamespace object that contains the main XML namespace for WordprocessingML documents.

    ·         Initialized properties that find the main DocumentRelationship object, the StylesRelationship object, and the CommentsRelationship object. The Relationship class is declared in the code found in the link below; each Relationship object represents a node in the object graph that contains an entire Open XML document.


    ·         A DefaultStyle method that queries for the default style of the document.

    ·         A Paragraphs method that enumerates all paragraphs in the document. The Paragraphs method returns IEnumerable<Paragraph>. The Paragraph class is a tuple class that contains: the XElement node of the paragraph for further querying if necessary, the style of the paragraph, the text of the paragraph, and a collection of comments for the paragraph. The comments are held in a collection because a paragraph can have more than one comment.
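    To make the list above more concrete, here is a minimal sketch of how the namespace, the relationship type constants, a default-style query, and a paragraph query could look in LINQ to XML. It is not the code from the linked listing: the names ParagraphInfo and WordprocessingMLSketch and the method signatures are hypothetical, the constant values are the standard Open XML ones and are only assumed to match what the WordprocessingML class uses, comments are left out, and the XML is parsed from inline strings instead of being read from a package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for the post's Paragraph tuple class (comments omitted).
public sealed class ParagraphInfo
{
    public XElement ParagraphElement { get; set; }
    public string StyleName { get; set; }
    public string Text { get; set; }
}

public static class WordprocessingMLSketch
{
    // Standard Open XML relationship types; assumed (not confirmed) to be the
    // values behind the constants that the post describes.
    public const string DocumentRelationshipType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
    public const string StylesRelationshipType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles";
    public const string CommentsRelationshipType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments";

    // Main WordprocessingML namespace, conventionally bound to the "w" prefix.
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the styleId of the paragraph style marked w:default="1",
    // or null if the styles part declares none.
    public static string DefaultStyle(XDocument styles)
    {
        return (string)styles.Root
            .Elements(W + "style")
            .Where(s => (string)s.Attribute(W + "type") == "paragraph"
                     && (string)s.Attribute(W + "default") == "1")
            .Select(s => s.Attribute(W + "styleId"))
            .FirstOrDefault();
    }

    // Projects every w:p element into a ParagraphInfo. A paragraph without an
    // explicit w:pStyle falls back to defaultStyle. Simplification: Descendants
    // also picks up paragraphs nested in tables and text boxes.
    public static IEnumerable<ParagraphInfo> Paragraphs(XDocument mainDoc, string defaultStyle)
    {
        return mainDoc.Descendants(W + "p").Select(p => new ParagraphInfo
        {
            ParagraphElement = p,
            StyleName = (string)p.Elements(W + "pPr")
                                 .Elements(W + "pStyle")
                                 .Attributes(W + "val")
                                 .FirstOrDefault() ?? defaultStyle,
            // Concatenate the text of all w:t elements in the paragraph.
            Text = string.Concat(p.Descendants(W + "t").Select(t => (string)t))
        });
    }
}

public static class Program
{
    private const string Ns =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static void Main()
    {
        // Tiny hand-written stand-ins for the styles part and the main document part.
        XDocument styles = XDocument.Parse(
            "<w:styles xmlns:w='" + Ns + "'>" +
            "<w:style w:type='paragraph' w:default='1' w:styleId='Normal'/>" +
            "<w:style w:type='paragraph' w:styleId='Heading1'/>" +
            "</w:styles>");
        XDocument mainDoc = XDocument.Parse(
            "<w:document xmlns:w='" + Ns + "'><w:body>" +
            "<w:p><w:r><w:t>This is a test.</w:t></w:r></w:p>" +
            "<w:p><w:pPr><w:pStyle w:val='Heading1'/></w:pPr>" +
            "<w:r><w:t>This is only </w:t></w:r><w:r><w:t>a test.</w:t></w:r></w:p>" +
            "</w:body></w:document>");

        string defaultStyle = WordprocessingMLSketch.DefaultStyle(styles);
        foreach (var p in WordprocessingMLSketch.Paragraphs(mainDoc, defaultStyle))
            Console.WriteLine("Style: {0}   Text: >{1}<", p.StyleName.PadRight(16), p.Text);
    }
}
```

    In this sketch the first paragraph has no explicit style, so it reports the default style (Normal), while the second reports Heading1 and joins the text of its two runs. Passing the default style in as a parameter keeps the paragraph query independent of how the styles part was located.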


    You can see the complete listing here: The WordprocessingML Class 

    Following is a simple example that shows the use of the WordprocessingML class:

    string filename = "Test.docx";

    // Open the document and list each paragraph with its style, text, and comments.
    using (WordprocessingML doc = new WordprocessingML(filename))
    {
        foreach (var p in doc.Paragraphs())
        {
            Console.WriteLine("Style: {0}   Text: >{1}<",
                p.StyleName.PadRight(16), p.Text);
            // Guard against paragraphs whose Comments collection is null.
            if (p.Comments != null)
            {
                foreach (var c in p.Comments)
                {
                    Console.WriteLine("  Comment:");
                    Console.WriteLine("  Id: {0}", c.Id);
                    Console.WriteLine("  Author: {0}", c.Author);
                    Console.WriteLine("  Text: >{0}<", c.Text);
                }
            }
        }
    }

    When run on a small document, the code produces the following output:

    Style: Normal             Text: >This is a test.<
      Comment:
      Id: 0
      Author: Eric White
      Text: >Hello world<
    Style: Heading1           Text: >This is only a test.<
    Style: Normal             Text: >This is another paragraph.<
      Comment:
      Id: 1
      Author: Eric White
      Text: >another<
