
Technical Improvements in the Open XML SDK

Sometimes I get to write a blog post that is really fun to write, and this is one of them.  This subject started brewing in my mind last November and December, before I started my current job.  At the time, I was writing code to find the most effective and approachable way to access Open XML documents using LINQ to XML.

One of the problems that I ran into is that after I had populated an XML tree from a part, there was no good place to keep that populated XDocument.  I could have kept it in a dictionary keyed by part and looked it up every time I needed it, but that didn't appeal to me.  However, if the Open XML SDK had annotations, in the style of LINQ to XML, then after populating an XDocument from a part, I could attach the XDocument to the part.  Before populating an XDocument, I would first check whether the part already had one.  Well, annotations have been added to the April 2008 CTP of the Open XML SDK.
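For readers who haven't used them, here is a minimal sketch of the LINQ to XML annotation API that the SDK feature is modeled on: `XObject.AddAnnotation`, `Annotation<T>()` and `RemoveAnnotations<T>()`. It runs against an `XElement`, not an Open XML part. The `SourceInfo` class and the `AnnotationDemo.Cached` helper are hypothetical names made up for this illustration; they show the same "check first, then attach" pattern described above.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical payload type, used only for this illustration.
public class SourceInfo
{
    public string Origin { get; }
    public SourceInfo(string origin) { Origin = origin; }
}

public static class AnnotationDemo
{
    // Returns the SourceInfo annotation on the element, creating and
    // attaching one only if none is present yet (check, then attach).
    public static SourceInfo Cached(XElement element, Func<SourceInfo> create)
    {
        SourceInfo info = element.Annotation<SourceInfo>();
        if (info != null)
            return info;            // already attached: reuse it
        info = create();
        element.AddAnnotation(info); // attach for later lookups
        return info;
    }
}
```

Annotations are looked up by type, so `Annotation<SourceInfo>()` returns the first annotation of that type; a second call to `Cached` on the same element returns the object created by the first call, and `RemoveAnnotations<SourceInfo>()` detaches it again.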

This makes it easier to work with the XML contained in the parts.  All a developer needs to do is open the WordprocessingDocument and get the XDocument for specific parts as needed.  If a part's XDocument has already been loaded, it is not loaded again.

There are more sophisticated uses of this new feature.  One possible enhancement: automatically serialize each XDocument back to its part in the package if the XDocument has been changed.  I'll be blogging more on this.
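The reserialization idea isn't implemented in this post. As one possible approach, here is a minimal sketch, assuming change tracking is done with LINQ to XML's `XObject.Changed` event: a hypothetical `DirtyFlag` annotation marks a tree as modified, and `SaveIfChanged` writes the tree to a stream only when that flag is present. `DirtyFlag`, `ChangeTracking`, `TrackChanges` and `SaveIfChanged` are names made up for this sketch. In a real program the destination would be a writable stream for the part; this SDK-free sketch doesn't show how to obtain one.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical marker annotation: present on an XDocument that was
// modified after change tracking started.
public class DirtyFlag { }

public static class ChangeTracking
{
    // Subscribe to the tree's Changed event. Changes anywhere in the tree
    // are also raised on the XDocument, so one handler covers the whole tree.
    public static void TrackChanges(XDocument xdoc)
    {
        xdoc.Changed += (sender, e) =>
        {
            if (xdoc.Annotation<DirtyFlag>() == null)
                xdoc.AddAnnotation(new DirtyFlag());
        };
    }

    // Writes the tree to the destination only if it was changed, then
    // clears the flag. Returns true if anything was written.
    public static bool SaveIfChanged(XDocument xdoc, Stream destination)
    {
        if (xdoc.Annotation<DirtyFlag>() == null)
            return false;
        // XmlWriter leaves the destination stream open when disposed
        // (XmlWriterSettings.CloseOutput defaults to false).
        using (XmlWriter writer = XmlWriter.Create(destination))
            xdoc.Save(writer);
        xdoc.RemoveAnnotations<DirtyFlag>();
        return true;
    }
}
```

Adding or removing an annotation doesn't raise `Changed`, so setting the flag inside the handler doesn't re-trigger it. `TrackChanges` would be called once, right after a part's XDocument is loaded.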

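In the following example, I've written an extension method, GetXDocument, that you can call on any OpenXmlPart.  You can see how it uses annotations: it first asks the part for an XDocument annotation and returns it if one exists; otherwise it loads the part's XML into a new XDocument, attaches that XDocument to the part as an annotation, and returns it.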

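public static XDocument GetXDocument(this OpenXmlPart part)
{
    // Reuse the XDocument if one is already attached to this part.
    XDocument xdoc = part.Annotation<XDocument>();
    if (xdoc != null)
        return xdoc;

    // Otherwise load the part's XML once and attach the tree as an
    // annotation so that later calls find it.
    using (StreamReader streamReader = new StreamReader(part.GetStream()))
    using (XmlReader xmlReader = XmlReader.Create(streamReader))
        xdoc = XDocument.Load(xmlReader);
    part.AddAnnotation(xdoc);
    return xdoc;
}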
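GetXDocument needs the Open XML SDK to run. To see the load-once behavior in isolation, here is a minimal sketch that swaps in a hypothetical `FakePart` class for OpenXmlPart. `FakePart`, its `StreamsOpened` counter, and `FakePartExtensions` are made up for this illustration; `FakePart` imitates only the `GetStream`, `Annotation<T>` and `AddAnnotation` members that GetXDocument calls, and is not how the SDK implements annotations.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical stand-in for OpenXmlPart, so the caching pattern can run
// without the SDK. Only the members GetXDocument uses are imitated.
public class FakePart
{
    private readonly List<object> annotations = new List<object>();
    private readonly string xml;

    // Counts how many times the part's content stream was opened.
    public int StreamsOpened { get; private set; }

    public FakePart(string xml) { this.xml = xml; }

    public Stream GetStream()
    {
        StreamsOpened++;
        return new MemoryStream(Encoding.UTF8.GetBytes(xml));
    }

    // First annotation of type T, or null if there is none.
    public T Annotation<T>() where T : class
    {
        return annotations.OfType<T>().FirstOrDefault();
    }

    public void AddAnnotation(object annotation)
    {
        annotations.Add(annotation);
    }
}

public static class FakePartExtensions
{
    // Same logic as the GetXDocument extension method above.
    public static XDocument GetXDocument(this FakePart part)
    {
        XDocument xdoc = part.Annotation<XDocument>();
        if (xdoc != null)
            return xdoc;
        using (StreamReader streamReader = new StreamReader(part.GetStream()))
        using (XmlReader xmlReader = XmlReader.Create(streamReader))
            xdoc = XDocument.Load(xmlReader);
        part.AddAnnotation(xdoc);
        return xdoc;
    }
}
```

Calling `part.GetXDocument()` twice on the same `FakePart` returns the same XDocument instance, and `StreamsOpened` is still 1 after the second call.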


Here is the entire example:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using Microsoft.Office.DocumentFormat.OpenXml.Packaging;
using System.Xml;
using System.Xml.Linq;

namespace OpenXmlSdkExample
{
    public class Comment
    {
        public int Id { get; set; }
        public string Text { get; set; }
        public string Author { get; set; }
        public Paragraph Parent { get; set; }
        public Comment(Paragraph parent) { Parent = parent; }
    }

    public class Paragraph
    {
        public XElement ParagraphElement { get; set; }
        public string StyleName { get; set; }
        public string Text { get; set; }
        public IEnumerable<Comment> Comments()
        {
            XNamespace w =
              "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            XElement p = ParagraphElement;

            var commentIds = p
                             .Elements(w + "commentRangeStart")
                             .Attributes(w + "id")
                             .Select(c => (int)c);

            return
                commentIds
                .Select(i =>
                    new Comment(this)
                    {
                        Id = i,
                        Author =
                            Parent.MainDocumentPart.CommentsPart.GetXDocument()
                            .Root
                            .Elements(w + "comment")
                            .Where(c => (int)c.Attribute(w + "id") == i)
                            .First()
                            .Attribute(w + "author")
                            .Value,
                        Text =
                            Parent.MainDocumentPart.CommentsPart.GetXDocument()
                            .Root
                            .Elements(w + "comment")
                            .Where(c => (int)c.Attribute(w + "id") == i)
                            .First()
                            .Descendants(w + "p")
                            .Select(para => para
                                           .Descendants(w + "t")
                                           .StringConcatenate(e => (string)e)
                                           + "\n")
                            .Aggregate(new StringBuilder(), (sb, v) => sb.Append(v), sb => sb.ToString())
                            .Trim()
                    }
                );
        }
        public WordprocessingDocument Parent { get; set; }
        public Paragraph(WordprocessingDocument parent) { Parent = parent; }
    }

    public static class LocalExtensions
    {
        public static string DefaultStyle(this WordprocessingDocument doc)
        {
            XNamespace w =
              "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            XDocument styleXDocument = doc.MainDocumentPart.StyleDefinitionsPart.GetXDocument();
            return (string)(
                from style in styleXDocument.Root.Elements(w + "style")
                where (string)style.Attribute(w + "type") == "paragraph" &&
                      (string)style.Attribute(w + "default") == "1"
                select style
            ).First().Attribute(w + "styleId");
        }

        public static IEnumerable<Paragraph> Paragraphs(this WordprocessingDocument doc)
        {
            // a good convention to use is to name the XNamespace
            // variable with the same name as the namespace prefix,
            // and to name XName variables with the local name of the element
            XNamespace w =
              "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            XName r = w + "r";
            XName ins = w + "ins";
            string defaultStyle = doc.DefaultStyle();

            // query for all paragraphs in the document.
            return
                from p in doc
                          .MainDocumentPart
                          .GetXDocument()
                          .Root
                          .Element(w + "body")
                          .Descendants(w + "p")
                let styleNode = p
                                .Elements(w + "pPr")
                                .Elements(w + "pStyle")
                                .FirstOrDefault()
                select new Paragraph(doc)
                {
                    ParagraphElement = p,
                    StyleName = styleNode != null ?
                        (string)styleNode.Attribute(w + "val") :
                        defaultStyle,
                    // in the following query, we need to select both
                    // the r and ins elements in order to assemble the text
                    // properly for paragraphs that have tracked changes.
                    Text = p
                           .Elements()
                           .Where(z => z.Name == r || z.Name == ins)
                           .Descendants(w + "t")
                           .StringConcatenate(element => (string)element)
                };
        }

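        public static XDocument GetXDocument(this OpenXmlPart part)
        {
            // Reuse the XDocument if one is already attached to this part.
            XDocument xdoc = part.Annotation<XDocument>();
            if (xdoc != null)
                return xdoc;

            // Otherwise load the part's XML once and attach the tree as an
            // annotation so that later calls find it.
            using (StreamReader streamReader = new StreamReader(part.GetStream()))
            using (XmlReader xmlReader = XmlReader.Create(streamReader))
                xdoc = XDocument.Load(xmlReader);
            part.AddAnnotation(xdoc);
            return xdoc;
        }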

        public static string StringConcatenate<T>(this IEnumerable<T> source,
            Func<T, string> func)
        {
            StringBuilder sb = new StringBuilder();
            foreach (T item in source) sb.Append(func(item));
            return sb.ToString();
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // Open read-only: this example only reads the document.
            using (WordprocessingDocument wordDoc = WordprocessingDocument.Open("Test.docx", false))
            {
                Console.WriteLine(wordDoc.DefaultStyle());
                foreach (var p in wordDoc.Paragraphs())
                    Console.WriteLine("{0}:{1}", p.StyleName.PadRight(20), p.Text);
            }
        }
    }
}
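The comment in Paragraphs() explains why both `w:r` and `w:ins` children are selected. The following SDK-free sketch runs the same query shape on a hand-written sample paragraph (made up for illustration) that contains a plain run, a tracked insertion, a tracked deletion, and another plain run. `ParagraphTextDemo`, `SampleParagraph` and `ParagraphText` are hypothetical names, and `string.Concat` stands in for the StringConcatenate extension method.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ParagraphTextDemo
{
    static readonly XNamespace w =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical sample: a plain run, a tracked insertion, a tracked
    // deletion (its text is in w:delText, not w:t), and another plain run.
    public const string SampleParagraph =
        "<w:p xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
        "<w:r><w:t xml:space='preserve'>Hello </w:t></w:r>" +
        "<w:ins w:id='1' w:author='A'><w:r><w:t xml:space='preserve'>brave </w:t></w:r></w:ins>" +
        "<w:del w:id='2' w:author='A'><w:r><w:delText xml:space='preserve'>old </w:delText></w:r></w:del>" +
        "<w:r><w:t>world</w:t></w:r>" +
        "</w:p>";

    // Same shape as the query in Paragraphs(): take the w:r children and,
    // optionally, the w:ins children, then concatenate every w:t beneath them.
    public static string ParagraphText(XElement p, bool includeInsertions)
    {
        XName r = w + "r";
        XName ins = w + "ins";
        return string.Concat(
            p.Elements()
             .Where(z => z.Name == r || (includeInsertions && z.Name == ins))
             .Descendants(w + "t")
             .Select(t => (string)t));
    }
}
```

With insertions included the sample yields "Hello brave world"; selecting only `w:r` drops the inserted word and yields "Hello world". The deleted text never appears because `w:del` is not selected.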

Published Friday, March 14, 2008 5:43 AM by EricWhite

Comments

# Open XML SDK roadmap

After nine months of developer feedback on the Open XML SDK, we have some good news today: a roadmap

Friday, March 14, 2008 1:13 AM by Doug Mahugh

# [Open XML] CTP 2 available soon, plus the Open XML SDK roadmap!

The announcement just came out on openxmldeveloper.org: the Open XML SDK CTP 2 will be made available

Friday, March 14, 2008 5:09 AM by Blog de Julien Chable (Neodante)

# Open XML SDK

The Open XML SDK roadmap has been published. Here are some links for further reading: Open XML SDK download

Monday, March 17, 2008 5:08 PM by :: My Telco ::

# re: Technical Improvements in the Open XML SDK

I would like to see OOXML 2.0 incorporate XML-based open formats for Visio, Publisher, and OneNote. Access too, if possible and feasible.

Sunday, March 23, 2008 6:37 AM by someone

# New version of the Open XML SDK and roadmap for future versions

On March 13th, 2008, Microsoft announced a roadmap for the Open XML SDK.  The Open XML SDK, originally

Thursday, March 27, 2008 10:33 AM by Robin Mestré: Platform Strategy Advisor

# OpenXML and ISO standardization

After ECMA standardization, announcing the completion of the formal process for the approval

Tuesday, April 01, 2008 1:09 PM by Il blog del team MSDN Italia

# Announcing the Open XML Format SDK April CTP

We are glad to announce that the Open XML Format SDK April CTP is available! You can download the new

Thursday, April 17, 2008 2:33 PM by Erika Ehrli

# New Version of the Open XML SDK is Available for Download

The April 2008 CTP of the Open XML SDK is now live on the web, and available for download! I'm really

Thursday, April 17, 2008 3:18 PM by Eric White's Blog

# [Open XML] The April CTP 2 of the Open XML SDK is available!

Following Microsoft's announcement about the availability of the Open XML SDK, here at last is the

Thursday, April 17, 2008 9:02 PM by Julien Chable
