July, 2010

  • Eric White's Blog

    Processing all Content Parts in an Open XML WordprocessingML Document


    In Open XML WordprocessingML documents, there are five types of parts that can contain content such as paragraphs (with or without tracked revisions), tables, rows, cells, and any of a variety of content controls:

    • Main document part
    • Header parts (there can be more than one)
    • Footer parts (there can be more than one)
    • Endnotes (there can be zero or one)
    • Footnotes (there can be zero or one)

    There are certain Open XML programming scenarios where you need to process all varieties of parts that contain content:

    • You need to search for specific words in a document, regardless of where those words occur.
    • You need to accept tracked changes anywhere they appear in the document.
    • You need to process content controls anywhere they occur in the document, perhaps to bind them to XML in a custom XML part.
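
    As an illustration of the first scenario, here is a minimal sketch of a word search over the XML of a single part, using LINQ to XML. It assumes the part's XML is already loaded into an XDocument. The class name PartSearch and the method FindParagraphsContaining are hypothetical names chosen for this sketch. Word often splits a phrase across several runs, so the sketch concatenates the w:t text of each paragraph before matching. It ignores tabs, breaks, and deleted text (w:delText), so treat it as a starting point rather than a complete search.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PartSearch
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns the text of every paragraph in the part's
    // XML whose concatenated run text contains searchTerm (case-insensitive).
    public static IEnumerable<string> FindParagraphsContaining(
        XDocument partXml, string searchTerm)
    {
        return partXml.Descendants(W + "p")
            // Join all w:t elements so that text split across runs still matches.
            .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
            .Where(text => text.IndexOf(searchTerm, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

    To cover the whole document, you would call this once for each of the five kinds of content parts listed above.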

    The following example shows how to find all content controls in a document, whether they are in the main document part, in headers or footers, or in endnotes or footnotes. This example uses LINQ to XML. If you use the strongly typed object model of the Open XML SDK instead, the code that visits the parts is identical; only the code that actually processes the content controls differs.

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;
    using System.Xml;
    using System.Xml.Linq;
    using DocumentFormat.OpenXml.Packaging;

    public static class Extensions
    {
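        // Returns the part's XML as an XDocument. The document is loaded on
        // first use and cached as an annotation on the part, so later calls
        // for the same part return the same XDocument instance.
        public static XDocument GetXDocument(this OpenXmlPart part)
        {
            XDocument partXDocument = part.Annotation<XDocument>();
            if (partXDocument != null)
                return partXDocument;
            using (Stream partStream = part.GetStream())
            using (XmlReader partXmlReader = XmlReader.Create(partStream))
                partXDocument = XDocument.Load(partXmlReader);
            part.AddAnnotation(partXDocument);
            return partXDocument;
        }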
    }

    class Program
    {
        private static void IterateContentControlsForPart(OpenXmlPart part)
        {
            XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            XDocument doc = part.GetXDocument();
            foreach (var sdt in doc.Descendants(w + "sdt"))
            {
                Console.WriteLine("Found content control");
                Console.WriteLine("=====================");
                Console.WriteLine(sdt.ToString());
                Console.WriteLine();
            }
        }

        public static void IterateContentControls(WordprocessingDocument doc)
        {
            // Main document part, then every header part and footer part.
            IterateContentControlsForPart(doc.MainDocumentPart);
            foreach (var part in doc.MainDocumentPart.HeaderParts)
                IterateContentControlsForPart(part);
            foreach (var part in doc.MainDocumentPart.FooterParts)
                IterateContentControlsForPart(part);
            // A document has at most one endnotes part and one footnotes part.
            if (doc.MainDocumentPart.EndnotesPart != null)
                IterateContentControlsForPart(doc.MainDocumentPart.EndnotesPart);
            if (doc.MainDocumentPart.FootnotesPart != null)
                IterateContentControlsForPart(doc.MainDocumentPart.FootnotesPart);
        }

        static void Main(string[] args)
        {
            using (WordprocessingDocument doc = WordprocessingDocument.Open("Test.docx", false))
                IterateContentControls(doc);
        }
    }
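
    For comparison, here is a rough sketch of the same traversal using the strongly typed object model. The names ContentPartExtensions, ContentParts, and PrintContentControls are hypothetical and introduced only for this sketch. It gathers the five kinds of content parts into one sequence, then queries each part's typed root element for SdtElement, the base class of the content control element types. It assumes the Open XML SDK is referenced.

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentPartExtensions
{
    // Hypothetical helper: yields every part that can contain document content.
    public static IEnumerable<OpenXmlPart> ContentParts(this WordprocessingDocument doc)
    {
        MainDocumentPart main = doc.MainDocumentPart;
        if (main == null)
            yield break;
        yield return main;
        foreach (HeaderPart part in main.HeaderParts)
            yield return part;
        foreach (FooterPart part in main.FooterParts)
            yield return part;
        if (main.EndnotesPart != null)
            yield return main.EndnotesPart;
        if (main.FootnotesPart != null)
            yield return main.FootnotesPart;
    }

    // Hypothetical helper: prints the XML of each content control in the document.
    public static void PrintContentControls(this WordprocessingDocument doc)
    {
        foreach (OpenXmlPart part in doc.ContentParts())
        {
            // RootElement is the typed root of the part (Document, Header, and so on).
            if (part.RootElement == null)
                continue;
            foreach (SdtElement sdt in part.RootElement.Descendants<SdtElement>())
            {
                Console.WriteLine("Found content control");
                Console.WriteLine("=====================");
                Console.WriteLine(sdt.OuterXml);
                Console.WriteLine();
            }
        }
    }
}
```

    With these helpers in place, the body of Main would reduce to opening the document and calling doc.PrintContentControls().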
