August, 2009

  • Eric White's Blog

    DescendantsAndSelfTrimmed LINQ to XML Axis Method

    There are some circumstances where I need a variation on the DescendantsAndSelf axis method that allows me to specify that specific elements (and the descendants of those elements) are ‘trimmed’ from the returned collection.  One of the things that's great about LINQ to XML is the ease with which we can create specialized axis methods when necessary.

    Here is the scenario: Open XML contains paragraphs that contain runs.  A run can contain a picture that contains a text box, which itself contains a paragraph.  When I write a transform for paragraphs that are child elements of the w:body element, I want to disregard any paragraphs that are nested inside text boxes.

    To make this clear, consider the following markup:

    <document>
      <body>
        <p>
          <r>
            <t>First paragraph.</t>
          </r>
        </p>
        <p>
          <r>
            <pict>
              <shape>
                <textbox>
                  <p>
                    <r>
                      <t>Text in a textbox.</t>
                    </r>
                  </p>
                </textbox>
              </shape>
            </pict>
          </r>
        </p>
        <p>
          <r>
            <t>Second paragraph.</t>
          </r>
        </p>
      </body>
    </document>
     

    In this markup, I want to transform the three <p> elements that are children of the <body> element, but while processing those elements, I want to completely disregard the <p> element inside the <textbox> element.  If child paragraphs of the textbox element need to be transformed in some fashion, I would deal with them separately, perhaps in an embedded recursive transform.
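
    For comparison, here is a minimal sketch of what the built-in axis methods return for this markup.  It is not one of the methods described in this post: the class name TrimComparison and both helper methods are made up for illustration.  Filtering on Ancestors selects the right paragraphs, but the query still walks every element under the text box, which is the work a trimmed axis avoids.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TrimComparison
{
    // Hypothetical helper: every <p> in the tree, including those inside text boxes.
    public static IEnumerable<XElement> AllParagraphs(XElement root)
    {
        return root.Descendants("p");
    }

    // Hypothetical helper: only the <p> elements that are not nested inside a <textbox>.
    public static IEnumerable<XElement> ParagraphsOutsideTextBoxes(XElement root)
    {
        return root.Descendants("p").Where(p => !p.Ancestors("textbox").Any());
    }

    public static void Main()
    {
        // Same structure as the markup above, written on fewer lines.
        XElement root = XElement.Parse(
            "<document><body>" +
            "<p><r><t>First paragraph.</t></r></p>" +
            "<p><r><pict><shape><textbox>" +
            "<p><r><t>Text in a textbox.</t></r></p>" +
            "</textbox></shape></pict></r></p>" +
            "<p><r><t>Second paragraph.</t></r></p>" +
            "</body></document>");

        Console.WriteLine(AllParagraphs(root).Count());              // prints 4
        Console.WriteLine(ParagraphsOutsideTextBoxes(root).Count()); // prints 3
    }
}
```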

    To enable selecting descendant elements in this fashion, I’ve implemented four extension methods.  These are variations on the Descendants and DescendantsAndSelf axis methods.  The methods are:

    public static IEnumerable<XElement> DescendantsAndSelfTrimmed(this XElement element,
        XName trimName)

    Returns a filtered collection containing this element and all of its descendant elements, in document order.  The element itself is always returned.  Descendant elements that have a matching XName are not included in the collection.  Descendant elements of excluded elements are not included in the collection.

    public static IEnumerable<XElement> DescendantsAndSelfTrimmed(this XElement element,
        Func<XElement, bool> predicate)

    Returns a filtered collection containing this element and all of its descendant elements, in document order.  The element itself is always returned.  Descendant elements that are selected by the predicate are not included in the collection.  Descendant elements of excluded elements are not included in the collection.

    public static IEnumerable<XElement> DescendantsTrimmed(this XContainer container,
        XName trimName)

    Returns a filtered collection of the descendant elements for this document or element, in document order.  Elements that have a matching XName are not included in the collection.  Descendant elements of excluded elements are not included in the collection.

    public static IEnumerable<XElement> DescendantsTrimmed(this XContainer container,
        Func<XElement, bool> predicate)

    Returns a filtered collection of the descendant elements for this document or element, in document order.  Elements that are selected by the predicate are not included in the collection.  Descendant elements of excluded elements are not included in the collection.

    Here is the listing of these extension methods, as well as some example code to exercise them (also attached):

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Xml.Linq;
     
    public static class LocalExtensions
    {
        public static IEnumerable<XElement> DescendantsAndSelfTrimmed(this XElement element,
            XName trimName)
        {
            return DescendantsInternalTrimmed(element, e => e.Name == trimName, true);
        }
     
        public static IEnumerable<XElement> DescendantsAndSelfTrimmed(this XElement element,
            Func<XElement, bool> predicate)
        {
            return DescendantsInternalTrimmed(element, predicate, true);
        }
     
        public static IEnumerable<XElement> DescendantsTrimmed(this XContainer container,
            XName trimName)
        {
            return DescendantsInternalTrimmed(container, e => e.Name == trimName, false);
        }
     
        public static IEnumerable<XElement> DescendantsTrimmed(this XContainer container,
            Func<XElement, bool> predicate)
        {
            return DescendantsInternalTrimmed(container, predicate, false);
        }
     
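        // Shared iterator behind all four methods.  It walks the tree depth-first with an
        // explicit stack of child enumerators, so no recursion is needed.
        private static IEnumerable<XElement> DescendantsInternalTrimmed(this XContainer container,
            Func<XElement, bool> predicate, bool includeSelf)
        {
            if (includeSelf)
            {
                // The starting element is returned without being tested against the predicate.
                XElement element = container as XElement;
                if (element != null)
                    yield return element;
            }
            Stack<IEnumerator<XElement>> iteratorStack = new Stack<IEnumerator<XElement>>();
            iteratorStack.Push(container.Elements().GetEnumerator());
            while (iteratorStack.Count > 0)
            {
                while (iteratorStack.Peek().MoveNext())
                {
                    XElement currentXElement = iteratorStack.Peek().Current;
                    // A trimmed element is skipped, and its children are never pushed.
                    if (predicate(currentXElement))
                        continue;
                    yield return currentXElement;
                    // Descend: the inner loop now reads from this element's children.
                    iteratorStack.Push(currentXElement.Elements().GetEnumerator());
                }
                // The current level is exhausted; resume the parent's enumerator.
                iteratorStack.Pop().Dispose();
            }
        }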
    }
     
    class Program
    {
        static void Main(string[] args)
        {
            XElement rootElement = XElement.Parse(
    @"<document>
      <body>
        <p>
          <r>
            <t>First paragraph.</t>
          </r>
        </p>
        <p>
          <r>
            <pict>
              <shape>
                <textbox>
                  <p>
                    <r>
                      <t>Text in a textbox.</t>
                    </r>
                  </p>
                </textbox>
              </shape>
            </pict>
          </r>
        </p>
        <p>
          <r>
            <t>Second paragraph.</t>
          </r>
        </p>
      </body>
    </document>");
            foreach (var item in rootElement.DescendantsAndSelfTrimmed("textbox"))
                Console.WriteLine("{0}{1}", "".PadRight(item.Ancestors().Count() * 2),
                    item.Name);
            Console.WriteLine("==========");
            foreach (var item in rootElement
                .DescendantsAndSelfTrimmed(e => e.Name == "p" && e.Ancestors("p").Any()))
                Console.WriteLine("{0}{1}", "".PadRight(item.Ancestors().Count() * 2),
                    item.Name);
            Console.WriteLine("==========");
            foreach (var item in rootElement.DescendantsTrimmed("r"))
                Console.WriteLine("{0}{1}", "".PadRight(item.Ancestors().Count() * 2),
                    item.Name);
            Console.WriteLine("==========");
            foreach (var item in rootElement.DescendantsTrimmed(e => e.Name == "r"))
                Console.WriteLine("{0}{1}", "".PadRight(item.Ancestors().Count() * 2),
                    item.Name);
            Console.WriteLine("==========");
            XDocument doc = XDocument.Parse(
    @"<document>
      <body>
        <p>
          <r>
            <t>First paragraph.</t>
          </r>
        </p>
        <p>
          <r>
            <pict>
              <shape>
                <textbox>
                  <p>
                    <r>
                      <t>Text in a textbox.</t>
                    </r>
                  </p>
                </textbox>
              </shape>
            </pict>
          </r>
        </p>
        <p>
          <r>
            <t>Second paragraph.</t>
          </r>
        </p>
      </body>
    </document>");
            foreach (var item in doc.DescendantsTrimmed(e => e.Name == "r"))
                Console.WriteLine("{0}{1}", "".PadRight(item.Ancestors().Count() * 2),
                    item.Name);
        }
    }
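
    The explicit stack in DescendantsInternalTrimmed can be easier to follow next to a recursive formulation of the same traversal.  The sketch below is only an illustration, not part of the listing above: the class RecursiveTrimSketch and the method DescendantsTrimmedRecursive are hypothetical names.  It should yield the same elements in the same order, although each level of nesting adds another iterator layer, which is one plausible reason to prefer the stack-based version for deep documents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RecursiveTrimSketch
{
    // Hypothetical recursive counterpart to DescendantsTrimmed: yields descendants
    // in document order, skipping any element the predicate selects together with
    // its whole subtree.
    public static IEnumerable<XElement> DescendantsTrimmedRecursive(
        this XContainer container, Func<XElement, bool> predicate)
    {
        foreach (XElement child in container.Elements())
        {
            if (predicate(child))
                continue; // trim this element and everything below it
            yield return child;
            foreach (XElement descendant in child.DescendantsTrimmedRecursive(predicate))
                yield return descendant;
        }
    }

    public static void Main()
    {
        XElement root = XElement.Parse(
            "<document><body>" +
            "<p><r><t>First paragraph.</t></r></p>" +
            "<p><r><pict><shape><textbox>" +
            "<p><r><t>Text in a textbox.</t></r></p>" +
            "</textbox></shape></pict></r></p>" +
            "<p><r><t>Second paragraph.</t></r></p>" +
            "</body></document>");

        IEnumerable<string> names = root
            .DescendantsTrimmedRecursive(el => el.Name == "textbox")
            .Select(el => el.Name.LocalName);

        // prints: body p r t p r pict shape p r t
        Console.WriteLine(string.Join(" ", names));
    }
}
```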
