Implementing 'Inheritance' in XML

Some XML vocabularies implement a powerful XML pattern that is analogous to inheritance in programming language type systems.  Open XML WordprocessingML has these semantics around styles.  When you define a new style, you can base this style on another style, and if the new style doesn't explicitly define some aspect of the style, the new style 'inherits' this aspect from its base style.  This post presents some code that uses one approach for implementing inheritance semantics.

This is one in a series of posts on transforming Open XML WordprocessingML to XHTML.  You can find the complete list of posts here.

Consider the following XML:

<Root>
  <!-- Style: Merge child elements -->
  <Style StyleId="RootStyle">
    <!-- Font and descendants: Replace -->
    <Font>
      <Family Val="Courier"/>
      <Size Val="12"/>
    </Font>
    <!-- VisualProps: Merge attributes -->
    <VisualProps ForeColor="Black"/>
    <!-- Positioning: Merge child elements -->
    <Positioning>
      <SpaceAfter Val="12"/>
    </Positioning>
  </Style>
  <Style StyleId="CompanyThemeHeading1" BaseStyleId="RootStyle">
    <Font>
      <Family Val="Cambria"/>
      <Size Val="14"/>
    </Font>
    <VisualProps ForeColor="Blue" BackColor="OffWhite"/>
    <Positioning>
      <SpaceAfter Val="10"/>
    </Positioning>
  </Style>
  <Style StyleId="Heading1"
         BaseStyleId="CompanyThemeHeading1">
    <VisualProps Bold="true"/>
    <Positioning>
      <Indent Val="10"/>
    </Positioning>
  </Style>
</Root>

The Heading1 style is based on the CompanyThemeHeading1 style, which is itself based on RootStyle.  The programming task is to 'roll up' these styles, and assemble a new Style element that contains all inherited child elements as appropriate.  Note that the order of the Style elements in the XML document is not significant.  The method to 'roll up' these styles should work properly regardless of the document order of the Style elements.
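
One way to make the lookup independent of document order is to index the styles by StyleId before walking the chain.  The following is only a minimal sketch of that idea, not the approach the code later in this post takes.  The names StyleIndexDemo, BuildIndex, and ChainIds are hypothetical, and the sketch assumes that every Style element has a unique StyleId attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class StyleIndexDemo
{
    // Hypothetical helper: maps each StyleId to its Style element, so a lookup
    // does not depend on where the element appears in the document.
    // Assumption: StyleId is present and unique; ToDictionary throws otherwise.
    public static Dictionary<string, XElement> BuildIndex(XElement styles)
    {
        return styles.Elements("Style")
            .ToDictionary(s => (string)s.Attribute("StyleId"));
    }

    // Hypothetical helper: lists the IDs from the given style back to its root.
    // The walk stops at an unknown ID or at an ID it has already visited.
    public static List<string> ChainIds(XElement styles, string styleId)
    {
        Dictionary<string, XElement> index = BuildIndex(styles);
        var ids = new List<string>();
        string current = styleId;
        XElement style;
        while (current != null
            && !ids.Contains(current)
            && index.TryGetValue(current, out style))
        {
            ids.Add(current);
            current = (string)style.Attribute("BaseStyleId");
        }
        return ids;
    }

    static void Main()
    {
        // The styles are deliberately listed out of order.
        XElement styles = XElement.Parse(
            @"<Root>
                <Style StyleId=""Heading1"" BaseStyleId=""CompanyThemeHeading1""/>
                <Style StyleId=""RootStyle""/>
                <Style StyleId=""CompanyThemeHeading1"" BaseStyleId=""RootStyle""/>
              </Root>");
        Console.WriteLine(string.Join(" -> ", ChainIds(styles, "Heading1")));
    }
}
```

The chain is resolved by ID rather than by position, so rearranging the Style elements in the document does not change the result.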

In the case of Open XML styles, there are three varieties of inheritance semantics:

  • Replace an element with the same element and attributes in the derived style.
  • Merge child elements with the corresponding element in the derived style.  In the case of WordprocessingML, when an element has these semantics, it will not have any attributes, so we can disregard them.
  • Merge attributes with the attributes of the same element in the derived style.  In the case of WordprocessingML, when an element has these semantics, it will not have any child nodes, so we can disregard them.

We should note that this doesn't cover all possibilities of inheritance semantics.  Other possibilities:

  • Merge attributes and replace child elements
  • Merge attributes and merge child elements
  • Replace attributes and merge child elements

For a detailed examination of the Open XML inheritance semantics, see Open XML WordprocessingML Style Inheritance.
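
The three varieties used in this post can be written down as a small enumeration, with a function that decides which variety applies to a given element.  This is only an illustrative sketch: the InheritanceSemantics enum and the SemanticsFor method are hypothetical names, and the element names are those of the sample XML above, not of the WordprocessingML schema.

```csharp
using System;
using System.Xml.Linq;

enum InheritanceSemantics
{
    Replace,            // the derived element replaces the base element entirely
    MergeChildElements, // child elements are merged with their counterparts
    MergeAttributes     // attributes are merged; derived values win
}

static class SemanticsDemo
{
    // Assumption: the element names below come from the sample XML in this
    // post. Any element not listed is treated as 'merge child elements'.
    public static InheritanceSemantics SemanticsFor(XName name)
    {
        if (name == "Font")
            return InheritanceSemantics.Replace;
        if (name == "VisualProps")
            return InheritanceSemantics.MergeAttributes;
        return InheritanceSemantics.MergeChildElements;
    }

    static void Main()
    {
        foreach (string n in new[] { "Font", "VisualProps", "Positioning" })
            Console.WriteLine(n + ": " + SemanticsFor(n));
    }
}
```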

When implementing this in code, the first task is to write an iterator that will return a collection of styles.  We pass the StyleId of the most derived style, and this method will follow the style chain, yielding up each style in order.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static IEnumerable<XElement> StyleChainReverseOrder(XElement styles, string styleId)
    {
        string current = styleId;
        while (true)
        {
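            // Look up the style with the current ID.  Stop if the chain refers
            // to a style that does not exist in the document.
            XElement style = styles.Elements("Style")
                .FirstOrDefault(s => (string)s.Attribute("StyleId") == current);
            if (style == null)
                yield break;
            yield return style;
            current = (string)style.Attribute("BaseStyleId");
            if (current == null)
                yield break;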
        }
    }

    static void Main(string[] args)
    {
        XElement styles = XElement.Load("Styles.xml");
        foreach (var style in StyleChainReverseOrder(styles, "Heading1"))
        {
            Console.WriteLine(style);
            Console.WriteLine();
        }
    }
}

If you are not familiar with writing iterators, see The Yield Contextual Keyword.

When you run this example, you see:

<Style StyleId="Heading1"
       BaseStyleId="CompanyThemeHeading1">
  <VisualProps Bold="true" />
  <Positioning>
    <Indent Val="10" />
  </Positioning>
</Style>

<Style StyleId="CompanyThemeHeading1"
       BaseStyleId="RootStyle">
  <Font>
    <Family Val="Cambria" />
    <Size Val="14" />
  </Font>
  <VisualProps ForeColor="Blue"
               BackColor="OffWhite" />
  <Positioning>
    <SpaceAfter Val="10" />
  </Positioning>
</Style>

<Style StyleId="RootStyle">
  <!-- Font and descendants: Replace -->
  <Font>
    <Family Val="Courier" />
    <Size Val="12" />
  </Font>
  <!-- VisualProps: Merge attributes -->
  <VisualProps ForeColor="Black" />
  <!-- Positioning: Merge child elements -->
  <Positioning>
    <SpaceAfter Val="12" />
  </Positioning>
</Style>

We've retrieved the collection of relevant styles (Heading1, CompanyThemeHeading1, and RootStyle), but they are in the reverse of the order in which we want to process them: merging must start with the root style and end with the most derived style.  That is easy enough to fix - we write a method that returns the collection in reverse:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static IEnumerable<XElement> StyleChainReverseOrder(XElement styles, string styleId)
    {
        string current = styleId;
        while (true)
        {
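            // Look up the style with the current ID.  Stop if the chain refers
            // to a style that does not exist in the document.
            XElement style = styles.Elements("Style")
                .FirstOrDefault(s => (string)s.Attribute("StyleId") == current);
            if (style == null)
                yield break;
            yield return style;
            current = (string)style.Attribute("BaseStyleId");
            if (current == null)
                yield break;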
        }
    }

    static IEnumerable<XElement> StyleChain(XElement styles, string styleId)
    {
        return StyleChainReverseOrder(styles, styleId).Reverse();
    }

    static void Main(string[] args)
    {
        XElement styles = XElement.Load("Styles.xml");
        foreach (var style in StyleChain(styles, "Heading1"))
        {
            Console.WriteLine(style);
            Console.WriteLine();
        }
    }
}

Now we can write a method to merge two styles.  The Style element has 'merge child elements' semantics, so we'll write a method to do that:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static IEnumerable<XElement> StyleChainReverseOrder(XElement styles, string styleId)
    {
        string current = styleId;
        while (true)
        {
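            // Look up the style with the current ID.  Stop if the chain refers
            // to a style that does not exist in the document.
            XElement style = styles.Elements("Style")
                .FirstOrDefault(s => (string)s.Attribute("StyleId") == current);
            if (style == null)
                yield break;
            yield return style;
            current = (string)style.Attribute("BaseStyleId");
            if (current == null)
                yield break;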
        }
    }

    static IEnumerable<XElement> StyleChain(XElement styles, string styleId)
    {
        return StyleChainReverseOrder(styles, styleId).Reverse();
    }

    static XElement MergeChildElements(XElement mergedElement, XElement element)
    {
        // If, when in the process of merging, the source element doesn't have a
        // corresponding element in the merged element, then include the source element
        // in the merged element.
        if (mergedElement == null)
            return element;

        XElement newMergedElement = new XElement(element.Name,
            element.Attributes(),
            element.Elements().Select(e =>
            {
                // Replace
                if (e.Name == "Font"
                    // || e.Name == "OtherElementWithReplaceSemantics"
                    )
                    return e;

                // Merge attributes
                if (e.Name == "VisualProps")
                    return new XElement(e.Name,
                        e.Attributes(),
                        mergedElement.Elements(e.Name).Attributes()
                            .Where(a => !(e.Attributes().Any(z => z.Name == a.Name))));

                // All other elements have merge child elements
                XElement correspondingElement = mergedElement.Element(e.Name);
                if (correspondingElement == null)
                    return e;
                return new XElement(e.Name,
                    e.Attributes(),
                    e.Elements().Select(c =>
                        MergeChildElements(correspondingElement.Element(c.Name), c)),
                    correspondingElement.Elements().Where(m => e.Element(m.Name) == null)
                    );
            }),
            mergedElement.Elements()
                .Where(m => !element.Elements(m.Name).Any()));
        return newMergedElement;
    }

    static void Main(string[] args)
    {
        XElement styles = XElement.Load("Styles.xml");
        var styleChain = StyleChain(styles, "Heading1").ToArray();
        XElement s1 = MergeChildElements(
            styleChain[0],
            styleChain[1]);
        Console.WriteLine(s1);
    }
}

This is a recursive method - when merging child elements, we need to implement appropriate semantics for each of the child elements.  To test the method, we merge the first two styles in the style chain.  When we run this on our sample XML document, we see:

<Style StyleId="CompanyThemeHeading1"
       BaseStyleId="RootStyle">
  <Font>
    <Family Val="Cambria" />
    <Size Val="14" />
  </Font>
  <VisualProps ForeColor="Blue"
               BackColor="OffWhite" />
  <Positioning>
    <SpaceAfter Val="10" />
  </Positioning>
</Style>

This is what we expected.  We see the Font element from the CompanyThemeHeading1 style.  The attributes of the VisualProps element are merged.  Positioning, which has 'merge child elements' semantics, contains the correct SpaceAfter element, which replaced the SpaceAfter element in the RootStyle.
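
In the test above, the ForeColor="Black" attribute from RootStyle is overridden by ForeColor="Blue" from CompanyThemeHeading1.  To see the 'merge attributes' rule in isolation, here is a minimal sketch that pulls that one branch out into its own method.  The names AttributeMergeDemo and MergeAttributes are hypothetical, and the attribute values in Main are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class AttributeMergeDemo
{
    // Returns a new element that carries every attribute of 'derived', plus
    // those attributes of 'baseElement' that 'derived' does not define.
    public static XElement MergeAttributes(XElement baseElement, XElement derived)
    {
        return new XElement(derived.Name,
            derived.Attributes(),
            baseElement.Attributes()
                .Where(a => derived.Attribute(a.Name) == null));
    }

    static void Main()
    {
        XElement baseProps = XElement.Parse(
            "<VisualProps ForeColor='Black' BackColor='White'/>");
        XElement derivedProps = XElement.Parse(
            "<VisualProps ForeColor='Blue' Bold='true'/>");

        // ForeColor comes from the derived element; BackColor is inherited
        // from the base element; Bold exists only in the derived element.
        Console.WriteLine(MergeAttributes(baseProps, derivedProps));
    }
}
```

Because the derived attributes are added first and the base attributes are filtered by name, no attribute name is ever added twice, which would otherwise cause the XElement constructor to throw.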

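Now we can write a one-statement method that uses the Aggregate extension method to roll up all styles.  The seed is an empty Style element, and each style in the chain is merged into the accumulated result in turn: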

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static IEnumerable<XElement> StyleChainReverseOrder(XElement styles, string styleId)
    {
        string current = styleId;
        while (true)
        {
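            // Look up the style with the current ID.  Stop if the chain refers
            // to a style that does not exist in the document.
            XElement style = styles.Elements("Style")
                .FirstOrDefault(s => (string)s.Attribute("StyleId") == current);
            if (style == null)
                yield break;
            yield return style;
            current = (string)style.Attribute("BaseStyleId");
            if (current == null)
                yield break;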
        }
    }

    static IEnumerable<XElement> StyleChain(XElement styles, string styleId)
    {
        return StyleChainReverseOrder(styles, styleId).Reverse();
    }

    static XElement MergeChildElements(XElement mergedElement, XElement element)
    {
        // If, when in the process of merging, the source element doesn't have a
        // corresponding element in the merged element, then include the source element
        // in the merged element.
        if (mergedElement == null)
            return element;

        XElement newMergedElement = new XElement(element.Name,
            element.Attributes(),
            element.Elements().Select(e =>
            {
                // Replace
                if (e.Name == "Font"
                    // || e.Name == "OtherElementWithReplaceSemantics"
                    )
                    return e;

                // Merge attributes
                if (e.Name == "VisualProps")
                    return new XElement(e.Name,
                        e.Attributes(),
                        mergedElement.Elements(e.Name).Attributes()
                            .Where(a => !(e.Attributes().Any(z => z.Name == a.Name))));

                // All other elements have merge child elements
                XElement correspondingElement = mergedElement.Element(e.Name);
                if (correspondingElement == null)
                    return e;
                return new XElement(e.Name,
                    e.Attributes(),
                    e.Elements().Select(c =>
                        MergeChildElements(correspondingElement.Element(c.Name), c)),
                    correspondingElement.Elements().Where(m => e.Element(m.Name) == null)
                    );
            }),
            mergedElement.Elements()
                .Where(m => !element.Elements(m.Name).Any()));
        return newMergedElement;
    }

    static XElement AssembleDerivedStyle(XElement styles, string styleId)
    {
        return StyleChain(styles, styleId)
            .Aggregate(
                new XElement("Style"),
                (mergedStyle, style) => MergeChildElements(mergedStyle, style));
    }

    static void Main(string[] args)
    {
        XElement styles = XElement.Load("Styles.xml");
        Console.WriteLine(AssembleDerivedStyle(styles, "Heading1"));
    }
}

When you run this on the sample data, you see:

<Style StyleId="Heading1"
       BaseStyleId="CompanyThemeHeading1">
  <VisualProps Bold="true"
               ForeColor="Blue"
               BackColor="OffWhite" />
  <Positioning>
    <Indent Val="10" />
    <SpaceAfter Val="10" />
  </Positioning>
  <Font>
    <Family Val="Cambria" />
    <Size Val="14" />
  </Font>
</Style>

This is what we expected.  VisualProps has merged attributes, Font is inherited from CompanyThemeHeading1, and Positioning has the SpaceAfter element from the CompanyThemeHeading1 style, and the Indent element from the Heading1 style.
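
Once a style is rolled up, every property can be read from a single element without following the chain again.  The sketch below shows one possible way to read values from the rolled-up Heading1 style.  GetVal and RolledUpStyleDemo are hypothetical names, and the XML in Main is copied from the output above rather than produced by AssembleDerivedStyle.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class RolledUpStyleDemo
{
    // Hypothetical helper: returns the Val attribute of style/parent/child,
    // or null when any step along that path is missing.
    public static string GetVal(XElement style, XName parent, XName child)
    {
        return (string)style.Elements(parent).Elements(child)
            .Attributes("Val").FirstOrDefault();
    }

    static void Main()
    {
        // The rolled-up Heading1 style, copied from the output above.
        XElement style = XElement.Parse(
            @"<Style StyleId=""Heading1"" BaseStyleId=""CompanyThemeHeading1"">
                <VisualProps Bold=""true"" ForeColor=""Blue"" BackColor=""OffWhite"" />
                <Positioning>
                  <Indent Val=""10"" />
                  <SpaceAfter Val=""10"" />
                </Positioning>
                <Font>
                  <Family Val=""Cambria"" />
                  <Size Val=""14"" />
                </Font>
              </Style>");

        Console.WriteLine("Font family: " + GetVal(style, "Font", "Family"));
        Console.WriteLine("Space after: " + GetVal(style, "Positioning", "SpaceAfter"));

        // SpaceBefore is a made-up name that no style in the chain defines,
        // so GetVal returns null instead of throwing.
        bool hasSpaceBefore = GetVal(style, "Positioning", "SpaceBefore") != null;
        Console.WriteLine("Space before defined: " + hasSpaceBefore);
    }
}
```

GetVal queries through the Elements and Attributes extension methods, which operate on collections, so a missing Font or Positioning element yields an empty sequence rather than a null reference.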

  • Oh wow, this is amazing! Thank you!

    Eric, I'm primarily working with PresentationML and DrawingML and VB.NET, but your examples both work perfectly for me to port over and are exactly what I'm always looking for in terms of transforming OOXML to another XML vocabulary. Thank you so much for this blog, it is incredibly helpful!

  • Hey Otaku, I'm really glad you like it.  This has been a really fun project, and I'm especially glad when it is useful to others.

    -Eric

  • Hi Eric,

    I had a quick question about the (e.Name == "VisualProps") part of your code. This picks up attributes from both the element and the MergedElement - but what if you had the same attribute in both, like ForeColor? I'm wondering how one overwrites the other with this code (sorry, my C# skills are not strong). Could you explain a little about how that may work?

    Cheers,

    Todd

  • Hi Todd,

    If the element name is "VisualProps" then this method creates a new element with the same name.  The following code does this:

    // Merge attributes
    if (e.Name == "VisualProps")
        return new XElement(e.Name,

    It then adds all attributes from the element being cloned.

    // Merge attributes
    if (e.Name == "VisualProps")
        return new XElement(e.Name,
            e.Attributes(),

    It then queries for all attributes in the mergedElement where there isn't already an attribute of the same name, and adds those attributes to the element being constructed:

    // Merge attributes
    if (e.Name == "VisualProps")
        return new XElement(e.Name,
            e.Attributes(),
            mergedElement.Elements(e.Name).Attributes()
            .Where(a => !(e.Attributes().Any(z => z.Name == a.Name))));

    That last bit of code is what does what you are asking about.  That call to the Where extension method is what filters out any attributes that already exist for the element being cloned.  This means that the new element picks up all of the attributes of the element that is overriding, and then any attributes of the base element where those attributes have not been overridden.  This code uses one of my favorite idioms for writing robust code.  It uses the Attributes extension method, which returns all attributes for all elements in the source collection.

    The following post discusses this technique, although the examples it gives use the Elements extension method, but the same approach works for attributes.

    http://blogs.msdn.com/ericwhite/archive/2009/05/14/working-with-optional-elements-and-attributes-in-linq-to-xml-queries.aspx

    -Eric

  • That is very helpful, thank you Eric!

  • Eric - is it okay to ask questions on this blog related to my code? I've used your concepts/direction above for a merge of PowerPoint title placeholder shapes (but I'm merging the slide shape into the slideLayout shape). If it's not okay, that's cool, just wondering if you help look at others' code and comment.

    Cheers,

    Todd

  • Well, I'm a bit desperate here, so I thought I would take a chance. Almost all of this code is working as I need it, the issue is the <a:p> tags. I can see where it's wrong, but just can't figure out how to fix it.

       Sub Main()
           Dim slideLayoutPlaceholder = <p:sp>
                                            <p:nvSpPr>
                                                <p:cNvPr id="2" name="Title 1"/>
                                                <p:cNvSpPr>
                                                    <a:spLocks noGrp="1"/>
                                                </p:cNvSpPr>
                                                <p:nvPr>
                                                    <p:ph type="title"/>
                                                </p:nvPr>
                                            </p:nvSpPr>
                                            <p:spPr>
                                                <a:xfrm>
                                                    <a:off x="0" y="304800"/>
                                                    <a:ext cx="8229600" cy="1143000"/>
                                                </a:xfrm>
                                            </p:spPr>
                                            <p:txBody>
                                                <a:bodyPr/>
                                                <a:lstStyle/>
                                                <a:p>
                                                    <a:r>
                                                        <a:rPr lang="en-US" smtClean="0"/>
                                                        <a:t>Click to edit Master title style</a:t>
                                                    </a:r>
                                                    <a:endParaRPr lang="en-US"/>
                                                </a:p>
                                            </p:txBody>
                                        </p:sp>

           Dim slidePlaceholder = <p:sp>
                                      <p:nvSpPr>
                                          <p:cNvPr id="13" name="Title 12"/>
                                          <p:cNvSpPr>
                                              <a:spLocks noGrp="1"/>
                                          </p:cNvSpPr>
                                          <p:nvPr>
                                              <p:ph type="title"/>
                                          </p:nvPr>
                                      </p:nvSpPr>
                                      <p:spPr/>
                                      <p:txBody>
                                          <a:bodyPr numCol="2">
                                              <a:normAutofit fontScale="90000"/>
                                          </a:bodyPr>
                                          <a:lstStyle/>
                                          <a:p>
                                              <a:pPr algn="l"/>
                                              <a:r>
                                                  <a:rPr lang="en-US" dirty="0" smtClean="0"/>
                                                  <a:t>This is my title</a:t>
                                              </a:r>
                                              <a:br>
                                                  <a:rPr lang="en-US" dirty="0" smtClean="0"/>
                                              </a:br>
                                              <a:r>
                                                  <a:rPr lang="en-US" dirty="0" smtClean="0"/>
                                                  <a:t>after break</a:t>
                                              </a:r>
                                              <a:br>
                                                  <a:rPr lang="en-US" dirty="0" smtClean="0"/>
                                              </a:br>
                                              <a:r>
                                                  <a:rPr lang="en-US" b="1" dirty="0" err="1" smtClean="0"/>
                                                  <a:t>break again</a:t>
                                              </a:r>
                                              <a:endParaRPr lang="en-US" b="1" dirty="0"/>
                                          </a:p>
                                      </p:txBody>
                                  </p:sp>

           Dim y = MergeChildElements(slideLayoutPlaceholder, slidePlaceholder)
           RichTextBox1.Text = y.ToString
       End Sub

       Function MergeChildElements(ByVal mergeToElement As XElement, ByVal mergeFromElement As XElement) As XElement
           If mergeFromElement Is Nothing Then
               Return mergeToElement
           End If
           Dim newMergedElement As New XElement( _
                                               mergeToElement.Name, _
                                               mergeFromElement.Attributes, _
                                               mergeToElement.Attributes.Where(Function(a) Not (mergeFromElement.Attributes.Any(Function(z) z.Name = a.Name))), _
                                               mergeToElement.Elements.Select(Function(mt) ReturnResult(mt, mergeFromElement)))
           Return newMergedElement
       End Function

       Function ReturnResult(ByVal mt As XElement, ByVal mergeFromElement As XElement) As XElement
           If mergeFromElement.Name.LocalName = "p" Then
               Return mergeFromElement
           End If
           Dim correspondingElement As XElement = mergeFromElement.Element(mt.Name)
           If correspondingElement Is Nothing Then
               Return mt
           Else
               Return New XElement(mt.Name, mergeFromElement.Elements(mt.Name).Attributes, _
                                   mt.Attributes.Where(Function(a) Not (mergeFromElement.Element(mt.Name).Attributes.Any(Function(z) z.Name = a.Name))), _
                                   mt.Elements.Select(Function(c) MergeChildElements(c, correspondingElement.Element(c.Name))))
           End If
       End Function

  • Hi Eric,

    I am looking for some pointers on XML inheritance. Say I have this in an XML file:

    <Contact>
     <ContactFirstName/>
     <ContactLastName/>
    </Contact>

    Now I need to create another XML document like this:

    <ContactDerived base="Contact">
    <PhoneNum/>
    </ContactDerived>

    I have these XML documents in separate files. Can we treat XML just like any class object, or is only schema inheritance possible?

    Is this possible?

  • Hi Eric,

    Very helpful series!

    I just noticed that the recursion implemented in MergeChildElements does not work for leaf elements, because element.Elements() won't return anything and thus you won't enter the Select method:

    XElement newMergedElement = new XElement(element.Name,
               element.Attributes(),
               element.Elements().Select(...

    One way to fix the issue is to test for empty elements before the above code and to implement the replace/merge attributes/merge elements semantics again:

    if (element.Elements().FirstOrDefault() == null) {
    // replace, merge attributes,...
    }

    XElement newMergedElement = new XElement(element.Name,
               element.Attributes(),
               element.Elements().Select(...

    Cheers,

    Bruno

  • Any chance of a streaming example?  I have a HUGE document to search for elements and it supports optional inheritance...my head hurts!

  • Hi Rich, I don't have a streaming example.

    Just curious, how big is it?  I sometimes have very large XML documents - more than 5 million elements - and can still process them in memory fairly easily.  I'm wondering what maximum size people encounter.

    -Eric
