When working with WordprocessingML, nearly all of the information that we need to render paragraphs, tables, and numbered items is contained in styles, stored in the WordprocessingML Style Definitions part. Styles are somewhat complicated because they have inherited behavior: one style can be based on another style. Rendering text that has the derived style therefore depends on the derived style, its base style, that base style's base style, and so on. The Open XML specification refers to this list of styles that are derived from other styles as the 'style chain', which accurately describes the abstraction.
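To make the idea of a style chain concrete, here is a minimal Python sketch that follows w:basedOn references from a style back to its root. It assumes the Style Definitions part has already been parsed into an ElementTree element; the helper names `w` and `style_chain` and the trimmed-down sample markup are invented for this illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def w(name):
    """Build the namespace-qualified (Clark notation) name ElementTree expects."""
    return "{%s}%s" % (W, name)

def style_chain(styles_root, style_id):
    """Return the style elements from style_id back to its root base style.

    Hypothetical helper: follows w:basedOn until a style has no base,
    the base is missing, or a cycle is detected.
    """
    by_id = {s.get(w("styleId")): s for s in styles_root.iter(w("style"))}
    chain, seen = [], set()
    current = style_id
    while current in by_id and current not in seen:
        seen.add(current)
        style = by_id[current]
        chain.append(style)
        based_on = style.find(w("basedOn"))
        current = based_on.get(w("val")) if based_on is not None else None
    return chain

# Trimmed-down sample markup, for illustration only.
SAMPLE = """<w:styles xmlns:w="%s">
  <w:style w:type="paragraph" w:styleId="Normal"/>
  <w:style w:type="paragraph" w:styleId="SpaceBefore">
    <w:basedOn w:val="Normal"/>
  </w:style>
  <w:style w:type="paragraph" w:styleId="SpaceBeforeAndAfter">
    <w:basedOn w:val="SpaceBefore"/>
  </w:style>
</w:styles>""" % W

chain = style_chain(ET.fromstring(SAMPLE), "SpaceBeforeAndAfter")
chain_ids = [s.get(w("styleId")) for s in chain]
print(chain_ids)  # ['SpaceBeforeAndAfter', 'SpaceBefore', 'Normal']
```

The chain is returned derived-first; rolling up would then walk it in reverse, starting from the root base style.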
This is one in a series of posts on transforming Open XML WordprocessingML to XHtml. You can find the complete list of posts here.
When determining the set of properties for rendering a paragraph or table, the first job is to 'roll up' all styles in the style chain, creating a single set of properties that we can apply to the paragraph or table. This process of 'rolling up' styles is made somewhat more complicated because there are four variations of semantics that we must apply to elements in the rolling-up process.
However, it's not too complicated, and after carefully defining the semantics of 'rolling-up' styles in the style chain, we can write a small bit of generalized code to do this – probably less than 100 lines of code.
You'll notice something about the semantics of style inheritance: when rolling up the styles, by far the most common operation is to replace an element in a base style with the corresponding element in a derived style. In the code that I'm going to write to roll up styles, element replacement will be the default behavior whenever the inheritance semantics are anything other than merging attributes or merging child elements. This will make the code as small and robust as possible.
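One way to express "replacement is the default" is a small lookup table that lists only the exceptions. The sketch below is an illustration in Python, not the code referred to in this post; the `Semantics` enum, the `SPECIAL_CASES` table, and `semantics_for` are hypothetical names, and the table holds only the handful of elements that this post uses as examples.

```python
from enum import Enum

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def w(name):
    """Build the namespace-qualified (Clark notation) element name."""
    return "{%s}%s" % (W, name)

class Semantics(Enum):
    MERGE_ATTRIBUTES = "merge attributes"
    MERGE_CHILDREN = "merge child elements"
    REPLACE = "replace element"

# Only the exceptions are listed; this table is deliberately incomplete
# and covers just the elements used as examples in this post.
SPECIAL_CASES = {
    w("spacing"): Semantics.MERGE_ATTRIBUTES,
    w("ind"): Semantics.MERGE_ATTRIBUTES,
    w("pPr"): Semantics.MERGE_CHILDREN,
    w("rPr"): Semantics.MERGE_CHILDREN,
}

def semantics_for(tag):
    """Anything not listed as a special case falls back to replacement."""
    return SPECIAL_CASES.get(tag, Semantics.REPLACE)

print(semantics_for(w("spacing")).value)  # merge attributes
print(semantics_for(w("top")).value)      # replace element
```

Keeping the default in one place means a newly encountered element is handled as a replacement without any extra code.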
This post probably isn't of very much interest to most people, but to the folks who are interested, it will be very important. I'm in the process of writing a fairly compact conversion of Open XML to XHtml, and needed to work out the exact behavior of style inheritance. After working it out, it made good sense to blog it to make life easier for others who need to work with rendering issues of WordprocessingML.
In some cases, we must iterate through attributes of a particular element, and if the element in the derived style has an attribute, we must apply that attribute, overriding the attribute in the base style. In many cases, the base style may not define that particular attribute, so in that case, we must simply add the attribute to the element in the rolled-up style. For example, we may have a style, SpaceBefore, which defines a style that has space before the paragraph, but no space after:
<w:style w:type="paragraph" w:customStyle="1" w:styleId="SpaceBefore">
  <w:name w:val="SpaceBefore"/>
  <w:basedOn w:val="Normal"/>
  <w:qFormat/>
  <w:rsid w:val="00A670C6"/>
  <w:pPr>
    <w:spacing w:before="200" w:after="0"/>
  </w:pPr>
</w:style>
We may have a style, SpaceBeforeAndAfter, which defines the w:spacing element with a w:after attribute, like this:
<w:style w:type="paragraph" w:customStyle="1" w:styleId="SpaceBeforeAndAfter">
  <w:name w:val="SpaceBeforeAndAfter"/>
  <w:basedOn w:val="SpaceBefore"/>
  <w:qFormat/>
  <w:rsid w:val="00A670C6"/>
  <w:pPr>
    <w:spacing w:after="200"/>
  </w:pPr>
</w:style>
After 'rolling-up' the style chain, the style that we must apply to a paragraph that has the SpaceBeforeAndAfter style would look like this:
<w:style w:type="paragraph" w:customStyle="1" w:styleId="SpaceBeforeAndAfter">
  <w:name w:val="SpaceBeforeAndAfter"/>
  <w:basedOn w:val="SpaceBefore"/>
  <w:qFormat/>
  <w:rsid w:val="00A670C6"/>
  <w:pPr>
    <w:spacing w:before="200" w:after="200"/>
  </w:pPr>
</w:style>
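The attribute merge for w:spacing shown above can be sketched in a few lines of Python. This is only an illustration under the assumption that both elements are already in hand as ElementTree elements; `merge_attributes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def w(name):
    """Build the namespace-qualified (Clark notation) attribute name."""
    return "{%s}%s" % (W, name)

def merge_attributes(base, derived):
    """Start from the base element's attributes, then let the derived
    element's attributes override or add to them."""
    merged = ET.Element(derived.tag, dict(base.attrib))
    merged.attrib.update(derived.attrib)
    return merged

# w:spacing from SpaceBefore (base) and SpaceBeforeAndAfter (derived).
base = ET.fromstring('<w:spacing xmlns:w="%s" w:before="200" w:after="0"/>' % W)
derived = ET.fromstring('<w:spacing xmlns:w="%s" w:after="200"/>' % W)

merged = merge_attributes(base, derived)
print(merged.get(w("before")), merged.get(w("after")))  # 200 200
```

The w:before value survives from the base style, while w:after is overridden by the derived style.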
In some cases, we must merge child elements. We must iterate through all child elements of an element in the derived style, and if the base style doesn't contain a particular element, we must add that element to the 'rolled-up' style. If the base style does contain the element of interest, then we must either merge attributes or replace the child elements, based on the semantics defined for that child element. The w:pPr and w:rPr elements are examples of elements that require this type of inheritance.
Consider the style NotIndented, which defines paragraph properties (w:pPr) as follows:
<w:style w:type="paragraph" w:customStyle="1" w:styleId="NotIndented">
  <w:name w:val="NotIndented"/>
  <w:basedOn w:val="Normal"/>
  <w:qFormat/>
  <w:rsid w:val="00082E03"/>
  <w:pPr>
    <w:spacing w:after="0"/>
  </w:pPr>
</w:style>
The following style, Indented, derives from NotIndented:
<w:style w:type="paragraph" w:customStyle="1" w:styleId="Indented">
  <w:name w:val="Indented"/>
  <w:basedOn w:val="NotIndented"/>
  <w:qFormat/>
  <w:rsid w:val="00082E03"/>
  <w:pPr>
    <w:ind w:left="720"/>
  </w:pPr>
</w:style>
After rolling up all styles in the style chain, the style that we should apply to text styled as Indented would be defined as follows:
<w:style w:type="paragraph" w:customStyle="1" w:styleId="Indented">
  <w:name w:val="Indented"/>
  <w:basedOn w:val="NotIndented"/>
  <w:qFormat/>
  <w:rsid w:val="00082E03"/>
  <w:pPr>
    <w:spacing w:after="0"/>
    <w:ind w:left="720"/>
  </w:pPr>
</w:style>
Note that both the w:spacing and w:ind elements require that their attributes be merged. In most cases, per the list below, elements are replaced (as opposed to merging of attributes).
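Merging the child elements of w:pPr for the NotIndented and Indented styles might look like the following Python sketch. It is a simplified illustration: `merge_children` and the `MERGE_ATTRIBUTES` set are hypothetical names, only w:spacing and w:ind are treated as attribute-merging elements (as noted above), and every other child that exists in both styles is simply replaced.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def w(name):
    """Build the namespace-qualified (Clark notation) element name."""
    return "{%s}%s" % (W, name)

# Children whose attributes are merged; anything else is replaced.
MERGE_ATTRIBUTES = {w("spacing"), w("ind")}

def merge_children(base, derived):
    """Return a copy of base with the children of derived rolled in."""
    merged = copy.deepcopy(base)
    for child in derived:
        existing = merged.find(child.tag)
        if existing is None:
            # The base does not define this child: add it.
            merged.append(copy.deepcopy(child))
        elif child.tag in MERGE_ATTRIBUTES:
            # Both define it: derived attributes override base attributes.
            existing.attrib.update(child.attrib)
        else:
            # Default: replace the base child wholesale, keeping its position.
            merged[list(merged).index(existing)] = copy.deepcopy(child)
    return merged

base_ppr = ET.fromstring(
    '<w:pPr xmlns:w="%s"><w:spacing w:after="0"/></w:pPr>' % W)
derived_ppr = ET.fromstring(
    '<w:pPr xmlns:w="%s"><w:ind w:left="720"/></w:pPr>' % W)

rolled = merge_children(base_ppr, derived_ppr)
print([child.tag.split("}")[1] for child in rolled])  # ['spacing', 'ind']
```

Because the function works on a deep copy, neither the base nor the derived element is modified.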
In some cases, while rolling up styles, we must replace an element and its attributes wholesale. We don't need to iterate through attributes, replacing individual attributes. The w:top (Paragraph Border Above Identical Paragraphs) element has these semantics. Consider the following style that defines a single line, with a size of 4 eighths of a point, and with a color of red (FF0000 in hex):
<w:style w:type="paragraph" w:customStyle="1" w:styleId="TopBorder1">
  <w:name w:val="TopBorder1"/>
  <w:basedOn w:val="Normal"/>
  <w:qFormat/>
  <w:rsid w:val="007850D3"/>
  <w:pPr>
    <w:pBdr>
      <w:top w:val="single" w:sz="4" w:space="1" w:color="FF0000"/>
    </w:pBdr>
  </w:pPr>
</w:style>
Here is a derived style, TopBorder2, which defines a top border with a size of 18 eighths of a point, and no color defined:
<w:style w:type="paragraph" w:customStyle="1" w:styleId="TopBorder2">
  <w:name w:val="TopBorder2"/>
  <w:basedOn w:val="TopBorder1"/>
  <w:qFormat/>
  <w:rsid w:val="00315108"/>
  <w:pPr>
    <w:pBdr>
      <w:top w:val="single" w:sz="18" w:space="1"/>
    </w:pBdr>
  </w:pPr>
</w:style>
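After rolling up the styles in the style chain, the resulting style that should be applied to a paragraph styled TopBorder2 looks like this:
<w:style w:type="paragraph" w:customStyle="1" w:styleId="TopBorder2">
  <w:name w:val="TopBorder2"/>
  <w:basedOn w:val="TopBorder1"/>
  <w:qFormat/>
  <w:rsid w:val="00315108"/>
  <w:pPr>
    <w:pBdr>
      <w:top w:val="single" w:sz="18" w:space="1"/>
    </w:pBdr>
  </w:pPr>
</w:style>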
Notice that the w:color attribute was not inherited from TopBorder1. The w:top element, along with its attributes, was replaced wholesale.
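The difference from attribute merging is easy to see in code. In this small Python sketch (an illustration only; `replace_element` is a hypothetical helper name), the derived w:top element is taken as-is and nothing from the base element survives.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def w(name):
    """Build the namespace-qualified (Clark notation) attribute name."""
    return "{%s}%s" % (W, name)

def replace_element(base, derived):
    """Element replacement: the derived element wins outright, so the
    base element is ignored and no attributes are carried over."""
    return copy.deepcopy(derived)

# w:top from TopBorder1 (base) and TopBorder2 (derived).
base_top = ET.fromstring(
    '<w:top xmlns:w="%s" w:val="single" w:sz="4" w:space="1" w:color="FF0000"/>' % W)
derived_top = ET.fromstring(
    '<w:top xmlns:w="%s" w:val="single" w:sz="18" w:space="1"/>' % W)

rolled_top = replace_element(base_top, derived_top)
print(rolled_top.get(w("sz")), rolled_top.get(w("color")))  # 18 None
```

Had the attributes been merged instead, the rolled-up border would have kept the red color from TopBorder1.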
(Update December 13, 2009 - I've written a bit of code to show how to implement XML inheritance.)
There is one special case where merging semantics are slightly more complicated. Table styles have a very powerful feature called conditional table formatting. This feature allows us to specify a special set of formatting properties for the top row, the first column, the bottom row, banded columns, banded rows, cells at the top left, top right, etc. Conditional table formatting is defined in the w:tblStylePr element. The following table style (markup has been simplified) contains a w:tblStylePr element for the first row, and a w:tblStylePr element for the first column:
<w:style w:type="table" w:customStyle="1" w:styleId="LightListRedHeader">
  <w:name w:val="Light List Red Header"/>
  <w:basedOn w:val="LightList"/>
  <w:tblStylePr w:type="firstRow">
    <w:pPr>
      <w:spacing w:before="0" w:after="0" w:line="240" w:lineRule="auto"/>
    </w:pPr>
    <w:rPr>
      <w:b/>
      <w:bCs/>
      <w:color w:val="FFFFFF" w:themeColor="background1"/>
    </w:rPr>
    <w:tblPr/>
    <w:tcPr>
      <w:shd w:val="clear" w:color="auto" w:fill="FF0000"/>
    </w:tcPr>
  </w:tblStylePr>
  <w:tblStylePr w:type="firstCol">
    <w:rPr>
      <w:b/>
      <w:bCs/>
    </w:rPr>
  </w:tblStylePr>
</w:style>
A table style definition most often will have several w:tblStylePr elements. We can't simply merge child elements for the w:tblStylePr element. We must first match the w:type attribute, and then merge child elements.
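Matching on w:type before merging could be sketched like this in Python. The sketch is a simplification under stated assumptions: `merge_children` and `merge_conditional_formatting` are hypothetical helpers, only w:pPr and w:rPr are merged recursively while every other child is replaced, and the content of the LightList base style is invented for the example.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def w(name):
    """Build the namespace-qualified (Clark notation) name."""
    return "{%s}%s" % (W, name)

# Simplifying assumption: only these children are merged recursively.
MERGE_CHILDREN = {w("pPr"), w("rPr")}

def merge_children(base, derived):
    """Return a copy of base with the children of derived rolled in."""
    merged = copy.deepcopy(base)
    for child in derived:
        existing = merged.find(child.tag)
        if existing is None:
            merged.append(copy.deepcopy(child))
        elif child.tag in MERGE_CHILDREN:
            merged[list(merged).index(existing)] = merge_children(existing, child)
        else:
            merged[list(merged).index(existing)] = copy.deepcopy(child)
    return merged

def merge_conditional_formatting(base_style, derived_style):
    """Group w:tblStylePr elements by w:type, then merge each group."""
    by_type = {}
    for style in (base_style, derived_style):
        for pr in style.findall(w("tblStylePr")):
            key = pr.get(w("type"))
            if key in by_type:
                by_type[key] = merge_children(by_type[key], pr)
            else:
                by_type[key] = copy.deepcopy(pr)
    return by_type

# Invented base style content, for illustration only.
BASE = ET.fromstring("""<w:style xmlns:w="%s" w:type="table" w:styleId="LightList">
  <w:tblStylePr w:type="firstRow"><w:rPr><w:b/></w:rPr></w:tblStylePr>
</w:style>""" % W)
DERIVED = ET.fromstring("""<w:style xmlns:w="%s" w:type="table" w:styleId="LightListRedHeader">
  <w:tblStylePr w:type="firstRow">
    <w:tcPr><w:shd w:val="clear" w:color="auto" w:fill="FF0000"/></w:tcPr>
  </w:tblStylePr>
  <w:tblStylePr w:type="firstCol"><w:rPr><w:b/><w:bCs/></w:rPr></w:tblStylePr>
</w:style>""" % W)

result = merge_conditional_formatting(BASE, DERIVED)
print(list(result))  # ['firstRow', 'firstCol']
```

In this example the rolled-up firstRow entry carries both the bold run property from the base style and the red shading from the derived style, while firstCol comes from the derived style alone.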
The table at the end of this post summarizes the semantics that we must apply when 'rolling-up' styles.
A fair number of elements in the style hierarchy exist solely for the user interface or other purposes. We are only interested in rolling up those elements that impact presentation, so I'm eliminating elements that don't apply. A few elements (name and basedOn) are used in the rolling-up process, so I am listing those.
Note that this is only part of the story around putting together the style information for a cell in a table. After rolling up styles in a style chain into a single set of properties for a table, we must also roll up character formatting information, which involves rolling up run formatting information for the table, for paragraph styles, and for run styles. Before rolling any of this up, we need to take the global run properties into consideration. And when rolling up this information over the hierarchy (table, paragraph, run), we need to handle something called toggle properties. Finally, where appropriate, we must retrieve information from the theme of the document. Stay tuned…
w:pPr, w:rPr: Merge child elements
w:name, w:basedOn: Used when assembling inheritance information
w:tblStylePr: Merge child elements (Conditional Table Formatting Properties). See the note about this element above.
Eric, great article. I recently had this exact need: take a base element and an inherited element and merge them together. Your article was the only one in every search I did that covered the exact issue I need to address. I was a bit saddened that you didn't have code examples, but I'm sure that's forthcoming. Great job though!
Hi Otaku, I've put together a code sample that shows how to do this. Take a look at Implementing 'Inheritance' in XML.
Do you know if there is documentation about how this pipeline of inheritance happens? Or is it just trying to connect the dots in the Ecma docs by reading and logging on your own in a sheet?
Hi Todd, It's all in the spec. Sections 17, 17.1, and 17.2 of IS29500 are of particular importance. It is possible to infer the semantics of inheritance from the element type. Some aspects could also be considered implementation/representation details, which aren't necessarily within the scope of the spec. Having worked this out, I wanted to make it easier for others who also need to process styling information in the spec. Reading the spec and presenting that information in a 'narrative' form is one of the most enjoyable aspects of my job.
Thanks Eric. Okay, that's kind of what I was thinking: that I would have to go through the ISO/ECMA docs and infer the pipeline of inheritance. Appreciate you taking the time to do part of this for us! Now how about Excel and PowerPoint? :) Just kidding...
Hi Todd, I *believe* that in this series of blog posts, I've covered all of the semantics of style inheritance (with the exception of some aspects that are super-easy to get from the spec). I'm just about to write the code to assemble actual styling information for a specific paragraph/table/cell, so if I've missed some aspect, it will become apparent soon enough. But if you find some aspect of styling information that isn't clear in these blog posts, feel free to email me / leave questions as comments. As time permits, I hopefully will write an MSDN article that summarizes the assembly of styling information.
As for Excel, it's in my sights...