[Blog Map]
(Update Nov 11, 2009: This is the 6th in a series of posts (#1, #2, #3, #4, #5, #6) on doing a transform of WordprocessingML to XHtml.)
When we want to render a paragraph and its runs inside a cell, we need to assemble the paragraph and run properties from a number of places. In a previous post, I explained how style inheritance works, and how you 'roll up' styles from the style chain. That is only part of the story. This post details how we assemble styling information from:
- Table styles
- The formatting directly applied to tables, paragraphs, and runs
- The global default paragraph and run properties.
In the process of assembling paragraph and run properties, we also need to correctly handle something called 'Toggle Properties'.
Note: next, I'm going to tackle the semantics of numbering styles, and then I believe I'm ready to start coding in earnest. I've made a decision in this project to first implement a transform to XHtml without styling information. The resulting XHtml will contain just the content of the document. I decided to do this because it is useful in its own right, and we need it for another project. Having code to extract the contents of a document in the most succinct form possible has a lot of uses. Of course, this XHtml still can be rendered in a browser, and in some cases, it will be useful to do this. Then, after publishing that code, I'll start implementing styling behavior.
Table Styles
A very powerful and cool feature of Word is that when you apply a table style, you can pick and choose which aspects of the style to apply. For consistency, you can apply the same style to all tables in your document, even though some tables may have a total row and others may not.
You can apply the same style to both, then pick and choose which aspects of the table style to apply. When you are applying a table style, this is what the Ribbon in Word 2007 looks like:
You can see the range of check boxes in the Table Style Options section of the Ribbon, and how you can pick and choose which aspects of the style you want to apply. This has ramifications for us when we are assembling styling information for a cell. We know the table style for the table, and we also know the values of those check boxes, so we have to apply the various aspects of the table style per the user's preferences from those check boxes.
Before we dive into table styles in depth, we need to cover toggle properties, which play a part in how table styles work.
Toggle Properties
Toggle properties consist of a set of run properties that have a little twist in their semantics when assembling formatting information in preparation for rendering paragraphs in some fashion. The w:b element (which styles a run as bold) is a good example of a toggle property.
Here's how toggle properties work:
Toggle properties only have their toggle behavior when associated with table styles, paragraph styles, and character styles. If a run has been made bold per the table style, and the user applies a paragraph style that also has the w:b element, the net result is that the originally bolded text is no longer bold. And if some portion of that paragraph has a bold character style applied to it, that portion is made bold again.
This makes sense. The table style designer designated that a cell be bold. The paragraph style designer intended to make text in the paragraph stand out. But the text is already bold, so bolding it again wouldn't make it stand out; instead, we reverse the boldness of the text. The same reasoning applies to a character style that has the w:b element.
It's just these three types of styles (table, paragraph, and character) that we need to process in this fashion. If the user subsequently selects that text and presses the Bold button on the toolbar, setting the property on the run itself (not in a style), we honor his or her intention, regardless of the boldness of the table, paragraph, or character styles. Also, the global run properties completely override the toggling behavior (but not directly applied formatting). If the w:b element is set in the global run properties, effectively making the entire document bold, the entire document remains bold unless formatting is set directly on a run.
The set of toggle properties are: §2.3.2.1 (Bold), §2.3.2.2 (Complex Script Bold), §2.3.2.4 (Display All Characters as Capital Letters), §2.3.2.11 (Embossing), §2.3.2.14 (Italics), §2.3.2.15 (Complex Script Italics), §2.3.2.16 (Imprinting), §2.3.2.21 (Display Character Outline), §2.3.2.29 (Shadow), §2.3.2.31 (Small Caps), §2.3.2.35 (Single Strikethrough), and §2.3.2.39 (Hidden Text). The section numbers are for Ecma-376 version 1.
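A minimal sketch of the toggle rules just described, for a single toggle property such as w:b. It illustrates the semantics and is not the author's implementation: `None` stands for "the element is not present at that level", a style that explicitly turns the property off is assumed to leave the value unchanged, and `resolve_toggle` is a hypothetical name. The set lists the element local names for the twelve sections above.

```python
from typing import Optional

# Element local names for the toggle properties listed above
# (Bold, Complex Script Bold, Caps, Emboss, Italics, ... Hidden Text).
TOGGLE_PROPERTIES = {
    "b", "bCs", "caps", "emboss", "i", "iCs",
    "imprint", "outline", "shadow", "smallCaps", "strike", "vanish",
}


def resolve_toggle(global_default: Optional[bool],
                   table_style: Optional[bool],
                   paragraph_style: Optional[bool],
                   character_style: Optional[bool],
                   direct: Optional[bool]) -> bool:
    """Resolve one toggle property for a run (None = element not present)."""
    # Formatting applied directly to the run always wins.
    if direct is not None:
        return direct
    # A toggle property switched on in the global defaults is not toggled by styles.
    if global_default:
        return True
    # Each of the three style types that switches the property on flips it.
    value = False
    for style_value in (table_style, paragraph_style, character_style):
        if style_value:
            value = not value
    return value
```

For example, `resolve_toggle(None, True, True, None, None)` is `False` (bold table style plus bold paragraph style), and adding a bold character style makes it `True` again.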
Assembling Styling Properties for Cells in a Table
Due to the richness of table styles, as shown above, table, row, cell, paragraph and run properties can be stored in multiple places in a table style. Determining the properties for a table style involves rolling up those styles, in the exact same fashion as I described for rolling up style properties in the previous blog post. While rolling up that information, we need to either merge attributes, merge child elements, or replace elements.
Shading of the table cells comes from the table cell properties (w:tcPr). Formatting of the text in table cells comes from the paragraph properties (w:pPr) and run properties (w:rPr). Other necessary properties for rendering come from the table properties (w:tblPr) and table row properties (w:trPr). The process for assembling the correct table styling information for a cell is the same for each of these. In the following section, I describe the process of assembling styling information for runs in a table per the table style, but the same approach applies to assembling styling information for the other aspects of a table style (table, row, cell, and paragraph properties). When I write code to do this, of course, I'm going to write only one set of methods to assemble the styling information, and parameterize those methods so that I can use them for assembling all aspects of conditional table formatting properties.
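Because the same procedure serves table, row, cell, paragraph, and run properties, the parameterization can be sketched in Python with the standard `xml.etree.ElementTree` module. This is a minimal sketch, not the author's code: `conditional_properties` is a hypothetical helper that takes a table style's w:style element, an ordered list of w:tblStylePr types, and the local name of the property element to collect (for example `rPr`, `pPr`, or `tcPr`), and returns the matching elements in order, ready to be rolled up.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
TYPE = f"{{{W}}}type"  # Clark-notation name of the w:type attribute


def conditional_properties(table_style: ET.Element,
                           types: list[str],
                           property_name: str) -> list[ET.Element]:
    """Collect one kind of property element (e.g. 'rPr', 'pPr', 'tcPr')
    from a table style's w:tblStylePr elements, in the order of `types`.

    A missing w:tblStylePr, or a missing property element inside one,
    is not an error: that step simply contributes nothing.
    """
    collected = []
    for style_type in types:
        for style_pr in table_style.findall("w:tblStylePr", NS):
            if style_pr.get(TYPE) != style_type:
                continue
            props = style_pr.find(f"w:{property_name}", NS)
            if props is not None:
                collected.append(props)
    return collected
```

Calling it once with `"rPr"` and again with `"pPr"` (or `"tcPr"`, `"trPr"`, `"tblPr"`) reuses the same logic for every aspect of the table style.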
To determine the run properties from a style for a cell in a table, we do the following, in order:
- We retrieve the value of the w:tblLook element from the table that we're rendering, which indicates which of the conditional table formatting properties we will apply to the table.
- We create an empty list of run style properties (w:rPr elements). In the following steps, we add run style properties to this list based on the circumstances, and after assembling all the items in the list, we roll them up to give us the appropriate styling information for the cell. Note that in the following steps, if a w:tblStylePr element does not exist, it is not an error. It just means that we don't need to do anything for that particular step.
- We add to the list:
  - The run style property for the whole table style from w:tblStylePr[@w:type = 'wholeTable'].
  - If we should apply column banding, per the w:tblLook element:
    - If the cell is an odd banded column cell, then add the run style property from w:tblStylePr[@w:type = 'band1Vert'].
    - If the cell is an even banded column cell, then add the run style property from w:tblStylePr[@w:type = 'band2Vert'].
  - If we should apply row banding, per the w:tblLook element:
    - If the cell is an odd banded row cell, then add the run style property from w:tblStylePr[@w:type = 'band1Horz'].
    - If the cell is an even banded row cell, then add the run style property from w:tblStylePr[@w:type = 'band2Horz'].
  - If we should apply the first row formatting, per the w:tblLook element:
    - If the cell is in the first row, then add the run style property from w:tblStylePr[@w:type = 'firstRow'].
    - In addition, if the cell is in a row with the w:tblHeader element, then add the run style property from w:tblStylePr[@w:type = 'firstRow'].
  - If we should apply the last row formatting, per the w:tblLook element:
    - If the cell is in the last row, then add the run style property from w:tblStylePr[@w:type = 'lastRow'].
  - If we should apply the first column formatting, per the w:tblLook element:
    - If the cell is in the first column, then add the run style property from w:tblStylePr[@w:type = 'firstCol'].
  - If we should apply the last column formatting, per the w:tblLook element:
    - If the cell is in the last column, then add the run style property from w:tblStylePr[@w:type = 'lastCol'].
  - If the cell is the top left cell, then add the run style property from w:tblStylePr[@w:type = 'nwCell'].
  - If the cell is the top right cell, then add the run style property from w:tblStylePr[@w:type = 'neCell'].
  - If the cell is the bottom left cell, then add the run style property from w:tblStylePr[@w:type = 'swCell'].
  - If the cell is the bottom right cell, then add the run style property from w:tblStylePr[@w:type = 'seCell'].
- Now that we have a list of run properties, we roll them up. We now have a set of style run properties that we can apply to the cell.
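The decision steps above can be sketched as a Python function that returns, in order, the w:tblStylePr types that apply to a cell (including lastRow, firstCol, and lastCol for the last-row, first-column, and last-column steps). Several things here are assumptions to verify against the spec version you target: w:tblLook/@w:val is read as the ECMA-376 1st edition hexadecimal bit mask (0x0020 first row, 0x0040 last row, 0x0080 first column, 0x0100 last column, 0x0200 no row banding, 0x0400 no column banding), indices are zero-based, and banding simply alternates from the first row or column with a band size of 1. `conditional_types` is a hypothetical name.

```python
from __future__ import annotations

# Assumed bit masks for the hexadecimal w:tblLook/@w:val attribute.
FIRST_ROW = 0x0020
LAST_ROW = 0x0040
FIRST_COLUMN = 0x0080
LAST_COLUMN = 0x0100
NO_ROW_BANDING = 0x0200
NO_COLUMN_BANDING = 0x0400


def conditional_types(tbl_look: str, row: int, col: int,
                      row_count: int, col_count: int,
                      is_header_row: bool = False) -> list[str]:
    """Return the w:tblStylePr/@w:type values that apply to one cell,
    in the order in which their properties are added to the list."""
    look = int(tbl_look, 16)
    last_row = row == row_count - 1
    last_col = col == col_count - 1
    types = ["wholeTable"]
    if not look & NO_COLUMN_BANDING:
        # Zero-based index 0, 2, ... is the 1st, 3rd, ... (odd banded) column.
        types.append("band1Vert" if col % 2 == 0 else "band2Vert")
    if not look & NO_ROW_BANDING:
        types.append("band1Horz" if row % 2 == 0 else "band2Horz")
    if look & FIRST_ROW and (row == 0 or is_header_row):
        types.append("firstRow")  # also covers rows marked with w:tblHeader
    if look & LAST_ROW and last_row:
        types.append("lastRow")
    if look & FIRST_COLUMN and col == 0:
        types.append("firstCol")
    if look & LAST_COLUMN and last_col:
        types.append("lastCol")
    # Corner cells, added unconditionally as in the list above.
    if row == 0 and col == 0:
        types.append("nwCell")
    if row == 0 and last_col:
        types.append("neCell")
    if last_row and col == 0:
        types.append("swCell")
    if last_row and last_col:
        types.append("seCell")
    return types
```

With `w:val="04A0"` (first row, first column, and row banding), `conditional_types("04A0", 0, 0, 3, 3)` yields `["wholeTable", "band1Horz", "firstRow", "firstCol", "nwCell"]`; the run properties for those types are then rolled up in that order.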
Note that this only gets the run properties for a table style. Once we have rolled up the run properties for the table style, we assemble the following, in order:
- The run properties for the table style (per the above procedure)
- The run properties for the paragraph style for the paragraph that contains the run
- The run properties for the run style applied to the run
We then roll these three up, implementing the toggling behavior for toggle properties that I described earlier. Once we have done this process, we assemble the following, in order:
- The global default run properties.
- The rolled-up run properties from the table, paragraph, and run styles (the result of the previous step).
- The rolled-up run properties from a directly applied run style.
- The global defaults, with all properties except toggle properties removed. (This provides the behavior that global properties trump style toggle properties.)
- The run properties that are applied directly to a run.
We roll these up, and we finally have the run properties that we can apply to the run.
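The two-stage assembly above can be sketched in Python by modelling run properties as dictionaries keyed by element local name (for example `{"b": True, "sz": 22}`), with plain element replacement as the roll-up. This is a simplified sketch under stated assumptions, not the author's implementation: it folds the run (character) style into the style roll-up rather than listing it a second time, it treats a style that explicitly turns a toggle property off as having no effect, and all function names are hypothetical.

```python
# Hypothetical model: run properties as {element local name: value},
# e.g. {"b": True, "sz": 22}. Roll-up is plain element replacement.
TOGGLE_PROPERTIES = {
    "b", "bCs", "caps", "emboss", "i", "iCs",
    "imprint", "outline", "shadow", "smallCaps", "strike", "vanish",
}


def roll_up(property_sets):
    """Later property sets replace earlier ones, element by element."""
    result = {}
    for props in property_sets:
        result.update(props)
    return result


def roll_up_styles(table_rpr, paragraph_rpr, character_rpr):
    """Roll up the three style sources, toggling the toggle properties."""
    result = {}
    for props in (table_rpr, paragraph_rpr, character_rpr):
        for name, value in props.items():
            if name in TOGGLE_PROPERTIES:
                if value:  # a style that switches the property on flips it
                    result[name] = not result.get(name, False)
            else:
                result[name] = value
    return result


def run_properties(global_rpr, table_rpr, paragraph_rpr, character_rpr, direct_rpr):
    """Assemble the final run properties in the order described above."""
    styles = roll_up_styles(table_rpr, paragraph_rpr, character_rpr)
    # Global defaults reduced to the toggle properties that are switched on,
    # so a globally set toggle property beats the style toggling.
    global_toggles = {name: value for name, value in global_rpr.items()
                      if name in TOGGLE_PROPERTIES and value}
    return roll_up([global_rpr, styles, global_toggles, direct_rpr])


# Table style and paragraph style are both bold: the text ends up not bold.
print(run_properties({"sz": 22}, {"b": True}, {"b": True}, {}, {}))
# {'sz': 22, 'b': False}
```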
When we're assembling the paragraph properties for a table style, we follow a similar procedure. Once we have that rolled-up property, we need to assemble a new list of paragraph properties, in the following order:
- The global default paragraph properties
- The table style paragraph properties (per the above procedure)
- The paragraph properties applied directly to a paragraph
We then roll up these three sets of paragraph properties, and we have the paragraph properties that we can apply to the paragraph in the cell.
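A minimal sketch of that paragraph roll-up, again using dictionaries, this time mapping each w:pPr child element to its attributes. Which children merge attributes is an assumption here (`spacing` and `ind` serve as examples); every other child is simply replaced, and `roll_up_ppr` is a hypothetical name.

```python
# Hypothetical set: w:pPr children whose attributes merge during roll-up.
# Every other child is replaced wholesale (the default).
MERGE_ATTRIBUTES = {"spacing", "ind"}


def roll_up_ppr(property_sets):
    """Roll up paragraph properties given as {child name: {attribute: value}}."""
    result = {}
    for props in property_sets:
        for name, attrs in props.items():
            if name in MERGE_ATTRIBUTES and name in result:
                result[name] = {**result[name], **attrs}  # merge attributes
            else:
                result[name] = dict(attrs)  # add or replace the element
    return result


global_ppr = {"spacing": {"after": "200", "line": "276"}}
table_ppr = {"spacing": {"after": "0"}, "jc": {"val": "center"}}
direct_ppr = {"jc": {"val": "left"}}
print(roll_up_ppr([global_ppr, table_ppr, direct_ppr]))
# {'spacing': {'after': '0', 'line': '276'}, 'jc': {'val': 'left'}}
```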
This seems harder than it actually is. While this is a bit involved, this is what enables the very cool table styling capabilities that we see in Word. I just have to say, this is one of those cases where I really appreciate LINQ to XML. I personally really would not want to write old-style imperative code to do this.
One more point about this: in an earlier post, I mentioned an approach of adding paragraph and run properties, with ordering applied, to every paragraph and run in the document. I still think that this approach will work best. It means that I can assemble the style paragraph properties for a cell, then add them to every paragraph in the cell, and likewise assemble the style run properties for a cell, then add them to every run. As a result, I only need to compute the style paragraph properties for a particular cell once, not once for every paragraph in the cell. The same holds true for runs.
Channel 9 launched new developer training courses for SharePoint 2010 and Office 2010. I know the folks who produced this, and I have personally seen a lot of it. This is good stuff. The training consists of extensive recordings from top MVPs and experts on how to develop with both SharePoint and Office 2010. As a developer, I'm particularly enthused about three developer aspects of SharePoint 2010: the Client Object Model, Sandboxed Solutions, and LINQ to SharePoint, but there's a lot more there than that. If you haven't seen what the buzz is about, check it out.
SharePoint 2010 Developer Training: SharePoint 2010 has evolved into a first-class developer platform. Visual Studio 2010 integration is fantastic.
Office 2010 Developer Training: There are a number of new features and UI extensions that you'll want to know about.
Also, while I'm at it, I want to let you know about Windows Server 2008 R2 training. I really appreciate Windows Server 2008 R2.
(Update Nov 11, 2009: This is the 5th in a series of posts (#1, #2, #3, #4, #5, #6) on doing a transform of WordprocessingML to XHtml.)
Html tables and WordprocessingML tables have a lot in common. Both can present complex tables with horizontally and vertically merged cells, and both have a rich set of formatting capabilities. But there are differences in their models and capabilities. This blog post presents those differences, specifically in three areas: table layout, formatting, and the capabilities of tables, rows, and cells.
I'm currently in the process of coding a pure functional transform from WordprocessingML to XHtml. Understanding the exact differences between the two types of tables enables writing this transform as accurately as possible. In addition, if you understand CSS and Html tables, this blog post provides an easy way to learn about WordprocessingML tables. (If you're a CSS expert and see something I'm doing incorrectly, please correct me.)
Note: In a previous post, I talked about a plan to transform WordprocessingML styles to CSS classes. I've decided not to use CSS classes to represent WordprocessingML styles. Instead, I'm going to generate a style attribute for each object (p, table, tr, td, etc.) that contains all necessary formatting for that object. My rationale for this decision is detailed in the "Differences in Formatting" section below. This isn't a decision that I'm taking lightly, but I believe it is the correct one. But we'll see…
Differences in Table Layout
On the surface, the layout of WordprocessingML and Html tables looks very similar. Of course, both can present a simple table that contains data:
Both can contain horizontally and vertically merged cells:
Both can represent an irregular layout:
However, WordprocessingML and XHtml tables use a somewhat different model for layout.
In WordprocessingML, you first establish a grid with some number of grid columns. Left and right edges of cells will always be on a grid column. The mechanism for horizontal cell spanning is that you specify the number of grid columns that a cell spans. You can specify that the first cell in a row starts after skipping a certain number of grid columns.
In contrast, in XHtml, there is no underlying grid on which you lay out cells. Instead, the cells themselves form the grid.
To make this difference clear, let's look at a simple example. Consider the following table with four cells, but the vertical rule between the top two cells isn't aligned with the vertical rule between the bottom two cells:
Here is the WordprocessingML that describes this table. Notice the w:tblGrid, which describes the grid, and the w:gridSpan elements on the top left and bottom right cells. While the grid describes three grid columns, there are only two cells per row.
<w:tbl>
  <w:tblPr>
    <w:tblStyle w:val="TableGrid"/>
    <w:tblW w:w="0" w:type="auto"/>
    <w:tblLook w:val="04A0"/>
  </w:tblPr>
  <w:tblGrid>
    <w:gridCol w:w="1368"/>
    <w:gridCol w:w="450"/>
    <w:gridCol w:w="1350"/>
  </w:tblGrid>
  <w:tr>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1818" w:type="dxa"/>
        <w:gridSpan w:val="2"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Top Left</w:t>
        </w:r>
      </w:p>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1350" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Top Right</w:t>
        </w:r>
      </w:p>
    </w:tc>
  </w:tr>
  <w:tr>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1368" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Bottom Left</w:t>
        </w:r>
      </w:p>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1800" w:type="dxa"/>
        <w:gridSpan w:val="2"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Bottom Right</w:t>
        </w:r>
      </w:p>
    </w:tc>
  </w:tr>
</w:tbl>
Following is markup for a similar table in XHtml. There are three cells per row instead of two. The first two rows (the only ones we see) each contain a cell with a colspan attribute, merging two cells into one. The third row, with no border and a height of zero pixels, defines three cells. This is a trick based on the semantics of XHtml tables. When determining the widths of cells, the browser looks at all rows of the table, and then calculates the column width, taking widths of all cells of that column into consideration. Using this approach, we need to specify column widths only once, in the last invisible row of the table.
<table style='border-collapse:collapse;border:none'>
  <tr>
    <td colspan="2"
        style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
      <p>Top Left</p>
    </td>
    <td style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
      <p>Top Right</p>
    </td>
  </tr>
  <tr>
    <td style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
      <p>Bottom Left</p>
    </td>
    <td colspan="2"
        style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
      <p>Bottom Right</p>
    </td>
  </tr>
  <tr style="max-height:0px">
    <td style='width:68.4pt;border:none'></td>
    <td style='width:22.5pt;border:none'></td>
    <td style='width:67.5pt;border:none'></td>
  </tr>
</table>
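The invisible sizing row can be generated directly from the w:tblGrid values, because w:gridCol/@w:w is expressed in dxa (twentieths of a point); dividing by 20 gives the point widths used in the last row above. A minimal sketch, with `sizing_row` as a hypothetical helper name:

```python
from __future__ import annotations


def sizing_row(grid_col_widths_dxa: list[int]) -> str:
    """Build the invisible last row that fixes the XHtml column widths.

    Each w:gridCol/@w:w value is in dxa (twentieths of a point).
    """
    cells = "".join(
        f"<td style='width:{width / 20:g}pt;border:none'></td>"
        for width in grid_col_widths_dxa
    )
    return f'<tr style="max-height:0px">{cells}</tr>'


# The grid from the WordprocessingML example: 1368, 450, and 1350 dxa.
print(sizing_row([1368, 450, 1350]))
```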
The differences in the model become even clearer when we specify that a grid column is skipped before placing the first cell. The following table shows a row that contains one cell that is shifted to the right:
The WordprocessingML that describes this table follows. The w:gridBefore element specifies that the one cell in the second row is to be placed in the second grid column.
<w:tbl>
  <w:tblPr>
    <w:tblStyle w:val="TableGrid"/>
    <w:tblW w:w="0" w:type="auto"/>
    <w:tblLook w:val="04A0"/>
  </w:tblPr>
  <w:tblGrid>
    <w:gridCol w:w="2000"/>
    <w:gridCol w:w="2000"/>
  </w:tblGrid>
  <w:tr>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="2000" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Top Left</w:t>
        </w:r>
      </w:p>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="2000" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Top Right</w:t>
        </w:r>
      </w:p>
    </w:tc>
  </w:tr>
  <w:tr>
    <w:trPr>
      <w:gridBefore w:val="1"/>
    </w:trPr>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="2000" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Bottom Right</w:t>
        </w:r>
      </w:p>
    </w:tc>
  </w:tr>
</w:tbl>
Here is how we would form this table in XHtml:
<table style='border-collapse:collapse;border:none'>
  <tr>
    <td style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
      <p>Top Left</p>
    </td>
    <td style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
      <p>Top Right</p>
    </td>
  </tr>
  <tr>
    <td style="border:none;padding:0in 5.4pt 0in 5.4pt">
      <p>&#160;</p>
    </td>
    <td style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
      <p>Bottom Right</p>
    </td>
  </tr>
  <tr style="max-height:0px">
    <td width="100" style='border:none'></td>
    <td width="100" style='border:none'></td>
  </tr>
</table>
In XHtml, we have no choice but to place a cell in the location where there is no cell visible. We place a non-breaking space in that cell, as some browsers may collapse the cell if it contains no data. We also specify padding. The table then renders as desired.
There is a simple strategy that we can take when converting the WordprocessingML to XHtml, which is to generate XHtml cells based on the grid, not on cells. We then specify appropriate colspan and style attributes to make the table render as we wish.
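The grid-based strategy can be sketched for a single row. This hypothetical helper assumes each cell's w:gridSpan and the row's w:gridBefore have already been read from the markup; it fills skipped grid columns (before the cells and after them) with borderless placeholder cells containing a non-breaking space, turns spans into colspan attributes, and omits cell styling for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Cell:
    text: str
    grid_span: int = 1  # value of w:gridSpan (1 when absent)


def row_to_xhtml_cells(cells: list[Cell], grid_before: int,
                       grid_col_count: int) -> list[str]:
    """Lay out one WordprocessingML row on the table grid and emit td elements."""
    # Non-breaking space so that browsers don't collapse the empty cell.
    placeholder = '<td style="border:none"><p>&#160;</p></td>'
    tds = [placeholder] * grid_before  # grid columns skipped by w:gridBefore
    for cell in cells:
        colspan = f' colspan="{cell.grid_span}"' if cell.grid_span > 1 else ""
        tds.append(f"<td{colspan}><p>{escape(cell.text)}</p></td>")
    used = grid_before + sum(cell.grid_span for cell in cells)
    tds.extend([placeholder] * (grid_col_count - used))  # trailing columns
    return tds
```

For the shifted-cell example, `row_to_xhtml_cells([Cell("Bottom Right")], 1, 2)` produces the placeholder cell followed by the "Bottom Right" cell, matching the second row of the XHtml above apart from styling.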
This subtle difference in abstraction is one of the most important differences between tables in WordprocessingML and XHtml. By taking this difference into account, it is easy to craft an algorithm that produces tables that render as we wish in XHtml. In addition to this difference in abstraction, there are a number of differences in formatting and capabilities. I don't believe that I've isolated all of the differences, but I think I've found most of the important ones. For some of the conversions, I haven't yet spent the time to find the correct CSS approach, so I'm still using an Html attribute approach.
Differences in Formatting
There are a number of analogous formatting capabilities between tables in WordprocessingML and XHtml/CSS, but one of the key differences is that WordprocessingML has a rich infrastructure of style inheritance. Table styles can inherit from other table styles. Paragraph styles can inherit from other paragraph styles. Run styles can inherit from other run styles. In contrast, in CSS, we can define classes, but we can't define that one class inherits from another class. However, when specifying the class for an element such as a table, paragraph, or span, we can specify more than one class, and the rules for all of those classes are applied. (Where those rules conflict, the rule that appears later in the style sheet wins, regardless of the order of the class names in the class attribute.) This is analogous to style inheritance, but the mechanisms are completely different.
It might seem that we could use the ability to specify multiple classes for an XHtml object to implement a form of style inheritance, but there is one important aspect of the semantics of WordprocessingML styles that makes it impossible to use CSS classes to implement style inheritance. Table styles in WordprocessingML have the capability to define what are called conditional table formatting properties. These are properties that are applied in a specific order to a) the entire table, b) banded columns, c) banded rows, d) first and last row, e) first and last column, f) specific cells at the corners. And, of course, conditional table formatting properties inherit from the same conditional formatting properties of the base style of a table style. In theory, we could define styles for each of these conditional table formatting properties, and apply these styles in order of precedence to each cell in the table. But let's say that we have one table style with a number of conditional formatting properties that derives from another style that also contains a number of conditional formatting properties. When specifying the classes for a paragraph, it would look something like this:
<p class="BaseStyle BaseStyle_EntireTable BaseStyle_BandedColumns BaseStyle_BandedRows (etc.)
   DerivedStyle DerivedStyle_EntireTable DerivedStyle_BandedColumns (etc.)">Some text.</p>
If we had a string of derived table styles, we could end up applying 30 or 40 (or many more!) classes to a single paragraph or run. But even so, it won't work, because if the BaseStyle contains some property P, and a conditional formatting property overrides that property, and then DerivedStyle overrides the BaseStyle property P, and the conditional formatting property does not define that property, then the property that should apply is the one defined in the conditional formatting for the BaseStyle, not the property defined in the DerivedStyle. It simply won't work. We could start playing around with ordering of applications of classes, but I would hate to debug this.
We could go through the effort of defining classes for each uniquely styled cell in each table. This would involve rolling up all inherited styles, and implementing the appropriate semantics for overriding properties at the table, paragraph, and run level, keeping a list of uniquely styled paragraphs and runs, then generating a CSS class for each unique combination of properties. This does have the advantages (and disadvantages) of moving styling information away from the paragraphs and runs into the internal style sheet. These classes would have a computer-generated, non-descriptive name, so they wouldn't be helpful to a person who is reading the XHtml. In addition, it is highly unlikely that these classes could be re-used. It's not worth the effort, I believe.
One approach would be to define a certain set of CSS classes, then override those classes with locally applied styling information in the style attribute. But that defeats the whole purpose of having CSS classes in the first place. With that approach, we still don't have separation of content and presentation, and as you can see, attempting to use CSS classes to represent styles is very complex and prone to bugs.
The approach that I've decided to take is to properly roll up styling information from the WordprocessingML and store that styling information in the style attribute for each object, optimizing that styling information so that if a property is defined at a higher level, it isn't redefined. For instance, if the paragraph specifies that a particular font is used, then the run doesn't also specify it. This optimization can be done after assembling all formatting information for each paragraph and run. The advantage is that this conversion really is strictly a conversion of WordprocessingML to its presentation. Not using CSS classes makes the conversion more straightforward, and it will be easier to debug. I think it is useful for this conversion to simply be a transform of WordprocessingML to its presentation, without involving the complexities that CSS classes bring. In effect, we're using XHtml, with CSS applied at the object level, purely as a presentation engine.
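A minimal sketch of that optimization, assuming each object's rolled-up formatting has already been converted to a dictionary of CSS properties. Only inherited CSS properties (such as font-family, font-size, font-weight, and color) should be passed to `optimize_run_style`, since only those flow from the paragraph down to the run; both function names are hypothetical.

```python
from __future__ import annotations


def style_attribute(props: dict[str, str]) -> str:
    """Serialize CSS properties into the value of a style attribute."""
    return ";".join(f"{name}:{value}" for name, value in props.items())


def optimize_run_style(paragraph_css: dict[str, str],
                       run_css: dict[str, str]) -> dict[str, str]:
    """Drop run-level properties that the paragraph already defines
    with the same value (pass only inherited CSS properties)."""
    return {name: value for name, value in run_css.items()
            if paragraph_css.get(name) != value}


paragraph = {"font-family": "Calibri", "font-size": "11pt"}
run = {"font-family": "Calibri", "font-weight": "bold"}
print(style_attribute(optimize_run_style(paragraph, run)))  # font-weight:bold
```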
Table Capabilities
Following is a partial list of features of WordprocessingML tables, and how they map to XHtml table features:
- Both support visually right-to-left tables for languages such as Hebrew and Arabic. The w:bidiVisual element translates to the dir attribute of the table element.
- Both support alignment of the table with respect to the margins of the containing section or object. To translate table alignment (the w:jc element in w:tblPr), create a div element with the align attribute set to left, right, or center.
- Both support background shading. However, with WordprocessingML, you can specify a pattern for background shading. It would be possible to generate images, but this isn't a key scenario. For phase one, the conversion will convert shading with patterns to a solid color.
- WordprocessingML contains the abstraction of themes. In certain places, the conversion needs to retrieve font and color information from a theme.
- Both support table and cell borders. However, WordprocessingML contains two features not supported in XHtml. First, WordprocessingML supports a large number of border styles, including many 'clip art' varieties, such as "apples", "babyRattle", and "bats". All of the clip art varieties will be converted to a single line border. Commonly used styles such as solid, dotted, double lines, etc. will convert to the corresponding style in XHtml/CSS. Second, WordprocessingML supports diagonal borders. These aren't commonly used, and I'm going to delay supporting them.
- Cell margin (w:tblCellMar) maps to the CSS padding property. Cell margin is the space between the extent of the cell contents and the cell border. Cell margin is typically expressed in dxa, which is a twentieth of a point (1/1440 of an inch). The CSS padding property can be expressed in inches, points, or other units of measure.
- Cell spacing (w:tblCellSpacing) maps to the cellspacing attribute of the table element. Cell spacing is the space between cell borders, but within the table. Cell spacing is merged between adjacent cells. Cell spacing in WordprocessingML is typically expressed in dxa (twentieths of a point). The XHtml cellspacing attribute is in pixels.
- Both models support flowing text around a table. In WordprocessingML, this is supported via floating tables (positioned with w:tblpPr; w:tblOverlap controls whether floating tables may overlap). In XHtml and CSS, set the align attribute of the table to left, and specify appropriate margins so that the table renders properly with the correct space between the table and surrounding text.
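Several of these mappings involve unit conversion. The sketch below converts dxa to points for padding and to pixels for cellspacing, assuming the CSS reference density of 96 pixels per inch, and builds a CSS border value assuming border widths (w:sz) are given in eighths of a point. The border-style table covers only a few common values and falls back to a single solid line for everything else, per the plan above; all names are hypothetical.

```python
DXA_PER_POINT = 20      # dxa = twentieths of a point
DXA_PER_INCH = 1440
CSS_PX_PER_INCH = 96    # CSS reference pixel density


def dxa_to_pt(dxa: int) -> float:
    """Convert dxa (e.g. a w:tblCellMar value) to points for CSS padding."""
    return dxa / DXA_PER_POINT


def dxa_to_px(dxa: int) -> int:
    """Convert dxa (e.g. w:tblCellSpacing) to whole pixels for cellspacing."""
    return round(dxa * CSS_PX_PER_INCH / DXA_PER_INCH)


# Border styles with a direct CSS counterpart; anything else, including the
# clip-art borders such as "apples", falls back to a single solid line.
BORDER_STYLES = {
    "single": "solid", "thick": "solid", "double": "double",
    "dotted": "dotted", "dashed": "dashed", "nil": "none", "none": "none",
}


def css_border(val: str, sz_eighths: int, color: str) -> str:
    """Build a CSS border value from w:val, w:sz (eighths of a point), w:color."""
    style = BORDER_STYLES.get(val, "solid")
    if style == "none":
        return "none"
    css_color = "black" if color == "auto" else f"#{color}"
    return f"{style} {sz_eighths / 8:g}pt {css_color}"
```

For example, `dxa_to_pt(108)` gives 5.4, the left and right padding used in the XHtml tables above.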
Row Capabilities
Following is a partial list of features of WordprocessingML rows, and how they map to XHtml row features:
- In WordprocessingML, rows can be hidden. Given that my primary goal is simply rendering the table properly, the proper conversion is to remove hidden rows from the converted XHtml.
- In WordprocessingML, rows can be centered, aligned left, or aligned right. There is no corresponding capability in XHtml. For phase one, the conversion will disregard row alignment.
- In WordprocessingML, you can specify that a particular row is a header row that should be repeated on each printed page. Headers in XHtml tables provide the ability to format them separately, and they take on a bold appearance by default. These capabilities are really not analogous, so for phase one, the conversion will not convert one to the other.
- Table row height can be converted. w:trHeight converts to the CSS height property of a row.
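A minimal sketch of the two row conversions above (dropping hidden rows and converting row height), using `xml.etree.ElementTree`. It assumes w:trHeight/@w:val is in dxa and ignores w:hRule (exact versus at-least heights); `visible_rows_with_height` is a hypothetical name.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"  # Clark-notation name of the w:val attribute


def visible_rows_with_height(tbl: ET.Element) -> list[tuple[ET.Element, str]]:
    """Drop hidden rows (w:trPr/w:hidden) and compute a CSS height for each
    remaining row from w:trHeight/@w:val (in dxa)."""
    result = []
    for tr in tbl.findall("w:tr", NS):
        hidden = tr.find("w:trPr/w:hidden", NS)
        # w:hidden with no w:val (or a true value) hides the row.
        if hidden is not None and hidden.get(VAL) not in ("0", "false", "off"):
            continue
        css = ""
        height = tr.find("w:trPr/w:trHeight", NS)
        if height is not None and height.get(VAL):
            css = f"height:{int(height.get(VAL)) / 20:g}pt"
        result.append((tr, css))
    return result
```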
Cell Capabilities
Following is a partial list of features of WordprocessingML cells, and how they map to XHtml cell features:
- The w:noWrap element translates to the nowrap attribute of the td element (or the CSS white-space:nowrap property).
- Background shading of cells can be converted. The same issues apply as with table background shading.
- Cell borders can be converted. The same issues apply as with table borders.
- WordprocessingML has the capability to alter kerning so that the text fits exactly in a cell (the w:tcFitText element). Standard CSS has no direct equivalent of w:tcFitText.
- WordprocessingML supports setting the text flow direction. This isn't supported in XHtml tables.
- Horizontal and vertical alignment is supported in both models.
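A minimal sketch of building part of a td style attribute from w:tcPr. It uses the CSS white-space property rather than the nowrap attribute, maps w:vAlign to vertical-align, and reduces w:shd to its fill color as planned for phase one; horizontal alignment is left out because it comes from the paragraph properties (w:jc). The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"
FILL = f"{{{W}}}fill"

# w:vAlign values and their CSS vertical-align counterparts.
VERTICAL_ALIGN = {"top": "top", "center": "middle", "bottom": "bottom"}


def cell_css(tc_pr: ET.Element) -> str:
    """Build part of a td style attribute from w:tcPr (noWrap, vAlign, shd fill)."""
    css = []
    if tc_pr.find("w:noWrap", NS) is not None:
        css.append("white-space:nowrap")
    v_align = tc_pr.find("w:vAlign", NS)
    if v_align is not None and v_align.get(VAL) in VERTICAL_ALIGN:
        css.append(f"vertical-align:{VERTICAL_ALIGN[v_align.get(VAL)]}")
    shd = tc_pr.find("w:shd", NS)
    if shd is not None:
        fill = shd.get(FILL)
        # Patterns are ignored; only the fill color is kept.
        if fill and fill != "auto":
            css.append(f"background-color:#{fill}")
    return ";".join(css)
```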
With this post, I've detailed much of what I think I need to know to transform Open XML WordprocessingML tables to XHtml tables using CSS for formatting. I've also outlined the strategy that I think I'll follow given the slightly different layout model of tables in WordprocessingML and tables in XHtml. As I code the transform, I'll revise this post so that I can remember the details of the transform of WordprocessingML tables to XHtml tables.
(Update Nov 11, 2009: This is the 4th in a series of posts (#1, #2, #3, #4, #5, #6) on doing a transform of WordprocessingML to XHtml.)
When working with WordprocessingML, nearly all of the information that we need to render paragraphs, tables, and numbered items is contained in styles, stored in the WordprocessingML Style Definitions part. Styles are somewhat complicated because they have inherited behavior: one style can be based on another style. Rendering of text that has the derived style is then dependent on the derived style, its base style, that base style's base style, and so on. The Open XML specification refers to this list of styles that are derived from other styles as the 'style chain', which accurately describes the abstraction.
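Walking the style chain is the first step of rolling up styles. This sketch follows w:basedOn links through the w:style elements of the Style Definitions part and returns the chain base-first, which is the order in which styles are rolled up; it stops at a missing style or a cycle, and `style_chain` is a hypothetical name.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"
STYLE_ID = f"{{{W}}}styleId"


def style_chain(styles_root: ET.Element, style_id: str) -> list[ET.Element]:
    """Return the style chain for `style_id`, base-most style first."""
    by_id = {s.get(STYLE_ID): s for s in styles_root.findall("w:style", NS)}
    chain = []
    seen = set()  # guards against a malformed, cyclic chain
    current = by_id.get(style_id)
    while current is not None and current.get(STYLE_ID) not in seen:
        seen.add(current.get(STYLE_ID))
        chain.append(current)
        based_on = current.find("w:basedOn", NS)
        current = by_id.get(based_on.get(VAL)) if based_on is not None else None
    chain.reverse()
    return chain
```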
When determining the set of properties for rendering a paragraph or table, the first job is to 'roll up' all styles in the style chain, creating a single set of properties that we can apply to the paragraph or table. This process of 'rolling up' styles is made somewhat more complicated because there are four variations of semantics that we must apply to elements in the rolling-up process.
However, it's not too complicated, and after carefully defining the semantics of 'rolling-up' styles in the style chain, we can write a small bit of generalized code to do this – probably less than 100 lines of code.
You'll notice something about the semantics of style inheritance: when rolling up the styles, by far the most common operation is to replace an element in a base style with the corresponding element in a derived style. In the code that I'm going to write to roll up styles, if the inheritance semantics are other than merging attributes or merging child elements, the default behavior will be element replacement. This will make the code as small and robust as possible.
This post probably isn't of very much interest to most people, but to the folks who are interested, it will be very important. I'm in the process of writing a fairly compact conversion of Open XML to XHtml, and needed to work out the exact behavior of style inheritance. After working it out, it made good sense to blog it to make life easier for others who need to work with rendering issues of WordprocessingML.
Merging Attributes
In some cases, we must iterate through attributes of a particular element, and if the element in the derived style has an attribute, we must apply that attribute, overriding the attribute in the base style. In many cases, the base style may not define that particular attribute, so in that case, we must simply add the attribute to the element in the rolled-up style. For example, we may have a style, SpaceBefore, which defines a style that has space before the paragraph, but no space after:
<w:style w:type="paragraph"
w:customStyle="1"
w:styleId="SpaceBefore">
<w:name w:val="SpaceBefore"/>
<w:basedOn w:val="Normal"/>
<w:qFormat/>
<w:rsid w:val="00A670C6"/>
<w:pPr>
<w:spacing w:before="200"
w:after="0"/>
</w:pPr>
</w:style>
We may have another style, SpaceBeforeAndAfter, which is based on SpaceBefore and defines the w:spacing element with only a w:after attribute, like this:
<w:style w:type="paragraph"
w:customStyle="1"
w:styleId="SpaceBeforeAndAfter">
<w:name w:val="SpaceBeforeAndAfter"/>
<w:basedOn w:val="SpaceBefore"/>
<w:qFormat/>
<w:rsid w:val="00A670C6"/>
<w:pPr>
<w:spacing w:after="200"/>
</w:pPr>
</w:style>
After 'rolling-up' the style chain, the style that we must apply to a paragraph that has the SpaceBeforeAndAfter style would look like this:
<w:style w:type="paragraph"
w:customStyle="1"
w:styleId="SpaceBeforeAndAfter">
<w:name w:val="SpaceBeforeAndAfter"/>
<w:basedOn w:val="SpaceBefore"/>
<w:qFormat/>
<w:rsid w:val="00A670C6"/>
<w:pPr>
<w:spacing w:before="200"
w:after="200"/>
</w:pPr>
</w:style>
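Here is a minimal sketch of merging attributes, using Python's standard xml.etree.ElementTree. The function name merge_attributes is hypothetical, and the sketch assumes the element carries only attributes (true for w:spacing), so it does not copy child elements. It rolls up the w:spacing elements from SpaceBefore and SpaceBeforeAndAfter:

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def merge_attributes(base: ET.Element, derived: ET.Element) -> ET.Element:
    """Return a new element: base's attributes, overridden/extended by derived's.

    Attributes present only in base are kept; attributes present in derived
    are added or replace the base value. Neither input is modified.
    """
    merged = ET.Element(base.tag, dict(base.attrib))
    merged.attrib.update(derived.attrib)
    return merged


# w:spacing from SpaceBefore (base) and SpaceBeforeAndAfter (derived).
base = ET.fromstring(f'<w:spacing xmlns:w="{W}" w:before="200" w:after="0"/>')
derived = ET.fromstring(f'<w:spacing xmlns:w="{W}" w:after="200"/>')

rolled_up = merge_attributes(base, derived)
print(rolled_up.get(f"{{{W}}}before"), rolled_up.get(f"{{{W}}}after"))  # 200 200
```

w:before survives from the base style, and w:after is overridden by the derived style, which matches the rolled-up markup shown above.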
Merging Child Elements
In some cases, we must merge child elements. We must iterate through all child elements of an element in the derived style, and if the base style doesn't contain a particular element, we must add that element to the 'rolled-up' style. If the base style does contain the element of interest, then we must either merge its attributes or replace it, depending on the semantics defined for that child element. The w:pPr and w:rPr elements are examples of elements that require this type of inheritance.
Consider the style NotIndented, which defines paragraph properties (w:pPr) as follows:
<w:style w:type="paragraph"
w:customStyle="1"
w:styleId="NotIndented">
<w:name w:val="NotIndented"/>
<w:basedOn w:val="Normal"/>
<w:qFormat/>
<w:rsid w:val="00082E03"/>
<w:pPr>
<w:spacing w:after="0"/>
</w:pPr>
</w:style>
The following style, Indented, derives from NotIndented:
<w:style w:type="paragraph"
w:customStyle="1"
w:styleId="Indented">
<w:name w:val="Indented"/>
<w:basedOn w:val="NotIndented"/>
<w:qFormat/>
<w:rsid w:val="00082E03"/>
<w:pPr>
<w:ind w:left="720"/>
</w:pPr>
</w:style>
After rolling up all styles in the style chain, the style that we should apply to text styled as Indented would be defined as follows:
<w:style w:type="paragraph"
w:customStyle="1"
w:styleId="Indented">
<w:name w:val="Indented"/>
<w:basedOn w:val="NotIndented"/>
<w:qFormat/>
<w:rsid w:val="00082E03"/>
<w:pPr>
<w:spacing w:after="0"/>
<w:ind w:left="720"/>
</w:pPr>
</w:style>
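Note that because w:pPr merges its child elements, the rolled-up style contains both w:spacing (from NotIndented) and w:ind (from Indented). If both styles had defined the same child element, then w:spacing and w:ind would have their attributes merged. In most cases, however, per the table at the end of this post, elements are replaced rather than having their attributes merged.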
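A minimal sketch of merging child elements, again in Python with xml.etree.ElementTree. The names merge_child_elements and MERGE_ATTRIBUTE_CHILDREN are hypothetical. This version matches children by tag name, handles only one level (children of w:pPr), and knows only that w:spacing and w:ind merge attributes; every other child is replaced.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Children of w:pPr whose attributes merge; all others are replaced.
MERGE_ATTRIBUTE_CHILDREN = {f"{{{W}}}spacing", f"{{{W}}}ind"}


def merge_child_elements(base: ET.Element, derived: ET.Element) -> ET.Element:
    """Merge derived's children into a copy of base, matching by tag name."""
    merged = copy.deepcopy(base)
    for d_child in derived:
        b_child = merged.find(d_child.tag)
        if b_child is None:
            # Base doesn't have this element: add it to the rolled-up style.
            merged.append(copy.deepcopy(d_child))
        elif d_child.tag in MERGE_ATTRIBUTE_CHILDREN:
            # Same element in both: derived attributes override base ones.
            b_child.attrib.update(d_child.attrib)
        else:
            # Default semantics: replace the base element wholesale,
            # keeping its position among the siblings.
            index = list(merged).index(b_child)
            merged.remove(b_child)
            merged.insert(index, copy.deepcopy(d_child))
    return merged


# w:pPr from NotIndented (base) and Indented (derived).
base_ppr = ET.fromstring(f'<w:pPr xmlns:w="{W}"><w:spacing w:after="0"/></w:pPr>')
derived_ppr = ET.fromstring(f'<w:pPr xmlns:w="{W}"><w:ind w:left="720"/></w:pPr>')

rolled_up = merge_child_elements(base_ppr, derived_ppr)
print([child.tag.split("}")[1] for child in rolled_up])  # ['spacing', 'ind']
```

Matching by tag name works for w:pPr, whose children each appear at most once. A fuller version would recurse for children such as w:pBdr and w:rPr (which themselves merge child elements) and would need a different matching rule for w:tabs, whose w:tab children repeat.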
Replacing Elements
In some cases, while rolling up styles, we must replace an element and its attributes wholesale; we don't iterate through the attributes, replacing them one at a time. The w:top (Paragraph Border Above Identical Paragraphs) element has these semantics. Consider the following style, which defines a single-line top border with a size (w:sz) of 4 eighths of a point and a color of red (FF0000 in hex):
<w:style w:type="paragraph"
w:customStyle="1"
w:styleId="TopBorder1">
<w:name w:val="TopBorder1"/>
<w:basedOn w:val="Normal"/>
<w:qFormat/>
<w:rsid w:val="007850D3"/>
<w:pPr>
<w:pBdr>
<w:top w:val="single"
w:sz="4"
w:space="1"
w:color="FF0000"/>
</w:pBdr>
</w:pPr>
</w:style>
Here is a derived style, TopBorder2, which defines a top border with a size of 18 eighths of a point and no color:
<w:style w:type="paragraph"
w:customStyle="1"
w:styleId="TopBorder2">
<w:name w:val="TopBorder2"/>
<w:basedOn w:val="TopBorder1"/>
<w:qFormat/>
<w:rsid w:val="00315108"/>
<w:pPr>
<w:pBdr>
<w:top w:val="single"
w:sz="18"
w:space="1"/>
</w:pBdr>
</w:pPr>
</w:style>
After rolling up the styles in the style chain, the resulting style that should be applied to a paragraph styled TopBorder2 looks like this:
<w:style w:type="paragraph"
w:customStyle="1"
w:styleId="TopBorder2">
<w:name w:val="TopBorder2"/>
<w:basedOn w:val="TopBorder1"/>
<w:qFormat/>
<w:rsid w:val="00315108"/>
<w:pPr>
<w:pBdr>
<w:top w:val="single"
w:sz="18"
w:space="1"/>
</w:pBdr>
</w:pPr>
</w:style>
Notice that the w:color attribute was not inherited from TopBorder1. The w:top element, along with its attributes, was replaced wholesale.
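Replacement is the simplest of the operations, but it is worth seeing side by side with attribute merging, because using the wrong one silently changes the rendering. This hypothetical Python sketch (replace_element is an illustrative name) rolls up the two w:top elements from TopBorder1 and TopBorder2, then shows what an incorrect attribute merge would have produced:

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def replace_element(base: ET.Element, derived: ET.Element) -> ET.Element:
    """Replace semantics: the derived element wins outright.

    Nothing from base survives, not even attributes the derived element
    leaves out. base is accepted only so this has the same shape as the
    other roll-up operations.
    """
    return copy.deepcopy(derived)


# w:top from TopBorder1 (base) and TopBorder2 (derived).
base_top = ET.fromstring(
    f'<w:top xmlns:w="{W}" w:val="single" w:sz="4" w:space="1" w:color="FF0000"/>')
derived_top = ET.fromstring(
    f'<w:top xmlns:w="{W}" w:val="single" w:sz="18" w:space="1"/>')

rolled_up = replace_element(base_top, derived_top)
print(rolled_up.get(f"{{{W}}}sz"), rolled_up.get(f"{{{W}}}color"))  # 18 None

# For contrast: wrongly merging attributes would let the red color leak through.
wrong = {**base_top.attrib, **derived_top.attrib}
print(wrong[f"{{{W}}}color"])  # FF0000
```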
Style Conditional Table Formatting Properties
There is one special case where merging semantics are slightly more complicated. Table styles have a very powerful feature called conditional table formatting. This feature allows us to specify a special set of formatting properties for the top row, the first column, the bottom row, banded columns, banded rows, cells at the top left, top right, etc. Conditional table formatting is defined in the w:tblStylePr element. The following table style (markup has been simplified) contains a w:tblStylePr element for the first row, and a w:tblStylePr element for the first column:
<w:style w:type="table"
w:customStyle="1"
w:styleId="LightListRedHeader">
<w:name w:val="Light List Red Header"/>
<w:basedOn w:val="LightList"/>
<w:tblStylePr w:type="firstRow">
<w:pPr>
<w:spacing w:before="0"
w:after="0"
w:line="240"
w:lineRule="auto"/>
</w:pPr>
<w:rPr>
<w:b/>
<w:bCs/>
<w:color w:val="FFFFFF"
w:themeColor="background1"/>
</w:rPr>
<w:tblPr/>
<w:tcPr>
<w:shd w:val="clear"
w:color="auto"
w:fill="FF0000"/>
</w:tcPr>
</w:tblStylePr>
<w:tblStylePr w:type="firstCol">
<w:rPr>
<w:b/>
<w:bCs/>
</w:rPr>
</w:tblStylePr>
</w:style>
A table style definition usually has several w:tblStylePr elements, one for each kind of conditional formatting, so we can't simply merge child elements for w:tblStylePr. We must first match the w:tblStylePr elements in the base and derived styles by their w:type attribute, and then merge their child elements.
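Here is a hypothetical Python sketch of that matching step (roll_up_tbl_style_pr and replace_matching_children are illustrative names). It pairs w:tblStylePr elements by w:type and only then merges their w:pPr, w:rPr, w:tblPr, w:tcPr, and w:trPr children. To stay short, it gives every grandchild replace semantics, which is a simplification: a complete implementation would apply the per-element rules from the summary table (for example, merging attributes of w:spacing). The base style's contents below are made up for the example.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TYPE = f"{{{W}}}type"
TBL_STYLE_PR = f"{{{W}}}tblStylePr"


def replace_matching_children(target: ET.Element, source: ET.Element) -> None:
    """Copy source's children into target, replacing same-named ones."""
    for child in source:
        existing = target.find(child.tag)
        if existing is not None:
            target.remove(existing)
        target.append(copy.deepcopy(child))


def roll_up_tbl_style_pr(base_style: ET.Element,
                         derived_style: ET.Element) -> dict[str, ET.Element]:
    """Return rolled-up w:tblStylePr elements keyed by their w:type value."""
    rolled = {e.get(TYPE): copy.deepcopy(e)
              for e in base_style.findall(TBL_STYLE_PR)}
    for derived in derived_style.findall(TBL_STYLE_PR):
        kind = derived.get(TYPE)
        if kind not in rolled:
            # No conditional formatting of this type in the base style.
            rolled[kind] = copy.deepcopy(derived)
            continue
        # Same w:type in both styles: now merge the child elements.
        for section in derived:  # w:pPr, w:rPr, w:tblPr, w:tcPr, w:trPr
            existing = rolled[kind].find(section.tag)
            if existing is None:
                rolled[kind].append(copy.deepcopy(section))
            else:
                replace_matching_children(existing, section)
    return rolled


base = ET.fromstring(f'''<w:style xmlns:w="{W}" w:type="table" w:styleId="LightList">
  <w:tblStylePr w:type="firstRow">
    <w:rPr><w:b/></w:rPr>
    <w:tcPr><w:shd w:val="clear" w:color="auto" w:fill="000000"/></w:tcPr>
  </w:tblStylePr>
</w:style>''')
derived = ET.fromstring(f'''<w:style xmlns:w="{W}" w:type="table" w:styleId="LightListRedHeader">
  <w:tblStylePr w:type="firstRow">
    <w:rPr><w:color w:val="FFFFFF"/></w:rPr>
    <w:tcPr><w:shd w:val="clear" w:color="auto" w:fill="FF0000"/></w:tcPr>
  </w:tblStylePr>
  <w:tblStylePr w:type="firstCol">
    <w:rPr><w:b/></w:rPr>
  </w:tblStylePr>
</w:style>''')

rolled = roll_up_tbl_style_pr(base, derived)
print(sorted(rolled))  # ['firstCol', 'firstRow']
```

The firstRow run properties end up with both w:b (from the base) and w:color (from the derived style), while the firstRow shading is replaced by the red fill.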
Summary of Style Inheritance Semantics
The table at the end of this post summarizes the semantics that we must apply when 'rolling-up' styles.
A fair number of elements in the style hierarchy exist solely for the user interface or other purposes. We are only interested in rolling up those elements that affect presentation, so the table omits elements that don't apply. A few elements (name and basedOn) are used in the rolling-up process itself, so I've listed those.
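Before anything can be rolled up, the style chain itself has to be assembled by following w:basedOn. The following hypothetical Python sketch (style_chain is an illustrative name) assumes that w:basedOn's w:val holds the w:styleId of the base style, as in the examples in this post, and returns the chain ordered from the root style down, so that derived styles are applied last:

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def style_chain(styles_root: ET.Element, style_id: str) -> list[ET.Element]:
    """Return the styles from the root base style down to style_id."""
    by_id = {s.get(f"{{{W}}}styleId"): s
             for s in styles_root.iter(f"{{{W}}}style")}
    chain: list[ET.Element] = []
    seen: set[str] = set()
    current = by_id.get(style_id)
    while current is not None:
        sid = current.get(f"{{{W}}}styleId")
        if sid in seen:  # guard against a malformed, circular basedOn chain
            break
        seen.add(sid)
        chain.append(current)
        based_on = current.find(f"{{{W}}}basedOn")
        current = by_id.get(based_on.get(f"{{{W}}}val")) if based_on is not None else None
    chain.reverse()  # base first, so derived styles are applied last
    return chain


styles = ET.fromstring(f'''<w:styles xmlns:w="{W}">
  <w:style w:type="paragraph" w:styleId="Normal"><w:name w:val="Normal"/></w:style>
  <w:style w:type="paragraph" w:styleId="NotIndented">
    <w:name w:val="NotIndented"/><w:basedOn w:val="Normal"/>
  </w:style>
  <w:style w:type="paragraph" w:styleId="Indented">
    <w:name w:val="Indented"/><w:basedOn w:val="NotIndented"/>
  </w:style>
</w:styles>''')

print([s.get(f"{{{W}}}styleId") for s in style_chain(styles, "Indented")])
# ['Normal', 'NotIndented', 'Indented']
```

Folding the merge operations over this list, from the first element to the last, produces the rolled-up style.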
Note that this is only part of the story around putting together the style information for a cell in a table. After rolling up styles in a style chain into a single set of properties for a table, we must also roll up character formatting information, which involves rolling up run formatting information for the table, for paragraph styles, and for run styles. Before rolling any of this up, we need to take the global run properties into consideration. And when rolling up this information over the hierarchy (table, paragraph, run), we need to handle something called toggle properties. Finally, where appropriate, we must retrieve information from the theme of the document. Stay tuned…
| Element | ECMA-376 section | Semantics |
|---|---|---|
| style | 2.7.3.17 | Merge child elements |
| name | 2.7.3.9 | Used when assembling inheritance information |
| basedOn | 2.7.3.3 | Used when assembling inheritance information |
| pPr | 2.7.7.2 | Merge child elements |
| rPr | 2.7.8.1 | Merge child elements |
| tblPr | 2.7.5.4 | Merge child elements |
| tblStylePr | 2.7.5.6 | Match on w:type, then merge child elements (Conditional Table Formatting Properties). See the note about this element above. |
| tcPr | 2.7.5.9 | Merge child elements |
| trPr | 2.7.5.11 | Merge child elements |
| Children of pPr | 2.7.7.2 | |
| adjustRightInd | 2.3.1.1 | Replace element |
| autoSpaceDE | 2.3.1.2 | Replace element |
| autoSpaceDN | 2.3.1.3 | Replace element |
| bidi | 2.3.1.6 | Replace element |
| cnfStyle | 2.3.1.8 | Replace element |
| contextualSpacing | 2.3.1.9 | Replace element |
| framePr | 2.3.1.11 | Replace element |
| ind | 2.3.1.12 | Merge attributes |
| jc | 2.3.1.13 | Replace element |
| keepLines | 2.3.1.14 | Replace element |
| keepNext | 2.3.1.15 | Replace element |
| kinsoku | 2.3.1.16 | Replace element |
| mirrorIndents | 2.3.1.18 | Replace element |
| numPr | 2.3.1.19 | Replace element |
| outlineLvl | 2.3.1.20 | Replace element |
| overflowPunct | 2.3.1.21 | Replace element |
| pageBreakBefore | 2.3.1.23 | Replace element |
| pBdr | 2.3.1.24 | Merge child elements |
| rPr | 2.3.1.29 | Merge child elements |
| shd | 2.3.1.31 | Replace element |
| snapToGrid | 2.3.1.32 | Replace element |
| spacing | 2.3.1.33 | Merge attributes |
| suppressAutoHyphens | 2.3.1.34 | Replace element |
| suppressLineNumbers | 2.3.1.35 | Replace element |
| suppressOverlap | 2.3.1.36 | Replace element |
| tabs | 2.3.1.38 | Merge child elements |
| textAlignment | 2.3.1.39 | Replace element |
| textboxTightWrap | 2.3.1.40 | Replace element |
| textDirection | 2.3.1.41 | Replace element |
| topLinePunct | 2.3.1.43 | Replace element |
| widowControl | 2.3.1.44 | Replace element |
| wordWrap | 2.3.1.45 | Replace element |
| Children of rPr | 2.7.8.1 | |
| b | 2.3.2.1 | Replace element |
| bCs | 2.3.2.2 | Replace element |
| bdr | 2.3.2.3 | Replace element |
| caps | 2.3.2.4 | Replace element |
| color | 2.3.2.5 | Replace element |
| cs | 2.3.2.6 | Replace element |
| dstrike | 2.3.2.7 | Replace element |
| eastAsianLayout | 2.3.2.8 | Replace element |
| effect | 2.3.2.9 | Replace element |
| em | 2.3.2.10 | Replace element |
| emboss | 2.3.2.11 | Replace element |
| fitText | 2.3.2.12 | Replace element |
| highlight | 2.3.2.13 | Replace element |
| i | 2.3.2.14 | Replace element |
| iCs | 2.3.2.15 | Replace element |
| imprint | 2.3.2.16 | Replace element |
| kern | 2.3.2.17 | Replace element |
| lang | 2.3.2.18 | Merge attributes |
| oMath | 2.3.2.20 | Replace element |
| outline | 2.3.2.21 | Replace element |
| position | 2.3.2.22 | Replace element |
| rFonts | 2.3.2.24 | Replace element |
| rtl | 2.3.2.28 | Replace element |
| shadow | 2.3.2.29 | Replace element |
| shd | 2.3.2.30 | Replace element |
| smallCaps | 2.3.2.31 | Replace element |
| snapToGrid | 2.3.2.32 | Replace element |
| spacing | 2.3.2.33 | Replace element |
| specVanish | 2.3.2.34 | Replace element |
| strike | 2.3.2.35 | Replace element |
| sz | 2.3.2.36 | Replace element |
| szCs | 2.3.2.37 | Replace element |
| u | 2.3.2.38 | Replace element |
| vanish | 2.3.2.39 | Replace element |
| vertAlign | 2.3.2.40 | Replace element |
| w | 2.3.2.41 | Replace element |
| webHidden | 2.3.2.42 | Replace element |
| Children of tblPr | | |
| bidiVisual | 2.4.1 | Replace element |
| jc | 2.4.23 | Replace element |
| shd | 2.4.35 | Replace element |
| tblBorders | 2.4.38 | Merge child elements |
| tblCellMar | 2.4.39 | Merge child elements |
| tblCellSpacing | 2.4.43 | Replace element |
| tblInd | 2.4.48 | Replace element |
| tblLayout | 2.4.49 | Replace element |
| tblLook | 2.4.51 | Replace element |
| tblOverlap | 2.4.53 | Replace element |
| tblpPr | 2.4.54 | Replace element |
| tblStyleColBandSize | 2.7.5.5 | Replace element |
| tblStyleRowBandSize | 2.7.5.7 | Replace element |
| tblW | 2.4.61 | Replace element |
| Children of tblStylePr | | |
| pPr | 2.7.5.1 | Merge child elements |
| rPr | 2.7.5.2 | Merge child elements |
| tblPr | 2.7.5.3 | Merge child elements |
| tcPr | 2.7.5.9 | Merge child elements |
| trPr | 2.7.5.10 | Merge child elements |
| Children of tcPr | | |
| hideMark | 2.4.15 | Replace element |
| noWrap | 2.4.28 | Replace element |
| shd | 2.4.33 | Replace element |
| tcBorders | 2.4.63 | Merge child elements |
| tcFitText | 2.4.64 | Replace element |
| tcMar | 2.4.65 | Merge child elements |
| tcW | 2.4.68 | Replace element |
| textDirection | 2.4.69 | Replace element |
| vAlign | 2.4.80 | Replace element |
| Children of trPr | | |
| cantSplit | 2.4.6 | Replace element |
| gridAfter | 2.4.10 | Replace element |
| gridBefore | 2.4.11 | Replace element |
| hidden | 2.4.14 | Replace element |
| jc | 2.4.22 | Replace element |
| tblCellSpacing | 2.4.42 | Replace element |
| tblHeader | 2.4.46 | Replace element |
| trHeight | 2.4.77 | Replace element |
| wAfter | 2.4.82 | Replace element |
| wBefore | 2.4.83 | Replace element |
(Update Nov 11, 2009: This is the 3rd in a series of posts (#1, #2, #3, #4, #5,