[Blog Map] This blog is inactive. New blog: EricWhite.com/blog
This is one in a series of posts on transforming Open XML WordprocessingML to XHtml. You can find the complete list of posts here.
Html tables and WordprocessingML tables have a lot in common. Both can present complex tables with horizontally and vertically merged cells, and both have a rich set of capabilities for formatting. But there are differences in their models and capabilities. This blog post presents those differences, specifically around three areas:
I'm currently in the process of coding a pure functional transform from WordprocessingML to XHtml. Understanding the exact differences between the two types of tables enables writing this transform as accurately as possible. In addition, if you understand CSS and Html tables, this blog post provides an easy way to learn about WordprocessingML tables. (If you're a CSS expert, and see something I'm doing incorrectly, please correct me. :)
Note: In a previous post, I talked about a plan to transform WordprocessingML styles to CSS classes. I've decided not to use CSS classes to represent WordprocessingML styles. Instead, I'm going to generate a style attribute for each object (p, table, tr, td, etc.) that contains all necessary formatting for that object. My rationale for this decision is detailed in the "Differences in Formatting" section below. This isn't a decision that I'm taking lightly, but I believe it is the correct one. But we'll see…
On the surface, the layout of WordprocessingML and Html tables looks very similar. Of course, both can present a simple table that contains data:
Both can contain horizontally and vertically merged cells:
Both can represent an irregular layout:
However, WordprocessingML and XHtml tables use a somewhat different model for layout.
In WordprocessingML, you first establish a grid with some number of grid columns. Left and right edges of cells will always be on a grid column. The mechanism for horizontal cell spanning is that you specify the number of grid columns that a cell spans. You can specify that the first cell in a row starts after skipping a certain number of grid columns.
In contrast, in XHtml, there is no underlying grid on which you lay out cells. Instead, the cells themselves form the grid.
To make this difference clear, let's look at a simple example. Consider the following table with four cells, but the vertical rule between the top two cells isn't aligned with the vertical rule between the bottom two cells:
Here is the WordprocessingML that describes this table. Notice the w:tblGrid, which describes the grid, and the w:gridSpan elements on the top left and bottom right cells. While the grid describes three grid columns, there are only two cells per row.
<w:tbl>
  <w:tblPr>
    <w:tblStyle w:val="TableGrid"/>
    <w:tblW w:w="0" w:type="auto"/>
    <w:tblLook w:val="04A0"/>
  </w:tblPr>
  <w:tblGrid>
    <w:gridCol w:w="1368"/>
    <w:gridCol w:w="450"/>
    <w:gridCol w:w="1350"/>
  </w:tblGrid>
  <w:tr>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1818" w:type="dxa"/>
        <w:gridSpan w:val="2"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Top Left</w:t>
        </w:r>
      </w:p>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1350" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Top Right</w:t>
        </w:r>
      </w:p>
    </w:tc>
  </w:tr>
  <w:tr>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1368" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Bottom Left</w:t>
        </w:r>
      </w:p>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="1800" w:type="dxa"/>
        <w:gridSpan w:val="2"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Bottom Right</w:t>
        </w:r>
      </w:p>
    </w:tc>
  </w:tr>
</w:tbl>
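To see how the w:tcW values relate to the grid, here is a minimal Python sketch (not part of the transform itself). It assumes that a cell's width is simply the sum of the grid columns it spans; the function name cell_widths and its parameters are made up for illustration.

```python
# Grid column widths in twips, copied from the w:tblGrid above.
grid = [1368, 450, 1350]

def cell_widths(grid, spans, grid_before=0):
    """Return the width of each cell in a row, given each cell's gridSpan.

    grid_before is the number of grid columns skipped before the first cell.
    """
    widths = []
    col = grid_before  # index of the first grid column the row occupies
    for span in spans:
        widths.append(sum(grid[col:col + span]))
        col += span
    return widths

top = cell_widths(grid, [2, 1])     # first cell spans two grid columns
bottom = cell_widths(grid, [1, 2])  # second cell spans two grid columns
print(top, bottom)
```

The results match the w:tcW values in the markup: 1818 and 1350 for the top row, 1368 and 1800 for the bottom row.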
Following is markup for a similar table in XHtml. The table has three columns instead of two. Each of the first two rows (the only ones we see) contains a cell with a colspan attribute, which merges two columns into one cell. The third row, with no border and a height of zero pixels, defines three cells. This is a trick based on the semantics of XHtml tables. When determining the widths of cells, the browser looks at all rows of the table, and then calculates each column width, taking the widths of all cells in that column into consideration. Using this approach, we need to specify column widths only once, in the last, invisible row of the table.
<table style='border-collapse:collapse;border:none'>
  <tr>
    <td colspan="2" style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
      <p>Top Left</p>
    </td>
    <td style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
      <p>Top Right</p>
    </td>
  </tr>
  <tr>
    <td style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
      <p>Bottom Left</p>
    </td>
    <td colspan="2" style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
      <p>Bottom Right</p>
    </td>
  </tr>
  <tr style="max-height:0px">
    <td style='width:68.4pt;border:none'></td>
    <td style='width:22.5pt;border:none'></td>
    <td style='width:67.5pt;border:none'></td>
  </tr>
</table>
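The widths in the invisible row come straight from the grid: the w:gridCol values are in twentieths of a point, so dividing by 20 gives points. A small Python sketch of generating that sizing row follows; the helper name sizing_row is hypothetical, and this is only one way the row could be produced.

```python
def sizing_row(grid_twips):
    """Build the invisible last row that fixes the column widths.

    grid_twips holds the w:gridCol widths, in twentieths of a point.
    """
    cells = "".join(
        f"<td style='width:{w / 20:g}pt;border:none'></td>" for w in grid_twips
    )
    return f'<tr style="max-height:0px">{cells}</tr>'

row = sizing_row([1368, 450, 1350])
print(row)
```

For the grid above, this yields three cells with widths of 68.4pt, 22.5pt, and 67.5pt, the same values as in the hand-written markup.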
The differences in the model become even clearer when we specify that a grid column is skipped before placing the first cell. The following table shows a row that contains one cell that is shifted to the right:
The WordprocessingML that describes this table follows. The w:gridBefore element specifies that the one cell in the second row is to be placed in the second grid column.
<w:tbl>
  <w:tblPr>
    <w:tblStyle w:val="TableGrid"/>
    <w:tblW w:w="0" w:type="auto"/>
    <w:tblLook w:val="04A0"/>
  </w:tblPr>
  <w:tblGrid>
    <w:gridCol w:w="2000"/>
    <w:gridCol w:w="2000"/>
  </w:tblGrid>
  <w:tr>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="2000" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Top Left</w:t>
        </w:r>
      </w:p>
    </w:tc>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="2000" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Top Right</w:t>
        </w:r>
      </w:p>
    </w:tc>
  </w:tr>
  <w:tr>
    <w:trPr>
      <w:gridBefore w:val="1"/>
    </w:trPr>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="2000" w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Bottom Right</w:t>
        </w:r>
      </w:p>
    </w:tc>
  </w:tr>
</w:tbl>
Here is how we would form this table in XHtml:
<table style='border-collapse:collapse;border:none'>
  <tr>
    <td style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
      <p>Top Left</p>
    </td>
    <td style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
      <p>Top Right</p>
    </td>
  </tr>
  <tr>
    <td style="border:none;padding:0in 5.4pt 0in 5.4pt">
      <p>&#160;</p>
    </td>
    <td style="border:solid;border-width:1px;border-color:Black;padding:0in 5.4pt 0in 5.4pt">
      <p>Bottom Right</p>
    </td>
  </tr>
  <tr style="max-height:0px">
    <td width="100" style='border:none'></td>
    <td width="100" style='border:none'></td>
  </tr>
</table>
In XHtml, we have no choice but to place a cell in the location where there is no cell visible. We place a non-breaking space in that cell, as some browsers may collapse the cell if it contains no data. We also specify padding. The table then renders as desired.
There is a simple strategy that we can take when converting the WordprocessingML to XHtml, which is to generate XHtml cells based on the grid, not on cells. We then specify appropriate colspan and style attributes to make the table render as we wish.
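A rough Python sketch of that strategy for a single row might look like the following. It is an illustration under my own assumptions, not the final transform: the function name row_to_tds and the (text, span) tuples are invented for the example, and I've chosen to emit one borderless filler cell per skipped grid column (a single filler cell with a colspan would be another option).

```python
from html import escape

def row_to_tds(cells, grid_before=0):
    """Render one row as XHtml, walking the grid rather than the cells.

    cells is a list of (text, grid_span) tuples taken from the w:tc elements;
    grid_before is the value of w:gridBefore for the row (0 if absent).
    """
    border = "border:solid;border-width:1px;border-color:Black"
    tds = []
    # One borderless filler cell per skipped grid column (an assumption).
    # The non-breaking space keeps browsers from collapsing the empty cell.
    for _ in range(grid_before):
        tds.append('<td style="border:none"><p>&#160;</p></td>')
    for text, span in cells:
        colspan = f' colspan="{span}"' if span > 1 else ""
        tds.append(f'<td{colspan} style="{border}"><p>{escape(text)}</p></td>')
    return "<tr>" + "".join(tds) + "</tr>"

# The shifted second row from the example above.
shifted = row_to_tds([("Bottom Right", 1)], grid_before=1)
# The first row of the earlier three-column example.
spanned = row_to_tds([("Top Left", 2), ("Top Right", 1)])
print(shifted)
print(spanned)
```

Padding and the final sizing row are left out to keep the sketch short.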
This subtle difference in abstraction is one of the most important differences between tables in WordprocessingML and XHtml. By taking this difference into account, it is easy to craft an algorithm that will produce tables that will render as we wish in XHtml. In addition to this difference in abstraction, there are a number of differences in formatting and capabilities. I don't believe that I've isolated all of the differences, but I think I've found most of the important ones. In some of the conversions, I haven't yet spent the time to find the correct CSS approach, so I am still using an Html attribute approach.
There are a number of analogous capabilities in formatting between tables in WordprocessingML and XHtml/CSS, but one of the key differences is that in WordprocessingML, there is a rich infrastructure of style inheritance. Table styles can inherit from other table styles. Paragraph styles can inherit from other paragraph styles. Run styles can inherit from other run styles. In contrast, in CSS, we can define classes, but we can't define that one class inherits from another class. However, when specifying the class for an element such as a table, paragraph, or span, we can specify more than one class, and each class is applied in turn. This is analogous to style inheritance, but the mechanisms are completely different.
It might seem that we could use the ability to specify multiple classes for an XHtml object to implement a form of style inheritance, but there is one important aspect of the semantics of WordprocessingML styles that makes it impossible to use CSS classes to implement style inheritance. Table styles in WordprocessingML have the capability to define what are called conditional table formatting properties. These are properties that are applied in a specific order to a) the entire table, b) banded columns, c) banded rows, d) first and last row, e) first and last column, f) specific cells at the corners. And, of course, conditional table formatting properties inherit from the same conditional formatting properties of the base style of a table style. In theory, we could define classes for each of these conditional table formatting properties, and apply these classes in order of precedence to each cell in the table. But let's say that we have one table style with a number of conditional formatting properties that derives from another style that also contains a number of conditional formatting properties. When specifying the classes for a paragraph, it would look something like this:
<p class="BaseStyle BaseStyle_EntireTable BaseStyle_BandedColumns BaseStyle_BandedRows (etc.) DerivedStyle DerivedStyle_EntireTable DerivedStyle_BandedColumns (etc.)">Some text.</p>
If we had a string of derived table styles, we could end up applying 30 or 40 (or many more!) classes to a single paragraph or run. But even so, it won't work, because if the BaseStyle contains some property P, and a conditional formatting property overrides that property, and then DerivedStyle overrides the BaseStyle property P, and the conditional formatting property does not define that property, then the property that should apply is the one defined in the conditional formatting for the BaseStyle, not the property defined in the DerivedStyle. It simply won't work. We could start playing around with ordering of applications of classes, but I would hate to debug this.
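The precedence problem can be modeled in a few lines of Python. This is only a sketch of the scenario just described, using plain dictionaries as stand-ins for styles; the property name and values are made up, and the "rolled up" version assumes that each kind of property is resolved along the inheritance chain first, with conditional formatting layered on top afterwards.

```python
def merge(*layers):
    """Later layers override earlier ones, like classes applied in turn."""
    result = {}
    for layer in layers:
        result.update(layer)
    return result

base_table = {"color": "black"}    # BaseStyle defines property P
base_cond = {"color": "red"}       # BaseStyle conditional formatting overrides P
derived_table = {"color": "blue"}  # DerivedStyle overrides the BaseStyle P
derived_cond = {}                  # DerivedStyle conditional formatting is silent on P

# Classes applied in the order shown in the class attribute above.
naive = merge(base_table, base_cond, derived_table, derived_cond)

# Resolve inheritance within each kind of property first,
# then layer the conditional formatting over the table-level properties.
rolled_up = merge(
    merge(base_table, derived_table),
    merge(base_cond, derived_cond),
)

print(naive["color"], rolled_up["color"])
```

Applying the classes in turn ends up with the DerivedStyle value (blue), while the rolled-up result keeps the value from the BaseStyle conditional formatting (red), which is the one that should apply.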
We could go through the effort of defining classes for each uniquely styled cell in each table. This would involve rolling up all inherited styles, and implementing the appropriate semantics for overriding properties at the table, paragraph, and run level, keeping a list of uniquely styled paragraphs and runs, then generating a CSS class for each unique combination of properties. This does have the advantages (and disadvantages) of moving styling information away from the paragraphs and runs into the internal style sheet. These classes would have a computer-generated, non-descriptive name, so they wouldn't be helpful to a person who is reading the XHtml. In addition, it is highly unlikely that these classes could be re-used. It's not worth the effort, I believe.
One approach would be to define a certain set of CSS classes, then override those classes with locally applied styling information in the style attribute. But that defeats the whole purpose of having CSS classes in the first place. With that approach, we still don't have separation of content and presentation, and as you can see, attempting to use CSS classes to represent styles is very complex and prone to bugs.
The approach that I've decided to take is to properly roll up styling information from the WordprocessingML and store that styling information in the style attribute for each object, optimizing that styling information so that if a property is defined at a higher level, it isn't redefined. For instance, if the paragraph specifies that a particular font is used, then the run doesn't also specify it. This optimization can be done after assembling all formatting information for each paragraph and run. This has the advantage that this conversion really is strictly a conversion of WordprocessingML to its presentation. Not using CSS classes makes the conversion more straightforward and easier to debug. I think it is useful for this conversion to simply be a transform of WordprocessingML to its presentation, without involving the complexities that CSS classes bring. In effect, we're using XHtml, with CSS applied at the object level, purely as a presentation engine.
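As a sketch of that optimization step, the following Python drops any run property that the enclosing paragraph already specifies with the same value. The helper names to_style and prune_run are hypothetical, the property values are invented, and the sketch assumes the properties involved are ones that CSS inherits from parent to child (such as font-family); properties that don't inherit would need to stay on the run.

```python
def to_style(props):
    """Serialize a dictionary of CSS properties into a style attribute value."""
    return ";".join(f"{name}:{value}" for name, value in props.items())

def prune_run(paragraph_props, run_props):
    """Drop run properties that the paragraph already specifies identically."""
    return {
        name: value
        for name, value in run_props.items()
        if paragraph_props.get(name) != value
    }

# Fully rolled-up formatting for one paragraph and one of its runs.
para = {"font-family": "Calibri", "font-size": "11pt"}
run = {"font-family": "Calibri", "font-weight": "bold"}

html = (
    f'<p style="{to_style(para)}">'
    f'<span style="{to_style(prune_run(para, run))}">Some text.</span>'
    f"</p>"
)
print(html)
```

Here the span keeps only font-weight, because the font is already specified on the paragraph.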
Following is a partial list of features of WordprocessingML tables, and how they map to XHtml table features:
Following is a partial list of features of WordprocessingML rows, and how they map to XHtml row features:
Following is a partial list of features of WordprocessingML cells, and how they map to XHtml cell features:
With this post, I've detailed much of what I think I need to know to transform Open XML WordprocessingML tables to XHtml tables using CSS for formatting. I've also outlined the strategy that I think I'll follow given the slightly different layout model of tables in WordprocessingML and tables in XHtml. As I code the transform, I'll revise this post so that I can remember the details of the transform of WordprocessingML tables to XHtml tables.
How w:gridCol width is defined?
For example, w:gridCol w:w="1368"
how to derive 1368 here. i believe it is in twips but it is not the cell width (in points) * 20
so, how to derive that. BTW, you have a wonderful articles on wordxml topics. thanks a lot for that.
I got good information from this article. thanks dude..
Hello
How to predefined style to table in wordprocessingml? For e.g. Light Grid - Accent 5.
I have predefined style name in database and i want to set it to table that i create using office open xml.