October, 2009

  • Eric White's Blog

    Open XML WordprocessingML Style Inheritance (Post #4)


    When working with WordprocessingML, nearly all of the information that we need to render paragraphs, tables, and numbered items is contained in styles, stored in the WordprocessingML Style Definitions part.  Styles are somewhat complicated because they support inheritance: one style can be based on another style.  Rendering of text that has the derived style is then dependent on the derived style, its base style, that base style's base style, and so on.  The Open XML specification refers to this list of styles that are derived from other styles as the 'style chain', which accurately describes the abstraction.

    This is one in a series of posts on transforming Open XML WordprocessingML to XHtml.  You can find the complete list of posts here.

    When determining the set of properties for rendering a paragraph or table, the first job is to 'roll up' all styles in the style chain, creating a single set of properties that we can apply to the paragraph or table.  This process of 'rolling up' styles is made somewhat more complicated because there are four variations of semantics that we must apply to elements in the rolling-up process.
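
    Before anything can be rolled up, the style chain itself has to be assembled by following w:basedOn from the derived style back to its root.  The following is a minimal sketch of that step in Python with xml.etree.ElementTree; the language, the helper names (w, style_chain), and the trimmed-down sample styles are assumptions made for illustration, not code from this series.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def w(name):
    """Return the namespace-qualified (Clark notation) form of a w: name."""
    return f"{{{W}}}{name}"

def style_chain(styles_root, style_id):
    """Return the style elements from style_id back to its root, derived first."""
    by_id = {s.get(w("styleId")): s for s in styles_root.iter(w("style"))}
    chain, seen = [], set()
    current = style_id
    while current in by_id and current not in seen:
        seen.add(current)  # guards against a malformed, circular w:basedOn
        style = by_id[current]
        chain.append(style)
        based_on = style.find(w("basedOn"))
        current = based_on.get(w("val")) if based_on is not None else None
    return chain

# Hypothetical, heavily trimmed style definitions.
SAMPLE = f"""
<w:styles xmlns:w="{W}">
  <w:style w:type="paragraph" w:styleId="Normal"/>
  <w:style w:type="paragraph" w:styleId="SpaceBefore">
    <w:basedOn w:val="Normal"/>
  </w:style>
  <w:style w:type="paragraph" w:styleId="SpaceBeforeAndAfter">
    <w:basedOn w:val="SpaceBefore"/>
  </w:style>
</w:styles>
""".strip()

chain = style_chain(ET.fromstring(SAMPLE), "SpaceBeforeAndAfter")
chain_ids = [s.get(w("styleId")) for s in chain]
print(chain_ids)  # ['SpaceBeforeAndAfter', 'SpaceBefore', 'Normal']
```

    Rolling up would then walk this list in reverse, starting from the root style and applying each derived style on top of it.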

    However, it's not too complicated, and after carefully defining the semantics of 'rolling-up' styles in the style chain, we can write a small bit of generalized code to do this – probably less than 100 lines of code.

    You'll notice something about the semantics of style inheritance – by far, when rolling up the styles, the most common operation is to replace any elements in base styles with an element in a derived style.  In the code that I'm going to write which will roll-up styles, if the inheritance semantics are other than merging attributes or merging child elements, then the default behavior will be to do element replacement.  This will make the code as small and robust as possible.
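
    One way to express this 'replace unless told otherwise' rule is a small lookup keyed by parent and child element name.  The sketch below is in Python and is only an assumption about how such a lookup could be organized; the set names and the semantics_for function are hypothetical, and the entries are a subset of the summary table at the end of this post.

```python
# (parent, child) pairs, using local names without the w: prefix.
# The parent matters: w:spacing merges attributes under w:pPr,
# but is replaced wholesale under w:rPr.
MERGE_ATTRIBUTES = {("pPr", "ind"), ("pPr", "spacing"), ("rPr", "lang")}

MERGE_CHILD_ELEMENTS = {
    ("style", "pPr"), ("style", "rPr"), ("style", "tblPr"),
    ("style", "tcPr"), ("style", "trPr"),
    ("pPr", "pBdr"), ("pPr", "rPr"), ("pPr", "tabs"),
    ("tblPr", "tblBorders"), ("tblPr", "tblCellMar"),
    ("tcPr", "tcBorders"), ("tcPr", "tcMar"),
}

def semantics_for(parent, child):
    """Return the roll-up semantics for a child element of the given parent."""
    key = (parent, child)
    if key in MERGE_ATTRIBUTES:
        return "merge attributes"
    if key in MERGE_CHILD_ELEMENTS:
        return "merge child elements"
    return "replace element"  # the default for everything not listed

print(semantics_for("pPr", "spacing"))
print(semantics_for("rPr", "spacing"))
print(semantics_for("pPr", "pBdr"))
```

    Because replacement is the fallback, only the exceptions need to be listed, which keeps the lookup short.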

    This post probably isn't of very much interest to most people, but to the folks who are interested, it will be very important.  I'm in the process of writing a fairly compact conversion of Open XML to XHtml, and needed to work out the exact behavior of style inheritance.  After working it out, it made good sense to blog it to make life easier for others who need to work with rendering issues of WordprocessingML.

    Merging Attributes

    In some cases, we must iterate through the attributes of a particular element, and if the element in the derived style has an attribute, we must apply that attribute, overriding the attribute in the base style.  In many cases, the base style may not define that particular attribute, so in that case, we must simply add the attribute to the element in the rolled-up style.  For example, we may have a style, SpaceBefore, which puts space before the paragraph, but no space after:

    <w:style w:type="paragraph"
             w:customStyle="1"
             w:styleId="SpaceBefore">
      <w:name w:val="SpaceBefore"/>
      <w:basedOn w:val="Normal"/>
      <w:qFormat/>
      <w:rsid w:val="00A670C6"/>
      <w:pPr>
        <w:spacing w:before="200"
                   w:after="0"/>
      </w:pPr>
    </w:style>

    We may have a second style, SpaceBeforeAndAfter, which is based on SpaceBefore and defines the w:spacing element with only a w:after attribute, like this:

    <w:style w:type="paragraph"
             w:customStyle="1"
             w:styleId="SpaceBeforeAndAfter">
      <w:name w:val="SpaceBeforeAndAfter"/>
      <w:basedOn w:val="SpaceBefore"/>
      <w:qFormat/>
      <w:rsid w:val="00A670C6"/>
      <w:pPr>
        <w:spacing w:after="200"/>
      </w:pPr>
    </w:style>

    After 'rolling-up' the style chain, the style that we must apply to a paragraph that has the SpaceBeforeAndAfter style would look like this:

    <w:style w:type="paragraph"
             w:customStyle="1"
             w:styleId="SpaceBeforeAndAfter">
      <w:name w:val="SpaceBeforeAndAfter"/>
      <w:basedOn w:val="SpaceBefore"/>
      <w:qFormat/>
      <w:rsid w:val="00A670C6"/>
      <w:pPr>
        <w:spacing w:before="200"
                   w:after="200"/>
      </w:pPr>
    </w:style>
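
    The attribute merge for w:spacing in this example can be sketched in a few lines.  This is an illustration in Python with xml.etree.ElementTree under my own naming (merge_attributes is a hypothetical helper), not the implementation used for the conversion.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def merge_attributes(base, derived):
    """Return a new element with base attributes overridden by derived ones."""
    merged = ET.Element(derived.tag, dict(base.attrib))  # start from the base
    merged.attrib.update(derived.attrib)                 # derived style wins
    return merged

# w:spacing from SpaceBefore (base) and SpaceBeforeAndAfter (derived).
base = ET.fromstring(f'<w:spacing xmlns:w="{W}" w:before="200" w:after="0"/>')
derived = ET.fromstring(f'<w:spacing xmlns:w="{W}" w:after="200"/>')

merged = merge_attributes(base, derived)
# Strip the namespace part of each attribute name for display.
spacing = {name.split("}")[1]: value for name, value in merged.attrib.items()}
print(spacing)  # {'before': '200', 'after': '200'}
```

    The w:before attribute survives from the base style because the derived style does not mention it, while w:after is overridden.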

    Merging Child Elements

    In some cases, we must merge child elements.  We must iterate through all child elements of an element in the derived style, and if the base style doesn't contain a particular element, we must add that element to the 'rolled-up' style.  If the base style does contain the element of interest, then we must either merge attributes or replace the child elements, based on the semantics defined for that child element.  The w:pPr and w:rPr elements are examples of elements that require this type of inheritance.

    Consider the style NotIndented, which defines paragraph properties (w:pPr) as follows:

    <w:style w:type="paragraph"
             w:customStyle="1"
             w:styleId="NotIndented">
      <w:name w:val="NotIndented"/>
      <w:basedOn w:val="Normal"/>
      <w:qFormat/>
      <w:rsid w:val="00082E03"/>
      <w:pPr>
        <w:spacing w:after="0"/>
      </w:pPr>
    </w:style>

    The following style, Indented, derives from NotIndented:

    <w:style w:type="paragraph"
             w:customStyle="1"
             w:styleId="Indented">
      <w:name w:val="Indented"/>
      <w:basedOn w:val="NotIndented"/>
      <w:qFormat/>
      <w:rsid w:val="00082E03"/>
      <w:pPr>
        <w:ind w:left="720"/>
      </w:pPr>
    </w:style>

    After rolling up all styles in the style chain, the style that we should apply to text styled as Indented would be defined as follows:

    <w:style w:type="paragraph"
             w:customStyle="1"
             w:styleId="Indented">
      <w:name w:val="Indented"/>
      <w:basedOn w:val="NotIndented"/>
      <w:qFormat/>
      <w:rsid w:val="00082E03"/>
      <w:pPr>
        <w:spacing w:after="0"/>
        <w:ind w:left="720"/>
      </w:pPr>
    </w:style>

    Note that both the w:spacing and w:ind elements require that their attributes be merged.  In most cases, per the list below, elements are replaced (as opposed to merging of attributes).
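
    A rough sketch of merging the child elements of w:pPr follows, again in Python with xml.etree.ElementTree.  The function name merge_ppr and the MERGE_ATTRIBUTES set are assumptions for illustration, and only w:spacing and w:ind are treated as attribute-merging here; every other child falls back to replacement.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def w(name):
    """Return the namespace-qualified (Clark notation) form of a w: name."""
    return f"{{{W}}}{name}"

# Children of w:pPr whose attributes are merged rather than replaced.
MERGE_ATTRIBUTES = {w("spacing"), w("ind")}

def merge_ppr(base_ppr, derived_ppr):
    """Roll a derived w:pPr up onto a copy of the base w:pPr."""
    merged = copy.deepcopy(base_ppr)
    for child in derived_ppr:
        existing = merged.find(child.tag)
        if existing is None:
            merged.append(copy.deepcopy(child))      # only in derived: add it
        elif child.tag in MERGE_ATTRIBUTES:
            existing.attrib.update(child.attrib)     # merge attributes
        else:
            position = list(merged).index(existing)  # replace element in place
            merged.remove(existing)
            merged.insert(position, copy.deepcopy(child))
    return merged

# w:pPr from NotIndented (base) and Indented (derived).
not_indented = ET.fromstring(
    f'<w:pPr xmlns:w="{W}"><w:spacing w:after="0"/></w:pPr>')
indented = ET.fromstring(
    f'<w:pPr xmlns:w="{W}"><w:ind w:left="720"/></w:pPr>')

rolled_up = merge_ppr(not_indented, indented)
child_names = [c.tag.split("}")[1] for c in rolled_up]
print(child_names)  # ['spacing', 'ind']
```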

    Replacing Elements

    In some cases, while rolling up styles, we must replace an element and its attributes wholesale.  We don't need to iterate through attributes, replacing individual attributes.  The w:top (Paragraph Border Above Identical Paragraphs) element has these semantics.  Consider the following style that defines a single line, with a size of 4 eighths of a point, and with a color of red (FF0000 in hex):

    <w:style w:type="paragraph"
             w:customStyle="1"
             w:styleId="TopBorder1">
      <w:name w:val="TopBorder1"/>
      <w:basedOn w:val="Normal"/>
      <w:qFormat/>
      <w:rsid w:val="007850D3"/>
      <w:pPr>
        <w:pBdr>
          <w:top w:val="single"
                 w:sz="4"
                 w:space="1"
                 w:color="FF0000"/>
        </w:pBdr>
      </w:pPr>
    </w:style>

    Here is a derived style, TopBorder2, which defines a top border with a size of 18 eighths of a point, and no color defined:

    <w:style w:type="paragraph"
             w:customStyle="1"
             w:styleId="TopBorder2">
      <w:name w:val="TopBorder2"/>
      <w:basedOn w:val="TopBorder1"/>
      <w:qFormat/>
      <w:rsid w:val="00315108"/>
      <w:pPr>
        <w:pBdr>
          <w:top w:val="single"
                 w:sz="18"
                 w:space="1"/>
        </w:pBdr>
      </w:pPr>
    </w:style>

    After rolling up the styles in the style chain, the style that we apply to a paragraph styled TopBorder2 looks like this:

    <w:style w:type="paragraph"
             w:customStyle="1"
             w:styleId="TopBorder2">
      <w:name w:val="TopBorder2"/>
      <w:basedOn w:val="TopBorder1"/>
      <w:qFormat/>
      <w:rsid w:val="00315108"/>
      <w:pPr>
        <w:pBdr>
          <w:top w:val="single"
                 w:sz="18"
                 w:space="1"/>
        </w:pBdr>
      </w:pPr>
    </w:style>

    Notice that the w:color attribute was not inherited from TopBorder1.  The w:top element, along with its attributes, was replaced wholesale.
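
    To make the contrast with attribute merging concrete, here is a small Python sketch (xml.etree.ElementTree) that rolls up w:pBdr by merging its children while replacing each border element wholesale.  The roll_up_pbdr name is hypothetical, and the w:bottom border in the base is an addition of mine that is not part of the TopBorder1 example; it is there only to show that a border the derived style does not mention is kept.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def w(name):
    """Return the namespace-qualified (Clark notation) form of a w: name."""
    return f"{{{W}}}{name}"

def roll_up_pbdr(base_pbdr, derived_pbdr):
    """Merge the children of w:pBdr; each border is replaced, never merged."""
    merged = copy.deepcopy(base_pbdr)
    for border in derived_pbdr:
        existing = merged.find(border.tag)
        if existing is None:
            merged.append(copy.deepcopy(border))
        else:
            # Swap the whole element so no base attribute survives.
            position = list(merged).index(existing)
            merged.remove(existing)
            merged.insert(position, copy.deepcopy(border))
    return merged

base = ET.fromstring(
    f'<w:pBdr xmlns:w="{W}">'
    '<w:top w:val="single" w:sz="4" w:space="1" w:color="FF0000"/>'
    # Assumption: w:bottom is not in the post's example.
    '<w:bottom w:val="single" w:sz="4" w:space="1" w:color="FF0000"/>'
    '</w:pBdr>')
derived = ET.fromstring(
    f'<w:pBdr xmlns:w="{W}">'
    '<w:top w:val="single" w:sz="18" w:space="1"/>'
    '</w:pBdr>')

rolled_up = roll_up_pbdr(base, derived)
top = rolled_up.find(w("top"))
print(top.get(w("sz")), top.get(w("color")))  # 18 None
```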

    (Update December 13, 2009 - I've written a bit of code to show how to implement XML inheritance.)

    Style Conditional Table Formatting Properties

    There is one special case where merging semantics are slightly more complicated.  Table styles have a very powerful feature called conditional table formatting.  This feature allows us to specify a special set of formatting properties for the top row, the first column, the bottom row, banded columns, banded rows, cells at the top left, top right, etc.  Conditional table formatting is defined in the w:tblStylePr element.  The following table style (markup has been simplified) contains a w:tblStylePr element for the first row, and a w:tblStylePr element for the first column:

    <w:style w:type="table"
             w:customStyle="1"
             w:styleId="LightListRedHeader">
      <w:name w:val="Light List Red Header"/>
      <w:basedOn w:val="LightList"/>
      <w:tblStylePr w:type="firstRow">
        <w:pPr>
          <w:spacing w:before="0"
                     w:after="0"
                     w:line="240"
                     w:lineRule="auto"/>
        </w:pPr>
        <w:rPr>
          <w:b/>
          <w:bCs/>
          <w:color w:val="FFFFFF"
                   w:themeColor="background1"/>
        </w:rPr>
        <w:tblPr/>
        <w:tcPr>
          <w:shd w:val="clear"
                 w:color="auto"
                 w:fill="FF0000"/>
        </w:tcPr>
      </w:tblStylePr>
      <w:tblStylePr w:type="firstCol">
        <w:rPr>
          <w:b/>
          <w:bCs/>
        </w:rPr>
      </w:tblStylePr>
    </w:style>

    A table style definition most often will have several w:tblStylePr elements.  We can't simply merge child elements for the w:tblStylePr element.  We must first match the w:type attribute, and then merge child elements.
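
    The matching step can be sketched as follows in Python with xml.etree.ElementTree.  The function names are hypothetical, the contents of the LightList base style are invented for the example (the post does not show them), and as a simplification every property inside w:pPr, w:rPr, w:tcPr and so on is replaced rather than attribute-merged.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def w(name):
    """Return the namespace-qualified (Clark notation) form of a w: name."""
    return f"{{{W}}}{name}"

def merge_group(target, source):
    """Merge the children of source (e.g. w:rPr) into target, replacing duplicates."""
    for prop in source:
        existing = target.find(prop.tag)
        if existing is not None:
            target.remove(existing)
        target.append(copy.deepcopy(prop))

def roll_up_conditional(base_style, derived_style):
    """Return rolled-up w:tblStylePr elements keyed by their w:type."""
    rolled = {}
    for style in (base_style, derived_style):       # base first, derived last
        for cond in style.findall(w("tblStylePr")):
            cond_type = cond.get(w("type"))
            if cond_type not in rolled:             # no match on w:type yet
                rolled[cond_type] = copy.deepcopy(cond)
                continue
            target = rolled[cond_type]
            for group in cond:                      # w:pPr, w:rPr, w:tcPr, ...
                existing = target.find(group.tag)
                if existing is None:
                    target.append(copy.deepcopy(group))
                else:
                    merge_group(existing, group)
    return rolled

# Assumption: an invented LightList base with a black first-row fill.
light_list = ET.fromstring(f"""
<w:style xmlns:w="{W}" w:type="table" w:styleId="LightList">
  <w:tblStylePr w:type="firstRow">
    <w:rPr><w:b/></w:rPr>
    <w:tcPr><w:shd w:val="clear" w:color="auto" w:fill="000000"/></w:tcPr>
  </w:tblStylePr>
</w:style>""".strip())
red_header = ET.fromstring(f"""
<w:style xmlns:w="{W}" w:type="table" w:styleId="LightListRedHeader">
  <w:tblStylePr w:type="firstRow">
    <w:tcPr><w:shd w:val="clear" w:color="auto" w:fill="FF0000"/></w:tcPr>
  </w:tblStylePr>
  <w:tblStylePr w:type="firstCol">
    <w:rPr><w:b/><w:bCs/></w:rPr>
  </w:tblStylePr>
</w:style>""".strip())

rolled = roll_up_conditional(light_list, red_header)
first_row = rolled["firstRow"]
fill = first_row.find(f"{w('tcPr')}/{w('shd')}").get(w("fill"))
print(sorted(rolled), fill)
```

    In this sketch the first-row bold setting is inherited from the base, the fill comes from the derived style, and the firstCol entry is added because no base entry has that w:type.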

    Summary of Style Inheritance Semantics

    The table at the end of this post summarizes the semantics that we must apply when 'rolling-up' styles.

    A fair number of elements in the style hierarchy exist solely for the user interface or other purposes.  We are only interested in rolling up those elements that impact presentation, so I'm eliminating elements that don't apply.  A few elements (name and basedOn) are used in the rolling-up process, so I am listing those.

    Note that this is only part of the story around putting together the style information for a cell in a table.  After rolling up styles in a style chain into a single set of properties for a table, we must also roll up character formatting information, which involves rolling up run formatting information for the table, for paragraph styles, and for run styles.  Before rolling any of this up, we need to take the global run properties into consideration.  And when rolling up this information over the hierarchy (table, paragraph, run), we need to handle something called toggle properties.  Finally, where appropriate, we must retrieve information from the theme of the document.  Stay tuned…


    Element                 Ecma376   Semantics
    ----------------------  --------  ------------------------------------------------------------
    style                   2.7.3.17  Merge child elements
      name                  2.7.3.9   Used when assembling inheritance information
      basedOn               2.7.3.3   Used when assembling inheritance information
      pPr                   2.7.7.2   Merge child elements
      rPr                   2.7.8.1   Merge child elements
      tblPr                 2.7.5.4   Merge child elements
      tblStylePr            2.7.5.6   Merge child elements (Conditional Table Formatting Properties).  See the note about this element above.
      tcPr                  2.7.5.9   Merge child elements
      trPr                  2.7.5.11  Merge child elements

    pPr                     17.7.8.2
      adjustRightInd        2.3.1.1   Replace element
      autoSpaceDE           2.3.1.2   Replace element
      autoSpaceDN           2.3.1.3   Replace element
      bidi                  2.3.1.6   Replace element
      cnfStyle              2.3.1.8   Replace element
      contextualSpacing     2.3.1.9   Replace element
      framePr               2.3.1.11  Replace element
      ind                   2.3.1.12  Merge attributes
      jc                    2.3.1.13  Replace element
      keepLines             2.3.1.14  Replace element
      keepNext              2.3.1.15  Replace element
      kinsoku               2.3.1.16  Replace element
      mirrorIndents         2.3.1.18  Replace element
      numPr                 2.3.1.19  Replace element
      outlineLvl            2.3.1.20  Replace element
      overflowPunct         2.3.1.21  Replace element
      pageBreakBefore       2.3.1.23  Replace element
      pBdr                  2.3.1.24  Merge child elements
      rPr                   2.3.1.29  Merge child elements
      shd                   2.3.1.31  Replace element
      snapToGrid            2.3.1.32  Replace element
      spacing               2.3.1.33  Merge attributes
      suppressAutoHyphens   2.3.1.34  Replace element
      suppressLineNumbers   2.3.1.35  Replace element
      suppressOverlap       2.3.1.36  Replace element
      tabs                  2.3.1.38  Merge child elements
      textAlignment         2.3.1.39  Replace element
      textboxTightWrap      2.3.1.40  Replace element
      textDirection         2.3.1.41  Replace element
      topLinePunct          2.3.1.43  Replace element
      widowControl          2.3.1.44  Replace element
      wordWrap              2.3.1.45  Replace element

    rPr                     2.7.8.1
      b                     2.3.2.1   Replace element
      bCs                   2.3.2.2   Replace element
      bdr                   2.3.2.3   Replace element
      caps                  2.3.2.4   Replace element
      color                 2.3.2.5   Replace element
      cs                    2.3.2.6   Replace element
      dstrike               2.3.2.7   Replace element
      eastAsianLayout       2.3.2.8   Replace element
      effect                2.3.2.9   Replace element
      em                    2.3.2.10  Replace element
      emboss                2.3.2.11  Replace element
      fitText               2.3.2.12  Replace element
      highlight             2.3.2.13  Replace element
      i                     2.3.2.14  Replace element
      iCs                   2.3.2.15  Replace element
      imprint               2.3.2.16  Replace element
      kern                  2.3.2.17  Replace element
      lang                  2.3.2.18  Merge attributes
      oMath                 2.3.2.20  Replace element
      outline               2.3.2.21  Replace element
      position              2.3.2.22  Replace element
      rFonts                2.3.2.24  Replace element
      rtl                   2.3.2.28  Replace element
      shadow                2.3.2.29  Replace element
      shd                   2.3.2.30  Replace element
      smallCaps             2.3.2.31  Replace element
      snapToGrid            2.3.2.32  Replace element
      spacing               2.3.2.33  Replace element
      specVanish            2.3.2.34  Replace element
      strike                2.3.2.35  Replace element
      sz                    2.3.2.36  Replace element
      szCs                  2.3.2.37  Replace element
      u                     2.3.2.38  Replace element
      vanish                2.3.2.39  Replace element
      vertAlign             2.3.2.40  Replace element
      w                     2.3.2.41  Replace element
      webHidden             2.3.2.42  Replace element

    tblPr
      bidiVisual            2.4.1     Replace element
      jc                    2.4.23    Replace element
      shd                   2.4.35    Replace element
      tblBorders            2.4.38    Merge child elements
      tblCellMar            2.4.39    Merge child elements
      tblCellSpacing        2.4.43    Replace element
      tblInd                2.4.48    Replace element
      tblLayout             2.4.49    Replace element
      tblLook               2.4.51    Replace element
      tblOverlap            2.4.53    Replace element
      tblpPr                2.4.54    Replace element
      tblStyleColBandSize   2.7.5.5   Replace element
      tblStyleRowBandSize   2.7.5.7   Replace element
      tblW                  2.4.61    Replace element

    tblStylePr
      pPr                   2.7.5.1   Merge child elements
      rPr                   2.7.5.2   Merge child elements
      tblPr                 2.7.5.3   Merge child elements
      tcPr                  2.7.5.9   Merge child elements
      trPr                  2.7.5.10  Merge child elements

    tcPr
      hideMark              2.4.15    Replace element
      noWrap                2.4.28    Replace element
      shd                   2.4.33    Replace element
      tcBorders             2.4.63    Merge child elements
      tcFitText             2.4.64    Replace element
      tcMar                 2.4.65    Merge child elements
      tcW                   2.4.68    Replace element
      textDirection         2.4.69    Replace element
      vAlign                2.4.80    Replace element

    trPr
      cantSplit             2.4.6     Replace element
      gridAfter             2.4.10    Replace element
      gridBefore            2.4.11    Replace element
      hidden                2.4.14    Replace element
      jc                    2.4.22    Replace element
      tblCellSpacing        2.4.42    Replace element
      tblHeader             2.4.46    Replace element
      trHeight              2.4.77    Replace element
      wAfter                2.4.82    Replace element
      wBefore               2.4.83    Replace element

     
