Inserting / Deleting / Moving Paragraphs in Open XML Wordprocessing Documents

One of the most common scenarios for Open XML is programmatically adding, deleting, and moving paragraphs in a word processing document.  A variation on this is moving or copying paragraphs from one document to another.  This programming task is complicated by the need to keep other parts of the document in sync with the data stored in paragraphs.  For example, a paragraph can contain a reference to a comment in the comments part, and if there is a problem with this reference, the document is invalid.  You must take care when moving / inserting / deleting paragraphs to maintain ‘referential integrity’ within the document.  If you are making a tool to manipulate paragraphs, then this post lists some of the constraints that you must pay attention to.

This blog is inactive.
New blog: EricWhite.com/blog

(Update Feb 6, 2009 - The code to move/insert/delete paragraphs has been completed.  This post introduces the code, and tells where to download the code from the PowerTools for Open XML project.)

(Update March 24, 2009 - This post was updated with details of more cases of interrelated markup.)

As an example, if a comment ID is duplicated elsewhere in the document, or if there is no comment with that ID in the comments part, the document is invalid.  If a paragraph refers to a style that isn’t in the styles part, the document will not render as expected.

There are two types of markup that we need to pay attention to when moving paragraphs - those where markup spans multiple paragraphs, such as bookmarks or hyperlinks, and those where the paragraph contains a reference to something outside of the paragraph, such as a footnote or an image.  In some cases, such as comments, we must deal with both types of markup - comment markup can span paragraphs, and comments have an external reference to the comments part.

I have a goal of augmenting the PowerTools for Open XML to enable more sophisticated document modification tasks, such as using a document as a source of ‘boilerplate’ information, moving paragraphs from the template document to other documents as required.  In addition, I want to make it easier to add or delete paragraphs using PowerShell.  To implement this, I need to have a strategy for maintaining the integrity of documents.  The information presented in this post is the first step in putting this together.

(Update March 23, 2009 - PowerTools for Open XML v1.1 have been released.  This version of the PowerTools contains two new cmdlets, Merge-OpenXmlDocument and Select-OpenXmlString, which enable composition of a new document from existing documents, while addressing the issues of interrelated markup as detailed in this post.)

The list presented in this post probably isn’t complete – I’ll update this list with new items as necessary.

Comments

A paragraph that contains a comment must reference a valid, existing comment.  The comment w:id attributes must be unique.  In the following example, there must not be another comment that has id == “0”.

<w:p>
  <w:r>
    <w:t xml:space="preserve">On the Insert tab, the </w:t>
  </w:r>
  <w:commentRangeStart w:id="0"/>
  <w:r>
    <w:t xml:space="preserve">galleries </w:t>
  </w:r>
  <w:commentRangeEnd w:id="0"/>
  <w:r>
    <w:rPr>
      <w:rStyle w:val="CommentReference"/>
    </w:rPr>
    <w:commentReference w:id="0"/>
  </w:r>
  <w:r>
    <w:t>include items that are designed to coordinate with the overall look of your document.</w:t>
  </w:r>
</w:p>
 

In addition, as shown in the following example, the commentRangeStart element may be in a different paragraph from the commentRangeEnd element.  If, for example, you delete the paragraph that contains the commentRangeStart, but the commentReference and commentRangeEnd elements still exist, then the document isn’t valid.  This doesn’t prevent Word 2007 from opening the document, but we should fix up these elements if deleting or moving paragraphs.

<w:p>
  <w:r>
    <w:t xml:space="preserve">On the Insert tab, the galleries include items that are designed to coordinate with the overall look of your </w:t>
  </w:r>
  <w:commentRangeStart w:id="0"/>
  <w:r>
    <w:t>document.</w:t>
  </w:r>
</w:p>
<w:p>
  <w:r>
    <w:t xml:space="preserve">You </w:t>
  </w:r>
  <w:commentRangeEnd w:id="0"/>
  <w:r>
    <w:rPr>
      <w:rStyle w:val="CommentReference"/>
    </w:rPr>
    <w:commentReference w:id="0"/>
  </w:r>
  <w:r>
    <w:t>can use these galleries to insert tables, headers, footers, lists, cover pages, and other document building blocks.</w:t>
  </w:r>
</w:p>
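To make these rules concrete, here is a minimal Python sketch (standard library only) that checks a main document part against its comments part.  It assumes both parts have already been read out of the package as XML strings, and the function name `check_comments` is hypothetical, not part of the PowerTools or the Open XML SDK.  It reports duplicate comment IDs, comment references with no matching comment, and comment ranges that are missing their start or end marker.

```python
import xml.etree.ElementTree as ET
from collections import Counter

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _ids(root: ET.Element, local_name: str) -> list[str]:
    """Return the w:id values of every w:<local_name> element under root."""
    values = (e.get(f"{{{W}}}id") for e in root.iter(f"{{{W}}}{local_name}"))
    return [v for v in values if v is not None]


def check_comments(document_xml: str, comments_xml: str) -> list[str]:
    """Hypothetical helper: list comment-related referential-integrity problems."""
    doc = ET.fromstring(document_xml)
    comments = ET.fromstring(comments_xml)
    problems = []

    # Each w:comment in the comments part must have a unique w:id.
    defined = Counter(_ids(comments, "comment"))
    for cid, count in sorted(defined.items()):
        if count > 1:
            problems.append(f"comment id {cid} defined {count} times")

    # Every w:commentReference must point at an existing comment.
    for cid in sorted(set(_ids(doc, "commentReference")) - set(defined)):
        problems.append(f"commentReference {cid} has no comment")

    # Range markers may sit in different paragraphs, but both must be present.
    starts = set(_ids(doc, "commentRangeStart"))
    ends = set(_ids(doc, "commentRangeEnd"))
    for cid in sorted(starts ^ ends):
        problems.append(f"comment range {cid} is missing its start or end")
    return problems
```

The check only compares IDs.  When paragraphs move between documents, the comments themselves also have to be copied and renumbered so that they don't collide with IDs already in the destination.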
 

Styles

A style reference must point to a style that exists in the styles part.  If you copy a paragraph from one document to another, and the new document doesn’t contain a style with the specified style ID (“Heading1” in the following example), then the document will not render as you expect.

In my experiments, if you have a paragraph that refers to a non-existent style, Word 2007 still opens the document, and the paragraph reverts to the default style.  However, this isn’t the behavior that we want.  Typically, when moving a paragraph from one document to another, you would want the paragraph to take on the formatting of either the source document or the destination document.  Alternatively, you could create a new style with a different name, so that moved paragraphs retain their styling.

<w:p>
  <w:pPr>
    <w:pStyle w:val="Heading1"/>
  </w:pPr>
  <w:r>
    <w:t>Overview</w:t>
  </w:r>
</w:p>
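A quick way to detect this before copying is to compare the style IDs that a fragment uses with the styles defined in the destination’s styles part.  The following Python sketch does that with ElementTree.  `missing_styles` is a hypothetical name, and the sketch only looks at direct w:pStyle, w:rStyle, and w:tblStyle references; it doesn’t follow w:basedOn chains inside the styles part.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def missing_styles(fragment_xml: str, styles_xml: str) -> set[str]:
    """Hypothetical helper: style IDs referenced in fragment_xml but not
    defined (as w:style/@w:styleId) in styles_xml."""
    fragment = ET.fromstring(fragment_xml)
    styles = ET.fromstring(styles_xml)
    defined = {s.get(f"{{{W}}}styleId") for s in styles.iter(f"{{{W}}}style")}
    used = set()
    for tag in ("pStyle", "rStyle", "tblStyle"):  # paragraph, run, table styles
        used |= {e.get(f"{{{W}}}val") for e in fragment.iter(f"{{{W}}}{tag}")}
    return used - defined
```

Whatever the function returns is the set of styles you would either copy from the source styles part or map to an existing style in the destination.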
 

Font Tables

Each entry in the font table part describes a font, and these entries are used for font substitution if the named font is not available.  Every font used in the document should appear in this table, but this is not a requirement for a valid document.  Fonts are most commonly referenced from styles, but can also be referenced from paragraphs or text runs.

<w:fonts xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:font w:name="Times New Roman">
    <w:panose1 w:val="02020603050405020304"/>
    <w:charset w:val="00"/>
    <w:family w:val="roman"/>
    <w:pitch w:val="variable"/>
    <w:sig w:usb0="20002A87" w:usb1="80000000" w:usb2="00000008" w:usb3="00000000" w:csb0="000001FF" w:csb1="00000000"/>
  </w:font>
  <w:font w:name="Courier New">
    <w:panose1 w:val="02070309020205020404"/>
    <w:charset w:val="00"/>
    <w:family w:val="modern"/>
    <w:pitch w:val="fixed"/>
    <w:sig w:usb0="20002A87" w:usb1="80000000" w:usb2="00000008" w:usb3="00000000" w:csb0="000001FF" w:csb1="00000000"/>
  </w:font>
</w:fonts>

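Because missing entries don’t make the document invalid, checking the font table is more of a fidelity step than an integrity step.  A minimal sketch, assuming the parts are available as XML strings, that collects the font names given in w:rFonts attributes and reports those with no w:font entry in the font table.  Theme font attributes such as w:asciiTheme are ignored, and `fonts_not_in_table` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Explicit font-name attributes on w:rFonts; theme attributes are not handled.
FONT_ATTRS = ("ascii", "hAnsi", "eastAsia", "cs")


def fonts_not_in_table(parts_xml: list[str], font_table_xml: str) -> set[str]:
    """Hypothetical helper: fonts named in w:rFonts in any of parts_xml
    (e.g. the main document and styles parts) but absent from the font table."""
    table = ET.fromstring(font_table_xml)
    listed = {f.get(f"{{{W}}}name") for f in table.iter(f"{{{W}}}font")}
    used = set()
    for xml in parts_xml:
        for rfonts in ET.fromstring(xml).iter(f"{{{W}}}rFonts"):
            for attr in FONT_ATTRS:
                name = rfonts.get(f"{{{W}}}{attr}")
                if name:
                    used.add(name)
    return used - listed
```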
Bookmarks

Bookmarks can span paragraphs.  We should maintain the pairing of bookmarkStart and bookmarkEnd elements.  Neglecting to do so will not make the document invalid, but the bookmark will be lost.

<w:p>
  <w:r>
    <w:t>Check the doc</w:t>
  </w:r>
  <w:bookmarkStart w:id="0"
                   w:name="Book1"/>
  <w:r>
    <w:t>ument.</w:t>
  </w:r>
</w:p>
<w:p>
  <w:r>
    <w:t>You</w:t>
  </w:r>
  <w:bookmarkEnd w:id="0"/>
  <w:r>
    <w:t>should check.</w:t>
  </w:r>
</w:p>
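When you copy or move a contiguous group of paragraphs, any bookmark whose start is inside the group but whose end is outside (or vice versa) will be broken.  The Python sketch below reports those bookmark IDs for a list of paragraph elements.  `unpaired_bookmarks` is a hypothetical name, and what to do with the result (drop the orphaned marker, or add the missing one at the edge of the group) is left to the caller.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ID = f"{{{W}}}id"


def unpaired_bookmarks(paragraphs: list[ET.Element]) -> dict[str, set[str]]:
    """Hypothetical helper: bookmark IDs in a group of paragraphs whose
    start or end marker lies outside the group."""
    starts, ends = set(), set()
    for para in paragraphs:
        starts |= {e.get(ID) for e in para.iter(f"{{{W}}}bookmarkStart")}
        ends |= {e.get(ID) for e in para.iter(f"{{{W}}}bookmarkEnd")}
    return {"missing_end": starts - ends, "missing_start": ends - starts}
```

For the markup above, selecting only the first paragraph reports bookmark 0 under `missing_end`; selecting both paragraphs reports nothing.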
 

Hyperlinks

Hyperlinks can span paragraphs.  If you move a paragraph without fixing up the markup for hyperlinks, then you will have hyperlinks that don’t have the correct appearance or behavior.

The following shows the markup for a hyperlink:

<w:body>
  <w:p>
    <w:pPr>
      <w:rPr>
        <w:rStyle w:val="Hyperlink"/>
      </w:rPr>
    </w:pPr>
    <w:r>
      <w:t xml:space="preserve">On the Insert tab, </w:t>
    </w:r>
    <w:r>
      <w:fldChar w:fldCharType="begin"/>
    </w:r>
    <w:r>
      <w:instrText xml:space="preserve"> HYPERLINK "https://blogs.msdn.com/ericwhite" </w:instrText>
    </w:r>
    <w:r>
      <w:fldChar w:fldCharType="separate"/>
    </w:r>
    <w:r>
      <w:rPr>
        <w:rStyle w:val="Hyperlink"/>
      </w:rPr>
      <w:t>document.</w:t>
    </w:r>
  </w:p>
  <w:p>
    <w:r>
      <w:rPr>
        <w:rStyle w:val="Hyperlink"/>
      </w:rPr>
      <w:t>You</w:t>
    </w:r>
    <w:r>
      <w:fldChar w:fldCharType="end"/>
    </w:r>
    <w:r>
      <w:t xml:space="preserve"> can use.</w:t>
    </w:r>
  </w:p>
</w:body>

Hyperlinks may also reference an external relationship.  For this type of hyperlink, the relationship identified by r:id (stored with the main document part’s relationships) contains the actual link target, even though the body text may appear to define the hyperlink.

<w:p w:rsidR="00CA7872" w:rsidRDefault="00CA7872" w:rsidP="00E50D37">
  <w:pPr>
    <w:ind w:left="720"/>
  </w:pPr>
  <w:r>
    <w:t xml:space="preserve">You can download PowerShell from </w:t>
  </w:r>
  <w:hyperlink r:id="rId8" w:history="1">
    <w:r w:rsidR="00A03F7B" w:rsidRPr="00764227">
      <w:rPr>
        <w:rStyle w:val="Hyperlink"/>
      </w:rPr>
      <w:t>http://www.microsoft.com/windowsserver2003/technologies/management/powershell/download.mspx</w:t>
    </w:r>
  </w:hyperlink>
</w:p>
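For the external-relationship form, copying the w:hyperlink element isn’t enough: the destination document needs a relationship with the same target.  The following Python sketch resolves each w:hyperlink r:id against the main document part’s relationships (word/_rels/document.xml.rels), which is the information you would need to recreate the relationship in the destination.  `hyperlink_targets` is a hypothetical name; the field-code form shown earlier carries its URL in w:instrText and isn’t covered.

```python
import xml.etree.ElementTree as ET
from typing import Optional

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG = "http://schemas.openxmlformats.org/package/2006/relationships"


def hyperlink_targets(document_xml: str, rels_xml: str) -> dict[str, Optional[str]]:
    """Hypothetical helper: map each w:hyperlink r:id to the Target of the
    matching relationship (None if the relationship is missing)."""
    rels = {rel.get("Id"): rel.get("Target")
            for rel in ET.fromstring(rels_xml).iter(f"{{{PKG}}}Relationship")}
    links = {}
    for link in ET.fromstring(document_xml).iter(f"{{{W}}}hyperlink"):
        rid = link.get(f"{{{R}}}id")
        if rid is not None:  # internal links use w:anchor instead of r:id
            links[rid] = rels.get(rid)
    return links
```

A `None` value is exactly the broken-reference case this post is about: the paragraph came over, but its relationship didn’t.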
 

Permissions

It is unlikely that you will want to programmatically add / delete / move paragraphs in a document that has been carefully designed to have some regions as editable, and others as not.

My opinion: it’s reasonable to disallow manipulation of paragraphs that have editing permissions applied to portions of the paragraph.  Note that this doesn't hinder moving paragraphs in a document that has editing permissions applied, so long as the entire paragraph is either editable or not editable.

Here is the markup for permissions that cross paragraph boundaries:

<w:p>
  <w:permStart w:id="0"
               w:edGrp="everyone"/>
  <w:r>
    <w:t>Paragraph one.</w:t>
  </w:r>
</w:p>
<w:p>
  <w:r>
    <w:t xml:space="preserve">Paragraph </w:t>
  </w:r>
  <w:permEnd w:id="0"/>
  <w:r>
    <w:t>two.</w:t>
  </w:r>
</w:p>
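If you adopt that restriction, a tool needs a way to find the affected paragraphs.  Here is a deliberately conservative Python sketch: it flags any top-level paragraph that contains a w:permStart or w:permEnd marker, on the assumption that such a paragraph may have permissions applied to only part of it.  Paragraphs lying entirely inside a permission range contain neither marker and are not flagged.  The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraphs_with_perm_markers(body: ET.Element) -> list[int]:
    """Hypothetical helper: indexes of the body's direct w:p children that
    contain a w:permStart or w:permEnd marker anywhere inside them."""
    flagged = []
    for index, para in enumerate(body.findall(f"{{{W}}}p")):
        has_start = para.find(f".//{{{W}}}permStart") is not None
        has_end = para.find(f".//{{{W}}}permEnd") is not None
        if has_start or has_end:
            flagged.append(index)
    return flagged
```

This over-reports a paragraph whose markers happen to sit at its very beginning and end, which is the safe direction for a tool that refuses to move flagged paragraphs.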

 

Content Controls

Content controls are either contained within a single paragraph or contain one or more paragraphs, so moving, adding, or deleting them doesn’t cause problems, with one exception: data-bound content controls contain markup that ties them to the custom XML.  If you move such markup from one document to another, you need to make sure that you assemble all of the plumbing for data-bound content controls.

It would be perfectly valid and reasonable to do document generation / assembly for a document that contained custom schema, permissions, and data-bound content controls.  However, this scenario doesn’t have the same issues associated with moving / inserting / deleting paragraphs.

I think it would be reasonable in a programmability tool to restrict moving paragraphs that contain data-bound content controls from one document to another.

For a detailed explanation of data-bound content controls, see Creating Data-Bound Content Controls using the Open XML SDK and LINQ to XML.

A content control with properly set-up data binding looks like this:

<w:sdt>
  <w:sdtPr>
    <w:alias w:val="Name"/>
    <w:tag w:val="Name"/>
    <w:id w:val="13264407"/>
    <w:placeholder>
      <w:docPart w:val="DefaultPlaceholder_22675703"/>
    </w:placeholder>
    <w:dataBinding
      w:xpath="/Root/Name"
      w:storeItemID="{F351E99C-3283-4B75-927A-A56C9FD3BFFC}"/>
    <w:text/>
  </w:sdtPr>
  <w:sdtContent>
    <w:tc>
      <w:tcPr>
        <w:tcW w:w="4410"
               w:type="dxa"/>
      </w:tcPr>
      <w:p>
        <w:r>
          <w:t>Eric White</w:t>
        </w:r>
      </w:p>
    </w:tc>
  </w:sdtContent>
</w:sdt>
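If a tool does restrict moving data-bound content controls between documents, it first has to find them.  A small Python sketch that lists the w:storeItemID and w:xpath of each data-bound w:sdt; the storeItemID is what identifies the custom XML part that would also have to travel with the control.  `data_bound_controls` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def data_bound_controls(document_xml: str) -> list[tuple[str, str]]:
    """Hypothetical helper: (storeItemID, xpath) for every w:sdt whose
    properties contain a w:dataBinding element."""
    found = []
    for sdt in ET.fromstring(document_xml).iter(f"{{{W}}}sdt"):
        binding = sdt.find(f"{{{W}}}sdtPr/{{{W}}}dataBinding")
        if binding is not None:
            found.append((binding.get(f"{{{W}}}storeItemID"),
                          binding.get(f"{{{W}}}xpath")))
    return found
```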

Images

An image embedded in a paragraph contains a reference to the part that contains the image (in the following example, the r:embed attribute of the a:blip element).  If we move a paragraph that contains an image to another document, we need to move the image part and update this reference.

<w:p>
  <w:r>
    <w:t>On the Insert tab, the galleries include items that are designed to coordinate with the overall look of your document.</w:t>
  </w:r>
</w:p>
<w:p>
  <w:r>
    <w:rPr>
      <w:noProof />
    </w:rPr>
    <w:drawing>
      <wp:inline distT="0" distB="0" distL="0" distR="0">
        <wp:extent cx="390525" cy="400050" />
        <wp:effectExtent l="19050" t="0" r="9525" b="0" />
        <wp:docPr id="1" name="Picture 1" />
        <wp:cNvGraphicFramePr>
          <a:graphicFrameLocks xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" noChangeAspect="1" />
        </wp:cNvGraphicFramePr>
        <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
          <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
            <pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
              <pic:nvPicPr>
                <pic:cNvPr id="0" name="Picture 1" />
                <pic:cNvPicPr>
                  <a:picLocks noChangeAspect="1" noChangeArrowheads="1" />
                </pic:cNvPicPr>
              </pic:nvPicPr>
              <pic:blipFill>
                <a:blip r:embed="rId4" />
                <a:srcRect />
                <a:stretch>
                  <a:fillRect />
                </a:stretch>
              </pic:blipFill>
              <pic:spPr bwMode="auto">
                <a:xfrm>
                  <a:off x="0" y="0" />
                  <a:ext cx="390525" cy="400050" />
                </a:xfrm>
                <a:prstGeom prst="rect">
                  <a:avLst />
                </a:prstGeom>
                <a:noFill />
                <a:ln w="9525">
                  <a:noFill />
                  <a:miter lim="800000" />
                  <a:headEnd />
                  <a:tailEnd />
                </a:ln>
              </pic:spPr>
            </pic:pic>
          </a:graphicData>
        </a:graphic>
      </wp:inline>
    </w:drawing>
  </w:r>
</w:p>
<w:p>
  <w:r>
    <w:t>You can use these galleries to insert tables, headers, footers, lists, cover pages, and other document building blocks.</w:t>
  </w:r>
</w:p>
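The relationship ID is the thing to fix up.  After the image part is copied into the destination package it will usually get a new relationship ID there, and every r:embed in the moved paragraph has to be rewritten to match.  The Python sketch below shows both halves on an ElementTree element: collecting the IDs, and remapping them.  The function names and the old-to-new mapping are hypothetical; actually adding the part and its relationship to the destination package is not shown.  Linked (rather than embedded) pictures use r:link, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
EMBED = f"{{{R}}}embed"


def image_rel_ids(paragraph: ET.Element) -> list[str]:
    """Hypothetical helper: r:embed IDs of every a:blip (embedded picture)."""
    return [blip.get(EMBED) for blip in paragraph.iter(f"{{{A}}}blip")]


def remap_image_rel_ids(paragraph: ET.Element, mapping: dict[str, str]) -> None:
    """Rewrite r:embed values in place, e.g. {"rId4": "rId12"} once the image
    part has been added to the destination under a new relationship ID."""
    for blip in paragraph.iter(f"{{{A}}}blip"):
        old = blip.get(EMBED)
        if old in mapping:
            blip.set(EMBED, mapping[old])
```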
 

Shapes

The “shape” element may contain an “imagedata” element that references an image part.

<w:p w:rsidR="004B2AB8" w:rsidRDefault="004B2AB8">
  <w:pPr>
    <w:pStyle w:val="Footer"/>
    <w:jc w:val="right"/>
  </w:pPr>
  <w:r>
    <w:object w:dxaOrig="2266" w:dyaOrig="361">
      <v:shapetype id="_x0000_t75" coordsize="21600,21600" o:spt="75" o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe" filled="f" stroked="f">
        <v:stroke joinstyle="miter"/>
        <v:formulas>
          <v:f eqn="if lineDrawn pixelLineWidth 0"/>
          <v:f eqn="sum @0 1 0"/>
          <v:f eqn="sum 0 0 @1"/>
          <v:f eqn="prod @2 1 2"/>
          <v:f eqn="prod @3 21600 pixelWidth"/>
          <v:f eqn="prod @3 21600 pixelHeight"/>
          <v:f eqn="sum @0 0 1"/>
          <v:f eqn="prod @6 1 2"/>
          <v:f eqn="prod @7 21600 pixelWidth"/>
          <v:f eqn="sum @8 21600 0"/>
          <v:f eqn="prod @7 21600 pixelHeight"/>
          <v:f eqn="sum @10 21600 0"/>
        </v:formulas>
        <v:path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"/>
        <o:lock v:ext="edit" aspectratio="t"/>
      </v:shapetype>
      <v:shape id="_x0000_i1026" type="#_x0000_t75" style="width:113.25pt;height:18pt" o:ole="" fillcolor="window">
        <v:imagedata r:id="rId1" o:title=""/>
      </v:shape>
      <o:OLEObject Type="Embed" ProgID="Word.Picture.8" ShapeID="_x0000_i1026" DrawAspect="Content" ObjectID="_1293685530" r:id="rId2"/>
    </w:object>
  </w:r>
</w:p>
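This legacy VML markup actually carries two relationship IDs: r:id on v:imagedata (the image part) and r:id on o:OLEObject (the embedded object part), and both parts must come along when the paragraph is moved to another document.  A minimal Python sketch that collects them; `vml_rel_ids` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
V = "urn:schemas-microsoft-com:vml"
O = "urn:schemas-microsoft-com:office:office"
RID = f"{{{R}}}id"


def vml_rel_ids(paragraph: ET.Element) -> dict[str, list[str]]:
    """Hypothetical helper: relationship IDs used by VML image data and by
    embedded OLE objects inside a paragraph."""
    return {
        "imagedata": [e.get(RID) for e in paragraph.iter(f"{{{V}}}imagedata")],
        "ole_objects": [e.get(RID) for e in paragraph.iter(f"{{{O}}}OLEObject")],
    }
```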
 

Diagrams

A diagram has references to four other parts: DiagramData, DiagramLayoutDefinition, DiagramStyle, and DiagramColors.  Each is referenced by a different attribute of the dgm:relIds element (r:dm, r:lo, r:qs, and r:cs, respectively), as shown below:

<w:p w:rsidR="004509E4" w:rsidRDefault="004509E4">
  <w:r>
    <w:rPr>
      <w:noProof/>
      <w:lang w:eastAsia="ja-JP"/>
    </w:rPr>
    <w:drawing>
      <wp:inline distT="0" distB="0" distL="0" distR="0">
        <wp:extent cx="5400675" cy="495300"/>
        <wp:effectExtent l="76200" t="19050" r="47625" b="19050"/>
        <wp:docPr id="1" name="Diagram 1"/>
        <wp:cNvGraphicFramePr/>
        <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
          <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/diagram">
            <dgm:relIds xmlns:dgm="http://schemas.openxmlformats.org/drawingml/2006/diagram" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" r:dm="rId8" r:lo="rId9" r:qs="rId10" r:cs="rId11"/>
          </a:graphicData>
        </a:graphic>
      </wp:inline>
    </w:drawing>
  </w:r>
</w:p>
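Since each attribute names a different kind of part, a tool copying a diagram has to carry over all four.  The following Python sketch maps the attributes of each dgm:relIds element to the part names used above; the function name and the lookup table are illustrative assumptions, not part of any SDK.

```python
import xml.etree.ElementTree as ET

R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
DGM = "http://schemas.openxmlformats.org/drawingml/2006/diagram"

# dgm:relIds attribute (in the r: namespace) -> the part it points to.
DIAGRAM_PARTS = {
    "dm": "DiagramData",
    "lo": "DiagramLayoutDefinition",
    "qs": "DiagramStyle",
    "cs": "DiagramColors",
}


def diagram_parts(paragraph: ET.Element) -> list[dict[str, str]]:
    """Hypothetical helper: for each diagram, map part name -> relationship ID."""
    return [
        {part: rel_ids.get(f"{{{R}}}{attr}") for attr, part in DIAGRAM_PARTS.items()}
        for rel_ids in paragraph.iter(f"{{{DGM}}}relIds")
    ]
```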

 

Headers and Footers

Header and Footer parts and references work exactly the same, except for the differences in element names. A header or footer may be referenced from a section property:

<w:p w:rsidR="00E07D9D" w:rsidRDefault="007C3987" w:rsidP="00A75ABF">
  <w:pPr>
    <w:pStyle w:val="ListParagraph"/>
    <w:rPr>
      <w:i/>
    </w:rPr>
    <w:sectPr w:rsidR="00E07D9D" w:rsidSect="007A5AEF">
      <w:headerReference w:type="default" r:id="rId11"/>
      <w:pgSz w:w="12240" w:h="15840"/>
      <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/>
      <w:cols w:space="720"/>
      <w:docGrid w:linePitch="360"/>
    </w:sectPr>
  </w:pPr>
  <w:r>
    <w:rPr>
      <w:i/>
    </w:rPr>
    <w:t>Before Break</w:t>
  </w:r>
</w:p>
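When the paragraph being moved carries a w:sectPr, its header and footer references travel with it, so the referenced parts (and their own relationships) must be copied too.  A minimal Python sketch that lists every such reference, with its w:type, across all section properties in a part; `header_footer_refs` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def header_footer_refs(part_xml: str) -> list[tuple[str, str, str]]:
    """Hypothetical helper: (element name, w:type, r:id) for each header or
    footer reference in every w:sectPr, in document order."""
    refs = []
    for sect_pr in ET.fromstring(part_xml).iter(f"{{{W}}}sectPr"):
        for child in sect_pr:
            name = child.tag.rsplit("}", 1)[-1]
            if name in ("headerReference", "footerReference"):
                refs.append((name, child.get(f"{{{W}}}type"), child.get(f"{{{R}}}id")))
    return refs
```

Both the paragraph-level w:sectPr shown above and the final body-level w:sectPr are picked up, because `iter` walks the whole part.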
 

The Header and Footer parts may also have references to other parts containing embedded objects, images or shapes:

<w:hdr xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006"
       xmlns:o="urn:schemas-microsoft-com:office:office"
       xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
       xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
       xmlns:v="urn:schemas-microsoft-com:vml"
       xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
       xmlns:w10="urn:schemas-microsoft-com:office:word"
       xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
       xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml">
  <w:p w:rsidR="00600C29" w:rsidRDefault="00600C29">
    <w:pPr>
      <w:pStyle w:val="Header"/>
    </w:pPr>
    <w:r>
      <w:rPr>
        <w:noProof/>
        <w:lang w:eastAsia="en-US"/>
      </w:rPr>
      <w:drawing>
        <wp:inline distT="0" distB="0" distL="0" distR="0">
          <wp:extent cx="6580533" cy="1152939"/>
          <wp:effectExtent l="19050" t="0" r="0" b="0"/>
          <wp:docPr id="6" name="Picture 2" descr="Head ECMA"/>
          <wp:cNvGraphicFramePr>
            <a:graphicFrameLocks xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" noChangeAspect="1"/>
          </wp:cNvGraphicFramePr>
          <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
            <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">
              <pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
                <pic:nvPicPr>
                  <pic:cNvPr id="0" name="Picture 2" descr="Head ECMA"/>
                  <pic:cNvPicPr>
                    <a:picLocks noChangeAspect="1" noChangeArrowheads="1"/>
                  </pic:cNvPicPr>
                </pic:nvPicPr>
                <pic:blipFill>
                  <a:blip r:embed="rId1"/>
                  <a:srcRect/>
                  <a:stretch>
                    <a:fillRect/>
                  </a:stretch>
                </pic:blipFill>
                <pic:spPr bwMode="auto">
                  <a:xfrm>
                    <a:off x="0" y="0"/>
                    <a:ext cx="6580533" cy="1152939"/>
                  </a:xfrm>
                  <a:prstGeom prst="rect">
                    <a:avLst/>
                  </a:prstGeom>
                  <a:noFill/>
                  <a:ln w="9525">
                    <a:noFill/>
                    <a:miter lim="800000"/>
                    <a:headEnd/>
                    <a:tailEnd/>
                  </a:ln>
                </pic:spPr>
              </pic:pic>
            </a:graphicData>
          </a:graphic>
        </wp:inline>
      </w:drawing>
    </w:r>
  </w:p>
</w:hdr>
 

This example header contains a reference to an Image part.

Footnote References

Footnote references contain an ID that must match a footnote in the footnotes part:

<w:p>
  <w:r>
    <w:t xml:space="preserve">On the Insert </w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:rStyle w:val="FootnoteReference"/>
    </w:rPr>
    <w:footnoteReference w:id="2"/>
  </w:r>
  <w:r>
    <w:t>tab, the galleries include items that are designed to coordinate with the overall look of your document.</w:t>
  </w:r>
</w:p>

Endnote References

Endnote references contain an ID that must match an endnote in the endnotes part:

<w:p>
  <w:r>
    <w:t xml:space="preserve">On the Insert </w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:rStyle w:val="EndnoteReference"/>
    </w:rPr>
    <w:endnoteReference w:id="2"/>
  </w:r>
  <w:r>
    <w:t>tab, the galleries include items that are designed to coordinate with the overall look of your document.</w:t>
  </w:r>
</w:p>
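Footnotes and endnotes can be checked the same way: collect the IDs used by the reference elements in the main document and compare them with the notes defined in the footnotes or endnotes part.  A minimal Python sketch with a hypothetical function name; it only reports dangling references, not extra notes in the notes part.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def dangling_note_refs(document_xml: str, notes_xml: str,
                       kind: str = "footnote") -> set[str]:
    """Hypothetical helper: IDs of w:footnoteReference (or w:endnoteReference
    when kind="endnote") with no matching w:footnote / w:endnote in notes_xml."""
    key = f"{{{W}}}id"
    notes = ET.fromstring(notes_xml)
    defined = {n.get(key) for n in notes.iter(f"{{{W}}}{kind}")}
    doc = ET.fromstring(document_xml)
    used = {r.get(key) for r in doc.iter(f"{{{W}}}{kind}Reference")}
    return used - defined
```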

 

Charts

The “chart” element references a separate part that contains the elements defining the chart as shown below:

<w:p w:rsidR="00CD68BC" w:rsidRDefault="00CD68BC">
  <w:r>
    <w:rPr>
      <w:noProof/>
    </w:rPr>
    <w:drawing>
      <wp:inline distT="0" distB="0" distL="0" distR="0">
        <wp:extent cx="5486400" cy="3200400"/>
        <wp:effectExtent l="19050" t="0" r="19050" b="0"/>
        <wp:docPr id="1" name="Chart 1"/>
        <wp:cNvGraphicFramePr/>
        <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
          <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/chart">
            <c:chart xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" r:id="rId4"/>
          </a:graphicData>
        </a:graphic>
      </wp:inline>
    </w:drawing>
  </w:r>
</w:p>
 

The chart part will also contain a reference to an embedded object containing the data used by the chart:

<c:chartSpace xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <c:date1904 val="1"/>
  <c:lang val="en-US"/>
  <c:chart>
    ...
  </c:chart>
  <c:externalData r:id="rId1"/>
</c:chartSpace>
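Images, shapes, diagrams, charts, external hyperlinks, and header/footer references all point at other parts through attributes in the relationships namespace (r:embed, r:id, r:dm, and so on).  So a generic first pass, sketched below in Python, is to collect every such attribute in the fragment being moved.  This only finds direct references: the chart part above has its own relationship (c:externalData) that must be followed recursively, which the sketch doesn’t do.  The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def relationship_refs(fragment: ET.Element) -> list[tuple[str, str, str]]:
    """Hypothetical helper: (element local name, attribute local name, rel ID)
    for every attribute in the relationships namespace, in document order."""
    prefix = f"{{{R}}}"
    refs = []
    for element in fragment.iter():
        for name, value in element.attrib.items():
            if name.startswith(prefix):
                local = element.tag.rsplit("}", 1)[-1]
                refs.append((local, name[len(prefix):], value))
    return refs
```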
 

Lists

Lists contain identifiers that have a scope beyond a paragraph.  Moving items in a list is not a mainstream scenario, but should be done properly.

<w:p>
  <w:r>
    <w:t>Abc</w:t>
  </w:r>
</w:p>
<w:p>
  <w:pPr>
    <w:pStyle w:val="ListParagraph" />
    <w:numPr>
      <w:ilvl w:val="0" />
      <w:numId w:val="1" />
    </w:numPr>
  </w:pPr>
  <w:r>
    <w:t>2123</w:t>
  </w:r>
</w:p>
<w:p>
  <w:pPr>
    <w:pStyle w:val="ListParagraph" />
    <w:numPr>
      <w:ilvl w:val="0" />
      <w:numId w:val="1" />
    </w:numPr>
  </w:pPr>
  <w:r>
    <w:t>43243</w:t>
  </w:r>
</w:p>
<w:p>
  <w:pPr>
    <w:pStyle w:val="ListParagraph" />
    <w:numPr>
      <w:ilvl w:val="0" />
      <w:numId w:val="1" />
    </w:numPr>
  </w:pPr>
  <w:r>
    <w:t>4343</w:t>
  </w:r>
</w:p>

 

Numbering

Numbered paragraphs and lists also contain a reference to a particular ID in the NumberingDefinitions part:

<w:p w:rsidR="00207933" w:rsidRDefault="00817809" w:rsidP="00E50D37">
  <w:pPr>
    <w:pStyle w:val="ListParagraph"/>
    <w:numPr>
      <w:ilvl w:val="0"/>
      <w:numId w:val="6"/>
    </w:numPr>
  </w:pPr>
  <w:r>
    <w:t>Make sure you have administration rights on your PC.</w:t>
  </w:r>
</w:p>
 

Each numbering definition (w:num) in turn references an abstract numbering definition (w:abstractNum) that appears in the same part:

<w:numbering xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006"
             xmlns:o="urn:schemas-microsoft-com:office:office"
             xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
             xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
             xmlns:v="urn:schemas-microsoft-com:vml"
             xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
             xmlns:w10="urn:schemas-microsoft-com:office:word"
             xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
             xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml">
  <w:abstractNum w:abstractNumId="4">
    <w:nsid w:val="5E3A32EC"/>
    <w:multiLevelType w:val="hybridMultilevel"/>
    <w:tmpl w:val="E8583004"/>
    <w:lvl w:ilvl="0" w:tplc="0409000F">
      <w:start w:val="1"/>
      <w:numFmt w:val="decimal"/>
      <w:lvlText w:val="%1."/>
      <w:lvlJc w:val="left"/>
      <w:pPr>
        <w:ind w:left="720" w:hanging="360"/>
      </w:pPr>
      <w:rPr>
        <w:rFonts w:hint="default"/>
      </w:rPr>
    </w:lvl>
    ...
  </w:abstractNum>
  <w:num w:numId="6">
    <w:abstractNumId w:val="4"/>
  </w:num>
</w:numbering>
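So moving a numbered paragraph means following two hops: the w:numId in the paragraph leads to a w:num, and that w:num’s w:abstractNumId leads to a w:abstractNum.  Both definitions have to be copied, and possibly renumbered if their IDs are already used in the destination.  A Python sketch of the lookup, with a hypothetical function name:

```python
import xml.etree.ElementTree as ET
from typing import Optional

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def abstract_num_for(numbering_xml: str, num_id: str) -> Optional[ET.Element]:
    """Hypothetical helper: follow w:num/@w:numId -> w:abstractNumId/@w:val
    -> w:abstractNum/@w:abstractNumId; None if any hop is missing."""
    root = ET.fromstring(numbering_xml)
    for num in root.findall(f"{{{W}}}num"):
        if num.get(f"{{{W}}}numId") != num_id:
            continue
        link = num.find(f"{{{W}}}abstractNumId")
        if link is None:
            return None
        target = link.get(f"{{{W}}}val")
        for abstract in root.findall(f"{{{W}}}abstractNum"):
            if abstract.get(f"{{{W}}}abstractNumId") == target:
                return abstract
        return None
    return None
```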
 

DocumentSettings

The DocumentSettings part is referenced from the MainDocument part through a relationship rather than from the document markup.  The settings can contain references into other parts (in the following example, footnote and endnote IDs), and those parts need to be copied if the DocumentSettings part is copied.

<w:settings xmlns:o="urn:schemas-microsoft-com:office:office"
            xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
            xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
            xmlns:v="urn:schemas-microsoft-com:vml"
            xmlns:w10="urn:schemas-microsoft-com:office:word"
            xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
            xmlns:sl="http://schemas.openxmlformats.org/schemaLibrary/2006/main">
  <w:footnotePr>
    <w:footnote w:id="0"/>
    <w:footnote w:id="1"/>
  </w:footnotePr>
  <w:endnotePr>
    <w:endnote w:id="0"/>
    <w:endnote w:id="1"/>
  </w:endnotePr>
  ...
</w:settings>
 

Other Parts

The document also has CoreFileProperties, ExtendedFileProperties, CustomFileProperties, and WebSettings parts.  These parts are not required for a valid document, but copying them will often make the new document match the old one more closely.  These parts have no references to other parts.

Remaining Interrelated Markup Issues

There are issues associated with themes and interrelated markup.  This blog post doesn’t yet contain this information.  Also, as we identify other interrelated markup, I’ll update this blog post with details.

  • Following PDC 2008 and the Open XML workshop given by Microsoft in Redmond (Doug, once again a thousand apologies

  • Also... headers, footers, footnotes, document variables, images, lists

  • Thanks, Sten - I'll update this.

    -Eric

  • Sten,

    I've updated the post with the information on footnotes, images, and lists.  However, please excuse my ignorance :)  I'm not clear on what you are referring to with headers, footers, and document variables.  How are these features represented in markup that has a reference to something outside of the context of a single paragraph?

    Thanks, Eric

  • Document variables (accessible in Word 2003 through File|Properties); if copied to another document the text would come over, but the variable reference would be invalid or lost.

    The "w:instr" attribute contains a reference to settings.xml:

    <w:fldSimple w:instr=" DOCVARIABLE UvarTypeReport \* Upper \* MERGEFORMAT ">
      <w:r w:rsidR="00AD0F9A">
        <w:rPr>
          <w:sz w:val="24"/>
        </w:rPr>
        <w:t>COMPLETE</w:t>
      </w:r>
    </w:fldSimple>

    Headers and footers are outside document.xml: each is a part of its own and is referenced using w:headerReference or w:footerReference.  Beyond the reference itself, the w:type attribute affects what is shown when the document is loaded in Word.  In the sample below, the header rId7 is not rendered in Word and rId8 is:

    <w:p w:rsidR="00AD0F9A" w:rsidRDefault="00AD0F9A">
      <w:pPr>
        <w:sectPr w:rsidR="00AD0F9A">
          <w:headerReference w:type="default" r:id="rId7"/>
          <w:headerReference w:type="first" r:id="rId8"/>
          <w:footerReference w:type="first" r:id="rId9"/>
          <w:pgSz w:w="12240" w:h="15840" w:code="1"/>
          <w:pgMar w:top="4896" w:right="1800" w:bottom="720" w:left="864" w:header="720" w:footer="576" w:gutter="0"/>
          <w:paperSrc w:first="15" w:other="15"/>
          <w:pgNumType w:start="1"/>
          <w:cols w:space="720"/>
          <w:titlePg/>
        </w:sectPr>
      </w:pPr>
    </w:p>

  • Zeyad Rajabi has started a series of very useful hands-on posts over on Brian Jones's blog about working

  • Style references are present in runs as well:

    w:r/w:rPr/w:rStyle[@w:val='StyleName']

    I haven't verified it yet, but since there are table styles they are probably referenced from within tables

  • DocumentBuilder is an example class that’s part of the PowerTools for Open XML project that enables you

  • Hi Eric,

    I have Open XML paragraph markup like this:

    <w:p w:rsidR="00A60A58" w:rsidRPr="00810B68" w:rsidRDefault="00A60A58" w:rsidP="00017828">
      <w:pPr>
        <w:pStyle w:val="equation" />
      </w:pPr>
      <w:r w:rsidRPr="00810B68">
        <w:tab/>
      </w:r>
      <w:r w:rsidRPr="00810B68">
        <w:rPr>
          <w:position w:val="-24" />
        </w:rPr>
        <w:object w:dxaOrig="2380" w:dyaOrig="560">
          <v:shape id="_x0000_i1073" type="#_x0000_t75" style="width:118.4pt;height:27.45pt" o:ole="">
            <v:imagedata r:id="rId103" o:title="" />
          </v:shape>
          <o:OLEObject Type="Embed" ProgID="Equation.DSMT4" ShapeID="_x0000_i1073" DrawAspect="Content" ObjectID="_1439405570" r:id="rId104" />
        </w:object>
      </w:r>
      <w:r w:rsidRPr="00810B68">
        <w:tab/>
      </w:r>
      <w:r w:rsidR="006F3ADF">
        <w:fldChar w:fldCharType="begin" />
      </w:r>
      <w:r w:rsidR="00782DAD">
        <w:instrText xml:space="preserve">MACROBUTTON MTPlaceRef \* MERGEFORMAT</w:instrText>
      </w:r>
      <w:r w:rsidR="006F3ADF">
        <w:fldChar w:fldCharType="begin" />
      </w:r>
      <w:r w:rsidR="009E6C23">
        <w:instrText xml:space="preserve">SEQ MTEqn \h \* MERGEFORMAT</w:instrText>
      </w:r>
      <w:r w:rsidR="006F3ADF">
        <w:fldChar w:fldCharType="end" />
      </w:r>
      <w:bookmarkStart w:id="5" w:name="ZEqnNum679168" />
      <w:r w:rsidR="00782DAD">
        <w:instrText>(3.</w:instrText>
      </w:r>
      <w:fldSimple w:instr=" SEQ MTEqn \c \* Arabic \* MERGEFORMAT ">
        <w:r w:rsidR="00410805">
          <w:rPr>
            <w:noProof/>
          </w:rPr>
          <w:instrText>3</w:instrText>
        </w:r>
      </w:fldSimple>
      <w:r w:rsidR="00782DAD">
        <w:instrText>)</w:instrText>
      </w:r>
      <w:bookmarkEnd w:id="5" />
      <w:r w:rsidR="006F3ADF">
        <w:fldChar w:fldCharType="end" />
      </w:r>
    </w:p>

    and the innertext is

    "Note that the disagreement error  is not available to node i unless it is pinned to the leader node (that is,), whereas the local neighborhood tracking error  GOTOBUTTON ZEqnNum679168  \* MERGEFORMAT  REF ZEqnNum679168 \* Charformat \! \* MERGEFORMAT (3.3) is known to each node i."

    But I need only this text from the InnerText:

    Note that the disagreement error  is not available to node i unless it is pinned to the leader node (that is,), whereas the local neighborhood tracking error (3.3) is known to each node i.

    which is similar to the text displayed in Microsoft Word.

    Please help me with this.

    Thank you.

  • Hi Mohamed,

    Looking at your markup, I think maybe you got a different paragraph than the one with the inner text you show; the paragraph markup you show has an embedded OLE object (a MathType equation) in it.  But in any case, the answer to your question is that InnerText is not helpful to you when you want to retrieve the actual text of a paragraph.  Instead, what you want to do is to find all descendant w:t elements, and concatenate their text values.  This will give you the actual text of the paragraph.  One exception is that if your paragraph contains tracked revisions, you will first want to accept tracked revisions, and then concatenate the w:t elements.  You can use the RevisionAccepter class of PowerTools for Open XML to do this.  See http://powertools.codeplex.com.

    Cheers, Eric
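Eric's suggestion (concatenate the descendant w:t elements instead of using InnerText) can be sketched as follows. The thread is about C# and the Open XML SDK, where the equivalent is roughly `string.Concat(paragraph.Descendants<Text>().Select(t => t.Text))`; the sketch below uses Python's standard library only so that it runs without the SDK. `paragraph_text` and `inner_text` are illustrative helper names, and `inner_text` only approximates the SDK's InnerText property by joining every text node. The sample paragraph is made up, but follows the field markup shown in this thread.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in the markup in this thread).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS


def paragraph_text(p: ET.Element) -> str:
    """Concatenate the values of all descendant w:t elements, in document order."""
    return "".join(t.text or "" for t in p.iter(W + "t"))


def inner_text(p: ET.Element) -> str:
    """Rough stand-in for InnerText: every text node, field codes included."""
    return "".join(p.itertext())


# Hypothetical sample paragraph with one REF field whose result is a w:t run.
SAMPLE = (
    f'<w:p xmlns:w="{W_NS}">'
    '<w:r><w:t xml:space="preserve">tracking error </w:t></w:r>'
    '<w:r><w:fldChar w:fldCharType="begin"/></w:r>'
    '<w:r><w:instrText xml:space="preserve"> REF ZEqnNum679168 </w:instrText></w:r>'
    '<w:r><w:fldChar w:fldCharType="separate"/></w:r>'
    '<w:r><w:t>(3.3)</w:t></w:r>'
    '<w:r><w:fldChar w:fldCharType="end"/></w:r>'
    '<w:r><w:t xml:space="preserve"> is known</w:t></w:r>'
    '</w:p>'
)

p = ET.fromstring(SAMPLE)
print(paragraph_text(p))  # tracking error (3.3) is known
print(inner_text(p))      # tracking error  REF ZEqnNum679168 (3.3) is known
```

This relies on the field result being stored in w:t runs, which is how Word normally writes it, and it does not accept tracked revisions; as Eric notes, accept them first.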

  • Hi Eric,

    Sorry, I posted the wrong one.

    This is the relevant markup:

    <w:p w:rsidR="00FA19E4" w:rsidRDefault="00FA19E4" w:rsidP="00CD12D6">

       <w:r>

           <w:t xml:space="preserve">Note that the disagreement error</w:t>

       </w:r>

       <w:r w:rsidRPr="00810B68">

           <w:rPr>

               <w:position w:val="-10" /></w:rPr>

           <w:object w:dxaOrig="859" w:dyaOrig="279">

               <v:shape id="_x0000_i1101" type="#_x0000_t75" style="width:41.7pt;height:15.15pt" o:ole="">

                   <v:imagedata r:id="rId151" o:title="" /></v:shape>

               <o:OLEObject Type="Embed" ProgID="Equation.DSMT4" ShapeID="_x0000_i1101" DrawAspect="Content" ObjectID="_1439405598" r:id="rId159" />

           </w:object>

       </w:r>

       <w:r>

           <w:t xml:space="preserve">is not available to node</w:t>

       </w:r>

       <w:r w:rsidRPr="00FA19E4">

           <w:rPr>

               <w:i/></w:rPr>

           <w:t>i</w:t>

       </w:r>

       <w:r>

           <w:t xml:space="preserve">unless it is pinned to the leader node</w:t>

       </w:r>

       <w:r w:rsidR="00C50F07">

           <w:t xml:space="preserve">(that is,</w:t>

       </w:r>

       <w:r w:rsidR="00C50F07" w:rsidRPr="00C50F07">

           <w:rPr>

               <w:position w:val="-12" /></w:rPr>

           <w:object w:dxaOrig="639" w:dyaOrig="360">

               <v:shape id="_x0000_i1102" type="#_x0000_t75" style="width:27.45pt;height:15.15pt" o:ole="">

                   <v:imagedata r:id="rId160" o:title="" /></v:shape>

               <o:OLEObject Type="Embed" ProgID="Equation.DSMT4" ShapeID="_x0000_i1102" DrawAspect="Content" ObjectID="_1439405599" r:id="rId161" />

           </w:object>

       </w:r>

       <w:r w:rsidR="00C50F07">

           <w:t>)</w:t>

       </w:r>

       <w:r>

           <w:t>, whereas the local neighborhood trac</w:t>

       </w:r>

       <w:r>

           <w:t>k</w:t>

       </w:r>

       <w:r>

           <w:t xml:space="preserve">ing error</w:t>

       </w:r>

       <w:r w:rsidR="006F3ADF">

           <w:fldChar w:fldCharType="begin" /></w:r>

       <w:r>

           <w:instrText xml:space="preserve">GOTOBUTTON ZEqnNum679168 \* MERGEFORMAT</w:instrText>

       </w:r>

       <w:r w:rsidR="006F3ADF">

           <w:fldChar w:fldCharType="begin" /></w:r>

       <w:r w:rsidR="00954790">

           <w:instrText xml:space="preserve">REF ZEqnNum679168 \* Charformat \! \* MERGEFORMAT</w:instrText>

       </w:r>

       <w:r w:rsidR="006F3ADF">

           <w:fldChar w:fldCharType="separate" /></w:r>

       <w:r w:rsidR="00410805">

           <w:instrText>(3.3)</w:instrText>

       </w:r>

       <w:r w:rsidR="006F3ADF">

           <w:fldChar w:fldCharType="end" /></w:r>

       <w:r w:rsidR="006F3ADF">

           <w:fldChar w:fldCharType="end" /></w:r>

       <w:r>

           <w:t xml:space="preserve">is known to each node</w:t>

       </w:r>

       <w:r w:rsidRPr="00FA19E4">

           <w:rPr>

               <w:i/></w:rPr>

           <w:t>i</w:t>

       </w:r>

       <w:r>

           <w:t>.</w:t>

       </w:r>

    </w:p>

  • (I split the post into two because of its length.)

    And the InnerText is:

    "Note that the disagreement error  is not available to node i unless it is pinned to the leader node (that is,), whereas the local neighborhood tracking error  GOTOBUTTON ZEqnNum679168  \* MERGEFORMAT  REF ZEqnNum679168 \* Charformat \! \* MERGEFORMAT (3.3) is known to each node i."

    But I need only the following text, which is what the Microsoft Word application displays:

    Note that the disagreement error  is not available to node i unless it is pinned to the leader node (that is,), whereas the local neighborhood tracking error (3.3) is known to each node i.

    Note: the text in "<w:instrText>(3.3)</w:instrText>" is also required.

    How can I exclude these field codes and the other text that is not required? I also tried using regular expressions, without success.

    Please help me with this.

    Thank you.

  • Hi Mohamed,

    What language are you using?  C#?  VB.NET?  XSLT?  If C# or VB.NET, which language version, and which version of the .NET Framework?

    I'll need that information in order to help...

    Cheers, Eric

  • Hi Eric,

    I'm using C# with .NET Framework 4 and Open XML SDK 2.0.

    Thanks.
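For the field-code question in this thread, one common approach is to walk the paragraph in document order and track complex fields (w:fldChar with w:fldCharType "begin", "separate", "end"). Text between begin and separate is the field instruction and is skipped; text between separate and end is the cached field result and is kept. Fields can nest, as in Mohamed's paragraph, where a REF field sits inside the GOTOBUTTON instruction, so the sketch keeps a stack and decides by the innermost open field. Because the markup posted here stores the REF result "(3.3)" in w:instrText rather than the usual w:t, the sketch also keeps w:instrText that appears in a result part; that is an accommodation for this markup, not standard behavior. It uses Python's standard library so it runs stand-alone; `displayed_text` is a hypothetical helper, not part of the Open XML SDK or PowerTools. In C# with Open XML SDK 2.0, the same loop could run over `paragraph.Descendants()`, testing for the `FieldChar`, `FieldCode` (w:instrText) and `Text` classes.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS


def displayed_text(p: ET.Element) -> str:
    """Approximate the text Word displays for a w:p that contains complex fields.

    One stack entry per open field: "instr" until its separate, then "result".
    Text is kept only when there is no open field or the innermost one is
    in its result part.
    """
    stack = []
    parts = []
    for el in p.iter():  # document order
        if el.tag == W + "fldChar":
            kind = el.get(W + "fldCharType")
            if kind == "begin":
                stack.append("instr")
            elif kind == "separate" and stack:
                stack[-1] = "result"
            elif kind == "end" and stack:
                stack.pop()
        elif el.tag == W + "t":
            if not stack or stack[-1] == "result":
                parts.append(el.text or "")
        elif el.tag == W + "instrText":
            # Assumption: this thread's markup stores the field result "(3.3)"
            # in w:instrText after the separator, so keep it there only.
            if stack and stack[-1] == "result":
                parts.append(el.text or "")
    return "".join(parts)


# Hypothetical paragraph modeled on the nested GOTOBUTTON / REF field above.
NESTED = (
    f'<w:p xmlns:w="{W_NS}">'
    '<w:r><w:t xml:space="preserve">tracking error </w:t></w:r>'
    '<w:r><w:fldChar w:fldCharType="begin"/></w:r>'
    '<w:r><w:instrText xml:space="preserve"> GOTOBUTTON ZEqnNum679168 \\* MERGEFORMAT </w:instrText></w:r>'
    '<w:r><w:fldChar w:fldCharType="begin"/></w:r>'
    '<w:r><w:instrText xml:space="preserve"> REF ZEqnNum679168 \\* Charformat \\! \\* MERGEFORMAT </w:instrText></w:r>'
    '<w:r><w:fldChar w:fldCharType="separate"/></w:r>'
    '<w:r><w:instrText>(3.3)</w:instrText></w:r>'
    '<w:r><w:fldChar w:fldCharType="end"/></w:r>'
    '<w:r><w:fldChar w:fldCharType="end"/></w:r>'
    '<w:r><w:t xml:space="preserve"> is known to each node</w:t></w:r>'
    '</w:p>'
)

print(displayed_text(ET.fromstring(NESTED)))  # tracking error (3.3) is known to each node
```

The sketch assumes tracked revisions have already been accepted (for example with RevisionAccepter) and that the cached field results are current; it does not evaluate fields, and other nested field combinations may need different rules.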
