
The Microsoft Office Word Team's Blog

All things Microsoft Office Word, from the Word team.
Separate Yet Equal

Alright, here we are at the start of a new year - my first as a married individual (very exciting!) - which means it's time for a few more posts to continue my focus on extensibility features in Word.

So far, in my series of posts on content controls, I've tried to describe what they do and why we chose to add them as an important new part of Word 2007. I've done that by focusing on the way they let you control the document editing surface. But there's another part to the controls (which I think is super exciting) that I've only hinted at so far: content controls let you integrate Word documents with data coming from other sources, providing true data/view separation in Word.

The Data behind the Document

Let's look at an example document that some of you might have seen before, a legal contract to sell a piece of property:

It's pretty obvious that the document has a lot of boilerplate text (that's invariant regardless of who the parties are to the contract), and some dynamic text (the important data for a particular contract, if you will). In fact, I've already marked up the dynamic data with content controls so that it's easy to fill out in the document:

So, we can basically think of the resulting document as having two parts:

  • The data
  • The view (the boilerplate that surrounds the data)

That data can be handled independently of the rest of the contract – in fact, we can go and move around the content controls to completely redefine what you see on the page and the "data" of the document would be exactly the same. Now, we (as humans) know what the data of the document is just by looking at it, but how do we make it accessible in a way that's easily interoperable with other tools and code that wants to run on top of these Word documents?

To do that, we can take the data of that contract and put it in XML form, like this:

Now we've got a machine readable form (the XML) that's easily shared with any system that understands XML (and there are plenty of those these days) and that's well suited to automatic processing; and a human readable form (the document including its content controls) that's good for you and me to look at and fill out the contract. All we need is a way to link these two forms together in order to have the best of both worlds: content that is both machine readable and human readable.
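For a concrete picture of what such a data file might contain, here's a minimal sketch in Python. Every element name and value in it (`contract`, `seller`, `buyer`, `propertyAddress`, `purchasePrice`) is hypothetical and made up for illustration. The point is only that the data carries no Word-specific markup and can be read by any XML-aware tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical contract data: all element names and values are invented
# for illustration and are not taken from the original contract.
CONTRACT_XML = """\
<contract>
  <seller>Jane Seller</seller>
  <buyer>John Buyer</buyer>
  <propertyAddress>123 Example Street</propertyAddress>
  <purchasePrice currency="USD">250000</purchasePrice>
</contract>
"""


def contract_fields(xml_text):
    """Return the contract data as a dict of element name -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


fields = contract_fields(CONTRACT_XML)
print(fields["buyer"])
```

Nothing in this file knows about fonts, paragraphs, or pages, which is exactly what makes it easy to hand to other systems.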

In Word 2007, we do that in two steps:

  1. Storing the data in a special space in the document called the XML data store
  2. Mapping the data in that custom XML to content controls in the document.

Custom XML with Office Open XML Documents

First, we load the data into the document in a way that:

  • Doesn't affect the printed page
  • Keeps the data in the exact form we created (the XML we see above)
  • Makes it easily accessible to any tool that consumes custom XML

To do this, we load the XML into an existing Word Open XML document in one of two ways:

  • Using the Word object model – specifically, the Document.CustomXMLParts.Add() method – to pull the XML into the document. The CustomXMLParts collection is the set of all custom XML documents which are being stored with a document (as there can be any number of them – for example, SharePoint properties are also stored this way)
  • Directly manipulating the Office Open XML file format and adding the custom XML as a new distinct part (this is what the Word object model does "under the hood")
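The second option can be sketched in a few lines of Python, using nothing but the standard library to treat the document as a ZIP package. This is a rough sketch under some assumptions, not a complete implementation: it assumes the part name `customXml/item1.xml` is free, that the package's `[Content_Types].xml` already declares a default content type for the `xml` extension, and it skips the companion properties part that Word normally writes next to each custom XML part. The helper name `add_custom_xml_part` is my own.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CUSTOM_XML_REL = ("http://schemas.openxmlformats.org/officeDocument/"
                  "2006/relationships/customXml")
RELS_PATH = "word/_rels/document.xml.rels"
PART_PATH = "customXml/item1.xml"  # assumed unused; a real tool would pick a free name


def add_custom_xml_part(docx_bytes, data_xml):
    """Return a copy of the package with data_xml stored as a custom XML part."""
    ET.register_namespace("", REL_NS)  # keep the relationships file unprefixed
    src = zipfile.ZipFile(io.BytesIO(docx_bytes))
    names = src.namelist()
    if PART_PATH in names:
        raise ValueError("part name already in use: " + PART_PATH)

    # Load (or create) the relationships of the main document part.
    if RELS_PATH in names:
        rels = ET.fromstring(src.read(RELS_PATH))
    else:
        rels = ET.Element("{%s}Relationships" % REL_NS)

    # Pick a relationship Id that is not already taken.
    used_ids = {rel.get("Id") for rel in rels}
    n = 1
    while "rId%d" % n in used_ids:
        n += 1
    ET.SubElement(rels, "{%s}Relationship" % REL_NS, {
        "Id": "rId%d" % n,
        "Type": CUSTOM_XML_REL,
        "Target": "../" + PART_PATH,  # relative to the word/ folder
    })

    # Copy every existing part, then write the updated relationships
    # and the new data part. The data itself is stored untouched.
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in names:
            if name != RELS_PATH:
                dst.writestr(name, src.read(name))
        dst.writestr(RELS_PATH, ET.tostring(rels, encoding="utf-8"))
        dst.writestr(PART_PATH, data_xml)
    return out.getvalue()


def _stand_in_package():
    """Build a bare-bones stand-in package in memory (not a complete .docx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("[Content_Types].xml", "<Types/>")  # placeholder content
        z.writestr("word/document.xml", "<document/>")  # placeholder content
    return buf.getvalue()


updated = add_custom_xml_part(
    _stand_in_package(), "<contract><buyer>John Buyer</buyer></contract>")
print(sorted(zipfile.ZipFile(io.BytesIO(updated)).namelist()))
```

With a real document you would read the .docx bytes from disk instead of calling the stand-in builder, and write the returned bytes back out as a new file.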

What we end up with is a document with a separate storage for our XML data, like this:

The XML still looks exactly the same as when we added it, it just now travels along with the document.

Mapping Content Controls to Custom XML

Finally, we need a way to associate the elements in the data with individual content controls. In Word, this is called XML mapping: we establish a link between a control and an XML element or attribute by supplying one or two pieces of information:

  • An XPath expression which uniquely targets the element we want to map to
  • (optionally) A specific piece of custom XML on which to evaluate the XPath. If this is omitted, then Word will try each available piece of data in turn, until it finds a match.

To do this, we again:

  • Use the Word object model – specifically, the ContentControl.XMLMapping.SetMapping() method – to specify the XPath expression
  • Directly manipulate the Office Open XML file format and add the mappings as a property of any content control's <sdtPr> element
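To make the file-format route a bit more concrete, here's a hedged Python sketch that attaches a mapping to a single content control. In WordprocessingML the mapping lives in a `w:dataBinding` element inside the control's `w:sdtPr`, with the XPath in its `w:xpath` attribute and the target part identified by `w:storeItemID`. The helper name `map_control`, the sample control, and the XPath `/contract/buyer` are all hypothetical. The sketch simply appends the binding as the last child of `w:sdtPr` and does not enforce any particular child order, and since the target part is described above as optional, it leaves `w:storeItemID` off unless you pass one (a strict schema validator may expect it).

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag):
    """Qualify a local name with the WordprocessingML namespace."""
    return "{%s}%s" % (W_NS, tag)


def map_control(sdt, xpath, store_item_id=None):
    """Attach (or replace) the w:dataBinding on one w:sdt element."""
    sdt_pr = sdt.find(w("sdtPr"))
    if sdt_pr is None:
        sdt_pr = ET.Element(w("sdtPr"))
        sdt.insert(0, sdt_pr)
    # A control carries at most one mapping, so drop any existing one.
    for old in sdt_pr.findall(w("dataBinding")):
        sdt_pr.remove(old)
    binding = ET.SubElement(sdt_pr, w("dataBinding"))
    binding.set(w("xpath"), xpath)
    if store_item_id is not None:
        binding.set(w("storeItemID"), store_item_id)
    return binding


# A hypothetical content control as it might appear in word/document.xml.
SDT_XML = """<w:sdt xmlns:w="%s">
  <w:sdtPr><w:alias w:val="Buyer"/></w:sdtPr>
  <w:sdtContent><w:r><w:t>John Buyer</w:t></w:r></w:sdtContent>
</w:sdt>""" % W_NS

sdt = ET.fromstring(SDT_XML)
map_control(sdt, "/contract/buyer")
print(ET.tostring(sdt, encoding="unicode"))
```

Calling `map_control` again on the same control replaces the earlier mapping rather than stacking a second one.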

Now what we have is a document with distinct data and presentation, but lots of links between the two:

Those links give us that "best of both worlds" I talked about – now, the document can be manipulated from either perspective:

  • When the user types into the controls, the corresponding data in the data store is updated in real time (so the custom XML is always live and up to date). This means that finding out the "data" of the document is as simple as pulling out the appropriate XML data store part.
  • When the data is updated inside or outside of Word, the corresponding controls are updated – so the contract that you see can be changed simply by editing the custom XML that lives with the document. That custom XML has no Word-specific information in it, and is therefore extremely easy to read and/or write.
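Here's roughly what "pulling out the appropriate XML data store part" can look like from outside Word, again as a small Python sketch using only the standard library. It assumes the custom XML parts follow the conventional `customXml/itemN.xml` naming; a more careful tool would follow the package relationships instead of matching names. The helper name `read_custom_xml_parts` and the sample data are hypothetical.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# Matches customXml/item1.xml, customXml/item2.xml, ... but not the
# companion itemPropsN.xml parts or the _rels folder.
ITEM_RE = re.compile(r"customXml/item\d+\.xml")


def read_custom_xml_parts(docx_bytes):
    """Return {part name: parsed root element} for each custom XML part."""
    parts = {}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as z:
        for name in z.namelist():
            if ITEM_RE.fullmatch(name):
                parts[name] = ET.fromstring(z.read(name))
    return parts


# Stand-in package built in memory (not a complete .docx) so the
# sketch runs on its own; the contract data is made up.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml", "<document/>")
    z.writestr("customXml/item1.xml",
               "<contract><buyer>John Buyer</buyer></contract>")
    z.writestr("customXml/itemProps1.xml", "<props/>")

parts = read_custom_xml_parts(buf.getvalue())
for name, root in parts.items():
    print(name, root.tag, root.findtext("buyer"))
```

Since the data store holds the XML exactly as it was added, the calling code never has to understand anything about the document body to get at the contract data.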

I know I'm understating it, but if you've ever tried to get data in or out of Word documents, this is a HUGE step forward (along with the Open XML Standard) as it makes getting this information in/out of documents vastly simpler than it was before.

I think I've covered a lot of ground for one post, so I'll stop here. Before I continue with posts that dig into each of the pieces I covered here, some of the other members of the team are going to walk through a real solution they built on top of this architecture, which should be a cool way to understand it better.

- Tristan

Posted: Wednesday, January 10, 2007 2:12 AM by wrdblog

Comments

Stefan KZVB said:

I'm happy you start writing more about XML solutions in Word as I'm evaluating possibilities to migrate existing Word 2000 VBA solutions to XML with Word 2007! Could you please add some dynamically repeating data elements like a list of people or items to your upcoming examples? Is that easily possible with content controls?

Btw unexpectedly I'm not getting on with the problem that the .dotm in Word 2007's start folder is write-protected when you try to save changes to it. It's only write-protected when it's located on a network folder, on C: this works fine. Of course I'm the only one who uses the network folder in my tests. As written I first thought this might be a problem with the server virus scanner so we excluded Word's start folder from scanning. But that did not help. Could you please confirm this issue or do you have any hint how to make this work?

# January 10, 2007 8:09 AM

Brian Jones: Open XML Formats said:

There is an excellent post over on the Word team blog that goes into details on how the new content controls

# January 10, 2007 1:02 PM

Doug Mahugh said:

Tristan Davis has a great post over on the Word team blog about working with content controls and binding

# January 10, 2007 3:58 PM

Mike G said:

Great post - thanks. Like Stefan, I'd also be really interested in finding out how to handle repeating data in the underlying XML using the content controls, if it's possible.

# January 11, 2007 7:38 AM

Rajesh Khatri said:

Hi,

One query please.

I had followed the same approach as you have mentioned.

Set oCustomXMLPart = ActiveDocument.CustomXMLParts.Add

oCustomXMLPart.Load ("c:\MyData.xml")

When I made changes in the document and saved, the changes were not reflected in "c:\MyData.xml". When I rename the .docx to .zip and then unzip it, I can see the file item2.xml with the changes made.

Is it supposed to work like this, or should "c:\MyData.xml" also reflect the changes? If I need this modified XML file for further business processes, how can I get the changes?

Please help me to understand it better.

Regards,

Rajesh Khatri

# January 12, 2007 8:57 AM

Mick said:

excellent post

I too would like to see how far you guys can go with content binding

It's really awesome !!

# January 12, 2007 2:40 PM

Jefta said:

I just recently got hooked on this blog. Excellent posts so far.

Perhaps I missed some earlier posts, but is there also a more user-friendly solution for this? I mean, you do not want to explain XPath and the Word object model to all Word users that need solutions like these, do you?

:)

# January 15, 2007 6:41 AM

Brian Rookard said:

I too have some questions about how you can create a sort of merge document from an XML file.

I can create a merge document using an XSLT stylesheet in OpenOffice.  This allows me to take a raw xml file and apply an XSLT stylesheet to the data so that it will open in OpenOffice in the pre-formatted style according to the stylesheet I opened it with.

So, I guess my question about Microsoft Word is: can you create XSLT templates to effectively create merge documents?

From the questions I see, it seems that what people want to do is to take a large XML file and create a merge document from it (which is what I want to know how to do in Microsoft Word).

# January 15, 2007 1:30 PM

wrdblog said:

Thanks everyone for taking the time to read my post. Let me try to respond to each of the questions I got in the thread:

1 - Stefan and Mike, great question about repeating content. We didn't get to it as a native operation in Word 2007, but there are ways you can do it by manipulating the Open XML file format. Definitely something I'll talk about in the future.

2 - Rajesh, once you load the file into Word, you're correct that we only modify the version of the custom XML that's part of the document. However, you can easily extract that file using the .NET Framework 3.0 System.IO.Packaging API for further processing.

3 - Jefta, we definitely don't see setting this up as an end user task - more of something done by a template creator, at which point the users of the template don't know/care about the fact that there are XML mappings in the file. In terms of tools, VSTOv3 will have native support for this in its UI.

4 - Brian, you can definitely use XSLT (in fact, you can do that in Word 2003 as well), but this model is much cleaner in my opinion. All you have to do is swap out the data and the document automatically updates, so creating the template is literally creating the Word template vs. authoring a potentially complex XSLT (which is much harder for most users).

- Tristan

# January 16, 2007 3:11 PM

The Microsoft Office Word Team's Blog said:

XML Mapping with Content Controls in Word Quick intro: My name is Travis Ratnam. I'm a program manager

# January 16, 2007 5:54 PM

Shar said:

Just want to thank you all for every question and answer. My poor little paper clip fellow is kept scratching his head as I try to learn if that is possible... Thanks again

# January 17, 2007 5:29 PM

Doug Mahugh said:

It's been quite a year for those who have been blogging about the Open XML file formats. Here's a look

# December 30, 2007 9:39 PM

Noticias externas said:

It's been quite a year for those who have been blogging about the Open XML file formats. Here's

# December 30, 2007 10:11 PM

Doug Mahugh said:

This afternoon at TechEd, Zeyad Rajabi demonstrated some of the ways developers can use the Open XML

# June 4, 2008 4:27 PM

Brian Jones: Office Extensibility said:

Happy New Year! I hope everyone had a good holiday. For my first post of the New Year I want to talk

# January 5, 2009 8:10 PM