Intro to Word XML Part 3: Using Your Own Schema

When we built support for customer-defined schemas into Word 2003, there were a couple of scenarios we had in mind. The main goal, though, was to let people take existing Word documents and existing Word solutions and make them more powerful. There has long been support for bookmarks, which have some similarities to the XML support, but that just wasn't enough. The custom XML support finally gave developers the ability to add structure to a Word document and to program against it. Did you know that once you add your XML to a Word document, you can capture events when a user moves into or out of an element? Did you know that you can use XPath queries to navigate the Word document based on your XML, and get a Word Range object as a result? There is a ton of power here, even if your end goal has nothing to do with XML. Of course you can always save out as XML and use that data in some other process, but even if you don't care about getting XML out, the XML support still makes building solutions easier.

Add your XML to a Word document

To try out these examples, you'll need either a version of Word that supports custom XML or the online labs. As I've pointed out, there are a number of uses for the custom XML support in Word. You can use it to bring data into Word, as well as to get data out of Word. You can also use it in an existing document simply to make your solutions easier to build and more robust. Today I just want to give a quick introduction to getting custom XML into Word. In my first introductory post on Word XML, I showed you how to create a simple Word document. Let's take that example and look at how you could add your own XML to it as well. We'll start with the basic XML file:

<?xml version="1.0"?>
<w:wordDocument
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
    <w:body>
        <w:p>
            <w:r>
                <w:rPr>
                    <w:b/>
                </w:rPr>
                <w:t xml:space="preserve">Name: </w:t>
            </w:r>
            <w:r>
                <w:t>Brian Jones</w:t>
            </w:r>
        </w:p>
        <w:p>
            <w:r>
                <w:rPr>
                    <w:b/>
                </w:rPr>
                <w:t xml:space="preserve">Occupation: </w:t>
            </w:r>
            <w:r>
                <w:t>Program Manager</w:t>
            </w:r>
        </w:p>
    </w:body>
</w:wordDocument>

If you open this in Word, you'll see that it's pretty easy to look at the file and understand what's being described. A human can easily see what the name and occupation are. If you were building a solution to pull out that information and do something smart with it, though, it would be a bit harder, because nothing in the markup says which run of text is the name and which is the occupation. That's where the XML support comes in. We can mark this up with our own XML, and our solution can then act on that instead. Let's add a new namespace, "http://jonesxml.com/schemas/example1", to the file and give it the prefix "s:". Our root element should now look like this:

<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:s="http://jonesxml.com/schemas/example1">

The next thing to do is to add the XML from our schema to the file. There are three elements in our schema: employee, name, and occupation. In order to add these to our file, we just put them inline with the Word XML like this:

<?xml version="1.0"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:s="http://jonesxml.com/schemas/example1">
    <w:body>
        <s:employee>
            <w:p>
                <w:r>
                    <w:rPr>
                        <w:b/>
                    </w:rPr>
                    <w:t xml:space="preserve">Name: </w:t>
                </w:r>
                <s:name>
                    <w:r>
                        <w:t>Brian Jones</w:t>
                    </w:r>
                </s:name>
            </w:p>
            <w:p>
                <w:r>
                    <w:rPr>
                        <w:b/>
                    </w:rPr>
                    <w:t xml:space="preserve">Occupation: </w:t>
                </w:r>
                <s:occupation>
                    <w:r>
                        <w:t>Program Manager</w:t>
                    </w:r>
                </s:occupation>
            </w:p>
        </s:employee>
    </w:body>
</w:wordDocument>
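
As a rough illustration of why this markup helps, here is a minimal sketch of how an external program might pull the employee data back out of the file above. It uses Python's standard xml.etree.ElementTree module, which is just a choice for illustration and not something Word requires, and the helper name custom_text is hypothetical. The lookups are keyed on the namespace URIs, so the "w" and "s" prefixes in the NS map only need to be consistent within the script.

```python
import xml.etree.ElementTree as ET

# Prefixes used by the queries below; they map to the same namespace URIs
# as the document, which is what actually matters.
NS = {
    "w": "http://schemas.microsoft.com/office/word/2003/wordml",
    "s": "http://jonesxml.com/schemas/example1",
}

# A compact copy of the marked-up document from the post.
WORDML = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:s="http://jonesxml.com/schemas/example1">
    <w:body>
        <s:employee>
            <w:p>
                <w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">Name: </w:t></w:r>
                <s:name><w:r><w:t>Brian Jones</w:t></w:r></s:name>
            </w:p>
            <w:p>
                <w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">Occupation: </w:t></w:r>
                <s:occupation><w:r><w:t>Program Manager</w:t></w:r></s:occupation>
            </w:p>
        </s:employee>
    </w:body>
</w:wordDocument>"""


def custom_text(root, tag):
    """Return the text of all w:t runs inside the first s:<tag> element."""
    element = root.find(f".//s:{tag}", NS)
    if element is None:
        return None
    return "".join(t.text or "" for t in element.iterfind(".//w:t", NS))


root = ET.fromstring(WORDML)
employee = {tag: custom_text(root, tag) for tag in ("name", "occupation")}
print(employee)
```

The bold "Name: " and "Occupation: " labels are skipped automatically because they sit outside the s:name and s:occupation elements.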

Go ahead and open this file in Word. If you don't have the tag view turned on, you probably won't notice any difference yet. Press "CTRL + Shift + X" and you can toggle the XML tag view on and off. Notice that there are XML tags for all three of your elements. You can bring up the XML structure pane as well by pressing "Shift + F1" and on the task pane selector dropdown select the XML structure pane. You can also specify in your XML file directly whether or not you want the XML tags to show. In the document properties, you can set the showXMLTags element. To specify that they always show when the file is opened, just update your file like this:

<?xml version="1.0"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:s="http://jonesxml.com/schemas/example1">
    <w:docPr>
        <w:showXMLTags w:val="on"/>
    </w:docPr>

    <w:body>
        <s:employee>
            <w:p>
                <w:r>
                    <w:rPr>
                        <w:b/>
                    </w:rPr>
                    <w:t xml:space="preserve">Name: </w:t>
                </w:r>
                <s:name>
                    <w:r>
                        <w:t>Brian Jones</w:t>
                    </w:r>
                </s:name>
            </w:p>
            <w:p>
                <w:r>
                    <w:rPr>
                        <w:b/>
                    </w:rPr>
                    <w:t xml:space="preserve">Occupation: </w:t>
                </w:r>
                <s:occupation>
                    <w:r>
                        <w:t>Program Manager</w:t>
                    </w:r>
                </s:occupation>
            </w:p>
        </s:employee>
    </w:body>
</w:wordDocument>

You can also specify that they should not show up by setting the attribute value to "off".
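
If you generate or post-process these files in code, the same switch can be set programmatically. The following is a sketch only, again using Python's xml.etree.ElementTree; the function name set_show_xml_tags is hypothetical, and placing a newly created w:docPr immediately before w:body is an assumption based on the layout of the example above.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W)  # keep the familiar "w" prefix when serializing


def set_show_xml_tags(root, show):
    """Create or update w:docPr/w:showXMLTags on a w:wordDocument element."""
    doc_pr = root.find(f"{{{W}}}docPr")
    if doc_pr is None:
        doc_pr = ET.Element(f"{{{W}}}docPr")
        children = list(root)
        body = root.find(f"{{{W}}}body")
        # Assumption: w:docPr goes directly before w:body, as in the post.
        index = children.index(body) if body is not None else len(children)
        root.insert(index, doc_pr)
    flag = doc_pr.find(f"{{{W}}}showXMLTags")
    if flag is None:
        flag = ET.SubElement(doc_pr, f"{{{W}}}showXMLTags")
    flag.set(f"{{{W}}}val", "on" if show else "off")
    return root


doc = ET.fromstring(f'<w:wordDocument xmlns:w="{W}"><w:body/></w:wordDocument>')
set_show_xml_tags(doc, True)
print(ET.tostring(doc, encoding="unicode"))
```

Calling set_show_xml_tags(doc, False) afterwards reuses the existing element and simply changes the attribute value to "off".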

Basic use of the object model

OK, one last thing for this post, as it's already getting too long. I wanted to quickly show how you can use the object model to take advantage of the XML structure. Imagine you wanted to write a solution that changed the employee name. It's really simple. Open the file in Word and bring up the VBE (ALT + F11). In the Immediate window (CTRL + G), type the following:

activedocument.SelectSingleNode("//s:name", "xmlns:s='http://jonesxml.com/schemas/example1'").Range.Text = "Your name"

Hit Enter and go back to your document. Notice that the value of the name has now changed. You can move that name element anywhere in your file and this solution will still work fine. In future examples I'll show how you can capture events on these elements to build a pretty rich solution without a ton of code.
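
For comparison, roughly the same edit can be made to the saved file without Word running. This is a sketch under assumptions, not the Word object model: it uses Python's xml.etree.ElementTree, the helper set_node_text is hypothetical, and it only mimics the effect of assigning to Range.Text by rewriting the w:t runs inside the matched element.

```python
import xml.etree.ElementTree as ET

NS = {
    "w": "http://schemas.microsoft.com/office/word/2003/wordml",
    "s": "http://jonesxml.com/schemas/example1",
}

# A trimmed-down version of the document from the post.
DOC = (
    '<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"'
    ' xmlns:s="http://jonesxml.com/schemas/example1">'
    "<w:body><s:employee><w:p>"
    "<s:name><w:r><w:t>Brian Jones</w:t></w:r></s:name>"
    "</w:p></s:employee></w:body></w:wordDocument>"
)


def set_node_text(root, path, new_text):
    """Put new_text in the first w:t run under the matched node, blank the rest."""
    node = root.find(path, NS)
    if node is None:
        raise LookupError(f"no element matches {path!r}")
    runs = node.findall(".//w:t", NS)
    if not runs:
        raise LookupError(f"{path!r} contains no w:t text runs")
    runs[0].text = new_text
    for extra in runs[1:]:
        extra.text = ""
    return node


root = ET.fromstring(DOC)
node = set_node_text(root, ".//s:name", "Your name")
new_name = "".join(t.text or "" for t in node.iterfind(".//w:t", NS))
print(new_name)
```

Because the path starts with ".//", the lookup finds s:name wherever it sits in the document, which mirrors the point above about being free to move the element around.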

-Brian

Published Tuesday, July 26, 2005 2:03 PM by BrianJones

Comments

Wednesday, July 27, 2005 12:26 AM by Eugen Bacic

# re: Intro to Word XML Part 3: Using Your Own Schema

Thanks, Brian! Now you have me waiting for how to capture events on those elements.
Thursday, July 28, 2005 9:54 AM by Dave R

# re: Intro to Word XML Part 3: Using Your Own Schema

Brian,

Sorry in advance, probably not the right place for this comment...
We've been using some other formal XML Editors for content creation. I'm dying to know if Office 12 will build on top of Avalon and compete with the high-end XML editors on the market.
Thursday, July 28, 2005 10:27 AM by Ignace

# re: Intro to Word XML Part 3: Using Your Own Schema

I think it's ALT+F11 for VBE instead of CTRL+F11
Thursday, July 28, 2005 11:38 AM by BrianJones

# re: Intro to Word XML Part 3: Using Your Own Schema

Ignace, you're right. Thanks for the correction. I just updated it to say "ALT + F11"

Dave, what are the scenarios you are interested in? Office 11 already has a good amount of XML support (between Word, InfoPath, FrontPage, and Excel). In Office 12 we continue to make improvements, which we'll talk about more at PDC. It's hard to answer your question about competition, though, since editing XML is a fairly basic, generic thing. It really comes down to what type of XML you are editing and what the scenario is.

-Brian
Thursday, July 28, 2005 12:47 PM by Dave R

# re: Intro to Word XML Part 3: Using Your Own Schema

Brian, ah, you're not a mind-reader {:>}. Sorry to be so vague; I'm working in the Publishing sector - books, manuals, newsletters, etc. I'd like my authors to be able to create structured content (XML) in Word and also be able to create "a" style and layout on top of that or apply a predefined style (probably can do most of this in WordML). I'd also like them to be able to work on say a chapter, a primarily "virtual" xml doc that would allow them to link content from other xml docs e.g. letters, graphics, articles, etc. stored in a common repository. Then of course they might open up a book and add chapter links etc. What do you think?
Thursday, August 18, 2005 12:05 AM by David Giusto

# re: Intro to Word XML Part 3: Using Your Own Schema

Dave - This will address your question
Brian - This will address PtSetton's comments to your post on 7/8, Word XSLT: Data Only Transform

We at Document Management Solutions Inc. (DMSI) have integrated Word with an XML content management system. A CMS for document publishing - not a web CMS. The methodology that we used is just what Brian describes in his reply to Bryan White on 7/11 here:

http://blogs.msdn.com/brian_jones/archive/2005/07/08/436973.aspx

Brian's Quote:
"We designed the XML support so that you could leverage both
WordML and your XML together. If there are features such as
formatting, lists, and tables that Word already supports,
then you don't need to mark that up. Instead you can just take
the subset of your schema that isn't already represented by
Word functionality, and only mark up with that."

We only use the user defined schema for high level structure and for application specific data such as anchors and targets. We can then manage the WordML chunks at a higher level of abstraction. This allows for all the CMS functions that you are familiar with to be applied to a Word editorial environment. These functions include:

Check-Out & Check-In from Microsoft® Word
Version Control & Change Tracking
Document Component Sharing & Reuse
Fragment editing & Concurrent Authoring

Since we are using Word, we also get WYSIWYG XML editing and page composition, two things you cannot get with a traditional XML authoring tool.

To Brian's point, we got it a long time ago, and once you start thinking about Word and XML properly for the context, it is quite powerful. Yes, there are a lot of warts in Word XML, particularly around how lists are handled, but you MUST understand that Word XML is a relational database, not a traditional document XML hierarchy. Let me say that again, and then you should think about it: Word XML is a relational database, not a traditional document XML hierarchy. If you don't understand this point you will not succeed in employing Word XML in any reasonably complex solution. If you find yourself puzzled over this point, just look at w:listPr, w:list, and w:ilfo and say "primary key". The other thing to look at is the w:p. A Word document is a series of non-nested paragraphs. It is as if you ran a SQL query and got back a list of paragraph rows where the columns include style name and content. If you still don't get it, you are probably in over your head here.

While full featured round-trip conversion between the two XML formats (i.e. database and hierarchy) is technically possible it is by no means practical for a Word implementation.

For the skeptics - If you want a demo contact us at http://www.dmsi-world.com
Thursday, August 18, 2005 12:52 AM by David Giusto

# re: Intro to Word XML Part 3: Using Your Own Schema

I guess I need new glasses - In my previous post I got both reference names spelled wrong.
My apologies to Peter Sefton and Bryan Wilhite.
Thursday, November 03, 2005 2:14 PM by Alexander Ryan

# re: Intro to Word XML Part 3: Using Your Own Schema

Great example.

However, whenever I re-save my file in Word, it chooses to rename my namespace prefixes. This causes the program that processes my file to break, as the XPath has now become invalid!

Is there any way to prevent Word from doing this?
Thursday, November 03, 2005 2:18 PM by BrianJones

# re: Intro to Word XML Part 3: Using Your Own Schema

Hey Alexander, the quick answer is no, you cannot control the prefix we use to write out the files. If it's really important to you though, you could always save through an XSLT that takes everything in a particular namespace and forces it to use the prefix you want.

More importantly though, you should never rely on a prefix. When you're programming against the files, you should use the namespace to build up your XPaths, not the prefix. Prefixes can change without affecting the actual meaning of the file at all. A prefix is just shorthand for the actual namespace.
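
To see why the prefix carries no meaning of its own, here is a small sketch in Python's xml.etree.ElementTree (chosen only for illustration). The two fragments are simplified stand-ins for a file as authored and as re-saved; the "ns0" prefix is a made-up example of a renamed prefix. A namespace-aware parser reports the same expanded names for both.

```python
import xml.etree.ElementTree as ET

URI = "http://jonesxml.com/schemas/example1"

# Same content, different prefixes bound to the same namespace URI.
AS_AUTHORED = f'<s:employee xmlns:s="{URI}"><s:name>Brian Jones</s:name></s:employee>'
AS_RESAVED = f'<ns0:employee xmlns:ns0="{URI}"><ns0:name>Brian Jones</ns0:name></ns0:employee>'


def expanded_names(xml_text):
    """List every element name as {namespace-uri}local-name, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


authored = expanded_names(AS_AUTHORED)
resaved = expanded_names(AS_RESAVED)
print(authored)
print(authored == resaved)
```

Any code that matches on these expanded names, or on a prefix that it binds to the URI itself, keeps working no matter which prefix ends up in the file.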

Let me know if that helps and whether you're able to get it working.

-Brian
Thursday, November 03, 2005 4:38 PM by Alexander Ryan

# re: Intro to Word XML Part 3: Using Your Own Schema

Brian,

Thanks for the quick response.

I'm a bit of a newbie to this world and I'm afraid that I've never seen an XPath expression that used namespaces instead of prefixes.

One of my expressions looks like this ...

/w:wordDocument/w:body/wx:sect/u:designOverviewSection/u:designComponent[@number='1']/u:name

and the namespace looks like this ...

xmlns:u="http://www.unisys.com/schemas/3dve/designView"

which Word might change to something like this ...
xmlns:ns1="http://www.unisys.com/schemas/3dve/designView"

By chance, are you saying that I have to write my program to first find out what Word changed my prefix to and then dynamically revise all of my XPath expressions?

Or is there some way to use a namespace instead of a prefix in an XPath expression itself?


-Brian
Thursday, November 03, 2005 5:49 PM by BrianJones

# re: Intro to Word XML Part 3: Using Your Own Schema

How are you using this XPath? Is it in an XSLT, or through the XML DOM, or some other way?
Friday, November 04, 2005 8:35 AM by Alexander Ryan

# re: Intro to Word XML Part 3: Using Your Own Schema

XPath is being used in an external XML processing application which uses DOM (dom4j). The XPath expressions are hard-coded into this program.

I believe that I will have to modify this program to dynamically determine what Word has changed the namespace prefix to and then regenerate the XPath expressions accordingly.

--Alex
Friday, November 04, 2005 12:29 PM by BrianJones

# re: Intro to Word XML Part 3: Using Your Own Schema

Have you tried using the SetProperty method to set your SelectionNamespaces? You should be able to use that for the DOM to specify what you want your prefix to map to. Then it won't matter what prefix is used in the actual XML file; it will only matter what namespace that prefix is mapped to. For example:

oXML.setProperty("SelectionNamespaces", "xmlns:my='myNamespace'")
or something like that...

This is really the right way to deal with XML. You should never view the prefix as having any meaning on its own. It's always the namespace you should be programming against.
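
The same idea carries over to other XML APIs: the query supplies its own prefix-to-namespace mapping, independent of whatever prefix the file happens to use. Below is a sketch in Python's xml.etree.ElementTree rather than MSXML or dom4j. The namespace URI and element names come from the XPath quoted earlier in this thread, while the sample content and the fragment's shape are made up for illustration.

```python
import xml.etree.ElementTree as ET

DESIGN_NS = "http://www.unisys.com/schemas/3dve/designView"

# Hypothetical fragment as it might look after the prefix was renamed to ns1.
RESAVED = f"""<ns1:designOverviewSection xmlns:ns1="{DESIGN_NS}">
    <ns1:designComponent number="1"><ns1:name>First component</ns1:name></ns1:designComponent>
    <ns1:designComponent number="2"><ns1:name>Second component</ns1:name></ns1:designComponent>
</ns1:designOverviewSection>"""

# The query keeps using "u"; only the URI it maps to has to match the file.
QUERY_NS = {"u": DESIGN_NS}

root = ET.fromstring(RESAVED)
match = root.find("u:designComponent[@number='1']/u:name", QUERY_NS)
print(match.text)
```

The hard-coded expression never mentions "ns1", so it does not need to be regenerated when the file's prefix changes.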

-Brian
Friday, December 09, 2005 11:38 AM by Alexander Ryan

# re: Intro to Word XML Part 3: Using Your Own Schema

Brian,

I think that I did not communicate the nature of the problem correctly.

I am using an external program that moves content from one Word document into another Word document. It uses an input file that contains an XPath expression to locate the content in the source document and another XPath expression to pinpoint the location to which the content is to be pasted in the target document. These XPath expressions are actually written into an external XML file and used as input to the process and they "must" use the namespace prefix.

When Word chooses to rename the namespace prefixes that I have chosen to use, it breaks my program. There is no way for me to dynamically update the XPath expressions in my non-WordML input file.

I'd like to suggest that you at least consider reserving some namespace prefixes for use by programmers and not dynamically change these whenever Word documents are saved.

Friday, June 09, 2006 9:30 AM by Developers Interest

# re: Intro to Word XML Part 3: Using Your Own Schema


Hi Brian,

Thanks for the valuable information.

I have a few queries here.

Suppose I need to keep track of the changes made to a Word document by a user. To do this, say I add tags like </s:employee> to all the paragraphs in the document so that the paragraphs can be identified by the tags. But these tags can be removed by the user at any time. Is there some other, more reliable way to do the same?

Also, I came across ids like wsp:rsidRDefault, wsp:rsidR and wsp:rsidP associated with paragraphs which seem to change with every change made to the corresponding paragraph. On what basis do these Ids change?
Friday, August 11, 2006 5:31 AM by Denise

# re: Intro to Word XML Part 3: Using Your Own Schema

When I save as data only, my child elements (of the root) all have blank/null namespaces (xmlns=""). What is the reasoning behind this, and is there a way to have my namespace used in all elements?

<Incident_Report xmlns="http://www.disa.mil/DISA-PAC-PNC-IR">
 <Report_Classification xmlns="" />


Thanks,
Denise

