Intro to Word XML Part 5: Opening custom XML files

I've been talking for a while now about the support for custom-defined schemas in Office. I'm actually going to pull together a post in the next week or so that addresses the uses and motivations behind the XML support we have in Office. We talk about XML a lot, and it should be clear by now that there are a ton of uses. From an Office point of view, there is no such thing as a single "XML editor"; instead, there is a collection of tools that use XML to make their scenarios more powerful. Word can open generic XML, but that doesn't mean it should be used as a generic XML editor. It wouldn't really make sense to open Excel's XML in Word, since SpreadsheetML describes a spreadsheet and would be fairly difficult to edit in a word processor. Of course Word and Excel share a collection of functionality, but that shared functionality is a subset of the larger overall set of functionality in each application. I plan to address this in more detail soon, because I think it's really important to understand when you are exploring the XML functionality and trying to determine which tools best suit your scenarios.

For today, though, let's talk about generic XML editing in Word. You can open any XML file you want in Word, and depending on how you set Word up, you can even teach Word to display your XML in a rich way. In part 3, I showed how you could create a WordprocessingML file that had your own XML in it as well. If you start with an XML file that contains only your own XML, you can create an XSLT that teaches Word how to display it.

Opening an XML file

Let's start with a basic XML file:

<?xml version="1.0"?>
<s:employee xmlns:s="http://jonesxml.com/schemas/example1">
    <s:name>Brian Jones</s:name>
    <s:occupation>Program Manager</s:occupation>
</s:employee>

Try opening that file in Word. You should get a simple text document with your tags showing. This gives the impression that Word can natively open any XML file, but that's not quite what's going on. It's really more similar to what happens when you open an XML file in IE without applying a transform. Word sees that the XML is not in its namespace, so it looks to see whether a transform is specified. If there isn't one, Word falls back to a default XSLT that transforms the XML into WordprocessingML. That transform lives in the Office program folder: c:\Program Files\Microsoft Office\OFFICE11\XML2WORD.XSL
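To make that decision concrete, here is a minimal sketch in Python using the standard library's ElementTree. It is a toy model, not Word's actual code: it only checks whether the root element is in the WordprocessingML namespace and, if not, applies either a transform you name or the fallback XML2WORD.XSL. The function names and return strings are made up for illustration, and Word's real detection does more (the comments below show it also sniffs for HTML).

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Word 2003 WordprocessingML namespace (the same URI the XSLT in this post uses).
WORDML_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
# Fallback stylesheet named in the post; only its file name is used here.
FALLBACK_XSLT = "XML2WORD.XSL"


def root_namespace(xml_text: str) -> str:
    """Return the namespace URI of the root element, or '' if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{namespace-uri}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


def choose_handling(xml_text: str, specified_xslt: Optional[str] = None) -> str:
    """Toy model of the decision described above, not Word's real logic."""
    if root_namespace(xml_text) == WORDML_NS:
        return "open natively"
    # Custom XML: use a transform that was specified, else fall back.
    return f"apply {specified_xslt or FALLBACK_XSLT}"
```

With the employee file above, `choose_handling` returns the fallback, which matches what you see when you open it without choosing a transform.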

Go ahead and open XML2WORD.XSL. You'll see that we map custom XML into a hybrid of WordprocessingML and the custom XML. We apply some indentation based on how deeply the tags are nested, which gives you that tree-view-like appearance. We also specify that the XML tag view should be on, just like we did in the example I posted in part 3 of the intro to Word XML. Also notice that we emit this element: <w:removeWordSchemaOnSave w:val="on" />. That tells Word that when the user hits the Save button, the document should be saved as "data only", which removes the WordprocessingML. That's why you can open any generic XML file, make some edits, and just press Save.
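The "data only" save can be pictured as walking the hybrid document and keeping only the elements that aren't in the WordprocessingML namespace. The Python sketch below is a hypothetical approximation of that effect, not how Word implements removeWordSchemaOnSave: it copies custom elements, drops w: elements, and gives each leaf custom element the text of the w:t runs inside it. To keep it short, it ignores mixed content (text sitting directly inside a custom element that also has custom children).

```python
import xml.etree.ElementTree as ET
from typing import Optional

W = "{http://schemas.microsoft.com/office/word/2003/wordml}"


def is_word(el: ET.Element) -> bool:
    """True if the element is in the WordprocessingML namespace."""
    return el.tag.startswith(W)


def strip_word(el: ET.Element) -> ET.Element:
    """Copy a custom element, discarding the WordprocessingML markup inside it."""
    out = ET.Element(el.tag, el.attrib)
    texts = []

    def walk(node: ET.Element) -> None:
        for child in node:
            if is_word(child):
                # Collect run text, then keep descending through w:p, w:r, ...
                if child.tag == W + "t" and child.text:
                    texts.append(child.text)
                walk(child)
            else:
                # A nested custom element survives the save.
                out.append(strip_word(child))

    walk(el)
    # Simplification: only leaf custom elements keep text (no mixed content).
    if len(out) == 0:
        out.text = "".join(texts)
    return out


def data_only(doc_root: ET.Element) -> Optional[ET.Element]:
    """Return the first custom element in document order, Word markup removed."""
    for el in doc_root.iter():
        if not is_word(el):
            return strip_word(el)
    return None
```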

Now, what I've just described doesn't exactly fit with our goals for the XML support in Word. We weren't trying to make Word into a generic XML editor. Our main goal was to make it much easier for people to build document-based solutions in Word. Word is a document editor, and adding XML support to Word makes the solutions you build easier to create and more powerful. Visual Studio is really a better example of a generic XML editor.

Applying an XSLT to your data

If you want to open your XML data in Word and have it formatted in a richer way than the default XSLT provides (which is probably almost always the case), you can write your own XSLT. Let's say that we want this custom XML to look the same as the file we built in part 3 of the intro to Word XML. We would just need an XSLT that outputs that same WordprocessingML. The XSLT would look something like this:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:s="http://jonesxml.com/schemas/example1">
    <!-- Root template: wrap the output in a WordprocessingML document -->
    <xsl:template match="/">
        <w:wordDocument>
            <w:docPr>
                <!-- Open with the XML tag view turned off -->
                <w:showXMLTags w:val="off" />
            </w:docPr>
            <xsl:apply-templates />
        </w:wordDocument>
    </xsl:template>
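    <!-- Each employee becomes the document body; the s: elements stay
         wrapped around the WordprocessingML so the data structure is kept -->
    <xsl:template match="s:employee">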
        <w:body>
            <s:employee>
                <w:p>
                    <w:r>
                        <w:rPr>
                            <w:b />
                        </w:rPr>
                        <w:t xml:space="preserve">Name: </w:t>
                    </w:r>
                    <s:name>
                        <w:r>
                            <w:t><xsl:value-of select="s:name" /></w:t>
                        </w:r>
                    </s:name>
                </w:p>
                <w:p>
                    <w:r>
                        <w:rPr>
                            <w:b />
                        </w:rPr>
                        <w:t xml:space="preserve">Occupation: </w:t>
                    </w:r>
                    <s:occupation>
                        <w:r>
                            <w:t><xsl:value-of select="s:occupation" /></w:t>
                        </w:r>
                    </s:occupation>
                </w:p>
            </s:employee>
        </w:body>
    </xsl:template>
</xsl:stylesheet>

Save that XSLT file onto your machine and open the custom XML file in Word again. Notice the "XML Document" task pane on the right. You can see that the "Data only" transform was applied, but you can browse for a different one. Choose "Browse..." and find the XSLT file we just created. The XSLT should now be applied, and you should have a file that looks very similar to the one we created the other week. We specified that the XML tag view should be off, but you can turn the tags back on by pressing CTRL+SHIFT+X.

That's a simple example of creating an XSLT. You can now play around with changing properties in the XSLT so that the data is displayed in different ways.
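If it helps to see the mapping in a general-purpose language, the Python sketch below uses ElementTree to build the same hybrid WordprocessingML tree the stylesheet produces. It only illustrates what the XSLT outputs; Word applies the XSLT itself, not code like this. The FIELDS table and the helper names are hypothetical, and the labels are hard-coded just as they are in the stylesheet.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
S = "http://jonesxml.com/schemas/example1"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:space

# Hypothetical table: (custom element, bold label) in the order the XSLT emits them.
FIELDS = [("name", "Name: "), ("occupation", "Occupation: ")]


def w(tag: str) -> str:
    return f"{{{W}}}{tag}"


def s(tag: str) -> str:
    return f"{{{S}}}{tag}"


def add_run(parent: ET.Element, text: str, bold: bool = False) -> ET.Element:
    """Append <w:r>[<w:rPr><w:b/></w:rPr>]<w:t>text</w:t></w:r>; return the w:t."""
    r = ET.SubElement(parent, w("r"))
    if bold:
        ET.SubElement(ET.SubElement(r, w("rPr")), w("b"))
    t = ET.SubElement(r, w("t"))
    t.text = text
    return t


def employee_to_wordml(employee: ET.Element) -> ET.Element:
    """Build the hybrid WordprocessingML + custom XML tree for one s:employee."""
    doc = ET.Element(w("wordDocument"))
    doc_pr = ET.SubElement(doc, w("docPr"))
    ET.SubElement(doc_pr, w("showXMLTags"), {w("val"): "off"})
    body = ET.SubElement(doc, w("body"))
    emp = ET.SubElement(body, s("employee"))
    for field, label in FIELDS:
        p = ET.SubElement(emp, w("p"))
        # Bold label; xml:space="preserve" keeps its trailing space.
        add_run(p, label, bold=True).set(f"{{{XML_NS}}}space", "preserve")
        # Keep the custom element around the value, as the XSLT does.
        wrapper = ET.SubElement(p, s(field))
        src = next((c for c in employee if c.tag == s(field)), None)
        # Like xsl:value-of: the element's string value, or "" if it is missing.
        add_run(wrapper, "".join(src.itertext()) if src is not None else "")
    return doc
```

To look at the result, register the w and s prefixes with ET.register_namespace and serialize the tree with ET.tostring(doc, encoding="unicode").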

-Brian

Published Tuesday, August 16, 2005 8:44 PM by BrianJones

Comments

Thursday, August 18, 2005 6:32 PM by David Giusto

# re: Intro to Word XML Part 5: Opening custom XML

Brian,

The fallback transform c:\Program Files\Microsoft Office\OFFICE11\XML2WORD.XSL
is not used if your XML contains a non-namespaced <body> tag.

The simple case:

<body>
<para>Hello</para>
<para>world</para>
</body>

Yields very different results than:

<xbody>
<para>Hello</para>
<para>world</para>
</xbody>

It turns out that if <body> is anywhere in the XML stream, Word just takes the content (all the text() nodes) up to the closing </body>, but nothing after it. Open this with Word, and where is "MNOP QRST"?

<foo>ABCD
<body>
EFGH
<para>Hello</para>
IJKL
<para>world</para>
</body>
MNOP
<bar>QRST</bar>
</foo>

Now change <body> to <xbody> and try it again.

This is not really an issue unless you grab part of an HTML page or your data model includes <body> - Just keepin' you accurate.

Dave

P.s. Be sure to read my comment on the 7/26 topic:
http://blogs.msdn.com/brian_jones/archive/2005/07/26/443572.aspx#comments

Thursday, August 18, 2005 7:49 PM by BrianJones

# re: Intro to Word XML Part 5: Opening custom XML

Hey Dave, that's very observant. Have you figured out yet why that's happening? It's because Word thinks the file is an HTML file, and not an XML file.

In Word, we don't really pay attention to the file extension. Instead we sniff through the file and see if we can figure out what it is. Take a .doc file and rename it to .xml. It will still open in Word without a problem.

If you add the xml declaration <?xml version="1.0"?> to the top of your example file, then we'll know it's not an HTML document and open it properly. This would also happen if you'd used a namespace for the body tag.
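Here is a toy Python version of that kind of content sniffing, written only to illustrate the idea. The rules (check for the 8-byte signature that starts every OLE compound file, the container binary .doc files use; then for an XML declaration; then for a non-namespaced <body> start tag) are assumptions drawn from this thread, not Word's actual algorithm, and the function name and return values are hypothetical.

```python
import re

# Signature at the start of OLE compound files (the container of binary .doc).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# A non-namespaced <body> start tag (not <xbody>, <x:body> or <bodyguard>).
BODY_TAG = re.compile(r"<body[\s>/]", re.IGNORECASE)


def sniff(data: bytes) -> str:
    """Toy content sniffer: looks at the bytes and ignores the file extension."""
    if data.startswith(OLE_MAGIC):
        return "ole"
    # utf-8-sig drops a leading byte-order mark if there is one.
    text = data.decode("utf-8-sig", errors="replace").lstrip()
    if text.startswith("<?xml"):
        return "xml"  # an XML declaration settles the question
    if BODY_TAG.search(text):
        return "html"  # looks like an HTML fragment
    return "xml" if text.startswith("<") else "unknown"
```

Under these assumed rules, Dave's <foo>...<body>... sample sniffs as HTML, while the same markup with <xbody>, a namespaced body, or an XML declaration sniffs as XML.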

I also saw your comment on the other post yesterday. It was a great comment, and there were a number of things you said that I agreed with. Over the coming months I hope to drill in a lot deeper on subjects like bullets and numbers and complex formatting so other people can better understand how it works and how to take advantage of it. Thanks for your feedback!

-Brian

Friday, August 19, 2005 3:15 AM by David Giusto

# re: Intro to Word XML Part 5: Opening custom XML

Brian, I knew that there would be a reasonable explanation for this, I almost got there with the reference to HTML. It still seems odd that the content before the <body> is included but content after </body> is not. I guess that if I check the HTML spec I will find the <head> tag and friends may allow content. But that's a topic for another day.

So how did I get here? I was playing with the example on <w:cfChunk> from 7/20.
http://blogs.msdn.com/brian_jones/archive/2005/07/20/441167.aspx
It seems that your WordML pseudo code has a non-namespaced body tag - you get the picture.
I had just read John Durant's blog on the cfChunk topic about an hour before yours and was left looking for an actual example, since John's description was a bit ambiguous. You know us engineers: it all has to be very specific, pictures are good, examples are better. Thanks for the examples!! In all fairness, John does give one of my projects a plug in his 8/9 blog topic on CALS tables. https://blogs.msdn.com/johnrdurant/archive/2005/08/09/wordmlcals.aspx

I have a comment about cfChunk but I'll post it in that blog-lette so it’s in the right context. You may have to hop around a bit to follow my train of thought since the topics cross over so much and I want to post in the correct topic stream as I digress.

Back on topic? Bullets are not bad and numbered lists are a bit dicey; it's hybrids and multi-level lists that will be a challenge to describe in the forum. There is an excellent explanation of this topic in the book by Simon, Evan, and Mary referred to in the 7/8 topic by Evan himself.
http://blogs.msdn.com/brian_jones/archive/2005/07/08/436973.aspx#comments
The book is here: http://www.oreilly.com/catalog/officexml/ and the sample chapter http://www.oreilly.com/catalog/officexml/chapter/ch02.pdf is a must read. At least read the last paragraph on page 67 (the book page, not the PDF page); it is eloquent!
My issue with lists is that there are multiple ways to define the same structure, which makes it difficult to convert the XML effectively. I’m being careful not to use ‘transform’ here, since the circular links between Styles, List instances, and List definitions are fairly complex for XSLT. These three objects are the poster children for the argument that WordML is like a relational database. It is possible to create an XML instance that is schema compliant but not valid as a Word document if all these pointers and links don’t jibe. This is completely understandable when one realizes WordML is based on an object model and is much more than just a ‘document instance’.

I can’t tell you how many deer-in-the-headlights looks I get when I try to explain the difference between Document XML and Database XML, or between a hierarchy and a relational model, to someone who is familiar with traditional XML document publishing. It always comes down to terms they can understand, so I say “it’s like the difference between newspapers and magazines”. They are both words printed on paper, but they have very different editorial, production, and printing processes, with very different applications and target audiences.

Friday, September 09, 2005 5:33 PM by Brian Jones: Office XML Formats

# Intro to Word XML Part 6: Locking down your XML structures

What a busy week. I've been trying to keep up with all the news while also getting ready for PDC (and...

Tuesday, April 04, 2006 12:00 PM by Brian Jones: Open XML Formats

# What about Word 2003's XML format?

I've had a few folks ask me about the XML format from Word 2003, and whether or not it would be supported...