29 March 2005

Java and MS-Word

Java and MS-Word - followup

Earlier this month, I posted some references to some Java->WordML interop material. This is a followup.

I proved to myself that it is pretty easy and straightforward to use Java to dynamically create MS-Word documents, conforming to the WordProcessingML schema. Anyone can do this, using the schema documentation and an XML-aware Java application platform.

To use this approach, a developer really needs a working installation of Word 2003 for the development or design stage: first to design the document and generate the initial XML, and then to verify that what you are producing is a valid WordProcessingML document.

How did I do it?

You all know that Microsoft Word (and other Office applications) can load and save XML, and you know the schema is published by Microsoft.

The XML phreaks out there, maybe they like to wake up in the morning, drink 7 cups of Starbucks' best, look at a schema, and start coding angle brackets. Not me. Given an XML schema of reasonable complexity, I have little hope of independently generating an XML document that conforms to that schema within my lifetime. So what I did was use MS-Word as the designer. I just wrote a document. Anybody can do that. I designed the document exactly as I wanted it. Then File... Save As... XML. Boom, I have a template document that conforms to WordProcessingML.

From that starting point, I took two paths. The first was to just place within that template document keywords or fields to be replaced programmatically at runtime, with a simple text replacement library. In Java, the java.lang.String class has a replaceAll() method that accepts a regular expression and inserts replacement text. Easy. I just inserted a set of "fields" that look like ##NAME##. These are not MS-Word "fields", just plain old text, within the XML document, of a well-known format. You can use any format you like. $$NAME$$ if you want, or whatever (just remember that $ is special both in a regular expression and in the replacement text passed to replaceAll(), so it needs escaping there).

The Java application then populates a Hashtable of name/value pairs, then mechanically replaces all the fields in the doc whose names are present in the Hashtable, with the value of that key. Simple. Find ##FOO## in the doc, and replace that with Hashtable.get("FOO"). The Hashtable can be populated by any means - I inserted the current time of day as one of the name/value pairs, and I also populated the list with data from a SQL query. It could also be populated from a webservices call. Whatever. It's just a Hashtable.

After replacing the "fields", the result was a legal WordProcessingML document, dynamically-generated from data. Load that doc into MS-Word, print it, whatever. Easy.
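Here is a minimal sketch of that first path, not the code from my example. The class name TemplateFiller, the helper names, and the one-line stand-in for the template are all made up for illustration; a real template saved from Word is much bigger. I use a Map where the text above says Hashtable (a Hashtable would work the same way), and I XML-encode each value before inserting it so that characters like & and < cannot break the document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fills ##NAME## style fields in a WordProcessingML template.
class TemplateFiller {

    // Matches fields of the form ##NAME## and captures the name.
    private static final Pattern FIELD = Pattern.compile("##([A-Za-z0-9_]+)##");

    // Escapes the five characters that are special in XML text and attributes.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Replaces every field whose name is a key in 'values'.
    // Fields with no matching key are left in the document untouched.
    static String fill(String template, Map<String, String> values) {
        Matcher m = FIELD.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value == null) ? m.group() : escapeXml(value);
            // quoteReplacement keeps '$' and '\' in the value from being
            // interpreted as group references by appendReplacement.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the XML that Word produced via File... Save As... XML.
        String template = "<w:t>Dear ##NAME##, your total is ##TOTAL##.</w:t>";

        Map<String, String> values = new HashMap<String, String>();
        values.put("NAME", "Smith & Sons");
        values.put("TOTAL", "$42.00");

        System.out.println(fill(template, values));
    }
}
```

In a real application you would read the template from a file and write the result to a file or an HTTP response. The single pass over the template also means a value that happens to contain ##SOMETHING## does not get replaced a second time.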

The second path I took was more XML-ish. My data source was an XML document. All data, including current time of day, and anything you might retrieve from a database, gets formatted into an XML document. You choose the schema. This doc could be obtained via a webservices call, from a database query (SQL Server and other databases can return XML documents in response to queries) or just formed in memory. I took the latter approach. Anything will do.

I then de-constructed the template XML document, and formed it into an XSL transform that could accept the XML data document, and again, produce a WordProcessingML document. Then it is a simple matter of applying the XSL transform programmatically, at runtime. This requires at least Java 1.4, which you all should be using anyway because it is more current with security fixes. Also you should take this route only if you are comfortable with XSL. It is hairy for some people.
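As a rough sketch of this second path, the snippet below applies a stylesheet to an in-memory XML data document with the standard javax.xml.transform API. Everything specific in it is an assumption for illustration: the class name WordMlTransform, the order/customer data schema, and the tiny stylesheet, which emits only a bare one-paragraph document in the Word 2003 WordprocessingML namespace. A stylesheet derived from a real Word template would carry far more markup (styles, fonts, document properties).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: turns an XML data document into WordprocessingML via XSLT.
class WordMlTransform {

    // Illustrative stylesheet: one paragraph holding the customer name.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + "    xmlns:w='http://schemas.microsoft.com/office/word/2003/wordml'>"
      + "  <xsl:template match='/order'>"
      + "    <w:wordDocument>"
      + "      <w:body>"
      + "        <w:p><w:r><w:t><xsl:value-of select='customer'/></w:t></w:r></w:p>"
      + "      </w:body>"
      + "    </w:wordDocument>"
      + "  </xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet text 'xsl' to the data document 'dataXml'
    // and returns the transformed document as a string.
    static String transform(String xsl, String dataXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(dataXml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transform failed", e);
        }
    }

    public static void main(String[] args) {
        // Data document formed in memory; the schema is whatever you choose.
        String data = "<order><customer>Smith &amp; Sons</customer></order>";
        System.out.println(transform(XSL, data));
    }
}
```

One nice property of this route is that the XSLT serializer takes care of XML-encoding the data values on output, so there is no separate escaping step.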

Either path - the template version or the XSL transform - produces the same result: a valid WordProcessingML document. Either works for standalone applications or in web applications.

In Action

Those of you who are familiar with XML technologies won't be surprised to learn that it just works. But even so, the ability to dynamically generate a rich Word document, with images, text formatting, tables, and so on, all from Java, may open up some possibilities for you. Check it out for yourself. Here's a working example that uses a JSP to dynamically generate a document file. You should have MS-Word installed on your PC if you want to see the result.

Next up

I didn't try the XSL-FO route or the RenderX stylesheet I mentioned in my previous post. I also did not try to slurp documents with a custom schema into Word, and I didn't transmit the XML documents over webservices. I may explore some of these things in the future. Anyone have any other ideas?

Let me know what you think!

Here's the example, including links to source code.

Enjoy.
-Dino

Comments

# Signs on the Sand said:
Dino Chiesa of Microsoft shows how to generate dynamically WordML documents using Java and XSLT. Yep, that's not a typo, Microsoft, WordML and Java. XML serves as peacemaker again. And he even provides a working JSP demo. Cool....
30 March 05 at 10:01 AM
# Martin Naughton said:
If you are taking the "Replace All" approach, such as in CreateOrderConfViaTemplate.java, the value you insert into the XML should be XML-encoded.

For example, the following characters (spelled out) must be escaped:

"less-than"
"greater-than"
"apostrophe"
"double-quote"
"ampersand"
01 April 05 at 5:06 AM
# Dino said:
Good point Martin. I've updated the examples. Thanks.
01 April 05 at 6:57 AM
# Gunther V said:
I need to convert a generated WordML document to a .doc-file. Does somebody know how to do this? I would prefer a Java solution, but .NET solution is OK too.
07 April 05 at 9:00 AM
# DotNetInterop said:
@Gunther,
to do that you could just automate MS-Word in .NET, open the WordML file, then SaveAs.

There are examples of how to automate office in the .NET SDK install.
18 April 05 at 4:06 PM
# rash said:
Can we achieve mail merge functionality of word with xml data with this approach?
20 April 05 at 6:27 AM
# RedoBlog - The .NET Gentleman !!! said:
23 April 05 at 3:55 PM
# Ian Brandt said:
@Gunther, Dino,

A Java WordprocessingML to Doc converter sure would be nice though. I'm a Mac user. I paid half a grand for Office Pro, but Word 2004 doesn't do XML. I have to buy yet another copy of Word, 2003, and run it in Virtual PC, and I can't script the conversion from the OS X side. Where's the inter-op in that? In the future I really hope to see full support of WordprocessingML in all versions of Word so that someday we can actually distribute documents in that format, but until then a portable wordml2doc converter would be a good thing for all.
28 April 05 at 2:07 PM
# Mamun Chowdury said:
Hi, I was trying to view and download the example that you said about generating word file in Jsp. Unfortunately the link was not working. Will it possible to email me the example with source code.

Thanks in advance,

Mamun
28 April 05 at 10:01 PM
# DotNetInterop said:
@Mamun,
Sorry, the quality of service on that machine is a little low. it was sitting on an old laptop that had some power problems. I've since migrated it to a newer machine. the link ought to work now?
http://dinoch.dyndns.org:7070/WordML/
26 May 05 at 11:00 AM
# Neirrek // Le site web de Bruno Kerrien » Blog Archive » Générer des documents Microsoft Office grâce à XML said:
PingBack from http://www.neirrek.com/blog/2005/05/11/xml-a-la-rescousse-dela-generation-de-documents-microsoft-office-2003/
06 February 06 at 5:00 AM
# All About Interop said:

A while back, the OpenXmlDeveloper.org website offered an example of how to create a WordProcessingML

03 October 06 at 4:08 PM
# All About Interop said:

In the past I've posted some articles [ 1 , 2 ] about generating Office 2003 documents from a server-side

17 January 07 at 4:10 PM
# dionazani said:

You can use Rtf Writer2 to write rtf and open in Word or OpenOffice (Writer) ...

09 September 08 at 11:10 PM
# Mathieu said:

Hi, I was trying to view and download the example that you said about generating word file in Jsp. Unfortunately the link was not working. Will it possible to email me the example with source code.

Thanks in advance,

Mathieu

10 April 09 at 12:18 AM
# Subbu said:

Hi,

I am looking for java code/utility to check if a given MS Word document has track changes ON or not.  

Any help is appreciated..

20 April 09 at 11:09 AM
# BOng said:

Your source code links aint working 15/05/2009

15 May 09 at 3:42 AM
# DotNetInterop said:

Yes, my server is down and cannot get up!  Sorry!

29 May 09 at 10:57 AM
