Mail Merging Into a Single Word Document Using XML Literals

Published 23 September 09 04:18 PM

With the release of Microsoft Office 2007 we can work with a much simpler, standard, XML format called Open XML which opens the door for many types of applications that cannot work via COM. What if you needed to build a scalable web service that processes many documents in high volume? What if you wanted to quickly read or write to these formats from a client application but wanted to have minimal dependencies on other applications? These types of programs do not want to require Microsoft Office be installed to run. The cool thing is you have the tools already with Visual Basic 2008. XML Literals are an easy way to manipulate any kind of XML, including Open XML.

I’ve written before on how to use XML Literals to manipulate Word documents, if you missed them:

If you aren’t familiar with XML Literals or Open XML then I’d suggest reading the article in the first bullet above first. It’s a great intro into XML Literals, Open XML and it also demonstrates a couple practical ways of reading and writing to Word document formats directly.

Recently I got a great question on how to change the program outlined in the last bullet above to merge letters we were creating on the fly from a database into a single document instead of multiple documents like we’re doing. This would make it much easier for an end user to print them. That’s such an obvious thing I can’t believe I didn’t think of that! For those of you that aren’t familiar with what we did I’ll give a quick recap of that application.

Creating Word Documents with XML Literals

What I wanted to do is take all the Customers in the Northwind database who had some orders shipping today and send them thank you letters. To get started I created a new Word 2007 document with the letter text and some placeholder field names to indicate where I want the data:

image

Save the document (I named it Doc1.docx) and then rename the .docx extension to .zip and you can drill into the package and see a bunch of XML documents inside it. If we drill down through the zip file we'll see that the text we just typed is located in the \word\document.xml file. Copy this xml and paste it into the Visual Basic editor, it will infer it as an XDocument object, and then you can use embedded expressions to replace the placeholders. (If I’ve lost you, read this article which explains it step-by-step.)

In order to create letters for our customers in the database, we want to end up with a collection of XDocuments with our data merged into them. I created a simple class called Letter that has two properties, CustomerID As String and Document As XDocument. Then I wrote a LINQ query to select the data and embed it into the document (I’ve omitted all the WordProcessingML for clarity, take a loot at the code sample for the whole listing.)

Dim letters = _
    From Order In db.Orders _
    Join Employee In db.Employees _
        On Order.EmployeeID Equals Employee.EmployeeID _
    Where Order.OrderDate IsNot Nothing AndAlso _
        Order.ShippedDate IsNot Nothing AndAlso _
        Order.ShippedDate.Value.Date >= #1/1/2007# _
    Let DateOrder = Order.OrderDate.Value.ToShortDateString _
    Let DateShip = Order.ShippedDate.Value.ToShortDateString _
    Select New Letter With { _
        .CustomerID = Order.Customer.CustomerID, _
        .Document = <?xml version="1.0" encoding="utf-8" standalone="yes"?>
<w:document ...
<w:p w:rsidR="00705CFF" w:rsidRDefault="00112228" w:rsidP="00AA2EC6"> <w:r> <w:br/> </w:r> <w:r w:rsidR="007A5236"> <w:t xml:space="preserve">Dear </w:t> </w:r> <w:proofErr w:type="spellStart"/> <w:r> <w:t><%= Order.Customer.ContactName %></w:t> </w:r> <w:proofErr w:type="spellEnd"/> <w:r w:rsidR="00AA2EC6"> <w:t>,</w:t> </w:r> </w:p> <w:p w:rsidR="00E04FB0" w:rsidRDefault="00AA2EC6" w:rsidP="00AA2EC6"> <w:r> <w:t xml:space="preserve">We’d like to inform you that the order you placed on </w:t> </w:r> <w:proofErr w:type="spellStart"/> <w:r w:rsidR="00112228"> <w:t><%= DateOrder %></w:t> </w:r> <w:proofErr w:type="spellEnd"/> <w:r w:rsidR="00806521"> <w:t xml:space="preserve"> has shipped on </w:t> </w:r> <w:proofErr w:type="spellStart"/> <w:r w:rsidR="00112228"> <w:t><%= DateShip %></w:t> </w:r> ... </w:document>}

The query returns an IEnumerable(Of Letter) objects with the Document property set to the document.xml data for that customer. The next step involved taking the template Doc1.docx file, making a copy of it and then simply replacing the document.xml part inside the package with the one here in the Letter class for each letter. However, what I did was create separate letters for each customer. Instead we want to create one .docx file with ALL the letter data merged inside.

Merging into a Single Word Document

The first thing we need to do is insert a page break between all our letters. The easiest thing to do is to open back up your template Doc1.docx file in Word and insert a page break, save it, and then look at the document.xml again in the package. You should see this WordProccessingML element inserted near the end:

<w:p w:rsidR="00622A50" w:rsidRDefault="00622A50">
    <w:r>
        <w:br w:type="page"/>
    </w:r>
</w:p>

Paste this into the query above in the document exactly where you pulled it out of the document.xml.

Next we have a couple options on how to manipulate the package (docx file). I’ve shown how to use the Open XML SDK before to manipulate documents but in this simple case we can use the System.IO.Packaging classes directly. This is because all we’re doing is working with text in the document. If we were working with images or embedded objects it would be easier to use the SDK. In the next post we’ll add photos of the employees on each order into the document, but for this post let’s just focus on how to merge the text.

So the first thing we need to do is Import some namespaces including an XML namespace that we’re going to use:

Imports <xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
Imports System.IO
Imports System.IO.Packaging

Next we need to set up some variables and copy the Doc1.docx template to a new file I’m calling AllLetters.docx.

Dim uri As New Uri("/word/document.xml", UriKind.Relative)
Dim contentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"
Dim sourceFile = CurDir() & "\Doc1.docx"
Dim letterDir = CurDir() & "\Letters\"
My.Computer.FileSystem.CreateDirectory(letterDir)

Dim mergeFile = letterDir & "AllLetters.docx"
My.Computer.FileSystem.CopyFile(sourceFile, mergeFile, True)

Now we can open the template package and load the main document part as an XElement:

Using p As Package = Package.Open(mergeFile)

    'get the main document part (document.xml)
    Dim mainPart = p.GetPart(uri)
    Dim mainDocumentXML As XElement



Using sr As New StreamReader(mainPart.GetStream) mainDocumentXML = XElement.Load(sr) End Using

The next part is where it gets fun. A word document has a top element structure like this:

<w:document>
    <w:body>
       ...
    </w:body>
</w:document>

So what we need to do is grab the body of the template, replace it with the first customer’s document body we have and then append the rest of the elements inside the bodies of the rest of the documents. A word document can only have one <w:body> element for it to be legal. The way we get the <w:body> XElement from the document is using the descendants syntax with the three dot notation (…):

    Dim mainBody = mainDocumentXML...<w:body>.First()
    Dim i = 0

    For Each letter In letters
        Dim nextBody = letter.Document...<w:body>.First()
        If i = 0 Then
            'Replace the first body contents in the template
            mainBody.ReplaceNodes(nextBody.Elements())
        Else
            'Append the new contents for the rest of the customers
            mainBody.Add(nextBody.Elements())
        End If
        i += 1
    Next

Now that we’ve built up the right body with all our letter data in it, we can replace the main document.xml part in the package and close it.

    'Delete the current document.xml file in the template
    p.DeletePart(uri)
    'Replace that part with our new merged XDocument
    Dim replace As PackagePart = p.CreatePart(uri, contentType)
Using sw As New StreamWriter(replace.GetStream()) mainDocumentXML.Save(sw) sw.Close() End Using p.Close() End Using

Now when we run this, all the letters will be mail merged into the same document for easy printing:

image

As I mentioned if we have embedded images or objects that we need to replace it gets a little trickier. In the DevProConnections Article I have an example of how to create multiple documents with embedded pictures of the employee’s photo. In the next post I’ll show you how we can use the Open XML SDK to create a single document with embedded pictures as well.

Until then, have a look at the complete code I put up on Code Gallery.

[UPDATE: Merging Text & Photos into a Word Document using Open XML SDK]

Enjoy!

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

# Peter said on September 24, 2009 12:21 PM:

Did you consider using XSLT to perform the replacement and transformation, this would allow the flexibility of changing the document without a recompile?

I attempted a similar project for creating public documents for a council in 2005 using Office 2003.  The hardest part about it was accepting a word document as a plain .doc file and converting it to .xml to then append it as a supplement to the end of the report.

In Office 2003 this project was definitely a death march project, but from looking at your examples it looks like this project would now be significantly easier.

I did get the project working however because of the complexity of the xslt's produced and the fact that Linq to XML was not yet available the datasets and xslt's were extremely susceptible to the slightest change.

# Beth Massi said on September 24, 2009 12:41 PM:

Hi Peter,

XSLT would certianly work. I have used it in the past for many types of content generation. This problem above is a simple one so it would work well. However, XSTL isn't a real programming language and I find that it quickly gets unweildy if you have a lot of complex logic.

If you didn't want to recompile this program you could use XDocument.Load to load the XML and then use axis properties to find the elements you wanted to replace instead of embedded expressions.

Cheers,

-B

# Splend said on October 8, 2009 2:08 PM:

Thanks Beth,

having learned a lot from the program,

if we have header and footer in the sourcefile, and using the ReplaceNodes method to replace the body contents, then the we lost the header and footer, wondering if there is some method to solve this problem

# Splend said on October 8, 2009 4:39 PM:

Hi  Beth,

Just find another article from you, "Merging Text & Photos into a Word Document using Open XML SDK"

I got the solution (^_^)

you have writen great articles !! thank you so much!

# Beth Massi said on November 4, 2009 12:39 PM:

Thanks Splend!

Just for reference for other folks reading this thread, headers and footer appear as links (relationship IDs like images) and so this technique above will work if you have references to them and they can be the same for every page. Inserting the contents into <body> tag should also include the relationship IDs to them in the sectPr element node. i.e.

<w:body>

...

<w:sectPr w:rsidR="00993580" w:rsidSect="00993580">

<w:headerReference w:type="default" r:id="rId6"/>

<w:footerReference w:type="default" r:id="rId7"/>

<w:pgSz w:w="12240" w:h="15840"/>

...

</w:sectPr>

</w:body>

Thanks for reading!

Leave a Comment

(required) 
(optional)
(required) 

  
Enter Code Here: Required

About Beth Massi

Beth is a Program Manager on the Visual Studio Community Team at Microsoft and is responsible for producing and managing content for business application developers, driving community features and team participation onto MSDN Developer Centers (http://msdn.com), and helping make Visual Studio one of the best developer tools in the world. She also produces regular content on her blog (http://blogs.msdn.com/bethmassi), Channel 9, and a variety of other developer sites and magazines. As a community champion and a long-time member of the Microsoft developer community she also helps with the San Francisco East Bay .NET user group and is a frequent speaker at various software development events. Before Microsoft, she was a Senior Architect at a health care software product company and a Microsoft Solutions Architect MVP. Over the last decade she has worked on distributed applications and frameworks, web and Windows-based applications using Microsoft development tools in a variety of businesses. She loves teaching, hiking, mountain biking, and driving really fast.

This Blog

Syndication

Page view tracker