Mail Merging Into a Single Word Document Using XML Literals

Mail Merging Into a Single Word Document Using XML Literals

  • Comments 20

With the release of Microsoft Office 2007 we can work with a much simpler, standard, XML format called Open XML which opens the door for many types of applications that cannot work via COM. What if you needed to build a scalable web service that processes many documents in high volume? What if you wanted to quickly read or write to these formats from a client application but wanted to have minimal dependencies on other applications? These types of programs do not want to require Microsoft Office be installed to run. The cool thing is you have the tools already with Visual Basic 2008. XML Literals are an easy way to manipulate any kind of XML, including Open XML.

I’ve written before on how to use XML Literals to manipulate Word documents, if you missed them:

If you aren’t familiar with XML Literals or Open XML then I’d suggest reading the article in the first bullet above first. It’s a great intro into XML Literals, Open XML and it also demonstrates a couple practical ways of reading and writing to Word document formats directly.

Recently I got a great question on how to change the program outlined in the last bullet above to merge letters we were creating on the fly from a database into a single document instead of multiple documents like we’re doing. This would make it much easier for an end user to print them. That’s such an obvious thing I can’t believe I didn’t think of that! For those of you that aren’t familiar with what we did I’ll give a quick recap of that application.

Creating Word Documents with XML Literals

What I wanted to do is take all the Customers in the Northwind database who had some orders shipping today and send them thank you letters. To get started I created a new Word 2007 document with the letter text and some placeholder field names to indicate where I want the data:

image

Save the document (I named it Doc1.docx) and then rename the .docx extension to .zip and you can drill into the package and see a bunch of XML documents inside it. If we drill down through the zip file we'll see that the text we just typed is located in the \word\document.xml file. Copy this xml and paste it into the Visual Basic editor, it will infer it as an XDocument object, and then you can use embedded expressions to replace the placeholders. (If I’ve lost you, read this article which explains it step-by-step.)

In order to create letters for our customers in the database, we want to end up with a collection of XDocuments with our data merged into them. I created a simple class called Letter that has two properties, CustomerID As String and Document As XDocument. Then I wrote a LINQ query to select the data and embed it into the document (I’ve omitted all the WordProcessingML for clarity, take a loot at the code sample for the whole listing.)

Dim letters = _
    From Order In db.Orders _
    Join Employee In db.Employees _
        On Order.EmployeeID Equals Employee.EmployeeID _
    Where Order.OrderDate IsNot Nothing AndAlso _
        Order.ShippedDate IsNot Nothing AndAlso _
        Order.ShippedDate.Value.Date >= #1/1/2007# _
    Let DateOrder = Order.OrderDate.Value.ToShortDateString _
    Let DateShip = Order.ShippedDate.Value.ToShortDateString _
    Select New Letter With { _
        .CustomerID = Order.Customer.CustomerID, _
        .Document = <?xml version="1.0" encoding="utf-8" standalone="yes"?>
<w:document ...
<w:p w:rsidR="00705CFF" w:rsidRDefault="00112228" w:rsidP="00AA2EC6"> <w:r> <w:br/> </w:r> <w:r w:rsidR="007A5236"> <w:t xml:space="preserve">Dear </w:t> </w:r> <w:proofErr w:type="spellStart"/> <w:r> <w:t><%= Order.Customer.ContactName %></w:t> </w:r> <w:proofErr w:type="spellEnd"/> <w:r w:rsidR="00AA2EC6"> <w:t>,</w:t> </w:r> </w:p> <w:p w:rsidR="00E04FB0" w:rsidRDefault="00AA2EC6" w:rsidP="00AA2EC6"> <w:r> <w:t xml:space="preserve">We’d like to inform you that the order you placed on </w:t> </w:r> <w:proofErr w:type="spellStart"/> <w:r w:rsidR="00112228"> <w:t><%= DateOrder %></w:t> </w:r> <w:proofErr w:type="spellEnd"/> <w:r w:rsidR="00806521"> <w:t xml:space="preserve"> has shipped on </w:t> </w:r> <w:proofErr w:type="spellStart"/> <w:r w:rsidR="00112228"> <w:t><%= DateShip %></w:t> </w:r> ... </w:document>}

The query returns an IEnumerable(Of Letter) objects with the Document property set to the document.xml data for that customer. The next step involved taking the template Doc1.docx file, making a copy of it and then simply replacing the document.xml part inside the package with the one here in the Letter class for each letter. However, what I did was create separate letters for each customer. Instead we want to create one .docx file with ALL the letter data merged inside.

Merging into a Single Word Document

The first thing we need to do is insert a page break between all our letters. The easiest thing to do is to open back up your template Doc1.docx file in Word and insert a page break, save it, and then look at the document.xml again in the package. You should see this WordProccessingML element inserted near the end:

<w:p w:rsidR="00622A50" w:rsidRDefault="00622A50">
    <w:r>
        <w:br w:type="page"/>
    </w:r>
</w:p>

Paste this into the query above in the document exactly where you pulled it out of the document.xml.

Next we have a couple options on how to manipulate the package (docx file). I’ve shown how to use the Open XML SDK before to manipulate documents but in this simple case we can use the System.IO.Packaging classes directly. This is because all we’re doing is working with text in the document. If we were working with images or embedded objects it would be easier to use the SDK. In the next post we’ll add photos of the employees on each order into the document, but for this post let’s just focus on how to merge the text.

So the first thing we need to do is Import some namespaces including an XML namespace that we’re going to use:

Imports <xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
Imports System.IO
Imports System.IO.Packaging

Next we need to set up some variables and copy the Doc1.docx template to a new file I’m calling AllLetters.docx.

Dim uri As New Uri("/word/document.xml", UriKind.Relative)
Dim contentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"
Dim sourceFile = CurDir() & "\Doc1.docx"
Dim letterDir = CurDir() & "\Letters\"
My.Computer.FileSystem.CreateDirectory(letterDir)

Dim mergeFile = letterDir & "AllLetters.docx"
My.Computer.FileSystem.CopyFile(sourceFile, mergeFile, True)

Now we can open the template package and load the main document part as an XElement:

Using p As Package = Package.Open(mergeFile)

    'get the main document part (document.xml)
    Dim mainPart = p.GetPart(uri)
    Dim mainDocumentXML As XElement



Using sr As New StreamReader(mainPart.GetStream) mainDocumentXML = XElement.Load(sr) End Using

The next part is where it gets fun. A word document has a top element structure like this:

<w:document>
    <w:body>
       ...
    </w:body>
</w:document>

So what we need to do is grab the body of the template, replace it with the first customer’s document body we have and then append the rest of the elements inside the bodies of the rest of the documents. A word document can only have one <w:body> element for it to be legal. The way we get the <w:body> XElement from the document is using the descendants syntax with the three dot notation (…):

    Dim mainBody = mainDocumentXML...<w:body>.First()
    Dim i = 0

    For Each letter In letters
        Dim nextBody = letter.Document...<w:body>.First()
        If i = 0 Then
            'Replace the first body contents in the template
            mainBody.ReplaceNodes(nextBody.Elements())
        Else
            'Append the new contents for the rest of the customers
            mainBody.Add(nextBody.Elements())
        End If
        i += 1
    Next

Now that we’ve built up the right body with all our letter data in it, we can replace the main document.xml part in the package and close it.

    'Delete the current document.xml file in the template
    p.DeletePart(uri)
    'Replace that part with our new merged XDocument
    Dim replace As PackagePart = p.CreatePart(uri, contentType)
Using sw As New StreamWriter(replace.GetStream()) mainDocumentXML.Save(sw) sw.Close() End Using p.Close() End Using

Now when we run this, all the letters will be mail merged into the same document for easy printing:

image

As I mentioned if we have embedded images or objects that we need to replace it gets a little trickier. In the DevProConnections Article I have an example of how to create multiple documents with embedded pictures of the employee’s photo. In the next post I’ll show you how we can use the Open XML SDK to create a single document with embedded pictures as well.

Until then, have a look at the complete code I put up on Code Gallery.

[UPDATE: Merging Text & Photos into a Word Document using Open XML SDK]

Enjoy!

Leave a Comment
  • Please add 7 and 2 and type the answer here:
  • Post
  • Hey, I found an archaic way to do it!! I can now load the XML from a template docx, loop through all the orders creating letters with the data from the order in each one. It uses your code but drops the XML Literal and doesn't use LINQ. (Sorry, but LINQ seems to be great for compile time).

    This may be self-evident:

                  Sub Main()

           Console.WriteLine("Mail merging....")

           Dim db As New NorthwindDataContext

           'Letter query -- Create a query that will create Letter objects for each

           ' of the customers in Northwind that have shipping orders:

           'My version

           Dim strLetter As String

           Dim strLetter2 As String

           'Now load it dynamically

           Dim sourceFile = CurDir() & "\QuickTest.docx"

           'Open the Document template using the Open XML SDK

           Using wordDoc = WordprocessingDocument.Open(sourceFile, False)

               'get the main document part (document.xml)

               Dim mainPart = wordDoc.MainDocumentPart()

               Dim mainDocumentXML As XDocument

               Using sr As New StreamReader(mainPart.GetStream)

                   mainDocumentXML = XDocument.Load(sr) 'load the document XML

               End Using

               'Prepend the document XML. StrLetter becomes our defacto template for each Letter.

               strLetter = "<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>" + mainDocumentXML.ToString

           End Using

           Dim MyLetters As New List(Of Letter)

           Dim MyLetter As Letter

           For Each Order In db.Orders 'No filters to make it simple

               MyLetter = New Letter

               strLetter2 = strLetter.Replace("CUSTOMERNAME", HtmlEncode(Order.Customer.CompanyName)) 'Must Import System.Web.Httputility

               MyLetter.CustomerID = Order.Customer.CustomerID

               MyLetter.Document = XDocument.Parse(strLetter2)

               MyLetters.Add(MyLetter)

           Next

           CreateOneLetter(MyLetters)

       End Sub

  • Hi PhilM,

    Great, you could also use XSLT as well this way. Just be careful with your Replace statements because it may give you unpredictable results. If you have a placeholder name that the user is also using in the template itself it will overwrite it.

    Also by using System.Web you won't be able to run on the Client Profile (but that probably doesn't matter for your case).

    Cheers,

    -B

  • Yeah, the next move would be to figure out how to use XSLT to replace the placeholders with the appropriate data. Then I wouldn't have to go to string, use replace, and then go back. And I probably would not have to import System.Web (I was only looking for HTMLEncode because some of the data has special characters).

    The Client Profile is new to me but I'm going to run all of this server-side where presumably the full .Net Framework will be installed.

    Thanks for your help. Anything I can do to say thanks? philm@cottagecomputing.com

  • Great blog!

    I have made  some modifications. Instead of adding the following

    <w:p w:rsidR="00622A50" w:rsidRDefault="00622A50">

                                        <w:r>

                                            <w:br w:type="page"/>

                                        </w:r>

                                    </w:p>

    for the break page in the query,

    I have added it in the for each statement, like this

               For Each letter In letters

                   Dim nextbody = letter.Documents...<w:body>.First

                   If i = 0 Then

                       mainBody.ReplaceNodes(nextbody.Elements)

                   Else

                       'Add a page break if there is another letter

                       mainBody.Add(<w:p w:rsidR="00622A50" w:rsidRDefault="00622A50">

                                        <w:r>

                                            <w:br w:type="page"/>

                                        </w:r>

                                    </w:p>)

                       mainBody.Add(nextbody.Elements)

                   End If

                   i += 1

               Next

    This way, a blank page is not printed after all other pages have been printed.

    Good or bad? Is there a better way of doing it?

  • Hi,

    this article is good and i am thankful to you.

    but some problem i am having that... URI path is not working....

    can u please suggest me.....

    this part

    Dim uri As New Uri("/word/document.xml", UriKind.Relative)

    this URI not getting any path value....

    and no styles and footer added to the newly created doc file!

    Please help

    Regards,

    Map.

Page 2 of 2 (20 items) 12