Mail Merging Into a Single Word Document Using XML Literals

Mail Merging Into a Single Word Document Using XML Literals

  • Comments 20

With the release of Microsoft Office 2007 we can work with a much simpler, standard, XML format called Open XML which opens the door for many types of applications that cannot work via COM. What if you needed to build a scalable web service that processes many documents in high volume? What if you wanted to quickly read or write to these formats from a client application but wanted to have minimal dependencies on other applications? These types of programs do not want to require Microsoft Office be installed to run. The cool thing is you have the tools already with Visual Basic 2008. XML Literals are an easy way to manipulate any kind of XML, including Open XML.

I’ve written before on how to use XML Literals to manipulate Word documents, if you missed them:

If you aren’t familiar with XML Literals or Open XML then I’d suggest reading the article in the first bullet above first. It’s a great intro into XML Literals, Open XML and it also demonstrates a couple practical ways of reading and writing to Word document formats directly.

Recently I got a great question on how to change the program outlined in the last bullet above to merge letters we were creating on the fly from a database into a single document instead of multiple documents like we’re doing. This would make it much easier for an end user to print them. That’s such an obvious thing I can’t believe I didn’t think of that! For those of you that aren’t familiar with what we did I’ll give a quick recap of that application.

Creating Word Documents with XML Literals

What I wanted to do is take all the Customers in the Northwind database who had some orders shipping today and send them thank you letters. To get started I created a new Word 2007 document with the letter text and some placeholder field names to indicate where I want the data:

image

Save the document (I named it Doc1.docx) and then rename the .docx extension to .zip and you can drill into the package and see a bunch of XML documents inside it. If we drill down through the zip file we'll see that the text we just typed is located in the \word\document.xml file. Copy this xml and paste it into the Visual Basic editor, it will infer it as an XDocument object, and then you can use embedded expressions to replace the placeholders. (If I’ve lost you, read this article which explains it step-by-step.)

In order to create letters for our customers in the database, we want to end up with a collection of XDocuments with our data merged into them. I created a simple class called Letter that has two properties, CustomerID As String and Document As XDocument. Then I wrote a LINQ query to select the data and embed it into the document (I’ve omitted all the WordProcessingML for clarity, take a loot at the code sample for the whole listing.)

Dim letters = _
    From Order In db.Orders _
    Join Employee In db.Employees _
        On Order.EmployeeID Equals Employee.EmployeeID _
    Where Order.OrderDate IsNot Nothing AndAlso _
        Order.ShippedDate IsNot Nothing AndAlso _
        Order.ShippedDate.Value.Date >= #1/1/2007# _
    Let DateOrder = Order.OrderDate.Value.ToShortDateString _
    Let DateShip = Order.ShippedDate.Value.ToShortDateString _
    Select New Letter With { _
        .CustomerID = Order.Customer.CustomerID, _
        .Document = <?xml version="1.0" encoding="utf-8" standalone="yes"?>
<w:document ...
<w:p w:rsidR="00705CFF" w:rsidRDefault="00112228" w:rsidP="00AA2EC6"> <w:r> <w:br/> </w:r> <w:r w:rsidR="007A5236"> <w:t xml:space="preserve">Dear </w:t> </w:r> <w:proofErr w:type="spellStart"/> <w:r> <w:t><%= Order.Customer.ContactName %></w:t> </w:r> <w:proofErr w:type="spellEnd"/> <w:r w:rsidR="00AA2EC6"> <w:t>,</w:t> </w:r> </w:p> <w:p w:rsidR="00E04FB0" w:rsidRDefault="00AA2EC6" w:rsidP="00AA2EC6"> <w:r> <w:t xml:space="preserve">We’d like to inform you that the order you placed on </w:t> </w:r> <w:proofErr w:type="spellStart"/> <w:r w:rsidR="00112228"> <w:t><%= DateOrder %></w:t> </w:r> <w:proofErr w:type="spellEnd"/> <w:r w:rsidR="00806521"> <w:t xml:space="preserve"> has shipped on </w:t> </w:r> <w:proofErr w:type="spellStart"/> <w:r w:rsidR="00112228"> <w:t><%= DateShip %></w:t> </w:r> ... </w:document>}

The query returns an IEnumerable(Of Letter) objects with the Document property set to the document.xml data for that customer. The next step involved taking the template Doc1.docx file, making a copy of it and then simply replacing the document.xml part inside the package with the one here in the Letter class for each letter. However, what I did was create separate letters for each customer. Instead we want to create one .docx file with ALL the letter data merged inside.

Merging into a Single Word Document

The first thing we need to do is insert a page break between all our letters. The easiest thing to do is to open back up your template Doc1.docx file in Word and insert a page break, save it, and then look at the document.xml again in the package. You should see this WordProccessingML element inserted near the end:

<w:p w:rsidR="00622A50" w:rsidRDefault="00622A50">
    <w:r>
        <w:br w:type="page"/>
    </w:r>
</w:p>

Paste this into the query above in the document exactly where you pulled it out of the document.xml.

Next we have a couple options on how to manipulate the package (docx file). I’ve shown how to use the Open XML SDK before to manipulate documents but in this simple case we can use the System.IO.Packaging classes directly. This is because all we’re doing is working with text in the document. If we were working with images or embedded objects it would be easier to use the SDK. In the next post we’ll add photos of the employees on each order into the document, but for this post let’s just focus on how to merge the text.

So the first thing we need to do is Import some namespaces including an XML namespace that we’re going to use:

Imports <xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
Imports System.IO
Imports System.IO.Packaging

Next we need to set up some variables and copy the Doc1.docx template to a new file I’m calling AllLetters.docx.

Dim uri As New Uri("/word/document.xml", UriKind.Relative)
Dim contentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"
Dim sourceFile = CurDir() & "\Doc1.docx"
Dim letterDir = CurDir() & "\Letters\"
My.Computer.FileSystem.CreateDirectory(letterDir)

Dim mergeFile = letterDir & "AllLetters.docx"
My.Computer.FileSystem.CopyFile(sourceFile, mergeFile, True)

Now we can open the template package and load the main document part as an XElement:

Using p As Package = Package.Open(mergeFile)

    'get the main document part (document.xml)
    Dim mainPart = p.GetPart(uri)
    Dim mainDocumentXML As XElement



Using sr As New StreamReader(mainPart.GetStream) mainDocumentXML = XElement.Load(sr) End Using

The next part is where it gets fun. A word document has a top element structure like this:

<w:document>
    <w:body>
       ...
    </w:body>
</w:document>

So what we need to do is grab the body of the template, replace it with the first customer’s document body we have and then append the rest of the elements inside the bodies of the rest of the documents. A word document can only have one <w:body> element for it to be legal. The way we get the <w:body> XElement from the document is using the descendants syntax with the three dot notation (…):

    Dim mainBody = mainDocumentXML...<w:body>.First()
    Dim i = 0

    For Each letter In letters
        Dim nextBody = letter.Document...<w:body>.First()
        If i = 0 Then
            'Replace the first body contents in the template
            mainBody.ReplaceNodes(nextBody.Elements())
        Else
            'Append the new contents for the rest of the customers
            mainBody.Add(nextBody.Elements())
        End If
        i += 1
    Next

Now that we’ve built up the right body with all our letter data in it, we can replace the main document.xml part in the package and close it.

    'Delete the current document.xml file in the template
    p.DeletePart(uri)
    'Replace that part with our new merged XDocument
    Dim replace As PackagePart = p.CreatePart(uri, contentType)
Using sw As New StreamWriter(replace.GetStream()) mainDocumentXML.Save(sw) sw.Close() End Using p.Close() End Using

Now when we run this, all the letters will be mail merged into the same document for easy printing:

image

As I mentioned if we have embedded images or objects that we need to replace it gets a little trickier. In the DevProConnections Article I have an example of how to create multiple documents with embedded pictures of the employee’s photo. In the next post I’ll show you how we can use the Open XML SDK to create a single document with embedded pictures as well.

Until then, have a look at the complete code I put up on Code Gallery.

[UPDATE: Merging Text & Photos into a Word Document using Open XML SDK]

Enjoy!

Leave a Comment
  • Please add 7 and 4 and type the answer here:
  • Post
  • Did you consider using XSLT to perform the replacement and transformation, this would allow the flexibility of changing the document without a recompile?

    I attempted a similar project for creating public documents for a council in 2005 using Office 2003.  The hardest part about it was accepting a word document as a plain .doc file and converting it to .xml to then append it as a supplement to the end of the report.

    In Office 2003 this project was definitely a death march project, but from looking at your examples it looks like this project would now be significantly easier.

    I did get the project working however because of the complexity of the xslt's produced and the fact that Linq to XML was not yet available the datasets and xslt's were extremely susceptible to the slightest change.

  • Hi Peter,

    XSLT would certianly work. I have used it in the past for many types of content generation. This problem above is a simple one so it would work well. However, XSTL isn't a real programming language and I find that it quickly gets unweildy if you have a lot of complex logic.

    If you didn't want to recompile this program you could use XDocument.Load to load the XML and then use axis properties to find the elements you wanted to replace instead of embedded expressions.

    Cheers,

    -B

  • Thanks Beth,

    having learned a lot from the program,

    if we have header and footer in the sourcefile, and using the ReplaceNodes method to replace the body contents, then the we lost the header and footer, wondering if there is some method to solve this problem

  • Hi  Beth,

    Just find another article from you, "Merging Text & Photos into a Word Document using Open XML SDK"

    I got the solution (^_^)

    you have writen great articles !! thank you so much!

  • Thanks Splend!

    Just for reference for other folks reading this thread, headers and footer appear as links (relationship IDs like images) and so this technique above will work if you have references to them and they can be the same for every page. Inserting the contents into <body> tag should also include the relationship IDs to them in the sectPr element node. i.e.

    <w:body>

    ...

    <w:sectPr w:rsidR="00993580" w:rsidSect="00993580">

    <w:headerReference w:type="default" r:id="rId6"/>

    <w:footerReference w:type="default" r:id="rId7"/>

    <w:pgSz w:w="12240" w:h="15840"/>

    ...

    </w:sectPr>

    </w:body>

    Thanks for reading!

  • In this example you used an XML literal. I want to do the same thing as 'Peter', i.e. load the XML dynamically so the I don't have to recompile when the user wants to change the template a bit. I need a pointer to some examples as to how I would find and replace your placeholder (e.g. CompanyName) with the dynamic code <% = CompanyName %>. Any help appreciated.

  • In this example you used an XML literal. I want to do the same thing as 'Peter', i.e. load the XML dynamically so the I don't have to recompile when the user wants to change the template a bit. I need a pointer to some examples as to how I would find and replace your placeholder (e.g. CompanyName) with the dynamic code <% = CompanyName %>. Any help appreciated.

  • Hi PhilM,

    You wouldn't be able to use an XML literal in this case because the XML is compiled. What you would do instead is use XElement.Load to load in the XML template and then you'd need to query the XML (using axis properties) to find the elements in which you wanted to replace the value. But because WordProcessing XML is hard to query I'd use content controls instead. That way you could just bind them to your custom XML stored inside the document and work with that much easier. I have an example here:

    http://blogs.msdn.com/bethmassi/archive/2009/02/12/oba-part-3-storing-and-reading-data-in-word-documents.aspx

    HTH,

    -Beth

  • Thanks for the reply. With my limited skills I already knew it was tough to query the XML looking for the placeholders, which were merely values rather than axis properties. What was attractive about this approach was getting the dynamic code (<%=... ) in the XML so that the LinqtoXML returned an Ienumerable of Letters using the data from the db. If I use content controls, do you think I can use the same approach? My prior attempts failed because I needed to repeat the content controls, they must be uniquely named, and thus I had to create new bindings to the Custom XML for each one. I couldn't figure out how to do that either. Maybe I could put Content controls in the template, then having found them after loading the XML, blow them away with the dynamic code and then run the LinqtoXML?? The article you linked me to just doesn't allow me to create repeated letters like the approach in this one does. Making sense?

  • You wouldn't be modifying the content control itself, you'd be modifying the CustomXML that they are bound to, so yes you could transform that using LINQ to XML any way you want as long as the binding remained intact. Though this approach would produce multiple documents.

    You should be able to get them to repeat or merge the documents using the Open XML SDK though. I haven't played with it but it seems possible.

  • There doesn't seem to be any rational way to have repeated data bind to the same content control (e.g. Content control for customer name in the template appears once, but needs to repeatedly bound to new custom xml elements as in a loop. One only has to bing 'repeating content controls' to find out how big a problem this is.

    I've played a lot with the OpenXML SDK and it offers nothing for this either. At least not that I have found.

    Your code is SO CLOSE!!! I want to change the last statement in the linq SELECT to use a variable which I have loaded from the template, and modified to insert the <%=CustomerName variable that you showed. But my documents are coming out invalid in Word, not sure why. My code follows (from Module 1) where in I repalced the XML literal with a variable that I can't seem to get right:

              Using sr As New StreamReader(mainPart.GetStream)

                   mainDocumentXML = XDocument.Load(sr) 'Note change from Xelement

               End Using

               Dim mainDocumentXMLString As String = mainDocumentXML.ToString

               'Perform magic (Please the placeholders with the <% = variables

               mainDocumentXML = XDocument.Parse(mainDocumentXMLString)

           End Using

           'Letter query -- Create a query that will create Letter objects for each

           ' of the customers in Northwind that have shipping orders:

           Dim letters = _

               From Order In db.Orders _

               Join Employee In db.Employees _

                    On Order.EmployeeID Equals Employee.EmployeeID _

               Where Order.OrderDate IsNot Nothing AndAlso _

                     Order.ShippedDate IsNot Nothing AndAlso _

                     Order.ShippedDate.Value.Date >= #1/1/1998# _

               Let DateOrder = Order.OrderDate.Value.ToShortDateString _

               Let DateShip = Order.ShippedDate.Value.ToShortDateString _

               Select New Letter With { _

                    .CustomerID = Order.CustomerID, _

                    .OrderID = Order.OrderID, _

                    .EmployeePhoto = If(Employee.Photo IsNot Nothing, Employee.Photo.ToArray, Nothing), _

                    .Document = mainDocumentXML}

  • Hi PhilM,

    You need to realize that the use of embedded expressions is ONLY going to work if you are doing it in the VB editor because these expressions are compiled just like any other code. It's not script. Meaning that if you have <%= %> in your XML "string" these will not be compiled magically by the Parse method, you'll end up with invalid XML. Does that make sense?

    Maybe you could try to add your own attributes to the WordProcessingML? I'll try to play with this when I have some free time.

    You may want to think about it in reverse. You can use embedded expressions to actually create the XML itself, not just the values. That way you could construct the template on the fly. i.e. you could do

           Dim myElementName = "MyElement"

           Dim myValue = "Value"

           Dim x = <<%= myElementName %>><%= myValue %></>

           Console.WriteLine(x.ToString)

           Console.ReadLine()

    Not sure if that helps you but it's a thought.

  • Thanks, you've been quite helpful. This is close to being another dead end I guess. At least I learned how it worked. I thought when I saw that you hoisted the XML out of the Word template, put it in the program and modified it, that I could do the same at run time. That way the user could modify the layout of the template without a recompile of the program. No soap I guess. Now what, I'll look over your 'reverse' idea but I really want to have the user have control of the template yet still be able to run a merge using it. There's gotta be a way... sigh.

  • You could definitely do this with a VSTO add-in or template. That way you could give the user the ability to drag placeholders onto the actual letter and then you could just use VSTO to query the data and populate the document directly.

    I'm assuming you don't like this idea though because you're trying to automate and scale the generation process. But VSTO is a great end-user solution.

    I'll think about your scenario, I think there's a way I just need to focus on it for a bit (and I have a million things running around attm ;-))

  • Yeah, I'm actually headed for a server side solution with no MSOffice Installed. So that's why the VSTO solution is out.

    Thanks, I'll check often to see if you have new ideas. What I'll do for now is recompile when the user changes the letter layout (or anything about it). There must be a way..... Thanks again

Page 1 of 2 (20 items) 12