A long time ago, I wrote Part 1 of this post, based on the presentation I did at the 2010 SharePoint Conference in Sydney. If you're following along with the code, you may want to review that post so you can set up your development environment to match mine.
To quickly recap, the last post showed how to create Word (OpenXML, docx) documents programmatically and write them to disk using the OpenXML SDK (and therefore without the requirement for Word/Office on the machine creating the documents).
In this part, I'll extend the solution to write the documents to a document library in SharePoint and then use Word Automation Services to automatically convert the docx files to PDF format.
To follow along with this walkthrough without changes, set up your SharePoint instance (at http://localhost) as follows:
If you use a remote server, or different site or library names, you'll need to adjust some of the URI and path strings in the code below to make it work.
Carrying on from last time, wire up the click event of the CreateOneSharePointDocumentButton:
Generate a stub for the CreateOneDocumentOnSharePoint() method in the DocGenerator class using the Ctrl+. technique made possible by the Visual Studio 2010 Productivity Power Tools.
Switch to the DocGenerator class and add a using statement to give you access to the SharePoint Client Libraries - giving it an alias will help disambiguate the File class later:
Using the SharePoint Client Libraries, it's very easy to write documents to a document library, and there's no need to write a document to a local drive. This means we'll use a different overload of the OpenXML SDK's WordprocessingDocument.Create() method that writes, not to a file, but to a MemoryStream.
In this code, you create a new SharePoint.Client.ClientContext that gives access to the site (in this case at http://localhost, but if you've got things set up differently, change it here).
Create an overload of the CreatePackage() method in the DocumentCreator class that creates and populates a MemoryStream:
Move the pointer to the start of the MemoryStream and call the File.SaveBinaryDirect() method passing in the ClientContext, a string indicating where the file should be written, the stream and a boolean that tells SharePoint whether or not to overwrite an existing file with the same name.
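As a rough illustration of the steps just described, here is a minimal sketch. It is not the code from the downloadable solution: the class name, the CreatePackage() signature, the SP alias, the document text and the "/Shared Documents/Sample.docx" path are all assumptions made for the example.

```csharp
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using SP = Microsoft.SharePoint.Client; // alias (assumed name) avoids a clash with System.IO.File

public static class SharePointDocumentSketch
{
    // Hypothetical stand-in for the CreatePackage() overload: builds a
    // one-paragraph document directly into the supplied stream.
    public static void CreatePackage(Stream stream, string text)
    {
        using (WordprocessingDocument package =
            WordprocessingDocument.Create(stream, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = package.AddMainDocumentPart();
            mainPart.Document = new Document(
                new Body(new Paragraph(new Run(new Text(text)))));
            mainPart.Document.Save();
        }
    }

    public static void CreateOneDocumentOnSharePoint()
    {
        // Assumed site URL; change it if your site lives elsewhere.
        using (SP.ClientContext context = new SP.ClientContext("http://localhost"))
        using (MemoryStream stream = new MemoryStream())
        {
            CreatePackage(stream, "Hello from the OpenXML SDK");

            // Rewind so the upload reads from the first byte.
            stream.Seek(0, SeekOrigin.Begin);

            // Assumed server-relative path; 'true' overwrites an existing file.
            SP.File.SaveBinaryDirect(context, "/Shared Documents/Sample.docx", stream, true);
        }
    }
}
```

The document never touches the local disk: it is built in memory and streamed straight to the library.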
Running the app and clicking the One document in SharePoint button shows that it's very fast - in my case 102ms.
Writing lots of documents is fast too - add a click event handler for the button that creates many documents:
And add a CreateManyDocumentsOnSharePointInParallel() method that uses a Parallel.For() loop to call CreatePackage() and File.SaveBinaryDirect() for as many files as you want to create:
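A sketch of what such a method might look like follows. The class name, the documentCount parameter, the createPackage delegate (standing in for whatever CreatePackage() overload you have), the site URL and the file naming pattern are assumptions for illustration only.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using SP = Microsoft.SharePoint.Client; // alias (assumed name)

public static class ParallelUploadSketch
{
    // createPackage is a hypothetical hook: it receives an empty stream and the
    // document index, and is expected to write a complete docx into the stream.
    public static void CreateManyDocumentsOnSharePointInParallel(
        int documentCount, Action<Stream, int> createPackage)
    {
        Parallel.For(0, documentCount, i =>
        {
            // One context and one stream per iteration, so no state is
            // shared between the threads running the loop body.
            using (SP.ClientContext context = new SP.ClientContext("http://localhost"))
            using (MemoryStream stream = new MemoryStream())
            {
                createPackage(stream, i);
                stream.Seek(0, SeekOrigin.Begin);

                // Assumed library path and naming pattern.
                string target = string.Format("/Shared Documents/Sample{0:D4}.docx", i);
                SP.File.SaveBinaryDirect(context, target, stream, true);
            }
        });
    }
}
```

Creating a ClientContext inside the loop body costs a little per document, but it keeps each iteration independent of the others.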
This is also pretty fast - in my case 40ms per document.
Navigating to the document library shows all those documents sitting just where you'd expect to see them:
Up until now, we've not had to use Word (or any other Office client) as all we've been doing is generating documents, not rendering them. Just like you can create an HTML document without requiring a browser, it's perfectly valid to create a Word document (or any other OpenXML format document) without using Word.
However, to view the document, or to create a fixed version of it like PDF or XPS, it's necessary to render it. Up until the release of SharePoint 2010, the highest fidelity way to do this was to open the document in Word. Of course, doing that on the server was fraught with difficulty. Word is not designed to be a server-side tool - it throws (sometimes modal) dialogs, it spends a lot of resources on updating the screen and it's not optimised for multi-processor, large memory scenarios. When there is a user interacting with Word though, the bottleneck is rarely the computer.
The SharePoint team addressed this problem with the Word Automation Services feature in SharePoint 2010 (standard edition and higher). Word Automation Services is the client code from Word with the UI bits stripped out and optimised to run as a server process. All of the rendering engine is available for SharePoint to use without any of the issues (both technical and from a licensing point of view) of using Word on a server. There's lots of great info on Word Automation Services on MSDN and elsewhere. Here's the list of resources I provided in the first post in this series:
Word Automation Services (WAS) document conversions run as asynchronous server-side jobs that can be started either automatically (for example, when a document is placed in a folder) or programmatically. Either way, the job won't start immediately, only the next time the WAS scheduler runs. The frequency of the scheduler is set in Central Administration - see the links above for details on how to set it up. I set it to the minimum interval - one minute.
Interacting programmatically with the service is pretty straightforward, but there are two gotchas:
Create a new console application and make sure that the target framework is 3.5.
Open the Visual Studio Configuration Manager dialog by dropping down the Solution Configurations drop-down on the Visual Studio Standard toolbar (or choosing Configuration Manager from the Build menu):
Next, add a Solution Platform to target Any CPU (or x64).
Now you're ready to start building.
Add references to the Microsoft.SharePoint and Microsoft.Office.Word.Server assemblies.
Add using statements for those assemblies:
Add a couple of static string properties to the class that you can adjust to suit your SharePoint configuration:
Now you can initiate the conversion of a single document:
There are a few things to note here.
Firstly, you get a reference to the Site using the SharePoint libraries, not the SharePoint Client libraries that we used to write the Word docs to the list in the first place.
Next, you need to pass a user token to the new ConversionJob, and you get that from the SPSite user token.
Third, you specify the output format using the SaveFormat enumeration.
Finally, remember that the conversion is performed asynchronously, so although you get a Job ID back, you don't get any more information about the job status (more on that when we do bulk conversions).
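Putting those points together, a single-document conversion might look something like the sketch below. The site URL, the service application name "Word Automation Services", the class and property names and the file paths are assumptions; substitute the values from your own farm.

```csharp
using System;
using Microsoft.SharePoint;
using Microsoft.Office.Word.Server.Conversions;

class ConvertOneDocumentSketch
{
    // Assumed values: adjust to match your SharePoint configuration.
    static string SiteUrl { get { return "http://localhost"; } }
    static string WordAutomationServiceName { get { return "Word Automation Services"; } }

    static void Main()
    {
        // Server object model (SPSite), not the client libraries.
        using (SPSite site = new SPSite(SiteUrl))
        {
            ConversionJob job = new ConversionJob(WordAutomationServiceName);

            // The job runs with the permissions of this user token.
            job.UserToken = site.UserToken;
            job.Settings.OutputFormat = SaveFormat.PDF;

            // Assumed source and destination URLs.
            job.AddFile(SiteUrl + "/Shared%20Documents/Sample.docx",
                        SiteUrl + "/Shared%20Documents/Sample.pdf");

            // Start() only queues the job; the conversion happens the next
            // time the Word Automation Services scheduler runs.
            job.Start();
            Console.WriteLine("Queued conversion job {0}", job.JobId);
        }
    }
}
```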
Converting whole libraries at once is also very easy. The ConversionJob class has an AddLibrary() method that takes as parameters a source and destination SPList object.
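Here is a sketch of a whole-library conversion using AddLibrary(). The library titles "Shared Documents" and "PDF Documents", the site URL, the service name and the class name are assumptions for the example, and both libraries must already exist.

```csharp
using System;
using Microsoft.SharePoint;
using Microsoft.Office.Word.Server.Conversions;

class ConvertLibrarySketch
{
    static void Main()
    {
        using (SPSite site = new SPSite("http://localhost")) // assumed URL
        {
            // RootWeb is owned by the SPSite, so it is not disposed separately.
            SPWeb web = site.RootWeb;
            SPList source = web.Lists["Shared Documents"];    // assumed title
            SPList destination = web.Lists["PDF Documents"];  // assumed title

            ConversionJob job = new ConversionJob("Word Automation Services"); // assumed name
            job.UserToken = site.UserToken;
            job.Settings.OutputFormat = SaveFormat.PDF;

            // Queue every document in the source library for conversion.
            job.AddLibrary(source, destination);
            job.Start();

            // Keep the JobId; it is needed to check on progress later.
            Console.WriteLine(job.JobId);
        }
    }
}
```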
Checking the status of the job is straightforward (as long as you have the JobId - a GUID uniquely identifying this conversion job). The ConversionJobStatus object holds information about the conversion job including how many documents are to be converted, how many have been converted successfully and how many have failed. Calling the Refresh() method gets the most up-to-date status and you can use that to poll for completion. Remember that jobs only start every <n> minutes, where n is a setting in SharePoint Central Administration.
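A simple polling loop along those lines might look like this sketch. The service name, the class name, the ten-second interval and the cap of 60 attempts are assumptions; the JobId is taken from the first command-line argument.

```csharp
using System;
using System.Threading;
using Microsoft.Office.Word.Server.Conversions;

class ConversionStatusSketch
{
    const string WordAutomationServiceName = "Word Automation Services"; // assumed name

    static void Main(string[] args)
    {
        // The GUID reported when the job was started.
        Guid jobId = new Guid(args[0]);

        // null partition ID: assumes a farm that is not partitioned.
        ConversionJobStatus status =
            new ConversionJobStatus(WordAutomationServiceName, jobId, null);

        // Cap the number of polls so the loop always ends.
        for (int attempt = 0; attempt < 60; attempt++)
        {
            Console.WriteLine("{0} of {1} converted, {2} failed",
                status.Succeeded, status.Count, status.Failed);

            // Every document has either succeeded or failed.
            if (status.Succeeded + status.Failed >= status.Count)
            {
                break;
            }

            Thread.Sleep(TimeSpan.FromSeconds(10));
            status.Refresh(); // fetch the latest counts from the service
        }
    }
}
```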
The result is a SharePoint list full of PDF files, created without ever needing to open Word.
The combination of the OpenXML SDK and Word Automation Services makes server-side document creation simple, scalable and efficient. This is definitely a tool worth adding to your arsenal.
I've zipped up the two solutions - the document creation (.NET 4.0) WinForms project and the document conversion (.NET 3.5) project - for you to download and play with. Note that they are NOT production ready - they're illustrative only. Use them at your peril, your mileage may vary, contents may be hot, no guarantees etc … you know the drill.
Document Creation Solution Download (241kB)
Document Conversion Solution Download (115kB)
1/4/11 - Updated a couple of images and some of the code explanation
Adam Cogan sent me a question the other day that asked (among other things) "How do you know if a doc has multiple sections?"
In Word, of course, you can break a document up into sections by inserting a section break from the Breaks button in the Page Setup group on the Page Layout tab:
It turns out that counting these programmatically is really easy using the OpenXML SDK 2.0 (download).
I created a new console application, added references to DocumentFormat.OpenXml and WindowsBase and used this code:
In it I take the filename passed as an argument and open it ReadOnly (lines 22-23). I then find the number of sections using the typed enumerator (lines 26-27).
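The original listing isn't reproduced here, so the line numbers above refer to it rather than to the following sketch, which shows one way the same approach could be written. The class name and output text are my own, and it assumes that each section in the document body carries one SectionProperties element.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class SectionCounterSketch
{
    static void Main(string[] args)
    {
        // 'false' opens the package read-only.
        using (WordprocessingDocument document =
            WordprocessingDocument.Open(args[0], false))
        {
            // Each section break stores its settings in a SectionProperties
            // element; the final section's element sits at the end of the body.
            int sectionCount = document.MainDocumentPart.Document.Body
                .Descendants<SectionProperties>()
                .Count();

            Console.WriteLine("Sections: {0}", sectionCount);
        }
    }
}
```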
I also added the path to a file in the Command Line Arguments edit box in the Debug tab of the project Properties so there's a file being passed in when I press F5:
Running the program gives the answer very quickly:
Sample file with 2 sections Download
One of the most effective ways to get a message, any message, across is to use stories. For some reason the human brain appears to be wired to be good at remembering and regurgitating stories. I like to begin presentations with a story that’s somehow relevant and am always on the lookout for a good story to add to my repertoire.
The best stories seem to be those told from a personal perspective - either something that happened to you or something you witnessed. Good stories also stir emotions, both in the storyteller and the listener.
This week I heard two stories that did just that. I’ve been a listener to the Moth Podcast for some time now, and have always enjoyed the mix of professional storytelling with deeply personal and evocative content. Two of the stories I heard this week though moved me as much or more than any other I’ve heard.
The first, told by a comedian, wasn’t funny. The second, told by someone who plays the blues, was uplifting. Both made me either laugh or cry - literally out loud.
and then tell me whether you experienced the same thing.
While you’re there, subscribe to the Moth podcast either in iTunes or via their RSS feed.
I got this note from Sarah Webb at Readify today - looks like a good way to spend half a day if you’re looking to move from Visual Source Safe to a more modern source control system (and get a heap of additional goodness thrown in).
Readify Dev Day: TFS for VSS Users
Since 2005, Team Foundation Server (TFS) has been providing integrated version control, work management and build capabilities. The release of TFS 2010 builds on the foundations formed in the earlier 2005 and 2008 products and focuses on lowering the barrier of entry for teams wanting to get the maximum benefit from TFS with minimal implementation fuss.
Come along to this half day RDN Dev Day event to learn what is involved in getting TFS up and running in your development team. Presented by Readify Senior Consultant and Visual Studio ALM MVP, Stephen Godbold.
TOPICS COVERED
During this session, Stephen will be covering:
What you need to know to deploy and utilise the basic functions of TFS including how to implement version control, build and work management.
The roadmap of what greater deployment/integration will bring, along with the business value this delivers.
A look at the migration path from Visual Source Safe to Team Foundation Server.
EVENT DETAILS
Date: Friday 29 April 2011
Location: Cliftons | 190 George Street, SYDNEY
Duration: Half Day (including light refreshments)
Times: AM Session: 9am – 12pm OR PM Session: 2pm – 5pm
For more information, and to register, go to the Readify site.
Hot on the heels of my post on the Readify DevDay in Sydney, I got a note about even more VSS to TFS workshops (this time presented by Richard Angus from Enhance ALM).