Note: Updated post to point to a YouTube video that is shorter than 10 minutes
In Brian's last blog he talked about Open XML and Office Services as a one-two punch to complete many types of solutions. I am going to start writing some posts that showcase these two technologies working together. Excel is a great tool for connecting to multiple types of data sources, like databases or XML files. Unfortunately, Excel has no concept of connecting to other types of Office documents, like Word files. This fact is important because information workers are pumping out more and more Office documents that contain lots of relevant business data. These documents are becoming just as important as data contained within databases. In today's post, I am going to show you how to extend the reach of Excel Services and hosted charts to gather data from within a library of Word documents.
If you want to jump straight into the code, feel free to download this solution here.
As a Program Manager (PM) in Office, one of our tasks is to write specifications for features that we are designing for the product. These specifications exist in different stages depending on how far along we are in the design. For example, a stage 0 specification is essentially just a placeholder document with no content. A stage 5 specification is a document that contains the full design for a feature and is ready to be coded by developers. Along the way we inevitably run into open issues that must be addressed before the feature can be coded properly. In order to help us track open issues within a document we included a custom style called "Issue" as part of our specification template. This style applies a yellowish background color to a paragraph so that the paragraph easily sticks out. Here is a screenshot of what this style looks like:
Whenever a PM encounters an open issue he/she applies this custom style to the content within the specification. Wouldn't it be great if you can generate a dynamic chart that shows the number of open issues for every PM per given specification stage? This chart would represent the relative health of a particular specification library.
The scenario I listed above talks about interrogating a repository of Word documents and counting the number of open issues within each of the documents. In order to accomplish this scenario we will need to take the following actions:
For the sake of this post and the scenario described above I have created the following template Excel spreadsheet:
This spreadsheet contains only one sheet with a blank chart. The chart will display the number of open issues across all the program managers listed within a SharePoint library broken down by the specification status. The chart is given a specific name so that we can reference it within our Excel Services web part. Add this template document to a SharePoint library.
Note: I am going to show you how to create a web part within SharePoint 2010, but you should be able to do the same thing within SharePoint 2007 (since Excel Services is not new to 2010).
On a SharePoint site where you are an admin, click on the menu command with your name and select Personalize this Page option:
At this point you have the ability to change the look of the page. Click on the Add a Web Part link:
In the Categories section select Office Client Applications. In the Web Parts section select Excel Web Access:
Now we need to link our Excel Services web part to our template Excel spreadsheet. Click the link to open the tool pane, which allows you to reference the correct Excel spreadsheet. In the Excel Services web part properties pane add the following information:
At this point we have an Excel Services web part that shows the chart in our Excel spreadsheet template.
The first step in this task is to iterate through all the Word documents in a given specification library. Here is a code snippet to accomplish this task:
Now that we have access to the Word documents we need to count the number of paragraphs that have the style "Issue" applied. I have already written a post on how to accomplish this task. Check out retrieving Word content based on styles. I am going to reuse the same methods I talked about in that previous post for this scenario. I will reuse the method called GetParagraphListByStyle, which returns a list of paragraphs given a main document part and a specific style name.
Gathering the PM associated with the specification and the spec status is a matter of retrieving content within a specific content control. In my solution I am going to store the PM name, the number of open issues in the specification and the status of the specification within a hash table. Here is the code snippet I used to store the information:
Notice that the specification status can be one of six values, from Stage 0 to Stage 5.
At this point we have all the data from our document library. All that is left is to insert this data into our Excel spreadsheet. This part of the solution is analogous to my previous post on document assembly for SpreadsheetML. We are simply going to create new rows and cells in our spreadsheet for every piece of data within our hash table. Notice that we are adding two types of cells: value based cells and text based cells. These two types of cells are stored differently in Excel. Here is the code snippet used to accomplish this task:
The GetWorkSheetPart method is the same method I wrote about in my previous post on how to copy a worksheet within a workbook.
The last step in the solution is to fix the chart data. Again this code snippet is taken directly from my previous post on document assembly for SpreadsheetML. In any case, here is the code snippet used to accomplish this task:
Running this code I will end up with a chart that represents the number of open issues across a library of Word documents. Clicking Update Chart will ensure the chart reflects the data contained within the library.
I am excited about this solution because it shows how you can extend the reach of our services to accomplish other types of solutions. In this case, having Excel reflect data contained within Word documents.