In previous posts, like Importing a Table from Word to Excel, I showed you how to retrieve content within specific content controls. In those posts, content controls added semantic structure to a document, and that structure made it easier to retrieve and insert content. What about other types of content? Well, one common request is to retrieve content based on styles, where content can be paragraphs, runs, or even tables. In other words, styles, too, can be used to add semantic structure and meaning to a document. In today's post, I am going to show you two things:
Something new I am going to try is to also create a video for my blog posts. Let me know if these videos are helpful to you as well.
To find Word content based on styles we need to take the following actions:
For the sake of this post, let's say I am starting with the following Word document, which contains Paragraph, Run and Table styles:
In this document, I am using the following styles:
If you want to jump straight into the code, feel free to download this solution here.
For this solution I thought it would be really cool to take advantage of extension methods in C#. Extension methods allow me to "add" methods to existing types without creating a new derived type, recompiling, or otherwise modifying the original type. In my case, I am going to add three extension methods to the MainDocumentPart class (remember, this class represents the main document.xml part within my Word document):
These three methods are very similar, but have one important difference: each one uses a different strongly typed class to query for information. These extension methods will live within a class I called WordStyleExtensions. Feel free to reuse or even extend this class for your own purposes.
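As a quick illustration of the mechanism on its own, independent of the Open XML SDK, here is a minimal sketch of an extension method on the built-in string type. The class name `StringExtensions` and the method `WordCount` are hypothetical names chosen only for this example.

```csharp
using System;

public static class StringExtensions
{
    // The "this" modifier on the first parameter is what lets WordCount
    // be called as if it were an instance method of string.
    public static int WordCount(this string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return 0;
        }

        // Split on common whitespace characters and drop the empty entries
        // that repeated separators would otherwise produce.
        char[] separators = { ' ', '\t', '\r', '\n' };
        return text.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}
```

With this class in scope, `"styles add semantic structure".WordCount()` returns 4, even though the string type itself was never modified. The same pattern applies when the first parameter is a MainDocumentPart.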
Since styles are referenced via ids on paragraphs, runs, and tables, we need a way to look up the style id from a style name. The following code accomplishes this task for any style:
This code looks up the style id for a given style name. If no style with that name is found, the style name itself is returned.
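A minimal sketch of such a lookup, assuming the Open XML SDK (the `DocumentFormat.OpenXml` package), might look like the following. The method name `GetStyleIdFromStyleName` and the case-insensitive comparison are my assumptions, not necessarily what the downloadable solution does.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class WordStyleExtensions
{
    // Returns the style id (for example "Heading1") for a style name
    // (for example "Heading 1"). Falls back to the name itself when no
    // matching style is defined.
    public static string GetStyleIdFromStyleName(MainDocumentPart mainPart, string styleName)
    {
        // The styles part (styles.xml) is optional, so it may be missing.
        Styles styles = mainPart.StyleDefinitionsPart?.Styles;
        if (styles == null)
        {
            return styleName;
        }

        // Assumption: compare names without regard to case, because Word stores
        // some built-in names in lowercase (for example "heading 1").
        string styleId = styles.Elements<Style>()
            .Where(style => string.Equals(
                style.StyleName?.Val?.Value, styleName, StringComparison.OrdinalIgnoreCase))
            .Select(style => style.StyleId?.Value)
            .FirstOrDefault(id => id != null);

        return styleId ?? styleName;
    }
}
```

Returning the name as a fallback means a caller can also pass a style id directly and still get a usable value back.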
Let's dive down into the code for retrieving paragraphs based on a style name. As described in the solution section above, this task is broken down into two steps. The first step is to retrieve all paragraphs in the main document, which can be accomplished with the following code:
The next step is to filter the paragraphs down to those that use a specific style. This task can be accomplished with the following code:
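The two steps can be sketched as follows, assuming the Open XML SDK. The names `ParagraphsByStyleName` and `IsParagraphInStyle` are my assumptions, and the style id lookup is repeated in compact form only so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class WordStyleExtensions
{
    public static IEnumerable<Paragraph> ParagraphsByStyleName(
        this MainDocumentPart mainPart, string styleName)
    {
        string styleId = GetStyleIdFromStyleName(mainPart, styleName);

        return mainPart.Document
            .Descendants<Paragraph>()                    // step 1: every paragraph in document.xml
            .Where(p => IsParagraphInStyle(p, styleId)); // step 2: keep those using the style
    }

    private static bool IsParagraphInStyle(Paragraph paragraph, string styleId)
    {
        // The style reference lives at <w:p><w:pPr><w:pStyle w:val="..."/>.
        // Any of these elements may be absent, hence the null-conditional chain.
        string current = paragraph.ParagraphProperties?.ParagraphStyleId?.Val?.Value;
        return current == styleId;
    }

    // Compact style name to style id lookup; returns the name when nothing matches.
    private static string GetStyleIdFromStyleName(MainDocumentPart mainPart, string styleName) =>
        mainPart.StyleDefinitionsPart?.Styles?.Elements<Style>()
            .Where(s => string.Equals(
                s.StyleName?.Val?.Value, styleName, StringComparison.OrdinalIgnoreCase))
            .Select(s => s.StyleId?.Value)
            .FirstOrDefault(id => id != null) ?? styleName;
}
```

Two things to keep in mind with this sketch: the query is evaluated lazily, so enumerate the result while the document is still open, and paragraphs that carry no explicit style reference (typically those in the default paragraph style) will not match any style name.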
Pretty simple! The cool thing is that these methods can be easily modified to work with runs and tables. Here are the methods to retrieve content based on run and table styles:
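A sketch of the run and table variants, again assuming the Open XML SDK, could look like this. The method names `RunsByStyleName` and `TablesByStyleName` are my assumptions, and the compact style id lookup is included only so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class WordStyleExtensions
{
    public static IEnumerable<Run> RunsByStyleName(
        this MainDocumentPart mainPart, string styleName)
    {
        string styleId = GetStyleIdFromStyleName(mainPart, styleName);

        // Run styles are referenced at <w:r><w:rPr><w:rStyle w:val="..."/>.
        return mainPart.Document
            .Descendants<Run>()
            .Where(r => r.RunProperties?.RunStyle?.Val?.Value == styleId);
    }

    public static IEnumerable<Table> TablesByStyleName(
        this MainDocumentPart mainPart, string styleName)
    {
        string styleId = GetStyleIdFromStyleName(mainPart, styleName);

        // Table styles are referenced at <w:tbl><w:tblPr><w:tblStyle w:val="..."/>.
        return mainPart.Document
            .Descendants<Table>()
            .Where(t => t.GetFirstChild<TableProperties>()?.TableStyle?.Val?.Value == styleId);
    }

    // Compact style name to style id lookup; returns the name when nothing matches.
    private static string GetStyleIdFromStyleName(MainDocumentPart mainPart, string styleName) =>
        mainPart.StyleDefinitionsPart?.Styles?.Elements<Style>()
            .Where(s => string.Equals(
                s.StyleName?.Val?.Value, styleName, StringComparison.OrdinalIgnoreCase))
            .Select(s => s.StyleId?.Value)
            .FirstOrDefault(id => id != null) ?? styleName;
}
```

The only real differences from the paragraph case are the strongly typed class being queried (`Run` or `Table`) and the properties element that holds the style reference.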
Now that our extension methods have been created, all we have left to do is call them:
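Here is a self-contained sketch of the calling side, assuming the Open XML SDK. To keep it short, the three extension methods share one generic helper, which is my own condensation rather than the layout described in this post. The style names "Heading 1", "Emphasis", and "Table Grid" are placeholders for whatever styles your document uses, and `StyleReport.PrintCounts` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class WordStyleExtensions
{
    public static IEnumerable<Paragraph> ParagraphsByStyleName(this MainDocumentPart mainPart, string styleName) =>
        ByStyle<Paragraph>(mainPart, styleName, p => p.ParagraphProperties?.ParagraphStyleId?.Val?.Value);

    public static IEnumerable<Run> RunsByStyleName(this MainDocumentPart mainPart, string styleName) =>
        ByStyle<Run>(mainPart, styleName, r => r.RunProperties?.RunStyle?.Val?.Value);

    public static IEnumerable<Table> TablesByStyleName(this MainDocumentPart mainPart, string styleName) =>
        ByStyle<Table>(mainPart, styleName, t => t.GetFirstChild<TableProperties>()?.TableStyle?.Val?.Value);

    // Shared helper: resolve the style id once, then filter elements of type T.
    private static IEnumerable<T> ByStyle<T>(
        MainDocumentPart mainPart, string styleName, Func<T, string> readStyleId)
        where T : OpenXmlElement
    {
        string styleId = mainPart.StyleDefinitionsPart?.Styles?.Elements<Style>()
            .Where(s => string.Equals(
                s.StyleName?.Val?.Value, styleName, StringComparison.OrdinalIgnoreCase))
            .Select(s => s.StyleId?.Value)
            .FirstOrDefault(id => id != null) ?? styleName;

        return mainPart.Document.Descendants<T>().Where(e => readStyleId(e) == styleId);
    }
}

public static class StyleReport
{
    // Opens the document read-only and writes one count per style to the given writer.
    public static void PrintCounts(string path, TextWriter output)
    {
        using (WordprocessingDocument document = WordprocessingDocument.Open(path, false))
        {
            MainDocumentPart mainPart = document.MainDocumentPart;

            // Count() enumerates the lazy queries while the document is still open.
            output.WriteLine("Paragraphs with style 'Heading 1': " +
                mainPart.ParagraphsByStyleName("Heading 1").Count());
            output.WriteLine("Runs with style 'Emphasis': " +
                mainPart.RunsByStyleName("Emphasis").Count());
            output.WriteLine("Tables with style 'Table Grid': " +
                mainPart.TablesByStyleName("Table Grid").Count());
        }
    }
}
```

From a console application this could be called as `StyleReport.PrintCounts(args[0], Console.Out)`, where the first command-line argument is the path to a .docx file.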
Putting everything together and running this code, we end up with an easy way to retrieve content based on styles. For simplicity's sake, I decided to just show the number of paragraphs, runs, or tables with a specific style.
Here is the output: