Welcome to MSDN Blogs Sign in | Join | Help

Embedding an Open XML File in another Open XML File

A couple of weeks ago I gave a presentation on the Open XML SDK to a few customers, where I was asked questions on how to embed files within Open XML documents. I thought it would be a good opportunity to devote a couple of posts around this topic. In today's post I am going to show you how to embed an Open XML file in another Open XML file. Specifically, I am going to show you how to embed an Excel spreadsheet (.xlsx) into a Word document (.docx). Next post will cover how to embed other file types in Open XML files.

My post will talk about using version 2 of the SDK.

If you just want to jump straight into the code, feel free to download this solution here.

Solution

To embed an Excel spreadsheet into a Word document we can take the following actions:

  1. Create a template in Word that contains a content control that will be used to demarcate the region where the embedded object will be inserted
  2. Open up the Word document via the Open XML SDK and access its main document part
  3. Add an image part to the document (this image will be a placeholder image of the embedded object file)
  4. Add an embedded package part to the document
  5. Create a paragraph that contains the embedded object
  6. Locate the content control that will contain the embedded object
  7. Swap out the content control for the newly created paragraph
  8. Save changes made to the Word document

Note that the steps outlined above are just one method to accomplish this scenario.

For the sake of this example, let's say I am starting with the following Word document:

This document contains a content control, named "EmbedObject," which will contain my embedded object. In addition, let's say I have the following Excel spreadsheet I wish to embed:

Embedded Objects in Open XML

Before we get into the code, I wanted to talk more about embedded objects. Office has three ways of storing embedded objects:

  1. Those where Office persists the IStorage as given to Office during OLE operations
  2. Those where Office persists the IStorage as given during OLE operations, but gives the embedded object a friendly extension and filename. This method assumes that the embedded object is a native file format of the application in question
  3. Those where Office interprets the IStorage given during OLE operations as simply a wrapper for a package and only stores the package. This method assumes that the package conforms to Open Packaging Conventions

The major difference between #0 vs. #1 and #2 is in how objects are embedded within a file. Types #1 and #2 allows developers working with Open XML files to more easily extract and insert embedded objects because there is no need to talk to an OLE server. Instead, developers can simply read/write embedded objects as if they were reading from or writing to files on disk. Office differentiates between these three types by looking for a specific registry key under HKCR\CLSID\{Apps_OLE_Storage_CLSID}, where Apps_OLE_Storage_CLSID is the CLSID of the OLE storage server. The Office applications look for a subkey named IPersistStorageType and determines the type of the embedded object in the following manner:

  • Office assumes the embedded object is type #0 if no subkey is specified or if the value of the subkey is 0
  • Office assumes the embedded object is type #1 if the subkey has a value of 1
  • Office assumes the embedded object is type #2 if the subkey has a value of 2

The cool thing is that other applications can take advantage of this reg key. For example, if an application writes out a value of 1 for this subkey for a particular file format then the Office applications will embed files of that type natively in the Open XML file formats.

One more thing to note is that all embedded object types require a prog id, which you can find from the registry, as well as an image representation of the object.

The Code

As mentioned above, when an object is embedded in a document, both a visual representation of the object and the underlying data is stored. The visual representation is simply an image of what you would see if you were to activate the object. For the sake of this solution, my visual representation of the document will be a placeholder image that indicates to users how to refresh the embedded object and will look like the following image:

Looking at the steps outlined above in the Solution section, here is the code snippet to accomplish steps two through four:

using (WordprocessingDocument myDoc = WordprocessingDocument.Open(output, true))
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Png);
imagePart.FeedData(File.Open("placeholder.png", FileMode.Open));
EmbeddedPackagePart embeddedObjectPart =
mainPart.AddEmbeddedPackagePart(@"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
embeddedObjectPart.FeedData(File.Open("embed.xlsx", FileMode.Open));
}

The placeholder.png refers to the placeholder image I showed you above and the embed.xlsx file is the spreadsheet that will be embedded. The string "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" represents the content type of an Excel spreadsheet with an extension .xlsx. You can find the content type of a particular file by going to HKCR\.XXX, where XXX is the extension of the file format, and looking for a sub key named "Content Type."

The next step is to create a paragraph that represents our embedded object. Like my other post on importing SmartArt from PowerPoint to Word, I am going to take advantage of the Document Reflector tool that ships free with the SDK. Using this tool's output as a starting point, I am able to generate the necessary paragraph with the following code snippet:

static Paragraph CreateEmbeddedObjectParagraph(string imageId, string embedId)
{
Paragraph p =
new Paragraph(
new Run(
new EmbeddedObject(
new V.Shapetype(
new V.Stroke() { JoinStyle = V.StrokeJoinStyleValues.Miter },
new V.Formulas(
new V.Formula() { Equation = "if lineDrawn pixelLineWidth 0" },
new V.Formula() { Equation = "sum @0 1 0" },
new V.Formula() { Equation = "sum 0 0 @1" },
new V.Formula() { Equation = "prod @2 1 2" },
new V.Formula() { Equation = "prod @3 21600 pixelWidth" },
new V.Formula() { Equation = "prod @3 21600 pixelHeight" },
new V.Formula() { Equation = "sum @0 0 1" },
new V.Formula() { Equation = "prod @6 1 2" },
new V.Formula() { Equation = "prod @7 21600 pixelWidth" },
new V.Formula() { Equation = "sum @8 21600 0" },
new V.Formula() { Equation = "prod @7 21600 pixelHeight" },
new V.Formula() { Equation = "sum @10 21600 0" }),
new V.Path() { AllowGradientShape = V.BooleanValues.T, ConnectionPointType = OVML.ConnectValues.Rectangle, AllowExtrusion = V.BooleanValues.F },
new OVML.Lock() { Extension = V.ExtensionHandlingBehaviorValues.Edit, AspectRatio = OVML.BooleanValues.T }
) { Id = "_x0000_t75", CoordinateSize = "21600,21600", Filled = V.BooleanValues.F, Stroked = V.BooleanValues.F, OptionalNumber = 75, PreferRelative = V.BooleanValues.T, EdgePath = "m@4@5l@4@11@9@11@9@5xe" },
new V.Shape(
new V.ImageData() { Title = "", RelationshipId = imageId }
) { Id = "_x0000_i1025", Style = "width:500pt;height:400pt", Ole = V.BooleanEntryWithBlankValues.Empty, Type = "#_x0000_t75" },
new OVML.OleObject() { Type = OVML.OLEValues.Embed, ProgId = "Excel.Sheet.12", ShapeId = "_x0000_i1025", DrawAspect = OVML.OLEDrawAspectValues.Content, ObjectId = "_1307530183", Id = embedId }
) { DxaOriginal = (UInt32Value)10957U, DyaOriginal = (UInt32Value)8455U })
);
return p;
}

The last step of the solution is to swap out the content control for this newly created paragraph. This code is very similar to a lot of my previous posts where I used content controls as semantic structures. Here is the code snippet to accomplish this task:

Paragraph p = CreateEmbeddedObjectParagraph(mainPart.GetIdOfPart(imagePart),
mainPart.GetIdOfPart(embeddedObjectPart));
SdtBlock sdt = mainPart.Document.Descendants<SdtBlock>()
.Where(s => s.GetFirstChild<SdtProperties>().GetFirstChild<Alias>().Val.Value
.Equals("EmbedObject")).First();
OpenXmlElement parent = sdt.Parent;
parent.InsertAfter(p, sdt);
sdt.Remove();
mainPart.Document.Save();

End Result

Running this code I should end up with a document that looks like the following:

Upon activating the embedded object I will see the following:

Pretty easy stuff! Next time I will show you how to embed other file formats, like PDF.

Zeyad Rajabi

Removing Page and Section Breaks from a Word Document

In today's post I am going to show you how to remove page and section breaks within a Word document using the Open XML SDK. Removing these two types of breaks is similar, but requires two different approaches. Let's start off by jumping into removing page breaks.

My post will talk about using version 2 of the SDK.

If you just want to jump straight into the code, feel free to download this solution here.

Solution to Remove Page Breaks

To remove page breaks in a document we need to take the following actions:

  1. Open the Word document via the Open XML SDK
  2. Get access to the main document part
  3. Find all page breaks within the main document part
  4. For every page break found, remove it from the document
  5. Save changes

For the sake of this example, let's say I am starting with the following Word document:

This document has a page break (shown outlined in red) on the first page.

The Code

The code is pretty straight forward and follows the solution steps as described above in the solutions section:

static void RemovePageBreaks(string filename)
{
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(filename, true))
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
List<Break> breaks = mainPart.Document.Descendants<Break>().ToList();
foreach (Break b in breaks)
{
b.Remove();
}
mainPart.Document.Save();
}
}

Pretty easy stuff!

End Result

Running this code I should end up with a document that looks like the following:

Now let's see how to remove section breaks within a document. Before I actually jump into the solution of removing sections, I want to talk a bit about section breaks within a Word document.

Section Breaks in a Word Document

WordprocessingML does not natively store the concept of pages, since it is based on paragraphs and runs. Instead it uses sections to specify groups of paragraphs that have a specific set of page properties.

Every Word document has at least one section, where each section specifies page properties (like page size, orientation, margins, etc), header/footer references, column information, etc. Given this information, there are really two high level types of sections:

  1. A section as a paragraph property – A document may have zero or more of these types of sections
  2. A document final section property – A document can will only have one of these types of sections

In today's post I am going to show you how to remove all sections that are a paragraph property.

Solution to Remove Section Breaks

To remove section breaks in a document we need to take the following actions:

  1. Open the Word document via the Open XML SDK
  2. Get access to the main document part
  3. Find all paragraph properties that are contain section breaks
  4. For every paragraph property found, remove the section property as a child of the paragraph property
  5. Save changes

For the sake of this example, let's say I am starting with the following Word document:

This document has a section break (shown outlined in red) on the first page, which separates a one column section from a two column section.

The Code

This code is also pretty straight forward and follows the solution steps as described above in the solutions section:

static void RemoveSectionBreaks(string filename)
{
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(filename, true))
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
List<ParagraphProperties> paraProps = mainPart.Document.Descendants<ParagraphProperties>()
.Where(pPr => IsSectionProps(pPr)).ToList();
foreach (ParagraphProperties pPr in paraProps)
{
pPr.RemoveChild<SectionProperties>(pPr.GetFirstChild<SectionProperties>());
}
mainPart.Document.Save();
}
}
static bool IsSectionProps(ParagraphProperties pPr)
{
SectionProperties sectPr = pPr.GetFirstChild<SectionProperties>();
if (sectPr == null)
return false;
else
return true;
}

End Result

Running this code I should end up with a document that looks like the following:

Notice how the document now has two columns. This solution removed the first section property, which specified a one column section.

Zeyad Rajabi

Importing SmartArt from PowerPoint to Word

In previous posts, I showed you how to import charts from spreadsheets to a Wordprocessing document and how to import tables from Word documents to spreadsheets. Today, I am going to show you how to import SmartArt from a PowerPoint deck into a Word document.

Solution

To import a SmartArt graphic from a PowerPoint deck to a Word document we need to take the following actions:

  1. Create a template in Word that contains a content control that will be used to demarcate the region where the SmartArt graphic will be inserted
  2. Open up the PowerPoint deck via the Open XML SDK and access the slide with the SmartArt graphic to be imported
  3. Get all the appropriate parts associated with the SmartArt graphic
  4. Retrieve necessary information from the SmartArt graphic, like name and size
  5. Open up the Word document, which will contain the imported SmartArt graphic
  6. Add all the SmartArt related parts to the Word document
  7. Create a paragraph that contains an inline SmartArt graphic, which references all the added parts
  8. Locate the content control that will contain the SmartArt graphic
  9. Swap out the content control for the created paragraph, which contains the SmartArt graphic
  10. Save changes made to the Word document

My post will talk about using version 2 of the SDK.

For the sake of this post, let's say I am starting with the following PowerPoint deck and SmartArt graphic:

Also, let's say I am starting with the following document, which contains a placeholder content control:

If you just want to jump straight into the code, feel free to download this solution here.

The Code

As described in the solution section above, steps two and three require us to open the PowerPoint deck, find the slide with the SmartArt graphic, and extract all related parts. Below is the code snippet necessary to accomplish those tasks:

using (PresentationDocument myPres = PresentationDocument.Open(presentationFile, true))
{
PresentationPart presPart = myPres.PresentationPart;
//Get the slide that contains the SmartArt graphic
SlidePart slide = (SlidePart)presPart.GetPartById("rId2");
//Get all the appropriate parts associated with the SmartArt
DiagramLayoutDefinitionPart layoutPart =
slide.DiagramLayoutDefinitionParts.First();
DiagramDataPart dataPart = slide.DiagramDataParts.First();
DiagramColorsPart colorsPart = slide.DiagramColorsParts.First();
DiagramStylePart stylePart = slide.DiagramStyleParts.First();
...
}

Note that in the code above I knew which part/slide contained the SmartArt graphic.

Next step is to retrieve some necessary information from the SmartArt graphic itself. For example, we can retrieve the name, id, extent width, and extent height. The following code snippet retrieves the SmartArt elements that contain all this information (we will use these objects later in the code):

//Get some of the appropriate properties off the SmartArt graphic
GraphicFrame graphicFrame = slide.Slide.Descendants<GraphicFrame>().First();
NonVisualDrawingProperties drawingPr =
graphicFrame.Descendants<NonVisualDrawingProperties>().First();
Draw.Extents extents = graphicFrame.Descendants<Draw.Extents>().First();

Now we get to do the fun stuff; import the SmartArt graphic into the Word document. Below is the code snippet necessary to accomplish steps five and six as mentioned in the solution section above:

//Import SmartArt into Word document
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(outputFile, true))
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
//Add the SmartArt parts to the Word document
DiagramLayoutDefinitionPart docLayoutPart = mainPart.AddPart<DiagramLayoutDefinitionPart>(layoutPart);
DiagramDataPart docDataPart = mainPart.AddPart<DiagramDataPart>(dataPart);
DiagramColorsPart docColorsPart = mainPart.AddPart<DiagramColorsPart>(colorsPart);
DiagramStylePart docStylePart = mainPart.AddPart<DiagramStylePart>(stylePart);
//Get all the relationship ids of the added parts
docLayoutPartId = mainPart.GetIdOfPart(docLayoutPart);
docDataPartId = mainPart.GetIdOfPart(docDataPart);
docColorsPartId = mainPart.GetIdOfPart(docColorsPart);
docStylePartId = mainPart.GetIdOfPart(docStylePart);
...
}

Notice how we are simply importing the parts from the PowerPoint deck into the Word document. The next step is to create a paragraph object that contains the SmartArt. The easiest way to accomplish this task is to leverage the Document Reflector tool that comes with the SDK. This tool is awesome! It automatically generates SDK code based on a document or content within a document. I would recommend checking out Erika's post on how to use the Document Reflector.

Here is a screenshot of the Document Reflector in action when reading in a Word document containing a SmartArt graphic. Notice the nice C# code it generated for me.

Copying this code into my solution I get the following:

//Use the document reflector to figure out how to add a SmartArt graphic to Word
//Change attribute values based on specifics related to the SmartArt
Paragraph p = new Paragraph(
new Run(
new Drawing(
new WP.Inline(
new WP.Extent() { Cx = extents.Cx, Cy = extents.Cy },
new WP.EffectExtent() { LeftEdge = 0L, TopEdge = 0L,
RightEdge = 0L, BottomEdge = 0L },
new WP.DocProperties() { Id = drawingPr.Id, Name =
drawingPr.Name },
new WP.NonVisualGraphicFrameDrawingProperties(),
new Draw.Graphic(
new Draw.GraphicData(
new Dgm.RelationshipIds() { DataPart = docDataPartId,
LayoutPart = docLayoutPartId, StylePart = docStylePartId, ColorPart = docColorsPartId }
) { Uri = "http://schemas.openxmlformats.org/drawingml/2006/diagram" })
) { DistanceFromTop = (UInt32Value)0U, DistanceFromBottom = (UInt32Value)0U, DistanceFromLeft = (UInt32Value)0U, DistanceFromRight = (UInt32Value)0U })));

The code snippet above is pretty much a copy/paste. The only difference is that I added the correct extent width and height values as well as the proper name and ids for the SmartArt graphic.

Almost done! The next thing we need to do is find the content control within the Word document and swap out the content control for this newly created paragraph. The following code snippet accomplishes this task:

//Find the content control in Word where the SmartArt will be added
SdtBlock sdt = mainPart.Document.Descendants<SdtBlock>()
.Where(s =>
s.GetFirstChild<SdtProperties>().GetFirstChild<Alias>().Val.Value
.Equals("SmartArt")).First();
//Swap out the content control for the SmartArt
OpenXmlElement parent = sdt.Parent;
parent.InsertAfter(p, sdt);
sdt.Remove();
mainPart.Document.Save();

End Result

Putting everything together and running this code, we end up with a Word document, which contains the SmartArt imported from my deck.

Here is a screenshot of the final document:

Zeyad Rajabi

Office 2010 and how it can help you build powerful solutions...

There will be more details over the coming months on what's coming in Office 2010, and the power it brings to developers. I'm particularly excited to start talking about the ways in which you can combine the power of the rich clients with server side technologies to build solutions that no longer just target a specific individual using a specific application, but instead expand out to the workgroup, and all the content they are leveraging to accomplish their goals.

Many developers are already building solutions that benefit from the combination of SharePoint and the Office rich clients, and I wanted to make sure that folks were aware of the developer conference coming up this fall that will go into all the details of Office 2010. As Gray points out in his latest blog post (http://blogs.technet.com/gray_knowlton/archive/2009/05/30/office-2010-for-developers-office-developer-conference-moving-to-sharepoint-conference-2009.aspx), there will be a dedicated track at the SharePoint conference 2009 specifically focused on the rich extensibility model in the latest version of the Office client suite. Many of the features that provide immediate value to the typical end users also will allow developers to build some pretty interesting and powerful custom applications.

Also, don't forget to sign up for the technical preview

-Brian

Posted by BrianJones | 4 Comments
Filed under:

Breaking Changes in the Open XML SDK v2 April 2009 CTP

One of the big changes we made in the Open XML SDK v2 April 2009 CTP was improving the Low Level DOM component to include functionality related to Office 2007 SP2. With this improvement came a difference in how some elements were interpreted as a 1st class property of a parent class/element vs. as a child element. For example, SdtProperties is no longer a property off of the SdtXXX classes, but is rather interpreted as just a child element. Scouring through the different customer feedback channels, the Open XML SDK forum, www.openxmldeveloper.org, and the SDK Connect site, I've noticed that this change broke some of my previous posts and code samples. In today's post, I am going to show you a workaround to this issue and I'm going to point you to April 2009 CTP complaint versions of sample code that is currently broken.

SdtBlock does not contain a definition for SdtProperties

Many of my Word related SDK posts leverage content controls as semantic structure. In my posts, I show you how to use these content controls to easily find content or add content to a specific area. In many of these posts I used the following code snippet construct:

SdtBlock sdt = mainPart.Document.Descendants<SdtBlock>()
.Where(s => s.SdtProperties.GetFirstChild<Alias>().Val.Value
.Equals(sdtName)).First();

If you try to compile the above code snippet in the latest CTP of the SDK you will get the following compiler error:

'DocumentFormat.OpenXml.Wordprocessing.SdtBlock' does not contain a definition for 'SdtProperties' and no extension method 'SdtProperties' accepting a first argument of type 'DocumentFormat.OpenXml.Wordprocessing.SdtBlock' could be found (are you missing a using directive or an assembly reference?)

To resolve this error you need to get access to SdtProperties as child element rather than a property. This task can be accomplished with the following code snippet:

SdtBlock sdt = mainPart.Document.Descendants<SdtBlock>()

.Where(s =>

s.GetFirstChild<SdtProperties>().GetFirstChild<Alias>().Val.Value

.Equals(sdtName)).First();

SdtBlock does not contain a definition for SdtContentBlock

Similarly to the issue with SdtProperties, SdtContentBlock was moved to be a child element rather than a property. You will need to change the following code snippet from:

Paragraph p = sdt.SdtContentBlock.GetFirstChild<Paragraph>();

To:

Paragraph p = sdt.GetFirstChild<SdtContentBlock>().GetFirstChild<Paragraph>();

Links to Fixed Code Samples/Solutions

Here are links to fixed versions of previous solutions that were broken due to April 2009 CTP changes:

Feedback

We are currently looking into improving our SDK behavior to re-include some of the lost functionality of not seeing some elements as 1st class properties. Stay tuned for an update.

I also wanted to take this time to thank you guys for all your feedback. Keep sending it our way.

Thanks,

Zeyad Rajabi

Retrieving Word Content Based on Styles

In previous posts, like Importing a Table from Word to Excel, I showed you how to retrieve content within specific content controls. In these posts, content controls were used to add semantic structure to a document, where this structure aided in retrieving and inserting content. What about other types of content? Well, one common request is being able to retrieve content based on styles, where content can be paragraphs, runs, or even tables. In other words, styles, too, can be used to add semantic structure and meaning to a document. In today's post, I am going to show you two things:

  1. How to retrieve content based on styles
  2. How to extend the Open XML SDK with Extension Methods

Something new I am going to try is to also create a video for my blog posts. Let me know if these videos are helpful to you as well.

Solution

To find Word content based on styles we need to take the following actions:

  1. Open up a Word document via the Open XML SDK
  2. Get access to the main document part
  3. Find the style id that references the style name. The style id is referenced in paragraphs, runs, and tables
  4. Look for all paragraphs, runs, or tables within the main document part
  5. Filter down the list of paragraphs, runs, or tables based on whether those objects/elements reference a specific style name or not
  6. Return back the final list of paragraphs, runs, or tables

For the sake of this post, let's say I am starting with the following Word document, which contains Paragraph, Run and Table styles:

In this document, I am using the following styles:

  • Paragraph Style – Heading 1
  • Run Style – Intense Emphasis
  • Table Style – Light List Accent 1

If you want to jump straight into the code, feel free to download this solution here.

The Code

For this solution I thought it would be really cool to take advantage of Extension Methods for C#. Extension methods allow me to "add" methods to existing types without creating a new derived type, recompiling, or otherwise modifying the original type. In my case, I am going to add three extension methods off of the MainDocumentPart class (remember this class represents the main document.xml part within my Word document):

  1. ParagraphsByStyleName – This method will retrieve a list of paragraphs contained within the main document part that have a specific style name
  2. RunsByStyleName – This method will retrieve a list of runs contained within the main document part that have a specific style name
  3. TablesByStyleName – This method will retrieve a list of tables contained within the main document part that have a specific style name

These three methods are very similar, but have one important difference; these methods all use different strongly typed classes to query for information. These extension methods will live within a class I called WordStyleExtensions. Feel free to reuse or even extend this class for your own purposes.

Since styles are referenced via ids on paragraphs, runs, and tables, we need a way to look up the style id from a style name. The following code accomplishes this task for any style:

private static string GetStyleIdFromStyleName(MainDocumentPart mainPart, string styleName)
{
StyleDefinitionsPart stylePart = mainPart.StyleDefinitionsPart;
string styleId = stylePart.Styles.Descendants<StyleName>()
.Where(s => s.Val.Value.Equals(styleName))
.Select(n => ((Style)n.Parent).StyleId).FirstOrDefault();
return styleId ?? styleName;
}

This code simply looks up the style id from a style name. If one is not found then the style name is returned.

Let's dive down into the code for retrieving paragraphs based on a style name. As described in the solution section above, this task is broken down into two steps. The first step is to retrieve all paragraphs in the main document, which can be accomplished with the following code:

public static IEnumerable<Paragraph> ParagraphsByStyleName(this MainDocumentPart mainPart, string styleName)
{
string styleId = GetStyleIdFromStyleName(mainPart, styleName);
IEnumerable<Paragraph> paraList =
mainPart.Document.Descendants<Paragraph>()
.Where(p => IsParagraphInStyle(p, styleId));
return paraList;
}

The next step is to filter down the paragraphs based on whether the paragraph uses a specific style name. This task can accomplished with the following code:

private static bool IsParagraphInStyle(Paragraph p, string styleId)
{
ParagraphProperties pPr = p.GetFirstChild<ParagraphProperties>();
if (pPr != null)
{
ParagraphStyleId paraStyle = pPr.ParagraphStyleId;
if (paraStyle != null)
{
return paraStyle.Val.Value.Equals(styleId);
}
}
return false;
}

Pretty simple! The cool thing is that these methods can be easily modified to work with runs and tables. Here are the methods to retrieve content based on run and table styles:

public static IEnumerable<Run> RunsByStyleName(this MainDocumentPart mainPart, string styleName)
{
string styleId = GetStyleIdFromStyleName(mainPart, styleName);
IEnumerable<Run> runList = mainPart.Document.Descendants<Run>()
.Where(r => IsRunInStyle(r, styleId));
return runList;
}
private static bool IsRunInStyle(Run r, string styleId)
{
RunProperties rPr = r.GetFirstChild<RunProperties>();
if (rPr != null)
{
RunStyle runStyle = rPr.RunStyle;
if (runStyle != null)
{
return runStyle.Val.Value.Equals(styleId);
}
}
return false;
}
public static IEnumerable<Table> TablesByStyleName(this MainDocumentPart mainPart, string styleName)
{
string styleId = GetStyleIdFromStyleName(mainPart, styleName);
IEnumerable<Table> tableList = mainPart.Document.Descendants<Table>()
.Where(t => IsTableInStyle(t, styleId));
return tableList;
}
private static bool IsTableInStyle(Table tbl, string styleId)
{
TableProperties tblPr = tbl.GetFirstChild<TableProperties>();
if (tblPr != null)
{
TableStyle tblStyle = tblPr.TableStyle;
if (tblStyle != null)
{
return tblStyle.Val.Value.Equals(styleId);
}
}
return false;
}

Now that our extension methods have been created all we have left to do is call these methods:

static void Main(string[] args)
{
string paraStyle = "Heading1";
string runStyle = "IntenseEmphasis";
string tableStyle = "LightList-Accent1";
using (WordprocessingDocument myDoc = WordprocessingDocument.Open("input.docx", true))
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
Console.WriteLine("Number of paragraphs with " + paraStyle + " styles: " + mainPart.ParagraphsByStyleName(paraStyle).Count());
Console.WriteLine("Number of runs with " + runStyle + " styles: " + mainPart.RunsByStyleName(runStyle).Count());
Console.WriteLine("Number of tables with " + tableStyle + " styles: " + mainPart.TablesByStyleName(tableStyle).Count());
}
Console.ReadKey();
}

End Result

Putting everything together and running this code, we end up with an easy way to retrieve content based on styles. For simplicity sake, I decided to just show the number of paragraphs, runs, or tables with a specific style.

Here is the output:

Zeyad Rajabi

Removing Comments from Excel and PowerPoint Files

In a previous post, I showed you how to remove comments from a Word file. In today's post, I am going to show you how to accomplish the same scenario, but this time with Excel and PowerPoint files. Excel and PowerPoint have a Document Inspector feature, which is able to remove multiple types of data/content, including comments. This feature works great for client side solutions, but how do you cleanse these files of comments on the server? The Open XML SDK can accomplish these scenarios in just a few lines of code.

Note: The code showed in this post is backwards compatible with version 1 of the SDK.

Solution for PowerPoint

Imagine I start off with a PowerPoint deck that has multiple slides and comments. Here is a screenshot of a comment within a PowerPoint slide:

To remove comments from a PowerPoint deck we need to take the following actions:

  1. Open up the PowerPoint deck via the Open XML SDK
  2. Access the main presentation part, which will give us access to all the slide parts within the package
  3. Delete the comment part associated with each slide part
  4. Save changes made to the deck

If you want to jump straight into the code, feel free to download this solution here.

The Code

The code is pretty easy and maps 1:1 to the steps I mentioned above:

static void RemovePowerPointComments(string filename)
{
using (PresentationDocument myPresentation = PresentationDocument.Open(filename, true))
{
PresentationPart presentationPart = myPresentation.PresentationPart;
foreach (SlidePart slide in presentationPart.SlideParts)
{
slide.DeletePart(slide.SlideCommentsPart);
}
}
}

End Result

Running this method, I end up with a presentation void of comments. Pretty easy!

Here is a screenshot of the final presentation:

Solution for Excel

This solution is very similar to the PowerPoint solution.

Imagine I start off with an Excel workbook that has multiple worksheets, where each worksheet contains multiple comments. Here is a screenshot of a comment within a worksheet cell:

To remove comments from an Excel workbook we need to take the following actions:

  1. Open up the Excel workbook via the Open XML SDK
  2. Access the main workbook part, which will give us access to all the worksheet parts within the package
  3. Delete all comment parts associated with each worksheet part
  4. Save changes made to the workbook

If you want to jump straight into the code, feel free to download this solution here.

The Code

This code is very similar to the code used to remove comments from a PowerPoint deck:

static void RemoveExcelComments(string filename)
{
using (SpreadsheetDocument myWorkbook = SpreadsheetDocument.Open(filename, true))
{
WorkbookPart workbookPart = myWorkbook.WorkbookPart;
foreach (WorksheetPart sheet in workbookPart.WorksheetParts)
{
sheet.DeleteParts<WorksheetCommentsPart>(sheet.GetPartsOfType<WorksheetCommentsPart>());
}
}
}

End Result

Running this method, I end up with a workbook void of comments.

Here is a screenshot of the final workbook:

Zeyad Rajabi

Announcing the Release of the Open XML SDK Version 2 April 2009 CTP

I'm really happy to announce the release of the second CTP for version 2 of the Open XML SDK! Back in October 2008 I showed you guys an architecture diagram of the Open XML SDK. Let's take another look at this diagram:

In version 1 of the Open XML SDK we released the Open XML Packaging API, which allows you to create, open and manipulate Open XML files at the package and part level. In the first CTP of version 2 we released the Open XML Low Level DOM and Stream Reading/Writing components, which allow you to create and manipulate objects within xml parts contained in an Open XML package. In the second CTP of version 2 we are providing schema level validation functionality:

More information about the SDK and all of its components can be found here: http://msdn.microsoft.com/en-us/library/bb448854(office.14).aspx

Download Details

You can download the latest CTP for version 2 here: http://go.microsoft.com/fwlink/?LinkId=127912

Importance of Validation

If you've played around with the Open XML SDK there is a good chance at one point in time your solution has created an invalid or corrupt Open XML file. Manipulating Open XML Formats by using the Open XML Base layer makes it much easier for you to work on the Open XML files, but doing so does not guarantee the production of valid Open XML files.

The new Schema Level Validation component provides a mechanism to help you discover Open XML errors within files and in your code. This component assists you in debugging and validating Open XML files based on the schemas.

An Example

Let's say I want to create a simple document with the text "hello world." The XML snippet necessary to create this text is the following:

<w:body>
<w:p>
<w:r>
<w:t>hello world</w:t>
</w:r>
</w:p>
...
</w:body>

What happens if I forget to include the run element:

<w:body>
<w:p>
<w:t>hello world</w:t>
</w:p>
...
</w:body>

If I try to open the generated document in Word I will get the following error:

Not super helpful, right. Validating this file with .NET I get the following error message:

\word\document.xml

The element 'p' in namespace 'http://schemas.openxmlformats.org/wordprocessingml/2006/main' has invalid child element 't' in namespace 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'. List of possible elements expected: 'pPr, customXml, smartTag, sdt, dir, bdo, r, proofErr, permStart, permEnd, bookmarkStart, bookmarkEnd, moveFromRangeStart, moveFromRangeEnd, moveToRangeStart, moveToRangeEnd, commentRangeStart, commentRangeEnd, customXmlInsRangeStart, customXmlInsRangeEnd, customXmlDelRangeStart, customXmlDelRangeEnd, customXmlMoveFromRangeStart, customXmlMoveFromRangeEnd, customXmlMoveToRangeStart, customXmlMoveToRangeEnd, ins, del, moveFrom, moveTo' in namespace 'http://schemas.openxmlformats.org/wordprocessingml/2006/main' as well as 'oMathPara, oMath' in namespace 'http://schemas.openxmlformats.org/officeDocument/2006/math' as well as 'fldSimple, hyperlink, subDoc' in namespace 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'. (line 1, col 703)

Yes this error message is pretty useful, but the Open XML SDK schema validation component provides a bit more useful information. Here is the error message as described by the Open XML SDK:

The element has invalid child element "http://schemas.openxmlformats.org/wordprocessingml/2006/main:t". List of possible elements expected: <http://schemas.openxmlformats.org/wordprocessingml/2006/main:pPr>. Path: /word/document.xml:/w:document[1]/w:body[1]/w:p[1] OuterXml: <w:p w:rsidR="00900A7A" w:rsidRDefault="00E0086A" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:t>test</w:t></w:p>

Notice that the error gives you the exact XPath to the element and XML snippet that triggered this validation error. There are more advantages, but I will leave talking about those advantages for a future post.

Questions

Feel free to post any questions here or on the Open XML SDK MSDN forum: http://social.msdn.microsoft.com/Forums/en-US/oxmlsdk/threads. You can also post questions on the Open XML SDK Connect site: https://connect.microsoft.com/site/sitehome.aspx?SiteID=589.

Zeyad Rajabi

Importing a Table from WordprocessingML to SpreadsheetML

In a previous post, I showed you how to import charts from spreadsheets to a Wordprocessing document. Today, as promised in my last post, I am going to show you how to import a table from a Wordprocessing document into a spreadsheet.

Solution

To import a table from a Wordprocessing document to a spreadsheet we need to take the following actions:

  1. Create a template in Word that contains a content control that will be used to demarcate the region of the table to be imported
  2. Open up the Word document via the Open XML SDK
  3. Open up an output Excel workbook, which will be used to contain the imported table, via the Open XML SDK
  4. Locate the content control that contains the Word table and access the table object
  5. Convert the Word table into an Excel spreadsheet table. By convert I am mean manually create an Excel table based on the Word table data
  6. Add a table definition part within the Excel workbook
  7. Give the added table definition a unique name and id
  8. Specify the range of the table based on the inserted data
  9. Add column header information to the table definitions part
  10. Add a style reference to the table definition
  11. Save changes made to the Excel workbook

My post will talk about using version 2 of the SDK.

For the sake of this post, let's say I am starting with the following Word document and table (notice that the table is contained within a content control):

Also, let's say I am starting with a completely empty Excel workbook, which only contains a style definition for tables:

If you just want to jump straight into the code, feel free to download this solution here.

The Code

As described in the solution section above, steps two and three require us to open the Word and Excel files. Below is the code snippet necessary to accomplish these tasks:

//Open Word document
using (WordprocessingDocument myDoc = WordprocessingDocument.Open("tables.docx", true))
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
//Open spreadhseet
using (SpreadsheetDocument mySpreadsheet = SpreadsheetDocument.Open("output.xlsx", true))
{
WorkbookPart workbookPart = mySpreadsheet.WorkbookPart;
//Import a specific table from the document into a specific sheet within the spreadsheet
ImportWordTable(mainPart, "WorldPopulationTable", workbookPart, "Sheet1");
}
}

I created a generic method called ImportWordTable, which will accomplish the rest of the steps in the solution section above. The next step is to get access to the Word table object found within the content control, which can be accomplished with the following code:

//Find the content control that contains the specific table
SdtBlock sdt = mainPart.Document.Descendants<SdtBlock>()
.Where(s => s.SdtProperties.GetFirstChild<Alias>().Val.Value
.Equals(sdtName)).First();
Table tbl = sdt.SdtContentBlock.GetFirstChild<Table>();

Now that we have access to the Word table our next task is to create an Excel table based on the data contained within the Word table. This task can be accomplished with the following code snippet:

static Spreadsheet.SheetData ConvertWordTableToExcel(Table tbl)
{
//sheetdata contains data for table
Spreadsheet.SheetData sheetdata = new Spreadsheet.SheetData();
//For every row in my Word table we need to create a row in Excel
foreach (TableRow row in tbl.Descendants<TableRow>())
{
numRows++;
Spreadsheet.Row sheetRow = new Spreadsheet.Row();
//For every Word cell create an Excel cell
foreach (TableCell cell in row.Descendants<TableCell>())
{
Spreadsheet.Cell sheetCell = new Spreadsheet.Cell();
string textValue = cell.InnerText;
double numValue;
//Check to see if value is a number rather than a string
//First row should be strings because I am going to create a table
if ((numRows != 1) && (Double.TryParse(textValue, System.Globalization.NumberStyles.Any, null, out numValue)))
{
Spreadsheet.CellValue v = new Spreadsheet.CellValue();
textValue = numValue.ToString();
v.Text = textValue;
sheetCell.Append(v);
}
else //value is a string
{
sheetCell.DataType = Spreadsheet.CellValues.InlineString;
Spreadsheet.InlineString inlineString = new Spreadsheet.InlineString();
Spreadsheet.Text t = new Spreadsheet.Text();
t.Text = textValue;
inlineString.Append(t);
sheetCell.Append(inlineString);
}
//Need to keep track of column headers for the table
//definitions part
if (numRows == 1)
colHeaders.Add(textValue);
sheetRow.Append(sheetCell);
}
    
sheetdata.Append(sheetRow);
}
return sheetdata;
}

The code snippet above simply creates an Excel spreadsheet row and cell for every Word table row and cell. A spreadsheet cell is a bit special as compared to a Word cell because a spreadsheet cell defines the datatype of the cell content. In the example above, we have two types of spreadsheet cells:

  1. Inline string cells
  2. Number value cells

Since this solution is about creating an Excel table, we need to take care of a couple of extra things. For one, we need to make sure that the first row of the table has only inline string cells, since we will be identifying this row as our table header. Secondly, we need to keep track of the header row strings because we need to reuse these strings within the table definitions part.

At the end of this step we have generated a valid Excel spreadsheet that contains all of the data from the Word table:

We could have called this job done at this point, but we can further improve this output by formatting the data as a table.

Looking at step #6 in the solution section above, we need to add a reference to a table definitions part within the worksheet where the data exists. Here is the code snippet that creates a new table definition part and adds a reference to that part from the worksheet that contains the table data:

//Add the table definitions part to make the imported table look like a table
TableDefinitionPart tableDefPart = worksheetPart.AddNewPart<TableDefinitionPart>(relId);
//Reference the added table part
Spreadsheet.TableParts tableParts = new Spreadsheet.TableParts();
Spreadsheet.TablePart tablePart = new Spreadsheet.TablePart();
tablePart.Id = relId;
tableParts.Append(tablePart);
worksheetPart.Worksheet.Append(tableParts);

The last major set of steps is to create the appropriate xml within our new table definitions part. Every table definition needs a unique name, unique id, display name, and appropriate data range reference in order to get everything to work properly. The following code snippet accomplishes these tasks:

int id = 1;
tableDefPart.Table = new Spreadsheet.Table();
tableDefPart.Table.Id = 1;
tableDefPart.Table.Name = "Table1";
tableDefPart.Table.DisplayName = "Table1";
char endCol = 'A';
//Note that this approach is good for columns up to Z
//Excel can contain more columns, but 26 columns should
//be good enough for this demo
for (int i = 1; i < numCols; i++)
endCol++;
//Specify the range of the table
string reference = "A1:" + endCol + numRows;
tableDefPart.Table.Reference = reference;

One advantage of an Excel table vs. a Word table is an Excel table allows you to filter values within a column. It's pretty easy to add this functionality to our table with the following code:

//Make sure the table has the ability to filter
Spreadsheet.AutoFilter autoFilter = new Spreadsheet.AutoFilter();
autoFilter.Reference = reference;
Spreadsheet.TableColumns tableColumns = new Spreadsheet.TableColumns();
tableColumns.Count = numCols;

Now let's add the table header information:

//Add the column headers to the table definition part
foreach (string s in colHeaders)
{
Spreadsheet.TableColumn tableColumn = new Spreadsheet.TableColumn();
tableColumn.Id = (uint)id;
tableColumn.Name = s;
id++;
tableColumns.Append(tableColumn);
}

Almost done! Let's make this table look good by applying a table style. Once we apply the style we can append all the appropriate elements together and save our changes. These tasks can be accomplished with the following code:

//Apply a nice table style (contained within my template)
Spreadsheet.TableStyleInfo tableStyleInfo = new Spreadsheet.TableStyleInfo();
tableStyleInfo.Name = "TableStyleMedium9";
tableStyleInfo.ShowRowStripes = true;
tableDefPart.Table.Append(autoFilter, tableColumns, tableStyleInfo);
tableDefPart.Table.Save();

End Result

Putting everything together and running this code, we end up with an Excel workbook, which contains a table imported from my Word document.

Here is a screenshot of the final workbook (notice the filtering capability):

Zeyad Rajabi

More Open XML Solution Tools

While I finish up another blog solution, this time on importing a table from Word into Excel, I thought I would share some information on two useful tools you guys can leverage when building Open XML solutions. The first tool I want to talk about is the next release of PowerTools for Open XML. PowerTools for Open XML is an open source project on CodePlex, which is entirely based on version 1 of the Open XML SDK. This tool supports the PowerShell piping architecture, by providing 30+ cmdlets. Think of a cmdlet as a black box, where something comes in and something else comes out. In the case of PowerTools for Open XML, these cmdlets create and modify Open XML documents in a variety of ways. For example, there are cmdlets that remove comments, accept tracked changes, merge multiple documents or sections, and even create charts. One of the coolest things about this tool is that IT professionals and developers can perform batch scripts right within PowerShell. By the way, this tool is released as open source, under the Microsoft Public License (Ms-PL), which means you are free to use the code in your own solutions.

Eric White has a great blog post outlining some of the details with respect to version 1.1 of the PowerTools for Open XML. In this post he has some great examples showing you how to merge sections of documents together. In my previous post, I showed you the easy way to assemble multiple Word documents by taking advantage of altChunks. Version 1.1 of the PowerTools for Open XML actually does merging the hard way. Check it out and let us know what you think.

The next tool I want to talk about is around creating Office document solutions within native code, like C++. The Open XML SDK is a great tool, but is essentially tied down to .NET technologies because of its reliance on System.IO.Packaging. How do you create a solution using native code? In the past you would either have to write your own native -> managed transition layer or do some kind of context switch. The good news is Microsoft has just released a Win32 version of System.IO.Packaging, called Win32 OPC. Feel free to post any questions about this technology at the following forum or at the Win32 OPC blog.

Thanks,

Zeyad Rajabi

Importing Charts from Spreadsheets to Wordprocessing Documents

There have been several requests made by people asking how to import a chart from one document type to another document type. In a previous post, I showed you how to push data into a spreadsheet to create a chart. Today, I am going to show you how to import a chart from a spreadsheet to a Word document.

Solution

To import a chart from a spreadsheet to a Word document we need to take the following actions:

  1. Create a template in Word that contains a content control that will be used to demarcate the region where the chart will be inserted
  2. Open up the Word document via the Open XML SDK and access its main document part
  3. Locate the content control that will contain the chart
  4. Nuke any placeholder content within the content control
  5. Create a new run and inline drawing object to be inserted in the content control. This inline drawing will contain the referencing information for the chart
  6. Open up the spreadsheet via the Open XML SDK and access all the appropriate parts (main workbook part, worksheet part, drawing part, and chart part)
  7. Clone the chart part and add it to the Word document
  8. Clone the chart graphic information (name of the chart and properties) from the spreadsheet and add it to the Word document
  9. Give the added chart a unique name and id in the Word document
  10. Add the chart data to the content control
  11. Save changes made to the Word document

My post will talk about using version 2 of the SDK.

For the sake of this post, let's say I am starting with the following spreadsheet and chart:

Also, let's say I am starting with the following Word document, which contains a placeholder content control:

If you just want to jump straight into the code, feel free to download this solution here.

The Code

As described in the solution section above, steps two through four require us to open the Word document, find the content control that will contain the imported chart, and nuke the placeholder content of the content control to get it ready to contain the chart. Below is the code snippet necessary to accomplish those tasks:

static void ImportChartFromSpreadsheet(string spreadsheetFileName, string wordFileName)
{
//Open Word document
using (WordprocessingDocument myWordDoc = WordprocessingDocument.Open(wordFileName, true))
{
//Find the content control that will contain the chart
MainDocumentPart mainPart = myWordDoc.MainDocumentPart;
SdtBlock sdt = mainPart.Document.Descendants<SdtBlock>()
.Where(s => a.SdtProperties.GetFirstChild<Alias>().Val.Value
.Equals("Chart1")).First();
//Nuke the placeholder content of the content control
Paragraph p = sdt.SdtContentBlock.GetFirstChild<Paragraph>();
p.RemoveAllChildren();
...
}
}

You might notice the code used to find the proper content control looks very much like the code I used in my previous post for assembling Word documents. By nuking all the children under the paragraph within the content control, we have made it ready to contain a chart.

Instead of having run and text elements, the content control needs to have a run with an inline drawing element so that we can reference the chart properly. This task is accomplished with the following code snippet:

//Create a new run that has an inline drawing object
Run r = new Run();
p.Append(r);
Drawing drawing = new Drawing();
r.Append(drawing);
//These dimensions work perfectly for my template document
wp.Inline inline = new wp.Inline(
new wp.Extent()
{ Cx = 5486400, Cy = 3200400 });

Notice that I have hardcoded the dimensions of the chart I will be importing. The dimensions I used are based on what I thought would work best for my template. Feel free to choose any dimension that works best for your document. At this point, the Word document is ready to have the chart imported and be contained within the content control. To import the chart we need to first get the proper chart part and then add that part to my Word document. These tasks are accomplished with the following code snippet:

//Open Excel spreadsheet
using (SpreadsheetDocument mySpreadsheet =
SpreadsheetDocument.Open(spreadsheetFileName, true))
{
//Get all the appropriate parts
WorkbookPart workbookPart = mySpreadsheet.WorkbookPart;
WorksheetPart worksheetPart = (WorksheetPart)workbookPart.GetPartById("rId1");
DrawingsPart drawingPart = worksheetPart.DrawingsPart;
ChartPart chartPart = (ChartPart)drawingPart.GetPartById("rId1");
//Clone the chart part and add it to my Word document
ChartPart importedChartPart = mainPart.AddPart<ChartPart>(chartPart);
string relId = mainPart.GetIdOfPart(importedChartPart);
...
}

The chart part is now part of my Word document package. All we have to do now is reference the chart part properly from the inline drawing contained in the content control. There are two main items we need to include to get everything to work:

  1. A graphic reference to the chart part
  2. A unique id and a chart name needs to be specified for each chart

#1 is easy to accomplish since we can reuse the same graphic objects from the spreadsheet. The main difference is we need to make sure the relationship is pointing to the appropriate chart part within the Word document. Here is the code snippet to accomplish item #1:

//The frame element contains information for the chart
GraphicFrame frame =
drawingPart.WorksheetDrawing.Descendants<GraphicFrame>().First();
string chartName = frame.NonVisualGraphicFrameProperties.NonVisualDrawingProperties.Name;
//Clone this node so we can add it to my Word document
d.Graphic clonedGraphic = (d.Graphic)frame.Graphic.CloneNode(true);
ChartReference c = clonedGraphic.GraphicData.GetFirstChild<ChartReference>();
c.Id = relId;


#2 requires us to calculate a unique id for the chart. Here is a method that can accomplish this task:

static uint GetMaxDocPrId(MainDocumentPart mainPart)
{
uint max = 1;
//Get max id value of docPr elements
foreach (wp.DocProperties docPr in mainPart.Document.Descendants<wp.DocProperties>())
{
uint id = docPr.Id;
if (id > max)
max = id;
}
return max;
}

Almost done! The last bit of work we need to do is add the unique name and id to chart drawing and then add everything to the inline drawing object. Here is the code snippet to accomplish these tasks:

//Give the chart a unique id and name
wp.DocProperties docPr = new wp.DocProperties();
docPr.Name = chartName;
docPr.Id = GetMaxDocPrId(mainPart) + 1;
//add the chart data to the inline drawing object
inline.Append(docPr, clonedGraphic);
drawing.Append(inline);

End Result

Putting everything together and running this code, we end up with a Word document, which contains the chart imported from my spreadsheet.

Here is a screenshot of the final document:

Zeyad Rajabi

How to Assemble Multiple PowerPoint Decks

I just want to let you guys know we are working on some server issues here, which is why some of the previous posts are not showing images or links to source code. This issue will hopefully be resolved soon.

In my previous post, I showed you the easy way to assemble multiple Word documents. Today, I am going to show you how to assemble multiple PowerPoint decks together.

Solution

To merge two decks, a source deck and a destination deck, together we need to take the following actions:

  1. Open up the destination deck via the Open XML SDK and access its main presentation part
  2. Open up the source deck and access its main presentation part
  3. Add every slide in the source deck, in order, to the destination deck
  4. For every slide added to the destination deck, make sure that there is a relationship between the main destination presentation part and the copied master slide layout part
  5. For every added master slide layout part make sure there is a reference to that part within the main destination part
  6. For every added slide part make sure there is a reference to that part within the main destination part
  7. Perform cleanup work to ensure that all references to slide layout parts have unique ids
  8. Save changes made to the destination deck

My post will talk about using version 2 of the SDK.

For the sake of this post, let's say I am starting with the following two decks, each with three slides:

Destination Deck

Source Deck

Destination deck with three slides

Source deck with three slides

If you just want to jump straight into the code, feel free to download this solution here.

The Code

As described in the solution section above, the first three steps require us to open both the destination and source decks in order to add every slide in the source deck to the destination deck. Below is the code snippet necessary to accomplish those tasks:

static void MergeDecks(string sourceDeck, string destDeck)
{
int id = 1;
//Open up the destination deck
using (PresentationDocument myDestDeck = PresentationDocument.Open(destDeck, true))
{
PresentationPart destPresPart = myDestDeck.PresentationPart;
//Open up the source deck
using (PresentationDocument mySourceDeck = PresentationDocument.Open(sourceDeck, true))
{
PresentationPart sourcePresPart = mySourceDeck.PresentationPart;
//Need to get a unique ids for slide master and slide lists //(will use these later)
uniqueId = GetMaxIdFromChild(destPresPart.Presentation.SlideMasterIdList);
uint maxSlideId = GetMaxIdFromChild(destPresPart.Presentation.SlideIdList);
//Copy each slide in my source deck in order to my destination deck
foreach (SlideId slideId in
sourcePresPart.Presentation.SlideIdList)
{
SlidePart sp;
SlidePart destSp;
SlideMasterPart destMasterPart;
string relId;
SlideMasterId newSlideMasterId;
SlideId newSlideId;
//come up with a unique relationship id
id++;
sp = (SlidePart)sourcePresPart
.GetPartById(slideId.RelationshipId);
relId = sourceDeck.Remove(sourceDeck.IndexOf('.')) + id;
destSp = destPresPart.AddPart<SlidePart>(sp, relId);
//Add the slide part to the destination deck    
SlidePart destSp = destPresPart.AddPart<SlidePart>(sp, relId);
...
}
...
}
...
}
}

We needed to go through the slide id list in order to ensure that we added all the slides from the source deck to the destination deck in order. In addition, note that we also taking advantage of AddPart, which allows us to not only add a slide part, but all parts referenced by that slide part.

At this point we have just imported the slide part to the destination deck. Our next task is to fix up the relationship between the main destination presentation part to the added slide master part. Again, we are going to take advantage of AddPart to add this relationship with the following code (AddPart will fix up relationships if the part already exists in the package):

//Master part was added, but now we need to make sure the relationship is in place
destMasterPart = destSp.SlideLayoutPart.SlideMasterPart;
destPresPart.AddPart(destMasterPart);

The next steps are to add the list of slide master part ids and slide ids to the appropriate lists in the main destination presentation part. The ids referenced in each of these lists need to be unique. An additional constraint is that the id values across the slide master id and the slide layout id lists need to be unique and are required to have id values that are greater than or equal to 2147483684. To help deal with the uniqueness factor we need a method that is able to return the current max id value in the list. So anytime we add a new item to the list we simply use the max id value and add 1. Below is a generic method that is able to retrieve the max id value in a set of children elements:

static uint GetMaxIdFromChild(OpenXmlElement el)
{
uint max = 1;
//Get max id value from set of children
foreach (OpenXmlElement child in el.ChildElements)
{
OpenXmlAttribute attribute = child.GetAttribute("id", "");
    
uint id = uint.Parse(attribute.Value);
if (id > max)
max = id;
}
return max;
}

This method is called before the initial for loop in order to get the appropriate unique ids for the slide master part id and slide id lists. Once we have the unique ids we can add the slide master part id and slide id to the appropriate lists in the main destination presentation part. Below is the necessary code needed to add to these two lists:

//Add slide master to slide master list
uniqueId++;
newSlideMasterId = new SlideMasterId();
newSlideMasterId.RelationshipId = destPresPart.GetIdOfPart(destMasterPart);
newSlideMasterId.Id = uniqueId;
//Add slide to slide list
maxSlideId++;
newSlideId = new SlideId();
newSlideId.RelationshipId = relId;
newSlideId.Id = maxSlideId;
destPresPart.Presentation.SlideMasterIdList.Append(newSlideMasterId); destPresPart.Presentation.SlideIdList.Append(newSlideId);

Almost done! The last bit of work we need to do is ensure that all slide layout ids are unique. As mentioned above these id values cannot conflict with the id values of the slide master part id list. In addition we need to make sure that these ids are greater than 2147483684. Below is a method that will go through all slide master parts and will fix up all referenced slide layout ids by simply incrementing id values based on the maximum seen uint value seen thus far:

static void FixSlideLayoutIds(PresentationPart presPart)
{
//Need to make sure all slide layouts have unique ids
foreach (SlideMasterPart slideMasterPart in presPart.SlideMasterParts)
{
foreach (SlideLayoutId slideLayoutId in slideMasterPart.SlideMaster.SlideLayoutIdList)
{
uniqueId++;
slideLayoutId.Id = (uint)uniqueId;
}
slideMasterPart.SlideMaster.Save();
}
}

End Result

Putting everything together and running this code, we end up with a presentation that has six slides, all in the proper order.

Here is a screenshot of the final presentation:

Final merged deck with six slides

Zeyad Rajabi

How to Copy a Worksheet within a Workbook

As promised in my previous post about reusable methods for manipulating WordprocessingML, I mentioned I would write a post on how to copy a worksheet within a workbook. Note that this blog post talks about copying a worksheet within the same package. Perhaps sometime in the future I will write a post on how to export/import worksheets across different workbooks.

Solution

To copy a worksheet within a workbook we need to take the following actions:

  1. Open up the Spreadsheet document via the Open XML SDK
  2. Access the main workbook part, which will give us access to a bunch of related parts, like the different worksheets
  3. Access the worksheet we want to copy
  4. Clone the found worksheet plus all related parts and add the clone plus all related parts back to the workbook
  5. Perform cleanup work to ensure that tables, views, etc. work
  6. Add the newly created worksheet reference to the sheets list in the main workbook part
  7. Save changes made to the workbook

My post will talk about using version 2 of the SDK.

For the sake of this post, let's say I am starting with a pretty complex workbook, which contains data, conditional formatting, a shape, an image, a table, a SmartArt, and a chart. The workbook contains three worksheets and looks like the following:

Screenshot of sample Excel workbook

If you just want to jump straight into the code, feel free to download this solution here.

AddPart<T>() vs. AddNewPart<T>()

Before we get into the details of the steps listed above, I wanted to take this opportunity to discuss the difference between two methods that the SDK provides for adding parts to an Open XML package. The AddNewPart method does the following:

  1. It creates an empty part of type T and adds it to the package
  2. Once the part is created it then adds a reference from the referencing part to the new part

The next step after adding a new part via this method is usually calling FeedData() to stream in data into the part.

The AddPart method does the following:

  1. If the added part is not already in the package, it will add that part plus all of its related parts to the package. So if you are adding part A and part A references part B, which in turns references part C, then calling this method will add part A, part B, and part C. In addition it will ensure that each of these added parts will maintain their relationships. You can think of this functionality as something similar to a deep clone import
  2. If the added part is already in the package, it will add a reference from the referencing part to the part already in the package. For example, let's say you have part A and part B in the package, but part A does not reference part B. Calling this method will then add a reference from part A to part B if the Open XML format supports such a reference

As you can see AddPart is a lot more powerful than simply calling AddNewPart. This fact will be useful when I show you how to clone a part within a package.

The Code

As described in the solution section above, the first three steps require us to open the workbook and get access to the worksheet we want to copy. Below are the code snippets necessary to accomplish those tasks:

static void CopySheet(string filename, string sheetName, string clonedSheetName)
{
//Open workbook
using (SpreadsheetDocument mySpreadsheet = SpreadsheetDocument.Open(filename, true))
{
WorkbookPart workbookPart = mySpreadsheet.WorkbookPart;
//Get the source sheet to be copied
WorksheetPart sourceSheetPart = GetWorkSheetPart(workbookPart, sheetName);
...
}
}

Below is the snippet necessary to get a worksheet part based on the sheet name:

static WorksheetPart GetWorkSheetPart(WorkbookPart workbookPart, string sheetName)
{
//Get the relationship id of the sheetname
string relId = workbookPart.Workbook.Descendants<Sheet>()
.Where(s => s.Name.Value.Equals(sheetName))
.First()
.Id;
return (WorksheetPart)workbookPart.GetPartById(relId);
}

Now that we have access to the worksheet part we want to copy, we need to perform our clone task. Well, here is where I am going to take advantage of our AddPart functionality. Perhaps in a future build of our SDK we will actually have a clone method for parts. As mentioned above, AddPart is great at adding a part plus all referenced parts. Unfortunately, this functionality only works when adding a part that does not already exist in a package. Well, to work around this issue we can simply call AddPart to a temporary workbook and then call AddPart again back into the main workbook. The following code accomplishes this task:

static void CopySheet(string filename, string sheetName, string clonedSheetName)
{
...
//Take advantage of AddPart for deep cloning
SpreadsheetDocument tempSheet = SpreadsheetDocument.Create(new MemoryStream(), mySpreadsheet.DocumentType);
WorkbookPart tempWorkbookPart = tempSheet.AddWorkbookPart();
WorksheetPart tempWorksheetPart = tempWorkbookPart.AddPart<WorksheetPart>(sourceSheetPart);
//Add cloned sheet and all associated parts to workbook
WorksheetPart clonedSheet = workbookPart.AddPart<WorksheetPart>(tempWorksheetPart);
...
}

At this point in time, we have successfully cloned the worksheet and added it plus all related parts into the workbook. We are almost done...

The next thing we need to do is perform a couple of cleanup tasks. For example, SpreadsheetML requires that every table has a unique name and id. In addition, there really should be only one worksheet that is set as the main view. The following code shows you how to clean these issues up:

static void CopySheet(string filename, string sheetName, string clonedSheetName)
{
...
//Table definition parts are somewhat special and need unique ids...so let's make an id based on count
int numTableDefParts = sourceSheetPart.GetPartsCountOfType<TableDefinitionPart>();
tableId = numTableDefParts;
//Clean up table definition parts (tables need unique ids)
if (numTableDefParts != 0)
FixupTableParts(clonedSheet, numTableDefParts);
//There should only be one sheet that has focus
CleanView(clonedSheet);
...
}

Clean the view means just remove any view reference in the cloned worksheet.

static void CleanView(WorksheetPart worksheetPart)
{
//There can only be one sheet that has focus
SheetViews views = worksheetPart.Worksheet.GetFirstChild<SheetViews>();
if (views != null)
{
views.Remove();
worksheetPart.Worksheet.Save();
}
}

Fix up the table parts simply means make sure each table has a unique id and name.

static void FixupTableParts(WorksheetPart worksheetPart, int numTableDefParts)
{
//Every table needs a unique id and name
foreach (TableDefinitionPart tableDefPart in worksheetPart.TableDefinitionParts)
{
tableId++;
tableDefPart.Table.Id = (uint)tableId;
tableDefPart.Table.DisplayName = "CopiedTable" + tableId;
tableDefPart.Table.Name = "CopiedTable" + tableId;
tableDefPart.Table.Save();
}
}

Alright, last step is to add a reference to the added worksheet in the main workbook part with the following code:

static void CopySheet(string filename, string sheetName, string clonedSheetName)
{
...
//Add new sheet to main workbook part
Sheets sheets = workbookPart.Workbook.GetFirstChild<Sheets>();
Sheet copiedSheet = new Sheet();
copiedSheet.Name = clonedSheetName;
copiedSheet.Id = workbookPart.GetIdOfPart(clonedSheet);
copiedSheet.SheetId = (uint)sheets.ChildElements.Count + 1;
sheets.Append(copiedSheet);
//Save Changes
workbookPart.Workbook.Save();
...
}

End Result

Putting everything together and running my code, we end up with a workbook that has four sheets, where the last sheet in the workbook, called CopiedData, is an exact replica of the first sheet.

Here is a screenshot of the final workbook:

Screenshot of Excel workbook after running my code

Zeyad Rajabi

Reusable Methods for Manipulating Paragraphs in WordprocessingML

In a previous post, I showed you guys the easy way to merge multiple Word documents into one final document by taking advantage of altChunks. One issue with using altChunks is in order to view the final merged document you need an application, like Word, that understands altChunks and is able to actually perform the complex merge tasks. What happens if you don't have the luxury of using Word, or any other application that understands altChunks? Well, then you are required to manually merge the documents together.

Manually merging content within the same or a different document is possible, but requires you to deal with certain issues. There are a number of things you need to consider before you can call your merge task complete. Here are a few example complexities:

  • Styles
    • Does your content reference any styles?
    • Does your destination document also reference those styles?
    • Are there any style conflicts between the source and destination documents? For example, does the source document specify bold for style "Foo" while the destination document specify italics for style "Foo?"
    • Are the document defaults and Normal style definitions different between the source and destination files?
  • Numbering
    • Does your content reference any numbering?
    • Does your source document reference a numbering definition that already exists in the destination document?
    • Do you want continue numbering or restart numbering for copied content?
  • References
    • Does your content reference other parts, like images, comments, headers/footers, etc.?
  • Range Elements
    • Does your content contain range based elements, like bookmarks, content controls, custom xml, etc.?

The issues listed above are just some of the things you need to think about before you can accomplish manual merging. Sounds like a lot of work, but I do have some good news.

Eric White recently wrote a post, where he talks about how we have extended the Power Tools for Open XML to include functionality around manipulating, inserting, and deleting paragraphs within a Wordprocessing document. The great thing about these Power Tools is that they are completely open source under the Microsoft Public License (Ms-PL). That means you can freely deploy solutions that use any of the code within the Power Tools. Another cool piece of information is that these tools are built off of version 1 of the SDK, which has a "go-live" license. In other words, there is nothing stopping you from reusing any of the code within Power Tools for your own solution. Perhaps in a later post I will build on top of these libraries to accomplish a rich end-to-end scenario.

Next Time

In my last post, I got a request from Anthony Rubalcaba to write a post on copying a spreadsheet within a workbook. So, next week I will show you how to accomplish this scenario.

Zeyad Rajabi

More Posts Next page »
 
Page view tracker