
Open XML SDK Sessions at the Professional Developers Conference (PDC)

I'm pretty excited: next week I'll be at PDC in Los Angeles giving another presentation on the Open XML SDK. I'll be at the conference from Nov 17-19, so if you're at PDC do come by and say hi. Here's the description and time of the session I'll be presenting:

Session Title: Document Assembly and Manipulation on Microsoft SharePoint Server 2010 Using Word Automation Services and Open XML

Time & Date: 11/19/2009 11:30 AM @ 408B

Session Code: PR09

Description: Come learn about the Open XML SDK, which provides a set of .NET APIs to help developers create and manipulate documents in the Open XML Formats, and the Microsoft Office services available in SharePoint Server 2010. Hear how Word Automation Services and Excel Services can be used to build solutions for server environments without the need of the Office clients. Come and check out all the demos and free source code.

Let me know if there are specific demos you would like to see. Besides the presentation, you will be able to find me at the Office client booth during the week of PDC and at the "Ask the Experts" event. I'm looking forward to hearing feedback from you guys at the conference.

Zeyad Rajabi

Open XML and Office Services

If you didn't see the news up on the Word team blog announcing Word Automation Services, and you have any interest in server side conversion of .docx files into .pdf or .xps, you should definitely go take a look: http://blogs.msdn.com/microsoft_office_word/archive/2009/10/26/introducing-word-automation-services.aspx

Capturing Business Processes in Office

I see a lot of different types of solutions people build to make their workgroups run more accurately and efficiently. Many of the solutions are created in order to codify one or more pieces of a business process, and as you can imagine, Office plays a huge role in these processes. For example, there may be people using Outlook for communication; Excel for analysis; Word for documentation; and just about any process ends in PowerPoint where you present the results of the work.

As we see the shift from a focus on an individual's productivity to a focus on the entire workgroup's productivity, it also means the way in which people program against our applications changes. This is why you see a lot more SharePoint development in addition to traditional Office development (I briefly mentioned this over the summer). It's also why the Open XML formats and the Open XML SDK were such important investments for us… they help the Office applications continue to play an important role in these workgroup level scenarios.

Printing and Re-Calcing on the Server

For example, one request we've heard from folks working with the Open XML formats on the server is that they want access to some core client functionality, like printing/layout in Word, or calculations in Excel. A couple weeks ago at the SharePoint conference we announced the next version of Excel Services, as well as a new service called Word Automation Services. These two services are great resources for people doing server side document assembly/manipulation with Open XML. The Open XML formats are great for consuming or generating content for dynamic document assembly scenarios, but often there is also the need to recalc the model, refresh the charts, or print the document, and you don't want to necessarily call back to the client to do this.

Server Side Document Assembly Example

A quick example to help explain this is a process I've seen a number of times, where banks were trying to do bulk generation of loan applications (thousands a day). There are end users involved in creating the template for the loan application, as well as building out the financial models for how the terms of the loan are determined, but the generation of the individual applications was a bulk server-side process based on an incoming list of applicants. The combination of Open XML and content controls makes it pretty easy to create .docx files on the server, but in this scenario the banks wanted to send out either PDFs or hard copies, which meant you needed to include the Word client in the process (and it was obviously the big bottleneck).

With the new services announced at the SharePoint conference though, you get the perfect mix of software + services to solve the scenario. The end users can continue to work in the applications they are familiar with (Word & Excel) to build out the template and financial model, and the bulk generation can all happen on the server in an automated process:

Here's a quick step by step explanation of the process:

  1. Client-side: Loan Template author generates the template for the loan application, and uses content controls to specify where the data should go. He saves it up to SharePoint so others can collaborate with him.
  2. Client-side: Folks from the Legal department are able to work in the document at the same time as the Template author because of the new co-authoring functionality in Word 2010.
  3. Client-side: A financial analyst builds up a model for determining whether an applicant should be considered and what the terms of the loan should be. This model is saved up to SharePoint as an .xlsx file.
  4. Server-side: As new applicants request an application, a server side process takes their data, and uses the Open XML SDK to inject it into the .xlsx file. (Example 1; Example 2)
  5. Server-side: The financial model (.xlsx) is then sent off to Excel Services to perform calculations and pull out the results
  6. Server-side: The process takes the results of the calculation, and injects them into the Word template, using the content controls to determine what data should go where, producing a .docx file. (Example 1; Example 2)
  7. Server-side: The .docx file is passed off to Word Automation Services where a .pdf file is generated, which can then either be sent on to a high volume printer, or e-mailed directly to the applicant.
  8. Client-side: Any of the users can make updates to the documents (assuming this is allowed as part of the workflow), and those changes will automatically make their way into the bulk generation process.

So, you can see this is a pretty basic but also extremely powerful scenario, and if you've been reading Zeyad's examples over the past year or so you know how easy this can be with the Open XML SDK. I've been excited about these technologies for a few years now, and it's great that we can finally start talking publicly about them. Zeyad had a couple great demos at the SharePoint conference that really helped show the value of these services in combination with the Open XML formats. We'll post a video of the presentation as soon as it becomes available, and will also have separate blog posts that drill into each one over the coming months.

-Brian

Migrating your custom solutions to Office 14

For those folks who have custom Office solutions within their enterprise, one area that you often need to focus on during the upgrade cycle is testing out those custom solutions to make sure they'll run in the latest version. With Office 14, we're going to provide a great set of tools to make it even easier for folks tasked with deployment. Michael Kiselman provided a great overview today up on Gray Knowlton's blog: http://blogs.technet.com/gray_knowlton/archive/2009/10/22/announcing-the-office-2010-application-compatibility-program.aspx

We always focus heavily on testing application compatibility from version to version, and in most cases applications will continue to run just like they did in the older version. From time to time though there are cases where a custom application may need to be updated. The tools Michael describes help with both the discovery and migration processes.

-Brian

Posted by BrianJones

Open XML SDK Demos Shown at the SharePoint Conference (SPC)

First off, I want to thank everyone who attended my two sessions at SPC on developing solutions on top of the Open XML SDK. The sessions were recorded, so as soon as they are published to the web I will send you guys some links. In the meantime, I want to share with you guys some really cool demos and source code I presented at the sessions.

Here is a link to all the demos that I showed, plus a bunch more: http://zeyadrajabi.members.winisp.net/sourcecode/Open XML demos.zip

This zip file includes demos I have already blogged about plus a few new solutions around using the Open XML SDK with Office Services. Expect to see future blog posts for the demos that haven't been blogged about yet.

Zeyad Rajabi

Open XML SDK Sessions at the SharePoint Conference (SPC)

We're just a couple of weeks away from the start of the SharePoint Conference (SPC) being held in Las Vegas. I want to give you guys a heads up that I will be at the conference from Oct 19-22. I'm pretty excited because I will be giving two different presentations on the Open XML SDK at the conference. Here's the description and times of the two sessions I'll be doing:

Session Title: SharePoint 2010 Based Document Assembly and Manipulation using Word Automation Services and Open XML

Time & Date: 10/20/2009 10:30 AM @ Mandalay Bay H

Session Code: SPC349

Description: The Open XML Formats are the new default file formats for Word, Excel and PowerPoint (docx, xlsx, pptx). With the Open XML SDK, which provides a set of .NET APIs that allows developers to create and manipulate documents in the Open XML Formats, and the Office services available on SharePoint 2010 (Word Automation Services, which performs high-performance bulk document conversions, and Excel Services, which provides server-side support for spreadsheet calculation), developers can now build solutions for server environments without the need of the Office clients. With just a few lines of code you will be able to create rich solutions, like document assembly. Come and check out all the demos and free source code.

Session Title: Deep Dive Open XML and the Open XML SDK

Time & Date: 10/20/2009 1:15 PM @ Mandalay Bay J

Session Code: SPC402

Description: This session will introduce you to Open XML and the tools you use to develop Office document solutions. Using Open XML you have the power to create and edit documents on the server without needing to resort to COM based Office automation. You will get a basic understanding of how to use Open XML for authoring documents, spreadsheets and presentations. You will learn how to leverage the Open XML SDK and other various tools to make your development life easier. After this session is complete you will know everything you need to get going with building document automation processes. Come and check out all the demos and free source code.

Let me know if there are specific demos you guys would like to see and I'll try to work on getting them included. After my presentations, I will make available all my demos and source code on this blog. Besides the presentations, you will be able to find me at the Office client booth during the week of SPC and at the "Ask the Experts" event on Wednesday October 21. If you're at SPC do come by and say hello. I always look forward to hearing feedback from you guys at these conferences.

Zeyad Rajabi

Finding Open XML Errors with Open XML SDK Validation

In a previous post, I gave you an overview of the functionality added to the Open XML SDK 2.0 August 2009 CTP. Today, I want to deep dive into the schema and semantic level validation support within the SDK. Specifically, I am going to show you guys the Open XML SDK code needed to actually validate your Open XML files.

If you've played around with manipulating Open XML files there is a good chance at one point in time your resulting document was considered invalid or corrupt by the applications. You've probably even seen one of these dialogs:

What do you do when you get into this state? A lot of the time the application error dialogs don't really help you debug the issue. Well, that's where the Open XML SDK can help you out. With just a few lines of code you can identify key pieces of information that tell you what the error is and where to find it within the package. Validation with the Open XML SDK 2.0 is accomplished via the OpenXmlValidator class. This class allows you to enumerate all the errors within a file, where each error is represented via the ValidationErrorInfo class. The ValidationErrorInfo class stores the following information:

  • User friendly description of the error
  • An XPath to the exact location of the error
  • The part where this error exists
  • Other elements or parts that are related to this error

Here is a code snippet you can reuse to validate Word documents:

// Namespaces needed by this snippet
using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

try
{
    OpenXmlValidator validator = new OpenXmlValidator();
    int count = 0;
    // The using block closes the package even if validation throws
    using (WordprocessingDocument doc = WordprocessingDocument.Open("InvalidFile.docx", true))
    {
        foreach (ValidationErrorInfo error in validator.Validate(doc))
        {
            count++;
            Console.WriteLine("Error " + count);
            Console.WriteLine("Description: " + error.Description);
            Console.WriteLine("Path: " + error.Path.XPath);
            Console.WriteLine("Part: " + error.Part.Uri);
            Console.WriteLine("-------------------------------------------");
        }
    }
    Console.ReadKey();
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}

The same code can be used to validate Excel and PowerPoint documents. All you need to do is change the Open method to be one of the following:

foreach (ValidationErrorInfo error in validator.Validate(PresentationDocument.Open("InvalidFile.pptx", true)))

or

foreach (ValidationErrorInfo error in validator.Validate(SpreadsheetDocument.Open("InvalidFile.xlsx", true)))

Pretty simple stuff! If you want to jump straight into the code, feel free to download this solution here.

Let's walk through an example of validating and fixing an example corrupt Word document. Given this corrupt document, the Open XML SDK detects the following errors:

Let's look at each of these errors.

Error 1

  • Description: The attribute 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:rsidR' has invalid value '006B4C'. The actual length according to datatype 'hexBinary' is not equal to the specified length. The expected length is 4.
  • Path: /w:document[1]/w:body[1]/w:p[1]
  • Part: /word/document.xml

Let's take a look at the xml within the main document part:

The error indicates that the length of the value for rsidR is not correct. We can fix this issue by changing the value to 00006B4C.
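The "expected length is 4" in the description is counted in bytes: a 4-byte hexBinary value is written as eight hex digits, while '006B4C' has only six. As a rough illustration, here is a small sketch that checks and pads such a value. RsidHelper and its methods are hypothetical names made up for this example and are not part of the Open XML SDK, and padding with leading zeros is only the right fix when that is how the value was truncated.

```csharp
using System;
using System.Linq;

public static class RsidHelper
{
    // Hypothetical helper: true when the value is exactly 4 bytes of
    // hexBinary, i.e. 8 hexadecimal digits.
    public static bool IsValidLongHex(string value)
    {
        return value != null
            && value.Length == 8
            && value.All(Uri.IsHexDigit);
    }

    // Left-pads a shorter hex value with zeros, e.g. "006B4C" -> "00006B4C".
    public static string PadLongHex(string value)
    {
        if (string.IsNullOrEmpty(value) || value.Length > 8 || !value.All(Uri.IsHexDigit))
            throw new ArgumentException("Expected 1 to 8 hex digits.", "value");
        return value.PadLeft(8, '0');
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(RsidHelper.IsValidLongHex("006B4C"));   // False
        Console.WriteLine(RsidHelper.PadLongHex("006B4C"));       // 00006B4C
    }
}
```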

Error 2

  • Description: Element 'DocumentFormat.OpenXml.Wordprocessing.Footnote' referenced by 'footnoteReference@id' does not exist in part '/word/footnotes.xml'. The reference value is '3'.
  • Path: /w:document[1]/w:body[1]/w:p[6]/w:r[2]/w:footnoteReference[1]
  • Part: /word/document.xml

Let's take a look at the xml within the main document part:

Let's take a look at the xml within the footnotes part:

The error indicates that there is a reference to a footnote using the value "3", but no such value exists in the footnotes part. Let's go ahead and change the footnoteReference to have a value of "2".

Error 3

  • Description: Attribute 'id' should have unique value. Its current value '1' duplicates with others.
  • Path: /w:endnotes[1]/w:endnote[4]
  • Part: /word/endnotes.xml

Let's take a look at the xml within the endnotes part:

The error indicates that more than one endnote specifies the same id value. Let's go ahead and change the values to be unique.
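To get a feel for what a uniqueness check like this involves, here is a minimal sketch that finds duplicate endnote ids using LINQ to XML. It is not how the SDK validator is implemented. EndnoteIdCheck is a hypothetical name, and the sample markup is made up for the example rather than taken from the corrupt document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class EndnoteIdCheck
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns every w:id value used by more than one w:endnote element.
    public static List<string> FindDuplicateIds(XDocument endnotesXml)
    {
        return endnotesXml.Descendants(W + "endnote")
            .Select(e => (string)e.Attribute(W + "id"))
            .Where(id => id != null)
            .GroupBy(id => id)
            .Where(g => g.Count() > 1)
            .Select(g => g.Key)
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample markup with two endnotes sharing id 1.
        XDocument doc = XDocument.Parse(
            "<w:endnotes xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
            "<w:endnote w:id='0'/><w:endnote w:id='1'/><w:endnote w:id='1'/>" +
            "</w:endnotes>");

        foreach (string id in EndnoteIdCheck.FindDuplicateIds(doc))
            Console.WriteLine("Duplicate endnote id: " + id);   // Duplicate endnote id: 1
    }
}
```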

End Result

After making these fixes we should be able to open the fixed document with no issues as shown below:

Try out the validation functionality and let us know what you think.

Zeyad Rajabi

Open XML SDK Code Snippets

In my previous post, I announced the release of the Open XML SDK August 2009 CTP. Today, I want to announce the release of the Open XML SDK code snippets. This package of code snippets provides over fifty reusable code samples, in both C# and VB.NET, which accomplish many common tasks involving Excel, PowerPoint, or Word documents. Looking back at the architecture diagram for the Open XML SDK 2.0, these code snippets are part of the high level helper functions:

Architecture diagram

Erika Ehrli provided a quick summary on all the code snippets in the following post.

Using the Open XML SDK Code Snippets

Let's walk through a quick example of using the Open XML SDK code snippets. In this example, we are given a spreadsheet with a table of data and are asked to read and change a particular cell value. Here is a screenshot of the spreadsheet:

Spreadsheet example

Let's say we are asked to read the value of C4 and then change the value from "Austin" to "Houston".

Here is how you would accomplish this scenario using the Open XML SDK code snippets and the Open XML SDK 2.0:

  1. Create a solution in Visual Studio 2008
  2. Add references to the Open XML SDK 2.0 (DocumentFormat.OpenXml.dll) and WindowsBase.dll
  3. Enable the code snippets for your solution by following these steps
  4. To read a cell value, add a new method to your solution based on the Open XML SDK code snippets. In particular, add the "Excel: Get cell value given row and column" code snippet, which retrieves a cell value given its row and column numbers, or a row number and column name
  5. Use the following code to read and display the value for C4:

    string c4Value = XLGetCellValueRowCol("output.xlsx", "Sheet1", "C", 4);
    Console.WriteLine("The value for C4 is: " + c4Value);

  6. To change a cell value, add a new method to your solution based on the "Excel: Insert string into cell" code snippet, which given a document name, a worksheet name, a cell name, and a value, inserts text into the specified cell
  7. Use the following code to change the value of C4 to "Houston"

XLInsertStringIntoCell("output.xlsx", "Sheet1", "C4", "Houston");

At the end of step #7 we end up with the following Excel spreadsheet:

Spreadsheet end result

Pretty easy with the code snippets! If you are interested in the full solution you can find it here.
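Notice that the two snippets address the same cell in different ways: one takes a column name and a row number ("C", 4) and the other takes a cell name ("C4"). If you ever need to move between column numbers, column names, and cell names yourself, a sketch along these lines may help. CellRef and its methods are hypothetical helpers written for this illustration and are not part of the code snippets package.

```csharp
using System;

public static class CellRef
{
    // Converts a column name to a 1-based index: "A" -> 1, "Z" -> 26, "AA" -> 27.
    public static int ColumnIndex(string columnName)
    {
        if (string.IsNullOrEmpty(columnName))
            throw new ArgumentException("Column name is required.", "columnName");
        int index = 0;
        foreach (char c in columnName.ToUpperInvariant())
        {
            if (c < 'A' || c > 'Z')
                throw new ArgumentException("Column name must contain only letters.", "columnName");
            index = index * 26 + (c - 'A' + 1);
        }
        return index;
    }

    // Converts a 1-based column index back to a column name: 3 -> "C".
    public static string ColumnName(int columnIndex)
    {
        if (columnIndex < 1) throw new ArgumentOutOfRangeException("columnIndex");
        string name = "";
        while (columnIndex > 0)
        {
            columnIndex--;
            name = (char)('A' + columnIndex % 26) + name;
            columnIndex /= 26;
        }
        return name;
    }

    // Builds a cell name such as "C4" from a column index and a row number.
    public static string Address(int columnIndex, int row)
    {
        return ColumnName(columnIndex) + row;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(CellRef.ColumnIndex("C"));   // 3
        Console.WriteLine(CellRef.Address(3, 4));      // C4
    }
}
```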

Zeyad Rajabi

Announcing the Release of the August 2009 CTP for the Open XML SDK

I'm really happy to announce the release of the 3rd CTP for the Open XML SDK 2.0 for Microsoft Office! So what did we do in this CTP? Well, there were three main improvements we made to the SDK:

  1. Add semantic level validation support
  2. Add markup compatibility/extensibility support
  3. General improvements based on your feedback

Semantic Level Validation Support

Let's go back to the Open XML SDK architecture diagram I showed you when we first announced the Open XML SDK:

As mentioned in a previous post, the April 2009 CTP of the Open XML SDK added schema level validation support for Office 2007 Open XML files. In the August 2009 CTP, one of the big things we added is semantic level validation support for Office 2007 Open XML files:

Semantic level validation goes beyond the restrictions or rules defined by schemas. It allows developers to validate files against restrictions defined within the prose of the Open XML documentation. These are restrictions that cannot be expressed in XSD.

Let's look at a semantic level restriction example. Specifically, let's look at the element endnote (Section 17.11.2 of Part 1 in the ISO/IEC-29500 specification). In the standard, it states that the id attribute of endnote, "specifies a unique ID which shall be used to match the contents of a footnote or endnote to the associated footnote/endnote reference mark … If more than one footnote shares the same ID, then this document shall be considered non-conformant. If more than one endnote shares the same ID, then this document shall be considered non-conformant." As you can see, having more than one endnote with the same id value will result in a non-conformant document. This non-conformant document may not be interpreted properly by a consuming application, like Word.

The Open XML SDK can now help you find these types of problems and will report the error to you by giving you the following information:

  1. User friendly description of the error
    • In this case, imagine seeing the following error "Attribute 'id' should have unique value. Its current value '1' duplicates with others."
  2. An XPath to the exact location of the error
    • In this case, imagine seeing the following path "/w:endnotes[1]/w:endnote[4]," which indicates that the problem exists in the fourth endnote element
  3. The part where this error exists
    • In this case, imagine seeing the following part information "DocumentFormat.OpenXml.Packaging.EndnotesPart"

We hope that you can use this type of information to more easily find and fix problems. I will devote at least one blog post in the future to go into details on the validation functionality.

Markup Compatibility/Extensibility Support

As defined by the ISO/IEC-29500 specification, there are several ways to extend markup within the Open XML formats. Some of the extension mechanisms, like ignorable content and alternate content blocks, may result in differences within the XML tree structure of a document. Here is an example of markup that contains an alternate content block:

<w:document mc:Ignorable="w14 wp14">
  <w:body>
    <w:p w:rsidR="00FA0A01" w:rsidRDefault="00AF5A8F">
      <w:r>
        <w:rPr/>
        <mc:AlternateContent>
          <mc:Choice Requires="wps">
            <w:drawing>…… </w:drawing>
          </mc:Choice>
          <mc:Fallback>
            <w:pict>
              <v:roundrect id="Rounded Rectangle 1" o:spid="_x0000_s1026" style="position:absolute… " arcsize="10923f" o:gfxdata="" fillcolor="#4f81bd" strokecolor="#385d8a" strokeweight="2pt">
                <v:textbox style="mso-rotate-with-shape:t"/>
              </v:roundrect>
            </w:pict>
          </mc:Fallback>
        </mc:AlternateContent>
      </w:r>
    </w:p>
    <w:sectPr w:rsidR="00FA0A01">…… </w:sectPr>
  </w:body>
</w:document>

In the example above, the expected child of the run element differs depending on the chosen alternate content choice. The fallback choice is what one would expect from a document created in Office 2007, while the choice requiring the wps namespace is from a document created in Office 2010. Imagine you are a solution developer working with Open XML who has deployed a solution that works perfectly on top of Office 2007 Open XML files. How would your solution work with files coming in from Office 2010? Specifically, would your solution work with documents that contain these types of extension mechanisms?

As part of the August 2009 CTP we have added functionality that allows developers to abstract away some of the difficulty intrinsic to markup compatibility and extensibility. This feature allows you to preprocess the content of Open XML files based on specific Office versions. Using the example above, if we use the August CTP to open the document based on Office 2007 we will only see the following XML markup:

<w:document>
  <w:body>
    <w:p w:rsidR="00FA0A01" w:rsidRDefault="00AF5A8F">
      <w:r>
        <w:rPr/>
        <w:pict>
          <v:roundrect id="Rounded Rectangle 1" o:spid="_x0000_s1026" style="position:absolute… " arcsize="10923f" o:gfxdata="" fillcolor="#4f81bd" strokecolor="#385d8a" strokeweight="2pt">
            <v:textbox style="mso-rotate-with-shape:t"/>
          </v:roundrect>
        </w:pict>
      </w:r>
    </w:p>
    <w:sectPr w:rsidR="00FA0A01">…… </w:sectPr>
  </w:body>
</w:document>

If your solution expected a pict element as a child of a run element, then your solution would work perfectly with this file. In other words, using this feature, solutions won't break when future versions of Office introduce new markup into the format.
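To make the idea of preprocessing more concrete, here is a rough sketch of the simplest possible version of it: replacing every alternate content block with the contents of its fallback. This is only an illustration using LINQ to XML, not the SDK's implementation. It ignores the other markup compatibility rules (for example, it never selects a Choice and does not strip ignorable elements), and FallbackOnly is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class FallbackOnly
{
    static readonly XNamespace Mc =
        "http://schemas.openxmlformats.org/markup-compatibility/2006";

    // Replaces each mc:AlternateContent with the children of its mc:Fallback
    // (or removes it when there is no fallback), then drops mc:Ignorable.
    public static void Apply(XElement root)
    {
        XElement ac;
        // Re-query on every pass so nested blocks are handled as well.
        while ((ac = root.Descendants(Mc + "AlternateContent").FirstOrDefault()) != null)
        {
            XElement fallback = ac.Element(Mc + "Fallback");
            if (fallback != null)
                ac.ReplaceWith(fallback.Elements().ToList());
            else
                ac.Remove();
        }

        foreach (XAttribute attr in root.DescendantsAndSelf()
                     .Attributes(Mc + "Ignorable").ToList())
            attr.Remove();
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up run element with one alternate content block.
        XElement run = XElement.Parse(
            "<w:r xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'" +
            " xmlns:mc='http://schemas.openxmlformats.org/markup-compatibility/2006'>" +
            "<mc:AlternateContent>" +
            "<mc:Choice Requires='wps'><w:drawing/></mc:Choice>" +
            "<mc:Fallback><w:pict/></mc:Fallback>" +
            "</mc:AlternateContent></w:r>");

        FallbackOnly.Apply(run);
        Console.WriteLine(run.Elements().Single().Name.LocalName);   // pict
    }
}
```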

General Improvements

First off we want to thank everyone for their feedback and suggestions! Based on your feedback we made the following big changes to the SDK:

  • AutoSave: By default, previous CTPs of the SDK forced you to perform a manual save for changes made to specific parts within the package. We have now introduced the concept of AutoSave, where changes are automatically saved into the package, without the need to call Save() methods. For those not interested in this functionality, there is a way to turn off this feature.
  • Base Classes for CustomXml objects and Sdt objects: The SDK currently has multiple classes to represent CustomXml and Sdt objects based on the different types of elements specified in the standard. The August 2009 CTP has introduced one base class for each of these objects in order to make it easier for you to develop solutions. In other words, your solution can now just work on the following two abstract classes: CustomXmlElement for CustomXml objects and SdtElement for Sdt objects.
  • Simple types for Boolean type attributes: The standard specifies the concept of a simple type called ST_OnOff, which allows for values like "On", "Off", "True", "False", "0", and "1". We have updated the SDK to allow you to directly get/set such attributes using standard C# Boolean values. For example, you can now set attribute values to false or true. Without this enhancement you were forced to compare values using the enum BooleanValues.
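As an illustration of what the Boolean mapping in the last bullet saves you from writing, here is a minimal sketch that turns an ST_OnOff style string into a C# bool. OnOff.Parse is a hypothetical helper, not an SDK method, and treating the strings as case-insensitive is an assumption made for this example.

```csharp
using System;

public static class OnOff
{
    // Maps the ST_OnOff spellings listed above onto a C# bool.
    public static bool Parse(string value)
    {
        switch ((value ?? "").Trim().ToLowerInvariant())
        {
            case "on":
            case "true":
            case "1":
                return true;
            case "off":
            case "false":
            case "0":
                return false;
            default:
                throw new FormatException("Not an on/off value: " + value);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(OnOff.Parse("On"));   // True
        Console.WriteLine(OnOff.Parse("0"));    // False
    }
}
```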

What's Next?

Our next task for the SDK is to add Office 2010 Open XML support. Expect to see another CTP with this functionality released in the next several months. Our goal is to be done with the Open XML SDK 2.0 around the same time as Office 2010 ships (date not public yet).

More Feedback Always Welcome

Please continue to send us your feedback, either on this blog or at our Microsoft Connect site for the Open XML SDK https://connect.microsoft.com/site/sitehome.aspx?SiteID=589. We look forward to hearing from you.

Zeyad Rajabi

Native Code Open Packaging Convention APIs

In my introduction post on the Open XML SDK I mentioned that the SDK is built on top of System.IO.Packaging. System.IO.Packaging is a set of APIs that are part of .NET Framework 3.0, which allow developers to create and manipulate documents based on the Open Packaging Conventions (OPC). Given that Open XML Formats are based on OPC, the SDK uses System.IO.Packaging APIs to open, edit, create, and save Open XML packages.

Some of you have left comments on this blog and on the Open XML SDK forum asking about native code APIs to manipulate Open XML files. I have some good news for you guys. Included in Windows 7 will be all new native code Win32 Packaging APIs. For more information on this new API, check out the OPC team blog. In particular check out the post comparing the managed version of the API with the native code version.

Zeyad Rajabi

Adding Repeating Data to PowerPoint

In a previous post, I showed you how to create a product catalog in a Word document using the Open XML SDK. I also showed you how to make PowerPoint a reporting application based on data within a database. Today, I am going to show you how to create a product catalog in a PowerPoint deck using the Open XML SDK. One of the key things I am going to show you in this post is how to create repeating data within a table on a slide.

If you want to jump straight into the code, feel free to download this solution here.

Scenario

Imagine a scenario where I'm a developer working for a fictional company called Adventure Works. In my company, we use a database to store our entire product inventory. The sales and marketing teams have asked me to build a report generation tool that is able to take the list of products and create a viewable presentation. In other words, these teams want to use a PowerPoint deck to showcase all our company's products.

Solution

Before I get into the details of my solution I want to state that I am using the freely available Adventure Works database built for SQL Server 2008.

The scenario I listed above talks about reading product data and creating a report in the form of a presentation. The products in the database are organized into categories, like clothing or bikes, and subcategories, like mountain bikes or road bikes. To make viewing the resulting presentation easier I want to make sure my slides are also organized and separated based on categories. To accomplish this task, the first thing I need to do is create a presentation template that I can use for my solution. In this case, my presentation template will contain three slides:

  1. Title slide – This slide represents the title of the presentation deck
  2. Template category section slide – This slide represents the divider that will separate different sections of my deck based on categories
  3. Template product table slide – This slide represents the table I will extend based on the products in my database

My presentation template will look like the following:

Given this template, here is one way to automatically generate a PowerPoint deck based off data from a database:

  1. Open up the template presentation via the Open XML SDK and access its main presentation part
  2. From the presentation part, access the two template slides (category section slide and product table slide)
  3. Connect to the Adventure Works database and query for the list of products sorted by category
  4. Go through every product returned in the query
  5. For every new category encountered, clone the template category section slide and change the placeholder text to be the actual category name
  6. Clone the template product table slide at the start of every new section or if the product table becomes too large and needs to be extended to the next slide
  7. For every product, add an image part to the slide and feed the image part with data from the database
  8. For every product, add a new row to the appropriate cloned table slide. These rows will contain five cells, where the first cell will contain a background image of the product, and the other four cells will contain text
  9. Delete the two template slides
  10. Save and close the presentation

The Code

I should note that this solution will reuse a bit of functionality and methods from my previous post on creating a presentation report based on data. Specifically, I am going to reuse the following methods:

  1. SlidePart CloneSlidePart(PresentationPart presentationPart, SlidePart slideTemplate)
    1. This method will be used to actually clone slide parts
  2. void SwapPlaceholderText(SlidePart slidePart, string placeholder, string value)
    1. This method will be used to swap placeholder text for a new value
  3. void DeleteTemplateSlide(PresentationPart presentationPart, SlidePart slideTemplate)
    1. This method will be used to delete the template slides. I extended this method to have an additional parameter in order to specify the relationship id of the part to be deleted. It seems that my previous post actually hardcoded the value in the method

In addition to the three above methods, I will also reuse functionality from my previous post on pushing data from a database into a Word document. Specifically, I am going to reuse the following method:

  1. void CalculateImageEmus(Bitmap bitmap, out int widthInEmu, out int heightInEmu)
    1. This method will be used to calculate image widths and heights in EMUs. The image height will be used to calculate the row height

Reusing these methods will actually make coding this solution a lot easier!
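For readers who have not seen the EMU calculation before: there are 914,400 EMUs (English Metric Units) in an inch, so converting a pixel measurement only requires the image resolution. Here is a minimal sketch of that arithmetic. It is a simplified stand-in that takes pixel counts and a DPI value directly rather than a Bitmap, so it is not the CalculateImageEmus method from the earlier post, and Emu.FromPixels is a hypothetical name.

```csharp
using System;

public static class Emu
{
    // English Metric Units per inch.
    public const long PerInch = 914400;

    // Converts a length in pixels to EMUs for the given resolution (dots per inch).
    public static long FromPixels(int pixels, double dpi)
    {
        if (dpi <= 0) throw new ArgumentOutOfRangeException("dpi");
        return (long)Math.Round(pixels * PerInch / dpi);
    }
}

public static class Program
{
    public static void Main()
    {
        // A 96 pixel tall image at 96 DPI is one inch tall.
        Console.WriteLine(Emu.FromPixels(96, 96));   // 914400
        Console.WriteLine(Emu.FromPixels(1, 96));    // 9525
    }
}
```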

The first couple of steps require us to open the presentation template and access the two template slide parts. Here is the code snippet used to accomplish this task:

using (PresentationDocument myPres = PresentationDocument.Open("output.pptx", true))
{
    PresentationPart presPart = myPres.PresentationPart;
    SlidePart sectionSlidePart = (SlidePart)presPart.GetPartById("rId3");
    SlidePart tableSlidePart = (SlidePart)presPart.GetPartById("rId4");
    ...
}

The next step is to query the database to retrieve a list of products sorted by category. Here is the code snippet used to accomplish this task:

AdventureWorksDataContext db = new AdventureWorksDataContext();
var productQuery =
    from p in db.Products
    join PM in db.ProductModels on p.ProductModelID equals PM.ProductModelID
    join PSC in db.ProductSubcategories on p.ProductSubcategoryID equals PSC.ProductSubcategoryID
    join PC in db.ProductCategories on PSC.ProductCategoryID equals PC.ProductCategoryID
    orderby PC.Name, PSC.Name
    select p;

Now we should be able to go through the list of products. As described in the solution section, for every product we encounter in our query we may need to add a new category slide or a new product table slide. We will add a new category slide if we encounter a new category (remember we are sorting our products based on category). Here is the code snippet used to accomplish this task:

string section = "";
foreach (var product in productQuery)
{
string category = product.ProductSubcategory.ProductCategory.Name;
string subcategory = product.ProductSubcategory.Name;
string model = product.ProductModel.Name;
string productName = product.Name;
decimal price = Math.Round(product.ListPrice, 2);
if (section != category)
{
SlidePart newSectionPart = CloneSlidePart(presPart, sectionSlidePart);
SwapPlaceholderText(newSectionPart, "Section Title", category);
section = category;
overflow = true;
}
...
}

We will add a new product table slide whenever we encounter the first product within a category or if adding a new row to the current slide's table would cause the table to be too high and off of the visible slide. Here is the code snippet used to accomplish this task:

bool overflow = false;
int totalHeight = 0;
SlidePart current = null;
foreach (var product in productQuery)
{
...
if (overflow)
{
SlidePart newTablePart = CloneSlidePart(presPart, tableSlidePart);
SwapPlaceholderText(newTablePart, "Section", category);
current = newTablePart;
overflow = false;
totalHeight = 0;
}
...
}

Notice that I am using the Boolean value of overflow to indicate whether or not a table has too much content in it already. The next step is to add an image for every product we encounter. In addition, we need to calculate the height of the image so that we can keep track of how much content is in the current table. The height of the image will be used for the height of the added row. If the height becomes too large then we will need to make sure that overflow is set to true. Here is the code snippet used to accomplish this task:

string imageRel = "imageRelId";
int imageRelId = 1;
int widthInEmu, heightInEmu;
...
// Read the photo bytes once and reuse them for the image part and the size calculation
byte[] photoBytes = product.ProductProductPhotos.First().ProductPhoto.LargePhoto.ToArray();
ImagePart imagePart = current.AddImagePart(ImagePartType.Gif, imageRel + imageRelId);
using (MemoryStream imageStream = new MemoryStream(photoBytes))
{
    imagePart.FeedData(imageStream);
}
// We need to know the proper dimensions of the image in EMUs
using (MemoryStream bitmapStream = new MemoryStream(photoBytes))
using (Bitmap bitmap = new Bitmap(bitmapStream))
{
    CalculateImageEmus(bitmap, out widthInEmu, out heightInEmu);
}
totalHeight += heightInEmu;
if (totalHeight > 4000000)
    overflow = true;
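
Putting the last three snippets together, the paging logic can be tried out on its own without a database or a presentation. The sketch below is an illustration only: SlidePlanner and Plan are hypothetical names, each input pair is a category name plus a row height in EMUs, and the output is just a list of labels standing in for the cloned slides. The 4000000 threshold is the one used in the snippet above.

```csharp
using System;
using System.Collections.Generic;

public static class SlidePlanner
{
    // Same height limit as the snippet above.
    public const int MaxTableHeightEmu = 4000000;

    // Hypothetical helper: returns one label per slide that would be cloned.
    public static List<string> Plan(IEnumerable<KeyValuePair<string, int>> rows)
    {
        var slides = new List<string>();
        string section = "";
        bool overflow = false;
        int totalHeight = 0;

        foreach (KeyValuePair<string, int> row in rows)
        {
            // A new category gets a section slide and forces a fresh table slide.
            if (section != row.Key)
            {
                slides.Add("section:" + row.Key);
                section = row.Key;
                overflow = true;
            }

            // Start a new table slide when the previous table is full.
            if (overflow)
            {
                slides.Add("table:" + row.Key);
                overflow = false;
                totalHeight = 0;
            }

            // The row still lands on the current table; the flag affects the next row.
            totalHeight += row.Value;
            if (totalHeight > MaxTableHeightEmu)
                overflow = true;
        }

        return slides;
    }
}
```

Note that the row which pushes the total past the limit is still added to the current table, which matches the order of operations in the snippets above.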

The next step is to add a new table row that contains five cells. The first cell is going to include the product image as a background image in the cell and the other four cells will contain text. Here is the code snippet used to create the new row:

A.Table tbl = current.Slide.Descendants<A.Table>().First();
A.TableRow tr = new A.TableRow();
tr.Height = heightInEmu;
tr.Append(CreateDrawingCell(imageRel + imageRelId));
tr.Append(CreateTextCell(category));
tr.Append(CreateTextCell(subcategory));
tr.Append(CreateTextCell(model));
tr.Append(CreateTextCell(price.ToString()));
tbl.Append(tr);
imageRelId++;

Note that "A" is a namespace alias, declared as follows:

using A = DocumentFormat.OpenXml.Drawing;

I created two methods to create these two types of table cells. Here is the code snippet used to create a text cell:

static A.TableCell CreateTextCell(string text)
{
A.TableCell tc = new A.TableCell(
new A.TextBody(
new A.BodyProperties(),
new A.Paragraph(
new A.Run(
new A.Text(text)))),
new A.TableCellProperties());
return tc;
}

Here is the code snippet used to create an image cell:

static A.TableCell CreateDrawingCell(string relId)
{
A.TableCell tc = new A.TableCell(
new A.TextBody(
new A.BodyProperties(),
new A.Paragraph()),
new A.TableCellProperties(
new A.BlipFill(
new A.Blip() { Embed = relId },
new A.Stretch(
new A.FillRectangle()))));
return tc;
}
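
It can help to see the markup that a helper like CreateTextCell is expected to produce. The sketch below builds the equivalent DrawingML element with System.Xml.Linq instead of the SDK, purely for illustration. CellXml and TextCell are hypothetical names; the namespace URI is the standard DrawingML main namespace.

```csharp
using System;
using System.Xml.Linq;

public static class CellXml
{
    // DrawingML main namespace, conventionally bound to the "a" prefix.
    static readonly XNamespace A = "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Hypothetical helper: builds the a:tc element for a cell holding a single run of text.
    public static XElement TextCell(string text)
    {
        return new XElement(A + "tc",
            new XAttribute(XNamespace.Xmlns + "a", A.NamespaceName),
            new XElement(A + "txBody",
                new XElement(A + "bodyPr"),
                new XElement(A + "p",
                    new XElement(A + "r",
                        new XElement(A + "t", text)))),
            new XElement(A + "tcPr"));
    }
}
```

Calling CellXml.TextCell("Bikes").ToString() prints the element tree, which is a convenient way to compare against what the Document Reflector shows for a real table cell.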

Almost done! The last step is to delete the two template slides. Here is the code snippet used to accomplish this task:

DeleteTemplateSlide(presPart, sectionSlidePart, "rId3");
DeleteTemplateSlide(presPart, tableSlidePart, "rId4");

End Result

Running this code I end up with a presentation that has over one hundred slides.

Here is another view of the output:

Pretty cool stuff. The best part about this solution is that a designer can easily change the look of the template and still have this solution work as expected. For example, here is the output after the template design has been changed (same code running):

I should also note that all the screenshots above were taken with Office 2010. In other words, this solution works in Office 2007 and Office 2010.

Zeyad Rajabi

Office Extensibility Blog Now Offered in Russian

I wanted to pass along some really exciting news. Brian Jones' blog is now available in Russian! Check it out here: http://blogs.msdn.com/brian_jones_ru/. Our plan is to continue localizing this blog into Russian so that we can share all the Office 2010 extensibility goodness with more people. Let us know if you want to see other Office blogs get localized into Russian as well.

Zeyad Rajabi

Posted by BrianJones | 8 Comments

The Open XML SDK and Fluent UI Extensibility

Fluent UI, or the Ribbon, was introduced as part of Office 2007 as a replacement for the previous system of toolbars and menus. The Fluent UI technology, like the Open XML formats, is based on XML, which allows for a much richer extensibility story for developers. In Erika Ehrli's blog post she described several tools and resources related to Fluent UI extensibility. I wanted to take this opportunity to extend her post by showing you how the Open XML SDK can be used to extend or actually control custom UI within documents.

As is the case with other features in Open XML files, the Open XML SDK 2.0 for Microsoft Office supports Fluent UI through strongly typed access to the custom UI XML part as well as strongly typed access to the underlying XML contained within the custom UI XML part. In other words, you can easily add, remove or modify custom UI for a particular document or set of documents using the SDK. In today's post, I am going to show you how to add custom UI to a set of documents within a directory.

If you want to jump straight into the code, feel free to download this solution here.

Scenario

Imagine a scenario where I would like to programmatically add custom UI to a set of documents based on custom UI within a specific template. These documents can exist within a directory or a SharePoint library. In this scenario, my template is going to contain the same custom UI as designed by Frank Rice in his Office developer resources ribbon UI addin blog post. For the sake of this example, let's say I am starting with the following Word document as my template:

This custom UI provides a tab called Office Developer Resources, which contains commands that take you to specific web sites. The code behind these commands is actually contained within macros within the file. In order to add this custom UI to other documents we need to copy over the custom UI as well as the macros that power those commands.

The Solution

To add custom UI to a set of documents we can take the following actions:

  1. Go through all files within a specific directory
  2. If a file is a macro-free document then convert the document to a macro-enabled document
  3. Open the template file using the Open XML SDK
  4. Grab the ribbon extensibility part (contains the custom UI)
  5. Grab the VBA project part (contains the macros)
  6. Open the converted file (or the original file if it was a macro-enabled file) and import both the ribbon and macro parts
    • The file can only contain one instance of each part so be sure to remove preexisting ribbon and macro parts from the file

Note that the steps outlined above are just one method to accomplish this scenario. Another more practical use of this solution is to go through a list of files contained within a SharePoint library instead of a directory on disk.

The Code

The first step as outlined in the solution section above requires us to go through a directory of files. Here is the code snippet to accomplish this task:

string[] files = Directory.GetFiles(@"D:\Open XML SDK demos\Word\DeployCustomUI\DeployCustomUI\bin\Debug\Files");

foreach (string filename in files)
{
    string newFileName;

    if (filename.EndsWith(".docx", StringComparison.OrdinalIgnoreCase))
        newFileName = ChangeDocumentType(filename);
    else if (filename.EndsWith(".docm", StringComparison.OrdinalIgnoreCase))
        newFileName = filename;
    else
        continue; // Skip anything that is not a Word document

    ImportCustomUI(templateFile, newFileName);
}
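
The decision made inside that loop can also be pulled out into a small function, which makes it easy to test without touching the file system. This is a sketch only: WordFileKind, WordFileFilter and Classify are hypothetical names. It also skips Word's temporary owner files (the ones whose names start with "~$"), which would otherwise be picked up by a plain extension check.

```csharp
using System;
using System.IO;

public enum WordFileKind { MacroFree, MacroEnabled, Other }

public static class WordFileFilter
{
    // Hypothetical helper: decides how a file in the directory should be treated.
    public static WordFileKind Classify(string path)
    {
        string name = Path.GetFileName(path);

        // Word creates "~$..." owner files next to open documents; they are not real documents.
        if (name.StartsWith("~$", StringComparison.Ordinal))
            return WordFileKind.Other;

        string extension = Path.GetExtension(name);

        if (string.Equals(extension, ".docx", StringComparison.OrdinalIgnoreCase))
            return WordFileKind.MacroFree;      // needs converting to .docm first
        if (string.Equals(extension, ".docm", StringComparison.OrdinalIgnoreCase))
            return WordFileKind.MacroEnabled;   // can be used as-is

        return WordFileKind.Other;              // ignore everything else
    }
}
```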

If a file in the directory turns out to be a macro-free file we need to change the document type to be a macro-enabled file. Here is the code snippet used to accomplish this task:

static string ChangeDocumentType(string filename)
{
    using (WordprocessingDocument myDoc = WordprocessingDocument.Open(filename, true))
    {
        myDoc.ChangeDocumentType(WordprocessingDocumentType.MacroEnabledDocument);
    }

    string newFileName = Path.Combine(
        Path.GetDirectoryName(filename),
        Path.GetFileNameWithoutExtension(filename) + "(was docx).docm");

    // File.Move renames the file, so the original no longer exists and needs no separate delete
    File.Move(filename, newFileName);

    return newFileName;
}

Notice how the Open XML SDK provides functionality to switch document types. At this point we are ready to import our custom UI from our template into our files within the directory. Below is the code snippet necessary to grab the custom UI and VBA project parts from the template document:

static void ImportCustomUI(string templateFile, string outputFile)
{
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(templateFile, true))
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
 
RibbonExtensibilityPart customRibbonPart =
myDoc.GetPartsOfType<RibbonExtensibilityPart>().First();
ExtendedPart vbaPart = null;
 
foreach (IdPartPair partPair in mainPart.Parts)
{
if (partPair.OpenXmlPart.RelationshipType == "http://schemas.microsoft.com/office/2006/relationships/vbaProject")
{
vbaPart = (ExtendedPart)partPair.OpenXmlPart;
break;
}
}
 
AddCustomUIParts(outputFile, customRibbonPart, vbaPart);
}
}

Now that we have the two parts from our template file we are ready to import them into our output file. Below is the code snippet necessary to accomplish this task:

static void AddCustomUIParts(string filename, RibbonExtensibilityPart customRibbonPart, ExtendedPart vbaPart)
{
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(filename, true))
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
 
if (myDoc.GetPartsCountOfType<RibbonExtensibilityPart>() > 0)
myDoc.DeletePart(myDoc.GetPartsOfType<RibbonExtensibilityPart>().First());
 
myDoc.AddPart<RibbonExtensibilityPart>(customRibbonPart);
 
ExtendedPart extendedPart = null;
 
foreach (IdPartPair partPair in mainPart.Parts)
{
if (partPair.OpenXmlPart.RelationshipType ==
"http://schemas.microsoft.com/office/2006/relationships/vbaProject")
{
extendedPart = (ExtendedPart)partPair.OpenXmlPart;
break;
}
}
 
if (extendedPart != null)
mainPart.DeletePart(extendedPart);
 
if (vbaPart != null)
mainPart.AddPart<ExtendedPart>(vbaPart);
}
}

Notice in the code above that I am removing any previous custom UI or VBA part within the output document. Instead of deleting these parts I could have chosen to do some kind of merge, although that approach would have been more difficult.

End Result

Running this code I should end up with a directory full of macro-enabled files that all have custom UI enabled. Here is a screenshot of one of my documents, which has the imported custom UI:

Zeyad Rajabi

Embedding Any File Type, Like PDF, in an Open XML File

In my last post, I showed you guys how to embed an Excel spreadsheet within a Word document without the need to invoke an OLE Server. In today's post I am going to show you how to embed any file in an Open XML file. Specifically, I am going to show you how to embed a PDF file into a Word document. Note that this approach requires you to invoke an OLE Server to embed the file into an Open XML file.

My post will talk about using version 2 of the SDK.

If you just want to jump straight into the code, feel free to download this solution here.

Solution

To embed a PDF file into a Word document we can take the following actions:

  1. Create a template in Word that contains a content control that will be used to demarcate the region where the embedded object will be inserted
  2. Open up the Word document via the Open XML SDK and access its main document part
  3. Invoke the OLE server application associated with PDF files to create an IStorage and an image of the embedded object
  4. Add an image part to the document
  5. Feed the data from the generated image into the added image part
  6. Add an embedded object part to the document
  7. Feed the data from the generated IStorage into the embedded object part
  8. Determine the prog id associated with the application associated with PDF files
  9. Create a paragraph that contains the embedded object
  10. Locate the content control that will contain the embedded object
  11. Swap out the content control for the newly created paragraph
  12. Save changes made to the Word document

Note that the steps outlined above are just one method to accomplish this scenario. The steps above are very similar to my previous post showing you how to embed an Excel spreadsheet within a Word document. The main difference is in how we go about adding the embedded object to the Word document. No application, at least on my computer, has written out a subkey IPersistStorageType under HKCR\CLSID\{Apps_OLE_Storage_CLSID} for PDF files, which means there is no way for us to know the required structure of an IStorage containing a PDF file. Instead we are required to rely on the OLE server application associated with PDF files to generate the appropriate IStorage.

For the sake of this example, let's say I am starting with the following Word document:

This document contains a content control, named "EmbedObject," which will contain my embedded object. In addition, let's say I have the following PDF file I wish to embed:

The Code

As mentioned in my previous post, embedding an object in a document requires both a visual representation of the object and the underlying data. In this post, I am going to show you how to generate the IStorage and the image representing the embedded object by invoking the OLE Server associated with PDF files. To create the underlying data for a non-Office embedded object we need to look up the prog id of the application associated with the file format extension. To get this data we need to look under HKCR\.XXX within the registry, where XXX is the file format extension (e.g. PDF). Under this key you should see at least two values: "(Default)" and "Content Type." The data stored in "(Default)" is the prog id of the application associated with the file format. On my computer, the prog id associated with PDF files is "AcroExch.Document."

Since we don't know the structure of the embedded object we shouldn't use the content type associated with the file format extension. Instead, we should use the generic content type for embedded objects, which is "application/vnd.openxmlformats-officedocument.oleObject."

Our next step is to create the IStorage and an image representation for the embedded object. As mentioned in the Solution section above, we need to invoke the OLE Server associated with PDF files. Below is the C++ code needed to accomplish this task:

//********** This snippet is C++ code *************//
HRESULT PackageOleObject(LPCTSTR inputFile, LPCTSTR outputFile)
{
HRESULT hr = S_OK;
IStoragePtr pStorage = NULL;
IOleObjectPtr pOle = NULL;
IDataObjectPtr pdo = NULL;
FORMATETC fetc;
STGMEDIUM stgm;
HENHMETAFILE hmeta = NULL;
 
// Create a compound storage document.
hr = StgCreateStorageEx (
outputFile,
STGM_READWRITE | STGM_SHARE_EXCLUSIVE | STGM_CREATE | STGM_TRANSACTED,
STGFMT_DOCFILE,
0,
NULL,
NULL,
IID_IStorage,
reinterpret_cast<void**>(&pStorage));
CheckHr(hr);
    
// Create OLE package from file.
hr = OleCreateFromFile(CLSID_NULL, inputFile, ::IID_IOleObject,
OLERENDER_NONE, NULL, NULL, pStorage, (void**)&pOle);
CheckHr(hr);
 
hr = OleRun(pOle);
CheckHr(hr);
 
hr = pOle->QueryInterface(IID_IDataObject, (void**)&pdo);
CheckHr(hr);
 
fetc.cfFormat = CF_ENHMETAFILE;
fetc.dwAspect = DVASPECT_CONTENT;
fetc.lindex = -1;
fetc.ptd = NULL;
fetc.tymed = TYMED_ENHMF;
 
stgm.hEnhMetaFile = NULL;
stgm.tymed = TYMED_ENHMF;
hr = pdo->GetData(&fetc, &stgm);
CheckHr(hr);
 
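// Create image metafile for object. CopyEnhMetaFile returns a handle to the copy,
// which is released at the end of the function. emfFile is the output .emf path;
// it is not declared in this excerpt.
hmeta = CopyEnhMetaFile(stgm.hEnhMetaFile, emfFile);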
 
hr = pStorage->Commit(STGC_DEFAULT );
CheckHr(hr);
 
pOle->Close(0);
DeleteEnhMetaFile(stgm.hEnhMetaFile);
DeleteEnhMetaFile(hmeta);    
    
return hr;
}

The above C++ code snippet will create two output files that represent the IStorage and the image representation for our embedded object.

We are now ready to accomplish the rest of the steps. Here is how you add the appropriate image data and embedded object data to a Word file:

using (WordprocessingDocument myDoc = WordprocessingDocument.Open(output, true))
{
    MainDocumentPart mainPart = myDoc.MainDocumentPart;

    //Note that I created this emf file using my C++ solution
    ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Emf);
    using (FileStream imageStream = File.Open("output.emf", FileMode.Open))
    {
        imagePart.FeedData(imageStream);
    }

    EmbeddedObjectPart embeddedObjectPart =
        mainPart.AddEmbeddedObjectPart(@"application/vnd.openxmlformats-officedocument.oleObject");

    //Note that I created this bin file using my C++ solution
    using (FileStream objectStream = File.Open("input.pdf.bin", FileMode.Open))
    {
        embeddedObjectPart.FeedData(objectStream);
    }

    ...
}

I should note that both the image and the embedded data were created using my C++ code that I showed you earlier in this post. The next step is to create a paragraph that represents our embedded object. Using the Document Reflector to help me out, I was able to create the following method:

static Paragraph CreateEmbeddedPDFParagraph(string imageId, string embedId, string progId)
{
Paragraph p =
new Paragraph(
new Run(
new EmbeddedObject(
new V.Shapetype(
new V.Stroke() { JoinStyle = V.StrokeJoinStyleValues.Miter },
new V.Formulas(
new V.Formula() { Equation = "if lineDrawn pixelLineWidth 0" },
new V.Formula() { Equation = "sum @0 1 0" },
new V.Formula() { Equation = "sum 0 0 @1" },
new V.Formula() { Equation = "prod @2 1 2" },
new V.Formula() { Equation = "prod @3 21600 pixelWidth" },
new V.Formula() { Equation = "prod @3 21600 pixelHeight" },
new V.Formula() { Equation = "sum @0 0 1" },
new V.Formula() { Equation = "prod @6 1 2" },
new V.Formula() { Equation = "prod @7 21600 pixelWidth" },
new V.Formula() { Equation = "sum @8 21600 0" },
new V.Formula() { Equation = "prod @7 21600 pixelHeight" },
new V.Formula() { Equation = "sum @10 21600 0" }),
new V.Path() { AllowGradientShape = V.BooleanValues.T, ConnectionPointType = OVML.ConnectValues.Rectangle, AllowExtrusion = V.BooleanValues.F },
new OVML.Lock() { Extension = V.ExtensionHandlingBehaviorValues.Edit, AspectRatio = OVML.BooleanValues.T }
) { Id = "_x0000_t75", CoordinateSize = "21600,21600", Filled = V.BooleanValues.F, Stroked = V.BooleanValues.F, OptionalNumber = 75, PreferRelative = V.BooleanValues.T, EdgePath = "m@4@5l@4@11@9@11@9@5xe" },
new V.Shape(
new V.ImageData() { Title = "", RelationshipId = imageId }
) { Id = "_x0000_i1025", Style = "width:459pt;height:594pt", Ole = V.BooleanEntryWithBlankValues.Empty, Type = "#_x0000_t75" },
new OVML.OleObject() { Type = OVML.OLEValues.Embed, ProgId = progId, ShapeId = "_x0000_i1025", DrawAspect = OVML.OLEDrawAspectValues.Content, ObjectId = "_1309181277", Id = embedId }
) { DxaOriginal = (UInt32Value)9180U, DyaOriginal = (UInt32Value)11881U })
);
return p;
}

The last step of the solution is to swap out the content control for this newly created paragraph. Here is the code snippet to accomplish this task:

Paragraph p = CreateEmbeddedPDFParagraph(
mainPart.GetIdOfPart(imagePart),
mainPart.GetIdOfPart(embeddedObjectPart),
"AcroExch.Document");
 
SdtBlock sdt = mainPart.Document.Descendants<SdtBlock>()
.Where(s => s.GetFirstChild<SdtProperties>().GetFirstChild<Alias>().Val.Value
.Equals("EmbedObject")).First();
 
OpenXmlElement parent = sdt.Parent;
parent.InsertAfter(p, sdt);
sdt.Remove();
mainPart.Document.Save();
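
One thing to watch for in the lookup above is that it throws if the document contains a content control without an alias. A null-safe version of the same idea can be sketched with System.Xml.Linq, working directly on the WordprocessingML markup rather than on the SDK classes. SdtSwap and ReplaceByAlias are hypothetical names, and the sketch operates on an in-memory element, not on a real package.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SdtSwap
{
    // WordprocessingML main namespace, conventionally bound to the "w" prefix.
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: replaces the first w:sdt whose alias matches; returns false if none is found.
    public static bool ReplaceByAlias(XElement root, string alias, XElement replacement)
    {
        XElement sdt = root.Descendants(W + "sdt").FirstOrDefault(s =>
        {
            // Each step may legitimately be missing, so check for null along the way.
            XElement properties = s.Element(W + "sdtPr");
            XElement aliasElement = properties == null ? null : properties.Element(W + "alias");
            XAttribute value = aliasElement == null ? null : aliasElement.Attribute(W + "val");
            return value != null && value.Value == alias;
        });

        if (sdt == null)
            return false;

        sdt.ReplaceWith(replacement);
        return true;
    }
}
```

Returning false instead of throwing lets the caller decide what to do when the template does not contain the expected content control.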

End Result

Running this code I should end up with a document that looks like the following:

Upon activating the embedded object I will see the following:

Let me know if you guys are interested in more solutions around embedded objects.

Zeyad Rajabi

Added video to blog post

Office 2010 Technical Preview

This week is really an exciting week for those of us who work on Office. We just released our first public offering of the Office 2010 Technical Preview. Check out the Office 2010 website for articles, videos, and demos related to this Technical Preview. In today's post I am going to talk about, at a very high-level, some of the really cool developer-centric features we are introducing in Office 2010. In the coming months, in addition to Open XML SDK based solution blog posts, we will blog about Office 2010 based solutions/scenarios.

What's New for Developers in Office 2010?

Check out John Durant's blog post, where he does a great job highlighting some of the benefits Office 2010 provides for developers.  Here is a quick summary of some key additions/tools we have made as part of the Office 2010 wave:

  • Continual innovation of the Open XML SDK – The Open XML SDK remains a big piece of our developer story for Office 2010. We've all seen a shift lately where solutions take more advantage of the cloud and services, and with that shift the need to consume, manipulate or create Office documents on the server becomes more important. The Open XML SDK is a vital tool in this new world for any developer and is a great complement to any service based solution. You've already seen this a bit in some of the services we provided in Office 2007…for example, while the Open XML SDK cannot accomplish recalculation, Excel Services can accomplish this scenario without any issues. Similarly, there may be operations that existing services aren't capable of, but the Open XML SDK is perfectly suited for (document assembly; data extraction; etc.). With this in mind, we are continuing to improve version 2 of the Open XML SDK, so that it fits in well with the new world of software + services. One big area of improvement is to make the Open XML SDK provide more robust validation, so that you will have an easier time ensuring that the server based solutions you write will create valid Office documents. Files you create with the Open XML SDK will work for both Office 2007 and Office 2010. In other words, you will now be able to use the SDK to create Office 2010 based solutions. Since the Open XML SDK will support Office 2010 based Open XML file formats, the final release of the Open XML Format SDK V2.0 will be available at the same time the final version of Office 2010 is available.
  • Tighter relationship between Office and Visual Studio – Visual Studio 2010 makes it even easier to create Office client or SharePoint based solutions. Deployment for Office client solutions created with VS will be much simpler, with fewer runtime requirements. In addition Visual Studio 2010 ships with more out of the box templates, which allows solutions to get off the ground quicker.
  • Even easier to create rich solutions on top of Access 2010 – Check out the following video to see all the cool improvements related to Access. This is a new model for development that we think you'll really enjoy.
  • All Office 2010 applications now have Fluent UI – The ribbon has now been added to more Office applications, like OneNote, Outlook (where it is now fully integrated throughout the UI), and Visio. Since the ribbon is based on XML, having all these applications incorporate the ribbon means a better extensibility story for developers.
  • Fluent UI has been improved – In Office 2010 we have added the ability to programmatically activate tabs in the Fluent UI. For example, you can now have your custom tabs behave like built-in contextual tabs, where tabs only show when specific events are fired.
  • Office 2010 has a new Backstage view – One of the many UI improvements we have made in Office 2010 is the addition of a new extensible Backstage view to the products. This new UI not only improves the overall customer workflow and user experience, but also provides a rich extensible platform for developers. The Backstage view will allow you to add custom UI and elements much in the same way the ribbon provides extensibility. Check out Chris Bryant's brief intro, which showcases the Backstage in action.
  • Offload Excel calculations to High Performance Computing grids – In Excel 2007 we added multi-threaded recalculation (MTR) to Excel, including the ability for developers to create user defined functions (UDFs) that could participate in MTR.  With Excel 2010, we've gone a step further to allow massive parallelization by offloading UDFs from the local machine to a high performance computing grid, with very minimal changes to existing XLL UDFs.

Again, in the next few weeks we will be posting more Office 2010 centric posts, so stay tuned.

- Brian and Zeyad

Updated post to correct a broken link...

Posted by BrianJones | 7 Comments

Embedding an Open XML File in another Open XML File

A couple of weeks ago I gave a presentation on the Open XML SDK to a few customers, where I was asked questions on how to embed files within Open XML documents. I thought it would be a good opportunity to devote a couple of posts around this topic. In today's post I am going to show you how to embed an Open XML file in another Open XML file. Specifically, I am going to show you how to embed an Excel spreadsheet (.xlsx) into a Word document (.docx). Next post will cover how to embed other file types in Open XML files.

My post will talk about using version 2 of the SDK.

If you just want to jump straight into the code, feel free to download this solution here.

Solution

To embed an Excel spreadsheet into a Word document we can take the following actions:

  1. Create a template in Word that contains a content control that will be used to demarcate the region where the embedded object will be inserted
  2. Open up the Word document via the Open XML SDK and access its main document part
  3. Add an image part to the document (this image will be a placeholder image of the embedded object file)
  4. Add an embedded package part to the document
  5. Create a paragraph that contains the embedded object
  6. Locate the content control that will contain the embedded object
  7. Swap out the content control for the newly created paragraph
  8. Save changes made to the Word document

Note that the steps outlined above are just one method to accomplish this scenario.

For the sake of this example, let's say I am starting with the following Word document:

This document contains a content control, named "EmbedObject," which will contain my embedded object. In addition, let's say I have the following Excel spreadsheet I wish to embed:

Embedded Objects in Open XML

Before we get into the code, I wanted to talk more about embedded objects. Office has three ways of storing embedded objects:

  0. Those where Office persists the IStorage as given to Office during OLE operations
  1. Those where Office persists the IStorage as given during OLE operations, but gives the embedded object a friendly extension and filename. This method assumes that the embedded object is a native file format of the application in question
  2. Those where Office interprets the IStorage given during OLE operations as simply a wrapper for a package and only stores the package. This method assumes that the package conforms to Open Packaging Conventions

The major difference between type #0 and types #1 and #2 is in how objects are embedded within a file. Types #1 and #2 allow developers working with Open XML files to more easily extract and insert embedded objects because there is no need to talk to an OLE server. Instead, developers can simply read/write embedded objects as if they were reading from or writing to files on disk. Office differentiates between these three types by looking for a specific registry key under HKCR\CLSID\{Apps_OLE_Storage_CLSID}, where Apps_OLE_Storage_CLSID is the CLSID of the OLE storage server. The Office applications look for a subkey named IPersistStorageType and determine the type of the embedded object in the following manner:

  • Office assumes the embedded object is type #0 if no subkey is specified or if the value of the subkey is 0
  • Office assumes the embedded object is type #1 if the subkey has a value of 1
  • Office assumes the embedded object is type #2 if the subkey has a value of 2
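
The rule in the list above is simple enough to write down as a function. The following is only a sketch of the rule as described in this post: the enum and method names are hypothetical, reading the actual registry entry is left out, and the handling of values other than 0, 1 and 2 (reported here as Unknown) is an assumption, since the post does not cover that case.

```csharp
using System;

public enum EmbeddedStorageKind
{
    Unknown = -1,      // assumption: anything the post does not describe
    RawIStorage = 0,   // type #0: IStorage persisted as given
    NativeFile = 1,    // type #1: IStorage persisted with a friendly name and extension
    OpcPackage = 2     // type #2: only the wrapped package is stored
}

public static class EmbeddedStorage
{
    // Hypothetical helper: maps the IPersistStorageType setting (null when absent) to a kind.
    public static EmbeddedStorageKind Classify(int? persistStorageType)
    {
        // No entry at all is treated the same as a value of 0.
        if (!persistStorageType.HasValue)
            return EmbeddedStorageKind.RawIStorage;

        switch (persistStorageType.Value)
        {
            case 0: return EmbeddedStorageKind.RawIStorage;
            case 1: return EmbeddedStorageKind.NativeFile;
            case 2: return EmbeddedStorageKind.OpcPackage;
            default: return EmbeddedStorageKind.Unknown;
        }
    }
}
```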

The cool thing is that other applications can take advantage of this reg key. For example, if an application writes out a value of 1 for this subkey for a particular file format then the Office applications will embed files of that type natively in the Open XML file formats.

One more thing to note is that all embedded object types require a prog id, which you can find from the registry, as well as an image representation of the object.

The Code

As mentioned above, when an object is embedded in a document, both a visual representation of the object and the underlying data are stored. The visual representation is simply an image of what you would see if you were to activate the object. For the sake of this solution, my visual representation of the document will be a placeholder image that indicates to users how to refresh the embedded object and will look like the following image:

Looking at the steps outlined above in the Solution section, here is the code snippet to accomplish steps two through four:

using (WordprocessingDocument myDoc = WordprocessingDocument.Open(output, true))
{
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Png);
    using (FileStream imageStream = File.Open("placeholder.png", FileMode.Open))
    {
        imagePart.FeedData(imageStream);
    }
    EmbeddedPackagePart embeddedObjectPart =
        mainPart.AddEmbeddedPackagePart(@"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
    using (FileStream packageStream = File.Open("embed.xlsx", FileMode.Open))
    {
        embeddedObjectPart.FeedData(packageStream);
    }
}

The placeholder.png refers to the placeholder image I showed you above and the embed.xlsx file is the spreadsheet that will be embedded. The string "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" represents the content type of an Excel spreadsheet with an extension .xlsx. You can find the content type of a particular file by going to HKCR\.XXX, where XXX is the extension of the file format, and looking for a value named "Content Type."
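
If you would rather not hit the registry for the common cases, the content types of the three main Open XML formats are fixed and can be kept in a small table. The sketch below does that, and falls back to the generic OLE object content type that the PDF embedding post uses for everything else. EmbedContentType and For are hypothetical names; whether the fallback is appropriate for a given file type still depends on how that object is actually stored.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class EmbedContentType
{
    // Generic content type for embedded OLE objects.
    public const string GenericOleObject = "application/vnd.openxmlformats-officedocument.oleObject";

    // Content types of the main Open XML document formats, keyed by extension.
    static readonly Dictionary<string, string> Known =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { ".docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document" },
            { ".xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" },
            { ".pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation" }
        };

    // Hypothetical helper: picks a content type from the file extension.
    public static string For(string fileName)
    {
        string extension = Path.GetExtension(fileName) ?? "";
        string contentType;
        return Known.TryGetValue(extension, out contentType) ? contentType : GenericOleObject;
    }
}
```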

The next step is to create a paragraph that represents our embedded object. Like my other post on importing SmartArt from PowerPoint to Word, I am going to take advantage of the Document Reflector tool that ships free with the SDK. Using this tool's output as a starting point, I am able to generate the necessary paragraph with the following code snippet:

// Namespace aliases assumed by this snippet:
//   using V = DocumentFormat.OpenXml.Vml;
//   using OVML = DocumentFormat.OpenXml.Vml.Office;
static Paragraph CreateEmbeddedObjectParagraph(string imageId, string embedId)
{
    Paragraph p =
        new Paragraph(
            new Run(
                new EmbeddedObject(
                    new V.Shapetype(
                        new V.Stroke() { JoinStyle = V.StrokeJoinStyleValues.Miter },
                        new V.Formulas(
                            new V.Formula() { Equation = "if lineDrawn pixelLineWidth 0" },
                            new V.Formula() { Equation = "sum @0 1 0" },
                            new V.Formula() { Equation = "sum 0 0 @1" },
                            new V.Formula() { Equation = "prod @2 1 2" },
                            new V.Formula() { Equation = "prod @3 21600 pixelWidth" },
                            new V.Formula() { Equation = "prod @3 21600 pixelHeight" },
                            new V.Formula() { Equation = "sum @0 0 1" },
                            new V.Formula() { Equation = "prod @6 1 2" },
                            new V.Formula() { Equation = "prod @7 21600 pixelWidth" },
                            new V.Formula() { Equation = "sum @8 21600 0" },
                            new V.Formula() { Equation = "prod @7 21600 pixelHeight" },
                            new V.Formula() { Equation = "sum @10 21600 0" }),
                        new V.Path() { AllowGradientShape = V.BooleanValues.T, ConnectionPointType = OVML.ConnectValues.Rectangle, AllowExtrusion = V.BooleanValues.F },
                        new OVML.Lock() { Extension = V.ExtensionHandlingBehaviorValues.Edit, AspectRatio = OVML.BooleanValues.T }
                    ) { Id = "_x0000_t75", CoordinateSize = "21600,21600", Filled = V.BooleanValues.F, Stroked = V.BooleanValues.F, OptionalNumber = 75, PreferRelative = V.BooleanValues.T, EdgePath = "m@4@5l@4@11@9@11@9@5xe" },
                    // The shape displays the placeholder image referenced by imageId.
                    new V.Shape(
                        new V.ImageData() { Title = "", RelationshipId = imageId }
                    ) { Id = "_x0000_i1025", Style = "width:500pt;height:400pt", Ole = V.BooleanEntryWithBlankValues.Empty, Type = "#_x0000_t75" },
                    // The OLE object ties the shape to the embedded spreadsheet part referenced by embedId.
                    new OVML.OleObject() { Type = OVML.OLEValues.Embed, ProgId = "Excel.Sheet.12", ShapeId = "_x0000_i1025", DrawAspect = OVML.OLEDrawAspectValues.Content, ObjectId = "_1307530183", Id = embedId }
                ) { DxaOriginal = (UInt32Value)10957U, DyaOriginal = (UInt32Value)8455U })
        );
    return p;
}

The last step of the solution is to swap out the content control for this newly created paragraph. This code is very similar to the code in many of my previous posts where I used content controls as semantic structures. Here is the code snippet to accomplish this task:

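Paragraph p = CreateEmbeddedObjectParagraph(mainPart.GetIdOfPart(imagePart),
    mainPart.GetIdOfPart(embeddedObjectPart));
// Find the content control whose alias is "EmbedObject", skipping controls without an alias.
SdtBlock sdt = mainPart.Document.Descendants<SdtBlock>()
    .First(s =>
    {
        SdtProperties props = s.GetFirstChild<SdtProperties>();
        Alias alias = props == null ? null : props.GetFirstChild<Alias>();
        return alias != null && alias.Val != null && alias.Val.Value == "EmbedObject";
    });
// Swap the content control for the new paragraph, then save the main document part.
OpenXmlElement parent = sdt.Parent;
parent.InsertAfter(p, sdt);
sdt.Remove();
mainPart.Document.Save();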
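If you want a quick sanity check before opening the result in Word, you can reopen the document and confirm that the OLE object's relationship id actually resolves to an embedded package part. This is only a rough sketch under my own assumptions: EmbedCheck and HasResolvedEmbeddedObject are hypothetical names, and the check looks only at the first OLE object in the main document part.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using OVML = DocumentFormat.OpenXml.Vml.Office;

// Hypothetical helper: verifies that the first OLE object in the main document
// part points at an embedded package part through its relationship id.
static class EmbedCheck
{
    public static bool HasResolvedEmbeddedObject(string path)
    {
        // Open read-only; nothing is modified by this check.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            MainDocumentPart mainPart = doc.MainDocumentPart;
            OVML.OleObject ole = mainPart.Document
                .Descendants<OVML.OleObject>()
                .FirstOrDefault();
            if (ole == null || ole.Id == null)
                return false;

            // Look for a part whose relationship id matches the OLE object's r:id.
            return mainPart.Parts.Any(pair =>
                pair.RelationshipId == ole.Id.Value &&
                pair.OpenXmlPart is EmbeddedPackagePart);
        }
    }
}
```

For example, calling EmbedCheck.HasResolvedEmbeddedObject(output) after the code above has run should return true if the paragraph and the embedded part were wired together correctly.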

End Result

Running this code I should end up with a document that looks like the following:

Upon activating the embedded object I will see the following:

Pretty easy stuff! Next time I will show you how to embed other file formats, like PDF.

Zeyad Rajabi
