Introduction to Office Open XML

Hi all,

Today I am going to talk a little about Office Open XML (OOXML, or simply Open XML), and touch on the Open XML development tools.

You have probably noticed that since Office 2007, your favourite Office documents have an additional "x" at the end of the file extension (e.g. doc became docx). This marks the introduction of the Open XML file format. The easiest way to see the difference between a doc and a docx file is to:

  1. Make a separate copy of a docx file, for example: docx_example.docx
  2. Rename the file extension to zip so that you have docx_example.zip
  3. Open the zip file. You should be able to browse it as a set of folders containing multiple XML files!
  4. You cannot do the same with the old doc files, which use a binary format.

Try it! This works because an Open XML file is a ZIP package containing a collection of XML files that store the document's content as well as its other attributes, such as styles, images, properties, and comments.
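If you would rather look inside the package from code than rename the file, here is a minimal sketch of the same experiment. It assumes .NET Framework 4.5 or later (for the System.IO.Compression.ZipFile class); the PackageLister class and ListParts method are names I made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper, not part of any SDK.
public static class PackageLister
{
    // Returns the name of every part (ZIP entry) stored in the package.
    public static List<string> ListParts(string packagePath)
    {
        // A docx is an ordinary ZIP archive, so no rename is needed to open it.
        using (ZipArchive archive = ZipFile.OpenRead(packagePath))
        {
            return archive.Entries.Select(entry => entry.FullName).ToList();
        }
    }
}
```

Calling PackageLister.ListParts("docx_example.docx") on a Word document typically returns entries such as [Content_Types].xml, _rels/.rels, and word/document.xml, although the exact list depends on what the document contains.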

Open XML File Format Benefits:

  • The open file formats and open specifications make the technology broadly accessible.
  • The decoupled content and other document building blocks allow great programming flexibility and accessibility.
  • Segmented data storage improves data recovery and fault tolerance. Unlike the older Office file formats, a document may still be recoverable if part of the file is corrupted.
  • ZIP compression reduces file sizes.
  • Create and manipulate Office documents without using the Office object model (OM), which is perfect for server-based scenarios.
  • Lightweight requirements: the ability to open and save ZIP files, plus an XML parser.
  • High performance when processing large numbers of documents.

Now, with these new open file formats and open specifications, Microsoft Office development becomes faster, easier and more customizable. Open XML development allows developers to quickly parse the details and custom content of Office documents, manipulate specific parts of a document, and rapidly generate large numbers of Open XML documents.

A typical application of Open XML development is to quickly generate Office documents by reusing an Open XML document as a template and changing only the delta between the documents.
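To make the template idea concrete, here is a rough sketch that needs nothing beyond a ZIP library and an XML parser. It copies a template, then replaces a placeholder in the text of the main document part. The TemplateFiller class, the Fill method, and the placeholder convention are all my own inventions for illustration. The sketch also assumes that the main part lives at the conventional word/document.xml path and that the placeholder sits inside a single text run (Word sometimes splits text across several runs, which this does not handle).

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any SDK.
public static class TemplateFiller
{
    // WordprocessingML main namespace, used by the w:t (text) elements.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Assumption: the conventional location of the main document part.
    const string MainPart = "word/document.xml";

    // Copies templatePath to outputPath, replaces placeholder with value in
    // every w:t element that contains it, and returns how many elements changed.
    public static int Fill(string templatePath, string outputPath,
                           string placeholder, string value)
    {
        File.Copy(templatePath, outputPath, true);

        using (ZipArchive archive = ZipFile.Open(outputPath, ZipArchiveMode.Update))
        {
            ZipArchiveEntry entry = archive.GetEntry(MainPart);
            if (entry == null)
            {
                throw new InvalidDataException("No " + MainPart + " part found.");
            }

            XDocument xml;
            using (Stream input = entry.Open())
            {
                xml = XDocument.Load(input);
            }

            // Materialize the matches first so the tree is not edited mid-query.
            var matches = xml.Descendants(W + "t")
                             .Where(t => t.Value.Contains(placeholder))
                             .ToList();
            foreach (XElement text in matches)
            {
                text.Value = text.Value.Replace(placeholder, value);
            }

            // Swap the old part for the updated XML; everything else is untouched.
            entry.Delete();
            using (Stream output = archive.CreateEntry(MainPart).Open())
            {
                xml.Save(output);
            }

            return matches.Count;
        }
    }
}
```

For example, TemplateFiller.Fill("template.docx", "letter.docx", "{NAME}", "Danny") would produce a copy of the template in which only the placeholder text differs. Styles, images and all other parts are carried over unchanged, which is what makes this approach quick for mass generation.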

For Open XML development, there are several tools I would recommend:

  • Open XML SDK 2.0
    • Strongly typed part classes to manipulate Open XML document packages
    • Strongly typed content classes to manipulate Open XML parts
    • Content Construction, Search, and Manipulation using LINQ
    • Validation of Open XML documents
  • Open XML SDK Productivity Tools
    • Explore the structure of Open XML documents
    • Generate Open XML SDK source code based on document content
    • Highlight differences between Open XML documents
    • Validate a document, part, or segment against 2007 and 2010 formats
    • SDK Documentation
  • Open XML SDK Code Snippets
    • Visual Studio code snippets
    • 52 snippets for performing common tasks in Word, Excel, and PowerPoint
    • Speed up your development or use as a learning tool
  • Open XML Package Editor Power Tool for Visual Studio
    • An add-in for Visual Studio 2010 that enables you to parse and edit Open Packaging Convention files (including Word, Excel, and PowerPoint documents).
    • Open any Open XML Package file directly in Visual Studio 2010
    • Browse contents in a tree view
    • Open parts in Visual Studio's rich XML editor
    • Add/remove parts and relationships
    • Import and export part contents
    • Create new Office Packages from a set of templates using Visual Studio's File > New dialog

With the Open XML SDK, creating a new Word document can be as simple as this: 

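    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    // Create a Wordprocessing document.
    // docName is the path of the .docx file to create.
    using (WordprocessingDocument package =
        WordprocessingDocument.Create(docName, WordprocessingDocumentType.Document))
    {
        // Add a new main document part.
        package.AddMainDocumentPart();

        // Create the Document DOM.
        package.MainDocumentPart.Document =
            new Document(
                new Body(
                    new Paragraph(
                        new Run(
                            new Text("Hello World!")))));

        // Save changes to the main document part.
        package.MainDocumentPart.Document.Save();
    }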
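Reading a document back is similarly short. The sketch below shows two of the SDK features listed earlier: searching content with LINQ and validating a document. It assumes the Open XML SDK assembly (DocumentFormat.OpenXml) is referenced; the DocumentInspector class and its method names are hypothetical and only there to keep the example self-contained.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper built on the Open XML SDK.
public static class DocumentInspector
{
    // Returns the plain text of every paragraph in the main document part.
    public static List<string> ReadParagraphs(string docName)
    {
        // The second argument (false) opens the package read-only.
        using (WordprocessingDocument package = WordprocessingDocument.Open(docName, false))
        {
            return package.MainDocumentPart.Document.Body
                .Descendants<Paragraph>()
                .Select(paragraph => paragraph.InnerText)
                .ToList();
        }
    }

    // Returns one description per problem reported by the SDK validator.
    public static List<string> Validate(string docName)
    {
        using (WordprocessingDocument package = WordprocessingDocument.Open(docName, false))
        {
            OpenXmlValidator validator = new OpenXmlValidator();
            return validator.Validate(package)
                .Select(error => error.Description)
                .ToList();
        }
    }
}
```

Run against the document created above, ReadParagraphs should return a single entry containing the "Hello World!" text, and an empty list from Validate means the validator found nothing to report.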

Feel free to drop a note on what you would specifically like to hear about Open XML. Everything you need to know about developing for Office can also be found here: https://msdn.microsoft.com/en-us/office/bb265236.aspx

Cheers,
Danny