DII Event London, May 18, 2009 – Recap
10 June 09 08:06 AM

Well, a few days have turned into a few weeks. The joy of technology, travel and catching up on things. You can read a recap of the event and download the presentations here. I will just take a few minutes to provide some salient points from my perspective.

We held the event in the Microsoft offices at Cardinal Place in London, UK. We had an excellent turnout with participants from Fraunhofer FOKUS, Workshare, PowerPoint Alchemy, Griffin Brown Digital Publishing, FEDICT, Dialogika, Gama System, Genisoft, Datalucid Limited, and RealDolmen, as well as independent experts from SC34 and the OASIS ODF/OIC technical committees.

The focus of this event was the new Fraunhofer FOKUS IS29500 Validator and Document Library project. Members of the Fraunhofer team presented the project and received feedback from industry experts including Alex Brown (convener of SC34 WG1 and member of WG4), Bart Hanssens (Chair of the OASIS OIC TC), and Dennis Hamilton (Secretary of the OASIS OIC TC). This broad expertise across document formats led to a wide-ranging conversation about managing document format standards.

Stephanie Krieger, Julien Chable and John Wilson spoke up quite often at the event, raising interoperability and standards conformance concerns drawn from real-world customer situations. Stephanie is an accomplished author who has written a number of books, including Advanced Microsoft Office Documents 2007 Edition Inside Out.

Some of the attendees have taken the time to write up their thoughts on the event; here are links to their posts:

In addition to the introduction of the Fraunhofer FOKUS project, there were a number of presentations shared by attendees. I have included a brief description of each of the presentations.

Introduction and Interoperability @ Microsoft UK

Paul Lorimer (left) is the Group Manager of the Office Interoperability team. Paul kicked off the event by talking about the value of the Fraunhofer FOKUS project, along with some goals that Microsoft is looking to achieve. Giampiero Nanni (right) is the Director of Interoperability for Microsoft in the UK; he presented on what Microsoft in the UK is doing around interoperability. It was great to have Paul and Giampiero at the event, as they were able to answer a number of questions that came up during the day, sharing Microsoft’s goals and efforts. You can download Paul’s presentation here, and Giampiero’s presentation here. You can read more about interoperability at Microsoft here.

Standards-based validation of IEC/ISO 29500 XML resources

Alex Brown discussed how the word “valid” has a very specific meaning within a standard, and that when people use the word validation, they generally mean “schema-valid”. Alex explained that validation in the fuller sense goes much deeper and calls for distinct terms such as conformant, valid, interoperable and portable. Alex provided a history of ODF going through the standards process and explained where IS29500 is in the process, along with the current set of activities there. Alex then explained the differences between “application” conformance and “document” conformance. He finished his presentation with a demonstration of a new W3C technology, XProc, showing how XML Pipelines can be used to test all of the previously mentioned validation terms in a succinct and manageable way. You can read more about XML Pipelines in this post on his blog. You can download Alex’s presentation here.

High Fidelity Programmatic Access to Document Content

Matevž Gačnik explained the definition of “original” content as defined by the Slovenian government and European Union legislature. According to these regulations, a document can be considered “original” if it is signed by the author, stored and archived by a certified software solution and is stored in a preferred document format. Matevž explained the challenge that IS29500 is not currently a preferred format because when the CTD was approved, IS29500 had not yet been approved as a standard. Matevž shared some feedback from their organization about Office and IS29500; one point that stood out to me was his comment that parsing Office documents as XML is “2000x faster”. You can download Matevž’s presentation here. (Note: to view this presentation, you may need to right-click on the link and select “Save Target As…”, then download and open it from your local computer.)

PHP PowerPoint Project on CodePlex

Maarten Balliauw, at lightning speed, introduced the group to a new PHP project on CodePlex, called PHPPowerPoint. The PHPPowerPoint project provides a set of PHP classes for reading and writing the PresentationML file formats. The PHPPowerPoint project originated from the PHPExcel project. Maarten demonstrated the PHPPowerPoint and Slide classes, then showed us how the PHPPowerPoint_Reader_IReader and PHPPowerPoint_Writer_IWriter interfaces are used for persisting the document. Maarten concluded his presentation by generating a document using PHPPowerPoint. You can download Maarten’s presentation here.

Interoperability by Community

Gerd Schürmann started by sharing a little history about Fraunhofer, introducing us to the late Joseph von Fraunhofer (1787 – 1826). Joseph was a scientist, discovering the “Fraunhofer Lines” in the sun spectrum; an inventor, creating a new manufacturing method for lenses; and an entrepreneur, being a director and associate of a glassworks. Gerd explained the breadth of offerings that Fraunhofer provides, including: research and development projects, advance studies and consultancies, services, standardization and fora activities, academic education and teaching and prototype development. Gerd concluded by introducing the IS29500 Validator and Document Library project. You can download Gerd’s presentation here.

PLANETS & Doc Conversion Tools

Wolfgang Keber (left) and Natasa Milic-Frayling (right) introduced us to the PLANETS project, which focuses on preserving digital assets. The four-year project is co-funded by the European Union, and PLANETS is an acronym for Preservation and Long-term Access through Networked Services. Natasa is from Microsoft Research, which has contributed to this project. Wolfgang explained the challenges in going between different document formats, for example, converting a binary MS Office document to ODF or UOF. Wolfgang then explained that by creating a wrapper around each format, they have been able to convert documents from many formats to many other formats. Wolfgang concluded his presentation by showing us a demo. Stephanie Krieger and Julien Chable posed some difficult questions about the formatting, which spurred some lively and interesting discussion about the interoperability of some document formats with others. You can download Wolfgang’s presentation here.

Extensibility within Standards

I then had the privilege of presenting on the topic of extensibility within standards. I started the presentation with a discussion, asking people what they thought of when they heard the terms extensibility and standards together. I then moved on to the extensibility mechanisms defined in Part 3 of the IS29500:2008 standard, showing how custom elements and attributes can be added to the markup of the document and how an implementer can use alternate content blocks (ACB) to allow a consumer to gracefully render a previous version of the markup. Next, I gave a demo in which I added custom elements and attributes to the markup of a PresentationML document and opened the document in PowerPoint 2007. I concluded my session with another discussion, asking people whether they think extensibility mechanisms are a healthy, object-oriented way of advancing standards. This led to a lively discussion, but in general, I think people agreed that extensibility within standards has value. You can download my presentation here.
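To make the alternate content block idea a little more concrete, here is a minimal C# sketch (not the demo from the session). It builds an mc:AlternateContent block with LINQ to XML and shows, in a very simplified way, how a consumer might pick between the Choice and Fallback branches. The extension namespace, the fancyShape and plainShape element names, and the AlternateContentSketch class are all hypothetical and exist only for illustration; real Part 3 processing involves more rules than this.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AlternateContentSketch
{
    // Markup Compatibility namespace used by IS29500 Part 3.
    public static readonly XNamespace Mc =
        "http://schemas.openxmlformats.org/markup-compatibility/2006";

    // Hypothetical extension namespace, invented for this example.
    public static readonly XNamespace Ext = "http://example.com/hypothetical/ext";

    // Builds a block with one Choice (needs the extension) and one Fallback.
    public static XElement Build()
    {
        return new XElement(Mc + "AlternateContent",
            new XAttribute(XNamespace.Xmlns + "mc", Mc.NamespaceName),
            new XAttribute(XNamespace.Xmlns + "ext", Ext.NamespaceName),
            new XElement(Mc + "Choice",
                new XAttribute("Requires", "ext"),
                new XElement(Ext + "fancyShape")),      // hypothetical new markup
            new XElement(Mc + "Fallback",
                new XElement("plainShape")));           // hypothetical older markup
    }

    // Simplified selection: take the first Choice whose required namespaces
    // are all understood by the consumer; otherwise use the Fallback.
    public static IEnumerable<XElement> Select(
        XElement block, ICollection<string> understoodNamespaces)
    {
        foreach (XElement choice in block.Elements(Mc + "Choice"))
        {
            string requires = (string)choice.Attribute("Requires") ?? "";
            string[] prefixes = requires.Split(
                new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

            bool allUnderstood = prefixes.Length > 0 && prefixes.All(p =>
            {
                XNamespace ns = choice.GetNamespaceOfPrefix(p);
                return ns != null && understoodNamespaces.Contains(ns.NamespaceName);
            });

            if (allUnderstood)
                return choice.Elements();
        }

        XElement fallback = block.Element(Mc + "Fallback");
        return fallback != null ? fallback.Elements() : Enumerable.Empty<XElement>();
    }
}
```

A consumer that passes the extension namespace to Select gets the fancyShape element back, while a consumer that passes an empty list gets plainShape, which is the graceful fallback behavior described above.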

Fraunhofer – Validator and Test Document Library Project

Jan Ziesing (left) and Ucheoma “Uche” Ishionwu (right) picked up from where Gerd’s presentation left off by officially introducing the Fraunhofer FOKUS IS29500 Validator and Document Library project. Jan was the presenter, and he turned to Uche for three specific demos. Jan started by explaining that the purpose of the document library is to create a suite of documents for testing and verifying IS29500 interoperability. Fraunhofer will maintain a web site for a document repository where people can upload and download documents. Jan shared with us their research on the complexity of categorizing documents, explaining how automation and validation can be used to categorize documents into specific domains when they are uploaded. You can download Jan’s presentation here.

Uche provided demos for categorizing documents, building semantic rules for a photo book and semantic validation. In the first demo, Uche showed how categorizing documents is something that can be done programmatically. He identified different attributes of the presentation, then added weighted values to some attributes to reflect their importance in the categorization. You can download Uche’s first demo here. In the second demo, Uche described how this categorization can be applied to a real-world photo book document when it is uploaded to the Document Library. By applying these attributes, Jan and Uche demonstrated how the programmatic categorization of the document allows the document to be easily found within the Document Library. You can download Uche’s second demo here. In the third demo, Uche showed how an XSD schema is not enough to completely validate a document against a standard. Uche manually modified a document so that it was invalid against the standard but still compliant with the XSDs; he then ran schema validation on the document, which it still passed. Uche then used Schematron to add semantic validation rules (i.e. rules that are only specified in the text of the standard) to more accurately validate his file against the standard. You can download Uche’s third demo here.
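The gap between schema validity and semantic validity can be sketched in a few lines of C#. This is not Uche’s demo and it does not use Schematron; it swaps in a hand-written check that plays the same role as a Schematron assertion. The toy schema, the book, page and pageCount names, and the TwoStageValidation class are all hypothetical, chosen only so that the rule ("pageCount must equal the number of page elements") is one the XSD below does not express.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class TwoStageValidation
{
    // Hypothetical toy schema: a <book> declares a pageCount attribute and
    // contains one or more <page> elements.
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='book'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='page' type='xs:string' maxOccurs='unbounded'/>
      </xs:sequence>
      <xs:attribute name='pageCount' type='xs:int' use='required'/>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Stage 1: structural validation against the XSD.
    public static List<string> SchemaErrors(XDocument doc)
    {
        XmlSchemaSet schemas = new XmlSchemaSet();
        schemas.Add("", XmlReader.Create(new StringReader(Xsd)));

        List<string> errors = new List<string>();
        doc.Validate(schemas, (sender, e) => errors.Add(e.Message));
        return errors;
    }

    // Stage 2: a semantic rule that lives only in the (imaginary) prose of
    // the standard: pageCount must match the actual number of pages.
    public static List<string> SemanticErrors(XDocument doc)
    {
        List<string> errors = new List<string>();
        XElement book = doc.Root;
        XAttribute declared = book.Attribute("pageCount");
        int actual = book.Elements("page").Count();

        if (declared != null && (int)declared != actual)
        {
            errors.Add("pageCount is " + declared.Value +
                       " but the book contains " + actual + " page element(s).");
        }
        return errors;
    }
}
```

A document such as `<book pageCount='3'><page/><page/></book>` passes SchemaErrors with no messages but fails SemanticErrors, which is the same kind of situation the third demo illustrated.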

You can email Jan Ziesing to learn how to sign up and contribute to the project.

Roundtable Discussion

The event concluded with a roundtable discussion, led by Fraunhofer. Many topics were discussed; attendees provided feedback about the Validator and Document Library project and shared their thoughts about which validation scenarios are important to them. Here is some of the feedback that was shared:

      • Some people expressed the opinion that they would like SC34 to contribute to the validator; defining/validating the rules needed to validate IS29500 files
      • Some people expressed the opinion that they would like the OASIS OIC TC to coordinate efforts between the Validator and Document Library project and the work the OIC TC is doing with ODF interoperability and validation
      • Some people expressed that they would like to see the validator be made available as a web service
      • Some people shared that some organizations may consider their documents as proprietary, and want to know if the validator could be made available to these organizations in such a way that they could either a) securely pass documents to the validator without fear of the document being made available to others, and/or b) have a copy of the validator that they can run privately within their own infrastructure
      • Some people expressed that they would like the Document Library to have a mechanism by which the intellectual property rights (IPR) of the document and/or owner can be verified, thereby protecting the IPR of the document and/or owner. They felt that this would make the library more valid and useful to users

Fraunhofer noted this feedback and hopes to incorporate it into their work.

Posted by stephenperont | 5 Comments
DII Event London – Home Safe
21 May 09 08:20 AM

I am back in the US and arrived safely at home late yesterday afternoon. This was truly a great event; so good, in fact, that it was a tough decision for some people as to whether we should keep talking or go to dinner (of course, dinner won). I am just catching up on things now, and re-acclimating to EST, but will publish a full report in a few days.

Below is a picture we captured of the Rosetta Stone while at the British Museum, along with close-ups of the languages inscribed on the stone and a picture from the event.

Posted by stephenperont | 2 Comments
DII Event London, May 18, 2009
18 May 09 05:42 AM

Welcome to Victoria Station in London!


We are only hours away from the start of the Document Interoperability Initiative event in London, and I am very excited about today’s event. I was able to have dinner with some attendees last night and catch up on both personal and technical topics.

One topic that seemed to take stage at dinner was Interoperability and Extensibility. I will be presenting on this topic at the event, and will share the details of my presentation in a future post. One topic that people want to discuss with Fraunhofer FOKUS is their recent report on interoperability between IS29500 and ODF. The main topic of the event is their new IS29500 Test Document Library and Validator project.

I need to get back to preparing for the event; however, I will share details about the event later in the week. For now, here is the agenda for today.

09:00 – 09:15 Introduction (Paul Lorimer)
09:15 – 09:30 Interoperability @ Microsoft UK (Giampiero Nanni)
09:30 – 10:00 Standards-based validation of IEC/ISO 29500 XML resources (Alex Brown)
10:00 – 10:25 High Fidelity Programmatic Access to Document Content (Matevž Gačnik)
10:25 – 10:35 Break
10:30 – 10:50 PHP PowerPoint Project on CodePlex (Maarten Balliauw)
10:50 – 11:20 Interoperability by Community (Gerd Schürmann)
11:20 – 11:40 PLANETS & Doc Conversion Tools (Natasa Milic-Frayling & Wolfgang Keber)
11:45 – 01:00 Lunch
01:00 – 01:45 Extensibility within Standards (Stephen Peront)
01:45 – 03:45 Fraunhofer – Validator and Test Document Library Project (Gerd Schürmann)
03:45 – 04:00 Break – Tea Time
04:00 – 04:55 Round Table Discussion (FhI FOKUS)
04:55 – 05:00 Wrap-up and Final Comments (Giampiero Nanni)

 

Posted by stephenperont | 6 Comments
DII Event London and Custom Document Format Interoperability
17 April 09 03:16 AM

I am very excited about the Document Interoperability Initiative (DII) event that Doug recently announced, which is coming up in May. The event is taking place in London where Fraunhofer will be sharing a community project they recently started to create an IS29500 validator and test document library. This project was started to address feedback from developers at past DII workshops about the need for a validator to ensure that the IS29500 documents they create will interoperate well with other implementations. They clearly stated that implementers need a place where they can go to download tools and resources that allow them to validate their documents against the IS29500 standard. While Microsoft is one of the contributors to this project, this is a community project that anyone can contribute to.

The DII event in London is a free event that anyone can attend. I’ll be there to update everyone on some of the things we’re currently doing to enable interoperability, and I’m also managing registrations for the event so if you would like to come, send me an email and I will provide you with the event details. Fraunhofer will be sharing details on how you can contribute to the community project. If you are not able to make it to the event, but still want to contribute, soon you will be able to go to the project website to read about it and sign up. I will post details as soon as they are available. If you have any specific questions about the project, let me know and I will do my best to answer your questions.

Custom Document Format Interoperability

You may have heard that Office 2007 SP2 will now support editing files in the OpenDocument 1.1 (ODF) format. This document format was added to Office’s long list of supported document formats to give customers more choices for the format they use to save their documents.

In addition to allowing you to edit the ODF 1.1 format within Office 2007, SP2 also supports a new External File Format API that can be used to edit other document formats as well. With this API, users can choose to save their documents in any format they want. In this post we will explore how to use the API to enable Office 2007 to edit our own custom document format. We will then use Office 2007 to save our custom format as DOCX, ODT and HTML.

Our Custom Document Format

For the purpose of this article, we have a company that needs to manage their sales pipeline information. The data is available as XML, but they do not want to spend the money to build a custom editor. They just want to let their users edit the pipeline data in Word, as a table. They give these files an extension of SPLX (i.e. Sales PipeLine Xml).

The sales pipeline information is made up of a series of SalesItem tags, each with a unique id that represents the index of the item. They track the name of the customer (CustomerName), how much the deal represents (DealValue) and a percent that represents how confident they are that the sales opportunity will close (ConfidencePercent).

Here is the sample XML file:


<?xml version="1.0" encoding="utf-8"?>
<SalesPipeline>
    <SalesItem id="1">
        <CustomerName>ABC Company</CustomerName>
        <DealValue>1000000</DealValue>
        <ConfidencePercent>.2</ConfidencePercent>
    </SalesItem>
    <SalesItem id="2">
        <CustomerName>123 Company</CustomerName>
        <DealValue>1200000</DealValue>
        <ConfidencePercent>.15</ConfidencePercent>
    </SalesItem>
    <SalesItem id="3">
        <CustomerName>XNA Company</CustomerName>
        <DealValue>500000</DealValue>
        <ConfidencePercent>.65</ConfidencePercent>
    </SalesItem>
    <SalesItem id="4">
        <CustomerName>Defender Company</CustomerName>
        <DealValue>60000</DealValue>
        <ConfidencePercent>.9</ConfidencePercent>
    </SalesItem>
</SalesPipeline>
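As a quick illustration of how a file in this shape could be read from code, here is a minimal sketch using LINQ to XML. It is not part of the converter built later in this post; the SalesItem and SalesPipelineReader class names are assumptions made for the example, and numbers are parsed with the invariant culture so that values such as ".2" are read the same way on every machine.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical in-memory representation of one SalesItem element.
public class SalesItem
{
    public int Id;
    public string CustomerName;
    public decimal DealValue;
    public decimal ConfidencePercent;
}

public static class SalesPipelineReader
{
    // Parses the text of an SPLX file into a list of SalesItem objects.
    public static List<SalesItem> Parse(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        return doc.Root.Elements("SalesItem").Select(e => new SalesItem
        {
            Id = (int)e.Attribute("id"),
            CustomerName = (string)e.Element("CustomerName"),
            DealValue = ParseNumber(e.Element("DealValue").Value),
            ConfidencePercent = ParseNumber(e.Element("ConfidencePercent").Value)
        }).ToList();
    }

    // Culture-independent parsing, so ".2" always means two tenths.
    static decimal ParseNumber(string text)
    {
        return decimal.Parse(text, NumberStyles.Number, CultureInfo.InvariantCulture);
    }
}
```

Calling SalesPipelineReader.Parse with the contents of the sample file would give four items, the first being ABC Company with a deal value of 1000000 and a confidence of 0.2.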

We will create an External File Converter that will transform the XML into a WordprocessingML document when opened, and then transform the respective document format back to the XML format when saved. This will allow the users to edit the sales pipeline information in Office 2007, while keeping the data in their own XML document format.
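To give a feel for the target of that transformation, the following sketch builds the bare skeleton of a WordprocessingML table (w:tbl, w:tr, w:tc, w:p, w:r, w:t) from rows of text. It is only an assumption about what the table markup could look like, not the converter's actual implementation; the PipelineTableSketch class is hypothetical, and a complete document would also need table properties, a grid definition, the surrounding document and body elements, and the package around them.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PipelineTableSketch
{
    // Main WordprocessingML namespace.
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // One table cell holding a single paragraph with a single run of text.
    static XElement Cell(string text)
    {
        return new XElement(W + "tc",
            new XElement(W + "p",
                new XElement(W + "r",
                    new XElement(W + "t", text))));
    }

    // Builds a w:tbl element with one w:tr per row and one w:tc per value.
    public static XElement BuildTable(IEnumerable<string[]> rows)
    {
        return new XElement(W + "tbl",
            new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
            rows.Select(row =>
                new XElement(W + "tr", row.Select(value => Cell(value)))));
    }
}
```

For example, passing a header row of "Customer", "Deal Value", "Confidence" followed by one row per SalesItem would yield a table with one row for each sales item plus the header.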

Implementing our Custom External File Converter

Create an Out-of-Process COM Object

Use the following list of steps to create an out-of-process COM object.

  1. Open Visual Studio with Administrator privileges. You can do this by right-clicking on the Visual Studio link in the start menu and selecting Run as Administrator. Administrative privileges will be needed because the COM object will make changes to the registry when registering with COM+ services.


  2. Create a new Project (File -> New ->Project)

  3. In the Project types list, select Visual C# -> Windows. In the Templates list, select the Empty Project item. Type "MyEFC" for the Solution Name and "SalesPipeline" for the Name. Your New Project window should look like the picture below. Click the Ok button.


  4. Right-click on the project References, and select Add References...


  5. Select the .NET tab, and select the System.EnterpriseServices item with Version 2.0.0.0. Click the Ok button to add the reference.


  6. Right-click on the project References, and select Add References…. Select the .NET tab, and select the System.Windows.Forms item with Version 2.0.0.0. Click the Ok button to add the reference.

  7. Right-click on the project References, and select Add References…. Select the COM tab, and select the Microsoft Office 12.0 Object Library item with TypeLib Version 2.4. Click the Ok button to add the reference.

  8. Now we will add a class for our COM server. Right-click on the project and select Add -> Class.

  9. Enter MyCOMServer.cs for the Name; then click the Add button to create the class. Visual Studio will automatically add some additional references that are needed.

  10. Update the class in the following ways.
    • Add a using statement to the System.Windows.Forms namespace.
    • Add a static Main() entry point method to the class.
    • Mark the entry point method as single threaded by applying the [STAThread] attribute.
    • Create a Windows message loop by calling the Application.Run() method.
    • Your code should now look something like this:
     
    using System;
    using System.Windows.Forms;

    namespace SalesPipeline
    {
        static class MyCOMServer
        {
            [STAThread]
            static void Main()
            {
                Application.Run();
            }
        }
    }

  11. We now need to add assembly flags to make our assembly COM visible and assign it a GUID. Right-click on the project and select Properties. Select the Application tab, then click the Assembly Information… button. Enter "{BD3489D9-EAE7-4c9d-BF88-D7B7C05DDE45}" into the GUID field, then click Ok. IMPORTANT: Click the Assembly Information… button a second time and, this time, check the Make assembly COM-Visible checkbox; then click the Ok button again. Doing this a second time is important.


  12. Next, we need to set the project output type to a Windows Application. While the Application tab is still selected in the project properties, select Windows Application from the Output type drop-down.


  13. Next we need to sign the assembly. While still in the project properties, select the Signing tab. Check the Sign the assembly checkbox, and select <New...> from the drop down.


  14. Type SalesPipelineKey into the Key file name field. Uncheck the Protect my key file with a password checkbox and click the Ok button.


  15. Open the AssemblyInfo.cs file and make the following changes:
    • Add a using statement for the System.EnterpriseServices namespace
    • Add the ApplicationActivation assembly attribute with the ActivationOption.Server parameter
    • Add the ApplicationAccessControl assembly attribute with a false parameter
    • Make sure that the ComVisible assembly attribute has a true parameter
    • Make sure the Guid assembly attribute is set to the correct Guid
    • The following is the code that reflects these steps
     
    ...
    using System.EnterpriseServices;
    ...
    [assembly: ComVisible(true)]
    [assembly: ApplicationActivation(ActivationOption.Server)]
    [assembly: ApplicationAccessControl(false)]
    [assembly: Guid("BD3489D9-EAE7-4c9d-BF88-D7B7C05DDE45")]
    ...

  16. At this point, you should have a windowless COM server that is ready to host our out-of-process COM object. Compile and run the application to ensure that things are working correctly. Note: When you run the application, nothing will appear to happen, but Visual Studio should be in a debug state. Click Stop Debugging to stop the application.

Create a Basic External File Converter

Now that we have created a COM server it is time to create our External File Converter COM object. Use the following steps to create the COM object:

  1. Right-click on the Project and select Add -> Class. Enter SalesPipelineConverter.cs in the Name field and click the Add button to create the class.
    • Add using statements for the Microsoft.Office.Core, System.EnterpriseServices and System.Runtime.InteropServices namespaces.
    • Add the ComVisible attribute with a parameter of true.
    • Add the Guid attribute with a parameter of "CC03A6F5-8517-48c6-B8A5-DD287855F9BA"
    • Mark the class as public, and derive it from the ServicedComponent class
    • Implement the IConverter interface on the class, and add the default implementation
    • Compile your code to make sure that there are no syntax mistakes. Your code should now look something like this:
     
    using System;
    using Microsoft.Office.Core;
    using System.EnterpriseServices;
    using System.Runtime.InteropServices;

    namespace SalesPipeline
    {
        [ComVisible(true)]
        [Guid("CC03A6F5-8517-48c6-B8A5-DD287855F9BA")]
        public class SalesPipelineConverter : ServicedComponent,
            IConverter
        {
            //
            // IConverter Members
            //
            public void HrExport(
                string bstrSourcePath,
                string bstrDestPath,
                string bstrClass,
                IConverterApplicationPreferences pcap,
                out IConverterPreferences ppcp,
                IConverterUICallback pcuic)
            {
                throw new NotImplementedException();
            }
            public void HrGetErrorString(
                int hrErr,
                out string pbstrErrorMsg,
                IConverterApplicationPreferences pcap)
            {
                throw new NotImplementedException();
            }

            public void HrGetFormat(
                string bstrPath,
                out string pbstrClass,
                IConverterApplicationPreferences pcap,
                out IConverterPreferences ppcp,
                IConverterUICallback pcuic)
            {
                throw new NotImplementedException();
            }

            public void HrImport(
                string bstrSourcePath,
                string bstrDestPath,
                IConverterApplicationPreferences pcap,
                out IConverterPreferences ppcp,
                IConverterUICallback pcuic)
            {
                throw new NotImplementedException();
            }

            public void HrInitConverter(
                IConverterApplicationPreferences pcap,
                out IConverterPreferences ppcp,
                IConverterUICallback pcuic)
            {
                throw new NotImplementedException();
            }

            public void HrUninitConverter(
                IConverterUICallback pcuic)
            {
                throw new NotImplementedException();
            }
        }
    }


  2. Right-click on the Project and select Add -> Class. Enter SalesPipelineConverterPreferences.cs in the Name field and click the Add button to create the class.
    • Add a using statement for the Microsoft.Office.Core namespace.
    • Mark the class as public, implement the IConverterPreferences interface, and add the default implementation
    • Compile your code to make sure that there are no syntax mistakes. Your code should now look something like this:
     
    using System;
    using Microsoft.Office.Core;

    namespace SalesPipeline
    {
        public class SalesPipelineConverterPreferences : IConverterPreferences
        {
            //
            // IConverterPreferences Members
            //
            public void HrCheckFormat(
                out int pFormat)
            {
                throw new NotImplementedException();
            }
            
            public void HrGetLossySave(
                out int pfLossySave)
            {
                throw new NotImplementedException();
            }
            
            public void HrGetMacroEnabled(
                out int pfMacroEnabled)
            {
                throw new NotImplementedException();
            }
        }
    }


  3. Next we will provide a default implementation for the IConverterPreferences interface.
    • Update the HrCheckFormat method, setting the pFormat output parameter to a value of 1. This setting specifies that we support the WordprocessingML ECMA376 macro-free document format.
    • Update the HrGetLossySave method, setting the pfLossySave output parameter to the integer value of false. This setting specifies that there is no loss of data when saved through our converter.
    • Update the HrGetMacroEnabled method, setting the pfMacroEnabled output parameter to the integer value of false. This setting specifies that we do not support macro enabled formats.
    • Compile your code to make sure that there are no syntax mistakes. The code for your IConverterPreferences implementation should now look something like this:
     
    //
    // IConverterPreferences Members
    //
    public void HrCheckFormat(
        out int pFormat)
    {
        pFormat = 1;
    }

    public void HrGetLossySave(
        out int pfLossySave)
    {
        pfLossySave = Convert.ToInt32(false);
    }

    public void HrGetMacroEnabled(
        out int pfMacroEnabled)
    {
        pfMacroEnabled = Convert.ToInt32(false);
    }


  4. Next we will provide a default implementation for the IConverter interface:
    • Update the HrExport method, setting the ppcp output parameter to a new SalesPipelineConverterPreferences instance.
    • Update the HrGetErrorString method, setting the pbstrErrorMsg output parameter to null.
    • Update the HrGetFormat method, setting the pbstrClass output parameter to “SalesPipelineConverter” and setting the ppcp output parameter to a new SalesPipelineConverterPreferences instance.
    • Update the HrImport method, setting the ppcp output parameter to a new SalesPipelineConverterPreferences instance.
    • Update the HrInitConverter method, setting the ppcp output parameter to a new SalesPipelineConverterPreferences instance.
    • Update the HrUninitConverter method, to have no code in it.
    • Compile your code to make sure that there are no syntax mistakes. The code for your IConverter implementation should now look something like this:
     
    //
    // IConverter Members
    //
    public void HrExport(
        string bstrSourcePath,
        string bstrDestPath,
        string bstrClass,
        IConverterApplicationPreferences pcap,
        out IConverterPreferences ppcp,
        IConverterUICallback pcuic)
    {
        ppcp = new SalesPipelineConverterPreferences();
    }
    public void HrGetErrorString(
        int hrErr,
        out string pbstrErrorMsg,
        IConverterApplicationPreferences pcap)
    {
        pbstrErrorMsg = null;
    }

    public void HrGetFormat(
        string bstrPath,
        out string pbstrClass,
        IConverterApplicationPreferences pcap,
        out IConverterPreferences ppcp,
        IConverterUICallback pcuic)
    {
        pbstrClass = "SalesPipelineConverter";
        ppcp = new SalesPipelineConverterPreferences();
    }

    public void HrImport(
        string bstrSourcePath,
        string bstrDestPath,
        IConverterApplicationPreferences pcap,
        out IConverterPreferences ppcp,
        IConverterUICallback pcuic)
    {
        ppcp = new SalesPipelineConverterPreferences();
    }

    public void HrInitConverter(
        IConverterApplicationPreferences pcap,
        out IConverterPreferences ppcp,
        IConverterUICallback pcuic)
    {
        ppcp = new SalesPipelineConverterPreferences();
    }

    public void HrUninitConverter(
        IConverterUICallback pcuic)
    {
        // do nothing for now
    }


  5. Now that we have created a default implementation for an External File Converter, we need to update our COM Server to register and unregister our COM object when the COM server starts and ends. Update the Main method using the following steps:
    • Add a call to the Application.OleRequired method before the call to Application.Run.
    • Create our COM object before the call to Application.Run
    • Dispose of our COM object when the call to the Application.Run method returns
    • Compile your code to make sure that there are no syntax mistakes. Your Main method should now look something like this:
     
    using System;
    using System.Windows.Forms;

    namespace SalesPipeline
    {
        static class MyCOMServer
        {
            [STAThread]
            static void Main()
            {
                Application.OleRequired();
                SalesPipelineConverter salesPipelineConverter =
                    new SalesPipelineConverter();
                Application.Run();
                if (salesPipelineConverter != null)
                    salesPipelineConverter.Dispose();
                salesPipelineConverter = null;
            }
        }
    }


  6. Set a breakpoint on the call to the Application.OleRequired method and run your application. Step through the code until it reaches the Application.Run method, then let the application continue. Make sure there are no runtime errors. The call that creates the SalesPipelineConverter object may take a minute, because this is when your COM object registers with COM+ services. The COM object is now ready to test with the Word 2007 SP2 application.
    • Note: If you receive an error that the application must be run as an Administrator, close all instances of Visual Studio and reopen it using the right-click, Run as Administrator steps described earlier. This may cause an issue with the registration of your object in COM+ services, in which case you may need to restart from the beginning using different names and GUIDs.

  7. To test our External File Converter, we need to register our COM object with the Word application through the registry. Add the following registry keys:
     
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\12.0\Word\Text Converters\OOXML Converters]

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\12.0\Word\Text Converters\OOXML Converters\Export]

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\12.0\Word\Text Converters\OOXML Converters\Export\Sales Pipeline]
    "Clsid"="{CC03A6F5-8517-48c6-B8A5-DD287855F9BA}"
    "Name"="Sales Pipeline"
    "Extensions"="splx"

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\12.0\Word\Text Converters\OOXML Converters\Import]

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\12.0\Word\Text Converters\OOXML Converters\Import\Sales Pipeline]
    "Clsid"="{CC03A6F5-8517-48c6-B8A5-DD287855F9BA}"
    "Name"="Sales Pipeline"
    "Extensions"="splx"

  8. Before we test our file converter, we need to create a file with the .splx extension that includes the sales pipeline data. Create a file named Sales Pipeline Data Jan 2009.splx, and copy the sales pipeline data listed earlier in this post into the file; then save and close the file.

Implement Import/Export for our Custom File Format

Now that we have created a basic External File Converter, it is time to customize the HrImport and HrExport methods. The HrImport method will be customized to convert our Sales Pipeline XML into a Word table when opened. The HrExport method will be customized to convert the Word table into our Sales Pipeline XML when saved.

  1. Add references to System.Xml.Linq, DocumentFormat.OpenXml, and WindowsBase.
  2. Add using directives for System.IO, System.Xml, and System.Xml.Linq, plus aliases for DocumentFormat.OpenXml, DocumentFormat.OpenXml.Packaging, and DocumentFormat.OpenXml.Wordprocessing:
     
    using System;
    using System.IO;
    using System.Xml;
    using System.Xml.Linq;
    using Microsoft.Office.Core;
    using System.EnterpriseServices;
    using System.Runtime.InteropServices;

    using OpenXml = DocumentFormat.OpenXml;
    using Packaging = DocumentFormat.OpenXml.Packaging;
    using Wordprocessing = DocumentFormat.OpenXml.Wordprocessing;

  3. Update the HrImport() method with the following code, which reads through the SPLX document format and creates a Word Table with the data.
     
    public void HrImport(
        string bstrSourcePath,
        string bstrDestPath,
        IConverterApplicationPreferences pcap,
        out IConverterPreferences ppcp,
        IConverterUICallback pcuic)
    {
        ppcp = new SalesPipelineConverterPreferences();

        int tempIndex = 0;
        bool foundFile = false;
        string tempDocPath = "";
        string tempDir = (new FileInfo(bstrDestPath)).Directory.FullName;
        while (!foundFile && tempIndex < 999)
        {
            tempDocPath = String.Format("{0}\\~SalesPipeline{1:0000}.docx", tempDir, tempIndex++);
            if (!File.Exists(tempDocPath))
                foundFile = true;
        }
        if (!foundFile)
            throw new IOException("Unable to create temp file");

        using (Packaging.WordprocessingDocument tempDoc = Packaging.WordprocessingDocument.Create(tempDocPath, OpenXml.WordprocessingDocumentType.Document))
        {
            // create the table, table properties, and header row
            Wordprocessing.Table salesPipelineTable = new Wordprocessing.Table(
                new Wordprocessing.TableProperties(
                    new Wordprocessing.TableStyle() { Val = "TableGrid" },
                    new Wordprocessing.TableWidth() { Width = 0, Type = Wordprocessing.TableWidthUnitValues.Auto },
                    new Wordprocessing.TableBorders(
                        new Wordprocessing.TopBorder() { Val = Wordprocessing.BorderValues.Single, Size = 4, Space = 0, Color = "auto" },
                        new Wordprocessing.LeftBorder() { Val = Wordprocessing.BorderValues.Single, Size = 4, Space = 0, Color = "auto" },
                        new Wordprocessing.BottomBorder() { Val = Wordprocessing.BorderValues.Single, Size = 4, Space = 0, Color = "auto" },
                        new Wordprocessing.RightBorder() { Val = Wordprocessing.BorderValues.Single, Size = 4, Space = 0, Color = "auto" },
                        new Wordprocessing.InsideHorizontalBorder() { Val = Wordprocessing.BorderValues.Single, Size = 4, Space = 0, Color = "auto" },
                        new Wordprocessing.InsideVerticalBorder() { Val = Wordprocessing.BorderValues.Single, Size = 4, Space = 0, Color = "auto" }),
                    new Wordprocessing.TableCellMargin(
                        new Wordprocessing.TopMargin() { Width = 10, Type = Wordprocessing.TableWidthUnitValues.Dxa },
                        new Wordprocessing.LeftMargin() { Width = 10, Type = Wordprocessing.TableWidthUnitValues.Dxa },
                        new Wordprocessing.BottomMargin() { Width = 10, Type = Wordprocessing.TableWidthUnitValues.Dxa },
                        new Wordprocessing.RightMargin() { Width = 10, Type = Wordprocessing.TableWidthUnitValues.Dxa }),
                    new Wordprocessing.TableLook() { Val = "04A0" }),
                new Wordprocessing.TableGrid(
                    new Wordprocessing.GridColumn() { Width = 3192 },
                    new Wordprocessing.GridColumn() { Width = 3192 },
                    new Wordprocessing.GridColumn() { Width = 3192 }),
                new Wordprocessing.TableRow(
                    new Wordprocessing.TableCell(
                        new Wordprocessing.TableCellProperties(
                            new Wordprocessing.TableCellWidth() { Width = 3192, Type = Wordprocessing.TableWidthUnitValues.Dxa }),
                        new Wordprocessing.Paragraph(
                            new Wordprocessing.Run(
                                new Wordprocessing.Text("Customer Name")))),
                    new Wordprocessing.TableCell(
                        new Wordprocessing.TableCellProperties(
                            new Wordprocessing.TableCellWidth() { Width = 3192, Type = Wordprocessing.TableWidthUnitValues.Dxa }),
                        new Wordprocessing.Paragraph(
                            new Wordprocessing.ParagraphProperties(
                                new Wordprocessing.Justification() { Val = Wordprocessing.JustificationValues.Center }),
                            new Wordprocessing.Run(
                                new Wordprocessing.Text("Deal Value")))),
                    new Wordprocessing.TableCell(
                        new Wordprocessing.TableCellProperties(
                            new Wordprocessing.TableCellWidth() { Width = 3192, Type = Wordprocessing.TableWidthUnitValues.Dxa }),
                        new Wordprocessing.Paragraph(
                            new Wordprocessing.ParagraphProperties(
                                new Wordprocessing.Justification() { Val = Wordprocessing.JustificationValues.Center }),
                            new Wordprocessing.Run(
                                new Wordprocessing.Text("Confidence %"))))));

            // loop through each sales item and add a row to the table
            XDocument salesPipelineDoc = XDocument.Load(bstrSourcePath);
            foreach (XElement salesItem in salesPipelineDoc.Root.Descendants("SalesItem"))
            {
                salesPipelineTable.Append(new Wordprocessing.TableRow(
                    new Wordprocessing.TableCell(
                        new Wordprocessing.TableCellProperties(
                            new Wordprocessing.TableCellWidth() { Width = 3192, Type = Wordprocessing.TableWidthUnitValues.Dxa }),
                        new Wordprocessing.Paragraph(
                            new Wordprocessing.Run(
                                new Wordprocessing.Text(salesItem.Element("CustomerName").Value)))),
                    new Wordprocessing.TableCell(
                        new Wordprocessing.TableCellProperties(
                            new Wordprocessing.TableCellWidth() { Width = 3192, Type = Wordprocessing.TableWidthUnitValues.Dxa }),
                        new Wordprocessing.Paragraph(
                            new Wordprocessing.ParagraphProperties(
                                new Wordprocessing.Justification() { Val = Wordprocessing.JustificationValues.Center }),
                            new Wordprocessing.Run(
                                new Wordprocessing.Text(String.Format("${0:#,0}", Convert.ToInt32(salesItem.Element("DealValue").Value)))))),
                    new Wordprocessing.TableCell(
                        new Wordprocessing.TableCellProperties(
                            new Wordprocessing.TableCellWidth() { Width = 3192, Type = Wordprocessing.TableWidthUnitValues.Dxa }),
                        new Wordprocessing.Paragraph(
                            new Wordprocessing.ParagraphProperties(
                                new Wordprocessing.Justification() { Val = Wordprocessing.JustificationValues.Center }),
                            new Wordprocessing.Run(
                                new Wordprocessing.Text(String.Format("{0:0}%", (Convert.ToDecimal(salesItem.Element("ConfidencePercent").Value) * 100))))))));
            }

            // create a document part and markup, inserting the table we created
            tempDoc.AddMainDocumentPart();
            tempDoc.MainDocumentPart.Document =
                new Wordprocessing.Document(
                    new Wordprocessing.Body(
                        salesPipelineTable));
            tempDoc.MainDocumentPart.Document.Save();
            tempDoc.Close();
        }
        File.Copy(tempDocPath, bstrDestPath, true);
        File.Delete(tempDocPath);
    }


  4. Update the HrExport() method with the following code, which reads through the Word Table and exports the values to the SPLX document format.
     
    public void HrExport(
        string bstrSourcePath,
        string bstrDestPath,
        string bstrClass,
        IConverterApplicationPreferences pcap,
        out IConverterPreferences ppcp,
        IConverterUICallback pcuic)
    {
        ppcp = new SalesPipelineConverterPreferences();

        XDocument salesPipelineDoc = new XDocument(new XElement("SalesPipeline"));

        // open the source document
        using (Packaging.WordprocessingDocument tempDoc = Packaging.WordprocessingDocument.Open(bstrSourcePath, false))
        {
            int rowIndex = 0;
            foreach (Wordprocessing.TableRow tableRow in tempDoc.MainDocumentPart.Document.Descendants<Wordprocessing.TableRow>())
            {
                // skip the header row
                if (rowIndex == 0)
                {
                    rowIndex++;
                    continue;
                }

                int cellIndex = 1;
                string customerName = "";
                string dealValue = "";
                string confidencePercent = "";
                foreach (Wordprocessing.TableCell cell in tableRow.Descendants<Wordprocessing.TableCell>())
                {
                    if (cellIndex == 1)
                        customerName = cell.InnerText;
                    else if (cellIndex == 2)
                        dealValue = cell.InnerText.Replace("$", "").Replace(",", "");
                    else if (cellIndex == 3)
                        confidencePercent = (Convert.ToDecimal(cell.InnerText.Replace("%", "")) / 100).ToString();
                    cellIndex++;
                }

                salesPipelineDoc.Root.Add(
                    new XElement("SalesItem",
                        new XAttribute("id", rowIndex),
                        new XElement("CustomerName", customerName),
                        new XElement("DealValue", dealValue),
                        new XElement("ConfidencePercent", confidencePercent)));

                rowIndex++;
            }
        }

        // save it to XML
        salesPipelineDoc.Save(bstrDestPath);
    }
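
HrImport turns the raw DealValue and ConfidencePercent strings into display text, and HrExport turns the cell text back into raw values. The sketch below isolates just those conversions so they can be checked without Word. The class and method names are hypothetical and not part of the converter API. It assumes the .splx file stores values in culture-invariant form (for example 0.75), and it uses the 0 placeholder so that a zero value still prints a digit.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of the External File Converter API.
public static class SalesPipelineText
{
    // Assumption: .splx values are written in culture-invariant form.
    static readonly CultureInfo Inv = CultureInfo.InvariantCulture;

    // "1250000" -> "$1,250,000"
    public static string FormatDealValue(string raw)
    {
        int value = int.Parse(raw, NumberStyles.Integer, Inv);
        return string.Format(Inv, "${0:#,0}", value);
    }

    // "0.75" -> "75%"
    public static string FormatConfidence(string raw)
    {
        decimal fraction = decimal.Parse(raw, NumberStyles.Number, Inv);
        return string.Format(Inv, "{0:0}%", fraction * 100);
    }

    // "$1,250,000" -> "1250000"
    public static string ParseDealValue(string cellText)
    {
        return cellText.Replace("$", "").Replace(",", "").Trim();
    }

    // "75%" -> "0.75"
    public static string ParseConfidence(string cellText)
    {
        string digits = cellText.Replace("%", "").Trim();
        decimal percent = decimal.Parse(digits, NumberStyles.Number, Inv);
        return (percent / 100).ToString(Inv);
    }
}
```

Passing an explicit culture keeps the round trip stable on machines whose regional settings use a comma as the decimal separator. A blank cell would still make ParseConfidence throw a FormatException, so a production converter would probably want to validate the cell text first.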


Test the Sales Pipeline External File Converter

You are now ready to test your custom External File Converter.

  1. Open Word 2007 SP2, and select Open from the Office Menu
  2. Select Sales Pipeline (*.splx) from the file type drop-down
  3. Select the Sales Pipeline Data Jan 2009.splx file that you created earlier


  4. Word should open the file and display the data in a table


  5. You can add a row, fill in the appropriate values, and Save the document
  6. The Sales Pipeline Data Jan 2009.splx file should now contain an additional SalesItem element with the data from the newly added row.

You can also edit your sales pipeline information in the ODT format by doing the following:

  1. Use the Save As feature of Word to save the document in the OpenDocument Text (*.odt) format
  2. Continue editing the ODT file in Word, or close Word and open the ODT file in your favorite ODT editor. For example, you could open the file using OpenOffice.org Writer or Lotus Symphony.
  3. Add a row of data and save the file
  4. If using an application other than Word, open the file using Word 2007, then use the Save As feature to save the document as our Sales Pipeline (*.splx) format
  5. The row(s) that you added should now be saved in our custom document format. You can open the *.splx file using an XML editor to see the added XML record(s).
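
If you would rather check the result in code than in an XML editor, a few lines of LINQ to XML can list the saved records. This is only a sketch: the element and attribute names come from the HrExport method above, while the SplxInspector class and the sample data are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of the converter.
public static class SplxInspector
{
    // Returns one summary line per SalesItem element under the root.
    public static List<string> Summarize(XDocument doc)
    {
        return doc.Root.Elements("SalesItem")
            .Select(item => string.Format("{0}: {1} / {2} / {3}",
                (string)item.Attribute("id"),
                (string)item.Element("CustomerName"),
                (string)item.Element("DealValue"),
                (string)item.Element("ConfidencePercent")))
            .ToList();
    }

    // Made-up sample data in the shape that HrExport writes.
    public const string SampleXml = @"<SalesPipeline>
  <SalesItem id=""1""><CustomerName>Contoso</CustomerName><DealValue>50000</DealValue><ConfidencePercent>0.75</ConfidencePercent></SalesItem>
  <SalesItem id=""2""><CustomerName>Fabrikam</CustomerName><DealValue>120000</DealValue><ConfidencePercent>0.4</ConfidencePercent></SalesItem>
</SalesPipeline>";
}
```

To run it against a real file, load the document with XDocument.Load and the path of your .splx file, then write each returned line to the console. Comparing the number of lines before and after the round trip through ODT is a quick way to confirm that the added rows were saved.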

You can also save your custom format in any other format that Office 2007 SP2 supports. For example, you can use the Save As feature of Word to save the document as HTML.

External File Converter Resources

If you want to create your own Open XML External File Converter, you can read more about the API and download it on MSDN. The MSDN article also links to a code sample that you can download.

As always, let me know if you have any questions or comments. :)

Posted by stephenperont | 12 Comments
Links for 02-24-2009
24 February 09 08:04 PM

In addition to posting my own content, I will from time to time post links to the great new Open XML developer content posted by a few of my colleagues. I took a look back to January of this year, and a lot of amazingly useful content has been posted; here is a summary of a few of my favorites. I hope you appreciate the quality of this content! Much of it did not exist anywhere before, and it is great to see these people dedicating time to get us solid, useful content. They are setting the bar pretty high for the rest of us ... Thanks! (pun intended, chuckles)

Fraunhofer Fokus – IS29500-Validator and Test-Library. Open XML Developer announced this week that Microsoft has partnered with Fraunhofer Fokus (Fraunhofer) on a project that will test the validity of IS29500 documents. As part of the project Fraunhofer will start a community effort to build an Open Source document validator and test library. You can expect to see a lot more published here on my blog and at Fraunhofer’s site about this project as it gets started.

Generating a Product Catalog as a Word Document. Brian Jones and Zeyad Rajabi have posted an article which shows how to build a solution that is able to easily generate a product catalog as a Word document from a database. They use the Open XML SDK 2.0 and build upon previous articles, and this approach works in both client and server environments; this is definitely a good read.

Creating Documents by Using the Open XML Format SDK 2.0. MSDN (Erika Ehrli) has published a three-part article in which Zeyad Rajabi and Frank Rice break down, in very easy terms, how to create WordprocessingML, PresentationML, and SpreadsheetML documents. Part 1 starts the series by introducing the Open XML SDK 2.0 CTP, showing how the packaging conventions work with the document parts, and showing some sample code to create a simple document. Part 2 shows how to create a SpreadsheetML document by building a chart from a data source to create a sales order. Part 3 explores creating a PresentationML document, with roll-up information from Part 2. This is the most comprehensive article series on Open XML SDK development yet; you should seriously take the time to walk through each part.

Equality Semantics of LINQ to XML Trees. In this post, you will learn why it is important to be able to compare two XML trees for equivalence and how to use LINQ to XML to do just that. Eric White uses XSD to validate normalized XML trees and provide differential updates of one tree to the other. Eric explains the issues with normalization and provides excellent guidance on how to handle XName and XAttribute objects.
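
To give a feel for the idea, here is a minimal sketch that is not Eric's code and covers far less than his post. It assumes the only difference we want to ignore is attribute order: a hypothetical Normalize helper rebuilds each element with its attributes sorted, and the two results are then compared with the XNode.DeepEquals method from LINQ to XML.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; Eric's post covers much more,
// including XSD-based normalization.
public static class XmlTreeCompare
{
    // Rebuilds the element with attributes sorted by namespace, then local name.
    // Child elements are normalized recursively; other nodes are copied as-is.
    public static XElement Normalize(XElement element)
    {
        return new XElement(element.Name,
            element.Attributes()
                .OrderBy(a => a.Name.NamespaceName)
                .ThenBy(a => a.Name.LocalName),
            element.Nodes().Select(n =>
                n is XElement ? (XNode)Normalize((XElement)n) : n));
    }

    // Compares two trees after normalizing both.
    public static bool AreEquivalent(XElement left, XElement right)
    {
        return XNode.DeepEquals(Normalize(left), Normalize(right));
    }
}
```

Because the XElement constructor clones attributes and nodes that already have a parent, Normalize leaves the original trees untouched. Differences such as insignificant whitespace, default attribute values, or equivalent number formats are exactly the harder cases that the post discusses and that this sketch ignores.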

Removing Comments from a Wordprocessing Document. On Brian Jones's blog, Zeyad Rajabi explains how to sanitize a document by removing personally identifiable information. Zeyad shows how to programmatically remove the same kinds of information that the Document Inspector feature in Office 2007 removes.

How to Copy a Worksheet within a Workbook. Based on feedback from blog readers, Zeyad creates an example of how to safely copy a worksheet within a workbook. In this post, Zeyad lists the following steps:

 
  1. Open up the Spreadsheet document via the Open XML SDK
  2. Access the main workbook part, which will give us access to a bunch of related parts, like the different worksheets
  3. Access the worksheet we want to copy
  4. Clone the found worksheet plus all related parts and add the clone plus all related parts back to the workbook
  5. Perform cleanup work to ensure that tables, views, etc. work
  6. Add the newly created worksheet reference to the sheets list in the main workbook part
  7. Save changes made to the workbook
 

Zeyad goes out of his way to show useful worksheet scenarios such as conditional formatting, images and tables.

Export Data to Excel. In this post, Erika Ehrli explains how to perform the common ASP.NET developer task of exporting data from a database, Web service, or third-party API to Excel. Erika provides guidance on the different ways this is possible and shows how to avoid the File Format Differ warning in Excel.

Move/Insert/Delete Paragraphs. In this post, Eric White tackles the quite daunting process of moving, inserting, or deleting paragraphs that contain markup referring to something outside of the paragraph. For example, one paragraph may contain the markup indicating that a comment begins, while a later paragraph contains the markup indicating where that comment ends. Eric breaks this complex problem down and shows how to safely work with paragraphs.

Finding Paragraphs by Style Name or Content. Eric continues working with paragraphs, now taking the time to explain how to find the paragraph(s) that you want to work with. This is a logical read before or after the Move/Insert/Delete Paragraphs post. Eric shows how to use LINQ to query the WordprocessingML markup for specific style names or content, and then get access to that part of the document to work with it.
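
As a rough illustration of this kind of query (a sketch under stated assumptions, not Eric's code), the following uses LINQ to XML over the main document part's markup. The ParagraphFinder class is hypothetical, and it matches the style ID stored in w:pStyle (for example Heading1), which is not always the same as the style's display name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; it matches on style ID, not display name.
public static class ParagraphFinder
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of every paragraph whose w:pPr/w:pStyle/@w:val equals styleId.
    public static List<string> TextOfParagraphsWithStyle(XDocument mainPart, string styleId)
    {
        return mainPart.Descendants(W + "p")
            .Where(p => (string)p.Elements(W + "pPr")
                                 .Elements(W + "pStyle")
                                 .Attributes(W + "val")
                                 .FirstOrDefault() == styleId)
            .Select(p => string.Concat(
                p.Descendants(W + "t").Select(t => t.Value).ToArray()))
            .ToList();
    }
}
```

The XDocument passed in would hold the XML of the main document part (document.xml). Paragraphs without an explicit w:pStyle element are skipped, because the cast of a missing attribute to string yields null.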

Posted by stephenperont | 1 Comment
Implementer Notes Just Make Good Sense
16 January 09 06:46 AM

I am pretty excited about our release of the ECMA-376 Implementer Notes. These notes provide a wealth of information that is very useful to developers who are writing code that interoperates with Office. I have been working with Open XML for quite some time now, and there are many days when I look back and wish that I had had these notes to aid me in my development. Here is a quick example.

In this example we will create a simple Excel spreadsheet with one workbook and three worksheets, using the Open XML Format SDK 2.0. We start with some code to create the document.

Before we go write a bunch of code to place values into cells, we can first go to the Document Interop Initiative website to learn about the restrictions that Excel places on values in cells.

We can navigate to the ECMA-376 standard outline by clicking References, then selecting the ECMA-376 1st Edition item from the menu.

We then use the left navigation to expand Part 4 down to Section 3.3.1.93, v (Cell Value), and click on this node to see the details for this section of the standard. While on that page, we see that there is a View Notes button indicating that there are implementer notes.

We click the View Notes button to open a popup window that displays the notes that implementers have added for this section of the specification. While there, we see a note, from Microsoft, that explains that Office 2007 places value restrictions.

We expand this note, by clicking the blue "..." icon at the end of the note preview. And... voila! We see the restrictions that Excel places on the values of a cell.

This is really great!

This is just a short post to introduce the ECMA-376 Implementer Notes and show how they can help implementers achieve great interoperability with Office. I hope you are having a great day!

Posted by stephenperont | 7 Comments
