Erika Ehrli - Adventures with Office Products & Technologies
MSDN & TechNet: Releasing Office, SharePoint, Exchange & Lync Centers and content for developers and IT professionals.

Extracting Microsoft Office Application Properties without automation

Every file created by a Microsoft Office application supports a set of built-in document properties. In addition, you can add your own custom properties to an Office document either manually or through code. You can use document properties to create, maintain, and track information about an Office document, such as when it was created, who the author is, and where it is stored. One way to get or set these properties is to automate the Office application that owns the document.

Take a look at the following links for samples:

http://support.microsoft.com/default.aspx?scid=KB;EN-US;Q303296&

http://msdn2.microsoft.com/en-us/library/4e0tda25.aspx

But what happens if you are working on a Web-based application and you want to avoid using automation on the Web server?

I found a nice workaround to extract Office document properties without using automation. You can use Dsofile, an in-process ActiveX component that lets you read and edit the OLE document properties associated with Microsoft Office files, such as the following:
• Microsoft Excel workbooks
• Microsoft PowerPoint presentations
• Microsoft Word documents
• Microsoft Project projects
• Microsoft Visio drawings
• Other OLE files

Dsofile works without those Office products installed.

If you are working with a managed application, follow these steps:

  1. Download and install the DSO File control.
  2. Add a reference to InteropDSOfile.dll to your managed Web application.
  3. Create a new Web form and copy the following code.
    <%@ Page Language="C#" %>

    <script runat="server">

        protected void btnLoadFile_Click(object sender, EventArgs e)
        {
            // Nothing to read if no file was uploaded
            if (!FileUpload1.HasFile)
            {
                return;
            }

            // Define a path to save the file on the server
            string serverTempFilePath = Server.MapPath(@"/yourpath/" + FileUpload1.FileName);
            FileUpload1.PostedFile.SaveAs(serverTempFilePath);

            // Create the DSOFile document
            DSOFile.OleDocumentPropertiesClass oleDocument = new DSOFile.OleDocumentPropertiesClass();
            DSOFile.SummaryProperties summaryProperties;

            // Open read-only; fall back to read-only if there is no write access
            oleDocument.Open(serverTempFilePath,
                true,
                DSOFile.dsoFileOpenOptions.dsoOptionOpenReadOnlyIfNoWriteAccess);

            // Extract the properties
            summaryProperties = oleDocument.SummaryProperties;
            tbTitle.Text = summaryProperties.Title;
            tbAuthors.Text = summaryProperties.Author;
            tbCompany.Text = summaryProperties.Company;
            tbNumPages.Text = summaryProperties.PageCount.ToString();
            tbWordCount.Text = summaryProperties.WordCount.ToString();

            // Close the DSOFile.OleDocumentPropertiesClass without saving
            oleDocument.Close(false);
        }
    </script>

    <html xmlns="http://www.w3.org/1999/xhtml">
    <head runat="server">
        <title>DSOFileDemo</title>
    </head>
    <body>
        <form id="form1" runat="server">
            <div>
                <strong>DSOFileDemo</strong><br />
                <br />
                <table border="1">
                    <tr>
                        <td valign="top">File upload:</td>
                        <td>
                            <asp:FileUpload ID="FileUpload1" runat="server" />
                            <asp:Button ID="btnLoadFile" runat="server" OnClick="btnLoadFile_Click" Text="Load File Properties" /><br />
                        </td>
                    </tr>
                    <tr>
                        <td>Title:</td>
                        <td>
                            <asp:TextBox ID="tbTitle" runat="server"></asp:TextBox>
                        </td>
                    </tr>
                    <tr>
                        <td>Author:</td>
                        <td>
                            <asp:TextBox ID="tbAuthors" runat="server"></asp:TextBox>
                        </td>
                    </tr>
                    <tr>
                        <td>Company:</td>
                        <td>
                            <asp:TextBox ID="tbCompany" runat="server"></asp:TextBox>
                        </td>
                    </tr>
                    <tr>
                        <td>Number of Pages:</td>
                        <td>
                            <asp:TextBox ID="tbNumPages" runat="server"></asp:TextBox>
                        </td>
                    </tr>
                    <tr>
                        <td>Word count:</td>
                        <td>
                            <asp:TextBox ID="tbWordCount" runat="server"></asp:TextBox>
                        </td>
                    </tr>
                </table>
            </div>
        </form>
    </body>
    </html>

  4. Run the Web form and upload an Office document. The text boxes show the document's title, author, company, number of pages, and word count.

You can also extract custom properties using the DSOFile control.
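As a rough sketch of what that might look like, the helper below opens a file the same way as the Web form above and lists its custom properties as "name = value" lines. The class and method names (CustomPropertyReader, ReadCustomProperties) are made up for illustration. The sketch also assumes that the interop assembly lets you enumerate the CustomProperties collection and read each value through a get_Value() accessor; depending on how your interop assembly was generated, the value may be exposed as a Value property instead.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the DSOFile sample.
public static class CustomPropertyReader
{
    // Returns one "name = value" line per custom property stored in the file.
    // Assumes the DSOFile interop assembly from step 2 is referenced.
    public static string ReadCustomProperties(string filePath)
    {
        DSOFile.OleDocumentPropertiesClass oleDocument = new DSOFile.OleDocumentPropertiesClass();
        StringBuilder result = new StringBuilder();

        // Same open call as in the Web form: read-only, no write lock required
        oleDocument.Open(filePath,
            true,
            DSOFile.dsoFileOpenOptions.dsoOptionOpenReadOnlyIfNoWriteAccess);
        try
        {
            foreach (DSOFile.CustomProperty property in oleDocument.CustomProperties)
            {
                // Assumption: the value is read through get_Value() in C#
                object value = property.get_Value();
                result.AppendLine(property.Name + " = " + Convert.ToString(value));
            }
        }
        finally
        {
            // Always release the file, even if reading a property fails
            oleDocument.Close(false);
        }

        return result.ToString();
    }
}
```

In the Web form you could call it right after saving the upload, for example by assigning ReadCustomProperties(serverTempFilePath) to the Text of a multiline text box.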

Have a peek and enjoy!

Comments
  • G'day,

    Just wondering if you had any luck with setting or extracting OLE properties for PDF or even Outlook Message files?

    cheers
    Bill
  • Hi Bill,

    I only tried using DSOControl for Office files. However, you can always try extracting generic file properties using the System.IO.FileInfo class:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemiofileinfoclasstopic.asp

  • It is possible to change (custom) properties of other files (like pdf, txt, bmp), BUT when these files are compressed or burned to CD, these properties are lost.

    This is not the case for MS-Office files.
    WHY???
  • hi,

    How do I extract document properties for PDF files in C#? The FileInfo class does not give details like author, keywords, comments and other properties in which I am interested.

    Regards
    Namrata

  • Looks like the Microsoft DSOfile DLL V2.0 (09 Feb 06) is bugged: I can update file summary fields only if they've been set manually before (in particular the "title" field). Otherwise, I get a stupid "permission is denied" error message, though I'm running locally with admin rights.

    The DLL's VB6 and .NET demos crash the same way if these fields were not manually set before!

    This is pretty annoying. I've been looking for an explanation on the web for hours but couldn't find any. Microsoft should care more about the quality of its code.


  • I have exactly the same problem as Fred: can't update the "title" and "category" fields if they were not manually set before.

    I'm looking for a bug-free component that can update summary fields for PDF files.
  • Just a little comment to clarify the scope of the DSOfile dll. The Microsoft Developer Support OLE File Property Reader 2.0 is a code sample that demonstrates how to use the OLE IPropertyStorage interface to read and write the document properties of OLE files, such as the properties of native Microsoft Word, Microsoft Excel, Microsoft PowerPoint, Microsoft Publisher, and Microsoft Visio files.

    The sample was not intended to work with PDF files...
  • Emeric and Fred, it seems that you have an authorization problem here. That is because the sample code falls back to opening the file read-only when it has no write access:

    m_oDocument.Open(sFile, fOpenReadOnly, DSOFile.dsoFileOpenOptions.dsoOptionOpenReadOnlyIfNoWriteAccess)

    See the sample code and check out the following comment and line of code:

    ' Here we can tell if file was open read-only...
    m_fOpenedReadOnly = m_oDocument.IsReadOnly

    I also found this comment as part of the sample code:

    'The dsoOptionOpenReadOnlyIfNoWriteAccess allows us to open the file, read/write if we have access, but go ahead and open read-only if
    we don't. Since viewing properties is main purpose of the sample it is OK for us to fail write access lock on this open...

    I hope this helps,

    -Erika

  • Thanks for your help Erika, but I had the same problems with Word files: couldn't update the "title" and "category" fields if they were not manually set before.

    But I found the solution and here it is:
    http://www.codecomments.com/message813451.html

    All you have to do is debug the dsofile yourself. Just follow the instructions given in the link; it's not very difficult and it works fine, even for PDF files.
  • Emeric and everyone,

    I am sorry that the DSOFile control has problems setting fields (title and summary) that were not manually set before.

    I loved this control because it was great for extracting properties and my intentions to share this with the community were the best. I am sure some people might find this useful however.

    I also want to share that Ken Getz just wrote a new column on how to extract document properties using Office 2007.

    http://msdn.microsoft.com/msdnmag/issues/06/06/AdvancedBasics/default.aspx

    You will see it's quite interesting and I love the fact that the new file formats offer better ways to extract/update document properties and remove the need to use automation or the DSOControl.

    Extracting/writing document properties contained in an XML document is very simple, I am sure everyone will be delighted with this new option.



  • If you want to access the files that are on a remote machine and you are trying to use impersonation then you will need to use AspCompat="true" for the page trying to access the file. Take a look at this KB for more information http://support.microsoft.com/kb/325791

    -Sunith Nair
  • Hi, I've used dsofile.dll to develop an application. Dsofile.dll fails to register when I deploy to another computer, even though regsvr32 returns a "registration successful" message when run at the command line.

    Any ideas?

    Thanks
  • What is the best way to get PDF properties? Can this be used?
  • Hello,
    I have also tried this component and used both versions that I know of. However, when I upgraded to version 2, I can no longer store properties with empty strings ("") as values. If so, Windows will no longer display any custom properties on the properties tab (even though they are there). And if opening the file in Word, and then trying to look at the custom properties, Word will crash. How come this is? Is there anyone that has a workaround?
  • Thanks for the example, this is great. I can't get it working though ...

    I'm getting an Access denied error on the line that instantiates the DSOFile.OleDocumentPropertiesClass object ... any ideas?