Erika Ehrli

Adventures in Office Development and .NET

Extracting Microsoft Office Application Properties without automation

Every file created by a Microsoft Office application supports a set of built-in document properties. In addition, you can add your own custom properties to an Office document either manually or through code. You can use document properties to create, maintain, and track information about an Office document such as when it was created, who the author is, where it is stored, and so on. One way to get or set these properties is to automate the Office application that owns the document.

Take a look at the following links for samples:

http://support.microsoft.com/default.aspx?scid=KB;EN-US;Q303296&

http://msdn2.microsoft.com/en-us/library/4e0tda25.aspx

But what happens if you are working with a Web-based application and you want to avoid using automation on a Web server?

I found a nice workaround to extract Office document properties without using automation. You can use Dsofile, an in-process ActiveX component that lets you read and edit the OLE document properties associated with Microsoft Office files, such as the following:
• Microsoft Excel workbooks
• Microsoft PowerPoint presentations
• Microsoft Word documents
• Microsoft Project projects
• Microsoft Visio drawings
• Other files without those Office products installed

If you are working with a managed application, follow these steps:

  1. Download and install the DSO File control.
  2. Add a reference to InteropDSOfile.dll to your managed Web application.
  3. Create a new Web form and copy the following code.
    <%@ Page Language="C#" %>

    <script runat="server">
        protected void btnLoadFile_Click(object sender, EventArgs e)
        {
            // Define a path to save the file on the server
            string serverTempFilePath = Server.MapPath(@"/yourpath/" + FileUpload1.FileName);
            FileUpload1.PostedFile.SaveAs(serverTempFilePath);

            // Create the DSOFile document
            DSOFile.OleDocumentPropertiesClass oleDocument = new DSOFile.OleDocumentPropertiesClass();
            DSOFile.SummaryProperties summaryProperties;

            oleDocument.Open(serverTempFilePath,
                true,
                DSOFile.dsoFileOpenOptions.dsoOptionOpenReadOnlyIfNoWriteAccess);

            // Extract the properties
            summaryProperties = oleDocument.SummaryProperties;
            tbTitle.Text = summaryProperties.Title;
            tbAuthors.Text = summaryProperties.Author;
            tbCompany.Text = summaryProperties.Company;
            tbNumPages.Text = summaryProperties.PageCount.ToString();
            tbWordCount.Text = summaryProperties.WordCount.ToString();

            // Close the DSOFile.OleDocumentPropertiesClass
            oleDocument.Close(false);
        }
    </script>

    <html xmlns="http://www.w3.org/1999/xhtml">
    <head runat="server">
        <title>DSOFileDemo</title>
    </head>
    <body>
        <form id="form1" runat="server">
            <div>
                <strong>DSOFileDemo</strong><br />
                <br />
                <table border="1">
                    <tr>
                        <td valign="top">File upload:</td>
                        <td>
                            <asp:FileUpload ID="FileUpload1" runat="server" />
                            <asp:Button ID="btnLoadFile" runat="server" OnClick="btnLoadFile_Click" Text="Load File Properties" /><br />
                        </td>
                    </tr>
                    <tr>
                        <td>Title:</td>
                        <td><asp:TextBox ID="tbTitle" runat="server"></asp:TextBox></td>
                    </tr>
                    <tr>
                        <td>Author:</td>
                        <td><asp:TextBox ID="tbAuthors" runat="server"></asp:TextBox></td>
                    </tr>
                    <tr>
                        <td>Company:</td>
                        <td><asp:TextBox ID="tbCompany" runat="server"></asp:TextBox></td>
                    </tr>
                    <tr>
                        <td>Number of Pages:</td>
                        <td><asp:TextBox ID="tbNumPages" runat="server"></asp:TextBox></td>
                    </tr>
                    <tr>
                        <td>Word count:</td>
                        <td><asp:TextBox ID="tbWordCount" runat="server"></asp:TextBox></td>
                    </tr>
                </table>
            </div>
        </form>
    </body>
    </html>

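  4. Run the Web form, upload an Office document, and click Load File Properties. The text boxes fill in with the document's title, author, company, page count, and word count.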

You can also extract custom properties using the DSOFile control.

Have a peek and enjoy!

Posted: Wednesday, November 30, 2005 11:27 AM by erikaehrli

Comments

Bill said:

G'day,

Just wondering if you had any luck setting or extracting OLE properties for PDF or even Outlook message files?

cheers
Bill
# January 6, 2006 12:41 AM

erikaehrli said:

Hi Bill,

I only tried using DSOControl for Office files. However, you can always try extracting generic file properties using the System.IO.FileInfo class:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemiofileinfoclasstopic.asp
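
As a minimal sketch of that suggestion (the class and method names below are made up for illustration): FileInfo exposes file-system metadata only, such as name, size, and timestamps. It does not return document properties like author, title, or keywords.

```csharp
using System;
using System.IO;

public static class GenericFileProperties
{
    // Hypothetical helper: summarizes the file-system metadata that FileInfo exposes.
    // FileInfo has no notion of author, title, or keywords.
    public static string Describe(string path)
    {
        FileInfo info = new FileInfo(path);
        return string.Format("{0} | {1} bytes | created {2:u} | modified {3:u}",
            info.Name, info.Length, info.CreationTimeUtc, info.LastWriteTimeUtc);
    }
}
```

Calling `GenericFileProperties.Describe(Server.MapPath("/yourpath/report.pdf"))` works for any file type, because nothing in it depends on the file's internal format.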

# January 6, 2006 1:08 PM

EdgE said:

It is possible to change (custom) properties of other files (like pdf, txt, bmp), BUT when these files are compressed or burned to CD, these properties are lost.

This is not the case for MS-Office files.
WHY???
# February 14, 2006 4:34 AM

Namrata said:

hi,

How do I extract document properties for PDF files in C#? The FileInfo class does not give details like author, keywords, comments and the other properties I am interested in.

Regards
Namrata
# March 3, 2006 1:23 AM

fred said:


Looks like the Microsoft DSOfile DLL V2.0 (09 Feb 06) is bugged: I can update file summary fields only if they've been set manually before (in particular the "title" field). Otherwise, I get a stupid "permission is denied" error message, though I'm running locally with admin rights.

The DLL's VB6 and .NET demos crash the same way if these fields were not manually set before!

This is pretty annoying. I've been looking for an explanation on the web for hours but couldn't find any. Microsoft should care more about the quality of its code.


# March 4, 2006 7:06 AM

Emeric said:

I have exactly the same problem as Fred: can't update the "title" and "category" fields if they were not manually set before.

I'm looking for a bug-free component that can update summary fields for PDF files.
# May 3, 2006 9:24 AM

Erika Ehrli said:

Just a little comment to clarify the scope of the DSOfile DLL. The Microsoft Developer Support OLE File Property Reader 2.0 is a code sample that demonstrates how to use the OLE IPropertyStorage interface to read and write the document properties of OLE files, such as the properties of native Microsoft Word, Microsoft Excel, Microsoft PowerPoint, Microsoft Publisher, and Microsoft Visio files.

The sample was not intended to work with PDF files...
# May 4, 2006 4:29 PM

Erika Ehrli said:

Emeric and Fred, it seems that you have an authorization problem here. The sample code falls back to opening the file read-only when it has no write access:

m_oDocument.Open(sFile, fOpenReadOnly, DSOFile.dsoFileOpenOptions.dsoOptionOpenReadOnlyIfNoWriteAccess)

See the sample code and check out the following comment and line of code:

' Here we can tell if file was open read-only...
m_fOpenedReadOnly = m_oDocument.IsReadOnly

I also found this comment as part of the sample code:

'The dsoOptionOpenReadOnlyIfNoWriteAccess allows us to open the file, read/write if we have access, but go ahead and open read-only if we don't. Since viewing properties is main purpose of the sample it is OK for us to fail write access lock on this open...

I hope this helps,

-Erika

# May 4, 2006 4:35 PM

Emeric said:

Thanks for your help Erika, but I had the same problems with Word files: I couldn't update the "title" and "category" fields if they were not manually set before.

But I found the solution and here it is:
http://www.codecomments.com/message813451.html

All you have to do is to debug the dsofile by yourself. Just follow the instruction given in the link, it's not very difficult and it works fine, even for pdf files.
# May 5, 2006 8:25 AM

Erika Ehrli said:

Emeric and everyone,

I am sorry that the DSOFile control has problems setting fields (title and summary) that were not manually set before.

I loved this control because it was great for extracting properties and my intentions to share this with the community were the best. I am sure some people might find this useful however.

I also want to share that Ken Getz just wrote a new column on how to extract document properties using Office 2007.

http://msdn.microsoft.com/msdnmag/issues/06/06/AdvancedBasics/default.aspx

You will see it's quite interesting, and I love the fact that the new file formats offer better ways to extract/update document properties and remove the need to use automation or the DSOControl.

Extracting/writing document properties contained in an XML document is very simple. I am sure everyone will be delighted with this new option.
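
To give a rough idea of what that looks like, here is a minimal sketch. It is not the code from Ken's column. It assumes the built-in properties live in the package part at its conventional path, docProps/core.xml, and it uses the ZipArchive class from later versions of the .NET Framework. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

public static class OpenXmlCoreProperties
{
    // Dublin Core namespace used by title, subject, and creator in core.xml.
    static readonly XNamespace Dc = "http://purl.org/dc/elements/1.1/";

    // Hypothetical helper: reads a few built-in properties from an Office 2007 package.
    // Assumes the core properties part is stored at docProps/core.xml.
    public static Dictionary<string, string> Read(Stream package)
    {
        var result = new Dictionary<string, string>();
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = zip.GetEntry("docProps/core.xml");
            if (entry == null) return result; // package has no core properties part

            using (Stream part = entry.Open())
            {
                XDocument xml = XDocument.Load(part);
                foreach (string name in new[] { "title", "subject", "creator" })
                {
                    XElement element = xml.Root.Element(Dc + name);
                    if (element != null) result[name] = element.Value;
                }
            }
        }
        return result;
    }
}
```

Opening a .docx with `File.OpenRead` and passing the stream to `OpenXmlCoreProperties.Read` returns whichever of the three properties are present. Neither the Office application nor the DSOFile control is involved.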



# May 11, 2006 7:18 PM

Sunith Nair said:

If you want to access the files that are on a remote machine and you are trying to use impersonation then you will need to use AspCompat="true" for the page trying to access the file. Take a look at this KB for more information http://support.microsoft.com/kb/325791

-Sunith Nair
# June 22, 2006 8:10 AM

Isaih said:

Hi, I've used dsofile.dll to develop an application. Dsofile.dll fails to register when I deploy to another computer, even though regsvr32 returns a "registration successful" message when run at the command line.

Any ideas?

Thanks
# July 13, 2006 1:23 AM

Antony said:

What is the best way to get PDF properties? Can this be used?
# July 18, 2006 11:04 AM

Henrik said:

Hello,
I have also tried this component and used both versions that I know of. However, since I upgraded to version 2, I can no longer store properties with empty strings ("") as values. If I do, Windows no longer displays any custom properties on the properties tab (even though they are there). And if I open the file in Word and then try to look at the custom properties, Word crashes. Why is this? Does anyone have a workaround?
# July 20, 2006 1:23 AM

John Rummell said:

Thanks for the example, this is great. I can't get it working though ...

I'm getting an Access denied error on the line that instantiates the DSOFile.OleDocumentPropertiesClass object ... any ideas?
# September 25, 2006 4:03 PM

John Rummell said:

I got it working - had to fix security for COM applications.
# September 27, 2006 1:37 PM

John Rummell said:

Since I've received a few emails on how I fixed my issue, I'll drop the link to my solution here --> http://forums.asp.net/thread/1409599.aspx

# November 15, 2006 6:30 PM

James said:

Please forgive me, as this may be a very simple fix. I am very new to programming but have been spending many late nights on this. I keep getting the following error:

Compiler Error Message: CS0246: The type or namespace name 'DSOFile' could not be found (are you missing a using directive or an assembly reference?)

I do have the reference and the dll is registered both on my local machine and server.  I even tried to Import Namespace="DSOFile" and "Interop.DSOFile" and "InteropDSOFile" and none of that worked.

Please help.

Thanks,

James

# December 1, 2006 10:57 PM

Nike said:

Maybe a simple thought but, if there's a fix for this issue, why doesn't ms release a new version of the DSOFile.dll with the fix embedded? Seems logical to me...

# December 6, 2006 5:36 AM

roylasris said:

So how can I extract, let's say, a document's subject in a docx (2007) Word file. DSOfile doesn't seem to work against a docx file.

Thanks,

Roy

# December 22, 2006 2:45 PM

roylasris said:

Part 2: I have studied Erika's letter and Ken Getz's article, but am still confounded as to how I can, within a VBA project, extract, let's say, the 'subject' of an unopened Word 2007 document. (I can do it with an open document without any problem. It's the unopened ones that give me a problem.)

Given how simple 2007 is supposed to make investigating the various parts of a docx document, it would seem that it should be an easier process.  (Ken's article (for a VBA person such as myself) is way beyond me.) Is there a simple 'replacement' for dsofile in 2007?

   --Roy

# December 24, 2006 7:01 AM

Rifmetroid said:

Sorry, my English isn't very good, but I'll try.

Does anybody use the dsofile.dll under Windows XP64?

Does it work? I have some Problems with it and i want to fix it.

Rif

# January 12, 2007 6:57 AM

Liesha said:

Has anyone tried using DSO for files on a mapped drive or a  remote network location ..

Would be obliged if one could suggest how it works...

Thanks,

-Liesha.

# January 18, 2007 9:32 AM

A. Mandl said:

Hello, I am trying to get the digital signature with DSO File. It is enough for me to find out whether an Office file is signed or not (I don't need to verify the signature or retrieve it...)

is this possible?

regards

Alex

# February 26, 2007 9:31 AM

A. Mandl said:

Hello, another question:

With DSOFile 1.4 it was possible to find out if a macro was attached to an Office file. With DSOFile 2.0 I have not found this option?!

Am I doing something wrong?

regards Alex

# February 27, 2007 9:12 AM

Pachara said:

Hi Erika

  I have a question. I need to add summary properties to files (xls, pdf, txt, ...) from C#. But after I compile and run this program and right-click the file, the summary properties are enabled, yet I can't change them manually.

 When I open the file with DSOFile, I get the error: "A lock violation has occurred. (Exception from HRESULT: 0x80030021 (STG_E_LOCKVIOLATION))"

Thank you

Pachara

# February 27, 2007 12:18 PM

A.Mandl said:

Hi again: does anyone have a full list of the document properties and their types? Then I can extend the DSOFile source...

cheers alex

# February 27, 2007 4:08 PM

Pachara said:

Hi!

I wrote a C# program to edit a file's summary properties. It can't open a .txt file.

--**code example**---

string filePath = @"C:\myfile.txt";
myDSOOleDocument.Open(filePath, false, DSOFile.dsoFileOpenOptions.dsoOptionOpenReadOnlyIfNoWriteAccess);
myDSOOleDocument.SummaryProperties.Keywords = txtKeywords.Text;
myDSOOleDocument.Save();

# February 28, 2007 7:16 AM

sth_Weird said:

Hello.

I'm afraid the sample doesn't work for me, either.

It doesn't throw any exception when I simply open a file, but it can't write anything, neither with the sample application nor with my own C# programme. I get an access denied error every time I try and the programme crashes.

The file I tried to use is a simple txt file I created using C#.

When using the OleDocumentPropertiesClass to open a file in my c# code I get an exception that the file has no ole storage.

???

# March 9, 2007 9:01 AM

Pachara said:

Hello!

Can you show your sample code to me?

thank you very much.

# March 14, 2007 7:37 AM

TrevG said:

Hi,

I don't have a C compiler to rebuild the DSOFILE.dll as described by Emeric in May.

Can anyone e-mail me the patched DSOFILE.DLL at tjg001@tpg.com.au

Regards ..... Trevor G

# March 29, 2007 10:29 PM

Amit said:

Hi,

The DSO document properties do not match the document properties shown in the File->Properties window of the application.

Please check it.

I think this is a bug in the DSO DLL.

Thanks

Amit

# May 16, 2007 8:35 AM

tmlay said:

Column handlers in XP do not look at the same property set fields as in Windows 2K.

I have an application that allows you to update property fields like author and comments for a variety of file types including PDFs. The information is properly displayed for all file types in Windows 2K.

In Windows XP, the author and comments fields are not displayed for multimedia files (jpg, gif etc.) in Windows Explorer, although if you reinvoke my shell extension the values are there.

Is there any documentation on other property sets, and on which property sets are referenced by the different column handlers in Explorer?

Thanks,

Tom

# July 7, 2007 9:59 AM

Sandeep Mishra said:

Hi,

I have a problem with DSOFile 2.0: I am trying to identify whether a macro is attached to my document using DSOFile.

How can this be done?

Regards

Sandeep

# July 19, 2007 1:10 AM

Sandeep Mishra said:

Hi Erika,

I have a problem with DSOFile 2.0: I am trying to identify whether a macro is attached to my document using DSOFile.

How can this problem be solved? There is no direct property in DSOFile 2.0 to identify a macro in my document.

Looking forward to your reply, it's urgent.

Does anyone else know the solution to this?

Regards

Sandeep

# July 20, 2007 7:00 AM

Dave Kolb said:

I installed the latest version of dsofile and find that using the FilePropDemoVB7 program and merely looking at a file adds ADS files to the file being looked at, even while it shows you there are no extended properties. At least it did for me for .txt, .rtf and .htm files. Has anyone else experienced this? Hopefully dsofile will be fixed to not do this. This is a bug in either the demo program or dsofile, is it not? Thanks, Dave

# August 7, 2007 2:17 AM

Dave Kolb said:

Also, is there any way to get Vista Explorer to show the new properties? I added a "description" to a .txt file and turned on "description" in the Explorer view for that folder but did not see the description data I entered. Dave

# August 7, 2007 2:18 AM

Lakshminarayana said:

Hi,

I have used DSOFile.dll v2.1 to read and write the custom properties of a document. When I open the custom properties tab by right-clicking the file, no properties are visible even though they are present in the file.

These properties are visible if I open the document and view the properties from the File menu.

Why can't I see them by right-clicking the file itself? Also, this happens with only a few documents.

If I re-open the file using the following code snippet and close it, the same custom properties show up when I right-click the file. But this solution doesn't help me in my project.

**********************

       If strWindowsFilePath.Substring(strWindowsFilePath.IndexOf(".") + 1, 3) = "doc" Then
           Dim oWord As Word.Application
           Dim oDoc As Word.Document
           Dim oBuiltInProps As Object
           Dim oCustomProps As Object
           Dim oProp As Object

           'Create Word Application
           oWord = CreateObject("Word.Application")

           'Open the document
           oDoc = oWord.Documents.Open(strWindowsFilePath)

           'Get custom properties collection
           oCustomProps = oDoc.CustomDocumentProperties

           'This will let Word know that the document should be saved
           oWord.ActiveDocument.Saved = False

           'Save changes in the document
           oWord.ActiveDocument.Save()

           'Quit Word
           oWord.Quit(savechanges:=True)
       End If

*************************************************

Your help is greatly appreciated!!!

# August 16, 2007 8:58 AM

Ruben said:

Thanks for this information, this was very useful to me, because I didn't really want to use Word Interop to simply set and get some document properties. Using DSO is a much cleaner and leaner solution.

# October 4, 2007 5:19 AM

Saju said:

Hi Pachara,

The error that you (and probably others facing problems with Office 2007) are facing might be due to registration of the dll. If you change the location of dsofile.dll, after extracting it, you need to register it using: regsvr32 [File path]dsofile.dll (in Windows -> Run). Hope this helps.

# October 11, 2007 9:13 AM

Zal Ahmet said:

I have been encountering a great deal of problems trying to get this example to work. I hope someone can help me.

I have added a page to my website where I have inserted all the code from the above example.

I have downloaded dsofile.exe and registered it on the server with regsvr.exe. I have added the DLL file to my bin folder, so I get Interop.DSOFile.dll in my bin folder.

I ran the example and uploaded a Word document from my desktop to a folder location in my website. I have given read and write access rights to this folder, so I have no issues there. When I click the browse button, the location from my desktop is inserted into the file control. When I click the load properties button, the properties of the document are not populated. And when I browse to the folder location in the website, the content of the file which has just been uploaded has disappeared. The only thing which appears as the content of the Word document is the following line:

Mediachase.FileUploader.McHttpModule:a74523fa-8957-4424-a205-21af536f39b2

I have used the upload method many times in previous work, so I don't understand what is happening. Also, none of the document properties exist in this uploaded document.

Please help!

Many thanks in advance

zal

# October 31, 2007 7:24 AM

Zal said:

replying to my previous issue.

I have resolved it. I was not using the correct file uploader, so I just changed it to the Mediachase.FileUploader and it worked.

yeppie doooooooooooooooo

Happy programming

# October 31, 2007 9:50 AM

Raam said:

Hi,

I tried to extract text from Excel files. After opening the files, some get extracted but others do not. I am using VS2005 with C#.

I am new to automation.

# April 8, 2008 5:56 AM