I posted previously about needing to extract images from articles submitted by our authors.  I also posted about how to write tools to do this programmatically, both for Word .docx files and for packaged files in general.  However, some authors haven't yet upgraded to Office 2007, so we still receive .doc files from which we need to extract images.  Because .doc is not based on the Open Packaging Convention (OPC), we can't use the .NET Packaging APIs to parse that file format.

Since we already have code for extracting images from .docx files, one of the easiest ways to handle a .doc file is to convert it to a .docx first and then run the existing code on the result.  With Office 2007 and its .NET programmability support installed, the conversion takes only a few lines of code:

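// Requires a reference to the Microsoft.Office.Interop.Word primary interop assembly.
using System.Reflection;                 // Missing
using System.Runtime.InteropServices;    // Marshal
using Microsoft.Office.Interop.Word;     // ApplicationClass, Document, WdSaveFormat

private static void ConvertDocToDocx(string sourceDoc, string destinationDocx)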
{
    object saveChanges = false, optional = Missing.Value;

    // Load Word
    ApplicationClass wordApp = new ApplicationClass();
    wordApp.Visible = false;
    try
    {
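        // Open the source .doc (pass a full path; Word resolves relative
        // paths against its own working directory, not the caller's)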
        object objFilename = sourceDoc;
        Document document = wordApp.Documents.Open(ref objFilename,
            ref optional, ref optional, ref optional, ref optional, ref optional,
            ref optional, ref optional, ref optional, ref optional, ref optional,
            ref optional, ref optional, ref optional, ref optional, ref optional);

        // Save it as a .docx
        objFilename = destinationDocx;
        object wordFormat = WdSaveFormat.wdFormatXMLDocument;
        document.SaveAs(ref objFilename, ref wordFormat,
            ref optional, ref optional, ref optional, ref optional, ref optional,
            ref optional, ref optional, ref optional, ref optional, ref optional,
            ref optional, ref optional, ref optional, ref optional);

        // Close the document without saving it again
        ((_Document)document).Close(ref saveChanges,
            ref optional, ref optional);
    }
    finally
    {
        // Close Word
        wordApp.Quit(ref saveChanges, ref optional, ref optional);
        Marshal.FinalReleaseComObject(wordApp);
    }
}

-Stephen