Extract Files from Patches

From the mailbag, someone asked how to extract files from a patch. Presumably one would want to extract the files as they would apply to a product if the patch were installed, but I will cover both ways because one can lead to the other. If you're looking for the simplest and quickest way to extract files from a patch, skip toward the end; otherwise, if you're interested in the structure of a .msp file and how to extract all files regardless of which product the patch targets, read on.

Recall from What's in a Patch that a .msp file contains sub-storages for pairs of transforms that transform the patch target package to the patch upgrade package, and possibly one or more sub-streams for the cabinet files that contain the files to be patched. Because a .msp file uses OLE structured storage internally, you can extract the transforms and cabinet files; however, to allow stream names of up to 62 characters instead of the 31 characters that OLE allows, Windows Installer compresses stream names except for the summary information stream, named \005SummaryInformation. You'll also find more streams than you might expect, which Windows Installer uses internally. That doesn't prevent you from at least extracting the cabinet files from .msp files.

To enumerate and thereby extract all sub-storages and streams, use the OLE structured storage APIs like the StgOpenStorageEx function to get a pointer to the IStorage interface. Call the IStorage::EnumElements function on the interface to get the IEnumSTATSTG interface pointer. As is typical with IEnumXXXX interface implementations, call the Next function. In this case you get a STATSTG structure. If the STATSTG.type field is STGTY_STORAGE (1) you've found a transform and STATSTG.pwszName is the name of the transform. If the STATSTG.type field is STGTY_STREAM (2) you've found a stream. To determine if the stream is a cabinet you can check the first 4 bytes of the stream for "MSCF".
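
The classification step on its own can be sketched without the COM plumbing. This is only an illustration in Python, not the enumeration code itself: the function name classify_element and its return labels are made up for this example, and it assumes you have already obtained the STATSTG.type value and, for streams, the first few bytes of data.

```python
STGTY_STORAGE = 1  # sub-storage: a transform in a .msp file
STGTY_STREAM = 2   # stream: possibly a cabinet

CAB_SIGNATURE = b"MSCF"  # first 4 bytes of every cabinet file


def classify_element(element_type: int, first_bytes: bytes = b"") -> str:
    """Label one enumerated element (hypothetical helper for illustration)."""
    if element_type == STGTY_STORAGE:
        return "transform"
    if element_type == STGTY_STREAM:
        # Only the leading signature is checked; the rest of the stream is ignored.
        if first_bytes[:4] == CAB_SIGNATURE:
            return "cabinet"
        return "other stream"
    return "unknown"
```

Anything labeled "other stream" is one of the additional streams Windows Installer keeps for its own use.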

Patches produced with PatchWiz.dll from the Windows Installer SDK will contain one cabinet with all files for all transforms in the patch. The files in the cabinet all use the value of the File column of the File table, so with a quick lookup you can get whatever files you want. This allows you to get all of the files for a patch regardless of which product .msi packages the patch targets. Obviously there's quite a bit of work here.
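
The lookup itself might look something like the following Python sketch. It assumes you have already read the File and FileName columns of the File table into a dictionary; the sample keys and names are invented, and both helper functions are hypothetical. The only format detail relied on is that FileName may hold a "short|long" pair.

```python
def long_file_name(file_name_column: str) -> str:
    """Return the long name from a FileName value such as 'READM~1.TXT|ReadMe.txt'."""
    return file_name_column.split("|", 1)[-1]


def map_cabinet_entries(cabinet_entries, file_table):
    """Map each cabinet entry (a File table key) to the name it installs as."""
    return {
        entry: long_file_name(file_table[entry])
        for entry in cabinet_entries
        if entry in file_table  # skip entries with no matching File row
    }


# Invented sample data: File column value -> FileName column value.
file_table = {
    "ReadMe.txt.1A2B": "READM~1.TXT|ReadMe.txt",
    "App.exe.1A2B": "App.exe",
}
names = map_cabinet_entries(["App.exe.1A2B", "ReadMe.txt.1A2B"], file_table)
```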

A similar approach is to open the .msp file using the MsiOpenDatabase function, passing MSIDBOPEN_PATCHFILE for the second parameter. Note that this cannot be done in a custom action because the second parameter will marshal as a string so any value besides MSIDBOPEN_READONLY (0) won't marshal correctly.

You can then use the view APIs like the MsiDatabaseOpenView, MsiViewExecute, and MsiViewFetch functions to query the _Storages table to get the transforms and the _Streams table to get the cabinet file in a patch. Querying the _Streams table in a .msi file or a .msm file may also return other streams like binaries in the Binary table or icons in the Icon table. While you can read data directly from the Data column of the _Streams table using the MsiRecordReadStream function you cannot read from the Data column of the _Storages table. You can use the names and the OLE structured storage APIs as described above to get the exact name of the sub-storage to extract using the IStorage::OpenStorage function.
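
For the cabinet half of that, here is a rough sketch in Python that calls msi.dll through ctypes. Treat it as an untested outline under a few assumptions: it runs on Windows only, it writes out only streams that begin with "MSCF", and the helper names (safe_file_name, check, extract_cabinets) are my own. The Msi* functions and the value 32 for MSIDBOPEN_PATCHFILE come from the Windows Installer SDK.

```python
import ctypes
import os
import re

ERROR_SUCCESS = 0
ERROR_NO_MORE_ITEMS = 259
MSIDBOPEN_PATCHFILE = 32  # added to MSIDBOPEN_READONLY (0) to open a .msp file


def safe_file_name(stream_name: str) -> str:
    """Replace characters that are awkward in file names (hypothetical helper)."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", stream_name)


def check(error: int) -> None:
    if error != ERROR_SUCCESS:
        raise OSError(f"Windows Installer call failed with error {error}")


def extract_cabinets(msp_path: str, out_dir: str) -> list:
    """Write each _Streams entry that starts with 'MSCF' to out_dir."""
    msi = ctypes.windll.msi  # loads msi.dll; only available on Windows
    db, view, written = ctypes.c_uint32(), ctypes.c_uint32(), []
    persist = ctypes.c_void_p(MSIDBOPEN_PATCHFILE)  # a pointer-sized flag, not a string
    check(msi.MsiOpenDatabaseW(msp_path, persist, ctypes.byref(db)))
    try:
        query = "SELECT `Name`, `Data` FROM `_Streams`"
        check(msi.MsiDatabaseOpenViewW(db, query, ctypes.byref(view)))
        try:
            check(msi.MsiViewExecute(view, 0))
            while True:
                rec = ctypes.c_uint32()
                error = msi.MsiViewFetch(view, ctypes.byref(rec))
                if error == ERROR_NO_MORE_ITEMS:
                    break
                check(error)
                try:
                    name = ctypes.create_unicode_buffer(256)
                    cch = ctypes.c_uint32(256)
                    check(msi.MsiRecordGetStringW(rec, 1, name, ctypes.byref(cch)))
                    size = msi.MsiRecordDataSize(rec, 2)
                    data = ctypes.create_string_buffer(max(size, 1))
                    cb = ctypes.c_uint32(size)
                    check(msi.MsiRecordReadStream(rec, 2, data, ctypes.byref(cb)))
                    if data.raw[:4] == b"MSCF":
                        path = os.path.join(out_dir, safe_file_name(name.value) + ".cab")
                        with open(path, "wb") as cab:
                            cab.write(data.raw[:cb.value])
                        written.append(path)
                finally:
                    msi.MsiCloseHandle(rec)
        finally:
            msi.MsiCloseHandle(view)
    finally:
        msi.MsiCloseHandle(db)
    return written
```

Reading each stream in one call keeps the sketch short; for large cabinets you would more likely call MsiRecordReadStream repeatedly with a fixed-size buffer.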

There is a much simpler way to accomplish all of this, but you'll only extract the files from a patch that apply to a specific product, since only the first pair of transforms in the patch that applies to the product is used. If your patch targets only a single product then you have no worries. You first perform an administrative installation of the target product .msi package, which runs only basic actions like InstallFiles in the AdminExecuteSequence table. Running the command through start /wait will block until msiexec.exe completes and returns.

start /wait msiexec /a product.msi TARGETDIR="%TMP%\Product" /qn

Next you apply the patch that contains the files you want to extract. This is the same method you would use to apply any minor upgrades that a patch might target. Patches will typically transform the AdminExecuteSequence table to add the PatchFiles action.

start /wait msiexec /p patch.msp /a "%TMP%\Product\product.msi" /qn

Now the files that were patched in the product will exist in the directory structure and you can fish them out as necessary. If your patch targets multiple products you'll need to repeat this for each product, which is why in such cases the more complicated method of file extraction described above is beneficial. Note also that any directories that depend upon 64-bit redirection but whose source directory structures are the same will overwrite files, because such redirection is not performed for administrative installations. This happened with an early pre-release of the .NET Framework 2.0.
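
If you do need to repeat the two steps for several products, they are easy to script. The following Python sketch builds the same two msiexec command lines shown above and runs them in order; the function names are my own, and it assumes the administrative image keeps the original .msi file name directly under TARGETDIR, as in the example commands. Because subprocess.run waits for the process to exit, no start /wait is needed.

```python
import ntpath
import subprocess


def admin_install_command(msi_path: str, target_dir: str) -> str:
    """Command line for the administrative installation step."""
    return f'msiexec /a "{msi_path}" TARGETDIR="{target_dir}" /qn'


def admin_image_path(msi_path: str, target_dir: str) -> str:
    """Assumed location of the .msi inside the administrative image."""
    return ntpath.join(target_dir, ntpath.basename(msi_path))


def patch_admin_image_command(msp_path: str, admin_msi_path: str) -> str:
    """Command line that applies the patch to the administrative image."""
    return f'msiexec /p "{msp_path}" /a "{admin_msi_path}" /qn'


def extract_patched_files(msi_path: str, msp_path: str, target_dir: str) -> None:
    """Run both steps in order, stopping at the first failure (Windows only)."""
    commands = [
        admin_install_command(msi_path, target_dir),
        patch_admin_image_command(msp_path, admin_image_path(msi_path, target_dir)),
    ]
    for command in commands:
        # On Windows a command string is handed to CreateProcess as written.
        result = subprocess.run(command)
        if result.returncode != 0:
            raise RuntimeError(f"{command!r} exited with code {result.returncode}")
```

Call extract_patched_files once per target product, each with its own target directory, so the images don't overwrite one another.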

  • How to identify different Windows Installer file types without relying on the file extension.
  • Do you know of any command line utils that'll extract the file streams not that worried about the filenames, I need to generate hashes of any executables contained with the msp to use with our security software....
  • Ian, as I pointed out toward the bottom of my article you can use msiexec.exe to create an admin image and then patch it. While you could extract the cabinet using the other means I mentioned, then extract files from the cabinet, I wouldn't recommend that since a patch can patch only parts of files (delta patching) so your hashes would be useless.
  • From what I can see most MSP's contain complete files rather than portions of... What I need to do is extract files from MSP's from a WSUS update source. I can't go down the admin installation route as a) I need to capture every single version of the files b) To work out which MSI each MSP relates to and then create admin points for each instance would be very time intensive.

    I was hoping to be able to locate the MSP's in each directory (actually there's an intermediate step where the MSPs are compressed in CAB files) then extract the files into a temp directory for each MSP, then run my scan tools to build the hashes and lists. I guess if a small number do only contain part files, that'll show up as an exception later on, but to get the bulk out of the way would be a great time saver.

    I've already got the cmd scripts written to recurse down the structure, process any MSPs, extract any MSPs in CAB files, then finally extract files from the MSPs - but I need a cmd line util that I can run to do this last stage. Tools like 7-zip and RAR enable me to open the MSP thru their GUI's and extract files (in most cases) - but the command line versions don't seem to process MSPs....

    Any ideas?!
  • Ian, you can simply write a script using the MSI automation objects to open the MSP, select from the _Streams table, which allows you to call Record.ReadStream and write the contents - the cabinet - out to disk. Since you clearly have a utility to extract files from a CAB (there are also plenty of tools that do it, like cabarc.exe in the Platform SDK) it shouldn't be a problem, then.

    I'll make a note to post some sample source but I can't promise when I'll post it.
  • Download the source and a release binary for a tool to extract transforms and binary streams like cabinets from Windows Installer files, such as patches.
  • Just playing with MSI...
  • find windows installer works better in command files (brings up less windows) using:

    cmd /c msiexec /a product.msi TARGETDIR="%TMP%\Product" /qn

    rather than start /wait. just my 2 cents
  • Tom, using "cmd /c" or "start /wait" results in the same thing and has nothing to do with what UI is displayed. Both simply call CreateProcess(). It's that you pass "/qn" to msiexec.exe that hides any UI.

    "cmd /c" and "start /wait" are equivalent, and both are good ways to block in a script until msiexec.exe exits.
  • Your post is very interesting to me and I am in the process of re-reading it and trying to fully digest all the great info.

    I am implementing a "Primary Installer" as defined in the Vista Restart Manager docs.  The Restart Manager requires that a primary installer register all the files that are going to be patched so that it can discover all the processes that need to be shutdown and restarted.  The new function in MSI 4.0, MsiGetPatchFileList, seems to provide the bridge for retrieving the files to be registered with the Restart Manager.  But here's the fun part.  I'm trying to implement a facsimile of the Restart Manager on earlier OS's (i.e. win2k, XP, 2003).  I was hoping that I could write MsiGetPatchFileList myself but it would be good to know: a) if this is even possible, and b) if this makes sense, and c) how to do it.

    Any help is appreciated.  I have yet to go through the sample code you provided so maybe I'll be a bit smarter in a day or two.

    Cheers,
    -Daniel
  • Daniel, Windows Installer 4.0 on Vista works with the Restart Manager already. I may be misunderstanding you, but you should not have to register files with the Restart Manager prior to installing patches.

    For down-level support, you could extract the cabinet stream (even to memory, since the stream itself would be loaded with calls to OLE storage) and get the file IDs, comparing those against your MSI. MsiGetPatchFileList works on internal data, so it's difficult to replicate.

    In my experience, though, I'd recommend working with the technology. Older patches did a lot of custom "things" that lead to a great many KB articles about the patches themselves. Windows Installer does work with files in use, and if you author a FilesInUse dialog you can at least prompt users to close windowed applications that have a title.
  • Thanks for your response, Heath. I guess my post was a bit terse so I'll try to provide more context.  We are writing a general purpose patch client UI that will install MSP patches. I guess the best analogy for this is Windows Update.  We author the patches for our own products and push it to our servers.  The client piece will check the server periodically and automatically download updates.  We don't want the installer UI to display, so we install these patches in silent mode using MsiApplyMultiplePatches and display our own progress UI.

    The Vista Restart Manager documentation refers to this type of client component as a "Primary" installer - because it executes 1 or more patch installs.

    As I understand it, the Vista Restart Manager API documentation indicates that a Primary installer should create a "session", register the resources (aka files, apps and services) that are going to be replaced, call RmShutdown, execute the patch(es), then call RmRestart.

    Anyway, I guess I thought MsiGetPatchFileList was the easy way to learn what the files are.  Also, it seemed more than a coincidence that this function was added side-by-side with Restart Manager support.

    If we know that we are only supporting MSI 3.0 for our own MSP's, we don't really have the risk of having to worry about funky legacy patches.  Also, my assumption was that MsiGetPatchFileList did not have a dependency on tools that only build MSI 4.0 compatible patches.

    But perhaps I can hook into the FilesInUse callback to learn about the programs that need to quit.  I'm going to look into this.

    Any other pointers appreciated.

    thanks!
  • Daniel, keep in mind that MSI is an engine. If and when a newer version of MSI ships down-level (pre-Vista) it may include that function, but currently it is only defined in msi.dll v4, and can't simply be dropped down-level. You can build MSI 4-compatible patches that take advantage of the new functionality, but as long as the Page Count is, for example, 300, you can run this in MSI 3.0 and newer. There are a lot of back- and forward-compatibility tests the MSI team runs.

    It really sounds like what you should write is an external UI handler. You can even get a program list back for files in use and display the UI however you want. This also gives you the ability to show real progress. You could, for example, divide your progress maximum by the number of patches to be installed * 2 (because generating the script and running the script both track their own progress). If you schedule any InstallExecute or InstallExecuteAgain actions, you'll want to take that into account. This way, you can show users actual progress across all of the patches to be installed.
  • Thanks again Heath.

    Actually, we have written an external UI handler and have this working pretty well.  We are calling MsiSetExternalUI and listening for the progress messages et al.

    What it comes down to is that we need either the list of files in use or the programs using them.  Like I said before, MsiGetPatchFileList seemed like an easy place to start; my assumption was that maybe it wouldn't be too difficult to implement on earlier OS's but it sounds like I'm wrong about this.

    Our external UI filter isn't currently asking for the INSTALLMESSAGE_FILESINUSE message, but we are putting that code in right now.

    From this sample code:

    http://msdn.microsoft.com/library/en-us/msi/setup/handling_progress_messages_using_msisetexternalui.asp

    it's not exactly clear what the format of the message text is but I guess we can debug through and find out.  My initial searches for this message text format came up empty.

    Anyway, I'll get back to you on our progress.  Thanks for the attention and tips.
  • We've added the INSTALLLOGMODE_FILESINUSE flag to MsiSetExternalUI, but it doesn't look like we are getting the INSTALLMESSAGE_FILESINUSE message.  Although at the end of the install, we are getting the Reboot required error code.  So we're pretty sure we are correctly simulating the file in use test.  Uggh