
Inside the MSI file format.

I've sat down to write this blog no less than four times in the last week. Each time something has come up that has pulled me away from actually getting far enough into writing that it becomes basically self-propelled. Now tonight, I know there is at least one person out there focused on getting her homework done so I thought I'd buckle down and plow through a bit of writing myself.

Let's talk about MSI files. First, MSI files got their name back when we thought Darwin was going to be called the "Microsoft Installer". Thus the file extension MSI made some sense. Unfortunately, it was so late when the name changed to "Windows Installer" that it wasn't really feasible to safely change the extension that everyone had come to know and love as MSI.

Anyway, the vision driving development of Darwin was that setup needed to be a transacted set of changes to a target machine that could be aborted and cleaned up if an error occurred or the user cancelled setup. This means the setup logic must be declarative so that an engine can interpret the logic and calculate not only the changes to the target but the changes necessary to undo any of those changes should something go wrong. There are many ways to define data declaratively (XML being my personal favourite these days) but back around 1995 (when Darwin was first started) the team decided the setup logic should be in a database. Unfortunately, all of the database technologies back then required substantial amounts of setup before they could be used. Since a setup technology is kinda' needed before you can set up anything, it wasn't really feasible to use any of the database engines that existed. Think classic chicken and egg problem.
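
To make the "declarative plus undo" idea a bit more concrete, here is a toy sketch in Python. It is only an illustration of the principle, not how the Windows Installer engine is implemented: the "machine" is a plain dictionary, and the operation names ("set", "remove") and the function name apply_script are made up for this example. Because each step is data rather than code, the engine can work out the inverse of a step before applying it and replay the inverses in reverse order if anything fails.

```python
def apply_script(machine, script):
    """Apply declarative steps to `machine` (a dict); roll back on any failure.

    Each step is a tuple: ("set", key, value) or ("remove", key).
    Returns True if every step was applied, False if the script was rolled back.
    """
    undo = []  # inverse steps, recorded before each change is made
    try:
        for op, key, *rest in script:
            if op == "set":
                if key in machine:
                    undo.append(("set", key, machine[key]))  # restore old value
                else:
                    undo.append(("remove", key))             # key did not exist
                machine[key] = rest[0]
            elif op == "remove":
                if key in machine:
                    undo.append(("set", key, machine.pop(key)))
            else:
                raise ValueError(f"unknown operation: {op}")
    except Exception:
        # Something went wrong: replay the recorded inverses, newest first.
        for op, key, *rest in reversed(undo):
            if op == "set":
                machine[key] = rest[0]
            else:
                machine.pop(key, None)
        return False
    return True
```

If the third step of a script fails, the first two are undone and the dictionary ends up exactly as it started, which is the behaviour the paragraph above describes for a cancelled or failed setup.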

So, the Darwin team decided to build a custom relational database. As an aside, in my humble opinion, building this custom relational database to store all the setup logic was unnecessary and generated a lot of overhead over the years (especially for those of us that have to create the flipping MSI files). However, my opinion is based on hindsight and we all know we see better when looking back on history. Anyway, I just wanted to be up front that I can't provide a really strong justification for why MSI files had to be relational databases.

Okay, so say you're in the middle of the 1990's and you need to build a relational database, what do you do? Well, if you're in Office (like the Darwin team was at the time) and you look at the Word and Excel file formats you might think, "Hey, those structured storage file thingies are really cool! I bet we could use that!"

So, MSI files are actually little databases laid out in a structured storage file. For those of you that haven't played with structured storage files let me talk about them a little. A structured storage file exists on disk as a single file but can contain many "streams" and/or "sub-storages". Streams are essentially just a bunch of bits with a name stored inside a structured storage file. Sub-storages are just structured storage files embedded in another structured storage file. I've seen people compare structured storage files to typical file systems where "files" map to "streams" and "directories" map to "sub-storages". Structured storage files are also often called "compound documents" or sometimes "OLE documents".
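
If you want to poke at one of these files yourself, the following Python sketch checks whether a blob of bytes looks like a structured storage file and reads the name and class id of its root storage (the top-level "directory"). Treat it as a sketch under assumptions: the byte offsets come from my reading of the published Compound File Binary format layout, the function name read_root_entry is made up for this example, and it does no validation beyond the signature. It does not walk the streams or sub-storages.

```python
import struct
import uuid

# The 8-byte signature that starts every compound (structured storage) file.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def read_root_entry(data: bytes):
    """Return (name, clsid) of the root storage, or None if `data` does not
    look like a structured storage file."""
    if len(data) < 512 or data[:8] != CFB_SIGNATURE:
        return None

    # Header fields (little-endian): sector size as a power of two at offset 30,
    # number of the first directory sector at offset 48.
    (sector_shift,) = struct.unpack_from("<H", data, 30)
    (first_dir_sector,) = struct.unpack_from("<I", data, 48)
    sector_size = 1 << sector_shift

    # Sector n starts at (n + 1) * sector_size because the header comes first.
    entry_offset = (first_dir_sector + 1) * sector_size
    entry = data[entry_offset:entry_offset + 128]  # directory entries are 128 bytes
    if len(entry) < 128:
        return None

    # Entry name: UTF-16 text; the length at offset 64 is in bytes and
    # includes the terminating null character.
    (name_len,) = struct.unpack_from("<H", entry, 64)
    name = entry[:max(name_len - 2, 0)].decode("utf-16-le", errors="replace")

    # The class id of the storage sits at offset 80 in the entry.
    clsid = uuid.UUID(bytes_le=bytes(entry[80:96]))
    return name, clsid
```

To try it, read a file with open(path, "rb").read() and pass the bytes in. Different applications stamp different class ids on the root storage, so the returned value is one hint about what kind of structured storage file you are looking at.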

There are a few advantages to using structured storage files as the basis for your file format. First, the format provides a very natural way to separate your data with the streams and sub-storages. The MSI file uses separate streams for each of the tables in the database. Second, you can store multiple files in a single structured storage file, which is nice when you want to have a single redistributable. For example, streams are used to store things like UI graphics, CustomAction DLLs, and even the binaries to be installed in many cases. Also, sub-storages are used to nest one MSI file inside another MSI file (note: you should never do this, but I'll talk about nested installs another day). Finally, structured storage files have built-in transaction semantics. Having someone else provide the transaction functionality for you is really nice when you're trying to build a database on top of the format.

There are also a few disadvantages to structured storage files. First, the names of streams can only be something around 63 characters. This restriction isn't particularly restrictive but it can cause some really wacky error messages. Second, structured storage files don't shrink. If you add data to a structured storage file and then delete it, the file stays at its largest size. This design works out okay if you consider the case where a user is writing a document. In those cases, the user spends most of the time adding data and any deletes are often replaced with more data. Editing MSI files does not necessarily follow the same pattern so it is possible to end up with bloated MSI files if you're not careful. Finally, structured storage files don't handle multiple writers well at all. For example, open an MSI file in Orca then try to install the MSI by double clicking on it. You'll get a lovely message box that says something like:

This installation package could not be opened. Verify that the package exists and that you can access it, or contact the application vendor to verify that this is a valid Windows Installer package. [OK]

Okay? No, not okay but whatever. Every time I see that message box I wonder how many hours have been lost trying to figure out what the heck is wrong with an MSI file only to find that it was held open for editing in Orca. K, a buddy of mine at work, was just about pulling his hair out one day trying to figure out what was going wrong with one of his MSI files until I pointed out that he had Orca editing the file on one of his other test machines.

Anyway, there are a couple other things I want to say about the MSI file format.

In the mid-1990's Microsoft was still shipping Office on 3.5" floppies. Granted Office '97 shipped on something like 39 floppy disks but CD-ROMs weren't quite popular enough (i.e. weren't cheap enough). So one of the things the Darwin team needed to do was make the MSI files as small as possible so that the setup logic would fit on a single floppy disk (trying to read a structured storage file that spanned multiple floppy disks was not an option). This need led to the creation of the "string pool" and many dreaded "string pool corruption" bugs.

More detail. If you're familiar with relational databases, you know that primary key identifiers are duplicated everywhere you have a foreign key reference. Well, primary key identifiers in MSI are strings that are recommended to be 72 or fewer characters long. It's not hard to imagine how quickly all those identifiers could add up to create unnecessarily large MSI files. To combat this bloat there is a single stream in the MSI file that holds all the strings. This stream is called the string pool and contains a single entry for each unique string. That way a string column in a table is just an integer offset into the string pool.
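
The idea is easier to see in a few lines of code. Here is a minimal Python sketch of string pooling; it illustrates the technique only and is not the on-disk layout of the real string pool stream. The class and method names (StringPool, intern, lookup) are made up for this example.

```python
class StringPool:
    """Stores each unique string once and hands out small integers for it."""

    def __init__(self):
        self._strings = []  # position in this list is the string's id
        self._ids = {}      # reverse map: string -> id

    def intern(self, text):
        """Return the id for `text`, adding it to the pool if it is new."""
        if text not in self._ids:
            self._ids[text] = len(self._strings)
            self._strings.append(text)
        return self._ids[text]

    def lookup(self, string_id):
        """Return the string that `string_id` stands for."""
        return self._strings[string_id]

    def __len__(self):
        return len(self._strings)


pool = StringPool()

# Two toy tables whose rows reference the same identifier. Each row stores
# integers from the pool, not the text, so the long identifier is kept once.
component_rows = [(pool.intern("MyVeryLongComponentIdentifier"),)]
file_rows = [
    (pool.intern("readme.txt"), pool.intern("MyVeryLongComponentIdentifier")),
    (pool.intern("app.exe"), pool.intern("MyVeryLongComponentIdentifier")),
]
```

After this runs, the pool holds three strings even though the component identifier appears in three rows, and pool.lookup(file_rows[0][1]) gives the identifier text back. It also hints at why corruption was so nasty: if the integers and the pool ever get out of step, every table that references a string reads the wrong one.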

The string pool can save quite a bit of space. It was also pretty tricky to get right. I wasn't directly involved, but I remember quite a few late night bugs when I was an intern where my mentor spent the whole night tracking down why the wrong string or a corrupt string was coming out of the string pool. Then there were the nights trying to figure out why localized strings were coming out corrupt. Anyway, if you ever come across a copy of the original msival.exe you'll see a command-line switch that would run tests to detect string pool corruption. Fortunately, the string pool code is stable now and that isn't necessary any more.

On the note of localized strings, I should note that the MSI file format is not Unicode. I'm not an expert on localization and there is a pretty detailed topic in MSDN about localizing MSI files so I'm not going to say much more. Just keep in mind that you have to deal with codepages when storing localized strings in an MSI file. Yeah, I know, "Ick."
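
For anyone who hasn't been bitten by codepages before, here is a small Python sketch of the general problem. It does not touch an MSI file at all; it only shows what happens when text is written as bytes under one codepage and read back under another. The function name roundtrip is made up for this example, and "cp1252" and "cp1251" are Python's codec names for the Western European and Cyrillic Windows codepages.

```python
def roundtrip(text, write_codepage, read_codepage):
    """Encode `text` with one codepage, then decode the bytes with another."""
    raw = text.encode(write_codepage)
    return raw.decode(read_codepage)


german = "Größe"

# Same codepage on both sides: the text survives.
same = roundtrip(german, "cp1252", "cp1252")

# Written as Western European, read as Cyrillic: the bytes are unchanged but
# the non-ASCII characters come back as different letters.
mixed = roundtrip(german, "cp1252", "cp1251")

# Some text cannot be stored in a given codepage at all.
try:
    german.encode("cp1251")
    storable = True
except UnicodeEncodeError:
    storable = False
```

Here same equals the original string, mixed does not, and storable ends up False. The bytes alone don't say which codepage they were written in, so whoever reads them has to be told.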

So there's a bunch of detail about MSI files at a level that is probably not terribly useful. Next blog I'll actually try to answer Jim's question about creating custom tables in an MSI file. However, now it is time to go to sleep and search out happy dreams in the synaptic gaps.

Published Tuesday, November 25, 2003 4:44 AM by robmen

Comments

# RE: Inside the MSI file format.

Thursday, November 27, 2003 12:58 PM by Mike Dunn
Since I always like snooping around inside files where I'm not supposed to go, I loaded up an MSI (WinDbg tools) in a doc file viewer, but all the stream names are nonprintable characters (except for _SummaryInformation). Care to give us any hints about which streams hold what? ;) I didn't see anything that looked like a string pool (are the strings compressed perhaps?).

# RE: Inside the MSI file format.

Thursday, November 27, 2003 11:32 PM by Reid Gustin
I'd recommend firing up Orca.exe on that msi file. It's a low-level viewer of the relational databases inside an msi, and you'll find it to be *much* easier to deal with. You can get it as part of the MSI SDK, but unfortunately MSDN is structured in a way that makes linking straight to a download rather difficult. So, you may want to <A HREF="http://www.microsoft.com/msdownload/platformsdk/sdkupdate/default.htm?p=/msdownload/platformsdk/sdkupdate/psdkredist.htm">start here</A> and then click the link on the left that says "Windows Installer SDK".

# RE: Inside the MSI file format.

Thursday, November 27, 2003 11:34 PM by Reid Gustin
Of course, if I'd read the sign that told me "HTML not allowed", I could have saved some trouble. At any rate, the address is correct, though you'll have to cut and paste all the pieces. At some point, I'll look into BlogX and figure out how to post a real link, but not tonight.

# RE: Inside the MSI file format.

Friday, November 28, 2003 4:16 AM by Mike Dunn
But using a program made specially for viewing MSI files is no fun! I was really just looking for something to do to kill some time this morning before going to my mom's apt to pig out on free food. But thanks for the link anyway.;)

# RE: Inside the MSI file format.

Friday, November 28, 2003 1:25 PM by Frank Hileman
I am glad size optimizations like the string pool are in the MSI format. Since many installers are now downloaded, size is still a very important factor, and will probably continue to be for a long time.

# RE: Inside the MSI file format.

Sunday, December 28, 2003 12:40 AM by Phil
Was searching the web on msi files and found this blog. I'm having a problem if there is someone who can give some answers. Just installed XP home upgrade and I have a MS Office Keyboard. The old program won't work with XP, but they do have an upgrade. Unfortunately I must have downloaded an upgrade from MS but I can't find the install files. Now I can't install the XP version since it won't un-install since I don't have the last files. It's looking for an msi file that no longer exists. Anyone that can help, I would appreciate it. You can email me too. Thanks.

# re: Inside the MSI file format.

Thursday, January 29, 2004 1:19 PM by Johnathan
Phil.. I'm not sure I understood your question, or even if you'll be back to read this, but why don't you just download the original files again? That is one thing about MSI that I hate.. it won't (or at least I haven't been able to get it to) uninstall MSI apps, if you don't have the MSI anymore. It sucks.

# re: Inside the MSI file format.

Friday, January 30, 2004 7:17 AM by Robin Pemberton
I am new to this and have a requirement to extract a file from an MSI file, update it, and merge the change back into the msi file. Is this possible? And if so, how does one go about doing it?

# re: Inside the MSI file format.

Saturday, February 21, 2004 6:32 PM by rodier
vn@kamarad.cz
Hi, please how to extract .msi files without using the stupid MSIexec installer ?
I have one msi file what I need extract, its 100% OK but msiexec say to me This installation package could not be installed..blabla contact vendor..etc..blablabla... something about the file is bad, but isn't.
how to extract it ? if anybody can, please reply me asap - icq 10905813, email vn@kamarad.cz thanks

# re: Inside the MSI file format.

Wednesday, March 31, 2004 2:06 PM by Catalin
What would be the solution to fix the error message after an MSI file that was edited with Orca?

# re: Inside the MSI file format.

Monday, May 10, 2004 1:05 AM by Axel A
Can I uninstall a program without the proper version of my msi file ?

# re: Inside the MSI file format.

Monday, May 10, 2004 12:14 PM by Rob Mensching
Catalin,

The error message I described above wasn't caused because the MSI file was edited by Orca. It occurred simply because the MSI file was held open in Orca. Close Orca, and the Windows Installer should be free to do the install.

# re: Inside the MSI file format.

Monday, May 10, 2004 12:17 PM by Rob Mensching
Axel,

If the MSI was authored properly then you should be able to remove the MSI from your machine. If the Product shows up in Add/Remove Programs you can try to remove it from there. Otherwise, you'll need to get a list of the Products Ids (they are GUIDs) installed on your machine and run "msiexec /x {G-U-I-D}" to remove the MSI. I think there are tools in the Windows Installer SDK to list the Product Ids registered on your machine.

Of course, if the MSI was improperly authored it may require the original source media to uninstall. In that case, you'll need the original MSI (and possibly all the files) to uninstall.

# re: Inside the MSI file format.

Friday, June 11, 2004 7:28 AM by Donald
Is it possible to open an MSI database in MS Access?

# re: Inside the MSI file format.

Friday, June 11, 2004 9:25 AM by Rob Mensching
Donald,

No, the file formats are completely different. Although funnily enough early in the development of the Windows Installer there was something (an ODBC provider or adapter of some sort) that would allow you to import and export data to/from Access and MSI. However, that was over five years ago and never really truly was used... just a bit of trivia.

# re: Inside the MSI file format.

Monday, June 14, 2004 1:11 PM by Donald
Rob,
Thanks for the response. I'd love to know what that "something" was.

At this point, I've resorted to exporting tables from Orca, opening them with and saving them in Excel, importing them into Access, and manually recreating the relationships in Access (as described in the Windows Installer SDK).

This allows me to do custom reporting, but it'd be nice to have a more direct way...

# re: Inside the MSI file format.

Monday, June 14, 2004 2:20 PM by Rob Mensching
Donald,

That "something" was dead sometime back in 1996 or 1997. There hasn't been a direct link between MSI and any other SQL-like database since then (i.e. at least 7 years). That means any tools you write are the "direct way". <smile/>

# file(1) just doesn't know

Monday, July 05, 2004 11:02 PM by Trejkaz Xaoza
The UNIX file(1) command has been having grief with these files since they appeared. It still thinks they are application/msword because they must have some of the magic data at the front which gives the wrong impression.

Is there an actual way to tell the difference between Word documents and MSI files programmatically? I mean, other than actually trying to open the document. :-/

# re: Inside the MSI file format.

Tuesday, July 06, 2004 6:25 PM by asdf
Is there a utility to view/extract the files inside a .msi file? Sort of like there are .cab file viewers.

# re: Inside the MSI file format.

Thursday, July 08, 2004 11:10 AM by Jonathan
asdf: MSI SDK contains msicab.exe that will extract files from MSIs.

# re: Inside the MSI file format.

Thursday, July 08, 2004 6:02 PM by asdf
Is there a direct download to that somewhere? I couldn't find it in:
http://download.microsoft.com/download/platformsdk/sdk/update/win98mexp/en-us/3790.0/msisdk-common.3.0.cab

# re: Inside the MSI file format.

Monday, July 19, 2004 8:14 AM by Srikanth
Is there any way I can retrieve version number from MSI file .

# re: Inside the MSI file format.

Monday, July 19, 2004 8:21 AM by Rob Mensching
Srikanth,

Yes, you can write SQL Queries against the MSI database to get any information you want out of it. I suggest reading through the MSI SDK for more information.

# re: Inside the MSI file format.

Thursday, July 29, 2004 8:20 AM by Dave Illing
I created an msi package a week or so ago, using a VS.NET deployment project. I ran the msi on a machine to test it and it worked OK. I've done some more development and generated a new msi. Now when I try to use that on the test machine, the installer tells me that a version is already installed and must be removed using Control Panel. But when I try to uninstall, it seems to want the original msi package (which of course is long gone)! How can I proceed? There must be some way to hack the machine so it doesn't think the package is installed?

# re: Inside the MSI file format.

Thursday, July 29, 2004 9:30 AM by Rob Mensching
Dave,

Take a look at the command-line switches for the Windows Installer documented in the Windows Installer SDK (there's a link above). I think "msiexec /fv my.msi" will get you what you want.

# re: Inside the MSI file format.

Friday, July 30, 2004 3:30 AM by Dave Illing
Thanks for that -- really saved my life!!
Excellent blog, BTW

# re: Inside the MSI file format.

Wednesday, July 06, 2005 11:52 AM by Tsab
Like asdf, I try to 'see' what files are in an MSI package?
But, not finding the "msicab.exe" nor "Windows Installer SDK". Can You help me?

# Digging In: MSI Transforms

Sunday, July 17, 2005 1:21 PM by Setup Sense and Sensibility

When one starts working with transforms, whether for patching or for administration, digging in a bit...

# re: Inside the MSI file format.

Wednesday, September 14, 2005 2:15 PM by Isaac
Rob, I am wondering - is it possible to query the actual tables in the MSI database? I mean, if you didn't know what tables to expect, how could you pull up a list of all current tables? You can run an SQL query against table names to gather table row data, but what if you didn't know the table names - can they be retrieved?

# re: Inside the MSI file format.

Tuesday, September 27, 2005 7:28 AM by Albert
hi, need some information about msi and mst and how to create unattended installations for MSI packages. We use LandDesk to distribute SW. I need a tool to create MST Files that works like the Custom Installation Wizard for Office 2003. This would fit my needs. I tried AdminStudio 6 (InstallShield) but it is too big and too expensive.

thx
albert

# re: Inside the MSI file format.

Tuesday, October 25, 2005 4:57 PM by Ian
If we deploy internationally is a product like Wise easier (automated) to use then Wix since MSI file format is not Unicode

# re: Inside the MSI file format.

Thursday, November 03, 2005 12:18 PM by Graham S
Regarding getting a list of files contained in an MSI file, there's a handy script here which does the trick:
http://www.serverwatch.com/tutorials/article.php/1548261

# MSI and CAB

Tuesday, January 24, 2006 10:28 PM by Jay Moka
The configuration manager at my company just created an MSI package for our product, but added a CAB file to it. Is this normal? I thought MSI files are always packaged alone!!! Now it seems like an .exe (it actually shares the same CAB file with .exe). If you any comments, please email me at jaymoka1@yahoo.com. thx a bunch...

# re: Inside the MSI file format.

Thursday, February 09, 2006 7:32 AM by Sandrina
I have three different msi files, and I need to make one which includes and executes all three. They have to install in specific order, and two of them require rebooting the system. I have no idea how to do that... I probably should mention that all I ever did with msi files is creating setup project in Visual Studio, and all I ever got to was just copying the files on the client... How do I make them execute? I tried Orca to solve my problem, but all those tables are just confusing....

# re: Inside the MSI file format.

Monday, February 27, 2006 8:03 AM by Claus
I wanted to install JRE, but I couldn't install it, it says 'invalid drive K' (which is a removable drive, which isn't connected anymore). What can I do as an enduser to solve the installer problem? Can I use an installer editor? How?

# re: Inside the MSI file format.

Wednesday, June 21, 2006 6:00 AM by Shailesh
I need to extract file information like version, checksum, type from msi package.
Wish to use the same to generate report mentioning above info.

Is there any programmatic interface to the installer which can supply/export such information?

# re: Inside the MSI file format.

Tuesday, September 26, 2006 5:45 PM by Fabio Esquivel
I have been looking for an explanation to the "this installation package could not be opened" error message for 2 weeks now...

- I already have the latest version of Windows Installer (v3.1 for XP); I had uninstalled/unregistered it and reinstalled from Windows Update and also using the standalone installer

- I don't use M$ Office (never had to in this PC), so M$ explanations relating M$ Office installations do not apply in my case

I just can't install ANY .MSI file at all... Any .MSI file I download gives the same error and I just don't know how to diagnose and fix this

Any suggestions?

# re: Inside the MSI file format.

Sunday, October 08, 2006 7:15 AM by Richard
I have read all of this and as I am not an expert I wonder can you answer a very simple question? I have installed a small program to run a remote webcam - its called EyeSpyFX (neither here nor there really) but every time I try to start it up it demands that I run ScanSoft PDF 3 installer!!! Despite me purging this from my Registry several times it keeps coming back... It was suggested to me that deleting the ScanSoft.msi (if I can find it) would solve the problem... is it safe to delete MSI's in any event? Thanks.... going insane here

# re: Inside the MSI file format.

Monday, October 23, 2006 2:50 AM by Ravi Vaswani

Hi,

I changed the .msi file using orca.exe.  Basically I added the file control where user can select the path where the virtual directory should get created.

It will be great if somebody can let me know how I can get the value from the dialog box of websetup.

Thanks & Regards,

Ravi

vaswani_ravi@rediffmail.com

# re: Inside the MSI file format.

Wednesday, November 01, 2006 9:59 AM by John

I have a .msi file that is looking for the Flash.ocx file.  I am using the orca editor.  I need to change the msi file to look for either Flash.ocx and Flash8.ocx can any one lend a hand.  Thanks.

# re: Inside the MSI file format.

Monday, November 13, 2006 1:42 PM by BlogBoy

Hey guys,

Blogs are not the most ideal way to find help for a problem.

Try:

installsite.org or something like that...

# re: Inside the MSI file format.

Monday, November 20, 2006 1:44 AM by Wipoou

Hi, Anyone has some information about introduction to msi table. I will do a presentation and I need some reference besides my experiences and msdn.

Thanks,
