Holy cow, I wrote a book!
Commenter ton asks
why the registry was defined as a hierarchical database instead of
a relational database.
Heck, it's not even a hierarchical database!
The original registry was just a dictionary;
i.e., a list of name/value pairs, accessed by name.
In other words, it was a flat database.
If you turned your head sideways and treated the backslashes as
node separators, you could sort of trick yourself into believing
that this resulted in something vaguely approximating
a hierarchical database,
and a really lame one at that (since each node held only one piece
When you choose your data structures, you necessarily are guided
by the intended use pattern and the engineering constraints.
One important engineering constraint was that you have to minimize
All of the registry code fit in 16KB of memory.
(Recall that Windows 3.1 had to run on machines with only 1MB of memory.)
Okay, what is the usage pattern of the registry?
As originally designed, the registry was for recording information
about file types.
We have the file types themselves (txtfile),
properties about those file types (DefaultIcon),
verbs associated with those file types (open),
and verb implementations (command or ddeexec).
Some verb implementations are simple (command involves
just a single string describing the command line);
others are complex (ddeexec requires the execute string,
the application, and the topic, plus an optional alternate execute string).
Given the heterogeneity of the data the registry needs to keep track of,
imposing some sort of uniform schema is doomed to failure.
"But you can just update the schemata each time the registration is
That creates its own problems.
For example, to support
roaming user profiles,
you need a single registry hive to work on multiple versions of the
If version N+1 adds a new schema, but then the profile roams to a
machine running version N,
then that registry hive will be interpreted as corrupted since it
contains data that matches no valid schema.
"Well, then include the schemata with the roaming profile so that
when the older operating system sees the hive, it also sees the
This is trickier than it sounds, because when the profile roams to
the newer operating system, you presumably want the schemata to be
upgraded and written back into the user profile.
It also assumes that the versioning of the schemata is strictly linear.
(What if you roam a user profile from a Windows XP machine
to a Server 2003 machine? Neither is a descendant of the other.)
But what kills this proposal is that it makes it impossible for a program
to "pre-register" properties for a future version of the operating system.
Suppose a new schema is added in version N+1,
like, say, the IDropTarget verb implementation.
You write a program that you want to run on version N as well as
on version N+1.
If your installer tries to register the version N+1 information,
it will fail since there is no schema for it.
But that means that when the user upgrades to version N+1,
they don't get the benefit of the version N+1 feature.
In order to get the version N+1 feature to work, they have to
reinstall the program so the installer says,
"Oh, now I can register the version +1 information."
"Well, then allow applications to install a new schema whenever
they need to."
In other words, make it a total free-for-all.
In which case, why do you need a schema at all?
Just leave it as an unregulated collection of name/value pairs
governed by convention rather than rigid rules,
as long as
the code which writes the information and the code which reads it
agree on the format of the information and where to look for it.
Hey, wow, that's what the registry already is!
And besides, if you told somebody,
"Hi, yeah, in order to support looking up four pieces of
information about file types,
Windows 3.1 comes with a copy of
they would think you were insane.
That's like using a bazooka to kill a mosquito.
What are you planning on doing with this relational database anyway?
Are you thinking of doing an INNER JOIN on the registry?
(Besides, the registry is already being abused enough already.
Imagine if it were a SQL server: Everybody would store
all their data in it!)
ton explains one way applications could use this advanced functionality:
An application would have a table or group of tables
in relational style registry.
A group of settings would be a row.
A single setting would be a column.
Is it starting to become clearer now how SQL like statements
could now be used to constrain what gets deleted and added?
How good is your understanding of SQL and DBMS?
You know what most application authors would say?
They would say "Are you mad?
You're saying that I need to create a table with one column for each
And this table would have a single row (since I have only one application)?
All this just so
I can save my window position?
Screw it, I'm going back to INI files."
What'll happen in practice is that everybody will create a table with
a string called name and a blob called
Now we've come full circle:
We have our flat database again.
And how would they make sure the name of their table doesn't
collide with the name of a table created by another application?
Probably by encoding the company name and application name into
the name of the table, according to some agreed-upon convention.
Like say, the Settings table used by the LitSoft program
written by LitWare would be called
So querying a value from this table would go something like
SELECT value FROM PerUser.LitWare_LitSoft_Settings
WHERE name = "WindowPosition"
Hey, this looks an awful lot like
One of ton's arguments for using a relational database is that
it permits enforcement of referential integrity.
But I would argue that in the general case, you
strict enforcement of referential integrity.
Suppose you uninstall a program.
The uninstaller tries to delete the program registration,
but that registration is being referenced by foreign keys in other tables.
These references were not created by the application itself;
perhaps the shell common dialog created them as part of its
If the registry blocked the deletion, then the uninstall would fail.
"Cannot uninstall application
because there's still a reference to it somewhere."
And that reference might be
in Bob's user profile,
from that time Bob said, "Hey can I log onto your machine quickly?
I need to look up something."
Bob is unlikely to come back to your machine any time soon,
so his user profile is just going to sit there holding a reference
to that application you want to uninstall for an awfully long time.
"Hi, Bob, can you come by my office?
I need you to log on so I can uninstall an app."
So let's assume it goes the other way:
The registry automatically deletes orphaned foreign key rows.
(And for hives that are not currently available, it just remembers
that those foreign key rows should be deleted the next time they are loaded.
Nevermind that that list of "foreign key rows that should be deleted
the next time Bob logs on" is going to get pretty long.)
Now suppose you're uninstalling a program not because you want to
get rid of it, but because you're doing an uninstall/reinstall
You uninstall the program, all the orphaned foreign key rows are
automatically deleted, then you reinstall the program.
Those orphaned foreign key rows are not undeleted; they remain deleted.
Result: You lost some settings.
This is the reason why
you don't clean up per-user data when uninstalling programs.
Enforcing referential integrity also means that you can't create
One example of this was given earlier, where you register something
on version N even though the feature doesn't get activated
until the user upgrades to version N+1.
More generally, Program X may want to create a reference to
Program Y at installation,
even if program Y isn't installed yet.
(For example, X is a Web browser and Y is a popular plug-in.)
The Program Y features remain dormant, because the attempt by
Program X to access Program Y will fail,
but once the user installs Program Y,
then the Program Y features are magically "turned on"
in Program X.
Consider, as an even more specific example, the "kill bit" database.
There, the goal isn't to "turn on" features of Program Y but
to turn them off.
Imagine if referential integrity were enforced:
You couldn't kill an ActiveX control until after it was installed!