There's data and then there's metadata...
People like to think a lot about programs in terms of code. It's what we focus on: conditional statements, looping constructs, interface design and implementation, etc. It's really where almost all the research has been focused and it's arguably the most important issue when it comes to attracting and retaining developers for some development environment (née platform).
So that's cool. Code is sexy, code sells, code is what gets the press when it's bad.
But there's so much more to a program running than code.
If every program had a “game console” like experience where it was a world unto itself, then maybe we could argue that there's only the code that came with the application and maybe some fixed-function code underneath the application (BIOS, some level of resource management in the game console ROM, U*ix/Linux, whatever).
But that kind of program isn't really very interesting for a multifunction device. Well, that's not to say that it's not that interesting, but it's certainly not the main focus of a general purpose operating system or multifunction device to just let one program do its thing, stop and go away. The really interesting part is how you can take data from one program and exchange it with another program. Or maybe have the two programs interoperate.
If it's just about the user getting data from one program to another, well, maybe that's easy. They drag-and-drop, they use a command shell and redirection or piping, they save the output of one program in a location they can remember, launch the other program, open up the file and maybe it works.
But you see that's not the value proposition of a general purpose operating system or of a multifunction device like a PC.
The coolness and value proposition of a device like a personal computer is that you can have a large number of programs on it and while each program has a particular task that it's most proficient at, the pipelining of data is done automatically.
You can look at this problem as how to write a larger program - some multifunction devices which are not general purpose can take 2 or more functions, design them together and get something with combined value. Combo TV/DVD/VCR units are a great example of this. There wasn't some grand design or scheme for the functions to know about each other and affect each other. Instead the product engineers said something like “hey, we've got these things which are separate, if we hard-wire integrate them together, we can sell more units because there is only one power cord and maybe if the user's lucky we'll switch inputs for them automatically”. (I am very annoyed that all the combo DVD/VCR units don't switch which device is active automatically; maybe the DVD/VCR/TV combo units do this more smoothly...)
So yes, you can write larger programs which integrate functionality (Outlook Express - it's a mail client! no, it's a newsreader, no, it's both!). But this doesn't scale over time.
Once again to allude to the Real World, pluggability still doesn't cut the mustard. My wife, who is a very smart person, doesn't get our TV/stereo setup (arguably because it's not important to her, so favoring the quality of components over the simple integrated experience isn't valuable enough to develop an affinity with the problem/solution space) and would be helpless if I asked her to buy another TiVo and figure out how to integrate it into our family room AV setup.
Similarly, thinking that the way to interchange data between applications is by a user picking and choosing the data flow is fundamentally flawed (not too many people make this argument any more).
So, ummm.. how is one program supposed to find another? It's not reasonable for the end user to try to hook up data producers and consumers. This would be like telling Anne she has to hook up a new TiVo and VCR into the AV system. It probably would just sit in the box.
When communications have to cross machine boundaries, people tend to realize that Something Special is going on, and coupled with the plethora of RPC experiments in the 70s and early 80s, we have pretty good stories for how to design interoperable program-to-program communications protocols. Several limitations come out though: the expectation is that while the program which serves the protocol may vary from machine to machine, it's generally assumed that the endpoint for the communications is a single program. Some protocols have evolved into meta-protocols where some namespace is used to further navigate into various programs which operate within the container protocol (e.g. web servers finding things served by cgi-bin or isapi extensions) but the issue there is to enable coexistence between static content and dynamic content generated by potentially a very large varying number of mechanisms (you could certainly have a site with a mixture of static content, isapi generated content, ASP, ASPX and JSP pages).
But something very special happened on the PC. It's certainly not unique to the PC and there's plenty of prior art in this realm going back to OSes in the 50s and 60s that people can use as their “I was doing that in machine code on KL10s running ITS back at MIT way back when...” running entries.
Love it, hate it, by design, happy accident, worst thing ever to happen to the industry, whatever, a simple metadata store was invented where people could in a standard way publish things about their programs during installation. It was win.ini. (admit it, you thought I was going to say the registry but I want to bring up an example which was fraught with even more problems than the registry so that we can look at the registry in context.)
Wow, look, you can look up name value pairs by section name! Wow, you can use these themselves as section names and create arbitrary information hierarchies! Wow, look, it's “just a text file” so anyone can just open it up in the editor and party on it to their heart's content! Wow, it looks like it's even cached, so it's fast to read it using these handy APIs!
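To make the "name value pairs by section name" idea concrete, here is a minimal sketch using Python's standard `configparser` module on a win.ini-style text. The section names, keys, values, and the helpers `read_value` and `follow` are all made up for illustration; this is not the real Windows profile API, just the same lookup shape, including the trick of using a value as the name of another section.

```python
import configparser

# Hypothetical win.ini-style content; every name here is illustrative.
SAMPLE_INI = """\
[extensions]
txt = notepad.exe ^.txt

[apps]
editor = editor-settings

[editor-settings]
font = Courier
"""


def read_value(ini_text, section, key, default=""):
    """Look up one name/value pair by section name, with a default."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get(section, key, fallback=default)


def follow(ini_text, section, key, leaf_key, default=""):
    """Treat the value found at section/key as the name of another section."""
    child_section = read_value(ini_text, section, key)
    return read_value(ini_text, child_section, leaf_key, default)


if __name__ == "__main__":
    print(read_value(SAMPLE_INI, "extensions", "txt"))
    print(follow(SAMPLE_INI, "apps", "editor", "font"))
```

Nothing in the file format says that `editor-settings` is a pointer to another section; that is purely a convention between whoever wrote the value and whoever reads it, which is part of why this got fragile.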
However for all the grooviness that was win.ini, it was also a nightmare. People didn't edit it correctly and installation programs didn't modify it correctly. A setup that actually had an uninstaller was not that common and one that could edit win.ini to remove itself without commonly corrupting win.ini was even rarer.
So then someone (I'm sorry, I don't know who and I'm not sure if they're a hero or a villain) invented the registry: a simple tree-structured namespace (very similar to the tree-structured namespaces of naming services in vogue around the same time) with a semi-decent API for not only reading it but also writing it.
The registry started to house two important kinds of meta-data: configuration and discovery. I'm not very familiar with the 16-bit registry but I am led to believe that a large amount of the OLE 1.0 scenarios are/were enabled with it, where the first and foremost thing was to deliver a namespace for discovery of programs which conformed to some protocol and which particular data formats the programs owned. (a/k/a HKEY_CLASSES_ROOT nowadays).
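As a rough sketch of what discovery through a tree-structured namespace looks like, here is a toy in-memory model in Python. It is not the Windows registry API: the `Key` class, the `command_for_extension` helper, and the sample entries are all hypothetical, and the real thing has details (case-insensitive key names, typed values, security) that this ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Key:
    """A toy node in the tree: named values plus named subkeys."""
    values: dict = field(default_factory=dict)   # value name -> data; "" is the default value
    subkeys: dict = field(default_factory=dict)  # subkey name -> Key

    def create(self, path):
        """Walk a backslash-separated path, creating missing keys on the way."""
        node = self
        for part in path.split("\\"):
            node = node.subkeys.setdefault(part, Key())
        return node

    def open(self, path):
        """Walk a backslash-separated path; return None if any part is missing."""
        node = self
        for part in path.split("\\"):
            node = node.subkeys.get(part)
            if node is None:
                return None
        return node


def command_for_extension(classes_root, extension):
    """Discovery in two hops: extension -> program identifier -> open command."""
    ext_key = classes_root.open(extension)
    if ext_key is None:
        return None
    prog_id = ext_key.values.get("")
    if not prog_id:
        return None
    command_key = classes_root.open(prog_id + "\\shell\\open\\command")
    return command_key.values.get("") if command_key else None


if __name__ == "__main__":
    root = Key()
    # Illustrative entries an installer might publish.
    root.create(".txt").values[""] = "txtfile"
    root.create("txtfile\\shell\\open\\command").values[""] = "notepad.exe %1"
    print(command_for_extension(root, ".txt"))
```

The point of the indirection is that the consumer never names the program; it asks the namespace who owns the format and gets an answer that was published at install time.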
Configuration is a very interesting topic but isn't my focus tonight so I'll just touch on it briefly. The essence is that there is configuration/settings which are specific to the machine/the user/the application which need to be kept somewhere. In low-level NT, the registry is actually called the configuration manager; the original intention was to store information about the system configuration like the device/service configuration and tweakable settings. I'm more interested in talking about discovery right now so I'll leave configuration issues to another blog entry some day.
The program manager was all over this since instead of inventing logical/virtual groupings of programs, it was always based on a physically backed hierarchy (e.g. directories in the filesystem) so now they had some cool way of not just showing programs which might have neato icons embedded in them, but also using those neato icons on files that were associated with the programs! (I wasn't around, so I'm not trying to give a history lesson here as RaymondC might; I did use an Apple Lisa for a year or so in the 1982-1983 time frame at a computer magazine I worked at, SoftSide, so the notion of documents having an appearance based on the program that manipulated them isn't some kind of big surprise.)
The really important thing out of this is the following: having some mechanism for analyzing data to determine its protocol (and the debates still rage on whether you should sniff the bytes vs. using a naming convention like file extensions vs. out-of-band protocol identification like the MIME Content-Type header on RFC 822 messages) and then matching this up to some program, not based on human intervention but rather on the machine understanding some automatic selection mechanism.
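Here is a small Python sketch of those three identification strategies feeding one automatic selection step. Everything in it is an assumption for illustration: the `HANDLERS` and `EXTENSIONS` tables, the handler names, and especially the precedence order (declared type first, then extension, then sniffing), which is exactly the part people argue about.

```python
import os

# Hypothetical tables; a real system would populate these at install time.
HANDLERS = {"text/plain": "editor", "image/png": "viewer"}
EXTENSIONS = {".txt": "text/plain", ".png": "image/png"}

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff(data):
    """Guess a type from the bytes themselves."""
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return None
    return "text/plain"


def by_extension(filename):
    """Guess a type from the naming convention."""
    return EXTENSIONS.get(os.path.splitext(filename)[1].lower())


def identify(filename=None, data=None, declared=None):
    """Combine the strategies; the precedence is an arbitrary choice for this sketch."""
    if declared:
        # Out-of-band identification, e.g. "text/plain; charset=us-ascii".
        return declared.split(";")[0].strip().lower()
    if filename:
        found = by_extension(filename)
        if found:
            return found
    if data is not None:
        return sniff(data)
    return None


def select_handler(filename=None, data=None, declared=None):
    """Match the identified type to a program with no human in the loop."""
    return HANDLERS.get(identify(filename, data, declared))


if __name__ == "__main__":
    print(select_handler(filename="notes.TXT"))
    print(select_handler(data=PNG_SIGNATURE + b"..."))
    print(select_handler(declared="text/plain; charset=us-ascii"))
```

Whichever strategy wins, the second half is the same: a table lookup that somebody had to publish into ahead of time.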
But as we all know, the registry is/was Fatally Flawed. Or maybe its usage patterns were. It's hard to tell the difference since there's never been any official Design Guideline for How To Use the Registry.
One of the problems is the fact that you cannot distinguish data in the registry which is for discovery vs. configuration. You may be able to make some educated guesses but for just about every “educated guess”, you can find counter examples (is RunAs when used under HKCR\Clsid\{clsid} configuration or registration/discovery?)
Another problem is that most people who approached the problem of using the registry for discovery “just wanted the one handler for <X> right now”. Man, I just want to find what program is supposed to edit “.TXT” files. I just want to find the COM server for Microsoft.Office.Word. When the only pattern of usage available is to overwrite a given entry, you have no opportunity to understand how much of even these relatively straightforward discovery mechanisms is “just installation” vs. configuration. (I guess when an MP3 player actively stomps on HKCR\.MP3 it's configuration but when it's written at installation time, it's just discovery?)
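A toy Python comparison shows why the overwrite-only pattern loses information. `SingleSlot` mimics "last writer wins"; `SeparatedStore` is one possible alternative that keeps what is installed (discovery) apart from what the user picked (configuration). Both classes and the player names are hypothetical, and the second one is only a sketch of the distinction, not a description of how Windows does or will do it.

```python
class SingleSlot:
    """One handler per extension; registering overwrites whatever was there."""

    def __init__(self):
        self.handler = {}

    def register(self, ext, app):
        self.handler[ext] = app  # the previous owner is forgotten here

    def unregister(self, ext, app):
        if self.handler.get(ext) == app:
            del self.handler[ext]

    def lookup(self, ext):
        return self.handler.get(ext)


class SeparatedStore:
    """Keeps the list of installed handlers separate from the user's preference."""

    def __init__(self):
        self.available = {}  # ext -> list of apps, in install order (discovery)
        self.preferred = {}  # ext -> app chosen explicitly (configuration)

    def register(self, ext, app):
        apps = self.available.setdefault(ext, [])
        if app not in apps:
            apps.append(app)

    def unregister(self, ext, app):
        apps = self.available.get(ext, [])
        if app in apps:
            apps.remove(app)
        if self.preferred.get(ext) == app:
            del self.preferred[ext]

    def prefer(self, ext, app):
        if app not in self.available.get(ext, []):
            raise ValueError(f"{app} is not registered for {ext}")
        self.preferred[ext] = app

    def lookup(self, ext):
        if ext in self.preferred:
            return self.preferred[ext]
        apps = self.available.get(ext, [])
        return apps[-1] if apps else None  # fall back to the most recent install


if __name__ == "__main__":
    for store in (SingleSlot(), SeparatedStore()):
        store.register(".mp3", "PlayerA")
        store.register(".mp3", "PlayerB")
        store.unregister(".mp3", "PlayerB")
        print(type(store).__name__, store.lookup(".mp3"))
```

With the single slot, uninstalling PlayerB leaves nothing behind because PlayerA's entry was stomped; with the separated store, PlayerA is still discoverable, and an explicit `prefer` call survives later installs.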
We're working on solving these problems in Windows in the future but I felt like writing tonight about the fact that we do have these problems and they seem to occur in spades. Windows' fundamental value proposition over a single function device is that you can perform multiple functions concurrently and there are opportunities for programs to cheaply interoperate. (We can debate whether in-process loading of extensions is just plain evil or is a necessary evil, but it's one of the things that made Windows successful and if it wasn't done in the first place, nobody would be looking back and wondering if it should have been done differently.)
The meta-point tonight is that people want their code to work. It's the sexy thing. Man, give me that garbage collection stuff so I can write more bugs^h^h^h^hcode per hour and I'm not going to look back. But for the code to work right, it needs to be deployed correctly and serviced correctly. But then they're willing to look the other way time and time again when it's clear that some of the patterns which make up their bread-and-butter are fragile and need revision.
How do you know when you can remove metadata? What metadata needs to get backed up? All of it? Well I guess it's cool to be able to burn a DVD that I can boot and rebuild my current system configuration from (I think that there are features like that enabled... never done it myself...) but that's too much data to think about managing and backing up all the time. What's the really valuable data? If it's metadata that's reconstructable given the installation package, maybe it should be omitted from a backup and just the fact that software packages <x>, <y> and <z> were installed should be backed up since we can recreate everything else from there.
We really need to revolutionize software development, testing and deployment so that how software gets onto the machine, updated and eventually removed is just as core a value as whether your string concatenations work without buffer overruns. I guess it's not sexy but we're really hitting a wall in terms of the ability to manage software configurations on general purpose computers. Maybe corporations should be deploying their applications on DVD and getting every employee an XBOX or something but the fundamental reason that PCs demolished the large IT departments in the first place was putting the power to manage information and tools in the hands of the departments; we need to be wary of looking to solutions like lockdown of desktops or building silos of applications as the answer - this is why PCs replaced the much more (financially and management-wise) scalable character cell terminals of the previous decades.