From Bravo to Word

Not long after the Wired article about Word 5.1, someone sent me a link to this post on the Cult of Mac Blog. It mentions what I had thought to be a rather widely known fact: that most of the people who first worked on Word came from Xerox PARC. In fact, the person most responsible for the design ideas that permeate Word is Richard Brodie, who also worked on Bravo while at Xerox PARC.

Not surprisingly, a number of ideas that were first explored in Bravo found their way into Word. The Cult of Mac post mentions the file format, but the statement that’s quoted isn’t quite accurate, at least insofar as it leaves out some important details.

The basic design goal behind Word’s file format was to read in only as much information as is necessary to fill the document window with text. You can see the fruit of this today by conducting a little experiment:

  • Launch Word.
  • In a new document, type ‘=rand()’.
  • Save this document as “SmallDoc”, and close it.
  • Open a new untitled document.
  • Type ‘=rand(100)’.
  • Press <Cmd>-y about twenty times (until you have more than 100 pages of text).
  • Save this document as “BigDoc”, and close it.

You could, if you wanted to, grab a stopwatch and time the next few steps, which are:

  • Select “SmallDoc” from the “File” menu
  • Select “BigDoc” from the “File” menu

If you’re timing this, start the stopwatch when you release the mouse button on each file name in the menu, and stop it when you first see the insertion cursor blink.

The first thing you’ll note is that there is no appreciable difference between the time it takes Word to open BigDoc and the time it takes to open SmallDoc, despite the huge difference in size. In my experiment, BigDoc is over 1 MB while SmallDoc is barely more than 24 KB. BigDoc is roughly 40 times larger than SmallDoc, but I can’t tell the difference when I open the files.

Now, a few data structures stored in the file do grow in proportion to the amount of text, but they are so small that the time spent reading them is negligible next to the time it takes to read in the actual text and formatting. The result, even today, is a file-open time that is proportional to the size of your document window, not to the size of your document.
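To make the idea concrete, here is a rough Python sketch of "read only enough to fill the window". It is not Word's code and has nothing to do with Word's real file format: the plain-text "documents", the 40-line window, the 4096-byte chunk size, and the names `write_doc`, `read_first_screen`, and `demo` are all assumptions made up for illustration.

```python
import os
import tempfile

LINE = b"x" * 79 + b"\n"  # hypothetical 80-byte line of "document" text


def write_doc(path, line_count):
    """Write a stand-in document consisting of `line_count` identical lines."""
    with open(path, "wb") as f:
        f.write(LINE * line_count)


def read_first_screen(path, window_lines=40, chunk_size=4096):
    """Read only as many bytes as are needed to fill the window.

    Returns the lines for the first screen and the number of bytes
    actually read from disk.
    """
    lines = []
    pending = b""      # partial line carried over between chunks
    bytes_read = 0
    with open(path, "rb") as f:
        while len(lines) < window_lines:
            chunk = f.read(chunk_size)
            if not chunk:          # reached end of file
                break
            bytes_read += len(chunk)
            pending += chunk
            *complete, pending = pending.split(b"\n")
            lines.extend(complete)
        # Keep a final unterminated line only if the window still has room.
        if pending and len(lines) < window_lines:
            lines.append(pending)
    return lines[:window_lines], bytes_read


def demo():
    """Open a small and a large stand-in document; report bytes read for each."""
    with tempfile.TemporaryDirectory() as folder:
        small = os.path.join(folder, "SmallDoc.txt")
        big = os.path.join(folder, "BigDoc.txt")
        write_doc(small, 60)        # about 4.8 KB
        write_doc(big, 60_000)      # about 4.8 MB
        _, small_bytes = read_first_screen(small)
        _, big_bytes = read_first_screen(big)
        return small_bytes, big_bytes


if __name__ == "__main__":
    print(demo())
```

In this sketch the amount of data read depends on the window size and the chunk size, not on the length of the file, which is the property the experiment above demonstrates.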

The post on the Cult of Mac Blog quotes Bruce Damer’s claim that, “Bravo and BravoX stored out files by essentially just dumping the memory heap,” which is really a gross oversimplification. If the file format consisted of a straight dump of the memory heap, then opening a document would still take time proportional to the size of your document.

The formatting in a Word file, however, is allocated in blocks of 512 bytes. Formatting information is added to a block until it fills up, at which point a new block is allocated. These blocks are written to the file as full 512-byte blocks whether they’re full or not, which is the only sense in which a Word file consists of a dump of the memory heap.
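The block scheme can be sketched in a few lines of Python. Only the 512-byte block size comes from the description above; the record contents, the zero fill, and the names `FormattingBlock`, `FormattingStore`, and `dump` are hypothetical, and this is not how Word actually lays out its formatting data.

```python
BLOCK_SIZE = 512  # block size taken from the description above


class FormattingBlock:
    """A fixed-size in-memory block that is always exactly 512 bytes."""

    def __init__(self):
        self.data = bytearray(BLOCK_SIZE)  # unused space stays zero-filled
        self.used = 0

    def try_add(self, record):
        """Copy `record` into the block if it fits; return False otherwise."""
        if self.used + len(record) > BLOCK_SIZE:
            return False
        self.data[self.used:self.used + len(record)] = record
        self.used += len(record)
        return True


class FormattingStore:
    """Adds formatting records to blocks, allocating a new one when needed."""

    def __init__(self):
        self.blocks = [FormattingBlock()]

    def add(self, record):
        if len(record) > BLOCK_SIZE:
            raise ValueError("record does not fit in a single block")
        if not self.blocks[-1].try_add(record):
            # Current block is full: allocate a fresh one and add there.
            self.blocks.append(FormattingBlock())
            self.blocks[-1].try_add(record)

    def dump(self):
        """Write every block out whole, whether or not it is full."""
        return b"".join(bytes(block.data) for block in self.blocks)


if __name__ == "__main__":
    store = FormattingStore()
    for _ in range(100):
        store.add(b"\x01" * 10)  # hypothetical 10-byte formatting record
    print(len(store.blocks), len(store.dump()))
```

Because each block is dumped as-is, the on-disk size of this part of the file is always a multiple of 512 bytes, even when the last block is mostly empty.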

Damer attributes his claim to something Charles Simonyi said, but it’s almost certain that either Damer didn’t fully understand what Simonyi was saying or that Simonyi wasn’t entirely clear that this “memory dump” aspect of Word’s file format is limited to the disk pages that hold formatting information.