Mom always said, “The only good thing about beating your head against the wall is that it feels good when you stop.”  Well, sorry Mom, but that’s not fully true.  While you’re sitting on the couch buried beneath an ice pack, you tend to come up with a few ways to mind your head.

Shipping a crappy product is a lot like beating your head against the wall.  It really does feel good when you ship a great product as a follow-up, and it really does motivate you to spend some time trying to figure out how not to ship a crappy product again.

Mac Word 6.0 was a crappy product.  And, we spent some time trying to figure out how not to do that again.  In the process, we learned a few things, not the least of which was the meaning of the term “Mac-like.”

To understand why Mac Word 6.0 was a crappy product, we need to understand both the historical background that led to some key decisions and some of the technical problems that resulted from those decisions.

Mac Word 5 and Pyramid

On October 5, 1991, we shipped Mac Word 5.0.  The reviews were glowing.  For the effort, we received the Mac software equivalent of a Tony award: the Mac World Eddy.  Even today, there are people who say that Mac Word 5.0/5.1 comprise the best version of Mac Word we’ve ever shipped.

While Mac Word 5 was a great product, there was one problem with it: Win Word 2.  They both shipped at about the same time, but Win Word 2 had more features (most notably a macro language, but there were a few others).  This was a major sore point for Mac Word users.  They wanted feature parity, and they wanted it now!  The longer they had to wait for feature parity between Win Word and Mac Word, the more we got raked over the coals.

But we had a problem.  Actually, we had a couple of problems, the first being that Win Word and Mac Word were built from separate code bases.  The other problem was Word Perfect.  At that time, it still represented a major competitor on Windows, and we still had some catch-up work to do in order to get better than Word Perfect.  If we had continued to develop Mac Word and Win Word from separate code bases, Mac Word would never have caught up to Win Word in terms of feature parity.

As of October of 1991, we already had a plan to address the first problem: the Pyramid project.  It was a complete rewrite of Word intended both to address some nagging issues with what had, by that time, become somewhat of a crusty code base and to address the separate code base problem.  Both Win Word and Mac Word would be built from that same code base.

Exit Jeff Raikes, Enter Chris Peters

Feature parity problem solved.  Well, not quite.  At the same time, Jeff Raikes was promoted from Word business unit manager to some other position in Microsoft (I forget exactly which), and Chris Peters was promoted to fill Jeff Raikes’ position.  Most everyone knows about Jeff Raikes these days.  Chris Peters, however, had been the development manager for Excel before moving to Word.  His favorite pastime is bowling, and he was known for having huge stacks of empty Coke cans in his office.

While Jeff Raikes thought the Pyramid project was a good idea, Chris Peters looked at the Word Perfect problem and decided that Pyramid was a bad way to solve the feature parity problem.  A complete code rewrite is risky.  The whole point of a complete rewrite is to take a few steps backward in the short run in order to be able to make some greater strides in the long run.  Chris Peters decided that we couldn’t afford to take the short-run hit that Pyramid required.

So, Chris Peters killed Pyramid.  At that point, the only way to solve the feature parity issue was to start both Mac Word and Win Word from the Win Word 2.0 code base, which is exactly what we did.

But, that’s not the full effect of Chris Peters’ decision.  At the time Chris Mason was the development manager for Word, and he strongly disagreed with Chris Peters’ decision.  As a result, Chris Mason left the Word group to work on other things at Microsoft.  Chris Mason understood the Mac, and had been a Word developer going back to Mac Word 3.0.  Chris Mason was replaced by Ed Fries, who was far less of a Mac person than Chris Mason was, so we lost a good bit of Mac understanding in the higher-level management of the Word group.  While it’s impossible to say exactly what effect this had, there’s a high probability that some of the trade-offs we made with Mac Word 6.0 would have gone in a different direction.

Technical Hurdles

Starting from the Win Word 2.0 code base presented a couple of technical problems for those of us on the Mac side.  The first was that it was written to the Windows APIs.  Solving this problem isn’t simply a matter of writing a layer that emulates the Windows APIs on the Mac.  The way the two systems handle windows is fundamentally different, though it’s interesting to note that the new Carbon APIs are far more similar to the way Windows does things.  The biggest difference is that Windows has the concept of child windows, while the Mac does not.  The other is that, on Windows, everything is a subclass of the Window object.  Even controls are windows.

The other problem was a limitation in the Mac OS.  While 68K Classic Mac OS was a nice operating system, it had one very glaring flaw.  It didn’t do memory management very well.  In fact, it barely did any memory management at all.  Users had to tell the OS how much memory a program needed in order to run, and that’s how much memory the program got regardless of what the program might need at any given time during execution.

The memory problem was worse on 68K machines, because the memory given to a program, regardless of the virtual memory settings, was what the program got to use for both code and data.  Under 68K, code was contained in something called a “Code Resource”.  Now, you could swap these code resources in and out of memory as needed, which meant that the actual memory needs of your program could change drastically depending on what the user wanted to do.

For example, consider a grammar checker.  The user isn’t going to want to check grammar all the time, so the grammar checker doesn’t need to be loaded into memory all the time.  But a grammar checker isn’t a simple piece of code.  It’s a memory pig.  The way 68K Classic Mac OS handled memory meant that you had to set a minimum amount of memory for your application such that you could load that memory pig of a grammar checker.
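To put rough numbers on that constraint, here is a minimal sketch.  Every size in it is made up for illustration (none of these figures come from Word), and the function name is hypothetical.  The point is only that a fixed partition has to be sized for the largest piece of swappable code, even if the user rarely runs it.

```python
# Hypothetical sizes in KB, chosen only to illustrate the arithmetic.
RESIDENT_CODE_KB = 900      # code that must always stay loaded
DATA_KB = 600               # documents, buffers, and other data
SWAPPABLE_SEGMENTS_KB = {   # code that is loaded only on demand
    "grammar_checker": 1200,
    "spell_checker": 300,
    "mail_merge": 250,
}


def minimum_partition_kb(resident_kb, data_kb, swappable_kb):
    """Smallest fixed partition that can still load any one swappable segment."""
    return resident_kb + data_kb + max(swappable_kb.values())
```

With these invented figures the minimum comes out to 2700 KB.  Take the grammar checker out of the table and it drops to 1800 KB, which is roughly the kind of relief that demand-paged code later provided.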

I’m making a distinction between 68K Classic Mac OS and PowerPC Classic Mac OS, because Apple changed how code was stored, loaded and executed on the PowerPC.  For those of you who remember, when you did Get Info on an application, it would show you two different memory requirements: one with virtual memory turned on and one with virtual memory turned off.  With virtual memory turned on, the application’s code could be handled through something called “demand paged” virtual memory, so the code no longer had to fit in the application’s memory partition.  That notorious grammar checker didn’t have to be taken into account when trying to figure out the application’s minimum memory requirements.

I want to be careful, here, not to lay blame for this at Apple’s feet.  Doing true virtual memory requires hardware support.  Microprocessors in 1984 didn’t have the full functionality required to support full demand paged virtual memory, so designing it into the original Mac OS would have been a waste of time.  We often make design decisions that make perfect sense in light of current system limitations, only to have those design decisions come back to haunt us when Moore’s Law makes those systems orders of magnitude more powerful.  There’s a reason Apple scrapped the Motorola 68K line of processors in favor of the PowerPC, not the least of which is the fact that it afforded them an opportunity to revisit some of those early design decisions.

Technical Achievement

Having reaped the benefits of a decade’s worth of Moore’s Law, we who now think very little of putting 128 MB or even a half a GB of memory into a laptop computer might find it difficult to grasp just how much of a problem the 68K memory wall presented for Mac Word 6.0.  But we were trying to get the whole thing to run in 4 MB of memory—that’s total system memory, not just the application partition.

This was no small matter.  Word 6 was getting a bevy of new features over and above Win Word 2.0.  Relative to Mac Word 5.0, this was two major releases worth of feature changes.  OLE, the built-in lexical analyzer and rule-based inference engine required for AutoCorrect/AutoFormat and a grammar checker that included state-of-the-art natural language processing technology (which made the grammar checker even more of a memory pig) combined with things like a full-blown macro language (WordBasic) to make Mac Word 6.0 huge relative to common Mac systems of that time.

Please note the “relative” qualifier to the word “huge” back there.  To see this in perspective, fire up BBEdit on your Mac OS X machine, open the Terminal window, and type “top” at the command line.  Now read the values in the RSIZE and VSIZE columns.  When I open my .tcshrc file in BBEdit, those values are 12.1 MB and 164 MB respectively.  As I type this document into my most recent build of Word 2004, those values are 36.6 MB and 222 MB respectively—and Word’s a full-blown word processor.

The amazing thing is that we actually managed to get Word 6.0 to run on systems that had only 4 MB of memory (well, “walk” might be a better word than “run,” but you get the point).  To fully grasp the extent of this achievement, we need to understand a little bit about how programs are written and how they execute.  What follows is my attempt to explain a fairly technical issue in lay terms.  If your eyes start rolling into the back of your head, feel free to jump ahead to the next section.

Programs are written in relatively small chunks of code called “functions.”  Each function represents a single, functional aspect of the program.  Functions can represent high-level concepts (e.g. lay out a page of text) or low-level concepts (format a single line of text within a page).  Higher-level functions perform their work by calling lower-level functions, and there’s a protocol that helps the computer to know how to return from a low-level function back to the high-level function that called it.  This protocol is known as “procedure prologue and epilogue” and it involves something known as a “call stack.”  While the lower-level code is running, the higher-level code that called it is said to be “on the call stack.”
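For readers who find code easier to follow than prose, here is a toy model of that idea.  It is only a sketch: the function names are invented, and a plain Python list stands in for the real call stack that the processor and compiler maintain.  The push at the top of each function plays the role of the prologue, and the pop at the bottom plays the role of the epilogue.

```python
def trace_page_layout(lines):
    """Lay out a page and record the call stack seen inside each format_line call."""
    call_stack = []   # innermost call is last
    snapshots = []    # one copy of the stack per format_line call

    def format_line(line):
        call_stack.append("format_line")    # "prologue": enter the function
        snapshots.append(list(call_stack))  # who is on the stack right now?
        formatted = line.strip()            # stand-in for real line formatting
        call_stack.pop()                    # "epilogue": return to the caller
        return formatted

    def lay_out_page(page_lines):
        call_stack.append("lay_out_page")
        page = [format_line(line) for line in page_lines]
        call_stack.pop()
        return page

    return lay_out_page(lines), snapshots
```

Every snapshot taken inside format_line shows lay_out_page sitting underneath it, which is exactly what “on the call stack” means.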

Trying to get a body of code to run in a memory space that’s smaller than the code itself involves something called code swapping.  This is generally very easy to do if the code that you’re swapping out doesn’t cross these high-level to low-level boundaries.  Our grammar checker is a good example.  It represents a distinct functional unit, so we can swap the grammar checker’s code out of memory if we no longer need that code around without having to worry about swapping it back in when we’re done executing the current chunk of code.

But, we can group code at a level of granularity that crosses high-level to low-level functional boundaries.  For example, the code that lays out a page of text can be in one module (or code segment), while the code that formats a single line of text can be in another module.  When you’re laying out a single line of text, you really don’t need the code that lays out the whole page in memory.  Conceptually, at least, you can swap out the page layout code while running the format line code.

There’s a problem with this idea: the page layout code calls the format line code, which means that the page layout code is still on the call stack.  When the format line code is finished, the protocol that allows the computer to know how to return execution back to the page layout code needs to know that the page layout code is no longer in memory.  This is such a difficult problem that Apple’s documentation claimed that it was simply not possible to swap out code that was still on the call stack.  Yet, this is exactly what we were able to do with Word 6.0.
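The shape of the problem can be modeled in a few lines.  This is an illustration only, not the mechanism Word 6.0 actually used: the segment names, the one-segment memory limit, and the oldest-first eviction are all assumptions made for the sketch.  What it shows is the extra step on the return path, where the caller’s code has to be brought back before execution can resume in it.

```python
def run_with_swapping(lines, capacity=1):
    """Model page layout calling format line when only `capacity` segments fit in memory."""
    resident = []   # names of code segments currently in memory, oldest first
    events = []     # log of loads and evictions, in order

    def ensure_resident(segment):
        if segment in resident:
            return
        if len(resident) >= capacity:
            events.append("evict " + resident.pop(0))
        resident.append(segment)
        events.append("load " + segment)

    def format_line(line):
        ensure_resident("format_line")   # may push the caller's code out of memory
        return line.strip()

    def lay_out_page(page_lines):
        ensure_resident("page_layout")
        page = []
        for line in page_lines:
            formatted = format_line(line)
            # Return path: the caller's segment may be gone, so it has to be
            # reloaded before execution can continue here.
            ensure_resident("page_layout")
            page.append(formatted)
        return page

    return lay_out_page(lines), events
```

For a one-line page with room for a single segment, the log reads: load page_layout, evict page_layout, load format_line, evict format_line, load page_layout.  The last load is the one a real system has to perform while the page layout code is still on the call stack.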

There is an unfortunate downside to being able to swap out code that’s on the call stack.  It leads to something called thrashing.  Consider our page layout/format line example.  Page layout works by calling format line for each line of text on the page.  Every time we cross the boundary between the page layout code and the format line code, we need to stop and load a chunk of code into memory, which will, in turn, require removing another chunk of code from memory.

Now, I’ve grossly oversimplified the whole process in order to explain what was going on.  The swapping algorithm is a bit smarter about deciding what parts of the program to swap out of memory in order to be able to swap in a piece of code that’s needed immediately.  In practice, then, it’s highly unlikely that page layout and format line would ever thrash by themselves.  Nonetheless, thrashing does occur when the available memory is small enough.  When the system thrashes like this, performance goes down the toilet.
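A small simulation makes the cliff visible.  It assumes a least-recently-used eviction policy, which is a common choice but not necessarily what Word’s swapper did, and the segment names and trace are invented.  It counts how many times a segment has to be loaded as page layout calls format line over and over.

```python
from collections import OrderedDict


def count_segment_loads(trace, capacity):
    """Count loads needed to run `trace` when only `capacity` segments fit in memory."""
    resident = OrderedDict()   # least recently used segment first
    loads = 0
    for segment in trace:
        if segment in resident:
            resident.move_to_end(segment)   # already in memory: just mark it as recent
            continue
        if len(resident) >= capacity:
            resident.popitem(last=False)    # evict the least recently used segment
        resident[segment] = True
        loads += 1
    return loads


def page_layout_trace(line_count):
    """Sequence of segments executed while laying out a page of `line_count` lines."""
    trace = []
    for _ in range(line_count):
        trace += ["page_layout", "format_line"]
    trace.append("page_layout")   # final return into the page layout code
    return trace
```

For a 50-line page, room for two segments means two loads in total.  Room for only one means 101 loads, one for every crossing of the boundary.  Same code, same page, and the only thing that changed was the amount of memory available.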

Learning the Meaning of “Mac-Like”

OK, so Mac Word 6.0 was big and slow relative to the memory that most computers had available at the time we shipped it, but that’s not the reason why Mac Word 6.0 was such a crappy product, or at least not directly.  Not long after Word 6.0 shipped, people could afford to add more memory to accommodate the added features of Mac Word 6.0.  Those people who found those features to be very useful, and you’ll run into a few of them even today, felt that the cost of the added memory was worth the work-savings that those features afforded them.  Moore’s Law, and the PowerPC, would have solved the memory problem in due time.

Moreover, while people complained about the performance, the biggest complaint we kept hearing about Mac Word 6.0 was that it wasn’t “Mac-like.”  So, we spent a lot of time drilling down into what people meant when they said it wasn’t “Mac-like.”  We did focus groups.  Some of us hung out in various Usenet newsgroups.  We talked to product reviewers.  We talked to friends who used the product.  It turns out that “Mac-like” meant Mac Word 5.0.

We spent so much time on, and put so much effort into, solving all the technical problems of Mac Word 6.0 that we failed to make the UI of Mac Word 6.0 behave like Mac Word 5.0.  As a result there were many differences, some little, some huge and even some that were simply gratuitous, between the way Mac Word 6.0 did things and the way Mac Word 5.0 did things.  The end result was a UI that could only be described as clunky relative to Mac Word 5.0’s elegance.  More importantly, Mac Word users had to unlearn all the ways they had come to do certain things, and relearn the Word 6.0 way of doing them.

My favorite example of this is the way you defined styles.  In Mac Word 5.0, style definition was a semi-modal task.  You defined or modified a style the same way you changed the font or paragraph properties in the document itself.  In Mac Word 6.0, the task was completely modal.  The entire array of menus and toolbar buttons that you could use in Mac Word 5.0 (and with which you were quite familiar as a user) was replaced by a single drop-down menu in the New/Modify style dialog box.  Even today, you can’t use the Formatting Palette to change the font or paragraph information in a style in Word 2001 or Word X, and this remains one of the things I want to fix in Word before I leave MacBU.

The other thing we figured out as a result of coming to understand what “Mac-like” meant was that we weren’t going to be able to deliver “Mac-like” products if Office remained a singular product from which both the Win and Mac versions were built.  The mere fact that “Mac-like” was an issue at all meant that there were some fundamental differences between the Win Word market and the Mac Word market.  If we were to understand both those markets, then our Mac products and Win products needed separate marketing and PGM organizations.  The lessons we learned from Mac Word 6.0 are some of the reasons that Mac BU exists today.

We still bang our heads against the wall from time to time.  Understanding users isn’t an exact science.  But, we do it far less often than we used to.  And it really does feel much better.  In a future post, I’ll describe in more detail how we go about trying to understand both our current users and potential new users.

As for my own role in Mac Word 6.0, I was responsible for the PowerPC port.  But that’s also a story for another post.

Lastly, as for how I felt about the demise of Pyramid, all during the Word 97 project, I kept six empty Mac Word 6.0 boxes stacked in a vertical triangle next to my desk.  I called it a slice out of a pyramid.  A few people got the point.  And the boxes went into recycling as soon as Mac BU was formed and we started work on Word 98.