Holy cow, I wrote a book!
After learning about the bad things that happened if you synchronized
your application's input queue with its debugger,
commenter kme wonders
how debugging worked in 16-bit Windows,
since 16-bit Windows didn't have asynchronous input?
In 16-bit Windows, all applications shared the same input queue,
which means you were permanently in the situation described
in the original article,
where the application and its debugger (and everything else)
shared an input queue and therefore would constantly deadlock.
The solution to UI deadlocks is to make sure the debugger doesn't
have any UI.
At the most basic level, the debugger communicated with the developer
through the serial port.
You connected a dumb terminal to the other end of the serial port.
Mine was a
Wyse 50 serial console terminal.
All your debugging happened on the terminal.
You could disassemble code,
inspect and modify registers and memory,
and even patch new code on the fly.
If you wanted to consult source code,
you needed to have a copy of it available somewhere else
(like on your other computer).
It was similar to using the cdb debugger,
where the only commands available were
Oh, and bp to set breakpoints.
Now, if you were clever, you could use a
terminal emulator program so you didn't need a dedicated
physical terminal to do your debugging.
You could connect the target computer to your development machine
and view the disassembly and the source code on the same screen.
But you weren't completely out of the woods,
because what did you use to debug your development machine if it crashed?
The dumb terminal, of course.¹
I did pretty much all my Windows 95 debugging this way.
If you didn't have two computers,
another solution was to use a debugger like CodeView.
CodeView avoided the UI deadlock problem by not using the GUI to
present its UI.
When you hit a breakpoint or otherwise halted execution
of your application, CodeView talked directly to the video driver
to save the first 4KB of video memory, then
switched into text mode to tell you what happened.
When you resumed execution, it restored the video memory,
then switched the video card back into
restored all the pixels it captured,
then resumed execution as if nothing had happened.
(If you were debugging a graphics problem,
you could hit F3 to switch temporarily to graphics mode,
so you could see what was on the screen.)
If you were really fancy,
you could spring for a monochrome adapter,
either the original IBM one or the Hercules version,
and tell CodeView to use that adapter for its debugging UI.
That way, when you broke into the debugger,
you could still see what was on the screen!
We had multiple monitors before it was cool.
¹ Some people were crazy and cross-connected their
target and development machines.
This allowed them to use their target machine to debug their
development machine and vice versa.
But if your development machine crashed
while it was debugging the target machine,
then you were screwed.
Anon is interested in
why the FAT driver is called FASTFAT.SYS.
"Was there an earlier slower FAT driver?
What could you possibly get so wrong with a FAT implementation that
it needed to be chucked out?"
The old FAT driver probably had a boring name like, um, FAT.SYS.
At some point, somebody decided to write a newer, faster one,
so they called it FASTFAT.
And the name stuck.
As for what you could possibly get so wrong with a FAT implementation
that it needed to be improved:
Remember that circumstances change over time.
A design that works well under one set of conditions may start
to buckle when placed under alternate conditions.
It's not that the old implementation was wrong;
it's just that conditions have changed,
and the new implementation is better for the new conditions.
For example, back in the old days,
there were three versions of FAT: FAT8, FAT12, and FAT16.
For such small disks, simple algorithms work just fine.
In fact, they're preferable because a simple algorithm
is easy to get right and is easier to debug.
It also typically takes up a lot less space,
and memory was at a premium in the old days.
An O(n) algorithm is not a
big deal if n never gets very large and the
constants are small.
Since FAT16 capped out at 65535 clusters per drive,
there was a built-in limit on how big n could get.
If a typical directory has only a few dozen files in it,
then a linear scan is just fine.
It's natural to choose algorithms
that map directly to the on-disk data structures.
(Remember, data structures determine algorithms.)
FAT directories are just unsorted arrays of file names,
so a simple directory searching function would just
read through the directory one entry at a time until it
found the matching file.
Finding a free cluster is just a memory scan looking for a 0
in the allocation table.
Memory management was simple: Don't try.
Let the disk cache do it.
These simple algorithms worked fine until FAT32 showed up
and bumped n sky high.
But fortunately, by the time that happened, computers were also
faster and had more memory available,
so you had more room to be ambitious.
The big gains in FASTFAT came from algorithmic changes.
For example, the on-disk data structures are transformed
into more efficient in-memory data structures and cached.
The first time you look in a directory, you need to do
a linear search to collect all the file names,
but if you cache them in a faster data structure
(say, a hash table),
subsequent accesses to the directory become much faster.
And since computers now have more memory available,
you can afford to keep a cache of directory entries around,
as opposed to the old days where memory was tighter and
large caches were a big no-no.
(I wonder if any non-Microsoft FAT drivers do this sort of
optimization, or whether they just do the obvious thing and
use the disk data structures as memory data structures.)
The original FAT driver
was very good at solving the problem it was given,
while staying within the limitations it was forced to operate under.
It's just that over time, the problem changed,
and the old solutions didn't hold up well any more.
I guess it's a matter of interpretation whether this means that
the old driver was "so wrong."
If your child outgrows his toddler bed,
does that mean the toddler bed was a horrible mistake?
some time ago that
there is a 64KB no-man's land
near the 2GB boundary
to accommodate a quirk of the Alpha AXP processor architecture.
But that's not the only reason why it's there.
The no-man's land near the
boundary is useful even on x86 processors
because it simplifies parameter validation at the boundary between
user mode and kernel mode by taking out a special case.
If the 64KB zone did not exist,
then somebody could pass a buffer that straddles the
and the kernel mode validation layer
would have to detect that unusual condition and reject the buffer.
By having a guaranteed invalid region,
the kernel mode buffer validation code can simply validate that
the starting address is below the 2GB boundary,
then walk through the buffer checking each page.
If somebody tries to straddle the boundary,
the validation code will hit the permanently-invalid region and
Yes, this sounds like a micro-optimization,
but I suspect this was not so much for optimization purposes
as it was to remove weird boundary conditions,
because weird boundary conditions are where the bugs tend to be.
(Obviously, the no-man's land moves if you set the /3GB switch.)
The default base address for a DLL is 0x10000000,
but the default base address for an EXE is 0x00400000.
Why that particular value for EXEs?
What's so special about
It has to do with the amount of address space
mapped by a single page directory entry on an x86
and a design decision made in 1987.
The only technical requirement for
the base address of an EXE is that it be
a multiple of 64KB.
But some choices for base address are better than others.
The goal in choosing a base address is to minimize
the likelihood that modules will have to be relocated.
This means not colliding with things already in the
address space (which will force you to relocate)
as well as not colliding with things
that may arrive in the address space later
(forcing them to relocate).
For an executable, the not colliding with things that may
arrive later part means avoiding the region of the
address space that tends to fill with DLLs.
Since the operating system itself puts DLLs at high addresses
and the default base address for non-operating system DLLs
this means that the base address for the executable should be
somewhere below 0x10000000,
and the lower you go, the more room you have before you start
colliding with DLLs.
But how low can you go?
The first part means that
you also want to avoid the things that
are already there.
Windows NT didn't have a lot of stuff at low addresses.
The only thing that was already there was a PAGE_NOACCESS
page mapped at zero in order to catch null pointer accesses.
Therefore, on Windows NT, you could base your executable at
0x00010000, and many applications did just that.
But on Windows 95,
there was a lot of stuff already there.
The Windows 95 virtual machine manager
permanently maps the first
64KB of physical
memory to the first 64KB of virtual memory
in order to avoid a CPU erratum.
had to work around a lot of CPU bugs
Furthermore, the entire first megabyte of virtual
address space is mapped to the logical address space
of the active virtual machine.
actually a little more than a megabyte.)
This mapping behavior is required by the x86 processor's
Windows 95, like its predecessor Windows 3.1,
runs Windows in a special
virtual machine (known as the System VM),
and for compatibility it still routes all
sorts of things through 16-bit code
just to make sure the decoy quacks the right way.
Therefore, even when the CPU is running a Windows application
(as opposed to an MS-DOS-based application),
it still keeps the virtual machine mapping active
so it doesn't have to do page remapping
expensive TLB flush that comes with it)
every time it needs to go to the MS-DOS compatibility layer.
Okay, so the first megabyte of address space is
already off the table.
What about the other three megabytes?
Now we come back to that little hint at the top of the article.
In order to make context switching fast,
the Windows 3.1 virtual machine manager "rounds up" the per-VM
context to 4MB.
It does this so that a memory context switch can be performed
by simply updating a single 32-bit value in the page directory.
You also have to mark
instance data pages, but that's just flipping a dozen or so bits.)
This rounding causes us to lose three megabytes of address space,
but given that there was four gigabytes of address space,
a loss of less than one tenth of one percent was deemed a fair
trade-off for the significant performance improvement.
(Especially since no applications at the time came anywhere
near beginning to scratch the surface of this limit.
Your entire computer had only 2MB of RAM in the first place!)
This memory map was carried forward into
with some tweaks to handle separate address spaces for 32-bit Windows applications.
Therefore, the lowest address an executable could be loaded on Windows 95
was at 4MB, which is 0x00400000.
The linker chooses a default base address for executables of 0x0400000
so that the resulting binary can load without relocation on both Windows NT
and Windows 95.
Nobody really cares much about targeting Windows 95 any more,
so in principle, the linker folks could choose a different default base address now.
But there's no real incentive for doing it aside from making diagrams look prettier,
especially since ASLR makes the whole issue moot anyway.
And besides, if they changed it, then people would be asking,
"How come some executables have a base address of 0x04000000 and some executables
have a base address of 0x00010000?"
TL;DR: To make context switching fast.
There are a few references to the
DS_RECURSE dialog style scattered throughout MSDN,
and they are all of the form
"Don't use it."
But if you look in your copy of
there is no sign of DS_RECURSE anywhere.
This obviously makes it trivial to avoid using it
because you couldn't use it even if you wanted it,
seeing as it doesn't exist.
"Do not push the red button on the control panel!"
— There is no red button on the control panel.
"Well, that makes it easy not to push it."
As with many of these types of stories,
the answer is rather mundane.
When nested dialogs were added to Windows 95,
the flag to indicate that a dialog is a control host was
The name was intended to indicate
that anybody who is walking a dialog looking for controls
should recurse into this window, since it has more controls inside.
The window manager folks later decided to change the name,
and they changed it to DS_CONTROL.
All documentation that was written before the renaming had to be revised
to change all occurrences of DS_RECURSE to
It looks like they didn't quite catch them all:
There are two straggling references in the
Windows Embedded documentation.
My guess is that the Windows Embedded team
took a snapshot of the main Windows documentation,
and they took their snapshot before the renaming was complete.
Unfortunately, I don't have any contacts in the Windows Embedded
so I don't know whom to contact to get them to remove the references
to flags that don't exist.
This is the end of
a week that sort of happened around me and I had to catch up with.
The Windows 3.1 virtual machine manager had a clever solution for
There was only one synchronization object in the entire kernel.
It was called "the critical section", with the definite article
because there was only one.
The nice thing about a system where the only available
synchronization object is a single critical section
is that deadlocks are impossible:
The thread with the critical section will always be able to make
progress because the only thing that could cause it to stop would be
blocking on a synchronization object.
But there is only one synchronization object (the critical section),
and it already owns that.
When you hit Ctrl+Alt+Del in
Windows 3.1, a bunch of crazy stuff happened.
All this work was in a separate driver, known as the
virtual reboot device.
By convention, all drivers in Windows 3.1 were called
the virtual something device
because their main job was to virtualize some hardware or other
That's where the funny name VxD came from.
It was short for virtual x device.
First, the virtual reboot device driver checked
which virtual machine
If you were using an MS-DOS program, then it told all the device
drivers to clean up whatever they were doing for that virtual machine,
and then it terminated the virtual machine.
This was the easy case.
Otherwise, the focus was on a Windows application.
Now things got messy.
When the 16-bit Windows kernel started up, it gave the
virtual reboot device the addresses of a few magic things.
One of those magic things was a special byte that was set to 1
every time the 16-bit Windows scheduler regained control.
When you hit Ctrl+Alt+Del,
the virtual reboot device set the byte to 0,
and it also registered a callback with the virtual machine manager
to say "Call me back once the critical section becomes available."
The callback didn't do anything aside from remember the fact that it was
called at all.
And then the code waited for ¾ seconds.
(Why ¾ seconds? I have no idea.)
After ¾ seconds, the virtual reboot device looked to see
what the state of the machine was.
If the "call me back once the critical section becomes available"
callback was never called,
then the problem is that a device driver is stuck in the critical section.
Maybe the device driver put an
Abort, Retry, Ignore message on the screen that the user
needs to respond to.
The user saw this message:
After the user presses a key,
focus was placed on the
virtual machine that holds the critical section
so the user can address the problem.
A user who is still stuck can hit
to restart the whole process,
and this time, execution will go into the
"If you were using an MS-DOS program" paragraph,
and the code will shut down the stuck virtual machine.
If the critical section was not the problem, then the
virtual reboot device checked if the 16-bit kernel scheduler
had set the byte to 1 in the meantime.
If so, then it means that no applications were hung,
and you got the message
The System menu was called the Control menu back then.)
Otherwise, the special byte was still 0,
which means that the 16-bit scheduler never got control,
which meant that a 16-bit Windows application was not releasing
control back to the kernel.
The virtual reboot device then waited
for the virtual machine to finish processing any pending
(This allowed any pending MS-DOS emulation or 16-bit MS-DOS device
drivers to finish up their work.)
If things did not return to this sane state
within 3¼ seconds, then you got this screen:
we are in the case where the system returned to
a state where there are no active virtual interrupts.
The kernel single-stepped the processor if necessary
until the instruction pointer
was no longer in the kernel,
or until it had single-stepped for 5000
instructions and the instruction pointer was not in the heap manager.
(The heap manager was allowed to run for more than 5000 instructions.)
At this point, you got the screen that Steve Ballmer wrote.
If you hit Enter,
then the 16-bit kernel terminated the application by doing
mov ax, 4c00h followed by int 21h,
which was the system call that applications used to exit normally.
This time, the kernel is making the exit call on behalf of the
Everything looks like the application simply decided to exit
mov ax, 4c00h
The stuck application exits,
the kernel regains control,
and hopefully, things return to normal.
I should point out that I didn't write any of this code.
"It was like that when I got here."
There were various configuration settings to tweak all of the
For example, you could say that
restarted the computer rather than terminating the current application.
Or you could
skip the check whether the 16-bit kernel scheduler had
set the byte to 1 so that you could use
Ctrl+Alt+Del to terminate
an application even if it wasn't hung.¹
There was also a setting to restart the computer upon receipt
of an NMI,
the intention being that the signal would be triggered
either by a dedicated add-on switch
poking a ball-point pen in just the right spot.
(This is safer than just pushing the reset button because
the restart would flush disk caches and shut down devices in
an orderly manner.)
This setting was intended for developers to assist in debugging their
if you went for this option, the program that got terminated
is whichever one happened to have control of the CPU at the time you
This was, in theory, random,
but in practice it often guessed right.
That's because the problem was usually that a program got wedged
into an infinite message loop,
so most of the CPU was being run in the stuck application anyway.
The CRITICAL_SECTION structure has gone through
a lot of changes since its introduction back oh so many decades ago.
The amazing thing is that as long as you stick to the documented API,
your code is completely unaffected.
Initially, the critical section object had an owner field
to keep track of which thread entered the critical section, if any.
It also had
a lock count to keep track of how many times the owner thread
entered the critical section, so that the critical section would
be released when the matching number of
LeaveCriticalSection calls was made.
And there was an auto-reset event used to manage contention.
We'll look more at that event later.
(It's actually more complicated than this, but the details
If you've ever looked at the innards of a critical section
(for entertainment purposes only),
you may have noticed that the lock count was off by one:
The lock count was the number of active calls to
EnterCriticalSection minus one.
The bias was needed because the original version of the
interlocked increment and decrement operations
returned only the sign of the result, not the revised value.
Biasing the result by 1 means that all three states could be detected:
Unlocked (negative), locked exactly once (zero), reentrant lock (positive).
(It's actually more complicated than this, but the details
If a thread tries to enter a critical section but can't
because the critical section is owned by another thread,
then it sits and waits on the contention event.
When the owning thread releases all its claims on the critical section,
it signals the event to say,
"Okay, the door is unlocked.
The next guy can come in."
The contention event is allocated only when contention occurs.
(This is what older versions of MSDN meant when they said that
the event is "allocated on demand.")
Which leads to a nasty problem:
What if contention occurs,
but the attempt to create the contention event fails?
the answer was "The kernel raises an out-of-memory exception."
Now you'd think that a clever program could catch this exception
and try to recover from it, say, by unwinding everything that led
up to the exception.
Unfortunately, the weakest link in the chain is the critical section
In the original version of the code,
the out-of-memory exception was raised while the critical section was
in an unstable state.
Even if you managed to catch the exception and unwind everything you could,
the critical section was itself irretrievably corrupted.
Another problem with the original design became apparent on multiprocessor
systems: If a critical section was typically held for a very brief time,
then by the time you called into kernel to wait on the contention event,
the critical section was already freed!
void SetGuid(REFGUID guid)
g_theGuid = guid;
void GetGuid(GUID *pguid)
*pguid = g_theGuid;
This imaginary code uses a critical section to protect accesses
to a GUID.
The actual protected region is just nine instructions long.
Setting up to wait on a kernel object is way,
way more than nine instructions.
If the second thread immediately waited on the critical section
it would find that by the time the kernel got around to entering
the wait state, the event would say,
"Dude, what took you so long? I was signaleded, like, a bazillion
Windows 2000 added the
which lets you avoid the problem where waiting for a critical section
costs more than the code the critical section was protecting.
If you initialize with a spin count, then when a thread tries to
enter the critical section and can't,
it goes into a loop trying to enter it over and over again,
in the hopes that it will be released.
"Are we there yet?
How about now?
How about now?
How about now?
How about now?
How about now?
How about now?
How about now?
How about now?
How about now?
How about now?
How about now?"
If the critical section is not released after the requested number
then the old slow wait code is executed.
Note that spinning on a critical section is helpful only on
and only in the case where you know that all the protected code
segments are very short in duration.
If the critical section is held for a long time,
then spinning is wasteful since the odds that the critical section
will become free during the
are very low,
and you wasted a bunch of CPU.
Another feature added in Windows 2000 is the ability to
preallocate the contention event.
This avoids the dreaded
"out of memory" exception in
but at a cost of a kernel event for every critical section,
whether actual contention occurs or not.
Windows XP solved the problem of the dreaded "out of memory"
exception by using a fallback algorithm that could be used
if the contention event could not be allocated.
The fallback algorithm is not as efficient, but at least
it avoids the "out of memory" situation.
(Which is a good thing, because nobody really expects
EnterCriticalSection to fail.)
This also means that requests for the contention event to be preallocated
are now ignored, since the reason for preallocating
(avoiding the "out of memory" exception) no longer exists.
(And while they were there, the kernel folks also fixed
InitializeCriticalSection so that
a failed initialization left the critical section object in
a stable state so you could safely try again later.)
In Windows Vista, the internals of the critical section object
were rejiggered once again,
this time to add convoy resistance.
The internal bookkeeping completely changed;
the lock count is no longer a 1-biased count of the number of
EnterCriticalSection calls which are pending.
As a special concession to backward compatibility with people
who violated the API contract and
looked directly at the internal data structures,
the new algorithm goes to some extra effort to ensure that
if a program breaks the rules and
looks at a specific offset inside the critical section
the value stored there is −1 if and only if the critical section
Often, people will remark that "your compatibility problems would go
away if you just open-sourced the operating system."
I think there is some confusion over what "go away" means.
If you release the source code to the operating system,
it makes it even easier for people to take undocumented
dependencies on it,
because they no longer have the barrier of "Well, I can't find any
documentation, so maybe it's not documented."
They can just read the source code and say,
"Oh, I see that if the critical section is unlocked,
the LockCount variable has the value −1."
Boom, instant undocumented dependency.
Compatibility is screwed.
(Unless what people are saying "your compatibility problems would
go away if you just open-sourced all applications,
so that these problems can be identified and fixed as soon as they
Why isn't it important that the fallback algorithm be highly efficient?
We pick up the story
with Windows 95.
As I noted, the
dialog was introduced in Windows 3.1,
and in Windows 95; it was already gone.
In Windows 95,
called up a dialog box that looked something like this:
(We learned about Systray
some time ago.)
Whereas Windows 3.1 responded to fatal errors
by crashing out to a black screen,
Windows 95 switched to showing severe errors in blue.
And I'm the one who wrote it. Or at least modified it last.
I was responsible for the code that displayed blue screen messages:
Asking the kernel-mode video driver to switch into text mode,
filling the screen with a blue background,
drawing white text,
waiting for the user to press a key,
restoring the screen to its original contents,
and reporting the user's response back to the component that asked to
display the message.¹
When a device driver crashed,
Windows 95 tried its best to limp along despite a catastrophic
failure in a kernel-mode component.
It wasn't a blue screen of death so much as a blue screen of lameness.
For those fortunate never to have seen one,
it looked like this:
Note the optimistic message
"It may be possible to continue normally."
Everybody forgets that after Windows 95 showed you a blue screen error,
it tried its best to ignore the error and keep running anyway.
I mean, sure your scanner driver crashed, so scanning doesn't work any more,
but the rest of the system seems to be okay.
(Imagine if you did that today.
"Press any key to ignore this kernel panic.")
Technically, what happened was that the virtual machine manager
abandoned the event currently in progress and returned to the
It's the kernel-mode equivalent to
swallowing exceptions in window
procedures and returning to the message loop.
If there was no event in progress, then the current application was
Sometimes the problem was global,
and abandoning the current event or
terminating the application did nothing to solve the problem;
all that happened was that the next event or application to run encountered
the same problem,
and you got another blue screen message a few milliseconds later.
After about a half dozen of these messages,
you most likely gave up hope and hit
Now, that's what the message looked like originally,
but that message had a problem:
Since the addresses at which device drivers were loaded into the kernel
were not predictable,
having the raw address didn't really tell you much.
If you were someone who was told,
"This senior executive got this crash message,
can you figure out what happened?",
all you had to work with was a bunch of
mostly useless numbers.
That someone might have been me.
To help with this problem,
I tweaked the message to include the name of the driver,
the section number,
and the offset within the section.
Now you had the name of the driver that crashed,
which might give you a clue of where the problem is,
even if you knew nothing else.
somebody with access to a MAP file for the driver
could now look up the address and identify which line
Not great, but better than nothing,
and before I made this change,
nothing is what you had.
So you could say that I wrote the Windows 95 blue screen
of death lameness to make my own life easier.
Later, someone (I forget whether it was me,
so let's say it was one of my colleagues)
added some more code to inspect the crashing address,
and if it was inside the kernel heap manager,
the message changed slightly:
In this case,
the sentence "It may be possible to continue normally" disappeared.
Because we knew that, odds are, it won't be.
You considered posting a correction,
but your correction was also wrong.
At least you realized your mistake.
¹ Since this code ran in the kernel,
it didn't have access to keyboard layout information.
It doesn't know that if you are using the Chinese-Bopomofo keyboard layout,
the way to type "OK" is to press
C, followed by L,
followed by 3.
Not that it would help, because there is no IME in the kernel anyway.
As much as possible,
the responses were mapped to language-independent
And all of those articles cited
my article titled
"Who wrote the text for the Ctrl+Alt+Del dialog in Windows 3.1?"
Somehow, everybody decided to ignore that I wrote
and replace it with what they wanted to hear:
"blue screen of death".¹
Note also that people are somehow blaming the person who wrote the
text of the error message for the fact that the message appears
in the first place.
It's like blaming
for AT&T's crappy service.
It must suck being the person whose job it is to write error messages.
was not the blue screen of death.
I mean, it had a Cancel option, for goodness's sake.
What message of death comes with a Cancel option?
Grim Reaper: I am here to claim your soul.
You: No thanks.
Grim Reaper: Oh, well, in that case, sorry to have bothered you.
blue screen error messages were not new to Windows 3.1.
They existed in Windows 3.0, too.
What was new in Windows 3.1 was a special handler for
which tried to identify the program that was
not responding and give you the opportunity to terminate it.
Windows itself was fine; it was just the program that was in trouble.
Recall that Windows in Enhanced mode
ran three operating systems simultaneously,
A copy of Standard mode Windows ran inside one of the virtual machines,
and all your MS-DOS applications ran in the other virtual machines.
These blue screen messages came from the virtual machine manager.
If you had a single-floppy system, the two logical drives A: and B:
were shared by the single physical floppy drive.
When a program switched from accessing drive A: to drive B:,
or vice versa,
Windows prompted you to insert the disk for the new drive:
Another job of the virtual machine manager is to
arbitrate access to
As long as two virtual machines didn't try to access the same resource
the arbitration could be resolved invisibly.
But if two virtual machines tried to access, say, the serial port at the
Windows alerted you to the conflict and asked you which virtual machines
should be granted access and which should be blocked.
It looked like this:
Windows 3.1 didn't have a blue screen of death.
If an MS-DOS application crashed,
you got a blue screen message saying that the application crashed,
but Windows kept running.
If it was a device driver that crashed, then Windows 3.1
shut down the virtual machine that the device driver was running in.
But if the device driver crashed the Windows virtual machine,
then the entire virtual machine manager
shut itself down,
sometimes (but not always) printed a brief text message, and handed
control back to MS-DOS.
So you might say that it was a black screen of death.
The window of opportunity for seeing the
dialog was quite small:
You basically had to be running Windows 3.1 or
Next time, we'll see who actually wrote Windows 95
blue screen of death.
Spoiler alert: It was me.
wrote "Microsoft has revealed"
and "Redmond's Old new thing blog",
once again turning me into an official company spokesperson.
(They also chose not to identify me by name,
which may be a good thing.)
identified me as a Microsoft executive.
I'm still waiting for that promotion letter.
(And, more important,
the promotion pay raise.)
First honorable mention goes to
for illustrating the article with the
Windows 95 dialog that Steve Ballmer
I mean, I had a copy of the screen in my article,
and they decided to show some other screen instead.
I gues nobody bothered to verify that the dialogs matched.
Second honorable mention goes jointly to
for illustrating the article with the
Windows NT blue screen of death.
Not only was it the wrong dialog,
it was the wrong operating system.
Nice one, guys.
And special mention goes to
who entirely fabricated a scenario and posited it as real:
"What longtime Windows user can forget the panic that set in
the first time their entire screen went blue
for no explicable reason and was informed that
'This Windows application has stopped responding to the system.'"
Yeah, that never happened.
dialog did not appear spontaneously;
you had to press
to get it.
The answer to their question "Who remembers this?" is "Nobody."
BGR also titled their article
"It turns out Steve Ballmer was directly responsible
for developing Windows most hated feature."
He didn't develop the feature.
He just wrote the text.
I also wonder why giving the user the opportunity to terminate a runaway
program is the most-hated feature of Windows.
Sorry for giving you some control over your computer.
I guess they would prefer things the way Windows 3.0 did it:
A runaway program forces you to reboot.
One of the
differences between standard-mode Windows and
was what happened when you hit Ctrl+Alt+Del.
Since 16-bit Windows applications are co-operatively multi-tasked,
it is easy to determine whether the system is responding,
and if not, it is also easy to
identify the application which is responsible.
In that case, Windows gave
you options to close the non-responsive application, restart the computer,
During this time period,
Steve Ballmer was head of the Systems Division,
and he paid a visit to the Windows team to see what they were up to,
as is the wont of many executives.¹
When they showed him the
he nodded thoughtfully and added,
"This is nice, but I don't like the text of the message.
It doesn't sound right to me."
If you think you can do a better job, then go for it."
some other executive,
Steve took up the challenge,
and a few days later,
he emailed what he thought the
Ctrl+Alt+Del screen should say.
The text he came up with was actually quite good,
and it went into the product pretty much word for word.
Note to journalists: This is the
not the blue screen of death.
Thank you for paying attention.
¹ It occurred to me only as I wrote up this entry that people
took the phrase Right on top of my notepad from the
earlier story literally:
There was a chair, the chair had a notepad on its seat,
Bill sat in the chair (on top of the notepad).
That interpretation never occurred to me.
From the description in the previous paragraph, it was apparent to me
notepad was on a desk, and Bill's choice of seat blocked access to the
notepad. (I.e., the manager would have to reach around Bill to
get the notepad.)
The person telling the story
is not a native English speaker, so there may
have been a preposition translation issue.