A Hacker News member
recalls the time
I went over after hours to help out the Money team debug
a nasty kernel issue.
They were running into mysterious crashes during their stress
testing and asked for my help in debugging it.
I helped out other teams quite a bit,
such as writing a new version of Dr. Watson for the Windows 98 team
and writing a new version of the MSConfig tool based on a sketch
on a bar napkin.
And for a time,
I followed the official policy for moonlighting
to make sure everybody understood that I was doing work
outside the boundaries of my official job duties.
When the Money folks asked me for help,
I told them that before I could help them,
they would have to help me
fill out some paperwork.
The Money folks were not sure how to answer the question about compensation,
since they didn't have any formal budget or procedures for hiring an outside consultant,
much less any procedures for hiring one
from inside the company.
I told them,
"One slice of pizza."
Nobody from the Personnel department seemed to notice
the odd circumstances of this moonlighting request;
they simply rubber-stamped it and put it in my file.
The crash, it turns out, was in Windows itself.
There was a bug in the
special compiler the Languages team produced to help
build certain components of Windows 95
which resulted in an incorrect address computation
under a particularly convoluted boundary condition.
The Money folks had merely stumbled across this bug as part
of their regular testing.
I notified the appropriate people,
and the Windows team applied a workaround in their code to tickle
the compiler into generating the correct code.
As I recall, the pizza was just fine.
It was just your average delivery pizza,
nothing gourmet or anything.
Not that it had to be, because
I wasn't there
for the pizza.
A commenter "asks"
why the Shut Down menu was removed from Task Manager.
I put the word "asks" in quotation marks, because it's really
a complaint disguised as a question.
As in "Why do you guys suck?"
The first thing to understand is that classic Task Manager has been in
a state of sustained engineering since Windows 2000.
In other words,
the component is there, but there is no serious interest in improving it.
(For example, it wasn't updated to call
… on its pages.)
It's not like there's a Task Manager Team of five people
permanently dedicated to making Task Manager as awesome as possible
for every release of Windows.
Rather, the responsibility for maintaining Task Manager is sort
of tacked onto somebody whose primary responsibilities are for
other parts of the system.
There are a lot of Windows components in this state of
"internal sustained engineering."
The infamous "Install font" dialog, for example.
The responsibility for maintaining these legacy components is
spread out among the product team so that on average,
teams are responsible both for cool, exciting things and
some not-so-cool, legacy things.
(On the other hand, according to xpclient,
an app must be serving its users really well if it hasn't changed much,
so I guess that the Install font dialog is the best dialog box in
all of Windows at serving its users, seeing as it hasn't changed since 1995.)
The engineering budget for these components in internal sustained
engineering is kept to a minimum, both because there is no intention
of adding new features,
and also because the components are so old that there is unlikely
to be any significant work necessary in the future.
Every so often, some work becomes necessary, and given that the
engineering interest and budget are both very low,
the simplest way out when faced with a complicated problem in
a rarely-used feature is simply to remove the rarely-used feature.
And that's what happened to the Shut Down menu.
(Note that it's two words "Shut down" since it is being used
as a verb, not a noun.)
Given the changes to power management in Windows Vista,
the algorithm used by Task Manager was no longer accurate.
And instead of keeping Task Manager updated with every change,
the Shutdown user interface design team agreed to give the
Task Manager engineering team a break and say,
"The Shut Down menu on Task Manager is rarely used,
so we'll let you guys off the hook on this one,
so you don't keep getting weekly requests from us to change
the way Shut Down works."
I remember, back in the days of Windows XP,
seeing the giant spreadsheet used by the person responsible
for overall design of the Shutdown user interface.
It tracked the gazillion settings
and system configurations which all affect how shutting down
is presented to the user.
Removing the column for Task Manager from the spreadsheet
probably was met with a huge sigh of relief, not just from
the Task Manager engineering team,
but also from the person responsible for the spreadsheet.
Engineering is about trade-offs.
If you decide to spend more effort making Task Manager awesome,
you lose the ability to expend that effort on something else.
(And given that you are expending effort in a code base that
is relatively old and not fresh in the minds of the people
who would be making those changes, you also increase the likelihood
that you're going to
introduce a bug
along the way.)
While it may no longer be true that
everything at Microsoft is built using various flavors of
Visual C++ 5.0, 6.0, and 7.0,
there is still a kernel of truth in it:
A lot of customers are still using Visual C++ 6.0.
That's why the unofficial slogan for Visual C++ 2010
was "10 is the new 6."
Everybody on the team got a T-shirt with the slogan
(because you don't have a product
until you have a T-shirt).
During the development of Windows 95,
the user interface team discovered that a component
provided by another team didn't work well
under multi-threaded conditions.
It was documented that the
Initialize function had to be the first call
made by a thread into the component.
The user interface team discovered that if one thread called
Initialize and then used the component,
then everything worked great.
But if a second thread called Initialize,
the component crashed whenever the second thread tried to use it.
The user interface team reported this bug back to the team
that provided the component,
and some time later, an updated version of the component was delivered.
Technically, the bug was fixed.
When the second thread called Initialize,
the function now failed with an error.
The user interface team went back to the team that provided the component:
"It's nice that your component detects that it is being used
by a multi-threaded client and fails the second thread's
attempt to initialize it.
But given that design,
how can a multi-threaded client use your component?"
The other team's reply was,
"It doesn't matter.
Nobody writes multi-threaded GUI programs."
The user interface team had to politely reply,
"Um, we are.
The next version of Windows will be built on a multi-threaded shell."
The other team said,
"Oh, um, we weren't really expecting that.
Hang on, we'll get back to you."
The idea that somebody might write a multi-threaded program
that used their component caught them by surprise,
and they had to come up with a new design of their component
that supported multiple threads in a clean way.
It was a lot of work, but they came through,
and Windows 95 could continue with its multi-threaded shell.
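To make the story concrete, here is a small Python model of both versions of the component. Everything in it is hypothetical: the class names, the method names, and especially the redesign, since the story doesn't say how the other team actually made their component multi-threaded. FirstThreadOnly behaves like the "fixed" component (the first thread to initialize owns it, and every other thread is turned away); PerThreadState gives each thread its own private state, which is one common way to let any number of threads use a component.

```python
import threading


class FirstThreadOnly:
    """Hypothetical model of the 'fixed' component: the first thread to call
    initialize() owns it, and every other thread's initialize() fails."""

    def __init__(self):
        self._owner = None
        self._lock = threading.Lock()

    def initialize(self):
        me = threading.get_ident()
        with self._lock:
            if self._owner is None:
                self._owner = me
            # Detects a multi-threaded client and fails instead of crashing.
            return self._owner == me


class PerThreadState:
    """Hypothetical redesign: initialize() sets up state private to the
    calling thread, so threads never trample each other's state."""

    def __init__(self):
        self._tls = threading.local()  # one independent namespace per thread

    def initialize(self):
        self._tls.calls = 0
        return True

    def use(self):
        if not hasattr(self._tls, "calls"):
            raise RuntimeError("initialize() must be this thread's first call")
        self._tls.calls += 1
        return self._tls.calls


def call_on_new_thread(fn):
    """Run fn on a fresh thread and return its result."""
    result = []
    t = threading.Thread(target=lambda: result.append(fn()))
    t.start()
    t.join()
    return result[0]


old = FirstThreadOnly()
print(old.initialize())                    # True: first thread wins
print(call_on_new_thread(old.initialize))  # False: second thread is rejected

new = PerThreadState()
new.initialize()
new.use()
print(call_on_new_thread(lambda: new.initialize() and new.use()))  # 1
print(new.use())                           # 2: main thread's count is separate
```

Per-thread state is only one option; a component could just as well keep shared state and protect it with a lock.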
Most people know that
Windows 95 was code-named Chicago.
The subprojects of Windows 95 also had their code names,
in part because
code names are cool,
and in part because these projects were already under way
by the time somebody decided to combine them into one giant project.
Even when they were separate projects, the first three teams worked
closely together, so the names followed a pattern of ferocious cats.
My guess is that
when the user interface team chose their code name, they heard that
the other guys were naming themselves after cats,
so they picked a cat, too.
I don't know whether they did that on purpose or by accident,
but the cat they picked was not ferocious at all.
Instead, they picked
a cartoon cat.
When the feature to
show a special message after Windows had shut down
was first added,
the shutdown bitmap was a screen shot of
Ren and Stimpy saying good-bye.
Fortunately, we remembered to replace the bitmap before shipping.
If you were paying attention,
you would have noticed that
code names get reused a lot,
not because of any connection between the projects
but purely by coincidence.
A commenter asked
when the WM_COPYDATA message was introduced.
The WM_COPYDATA message was introduced by Win32.
It did not exist in 16-bit Windows.
But it was there all along.
The WM_COPYDATA message was carefully
designed so that it worked in 16-bit Windows automatically.
In other words, you
retained your source code compatibility
between 16-bit and 32-bit Windows
without having to do a single thing.
Phew, one fewer
breaking change between 16-bit and 32-bit Windows.
As Neil noted,
there's nothing stopping you from sending message 0x004A
in 16-bit Windows
with a window handle in the wParam and a pointer to a
COPYDATASTRUCT in the lParam.
Since all 16-bit applications ran in the same address space,
the null marshaller successfully marshals the data between the
two applications.
In a sense, support for the WM_COPYDATA
message was ported downlevel even before the message existed!
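Here is a toy Python model of why simply passing the pointer worked. Desktop16, receiver_wndproc, and the window handle values are invented for illustration; CopyDataStruct merely mirrors the three fields of the Win32 COPYDATASTRUCT (dwData, cbData, lpData), and 0x004A is the message number mentioned above. Because every application lives in the same address space, "marshalling" the lParam amounts to handing over the very same object.

```python
from dataclasses import dataclass

WM_COPYDATA = 0x004A  # the message number discussed above


@dataclass
class CopyDataStruct:
    # Mirrors the three fields of the Win32 COPYDATASTRUCT.
    dwData: int    # application-defined value
    cbData: int    # size of the data, in bytes
    lpData: bytes  # the data itself (a pointer, in the real structure)


class Desktop16:
    """Hypothetical model of 16-bit Windows: every application's window
    procedure runs in one shared address space."""

    def __init__(self):
        self.wndprocs = {}

    def register(self, hwnd, wndproc):
        self.wndprocs[hwnd] = wndproc

    def send_message(self, hwnd, msg, wparam, lparam):
        # The "null marshaller": lParam is passed along untouched,
        # which is fine because it means the same thing to every application.
        return self.wndprocs[hwnd](hwnd, msg, wparam, lparam)


received = []


def receiver_wndproc(hwnd, msg, wparam, lparam):
    if msg == WM_COPYDATA:
        received.append((wparam, lparam))
        return 1  # TRUE: message processed
    return 0


desktop = Desktop16()
desktop.register(0x20, receiver_wndproc)  # hypothetical receiving window
payload = CopyDataStruct(dwData=1, cbData=5, lpData=b"hello")
desktop.send_message(0x20, WM_COPYDATA, 0x10, payload)  # wParam: sender's hwnd
print(received[0][0] == 0x10, received[0][1] is payload)  # True True
```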
Once upon a time, there was a team developing two versions of their product,
the first a short-term project to ship soon, and the other a more
ambitious project to ship later.
They chose to assign the projects the code names
Ren and Stimpy,
in honor of the lead characters from the
eponymous cartoon series.
Over time, the two projects merged,
and the code name that stuck was Ren.
When the project came up in a meeting with Bill Gates,
it was mentioned verbally but never spelled out,
and since Bill wasn't closely tuned into popular culture,
he mapped the sound
/rɛn/ not to the hairless Mexican dog
but to Christopher Wren,
the architect of St. Paul's Cathedral.
In follow-up email, he consistently referred to the
project by the name "Wren".
The Ren team liked the fact that their name gave the boss
the impression that the project
was going to be a masterpiece of architectural beauty,
so they never told him he got the name wrong.
For what it's worth (it has nothing to do with the story),
the project in question is the one that
eventually became known to the world as Outlook.
When you call GetSaveFileName,
the common file save dialog will ask the user to choose a file name,
and just before it returns it does a little create/delete dance
where it creates the file the user entered, and then deletes it.
What's up with that?
This is a leftover from the ancient days of 16-bit Windows 3.1,
back when file systems were real file systems and didn't have this
namby-pamby "long file name" or "security" nonsense.
(Insert sound effect of muscle flexing and guttural grunting.)
Back in those days, the file system interface was MS-DOS,
and MS-DOS didn't have a way to query security attributes
because, well, the file systems of the day didn't have security
attributes to query in the first place.
But network servers did.
If you mapped a network drive from a server running one of those
fancy new file systems,
then you were in a situation where your computer didn't
know anything about file system security,
but the server did.
The only way to find out whether you had permission to create
a file in a directory was to try it and see whether it worked
or whether it failed with an access-denied error
(or, as it was called back in the MS-DOS days, "5").
Another reason why a server might reject a file name was
that it contained a character that, while legal in Windows,
was not legal on the server.
At the time, the most common reason for this was that you used
a so-called "extended character" (in other words,
a character outside the ASCII range like an accented lowercase e)
which was part of your local code page but not of the server's.
Yet another possibility was that the file name you chose would exceed
the server's path name limit.
For example, suppose the server is running Windows for Workgroups
(which has a 64-character maximum path name limit),
and it shared out a directory with a long path name as \\server\share.
If you mapped M: to \\server\share,
then the maximum path name on M: was only about 30 characters,
because the server-side path to the shared directory already
used up half of your 64-character limit.
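The arithmetic behind "about 30 characters" is just subtraction. The 32-character server directory below is a made-up example (the actual path isn't given here), and the hypothetical client_path_budget helper ignores path separators and the terminating null, so treat the result as a rough estimate.

```python
WFW_MAX_PATH = 64  # path limit cited above for Windows for Workgroups


def client_path_budget(server_dir: str, limit: int = WFW_MAX_PATH) -> int:
    """Rough count of characters left for a path typed on the mapped drive,
    after the server prepends the directory the share points at.
    Ignores the separator between the two parts and any terminating null."""
    return limit - len(server_dir)


# Hypothetical shared directory on the server; it happens to be 32 characters.
server_dir = r"C:\SHARED\PROJECTS\ACCOUNTING\Q3"
print(len(server_dir))                 # 32
print(client_path_budget(server_dir))  # 32, i.e. "about 30 characters"
```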
The only way to tell whether the file could be created, then,
was to try to create it and see what happens.
After creating the test file (to see if it could),
the common file save dialog immediately deleted it
in order to cover its tracks.
(This could lead to some weird behavior if users picked a directory
where they had permission to create files but no permission to delete
files that they created!)
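A rough Python equivalent of the create/delete dance might look like this. It is a sketch of the idea, not the common file dialog's code: probe_create is a hypothetical helper, and it opens with O_EXCL so it never touches a file that already exists (handling an existing file is left out of this sketch). It deliberately ignores a failed delete, which reproduces the "allowed to create but not to delete" leftover file described above.

```python
import os
import tempfile


def probe_create(path):
    """Create-then-delete probe: try to create the file, and if that works,
    delete it again. Returns True if the creation succeeded."""
    try:
        # O_EXCL makes creation fail if the name already exists, so the
        # probe can never delete somebody else's file.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except OSError:
        # Access denied, invalid name, path too long, already exists, ...
        return False
    os.close(fd)
    try:
        os.remove(path)  # cover our tracks
    except OSError:
        # Allowed to create but not to delete: the test file stays behind.
        pass
    return True


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "report.txt")
    print(probe_create(target), os.path.exists(target))  # True False
```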
This "test to see if I can create the file by creating it"
behavior has been carried forward ever since,
but you can suppress it by setting the corresponding flag when you call GetSaveFileName.
There is an oft-abused program named rundll32.exe.
Why does its name end in 32?
Why not just call it rundll.exe?
(I will for the moment ignore the rude behavior of calling people stupid
under the guise of asking a question.)
Because there needed to be a way to distinguish the 16-bit version
from the 32-bit version.
Windows 95 had both rundll.exe
(the 16-bit version)
and rundll32.exe (the 32-bit version).
Of course, with the gradual death of support for 16-bit Windows,
the 16-bit rundll.exe is now just a footnote in history,
leaving just the 32-bit version.
But why did the two have to have different names?
Why not just use the same name (rundll.exe) for both,
putting the 16-bit version in the 16-bit system directory
and the 32-bit version in the
32-bit system directory?
Because Windows 95 didn't have separate
16-bit and 32-bit system directories.
There was just one system directory called SYSTEM
and everything hung out there,
both 16-bit and 32-bit, like one big happy family.
Well, maybe not a happy family.
At any rate,
when 64-bit Windows was introduced,
the plan was to avoid the crazy mishmash
and instead separate the 32-bit files into one directory
and the 64-bit files into a different directory.
That way, no files needed to be renamed,
and your batch file that ran
rundll32.exe with some goofy command line
still worked, even on 64-bit Windows.
During the discussion of
how real-mode Windows handled return addresses into discarded segments,
a commenter asked,
"What happens when
somebody does a longjmp into a discardable segment?"
I'm going to assume that everybody knows how longjmp
traditionally works so I can go straight to the analysis.
The reason longjmp is tricky is that it has to
jump to a return address that isn't on the stack.
(The return address was captured in the jmp_buf.)
If that segment got relocated or discarded, then the jump target
is no longer valid.
It would have gotten patched to a return thunk if it were on the
stack, but since it's in a jmp_buf,
the stack walker didn't see it, and the result is a return address
that is no longer valid.
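A tiny Python model shows the failure. The segment number, base addresses, and the setjmp_naive helper are all invented for illustration, and real-mode addresses were segment:offset pairs rather than the flat numbers used here; the point is only that a raw address captured in a jmp_buf goes stale when its segment moves, because nobody patches it.

```python
# Hypothetical toy model: segment number -> current base address.
segment_base = {7: 0x1000}


def setjmp_naive(seg, offset):
    """Capture the return point as a raw address, like a naive jmp_buf."""
    return segment_base[seg] + offset


saved = setjmp_naive(7, 0x42)

# Segment 7 is discarded under memory pressure and later reloaded elsewhere.
# Return addresses on the stack would be patched; 'saved' is not on the stack.
del segment_base[7]
segment_base[7] = 0x2000

print(hex(saved))                   # 0x1042: where longjmp would jump
print(hex(segment_base[7] + 0x42))  # 0x2042: where the code actually is
```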
(There is a similar problem if the data segment or stack segment
gets relocated.
Exercise: Why don't you have to worry about the data segment
or stack segment being discarded?)
Recall that when a segment got discarded, all return addresses
which pointed into that segment were replaced with
return thunks.
I didn't mention it explicitly in the original discussion,
but there are three properties of return thunks
that will help us here: idempotence, abandonment, and reuse.
The first property
(idempotence of the return thunk) is no accident.
It's required behavior in order for return thunks to work.
After all, if the segment was already loaded (say by a
direct call or some other return thunk),
then the return thunk needs to say,
"Well, I guess that was easy," and simply skip
the "load the target segment" step.
(It still needs to do the rest of the work.)
The second property (abandonment) is also no accident.
An application might decide to exit without returning
all the way to WinMain
(the equivalent of calling ExitProcess
instead of returning from WinMain).
This would abandon all the stack frames between
the exit point and the WinMain.
The third property (reuse) is a happy accident.
(Well, it was probably designed in for the purpose
we're about to put it to right here.)
Okay, now let's look at the jump buffer again.
If you've been following along so far,
you may have guessed the solution:
Pre-patch the return address as if the segment had already been
discarded.
If it turns out that the segment was discarded,
then the return thunk will restore it.
If the segment is present (either because it was
never discarded, or because it was discarded and
reloaded, possibly at a new address),
the return thunk will figure out where the code
is and jump to it.
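Continuing the toy model, the pre-patching trick might look like the sketch below. SegmentTable, ReturnThunk, and this setjmp are hypothetical stand-ins, not the kernel's code: the thunk records a segment number and offset instead of a raw address, and loading a segment that is already present does nothing, which is the idempotence property at work. All three cases from the paragraph above (never discarded, discarded and reloaded elsewhere, still discarded) come out right.

```python
class SegmentTable:
    """Hypothetical toy segment manager: segment number -> current base
    address, or None while the segment is discarded."""

    def __init__(self):
        self.base = {}
        self.next_base = 0x1000

    def load(self, seg):
        # Idempotent: a present segment is left alone; a discarded one
        # is brought back in, possibly at a new address.
        if self.base.get(seg) is None:
            self.base[seg] = self.next_base
            self.next_base += 0x1000
        return self.base[seg]

    def discard(self, seg):
        self.base[seg] = None


class ReturnThunk:
    """Stand-in for a return thunk: records the destination symbolically
    (segment number plus offset) rather than as a raw address."""

    def __init__(self, seg, offset):
        self.seg = seg
        self.offset = offset

    def invoke(self, segs):
        # Make sure the segment is present, then compute where it is now.
        return segs.load(self.seg) + self.offset


def setjmp(seg, offset):
    # Pre-patch: record a thunk as if the segment had already been discarded.
    return ReturnThunk(seg, offset)


segs = SegmentTable()
segs.load(7)                  # code segment 7 is loaded at 0x1000
buf = setjmp(7, 0x42)

print(hex(buf.invoke(segs)))  # 0x1042: never discarded
segs.discard(7)
segs.load(7)                  # reloaded by someone else, at 0x2000
print(hex(buf.invoke(segs)))  # 0x2042: the thunk follows the move
segs.discard(7)
print(hex(buf.invoke(segs)))  # 0x3042: the thunk reloads it itself
```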
Actually, since the state is being recorded in a
jmp_buf, the tight space constraints
of stack patching do not apply here.
If it turns out you need 20 bytes of memory to
record this information, then go ahead and
make your jmp_buf 20 bytes.
You don't have to try to make it all fit
inside an existing stack frame.
The jmp_buf therefore
doesn't have to try to play the crazy
air-squeezing games that stack patching did.
It can record the return thunk,
the handles to the data and stack segments,
and the return IP without any encoding at all.
And in fact, the longjmp
function doesn't need to invoke the return thunk.
It can just extract the segment number
after the initial INT 3Fh and
pass that directly to the segment loader.
(There is a little hitch if the address
being returned to is fixed; in that case,
there is no return thunk.
But that just makes things easier:
The lack of a return thunk means that the
return address cannot be relocated,
so there is no patching needed at all!)
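A final sketch has longjmp skip the thunk and go straight to the loader. The JmpBuf fields follow the list above (return thunk, data and stack segment handles, return IP). The thunk's byte layout, INT 3Fh (machine code CD 3F) followed by a one-byte segment number, is an assumption made for illustration, as are the make_thunk and longjmp_target helpers and the rest of the toy model.

```python
from dataclasses import dataclass
from typing import Optional

INT_3F = bytes([0xCD, 0x3F])  # x86 machine code for INT 3Fh


class SegmentTable:
    """Hypothetical toy segment loader: segment number -> base, or None."""

    def __init__(self):
        self.base = {}
        self.next_base = 0x1000

    def load(self, seg):
        if self.base.get(seg) is None:
            self.base[seg] = self.next_base
            self.next_base += 0x1000
        return self.base[seg]

    def discard(self, seg):
        self.base[seg] = None


@dataclass
class JmpBuf:
    # No stack-frame space limits here, so every field is stored unencoded.
    thunk: Optional[bytes]  # return thunk bytes, or None if the code is fixed
    ip: int                 # return IP; for fixed code, the full address
    ds: int                 # data segment handle (recorded, unused in this sketch)
    ss: int                 # stack segment handle (recorded, unused in this sketch)


def make_thunk(seg):
    # Assumed layout for illustration: INT 3Fh, then a one-byte segment number.
    return INT_3F + bytes([seg])


def longjmp_target(buf, segs):
    if buf.thunk is None:
        # Fixed code cannot move, so there is nothing to patch.
        return buf.ip
    # Don't execute the thunk: read the segment number that follows
    # INT 3Fh and hand it straight to the segment loader.
    if buf.thunk[:2] != INT_3F:
        raise ValueError("not a return thunk")
    return segs.load(buf.thunk[2]) + buf.ip


segs = SegmentTable()
segs.load(7)                                   # segment 7 at 0x1000
buf = JmpBuf(thunk=make_thunk(7), ip=0x42, ds=0x30, ss=0x38)
segs.discard(7)
print(hex(longjmp_target(buf, segs)))          # 0x2042: reloaded at 0x2000
fixed = JmpBuf(thunk=None, ip=0x9000, ds=0x30, ss=0x38)
print(hex(longjmp_target(fixed, segs)))        # 0x9000
```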
This magic with return thunks and segment
reloading is internal to the operating system,
so the core setjmp and longjmp functionality
was provided by the kernel rather than the
C runtime library in a pair of functions
called Catch and Throw.
The C runtime's setjmp
and longjmp functions
merely forwarded to the kernel versions.