Post suggestions for future topics here instead of posting off-topic
comments. Note that the suggestion box is emptied and read periodically
so don't be surprised if your suggestion vanishes. (Note also that I am
under no obligation to accept any suggestion.)
Topics I am more inclined to cover:
Topics I am not inclined to cover:
You can also
send feedback on Microsoft products directly to Microsoft.
All the feedback gets read,
even the death threats.
Suggestions should be between two and four sentences in length.
As you can see, there are hundreds of them already,
so you have three seconds to get your point across.
Please also search the blog first
because your suggestion may have already been covered.
And remember, questions aren't suggestions.
Note the enormous topic backlog.
Consequently, the suggestion box has been closed temporarily and will
reopen once the existing backlog has cleared,
which I estimate will happen sometime in early 2010.
If your suggestion is that important, I'm sure you'll remember
it when the suggestion box reopens.
Exiting is one of the scariest moments in the
lifetime of a process.
(Sort of how landing is one of the scariest moments of air travel.)
Many of the details of how processes exit are left
unspecified in Win32, so different Win32 implementations can
follow different mechanisms.
Win32s, Windows 95, and
Windows NT all shut down processes differently.
(I wouldn't be surprised if Windows CE uses yet another mechanism.)
Therefore, bear in mind that what I write in this mini-series
is implementation detail and can change at any time without warning.
I'm writing about it because these details can highlight
bugs lurking in your code.
In particular, I'm going to discuss the way processes exit
on Windows XP.
I should say up front that I do not agree with many steps in the
way processes exit on Windows XP.
The purpose of this mini-series is not to justify the way processes
exit but merely to fill you in on some of the behind-the-scenes
activities so you are better-armed when you have to investigate
a mysterious crash or hang during exit.
(Note that I just refer to it as the way processes exit
on Windows XP rather than saying that it is how process
exit is designed.
As one of my colleagues put it,
"Using the word design to describe this is like using
the term swimming pool to refer to a puddle in your garden.")
When your program calls ExitProcess a whole lot of
machinery springs into action.
First, all the threads in the process (except the one calling
ExitProcess) are forcibly terminated.
This dates back to the old-fashioned theory on how processes
should exit.
Under the old-fashioned theory,
when your process decides that it's time to exit,
it should already have cleaned up all its threads.
The termination of threads, therefore, is just a safety
net to catch the stuff you may have missed.
It doesn't even wait two seconds first.
Now, we're not talking happy termination like ExitThread;
that's not possible since the thread could be in the middle of
doing something.
Injecting a call to ExitThread would result in
DLL_THREAD_DETACH notifications being sent at times
the thread was not prepared for.
Nope, these threads are terminated in the style of
TerminateThread:
Just yank the rug out from under it.
This is an ex-thread.
Well, that was a pretty drastic move, now, wasn't it?
And all this after the scary warnings in MSDN that
TerminateThread is a bad function that should
be avoided!
Wait, it gets worse.
Some of those threads that got forcibly terminated may have
owned critical sections, mutexes, home-grown synchronization
primitives (such as spin-locks), all those things
that the one remaining thread might need access to during its
DLL_PROCESS_DETACH handling.
Well, mutexes are sort of covered; if you try to enter that
mutex, you'll get the mysterious
WAIT_ABANDONED return code
which tells you that "Uh-oh, things are kind of messed up."
What about critical sections?
There is no "Uh-oh" return value for critical sections;
EnterCriticalSection doesn't have a return value.
Instead, the kernel just says "Open season on critical sections!"
I get the mental image of all the gates in a parking garage just
opening up and letting anybody in and out.
As for the home-grown stuff, well, you're on your own.
This means that if your code happened to have owned a critical section
at the time somebody called ExitProcess,
the data structure the critical section is protecting has a good
chance of being in an inconsistent state.
(After all, if it were consistent, you probably would have exited
the critical section!
Well, assuming you entered the critical section because you were
updating the structure as opposed to reading it.)
Your DLL_PROCESS_DETACH code runs,
enters the critical section, and it
succeeds because "all the gates are up".
Now your DLL_PROCESS_DETACH code
starts behaving erratically because the values in that data
structure are inconsistent.
Oh dear, now you have a pretty ugly mess on your hands.
And if your thread was terminated while it owned a spin-lock
or some other home-grown synchronization object,
your DLL_PROCESS_DETACH will most likely simply
hang indefinitely waiting patiently for that terminated thread
to release the spin-lock (which it never will do).
But wait, it gets worse.
That critical section might have been the one that protects
the process heap!
If one of the threads that got terminated happened to be in
the middle of a heap function like HeapAlloc
or LocalFree, then the process heap may very
well be inconsistent.
If your DLL_PROCESS_DETACH tries to allocate or
free memory, it may crash due to a corrupted heap.
Moral of the story:
If you're getting a DLL_PROCESS_DETACH due to
process termination,†
don't try anything clever.
Just return without doing anything
and let the normal process clean-up happen.
The kernel will close all your open handles to kernel objects.
Any memory you allocated will be freed automatically when
the process's address space is torn down.
Just let the process die a quiet death.
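Here is a minimal sketch of that moral, under a few loudly stated assumptions. It leans on the documented DllMain contract that, for DLL_PROCESS_DETACH, the lpvReserved parameter is non-NULL when the process is terminating and NULL when the DLL is being unloaded dynamically. The names ClassifyDetach, OnDllNotification, and ReleaseDllResources are hypothetical, invented for illustration, and the non-Windows stand-in definitions exist only so the decision logic can be compiled and exercised elsewhere.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins (assumption: only for building the sketch off Windows).
// The numeric values match the Windows headers.
typedef unsigned long DWORD;
typedef void* LPVOID;
const DWORD DLL_PROCESS_DETACH = 0;
const DWORD DLL_PROCESS_ATTACH = 1;
#endif

enum class DetachKind { NotADetach, DynamicUnload, ProcessTermination };

// For DLL_PROCESS_DETACH, a non-NULL reserved pointer means the process is
// terminating; NULL means the DLL is being unloaded while the process lives on.
inline DetachKind ClassifyDetach(DWORD reason, LPVOID reserved)
{
    if (reason != DLL_PROCESS_DETACH) return DetachKind::NotADetach;
    return reserved ? DetachKind::ProcessTermination : DetachKind::DynamicUnload;
}

// Hypothetical cleanup hook; a real DLL would free memory and close handles here.
static int g_cleanupCalls = 0;
inline void ReleaseDllResources() { ++g_cleanupCalls; }

inline void OnDllNotification(DWORD reason, LPVOID reserved)
{
    // Clean up only when the process will keep running after we are gone.
    if (ClassifyDetach(reason, reserved) == DetachKind::DynamicUnload)
        ReleaseDllResources();
    // Process termination: deliberately do nothing and let the process die quietly.
}

#ifdef _WIN32
BOOL WINAPI DllMain(HINSTANCE, DWORD reason, LPVOID reserved)
{
    OnDllNotification(reason, reserved);
    return TRUE;
}
#endif
```

Keeping the decision in a small pure function makes it easy to check both branches without actually loading or unloading a DLL.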
Note that if you were a good boy and cleaned up all the
threads in the process
before calling ExitProcess,
then you've escaped all this craziness, since
there is nothing to clean up.
Note also that if you're getting a DLL_PROCESS_DETACH
due to dynamic unloading, then you do need to clean up
your kernel objects and allocated memory
because the process is going to continue running.
But on the other hand,
in the case of dynamic unloading, no other threads should be
executing code in your DLL anyway (since you're about to be unloaded),
so—assuming you coded up your DLL correctly—none of your
critical sections should be held and
your data structures should be consistent.
Hang on, this disaster isn't over yet.
Even though the kernel went around terminating all
but one thread in the process,
that doesn't mean that the creation of new threads is blocked.
If somebody calls CreateThread in their
DLL_PROCESS_DETACH (as crazy as it sounds),
the thread will indeed be created and start running!
But remember, "all the gates are up", so your critical sections
are just window dressing to make you feel good.
(The ability to create threads after process termination has begun
is not a mistake; it's intentional and necessary.
Thread injection is how the debugger breaks into a process.
If thread injection were not permitted, you wouldn't be able
to debug process termination!)
Next time, we'll see how the
way process termination takes place on Windows XP
caused not one but two problems.
†Everybody reading this article
should already know how to determine whether
this is the case.
I'm assuming you're smart.
Don't disappoint me.
Life was simpler back in the old days.
Back in the old days,
processes were believed to be in control of their threads.
You can see this in the "old fashioned" way of exiting a process,
namely by exiting all the threads.
This method works only if the process knows about all the
threads running in it and can get each one to clean up
when it's time for the process to exit.
In other words, the old-fashioned theory was that when
a process wanted to exit, it would do something like this:
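A rough sketch of that sequence in portable C++, using std::thread in place of Win32 threads (an assumption of mine, as are the names Workers, StartWorkers, and ShutDownTheOldFashionedWay): tell every thread the show is over, wait for each one to finish cleaning up, and only then let the main thread exit.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical bookkeeping for threads the process knows about.
struct Workers {
    std::atomic<bool> stop{false};
    std::atomic<int> cleanedUp{0};
    std::vector<std::thread> threads;
};

inline void StartWorkers(Workers& w, int count)
{
    for (int i = 0; i < count; ++i) {
        w.threads.emplace_back([&w] {
            // Pretend to work until the process asks us to stop.
            while (!w.stop.load())
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            ++w.cleanedUp;  // each thread tidies up after itself
        });
    }
}

// Returns how many threads reported that they cleaned up.
inline int ShutDownTheOldFashionedWay(Workers& w)
{
    w.stop.store(true);                   // 1. tell every thread the show is over
    for (auto& t : w.threads) t.join();   // 2. wait for each thread to finish
    w.threads.clear();
    return w.cleanedUp.load();            // 3. the caller can now return from main
}
```

The catch is visible right in the data structure: this works only for threads that made it into the list.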
Of course, that was before the introduction of programming
constructions that created threads that the main program didn't know
about and therefore had no control over.
Things like the thread pool, RPC worker threads,
DLLs that create worker threads
(something still not well-understood even today).
The world today is very different.
Next time, we'll look at how this simple view of processes
and threads affects the design of how processes exit.
Still, you learned enough today to be able to solve
this person's problem.
Last year, a Windows security update got a lot of flak for causing some machines to hang, and it was my fault. (This makes messing up a demo at the Financial Analysts Meeting look like small potatoes.)
The security fix addressed a category of attacks wherein people could construct shortcut files or other items which specified a CLSID that was never intended to be used as a shell extension. As we saw earlier, lots of people mess up IUnknown::QueryInterface, and if you pass the CLSID of one of these buggy implementations, Explorer would dutifully create it and try to use it, and then bad things would happen. The object might crash or hang or even corrupt memory and keep running (sort of).
To protect against buggy shell extensions, Explorer was modified to use a helper program called verclsid.exe whose job was to be the "guinea pig" and host the shell extension and do some preliminary sniffing around to make sure the shell extension passed some basic functionality tests before letting it run loose in Explorer. That way, if the shell extension went crazy, the victim would be the verclsid.exe process and not the main Explorer process.
The verclsid.exe program created a watchdog thread: If the preliminary sniffing took too long, the watchdog assumed that the shell extension was hung and the watchdog told Explorer, "Don't use this shell extension."
I was one of the people brought in to study this new behavior, poke holes in its design, poke holes in its implementation, review every line of code that changed and make sure that it did exactly what it was supposed to do without introducing any new bugs along the way. We found some issues, testers found some other issues, and all the while, the clock was ticking since this was a security patch and people enjoy mocking Microsoft over how long it takes to put a security patch together.
The patch went out, and reports started coming in that machines were hanging. How could that be? We created a watchdog thread specifically to catch the buggy shell extensions that hung; why isn't the watchdog thread doing its job?
That was a long set-up for today's lesson.
After running its sanity tests, the verclsid.exe program releases the shell extension, un-initializes COM, and then calls ExitProcess with a special exit code that means, "All tests passed." If you read yesterday's installment, you already know where I messed up.
The DLL that implemented the shell extension created a worker thread, so it did an extra LoadLibrary on itself so that it wouldn't get unloaded when COM freed it as part of CoUninitialize tear-down. When the DLL got its DLL_PROCESS_DETACH, it shut down its worker thread by the common technique of setting a "clean up now" event that the worker thread listened for, and then waiting for the worker thread to respond with an "Okay, I'm all done" event.
But recall that the first stage in process exit is the termination of all threads other than the one that called ExitProcess. That means that the DLL's worker thread no longer exists. After setting the event to tell the (nonexistent) thread to clean up, it then waited for the (nonexistent) thread to say that it was done. And since there was nobody around listening for the clean-up event, the "all done" event never got set. The DLL hung in its DLL_PROCESS_DETACH.
Why didn't our watchdog thread save us? Because the watchdog thread got killed too!
Now, the root cause for all this was a buggy shell extension that did bad things in its DLL_PROCESS_DETACH, but blaming the shell extension misses the point. After all, it was the fact that there existed buggy shell extensions that created the need for the verclsid.exe program in the first place.
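The handshake that hung can be modelled in a few lines of portable C++. This is a sketch under assumptions, not the shell extension's code or anything from verclsid.exe: Event is a home-made stand-in for a Win32 event, ShutDownWorker is a hypothetical name, and workerAlive = false plays the part of a worker thread that was already terminated. The wait here is bounded only so that the sketch can finish and report the outcome; with an unbounded wait, the second case never returns.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Minimal manual-reset event, standing in for a Win32 event object.
class Event {
public:
    void Set()
    {
        { std::lock_guard<std::mutex> guard(m_); set_ = true; }
        cv_.notify_all();
    }
    void Wait()
    {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return set_; });
    }
    // Returns true if the event was set before the timeout expired.
    bool WaitFor(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(m_);
        return cv_.wait_for(lock, timeout, [this] { return set_; });
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool set_ = false;
};

// Runs the "clean up now" / "all done" handshake and reports whether the
// worker acknowledged within the given patience.
inline bool ShutDownWorker(bool workerAlive, std::chrono::milliseconds patience)
{
    Event cleanUpNow, allDone;
    std::thread worker;
    if (workerAlive) {
        worker = std::thread([&] {
            cleanUpNow.Wait();  // listen for the request
            allDone.Set();      // respond that cleanup is finished
        });
    }
    cleanUpNow.Set();                              // nobody may be listening
    bool acknowledged = allDone.WaitFor(patience); // nobody may ever answer
    if (worker.joinable()) worker.join();
    return acknowledged;
}
```

A timeout merely turns a hang into a delay. The safer course during process termination is the one from the previous installment: don't wait on other threads at all.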
Welcome Slashdot readers. Since you won't read the existing comments before posting your own, I'll float some of the more significant ones here.
The buggy shell extension was included with a printer driver for a printer that is no longer manufactured. Good luck finding one of those in your test suite.
The security update was recalled and reissued in a single action, which most people would call an update or refresh, but the word recall works better in a title.
To execute a command in each subdirectory of a directory tree
from a batch file, you can adapt the following:
for /f "delims=" %%i in ('dir /ad/s/b') do echo %%i
(If you want to play with this command from the command prompt,
then undouble the percent signs.)
The /F option enables various special behaviors
of the FOR command.
The most important change is that a string in single-quotation marks
causes the contents to be interpreted as a command whose output is
to be parsed.
(This behavior changes if you use the usebackq option,
but I'm not using that here.)
Therefore, the FOR command will run the
dir /ad/s/b command and parse the output.
The dir /ad/s/b command performs a recursive listing
of only directories, printing just the names of the directories found.
The option we provide, delims=, changes the delimiter set
from the default (space and tab) to nothing.
This means that the entire line is to be read into the %i variable.
(Normally, only the first word is assigned to %i.)
Therefore, the FOR loop executes once for each subdirectory,
with the %i variable set to the subdirectory name.
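To make the effect of the delimiter set concrete, here is a simplified model of the first-token rule, written in C++ rather than batch. It is only an illustration under assumptions: FirstToken is a hypothetical helper, the path is made up, and the real FOR /F parser has more options (tokens=, skip=, eol=) than this models.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns what would land in the loop variable when only the first token
// is taken: the text up to the first delimiter, or the entire line when
// the delimiter set is empty (the "delims=" case).
inline std::string FirstToken(const std::string& line, const std::string& delims)
{
    if (delims.empty()) return line;  // nothing can split the line
    std::size_t start = line.find_first_not_of(delims);
    if (start == std::string::npos) return std::string();  // only delimiters
    std::size_t end = line.find_first_of(delims, start);
    return line.substr(start, end == std::string::npos ? std::string::npos
                                                       : end - start);
}
```

With the space-and-tab delimiter set, a directory such as C:\Program Files\Contoso is cut off at C:\Program; with an empty set, the whole path survives, which is what you want when directory names contain spaces.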
The command we request to be performed for each line simply echoes the
directory name.
In real life, you would probably put something more interesting here.
For example, to dump the security descriptor of each
directory (which was the original problem that inspired this entry),
you can type this on the command line:
for /f "delims=" %i in ('dir /ad/s/b') do cacls "%i" >>"%TEMP%\cacls.log"
I doubt anybody actually enjoys working with batch files,
but that doesn't mean tips on using them more effectively aren't valid.
In Windows 95, we experimented with other fonts for the console window,
and it was a disaster.
In order to be a usable font for the console window,
the font needs to be more than merely monospace.
It also needs to support all the characters in the OEM code page.
Testing this is easy for SBCS code pages, since they
have only 256 characters.
But for DBCS code pages, testing all the characters means testing
tens of thousands of code points.
The OEM code page test already rules out a lot of fonts,
because the 437 code page (default in the United States)
contains oddball characters like the box-drawing characters
and a few astronomical symbols
which most Windows fonts don't bother to include.
But checking whether the font supports all the necessary characters
is a red herring.
The most common reason why a font ends up unsuitable for use in
a console window is that the font contains characters with negative
A- or C-widths.
These A- and C-width values come from the
ABC structure and represent the amount of under-
and overhang a character consumes.
Consider, for example, the capital letter W.
In many fonts, this character contains both under- and overhang:
Notice how the left and right stems "stick out" beyond the
putative cell boundaries.
I wrote code in Windows 95 to allow any monospace
font to be used in console windows, and the
ink was hardly even dry on the CD before the bugs started
coming in.
"When I choose Courier New as my font, my console
window looks like a Jackson Pollock painting with splotches of pixels
everywhere, and parts of other characters get cut off."
(Except that they didn't use words as nice as "splotches of pixels".)
The reason is those overhang pixels.
The console rendering model assumes each character fits neatly inside
its fixed-sized cell.
When a new character is written to a cell, the old cell is
overprinted with the new character, but if the old character
has overhang or underhang, those extra pixels are left behind
since they "spilled over" the required cell and infected neighbor cells.
Similarly, if a neighboring character "spilled over",
those "spillover pixels" would get erased.
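The under- and overhang criterion is easy to state in code. The sketch below is an illustration, not the check Windows 95 performed: GlyphWidths is a hypothetical struct that mirrors the three fields of the Win32 ABC structure, the function names are made up, and the sample numbers in any test are invented. On Windows you would fill in the values with something like GetCharABCWidths.

```cpp
#include <cassert>
#include <vector>

// Mirrors the three fields of the Win32 ABC structure, for illustration.
struct GlyphWidths {
    int a;       // A-width: space before the glyph; negative means underhang
    unsigned b;  // B-width: width of the drawn part of the glyph
    int c;       // C-width: space after the glyph; negative means overhang
};

// A glyph stays inside its cell only if it neither starts before the
// cell's left edge nor extends past its right edge.
inline bool StaysInsideCell(const GlyphWidths& g)
{
    return g.a >= 0 && g.c >= 0;
}

// One spilling glyph is enough to leave stray pixels in a neighboring cell.
inline bool FontFitsConsoleCells(const std::vector<GlyphWidths>& glyphs)
{
    for (const GlyphWidths& g : glyphs)
        if (!StaysInsideCell(g)) return false;
    return true;
}
```

This covers only the spillover problem; a font that passes can still fail the character coverage requirement described earlier.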
The set of fonts that could be used in the console window
was trimmed to the fonts that were tested and known to work
acceptably in console windows.
For English systems, this brought us down to Lucida Console
"Why isn't there an interface for choosing a replacement font,
with a big annoying message box warning you that 'Choosing
a font not on the list above may result in really ugly results.
Don't blame me!'?"
First of all, because we know that nobody reads those warnings.
Second, because a poor choice of font results in the console
window looking so ugly that everybody would rightly claim that
it was a bug.
"No, it's not a bug. You brought this upon yourself by choosing
a font that results in painting artifacts when used in a console
window."
"Well, that's stupid. You should've stopped me from choosing
a font that so clearly results in nonsense."
And that's what we did.
Of course, if you're a super-geek and are willing to shoulder
the blame if the font you pick happens not to be suitable for
use in a console window,
you can follow the instructions in this Knowledge Base article
to add your font to the list.
But if you end up creating a work of modern art,
well, you asked for it.
In the title of this entry, s/console windows/Windows console windows/†
†s/Windows console windows/Windows console windows when displayed
inside a GUI window, as opposed to consoles that have gone to
hardware fullscreen, which is another matter entirely/.
New in Visual C++ 2005 is the ability to
specify a manifest
dependency via a #pragma directive.
This greatly simplifies using version 6 of the shell
common controls.
You just have to drop the line
// do not use - see discussion below
#pragma comment(linker, \
into your program and the linker will do the rest.
Note that the processor architecture is hard-coded into the
above directive, which means that if you are targeting
x64, you'll get the wrong manifest.
To fix that, we need to do some preprocessor munging.
#if defined(_M_IX86)
#define MANIFEST_PROCESSORARCHITECTURE "x86"
#elif defined(_M_AMD64)
#define MANIFEST_PROCESSORARCHITECTURE "amd64"
#elif defined(_M_IA64)
#define MANIFEST_PROCESSORARCHITECTURE "ia64"
#else
#error Unknown processor architecture.
#endif
#pragma comment(linker, \
"processorArchitecture='" MANIFEST_PROCESSORARCHITECTURE "' "\
Update: I didn't know that * is allowed here to indicate
"all architectures".
That simplifies matters greatly.
#pragma comment(linker, \
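For reference, here is the wildcard form as I understand it from Microsoft's "Enabling Visual Styles" documentation. Treat it as a sketch to check against that documentation rather than as the exact text of the directive above; the assembly name, version, and public key token are taken from that documentation, not from this article.

```cpp
// Request version 6 of the common controls; '*' lets one directive
// serve every processor architecture.
#pragma comment(linker, "\"/manifestdependency:type='win32' \
name='Microsoft.Windows.Common-Controls' version='6.0.0.0' \
processorArchitecture='*' publicKeyToken='6595b64144ccf1df' language='*'\"")
```

Because the architecture is no longer spelled out, the preprocessor munging becomes unnecessary.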
Nearly all computer administrators are idiots.
That's not because the personnel department is incompetent
or because it's impossible to train competent administrators.
It's because, for a consumer operating system,
the computer administrator didn't ask to be one.
In nearly all cases,
the computer administrator is dad or grandma.†
They didn't ask to be the computer administrator.
They just want to surf the web and read email from Jimmy.‡
All this means is that you can't say,
"Well, if the user is an administrator, as opposed to a normal user,
then it's okay to show them all these dangerous things (such as
critical operating system files) because they
know what they're doing."
Grandma doesn't know what she's doing.
For a consumer operating system, a friendly user interface means
protecting the administrators from themselves.
One article without a nitpicker's corner and look what happens.
†The words "dad" and "grandma" refer to archetypes for
non-technical home users and are not
intended to be interpreted as literally dad and grandma.
‡Not all grandchildren are named Jimmy.
I just made up that term now because I needed a word to describe
the situation where some manager is put in charge of a feature
but is not given a staff to implement that feature.
This happens more often than you might think,
since there are many features that are "horizontal",
i.e., features which affect all teams throughout the project.
So-called taxes often fall into this category,
such as power management, accessibility, and multiple monitors.
(Larry Osterman calls them
I call them
The unempowered manager is in a predicament,
having been assigned a task without a staff to accomplish it.
All the unempowered manager can do is nag other people,
usually about bugs that fall into the manager's area.
Now, most of these unempowered managers understand that they
are just one of many demands on the development teams,
providing advice as necessary (since they have valuable specialized
knowledge about the problem area) but basically trying to
stay out of the way.
Others, on the other hand, take upon themselves a much more
active role in "driving" their pet issues.
This means that I will get mail like this:
You have an elephant† bug
The following elephant bug is assigned to you:
16384 Elephants not available in animal dropdown box (opened 2006/05/12)
What is the ETA for fixing this bug?
Somebody you've never heard of
This is another case of
"You're not my manager".
My manager decides what tasks I should be working on and in what order.
If you think this bug should be higher on my priority list,
feel free to set up a little meeting with my manager to work this out.
Until then, don't bug me.
I have work to do.
"But elephant-compatibility is important."
Are you saying that all my other tasks are unimportant?
What makes elephant-compatibility more important than my other tasks?
Do you even know what my other tasks are?
At one point, this got so bad, with many managers nagging me
about their favorite bugs on a nearly daily basis,
that I created a SharePoint site‡ called
"Raymond's task list for <date>".
Whenever somebody sent me nag mail,
I replied,
"I have added your request to my SharePoint site
and assigned it a default priority."
And then I never heard from them again.
For those new to this web site
(and a reminder to those with poor memory):
†I disguise the name because (1) it's not important to the
story, and because (2) the goal is not to ridicule but
rather to illustrate a point.
Attempts to guess what "elephant" is will be deleted.
Don't make me delete further stories in this series
like I did with "Stories about Bob."
‡Or, if you're a trademark lawyer,
"A Web site powered by Microsoft® SharePoint® services."
There's your stupid asterisk.
Commenter Tom Grelinger asks via the Suggestion Box:
If I have a modal CDialog that is visible and usable to the user.
Let's say I receive an event somewhere else in the program
and I call DestroyWindow on the modal CDialog from within the event.
I notice that the OnDestroy is called on the CDialog,
but DoModal never exits until a WM_QUIT is posted to the modal's message pump.
What are the pitfalls to this?
Unfortunately, there is really no way to avoid this situation.
I'm not sure what the question is, actually.
The question as stated is "What are the pitfalls to this?"
but he answered that in his own question:
The pitfall is that "DoModal never exits until a WM_QUIT is
posted to the modal dialog's message pump."
I'm going to assume that the question really is,
"Why doesn't destroying the window work?"
with the follow-up question,
"What is the correct way to dismiss a modal dialog?"
The first problem with this question is that it assumes that I know
what a CDialog is.
From its name, I'm going to assume that this
is an MFC class for managing a dialog box.
But you don't even have to know that to answer the first
reformulated question operating only from Win32 principles:
DestroyWindow is not how you exit a modal dialog.
You exit a modal dialog with EndDialog.
The DestroyWindow technique is for modeless
dialogs.
But let's look at the question another way,
which is my point for today:
You have the MFC source code.
Don't be afraid to read it.
Especially since I don't use MFC personally;
I don't even know the basic principles of application design with MFC.
I work in straight Win32.
As a result,
I don't know the answer off the top of my
head, but fifteen minutes reading the MFC source code quickly reveals
the reason why destroying the window doesn't work.
Watch me as I go and find out the answer.
It's nothing you can't already do yourself.
The CDialog::DoModal method
calls CWnd::RunModalLoop to run the dialog loop.
If you look at CWnd::RunModalLoop,
you can see the conditions under which it will exit the modal loop.
Here's the code with irrelevant details deleted.
(They're irrelevant because they have nothing to do with how the
modal loop exits.)
int CWnd::RunModalLoop(DWORD dwFlags)
{
  ... preparatory work ...
  // acquire and dispatch messages until the modal state is done
  for (;;) {
    ... code that doesn't break out of the loop ...
    // phase2: pump messages while available
    do {
      // pump message, but quit on WM_QUIT
      if (!AfxPumpMessage()) { AfxPostQuitMessage(0); return -1; }
      ... other code that doesn't break out of the loop ...
      if (!ContinueModal()) goto ExitModal;
      ... other code that doesn't break the loop ...
    } while (::PeekMessage(pMsg, NULL, NULL, NULL, PM_NOREMOVE));
  }
ExitModal:
  m_nFlags &= ~(WF_MODALLOOP|WF_CONTINUEMODAL);
  return m_nModalResult;
}
There are only two ways out of this loop.
The first is the receipt of a WM_QUIT message.
The second is if CWnd::ContinueModal decides that
the modal loop is finished.
The commenter already mentioned the quit message aspect to the
modal loop, so that just leaves CWnd::ContinueModal.
The CWnd::ContinueModal method is very simple:
return m_nFlags & WF_CONTINUEMODAL;
Therefore, the only other way the loop can exit is if somebody
clears the WF_CONTINUEMODAL flag.
A little grepping shows that there are only three places where this
flag is cleared.
One is in CPropertyPage, which is a derived class
of CDialog and therefore isn't relevant here.
(I'll ignore CPropertyPage in future searches.)
The second is in the line above right after the label
ExitModal.
And the third is this method:
void CWnd::EndModalLoop(int nResult)
{
  // this result will be returned from CWnd::RunModalLoop
  m_nModalResult = nResult;
  // make sure a message goes through to exit the modal loop
  if (m_nFlags & WF_CONTINUEMODAL) {
    m_nFlags &= ~WF_CONTINUEMODAL;
    PostMessage(WM_NULL);
  }
}
This method is called in only one place:
void CDialog::EndDialog(int nResult)
{
  if (m_nFlags & (WF_MODALLOOP|WF_CONTINUEMODAL))
    EndModalLoop(nResult);
  ::EndDialog(m_hWnd, nResult);
}
Following the money one last step,
the CDialog::EndDialog method is called
from four places in CDialog.
It's called from CDialog::HandleInitDialog and
CDialog::InitDialog if some catastrophic error
occurs during dialog initialization.
And it's called from CDialog::OnOK
and CDialog::OnCancel in response to the
user clicking the OK or Cancel buttons.
Notice that the CDialog::EndDialog method is not
called when somebody forcibly destroys the dialog from
outside.
That's why destroying the dialog window doesn't break the modal loop.
If you want to break out of the modal loop, your only choices are
to post a quit message or call CWnd::EndModalLoop,
either directly or indirectly (via CDialog::EndDialog,
for example).
Notice that the MFC modal loop obeys the convention on quit messages
by re-posting the quit message when it breaks out of the modal loop.
(Though it really should have posted the wParam from
the quit message rather than just posting zero.)
The workaround therefore is not to destroy the dialog with
DestroyWindow (something you should have known
not to do a priori since that's not how you exit
modal dialog boxes) but rather to call
CDialog::EndDialog, passing a result code that
lets the caller of CDialog::DoModal know that
the dialog box exited under unusual circumstances.
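The whole finding fits in a toy model. The sketch below is not MFC code and makes no claim about MFC internals beyond what was quoted above: ToyModalLoop and its members are hypothetical names that echo the MFC ones, and Pump returns StillWaiting at the point where a real message loop would block waiting for the next message.

```cpp
#include <cassert>
#include <deque>

enum class LoopState { StillWaiting, EndedByFlag, EndedByQuit };

// Toy model of a modal loop governed by a "continue modal" flag.
class ToyModalLoop {
public:
    void PostQuit() { queue_.push_back(MsgQuit); }

    // Destroys the "window" but leaves the continue flag untouched.
    void DestroyWindow() { windowExists_ = false; }

    void EndModalLoop(int result)
    {
        result_ = result;
        if (continueModal_) {
            continueModal_ = false;
            queue_.push_back(MsgNull);  // make sure a message goes through
        }
    }

    // Drains queued messages and reports why the loop ended, if it did.
    LoopState Pump()
    {
        while (!queue_.empty()) {
            Msg msg = queue_.front();
            queue_.pop_front();
            if (msg == MsgQuit) return LoopState::EndedByQuit;
            if (!continueModal_) return LoopState::EndedByFlag;
        }
        return LoopState::StillWaiting;  // a real loop would block here
    }

    int result() const { return result_; }
    bool windowExists() const { return windowExists_; }

private:
    enum Msg { MsgNull, MsgQuit };
    std::deque<Msg> queue_;
    bool continueModal_ = true;
    bool windowExists_ = true;
    int result_ = -1;
};
```

Destroying the window leaves the loop waiting forever, while ending the modal loop or posting a quit message gets it out, which is the behavior the commenter observed.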
This took me fifteen minutes to research and a little over an hour to
write up.
All this work to answer a question that you should have been able
to answer yourself with a little elbow grease.
You're a smart person.
Have confidence in yourself.
You can do it.
I know you can.