Holy cow, I wrote a book!
Internally, a critical section is a bunch of counters and flags,
and possibly an event.
(Note that the internal structure of a critical section is subject
to change at any time—in fact, it changed between
Windows XP and Windows 2003.
The information provided here is therefore intended for troubleshooting
and debugging purposes and not for production use.)
As long as there is no contention, the counters and flags are
sufficient because nobody has had to wait for the critical section
(and therefore nobody had to be woken up when the critical section
became available).
If a thread needs to be blocked because the critical section it wants
is already owned by another thread,
the kernel creates an event for the critical section
(if there isn't one already) and waits on it.
When the owner of the critical section finally releases it,
the event is signaled, thereby alerting all the waiters that the
critical section is now available and they should try to enter it again.
(If there is more than one waiter, then only one will actually
enter the critical section and the others will return to the wait loop.)
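To make the "counters, flags, and possibly an event" idea concrete, here is a minimal sketch of a lock that uses only a counter when nobody is fighting over it and creates its event the first time somebody has to wait. This is a simplified model in portable C++, not the Windows implementation: the names SketchLock and AutoResetEvent are made up for illustration, there is no recursion or spin count, and the event here wakes exactly one waiter rather than all of them.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical auto-reset event: each set() releases exactly one wait().
class AutoResetEvent {
public:
    void set() {
        { std::lock_guard<std::mutex> guard(m_); signaled_ = true; }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> guard(m_);
        cv_.wait(guard, [this] { return signaled_; });
        signaled_ = false;  // consume the signal
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Hypothetical lock: a counter on the fast path, an event only under contention.
class SketchLock {
public:
    ~SketchLock() { delete event_.load(); }

    void enter() {
        // Previous value 0 means nobody owned the lock: no event needed.
        if (contenders_.fetch_add(1) > 0) get_event()->wait();
    }
    void leave() {
        int previous = contenders_.fetch_sub(1);
        assert(previous >= 1 && "leave() without a matching enter()");
        // Somebody else is waiting (or about to wait): hand the lock over.
        if (previous > 1) get_event()->set();
    }
    bool has_event() const { return event_.load() != nullptr; }

private:
    // Create the event on first use; if two threads race, one creation wins.
    AutoResetEvent* get_event() {
        AutoResetEvent* current = event_.load();
        if (current == nullptr) {
            AutoResetEvent* fresh = new AutoResetEvent;
            if (event_.compare_exchange_strong(current, fresh)) {
                current = fresh;
            } else {
                delete fresh;  // another thread got there first
            }
        }
        return current;
    }

    std::atomic<int> contenders_{0};               // owner plus waiters
    std::atomic<AutoResetEvent*> event_{nullptr};  // stays null until contention
};
```

If a single thread enters and leaves a SketchLock, has_event() stays false afterward, which is the "no contention, counters are sufficient" case. The event shows up only once a second thread calls enter() while the first still owns the lock.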
If you get an invalid handle exception in
LeaveCriticalSection,
it means that the critical section code thought that there
were other threads waiting for the critical section to become
available, so it tried to signal the event, but the event handle
was no good.
Now you get to use your brain to come up with reasons why this might be.
One possibility is that the critical section has been corrupted,
and the memory that normally holds the event handle has been
overwritten with some other value that happens not to be a
valid handle.
Another possibility is that some other piece of code passed
an uninitialized variable to the CloseHandle
function and ended up closing the critical section's handle
by mistake.
This can also happen if some other piece of code has a double-close
bug, and the handle (now closed) just happened to be reused as the
critical section's event handle.
When the buggy code closes the handle the second time by mistake,
it ends up closing the critical section's handle instead.
Of course, the problem might be that the critical section is not
valid because it was never initialized in the first place.
The values in the fields are just uninitialized garbage,
and when you try to leave this uninitialized critical section,
that garbage gets used as an event handle, raising the invalid
handle exception.
Then again, the problem might be that the critical section is
not valid because it has already been destroyed.
For example, one thread might have code that goes like this:
    EnterCriticalSection(&cs);
    ... do stuff...
    LeaveCriticalSection(&cs);
While that thread is busy doing stuff,
another thread calls DeleteCriticalSection(&cs).
This destroys the critical section while another thread
was still using it.
Eventually that thread finishes doing its stuff and calls
LeaveCriticalSection,
which raises the invalid handle exception because the
DeleteCriticalSection already closed the handle.
All of these are possible reasons for an invalid handle
exception in LeaveCriticalSection.
To determine which one you're running into will require more
debugging, but at least now you know what to be looking for.
One of my colleagues from the kernel team points out that
the Locks and Handles checks in
Application Verifier are great
for debugging issues like this.
One of the more subtle ways
people mess up IUnknown::QueryInterface
is returning E_NOINTERFACE
when the problem wasn't actually an unsupported interface.
The E_NOINTERFACE return value
has a very specific meaning.
Do not use it as your generic "gosh, something went wrong" error.
(Use an appropriate error such as E_OUTOFMEMORY.)
Recall that the rules for
IUnknown::QueryInterface
are that (in the absence of catastrophic errors such as
E_OUTOFMEMORY),
if a request for a particular interface succeeds,
then it must always succeed in the future for that object.
Similarly, if a request fails with E_NOINTERFACE,
then it must always fail in the future for that object.
These rules exist for a reason.
In the case where COM needs to create a proxy for your object
(for example, to marshal the object into a different apartment),
the COM infrastructure does a lot of interface caching (and
negative caching) for performance reasons.
For example, if a request for an interface fails, COM remembers
this so that future requests for that interface are failed
immediately rather than being marshalled to the original object
only to have the request fail anyway.
Requests for unsupported interfaces are very common in COM,
and optimizing that case yields significant performance improvements.
If you start returning E_NOINTERFACE for problems
other than "The object doesn't support this interface",
COM will assume that the object really doesn't support the interface
and may not ask for it again even if you do.
This in turn leads to very strange bugs that defy debugging:
You are at a call to
IUnknown::QueryInterface,
you set a breakpoint on your object's implementation of
IUnknown::QueryInterface to see what the problem is,
you step over the call and get
E_NOINTERFACE back without your breakpoint ever hitting.
Because at some point in the past, you said you didn't support
the interface, and COM remembered this and "saved you the trouble"
of having to respond to a question you already answered.
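Here is a toy model of that caching behavior, to show how one misreported failure sticks. It is a sketch in portable C++ and not COM code: the Result enum stands in for HRESULTs, and LoadedObject, CachingProxy, and the interface name "IWidget" are hypothetical names invented for the illustration. The only assumption about the proxy is the one described above, namely that it remembers definitive answers (success and "no such interface") and passes other errors through without caching them.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-ins for S_OK, E_NOINTERFACE and E_OUTOFMEMORY.
enum class Result { Ok, NoInterface, OutOfMemory };

// Hypothetical object that supports "IWidget" but cannot service the
// request while it is under load.
struct LoadedObject {
    bool under_load = true;
    bool misreport = false;  // true = the bug: report load as NoInterface
    int calls = 0;           // how many times the object was actually asked

    Result query(const std::string& iid) {
        ++calls;
        if (iid != "IWidget") return Result::NoInterface;  // genuinely unsupported
        if (under_load) {
            return misreport ? Result::NoInterface : Result::OutOfMemory;
        }
        return Result::Ok;
    }
};

// Hypothetical proxy that caches definitive answers, positive and negative.
class CachingProxy {
public:
    explicit CachingProxy(LoadedObject& object) : object_(object) {}

    Result query(const std::string& iid) {
        auto cached = cache_.find(iid);
        if (cached != cache_.end()) {
            return cached->second;  // answered without asking the object
        }
        Result result = object_.query(iid);
        if (result == Result::Ok || result == Result::NoInterface) {
            cache_[iid] = result;   // remembered for the object's lifetime
        }
        return result;
    }

private:
    LoadedObject& object_;
    std::map<std::string, Result> cache_;
};
```

With misreport set, the first query during load returns NoInterface, and every later query gets NoInterface from the cache even after under_load is cleared. The object's calls counter stays at 1, which is the breakpoint that never hits. With misreport clear, the first query returns OutOfMemory, nothing is cached, and a retry after the load passes reaches the object and succeeds.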
The COM folks tell me that they and their comrades in product support
end up spending hours debugging customers' problems like
"When my computer is under load, sometimes I start getting
E_NOINTERFACE for interfaces I definitely support."
Save yourself and the COM folks several hours of frustration.
Don't return E_NOINTERFACE
unless you really mean it.
Back in 2003, I wrote that
I'm doing this instead of writing a book.
That was true then, but last year I decided to give this book thing
another go, only to find that publishers generally aren't interested
in this stuff any more.
"Does the world really need another book on Win32?
Nobody buys Win32 books any more, that dinosaur!"
"A conversational style book?
People want books with step-by-step how-to's and comprehensive treatments,
not water cooler anecdotes!"
"Just 200 pages?
There isn't enough of an audience for a book that small!"
Luckily, I found a sympathetic ear in a publisher
willing to take a chance on my unorthodox proposal.
But I caved on the length, bringing it up to 500 pages.
Actually, I came up with more like 700 pages of stuff,
and they cut it back to 500,
because 700 pages would take the book into the next price tier, and
"There isn't enough of an audience for a book that big!"
Eighteen months later, we have
The Old New Thing:
Practical Development Throughout the Evolution of Windows,
following in what appears to be the current fad of giving your
book a title of the form
Catchy Phrase: Longer Explanation of What the Catchy Phrase Means.
It's a selection of entries from this blog,
with new material sprinkled in.
There are also new chapters that go in depth into parts of Win32
you use every day but may not fully understand
(the dialog manager, window messages), plus a chapter dedicated
(For some reason, the Table of Contents on the book web site is incomplete.)
Oh, and those 200 pages that got cut?
They'll be made available for download as "bonus chapters".
(The bonus chapters aren't up yet, so don't all rush over there
looking for them.)
The nominal release date for the book is January 2007,
which is roughly in agreement with the book web site
which proclaims availability on December 29th.
Just in time for Christmas for your favorite geek,
if your favorite geek can't read a calendar.
Now I get to see
how many people were lying
when they said, "If you wrote a book based on this blog, I'd buy it."
(Update: The bonus chapters are now available.)
Now available in Japanese!
Now available in Chinese!
As a follow-up to
the difference between My Documents and Application Data,
I'd like to rant about all the subdirectories of My Documents
that programs create because they think they're so cool.
I'm sure there are more.
The user should be able to point to everything in the My Documents folder
and say,
"I remember creating that file on such-and-such date when I did a 'Save'
from Program Q."
If it doesn't pass that test, then don't put it into My Documents.
Use Application Data.
And don't create subdirectories off of My Documents.
If the user wants to organize their documents into subdirectories,
that's their business.
You just ask them where they want their documents and let it go at that.
(Yes, I'm not a fan of My Music, My Videos, and My Pictures, either.)
Omar Shahine points out that
Apple has similar guidelines for the Macintosh.
I wonder how well people follow them.
Jesse Kaplan, one of the CLR program managers, explains
why you shouldn't write in-process shell extensions in managed code.
The short version is that doing so introduces a CLR version dependency
which may conflict with the CLR version expected by the host process.
Remember that shell extensions are injected into all processes that
use the shell namespace, either explicitly by calling
SHGetDesktopFolder or implicitly by calling
a function like SHBrowseForFolder or
ShellExecute.
Since only one version of the CLR can be loaded per process,
it becomes a race to see who gets to load the CLR first and
establish the version that the process runs,
and everybody else who wanted some other version loses.
Now that version 4 of the .NET Framework
supports in-process side-by-side runtimes,
is it now okay to write shell extensions in managed code?
The answer is still no.
Last time, I introduced
a friend I called "Bob" for the purposes
of this story.
At a party earlier this year, I learned second-hand
what Bob had been up to more recently.
The team Bob worked for immediately prior to his retirement gave
him a call.
"We're trying to ship version N+1 of Product X,
and we really need your help.
I know you're all retired and stuff,
and you don't live in the area any more,
but you're the only guy who can save us.
Could you come out of retirement just for a few months?"
Bob said, "Okay.
This is a favor to you guys since I like you so much."
When he sat down to sign the paperwork,
he took the contract and crossed out the amount of money he
would be paid and wrote in its place, "One dollar".
Because he wasn't taking this job to get rich.
He was doing it as a favor to his old team.
He then signed it and returned the contract to the agency.
The contracting agency was flabbergasted.
"You can't do this for just one dollar!
That's completely unheard of!"
The real reason the agency was so upset is probably
that their fee was a percentage of whatever Bob made,
and if Bob made only one dollar, they would effectively
be doing all the paperwork and getting paid a stick of chewing gum.
Bob said, "Okay, then, if you want me to get paid 'for real',
send me a contract with 'real money'."
The agency sent him the original contract (before he changed
it to "one dollar"), and Bob sent it back, indignant.
"I said 'real money'. This amount is an insult."
Although the function WinMain is documented in the
Platform SDK, it's not really part of the platform.
Rather, WinMain is the conventional name for the
user-provided entry point to a Windows program.
The real entry point is in the C runtime library,
which initializes the runtime,
runs global constructors, and then calls your WinMain function
(or wWinMain if you prefer a Unicode entry point).
If you pay close attention, you'll notice that most user
interface actions tend to occur on the release,
not on the press.
When you click on a button, the action occurs when the mouse button
is released.
When you press the Windows key, the Start menu pops up when you
release it.
When you tap the Alt key, the menu becomes active when you release it.
(There are exceptions to this general principle, of course,
typing being the most notable one.)
Why do most actions wait for the release?
For one thing, waiting for the completion of a mouse action means
that you create the opportunity for the user to cancel it.
For example, if you click the mouse while it is over a button
(a radio button, push button, or check box),
then drag the mouse off the control, the click is cancelled.
But a more important reason for waiting for the release is to ensure
that the press won't get confused with the action itself.
Suppose you are in a mode where objects disappear when the user
clicks on them.
For example, it might be a customization dialog, with two columns,
one showing available objects and another showing objects in use.
Clicking on an available object moves it to the list of in-use objects
and vice versa.
Now, suppose you acted on the click rather than the release.
When the mouse button goes down while the mouse is over
an item, you remove it from the list
and add it to the opposite list.
This moves the item the user clicked on, so that the item
beneath the mouse is now some other item that moved into the original
item's position.
And then the mouse button is released, and you get a
WM_LBUTTONUP message for the new item.
Now you have two problems:
First, the item the user clicked on got a WM_LBUTTONDOWN
and no corresponding WM_LBUTTONUP,
and second, the new item got a WM_LBUTTONUP
with no corresponding WM_LBUTTONDOWN.
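The mismatch is easy to reproduce in a toy simulation. The sketch below is portable C++ rather than real window-message handling: the lists are plain vectors, hit-testing is just a row index, and the names Tally, item_at, move_item, and click are hypothetical. It counts the button-down and button-up notifications each item receives for a single click at a fixed row.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using List = std::vector<std::string>;

// Per-item count of button-down and button-up notifications.
struct Tally {
    std::map<std::string, int> downs;
    std::map<std::string, int> ups;
};

// Name of the item drawn at `row`, or "" when the row is empty.
std::string item_at(const List& list, std::size_t row) {
    return row < list.size() ? list[row] : std::string();
}

// Move the item at `row` from one list to the end of the other.
void move_item(List& from, List& to, std::size_t row) {
    to.push_back(from[row]);
    from.erase(from.begin() + static_cast<std::ptrdiff_t>(row));
}

// Simulate one click (down, then up) at a fixed row of `available`.
Tally click(List& available, List& in_use, std::size_t row, bool act_on_press) {
    Tally tally;
    const std::string pressed = item_at(available, row);
    if (pressed.empty()) return tally;  // clicked on empty space
    ++tally.downs[pressed];
    if (act_on_press) move_item(available, in_use, row);

    // The mouse has not moved, but the list underneath it may have.
    const std::string released = item_at(available, row);
    if (!released.empty()) ++tally.ups[released];
    if (!act_on_press && released == pressed) {
        move_item(available, in_use, row);
    }
    return tally;
}
```

Starting with available items A, B, C and clicking row 0 with act_on_press set, A gets a down and no up, and B gets an up with no down. With act_on_press clear, A gets both its down and its up, and B is never bothered.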
You can also get into a similar situation with the keyboard,
though it takes more work.
For example, if you display a dialog box while the Alt key
is still pressed rather than waiting for the release,
the Alt key may autorepeat and end up delivered to the dialog box.
This prevents the dialog box from appearing
since it's stuck in menu mode that was initiated by the Alt key,
and it's waiting for you to finish your menu operation
before it will display itself.
Now, this type of mismatch situation is not often a problem,
but when it does cause a problem, it's typically a pretty nasty one.
This is particularly true if you're using some sort of
framework that tries to associate mouse and keyboard events with
the corresponding windowless objects.
When the ups and downs get out of sync,
things can get mighty confusing.
(This entry was posted late because
a windstorm knocked out power to the entire Seattle area.
My house still doesn't have electricity.)
Here's a question that floated past my view:
How do I set the ACLs on a file so users can read it
but can't copy it?
I can't find a "Copy" access mask that I can deny.
If I can't deny copying, I'd at least like to audit it,
so I can tell who made a copy of the file.
There is no "Copy" access mask because copying is not
a fundamental file operation.
Copying a file is just reading it into memory and then
writing it out.
Once the bytes come off the disk, the file system has no
control any more over what the user does with them.
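To see why there is nothing for the file system to deny, here is a minimal sketch of a "copy" written using nothing but read access to the source. It is portable C++, not a Windows API, and the function name copy_by_reading is made up for illustration. From the source file's point of view, the only thing that happens is an ordinary read.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// "Copy" a file by reading it into memory and writing the bytes elsewhere.
// Returns false if the source cannot be read or the destination cannot be
// written.
bool copy_by_reading(const std::string& source, const std::string& destination) {
    std::ifstream in(source, std::ios::binary);
    if (!in) return false;

    // Once these bytes are in memory, the source file's ACL no longer
    // has any say in what happens to them.
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());

    std::ofstream out(destination, std::ios::binary);
    if (!out) return false;
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out.flush());
}
```

The write to the destination is governed by the destination's permissions, not the source's, and the program could just as easily send the bytes over the network or display them on the screen instead.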
"I bet somebody is looking to get a really nice bonus for that feature."
A customer was having trouble with one of their features that
scans for resources that their program can use,
and, well, the details aren't important.
What's important is that their feature ran in the Startup group,
and as soon as it found a suitable resource,
it displayed a balloon tip:
"Resource XYZ has been found.
Click here to add it to your resource portfolio."
We interrupted them right there.
— Why are you doing this?
"Oh, it's a great feature.
That way, when users run our program, they don't have to go
looking for the resources they want to operate with.
We already found the resources for them."
— But why are you doing it even when your program isn't running?
The user is busy editing a document or working on a spreadsheet
or playing a game.
The message you're displaying is out of context:
You're telling users about a program they aren't even using.
"Yeah, but this feature is really important to us.
It's crucial in order to remain competitive in our market."
— The message is not urgent.
It's a disruption.
Why don't you wait until
they launch your program
to tell them about the resources you found?
That way, the information appears in context:
They're using your program, and the program tells them about these
resources.
"We can't do that! That would be annoying!"