Holy cow, I wrote a book!
When analyzing the performance of a program,
you must be mindful that your performance analysis tools
can themselves affect the operation of the system
you are analyzing.
This is especially true if the performance analysis tool
is running on the same computer as the program being studied.
People often complain that Explorer takes a page fault
every two seconds even when doing nothing.
They determine this by opening Task Manager and
enabling the Page Faults column,
and observing that the number of Page Faults
increases by one every two seconds.
This got reported so often that I was asked to sit
down and figure out what's going on.
Notice, though, that if you change Task Manager's
Update Speed to High, then Explorer's page fault
rate goes up to four per second.
If you drop it to Low, then it drops to one every four seconds.
If you haven't figured it out by now,
the reason is that Task Manager itself
is causing those page faults.
Mind you, they are soft faults and therefore do not
entail any disk access.
Every two seconds (at the Normal update rate),
Task Manager updates the CPU meter in the taskbar,
and it is this act of updating the CPU meter that is
the cause of the page faults.
No Task Manager, no animating taskbar notification icon,
and therefore no page faults from Explorer when idle.
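If you want to watch this kind of counter yourself, here is a minimal sketch (not how Task Manager is implemented; the interval and error handling are simplified) that polls a target process's cumulative page-fault count via GetProcessMemoryInfo. As the story above illustrates, the act of polling is itself activity on the system you are measuring.

#include <windows.h>
#include <psapi.h>   // link with psapi.lib
#include <stdio.h>

// Poll the cumulative page-fault count of hProcess (opened with
// PROCESS_QUERY_INFORMATION | PROCESS_VM_READ) once per interval.
void PollPageFaults(HANDLE hProcess, DWORD intervalMs)
{
    for (;;) {
        PROCESS_MEMORY_COUNTERS pmc = { sizeof(pmc) };
        if (GetProcessMemoryInfo(hProcess, &pmc, (DWORD)sizeof(pmc))) {
            printf("page faults so far: %lu\n", pmc.PageFaultCount);
        }
        Sleep(intervalMs); // 2000ms mimics Task Manager's Normal update speed
    }
}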
(A similar effect was discovered by someone who found that
Process Explorer's polling calls
were triggering repeated registry access.)
A blogger from the Office User Interface team
has joined the 7am club,
posting fascinating glimpses into the Office user interface
and the upcoming version of Office.
And the posts come out at 7am every weekday.
Maybe he's not real either;
maybe he's some kind of a robot.
Some people have noticed that if you load a DLL with the
LOAD_LIBRARY_AS_DATAFILE flag,
you sometimes get strange behavior if you then pass that HINSTANCE
to a dialog box function.
The problem here is that since the bottom 16 bits of a proper
HINSTANCE are always zero, different components have "borrowed"
those bits for different purposes.
The kernel uses the bottom bit to distinguish modules that
have been mapped into memory as sections (i.e., loaded
normally) from those that have been mapped as one giant block
(loaded as a datafile).
It needs to know this so that the various resource-management
functions such as
the FindResource function
know how to interpret the data in order to locate the
resource in question.
Although everybody now knows that the HINSTANCE is the base
address of the DLL,
in principle, it is an opaque value
(and in the 16-bit world, the value was indeed opaque).
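Here is a small sketch of that tagging in action, assuming nothing beyond the LoadLibraryEx behavior described above (shell32.dll is just a convenient example of a DLL with resources): loading the same DLL normally and as a datafile yields two different handles, and the datafile handle has nonzero low bits.

#include <windows.h>
#include <stdio.h>

int main()
{
    HMODULE hmNormal   = LoadLibraryExW(L"shell32.dll", NULL, 0);
    HMODULE hmDatafile = LoadLibraryExW(L"shell32.dll", NULL,
                                        LOAD_LIBRARY_AS_DATAFILE);

    // The normally-loaded handle ends in sixteen zero bits;
    // the datafile handle carries the kernel's tag in its bottom bits.
    printf("normal:   %p (low bits %u)\n", (void*)hmNormal,
           (unsigned)((ULONG_PTR)hmNormal & 0xFFFF));
    printf("datafile: %p (low bits %u)\n", (void*)hmDatafile,
           (unsigned)((ULONG_PTR)hmDatafile & 0xFFFF));

    if (hmDatafile) FreeLibrary(hmDatafile);
    if (hmNormal)   FreeLibrary(hmNormal);
    return 0;
}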
Meanwhile, the window manager has its own problems.
In order to support 16-bit applications running seamlessly
on the desktop
(not in a virtual machine, as discussed earlier),
as well as thunking between 16-bit and 32-bit code,
it needs to accept both 32-bit HINSTANCE values
and 16-bit HINSTANCE values.
Since the memory allocation granularity is 64KB,
the window manager knows that valid 32-bit HINSTANCEs
have zero in the bottom 16 bits,
whereas 16-bit HINSTANCE values are nonzero there.
Perhaps you see the conflict now.
If you pass the instance handle of a DLL loaded as a datafile,
the kernel will set the bottom bit as a signal to itself to
locate its resources in the flat datafile manner rather than
in the mapped DLL manner.
But the window manager sees that the bottom 16 bits are not all
zero and assumes that it has been given a 16-bit HINSTANCE value.
Amazingly, this doesn't cause a problem most of the time
because the things that need to be handled differently
between 32-bit and 16-bit HINSTANCEs are relatively minor.
The one that is most likely to bite you is the dialog box data segment.
In 16-bit Windows, a dialog box came with its own data segment,
which was used as the local data segment for controls hosted by that dialog box.
Most controls didn't need a lot of storage in the local data segment,
so the issue of where it came from wasn't really important.
The big exception was edit controls, since they can contain
multiple kilobytes of text.
A dozen kilobytes of text may very well not
fit in the application's data segment.
Therefore, creating a new data segment gave the edit controls
on the dialog a new 64KB block of memory to store their data in.
Programs were expected to extract the data from the edit control
via mechanisms such as
the GetWindowText function
and store the result someplace that had the capacity
to handle it (outside the cramped local data segment).
In order to maintain compatibility with 16-bit programs that
are expecting this behavior to continue, the window manager,
when it sees a 16-bit HINSTANCE, dutifully creates a 16-bit
data segment in which to store the data for the edit controls,
using a helper function provided by the 16-bit emulation layer.
But if you aren't really a 16-bit program, then the
16-bit emulation layer is not active, and consequently
it never gets a chance to tell the window manager how to create
one of these compatibility segments.
The solution is to add the DS_LOCALEDIT style to your dialog template.
This flag means "Do not create a dialog box data segment;
just keep using the data segment of the caller."
Therefore, when your LOAD_LIBRARY_AS_DATAFILE instance handle
is mistaken for a 16-bit instance handle,
the dialog manager won't try to create a dialog box data segment and
therefore won't try to call that helper function that doesn't exist.
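Here is a rough sketch of the workaround in context; the DLL name, dialog ID, and dialog procedure are made-up placeholders, and the important part is the DS_LOCALEDIT style in the dialog template (shown here as a resource-script comment).

#include <windows.h>

#define IDD_SETTINGS 100  // placeholder resource ID

// In the resource-only DLL's .rc file, the template carries DS_LOCALEDIT:
//   IDD_SETTINGS DIALOG 0, 0, 200, 100
//   STYLE DS_LOCALEDIT | WS_POPUP | WS_CAPTION | WS_SYSMENU
//   ...

INT_PTR CALLBACK SettingsDlgProc(HWND hdlg, UINT msg, WPARAM wParam, LPARAM)
{
    if (msg == WM_COMMAND &&
        (LOWORD(wParam) == IDOK || LOWORD(wParam) == IDCANCEL)) {
        EndDialog(hdlg, LOWORD(wParam));
        return TRUE;
    }
    return FALSE;
}

void ShowSettings(HWND hwndOwner)
{
    // Load the resource-only DLL without executing any of its code.
    HMODULE hmRes = LoadLibraryExW(L"settingsres.dll", NULL,
                                   LOAD_LIBRARY_AS_DATAFILE);
    if (!hmRes) return;

    // The datafile handle has its bottom bit set, so on systems before
    // Windows XP SP2 the dialog manager mistakes it for a 16-bit
    // HINSTANCE; DS_LOCALEDIT keeps it from trying to create the
    // 16-bit dialog box data segment.
    DialogBoxW(hmRes, MAKEINTRESOURCEW(IDD_SETTINGS),
               hwndOwner, SettingsDlgProc);

    FreeLibrary(hmRes);
}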
I believe this issue has been resolved in Windows XP SP 2.
The window manager uses a different mechanism to detect that it is
being asked to create a dialog box on behalf of a 16-bit program
and is no longer faked out by the faux-HINSTANCEs produced by
the LoadLibraryEx function.
Eric Schulman published
A Briefer History of Time,
based upon his previous effort to
capture the history of the universe in 200 words.
The book takes the initial 200-word summary and expands upon each phrase,
surreptitiously teaching you some science among the jokes.
(You can even
watch a video.)
And then this Hawking guy shows up and gives his book
the exact same title.
What a rip-off.
(I'm told this Hawking guy has a lot of fans.
Those who live in the Seattle area might be interested to know that
he'll be in town in mid-November.)
Many people suggest solving the backwards compatibility problem
by merely running old programs in a virtual machine.
This only solves part of the problem.
Sure, you can take a recalcitrant program and run it in
a virtual machine, with its own display, its own hard drive,
its own keyboard, etc.
But there are very few types of
programs (games being a notable example)
where running them in that manner
yields a satisfying experience.
Because most programs expect to interact with other programs.
Since the virtual machine is running its own operating system,
you can't easily
share information across the virtual machine boundary.
For example, suppose somebody double-clicks a .XYZ file,
and the program responsible for .XYZ files is set to run in
a virtual machine.
The virtual machine's operating system can't see that file,
so you have to get the file into the virtual machine somehow
before the program can open it.
The hassle with copying files around can be remedied by
treating the main operating system's hard drive as a remote
network drive in the virtual machine operating system.
But that helps only the local hard drive scenario.
If the user double-clicks a .XYZ file from a network server,
you'll have to re-map that server in the virtual machine.
In all cases, you'll have to worry about the case that the
drive letter and path may have changed as a result of the mapping.
And that's just the first problem.
Users will expect to be able to treat that program in the
virtual machine as if it were running on the main operating system.
Drag-and-drop and copy/paste need to work across the virtual machine boundary.
Perhaps they get information via e-mail (and their e-mail program
is running in the main operating system) and they want to paste
it into the program running in the virtual machine.
International keyboard settings wouldn't be synchronized;
changing between the English and German keyboards by tapping Ctrl+Shift
in the main operating system would have no effect on the virtual machine.
Isolating the program in a virtual machine means that it doesn't
get an accurate view of the world.
If the program creates a taskbar notification icon, that icon will
appear in the virtual machine's taskbar, not on the main taskbar.
If the program tries to use DDE to communicate with Internet Explorer,
it won't succeed, because Internet Explorer is running in the
main operating system, not in the virtual machine.
And woe unto a program that tries to
FindWindow and then SendMessage
to a window running in the other operating system.
If the program uses OLE to host an embedded Excel spreadsheet,
you will have to install Excel in the virtual machine operating system,
and when you activate the object, Excel will run in the virtual machine
rather than running in the main operating system.
Which can be quite confusing if a copy of Excel is also running
in the main operating system, since Excel is a single-instance program.
Yet somehow you got two instances running that can't talk to each other.
And running a virus checker in a virtual machine won't help keep your
main operating system safe.
As has already been noted, the virtual machine approach also doesn't
do anything to solve the plug-in problem.
You can't run Internet Explorer in the main operating system and
an Internet Explorer plug-in in a virtual machine.
And since there are so many ways that programs on the desktop
can interact with each other,
you can think of each program as just another Windows plug-in.
In a significant sense, a virtual machine is like having another
computer. Imagine if the Windows compatibility story was
"Buy another computer to run your old programs.
Sharing information between the two computers is your own problem."
I doubt people would be pleased.
For Windows 95, we actually tried this virtual machine idea.
Another developer and I got Windows 3.1
running in a virtual machine within Windows 95.
There was a Windows 3.1 desktop with Program Manager,
and inside it were all your Windows 3.1 programs.
(It wasn't a purely isolated virtual machine though.
We punched holes in the virtual machine in order to solve the
file sharing problem, taking advantage of the particular way
Windows 3.1 interacted with its DPMI host.)
Management was intrigued by this capability but ultimately
decided against it because it was a simply dreadful experience.
The limitations were too severe,
the integration far from seamless.
Nobody would have enjoyed using it,
and explaining how it works to a non-technical person would have
been nearly impossible.
As I already noted,
I went down to Los Angeles
a few days before the PDC to spend time with friends.
I stayed with a cousin who works for a major video game company,
and his boss gave him a homework assignment:
He was told to go home and play a specific video game.
(Unfortunately, it wasn't a particularly good video game,
but his boss didn't want him to admire the gameplay.
He wanted him to pay attention to the visual design.)
Tell this to a teenager and they will think my cousin
has a dream job.
"He plays video games and gets paid for it!"
But of course, we all know that there's a difference
between playing video games for fun (where you can choose
which game to play and how long to play it)
and playing them for work.
Anyway, when he was taking a break from his video game homework,
I turned on the PlayStation and popped in Katamari Damacy,
by far the most screwed-up video game ever.
In a good way.
I won't bother explaining the game; there are
plenty of other sites that
do a better job of it than I can,
perhaps the most poetic of which is
Namco's own site.
(They obviously got a professional translator to do the site
rather than relying on the bizarro-English used in the game itself!)
Featuring ball-rolling and object-collecting gameplay mechanics
of mesmerizing fluidity, reduced to Pac-Man simplicity,
through pure absurdity.
Dimensions change drastically as your clump grows
from a fraction of an inch to a monstrous freak of nature.
I was indeed mesmerized by the utter simplicity of the
gameplay, the intuitiveness of the controls,
and the sense of total glee when you realize that
you can pick up an ocean liner.
The way the game changes scale in the span of twenty
minutes adds to the overall magic.
What was at the start of the game a wall you merely
accepted as part of the landscape
becomes, after you grow your katamari for a while,
an obstacle you have to avoid,
and later still, an item you can roll up, or,
if you neglect it long enough,
something you pick up off the ground purely incidentally
like a piece of gum stuck to your shoe.
I remember on the last level,
realizing that I had just picked up
the park where the level started.
The entire park.
One thing I found myself doing was standing up
as my katamari grew larger.
I would start out the level sitting down,
and by the time I reached 200 meters, I would be
standing up and leaning left and right as my
huge ball of junk became more and more unwieldy.
Anyway, there wasn't much of a point to this entry.
I just wanted to rave about this completely messed-up game.
(I'm hardly the only fan of this game.
This particular fan club
deserves special mention for
their wonderful "Your katamari is as big as <n> comments" link.
And then there's the unbelievable
katamari cake complete with prince.)
More often I see the reverse of the
"Low priority threads can run even when higher priority threads are running"
fallacy.
Namely, people think that Sleep(0)
is a clean way to yield the CPU.
For example, they might have run out of things to do and merely
wish to wait for another thread to produce some work.
Recall that the scheduler looks for the highest priority runnable
thread, and if there is a tie, all the candidates share the CPU.
A thread can call Sleep(0) to relinquish the remainder of its
quantum, thereby reducing its share of the CPU.
Note, however, that this does not guarantee
that other threads will run.
If there is a unique runnable thread with the highest priority,
it can call Sleep(0) until the cows come home,
and it will nevertheless not relinquish CPU.
That's because sleeping for zero milliseconds releases the quantum
but leaves the thread runnable.
And since it is the only runnable thread with the highest priority,
it immediately gets the CPU back.
Sleeping for zero milliseconds is like going to the back of the line.
If there's nobody else in line, you didn't actually yield to anyone!
Therefore, if you use Sleep(0) as your way of yielding,
you will never allow lower priority threads to run.
This means that various background activities (such as indexing)
never get anywhere since your program is hogging all the CPU.
What's more, the fact that your program never actually releases
the CPU means that the computer will never go into a low-power state.
Laptops will drain their batteries faster and run hotter.
Terminal Servers will spin their CPU endlessly.
The best thing to do is to wait on a proper synchronization object
so that your thread goes to sleep until there is work to do.
If you can't do that for some reason, at least sleep for a nonzero
amount of time.
That way, for that brief moment, your thread is not runnable
and other threads—including lower-priority threads—get
a chance to run.
(This will also reduce power consumption somewhat,
though not as much as waiting on a proper synchronization object.)
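As a rough illustration of the difference (TryGetWork, DoWork, and the event name are placeholders for whatever your work queue looks like), compare a Sleep(0) polling loop with a thread that blocks on an event until a producer signals that work is available:

#include <windows.h>

BOOL TryGetWork(void);   // placeholder: dequeue one work item if any
void DoWork(void);       // placeholder: process the dequeued item

HANDLE g_hWorkReady;     // auto-reset event; producer calls SetEvent after queuing

// Bad: Sleep(0) leaves the thread runnable, so it hogs a CPU,
// starves lower-priority threads, and keeps the machine from idling.
DWORD WINAPI PollingWorker(void*)
{
    for (;;) {
        if (TryGetWork()) DoWork();
        else Sleep(0);
    }
}

// Better: the thread is not runnable until the event is signaled,
// so lower-priority threads get to run and the CPU can power down.
DWORD WINAPI BlockingWorker(void*)
{
    for (;;) {
        WaitForSingleObject(g_hWorkReady, INFINITE);
        while (TryGetWork()) DoWork();
    }
}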
I like saying "withered hand",
but Google took this a bit too far and made me
the top hit for the phrase withered hand,
more popular than Jesus
with respect to that phrase,
at least for now.
I apologize to all the people looking for the
Miracle of the Withered Hand.
Yahoo were not fooled.
Just because you have a thread running at a higher priority
level doesn't mean that no threads of lower priority will ever run.
Occasionally, I see people write multi-threaded code and put
one thread's priority higher than the other, assuming that
this will prevent the lower-priority thread from interfering
with the operation of the higher-priority thread
so that they don't need to do any explicit synchronization.
// high priority thread
g_fReady = TRUE;
g_iResult = iResult;

// low priority thread
if (g_fReady) UseResult(g_iResult); // UseResult stands in for whatever consumes the result
Let's ignore the cache coherency elephant in the room.
If there were a guarantee that the low priority thread will never
ever run while the high priority thread is running, this code looks
okay. Even if the high priority thread interrupts and sets
the result after the low priority thread has checked the ready flag,
all that happens is that the low priority thread misses out on the result.
(This is hardly a new issue, since
the principle of relativity of simultaneity
says that this was a possibility anyway.)
However, there is no guarantee that the low priority thread can't
interfere with the high priority thread.
The scheduler's rule is to look for the thread with the highest
priority that is "runnable", i.e., ready to run,
and assign it to a CPU for execution.
To be ready to run, a thread cannot be blocked on anything,
and it can't already be running on another CPU.
If there is a tie among runnable threads for the highest priority,
then the scheduler shares the CPU among them roughly equally.
You might think that, given these rules,
as long as there is a high priority thread that is runnable,
then no lower-priority thread will run.
But that's not true.
Consider the case of a multi-processor system
(and with the advent of hyperthreading, this is
becoming more and more prevalent),
where there are two runnable threads, one with higher priority
than the other.
The scheduler will first assign the high-priority thread
to one of the processors.
But it still has a spare CPU to burn,
so the low-priority thread will be assigned to the second CPU.
You now have a lower priority thread running simultaneously
with a higher priority thread.
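If you want to convince yourself of this, here is a quick experiment you could run on a machine with more than one processor (the two-second observation window is arbitrary): both spinning threads make progress even though neither ever blocks and one has a much higher priority.

#include <windows.h>
#include <stdio.h>

volatile LONG g_highCount = 0;
volatile LONG g_lowCount  = 0;

DWORD WINAPI SpinHigh(void*) { for (;;) InterlockedIncrement(&g_highCount); }
DWORD WINAPI SpinLow(void*)  { for (;;) InterlockedIncrement(&g_lowCount);  }

int main()
{
    HANDLE hHigh = CreateThread(NULL, 0, SpinHigh, NULL, 0, NULL);
    HANDLE hLow  = CreateThread(NULL, 0, SpinLow,  NULL, 0, NULL);
    SetThreadPriority(hHigh, THREAD_PRIORITY_HIGHEST);
    SetThreadPriority(hLow,  THREAD_PRIORITY_LOWEST);

    Sleep(2000);

    // On a multi-processor machine the low-priority thread keeps running
    // on the spare CPU even though the high-priority thread never blocks.
    printf("high: %ld  low: %ld\n", g_highCount, g_lowCount);
    return 0;
}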
Of course, another way a lower priority thread can run even
though there are higher priority threads in the system is
simply that all the higher priority threads are blocked.
In addition to the cases you might expect, namely
waiting on a synchronization object such as a semaphore
or a critical section,
a thread can also block for I/O or for paging.
Paging is the wildcard here, since you don't have any
control over when the system might decide to page out
the memory you were using due to memory pressure
elsewhere in the system.
The moral of the story is that thread priorities are not
a substitute for proper synchronization.
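For completeness, here is a sketch of what proper synchronization could look like for the snippet above, using an event instead of relying on priorities; the helper names (PublishResult, ConsumeResult, UseResult) are placeholders, not anything from the original code.

#include <windows.h>

void UseResult(int);        // placeholder consumer

int    g_iResult;
HANDLE g_hResultReady;      // manual-reset event, created before the threads start

// high priority thread
void PublishResult(int iResult)
{
    g_iResult = iResult;        // store the result first...
    SetEvent(g_hResultReady);   // ...then signal; the event also provides the memory ordering
}

// low priority thread
void ConsumeResult(void)
{
    // Polls the event with a zero timeout, mirroring the original flag check,
    // but with well-defined ordering; use INFINITE to block until ready.
    if (WaitForSingleObject(g_hResultReady, 0) == WAIT_OBJECT_0) {
        UseResult(g_iResult);
    }
}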
Next time, a reversal of this fallacy.
Last night, the MVP Global Summit broke up by product groups for dinner.
I was at the Windows Client product group dinner.
The problem for me was figuring out who were the MVPs
and who were just Microsoft employees looking for MVPs to chat with.
Unfortunately, the people who made up the badges didn't think of
making it easy to tell who is who.
I saw badges of different colors, but they appeared to be coded
by product group rather than by attendee status.
More than once, I sat down next to someone and introduced myself,
only to find that they were another Microsoft employee.
(Hey, but I made
Robert Flaming do a spit take.
That's gotta be worth something.)
One thing I was able to figure out at the 2005 PDC was what the badge colors meant.
The color-coding by attendee type made it much easier to identify
attendees to chat with.
Though I somehow have developed an unfortunate knack for picking
a table where people aren't speaking English.
At the PDC, I sat down at a table and realized that everybody was
speaking Dutch; although I intend to learn Dutch eventually,
it's a few languages down my list.
Last night, at the MVP Global Summit,
I was about to join a table but realized that they
were speaking in what sounded like a Central or possibly
Eastern European language.
There's nothing like an international gathering
to make you feel linguistically inadequate...