Some time ago, I noted that in order to format a USB drive as NTFS,
you have to promise to go through the removal dialog.
NTFS is a journaling file system.
The whole point of a journaling file system
is that it is robust to these sorts of catastrophic failures.
So how can surprise removal of an NTFS-formatted USB drive
result in corruption?
Well, no, it doesn't result in corruption,
at least from NTFS's point of view.
The file system data structures remain intact
(or at least can be repaired from the change journal)
regardless of when you yank the drive out of the computer.
So from the file system's point of view,
the answer is "Go ahead, yank the drive any time you want!"
This is a case of
looking at the world through filesystem-colored glasses.
Sure, the file system data structures are intact,
but what about the user's data?
The file system's autopilot system was careful to land the plane,
but yanking the drive killed the passengers.
Consider this from the user's point of view:
The user copies a large file to the USB thumb drive.
Chug chug chug.
Eventually, the file copy dialog reports 100% success.
As soon as that happens, the user yanks the USB thumb drive
out of the computer.
The user goes home and plugs in the USB thumb drive,
and finds that the file is corrupted.
"Wait, you told me the file was copied!"
Here's what happened:
Now you insert the USB drive into another computer.
Since NTFS is a journaling file system,
it can auto-repair the internal data structures that are
used to keep track of files,
so the drive itself remains logically consistent.
The file is correctly set to the final size,
and its directory entry is properly linked in.
But the data you wrote to the file?
It never made it.
The journal didn't have a copy of the data you wrote in step 2.
It only got as far as the metadata updates from step 1.
That's why the default for USB thumb drives is to optimize
for Quick Removal.
Because people expect to be able to yank USB thumb drives
out of the computer as soon as the computer says that it's done.
If you want to format a USB thumb drive as NTFS,
you have to specify that you are Optimizing for Performance
and that you promise to warn the file system before yanking
the drive, so that it can flush out all the data sitting
in the disk cache.
Even though NTFS is robust and can recover from the surprise removal,
that robustness does not extend to the internal consistency of
the data you lost.
From NTFS's point of view,
that's just a passenger.
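To make the caching point concrete, here is a minimal Python sketch of a copy routine that asks the operating system to write the file's cached data to the device before it reports success. The name `copy_and_flush` and the chunk size are assumptions made for illustration; this is not how the file copy dialog works, and it is not a substitute for the Safely Remove Hardware step.

```python
import os


def copy_and_flush(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path, then flush the data to the device.

    Hypothetical helper for illustration only.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
        # Push Python's own buffer down to the operating system...
        dst.flush()
        # ...and ask the operating system to write its cached data
        # for this file to the device before we return.
        os.fsync(dst.fileno())
```

Only after `os.fsync` returns is it reasonable to tell the user that the copy is complete.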
It seems that people missed the first sentence of this article.
Write-behind caching is disabled by default on removable drives.
You get into this mess only if you override the default.
And on the dialog box that lets you override the default,
there is a warning message that says
that when you enable write-behind caching,
you must use the Safely Remove Hardware icon
instead of just yanking the drive.
In other words, this problem occurs because you explicitly
changed a setting from the safe setting to the dangerous one,
and you ignored the warning that came with the dangerous setting,
and now you're complaining that the setting is dangerous.
One time, somebody asked me,
"What nationality are you?"
I answered, "American."
"No, I mean what nationality are your parents?"
"They're also American."
"No, I mean where are your parents from?"
"They're from New Jersey."
"No, I mean before that."
As noted some time ago,
the nominal mouse wheel amount for
one click (known as a "detent")
is specified by the constant WHEEL_DELTA,
which has the value 120.
Why not a much more convenient number like 100, or even 10?
Because the value 120 made it easier to create
higher-resolution mouse wheels.
As noted in the documentation:
The delta was set to 120 to allow Microsoft or other vendors
to build finer-resolution wheels
(a freely-rotating wheel with no notches)
to send more messages per rotation,
but with a smaller value in each message.
Suppose the original wheel mouse had nine clicks
around its circumference.
Click nine times, and you've made a full revolution.
(I have no idea how many actual clicks there were,
but the actual number doesn't matter.)
Therefore, each click of the wheel on the original mouse
resulted in 120 wheel units.
Now, suppose you wanted to build a double-resolution wheel,
say one with eighteen clicks around the circumference
instead of just nine.
If you reported 120 wheel units for each click,
then your mouse would feel "slippery",
because it scrolled twice as fast as the original mouse.
The fix is to have each click of your double-resolution mouse report
60 wheel units instead of 120.
That's why the number chosen was 120.
The number 120 has a lot more useful factors than 100.
The number 100 = 2² × 5²
can be evenly divided by the small integers 2, 4, 5, and 10.
On the other hand, the number 120 = 2³ × 3 × 5
can be evenly divided
by 2, 3, 4, 5, 6, 8, and 10.
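The arithmetic is easy to check. The following Python sketch lists the small divisors of each candidate value and works out how many wheel units a finer wheel would report per click. The function names are made up for this illustration.

```python
def small_divisors(n, limit=10):
    """Return the integers from 2 through limit that divide n evenly."""
    return [d for d in range(2, limit + 1) if n % d == 0]


def units_per_click(detent, resolution_multiple):
    """Wheel units a finer wheel reports per click, or None if fractional."""
    if detent % resolution_multiple != 0:
        return None
    return detent // resolution_multiple


print(small_divisors(100))      # [2, 4, 5, 10]
print(small_divisors(120))      # [2, 3, 4, 5, 6, 8, 10]
print(units_per_click(120, 2))  # 60
print(units_per_click(120, 3))  # 40
print(units_per_click(100, 3))  # None
```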
With a WHEEL_DELTA of 120,
a triple-resolution mouse can simply report 40 units per click,
whereas a WHEEL_DELTA of 100 would call for a fractional 33 1/3.
Okay, so why 120 instead of just 12?
As noted in the documentation,
the value was chosen so that it would be possible to
build a mouse with no clicks at all.
The wheel simply spun smoothly,
and you could stop it at any point.
Such a wheel would report one wheel unit for every
one-third of one degree of rotation.
If the detent were only 12 units,
then the wheel would report one unit for every 3 1/3
degrees of rotation,
which wouldn't be as smooth.
I don't know if anybody has developed such a mouse,
but at least the possibility is still there.
(There are free-spinning mouse wheels, but I don't know
whether they are normal WHEEL_DELTA
wheels just without the mechanical detents,
or whether they really do report fine rotational information.)
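If a wheel does report fine rotational information, a program that scrolls in whole detents has to add up the small deltas until they reach WHEEL_DELTA. Here is a minimal Python sketch of that bookkeeping. The class `WheelAccumulator` is hypothetical, and a real program would receive the deltas from its mouse wheel messages rather than from direct calls.

```python
WHEEL_DELTA = 120  # value given in the article


class WheelAccumulator:
    """Hypothetical helper: turns raw wheel units into whole detents."""

    def __init__(self):
        self.pending = 0  # units received but not yet acted upon

    def add(self, delta):
        """Add one message's delta; return whole detents to scroll (signed)."""
        self.pending += delta
        # int() truncates toward zero, so a partial detent in either
        # direction stays pending instead of being rounded away.
        detents = int(self.pending / WHEEL_DELTA)
        self.pending -= detents * WHEEL_DELTA
        return detents
```

A triple-resolution wheel that sends 40 units per click produces one detent on every third call to `add`, which matches the feel of the original mouse.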
Bonus reading:
The History of the Scroll Wheel,
written by its inventor.
Mouse wheel trivia:
The code name for the mouse wheel project was Magellan.
The code name still lingers in
that pop up from the original wheel mouse driver.
A developer from another group within Microsoft
wanted to create a TaskDialog with
a progress bar,
but they couldn't figure out how to get rid of the Cancel button.
"Is there a way to remove all the buttons from a Task Dialog?"
Users hate it when you give them a window that cannot be closed.
What should the user do if the reticulation server stops responding?
Shut down the computer?
(Hey, at least shutting down the computer
will actually work.)
"The process usually takes around two seconds,
and we time out after ten.
In the case of timeout, we replace the progress dialog with
a failure dialog with the options Close and Retry.
But for this dialog, we just want to show the progress bar
so they know that we are doing something.
We have not yet finalized the design.
One design is to have a Cancel button on the progress dialog;
another is to remove the option to Cancel.
We're just investigating the possibility of the second option.
We haven't committed to it yet."
You should leave the Cancel button enabled,
and if the user clicks it,
then go straight to the "timed out" dialog.
Removing the Cancel button leaves the user trapped in a dialog box
with no escape route.
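The recommended behavior can be sketched in a few lines of Python. This is only an illustration under assumed names: `wait_for_operation`, the two event objects, and the polling interval are not from the original exchange. The point is that Cancel and timeout lead to the same outcome, so the caller shows the same Close/Retry dialog either way.

```python
import threading
import time


def wait_for_operation(done, cancel, timeout_seconds=10.0, poll_seconds=0.05):
    """Wait for `done`; return "completed" or "failed".

    `done` and `cancel` are threading.Event objects. A click on Cancel
    (cancel.set()) is treated exactly like a timeout.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if done.wait(poll_seconds):
            return "completed"
        if cancel.is_set():
            return "failed"
    return "failed"
```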
By an astonishing coincidence,
a few weeks after this email exchange concluded,
I happened to encounter the Reticulating Splines dialog,
and it got stuck,
and there was no Cancel button.
The frustrated user who got trapped with a window that could
not be closed or cancelled turned out to be me.
At least as of the time this article was originally written,
the HTML clipboard format
is officially at version 0.9.
A customer observed that sometimes they received
HTML clipboard data that marked itself as version 1.0
and wanted to know where they could find documentation
on that version.
As far as I can tell, there is no official version 1.0 of
the HTML clipboard format.
I hunted around, and the source of the rogue version 1.0
format appears to be
the WPF Toolkit.
Version 1.0 has been the version used by
since its initial commit.
If you read the code, it appears that they are not generating
HTML clipboard data that uses any features beyond version 0.9,
so the initial impression is that
it's just somebody who jumped the gun and set their version number
higher than they should have.
The preliminary analysis says that
you can treat version 1.0 the same as version 0.9.
But that's merely the preliminary analysis.
A closer look at the
function shows that it generated the HTML content incorrectly.
The code treats the fragment start and end offsets as character
offsets, not byte offsets.
But the offsets are explicitly documented as being in bytes:
StartFragment: Byte count from the beginning of the clipboard
to the start of the fragment.
EndFragment: Byte count from the beginning of the clipboard
to the end of the fragment.
Now, WPF knows that
the DataFormats.HTML clipboard format
is encoded in UTF-8, so when you pass a C# string to be placed
on the clipboard as HTML, it knows to convert the string
to UTF-8 before putting it on the clipboard.
But it doesn't know to convert the offsets you provided in the
HTML fragment itself.
As a result, the values encoded in the offsets end up too small
if the text contains non-ASCII characters.
(You can see this by copying text containing non-ASCII characters
from the DataGrid control, then pasting into Word.
Result: Truncated text, possibly truncated to nothing depending on
the nature of the text.)
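For comparison, here is a minimal Python sketch of computing the offsets as byte counts. The function name `build_cf_html`, the exact wrapper markup, and the ten-digit zero-padding of the offsets are choices made for this illustration, not something taken from the WPF Toolkit or mandated by the format.

```python
def build_cf_html(fragment):
    """Wrap an HTML fragment in a version 0.9 HTML clipboard payload.

    Returns bytes. Every offset is a byte count from the start of the
    payload, measured after encoding to UTF-8.
    """
    header_template = (
        "Version:0.9\r\n"
        "StartHTML:{:010d}\r\n"
        "EndHTML:{:010d}\r\n"
        "StartFragment:{:010d}\r\n"
        "EndFragment:{:010d}\r\n"
    )
    prefix = "<html><body><!--StartFragment-->".encode("utf-8")
    body = fragment.encode("utf-8")
    suffix = "<!--EndFragment--></body></html>".encode("utf-8")

    # The offsets are zero-padded to a fixed width, so the header length
    # does not depend on the values that get filled in.
    header_len = len(header_template.format(0, 0, 0, 0).encode("utf-8"))
    start_html = header_len
    start_fragment = start_html + len(prefix)
    end_fragment = start_fragment + len(body)
    end_html = end_fragment + len(suffix)

    header = header_template.format(
        start_html, end_html, start_fragment, end_fragment
    ).encode("utf-8")
    return header + prefix + body + suffix
```

The difference shows up as soon as the text leaves ASCII: `len("café")` is 4, but `len("café".encode("utf-8"))` is 5, so an EndFragment computed from the character count lands one byte short.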
There are two other errors in that function.
Although the code attempts to follow the recommendation of
the specification by placing a
<!--EndFragment--> marker after the fragment,
they erroneously insert a \r\n in between.
Furthermore, the EndHTML value is off by two.
(It should be
which is 38, not 36.)
Okay, now that we see the full situation, it becomes clear that
at least five things need to happen.
First, the immediate concern is what an application should do when it sees
a rogue version 1.0.
One approach is to exactly undo the errors in the WPF Toolkit:
Treat the offsets as character offsets (after converting from UTF-8
to UTF-16) rather than byte offsets.
This would address the direct problem of the WPF Toolkit, but it is
also far too aggressive, because there may be another application which
accidentally marked its HTML clipboard data as version 1.0
but which does not contain the exact same bug as the WPF Toolkit.
Instead, applications which see a
version number of 1.0 should treat the
EndHTML, EndFragment, and EndSelection offsets as untrustworthy.
The application should verify that the EndFragment lines up with the
<!--EndFragment--> marker.
If it does not, then ignore the specified value for EndFragment and
infer the correct offset to the
fragment end by searching for the last occurrence of the
<!--EndFragment--> marker in the clipboard data,
but trim off the spurious \r\n that the WPF Toolkit
erroneously inserted, if present.
Similarly, EndHTML should line up with the end of the
</HTML> tag; if not, the specified offset
should be ignored and the correct value inferred.
Fortunately, the WPF Toolkit does not use EndSelection,
so there is no need to attempt to repair that value,
and it does not use multiple fragments, so only one fragment repair
is needed.
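The EndFragment repair might look something like the following Python sketch. The function name `repaired_end_fragment` is hypothetical, the header is parsed with a simple regular expression, and the caller is assumed to invoke it only for data that declares version 1.0. The EndHTML repair would follow the same pattern.

```python
import re

END_MARKER = b"<!--EndFragment-->"


def repaired_end_fragment(payload):
    """Return a usable EndFragment byte offset for an HTML clipboard payload.

    `payload` is the raw clipboard bytes. The declared offset is kept when
    it lines up with the marker; otherwise it is inferred from the marker.
    """
    marker_pos = payload.rfind(END_MARKER)
    if marker_pos < 0:
        raise ValueError("no <!--EndFragment--> marker found")

    match = re.search(rb"EndFragment:(\d+)", payload)
    if match and int(match.group(1)) == marker_pos:
        return marker_pos  # declared value is trustworthy

    # Declared value is wrong or absent: infer it from the marker, and
    # trim the spurious \r\n inserted before the marker, if present.
    end = marker_pos
    if payload[:end].endswith(b"\r\n"):
        end -= 2
    return end
```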
Welcome to the world of application compatibility,
where you have to accommodate the mistakes of others.
Some readers of this Web site would suggest that the correct course
of action for your application is to
detect version 1.0 and put up an
error message saying,
"The HTML on the clipboard was placed there by a buggy application.
Contact the vendor of that application and tell them to fix their bug.
Until then, I will refuse to paste the data you copied.
Don't blame me! I did nothing wrong!"
Good luck with that.
Second, the authors of the WPF Toolkit should fix their bug so that
they encode the offsets correctly in their HTML clipboard format.
Third, at the same time they fix their bug, they should switch their
reported version number back to 0.9,
so as to say,
"Okay, everybody, this is the not-buggy version.
No workaround needed any more."
If they leave it as 1.0, then applications which took the more
aggressive workaround will end up double-correcting.
Fourth, the maintainers of the HTML clipboard format may want
to document the rogue version 1.0 clipboard format and provide
recommendations to applications (like I just did)
as to what they should do when they encounter it.
Fifth, the maintainers of the HTML clipboard format must not
use version 1.0 as the version number for any future revision of
the HTML clipboard format.
If they make another version, they need to call it
0.99 or 1.01 or something different from 1.0.
Version 1.0 is now tainted.
It's the version number that proclaims, "I am buggy!"
At first, we thought that all we found was
a typo in an open-source helper library,
but digging deeper and deeper revealed that it was actually
a symptom of a much deeper problem that has now turned into
an industry-wide five-pronged plan for remediation.
Occasionally, a customer will ask,
"What is Rundll32.exe and when should I use it instead of just
writing a standalone exe?"
The guidance is very simple:
Don't use rundll32.
Just write your standalone exe.
Rundll32 is a leftover from Windows 95,
and it has been deprecated since at least
Windows Vista because it violates a lot of modern
If you run something via Rundll32,
then you lose the ability to tailor the execution
environment to the thing you're running.
Instead, the environment is set up for whatever Rundll32 requests.
You get the idea.
Note also that Rundll32 assumes that the entry point you provide
corresponds to a task which pumps messages,
since it creates a window on your behalf and passes it as
the first parameter.
A common mistake is writing a Rundll32 entry point for a long-running
task that does not pump messages.
The result is an unresponsive window that
Digging deeper, one customer explained that they asked for guidance
making this choice because they want to create a scheduled task
that runs code inside a DLL,
and they wanted to decide whether to create a Rundll32 entry point
in their DLL,
or whether they should just create a custom executable whose sole
job is loading the DLL and calling the custom code.
By phrasing it as an either/or question,
they missed the third (correct) option:
Create your scheduled task with an
that specifies a CLSID your DLL implements.
Bohemian Rhapsody was not part of my world growing up,
so I view the continuing
cultural fascination with the piece with detached confusion.
The hallmark of cultural preoccupation is the fact that
the Wikipedia entry
deconstructs the piece moment by moment,
clocking in at over 2000 words,
far in excess of the Wikipedia recommendation of a 60-word summary
for a 6-minute piece (10 words per minute).
That's longer than the entire Wikipedia page for Ruth Bader Ginsburg.
When you type a phrase
into the Windows Vista Start menu's search box and click
Search the Internet,
then the Start menu hands the query off to your
default Internet search provider.
Or at least that's what the
would have you believe.
A customer reported that
when they typed a phrase into the Search box and clicked
Search the Internet,
they got a screenful of advertisements
disguised to look like search results.
What kind of evil Microsoft shenanigans is this?
If you looked carefully at the URL for the
bogus search "results",
the results were not coming from
Windows Live Search.
They were coming from a server controlled by
the customer's ISP.
That was the key to the rest of the investigation.
Here's what's going on:
The ISP configured all its customers to use the ISP's custom DNS servers.
That custom DNS server,
when asked for the
location of search.live.com,
returned not
the actual IP address of Windows Live Search but rather the IP address
of a machine hosted by the ISP.
(This was confirmed by manually running nslookup
on the customer machine and seeing that the wrong IP addresses
were being returned.)
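The same comparison can be scripted. Here is a minimal Python sketch that reports any address the local resolver returns which is not in a list you already trust. The function name `suspicious_addresses` is made up, and the list of trusted addresses is something you would have to obtain yourself, for example from a lookup made on a different network. A mismatch is a reason to look closer, not proof of tampering, since large sites legitimately answer with different addresses in different places.

```python
import socket


def suspicious_addresses(hostname, known_good, resolve=socket.gethostbyname_ex):
    """Return addresses the resolver gave for hostname that are not known good.

    known_good is a collection of address strings obtained from a source
    you trust. resolve can be replaced for testing.
    """
    _name, _aliases, addresses = resolve(hostname)
    return sorted(set(addresses) - set(known_good))
```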
The ISP was stealing traffic from Windows Live Search.
It then studied the URL you requested,
and if it was the URL used by the Start menu Search feature,
then it sent you to
the page of fake search results.
Otherwise, it redirected you to the real Windows Live Search,
and you were none the wiser, aside from your Web search taking
a fraction of a second longer than usual.
(Okay, snarky commenters,
and aside from the fact that it was Windows Live Search.)
The fake results page does have an About This Page link,
but that page only talks about how the ISP
intercepts failed DNS queries
(which has by now become
It doesn't talk about redirecting successful DNS queries.
I remember when people noticed
widespread hijacking of search traffic,
and my response to myself was,
"I've known about this for years."
It so happens that the offending ISP's Acceptable Use Policy
explicitly lists as a forbidden activity
"to spoof the URL, DNS, or IP addresses of «ISP» or any
In other words, they were violating their own AUP.
More than once, a customer has noticed that
running the exact same program under the debugger
rather than standalone
causes it to change behavior.
And not just in the "oh, the timing of various operations
changed to hit different race conditions" sense, but in much
more fundamental ways like
"my program runs really slow",
"my program crashes in a totally
different location", or (even more frustrating)
"my bug goes away".
What's going on?
I'm not even switching between the retail and debug
versions of my program,
so I'm not a victim of
changing program semantics in the debug build.
When a program is running under the debugger,
some parts of the system behave differently.
One example is that the
CloseHandle function raises an exception
(I believe it's
STATUS_INVALID_HANDLE but don't quote me)
if you ask it to close a handle that isn't open.
But the one that catches most people is that when run under
the debugger,
an alternate heap is used.
This alternate heap has a different memory layout,
and it does extra work when allocating and freeing memory
to help try to catch common heap errors,
like filling newly-allocated memory with a known sentinel value.
But this change in behavior can make your debugging harder.
So much for people's suggestions to
switch to a stricter implementation of the Windows API
when a debugger is attached.
On Windows XP and higher,
you can disable the debug heap even when debugging.
If you are using a dbgeng-based debugger
like ntsd or WinDbg,
you can pass the -hd command line switch.
If you are using a different debugger, you can
set the _NO_DEBUG_HEAP environment variable to 1.
If you are debugging on a version of Windows prior to Windows XP,
you can start the process without a debugger,
then connect a debugger to the live process.
The decision to use the debug heap is made at process
startup, so connecting the debugger afterwards ensures
that the retail heap is chosen.
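If you launch your debugger from a script, the environment variable approach can be sketched like this in Python. The helper names are hypothetical, and the sketch assumes that your debugger passes its own environment along to the program it starts.

```python
import os
import subprocess


def environment_without_debug_heap(base_env=None):
    """Return a copy of the environment with _NO_DEBUG_HEAP set to "1"."""
    env = dict(os.environ if base_env is None else base_env)
    env["_NO_DEBUG_HEAP"] = "1"
    return env


def launch(command):
    """Start `command` (a list of strings) with the debug heap disabled.

    `command` would normally be your debugger's own command line.
    """
    return subprocess.Popen(command, env=environment_without_debug_heap())
```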