Okay, everybody, here's your chance to solve a compatibility problem.
There is no answer yet;
I'm looking to see how you folks would attack it.
This is a real bug in the Windows Vista bug database.
A beta tester reported that Explorer fails to show more
than about a hundred files per directory from file servers running
a particular brand of file server software.
The shell and networking teams investigated the problem together
and tracked it down to the server incorrectly handling
certain types of directory queries.
Although the server claims to support both slow and fast queries,
if you try a fast query,
it returns only the first hundred or so files and then gives up with
a strange error code.
On the other hand, if Explorer switches to the slow query,
then everything works fine.
(Windows XP always used the slow query.)
An update to the server software was released earlier this year
which claims to fix the bug.
However (as of this writing),
all of the vendor's distributors continue to ship the buggy version
of the driver.
What should we do?
Here are some options.
Choose one of the below or make up your own!
Make no accommodation for this particular buggy protocol implementation.
People who are running that particular implementation will get
incomplete directory listings.
Publish a Knowledge Base article describing the problem and directing
customers to contact the vendor for an updated driver.
Explorer should recognize the strange error code
and display an error message to the user saying,
"The server \\servername appears to be running an old version
of the XYZ driver that does not report the contents of large
directories correctly.
Not all items in the directory are shown here.
Please contact the administrator of the machine \\servername to have the
driver upgraded."
(Possibly with a "Don't show this dialog again" check-box.)
Explorer should recognize the strange error code
and say, "Oh, this server must have the buggy driver.
It's too late to do anything about the current directory information,
but I'll remember that I should do things the slow way in the future
when talking to this server."
To avoid denial-of-service attacks, remember only the last 16 (say)
servers that exhibit the problem.
(If the list of "known bad" servers were unbounded, then an attacker
could consume all the memory on your computer by creating a server
that responded to a billion different names and
using HTTP redirects to get you to visit all of those servers in turn.)
Add a configuration setting to the Windows network client
to tell it "If somebody asks whether a server supports fast queries,
always say No, even if the server says Yes."
In this manner, no program will attempt to use fast queries;
they will all use slow queries.
Directory queries will run slower, but at least they will work.
Add a configuration setting to Explorer
to tell it "Always issue slow queries; never issue fast queries."
Directory queries will run slower, but at least they will work.
But this affects only Explorer;
other programs which ask the server "Do you support fast queries?"
will receive an affirmative response and attempt to use fast queries,
only to rediscover the problem that Explorer worked around.
Stop supporting "fast mode" in the network client
since it is unreliable;
there are some servers that don't handle "fast mode" correctly.
This forces all programs to use "slow mode".
Optionally, have a configuration setting to re-enable "fast mode".
Make sure to list both advantages and disadvantages of your proposal.
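To make the "detect the strange error and do it the slow way next time" option concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: BuggyServer, StrangeError, and Client are hypothetical stand-ins, not the real network client or the real protocol. It assumes the fast query fails after 100 entries and that the client remembers at most 16 bad servers.

```python
from collections import deque


class StrangeError(Exception):
    """Stand-in for the strange error code the buggy server returns."""


class BuggyServer:
    """Hypothetical server whose fast query gives up after 100 entries."""

    def __init__(self, name, files):
        self.name = name
        self.files = list(files)

    def fast_query(self):
        for index, filename in enumerate(self.files):
            if index >= 100:
                raise StrangeError(self.name)
            yield filename

    def slow_query(self):
        yield from self.files


class Client:
    def __init__(self, max_remembered=16):
        # Bounded on purpose: the oldest entry falls off when a 17th arrives.
        self.known_bad = deque(maxlen=max_remembered)

    def list_directory(self, server):
        """Return (entries, complete) for one directory listing."""
        if server.name in self.known_bad:
            return list(server.slow_query()), True
        entries = []
        try:
            for filename in server.fast_query():
                entries.append(filename)
        except StrangeError:
            # Too late for this listing; remember the server for next time.
            self.known_bad.append(server.name)
            return entries, False
        return entries, True


server = BuggyServer("servername", [f"file{i}.txt" for i in range(200)])
client = Client()
first_entries, first_complete = client.list_directory(server)
second_entries, second_complete = client.list_directory(server)
print(len(first_entries), first_complete)    # 100 False
print(len(second_entries), second_complete)  # 200 True
```

Note that the first listing is still incomplete, which is exactly the weakness of this option: the user has to come back to the directory before the workaround kicks in.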
There were an awful lot of comments yesterday
and it will take me a while to work through them all.
But I'll start with some more background on the problem
and clarify some issues that people had misinterpreted.
As a few people surmised, the network file server software
in question is Samba,
a version of which comes with most Linux distributions.
(I'll have to do a better job next time of disguising the
identities of the parties involved.)
Samba is also very popular as the network file server for
embedded devices such as network-attached storage.
The bug in question is fixed in the latest version of Samba,
but none of the major distributions have picked up the fix yet.
Not that that helps the network-attached storage scenario any.
It appears that a lot of people thought the buggy driver
was running on the Windows Vista machine,
since they started talking about
blocking its installation.
The problem is not on the Windows Vista machine;
the problem is on the file server, which is running Linux.
WHQL does not certify Linux drivers,
it can't stop you from installing a driver
on some other Linux machine,
and it certainly can't
download an updated driver and somehow upgrade your Linux machine.
Remember, the bug is on the server,
which is another computer running some other operating system.
Asking Windows to update the driver on the remote server makes about
as much sense as asking Internet Explorer to upgrade the version
of Apache running on slashdot.org.
You're the client; you have no power over the server.
Some people lost sight of the network-attached storage scenario,
probably because they weren't familiar with the term.
A network-attached storage device is a self-contained device
consisting of a large hard drive, a tiny computer, and a place
to plug in a network cable.
The computer has an operating system burned into its ROMs
(often a cut-down version of Linux with Samba),
and when you turn it on, the device boots the computer,
loads the operating system, and acts as a file server on your network.
Since everything is burned into ROM,
claiming that "the driver will get upgraded and the problem will eventually be long
forgotten" is wishful thinking.
It's not like you can download a new Samba driver and install
it into your network-attached storage device.
You'll have to wait for the manufacturer to release a new ROM.
As for detecting a buggy driver, the CIFS protocol doesn't
really give the client much information about what's running
on the server, aside from a "family" field that identifies
the general category of the server (OS/2, Samba, Windows NT, etc.).
All that a client can tell, therefore, is "Well, the server
is running some version of Samba."
It can't tell whether it's a buggy version or a fixed version.
The only way to tell that you are talking to a buggy server
is to wait for the bug to happen.
(Which means that people who said, "Windows Vista should just default
to the slow version," are saying that they want Windows Vista
to run slow against Samba servers and fast against Windows NT servers.
This plays right into the hands of the conspiracy theorists.)
My final remark for today is explaining how a web site can
"bloat the cache" of known good/bad servers and create a denial
of service if the cache did not have a size cap:
First, set up a DNS server that directs all requests for *.hackersite.com
to your Linux machine.
On this Linux machine, install one of the buggy versions of Samba.
Now serve up this web page:
<IFRAME SRC="\\a1.hackersite.com\b" HEIGHT=1 WIDTH=1></IFRAME>
<IFRAME SRC="\\a2.hackersite.com\b" HEIGHT=1 WIDTH=1></IFRAME>
<IFRAME SRC="\\a3.hackersite.com\b" HEIGHT=1 WIDTH=1></IFRAME>
<IFRAME SRC="\\a4.hackersite.com\b" HEIGHT=1 WIDTH=1></IFRAME>
...
<IFRAME SRC="\\a10000.hackersite.com\b" HEIGHT=1 WIDTH=1></IFRAME>
Each of those IFRAMEs displays an Explorer window
with the contents of the directory \\a1.hackersite.com\b.
(Since all the names resolve to the same machine,
all the \\*.hackersite.com machines are really the same.)
In that directory, put 200 files, so as to trigger the
"more than 100 files" bug and force Windows Vista to cache the server
as a "bad" server.
In this way, you forced Windows Vista to create ten thousand records
for the ten thousand bad servers you asked to be displayed.
Throw in a little more script and you can turn this into a loop that
accesses millions of "different" servers (all really the same server).
If the "bad server" cache did not have a cap, you just allowed a
bad server to consume megabytes of memory that will never be freed
until the computer is rebooted.
Pretty neat trick.
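If you want to see the difference a cap makes, here is a small sketch in Python. It is only a simulation: the function visit_bad_servers is a made-up stand-in for "Explorer browsed to this server and hit the bug," and the cap of 16 is the number floated in the original proposal.

```python
from collections import deque


def visit_bad_servers(remember, count):
    """Pretend to browse `count` differently named servers that all trip the bug."""
    for i in range(1, count + 1):
        remember(f"a{i}.hackersite.com")


# An uncapped "known bad" list grows by one record per hostname.
unbounded = set()
visit_bad_servers(unbounded.add, 10_000)

# A capped list silently forgets the oldest entry once it is full.
capped = deque(maxlen=16)
visit_bad_servers(capped.append, 10_000)

print(len(unbounded))  # 10000
print(len(capped))     # 16
```

The capped version forgets legitimate bad servers when it overflows, but the worst that happens is that the bug gets rediscovered later.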
Even worse, if you proposed preserving this cache across reboots,
then you're going to have to come up with a place to save this information.
Whether you decide that it goes in a file or in the registry,
the point is that an attacker can use this "bloat attack" and cause
the poor victim's disk space/registry usage to grow without bound
until they run out of quota.
And once they hit quota, be it disk quota or registry quota,
not only do bad things start happening, but they don't even know
what file or registry key they have to delete to get back under quota.
Next time, I'll start addressing some of the proposals that people came
up with, pointing out disadvantages that they may have missed
in their analysis.
You may have noticed that there's a copy of Notepad
in the Windows directory and another in the System32 directory.
Why two copies?
Compatibility, of course.
Windows 3.0 put Notepad in the Windows directory.
Windows NT put it in the System32 directory.
Notepad is perhaps the most commonly hardcoded program in Windows.
Many Setup programs use it to view the Readme file,
and you can use your imagination to come up with other places
where a program or batch file or
printed instructions will hard-code the path to Notepad.
In order to be compatible with programs designed for
Windows 95, there needs to be a copy of Notepad
in the Windows directory.
And in order to be compatible with programs designed for
Windows NT, there also needs to be a copy in the System32 directory.
And now that Notepad exists in both places,
new programs have a choice of Notepads,
and since there is no clear winner,
half of them will choose the one in the Windows directory
and half will choose the one in the System32 directory,
thereby ensuring the continued existence of two copies
of Notepad for years to come.
Often, people will not even realize that their solution to a problem
merely replaces it with another problem.
The quip attributed to Jamie Zawinski captures the sentiment:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
For example, in response to
"How do I write a batch file that..."
some people will say,
"First, install <perl|bash|monad|...>".
This doesn't actually solve the problem;
it merely replaces it with a different problem.
In particular, if the solution begins with "First, install..."
you've pretty much lost out of the gate.
Solving a five-minute problem by taking a half hour to download and
install a program is a net loss.
In a corporate environment, adding a program to a deployment
is extraordinarily expensive.
You have to work with your company's legal team to make sure
the licensing terms for the new program are acceptable
and do not create undue risk from a legal standpoint.
What is your plan of action if
the new program stops working, and your company starts losing
tens of thousands of dollars a day?
You have to do interoperability testing to make sure the new
program doesn't conflict with the other programs in the deployment.
(In the non-corporate case, you still run the risk that the
new program will conflict with one of your existing programs.)
Second, many of these "solutions" require that you
abandon your partial solution so far and rewrite it in the new language.
If you've invested years in tweaking a batch file and you just need
one more thing to get that new feature working,
and somebody says,
"Oh, what you need to do is throw away your batch file and
start over in this new language,"
you're unlikely to take up that suggestion.
So be careful when you suggest a solution that has a high startup cost.
Sure, something could be taken care of by a one-line perl script,
but getting perl onto the machine is hardly a one-line endeavor.
A commenter named "Al" wondered why
the window manager couldn't just take over behavior that used to
be within the application's purview, such as painting the non-client
area, in order to avoid problems with applications not responding
to messages promptly enough.
If the window manager were being rewritten, then perhaps it could.
But to do it now would introduce many compatibility issues.
First, there are many applications that have subtle dependencies
on message ordering or receiving certain types of messages at
certain times, even though there is no actual guarantee in the
specification that such messages be delivered.
There are a large number of applications that rely on
WM_PAINT messages being delivered even if
there is nothing to paint,
because they defer some critical computations until the
first WM_PAINT message,
and if something that requires the result of that computation
happens before a WM_PAINT, they crash.
For example, if you launch a program minimized, then right-click
on the taskbar button for the program's main window,
these programs would crash because the code
that handles the system menu uses a pointer variable that
the WM_PAINT handler initializes
or divides by a global variable whose default value is zero
but whose value is calculated during WM_PAINT handling.
To accommodate these programs, the window manager is forced to send
"dummy" WM_PAINT messages with an empty rcPaint.
These messages appear to accomplish nothing,
but the hidden agenda is that the program gets
its cherished WM_PAINT message
and can perform whatever operations keep it from
crashing later on.
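Here's a toy model of that failure mode, written in Python rather than against the real window manager. LazyApp, on_paint, and on_system_menu are hypothetical names invented for this sketch; the only thing it borrows from the description above is the shape of the bug: a global that defaults to zero, gets computed on the first paint, and gets divided by in the system menu code.

```python
class LazyApp:
    """Hypothetical program that defers setup until its first paint message."""

    def __init__(self):
        self.cell_width = 0  # default value is zero until the first paint

    def on_paint(self, paint_rect):
        # The critical computation happens here, even if nothing needs painting.
        if self.cell_width == 0:
            self.cell_width = 8
        left, top, right, bottom = paint_rect
        return (right - left) * (bottom - top)  # pixels actually repainted

    def on_system_menu(self, menu_width):
        # Raises ZeroDivisionError if no paint message arrived first.
        return menu_width // self.cell_width


EMPTY_RECT = (0, 0, 0, 0)


def show_system_menu(app, send_dummy_paint):
    """Simulate right-clicking the taskbar button of a minimized program."""
    if send_dummy_paint:
        app.on_paint(EMPTY_RECT)  # paints nothing, but lets the app initialize
    return app.on_system_menu(160)


print(show_system_menu(LazyApp(), send_dummy_paint=True))  # 20
```

Calling show_system_menu(LazyApp(), send_dummy_paint=False) raises ZeroDivisionError, which is the simulated equivalent of the crash.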
Second, removing customizability of message behavior from the
window manager would prevent programs from customizing their
appearance in nonstandard ways.
Media players are perhaps the most popular example of
programs that want to override normal non-client painting
in order to present a totally customized window to the user.
Would you be happy if a change to Windows meant that you could
no longer "skin"
your favorite media player application?
That said, there have been changes to the window manager
over the years to maintain this "air of customizability"
while simultaneously intervening on behalf of the user to
keep things from going completely to the dogs.
For example, if a window stops painting for an extended
period of time, Windows would take it upon itself to paint
the window with a standard caption bar (even if the application
wanted to customize the caption bar), just so that the user
would be able to see something.
Another example of this "message virtualization" is
the appending of the phrase "(Not responding)" to the
caption of a window that has stopped responding,
and capturing the window contents as they were last visible,
drawing those captured window contents in the meantime
until the application woke up from its slumber,
and even allowing you to move, resize, minimize, and close those windows.
The infrastructure necessary to support this behavior is quite
extensive, because the window manager needs to maintain two
sets of bookkeeping.
The first is, "What the application thinks the window state is";
if the application asks for the size of its hung window,
it needs to be told, "Oh, you're still that size you were before,
don't you worry your pretty little head",
even though the actual window size on the screen has changed.
The second is what the window state really is on the screen.
Once the hung window starts responding to messages again,
all the activity that happened "while it was away" needs to
be replayed to get the window "back up to speed" with the
state of the world.
Interesting things happen if the program wanted to customize
one of the actions that happened to the "virtual window".
For example, it might want to reject certain window sizes
or display a special message before minimizing.
Resolving these conflicts in a manner that doesn't cause
applications to crash outright is another of the difficulties
of trying to get the virtual and real window states back into sync.
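The two sets of bookkeeping can be sketched as a toy model in Python. This is not how the window manager is implemented; GhostedWindow and its methods are hypothetical, and the conflict policy (if the application rejects a replayed resize, the screen goes back to what the application accepted) is just one assumption picked for the sketch.

```python
class GhostedWindow:
    """Toy model: what the app believes vs. what is really on screen."""

    def __init__(self, size):
        self.app_size = size    # what the application is told
        self.real_size = size   # what the user actually sees
        self.pending = []       # resizes that happened while the app was hung

    def user_resize(self, new_size):
        self.real_size = new_size
        self.pending.append(new_size)

    def app_query_size(self):
        # "You're still that size you were before."
        return self.app_size

    def wake_up(self, app_accepts):
        """Replay missed resizes; the app may reject some of them."""
        for new_size in self.pending:
            if app_accepts(new_size):
                self.app_size = new_size
        self.pending.clear()
        # Assumed policy: the screen falls back to what the app accepted.
        self.real_size = self.app_size


def at_least_100_wide(size):
    return size[0] >= 100


window = GhostedWindow((640, 480))
window.user_resize((50, 50))
print(window.app_query_size(), window.real_size)  # (640, 480) (50, 50)
window.wake_up(at_least_100_wide)
print(window.app_size, window.real_size)          # (640, 480) (640, 480)
```

Even in the toy version you can see the awkward part: the user already saw a 50-by-50 window that the program would never have allowed.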
In a sense, therefore, the window manager does take over
selected behaviors that used to be within the application's
purview, but it has to do it in a delicate enough manner
that neither the application nor the end user will even realize
that it's happening.
And that's what makes it hard.
Windows File Protection works by replacing files after they have
been overwritten.
Why didn't Windows just apply ACLs to deny write permission to the
files?
We tried that.
It didn't work.
Programs expect to be able to overwrite the files.
A program's setup would run, decide that it needed to "update"
some system file, and attempt to overwrite it.
If the system tried to stop the file from being overwritten,
the setup program would halt and report that it was unable
to install the file.
Even if the operating system detected that somebody was trying
to overwrite a system file and instead gave them a handle to something else,
those programs would nevertheless notice that they had been
hoodwinked because as a "verification" step,
they would open the file they had just copied
and compare it against the "master copy"
on the installation CD.
The solution was to let the program think it had won,
and then, when it wasn't looking,
put the original back.
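The "let it win, then put the original back" idea can be sketched as follows in Python. This is a rough illustration with hypothetical file names and helper functions, run entirely inside a temporary directory; it says nothing about how Windows File Protection actually notices the change or where it keeps its copies.

```python
import shutil
import tempfile
from pathlib import Path


def install_with_overwrite(system_file: Path, new_contents: bytes) -> bool:
    """Hypothetical installer: overwrite the file, then verify the copy."""
    system_file.write_bytes(new_contents)
    return system_file.read_bytes() == new_contents  # the "verification" step


def restore_if_changed(system_file: Path, cached_copy: Path) -> bool:
    """Put the original back if the file no longer matches the cached copy."""
    if system_file.read_bytes() != cached_copy.read_bytes():
        shutil.copyfile(cached_copy, system_file)
        return True
    return False


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        system_file = Path(tmp) / "system.dll"
        cached_copy = Path(tmp) / "cache.dll"
        system_file.write_bytes(b"original")
        cached_copy.write_bytes(b"original")

        # The installer overwrites the file and its verification passes.
        installer_happy = install_with_overwrite(system_file, b"older version")
        # Later, when the installer isn't looking, the original goes back.
        restored = restore_if_changed(system_file, cached_copy)
        return installer_happy, restored, system_file.read_bytes()


print(demo())  # (True, True, b'original')
```

The installer's own check succeeds because nothing interfered with the write; the repair happens afterward.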
Now that Windows File Protection has been around for a few years,
software installers have learned that it's not okay to overwrite
system files (and trying to do it won't work anyway),
so starting in Windows Vista,
the Windows File Protection folks have started taking stronger
steps to protect system files,
and this includes using ACLs to make the files harder to replace.
Presumably, they will have compatibility plans in place to
accommodate programs whose setup really wants to overwrite a file.
There are some basic ground rules that apply to all system programming,
so obvious that most documentation does not bother explaining them
because these rules should have been internalized by practitioners
of the art to the point where they need not be expressed.
In the same way that when plotting driving directions
you wouldn't even consider taking a shortcut through somebody's
backyard or going the wrong way down a one-way street,
and in the same way that an experienced chess player doesn't even consider
illegal moves when deciding what to do next,
an experienced programmer doesn't even consider violating the following
basic rules without explicit permission in the documentation to the contrary:
(Remember, every statement here is a basic ground rule, not an
absolute inescapable fact. Assume every sentence here is
prefaced with "In the absence of indications to the contrary".
If the caller and callee have agreed
on an exception to the rule, then that exception applies.
For example, a pointer that is prototyped as volatile
is explicitly marked as "This value can change from another thread,"
so the rule against modifying function parameters does not apply to
such a pointer.)
Coming up with this was hard,
in the same way it's hard to come up with a list of
illegal chess moves.
The rules are so automatic that they aren't really rules
so much as things that simply are and it would be crazy
even to consider otherwise.
As a result, I'm sure there are other "rules so obvious they need
not be said" that are missing.
(For example, "You cannot terminate a thread while it is inside
somebody else's function.")
One handy rule of thumb for what you can do to a function call
is to ask, "How would I like it if somebody did that to me?"
(This is a special case of the "Imagine if this were possible" test.)
At dinner yesterday,
I mentioned how I felt ripped off when I eventually learned that
the Lenten fast does not apply to Sunday.
If you give up, say, chocolate for Lent,
you are not held to that obligation on Sundays.
Those who are mathematically inclined would have
noticed that something was up:
Lent is forty days long, yet if you count
backwards forty days from Easter Sunday,
you don't get Ash Wednesday.
To hit Ash Wednesday, you have to skip over the Sundays.
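For the mathematically inclined, here is the arithmetic as a short Python sketch. It uses Easter Sunday 2006 (April 16) as the example date; the helper count_back_fast_days is just a name made up for this illustration.

```python
from datetime import date, timedelta

SUNDAY = 6  # date.weekday() numbers Monday as 0 and Sunday as 6


def count_back_fast_days(easter, fast_days=40):
    """Walk back from the day before Easter, counting only non-Sundays."""
    day = easter
    counted = 0
    while counted < fast_days:
        day -= timedelta(days=1)
        if day.weekday() != SUNDAY:
            counted += 1
    return day


easter_2006 = date(2006, 4, 16)
naive = easter_2006 - timedelta(days=40)
ash_wednesday = count_back_fast_days(easter_2006)

print(naive.isoformat())                    # 2006-03-07, a Tuesday
print(ash_wednesday.isoformat())            # 2006-03-01, a Wednesday
print((easter_2006 - ash_wednesday).days)   # 46
```

Forty-six calendar days minus the six Sundays in between gives the forty days of the fast.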
When I related this little anecdote,
the head of one of the other people at dinner perked up.
Apparently, he didn't know about this rule at all!
His parents had withheld this information from him all these years.
"I'm going to make sure to bring this up the next time I talk to them."
I was also disappointed that people were
angling for (and received!)
dispensations from the Lenten fast on St. Patrick's Day.
(Though it didn't work when baseball's opening day fell on a Friday.)
My attitude is that
if you're going to be a member of a religion,
then don't go looking around for loopholes.
"Yeah, I'm a member of XYZ religion,
except for the parts that cramp my style."
If you don't like the rules of your religion,
then try to change them or
go find some other religion that's more compatible
with your lifestyle.
(Raymond braces for the onslaught of flames now that
he's touched on a religious topic.)
On MSDN, there's a series of articles on
the top ten things to do to make your application a Vista application.
The series began last December, and just this month, they covered
a topic dear to my heart:
[If you have feedback about these articles,
posting that feedback here won't accomplish much since
I am not the author of those articles.]
Caught out by the FDA.
I happened to be in the bug spray section of the store when
I spotted a bottle of mosquito repellant that proudly
proclaimed "100% DEET".
But the FDA-mandated labelling tells a different story:
foods labeled "zero fat" are actually allowed to contain
up to a half gram of fat.
(Well, up to but not including.)
This is a definition of "zero" with which I had previously been unfamiliar.