The network interoperability compatibility problem, second follow-up
I post this entry with great reluctance, because I can feel the
heat from the pilot lights of the flame throwers all the way from here.
The struggle with the network interoperability problem continued
for several months after
I brought up the topic.
In that time, a
significant number of network-attached storage devices
were found that did not implement "fast mode" queries correctly.
(Buried in this query are some of them; there are others.)
Some of them were Samba-based devices whose vendors did not have an upgrade
available that fixed the bug.
But many of them used custom implementations of CIFS;
consequently, any Samba-specific solutions would not have helped
those devices.
(Most of the auto-detection suggestions
people proposed addressed only the Samba scenario.
Those non-Samba devices would still not have worked.)
Even worse, most of the devices are low-cost solutions which
aren't firmware-upgradable and don't have any vendor support.
Some of the reports came from people running fully-patched well-known
Linux distributions.
So much for the fix being in
all the new commercially supported offerings over the next couple of months.
Furthermore, those buggy non-Samba implementations mishandled fast mode
queries in different ways.
For example, one of them I was asked to look at didn't return
any error codes at all.
It just returned garbage data (most noticeably,
corrupting the file name by deleting the first five characters).
How do you detect that this has happened?
If the server reports "I have a file called e.txt",
is Windows supposed to say, "Oh, I don't think so. I bet you're
one of those buggy servers that chops off the first five letters
of file names and that you really meant to say (scrunches forehead
in concentration) readme.txt"?
What if you really had a file called e.txt?
What if the server said, "This directory has two files, 1.txt
and 2.txt"?
Is this a buggy server?
Maybe the files are really abcde1.txt and defgh2.txt,
or maybe the server wasn't lying and the files really are
1.txt and 2.txt.
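
To make the ambiguity concrete, here is a minimal Python sketch. It is not how Windows or any real server works; `buggy_listing` and `honest_listing` are hypothetical helpers that merely simulate the "chops off the first five characters" behavior described above, on the assumption that the client sees nothing but the returned names.

```python
def buggy_listing(real_names, chopped=5):
    """Simulate the hypothetical buggy server: it silently drops the
    first `chopped` characters of every file name and reports no error."""
    return [name[chopped:] for name in real_names]


def honest_listing(real_names):
    """Simulate a correct server: it reports the names unchanged."""
    return list(real_names)


# Two different servers holding two different directories.
from_buggy = buggy_listing(["abcde1.txt", "defgh2.txt"])
from_honest = honest_listing(["1.txt", "2.txt"])

# The client receives exactly the same well-formed answer in both cases,
# so nothing in the response lets it tell the servers apart.
assert from_buggy == from_honest == ["1.txt", "2.txt"]

# The same goes for the single-file case: readme.txt turns into e.txt.
assert buggy_listing(["readme.txt"]) == honest_listing(["e.txt"])
```

Any detection heuristic would have to be a function of the listing alone, and since the two listings are equal, it must give the same verdict for both servers.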
One device simply crashed if asked to perform a fast mode query.
Another wedged up and had to be reset.
"Oh, looks like somebody brought their Vista laptop from home
and plugged it into the corporate network.
Our document server crashed again."
Given the much broader range of ways in which servers mishandled fast mode queries,
any attempt at auto-detecting them will necessarily be incomplete
and fail to detect some broken servers.
This is fundamentally the case for servers which return perfectly
formed, but incorrect, data.
And even if the detection were perfect, if it left the server in
a crashed or hung state, that wouldn't be much consolation.
Given this new information, the solution that was settled on was
simply to stop using "fast mode" queries for anything other than
local devices.
The most popular
file system drivers for local devices (NTFS, FAT, CDFS, UDF)
are all under Microsoft's control and they have already been tested
with fast mode queries.
Such is the sad but all-too-true
cost of interoperability and compatibility.
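
The resulting policy is simple enough to sketch in a few lines of Python. This is only an illustration of the decision described above, not the actual Windows code; `DeviceKind`, `QueryMode`, and `choose_query_mode` are names invented for the example, and "regular" is just my label for whatever non-fast query is used instead.

```python
from enum import Enum


class DeviceKind(Enum):
    LOCAL = "local"      # e.g. a volume using NTFS, FAT, CDFS, or UDF
    NETWORK = "network"  # anything reached over the network


class QueryMode(Enum):
    FAST = "fast"
    REGULAR = "regular"  # hypothetical name for the non-fast query


def choose_query_mode(kind):
    """Use fast mode only for local devices. No attempt is made to
    probe or auto-detect whether a remote server handles it correctly."""
    return QueryMode.FAST if kind is DeviceKind.LOCAL else QueryMode.REGULAR


assert choose_query_mode(DeviceKind.LOCAL) is QueryMode.FAST
assert choose_query_mode(DeviceKind.NETWORK) is QueryMode.REGULAR
```

Note that the decision never sends a fast mode query to a remote server, not even as a probe. That matters because, for the devices that crashed or wedged, the probe itself would have been the problem.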
(To address other minor points:
It's not the case that the Vista developers
"knew the [fast mode query] would break Samba-based devices since
late 2005".
The fast mode query was added, and the incompatibility with Samba
wasn't discovered until March 2006.
"Why didn't you notify the Samba team?"
Because by the time we found the problem,
they had already fixed it.)