March, 2006

  • The Old New Thing

    The network interoperability compatibility problem, first follow-up of many


    Okay, there were an awful lot of comments yesterday and it will take me a while to work through them all. But I'll start with some more background on the problem and clarify some issues that people had misinterpreted.

    As a few people surmised, the network file server software in question is Samba, a version of which comes with most Linux distributions. (I'll have to do a better job next time of disguising the identities of the parties involved.) Samba is also very popular as the network file server for embedded devices such as network-attached storage. The bug in question is fixed in the latest version of Samba, but none of the major distributions have picked up the fix yet. Not that that helps the network-attached storage scenario any.

    It appears that a lot of people thought the buggy driver was running on the Windows Vista machine, since they started talking about driver certification and blocking its installation. The problem is not on the Windows Vista machine; the problem is on the file server, which is running Linux. WHQL does not certify Linux drivers, it can't stop you from installing a driver on some other Linux machine, and it certainly can't download an updated driver and somehow upgrade your Linux machine for you. Remember, the bug is on the server, which is another computer running some other operating system. Asking Windows to update the driver on the remote server makes about as much sense as asking Internet Explorer to upgrade the version of Apache running on a web site you are visiting. You're the client; you have no power over the server.

    Some people lost sight of the network-attached storage scenario, probably because they weren't familiar with the term. A network-attached storage device is a self-contained device consisting of a large hard drive, a tiny computer, and a place to plug in a network cable. The computer has an operating system burned into its ROMs (often a cut-down version of Linux with Samba), and when you turn it on, the device boots the computer, loads the operating system, and acts as a file server on your network. Since everything is burned into ROM, claiming that the driver will get upgraded and the problem will eventually be long forgotten is wishful thinking. It's not like you can download a new Samba driver and install it into your network-attached storage device. You'll have to wait for the manufacturer to release a new ROM.

    As for detecting a buggy driver, the CIFS protocol doesn't really give the client much information about what's running on the server, aside from a "family" field that identifies the general category of the server (OS/2, Samba, Windows NT, etc.) All that a client can tell, therefore, is "Well, the server is running some version of Samba." It can't tell whether it's a buggy version or a fixed version. The only way to tell that you are talking to a buggy server is to wait for the bug to happen.

    (Which means that people who said, "Windows Vista should just default to the slow version," are saying that they want Windows Vista to run slow against Samba servers and fast against Windows NT servers. This plays right into the hands of the conspiracy theorists.)

    My final remark for today is explaining how a web site can "bloat the cache" of known good/bad servers and create a denial of service if the cache did not have a size cap: First, set up a DNS server that directs all requests for * to your Linux machine. On this Linux machine, install one of the buggy versions of Samba. Now serve up this web page:


    Each of those IFRAMEs displays an Explorer window with the contents of the directory \\\b. (Since all the names resolve to the same machine, all the \\* machines are really the same.) In that directory, put 200 files, so as to trigger the "more than 100 files" bug and force Windows Vista to cache the server as a "bad" server. In this way, you forced Windows Vista to create ten thousand records for the ten thousand bad servers you asked to be displayed. Throw in a little more script and you can turn this into a loop that accesses millions of "different" servers (all really the same server). If the "bad server" cache did not have a cap, you just allowed a bad server to consume megabytes of memory that will never be freed until the computer is rebooted. Pretty neat trick.

    Even worse, if you proposed preserving this cache across reboots, then you're going to have to come up with a place to save this information. Whether you decide that it goes in a file or in the registry, the point is that an attacker can use this "bloat attack" and cause the poor victim's disk space/registry usage to grow without bound until they run out of quota. And once they hit quota, be it disk quota or registry quota, not only do bad things start happening, but they don't even know what file or registry key they have to delete to get back under quota.

    Next time, I'll start addressing some of the proposals that people came up with, pointing out disadvantages that they may have missed in their analysis.

  • The Old New Thing

    Diese Briefe wurden von unserem chinesischen Freund übersetzt


    A friend of mine is taking a vacation to Germany with her husband, and she asked me for help in booking a guest room in a seminary in one of the cities they will be visiting. I translated her initial inquiry into German, and she e-mailed both the English and German versions to the manager.

    The response was entirely in German.

    For the next few days, I translated the responses from the residence manager to English, then translated my friend's replies back into German. Finally, all the details appear to have been settled, but my friend was somewhat concerned that the residence manager may be in for a bit of a surprise to learn that neither she nor her husband speak German. (Well, her husband studied it for a year in high school and has been working through free German audio lessons courtesy of Deutsche Welle.) She asked me to make a little note at the end of her final message to set the manager's expectations:

    P.S. Wirklich sprechen wir kein Deutsch. Diese Briefe wurden von unserem chinesischen Freund übersetzt. ("P.S. We don't actually speak German. These letters were translated by our Chinese friend.")

    My friend had invited me to join her on the trip when she started planning it, but I declined at the time. In retrospect, I should have accepted. It probably would have been a lot of fun.

    Yesterday, Michael Puff remarked in a comment, "Wenn du mal nach Deutschland kommst, lass es mich wissen, dann treffen wir uns mal und sprechen nur Deutsch." ("If you ever come to Germany, let me know and we can meet and speak exclusively in German.")

    Thanks, Michael, for the kind offer, aber ich vermute, dass ich keine Probleme haben werde, in Deutschland Gelegenheiten zu finden, nur Deutsch zu sprechen. ("... but I suspect that I won't have any problems finding opportunities in Germany to speak exclusively in German.")

    (I'm joking! I'll be sure to let everybody know when I go travelling and am willing to meet up with people.)

  • The Old New Thing

    How would you solve this compatibility problem: Network interoperability


    Okay, everybody, here's your chance to solve a compatibility problem. There is no answer yet; I'm looking to see how you folks would attack it. This is a real bug in the Windows Vista bug database.

    A beta tester reported that Explorer fails to show more than about a hundred files per directory from file servers running a particular brand of the file server software. The shell and networking teams investigated the problem together and tracked it down to the server incorrectly handling certain types of directory queries. Although the server claims to support both slow and fast queries, if you try a fast query, it returns only the first hundred or so files and then gives up with a strange error code. On the other hand, if Explorer switches to the slow query, then everything works fine. (Windows XP always used the slow query.) Additional data: An update to the server software was released earlier this year which claims to fix the bug. However (as of this writing), all of the vendor's distributors continue to ship the buggy version of the driver.

    What should we do? Here are some options. Choose one of the below or make up your own!

    Do nothing

    Make no accommodation for this particular buggy protocol implementation. People who are running that particular implementation will get incomplete directory listings. Publish a Knowledge Base article describing the problem and directing customers to contact the vendor for an updated driver.


    Advantages:

    • Operating system remains "pure", unsullied by compatibility hacks.


    Disadvantages:

    • Customers with this problem may not even realize that they have it.
    • Even if customers notice something wrong, they won't necessarily know to search for the vendor's name (as opposed to the distributor's name) in the Knowledge Base to see if there are any known interoperability problems with it.
    • And even if the customer finds the Knowledge Base article, they will have to bypass their distributor and get the driver directly from the vendor. This may invalidate their support contract with the distributor.
    • If the file server software is running on network attached storage, the user likely doesn't even know what driver is running inside the sealed plastic case. Upgrading the server software will have to wait for the distributor to issue a firmware upgrade. Until then, the user will experience temporary data loss. (Those files beyond the first hundred are invisible.)
    • If the customer does not own the file server, the best they can do is ask the file server's administrator to upgrade their driver and hope the administrator agrees to do so.
    • Since Windows XP didn't use fast queries, it didn't have this problem. Users will interpret it as a bug in Windows Vista.

    Auto-detect the buggy driver and put up a warning dialog

    Explorer should recognize the strange error code and display an error message to the user saying, "The server \\servername appears to be running an old version of the XYZ driver that does not report the contents of large directories properly. Not all items in the directory are shown here. Please contact the administrator of the machine \\servername to have the driver upgraded." (Possibly with a "Don't show this dialog again" check-box.)


    Advantages:

    • Users are told why they are getting incomplete results.


    Disadvantages:

    • There's not much the user can do about the incomplete results. It looks like a "Ha ha, you lose" dialog.
    • Users often don't know who the administrators of a file server are, so telling them to contact the administrator merely leads to a frustrated, "And who is that, huh?", or even worse, "That's me! And I have no idea what this dialog box is telling me to do." (Consider the network attached storage device.)
    • The administrator of that machine might have his/her reasons for not upgrading the driver (for example, because it voids the support contract), but they will keep getting pestered by users thanks to this new dialog.
    • Since Windows XP didn't use fast queries, it didn't have this problem. Users will interpret it as a bug in Windows Vista.

    Auto-detect the buggy driver and work around it next time

    Explorer should recognize the strange error code and say, "Oh, this server must have the buggy driver. It's too late to do anything about the current directory information, but I'll remember that I should do things the slow way in the future when talking to this server."

    To avoid denial-of-service attacks, remember only the last 16 (say) servers that exhibit the problem. (If the list of "known bad" servers were unbounded, then an attacker could consume all the memory on your computer by creating a server that responded to a billion different names and using HTTP redirects to get you to visit all of those servers in turn.)


    Advantages:

    • Windows auto-detects the problem and works around it.


    Disadvantages:

    • The first directory listing of a large directory from a buggy server will be incorrect. If that first directory listing is for something that has a long lifetime (for example, Explorer's folder tree), then the incorrect data will persist for a long time.
    • If you regularly visit more than 16 (say) buggy servers, then when you visit the seventeenth, the first one falls out of the cache and will return incorrect data the first time you visit a large directory.
    • May also have to develop and test a mechanism so that network administrators can deploy a "known bad list" of servers to all the computers on their network. In this way, servers on the "known bad list" won't have the "first directory listing is bad" problem.
    • Since Windows XP didn't use fast queries, it didn't have this problem. Users will interpret it as a bug in Windows Vista.
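
    The bounded list of "known bad" servers described in this option could be sketched roughly as follows. This is only an illustration in C++, assuming a least-recently-used eviction policy; the class name BadServerCache and its methods are hypothetical and say nothing about how Windows actually implements the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical bounded "known bad server" cache. All names here are
// illustrative. At most `capacity` servers are remembered; when the
// cache is full, the least recently used entry is forgotten.
class BadServerCache {
public:
    explicit BadServerCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Record that `server` failed a fast query with the strange error.
    void MarkBad(const std::string& server) {
        auto it = index_.find(server);
        if (it != index_.end()) {
            // Already known: just move it to the front (most recent).
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (order_.size() == capacity_) {
            // Evict the least recently used server to respect the cap.
            index_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(server);
        index_[server] = order_.begin();
    }

    // True if fast queries should be skipped for this server.
    bool IsKnownBad(const std::string& server) {
        auto it = index_.find(server);
        if (it == index_.end()) return false;
        order_.splice(order_.begin(), order_, it->second);
        return true;
    }

    std::size_t Size() const { return order_.size(); }

private:
    std::size_t capacity_;
    std::list<std::string> order_;  // front = most recently used
    std::unordered_map<std::string,
                       std::list<std::string>::iterator> index_;
};
```

    With a capacity of 16, marking a seventeenth server evicts the least recently used one, which is the "first one falls out of the cache" drawback listed above. The cap is also what keeps a malicious server with a billion names from growing the list without bound.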

    Have a configuration setting to put the network client into "slow mode"

    Add a configuration setting to the Windows network client to tell it "If somebody asks whether a server supports fast queries, always say No, even if the server says Yes." In this manner, no program will attempt to use fast queries; they will all use slow queries. Directory queries will run slower, but at least they will work.


    Advantages:

    • With the setting set to "slow mode", you never get any incomplete directory listings.


    Disadvantages:

    • Since the detection is not automatic, you have many of the same problems as "Do nothing". Customers have to know that they have a problem and know what to search for before they can find the configuration setting in the Knowledge Base. Until then, the behavior looks like a bug in Windows Vista.
    • This punishes file servers that are not buggy by making them use slow queries even though they support fast queries.

    Have a configuration setting to put Explorer into "slow mode"

    Add a configuration setting to Explorer to tell it "Always issue slow queries; never issue fast queries." Directory queries will run slower, but at least they will work. But this affects only Explorer; other programs which ask the server "Do you support fast queries?" will receive an affirmative response and attempt to use fast queries, only to rediscover the problem that Explorer worked around.


    Advantages:

    • With the setting set to "slow mode", you never get any incomplete directory listings.


    Disadvantages:

    • Every program that uses fast queries must have its own setting for disabling fast queries and running in "slow mode".
    • Plus all the same disadvantages as putting the setting in the network client.

    Disable "fast mode" by default

    Stop supporting "fast mode" in the network client since it is unreliable; there are some servers that don't handle "fast mode" correctly. This forces all programs to use "slow mode". Optionally, have a configuration setting to re-enable "fast mode".


    Advantages:

    • All directory listings are complete. Everything just works.


    Disadvantages:

    • The "fast mode" feature may as well never have been created: It's off by default and nobody will bother turning it on since everything works "well enough".
    • People will accuse Microsoft of unfair business practices since the client will run in "slow mode" even if the server says it supports "fast mode". "Obviously, Microsoft did this in order to boost sales of its competing product which doesn't have this artificial and gratuitous speed limiter."

    Something else

    Be creative. Make sure to list both advantages and disadvantages of your proposal.

  • The Old New Thing

    Inadvertently passing large objects by value


    One mark of punctuation can make all the difference.

    One program was encountering a stack overflow exception in a function that didn't appear to be doing anything particularly stack-hungry. The following code illustrates the problem:

    bool TestResults::IsEqual(TestResults& expected)
    {
     if (m_testMask != expected.m_testMask) {
      return false;
     }
     bool result = true;
     if (result && (m_testMask & AbcTestType)) {
      result = CompareAbc(expected);
     }
     if (result && (m_testMask & DefTestType)) {
      result = CompareDef(expected);
     }
     if (result && (m_testMask & GhiTestType)) {
      result = CompareGhi(expected);
     }
     if (result && (m_testMask & JklTestType)) {
      result = CompareJkl(expected);
     }
     return result;
    }

    (In reality, the algorithm for comparing two tests results was much more complicated, but that's irrelevant to this discussion.)

    And yet on entry to this function, a stack overflow was raised.

    The first thing to note is that this problem occurred only on the x64 build of the test. The x86 version ran fine, or at least appeared to. It so happens that the x64 compiler aggressively inlines functions, which as it turned out was a major exacerbator of the problem.

    The title of this entry probably tipped you off to what happened: The helper functions accepted the test results parameter by value, not by reference:

    bool TestResults::CompareAbc(TestResults expected);
    bool TestResults::CompareDef(TestResults expected);
    bool TestResults::CompareGhi(TestResults expected);
    bool TestResults::CompareJkl(TestResults expected);

    and those comparison functions in turn called other comparison functions, which also passed the TestResults by value. Since the test results were passed by value, a temporary copy was made on the stack and passed to the comparison function. It so happened that the TestResults class was a very large one, a hundred kilobytes or so, and the TestResults::IsEqual function therefore needed to reserve room for a large number of such temporary copies, one for each call to a comparison function in each of the inlined functions. A dozen temporary copies times a hundred kilobytes per copy comes out to over a megabyte of temporary variables, which exceeded the default one megabyte stack size and therefore resulted in a stack overflow exception on entry to the TestResults::IsEqual function.

    This code appeared to run fine when compiled for the x86 architecture because the x86-targeting compiler did not inline quite as aggressively, so the large temporaries were not reserved on the stack until the helper comparison was actually called. Since the comparisons went only three levels deep, there were only three temporary copies of the expected parameter, which fit within the one megabyte default stack. It was still bad code, consuming a few hundred kilobytes of stack for no reason, but it wasn't bad enough to cause a problem. The fix, of course, was to change the comparison functions to accept the parameter by reference.

    bool TestResults::IsEqual(const TestResults& expected) const;
    bool TestResults::CompareAbc(const TestResults& expected) const;
    bool TestResults::CompareDef(const TestResults& expected) const;
    bool TestResults::CompareGhi(const TestResults& expected) const;
    bool TestResults::CompareJkl(const TestResults& expected) const;

    For good measure, the parameter was changed to a const reference, and the function was tagged as itself const to emphasize that neither the object nor the expected value will be modified as part of the comparison, thereby ensuring that changing from a copy to a const reference didn't change the previous behavior. Without the const reference, there was a possibility that somewhere deep inside the comparison functions, they made a change to the expected parameter. Under the old pass-by-value declaration, this change was discarded when the function returned since the change was made to a copy. If we had left off the const from the reference, then we would have changed the behavior: The change to the expected parameter would have modified the original TestResults. Making the parameter const reassures us that an attempt to modify expected would be flagged by the compiler and therefore brought to our attention.

    (This technique is not foolproof, however. Somebody could always cast away const-ness and modify the original, but we were being reckless and assuming that nobody would be that crazy.)
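
    The difference between the two declarations can be observed with a small, self-contained sketch. BigResults is a hypothetical stand-in for the TestResults class (whose real definition is not shown here), sized at roughly a hundred kilobytes, and the copy counter exists only to make the hidden copies visible.

```cpp
#include <cstring>

int g_copyCount = 0;  // incremented each time a BigResults is copied

// Hypothetical stand-in for a very large class like TestResults.
struct BigResults {
    char payload[100 * 1024];  // roughly a hundred kilobytes

    BigResults() { std::memset(payload, 0, sizeof(payload)); }
    BigResults(const BigResults& other) {
        std::memcpy(payload, other.payload, sizeof(payload));
        ++g_copyCount;
    }
};

// Pass by value: every call makes a 100 KB temporary copy of the argument.
bool CompareByValue(BigResults expected) {
    return expected.payload[0] == 0;
}

// Pass by const reference: no copy is made, and the compiler rejects
// any attempt to modify `expected` inside the function.
bool CompareByReference(const BigResults& expected) {
    return expected.payload[0] == 0;
}
```

    Calling CompareByValue with an existing BigResults object bumps g_copyCount by one per call, while CompareByReference leaves it unchanged. The only textual difference between the two signatures is the const and the ampersand.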

  • The Old New Thing

    The rise and fall of the German language


    Kyle James reports for a variety of public radio programs and networks, including Deutsche Welle via Worldview (see March 26), NPR, and PRI's Marketplace. His English grammar is perfect, the pronunciation impeccably American, but if you listen, you'll still notice something odd about his voice. It may even take you a few listens before you figure out what it is.

    It's the cadence.

    Even though Mr. James is speaking in English, the shape of the sentences—the rise and fall of the pitch, the changing velocity of the words—is characteristically German. The German language has a particular shape to it. It's hard to describe in words; you have to listen to a lot of spoken German to start to get a feel for it. For example, in an "if, then" type of sentence, American English takes the important word of the "if" part (typically the last word) and starts it at a higher pitch, dropping it rapidly to a very low pitch, and holding it there for the remainder of the clause, perhaps with a very small uptick at the end.

                to-
    If it rains
                   day,

    German, on the other hand, tends to take the important word and raise its pitch, holding the pitch high until the end of the clause.

            heute regnet,
    Wenn es

    (On the other hand, if the emphasis were not on the day but on the weather, then the pitch would rise on the word "rain" or "regnet".)

    I remember listening to an English-language Deutsche Welle broadcast where the native German newsreaders were speaking with a BBC cadence. (The BBC end-of-sentence cadence, in particular.) It worked for a while, but after a minute I simply couldn't bear to listen any more and had to shut it off. The problem was that they were using that one sentence shape over and over again instead of varying as the flow of the article demanded.

    Getting the right sentence flow is one of the things you almost never learn formally when studying a language. Rather, it's something you simply have to pick up as you go. And it's often so subtle that you never perfect it. For example, when I'm speaking German—which happens almost never nowadays—I often get so worried about declining my adjectives correctly (how hard can it be? there are only 48 scenarios to worry about) that I pay almost no attention to getting the right sentence shape.

    Here's a chart of the 48 adjective endings, as applied to the regular adjective weich, which means "soft". Don't worry about the last six charts; they are just repeats of the first three with a different root. It wouldn't be so bad if adjectives didn't come in three "strengths"... Then again, I'm sure other languages like Finnish or Icelandic can put German to shame. (And I have to admit, after working with German adjectives for a few years, I eventually developed a quasi-instinctive feel for how they should work, although the plural adjective endings always fool my intuition. When I learned that Swedish adjectives come in only two strengths, I felt kind of cheated.)

  • The Old New Thing

    Why are there two copies of Notepad?


    You may have noticed that there's a copy of Notepad in %windir%\notepad.exe and another in %windir%\system32\notepad.exe. Why two?

    Compatibility, of course.

    Windows 3.0 put Notepad in the Windows directory. Windows NT put it in the System32 directory.

    Notepad is perhaps the most commonly hardcoded program in Windows. Many Setup programs use it to view the Readme file, and you can use your imagination to come up with other places where a program or batch file or printed instructions will hard-code the path to Notepad.

    In order to be compatible with programs designed for Windows 95, there needs to be a copy of Notepad in the Windows directory. And in order to be compatible with programs designed for Windows NT, there also needs to be a copy in the System32 directory.

    And now that Notepad exists in both places, new programs have a choice of Notepads, and since there is no clear winner, half of them will choose the one in the Windows directory and half will choose the one in the System32 directory, thereby ensuring the continued existence of two copies of Notepad for years to come.

  • The Old New Thing

    Public service announcement for Roman Catholics: Sunday is not a fast day


    At dinner yesterday, I mentioned how I felt ripped off when I eventually learned that the Lenten fast does not apply to Sunday. If you give up, say, chocolate for Lent, you are not held to that obligation on Sundays. Those who are mathematically inclined would have noticed that something was up: Lent is forty days long, yet if you count backwards forty days from Easter Sunday, you don't get Ash Wednesday. To hit Ash Wednesday, you have to skip over the Sundays. Hm...

    When I related this little anecdote, the head of one of the other people at dinner perked up. Apparently, he didn't know about this rule at all! His parents had withheld this information from him all these years. "I'm going to make sure to bring this up the next time I talk to them."

    I was also disappointed that people were angling for (and received!) dispensations from the Lenten fast on St. Patrick's Day. (At least it didn't work when baseball's opening day fell on a Friday.) My attitude is that if you're going to be a member of a religion, then don't go looking around for loopholes. "Yeah, I'm a member of XYZ religion, except for the parts that cramp my style." If you don't like the rules of your religion, then try to change them or go find some other religion that's more compatible with your lifestyle.

    (Raymond braces for the onslaught of flames now that he's touched on a religious topic.)

  • The Old New Thing

    Why doesn't the window manager just take over behavior that used to be within the application's purview?


    A commenter named "Al" wondered why the window manager couldn't just take over behavior that used to be within the application's purview, such as painting the non-client area, in order to avoid problems with applications not responding to messages promptly enough. If the window manager were being rewritten, then perhaps it could. But to do it now would introduce many compatibility issues.

    First, there are many applications that have subtle dependencies on message ordering or receiving certain types of messages at certain times, even though there is no actual guarantee in the specification that such messages be delivered. There are a large number of applications that rely on WM_PAINT messages being delivered even if there is nothing to paint, because they defer some critical computations until the first WM_PAINT message, and if something that requires the result of that computation happens before a WM_PAINT, they crash. For example, if you launch a program minimized, then right-click on the taskbar button for the program's main window, these programs would crash because the code that handles the system menu uses a pointer variable that the WM_PAINT handler initializes or divides by a global variable whose default value is zero but whose value is calculated during WM_PAINT handling. To accommodate these programs, the window manager is forced to send "dummy" WM_PAINT messages with an empty rcPaint. Such messages appear to accomplish nothing, but the hidden agenda is that the program gets its cherished WM_PAINT message and can perform whatever operations keep it from crashing later on.
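
    The failure pattern can be modeled in a few lines of portable C++. This is a deliberately simplified sketch, not Win32 code: LazyApp, OnPaint, OnSystemMenu, and ShowSystemMenu are hypothetical names, and the "crash" is reported as a return value of -1 rather than actually dividing by zero.

```cpp
// Simplified rectangle; an all-zero rectangle stands in for an empty rcPaint.
struct Rect {
    int left, top, right, bottom;
};

// Hypothetical program that defers a critical computation to its paint handler.
class LazyApp {
public:
    // The deferred computation runs on any paint message,
    // even when there is nothing to paint.
    void OnPaint(const Rect& rcPaint) {
        (void)rcPaint;     // the rectangle itself is irrelevant here
        itemHeight_ = 20;  // value is only calculated during painting
    }

    // Returns the number of menu rows, or -1 where the real program
    // would have divided by zero and crashed.
    int OnSystemMenu() const {
        if (itemHeight_ == 0) return -1;
        return 400 / itemHeight_;
    }

private:
    int itemHeight_ = 0;  // default value is zero until the first paint
};

// Models the window manager: optionally sends a "dummy" paint with an
// empty rectangle before delivering the system menu request.
int ShowSystemMenu(LazyApp& app, bool sendDummyPaint) {
    if (sendDummyPaint) {
        app.OnPaint(Rect{0, 0, 0, 0});
    }
    return app.OnSystemMenu();
}
```

    Without the dummy paint, a program that was launched minimized reaches OnSystemMenu with the variable still at zero. With it, the deferred computation has already run, even though nothing was painted.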

    Second, removing customizability of message behavior from the window manager would prevent programs from customizing their appearance in nonstandard ways. Media players are perhaps the most popular example of programs that want to override normal non-client painting in order to present a totally customized window to the user. Would you be happy if a change to Windows meant that you could no longer "skin" your favorite media player application?

    That said, there have been changes to the window manager over the years to maintain this "air of customizability" while simultaneously intervening on behalf of the user to keep things from going completely to the dogs. For example, if a window stops painting for an extended period of time, Windows would take it upon itself to paint the window with a standard caption bar (even if the application wanted to customize the caption bar), just so that the user would be able to see something.

    Another example of this "message virtualization" is the appending of the phrase "(Not responding)" to the caption of a window that has stopped responding, and capturing the window contents as they were last visible, drawing those captured window contents in the meantime until the application wakes up from its slumber, and even allowing you to move, resize, minimize, and close those unresponsive windows. The infrastructure necessary to support this behavior is quite extensive, because the window manager needs to maintain two sets of bookkeeping. The first is, "What the application thinks the window state is"; if the application asks for the size of its hung window, it needs to be told, "Oh, you're still that size you were before, don't you worry your pretty little head", even though the actual window size on the screen has changed significantly. The second is the actual state of the window as it appears on the screen. Once the hung window starts responding to messages again, all the activity that happened "while it was away" needs to be replayed to get the window "back up to speed" with the state of the world. Interesting things happen if the program wanted to customize one of the actions that happened to the "virtual window". For example, it might want to reject certain window sizes or display a special message before minimizing. Resolving these conflicts in a manner that doesn't cause applications to crash outright is another of the difficulties of trying to get the virtual and real window states back into sync.

    In a sense, therefore, the window manager does take over selected behaviors that used to be within the application's purview, but it has to do it in a delicate enough manner that neither the application nor the end user will even realize that it's happening. And that's what makes it hard.

  • The Old New Thing

    The simplified office


    In response to my description of my own office, my colleague Colin Birge shared this anecdote about one Microsoft employee who took office simplification about as far as it could go:

    He was one of the earliest usability specialists in Office, later to become the usability manager before ultimately retiring. As befits a person of seniority, for most of the last four years or so he spent at Microsoft, he had a window office, usually extra large.

    The contents of his office were as follows:

    • One small square table.
    • One office chair.
    • One guest chair.
    • One laptop.
    • One whiteboard.

    That was it. The rest of the standard desk was gone, including the corner piece and the extended table. There were no bookcases. Apart from the whiteboard, he had nothing on his walls. He had no books, no computer equipment apart from the laptop, no personal materials at all. When he retired, he didn't have to clean out his office. He took nothing with him but the clothes he was wearing.

    Apparently his apartment was nearly as spartan.

    I knew a guy once who claimed that you didn't really own anything that you couldn't carry at a dead run while firing an AK-47 over your shoulder. This fellow was the only person I ever knew who lived that philosophy.

  • The Old New Thing

    Where technology names came from: WiFi and FireWire


    Phil Belanger tells the story behind the name WiFi (and it is not short for "Wireless Fidelity"). Meanwhile, Michael Johas Teener tells the story of where the name FireWire came from. (Scroll down to "Why all these names?")

    [9:30am - I originally had a link to a NY Times article, but it was the wrong article and I can't find the right one, so I'm just going to link directly to Michael Johas Teener's explanation.]
