February, 2007

  • The Old New Thing

    Bonus material for The Old New Thing (the book) is now available for download


    I've just been informed by my publisher that the bonus chapters from my book are now available for download. Click on "Sample Chapters". Sorry they're late.

    The source code for the programs in the book can be downloaded from the "Source Code" link. And on a more embarrassing note, there's that "Errata" link, too.

  • The Old New Thing

    What does an NMI error mean? (The infamous "Hardware Malfunction")


    I promised to talk more about NMI, so here it is.

    What generates an NMI? What does it mean?

    The first question is easy to answer but doesn't actually shed much light: Any device can pull the NMI line, and that will generate a non-maskable interrupt. Back in the Windows 95 days, a few really cool people had taken the ball-point pen trick one step further: They had a special expansion card in their computer with a cord coming out the back. At the end of the cord was a momentary switch like the one you might see on a quiz show. If you pressed it, the card generated an NMI. No fumbling around with ball-point pens for these folks, no-ho! (To be honest, I had two of these. One of them was a simple NMI card, triggered by a foot pedal! The other was really a card with a high-resolution real-time clock that could be used for performance analysis. I used the NMI button far more often than the timer...)

    In practice, the only device that generates an NMI (on purpose) is the memory controller, which raises it when a parity error is detected. The non-geek explanation of a parity error: Your memory chips are acting flaky.
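    For the curious, here is a minimal sketch of what per-byte parity checking amounts to. It assumes one parity bit per byte and uses even parity; the actual convention (odd or even) and the exact place the check happens are up to the memory hardware, and the function names are hypothetical. A single flipped bit is caught, but two flipped bits in the same byte cancel out and slip through.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the even-parity bit for one byte, chosen so that
   the nine stored bits (8 data + 1 parity) contain an even number of 1s. */
static unsigned parity_bit(uint8_t data)
{
    unsigned ones = 0;
    for (int i = 0; i < 8; i++) {
        ones += (data >> i) & 1u;
    }
    return ones & 1u;
}

/* Hypothetical check performed on read: recompute the parity of the data
   and compare it with the stored bit. A mismatch is the condition that
   would make the memory controller raise an NMI. */
static bool parity_error(uint8_t data_read, unsigned stored_parity)
{
    return parity_bit(data_read) != stored_parity;
}
```

    For example, 0x5A has four 1 bits, so its even-parity bit is 0; reading it back as 0x5B (one bit flipped) is flagged, while 0x59 (two bits flipped) is not.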

    Here's what a parity error looks like. It shows up as a mysterious "Hardware Malfunction" error.

    Now, it's possible that a device may be generating an NMI by mistake. For example, in Wendy's case, it may have been due to damage caused by overheating.

    If you suspect your memory chips, you can run a memory diagnostic tool to see if it can find the bad memory.

    My colleague Keith Moore reminded me that paradoxically, on the IBM PC-AT, you could mask the non-maskable interrupt! This definitely falls into the category of "Unclear on the concept." The masking was done in hardware that could be configured via some magic port I/O. It prevented the NMI from reaching the CPU in the first place. (NMI is still not maskable in the CPU.)

  • The Old New Thing

    Please feel free to stop using DDE


    A commenter asked, "As an application programmer, can I really ignore DDE if I need to interact with explorer/shell?"

    The answer is, "Yes, please!"

    While it was a reasonable solution back in the cooperatively-multitasked world of 16-bit Windows where it was invented, the transition to 32-bit Windows was not a nice one for DDE. Specifically, the reliance on broadcasts to establish the initial DDE conversation means that unresponsive programs can jam up the entire DDE initiation process. The last shell interface to employ DDE was the communication with Program Manager to create program groups and items inside those groups. This was replaced with Explorer and the Start menu back in Windows 95. DDE has been dead as a shell interface for over ten years.

    Of course, for backwards compatibility, the shell still supports DDE for older programs that choose to use it. You can still create icons on the Start menu via DDE and you can still register your documents to launch via DDE if you really want to, but if you take a pass on DDE you won't be missing anything.

    On the other hand, even though there is no technological reason for you to use DDE, you still have to be mindful of whether your actions will interfere with other people who choose to: If you stop processing messages, you will clog up DDE initiation, among other things. It's like driving an automatic transmission instead of a manual transmission. There is no requirement (in the United States, at least) that you own a manual transmission or even know how to operate one. But you still have to make sure that your actions do not interfere with people who do have manual transmissions, such as watching out for cars waiting for the traffic light to change while pointed uphill.

  • The Old New Thing

    Why can't you set the command prompt's current directory to a UNC?


    If you try to set the current directory of a command prompt to a UNC path, you get the error message "CMD does not support UNC paths as current directories." What's going on here?

    It's MS-DOS backwards compatibility.

    If the current directory were a UNC, there wouldn't be anything to return to MS-DOS programs when they call INT 21h function 19h (Get current drive). That function has no way to return an error code, so you have to return a drive letter. UNCs don't have a drive letter.
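    To make the constraint concrete, here is a minimal sketch of the mapping function 19h has to perform. The function returns the current drive in AL as a number (0 for A:, 1 for B:, and so on) and has no failure path. The helper below is hypothetical: it derives that number from a current-directory string, and its false case, a UNC path, is exactly the one the real function cannot express.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: compute the value function 19h would return in AL
   (0 = A:, 1 = B:, ...) for a given current directory. Returns false when
   there is no drive letter to report, such as for \\server\share paths. */
static bool dos_current_drive(const char *cwd, unsigned char *al)
{
    if (isalpha((unsigned char)cwd[0]) && cwd[1] == ':') {
        *al = (unsigned char)(toupper((unsigned char)cwd[0]) - 'A');
        return true;
    }
    /* A UNC path has no drive letter. Function 19h has no error return,
       which is why CMD refuses UNC current directories in the first place. */
    return false;
}
```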

    You can work around this behavior by using the pushd command to create a temporary drive letter for the UNC. Instead of passing script.cmd to the CreateProcess function as the lpCommandLine, you can pass cmd.exe /c pushd \\server\share && script.cmd.
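    If you are building that command line programmatically, a sketch might look like the following. The helper name is hypothetical, and it assumes the UNC path and script name need no quoting, so it simply rejects arguments containing spaces; the resulting writable buffer is what you would then pass to CreateProcess as lpCommandLine.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "cmd.exe /c pushd <unc> && <script>" into buf.
   pushd maps a temporary drive letter to the UNC path and makes it the
   current directory, so the script runs with a drive-letter current
   directory. Returns false if an argument would need quoting (contains a
   space) or if buf is too small. */
static bool build_unc_command(char *buf, size_t size,
                              const char *unc, const char *script)
{
    if (strchr(unc, ' ') != NULL || strchr(script, ' ') != NULL) {
        return false; /* quoting is out of scope for this sketch */
    }
    int n = snprintf(buf, size, "cmd.exe /c pushd %s && %s", unc, script);
    return n >= 0 && (size_t)n < size;
}
```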

    (Since griping seems to happen any time I write about batch files, I'll gripe pre-emptively: Yes, the batch "language" sucks because it wasn't designed; it just evolved. I write this not because I expect you to enjoy writing batch files but because you might find yourself forced to deal with them. If you would rather abandon batch files and use a different command interpreter altogether, then more power to you.)

  • The Old New Thing

    Email tip: Don't add people to a thread without saying why


    If you add me to an existing discussion, you have to say why. Do you have a specific question for me? Do you want my opinion on something? Are you just sharing a funny joke?

    Sometimes, I'll get a piece of mail that goes like this:

    From: Xxxxx
    To: Aaaaa; Bbbbb; Ccccc; Raymond

    Adding Raymond.

    --- Original Message ---

    Gee, that's very nice of you to add me, but you didn't say why. Is this an FYI? Is there a question you want answered? Often, the discussion is just "Gosh, there's this bug, person A proposes a theory, person B proposes a counter-theory, person C runs some tests and has some preliminary results, adding Raymond."

    It's like "Adding Raymond" is a ritual phrase people sprinkle into a mail thread. They don't know what'll happen when they say it, they don't even have any expectations, but it doesn't hurt to say it, right? "When in doubt, add Raymond."

    If you don't explain why you added me to a thread, I'm just going to killfile it.

  • The Old New Thing

    With what operations is LockWindowUpdate not meant to be used?


    Okay, now that we know what operations LockWindowUpdate is meant to be used with, we can look at various ways people misuse the function for things unrelated to dragging.

    People see the "the window you lock won't be able to redraw itself" behavior of LockWindowUpdate and use it as a sort of lazy version of the WM_SETREDRAW message. But sending the WM_SETREDRAW message really isn't that much harder than calling LockWindowUpdate: It's twenty more characters of typing, half that if you use the SetWindowRedraw macro in <windowsx.h>.

    Instead of LockWindowUpdate(hwnd), use SendMessage(hwnd, WM_SETREDRAW, FALSE, 0) or SetWindowRedraw(hwnd, FALSE).
    Instead of LockWindowUpdate(NULL), use SendMessage(hwnd, WM_SETREDRAW, TRUE, 0) or SetWindowRedraw(hwnd, TRUE).

    As we noted earlier, only one window in the system can be locked for update at a time. If your intention for calling LockWindowUpdate is merely to prevent a window from redrawing, say, because you're updating it and don't want the window continuously refreshing until your update is complete, then just disable redraw on that window. If you use LockWindowUpdate, you create a whole slew of subtle problems.

    First off, if some other program is misusing LockWindowUpdate in this same way, then one of you will lose. Whoever tries LockWindowUpdate first will get it, and the second program will fail. Now what do you do? Your window isn't locked any more.

    Second, if you have locked your window for update and the user switches to another program and tries to drag an item (or even just tries to move the window!), that attempt to LockWindowUpdate will fail, and the user is now in the position where drag/drop has stopped working for some mysterious reason. And then, ten seconds later, it starts working again. "Stupid buggy Windows," the user mutters.

    Conversely, if you decide to call LockWindowUpdate when a drag/drop or window-move operation is in progress, then your call will fail.

    This is just a specific example of the more general programming mistake of using global state to manage a local condition. When you want to disable redrawing in one of your windows, you don't want this to affect other windows in the system; it's a local condition. But you're using a global state (the window locked for update) to keep track of it.
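    The difference between the two approaches can be modeled in a few lines. This is a hypothetical simulation, not the real window manager: sim_lock_window_update stands in for LockWindowUpdate's single system-wide slot, and a per-window flag stands in for what WM_SETREDRAW changes. Two unrelated callers collide on the global slot but never on their own flags.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a window; only the redraw flag is modeled. */
typedef struct SimWindow {
    bool redraw_enabled;
} SimWindow;

/* Global state: at most one window may be locked for update at a time. */
static SimWindow *g_locked = NULL;

/* Models LockWindowUpdate: fails if any window is already locked;
   passing NULL releases whatever is locked. */
static bool sim_lock_window_update(SimWindow *w)
{
    if (w == NULL) {
        g_locked = NULL;
        return true;
    }
    if (g_locked != NULL) {
        return false;
    }
    g_locked = w;
    return true;
}

/* Local state: models WM_SETREDRAW, which affects only the target window. */
static void sim_set_redraw(SimWindow *w, bool enabled)
{
    w->redraw_enabled = enabled;
}
```

    In this model the second program's lock attempt simply fails, while both programs can turn off redraw on their own windows at the same time. In real code, after re-enabling redraw you usually also need to invalidate the window (for example with InvalidateRect or RedrawWindow), since turning redraw back on does not by itself repaint what changed.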

    I can already anticipate people saying, "Well, the window manager shouldn't let somebody lock a window for update if they're not doing a drag/drop operation." But how does the window manager know? It knows what is happening, but it doesn't know why. Is that program calling LockWindowUpdate because it's too lazy to use the WM_SETREDRAW message? Or is it doing it in response to some user input that resulted in a drag/drop operation? Note that you can't just say, "Well, the mouse button has to be down," because the user might be performing a keyboard-based operation (such as resizing a window with the arrow keys) that has the moral equivalent of a drag/drop. Morality is hard enough to resolve as it is; expecting computers to be able to infer it is asking a bit much.

    Next time, a final remark on LockWindowUpdate.

  • The Old New Thing

    What does LockWindowUpdate do?


    Poor misunderstood LockWindowUpdate.

    This is the first in a series on LockWindowUpdate, what it does, what it's for and (perhaps most important) what it's not for.

    What LockWindowUpdate does is pretty simple. When a window is locked, all attempts to draw into it or its children fail. Instead of drawing, the window manager remembers which parts of the window the application tried to draw into, and when the window is unlocked, those areas are invalidated so that the application gets another WM_PAINT message, thereby bringing the screen contents back in sync with what the application believed to be on the screen.
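    The bookkeeping described above can be sketched as follows. This is a simplified, hypothetical model rather than the window manager's code: it accumulates a single bounding rectangle, whereas a real implementation would more likely track an exact region. While locked, drawing is suppressed and only its area is recorded; unlocking hands back the area that needs to be invalidated.

```c
#include <stdbool.h>

/* Hypothetical rectangle; right and bottom are exclusive. */
typedef struct { int left, top, right, bottom; } SimRect;

typedef struct {
    bool locked;
    bool have_dirty;
    SimRect dirty; /* bounds of everything the app tried to draw while locked */
} SimLockState;

static void rect_union(SimRect *acc, const SimRect *r)
{
    if (r->left < acc->left) acc->left = r->left;
    if (r->top < acc->top) acc->top = r->top;
    if (r->right > acc->right) acc->right = r->right;
    if (r->bottom > acc->bottom) acc->bottom = r->bottom;
}

/* Called for each drawing attempt. Returns true if the drawing may reach
   the screen; while locked, it is suppressed and its area remembered. */
static bool sim_draw(SimLockState *s, SimRect r)
{
    if (!s->locked) return true;
    if (s->have_dirty) {
        rect_union(&s->dirty, &r);
    } else {
        s->dirty = r;
        s->have_dirty = true;
    }
    return false;
}

/* Unlock. Returns true and fills *invalid with the area to invalidate
   (which would lead to a fresh WM_PAINT) if any drawing was suppressed. */
static bool sim_unlock(SimLockState *s, SimRect *invalid)
{
    bool any = s->have_dirty;
    if (any) *invalid = s->dirty;
    s->locked = false;
    s->have_dirty = false;
    return any;
}
```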

    This "keep track of what the application tried to draw while Condition X was in effect, and invalidate it when Condition X no longer holds" behavior you've seen already in another guise: CS_SAVEBITS. In this sense, LockWindowUpdate does the same bookkeeping that would occur if you had covered the locked window with a CS_SAVEBITS window, except that it doesn't save any bits.

    The documentation explicitly calls out that only one window (per desktop, of course) can be locked at a time, but this is also implied by the function prototype. If two windows could be locked at once, it would be impossible to use LockWindowUpdate reliably. What would happen if you did this:

    LockWindowUpdate(hwndA); // locks window A
    LockWindowUpdate(hwndB); // also locks window B
    LockWindowUpdate(NULL); // ???

    What does that third call to LockWindowUpdate do? Does it unlock all the windows? Or just window A? Or just window B? Whatever your answer, it would make it impossible for the following code to use LockWindowUpdate reliably:

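    void BeginOperationA() { LockWindowUpdate(hwndA); /* ... */ }
    void EndOperationA()   { /* ... */ LockWindowUpdate(NULL); }
    void BeginOperationB() { LockWindowUpdate(hwndB); /* ... */ }
    void EndOperationB()   { /* ... */ LockWindowUpdate(NULL); }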

    Imagine that the BeginOperation functions started some operation that was triggered by asynchronous activity. For example, suppose operation A is drawing drag/drop feedback, so it begins when the mouse goes down and ends when the mouse is released.

    Now suppose operation B finishes while a drag/drop is still in progress. Then EndOperationB will clean up operation B and call LockWindowUpdate(NULL). If you propose that that should unlock all windows, then you've just ruined operation A, which expects that hwndA still be locked. Similarly, if you argue that it should unlock only hwndA, then not only is operation A ruined, but so too is operation B (since hwndB is still locked even though the operation is complete). On the other hand, if you propose that LockWindowUpdate(NULL) should unlock hwndB, then consider the case where operation A completes first.

    If LockWindowUpdate were able to lock more than one window at a time, then the function prototype would have to have been changed so that the unlock operation knows which window is being unlocked. There are many ways this could have been done. For example, a new parameter could have been added or a separate function created.

    // Method A - new parameter
    // fLock = TRUE to lock, FALSE to unlock
    BOOL LockWindowUpdate(HWND hwnd, BOOL fLock);
    // Method B - separate function
    BOOL LockWindowUpdate(HWND hwnd);
    BOOL UnlockWindowUpdate(HWND hwnd);
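    To see why an explicit unlock target resolves the ambiguity, here is a minimal sketch of Method B. It is purely hypothetical, since the real LockWindowUpdate locks only one window: windows are modeled as integer IDs, and MultiLock and MultiUnlock are invented names that track a small set of locked windows. Because each unlock names its window, operations A and B can end in either order without disturbing each other.

```c
#include <stdbool.h>

#define MAX_LOCKED 8

/* Hypothetical set of locked windows, identified by integer IDs. */
static int g_locked_ids[MAX_LOCKED];
static int g_locked_count = 0;

static bool is_locked(int hwnd)
{
    for (int i = 0; i < g_locked_count; i++) {
        if (g_locked_ids[i] == hwnd) return true;
    }
    return false;
}

/* Method B, lock half: add the window to the set. */
static bool MultiLock(int hwnd)
{
    if (is_locked(hwnd) || g_locked_count == MAX_LOCKED) return false;
    g_locked_ids[g_locked_count++] = hwnd;
    return true;
}

/* Method B, unlock half: remove exactly the named window. */
static bool MultiUnlock(int hwnd)
{
    for (int i = 0; i < g_locked_count; i++) {
        if (g_locked_ids[i] == hwnd) {
            /* swap-remove: move the last entry into this slot */
            g_locked_ids[i] = g_locked_ids[--g_locked_count];
            return true;
        }
    }
    return false;
}
```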

    But neither of these is the case. The LockWindowUpdate function locks only one window at a time. And the reason for this will become more clear as we learn what LockWindowUpdate is for.

  • The Old New Thing

    How to get your laptop to resume from standby in under two seconds


    One of my colleagues recently posted the story of the work he did to get laptops to resume quickly. The fun part was implementing the optimizations in the kernel. The not-fun part was finding all the drivers who did bad things and harassing their owners into fixing the bugs.

    On some laptops, he could get the resume time down to an impressive one second. And then entropy set in.

    It's likely you've never seen a real off-the-shelf laptop resume this quickly. And the reason is that as soon as you stop twisting the arms of all the driver writers, they stop worrying about how fast your laptop resumes and go back to worrying about when they can get their widget driver mostly working so they can get through WHQL and sell their widget.

    But now you have some tools to fight back, at least a little bit. The second half of that article explains how to use the event viewer to track down which drivers are ruining your resume time and disable them.

  • The Old New Thing

    The politician's fallacy and the politician's apology


    I learned this from Yes, Minister. They call it the politician's fallacy:

    1. Something must be done.
    2. This is something.
    3. Therefore, we must do it.

    As befits its name, you see it most often in politics, where poorly-thought-out solutions are proposed for urgent problems. But be on the lookout for it in other places, too. You might see somebody falling victim to the politician's fallacy at a business meeting, say.

    Something else I picked up is what I'm going to call the politician's apology. This is where you apologize for a misdeed not by apologizing for what you did, but rather apologizing that other people were offended. One blogger coined the word "fauxpology" to describe this sort of non-apology. In other words, you're not apologizing at all! It's like the childhood non-apology.

    "Apologize to your sister for calling her ugly."

    "I'm sorry you're ugly."

    In the politician's apology, you apologize not for the offense itself, but for the fact that what you did offended someone. "I'm sorry you're a hypersensitive crybaby."

    The president regretted any hurt feelings his statements may have caused.

    Another form of non-apology is to state that bad things happened without taking responsibility for causing them:

    There should not have been any physical contact in this incident. I am sorry that this misunderstanding happened at all, and I regret its escalation and I apologize.

    This particular non-apology even begins with the accusation that the other party was at fault for starting the incident!

    What bothers me is that these types of non-apologies are so common that nobody is even offended by their inadequacy. They are accepted as just "the way people apologize in public". (It's become so standard that Slate's William Saletan has broken it down into steps for us.)

  • The Old New Thing

    The network interoperability compatibility problem, second follow-up


    I post this entry with great reluctance, because I can feel the heat from the pilot lights of the flame throwers all the way from here.

    The struggle with the network interoperability problem continued for several months after I brought up the topic. In that time, a significant number of network attached storage devices were found that did not implement "fast mode" queries correctly. (Some of them are buried in this query; there are others.) Some of them were Samba-based devices whose vendors did not have an upgrade available that fixed the bug. But many of them used custom implementations of CIFS; consequently, any Samba-specific solutions would not have helped those devices. (Most of the auto-detection suggestions people proposed addressed only the Samba scenario. Those non-Samba devices would still not have worked.) Even worse, most of the devices are low-cost solutions that aren't firmware-upgradable and have no vendor support.

    Some of the reports came from people running fully-patched well-known Linux distributions. So much for the fix being in all the new commercially supported offerings over the next couple months.

    Furthermore, those buggy non-Samba implementations mishandled fast mode queries in different ways. For example, one of them I was asked to look at didn't return any error codes at all. It just returned garbage data (most noticeably, corrupting the file name by deleting the first five characters). How do you detect that this has happened? If the server reports "I have a file called e.txt", is Windows supposed to say, "Oh, I don't think so. I bet you're one of those buggy servers that chops off the first five letters of file names and that you really meant to say (scrunches forehead in concentration) readme.txt"? What if you really had a file called e.txt? What if the server said, "This directory has two files, 1.txt and 2.txt"? Is this a buggy server? Maybe the files are really abcde1.txt and defgh2.txt, or maybe the server wasn't lying and the files really are 1.txt and 2.txt.
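    A tiny sketch shows why no client-side check can catch this class of bug. The chop-five-characters behavior is modeled by a hypothetical function; the point is that a buggy server that really has readme.txt and a correct server that really has e.txt report the same file name, so nothing in that name tells them apart.

```c
#include <string.h>

/* Hypothetical model of the buggy server: it drops the first five
   characters of each file name before sending it. */
static const char *buggy_server_name(const char *real_name)
{
    size_t len = strlen(real_name);
    return real_name + (len < 5 ? len : 5);
}

/* A correct server sends the real name unchanged. */
static const char *correct_server_name(const char *real_name)
{
    return real_name;
}
```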

    One device simply crashed if asked to perform a fast mode query. Another wedged up and had to be reset. "Oh, looks like somebody brought their Vista laptop from home and plugged it into the corporate network. Our document server crashed again."

    Given the much broader range of ways that servers mishandled fast queries, any attempt at auto-detecting them will necessarily be incomplete and fail to detect broken servers. This is fundamentally the case for servers which return perfectly formed, but incorrect, data. And even if the detection were perfect, if it left the server in a crashed or hung state, that wouldn't be much consolation.

    Given this new information, the solution that was settled on was simply to stop using "fast mode" queries for anything other than local devices. The most popular file system drivers for local devices (NTFS, FAT, CDFS, UDF) are all under Microsoft's control and they have already been tested with fast mode queries.

    Such is the sad but all-too-true cost of interoperability and compatibility.

    (To address other minor points: It's not the case that the Vista developers "knew the [fast mode query] would break Samba-based devices since late 2005". The fast mode query was added, and the incompatibility with Samba wasn't discovered until March 2006. "Why didn't you notify the Samba team?" Because by the time we found the problem, they had already fixed it.)
