September, 2012

  • The Old New Thing

    How did the X-Mouse setting come to be?


    Commenter HiTechHiTouch wants to know whether the "X-Mouse" feature went through the "every request starts at −100 points" filter, and if so, how did it manage to gain 99 points?

    The X-Mouse feature is ancient and long predates the "−100 points" rule. It was added back in the days when a developer could add a random rogue feature because he liked it.

    But I'm getting ahead of myself.

    Rewind back to 1995. Windows 95 had just shipped, and some of the graphics people had shifted their focus to DirectX. The DirectX team maintained a very close relationship with the video game software community, and a programmer at one of the video game software companies mentioned in passing as part of some other conversation, "Y'know, one thing I miss from my X-Windows workstation is the ability to set focus to a window by just moving the mouse into it."

    As it happened, that programmer mentioned it to a DirectX team member who used to be on the shell team, so the person he mentioned it to actually knew a lot about all this GUI programming stuff. Don't forget, in the early days of DirectX, it was a struggle convincing game vendors to target this new Windows 95 operating system; they were all accustomed to writing their games to run under MS-DOS. Video game programmers didn't know much about programming for Windows because they had never done it before.

    That DirectX team member sat down and quickly pounded out the first version of what eventually became known to the outside world as the X-Mouse PowerToy. He gave a copy to the programmer who had made the request almost as an afterthought, and the programmer was thrilled that he could move focus around with the mouse the way he was used to.

    "Hey, great little tool you got there. Could you tweak it so that when I move the mouse into a window, it gets focus but doesn't come to the top? Sorry I didn't mention that originally; I didn't realize you were going to interpret my idle musing as a call to action!"

    The DirectX team member added the feature and added a check-box to the X-Mouse PowerToy to control whether the window is brought to the top when it is activated by mouse motion.

    "This is really sweet. I hate to overstay my welcome, but could you tweak it so that it doesn't change focus until my mouse stays in the window for a while? Again, sorry I didn't mention that originally."

    Version three of X-Mouse added the ability to set a delay before it moved the focus. And that was the version of X-Mouse that went into the PowerToys.

    When the Windows NT folks saw the X-Mouse PowerToy, they said, "Aw shucks, we can do that too!" And they added the three System­Parameters­Info values I described in an earlier article so as to bring Windows NT up to feature parity with X-Mouse.

    It was a total rogue feature.

  • The Old New Thing

    Rogue feature: Docking a folder at the edge of the screen


    Starting in Windows 2000 and continuing through Windows Vista, you could drag a folder out of Explorer and slam it into the edge of the screen. When you let go, it docked itself to that edge of the screen like a toolbar. A customer noticed that this stopped working in Windows 7 and asked, "Was this feature dropped in Windows 7, and is there a way to turn it back on?"

    Yes, the feature was dropped in Windows 7, and there is no way to turn it back on because the code to implement it was deleted from the product. (Well, okay, you could "turn it back on" by working with your support representative to file a Design Change Request with the Windows Sustained Engineering team and asking them to restore the code. But they'll probably cackle with glee as they click REQUEST DENIED. They will also probably add a buzzing sound just for extra oomph.)

    The introduction of this feature took place further back in history than my access to the Windows source code history database reaches, so I can't explain how it was introduced. But I can guess, and the person who removed the feature confirmed that my guess was correct.

    First of all, very few people were actually using the feature. And of the people who activated it, most of them did so by mistake and couldn't figure out how to undo it. (Sound familiar?) The feature was creating far more trouble than benefit, and by that calculation alone, it was a strong candidate for removal. Furthermore, the design team was interested in a new way to use the edges of the screen. Nobody could figure out how the docking feature actually got added. We strongly suspect that it was another rogue feature added by a specific developer who had a history of slipping in rogue features.

  • The Old New Thing

    Does the CopyFile function verify that the data reached its final destination successfully?


    A customer had a question about data integrity via file copying.

    I am using the File.Copy to copy files from one server to another. If the call succeeds, am I guaranteed that the data was copied successfully? Does the File.Copy method internally perform a file checksum or something like that to ensure that the data was written correctly?

    The File.Copy method uses the Win32 Copy­File function internally, so let's look at Copy­File.

    CopyFile just issues ReadFile calls from the source file and WriteFile calls to the destination file. (Note: Simplification for purposes of discussion.) It's not clear what you are hoping to checksum. If you want CopyFile to checksum the bytes when they return from ReadFile, checksum the bytes as they are passed to WriteFile, and then compare the two at the end of the operation, then that tells you nothing, since they are the same bytes in the same memory.

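    DWORD bytesRead, bytesWritten;
    while (ReadFile(sourceFile, buffer, bufferSize, &bytesRead, nullptr) &&
           bytesRead != 0) {
     readChecksum.checksum(buffer, bytesRead);   // checksum what was read
     writeChecksum.checksum(buffer, bytesRead);  // checksum the same bytes again
     WriteFile(destinationFile, buffer, bytesRead, &bytesWritten, nullptr);
    }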

    The read­Checksum and write­Checksum are identical because they operate on the same bytes. (In fact, the compiler might even optimize the code by merging the calculations together.) The only way something could go awry is if you have flaky memory chips that change memory values spontaneously.

    Maybe the question was whether Copy­File goes back and reads the file it just wrote out to calculate the checksum. But that's not possible in general, because you might not have read access on the destination file. I guess you could have it do a checksum if the destination were readable, and skip it if not, but then that results in a bunch of weird behavior:

    • It generates spurious security audits when it tries to read from the destination and gets ERROR_ACCESS_DENIED.
    • It means that Copy­File sometimes does a checksum and sometimes doesn't, which removes the value of any checksum work since you're never sure if it actually happened.
    • It doubles the network traffic for a file copy operation, leading to weird workarounds from network administrators like "Deny read access on files in order to speed up file copies."

    Even if you get past those issues, you have an even bigger problem: How do you know that reading the file back will really tell you whether the file was physically copied successfully? If you just read the data back, it may end up being read out of the disk cache, in which case you're not actually verifying physical media. You're just comparing cached data to cached data.

    But if you open the file with caching disabled, this has the side effect of purging the cache for that file, which means that the system has thrown away a bunch of data that could have been useful. (For example, if another process starts reading the file at the same time.) And, of course, you're forcing access to the physical media, which is slowing down I/O for everybody else.

    But wait, there's also the problem of caching controllers. Even when you tell the hard drive, "Now read this data from the physical media," it may decide to return the data from an onboard cache instead. You would have to issue a "No really, flush the data and read it back" command to the controller to ensure that it's really reading from physical media.

    And even if you verify that, there's no guarantee that the moment you declare "The file was copied successfully!" the drive platter won't spontaneously develop a bad sector and corrupt the data you just declared victory over.

    This is one of those "How far do you really want to go?" type of questions. You can re-read and re-validate as much as you want at copy time, and you still won't know that the file data is valid when you finally get around to using it.

    Sometimes, you're better off just trusting the system to have done what it says it did.

    If you really want to do some sort of copy verification, you'd be better off saving the checksum somewhere and having the ultimate consumer of the data validate the checksum and raise an integrity error if it discovers corruption.
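    Here is a minimal sketch of that approach, in C++ to match the other code in this article. It assumes a hypothetical StoredFile record that carries the checksum alongside the data. It uses 64-bit FNV-1a purely as a stand-in; a real system might prefer CRC-32 or SHA-256. The point is where the check happens: when the data is consumed, not when it is copied.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Stand-in checksum: 64-bit FNV-1a. It catches accidental corruption;
// it is not a cryptographic hash.
std::uint64_t Fnv1a64(const std::string& data) {
    std::uint64_t hash = 0xcbf29ce484222325ULL;  // FNV-1a offset basis
    for (unsigned char byte : data) {
        hash ^= byte;
        hash *= 0x100000001b3ULL;                 // FNV-1a 64-bit prime
    }
    return hash;
}

// Hypothetical record: the producer saves the checksum next to the data.
struct StoredFile {
    std::string contents;
    std::uint64_t checksum;
};

// Producer side: compute the checksum once, when the data is written.
StoredFile Store(const std::string& contents) {
    return StoredFile{contents, Fnv1a64(contents)};
}

// Consumer side: validate right before use and raise an integrity error
// if the bytes no longer match what the producer recorded.
const std::string& Consume(const StoredFile& file) {
    if (Fnv1a64(file.contents) != file.checksum) {
        throw std::runtime_error("integrity error: checksum mismatch");
    }
    return file.contents;
}
```

    Corruption that happens anywhere between Store and Consume, whether during the copy or sitting on a disk that later develops a bad sector, surfaces as the exception at the one point where it actually matters.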

  • The Old New Thing

    Why can't I set "Size all columns to fit" as the default?


    A customer wanted to know how to set Size all columns to fit as the default for all Explorer windows. (I found an MSDN forum thread on the same subject, and apparently, the inability to set Size all columns to fit as the default is "an enormous oversight and usability problem.")

    The confusion stems from the phrasing of the option; it's not clear whether it is a state or a verb. The option could mean

    • "Refresh the size of all the columns so that they fit the content" (verb)
    • "Maintain the size of all the columns so that they fit the content" (state)

    As it happens, the option is a verb, which means that it is not part of the state, and therefore can't be made the default. (The cue that it is a verb is that when you select it, you don't get a check-mark next to the menu option the next time you go to the menu.)

    Mind you, during the development cycle, we did try addressing the oversight part of the enormous oversight and usability problem, but we discovered that fixing the oversight caused an enormous usability problem.

    After changing Size all columns to fit from a verb to a state, the result was unusable: The constantly-changing column widths (which were often triggered spontaneously as the contents of the view were refreshed or updated) were unpredictable and consequently reduced user confidence since it's hard to have the confidence to click the mouse if there is an underlying threat that the thing you're trying to click will move around of its own volition.

    Based on this strong negative feedback, we changed it back to a verb. Now the columns shift around only when you tell them to.

    I find it interesting that even a decision that was made by actually implementing it and then performing actual usability research gets dismissed as something that was "an enormous oversight and usability problem."

    Sigh: Comments closed due to insults and name-calling.

  • The Old New Thing

    Raymond learns about some of the things people do to get banned on Xbox LIVE


    I still enjoy dropping in on Why Was I Banned? every so often, but not being a l33t Xbox haxxor, I don't understand a lot of the terminology. Fortunately, some of my colleagues were kind enough to explain them to me. (And now I'm explaining them to you so that you don't have to look as stupid asking them.)

    A modded lobby is a pre-game lobby (a server you connect to in order to find other people to play with or against) that has been modified (modded) with carefully-crafted parameters so that it grants people who visit it various advantages. For example, the reward for winning the game could be some absurd number of experience points. Sometimes the reward is granted merely for visiting the lobby; you don't need to actually play the game.

    A glitch lobby is a modded lobby that takes advantage of a bug (glitch) in the software. An infection lobby is a modded lobby that modifies (infects) your character so that the modification persists even after you leave the modded lobby and return to regular play.

    I mused that it would be interesting (if possibly ultimately a bad idea) to create a separate universe for all the modded accounts. You aren't banned from Xbox entirely, but your account has been moved permanently to the mod universe. You're allowed to play games only against other modded accounts. Soon, you will realize that other people are much better at modding than you are, and the result is that the gameplay is totally unfair and not fun at all.

    And if you complain that the mod universe is totally unfair and no fun at all, then everybody laughs at you and you earn the IRONY badge.

    (At least until somebody comes up with a mod that removes the IRONY badge.)

  • The Old New Thing

    The case of the asynchronous copy and delete


    A customer reported some strange behavior in the Copy­File and Delete­File functions. They were able to reduce the problem to a simple test program, which went like this (pseudocode):

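    // assume "a" is a large file, say, 1MB.
    while (true) {
      // Try twice to copy the file
      if (!CopyFile("a", "b", FALSE)) {
        Sleep(1000); // wait a second before retrying
        if (!CopyFile("a", "b", FALSE)) fatal_error(); // hypothetical: report and stop
      }
      // Try twice to delete the file
      if (!DeleteFile("b")) {
        Sleep(1000);
        if (!DeleteFile("b")) fatal_error();
      }
    }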

    When they ran the program, they found that sometimes the copy failed on the first try with error 5 (ERROR_ACCESS_DENIED) but if they waited a second and tried again, it succeeded. Similarly, sometimes the delete failed on the first try, but succeeded on the second try if you waited a bit.

    What's going on here? It looks like the Copy­File is returning before the file copy is complete, causing the Delete­File to fail because the copy is still in progress. Conversely, it looks like the Delete­File returns before the file is deleted, causing the Copy­File to fail because the destination exists.

    The operations Copy­File and Delete­File are synchronous. However, the NT model for file deletion is that a file is deleted when the last open handle is closed.¹ If Delete­File returns and the file still exists, then it means that somebody else still has an open handle to the file.

    So who has the open handle? The file was freshly created, so there can't be any pre-existing handles to the file, and we never open it between the copy and the delete.

    My psychic powers said, "The offending component is your anti-virus software."

    I can think of two types of software that go around snooping on recently-created files. One of them is an indexing tool, but those tend not to be very aggressive about accessing files the moment they are created; they tend to wait until the computer is idle to do their work. Anti-virus software, however, runs in real-time mode, where it checks every file as it is created. That makes it the more likely culprit: it snuck in and opened the file after the copy completed so it could scan it, and that open is the extra handle that is preventing the deletion from completing.
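    If it helps to see the mechanics, here is a toy model in C++ of the rule that a deleted file lingers until its last handle is closed. It is hypothetical and greatly simplified, and it is not how NTFS or the I/O manager is actually implemented. A second open handle, standing in for the scanner, is enough to produce two effects. The delete appears to succeed while the name sticks around. The next create then fails the way the customer's CopyFile did.

```cpp
#include <map>
#include <string>

// Toy model (hypothetical, greatly simplified) of "a file is deleted when
// its last open handle is closed." It mimics only the observable ordering.
enum class Result { Ok, AccessDenied, NotFound };

class ToyVolume {
    struct Entry {
        int openHandles = 0;
        bool deletePending = false;
    };
    std::map<std::string, Entry> files_;

public:
    // Create or open a file. A name whose deletion is still pending cannot
    // be opened again; Win32 reports that case as ERROR_ACCESS_DENIED.
    Result Open(const std::string& name) {
        auto it = files_.find(name);
        if (it != files_.end() && it->second.deletePending) {
            return Result::AccessDenied;
        }
        files_[name].openHandles++;
        return Result::Ok;
    }

    // Closing the last handle of a delete-pending file finally removes it.
    void Close(const std::string& name) {
        auto it = files_.find(name);
        if (it == files_.end() || it->second.openHandles == 0) return;
        if (--it->second.openHandles == 0 && it->second.deletePending) {
            files_.erase(it);
        }
    }

    // Deleting only marks the file; the name goes away once no handles remain.
    Result Delete(const std::string& name) {
        auto it = files_.find(name);
        if (it == files_.end()) return Result::NotFound;
        it->second.deletePending = true;
        if (it->second.openHandles == 0) files_.erase(it);
        return Result::Ok;
    }

    bool Exists(const std::string& name) const {
        return files_.count(name) != 0;
    }
};
```

    In the model, a retry succeeds only because the scanner closed its handle in the meantime. That would also explain why waiting a second helped the customer.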

    But wait, isn't anti-virus software supposed to be using oplocks so that it can close its handle and get out of the way if somebody wants to delete the file?

    Well, um, yes, but "what they should do" and "what they actually do" are often not the same.

    We never did hear back from the customer whether the guess was correct, which could mean one of various things:

    1. They confirmed the diagnosis and didn't feel the need to reply.
    2. They determined that the diagnosis was incorrect but didn't bother coming back for more help, because "those Windows guys don't know what they're talking about."
    3. They didn't test the theory at all, so had nothing to report.

    We may never know what the answer is.


    ¹Every so often, the NT file system folks dream of changing the deletion model to be more Unix-like, but then they wonder if that would end up breaking more things than it fixes.

  • The Old New Thing

    How do you deal with an input stream that may or may not contain Unicode data?


    Dewi Morgan reinterpreted a question from a Suggestion Box of times past as "How do you deal with an input stream that may or may not contain Unicode data?" A related question from Dave wondered how applications that use CP_ACP to store data could ensure that the data is interpreted in the same code page by the recipient. "If I send a .txt file to a person in China, do they just go through code pages until it seems to display correctly?"

    These questions are additional manifestations of Keep your eye on the code page.

    When you store data, you need to have some sort of agreement (either explicit or implicit) with the code that reads the data as to how the data should be interpreted. Are they four-byte sign-magnitude integers stored in big-endian format? Are they two-byte ones-complement signed integers stored in little-endian format? Or maybe they are IEEE floating-point data stored in 80-bit format. If there is no agreement between the two parties, then confusion will ensue.

    That your data consists of text does not exempt you from this requirement. Is the text encoded in UTF-16LE? Or maybe it's UTF-8. Or perhaps it's in some other 8-bit character set. If the two sides don't agree, then there will be confusion.

    In the case of files encoded in CP_ACP, you have a problem if the source and destination have different values for CP_ACP. That text file you generate on a US-English system (where CP_ACP is 1252) may not make sense when decoded on a Chinese-Simplified system (where CP_ACP is 936). It so happens that all Windows 8-bit code pages agree on code points 0 through 127, so if you restrict yourself to that set, you are safe. The Windows shell team was not so careful, and they slipped some characters into a header file which are illegal when decoded in code page 932 (the CP_ACP used in Japan). The systems in Japan do not cycle through all the code pages looking for one that decodes without errors; they just use their local value of CP_ACP, and if the file makes no sense, then I guess it makes no sense.
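    As a sketch of the "restrict yourself to 0 through 127" advice, a writer could check its text before committing to CP_ACP. The function name below is hypothetical. It simply verifies that no byte has the high bit set.

```cpp
#include <string_view>

// Hypothetical helper: true if every byte is in 0..127, the range on which
// the Windows 8-bit code pages agree, so the text decodes the same way
// regardless of the reader's CP_ACP.
bool IsSafeForAnyAnsiCodePage(std::string_view bytes) {
    for (unsigned char b : bytes) {
        if (b > 127) return false;
    }
    return true;
}
```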

    If you are in the unfortunate situation of having to consume data where the encoding is unspecified, you will find yourself forced to guess. And if you guess wrong, the result can be embarrassing.
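    One common way to make that guess slightly less blind is to look for a byte order mark. This C++ sketch uses hypothetical names. It recognizes the UTF-8, UTF-16LE, and UTF-16BE BOMs and otherwise reports that you are on your own. A BOM is a strong hint, but its absence proves nothing, and 8-bit text can begin with the same bytes by coincidence.

```cpp
#include <string_view>

enum class Encoding { Utf8, Utf16LE, Utf16BE, Unknown };

// Heuristic sketch: a byte order mark is evidence, not proof. Without one,
// the result is Unknown and the caller still has to fall back on whatever
// encoding was agreed on (or guessed). UTF-32 BOMs are not handled, so
// FF FE 00 00 would be reported as UTF-16LE.
Encoding SniffBom(std::string_view bytes) {
    auto startsWith = [bytes](std::string_view prefix) {
        return bytes.substr(0, prefix.size()) == prefix;
    };
    if (startsWith("\xEF\xBB\xBF")) return Encoding::Utf8;
    if (startsWith("\xFF\xFE")) return Encoding::Utf16LE;
    if (startsWith("\xFE\xFF")) return Encoding::Utf16BE;
    return Encoding::Unknown;
}
```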

    Bonus chatter: I remember one case where a customer asked, "We need to convert a string of chars into a string of wchars. What code page should we pass to the Multi­Byte­To­Wide­Char function?"

    I replied, "What code page is your char string in?"

    There was no response. I guess they realized that once they answered that question, they had their answer.

  • The Old New Thing

    Why don't the shortcuts I put in the CSIDL_COMMON_FAVORITES folder show up in the Favorites menu?


    A customer created some shortcuts in the CSIDL_COMMON_FAVORITES folder, expecting them to appear in the Favorites menu for all users. Instead, they appeared in the Favorites menu for no users. Why isn't CSIDL_COMMON_FAVORITES working?

    The CSIDL_COMMON_FAVORITES value was added at the same time as the other CSIDL_COMMON_* values, and its name strongly suggests that its relationship to CSIDL_FAVORITES is the same as the relationship between CSIDL_COMMON_STARTMENU and CSIDL_STARTMENU, or between CSIDL_COMMON_PROGRAMS and CSIDL_PROGRAMS, or between CSIDL_COMMON_DESKTOP­DIRECTORY and CSIDL_DESKTOP­DIRECTORY.

    That suggestion is a false one.

    In fact, CSIDL_COMMON_FAVORITES is not hooked up to anything. It's another of those vestigial values that got created with the intent of actually doing something but that thing never actually happened. I don't think any version of Internet Explorer ever paid any attention to that folder. Maybe the designers decided that it was a bad idea and cut the feature. Maybe it was an oversight. Whatever the reason, it's just sitting there wasting space.

    Sorry for the fake-out.

    Exercise: Another customer wanted to know why creating a %ALL­USERS­PROFILE%\Microsoft\Internet Explorer\Quick Launch directory and putting shortcuts into it did not result in those shortcuts appearing in every user's Quick Launch bar. Explain.

  • The Old New Thing

    I brought this process into the world, and I can take it out!


    Clipboard Gadget wants to know why normal processes can kill elevated processes via TerminateProcess, yet they cannot do a trivial Show­Window(hwnd, SW_MINIMIZE). "Only explorer seems to be able to do so somehow."

    There are several questions packed into this "suggestion." (As it happens, most of the suggestions are really questions; the suggestion is "I suggest that you answer my question." That's not really what I was looking for when I invited suggestions, but I accept that that's what I basically get.)

    First, why can normal processes kill elevated processes?

    The kernel-colored glasses answer is "because the security attributes for the process grant the logged-on user PROCESS_TERMINATE access."

    Of course, that doesn't really answer the question; it just moves it to another question: Why are elevated processes granting termination access to the logged-on user?

    I checked with the security folks on this one. The intent was to give the user a way to terminate a process that they elevated without having to go through another round of elevation. If the user goes to Task Manager, right-clicks the application, and then selects "Close", and the application doesn't respond to WM_CLOSE, then the "Oh dear, this isn't working, do you want to try harder?" dialog box would appear, and if the user says "Go ahead and nuke it," then we should go ahead and nuke it.

    Note that this extra permission is granted only if the process was elevated via the normal elevation user interface (which nitpickers will point out may not actually display anything if you have enabled silent elevation). The user was already a participant in elevating the process and already provided the necessary credentials to do so. You might say that elevating a process pre-approves it for being terminated. As Bill Cosby is credited with saying, "I brought you into this world, and I can take you out!"

    If the process was elevated by some means other than the user interface (for example, if it was started remotely or injected by a service), then this extra permission is not granted (because it is only the elevation user interface that grants it), and the old rules apply.

    Phew, that's part one of the question. Now part two: Why can't you do a trivial Show­Window(hwnd, SW_MINIMIZE)? Because that runs afoul of User Interface Privilege Isolation, which prevents low-integrity processes from manipulating the user interface of higher-integrity processes.

    My guess is that Clipboard Gadget thought that terminating a process is a higher-privilege operation than being able to manipulate it. It isn't. Terminating a process prevents it from doing anything, which is different from being able to make it do anything you want. You might hire a chauffeur to drive you all over town in a limousine, and you can fire him at any time, but that doesn't mean that you can grab the wheel and drive the limousine yourself.

    Finally, Clipboard Gadget wants to know how Explorer can minimize windows. Explorer does not call ShowWindow(hwnd, SW_MINIMIZE) to minimize windows, because Explorer is running at medium integrity and cannot manipulate the windows belonging to high-integrity processes. Instead, it posts a WM_SYSCOMMAND with the request SC_MINIMIZE. This does not minimize the window; it is merely a request to minimize the window. The application is free to ignore this request; for example, the application may have disabled its Minimize box. Most applications, however, accede to the request by minimizing the window, just as most chauffeurs will agree to take you to your destination along the route you specify. Unless your instructions involve going the wrong way down a one-way street or running over pedestrians.

    But don't fool yourself into thinking that you're driving the limousine.

  • The Old New Thing

    Data in crash dumps are not a matter of opinion


    A customer reported a problem with the System­Time­To­Tz­Specific­Local­Time function. (Gosh, why couldn't they have reported a problem with a function with a shorter name! Now I have to type that thing over and over again.)

    We're having a problem with the System­Time­To­Tz­Specific­Local­Time function. We call it like this:

                                     &sysTime, &localTime);

    On some but not all of our machines, our program crashes with the following call stack:

    ExceptionAddress: 77d4a0d0 (kernel32!SystemTimeToTzSpecificLocalTime+0x49)
       ExceptionCode: c0000005 (Access violation)
      ExceptionFlags: 00000000
    NumberParameters: 2
       Parameter[0]: 00000000
       Parameter[1]: 000000ac
    Attempt to read from address 000000ac

    This problem appears to occur only with the release build; the debug build does not have this problem. Any ideas?

    Notice that in the line of code the customer provided, they are not calling System­Time­To­Tz­Specific­Local­Time; they are instead calling some application-defined method with the same name, which takes different parameters from the system function.

    The customer apologized and included the source file they were using, as well as a crash dump file.

    void CContoso::ResetTimeZone()
    {
     SYSTEMTIME sysTime, localTime;
     for (int timeZoneId = 1;
          timeZoneId < MAX_TIME_ZONES;
          timeZoneId++) {
      if (!s_pTimeZones->SystemTimeToTzSpecificLocalTime((BYTE)timeZoneId,
                                      &sysTime, &localTime)) {
       ...
      }
      ... do something with localTime ...
     }
    }

    BOOL CTimeZones::SystemTimeToTzSpecificLocalTime(
        BYTE bTimeZoneID,
        LPSYSTEMTIME lpUniversalTime,
        LPSYSTEMTIME lpLocalTime)
    {
        return ::SystemTimeToTzSpecificLocalTime(
            &m_pTimeZoneInfo[bTimeZoneID],
            lpUniversalTime, lpLocalTime);
    }

    According to the crash dump, the first parameter passed to CTimeZones::SystemTimeToTzSpecificLocalTime was 1, and the m_pTimeZoneInfo member was nullptr. As a result, a bogus non-null pointer was passed as the first parameter to SystemTimeToTzSpecificLocalTime. That pointer was &m_pTimeZoneInfo[1], which works out to 0xAC because a TIME_ZONE_INFORMATION structure is 172 (0xAC) bytes. The crash came when the function tried to dereference it.
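    The arithmetic behind that faulting address is easy to check. This C++ sketch assumes that m_pTimeZoneInfo points to an array of TIME_ZONE_INFORMATION structures, the type the first parameter of SystemTimeToTzSpecificLocalTime expects. It mirrors that structure's layout with portable types so it runs anywhere. The mirror structs and the helper are hypothetical.

```cpp
#include <cstdint>

// Hypothetical mirror of TIME_ZONE_INFORMATION using portable types:
// LONG is 32-bit, WCHAR and WORD are 16-bit, SYSTEMTIME is eight WORDs.
struct SystemTimeMirror {
    std::uint16_t words[8];
};

struct TimeZoneInfoMirror {
    std::int32_t     Bias;
    char16_t         StandardName[32];
    SystemTimeMirror StandardDate;
    std::int32_t     StandardBias;
    char16_t         DaylightName[32];
    SystemTimeMirror DaylightDate;
    std::int32_t     DaylightBias;
};

static_assert(sizeof(TimeZoneInfoMirror) == 172, "expected no padding");

// &m_pTimeZoneInfo[id] is base + id * sizeof(element). Doing that on a real
// null pointer is undefined behavior in C++, so compute it as an integer.
std::uintptr_t ElementAddress(std::uintptr_t base, unsigned id) {
    return base + id * sizeof(TimeZoneInfoMirror);
}
```

    With a null base, element 1 sits 172 bytes in. That is 0xAC, matching the "Attempt to read from address 000000ac" in the dump.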

    This didn't require any special secret kernel knowledge; all I did was look at the stack trace and the value of the member variable.

    So far, it was just a case of a lazy developer who didn't know how to debug their own code. But the reply from the customer was most strange:

    I don't think so, for two reasons.

    1. The exact same build on another machine does not crash, so it must be a machine-specific or OS-specific bug.
    2. The code in question has not changed in several months, so if the problem were in that code, we would have encountered it much earlier.

    I was momentarily left speechless by this response. It sounds like the customer simply refuses to believe the information that's right there in the crash dump. "La la la, I can't hear you."

    Memory values are not a matter of opinion. If you look in memory and find that the value 5 is on the stack, then the value 5 is on the stack. You can't say, "No it isn't; it's 6." You can have different opinions on how the value 5 ended up on the stack, but the fact that the value is 5 is beyond dispute.

    It's like a box of cereal that has been spilled on the floor. People may argue over who spilled the cereal, or who placed the box in such a precarious position in the first place, but to take the position "There is no cereal on the floor" is a pretty audacious move.

    Whether you like it or not, the value is not correct. You can't deny what's right there in the dump file. (Well, unless you think the dump file itself is incorrect.)

    A colleague studied the customer's code more closely and pointed out a race condition: the thread that calls CContoso::ResetTimeZone may do so before the CTimeZones object has allocated the m_pTimeZoneInfo array. And it wasn't anything particularly subtle, either. It went like this, in pseudocode:

    s_pTimeZones = new CTimeZones;
    // another thread can call CContoso::ResetTimeZone() at this point
    // the CTimeZones::Initialize method allocates m_pTimeZoneInfo
    // among other things
    s_pTimeZones->Initialize();
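    One way to close that window is to finish initializing the object before publishing the pointer. This C++ sketch uses a hypothetical stand-in for CTimeZones. It publishes the pointer with release semantics, so a reader that sees the pointer also sees the initialized table. A function-local static would also work; this sketch just keeps the shape of the customer's code.

```cpp
#include <atomic>
#include <vector>

// Hypothetical stand-in for CTimeZones: Initialize() allocates the table
// that the lookup indexes into, like m_pTimeZoneInfo in the customer's code.
struct TimeZonesSketch {
    std::vector<int> timeZoneInfo;  // empty until Initialize() runs
    void Initialize() { timeZoneInfo.assign(100, 0); }  // size is arbitrary
};

// Readers see either nullptr or a fully initialized object.
std::atomic<TimeZonesSketch*> g_timeZones{nullptr};

void CreateTimeZones() {
    auto* p = new TimeZonesSketch;  // lives for the rest of the program
    p->Initialize();                // finish initialization first...
    g_timeZones.store(p, std::memory_order_release);  // ...then publish
}

// Returns false while the table is not ready, instead of indexing into it.
bool TimeZonesReady() {
    TimeZonesSketch* p = g_timeZones.load(std::memory_order_acquire);
    return p != nullptr && !p->timeZoneInfo.empty();
}
```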

    The customer never wrote back once the bug was identified. Perhaps the sheer number of impossible things all happening at once caused their head to explode.

    I discussed this incident later with another colleague, who remarked

    Frequently, some problem X will occur, and the people debugging it will say, "The only way for problem X to occur is if we are in situation Y, but we know that situation Y is impossible, so we didn't bother investigating that possibility. Can you suggest another idea?"

    Yeah, I can suggest another idea: "The computer is always right." You already saw that problem X occurred. If the only way that problem X can occur is if you are in situation Y, then the first thing you should do is assume that you are in situation Y and work from there.

    Teaching people to follow this simple axiom has avoided a lot of fruitless, misdirected, speculative debugging. People seem hard-wired to prefer speculation, though, and it's common to slip back into forgetting simple logic.

    To put it another way:

    • If X, then Y.
    • X is true.
    • Y cannot possibly be true.

    In order for these three statements to hold simultaneously, you must have found a fundamental flaw in the underlying axioms of logic as they have been understood for thousands of years.

    This is unlikely to be the case.
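    The logic is short enough to check mechanically. In Lean, the three statements together yield a proof of False, which is exactly the contradiction described above:

```lean
-- h₁ : "If X, then Y."   h₂ : "X is true."   h₃ : "Y cannot possibly be true."
example (X Y : Prop) (h₁ : X → Y) (h₂ : X) (h₃ : ¬Y) : False :=
  h₃ (h₁ h₂)
```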

    Given that you have X right in front of you, X is true by observation. That leaves the other two statements. Maybe there's a case where X does not guarantee Y. Maybe Y is true after all.

    As Sherlock Holmes is famous for saying, "When you have eliminated the impossible, whatever remains, however improbable, must be the truth." But before you rule out the impossible, make sure it's actually impossible.

    Bonus chatter: Now that I've told you that the debugger never lies, I get to confuse you in a future entry by debugging a crash where the debugger lied. (Or at least wasn't telling the whole truth.)
