November, 2006

  • The Old New Thing

    I bet somebody got a really nice bonus for that feature

    • 132 Comments

    I often find myself saying, "I bet somebody got a really nice bonus for that feature."

    "That feature" is something aggressively user-hostile, like forcing a shortcut into the Quick Launch bar or the Favorites menu, like automatically turning on a taskbar toolbar, like adding an icon to the notification area that conveys no useful information but merely adds to the clutter, or (my favorite) like adding an extra item to the desktop context menu that takes several seconds to initialize and gives the user the ability to change some obscure feature of their video card.

    Allow me to summarize the guidance:

    The Quick Launch bar and Favorites menu belong to the user. There is intentionally no interface to manipulate shortcuts in the Quick Launch bar. We saw what happened to the Favorites menu and learned our lesson: Providing a programmatic interface to high-valued visual real estate results in widespread abuse. Of course, this doesn't stop people from hard-coding the path to the Quick Launch directory—too bad the name of the directory isn't always "Quick Launch"; the name can change based on what language the user is running. But that's okay, I mean, everybody speaks English, right?

    There is no programmatic interface to turn on a taskbar toolbar. Again, that's because the taskbar is a high-value piece of the screen and creating a programmatic interface can lead to no good. Either somebody is going to go in and force their toolbar on, or they're going to go in and force a rival's toolbar off. Since there's no programmatic interface to do this, these programs pull stunts like generating artificial user input to simulate the right-click on the taskbar, mousing to the "Toolbars" menu item, and then selecting the desired toolbar. The taskbar context menu will never change, right? Everybody speaks English, right?

    The rule for taskbar notifications is that they are there to, well, notify the user of something. Your print job is done. Your new hardware device is ready to use. A wireless network has come into range. You do not use a notification icon to say "Everything is just like it was a moment ago; nothing has changed." If nothing has changed, then say nothing.

    Many people use the notification area to provide quick access to a running program, which runs counter to the guidance above. If you want to provide access to a program, put a shortcut on the Start menu. Doesn't matter whether the program is running already or not. (If it's not running, the Start menu shortcut runs it. If it is already running, the Start menu shortcut runs the program, which recognizes that it's already running and merely activates the already-running copy.)

    While I'm here, I may as well remind you of the guidance for notification balloons: A notification balloon should only appear if there is something you want the user to do. It must be actionable.

    BalloonAction
    Your print job is complete. Go pick it up.
    Your new hardware device is ready to use. Start using it.
    A wireless network has come into range. Connect to it.

    The really good balloons will tell the user what the expected action is. "A wireless network has come into range. Click here to connect to it." (Emphasis mine.)

    Here are some bad balloons:

    Bad BalloonAction?
    Your screen settings have been restored. So what do you want me to do about it?
    Your virtual memory swap file has been automatically adjusted. If it's automatic, what do I need to do?
    Your clock has been adjusted for daylight saving time. Do you want me to change it back?
    Updates are ready for you to install. So?

    One of my colleagues got a phone call from his mother asking him what she she should do about a new error message that wouldn't go away. It was the "Updates are ready for you to install" balloon. The balloon didn't say what she should do next.

    The desktop context menu extensions are the worst, since the ones I've seen come from video card manufacturers that provide access to something you do maybe once when you set up the card and then don't touch thereafter. I mean, do normal users spend a significant portion of their day changing their screen resolution and color warmth? (Who on a laptop would even want to change their screen resolution?) What's worse is that one very popular such extension adds an annoying two second delay to the appearance of the desktop context menu, consuming 100% CPU during that time. If you have a laptop with a variable-speed fan, you can hear it going nuts for a few seconds each time you right-click the desktop. Always good to chew up battery life initializing a context menu that nobody on a laptop would use anyway.

    The thing is, all of these bad features were probably justified by some manager somewhere because it's the only way their feature would get noticed. They have to justify their salary by pushing all these stupid ideas in the user's faces. "Hey, look at me! I'm so cool!" After all, when the boss asks, "So, what did you accomplish in the past six months," a manager can't say, "Um, a bunch of stuff you can't see. It just works better." They have to say, "Oh, check out this feature, and that icon, and this dialog box." Even if it's a stupid feature.

    As my colleague Michael Grier put it, "Not many people have gotten a raise and a promotion for stopping features from shipping."

  • The Old New Thing

    It's not surprising at all that people search for Yahoo

    • 99 Comments

    Earlier this year, one columnist was baffled as to why "Yahoo" was the most searched-for term on Google. I wasn't baffled at all. Back in 2001, Alexa published the top ten most searched-for terms on their service, and four of the top ten were URLs: yahoo.com, hotmail.com, aol.com, and ebay.com.

    A lot of people simply don't care to learn the difference between the search box and the address bar. "If I type what I want into this box here, I sometimes get a strange error message. But if I type it into that box there, then I get what I want. Therefore, I'll use that box there for everything." And you know what? It doesn't bother me that they don't care. In fact, I think it's good that they don't care. Computers should adapt to people, not the other way around.

    You can try to explain to these people, "You see, this is a URL, so you type it into the address box. But that is a search phrase, so you type it into the search box."

    "You-are-what? Look, I don't care about your fancy propeller-beanie acronyms. You computer types are always talking about how computers are so easy to use, and then you make up these arbitrary rules about where I'm supposed to type things. If I want something, I type into this box and click 'Search'. And it finds it. Watch. I want Yahoo, so I type 'yahoo' into the box, and boom, there it is. I have a system that works. Why are you trying to make my life more confusing?"

    I remember attending a presentation by the MSN Explorer team on what they learned about how people use a web browser. They found many situations where people failed to accomplish their desired task because they typed the right thing into the wrong box. But instead of trying to teach people which box to type it in, they just expanded the definition of "right". You typed your query into the wrong box? No problem, we'll just pretend you typed it into the correct box. In fact, let's just get rid of all these special-purposes boxes. Whatever you want, just type it into this box, and we'll get it for you.

    I wish the phone company would learn this. Sometimes I'll dial a telephone number and I'll get an automated recording that says, "I'm sorry. You must dial a '1' before the number. Please hang up and try again." Or "I'm sorry. You must not dial a '1' before the number. Please hang up and try again." That's because in the state of Washington, there are complicated rules about when you have to dial a "1" in front of the number and when you don't. (Fortunately, the rule on when you have to dial the area code is easier to remember: If the area code you are calling is the same as the area code you are dialing from, then you can omit the area code.) For example, suppose your home number is 425-882-xxxx. Here's how you have to dial the following numbers:

    To call this numberyou dial
    425-202-xxxx202-xxxx
    425-203-xxxx1-203-xxxx
    206-346-xxxx206-346-xxxx
    206-347-xxxx1-206-347-xxxx

    If you get it wrong, the voice comes on the line to tell you. Hey, since you know what I did wrong and you know what I meant to do, why not just fix it? If I dial a number and forget the "1", just insert the 1 and connect the call. If I dial a number and include the "1" when I didn't need to, just delete the 1 and connect the call. Don't make me have to look up in the book whether I need a 1 or not. (In the front of the phone book are tables showing which numbers need a "1" and which don't. I hate those tables.)

    (Yes, I know there are weird technical/legal reasons for why I have to dial the phone in four different ways depending on whom I want to call. But it's still wrong that these technical/legal reasons mean that the rules for dialing a telephone are impossibly complicated.)

  • The Old New Thing

    On the importance of backwards compatibility for large corporations

    • 62 Comments

    Representatives from the IT department of a major worldwide corporation came to Redmond and took time out of their busy schedule to give a talk on how their operations are set up. I was phenomenally impressed. These people know their stuff. Definitely a world-class operation.

    One of the tidbits of information they shared with us is some numbers about the programs they have to support. Their operations division is responsible for 9,000 different install scripts for their employees around the world.

    That was not a typo.

    Nine thousand.

    This highlighted for me the fact that backwards compatibility is crucial for adoption in the corporate world. Do the math. Suppose they could install, test and debug ten programs each business day, in my opinion, a very optimistic estimate. Even at that rate, it would take them three years to get through all their scripts.

    This isn't a company that bought some software ten years ago and don't have the source code. They have the source code for all of their scripts. They have people who understand how the scripts work. They are not just on the ball; they are all over the ball. And even then, it would take them three years to go through and check (and possibly fix) each one.

    Oh, did I mention that four hundred of those programs are 16-bit?

  • The Old New Thing

    Converting an HRESULT to a Win32 error code: Diagram and answer to exercise

    • 60 Comments

    Here's the diagram from How do I convert an HRESULT to a Win32 error code?. If you are offended by VML, cover your ears and hum for a while.

    Win32 HRESULT

    The little sliver at the top is the mapping of zero to zero. The big white box at the bottom is the mapping of all negative numbers to corresponding negative numbers. And the rainbow represents the mapping of all the positive values, mod 65536, into the range 0x80070000 through 0x8007FFFF.

    Now let's take a look at that puzzle I left behind:

    Sometimes, when I import data from a scanner, I get the error "The directory cannot be removed." What does this mean?

    My psychic powers told me that the customer was doing something like this (error checking deleted):

    ReportError(HWND hwnd, HRESULT hr)
    {
     DWORD dwError = HRESULT_CODE(hr);
     TCHAR szMessage[256];
     FormatMessage(FORMAT_MESSAGE_FROM_SYSTEM, NULL,
                   dwError, 0, szMessage, 256, NULL);
     MessageBox(hwnd, szMessage, TEXT("Error"), MB_OK);
    }
    

    and that the actual HRESULT was WIA_ERROR_COVER_OPEN, which is defined as

    #define WIA_ERROR_COVER_OPEN MAKE_HRESULT(SEVERITY_ERROR, FACILITY_WIA, 16)
    

    Passing this value to HRESULT_CODE would yield 16, which maps to

    //
    // MessageId: ERROR_CURRENT_DIRECTORY
    //
    // MessageText:
    //
    //  The directory cannot be removed.
    //
    #define ERROR_CURRENT_DIRECTORY          16L
    

    And that would explain why the customer reported this strange error when reading data from a scanner.

  • The Old New Thing

    A fork is an easy-to-find nonstandard USB device

    • 43 Comments

    Remember the Ten Immutable Laws of Security. Today, we're going to talk about number three: If a bad guy has unrestricted physical access to your computer, it's not your computer any more.

    There was a bug which floated past my field of vision many months ago that went something like this: "I found a critical security bug in the USB stack. If somebody plugs in a USB device which emits a specific type of malformed packet during a specific step in the protocol, then the USB driver crashes. This is a denial of service that should be accorded critical security status."

    Now, it's indeed the case that the driver should not crash when handed a malformed USB packet, and the bug should certainly be fixed. (That said, I'm sure some people will manage to interpret this article as advocating not fixing the bug.) But let's look at the prerequisites for this bug to manifest itself: The attacker needs to build a USB device that is intentionally out of specification in one particular way and plug that device into a vulnerable machine. While that's certainly possible, it's a lot of work for your typical hacker to burn a custom EEPROM with USB firmware that manages to hit the precise conditions necessary to trigger the driver bug.

    It's much easier just to grab a fork.

    You see, since this attack requires physical access to a USB port, you may as well attack the machine in a much more direct manner that doesn't require you to spend hours with a soldering gun and a circuit board: Just grab a fork and jam it into the USB port. I haven't tried it, but I suspect that will crash the machine pretty effectively, too. If you can't get the fork to work, pouring a glass of water into the USB port will probably seal the deal.

    Doron tells me that some companies address this problem by removing physical access: They fill the USB ports on all their machines with epoxy.

    Update: Randy Aull tells me that the USB 2.0 specification anticipated the fork attack and requires that all transceivers be able to withstand short circuits "of D+ and/or D- to VBUS, GND, other data lines, or the cable shield at the connector, for a minimum of 24 hours." (Though I'm not sure if that also covers shorting VBUS to GND.) I wonder if they also have a paragraph specifying that USB devices must also withstand water immersion... Of course, you could still use that fork to push the power button or jam it into an outlet on the same circuit as the computer you want to take down in order to blow a fuse.

  • The Old New Thing

    There's going to be an awful lot more overclocking out there

    • 39 Comments

    Last year, I told the story of overclocking being the source of a lot of mysterious crashes and that some of those overclocked machines were overclocked at the store. These machines came from small, independent shops rather than the major manufacturers.

    Well it looks like that's about to change.

    Gateway's FX530 Desktop can be ordered overclocked by the manufacturer.

    Just say no to DIY overclocking and let us do it for you! We'll factory overclock your Intel® quad-core processor.4 Yep, you read that right: factory overclock, which is something that most other major PC manufacturers don't do.

    We live in interesting times.

  • The Old New Thing

    How do I test that return value of ShellExecute against 32?

    • 39 Comments

    We discussed earlier the history behind the the return value of the ShellExecute function, and why its value in Win32 is meaningless aside from testing it against the value 32 to determine whether an error occurred.

    How, then, should you check for errors?

    Let's turn the question around. How would you, the implementor of the ShellExecute function, report success? The ShellExecute is a very popular function, so you have to prepared for the ways people check the return code incorrectly yet manage to work in spite of themselves. The goal, therefore, is to report success in a manner that breaks as few programs as possible.

    (Now, there may be those of you who say, "Hang compatibility. If programs checked the return value incorrectly, then they deserve to stop working!" If you choose to go in that direction, then be prepared for the deluge of compatibility bugs to be assigned to you to fix. And they're going to come from a grumpy compatibility testing team because they will have spent a long time just finding out that the problem was that the program was checking the return value of ShellExecute incorrectly.)

    Since there is still 16-bit code out there that may thunk up to 32-bit code, you probably don't want to return a value greater than 0xFFFF. Otherwise, when that value gets truncated to a 16-bit HINSTANCE will lose the high word. If you returned a value like 0x00010001, this would truncate to 0x0001, which would be treated as an error code.

    For similar reasons, the 64-bit implementation of the ShellExecute function had better not use the upper 32 bits of the return value. Code that casts the return value to int will lose the high 32 bits.

    Furthermore, you shouldn't return a value that, when cast to an integer, results in a negative number. Some people will use a signed comparison against 32; others will use an unsigned comparison. If you returned a value like -5, then the people who used a signed comparison would think the function failed, whereas those who used an unsigned comparison would think it succeeded.

    By the same logic, the value you choose as the return value should not result in a negative number when cast to a 16-bit integer. If the return value is passed to a 16-bit caller that casts the result to an integer and compares against 32, you want consistent results independent of whether the 16-bit caller used a signed or unsigned comparison.

    Edge conditions are tricky, so you don't want to return the value 32 exactly. If you look at code that checks the return value from ShellExecute, you'll probably find that the world is split as to whether 32 is an error code or not. So it'd be in your best interest not to return the value 32 exactly but rather a value larger than 32.

    So far, you're constrained to choosing a value in the range 33–32767.

    Finally, you might be a fan of Douglas Adams. (Most geeks are.) The all-important number 42 fits into this range. Your choice of return value, therefore, might be (HINSTANCE)42.

    Going back to the original question: How should I check the return value of ShellExecute for errors? MSDN says you can cast the result to an integer and compare the result against 32. That'll work fine. You could cast in the other direction, comparing the return value against (HINSTANCE)32. That'll work fine, too. Or you could cast the result to an INT_PTR and compare the result against 32. That's fine, too. They'll all work, because the implementor of the ShellExecute function had to plan ahead for you and all the other people who call the ShellExecute function.

  • The Old New Thing

    The quiet dream of placebo settings

    • 35 Comments

    Back in the Windows 95 days, people swore that increasing the value of MaxBPs in the system.ini file fixed application errors. People usually made up some pseudo-scientific explanation for why this fixed crashes. These explanations were complete rot.

    These breakpoints had nothing to do with Windows applications. They were used by 32-bit device drivers to communicate with code in MS-DOS boxes, typically the 16-bit driver they are trying to take over from or are otherwise coordinating their activities with. A bunch of these are allocated at system startup when drivers settle themselves in, and on occasion, a driver might patch a breakpoint temporarily into DOS memory, removing it when the breakpoint is hit (or when the breakpoint is no longer needed). Increasing this value had no effect on Windows application.

    I fantasized about adding a "Performance" page to Tweak UI with an option to increase the number of "PlaceBOs". I would make up some nonsense text about this setting controlling how high in memory the system should place its "breakpoint opcodes". Placing them higher will free up memory for other purposes and reduce the frequency of "Out of memory" errors. Or something like that.

    I was reminded of this story by my pals in products support who were trying to come up with a polite way of explaining to their customer that there is no /7GB boot.ini switch. In other situations, they sometimes dream of shipping placebo.dll to a customer to solve their problem.

    (And by the way, the technical reason why the user-mode address space is limited to eight terabytes was given by commenter darwou: The absence of a 16-byte atomic compare-and-exchange instruction means that bits need to be sacrificed to encode the sequence number which avoids the ABA problem.)

  • The Old New Thing

    The window manager moves the mouse; applications choose the cursor

    • 34 Comments

    You can sometimes narrow down the source of a problem just by looking at the screen and moving the mouse.

    When you move the mouse, the cursor on the screen moves to match. This work is done in the window manager in kernel mode. The mouse hardware notifies the window manager, "Hey, I moved left twenty units." The window manager takes this value, accelerates or decelerates it according to your mouse acceleration settings, calls any low-level mouse hooks that are installed, and then tells the display driver, "Move that sprite left about thirty pixels" (say). It then sets the "the mouse moved" flag so that the program who owns the window under the new mouse position will get a WM_MOUSEMOVE message. The window manager also sets the cursor to the "virtual cursor state" corresponding to the window beneath the cursor. The "virtual cursor state" remembers the cursor that the thread (or threads, if input has been attached) responsible for the window most recently set. Maintaining the virtual cursor state is important, for if a thread calls SetCursor to change the cursor to an hourglass and then stops processing messages (because it is busy), you really want the cursor to change back to an hourglass when it moves over the thread's windows.

    What does it mean if the cursor doesn't move at all when you move the mouse? Could it be caused by an application? If you read through the flowchart I described above, the only place applications get involved in the "move the mouse cursor" code flow is if they are filtering out the mouse motion in a low-level mouse hook. (Another way an application can "lock up" the mouse is by calling the ClipCursor function, but vanishingly few applications do this. I'm assuming you aren't the victim of malicious software but instead are trying to figure out what program, if any, is accidentally freezing the mouse.)

    Low-level mouse hooks are comparatively uncommon since they exact a high performance penalty on the system. If you're moving your mouse and don't see the cursor move around on the screen, my guess is that there is a problem in the kernel-mode side of the equation. If you're seeing the entire system freeze up, then it's probably a device driver that has started acting up and held a lock for too long.

    A flaky hard drive can have the same effect. If the window manager itself takes a page fault, it has to wait for the hard drive to page in the data. and if the window manager happened to be holding a lock when this happened, that lock is held across the entire I/O operation. If your hard drive is flaky and, say, takes ten seconds to produce a sector of data instead of several milliseconds, then it will look like the system has frozen for ten seconds, since the window manager is stuck waiting on your disk, which is in turn grunting and recalibrating in a desperate attempt to produce the data the memory manager requested.

    In other words: If the cursor won't move, it's likely a driver or hardware problem. (Figuring out which driver/hardware will require hooking up a kernel debugger and poking around. Not for the faint of heart.)

  • The Old New Thing

    It takes only one program to foul an upgrade

    • 32 Comments

    "Worst software ever." That was Aaron Zupancic's cousin's reaction to the fact that Windows XP was incompatible with one program originally designed for Windows 98.

    Then again, commenter Aargh! says "The bad code should be fixed, period. If it can't be fixed, it breaks, too bad." Perhaps Aargh! can send a message to Aaron's cousin saying, "Too bad." I'm sure that'll placate Aaron's cousin.

Page 1 of 3 (28 items) 123