November, 2006

  • The Old New Thing

    I bet somebody got a really nice bonus for that feature

    • 132 Comments

    I often find myself saying, "I bet somebody got a really nice bonus for that feature."

    "That feature" is something aggressively user-hostile, like forcing a shortcut into the Quick Launch bar or the Favorites menu, like automatically turning on a taskbar toolbar, like adding an icon to the notification area that conveys no useful information but merely adds to the clutter, or (my favorite) like adding an extra item to the desktop context menu that takes several seconds to initialize and gives the user the ability to change some obscure feature of their video card.

    Allow me to summarize the guidance:

    The Quick Launch bar and Favorites menu belong to the user. There is intentionally no interface to manipulate shortcuts in the Quick Launch bar. We saw what happened to the Favorites menu and learned our lesson: Providing a programmatic interface to high-valued visual real estate results in widespread abuse. Of course, this doesn't stop people from hard-coding the path to the Quick Launch directory—too bad the name of the directory isn't always "Quick Launch"; the name can change based on what language the user is running. But that's okay, I mean, everybody speaks English, right?

    There is no programmatic interface to turn on a taskbar toolbar. Again, that's because the taskbar is a high-value piece of the screen and creating a programmatic interface can lead to no good. Either somebody is going to go in and force their toolbar on, or they're going to go in and force a rival's toolbar off. Since there's no programmatic interface to do this, these programs pull stunts like generating artificial user input to simulate the right-click on the taskbar, mousing to the "Toolbars" menu item, and then selecting the desired toolbar. The taskbar context menu will never change, right? Everybody speaks English, right?

    The rule for taskbar notifications is that they are there to, well, notify the user of something. Your print job is done. Your new hardware device is ready to use. A wireless network has come into range. You do not use a notification icon to say "Everything is just like it was a moment ago; nothing has changed." If nothing has changed, then say nothing.

    Many people use the notification area to provide quick access to a running program, which runs counter to the guidance above. If you want to provide access to a program, put a shortcut on the Start menu. Doesn't matter whether the program is running already or not. (If it's not running, the Start menu shortcut runs it. If it is already running, the Start menu shortcut runs the program, which recognizes that it's already running and merely activates the already-running copy.)

    While I'm here, I may as well remind you of the guidance for notification balloons: A notification balloon should only appear if there is something you want the user to do. It must be actionable.

    Balloon                                   | Action
    Your print job is complete.               | Go pick it up.
    Your new hardware device is ready to use. | Start using it.
    A wireless network has come into range.   | Connect to it.

    The really good balloons will tell the user what the expected action is. "A wireless network has come into range. Click here to connect to it." (Emphasis mine.)

    Here are some bad balloons:

    Bad Balloon                                                    | Action?
    Your screen settings have been restored.                       | So what do you want me to do about it?
    Your virtual memory swap file has been automatically adjusted. | If it's automatic, what do I need to do?
    Your clock has been adjusted for daylight saving time.         | Do you want me to change it back?
    Updates are ready for you to install.                          | So?

    One of my colleagues got a phone call from his mother asking him what she should do about a new error message that wouldn't go away. It was the "Updates are ready for you to install" balloon. The balloon didn't say what she should do next.

    The desktop context menu extensions are the worst, since the ones I've seen come from video card manufacturers that provide access to something you do maybe once when you set up the card and then don't touch thereafter. I mean, do normal users spend a significant portion of their day changing their screen resolution and color warmth? (Who on a laptop would even want to change their screen resolution?) What's worse is that one very popular such extension adds an annoying two second delay to the appearance of the desktop context menu, consuming 100% CPU during that time. If you have a laptop with a variable-speed fan, you can hear it going nuts for a few seconds each time you right-click the desktop. Always good to chew up battery life initializing a context menu that nobody on a laptop would use anyway.

    The thing is, all of these bad features were probably justified by some manager somewhere because it's the only way their feature would get noticed. They have to justify their salary by pushing all these stupid ideas in users' faces. "Hey, look at me! I'm so cool!" After all, when the boss asks, "So, what did you accomplish in the past six months," a manager can't say, "Um, a bunch of stuff you can't see. It just works better." They have to say, "Oh, check out this feature, and that icon, and this dialog box." Even if it's a stupid feature.

    As my colleague Michael Grier put it, "Not many people have gotten a raise and a promotion for stopping features from shipping."

  • The Old New Thing

    It's not surprising at all that people search for Yahoo

    • 99 Comments

    Earlier this year, one columnist was baffled as to why "Yahoo" was the most searched-for term on Google. I wasn't baffled at all. Back in 2001, Alexa published the top ten most searched-for terms on their service, and four of the top ten were URLs: yahoo.com, hotmail.com, aol.com, and ebay.com.

    A lot of people simply don't care to learn the difference between the search box and the address bar. "If I type what I want into this box here, I sometimes get a strange error message. But if I type it into that box there, then I get what I want. Therefore, I'll use that box there for everything." And you know what? It doesn't bother me that they don't care. In fact, I think it's good that they don't care. Computers should adapt to people, not the other way around.

    You can try to explain to these people, "You see, this is a URL, so you type it into the address box. But that is a search phrase, so you type it into the search box."

    "You-are-what? Look, I don't care about your fancy propeller-beanie acronyms. You computer types are always talking about how computers are so easy to use, and then you make up these arbitrary rules about where I'm supposed to type things. If I want something, I type into this box and click 'Search'. And it finds it. Watch. I want Yahoo, so I type 'yahoo' into the box, and boom, there it is. I have a system that works. Why are you trying to make my life more confusing?"

    I remember attending a presentation by the MSN Explorer team on what they learned about how people use a web browser. They found many situations where people failed to accomplish their desired task because they typed the right thing into the wrong box. But instead of trying to teach people which box to type it in, they just expanded the definition of "right". You typed your query into the wrong box? No problem, we'll just pretend you typed it into the correct box. In fact, let's just get rid of all these special-purpose boxes. Whatever you want, just type it into this box, and we'll get it for you.

    I wish the phone company would learn this. Sometimes I'll dial a telephone number and I'll get an automated recording that says, "I'm sorry. You must dial a '1' before the number. Please hang up and try again." Or "I'm sorry. You must not dial a '1' before the number. Please hang up and try again." That's because in the state of Washington, there are complicated rules about when you have to dial a "1" in front of the number and when you don't. (Fortunately, the rule on when you have to dial the area code is easier to remember: If the area code you are calling is the same as the area code you are dialing from, then you can omit the area code.) For example, suppose your home number is 425-882-xxxx. Here's how you have to dial the following numbers:

    To call this number | you dial
    425-202-xxxx        | 202-xxxx
    425-203-xxxx        | 1-203-xxxx
    206-346-xxxx        | 206-346-xxxx
    206-347-xxxx        | 1-206-347-xxxx

    If you get it wrong, the voice comes on the line to tell you. Hey, since you know what I did wrong and you know what I meant to do, why not just fix it? If I dial a number and forget the "1", just insert the 1 and connect the call. If I dial a number and include the "1" when I didn't need to, just delete the 1 and connect the call. Don't make me have to look up in the book whether I need a 1 or not. (In the front of the phone book are tables showing which numbers need a "1" and which don't. I hate those tables.)

    (Yes, I know there are weird technical/legal reasons for why I have to dial the phone in four different ways depending on whom I want to call. But it's still wrong that these technical/legal reasons mean that the rules for dialing a telephone are impossibly complicated.)

  • The Old New Thing

    How do I convert an HRESULT to a Win32 error code?

    • 32 Comments

    Everybody knows that you can use the HRESULT_FROM_WIN32 macro to convert a Win32 error code to an HRESULT, but how do you do the reverse?

    Let's look at the definition of HRESULT_FROM_WIN32:

    #define HRESULT_FROM_WIN32(x) \
      ((HRESULT)(x) <= 0 ? ((HRESULT)(x)) \
    : ((HRESULT) (((x) & 0x0000FFFF) | (FACILITY_WIN32 << 16) | 0x80000000)))
    

    If the value is less than or equal to zero, then the macro returns the value unchanged. Otherwise, it takes the lower sixteen bits and combines them with FACILITY_WIN32 and SEVERITY_ERROR.

    How do you reverse this process? How do you write the function WIN32_FROM_HRESULT?

    It's impossible to write that function since the mapping provided by the HRESULT_FROM_WIN32 function is not one-to-one. I leave it as an exercise to draw the set-to-set mapping diagram from DWORD to HRESULT. (Original diagram removed since people hate VML so much, and I can't use SVG since it requires XHTML.) If you do it correctly, you'll have a single line which maps 0 to S_OK, and a series of blocks that map blocks of 65536 error codes into the same HRESULT space.

    Notice that the values in the range 1 through 0x7FFFFFFF are impossible results from the HRESULT_FROM_WIN32 macro. Furthermore, values in the range 0x80070000 through 0x8007FFFF could have come from quite a few original Win32 codes; you can't pick just one.

    But let's try to write the reverse function anyway:

    BOOL WIN32_FROM_HRESULT(HRESULT hr, OUT DWORD *pdwWin32)
    {
     // MAKE_HRESULT takes three arguments: severity, facility, and code
     if ((hr & 0xFFFF0000) == MAKE_HRESULT(SEVERITY_ERROR, FACILITY_WIN32, 0)) {
      // Could have come from many values, but we choose this one
      *pdwWin32 = HRESULT_CODE(hr);
      return TRUE;
     }
     if (hr == S_OK) {
      *pdwWin32 = HRESULT_CODE(hr);
      return TRUE;
     }
     // otherwise, this HRESULT does not correspond to a Win32 error code
     return FALSE;
    }
    

    Of course, we could have been petulant and just written

    BOOL WIN32_FROM_HRESULT_alternate(HRESULT hr, OUT DWORD *pdwWin32)
    {
     // The macro passes values <= 0 through unchanged, so they are their own inverse
     if (hr <= 0) {
      *pdwWin32 = (DWORD)hr;
      return TRUE;
     }
     // otherwise, we got an impossible value
     return FALSE;
    }
    

    because the HRESULT_FROM_WIN32 macro is idempotent: HRESULT_FROM_WIN32(HRESULT_FROM_WIN32(x)) == HRESULT_FROM_WIN32(x). Therefore you would be technically correct if you declared that the "inverse" function was trivial. But in practice, people want to try to get "x" back out, so that's what we give you.
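    The properties discussed above (many-to-one, idempotent, and only partially reversible) can be checked with a short portable sketch. It does not use the Windows headers: HRESULT and DWORD are fixed-width stand-ins, the macro is rewritten as a function, and the names hresult_from_win32, win32_from_hresult and check_properties are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the Windows types so the sketch builds without <windows.h>. */
typedef int32_t  HRESULT;
typedef uint32_t DWORD;
#define FACILITY_WIN32 7

/* Same logic as the HRESULT_FROM_WIN32 macro, written as a function. */
HRESULT hresult_from_win32(DWORD x)
{
    if ((HRESULT)x <= 0) {
        return (HRESULT)x;  /* zero and "negative" values pass through */
    }
    return (HRESULT)((x & 0x0000FFFFu) | ((DWORD)FACILITY_WIN32 << 16) | 0x80000000u);
}

/* Mirrors the first reverse function: returns 1 and stores a code, or returns 0. */
int win32_from_hresult(HRESULT hr, DWORD *code)
{
    if (((DWORD)hr & 0xFFFF0000u) == 0x80070000u) {  /* SEVERITY_ERROR + FACILITY_WIN32 */
        *code = (DWORD)hr & 0xFFFFu;
        return 1;
    }
    if (hr == 0) {                                   /* S_OK */
        *code = 0;
        return 1;
    }
    return 0;
}

void check_properties(void)
{
    DWORD code = 0;

    /* ERROR_ACCESS_DENIED (5) becomes 0x80070005. */
    assert((DWORD)hresult_from_win32(5) == 0x80070005u);
    /* Many-to-one: 0x00010005 shares the low 16 bits, so it collides with 5. */
    assert(hresult_from_win32(0x00010005u) == hresult_from_win32(5));
    /* Idempotent: applying the conversion twice changes nothing. */
    assert(hresult_from_win32((DWORD)hresult_from_win32(5)) == hresult_from_win32(5));
    /* The reverse function hands back 5, not the 0x00010005 we started with. */
    assert(win32_from_hresult(hresult_from_win32(0x00010005u), &code) && code == 5);
    /* E_FAIL (0x80004005) is not FACILITY_WIN32, so no code is produced. */
    assert(!win32_from_hresult((HRESULT)0x80004005u, &code));
}
```

    Calling check_properties() runs through each claim; the fourth assertion is the one that shows why you can't get "x" back in general.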

    Now that you understand how the HRESULT_FROM_WIN32 macro works, you can answer this question, based on an actual customer question:

    Sometimes, when I import data from a scanner, I get the error "The directory cannot be removed." What does this mean?

    You will have to use some psychic powers, but I think you're up to it.

    One unfortunate aspect of both HRESULTs and Win32 error codes is that there is no single header file that contains all the errors. This is understandable from a logistical point of view: Multiple teams need to make up new error codes for their components, but the winerror.h file is maintained by the kernel team. If winerror.h were selected to be the master repository for all error codes, it means that any team that wanted to add a new error code or change an existing one would have to pester the kernel team to make the change for them. Things get even more complicated if those teams have their own SDK. For example, suppose both the DirectX and Windows Media teams wanted to include the new winerror.h in their corresponding SDKs. If you install the SDKs in the wrong order (and how are you supposed to know which should be installed first, DirectX 8 or WMSDK 6?), you can end up regressing your winerror.h file. It's the version conflict problem, but without the benefit of version resources.

    Many teams have prevailed upon the kernel team to reserve a chunk of error codes just for them.

    Networking       | 2100–2999
    Cluster          | 5000–5999
    Traffic Control  | 7500–7999
    Active Directory | 8000–8999
    DNS              | 9000–9999
    Winsock          | 10000–11999
    IPSec            | 13000–13999
    Side By Side     | 14000–14999

    There is room for only 65535 Win32 error codes, and over an eighth of them have already been carved out by these "block assignments". I wonder if we will eventually run out of error codes prematurely due to having given away error codes in too-large chunks. (Some sort of analogy with IPv4 could be made here but I'm not going to try.)
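    The "over an eighth" figure can be checked by adding up the block sizes. A minimal sketch follows; the struct and function names are made up for the illustration, and the ranges are copied from the list of block assignments.

```c
#include <assert.h>
#include <stddef.h>

/* One reserved range of Win32 error codes (inclusive on both ends). */
struct code_block {
    const char *owner;
    unsigned first;
    unsigned last;
};

static const struct code_block reserved_blocks[] = {
    { "Networking",        2100,  2999 },
    { "Cluster",           5000,  5999 },
    { "Traffic Control",   7500,  7999 },
    { "Active Directory",  8000,  8999 },
    { "DNS",               9000,  9999 },
    { "Winsock",          10000, 11999 },
    { "IPSec",            13000, 13999 },
    { "Side By Side",     14000, 14999 },
};

/* Total number of error codes covered by the block assignments. */
unsigned reserved_code_count(void)
{
    unsigned total = 0;
    size_t i;
    for (i = 0; i < sizeof reserved_blocks / sizeof reserved_blocks[0]; i++) {
        total += reserved_blocks[i].last - reserved_blocks[i].first + 1;
    }
    return total;
}

void check_over_an_eighth(void)
{
    /* 8400 codes are reserved; one eighth of 65535 is a little under 8192. */
    assert(reserved_code_count() == 8400u);
    assert(reserved_code_count() * 8u > 65535u);
}
```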

  • The Old New Thing

    On the importance of backwards compatibility for large corporations

    • 62 Comments

    Representatives from the IT department of a major worldwide corporation came to Redmond and took time out of their busy schedule to give a talk on how their operations are set up. I was phenomenally impressed. These people know their stuff. Definitely a world-class operation.

    One of the tidbits of information they shared with us is some numbers about the programs they have to support. Their operations division is responsible for 9,000 different install scripts for their employees around the world.

    That was not a typo.

    Nine thousand.

    This highlighted for me the fact that backwards compatibility is crucial for adoption in the corporate world. Do the math. Suppose they could install, test and debug ten programs each business day, in my opinion, a very optimistic estimate. Even at that rate, it would take them three years to get through all their scripts.

    This isn't a company that bought some software ten years ago and doesn't have the source code. They have the source code for all of their scripts. They have people who understand how the scripts work. They are not just on the ball; they are all over the ball. And even then, it would take them three years to go through and check (and possibly fix) each one.

    Oh, did I mention that four hundred of those programs are 16-bit?

  • The Old New Thing

    How do I test that return value of ShellExecute against 32?

    • 39 Comments

    We discussed earlier the history behind the return value of the ShellExecute function, and why its value in Win32 is meaningless aside from testing it against the value 32 to determine whether an error occurred.

    How, then, should you check for errors?

    Let's turn the question around. How would you, the implementor of the ShellExecute function, report success? ShellExecute is a very popular function, so you have to be prepared for the ways people check the return code incorrectly yet manage to work in spite of themselves. The goal, therefore, is to report success in a manner that breaks as few programs as possible.

    (Now, there may be those of you who say, "Hang compatibility. If programs checked the return value incorrectly, then they deserve to stop working!" If you choose to go in that direction, then be prepared for the deluge of compatibility bugs to be assigned to you to fix. And they're going to come from a grumpy compatibility testing team because they will have spent a long time just finding out that the problem was that the program was checking the return value of ShellExecute incorrectly.)

    Since there is still 16-bit code out there that may thunk up to 32-bit code, you probably don't want to return a value greater than 0xFFFF. Otherwise, when that value gets truncated to a 16-bit HINSTANCE, it will lose the high word. If you returned a value like 0x00010001, this would truncate to 0x0001, which would be treated as an error code.

    For similar reasons, the 64-bit implementation of the ShellExecute function had better not use the upper 32 bits of the return value. Code that casts the return value to int will lose the high 32 bits.

    Furthermore, you shouldn't return a value that, when cast to an integer, results in a negative number. Some people will use a signed comparison against 32; others will use an unsigned comparison. If you returned a value like -5, then the people who used a signed comparison would think the function failed, whereas those who used an unsigned comparison would think it succeeded.

    By the same logic, the value you choose as the return value should not result in a negative number when cast to a 16-bit integer. If the return value is passed to a 16-bit caller that casts the result to an integer and compares against 32, you want consistent results independent of whether the 16-bit caller used a signed or unsigned comparison.

    Edge conditions are tricky, so you don't want to return the value 32 exactly. If you look at code that checks the return value from ShellExecute, you'll probably find that the world is split as to whether 32 is an error code or not. So it'd be in your best interest not to return the value 32 exactly but rather a value larger than 32.

    So far, you're constrained to choosing a value in the range 33–32767.
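    The constraints collected so far can be written down as a predicate. This is only a model of the reasoning, not anything from the shell itself: the function name is_safe_success_value is made up, and the candidate value is treated as a plain 64-bit number that various callers truncate and compare against 32.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if the candidate survives every rule above: it fits in
 * 16 bits, and it is strictly greater than 32 whether the caller
 * compares it as a signed or unsigned 16-bit or 32-bit integer.
 */
int is_safe_success_value(uint64_t v)
{
    uint16_t as_u16 = (uint16_t)v;        /* 16-bit caller, unsigned compare */
    int16_t  as_s16 = (int16_t)as_u16;    /* 16-bit caller, signed compare   */
    uint32_t as_u32 = (uint32_t)v;        /* caller casts to unsigned int    */
    int32_t  as_s32 = (int32_t)as_u32;    /* caller casts to int             */

    return v == as_u16                    /* nothing lost by truncation      */
        && as_u16 > 32 && as_s16 > 32
        && as_u32 > 32 && as_s32 > 32;    /* 32 itself is avoided            */
}

void check_success_values(void)
{
    assert(is_safe_success_value(33));             /* smallest safe value      */
    assert(is_safe_success_value(32767));          /* largest safe value       */
    assert(!is_safe_success_value(32));            /* the disputed edge case   */
    assert(!is_safe_success_value(0x8000));        /* negative as 16-bit int   */
    assert(!is_safe_success_value(0x00010001));    /* truncates to 0x0001      */
    assert(!is_safe_success_value((uint64_t)-5));  /* signed/unsigned disagree */
}
```

    Under this model the values that pass are exactly 33 through 32767.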

    Finally, you might be a fan of Douglas Adams. (Most geeks are.) The all-important number 42 fits into this range. Your choice of return value, therefore, might be (HINSTANCE)42.

    Going back to the original question: How should I check the return value of ShellExecute for errors? MSDN says you can cast the result to an integer and compare the result against 32. That'll work fine. You could cast in the other direction, comparing the return value against (HINSTANCE)32. That'll work fine, too. Or you could cast the result to an INT_PTR and compare the result against 32. That's fine, too. They'll all work, because the implementor of the ShellExecute function had to plan ahead for you and all the other people who call the ShellExecute function.

  • The Old New Thing

    Converting an HRESULT to a Win32 error code: Diagram and answer to exercise

    • 60 Comments

    Here's the diagram from "How do I convert an HRESULT to a Win32 error code?" If you are offended by VML, cover your ears and hum for a while.

    [Diagram: Win32 error codes on the left mapped to HRESULT values on the right]

    The little sliver at the top is the mapping of zero to zero. The big white box at the bottom is the mapping of all negative numbers to corresponding negative numbers. And the rainbow represents the mapping of all the positive values, mod 65536, into the range 0x80070000 through 0x8007FFFF.

    Now let's take a look at that puzzle I left behind:

    Sometimes, when I import data from a scanner, I get the error "The directory cannot be removed." What does this mean?

    My psychic powers told me that the customer was doing something like this (error checking deleted):

    void ReportError(HWND hwnd, HRESULT hr)
    {
     DWORD dwError = HRESULT_CODE(hr);
     TCHAR szMessage[256];
     FormatMessage(FORMAT_MESSAGE_FROM_SYSTEM, NULL,
                   dwError, 0, szMessage, 256, NULL);
     MessageBox(hwnd, szMessage, TEXT("Error"), MB_OK);
    }
    

    and that the actual HRESULT was WIA_ERROR_COVER_OPEN, which is defined as

    #define WIA_ERROR_COVER_OPEN MAKE_HRESULT(SEVERITY_ERROR, FACILITY_WIA, 16)
    

    Passing this value to HRESULT_CODE would yield 16, which maps to

    //
    // MessageId: ERROR_CURRENT_DIRECTORY
    //
    // MessageText:
    //
    //  The directory cannot be removed.
    //
    #define ERROR_CURRENT_DIRECTORY          16L
    

    And that would explain why the customer reported this strange error when reading data from a scanner.
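    One way to avoid this trap is to check the facility before treating the low sixteen bits as a Win32 error code. The sketch below is portable C rather than real Windows code: the types and macros are redeclared as stand-ins, FACILITY_WIA is assumed to be 33 as in winerror.h, and try_get_win32_code is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the Windows definitions so the sketch builds anywhere. */
typedef int32_t  HRESULT;
typedef uint32_t DWORD;

#define FACILITY_WIN32 7
#define FACILITY_WIA   33   /* assumed value; see lead-in */
#define HRESULT_CODE(hr)     ((DWORD)(hr) & 0xFFFFu)
#define HRESULT_FACILITY(hr) (((DWORD)(hr) >> 16) & 0x1FFFu)

/* Severity bit, FACILITY_WIA, code 16. */
#define WIA_ERROR_COVER_OPEN \
    ((HRESULT)(0x80000000u | ((DWORD)FACILITY_WIA << 16) | 16u))

/*
 * Returns 1 and stores the Win32 error code only when hr is a failure
 * in FACILITY_WIN32. For any other HRESULT it returns 0 and leaves
 * *code alone, so the caller can fall back to a generic message.
 */
int try_get_win32_code(HRESULT hr, DWORD *code)
{
    if (hr < 0 && HRESULT_FACILITY(hr) == FACILITY_WIN32) {
        *code = HRESULT_CODE(hr);
        return 1;
    }
    return 0;
}

void check_facility_matters(void)
{
    DWORD code = 0;

    /* The trap: the low word alone looks like ERROR_CURRENT_DIRECTORY (16). */
    assert(HRESULT_CODE(WIA_ERROR_COVER_OPEN) == 16u);
    /* With the facility check, the scanner error is not mistaken for it. */
    assert(!try_get_win32_code(WIA_ERROR_COVER_OPEN, &code));
    /* A genuine wrapped Win32 error 16 still converts. */
    assert(try_get_win32_code((HRESULT)0x80070010u, &code) && code == 16u);
}
```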

  • The Old New Thing

    Placebo setting: QoS bandwidth reservation

    • 27 Comments

    A placebo setting that has been getting a lot of play in recent years is that of QoS bandwidth reservation. The setting in question sets a maximum amount of bandwidth that can be reserved for QoS. I guess one thing people forgot to notice is the word "maximum". It doesn't set the amount of reserved bandwidth, just the maximum.

    Changing the value will in most cases have no effect on your download speed, since the limit kicks in only if you have an application that uses QoS in the first place. QoS, which stands for "quality of service", is a priority scheme for network bandwidth. A program can request a certain amount of bandwidth, say for media streaming, and when the program accesses the network, up to that much bandwidth is guaranteed to be available to the program. The setting in question controls how much bandwidth can be claimed for high priority network access. If no program is using QoS, then all your bandwidth is available to non-QoS programs. What's more, even if there is a QoS reservation active, if the program that reserved the bandwidth isn't actually using it, then the bandwidth is available to non-QoS programs.

    Consider this analogy: A restaurant seats 100 people, and it has a policy of accepting reservations for at most twenty percent of those seats. This doesn't mean that twenty seats are sitting empty all the time. If ten people have made reservations for dinner at 8pm, then ninety seats are available for drop-in customers at that time. The twenty percent policy just means that once twenty people have made reservations for dinner at 8pm, the restaurant won't accept any more reservations.

    Here's an example with made-up numbers: Suppose you are downloading a large file over your 720kbps connection. Since there is nothing else using the network, your download proceeds at 720kbps. Now suppose you fire up a program that uses QoS, say, for streaming media. (I don't know whether Windows Media Player uses QoS.) You connect to a streaming media source, and the media player does some math and determines that in order to give you smooth playback, it needs a minimum of 100kbps. (If it gets more, then great, but it needs at least that much to avoid dropouts.) The program places a reservation of that amount through QoS. With a default maximum reservation of 20% = 144kbps, this reservation request is granted. Playback of the streaming media begins, and your bandwidth is now split, with 100kbps going to your media player and the remaining 620kbps going to your download.

    Now you hit pause on the media player to answer the phone. Even though the media player has a 100kbps reservation, it's not using it, so all 720kbps of bandwidth is devoted to your download. You get off the phone and unpause the media player. Bandwidth is once again divided 100kbps for the media player and 620kbps for the download.

    Now, sure, you can set your QoS maximum reservation to zero. This means that when the media player asks for a guarantee of 100kbps, QoS will tell it, "Sorry, no can do." The media player will still play the streaming media, but since it no longer has a guarantee of bandwidth, there may be stretches where the download consumes most of the network bandwidth and the streaming media gets only 50kbps. Result: dropped frames, stuttering, or pauses for buffering.
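    The made-up numbers above fit a very small toy model. To be clear, this is not how the QoS packet scheduler is implemented; the struct, the function name share_link and the all-or-nothing grant rule are simplifying assumptions that just reproduce the arithmetic of the example.

```c
#include <assert.h>

struct split {
    unsigned media_guaranteed_kbps;  /* bandwidth held for the QoS program */
    unsigned best_effort_kbps;       /* what is left for everything else   */
};

/*
 * Toy model: a reservation is granted only if it fits under the
 * configured maximum, and it consumes bandwidth only while the
 * program that made it is actually using the network.
 */
struct split share_link(unsigned link_kbps, unsigned max_reserve_pct,
                        unsigned requested_kbps, int media_playing)
{
    struct split s;
    unsigned cap = link_kbps * max_reserve_pct / 100;   /* 20% of 720 = 144 */
    unsigned granted = (requested_kbps <= cap) ? requested_kbps : 0;

    assert(max_reserve_pct <= 100);
    s.media_guaranteed_kbps = media_playing ? granted : 0;
    s.best_effort_kbps = link_kbps - s.media_guaranteed_kbps;
    return s;
}

void check_example_numbers(void)
{
    struct split playing = share_link(720, 20, 100, 1);
    struct split paused  = share_link(720, 20, 100, 0);
    struct split zeroed  = share_link(720, 0, 100, 1);

    /* Playing: 100 for the media player, 620 for the download. */
    assert(playing.media_guaranteed_kbps == 100 && playing.best_effort_kbps == 620);
    /* Paused: the unused reservation goes back to the download. */
    assert(paused.media_guaranteed_kbps == 0 && paused.best_effort_kbps == 720);
    /* Maximum set to zero: request refused, the player has no guarantee. */
    assert(zeroed.media_guaranteed_kbps == 0 && zeroed.best_effort_kbps == 720);
}
```

    In the last case the player still streams; it simply competes with the download for the 720kbps like any other program.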

    So tweak this value all you want, but understand what you're tweaking.

  • The Old New Thing

    It takes only one program to foul an upgrade

    • 32 Comments

    "Worst software ever." That was Aaron Zupancic's cousin's reaction to the fact that Windows XP was incompatible with one program originally designed for Windows 98.

    Then again, commenter Aargh! says "The bad code should be fixed, period. If it can't be fixed, it breaks, too bad." Perhaps Aargh! can send a message to Aaron's cousin saying, "Too bad." I'm sure that'll placate Aaron's cousin.

  • The Old New Thing

    Paradoxically, you should remove the smart card when logging on with a smart card

    • 23 Comments

    To connect to the Microsoft corporate network from home, employees need to use smartcard authentication. But, somewhat paradoxically, you do better if you remove the smart card.

    A colleague of mine tipped me off to this. To initiate the connection, you have to insert the smart card and provide the smart card password. Then the system connects to Microsoft and validates both the smart card and password. During this time, you can see the smart card access light blink on and off, and an "elapsed time" meter will start running.

    Once the elapsed time reaches five seconds, remove the smart card. The actual authentication happens in five seconds; the rest of the time is doing other validation, quarantining your system, confirming that you have all the necessary patches, that sort of thing. Some of those operations in turn require authentication, and if you leave your smart card in the reader, the system will try to authenticate with the smart card (slow) even though that isn't the authentication it needs.

    If you remove the card, then the system won't try to use the smart card, and the rest of the logon process will go much faster.

    This tip may not work for other people who use smart cards for authentication, but it works for me to connect to Microsoft. What used to take thirty seconds now takes just seven.

  • The Old New Thing

    The window manager moves the mouse; applications choose the cursor

    • 34 Comments

    You can sometimes narrow down the source of a problem just by looking at the screen and moving the mouse.

    When you move the mouse, the cursor on the screen moves to match. This work is done in the window manager in kernel mode. The mouse hardware notifies the window manager, "Hey, I moved left twenty units." The window manager takes this value, accelerates or decelerates it according to your mouse acceleration settings, calls any low-level mouse hooks that are installed, and then tells the display driver, "Move that sprite left about thirty pixels" (say). It then sets the "the mouse moved" flag so that the program that owns the window under the new mouse position will get a WM_MOUSEMOVE message. The window manager also sets the cursor to the "virtual cursor state" corresponding to the window beneath the cursor. The "virtual cursor state" remembers the cursor that the thread (or threads, if input has been attached) responsible for the window most recently set. Maintaining the virtual cursor state is important, for if a thread calls SetCursor to change the cursor to an hourglass and then stops processing messages (because it is busy), you really want the cursor to change back to an hourglass when it moves over the thread's windows.

    What does it mean if the cursor doesn't move at all when you move the mouse? Could it be caused by an application? If you read through the flowchart I described above, the only place applications get involved in the "move the mouse cursor" code flow is if they are filtering out the mouse motion in a low-level mouse hook. (Another way an application can "lock up" the mouse is by calling the ClipCursor function, but vanishingly few applications do this. I'm assuming you aren't the victim of malicious software but instead are trying to figure out what program, if any, is accidentally freezing the mouse.)

    Low-level mouse hooks are comparatively uncommon since they exact a high performance penalty on the system. If you're moving your mouse and don't see the cursor move around on the screen, my guess is that there is a problem in the kernel-mode side of the equation. If you're seeing the entire system freeze up, then it's probably a device driver that has started acting up and held a lock for too long.

    A flaky hard drive can have the same effect. If the window manager itself takes a page fault, it has to wait for the hard drive to page in the data, and if the window manager happened to be holding a lock when this happened, that lock is held across the entire I/O operation. If your hard drive is flaky and, say, takes ten seconds to produce a sector of data instead of several milliseconds, then it will look like the system has frozen for ten seconds, since the window manager is stuck waiting on your disk, which is in turn grunting and recalibrating in a desperate attempt to produce the data the memory manager requested.

    In other words: If the cursor won't move, it's likely a driver or hardware problem. (Figuring out which driver/hardware will require hooking up a kernel debugger and poking around. Not for the faint of heart.)
