November, 2006

  • The Old New Thing

    How do I test that return value of ShellExecute against 32?

    We discussed earlier the history behind the return value of the ShellExecute function, and why its value in Win32 is meaningless aside from testing it against the value 32 to determine whether an error occurred.

    How, then, should you check for errors?

    Let's turn the question around. How would you, the implementor of the ShellExecute function, report success? The ShellExecute function is very popular, so you have to be prepared for the ways people check the return code incorrectly yet manage to work in spite of themselves. The goal, therefore, is to report success in a manner that breaks as few programs as possible.

    (Now, there may be those of you who say, "Hang compatibility. If programs checked the return value incorrectly, then they deserve to stop working!" If you choose to go in that direction, then be prepared for the deluge of compatibility bugs to be assigned to you to fix. And they're going to come from a grumpy compatibility testing team because they will have spent a long time just finding out that the problem was that the program was checking the return value of ShellExecute incorrectly.)

    Since there is still 16-bit code out there that may thunk up to 32-bit code, you probably don't want to return a value greater than 0xFFFF. Otherwise, when that value gets truncated to a 16-bit HINSTANCE, it will lose the high word. If you returned a value like 0x00010001, this would truncate to 0x0001, which would be treated as an error code.

    For similar reasons, the 64-bit implementation of the ShellExecute function had better not use the upper 32 bits of the return value. Code that casts the return value to int will lose the high 32 bits.

    Furthermore, you shouldn't return a value that, when cast to an integer, results in a negative number. Some people will use a signed comparison against 32; others will use an unsigned comparison. If you returned a value like -5, then the people who used a signed comparison would think the function failed, whereas those who used an unsigned comparison would think it succeeded.

    By the same logic, the value you choose as the return value should not result in a negative number when cast to a 16-bit integer. If the return value is passed to a 16-bit caller that casts the result to an integer and compares against 32, you want consistent results independent of whether the 16-bit caller used a signed or unsigned comparison.

    Edge conditions are tricky, so you don't want to return the value 32 exactly. If you look at code that checks the return value from ShellExecute, you'll probably find that the world is split as to whether 32 is an error code or not. So it'd be in your best interest not to return the value 32 exactly but rather a value larger than 32.

    So far, you're constrained to choosing a value in the range 33–32767.
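
The constraints collected so far can be written down as a single predicate. What follows is only a sketch in portable C: `is_safe_success_value` is a hypothetical name introduced for illustration and is not part of any Windows header. It accepts a candidate only if the value survives truncation to 16 bits unchanged and reads as greater than 32 under both signed and unsigned comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only. */
bool is_safe_success_value(int64_t candidate)
{
    /* 32 or less: an error code, the ambiguous edge case 32,
       or a negative number that signed comparisons reject. */
    if (candidate <= 32) {
        return false;
    }
    /* A 16-bit caller sees only the low word, so reject any value
       that truncation would change. */
    if ((candidate & 0xFFFF) != candidate) {
        return false;
    }
    /* Bit 15 set means a signed 16-bit comparison sees a negative number. */
    if (candidate & 0x8000) {
        return false;
    }
    return true;
}
```

Under these assumptions the predicate holds for exactly the values 33 through 32767, so 42 qualifies and 0x00010001 does not.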

    Finally, you might be a fan of Douglas Adams. (Most geeks are.) The all-important number 42 fits into this range. Your choice of return value, therefore, might be (HINSTANCE)42.

    Going back to the original question: How should I check the return value of ShellExecute for errors? MSDN says you can cast the result to an integer and compare the result against 32. That'll work fine. You could cast in the other direction, comparing the return value against (HINSTANCE)32. That'll work fine, too. Or you could cast the result to an INT_PTR and compare the result against 32. That's fine, too. They'll all work, because the implementor of the ShellExecute function had to plan ahead for you and all the other people who call the ShellExecute function.
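
As a sketch of the three equivalent checks, here they are in portable C. The `handle_t` typedef stands in for HINSTANCE and the function names are hypothetical; real code would include the Windows headers and use HINSTANCE and INT_PTR directly. The third check goes through `uintptr_t` because standard C does not define ordering comparisons between unrelated pointers, whereas Windows code typically compares the handle values directly.

```c
#include <stdbool.h>
#include <stdint.h>

typedef void *handle_t;  /* stand-in for HINSTANCE (assumption for portability) */

/* Cast the result to an integer and compare against 32. */
bool ok_as_int(handle_t h)
{
    return (int)(intptr_t)h > 32;
}

/* Cast the result to a pointer-sized integer (INT_PTR on Windows). */
bool ok_as_int_ptr(handle_t h)
{
    return (intptr_t)h > 32;
}

/* Cast 32 in the other direction and compare handle values. */
bool ok_as_handle(handle_t h)
{
    handle_t threshold = (handle_t)(intptr_t)32;
    return (uintptr_t)h > (uintptr_t)threshold;
}
```

For any value in the safe range, and for the small error codes, all three functions agree.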

  • The Old New Thing

    Aspiring to the wrong office on election day

    Many years ago, the local public radio station invited the three candidates for Seattle Port Commissioner to take part in a program on the upcoming election. Each candidate was given some time to address the voters directly. Here's how it went, roughly.

    Candidate 1: "We need an experienced person to find solutions for our region's transportation problems while still protecting the Sound against environmental damage."

    Candidate 2: "We need to make the port more attractive to cruise ships by lowering fees. I assure you that the fact that I own a cruise ship company is purely coincidental."

    Candidate 3: "We need to shorten the work week to avoid layoffs. We must forgive all third-world debt. And the United States must pull its troops out of East Timor."

    Once again, this was an election for the position of Seattle Port Commissioner. (It reminds me of my graduate school days where the undergraduate student government passed a resolution condemning the government of El Salvador. Yeah, that'll show them. I can imagine a meeting down there in San Salvador: "Ooh, we'd better get our act together. Those college students in the United States are really upset with us.")

  • The Old New Thing

    There's going to be an awful lot more overclocking out there

    Last year, I told the story of overclocking being the source of a lot of mysterious crashes and that some of those overclocked machines were overclocked at the store. These machines came from small, independent shops rather than the major manufacturers.

    Well, it looks like that's about to change.

    Gateway's FX530 Desktop can be ordered overclocked by the manufacturer.

    Just say no to DIY overclocking and let us do it for you! We'll factory overclock your Intel® quad-core processor. Yep, you read that right: factory overclock, which is something that most other major PC manufacturers don't do.

    We live in interesting times.

  • The Old New Thing

    Tonya and Nancy: The Opera

    No, really, it's not a joke. There really is an opera written about Tonya Harding and Nancy Kerrigan. And Only a Game is there.

    The work was composed as a master's thesis at Tufts University by Abigail Al-Doory and, as is the nature of said productions, it was performed exactly twice and then packed away.

    Abigail Al-Doory: We've got our opening scene with the first press conference since the knee-whacking. Tonya and Nancy meet for the first time. Then we go into a flashback which shows the knee-whacking.

    Bill Littlefield: Knee-whacking as opera. This is irresistible.

    Abigail Al-Doory: It's incredible to me. I mean, I wrote it, and every time I watch it...

    Bill Littlefield: [interrupting, singing] "Oh ho ho ho! I whack your knee!" Something like that?

    Abigail Al-Doory: Not exactly.

    [mp3] [RealAudio] [Slideshow of dress rehearsal]

  • The Old New Thing

    On the importance of backwards compatibility for large corporations

    Representatives from the IT department of a major worldwide corporation came to Redmond and took time out of their busy schedule to give a talk on how their operations are set up. I was phenomenally impressed. These people know their stuff. Definitely a world-class operation.

    One of the tidbits of information they shared with us is some numbers about the programs they have to support. Their operations division is responsible for 9,000 different install scripts for their employees around the world.

    That was not a typo.

    Nine thousand.

    This highlighted for me the fact that backwards compatibility is crucial for adoption in the corporate world. Do the math. Suppose they could install, test, and debug ten programs each business day, which is, in my opinion, a very optimistic estimate. Even at that rate, it would take them three years to get through all their scripts.

    This isn't a company that bought some software ten years ago and doesn't have the source code. They have the source code for all of their scripts. They have people who understand how the scripts work. They are not just on the ball; they are all over the ball. And even then, it would take them three years to go through and check (and possibly fix) each one.

    Oh, did I mention that four hundred of those programs are 16-bit?

  • The Old New Thing

    How do I convert an HRESULT to a Win32 error code?

    Everybody knows that you can use the HRESULT_FROM_WIN32 macro to convert a Win32 error code to an HRESULT, but how do you do the reverse?

    Let's look at the definition of HRESULT_FROM_WIN32:

    #define HRESULT_FROM_WIN32(x) \
      ((HRESULT)(x) <= 0 ? ((HRESULT)(x)) \
    : ((HRESULT) (((x) & 0x0000FFFF) | (FACILITY_WIN32 << 16) | 0x80000000)))
    

    If the value is less than or equal to zero, then the macro returns the value unchanged. Otherwise, it takes the lower sixteen bits and combines them with FACILITY_WIN32 and SEVERITY_ERROR.

    How do you reverse this process? How do you write the function WIN32_FROM_HRESULT?

    It's impossible to write that function since the mapping provided by the HRESULT_FROM_WIN32 function is not one-to-one. I leave it as an exercise to draw the set-to-set mapping diagram from DWORD to HRESULT. (Original diagram removed since people hate VML so much, and I can't use SVG since it requires XHTML.) If you do it correctly, you'll have a single line which maps 0 to S_OK, and a series of blocks that map blocks of 65536 error codes into the same HRESULT space.

    Notice that the values in the range 1 through 0x7FFFFFFF are impossible results from the HRESULT_FROM_WIN32 macro. Furthermore, values in the range 0x80070000 through 0x8007FFFF could have come from quite a few original Win32 codes; you can't pick just one.
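
To make the many-to-one behavior concrete, here is a sketch that restates the macro's arithmetic in portable C. The `hresult_from_win32` function and the use of `uint32_t` bit patterns in place of HRESULT are assumptions made so that the example compiles without the Windows headers; under that model, the macro's "less than or equal to zero" test becomes "zero, or high bit set".

```c
#include <stdint.h>

#define FACILITY_WIN32_VALUE 7u  /* value of FACILITY_WIN32 in winerror.h */

/* Portable restatement of HRESULT_FROM_WIN32 on raw 32-bit patterns. */
uint32_t hresult_from_win32(uint32_t x)
{
    /* Equivalent of (HRESULT)(x) <= 0: zero, or severity bit already set. */
    if (x == 0 || (x & 0x80000000u) != 0) {
        return x;
    }
    /* Keep the low 16 bits, then add the facility and the error severity. */
    return (x & 0x0000FFFFu) | (FACILITY_WIN32_VALUE << 16) | 0x80000000u;
}
```

With this sketch, the codes 5 and 0x00010005 both produce 0x80070005, zero stays zero, and a value that already has the high bit set passes through unchanged.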

    But let's try to write the reverse function anyway:

    BOOL WIN32_FROM_HRESULT(HRESULT hr, OUT DWORD *pdwWin32)
    {
     if ((hr & 0xFFFF0000) == MAKE_HRESULT(SEVERITY_ERROR, FACILITY_WIN32, 0)) {
      // Could have come from many values, but we choose this one
      *pdwWin32 = HRESULT_CODE(hr);
      return TRUE;
     }
     if (hr == S_OK) {
      *pdwWin32 = HRESULT_CODE(hr);
      return TRUE;
     }
     // otherwise, we got an impossible value
     return FALSE;
    }
    

    Of course, we could have been petulant and just written

    BOOL WIN32_FROM_HRESULT_alternate(HRESULT hr, OUT DWORD *pdwWin32)
    {
     if (hr < 0) {
      *pdwWin32 = (DWORD)hr;
      return TRUE;
     }
     // otherwise, we got an impossible value
     return FALSE;
    }
    

    because the HRESULT_FROM_WIN32 macro is idempotent: HRESULT_FROM_WIN32(HRESULT_FROM_WIN32(x)) == HRESULT_FROM_WIN32(x). Therefore you would be technically correct if you declared that the "inverse" function was trivial. But in practice, people want to try to get "x" back out, so that's what we give you.
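
The idempotence claim, and the way the first WIN32_FROM_HRESULT loses information, can be checked with a small sketch. This is portable C under stated assumptions: HRESULT and DWORD are modeled as `uint32_t` bit patterns, and the lower-case function names are stand-ins for the macro and function above rather than real Windows APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for HRESULT_FROM_WIN32 on raw 32-bit patterns. */
uint32_t hresult_from_win32(uint32_t x)
{
    if (x == 0 || (x & 0x80000000u) != 0) {
        return x;  /* zero or already "negative": returned unchanged */
    }
    return (x & 0x0000FFFFu) | (7u << 16) | 0x80000000u;  /* 7 is FACILITY_WIN32 */
}

/* Stand-in for the first WIN32_FROM_HRESULT above. */
bool win32_from_hresult(uint32_t hr, uint32_t *win32)
{
    if ((hr & 0xFFFF0000u) == 0x80070000u) {
        *win32 = hr & 0x0000FFFFu;  /* one of many possible originals */
        return true;
    }
    if (hr == 0) {  /* S_OK */
        *win32 = 0;
        return true;
    }
    return false;  /* not something the forward mapping produces from a Win32 code */
}

/* True when applying the forward mapping twice equals applying it once. */
bool is_idempotent_at(uint32_t x)
{
    uint32_t once = hresult_from_win32(x);
    return hresult_from_win32(once) == once;
}
```

Applying `hresult_from_win32` twice gives the same result as applying it once, while a round trip through `win32_from_hresult` turns 0x00010005 into 5 rather than giving back the original value.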

    Now that you understand how the HRESULT_FROM_WIN32 macro works, you can answer this question, based on an actual customer question:

    Sometimes, when I import data from a scanner, I get the error "The directory cannot be removed." What does this mean?

    You will have to use some psychic powers, but I think you're up to it.

    One unfortunate aspect of both HRESULTs and Win32 error codes is that there is no single header file that contains all the errors. This is understandable from a logistical point of view: Multiple teams need to make up new error codes for their components, but the winerror.h file is maintained by the kernel team. If winerror.h were selected to be the master repository for all error codes, it means that any team that wanted to add a new error code or change an existing one would have to pester the kernel team to make the change for them. Things get even more complicated if those teams have their own SDK. For example, suppose both the DirectX and Windows Media teams wanted to include the new winerror.h in their corresponding SDKs. If you install the SDKs in the wrong order (and how are you supposed to know which should be installed first, DirectX 8 or WMSDK 6?), you can end up regressing your winerror.h file. It's the version conflict problem, but without the benefit of version resources.

    Many teams have prevailed upon the kernel team to reserve a chunk of error codes just for them.

    Component          Error codes
    Networking         2100–2999
    Cluster            5000–5999
    Traffic Control    7500–7999
    Active Directory   8000–8999
    DNS                9000–9999
    Winsock            10000–11999
    IPSec              13000–13999
    Side By Side       14000–14999

    There is room for only 65535 Win32 error codes, and over an eighth of them have already been carved out by these "block assignments". I wonder if we will eventually run out of error codes prematurely due to having given away error codes in too-large chunks. (Some sort of analogy with IPv4 could be made here but I'm not going to try.)
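
The "over an eighth" figure can be checked by adding up the sizes of the blocks listed above. Here is a sketch of that arithmetic in C; the `reserved_blocks` table simply restates the ranges from the list, and all of the names are hypothetical.

```c
#include <stddef.h>

struct code_block {
    unsigned first;
    unsigned last;
};

/* The block assignments listed above, as inclusive ranges. */
static const struct code_block reserved_blocks[] = {
    {  2100,  2999 },  /* Networking */
    {  5000,  5999 },  /* Cluster */
    {  7500,  7999 },  /* Traffic Control */
    {  8000,  8999 },  /* Active Directory */
    {  9000,  9999 },  /* DNS */
    { 10000, 11999 },  /* Winsock */
    { 13000, 13999 },  /* IPSec */
    { 14000, 14999 },  /* Side By Side */
};

/* Total number of error codes covered by the block assignments. */
unsigned reserved_code_count(void)
{
    unsigned total = 0;
    size_t count = sizeof reserved_blocks / sizeof reserved_blocks[0];
    for (size_t i = 0; i < count; i++) {
        total += reserved_blocks[i].last - reserved_blocks[i].first + 1;
    }
    return total;
}
```

The total comes to 8400 codes, and since 8400 times 8 is 67200, which exceeds 65535, the reserved blocks do cover slightly more than an eighth of the space.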

  • The Old New Thing

    Make sure you disable the correct window for modal UI

    Some time ago, I was asked to look at two independent problems with people trying to do modal UI manually. Well, actually, when the issues were presented to me, they weren't described in quite that way. They were more along the lines of, "Something strange is happening in our UI. Can you help?" Only in the discussion of the scenarios did it become apparent that it was improper management of modal UI that was the cause.

    We already saw one subtlety of managing modal UI manually, namely that you have to enable and disable the windows in the correct order. That wasn't the root of the problems I was looking at, but enabling and disabling windows did play a major role.

    When we took a look at the dialog loop, the first steps involved manipulating the hwndParent parameter to ensure that we enable and disable the correct window at the correct time.

     if (hwndParent == GetDesktopWindow())
      hwndParent = NULL;
     if (hwndParent)
      hwndParent = GetAncestor(hwndParent, GA_ROOT);
     HWND hdlg = CreateDialogIndirectParam(hinst,
                   lpTemplate, hwndParent, lpDlgProc,
                   lParam);
     BOOL fWasEnabled = EnableWindow(hwndParent, FALSE);
    

    In both cases, the first two "if" statements were missing. We already saw the danger of disabling the desktop window, which is what the first "if" statement protects against. But the specific problem with modal UI was being caused by the missing second "if" statement.

    Both of the problems boiled down to somebody passing a child window as the hwndParent and the code doing manual modal UI failing to convert this window to a top-level window. As a result, when they did the EnableWindow(hwndParent, FALSE), they disabled a child window, leaving the top-level window enabled.
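
The failure can be illustrated with a toy model. This is not Win32 code: `toy_window`, `toy_root`, and the two `begin_modal` functions are hypothetical stand-ins, with `toy_root` playing the role of GetAncestor(hwnd, GA_ROOT) and the `enabled` flag playing the role of the state changed by EnableWindow.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_window {
    struct toy_window *parent;  /* NULL for a top-level window */
    bool enabled;
};

/* Toy analogue of GetAncestor(hwnd, GA_ROOT): walk up the parent chain. */
struct toy_window *toy_root(struct toy_window *w)
{
    while (w != NULL && w->parent != NULL) {
        w = w->parent;
    }
    return w;
}

/* The mistake: disable whatever window the caller handed us. */
void begin_modal_buggy(struct toy_window *owner)
{
    if (owner != NULL) {
        owner->enabled = false;
    }
}

/* The fix: convert to the top-level window first, then disable that. */
void begin_modal_fixed(struct toy_window *owner)
{
    owner = toy_root(owner);
    if (owner != NULL) {
        owner->enabled = false;
    }
}
```

When a child window is passed in, the buggy version leaves the top-level window enabled and disables only the child, whereas the fixed version disables the top-level window and leaves the child's own flag alone.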

    The two problems had the same root cause but manifested themselves differently. The first problem led to strange behavior because the user could still interact with the top-level window since it was still enabled. Sure, a portion of the window was disabled (the portion controlled by the child window passed as hwndParent), but the caption buttons still worked, as did many of the other controls on the window.

    In the second case, disabling the wrong window created a different problem: When the modal UI was complete, the window manager activated the top-level window that was the owner of the modal window since that window was never disabled. This caused the top-level window to receive a WM_ACTIVATE message, which it handled by putting focus on the control that had focus when the top-level window was deactivated. Unfortunately, that window was the window that was passed as the hwndParent, which was disabled by mistake. The attempt to restore focus failed, and when the manual modal UI finally finished up and enabled the child window, it was too late. You wound up with focus nowhere and a dead keyboard. This second problem was reported as simply "SetFocus is not working." Only after peeling back a few layers (and application of some psychic powers) did the root cause emerge.

    Now, even though this was a subtle problem, you already knew all the pieces that went into it since I had covered them earlier. And as for those psychic powers that I used? It's really not that magic. In this case of psychic debugging, I worked backwards. In response to the report that SetFocus was not working, the next set of questions was to determine why. Is it a valid window handle? Does the window belong to your thread? Is it enabled?

    Aha, the window isn't enabled. That's when the customer also mentioned that they were doing this inside a WM_ACTIVATE handler. If you're gaining activation, who were you gaining it from? Oh, a modal dialog, you say? One that you're managing manually? Once I discovered that they were trying to manage modal UI manually, I suspected that they were disabling the wrong window, since that fit all the symptoms and it's something that people tend to get wrong.

    Most of what looks like psychic debugging is really just knowing what people tend to get wrong.

  • The Old New Thing

    I bet somebody got a really nice bonus for that feature

    I often find myself saying, "I bet somebody got a really nice bonus for that feature."

    "That feature" is something aggressively user-hostile, like forcing a shortcut into the Quick Launch bar or the Favorites menu, like automatically turning on a taskbar toolbar, like adding an icon to the notification area that conveys no useful information but merely adds to the clutter, or (my favorite) like adding an extra item to the desktop context menu that takes several seconds to initialize and gives the user the ability to change some obscure feature of their video card.

    Allow me to summarize the guidance:

    The Quick Launch bar and Favorites menu belong to the user. There is intentionally no interface to manipulate shortcuts in the Quick Launch bar. We saw what happened to the Favorites menu and learned our lesson: Providing a programmatic interface to high-valued visual real estate results in widespread abuse. Of course, this doesn't stop people from hard-coding the path to the Quick Launch directory—too bad the name of the directory isn't always "Quick Launch"; the name can change based on what language the user is running. But that's okay, I mean, everybody speaks English, right?

    There is no programmatic interface to turn on a taskbar toolbar. Again, that's because the taskbar is a high-value piece of the screen and creating a programmatic interface can lead to no good. Either somebody is going to go in and force their toolbar on, or they're going to go in and force a rival's toolbar off. Since there's no programmatic interface to do this, these programs pull stunts like generating artificial user input to simulate the right-click on the taskbar, mousing to the "Toolbars" menu item, and then selecting the desired toolbar. The taskbar context menu will never change, right? Everybody speaks English, right?

    The rule for taskbar notifications is that they are there to, well, notify the user of something. Your print job is done. Your new hardware device is ready to use. A wireless network has come into range. You do not use a notification icon to say "Everything is just like it was a moment ago; nothing has changed." If nothing has changed, then say nothing.

    Many people use the notification area to provide quick access to a running program, which runs counter to the guidance above. If you want to provide access to a program, put a shortcut on the Start menu. Doesn't matter whether the program is running already or not. (If it's not running, the Start menu shortcut runs it. If it is already running, the Start menu shortcut runs the program, which recognizes that it's already running and merely activates the already-running copy.)

    While I'm here, I may as well remind you of the guidance for notification balloons: A notification balloon should only appear if there is something you want the user to do. It must be actionable.

    Balloon                                      Action
    Your print job is complete.                  Go pick it up.
    Your new hardware device is ready to use.    Start using it.
    A wireless network has come into range.      Connect to it.

    The really good balloons will tell the user what the expected action is. "A wireless network has come into range. Click here to connect to it." (Emphasis mine.)

    Here are some bad balloons:

    Bad Balloon                                                            Action?
    Your screen settings have been restored.                               So what do you want me to do about it?
    Your virtual memory swap file has been automatically adjusted.         If it's automatic, what do I need to do?
    Your clock has been adjusted for daylight saving time.                 Do you want me to change it back?
    Updates are ready for you to install.                                  So?

    One of my colleagues got a phone call from his mother asking him what she should do about a new error message that wouldn't go away. It was the "Updates are ready for you to install" balloon. The balloon didn't say what she should do next.

    The desktop context menu extensions are the worst, since the ones I've seen come from video card manufacturers that provide access to something you do maybe once when you set up the card and then don't touch thereafter. I mean, do normal users spend a significant portion of their day changing their screen resolution and color warmth? (Who on a laptop would even want to change their screen resolution?) What's worse is that one very popular such extension adds an annoying two second delay to the appearance of the desktop context menu, consuming 100% CPU during that time. If you have a laptop with a variable-speed fan, you can hear it going nuts for a few seconds each time you right-click the desktop. Always good to chew up battery life initializing a context menu that nobody on a laptop would use anyway.

    The thing is, all of these bad features were probably justified by some manager somewhere because it's the only way their feature would get noticed. They have to justify their salary by pushing all these stupid ideas in the users' faces. "Hey, look at me! I'm so cool!" After all, when the boss asks, "So, what did you accomplish in the past six months," a manager can't say, "Um, a bunch of stuff you can't see. It just works better." They have to say, "Oh, check out this feature, and that icon, and this dialog box." Even if it's a stupid feature.

    As my colleague Michael Grier put it, "Not many people have gotten a raise and a promotion for stopping features from shipping."
