• The Old New Thing

    The importance of error code backwards compatibility

    I remember a bug report that came in on an old MS-DOS program (from a company that is still in business, so don't ask me to identify them) that attempted to open the file "". That's the file with no name.

    This returned error 2 (file not found). But the program didn't check the error code and thought that 2 was the file handle. It then began writing data to handle 2, which ended up going to the screen because handle 2 is the standard error handle, which by default goes to the screen.

    It so happened that this program wanted to print the message to the screen anyway.

    In other words, this program worked completely by accident.

    Due to various changes to the installable file system in Windows 95, the error code for attempting to open the null file changed from 2 (file not found) to 3 (path not found) as a side-effect.

    Watch what happens.

    The program tries to open the file "". Now it gets error 3 back. It mistakenly treats the 3 as a file handle and writes to it.

    What is handle 3?

    The standard MS-DOS file handles are as follows:

    handle  name    meaning
    0       stdin   standard input
    1       stdout  standard output
    2       stderr  standard error
    3       stdaux  standard auxiliary (serial port)
    4       stdprn  standard printer

    What happens when the program writes to handle 3?

    It tries to write to the serial port.

    Most computers don't have anything hooked up to the serial port. The write hangs.

    Result: Dead program.

    The file system folks had to tweak their parameter validation so they returned error 2 in this case.
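The failure mode in this story can be sketched as a small simulation. This is not MS-DOS code: `DosResult`, `simulatedOpen`, and the other names are hypothetical, and the sketch assumes the call reports failure through a carry flag with the error code in AX.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the result of an MS-DOS call:
// the carry flag plus the value left in the AX register.
struct DosResult {
    bool carry;         // set when the call failed
    unsigned short ax;  // error code on failure, file handle on success
};

// Simulated "open file" call. Only the empty-name case from the story is
// modeled; errorForEmptyName lets us try the old code (2) and the new one (3).
DosResult simulatedOpen(const std::string& name, unsigned short errorForEmptyName) {
    if (name.empty()) return {true, errorForEmptyName};
    return {false, 5};  // assumed: first handle after the five standard ones
}

// Names of the standard handles from the table above.
const char* handleName(unsigned short handle) {
    static const char* const names[] = {"stdin", "stdout", "stderr", "stdaux", "stdprn"};
    return handle < 5 ? names[handle] : "file";
}

// The buggy pattern: the carry flag is never examined, so the error code
// in AX is treated as if it were a handle.
std::string buggyWriteTarget(unsigned short errorForEmptyName) {
    DosResult r = simulatedOpen("", errorForEmptyName);
    return handleName(r.ax);
}

// The checked pattern: AX is used as a handle only when carry is clear.
std::string checkedWriteTarget(unsigned short errorForEmptyName) {
    DosResult r = simulatedOpen("", errorForEmptyName);
    if (r.carry) return "error";
    return handleName(r.ax);
}
```

With error 2 the buggy version "writes" to stderr and appears to work; with error 3 it writes to stdaux, which is the hang described above. The checked version reports an error in both cases.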

  • The Old New Thing

    How did MS-DOS report error codes?

    The old MS-DOS function calls (ah, int 21h) typically indicated an error by returning with carry set and putting the error code in the AX register. These error codes will look awfully familiar today: They are the same error codes that Windows uses. All the small-valued error codes like ERROR_FILE_NOT_FOUND go back to MS-DOS (and possibly even further back).

    Error code numbers are a major compatibility problem, because you cannot easily add new error code numbers without breaking existing programs. For example, it became well-known that "The only errors that can be returned from a failed call to OpenFile are 3 (path not found), 4 (too many open files), and 5 (access denied)." If MS-DOS ever returned an error code not on that list, programs would crash because they used the error number as an index into a function table without doing a range check first. Returning a new error like 32 (sharing violation) meant that the programs would jump to a random address and die.
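A rough sketch of the function-table pattern described above, with the range check that those programs left out. The handler names and the table layout are made up for illustration; only the three error numbers come from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

using Handler = std::string (*)();

std::string onPathNotFound() { return "path not found"; }
std::string onTooManyFiles() { return "too many open files"; }
std::string onAccessDenied() { return "access denied"; }
std::string onUnexpected()   { return "unexpected error"; }

// Table covering only the "well-known" codes 3, 4 and 5 (index = code - 3).
const Handler kHandlers[] = {onPathNotFound, onTooManyFiles, onAccessDenied};
const unsigned kFirstCode = 3;
const std::size_t kHandlerCount = sizeof(kHandlers) / sizeof(kHandlers[0]);

// Dispatch on the error number. The range check is the part the old
// programs skipped; without it, code 32 would index far past the table.
std::string dispatchError(unsigned code) {
    if (code < kFirstCode || code - kFirstCode >= kHandlerCount) {
        return onUnexpected();
    }
    return kHandlers[code - kFirstCode]();
}
```

Here `dispatchError(32)` falls through to the catch-all handler instead of jumping through whatever happens to sit in memory after the table.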

    More about error number compatibility next time.

    When it became necessary to add new error codes, compatibility demanded that the error codes returned by the functions not change. Therefore, if a new type of error occurred (for example, a sharing violation), one of the previous "well-known" error codes was selected that had the most similar meaning and that was returned as the error code. (For "sharing violation", the best match is probably "access denied".) Programs which were "in the know" could call a new function called "get extended error" which returned one of the newfangled error codes (in this case, 32 for sharing violation).

    The "get extended error" function returned other pieces of information. It gave you an "error class" which gave you a vague idea of what type of problem it is (out of resources? physical media failure? system configuration error?), an "error locus" which told you what type of device caused the problem (floppy? serial? memory?), and what I found to be the most interesting meta-information, the "suggested action". Suggested actions were things like "pause, then retry" (for temporary conditions), "ask user to re-enter input" (for example, file not found), or even "ask user for remedial action" (for example, check that the disk is properly inserted).

    The purpose of these meta-error values is to allow a program to recover when faced with an error code it doesn't understand. You could at least follow the meta-data to have an idea of what type of error it was (error class), where the error occurred (error locus), and what you probably should do in response to it (suggested action).
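As a sketch of how a program could lean on that meta-information, here is one possible shape for it. The enum members are just the examples named above, not the full MS-DOS lists, and `ExtendedError`, `legacyCodeFor`, and `respondTo` are hypothetical names rather than anything MS-DOS defined.

```cpp
#include <cassert>

// Illustrative subsets only: the members are the examples from the text.
enum class ErrorClass { OutOfResources, PhysicalMediaFailure, SystemConfigurationError };
enum class ErrorLocus { Floppy, Serial, Memory };
enum class SuggestedAction { PauseThenRetry, AskUserToReenterInput, AskUserForRemedialAction };

// Hypothetical bundle of what "get extended error" hands back.
struct ExtendedError {
    unsigned code;
    ErrorClass errorClass;
    ErrorLocus locus;
    SuggestedAction action;
};

const unsigned kAccessDenied = 5;
const unsigned kSharingViolation = 32;

// What the original call keeps returning so old programs don't break.
// Only the one mapping mentioned in the text is modeled here.
unsigned legacyCodeFor(unsigned extendedCode) {
    if (extendedCode == kSharingViolation) return kAccessDenied;
    return extendedCode;
}

enum class Response { WaitAndRetry, PromptForNewInput, PromptForFix };

// A program that doesn't recognize e.code can still pick a response
// from the suggested action alone.
Response respondTo(const ExtendedError& e) {
    switch (e.action) {
        case SuggestedAction::PauseThenRetry:           return Response::WaitAndRetry;
        case SuggestedAction::AskUserToReenterInput:    return Response::PromptForNewInput;
        case SuggestedAction::AskUserForRemedialAction: return Response::PromptForFix;
    }
    return Response::PromptForFix;  // conservative default for out-of-range values
}
```

The point of the design is that `respondTo` never looks at the error number, so an error code invented later still gets a reasonable reaction.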

    Sadly, this type of rich error information was lost when 16-bit programming was abandoned. Now you get an error code or an exception and you'd better know what to do with it. For example, if you call some function and an error comes back, how do you know whether the error was a logic error in your program (using a handle after closing it, say) or was something that is externally-induced (for example, remote server timed out)? You don't.

    This is particularly gruesome for exception-based programming. When you catch an exception, you can't tell by looking at it whether it's something that genuinely should crash the program (due to an internal logic error - a null reference exception, for example) or something that does not betray any error in your program but was caused externally (connection failed, file not found, sharing violation).

  • The Old New Thing

    Cleaner, more elegant, and harder to recognize

    It appears that some people interpreted the title of one of my rants from many months ago, "Cleaner, more elegant, and wrong", to be a reference to exceptions in general. (See bibliography reference [35]; observe that the citer even changed the title of my article for me!)

    The title of the article was a reference to a specific code snippet that I copied from a book, where the book's author claimed that the code he presented was "cleaner and more elegant". I was pointing out that the code fragment was not only cleaner and more elegant, it was also wrong.

    You can write correct exception-based programming.

    Mind you, it's hard.

    On the other hand, just because something is hard doesn't mean that it shouldn't be done.

    Here's a breakdown:

    Really easy:
      • Writing bad error-code-based code
      • Writing bad exception-based code
    Hard:
      • Writing good error-code-based code
    Really hard:
      • Writing good exception-based code

    It's easy to write bad code, regardless of the error model.

    It's hard to write good error-code-based code since you have to check every error code and think about what you should do when an error occurs.

    It's really hard to write good exception-based code since you have to check every single line of code (indeed, every sub-expression) and think about what exceptions it might raise and how your code will react to it. (In C++ it's not quite so bad because C++ exceptions are raised only at specific points during execution. In C#, exceptions can be raised at any time.)

    But that's okay. Like I said, just because something is hard doesn't mean it shouldn't be done. It's hard to write a device driver, but people do it, and that's a good thing.

    But here's another table:

    Really easy:
      • Recognizing that error-code-based code is badly-written
      • Recognizing the difference between bad error-code-based code and not-bad error-code-based code
    Hard:
      • Recognizing that error-code-based code is not badly-written
    Really hard:
      • Recognizing that exception-based code is badly-written
      • Recognizing that exception-based code is not badly-written
      • Recognizing the difference between bad exception-based code and not-bad exception-based code

    Here's some imaginary error-code-based code. See if you can classify it as "bad" or "not-bad":

    BOOL ComputeChecksum(LPCTSTR pszFile, DWORD* pdwResult)
    {
      HANDLE h = CreateFile(pszFile, GENERIC_READ, FILE_SHARE_READ,
           NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
      HANDLE hfm = CreateFileMapping(h, NULL, PAGE_READONLY, 0, 0, NULL);
      void *pv = MapViewOfFile(hfm, FILE_MAP_READ, 0, 0, 0);
      DWORD dwHeaderSum;
      CheckSumMappedFile(pv, GetFileSize(h, NULL),
               &dwHeaderSum, pdwResult);
      UnmapViewOfFile(pv);
      CloseHandle(hfm);
      CloseHandle(h);
      return TRUE;
    }
    

    This code is obviously bad. No error codes are checked. This is the sort of code you might write when in a hurry, meaning to come back to and improve later. And it's easy to spot that this code needs to be improved big time before it's ready for prime time.

    Here's another version:

    BOOL ComputeChecksum(LPCTSTR pszFile, DWORD* pdwResult)
    {
      BOOL fRc = FALSE;
      HANDLE h = CreateFile(pszFile, GENERIC_READ, FILE_SHARE_READ,
           NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
      if (h != INVALID_HANDLE_VALUE) {
        HANDLE hfm = CreateFileMapping(h, NULL, PAGE_READONLY, 0, 0, NULL);
        if (hfm) {
          void *pv = MapViewOfFile(hfm, FILE_MAP_READ, 0, 0, 0);
          if (pv) {
            DWORD dwHeaderSum;
            if (CheckSumMappedFile(pv, GetFileSize(h, NULL),
                                   &dwHeaderSum, pdwResult)) {
              fRc = TRUE;
            }
            UnmapViewOfFile(pv);
          }
          CloseHandle(hfm);
        }
        CloseHandle(h);
      }
      return fRc;
    }
    

    This code is still wrong, but it clearly looks like it's trying to be right. It is what I call "not-bad".

    Now here's some exception-based code you might write in a hurry:

    NotifyIcon CreateNotifyIcon()
    {
     NotifyIcon icon = new NotifyIcon();
     icon.Text = "Blah blah blah";
     icon.Visible = true;
     icon.Icon = new Icon(GetType(), "cool.ico");
     return icon;
    }
    

    (This is actual code from a real program in an article about taskbar notification icons, with minor changes in a futile attempt to disguise the source.)

    Here's what it might look like after you fix it to be correct in the face of exceptions:

    NotifyIcon CreateNotifyIcon()
    {
     NotifyIcon icon = new NotifyIcon();
     icon.Text = "Blah blah blah";
     icon.Icon = new Icon(GetType(), "cool.ico");
     icon.Visible = true;
     return icon;
    }
    

    Subtle, isn't it.
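One way to read the difference between the two versions: the step that can throw (loading the image) should happen before the step that has an externally visible effect (showing the icon). The C++ simulation below illustrates that reading; `FakeNotifyIcon`, `loadImage`, and the global counter are invented stand-ins, not the real NotifyIcon class.

```cpp
#include <cassert>
#include <stdexcept>

int g_visibleIcons = 0;  // stand-in for icons currently shown to the user

struct FakeNotifyIcon {
    bool visible = false;
    bool hasImage = false;
    void show() { visible = true; ++g_visibleIcons; }
};

// Simulated image load that may fail, like constructing an icon from a resource.
void loadImage(FakeNotifyIcon& icon, bool fail) {
    if (fail) throw std::runtime_error("image resource missing");
    icon.hasImage = true;
}

// Order of the hurried version: show first, then do the step that can throw.
FakeNotifyIcon createShowFirst(bool fail) {
    FakeNotifyIcon icon;
    icon.show();
    loadImage(icon, fail);
    return icon;
}

// Order of the fixed version: do everything that can throw, then show.
FakeNotifyIcon createShowLast(bool fail) {
    FakeNotifyIcon icon;
    loadImage(icon, fail);
    icon.show();
    return icon;
}

// Returns how many icons are left visible after a creation attempt that fails.
int visibleAfterFailedCreate(FakeNotifyIcon (*create)(bool)) {
    g_visibleIcons = 0;
    try {
        create(true);
    } catch (const std::runtime_error&) {
        // The caller never received an icon object, so it has nothing to clean up.
    }
    return g_visibleIcons;
}
```

In this simulation the show-first order leaves one orphaned visible icon behind after the failure, while the show-last order leaves none. The two functions differ by a single swapped line, which is the whole problem.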

    It's easy to spot the difference between bad error-code-based code and not-bad error-code-based code: The not-bad error-code-based code checks error codes. The bad error-code-based code never does. Admittedly, it's hard to tell whether the errors were handled correctly, but at least you can tell the difference between bad code and code that isn't bad. (It might not be good, but at least it isn't bad.)

    On the other hand, it is extraordinarily difficult to see the difference between bad exception-based code and not-bad exception-based code.

    Consequently, when I write code that is exception-based, I do not have the luxury of writing bad code first and then making it not-bad later. If I did that, I wouldn't be able to find the bad code again, since it looks almost identical to not-bad code.

    My point isn't that exceptions are bad. My point is that exceptions are too hard and I'm not smart enough to handle them. (And neither, it seems, are book authors, even when they are trying to teach you how to program with exceptions!)

    (Yes, there are programming models like RAII and transactions, but rarely do you see sample code that uses either.)
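For readers who haven't seen it, a minimal RAII sketch in C++ follows. It uses a simulated resource rather than real file handles so that it stays portable; `ScopedHandle` and `computeWithThreeResources` are hypothetical names, loosely mirroring the file, mapping, and view from the checksum example.

```cpp
#include <cassert>
#include <stdexcept>

int g_openHandles = 0;  // simulated count of resources currently held

// Acquires the simulated resource in the constructor and releases it in the
// destructor, so release happens on every exit path, including exceptions.
class ScopedHandle {
public:
    ScopedHandle() { ++g_openHandles; }
    ~ScopedHandle() { --g_openHandles; }
    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;
};

// Acquires three resources in sequence, optionally failing after the second.
// Returns how many were held at the point where the real work would happen.
int computeWithThreeResources(bool failInTheMiddle) {
    ScopedHandle file;
    ScopedHandle mapping;
    if (failInTheMiddle) throw std::runtime_error("simulated failure");
    ScopedHandle view;
    return g_openHandles;
}

// Runs the failing path and reports whether everything was released.
bool leakFreeAfterFailure() {
    try {
        computeWithThreeResources(true);
    } catch (const std::runtime_error&) {
        // Stack unwinding has already run the destructors by the time we get here.
    }
    return g_openHandles == 0;
}
```

There is no cleanup code at the point of failure; the destructors do it. That removes the nested-if pyramid, though it does nothing about ordering mistakes like the one in the notification icon example.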

  • The Old New Thing

    User interface design for interior door locks

    How hard can it be to design the user interface of an interior door lock?

    Locking or unlocking the door from the inside is typically done with a latch that you turn. Often, the latch handle is in the shape of a bar that turns.

    Now, there are two possible ways you can set up your lock. One is that a horizontal bar represents the locked position and a vertical bar represents the unlocked position. The other is to have a horizontal bar represent the unlocked position and a vertical bar represent the locked position.

    For some reason, it seems that most lock designers went for the latter interpretation. A horizontal bar means unlocked.

    This is wrong.

    Think about what the bar represents. When the deadbolt is locked, a horizontal bar extends from the door into the door jamb. Clearly, the horizontal bar position should recapitulate the horizontal position of the deadbolt. It also resonates with the old-fashioned way of locking a door by placing a wooden or metal bar horizontally across the face. (Does no one say "bar the door" any more?)

    Car doors even followed this convention, back when car door locks were little knobs that popped up and down. The up position represented the removal of the imaginary deadbolt from the door/jamb interface. Pushing the button down was conceptually the same as sliding the deadbolt into the locked position.

    But now, many car door locks don't use knobs. Instead, they use rocker switches. (Forwards means lock. Or is it backwards? What is the intuition there?) The visual indicator of the door lock is a red dot. But what does it mean? Red clearly means "danger", so is it more dangerous to have a locked door or an unlocked door? I can never remember; I always have to tug on the door handle.

    (Horizontally-mounted power window switches have the same problem. Does pushing the switch forwards raise the window or lower it?)

  • The Old New Thing

    User interface design for vending machines - answer to puzzle

    Last time, we ended a discussion of vending machine design with a short puzzle: What problems do you see with numbering the products from 1 to 99?

    I'm not saying that these are the only possible answers, but they are ones that came to mind when I thought about it.

    • Product codes less than 10 would be ambiguous. Is a "3" a request for product number 3, or is the user just being slow at entering "32"? Solving this by adding a leading zero will not work because people are in the habit of ignoring leading zeros.
    • Product codes should not coincide with product prices. If there is a bag of cookies that costs 75 cents, users are likely to type "75" when they want the cookies, even though the product code for the cookies is 23.

    Ilya Birman was the first to point out the "bounce-effect" problem, thereby ruling out product codes like 11, 22, and 33.
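The three objections can be turned into a small checker for a proposed numbering scheme. This is only a sketch: `CodeProblems` and `checkCode` are made-up names, prices are assumed to be whole cents, and the repeated-digit test assumes the "bounce-effect" means one key press registering twice.

```cpp
#include <cassert>
#include <set>

// Which of the problems discussed above apply to a given product code.
struct CodeProblems {
    bool ambiguousSingleDigit;  // "3" could be the start of "32"
    bool matchesAPrice;         // users may type the price instead of the code
    bool repeatedDigit;         // 11, 22, 33, ... (assumed key-bounce problem)
};

// Checks one product code in the range 1..99 against the set of prices in cents.
CodeProblems checkCode(int code, const std::set<int>& pricesInCents) {
    CodeProblems problems{};
    problems.ambiguousSingleDigit = code < 10;
    problems.matchesAPrice = pricesInCents.count(code) > 0;
    problems.repeatedDigit = code >= 10 && code <= 99 && (code / 10 == code % 10);
    return problems;
}
```

For the cookie example, code 23 passes all three checks as long as nothing in the machine costs 23 cents, while a product numbered 75 would collide with the 75-cent price.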

  • The Old New Thing

    User interface design for vending machines

    How hard can it be to design the user interface of a vending machine?

    You accept money, you have some buttons, the user pushes the button, they get their product and their change.

    At least in the United States, many vending machines arrange their product in rows and columns (close-up view). To select a product, you type the letter of the row and the number of the column. Could it be any simpler?

    Take a closer look at that vending machine design. Do you see the flaw?

    (Ignore the fact that the picture is a mock-up and repeats row C over and over again.)

    The columns are labelled 1 through 10. That means that if you want to buy product C10, you have to push the buttons "C" and "10". But in our modern keyboard-based world, there is no "10" key. Instead, people type "1" followed by "0".

    What happens if you type "C"+"1"+"0"? After you type the "1", product C1 drops. Then you realize that there is no "0" key. And you bought the wrong product.

    This is not a purely theoretical problem. I have seen this happen myself.

    How would you fix this?

    One solution is simply not to put so many items on a single row, considering that people have difficulty making decisions if given too many options. On the other hand, the vendor might not like that design, since their goal is to maximize the number of products.

    Another solution is to change the labels so that the number of button presses needed always matches the number of characters in the label. In other words, no buttons with two characters on them (like the "10" button).

    Another is to switch the rows and columns, so that the products are labelled "1A" through "1J" across the top row, and "9A" through "9J" across the bottom. This assumes you don't have more than nine rows. (This won't work for super size vending machines - look at the buttons on that thing; they go up to "U"!)

    You can see another solution in that most recent vending machine: Instead of calling the tenth column "10", call it "0". Notice that they also removed rows "I" and "O" to avoid possible confusion with "1" and "0".
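The keypad behavior behind the C10 mistake can be modeled as a tiny state machine. This is a guess at the logic, not any real vending machine's firmware: `vend` is a hypothetical function that drops a product as soon as it has seen a row letter followed by one more key, and a "0" typed with no row selected is simply ignored.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Feeds a sequence of key presses to the simulated machine and returns the
// products it dropped. A selection completes after a letter plus one digit.
std::vector<std::string> vend(const std::string& keys) {
    std::vector<std::string> vended;
    char row = '\0';  // row letter waiting for its column, if any
    for (char key : keys) {
        if (key >= 'A' && key <= 'Z') {
            row = key;
        } else if (key >= '0' && key <= '9' && row != '\0') {
            vended.push_back(std::string(1, row) + key);
            row = '\0';  // selection complete; later digits have no row
        }
    }
    return vended;
}
```

Typing "C", "1", "0" drops C1 and nothing else, which is the wrong product. Once the tenth column is labelled "0" instead of "10", the sequence "C", "0" is a complete selection and every label is exactly two key presses long.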

    A colleague of mine pointed out that some vending machines use numeric codes for all items rather than a letter and a digit. For example, if the cookies are product number 23, you punch "2" "3". If you want the chewing gum (product code 71), you punch "7" "1". He poses the following question:

    What are some problems with having your products numbered from 1 to 99?

    Answers next time.

  • The Old New Thing

    Marriage as a cross-branding opportunity

    Jennifer Aniston and Brad Pitt have decided to besmirch the institution of marriage by deciding that the "until death do us part" thing was neither legally nor morally binding.

    "Brad said that they spent the rest of the holiday working out how they would release the news of the split. They worked out together the reasons they would give and how they would protect the brand they have built up."

    Oh, never mind. This wasn't a marriage. It was a joint branding agreement!

    My favorite Brad/Jennifer memory was when cardiologist Robert Atkins, famous for his eponymous low-carb diet, suffered a fatal head injury, and the BBC news report was illustrated with a picture of... Brad Pitt and Jennifer Aniston.

  • The Old New Thing

    Why doesn't \\ autocomplete to all the computers on the network?

    Wes Haggard wishes that \\ would autocomplete to all the computers on the network. [Link fixed 10am.] An early beta of Windows 95 actually did something similar to this, showing all the computers on the network when you opened the Network Neighborhood folder. And the feature was quickly killed.

    Why?

    Corporations with large networks were having conniptions because needlessly enumerating all the machines on the network can bring a large network to its knees. Think about all the times you type "\\". Now imagine if every single time you did that, Explorer started enumerating all the machines on the network. And imagine how your network administrator would feel if their network traffic saturated with enumerations each time you did that.

    Network administrators made it clear in no uncertain terms that having Windows casually enumerate all the machines on their LAN was totally unacceptable.

    The needs of the corporate environment are very different from those of the home network, and Windows needs to operate in both worlds.

  • The Old New Thing

    Seattle Snowstorm 2005 (insert swooshy sound effect)

    As others have reported, it snowed here in the Seattle area yesterday.

    One whopping inch.

    You'd think that Seattle, which gets snow a few times a year, wouldn't go completely apoplectic the moment the flakes start falling from the sky. Especially since it all melts away in a few hours anyway.

    I didn't watch the local news last night, but I suspect it went something like this:

    Cold Open (3 minutes): LIVE REPORT FROM SEATTLE WHERE YES I THINK I CAN SEE SOME SNOW THAT HASN'T YET MELTED BUT IT COULD JUST BE FOAM FROM A SPILT LATTE.

    Custom "SNOWSTORM 2005" graphic title sequence with extra special focus on the weatherman (45 seconds)

    Lead story (5 minutes): SNOWSTORM 2005 HAS STRUCK THE NORTHWEST WITH A VENGEANCE. WE WILL GO TO A LIVE REPORT FROM <Reporter X> IN BALLARD.

    Reporter X: "Snow is white fluffy stuff that falls from the sky. It happens rarely here in the Northwest, but some parts of the country get it a lot. And here's some science trivia you can try at your next cocktail party: Snow is actually just frozen water!"

    Reporter X and anchors do some Q&A like, "Is it safe to eat snow?" and "This snowfall has been predicted for days; do you think anxiety over the coming storm affected the Seahawks' performance on Saturday?"

    Throughout, a crawl at the bottom of the screen provides critical information like "Wear sturdy boots when walking in snow."

    "STAY TUNED, COVERAGE OF SNOWSTORM 2005 WILL CONTINUE AFTER A SHORT BREAK."

    Commercial (2 minutes)

    Second story (5 minutes): COVERAGE OF SNOWSTORM 2005 CONTINUES WITH A LIVE REPORT FROM <Reporter Y> IN ISSAQUAH.

    Reporter Y: "Yes, there were reports of snow being spotted as far away as Issaquah. I'm standing here with Gladys Wilkins who claims she saw some snow as early as 7:30 today."

    Interview with Ms. Wilkins.

    Anchors banter with Reporter Y saying things like, "Gosh, I hope you've got your shovels ready."

    Commercial (2 minutes)

    Third story (5 minutes) Continuing coverage of Snowstorm 2005 with footage from the last snowstorm, driving tips ("be careful when driving, because snow is slippery"), cold weather tips ("try wearing a sweater to stay warm").

    Commercial (2 minutes)

    Weather report (4 minutes): Weatherman talks about snow and shows you so many Doppler radar graphs it'll make your head spin. He doesn't actually say anything useful, but it's fun to hear phrases like "the Fraser Valley effect" and "a weakening area of low pressure".

    Commercial (2 minutes)

    In Other News (10 seconds): Police chief's gun stolen, how to get great deals at post-holiday sales, and bunny rabbits.

    Close titles (5 seconds).

  • The Old New Thing

    Taskbar notification balloon tips don't penalize you for being away from the keyboard

    The Shell_NotifyIcon function is used to do various things, among them, displaying a balloon tip to the user. As discussed in the documentation for the NOTIFYICONDATA structure, the uTimeout member specifies how long the balloon should be displayed.

    But what if the user is not at the computer when you display your balloon? After 30 seconds, the balloon will time out, and the user will have missed your important message!

    Never fear. The taskbar keeps track of whether the user is using the computer (with the help of the GetLastInputInfo function) and doesn't "run the clock" if it appears that the user isn't there. You will get your 30 seconds of "face time" with the user.
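The "don't run the clock while the user is away" idea can be sketched as a counter that only advances on ticks where the user appears to be present. This is a simulation of the idea, not the taskbar's actual logic: `timeoutTick` is a hypothetical function, and the per-second presence flags stand in for whatever the taskbar derives from GetLastInputInfo.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each element says whether the user appeared to be present during that
// one-second tick. Returns the index of the tick at which the balloon has
// accumulated timeoutSeconds of "face time", or -1 if it never does.
int timeoutTick(const std::vector<bool>& userPresentPerSecond, int timeoutSeconds) {
    int secondsShownToUser = 0;
    for (std::size_t tick = 0; tick < userPresentPerSecond.size(); ++tick) {
        if (userPresentPerSecond[tick]) {
            ++secondsShownToUser;  // the clock runs only while the user is there
        }
        if (secondsShownToUser >= timeoutSeconds) {
            return static_cast<int>(tick);
        }
    }
    return -1;
}
```

With a three-second timeout and a user who is away for the first two seconds, the balloon expires on the fifth tick rather than the third; if the user never shows up, it never expires.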

    But what if you want to time out your message even if the user isn't there?

    You actually have the information available to you to solve this puzzle on the web pages I linked above. See if you can put the pieces together and come up with a better solution than simulating a click on the balloon. (Hint: Look carefully at what it means if you set your balloon text to an empty string.)

    And what if you want your message to stay on the screen longer than 30 seconds?

    You can't. The notification area enforces a 30-second limit for any single balloon. Because if the user hasn't done anything about it for 30 seconds, they probably aren't interested. If your message is so critical that the user shouldn't be allowed to ignore it, then don't use a notification balloon. Notification balloons are for non-critical transient messages to the user.
