April, 2004

  • The Old New Thing

    News flash: People are fooled by the Onion


    Fact-checking? What's fact-checking? I found it on the Internet!

    Wired News has a story on various people and news agencies being fooled by Onion articles. This used to be news, but now it's so common it may end up relegated to just a counter. "Number of people fooled by Onion articles: n+1".

  • The Old New Thing

    Cleaner, more elegant, and wrong


    Just because you can't see the error path doesn't mean it doesn't exist.

    Here's a snippet from a book on C# programming, taken from the chapter on how great exceptions are.

    try {
      AccessDatabase accessDb = new AccessDatabase();
      accessDb.GenerateDatabase();
    } catch (Exception e) {
      // Inspect caught exception
    }
    public void GenerateDatabase() {
      CreatePhysicalDatabase();
      CreateTables();
      CreateIndexes();
    }

    Notice how much cleaner and more elegant [this] solution is.

    Cleaner, more elegant, and wrong.

    Suppose an exception is thrown during CreateIndexes(). The GenerateDatabase() function doesn't catch it, so the error is thrown back out to the caller, where it is caught.

    But when the exception left GenerateDatabase(), important information was lost: The state of the database creation. The code that catches the exception doesn't know which step in database creation failed. Does it need to delete the indexes? Does it need to delete the tables? Does it need to delete the physical database? It doesn't know.

    So if CreateIndexes() fails, you leak a physical database file and a table forever. (Since these are presumably files on disk, they hang around indefinitely.)
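    One way to keep the lost information is to track progress inside GenerateDatabase() itself and undo the completed steps before the exception leaves. What follows is only a sketch under assumptions: GenerateDatabase() and CreateIndexes() are named in the text, but the other step names, the Delete* methods, the Log list, and the FailInCreateIndexes switch are all hypothetical and exist only to make the example runnable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the book's AccessDatabase class. Everything except
// GenerateDatabase and CreateIndexes is an assumption made for illustration.
class AccessDatabase
{
    public readonly List<string> Log = new List<string>();
    public bool FailInCreateIndexes;   // lets the demo force an error

    void CreatePhysicalDatabase() { Log.Add("create file"); }
    void CreateTables() { Log.Add("create tables"); }
    void CreateIndexes()
    {
        if (FailInCreateIndexes)
            throw new InvalidOperationException("index creation failed");
        Log.Add("create indexes");
    }
    void DeleteTables() { Log.Add("delete tables"); }
    void DeletePhysicalDatabase() { Log.Add("delete file"); }

    public void GenerateDatabase()
    {
        // Remember how far we got so the failure path knows what to undo.
        bool fileCreated = false, tablesCreated = false;
        try
        {
            CreatePhysicalDatabase(); fileCreated = true;
            CreateTables(); tablesCreated = true;
            CreateIndexes();
        }
        catch
        {
            // Undo in reverse order, then let the caller see the original error.
            if (tablesCreated) DeleteTables();
            if (fileCreated) DeletePhysicalDatabase();
            throw;
        }
    }
}

class Program
{
    static void Main()
    {
        var db = new AccessDatabase { FailInCreateIndexes = true };
        try { db.GenerateDatabase(); }
        catch (InvalidOperationException e) { Console.WriteLine("caught: " + e.Message); }
        Console.WriteLine(string.Join(", ", db.Log));
    }
}
```

    The caller still gets the exception, but by then nothing is left on disk for it to guess about. Notice that the function is no longer quite so clean and elegant.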

    Writing correct code in the exception-throwing model is in a sense harder than in an error-code model, since anything can fail, and you have to be ready for it. In an error-code model, it's obvious when you have to check for errors: When you get an error code. In an exception model, you just have to know that errors can occur anywhere.

    In other words, in an error-code model, it is obvious when somebody failed to handle an error: They didn't check the error code. But in an exception-throwing model, it is not obvious from looking at the code whether somebody handled the error, since the error is not explicit.

    Consider the following:

    Guy AddNewGuy(string name)
    {
     Guy guy = new Guy(name);
     AddToLeague(guy);
     guy.Team = ChooseRandomTeam();
     return guy;
    }

    This function creates a new Guy, adds him to the league, and assigns him to a team randomly. How can this be simpler?

    Remember: Every line is a possible error.

    What if an exception is thrown by "new Guy(name)"?

    Well, fortunately, we haven't yet started doing anything, so no harm done.

    What if an exception is thrown by "AddToLeague(guy)"?

    The "guy" we created will be abandoned, but the GC will clean that up.

    What if an exception is thrown by "guy.Team = ChooseRandomTeam()"?

    Uh-oh, now we're in trouble. We already added the guy to the league. If somebody catches this exception, they're going to find a guy in the league who doesn't belong to any team. If there's some code that walks through all the members of the league and uses the guy.Team member, they're going to take a NullReferenceException since guy.Team isn't initialized yet.

    When you're writing code, do you think about what the consequences of an exception would be if it were raised by each line of code? You have to do this if you intend to write correct code.

    Okay, so how to fix this? Reorder the operations.

    Guy AddNewGuy(string name)
    {
     Guy guy = new Guy(name);
     guy.Team = ChooseRandomTeam();
     AddToLeague(guy);
     return guy;
    }

    This seemingly insignificant change has a big effect on error recovery. By delaying the commitment of the data (adding the guy to the league), any exceptions taken during the construction of the guy do not have any lasting effect. All that happens is that a partly-constructed guy gets abandoned and eventually gets cleaned up by GC.

    General design principle: Don't commit data until they are ready.

    Of course, this example was rather simple since the steps in setting up the guy had no side-effects. If something went wrong during set-up, we could just abandon the guy and let the GC handle the cleanup.

    In the real world, things are a lot messier. Consider the following:

    Guy AddNewGuy(string name)
    {
     Guy guy = new Guy(name);
     guy.Team = ChooseRandomTeam();
     guy.Team.Add(guy);
     AddToLeague(guy);
     return guy;
    }

    This does the same thing as our corrected function, except that somebody decided that it would be more efficient if each team kept a list of members, so you have to add yourself to the team you intend to join. What consequences does this have on the function's correctness?
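    One possible answer: joining the team is now a side effect that survives even if the guy is abandoned, so an exception from AddToLeague() would leave the team holding a guy who is not in the league. A minimal sketch of one way to handle that follows. The Guy, Team, and League types, the Remove method, and the FailAddToLeague switch are hypothetical and exist only so the example can run.

```csharp
using System;
using System.Collections.Generic;

// Guy, Team and League are hypothetical minimal types for illustration.
class Guy
{
    public string Name;
    public Team Team;
    public Guy(string name) { Name = name; }
}

class Team
{
    public readonly List<Guy> Members = new List<Guy>();
    public void Add(Guy guy) { Members.Add(guy); }
    public void Remove(Guy guy) { Members.Remove(guy); }
}

class League
{
    public readonly List<Guy> Guys = new List<Guy>();
    public readonly Team OnlyTeam = new Team();
    public bool FailAddToLeague;   // lets the demo force an error

    // A single team keeps the demo deterministic.
    Team ChooseRandomTeam() { return OnlyTeam; }

    void AddToLeague(Guy guy)
    {
        if (FailAddToLeague) throw new InvalidOperationException("league is full");
        Guys.Add(guy);
    }

    public Guy AddNewGuy(string name)
    {
        Guy guy = new Guy(name);
        guy.Team = ChooseRandomTeam();
        guy.Team.Add(guy);        // side effect that outlives an abandoned guy
        try
        {
            AddToLeague(guy);     // the commit point
        }
        catch
        {
            guy.Team.Remove(guy); // undo the side effect, then report the error
            throw;
        }
        return guy;
    }
}

class Program
{
    static void Main()
    {
        var league = new League { FailAddToLeague = true };
        try { league.AddNewGuy("Bob"); }
        catch (InvalidOperationException) { }
        Console.WriteLine(league.OnlyTeam.Members.Count);
        Console.WriteLine(league.Guys.Count);
    }
}
```

    After the forced failure, both the team and the league are empty, which is the state they were in before the call. The price is that the function now has an explicit error path to read and maintain.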

  • The Old New Thing

    NFL cracks down on grandstanding


    The National (US) Football League adopted a 15-yard penalty for pre-planned celebrations, such as last year's "phone call from the end zone" or 2002's "autographed football". Apparently, the existing monetary fines weren't having much of an effect on players with multi-million-dollar contracts. (Surprised?) So now the league is going to hit them where it hurts: On the field.

    I found it odd that the players weren't penalized under existing rules for having unauthorized equipment on the field.

    So does this mean that teams will be penalized for keeping champagne on ice in anticipation of winning a championship? That certainly counts as a pre-planned celebration, doesn't it?

    Perhaps teams which win a championship game now must "spontaneously" send somebody out to the local liquor store to buy a few cases of bubbly once the game ends.

  • The Old New Thing

    Why the compiler can't autoconvert foreach to for


    People have discovered that the "natural" C# loop construct

    ArrayList list = ...;
    foreach (Object o in list) {
      ... do something with o ...
    }

    is fractionally slower than the corresponding manual loop:

    ArrayList list = ...;
    for (int i = 0; i < list.Count; i++) {
      Object o = list[i];
      ... do something with o ...
    }

    The first thing that needs to be said here is that

    The performance difference is almost certainly insignificant.

    Don't go running around changing all your foreach loops into corresponding for loops thinking that your program will magically run faster. It almost certainly won't, because loop overhead is rarely where a non-benchmark program spends most of its time.

    My topic for today is not how to make your code faster by abandoning your foreach loops. My topic is to answer the question, "Why doesn't the compiler autoconvert the foreach into the corresponding for, so I don't lose readability but get to take advantage of the performance benefit?"

    The reason is that the two loops are in fact not identical.

    The semantics for enumeration is that you aren't allowed to change the object being enumerated while an enumeration is in progress. If you do, then the enumerator will throw an InvalidOperationException the next time you talk to it. On the other hand, the for loop doesn't care if you change the collection while you're enumerating it. If you insert items into the collection inside the for loop, the loop will keep on going and depending on where the insertion happened, you might double-enumerate an item.

    If the compiler changed the foreach to a for, then a program that used to throw an exception would now run without a hiccup. Whether you consider this an "improvement" is a matter of opinion. (Depending on the circumstances, it may be better for the program to crash than to produce incorrect results.)
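    To see the difference for yourself, here is a small sketch that makes the same modification inside each kind of loop. The method names and the sample data are made up for the demonstration.

```csharp
using System;
using System.Collections;
using System.Text;

class Program
{
    // Inserts an item at the front of the list while a foreach is in progress.
    public static string ModifyDuringForeach()
    {
        ArrayList list = new ArrayList { "a", "b", "c" };
        try
        {
            foreach (object o in list)
            {
                if ((string)o == "a") list.Insert(0, "x");
            }
            return "no exception";
        }
        catch (InvalidOperationException)
        {
            // The enumerator notices the change the next time it is asked to move.
            return "InvalidOperationException";
        }
    }

    // Makes the same insertion inside an index-based loop and
    // returns the items the loop visited, in order.
    public static string ModifyDuringFor()
    {
        ArrayList list = new ArrayList { "a", "b", "c" };
        bool inserted = false;
        StringBuilder visited = new StringBuilder();
        for (int i = 0; i < list.Count; i++)
        {
            object o = list[i];
            visited.Append(o);
            if (!inserted) { list.Insert(0, "x"); inserted = true; }
        }
        return visited.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ModifyDuringForeach()); // InvalidOperationException
        Console.WriteLine(ModifyDuringFor());     // aabc
    }
}
```

    The for loop finishes quietly, having visited "a" twice and never visited "x". The foreach loop refuses to continue.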

    Now, the compiler might be able to prove that you don't change the collection inside the loop, but that is often hard to prove. For example, does this loop change the collection?

    ArrayList list = target.GetTheList();
    foreach (Object o in list) {
      Console.WriteLine(o.GetHashCode());
    }

    Well, it doesn't look like it. But who knows, maybe target looks like this:

    class Sneaky {
      ArrayList list_;
      public Sneaky(ArrayList list) { list_ = list; }
      public override int GetHashCode() {
        list_.Clear(); // modifies the list that is being enumerated
        return base.GetHashCode();
      }
    }
    class SneakyContainer {
      public ArrayList GetTheList() {
        ArrayList list = new ArrayList();
        list.Add(new Sneaky(list));
        return list;
      }
    }
    class Program {
      static public void Main() {
        SneakyContainer target = new SneakyContainer();
        ArrayList list = target.GetTheList();
        foreach (object o in list) {
          Console.WriteLine(o.GetHashCode());
        }
      }
    }

    Ah, little did you know that o.GetHashCode() modifies the ArrayList. And yet it looked so harmless!

    If the SneakyContainer class came from another assembly, then the compiler must assume the worst, because it's possible that somebody will make that assembly sneaky after you compiled your assembly.

    If that's not a messed-up enough reason, here's another: The ArrayList class is not sealed. Therefore, somebody can override its IEnumerable.GetEnumerator and return a nonstandard enumerator. For example, here's a class that always returns an empty enumerator:

    class ApparentlyEmptyArrayList : ArrayList {
      static int[] empty = new int[] { };
      public override IEnumerator GetEnumerator()
        { return empty.GetEnumerator(); }
    }

    "Who would be so crazy as to override the enumerator?"

    Well, this one is rather bizarro, but more generally one might override the enumerator in order to add a filter or to change the order of enumeration.

    So you can't even trust that your ArrayList really is an ArrayList. It might be an ApparentlyEmptyArrayList!
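    Here is a short sketch of what that means for the two loops. The counting helpers are invented for the demonstration; the derived class is the one from above.

```csharp
using System;
using System.Collections;

class ApparentlyEmptyArrayList : ArrayList
{
    static int[] empty = new int[] { };
    public override IEnumerator GetEnumerator() { return empty.GetEnumerator(); }
}

class Program
{
    // Counts the items that foreach hands out (goes through GetEnumerator).
    public static int CountWithForeach(ArrayList list)
    {
        int n = 0;
        foreach (object o in list) n++;
        return n;
    }

    // Counts the items reachable through Count and the indexer.
    public static int CountWithFor(ArrayList list)
    {
        int n = 0;
        for (int i = 0; i < list.Count; i++) n++;
        return n;
    }

    static void Main()
    {
        // The static type is ArrayList, so nothing at the call site looks unusual.
        ArrayList list = new ApparentlyEmptyArrayList();
        list.Add(1);
        list.Add(2);
        list.Add(3);
        Console.WriteLine(CountWithForeach(list)); // 0
        Console.WriteLine(CountWithFor(list));     // 3
    }
}
```

    Same object, same declared type, and the two loops disagree about what is in it. A compiler that silently swapped one loop for the other would change what this program prints.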

    Now if the compiler wanted to do this rewrite optimization, not only would it have to prove that the object being enumerated is not modified inside the enumeration, it also has to prove that the object really is an ArrayList and not a derived class that may have overridden the GetEnumerator method.

    Given the late-binding nature of cross-assembly classes, the number of cases where the compiler can prove these requirements is very restricted indeed, to the point where the number of places where the optimization can safely be performed without changing semantics becomes so vanishingly small as to be not worth the effort.

    (By some astonishing universal synchronicity, this topic got picked up by several people all at once, sort of the same way a movie subject gets covered all at once. My favorite is the year that there were two volcano disaster movies, Volcano and Dante's Peak.)

  • The Old New Thing

    Good-Bye, Lenin!


    This weekend I saw Good-Bye, Lenin!, a German movie about a young man who must pretend that East Germany still exists, for the sake of his mother who was in a coma during the fall of the Berlin Wall and therefore remains unaware of the earth-shattering changes that took place while she was unconscious.

    There is, of course, the comedy of a young man attempting to recreate a world that no longer exists. But there is also the look into the lives of the people of East Germany. Behind that wall were real people, living their lives day to day. They weren't evil people. And when the wall fell, that life ended.

    Of course, this movie also taught me that my German needs a lot of work.

    From a "learning German" point of view, I tried to keep track of when the formal "Sie" was used and when the informal "du"/"ihr". Since I didn't grow up in Germany, deciding which form to use is for me still a bit of a puzzlement. (When I spoke with some German college students, they said that my German was okay, except that I kept using the wrong word for "you". Then again, one of them also fell over laughing when I said, "Bilder knipsen". Apparently "knipsen" is the cutesy way of taking pictures. Thanks to my German textbook for not pointing this out.)

  • The Old New Thing

    Why can't the system hibernate just one process?


    Windows lets you hibernate the entire machine, but why can't it hibernate just one process? Record the state of the process and then resume it later.

    Because there is state in the system that is not part of the process.

    For example, suppose your program has taken a mutex, and then it gets process-hibernated. Oops, now that mutex is abandoned and is now up for grabs. If that mutex was protecting some state, then when the process is resumed from hibernation, it thinks it still owns the mutex and the state should therefore be safe from tampering, only to find that it doesn't own the mutex any more and its state is corrupted.

    Imagine all the code that does something like this:

    // assume hmtx is a mutex handle that
    // protects some shared object G
    WaitForSingleObject(hmtx, INFINITE);
    // do stuff with G
    ...
    // do more stuff with G on the assumption that
    // G hasn't changed.
    ReleaseMutex(hmtx);

    Nobody expects that the mutex could secretly get released during the "..." (which is what would happen if the process got hibernated). That goes against everything mutexes stand for!
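    There is no way to demonstrate process hibernation, since it doesn't exist, but you can see what an abandoned mutex looks like to the next party that acquires it. The sketch below is the managed analogue rather than the Win32 code above: it uses the .NET Mutex class, and a thread that exits while holding the mutex stands in for the owner that vanished. It assumes the abandonment behavior that .NET documents for Windows; the method name is made up.

```csharp
using System;
using System.Threading;

class Program
{
    // Returns true if the second acquirer was told the mutex had been abandoned.
    public static bool SawAbandonedMutex()
    {
        using (var mutex = new Mutex())
        {
            // The worker takes the mutex and then exits without releasing it.
            var worker = new Thread(() => mutex.WaitOne());
            worker.Start();
            worker.Join();

            try
            {
                mutex.WaitOne();
                mutex.ReleaseMutex();
                return false;
            }
            catch (AbandonedMutexException)
            {
                // We own the mutex now, but whatever it was protecting
                // may have been left half-updated by the previous owner.
                mutex.ReleaseMutex();
                return true;
            }
        }
    }

    static void Main()
    {
        Console.WriteLine(SawAbandonedMutex());
    }
}
```

    At least here the new owner gets a warning. The hibernated process in the scenario above gets nothing: it wakes up still believing it owns the mutex.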

    Consider, as another example, the case where you have a file that was opened for exclusive access. The program will happily run on the assumption that nobody can modify the file except that program. But if you process-hibernate it, then some other process can now open the file (the exclusive owner is no longer around), tamper with it, then resume the original program. The original program on resumption will see a tampered-with file and may crash or (worse) be tricked into a security vulnerability.

    One alternative would be to keep all objects that belong to a process-hibernated program still open. Then you would have the problem of a file that can't be deleted because it is being held open by a program that isn't even running! (And indeed, for the resumption to be successful across a reboot, the file would have to be re-opened upon reboot. So now you have a file that can't be deleted even after a reboot because it's being held open by a program that isn't running. Think of the amazing denial-of-service you could launch against somebody: Create and hold open a 20GB file, then hibernate the process and then delete the hibernation file. Ha-ha, you just created a permanently undeletable 20GB file.)

    Now what if the hibernated program had created windows? Should the window handles still be valid while the program is hibernated? What happens if you send it a message? If the window handles should not remain valid, then what happens to broadcast messages? Are they "saved somewhere" to be replayed when the program is resumed? (And what if the broadcast message was something like "I am about to remove this USB hard drive, here is your last chance to flush your data"? The hibernated program wouldn't get a chance to flush its data. Result: Corrupted USB hard drive.)

    And imagine the havoc if you could take the hibernated process and copy it to another machine, and then attempt to restore it there.

    If you want some sort of "checkpoint / fast restore" functionality in your program, you'll have to write it yourself. Then you will have to deal explicitly with issues like the above. ("I want to open this file, but somebody deleted it in the meantime. What should I do?" Or "Okay, I'm about to create a checkpoint, I'd better purge all my buffers and mark all my cached data as invalid because the thing I'm caching might change while I'm in suspended animation.")

  • The Old New Thing

    Beethoven as ambient music


    Who knew that Beethoven wrote ambient music?

    The people at NOTAM took Beethoven's 9th Symphony and slowed it down so that the entire performance takes 24 hours. It's actually quite nice to listen to. (I like 2.1 myself.)

    [Rats, scooped by MetaFilter. Honest, it was in my queue! Nobody will believe me; they'll think I swiped it from MeFi...]

    A few months ago, NPR covered a similar experiment with the movie Psycho. I think it was a little less successful.

  • The Old New Thing

    WM_KILLFOCUS is the wrong time to do field validation


    "I'll do my field validation when I get a WM_KILLFOCUS message."

    This is wrong for multiple reasons.

    First, you may not get your focus loss message until it's too late.

    Consider a dialog box with an edit control and an OK button. The edit control validates its contents on receipt of the WM_KILLFOCUS message. Suppose the user fills in some invalid data.

    Under favorable circumstances, the user clicks the OK button. Clicking the OK button causes focus to move away from the edit control, so the edit control's WM_KILLFOCUS runs and gets a chance to tell the user that the field is no good. Since button clicks do not fire until the mouse is released while still over the button, invalid data will pop up a message box, which steals focus, and now the mouse-button-release doesn't go to the button control. Result: Error message and IDOK action does not execute.

    Now let's consider less favorable circumstances. Instead of clicking on the OK button, the user just presses Enter or types the keyboard accelerator for whatever button dismisses the dialog. The accelerator is converted by IsDialogMessage into a WM_COMMAND with the button control ID. Focus does not change.

    So now the IDOK (or whatever) handler runs and calls EndDialog() or performs whatever action the button represents. If the dialog exits, then focus will leave the edit control as part of dialog box destruction, and only then will the validation occur, but it's too late now. The dialog is already exiting.

    Alternatively, if the action in response to the button is not dialog termination but rather starting some other procedure, then it will do it based on the unvalidated data in the dialog box, which is likely not what you want. Only when that procedure moves focus (say, by displaying a progress dialog) will the edit control receive a WM_KILLFOCUS, at which time it is too late to do anything. The procedure (using the unvalidated data) is already under way.

    There is also a usability problem with validating on focus loss. Suppose the user starts typing data into the edit control, and then the user gets distracted. Maybe they need to open a piece of email that has the information they need. Maybe they got a phone call and need to look up something in their Contacts database. Maybe they went to the bathroom and the screen saver just kicked in. The user does not want a "Sorry, that partial information you entered is invalid" error dialog, because they aren't yet finished entering the data.

    I've told you all the places you shouldn't do validation but haven't said where you should.

    Do the validation when the users indicate that they are done with data entry and want to go on to the next step. For a simple dialog, this would mean performing validation when the OK or other action verb button is clicked. For a wizard, it would be when the Next button is clicked. For a tabbed dialog, it would be when the user tabs to a new page.

    (Warnings that do not change focus are permitted, like the balloon tip that appears if you accidentally turn on Caps Lock while typing your password.)

  • The Old New Thing

    A $2 billion bridge to one person


    The New York Times reported on two enormous construction projects of dubious merit:

    [The first bridge] would connect [Ketchikan, population 7845] to an island that has about 50 residents and the area's airport, which offers six flights a day (a few more in summer). It could cost about $200 million.

    The other bridge would span an inlet for nearly two miles to tie Anchorage to a port that has a single regular tenant and almost no homes or businesses. It would cost up to $2 billion.

    The first bridge replaces a five minute ferry ride with a drive that most likely will take even longer.

    And the representative from Alaska behind these pointless construction projects is hardly ashamed of this. Quite the contrary: He's proud of his achievement.

    "I stuffed it like a turkey."

    United States politics is not about trying to make the world a better place. It's about doing whatever it takes to get re-elected.

  • The Old New Thing

    Mapping all those "strange" digits to "0" through "9"

    In an earlier article, I discussed how the Char.IsDigit() method and its Win32 counterpart, GetStringTypeEx, report things to be digits that aren't just "0" through "9".

    If you really care just about "0" through "9", then you can test for them explicitly. For example, as a regular expression, use [0-9] instead of \d. Alternatively, for a regular expression, you can enable ECMA mode via RegexOptions.ECMAScript. Note that this controls much more than just the interpretation of the \d character class, so make sure to read carefully to ensure that you really want all the ECMA behavior.
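    A quick sketch of the three options side by side, using a string of Arabic-Indic digits as the test case. The helper names and the sample strings are made up for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Default \d: matches any Unicode decimal digit.
    public static bool MatchesBackslashD(string s)
    {
        return Regex.IsMatch(s, @"^\d+$");
    }

    // Explicit range: matches only "0" through "9".
    public static bool MatchesAsciiRange(string s)
    {
        return Regex.IsMatch(s, @"^[0-9]+$");
    }

    // ECMA mode: \d is restricted to "0" through "9"
    // (and other ECMAScript-compatible behavior comes along with it).
    public static bool MatchesEcmaD(string s)
    {
        return Regex.IsMatch(s, @"^\d+$", RegexOptions.ECMAScript);
    }

    static void Main()
    {
        string arabicIndic = "\u0661\u0662\u0663"; // Arabic-Indic digits one, two, three
        Console.WriteLine(MatchesBackslashD(arabicIndic)); // True
        Console.WriteLine(MatchesAsciiRange(arabicIndic)); // False
        Console.WriteLine(MatchesEcmaD(arabicIndic));      // False
        Console.WriteLine(MatchesAsciiRange("123"));       // True
    }
}
```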

    It has been pointed out to me that there is a way to convert all those "strange" digits to the "0" through "9" range, namely by calling the FoldString function with the MAP_FOLDDIGITS flag.

    (I put the word "strange" in quotation marks because of course they aren't strange at all. Just different.)

    This converts digits but doesn't help with decimal points, so you still have to deal with correctly interpreting "1,500" as either "one thousand five hundred" (as it would be in the United States) or "one and a half" (as it would be in most of Europe). For that, you need to call GetLocaleInfo to get the LOCALE_SDECIMAL and LOCALE_STHOUSAND strings.
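    For managed code, the two steps can be sketched without calling into Win32 at all. To be clear, this is a swapped-in technique and not FoldString or GetLocaleInfo: the digit folding uses CharUnicodeInfo.GetDecimalDigitValue, and the separators come from the NumberFormatInfo of a CultureInfo. The helper names are made up, and the culture-specific results assume that culture data for en-US and de-DE is available on the machine.

```csharp
using System;
using System.Globalization;
using System.Text;

class Program
{
    // Replaces every Unicode decimal digit with its "0" through "9" counterpart
    // and leaves all other characters alone.
    public static string FoldDigits(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            int d = CharUnicodeInfo.GetDecimalDigitValue(c); // -1 if not a decimal digit
            sb.Append(d >= 0 ? (char)('0' + d) : c);
        }
        return sb.ToString();
    }

    // Folds the digits, then lets the named culture decide what "," and "." mean.
    public static decimal ParseFor(string s, string cultureName)
    {
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
        return decimal.Parse(FoldDigits(s), NumberStyles.Number, culture);
    }

    static void Main()
    {
        // Fullwidth digits one, five, zero, zero with an ordinary comma.
        string fullwidth = "\uFF11,\uFF15\uFF10\uFF10";
        Console.WriteLine(FoldDigits(fullwidth));

        CultureInfo german = CultureInfo.GetCultureInfo("de-DE");
        Console.WriteLine(german.NumberFormat.NumberDecimalSeparator);
        Console.WriteLine(german.NumberFormat.NumberGroupSeparator);

        Console.WriteLine(ParseFor(fullwidth, "en-US") == 1500m);
        Console.WriteLine(ParseFor(fullwidth, "de-DE") == 1.5m);
    }
}
```

    The folding step is the same no matter where the user is. It is only the parse that needs to know which culture's rules to apply.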
