August, 2008

  • The Old New Thing

    The gradual erosion of the car trip experience

    How will kids today learn to get along with their siblings? I just learned that another of the basic childhood conflict resolution scenarios has disappeared, thanks to the dual-screen DVD player and entertainment system for your car, so each kid can remain content without the burden of having to interact with their annoying brother or sister. The traditional car ride games will slowly fade away, replaced with questions like, "Grandma, where's the Nintendo?"

    Why stop there? Why not just equip the car with tranquilizing gas in the back seat? The kids go in, you knock them unconscious, and you wake them up when you arrive at the destination.

    One of my friends told me that as a child, she objected that her brother was looking out her window, a degree of territoriality I was previously not aware of. Her parents naturally ridiculed her for making such a complaint, and I think she turned out okay.

  • The Old New Thing

    The implementation of iterators in C# and its consequences (part 3)

    I mentioned that there was an exception to the general statement that the conversion of an iterator into traditional C# code is something you could have done yourself. That's true, and it was also a pun, because the exception is exception handling.

    If you have a try ... finally block in your iterator, the language executes the finally block under the following conditions:

    • After the last statement of the try block is executed. (No surprise here.)
    • When an exception propagates out of the try block. (No surprise here either.)
    • When execution leaves the try block via yield break.
    • When the iterator is Disposed and the iterator body was trapped inside a try block at the time.

    That last case can occur if somebody decides to abandon the enumerator before it is finished.

    IEnumerable<int> CountTo10()
    {
     try {
      for (int i = 1; i <= 10; i++) {
       yield return i;
      }
     } finally {
      System.Console.WriteLine("finally");
     }
    }
    
    foreach (int i in CountTo10()) {
     System.Console.WriteLine(i);
     if (i == 5) break;
    }
    

    This code fragment prints "1 2 3 4 5 finally".

    If you think about it, this behavior is completely natural. You want the finally block to execute when the try block is finished executing, either by normal or abnormal means. Although control leaves the try block during the yield return, it comes back when the caller asks for the next item from the enumerator, so execution of the try block isn't finished yet. The try is finished executing after the last statement completes, an exception is thrown past it, or execution is abandoned when the enumerator is prematurely destroyed.

    And this is exactly what you want when you use the finally block to clean up resources used by the try block.
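    To see why breaking out of the loop triggers the finally block, it can help to spell out roughly what the foreach statement does with the enumerator. The following is a minimal sketch rather than the compiler's literal output; the AbandonDemo class and its Log list are hypothetical names introduced only so the result can be inspected.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical demo class; not part of the original article.
    static class AbandonDemo
    {
        // Records what happened, in order, instead of printing it.
        public static readonly List<string> Log = new List<string>();

        static IEnumerable<int> CountTo10()
        {
            try {
                for (int i = 1; i <= 10; i++) {
                    yield return i;
                }
            } finally {
                Log.Add("finally");
            }
        }

        // Approximately what "foreach (int i in CountTo10()) { ... }" does.
        public static void Run()
        {
            IEnumerator<int> e = CountTo10().GetEnumerator();
            try {
                while (e.MoveNext()) {
                    Log.Add(e.Current.ToString());
                    if (e.Current == 5) break; // abandon the enumerator early
                }
            } finally {
                // Disposing the enumerator while it is suspended inside
                // the try block is what runs the iterator's finally block.
                e.Dispose();
            }
        }
    }
    ```

    Calling AbandonDemo.Run() leaves the entries 1 through 5 followed by "finally" in the log, matching the output of the foreach version.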

    Now, technically, you can write this yourself without using iterators, but it's pretty ugly. You'll need more internal state variables to keep track of whether the try block is still active and whether the exit of the try block is temporary (due to yield return) or permanent. It's a real pain in the neck, so you are probably better off letting the compiler do the work for you.
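    For the curious, here is a rough sketch of what doing it by hand might look like for CountTo10. It is an illustration under simplifying assumptions, not what the compiler actually emits: the class name, the inTry flag, and the Log list are all hypothetical, and the try body here cannot throw, so the exception case is not handled.

    ```csharp
    using System;
    using System.Collections;
    using System.Collections.Generic;

    // Hypothetical hand-written counterpart of the CountTo10 iterator.
    class CountTo10Enumerator : IEnumerator<int>
    {
        int state = 0;      // 0 = not started, 1 = suspended at yield return, 2 = finished
        bool inTry = false; // true while execution is logically inside the try block
        int current;
        int i;

        // Stands in for Console.WriteLine so the behavior can be inspected.
        public readonly List<string> Log = new List<string>();

        public int Current { get { return current; } }
        object IEnumerator.Current { get { return current; } }

        public bool MoveNext()
        {
            switch (state) {
            case 0: inTry = true; i = 1; break; // enter the try block, start the loop
            case 1: i++; break;                 // resume after the yield return
            default: return false;              // already finished
            }
            if (i <= 10) {
                current = i;
                state = 1;
                return true;  // temporary exit: the finally block must not run
            }
            state = 2;
            RunFinally();     // permanent exit: the try block ran to completion
            return false;
        }

        void RunFinally()
        {
            if (inTry) {
                inTry = false;
                Log.Add("finally");
            }
        }

        public void Dispose()
        {
            state = 2;
            RunFinally();     // abandoned while suspended inside the try block
        }

        public void Reset() { throw new NotSupportedException(); }
    }
    ```

    The inTry flag is the extra bookkeeping: it lets Dispose tell an enumerator that was abandoned in the middle of the try block apart from one that never started or has already finished, so the cleanup runs exactly once.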

  • The Old New Thing

    The implementation of iterators in C# and its consequences (part 2)

    Now that you have the basic idea behind iterators under your belt, you can already answer some questions on iterator usage. Here's a scenario based on actual events:

    I have an iterator that is rather long and complicated, so I'd like to refactor it. For illustrative purposes, let's say that the enumerator counts from 1 to 100 twice. (In real life, of course, the iterator will not be this simple.)

    IEnumerable<int> CountTo100Twice()
    {
     int i;
     for (i = 1; i <= 100; i++) {
      yield return i;
     }
     for (i = 1; i <= 100; i++) {
      yield return i;
     }
    }
    

    As we learned in Programming 101, we can pull common code into a subroutine and call the subroutine. But when I do this, I get a compiler error:

    IEnumerable<int> CountTo100Twice()
    {
     CountTo100();
     CountTo100();
    }
    
    void CountTo100()
    {
     int i;
     for (i = 1; i <= 100; i++) {
      yield return i;
     }
    }
    

    What am I doing wrong? How can I move the "count to 100" into a subroutine and call it twice from the CountTo100Twice function?

    As we saw last time, iterators are not coroutines. The technique above would have worked great had we built iterators out of, say, fibers instead of building them out of state machines. As state machines, all yield return statements must occur at the "top level". So how do you iterate with the help of subroutines?

    You make the subroutine its own iterator and suck the results out from the main function:

    IEnumerable<int> CountTo100Twice()
    {
     foreach (int i in CountTo100()) yield return i;
     foreach (int i in CountTo100()) yield return i;
    }
    
    IEnumerable<int> CountTo100()
    {
     for (int i = 1; i <= 100; i++) {
      yield return i;
     }
    }
    

    Exercise: Consider the following fragment:

     foreach (int i in CountTo100Twice()) {
      ...
     }
    

    Explain what happens on the 150th call to MoveNext() in the above loop. Discuss its consequences for recursive enumerators (such as tree traversal).

  • The Old New Thing

    The unwritten rule of riding a Seattle Metro bus

    The Metro King County transit site has all the facts about how to ride the bus, but there's another rule that is applied by convention rather than by any formal codification:

    For some reason, and I see this only in Seattle, it is customary to say Thank you to the bus driver as you get off the bus.

    Tip for new riders: If you aren't familiar with the area, you can ask the bus driver to announce the stop you intend to get off at.

    Bonus tip for bicyclists: There is an experimental program running this summer to allow bicyclists to ride out-of-service buses across the 520 bridge for free. You have to get on at Evergreen Point and get off at Montlake (or vice versa). This is in addition to the existing policy of allowing bicyclists to ride out-of-service buses across the bridge (paying the normal fare), eastbound as far as 51st St, assuming the bus is heading that way. (This bonus tip is not that helpful to Microsoft employees, who already get free bus passes and a special bike shuttle, but I figured I'd toss it out there.)

  • The Old New Thing

    The implementation of iterators in C# and its consequences (part 1)

    Like anonymous methods, iterators in C# are very complex syntactic sugar. You could do it all yourself (after all, you did have to do it all yourself in earlier versions of C#), but the compiler transformation makes for much greater convenience.

    The idea behind iterators is that they take a function with yield return statements (and possibly some yield break statements) and convert it into a state machine. When you yield return, the state of the function is recorded, and execution resumes from that state the next time the iterator is called upon to produce another object.

    Here's the basic idea: All the local variables of the iterator (treating iterator parameters as pre-initialized local variables, including the hidden this parameter) become member variables of a helper class. The helper class also has an internal state member that keeps track of where execution left off and an internal current member that holds the object most recently enumerated.

    class MyClass {
     int limit = 0;
     public MyClass(int limit) { this.limit = limit; }
    
     public IEnumerable<int> CountFrom(int start)
     {
      for (int i = start; i <= limit; i++) {
       yield return i;
      }
     }
    }
    

    The CountFrom method produces an integer enumerator that spits out the integers starting at start and continuing up to and including limit. The compiler internally converts this enumerator into something like this:

     class MyClass_Enumerator : IEnumerable<int> {
      int state$0 = 0;// internal member
      int current$0;  // internal member
      MyClass this$0; // implicit parameter to CountFrom
      int start;      // explicit parameter to CountFrom
      int i;          // local variable of CountFrom
    
      public int Current {
       get { return current$0; }
      }
    
      public bool MoveNext()
      {
       switch (state$0) {
       case 0: goto resume$0;
       case 1: goto resume$1;
       case 2: return false;
       }
    
     resume$0:;
       for (i = start; i <= this$0.limit; i++) {
        current$0 = i;
        state$0 = 1;
        return true;
     resume$1:;
       }
    
       state$0 = 2;
       return false;
      }
      ... other bookkeeping, not important here ...
     }
    
     public IEnumerable<int> CountFrom(int start)
     {
      MyClass_Enumerator e = new MyClass_Enumerator();
      e.this$0 = this;
      e.start = start;
      return e;
     }
    

    The enumerator class is auto-generated by the compiler and, as promised, it contains two internal members for the state and current object, plus a member for each parameter (including the hidden this parameter), plus a member for each local variable. The Current property merely returns the current object. All the real work happens in MoveNext.

    To generate the MoveNext method, the compiler takes the code you write and performs a few transformations. First, all the references to variables and parameters need to be adjusted since the code moved to a helper class.

    • this becomes this$0, because inside the rewritten function, this refers to the auto-generated class, not the original class.
    • m becomes this$0.m when m is a member of the original class (a member variable, member property, or member function). This rule is actually redundant with the previous rule, because writing the name of a class member m without a prefix is just shorthand for this.m.
    • v becomes this.v when v is a parameter or local variable. This rule is actually redundant, since writing v is the same as this.v, but I call it out explicitly so you'll notice that the storage for the variable has changed.

    The compiler also has to deal with all those yield return statements.

    • Each yield return x becomes
       current$0 = x;
       state$0 = n;
       return true;
      resume$n:;
      

      where n is an increasing number starting at 1.

    And then there are the yield break statements.

    • Each yield break becomes
       state$0 = n2;
       return false;
      
      where n2 is one greater than the highest state number used by all the yield return statements. Don't forget that there is also an implied yield break at the end of the function.

    Finally, the compiler puts the big state dispatcher at the top of the function.

    • At the start of the function, insert
      switch (state$0) {
      case 0: goto resume$0;
      case 1: goto resume$1;
      case 2: goto resume$2;
      ...
      case n: goto resume$n;
      case n2: return false;
      }
      

      with one case statement for each state, plus the initial zero state and the final n2 state.
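    The pseudocode above is not legal C# as written: identifiers cannot contain a dollar sign, and a goto cannot jump into the middle of a loop body. Here is a compilable sketch of the same idea for CountFrom, restructured so that each state resumes at the right point without a goto. The names CountFromEnumerator and GetCountFromEnumerator are hypothetical, and to keep it short the sketch hands back the enumerator directly rather than an IEnumerable<int> wrapper.

    ```csharp
    using System;
    using System.Collections;
    using System.Collections.Generic;

    class MyClass
    {
        int limit = 0;
        public MyClass(int limit) { this.limit = limit; }

        // Hypothetical stand-in for the rewritten CountFrom method.
        public IEnumerator<int> GetCountFromEnumerator(int start)
        {
            return new CountFromEnumerator(this, start);
        }

        // Nested so that it can reach the private limit member.
        class CountFromEnumerator : IEnumerator<int>
        {
            int state = 0;          // plays the role of state$0
            int current;            // plays the role of current$0
            readonly MyClass owner; // plays the role of this$0
            readonly int start;     // explicit parameter to CountFrom
            int i;                  // local variable of CountFrom

            public CountFromEnumerator(MyClass owner, int start)
            {
                this.owner = owner;
                this.start = start;
            }

            public int Current { get { return current; } }
            object IEnumerator.Current { get { return current; } }

            public bool MoveNext()
            {
                switch (state) {
                case 0: i = start; break; // first call: run the loop initializer
                case 1: i++; break;       // resume after the yield return: loop increment
                default: return false;    // final state: nothing left to produce
                }
                if (i <= owner.limit) {   // the loop condition
                    current = i;
                    state = 1;
                    return true;          // yield return i
                }
                state = 2;                // implied yield break at the end
                return false;
            }

            public void Reset() { throw new NotSupportedException(); }
            public void Dispose() { }
        }
    }
    ```

    With a limit of 5 and a start of 3, repeated calls to MoveNext produce 3, 4, and 5, after which MoveNext keeps returning false.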

    Notice that this transformation is quite different from the enumeration model we built based on coroutines and fibers. The C# method is far more efficient in terms of memory usage since it doesn't consume an entire stack (typically a megabyte in size) like the fiber approach does. Instead it just borrows the stack of the caller, and anything that it needs to save across calls to MoveNext is stored in a helper object (which goes on the heap rather than the stack). This fake-out is normally quite effective (most people don't even realize that it's happening), but there are places where the difference is significant, and we'll see that shortly.

    Exercise: Why do we need to write state$0 = n2; and add the case n2: return false;? Why can't we just transform each yield break into return false; and stop there?

  • The Old New Thing

    Psychic debugging: Why can't StreamReader read apostrophes from a text file?

    As is customary, the first day of CLR Week is a warm-up. Actually, today's question is a BCL question, not a CLR question, but only the nitpickers will bother to notice.

    Can somebody explain why StreamReader can’t read apostrophes? I have a text file, and I read from it the way you would expect:

    StreamReader sr = new StreamReader("myfile.txt");
    Console.WriteLine(sr.ReadToEnd());
    sr.Close();
    

    I expect this to print the contents of the file to the console, and it does—almost. Everything looks great except that all the apostrophes are gone!

    You don't have to have very strong psychic powers to figure this one out.

    Here's a hint: In some versions of this question, the problem is with accented letters.

    Your first psychic conclusion is that the text file is probably an ANSI text file. But StreamReader defaults to UTF-8, not ANSI. One version of this question actually came right out and asked, "Why can't StreamReader read apostrophes from my ANSI text file?" The alternate version of the question already contains a false hidden assumption: StreamReader can't read apostrophes from an ANSI text file because StreamReader (by default) doesn't read ANSI text files at all!

    But that shouldn't be a factor, since the apostrophe is encoded the same in ANSI and UTF-8, right?

    That's your second clue. Only the apostrophe is affected. What's so special about the apostrophe? (The bonus hint should tip you off: What's so special about accented letters? What property do they share with the apostrophe?)

    There are apostrophes and there are apostrophes, and it's those "weird" apostrophes that are the issue here. Code points U+2018 (‘) and U+2019 (’) occupy positions 0x91 and 0x92, respectively, in code page 1252, and both of those byte values are illegal lead bytes in UTF-8 encoding. And the default behavior for the System.Text.UTF8Encoding encoding is to ignore invalid byte sequences. Note that StreamReader does not raise an exception when incorrectly-encoded text is encountered. It just ignores the bad byte and continues as best it can, following Burak's advice.

    Result: StreamReader appears to ignore apostrophes and accented letters.

    There are therefore multiple issues here. First, you may want to look at why your ANSI text file is using those weird apostrophes. Maybe it's intentional, but I suspect it isn't. Second, if you're going to be reading ANSI text, you can't use a default StreamReader, since a default StreamReader doesn't read ANSI text. You need to set the encoding to System.Text.Encoding.Default if you want to read ANSI text. And third, why are you using ANSI text in the first place? ANSI text files are not universally transportable, since the ANSI code page changes from system to system. Shouldn't you be using UTF-8 text files instead?

    At any rate, the solution is to decide on an encoding and to specify that encoding when creating the StreamReader.
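    As a minimal sketch of what specifying the encoding might look like, the helper below passes the encoding to the StreamReader constructor explicitly. The EncodingDemo class, the ReadAll method, and the StrictUtf8 field are hypothetical names. The strict variant uses the UTF8Encoding constructor that asks for an exception on invalid bytes, so a mis-encoded file fails loudly instead of quietly losing characters. (Depending on the runtime version, the default decoder may drop the bad byte or substitute a replacement character.)

    ```csharp
    using System;
    using System.IO;
    using System.Text;

    // Hypothetical helper; not part of the original article.
    static class EncodingDemo
    {
        // UTF-8 without a byte order mark that throws
        // DecoderFallbackException when it meets an invalid byte sequence.
        public static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

        // Reads the whole file using the encoding the caller decided on.
        public static string ReadAll(string path, Encoding encoding)
        {
            using (StreamReader sr = new StreamReader(path, encoding)) {
                return sr.ReadToEnd();
            }
        }
    }
    ```

    A caller would write something like EncodingDemo.ReadAll("myfile.txt", EncodingDemo.StrictUtf8) for a file that is supposed to be UTF-8, or pass whatever encoding the file was actually written in.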

    This exercise is just another variation on Keep your eye on the code page.

  • The Old New Thing

    Supplementary reading on the subject of anonymous functions and other CLR topics

    Welcome to CLR Week 2008. I'm going to mix it up and start with a link listing.

    Other CLR topics:

  • The Old New Thing

    Raymond rewrites newspaper headlines

    Original headline: Monorail out of service this week.

    Raymond's headline: Monorail out of service this week: 4 people inconvenienced.

  • The Old New Thing

    If you return FALSE from DLL_PROCESS_ATTACH, will you get a DLL_PROCESS_DETACH?

    If you return FALSE from DLL_PROCESS_ATTACH, will you get a DLL_PROCESS_DETACH?

    Yes.

    No.

    ...

    Yes.

    All three answers are correct, for different formulations of the question.

    From the kernel's point of view, the answer is a simple Yes. If a DLL's entry point returns FALSE to the DLL_PROCESS_ATTACH notification, it will receive a DLL_PROCESS_DETACH notification.

    However, most C and C++ programs do not use the raw DLL entry point. Instead, they use the C runtime entry point, which will have a name something like DllMainCRTStartup. That entry point function does work to manage the C runtime library and calls your entry point (which you've probably called DllMain) to see what you think.

    If you compiled your program prior to around 2002 and your DllMain function returns FALSE in response to the DLL_PROCESS_ATTACH notification, then the C runtime code says, "Oh, well, I guess I'm not running after all" and shuts itself down. When the kernel calls the C runtime entry point with the DLL_PROCESS_DETACH notification, the C runtime says, "Oh, I'm already shut down, thanks for asking" and returns immediately, which means that your entry point is not called with the DLL_PROCESS_DETACH notification. In other words, if you wrote your program prior to around 2002, the answer is No.

    Sometime in 2002 or maybe 2003, the C runtime folks changed the behavior. If your DllMain function returns FALSE in response to the DLL_PROCESS_ATTACH notification, you will nevertheless get the DLL_PROCESS_DETACH notification. In other words, if you wrote your program after around 2002 or maybe 2003, then the answer is Yes. Why change? Maybe they wanted to match the kernel behavior more closely, maybe they considered their previous behavior a bug. You'll have to ask them.

    What does this mean for you, the programmer? Some people may look at this and conclude, "Well, now that I know how each of the specific scenarios works, I can rely on knowing the behavior that results from the scenario I'm in. For example, since I'm using Visual Studio 2008, the answer is Yes." But I think that's the wrong conclusion, because you usually do not have total control over how your program is compiled and linked. You may share your code with another project, and that other project may not know that you are relying on the behavior of a specific version of Visual Studio 2008; they will compile your program with Borland C++ version 5.5,¹ and now your program is subtly broken. My recommendation is to write your DllMain function so that it works correctly regardless of which scenario it ends up used in. (And since you shouldn't be doing much in your DllMain function anyway, this shouldn't be too much of a burden.)

    Footnote

    ¹I do not know what the behavior of Borland C++ version 5.5 is with respect to returning FALSE from DllMain. I didn't feel like doing the research to find a compiler whose behavior is different from Visual Studio 2008, so I just picked one at random. I have a 50/50 chance of being right.

  • The Old New Thing

    If the law says you can't file a petition, you might need to file it anyway, in case somebody later says that you should've even though the law says you couldn't

    It sounds like a scene from the movie Brazil, but in fact it's the law.

    Let's rewind a bit. The introduction is a bit technical, but I'll try to keep it short.

    There is a legal filing known as a habeas petition and another known as a petition for review. There are rules regarding what each one covers and the deadlines for filing them. Prior to 2005, there was no deadline for habeas petitions, but you had to file your petition for review within 30 days of whatever it was the government did that you wanted to object to. In 2005, Congress passed (and the President signed) a law which recategorized what the two types of filings covered, and many claims that had fallen under the habeas petition were reclassified as requiring a petition for review instead.

    This change in the rules creates a gap in coverage because Congress forgot to include a grandfather clause (or, for computer geeks, a "smooth migration plan"): What if, at the time the new law took effect, the thing you want to complain about was reclassified as requiring a petition for review, but it took place more than 30 days ago? You wake up one morning and somebody tells you, "Hey, there's a new law today. You just lost your right to respond."

    What you do, then, is file a lawsuit challenging the new rules. And then two years later, the Third Circuit Court hears your case and rules that, yes, you're right, the law that Congress passed is unconstitutional (a violation of Section 9, Clause 2, known commonly as the Suspension Clause) because it denied you the opportunity to file a claim.

    And now here's the weird part.

    Instead of saying, "And therefore we strike down this part of the law as unconstitutional," the court says "And therefore we will retroactively rewrite the law so it is constitutional again (saving Congress and the President the trouble of having to do it themselves), and oh lookie here, according to this new law we just made up, you did have a constitutionally guaranteed opportunity to file your petition, but it expired two years ago."

    In other words, what you should have done in 2005 was hire a psychic, who would then instruct you to spend thousands of dollars hiring an attorney to draft and file a petition which, according to the law, you were legally barred from filing, in anticipation of the courts (two years later) rewriting the law in order to make that filing legal again. And then when you file your petition, you have to convince the court to accept it, explaining that yes, I know that I cannot legally file this petition, but a psychic told me to do it.

    You can read the court's decision yourself. (Despite the connotations associated with the term legalese, court decisions are actually quite readable. You just have to skip over the complicated footnotes.)
