August, 2008

  • The Old New Thing

    The implementation of iterators in C# and its consequences (part 1)

    • 33 Comments

    Like anonymous methods, iterators in C# are very complex syntactic sugar. You could do it all yourself (after all, you did have to do it all yourself in earlier versions of C#), but the compiler transformation makes for much greater convenience.

    The idea behind iterators is that they take a function with yield return statements (and possible some yield break statements) and convert it into a state machine. When you yield return, the state of the function is recorded, and execution resumes from that state the next time the iterator is called upon to produce another object.

    Here's the basic idea: All the local variables of the iterator (treating iterator parameters as pre-initialized local variables, including the hidden this parameter) become member variables of a helper class. The helper class also has an internal state member that keeps track of where execution left off and an internal current member that holds the object most recently enumerated.

    class MyClass {
     int limit = 0;
     public MyClass(int limit) { this.limit = limit; }
    
     public IEnumerable<int> CountFrom(int start)
     {
      for (int i = start; i <= limit; i++) {
       yield return i;
      }
     }
    }
    

    The CountFrom method produces an integer enumerator that spits out the integers starting at start and continuing up to and including limit. The compiler internally converts this enumerator into something like this:

     class MyClass_Enumerator : IEnumerable<int> {
      int state$0 = 0;// internal member
      int current$0;  // internal member
      MyClass this$0; // implicit parameter to CountFrom
      int start;      // explicit parameter to CountFrom
      int i;          // local variable of CountFrom
    
      public int Current {
       get { return current$0; }
      }
    
      public bool MoveNext()
      {
       switch (state$0) {
       case 0: goto resume$0;
       case 1: goto resume$1;
       case 2: return false;
       }
    
     resume$0:;
       for (i = start; i <= this$0.limit; i++) {
        current$0 = i;
        state$0 = 1;
        return true;
     resume$1:;
       }
    
       state$0 = 2;
       return false;
      }
      ... other bookkeeping, not important here ...
     }
    
     public IEnumerable<int> CountFrom(int start)
     {
      MyClass_Enumerator e = new MyClass_Enumerator();
      e.this$0 = this;
      e.start = start;
      return e;
     }
    

    The enumerator class is auto-generated by the compiler and, as promised, it contains two internal members for the state and current object, plus a member for each parameter (including the hidden this parameter), plus a member for each local variable. The Current property merely returns the current object. All the real work happens in MoveNext.

    To generate the MoveNext method, the compiler takes the code you write and performs a few transformations. First, all the references to variables and parameters need to be adjusted since the code moved to a helper class.

    • this becomes this$0, because inside the rewritten function, this refers to the auto-generated class, not the original class.
    • m becomes this$0.m when m is a member of the original class (a member variable, member property, or member function). This rule is actually redundant with the previous rule, because writing the name of a class member m without a prefix is just shorthand for this.m.
    • v becomes this.v when v is a parameter or local variable. This rule is actually redundant, since writing v is the same as this.v, but I call it out explicitly so you'll notice that the storage for the variable has changed.

    The compiler also has to deal with all those yield return statements.

    • Each yield return x becomes
       current$0 = x;
       state$0 = n;
       return true;
      resume$n:;
      

      where n is an increasing number starting at 1.

    And then there are the yield break statements.

    • Each yield break becomes
       state$0 = n2;
       return false;
      
      where n2 is one greater than the highest state number used by all the yield return statements. Don't forget that there is also an implied yield break at the end of the function.

    Finally, the compiler puts the big state dispatcher at the top of the function.

    • At the start of the function, insert
      switch (state$0) {
      case 0: goto resume$0;
      case 1: goto resume$1;
      case 2: goto resume$2;
      ...
      case n: goto resume$n;
      case n2: return false;
      }
      

      with one case statement for each state, plus the initial zero state and the final n2 state.

    Notice that this transformation is quite different from the enumeration model we built based on coroutines and fibers. The C# method is far more efficient in terms of memory usage since it doesn't consume an entire stack (typically a megabyte in size) like the fiber approach does. Instead it just borrows the stack of the caller, and anything that it needs to save across calls to MoveNext are stored in a helper object (which goes on the heap rather than the stack). This fake-out is normally quite effective—most people don't even realize that it's happening—but there are places where the difference is significant, and we'll see that shortly.

    Exercise: Why do we need to write state$0 = n2; and add the case n2: return false;? Why can't we just transform each yield break into return false; and stop there?

  • The Old New Thing

    The implementation of iterators in C# and its consequences (part 2)

    • 49 Comments

    Now that you have the basic idea behind iterators under your belt, you can already answer some questions on iterator usage. Here's a scenario based on actual events:

    I have an iterator that is rather long and complicated, so I'd like to refactor it. For illustrative purposes, let's say that the enumerator counts from 1 to 100 twice. (In real life, of course, the iterator will not be this simple.)

    IEnumerable<int> CountTo100Twice()
    {
     int i;
     for (i = 1; i <= 100; i++) {
      yield return i;
     }
     for (i = 1; i <= 100; i++) {
      yield return i;
     }
    }
    

    As we learned in Programming 101, we can pull common code into a subroutine and call the subroutine. But when I do this, I get a compiler error:

    IEnumerable<int> CountTo100Twice()
    {
     CountTo100();
     CountTo100();
    }
    
    void CountTo100()
    {
     int i;
     for (i = 1; i <= 100; i++) {
      yield return i;
     }
    }
    

    What am I doing wrong? How can I move the "count to 100" into a subroutine and call it twice from the CountTo100Twice function?

    As we saw last time, iterators are not coroutines. The technique above would have worked great had we built iterators out of, say, fibers instead of building them out of state machines. As state machines, all yield return statements must occur at the "top level". So how do you iterate with the help of subroutines?

    You make the subroutine its own iterator and suck the results out from the main function:

    IEnumerable<int> CountTo100Twice()
    {
     foreach (int i in CountTo100()) yield return i;
     foreach (int i in CountTo100()) yield return i;
    }
    
    IEnumerable<int> CountTo100()
    {
     for (i = 1; i <= 100; i++) {
      yield return i;
     }
    }
    

    Exercise: Consider the following fragment:

     foreach (int i in CountTo100Twice()) {
      ...
     }
    

    Explain what happens on the 150th call to MoveNext() in the above loop. Discuss its consequences for recursive enumerators (such as tree traversal).

  • The Old New Thing

    It's not Christmas: Nobody enjoys unwrapping your present

    • 142 Comments

    I don't know why it happens, but it happens with disturbing frequency. A customer wants to report a problem, and then illustrate it with a screenshot or two, but instead of attaching the screenshots, they paste the screenshots inside a Word document (and for some reason it's always Word) and then attach the Word document.

    It's not a Christmas present. People aren't going to say "Wow, I wonder what's inside? I'm brimming with anticipation!" They're going to say, "Oh great, I can't even see the screen shot. I have to download the attachment, scan it for viruses, then load it into Word. Oh wait, this is a Word 2007 document and I only have Word 2003; let me run the converter first. Okay good, now I can open the document to see, oh, look, it's a picture." Most people won't bother. And then you're going to wonder why nobody answered your first message.

    If you insist on attaching the pictures, just attach them directly. And use a compressed image format like JPG or PNG, please. Don't send uncompressed screenshots; they are ridiculously huge. Cropping the image to the relevant portion of the screen helps, too. (This is very easy to do with the Snipping Tool.)

    In March of this year, a customer wrote, "I have attached a Word document that describes the problem." (Hey, here's an idea: Why not describe the problem in your email message?)

    The Word document contained a screenshot.

    The screenshot was of an email message.

    The email message contained a screenshot.

    Bonus remark from the customer liaison: "Once you open the document, you may need to zoom it further to read it."

    Wooden table not included.

  • The Old New Thing

    I warned you: The dangers of attaching input queues

    • 53 Comments

    Some people didn't take to heart my cautions on the subject of attached input queues, item number five on the list of five things every Win32 programmer should know. And then they find that their application stops responding.

    // Code in italics is wrong
    void TryToStealFocus(HWND hwnd)
    {
      // First try plain SetForegroundWindow
      SetForegroundWindow(hwnd);
      HWND hwndFG = GetForegroundWindow();
      if (hwndFG == hwnd) return;
    
      // That didn't work - if the foreground window belongs
      // to another thread, attach to that thread and try again
      DWORD dwCurrentThread = GetCurrentThreadId();
      DWORD dwFGThread = GetWindowThreadProcessId(hwndFG, NULL);
      if (dwFGThread == dwCurrentThread) return;
    
      AttachThreadInput(dwCurrentThread, dwFGThread, TRUE);
      SetForegroundWindow(hwnd); // hangs here
      AttachThreadInput(dwCurrentThread, dwFGThread, FALSE);
    }
    

    Their customer feedback data shows that this function often hangs at the second call to SetForegroundWindow. My exercise for you is to explain why. (Here's someone else with the same problem.)

    (Note that both of these customers are trying to circumvent the foreground lock timeout so that they can steal focus and shove a dialog box in the user's face.)

  • The Old New Thing

    The caret serves as the continuation character for batch files

    • 12 Comments

    We saw earlier that the caret is the escape character for the batch language. In a comment to that article, KJK::Hyperion mentioned that the caret serves as the line continuation character. A useful tip if you still find yourself messing with batch files. Mark Yocum elaborates on this point a bit more.

  • The Old New Thing

    Microspeak: The long pole

    • 23 Comments

    The long pole is the part of the project that is on the critical path due to its length. For example, if you have a project that consists of three independent sub-projects, then the sub-project with the longest completion date is the long pole.

    The etymology of this term is simultaneously obvious yet hard to pin down. Intuitively, the long pole is the one that determines the height: If you have a tent supported by three poles, then the long pole decides how tall the tent is. If you have a collection of poles you want to put in the back of a pick-up truck, the long pole is the one that decides how big a bed you need. If you have a set of poles and you hold them in your hand in a sheaf and rest the bottoms on the ground, then the long pole is the one that sticks up the highest.

    It is my impression that the tent analogy is the one that provides the source for this bit of Microspeak, but I'm not absolutely sure. If true, it's a nice bit of double-wordplay, because not only is it an analogy, but it's also a pun: The long pole is the one holding everything up.

    You don't want to be a long pole in your project, because that just means that everybody will be giving your component extra scrutiny to make sure it's not going to make everything late.

    Here are some citations.

    Design on track, but setup work appears to be long pole.
    XYZ work is long pole in the schedule; not clear whether issue can be mitigated.
    XYZ is still the long pole, now out [i.e., over schedule] by about six days, down from eight days last week after the ABC work was offloaded to UVW.
  • The Old New Thing

    The implementation of iterators in C# and its consequences (part 4)

    • 28 Comments

    You can breathe a sigh of relief. Our long national nightmare is over: this is the end of CLR Week 2008. We wind down with a look back at iterators.

    Michael Entin points out that you can use C# iterators to make asynchronous code easier to write. You can use C# iterators for more than simply iterating.

    The automatic conversion of straight line code into a state machine is handy when you want an easy way to write, well, a state machine. It's one of those things that's blindingly obvious once you look at it the right way.

    The transformation that the yield return statement induces on your function turns it from a boring function into an implicit state machine: When you execute a yield return, execution of your function is suspended until somebody asks your iterator the next item, at which point execution resumes at the statement after the yield return. This is exactly what you want when breaking a synchronous function into asynchronous pieces: Each time you would normally block on an operation, you instead perform a yield return, and when the operation completes, you call the MoveNext method, which resumes execution of the function until the next time it needs to wait for something and performs a yield return.

    It's so simple it's magic.

    Additional iterator-related reading:

  • The Old New Thing

    The implementation of iterators in C# and its consequences (part 3)

    • 10 Comments

    I mentioned that there was an exception to the general statement that the conversion of an iterator into traditional C# code is something you could have done yourself. That's true, and it was also a pun, because the exception is exception handling.

    If you have a try ... finally block in your iterator, the language executes the finally block under the following conditions:

    • After the last statement of the try block is executed. (No surprise here.)
    • When an exception propagates out of the try block. (No surprise here either.)
    • When execution leaves the try block via yield break.
    • When the iterator is Disposed and the iterator body was trapped inside a try block at the time.

    That last case can occur if somebody decides to abandon the enumerator before it is finished.

    IEnumerable<int> CountTo10()
    {
     try {
      for (int i = 1; i <= 10; i++) {
       yield return i;
      }
     } finally {
      System.Console.WriteLine("finally");
     }
    }
    
    foreach (int i in CountTo10()) {
     System.Console.WriteLine(i);
     if (i == 5) break;
    }
    

    This code fragment prints "1 2 3 4 5 finally".

    If you think about it, this behavior is completely natural. You want the finally block to execute when the try block is finished executing, either by normal or abnormal means. Although control leaves the try block during the yield return, it comes back when the caller asks for the next item from the enumerator, so execution of the try block isn't finished yet. The try is finished executing after the last statement completes, an exception is thrown past it, or execution is abandoned when the enumerator is prematurely destroyed.

    And this is exactly what you want when you use the finally block to clean up resources used by the try block.

    Now, technically, you can write this yourself without using iterators, but it's pretty ugly. You'll need more internal state variables to keep track of whether the try block is still active and whether the exit of the try block is temporary (due to yield return) or permanent. It's a real pain in the neck, however, so you probably are better off letting the compiler do the work for you.

  • The Old New Thing

    Why does Explorer generate a page fault every two seconds?

    • 31 Comments

    If you fire up Task Manager and look at the page fault count, you'll see that even when nothing is going on, Explorer takes a page fault every two seconds. Isn't this bad? I though you weren't supposed to poll.

    Here's an interesting experiment: Change your update speed to High. Wow, the page fault rate quadruples to a page fault every half second. At this point, you should start suspecting some sort of Heisenbehavior, that is, that the behavior of the system is changing due to your act of observing it.

    The page faults are coming from the CPU meter in the notification area. At each update, Task Manager sets a new icon into the notification area, and Explorer resizes it from the default icon size (which is the size of the icon that Task Manager hands it) to the notification icon size. To obtain the best quality image, the taskbar uses the LR_COPYFROMRESOURCE flag. This means that the window manager goes back to taskmgr.exe to locate the best match, which in turn triggers a soft page fault. It's a soft page fault since the information is already in the cache (after all, we access it every two seconds!), so no actual disk access occurs. But it still shows up as a page fault, and that makes some people nervous.

    What could Task Manager do to avoid triggering this false alarm and freaking people out? Well, when it calls Shell_NotifyIcon, it could pass icons that were loaded at the size GetSystemMetrics(SM_CXSMICON) by GetSystemMetrics(SM_CYSMICON). That way, when the notification area makes a copy of the icon, it won't need to be resized since it's already at the correct size.

    Now, there's really nothing wrong with the soft page faults aside from all the time the shell team has to spend explaining to people that nothing is actually wrong. Next time, we'll look at the wrong way of avoiding the soft page faults. Even though it's the wrong way, the exercise is still instructive.

    [Raymond is currently away; this message was pre-recorded.]

  • The Old New Thing

    Microspeak: Pencils down

    • 37 Comments

    I'm particularly fascinated by Microspeak terms which nobody actually knows the meaning of. You can defend jargon by saying that it's a shorthand way of talking in order to improve communication, but if nobody actually knows what it means, the in order to improve communication part is completely turned on its head. The Microspeak that allegedly allows people to communicate better ends up making them communicate worse.

    A colleague of mine introduced me to this new type of Microspeak. Our conversation takes place in an impromptu hallway meeting between the development manager and a few members of the team.

    Team member: "What does this mean for our milestone? I'm assuming it means that our features are code complete."

    Development manager: It means you are pencils down.

    Team member: (confused) "What does 'pencils down' mean?"

    — It means your features are done for the milestone.

    "Where did that term come from?"

    — A meeting with «senior executive».

    "Okay, so what does 'done' mean?"

    — It means you are pencils down.

    "What exactly does that mean?"

    — It means you don't have to write any new code for your features.

    "Oh, okay. So it means code complete."

    — No, that means something different.

    "How are they different?"

    — I don't know.

    "Has anyone sent any email defining what 'pencils down' means?"

    — Not that I know of.

    My colleague has yet to find anybody who can provide a definition of the term pencils down.

    This sort of confirms what my colleague Michael Grier mentioned in a comment: The intended purpose for this jargon is not to communicate with the people who work for you but to impress the people you work for.

Page 1 of 4 (37 items) 1234