Under The Hood - Matt Pietrek

Fun with Iterators and state machines

I recently was given a piece of C# code with statements like:

 

yield return value;

 

and

 

yield break;

 

This was the beginning of my descent into the loopy world of C# 2.0 iterators.  It took me awhile to wrap my head around them, and when I tried to explain them to other team members I got looks of total confusion.  (I admit it, some of us don't keep the C# 2.0 specification on the night stand. :-)  If you're totally familiar with iterators, you can skip this blog entry.  If not, plunge in with me. Also, I'm not trying to give a comprehensive overview of C# iterators.  (For that, see the MSDN article) Rather, I'm focused on the cool compiler and run time magic that makes them possible.

 

<<Disclaimer:  This code is based on the Whidbey Beta 1 version of C#, and various details may change before release.>>

 

Consider the following program:

 

//
// Iterator1.cs - Requires C# 2.0 or later
//
using System;
using System.Collections.Generic;

class SomeContainer
{
     
public IEnumerable<string> MyIterator()
     
{
            yield
return "Foo";

            yield
return "Bar";

            yield
return "Baz";
     
}
}

class foo
{
     
static void  Main()
     
{
            SomeContainer container
= new SomeContainer();
           
           
foreach( string s in container.MyIterator() )
                  Console.WriteLine( s );
     
}
}

 

Compiling the program ( "csc iterator1.cs" ) and running it gives these results:

 

Foo

Bar

Baz

 

If you try to mentally trace the flow of control, you might notice something odd. What the heck does a "yield return" statement do? Looking at the MyIterator method, it seems like there's two possibilities:

 

1) The CPU executes only part of MyIterator() at any given time. At points, control comes back to the method, and it executes another chunk.

or

2) The entire MyIterator method runs to completion, and all the possible return values are put into a temporary collection.

 

It's easy enough to test the first hypothesis.  Let's add some WriteLine calls to MyIterator and see when they appear:

 

//
// Iterator2.cs - Requires C# 2.0 or later
//
using System;
using System.Collections.Generic;

class SomeContainer
{
     
public IEnumerable<string> MyIterator()
     
{
            Console.WriteLine(
"Entering MyIterator" );

            yield
return "Foo";
            Console.WriteLine(
"After Foo" );

            yield
return "Bar";
            Console.WriteLine(
"After Bar" );

            yield
return "Baz";
            Console.WriteLine(
"After Baz" );
     
}
}

class foo
{
     
static void  Main()
     
{
            SomeContainer container
= new SomeContainer();
           
           
foreach( string s in container.MyIterator() )
                  Console.WriteLine( s );
     
}
}

 

Compiling the new program ( "csc iterator2.cs" ) and running it gives these results:

 

Entering MyIterator

Foo

After Foo

Bar

After Bar

Baz

After Baz

 

Hmm...   Based on these results, it seems that the execution path definitely bounces back and forth between the Main() method and the MyIterator() method.  What the heck is going on here?

 

Since I work on the Visual Studio team, I could have asked a few questions, but some times it's more fun to figure things out on your own.  Luckily, with a tool like ILDASM, it's easy enough to discern what's happening.

 

First, let's recompile with Iterator2.cs with optimizations ( "CSC /O+ iterator2.cs" )  and see what ILDasm shows us:

___[MOD] iterator2.exe
| M A N I F E S T
|___[CLS] SomeContainer
| | .class private auto ansi beforefieldinit
| |___[CLS] <MyIterator>d__0
| | | .class nested private auto ansi sealed beforefieldinit
| | | implements class [mscorlib]System.Collections.Generic.'IEnumerable`1'<string>
| | | implements [mscorlib]System.Collections.IEnumerable
| | | implements class [mscorlib]System.Collections.Generic.'IEnumerator`1'<string>
| | | implements [mscorlib]System.Collections.IEnumerator
| | | implements [mscorlib]System.IDisposable
| | | .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 ) ...
| | |___[FLD] <>1__state : private int32
| | |___[FLD] <>2__current : private string
| | |___[FLD] <>4__this : public class SomeContainer
| | |___[MET] .ctor : void(int32)
| | |___[MET] MoveNext : bool()
| | | System.Collections.Generic.IEnumerable<System.String>.GetEnumerator : class [mscorlib]System.Collections.Generic.'IEnumerator`1'<string>()
| | | System.Collections.Generic.IEnumerator<System.String>.get_Current : string()
| | |___[MET] System.Collections.IEnumerable.GetEnumerator : class [mscorlib]System.Collections.IEnumerator()
| | |___[MET] System.Collections.IEnumerator.Reset : void()
| | |___[MET] System.Collections.IEnumerator.get_Current : object()
| | |___[MET] System.IDisposable.Dispose : void()
| | |___[PTY] 'System.Collections.Generic.IEnumerator<System.String>.Current' : instance string()
| | |___[PTY] System.Collections.IEnumerator.Current : instance object()
| |
| |___[MET] .ctor : void()
| | MyIterator : class [mscorlib]System.Collections.Generic.'IEnumerable`1'<string>()
|
|___[CLS] foo
| | .class private auto ansi beforefieldinit
| |___[MET] .ctor : void()
| |___[STM] Main : void()
|

 For our very simple SomeContainer class, the C# compiler sure generated a lot of methods and fields.  Particularly are the variables <>1__state and <>2__current.  They sound suspiciously like parts of a state machine.  A state machine could explain how just portions of a method execute.

 

Let's dig a little deeper. Expand the MoveNext() method and examine the disassembly (which I've added a few extra lines to in order to increase readability):

 

.method private hidebysig newslot virtual final
        instance bool  MoveNext() cil managed
{
.override [mscorlib]System.Collections.IEnumerator::MoveNext
.override  method instance bool class [mscorlib]System.Collections.Generic.'IEnumerator`1'<string>::MoveNext()
// Code size       164 (0xa4)
.maxstack  2
.locals init (int32 V_0)
IL_0000:  ldarg.0
IL_0001:  ldfld      int32 SomeContainer/'<MyIterator>d__0'::'<>1__state'
IL_0006:  stloc.0
IL_0007:  ldloc.0
IL_0008:  switch     (
                      IL_0022,
                      IL_0047,
                      IL_006c,
                      IL_0091)
IL_001d:  br          IL_00a2

IL_0022:  ldarg.0
IL_0023:  ldc.i4.m1
IL_0024:  stfld      int32 SomeContainer/'<MyIterator>d__0'::'<>1__state'
IL_0029:  ldstr      "Entering MyIterator"
IL_002e:  call       void [mscorlib]System.Console::WriteLine(string)
IL_0033:  ldarg.0
IL_0034:  ldstr      "Foo"
IL_0039:  stfld      string SomeContainer/'<MyIterator>d__0'::'<>2__current'
IL_003e:  ldarg.0
IL_003f:  ldc.i4.1
IL_0040:  stfld      int32 SomeContainer/'<MyIterator>d__0'::'<>1__state'
IL_0045:  ldc.i4.1
IL_0046:  ret

IL_0047:  ldarg.0
IL_0048:  ldc.i4.m1
IL_0049:  stfld      int32 SomeContainer/'<MyIterator>d__0'::'<>1__state'
IL_004e:  ldstr      "After Foo"
IL_0053:  call       void [mscorlib]System.Console::WriteLine(string)
IL_0058:  ldarg.0
IL_0059:  ldstr      "Bar"
IL_005e:  stfld      string SomeContainer/'<MyIterator>d__0'::'<>2__current'
IL_0063:  ldarg.0
IL_0064:  ldc.i4.2
IL_0065:  stfld      int32 SomeContainer/'<MyIterator>d__0'::'<>1__state'
IL_006a:  ldc.i4.1
IL_006b:  ret

IL_006c:  ldarg.0
IL_006d:  ldc.i4.m1
IL_006e:  stfld      int32 SomeContainer/'<MyIterator>d__0'::'<>1__state'
IL_0073:  ldstr      "After Bar"
IL_0078:  call       void [mscorlib]System.Console::WriteLine(string)
IL_007d:  ldarg.0
IL_007e:  ldstr      "Baz"
IL_0083:  stfld      string SomeContainer/'<MyIterator>d__0'::'<>2__current'
IL_0088:  ldarg.0
IL_0089:  ldc.i4.3
IL_008a:  stfld      int32 SomeContainer/'<MyIterator>d__0'::'<>1__state'
IL_008f:  ldc.i4.1
IL_0090:  ret

IL_0091:  ldarg.0
IL_0092:  ldc.i4.m1
IL_0093:  stfld      int32 SomeContainer/'<MyIterator>d__0'::'<>1__state'
IL_0098:  ldstr      "After Baz"
IL_009d:  call       void [mscorlib]System.Console::WriteLine(string)
IL_00a2:  ldc.i4.0
IL_00a3:  ret
} // end of method '<MyIterator>d__0'::MoveNext

The listing breaks up into five distinct blocks.  Some of the key highlights include:

 

·                     IL_0000 - IL_001D: Load the value of the <>1__state field, and branch to one of  five different blocks in the code.  This is consistent with what a state machine will do.

·                     IL_0022 - IL_0024: Store the value "-1" into <>1__state.

·                     IL_0029 - IL_002E: Print "Entering MyIterator"

·                     IL_0033 - IL_0039: Store the string value "Foo" into the <>2__current field.

·                     IL_003E - IL_0040: Store the value "1" into <>1__state.

·                     IL_0045 - IL_0046: Make the method exit with a return value of 1.

 

Looking at another block (starting at IL_0047) we see a similar sequence of instructions.  The most key difference is at IL_0063 - IL_0065, where the <>1__state is set to '2", rather than "1".   In yet another block it's set to "3".  In short, the <>1__state field is a numeric value that keeps track of which state the MoveNext() method is in.  Each time MoveNext() is called, control begins at the top of the code, and branches to the appropriate handler block for that state. At the end of that block, <>1__state is updated to the next state.

 

What I've shown here is a rather simple example of C# iterators and how they're supported by rather sophisticated run time support which takes your code and creates a state machine around it.

 

If you think you've got iterators down, try wrapping your head around this slightly more complicated example:

 

//
// Iterator3.cs - Requires C# 2.0 or later
//
using System;
using System.Collections.Generic;

class SomeContainer
{
     
public IEnumerable<string> MyIterator()
     
{
            Console.WriteLine(
"Entering MyIterator" );

           
for ( int i = 0; i < 10; i++ )
           
{
                  yield
return i.ToString();
           
}
     
}
}

class foo
{
     
static void  Main()
     
{
            SomeContainer container
= new SomeContainer();
           
           
foreach( string s in container.MyIterator() )
                  Console.WriteLine( s );
     
}
}

 

Compile it with "csc iterator3.cs" and run the executable.  Pretty nifty, eh?

 

Next time, I'll show a very cool way of using C#'s built in state machine support to do idle time processing.  For instance, the Visual Studio IDE has a variety of code that needs to execute while the IDE is idle.  Using C# iterators, it's fairly easy to accomplish this.

Published Monday, July 26, 2004 2:04 PM by Matt Pietrek

Comments

 

Sanin Saracevic said:

This seems like a higher-level construct of a "standard" NT Fiber. Am I missing something?
July 26, 2004 3:12 PM
 

R. Andrew Lamonica said:

Does the method have to return "IEnumerable" or can the "yield return" be used to return other values? I can see where the lack of an Iterator would make it hard for the runtime to know when to restart at the begining of the function but it might be a fun trick to play with.

B.T.W. I could just check this but I don't seem to have a version of .NET that supports this construct. Is 2.0 Beta available to the public?
July 26, 2004 3:20 PM
 

Nick Parker said:

There is an article in MSDN covering Coroutines in .NET.

http://msdn.microsoft.com/msdnmag/issues/03/09/coroutinesinnet/
July 26, 2004 3:32 PM
 

Matt Pietrek said:

Sanin: When I first looked at Iterators, I also noticed the similarity to Win32 Fibers. However, Fibers aren't used in the C# implementation.

Andrew: The Whidbey beta can be downloaded: http://lab.msdn.microsoft.com/vs2005/get/default.aspx#express
July 26, 2004 3:46 PM
 

David's blog said:

July 26, 2004 5:32 PM
 

AT said:

An question - Why exceptions try/catch/finaly not allowed with yeild ?


See also:
Fibers, and other mad stuff that you shouldn’t use
http://blogs.msdn.com/greggm/archive/2004/06/07/150298.aspx
July 27, 2004 4:17 AM
 

Mike Taulty's Weblog said:

July 28, 2004 11:03 AM
 

.Net Security Blog said:

August 2, 2004 1:38 PM
 

Another Tired Idea said:

August 2, 2004 4:05 PM
 

Chad Humphries said:

Just thought I'd say thanks for the article. I really enjoyed it!
August 2, 2004 4:07 PM
 

drebin said:

Interesting..... but I don't get it. How or why would you ever use something like this???
August 2, 2004 5:18 PM
 

Amit said:

It seems that from the ILDASM code, the current state is modified _after_ the code in the yield block executes. This means that any construct that will cause the flow of execution to shift before state modification is either not allowed or will cause problems.

Isn't it better to modify the state first and then let the yield block execute?
August 2, 2004 8:54 PM
 

Julian - SerialN6kCom said:

Thanks, for the article.
August 2, 2004 10:48 PM
 

Timothy said:

Why not use collections?
August 3, 2004 1:35 AM
 

Michael Geary said:

It's interesting to compare these new C# iterators with the very similar ones in Python and Ruby. There's less code involved in those languages, making the iterators easier to understand.

To see how they compare, I translated some of the sample C# iterator code to Python and Ruby:

http://mg.to/2004/08/03/iterator-cpr
August 3, 2004 5:31 AM
 

Matt Pietrek said:

Timothy and drebin: The article I linked to in the main text (http://msdn.microsoft.com/msdnmag/issues/04/05/c20/) provides a much better "big picture" view of iterators than I did in this entry.

My short answer about the usefulness of iterators is that they make it much easier to implement your data as .NET collections that work with "foreach". In the past, implementing the code to support enumerating your data could be quite complicated. Iterators also make it very easy to support multiple ways of iterating over your data (e.g., first to last, last to first, every other item, etc...)
August 3, 2004 12:07 PM
 

Roshan James said:

Here are some potentially relevant writeups -

Implementation of Iterators in C#
http://pensieve.thinkingms.com/CommentView,guid,fd10bfa8-1aeb-4353-84c8-cd80e418424f.aspx

Implementation of Closures (Anonymous Methods) in C# 2.0
http://pensieve.thinkingms.com/CommentView,guid,9fe42970-09e3-44e2-a4d0-32d63139351a.aspx

These might be relevant too -

Iterators in Ruby
http://pensieve.thinkingms.com/CommentView,guid,dd0e61ef-6fb1-4156-9d16-81c20a6aa871.aspx

Warming up to using Iterators
http://pensieve.thinkingms.com/CommentView,guid,02da61dd-1e35-4a36-b46b-aa3a605b2ad6.aspx

The ‘Big Deal’ about Iterators
http://pensieve.thinkingms.com/CommentView,guid,d355277d-a942-4a50-aa03-d68413fd459c.aspx


CLR 2.0 Closures and Environment classes that capture arity
http://pensieve.thinkingms.com/CommentView,guid,92e51a8a-4d8d-4401-beb2-8b68a019d4bb.aspx


Matt, its surprising that a hardcore systems programming person like you is writing about implementation of managed compilers, very nice :)

cheers
Roshan James
August 4, 2004 3:56 AM
 

Paul Ballard said:

Congratulations, this blog has been featured on TheServerSide.NET
August 5, 2004 2:29 PM
 

Peli's Blog said:

August 10, 2004 11:17 AM
 

phool.org said:

August 17, 2004 2:01 PM
 

Peli's Blog said:

August 18, 2004 3:20 AM
 

Flier Lu said:

Ping Back来自:blog.csdn.net
August 22, 2004 11:28 AM
 

CCR 101 - Part 9: Iterators &laquo; Iodyne&#8217;s Weblog said:

July 13, 2008 6:35 AM
New Comments to this post are disabled

© 2008 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker