
Before we talk specifically about the yield keyword, let’s review a few constructs you probably use every day, namely collection classes like lists and arrays.  We’re quite used to traversing these with a simple foreach loop, and what enables us to do so is that these types implement the System.Collections.IEnumerable interface.

IEnumerable is a rather simple interface that requires implementing a single method, GetEnumerator, which returns an object that implements another interface, IEnumerator. IEnumerator, in turn, encapsulates two methods and a property:

MoveNext advances to the next element in the sequence or collection; it returns false once it has moved past the last element.
Current returns the element (as a System.Object) at the current position in the collection.
Reset moves the enumerator back to its initial position, before the first element.
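
To make those three members a little more concrete, here is a rough sketch of what a foreach loop does on your behalf when it walks a non-generic IEnumerable.  The ManualIteration class and its Walk method are names I made up for illustration; the code simply calls GetEnumerator, MoveNext and Current by hand and collects what it sees.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper (not part of the framework): walks a sequence
// by hand, roughly the way a foreach loop does.
public static class ManualIteration
{
    // Returns the visited elements as a comma-separated string.
    public static string Walk(IEnumerable source)
    {
        List<string> parts = new List<string>();
        IEnumerator e = source.GetEnumerator();
        try
        {
            // MoveNext returns false once we are past the last element.
            while (e.MoveNext())
            {
                object item = e.Current;   // element at the current position
                parts.Add(item.ToString());
            }
        }
        finally
        {
            // foreach also disposes the enumerator if it is disposable.
            IDisposable d = e as IDisposable;
            if (d != null)
                d.Dispose();
        }
        return string.Join(",", parts.ToArray());
    }
}
```

Calling ManualIteration.Walk(new int[] { 1, 3, 6 }) visits the same elements, in the same order, as a foreach over that array would.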

 
Now, let’s say we wanted to build a class that returns the first n triangular numbers.  One way to do so would be the following:

   1: public class TriNumbers : IEnumerable
   2: {
   3:     private Int32[] _nums;
   4:     public TriNumbers(Int32 n)
   5:     {
   6:         if (n <= 0)
   7:             throw new System.ArgumentOutOfRangeException();
   8:  
   9:         _nums = new Int32[n];
  10:         _nums[0] = 1;
  11:         for (Int32 i = 1; i < n; i++)
  12:             _nums[i] = _nums[i-1] + i + 1;
  13:     }
  14:  
  15:     public IEnumerator GetEnumerator()
  16:     {
  17:         return _nums.GetEnumerator();
  18:     }
  19: }

You can then iterate over it with code like this:

class Program
{
    static void Main(string[] args)
    {
        TriNumbers triNums = new TriNumbers(10);
        foreach (var val in triNums)
            Console.WriteLine(val);
        Console.ReadLine();
    }
}

In the implementation of TriNumbers, I sort of ‘cheated’ (line 17) by just deferring the iteration work to the iterator I get for free from the underlying integer array, which is really what’s storing my sequence. 

Not that cheating is wrong in this case, but it does require my array to be populated from the get-go, in the constructor.  Initializing an array of ten elements is no big deal, but what if the argument to the constructor were much larger, and there were many instances of this class in memory?  And what if I only end up iterating over the first couple of values?  Clearly I’m using up memory here that isn’t really needed, and I’m taking time to calculate every element of the sequence when I’m not even sure I need them all.  If the calculation were CPU-intensive or required a series of database or service calls to build the list, there could be a lot of wasted cycles dedicated to populating data that might never be requested in the application.

To make the class a little less wasteful up front, I can implement the IEnumerator interface myself, something like this:

public class TriNumbers : IEnumerable
{
    private Int32 _limit = 0;
    public TriNumbers(Int32 n)
    {
        if (n <= 0)
            throw new System.ArgumentOutOfRangeException();
        _limit = n;
    }

    public IEnumerator GetEnumerator()
    {
        // Hand out a fresh enumerator on every call so that separate
        // foreach loops never share (or inherit) iteration state.
        return new TriNumbersEnumerator(_limit);
    }

    private class TriNumbersEnumerator : IEnumerator
    {
        private Int32 _limit = 0;
        private Int32 _index = 0;   // 1-based position; 0 means "not started"
        private Int32 _value = 0;   // triangular number at the current position
        public TriNumbersEnumerator(Int32 n)
        {
            _limit = n;
        }

        public object Current
        {
            get
            {
                if (_index == 0)
                    throw new System.InvalidOperationException(
                        "Enumeration has not started. Call MoveNext.");
                else
                    return _value;
            }
        }

        public bool MoveNext()
        {
            _index++;
            if (_index <= _limit)
            {
                // The next triangular number is the previous one plus the new index.
                _value += _index;
                return true;
            }
            else
            {
                // Past the end: rewind to the initial state.
                Reset();
                return false;
            }
        }

        public void Reset()
        {
            _index = 0;
            _value = 0;
        }
    }
}

That works, but there’s an easier way!  Enter yield – not much longer than the original implementation, and certainly less wasteful of space and CPU cycles should the entire sequence not be enumerated.

   1: public class TriNumbers : IEnumerable
   2: {
   3:     private Int32 _limit = 0;
   4:     public TriNumbers(Int32 n)
   5:     {
   6:         if (n <= 0)
   7:             throw new System.ArgumentOutOfRangeException();
   8:         _limit = n;
   9:     }
  10:  
  11:     public IEnumerator GetEnumerator()
  12:     {
  13:         Int32 val = 0;
  14:         for (Int32 i = 0; i < _limit; i++)
  15:         {
  16:             val += i + 1;
  17:             yield return val;
  18:         }
  19:     }
  20: }
 
The magic is on line 17, with the yield return statement.  Each time this statement is reached, the current value of val is handed back to the caller as the enumerator’s Current value (namely, what the foreach in the calling program will see).  The current location and local state of the GetEnumerator method are saved, so the next time the foreach asks for an element (by calling MoveNext), execution resumes right after the yield return and we get the next value from the for loop on line 14.
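
If you’d like to convince yourself that the values really are produced on demand, here is a small sketch along the same lines as the class above.  CountingTriNumbers, its Computed property and the LazyDemo helper are hypothetical names I’ve added purely for illustration; the only change is a counter that records how many values have actually been calculated.

```csharp
using System;
using System.Collections;

// Hypothetical variant of TriNumbers that counts how many values
// the iterator body has actually calculated.
public class CountingTriNumbers : IEnumerable
{
    private readonly int _limit;

    // Number of triangular numbers calculated so far.
    public int Computed { get; private set; }

    public CountingTriNumbers(int n)
    {
        if (n <= 0)
            throw new ArgumentOutOfRangeException("n");
        _limit = n;
    }

    public IEnumerator GetEnumerator()
    {
        int val = 0;
        for (int i = 0; i < _limit; i++)
        {
            val += i + 1;
            Computed++;          // runs only when the caller asks for this element
            yield return val;
        }
    }
}

public static class LazyDemo
{
    // Reads the first 'take' values and reports how many were calculated.
    public static int ComputedAfterTaking(int limit, int take)
    {
        CountingTriNumbers nums = new CountingTriNumbers(limit);
        int seen = 0;
        foreach (object v in nums)
        {
            seen++;
            if (seen == take)
                break;           // stop early; the remaining values are never calculated
        }
        return nums.Computed;
    }
}
```

With a limit of one million and only three values read, LazyDemo.ComputedAfterTaking(1000000, 3) reports that just three values were calculated, which is exactly the saving the array-based version could not offer.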

While reading IL is not something I do often, it’s interesting to look at what gets generated when using the yield return construct.  Using IL DASM, you can see a newobj call to a compiler-generated class with a name ending in d__0, essentially indicating that there is a new class being constructed under the covers.

IL DASM shows that this class implements the IEnumerator interface (the Current property and the Reset and MoveNext methods), so essentially the yield return provides some "syntactic sugar" (more or less) for code similar to the TriNumbersEnumerator class that I wrote above.
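
To give a feel for the shape of that hidden class, here is a loose, hand-written approximation.  This is not the compiler’s actual output: TriNumbersStateMachine and its field names are my own invention, and the real generated class differs in detail.  The idea it illustrates is that the locals of GetEnumerator (i and val) become fields, and a state field remembers where to resume on the next MoveNext call.

```csharp
using System;
using System.Collections;

// Hypothetical, simplified stand-in for the class the compiler generates
// for the yield-based GetEnumerator method.
public sealed class TriNumbersStateMachine : IEnumerator
{
    private int _state;          // 0 = not started, 1 = paused at yield return, -1 = finished
    private object _current;     // value most recently yielded
    private readonly int _limit;
    private int _i;              // was the local loop variable i
    private int _val;            // was the local variable val

    public TriNumbersStateMachine(int limit)
    {
        _limit = limit;
    }

    public object Current
    {
        get { return _current; }
    }

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:              // first call: run the code before the loop
                _val = 0;
                _i = 0;
                break;
            case 1:              // resuming after yield return: finish the loop step
                _i++;
                break;
            default:             // already finished
                return false;
        }

        if (_i < _limit)
        {
            _val += _i + 1;
            _current = _val;     // this is the "yield return val"
            _state = 1;
            return true;
        }

        _state = -1;
        return false;
    }

    // Compiler-generated iterators do not support Reset.
    public void Reset()
    {
        throw new NotSupportedException();
    }
}
```

Stepping a TriNumbersStateMachine built with a limit of 3 yields 1, 3 and 6 and then reports the end of the sequence, just as the yield-based GetEnumerator does.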

The real story is a tad deeper than that though, and Wes Dyer does a great job of looking more closely at the generated class in his blog post.  And if you’d like to do all this in Visual Basic, which unfortunately doesn’t have iterators (yet?), take a look at Matthew Doig’s blog.