Welcome to MSDN Blogs Sign in | Join | Help

The Marvels of Monads

If the word "continuation" causes eyes to glaze over, then the word "monad" induces mental paralysis.  Perhaps, this is why some have begun inventing more benign names for monads.

These days, monads are the celebrities of programming language theory.  They gloss the cover of blogs and have been compared to everything from boxes of fruit to love affairs.  Nerds everywhere are exclaiming that the experience of understanding monads causes a pleasantly painful mental sensation.

Like continuations, monads are simpler than they sound and are very useful in many situations.  In fact, programmers write code in a variety of languages that implicitly use common monads without even breaking a sweat.

With all of the attention that monads get, why am I writing yet another explanation of monads?  Not to compare them to some everyday occurrence or to chronicle my journey to understanding.  I explain monads because I need monads.  They elegantly solve programming problems in a number of languages and contexts.

Introducing Monads

Monads come from category theoryMoggi introduced them to computer scientists to aid in the analysis of the semantics of computations.  In an excellent paper, The Essence of Functional Programming, Wadler showed that monads are generally useful in computer programs to compose together functions which operate on amplified values rather than values.  Monads became an important part of the programming language Haskell where they tackle the awkward squad: IO, concurrency, exceptions, and foreign-function calls.

Monads enjoy tremendous success in Haskell, but like an actor who does well in a particular role, monads are now stereotyped in the minds of most programmers as useful only in pure lazy functional languages.  This is unfortunate, because monads are more broadly applicable.

Controlling Complexity

Composition is the key to controlling complexity in software.  In The Structure and Interpretation of Computer Programs, Abelson and Sussman argue that composition beautifully expresses complex systems from simple patterns.

In our study of program design, we have seen that expert programmers control the complexity of their designs with the same general techniques used by designers of all complex systems. They combine primitive elements to form compound objects, they abstract compound objects to form higher-level building blocks, and they preserve modularity by adopting appropriate large-scale views of system structure.

One form of composition, function composition, succinctly describes the dependencies between function calls.  Function composition takes two functions and plumbs the result from the second function into the input of the first function, thereby forming one function.

public static Func<T, V> Compose<T, U, V>(this Func<U, V> f, Func<T, U> g)
{
    return x => f(g(x));
}

For example, instead of applying g to the value x and then applying f to the result, compose f with g and then apply the result to the value x.  The key difference is the abstraction of the dependency between f and g.

var r = f(g(x));         // without function composition
var r = f.Compose(g)(x); // with function composition

Given the function Identity, function composition must obey three laws.

public static T Identity<T>(this T value)
{
    return value;
}

1.  Left identity

     Identity.Compose(f) = f

2.  Right identity

     f.Compose(Identity) = f

3.  Associative

     f.Compose(g.Compose(h)) = (f.Compose(g)).Compose(h)

Often, values are not enough.  Constructed types amplify values.  The type IEnumerable<T> represents a lazily computed list of values of type T.  The type Nullable<T> represents a possibly missing value of type T.  The type Func<Func<T, Answer>, Answer> represents a function, which returns an Answer given a continuation, which takes a T and returns an Answer.  Each of these types amplifies the type T.

Suppose that instead of composing functions which return values, we compose functions which take values and return amplified values.  Let M<T> denote the type of the amplified values.

public static Func<T, M<V>> Compose<T, U, V>(this Func<U, M<V>> f, Func<T, M<U>> g)
{
    return x => f(g(x)); // error, g(x) returns M<U> and f takes U
}

Function composition fails, because the return and input types do not match.  Composition with amplified values requires a function that accesses the underlying value and feeds it to the next function.  Call that function "Bind" and use it to define function composition.

public static Func<T, M<V>> Compose<T, U, V>(this Func<U, M<V>> f, Func<T, M<U>> g)
{
    return x => Bind(g(x), f);
}

The input and output types determine the signature of Bind.  Therefore, Bind takes an amplified value, M<U>, and a function from U  to M<V>, and returns an amplified value, M<V>.

public static M<V> Bind<U, V>(this M<U> m, Func<U, M<V>> k)

The body of Bind depends on the mechanics of the amplified values, M<T>.  Each amplified type will need a distinct definition of Bind.

In addition to Bind, define a function which takes an unamplified value and amplifies it.  Call this function "Unit".

public static M<T> Unit<T>(this T value)

Together the amplified type, M<T>, the function Bind, and the function Unit enable function composition with amplified values.

Meet the Monads

Viola, we have invented monads.

Monads are a triple consisting of a type, a Unit function, and a Bind function.  Furthermore, to be a monad, the triple must satisfy three laws:

1.  Left Identity

     Bind(Unit(e), k) = k(e)

2.  Right Identity

     Bind(m, Unit) = m

3.  Associative

     Bind(m, x => Bind(k(x), y => h(y)) = Bind(Bind(m, x => k(x)), y => h(y))

The laws are similar to those of function composition.  This is not a coincidence.  They guarantee that the monad is well behaved and composition works properly.

To define a particular monad, the writer supplies the triple, thereby specifying the mechanics of the amplified values.

The Identity Monad

The simplest monad is the Identity monad.  The type represents a wrapper containing a value.

class Identity<T>
{
    public T Value { get; private set; }
    public Identity(T value) { this.Value = value; }
}

The Unit function takes a value and returns a new instance of Identity, which wraps the value.

static Identity<T> Unit<T>(T value)
{
    return new Identity<T>(value);
}

The bind function takes an instance of Identity, unwraps the value, and invokes the delegate, k, with the unwrapped value.  The result is a new instance of Identity.

static Identity<U> Bind<T,U>(Identity<T> id, Func<T,Identity<U>> k)
{
    return k(id.Value);
}

Consider a simple program that creates two Identity wrappers and performs an operation on the wrapped values.  First, bind x to the value within the wrapper containing the value five.  Then, bind y to the value within the wrapper containing the value six.  Finally, add the values, x and y, together.  The result is an instance of Identity wrapping the value eleven.

var r = Bind(Unit(5), x =>
            Bind(Unit(6), y =>
                Unit(x + y)));

Console.WriteLine(r.Value);

While this works, it is rather clumsy.  It would be nice to have syntax for dealing with the monad.  Fortunately, we do.

C# 3.0 introduced query comprehensions which are actually monad comprehensions in disguise.  We can rewrite the identity monad to use LINQ.  Perhaps, it should have been called LINM (Language INtegrated Monads), but it just doesn't have the same ring to it.

Rename the method Unit to ToIdentity and Bind to SelectMany.  Then, make them both extension methods.

public static Identity<T> ToIdentity<T>(this T value)
{
    return new Identity<T>(value);
}

public static Identity<U> SelectMany<T, U>(this Identity<T> id, Func<T, Identity<U>> k)
{
    return k(id.Value);
}

The changes impact the calling code.

var r = 5.ToIdentity().SelectMany(
            x => 6.ToIdentity().SelectMany(
                y => (x + y).ToIdentity()));

Console.WriteLine(r.Value);

Equivalent methods are part of the standard query operators defined for LINQ.  However, the standard query operators also include a slightly different version of SelectMany for performance reasons.  It combines Bind with Unit, so that lambdas are not deeply nested.  The signature is the same except for an extra argument that is a delegate which takes two arguments and returns a value.  The delegate combines the two values together.  This version of SelectMany binds x to the wrapped value, applies k to x, binds the result to y, and then applies the combining function, s, to x and y.  The resultant value is wrapped and returned.

public static Identity<V> SelectMany<T, U, V>(this Identity<T> id, Func<T, Identity<U>> k, Func<T,U,V> s)
{
    return id.SelectMany(x => k(x).SelectMany(y => s(x, y).ToIdentity()));
}

Of course, we can remove some of the code from the generalized solution by using our knowledge of the Identity monad.

public static Identity<V> SelectMany<T, U, V>(this Identity<T> id, Func<T, Identity<U>> k, Func<T,U,V> s)
{
    return s(id.Value, k(id.Value).Value).ToIdentity();
}

The call-site does not need to nest lambdas.

var r = 5.ToIdentity()
         .SelectMany(x => 6.ToIdentity(), (x, y) => x + y);

Console.WriteLine(r.Value);

With the new definition of SelectMany, programmers can use C#'s query comprehension syntax.  The from notation binds the introduced variable to the value wrapped by the expression on the right.  This allows subsequent expressions to use the wrapped values without directly calling SelectMany.

var r = from x in 5.ToIdentity()
        from y in 6.ToIdentity()
        select x + y;

Since the original SelectMany definition corresponds directly to the monadic Bind function and because the existence of a generalized transformation has been demonstrated, the remainder of the post will use the original signature.  But, keep in mind that the second definition is the one used by the query syntax.

The Maybe Monad

The Identity monad is an example of a monadic container type where the Identity monad wrapped a value.  If we change the definition to contain either a value or a missing value then we have the Maybe monad.

Again, we need a type definition.  The Maybe type is similar to the Identity type but adds a property denoting whether a value is missing.  It also has a predefined instance, Nothing, representing all instances lacking a value.

class Maybe<T>
{
    public readonly static Maybe<T> Nothing = new Maybe<T>();
    public T Value { get; private set; }
    public bool HasValue { get; private set; }
    Maybe()
    {
        HasValue = false;
    }
    public Maybe(T value)
    {
        Value = value;
        HasValue = true;
    }
}

The Unit function takes a value and constructs a Maybe instance, which wraps the value.

public static Maybe<T> ToMaybe<T>(this T value)
{
    return new Maybe<T>(value);
}

The Bind function takes a Maybe instance and if there is a value then it applies the delegate to the contained value.  Otherwise, it returns Nothing.

public static Maybe<U> SelectMany<T, U>(this Maybe<T> m, Func<T, Maybe<U>> k)
{
    if (!m.HasValue)
        return Maybe<U>.Nothing;
    return k(m.Value);
}

The programmer can use the comprehension syntax to work with the Maybe monad.  For example, create an instance of Maybe containing the value five and add it to Nothing.

var r = from x in 5.ToMaybe()
        from y in Maybe<int>.Nothing
        select x + y;

Console.WriteLine(r.HasValue ? r.Value.ToString() : "Nothing");

The result is "Nothing".  We have implemented the null propagation of nullables without explicit language support.

The List Monad

Another important container type is the list type.  In fact, the list monad is at the heart of LINQ.  The type IEnumerable<T> denotes a lazily computed list.

The Unit function takes a value and returns a list, which contains only that value.

public static IEnumerable<T> ToList<T>(this T value)
{
    yield return value;
}

The Bind function takes an IEnumerable<T>, a delegate, which takes a T and returns an IEnumerable<U>, and returns an IEnumerable<U>.

public static IEnumerable<U> SelectMany<T, U>(this IEnumerable<T> m, Func<T, IEnumerable<U>> k)
{
    foreach (var x in m)
        foreach (var y in k(x))
            yield return y;
}

Now, the programmer can write the familiar query expressions with IEnumerable<T>.

var r = from x in new[] { 0, 1, 2 }
        from y in new[] { 0, 1, 2 }
        select x + y;

foreach (var i in r)
    Console.WriteLine(i);

Remember that it is the monad that enables the magic.

The Continuation Monad

The continuation monad answers the question that was posed at the end of the last post: how can a programmer write CPS code in a more palatable way?

The type of the continuation monad, K, is a delegate which when given a continuation, which takes an argument and returns an answer, will return an answer.

delegate Answer K<T,Answer>(Func<T,Answer> k);

The type K fundamentally differs from types Identity<T>, Maybe<T>, and IEnumerable<T>.  All the other monads represent container types and allow computations to be specified in terms of the values rather than the containers, but the continuation monad contains nothing.  Rather, it composes together continuations the user writes.

To be a monad, there must be a Unit function which takes a T and returns a K<T,Answer> for some answer type.

public static K<T, Answer> ToContinuation<T, Answer>(this T value)

What should it do?  When in doubt, look to the types.   The method takes a T and returns a function, which takes a function from T to Answer, and returns an Answer.  Therefore, the method must return a function and the only argument of that function must be a function from T to Answer.  Call the argument c.

return (Func<T, Answer> c) => ...

The body of the lambda must return a value of type Answer.  Values of type Func<T,Answer> and a T are available.  Apply c to value and the result is of type Answer.

return (Func<T, Answer> c) => c(value);

To be a monad, Bind must take a K<T,Answer> and a function from T to K<U, Answer> and return a K<U, Answer>.

public static K<U, Answer> SelectMany<T, U, Answer>(this K<T, Answer> m, Func<T, K<U, Answer>> k)

But what about the body?  The result must be of type K<U, Answer>, but how is a result of the correct type formed?

Expand K's definition to gain some insight.

return type

     Func<Func<U, Answer>, Answer>

m's type

     Func<Func<T, Answer>, Answer>

k's type

     Func<T, Func<Func<U, Answer>, Answer>>

Applying k to a value of type T results in a value of type K<U,Answer>, but no value of type T is available.  Build the return type directly by constructing a function, which takes a function from U to Answer.  Call the parameter c.

return (Func<U,Answer> c) => ...

The body must be type of Answer so that the return type of Bind is K<U,Answer>.  Perhaps, m could be applied to a function from T to Answer.  The result is a value of type Answer.

return (Func<U,Answer> c) => m(...)

The expression inside the invocation of m must be of type Func<T,Answer>.  Since there is nothing of that type, construct the function by creating a lambda with one parameter, x, of type T.

return (Func<U,Answer> c) => m((T x) => ...)

The body of this lambda must be of type Answer.  Values of type T, Func<U,Answer>, and Func<T,Func<Func<U,Answer>, Answer>> haven't been used yet.  Apply k to x.

return (Func<U,Answer> c) => m((T x) => k(x)...)

The result is a value of type Func<Func<U,Answer>,Answer>.  Apply the result to c.

return (Func<U,Answer> c) => m((T x) => k(x)(c));

The continuation monad turns the computation inside out.  The comprehension syntax can be used to construct continuations.

Construct a computation, which invokes a continuation with the value seven.  Pass this computation to another computation, which invokes a continuation with the value six.   Pass this computation to another computation, which invokes a continuation with the result of adding the results of the first two continuations together.  Finally, pass a continuation, which replaces "1"s with "a"s, to the result.

var r = from x in 7.ToContinuation<int,string>()
        from y in 6.ToContinuation<int,string>()
        select x + y;

Console.WriteLine(r(z => z.ToString().Replace('1', 'a'))); // displays a3

The continuation monad does the heavy-lifting of constructing the continuations.

Monadic Magic

Beautiful composition of amplified values requires monads.  The Identity, Maybe, and IEnumerable monads demonstrate the power of monads as container types.  The continuation monad, K, shows how monads can readily express complex computation.

Stay tuned for more with monads.  Until then, see what monads can do for you.

Posted by wesdyer | 26 Comments

Continuation-Passing Style

There are some technical words that cause quite a stir even amongst geeks.  When someone says the word "continuation", people's eyes glaze over and they seek the first opportunity to change the subject.  The stir is caused because most people don't understand what a continuation is or why someone would want to use one.  This is unfortunate, because they really are rather more simple than they sound.

Defining Continuations

A continuation represents the remainder of a computation given a point in that computation.  For example, suppose that two methods are defined: M and Main.  The method Main invokes M and then writes "Done" to the console.  The method M assigns x the value five, invokes F, increments x, and writes x to the console.

static void M()
{
    var x = 5;
    F();  // <----------- Point in the computation
    ++x;
    Console.WriteLine(x);
}

static void Main()
{
    M();
    Console.WriteLine("Done");
}

The continuation of the computation at the invocation of F is the remainder of the program execution beginning with incrementing x in the method M.  In this case, the continuation includes incrementing x, writing x, returning to Main, and writing "Done" to the console.

Some languages give programmers explicit access to continuations.  For example, Scheme has a function which calls a function passing a function representing the current continuation.  The function is aptly named call with current continuation or call/cc.  If such a function existed in .NET it might return a T given a delegate that returns a T given a delegate from T to T.

T CallCC<T>(Func<Func<T,T>, T> f)

Continuations are the functional counterparts of GOTOs both in power and inscrutability.  They can express arbitrary control flow like coroutines and exceptions while bewildering some of the brightest programmers.

Returning Values with Continuations

The simplest use of a continuation is to simulate returning out of a function.  Suppose that .NET had the function CallCC.  If Main calls a function Foo passing the value four and if Foo immediately invokes CallCC with a lambda that binds Return to the continuation at the point of the call to CallCC, then when Return is invoked inside of the lambda with the value four, computation will immediately jump out of the lambda to the point after CallCC but before the return with the value four on the stack.  This happens regardless what of occurs after the invocation of Return inside the lambda.

static int Foo(int n)
{
    return CallCC<int>(Return =>
        {
            Return(n);
            return n + 1;
        });
}

static void Main()
{
    Foo(4);
}

The result of running the program is therefore four and not five.

All of this demonstrates that returning from a function is equivalent to invoking the continuation defined at the function's call-site.  When a function "returns", it implicitly invokes the continuation of its call-site.

When people explain continuations, they usually discuss stack frames and instruction pointers.  It is easy to see why.  The implicit invocation of the "return" continuation restores the stack frame at the call-site and sets the instruction pointer to the instruction immediately after the call-site.  This is what invoking a continuation does: restore the appropriate stack frame and set the instruction pointer.

Continuation-Passing Style

It is still possible to use continuations if a language does not support a function like call/cc.  A programmer can explicitly construct the continuation of a computation and pass it directly to a function.

To illustrate this transformation, suppose that a function, Identity, is defined that returns the value it is given.

static T Identity<T>(T value)
{
    return value;
}

The return statement in Identity implicitly invokes the continuation of the call-site causing computation to leave Identity and resume at the point immediately after the invocation of Identity at the call-site.  If Main contains only a call to WriteLine passing the result of the call to Identity, then computation resumes with the call to WriteLine.

static void Main()
{
    Console.WriteLine(Identity("foo"));
}

However, instead of implicitly invoking Identity's continuation, the programmer can pass the continuation to Identity explicitly.  An extra argument is added to Identity which is a void-returning delegate with one parameter that is the same type as the return type of the former Identity function.

static void Main()
{
    Identity("foo", s => Console.WriteLine(s));
}

static void Identity<T>(T value, Action<T> k)
{
    k(value);
}

At the call-site, the remainder of the computation has moved into the lambda passed to Identity.  The continuation is explicitly passed.  On the callee's side, the return statement is replaced by the invocation of the continuation parameter, k, with the return value.

This pattern is called continuation-passing style or CPS.

Converting to CPS

Now that we know enough to be dangerous, let's see what we can do.  Consider the typical Max function.

static int Max(int n, int m)
{
    if (n > m)
        return n;
    else
        return m;
}

To convert this function to CPS, follow these steps:

1.  Change the return type to void

2.  Add an extra argument of type Action<T> where T is the original return type

3.  Replace all return statements with invocations of the new continuation argument passing the expression used in the return statement

Applying these steps to the integer version of Max, the function now returns void, takes an extra parameter, k, of type Action<int>, and invokes k everywhere a return statement appeared.

static void Max(int n, int m, Action<int> k)
{
    if (n > m)
        k(n);
    else
        k(m);
}

To convert the call-site to CPS:

1.  Remove all of the remaining computation after the call-site

2.  Put the remaining computation in the body of the lambda corresponding to the continuation argument

For example, suppose the user defined a method Main which invokes WriteLine with the result of applying Max to three and four.

static void Main()
{
    Console.WriteLine(Max(3, 4));
}

The remaining computation after the invocation of Max is the call to WriteLine.  Therefore, the call to WriteLine is moved into the lambda representing the continuation passed to Max.

static void Main()
{
    Max(3, 4, x => Console.WriteLine(x));
}

The steps for converting the call-site skirted the issue of what to do with return statements in the continuation.  For example, suppose there are three methods Main, F, and G where Main calls F and F calls G.  To convert F and G to CPS, follow the same transformation steps as with Max.

static void Main()
{
    Console.WriteLine(F(1) + 1);
}

static int F(int n)
{
    return G(n + 1) + 1;
}

static int G(int n)
{
    return n + 1;
}

However, what should the transformation do with the return statement in F?  The continuation passed to G should definitely not attempt to return a result because the continuation is a void-returning delegate.

static void F(int n, Action<int> k)
{
    G(n + 1, x => { return x + 1; });  // error, Action<int> cannot return a value
}

That wouldn't make any sense.  Delegates with void return-types cannot return values.  Furthermore, the code completely ignores k.

Instead, the return statement should be transformed into an invocation of k.

static void Main()
{
    F(1, x => Console.WriteLine(x + 1));
}

static void F(int n, Action<int> k)
{
    G(n + 1, x => k(x + 1));
}

static void G(int n, Action<int> k)
{
    k(n + 1);
}

This is how functions that use explicit continuations are composed together.

What about recursive functions like Factorial?

static void Main()
{
    Console.WriteLine(Factorial(5));
}

static int Factorial(int n)
{
    if (n == 0)
        return 1;
    else
        return n * Factorial(n - 1);
}

Recursive functions can be transformed just as easily following the same steps.

static void Main()
{
    Factorial(5, x => Console.WriteLine(x));
}

static void Factorial(int n, Action<int> k)
{
    if (n == 0)
        k(1);
    else
        Factorial(n - 1, x => k(n * x));
}

CPS turns a program inside-out.  In the process, the programmer may feel that his brain has been turned inside-out as well.

Why CPS?

What exactly can a programmer do with CPS besides showoff at parties?

Compilers use a more thorough CPS transformation to produce an intermediate form amenable to many analyses.  UI frameworks use CPS to keep the UI responsive while allowing nonlinear program interaction.  Web servers use CPS to allow computation to flow asynchronously across pages.

Most programmers have used functions which take a callback.  Often, the callback is the code that is invoked upon completion of the function.  In these cases, the callback is an explicitly-passed continuation.

Asynchronous Calls

Hiding network latency requires asynchronous calls.  In the first technology preview, Volta allows programmers to add asynchronous versions of methods on tier boundaries.

[RunAtOrigin]
static class C
{
    public static int F(string s)
    {
        return s != null ? s.Length : 0;
    }
}

To make a method asynchronous, define the CPS-equivalent method signature and annotate it with the Async attribute.  Volta will generate the body and modify the call-sites accordingly.

[Async]
public static void F(string s, Action<int> k);

If the programmer invokes an asynchronous method F, then Volta will launch the invocation on another thread and invoke the continuation upon completion of the call to F.

C.F("foo", x => Console.WriteLine(x));

Wrapping it Up

Continuations are powerful.  CPS gives programmers a way to use continuations in their day-to-day work.  Perhaps, someday soon we'll discuss how to make CPS more palatable, but we will need to discuss the "M" word first.

Posted by wesdyer | 9 Comments

Musings on Software Testing

It was spring 2003, I had just finished a weekend camping in the southern Arizona desert.  I was dusty and physically exhausted from hours of playing paintball.  For those who have never been in those parts, imagine long straight roads with dry sage brush, painful cactus, and jagged mountains.  I needed a book to pass the time during the drive.  Fortunately, I had brought along "Extreme Programming Explained" by Kent Beck.  After having spent too much time in projects that either completely lacked process or had way too much of it, Beck's position seemed like a revelation.  Over the next few months, I became acquainted with test-driven development, design patterns, refactoring and many of the other topics typically associated with agile programming.

Test Driven Development

In the spring, I had a course that ended up just teaching agile programming concepts.  One of the things that the professor emphasized was following the test-driven development (TDD) process to the letter.

image

The TDD Process:

1.  Add a test

2.  Run all tests and see the new one fail

3.  Write some code

4.  Run the automated tests and see them succeed 

5.  Refactor code

Repeat

When I say refactor, I mean it in the strictest sense.  I mean changing how a program is arranged internally without changing the semantics of the program at all.  This of course ignores the fact that things like timing, working set, or stack size change which can be material to the run-time behavior of the program.

During step three a programmer should remember, "It is important that the code written is only designed to pass the test; no further (and therefore untested) functionality should be predicted and 'allowed for' at any stage".

Machine Learning

I fully bought into this rigorous approach to TDD until one summer day when I was reading through Tom Mitchell's marvelous book "Machine Learning".  The book begins by defining machine learning:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Intuitively, the learning process is trying to learn some unknown concept.  The learner has access to its past experiences (the training data) and uses them to generate a hypothesis of the unknown concept.  This hypothesis is then evaluated and the new experiences are added to the training data.  The learning process then repeats itself as the learner forms a better approximation of the underlying concept. 

TDD is a learning process.  Where the training experience E is the automated tests, the task T is improving the quality of the program, and the performance measure P is the percentage of tests passed.  In this view of TDD, the programmer continues to hack away at his program generating hypotheses, where each hypothesis is a version of the code, until one satisfies the data.  Note that unless the tests exhaustively specify the desired behavior of a program, then the tests are not the target concept that needs to be learned but rather a few data points where the target concept is manifest.  Furthermore, the program should not only do well against the tests but also against real world scenarios which may not be covered by the tests.

j0385807

Bias and Learning

A learner is characterized by its hypothesis space and how it searches through that space.  Notice that since each learner has a hypothesis space, a given concept may have not be in that space.  One solution to this problem is to make a learner's hypothesis space include all hypotheses, but this leads to a fundamental problem: "a learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances."  The only thing that an unbiased learner can do is regurgitate the training data because it has no ability to make the assumptions and logical leaps necessary to deal with unseen data.

For example, suppose that I am trying to learn to classify animals as either mammals or non-mammals and suppose that I make no assumptions whatsoever in the process of learning this task.  If I am told that giraffes, lions, mice, bats, and whales are mammals but snakes, butterflies, frogs, lobsters, and  penguins are not, then I only have the ability to answer questions regarding those animals for which I already know the answer.  I could have done much better by trying to guess at what makes an animal a mammal even if I got it wrong!

Returning to TDD, note that at no time is a programmer supposed to write code that generalizes to the target concept.  The programmer should only write either the minimal code to make a test pass or refactor the code and not change the semantics.  So according to the pure TDD approach, the program would only learn to regurgitate the training data and do who knows what against new examples.  A critic might respond that more test cases are needed.  In some cases where the input space is finite and small, this might well help, but in general this is impractical.

Sufficiently Large Projects

A similar situation arises in projects that are sufficiently large.  These projects are characterized by the fact that they cannot be held inside any one person's head.  This means that when someone is making a change, they cannot know all of the consequences of their change.  Often in projects of this size, when people ask if a particular change is good, the response is, "It passed the tests."  This leads to a state where the development process as a whole is a learning process that uses the test cases as the training data and generates programs as hypotheses that are evaluated against the training data.  Most likely, the target concept is not normally fully determined by the tests.

You know the projects that I am talking about, and chances are that you have seen some code that was added in just to satisfy some test but didn't really capture the essence of what the test was originally trying to make the program do.  While tests are invaluable to programmers for debugging purposes, it would be better if programmers returned to the specification in order to understand what they are supposed to develop instead of returning to the tests.

Testing practices like clean-room testing and various forms of white box testing address address the programmer bias deficiency, but it can also be addressed automatically.

Testing an Infinite Space

I had a development lead who liked to say that testing is just sampling.  In a sense he is right, since a test is a particular input sequence given to a program from the set of all possible input sequences.  Think of a program as a state machine encoded in your favorite process calculus.  A test then verifies that a particular path of edges corresponding to the transitions based on the input ends in a particular state.  Most interesting programs have an infinite number of possible input sequences because of their recursive nature and so the set of tests must be sampled from an infinite input space.

Testing creates tests by drawing input sequences out of the input sequence space.  It is easy to imagine that tests could be generated randomly by selecting random input sequences.  Random selection has several advantages.  First, it enables statistical measures of quality.  For example, if we sample uniformly then the pass/fail rate should be representative of the whole given a large enough sample.  We could also keep generating more and more tests to improve our confidence that the program is indeed correct.  Second, random tests have the added benefit of not allowing the programmers to just "pass the tests" since the programmers cannot anticipate what tests will be run.  This forces the programmers to generate a hypothesis that more closely matches the target concept instead of the training data.

There are however some disadvantages.  The first apparent problem is repeatability of the tests.  This can be remedied by storing the seed that generates a set of tests.  A second more serious problem is that a uniform random sample doesn't necessarily deliver the best results.

On Bugs

Not all bugs are created equal.  Practices such as testing the boundary conditions and integration testing are based on that fact.  Various testing practices explore parts of the input space that are richer in bugs or are more likely to contain serious bugs.

image

The cost of a bug can be quantified as the total monetary loss due to the presence of that bug in the software.  This includes all sorts of things like lost sales through negative customer perception, cost of software patches, legal action, etc.  While this is an accurate measure of cost, it is not very practical because it is very hard to estimate.

A more practical measure for estimating the relative cost of a bug might be the probability that users will find it multiplied by the severity of the bug.  The first part of the equation is interesting because it indicates that those bugs that customers never find are not important for testing to find.

Back to Training

The testing ideal is to minimize the total cost of bugs.  There are many good methods for writing test cases.  In addition to these, we could also select input sequences based upon the probability that a user would select such an input sequence.

Imagine if it were possible to collect all of the input sequences that would ever be given to a program including duplicates.  Then our problem would be reduced to uniformly selecting input sequences out of this collection.  Obviously, this collection is unavailable but if we use the same machine learning principles which we have been discussing then we could develop a program that could learn to generate test cases according to some distribution by using a representative set of user input sequences as training data.

Markov Chains

One way that this could be done is by returning to the formulation of a computer program as a state machine.  The object is to learn the probability that a user would take some transition given a particular state and a sequence of previous states and transitions.  This can be formulated as a Markov chain.

Consider generating random test cases for a parser.  A straightforward approach is to randomly derive a string in the language from the root non-terminal; however, it will quickly become apparent that the least used parts of your language are getting undue attention.  Some more sophisticated tools allow productions in the grammar to be attributed with weights.  This works better but forces the tester to learn the proper weights manually and it doesn't accurately describe the weights since the probability that a particular production for a non-terminal will be used is also based on the history of previously expanded productions.

A better approach is to learn weights for the Markov chain corresponding to the pushdown automata.  Since a parser for an arbitrary context-free grammar can be expressed by a pushdown automaton, we can instrument the automaton to collect data on the probability of some transition given a state and a history of states and transitions.  The instrumented automaton can then be fed a large number of user-written programs.  Finally, a generator can be created that uses the data to generate programs that syntactically look like user-written programs.

Another approach might be to instrument branches and event sequences in a program and then train against user input.  The learner could then build a Markov chain that can be used to generate input sequences roughly like user input.

Conclusion

Next time you find yourself in a project that is devolving into an unbiased learning process, restore the sanity by using the specification and not the tests to decide what to implement.

Posted by wesdyer | 15 Comments
Filed under:

Volta and You

Yesterday, Volta was made publicly available for the first time.  It is an experimental project in the early stages of development.  The team decided to release an early technology preview so that developers everywhere can help guide the project through experience and feedback.  We want your feedback.

The first release provides the basic feature set that will be improved upon with time.  It has some obvious shortcomings that we are aware of and are actively addressing.  But really, at this stage, the preview is more concerned with sparking your imagination about what is possible than ironing out all of the details.

Perhaps you disagree.  Maybe the most important feature to you is the completeness of a final product.  If that is the case, then say so and we will seriously consider making it a higher priority for the upcoming early experimental releases.

At some point, Volta may become, feed into, or inform a product, but that is a little way off yet.  So let's enjoy the unique opportunity of working together to make something great.

In the coming months, I will alternate between three types of posts:

1.  Volta focused posts: explaining the motivation, features, and technical details

2.  C#: this includes both 3.0 and eventually 4.0 features

3.  Random thoughts: like it says; two that will be discussed soon are programmer tests and continuations

I hope you enjoy the posts and I look forward to engaging with you in discussion.

Posted by wesdyer | 19 Comments
Filed under:

Volta: Redefining Web Development

Anyone who writes web applications knows that web development is not easy.  Developers wrangle with a soup of technologies distributed across multiple tiers.  We live in a world where programmers accept the fact that they need to know four or five different languages, tools, and environments just to get a site up and running.  In ancient times, the Egyptians built marvelous structures despite the primitive tools that the workmen used.  Building a pyramid took most of the national resources of wealth and labor.  Today, we build structures which are vastly more complicated and yet require only a tiny fraction of the resources.  The difference is in the tools and the infrastructure.

In a similar way, Volta significantly improves web development.  Programmers write web applications using familiar .NET languages, libraries, and tools.  Volta splits the application into multiple parts potentially running on different tiers, say, client and server.  Client code needs only a minimal, JavaScript-enabled browser, though Volta will take advantage of additional runtimes that may be present.

Programmers simply refactor classes that need to run on tiers other than the client and Volta injects the boilerplate code for communication, serialization, synchronization -- all the remoting code.  The developer enjoys all of the benefits of .NET: a great debugger, test tools, profiling, intellisense, refactorings, etc.

Just how simple is Volta?  Let's write an application that uses a button to query the server for a string and displays that string to the client: the hello world web application.

image

Now, let's write the code for the web page.  We need a Div for the output and an Input for the interaction.  Of course, we could have constructed the page elements with HTML/CSS instead.

using System;
using Microsoft.LiveLabs.Volta.Html;
using Microsoft.LiveLabs.Volta.Xml;

namespace HelloWorld
{
    public partial class VoltaPage1 : Page
    {
        public VoltaPage1()
        {
            var output = new Div();

            var b = new Input();
            b.Type = "button";
            b.Value = "Get Message";
            b.Click += () => output.InnerHtml = C.GetMessage();

            Document.Body.AppendChild(output);
            Document.Body.AppendChild(b);
        }
    }

    class C
    {
        public static string GetMessage()
        {
            return "Hello, World";
        }
    }
}

But we want to produce the message on the server.  Time to refactor.

 image

Browser clients call the server, the "Origin", because this ensures the message will come from the same server that supplied the HTML.

[RunAtOrigin]
class C
{
    public static string GetMessage()
    {
        return "Hello, World";
    }
} 

That is it.  Try it out.

image

Now, click "Get Message".

image

Great.  But is it cross-browser compatible?

image

Yes.  And you can debug it.

 image

You can even debug across tiers.

image 

There is a lot more to Volta in the first technology preview which was made publicly available at 11 am PST today and there will be a lot more to come.

Still skeptical?  Try it out for yourself.

http://labs.live.com/volta

image

Posted by wesdyer | 25 Comments
Filed under:

More on Partial Methods

Thank you everyone for the feedback.  If you have any more to say then please do express your opinion (and yes it is valued ;).  I think there has been a bit of misunderstanding about what partial methods really are.  So let's set the matter straight.  Here is the current spec (quoting from here on out):

Partial methods are a new feature complementing partial classes with the option of declaring “hook” – or compile time events – in the code.

Here is an example

partial class Customer

{

  public string Name {

    get { … }

    set {

      OnNameChanging(value);

      _name = value;

      OnNameChanged();

    }

  }

  partial void OnNameChanging(string value);

  partial void onNameChanged();

}

Partial methods are only allowed in partial types; either classes or structs. There are no other kinds of partial members.

There are two parts of a partial method; the defining and the implementing partial method declaration. There can be at most one defining declaration, and at most one implementing declaration, and the implementing declaration is only allowed if there is a defining declaration. The defining declaration is distinguished by having a “;” instead of a method body. The two can both appear in the same partial type declaration.

Partial methods must always return void. The contextual keyword partial has to appear right before void – that is how we recognize it. They can have ref but not out parameters; both params and this modifiers are allowed in the usual positions.

Access modifiers on partial methods are not allowed, but partial instead implies private. The only modifiers allowed on a partial method declaration (apart from partial) are unsafe and static. Thus a partial method declaration cannot declare or override a virtual method. Also, partial methods cannot be extern, because the presence of the body determines whether they are defining or implementing.

If both a defining and an implementing partial method declaration are given, all modifiers on parameters and the method declarations themselves must be given in both declarations, although not necessarily in the same order.

Partial methods can be generic. Constraints are placed on the defining partial method declaration, and must be repeated on the implementing one. Parameter and type parameter names do not have to be the same in the implementing declaration and the defining one.

Whether or not there is an implementing declaration, a partial method participates in overload resolution and type checking in the normal way. If there is no implementing declaration, however, the method will be removed. Furthermore calls to the method are removed, including evaluation of arguments to the call, in a manner similar to conditional methods.

You can only make a delegate to a partial method if there is an implementing declaration; otherwise a compile time error occurs. Similarly, calls to a partial method occurring within the body of a lambda which is converted to an expression tree are only allowed if there is an implementing declaration of the partial method.

Because partial methods are always private, they cannot implement interface methods, neither implicitly or explicitly.

If both a defining and an implementing partial method declaration are given, attributes on the resulting method and its return type, parameters and type parameters are determined by assembling, in unspecified order, the attributes from both the defining and implementing declarations. Duplicates are not removed. Thus, for attributes that can only occur once on a given declaration, having that attribute on both declarations will result in a compile time error.

Definite assignment is only checked after the possible removal of partial method calls and evaluation of the argument expressions. Thus, if assignments occur within these argument expressions, and the method is removed, they will not affect definite assignment. This can potentially lead to compile time errors.

The Main method of a program can be a partial method, but will only be considered as the entry point for the program if there is an implementing declaration.

XML doc comments are allowed on both defining and implementing partial method declarations, but only the ones that appear on the implementing declaration will be used by the compiler. Other tools may make use of the comments on defining declarations.

Posted by wesdyer | 16 Comments

In Case You Haven't Heard

It has been a while since I have posted.  We have been working hard to get Orcas beta 1 and beta 2 done.  So I apologize for the long interlude between posts but I hope that you are enjoying beta 1 and that you are looking forward to beta 2.

Now that beta 1 is out there, what do you all think of it?  Beta 1 has most of the features that we intend to put into Orcas but not all of them.  Some notable differences you will see in later releases are improved performance, error messages, error recovery, stability, and a few refinements.  Basically the kind of polish that people expect as a product nears completion.

Partial Methods

One feature that you may have not heard of yet is a feature called partial methods.  This is some of the work that I did in the last coding milestone a few months back.  Partial methods are methods that enable lightweight event handling.  Here is an example declaration:

partial class C
{
  static partial void M(int i);
}

There are a few notable things here:

1.  Partial methods must be declared within partial classes

2.  Partial methods are indicated by the partial modifier

3.  Partial methods do not always have a body (well look at this more below)

4.  Partial methods must return void

5.  Partial methods can be static

6.  Partial methods can have arguments (including this, ref, and params modifiers)

7.  Partial methods must be private

Now suppose that I make a call to C.M.

partial class C
{

  static void Main()
  {
    C.M(Calculation());
  }
}

Since M has no body, all calls to C.M are removed at compile time as well as the evaluation of the arguments.  So the above program is equivalent to:

partial class C
{

  static void Main()
  {
  }
}

In this sense, partial methods are the distant cousins of conditional methods which also sometimes remove the call and the evaluation of the arguments.  But partial methods go even further.  When a partial method has no body then the partial method is not even emitted to metadata.

So far this might seem a little confusing.  C# now allows users to declare methods for which the calls, the evaluation of the arguments, and the method itself are not emitted.  Fortunately, the story doesn't end there.  Partial methods allow users to define a body for the method so that the method, the calls, and the evaluation of the arguments are emitted.

partial class C
{
  static partial void M(int i); // defining declaration
  static partial void M(int i)  // implementing declaration
  {
  }
}

In the code above, we see that there is a difference between a defining declaration of a partial method and an implementing declaration of a partial method and the difference is whether or not the method has a body.  These definitions don't have to be in the same partial class declaration.  There may only be one defining declaration and if a defining declaration exists then there may be an implementing declaration.

Why Partial Methods?

So how are these partial methods used?  The common scenario is to use them to do lightweight event handling.  For example a tool that generates code may wish to have hooks for users to customize what code is run.  For example, imagine that a tool generated a bunch of code for a class representing a customer.  It might look like this:

partial class Customer
{
  string name;

  public string Name
  {
    get
    {
      return name;
    }
    set
    {
      OnBeforeUpdateName();
      OnUpdateName();
      name = value;
      OnAfterUpdateName();
    }
  }

  partial void OnBeforeUpdateName();
  partial void OnAfterUpdateName();
  partial void OnUpdateName();
}

If the user doesn't add any implementing definitions then this code is equivalent to:

partial class Customer
{
  string name;

  public string Name
  {
    get
    {
      return name;
    }
    set
    {
      name = value;
    }
  }

}

No extra metadata for things that are not used and no extra instructions for useless operations.  On the other hand if the user listened to the OnUpdateName "event" like this:

partial class Customer
{
  partial void OnUpdateName()
  {
    DoSomething();
  }
}

Then the original definition is equivalent to:

partial class Customer
{
  string name;

  public string Name
  {
    get
    {
      return name;
    }
    set
    {
      OnUpdateName();
      name = value;
    }
  }

  partial void OnUpdateName();
}

Comparing Partial Methods to the Alternatives

At this point, it is sensible to ask why not just use subclassing and virtual methods?  Of course, this would also work but it does have the drawback that the calls, the methods, and the evaluation of the arguments will still be emitted even if the virtual methods are not overridden.  So in a system like Linq to SQL that has thousands of little events it allows these events to be very lightweight so that the user only pays for those events that she uses.

A Few Fine Points

Consider the following program...

partial class C
{
  static void Main()
  {
    int i = 3;
    C.M(i = 5);
    Console.WriteLine(i);
  }
}

What does it write to the console?  3, 5, ...?

Actually, it is impossible to tell from just this code.  If there is no implementing declaration then the program will display 3 because the i = 5 will never be evaluated, but if there is an implementing declaration then the program will display 5.  The same is true for conditional methods.  So if you want a side-effect to occur make sure you do not cause the side-effect to occur as an argument to a partial method.  Of course, the same trick can be used to hide expensive calculations.

partial class C
{
  static void Main()
  {
    C.M(VeryVeryExpensiveCalculation());
  }
}

If there is no implementing declaration then the very very expensive calculation will never be performed.

Now what about attributes, how does those work?

partial class D
{
  [W]
  [return:X]
  partial void M<[Y]T>([Z]int foo)
  {
  }

  [Z]
  [return:W]
  partial void M<[X]T>([Y]int foo);
}

What attributes are actually emitted on M?  W and Z are the attributes on M; X and W are the attributes on the return type; Y and X are the attributes on the type parameter; Z and Y are the attributes on the parameter.

Enjoy!

Posted by wesdyer | 90 Comments
Filed under:

All About Iterators

Design patterns have been all of the rage for a number of years now.  We have design patterns for concurrency, user interfaces, data access, object creation, and so many other things.  The seminal work on the topic is the Gang of Four's book, Design Patterns.  When used appropriately they are a fantastic way to codify the wisdom gleaned from the battles we have fought building software systems.

One of the criticisms leveled at design patterns is that they are simply formalisms to address weaknesses in programming languages.  They require the human compiler to generate code whenever a specific recurring problem is encountered that cannot be solved directly with language support.  Now this might sound like heresy to some, but there is some truth to the criticism.  Programmers adapt to the shortcomings in the languages they use by generating pattern like code either by hand or with metaprogramming (generics, dynamic code generation, reflection, expression trees, macros).

Let's take a look at one such example.

The Iterator Design Pattern

Among the many patterns in the literature is the iterator pattern.

Iterator Design Pattern 

In .NET this pattern is embodied by IEnumerable/IEnumerable<T> and IEnumerator/IEnumerator<T>.

IEnumerable<T> and IEnumerator<T>

An IEnumerable<T> is something that can be enumerated (iterated) by calling GetEnumerator which will return an IEnumerator<T> (iterator).  The IEnumerator<T> is used to move a virtual cursor over the items that are iterated.

Implementing the iterator pattern is a bit onerous.  For example, here is the suggested iterator implementation for List<T> from the Design Pattern book.

class ListIterator<T> : IEnumerator<T>
{
  List<T> list;
  int current;

  public ListIterator(List<T> list)
  {
    this.list = list;
    current = -1;
  }

  public T Current { get { return list[current]; } }

  public bool MoveNext()
  {
    return ++current < list.Count;
  }

  public void Reset()
  {
    current = -1;
  }
}

Since we can't change the list itself, we can introduce a ListIterable<T> that wraps a list.

class ListIterable<T> : IEnumerable<T>
{
  List<T> list;

  public ListIterable(List<T> list)
  {
    this.list = list;
  }

  public IEnumerator<T> GetEnumerator()
  {
    return new ListIterator<T>(list);
  }

}

And finally, we can write a method called GetElements which returns an IEnumerable<T> over the elements of a List<T>.

static IEnumerable<T> GetElements<T>(this List<T> list)
{
  return new ListIterable<T>(list);
}

Iterators in C#

Fortunately in most cases programmers don't need to deal with the iterator design pattern directly since the introduction of iterators in C# 2.0.  Instead of writing the iterator above, we can simply write the following:

static IEnumerable<T> GetElements<T>(this List<T> list)
{
  for (int index = 0; index < list.Count; ++index)
    yield return list[index];
}

When the C# compiler sees this method, it translates it into something very similar to the ListIterator<T> and ListIterable<T> above.  Using Reflector or ILDasm we can see that the GetElements method is rewritten as (the names have been changed for clarity as the compiler generates unspeakable names;):


private static IEnumerable<T> GetElements<T>(this List<T> list)
{
  GetElementsIterator<T> temp = new GetElementsIterator<T>(-2);
  temp.listParameter = list;
  return temp;
}

This is remarkably closer to the first GetElements we wrote rather than the second.  Like in the first GetElements, we create an object that implements IEnumerable<T> and parameterize this object with the list that was passed in.  The only other thing that happens in this implementation that doesn't in the first GetElements method is a -2 is passed into the object.  I'll come back to this later, but first let's take a look at the GetElementsIterator<T> class (I omitted a few details and changed names for clarity).


private sealed class GetElementsIterator<T> : IEnumerable<T>, IEnumerator<T>
{
  int state;
  T current;
  public List<T> listParameter;
  int initialThreadId;
  public int index;
  public List<T> list;

  public GetElementsIterator(int state);
  private bool MoveNext();
  IEnumerator<T> IEnumerable<T>.GetEnumerator();
  void IEnumerator.Reset();

  T IEnumerator<T>.Current { get; }
}

The most important thing to note at this time is that GetElementsIterator implements both IEnumerable<T> and IEnumerator<T>.  A strange combination, but we will see why in a little while.  Now let's look at what actually happened when we ran the constructor. 


public GetElementsIterator(int state)
{
  this.state = state;
  this.initialThreadId = Thread.CurrentThread.ManagedThreadId;
}

Did the constructor run the for loop?  No, it didn't.  It took some variable called state in and marked what thread created the iterator (if you look at Whidbey code the initialThreadId stuff won't be there...I'll get to that).  The state variable marks what state the GetElementsIterator is in.  The number -2 is the "I'm an IEnumerable<T>" state.  Now, let's look at the GetEnumerator method.


IEnumerator<T> IEnumerable<T>.GetEnumerator()
{
  GetElementsIterator<T> temp;
  if ((Thread.CurrentThread.ManagedThreadId == initialThreadId) && (state == -2))
  {
    state = 0;
    temp = this;
  }
  else
  {
    temp = new GetElementsIterator<T>(0);
  }
  temp.list = this.listParameter;
  return temp;
}


The method first checks to see if the thread that is calling GetEnumerator is the same thread that created the GetElementsIterator object.  If it is the same thread and if the GetElementsIterator is in the "I'm an IEnumerable<T>" state then the state is changed to the "I'm an initialized IEnumerator<T>" state.  Otherwise, a new object of the same type is created but it is immediately put in the "I'm an initialized IEnumerator<T>" state so that thread safety is maintained.  Finally, in either case the list that was passed from the GetElements method is copied into the list field for consumption by the MoveNext method.

In Whidbey code, you will see that the if statement is different because an Interlock.Exchange was used to perform the same task.  The change was made to improve performance (especially for iterators that have 0 or 1 items to iterate over).

Now we have seen why GetElementsIterator implements both IEnumerable<T> and IEnumerator<T>, because it can morph from the IEnumerable<T> role into the IEnumerator<T> role without creating any new objects in most cases.

The Current property is very simple.


T IEnumerator<T>.Current
{
  get
  {
    return current;
  }
}


The Reset method simply throws a NotSupportedException.

This leaves us with only the MoveNext to examine.  We still haven't seen where the for loop went.  Hopefully, it is in the MoveNext.  The following code is produced with the /o+ compiler option (optimizations turned on).


private bool MoveNext()
{
  switch (state)
  {
  case 0:
    state = -1;
    index = 0;
    break;

  case 1:
    state = -1;
    index++;
    break;

  default:
    goto Done;
  }
  if (index < list.Count)
  {
    current = list[index];
    state = 1;
    return true;
  }
Done:
  return false;
}


Hm...a switch statement over a variable called state with a number integer case labels.  It looks like a state machine and indeed it is.  When MoveNext is first called, state is equal to 0, so the state is set to -1 (the "I'm finished" state) and the index is set to 0.  Then we check to see if the index is less than the number of things in the list.  If so then we set current to the current list element based on the index and set the state to 1.  We return true indicating that there is something to consume.

The second call to MoveNext will run case 1 since the state is equal to 1.  It will again set the state to -1 and increment the index.  If we still have elements in the list then the current will be set appropriately and the state will be set to 1 (the "we need to check again" state) and true will be returned.

This continues until there nothing left to consume and false is returned.

Finally, we found our for loop.  It is encoded in the MoveNext.  But note that on each call to MoveNext is only computes the part of the for loop that is relevant to realize the next element.  It never computes more than it needs to.  This is why iterators are an example of deferred execution.  When the actual GetElements method was called, no elements were realized at all!  Later as MoveNext is called, one element at a time is realized.  This of course enables all of sorts of great scenarios such as the pay-as-you-go model and the ability to have infinite lists.

The Cost of Iterators

Iterators are very performant.  In almost all situations that I have encountered they are more than performant enough and they simplify the code drastically as we have seen.  But sometimes you can get into trouble.

Consider the definition of the Concat sequence operator in Linq to Objects.  It looks something like this:

static IEnumerable<T> Concat<T>(this IEnumerable<T> sequence1, IEnumerable<T> sequence2)
{
  foreach (var item in sequence1)
    yield return item;
  foreach (var item in sequence2)
    yield return item;
}

Let's write a little benchmark to evaluate the performance of concat.

var stopWatch = new Stopwatch();
for (int length = 0; length <= 10000; length += 1000)
{
  var list = new[] { 1 };
  IEnumerable<int> ones = list;
  for (int i = 0; i < length; ++i)
    ones = ones.Concat(list);

  stopWatch.Reset();
  stopWatch.Start();

  foreach (var item in ones) ;

  stopWatch.Stop();
  Console.WriteLine("Length: {0} Time: {1}", length, stopWatch.ElapsedMilliseconds);
}

The results may be perhaps surprising.  The time to evalute the foreach statement is not linearly proportional to the number of concats that are composed together.  In fact it is proportional to the square of the number of concats composed together.

Upon closer inspection the reason why is obvious.  The time complexity of Concat is O(m+n) where m is the number of items in the first sequence and n is the number of items in the second sequence.  But note that in this example, n is always 1.  The outermost call is O(m+1).  The next call has O((m-1)+1), then O((m-2)+1), ... O(1+1).  There are m of these calls so the running time should be O(m^2).  Essentially, composing concats together like this causes O(m^2) yield returns to be executed.

Of course, using a List<T> here and adding on the sequences would have been much more performant because it eliminates the redundant calculations but it would not have been evaluated lazily.

Iterators are even more fun if the data structure that is being enumerated is more complicated.  For example, consider iterating over n-ary trees.  Here is a quick definition of a n-ary tree.

class Tree<T>
{
  public T Value { get; private set; }
  public Tree<T> NextSibling { get; private set; }
  public Tree<T> FirstChild { get; private set; }

  public Tree(T value, Tree<T> nextSibling, Tree<T> firstChild)
  {
    Value = value;
    NextSibling = nextSibling;
    FirstChild = firstChild;
  }

  public IEnumerable<Tree<T>> GetChildren()
  {
    for (var current = FirstChild; current != null; current = current.NextSibling)
      yield return current;
  }
}

Now it is easy to define an iterator that performs a preorder traversal of a n-ary tree.

static IEnumerable<T> PreOrderWalk<T>(this Tree<T> tree)
{
  if (tree == null)
    yield break;
  yield return tree.Value;
  foreach (var subTree in tree.GetChildren())
    foreach (var item in subTree.PreOrderWalk())
      yield return item;
}

Just the way that I like code: clear and concise.  The only problem is that the iterator could be more efficient.  This may or may not be a problem.  In a library it will almost certainly be a problem.

We can improve the efficiency somewhat by changing the code:

static IEnumerable<T> PreOrderWalk<T>(this Tree<T> tree)
{
  var stack = new Stack<Tree<T>>();
  if (tree != null)
    stack.Push(tree);
  while (stack.Count > 0)
  {
    for (var current = stack.Pop(); current != null; current = current.FirstChild)
    {
      yield return current.Value;
      if (current.NextSibling != null)
        stack.Push(current.NextSibling);
    }
  }
}

This second iterator doesn't recursively call iterators thus avoiding both the recursive call and the extra allocations.  Instead, it maintains a stack of work to do after the leftmost path has been exhausted.  Once the leftmost path has been exhausted then a node is popped off and the leftmost traversal is resumed at that node.

When we measure the difference, we see that the improvement is noticeable but that the number of nodes, O(b^d) where b is the branching factor and d is depth, dominates the cost of the traversal.  In the graph below, the green line indicates the total number of nodes in the tree.  The trees have a branching factor of 2.

So the key takeaway here is that iterators have great performance but as always measure the performance of your code.  If you find that performance is suffering, use a combination of profiling and analysis to find the problem.  If the problem is an iterator, you might be able to increase the performance by reworking the iterator as in the n-ary tree case.  In other cases, it might be the usage of the iterators as with pathological Concats.

One Possibility for Language Improvement (not in Orcas)

The Concat sequence operator is interesting because there is a lot of code that is seemingly redundant.  It takes two sequences and then has to iterate over them and yield their elements.  It's like it needs to expand their insides just to package them up together again.  As we have seen this doesn't lead to the best performance and the code is overly verbose.

Bart Jacobs, Erik Meijer, Frank Piessens, and Wolfram Shulte wrote