Welcome to MSDN Blogs Sign in | Join | Help

Recursive lambda expressions

This is a very geeky post. The tiny piece of useful information comes right at the bottom. The rest of it is all artifacts of the obscure art of doing lambda calculus in C#, which can also be characterized as doing very much with very little, sacrificing only readability.

People sometimes complain that you cannot write a lambda expression that is recursive. Good old factorial, for instance, how to write that as a lambda expression? Well the fathers of lambda calculus, who invented the lambda expressions in the 1930’s struggled with that, too, and as you might have guessed they came up with a solution.

In this post let us stand on the shoulders of those giants and see how you can get recursion into your own lambda expressions in C#. It may not be practical but it is fun!

How to write factorial

So what you really want is to be able to write the lambda expression:

x => x == 0 ? 1 : x * fac(x-1)

But that won’t work – we’re using fac to define fac, but we haven’t defined it yet! Usually when you want to use something you don’t know, you abstract over it at let someone pass it to you later. So we could abstract over the factorial function itself and write:

fac => x => x == 0 ? 1 : x * fac(x-1)

Now what we are saying is, if someone could please hand us the fac function, then we can tell them what the fac function should do, in terms of itself.

At the face of it that seems immediately useless; but in fact it is not; we just need to find a way to “wire the function back to itself”.

Fixed points

Some functions on a domain have one or more fixed points. A fixed point for a function f is a value x such that f(x) = x. For instance a fixed point of the function MultiplyByTwo is 0 because MultiplyByTwo(0) = 0.

Our lambda above is also a function on a domain; i.e., one that maps from a type to the same type. The type of it is

Func<Func<int,int>,Func<int,int>> F =

       fac => x => x == 0 ? 1 : x * fac(x-1)

In other words it maps from Func<int,int> to Func<int,int>. If we could find a fixed point for F, that would be the factorial function! Why? Because, by the definition of fixed points, that fixed point, let us optimistically call it Factorial would satisfy that F(Factorial) = Factorial. Which for any input x would mean that

Factorial(x)

= F(Factorial)(x)

= x == 0 ? 1 : x * Factorial(x-1).

This is the very definition of what it means to be the factorial function!

Finding fixed points for functions can be very hard. Finding fixed points for functions over functions however, is known territory, and in fact guys such as Church, Curry and Turing, who all dabbled in the lambda calculus back in the 1930’s and 40’s came up with nice ways of doing it automatically.

Self replication through self application

To see how we can get something recursive going using just lambda expression, let us look at the simplest form of self application. Imagine that we have a function that takes a function and applies it to itself, like this:

f => f(f)

First let us see how we can type this in C#. This is one of the few odd lambda expressions that you can’t really give a Func<…> type. Try it and see why. Instead let’s define a special delegate type for self-applicable functions:

delegate T SelfApplicable<T>(SelfApplicable<T> self);

 

We can now define the self application function above like this:

SelfApplicable<int> omega = f => f(f);

What happens if we apply omega to itself?

omega(omega);

well omega applies omega to itself, which will apply omega to itself, which will … infinite recursion. While not particularly useful in this setting, the example demonstrates that you can in fact have recursive application in a lambda expression. Now all we have to do is to make it do something useful as it goes, such as call functions recursively.

Y combinators

What both Curry and Turing came up with was ways to write a lambda expression that has the same self-replicating structure as omega, but which additionally replicates something else – the application of a given function.

For some reason the things they came up with are often called Y combinators. Just take that as a curious fact or look up the history yourself. The contract for a Y combinator is roughly this: “Give me myself and a higher order function f, and I’ll return to you a function that is a fixed point of f.” So:

SelfApplicable<Func<Func<Func<int,int>,Func<int,int>>,Func<int,int>>>

  Y = y => f => …;

If we can write the body of Y so that it fulfills the contract, that means that Y(Y) is actually a fixed point finder: Give it a higher order function and it returns a fixed point. For instance, use it on the factorial-related higher-order function F from above, Y(Y)(F), and you get – the factorial function!

But that is the case also inside the body of Y. Because the parameter (lower-case) y represents Y itself, y(y) is also a fixed point finder, which we can then apply inside the body. In particular, inside the body we can use it to take the fixed point of the second argument, f! Can we then just return y(y)(f) as the result?

SelfApplicable<Func<Func<Func<int,int>,Func<int,int>>,Func<int,int>>>

  Y = y => f => y(y)(f);// Not good

Not so fast. That would have us go around in circles forever, not making any progress. We’d never actually use f for anything. But we can recursively apply y(y)(f) to effect the next level of the recursion inside the body of Y. More specifically, remember that f is a higher order function that expects to be passed the fixed point. So let’s pass y(y)(f) – the fixed point – to f itself:

SelfApplicable<Func<Func<Func<int,int>,Func<int,int>>,Func<int,int>>>

  Y = y => f => f(y(y)(f));// Still not quite good

This would actually work if we were in a lazy language such as Haskell. However, in an eager call-by-value language like C#, it would go off and immediately try to compute an infinitely deep factorial function where the recursion is endlessly unfolded. So you never get to actually apply it because you have to wait forever first.

Instead, the right thing to do is to do the unfolding gradually as the arguments to the resulting function are passed. We can do it like this:

SelfApplicable<Func<Func<int,int>,Func<int,int>>, Func<int,int>> Y =

  y => f => x => f(y(y)(f))(x);// Quite good indeed

In this way we defer the recursive call of y(y)(f) until someone has actually called the resulting function – e.g., factorial – with a value. This turns out to work nicely.

Putting it together

We can now go ahead and define a fixed point finder (or rather builder) function Fix as simply:

Func<Func<Func<int,int>,Func<int,int>>,Func<int,int>> Fix = Y(Y);

And finally we can get our factorial function by writing

Func<int,int> factorial = Fix(F);

Or if you want to write it out:

Func<int,int> factorial = Fix(fac => x => x == 0 ? 1 : x * fac(x-1));

The whole code looks like this:

public class Program

{

  delegate T SelfApplicable<T>(SelfApplicable<T> self);

 

  static void Main(string[] args)

  {

     // The Y combinator

     SelfApplicable<

       Func<Func<Func<int,int>,Func<int,int>>,Func<int,int>>

     > Y = y => f => x => f(y(y)(f))(x);

 

     // The fixed point generator

     Func<Func<Func<int, int>, Func<int, int>>, Func<int, int>> Fix =

       Y(Y);

 

     // The higher order function describing factorial

     Func<Func<int,int>,Func<int,int>> F =

       fac => x => x == 0 ? 1 : x * fac(x-1);

 

     // The factorial function itself

     Func<int,int> factorial = Fix(F);

 

     for (int i = 0; i < 12; i++) {

       Console.WriteLine(factorial(i));

     }

  }

}

So how does this work again?

It is instructive to look at function evaluation as a gradual in-place replacement of terms with what they evaluate to. In fact that is how the semantics of the lambda calculus are typically described. So let’s go ahead and “execute” the program ourselves, to observe how the recursion comes about.

First, Y is a lambda expression, so there’s nothing to execute (well in C# a delegate is generated and stored, but that’s it).

Y = y => f => x => f(y(y)(f))(x)

Next, what happens when we build the Fix function? Let’s evaluate:

Fix = Y(Y) = f => x => f(Y(Y)(f))(x)

The higher order function F is again a lambda so nothing to evaluate:

          F = fac => x => x == 0 ? 1 : x * fac(x-1)

In the following we will allow ourselves to use the local variable names above as abbreviations of the functions themselves. Now we are ready to build our factorial function:

          factorial = Fix(F) = x => F(Y(Y)(F))(x)

Now let’s apply factorial to a value:

          factorial(5)
                   =
F(Y(Y)(F))(5)
                   =
(x => x == 0 ? 1 : x * Y(Y)(F)(x-1))(5)
                   =
5 == 0 ? 1 : 5 * Y(Y)(F)(5 – 1)
                   = 5 * Y(Y)(F)(4)
                   = 5 * (y => f => x => f(y(y)(f))(x))(Y)(F)(4))
           
= 5 * F(Y(Y)(F))(4)

See how the recursion occurs? The third-to-last line is equivalent to 5 * Fix(F)(4). So we take the fixed point of F once again, so that the last line is equivalent to 5 * factorial(4)! The factorial function has completed its self replication and is ready to party on the next argument. Of course it will go around and around from here, and all we need to see is how it ends. At some point we’ll get to:

          5 * 4 * 3 * 2 * 1 * F(Y(Y)(F))(0)
                   = 5 * 4 * 3 * 2 * 1 * (x => x == 0 ? 1 : x * Y(Y)(F)(x-1))(0)
                   = 5 * 4 * 3 * 2 * 1 * 0 == 0 ? 1 : 0 * Y(Y)(F)(0 – 1)
                   = 5 * 4 * 3 * 2 * 1 * 1
                   = 120

Magic! We’re done.

Recursive lambdas

Now you may argue that the definition above of factorial isn’t actually a lambda expression, and besides it was created using all sorts of helper functions. But look at the step above where we compute what factorial really “is”. This step is interesting because it shows how a recursive function such as factorial can be written directly as a pure lambda expression without any other predefined scaffolding than a few predefined types: Substitute the actual lambdas designated by Y and F directly in there (surrounded by a few explicit delegate creation expressions when the compiler can’t figure them out, and a renaming of x to i to avoid name collisions) and you have a fully self-contained recursive function directly described as a lambda. Admitted it ain’t pretty:

i => new Func<Func<int,int>,Func<int,int>>(fac => x => x == 0 ? 1 : x * fac(x - 1))(new SelfApplicable<Func<Func<Func<int,int>,Func<int,int>>,Func<int,int>>>(y => f => x => f(y(y)(f))(x))(y => f => x => f(y(y)(f))(x))(fac => x => x == 0 ? 1 : x * fac(x - 1)))(i)

Neat, huh? I can’t even figure out how to line break it so that it approaches readable, so I haven’t. But! It is proof of concept that you can write real recursive lambda expressions if you really really want to! This means you can do it in expression trees, too, although I wouldn’t really recommend it.

So the next time you hear someone complain that you can’t write a recursive lambda expression, just throw them a fixed point generator!

Enough already

Of course you would like to generalize so Y and Fix don’t only work on functions over int. You can’t have generic lambdas, so what do you do? A trick is to put Y and Fix as static members of a generic class:

public class Fun<T> {

  public static … Y = … ;

  public static … Fix = Y(Y);

}

Then you can access the fixed point operator for int functions as Fun<int>.Fix.

For practical purposes, you can also choose to write your fixed point generator as an ordinary named generic recursive method, which is much easier and works just as well:

Func<T, T> Fix<T>(Func<Func<T,T>, Func<T,T>> F) {

  return t => F(Fix(F))(t);

}

Then you also don’t need the intermediate Y combinator and the freak SelfApplicable delegate type.

Regardless of how you write your fixed point generator, it is a useful thing whenever you want to pass a recursive function that you cook up on the fly.

If you want to play some more with this stuff, I recommend the following two exercises:

1)    Make a fixed point generator for functions of two arguments

2)    Figure out how to make two mutually recursive functions in this way

Anyway, enough lambda calculus for one day. Now get back to work!

kick it on DotNetKicks.com

Is C# becoming a functional language?

As many of you will be aware, C#3.0 is adding a significant number of new language features.  While the overall driving force behind putting these features in is the support of LINQ (Language INtegrated Query), the way we do it is strongly inspired by functional programming techniques. Moreover, we strive to make the new features as general-purpose as possible (with the exception of query expressions – they are undeniably tied to the querying domain!), so they do come out as features that will enable more of a functional style of programming.

This begs the question: Are we making C# a functional language?

This of course is a vague question. First of all, what exactly is a functional programming language? What does it take to qualify? Rather than trying to define the canonical list of language features that you have to have to be functional, I think it makes more sense to define it as “a language that supports a functional style of programming.”

Given that, my just as vague answer would be “to some degree”: If you’re a dedicated functional programmer, is this the time to throw out your Haskell, Scheme, F# or OCaml compiler and join the party? Probably not. C# doesn’t go the whole way. In this post I’ll explore in more detail the goodie bag of functional programming, in a kind of willy-nilly fashion, not focusing on a particular functional language, and then examine to what extent C# is supporting each of the features or styles of programming.

Please be aware that Visual Basic is adding most of the same language features as C# in the Orcas release.

Expression-based programming

Wes Dyer has a great post about expression-based versus statement-based programming. When you construct things the expression-based way, you build them bottom-up from their constituents, as opposed to the statements based way, where you create things top down by modifying them.

Instead of

Point p1 = new Point();

P1.X = 3; P1.Y = 5;

Point p2 = new Point();

p2.X = -5; p2.Y = 4;

Point p3 = new Point();

p3.X = -1; p3.Y = -6;

Polygon triangle = new Polygon();

Triangle.Add(p1);

Triangle.Add(p2);

Triangle.Add(p3);

Given the right constructors you could do it the expression-oriented way:

Polygon triangle =

   new Polygon(

      new Point(3,5),

      new Point(-5,4),

      new Point(-1,-6));

Much nicer, and composes really well. Trouble is you have to have the right constructors, but sometimes you don’t, or sometimes it’s impractical or annoying to write all the constructors that correspond to reasonable initialization patterns of your type. Furthermore they may get big, and the caller then has to remember the order of all the arguments.

To this end, C#3.0 introduces object initializers and constructor initializers. Assuming that Polygon is a collection type (see my previous post for a definition of that term), you can now write this:

Polygon triangle =

     new Polygon {

       new Point { X = 3, Y = 5 },

     new Point { X = -5, Y = 4 },

     new Point { X = -1, Y = -6 }};

In LINQ, queries are all about expressions. In order to produce the right output of a query, you often need to construct new objects out of old ones as in

from m in members

group m by m.Address.PostalCode into group

select new Entry { PostalCode = group.Key, Number = group.Count() };

The subexpressions of the query expressions are, well, expressions, and it is really important that you can create and initialize new objects in that context, or we don’t have projection in our query story.

Value based programming

Expression based programming limits the need for mutation operations (assignment) to such a degree that some functional languages (the “pure” ones) don’t even have them. Instead all data, whether simple or complex, is considered immutable values.

Even if they do allow mutation, many functional languages rely more on value types. The lack of mutability is really quite a feature. No-one can mess with the stuff you pass to them, and concurrent access is really easy to synchronize! You can have sound value-based equality and hashing. Implementations can optimize to their heart’s contend by copying, parallelizing, etc.

In C# you can certainly create user defined types that are immutable, but it is far from as easy as in a functional language. Making properties immutable is quite easy – don’t write a setter -  but the associated value based semantics for equality and hashcode computation are a pain to write.

C# 3.0 doesn’t change that a lot, but this is something we may want to look into in the future, as the need for parallelism increases with the number of cores in our CPUs, and the number of people wanting to use value-based techniques increases.

Types on the fly

Many functional programming languages rely to a larger degree on structural types that do not have to be declared but are just there when you need them: You have type constructors that allow you to build types from other types, and corresponding value constructors to built values of those types. In a fully structurally typed world you never declare a new type: any type declaration would just introduce an alias for an existing type. (And you need that, because these types can get large!)

We don’t quite go there in C#3.0, but we do take you further in the direction of not having to declare so many trivial little types. With anonymous types, for instance, the above query can be written as

from m in members

group m by m.Address.PostalCode into group

select new { PostalCode = group.Key, Number = group.Count() };

eliminating the need to author a trivial type Entry to hold your results. The limitation here is that you cannot designate the type in any way, so it cannot be used

Another thing that takes you in the direction of structural types is generics: You still have to declare a generic type, but once declared you can construct new types from it over and over. We utilize that in the LINQ libraries to make a set of fully generic delegate types called Func, one for each number of arguments, e.g.:

public delegate TResult Func<TArg0, TResult>(TArg0 arg0);

From this you can create a delegate for almost any one argument function type, e.g. Func<int,bool> for functions from int to bool. Because we use the Func types consistently within LINQ, in that context they essentially act as structural function types. You never need to write your own delegate type again. Essentially, Func is a type constructor for delegate types.

Tuples

One kind of on-the-fly types heavily used in functional programming is tuples: strongly typed lists of values; just like the set of arguments to a method, only as a value in itself that you can pass around.

In popular syntax (1,”one”) is an example of a tuple value of the type (int,string). Some functional languages don’t even allow you to specify functions of more than one value; instead that value can be a tuple. Where tuples would be really interesting would be for multiple results of a function or method. Imagine

  (int,int) NextFib(int curr, int prev) { return (curr+prev,curr); }

Tuples is one really useful kind of type constructor that it might be useful to add in the future. Like the Func types above, you could imagine .NET having centrally defined generic Tuple<…> types predeclared up to a certain arity, e.g.:

  public class Tuple<TFirst,TSecond> {

     readonly TFirst first;

     readonly TSecond second;

     public TFirst First { get { return first; } }

     public TSecond Second { get { return second; } }

     public Tuple(TFirst first, TSecond second) { … }

  }

You get the idea. It would then be a matter of syntactic desugaring to translate (5,3) into new Tuple<int,int>(5,3).

There are currently no plans of shared tuple types on .NET, but languages such as F# on top of .NET roll their own.

Pattern matching

To really make use of constructed values, you need a good way of deconstructing them. For the functional programmer, pattern matching is the tool of choice. With tuples for instance, rather than “dotting your way” into the individual values, as in:

  (int,int) result = NextFib(5,3);

  int current = result.First;

  int previous = result.Second;

you can create and assign in one fell swoop the variables to hold the individual component value of the tuple, by “matching” its structure:

  (int current, int previous) = NextFib(5,3);

Pattern matching can be used to declare a function by cases – as in:

  FibHelp(0) = (1,0)

  FibHelp(n) = NextFib(FibHelp(n–1))

This defines one function, but saves the if-then-else logic by matching on the arguments.

A final common use of type constructors and pattern matching is with built-in immutable list types. Let’s say that [] is the empty list. You can the “cons up” a list as e.g. 1 :: 2 :: 3 :: []. Each occurrence of :: creates a new list value with the left hand side value prepended to the right hand side list. The cool thing is that you can pattern match the elements back off when you see them:

  AddAll([]) = 0;

  AddAll(n :: l) = n + AddAll(l);

Pattern matching is a whole different approach to plucking values apart, and we are nowhere near adding such features to C#.

Higher-order programming

When you want to have a piece of code executed conditionally, or repeatedly, C# has a nice set of built-in control structures. This is, however, a very closed regime: No one gets to add their own control structures as new language features! In order to write your own control structures you need the ability to be parameterized with functionality, with chunks of code.

Any object oriented programming language actually facilitates that. You can write a class

  class ChunkOfCode {

     public abstract void DoIt();

  }

which people can subclass to override the DoIt method with the right functionality. ChunkOfCode objects can then be passed to “control structure methods” that execute the DoIt methods if and where they want.

On .NET we make this a whole lot easier with delegate types. You can grab a delegate to any old method and pass it around.

In C#2.0 we made it even easier by allowing you to construct a delegate “on the fly” with anonymous methods. The major improvement here is not so much avoiding to declare a method, but the fact that anonymous methods are “closures” that allow you to refer to arguments and local variables in the enclosing method body. This is crucial when the functionality you want to pass depends on your local state, as is so often the case.

At the core of LINQ you find the standard query operators, such as Where, Select, GroupBy, Join, etc. These are really user defined control structures, parameterized by little bit of functionality represented as delegate types. Thus to make the passing of these delegates really neat we’ve given anonymous methods a face lift in the form of lambda expressions:

  members

  .GroupBy(m => m.Address.PostalCode)

  .Select(group => new { PostalCode=group.Key, Number=group.Count() });

Lambda expressions in C# are not entirely like anonymous functions in a functional language because they don’t have a type in and off themselves. Instead they can be converted to delegate types, and with the Func types sufficiently standardized, the effect is pretty much the same.

Currying

We talk above about tuples as a way of making input and output more symmetric on methods or functions. However, whereas a C# programmer today only has the option of writing multiple parameters, not multiple results, a functional programmer will often choose to do the opposite – use tuples only for compound result types. The reason is that functions of multiple arguments are often written using currying, where the original function only takes one argument, and then returns a function that takes the rest. Example: Instead of

  Add (x,y) = x + y

You write

  Add (x) (y) = x + y

which is a shorthand for

  Add x = y => x + y

So instead of calling Add(3,2) you can call Add(3) to get a function that will take an integer and return the result of adding 3 to it! Hardcore functional programmers will use this technique extensively and take great care to arrange their curried arguments in such an order that they will lead to the most useful “derived functions” when only some of the arguments are passed.

You have to question how useful this is in practice, but enough functional programmers swear by it that I guess I’m willing to believe it will save me some day.

Strictly speaking, with anonymous methods or lambda expressions C# methods can also be curried, but it ain’t pretty:

  Func<int,int> Add(int x) { return y => x + y; }

However currying of lambda expressions themselves gets pretty agreeable:

  x => y => x + y;

Type inference

One convenient feature of statically typed functional languages is that you can have most or all of your type annotations inferred for you so that typing is really the compiler’s problem. In an object-oriented world we don’t have that option in the large, and in statically typed object-oriented languages such as C# we sort of take pride in the fact that our APIs are in fact annotated with types that users can see. At the architectural level, having explicit contracts is important.

However, in the small there are often cases where types are just in the way.

Already in C# 2.0 we introduced type inference for generic method calls, so that most of the time you do not have to be explicit about the type arguments to a method – they are inferred from the types of the method arguments.

In C# 3.0 we take this idea further, by being really clever about the way we use lambda expressions for type inference when they are used as arguments to a call of a generic method. Thus the types “flow” through the method call in and out of the lambdas and out in the return type.

Furthermore we’re adding type inference for local variables with the var keyword. Thus in

  var query = customers

  .Where(c => c.City == “London”)

  .SelectMany(c => c.Orders)

  .GroupBy(o => o.OrderDate.Year);

Without writing a simple type the compiler can figure out that query is of type IGrouping<int,Order>.

In C# we have deliberately kept the type inferencing algorithm simple, in the sense that it never infers a type that is not already there in the constituent values: We don’t synthesize types. This keeps the types under control, so the user has a chance of understanding what is going on if they get an error message etc.

Comprehensions

Oftentimes the passing around of lambdas gets too messy even for a functional programmer, and so many functional languages introduce syntactic sugar for common patterns involving the passing of functions. Such sugar is called a comprehension (presumably because it makes the code comprehensible) and we snatched that idea right in with query expressions in C# and VB – indeed in the early design days we did call them query comprehensions. In C# these are really just syntactic sugar which enables people to write most queries without lambda expressions. The above query (or one very much like it), for instance is generated by the query expression

  var query =

     from c in customers

     where c.City == “London”

     from o in c.Orders

     group o by o.OrderDate.Year;

The extreme syntacticness of comprehensions is central in LINQ; it means that we just generate the method calls to the query operators syntactically, but anyone can implement their own methods with the right names on a given type and have it be the source of query expressions.

Conclusions

C# 3.0 and LINQ owe a lot to the functional programming languages in terms of the mechanisms we adopt. No doubt this will rub off to some degree on non-LINQ scenarios where a more expression-oriented style of programming becomes possible. People who are functional programmers by night and have to write C# by day will find that the daily headache sets in a little later in the afternoon.

However, as you can see from the above, there is some way to go before full-blown functional programming becomes totally natural in C#. There is probably a limit as to how far you can straddle before the pants rip, and we may never get to a point where C# is a true multiparadigm language with equal treatment of object-oriented and functional programming. Still I think there is room for more of this kind.

We have this thing about listening to customers, so I guess it is also up to you!

 

What is a collection?

Admitted, we blew it in the first version of the framework with System.Collections.ICollection, which is next to useless. But we fixed it up pretty well when generics came along in .NET framework 2.0: System.Collections.Generic.ICollection<T> lets you Add and Remove elements, enumerate them, Count them and check for membership.

Obviously from then on, everyone would implement ICollection<T> every time they make a collection, right? Not so. Here is how we used LINQ to learn about what collections really are, and how that made us change our language design in C# 3.0.

Collection initializers

With LINQ, the Language INtegrated Query framework that we're shipping in Orcas, we're enabling a more expression-oriented style of programming. For instance it should be possible to create and intialize an object within one expression. For collections, initialization typically amounts to adding an initial set of elements. Hence collection initializers in C# 3.0 look like this:

  new MyNames { "Luke Hoban", "Karen Liu", "Charlie Calvert" }

The meaning of this new syntax is simply to create an instance of MyNames using its no-arg constructor (constructor arguments can be supplied if necessary) and call its Add method with each of the strings.

So what types do we allow collection initializers on? Easy: collection types. What are those? Obvious: types that implement ICollection<T>. This is a nice and easy design - ICollection<T> ensures that you have an Add method so obviously that is the one that gets called for each element in the collection initializer. It is strongly typed, too - the initializer can contain only elements of the appropriate element type. In the above new expression, MyNames would be a class that implements ICollection<string> and everything works smoothly from there.

There's just one problem: Nobody implements ICollection<T>!

LINQ to LINQ

Well, nobody is a strong word. But we did an extensive study of our own framework classes, and found only a few that did. How? Using LINQ of course. The following query does the trick:

  from name in assemblyNames

  select Assembly.LoadWithPartialName(name) into a

  from c in a.GetTypes()

  where c.IsPublic &&

     c.GetConstructors().Any(m => m.IsPublic) &&

     GetInterfaceTypes(c).Contains(typeof(ICollection<>))

  select c.FullName;

Let’s go through this query a little bit and see what it does. For each name in a list of assemblyNames that we pre-baked for the purpose, load up the corresponding assembly:

  from name in assemblyNames

  select Assembly.LoadWithPartialName(name)

One at a time, put the reflection objects representing these assemblies into a, and for each assembly a run through the types c defined in there:

  from c in a.GetTypes()

Filter through, keeping each type only if it

a)    IsPublic

b)     has Any constructor that IsPublic

c)     implements ICollection<T> for some T:

  where c.IsPublic &&

     c.GetConstructors().Any(m => m.IsPublic) &&

     GetInterfaceTypes(c).Contains(typeof(ICollection<>))

For those that pass this test, select out their full name:

  select c.FullName;

Nothing to it, really.

What is a collection?

What did we find then? Only 14 of our own (public) classes (with public constructors) implement ICollection<T>! Obviously there are a lot more collections in the framework, so it was clear that we needed some other way of telling whether something is a collection class. LINQ to the rescue once more: With modified versions of the query it was easy to establish that among our public classes with public constructors there are:

·       189 that have a public Add method and implement System.Collections.IEnumerable

·       42 that have a public Add method but do not implement System.Collections.IEnumerable

If you look at the classes returned by these two queries, you realize that there are essentially two fundamentally different meanings of the name “Add”:

a)     Insert the argument into a collection, or

b)     Return the arithmetic sum of the argument and the receiver.

People are actually very good at (directly or indirectly) implementing the nongeneric IEnumerable interface when writing collection classes, so that turns out to be a pretty reliable indicator of whether an Add method is the first or the second kind. Thus for our purposes the operational answer to the headline question becomes:

A collection is a type that implements IEnumerable and has a public Add method

Which Add to call?

We ain’t done yet, though. Further LINQ queries over the 189 collection types identified above show:

·       28 collection types have more than one Add method

·       30 collection types have no Add method with just one argument

So, given that our collection initializers are supposed to call “the” Add method which one should they call? It seems that there will be some value in collection initializers allowing you to:

a)     choose which overload to call

b)     call Add methods with more than one argument

Our resolution to this is to refine our understanding of collection initializers a little bit. The list you provide is not a “list of elements to add”, but a “list of sets of arguments to Add methods”. If an entry in the list consists of multiple arguments to an Add method, these are enclosed in { curly braces }. This is actually immensely useful. For example, it allows you to Add key/value pairs to a dictionary, something we have had a number of requests for as a separate feature.

The initializer list does not have to be homogenous; we do separate overload resolution against Add methods for each entry in the list.

So given a collection class

public class Plurals : IDictionary<string,string> {

  public void Add(string singular, string plural); // implements IDictionary<string,string>.Add

  public void Add(string singular); // appends an “s” to the singular form

  public void Add(KeyValuePair<string,string> pair); // implements ICollection<KeyValuePair<string,string>>.Add

 

}

We can write the following collection initializer:

Plurals myPlurals = new Plurals{ “collection”, { “query”, “queries” }, new KeyValuePair(“child”, “children”) };

which would make use of all the different Add methods on our collection class.

Is this right?

The resulting language design is a “pattern based” approach. We rely on users using a particular name for their methods in a way that is not checked by the compiler when they write it. If they go and change the name of Add to AddPair in one assembly, the compiler won’t complain about that, but instead about a collection initializer sitting somewhere else suddenly missing an overload to call.

Here I think it is instructive to look at our history. We already have pattern-based syntax in C# - the foreach pattern. Though not everybody realizes it, you can actually write a class that does not implement IEnumerable and have foreach work over it; as long as it contains a GetEnumerator method. What happens though is that people overwhelmingly choose to have the compiler help them keep it right by implementing the IEnumerable interface. In the same way we fully expect people to recognize the additional benefit of implementing ICollection<T> in the future – not only can your collection be initialized, but the compiler checks it too. So while we are currently in a situation where very few classes implement ICollection<T> this is likely to change over time, and with the new tweaks to our collection initializer design we hope to have ensured that the feature adds value both now and in that future.

 

Welcome to the language designer's workshop

Hi there,

Ever tried designing a programming language? If you got far enough you'll know that it is fun, exciting, exhausting, mind boggling, frustrating and utterly surprising.

I'm the language PM on the C# team here at Microsoft. I am part of the language design group, take notes from our meetings, communicate decisions to the production team and write some of the specs.

This is a wonderful team. Everyone has a strong sense of responsibility for the quality of our product, and not least for the design of our language. We all have our own opinions of course, and we happily volunteer them. Ever so often, profound discussions erupt, sometimes as galactic mail threads slowly revolving around a fiery core, sometimes as violent tornados of creativity sucking up all productivity in their devastating trail for a couple of days.

Some issues just keep coming back. We get fed up with them, make some decision and throw them out the window, only to see them boomerang back and whack us squarely between the eyes a couple of months later.

Most baffling is the complexity. Sometimes what seems like a small or arbitrary decision has vast consequences. Oftentimes we happily settle on a design, believing that we've thought it all through, only to find weeks later when someone has to implement it that the whole thing is impractical or impossible in its current form.

On this blog I want to share some of that with you; some of the hard issues, cool ideas, crazy mistakes and whirling discussions that are part of my life here at Microsoft.

Thanks for visiting our workshop, and come back soon. And, er... please don't run that query over there - it has only been tested on Customers.

 
Page view tracker