Admitted, we blew it in the first version of the framework with System.Collections.ICollection, which is next to useless. But we fixed it up pretty well when generics came along in .NET framework 2.0: System.Collections.Generic.ICollection<T> lets you Add and Remove elements, enumerate them, Count them and check for membership.

Obviously from then on, everyone would implement ICollection<T> every time they make a collection, right? Not so. Here is how we used LINQ to learn about what collections really are, and how that made us change our language design in C# 3.0.

Collection initializers

With LINQ, the Language INtegrated Query framework that we're shipping in Orcas, we're enabling a more expression-oriented style of programming. For instance it should be possible to create and intialize an object within one expression. For collections, initialization typically amounts to adding an initial set of elements. Hence collection initializers in C# 3.0 look like this:

  new MyNames { "Luke Hoban", "Karen Liu", "Charlie Calvert" }

The meaning of this new syntax is simply to create an instance of MyNames using its no-arg constructor (constructor arguments can be supplied if necessary) and call its Add method with each of the strings.

So what types do we allow collection initializers on? Easy: collection types. What are those? Obvious: types that implement ICollection<T>. This is a nice and easy design - ICollection<T> ensures that you have an Add method so obviously that is the one that gets called for each element in the collection initializer. It is strongly typed, too - the initializer can contain only elements of the appropriate element type. In the above new expression, MyNames would be a class that implements ICollection<string> and everything works smoothly from there.

There's just one problem: Nobody implements ICollection<T>!

LINQ to LINQ

Well, nobody is a strong word. But we did an extensive study of our own framework classes, and found only a few that did. How? Using LINQ of course. The following query does the trick:

  from name in assemblyNames

  select Assembly.LoadWithPartialName(name) into a

  from c in a.GetTypes()

  where c.IsPublic &&

     c.GetConstructors().Any(m => m.IsPublic) &&

     GetInterfaceTypes(c).Contains(typeof(ICollection<>))

  select c.FullName;

Let’s go through this query a little bit and see what it does. For each name in a list of assemblyNames that we pre-baked for the purpose, load up the corresponding assembly:

  from name in assemblyNames

  select Assembly.LoadWithPartialName(name)

One at a time, put the reflection objects representing these assemblies into a, and for each assembly a run through the types c defined in there:

  from c in a.GetTypes()

Filter through, keeping each type only if it

a)    IsPublic

b)     has Any constructor that IsPublic

c)     implements ICollection<T> for some T:

  where c.IsPublic &&

     c.GetConstructors().Any(m => m.IsPublic) &&

     GetInterfaceTypes(c).Contains(typeof(ICollection<>))

For those that pass this test, select out their full name:

  select c.FullName;

Nothing to it, really.

What is a collection?

What did we find then? Only 14 of our own (public) classes (with public constructors) implement ICollection<T>! Obviously there are a lot more collections in the framework, so it was clear that we needed some other way of telling whether something is a collection class. LINQ to the rescue once more: With modified versions of the query it was easy to establish that among our public classes with public constructors there are:

·       189 that have a public Add method and implement System.Collections.IEnumerable

·       42 that have a public Add method but do not implement System.Collections.IEnumerable

If you look at the classes returned by these two queries, you realize that there are essentially two fundamentally different meanings of the name “Add”:

a)     Insert the argument into a collection, or

b)     Return the arithmetic sum of the argument and the receiver.

People are actually very good at (directly or indirectly) implementing the nongeneric IEnumerable interface when writing collection classes, so that turns out to be a pretty reliable indicator of whether an Add method is the first or the second kind. Thus for our purposes the operational answer to the headline question becomes:

A collection is a type that implements IEnumerable and has a public Add method

Which Add to call?

We ain’t done yet, though. Further LINQ queries over the 189 collection types identified above show:

·       28 collection types have more than one Add method

·       30 collection types have no Add method with just one argument

So, given that our collection initializers are supposed to call “the” Add method which one should they call? It seems that there will be some value in collection initializers allowing you to:

a)     choose which overload to call

b)     call Add methods with more than one argument

Our resolution to this is to refine our understanding of collection initializers a little bit. The list you provide is not a “list of elements to add”, but a “list of sets of arguments to Add methods”. If an entry in the list consists of multiple arguments to an Add method, these are enclosed in { curly braces }. This is actually immensely useful. For example, it allows you to Add key/value pairs to a dictionary, something we have had a number of requests for as a separate feature.

The initializer list does not have to be homogenous; we do separate overload resolution against Add methods for each entry in the list.

So given a collection class

public class Plurals : IDictionary<string,string> {

  public void Add(string singular, string plural); // implements IDictionary<string,string>.Add

  public void Add(string singular); // appends an “s” to the singular form

  public void Add(KeyValuePair<string,string> pair); // implements ICollection<KeyValuePair<string,string>>.Add

 

}

We can write the following collection initializer:

Plurals myPlurals = new Plurals{ “collection”, { “query”, “queries” }, new KeyValuePair(“child”, “children”) };

which would make use of all the different Add methods on our collection class.

Is this right?

The resulting language design is a “pattern based” approach. We rely on users using a particular name for their methods in a way that is not checked by the compiler when they write it. If they go and change the name of Add to AddPair in one assembly, the compiler won’t complain about that, but instead about a collection initializer sitting somewhere else suddenly missing an overload to call.

Here I think it is instructive to look at our history. We already have pattern-based syntax in C# - the foreach pattern. Though not everybody realizes it, you can actually write a class that does not implement IEnumerable and have foreach work over it; as long as it contains a GetEnumerator method. What happens though is that people overwhelmingly choose to have the compiler help them keep it right by implementing the IEnumerable interface. In the same way we fully expect people to recognize the additional benefit of implementing ICollection<T> in the future – not only can your collection be initialized, but the compiler checks it too. So while we are currently in a situation where very few classes implement ICollection<T> this is likely to change over time, and with the new tweaks to our collection initializer design we hope to have ensured that the feature adds value both now and in that future.