Immutable collections are now RC

Immutable collections are now RC

Rate This
  • Comments 33

Over the past several months, we’ve been working on a new set of collection types that offer an immutable design. Today we are happy to announce that we are one step closer to a stable version of this work: we’ve just shipped the release candidate (1.0.23 RC) of the Microsoft.Bcl.Immutable NuGet package.

Because we could: an immutable representation of our release timeline

What has changed in RC?

The changes in this update can be summarized into the following areas:

  • Removal of ImmutableArray<T>. With a heavy heart we’ve decided to remove ImmutableArray<T> for now because we aren’t happy with the current design.
  • Comparers. We’ve simplified the exposed concepts by not exposing the comparers through the interfaces.
  • Consistent construction. In the last update, we split the factory methods into methods that deal with scalars and ones that deal with other collections. We’ve improved this by renaming some APIs in order to apply the split consistently.
  • Consistency with ToString. Previously, we didn’t override ToString consistently. We’ve fixed this by not overriding ToString at all.
  • Minor improvements. We’ve added a few convenience APIs.

I’ll discuss these areas in more detail.

ImmutableArray<T> removed for now

After careful consideration we decided to remove ImmutableArray<T> from our 1.0 release plans. We aren’t happy with the design as it is today. This doesn’t mean we have given up on having an ImmutableArray<T> – it just means we need more time to do it properly, and we don’t want to block releasing a stable version of immutable collections because we aren’t happy with a single type. After we ship the stable 1.0, we’ll ship a 1.1 beta that will include ImmutableArray<T>.

Here’s why we weren’t happy with ImmutableArray<T>:

The primary goal for ImmutableArray<T> is to be a zero-overhead wrapper around an ordinary array. The current implementation, although it had little overhead, didn’t satisfy the performance needs of one of our most important partners, Project Roslyn, which served as an excellent benchmark for building a large-scale immutable system.

Because of performance requirements, ImmutableArray<T> needs to be a value type, i.e., a struct. In the CLR, all value types have an implied default state. This state is defined as having an underlying memory of all zeros. For ImmutableArray<T> that means that the underlying array it wraps is null. In order to provide meaningful semantics, we believe it’s best for an immutable array in its default state to behave exactly like an empty array. That’s consistent with the way the .NET Framework deals with value types in general.

Unfortunately, we discovered that this design causes a general performance issue. The JIT compiler is normally able to eliminate the bounds checks for standard for loops. For ImmutableArray<T>, the indexer and Length property are inlined so one would think the bounds check would be eliminated as well. Unfortunately, that’s not the case, because these properties don’t just forward to the underlying array but also contain an additional null check to provide the behavior I explained. This causes the JIT to no longer be able to eliminate the bounds checks. Our partner Roslyn uses ImmutableArray<T> throughout its system, so this caused an unacceptable performance loss.

Today, the only way to get the performance we need is to remove the additional null checks from the indexer and Length property. However, we believe this has significant usability issues.

For example, consider the following code:

int[] ints = new int[] { 1, 2, 3 };

ImmutableArray<int> empty = new ImmutableArray<int>();
ImmutableArray<int> immutableInts = empty.AddRange(ints);

Since all value types have an automatic default constructor that initializes the value type to its default state, empty will be an ImmutableArray<T> whose underlying array is null. Thus, the implementation of AddRange will crash with a NullReferenceException.

There are other places where this problem manifests itself. Since ImmutableArray<T> implements certain collection interfaces (such as IEnumerable and IReadOnlyList), you can pass it to methods that work against the interface. Since the interface reference will be non-null, consumers won’t expect any methods or property invocations to cause a NullReferenceException.

Not exactly a happy place. However, we do have some ideas on how to address this issue and decided to give it more baking time by removing ImmutableArray<T> for now.

Reduced complexity around comparers

After reviewing some of your feedback, we decided that certain APIs that exposed comparers were overkill within the current design. Thus, we removed IHashKeyCollection<TKey> and ISortKeyCollection<TKey> as well as the ValueComparer property from IImmutableDictionary<TKey,TValue>.

Consistent construction

When we introduced factory methods for constructing immutable collections, we also removed all Empty properties. So instead of:

return ImmutableList<int>.Empty;

you have to write:

return ImmutableList.Create<int>();

We received feedback that this looks quite bad in code reviews, because most reviewers assume that Create would return a new instance. For added expressiveness, we decided to bring back the Empty properties, so now you have both options.

In the past update, we introduced the From factory method so you could differentiate between factory methods that work on scalars (Create) and and factory methods that operate on collections (From). We realized that the name From is an unfortunate choice because it doesn’t show up together in IntelliSense with Create. In the RC update, we renamed From to CreateRange.

Also, our naming wasn’t fully consistent. There are some Create overloads on ImmutableDictionary that accepted an IEnumerable. We renamed those to CreateRange as well.

Consistency with ToString

ImmutableDictionary<TKey, TValue> now no longer overrides ToString (it was the only collection that did that anyway). This means that calling ToString on any immutable collection will now return the fully qualified type (which is the behavior of Object.ToString).

For debugging purposes, all collections already make use of DebuggerDisplayAttribute and DebuggerTypeProxyAttribute.

Minor improvements

The current signature of the ToImmutableDictionary method requires two selectors: one that determines the key and one that determines the value. This is inconsistent with Enumerable.ToDictionary, which provides an overload that allows you to specify only the key. The RC update includes the equivalent overload for ToImmutableDictionary. Thus, code like this will now compile:

ImmutableList<Customer> customers = GetCustomers();
ImmtuableDictionary<int, Customer> customerById = customers.ToImmutableDictionary(c => c.Id);

Performing a binary search is a fairly common operation for collections. Since ImmutableList<T> is using a tree behind the scenes, indexing is O(log n). Thus, providing a BinarySearch method on ImmutableList<T> can do a better job because it can traverse the internal tree and avoid having to repeatedly re-locate the node in the loop. For symmetry, we also added the BinarySearch method to the ImmutableList<T>.Builder.

What has not changed?

We’ve received a lot of feedback on immutable collections. There were two major design points many of you commented on: the lack of constructors and the implementation of mutable interfaces. I’d like to explain these design decisions.

The lack of constructors

Some of you asked us to bring back the constructors. The concern was that this code:

ImmutableList<int> list = new ImmutableList<int>(new int[] { 1, 2, 3});

is, by far, more intuitive than this code:

ImmutableList<int> list = ImmutableList.Create<int>(new[] { 1, 2, 3});

That’s absolutely true. When designing APIs, we generally avoid factory methods because they are often an indication of overengineering and generally hurt usability. For example, most developers would agree that XDocument is easier to use than XmlDocument. The fact that XDocument avoids factories is one of the reasons why it’s more usable.

The challenge in designing immutable APIs is that it’s very important to avoid unnecessary allocations. For example, most immutable collections start off with an empty instance. Offering a constructor will inevitably cause significant GC pressure due to the fact that this is the most intuitive way to construct them.

Using factory methods also provide other interesting opportunities for optimization. For example, consider this code:

class NumberProcessor
{
    private readonly ImmutableList<int> _numbers;

    NumberProcessor(IEnumerable<int> numbers)
    {
        _numbers = ImmutableList.CreateRange<int>(numbers);
    }

    // other methods elided for brevity
}

The goal of this code is to keep the numbers around. Since the input is typed as IEnumerable<T>, you can pass virtually any collection, including collections that are already immutable. In case numbers is already an instance of ImmutableList<int>, the CreateRange factory method can simply cast it and thus avoid the construction of a new object. A constructor would have forced us to create a brand new immutable list.

There is also the problem of collection initializers. Suppose we provide a default constructor for ImmutableList<int>. In that case, the following code would compile just fine:

ImmutableList<int> items = new ImmutableList<int> {1, 2, 3};
Console.WriteLine(items.Count);

You would probably expect the output to be 3 but in fact it’s 0. The reason is that collection initializers follow a pattern. As long as the type has a constructor, implements IEnumerable, and provides a public Add method, the compiler lets you use the syntax above. The code that the compiler generates looks like this:

ImmutableList<int> items = new ImmutableList<int>();
items.Add(1);
items.Add(2);
items.Add(3);
Console.WriteLine(items.Count);

Now it’s clear why the output is 0: It’s because collection initializers don’t expect the Add method to return a new instance and instead rely on side effects to mutate the existing instance. Since the result is never used, the items variable still refers to the original, empty collection.

Thus, we decided to exclusively offer factory methods to ensure that you get the best behavior and cannot accidentally do the wrong thing.

Implementing the mutable interfaces

Several people raised the issue that our immutable collection types implement the mutable interfaces. For example, this is what the declaration of ImmutableList<T> looks like:

public sealed class ImmutableList<T> :
        IImmutableList<T>,
        IEnumerable,
        IEnumerable<T>,
        IReadOnlyCollection<T>,
        IReadOnlyList<T>,
        ICollection,
        ICollection<T>,
        IList,
        IList<T>
{
   ...
}

It’s worth pointing out that this declaration does not have a correctness issue – you cannot mutate an instance of ImmutableList<T> by casting it to a mutable interface. For example, calling IList<T>.Add would throw NotSupportedException.

The mutable interfaces always supported a read-only mode, which means that consumers can’t change the contents. This mode is exposed via ICollection.IsReadOnly, which all mutable interfaces inherit from. Thus, the concrete immutable collection types all implement the existing, mutable interfaces in a read-only fashion, which means that IsReadOnly returns true and all methods that would mutate the collection throw NotSupportedException.

That’s not fundamentally incompatible with immutable collections; for all intents and purposes, immutable collections provide an even stronger guarantee that nobody, not just the consumer, can change their contents. Some of you suggested that we shouldn’t implement the mutable interfaces at all, so you shouldn’t be able to accidentally pass an immutable collection to a method that takes an IList<T>. This would compile just fine, but might cause a NotSupportedException at runtime if the method in question tries to modify the collection through the mutable interface.

Based on the data we have, we found that in many cases where methods take a mutable interface (such as IList<T>), the consumer only needs to read the contents. We believe that enabling interoperability with mutable collections is very important, because otherwise it would be difficult to adopt immutable collections into existing code.

Also, this interoperability is consistent with how the rest of the .NET Framework is designed. For example, Streams can support reading, writing, and seeking. Instead of providing separate types, we’ve decided to expose the capabilities via the optional features pattern because it is typically much simpler and doesn’t result in an explosion of combinatorial types. The IsReadOnly property on ICollection<T> is another instance of this pattern.

Thus, we decided that it’s best to implement the mutable interfaces on the concrete types.

Summary

Today we shipped the release candidate of our Microsoft.Bcl.Immutable NuGet package. We plan to release a stable 1.0 package by the end of September. We’ve removed ImmutableArray<T> from 1.0 because we weren’t happy with its design. It will re-appear in 1.1, which will be available as a beta release shortly after we ship a stable 1.0 package.

Please play around with the RC and let us know what you think!

Leave a Comment
  • Please add 2 and 2 and type the answer here:
  • Post
  • Thanks for the update and this post ...

    "However, we do have some ideas on how to address this issue and decided to give it more baking time by removing ImmutableArray<T> for now."

    Would be great if you could share these ideas, im very interested in how you've overcome this given current .net stack...

    Or does a resolution involve a new JITCompiler ?! ;)

  • I am also interested in your ideas with respect to supporting the default state of an ImmutableArray<T> value type. I have had a similar problem before and didn't find any satisfactory solution.

  • Very useful post - thanks.

    I'm also very interested in how you intend to solve the ImmutableArray<T> issue. I wonder whether it involves forgoing C# and using some other language, or perhaps IL directly . . . the suspense is killing me ;)

  • Why don't you support .NET 4.0 and Silverlight. This make the library semi-portable.

  • imho the implications on using the mutable interfaces is larger than you think. If you start using them we can't use them to communicate that our methods require mutable collections.

    I've blogged about it here: blog.gauffin.org/.../immutable-collections-should-not-implement-mutable-interfaces

  • The issues related to the immutable array implementing interface types could be largely mitigated by using explicit interface implementations for those APIs; since you've already boxed by that point (unless it is using constrained-call via generics), having the explicit interfaces include the bounds check shouldn't be an overhead.

  • Please consider adding an overload to BinarySearch that takes a Func<T, int>. That is the most general way to do a binary search. The use case for that is that you might have a List<Customer> and want to search on Customer.Name or some other computed property.

  • @MarcGravell:

    > The issues related to the immutable array implementing interface types could be largely mitigated by using explicit interface implementations for those APIs.

    In fact, that's exactly what we do. The immutable collection classes all implement the mutable interfaces explicitly so that you never see any of the methods that would throw NotSupportedException() via intellisense. You have to actually cast them to the mutable interface before you can call one of them. The immutable collection interfaces do not derive from the mutable ones so you do have a good, solid immutable interface you can use to pass around when you want.

  • @jgauffin:

    > imho the implications on using the mutable interfaces is larger than you think. If you start using them we can't use them to communicate that our methods require mutable collections.

    Thanks for the feedback. Your second statement above suggests that immutable collections will be what kills mutable interfaces. I don't see it that way. Those mutable interfaces, as you know, already support a read only mode. But more to the point there are already implementations of these interfaces inside and outside the .NET Framework (like ReadOnlyCollection<T>) that implement IList<T> but return IsReadOnly=true and throw NotSupportedException if you call a mutating method. So while we're adding to the set of implementations that are read only, your code *already* can't fully rely on IList<T> being mutable, so I don't see that we're crossing a line here as it's already crossed long ago.

    I read your blog post. You mentioned how you might check IsReadOnly as part of input validation to make sure an argument you expect to mutate is in fact mutable before proceeding and throwing if it is not. You can certainly do that if you wish. In many cases, that's not necessary however because eventually your method will call a mutating method like Add and that Add method will throw. Whether your input validation throws or the Add method throws later may not matter -- unless you've made some mutating changes in other objects and can't handle an exception being thrown later in your method in which case your input validation approach seems reasonable.

  • @John, @Tristan, @Kent: Yes, one of the ideas is improving the code gen. Whether that's feasible depends on many things. It's too early to share anything concrete but we are all wearing our thinking caps. The suspense is killing us, too! :-)

    @Laci: The reason we don't support Silverlight and .NET 4 is that we depend on new APIs that were added in .NET 4.5 and we don't have the resources to properly support two code bases. Moving forward, portability is a lot easier because immtuable collections uses the new refactored reference assemblies which will be supported by all future platforms as well.

    @xor80: Thanks for the feedback. I've added a backlog item for it.

  • Any chance of getting some improvements to ImmutableInterlocked? It seems to be missing methods to do safe reference compare & swaps for ImmutableList, ImmutableSortedSet etc.

  • @Nathan: I don't see why not. I've filed a backlog item for it. Thanks for the suggestion!

  • Would it be possible to make IImmutableList<T> etc. covariant?

    Changing the definition to

     interface IImmutableList<out T> { ... }

    would allow something like this:

     class Car { public void StartEngine() { } }

     class BMW : Car { }

     ...

     public void StartAll(ImmutableList<Car> cars)

     {

         foreach (var car in cars)

         {

             car.StartEngine();

         }

     }

     ...

     var myBmws = ImmutableList<BMW>.Empty;

     myBmws = myBmws.Add(new BMW());

     StartAll(myBmws);  // compiler error because IImmutableList<T> not covariant

    Note that in this example, there's an easy workaround: change the signature of StartAll to

     StartAll(IEnumerable<Car> cars)

    which works because IEnumerable<T> IS covariant (IEnumerable<out T>). It would be much nicer though

    if IImmutableList<T> was covariant too so the method could directly take advantage of the immutable collection,

    e.g. build another ImmutableList from it sharing memory.

  • I understand the resource problem very well. We also cannot support two code bases.

    The pity is we cannot evolve our code.

    We have customers on .NET 3.5, with some luck they will be in 5 years on 4.5.

    I can imagine that MS has also such customers.

  • @Max Horstmann: Unfortunately not. The way generic covariance is type checked in C# is that the generic type parameter T must not appear in input positions, such as Add(T item). For contra variance T must not appear in return positions. All but one method on IImmutableList<T> take a T, which would render it quite useless.

    As you noted, you can type the method argument as IEnumerable<T>. IReadOnlyCollection<T> and IReadOnlyList<T> would also work as both are covariant.

    @Laci: Sorry to hear that :-(

Page 1 of 3 (33 items) 123