Immutable collections ready for prime time

Immutable collections ready for prime time

Rate This
  • Comments 33

Today I’m very happy to announce that we released the stable version of the Microsoft.Bcl.Immutable NuGet package. We also published the MSDN documentation on immutable collections.

Thank you!

Nine months ago, we shipped the first preview of immutable collections. This was one of the first BCL features that we decided to ship early and often by releasing out of band (that is, outside the core .NET Framework releases). This allowed us to have a public design discussion, which, in turn, gave us the ability to address your feedback in a faster and more direct way than ever before.

Many of you participated and commented on our blog posts. Immutable collections wouldn’t be what it is now in this stable release if it weren’t for the great feedback and discussion we had with you. Thanks to everybody who participated!

What are immutable collections?

Over time, the .NET Framework has added many features that made concurrent programming a lot easier. This started with the introduction of the thread pool, got a lot more powerful with the task-based model and the Task Parallel Library (TPL), and was improved even more by the addition of the async and await language keywords.

While creating and running concurrently is easier than ever, one of the fundamental problems still exists: mutable shared state. Reading from multiple threads is typically very easy, but once the state needs to be updated, it gets a lot harder, especially in designs that require locking.

An alternative to locking is making use of immutable state. Immutable data structures are guaranteed to never change and can thus be passed freely between different threads without worrying about stepping on somebody else’s toes.

This design creates a new problem though: How do you manage changes in state without copying the entire state each time? This is especially tricky when collections are involved.

This is where immutable collections come in. We created the following immutable collections to make it a lot easier to create immutable data structures:

Let’s look at an example. You have built a billing system and you see that an immutable design will enable you to pass orders around without having to worry about data corruption when multiple threads are involved. For example, you could implement printing selected orders from a worker thread by passing a snapshot of them to the thread. This way you avoid blocking the UI, and allow the user to edit orders without affecting the snapshot passed to the worker thread.

Your mutable design might look like this:

class Order
{
    public Order()
    {
        Lines = new List<OrderLine>();
    }

    public List<OrderLine> Lines { get; private set; }
}

class OrderLine
{
    public int Quantity { get; set; }
    public decimal UnitPrice { get; set; }
    public float Discount { get; set; }

    public decimal Total
    {
        get
        {
         return Quantity * UnitPrice * (decimal) (1.0f - Discount);
        }
    }
}

Now let’s see how you would convert this to an immutable design.

Let’s start with OrderLine. A first step would be to make the properties read-only and initialize them from the constructor:

class OrderLine
{
    public OrderLine(int quantity, decimal unitPrice, float discount)
    {
        Quantity = quantity;
        UnitPrice = unitPrice;
        Discount = discount;
    }

    public int Quantity { get; private set; }

    public decimal UnitPrice { get; private set; }

    public float Discount { get; private set; }

    public decimal Total
    {
        get
        {
         return Quantity * UnitPrice * (decimal) (1.0f - Discount);
        }
    }
}

This new design requires that you create a new instance of an OrderLine whenever any of the values changes. You can make the design a bit more convenient by adding WithXxx methods that let you update individual properties without having to explicitly call the constructor yourself:

class OrderLine
{
    // ...

    public OrderLine WithQuantity(int value)
    {
        return value == Quantity
                ? this
                : new OrderLine(value, UnitPrice, Discount);
    }

    public OrderLine WithUnitPrice(decimal value)
    {
        return value == UnitPrice
                ? this
                : new OrderLine(Quantity, value, Discount);
    }

    public OrderLine WithDiscount(float value)
    {
        return value == Discount
                ? this
                : new OrderLine(Quantity, UnitPrice, value);
    }
}

This makes the immutable type very easy to use:

OrderLine apple = new OrderLine(quantity: 1, unitPrice: 2.5m, discount: 0.0f);
OrderLine discountedAppled = apple.WithDiscount(.3f);

Two things are noteworthy:

  • The WithXxx methods try to avoid creating new instances if the new value is identical to the current value.
  • Fruits, especially apples, seem to be expensive.

(By the way, if you dislike the amount of typing involved in creating an immutable type, you aren’t alone. Andrew Arnott has created a T4 template that helps you with that.)

Now let’s see how we would implement Order in an immutable fashion. The Lines property is already read-only, but it refers to an object that is mutable. Since it’s a collection, it’s easy to convert by simply replacing it with ImmutableList<T>:

class Order
{
    public Order(IEnumerable<OrderLine> lines)
    {
        Lines = lines.ToImmutableList();
    }

    public ImmutableList<OrderLine> Lines { get; private set; }

    public Order WithLines(IEnumerable<OrderLine> value)
    {
        return Object.ReferenceEquals(Lines, value)
            ? this
            : new Order(value);
    }
}

This design has some interesting properties:

  • The constructor accepts IEnumerable<T>, which allows passing in any collection.
  • We use the ToImmutableList() extension method, which will convert the enumerable to an ImmutableList<OrderLine>. If the instance is already an immutable list, it will simply cast instead of creating a new collection.
  • The WithLines() method follows the convention from our OrderLine, which avoids creating a new instance if the new list is identical to the current list of lines.

We could also add some convenience methods to make it easier to update the order lines:

class Order
{
    //...

    public Order AddLine(OrderLine value)
    {
        return WithLines(Lines.Add(value));
    }

    public Order RemoveLine(OrderLine value)
    {
        return WithLines(Lines.Remove(value));
    }

    public Order ReplaceLine(OrderLine oldValue, OrderLine newValue)
    {
        return oldValue == newValue
                ? this
                : WithLines(Lines.Replace(oldValue, newValue));
    }
}

These additions allow creating orders like this:

OrderLine apple = new OrderLine(quantity: 1, unitPrice: 2.5m, discount: 0.0f);
Order order = new Order(ImmutableList.Create(apple));

OrderLine discountedApple = apple.WithDiscount(discount);
Order discountedOrder = order.ReplaceLine(apple, discountedApple);

The nice thing about this design is that it avoids unnecessary object creation whenever possible. For example, when the value of discount is equal to 0.0f, i.e., when there is no discount, discountedApple and discountedOrder refer to the existing instances of apple and order.

Here is why:

  1. apple.WithDiscount() will return the existing instance of apple because the new discount is identical to the current value of the Discount property.
  2. order.ReplaceLine() will return the existing instance if both arguments are the same.

Other operations of our immutable collections follow this spirit of maximizing reuse. For example, adding an order line to an order with 1,000 order lines will not create an entire new list with 1,001 elements. Instead, it will reuse a large chunk of the existing list. This is possible because the list is internally represented as a tree which allows sharing nodes between different instances.

That sounds great - I’d like to learn more

The one-stop shop for all API questions is MSDN. So your first visit should be the MSDN documentation on immutable collections.

Our very first blog post on immutable collections is also a great in-depth introduction. It covers algorithmic complexities as well as the builder pattern, which allows for more efficient bulk updates.

If your question is more along the lines of “why did Microsoft do it this way,” take a look at the entire blog post series on the .NET Blog as well as the now retired BCL Blog. Many of you asked challenging questions and even proposed design improvements which we discussed in following posts.

If you are like me, you probably prefer videos and face-to-face conversations over reading text. If so, you may want to watch the two videos we put together for Channel 9. They allow you to meet the team and gain more insight into the design and inner workings of immutable collections.

Immutable Collections in .NET

Inner workings of immutable collections

Are immutable collections now done?

No. We still plan on bringing back ImmutableArray<T>, which we decided to remove for now, due to some design issues.

Also, our documentation team is publishing updates to MSDN every three weeks. Since the documentation on immutable collections is all new, we’re very interested to get your feedback so we can make the docs even better!

Summary

We’ve just shipped a stable release of Microsoft.Bcl.Immutable. The license now allows usage in production, so they are ready for prime time.

Please have a look at the new documentation on MSDN and let us know what you think!

Leave a Comment
  • Please add 5 and 6 and type the answer here:
  • Post
  • fix the example code please (OrderItem does not exist nor does appleOrder)

  • Is there any way we can get Andrew's T4 templates as a NuGet package?  It looks like there is an assembly and a number of .tt files that need to be included in any application that wants to use Andrew's immutable objects.  It would be great for that to be available as a NuGet package.

  • @Sander: Oops, sorry for that. I've fixed it.

  • @Finder: it sounds like you have need of very high lookup speeds of a dictionary that never mutates. If that's true, just use Dictionary. It's thread-safe for read-only access, and is the fastest implementation for those requirements.

    ImmutableDictionary is worth considering when you do incrementally mutate your dictionary.

    ConcurrentDictionary is worth considering when you must have many threads reading and writing the dictionary concurrently and need everyone seeing the latest stuff all the time. Otherwise, its overhead due to thread handling makes it a bad fit for your situation from what you've said.

  • @Kenny: I'll work on a NuGet distribution of the ImmutableObjectGraph and post an update as a comment here and on my blog blogs.msdn.com/.../andrewarnottms

  • @Andrew L Arnott, would it be a heuristic problem for Visual Studio intellisense to notify us to use ImmutableColleciton, by inspecting common patterns in the code?

    Something like:

    "You may consider using ImmutableCollection here to improve performance significantly".

  • I've playing around with the immutable collections and they're feeling very mature, so nice job you guys!

    I'm doing some examples for OrigoDb which relies heavily on serialization for persistence and moving objects across the network.

    Any particular reason why the collections aren't marked serializable out of the box?

  • @Robert: Yes. First of all, binary serialization isn't support in the portable subset we are using (visualstudio.uservoice.com/.../3701316-make-the-new-immutable-collection-types-serializab). Secondly, we prefer a model where serialization is done outside the core data structures because it makes it more resilient to implementation changes and solves the cross version serialization when used in client/server scenarios. Popular serialization libraries already plan on adding support for immutable collections, e.g. JSON.NET, protobuf-net.

  • Minor typo on the immutable stack documentation:

    "Gets and empty immutable stack."

    Should be

    "Gets an empty immutable stack."

  • @James Wiseman: Thanks. Our doc writer is fixing it now.

  • Could you comment on the remarks by several people on the fact that Immutability of these collections is still a run-time aspect?

    Directly implementing IList<T> does still feel like a mistake.

    It is left up to the user to check for read-only-ness of an IList<T> before mutating it.

    A failure to do so, is a mistake that might result in a run-time Exception.

    We believe that development would benefit if this mistake would result in a compilation error.

    (For background info:  blogs.msdn.com/.../please-welcome-immutablearray.aspx )

    Did you consider changing this design aspect? And if so, could you elaborate on the reasoning why you didn't?

    Maybe could anyone point me to alternative implementations of immutable collections in that differ

    wrt this design aspect?

    Thanks in advance,

    Jan Verley

    LMS International,

    A Siemens business

  • @jan verley: Yes, we've considered the design you proposed and discussed this extensively on our side. In the last blog post, we've outlined our reasoning:

    blogs.msdn.com/.../immutable-collections-are-now-rc.aspx

    (Section "Implementing the mutable interfaces")

    I'm not aware of an alternative implementation of immutable collections that would follow the design you proposed.

    We fully understand the problem you are describing when it comes to accidentally passing an immutable collection to an API typed as IList<T>. However, in practice we believe the problem isn't as bad as it might sound. First of all, this problem already exists since .NET 1.0 as many collections implemented the mutable interface in a read-only fashion and will throw if you try to mutate them. For example, Dictionary.Keys and Dictionary.Values will throw. In other words, a pure read-only state was always part of the contract for all mutable collection interfaces (both non-generic as well as generic).

    Also, from usage data we know that most methods that take the mutable interface only use it for reading. For example, it seems too darn useful to be able to pass an ImmutableArray<T> to a method that performs a binary search and is typed as taking an IList<T>.

  • Just a note: Ayende from the RavenDB team has been using ImmutableDictionary, and have hit some performance issues. Might be worth checking out: ayende.com/.../fruits-of-the-poisonous-tree-voron-performance-refactoring

  • I have a question about nested generic type parameters with the immutable classes. How would you go from something like ImmutableDictionary<int, ImmutableList<string>.Builder>.Builder to ImmutableDictionary<int, ImmutableList<string>>? Would you recommend declaring the type as ImmutableDictionary<int, IList<string>>.Builder and using typecast operations on the dictionary values, or is there a better way?

  • Is ImmutableDictionary differently implemented than FSharpx.PersistentHashMap? I know that FSharpx.PersistentHashMap are based on Hash Array Mapped Tries (HAMT), which is log_32 n for lookups and inserts. It can also supports fast bulk updates using a TransientHashMap. How similiar or different are the performance characteristics of ImmutableDictionary compared to FSharpx.PersistentHashMap? E.g. which is better for reads/writes until which number of items? What about their memory footprint?

Page 2 of 3 (33 items) 123