Please welcome ImmutableArray<T>

Please welcome ImmutableArray<T>

Rate This
  • Comments 25

We’ve just released an update to our immutable collection package which adds a new member to the family of immutable collection types: ImmutableArray<T>.

In this post, I’ll talk about why we added another collection and how it relates to the existing types. I’ll also cover some minor updates we did to our package.

Introduction ImmutableArray<T>

Sometimes a little bit of code says more than a thousand pictures, so let’s look at the declaration of immutable array:

        public struct ImmutableArray<T> : IList,
                                          IList<T>,
                                          IReadOnlyList<T>,
                                          IImmutableList<T>,
                                          IEquatable<ImmutableArray<t>>,
                                          IStructuralComparable,
                                          IStructuralEquatable
        {
            // ...
        }

As you can see ImmutableArray<T> implements IImmutableList<T> which begs the question how it is different from the existing implementation ImmutableList<T>.

The answer: performance.

Arrays are hard to beat in several ways: they provide an O(1) element access, they are very cache friendly as all data is co-located, and they provide low overhead for small collections (< 16 elements).

ImmutableArray<T> is a very thin wrapper around a regular array and thus shares all the benefits with them. We even made it a value type (struct) as it only has a single field which holds the array it wraps. This makes the size of the value type identical to the reference of the array. In other words: passing around an immutable array is as cheap as passing around the underlying array. Since it’s a value type, there is also no additional object allocation necessary to create the immutable wrapper, which can reduce GC pressure.

The key operations on ImmutableArray<T> just forward to the underlying array, including the indexer. In contrast to List<T> or ArraySegment<T> the underlying array always has the same length as the outer type, i.e. the immutable array. This means we can avoid having additional bounds checking and just rely on the underlying array. This allows the CLR code generation to inline the indexer access.

We’ve also implemented custom LINQ operators for ImmutableArray<T>. This avoids boxing immutable arrays into an IEnumerable<T>.

Using ImmutableArray<T>

Creating an immutable array is similar to creating an immutable list, i.e. it follows the factory pattern via static Create methods:

        ImmutableArray<int> array = ImmutableArray.Create(1, 2, 3);

You can also create an immutable array via the ToImmutableArray() extension method:

        IEnumerable<int> someInts = Enumerable.Range(1, 100);
        ImmutableArray<int> array = someInts.ToImmutableArray();

It also supports the builder pattern which allows constructing immutable arrays via mutation:

        ImmutableArray<int>.Builder builder = ImmutableArray.CreateBuilder<int>();
        builder.Add(1);
        builder.Add(2);
        builder.Add(3);
        ImmutableArray<int> oneTwoThree = builder.ToImmutable();

You may wonder how using the builder differs from using a List<T> with the ToImmutableArray() extension method. Generally, all immutable collection builders are functionally equivalent to their ordinary mutable collection types (List<T>, Dictionary<TKey, TValue> etc). They are purely there to provide better performance. Using the ImmutableArray<T> builder can improve performance as the implementation uses a trick to avoid type checks. Normally the CLR has to perform additional type checks at runtime to ensure that storing an element in an array is type safe, because arrays are covariant which means the correctness can’t be easily checked statically. ImmutableArray<T> avoids this by wrapping references in a pointer-sized value type. For more details on this subject I recommend this blog post from Eric Lippert.

Reading data from an ImmutableArray<T> is similar to reading regular arrays:

        ImmutableArray<int> array = //...
        for (var i = 0; i < array.Length; i++)
        {
            Console.WriteLine(array[i]);
        }

We’ve decided to go with having a Length property instead of Count because we see ImmutableArray<T> as a replacement for regular arrays. This makes it easier to port existing code. It also makes the design a bit more self-contained.

Of course we also support enumerating the elements via foreach. ImmutableArray<T> follows what other collection types do as well: it implements IEnumerable<T> but also provides a custom enumerator which allows the compiler to generate more efficient code which avoids boxing the enumerator.

Since ImmutableArray<T> is a value type, it overloads the equals (==) and not-equals (!=) operators. They are defined as using reference equality on the underlying array.

The default value of ImmutableArray<T> has the underlying array initialized with a null reference. In this case it behaves the same way as an ImmutableArray<T> that has been initialized with an empty array, i.e. the Length property returns 0 and iterating over it simply doesn’t yield any values. In most cases this is the behavior you would expect. However, in some cases you may want to know that the underlying array hasn’t been initialized yet. For that reason ImmutableArray<T> provides the property IsDefault which returns true if the underlying array is a null reference. For example you can use that information to implement lazy initialization:

        private ImmutableArray<byte> _rawData;
        public ImmutableArray<byte> RawData
        {
            get
            {
                if (_rawData.IsDefault)
                    _rawData = LoadData();
                return _rawData;
            }
        }

ImmutableArray<T> vs. ImmutableList<T>

In contrast to what I said earlier, we decided to make immutable array a persistent data structure. In other words: similar to ImmutableList<T> it has methods that allow creating new instances of immutable arrays with different contents. Mind you, the performance characteristic of those operations is fairly different from ImmutableList<T>. ImmutabeList<T> uses a tree data structure that allows sharing. It also allows for making sure updates can be done in O(log n).

The following table summarizes the performance characteristics of ImmutableArray<T>. You can find a similar table for the existing types here.

OperationImmutableArray<T> ComplexityImmutabeList<T> ComplexityComments
Create() O(n) O(n) Requires to copy the input array to ensure the underlying array can’t be modified
this[] O(1) O(log n) Directly index into the underlying array
Add() O(n) O(log n) Requires creating a new array

Reasons to use immutable array:

  • Updating the data is rare or the number of elements is quite small (<16)
  • you need to be able to iterate over the data in performance critical sections
  • you have many instances of immutable collections and you can’t afford keeping the data in trees

Reasons to stick with immutable list:

  • Updating the data is common or the number of elements isn’t expected to be small
  • Updating the collection is more performance critical than iterating the contents

In general, when all you need is an immutable array and you don’t plan on changing the data ever, use ImmutableArray<T>. If you need to update the data, use ImmutableList<T>.

If you do update the data but think ImmutableArray<T> could perform better overall, you should try both and measure. Remember that designing for performance means to consider different trade-offs. It’s key to measure those in actual scenarios, under real-world workloads.

What does this mean for code that operates with the interface IImmutableList<T>? The interface could be backed by either an immutable list or an immutable array (or a custom type). Due to the different complexities the code cannot rely on the exact complexity. So in general the code should use bulk operations whenever possible in order to make sure the cost is minimized. For example, instead of calling Add() in a loop you should prefer a single call to AddRange().

Other Changes in this Update

Create() and From() factory methods

In the last release, we added the Create() factory methods for constructing immutable collections. We created overloads for both scalars as well as collections:

        public static ImmutableList<T> Create<T>();
        public static ImmutableList<T> Create<T>(params T[] items);
        public static ImmutableList<T> Create<T>(IEnumerable<T> items);

We discovered that the overload that takes the IEnumerable<T> can have surprising results. You’d think you can use the overload that takes IEnumerable<T> by creating collections from other collections:

        var list = new List<string>();
        list.Add("One");
        list.Add("Two");
        // Doh! Actually I wanted to get ImmutableList<string>
        ImmutableList<list<string>> il = ImmutableList.Create(list);

Instead of creating an ImmutableList<string> you end up creating an ImmutableList<List<string>> because overload resolution prefers the params overload over an implicit conversion from List<string> to IEnumerable<string>.

For that reason we’ve decided to remove the ambiguity by renaming all factory methods that operate over IEnumerable<T> to From:

        public static ImmutableList<T> Create<T>();
        public static ImmutableList<T> Create<T>(params T[] items);
        public static ImmutableList<T> From<T>(IEnumerable<T> items);

We decided not to rename the params overload as it is usually called with one or more scalars and not with an array. We believe this design will appear more consistent from typical call sites.

CreateBuilder() factory methods

Previously, you couldn’t create builders directly. You had to create them from an immutable collection like this:

        ImmutableList<int>.Builder builder = ImmutableList.Create<int>().ToBuilder(); 

Now you can simply call a factory method, similar to creating an immutable collection itself:

        ImmutableList<int>.Builder builder = ImmutableList.CreateBuilder<int>();

Value Comparers

Our original design of IImmutableList<T> was based on the idea of keeping it aligned with IImmutableDictionary<TKey, TValue> and IImmutableSet<T>. Both of them have a built-in notion of comparers. When implementing IImmutableList<T> on ImmutableArray<T>, we were faced with the problem of supporting a custom comparer on the type. We could have added one more field to store it, but in this case ImmutableArray<T> would no longer be a cheap wrapper around an array. So we decided to drop the idea of storing a customer comparer on the list itself and instead require consumers to pass in the equality comparer they want to use.

As a result we removed the ValueComparer property and WithComparer methods from IImmutableList<T>.

For the members that implicitly used the comparer we’ve changed their signature to take a comparer (for example, the IndexOf method). This is a breaking change for implementers. For consumers, most of the signature changes shouldn’t be source breaking as we also added extension methods that pass in the default comparer.

TryGetValue() for sets

We’ve received the feedback that IImmutableSet<T> doesn’t have a way to get the actual value that is stored in the set – it only allows for checking whether it contains a value that is considered equal. In most cases, that’s not a problem at all. But consider a set of strings that uses a case-insensitive comparer, for example a set of filenames. If you want to know for a given filename what the canonical casing is, you are out of luck. Therefore we added a TryGetValue method that allows retrieving the original value that was added to the set:

        var set = ImmutableHashSet.Create<string>(StringComparer.OrdinalIgnoreCase);
        set = set.Add("D:\Src\Test.cs");
        string original;
        if (set.TryGetValue("d:\src\test.cs", out original))
        {
            // original contains "D:\Src\Test.cs"
        }

GetValueOrDefault() for dictionaries

Lastly, we’ve fixed an issue with the GetValueOrDefault() extension method we’ve added for dictionaries.

        ImmutableDictionary<string, string> dictionary = ImmutableDictionary
                                                        .Create<string, string>()
                                                        .Add("key1", "value1")
                                                        .Add("key2", "value2");
        string value = dictionary.GetValueOrDefault("key1");

Unfortunately, this results in a compilation error:

The call is ambiguous between the following methods or properties: 'ImmutableDictionary.GetValueOrDefault<string,string>(IReadOnlyDictionary<string,string>, string)' and 'ImmutableDictionary.GetValueOrDefault<string,string>(IDictionary<string,string>, string)'

The reason is that we offered this extension method for IDictionary<TKey, TValue> as well as IReadOnlyDictionary<TKey, TValue>. Since an instance of the concrete ImmutableDictionary<TKey, TValue> class is both, the compiler cannot decide which one to use. Typing the local variable to either of the types will work. You can also type the local variable as the IImmutableDictionary<TKey, TValue> interface as the interface only extends IReadOnlyDictionary<TKey, TValue> (in fact that’s why we didn’t catch it earlier as this was the code we used in the unit tests).

We’ve solved this issue by removing the overloads for IDictionary<TKey, TValue> and IReadOnlyDictionary<TKey, TValue>. Instead we added one for IImmutableDictionary<TKey, TValue>. This solves ambiguity and keeps the immutable package focused on types relating to immutable data structures.

Summary

The latest iteration of the immutable collections preview adds an immutable array type. It’s a zero overhead wrapper around a regular array that ensures it never changes.

Go play around with it and let us know what you think!

Leave a Comment
  • Please add 7 and 4 and type the answer here:
  • Post
  • This is great! Was just having a conversation recently about how we might want to consider moving to encouraging immutability in ReactiveUI and all the benefits that we might gain.

    However, looking at all the other libraries coming out of the BCL team lately, I'm concerned about the licensing. Are you planning to have a Windows only restriction on the non-eval-only release? Please consider this carefully and avoid a situation like what happened with MEF. tirania.org/.../Sep-07.html

    Talk to the ASP.NET MVC/Web API team about this as they've long removed that restriction from all their compiled binaries.

    I can't imagine (and correct me if I'm wrong) that there's a strategic benefit to this that would outweigh the benefit to you and the cross platform .NET community to have this code be runnable cross-platform.

  • @Haacked: Thanks for sharing your concerns and the candid feedback. You post raised two very different items:

    (1) Windows only restriction on the license of the binaries

    (2) Open sourcing immutable collections

    From what I read, it sounds you are more interested in (1), is this correct?

  • Awesome, glad to see you guys were able to add this.

    On the From method- I think you guys should just drop the params overload and rename the IEnumerable method back to Create. It's confusing (and harder to remember) to have methods Create and From that both conceptually do the exact same thing. It's probably rare someone would use the params overload, and even so, it's trivial to use an implicitly typed array instead (ImmutableList.Create(new[] { 3, 4, 5}) instead of ImmutableList.Create(3, 4, 5)).

  • Not sure why appending to the head of your immutable list isn't O(1)

  • @Immo, well I'm interested in both. But I also understand how Microsoft works enough to know that (1) is much easier than (2). :P

    So ultimately, I think both would be great, but for now, (1) is a good first step and a minimal requirement for us to use it in ReactiveUI etc.

  • It would be great if you could also add TryGetValue to the regular HashSet<T> class. An example where this would be useful is building a string pool (or any kind of object pool); if the string is already in the HashSet, getting a reference to the already-stored string allows that instance to be reused, saving memory. A workaround is using Dictionary<string, string> (and storing the same string as both the key and the value), but that's inefficient. (The other option is implementing one's own class, but it would be nice to be able to reuse BCL code.)

  • What's so special about arrays with less than 16 items? As I understand it, the overhead of array is O(1), while the overhead of tree is O(n), so array should be always more efficient for reading.

    Or is that suggestion because of the O(n^2) time required to create an array item by item without a builder?

  • I have to agree with Haacked on this one.

    As far as I'm concerned, .net/C# has by far the best developer story out there. I love it and I want to use it badly on new projects.

    But with libraries being Windows only and a clear shift towards multi-platform environments in the mobile and cloud spaces, starting a new project in C# is a tough sell if I cannot easily port it to other platforms.

    I could go on and reminisce about how Microsoft has forgotten about their actual customers, developers and tech enthusiasts in the name of a coherent corporate vision that is akin to squaring the circle.

    The current state of affairs is just saddening and I wish Microsoft would once again try to deliver us the best platform with the best support instead of trying to be Apple.

  • What's with hiding all the constructors? Sure, the various builder methods are useful, but if I really want an ImmutableArray<int> then I should just be able to write x = new ImmutableArray<int>(1,2,3); I shouldn't have to dig around for where you hid the create/from/build/whatever method.

    Take a look at our other immutable collections such as Tuple and String. They offer both builder methods and easy to find constructors.

    I'm also thinking about testing tools. I use code generation a lot to create templates for stuff I need to test. That coder generator looks for public constructors and prepopulates them with random but reasonable data. With this pattern I would have to special case all of the immutable collections.

  • Small typo: ImmutabeList

  • @Jonathan Allen

    From the paragraph titled "Construction" on this page blogs.msdn.com/.../update-to-immutable-collections.aspx:

    We explicitly decided not to provide constructors for immutable collections. The reason for this is that immutable collections typically start with an empty instance. Since all empty instances would be the same, this would result in a lot of wasted objects. Instead, we decided to use a singleton as the empty instance.

  • You guys should consider applying this same Builder treatment to Strings. Strings are immutable, but you certainly wouldn't know it from their API. There's no fluent way to convert them to a StringBuilder and then apply mutations. Trim, Remove, Replace, etc. all make a lot more sense on StringBuilder than they do on String.

  • I'm so happy to see this, especially on the back of the Jon Skeet blog post about the type check in normal arrays.

    Feature request: can we have overloads for ImmutableArray.From (or Create) that take an IReadOnlyCollection (or just an IEnumerable plus count) and an optional selector function, and allocate the correct-sized array up front? We can do it ourselves with ImmutableArray.CreateBuilder(int) and a loop, but it would be good if the package itself supported this case as a pit of success.

  • Another feature request, while I'm here: multidimensional immutable arrays, just like multidimensional regular arrays.

  • Nice! Would be great to see a CreateBuilder(int expectedLength), so that ToImmutable() becomes super cheap when the final length matches the expected length.

Page 1 of 2 (25 items) 12