We’ve just released an update to our immutable collection package which adds a new member to the family of immutable collection types: ImmutableArray<T>.
In this post, I’ll talk about why we added another collection and how it relates to the existing types. I’ll also cover some minor updates we did to our package.
Sometimes a little bit of code says more than a thousand pictures, so let’s look at the declaration of immutable array:
public struct ImmutableArray<T> : IList<T>, IImmutableList<T> // other interfaces and members omitted
As you can see, ImmutableArray<T> implements IImmutableList<T>, which raises the question of how it differs from the existing implementation, ImmutableList<T>.
The answer: performance.
Arrays are hard to beat in several ways: they provide O(1) element access, they are very cache friendly because all data is co-located, and they have low overhead for small collections (fewer than 16 elements).
ImmutableArray<T> is a very thin wrapper around a regular array and thus shares all of its benefits. We even made it a value type (struct) as it only has a single field which holds the array it wraps. This makes the size of the value type identical to the reference of the array. In other words: passing around an immutable array is as cheap as passing around the underlying array. Since it’s a value type, there is also no additional object allocation necessary to create the immutable wrapper, which can reduce GC pressure.
The key operations on ImmutableArray<T> just forward to the underlying array, including the indexer. In contrast to List<T> or ArraySegment<T> the underlying array always has the same length as the outer type, i.e. the immutable array. This means we can avoid having additional bounds checking and just rely on the underlying array. This allows the CLR code generation to inline the indexer access.
We’ve also implemented custom LINQ operators for ImmutableArray<T>. This avoids boxing immutable arrays into an IEnumerable<T>.
Creating an immutable array is similar to creating an immutable list, i.e. it follows the factory pattern via static Create methods:
ImmutableArray<int> array = ImmutableArray.Create(1, 2, 3);
You can also create an immutable array via the ToImmutableArray() extension method:
IEnumerable<int> someInts = Enumerable.Range(1, 100);
ImmutableArray<int> array = someInts.ToImmutableArray();
It also supports the builder pattern which allows constructing immutable arrays via mutation:
ImmutableArray<int>.Builder builder = ImmutableArray.CreateBuilder<int>();
builder.Add(1); builder.Add(2); builder.Add(3);
ImmutableArray<int> oneTwoThree = builder.ToImmutable();
You may wonder how using the builder differs from using a List<T> with the ToImmutableArray() extension method. Generally, all immutable collection builders are functionally equivalent to their ordinary mutable collection types (List<T>, Dictionary<TKey, TValue>, etc.). They are purely there to provide better performance. Using the ImmutableArray<T> builder can improve performance because the implementation uses a trick to avoid type checks. Normally the CLR has to perform additional type checks at runtime to ensure that storing an element in an array is type safe, because arrays are covariant, which means the correctness can’t easily be checked statically. ImmutableArray<T> avoids this by wrapping references in a pointer-sized value type. For more details on this subject I recommend this blog post from Eric Lippert.
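To make the runtime type check concrete, here is a minimal sketch of array covariance in plain C#. It does not show the trick ImmutableArray<T> uses internally; it only shows the check that the trick is meant to avoid. The names CovarianceDemo and StoreFails are made up for this illustration.

```csharp
using System;

public static class CovarianceDemo
{
    // Returns true if storing 'value' into target[0] fails the CLR's runtime type check.
    public static bool StoreFails(object[] target, object value)
    {
        try
        {
            target[0] = value; // the CLR verifies the element type on every reference store
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // Array covariance: a string[] can be passed where an object[] is expected.
        object[] objects = new string[1];

        Console.WriteLine(StoreFails(objects, "text")); // False: a string fits into a string[]
        Console.WriteLine(StoreFails(objects, 42));     // True: a boxed int does not
    }
}
```

The compiler accepts both stores because the static type is object[]; only the runtime check catches the second one.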
Reading data from an ImmutableArray<T> is similar to reading regular arrays:
ImmutableArray<int> array = //...
for (var i = 0; i < array.Length; i++)
    Console.WriteLine(array[i]);
We’ve decided to go with having a Length property instead of Count because we see ImmutableArray<T> as a replacement for regular arrays. This makes it easier to port existing code. It also makes the design a bit more self-contained.
Of course we also support enumerating the elements via foreach. ImmutableArray<T> follows what other collection types do as well: it implements IEnumerable<T> but also provides a custom enumerator which allows the compiler to generate more efficient code which avoids boxing the enumerator.
Since ImmutableArray<T> is a value type, it overloads the equals (==) and not-equals (!=) operators. They are defined as using reference equality on the underlying array.
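A short sketch of what reference equality means in practice, assuming the System.Collections.Immutable package is referenced. The helper names SameUnderlyingArray and SameContents are illustrative only and not part of the package.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;

public static class EqualityDemo
{
    // == compares the wrapped array references, not the elements.
    public static bool SameUnderlyingArray(ImmutableArray<int> x, ImmutableArray<int> y) => x == y;

    // Element-wise comparison has to be requested explicitly.
    public static bool SameContents(ImmutableArray<int> x, ImmutableArray<int> y) => x.SequenceEqual(y);

    public static void Main()
    {
        ImmutableArray<int> a = ImmutableArray.Create(1, 2, 3);
        ImmutableArray<int> b = ImmutableArray.Create(1, 2, 3);
        ImmutableArray<int> copyOfA = a; // copies the struct, which still wraps the same array

        Console.WriteLine(SameUnderlyingArray(a, copyOfA)); // True
        Console.WriteLine(SameUnderlyingArray(a, b));       // False: two separate arrays
        Console.WriteLine(SameContents(a, b));              // True
    }
}
```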
The default value of ImmutableArray<T> has the underlying array initialized with a null reference. In this case it behaves the same way as an ImmutableArray<T> that has been initialized with an empty array, i.e. the Length property returns 0 and iterating over it simply doesn’t yield any values. In most cases this is the behavior you would expect. However, in some cases you may want to know that the underlying array hasn’t been initialized yet. For that reason ImmutableArray<T> provides the property IsDefault which returns true if the underlying array is a null reference. For example you can use that information to implement lazy initialization:
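private ImmutableArray<byte> _rawData;
public ImmutableArray<byte> RawData
{
    // Load the data on first access, while the field still holds its default value
    get { if (_rawData.IsDefault) _rawData = LoadData(); return _rawData; }
}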
In contrast to what I said earlier, we decided to make immutable array a persistent data structure. In other words: similar to ImmutableList<T>, it has methods that create new immutable array instances with different contents. Mind you, the performance characteristics of those operations are fairly different from ImmutableList<T>. ImmutableList<T> uses a tree data structure that allows sharing nodes between instances, so updates can be done in O(log n). ImmutableArray<T> has nothing to share: each update copies the entire array, which is O(n).
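As a rough sketch of the persistent behavior, assuming the System.Collections.Immutable package is referenced: an update returns a new instance and leaves the original untouched. The class and method names here are made up for the example.

```csharp
using System;
using System.Collections.Immutable;

public static class PersistenceDemo
{
    // Add does not mutate 'source'; it returns a new immutable array backed by a fresh copy.
    public static ImmutableArray<int> AppendFour(ImmutableArray<int> source) => source.Add(4);

    public static void Main()
    {
        ImmutableArray<int> original = ImmutableArray.Create(1, 2, 3);
        ImmutableArray<int> extended = AppendFour(original);

        Console.WriteLine(original.Length);      // 3
        Console.WriteLine(extended.Length);      // 4
        Console.WriteLine(original == extended); // False: different underlying arrays
    }
}
```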
The following table summarizes the performance characteristics of ImmutableArray<T>. You can find a similar table for the existing types here.
Reasons to use immutable array:
Reasons to stick with immutable list:
In general, when all you need is an immutable array and you don’t plan on changing the data ever, use ImmutableArray<T>. If you need to update the data, use ImmutableList<T>.
If you do update the data but think ImmutableArray<T> could perform better overall, you should try both and measure. Remember that designing for performance means considering different trade-offs. It’s key to measure those in actual scenarios, under real-world workloads.
What does this mean for code that operates with the interface IImmutableList<T>? The interface could be backed by either an immutable list or an immutable array (or a custom type). Because their complexities differ, code written against the interface cannot rely on any particular one. So in general the code should use bulk operations whenever possible in order to make sure the cost is minimized. For example, instead of calling Add() in a loop you should prefer a single call to AddRange().
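The following sketch contrasts the two approaches against the interface, assuming the System.Collections.Immutable package is referenced. The method names AddOneByOne and AddInBulk are hypothetical; both produce the same contents, but the loop creates one intermediate collection per element.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

public static class BulkAddDemo
{
    // One new collection per element; with an array-backed list each step copies everything.
    public static IImmutableList<int> AddOneByOne(IImmutableList<int> list, IEnumerable<int> items)
    {
        foreach (int item in items)
            list = list.Add(item);
        return list;
    }

    // A single bulk call lets the implementation do the work in one step.
    public static IImmutableList<int> AddInBulk(IImmutableList<int> list, IEnumerable<int> items)
        => list.AddRange(items);

    public static void Main()
    {
        IImmutableList<int> backedByArray = ImmutableArray.Create(1, 2, 3);
        IImmutableList<int> backedByList = ImmutableList.Create(1, 2, 3);
        int[] extra = { 4, 5, 6 };

        Console.WriteLine(AddOneByOne(backedByArray, extra).Count); // 6
        Console.WriteLine(AddInBulk(backedByArray, extra).Count);   // 6
        Console.WriteLine(AddInBulk(backedByList, extra).Count);    // 6
    }
}
```

How large the difference is depends on the backing type and the data size, so it is worth measuring in your own scenario.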
In the last release, we added the Create() factory methods for constructing immutable collections. We created overloads for both scalars as well as collections:
public static ImmutableList<T> Create<T>();
public static ImmutableList<T> Create<T>(params T[] items);
public static ImmutableList<T> Create<T>(IEnumerable<T> items);
We discovered that the overload that takes the IEnumerable<T> can have surprising results. You’d think you can use the overload that takes IEnumerable<T> by creating collections from other collections:
var list = new List<string>();
// Doh! Actually I wanted to get ImmutableList<string>
ImmutableList<List<string>> il = ImmutableList.Create(list);
Instead of creating an ImmutableList<string> you end up creating an ImmutableList<List<string>> because overload resolution prefers the params overload over an implicit conversion from List<string> to IEnumerable<string>.
For that reason we’ve decided to remove the ambiguity by renaming all factory methods that operate over IEnumerable<T> to From:
public static ImmutableList<T> Create<T>();
public static ImmutableList<T> Create<T>(params T[] items);
public static ImmutableList<T> From<T>(IEnumerable<T> items);
We decided not to rename the params overload as it is usually called with one or more scalars and not with an array. We believe this design will appear more consistent from typical call sites.
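The overload-resolution behavior that motivated the rename can be reproduced without the immutable package. In this sketch the two Create methods are hypothetical stand-ins for the factory overloads; instead of building a collection they simply return the type argument the compiler inferred.

```csharp
using System;
using System.Collections.Generic;

public static class OverloadDemo
{
    // Stand-in for the scalar overload: reports the inferred T.
    public static Type Create<T>(params T[] items) => typeof(T);

    // Stand-in for the collection overload: reports the inferred T.
    public static Type Create<T>(IEnumerable<T> items) => typeof(T);

    public static void Main()
    {
        var list = new List<string>();

        // The params overload wins: T is inferred as List<string>.
        Console.WriteLine(Create(list) == typeof(List<string>)); // True

        // With the argument typed as IEnumerable<string>, the collection overload wins.
        Console.WriteLine(Create((IEnumerable<string>)list) == typeof(string)); // True
    }
}
```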
Previously, you couldn’t create builders directly. You had to create them from an immutable collection like this:
ImmutableList<int>.Builder builder = ImmutableList.Create<int>().ToBuilder();
Now you can simply call a factory method, similar to creating an immutable collection itself:
ImmutableList<int>.Builder builder = ImmutableList.CreateBuilder<int>();
Our original design of IImmutableList<T> was based on the idea of keeping it aligned with IImmutableDictionary<TKey, TValue> and IImmutableSet<T>. Both of them have a built-in notion of comparers. When implementing IImmutableList<T> on ImmutableArray<T>, we were faced with the problem of supporting a custom comparer on the type. We could have added one more field to store it, but in this case ImmutableArray<T> would no longer be a cheap wrapper around an array. So we decided to drop the idea of storing a custom comparer on the list itself and instead require consumers to pass in the equality comparer they want to use.
As a result we removed the ValueComparer property and WithComparer methods from IImmutableList<T>.
For the members that implicitly used the comparer we’ve changed their signature to take a comparer (for example, the IndexOf method). This is a breaking change for implementers. For consumers, most of the signature changes shouldn’t be source breaking as we also added extension methods that pass in the default comparer.
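As a small sketch of what passing the comparer at the call site looks like, assuming a current version of the System.Collections.Immutable package (the preview described in this post may differ in details). FindIgnoringCase is a hypothetical helper name.

```csharp
using System;
using System.Collections.Immutable;

public static class ComparerDemo
{
    // The caller supplies the comparer; the list itself does not store one.
    public static int FindIgnoringCase(IImmutableList<string> names, string name)
        => names.IndexOf(name, StringComparer.OrdinalIgnoreCase);

    public static void Main()
    {
        IImmutableList<string> files = ImmutableArray.Create("Program.cs", "Test.cs");

        // Extension method that passes in the default comparer (case-sensitive for strings).
        Console.WriteLine(files.IndexOf("test.cs"));           // -1
        Console.WriteLine(FindIgnoringCase(files, "test.cs")); // 1
    }
}
```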
We’ve received the feedback that IImmutableSet<T> doesn’t have a way to get the actual value that is stored in the set – it only allows for checking whether it contains a value that is considered equal. In most cases, that’s not a problem at all. But consider a set of strings that uses a case-insensitive comparer, for example a set of filenames. If you want to know for a given filename what the canonical casing is, you are out of luck. Therefore we added a TryGetValue method that allows retrieving the original value that was added to the set:
var set = ImmutableHashSet.Create<string>(StringComparer.OrdinalIgnoreCase);
set = set.Add(@"D:\Src\Test.cs");
string original;
if (set.TryGetValue(@"d:\src\test.cs", out original))
{
    // original contains "D:\Src\Test.cs"
}
Lastly, we’ve fixed an issue with the GetValueOrDefault() extension method we’ve added for dictionaries.
ImmutableDictionary<string, string> dictionary = ImmutableDictionary.Create<string, string>();
string value = dictionary.GetValueOrDefault("key1");
Unfortunately, this results in a compilation error:
The call is ambiguous between the following methods or properties: 'ImmutableDictionary.GetValueOrDefault<string,string>(IReadOnlyDictionary<string,string>, string)' and 'ImmutableDictionary.GetValueOrDefault<string,string>(IDictionary<string,string>, string)'
The reason is that we offered this extension method for IDictionary<TKey, TValue> as well as IReadOnlyDictionary<TKey, TValue>. Since an instance of the concrete ImmutableDictionary<TKey, TValue> class is both, the compiler cannot decide which one to use. Typing the local variable to either of the types will work. You can also type the local variable as the IImmutableDictionary<TKey, TValue> interface as the interface only extends IReadOnlyDictionary<TKey, TValue> (in fact that’s why we didn’t catch it earlier as this was the code we used in the unit tests).
We’ve solved this issue by removing the overloads for IDictionary<TKey, TValue> and IReadOnlyDictionary<TKey, TValue>. Instead we added one for IImmutableDictionary<TKey, TValue>. This resolves the ambiguity and keeps the immutable package focused on types relating to immutable data structures.
The latest iteration of the immutable collections preview adds an immutable array type. It’s a zero overhead wrapper around a regular array that ensures it never changes.
Go play around with it and let us know what you think!
Is it intentional that ImmutableArray<T>.Enumerator does not implement IEnumerator<T>?
This makes it slightly annoying to implement a class that implements IEnumerable<T> and delegates to an immutable array.
BCL Immutable collections - General Feedback
(comments on the original blogpost were closed)
We completely agree with the comments on the
by @Joe White, @Harry and others.
We find it difficult to explain why this would even compile:
IList<int> l = ImmutableList.Create<int>(5);
Because it actually means: "an immutable container is also a mutable container".
It's a pity that immutability is only a run-time aspect.
This means that the information of immutability is already lost the moment you pass an immutable container into method F that expects just a container.
This hinders composability and separation of concerns:
- It is impossible to judge by this signature F(IList<int>l) whether passing in an ImmutableList<int> is valid.
- A change in the implementation of F(IList<int>l) might break code that previously worked with an ImmutableList<int> (e.g. adding a mutating call in F). But who made a mistake: client code or implementor?
How can clients judge by the signature whether it is the implementor's intent to modify l inside F(IList<int> l)?
At compile time there is no warning telling clients to provide a container that supports changes.
We suggest reconsidering this design decision.
Thanks in advance,
A Siemens business
I agree with Jan & Joe White: I think there is very little utility in having the Immutable constructs implement the mutable interfaces. 95% of the time, people want to expose an "IEnumerable<T> with a count" or a "HashSet/Dictionary that is threadsafe and will not be modified".
Yes, implementing IList allows you to pass an ImmutableList to a method expecting an IList. However, a method taking an IList<T> often expects it to be mutable. If I'm not mistaken, it's a massive Liskov violation to substitute a functional data structure when a mutable one is requested. I manage a large (multi-million-line) codebase and am excited about the benefits of introducing functional data structures, but am confused why they need to implement mutable interfaces and worry about the new classes of bugs this would introduce.
In my opinion, the v1 interfaces like IList and ICollection are deeply flawed. In over 10 years of working with .NET code, I've never seen someone checking the IsReadOnly property. Why not drop direct support for IList<T> and instead provide methods to return simple wrapper objects which implement the mutable interfaces, like "AsReadOnlyCollection()"?
I noticed the XML docs on some of the methods say "see Interface". This is very problematic. I hope this will be fixed to have the correct documentation.
Why does ImmutableList's IList.Add implementation throw a NotSupportedException?
I think I am facing a serious bug here with 1.0.12:
Hi. It is late October, 2013. I cannot find the ImmutableArray in the release on Nuget (www.nuget.org/.../Microsoft.Bcl.Immutable) - what happened?
@FindItNot. We're working on it. See this post for details. blogs.msdn.com/.../immutable-collections-ready-for-prime-time.aspx
Why is less than 16 elements significant? Is it really LESS or UP TO 16 elements? What happens internally when it gets over 16 elements?
@HK: It's pretty much just an arbitrary number to qualify what we mean by saying "small". What is important is the ratio between the number of elements and the overhead they inflict when managing memory.
Please don't implement both IList and IReadOnlyList - if you do, you're going to break all kinds of extension methods that might be defined on either. There's a reason IReadOnlyList and IList aren't directly related, right? There's no reason for an immutable collection to ever expose IList.
@Eamon Nerbonne: For immutable we followed what we've already done in the framework.
All our built-in collection implementations, such as arrays, List<T>, Collection<T> and ReadOnlyCollection<T>, also implement the read-only collection interfaces. Because any collection can now be treated as a read-only collection, algorithms are able to declare their intent more precisely without limiting their level of reuse—they can be used across all collection types. In the preceding example, the consumer of LayoutShapes could pass in a List<Circle>, but LayoutShapes would also accept an array of Circle or a Collection<Circle>.
For more details, see the MSDN article I wrote on .NET 4.5.