Immutability in C# Part Nine: Academic? Plus my AVL tree implementation

Immutability in C# Part Nine: Academic? Plus my AVL tree implementation

Rate This
  • Comments 23

Good Monday morning all. I have received a lot of good comments on this series so far that I thought I would speak to a bit.

Academic?

First and foremost, a number of people have asked questions which could be summed up as "isn't this just an academic exercise? I have a job to do here!"

No, I do not believe that it is just an academic exercise, not at all. When I compare the practical, solving-real-problems code I write using immutable or ("mostly" immutable objects) to that which uses highly mutable objects, I find that the solutions which use immutable objects are easier to reason about. Knowing that an object isn't going to be "edited" by some other bit of code later on means that anything I deduce about it now is going to continue to be true forever.

I also find that programming in a more "functional" style using immutable objects somehow encourages me to write very small, clear methods that each do one thing very well. This is certainly possible when writing code with mutable objects, but I find that something about writing code for immutable objects encourages that style more. I haven't managed to quite figure out why that is yet. I like that style of programming a lot. Take a look at the short little methods below, for, say, tree rotation. Each of them is about four lines long and obviously correct if you read it carefully. That gives you confidence that those rotation helpers can then be composed to make a correct balancer, which gives you confidence that the tree is balanced.

Another reason why immutable objects tend to be easier to reason about is that this style encourages non-nullable programming. Take a look at the code from the previous entries in this series. The keyword "null" does not appear in any of them! And why should it? Why should we represent an empty stack the same way we represent an empty queue, the same way we represent an empty tree? By ensuring that everything which points to something else is never null, all that boring, slow, hard to read, hard to reason about code which checks to see if things are null just goes away. You need never worry whether that reference is null again if you design your objects so that there's never a null reference in the first place.

And as we'll see in later episodes of FAIC, immutable objects also make it easier to implement performance improvements such as memoization of complex calculations. If an object never changes then you can always cache it for reuse later. An object which someone can edit is less useful that way. Immutable objects stored in a stack make undo-redo operations trivial. And so on.

Learning how to use this style in your own code is therefore potentially quite handy. I believe it is also the case that more and more objects produced by other people will be immutable, so knowing how to deal with them will be increasingly important. For example, the objects which represent expression trees in C# 3.0 are all immutable. If you want to take an expression tree and transform it, you're simply not going to be able to transform it "in place". You're going to have to build a new expression tree out of the old one; knowing how to do so efficiently will help.

Thread safe?

The second question I've gotten a lot is "are immutable objects really more thread safe?" Well, it depends on what you mean by "thread safe". Immutable objects are not necessarily "more" thread safe, but they're a different kind of thread safe.

Let's explore this a bit more. What exactly do we mean by "thread safe"? We can talk about lock order and deadlocks and all of those aspects of thread safety, but let's put those aside for a moment and just consider basic race conditions. Suppose you have two threads each enqueuing jobs onto a queue object, and a third thread dequeuing jobs. Normally we think of the queue as being "thread safe" if no matter what the timing of those three threads happens to be, there is no way that a job is ever "lost". Two enqueues which happen "at the same time" result in a queue with those two things on it, and at some point, both will be dequeued.

If the queue is immutable then implementing this scenario just moves the problem around; now it becomes about serializing access to the mutable variable which is being shared by all three threads. Immutable objects make this kind of scenario harder to reason about and implement; better to just write a threadsafe mutable queue in the first place if that's the problem you're trying to solve.

But let's think about this a different way. Having a threadsafe mutable queue means that you never have any accurate information whatsoever about the state of the queue! You ask the queue "are you empty?" and it says "no", and you dequeue it and hey, you get an "empty queue" exception. Why? Because some other thread dequeued it in the time between when you asked and when you issued your dequeue request. You end up living in a world where enqueuing a queue can possibly make it shorter and dequeuing can make it longer depending on what is going on with other threads. It becomes impossible to reason locally about operations on an object; your consciousness has to expand to encompass all possible operations on the object. It's this necessity for global understanding that makes multithreaded programming so error-prone and difficult.

Immutable queues, on the other hand, give you complete thread safety in this regard. You enqueue an element onto a particular immutable queue and the result you get back is always the same no matter what is happening to that queue on any other thread. You ask a queue if it is empty, if it says no, then you have a 100% iron-clad guarantee that you can safely dequeue it. You can share data around threads as much as you like without worrying that some other thread is going to make an edit which messes up your logic. You can take a chunk of data and run ten different analyzers over it on ten different threads, each one making "changes" to the data during its analysis, and never interfering with each other.

AVL Tree implementation

As promised last time, my supremely unexciting implementation of a self-balancing height-balanced immutable AVL tree in C#. Note that I have made the choice to cache the height in every node rather than recalculating it. That trades memory usage for a bit of extra speed, which is probably worth it in this case. (Though of course to truly answer the question we'd want to set goals, try it both ways and measure the results.)

This is all pretty straightforward. Next time I want to start to wrap up this series by looking at a tricky data structure that solves the problem of building an immutable double-ended queue.

     public sealed class AVLTree<K, V> : IBinarySearchTree<K, V> where K : IComparable<K>
    {
        private sealed class EmptyAVLTree : IBinarySearchTree<K, V>
        {
            // IBinaryTree
            public bool IsEmpty { get { return true; } }
            public V Value { get { throw new Exception("empty tree"); } }
            IBinaryTree<V> IBinaryTree<V>.Left { get { throw new Exception("empty tree"); } }
            IBinaryTree<V> IBinaryTree<V>.Right { get { throw new Exception("empty tree"); } }
            // IBinarySearchTree
            public IBinarySearchTree<K, V> Left { get { throw new Exception("empty tree"); } }
            public IBinarySearchTree<K, V> Right { get { throw new Exception("empty tree"); } }
            public IBinarySearchTree<K, V> Search(K key) { return this; }
            public K Key { get { throw new Exception("empty tree"); } }
            public IBinarySearchTree<K, V> Add(K key, V value) { return new AVLTree<K, V>(key, value, this, this); }
            public IBinarySearchTree<K, V> Remove(K key) { throw new Exception("Cannot remove item that is not in tree."); }
            // IMap
            public bool Contains(K key) { return false; }
            public V Lookup(K key) { throw new Exception("not found"); }
            IMap<K, V> IMap<K, V>.Add(K key, V value) { return this.Add(key, value); }
            IMap<K, V> IMap<K, V>.Remove(K key) { return this.Remove(key); }
            public IEnumerable<K> Keys { get { yield break; } }
            public IEnumerable<V> Values { get { yield break; } }
            public IEnumerable<KeyValuePair<K,V>> Pairs { get { yield break; } }
        }
        private static readonly EmptyAVLTree empty = new EmptyAVLTree();
        public static IBinarySearchTree<K, V> Empty { get { return empty; } }
        private readonly K key;
        private readonly V value;
        private readonly IBinarySearchTree<K, V> left;
        private readonly IBinarySearchTree<K, V> right;
        private readonly int height;
        private AVLTree(K key, V value, IBinarySearchTree<K, V> left, IBinarySearchTree<K, V> right)
        {
            this.key = key;
            this.value = value;
            this.left = left;
            this.right = right;
            this.height = 1 + Math.Max(Height(left), Height(right));
        }
        // IBinaryTree
        public bool IsEmpty { get { return false; } }
        public V Value { get { return value; } }
        IBinaryTree<V> IBinaryTree<V>.Left { get { return left; } }
        IBinaryTree<V> IBinaryTree<V>.Right { get { return right; } }
        // IBinarySearchTree
        public IBinarySearchTree<K, V> Left { get { return left; } }
        public IBinarySearchTree<K, V> Right { get { return right; } }
        public IBinarySearchTree<K, V> Search(K key)
        {
            int compare = key.CompareTo(Key);
            if (compare == 0)
                return this;
            else if (compare > 0)
                return Right.Search(key);
            else
                return Left.Search(key);
        }
        public K Key { get { return key; } }
        public IBinarySearchTree<K, V> Add(K key, V value)
        {
            AVLTree<K, V> result;
            if (key.CompareTo(Key) > 0)
                result = new AVLTree<K, V>(Key, Value, Left, Right.Add(key, value));
            else
                result = new AVLTree<K, V>(Key, Value, Left.Add(key, value), Right);
            return MakeBalanced(result);
        }
        public IBinarySearchTree<K, V> Remove(K key)
        {
            IBinarySearchTree<K, V> result;
            int compare = key.CompareTo(Key);
            if (compare == 0)
            {
                // We have a match. If this is a leaf, just remove it
                // by returning Empty.  If we have only one child,
                // replace the node with the child.
                if (Right.IsEmpty && Left.IsEmpty)
                    result = Empty;
                else if (Right.IsEmpty && !Left.IsEmpty)
                    result = Left;
                else if (!Right.IsEmpty && Left.IsEmpty)
                    result = Right;
                else
                {
                    // We have two children. Remove the next-highest node and replace
                    // this node with it.
                    IBinarySearchTree<K, V> successor = Right;
                    while (!successor.Left.IsEmpty)
                        successor = successor.Left;
                    result = new AVLTree<K, V>(successor.Key, successor.Value, Left, Right.Remove(successor.Key));
                }
            }
            else if (compare < 0)
                result = new AVLTree<K, V>(Key, Value, Left.Remove(key), Right);
            else
                result = new AVLTree<K, V>(Key, Value, Left, Right.Remove(key));
            return MakeBalanced(result);
        }
        // IMap
        public bool Contains(K key) { return !Search(key).IsEmpty; }
        IMap<K, V> IMap<K, V>.Add(K key, V value) { return this.Add(key, value); }
        IMap<K, V> IMap<K, V>.Remove(K key) { return this.Remove(key); }
        public V Lookup(K key)
        {
            IBinarySearchTree<K, V> tree = Search(key);
            if (tree.IsEmpty)
                throw new Exception("not found");
            return tree.Value;
        }
        public IEnumerable<K> Keys { get { return from t in Enumerate() select t.Key; } }
        public IEnumerable<V> Values { get { return from t in Enumerate() select t.Value; }
        }
        public IEnumerable<KeyValuePair<K,V>> Pairs
        {
            get
            {
                return from t in Enumerate() select new KeyValuePair<K, V>(t.Key, t.Value);
            }
        }
        private IEnumerable<IBinarySearchTree<K, V>> Enumerate()
        {
            var stack = Stack<IBinarySearchTree<K,V>>.Empty;
            for (IBinarySearchTree<K,V> current = this; !current.IsEmpty || !stack.IsEmpty; current = current.Right)
            {
                while (!current.IsEmpty)
                {
                    stack = stack.Push(current);
                    current = current.Left;
                }
                current = stack.Peek();
                stack = stack.Pop();
                yield return current;
            }
        }
        // Static helpers for tree balancing
        private static int Height(IBinarySearchTree<K, V> tree)
        {
            if (tree.IsEmpty)
                return 0;
            return ((AVLTree<K,V>)tree).height;
        }
        private static IBinarySearchTree<K, V> RotateLeft(IBinarySearchTree<K, V> tree)
        {
            if (tree.Right.IsEmpty)
                return tree;
            return new AVLTree<K, V>( tree.Right.Key, tree.Right.Value,
                new AVLTree<K, V>(tree.Key, tree.Value, tree.Left, tree.Right.Left),
                tree.Right.Right);
        }
        private static IBinarySearchTree<K, V> RotateRight(IBinarySearchTree<K, V> tree)
        {
            if (tree.Left.IsEmpty)
                return tree;
            return new AVLTree<K, V>( tree.Left.Key, tree.Left.Value, tree.Left.Left,
                new AVLTree<K, V>(tree.Key, tree.Value, tree.Left.Right, tree.Right));
        }
        private static IBinarySearchTree<K, V> DoubleLeft(IBinarySearchTree<K, V> tree)
        {
            if (tree.Right.IsEmpty)
                return tree;
            AVLTree<K, V> rotatedRightChild = new AVLTree<K, V>(tree.Key, tree.Value, tree.Left, RotateRight(tree.Right));
            return RotateLeft(rotatedRightChild);
        }
        private static IBinarySearchTree<K, V> DoubleRight(IBinarySearchTree<K, V> tree)
        {
            if (tree.Left.IsEmpty)
                return tree;
            AVLTree<K, V> rotatedLeftChild = new AVLTree<K, V>(tree.Key, tree.Value, RotateLeft(tree.Left), tree.Right);
            return RotateRight(rotatedLeftChild);
        }
        private static int Balance(IBinarySearchTree<K, V> tree)
        {
            if (tree.IsEmpty)
                return 0;
            return Height(tree.Right) - Height(tree.Left);
        }
        private static bool IsRightHeavy(IBinarySearchTree<K, V> tree) { return Balance(tree) >= 2; }
        private static bool IsLeftHeavy(IBinarySearchTree<K, V> tree) { return Balance(tree) <= -2; }
        private static IBinarySearchTree<K, V> MakeBalanced(IBinarySearchTree<K, V> tree)
        {
            IBinarySearchTree<K, V> result;
            if (IsRightHeavy(tree))
            {
                if (IsLeftHeavy(tree.Right))
                    result = DoubleLeft(tree);
                else
                    result = RotateLeft(tree);
            }
            else if (IsLeftHeavy(tree))
            {
                if (IsRightHeavy(tree.Left))
                    result = DoubleRight(tree);
                else
                    result = RotateRight(tree);
            }
            else
                result = tree;
            return result;
        }
    }
}

 

 

 

 

  • That makes perfect sense! Write it up and let us see the code. I'll happily post it here, or a link.

  • Eric, you said: "When I compare the practical, solving-real-problems code I write using immutable or ("mostly" immutable objects) to that which uses highly mutable objects, I find that the solutions which use immutable objects are easier to reason about."

    I don't that that's the case, but so far you're only showing immutable constructs that you can use in those practical situations. I personally would find it helpful if you applied these constructs to real practical situations, perhaps compared to using mutable constructs, with explanation as to why it's easier to reason about, simpler to code, etc.

    Thanks, and we appreciate the series.

  • I think there are a couple of errors in the implementation -- or at least it doesn't match the wikipedia article you linked to.

    If I use the following code, it should hit the Right Left Case (in your code called a DoubleLeft).

               IBinarySearchTree<int, string> tree = AVLTree<int, string>.Empty;

               tree = tree.Add(5, "five");

               tree = tree.Add(7, "seven");

               tree = tree.Add(6, "six");

    What I actually get is a single left, same as if I had run it as:

               IBinarySearchTree<int, string> tree = AVLTree<int, string>.Empty;

               tree = tree.Add(5, "five");

               tree = tree.Add(6, "six");

               tree = tree.Add(7, "seven");

    The main reason for this is that the inner test shouldn't be looking for -2 or +2; -1 or +1 is sufficient.

    Changing the tests in MakeBalanced to match the Wikipedia algorithm makes it go down the correct rotation paths, but there's a second bug in RotateRight that causes an exception -- it should be grabbing tree.left.right, not tree.right.left.

  • I created an Immutible AVL Tree implementation that stores the balance information in different classes so has no need to store or calculate height.  It did not lead to simpler code but I think it is more memory and speed effecient.  First I created an enum and an additional interface with Balance and IsSingleLeaf properties.

           private enum AVLTreeBalance

           {

               LeftHeavy = -1,

               Balanced = 0,

               RightHeavy = 1

           }

           private interface IAVLTree : IBinarySearchTree<K, V>

           {

               AVLTreeBalance Balance { get; }

               bool IsSingleLeaf { get; }

               new IAVLTree Add(K key, V value);

               new IAVLTree Remove(K key);

               new IAVLTree Left { get; }

               new IAVLTree Right { get; }

           }

    Then I created 7 classes to represent different shapes and balance:

    private sealed class EmptyAVLTree : IBinarySearchTree<K, V>

    private sealed class SingleLeafAVLTree : IAVLTree

    private sealed class LeftLeafAVLTree : IAVLTree

    private sealed class RightLeafAVLTree : IAVLTree

    private sealed class BalancedAVLTree : IAVLTree

    private sealed class LeftHeavyAVLTree : IAVLTree

    private sealed class RightHeavyAVLTree : IAVLTree

    The code was pretty long but I thought I would share the Add for a RightHeavyAVLTree:

    public IAVLTree Add(K key, V value)

    {   // Add to RightHeavy Tree

       int compare = key.CompareTo(Key);

       if (compare > 0)

       {   // greater than add to right

           IAVLTree newRight = Right.Add(key, value);

           if (newRight.Balance == Right.Balance || newRight.Balance == AVLTreeBalance.Balanced)

               // shape stays the same if right tree keeps same shape

               // or if the right tree goes from from unbalanced to balanced (same height)

               return new RightHeavyAVLTree(Key, Value, Left, newRight);

           else //tree has gained a level on right making it 2 levels greater on right

               return RotateLeft(this, Left, newRight);

       }

       if (compare < 0)

       {   // less than add to left

           IAVLTree newLeft = Left.Add(key, value);

           if (newLeft.Balance == Left.Balance || newLeft.Balance == AVLTreeBalance.Balanced)

               // shape stays the same if left tree keeps same shape

               // or if the left tree goes from from unbalanced to balanced (same height)

               return new RightHeavyAVLTree(Key, Value, newLeft, Right);

           else //tree has gained a level on left making it balanced

               return new BalancedAVLTree(Key, Value, newLeft, Right);

       }

       throw new Exception("key already exists");

    }

    Here is code for RotateLeft and DoubleLeft:

    private static IAVLTree RotateLeft(IAVLTree top, IAVLTree left, IAVLTree right)

    {   // assumes top has key and value, right's height is 2 greater than left's

       if (right.Balance == AVLTreeBalance.RightHeavy)

           // Single Left Rotation and shape becomes balanced

           return new BalancedAVLTree(right.Key, right.Value,

               new BalancedAVLTree(top.Key, top.Value, left, right.Left),

               right.Right);

       else if (right.Balance == AVLTreeBalance.Balanced)

           // Single Left Rotation and shape becomes Left balanced

           return new LeftHeavyAVLTree(right.Key, right.Value,

               new RightHeavyAVLTree(top.Key, top.Value, left, right.Left),

               right.Right);

       else //(right.Balance == AVLTreeBalance.LeftHeavy)

           // Double Left Rotation and shape becomes balanced

           return DoubleLeft(top, left, right);

    }

    private static IAVLTree DoubleLeft(IAVLTree top, IAVLTree left, IAVLTree right)

    {   // assumes top has key and value, right's height is 2 greater than left's

       // assumes right tree is left heavy

       IAVLTree newTop = right.Left; // right.Left tree becomes new top and controls shape

       if (newTop.Balance == AVLTreeBalance.RightHeavy)

           // Left is LeftHeavy, Right is Balanced

           if (newTop.Right.IsSingleLeaf)

               // Left is LeftLeaf

               return new BalancedAVLTree(newTop.Key, newTop.Value,

                   new LeftLeafAVLTree(top.Key, top.Value, left),

                   new BalancedAVLTree(right.Key, right.Value, newTop.Right, right.Right));

           else

               // Left is LeftHeavy

               return new BalancedAVLTree(newTop.Key, newTop.Value,

                   new LeftHeavyAVLTree(top.Key, top.Value, left, newTop.Left),

                   new BalancedAVLTree(right.Key, right.Value, newTop.Right, right.Right));

       else if (newTop.Balance == AVLTreeBalance.Balanced)

           // Both children Balanced

           return new BalancedAVLTree(newTop.Key, newTop.Value,

               new BalancedAVLTree(top.Key, top.Value, left, newTop.Left),

               new BalancedAVLTree(right.Key, right.Value, newTop.Right, right.Right));

       else // (newTop.Balance == AVLTreeBalance.LeftHeavy)

           // Left is Balanced, Right is RightHeavy

           if (newTop.Left.IsSingleLeaf)

               // Right is RightLeaf

               return new BalancedAVLTree(newTop.Key, newTop.Value,

                   new BalancedAVLTree(top.Key, top.Value, left, newTop.Left),

                   new RightLeafAVLTree(right.Key, right.Value, right.Right));

           else

               // Right is RightHeavy

               return new BalancedAVLTree(newTop.Key, newTop.Value,

                   new BalancedAVLTree(top.Key, top.Value, left, newTop.Left),

                   new RightHeavyAVLTree(right.Key, right.Value, newTop.Right, right.Right));

    }

  • Consider approaching you problem as a set of static functions first and then immutable objects next.  Static functions force one to write code that a) does not depend on a context being setup b) do not affect things outside of their parameters (if written properly) and c) can easily be moved around in your code.

    The non-static pattern of

    int functionABC()

    {

    X a = new object X

    a.property_D = 2

    a.property_E = 5

    ...

    a.property_G = 6

    int r = a.method_call()

    return (r)

    }

    becomes a single line of code, functionABC is not needed and can be removed.

    X.method_call(2, 5, 6)

    This is is a common code defect since many class objects are designed for reuse in multiple parts of the application but end up being mostly made up of methods that could easily be made static.

    Static methods avoid the spaghetti code of needing 3 or more calls to an object (construct, set properties, call method)  when the functionality desired is a simple function call without side effects.

  • Hi, Eric. What do you think abount immutable variable length list? Is it possible to implement it?

  • I agree with Donnie Hale. Your immutable dictionary code looks good, but I don't see myself using it, as I wouldn't know how to make it fit into my main mutating-style code.

    For example, you mention a queue, two threads that push to it and one that pops from it. You say it's difficult to write correctly (or even think about), and I agree. Then you introduce the alternative - an immutable queue. Ok, say thread1 wants to push stuff to the queue. So it calls queue.Add a couple of times and gets a larger queue back. Thread2 does the same. So far so good. Now what do those two threads do with their own, larger queues?

    1. Do they merge them, (by iterating and skipping identical sequences of members)?

    2. Do they just provide an externally queriable "StuffThatMustBePushed" member, and leave it to the reader thread to query that and add it to its own queue?

    What would help is some sample (pseudo)code that demonstrates the entire situation you're envisioning with the three threads and the queue.

  • can you tell more about how immutable objects encourage non-nullable programming more then mutable objects ?

Page 2 of 2 (23 items) 12