Immutability in C# Part Nine: Academic? Plus my AVL tree implementation

Immutability in C# Part Nine: Academic? Plus my AVL tree implementation

Rate This
  • Comments 23

Good Monday morning all. I have received a lot of good comments on this series so far that I thought I would speak to a bit.

Academic?

First and foremost, a number of people have asked questions which could be summed up as "isn't this just an academic exercise? I have a job to do here!"

No, I do not believe that it is just an academic exercise, not at all. When I compare the practical, solving-real-problems code I write using immutable or ("mostly" immutable objects) to that which uses highly mutable objects, I find that the solutions which use immutable objects are easier to reason about. Knowing that an object isn't going to be "edited" by some other bit of code later on means that anything I deduce about it now is going to continue to be true forever.

I also find that programming in a more "functional" style using immutable objects somehow encourages me to write very small, clear methods that each do one thing very well. This is certainly possible when writing code with mutable objects, but I find that something about writing code for immutable objects encourages that style more. I haven't managed to quite figure out why that is yet. I like that style of programming a lot. Take a look at the short little methods below, for, say, tree rotation. Each of them is about four lines long and obviously correct if you read it carefully. That gives you confidence that those rotation helpers can then be composed to make a correct balancer, which gives you confidence that the tree is balanced.

Another reason why immutable objects tend to be easier to reason about is that this style encourages non-nullable programming. Take a look at the code from the previous entries in this series. The keyword "null" does not appear in any of them! And why should it? Why should we represent an empty stack the same way we represent an empty queue, the same way we represent an empty tree? By ensuring that everything which points to something else is never null, all that boring, slow, hard to read, hard to reason about code which checks to see if things are null just goes away. You need never worry whether that reference is null again if you design your objects so that there's never a null reference in the first place.

And as we'll see in later episodes of FAIC, immutable objects also make it easier to implement performance improvements such as memoization of complex calculations. If an object never changes then you can always cache it for reuse later. An object which someone can edit is less useful that way. Immutable objects stored in a stack make undo-redo operations trivial. And so on.

Learning how to use this style in your own code is therefore potentially quite handy. I believe it is also the case that more and more objects produced by other people will be immutable, so knowing how to deal with them will be increasingly important. For example, the objects which represent expression trees in C# 3.0 are all immutable. If you want to take an expression tree and transform it, you're simply not going to be able to transform it "in place". You're going to have to build a new expression tree out of the old one; knowing how to do so efficiently will help.

Thread safe?

The second question I've gotten a lot is "are immutable objects really more thread safe?" Well, it depends on what you mean by "thread safe". Immutable objects are not necessarily "more" thread safe, but they're a different kind of thread safe.

Let's explore this a bit more. What exactly do we mean by "thread safe"? We can talk about lock order and deadlocks and all of those aspects of thread safety, but let's put those aside for a moment and just consider basic race conditions. Suppose you have two threads each enqueuing jobs onto a queue object, and a third thread dequeuing jobs. Normally we think of the queue as being "thread safe" if no matter what the timing of those three threads happens to be, there is no way that a job is ever "lost". Two enqueues which happen "at the same time" result in a queue with those two things on it, and at some point, both will be dequeued.

If the queue is immutable then implementing this scenario just moves the problem around; now it becomes about serializing access to the mutable variable which is being shared by all three threads. Immutable objects make this kind of scenario harder to reason about and implement; better to just write a threadsafe mutable queue in the first place if that's the problem you're trying to solve.

But let's think about this a different way. Having a threadsafe mutable queue means that you never have any accurate information whatsoever about the state of the queue! You ask the queue "are you empty?" and it says "no", and you dequeue it and hey, you get an "empty queue" exception. Why? Because some other thread dequeued it in the time between when you asked and when you issued your dequeue request. You end up living in a world where enqueuing a queue can possibly make it shorter and dequeuing can make it longer depending on what is going on with other threads. It becomes impossible to reason locally about operations on an object; your consciousness has to expand to encompass all possible operations on the object. It's this necessity for global understanding that makes multithreaded programming so error-prone and difficult.

Immutable queues, on the other hand, give you complete thread safety in this regard. You enqueue an element onto a particular immutable queue and the result you get back is always the same no matter what is happening to that queue on any other thread. You ask a queue if it is empty, if it says no, then you have a 100% iron-clad guarantee that you can safely dequeue it. You can share data around threads as much as you like without worrying that some other thread is going to make an edit which messes up your logic. You can take a chunk of data and run ten different analyzers over it on ten different threads, each one making "changes" to the data during its analysis, and never interfering with each other.

AVL Tree implementation

As promised last time, my supremely unexciting implementation of a self-balancing height-balanced immutable AVL tree in C#. Note that I have made the choice to cache the height in every node rather than recalculating it. That trades memory usage for a bit of extra speed, which is probably worth it in this case. (Though of course to truly answer the question we'd want to set goals, try it both ways and measure the results.)

This is all pretty straightforward. Next time I want to start to wrap up this series by looking at a tricky data structure that solves the problem of building an immutable double-ended queue.

     public sealed class AVLTree<K, V> : IBinarySearchTree<K, V> where K : IComparable<K>
    {
        private sealed class EmptyAVLTree : IBinarySearchTree<K, V>
        {
            // IBinaryTree
            public bool IsEmpty { get { return true; } }
            public V Value { get { throw new Exception("empty tree"); } }
            IBinaryTree<V> IBinaryTree<V>.Left { get { throw new Exception("empty tree"); } }
            IBinaryTree<V> IBinaryTree<V>.Right { get { throw new Exception("empty tree"); } }
            // IBinarySearchTree
            public IBinarySearchTree<K, V> Left { get { throw new Exception("empty tree"); } }
            public IBinarySearchTree<K, V> Right { get { throw new Exception("empty tree"); } }
            public IBinarySearchTree<K, V> Search(K key) { return this; }
            public K Key { get { throw new Exception("empty tree"); } }
            public IBinarySearchTree<K, V> Add(K key, V value) { return new AVLTree<K, V>(key, value, this, this); }
            public IBinarySearchTree<K, V> Remove(K key) { throw new Exception("Cannot remove item that is not in tree."); }
            // IMap
            public bool Contains(K key) { return false; }
            public V Lookup(K key) { throw new Exception("not found"); }
            IMap<K, V> IMap<K, V>.Add(K key, V value) { return this.Add(key, value); }
            IMap<K, V> IMap<K, V>.Remove(K key) { return this.Remove(key); }
            public IEnumerable<K> Keys { get { yield break; } }
            public IEnumerable<V> Values { get { yield break; } }
            public IEnumerable<KeyValuePair<K,V>> Pairs { get { yield break; } }
        }
        private static readonly EmptyAVLTree empty = new EmptyAVLTree();
        public static IBinarySearchTree<K, V> Empty { get { return empty; } }
        private readonly K key;
        private readonly V value;
        private readonly IBinarySearchTree<K, V> left;
        private readonly IBinarySearchTree<K, V> right;
        private readonly int height;
        private AVLTree(K key, V value, IBinarySearchTree<K, V> left, IBinarySearchTree<K, V> right)
        {
            this.key = key;
            this.value = value;
            this.left = left;
            this.right = right;
            this.height = 1 + Math.Max(Height(left), Height(right));
        }
        // IBinaryTree
        public bool IsEmpty { get { return false; } }
        public V Value { get { return value; } }
        IBinaryTree<V> IBinaryTree<V>.Left { get { return left; } }
        IBinaryTree<V> IBinaryTree<V>.Right { get { return right; } }
        // IBinarySearchTree
        public IBinarySearchTree<K, V> Left { get { return left; } }
        public IBinarySearchTree<K, V> Right { get { return right; } }
        public IBinarySearchTree<K, V> Search(K key)
        {
            int compare = key.CompareTo(Key);
            if (compare == 0)
                return this;
            else if (compare > 0)
                return Right.Search(key);
            else
                return Left.Search(key);
        }
        public K Key { get { return key; } }
        public IBinarySearchTree<K, V> Add(K key, V value)
        {
            AVLTree<K, V> result;
            if (key.CompareTo(Key) > 0)
                result = new AVLTree<K, V>(Key, Value, Left, Right.Add(key, value));
            else
                result = new AVLTree<K, V>(Key, Value, Left.Add(key, value), Right);
            return MakeBalanced(result);
        }
        public IBinarySearchTree<K, V> Remove(K key)
        {
            IBinarySearchTree<K, V> result;
            int compare = key.CompareTo(Key);
            if (compare == 0)
            {
                // We have a match. If this is a leaf, just remove it
                // by returning Empty.  If we have only one child,
                // replace the node with the child.
                if (Right.IsEmpty && Left.IsEmpty)
                    result = Empty;
                else if (Right.IsEmpty && !Left.IsEmpty)
                    result = Left;
                else if (!Right.IsEmpty && Left.IsEmpty)
                    result = Right;
                else
                {
                    // We have two children. Remove the next-highest node and replace
                    // this node with it.
                    IBinarySearchTree<K, V> successor = Right;
                    while (!successor.Left.IsEmpty)
                        successor = successor.Left;
                    result = new AVLTree<K, V>(successor.Key, successor.Value, Left, Right.Remove(successor.Key));
                }
            }
            else if (compare < 0)
                result = new AVLTree<K, V>(Key, Value, Left.Remove(key), Right);
            else
                result = new AVLTree<K, V>(Key, Value, Left, Right.Remove(key));
            return MakeBalanced(result);
        }
        // IMap
        public bool Contains(K key) { return !Search(key).IsEmpty; }
        IMap<K, V> IMap<K, V>.Add(K key, V value) { return this.Add(key, value); }
        IMap<K, V> IMap<K, V>.Remove(K key) { return this.Remove(key); }
        public V Lookup(K key)
        {
            IBinarySearchTree<K, V> tree = Search(key);
            if (tree.IsEmpty)
                throw new Exception("not found");
            return tree.Value;
        }
        public IEnumerable<K> Keys { get { return from t in Enumerate() select t.Key; } }
        public IEnumerable<V> Values { get { return from t in Enumerate() select t.Value; }
        }
        public IEnumerable<KeyValuePair<K,V>> Pairs
        {
            get
            {
                return from t in Enumerate() select new KeyValuePair<K, V>(t.Key, t.Value);
            }
        }
        private IEnumerable<IBinarySearchTree<K, V>> Enumerate()
        {
            var stack = Stack<IBinarySearchTree<K,V>>.Empty;
            for (IBinarySearchTree<K,V> current = this; !current.IsEmpty || !stack.IsEmpty; current = current.Right)
            {
                while (!current.IsEmpty)
                {
                    stack = stack.Push(current);
                    current = current.Left;
                }
                current = stack.Peek();
                stack = stack.Pop();
                yield return current;
            }
        }
        // Static helpers for tree balancing
        private static int Height(IBinarySearchTree<K, V> tree)
        {
            if (tree.IsEmpty)
                return 0;
            return ((AVLTree<K,V>)tree).height;
        }
        private static IBinarySearchTree<K, V> RotateLeft(IBinarySearchTree<K, V> tree)
        {
            if (tree.Right.IsEmpty)
                return tree;
            return new AVLTree<K, V>( tree.Right.Key, tree.Right.Value,
                new AVLTree<K, V>(tree.Key, tree.Value, tree.Left, tree.Right.Left),
                tree.Right.Right);
        }
        private static IBinarySearchTree<K, V> RotateRight(IBinarySearchTree<K, V> tree)
        {
            if (tree.Left.IsEmpty)
                return tree;
            return new AVLTree<K, V>( tree.Left.Key, tree.Left.Value, tree.Left.Left,
                new AVLTree<K, V>(tree.Key, tree.Value, tree.Left.Right, tree.Right));
        }
        private static IBinarySearchTree<K, V> DoubleLeft(IBinarySearchTree<K, V> tree)
        {
            if (tree.Right.IsEmpty)
                return tree;
            AVLTree<K, V> rotatedRightChild = new AVLTree<K, V>(tree.Key, tree.Value, tree.Left, RotateRight(tree.Right));
            return RotateLeft(rotatedRightChild);
        }
        private static IBinarySearchTree<K, V> DoubleRight(IBinarySearchTree<K, V> tree)
        {
            if (tree.Left.IsEmpty)
                return tree;
            AVLTree<K, V> rotatedLeftChild = new AVLTree<K, V>(tree.Key, tree.Value, RotateLeft(tree.Left), tree.Right);
            return RotateRight(rotatedLeftChild);
        }
        private static int Balance(IBinarySearchTree<K, V> tree)
        {
            if (tree.IsEmpty)
                return 0;
            return Height(tree.Right) - Height(tree.Left);
        }
        private static bool IsRightHeavy(IBinarySearchTree<K, V> tree) { return Balance(tree) >= 2; }
        private static bool IsLeftHeavy(IBinarySearchTree<K, V> tree) { return Balance(tree) <= -2; }
        private static IBinarySearchTree<K, V> MakeBalanced(IBinarySearchTree<K, V> tree)
        {
            IBinarySearchTree<K, V> result;
            if (IsRightHeavy(tree))
            {
                if (IsLeftHeavy(tree.Right))
                    result = DoubleLeft(tree);
                else
                    result = RotateLeft(tree);
            }
            else if (IsLeftHeavy(tree))
            {
                if (IsRightHeavy(tree.Left))
                    result = DoubleRight(tree);
                else
                    result = RotateRight(tree);
            }
            else
                result = tree;
            return result;
        }
    }
}

 

 

 

 

  • You missed an important optimization:

    By putting the balancing logic in the constructor,  you save allocations and you also do not need to think about balancing again in any of the operations that you perform. You also can reuse the same code for a different balancing algorithm like RB trees for everything but the constructor.

    See Chris Okasaki and his paper containing implementation of red-black trees.

  • Hi Eric,

    Great posts about immutable objects! Go on...

    About the Academic part I think that the code you're writing in this series has its source in academic knowledge and reasoning. It is proved if you see the basic data structures already implemented in the mainstream programming frameworks such as .NET. It's the base of everything. To do the job you inevitably have to use such data structures being rewritten here but this time in a more powerful approach. The benefits of using such immutable implementations are incredible. We want better approaches to already implemented solutions. That's what moves us forward.

    Best,

    Leniel Macaferi

  • Eric,

    I completely agree on the Immutability front - it has made my life easier in many respects. Its a shame some serializers want to break it so readily (XmlSerializer being a big culprit) - could you please post your thoughts on these issues?

    Also, I think the ternary/tertiary/conditional operator (take your pick) really helps when I am writing more functional code. I end up using less local variables, can refactor out commonly used expressions. I would love to hear opinions on this as well. For consideration I rewrote the MakeBalanced function:

    private static IBinarySearchTree<K, V> MakeBalanced(IBinarySearchTree<K, V> tree)

    {

      return IsRightHeavy(tree) ?

                   (IsLeftHeavy(tree.Right) ? DoubleLeft(tree) : RotateLeft(tree)) :

                IsLeftHeavy(tree) ?

                   (IsRightHeavy(tree.Left) ? DoubleRight(tree) : RotateRight(tree)) :

                tree;

    }

  • > I also find that programming in a more "functional" style using

    > immutable objects somehow encourages me to write very small,

    > clear methods that each do one thing very well. [...] I haven't

    > managed to quite figure out why that is yet.

    I think it's because of separability. A functional program (as any program with only immutable types is going to be) is inherently seperable, in that each expression or set of expressions can exist on its own, and only depends on its obvious arguments. If it has some semantic meaning, it can easily and safely be extracted into a function. The same isn't true of programming with mutable objects -- a set of expressions there may only make sense if the object is in some unusual state (if its invariant is violated in some way), and may therefore require other expressions to come before and yet others to come after. Pulling such a set of expressions out into a function (even if they have some associated semantics) is therefore dangerous, since there's then strong coupling between the implementation of that function and its caller(s).

    > Another reason why immutable objects tend to be easier to reason

    > about is that this style encourages non-nullable programming.

    This one I really can't explain, but I'd certainly agree with your observation. I suspect this is as much an education thing as anything else -- it's not really any harder to use mutable types in a non-nullable fashion, but perhaps it's just much more obvious for immutable types?

  • I think a lot of the push back against this style comes from folks with mortgages to pay and established (ossified?) skill bases. I'm sure OO and before that HLL's came in for the same flack when they first appeared on the programming scene (of course in reality FP has been around *forever* in computing terms, but never so 'in your face' [in a good way] until now).

  • Re: where to balance -- good point, thanks Wesner.

  • Re: ternary operator vs if statement

    I'm of two minds on that.

    On the one hand, I cleave to the principle "statements should always have side effects, expressions should avoid side effects".  Therefore, you are right, in this case I should have used a ternary operator rather than an if statement.

    On the other hand, I find nested ternary operators difficult to read. It's just not a very well chosen syntax for legibility.

    Since the actual code they generate is for all practical purposes identical it becomes solely a stylistic question, and in this case I deliberately chose to go for the more readable solution over the more "expression-like" solution.

    What would be really awesome is if we had more operators that acted like statements. Query operators already act much like an expression version of "foreach", the ternary operator acts like an expression "if".  A switch expression would be pretty neat. I'm not sure how exactly a "try" expression would work but it might be something like the null coalescing operator.

    Hmm.  

    I wouldn't hold my breath waiting for this in any hypothetical future version of C#.  But we can dream.

  • For some reason, there's been a lot of buzz lately around immutability in C#. If you're interested in

  • Eric, do you have books, articles, whitepapers you can recommend for better understanding how to adjust ones mindset for functional coding? I find that most code I deal with is in some fashion a statemachine, where every object is the keeper of its state and by using references state is kept. It also means that the moment you venture into multi-threading, there is a lot of lock and waiting on reset events, etc. I've been following your Immutable series with great interest, and love the style of coding, but call it operational debt, most problems I solve on a daily basis, I wouldn't know how to solve if my objects weren't keepers of state.

  • Okasaki's book Purely Functional Data Structures is great and has been a good resource for me in writing this series, but it is VERY academic. I will ask around and see if anyone can recommend to me a good book on functional programming.

    You might consider learning F#. You get to keep using the .NET framework, so all your knowledge of the framework can still be put to good use.

  • re: questions of "Academic" exercise

    My guess for the wariness is that immutable data structures incur an obvious runtime penalty (modification = copy = allocation), and people are reluctant to pay it.  They want to be assured that there is a benefit to this style of coding.  There may indeed be benefits (I've used this before, and I count myself as convinced enough to try more), but they are very hard to quantify.

    Tom Kirby-Green (above) is wrong and condescending, I would argue.  Many in-the-trenches developers are willing to try something new (if it doesn't cost too much), but (a) they don't have time to chase fads and (b) this has not been sold very well [not to imply that this is a fad - I know how old the idea is].  If the sales job improves, I think that this idea actually has a better chance of filtering down than most, because of the platform effect.

    p.s. Love the series

  • Arne Claassen:  I have heard that <a  href="http://books.google.ca/books?id=xyO-KLexVnMC&dq=%22the+little+schemer%22&pg=PP1&ots=GIBHrX3NOy&sig=-cyyF6aMqSOSyCSGfccG2nP2teQ&hl=en&prev=http://www.google.ca/search?q=%22The+little+schemer%22&ie=utf-8&oe=utf-8&rls=org.mozilla:en-US:official&client=firefox-a&sa=X&oi=print&ct=title&cad=one-book-with-thumbnail">The Little Schemer</a> is an excellent place to start - granted, it's not F#, but it is an effective teacher of FP techniques and theory.

  • Welcome to the fortieth issue of Community Convergence. This week we have two new releases of note: We

  • Welcome to the fortieth issue of Community Convergence. This week we have two new releases of note: We

  • I have been fascinated by this discussion on Immutible data structures and have been doing some of my own research and investigation.  It seems that this example of an Immutible AVL tree is missing a couple important optimizations that Immutible data structures lend themselves to.

    First, memory can be reduced by creating more classes that represent different shapes of a tree.  There is no reason to store empty trees in the left and right variables.  SingleLeaf, LeftLeaf and RightLeaf classes can be created similar to the Empty class that do not allocate left and right variables.

    Second, even more memory reduction can be created if even more classes are created that represent the different states of Tree Balance.  3 classes can be created that represent non-leaf trees that are Balanced, LeftHeavy, and RightHeavy.  This should total get rid of the need to store Height because the class the tree is built from holds information about the balance of the tree similar to the way the IsEmpty property holds information about the shape of the tree.  New properties such as IsLeaf, IsBalanced, IsLeftHeavy, and IsRightHeavy can be created for each class.  This can also lead to simpler and faster code because each class of tree only needs to deal with its special shape.

    Does this make sense or am I all wet?

Page 1 of 2 (23 items) 12