Fabulous Adventures In Coding

Eric Lippert's Blog

Immutability in C# Part Seven: More on Binary Trees

Lots of good comments on my previous post. To briefly follow up:

  • One of the downsides of immutable tree implementations is that usually the tree must be built from the leaves up, which is not always convenient. We'll look at implementations which hide this fact from the user in future posts.
  • One smartypants pointed out that sure, this tree can have cycles -- if you have another binary tree implementation that has cycles and then paste such a cyclic tree in. True; what I meant was that using only the stated implementation, it's impossible to create a tree with cycles. If you are hell bent on making a bad tree, you can surely do that. Again, in a future post we'll have a tree which is built up by calling tree "mutating" operations, and those really will be guaranteed acyclic even in a world where there are other bad implementations hanging around.

And finally, one reader correctly identified the problem with my recursive implementation of in-order iteration. A tree of maximum height h ends up allocating O(h) iterator objects. The recursive calls get O(h) deep. If h is very large, this could blow the call stack. And if the tree has n nodes, each with an average height of O(h), then iterating each node will require O(h) recursive calls apiece. Therefore the total time cost in calls for iterating the entire tree is O(n h). 

In a binary tree with n nodes, h is always between log n and n, so that means that this iterator has best case asymptotic performance of O(n log n) and worst case  of O(n2) in time, and uses between O(log n) and O(n) in both call stack and heap space. That's pretty lousy.

A better implementation would be to write a tree traversal algorithm which does not use recursion. Use an explicit stack rather than the call stack:

public static IEnumerable<T> InOrder<T>(this IBinaryTree<T> tree)
{
    IStack<IBinaryTree<T>> stack = Stack<IBinaryTree<T>>.Empty;
    for (IBinaryTree<T> current = tree; !current.IsEmpty || !stack.IsEmpty; current = current.Right)
    {
        while (!current.IsEmpty)
        {
            stack = stack.Push(current);
            current = current.Left;
        }
        current = stack.Peek();
        stack = stack.Pop();
        yield return current.Value;
    }
}

This consumes O(n) time, O(h) heap space, and O(1) call stack space. It is also painful to read and analyze compared to the slow naive implementation.

I would very much like to add new syntax to a hypothetical future version of C# which would be a syntactic sugar for the code above. If, hypothetically, we were prioritizing features for unannounced future versions of C#, unfortunately that one would not be our highest priority, so I wouldn't expect it any time soon. My colleague Wes has written a great article that goes into more details about the problem above and ways we might solve it in the future, so check that out if you want the details.

I have the code for the next few blog posts written, but odds are good that I'm not going to have time to write the surrounding text until after the festive holiday season is over. I hope you have a wonderful time during the remainder of 2007 and we'll see you bright and early in 2008 for more fabulous adventures!

Published Wednesday, December 19, 2007 12:01 PM by Eric Lippert
Filed under: , ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Vadmyst said:

Great series!!

Tree traversal can also be done without using stack. For example, http://vadmyst.blogspot.com/2007/09/non-recursive-binary-tree-traversal.html. But, this approach makes tree node more "heavy" as additional information has to be used.

December 19, 2007 4:55 PM
 

Eric Lippert said:

In your approach how do you traverse the tree for a second time? All the nodes will be marked as visited.

I think what you want to do is store the "visited" information in a separate dictionary object which is owned by the traverser.

December 19, 2007 5:54 PM
 

Dean Harding said:

You could make Visited an integer, incrementing it each time you visit a node.

Obviously, it also doesn't work for immutable trees :-)

December 19, 2007 7:41 PM
 

Dean Harding said:

I should say "incrementing it each time you TRAVERSE THE TREE"...

December 19, 2007 7:43 PM
 

Vadmyst said:

Dictionary approach seems to be better, in scenarios when traversing is done only with search purpose. In this case not all nodes will be visited and "visited" counter can be useless.

December 20, 2007 4:37 AM
 

Gabe said:

Perhaps a nice feature of a future C# would be to allow a function to delegate the yield ability to some deeper function. That would allow you to write InOrder like this maybe:

public static IEnumerable<T> InOrder<T>(this IBinaryTree<T> tree) { yield tree.left.InOrder(); yield return tree.value; yield tree.right.InOrder(); }

Of course that's not a trivial feature to implement, but maybe someday C# will have continuations.

December 20, 2007 10:02 AM
 

DRBlaise said:

The InOrder function should be checking against IsEmpty instead of null or Count > 0.

And speaking of null, it would be a nice feature to be able to specify that a variable of a class can not be set to null.  This would simplify code so that we would not have to check that a value passed into a method is null because it would not be allowed by the compiler.  This would require being able to identity a default value so that an array could be initialized.  In the case of these immutible data structues, the default would of course by Empty.

December 20, 2007 1:29 PM
 

Gabe said:

"I would very much like to add new syntax to a hypothetical future version of C# which would be a syntactic sugar for the code above."

It's called F#.

December 20, 2007 3:45 PM
 

Eric Lippert said:

Whoops, that's what I get for modifying the code from another project and posting it without compiling it. Thanks for the note.

December 20, 2007 4:55 PM
 

Yuval Gilboa said:

I've a small question unrelated to this immutables series, but I hope is still relevant for this blog:

Why did you not include generic properties/indexers in C#? Are you contemplating doing so in the "hypothetical future version of C#".

December 24, 2007 8:26 PM
 

mattiasw said:

Why do we have to bother about "out-of-stack". Using the stack for recursion in trees are done in functional programming like F#, Lisp all the time.

The only datastructure too big to use the stack is lists and similar, which can be several thousand elements. Trees on the other seldom becomes deeper than 100 levels, especially if you use balanced trees like AVL or red-black trees.

December 27, 2007 1:39 PM
 

Ifeanyi Echeruo [MSFT] said:

Stackless in-order binary tree traversal with left, right and parent pointers only

Off the top of my head and untested

class Node {

 Node Parent;

 Node Left;

 Node Right;

 Object Value;

}

class InOrderTreeWalk {

 bool descendLeft;

 Node current;

 InOrderTreeWalk(Node start) {

   this.descendLeft = true;

   this.current = start;

 }

 bool moveNext() {

   if(this.current == null) {

     return false;

   }

   if(this.descendLeft) {

     this.descendLeft = false;

     while(this.current.left != null) {

       this.current = this.current.left;

     }

   }

   else {

     if(this.current.right != null) {

       this.current = this.current.right;

       while(this.current.left != null) {

         this.current = this.current.left;

       }

     }

     else {

       if(this.current.parent != null && this.current == this.current.parent.left) {

          this.current = this.current.parent;

       }

       else {

         while(this.current.parent != null && this.current == this.current.parent.right) {

           this.current = this.current.parent;

         }

         if(this.current.parent == null) {

           this.current = null;

         }

       }

     }

   }

   return this.current != null;

 }

 object getValue() {

   return (this.current != null) ? this.current.Value : null;

 }

}

December 30, 2007 2:17 PM
 

Smirnov said:

Note that in Ifeanyi's code you are still using an additional O(n) memory for the parent pointers vs Eric's b-tree implementation. A binary tree already takes up O(n) memory for n nodes though, so you could just add it together and get back O(n) total memory usage. This doesn't sound so bad in theory.

On the other hand, if you are using a balanced binary search tree, with recursion or explicit stack a traversal will only use O(log n) for extra memory. In practice it will then use less memory, as you'd have to run O(n/logn) traversals concurrently for it to actually use more.

January 1, 2008 3:30 PM
 

Eric Lippert said:

How are you going to build an immutable binary tree that has parent pointers?

January 1, 2008 6:28 PM
 

Eric Lippert said:

Note that "binary tree" and "b tree" are two entirely different data structures. "b tree" is NOT short for "binary tree".

January 1, 2008 6:29 PM
 

Tales from the Evil Empire said:

For some reason, there's been a lot of buzz lately around immutability in C#. If you're interested in

January 18, 2008 4:05 PM
 

Smirnov said:

I didn't realize that b-tree was an acronym for another data structure. I meant to say binary tree there.

As for building immutable trees with parent pointers, I was not suggesting that, but pointing out that Ifeanyi's binary tree class has parent pointers, whereas Eric's binary tree class does not. The point was that including a parent pointer misleadingly looks like it uses less memory to traverse, when it actually grows at a faster rate than the stack usage.

January 19, 2008 8:59 PM
 

wesele said:

why?

       current = stack.Peek();

       stack = stack.Pop();

why not?

       current = stack.Pop();

February 13, 2008 8:35 PM
 

Eric Lippert said:

Because stack.Pop() returns the popped stack.  Remember, it's an immutable stack; popping it does not change the stack, popping it returns you an entirely different object which represents the state of the popped stack.

More generally, immutable data structures enable the abstract data type implementer to have every method do exactly one thing, which is good design.  Popping and peeking are two entirely different things. One creates a new data structure, the other inspects an existing data structure. So why should they be the same method?

Because mutable stacks are, well, mutable, the implementer of the ADT must somehow work around the inability of callers to reason about the state of the data structure. For this reason, it is very common to see mutable data structures where one method does the work properly done by many different methods, in order to guarantee atomicity.

This question has come up in almost every part of this series. See the comments to the other parts for more details.

February 13, 2008 9:16 PM
 

Eric Lippert said:

> Why did you not include generic properties/indexers in C#?

Is there a compelling customer scenario for doing so which cannot already be easily solved with a generic method? I am not aware of any, but I am happy to hear any such scenario you've got.  

> Are you contemplating doing so in the "hypothetical future version of C#".

Not at this time, no.  

February 13, 2008 9:21 PM

Leave a Comment

(required) 
(optional)
(required) 
Submit

This Blog

Syndication


© 2008 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker