Fabulous Adventures In Coding
Eric Lippert is a principal developer on the C# compiler team. Learn more about Eric.
[This is the first part of a series on generating every string in a language. The next part is here.]
The other day I wrote a little algorithm that did some operation on binary trees. I wanted to test it. I whipped up a few little test cases and it seemed fine, but I wasn’t quite satisfied. I was pretty confident, but maybe there was some odd binary tree topology that I hadn’t considered which would cause a bug. I reasoned that there have got to be only a finite number of binary tree topologies of a given size. I’ll just try all of them.
Before I go on, I need a compact notation for a binary tree. Here’s my tree node:
class Node{ public Node Left { get; private set; } public Node Right { get; private set; } public Node(Node left, Node right) { this.Left = left; this.Right = right; }}
Pretty straightforward –- left node, right node, and that’s it. Notice that for the sake of clarity of this article I’ve removed the data that would normally be stored in the tree node. Let’s assume that they’re numbers for now. The interesting bit here is how I represent the tree as an extremely compact string. A null tree node is represented as x. A nonempty tree node in my “valueless” tree is represented as (left right). Consider this tree:
1 / \x 2 / \ x x
The 2 node has two null children, so it is (xx). The 1 node has one null child on the left and the 2 node on the right, so it is (x(xx)). Make sense? We can write a bit of recursive code that produces such a string:
public static string BinaryTreeString(Node node){ var sb = new StringBuilder(); Action<Node> f = null; f = n => { if (n == null) sb.Append("x"); else { sb.Append("("); f(n.Left); f(n.Right); sb.Append(")"); } }; f(node); return sb.ToString();}
How can we enumerate all possible binary trees of a given size? I reasoned recursively:
There is one tree with zero non-null nodes in it: x. That’s our base case.
Now, pick a number. Four. We want to enumerate all the trees with four non-null nodes in them. Suppose we already know how to enumerate all the trees with three, two, one and zero nodes in them. Let's call the set of binary trees with n nodes in it B(n). Suppose we make all possible combinations of B(x) and B(y) such that a member of B(x) is to the left of the root and a member of B(y) is to the right. I'll write that B(x)B(y). In this notation the trees with four non-null nodes in them have to be in one of these four possible sets: B(0)B(3), B(1)B(2), B(2)B(1), or B(3)B(0).
Obviously this generalizes; we can enumerate all the trees with k nodes in them by going through each of k cases, where each case requires us to solve the problem on some tree sizes smaller than k. Perfect for a recursive solution. Here’s the code.
static IEnumerable<Node> AllBinaryTrees(int size){ if (size == 0) return new Node[] { null }; return from i in Enumerable.Range(0, size) from left in AllBinaryTrees(i) from right in AllBinaryTrees(size - 1 - i) select new Node(left, right);}
Once more, LINQ makes an algorithm read much more like its specification than the equivalent program with lots of nested for loops.
And sure enough, if we run
foreach (var t in AllBinaryTrees(4)) Console.WriteLine(BinaryTreeString(t));
we get all fourteen trees with four non-null nodes:
(x(x(x(xx))))(x(x((xx)x)))(x((xx)(xx)))(x((x(xx))x))(x(((xx)x)x))((xx)(x(xx)))((xx)((xx)x))((x(xx))(xx))(((xx)x)(xx))((x(x(xx)))x)((x((xx)x))x)(((xx)(xx))x)(((x(xx))x)x)((((xx)x)x)x)
And now I have a device which generates tree topologies that I can use to test my algorithm.
How many such trees are there, anyway? Seems like there might be rather a lot of them.
The number of binary trees with n nodes is given by the Catalan numbers, which have many interesting properties. The nth Catalan number is determined by the formula (2n)! / (n+1)!n!, which grows exponentially. (See Wikipedia for several proofs that this is a closed form for the Catalan numbers.) The number of binary trees of a given size is:
So perhaps my plan to try all binary trees of a particular size is not a great one; by the time you get into the mid-teens there are way too many to reasonably try all of them in a short amount of time. Still, I think this is a handy device to have around.
A challenge for you: Suppose we forget about binary trees for a moment and consider arbitrary trees. An arbitrary tree node can have zero, one, or any finite number of child tree nodes. Let's say that a non-empty arbitrary tree is notated as a list of child trees in braces. So {{}{}{{}}} is the tree
1 /|\ 2 3 4 | 5
Because the 2, 3 and 5 nodes each have no children, they are notated as {}. Make sense? Note that order matters; {{}{}{{}}} and {{}{{}}{}} are different trees even though they have a similar structure.
My challenge questions are (1) are there more or fewer arbitrary trees with n nodes than there are binary trees with n nodes? and (2) can you come up with a mechanism for enumerating all the arbitrary trees?
For #1, doesn't it have to be "equal or greater" because every binary tree can also be represented as an arbitrary tree?
@Szindbad: I'm not so sure you did. You *could*, if you wanted, use a depth-first search. Increment the count by 1 at each iteration, stop (pretend it's a dead-end, but append to your results) if count hits 0. Treat each node as having infinite branches (but you end up stopping instead of creating a branch if count hits 0. Of course, I think you could also mimic the recursive strategy Eric came up with.
Pretty cool! Thanks for sharing!
There are more binary trees of n nodes because binary trees also include positional data (left versus right) while an arbitrary tree doesn't.
Not the question asked, but a related problem that I once encountered: how do you enumerate all possible arithmetic expressions involving a set of numbers and a set of operators. If you're familiar with the Countdown numbers game (or Google that expression), you know what I mean.
Arithmetic expressions with binary operators can be treated as binary trees. A brute force approach (remember, when in doubt, use brute force) would be to enumerate every possible binary tree, and then enumerate every permutation of numbers and operators. But this won't be fast, as you can imagine.
A different way to approach it is to get rid of the explicit tree altogether, and use RPN instead. That way, you can permute RPN programs: write a recursive routine which alternately tries adding each operator to the program (if there are more than two items on the stack), and tries adding each number to the program, and recurses after each. When it's run out of numbers, you've got a solution - you can evaluate the RPN program as you go, or at the leaf of the recursion, as you desire.
Maybe I'm missing something, but this tree {{}{}{}} is an arbitrary tree of 4 nodes that is not included in the set of binary trees with 4 nodes. Wouldn't that pretty significantly prove that there are more arbitrary trees than binary trees? Still thinking about number 2.
Nevermind, Orion Adrian makes the correct point. At 4 nodes, the complete set of 4 node arbitrary trees is
{{{{}}}}
{{{}{}}}
{{{}}{}}
{{}{{}}}
{{}{}{}}
which is less than the fourteen trees mentioned in the blog.
For (2) I think it can be solved by thinking about adding a node at any level between the current level and the root level. Here is some code to do it for the string notation:
static void EnumerateTrees()
{
var trees = EnumTree(String.Empty, 0, 4);
foreach (var tree in trees)
Console.WriteLine(tree);
}
static IEnumerable<string> EnumTree(string currentString, int currentLevel, int remainingNodes)
if( remainingNodes == 0) {
return new string[] { currentString + String.Empty.PadLeft(currentLevel,'}') }; // close the last child added
return from levelToAdd in Enumerable.Range(0, currentLevel+1)
from tree in EnumTree(AddChild(currentString, currentLevel, levelToAdd), levelToAdd+1, remainingNodes - 1)
select tree;
static string AddChild(string currentString, int currentLevel, int newChildLevel)
if (currentLevel == newChildLevel)
return currentString + "{";
else
return AddChild(currentString + "}", currentLevel - 1, newChildLevel);
For (1) - the discrepancy is due to the binary trees each having two "slots" so there are null nodes which count as additional permutations: (x)x != x(x) whereas the arbitrary case, the number of slots is equal to the number of children nodes. It seems that Binary(n-1).Count = Arbitrary(n).Count. I don't have a feel yet for why this is so.
Orion Adrian: I'm not sure what the difference between positional information and order matters is.
From the blog post: "Note that order matters; {{}{}{{}}} and {{}{{}}{}} are different trees even though they have a similar structure."
There is only one tree with 1 node: {}.
To combine non-empty tree A (with x nodes) with non-empty tree B (with y nodes) to get a tree with x+y nodes, append the root of A to the child list of the root of B.
So, to enumerate the trees with n nodes, for each k between 1 and (n-1), for each tree A with k nodes, for each tree B with (n-k) nodes, add the combination of A and B to the enumeration.
Thinking about enumerating the trees, I asked myself if there is a closed form of the number of trees having n nodes. I found a surprisingly easy recursive solution, but could not yet resolve it to a closed form (maybe someone can try?):
Let A_n be the number of trees having n+1 nodes. Then,
A_0 := 1
A_n = Sum(i,1,n,A_(i-1) * A_(n-i))
So,
2 Nodes: A_1 = A_0 * A_0 = 1
3 Nodes: A_2 = A_0 * A_1 + A_1 * A_0 = 2
4 Nodes: A_3 = A_0 * A_2 + A_1 * A_1 + A_2 * A_0 = 5
5 Nodes: A_4 = A_0 * A_3 + A_1 * A_2 + A_2 * A_1 + A_3 * A_0 = 14
6 Nodes: A_5 = ... = 42
7 Nodes: A_6 = ... = 132
And so on. You can see this as follows:
- Every tree, say it has n nodes, has a root node. If you take it away, the remaining structure, called an ordered forest, has (n-1) nodes.
- Now, how many ordered forests are there for n nodes? We can solve this recursively. Say the first tree has i nodes, 1 < i <= n. Take this first tree away. Remaining are n-i nodes which form again an ordered forest. Now let's iterate i from 1 to n and sum up the results:
A_n =
(#trees with 1 node) * (#forests with n-1 nodes) +
(#trees with 2 nodes) * (#forests with n-2 nodes) +
(#trees with 3 nodes) * (#forests with n-3 nodes) +
... +
(#trees with n-1 nodes) * (#forests with 1 node)
So all we have to do is determine two things:
- (#trees with x nodes): We can again take the root node of every tree away, giving us an ordered forest. This is, by recursion, equal to A_(x-1)
- (#forests with x nodes): This is easy, since we already have a notion for it: A_x
You can substitue these results and form the sum(var,from,to,what)-expression, yielding the recursive form above.
I hope this is understandable to at least some degre... Otherwise, let me now.
I will probably paste some (short) enumeration code later.
Do I get points for thinking 'outside the tree' if I write a program to simply enumerate all possible balanced bracket strings consisting of exactly N pairs of brackets?
(start with the string "{" - at any given point you may add a { if there are less than N {'s, and may add a } if there are fewer }'s than {'s (minus one so you don't have multiple roots), then at the end add a })
I started writing the program thinking of it in terms of trees (first enumerating all the possibilities for how many nodes are contained in each subtree, then enumerating all the possible shapes of a subtree of that size for each of those), then realized - the strings are all the same length, a little thought tells me there's one { and one } per node, and anyway doesn't this notation look an awful lot like nested sets...
Philo: (x(xx)) and ((xx)x) are two different binary trees that are both a single root node with a single child node - i.e. {{}}.
It seems that the number of trees is equal to the number of trees in a binary tree with one fewer node. So the closed form would be: (2n-2)! / (n)!(n-1)!
Random832: That is how I started out thinking about it. I realized that picking how many closures to make is equivalent to selecting the level on which to add the next node.