Yesterday's post dealt with optimizing a single basic block.  Most useful progams contain some control flow however, so in the next few posts we'll explore extending the algorithm to larger regions.

The set of all basic block in a procedure form a graph, known as the Control Flow Graph, where the edges correspond to branches.  Algorithms such as yesterday's that operate on single blocks are known as "local" algorithms, algorithms that operate on the entire graph are referred to as "global," while algorithms that operate on a portion of the graph are referred to as "regional."  In this post and the next, we'll extend yesterday's value-numbering algorithm to operate on a regional level.

Yesterday we did a renaming pass over the block we were analyzing to ensure that no variable was assigned to more than once.  The fact that this is possible in a basic block is pretty obvious, it's much less obvious whether this can be done on an entire control flow graph.  It turns out that such a renaming is possible by rewriting the code into what's known as Static Single Assignment form.  We're just going to accept this fact on faith right now until I get around to discussing it.

For starters, consider a procedure consisting of two blocks A and B.  There's an edge from A to B, meaning that as soon as A finishes executing it branches to the beginning of B.  This combination of A and B satisifies the fundamental property of basic blocks that if one instruction executes they all execute.  Value numbering can clearly be applied across both of these blocks.  This leads us to the following definition:

Def:  Let B0, B1, ..., Bn be a sequence of basic blocks where each Bi is the only parent of Bi+1, and B0 is either the entry node or has multiple parents.  B0, B1, ..., Bn is called an   Extended Basic Block.

An extended basic block doesn't quite satisfy the basic block definition, however we can say that if any block executes all previous block must have executed in order.  It should be obvious then that we can run value numbering over an extended basic block treating it is as a single block.  Extended basic block can often take us pretty far in a procedure, so this is quite nice. 

Extended basic blocks can overlap quite a bit.  If we end up value numbering each extended basic block seperately, we'd end up reevaulating some blocks many times.  One way to do this more efficiently is by using a scoped hash table.

A scoped hash table is a data structure that's often used to interpret lexical scoping in languages.  The idea is to have a stack of hash tables, and to replace a lookup with a a sequence of lookups starting with the top hash table on the stack.  As soon as one lookup succeeds the search stops, this enables definitions higher on the stack to shadow definitions lower on the stack.

The set of extended basic block rooted at a node in the CFG form a tree structure.  We can walk it as follows

   List HashTables
   Extended-Value-Number(Block b)
       create hash table and insert at head of HashTables
   
       Value-Number(B)

       for every successor s of b
           if b is the only parent of s
               Extended-ValueNumber(s)

       pop the head off of HashTables

   Value-Number(Block b)
       perform value number of b with hash lookup
       replace with a scoped lookup in the list HashTables

This algorithm prevents recomputing value numbers.  If I have two extended basic blocks B0, B1, B2, B3 and B0, B1, B2, B4, the algorithm will first compute the value numberings for the prefix B0, B1, B2.  It will then branch to compute a value numbering of B3 based on the hash tables created by the prefix, throw away the hash table created for B3, and branch again to compute a value numbering of B4.

Optimizing extended basic block is an improvement, but we'd like to do even better.  With extended basic blocks, we give up as soon as two control-flow paths merge.  To do better, we're going to need a way to deal with these merges.