Fabulous Adventures In Coding
Eric Lippert is a principal developer on the C# compiler team. Learn more about Eric.
Last time I discussed why it is that we have all the .NET compilers target an "intermediate language", or "IL", and then have jitters that translate IL to machine code: because doing so ultimately reduces the costs of building a multi-language, multi-hardware platform. Today I want to talk a bit about why IL is the way it is; specifically, why is it a "stack machine"?
To begin with, what is a "stack machine"? Let's consider how you might design a machine language that could describe the operation of adding together two integers to make a third. You could do it like this:
add [address of first addend], [address of second addend], [address of sum]
When the machine encounters this instruction it looks up the values of the addends stored in the two addresses, somehow adds them together -- how it does so is its business -- and stores the result in the third address.
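The three-address form can be simulated in a few lines. This is a hedged sketch, not real machine code: memory is modeled as a list of cells indexed by address, and `run_add` is a hypothetical name for the semantics of the instruction above.

```python
def run_add(memory, addr_a, addr_b, addr_sum):
    """Execute: add [addr_a], [addr_b], [addr_sum]."""
    memory[addr_sum] = memory[addr_a] + memory[addr_b]

memory = [0] * 8
memory[0] = 3   # first addend
memory[1] = 5   # second addend
run_add(memory, 0, 1, 2)
print(memory[2])  # 8
```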
You might instead say that there is a special region of memory called the "accumulator" which knows how to add a given value to itself:
clear_accumulator
increase_accumulator [address of first addend]
increase_accumulator [address of second addend]
save_accumulator [address of sum]
Or, you could say that there is a special region of memory called the "stack" which can grow and shrink; you get access to the items on the top:
push_value_at [address of first addend]
push_value_at [address of second addend]
add
pop_value_into [address of sum]
The "add" instruction takes the two values off the top of the stack, somehow adds them, and then puts the result back on the stack; the net result is that the stack shrinks by one.
A virtual machine where most of the instructions are of this third form is called a "stack machine", for obvious reasons.
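A stack machine is also easy to interpret, which is part of why the abstraction is so cheap. Here is a minimal sketch of an interpreter for the three opcodes used above; the opcode names match the pseudocode but are otherwise hypothetical:

```python
def run(program, memory):
    """Interpret a tiny stack-machine program.

    Each instruction is a tuple: an opcode name followed by its operands.
    """
    stack = []
    for op, *args in program:
        if op == "push_value_at":
            stack.append(memory[args[0]])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)      # net effect: stack shrinks by one
        elif op == "pop_value_into":
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")

mem = [3, 5, 0]
program = [
    ("push_value_at", 0),
    ("push_value_at", 1),
    ("add",),
    ("pop_value_into", 2),
]
run(program, mem)
print(mem[2])  # 8
```

Note how small the interpreter's dispatch loop is; no operand registers need to be named or allocated.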
IL specifies a stack machine, just like many other virtual machines. But most hardware instruction sets actually more closely resemble the second form: registers are just fancy accumulators. Why then are so many virtual machines specifying stack machines?
There are several reasons, but again, it primarily comes down to lowering costs. Stack machines are very easy to understand, they are very easy to write a compiler front-end for, they are easy to build interpreters for, they are easy to build jitters for, and they provide a concise abstraction for the common case. The common case is that the result of any particular operation is only going to be of interest for a brief period.
Imagine, for example, if we chose the first strategy for IL, and then had to compile an expression like x = A() + B() + C(); What would we have to do in the first case? Something like this:
create_temporary_storage // for result of A()
call A(), [address of temporary storage]
create_temporary_storage // for result of B()
call B(), [address of temporary storage]
create_temporary_storage // for result of first addition
add [address of first temporary storage], [address of second temporary storage], [address of third temporary storage]
...
You see how this goes? The IL is getting huge, and all so that we can keep track of precisely which memory locations are used to store values that we are about to never care about again. A stack abstraction lets the stack implementation deal with the temporary storages; in a stack machine, the same code looks something like:
push [address of x]
call A() // implicitly creates a temporary storage by pushing the result on the stack
call B()
add
call C()
add
store // store result on top of stack in address just below it
The code is much smaller and much easier to understand. A stack machine is a very simple way to describe a complex computation; by being able to write code for such a simple machine, it lowers the cost of making a compiler. And not only is it easier to write compilers and jitters that target simple stack machines, it is easier to write other code analysis tools as well. The IL verifier, for example, can quickly determine when there is a code path through a method that, say, misaligns the stack, or passes the wrong types of arguments to a method.
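To illustrate why such verification is cheap, here is a toy sketch of a stack-balance check over a straight-line instruction stream. A real IL verifier also tracks types and control flow; this sketch only tracks depth, and the opcode table is a hypothetical simplification:

```python
# Hypothetical opcodes mapped to their stack effect: (pops, pushes).
STACK_EFFECT = {
    "push":  (0, 1),
    "add":   (2, 1),
    "store": (2, 0),
}

def check_stack(instructions):
    """Return True if the stream never underflows and ends balanced."""
    depth = 0
    for op in instructions:
        pops, pushes = STACK_EFFECT[op]
        if depth < pops:
            return False             # underflow: this code path is invalid
        depth = depth - pops + pushes
    return depth == 0                # a balanced stream leaves the stack empty

print(check_stack(["push", "push", "add", "push", "store"]))  # True
print(check_stack(["add"]))                                   # False
```

Because every opcode has a fixed stack effect, the verifier can walk the instruction stream once per code path rather than doing any expensive global analysis.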
I have always asked myself exactly this question. Why have a stack?
I would like to know why building an (optimizing) jitter is easier for a stack machine even though the jitter's target platform uses registers and assignments.
@tobi: Actually, there's probably no such thing as a jitter that performs optimizations on the stack representation; the stack based representation is usually used as a persistent, on-disk representation.
IMO, it's rather irrelevant whether this persistent representation is stack based or register based. Sure, the SSA form may look shorter in terms of number of instructions, but those instructions usually require more explicit operands than in a stack based representation.
The main advantage of the SSA representation is that it requires less work on the jit compiler's part and since the jit compiler runs at runtime...
The stack based representation could have some advantage too: stack based temporaries tend to have a short, explicit lifetime. Once an instruction like "add" is executed you know that the 2 stack temporaries used are dead. But with a register based representation you need to do data flow analysis to figure this out.
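The data-flow analysis this comment alludes to can be sketched as a backward liveness pass over a straight-line three-address program. Everything below is a hypothetical illustration: instructions are `(dest, src1, src2)` tuples, and the register names are made up.

```python
def live_after(program):
    """For each instruction, the set of registers live immediately after it."""
    live = set()
    result = [None] * len(program)
    for i in reversed(range(len(program))):
        result[i] = set(live)
        dest, *srcs = program[i]
        live.discard(dest)      # dest is defined here, so it is not live before
        live.update(srcs)       # sources are used here, so they are live before
    return result

prog = [
    ("t1", "a", "b"),   # t1 = a + b
    ("t2", "t1", "c"),  # t2 = t1 + c
    ("x",  "t2", "d"),  # x  = t2 + d
]
print(live_after(prog))
```

Running this shows that `t1` is live after the first instruction but dead after the second, the fact a stack machine gets for free when "add" consumes its operands.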
Your "stack machine" example reminds me of Reverse Polish Notation calculators:
7 * (3 + 5) = 3 5 + 7 *
@Paul. Well that is exactly how you encode a calculation to be executed by a stack machine. I would say, rather sleepily, that they are equivalent.
Paul's comment about reverse Polish is spot on. People keep thinking that computers were invented around 1986, but mainframe computers with stack architecture based on reverse Polish were around in the 1960s. I worked on one for many years, and the instruction set and opportunities for optimization make them a good choice.
Whoever thinks computers were invented in 1986 must not have been paying any attention whatsoever. Even if we assume they were oblivious to the existence and development of "full size" computers, they should still know better. One of the earlier microcomputers, the Altair 8800, came out in 1975.
A very long time ago (at the time, I was still in secondary school), I wrote a simple expression compiler that targeted a stack machine; and to make the RPN "program" faster, I implemented a peephole optimizer. For patterns that looked like 'push a; push b; add' where a and b are constants, I would replace it with 'push c', where c = a+b. I did the same for sub, mul, div etc. And I ran this trivial optimizer iteratively over the instruction stream until it no longer made any replacements.
It turns out that if you do this, it amounts to constant subexpression evaluation for expression subtrees that are wholly constant (i.e. it can resolve 1 - 2 - x to -1 - x, but can't do the same for 1 - x - 2). So there are some meaningful optimizations that you can do with stack representations, even fairly cheaply.
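The peephole folder described above can be sketched in a few lines. This is a guess at a reconstruction, not the commenter's original code: `("push", n)` pushes a constant, `("push_var", name)` pushes a variable, and operators are looked up in a small table. The pattern replacement runs to a fixed point, just as the comment describes.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def fold_once(code):
    """One pass: replace 'push a; push b; op' with 'push (a op b)'."""
    out, changed, i = [], False, 0
    while i < len(code):
        if (i + 2 < len(code)
                and code[i][0] == "push" and code[i + 1][0] == "push"
                and code[i + 2][0] in OPS):
            a, b = code[i][1], code[i + 1][1]
            out.append(("push", OPS[code[i + 2][0]](a, b)))
            changed = True
            i += 3
        else:
            out.append(code[i])
            i += 1
    return out, changed

def fold(code):
    """Run the peephole pass until it makes no more replacements."""
    changed = True
    while changed:
        code, changed = fold_once(code)
    return code

# 1 - 2 - x in RPN is:  1 2 - x -   which folds to:  -1 x -
print(fold([("push", 1), ("push", 2), ("sub",), ("push_var", "x"), ("sub",)]))
# [('push', -1), ('push_var', 'x'), ('sub',)]
```

As the comment notes, this folds any wholly constant subtree, but 1 - x - 2 stays unfolded because no "push; push; op" window is ever all-constant.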
I think "not paying attention whatsoever" is a bit harsh. From what I remember of my youth, it was awfully easy to dismiss anything that was invented before I was born as not terribly relevant. I suspect the same is true for many young people today :-).
Re "a tiny bit disingenuous". Do people realise that 'disingenuous' means: "The opposite of ingenuous; lacking in candour or frankness, insincere, morally fraudulent. (Said of persons and their actions.)" [OED]?
I'm a tad disappointed in humanity that this somehow turned into a religious battle. Thanks for the post Eric.