Monday, January 14, 2008 11:54 PM
mmaly
Building a DLR Language - Trees
The ToyScript sample language's front-end is very similar to that of any traditional compiler. It has tokenizer, parser and an abstract syntax tree representation of the toy language. What makes it a DLR based language (from the compiler pipeline perspective) is that instead of generating its own intermediate code, or generating Microsoft .NET IL directly, ToyScript will generate a tree representation of the code - DLR Trees. Dynamic Language Runtime will then take care of the code generation.
The DLR Trees are essentially the DLR representation of programs so every language that targets DLR produces the DLR Trees. ToyScript's design is similar to that of all the other DLR based languages, such as IronPython or IronRuby in the sense that it first parses into its own AST and then generates the DLR trees. At one point ToyScript (being very simple language) parsed directly into the DLR trees, but we changed that to behave more like the other DLR based languages, even though for ToyScript it is not strictly necessary.
By generating the DLR Trees instead of IL, the compiler writer doesn't have to worry about the IL generation and the DLR will take care of the intricacies. The developer can then focus on the correct semantic of the language at hand by implementing the front-end and implementing the correct runtime semantic.
Let's take look at the following snippet of ToyScript code:
import System
print "Hello ToyScript"
Both of the language elements shown (import and print) are implemented simply by runtime library functions (ToyHelpers.Import and ToyHelpers.Print respectively).
There is a useful feature in the DLR which allows to trace all of the trees created during the program's execution: -X:ShowASTs. We can run the code above with the "-X:ShowASTs" command line parameter and see the output:
//
// AST D:\Dev\x.ts
//
.codeblock Object D:\Dev\x.ts ()() {
.var Object System (Local,InParameterArray)
{
(.bound System) = (ToyHelpers.Import)(
"System",
);
(ToyHelpers.Print)(
(Object)"Hello ToyScript",
);
}
}
The output of the trace shows essentially what we'd expect. The import statement turned into an assignment in which variable "System" is assigned return value of the call to the ToyHelpers.Import, passing "System" as an argument. Similarly, the print statement got turned into the call with a "Hello ToyScript" argument.
At the DLR Tree level, both of these nodes are MethodCallExpression, one of many expressions the DLR Trees support. Another one that we can see in this example is an assignment (BoundAssignment) and constant (ConstantExpression).
Who created this DLR Tree? Each AST node of the ToyScript AST has a "Generate" method which is responsible for creating the DLR Tree for the node. The Generate for "print" statement (in Print.cs) then creates the tree that we already saw above:
return Ast.Statement(
Span,
Ast.Call(
typeof(ToyHelpers).GetMethod("Print"),
Ast.ConvertHelper(
_expression.Generate(tg),
typeof(object)
)
)
);
The ToyScript sample above uses actually only few of the many available tree nodes. It only uses assignment (Ast.Assign), block (Ast.Block), call to a method specified as MethodInfo (Ast.Call), constant (Ast.Constant) and conversion/cast (Ast.Convert / Ast.ConvertHelper). Outside of these few, the tree supports quite a few nodes and factory methods for each of them.
Some of the other most important nodes in the DLR Trees are:
-
UnaryExpression - Negate, Convert, logical Not
-
BinaryExpression - comparisons, arithmetic and logical operations, array index
-
MemberExpression - field or property access
-
CallExpression - call to a method specified via MethodInfo
-
NewExpression - calling a constructor to create an instance of .NET class
-
BoundAssignment - assignment to a variable
-
BoundExpression - a value of the variable
-
ConditionalExpression - condition ? ifTrue : ifFalse
-
...
Since DLR represents complex programs, statement-like constructs are needed also:
- DoStatement - a "do .. while ..." loop
- LoopStatement - a simplified form of the C# -style "for" loop
- ReturnStatement
- ThrowStatement
- TryStatement
- SwitchStatement
- ...
The underlying idea for all of the nodes is that they are statically typed and the factory methods enforce the restrictions. For example, when creating a MethodCallExpression to call "System.Console.WriteLine(string)", or when creating an assignment, the factory method checks that the type of the variable is assignable from the type of the right-hand expression.
Now that we know how ToyScript generates the DLR Tree for a (very) simple constructs, the obvious question comes to mind: How does it deal with constructs that are not as obvious at compile time? For example "a + b". What happens when executing "a + b" completely depends on the values of "a" and "b" at the time when the expression is being evaluated. Adding integers or strings are completely different matter, not to mention objects that may implement operator "+". This is what we'll discuss in detail in the following chapters.