[The tenth in a series of posts on the evolution of TransForth]
This post may not quite be deserving of a wizard’s cape and 2001 Space Odyssey background music as when Sussman writes out Lisp in Lisp (at 34:34 - my absolute favorite SICP lecture by the way), but still… we are about to finish off the meta-circular Forth interpreter.
It’s been a while but I haven’t entirely forgotten the TransForth project. Last we left things, we had migrated the inner interpreter and had moved the dictionary, but we had one last major moving part remaining to move from F#-land to Forth-ville: the outer interpreter. The outer interpreter is responsible for taking input, parsing and compiling or executing words, and producing output.
Our current outer interpreter looks like this:
let rep () =
while inp.Peek() <> -1 do
let word = token ()
if word.Length > 0 then
match find word with
| -1 -> // literal?
let number, value = Int32.TryParse word
if number then
if interactive then push value else append LIT_ADDR; append value
else word + "?" |> failwith
| d ->
if interactive || isimmediate d
then push d; exec ()
else cfa d |> append
let rec repl () =
inp <- new StringReader(Console.ReadLine() + Environment.NewLine)
with ex -> out.Write ex.Message; repl ()
It handles prompting the user while parsing lines of console input. Lines are broken into words on whitespace boundaries. Words are looked up in the dictionary. If found, they are either executed in interactive mode (or when the word is flagged as immediate) or else it is appended to the dictionary in compiling mode. If the word isn’t found then it is assumed to be a number (or an error) and is either pushed to the stack if in interactive mode or is appended to the dictionary as a literal in compiling mode.
Before implementing this in Forth, we will need a few new primitives (at least for now). One to pull for keyboard input; each call pushing a single key to the stack:
let key () =
if inp.Peek() = -1 then
inp.Read() |> push
Another to echo characters to the console; again one per call:
let echo () = pop () |> char |> out.Write
We will need to lookup tokens in the dictionary. Remember that the most recently read token can be found tentatively appended at the end of the dictionary. This is convenient, by the way, when the token turns out to be a compiled word name.
let findtok () =
let sb = new StringBuilder()
let len = mem.[mem.[h]]
let tok = mem.[h] + 1
for i in [tok..tok + len - 1] do
mem.[i] |> char |> sb.Append |> ignore
sb.ToString() |> find |> push
Finally, we will need a means of executing found words; priming and kicking off the inner interpreter:
let rec exec () =
let c = pop () |> cfa
p <- c
w <- c
These are added as VM instructions (for now) and primitive words are added for them:
and execute () =
match instruction with
| 29 -> key ()
| 30 -> echo ()
| 31 -> findtok ()
| 32 -> exec ()
primitive "KEY" KEY
primitive "ECHO" ECHO
primitive "FIND" FIND
primitive "EXEC" EXEC
Now we are ready to rewrite the outer interpreter entirely in Forth itself. First, some small utilities:
: CRLF 13 ECHO 10 ECHO ; : SP 32 ; : ?FOUND ( w -- ?) DUP 0 >= ; : HIGHBIT -2147483648 ; : ISIMMEDIATE ( addr -- ?) @ HIGHBIT AND HIGHBIT = ; Next, it will be useful to allow tokenization on delimiters other than spaces. For example, to parse strings (terminated by double quotes) or comments (terminated by right-parenthesis or newline). If the delimiting character is a space then this is taken to mean any whitespace character:
: ?DELIM ( v d -- v ?) 2DUP SP = IF >= ELSE = THEN ; : ?WS SP ?DELIM ; : SKIPWS KEY ?WS IF DROP SKIPWS THEN ; \ leave first non-WS char on stack : TOKEN ( delim -- tok) >R HERE 1+ R@ SP = IF SKIPWS ELSE KEY THEN BEGIN OVER ! 1+ KEY R@ ?DELIM UNTIL R> 2DROP HERE - 1- HERE ! ; : WORD SP TOKEN ; Words are stuffed into the current dictionary header: : CFA ( addr -- c) 5 + ; : LINKA ( addr -- l) 4 + ; : HEADER WORD LATEST HERE LINKA ! HERE L ! HERE CFA H ! ; : TOKENCHARS ( -- b a) HERE HERE @ + 1+ HERE 1+ ; We will need also to be able to parse numbers (decimal for now):
: 0-ASCII 48 ; : 9-ASCII 57 ; : ?DIGIT ( c -- c ?) DUP 0-ASCII >= OVER 9-ASCII <= AND ; : ?NUMBER 0 TRUE TOKENCHARS DO I @ ?DIGIT SWAP >R AND SWAP 10 * R> + 0-ASCII - SWAP LOOP DUP NOT IF SWAP DROP THEN ;
Finally, the outer interpreter mimicking what our F# repl above was doing:
: OUTER WORD FIND ?FOUND IF DUP ISIMMEDIATE ISINTERACTIVE OR IF EXEC ELSE CFA , THEN ELSE DROP ?NUMBER IF ISINTERACTIVE NOT IF LITADDR , , THEN ELSE 63 ECHO SP ECHO \ ? THEN THEN OUTER ; \ Recurse forever!
We startup TransForth, type OUTER and voila! Notice of course that OUTER is recursive and never returns (ever!). Once kicked off, we never come back to the F# version of the outer interpreter! It’s no longer needed at all.
We have a bit of a “chicken and egg” situation on our hands. We have successfully rewritten the interpreter, but we cannot just go and delete the F# version now because then what would compile our replacement?
Some Forths solve the delima at this point by saving an “image” of the compiled dictionary and thereafter bootstrapping things by loading that. Once an image capable of compiling itself from source has been produced (an egg), we have a “meta-compiler” and can indeed nuke the F# version (the chicken). This is a very surreal moment to behold. We will have to ponder our options.
I’ve been spending some time with RetroForth recently (and made an F# version of the VM for it by the way). Retro indeed chooses the meta-circular path to enlightenment and it works out pretty nicely. Charles recently blogged about it and gave the minimal source from which a minimal image may be built. From this tiny image the full source can be compiled, producing a full image.
Another interesting study is the Factor bootstrap process. It too is an image-based system, but Slava goes to quite some effort to maintain the ability to rebuild a clean image from source with an absolutely minimal seed image and various stages of bootstrapping.
Like building a chicken from stem cells and raw DNA or something, imagine you are given some new processor with some never-before-seen instruction set. No Forth exists for it. No C compiler either. Certainly no CLR or F#. How on Earth would you go about bootstrapping a Forth?
Assuming you at least had an assembler (or not – you can stomach straight machine code right?), you might set about handcrafting minimal inner and outer interpreters, then bootstrap a full Forth from there. JonesForth is a very interesting read. It is a literate style development of just this sort. The source code reads like a book. First he builds the inner and outer interpreters in assembly, then he proceeds to define the rest in Forth.
A more Forth-oriented approach is to write a cross compiler on a host system in an existing Forth. To stretch the analogy, this is like getting a chicken to lay ostrich eggs; bootstrapping a new species from an existing one. Assemblers within Forth are very nice actually; much more powerful than any macro assembler on the planet.
For fun and learning I think I’ll explore cross compiling next. I’ve always loved the “Universal Machine” from the 2006 ICFP Programming Contest and Luke Hoban has a beautiful F# implementation ready to go. The current TransForth will be the host and the UM-32 will be the target. This should be fun!