[The tenth in a series of posts on the evolution of TransForth]
 

This post may not quite be deserving of a wizard’s cape and 2001 Space Odyssey background music as when Sussman writes out Lisp in Lisp (at 34:34 - my absolute favorite SICP lecture by the way), but still… we are about to finish off the meta-circular Forth interpreter.


It’s been a while but I haven’t entirely forgotten the TransForth project. Last we left things, we had
migrated the inner interpreter and had moved the dictionary, but we had one last major moving part remaining to move from F#-land to Forth-ville: the outer interpreter. The outer interpreter is responsible for taking input, parsing and compiling or executing words, and producing output.

F# Version of Outer Interpreter


Our current outer interpreter looks like this:
 

let rep () =

    while inp.Peek() <> -1 do

        let word = token ()

        if word.Length > 0 then

            match find word with

            | -1 -> // literal?

                let number, value = Int32.TryParse word

                if number then

                    if interactive then push value else append LIT_ADDR; append value

                else word + "?" |> failwith

            | d ->

                if interactive || isimmediate d

                then push d; exec ()

                else cfa d |> append


let
rec repl () =

    out.Write "\n>"

    try

        inp <- new StringReader(Console.ReadLine() + Environment.NewLine)

        rep ()

        out.Write "ok"

        repl ()

    with ex -> out.Write ex.Message; repl ()

It handles prompting the user while parsing lines of console input. Lines are broken into words on whitespace boundaries. Words are looked up in the dictionary. If found, they are either executed in interactive mode (or when the word is flagged as immediate) or else it is appended to the dictionary in compiling mode. If the word isn’t found then it is assumed to be a number (or an error) and is either pushed to the stack if in interactive mode or is appended to the dictionary as a literal in compiling mode.

Laying Some Foundation


Before implementing this in Forth, we will need a few new primitives (at least for now). One to pull for keyboard input; each call pushing a single key to the stack:
 

let key () =

    if inp.Peek() = -1 then

        printf "ok\n>"

        inp <- new StringReader(Console.ReadLine() + Environment.NewLine)

    inp.Read() |> push

 

Another to echo characters to the console; again one per call:


let
echo () = pop () |> char |> out.Write

 

We will need to lookup tokens in the dictionary. Remember that the most recently read token can be found tentatively appended at the end of the dictionary. This is convenient, by the way, when the token turns out to be a compiled word name.

 

let findtok () =

    let sb = new StringBuilder()

    let len = mem.[mem.[h]]

    let tok = mem.[h] + 1

    for i in [tok..tok + len - 1] do

        mem.[i] |> char |> sb.Append |> ignore

    sb.ToString() |> find |> push

 

Finally, we will need a means of executing found words; priming and kicking off the inner interpreter:

 

let rec exec () =

    let c = pop () |> cfa

    p <- c

    w <- c

    rpush i

    execute ()

 

These are added as VM instructions (for now) and primitive words are added for them:

 

and execute () =

    ...

    match instruction with

    ...

    | 29 -> key ()

    | 30 -> echo ()

    | 31 -> findtok ()

    | 32 -> exec ()

    ...


primitive "KEY" KEY

primitive "ECHO" ECHO

primitive "FIND" FIND

primitive "EXEC" EXEC

 

Forth Version of Outer Interpreter


Now we are ready to rewrite the outer interpreter entirely in Forth itself. First, some small utilities:
 

: CRLF 13 ECHO 10 ECHO ;
: SP 32 ;
: ?FOUND ( w -- ?) DUP 0 >= ;
: HIGHBIT -2147483648 ;
: ISIMMEDIATE ( addr -- ?) @ HIGHBIT AND HIGHBIT = ;


Next, it will be useful to allow tokenization on delimiters other than spaces. For example, to parse strings (terminated by double quotes) or comments (terminated by right-parenthesis or newline). If the delimiting character is a space then this is taken to mean any whitespace character:
 

: ?DELIM ( v d -- v ?) 2DUP SP = IF >= ELSE = THEN ;
: ?WS SP ?DELIM ;
: SKIPWS KEY ?WS IF DROP SKIPWS THEN ; \ leave first non-WS char on stack
: TOKEN ( delim -- tok) >R HERE 1+ R@ SP =
    IF SKIPWS ELSE KEY THEN BEGIN
      OVER ! 1+ KEY R@ ?DELIM
    UNTIL R> 2DROP HERE - 1- HERE ! ;
: WORD SP TOKEN ;


Words are stuffed into the current dictionary header:

: CFA ( addr -- c) 5 + ;
: LINKA ( addr -- l) 4 + ;
: HEADER  WORD   LATEST HERE LINKA !   HERE L !   HERE CFA H ! ;
: TOKENCHARS ( -- b a) HERE HERE @ + 1+ HERE 1+ ;


We will need also to be able to parse numbers (decimal for now):
 

: 0-ASCII 48 ;
: 9-ASCII 57 ;
: ?DIGIT ( c -- c ?)
DUP 0-ASCII >= OVER 9-ASCII <= AND ;
: ?NUMBER 0 TRUE TOKENCHARS DO I @ ?DIGIT SWAP >R AND SWAP 10 * R> + 0-ASCII - SWAP LOOP DUP NOT IF SWAP DROP THEN ;


Finally, the outer interpreter mimicking what our F#
repl above was doing:
 

: OUTER WORD FIND ?FOUND IF
    DUP ISIMMEDIATE ISINTERACTIVE OR
    IF EXEC ELSE CFA , THEN
  ELSE
    DROP ?NUMBER IF
      ISINTERACTIVE NOT IF LITADDR , , THEN
    ELSE
      63 ECHO SP ECHO \ ?
    THEN
  THEN
  OUTER ; \ Recurse forever!


We startup TransForth, type
OUTER and voila! Notice of course that OUTER is recursive and never returns (ever!). Once kicked off, we never come back to the F# version of the outer interpreter! It’s no longer needed at all.

Wait Just A Minute?!


We have a bit of a “chicken and egg” situation on our hands. We have successfully rewritten the interpreter, but we cannot just go and delete the F# version now because then what would compile our replacement?
 

Some Forths solve the delima at this point by saving an “image” of the compiled dictionary and thereafter bootstrapping things by loading that. Once an image capable of compiling itself from source has been produced (an egg), we have a “meta-compiler” and can indeed nuke the F# version (the chicken). This is a very surreal moment to behold. We will have to ponder our options.
 

I’ve been spending some time with RetroForth recently (and made an F# version of the VM for it by the way). Retro indeed chooses the meta-circular path to enlightenment and it works out pretty nicely. Charles recently blogged about it and gave the minimal source from which a minimal image may be built. From this tiny image the full source can be compiled, producing a full image.
 

Another interesting study is the Factor bootstrap process. It too is an image-based system, but Slava goes to quite some effort to maintain the ability to rebuild a clean image from source with an absolutely minimal seed image and various stages of bootstrapping.

But Can We Start From Nothing?


Like building a chicken from stem cells and raw DNA or something, imagine you are given some new processor with some never-before-seen instruction set. No Forth exists for it. No C compiler either. Certainly no CLR or F#. How on Earth would you go about bootstrapping a Forth?
 

Assuming you at least had an assembler (or not – you can stomach straight machine code right?), you might set about handcrafting minimal inner and outer interpreters, then bootstrap a full Forth from there. JonesForth is a very interesting read. It is a literate style development of just this sort. The source code reads like a book. First he builds the inner and outer interpreters in assembly, then he proceeds to define the rest in Forth.
 

A more Forth-oriented approach is to write a cross compiler on a host system in an existing Forth. To stretch the analogy, this is like getting a chicken to lay ostrich eggs; bootstrapping a new species from an existing one. Assemblers within Forth are very nice actually; much more powerful than any macro assembler on the planet.
 

For fun and learning I think I’ll explore cross compiling next. I’ve always loved the “Universal Machine” from the 2006 ICFP Programming Contest and Luke Hoban has a beautiful F# implementation ready to go. The current TransForth will be the host and the UM-32 will be the target. This should be fun!

Next>