[The second in a series of posts on the evolution of TransForth]
In the last post we completed the tokenizer/parser and the REPL along with some baked in primitives. We now want to be able to add to the dictionary of known words from within Forth itself; to build secondary new words that are essentially as much first class citizens as the primitives. This is our means of abstraction.
One of the difficulties in explaining how the machinery of Forth works is that so many of the pieces are intertwined and can’t be explained independently. Forth has an interesting approach that rolls the interactive, the tokenizer/parser and compiler into a single intricately choreographed dance. If you’re used to a cleanly staged pipeline, this will be enlightening.
By the end of the last post out outer interpreter could immediately deal with words being entered at the REPL; invoking them or parsing as a literal and pushing to the stack. This is interactive mode:
· Interactive mode – words are executed and literals are pushed immediately.
Now we’re going to introduce a compiling mode to the system. We will build a mechanism to add new secondary words to the dictionary, giving them names of our choosing. Once added, we then want to switch to compiling mode and supply the definition in the form of a chain of previously defined words. This chain is not invoked immediately as in interactive mode, but is instead appended to the current definition.
· Compiling mode – words and literals are appended to a secondary word’s definition.
Adding to this interplay between interactive and compiling mode, we’ll allow words to be flagged as “immediate words” which are invoked immediately even in compiling mode. With this arrangement we can do some amazing things!
As of last post, primitive words in the dictionary were a simple mapping of a name to a unit -> unit function. Now we’re adding the idea of a secondary word which is defined in terms of other (primitive or secondary) words. The definition of a secondary word then is really just a list of words to be executed (Def below):
type WordRecord = {
Name : string
Code : unit -> unit
Def : WordRecord list // secondary definition
Immediate : bool } // execute immediately even while compiling
let Word = { Name = ""; Code = id; Def = []; Immediate = false} // defaults
This WordRecord will be used to represent primitives as well. In that case the Code will just be the direct implementation and Def will be empty. For secondary words, the Def list will contain zero or more words to be executed in sequence by an “inner interpreter.”
All of the secondary words we define will have the following for Code: let interpret i () = (index i).Def |> List.iter (fun w -> w.Code ())
Given (and capturing) the index to the word in the dictionary, it will walk its Def calling each Code function in turn. Pretty simple, and that’s all the mechanics we need really to interleave primitives with secondary words. To execute either type, just blindly call Code which will either do something directly or will defer to child words (which may in turn defer to others; eventually hitting primitives).
[Note: If the implementation looks a little wonky to you, please bear with me. I’m trying to resemble the mechanics as they will be once we move to assembler. Hopefully this code is clear enough and at the same time the transition later will be smooth-ish.]
let mutable dict = []
let define name code = dict <- { Word with Name = name; Code = code } :: dict
let find name = List.tryFind (fun w -> w.Name = name) dict
let index i = dict.Length - i |> List.nth dict
Our new dictionary will not only support adding new definitions, but will allow re-definition (shadowing previously defined words) and will allow rollback. We’ve changed the dictionary from a Map to a flat list of our new WordRecord type. We’ve also added an index function (used by the inner interpreter) to look up words by ordinal (relative to the tail).
New definitions are simply cons’d onto the list. The find function returns the first one (the last one added). When we want to “forget” a definition, we need to also roll back any words that may have taken a dependency. This is the normal Forth behavior. It’s easily accomplished by blindly reducing the dictionary to the tail following the word we wish to forget:
let forget name =
let rest = dict |> Seq.skipWhile (fun w -> w.Name <> name) |> List.ofSeq
if rest.Length = 0 then failwith "Not found" else dict <- List.tail rest
define "FORGET" (fun () -> token () |> forget)
The Immediate flag on the currently-being-defined word may be set by the following:
let immediate () = dict <- { dict.Head with Immediate = true } :: dict.Tail define "IMMEDIATE" immediate
To control the mode of the system, we’ll add a flag and couple of words. [Note: The normal Forth name for this is literally MODE. We’ll conform to this convention later.]
let mutable interactive = true
define "INTERACTIVE" (fun () -> interactive <- true)
define "COMPILING" (fun () -> interactive <- false)
Next, we need a way of adding new secondary words:
define "CREATE" (fun () -> dict <- {
Word with Name = token (); Code = interpret (dict.Length + 1) } :: dict)
This creates a new empty definition with the name coming from the following token (That is, CREATE FOO will create a word named “FOO”). Notice that this is an instance of a word momentarily taking over parsing; snagging the next token and treating it as a “name.” The Code for secondary words is our inner interpreter, capturing the index of word just created.
The new secondary has an empty definition at this point (which is valid by the way; a NOP). We would like a way of adding to the definition. That’s easy enough:
let append word = // append to current definition
dict <- { dict.Head with Def = dict.Head.Def @ [word] } :: dict.Tail
But how do we expose this to Forth code? Well, all we need to do is to switch into compiling mode and let the outer interpreter do it for us – that’s what it’s meant to do when compiling.
Reviewing the outer interpreter from the last post, it works fine for interactive mode; immediately executing words upon successfully finding them in the dictionary. Words that aren’t in the dictionary are assumed to be literal numbers, are parsed, and are pushed to the stack – again, immediately. Now we need to add support for compiling mode in which words are not executed but are instead appended to the current definition.
Literals also are appended; becoming regular words capturing the value and pushing it to the stack at some later time when Code is invoked:
let literal num = { Word with Code = fun () -> push num }
Let’s not forget our little escape hatch when in compiling mode allowing a word to be marked Immediate <- true in which case it executes immediately regardless of the mode.
Here’s our new and improved outer interpreter:
let rep input =
out.Clear() |> ignore
source <- input
while not (Seq.isEmpty source) do
let word = token ()
if word.Length > 0 then
match find word with
| Some(word) ->
if interactive || word.Immediate
then word.Code() else append word
| None -> // literal?
let number, value = Int32.TryParse word
if number then
if interactive then push value else literal value |> append
else word + "?" |> failwith
Remember the : <name> <definition> ; syntax used to define new words? It’s quite simple now to explain what the colon (:) and semicolon (;) words should do. Colon should CREATE a new word and switch to compiling mode and semicolon simply switches back to interactive mode. That's it! We could define them as:
define ":" (fun () -> <body of CREATE>; interactive <- false )
define ";" (fun () -> interactive <- true); immediate ()
But we’re not going to make it that easy <insert evil smile>. Instead we’re going to start the magical process of letting Forth bootstrap itself!
rep "CREATE ; COMPILING INTERACTIVE"; interactive <- true; rep "IMMEDIATE"
Let’s see what this does. CREATE adds a new word called semicolon (;). Next, COMPILING (immediately) switches to compiling mode, and now that we’re in compiling mode, INTERACTIVE is appended to the definition of semicolon (;) by the outer interpreter. Then we have to get out of Forth-world for a moment and force a switch back out of compiling mode with a little bit of F# code (the need for this hack is exactly why we want this semicolon word!). Finally, in interactive mode now, IMMEDIATE is invoked to make semicolon an immediate word. The final definition of simicolon is nothing more than an immediate word that switches out of compiling mode.
With our handy semicolon in hand, we can define colon (:) purely in Forth:
rep "CREATE : COMPILING CREATE COMPILING ;"
We CREATE a new word called colon (:), switch to COMPILING mode and append (not invoke this time) the CREATE and COMPILING words themselves to colon’s definition. Kind of a meta-circular, tricky little bit of code. We just defined a word using CREATE and COMPILE that itself does nothing more than CREATE COMPILE. Pretty slick!
I always love the point in a language design at which you can begin to build the language within itself! We’ve just crossed over. We’ll add a bunch more in the next post, but here’s some standard Forth words now defined entirely in Forth itself and yet indistinguishable from primitives.
rep "
: SQUARE DUP * ;
: CUBE DUP DUP * * ;
: NEGATE -1 * ;
: -ROT ROT ROT ;
: NIP SWAP DROP ;
: TUCK SWAP OVER ;
"
case "7 NEGATE ." "-7 " // negate positive
case "-7 NEGATE ." "7 " // negate negative
case "5 SQUARE ." "25 " // square
case "5 CUBE ." "125 " // cubed
case "1 2 3 -ROT .S" "3 1 2 " // right rotate
case "1 2 3 NIP .S" "1 3 " // drop 2nd
case "1 2 3 TUCK .S" "1 3 2 3 " // bury to 2nd
Next>