[The second in a series of posts on the evolution of TransForth]

In the last post we completed the tokenizer/parser and the REPL along with some baked in primitives. We now want to be able to add to the dictionary of known words from within Forth itself; to build secondary new words that are essentially as much first class citizens as the primitives. This is our means of abstraction.

One of the difficulties in explaining how the machinery of Forth works is that so many of the pieces are intertwined and can’t be explained independently. Forth has an interesting approach that rolls the interactive, the tokenizer/parser and compiler into a single intricately choreographed dance. If you’re used to a cleanly staged pipeline, this will be enlightening.

By the end of the last post out outer interpreter could immediately deal with words being entered at the REPL; invoking them or parsing as a literal and pushing to the stack. This is interactive mode:


·         Interactive mode – words are executed and literals are pushed immediately.


Now we’re going to introduce a compiling mode to the system. We will build a mechanism to add new secondary words to the dictionary, giving them names of our choosing. Once added, we then want to switch to compiling mode and supply the definition in the form of a chain of previously defined words. This chain is not invoked immediately as in interactive mode, but is instead appended to the current definition.


·         Compiling mode – words and literals are appended to a secondary word’s definition.


Adding to this interplay between interactive and compiling mode, we’ll allow words to be flagged as “immediate words” which are invoked immediately even in compiling mode. With this arrangement we can do some amazing things!


Secondary Words


As of last post, primitive words in the dictionary were a simple mapping of a name to a unit -> unit function. Now we’re adding the idea of a secondary word which is defined in terms of other (primitive or secondary) words. The definition of a secondary word then is really just a list of words to be executed (Def below):


type WordRecord = {

    Name : string

    Code : unit -> unit

    Def : WordRecord list // secondary definition

    Immediate : bool } // execute immediately even while compiling


let Word = { Name = ""; Code = id; Def = []; Immediate = false} // defaults


This WordRecord will be used to represent primitives as well. In that case the Code will just be the direct implementation and Def will be empty. For secondary words, the Def list will contain zero or more words to be executed in sequence by an “inner interpreter.”


Inner Interpreter


All of the secondary words we define will have the following for Code:

let interpret i () = (index i).Def |> List.iter (fun w -> w.Code ())

Given (and capturing) the index to the word in the dictionary, it will walk its Def calling each Code function in turn. Pretty simple, and that’s all the mechanics we need really to interleave primitives with secondary words. To execute either type, just blindly call Code which will either do something directly or will defer to child words (which may in turn defer to others; eventually hitting primitives).


[Note: If the implementation looks a little wonky to you, please bear with me. I’m trying to resemble the mechanics as they will be once we move to assembler. Hopefully this code is clear enough and at the same time the transition later will be smooth-ish.]


Dictionary Revisit


let mutable dict = []

let define name code = dict <- { Word with Name = name; Code = code } :: dict

let find name = List.tryFind (fun w -> w.Name = name) dict

let index i = dict.Length - i |> List.nth dict


Our new dictionary will not only support adding new definitions, but will allow re-definition (shadowing previously defined words) and will allow rollback. We’ve changed the dictionary from a Map to a flat list of our new WordRecord type. We’ve also added an index function (used by the inner interpreter) to look up words by ordinal (relative to the tail).


New definitions are simply cons’d onto the list. The find function returns the first one (the last one added). When we want to “forget” a definition, we need to also roll back any words that may have taken a dependency. This is the normal Forth behavior. It’s easily accomplished by blindly reducing the dictionary to the tail following the word we wish to forget:


let forget name =

    let rest = dict |> Seq.skipWhile (fun w -> w.Name <> name) |> List.ofSeq

    if rest.Length = 0 then failwith "Not found" else dict <- List.tail rest

define "FORGET" (fun () -> token () |> forget)


The Immediate flag on the currently-being-defined word may be set by the following:


let immediate () = dict <- { dict.Head with Immediate = true } :: dict.Tail
"IMMEDIATE" immediate


The Compiler


To control the mode of the system, we’ll add a flag and couple of words. [Note: The normal Forth name for this is literally MODE. We’ll conform to this convention later.]


let mutable interactive = true

define "INTERACTIVE" (fun () -> interactive <- true)

define "COMPILING" (fun () -> interactive <- false)


Next, we need a way of adding new secondary words:


define "CREATE" (fun () -> dict <- {

    Word with Name = token (); Code = interpret (dict.Length + 1) } :: dict)


This creates a new empty definition with the name coming from the following token (That is, CREATE FOO will create a word named “FOO”). Notice that this is an instance of a word momentarily taking over parsing; snagging the next token and treating it as a “name.” The Code for secondary words is our inner interpreter, capturing the index of word just created.


The new secondary has an empty definition at this point (which is valid by the way; a NOP). We would like a way of adding to the definition. That’s easy enough:


let append word = // append to current definition

    dict <- { dict.Head with Def = dict.Head.Def @ [word] } :: dict.Tail


But how do we expose this to Forth code? Well, all we need to do is to switch into compiling mode and let the outer interpreter do it for us – that’s what it’s meant to do when compiling.


Outer Interpreter Revisit


Reviewing the outer interpreter from the last post, it works fine for interactive mode; immediately executing words upon successfully finding them in the dictionary. Words that aren’t in the dictionary are assumed to be literal numbers, are parsed, and are pushed to the stack – again, immediately. Now we need to add support for compiling mode in which words are not executed but are instead appended to the current definition.


Literals also are appended; becoming regular words capturing the value and pushing it to the stack at some later time when Code is invoked:


let literal num = { Word with Code = fun () -> push num }


Let’s not forget our little escape hatch when in compiling mode allowing a word to be marked Immediate <- true in which case it executes immediately regardless of the mode.


Here’s our new and improved outer interpreter:


let rep input =

    out.Clear() |> ignore

    source <- input

    while not (Seq.isEmpty source) do

        let word = token ()

        if word.Length > 0 then

            match find word with

            | Some(word) ->

                if interactive || word.Immediate

                then word.Code() else append word

            | None -> // literal?

                let number, value = Int32.TryParse word

                if number then

                    if interactive then push value else literal value |> append

                else word + "?" |> failwith


Word Defining Words


Remember the : <name> <definition> ; syntax used to define new words? It’s quite simple now to explain what the colon (:) and semicolon (;) words should do. Colon should CREATE a new word and switch to compiling mode and semicolon simply switches back to interactive mode. That's it! We could define them as:


    define ":" (fun () -> <body of CREATE>; interactive <- false )

    define ";" (fun () -> interactive <- true); immediate ()


But we’re not going to make it that easy <insert evil smile>. Instead we’re going to start the magical process of letting Forth bootstrap itself!


rep "CREATE ;   COMPILING INTERACTIVE"; interactive <- true; rep "IMMEDIATE"


Let’s see what this does. CREATE adds a new word called semicolon (;). Next, COMPILING (immediately) switches to compiling mode, and now that we’re in compiling mode, INTERACTIVE is appended to the definition of semicolon (;) by the outer interpreter. Then we have to get out of Forth-world for a moment and force a switch back out of compiling mode with a little bit of F# code (the need for this hack is exactly why we want this semicolon word!). Finally, in interactive mode now, IMMEDIATE is invoked to make semicolon an immediate word. The final definition of simicolon is nothing more than an immediate word that switches out of compiling mode.


With our handy semicolon in hand, we can define colon (:) purely in Forth:




We CREATE a new word called colon (:), switch to COMPILING mode and append (not invoke this time) the CREATE and COMPILING words themselves to colon’s definition. Kind of a meta-circular, tricky little bit of code. We just defined a word using CREATE and COMPILE that itself does nothing more than CREATE COMPILE. Pretty slick!


Language vs. Library


I always love the point in a language design at which you can begin to build the language within itself! We’ve just crossed over. We’ll add a bunch more in the next post, but here’s some standard Forth words now defined entirely in Forth itself and yet indistinguishable from primitives.


rep "

: SQUARE   DUP * ;

: CUBE   DUP DUP * * ;

: NEGATE   -1 * ;

: -ROT   ROT ROT ;







case "7 NEGATE ." "-7 " // negate positive

case "-7 NEGATE ." "7 " // negate negative

case "5 SQUARE ." "25 " // square

case "5 CUBE ." "125 " // cubed

case "1 2 3 -ROT .S" "3 1 2 " // right rotate

case "1 2 3 NIP .S" "1 3 " // drop 2nd

case "1 2 3 TUCK .S" "1 3 2 3 " // bury to 2nd