[The fifth in a series of posts on the evolution of TransForth]

 

Sadly (or happily), we’ve come to a point at which we need to begin thinking like an assembly programmer in order to appreciate the mechanics of Forth’s control flow words. We’ll start by implementing IFELSETHEN in this post and we’ll continue with DOLOOP, BEGINUNTIL, and such later.

 

I apologize ahead of time for the amount of twiddly detail here. I find it hard to make such unnatural, machine-level mechanics intuitive. Hopefully you’ll find it fun to dig into.

 

Memory

 

The load (@) and store (!) words we added in the last post are not really merely for variables. They can address any random memory.

 

let mem = Array.create 16384 0

 

define "@" (fun () -> push mem.[pop ()])

define "!" (fun () -> mem.[pop ()] <- pop ())

 

To a large extent memory will be treated as a heap; allocated incrementally as the dictionary grows and freed as we FORGET words from the dictionary. Very seldom is there a need for randomly poking around in memory. Instead things are done through variable (and later array) allocations and the scope is controlled by the dictionary.

 

let mutable h = 0

let append v = mem.[h] <- v; h <- h + 1

 

define "," (fun () -> pop () |> append)

 

The h pointer tracks the next available memory cell and the comma (,) word lets us append values to the heap. Normally this pointer is a proper Forth variable (usually called H) and the phrase HERE @ is used to get at it. We’ll define the word HERE@ (no space) to do the same:

 

define "HERE@" (fun () -> push h)

 

Dictionary Migration

 

We won’t move the whole dictionary structure to raw memory quite yet but we’ll take the first step. We’ll yank the Value property hack introduced in the last post to support variables which will instead now point to allocated cell of memory.

 

define "VARIABLE" (fun () -> let a = h in make (fun () -> push a); append 0)

 

More interestingly, we’ll move the word definitions to memory. Not the whole WordRecords (those will still be in a dict list), but the Def list of child words from secondary definitions. We’ll represent them instead as a series of plain integer indexes into the dictionary.

 

type WordRecord = {

    Name : string

    Code : unit -> unit

    Def : int // secondary definition address

    Immediate : bool ref } // execute immediately even while compiling

let word n c = { Name = n; Code = c; Def = h; Immediate = ref false }

 

Instead of Def being a list of WordRecords, it’s now a pointer off into memory (the first cell of the definition). During compilation, as we add a chain of words to a definitions we will append dictionary index values to the heap.

 

Inner Interpreter Migration

 

Our inner interpreter was pretty darn simple; just iterating a list of words, executing each Code function:

 

let interpret w () = (index w).Def |> List.iter (fun d -> d.Code ())

It’s going to get a little more complicated now. At the very least now, we need to walk a chain of memory cells and resolve indexes into the dictionary. For reasons that will be clear in a moment, we also want to expose the iteration index as we go.

 

let mutable i = 0

 

let interpret w () =

    let r = i

    i <- (index w).Def

    while mem.[i] <> -1 do

        let d = index mem.[i]

        i <- i + 1

        d.Code()

    i <- r

 

Notice that the interpret function needs to be reentrant because the call to d.Code() may itself actually be interpret (indeed it is on all secondaries). So we tuck away i as a return (r) address and restore it on our way back out. In a real Forth system there is a “return stack” of these. For now we’re cheating and implicitly using the call stack frames.

Notice also above that we use -1 as a terminator since the definitions will all be packed in together in memory. We’ll add a word to append this terminator and change the definition of semicolon (;) to use it:

 

define "EXIT" (fun () -> append -1) // marks end for inner interpreter

 

rep "CREATE ;   COMPILING EXIT INTERACTIVE"

interactive <- true

rep "EXIT IMMEDIATE"

 

Literals

 

One little catch is how on Earth to represent literals now? We used to essentially add anonymous words whose Code was a closure capturing the literal value:

 

let literal num = { Word with Code = fun () -> push num }

 

We have no such convenience now. What we’ll do (the normal Forth way) is this:

 

let lit = dict.Length

define "LIT" (fun () -> push mem.[i]; i <- i + 1)

let literal num = append lit; append num

 

We define a single word (LIT) to handle all literals. The index to this followed by the value itself is appended to the heap whenever we see a literal (in compile mode). We’re now mixing word indexes alongside random other values.

 

The tricky bit is the i <- i + 1 happening in LIT. Remember, we exposed the iteration index from the inner interpreter. This is sort of like a program counter. LIT pushes the literal value to the stack and then manually advances the interpreter to skip over it; nice!

 

Okay, Finally…

 

So we have our in-memory definitions and our exposed inner interpreter index. Now we’re finally ready to build the conditional words.

 

As you’ve probably already imagined, everything is based on manipulating the interpreter index to jump around the program. Don’t worry, we’re not going to add a GOTO word! Forth may have been born in the pre-Dijkstra-enlightened days but Chuck Moore is a smart guy and Forth is a structured language. However, at the very bottom of the layers of abstraction are indeed words for raw branching:

 

define "BRANCH" (fun () -> i <- mem.[i])

define "0BRANCH" (fun () -> if pop() = 0 then i <- mem.[i] else i <- i + 1)

 

They’re not meant to be used directly, but are instead compiled into definitions as the basis of higher-level constructs. Example:

 

IF FOO ELSE BAR THEN BAZ

 

This means, “when true (-1) is on the stack, do FOO, otherwise do BAR and then continue with BAZ in either case.” There can be any number of words between the IFELSETHEN and the ELSE is optional of course. Notice that THEN doesn’t mean what you might be used to in other languages. It’s more like “and THEN continue.”

 

In some RPN style stack languages (e.g. Factor, Joy, Cat), even conditionals are fully RPN. You push a pair of quotations along with a predicate or a boolean value and then execute a conditional word that pops and evaluates one or the other quotation. If you’re used to this then Forth’s syntax feels a bit like switching in and out of RPN mode. You imagine the IF word reading ahead to the ELSE and so on. As we’ll see though, compilation is actually the other way around; the ELSE reaches back to affect the IF. The above ends up compiling to:

 

clip_image001       clip_image003

 

The IF, ELSE and THEN don’t appear in the compiled form. All three are immediate words; executing at compile-time to produce the above. It’s clear from looking at the pretty pictures just how the inner interpreter encountering this list of words will carry out the intended semantics. Let’s walk through the compiling side though.

 

IF will cause a 0BRANCH to be appended. Wait a minute, how do we do that? We need a way, at compile-time of IF, to bake in a literal whose value is the dictionary index of 0BRANCH (or in general, any token):

 

define "'" (fun () -> append lit; append (token() |> find).Value); immediate ()

 

Then we need to, at run-time of IF (which will be compile-time of some word containing an IF – wow, I know, confusing!) append that literal – which we can already do with the comma (,) word from earlier. So with a combination of the two we can do what we want with ' 0BRANCH ,.

 

The 0BRANCH appended by IF is expected to be followed by an address but we don’t know the address at this point. So we’ll just append a zero as a placeholder. Here’s one tricky part. We push to the stack the current HERE@ (which would be 0001) just before doing this. The address will be popped and used later by ELSE or THEN to patch up the jump address. Here we go:

 

: IF   ' 0BRANCH , HERE@ 0 , ; IMMEDIATE

 

Next FOO (and any other words up to the ELSE or THEN) is appended automatically by the outer interpreter.

 

Next ELSE is compiled. First, we’re going to need to prevent the interpreter from just falling through into the “else clause” by appending an unconditional branch over it. Again, we don’t yet know the address we’ll be branching to so we slip in a zero placeholder. Just before doing that though, we push the current HERE@ (which happens to be 0004) for later use by THEN. Remember also that IF had done that for us and we now know where that 0BRANCH needs to go. It needs to be right here (0005); just after the unconditional branch which is about to become the body of the else clause. Sheesh, okay, a lot of talk here about what’s really not too super-complicated of a definition:

 

: ELSE   ' BRANCH , HERE@ 0 , SWAP HERE@ SWAP ! ; IMMEDIATE

 

Now to finish off, THEN just needs to patch up the previous branch (at 0004) from the matching ELSE or IF with the current HERE@ (which is 0006 – just after and else clause).

 

: THEN   HERE@ SWAP ! ; IMMEDIATE

 

More Library

 

We can now add to the library.

: ABS  ( n -- |n|)  DUP 0< IF NEGATE THEN ;

: MIN   2DUP > IF SWAP THEN DROP ;

: MAX   2DUP < IF SWAP THEN DROP ;

 

Tests

 

case "HERE@ : FOO 123 ; FORGET FOO HERE@ = ." "-1 " // forgetting frees heap

case ": FOO IF 1 THEN 2 ; TRUE FOO .S FORGET FOO" "1 2 " // if true

case ": FOO IF 1 THEN 2 ; FALSE FOO .S FORGET FOO" "2 " // if false

case ": FOO IF 1 ELSE 2 THEN 3 ; TRUE FOO .S FORGET FOO" "1 3 " // if then

case ": FOO IF 1 ELSE 2 THEN 3 ; FALSE FOO .S FORGET FOO" "2 3 " // else then

case "7 ABS ." "7 " // absolute value (positive)

case "-7 ABS ." "7 " // absolute value (negative)

case "10 4 MIN ." "4 " // min

case "10 4 MAX ." "10 " // max

case "-10 4 MIN ." "-10 " // min

case "-10 4 MAX ." "4 " // max

 

Next>