This is the beginning of a series:

When I heard that Chuck Moore was speaking at Strange Loop this year I just had to go. His talk yesterday was excellent! I'm not going to summarize it here. The talk will be online soon enough. Here I just want to make a few observations about his contribution and begin a series exploring the F18 and GA144 architecture.

[19 DEC 2013 Update: His talk is up. Go watch it! http://www.infoq.com/presentations/power-144-chip]

His career has been amazing. He invented Forth in 1968, then went on to invent stack machine hardware in the mid-80s while relentlessly improving both for the past 45 years. His singular focus has resulted in highly perfected software and hardware, perfectly suited to one another. His latest creation (and the subject of his presentation) is a 144-core, ultra low-power chip sold by GreenArrays programmed in Forth (colorForth, polyForth, ...).

I got a chance to shake his hand and ask him about his plans for etherForth. Sadly, he said that it's essentially his personal project and he doesn't want to support it. GreenArrays has wonderful, well-supported tooling (running on a PC), but I wanted to play with a complete system hosted on the chip itself, and I wanted to see his code to study. He has blogged about bits and pieces of it, but I wanted to see the editor and compiler and complete video driver. Oh well, I'll have have fun trying to do it myself! Won't be freaking easy though...

I first got a chance to use the GreenArrays GA144 a year or so ago. It's a wonderful chip! I think of it as essentially 144 tiny Forth inner interpreters in hardware. The easiest way to get started is to buy the eval board from GreenArrays. The arrayForth environment is quite unconventional and mind bending, but so much fun to learn and the documentation is excellent. If you can't afford the ($450) eval board (worth every penny), then you can cobble together a minimal setup for about $60. Or for the bargain price of $0, you can follow this series of blog posts and at least get to play with a software simulation I'll be putting together. It's quite a challenge to learn and you have to be willing to wrap your head around a lot of unconventional choices; suspend disbelief and eventually realize the beauty of colorForth and the hardware architecture. It's a mind blowing, fun journey.

Honestly, my journey was to first use the eval board for something at work and under deadline pressure. It was frustrating, slow going and we ended up using some standard MCU instead. Later I came back to it as a hobby project and with the attitude that I should just take my time and work through it bit by bit. Chuck Moore is absolutely brilliant and he has poured his immense intelligence into this one, relatively small focus for longer than I've been alive! There isn't much written about his work. The only way to appreciate it is to dig in and look at the colorForth source and his hardware designs. I decided that I should try to wrap my head around his point of view as much as possible and forget whatever conventions and preconceptions I have. It's been great fun and still going!

Nothing Conventional

In his talk, Chuck Moore was wearing his FiveFinger shoes on stage. That cracked me up. Absolutely no convention is followed blindly by this guy. He says casually during his talk, "I use 6-bit bytes, so..." No conventional 8-bit bytes for him. It does make more sense given the 18-bit words on the F18, but again, who doesn't go with power-of-two-sized words? Other aspects of his design are so out there: the F18 is completely asynchronous (no clock) and literally does nothing while blocking on a port read/write. It allows executing instructions streaming in over a port directly (without loading into memory). Multiple instructions are packed in each word allowing fewer fetches and interesting things like micronext (unext) which creates tiny (3 instruction at most) loops with zero fetches. The list goes on...

Even within the Forth community he's an outlier. His colorForth is hardly recognizable to someone familiar with ANS Forth. I'll do another post on colorForth but some of the beautiful strangeness, aside from the obvious tagged words (colors), includes allowing multiple entry and exit points in a definition and allowing definitions to "fall through" to each other. It's common to return early within an if/then rather than using else. Also using recursion (no "smudge" bit) as a primary looping construct (deprecating do/loop, begin/until, etc. and using for/next less often). The if word leaves the predicate on the stack, meaning that you drop it manually when necessary but never have to dup to preserve. The effortless (and syntaxless) switching between green (compiled) and yellow (immediate) words encourages moving as much as possible to compile-time. The idioms are completely different to regular Forth.

He is endlessly willing to rethink things from a truly clean slate. It's astonishing how simple things become when you're willing to do that and design something bottom up with that perspective; the "bottom" being raw silicon. He said in his talk that the colorForth compiler is "a dozen or so lines of code." This is shocking to most people. It's because tokenization is done at edit-time and there is almost a one-to-one correspondence between the primitive words in the language and instructions on the chip. The compiler is left with not much to do other than maintain the dictionary and do very straightforward instruction and call/jump packing. This "brutal simplicity" is possible because every aspect has been rethought and carefully orchestrated to work perfectly together. Every convention is open to change.

The F18 Architecture

The GA144 is an array of 144 tiny computers. Each computer is an instance of the F18 architecture. Each can be thought of as a Forth inner interpreter in hardware. Separate data and return stacks, zero-operand instructions working with values on the stacks. Most of the instructions have a very direct mapping to common Forth idioms and words (e.g. next/unext loops with induction counter).

As always, there are some very strange aspects of the architecture: 18-bit words, 5-bit instructions packed four per word with the last (3-bit) slot having assumed trailing zeros; meaning only a particular subset of eight instructions can go in this slot. Destination address length depends on the slots used and replace the low bits in the program counter; essentially forming "pages" within which you can jump. Extended arithmetic bit on the program counter changing the "mode" of execution; going to the return stack allowing nested mode switches. An asynchronous ALU that needs time to settle; necessitating nop instructions before certain operations. The stacks are circular. The list goes on...

There are just 32 instructions. You can see them listed on the slide shown during the talk above. He said something like, "If you haven't already, then just take another five minutes and you'll have them memorized." It is a simple machine. The best way to learn the instruction set is to just read the doc. It's a nice concise doc (as are all the GreenArrays docs). You can also go through the "F18A Architecture and Instruction Set" portion of the arrayForth Institute course. It's free and can be done in an afternoon.

I don't think I've ever made a blog post without code and it looks like I'm about to. So okay, here's some code; an F18A simulator in F# (https://github.com/AshleyF/Color). I want to start off by playing with just the instruction set. I think that some of the weirdness and constraints of the actual hardware distracts from learning just what can be done with the instruction set. The below is essentially an F18A, but with a large amount of RAM (8K words rather than 64), no ROM, byte-sized slots (regular 8-bit bytes I mean!) in 32-bit words without a truncated last slot, and a simple I/O scheme making it easy to interface with the console and file system.

We'll make use of this in future posts once we build an editor and assembler. Feel free to hand-poke bytes for now! :)

  1. module Machine
  2.  
  3. type Machine (input: (unit -> int) array, output: (int -> unit) array) =
  4.     let high 0x8000
  5.     let mutable p, i, slot, t, s, si, r, ri, a, high, 0, 4, 0, 0, 0, 0, 0, 0, high
  6.  
  7.     let stk, rtn Array.zeroCreate 8, Array.zeroCreate 8
  8.     let move = (d) &&& 0b111
  9.     let push r rtn.[ri] <- r; ri <- move ri 1; <- v
  10.     let popr () ri <- move ri 1; let in <- rtn.[ri]; x
  11.     let pushs stk.[si] <- s; si <- move si 1; <- t; <- v
  12.     let pops () let in <- s; si <- move si 1; <- stk.[si]; x
  13.  
  14.     let ram Array.zeroCreate 8192
  15.     let mem &&& high <> high
  16.     let get if mem then ram.[x] else input.[^^^ high] ()
  17.     let set if mem then ram.[x] <- else output.[^^^ high] (v)
  18.     let fetchm get |> int |> pushs
  19.     let storem pops () |> set x
  20.  
  21.     let unary <- t
  22.     let binary <- pops () |> ft
  23.     let flip a
  24.     let incp () let p' in (if mem then <- 1); p'
  25.     let inca () let a' in (if mem then <- 1); a'
  26.  
  27.     let step () =
  28.         let fetch () <- incp () |> get; slot <- 0
  29.         let decode () slot <- slot 1; (>>> (32 slot 8)) &&& 0xff
  30.         let slotzero () slot <- 0
  31.         let setp <- x; slotzero (); fetch ()
  32.         let jump () let = (0xffffff >>> (slot 1) *8) in (&&& (~~~m)) ||| (&&& m) |> setp
  33.         if slot then fetch ()
  34.         let ex () let in <- p; setp x
  35.         let next if then popr () |> ignore; f () else <- 1; g ()
  36.         let multiply () =
  37.             if (&&& 1) then <- s
  38.             let &&& in <- >>> 1; <- ((uint32 a) >>> 1) ||| ((uint32 x) <<< 31) |> int
  39.         let cond if then jump () else fetch ()
  40.         match decode () with
  41.             | 0x00 -> <- popr (); fetch () // ; (return)
  42.             | 0x01 -> ex ()                  // ex (execute)
  43.             | 0x02 -> jump ()                // name ; (jump)
  44.             | 0x03 -> pushr p; jump ()       // name (call)
  45.             | 0x04 -> next id slotzero       // unext (micronext)
  46.             | 0x05 -> next fetch jump        // next
  47.             | 0x06 -> cond ((=) 0)           // if
  48.             | 0x07 -> cond ((<=) 0)          // -if (minus-if)
  49.             | 0x08 -> incp () |> fetchm      // @p (fetch-P)
  50.             | 0x09 -> inca () |> fetchm      // @+ (fetch-plus)
  51.             | 0x0a -> fetchm b               // @b (fetch-B)
  52.             | 0x0b -> fetchm a               // @ (fetch)
  53.             | 0x0c -> incp () |> storem      // !p (store-P)
  54.             | 0x0d -> inca () |> storem      // !+ (store-plus)
  55.             | 0x0e -> storem b               // !b (store-B)
  56.             | 0x0f -> storem a               // ! (store)
  57.             | 0x10 -> multiply ()            // +* (multiply-step)
  58.             | 0x11 -> unary (flip (<<<) 1)   // 2* (two-star)
  59.             | 0x12 -> unary (flip (>>>) 1)   // 2/ (two-slash)
  60.             | 0x13 -> unary (~~~)            // - (not)
  61.             | 0x14 -> binary (+)             // + (plus)
  62.             | 0x15 -> binary (&&&)           // and
  63.             | 0x16 -> binary (^^^)           // or (exclusive or)
  64.             | 0x17 -> pops () |> ignore      // drop
  65.             | 0x18 -> pushs t                // dup
  66.             | 0x19 -> popr () |> pushs       // pop
  67.             | 0x1a -> pushs s                // over
  68.             | 0x1b -> pushs a                // a
  69.             | 0x1c -> ()                     // . (nop)
  70.             | 0x1d -> pops () |> pushr       // push
  71.             | 0x1e -> <- pops ()           // b! (B-store)
  72.             | 0x1f -> <- pops ()           // a! (A-store)
  73.  
  74.     member x.Run() try while true do step () with _ -> () 

The I/O scheme is a pair of int -> unit and unit -> int functions. Arrays of these are passed to the Machine constructor. The fetch and store instructions with an address (in A, B or P) having the high bit set become indexes into this array of I/O functions (see get and set above). Here are some for console and disk access. Notice that you can set the cursor position and console colors (useful for displaying colorForth source eventually, BTW!) via a simple protocol. The file system is similar to standard Forth in which numbered blocks are read and written. An input, output and "block select" port are given.

  1. let consoleInput () Console.ReadKey(true).Key |> int
  2.  
  3. let consoleOutput =
  4.     match >>> 24 with
  5.     | -> char |> Console.Write
  6.     | -> Console.SetCursorPosition(&&& 0xff, (>>> 8) &&& 0xff)
  7.     | -> Console.ForegroundColor <- enum (&&& 0xf)
  8.     | -> Console.BackgroundColor <- enum (&&& 0xf)
  9.  
  10. let blockIO =
  11.     let block File.Open(sprintf @"..\..\..\ColorSource\%i.blk" b, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.Read)
  12.     let file block 0
  13.     let reader ref (new BinaryReader(file))
  14.     let writer ref (new BinaryWriter(file))
  15.     let select =
  16.         (!reader).Close(); (!writer).Close()
  17.         let file block i
  18.         reader := new BinaryReader(file)
  19.         writer := new BinaryWriter(file)
  20.     let input () = (!reader).ReadInt32()
  21.     let output (int32) (!writer).Write(bytes); (!writer).Flush()
  22.     select, input, output
  23.  
  24. let blockSelect, blockInput, blockOutput blockIO

Finally, a machine is constructed with I/O devices.

  1. let machine new Machine([|blockInput; consoleInput|], [|consoleOutput; blockSelect; blockOutput|])
  2. machine.Run()
  3.  
  4. Console.ReadLine() |> ignore

Notice that in the Machine above, the P and B registers are initialized to the first I/O ports (while A is zero). This means that the first thing it does when run is read instructions from the first input port (the 0.blk file), so this is where boot code should be. Also, B is commonly used for output and is conveniently initialized to consoleOutput.

Next: Programming the F18