Redesigned with new layout and simulation tools to be robust and to minimize power. The computer can be throttled by a factor of 1024 to provide 2.4 Mips using 20 uW. It may be stopped altogether, but will have to reboot.
Multiply (125 Mops) and divide (40 Mops) have been improved. Internal memory is fast enough (1 ns) to sustain 2400 Mips. Data access, especially to external SRAM, will slow this. Code is loaded into on-chip DRAM for execution.
One stack is used to store subroutine return addresses. All processors have such a stack. The other is used to pass parameters to and from subroutines. Other processors use registers or stack frames for this purpose. However, all languages use an implicit stack to evaluate expressions. Forth makes it explicit.
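To make the division of labor concrete, here is a minimal sketch of a two-stack interpreter. It is illustrative only: the `run` function, the token names, and the `square` word are hypothetical, and the `*` token is a convenience of the sketch, not a claim about this processor's instruction set.

```python
# Illustrative two-stack interpreter; not the actual instruction set.
def run(program, words):
    """Run a list of tokens; `words` maps subroutine names to token lists."""
    data = []   # parameter stack: arguments and results
    ret = []    # return stack: where to resume after each call
    code, pc = program, 0
    while True:
        if pc == len(code):          # end of the current token list
            if not ret:
                return data          # nothing to return to: finished
            code, pc = ret.pop()     # return to the caller
            continue
        tok = code[pc]
        pc += 1
        if isinstance(tok, int):
            data.append(tok)         # literal: push it
        elif tok == "+":
            b, a = data.pop(), data.pop()
            data.append(a + b)
        elif tok == "*":
            b, a = data.pop(), data.pop()
            data.append(a * b)
        elif tok == "dup":
            data.append(data[-1])
        else:
            ret.append((code, pc))   # call: save the return address
            code, pc = words[tok], 0


WORDS = {"square": ["dup", "*"]}
# (2 + 3) squared, written so that the evaluation stack is explicit.
result = run([2, 3, "+", "square"], WORDS)
print(result)
```

The call to `square` touches only the return stack for control flow; its argument and result travel on the parameter stack, with no registers or stack frame involved.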
As if emphasizing their importance, the stacks require 2/3 of the CPU silicon area. It is difficult to achieve their 1-cycle access timing.
The merits of stack vs. register designs have been argued for decades. A comprehensive book, Stack Computers, by Phil Koopman has been published online. To quote Sec 6.2: "0-operand stack addressing ... makes stack machines superior to conventional machines in the areas of program size, processor complexity, system complexity, processor performance, and consistency of program execution."
The Forth ALU operates on the top 1 or 2 items of the parameter stack, leaving the result there. This permits 0-operand instructions. Eliminating register addresses permits shorter instructions, in this case 5-bit. Several instructions are required to rearrange the stack. And it's convenient to move things to the return stack.
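As a rough illustration of stack rearrangement through the return stack, the sketch below models the two stacks as Python lists and builds a swap of the top two items out of `over`, `push`, `drop`, and `pop`. The `Stacks` class and the particular sequence are assumptions chosen for illustration; it is one possible sequence, not necessarily the one a compiler would emit.

```python
class Stacks:
    """Toy model: T is data[-1], S is data[-2], R is ret[-1]."""

    def __init__(self, *items):
        self.data = list(items)   # parameter stack
        self.ret = []             # return stack

    def over(self):               # fetch S: copy the 2nd item to the top
        self.data.append(self.data[-2])

    def drop(self):               # discard T
        self.data.pop()

    def push(self):               # move T to the return stack
        self.ret.append(self.data.pop())

    def pop(self):                # move R back to the parameter stack
        self.data.append(self.ret.pop())

    def run(self, names):
        for name in names:
            getattr(self, name)()  # no operands: the name is the whole instruction


# Swap the top two items using only 0-operand instructions.
SWAP = ["over", "push", "push", "drop", "pop", "pop"]

m = Stacks(1, 2)
m.run(SWAP)
print(m.data, m.ret)
```

None of the six instructions names a register or an address, which is what allows each of them to fit in a 5-bit code.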
An address register is useful to reduce stack manipulation. It also supports incrementing to address successive words in memory. Similar use of the top of the return stack provides 2 addresses for memory-memory moves.
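A small sketch of these two uses follows. The `Machine` class is hypothetical. Setting `m.a` and appending to `m.ret` directly stand in for the `a!` and `push` instructions. Whether R also increments is not stated here, so the sketch leaves R unchanged after a store, and 18-bit wraparound is ignored for brevity.

```python
class Machine:
    """Toy model of the A register and the top of the return stack as addresses."""

    def __init__(self, memory):
        self.mem = list(memory)
        self.data = []   # parameter stack
        self.ret = []    # return stack; ret[-1] is R
        self.a = 0       # address register A

    def fetch_inc(self):          # @+ : fetch from address in A, increment A
        self.data.append(self.mem[self.a])
        self.a += 1

    def fetch(self):              # @  : fetch from address in A
        self.data.append(self.mem[self.a])

    def store_r(self):            # !r : store into address in R
        self.mem[self.ret[-1]] = self.data.pop()

    def add(self):                # +  : add the top two items
        t = self.data.pop()
        self.data[-1] += t


m = Machine([10, 20, 30, 0])

# Sum three successive words; A steps through memory by itself.
m.a = 0
m.fetch_inc(); m.fetch_inc(); m.add()
m.fetch_inc(); m.add()
total = m.data.pop()

# Memory-to-memory move: source address in A, destination address in R.
m.a = 1
m.ret.append(3)
m.fetch(); m.store_r()
print(total, m.mem)
```

In both cases the addresses stay off the parameter stack, so no stack manipulation is needed between the fetches and the arithmetic.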
A demultiplexor allows the packing of up to 3 instructions per word. This increases the density of compiled code and reduces the interference between instruction and data memory access. It also keeps the CPU busy while the next instruction is being fetched, providing a sustained execution speed of 2400 Mips.
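The packing can be sketched as below. The bit layout is an assumption: the text only says that slot 0 is the high-order 5 bits of the word, so the sketch places slots 1 and 2 directly beneath it and leaves the low 3 bits of the 18-bit word unused. The helper names `pack` and `unpack` are hypothetical, and padding short words with `nop` (code 1e) is a choice made for the sketch.

```python
WORD_BITS = 18
SLOT_SHIFTS = (13, 8, 3)   # assumed bit positions of slots 0, 1, 2
NOP = 0x1E                 # used here to pad unused slots


def pack(ops):
    """Pack 1 to 3 five-bit opcodes into one 18-bit word, slot 0 first."""
    if not 1 <= len(ops) <= 3:
        raise ValueError("a word holds 1 to 3 instructions")
    padded = list(ops) + [NOP] * (3 - len(ops))
    word = 0
    for op, shift in zip(padded, SLOT_SHIFTS):
        if not 0 <= op < 32:
            raise ValueError("opcodes are 5 bits")
        word |= op << shift
    return word


def unpack(word):
    """Recover the three opcodes from a packed word."""
    return [(word >> shift) & 0x1F for shift in SLOT_SHIFTS]


word = pack([0x1A, 0x17])          # dup, +, padded with nop
print(f"{word:018b}", [hex(op) for op in unpack(word)])
```

Three instructions arrive with a single memory access, so instruction fetches compete with data accesses less often than they would with one instruction per word.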
This is implemented by a 3-bit shift register. The current bit enables its slot into the instruction latch. A ready pulse from the memory manager latches the high-order 5 bits (slot 0). The pulse is delayed by a string of 14 inverters so that it repeats 2 ns later, latching the next slot. Slot 2 stops the process, as does a jump or fetch/store, until the next ready pulse.
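The slot sequencing might be modeled roughly as follows. This is a behavioral sketch, not a description of the circuit. It assumes the slot layout used for illustration here (slot 0 in the high-order 5 bits, the others directly below), treats codes 0 to 3 and 6 as jumps and codes 8 to f as fetch/store, and does not model what happens on the following ready pulse. The function name is hypothetical.

```python
SLOT_SHIFTS = (13, 8, 3)                  # assumed bit positions of slots 0, 1, 2
JUMPS = {0x00, 0x01, 0x02, 0x03, 0x06}    # assumed: jump, if, call, -if, return
MEMORY_OPS = set(range(0x08, 0x10))       # fetch/store group, codes 8 through f


def latched_on_ready(word):
    """Return the opcodes latched after one ready pulse, in order."""
    enable = 0b100        # one-hot 3-bit shift register; slot 0 enabled first
    slot = 0
    latched = []
    while enable:
        op = (word >> SLOT_SHIFTS[slot]) & 0x1F
        latched.append(op)
        if op in JUMPS or op in MEMORY_OPS:
            break         # wait for the memory manager's next ready pulse
        enable >>= 1      # delayed pulse: enable the next slot
        slot += 1         # shifting past slot 2 clears the register and stops
    return latched


alu_word = (0x1A << 13) | (0x17 << 8) | (0x1E << 3)   # dup + nop
mem_word = (0x1A << 13) | (0x0B << 8) | (0x17 << 3)   # dup @ +
print(latched_on_ready(alu_word), latched_on_ready(mem_word))
```

In the second word the fetch in slot 1 halts the sequence, so the `+` in slot 2 is not latched on this pulse.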
There are 27 simple instructions, exactly suited to Forth. This allows 1-1 compilation of Forth source to machine code. On other processors, each Forth primitive requires several instructions. The situation is reversed for other languages: several Forth instructions may be required for their primitives.
... | Register |
T | Top of stack |
S | 2nd number on stack |
R | Top of Return stack |
A | Address |
Remember that fetch pushes the stack; store and binary operations pop it.
Code | Op | Action |
0 | word ; | Jump to subroutine; tail recursion |
1 | if | Jump to 'then' if T0-T17 are zero |
2 | word | Call subroutine |
3 | -if | Jump to 'then' if T17 is one |
6 | ; | Return |
8 | @r | Fetch from address in R |
9 | @+ | Fetch from address in A; increment A |
a | n | Fetch literal |
b | @ | Fetch from address in A |
c | !r | Store into address in R |
d | !+ | Store into address in A; increment A |
f | ! | Store into address in A |
10 | - | Ones-complement T |
11 | 2* | Shift T left 1 bit |
12 | 2/ | Shift T right 1 bit; preserve T17 |
13 | +* | Add S to T if T0=1 (multiply step) |
14 | or | Exclusive-or S to T |
15 | and | And S to T |
17 | + | Add S to T |
18 | pop | Fetch R |
19 | a | Fetch A |
1a | dup | Duplicate T |
1b | over | Fetch S |
1c | push | Store into R |
1d | a! | Store into A |
1e | nop | Do nothing |
1f | drop | Store T nowhere |
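A minimal sketch of how some of the ALU and stack codes in the table could be interpreted on 18-bit values is shown below. Only a subset is modeled, the `step` and `run` names are hypothetical, and `+*` is left out because the table does not spell out its full behavior. The parameter stack is a Python list with T at the end.

```python
MASK = 0x3FFFF   # 18-bit words: bits T0..T17
SIGN = 0x20000   # bit T17


def step(stack, op):
    """Apply one opcode from the table to the parameter stack (T is stack[-1])."""
    if op == 0x10:                      # -   ones-complement T
        stack[-1] = ~stack[-1] & MASK
    elif op == 0x11:                    # 2*  shift T left 1 bit
        stack[-1] = (stack[-1] << 1) & MASK
    elif op == 0x12:                    # 2/  shift T right 1 bit; preserve T17
        t = stack[-1]
        stack[-1] = (t >> 1) | (t & SIGN)
    elif op == 0x14:                    # or  exclusive-or S to T
        t = stack.pop()
        stack[-1] ^= t
    elif op == 0x15:                    # and
        t = stack.pop()
        stack[-1] &= t
    elif op == 0x17:                    # +   add S to T
        t = stack.pop()
        stack[-1] = (stack[-1] + t) & MASK
    elif op == 0x1A:                    # dup
        stack.append(stack[-1])
    elif op == 0x1B:                    # over
        stack.append(stack[-2])
    elif op == 0x1E:                    # nop
        pass
    elif op == 0x1F:                    # drop
        stack.pop()
    else:
        raise NotImplementedError(f"opcode {op:#x} is not modeled in this sketch")


def run(stack, ops):
    for op in ops:
        step(stack, op)
    return stack


# Subtract 5 from 7 with no subtract instruction: 7 + ~5 + 1.
print(run([1, 7, 5], [0x10, 0x17, 0x17]))
```

There is no subtract code in the table, so the example forms the difference from the ones-complement and two additions.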
Another advantage of the 5-bit instruction is ease of decoding. A tree of NAND and NOR gates leads from the instruction bus to the enable for each register. This is facilitated by the limit of 10 lines to be routed: each bit and its complement.
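The idea can be sketched logically as below. The sketch uses plain boolean AND rather than the NAND/NOR tree of the real circuit, and the function names are hypothetical. Each enable is a combination of at most 5 of the 10 lines, and a group of codes that share high-order bits needs fewer still.

```python
def lines(op):
    """Return the 10 routed lines for a 5-bit opcode: each bit and its complement."""
    true = [(op >> i) & 1 == 1 for i in range(5)]
    comp = [not b for b in true]
    return true, comp


def enable(op, code):
    """Enable for one specific code: AND of one line per bit position."""
    true, comp = lines(op)
    return all(true[i] if (code >> i) & 1 else comp[i] for i in range(5))


def is_memory_op(op):
    """Codes 8 through f share bit 4 = 0 and bit 3 = 1: only 2 lines are needed."""
    true, comp = lines(op)
    return comp[4] and true[3]


# Exactly one per-code enable is active for any opcode on the bus.
active = [sum(enable(op, code) for code in range(32)) for op in range(32)]
print(active == [1] * 32, [op for op in range(32) if is_memory_op(op)])
```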