Updated 2001 June

X18 Microcomputer core

High-performance, low-power Forth engine, optimized for compute-bound portable applications. Its 18-bit address/data width matches cache SRAM.

Features

Architecture

The X18 is an evolution of the F21 and i21 microprocessors. With 0.18 µm transistors, it has 5x their speed and 1/5 their power. It retains their 16-deep Return and Data stacks and 27 0-operand instructions, packed 3 per word. A 100 ms watchdog timer ensures continued operation. It boots from on-chip ROM.

It has been redesigned with new layout and simulation tools for robustness and minimum power. The computer can be throttled by a factor of 1024, giving 2.4 Mips at 20 µW. It may be stopped altogether, but must then reboot.
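The throttle figures can be checked with a little arithmetic. This sketch treats 20 µW as the total power at the throttled rate, which is our reading of the text rather than a stated measurement. The quoted 2.4 Mips is 2400/1024, about 2.34 Mips, rounded.

```python
# Rough arithmetic for the throttled mode (assumes 20 uW is total power).
FULL_SPEED_MIPS = 2400       # sustained rate quoted for the X18
THROTTLE_FACTOR = 1024       # maximum throttle factor
THROTTLED_POWER_W = 20e-6    # 20 uW quoted at the throttled rate

throttled_mips = FULL_SPEED_MIPS / THROTTLE_FACTOR

# Energy per instruction = power / (instructions per second).
energy_per_instr_j = THROTTLED_POWER_W / (throttled_mips * 1e6)

print(f"throttled rate: {throttled_mips:.2f} Mips")                   # about 2.34
print(f"energy per instruction: {energy_per_instr_j * 1e12:.1f} pJ")  # about 8.5
```

Under that assumption each instruction costs roughly 8.5 pJ.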

Multiply (125 Mops) and divide (40 Mops) have been improved. Internal memory is fast enough (1 ns) to sustain 2400 Mips. Data access, especially to external SRAM, will slow this. Code is loaded into on-chip DRAM for execution.
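The quoted rates hint at how multiply works: at 2400 Mips, 125 Mops leaves about 19 instruction times per multiply, roughly one step per bit of an 18-bit multiplier. That is our inference, not a figure from the text. The sketch below is a generic unsigned shift-and-add multiply with one conditional add per multiplier bit, in the spirit of the `+*` multiply step in the instruction table. The register roles (`hi` for T, the multiplicand for S) are assumptions, not the documented X18 instruction sequence.

```python
WORD_BITS = 18
MASK = (1 << WORD_BITS) - 1


def shift_add_multiply(multiplicand, multiplier):
    """Unsigned 18 x 18 -> 36-bit product, one conditional add per bit."""
    m = multiplicand & MASK
    hi = 0                    # high half of the running product
    lo = multiplier & MASK    # low half; bit 0 selects the conditional add
    for _ in range(WORD_BITS):
        if lo & 1:
            hi += m           # conditional add; may carry into bit 18
        # Shift the combined (hi:lo) pair right by one bit.
        lo = (lo >> 1) | ((hi & 1) << (WORD_BITS - 1))
        hi >>= 1
    return (hi << WORD_BITS) | lo
```

Eighteen iterations consume every multiplier bit; the product ends up split across `hi` and `lo`, much as a hardware multiply leaves a double-width result in two registers.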

CPU

Forth code is highly factored into many small subroutines. An optimized processor requires an efficient call/return mechanism. This is best achieved with 2 push-down stacks. Each is implemented as a register feeding a 16x18-bit RAM with 8-transistor bit cells. The current entry is indicated by a 16-bit bidirectional, circular shift register.
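A software model can make the register-plus-RAM arrangement concrete. In this sketch a one-hot pointer rotates left on push and right on pop, standing in for the bidirectional circular shift register. What happens on overflow (the oldest entry is silently overwritten) and underflow (stale data is returned) is our assumption, as are all names used here.

```python
WORD_MASK = (1 << 18) - 1   # 18-bit data words
DEPTH = 16                  # entries in the RAM behind the top register
RING_MASK = (1 << DEPTH) - 1


class CircularStack:
    """A top-of-stack register backed by a 16-entry circular RAM."""

    def __init__(self):
        self.top = 0               # the register feeding the RAM
        self.ram = [0] * DEPTH     # 16 x 18-bit entries
        self.pointer = 1           # one-hot: bit i set -> entry i is current

    def _current(self):
        return self.pointer.bit_length() - 1

    def push(self, value):
        # Rotate the one-hot pointer left, then spill the old top into RAM.
        self.pointer = ((self.pointer << 1) | (self.pointer >> (DEPTH - 1))) & RING_MASK
        self.ram[self._current()] = self.top
        self.top = value & WORD_MASK

    def pop(self):
        # Return the top, refill it from RAM, then rotate the pointer right.
        value = self.top
        self.top = self.ram[self._current()]
        self.pointer = (self.pointer >> 1) | ((self.pointer & 1) << (DEPTH - 1))
        return value
```

Pushing 1 through 17 and then popping 17 times returns 17 down to 1: the register plus 16 RAM entries hold 17 items, and the initial 0 is the one overwritten. No address arithmetic is needed, only a rotate, which is why a shift register suits the hardware.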

One stack is used to store subroutine return addresses. All processors have such a stack. The other is used to pass parameters to and from subroutines. Other processors use registers or stack frames for this purpose. However, all languages use an implicit stack to evaluate expressions. Forth makes it explicit.

As if to emphasize their importance, the stacks take 2/3 of the CPU silicon area. Their 1-cycle access timing is difficult to achieve.

The merits of stack vs. register designs have been argued for decades. A comprehensive book, Stack Computers, by Phil Koopman has been published online. To quote Sec 6.2: "0-operand stack addressing ... makes stack machines superior to conventional machines in the areas of program size, processor complexity, system complexity, processor performance, and consistency of program execution."

The Forth ALU operates on the top 1 or 2 items of the parameter stack and leaves the result there. This permits 0-operand instructions. Eliminating register addresses permits shorter instructions, in this case 5 bits. Several instructions are needed to rearrange the stack, and it is convenient to be able to move items to the return stack.

An address register is useful to reduce stack manipulation. It also supports incrementing to address successive words in memory. Similar use of the top of the return stack provides 2 addresses for memory-memory moves.
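Here is a memory-to-memory move using the two addresses. In this sketch A is the source pointer, read with `@+` (which increments A), and R is the destination pointer, written with `!r`. The text does not say whether `!r` increments R, so the sketch advances R explicitly; on the chip that may take extra instructions. The function name and signature are hypothetical.

```python
MASK = (1 << 18) - 1


def block_move(memory, src, dst, count):
    """Copy `count` words from memory[src:] to memory[dst:].

    Returns the final (A, R) values. Incrementing R is an assumption.
    """
    a = src   # address register A
    r = dst   # top of return stack R, used as a second address
    for _ in range(count):
        t = memory[a]           # @+ : fetch from address in A...
        a = (a + 1) & MASK      # ...and increment A
        memory[r] = t           # !r : store into address in R
        r = (r + 1) & MASK      # hypothetical explicit increment of R
    return a, r
```

Only T holds the word in transit, so the rest of the parameter stack is left undisturbed during the copy.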

A demultiplexer allows up to 3 instructions to be packed into each word. This increases the density of compiled code and reduces interference between instruction and data memory accesses. It also keeps the CPU busy while the next instruction word is being fetched, providing a sustained execution speed of 2400 Mips.
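Here is a sketch of the packing. Slot 0 occupies the high-order 5 bits (bits 17-13), as the text states. That slots 1 and 2 follow at bits 12-8 and 7-3, with the low 3 bits unused, is our assumption. Unused slots are padded with `nop` (code 1e) so that code 0 (`word ;`, a jump) is never executed by accident.

```python
SLOT_SHIFTS = (13, 8, 3)   # assumed bit offsets of slots 0, 1, 2 in an 18-bit word
SLOT_MASK = 0x1F           # 5-bit opcode
NOP = 0x1E                 # 'nop' from the instruction table, used as padding


def pack(ops):
    """Pack up to three 5-bit opcodes into one 18-bit word, slot 0 first."""
    if not 1 <= len(ops) <= 3:
        raise ValueError("a word holds 1 to 3 instructions")
    ops = list(ops) + [NOP] * (3 - len(ops))
    word = 0
    for op, shift in zip(ops, SLOT_SHIFTS):
        if not 0 <= op <= SLOT_MASK:
            raise ValueError(f"opcode {op:#x} does not fit in 5 bits")
        word |= op << shift
    return word


def unpack(word):
    """Recover the three opcodes from a packed word, slot 0 first."""
    return [(word >> shift) & SLOT_MASK for shift in SLOT_SHIFTS]
```

For example, `dup + ;` (codes 1a, 17, 6) packs into the single word 0x35730, so one memory access delivers three instructions.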

This is implemented by a 3-bit shift register. The current bit gates its slot into the instruction latch. A ready pulse from the memory manager latches the high-order 5 bits (slot 0). The pulse is delayed by a string of 14 inverters so that it repeats about 0.4 ns later, latching the next slot. Slot 2 stops the process, as does a jump or fetch/store; execution resumes at the next ready pulse.
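The slot sequencing can be modelled in software. This sketch walks a one-hot 3-bit enable through the slots of one word and stops after slot 2, or earlier on a jump or memory access. Which opcodes count as "a jump or fetch/store" is our reading of the instruction table (codes 0-6 and 8-f), not a documented list, and timing is ignored.

```python
# Assumed from the instruction table: control transfers (0, 1, 2, 3, 6)
# and memory fetches/stores (8, 9, a, b, c, d, f) end the word early.
STOPPING_OPS = {0x0, 0x1, 0x2, 0x3, 0x6, 0x8, 0x9, 0xA, 0xB, 0xC, 0xD, 0xF}


def run_word(slots):
    """Return the opcodes latched from one word's three slots, in order."""
    executed = []
    enable = 0b100          # one-hot: the ready pulse enables slot 0
    slot = 0
    while enable:
        op = slots[slot]
        executed.append(op)             # enabled slot gated into the latch
        if op in STOPPING_OPS:
            break                       # wait for the next ready pulse
        enable >>= 1                    # delayed pulse enables the next slot
        slot += 1
    return executed
```

A word of three ALU or stack operations runs all three slots. A fetch in slot 1 ends the word there, leaving slot 2 unexecuted.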

There are 27 simple instructions, exactly suited to Forth. This allows 1-1 compilation of Forth source to machine code. On other processors, each Forth primitive requires several instructions. The situation is reversed for other languages: several Forth instructions may be required for their primitives.

Reg  Meaning
T    Top of stack
S    Second number on stack
R    Top of Return stack
A    Address

Remember that fetches push the stack, while stores and binary operations pop it.
Code (hex)  Op      Action
0           word ;  Jump to subroutine; tail recursion
1           if      Jump to 'then' if T0-T17 are zero
2           word    Call subroutine
3           -if     Jump to 'then' if T17 is one
6           ;       Return
8           @r      Fetch from address in R
9           @+      Fetch from address in A; increment A
a           n       Fetch literal
b           @       Fetch from address in A
c           !r      Store into address in R
d           !+      Store into address in A; increment A
f           !       Store into address in A
10          -       Ones-complement T
11          2*      Shift T left 1 bit
12          2/      Shift T right 1 bit; preserve T17
13          +*      Add S to T if T0=1 (multiply step)
14          or      Exclusive-or S to T
15          and     And S to T
17          +       Add S to T
18          pop     Fetch R
19          a       Fetch A
1a          dup     Duplicate T
1b          over    Fetch S
1c          push    Store into R
1d          a!      Store into A
1e          nop     Do nothing
1f          drop    Store T nowhere
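A minimal Python model of the non-branching instructions makes the table concrete. Several simplifications are ours, not the chip's documented behaviour. Control flow (codes 0-6) is omitted. Literals are supplied inline as `("n", value)` rather than fetched from the following word. `+*` leaves S in place. Carries beyond 18 bits are discarded. The class and method names are hypothetical.

```python
MASK = (1 << 18) - 1   # 18-bit words
SIGN = 1 << 17         # T17


class X18Model:
    """data[-1] is T, data[-2] is S; rstack[-1] is R."""

    def __init__(self, memory_size=64):
        self.data = []
        self.rstack = []
        self.a = 0
        self.memory = [0] * memory_size

    def _push(self, value):
        self.data.append(value & MASK)

    def step(self, op, literal=0):
        d = self.data
        if op == "n":                      # fetch literal (pushes)
            self._push(literal)
        elif op == "@":                    # fetch from address in A
            self._push(self.memory[self.a])
        elif op == "@+":                   # fetch, then increment A
            self._push(self.memory[self.a])
            self.a = (self.a + 1) & MASK
        elif op == "@r":                   # fetch from address in R
            self._push(self.memory[self.rstack[-1]])
        elif op == "!":                    # store into address in A (pops)
            self.memory[self.a] = d.pop()
        elif op == "!+":                   # store, then increment A
            self.memory[self.a] = d.pop()
            self.a = (self.a + 1) & MASK
        elif op == "!r":                   # store into address in R
            self.memory[self.rstack[-1]] = d.pop()
        elif op == "-":                    # ones-complement T
            d[-1] = ~d[-1] & MASK
        elif op == "2*":
            d[-1] = (d[-1] << 1) & MASK
        elif op == "2/":                   # shift right, preserving T17
            d[-1] = (d[-1] >> 1) | (d[-1] & SIGN)
        elif op == "+*":                   # add S to T if T0 = 1; S is kept
            if d[-1] & 1:
                d[-1] = (d[-1] + d[-2]) & MASK
        elif op in ("or", "and", "+"):     # binary operations pop the stack
            t = d.pop()
            s = d.pop()
            if op == "or":
                result = s ^ t             # 'or' is exclusive-or here
            elif op == "and":
                result = s & t
            else:
                result = s + t
            self._push(result)
        elif op == "pop":                  # move R to T
            self._push(self.rstack.pop())
        elif op == "a":
            self._push(self.a)
        elif op == "dup":
            self._push(d[-1])
        elif op == "over":
            self._push(d[-2])
        elif op == "push":                 # move T to R
            self.rstack.append(d.pop())
        elif op == "a!":
            self.a = d.pop()
        elif op == "drop":
            d.pop()
        elif op != "nop":
            raise ValueError(f"unsupported op {op!r}")

    def run(self, program):
        for item in program:
            if isinstance(item, tuple):
                self.step(*item)
            else:
                self.step(item)
        return self.data
```

For instance, `X18Model().run([("n", 3), ("n", 4), "+", ("n", 5), "or"])` leaves `[2]`, since (3 + 4) xor 5 is 2. Each Forth primitive maps to exactly one entry, which is the 1-1 compilation described above.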

Another advantage of the 5-bit instruction is ease of decoding. A tree of NAND and NOR gates leads from the instruction bus to the enable for each register. This is made easier by the small number of lines to route: only 10, each instruction bit and its complement.
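Why 10 lines suffice: a decoder for any one opcode needs, for each of the 5 bits, either that bit or its complement, so it taps 5 of the 10 lines. This sketch decodes per opcode rather than per register enable, which is a simplification. Each 5-input AND is built as a NOR of two NANDs, one plausible NAND/NOR arrangement and not the actual X18 netlist. All names are hypothetical.

```python
BITS = 5


def nand(*xs):
    return int(not all(xs))


def nor(*xs):
    return int(not any(xs))


def bus_lines(instr):
    """The 10 routed lines: each instruction bit and its complement."""
    lines = {}
    for i in range(BITS):
        bit = (instr >> i) & 1
        lines[f"b{i}"] = bit
        lines[f"b{i}_n"] = 1 - bit
    return lines


def taps(opcode):
    """The 5 lines that must all be high for `opcode`."""
    return [f"b{i}" if (opcode >> i) & 1 else f"b{i}_n" for i in range(BITS)]


def decode(instr, opcode):
    """1 if `instr` equals `opcode`; NOR of two NANDs is a 5-input AND."""
    lines = bus_lines(instr)
    x = [lines[name] for name in taps(opcode)]
    return nor(nand(x[0], x[1]), nand(x[2], x[3], x[4]))
```

By De Morgan's law the NOR output is high only when both NANDs are low, that is, when all five tapped lines are high, so exactly one decoder fires for each instruction.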