CPUs
Three implementations of the same 8-bit processor, each with a different microarchitecture. All three execute the same 12-instruction ISA, share the same 256-byte BLOCK ROM (program) and 256-byte BLOCK RAM (data), and produce identical results — they differ only in how many clock cycles each instruction takes.
Instruction Set
| Opcode | Mnemonic | Operand | Description |
|---|---|---|---|
| 0x00 | NOP | — | No operation |
| 0x01 | LDI | imm8 | Load immediate into accumulator |
| 0x02 | LDA | addr8 | Load accumulator from RAM |
| 0x03 | STA | addr8 | Store accumulator to RAM |
| 0x04 | ADD | addr8 | Add RAM value to accumulator |
| 0x05 | SUB | addr8 | Subtract RAM value from accumulator |
| 0x06 | JMP | addr8 | Unconditional jump |
| 0x07 | JZ | addr8 | Jump if zero flag set |
| 0x08 | LDIN | — | Load button state into accumulator |
| 0x09 | STOU | — | Store accumulator to output register |
| 0x0A | WAI | count8 | Wait N milliseconds |
| 0xFF | HLT | — | Halt execution |
The zero flag is updated by LDI, LDIN, LDA, ADD, and SUB.
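The table and flag rule above can be checked against a small reference interpreter. The following is a minimal Python sketch, not part of the project: `run`, `COUNTDOWN`, and `max_steps` are hypothetical names. It assumes operand-carrying instructions occupy two bytes (as the CPU sources do), 8-bit wraparound arithmetic, and that WAI and undefined opcodes have no architectural effect.

```python
# Opcodes taken from the instruction table.
NOP, LDI, LDA, STA, ADD, SUB = 0x00, 0x01, 0x02, 0x03, 0x04, 0x05
JMP, JZ, LDIN, STOU, WAI, HLT = 0x06, 0x07, 0x08, 0x09, 0x0A, 0xFF
TWO_BYTE = {LDI, LDA, STA, ADD, SUB, JMP, JZ, WAI}  # opcode + operand byte
FLAG_OPS = {LDI, LDA, ADD, SUB, LDIN}               # these update the zero flag


def run(program, btn=0, max_steps=10_000):
    """Execute until HLT (or max_steps); return (acc, zero, outputs, ram)."""
    rom = list(program) + [NOP] * (256 - len(program))
    ram = [0] * 256
    acc, zero, pc, outputs = 0, False, 0, []
    for _ in range(max_steps):
        op, arg = rom[pc], rom[(pc + 1) & 0xFF]
        pc = (pc + (2 if op in TWO_BYTE else 1)) & 0xFF
        if op == HLT:
            break
        if op == JMP or (op == JZ and zero):
            pc = arg
        elif op == STA:
            ram[arg] = acc
        elif op == STOU:
            outputs.append(acc)
        elif op in FLAG_OPS:
            acc = {
                LDI: arg,
                LDA: ram[arg],
                ADD: acc + ram[arg],
                SUB: acc - ram[arg],
                LDIN: btn & 1,
            }[op] & 0xFF                 # 8-bit result, no carry or borrow
            zero = acc == 0
        # NOP, WAI, untaken JZ, undefined opcodes: no effect in this model
    return acc, zero, outputs, ram


# Hypothetical test program: output 3, 2, 1 and halt.
COUNTDOWN = [
    LDI, 1, STA, 0x10,   # 0x00: ram[0x10] = 1
    LDI, 3,              # 0x04: A = 3
    STOU,                # 0x06: emit A
    SUB, 0x10,           # 0x07: A -= 1, updates zero flag
    JZ, 13,              # 0x09: leave the loop when A == 0
    JMP, 6,              # 0x0B: otherwise loop
    HLT,                 # 0x0D
]
```

Calling `run(COUNTDOWN)` yields the output sequence `[3, 2, 1]` with the zero flag set at halt. The hardware's behaviour on undefined opcodes is deliberately not modelled.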
Modules
cpu — Multi-Cycle Baseline
A 4-bit state machine with eight states; each instruction walks through five or six of them:
- FETCH_INS_ADDR — Drive ROM address with PC.
- FETCH_INS_DATA — Latch instruction from ROM, increment PC.
- FETCH_OP_ADDR — Drive ROM address with the incremented PC to request the operand byte.
- FETCH_OP_DATA — Latch operand from ROM (if instruction needs one), increment PC.
- EXEC — Execute: ALU operations, jumps, I/O, or issue RAM read.
- MEM_READ — Wait for RAM latency, then complete LDA/ADD/SUB.
- WAIT / HALT — Millisecond timer loop or infinite halt.
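The walk through the states can be written down as a small lookup. This is a Python sketch derived from reading the state machine in the cpu source further down; `baseline_states` and `MEM_OPS` are hypothetical names, not part of the project.

```python
MEM_OPS = {"LDA", "ADD", "SUB"}  # instructions that read RAM


def baseline_states(mnemonic):
    """States one instruction visits in the multi-cycle baseline, in order."""
    # Every instruction passes through both operand-fetch states,
    # even when it has no operand.
    states = ["FETCH_INS_ADDR", "FETCH_INS_DATA",
              "FETCH_OP_ADDR", "FETCH_OP_DATA", "EXEC"]
    if mnemonic in MEM_OPS:
        states.append("MEM_READ")   # extra cycle for RAM read latency
    elif mnemonic == "WAI":
        states.append("WAIT")       # stays here for the requested delay
    elif mnemonic == "HLT":
        states.append("HALT")       # never leaves
    return states
```

For example, `baseline_states("NOP")` has five entries and `baseline_states("ADD")` has six, which gives the per-instruction cycle cost of the baseline (excluding time spent inside WAIT).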
The WAI instruction uses a simple prescaler: a 15-bit ms_counter counts up to 27,000 (one millisecond at 27 MHz), then resets and decrements wait_counter. When wait_counter reaches zero, execution resumes.
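To see how many clock cycles a given WAI actually takes, the WAIT state can be mirrored step by step. The sketch below is a Python model written from the WAIT case in the cpu source, assuming ms_counter is zero on entry (it is reset to zero on every rollover); `wait_cycles` is a hypothetical helper name.

```python
CLOCK_MHZ = 27
MILISEC_COUNT = CLOCK_MHZ * 1000   # 27_000, same spelling as the source constant


def wait_cycles(n):
    """Clock cycles spent in the WAIT state for `WAI n`."""
    ms_counter, wait_counter, cycles = 0, n & 0xFF, 0
    while True:
        cycles += 1
        if ms_counter == MILISEC_COUNT:
            # Rollover: restart the prescaler and count one millisecond off.
            ms_counter = 0
            wait_counter = (wait_counter - 1) & 0xFF
            if wait_counter == 0:
                return cycles
        else:
            ms_counter += 1
```

Under this model each millisecond tick costs 27,001 cycles (27,000 increments plus the rollover cycle), so `WAI n` spends `n * 27_001` cycles in WAIT for n from 1 to 255. An operand of 0 wraps the 8-bit counter and behaves like 256.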
LED output multiplexes between the output register and the PC based on the button: IF (btn==0) leds <= ~out_data[5:0] ELSE leds <= ~PC[5:0].
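The LEDs are active-low (the pin map notes "Low = ON"), which is why both sources are inverted. A minimal Python sketch of the same multiplexer, with hypothetical helper names `led_pins` and `lit_leds`:

```python
def led_pins(btn, out_data, pc):
    """6-bit value driven onto the LED pins (a 0 bit means the LED is lit)."""
    source = out_data if btn == 0 else pc
    return ~source & 0x3F          # invert, keep the low six bits


def lit_leds(pins):
    """Indices of the LEDs that are on for a given pin value."""
    return [i for i in range(6) if not (pins >> i) & 1]
```

With `out_data = 0b000101` and the button released, `led_pins` returns `0b111010` and `lit_leds` reports LEDs 0 and 2, matching the set bits of the output register.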
cpu_efficient — Overlapped Fetch
Eliminates the FETCH_OP_ADDR and FETCH_OP_DATA states by issuing the operand ROM read during FETCH_INS_DATA using PC + 1. The operand is consumed directly from rom.read.data in EXEC rather than from a register. This removes the operand register, shrinks the state encoding from 4 bits to 3, and saves two cycles per instruction.
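The effect on cycle counts can be estimated with a small model. This Python sketch is derived from the two state machines as listed in the sources below, ignoring time spent in WAIT; `cycles` and the sample instruction sequence are hypothetical.

```python
MEM_OPS = {"LDA", "ADD", "SUB"}

# States every instruction passes through before any MEM_READ cycle:
#   cpu:           FETCH_INS_ADDR, FETCH_INS_DATA, FETCH_OP_ADDR, FETCH_OP_DATA, EXEC
#   cpu_efficient: FETCH_INS_ADDR, FETCH_INS_DATA, EXEC
BASE_CYCLES = {"cpu": 5, "cpu_efficient": 3}


def cycles(mnemonic, core):
    """Cycles one instruction takes on the given core."""
    return BASE_CYCLES[core] + (1 if mnemonic in MEM_OPS else 0)


# Hypothetical straight-line instruction sequence.
sequence = ["LDI", "STA", "LDA", "ADD", "STOU", "HLT"]
baseline_total = sum(cycles(m, "cpu") for m in sequence)
efficient_total = sum(cycles(m, "cpu_efficient") for m in sequence)
```

For this six-instruction sequence the model gives 32 cycles on cpu and 20 on cpu_efficient, a saving of two cycles per instruction.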
cpu_pipeline — 3-Stage Pipeline
Uses dual-port ROM (rom.fetch and rom.oper) to read the instruction and operand simultaneously. Two pipeline registers (id_ex_instr, id_ex_operand) hold the decoded instruction while the next one is being fetched.
Pipeline states:
- STARTUP — Initialize ROM addresses after reset or branch. One-cycle penalty.
- BUBBLE — Fill the pipeline: latch first instruction into ID registers.
- RUNNING — Full pipeline: EX executes the current instruction while ID latches the next one and IF fetches the one after that. Steady-state throughput is one instruction per cycle.
- WAIT / HALT — Same as baseline.
Branches (JMP, JZ) flush the pipeline by returning to STARTUP; JZ does so even when the branch is not taken. For LDA/ADD/SUB, the ID stage (in BUBBLE and in RUNNING) issues the RAM read address from the operand so the data is ready when EX runs.
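The cost of the fill and flush cycles can be estimated from an executed-instruction trace. The Python sketch below is a timing model only, written from the pipeline state machine in the source further down; it ignores WAI and does not model data hazards. `pipeline_cycles` is a hypothetical name, and the trace is assumed to end with HLT.

```python
BRANCHES = {"JMP", "JZ"}
FILL = 2  # STARTUP + BUBBLE before the first EX cycle


def pipeline_cycles(trace):
    """Cycles from reset until the last instruction in `trace` executes."""
    total = FILL                 # initial pipeline fill
    for mnemonic in trace:
        total += 1               # one RUNNING cycle in EX
        if mnemonic in BRANCHES:
            total += FILL        # refill after the flush, taken or not
    return total
```

A six-instruction straight-line trace such as `["LDI", "STA", "LDA", "ADD", "STOU", "HLT"]` takes 8 cycles in this model, and each JMP or JZ in a trace adds two more on top of its own EX cycle.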
por — Power-On Reset
Counts 16 clock cycles after the FPGA's DONE signal goes high before releasing the active-low por_n output. Ensures stable operation after configuration.
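The release timing can be checked with a cycle-by-cycle model. This is a Python sketch of the por module as listed at the end of the sources, with hypothetical function names `por_step` and `simulate`; each list element of `done_trace` is the DONE level sampled at one rising clock edge.

```python
POR_CYCLES = 16
POR_MAX = POR_CYCLES - 1


def por_step(done, por_reg, cnt):
    """Next (por_reg, cnt) after one rising clock edge."""
    if not done:
        return 0, 0              # configuration in progress: hold reset
    if cnt == POR_MAX:
        return 1, cnt            # count complete: release reset, hold count
    return 0, cnt + 1            # still counting


def simulate(done_trace):
    """Return the por_n value after each clock edge."""
    por_reg, cnt, out = 0, 0, []
    for done in done_trace:
        por_reg, cnt = por_step(done, por_reg, cnt)
        out.append(por_reg)
    return out
```

With DONE held high, `simulate([1] * 20)` first returns 1 at index 15, i.e. on the 16th edge. A low pulse on DONE restarts the count from zero.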
jz
@project(CHIP="GW2AR-18-QN88-C8-I7") SIMPLE_SOC
@import "cpu.jz"
@import "por.jz"
CLOCKS {
SCLK = { period=37.04 }; // 27MHz clock
}
IN_PINS {
SCLK = { standard=LVCMOS33 };
DONE = { standard=LVCMOS33 };
KEY[2] = { standard=LVCMOS33 };
}
OUT_PINS {
LED[6] = { standard=LVCMOS33, drive=8 };
}
MAP {
// System Clock
// 27MHz
SCLK = 4;
// 2 buttons
// High = Closed
// Low = Open
KEY[0] = 88;
KEY[1] = 87;
// 6 LEDs
// High = OFF
// Low = ON
LED[0] = 15;
LED[1] = 16;
LED[2] = 17;
LED[3] = 18;
LED[4] = 19;
LED[5] = 20;
// DONE can be used as a POR signal
// Low = programming ongoing
// High = programming successful
DONE = IOR32B;
}
@top CPU {
IN [1] clk = SCLK;
IN [1] rst_n = ~KEY[0];
IN [1] por = DONE;
IN [1] btn = KEY[1];
OUT [6] leds = LED;
}
@endproj
jz
@global OPS
NOP = 8'h00;
LDI = 8'h01;
LDA = 8'h02;
STA = 8'h03;
ADD = 8'h04;
SUB = 8'h05;
JMP = 8'h06;
JZ = 8'h07;
LDIN = 8'h08;
STOU = 8'h09;
WAI = 8'h0A;
HLT = 8'hFF;
@endglob
@global STATE
FETCH_INS_ADDR = 4'b0000;
FETCH_INS_DATA = 4'b0001;
FETCH_OP_ADDR = 4'b0010;
FETCH_OP_DATA = 4'b0011;
EXEC = 4'b0100;
MEM_READ = 4'b0101;
WAIT = 4'b0110;
HALT = 4'b0111;
@endglob
@module CPU
CONST {
CLOCK_MHZ = 27;
MILISEC_COUNT = CLOCK_MHZ * 1000;
MILISEC_BITS = clog2(MILISEC_COUNT);
}
PORT {
IN [1] clk;
IN [1] rst_n;
IN [1] por;
IN [1] btn;
OUT [6] leds;
}
REGISTER {
instr [8] = 8'b0;
operand [8] = 8'b0;
A [8] = 8'b0;
PC [8] = 8'b0;
out_data [8] = 8'b0;
wait_counter [8] = 8'b0;
ms_counter [MILISEC_BITS] = MILISEC_BITS'b0;
state [4] = 4'b0;
zero [1] = 1'b0;
}
WIRE {
reset [1];
por_n [1];
}
@new por0 por {
IN [1] clk = clk;
IN [1] done = por;
OUT [1] por_n = por_n;
}
MEM(TYPE=BLOCK) {
rom [8] [256] = @file("../out/program.hex") {
OUT read SYNC;
};
ram [8] [256] = 8'h00 {
IN write;
OUT read SYNC;
};
}
ASYNCHRONOUS {
reset <= rst_n & por_n;
IF (btn == 1'b0) {
leds <= ~out_data[5:0];
} ELSE {
leds <= ~PC[5:0];
}
}
SYNCHRONOUS(CLK=clk RESET=reset RESET_ACTIVE=Low) {
SELECT(state) {
CASE STATE.HALT {
// tight loop
}
CASE STATE.WAIT {
IF (ms_counter == lit(MILISEC_BITS, MILISEC_COUNT)) {
ms_counter <= MILISEC_BITS'b0;
wait_counter <= wait_counter - 8'b1;
IF (wait_counter - 8'b1 == 8'b0) {
state <= STATE.FETCH_INS_ADDR;
}
} ELSE {
ms_counter <= ms_counter + MILISEC_BITS'b1;
}
}
CASE STATE.FETCH_INS_ADDR {
rom.read.addr <= PC;
state <= STATE.FETCH_INS_DATA;
}
CASE STATE.FETCH_INS_DATA {
instr <= rom.read.data;
PC <= PC + 8'b1;
state <= STATE.FETCH_OP_ADDR;
}
CASE STATE.FETCH_OP_ADDR {
rom.read.addr <= PC;
state <= STATE.FETCH_OP_DATA;
}
CASE STATE.FETCH_OP_DATA {
SELECT (instr) {
CASE OPS.LDI
CASE OPS.LDA
CASE OPS.STA
CASE OPS.ADD
CASE OPS.SUB
CASE OPS.JMP
CASE OPS.JZ
CASE OPS.WAI {
operand <= rom.read.data;
PC <= PC + 8'b1;
state <= STATE.EXEC;
}
DEFAULT {
state <= STATE.EXEC;
}
}
}
CASE STATE.EXEC {
SELECT(instr) {
CASE OPS.NOP {
state <= STATE.FETCH_INS_ADDR;
}
CASE OPS.LDI {
A <= operand;
zero <= (operand == 8'h00);
state <= STATE.FETCH_INS_ADDR;
}
CASE OPS.STA {
ram.write[operand] <= A;
state <= STATE.FETCH_INS_ADDR;
}
CASE OPS.JMP {
PC <= operand;
state <= STATE.FETCH_INS_ADDR;
}
CASE OPS.JZ {
IF (zero) {
PC <= operand;
}
state <= STATE.FETCH_INS_ADDR;
}
CASE OPS.LDIN {
A <= { 7'b0000000, btn };
zero <= (btn == 1'b0);
state <= STATE.FETCH_INS_ADDR;
}
CASE OPS.STOU {
out_data <= A;
state <= STATE.FETCH_INS_ADDR;
}
CASE OPS.WAI {
wait_counter <= operand;
state <= STATE.WAIT;
}
CASE OPS.HLT {
state <= STATE.HALT;
}
CASE OPS.LDA
CASE OPS.ADD
CASE OPS.SUB {
ram.read.addr <= operand;
state <= STATE.MEM_READ;
}
DEFAULT { }
}
}
CASE STATE.MEM_READ {
SELECT(instr) {
CASE OPS.LDA {
A <= ram.read.data;
zero <= (ram.read.data == 8'h00);
}
CASE OPS.ADD {
A <= A + ram.read.data; // no carry
zero <= (A + ram.read.data) == 8'h00;
}
CASE OPS.SUB {
A <= A - ram.read.data; // no borrow
zero <= (A - ram.read.data) == 8'h00;
}
DEFAULT { }
}
state <= STATE.FETCH_INS_ADDR;
}
DEFAULT {
state <= STATE.FETCH_INS_ADDR;
}
}
}
@endmod
jz
@global OPS
NOP = 8'h00;
LDI = 8'h01;
LDA = 8'h02;
STA = 8'h03;
ADD = 8'h04;
SUB = 8'h05;
JMP = 8'h06;
JZ = 8'h07;
LDIN = 8'h08;
STOU = 8'h09;
WAI = 8'h0A;
HLT = 8'hFF;
@endglob
@global STATE
FETCH_INS_ADDR = 3'b000;
FETCH_INS_DATA = 3'b001;
EXEC = 3'b010;
MEM_READ = 3'b011;
WAIT = 3'b100;
HALT = 3'b101;
@endglob
@module CPU
CONST {
CLOCK_MHZ = 27;
MILISEC_COUNT = CLOCK_MHZ * 1000;
MILISEC_BITS = clog2(MILISEC_COUNT);
}
PORT {
IN [1] clk;
IN [1] rst_n;
IN [1] por;
IN [1] btn;
OUT [6] leds;
}
REGISTER {
instr [8] = 8'b0;
A [8] = 8'b0;
PC [8] = 8'b0;
out_data [8] = 8'b0;
wait_counter [8] = 8'b0;
ms_counter [MILISEC_BITS] = MILISEC_BITS'b0;
state [3] = 3'b0;
zero [1] = 1'b0;
}
WIRE {
reset [1];
por_n [1];
}
@new por0 por {
IN [1] clk = clk;
IN [1] done = por;
OUT [1] por_n = por_n;
}
MEM(TYPE=BLOCK) {
rom [8] [256] = @file("program.hex") {
OUT read SYNC;
};
ram [8] [256] = 8'h00 {
IN write;
OUT read SYNC;
};
}
ASYNCHRONOUS {
reset <= rst_n & por_n;
IF (btn == 1'b0) {
leds <= ~out_data[5:0];
} ELSE {
leds <= ~PC[5:0];
}
}
SYNCHRONOUS(CLK=clk RESET=reset RESET_ACTIVE=Low) {
SELECT(state) {
CASE STATE.HALT {
// tight loop
}
CASE STATE.WAIT {
IF (ms_counter == lit(MILISEC_BITS, MILISEC_COUNT)) {
ms_counter <= MILISEC_BITS'b0;
wait_counter <= wait_counter - 8'b1;
IF (wait_counter - 8'b1 == 8'b0) {
state <= STATE.FETCH_INS_ADDR;
}
} ELSE {
ms_counter <= ms_counter + MILISEC_BITS'b1;
}
}
CASE STATE.FETCH_INS_ADDR {
rom.read.addr <= PC;
state <= STATE.FETCH_INS_DATA;
}
CASE STATE.FETCH_INS_DATA {
instr <= rom.read.data;
PC <= PC + 8'b1;
SELECT (rom.read.data) {
CASE OPS.LDI
CASE OPS.LDA
CASE OPS.STA
CASE OPS.ADD
CASE OPS.SUB
CASE OPS.JMP
CASE OPS.JZ
CASE OPS.WAI {
rom.read.addr <= PC + 8'b1;
state <= STATE.EXEC;
}
DEFAULT {
state <= STATE.EXEC;
}
}
}
CASE STATE.EXEC {
SELECT(instr) {
CASE OPS.NOP {
state <= STATE.FETCH_INS_ADDR;
}
CASE OPS.LDI {
A <= rom.read.data;
zero <= (rom.read.data == 8'h00);
state <= STATE.FETCH_INS_ADDR;
PC <= PC + 8'b1;
}
CASE OPS.STA {
ram.write[rom.read.data] <= A;
state <= STATE.FETCH_INS_ADDR;
PC <= PC + 8'b1;
}
CASE OPS.JMP {
PC <= rom.read.data;
state <= STATE.FETCH_INS_ADDR;
}
CASE OPS.JZ {
IF (zero) {
PC <= rom.read.data;
} ELSE {
PC <= PC + 8'b1;
}
state <= STATE.FETCH_INS_ADDR;
}
CASE OPS.LDIN {
A <= { 7'b0000000, btn };
zero <= (btn == 1'b0);
state <= STATE.FETCH_INS_ADDR;
}
CASE OPS.STOU {
out_data <= A;
state <= STATE.FETCH_INS_ADDR;
}
CASE OPS.WAI {
wait_counter <= rom.read.data;
state <= STATE.WAIT;
PC <= PC + 8'b1;
}
CASE OPS.HLT {
state <= STATE.HALT;
}
CASE OPS.LDA
CASE OPS.ADD
CASE OPS.SUB {
ram.read.addr <= rom.read.data;
state <= STATE.MEM_READ;
PC <= PC + 8'b1;
}
DEFAULT { }
}
}
CASE STATE.MEM_READ {
SELECT(instr) {
CASE OPS.LDA {
A <= ram.read.data;
zero <= (ram.read.data == 8'h00);
}
CASE OPS.ADD {
A <= A + ram.read.data; // no carry
zero <= (A + ram.read.data) == 8'h00;
}
CASE OPS.SUB {
A <= A - ram.read.data; // no borrow
zero <= (A - ram.read.data) == 8'h00;
}
DEFAULT { }
}
state <= STATE.FETCH_INS_ADDR;
}
DEFAULT {
state <= STATE.FETCH_INS_ADDR;
}
}
}
@endmod
jz
@global OPS
NOP = 8'h00;
LDI = 8'h01;
LDA = 8'h02;
STA = 8'h03;
ADD = 8'h04;
SUB = 8'h05;
JMP = 8'h06;
JZ = 8'h07;
LDIN = 8'h08;
STOU = 8'h09;
WAI = 8'h0A;
HLT = 8'hFF;
@endglob
@global PSTATE
STARTUP = 3'b000;
RUNNING = 3'b001;
BUBBLE = 3'b010;
WAIT = 3'b011;
HALT = 3'b100;
@endglob
@module CPU
CONST {
CLOCK_MHZ = 27;
MILISEC_COUNT = CLOCK_MHZ * 1000;
MILISEC_BITS = clog2(MILISEC_COUNT);
}
PORT {
IN [1] clk;
IN [1] rst_n;
IN [1] por;
IN [1] btn;
OUT [6] leds;
}
REGISTER {
A [8] = 8'b0;
PC [8] = 8'b0;
out_data [8] = 8'b0;
zero [1] = 1'b0;
wait_counter [8] = 8'b0;
ms_counter [MILISEC_BITS] = MILISEC_BITS'b0;
pipe_state [3] = 3'b000;
// Pipeline registers (ID -> EX)
id_ex_instr [8] = 8'b0;
id_ex_operand [8] = 8'b0;
}
WIRE {
reset [1];
por_n [1];
}
@new por0 por {
IN [1] clk = clk;
IN [1] done = por;
OUT [1] por_n = por_n;
}
MEM(TYPE=BLOCK) {
rom [8] [256] = @file("program.hex") {
OUT fetch SYNC;
OUT oper SYNC;
};
ram [8] [256] = 8'h00 {
IN write;
OUT read SYNC;
};
}
ASYNCHRONOUS {
reset <= rst_n & por_n;
IF (btn == 1'b0) {
leds <= ~out_data[5:0];
} ELSE {
leds <= ~PC[5:0];
}
}
SYNCHRONOUS(CLK=clk RESET=reset RESET_ACTIVE=Low) {
// 3-stage pipeline: IF -> ID -> EX
//
// Dual-port ROM fetches opcode (fetch port) and operand
// (oper port) in a single cycle. Variable-length instructions
// are handled by tracking whether the current fetch position
// needs adjustment based on the previous instruction's length.
//
// STARTUP: sets ROM addresses from PC, goes to BUBBLE.
// BUBBLE: ID only - latches ROM data into pipeline regs,
// advances PC based on instruction length.
// RUNNING: EX + ID + IF run simultaneously.
SELECT(pipe_state) {
CASE PSTATE.HALT {
pipe_state <= PSTATE.HALT;
}
CASE PSTATE.WAIT {
IF (ms_counter == lit(MILISEC_BITS, MILISEC_COUNT)) {
ms_counter <= MILISEC_BITS'b0;
wait_counter <= wait_counter - 8'b1;
IF (wait_counter - 8'b1 == 8'b0) {
pipe_state <= PSTATE.STARTUP;
}
} ELSE {
ms_counter <= ms_counter + MILISEC_BITS'b1;
}
}
CASE PSTATE.STARTUP {
// Present ROM addresses for fetch
rom.fetch.addr <= PC;
rom.oper.addr <= PC + 8'b1;
pipe_state <= PSTATE.BUBBLE;
}
CASE PSTATE.BUBBLE {
// ID stage only (no EX). Latch ROM data into
// pipeline registers.
id_ex_instr <= rom.fetch.data;
id_ex_operand <= rom.oper.data;
// Determine instruction length and set up next fetch
SELECT(rom.fetch.data) {
CASE OPS.LDA
CASE OPS.ADD
CASE OPS.SUB {
// 2-byte instruction that needs RAM prefetch
ram.read.addr <= rom.oper.data;
PC <= PC + 8'h02;
rom.fetch.addr <= PC + 8'h02;
rom.oper.addr <= PC + 8'h03;
}
CASE OPS.LDI
CASE OPS.STA
CASE OPS.JMP
CASE OPS.JZ
CASE OPS.WAI {
// 2-byte instruction (no RAM prefetch needed)
PC <= PC + 8'h02;
rom.fetch.addr <= PC + 8'h02;
rom.oper.addr <= PC + 8'h03;
}
DEFAULT {
// 1-byte instruction (NOP, LDIN, STOU, HLT)
PC <= PC + 8'b1;
rom.fetch.addr <= PC + 8'b1;
rom.oper.addr <= PC + 8'h02;
}
}
pipe_state <= PSTATE.RUNNING;
}
CASE PSTATE.RUNNING {
// EX + ID + IF run simultaneously.
// Branches and special ops redirect through STARTUP.
SELECT(id_ex_instr) {
CASE OPS.JMP {
// Unconditional branch: update PC, flush
PC <= id_ex_operand;
pipe_state <= PSTATE.STARTUP;
}
CASE OPS.JZ {
// Conditional branch: always flush, set PC
// if taken (otherwise PC stays at next instr)
IF (zero) {
PC <= id_ex_operand;
}
pipe_state <= PSTATE.STARTUP;
}
CASE OPS.WAI {
wait_counter <= id_ex_operand;
pipe_state <= PSTATE.WAIT;
}
CASE OPS.HLT {
pipe_state <= PSTATE.HALT;
}
DEFAULT {
// Normal pipeline: EX + ID + IF
// EX stage: execute the latched instruction
SELECT(id_ex_instr) {
CASE OPS.LDI {
A <= id_ex_operand;
zero <= (id_ex_operand == 8'h00);
}
CASE OPS.STA {
ram.write[id_ex_operand] <= A;
}
CASE OPS.LDIN {
A <= { 7'b0000000, btn };
zero <= (btn == 1'b0);
}
CASE OPS.STOU {
out_data <= A;
}
CASE OPS.LDA {
A <= ram.read.data;
zero <= (ram.read.data == 8'h00);
}
CASE OPS.ADD {
A <= A + ram.read.data;
zero <= (A + ram.read.data) == 8'h00;
}
CASE OPS.SUB {
A <= A - ram.read.data;
zero <= (A - ram.read.data) == 8'h00;
}
DEFAULT { }
}
// ID stage: latch next instruction
id_ex_instr <= rom.fetch.data;
id_ex_operand <= rom.oper.data;
// IF stage: advance PC based on current instruction length
SELECT(rom.fetch.data) {
CASE OPS.LDA
CASE OPS.ADD
CASE OPS.SUB {
// 2-byte instruction with RAM prefetch
ram.read.addr <= rom.oper.data;
PC <= PC + 8'h02;
rom.fetch.addr <= PC + 8'h02;
rom.oper.addr <= PC + 8'h03;
}
CASE OPS.LDI
CASE OPS.STA
CASE OPS.JMP
CASE OPS.JZ
CASE OPS.WAI {
// 2-byte instruction (no RAM prefetch)
PC <= PC + 8'h02;
rom.fetch.addr <= PC + 8'h02;
rom.oper.addr <= PC + 8'h03;
}
DEFAULT {
// 1-byte instruction
PC <= PC + 8'b1;
rom.fetch.addr <= PC + 8'b1;
rom.oper.addr <= PC + 8'h02;
}
}
}
}
}
DEFAULT {
pipe_state <= PSTATE.STARTUP;
}
}
}
@endmod
jz
@module por
PORT {
IN [1] clk;
IN [1] done;
OUT [1] por_n;
}
CONST {
POR_CYCLES = 16;
POR_CNT_BITS = clog2(POR_CYCLES);
POR_MAX = POR_CYCLES - 1;
}
REGISTER {
por_reg [1] = 1'b0;
cnt [POR_CNT_BITS] = POR_CNT_BITS'b0;
}
ASYNCHRONOUS {
por_n <= por_reg;
}
SYNCHRONOUS(CLK=clk) {
IF (done == 1'b0) {
por_reg <= 1'b0;
cnt <= POR_CNT_BITS'b0;
} ELIF (cnt == lit(POR_CNT_BITS, POR_MAX)) {
por_reg <= 1'b1;
cnt <= cnt;
} ELSE {
por_reg <= 1'b0;
cnt <= cnt + POR_CNT_BITS'b1;
}
}
@endmod
JZ-HDL Language Features
State machine coverage. JZ-HDL's SELECT/CASE with DEFAULT combined with mandatory register reset values prevents the class of bugs where a missing state in Verilog silently infers a latch or holds stale values.
BLOCK memory declaration. MEM(TYPE=BLOCK) with explicit read/write ports replaces vendor-specific BRAM inference pragmas. The compiler generates the correct structure; in Verilog, getting BRAM inference right requires following coding patterns that vary by vendor, and mistakes silently fall back to distributed logic.
Port-checked instantiation. @new por0 por { ... } declares connections with widths and directions. The compiler catches width mismatches and missing ports. Verilog port-connection mismatches may only produce warnings.
Compile-time constants. A CONST block (for example MILISEC_COUNT = CLOCK_MHZ * 1000, which evaluates to 27000) with clog2() and lit() replaces Verilog's parameter/localparam with guaranteed compile-time evaluation and explicit bit sizing at every use site.
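The sizing arithmetic behind the constants in this project can be reproduced in a few lines. This is a Python sketch, not JZ-HDL: `clog2` is assumed to be a ceiling log2, and `lit` here is a hypothetical stand-in that only checks that a value fits in the stated width.

```python
def clog2(n):
    """Ceiling of log2(n) for n >= 1: bits needed to index n distinct values."""
    return (n - 1).bit_length()


def lit(width, value):
    """Hypothetical stand-in for lit(): a width-checked unsigned literal."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return value


MILISEC_BITS = clog2(27 * 1000)   # 15 bits for the millisecond prescaler
POR_CNT_BITS = clog2(16)          # 4 bits for the power-on reset counter
```

One consequence worth noting: `clog2(n)` bits can always hold 0 through n - 1, but hold n itself only when n is not a power of two. 27000 fits in 15 bits, while 16 does not fit in 4, which is consistent with the por module comparing against POR_CYCLES - 1.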