SOC: RISC-V System-on-Chip

A complete system-on-chip built around a 32-bit RISC-V CPU (RV32IM) running on a Tang Nano 20K FPGA. The CPU connects to eight peripherals through a shared 32-bit bus, with HDMI video output, 8-channel audio synthesis, SD card storage, 64 Mbit SDRAM, and a UART serial console.

text
                  ┌────────────┐
                  │  RISC-V    │
                  │   CPU      │
                  │ (SOURCE)   │
                  └──────┬─────┘
                         │  SIMPLE_BUS
    ┌────────────────────┴─────────────────────────────────────┐
    │                        arbiter                           │
    └─┬───────┬──────┬──────┬────────┬────────┬──────┬───────┬─┘
      │       │      │      │        │        │      │       │
   ┌─────┐ ┌─────┐ ┌────┐ ┌────┐ ┌──────┐ ┌──────┐ ┌────┐ ┌─────┐
   │ ROM │ │ RAM │ │LED │ │UART│ │SDRAM │ │ Term │ │ SD │ │Audio│
   │0x0_ │ │0x1_ │ │0x2_│ │0x3_│ │ 0x4_ │ │ 0x5_ │ │0x6_│ │0x7_ │
   └─────┘ └─────┘ └────┘ └────┘ └──┬───┘ └──────┘ └──┬─┘ └─────┘
                                    │                 │
                                 SDRAM             SD Card
                                64 Mbit             (SPI)

INFO

Audio support is a work in progress and not currently functional.

Bus Architecture

The SIMPLE_BUS definition in the global file describes the shared bus:

jz
BUS SIMPLE_BUS {
    OUT   [32]    ADDR;
    OUT   [1]     CMD;
    OUT   [1]     VALID;
    INOUT [32]    DATA;
    IN    [1]     DONE;
}

The CPU declares a BUS SIMPLE_BUS SOURCE port. Each peripheral declares a BUS SIMPLE_BUS TARGET port. Directions resolve automatically: OUT from the CPU becomes IN at each peripheral. The INOUT DATA signal participates in tristate resolution: the compiler proves at compile time that no more than one party (the CPU on writes, the selected peripheral on reads) drives DATA at any moment.
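The single-driver rule for DATA can be illustrated with a small behavioral model. This is a Python sketch, not jz, and it checks at run time what the compiler is described as proving statically. The function name resolve_tristate and the use of None to stand for high-Z are assumptions made for illustration.

```python
from typing import Optional, Sequence


def resolve_tristate(drivers: Sequence[Optional[int]]) -> Optional[int]:
    """Resolve one shared line from all of its drivers.

    Each entry is either an integer (the party is driving that value)
    or None (the party has released the line to high-Z).
    """
    active = [value for value in drivers if value is not None]
    if len(active) > 1:
        # Two parties driving at once is bus contention.
        raise ValueError("bus contention: more than one driver is active")
    # With no driver at all the line floats, modeled here as None.
    return active[0] if active else None


# Read cycle: the CPU releases DATA and only the selected peripheral drives it.
print(resolve_tristate([None, 0x1234ABCD, None]))
```

In this model a read cycle has the CPU entry set to None, while a write cycle has every peripheral entry set to None.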

Modules

global

Shared constants and enumerations used across all modules via @global:

  • OP: 8-bit opcodes for the accumulator CPU variant (NOP through ST_X, 25 instructions).
  • STATE: CPU state machine states (FETCH through HALT, 12 states).
  • WAVE: Audio waveform types (SQUARE, TRIANGLE, SAWTOOTH, NOISE).
  • ENV: ADSR envelope states (IDLE, ATTACK, DECAY, SUSTAIN, RELEASE).
  • CMD: Bus commands (READ=0, WRITE=1).

rv_cpu — RISC-V RV32IM CPU

A multi-cycle 32-bit RISC-V implementation supporting the base integer (RV32I) and multiply/divide (M) extensions.

State machine (4-bit): FETCH → WAIT_FETCH → DECODE → EXECUTE → MEM_WAIT/RMW_WAIT/MULDIV_WAIT → WRITEBACK.

Instruction support:

  • LUI, AUIPC, JAL, JALR
  • Branches: BEQ, BNE, BLT, BGE, BLTU, BGEU
  • Loads: LB, LH, LW, LBU, LHU (with sign/zero extension)
  • Stores: SW, SH, SB (sub-word stores use read-modify-write via RMW_WAIT)
  • ALU: ADD, SUB, SLL, SLT, SLTU, XOR, SRL, SRA, OR, AND (immediate and register variants)
  • Multiply/divide: MUL, MULH, MULHSU, MULHU, DIV, DIVU, REM, REMU
  • CSR: CSRRW, CSRRS, CSRRC and immediate variants
  • Trap: ECALL, EBREAK, MRET
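
The read-modify-write path for sub-word stores can be sketched as a pure function that merges the store data into the word previously read from memory. This is a Python model under stated assumptions, not the jz source: the name merge_subword_store is hypothetical, byte lanes are little-endian as in the RISC-V specification, and funct3 uses the standard store encodings (0 = SB, 1 = SH, 2 = SW).

```python
MASK32 = 0xFFFFFFFF


def merge_subword_store(old_word: int, store_data: int, addr_lo: int, funct3: int) -> int:
    """Return the 32-bit word to write back for SB/SH/SW.

    old_word   : word read from memory in the read phase
    store_data : value of rs2
    addr_lo    : low two bits of the byte address
    funct3     : 0 = SB, 1 = SH, 2 = SW
    """
    if funct3 == 0:                      # SB: replace one byte lane
        shift = 8 * (addr_lo & 3)
        mask = 0xFF << shift
        data = (store_data & 0xFF) << shift
    elif funct3 == 1:                    # SH: replace the low or high halfword
        shift = 8 * (addr_lo & 2)
        mask = 0xFFFF << shift
        data = (store_data & 0xFFFF) << shift
    else:                                # SW: the whole word is replaced
        return store_data & MASK32
    return ((old_word & ~mask) | data) & MASK32


# Store byte 0xAB at byte offset 1 of a word holding 0x11223344.
print(hex(merge_subword_store(0x11223344, 0xAB, 1, 0)))
```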

Interrupt handling: IRQ lines are checked at FETCH. Trap entry saves PC to mepc, sets mcause, copies MIE→MPIE, clears MIE, and jumps to mtvec. MRET restores MIE from MPIE.
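
The trap sequence above can be modeled in a few lines. This is a Python sketch rather than the jz implementation. It assumes the standard mstatus bit positions (MIE = bit 3, MPIE = bit 7), treats mtvec as a direct jump target, and assumes MRET resumes at mepc. It models only the MIE/MPIE behavior described here, and the dictionary-based CSR storage is purely illustrative.

```python
MIE_BIT = 1 << 3    # mstatus.MIE  (standard RISC-V bit position)
MPIE_BIT = 1 << 7   # mstatus.MPIE (standard RISC-V bit position)


def trap_enter(csr: dict, pc: int, cause: int) -> int:
    """Apply trap entry to the CSR state and return the next PC."""
    csr["mepc"] = pc
    csr["mcause"] = cause
    mstatus = csr["mstatus"]
    mie_was_set = bool(mstatus & MIE_BIT)
    mstatus &= ~(MIE_BIT | MPIE_BIT)
    if mie_was_set:
        mstatus |= MPIE_BIT          # copy MIE into MPIE
    csr["mstatus"] = mstatus         # MIE is now clear
    return csr["mtvec"]


def trap_return(csr: dict) -> int:
    """Apply MRET: restore MIE from MPIE and return the resume PC."""
    mstatus = csr["mstatus"]
    if mstatus & MPIE_BIT:
        mstatus |= MIE_BIT
    else:
        mstatus &= ~MIE_BIT
    csr["mstatus"] = mstatus
    return csr["mepc"]


csr = {"mstatus": MIE_BIT, "mtvec": 0x100, "mepc": 0, "mcause": 0}
print(hex(trap_enter(csr, 0x40, 0x8000000B)), hex(trap_return(csr)))
```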

Shadow register file: The register file (rv_regfile) maintains a shadow bank of 31 registers for zero-overhead trap context switching. When shadow_mode is active, reads and writes target the shadow bank, preserving the interrupted program's registers without software save/restore.

rv_alu — Combinational ALU

Implements all RV32I ALU operations via funct3 decoding. ADD/SUB selected by alt bit. Signed comparison uses ssub sign bit. Shifts use b[4:0] as shift amount. Entirely combinational — no registers.
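
A behavioral reference for the funct3 decoding is handy when checking the hardware against expected results. The following is a Python sketch, not the jz module: the function name rv_alu_model is hypothetical, the funct3 values are the standard RV32I encodings, and signed comparison uses Python integers rather than the subtract-and-test-sign approach the hardware takes.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def rv_alu_model(a: int, b: int, funct3: int, alt: int) -> int:
    """Return the 32-bit ALU result for the given funct3 and alt bit."""
    a &= MASK32
    b &= MASK32
    shamt = b & 0x1F                      # shifts use b[4:0]
    if funct3 == 0b000:                   # ADD, or SUB when alt is set
        result = a - b if alt else a + b
    elif funct3 == 0b001:                 # SLL
        result = a << shamt
    elif funct3 == 0b010:                 # SLT (signed)
        result = int(to_signed(a) < to_signed(b))
    elif funct3 == 0b011:                 # SLTU (unsigned)
        result = int(a < b)
    elif funct3 == 0b100:                 # XOR
        result = a ^ b
    elif funct3 == 0b101:                 # SRL, or SRA when alt is set
        result = to_signed(a) >> shamt if alt else a >> shamt
    elif funct3 == 0b110:                 # OR
        result = a | b
    else:                                 # AND
        result = a & b
    return result & MASK32


print(hex(rv_alu_model(0x80000000, 4, 0b101, 1)))  # arithmetic shift right
```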

rv_muldiv — Multiply/Divide Unit

Single-cycle multiply (64-bit product via hardware multiplier, selecting upper or lower 32 bits). 32-cycle restoring division with sign correction. Handles edge cases: division by zero returns -1 (all ones) as the quotient and the dividend as the remainder; signed overflow (-2^31 / -1) returns -2^31 as the quotient.
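
The division behavior can be sketched as a software model that performs one restoring step per quotient bit and then applies sign correction. This is a Python illustration, not the jz source. The function names are hypothetical, and the remainder of 0 in the overflow case is taken from the RISC-V specification rather than from this page.

```python
MASK32 = 0xFFFFFFFF


def restoring_divide(dividend: int, divisor: int):
    """Unsigned 32-bit restoring division, one quotient bit per iteration."""
    remainder = 0
    quotient = 0
    for bit in range(31, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        trial = remainder - divisor
        if trial >= 0:                 # keep the subtraction, set quotient bit
            remainder = trial
            quotient |= 1 << bit
        # otherwise the partial remainder is left as it was ("restored")
    return quotient, remainder


def rv_divrem(a: int, b: int, signed: bool):
    """Return (quotient, remainder) as 32-bit patterns with RISC-V edge cases."""
    a &= MASK32
    b &= MASK32
    if b == 0:                                    # division by zero
        return MASK32, a
    if not signed:
        return restoring_divide(a, b)
    if a == 0x80000000 and b == MASK32:           # -2^31 / -1 overflow
        return 0x80000000, 0
    neg_a = bool(a & 0x80000000)
    neg_b = bool(b & 0x80000000)
    mag_a = (-a) & MASK32 if neg_a else a
    mag_b = (-b) & MASK32 if neg_b else b
    q, r = restoring_divide(mag_a, mag_b)
    if neg_a != neg_b:                            # quotient sign correction
        q = (-q) & MASK32
    if neg_a:                                     # remainder follows the dividend
        r = (-r) & MASK32
    return q, r


print(rv_divrem(100, 7, False))
```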

rv_csr — CSR Register File

M-mode CSR registers: mstatus (MIE/MPIE/MPP), mie, mtvec, mepc, mcause, mtval, mcycle (free-running counter). Custom CSRs at 0xBC0-0xBC5: clock frequency, video mode, baud divider, SDRAM size, IRQ/SD card vectors. IRQ line status readable at 0xFC0.

arbiter

Template-based address decoder routing the CPU bus to 8 targets. Each target has a config entry matched against ADDR[31:28] using ((addr ^ value) & care) == 0. Three @template blocks handle matching, DONE collection, and VALID/ADDR/CMD routing with DATA aliasing for tristate pass-through.

Address map: ROM=0x0_, RAM=0x1_, LED=0x2_, UART=0x3_, SDRAM=0x4_, Terminal=0x5_, SD=0x6_, Audio=0x7_.
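
The value/care match is easy to check in software. The sketch below is Python, not the jz templates. It assumes each 8-bit config entry packs the 4-bit value in its upper nibble and the 4-bit care mask in its lower nibble, as the comments in the SOC module indicate, and that target 0 (ROM) occupies the lowest byte of map_config. The function name decode_target is hypothetical.

```python
# One entry per target, index 0 = ROM ... index 7 = Audio.
MAP_CONFIG = [0x0F, 0x1F, 0x2F, 0x3F, 0x4F, 0x5F, 0x6F, 0x7F]
TARGET_NAMES = ["ROM", "RAM", "LED", "UART", "SDRAM", "Terminal", "SD", "Audio"]


def decode_target(addr: int):
    """Return the index of the target selected by addr, or None if unmapped."""
    top = (addr >> 28) & 0xF              # ADDR[31:28]
    for index, entry in enumerate(MAP_CONFIG):
        value = (entry >> 4) & 0xF
        care = entry & 0xF
        if ((top ^ value) & care) == 0:   # all cared-about bits agree
            return index
    return None


for address in (0x00000010, 0x40001234, 0x80000000):
    hit = decode_target(address)
    print(hex(address), TARGET_NAMES[hit] if hit is not None else "unmapped")
```

Clearing bits in the care mask widens a target's window. A care of 0b1110, for example, would make an entry match two adjacent 256 MB regions.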

block_ram — 20 KB RAM

5120 × 32-bit BLOCK memory (10 BSRAM banks). Two-stage read pipeline: assert address → data ready next cycle. Writes complete in one cycle. Maps bus address via ADDR[14:2].

rom — 16 KB Boot ROM

4096 × 32-bit BLOCK memory initialized from bios.hex via @file. Read-only with the same two-stage pipeline as RAM. Address mapped via ADDR[13:2].
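
The bit slices used by block_ram and rom translate a byte address on the bus into a word index. A short Python sketch of that mapping follows. The helper names are hypothetical, and the size checks simply restate the figures given above.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Extract value[hi:lo], inclusive on both ends."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def ram_word_index(addr: int) -> int:
    return bits(addr, 14, 2)      # block_ram uses ADDR[14:2]


def rom_word_index(addr: int) -> int:
    return bits(addr, 13, 2)      # rom uses ADDR[13:2]


# Capacity: words times 4 bytes per word.
assert 5120 * 4 == 20 * 1024      # 20 KB RAM
assert 4096 * 4 == 16 * 1024      # 16 KB ROM

# Last word of each memory.
print(ram_word_index(0x10004FFC), rom_word_index(0x00003FFC))
```

ADDR[14:2] can express 8192 word indices, so only indices 0 through 5119 correspond to populated RAM.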

sdram — SDRAM Controller

Low-level command sequencer for the GW2AR-18's embedded 64 Mbit SDRAM (2M × 32, 11-bit row, 8-bit column, 2-bit bank).

Initialization: 200µs power-up wait → PRECHARGE ALL → two AUTO REFRESH cycles → MODE REGISTER SET (CL=2, burst=1).

11-state machine: INIT → IPRE → IREF → IMODE → IDLE → ACT_W → ACT → RD → RD_CL → WR → REF. Auto-precharge (A10=1) is used for both reads and writes. Refresh fires every ~7.8µs.
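
The timing figures above become cycle counts once a clock frequency is fixed. The Python sketch below shows the arithmetic only. It assumes the controller derives its counters from the integer CLK_FREQ_MHZ value in CONFIG (74); the exact constants used by the jz source are not shown on this page. The rounding direction is also an assumption: up for minimum waits, down for maximum intervals.

```python
import math


def cycles_at_least(duration_us: float, clk_mhz: float) -> int:
    """Cycles needed to wait no less than duration_us."""
    return math.ceil(duration_us * clk_mhz)


def cycles_at_most(duration_us: float, clk_mhz: float) -> int:
    """Cycles that fit within duration_us without exceeding it."""
    return math.floor(duration_us * clk_mhz)


CLK_FREQ_MHZ = 74  # value from the project CONFIG block

print("power-up wait cycles:", cycles_at_least(200, CLK_FREQ_MHZ))
print("refresh interval cycles:", cycles_at_most(7.8, CLK_FREQ_MHZ))
```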

Tristate control: sdram_dq is driven during writes (r_dq_oe == 1'b1) and released to high-Z during reads.

sdram_bus — SDRAM Bus Wrapper

Adapts the raw SDRAM controller to the SIMPLE_BUS protocol. A 2-state machine (IDLE/WAIT) latches the bus address and data, asserts rd/wr, and waits for the controller's done signal before signaling bus DONE. Address mapping: pbus.ADDR[22:2] → 21-bit controller address.

led_out

Single 32-bit register whose low six bits drive the LEDs via data[5:0]. Writes update the register; reads return its current value.

uart — UART Controller

Wraps uart_tx and uart_rx sub-modules. Register map at two offsets:

  • Offset 0x0: read returns {30'b0, rx_has_data, tx_ready}; write sends DATA[7:0].
  • Offset 0x4: read returns the received byte and clears rx_has_data.

Both TX and RX are 8N1 with configurable baud via a baud_div input from the CPU's CSR. IRQ outputs signal TX ready (rising edge) and RX data available.
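
A polled driver for this register map might look like the sketch below. It is Python standing in for firmware, not code from the project. The base address 0x3000_0000 comes from the address map, bit 0 = tx_ready and bit 1 = rx_has_data follow from the concatenation order {30'b0, rx_has_data, tx_ready}, and the FakeUart class is a hypothetical stand-in for the bus so the example can run on its own.

```python
UART_BASE = 0x30000000
TX_READY = 1 << 0       # status bit 0
RX_HAS_DATA = 1 << 1    # status bit 1


def uart_putc(bus_read, bus_write, byte: int) -> None:
    """Wait for the transmitter, then write one byte to offset 0x0."""
    while not bus_read(UART_BASE) & TX_READY:
        pass
    bus_write(UART_BASE, byte & 0xFF)


def uart_getc(bus_read) -> int:
    """Wait for a received byte, then read it from offset 0x4."""
    while not bus_read(UART_BASE) & RX_HAS_DATA:
        pass
    return bus_read(UART_BASE + 4) & 0xFF


class FakeUart:
    """Minimal software model of the two registers (illustration only)."""

    def __init__(self, rx_bytes):
        self.rx = list(rx_bytes)
        self.tx = []

    def read(self, addr: int) -> int:
        if addr == UART_BASE:
            return (RX_HAS_DATA if self.rx else 0) | TX_READY
        if addr == UART_BASE + 4:
            return self.rx.pop(0) if self.rx else 0   # read clears rx_has_data
        return 0

    def write(self, addr: int, value: int) -> None:
        if addr == UART_BASE:
            self.tx.append(value & 0xFF)


uart = FakeUart(b"A")
uart_putc(uart.read, uart.write, ord("H"))
print(uart.tx, chr(uart_getc(uart.read)))
```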

sdcard — SD Card SPI Controller

Full SPI-mode SD card interface with 512-byte sector buffer, CMD0/8/55/58/ACMD41 initialization sequence, block read/write with CRC, and DMA handshake for direct-to-RAM writes. Register map includes command, status, sector address, data buffer, and IRQ control. CS gap enforcement between commands per SD specification.

video — DVI/HDMI Video Output

Mode-switchable video pipeline supporting 720p@60Hz (80×22 text, 1280×720) and 1080p@30Hz (120×33 text, 1920×1080), both at 74.25 MHz pixel clock. A 5-stage pipeline reads character and attribute data from the terminal framebuffer, fetches font bitmaps from ROM, and produces TMDS-encoded output. Each cell is 16×32 pixels with RGB565 foreground/background colors. Cursor rendering supports 4 styles (underline, block, blinking variants).

video_timing

Dual-mode CEA-861 timing generator. Mode 0: 1280×720@60Hz (1650×750 total). Mode 1: 1920×1080@30Hz (2200×1125 total). Positive sync polarity for both modes.
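
Both modes share one pixel clock because total pixels per frame times refresh rate comes out the same. The Python sketch below checks that arithmetic, along with the text grid sizes that follow from 16×32 pixel cells. The table layout and function names are illustrative assumptions; the numbers are the ones quoted on this page.

```python
# mode: (active_w, active_h, total_w, total_h, refresh_hz)
MODES = {
    0: (1280, 720, 1650, 750, 60),
    1: (1920, 1080, 2200, 1125, 30),
}


def pixel_clock_hz(mode: int) -> int:
    """Pixel clock = total columns * total lines * frames per second."""
    _, _, total_w, total_h, refresh = MODES[mode]
    return total_w * total_h * refresh


def text_grid(mode: int, cell_w: int = 16, cell_h: int = 32):
    """Whole character cells that fit in the active area."""
    active_w, active_h, _, _, _ = MODES[mode]
    return active_w // cell_w, active_h // cell_h


for mode in MODES:
    print(mode, pixel_clock_hz(mode), text_grid(mode))
```

The active height is not a multiple of 32 in either mode, so a partial row of pixels is left over below the last text row.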

terminal — Terminal Framebuffer

Dual-BSRAM character/attribute storage with separate sys_clk (CPU) and pixel_clk (video) ports. Register-mapped interface for cell read/write, cursor position/style, and hardware-accelerated CLEAR and SCROLL_UP commands via a 6-state FSM. Supports up to 120×33 cells.

audio — 8-Channel Audio Synthesizer

Eight independent audio channels, each with selectable waveform (square, triangle, sawtooth, noise), 24-bit frequency, 8-bit volume, 8-bit pan, 8-bit duty cycle, and full ADSR envelope. A 128-sample stereo ring buffer (DISTRIBUTED RAM) feeds the output. Register-mapped per-channel configuration at bus offsets grouped by channel index.

aud_gen — Audio Channel Generator

Single voice with waveform synthesis: square wave (phase vs. duty comparison), triangle (phase fold), sawtooth (direct phase), or noise (16-bit Galois LFSR, taps 16/14/13/11). ADSR envelope with 16-bit accumulator and configurable attack/decay/sustain/release rates. Output: wave × envelope × volume >> 24.
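
A 16-bit Galois LFSR with taps 16/14/13/11 can be sketched as follows. This is a Python model, not the jz source. The right-shift form with toggle mask 0xB400 is the common textbook arrangement for these taps, and both that shift direction and the seed 0xACE1 are assumptions, since this page does not state them.

```python
TAP_MASK = 0xB400   # bits 16, 14, 13, 11 (1-based) of a 16-bit register


def lfsr_step(state: int) -> int:
    """Advance the Galois LFSR by one step (right-shifting form)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAP_MASK     # apply the taps when a 1 is shifted out
    return state


def lfsr_period(seed: int) -> int:
    """Count steps until the register returns to the seed."""
    state = lfsr_step(seed)
    steps = 1
    while state != seed:
        state = lfsr_step(state)
        steps += 1
    return steps


seed = 0xACE1                 # any non-zero seed works
print(hex(lfsr_step(seed)), lfsr_period(seed))
```

An all-zero state never changes, so a hardware implementation has to reset the register to a non-zero value.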

aud_mixer — 8-Channel Stereo Mixer

Sums 8 channels with per-channel pan (0=left, 128=center, 255=right). Each channel is scaled by (255-pan) for left and pan for right. Master volume applied via smul. Output clamped to ±0x7FFF on overflow.
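
The pan weighting and clamping can be modeled in a few lines. The Python sketch below makes several assumptions not stated on this page: samples are signed integers, the pan products are normalized with a right shift by 8, and master volume is left out. The function names are hypothetical.

```python
LIMIT = 0x7FFF


def clamp16(value: int) -> int:
    """Clamp to the symmetric range -0x7FFF .. +0x7FFF."""
    return max(-LIMIT, min(LIMIT, value))


def mix_stereo(samples, pans):
    """Mix per-channel samples into a (left, right) pair.

    pans: 0 = full left, 128 = center, 255 = full right.
    """
    left = sum(s * (255 - p) for s, p in zip(samples, pans)) >> 8
    right = sum(s * p for s, p in zip(samples, pans)) >> 8
    return clamp16(left), clamp16(right)


print(mix_stereo([1000], [0]))          # hard left
print(mix_stereo([1000], [128]))        # near center
print(mix_stereo([30000] * 8, [0] * 8)) # overflow is clamped
```

Because the weights are (255 - pan) and pan, a pan of 128 gives slightly unequal left and right gains (127 versus 128).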

cpu_accumulator — Alternative Simple CPU

A 32-bit accumulator-based CPU with A/X registers, 16-bit stack pointer, and flags (Z/C/N). Included as a simpler alternative to the RISC-V for testing. Same bus interface.

por — Power-On Reset

16-cycle delay after DONE assertion before releasing reset.

jz
@project(CHIP="GW2AR-18-QN88-C8-I7") SIMPLE_SOC
    @import "global.jz"
    @import "soc.jz"
    @import "rv_cpu.jz"
    @import "rv_regfile.jz"
    @import "rv_alu.jz"
    @import "rv_csr.jz"
    @import "rv_muldiv.jz"
    @import "por.jz"
    @import "rom.jz"
    @import "block_ram.jz"
    @import "led_out.jz"
    @import "arbiter.jz"
    @import "uart_tx.jz"
    @import "uart.jz"
    @import "uart_rx.jz"
    @import "sdram.jz"
    @import "sdram_bus.jz"
    @import "terminal.jz"
    @import "video.jz"
    @import "video_timing.jz"
    @import "tmds_encoder.jz"
    @import "sdcard.jz"
    @import "aud_gen.jz"
    @import "aud_mixer.jz"
    @import "audio.jz"

    CONFIG {
        DATA_WIDTH = 32;
        ADDR_WIDTH = 32;
        CLK_FREQ_MHZ = 74;
        SDRAM_SIZE_BYTES = 8388608;
    }

    CLOCKS {
        SCLK = { period=37.037 };   // 27MHz crystal
        sys_clk;                    // 74.25MHz system (from CLKDIV)
        serial_clk;                 // 371.25MHz (5x pixel clock, from PLL)
        pixel_clk;                  // 74.25MHz pixel clock (from CLKDIV)
    }

    IN_PINS {
        SCLK = { standard=LVCMOS33 };
        DONE = { standard=LVCMOS33 };
        KEY[2] = { standard=LVCMOS33 };
        UART_RX = { standard=LVCMOS33 };
        SDIO_D0 = { standard=LVCMOS33 };
    }

    OUT_PINS {
        LED[6] = { standard=LVCMOS33, drive=8 };
        UART_TX = { standard=LVCMOS33, drive=8 };
        SDIO_CLK = { standard=LVCMOS33, drive=8 };
        SDIO_CMD = { standard=LVCMOS33, drive=8 };
        SDIO_D3  = { standard=LVCMOS33, drive=8 };
        TMDS_CLK     = { mode=DIFFERENTIAL, standard=LVDS25, drive=3.5, width=10, fclk = serial_clk, pclk = pixel_clk, reset = pll_lock };
        TMDS_DATA[3] = { mode=DIFFERENTIAL, standard=LVDS25, drive=3.5, width=10, fclk = serial_clk, pclk = pixel_clk, reset = pll_lock };
        O_sdram_clk      = { standard=LVCMOS33, drive=8 };
        O_sdram_cke      = { standard=LVCMOS33, drive=8 };
        O_sdram_cs_n     = { standard=LVCMOS33, drive=8 };
        O_sdram_cas_n    = { standard=LVCMOS33, drive=8 };
        O_sdram_ras_n    = { standard=LVCMOS33, drive=8 };
        O_sdram_wen_n    = { standard=LVCMOS33, drive=8 };
        O_sdram_dqm[4]   = { standard=LVCMOS33, drive=8 };
        O_sdram_addr[11] = { standard=LVCMOS33, drive=8 };
        O_sdram_ba[2]    = { standard=LVCMOS33, drive=8 };
    }

    INOUT_PINS {
        IO_sdram_dq[32] = { standard=LVCMOS33, drive=8 };
    }

    MAP {
        // System Clock (27MHz)
        SCLK = 4;

        // 2 Buttons (active low)
        KEY[0] = 88;
        KEY[1] = 87;

        // 6 LEDs (active low)
        LED[0] = 15;
        LED[1] = 16;
        LED[2] = 17;
        LED[3] = 18;
        LED[4] = 19;
        LED[5] = 20;

        // UART
        UART_TX = 69;
        UART_RX = 70;

        // SDCard
        SDIO_D3  = 81;
        SDIO_D0  = 84;
        SDIO_CLK = 83;
        SDIO_CMD = 82;

        // DVI TMDS differential pairs
        TMDS_CLK     = { P=33, N=34 };
        TMDS_DATA[0] = { P=35, N=36 };
        TMDS_DATA[1] = { P=37, N=38 };
        TMDS_DATA[2] = { P=39, N=40 };

        // DONE (POR signal)
        DONE = IOR32B;

        // SDRAM
        O_sdram_clk      = IOR11B;
        O_sdram_cke      = IOL13A;
        O_sdram_cs_n     = IOL14B;
        O_sdram_cas_n    = IOL14A;
        O_sdram_ras_n    = IOL13B;
        O_sdram_wen_n    = IOL12B;
        O_sdram_dqm[0]   = IOL12A;
        O_sdram_dqm[1]   = IOR11A;
        O_sdram_dqm[2]   = IOL18A;
        O_sdram_dqm[3]   = IOR15B;
        O_sdram_addr[0]  = IOR14A;
        O_sdram_addr[1]  = IOR13B;
        O_sdram_addr[2]  = IOR14B;
        O_sdram_addr[3]  = IOR15A;
        O_sdram_addr[4]  = IOL16B;
        O_sdram_addr[5]  = IOL17B;
        O_sdram_addr[6]  = IOL16A;
        O_sdram_addr[7]  = IOL17A;
        O_sdram_addr[8]  = IOL15B;
        O_sdram_addr[9]  = IOL15A;
        O_sdram_addr[10] = IOR12B;
        O_sdram_ba[0]    = IOR13A;
        O_sdram_ba[1]    = IOR12A;
        IO_sdram_dq[0]   = IOL3A;
        IO_sdram_dq[1]   = IOL3B;
        IO_sdram_dq[2]   = IOL8A;
        IO_sdram_dq[3]   = IOL8B;
        IO_sdram_dq[4]   = IOL9A;
        IO_sdram_dq[5]   = IOL9B;
        IO_sdram_dq[6]   = IOL11A;
        IO_sdram_dq[7]   = IOL11B;
        IO_sdram_dq[8]   = IOR9B;
        IO_sdram_dq[9]   = IOR9A;
        IO_sdram_dq[10]  = IOR5B;
        IO_sdram_dq[11]  = IOR6A;
        IO_sdram_dq[12]  = IOR5A;
        IO_sdram_dq[13]  = IOR4B;
        IO_sdram_dq[14]  = IOR3B;
        IO_sdram_dq[15]  = IOR3A;
        IO_sdram_dq[16]  = IOL39B;
        IO_sdram_dq[17]  = IOL39A;
        IO_sdram_dq[18]  = IOL35B;
        IO_sdram_dq[19]  = IOL35A;
        IO_sdram_dq[20]  = IOL30B;
        IO_sdram_dq[21]  = IOL30A;
        IO_sdram_dq[22]  = IOL20A;
        IO_sdram_dq[23]  = IOL18B;
        IO_sdram_dq[24]  = IOR17A;
        IO_sdram_dq[25]  = IOR16A;
        IO_sdram_dq[26]  = IOR16B;
        IO_sdram_dq[27]  = IOR17B;
        IO_sdram_dq[28]  = IOR18A;
        IO_sdram_dq[29]  = IOR18B;
        IO_sdram_dq[30]  = IOR44A;
        IO_sdram_dq[31]  = IOR44B;
    }

    BUS SIMPLE_BUS {
        OUT   [32]    ADDR;
        OUT   [1]     CMD;
        OUT   [1]     VALID;
        INOUT [32]    DATA;
        IN    [1]     DONE;
    }

    CLOCK_GEN {
        PLL {
            IN REF_CLK SCLK;          // 27MHz crystal
            OUT BASE  serial_clk;    // 371.25 MHz (5x pixel clock)
            WIRE LOCK pll_lock;

            CONFIG {
                IDIV = 3;            // divider = 4
                FBDIV = 54;          // multiplier = 55
                ODIV = 2;            // VCO = 371.25 * 2 = 742.5 MHz
            };
        };

        // ┌────────────┬───────────┐
        // │ Resolution │  Refresh  │
        // ├────────────┼───────────┤
        // │ 1080p      │ 30Hz      │
        // │  720p      │ 60Hz      │
        // └────────────┴───────────┘
        CLKDIV {
            IN REF_CLK serial_clk;
            OUT BASE  pixel_clk;     // 371.25 / 5 = 74.25 MHz

            CONFIG {
                DIV_MODE = 5;
            };
        };

        // ┌──────────┬───────────────┐
        // │ DIV_MODE │    sys_clk    │
        // ├──────────┼───────────────┤
        // │ 2        │ 185.625 MHz   │
        // │ 3.5      │ 106.07 MHz    │
        // │ 4        │ 92.8125 MHz   │
        // │ 5        │ 74.25 MHz     │
        // │ 8        │ 46.41 MHz     │
        // └──────────┴───────────────┘
        // Note: don't forget to set CLK_FREQ_MHZ above
        CLKDIV {
            IN REF_CLK serial_clk;
            OUT BASE  sys_clk;

            CONFIG {
                DIV_MODE = 5;
            };
        };
    }

    @top SOC {
        IN  [1] sclk  = sys_clk;
        IN  [1] rst_n = ~KEY[0];
        IN  [1] done  = DONE;
        IN  [1] pixel_clk = pixel_clk;
        OUT [6] leds  = LED;
        OUT [1] tx    = UART_TX;
        IN  [1] rx    = UART_RX;
        OUT  [10] tmds_clk   = TMDS_CLK;
        OUT  [10] tmds_d0    = TMDS_DATA[0];
        OUT  [10] tmds_d1    = TMDS_DATA[1];
        OUT  [10] tmds_d2    = TMDS_DATA[2];
        OUT  [1]  sd_clk_pin   = SDIO_CLK;
        OUT  [1]  sd_mosi_pin  = SDIO_CMD;
        IN   [1]  sd_miso_pin  = SDIO_D0;
        OUT  [1]  sd_cs_n_pin  = SDIO_D3;
        OUT  [1]  sdram_cke    = O_sdram_cke;
        OUT  [1]  sdram_cs_n   = O_sdram_cs_n;
        OUT  [1]  sdram_ras_n  = O_sdram_ras_n;
        OUT  [1]  sdram_cas_n  = O_sdram_cas_n;
        OUT  [1]  sdram_wen_n  = O_sdram_wen_n;
        OUT  [4]  sdram_dqm    = O_sdram_dqm;
        OUT  [11] sdram_addr   = O_sdram_addr;
        OUT  [2]  sdram_ba     = O_sdram_ba;
        INOUT [32] sdram_dq    = IO_sdram_dq;
        OUT  [1]  sdram_clk_out = O_sdram_clk;
    }
@endproj
jz
// Opcodes (8-bit)
@global OP
    NOP   = 8'h00;
    LDI_A = 8'h01;
    LDI_X = 8'h02;
    LD_A  = 8'h03;
    ST_A  = 8'h04;
    ADD   = 8'h05;
    SUB   = 8'h06;
    AND   = 8'h07;
    OR    = 8'h08;
    XOR   = 8'h09;
    CMP   = 8'h0A;
    JMP   = 8'h0B;
    BEQ   = 8'h0C;
    BNE   = 8'h0D;
    PUSH  = 8'h0E;
    POP   = 8'h0F;
    CALL  = 8'h10;
    RET   = 8'h11;
    HLT   = 8'h12;
    INC   = 8'h13;
    DEC   = 8'h14;
    SHL   = 8'h15;
    SHR   = 8'h16;
    LD_X  = 8'h17;
    ST_X  = 8'h18;
@endglob

// CPU state machine states
@global STATE
    FETCH       = 4'b0000;
    WAIT_FETCH  = 4'b0001;
    DECODE      = 4'b0010;
    EXECUTE     = 4'b0011;
    MEM_READ    = 4'b0100;
    MEM_WAIT    = 4'b0101;
    WRITEBACK   = 4'b0110;
    PUSH_EXEC   = 4'b0111;
    POP_EXEC    = 4'b1000;
    CALL_PUSH   = 4'b1001;
    RET_POP     = 4'b1010;
    HALT        = 4'b1111;
@endglob

// Audio waveform types
@global WAVE
    SQUARE   = 3'd0;
    TRIANGLE = 3'd1;
    SAWTOOTH = 3'd2;
    NOISE    = 3'd3;
@endglob

// Audio envelope states
@global ENV
    IDLE    = 3'd0;
    ATTACK  = 3'd1;
    DECAY   = 3'd2;
    SUSTAIN = 3'd3;
    RELEASE = 3'd4;
@endglob

// Bus commands
@global CMD
    READ  = 1'b0;
    WRITE = 1'b1;
@endglob
jz
@module SOC
    PORT {
        IN  [1] sclk;
        IN  [1] rst_n;
        IN  [1] done;
        IN  [1] pixel_clk;
        OUT [6] leds;
        OUT [1] tx;
        IN  [1] rx;

        // HDMI/DVI TMDS outputs
        OUT [10] tmds_clk;
        OUT [10] tmds_d0;
        OUT [10] tmds_d1;
        OUT [10] tmds_d2;

        // SDRAM physical interface
        OUT   [1]  sdram_cke;
        OUT   [1]  sdram_cs_n;
        OUT   [1]  sdram_ras_n;
        OUT   [1]  sdram_cas_n;
        OUT   [1]  sdram_wen_n;
        OUT   [4]  sdram_dqm;
        OUT   [11] sdram_addr;
        OUT   [2]  sdram_ba;
        INOUT [32] sdram_dq;
        OUT   [1]  sdram_clk_out;

        // SD card SPI pins
        OUT [1] sd_clk_pin;
        OUT [1] sd_mosi_pin;
        IN  [1] sd_miso_pin;
        OUT [1] sd_cs_n_pin;
    }

    WIRE {
        por_n       [1];
        reset       [1];
        cpu_bus     [widthof(SIMPLE_BUS)];
        rom_bus     [widthof(SIMPLE_BUS)];
        ram_bus     [widthof(SIMPLE_BUS)];
        led_bus     [widthof(SIMPLE_BUS)];
        uart_bus    [widthof(SIMPLE_BUS)];
        sdram_bus   [widthof(SIMPLE_BUS)];
        term_bus    [widthof(SIMPLE_BUS)];
        sd_bus      [widthof(SIMPLE_BUS)];
        audio_bus   [widthof(SIMPLE_BUS)];
        led_sw      [6];
        uart_tx_pin    [1];
        uart_rx_pin    [1];
        uart_irq_tx    [1];
        uart_irq_rx    [1];
        cpu_irq_lines  [32];
        sdcard_irq     [1];
        audio_irq      [1];
        sd_clk_w       [1];
        sd_mosi_w      [1];
        sd_cs_n_w      [1];

        // Video mode wire
        video_mode_w  [1];

        // Baud rate divider
        baud_div_w    [16];

        // Video read interface wires
        vram_addr     [12];
        vram_char     [8];
        vram_attr     [32];

        // Cursor wires
        cursor_pos_w  [12];
        cursor_style_w [3];

        // TMDS output wires
        tmds_clk_w    [10];
        tmds_d0_w     [10];
        tmds_d1_w     [10];
        tmds_d2_w     [10];

        // SDRAM internal wires
        sdram_cke_w   [1];
        sdram_cs_n_w  [1];
        sdram_ras_n_w [1];
        sdram_cas_n_w [1];
        sdram_wen_n_w [1];
        sdram_dqm_w   [4];
        sdram_addr_w  [11];
        sdram_ba_w    [2];
        sdram_dq_w    [32];
    }

    @new por0 por {
        IN  [1] clk = sclk;
        IN  [1] done = done;
        OUT [1] por_n = por_n;
    }

    // Address map (4-bit decode on addr[31:28]):
    // ROM:   0x0000_0000 - 0x0000_3FFF  addr[31:28]=0x0 → val=0000 care=1111 = 8'h0F
    // RAM:   0x1000_0000 - 0x1000_4FFF  addr[31:28]=0x1 → val=0001 care=1111 = 8'h1F
    // LED:   0x2000_0000               addr[31:28]=0x2 → val=0010 care=1111 = 8'h2F
    // UART:  0x3000_0000               addr[31:28]=0x3 → val=0011 care=1111 = 8'h3F
    // SDRAM: 0x4000_0000 - 0x407F_FFFF  addr[31:28]=0x4 → val=0100 care=1111 = 8'h4F
    // TERM:  0x5000_0000               addr[31:28]=0x5 → val=0101 care=1111 = 8'h5F
    // SD:    0x6000_0000               addr[31:28]=0x6 → val=0110 care=1111 = 8'h6F
    // AUDIO: 0x7000_0000               addr[31:28]=0x7 → val=0111 care=1111 = 8'h7F
    @new arb0 arbiter {
        OVERRIDE {
            TARGET_COUNT = 8;
        }
        IN  [64] map_config = {8'h7F, 8'h6F, 8'h5F, 8'h4F, 8'h3F, 8'h2F, 8'h1F, 8'h0F};
        BUS SIMPLE_BUS TARGET [1] src = {cpu_bus};
        BUS SIMPLE_BUS SOURCE [8] tgt = {audio_bus, sd_bus, term_bus, sdram_bus, uart_bus, led_bus, ram_bus, rom_bus};
    }

    @new cpu0 cpu {
        OVERRIDE {
            CLK_FREQ_MHZ = CONFIG.CLK_FREQ_MHZ;
            SDRAM_SIZE_BYTES = CONFIG.SDRAM_SIZE_BYTES;
        }
        IN  [1] clk = sclk;
        IN  [1] rst_n = reset;
        IN  [32] irq_lines = cpu_irq_lines;
        BUS SIMPLE_BUS SOURCE pbus = cpu_bus;
        OUT [1]  video_mode = video_mode_w;
        OUT [16] baud_div = baud_div_w;
    }

    @new rom0 rom {
        IN  [1] clk = sclk;
        IN  [1] rst_n = reset;
        BUS SIMPLE_BUS TARGET pbus = rom_bus;
    }

    @new ram0 ram {
        IN  [1] clk = sclk;
        IN  [1] rst_n = reset;
        BUS SIMPLE_BUS TARGET pbus = ram_bus;
    }

    @new led0 led_out {
        IN  [1] clk = sclk;
        IN  [1] rst_n = reset;
        BUS SIMPLE_BUS TARGET pbus = led_bus;
        OUT [6] leds = led_sw;
    }

    @new uart0 uart {
        IN  [1] clk = sclk;
        IN  [1] rst_n = reset;
        BUS SIMPLE_BUS TARGET pbus = uart_bus;
        OUT [1] tx = uart_tx_pin;
        IN  [1] rx = uart_rx_pin;
        OUT [1] irq_tx_ready = uart_irq_tx;
        OUT [1] irq_rx_data  = uart_irq_rx;
        IN  [16] baud_div = baud_div_w;
    }

    @new sdram0 sdram_bus {
        OVERRIDE {
            CLK_FREQ_MHZ = CONFIG.CLK_FREQ_MHZ;
        }
        IN  [1]  clk       = sclk;
        IN  [1]  rst_n     = reset;
        BUS SIMPLE_BUS TARGET pbus = sdram_bus;
        OUT   [1]  sdram_cke   = sdram_cke_w;
        OUT   [1]  sdram_cs_n  = sdram_cs_n_w;
        OUT   [1]  sdram_ras_n = sdram_ras_n_w;
        OUT   [1]  sdram_cas_n = sdram_cas_n_w;
        OUT   [1]  sdram_wen_n = sdram_wen_n_w;
        OUT   [4]  sdram_dqm   = sdram_dqm_w;
        OUT   [11] sdram_addr  = sdram_addr_w;
        OUT   [2]  sdram_ba    = sdram_ba_w;
        INOUT [32] sdram_dq    = sdram_dq_w;
    }

    @new term0 terminal_fb {
        IN  [1]  clk       = sclk;
        IN  [1]  rst_n     = reset;
        IN  [1]  pixel_clk = pixel_clk;
        BUS SIMPLE_BUS TARGET pbus = term_bus;
        IN  [12] vram_addr = vram_addr;
        OUT [8]  vram_char = vram_char;
        OUT [32] vram_attr = vram_attr;
        OUT [12] cursor_pos   = cursor_pos_w;
        OUT [3]  cursor_style = cursor_style_w;
    }

    @new sd0 sdcard {
        IN  [1] clk    = sclk;
        IN  [1] rst_n  = reset;
        BUS SIMPLE_BUS TARGET pbus = sd_bus;
        OUT [1] irq    = sdcard_irq;
        OUT [1] sd_clk  = sd_clk_w;
        OUT [1] sd_mosi = sd_mosi_w;
        IN  [1] sd_miso = sd_miso_pin;
        OUT [1] sd_cs_n = sd_cs_n_w;
    }

    @new aud0 audio {
        IN  [1] clk   = sclk;
        IN  [1] rst_n = reset;
        BUS SIMPLE_BUS TARGET pbus = audio_bus;
        OUT [1] irq   = audio_irq;
    }

    @new vid0 video_out {
        IN  [1]  pixel_clk  = pixel_clk;
        IN  [1]  rst_n      = reset;
        IN  [1]  video_mode = video_mode_w;
        OUT [12] vram_addr  = vram_addr;
        IN  [8]  vram_char  = vram_char;
        IN  [32] vram_attr  = vram_attr;
        IN  [12] cursor_pos   = cursor_pos_w;
        IN  [3]  cursor_style = cursor_style_w;
        OUT [10] tmds_clk   = tmds_clk_w;
        OUT [10] tmds_d0    = tmds_d0_w;
        OUT [10] tmds_d1    = tmds_d1_w;
        OUT [10] tmds_d2    = tmds_d2_w;
    }

    ASYNCHRONOUS {
        reset <= rst_n & por_n;

        // IRQ lines: bit 0 = UART TX ready, bit 1 = UART RX data, bit 2 = SD card, bit 3 = audio
        cpu_irq_lines <= {28'd0, audio_irq, sdcard_irq, uart_irq_rx, uart_irq_tx};

        uart_rx_pin <= rx;

        // Active-low LEDs, all 6 software controlled
        leds <= ~led_sw;

        tx <= uart_tx_pin;

        // SDRAM physical pins
        sdram_cke   <= sdram_cke_w;
        sdram_cs_n  <= sdram_cs_n_w;
        sdram_ras_n <= sdram_ras_n_w;
        sdram_cas_n <= sdram_cas_n_w;
        sdram_wen_n <= sdram_wen_n_w;
        sdram_dqm   <= sdram_dqm_w;
        sdram_addr  <= sdram_addr_w;
        sdram_ba    <= sdram_ba_w;
        sdram_dq    = sdram_dq_w;

        // SDRAM clock (inverted for setup/hold margin)
        sdram_clk_out <= ~sclk;

        // SD card SPI pins
        sd_clk_pin  <= sd_clk_w;
        sd_mosi_pin <= sd_mosi_w;
        sd_cs_n_pin <= sd_cs_n_w;

        // HDMI/DVI TMDS outputs
        tmds_clk <= tmds_clk_w;
        tmds_d0  <= tmds_d0_w;
        tmds_d1  <= tmds_d1_w;
        tmds_d2  <= tmds_d2_w;
    }
@endmod
jz
// RV32IM CPU (RV32I base integer ISA plus the M multiply/divide extension)
// 32-bit RISC-V, multi-cycle implementation
// Register file: x0=zero, x1-x31 general purpose
// Bus interface: 32-bit byte address, 32-bit data

// CPU state machine
@global RVS
    FETCH      = 4'h0;
    WAIT_FETCH = 4'h1;
    DECODE     = 4'h2;
    EXECUTE    = 4'h3;
    MEM_WAIT   = 4'h4;
    WRITEBACK  = 4'h5;
    RMW_WAIT   = 4'h6;
    MULDIV_WAIT = 4'h7;
    HALT       = 4'hF;
@endglob

// RV32I opcode groups (instr[6:0])
@global RVO
    LUI     = 7'b0110111;
    AUIPC   = 7'b0010111;
    JAL     = 7'b1101111;
    JALR    = 7'b1100111;
    BRANCH  = 7'b1100011;
    LOAD    = 7'b0000011;
    STORE   = 7'b0100011;
    ALU_IMM = 7'b0010011;
    ALU_REG = 7'b0110011;
    FENCE   = 7'b0001111;
    SYSTEM  = 7'b1110011;
@endglob

// RV32I funct3 values
@global F3
    ADD  = 3'b000;
    SLL  = 3'b001;
    SLT  = 3'b010;
    SLTU = 3'b011;
    XOR  = 3'b100;
    SRL  = 3'b101;
    OR   = 3'b110;
    AND  = 3'b111;
    BEQ  = 3'b000;
    BNE  = 3'b001;
    BLT  = 3'b100;
    BGE  = 3'b101;
    BLTU = 3'b110;
    BGEU = 3'b111;
    LB   = 3'b000;
    LH   = 3'b001;
    LW   = 3'b010;
    LBU  = 3'b100;
    LHU  = 3'b101;
@endglob

// CSR funct3 values (SYSTEM opcode)
@global CF3
    CSRRW  = 3'b001;
    CSRRS  = 3'b010;
    CSRRC  = 3'b011;
    CSRRWI = 3'b101;
    CSRRSI = 3'b110;
    CSRRCI = 3'b111;
@endglob

@module cpu
    CONST {
        CLK_FREQ_MHZ = 54;
        SDRAM_SIZE_BYTES = 0;
    }

    PORT {
        IN  [1] clk;
        IN  [1] rst_n;
        IN  [32] irq_lines;
        BUS SIMPLE_BUS SOURCE pbus;
        OUT [1] video_mode;
        OUT [16] baud_div;
    }

    REGISTER {
        // Program counter (byte address)
        pc       [32] = 32'h00000000;

        // State machine
        state    [4]  = 4'h0;

        // Instruction register
        instr    [32] = 32'h00000000;

        // Decoded register values
        rs1_val  [32] = 32'h00000000;
        rs2_val  [32] = 32'h00000000;

        // Decoded immediate
        imm_val  [32] = 32'h00000000;

        // Writeback
        wb_data  [32] = 32'h00000000;
        wb_rd    [5]  = 5'd0;
        next_pc  [32] = 32'h00000000;

        // Memory access tracking
        mem_funct3  [3] = 3'b000;
        mem_addr_lo [2] = 2'b00;

        // Bus control registers
        bus_addr  [32] = 32'h00000000;
        bus_data  [32] = 32'h00000000;
        bus_cmd   [1]  = 1'b0;
        bus_valid [1]  = 1'b0;

        // Shadow register bank select (0=normal, 1=trap/ISR)
        shadow_mode [1] = 1'b0;
    }

    WIRE {
        // Decoded instruction fields
        opcode    [7];
        rd        [5];
        funct3    [3];
        rs1_addr  [5];
        rs2_addr  [5];
        funct7    [7];

        // Register file interconnect
        rf_rs1_data  [32];
        rf_rs2_data  [32];
        rf_wr_en_w   [1];
        rf_wr_addr_w [5];
        rf_wr_data_w [32];

        // ALU interconnect
        alu_a_w      [32];
        alu_b_w      [32];
        alu_f3_w     [3];
        alu_alt_w    [1];
        alu_result   [32];

        // CSR interconnect
        csr_rd_addr_w    [12];
        csr_rd_data      [32];
        csr_wr_addr_w    [12];
        csr_wr_data_w    [32];
        csr_wr_en_w      [1];
        csr_trap_enter_w [1];
        csr_trap_epc_w   [32];
        csr_trap_cause_w [32];
        csr_trap_mret_w  [1];
        csr_mtvec        [32];
        csr_mepc         [32];
        csr_irqvec       [32];
        csr_sdcardvec    [32];
        csr_mstatus_mie  [1];
        csr_mie_meie     [1];
        irq_pending      [1];
        csr_new_val      [32];
        csr_zimm         [32];
        csr_video_mode   [1];
        csr_baud_div     [16];

        // Multiply/divide interconnect
        md_start  [1];
        md_result [32];
        md_done   [1];
    }

    @new rf0 rv_regfile {
        IN  [1]  clk      = clk;
        IN  [1]  rst_n    = rst_n;
        IN  [5]  rs1_addr = rs1_addr;
        IN  [5]  rs2_addr = rs2_addr;
        OUT [32] rs1_data = rf_rs1_data;
        OUT [32] rs2_data = rf_rs2_data;
        IN  [5]  wr_addr  = rf_wr_addr_w;
        IN  [32] wr_data  = rf_wr_data_w;
        IN  [1]  wr_en    = rf_wr_en_w;
        IN  [1]  shadow   = shadow_mode;
    }

    @new alu0 rv_alu {
        IN  [32] a      = alu_a_w;
        IN  [32] b      = alu_b_w;
        IN  [3]  funct3 = alu_f3_w;
        IN  [1]  alt    = alu_alt_w;
        OUT [32] result = alu_result;
    }

    @new csr0 rv_csr {
        OVERRIDE {
            CLK_FREQ_MHZ = CLK_FREQ_MHZ;
            SDRAM_SIZE_BYTES = SDRAM_SIZE_BYTES;
        }
        IN  [1]  clk         = clk;
        IN  [1]  rst_n       = rst_n;
        IN  [12] rd_addr     = csr_rd_addr_w;
        OUT [32] rd_data     = csr_rd_data;
        IN  [12] wr_addr     = csr_wr_addr_w;
        IN  [32] wr_data     = csr_wr_data_w;
        IN  [1]  wr_en       = csr_wr_en_w;
        IN  [1]  trap_enter  = csr_trap_enter_w;
        IN  [32] trap_epc    = csr_trap_epc_w;
        IN  [32] trap_cause  = csr_trap_cause_w;
        IN  [1]  trap_mret   = csr_trap_mret_w;
        OUT [32] mtvec_out     = csr_mtvec;
        OUT [32] mepc_out     = csr_mepc;
        OUT [32] irqvec_out   = csr_irqvec;
        OUT [32] sdcardvec_out = csr_sdcardvec;
        OUT [1]  mstatus_mie = csr_mstatus_mie;
        OUT [1]  mie_meie    = csr_mie_meie;
        IN  [32] irq_lines   = irq_lines;
        OUT [1]  video_mode  = csr_video_mode;
        OUT [16] baud_div    = csr_baud_div;
    }

    @new md0 rv_muldiv {
        IN  [1]  clk    = clk;
        IN  [1]  rst_n  = rst_n;
        IN  [32] a      = rs1_val;
        IN  [32] b      = rs2_val;
        IN  [3]  funct3 = funct3;
        IN  [1]  start  = md_start;
        OUT [32] result = md_result;
        OUT [1]  done   = md_done;
    }

    ASYNCHRONOUS {
        // Instruction field decode
        opcode   = instr[6:0];
        rd       = instr[11:7];
        funct3   = instr[14:12];
        rs1_addr = instr[19:15];
        rs2_addr = instr[24:20];
        funct7   = instr[31:25];

        // Drive bus signals
        pbus.ADDR  <= bus_addr;
        pbus.DATA  <= (bus_valid == 1'b1 && bus_cmd == CMD.WRITE) ? bus_data : 32'bz;
        pbus.CMD   <= bus_cmd;
        pbus.VALID <= bus_valid;

        // ALU inputs
        alu_a_w   <= rs1_val;
        alu_b_w   <= (opcode == RVO.ALU_REG) ? rs2_val : imm_val;
        alu_f3_w  <= funct3;
        alu_alt_w <= (opcode == RVO.ALU_REG || funct3 == F3.SRL) ? instr[30] : 1'b0;

        // Register file write control (active during WRITEBACK)
        rf_wr_en_w   <= (state == RVS.WRITEBACK) ? 1'b1 : 1'b0;
        rf_wr_addr_w <= wb_rd;
        rf_wr_data_w <= wb_data;

        // CSR field decode
        csr_zimm <= {27'd0, rs1_addr};

        // CSR read address (combinational read)
        csr_rd_addr_w <= instr[31:20];

        // Video mode output
        video_mode <= csr_video_mode;

        // Baud rate divider output
        baud_div <= csr_baud_div;

        // Multiply/divide start signal
        md_start <= (state == RVS.EXECUTE && opcode == RVO.ALU_REG && funct7 == 7'b0000001) ? 1'b1 : 1'b0;

        // Interrupt pending: any IRQ line active & external IRQ enabled & global enable
        irq_pending <= (irq_lines != 32'h00000000) ? (csr_mie_meie & csr_mstatus_mie) : 1'b0;

        // CSR write value computation (read-modify-write)
        IF (funct3 == CF3.CSRRW) {
            csr_new_val <= rs1_val;
        } ELIF (funct3 == CF3.CSRRS) {
            csr_new_val <= csr_rd_data | rs1_val;
        } ELIF (funct3 == CF3.CSRRC) {
            csr_new_val <= csr_rd_data & ~rs1_val;
        } ELIF (funct3 == CF3.CSRRWI) {
            csr_new_val <= csr_zimm;
        } ELIF (funct3 == CF3.CSRRSI) {
            csr_new_val <= csr_rd_data | csr_zimm;
        } ELIF (funct3 == CF3.CSRRCI) {
            csr_new_val <= csr_rd_data & ~csr_zimm;
        } ELSE {
            csr_new_val <= 32'h00000000;
        }

        // CSR write enable: active during EXECUTE for any nonzero funct3
        // (the reserved encoding 3'b100 is not filtered out and writes zero)
        csr_wr_en_w   <= (state == RVS.EXECUTE && opcode == RVO.SYSTEM && funct3 != 3'b000) ? 1'b1 : 1'b0;
        csr_wr_addr_w <= instr[31:20];
        csr_wr_data_w <= csr_new_val;

        // Trap signals (combinational)
        // Trap enter: interrupt at FETCH, or ECALL/EBREAK at EXECUTE
        IF (state == RVS.FETCH && irq_pending == 1'b1) {
            csr_trap_enter_w <= 1'b1;
            csr_trap_cause_w <= 32'h8000000B;
        } ELIF (state == RVS.EXECUTE && opcode == RVO.SYSTEM && funct3 == 3'b000 && instr[31:20] != 12'h302 && instr[31:20] != 12'h105) {
            csr_trap_enter_w <= 1'b1;
            IF (instr[20] == 1'b1) {
                // EBREAK (imm=0x001)
                csr_trap_cause_w <= 32'h00000003;
            } ELSE {
                // ECALL (imm=0x000)
                csr_trap_cause_w <= 32'h0000000B;
            }
        } ELSE {
            csr_trap_enter_w <= 1'b0;
            csr_trap_cause_w <= 32'h00000000;
        }
        csr_trap_epc_w  <= pc;
        csr_trap_mret_w <= (state == RVS.EXECUTE && opcode == RVO.SYSTEM && funct3 == 3'b000 && instr[31:20] == 12'h302) ? 1'b1 : 1'b0;
    }

    SYNCHRONOUS(CLK = clk RESET = rst_n RESET_ACTIVE = Low) {
        // ============================================================
        // FETCH: Start instruction fetch at PC (or take interrupt)
        // ============================================================
        IF (state == RVS.FETCH) {
            IF (irq_pending == 1'b1) {
                // Trap: CSR module saves mepc/mcause, clears MIE
                pc          <= csr_mtvec;
                shadow_mode <= 1'b1;
                state       <= RVS.FETCH;
            } ELSE {
                bus_addr  <= pc;
                bus_cmd   <= CMD.READ;
                bus_valid <= 1'b1;
                state     <= RVS.WAIT_FETCH;
            }

        // ============================================================
        // WAIT_FETCH: Wait for bus DONE, latch instruction
        // ============================================================
        } ELIF (state == RVS.WAIT_FETCH) {
            IF (pbus.DONE == 1'b1) {
                instr     <= pbus.DATA;
                bus_valid <= 1'b0;
                state     <= RVS.DECODE;
            }

        // ============================================================
        // DECODE: Latch register values, compute immediate
        // ============================================================
        } ELIF (state == RVS.DECODE) {
            // Latch register file read outputs
            rs1_val <= rf_rs1_data;
            rs2_val <= rf_rs2_data;

            // --- Decode immediate based on instruction type ---
            IF (opcode == RVO.LUI || opcode == RVO.AUIPC) {
                // U-type: {instr[31:12], 12'h000}
                imm_val <= {instr[31:12], 12'h000};
            } ELIF (opcode == RVO.JAL) {
                // J-type
                IF (instr[31] == 1'b1) {
                    imm_val <= {11'h7FF, instr[31], instr[19:12], instr[20], instr[30:21], 1'b0};
                } ELSE {
                    imm_val <= {11'h000, instr[31], instr[19:12], instr[20], instr[30:21], 1'b0};
                }
            } ELIF (opcode == RVO.BRANCH) {
                // B-type
                IF (instr[31] == 1'b1) {
                    imm_val <= {19'h7FFFF, instr[31], instr[7], instr[30:25], instr[11:8], 1'b0};
                } ELSE {
                    imm_val <= {19'h00000, instr[31], instr[7], instr[30:25], instr[11:8], 1'b0};
                }
            } ELIF (opcode == RVO.STORE) {
                // S-type
                IF (instr[31] == 1'b1) {
                    imm_val <= {20'hFFFFF, instr[31:25], instr[11:7]};
                } ELSE {
                    imm_val <= {20'h00000, instr[31:25], instr[11:7]};
                }
            } ELSE {
                // I-type (LOAD, ALU_IMM, JALR, FENCE, SYSTEM)
                IF (instr[31] == 1'b1) {
                    imm_val <= {20'hFFFFF, instr[31:20]};
                } ELSE {
                    imm_val <= {20'h00000, instr[31:20]};
                }
            }

            state <= RVS.EXECUTE;

        // ============================================================
        // EXECUTE: ALU ops, branches, start memory ops
        // ============================================================
        } ELIF (state == RVS.EXECUTE) {

            // --- LUI ---
            IF (opcode == RVO.LUI) {
                wb_data <= imm_val;
                wb_rd   <= rd;
                next_pc <= pc + 32'h00000004;
                state   <= RVS.WRITEBACK;

            // --- AUIPC ---
            } ELIF (opcode == RVO.AUIPC) {
                wb_data <= pc + imm_val;
                wb_rd   <= rd;
                next_pc <= pc + 32'h00000004;
                state   <= RVS.WRITEBACK;

            // --- JAL ---
            } ELIF (opcode == RVO.JAL) {
                wb_data <= pc + 32'h00000004;
                wb_rd   <= rd;
                next_pc <= pc + imm_val;
                state   <= RVS.WRITEBACK;

            // --- JALR ---
            } ELIF (opcode == RVO.JALR) {
                wb_data <= pc + 32'h00000004;
                wb_rd   <= rd;
                next_pc <= (rs1_val + imm_val) & 32'hFFFFFFFE;
                state   <= RVS.WRITEBACK;

            // --- BRANCH ---
            } ELIF (opcode == RVO.BRANCH) {
                IF (funct3 == F3.BEQ) {
                    IF (rs1_val == rs2_val) {
                        pc <= pc + imm_val;
                    } ELSE {
                        pc <= pc + 32'h00000004;
                    }
                } ELIF (funct3 == F3.BNE) {
                    IF (rs1_val != rs2_val) {
                        pc <= pc + imm_val;
                    } ELSE {
                        pc <= pc + 32'h00000004;
                    }
                } ELIF (funct3 == F3.BLT) {
                    // Signed less than: ssub sign bit gives signed comparison
                    IF (ssub(rs1_val, rs2_val)[32] == 1'b1) {
                        pc <= pc + imm_val;
                    } ELSE {
                        pc <= pc + 32'h00000004;
                    }
                } ELIF (funct3 == F3.BGE) {
                    // Signed greater or equal
                    IF (ssub(rs1_val, rs2_val)[32] == 1'b0) {
                        pc <= pc + imm_val;
                    } ELSE {
                        pc <= pc + 32'h00000004;
                    }
                } ELIF (funct3 == F3.BLTU) {
                    IF (rs1_val < rs2_val) {
                        pc <= pc + imm_val;
                    } ELSE {
                        pc <= pc + 32'h00000004;
                    }
                } ELIF (funct3 == F3.BGEU) {
                    IF (rs1_val < rs2_val) {
                        pc <= pc + 32'h00000004;
                    } ELSE {
                        pc <= pc + imm_val;
                    }
                } ELSE {
                    pc <= pc + 32'h00000004;
                }
                state <= RVS.FETCH;

            // --- LOAD ---
            } ELIF (opcode == RVO.LOAD) {
                bus_addr    <= rs1_val + imm_val;
                mem_addr_lo <= (rs1_val + imm_val)[1:0];
                mem_funct3  <= funct3;
                bus_cmd     <= CMD.READ;
                bus_valid   <= 1'b1;
                wb_rd       <= rd;
                next_pc     <= pc + 32'h00000004;
                state       <= RVS.MEM_WAIT;

            // --- STORE ---
            } ELIF (opcode == RVO.STORE) {
                bus_addr    <= rs1_val + imm_val;
                mem_addr_lo <= (rs1_val + imm_val)[1:0];
                mem_funct3  <= funct3;
                next_pc     <= pc + 32'h00000004;

                IF (funct3 == F3.LW) {
                    // Word store: direct write
                    bus_data  <= rs2_val;
                    bus_cmd   <= CMD.WRITE;
                    bus_valid <= 1'b1;
                    state     <= RVS.MEM_WAIT;
                } ELSE {
                    // Byte/half store: read-modify-write, start with read
                    bus_cmd   <= CMD.READ;
                    bus_valid <= 1'b1;
                    state     <= RVS.MEM_WAIT;
                }

            // --- ALU immediate ---
            } ELIF (opcode == RVO.ALU_IMM) {
                wb_data <= alu_result;
                wb_rd   <= rd;
                next_pc <= pc + 32'h00000004;
                state   <= RVS.WRITEBACK;

            // --- ALU register ---
            } ELIF (opcode == RVO.ALU_REG) {
                IF (funct7 == 7'b0000001) {
                    // M extension: multiply/divide (muldiv unit started via async)
                    wb_rd   <= rd;
                    next_pc <= pc + 32'h00000004;
                    state   <= RVS.MULDIV_WAIT;
                } ELSE {
                    wb_data <= alu_result;
                    wb_rd   <= rd;
                    next_pc <= pc + 32'h00000004;
                    state   <= RVS.WRITEBACK;
                }

            // --- FENCE (treat as NOP) ---
            } ELIF (opcode == RVO.FENCE) {
                pc    <= pc + 32'h00000004;
                state <= RVS.FETCH;

            // --- SYSTEM (CSR / MRET / ECALL / EBREAK) ---
            } ELIF (opcode == RVO.SYSTEM) {
                IF (funct3 != 3'b000) {
                    // CSR instruction: old value to rd, write handled by async wr_en
                    wb_data <= csr_rd_data;
                    wb_rd   <= rd;
                    next_pc <= pc + 32'h00000004;
                    state   <= RVS.WRITEBACK;
                } ELSE {
                    IF (instr[31:20] == 12'h302) {
                        // MRET: return to mepc, CSR restores mstatus
                        pc          <= csr_mepc;
                        shadow_mode <= 1'b0;
                        state       <= RVS.FETCH;
                    } ELIF (instr[31:20] == 12'h105) {
                        // WFI: treat as NOP (spec allows this)
                        pc    <= pc + 32'h00000004;
                        state <= RVS.FETCH;
                    } ELSE {
                        // ECALL/EBREAK -> trap to mtvec
                        bus_valid <= 1'b0;
                        pc        <= csr_mtvec;
                        state     <= RVS.FETCH;
                    }
                }

            // --- Unknown opcode: NOP ---
            } ELSE {
                pc    <= pc + 32'h00000004;
                state <= RVS.FETCH;
            }

        // ============================================================
        // MEM_WAIT: Wait for load/store bus completion
        // ============================================================
        } ELIF (state == RVS.MEM_WAIT) {
            IF (pbus.DONE == 1'b1) {
                IF (opcode == RVO.LOAD) {
                    // Load: extract byte/half/word with sign extension
                    bus_valid <= 1'b0;
                    IF (mem_funct3 == F3.LB) {
                        IF (mem_addr_lo == 2'b00) {
                            wb_data <= (pbus.DATA[7] == 1'b1) ? {24'hFFFFFF, pbus.DATA[7:0]} : {24'h000000, pbus.DATA[7:0]};
                        } ELIF (mem_addr_lo == 2'b01) {
                            wb_data <= (pbus.DATA[15] == 1'b1) ? {24'hFFFFFF, pbus.DATA[15:8]} : {24'h000000, pbus.DATA[15:8]};
                        } ELIF (mem_addr_lo == 2'b10) {
                            wb_data <= (pbus.DATA[23] == 1'b1) ? {24'hFFFFFF, pbus.DATA[23:16]} : {24'h000000, pbus.DATA[23:16]};
                        } ELSE {
                            wb_data <= (pbus.DATA[31] == 1'b1) ? {24'hFFFFFF, pbus.DATA[31:24]} : {24'h000000, pbus.DATA[31:24]};
                        }
                    } ELIF (mem_funct3 == F3.LH) {
                        IF (mem_addr_lo[1] == 1'b0) {
                            wb_data <= (pbus.DATA[15] == 1'b1) ? {16'hFFFF, pbus.DATA[15:0]} : {16'h0000, pbus.DATA[15:0]};
                        } ELSE {
                            wb_data <= (pbus.DATA[31] == 1'b1) ? {16'hFFFF, pbus.DATA[31:16]} : {16'h0000, pbus.DATA[31:16]};
                        }
                    } ELIF (mem_funct3 == F3.LW) {
                        wb_data <= pbus.DATA;
                    } ELIF (mem_funct3 == F3.LBU) {
                        IF (mem_addr_lo == 2'b00) {
                            wb_data <= {24'h000000, pbus.DATA[7:0]};
                        } ELIF (mem_addr_lo == 2'b01) {
                            wb_data <= {24'h000000, pbus.DATA[15:8]};
                        } ELIF (mem_addr_lo == 2'b10) {
                            wb_data <= {24'h000000, pbus.DATA[23:16]};
                        } ELSE {
                            wb_data <= {24'h000000, pbus.DATA[31:24]};
                        }
                    } ELIF (mem_funct3 == F3.LHU) {
                        IF (mem_addr_lo[1] == 1'b0) {
                            wb_data <= {16'h0000, pbus.DATA[15:0]};
                        } ELSE {
                            wb_data <= {16'h0000, pbus.DATA[31:16]};
                        }
                    } ELSE {
                        wb_data <= pbus.DATA;
                    }
                    state <= RVS.WRITEBACK;

                } ELIF (bus_cmd == CMD.WRITE) {
                    // Store write complete (SW or RMW write phase done)
                    bus_valid <= 1'b0;
                    pc        <= next_pc;
                    state     <= RVS.FETCH;

                } ELSE {
                    // Store RMW read phase (SB/SH): merge byte/half into read word
                    IF (mem_funct3 == F3.LB) {
                        // Store byte
                        IF (mem_addr_lo == 2'b00) {
                            bus_data <= {pbus.DATA[31:8], rs2_val[7:0]};
                        } ELIF (mem_addr_lo == 2'b01) {
                            bus_data <= {pbus.DATA[31:16], rs2_val[7:0], pbus.DATA[7:0]};
                        } ELIF (mem_addr_lo == 2'b10) {
                            bus_data <= {pbus.DATA[31:24], rs2_val[7:0], pbus.DATA[15:0]};
                        } ELSE {
                            bus_data <= {rs2_val[7:0], pbus.DATA[23:0]};
                        }
                    } ELSE {
                        // Store halfword
                        IF (mem_addr_lo[1] == 1'b0) {
                            bus_data <= {pbus.DATA[31:16], rs2_val[15:0]};
                        } ELSE {
                            bus_data <= {rs2_val[15:0], pbus.DATA[15:0]};
                        }
                    }
                    bus_cmd   <= CMD.WRITE;
                    bus_valid <= 1'b1;
                    state     <= RVS.RMW_WAIT;
                }
            }

        // ============================================================
        // RMW_WAIT: Wait for RMW write to complete
        // ============================================================
        } ELIF (state == RVS.RMW_WAIT) {
            IF (pbus.DONE == 1'b1) {
                bus_valid <= 1'b0;
                pc        <= next_pc;
                state     <= RVS.FETCH;
            }

        // ============================================================
        // MULDIV_WAIT: Wait for multiply/divide completion
        // ============================================================
        } ELIF (state == RVS.MULDIV_WAIT) {
            IF (md_done == 1'b1) {
                wb_data <= md_result;
                state   <= RVS.WRITEBACK;
            }

        // ============================================================
        // WRITEBACK: Update PC (register write handled by regfile)
        // ============================================================
        } ELIF (state == RVS.WRITEBACK) {
            pc    <= next_pc;
            state <= RVS.FETCH;

        // ============================================================
        // HALT: Stay halted (no transition in this block enters HALT;
        // ECALL/EBREAK trap to mtvec in EXECUTE instead)
        // ============================================================
        } ELIF (state == RVS.HALT) {
            state <= RVS.HALT;
        }
    }
@endmod
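
The DECODE state above builds `imm_val` by concatenating instruction bits with an explicit sign-extension constant for each format. When writing a testbench it can help to have the same bit shuffling in software. The following is a minimal Python sketch, not part of the SoC sources: the helper names `bits`, `sign_extend`, and `decode_imm` are hypothetical, and the opcode constants are the standard RV32I major opcodes, assumed to match the `RVO` enumeration.

```python
# Hypothetical software model of the DECODE-state immediate logic.
# Opcode values are the standard RV32I major opcodes (assumed to match RVO).
LUI, AUIPC, JAL, BRANCH, STORE = 0x37, 0x17, 0x6F, 0x63, 0x23


def bits(value, hi, lo):
    """Return value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Sign-extend a width-bit value to 32 bits (result kept unsigned)."""
    if value & (1 << (width - 1)):
        value |= ~((1 << width) - 1)
    return value & 0xFFFFFFFF


def decode_imm(instr):
    """Return the 32-bit immediate for a 32-bit instruction word."""
    opcode = bits(instr, 6, 0)
    if opcode in (LUI, AUIPC):      # U-type: upper 20 bits, low 12 zero
        return instr & 0xFFFFF000
    if opcode == JAL:               # J-type: 21-bit signed, bit 0 is zero
        imm = ((bits(instr, 31, 31) << 20) | (bits(instr, 19, 12) << 12)
               | (bits(instr, 20, 20) << 11) | (bits(instr, 30, 21) << 1))
        return sign_extend(imm, 21)
    if opcode == BRANCH:            # B-type: 13-bit signed, bit 0 is zero
        imm = ((bits(instr, 31, 31) << 12) | (bits(instr, 7, 7) << 11)
               | (bits(instr, 30, 25) << 5) | (bits(instr, 11, 8) << 1))
        return sign_extend(imm, 13)
    if opcode == STORE:             # S-type: 12-bit signed
        imm = (bits(instr, 31, 25) << 5) | bits(instr, 11, 7)
        return sign_extend(imm, 12)
    # Everything else is treated as I-type, as in the final ELSE branch.
    return sign_extend(bits(instr, 31, 20), 12)
```

For example, `decode_imm(0xFFDFF06F)` (the encoding of `jal x0, -4`) yields `0xFFFFFFFC`, which is the value the J-type branch produces when `instr[31]` is set.
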
jz
// RV32I ALU
// Pure combinational arithmetic/logic unit
// Supports ADD/SUB, SLL, SLT, SLTU, XOR, SRL/SRA, OR, AND

@module rv_alu
    PORT {
        IN  [32] a;
        IN  [32] b;
        IN  [3]  funct3;
        IN  [1]  alt;
        OUT [32] result;
    }

    ASYNCHRONOUS {
        IF (funct3 == F3.ADD) {
            IF (alt == 1'b1) {
                // SUB
                result <= a - b;
            } ELSE {
                // ADD
                result <= a + b;
            }
        } ELIF (funct3 == F3.SLL) {
            result <= a << b[4:0];
        } ELIF (funct3 == F3.SLT) {
            // Signed less than: ssub gives 33-bit signed difference, bit[32] is sign
            result <= {31'd0, ssub(a, b)[32]};
        } ELIF (funct3 == F3.SLTU) {
            result <= (a < b) ? 32'h00000001 : 32'h00000000;
        } ELIF (funct3 == F3.XOR) {
            result <= a ^ b;
        } ELIF (funct3 == F3.SRL) {
            IF (alt == 1'b1) {
                // SRA
                result <= a >>> b[4:0];
            } ELSE {
                // SRL
                result <= a >> b[4:0];
            }
        } ELIF (funct3 == F3.OR) {
            result <= a | b;
        } ELIF (funct3 == F3.AND) {
            result <= a & b;
        } ELSE {
            result <= 32'h00000000;
        }
    }
@endmod
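
As a cross-check for `rv_alu`, the same operations can be written as a small software reference model. This is a hedged Python sketch rather than project code: the function names `alu` and `to_signed` are hypothetical, and the `funct3` values used (0 through 7) are the standard RV32I encodings, assumed to match the `F3` enumeration.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(a, b, funct3, alt):
    """Hypothetical model of rv_alu; funct3 uses standard RV32I encodings."""
    a &= MASK32
    b &= MASK32
    shamt = b & 0x1F                    # only b[4:0] is used as shift amount
    if funct3 == 0:                     # ADD, or SUB when alt is set
        return (a - b if alt else a + b) & MASK32
    if funct3 == 1:                     # SLL
        return (a << shamt) & MASK32
    if funct3 == 2:                     # SLT (signed compare)
        return int(to_signed(a) < to_signed(b))
    if funct3 == 3:                     # SLTU (unsigned compare)
        return int(a < b)
    if funct3 == 4:                     # XOR
        return a ^ b
    if funct3 == 5:                     # SRL, or SRA when alt is set
        if alt:
            return (to_signed(a) >> shamt) & MASK32
        return a >> shamt
    if funct3 == 6:                     # OR
        return a | b
    return a & b                        # AND (funct3 == 7)
```

The signed comparison here uses Python's unbounded integers. The hardware gets the same answer from bit 32 of `ssub(a, b)` because a 33-bit signed difference of two 32-bit operands cannot overflow, so its sign bit is set exactly when `a < b`.
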
jz
// RV32I Zicsr Extension - CSR Register File
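// M-mode CSRs: mstatus, mie, mtvec, mepc, mcause, mtval, mcycle
// Custom CSRs: clock frequency (0xBC0), video mode (0xBC1), baud divider (0xBC2),
//   SDRAM size (0xBC3), irqvec (0xBC4), sdcardvec (0xBC5), raw IRQ lines (0xFC0)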
// Combinational read port, synchronous write port
// Trap entry/exit logic for M-mode interrupts

@module rv_csr
    CONST {
        CLK_FREQ_MHZ = 54;
        SDRAM_SIZE_BYTES = 0;
    }

    PORT {
        IN  [1]  clk;
        IN  [1]  rst_n;
        IN  [12] rd_addr;
        OUT [32] rd_data;
        IN  [12] wr_addr;
        IN  [32] wr_data;
        IN  [1]  wr_en;
        IN  [1]  trap_enter;
        IN  [32] trap_epc;
        IN  [32] trap_cause;
        IN  [1]  trap_mret;
        OUT [32] mtvec_out;
        OUT [32] mepc_out;
        OUT [1]  mstatus_mie;
        OUT [1]  mie_meie;
        IN  [32] irq_lines;
        OUT [32] irqvec_out;
        OUT [32] sdcardvec_out;
        OUT [1]  video_mode;
        OUT [16] baud_div;
    }

    REGISTER {
        mstatus_r    [32] = 32'h00001800;
        mie_r        [32] = 32'h00000000;
        mtvec_r      [32] = 32'h00000000;
        mepc_r       [32] = 32'h00000000;
        mcause_r     [32] = 32'h00000000;
        mtval_r      [32] = 32'h00000000;
        mcycle_r     [32] = 32'h00000000;
        irqvec_r     [32] = 32'h00000000;
        sdcardvec_r  [32] = 32'h00000000;
        video_mode_r [1]  = 1'b0;
        baud_div_r   [16] = 16'd644;
    }

    ASYNCHRONOUS {
        // CSR read mux (combinational)
        IF (rd_addr == 12'h300) {
            rd_data <= mstatus_r;
        } ELIF (rd_addr == 12'h304) {
            rd_data <= mie_r;
        } ELIF (rd_addr == 12'h305) {
            rd_data <= mtvec_r;
        } ELIF (rd_addr == 12'h341) {
            rd_data <= mepc_r;
        } ELIF (rd_addr == 12'h342) {
            rd_data <= mcause_r;
        } ELIF (rd_addr == 12'h343) {
            rd_data <= mtval_r;
        } ELIF (rd_addr == 12'hB00) {
            rd_data <= mcycle_r;
        } ELIF (rd_addr == 12'hBC0) {
            rd_data <= lit(32, CLK_FREQ_MHZ);
        } ELIF (rd_addr == 12'hBC1) {
            rd_data <= {31'd0, video_mode_r};
        } ELIF (rd_addr == 12'hBC2) {
            rd_data <= {16'd0, baud_div_r};
        } ELIF (rd_addr == 12'hBC3) {
            rd_data <= lit(32, SDRAM_SIZE_BYTES);
        } ELIF (rd_addr == 12'hBC4) {
            rd_data <= irqvec_r;
        } ELIF (rd_addr == 12'hBC5) {
            rd_data <= sdcardvec_r;
        } ELIF (rd_addr == 12'hFC0) {
            rd_data <= irq_lines;
        } ELSE {
            rd_data <= 32'h00000000;
        }

        // Direct outputs
        mtvec_out     <= mtvec_r;
        mepc_out      <= mepc_r;
        irqvec_out    <= irqvec_r;
        sdcardvec_out <= sdcardvec_r;
        mstatus_mie   <= mstatus_r[3];
        mie_meie      <= mie_r[11];
        video_mode    <= video_mode_r;
        baud_div      <= baud_div_r;
    }

    SYNCHRONOUS(CLK = clk RESET = rst_n RESET_ACTIVE = Low) {
        // Free-running cycle counter
        mcycle_r <= mcycle_r + 32'h00000001;

        // Trap entry: save mepc/mcause, MPIE<=MIE, MIE<=0
        IF (trap_enter == 1'b1) {
            mepc_r   <= trap_epc;
            mcause_r <= trap_cause;
            mstatus_r <= {mstatus_r[31:8], mstatus_r[3], mstatus_r[6:4], 1'b0, mstatus_r[2:0]};

        // MRET: MIE<=MPIE, MPIE<=1
        } ELIF (trap_mret == 1'b1) {
            mstatus_r <= {mstatus_r[31:8], 1'b1, mstatus_r[6:4], mstatus_r[7], mstatus_r[2:0]};

        // CSR write
        } ELIF (wr_en == 1'b1) {
            IF (wr_addr == 12'h300) {
                // mstatus: preserve MPP=11 (M-mode only)
                mstatus_r <= {19'h00000, 2'b11, wr_data[10:8], wr_data[7], wr_data[6:4], wr_data[3], wr_data[2:0]};
            } ELIF (wr_addr == 12'h304) {
                mie_r <= wr_data;
            } ELIF (wr_addr == 12'h305) {
                // mtvec: force DIRECT mode (bits[1:0]=00)
                mtvec_r <= {wr_data[31:2], 2'b00};
            } ELIF (wr_addr == 12'h341) {
                // mepc: align to instruction boundary
                mepc_r <= {wr_data[31:2], 2'b00};
            } ELIF (wr_addr == 12'h342) {
                mcause_r <= wr_data;
            } ELIF (wr_addr == 12'h343) {
                mtval_r <= wr_data;
            } ELIF (wr_addr == 12'hBC4) {
                irqvec_r <= {wr_data[31:2], 2'b00};
            } ELIF (wr_addr == 12'hBC5) {
                sdcardvec_r <= {wr_data[31:2], 2'b00};
            } ELIF (wr_addr == 12'hBC1) {
                video_mode_r <= wr_data[0];
            } ELIF (wr_addr == 12'hBC2) {
                baud_div_r <= wr_data[15:0];
            }
            // mcycle (0xB00), clk_freq (0xBC0), SDRAM size (0xBC3), and
            // irq_lines (0xFC0) have no write case and are read-only
        }
    }
@endmod
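
The trap-entry and MRET branches of `rv_csr` rebuild `mstatus_r` by concatenation, which makes the MIE/MPIE swap easy to misread. The sketch below restates the two updates as bit operations in Python. It is an illustration only: the function names `trap_enter` and `mret` are hypothetical, and the bit positions (MIE at bit 3, MPIE at bit 7) are taken from the concatenations in the listing above.

```python
MIE_BIT = 3    # mstatus.MIE, exported above as mstatus_mie
MPIE_BIT = 7   # mstatus.MPIE


def trap_enter(mstatus):
    """Trap entry: MPIE <= MIE, MIE <= 0; all other bits unchanged."""
    mie = (mstatus >> MIE_BIT) & 1
    mstatus &= ~((1 << MPIE_BIT) | (1 << MIE_BIT))
    return (mstatus | (mie << MPIE_BIT)) & 0xFFFFFFFF


def mret(mstatus):
    """MRET: MIE <= MPIE, MPIE <= 1; all other bits unchanged."""
    mpie = (mstatus >> MPIE_BIT) & 1
    mstatus &= ~(1 << MIE_BIT)
    mstatus |= (1 << MPIE_BIT) | (mpie << MIE_BIT)
    return mstatus & 0xFFFFFFFF
```

Starting from the reset value `0x00001800` with MIE set (`0x00001808`), `trap_enter` gives `0x00001880` and a following `mret` gives `0x00001888`, so interrupts are re-enabled on return and MPIE is left set.
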
jz
// RV32M Multiply/Divide Unit
// Multiply: single-cycle via umul/smul intrinsics
// Divide: 32-cycle restoring division + 1 result cycle
// Supports MUL, MULH, MULHSU, MULHU, DIV, DIVU, REM, REMU

@module rv_muldiv
    PORT {
        IN  [1]  clk;
        IN  [1]  rst_n;
        IN  [32] a;
        IN  [32] b;
        IN  [3]  funct3;
        IN  [1]  start;
        OUT [32] result;
        OUT [1]  done;
    }

    WIRE {
        // Combinational multiply results via intrinsics
        mul_uu    [64];  // unsigned * unsigned
        mul_ss    [64];  // signed * signed
        mul_su    [64];  // |signed| * unsigned (for MULHSU)
        mul_su_neg [64]; // negated mul_su

        // Absolute values for divide via abs() intrinsic
        a_abs [33];
        b_abs [33];
    }

    REGISTER {
        running    [1]  = 1'b0;
        finishing  [1]  = 1'b0;
        count      [6]  = 6'd0;
        op         [3]  = 3'b000;
        negate_res [1]  = 1'b0;

        // Divide state
        quotient  [32] = 32'h00000000;
        remainder [33] = 33'h000000000;
        divisor   [33] = 33'h000000000;
        dividend  [32] = 32'h00000000;

        // Output registers
        res_reg  [32] = 32'h00000000;
        done_reg [1]  = 1'b0;
    }

    ASYNCHRONOUS {
        // Single-cycle multiply via intrinsics
        mul_uu <= umul(a, b);
        mul_ss <= smul(a, b);
        mul_su <= umul(abs(a)[31:0], b);
        mul_su_neg <= ~umul(abs(a)[31:0], b) + 64'h0000000000000001;

        // Absolute values for divide
        a_abs <= abs(a);
        b_abs <= abs(b);

        result <= res_reg;
        done   <= done_reg;
    }

    SYNCHRONOUS(CLK = clk RESET = rst_n RESET_ACTIVE = Low) {
        // ---- Finishing cycle: select divide result ----
        IF (finishing == 1'b1) {
            finishing <= 1'b0;
            done_reg <= 1'b1;

            IF (op == 3'b100 || op == 3'b101) {
                // DIV/DIVU: return quotient
                IF (negate_res == 1'b1) {
                    res_reg <= ~quotient + 32'h00000001;
                } ELSE {
                    res_reg <= quotient;
                }
            } ELSE {
                // REM/REMU: return remainder
                IF (negate_res == 1'b1) {
                    res_reg <= ~remainder[31:0] + 32'h00000001;
                } ELSE {
                    res_reg <= remainder[31:0];
                }
            }

        // ---- Divide algorithm running ----
        } ELIF (running == 1'b1) {
            done_reg <= 1'b0;

            // Restoring division step
            IF ({remainder[31:0], dividend[31]} >= divisor) {
                remainder <= {remainder[31:0], dividend[31]} - divisor;
                quotient  <= {quotient[30:0], 1'b1};
            } ELSE {
                remainder <= {remainder[31:0], dividend[31]};
                quotient  <= {quotient[30:0], 1'b0};
            }
            dividend <= {dividend[30:0], 1'b0};

            IF (count == 6'd31) {
                running  <= 1'b0;
                finishing <= 1'b1;
            } ELSE {
                count <= count + 6'd1;
            }

        // ---- Idle: check for start ----
        } ELIF (start == 1'b1) {
            op <= funct3;

            IF (funct3[2] == 1'b0) {
                // ---- Multiply: single-cycle result from combinational intrinsics ----
                done_reg <= 1'b1;

                IF (funct3 == 3'b000) {
                    // MUL: low 32 bits (same for signed/unsigned)
                    res_reg <= mul_uu[31:0];
                } ELIF (funct3 == 3'b001) {
                    // MULH: signed * signed, high word
                    res_reg <= mul_ss[63:32];
                } ELIF (funct3 == 3'b010) {
                    // MULHSU: signed * unsigned, high word
                    IF (a[31] == 1'b1) {
                        res_reg <= mul_su_neg[63:32];
                    } ELSE {
                        res_reg <= mul_su[63:32];
                    }
                } ELSE {
                    // MULHU: unsigned * unsigned, high word
                    res_reg <= mul_uu[63:32];
                }

            } ELSE {
                // ---- Divide start ----
                // Check divide by zero (1-cycle special case)
                IF (b == 32'h00000000) {
                    IF (funct3 == 3'b110 || funct3 == 3'b111) {
                        res_reg <= a;
                    } ELSE {
                        res_reg <= 32'hFFFFFFFF;
                    }
                    done_reg <= 1'b1;

                // Check signed overflow: -2^31 / -1
                } ELIF (funct3 == 3'b100 && a == 32'h80000000 && b == 32'hFFFFFFFF) {
                    res_reg  <= 32'h80000000;
                    done_reg <= 1'b1;
                } ELIF (funct3 == 3'b110 && a == 32'h80000000 && b == 32'hFFFFFFFF) {
                    res_reg  <= 32'h00000000;
                    done_reg <= 1'b1;

                } ELSE {
                    // Normal divide: use abs() for signed operands
                    done_reg <= 1'b0;
                    IF (funct3[0] == 1'b1) {
                        // DIVU/REMU: unsigned
                        dividend   <= a;
                        divisor    <= {1'b0, b};
                        negate_res <= 1'b0;
                    } ELSE {
                        // DIV/REM: signed, operate on absolute values
                        dividend <= a_abs[31:0];
                        divisor  <= {1'b0, b_abs[31:0]};
                        IF (funct3 == 3'b100) {
                            negate_res <= a[31] ^ b[31];
                        } ELSE {
                            negate_res <= a[31];
                        }
                    }
                    remainder <= 33'h000000000;
                    quotient  <= 32'h00000000;
                    count     <= 6'd0;
                    running   <= 1'b1;
                }
            }
        } ELSE {
            done_reg <= 1'b0;
        }
    }
@endmod
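
The divide path of `rv_muldiv` can be checked against a software model of the same restoring algorithm. The following Python sketch is an assumption-labelled illustration, not project code: `restoring_divide` mirrors the 32 shift-and-subtract steps on unsigned operands, and the hypothetical wrapper `div_rem` adds the divide-by-zero, overflow, and sign-fixup cases in the order the listing handles them.

```python
MASK32 = 0xFFFFFFFF


def restoring_divide(dividend, divisor):
    """32-step unsigned restoring division; returns (quotient, remainder)."""
    quotient = remainder = 0
    for _ in range(32):
        # Shift the next dividend bit into the partial remainder (33 bits).
        trial = ((remainder << 1) | (dividend >> 31)) & 0x1FFFFFFFF
        if trial >= divisor:
            remainder, bit = trial - divisor, 1
        else:
            remainder, bit = trial, 0
        quotient = ((quotient << 1) | bit) & MASK32
        dividend = (dividend << 1) & MASK32
    return quotient, remainder


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def div_rem(a, b, signed):
    """Model DIV/REM (signed=True) or DIVU/REMU (signed=False)."""
    a &= MASK32
    b &= MASK32
    if b == 0:                                   # divide by zero
        return MASK32, a
    if signed and a == 0x80000000 and b == MASK32:
        return 0x80000000, 0                     # -2^31 / -1 overflow
    if not signed:
        return restoring_divide(a, b)
    q, r = restoring_divide(abs(to_signed(a)), abs(to_signed(b)))
    if (a ^ b) & 0x80000000:                     # quotient sign: a[31] ^ b[31]
        q = -q & MASK32
    if a & 0x80000000:                           # remainder follows the dividend
        r = -r & MASK32
    return q, r
```

For instance, `div_rem(-7 & MASK32, 2, True)` returns `(0xFFFFFFFD, 0xFFFFFFFF)`, that is a quotient of -3 and a remainder of -1, matching the round-toward-zero behaviour the hardware gets by dividing absolute values and negating afterwards.
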
jz
// RV32I Register File with Shadow Bank
// 31 general-purpose registers (x1-x31), x0 hardwired to zero
// 31 shadow registers (s1-s31) for zero-overhead trap context switching
// 2 asynchronous read ports, 1 synchronous write port

@module rv_regfile
    PORT {
        IN  [1]  clk;
        IN  [1]  rst_n;
        IN  [5]  rs1_addr;
        IN  [5]  rs2_addr;
        OUT [32] rs1_data;
        OUT [32] rs2_data;
        IN  [5]  wr_addr;
        IN  [32] wr_data;
        IN  [1]  wr_en;
        IN  [1]  shadow;
    }

    REGISTER {
        x1  [32] = 32'h00000000;
        x2  [32] = 32'h00000000;
        x3  [32] = 32'h00000000;
        x4  [32] = 32'h00000000;
        x5  [32] = 32'h00000000;
        x6  [32] = 32'h00000000;
        x7  [32] = 32'h00000000;
        x8  [32] = 32'h00000000;
        x9  [32] = 32'h00000000;
        x10 [32] = 32'h00000000;
        x11 [32] = 32'h00000000;
        x12 [32] = 32'h00000000;
        x13 [32] = 32'h00000000;
        x14 [32] = 32'h00000000;
        x15 [32] = 32'h00000000;
        x16 [32] = 32'h00000000;
        x17 [32] = 32'h00000000;
        x18 [32] = 32'h00000000;
        x19 [32] = 32'h00000000;
        x20 [32] = 32'h00000000;
        x21 [32] = 32'h00000000;
        x22 [32] = 32'h00000000;
        x23 [32] = 32'h00000000;
        x24 [32] = 32'h00000000;
        x25 [32] = 32'h00000000;
        x26 [32] = 32'h00000000;
        x27 [32] = 32'h00000000;
        x28 [32] = 32'h00000000;
        x29 [32] = 32'h00000000;
        x30 [32] = 32'h00000000;
        x31 [32] = 32'h00000000;
        s1  [32] = 32'h00000000;
        s2  [32] = 32'h00000000;
        s3  [32] = 32'h00000000;
        s4  [32] = 32'h00000000;
        s5  [32] = 32'h00000000;
        s6  [32] = 32'h00000000;
        s7  [32] = 32'h00000000;
        s8  [32] = 32'h00000000;
        s9  [32] = 32'h00000000;
        s10 [32] = 32'h00000000;
        s11 [32] = 32'h00000000;
        s12 [32] = 32'h00000000;
        s13 [32] = 32'h00000000;
        s14 [32] = 32'h00000000;
        s15 [32] = 32'h00000000;
        s16 [32] = 32'h00000000;
        s17 [32] = 32'h00000000;
        s18 [32] = 32'h00000000;
        s19 [32] = 32'h00000000;
        s20 [32] = 32'h00000000;
        s21 [32] = 32'h00000000;
        s22 [32] = 32'h00000000;
        s23 [32] = 32'h00000000;
        s24 [32] = 32'h00000000;
        s25 [32] = 32'h00000000;
        s26 [32] = 32'h00000000;
        s27 [32] = 32'h00000000;
        s28 [32] = 32'h00000000;
        s29 [32] = 32'h00000000;
        s30 [32] = 32'h00000000;
        s31 [32] = 32'h00000000;
    }

    WIRE {
        b1  [32];
        b2  [32];
        b3  [32];
        b4  [32];
        b5  [32];
        b6  [32];
        b7  [32];
        b8  [32];
        b9  [32];
        b10 [32];
        b11 [32];
        b12 [32];
        b13 [32];
        b14 [32];
        b15 [32];
        b16 [32];
        b17 [32];
        b18 [32];
        b19 [32];
        b20 [32];
        b21 [32];
        b22 [32];
        b23 [32];
        b24 [32];
        b25 [32];
        b26 [32];
        b27 [32];
        b28 [32];
        b29 [32];
        b30 [32];
        b31 [32];
        x0  [32];
    }

    // Read mux: index 0 = x0 (zero), indices 1-31 = bank-selected registers
    MUX {
        bank = x0, b1, b2, b3, b4, b5, b6, b7, b8,
               b9, b10, b11, b12, b13, b14, b15, b16,
               b17, b18, b19, b20, b21, b22, b23, b24,
               b25, b26, b27, b28, b29, b30, b31;
    }

    ASYNCHRONOUS {
        // x0 is hardwired to zero
        x0  <= 32'h00000000;

        // Bank-selected register values
        b1  <= (shadow == 1'b0) ? x1  : s1;
        b2  <= (shadow == 1'b0) ? x2  : s2;
        b3  <= (shadow == 1'b0) ? x3  : s3;
        b4  <= (shadow == 1'b0) ? x4  : s4;
        b5  <= (shadow == 1'b0) ? x5  : s5;
        b6  <= (shadow == 1'b0) ? x6  : s6;
        b7  <= (shadow == 1'b0) ? x7  : s7;
        b8  <= (shadow == 1'b0) ? x8  : s8;
        b9  <= (shadow == 1'b0) ? x9  : s9;
        b10 <= (shadow == 1'b0) ? x10 : s10;
        b11 <= (shadow == 1'b0) ? x11 : s11;
        b12 <= (shadow == 1'b0) ? x12 : s12;
        b13 <= (shadow == 1'b0) ? x13 : s13;
        b14 <= (shadow == 1'b0) ? x14 : s14;
        b15 <= (shadow == 1'b0) ? x15 : s15;
        b16 <= (shadow == 1'b0) ? x16 : s16;
        b17 <= (shadow == 1'b0) ? x17 : s17;
        b18 <= (shadow == 1'b0) ? x18 : s18;
        b19 <= (shadow == 1'b0) ? x19 : s19;
        b20 <= (shadow == 1'b0) ? x20 : s20;
        b21 <= (shadow == 1'b0) ? x21 : s21;
        b22 <= (shadow == 1'b0) ? x22 : s22;
        b23 <= (shadow == 1'b0) ? x23 : s23;
        b24 <= (shadow == 1'b0) ? x24 : s24;
        b25 <= (shadow == 1'b0) ? x25 : s25;
        b26 <= (shadow == 1'b0) ? x26 : s26;
        b27 <= (shadow == 1'b0) ? x27 : s27;
        b28 <= (shadow == 1'b0) ? x28 : s28;
        b29 <= (shadow == 1'b0) ? x29 : s29;
        b30 <= (shadow == 1'b0) ? x30 : s30;
        b31 <= (shadow == 1'b0) ? x31 : s31;

        // Read ports (combinational): MUX selects register by address
        rs1_data <= bank[rs1_addr];
        rs2_data <= bank[rs2_addr];
    }

    SYNCHRONOUS(CLK = clk RESET = rst_n RESET_ACTIVE = Low) {
        IF (wr_en == 1'b1) {
            IF (shadow == 1'b0) {
                IF (wr_addr == 5'd1) { x1 <= wr_data; }
                ELIF (wr_addr == 5'd2) { x2 <= wr_data; }
                ELIF (wr_addr == 5'd3) { x3 <= wr_data; }
                ELIF (wr_addr == 5'd4) { x4 <= wr_data; }
                ELIF (wr_addr == 5'd5) { x5 <= wr_data; }
                ELIF (wr_addr == 5'd6) { x6 <= wr_data; }
                ELIF (wr_addr == 5'd7) { x7 <= wr_data; }
                ELIF (wr_addr == 5'd8) { x8 <= wr_data; }
                ELIF (wr_addr == 5'd9) { x9 <= wr_data; }
                ELIF (wr_addr == 5'd10) { x10 <= wr_data; }
                ELIF (wr_addr == 5'd11) { x11 <= wr_data; }
                ELIF (wr_addr == 5'd12) { x12 <= wr_data; }
                ELIF (wr_addr == 5'd13) { x13 <= wr_data; }
                ELIF (wr_addr == 5'd14) { x14 <= wr_data; }
                ELIF (wr_addr == 5'd15) { x15 <= wr_data; }
                ELIF (wr_addr == 5'd16) { x16 <= wr_data; }
                ELIF (wr_addr == 5'd17) { x17 <= wr_data; }
                ELIF (wr_addr == 5'd18) { x18 <= wr_data; }
                ELIF (wr_addr == 5'd19) { x19 <= wr_data; }
                ELIF (wr_addr == 5'd20) { x20 <= wr_data; }
                ELIF (wr_addr == 5'd21) { x21 <= wr_data; }
                ELIF (wr_addr == 5'd22) { x22 <= wr_data; }
                ELIF (wr_addr == 5'd23) { x23 <= wr_data; }
                ELIF (wr_addr == 5'd24) { x24 <= wr_data; }
                ELIF (wr_addr == 5'd25) { x25 <= wr_data; }
                ELIF (wr_addr == 5'd26) { x26 <= wr_data; }
                ELIF (wr_addr == 5'd27) { x27 <= wr_data; }
                ELIF (wr_addr == 5'd28) { x28 <= wr_data; }
                ELIF (wr_addr == 5'd29) { x29 <= wr_data; }
                ELIF (wr_addr == 5'd30) { x30 <= wr_data; }
                ELIF (wr_addr == 5'd31) { x31 <= wr_data; }
            } ELSE {
                IF (wr_addr == 5'd1) { s1 <= wr_data; }
                ELIF (wr_addr == 5'd2) { s2 <= wr_data; }
                ELIF (wr_addr == 5'd3) { s3 <= wr_data; }
                ELIF (wr_addr == 5'd4) { s4 <= wr_data; }
                ELIF (wr_addr == 5'd5) { s5 <= wr_data; }
                ELIF (wr_addr == 5'd6) { s6 <= wr_data; }
                ELIF (wr_addr == 5'd7) { s7 <= wr_data; }
                ELIF (wr_addr == 5'd8) { s8 <= wr_data; }
                ELIF (wr_addr == 5'd9) { s9 <= wr_data; }
                ELIF (wr_addr == 5'd10) { s10 <= wr_data; }
                ELIF (wr_addr == 5'd11) { s11 <= wr_data; }
                ELIF (wr_addr == 5'd12) { s12 <= wr_data; }
                ELIF (wr_addr == 5'd13) { s13 <= wr_data; }
                ELIF (wr_addr == 5'd14) { s14 <= wr_data; }
                ELIF (wr_addr == 5'd15) { s15 <= wr_data; }
                ELIF (wr_addr == 5'd16) { s16 <= wr_data; }
                ELIF (wr_addr == 5'd17) { s17 <= wr_data; }
                ELIF (wr_addr == 5'd18) { s18 <= wr_data; }
                ELIF (wr_addr == 5'd19) { s19 <= wr_data; }
                ELIF (wr_addr == 5'd20) { s20 <= wr_data; }
                ELIF (wr_addr == 5'd21) { s21 <= wr_data; }
                ELIF (wr_addr == 5'd22) { s22 <= wr_data; }
                ELIF (wr_addr == 5'd23) { s23 <= wr_data; }
                ELIF (wr_addr == 5'd24) { s24 <= wr_data; }
                ELIF (wr_addr == 5'd25) { s25 <= wr_data; }
                ELIF (wr_addr == 5'd26) { s26 <= wr_data; }
                ELIF (wr_addr == 5'd27) { s27 <= wr_data; }
                ELIF (wr_addr == 5'd28) { s28 <= wr_data; }
                ELIF (wr_addr == 5'd29) { s29 <= wr_data; }
                ELIF (wr_addr == 5'd30) { s30 <= wr_data; }
                ELIF (wr_addr == 5'd31) { s31 <= wr_data; }
            }
        }
    }
@endmod
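The bank-select and x0 behavior of `rv_regfile` can be checked against a small behavioral model. The Python below is only a sketch for reasoning about the listing above, not part of the SoC sources; the class name `RegFileModel` and its method names are hypothetical.

```python
class RegFileModel:
    """Behavioral sketch of rv_regfile: a main bank (x1-x31), a shadow
    bank (s1-s31), and x0 hardwired to zero. Names here are illustrative."""

    def __init__(self):
        # Slot 0 of each list is never written; address 0 always reads zero.
        self.main = [0] * 32
        self.shadow = [0] * 32

    def read(self, addr, shadow):
        """Combinational read port: x0 is zero, otherwise the selected bank."""
        if addr == 0:
            return 0
        bank = self.shadow if shadow else self.main
        return bank[addr]

    def clock(self, wr_en, wr_addr, wr_data, shadow):
        """One rising clock edge: write the selected bank when wr_en is set.
        Address 0 has no register behind it, so the write is dropped."""
        if wr_en and wr_addr != 0:
            bank = self.shadow if shadow else self.main
            bank[wr_addr] = wr_data & 0xFFFFFFFF
```

Writing address 5 once with `shadow=0` and once with `shadow=1` leaves two independent values, which is what lets a trap handler run without saving the interrupted context.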
jz
// Simple address-decode arbiter for single bus master
// Routes the CPU bus to one of TARGET_COUNT targets by matching ADDR[31:28] against map_config
@module arbiter
    CONST {
        SOURCE_COUNT = 1;
        TARGET_COUNT = 5;
    }

    PORT {
        IN  [TARGET_COUNT * 8] map_config;
        BUS SIMPLE_BUS TARGET [SOURCE_COUNT] src;
        BUS SIMPLE_BUS SOURCE [TARGET_COUNT] tgt;
    }

    WIRE {
        tgt_done        [TARGET_COUNT];
        any_done        [1];
        tgt_match       [TARGET_COUNT];
    }

    @template TARGET_MATCH(match_vec, src_port, config)
        match_vec[IDX] <= (((src_port.ADDR[31:28] ^ config[IDX*8+7:IDX*8+4]) & config[IDX*8+3:IDX*8]) == 4'b0000) ? 1'b1 : 1'b0;
    @endtemplate

    @template TARGET_DONE(done, tgt)
        done[IDX] = tgt[IDX].DONE;
    @endtemplate

    @template ROUTE_TARGET(src_port, tgt_port, config)
        @scratch match [1];

        // Address decode: ((addr[31:28] ^ value) & care) == 4'b0000
        match <= (((src_port.ADDR[31:28] ^ config[IDX*8+7:IDX*8+4]) & config[IDX*8+3:IDX*8]) == 4'b0000);

        // VALID only to the matching target
        tgt_port[IDX].VALID <= (src_port.VALID == 1'b1 && match == 1'b1) ? 1'b1 : 1'b0;

        // Forward address and command (direct assignment, not alias)
        tgt_port[IDX].ADDR <= src_port.ADDR;
        tgt_port[IDX].CMD  <= src_port.CMD;

        // DATA uses alias (=) for tristate pass-through
        tgt_port[IDX].DATA = src_port.DATA;
    @endtemplate

    ASYNCHRONOUS {
        // Gather target done signals
        @apply [TARGET_COUNT] TARGET_DONE(tgt_done, tgt);
        any_done <= (tgt_done != lit(TARGET_COUNT, 0));

        // Route source to targets (single source, index 0)
        @apply [TARGET_COUNT] ROUTE_TARGET(src[0], tgt, map_config);

        // Compute which target matches address
        @apply [TARGET_COUNT] TARGET_MATCH(tgt_match, src[0], map_config);

        // Reverse DATA path: responding target to source
        src[0].DATA <= tgt[oh2b(tgt_match)].DATA;

        // Route DONE back to source
        src[0].DONE <= any_done ? 1'b1 : 1'b0;
    }
@endmod
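The address decode in `TARGET_MATCH` packs one byte per target into `map_config`: the high nibble is the expected value of `ADDR[31:28]` and the low nibble is a care mask. A minimal Python sketch of that expression follows. The helper names `target_matches` and `pack_map_config` are hypothetical, and the example map in the usage note is an assumption based on the block diagram, not the project's actual configuration.

```python
def target_matches(addr, map_config, idx):
    """Mirror of ((addr[31:28] ^ value) & care) == 4'b0000 for target idx."""
    entry = (map_config >> (idx * 8)) & 0xFF
    value = (entry >> 4) & 0xF   # expected ADDR[31:28]
    care = entry & 0xF           # 1 = bit is compared, 0 = don't care
    top = (addr >> 28) & 0xF
    return ((top ^ value) & care) == 0


def pack_map_config(entries):
    """Pack (value, care) pairs into one integer; entry 0 is the low byte."""
    config = 0
    for idx, (value, care) in enumerate(entries):
        config |= (((value & 0xF) << 4) | (care & 0xF)) << (idx * 8)
    return config
```

For example, `pack_map_config([(0x0, 0xF), (0x1, 0xF), (0x2, 0xF)])` would place three targets at `0x0_`, `0x1_`, and `0x2_`. A care mask of `0x0` matches every address, so entries should not overlap: the reverse DATA path indexes targets from the match vector and appears to assume at most one bit is set.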
jz
@module ram
    PORT {
        IN  [1] clk;
        IN  [1] rst_n;
        BUS SIMPLE_BUS TARGET pbus;
    }

    // 5120-word bank, 32-bit wide (10 BSRAMs = 20KB)
    MEM(TYPE=BLOCK) {
        ram_mem [32] [5120] = 32'h00000000 {
            OUT read SYNC;
            IN  write;
        };
    }

    REGISTER {
        pending_read [1] = 1'b0;
        data_ready   [1] = 1'b0;
        read_data    [32] = 32'b0;
    }

    ASYNCHRONOUS {
        // Drive data only when data_ready
        pbus.DATA <= (pbus.VALID && pbus.CMD == CMD.READ && data_ready == 1'b1) ? read_data : 32'bz;

        // DONE signaling
        IF (pbus.VALID) {
            IF (pbus.CMD == CMD.WRITE) {
                pbus.DONE <= 1'b1;
            } ELIF (data_ready == 1'b1) {
                pbus.DONE <= 1'b1;
            } ELSE {
                pbus.DONE <= 1'b0;
            }
        } ELSE {
            pbus.DONE <= 1'bz;
        }
    }

    SYNCHRONOUS(CLK = clk RESET = rst_n RESET_ACTIVE = Low) {
        // Write path: 1-cycle
        IF (pbus.VALID && pbus.CMD == CMD.WRITE) {
            ram_mem.write[pbus.ADDR[14:2]] <= pbus.DATA;
        }

        // Read path: 2-stage pipeline
        IF (data_ready == 1'b1) {
            data_ready <= 1'b0;
        } ELIF (pending_read == 1'b1) {
            read_data <= ram_mem.read.data;
            data_ready <= 1'b1;
            pending_read <= 1'b0;
        } ELIF (pbus.VALID && pbus.CMD == CMD.READ) {
            ram_mem.read.addr <= pbus.ADDR[14:2];
            pending_read <= 1'b1;
        }
    }
@endmod
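The `ram` module indexes its memory with `ADDR[14:2]`, a 13-bit word index, while the array holds 5120 words. The following Python sketch shows which byte addresses land inside the array. The function names are hypothetical, and the `0x1_` base used in the usage note is taken from the block diagram. What the hardware does for an index at or above 5120 is not stated in the source, so the sketch only reports it.

```python
RAM_WORDS = 5120  # depth of ram_mem in the listing above


def ram_word_index(addr):
    """Extract ADDR[14:2]: drop the two byte-offset bits, keep 13 bits."""
    return (addr >> 2) & 0x1FFF


def ram_in_range(addr):
    """True when the word index falls inside the 5120-word array."""
    return ram_word_index(addr) < RAM_WORDS
```

With RAM at `0x1_`, the last word inside the array is at `0x10004FFC`. Bits 27:15 are ignored by the index, so `0x10008000` selects the same word as `0x10000000`.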
jz
@module rom
    PORT {
        IN  [1] clk;
        IN  [1] rst_n;
        BUS SIMPLE_BUS TARGET pbus;
    }

    // Single 4096-word bank, 32-bit wide (BSRAM)
    MEM(TYPE=BLOCK) {
        rom_mem [32] [4096] = @file("../out/bios.hex") {
            OUT read SYNC;
        };
    }

    REGISTER {
        pending_read [1] = 1'b0;
        data_ready   [1] = 1'b0;
        read_data    [32] = 32'b0;
    }

    ASYNCHRONOUS {
        // Drive data only when data_ready (after BSRAM pipeline)
        pbus.DATA <= (pbus.VALID && pbus.CMD == CMD.READ && data_ready == 1'b1) ? read_data : 32'bz;

        // DONE: asserted once data_ready is set. Writes never set data_ready,
        // so a write to ROM is never acknowledged.
        IF (pbus.VALID) {
            IF (data_ready == 1'b1) {
                pbus.DONE <= 1'b1;
            } ELSE {
                pbus.DONE <= 1'b0;
            }
        } ELSE {
            pbus.DONE <= 1'bz;
        }
    }

    SYNCHRONOUS(CLK = clk RESET = rst_n RESET_ACTIVE = Low) {
        // 2-stage pipeline for SYNC memory read:
        // Stage 1: pending_read=1 - address sent to BSRAM
        // Stage 2: data_ready=1 - data latched into read_data

        IF (data_ready == 1'b1) {
            // Clear data_ready after CPU has seen DONE
            data_ready <= 1'b0;
        } ELIF (pending_read == 1'b1) {
            // Stage 2: Memory output is now valid, latch
            read_data <= rom_mem.read.data;
            data_ready <= 1'b1;
            pending_read <= 1'b0;
        } ELIF (pbus.VALID && pbus.CMD == CMD.READ) {
            // Stage 1: New read request, send address
            rom_mem.read.addr <= pbus.ADDR[13:2];
            pending_read <= 1'b1;
        }
    }
@endmod
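The two-stage read handshake shared by `rom` and `ram` is easier to follow as a cycle-by-cycle model. The Python below mirrors the `SYNCHRONOUS` block of `rom`. It is a sketch that assumes the BSRAM output is valid one clock after the address is presented, as the stage comments state. The class `RomReadModel` and the helper `edges_until_done` are hypothetical names.

```python
class RomReadModel:
    """Cycle-level sketch of the rom read pipeline (illustrative only)."""

    def __init__(self, words):
        # Pad to the 4096-word depth so any ADDR[13:2] index is valid.
        self.words = list(words) + [0] * (4096 - len(words))
        self.pending_read = 0
        self.data_ready = 0
        self.read_data = 0
        self._addr = 0  # address registered into the memory read port

    def done(self, valid):
        """Combinational DONE as seen on the bus; None models high-Z."""
        if not valid:
            return None
        return 1 if self.data_ready else 0

    def clock(self, valid, is_read, addr):
        """One rising clock edge, same priority order as the listing."""
        if self.data_ready:
            self.data_ready = 0
        elif self.pending_read:
            self.read_data = self.words[self._addr]   # stage 2: latch data
            self.data_ready = 1
            self.pending_read = 0
        elif valid and is_read:
            self._addr = (addr >> 2) & 0xFFF          # stage 1: ADDR[13:2]
            self.pending_read = 1


def edges_until_done(model, addr, limit=10):
    """Count clock edges from a new read request until DONE is seen."""
    for edge in range(limit):
        if model.done(valid=1) == 1:
            return edge
        model.clock(valid=1, is_read=True, addr=addr)
    return None
```

In this model a read that starts from idle sees DONE two clock edges after VALID is first sampled, and DONE lasts for one cycle before `data_ready` is cleared.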
jz
@module sdram_ctrl
    CONST {
        CLK_FREQ_MHZ = 54;
        INIT_COUNT = CLK_FREQ_MHZ * 200;
        REFRESH_INTERVAL = CLK_FREQ_MHZ * 78 / 10;
        MODE_REG = 544;           // 0x220: CL=2, burst length 1, sequential, single-location write (A9=1)

        // GW2AR-18 SDRAM geometry: 2M x 32 = 64Mbit
        ROW_BITS  = 11;
        COL_BITS  = 8;
        BANK_BITS = 2;
        DATA_BITS = 32;
        ADDR_BITS = 21;           // ROW_BITS + COL_BITS + BANK_BITS

        // State machine states
        ST_INIT   = 0;
        ST_IPRE   = 1;
        ST_IREF   = 2;
        ST_IMODE  = 3;
        ST_IDLE   = 4;
        ST_ACT_W  = 5;
        ST_ACT    = 6;
        ST_RD     = 7;
        ST_RD_CL  = 8;
        ST_WR     = 9;
        ST_REF    = 10;
    }

    PORT {
        IN  [1]  clk;
        IN  [1]  rst_n;

        // User interface
        IN  [21] addr;
        IN  [32] wdata;
        OUT [32] rdata;
        IN  [1]  rd;
        IN  [1]  wr;
        OUT [1]  busy;
        OUT [1]  done;

        // SDRAM physical interface
        OUT [1]  sdram_cke;
        OUT [1]  sdram_cs_n;
        OUT [1]  sdram_ras_n;
        OUT [1]  sdram_cas_n;
        OUT [1]  sdram_wen_n;
        OUT [4]  sdram_dqm;
        OUT [11] sdram_addr;
        OUT [2]  sdram_ba;
        INOUT [32] sdram_dq;
    }

    REGISTER {
        state       [4]  = 4'd0;
        init_cnt    [14] = 14'b0;
        wait_cnt    [3]  = 3'b0;
        ref_cnt     [10] = 10'b0;
        ref_done    [1]  = 1'b0;

        // Command registers
        r_cke       [1]  = 1'b0;
        r_cs_n      [1]  = 1'b1;
        r_ras_n     [1]  = 1'b1;
        r_cas_n     [1]  = 1'b1;
        r_wen_n     [1]  = 1'b1;
        r_addr      [11] = 11'b0;
        r_ba        [2]  = 2'b0;
        r_dqm       [4]  = 4'b1111;

        // DQ control
        r_dq_oe     [1]  = 1'b0;
        r_dq_out    [32] = 32'b0;

        // Latched request
        r_req_addr  [21] = 21'b0;
        r_req_wdata [32] = 32'b0;
        r_req_write [1]  = 1'b0;

        // Output
        r_rdata     [32] = 32'b0;
        r_done      [1]  = 1'b0;
    }

    ASYNCHRONOUS {
        // Drive SDRAM pins from registers
        sdram_cke   <= r_cke;
        sdram_cs_n  <= r_cs_n;
        sdram_ras_n <= r_ras_n;
        sdram_cas_n <= r_cas_n;
        sdram_wen_n <= r_wen_n;
        sdram_dqm   <= r_dqm;
        sdram_addr  <= r_addr;
        sdram_ba    <= r_ba;

        // Tristate DQ bus
        sdram_dq <= (r_dq_oe == 1'b1) ? r_dq_out : 32'bz;

        // User interface outputs
        rdata <= r_rdata;
        done  <= r_done;
        busy  <= (state != lit(4, ST_IDLE));
    }

    SYNCHRONOUS(CLK=clk RESET=rst_n RESET_ACTIVE=Low) {
        SELECT(state) {
            // ---- INIT: Power-up wait (200us) ----
            CASE (lit(4, ST_INIT)) {
                r_cke   <= 1'b1;
                r_dq_oe <= 1'b0;
                r_done  <= 1'b0;

                IF (init_cnt == lit(14, INIT_COUNT)) {
                    // PRECHARGE ALL: cs=0, ras=0, cas=1, we=0, A10=1
                    r_cs_n  <= 1'b0;
                    r_ras_n <= 1'b0;
                    r_cas_n <= 1'b1;
                    r_wen_n <= 1'b0;
                    r_addr  <= 11'b10000000000;
                    wait_cnt <= 3'd1;
                    state <= lit(4, ST_IPRE);
                } ELSE {
                    // INHIBIT
                    r_cs_n  <= 1'b1;
                    r_ras_n <= 1'b1;
                    r_cas_n <= 1'b1;
                    r_wen_n <= 1'b1;
                    init_cnt <= init_cnt + 14'b1;
                }
            }

            // ---- IPRE: Wait after PRECHARGE ALL ----
            CASE (lit(4, ST_IPRE)) {
                r_done <= 1'b0;
                IF (wait_cnt == 3'b0) {
                    // AUTO REFRESH: cs=0, ras=0, cas=0, we=1
                    r_cs_n  <= 1'b0;
                    r_ras_n <= 1'b0;
                    r_cas_n <= 1'b0;
                    r_wen_n <= 1'b1;
                    wait_cnt <= 3'd2;
                    state <= lit(4, ST_IREF);
                } ELSE {
                    // NOP
                    r_cs_n  <= 1'b0;
                    r_ras_n <= 1'b1;
                    r_cas_n <= 1'b1;
                    r_wen_n <= 1'b1;
                    wait_cnt <= wait_cnt - 3'b1;
                }
            }

            // ---- IREF: Init AUTO REFRESH (done twice) ----
            CASE (lit(4, ST_IREF)) {
                r_done <= 1'b0;
                IF (wait_cnt == 3'b0) {
                    IF (ref_done == 1'b0) {
                        // Second AUTO REFRESH
                        r_cs_n  <= 1'b0;
                        r_ras_n <= 1'b0;
                        r_cas_n <= 1'b0;
                        r_wen_n <= 1'b1;
                        wait_cnt <= 3'd2;
                        ref_done <= 1'b1;
                    } ELSE {
                        // MODE SET: cs=0, ras=0, cas=0, we=0, addr=mode
                        r_cs_n  <= 1'b0;
                        r_ras_n <= 1'b0;
                        r_cas_n <= 1'b0;
                        r_wen_n <= 1'b0;
                        r_addr  <= lit(11, MODE_REG);
                        r_ba    <= 2'b0;
                        wait_cnt <= 3'd2;
                        state <= lit(4, ST_IMODE);
                    }
                } ELSE {
                    // NOP
                    r_cs_n  <= 1'b0;
                    r_ras_n <= 1'b1;
                    r_cas_n <= 1'b1;
                    r_wen_n <= 1'b1;
                    wait_cnt <= wait_cnt - 3'b1;
                }
            }

            // ---- IMODE: Wait after MODE REGISTER SET ----
            CASE (lit(4, ST_IMODE)) {
                r_done <= 1'b0;
                IF (wait_cnt == 3'b0) {
                    // NOP, go to IDLE
                    r_cs_n  <= 1'b0;
                    r_ras_n <= 1'b1;
                    r_cas_n <= 1'b1;
                    r_wen_n <= 1'b1;
                    r_dqm   <= 4'b0000;
                    state <= lit(4, ST_IDLE);
                } ELSE {
                    // NOP
                    r_cs_n  <= 1'b0;
                    r_ras_n <= 1'b1;
                    r_cas_n <= 1'b1;
                    r_wen_n <= 1'b1;
                    wait_cnt <= wait_cnt - 3'b1;
                }
            }

            // ---- IDLE: Ready for commands ----
            // rd/wr checked before refresh to prevent lost pulses.
            // Delaying refresh by one access (~8 cycles) is within SDRAM timing margin.
            CASE (lit(4, ST_IDLE)) {
                r_dq_oe <= 1'b0;
                r_done  <= 1'b0;

                IF ((rd == 1'b1 || wr == 1'b1) && r_done == 1'b0) {
                    // Latch request
                    r_req_addr  <= addr;
                    r_req_wdata <= wdata;
                    r_req_write <= wr;

                    // ACTIVATE: cs=0, ras=0, cas=1, we=1
                    // Bank = addr[20:19], Row = addr[18:8]
                    r_cs_n  <= 1'b0;
                    r_ras_n <= 1'b0;
                    r_cas_n <= 1'b1;
                    r_wen_n <= 1'b1;
                    r_addr  <= addr[18:8];
                    r_ba    <= addr[20:19];
                    ref_cnt <= ref_cnt + 10'b1;
                    state <= lit(4, ST_ACT_W);
                } ELIF (ref_cnt >= lit(10, REFRESH_INTERVAL)) {
                    // AUTO REFRESH: cs=0, ras=0, cas=0, we=1
                    r_cs_n  <= 1'b0;
                    r_ras_n <= 1'b0;
                    r_cas_n <= 1'b0;
                    r_wen_n <= 1'b1;
                    ref_cnt  <= 10'b0;
                    wait_cnt <= 3'd2;
                    state <= lit(4, ST_REF);
                } ELSE {
                    ref_cnt <= ref_cnt + 10'b1;

                    // NOP
                    r_cs_n  <= 1'b0;
                    r_ras_n <= 1'b1;
                    r_cas_n <= 1'b1;
                    r_wen_n <= 1'b1;
                }
            }

            // ---- ACT_W: NOP wait for tRCD (20ns needs 2 cycles at 54MHz) ----
            CASE (lit(4, ST_ACT_W)) {
                r_done <= 1'b0;
                // NOP while waiting for tRCD
                r_cs_n  <= 1'b0;
                r_ras_n <= 1'b1;
                r_cas_n <= 1'b1;
                r_wen_n <= 1'b1;
                state <= lit(4, ST_ACT);
            }

            // ---- ACT: Issue READ or WRITE command ----
            CASE (lit(4, ST_ACT)) {
                r_done <= 1'b0;

                IF (r_req_write == 1'b1) {
                    // WRITE: cs=0, ras=1, cas=0, we=0, A10=1 (auto-precharge)
                    // Col = addr[7:0]
                    r_cs_n  <= 1'b0;
                    r_ras_n <= 1'b1;
                    r_cas_n <= 1'b0;
                    r_wen_n <= 1'b0;
                    r_addr  <= {1'b1, 2'b0, r_req_addr[7:0]};
                    r_dq_oe <= 1'b1;
                    r_dq_out <= r_req_wdata;
                    r_dqm   <= 4'b0000;
                    state <= lit(4, ST_WR);
                } ELSE {
                    // READ: cs=0, ras=1, cas=0, we=1, A10=1 (auto-precharge)
                    // Col = addr[7:0]
                    r_cs_n  <= 1'b0;
                    r_ras_n <= 1'b1;
                    r_cas_n <= 1'b0;
                    r_wen_n <= 1'b1;
                    r_addr  <= {1'b1, 2'b0, r_req_addr[7:0]};
                    r_dq_oe <= 1'b0;
                    r_dqm   <= 4'b0000;
                    wait_cnt <= 3'd2;
                    state <= lit(4, ST_RD);
                }
            }

            // ---- RD: READ command issued, start CAS latency wait ----
            CASE (lit(4, ST_RD)) {
                r_done <= 1'b0;
                // NOP while waiting
                r_cs_n  <= 1'b0;
                r_ras_n <= 1'b1;
                r_cas_n <= 1'b1;
                r_wen_n <= 1'b1;
                r_dq_oe <= 1'b0;
                wait_cnt <= wait_cnt - 3'b1;
                state <= lit(4, ST_RD_CL);
            }

            // ---- RD_CL: Waiting for CAS latency ----
            CASE (lit(4, ST_RD_CL)) {
                // NOP
                r_cs_n  <= 1'b0;
                r_ras_n <= 1'b1;
                r_cas_n <= 1'b1;
                r_wen_n <= 1'b1;
                r_dq_oe <= 1'b0;

                IF (wait_cnt == 3'b0) {
                    // Data valid, capture it
                    r_rdata <= sdram_dq;
                    r_done  <= 1'b1;
                    state <= lit(4, ST_IDLE);
                } ELSE {
                    r_done <= 1'b0;
                    wait_cnt <= wait_cnt - 3'b1;
                }
            }

            // ---- WR: WRITE command issued ----
            CASE (lit(4, ST_WR)) {
                // NOP, clear DQ drive
                r_cs_n  <= 1'b0;
                r_ras_n <= 1'b1;
                r_cas_n <= 1'b1;
                r_wen_n <= 1'b1;
                r_dq_oe <= 1'b0;
                r_done  <= 1'b1;
                state <= lit(4, ST_IDLE);
            }

            // ---- REF: Periodic auto refresh ----
            CASE (lit(4, ST_REF)) {
                r_done <= 1'b0;
                IF (wait_cnt == 3'b0) {
                    // NOP, return to IDLE
                    r_cs_n  <= 1'b0;
                    r_ras_n <= 1'b1;
                    r_cas_n <= 1'b1;
                    r_wen_n <= 1'b1;
                    state <= lit(4, ST_IDLE);
                } ELSE {
                    // NOP while waiting
                    r_cs_n  <= 1'b0;
                    r_ras_n <= 1'b1;
                    r_cas_n <= 1'b1;
                    r_wen_n <= 1'b1;
                    wait_cnt <= wait_cnt - 3'b1;
                }
            }

            DEFAULT {
                r_done <= 1'b0;
                // NOP
                r_cs_n  <= 1'b0;
                r_ras_n <= 1'b1;
                r_cas_n <= 1'b1;
                r_wen_n <= 1'b1;
                state <= lit(4, ST_IDLE);
            }
        }
    }
@endmod
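The timing constants and the address split in `sdram_ctrl` are simple arithmetic and can be sanity-checked outside the HDL. The Python below is a sketch that assumes `/` in the `CONST` block is integer division and that `MODE_REG` follows the conventional SDR SDRAM mode-register layout (A2:A0 burst length, A3 burst type, A6:A4 CAS latency, A9 write burst mode). The function names are hypothetical.

```python
CLK_FREQ_MHZ = 54
INIT_COUNT = CLK_FREQ_MHZ * 200              # cycles in the 200 us power-up wait
REFRESH_INTERVAL = CLK_FREQ_MHZ * 78 // 10   # cycles in 7.8 us
MODE_REG = 544


def split_address(addr):
    """Split a 21-bit word address as the IDLE and ACT states do."""
    bank = (addr >> 19) & 0x3     # addr[20:19]
    row = (addr >> 8) & 0x7FF     # addr[18:8]
    col = addr & 0xFF             # addr[7:0]
    return bank, row, col


def decode_mode_reg(value):
    """Decode mode-register fields (assumed conventional SDR layout)."""
    return {
        "burst_length_code": value & 0x7,          # 0 means burst length 1
        "burst_type": (value >> 3) & 0x1,          # 0 means sequential
        "cas_latency": (value >> 4) & 0x7,
        "write_burst_single": (value >> 9) & 0x1,
    }
```

Both counters fit their registers: `INIT_COUNT` needs 14 bits and `REFRESH_INTERVAL` needs 9, against `init_cnt [14]` and `ref_cnt [10]`.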
jz
// Bus-mapped SDRAM peripheral
// Bridges SIMPLE_BUS protocol to sdram_ctrl rd/wr/done interface
@module sdram_bus
    CONST {
        CLK_FREQ_MHZ = 54;
        ST_IDLE = 0;
        ST_WAIT = 1;
    }

    PORT {
        IN  [1] clk;
        IN  [1] rst_n;
        BUS SIMPLE_BUS TARGET pbus;

        // SDRAM physical interface (directly to pins)
        OUT   [1]  sdram_cke;
        OUT   [1]  sdram_cs_n;
        OUT   [1]  sdram_ras_n;
        OUT   [1]  sdram_cas_n;
        OUT   [1]  sdram_wen_n;
        OUT   [4]  sdram_dqm;
        OUT   [11] sdram_addr;
        OUT   [2]  sdram_ba;
        INOUT [32] sdram_dq;
    }

    WIRE {
        ctrl_rdata [32];
        ctrl_busy  [1];
        ctrl_done  [1];

        ctrl_cke     [1];
        ctrl_cs_n    [1];
        ctrl_ras_n   [1];
        ctrl_cas_n   [1];
        ctrl_wen_n   [1];
        ctrl_dqm     [4];
        ctrl_addr    [11];
        ctrl_ba      [2];
        ctrl_dq      [32];
    }

    REGISTER {
        state      [1] = 1'b0;
        rd_hold    [1] = 1'b0;
        wr_hold    [1] = 1'b0;
        req_addr   [21] = 21'b0;
        req_wdata  [32] = 32'b0;
    }

    @new ctrl0 sdram_ctrl {
        OVERRIDE {
            CLK_FREQ_MHZ = CLK_FREQ_MHZ;
        }
        IN  [1]  clk       = clk;
        IN  [1]  rst_n     = rst_n;
        IN  [21] addr      = req_addr;
        IN  [32] wdata     = req_wdata;
        OUT [32] rdata     = ctrl_rdata;
        IN  [1]  rd        = rd_hold;
        IN  [1]  wr        = wr_hold;
        OUT [1]  busy      = ctrl_busy;
        OUT [1]  done      = ctrl_done;
        OUT [1]  sdram_cke   = ctrl_cke;
        OUT [1]  sdram_cs_n  = ctrl_cs_n;
        OUT [1]  sdram_ras_n = ctrl_ras_n;
        OUT [1]  sdram_cas_n = ctrl_cas_n;
        OUT [1]  sdram_wen_n = ctrl_wen_n;
        OUT [4]  sdram_dqm   = ctrl_dqm;
        OUT [11] sdram_addr  = ctrl_addr;
        OUT [2]  sdram_ba    = ctrl_ba;
        INOUT [32] sdram_dq  = ctrl_dq;
    }

    ASYNCHRONOUS {
        // Pass through SDRAM physical pins
        sdram_cke   <= ctrl_cke;
        sdram_cs_n  <= ctrl_cs_n;
        sdram_ras_n <= ctrl_ras_n;
        sdram_cas_n <= ctrl_cas_n;
        sdram_wen_n <= ctrl_wen_n;
        sdram_dqm   <= ctrl_dqm;
        sdram_addr  <= ctrl_addr;
        sdram_ba    <= ctrl_ba;
        sdram_dq    = ctrl_dq;

        // Drive bus DATA on read completion
        pbus.DATA <= (pbus.VALID && pbus.CMD == CMD.READ && state == lit(1, ST_WAIT) && ctrl_done == 1'b1) ? ctrl_rdata : 32'bz;

        // DONE signaling: multi-cycle, assert when sdram_ctrl done fires
        IF (pbus.VALID) {
            IF (state == lit(1, ST_WAIT) && ctrl_done == 1'b1) {
                pbus.DONE <= 1'b1;
            } ELSE {
                pbus.DONE <= 1'b0;
            }
        } ELSE {
            pbus.DONE <= 1'bz;
        }
    }

    SYNCHRONOUS(CLK = clk RESET = rst_n RESET_ACTIVE = Low) {
        SELECT(state) {
            CASE (lit(1, ST_IDLE)) {
                IF (pbus.VALID && ctrl_busy == 1'b0) {
                    req_addr  <= pbus.ADDR[22:2];
                    req_wdata <= pbus.DATA;

                    IF (pbus.CMD == CMD.WRITE) {
                        wr_hold <= 1'b1;
                        rd_hold <= 1'b0;
                    } ELSE {
                        rd_hold <= 1'b1;
                        wr_hold <= 1'b0;
                    }
                    state <= lit(1, ST_WAIT);
                } ELSE {
                    rd_hold <= 1'b0;
                    wr_hold <= 1'b0;
                }
            }

            CASE (lit(1, ST_WAIT)) {
                // Hold rd/wr high until ctrl completes.
                // If ctrl was refreshing when we asserted, it will see
                // rd/wr=1 when it returns to IDLE and start the access.
                IF (ctrl_done == 1'b1) {
                    rd_hold <= 1'b0;
                    wr_hold <= 1'b0;
                    state <= lit(1, ST_IDLE);
                }
            }

            DEFAULT {
                rd_hold <= 1'b0;
                wr_hold <= 1'b0;
                state <= lit(1, ST_IDLE);
            }
        }
    }
@endmod
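The bridge in `sdram_bus` is a two-state handshake: it latches a request only while the controller is idle, then holds `rd` or `wr` until `done` fires. The Python below mirrors that state machine with the controller reduced to two inputs. It is a sketch under that simplification, and the class name `SdramBusModel` is hypothetical.

```python
ST_IDLE, ST_WAIT = 0, 1


class SdramBusModel:
    """Sketch of the sdram_bus bridge; ctrl_busy and ctrl_done are supplied
    by the caller instead of a real controller model."""

    def __init__(self):
        self.state = ST_IDLE
        self.rd_hold = 0
        self.wr_hold = 0
        self.req_addr = 0

    def bus_done(self, valid, ctrl_done):
        """Combinational DONE toward the CPU; None models high-Z."""
        if not valid:
            return None
        return 1 if (self.state == ST_WAIT and ctrl_done) else 0

    def clock(self, valid, is_write, addr, ctrl_busy, ctrl_done):
        """One rising clock edge of the bridge state machine."""
        if self.state == ST_IDLE:
            if valid and not ctrl_busy:
                self.req_addr = (addr >> 2) & 0x1FFFFF   # ADDR[22:2]
                self.wr_hold = 1 if is_write else 0
                self.rd_hold = 0 if is_write else 1
                self.state = ST_WAIT
            else:
                self.rd_hold = 0
                self.wr_hold = 0
        else:
            if ctrl_done:
                self.rd_hold = 0
                self.wr_hold = 0
                self.state = ST_IDLE
```

Holding the strobe level rather than pulsing it means the request cannot be missed while the controller is between states. The CPU sees DONE only in the cycle where the controller reports `done`.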
jz
@module led_out
    PORT {
        IN  [1] clk;
        IN  [1] rst_n;
        BUS SIMPLE_BUS TARGET pbus;

        OUT [6] leds;
    }

    REGISTER {
        data [32] = 32'b0;
    }

    ASYNCHRONOUS {
        leds = data[5:0];

        // Drive data only when selected, valid, and in READ mode
        pbus.DATA <= (pbus.VALID && pbus.CMD == CMD.READ) ? data : 32'bz;

        // Drive DONE only when selected and valid
        IF (pbus.VALID) {
            pbus.DONE <= 1'b1;
        } ELSE {
            pbus.DONE <= 1'bz;
        }
    }

    SYNCHRONOUS(CLK = clk RESET = rst_n RESET_ACTIVE = Low) {
        IF (pbus.VALID && pbus.CMD == CMD.WRITE) {
            data <= pbus.DATA;
        }
    }
@endmod
jz
// Bus-mapped UART peripheral (TX + RX)
// Offset 0x0 (ADDR[2]=0): Write → TX byte. Read → {30'b0, rx_has_data, tx_ready}
// Offset 0x4 (ADDR[2]=1): Read → {24'h00, rx_byte} (clears rx_has_data)
@module uart
    PORT {
        IN  [1] clk;
        IN  [1] rst_n;
        BUS SIMPLE_BUS TARGET pbus;
        OUT [1] tx;
        IN  [1] rx;
        OUT [1] irq_tx_ready;
        OUT [1] irq_rx_data;
        IN  [16] baud_div;
    }

    WIRE {
        tx_ready   [1];
        tx_wire    [1];
        rx_data_w  [8];
        rx_valid_w [1];
    }

    REGISTER {
        tx_data     [8] = 8'h00;
        tx_valid    [1] = 1'b0;
        tx_ready_d  [1] = 1'b0;
        irq_tx_r    [1] = 1'b0;

        rx_byte     [8] = 8'h00;
        rx_has_data [1] = 1'b0;
    }

    @new tx0 uart_tx {
        IN  [1] clk   = clk;
        IN  [1] rst_n = rst_n;
        IN  [8] data  = tx_data;
        IN  [1] valid = tx_valid;
        OUT [1] ready = tx_ready;
        OUT [1] tx    = tx_wire;
        IN  [16] baud_div = baud_div;
    }

    @new rx0 uart_rx {
        IN  [1] clk   = clk;
        IN  [1] rst_n = rst_n;
        IN  [1] rx    = rx;
        OUT [8] data  = rx_data_w;
        OUT [1] valid = rx_valid_w;
        IN  [16] baud_div = baud_div;
    }

    ASYNCHRONOUS {
        tx <= tx_wire;
        irq_tx_ready <= irq_tx_r;
        irq_rx_data  <= rx_has_data;

        // Drive data on READ: mux on ADDR[2]
        IF (pbus.VALID && pbus.CMD == CMD.READ) {
            IF (pbus.ADDR[2] == 1'b0) {
                pbus.DATA <= {30'd0, rx_has_data, tx_ready};
            } ELSE {
                pbus.DATA <= {24'h000000, rx_byte};
            }
        } ELSE {
            pbus.DATA <= 32'bz;
        }

        // DONE: immediate when selected
        IF (pbus.VALID) {
            pbus.DONE <= 1'b1;
        } ELSE {
            pbus.DONE <= 1'bz;
        }
    }

    SYNCHRONOUS(CLK = clk RESET = rst_n RESET_ACTIVE = Low) {
        // TX: send a byte on any bus write (ADDR[2] is not checked, so a write to offset 0x4 also transmits)
        IF (pbus.VALID && pbus.CMD == CMD.WRITE) {
            tx_data  <= pbus.DATA[7:0];
            tx_valid <= 1'b1;
        } ELSE {
            tx_valid <= 1'b0;
        }

        // TX ready rising-edge detect: pulse IRQ when transmitter becomes ready
        tx_ready_d <= tx_ready;
        IF (tx_ready == 1'b1 && tx_ready_d == 1'b0) {
            irq_tx_r <= 1'b1;
        } ELSE {
            irq_tx_r <= 1'b0;
        }

        // RX: latch received byte
        IF (rx_valid_w == 1'b1) {
            rx_byte     <= rx_data_w;
            rx_has_data <= 1'b1;
        } ELIF (pbus.VALID && pbus.CMD == CMD.READ && pbus.ADDR[2] == 1'b1) {
            // Clear rx_has_data when bus reads offset 0x4
            rx_has_data <= 1'b0;
        }
    }
@endmod
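Seen from software, the logic above amounts to a small register interface. A read with ADDR[2] = 0 returns status (bit 0 = tx_ready, bit 1 = rx_has_data). A read with ADDR[2] = 1 returns the last received byte and clears rx_has_data. Any write hands DATA[7:0] to the transmitter. The Python sketch below is a behavioural model for illustration only. The names `UartRegs`, `receive`, `bus_read` and `bus_write` are hypothetical and do not appear in the SoC sources, and `tx_ready` is set by hand rather than modelled over time.

```python
class UartRegs:
    """Behavioural model of the UART peripheral's bus-visible state."""

    STATUS = 0x0  # read: bit 0 = tx_ready, bit 1 = rx_has_data
    RXDATA = 0x4  # read: last received byte; reading clears rx_has_data

    def __init__(self):
        self.tx_ready = 1      # 1 while the transmitter is idle
        self.rx_byte = 0
        self.rx_has_data = 0
        self.sent = []         # bytes accepted by the transmitter

    def receive(self, byte):
        """Model uart_rx pulsing valid: latch the byte and set the flag."""
        self.rx_byte = byte & 0xFF
        self.rx_has_data = 1

    def bus_read(self, addr):
        """Only ADDR[2] is decoded, as in the ASYNCHRONOUS block."""
        if (addr >> 2) & 1 == 0:
            return (self.rx_has_data << 1) | self.tx_ready
        value = self.rx_byte
        self.rx_has_data = 0   # reading the data register clears the flag
        return value

    def bus_write(self, addr, data):
        """Any write forwards DATA[7:0]; uart_tx only accepts it when idle."""
        if self.tx_ready:
            self.sent.append(data & 0xFF)


uart = UartRegs()
uart.receive(0x41)
status = uart.bus_read(UartRegs.STATUS)   # both flags set
byte = uart.bus_read(UartRegs.RXDATA)     # returns 0x41, clears rx_has_data
```

Because `uart_tx` samples `valid` only in its IDLE state, a driver would normally poll bit 0 of the status word before each write; the model drops writes made while `tx_ready` is 0 to reflect that.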
jz
// Simple UART Transmitter — 8N1, no FIFO
// Asserts ready when idle. When valid is pulsed with data, transmits one byte.
@module uart_tx
    PORT {
        IN  [1] clk;
        IN  [1] rst_n;
        IN  [8] data;
        IN  [1] valid;
        OUT [1] ready;
        OUT [1] tx;
        IN  [16] baud_div;
    }

    REGISTER {
        // State machine (0=IDLE, 1=START, 2=DATA, 3=STOP)
        state     [2] = 2'd0;
        baud_cnt  [16] = 16'd0;
        bit_cnt   [3] = 3'd0;
        shift     [8] = 8'hFF;

        // Outputs
        tx_out    [1] = 1'b1;
        ready_out [1] = 1'b1;
    }

    ASYNCHRONOUS {
        tx    <= tx_out;
        ready <= ready_out;
    }

    SYNCHRONOUS(CLK = clk RESET = rst_n RESET_ACTIVE = Low) {
        SELECT (state) {
            CASE (2'd0) {
                // IDLE: line high, ready for data
                tx_out <= 1'b1;
                IF (valid == 1'b1) {
                    shift     <= data;
                    baud_cnt  <= baud_div;
                    state     <= 2'd1;
                    ready_out <= 1'b0;
                } ELSE {
                    ready_out <= 1'b1;
                }
            }

            CASE (2'd1) {
                // START bit: hold TX low for one baud period
                tx_out    <= 1'b0;
                ready_out <= 1'b0;
                IF (baud_cnt == 16'd0) {
                    baud_cnt <= baud_div;
                    bit_cnt  <= 3'd0;
                    state    <= 2'd2;
                } ELSE {
                    baud_cnt <= baud_cnt - 16'd1;
                }
            }

            CASE (2'd2) {
                // DATA: shift out 8 bits LSB first
                tx_out    <= shift[0];
                ready_out <= 1'b0;
                IF (baud_cnt == 16'd0) {
                    shift <= { 1'b1, shift[7:1] };
                    IF (bit_cnt == 3'd7) {
                        baud_cnt <= baud_div;
                        state    <= 2'd3;
                    } ELSE {
                        bit_cnt  <= bit_cnt + 3'd1;
                        baud_cnt <= baud_div;
                    }
                } ELSE {
                    baud_cnt <= baud_cnt - 16'd1;
                }
            }

            CASE (2'd3) {
                // STOP bit: hold TX high for one baud period
                tx_out <= 1'b1;
                IF (baud_cnt == 16'd0) {
                    state     <= 2'd0;
                    ready_out <= 1'b1;
                } ELSE {
                    ready_out <= 1'b0;
                    baud_cnt  <= baud_cnt - 16'd1;
                }
            }

            DEFAULT {
                tx_out    <= 1'b1;
                ready_out <= 1'b1;
                state     <= 2'd0;
            }
        }
    }
@endmod
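In the transmitter above, `baud_cnt` is loaded with `baud_div` and counts down to zero, so each bit lasts `baud_div + 1` clock cycles. The Python sketch below works through that arithmetic and lists the line levels of one 8N1 frame. The function names are hypothetical, and the 74.25 MHz clock is an assumption borrowed from the SD card controller's comments; the UART may run from a different clock.

```python
def baud_div_for(clk_hz, baud):
    """Return the baud_div whose bit period (baud_div + 1 clocks)
    is closest to clk_hz / baud."""
    clocks_per_bit = (clk_hz + baud // 2) // baud  # rounded integer division
    div = clocks_per_bit - 1
    if not 0 <= div <= 0xFFFF:
        raise ValueError("baud_div does not fit in 16 bits")
    return div


def actual_baud(clk_hz, div):
    """Baud rate actually produced by a given divider."""
    return clk_hz / (div + 1)


def frame_bits(byte):
    """Line levels for one 8N1 frame: start bit, 8 data bits LSB first, stop bit."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


CLK_HZ = 74_250_000  # assumed system clock, see the note above
div = baud_div_for(CLK_HZ, 115_200)
print(div, round(actual_baud(CLK_HZ, div)), frame_bits(0x41))
# prints: 644 115116 [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```

With these assumed numbers the rate error is under 0.1 %, well inside what 8N1 framing tolerates.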
jz
// Simple UART Receiver — 8N1, no FIFO
// Pulses valid for 1 cycle when a byte is received
@module uart_rx
    PORT {
        IN  [1] clk;
        IN  [1] rst_n;
        IN  [1] rx;
        OUT [8] data;
        OUT [1] valid;
        IN  [16] baud_div;
    }

    REGISTER {
        // Metastability synchronizer
        rx_sync1    [1] = 1'b1;
        rx_sync2    [1] = 1'b1;

        // State machine (0=IDLE, 1=START, 2=DATA, 3=STOP)
        state       [2] = 2'd0;
        baud_cnt    [16] = 16'd0;
        bit_cnt     [3] = 3'd0;
        shift       [8] = 8'h00;

        // Output
        data_out    [8] = 8'h00;
        valid_out   [1] = 1'b0;
    }

    ASYNCHRONOUS {
        data  <= data_out;
        valid <= valid_out;
    }

    SYNCHRONOUS(CLK=clk RESET=rst_n RESET_ACTIVE=Low) {
        // 2-stage synchronizer for async RX input
        rx_sync1 <= rx;
        rx_sync2 <= rx_sync1;

        SELECT (state) {
            CASE (2'd0) {
                // IDLE: wait for start bit (synchronized RX sampled low)
                valid_out <= 1'b0;
                IF (rx_sync2 == 1'b0) {
                    baud_cnt <= {1'b0, baud_div[15:1]};
                    state <= 2'd1;
                }
            }

            CASE (2'd1) {
                // START: verify start bit at mid-point
                valid_out <= 1'b0;
                IF (baud_cnt == 16'd0) {
                    IF (rx_sync2 == 1'b0) {
                        baud_cnt <= baud_div;
                        bit_cnt <= 3'd0;
                        shift <= 8'h00;
                        state <= 2'd2;
                    } ELSE {
                        // False start
                        state <= 2'd0;
                    }
                } ELSE {
                    baud_cnt <= baud_cnt - 16'd1;
                }
            }

            CASE (2'd2) {
                // DATA: sample 8 bits at mid-bit
                valid_out <= 1'b0;
                IF (baud_cnt == 16'd0) {
                    shift <= { rx_sync2, shift[7:1] };
                    IF (bit_cnt == 3'd7) {
                        baud_cnt <= baud_div;
                        state <= 2'd3;
                    } ELSE {
                        bit_cnt <= bit_cnt + 3'd1;
                        baud_cnt <= baud_div;
                    }
                } ELSE {
                    baud_cnt <= baud_cnt - 16'd1;
                }
            }

            CASE (2'd3) {
                // STOP: wait one bit period, then output the byte (stop-bit level is not checked)
                IF (baud_cnt == 16'd0) {
                    data_out <= shift;
                    valid_out <= 1'b1;
                    state <= 2'd0;
                } ELSE {
                    valid_out <= 1'b0;
                    baud_cnt <= baud_cnt - 16'd1;
                }
            }

            DEFAULT {
                valid_out <= 1'b0;
                state <= 2'd0;
            }
        }
    }
@endmod
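The receiver's timing is easier to follow in a cycle-by-cycle model. After the synchronized line goes low it waits half a bit period, rechecks the start bit, then samples once per bit period so that every sample lands near the middle of a bit. The Python sketch below mirrors the state machine above under that reading. `uart_rx_model` and `waveform` are hypothetical helper names, and the model assumes an ideal input whose bits last exactly `baud_div + 1` clocks.

```python
def waveform(byte, baud_div, idle=4):
    """RX pin samples, one per clock: idle, one 8N1 frame, idle."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    samples = [1] * idle
    for bit in bits:
        samples += [bit] * (baud_div + 1)
    return samples + [1] * idle


def uart_rx_model(samples, baud_div):
    """Cycle-by-cycle model of uart_rx; returns the bytes it reports."""
    sync1 = sync2 = 1
    state, baud_cnt, bit_cnt, shift = 0, 0, 0, 0
    received = []
    for rx in samples:
        cur = sync2                          # value seen before this clock edge
        next_sync1, next_sync2 = rx, sync1   # 2-stage synchronizer
        if state == 0:                       # IDLE
            if cur == 0:
                baud_cnt = baud_div >> 1     # wait half a bit period
                state = 1
        elif state == 1:                     # START: recheck at mid-bit
            if baud_cnt == 0:
                if cur == 0:
                    baud_cnt, bit_cnt, shift, state = baud_div, 0, 0, 2
                else:
                    state = 0                # false start
            else:
                baud_cnt -= 1
        elif state == 2:                     # DATA: LSB arrives first
            if baud_cnt == 0:
                shift = (cur << 7) | (shift >> 1)
                baud_cnt = baud_div
                if bit_cnt == 7:
                    state = 3
                else:
                    bit_cnt += 1
            else:
                baud_cnt -= 1
        else:                                # STOP: report the byte
            if baud_cnt == 0:
                received.append(shift)
                state = 0
            else:
                baud_cnt -= 1
        sync1, sync2 = next_sync1, next_sync2
    return received


decoded = uart_rx_model(waveform(0x41, 15), 15)
glitch = uart_rx_model([1] * 4 + [0] * 3 + [1] * 100, 15)
```

Here `decoded` holds the single byte 0x41, and `glitch` is empty because the three-cycle low pulse has ended by the time the start bit is rechecked.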
jz
// SD Card SPI Controller
// SPI-mode SD card interface with sector read/write.
// CPU accesses registers via SIMPLE_BUS TARGET.
// 512-byte sector buffer with auto-incrementing DATA register.
//
// Register map (base + offset, 32-bit word aligned):
//   0x00: COMMAND   [RW] bits[1:0]  = command: 00=NONE, 01=READ, 10=WRITE
//   0x04: STATUS    [R]  bits[4:0]  = {DMA_ACTIVE, SDHC, READY, ERROR, BUSY} (DMA_ACTIVE reads as 0)
//   0x08: SECTOR_LO [RW] bits[15:0] = sector address low
//   0x0C: SECTOR_HI [RW] bits[15:0] = sector address high
//   0x10: (reserved)
//   0x14: DATA      [RW] bits[15:0] = buffer auto-increment access
//   0x18: IRQ_CTRL  [RW] bits[1:0]  = write {clear, enable}; read {status, enable}

// SD card state machine states
@global SDCST
    POWER_UP    = 5'b00000;
    SEND_CLKS   = 5'b00001;
    CMD0        = 5'b00010;
    CMD0_RESP   = 5'b00011;
    CMD8        = 5'b00100;
    CMD8_RESP   = 5'b00101;
    CMD55       = 5'b00110;
    CMD55_RESP  = 5'b00111;
    ACMD41      = 5'b01000;
    ACMD41_RESP = 5'b01001;
    CMD58       = 5'b01010;
    CMD58_RESP  = 5'b01011;
    IDLE        = 5'b01100;
    READ_CMD    = 5'b01101;
    READ_RESP   = 5'b01110;
    READ_TOKEN  = 5'b01111;
    READ_DATA   = 5'b10000;
    READ_CRC    = 5'b10001;
    READ_DONE   = 5'b10011;
    WRITE_CMD   = 5'b10100;
    WRITE_RESP  = 5'b10101;
    WRITE_TOKEN = 5'b10110;
    WRITE_DATA  = 5'b10111;
    WRITE_CRC   = 5'b11000;
    WRITE_DRESP = 5'b11001;
    WRITE_BUSY  = 5'b11010;
    WRITE_DONE  = 5'b11011;
    ERROR       = 5'b11100;
    CS_GAP      = 5'b11101;
@endglob

// SD card register addresses
@global SDREG
    COMMAND     = 3'b000;
    STATUS      = 3'b001;
    SECTOR_LO   = 3'b010;
    SECTOR_HI   = 3'b011;
    DATA        = 3'b101;
    IRQ_CTRL    = 3'b110;
@endglob

// SD card commands
@global SDCMD
    NONE         = 2'b00;
    READ_SECTOR  = 2'b01;
    WRITE_SECTOR = 2'b10;
@endglob

@module sdcard
    CONST {
        // SPI clock dividers: system_clk / (2 * (div+1))
        // Slow: 74.25 MHz / (2*186) = ~200 kHz (init)
        // Fast: 74.25 MHz / (2*5)   = ~7.4 MHz (data)
        SPI_DIV_SLOW = 185;
        SPI_DIV_FAST = 4;
    }

    PORT {
        IN  [1] clk;
        IN  [1] rst_n;
        OUT [1] irq;
        BUS SIMPLE_BUS TARGET pbus;

        // SPI physical pins
        OUT [1] sd_clk;
        OUT [1] sd_mosi;
        IN  [1] sd_miso;
        OUT [1] sd_cs_n;
    }

    // 256-word (512-byte) sector buffer
    MEM {
        buffer [16] [256] = 16'h0000 {
            OUT rd ASYNC;
            IN  wr;
        };
    }

    WIRE {
        read_data [32];

        // Precomputed CMD17/CMD24 argument bytes
        // SDHC: sector number directly; SDSC: byte address (sector << 9)
        cmd_arg_b3 [8];
        cmd_arg_b2 [8];
        cmd_arg_b1 [8];
        cmd_arg_b0 [8];
    }

    REGISTER {
        // SD Card State Machine
        state         [5] = 5'b00000;  // SDCST.POWER_UP

        // SPI Engine
        spi_div       [8] = 8'd185;
        spi_div_cnt   [8] = 8'b0;
        spi_clk_reg   [1] = 1'b0;
        spi_shift_out [8] = 8'hFF;
        spi_shift_in  [8] = 8'b0;
        spi_bit_cnt   [4] = 4'b0;
        spi_busy      [1] = 1'b0;
        spi_rx_data   [8] = 8'b0;

        // MISO synchronizer
        miso_sync1    [1] = 1'b1;
        miso_sync2    [1] = 1'b1;

        // CS control
        cs_n          [1] = 1'b1;

        // SD Protocol
        powerup_cnt   [20] = 20'b0;
        gap_next_state [5] = 5'b0;
        clock_cnt     [8]  = 8'b0;
        send_cmd_phase [3] = 3'b0;
        resp_wait_cnt  [16] = 16'b0;
        resp_byte_cnt [4]  = 4'b0;
        r1_response   [8]  = 8'b0;
        retry_cnt     [16] = 16'b0;
        sdhc_flag     [1]  = 1'b0;

        // Sector Read/Write
        sector_lo     [16] = 16'b0;
        sector_hi     [16] = 16'b0;
        byte_cnt      [10] = 10'b0;
        byte_pair_lo  [8]  = 8'b0;
        buf_wr_addr   [8]  = 8'b0;
        buf_cpu_addr  [8]  = 8'b0;

        // CPU Command/Status
        command       [2]  = 2'b0;
        busy          [1]  = 1'b0;
        error         [1]  = 1'b0;
        ready         [1]  = 1'b0;
        irq_enable    [1]  = 1'b0;
        irq_status    [1]  = 1'b0;
        irq_clear_req [1]  = 1'b0;

        // Buffer Write Pipeline (single write point, 1-cycle latency)
        buf_wr_en     [1]  = 1'b0;
        buf_wr_addr_r [8]  = 8'b0;
        buf_wr_data_r [16] = 16'b0;

        // Bus Interface Pipeline
        pending_read  [1]  = 1'b0;
        read_reg_sel  [3]  = 3'b0;
        data_ready    [1]  = 1'b0;
    }

    ASYNCHRONOUS {
        // SPI Pin Outputs
        sd_clk  <= spi_clk_reg;
        sd_mosi <= spi_shift_out[7];
        sd_cs_n <= cs_n;

        // CPU Bus Read Data Mux
        SELECT(read_reg_sel) {
            CASE SDREG.COMMAND {
                read_data <= {30'b0, command};
            }
            CASE SDREG.STATUS {
                read_data <= {27'b0, 1'b0, sdhc_flag, ready, error, busy};
            }
            CASE SDREG.SECTOR_LO {
                read_data <= {16'b0, sector_lo};
            }
            CASE SDREG.SECTOR_HI {
                read_data <= {16'b0, sector_hi};
            }
            CASE SDREG.DATA {
                read_data <= {16'b0, buffer.rd[buf_cpu_addr]};
            }
            CASE SDREG.IRQ_CTRL {
                read_data <= {30'b0, irq_status, irq_enable};
            }
            DEFAULT {
                read_data <= 32'b0;
            }
        }

        // Drive bus data on read
        pbus.DATA <= (pbus.VALID && pbus.CMD == CMD.READ && data_ready == 1'b1) ? read_data : 32'bz;

        // DONE signaling
        IF (pbus.VALID) {
            IF (pbus.CMD == CMD.WRITE || data_ready == 1'b1) {
                pbus.DONE <= 1'b1;
            } ELSE {
                pbus.DONE <= 1'b0;
            }
        } ELSE {
            pbus.DONE <= 1'bz;
        }

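        // CMD argument bytes (SDHC = sector number, SDSC = byte address = sector << 9).
        // Example: sector 0x00012345 is sent as 00 01 23 45 on SDHC and as
        // 02 46 8A 00 on SDSC (0x12345 << 9 = 0x02468A00).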
        IF (sdhc_flag == 1'b1) {
            cmd_arg_b3 <= sector_hi[15:8];
            cmd_arg_b2 <= sector_hi[7:0];
            cmd_arg_b1 <= sector_lo[15:8];
            cmd_arg_b0 <= sector_lo[7:0];
        } ELSE {
            cmd_arg_b3 <= {sector_hi[6:0], sector_lo[15]};
            cmd_arg_b2 <= sector_lo[14:7];
            cmd_arg_b1 <= {sector_lo[6:0], 1'b0};
            cmd_arg_b0 <= 8'h00;
        }

        // IRQ output
        IF (irq_enable == 1'b1 && irq_status == 1'b1) {
            irq <= 1'b1;
        } ELSE {
            irq <= 1'b0;
        }
    }

    SYNCHRONOUS(CLK=clk RESET=rst_n RESET_ACTIVE=Low) {
        // =====================
        // MISO Synchronizer (always runs)
        // =====================
        miso_sync1 <= sd_miso;
        miso_sync2 <= miso_sync1;

        // =====================
        // Bus Read Pipeline
        // =====================
        IF (data_ready == 1'b1) {
            data_ready <= 1'b0;
        } ELIF (pending_read == 1'b1) {
            data_ready <= 1'b1;
            pending_read <= 1'b0;
        } ELIF (pbus.VALID && pbus.CMD == CMD.READ) {
            pending_read <= 1'b1;
            read_reg_sel <= pbus.ADDR[4:2];
        }

        // =====================
        // Combined: Buffer Write + SPI Engine + Bus Writes + State Machine
        // Priority: Buffer Write > SPI > Bus Write > State Machine
        // =====================
        IF (buf_wr_en == 1'b1) {
            // --- Pending buffer write (1-cycle delayed) ---
            buffer.wr[buf_wr_addr_r] <= buf_wr_data_r;
            buf_wr_en <= 1'b0;

        } ELIF (spi_busy == 1'b1) {
            // --- SPI shift register ---
            IF (spi_div_cnt == 8'b0) {
                spi_div_cnt <= spi_div;
                IF (spi_clk_reg == 1'b0) {
                    // Rising edge: sample MISO
                    spi_clk_reg <= 1'b1;
                    spi_shift_in <= {spi_shift_in[6:0], miso_sync2};
                } ELSE {
                    // Falling edge
                    spi_clk_reg <= 1'b0;
                    IF (spi_bit_cnt == 4'd7) {
                        // Transfer complete
                        spi_busy <= 1'b0;
                        spi_rx_data <= spi_shift_in;
                        spi_bit_cnt <= 4'b0;
                    } ELSE {
                        spi_bit_cnt <= spi_bit_cnt + 4'b1;
                        spi_shift_out <= {spi_shift_out[6:0], 1'b1};
                    }
                }
            } ELSE {
                spi_div_cnt <= spi_div_cnt - 8'b1;
            }

        } ELIF (pbus.VALID && pbus.CMD == CMD.WRITE) {
            // --- CPU register writes (taken only when no buffer write is pending and SPI is idle) ---
            SELECT(pbus.ADDR[4:2]) {
                CASE SDREG.COMMAND {
                    command <= pbus.DATA[1:0];
                }
                CASE SDREG.SECTOR_LO {
                    sector_lo <= pbus.DATA[15:0];
                }
                CASE SDREG.SECTOR_HI {
                    sector_hi <= pbus.DATA[15:0];
                }
                CASE SDREG.DATA {
                    buf_wr_en <= 1'b1;
                    buf_wr_addr_r <= buf_cpu_addr;
                    buf_wr_data_r <= pbus.DATA[15:0];
                    buf_cpu_addr <= buf_cpu_addr + 8'b1;
                }
                CASE SDREG.IRQ_CTRL {
                    irq_enable <= pbus.DATA[0];
                    IF (pbus.DATA[1] == 1'b1) {
                        irq_clear_req <= 1'b1;
                    }
                }
                DEFAULT {
                }
            }

        } ELSE {
            // --- State Machine (SPI idle, no bus write) ---
            SELECT(state) {
                // ----- Power-Up Delay (742500 cycles = 10 ms at 74.25 MHz) -----
                CASE SDCST.POWER_UP {
                    cs_n <= 1'b1;
                    busy <= 1'b1;
                    IF (powerup_cnt == 20'd742500) {
                        powerup_cnt <= 20'b0;
                        clock_cnt <= 8'b0;
                        state <= SDCST.SEND_CLKS;
                    } ELSE {
                        powerup_cnt <= powerup_cnt + 20'b1;
                    }
                }

                // ----- Send 20 bytes of 0xFF (160 clocks) with CS high -----
                CASE SDCST.SEND_CLKS {
                    IF (clock_cnt == 8'd20) {
                        state <= SDCST.CMD0;
                        send_cmd_phase <= 3'b0;
                        retry_cnt <= 16'b0;
                    } ELSE {
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                        clock_cnt <= clock_cnt + 8'b1;
                    }
                }

                // ----- CMD0: GO_IDLE_STATE -----
                CASE SDCST.CMD0 {
                    SELECT(send_cmd_phase) {
                        CASE 3'd0 {
                            cs_n <= 1'b0;
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h40;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd1;
                        }
                        CASE 3'd1 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h00;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd2;
                        }
                        CASE 3'd2 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h00;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd3;
                        }
                        CASE 3'd3 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h00;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd4;
                        }
                        CASE 3'd4 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h00;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd5;
                        }
                        CASE 3'd5 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h95;  // CRC for CMD0
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd6;
                        }
                        CASE 3'd6 {
                            resp_wait_cnt <= 16'b0;
                            state <= SDCST.CMD0_RESP;
                        }
                        DEFAULT {
                            send_cmd_phase <= 3'd0;
                        }
                    }
                }

                // ----- CMD0 Response: poll up to 64 bytes for R1 (bit 7 clear); expect 0x01 (idle) -----
                CASE SDCST.CMD0_RESP {
                    IF (spi_rx_data[7] == 1'b0 && resp_wait_cnt != 16'b0) {
                        r1_response <= spi_rx_data;
                        cs_n <= 1'b1;
                        IF (spi_rx_data == 8'h01) {
                            gap_next_state <= SDCST.CMD8;
                            state <= SDCST.CS_GAP;
                        } ELSE {
                            IF (retry_cnt == 16'd255) {
                                state <= SDCST.ERROR;
                            } ELSE {
                                retry_cnt <= retry_cnt + 16'b1;
                                gap_next_state <= SDCST.CMD0;
                                state <= SDCST.CS_GAP;
                            }
                        }
                    } ELIF (resp_wait_cnt == 16'd64) {
                        cs_n <= 1'b1;
                        IF (retry_cnt == 16'd255) {
                            state <= SDCST.ERROR;
                        } ELSE {
                            retry_cnt <= retry_cnt + 16'b1;
                            gap_next_state <= SDCST.CMD0;
                            state <= SDCST.CS_GAP;
                        }
                    } ELSE {
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                        resp_wait_cnt <= resp_wait_cnt + 16'b1;
                    }
                }

                // ----- CMD8: SEND_IF_COND -----
                CASE SDCST.CMD8 {
                    SELECT(send_cmd_phase) {
                        CASE 3'd0 {
                            cs_n <= 1'b0;
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h48;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd1;
                        }
                        CASE 3'd1 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h00;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd2;
                        }
                        CASE 3'd2 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h00;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd3;
                        }
                        CASE 3'd3 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h01;  // VHS = 0001b (2.7-3.6 V)
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd4;
                        }
                        CASE 3'd4 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'hAA;  // Check pattern
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd5;
                        }
                        CASE 3'd5 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h87;  // CRC for CMD8
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd6;
                        }
                        CASE 3'd6 {
                            resp_wait_cnt <= 16'b0;
                            resp_byte_cnt <= 4'd0;
                            state <= SDCST.CMD8_RESP;
                        }
                        DEFAULT {
                            send_cmd_phase <= 3'd0;
                        }
                    }
                }

                // ----- CMD8 Response: R7 (R1 plus 4 bytes; the last must echo 0xAA) -----
                CASE SDCST.CMD8_RESP {
                    IF (resp_byte_cnt == 4'd0) {
                        IF (spi_rx_data[7] == 1'b0 && resp_wait_cnt != 16'b0) {
                            r1_response <= spi_rx_data;
                            IF (spi_rx_data == 8'h01) {
                                resp_byte_cnt <= 4'd1;
                                spi_busy <= 1'b1;
                                spi_shift_out <= 8'hFF;
                                spi_shift_in <= 8'b0;
                                spi_bit_cnt <= 4'b0;
                                spi_clk_reg <= 1'b0;
                                spi_div_cnt <= spi_div;
                            } ELSE {
                                cs_n <= 1'b1;
                                state <= SDCST.ERROR;
                            }
                        } ELIF (resp_wait_cnt == 16'd64) {
                            cs_n <= 1'b1;
                            state <= SDCST.ERROR;
                        } ELSE {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'hFF;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            resp_wait_cnt <= resp_wait_cnt + 16'b1;
                        }
                    } ELIF (resp_byte_cnt == 4'd4) {
                        cs_n <= 1'b1;
                        IF (spi_rx_data == 8'hAA) {
                            retry_cnt <= 16'b0;
                            gap_next_state <= SDCST.CMD55;
                            state <= SDCST.CS_GAP;
                        } ELSE {
                            state <= SDCST.ERROR;
                        }
                    } ELSE {
                        resp_byte_cnt <= resp_byte_cnt + 4'b1;
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                    }
                }

                // ----- CMD55: APP_CMD prefix -----
                CASE SDCST.CMD55 {
                    SELECT(send_cmd_phase) {
                        CASE 3'd0 {
                            cs_n <= 1'b0;
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h77;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd1;
                        }
                        CASE 3'd1 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h00;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd2;
                        }
                        CASE 3'd2 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h00;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd3;
                        }
                        CASE 3'd3 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h00;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd4;
                        }
                        CASE 3'd4 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h00;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd5;
                        }
                        CASE 3'd5 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h65;  // CRC for CMD55
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd6;
                        }
                        CASE 3'd6 {
                            resp_wait_cnt <= 16'b0;
                            state <= SDCST.CMD55_RESP;
                        }
                        DEFAULT {
                            send_cmd_phase <= 3'd0;
                        }
                    }
                }

                // ----- CMD55 Response -----
                CASE SDCST.CMD55_RESP {
                    IF (spi_rx_data[7] == 1'b0 && resp_wait_cnt != 16'b0) {
                        r1_response <= spi_rx_data;
                        cs_n <= 1'b1;
                        gap_next_state <= SDCST.ACMD41;
                        state <= SDCST.CS_GAP;
                    } ELIF (resp_wait_cnt == 16'd64) {
                        cs_n <= 1'b1;
                        state <= SDCST.ERROR;
                    } ELSE {
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                        resp_wait_cnt <= resp_wait_cnt + 16'b1;
                    }
                }

                // ----- ACMD41: SD_SEND_OP_COND -----
                CASE SDCST.ACMD41 {
                    SELECT(send_cmd_phase) {
                        CASE 3'd0 {
                            cs_n <= 1'b0;
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h69;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd1;
                        }
                        CASE 3'd1 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h40;  // HCS bit (argument bit 30): host supports high-capacity cards
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd2;
                        }
                        CASE 3'd2 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h00;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd3;
                        }
                        CASE 3'd3 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h00;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd4;
                        }
                        CASE 3'd4 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h00;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd5;
                        }
                        CASE 3'd5 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h77;  // CRC7 + stop bit for arg 0x40000000
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd6;
                        }
                        CASE 3'd6 {
                            resp_wait_cnt <= 16'b0;
                            state <= SDCST.ACMD41_RESP;
                        }
                        DEFAULT {
                            send_cmd_phase <= 3'd0;
                        }
                    }
                }

                // ----- ACMD41 Response -----
                CASE SDCST.ACMD41_RESP {
                    IF (spi_rx_data[7] == 1'b0 && resp_wait_cnt != 16'b0) {
                        r1_response <= spi_rx_data;
                        cs_n <= 1'b1;
                        IF (spi_rx_data == 8'h00) {
                            gap_next_state <= SDCST.CMD58;
                            state <= SDCST.CS_GAP;
                        } ELSE {
                            IF (retry_cnt == 16'd1000) {
                                state <= SDCST.ERROR;
                            } ELSE {
                                retry_cnt <= retry_cnt + 16'b1;
                                gap_next_state <= SDCST.CMD55;
                                state <= SDCST.CS_GAP;
                            }
                        }
                    } ELIF (resp_wait_cnt == 16'd64) {
                        cs_n <= 1'b1;
                        IF (retry_cnt == 16'd1000) {
                            state <= SDCST.ERROR;
                        } ELSE {
                            retry_cnt <= retry_cnt + 16'b1;
                            gap_next_state <= SDCST.CMD55;
                            state <= SDCST.CS_GAP;
                        }
                    } ELSE {
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                        resp_wait_cnt <= resp_wait_cnt + 16'b1;
                    }
                }

                // ----- CMD58: READ_OCR -----
                CASE SDCST.CMD58 {
                    SELECT(send_cmd_phase) {
                        CASE 3'd0 {
                            cs_n <= 1'b0;
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h7A;  // 0x40 | 58: start bits + CMD58 index
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd1;
                        }
                        CASE 3'd1 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h00;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd2;
                        }
                        CASE 3'd2 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h00;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd3;
                        }
                        CASE 3'd3 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h00;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd4;
                        }
                        CASE 3'd4 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h00;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd5;
                        }
                        CASE 3'd5 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'hFD;  // CRC7 + stop bit for arg 0x00000000
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd6;
                        }
                        CASE 3'd6 {
                            resp_wait_cnt <= 16'b0;
                            resp_byte_cnt <= 4'd0;
                            state <= SDCST.CMD58_RESP;
                        }
                        DEFAULT {
                            send_cmd_phase <= 3'd0;
                        }
                    }
                }

                // ----- CMD58 Response: R3 -----
                CASE SDCST.CMD58_RESP {
                    IF (resp_byte_cnt == 4'd0) {
                        IF (spi_rx_data[7] == 1'b0 && resp_wait_cnt != 16'b0) {
                            r1_response <= spi_rx_data;
                            IF (spi_rx_data == 8'h00) {
                                resp_byte_cnt <= 4'd1;
                                spi_busy <= 1'b1;
                                spi_shift_out <= 8'hFF;
                                spi_shift_in <= 8'b0;
                                spi_bit_cnt <= 4'b0;
                                spi_clk_reg <= 1'b0;
                                spi_div_cnt <= spi_div;
                            } ELSE {
                                cs_n <= 1'b1;
                                state <= SDCST.ERROR;
                            }
                        } ELIF (resp_wait_cnt == 16'd64) {
                            cs_n <= 1'b1;
                            state <= SDCST.ERROR;
                        } ELSE {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'hFF;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            resp_wait_cnt <= resp_wait_cnt + 16'b1;
                        }
                    } ELIF (resp_byte_cnt == 4'd1) {
                        sdhc_flag <= spi_rx_data[6];  // OCR bit 30 (CCS): 1 = block-addressed card (SDHC/SDXC)
                        resp_byte_cnt <= 4'd2;
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                    } ELIF (resp_byte_cnt == 4'd4) {
                        cs_n <= 1'b1;
                        spi_div <= 8'd4;  // Switch to fast clock
                        busy <= 1'b0;
                        ready <= 1'b1;
                        state <= SDCST.IDLE;
                    } ELSE {
                        resp_byte_cnt <= resp_byte_cnt + 4'b1;
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                    }
                }

                // ----- IDLE -----
                CASE SDCST.IDLE {
                    // IRQ clear handshake
                    IF (irq_clear_req == 1'b1) {
                        irq_status <= 1'b0;
                        irq_clear_req <= 1'b0;
                    }
                    // Command dispatch
                    IF (command == SDCMD.READ_SECTOR) {
                        command <= SDCMD.NONE;
                        error <= 1'b0;
                        busy <= 1'b1;
                        buf_wr_addr <= 8'b0;
                        byte_cnt <= 10'b0;
                        gap_next_state <= SDCST.READ_CMD;
                        state <= SDCST.CS_GAP;
                    } ELIF (command == SDCMD.WRITE_SECTOR) {
                        command <= SDCMD.NONE;
                        error <= 1'b0;
                        busy <= 1'b1;
                        byte_cnt <= 10'b0;
                        buf_cpu_addr <= 8'b0;
                        gap_next_state <= SDCST.WRITE_CMD;
                        state <= SDCST.CS_GAP;
                    } ELSE {
                        // Auto-increment buffer pointer after DATA read
                        IF (data_ready == 1'b1 && read_reg_sel == SDREG.DATA) {
                            buf_cpu_addr <= buf_cpu_addr + 8'b1;
                        }
                    }
                }

                // ----- READ_CMD: CMD17 -----
                CASE SDCST.READ_CMD {
                    SELECT(send_cmd_phase) {
                        CASE 3'd0 {
                            cs_n <= 1'b0;
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h51;  // 0x40 | 17: CMD17 (READ_SINGLE_BLOCK)
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd1;
                        }
                        CASE 3'd1 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= cmd_arg_b3;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd2;
                        }
                        CASE 3'd2 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= cmd_arg_b2;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd3;
                        }
                        CASE 3'd3 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= cmd_arg_b1;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd4;
                        }
                        CASE 3'd4 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= cmd_arg_b0;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd5;
                        }
                        CASE 3'd5 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'hFF;  // dummy CRC byte (CRC is not checked in SPI mode by default)
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd6;
                        }
                        CASE 3'd6 {
                            resp_wait_cnt <= 16'b0;
                            state <= SDCST.READ_RESP;
                        }
                        DEFAULT {
                            send_cmd_phase <= 3'd0;
                        }
                    }
                }

                // ----- READ_RESP -----
                CASE SDCST.READ_RESP {
                    IF (spi_rx_data[7] == 1'b0 && resp_wait_cnt != 16'b0) {
                        r1_response <= spi_rx_data;
                        IF (spi_rx_data == 8'h00) {
                            resp_wait_cnt <= 16'b0;
                            state <= SDCST.READ_TOKEN;
                        } ELSE {
                            cs_n <= 1'b1;
                            error <= 1'b1;
                            busy <= 1'b0;
                            state <= SDCST.IDLE;
                        }
                    } ELIF (resp_wait_cnt == 16'd64) {
                        cs_n <= 1'b1;
                        error <= 1'b1;
                        busy <= 1'b0;
                        state <= SDCST.IDLE;
                    } ELSE {
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                        resp_wait_cnt <= resp_wait_cnt + 16'b1;
                    }
                }

                // ----- READ_TOKEN -----
                CASE SDCST.READ_TOKEN {
                    // 0xFE is the data start token; allow up to 4096 polled bytes before timing out
                    IF (spi_rx_data == 8'hFE && resp_wait_cnt != 16'b0) {
                        byte_cnt <= 10'b0;
                        buf_wr_addr <= 8'b0;
                        state <= SDCST.READ_DATA;
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                    } ELIF (resp_wait_cnt == 16'd4096) {
                        cs_n <= 1'b1;
                        error <= 1'b1;
                        busy <= 1'b0;
                        state <= SDCST.IDLE;
                    } ELSE {
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                        resp_wait_cnt <= resp_wait_cnt + 16'b1;
                    }
                }

                // ----- READ_DATA -----
                CASE SDCST.READ_DATA {
                    IF (byte_cnt == 10'd512) {
                        byte_cnt <= 10'b0;
                        state <= SDCST.READ_CRC;
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                    } ELSE {
                        // Pack two bytes per 16-bit buffer word: even byte = low half, odd byte = high half
                        IF (byte_cnt[0] == 1'b0) {
                            byte_pair_lo <= spi_rx_data;
                        } ELSE {
                            buf_wr_en <= 1'b1;
                            buf_wr_addr_r <= buf_wr_addr;
                            buf_wr_data_r <= {spi_rx_data, byte_pair_lo};
                            buf_wr_addr <= buf_wr_addr + 8'b1;
                        }
                        byte_cnt <= byte_cnt + 10'b1;
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                    }
                }

                // ----- READ_CRC -----
                CASE SDCST.READ_CRC {
                    IF (byte_cnt == 10'd2) {
                        cs_n <= 1'b1;
                        state <= SDCST.READ_DONE;
                    } ELSE {
                        byte_cnt <= byte_cnt + 10'b1;
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                    }
                }

                // ----- READ_DONE -----
                CASE SDCST.READ_DONE {
                    busy <= 1'b0;
                    irq_status <= 1'b1;
                    irq_clear_req <= 1'b0;
                    buf_cpu_addr <= 8'b0;
                    state <= SDCST.IDLE;
                }

                // ----- WRITE_CMD: CMD24 -----
                CASE SDCST.WRITE_CMD {
                    SELECT(send_cmd_phase) {
                        CASE 3'd0 {
                            cs_n <= 1'b0;
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'h58;  // 0x40 | 24: CMD24 (WRITE_BLOCK)
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd1;
                        }
                        CASE 3'd1 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= cmd_arg_b3;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd2;
                        }
                        CASE 3'd2 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= cmd_arg_b2;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd3;
                        }
                        CASE 3'd3 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= cmd_arg_b1;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd4;
                        }
                        CASE 3'd4 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= cmd_arg_b0;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd5;
                        }
                        CASE 3'd5 {
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'hFF;  // dummy CRC byte (CRC is not checked in SPI mode by default)
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                            send_cmd_phase <= 3'd6;
                        }
                        CASE 3'd6 {
                            resp_wait_cnt <= 16'b0;
                            state <= SDCST.WRITE_RESP;
                        }
                        DEFAULT {
                            send_cmd_phase <= 3'd0;
                        }
                    }
                }

                // ----- WRITE_RESP -----
                CASE SDCST.WRITE_RESP {
                    IF (spi_rx_data[7] == 1'b0 && resp_wait_cnt != 16'b0) {
                        r1_response <= spi_rx_data;
                        IF (spi_rx_data == 8'h00) {
                            state <= SDCST.WRITE_TOKEN;
                        } ELSE {
                            cs_n <= 1'b1;
                            error <= 1'b1;
                            busy <= 1'b0;
                            state <= SDCST.IDLE;
                        }
                    } ELIF (resp_wait_cnt == 16'd64) {
                        cs_n <= 1'b1;
                        error <= 1'b1;
                        busy <= 1'b0;
                        state <= SDCST.IDLE;
                    } ELSE {
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                        resp_wait_cnt <= resp_wait_cnt + 16'b1;
                    }
                }

                // ----- WRITE_TOKEN -----
                CASE SDCST.WRITE_TOKEN {
                    spi_busy <= 1'b1;
                    spi_shift_out <= 8'hFE;  // data start token for a single-block write
                    spi_shift_in <= 8'b0;
                    spi_bit_cnt <= 4'b0;
                    spi_clk_reg <= 1'b0;
                    spi_div_cnt <= spi_div;
                    byte_cnt <= 10'b0;
                    buf_cpu_addr <= 8'b0;
                    state <= SDCST.WRITE_DATA;
                }

                // ----- WRITE_DATA -----
                CASE SDCST.WRITE_DATA {
                    IF (byte_cnt == 10'd512) {
                        byte_cnt <= 10'b0;
                        state <= SDCST.WRITE_CRC;
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                    } ELSE {
                        IF (byte_cnt[0] == 1'b0) {
                            spi_shift_out <= buffer.rd[buf_cpu_addr][7:0];
                        } ELSE {
                            spi_shift_out <= buffer.rd[buf_cpu_addr][15:8];
                            buf_cpu_addr <= buf_cpu_addr + 8'b1;
                        }
                        spi_busy <= 1'b1;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                        byte_cnt <= byte_cnt + 10'b1;
                    }
                }

                // ----- WRITE_CRC -----
                CASE SDCST.WRITE_CRC {
                    IF (byte_cnt == 10'd2) {
                        resp_wait_cnt <= 16'b0;
                        state <= SDCST.WRITE_DRESP;
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                    } ELSE {
                        byte_cnt <= byte_cnt + 10'b1;
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                    }
                }

                // ----- WRITE_DRESP -----
                CASE SDCST.WRITE_DRESP {
                    IF (spi_rx_data != 8'hFF && resp_wait_cnt != 16'b0) {
                        // Data response token is xxx0sss1; sss == 010 means the data was accepted
                        IF (spi_rx_data[3:1] == 3'b010) {
                            state <= SDCST.WRITE_BUSY;
                            spi_busy <= 1'b1;
                            spi_shift_out <= 8'hFF;
                            spi_shift_in <= 8'b0;
                            spi_bit_cnt <= 4'b0;
                            spi_clk_reg <= 1'b0;
                            spi_div_cnt <= spi_div;
                        } ELSE {
                            cs_n <= 1'b1;
                            error <= 1'b1;
                            busy <= 1'b0;
                            state <= SDCST.IDLE;
                        }
                    } ELIF (resp_wait_cnt == 16'd64) {
                        cs_n <= 1'b1;
                        error <= 1'b1;
                        busy <= 1'b0;
                        state <= SDCST.IDLE;
                    } ELSE {
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                        resp_wait_cnt <= resp_wait_cnt + 16'b1;
                    }
                }

                // ----- WRITE_BUSY -----
                CASE SDCST.WRITE_BUSY {
                    // The card holds its data output low (reads as 0x00) while it programs the block
                    IF (spi_rx_data != 8'h00) {
                        cs_n <= 1'b1;
                        state <= SDCST.WRITE_DONE;
                    } ELSE {
                        spi_busy <= 1'b1;
                        spi_shift_out <= 8'hFF;
                        spi_shift_in <= 8'b0;
                        spi_bit_cnt <= 4'b0;
                        spi_clk_reg <= 1'b0;
                        spi_div_cnt <= spi_div;
                    }
                }

                // ----- WRITE_DONE -----
                CASE SDCST.WRITE_DONE {
                    busy <= 1'b0;
                    irq_status <= 1'b1;
                    irq_clear_req <= 1'b0;
                    state <= SDCST.IDLE;
                }

                // ----- CS_GAP: clock out one 0xFF byte (8 clocks) between commands; -----
                // ----- CS is driven HIGH by the state that transitions here        -----
                CASE SDCST.CS_GAP {
                    spi_busy <= 1'b1;
                    spi_shift_out <= 8'hFF;
                    spi_shift_in <= 8'b0;
                    spi_bit_cnt <= 4'b0;
                    spi_clk_reg <= 1'b0;
                    spi_div_cnt <= spi_div;
                    state <= gap_next_state;
                    send_cmd_phase <= 3'd0;
                }

                // ----- ERROR -----
                CASE SDCST.ERROR {
                    busy <= 1'b0;
                    error <= 1'b1;
                    cs_n <= 1'b1;
                }

                DEFAULT {
                    state <= SDCST.ERROR;
                }
            }
        }
    }
@endmod
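The command states above each shift out a fixed 6-byte frame: a start/index byte, four argument bytes (most significant first), and a CRC7 byte with the stop bit set. As a cross-check on the hard-coded constants, here is a minimal software sketch in Python. It is not part of the SOC sources; the function names `crc7`, `sd_command_frame`, and `data_response_accepted` are hypothetical and chosen only for illustration.

```python
def crc7(data: bytes) -> int:
    """CRC-7 with polynomial x^7 + x^3 + 1, processed MSB first."""
    crc = 0
    for byte in data:
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            msb = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if bit ^ msb:
                crc ^= 0x09
    return crc


def sd_command_frame(index: int, arg: int) -> bytes:
    """Build a 6-byte SPI-mode command frame.

    Byte 0 is 0x40 | index, bytes 1-4 are the argument (big-endian),
    and byte 5 is the CRC7 shifted left by one with the stop bit set.
    """
    body = bytes([0x40 | (index & 0x3F)]) + arg.to_bytes(4, "big")
    return body + bytes([(crc7(body) << 1) | 1])


def data_response_accepted(token: int) -> bool:
    """Data response token is xxx0sss1; sss == 0b010 means accepted."""
    return ((token >> 1) & 0x7) == 0b010


# Frames matching the constants in the ACMD41 and CMD58 states.
acmd41 = sd_command_frame(41, 0x40000000)  # HCS bit (arg bit 30) set
cmd58 = sd_command_frame(58, 0)

print(acmd41.hex(" "))  # 69 40 00 00 00 77
print(cmd58.hex(" "))   # 7a 00 00 00 00 fd
```

The READ_CMD and WRITE_CMD states send `8'hFF` in place of a computed CRC, which works because SPI mode does not check command CRCs unless checking has been enabled; the sketch always computes the real value, so its last byte will differ from the hardware's for CMD17 and CMD24.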
jz
// Video Output Pipeline
// Dual-mode: 1280x720 @60Hz or 1920x1080 @30Hz DVI/HDMI output.
// Mode 0 (720p): 80x22 characters, 16x32 pixel glyphs
// Mode 1 (1080p): 120x33 characters, 16x32 pixel glyphs
// RGB565 FG+BG per cell. 8-pixel vertical offset from top.
//
// Pipeline (all on pixel_clk):
//   Stage 0 (comb): compute cell_addr → drives vram_addr to terminal
//   Edge 0→1: terminal registers BSRAM addr; pipeline regs capture timing
//   Edge 1→2: char/attr valid from terminal; font ROM addr registered
//   Edge 2→3: font bitmap latched into font_data; p3 captures pixel_col/attr
//   Stage 3 (comb): font_bit + pixel color computed from font_data + p3
//   Edge 3→4: pixel color registered → TMDS input
//   Edge 4→5: TMDS encoder registers internally
@module video_out
    PORT {
        IN  [1]  pixel_clk;
        IN  [1]  rst_n;
        IN  [1]  video_mode;

        // Framebuffer read interface (pixel_clk domain)
        OUT [12] vram_addr;
        IN  [8]  vram_char;
        IN  [32] vram_attr;

        // Cursor info from terminal
        IN  [12] cursor_pos;
        IN  [3]  cursor_style;

        // TMDS outputs
        OUT [10] tmds_clk;
        OUT [10] tmds_d0;
        OUT [10] tmds_d1;
        OUT [10] tmds_d2;
    }

    CONST {
        V_OFFSET = 8;
    }

    // Font ROM: 256 glyphs x 32 rows x 16 bits = 8192 entries
    MEM(TYPE=BLOCK) {
        font_rom [16] [8192] = @file("../out/font_16x32.hex") {
            OUT font_rd SYNC;
        };
    }

    // Video timing generator
    @new vt0 video_timing {
        IN  [1]  clk            = pixel_clk;
        IN  [1]  rst_n          = rst_n;
        IN  [1]  mode           = mode_sync2;
        OUT [1]  hsync          = vt_hsync;
        OUT [1]  vsync          = vt_vsync;
        OUT [1]  display_enable = vt_de;
        OUT [12] x_pos          = vt_x;
        OUT [11] y_pos          = vt_y;
    }

    // TMDS encoders
    @new enc_b tmds_encoder {
        IN  [1]  clk            = pixel_clk;
        IN  [1]  rst_n          = rst_n;
        IN  [8]  data_in        = p4_b;
        IN  [1]  c0             = p4_hsync;
        IN  [1]  c1             = p4_vsync;
        IN  [1]  display_enable = p4_de;
        OUT [10] tmds_out       = tmds_d0;
    }

    @new enc_g tmds_encoder {
        IN  [1]  clk            = pixel_clk;
        IN  [1]  rst_n          = rst_n;
        IN  [8]  data_in        = p4_g;
        IN  [1]  c0             = 1'b0;
        IN  [1]  c1             = 1'b0;
        IN  [1]  display_enable = p4_de;
        OUT [10] tmds_out       = tmds_d1;
    }

    @new enc_r tmds_encoder {
        IN  [1]  clk            = pixel_clk;
        IN  [1]  rst_n          = rst_n;
        IN  [8]  data_in        = p4_r;
        IN  [1]  c0             = 1'b0;
        IN  [1]  c1             = 1'b0;
        IN  [1]  display_enable = p4_de;
        OUT [10] tmds_out       = tmds_d2;
    }

    WIRE {
        // Timing signals from video_timing
        vt_hsync [1];
        vt_vsync [1];
        vt_de    [1];
        vt_x     [12];
        vt_y     [11];

        // Cell coordinates (combinational)
        col       [7];     // x / 16
        row       [6];     // (y - V_OFFSET) / 32
        scanline  [5];     // (y - V_OFFSET) % 32
        y_adj     [11];    // y - V_OFFSET

        // Row multiplication intermediates
        row_x128  [13];
        row_x64   [13];
        row_x16   [11];
        row_x8    [10];
        cell_addr [12];

        // Text area flag
        in_text_area [1];

        // Cursor matching
        is_cursor_cell [1];
        cursor_active  [1];  // cursor visible (accounts for blink)
        in_cursor_zone [1];  // scanline is in cursor region

        // Font pixel selection
        font_bit  [1];

        // RGB565 fields from latched attr
        fg_r5 [5]; fg_g6 [6]; fg_b5 [5];
        bg_r5 [5]; bg_g6 [6]; bg_b5 [5];

        // Pixel color output (comb)
        pixel_r [8];
        pixel_g [8];
        pixel_b [8];

        // Max text area bounds (mode-dependent)
        y_text_max [11];
        x_active   [12];
        col_max    [7];

        // Cursor pixel output override
        cursor_pixel [1];
    }

    REGISTER {
        // CDC synchronizer for video_mode (sys_clk → pixel_clk)
        mode_sync1   [1]  = 1'b0;
        mode_sync2   [1]  = 1'b0;

        // Blink timer: 0.5 s at 74.25 MHz = 37,125,000 cycles, approximated
        // by a 25-bit count (2^25 = 33,554,432 cycles, about 0.45 s)
        blink_counter [25] = 25'b0;
        blink_on      [1]  = 1'b1;

        // Pipeline cursor cell match and scanline through stages
        p1_is_cursor  [1]  = 1'b0;
        p2_is_cursor  [1]  = 1'b0;
        p2_scanline   [5]  = 5'b0;
        p3_is_cursor  [1]  = 1'b0;
        p3_scanline   [5]  = 5'b0;

        // Pipeline stage 1: timing delayed 1 cycle
        p1_pixel_col [4]  = 4'b0;
        p1_hsync     [1]  = 1'b0;
        p1_vsync     [1]  = 1'b0;
        p1_de        [1]  = 1'b0;
        p1_in_text   [1]  = 1'b0;
        p1_scanline  [5]  = 5'b0;

        // Pipeline stage 2: char/attr latched, font addr sent
        p2_pixel_col [4]  = 4'b0;
        p2_hsync     [1]  = 1'b0;
        p2_vsync     [1]  = 1'b0;
        p2_de        [1]  = 1'b0;
        p2_in_text   [1]  = 1'b0;
        p2_attr      [32] = 32'b0;

        // Font data latch (SYNC MEM must be read in SYNC block)
        font_data    [16] = 16'b0;

        // Pipeline stage 3: font data valid, aligned with pixel_col/attr
        p3_pixel_col [4]  = 4'b0;
        p3_hsync     [1]  = 1'b0;
        p3_vsync     [1]  = 1'b0;
        p3_de        [1]  = 1'b0;
        p3_in_text   [1]  = 1'b0;
        p3_attr      [32] = 32'b0;

        // Pipeline stage 4: pixel computed, TMDS input
        p4_r         [8]  = 8'b0;
        p4_g         [8]  = 8'b0;
        p4_b         [8]  = 8'b0;
        p4_hsync     [1]  = 1'b0;
        p4_vsync     [1]  = 1'b0;
        p4_de        [1]  = 1'b0;
    }

    ASYNCHRONOUS {
        // TMDS clock channel
        tmds_clk <= 10'b1111100000;

        // Mode-dependent text area bounds
        // 720p: 8 + 22*32 = 712, active 1280, 80 cols
        // 1080p: 8 + 33*32 = 1064, active 1920, 120 cols
        y_text_max <= (mode_sync2 == 1'b1) ? 11'd1064 : 11'd712;
        x_active   <= (mode_sync2 == 1'b1) ? 12'd1920 : 12'd1280;
        col_max    <= (mode_sync2 == 1'b1) ? 7'd120   : 7'd80;

        // --- Stage 0: Combinational cell address ---
        y_adj <= vt_y - lit(11, V_OFFSET);
        col <= vt_x[10:4];
        row <= y_adj[10:5];
        scanline <= y_adj[4:0];

        in_text_area <= (vt_y >= lit(11, V_OFFSET) && vt_y < y_text_max &&
                         vt_x < x_active && vt_x[10:4] < col_max)
                        ? 1'b1 : 1'b0;

        // Cell address: row * cols + col
        // 720p: row*80 = row*64 + row*16
        // 1080p: row*120 = row*128 - row*8
        row_x128 <= {row, 7'b0};
        row_x64  <= {1'b0, row, 6'b0};
        row_x16  <= {1'b0, row, 4'b0};
        row_x8   <= {1'b0, row, 3'b0};

        IF (mode_sync2 == 1'b1) {
            cell_addr <= row_x128[11:0] - {2'b0, row_x8} + {5'b0, col};
        } ELSE {
            cell_addr <= row_x64[11:0] + {1'b0, row_x16} + {5'b0, col};
        }

        // Drive BSRAM address combinationally
        vram_addr <= cell_addr;

        // Cursor cell detection (stage 0)
        is_cursor_cell <= (cell_addr == cursor_pos && cursor_style != 3'b0) ? 1'b1 : 1'b0;

        // Cursor blink: styles 3,4 blink; styles 1,2 always on
        cursor_active <= (cursor_style == 3'd1 || cursor_style == 3'd2 ||
                         ((cursor_style == 3'd3 || cursor_style == 3'd4) && blink_on == 1'b1))
                         ? 1'b1 : 1'b0;

        // Cursor scanline zone: check if current scanline is in cursor region
        // Glyph is 32 pixels tall. Bottom 2 pixels = scanlines 30-31. Bottom 8 = scanlines 24-31.
        IF (cursor_style == 3'd1 || cursor_style == 3'd3) {
            // 2-pixel bottom line: scanlines 30-31
            in_cursor_zone <= (p3_scanline >= 5'd30) ? 1'b1 : 1'b0;
        } ELSE {
            // 8-pixel bottom block: scanlines 24-31
            in_cursor_zone <= (p3_scanline >= 5'd24) ? 1'b1 : 1'b0;
        }

        // Final cursor pixel: cell matches AND scanline in zone AND cursor visible
        cursor_pixel <= (p3_is_cursor == 1'b1 && in_cursor_zone == 1'b1 && cursor_active == 1'b1)
                        ? 1'b1 : 1'b0;

        // --- Stage 3 combinational: Font pixel selection from latched font data ---
        // font_data: MSB (bit 15) = leftmost pixel
        // Use gslice for dynamic bit indexing (spec requires constant indices for [])
        font_bit <= gslice(font_data, 4'd15 - p3_pixel_col, 1);

        // RGB565 from latched attribute
        fg_r5 <= p3_attr[15:11];
        fg_g6 <= p3_attr[10:5];
        fg_b5 <= p3_attr[4:0];
        bg_r5 <= p3_attr[31:27];
        bg_g6 <= p3_attr[26:21];
        bg_b5 <= p3_attr[20:16];

        // Pixel color: cursor inverts FG/BG; font_bit selects FG/BG; black outside text
        IF (p3_in_text == 1'b1 && cursor_pixel == 1'b1) {
            // Cursor: swap FG/BG (draw BG-colored text on FG-colored block)
            IF (font_bit == 1'b1) {
                pixel_r <= {bg_r5, bg_r5[4:2]};
                pixel_g <= {bg_g6, bg_g6[5:4]};
                pixel_b <= {bg_b5, bg_b5[4:2]};
            } ELSE {
                pixel_r <= {fg_r5, fg_r5[4:2]};
                pixel_g <= {fg_g6, fg_g6[5:4]};
                pixel_b <= {fg_b5, fg_b5[4:2]};
            }
        } ELIF (p3_in_text == 1'b1 && font_bit == 1'b1) {
            pixel_r <= {fg_r5, fg_r5[4:2]};
            pixel_g <= {fg_g6, fg_g6[5:4]};
            pixel_b <= {fg_b5, fg_b5[4:2]};
        } ELIF (p3_in_text == 1'b1) {
            pixel_r <= {bg_r5, bg_r5[4:2]};
            pixel_g <= {bg_g6, bg_g6[5:4]};
            pixel_b <= {bg_b5, bg_b5[4:2]};
        } ELSE {
            pixel_r <= 8'h00;
            pixel_g <= 8'h00;
            pixel_b <= 8'h00;
        }
    }

    SYNCHRONOUS(CLK = pixel_clk RESET = rst_n RESET_ACTIVE = Low) {
        // CDC synchronizer: video_mode from sys_clk domain
        mode_sync1 <= video_mode;
        mode_sync2 <= mode_sync1;

        // Blink timer: toggle every 0.25 s (0.5 s full on/off period)
        IF (blink_counter == 25'd0) {
            blink_counter <= 25'd18562500;  // 0.25 s at 74.25 MHz (74.25M / 4)
            blink_on <= ~blink_on;
        } ELSE {
            blink_counter <= blink_counter - 25'd1;
        }

        // --- Edge 0 → 1: Terminal BSRAM addr registered (via comb vram_addr) ---
        p1_pixel_col <= vt_x[3:0];
        p1_hsync     <= vt_hsync;
        p1_vsync     <= vt_vsync;
        p1_de        <= vt_de;
        p1_in_text   <= in_text_area;
        p1_scanline  <= scanline;
        p1_is_cursor <= is_cursor_cell;

        // --- Edge 1 → 2: char/attr arrive from terminal; send font ROM read ---
        font_rom.font_rd.addr <= {vram_char, p1_scanline};
        p2_pixel_col <= p1_pixel_col;
        p2_hsync     <= p1_hsync;
        p2_vsync     <= p1_vsync;
        p2_de        <= p1_de;
        p2_in_text   <= p1_in_text;
        p2_attr      <= vram_attr;
        p2_is_cursor <= p1_is_cursor;
        p2_scanline  <= p1_scanline;

        // --- Edge 2 → 3: Font data latched; p3 captures aligned pixel_col/attr ---
        font_data    <= font_rom.font_rd.data;
        p3_pixel_col <= p2_pixel_col;
        p3_hsync     <= p2_hsync;
        p3_vsync     <= p2_vsync;
        p3_de        <= p2_de;
        p3_in_text   <= p2_in_text;
        p3_attr      <= p2_attr;
        p3_is_cursor <= p2_is_cursor;
        p3_scanline  <= p2_scanline;

        // --- Edge 3 → 4: Pixel color computed (comb from font_data + p3); latch for TMDS ---
        p4_r         <= pixel_r;
        p4_g         <= pixel_g;
        p4_b         <= pixel_b;
        p4_hsync     <= p3_hsync;
        p4_vsync     <= p3_vsync;
        p4_de        <= p3_de;
    }
@endmod
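The stage-0 cell address replaces a multiply with shifts and one add or subtract, and in 1080p mode it relies on 12-bit wraparound once `row * 128` exceeds 4095. The following is a minimal Python sketch of that arithmetic, intended only as a software cross-check; the function name `cell_addr` and the row counts (22 rows in 720p, 33 in 1080p, taken from the text-area bounds in the comments) are assumptions of the sketch, not part of the design.

```python
# Software model of the stage-0 cell address arithmetic (not part of the design).
MASK12 = 0xFFF  # cell_addr is 12 bits wide


def cell_addr(row: int, col: int, mode_1080p: bool) -> int:
    """Mirror the shift-and-add/subtract form used in the ASYNCHRONOUS block."""
    if mode_1080p:
        # row*120 = row*128 - row*8; row_x128 is truncated to 12 bits first,
        # and the subtraction wraps modulo 4096 just like the hardware.
        return (((row << 7) & MASK12) - (row << 3) + col) & MASK12
    # row*80 = row*64 + row*16
    return (((row << 6) & MASK12) + (row << 4) + col) & MASK12


# Exhaustive check over the visible text grid in both modes.
for rows, cols, mode in ((22, 80, False), (33, 120, True)):
    for r in range(rows):
        for c in range(cols):
            assert cell_addr(r, c, mode) == r * cols + c
```

The last 1080p row is the interesting case: `32 << 7` truncates to zero in 12 bits, yet the modular subtraction still lands on `32 * 120`.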
jz
// Dual-Mode Video Timing Generator
// Mode 0: 1280x720 @60Hz (CEA-861 VIC 4)
// Mode 1: 1920x1080 @30Hz (CEA-861 VIC 34)
// Both modes use 74.25 MHz pixel clock.
// Sync polarity: positive (sync HIGH during sync pulse)
@module video_timing
    PORT {
        IN  [1]  clk;
        IN  [1]  rst_n;
        IN  [1]  mode;
        OUT [1]  hsync;
        OUT [1]  vsync;
        OUT [1]  display_enable;
        OUT [12] x_pos;
        OUT [11] y_pos;
    }

    CONST {
        // 720p timing
        H_ACTIVE_720 = 1280;
        H_FRONT_720  = 110;
        H_SYNC_720   = 40;
        H_BACK_720   = 220;
        H_TOTAL_720  = 1650;
        V_ACTIVE_720 = 720;
        V_FRONT_720  = 5;
        V_SYNC_720   = 5;
        V_BACK_720   = 20;
        V_TOTAL_720  = 750;

        // 1080p@30 timing (CEA-861 VIC 34)
        H_ACTIVE_1080 = 1920;
        H_FRONT_1080  = 88;
        H_SYNC_1080   = 44;
        H_BACK_1080   = 148;
        H_TOTAL_1080  = 2200;
        V_ACTIVE_1080 = 1080;
        V_FRONT_1080  = 4;
        V_SYNC_1080   = 5;
        V_BACK_1080   = 36;
        V_TOTAL_1080  = 1125;
    }

    WIRE {
        h_total_m1    [12];
        v_total_m1    [11];
        h_active      [12];
        v_active      [11];
        h_sync_start  [12];
        h_sync_end    [12];
        v_sync_start  [11];
        v_sync_end    [11];
    }

    REGISTER {
        h_cnt [12] = 12'b0;
        v_cnt [11] = 11'b0;
    }

    ASYNCHRONOUS {
        // Mux timing parameters based on mode
        h_total_m1   <= (mode == 1'b1) ? lit(12, H_TOTAL_1080 - 1)              : lit(12, H_TOTAL_720 - 1);
        v_total_m1   <= (mode == 1'b1) ? lit(11, V_TOTAL_1080 - 1)              : lit(11, V_TOTAL_720 - 1);
        h_active     <= (mode == 1'b1) ? lit(12, H_ACTIVE_1080)                 : lit(12, H_ACTIVE_720);
        v_active     <= (mode == 1'b1) ? lit(11, V_ACTIVE_1080)                 : lit(11, V_ACTIVE_720);
        h_sync_start <= (mode == 1'b1) ? lit(12, H_ACTIVE_1080 + H_FRONT_1080)  : lit(12, H_ACTIVE_720 + H_FRONT_720);
        h_sync_end   <= (mode == 1'b1) ? lit(12, H_ACTIVE_1080 + H_FRONT_1080 + H_SYNC_1080) : lit(12, H_ACTIVE_720 + H_FRONT_720 + H_SYNC_720);
        v_sync_start <= (mode == 1'b1) ? lit(11, V_ACTIVE_1080 + V_FRONT_1080)  : lit(11, V_ACTIVE_720 + V_FRONT_720);
        v_sync_end   <= (mode == 1'b1) ? lit(11, V_ACTIVE_1080 + V_FRONT_1080 + V_SYNC_1080) : lit(11, V_ACTIVE_720 + V_FRONT_720 + V_SYNC_720);

        // Positive sync polarity: HIGH during sync pulse
        hsync <= (h_cnt >= h_sync_start && h_cnt < h_sync_end) ? 1'b1 : 1'b0;
        vsync <= (v_cnt >= v_sync_start && v_cnt < v_sync_end) ? 1'b1 : 1'b0;

        // Display enable: active region
        display_enable <= (h_cnt < h_active && v_cnt < v_active) ? 1'b1 : 1'b0;

        x_pos <= h_cnt;
        y_pos <= v_cnt;
    }

    SYNCHRONOUS(CLK=clk RESET=rst_n RESET_ACTIVE=Low) {
        IF (h_cnt == h_total_m1) {
            h_cnt <= 12'b0;
            IF (v_cnt == v_total_m1) {
                v_cnt <= 11'b0;
            } ELSE {
                v_cnt <= v_cnt + 11'b1;
            }
        } ELSE {
            h_cnt <= h_cnt + 12'b1;
        }
    }
@endmod
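Both modes are stated to share a 74.25 MHz pixel clock, which follows from the totals in the `CONST` block. The short Python sketch below recomputes the line and frame totals, the implied pixel clock, and the horizontal sync window from the same porch values; the dictionary layout and helper names are illustrative assumptions, not part of the design.

```python
# Software cross-check of the CONST values (not part of the design).
# Each tuple is (active, front porch, sync, back porch).
MODES = {
    "720p60":  {"h": (1280, 110, 40, 220), "v": (720, 5, 5, 20), "fps": 60},
    "1080p30": {"h": (1920, 88, 44, 148), "v": (1080, 4, 5, 36), "fps": 30},
}


def totals(name):
    """Return (h_total, v_total) as the sum of the four timing segments."""
    m = MODES[name]
    return sum(m["h"]), sum(m["v"])


def pixel_clock_hz(name):
    """Pixel clock implied by the totals and the frame rate."""
    h_total, v_total = totals(name)
    return h_total * v_total * MODES[name]["fps"]


def hsync_window(name):
    """Half-open [start, end) range of h_cnt during which hsync is high."""
    active, front, sync, _back = MODES[name]["h"]
    return active + front, active + front + sync


for name in MODES:
    print(name, totals(name), pixel_clock_hz(name), hsync_window(name))
```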
jz
// DVI TMDS 8b/10b Encoder
// Full DVI-compliant TMDS encoding with XOR/XNOR selection and
// running disparity tracking for DC balance on AC-coupled links.
@module tmds_encoder
    PORT {
        IN  [1]  clk;
        IN  [1]  rst_n;
        IN  [8]  data_in;
        IN  [1]  c0;
        IN  [1]  c1;
        IN  [1]  display_enable;
        OUT [10] tmds_out;
    }

    WIRE {
        // Popcount of data_in via intrinsic
        n1_d [4];

        // XOR/XNOR mode selection
        use_xnor [1];

        // Transition-minimized intermediate word q_m[8:0]
        qm0 [1]; qm1 [1]; qm2 [1]; qm3 [1];
        qm4 [1]; qm5 [1]; qm6 [1]; qm7 [1];
        qm8 [1];

        // Popcount of q_m[7:0] via intrinsic
        n1_q [4];

        // Disparity conditions
        cnt_is_zero [1];
        qm_balanced [1];
        cond1       [1];
        cnt_sign    [1];
        cond_inv    [1];

        // Arithmetic for disparity update (5-bit two's complement)
        diff_n1n0 [5];
        diff_n0n1 [5];
        qm8_x2    [5];
        nqm8_x2   [5];

        // Combinational outputs
        tmds_data [10];
        next_cnt  [5];
    }

    REGISTER {
        cnt      [5]  = 5'b00000;
        tmds_reg [10] = 10'b0000000000;
    }

    ASYNCHRONOUS {
        tmds_out <= tmds_reg;

        // --- Popcount of data_in ---
        n1_d <= popcount(data_in);

        // --- XOR/XNOR selection (DVI spec section 3.3.1) ---
        use_xnor <= (n1_d > 4'd4 || (n1_d == 4'd4 && data_in[0] == 1'b0))
                     ? 1'b1 : 1'b0;

        // --- Build transition-minimized word q_m ---
        qm0 <= data_in[0];
        qm1 <= (use_xnor == 1'b1) ? ~(data_in[1] ^ qm0) : (data_in[1] ^ qm0);
        qm2 <= (use_xnor == 1'b1) ? ~(data_in[2] ^ qm1) : (data_in[2] ^ qm1);
        qm3 <= (use_xnor == 1'b1) ? ~(data_in[3] ^ qm2) : (data_in[3] ^ qm2);
        qm4 <= (use_xnor == 1'b1) ? ~(data_in[4] ^ qm3) : (data_in[4] ^ qm3);
        qm5 <= (use_xnor == 1'b1) ? ~(data_in[5] ^ qm4) : (data_in[5] ^ qm4);
        qm6 <= (use_xnor == 1'b1) ? ~(data_in[6] ^ qm5) : (data_in[6] ^ qm5);
        qm7 <= (use_xnor == 1'b1) ? ~(data_in[7] ^ qm6) : (data_in[7] ^ qm6);
        qm8 <= (use_xnor == 1'b1) ? 1'b0 : 1'b1;

        // --- Popcount of q_m[7:0] ---
        n1_q <= popcount({qm7, qm6, qm5, qm4, qm3, qm2, qm1, qm0});

        // --- Disparity conditions ---
        cnt_is_zero <= (cnt == 5'b00000) ? 1'b1 : 1'b0;
        qm_balanced <= (n1_q == 4'd4) ? 1'b1 : 1'b0;
        cond1       <= (cnt_is_zero == 1'b1 || qm_balanced == 1'b1)
                        ? 1'b1 : 1'b0;
        cnt_sign    <= cnt[4];
        cond_inv    <= ((cnt_sign == 1'b0 && cnt_is_zero == 1'b0 && n1_q > 4'd4) ||
                        (cnt_sign == 1'b1 && n1_q < 4'd4))
                       ? 1'b1 : 1'b0;

        // --- Arithmetic helpers (5-bit two's complement) ---
        diff_n1n0 <= {n1_q, 1'b0} - 5'd8;
        diff_n0n1 <= 5'd8 - {n1_q, 1'b0};
        qm8_x2   <= {3'b000, qm8, 1'b0};
        nqm8_x2  <= {3'b000, ~qm8, 1'b0};

        // --- Output word and next disparity (DVI spec section 3.3.2) ---
        IF (cond1 == 1'b1) {
            IF (qm8 == 1'b0) {
                // XNOR mode, cnt==0 or balanced: invert data, bit[9]=1
                tmds_data <= {1'b1, 1'b0, ~qm7, ~qm6, ~qm5, ~qm4,
                              ~qm3, ~qm2, ~qm1, ~qm0};
                next_cnt  <= cnt + diff_n0n1;
            } ELSE {
                // XOR mode, cnt==0 or balanced: keep data, bit[9]=0
                tmds_data <= {1'b0, 1'b1, qm7, qm6, qm5, qm4,
                              qm3, qm2, qm1, qm0};
                next_cnt  <= cnt + diff_n1n0;
            }
        } ELIF (cond_inv == 1'b1) {
            // Invert to reduce disparity
            tmds_data <= {1'b1, qm8, ~qm7, ~qm6, ~qm5, ~qm4,
                          ~qm3, ~qm2, ~qm1, ~qm0};
            next_cnt  <= cnt + qm8_x2 + diff_n0n1;
        } ELSE {
            // Don't invert
            tmds_data <= {1'b0, qm8, qm7, qm6, qm5, qm4,
                          qm3, qm2, qm1, qm0};
            next_cnt  <= cnt - nqm8_x2 + diff_n1n0;
        }
    }

    SYNCHRONOUS(CLK=clk RESET=rst_n RESET_ACTIVE=Low) {
        IF (display_enable == 1'b0) {
            // Control period: reset disparity and emit control tokens
            cnt <= 5'b00000;
            IF (c0 == 1'b0 && c1 == 1'b0) {
                tmds_reg <= 10'b1101010100;
            } ELIF (c0 == 1'b1 && c1 == 1'b0) {
                tmds_reg <= 10'b0010101011;
            } ELIF (c0 == 1'b0 && c1 == 1'b1) {
                tmds_reg <= 10'b0101010100;
            } ELSE {
                tmds_reg <= 10'b1010101011;
            }
        } ELSE {
            // Data period: latch encoded word and update disparity
            tmds_reg <= tmds_data;
            cnt <= next_cnt;
        }
    }
@endmod
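A software reference model can help when checking the encoder in simulation. The Python sketch below follows the same decision structure as the module above (XOR/XNOR selection, then the three disparity cases) and adds a matching decoder for round-trip checks. The names `tmds_encode` and `tmds_decode` are hypothetical, and the model covers data periods only, not the control tokens.

```python
# Software reference model of the data-period TMDS encoding (not part of the design).

def tmds_encode(d: int, cnt: int) -> tuple[int, int]:
    """Encode one 8-bit byte; return (10-bit word, next running disparity)."""
    bits = [(d >> i) & 1 for i in range(8)]
    n1_d = sum(bits)
    use_xnor = n1_d > 4 or (n1_d == 4 and bits[0] == 0)

    # Transition-minimized word q_m[7:0]; q_m[8] records the mode (1 = XOR).
    qm = [bits[0]]
    for i in range(1, 8):
        x = bits[i] ^ qm[i - 1]
        qm.append(x ^ 1 if use_xnor else x)
    qm8 = 0 if use_xnor else 1

    n1 = sum(qm)
    n0 = 8 - n1
    qm_val = sum(b << i for i, b in enumerate(qm))
    inv = qm_val ^ 0xFF

    if cnt == 0 or n1 == n0:
        if qm8 == 0:
            return (1 << 9) | inv, cnt + (n0 - n1)
        return (1 << 8) | qm_val, cnt + (n1 - n0)
    if (cnt > 0 and n1 > n0) or (cnt < 0 and n0 > n1):
        # Invert the data bits to pull the disparity back toward zero.
        return (1 << 9) | (qm8 << 8) | inv, cnt + 2 * qm8 + (n0 - n1)
    return (qm8 << 8) | qm_val, cnt - 2 * (1 - qm8) + (n1 - n0)


def tmds_decode(word: int) -> int:
    """Recover the 8-bit byte from a 10-bit data word."""
    q = word & 0xFF
    if word & 0x200:          # bit 9 set: data bits were inverted
        q ^= 0xFF
    xor_mode = (word >> 8) & 1
    d = q & 1
    for i in range(1, 8):
        t = ((q >> i) ^ (q >> (i - 1))) & 1
        if not xor_mode:
            t ^= 1
        d |= t << i
    return d


# Round trip, plus the invariant that cnt tracks ones minus zeros of each word.
for byte in range(256):
    for start in range(-8, 9, 2):
        word, nxt = tmds_encode(byte, start)
        assert tmds_decode(word) == byte
        assert nxt - start == 2 * bin(word).count("1") - 10
```

The second assertion is the property the `cnt` register exists for: each update equals the number of ones minus the number of zeros in the word just sent, so the register holds the running DC imbalance of the link.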
jz
// Audio controller - Yamaha FM synth inspired
// 8 configurable channels with waveform, ADSR envelope, volume, pan
// Mixes to stereo, fills internal ring buffer (128 stereo samples)
// CPU reads samples via bus register interface
//
// Register map (ADDR[8:2] = register index):
//   Global registers (ADDR[8:5] = 0000):
//     0x00 [RW] CTRL:       [0]=enable, [1]=irq_en, [2]=irq_clear(W)
//     0x04 [R]  STATUS:     [7:0]=buf_level, [8]=half_full, [9]=underrun
//     0x08 [R]  SAMPLE:     {left[15:0], right[15:0]} - read pops from buffer
//     0x0C [RW] DIVIDER:    [15:0]=sample rate divider (sclk/divider = sample rate)
//     0x10 [RW] MASTER_VOL: [7:0]=master volume 0-255
//
//   Per-channel registers (ADDR[8:5] = 0001..1000 for ch0..ch7):
//     +0x00 [RW] CH_CTRL:   [0]=key_on, [3:1]=waveform, [4]=enabled
//     +0x04 [RW] FREQ_LO:   [15:0]=phase increment low
//     +0x08 [RW] FREQ_HI:   [7:0]=phase increment high (24-bit total)
//     +0x0C [RW] VOLUME:    [7:0]=volume, [15:8]=pan (0=L, 128=center, 255=R)
//     +0x10 [RW] ENV_AD:    [7:0]=attack rate, [15:8]=decay rate
//     +0x14 [RW] ENV_SR:    [7:0]=sustain level, [15:8]=release rate
//     +0x18 [RW] DUTY:      [7:0]=duty cycle (square wave)

@module audio
    CONST {
        NUM_CHANNELS = 8;
        BUF_DEPTH    = 128;   // Stereo sample pairs in ring buffer
    }

    PORT {
        IN  [1] clk;
        IN  [1] rst_n;
        BUS SIMPLE_BUS TARGET pbus;
        OUT [1] irq;
    }

    WIRE {
        // Bus decode
        reg_group   [4];    // ADDR[8:5] - 0=global, 1-8=channel
        reg_sel     [3];    // ADDR[4:2] - register within group
        bus_read    [1];
        bus_write   [1];

        // Sample rate tick generation
        tick        [1];

        // Channel outputs (16-bit signed each)
        ch0_out [16]; ch1_out [16]; ch2_out [16]; ch3_out [16];
        ch4_out [16]; ch5_out [16]; ch6_out [16]; ch7_out [16];
        all_samples [128];
        all_pans    [64];

        // Mixer outputs
        mix_left    [16];
        mix_right   [16];
        mix_valid   [1];

        // Buffer level
        buf_count   [8];
        buf_half    [1];

    }

    REGISTER {
        // Global control
        enable      [1]  = 1'b0;
        irq_en      [1]  = 1'b0;
        master_vol  [8]  = 8'h80;
        divider     [16] = 16'd1125;  // 54MHz/1125 ≈ 48kHz
        div_count   [16] = 16'd0;
        sample_tick [1]  = 1'b0;
        underrun    [1]  = 1'b0;

        // Ring buffer pointers (7-bit for 128-deep buffer)
        wr_ptr      [7]  = 7'd0;
        rd_ptr      [7]  = 7'd0;

        // Per-channel registers: key_on, waveform, enabled
        ch0_key [1] = 1'b0; ch1_key [1] = 1'b0; ch2_key [1] = 1'b0; ch3_key [1] = 1'b0;
        ch4_key [1] = 1'b0; ch5_key [1] = 1'b0; ch6_key [1] = 1'b0; ch7_key [1] = 1'b0;
        ch0_wf [3] = 3'd0; ch1_wf [3] = 3'd0; ch2_wf [3] = 3'd0; ch3_wf [3] = 3'd0;
        ch4_wf [3] = 3'd0; ch5_wf [3] = 3'd0; ch6_wf [3] = 3'd0; ch7_wf [3] = 3'd0;
        ch0_en [1] = 1'b0; ch1_en [1] = 1'b0; ch2_en [1] = 1'b0; ch3_en [1] = 1'b0;
        ch4_en [1] = 1'b0; ch5_en [1] = 1'b0; ch6_en [1] = 1'b0; ch7_en [1] = 1'b0;

        // Frequency (24-bit, split into hi/lo registers)
        ch0_freq [24] = 24'd0; ch1_freq [24] = 24'd0; ch2_freq [24] = 24'd0; ch3_freq [24] = 24'd0;
        ch4_freq [24] = 24'd0; ch5_freq [24] = 24'd0; ch6_freq [24] = 24'd0; ch7_freq [24] = 24'd0;

        // Volume + pan
        ch0_vol [8] = 8'd0; ch1_vol [8] = 8'd0; ch2_vol [8] = 8'd0; ch3_vol [8] = 8'd0;
        ch4_vol [8] = 8'd0; ch5_vol [8] = 8'd0; ch6_vol [8] = 8'd0; ch7_vol [8] = 8'd0;
        ch0_pan [8] = 8'h80; ch1_pan [8] = 8'h80; ch2_pan [8] = 8'h80; ch3_pan [8] = 8'h80;
        ch4_pan [8] = 8'h80; ch5_pan [8] = 8'h80; ch6_pan [8] = 8'h80; ch7_pan [8] = 8'h80;

        // Envelope params
        ch0_atk [8] = 8'd0; ch1_atk [8] = 8'd0; ch2_atk [8] = 8'd0; ch3_atk [8] = 8'd0;
        ch4_atk [8] = 8'd0; ch5_atk [8] = 8'd0; ch6_atk [8] = 8'd0; ch7_atk [8] = 8'd0;
        ch0_dec [8] = 8'd0; ch1_dec [8] = 8'd0; ch2_dec [8] = 8'd0; ch3_dec [8] = 8'd0;
        ch4_dec [8] = 8'd0; ch5_dec [8] = 8'd0; ch6_dec [8] = 8'd0; ch7_dec [8] = 8'd0;
        ch0_sus [8] = 8'd0; ch1_sus [8] = 8'd0; ch2_sus [8] = 8'd0; ch3_sus [8] = 8'd0;
        ch4_sus [8] = 8'd0; ch5_sus [8] = 8'd0; ch6_sus [8] = 8'd0; ch7_sus [8] = 8'd0;
        ch0_rel [8] = 8'd0; ch1_rel [8] = 8'd0; ch2_rel [8] = 8'd0; ch3_rel [8] = 8'd0;
        ch4_rel [8] = 8'd0; ch5_rel [8] = 8'd0; ch6_rel [8] = 8'd0; ch7_rel [8] = 8'd0;

        // Duty cycle
        ch0_duty [8] = 8'h80; ch1_duty [8] = 8'h80; ch2_duty [8] = 8'h80; ch3_duty [8] = 8'h80;
        ch4_duty [8] = 8'h80; ch5_duty [8] = 8'h80; ch6_duty [8] = 8'h80; ch7_duty [8] = 8'h80;

        // Bus read pipeline
        read_reg     [32] = 32'd0;
        data_ready   [1]  = 1'b0;

        // IRQ output
        irq_reg      [1]  = 1'b0;
    }

    MEM(TYPE=DISTRIBUTED) {
        // Ring buffer: 128 entries, split into left/right 16-bit memories
        buf_left [16] [128] = 16'h0000 {
            OUT rd ASYNC;
            IN  wr;
        };
        buf_right [16] [128] = 16'h0000 {
            OUT rd ASYNC;
            IN  wr;
        };
    }

    // ---- Channel Generators ----
    @new gen0 aud_gen {
        IN  [1]  clk = clk;         IN  [1]  rst_n = rst_n;
        IN  [1]  sample_tick = tick; IN  [1]  key_on = ch0_key;
        IN  [1]  enabled = ch0_en;   IN  [3]  waveform = ch0_wf;
        IN  [24] freq = ch0_freq;    IN  [8]  volume = ch0_vol;
        IN  [8]  duty = ch0_duty;    IN  [8]  attack_rate = ch0_atk;
        IN  [8]  decay_rate = ch0_dec; IN  [8] sustain_level = ch0_sus;
        IN  [8]  release_rate = ch0_rel;
        OUT [16] sample_out = ch0_out;
    }
    @new gen1 aud_gen {
        IN  [1]  clk = clk;         IN  [1]  rst_n = rst_n;
        IN  [1]  sample_tick = tick; IN  [1]  key_on = ch1_key;
        IN  [1]  enabled = ch1_en;   IN  [3]  waveform = ch1_wf;
        IN  [24] freq = ch1_freq;    IN  [8]  volume = ch1_vol;
        IN  [8]  duty = ch1_duty;    IN  [8]  attack_rate = ch1_atk;
        IN  [8]  decay_rate = ch1_dec; IN  [8] sustain_level = ch1_sus;
        IN  [8]  release_rate = ch1_rel;
        OUT [16] sample_out = ch1_out;
    }
    @new gen2 aud_gen {
        IN  [1]  clk = clk;         IN  [1]  rst_n = rst_n;
        IN  [1]  sample_tick = tick; IN  [1]  key_on = ch2_key;
        IN  [1]  enabled = ch2_en;   IN  [3]  waveform = ch2_wf;
        IN  [24] freq = ch2_freq;    IN  [8]  volume = ch2_vol;
        IN  [8]  duty = ch2_duty;    IN  [8]  attack_rate = ch2_atk;
        IN  [8]  decay_rate = ch2_dec; IN  [8] sustain_level = ch2_sus;
        IN  [8]  release_rate = ch2_rel;
        OUT [16] sample_out = ch2_out;
    }
    @new gen3 aud_gen {
        IN  [1]  clk = clk;         IN  [1]  rst_n = rst_n;
        IN  [1]  sample_tick = tick; IN  [1]  key_on = ch3_key;
        IN  [1]  enabled = ch3_en;   IN  [3]  waveform = ch3_wf;
        IN  [24] freq = ch3_freq;    IN  [8]  volume = ch3_vol;
        IN  [8]  duty = ch3_duty;    IN  [8]  attack_rate = ch3_atk;
        IN  [8]  decay_rate = ch3_dec; IN  [8] sustain_level = ch3_sus;
        IN  [8]  release_rate = ch3_rel;
        OUT [16] sample_out = ch3_out;
    }
    @new gen4 aud_gen {
        IN  [1]  clk = clk;         IN  [1]  rst_n = rst_n;
        IN  [1]  sample_tick = tick; IN  [1]  key_on = ch4_key;
        IN  [1]  enabled = ch4_en;   IN  [3]  waveform = ch4_wf;
        IN  [24] freq = ch4_freq;    IN  [8]  volume = ch4_vol;
        IN  [8]  duty = ch4_duty;    IN  [8]  attack_rate = ch4_atk;
        IN  [8]  decay_rate = ch4_dec; IN  [8] sustain_level = ch4_sus;
        IN  [8]  release_rate = ch4_rel;
        OUT [16] sample_out = ch4_out;
    }
    @new gen5 aud_gen {
        IN  [1]  clk = clk;         IN  [1]  rst_n = rst_n;
        IN  [1]  sample_tick = tick; IN  [1]  key_on = ch5_key;
        IN  [1]  enabled = ch5_en;   IN  [3]  waveform = ch5_wf;
        IN  [24] freq = ch5_freq;    IN  [8]  volume = ch5_vol;
        IN  [8]  duty = ch5_duty;    IN  [8]  attack_rate = ch5_atk;
        IN  [8]  decay_rate = ch5_dec; IN  [8] sustain_level = ch5_sus;
        IN  [8]  release_rate = ch5_rel;
        OUT [16] sample_out = ch5_out;
    }
    @new gen6 aud_gen {
        IN  [1]  clk = clk;         IN  [1]  rst_n = rst_n;
        IN  [1]  sample_tick = tick; IN  [1]  key_on = ch6_key;
        IN  [1]  enabled = ch6_en;   IN  [3]  waveform = ch6_wf;
        IN  [24] freq = ch6_freq;    IN  [8]  volume = ch6_vol;
        IN  [8]  duty = ch6_duty;    IN  [8]  attack_rate = ch6_atk;
        IN  [8]  decay_rate = ch6_dec; IN  [8] sustain_level = ch6_sus;
        IN  [8]  release_rate = ch6_rel;
        OUT [16] sample_out = ch6_out;
    }
    @new gen7 aud_gen {
        IN  [1]  clk = clk;         IN  [1]  rst_n = rst_n;
        IN  [1]  sample_tick = tick; IN  [1]  key_on = ch7_key;
        IN  [1]  enabled = ch7_en;   IN  [3]  waveform = ch7_wf;
        IN  [24] freq = ch7_freq;    IN  [8]  volume = ch7_vol;
        IN  [8]  duty = ch7_duty;    IN  [8]  attack_rate = ch7_atk;
        IN  [8]  decay_rate = ch7_dec; IN  [8] sustain_level = ch7_sus;
        IN  [8]  release_rate = ch7_rel;
        OUT [16] sample_out = ch7_out;
    }

    // ---- Mixer ----
    @new mix0 aud_mixer {
        IN  [1]   clk = clk;
        IN  [1]   rst_n = rst_n;
        IN  [1]   sample_tick = tick;
        IN  [8]   master_vol = master_vol;
        IN  [128] ch_samples = all_samples;
        IN  [64]  ch_pans = all_pans;
        OUT [16]  out_left = mix_left;
        OUT [16]  out_right = mix_right;
        OUT [1]   out_valid = mix_valid;
    }

    ASYNCHRONOUS {
        // Bus decode
        reg_group <= pbus.ADDR[8:5];
        reg_sel   <= pbus.ADDR[4:2];
        bus_read  <= pbus.VALID & ~pbus.CMD;
        bus_write <= pbus.VALID & pbus.CMD;

        // Sample tick wire (directly from register, pulsed in SYNC block)
        tick <= sample_tick;

        // Concatenate channel outputs for mixer
        all_samples <= {ch7_out, ch6_out, ch5_out, ch4_out,
                        ch3_out, ch2_out, ch1_out, ch0_out};
        all_pans    <= {ch7_pan, ch6_pan, ch5_pan, ch4_pan,
                        ch3_pan, ch2_pan, ch1_pan, ch0_pan};

        // Buffer level = (wr_ptr - rd_ptr) mod 128, held in 8 bits.
        // Masking off bit 7 discards the borrow that appears once wr_ptr wraps past rd_ptr.
        buf_count <= ({1'b0, wr_ptr} - {1'b0, rd_ptr}) & 8'h7F;
        buf_half  <= buf_count[6];  // level >= 64

        // IRQ output
        irq <= irq_reg;

        // Bus data drive (reads)
        IF (pbus.VALID && pbus.CMD == CMD.READ && data_ready == 1'b1) {
            pbus.DATA <= read_reg;
        } ELSE {
            pbus.DATA <= 32'bz;
        }

        // Bus DONE
        IF (pbus.VALID && pbus.CMD == CMD.WRITE) {
            pbus.DONE <= 1'b1;
        } ELIF (pbus.VALID && pbus.CMD == CMD.READ && data_ready == 1'b1) {
            pbus.DONE <= 1'b1;
        } ELSE {
            pbus.DONE <= 1'bz;
        }
    }

    SYNCHRONOUS(CLK = clk RESET = rst_n RESET_ACTIVE = Low) {
        // =====================
        // Chain 1: Sample rate divider (sample_tick, div_count)
        // =====================
        // Tick once every `divider` clocks: div_count runs 0 .. divider-1
        IF (enable == 1'b1 && (div_count + 16'd1) >= divider) {
            div_count   <= 16'd0;
            sample_tick <= 1'b1;
        } ELIF (enable == 1'b1) {
            div_count   <= div_count + 16'd1;
            sample_tick <= 1'b0;
        } ELSE {
            div_count   <= 16'd0;
            sample_tick <= 1'b0;
        }

        // =====================
        // Chain 2: Ring buffer write (wr_ptr)
        // =====================
        IF (mix_valid == 1'b1) {
            buf_left.wr[wr_ptr] <= mix_left;
            buf_right.wr[wr_ptr] <= mix_right;
            IF (wr_ptr == 7'd127) {
                wr_ptr <= 7'd0;
            } ELSE {
                wr_ptr <= wr_ptr + 7'd1;
            }
        }

        // =====================
        // Chain 3: IRQ register (irq_reg)
        // Bus write CTRL with bit[2] clears; buffer half-full sets
        // =====================
        IF (bus_write == 1'b1 && reg_group == 4'b0000 && reg_sel == 3'b000 && pbus.DATA[2] == 1'b1) {
            irq_reg <= 1'b0;
        } ELIF (irq_en == 1'b1 && buf_half == 1'b1) {
            irq_reg <= 1'b1;
        }

        // =====================
        // Chain 4: Underrun flag (underrun)
        // Bus write CTRL with bit[2] clears; empty buffer read sets
        // =====================
        IF (bus_write == 1'b1 && reg_group == 4'b0000 && reg_sel == 3'b000 && pbus.DATA[2] == 1'b1) {
            underrun <= 1'b0;
        } ELIF (bus_read == 1'b1 && reg_group == 4'b0000 && reg_sel == 3'b010 && rd_ptr == wr_ptr && data_ready == 1'b0) {
            underrun <= 1'b1;
        }

        // =====================
        // Chain 5: Bus read pipeline (data_ready, read_reg, rd_ptr)
        // Distributed mem reads are combinational, so buf_left/buf_right are indexed by rd_ptr directly
        // =====================
        IF (data_ready == 1'b1) {
            data_ready <= 1'b0;
        } ELIF (bus_read == 1'b1) {
            // SAMPLE register: read from ring buffer (combinational)
            IF (reg_group == 4'b0000 && reg_sel == 3'b010) {
                IF (rd_ptr != wr_ptr) {
                    read_reg   <= {buf_left.rd[rd_ptr], buf_right.rd[rd_ptr]};
                    data_ready <= 1'b1;
                    IF (rd_ptr == 7'd127) {
                        rd_ptr <= 7'd0;
                    } ELSE {
                        rd_ptr <= rd_ptr + 7'd1;
                    }
                } ELSE {
                    // Buffer empty
                    read_reg   <= 32'd0;
                    data_ready <= 1'b1;
                }

            // CTRL
            } ELIF (reg_group == 4'b0000 && reg_sel == 3'b000) {
                read_reg   <= {30'd0, irq_en, enable};
                data_ready <= 1'b1;

            // STATUS
            } ELIF (reg_group == 4'b0000 && reg_sel == 3'b001) {
                read_reg   <= {22'd0, underrun, buf_half, buf_count};
                data_ready <= 1'b1;

            // DIVIDER
            } ELIF (reg_group == 4'b0000 && reg_sel == 3'b011) {
                read_reg   <= {16'd0, divider};
                data_ready <= 1'b1;

            // MASTER_VOL
            } ELIF (reg_group == 4'b0000 && reg_sel == 3'b100) {
                read_reg   <= {24'd0, master_vol};
                data_ready <= 1'b1;

            // Channel 0 readback
            } ELIF (reg_group == 4'b0001) {
                data_ready <= 1'b1;
                IF (reg_sel == 3'b000) {
                    read_reg <= {27'd0, ch0_en, ch0_wf, ch0_key};
                } ELIF (reg_sel == 3'b001) {
                    read_reg <= {16'd0, ch0_freq[15:0]};
                } ELIF (reg_sel == 3'b010) {
                    read_reg <= {24'd0, ch0_freq[23:16]};
                } ELIF (reg_sel == 3'b011) {
                    read_reg <= {16'd0, ch0_pan, ch0_vol};
                } ELIF (reg_sel == 3'b100) {
                    read_reg <= {16'd0, ch0_dec, ch0_atk};
                } ELIF (reg_sel == 3'b101) {
                    read_reg <= {16'd0, ch0_rel, ch0_sus};
                } ELIF (reg_sel == 3'b110) {
                    read_reg <= {24'd0, ch0_duty};
                } ELSE {
                    read_reg <= 32'd0;
                }

            // Channel 1 readback
            } ELIF (reg_group == 4'b0010) {
                data_ready <= 1'b1;
                IF (reg_sel == 3'b000) {
                    read_reg <= {27'd0, ch1_en, ch1_wf, ch1_key};
                } ELIF (reg_sel == 3'b001) {
                    read_reg <= {16'd0, ch1_freq[15:0]};
                } ELIF (reg_sel == 3'b010) {
                    read_reg <= {24'd0, ch1_freq[23:16]};
                } ELIF (reg_sel == 3'b011) {
                    read_reg <= {16'd0, ch1_pan, ch1_vol};
                } ELIF (reg_sel == 3'b100) {
                    read_reg <= {16'd0, ch1_dec, ch1_atk};
                } ELIF (reg_sel == 3'b101) {
                    read_reg <= {16'd0, ch1_rel, ch1_sus};
                } ELIF (reg_sel == 3'b110) {
                    read_reg <= {24'd0, ch1_duty};
                } ELSE {
                    read_reg <= 32'd0;
                }

            } ELSE {
                // Channels 2-7 and unmapped registers: readback returns 0
                data_ready <= 1'b1;
                read_reg   <= 32'd0;
            }
        }

        // =====================
        // Chain 6: Bus register writes (enable, irq_en, master_vol, divider, ch regs)
        // NOTE: irq_reg and underrun are in their own chains above
        // =====================
        IF (bus_write == 1'b1) {
            // Global registers
            IF (reg_group == 4'b0000) {
                IF (reg_sel == 3'b000) {
                    enable <= pbus.DATA[0];
                    irq_en <= pbus.DATA[1];
                } ELIF (reg_sel == 3'b011) {
                    divider <= pbus.DATA[15:0];
                } ELIF (reg_sel == 3'b100) {
                    master_vol <= pbus.DATA[7:0];
                }

            // Channel 0
            } ELIF (reg_group == 4'b0001) {
                IF (reg_sel == 3'b000) {
                    ch0_key <= pbus.DATA[0]; ch0_wf <= pbus.DATA[3:1]; ch0_en <= pbus.DATA[4];
                } ELIF (reg_sel == 3'b001) {
                    ch0_freq[15:0] <= pbus.DATA[15:0];
                } ELIF (reg_sel == 3'b010) {
                    ch0_freq[23:16] <= pbus.DATA[7:0];
                } ELIF (reg_sel == 3'b011) {
                    ch0_vol <= pbus.DATA[7:0]; ch0_pan <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b100) {
                    ch0_atk <= pbus.DATA[7:0]; ch0_dec <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b101) {
                    ch0_sus <= pbus.DATA[7:0]; ch0_rel <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b110) {
                    ch0_duty <= pbus.DATA[7:0];
                }

            // Channel 1
            } ELIF (reg_group == 4'b0010) {
                IF (reg_sel == 3'b000) {
                    ch1_key <= pbus.DATA[0]; ch1_wf <= pbus.DATA[3:1]; ch1_en <= pbus.DATA[4];
                } ELIF (reg_sel == 3'b001) {
                    ch1_freq[15:0] <= pbus.DATA[15:0];
                } ELIF (reg_sel == 3'b010) {
                    ch1_freq[23:16] <= pbus.DATA[7:0];
                } ELIF (reg_sel == 3'b011) {
                    ch1_vol <= pbus.DATA[7:0]; ch1_pan <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b100) {
                    ch1_atk <= pbus.DATA[7:0]; ch1_dec <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b101) {
                    ch1_sus <= pbus.DATA[7:0]; ch1_rel <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b110) {
                    ch1_duty <= pbus.DATA[7:0];
                }

            // Channel 2
            } ELIF (reg_group == 4'b0011) {
                IF (reg_sel == 3'b000) {
                    ch2_key <= pbus.DATA[0]; ch2_wf <= pbus.DATA[3:1]; ch2_en <= pbus.DATA[4];
                } ELIF (reg_sel == 3'b001) {
                    ch2_freq[15:0] <= pbus.DATA[15:0];
                } ELIF (reg_sel == 3'b010) {
                    ch2_freq[23:16] <= pbus.DATA[7:0];
                } ELIF (reg_sel == 3'b011) {
                    ch2_vol <= pbus.DATA[7:0]; ch2_pan <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b100) {
                    ch2_atk <= pbus.DATA[7:0]; ch2_dec <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b101) {
                    ch2_sus <= pbus.DATA[7:0]; ch2_rel <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b110) {
                    ch2_duty <= pbus.DATA[7:0];
                }

            // Channel 3
            } ELIF (reg_group == 4'b0100) {
                IF (reg_sel == 3'b000) {
                    ch3_key <= pbus.DATA[0]; ch3_wf <= pbus.DATA[3:1]; ch3_en <= pbus.DATA[4];
                } ELIF (reg_sel == 3'b001) {
                    ch3_freq[15:0] <= pbus.DATA[15:0];
                } ELIF (reg_sel == 3'b010) {
                    ch3_freq[23:16] <= pbus.DATA[7:0];
                } ELIF (reg_sel == 3'b011) {
                    ch3_vol <= pbus.DATA[7:0]; ch3_pan <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b100) {
                    ch3_atk <= pbus.DATA[7:0]; ch3_dec <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b101) {
                    ch3_sus <= pbus.DATA[7:0]; ch3_rel <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b110) {
                    ch3_duty <= pbus.DATA[7:0];
                }

            // Channel 4
            } ELIF (reg_group == 4'b0101) {
                IF (reg_sel == 3'b000) {
                    ch4_key <= pbus.DATA[0]; ch4_wf <= pbus.DATA[3:1]; ch4_en <= pbus.DATA[4];
                } ELIF (reg_sel == 3'b001) {
                    ch4_freq[15:0] <= pbus.DATA[15:0];
                } ELIF (reg_sel == 3'b010) {
                    ch4_freq[23:16] <= pbus.DATA[7:0];
                } ELIF (reg_sel == 3'b011) {
                    ch4_vol <= pbus.DATA[7:0]; ch4_pan <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b100) {
                    ch4_atk <= pbus.DATA[7:0]; ch4_dec <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b101) {
                    ch4_sus <= pbus.DATA[7:0]; ch4_rel <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b110) {
                    ch4_duty <= pbus.DATA[7:0];
                }

            // Channel 5
            } ELIF (reg_group == 4'b0110) {
                IF (reg_sel == 3'b000) {
                    ch5_key <= pbus.DATA[0]; ch5_wf <= pbus.DATA[3:1]; ch5_en <= pbus.DATA[4];
                } ELIF (reg_sel == 3'b001) {
                    ch5_freq[15:0] <= pbus.DATA[15:0];
                } ELIF (reg_sel == 3'b010) {
                    ch5_freq[23:16] <= pbus.DATA[7:0];
                } ELIF (reg_sel == 3'b011) {
                    ch5_vol <= pbus.DATA[7:0]; ch5_pan <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b100) {
                    ch5_atk <= pbus.DATA[7:0]; ch5_dec <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b101) {
                    ch5_sus <= pbus.DATA[7:0]; ch5_rel <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b110) {
                    ch5_duty <= pbus.DATA[7:0];
                }

            // Channel 6
            } ELIF (reg_group == 4'b0111) {
                IF (reg_sel == 3'b000) {
                    ch6_key <= pbus.DATA[0]; ch6_wf <= pbus.DATA[3:1]; ch6_en <= pbus.DATA[4];
                } ELIF (reg_sel == 3'b001) {
                    ch6_freq[15:0] <= pbus.DATA[15:0];
                } ELIF (reg_sel == 3'b010) {
                    ch6_freq[23:16] <= pbus.DATA[7:0];
                } ELIF (reg_sel == 3'b011) {
                    ch6_vol <= pbus.DATA[7:0]; ch6_pan <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b100) {
                    ch6_atk <= pbus.DATA[7:0]; ch6_dec <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b101) {
                    ch6_sus <= pbus.DATA[7:0]; ch6_rel <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b110) {
                    ch6_duty <= pbus.DATA[7:0];
                }

            // Channel 7
            } ELIF (reg_group == 4'b1000) {
                IF (reg_sel == 3'b000) {
                    ch7_key <= pbus.DATA[0]; ch7_wf <= pbus.DATA[3:1]; ch7_en <= pbus.DATA[4];
                } ELIF (reg_sel == 3'b001) {
                    ch7_freq[15:0] <= pbus.DATA[15:0];
                } ELIF (reg_sel == 3'b010) {
                    ch7_freq[23:16] <= pbus.DATA[7:0];
                } ELIF (reg_sel == 3'b011) {
                    ch7_vol <= pbus.DATA[7:0]; ch7_pan <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b100) {
                    ch7_atk <= pbus.DATA[7:0]; ch7_dec <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b101) {
                    ch7_sus <= pbus.DATA[7:0]; ch7_rel <= pbus.DATA[15:8];
                } ELIF (reg_sel == 3'b110) {
                    ch7_duty <= pbus.DATA[7:0];
                }
            }
        }
    }
@endmod
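The write chain above implies a fixed bit layout for each channel's registers. As a rough host-side illustration (Python rather than JZ, with hypothetical helper names that do not appear in the project), the data words for one channel could be packed as shown below. The waveform code is passed as a raw 3-bit number because the WAVE encoding lives in global.jz and is not reproduced here. How `reg_group` and `reg_sel` map onto bus addresses is also left out, since that decode is not shown in this module excerpt.

```python
# Host-side sketch of the per-channel register layout (helper names are hypothetical).

def pack_ctrl(key, waveform, enable):
    """reg_sel 0: bit 0 = key, bits 3:1 = waveform, bit 4 = enable."""
    return (key & 1) | ((waveform & 0x7) << 1) | ((enable & 1) << 4)


def pack_freq(freq24):
    """reg_sel 1 holds freq[15:0]; reg_sel 2 holds freq[23:16]."""
    return freq24 & 0xFFFF, (freq24 >> 16) & 0xFF


def pack_pair(low, high):
    """reg_sel 3/4/5: first field in bits 7:0, second field in bits 15:8."""
    return (low & 0xFF) | ((high & 0xFF) << 8)


def channel_writes(key, waveform, enable, freq24,
                   vol, pan, atk, dec, sus, rel, duty):
    """Return (reg_sel, data) pairs describing one channel's settings."""
    lo, hi = pack_freq(freq24)
    return [
        (1, lo),
        (2, hi),
        (3, pack_pair(vol, pan)),   # vol low byte, pan high byte
        (4, pack_pair(atk, dec)),   # attack low byte, decay high byte
        (5, pack_pair(sus, rel)),   # sustain low byte, release high byte
        (6, duty & 0xFF),
        (0, pack_ctrl(key, waveform, enable)),  # control word last (sketch's choice)
    ]


writes = channel_writes(1, 2, 1, 0x123456, 200, 128, 10, 20, 180, 5, 64)
```

Writing the control word last is a choice made in this sketch so that key-on takes effect after the other fields are set; nothing in the module requires that order.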
jz
// Audio channel generator - waveform synthesis with ADSR envelope
// Inspired by Yamaha FM synth voice channels
// Waveforms: square (with duty), triangle, sawtooth, noise
// ADSR envelope controls amplitude over time
// Uses @global WAVE and ENV from global.jz

@module aud_gen
    PORT {
        IN  [1]  clk;
        IN  [1]  rst_n;
        IN  [1]  sample_tick;    // One-cycle pulse at sample rate
        IN  [1]  key_on;         // Gate signal (key pressed)
        IN  [1]  enabled;        // Channel enable
        IN  [3]  waveform;       // WAVE.SQUARE/TRIANGLE/SAWTOOTH/NOISE
        IN  [24] freq;           // Phase increment per sample
        IN  [8]  volume;         // Channel volume 0-255
        IN  [8]  duty;           // Square wave duty cycle 0-255
        IN  [8]  attack_rate;    // Envelope attack speed
        IN  [8]  decay_rate;     // Envelope decay speed
        IN  [8]  sustain_level;  // Envelope sustain amplitude
        IN  [8]  release_rate;   // Envelope release speed
        OUT [16] sample_out;     // Signed 16-bit output
    }

    WIRE {
        raw_wave     [16];   // Raw waveform (signed)
        tri_phase    [15];   // Triangle intermediate
        tri_wave     [16];   // Triangle output
        env_byte     [8];    // Top 8 bits of envelope
        vol_ext      [16];   // Volume zero-extended
        env_ext      [16];   // Envelope zero-extended
        wave_x_env   [32];   // waveform * envelope
        scaled_wave  [16];   // After envelope scaling
        scaled_ext   [32];   // For volume multiply
    }

    REGISTER {
        phase       [24] = 24'd0;
        env_state   [3]  = 3'd0;
        env_value   [16] = 16'd0;
        key_prev    [1]  = 1'b0;
        lfsr        [16] = 16'hACE1;
    }

    ASYNCHRONOUS {
        // Triangle: fold phase into up/down ramp
        tri_phase <= (phase[23] == 1'b0) ? phase[22:8] : ~phase[22:8];
        tri_wave  <= {1'b0, tri_phase};

        // Waveform mux
        IF (waveform == WAVE.SQUARE) {
            raw_wave <= (phase[23:16] < duty) ? 16'h7FFF : 16'h8001;
        } ELIF (waveform == WAVE.TRIANGLE) {
            // Shift up to full signed range: (tri * 2) - 0x7FFF
            raw_wave <= {tri_wave[14:0], 1'b0} - 16'h7FFF;
        } ELIF (waveform == WAVE.SAWTOOTH) {
            raw_wave <= {phase[23], phase[22:8]};
        } ELSE {
            raw_wave <= lfsr;
        }

        // Envelope scaling: wave * env[15:8], then keep product bits [23:8] (i.e. >>8)
        env_byte <= env_value[15:8];
        env_ext  <= {8'd0, env_byte};
        vol_ext  <= {8'd0, volume};

        wave_x_env  <= smul(raw_wave, env_ext);
        scaled_wave <= wave_x_env[23:8];

        // Volume scaling: scaled_wave * volume, then keep product bits [23:8] (i.e. >>8)
        scaled_ext  <= smul(scaled_wave, vol_ext);

        // Output
        IF (enabled == 1'b1) {
            sample_out <= scaled_ext[23:8];
        } ELSE {
            sample_out <= 16'd0;
        }
    }

    SYNCHRONOUS(CLK = clk RESET = rst_n RESET_ACTIVE = Low) {
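        IF (sample_tick == 1'b1) {
            // Track key_on at the sample rate so the edge tests below compare
            // against the value seen on the previous sample_tick. Updating
            // key_prev on every clk would hide most key-on/key-off edges.
            key_prev <= key_on;
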
            // Phase accumulator
            IF (enabled == 1'b1) {
                phase <= phase + freq;
            }

            // Noise LFSR (Galois, taps at 16,14,13,11 -> XOR feedback)
            IF (waveform == WAVE.NOISE) {
                IF (lfsr[0] == 1'b1) {
                    lfsr <= {1'b0, lfsr[15:1]} ^ 16'hB400;
                } ELSE {
                    lfsr <= {1'b0, lfsr[15:1]};
                }
            }

            // ---- ADSR Envelope State Machine ----

            // Key-on rising edge -> start attack
            IF (key_on == 1'b1 && key_prev == 1'b0) {
                env_state <= ENV.ATTACK;
                env_value <= 16'd0;

            // Key-off -> start release
            } ELIF (key_on == 1'b0 && key_prev == 1'b1) {
                env_state <= ENV.RELEASE;

            } ELSE {
                // Envelope progression
                IF (env_state == ENV.ATTACK) {
                    IF (env_value + {8'd0, attack_rate} >= 16'hFF00) {
                        env_value <= 16'hFF00;
                        env_state <= ENV.DECAY;
                    } ELSE {
                        env_value <= env_value + {8'd0, attack_rate};
                    }

                } ELIF (env_state == ENV.DECAY) {
                    IF (env_value[15:8] <= sustain_level) {
                        env_value <= {sustain_level, 8'd0};
                        env_state <= ENV.SUSTAIN;
                    } ELIF (env_value < {8'd0, decay_rate}) {
                        env_value <= {sustain_level, 8'd0};
                        env_state <= ENV.SUSTAIN;
                    } ELSE {
                        env_value <= env_value - {8'd0, decay_rate};
                    }

                } ELIF (env_state == ENV.SUSTAIN) {
                    env_value <= {sustain_level, 8'd0};

                } ELIF (env_state == ENV.RELEASE) {
                    IF (env_value < {8'd0, release_rate}) {
                        env_value <= 16'd0;
                        env_state <= ENV.IDLE;
                    } ELSE {
                        env_value <= env_value - {8'd0, release_rate};
                    }
                }
            }
        }
    }
@endmod
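The per-sample behaviour of `aud_gen` can be checked against a small software model. The sketch below is a Python approximation, not part of the project: the function names are made up for illustration, envelope states are plain strings because the ENV encoding is defined in global.jz, and the sample rate is a parameter because it is not stated in this module.

```python
# Behavioural sketch of aud_gen's per-sample updates (names are hypothetical).

def lfsr_step(lfsr):
    """One Galois LFSR step: shift right, XOR 0xB400 if the bit shifted out was 1."""
    out = lfsr & 1
    lfsr >>= 1
    if out:
        lfsr ^= 0xB400
    return lfsr & 0xFFFF


def phase_increment(f_hz, sample_rate_hz):
    """24-bit increment that makes the phase accumulator wrap f_hz times per second."""
    return round(f_hz * (1 << 24) / sample_rate_hz) & 0xFFFFFF


def env_step(state, value, key_on, key_prev, atk, dec, sus, rel):
    """Advance the ADSR envelope by one sample tick; returns (state, value)."""
    if key_on and not key_prev:          # key-on edge restarts the attack
        return "ATTACK", 0
    if key_prev and not key_on:          # key-off edge starts the release
        return "RELEASE", value
    if state == "ATTACK":
        if value + atk >= 0xFF00:
            return "DECAY", 0xFF00
        return "ATTACK", value + atk
    if state == "DECAY":
        if (value >> 8) <= sus or value < dec:
            return "SUSTAIN", sus << 8
        return "DECAY", value - dec
    if state == "SUSTAIN":
        return "SUSTAIN", sus << 8
    if state == "RELEASE":
        if value < rel:
            return "IDLE", 0
        return "RELEASE", value - rel
    return state, value                  # IDLE holds its value


first_noise = lfsr_step(0xACE1)          # seed taken from the register reset value
quarter_rate = phase_increment(12000, 48000)
```

Since the rates are added to a 16-bit envelope once per sample, an attack rate of 255 reaches the 0xFF00 peak in 256 sample ticks, while a rate of 1 takes 65280.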
jz
// Audio mixer - sums 8 channels into stereo output
// Accumulates signed 16-bit inputs with headroom, applies master volume,
// and clamps to signed 16-bit output range.
// Pan per channel: 0=full left, 128=center, 255=full right

@module aud_mixer
    CONST {
        NUM_CHANNELS = 8;
    }

    PORT {
        IN  [1]  clk;
        IN  [1]  rst_n;
        IN  [1]  sample_tick;
        IN  [8]  master_vol;     // Master volume 0-255
        // Channel inputs: concatenated [NUM_CHANNELS * 16] wide
        IN  [128] ch_samples;    // 8 channels x 16-bit signed
        IN  [64]  ch_pans;       // 8 channels x 8-bit pan (0=L, 128=center, 255=R)
        OUT [16] out_left;       // Mixed left channel
        OUT [16] out_right;      // Mixed right channel
        OUT [1]  out_valid;      // Pulse when outputs are ready
    }

    WIRE {
        // Individual channel extraction
        s0 [16]; s1 [16]; s2 [16]; s3 [16];
        s4 [16]; s5 [16]; s6 [16]; s7 [16];
        p0 [8]; p1 [8]; p2 [8]; p3 [8];
        p4 [8]; p5 [8]; p6 [8]; p7 [8];

        // Left gain = 255 - pan
        l0 [8]; l1 [8]; l2 [8]; l3 [8];
        l4 [8]; l5 [8]; l6 [8]; l7 [8];

        // Scaled samples left (smul 16x16 -> 32, take [23:8] for 16-bit)
        sl0 [32]; sl1 [32]; sl2 [32]; sl3 [32];
        sl4 [32]; sl5 [32]; sl6 [32]; sl7 [32];
        // Scaled samples right
        sr0 [32]; sr1 [32]; sr2 [32]; sr3 [32];
        sr4 [32]; sr5 [32]; sr6 [32]; sr7 [32];

        // Right gain (pan) and left gain (255 - pan), zero-extended to 16 bits for smul
        pe0 [16]; pe1 [16]; pe2 [16]; pe3 [16];
        pe4 [16]; pe5 [16]; pe6 [16]; pe7 [16];
        le0 [16]; le1 [16]; le2 [16]; le3 [16];
        le4 [16]; le5 [16]; le6 [16]; le7 [16];

        // Sign-extended to 20-bit for accumulation
        xl0 [20]; xl1 [20]; xl2 [20]; xl3 [20];
        xl4 [20]; xl5 [20]; xl6 [20]; xl7 [20];
        xr0 [20]; xr1 [20]; xr2 [20]; xr3 [20];
        xr4 [20]; xr5 [20]; xr6 [20]; xr7 [20];

        // Partial sums (pairwise to manage width)
        lsum01 [20]; lsum23 [20]; lsum45 [20]; lsum67 [20];
        rsum01 [20]; rsum23 [20]; rsum45 [20]; rsum67 [20];
        lsum03 [20]; lsum47 [20];
        rsum03 [20]; rsum47 [20];
        sum_left  [20];
        sum_right [20];

        // Master volume scaling
        mvol_ext     [20];
        left_scaled  [40];
        right_scaled [40];
        left_clamped  [16];
        right_clamped [16];
    }

    REGISTER {
        out_l_reg  [16] = 16'd0;
        out_r_reg  [16] = 16'd0;
        valid_reg  [1]  = 1'b0;
    }

    ASYNCHRONOUS {
        // Extract individual 16-bit samples
        s0 <= ch_samples[15:0];
        s1 <= ch_samples[31:16];
        s2 <= ch_samples[47:32];
        s3 <= ch_samples[63:48];
        s4 <= ch_samples[79:64];
        s5 <= ch_samples[95:80];
        s6 <= ch_samples[111:96];
        s7 <= ch_samples[127:112];

        // Extract pan values
        p0 <= ch_pans[7:0];     p1 <= ch_pans[15:8];
        p2 <= ch_pans[23:16];   p3 <= ch_pans[31:24];
        p4 <= ch_pans[39:32];   p5 <= ch_pans[47:40];
        p6 <= ch_pans[55:48];   p7 <= ch_pans[63:56];

        // Left gain = 255 - pan
        l0 <= 8'hFF - p0; l1 <= 8'hFF - p1;
        l2 <= 8'hFF - p2; l3 <= 8'hFF - p3;
        l4 <= 8'hFF - p4; l5 <= 8'hFF - p5;
        l6 <= 8'hFF - p6; l7 <= 8'hFF - p7;

        // Zero-extend pan/gain to 16-bit for smul
        pe0 <= {8'd0, p0}; pe1 <= {8'd0, p1};
        pe2 <= {8'd0, p2}; pe3 <= {8'd0, p3};
        pe4 <= {8'd0, p4}; pe5 <= {8'd0, p5};
        pe6 <= {8'd0, p6}; pe7 <= {8'd0, p7};
        le0 <= {8'd0, l0}; le1 <= {8'd0, l1};
        le2 <= {8'd0, l2}; le3 <= {8'd0, l3};
        le4 <= {8'd0, l4}; le5 <= {8'd0, l5};
        le6 <= {8'd0, l6}; le7 <= {8'd0, l7};

        // Scale each channel by left/right gain
        sl0 <= smul(s0, le0); sl1 <= smul(s1, le1);
        sl2 <= smul(s2, le2); sl3 <= smul(s3, le3);
        sl4 <= smul(s4, le4); sl5 <= smul(s5, le5);
        sl6 <= smul(s6, le6); sl7 <= smul(s7, le7);

        sr0 <= smul(s0, pe0); sr1 <= smul(s1, pe1);
        sr2 <= smul(s2, pe2); sr3 <= smul(s3, pe3);
        sr4 <= smul(s4, pe4); sr5 <= smul(s5, pe5);
        sr6 <= smul(s6, pe6); sr7 <= smul(s7, pe7);

        // Sign-extend 16-bit pan-scaled results to 20-bit
        xl0 <= {sl0[23], sl0[23], sl0[23], sl0[23], sl0[23:8]};
        xl1 <= {sl1[23], sl1[23], sl1[23], sl1[23], sl1[23:8]};
        xl2 <= {sl2[23], sl2[23], sl2[23], sl2[23], sl2[23:8]};
        xl3 <= {sl3[23], sl3[23], sl3[23], sl3[23], sl3[23:8]};
        xl4 <= {sl4[23], sl4[23], sl4[23], sl4[23], sl4[23:8]};
        xl5 <= {sl5[23], sl5[23], sl5[23], sl5[23], sl5[23:8]};
        xl6 <= {sl6[23], sl6[23], sl6[23], sl6[23], sl6[23:8]};
        xl7 <= {sl7[23], sl7[23], sl7[23], sl7[23], sl7[23:8]};

        xr0 <= {sr0[23], sr0[23], sr0[23], sr0[23], sr0[23:8]};
        xr1 <= {sr1[23], sr1[23], sr1[23], sr1[23], sr1[23:8]};
        xr2 <= {sr2[23], sr2[23], sr2[23], sr2[23], sr2[23:8]};
        xr3 <= {sr3[23], sr3[23], sr3[23], sr3[23], sr3[23:8]};
        xr4 <= {sr4[23], sr4[23], sr4[23], sr4[23], sr4[23:8]};
        xr5 <= {sr5[23], sr5[23], sr5[23], sr5[23], sr5[23:8]};
        xr6 <= {sr6[23], sr6[23], sr6[23], sr6[23], sr6[23:8]};
        xr7 <= {sr7[23], sr7[23], sr7[23], sr7[23], sr7[23:8]};

        // Pairwise sum to accumulate
        lsum01 <= xl0 + xl1; lsum23 <= xl2 + xl3;
        lsum45 <= xl4 + xl5; lsum67 <= xl6 + xl7;
        rsum01 <= xr0 + xr1; rsum23 <= xr2 + xr3;
        rsum45 <= xr4 + xr5; rsum67 <= xr6 + xr7;

        lsum03 <= lsum01 + lsum23;
        lsum47 <= lsum45 + lsum67;
        rsum03 <= rsum01 + rsum23;
        rsum47 <= rsum45 + rsum67;

        sum_left  <= lsum03 + lsum47;
        sum_right <= rsum03 + rsum47;

        // Apply master volume (zero-extend to 20 for smul)
        mvol_ext <= {12'd0, master_vol};
        left_scaled  <= smul(sum_left, mvol_ext);
        right_scaled <= smul(sum_right, mvol_ext);

        // Saturate after >>8: bits [27:23] must all match the sign bit, else clamp to 0x7FFF / 0x8001
        IF (left_scaled[27] == 1'b0 && left_scaled[27:23] != 5'b00000) {
            left_clamped <= 16'h7FFF;
        } ELIF (left_scaled[27] == 1'b1 && left_scaled[27:23] != 5'b11111) {
            left_clamped <= 16'h8001;
        } ELSE {
            left_clamped <= left_scaled[23:8];
        }

        IF (right_scaled[27] == 1'b0 && right_scaled[27:23] != 5'b00000) {
            right_clamped <= 16'h7FFF;
        } ELIF (right_scaled[27] == 1'b1 && right_scaled[27:23] != 5'b11111) {
            right_clamped <= 16'h8001;
        } ELSE {
            right_clamped <= right_scaled[23:8];
        }

        out_left  <= out_l_reg;
        out_right <= out_r_reg;
        out_valid <= valid_reg;
    }

    SYNCHRONOUS(CLK = clk RESET = rst_n RESET_ACTIVE = Low) {
        IF (sample_tick == 1'b1) {
            out_l_reg <= left_clamped;
            out_r_reg <= right_clamped;
            valid_reg <= 1'b1;
        } ELSE {
            valid_reg <= 1'b0;
        }
    }
@endmod
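The mixer's arithmetic is easier to follow as integer math. The following Python sketch mirrors the combinational path above under the assumption that taking bits [23:8] of a signed product behaves like an arithmetic right shift by 8; the function names are illustrative and do not exist in the project.

```python
# Integer model of aud_mixer's datapath (function names are hypothetical).

def pan_scale(sample, pan):
    """Split a signed 16-bit sample into (left, right) using pan 0..255."""
    left = (sample * (255 - pan)) >> 8   # left gain = 255 - pan
    right = (sample * pan) >> 8          # right gain = pan
    return left, right


def saturate16(x):
    """Match the hardware clamp: overflow goes to 0x7FFF or 0x8001."""
    if x > 32767:
        return 32767
    if x < -32768:
        return -32767                    # 0x8001, as in the module
    return x


def mix(samples, pans, master_vol):
    """Sum pan-scaled channels, apply master volume, saturate to 16 bits."""
    sum_l = 0
    sum_r = 0
    for s, p in zip(samples, pans):
        left, right = pan_scale(s, p)
        sum_l += left
        sum_r += right
    out_l = saturate16((sum_l * master_vol) >> 8)
    out_r = saturate16((sum_r * master_vol) >> 8)
    return out_l, out_r


single = mix([1000], [0], 255)
loud = mix([32767] * 8, [255] * 8, 255)
```

Because both gains top out at 255 and are divided by 256, a full-scale input never quite reaches full scale at the output; a single channel panned hard left at maximum master volume comes out at roughly 99% of its input level.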
jz
// Terminal Framebuffer with Command Register Interface
// Dual BSRAM banks: cpu-side (sys_clk r/w) + video-side (pixel_clk read)
// CPU writes are mirrored to both banks. Hardware scroll/clear operate internally.
//
// Register map (base 0x5000_0000, ADDR[5:2] selects register):
//   +0x00  CELL_ADDR   [R/W]  12-bit cell index for read/write
//   +0x04  CELL_CHAR   [R/W]  8-bit character at CELL_ADDR
//   +0x08  CELL_ATTR   [R/W]  32-bit {BG[15:0], FG[15:0]} at CELL_ADDR
//   +0x0C  FILL_CHAR   [R/W]  8-bit character for clear/scroll fill
//   +0x10  FILL_ATTR   [R/W]  32-bit attribute for clear/scroll fill
//   +0x14  TERM_COLS   [R/W]  8-bit terminal columns (80 or 120)
//   +0x18  TERM_CELLS  [R/W]  12-bit total cells (cols * rows)
//   +0x1C  COMMAND     [W]    1=CLEAR, 2=SCROLL_UP
//   +0x20  STATUS      [R]    bit 0 = BUSY
//   +0x24  CURSOR      [R/W]  bits[14:3]=cell position, bits[2:0]=style
//
// Cursor styles: 0=none 1=line 2=block 3=line-blink 4=block-blink
//
// Write CELL_ADDR first, then read/write CELL_CHAR/CELL_ATTR.
// BSRAM read result is latched; reads return latched value immediately.
@module terminal_fb
    PORT {
        IN  [1]  clk;
        IN  [1]  rst_n;
        IN  [1]  pixel_clk;
        BUS SIMPLE_BUS TARGET pbus;

        // Video read interface (pixel_clk domain)
        IN  [12] vram_addr;
        OUT [8]  vram_char;
        OUT [32] vram_attr;

        // Cursor info for video pipeline
        OUT [12] cursor_pos;
        OUT [3]  cursor_style;
    }

    // CPU-side memories (both ports in sys_clk domain)
    MEM(TYPE=BLOCK) {
        char_cpu [8] [4096] = 8'h00 {
            OUT cpu_char_rd SYNC;
            IN  cpu_char_wr;
        };
    }

    MEM(TYPE=BLOCK) {
        attr_cpu [32] [4096] = 32'h00000000 {
            OUT cpu_attr_rd SYNC;
            IN  cpu_attr_wr;
        };
    }

    // Video-side memories (read on pixel_clk, write on sys_clk)
    MEM(TYPE=BLOCK) {
        char_vid [8] [4096] = 8'h00 {
            OUT vid_char_rd SYNC;
            IN  vid_char_wr;
        };
    }

    MEM(TYPE=BLOCK) {
        attr_vid [32] [4096] = 32'h00000000 {
            OUT vid_attr_rd SYNC;
            IN  vid_attr_wr;
        };
    }

    WIRE {
        // Combinational memory operation signals
        char_wr_en  [1];
        attr_wr_en  [1];
        mem_wr_addr [12];
        mem_wr_char [8];
        mem_wr_attr [32];
        mem_rd_addr [12];

        // FSM next-state signals
        next_state   [3];
        next_counter [12];
        next_src     [12];

        // Bus decode helpers
        bus_reg      [4];
        is_bus_wr    [1];
        is_bus_rd    [1];

        // FSM state decode
        is_idle      [1];
        is_clear     [1];
        is_scr_rd    [1];
        is_scr_wr    [1];
        is_scr_fill  [1];
        at_last_cell [1];
        at_last_src  [1];

        // Bus read data mux
        read_data    [32];
        read_reg_sel [4];
    }

    REGISTER {
        // Register file
        cell_addr_reg  [12] = 12'b0;
        fill_char_reg  [8]  = 8'h20;
        fill_attr_reg  [32] = 32'h0000FFFF;
        term_cols_reg  [8]  = 8'd80;
        term_cells_reg [12] = 12'd1760;
        cursor_reg     [15] = 15'b0;

        // Latched BSRAM read results
        char_latch     [8]  = 8'h00;
        attr_latch     [32] = 32'h00000000;

        // Latched video read results (pixel_clk domain)
        vid_char_latch [8]  = 8'h00;
        vid_attr_latch [32] = 32'h00000000;

        // FSM: 0=IDLE, 1=CLEAR, 2=SCROLL_READ, 3=SCROLL_WRITE, 4=SCROLL_FILL, 5=SCROLL_LATCH
        fsm_state      [3]  = 3'b0;
        fsm_counter    [12] = 12'b0;
        fsm_src        [12] = 12'b0;
    }

    ASYNCHRONOUS {
        // Video read outputs
        vram_char <= vid_char_latch;
        vram_attr <= vid_attr_latch;

        // Cursor outputs
        cursor_pos   <= cursor_reg[14:3];
        cursor_style <= cursor_reg[2:0];

        // FSM state decode
        is_idle     <= (fsm_state == 3'd0) ? 1'b1 : 1'b0;
        is_clear    <= (fsm_state == 3'd1) ? 1'b1 : 1'b0;
        is_scr_rd   <= (fsm_state == 3'd2) ? 1'b1 : 1'b0;
        is_scr_wr   <= (fsm_state == 3'd3) ? 1'b1 : 1'b0;
        is_scr_fill <= (fsm_state == 3'd4) ? 1'b1 : 1'b0;
        // State 5: SCROLL_LATCH — wait cycle for BSRAM read data to arrive
        at_last_cell <= (fsm_counter == term_cells_reg - 12'd1) ? 1'b1 : 1'b0;
        at_last_src  <= (fsm_src == term_cells_reg - 12'd1) ? 1'b1 : 1'b0;

        // Bus decode
        bus_reg   <= pbus.ADDR[5:2];
        is_bus_wr <= (fsm_state == 3'd0 && pbus.VALID == 1'b1 && pbus.CMD == CMD.WRITE) ? 1'b1 : 1'b0;
        is_bus_rd <= (fsm_state == 3'd0 && pbus.VALID == 1'b1 && pbus.CMD == CMD.READ) ? 1'b1 : 1'b0;

        // ---- Separate char/attr write enables ----
        // FSM ops write both; bus CELL_CHAR writes char only; bus CELL_ATTR writes attr only
        char_wr_en <= (is_clear == 1'b1 || is_scr_wr == 1'b1 || is_scr_fill == 1'b1 ||
                       (is_bus_wr == 1'b1 && bus_reg == 4'd1))
                      ? 1'b1 : 1'b0;
        attr_wr_en <= (is_clear == 1'b1 || is_scr_wr == 1'b1 || is_scr_fill == 1'b1 ||
                       (is_bus_wr == 1'b1 && bus_reg == 4'd2))
                      ? 1'b1 : 1'b0;

        // ---- Memory write address ----
        mem_wr_addr <= (is_clear == 1'b1 || is_scr_wr == 1'b1 || is_scr_fill == 1'b1)
                       ? fsm_counter : cell_addr_reg;

        // ---- Memory write character ----
        mem_wr_char <= (is_clear == 1'b1 || is_scr_fill == 1'b1) ? fill_char_reg :
                       (is_scr_wr == 1'b1) ? char_latch :
                       (is_bus_wr == 1'b1 && bus_reg == 4'd1) ? pbus.DATA[7:0] :
                       char_latch;

        // ---- Memory write attribute ----
        mem_wr_attr <= (is_clear == 1'b1 || is_scr_fill == 1'b1) ? fill_attr_reg :
                       (is_scr_wr == 1'b1) ? attr_latch :
                       (is_bus_wr == 1'b1 && bus_reg == 4'd2) ? pbus.DATA :
                       attr_latch;

        // ---- Memory read address ----
        mem_rd_addr <= (is_scr_rd == 1'b1) ? fsm_src :
                       (is_bus_wr == 1'b1 && bus_reg == 4'd0) ? pbus.DATA[11:0] :
                       cell_addr_reg;

        // ---- FSM next state ----
        // State flow: SCROLL_READ(2) → SCROLL_LATCH(5) → SCROLL_WRITE(3) → loop or SCROLL_FILL(4)
        next_state <= (is_clear == 1'b1 && at_last_cell == 1'b1) ? 3'd0 :
                      (is_clear == 1'b1) ? 3'd1 :
                      (is_scr_rd == 1'b1) ? 3'd5 :
                      (fsm_state == 3'd5) ? 3'd3 :
                      (is_scr_wr == 1'b1 && at_last_src == 1'b1) ? 3'd4 :
                      (is_scr_wr == 1'b1) ? 3'd2 :
                      (is_scr_fill == 1'b1 && at_last_cell == 1'b1) ? 3'd0 :
                      (is_scr_fill == 1'b1) ? 3'd4 :
                      (is_bus_wr == 1'b1 && bus_reg == 4'd7 && pbus.DATA[1:0] == 2'd1) ? 3'd1 :
                      (is_bus_wr == 1'b1 && bus_reg == 4'd7 && pbus.DATA[1:0] == 2'd2) ? 3'd2 :
                      fsm_state;

        // ---- FSM next counter ----
        next_counter <= (is_clear == 1'b1 && at_last_cell == 1'b1) ? 12'd0 :
                        (is_clear == 1'b1) ? fsm_counter + 12'd1 :
                        (is_scr_wr == 1'b1) ? fsm_counter + 12'd1 :
                        (is_scr_fill == 1'b1 && at_last_cell == 1'b1) ? 12'd0 :
                        (is_scr_fill == 1'b1) ? fsm_counter + 12'd1 :
                        (is_bus_wr == 1'b1 && bus_reg == 4'd7) ? 12'd0 :
                        fsm_counter;

        // ---- FSM next source ----
        next_src <= (is_scr_wr == 1'b1) ? fsm_src + 12'd1 :
                    (is_bus_wr == 1'b1 && bus_reg == 4'd7 && pbus.DATA[1:0] == 2'd2)
                        ? {4'b0, term_cols_reg} :
                    fsm_src;

        // ---- Bus read data mux ----
        read_reg_sel <= pbus.ADDR[5:2];

        SELECT(read_reg_sel) {
            CASE 4'd0 {
                read_data <= {20'b0, cell_addr_reg};
            }
            CASE 4'd1 {
                read_data <= {24'b0, char_latch};
            }
            CASE 4'd2 {
                read_data <= attr_latch;
            }
            CASE 4'd3 {
                read_data <= {24'b0, fill_char_reg};
            }
            CASE 4'd4 {
                read_data <= fill_attr_reg;
            }
            CASE 4'd5 {
                read_data <= {24'b0, term_cols_reg};
            }
            CASE 4'd6 {
                read_data <= {20'b0, term_cells_reg};
            }
            CASE 4'd8 {
                read_data <= {31'b0, is_idle == 1'b0};
            }
            CASE 4'd9 {
                read_data <= {17'b0, cursor_reg};
            }
            DEFAULT {
                read_data <= 32'b0;
            }
        }

        // ---- Bus response ----
        pbus.DATA <= (pbus.VALID && pbus.CMD == CMD.READ) ? read_data : 32'bz;

        IF (pbus.VALID) {
            pbus.DONE <= 1'b1;
        } ELSE {
            pbus.DONE <= 1'bz;
        }
    }

    // CPU port (sys_clk domain)
    SYNCHRONOUS(CLK = clk RESET = rst_n RESET_ACTIVE = Low) {
        // FSM state update
        fsm_state   <= next_state;
        fsm_counter <= next_counter;
        fsm_src     <= next_src;

        // Always latch BSRAM read data
        char_latch <= char_cpu.cpu_char_rd.data;
        attr_latch <= attr_cpu.cpu_attr_rd.data;

        // Single memory read address
        char_cpu.cpu_char_rd.addr <= mem_rd_addr;
        attr_cpu.cpu_attr_rd.addr <= mem_rd_addr;

        // Separate char/attr writes to avoid stale latch cross-contamination
        IF (char_wr_en == 1'b1) {
            char_cpu.cpu_char_wr[mem_wr_addr] <= mem_wr_char;
            char_vid.vid_char_wr[mem_wr_addr] <= mem_wr_char;
        }
        IF (attr_wr_en == 1'b1) {
            attr_cpu.cpu_attr_wr[mem_wr_addr] <= mem_wr_attr;
            attr_vid.vid_attr_wr[mem_wr_addr] <= mem_wr_attr;
        }

        // Register writes (from bus)
        IF (is_bus_wr == 1'b1 && bus_reg == 4'd0) {
            cell_addr_reg <= pbus.DATA[11:0];
        }
        IF (is_bus_wr == 1'b1 && bus_reg == 4'd3) {
            fill_char_reg <= pbus.DATA[7:0];
        }
        IF (is_bus_wr == 1'b1 && bus_reg == 4'd4) {
            fill_attr_reg <= pbus.DATA;
        }
        IF (is_bus_wr == 1'b1 && bus_reg == 4'd5) {
            term_cols_reg <= pbus.DATA[7:0];
        }
        IF (is_bus_wr == 1'b1 && bus_reg == 4'd6) {
            term_cells_reg <= pbus.DATA[11:0];
        }
        IF (is_bus_wr == 1'b1 && bus_reg == 4'd9) {
            cursor_reg <= pbus.DATA[14:0];
        }
    }

    // Video read port (pixel_clk domain)
    SYNCHRONOUS(CLK = pixel_clk RESET = rst_n RESET_ACTIVE = Low) {
        char_vid.vid_char_rd.addr <= vram_addr;
        attr_vid.vid_attr_rd.addr <= vram_addr;
        vid_char_latch <= char_vid.vid_char_rd.data;
        vid_attr_latch <= attr_vid.vid_attr_rd.data;
    }
@endmod
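From software, the terminal is driven entirely through the register map in the header comment. The Python sketch below only computes the (address, data) pairs a driver would issue; the helper names are hypothetical, the actual bus access is not modelled, and the meaning of the 16-bit FG/BG colour values is left open because the module does not define it.

```python
# Sketch of terminal_fb register usage (helper names are hypothetical).

TERM_BASE = 0x5000_0000                  # base address from the register map comment

CELL_ADDR, CELL_CHAR, CELL_ATTR = 0x00, 0x04, 0x08
COMMAND, STATUS, CURSOR = 0x1C, 0x20, 0x24
CMD_CLEAR, CMD_SCROLL_UP = 1, 2


def reg_index(offset):
    """Register number as decoded by the module from ADDR[5:2]."""
    return (offset >> 2) & 0xF


def put_char_writes(row, col, ch, fg, bg, cols=80):
    """Bus writes that place one character: CELL_ADDR first, then char and attr."""
    cell = (row * cols + col) & 0xFFF
    attr = ((bg & 0xFFFF) << 16) | (fg & 0xFFFF)   # {BG[15:0], FG[15:0]}
    return [
        (TERM_BASE + CELL_ADDR, cell),
        (TERM_BASE + CELL_CHAR, ord(ch) & 0xFF),
        (TERM_BASE + CELL_ATTR, attr),
    ]


def cursor_word(cell, style):
    """CURSOR value: bits [14:3] = cell position, bits [2:0] = style."""
    return ((cell & 0xFFF) << 3) | (style & 0x7)


writes = put_char_writes(1, 2, "A", 0xFFFF, 0x0000)
block_cursor = cursor_word(82, 2)
```

The module only decodes bus writes while its FSM is idle, yet it still asserts DONE, so a driver should poll STATUS bit 0 after issuing CLEAR or SCROLL_UP and wait for it to clear before writing again; otherwise the write is silently dropped.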
jz
// Simple 32-bit accumulator-based CPU
// Registers: A (accumulator), X (index) - 32-bit
//            SP (stack pointer), PC (program counter) - 16-bit
// Flags: Z (zero), C (carry), N (negative)
//
// Instruction format (32-bit fixed):
//   [31:24] opcode (8-bit)
//   [23:16] operand byte
//   [15:0]  immediate/address (16-bit)
//
// Memory map:
//   0x0000-0x0FFF  ROM (4096 words)
//   0x1000-0x1FFF  RAM (4096 words, stack at top)
//   0x2000         LED output
@module cpu
    CONST {
        START_PC = 0;
    }

    PORT {
        IN  [1] clk;
        IN  [1] rst_n;
        BUS SIMPLE_BUS SOURCE pbus;
    }

    REGISTER {
        // Program counter
        PC      [16] = lit(16, START_PC);

        // Registers
        reg_a   [32] = 32'h00000000;
        reg_x   [32] = 32'h00000000;

        // Stack pointer
        SP      [16] = 16'h1FFF;

        // Flags
        flag_z  [1] = 1'b0;
        flag_c  [1] = 1'b0;
        flag_n  [1] = 1'b0;

        // State machine
        state   [4] = STATE.FETCH;

        // Instruction register
        instr   [32] = 32'h00000000;

        // Bus control registers
        bus_addr  [16] = 16'h0000;
        bus_data  [32] = 32'h00000000;
        bus_cmd   [1]  = 1'b0;
        bus_valid [1]  = 1'b0;

        // Load/store destination tracking
        mem_dst   [1] = 1'b0;    // 0=A, 1=X
    }

    WIRE {
        // Decoded instruction fields
        opcode    [8];
        imm_addr  [16];
    }

    ASYNCHRONOUS {
        // Instruction decode
        opcode   = instr[31:24];
        imm_addr = instr[15:0];

        // Drive bus signals
        pbus.ADDR  <= bus_addr;
        pbus.DATA  <= (bus_valid == 1'b1 && bus_cmd == CMD.WRITE) ? bus_data : 32'bz;
        pbus.CMD   <= bus_cmd;
        pbus.VALID <= bus_valid;
    }

    SYNCHRONOUS(CLK = clk RESET = rst_n RESET_ACTIVE = Low) {
        IF (state == STATE.FETCH) {
            // Start instruction fetch from memory at PC
            bus_addr  <= PC;
            bus_cmd   <= CMD.READ;
            bus_valid <= 1'b1;
            state     <= STATE.WAIT_FETCH;

        } ELIF (state == STATE.WAIT_FETCH) {
            // Wait for memory to respond
            IF (pbus.DONE == 1'b1) {
                instr     <= pbus.DATA;
                bus_valid <= 1'b0;
                state     <= STATE.DECODE;
            }

        } ELIF (state == STATE.DECODE) {
            // Decode and execute simple instructions, or set up memory access
            IF (opcode == OP.NOP) {
                PC    <= PC + 16'h0001;
                state <= STATE.FETCH;

            } ELIF (opcode == OP.LDI_A) {
                // Load 16-bit immediate into A (zero-extended)
                reg_a  <= {16'h0000, imm_addr};
                flag_z <= (imm_addr == 16'h0000) ? 1'b1 : 1'b0;
                flag_n <= 1'b0;
                PC     <= PC + 16'h0001;
                state  <= STATE.FETCH;

            } ELIF (opcode == OP.LDI_X) {
                // Load 16-bit immediate into X (zero-extended)
                reg_x  <= {16'h0000, imm_addr};
                PC     <= PC + 16'h0001;
                state  <= STATE.FETCH;

            } ELIF (opcode == OP.LD_A) {
                // Start memory read for A
                bus_addr  <= imm_addr;
                bus_cmd   <= CMD.READ;
                bus_valid <= 1'b1;
                mem_dst   <= 1'b0;
                PC        <= PC + 16'h0001;
                state     <= STATE.MEM_WAIT;

            } ELIF (opcode == OP.LD_X) {
                // Start memory read for X
                bus_addr  <= imm_addr;
                bus_cmd   <= CMD.READ;
                bus_valid <= 1'b1;
                mem_dst   <= 1'b1;
                PC        <= PC + 16'h0001;
                state     <= STATE.MEM_WAIT;

            } ELIF (opcode == OP.ST_A) {
                // Start memory write from A
                bus_addr  <= imm_addr;
                bus_data  <= reg_a;
                bus_cmd   <= CMD.WRITE;
                bus_valid <= 1'b1;
                PC        <= PC + 16'h0001;
                state     <= STATE.MEM_WAIT;

            } ELIF (opcode == OP.ST_X) {
                // Start memory write from X
                bus_addr  <= imm_addr;
                bus_data  <= reg_x;
                bus_cmd   <= CMD.WRITE;
                bus_valid <= 1'b1;
                PC        <= PC + 16'h0001;
                state     <= STATE.MEM_WAIT;

            } ELIF (opcode == OP.ADD) {
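                // A = A + X
                // Note: flag_n is the XOR of the operand sign bits. It ignores the
                // carry into bit 31, so it can differ from bit 31 of the sum.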
                reg_a  <= reg_a[31:0] + reg_x[31:0];
                flag_z <= ((reg_a[31:0] + reg_x[31:0]) == 32'h00000000) ? 1'b1 : 1'b0;
                flag_n <= reg_a[31] ^ reg_x[31];
                PC     <= PC + 16'h0001;
                state  <= STATE.FETCH;

            } ELIF (opcode == OP.SUB) {
                // A = A - X
                reg_a  <= reg_a[31:0] - reg_x[31:0];
                flag_z <= ((reg_a[31:0] - reg_x[31:0]) == 32'h00000000) ? 1'b1 : 1'b0;
                flag_n <= (reg_a < reg_x) ? 1'b1 : 1'b0;
                PC     <= PC + 16'h0001;
                state  <= STATE.FETCH;

            } ELIF (opcode == OP.AND) {
                reg_a  <= reg_a & reg_x;
                flag_z <= ((reg_a & reg_x) == 32'h00000000) ? 1'b1 : 1'b0;
                flag_n <= 1'b0;
                PC     <= PC + 16'h0001;
                state  <= STATE.FETCH;

            } ELIF (opcode == OP.OR) {
                reg_a  <= reg_a | reg_x;
                flag_z <= ((reg_a | reg_x) == 32'h00000000) ? 1'b1 : 1'b0;
                flag_n <= 1'b0;
                PC     <= PC + 16'h0001;
                state  <= STATE.FETCH;

            } ELIF (opcode == OP.XOR) {
                reg_a  <= reg_a ^ reg_x;
                flag_z <= ((reg_a ^ reg_x) == 32'h00000000) ? 1'b1 : 1'b0;
                flag_n <= 1'b0;
                PC     <= PC + 16'h0001;
                state  <= STATE.FETCH;

            } ELIF (opcode == OP.CMP) {
                // Set flags from A - X (don't store result)
                flag_z <= (reg_a == reg_x) ? 1'b1 : 1'b0;
                flag_c <= (reg_a >= reg_x) ? 1'b1 : 1'b0;
                flag_n <= (reg_a < reg_x) ? 1'b1 : 1'b0;
                PC     <= PC + 16'h0001;
                state  <= STATE.FETCH;

            } ELIF (opcode == OP.JMP) {
                PC    <= imm_addr;
                state <= STATE.FETCH;

            } ELIF (opcode == OP.BEQ) {
                IF (flag_z == 1'b1) {
                    PC <= imm_addr;
                } ELSE {
                    PC <= PC + 16'h0001;
                }
                state <= STATE.FETCH;

            } ELIF (opcode == OP.BNE) {
                IF (flag_z == 1'b0) {
                    PC <= imm_addr;
                } ELSE {
                    PC <= PC + 16'h0001;
                }
                state <= STATE.FETCH;

            } ELIF (opcode == OP.INC) {
                reg_a  <= reg_a + 32'h00000001;
                flag_z <= ((reg_a + 32'h00000001) == 32'h00000000) ? 1'b1 : 1'b0;
                PC     <= PC + 16'h0001;
                state  <= STATE.FETCH;

            } ELIF (opcode == OP.DEC) {
                reg_a  <= reg_a - 32'h00000001;
                flag_z <= ((reg_a - 32'h00000001) == 32'h00000000) ? 1'b1 : 1'b0;
                PC     <= PC + 16'h0001;
                state  <= STATE.FETCH;

            } ELIF (opcode == OP.SHL) {
                flag_c <= reg_a[31];
                reg_a  <= {reg_a[30:0], 1'b0};
                flag_z <= ({reg_a[30:0], 1'b0} == 32'h00000000) ? 1'b1 : 1'b0;
                PC     <= PC + 16'h0001;
                state  <= STATE.FETCH;

            } ELIF (opcode == OP.SHR) {
                flag_c <= reg_a[0];
                reg_a  <= {1'b0, reg_a[31:1]};
                flag_z <= ({1'b0, reg_a[31:1]} == 32'h00000000) ? 1'b1 : 1'b0;
                PC     <= PC + 16'h0001;
                state  <= STATE.FETCH;

            } ELIF (opcode == OP.PUSH) {
                // Push A: write A to mem[SP], then SP -= 1
                bus_addr  <= SP;
                bus_data  <= reg_a;
                bus_cmd   <= CMD.WRITE;
                bus_valid <= 1'b1;
                state     <= STATE.PUSH_EXEC;

            } ELIF (opcode == OP.POP) {
                // Pop A: SP += 1, then read mem[SP]
                SP        <= SP + 16'h0001;
                bus_addr  <= SP + 16'h0001;
                bus_cmd   <= CMD.READ;
                bus_valid <= 1'b1;
                mem_dst   <= 1'b0;
                state     <= STATE.POP_EXEC;

            } ELIF (opcode == OP.CALL) {
                // Push PC+1, then jump to addr
                bus_addr  <= SP;
                bus_data  <= {16'h0000, PC + 16'h0001};
                bus_cmd   <= CMD.WRITE;
                bus_valid <= 1'b1;
                state     <= STATE.CALL_PUSH;

            } ELIF (opcode == OP.RET) {
                // Pop PC
                SP        <= SP + 16'h0001;
                bus_addr  <= SP + 16'h0001;
                bus_cmd   <= CMD.READ;
                bus_valid <= 1'b1;
                state     <= STATE.RET_POP;

            } ELIF (opcode == OP.HLT) {
                bus_valid <= 1'b0;
                state     <= STATE.HALT;

            } ELSE {
                // Unknown opcode: treat as NOP
                PC    <= PC + 16'h0001;
                state <= STATE.FETCH;
            }

        } ELIF (state == STATE.MEM_WAIT) {
            // Wait for memory read/write to complete
            IF (pbus.DONE == 1'b1) {
                bus_valid <= 1'b0;
                IF (bus_cmd == CMD.READ) {
                    // Load result into destination register
                    IF (mem_dst == 1'b0) {
                        reg_a  <= pbus.DATA;
                        flag_z <= (pbus.DATA == 32'h00000000) ? 1'b1 : 1'b0;
                        flag_n <= pbus.DATA[31];
                    } ELSE {
                        reg_x <= pbus.DATA;
                    }
                }
                state <= STATE.FETCH;
            }

        } ELIF (state == STATE.PUSH_EXEC) {
            // Wait for push write to complete
            IF (pbus.DONE == 1'b1) {
                bus_valid <= 1'b0;
                SP        <= SP - 16'h0001;
                PC        <= PC + 16'h0001;
                state     <= STATE.FETCH;
            }

        } ELIF (state == STATE.POP_EXEC) {
            // Wait for pop read to complete
            IF (pbus.DONE == 1'b1) {
                bus_valid <= 1'b0;
                reg_a     <= pbus.DATA;
                flag_z    <= (pbus.DATA == 32'h00000000) ? 1'b1 : 1'b0;
                flag_n    <= pbus.DATA[31];
                PC        <= PC + 16'h0001;
                state     <= STATE.FETCH;
            }

        } ELIF (state == STATE.CALL_PUSH) {
            // Wait for push of return address to complete
            IF (pbus.DONE == 1'b1) {
                bus_valid <= 1'b0;
                SP        <= SP - 16'h0001;
                PC        <= imm_addr;
                state     <= STATE.FETCH;
            }

        } ELIF (state == STATE.RET_POP) {
            // Wait for pop of return address to complete
            IF (pbus.DONE == 1'b1) {
                bus_valid <= 1'b0;
                PC        <= pbus.DATA[15:0];
                state     <= STATE.FETCH;
            }

        } ELIF (state == STATE.HALT) {
            // Stay halted
            state <= STATE.HALT;
        }
    }
@endmod
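
The fixed 32-bit instruction format described in the header comment can be illustrated with a small Python model. This is a sketch for building or inspecting ROM words by hand, not part of the project's toolchain. The `encode` and `decode` helpers and the opcode value `OP_LDI_A = 0x01` are hypothetical; the real opcode values come from the OP enumeration in the global file.

```python
def encode(opcode, operand=0, imm=0):
    """Pack fields into one instruction word: [31:24] opcode, [23:16] operand, [15:0] imm."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in 8 bits")
    if not 0 <= operand <= 0xFF:
        raise ValueError("operand must fit in 8 bits")
    if not 0 <= imm <= 0xFFFF:
        raise ValueError("immediate/address must fit in 16 bits")
    return (opcode << 24) | (operand << 16) | imm


def decode(word):
    """Split a 32-bit word the same way the CPU's ASYNCHRONOUS block slices instr."""
    opcode = (word >> 24) & 0xFF    # instr[31:24]
    operand = (word >> 16) & 0xFF   # instr[23:16]
    imm = word & 0xFFFF             # instr[15:0]
    return opcode, operand, imm


# Hypothetical opcode value, for illustration only.
OP_LDI_A = 0x01

word = encode(OP_LDI_A, imm=0x002A)
print(hex(word))       # 0x100002a
print(decode(word))    # (1, 0, 42)
```

The CPU module above only decodes the opcode and the 16-bit immediate; the operand byte is part of the format but is not used by any instruction shown.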
jz
@module por
    PORT {
        IN  [1] clk;
        IN  [1] done;
        OUT [1] por_n;
    }

    CONST {
        POR_CYCLES   = 16;
        POR_CNT_BITS = clog2(POR_CYCLES);
        POR_MAX      = POR_CYCLES - 1;
    }

    REGISTER {
        por_reg [1] = 1'b0;
        cnt     [POR_CNT_BITS] = POR_CNT_BITS'b0;
    }

    ASYNCHRONOUS {
        por_n <= por_reg;
    }

    SYNCHRONOUS(CLK = clk) {
        IF (done == 1'b0) {
            por_reg <= 1'b0;
            cnt <= POR_CNT_BITS'b0;
        } ELIF (cnt == lit(POR_CNT_BITS, POR_MAX)) {
            por_reg <= 1'b1;
            cnt <= cnt;
        } ELSE {
            por_reg <= 1'b0;
            cnt <= cnt + POR_CNT_BITS'b1;
        }
    }
@endmod
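
The counter behavior of the por module can be checked with a cycle-level Python model. This is a sketch that mirrors the three branches of the SYNCHRONOUS block; the function names `por_step` and `edges_until_release` are made up for the example.

```python
POR_CYCLES = 16
POR_MAX = POR_CYCLES - 1


def por_step(cnt, por_reg, done):
    """Apply one rising clock edge and return the next (cnt, por_reg)."""
    if not done:
        return 0, 0            # done low: hold reset and clear the counter
    if cnt == POR_MAX:
        return cnt, 1          # counter saturated: release reset
    return cnt + 1, 0          # still counting: keep reset asserted


def edges_until_release(done_trace):
    """Return the 1-based clock edge on which por_n first goes high, or None."""
    cnt, por_reg = 0, 0
    for edge, done in enumerate(done_trace, start=1):
        cnt, por_reg = por_step(cnt, por_reg, done)
        if por_reg:
            return edge
    return None


print(edges_until_release([1] * 20))               # 16
print(edges_until_release([1] * 10 + [0] + [1] * 20))  # 27
```

In this model, por_n rises on the 16th consecutive clock edge with done high, and any cycle with done low restarts the count.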

Clock Architecture

text
27 MHz crystal (SCLK)
  └─ PLL (IDIV=3, FBDIV=54, ODIV=2)
       └─ 371.25 MHz serial_clk
            ├─ CLKDIV (DIV_MODE=5)
            │    └─ 74.25 MHz sys_clk / pixel_clk
            └─ (sdram_clk = inverted sys_clk for SDRAM setup/hold timing)

The sys_clk and pixel_clk are the same 74.25 MHz signal. The SDRAM clock is phase-inverted for proper setup/hold timing at the SDRAM chip.
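
The frequencies in the diagram can be reproduced with a few lines of arithmetic. The sketch below assumes the PLL divider parameters are zero-based selector values, so the effective ratio is (FBDIV + 1) / (IDIV + 1), and that ODIV only scales the internal VCO frequency. Both assumptions are consistent with the numbers shown above, but the PLL primitive's documentation is the authority here.

```python
from fractions import Fraction


def pll_out_mhz(fin_mhz, idiv, fbdiv):
    """Output frequency, assuming zero-based divider selectors (see lead-in)."""
    return Fraction(fin_mhz) * (fbdiv + 1) / (idiv + 1)


serial_mhz = pll_out_mhz(27, idiv=3, fbdiv=54)   # 27 * 55 / 4
pixel_mhz = serial_mhz / 5                       # CLKDIV with DIV_MODE=5
vco_mhz = serial_mhz * 2                         # assumed: VCO = output * ODIV

print(float(serial_mhz))   # 371.25
print(float(pixel_mhz))    # 74.25
print(float(vco_mhz))      # 742.5
```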

JZ-HDL Language Features

BUS abstraction. Adding or removing a signal from SIMPLE_BUS requires changing only the bus definition — all modules using BUS SIMPLE_BUS SOURCE or TARGET automatically get the updated port list. In Verilog, this change ripples through every module's port list, every instantiation, and every wire declaration.

Tristate ownership proof. The compiler verifies at compile time that exactly one driver is active on the shared DATA bus at any moment. Each peripheral drives DATA only when selected by the arbiter. The arbiter's template-based address decoding provides the proof structure. In Verilog, tristate conflicts are only found during simulation — or on hardware.
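
To make the property concrete, here is a small Python model of a shared tristate line. It is only a runtime illustration of the condition the compiler is described as proving statically; `resolve` and `target_drive` are hypothetical names, and `None` stands in for high impedance (z).

```python
def resolve(drivers):
    """Resolve a shared line from a list of driver outputs (None means z)."""
    active = [d for d in drivers if d is not None]
    if len(active) > 1:
        raise RuntimeError("bus conflict: %d drivers active" % len(active))
    return active[0] if active else None


def target_drive(selected, is_read, value):
    """A target drives DATA only when it is selected and the command is a read."""
    return value if (selected and is_read) else None


# Eight targets, each with its own read value; only target 1 is selected.
values = [0x100 + i for i in range(8)]
outputs = [target_drive(i == 1, True, values[i]) for i in range(8)]
print(hex(resolve(outputs)))   # 0x101

# Two simultaneous drivers are a conflict, which this model reports at run time.
try:
    resolve([0x1, 0x2])
except RuntimeError as err:
    print(err)                 # bus conflict: 2 drivers active
```

A simulator finds the second case only if a test happens to exercise it, which is the gap the compile-time check is meant to close.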

Global constants. @global shares opcodes, state encodings, and bus commands across all modules without parameter threading. Every module that imports the global file sees the same CMD.READ, CMD.WRITE, and STATE.FETCH constants.

Mandatory reset values. Every register in the design has a declared initial value. The SDRAM control signals (r_cs_n, r_ras_n, etc.) reset to inactive (high for active-low signals). Forgetting a register in the reset block — a common Verilog bug that sends garbage commands to SDRAM during power-on — is impossible.

Template-based code generation. The arbiter uses three @template blocks to generate address matching, DONE collection, and signal routing for all 8 targets. Adding a ninth peripheral means adding one more entry to the config constant and one more @new instance — no template changes needed.
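
The table-driven idea can be sketched in Python. The model below is not the arbiter's actual implementation: it assumes the target is selected by address bits [31:28], matching the 0x0_ to 0x7_ prefixes in the block diagram, and the names `TARGETS`, `select_lines`, and `collect_done` are invented for the example.

```python
# One entry per target, in address order. Adding a peripheral is one more entry.
TARGETS = ["ROM", "RAM", "LED", "UART", "SDRAM", "Term", "SD", "Audio"]


def select_lines(addr, targets=TARGETS):
    """One-hot select list, assuming the selector is address bits [31:28]."""
    sel = (addr >> 28) & 0xF
    return [i == sel for i in range(len(targets))]


def collect_done(selects, dones):
    """DONE seen by the CPU: the DONE of whichever target is selected."""
    return any(s and d for s, d in zip(selects, dones))


sel = select_lines(0x4000_0010)
print(TARGETS[sel.index(True)])   # SDRAM

# A ninth, hypothetical peripheral needs only a new table entry.
nine = TARGETS + ["GPIO"]
print(nine[select_lines(0x8000_0000, nine).index(True)])   # GPIO
```

The decode and DONE-collection functions never mention a specific target, which is the same property the templates give the arbiter.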