Ascon-128 Encryption

A hardware implementation of the Ascon-128 authenticated encryption cipher (NIST SP 800-232) on the Tang Nano 20K. The FPGA accepts a message over UART, encrypts or decrypts it using a 128-bit key and nonce, and streams the result back: ciphertext plus a 128-bit authentication tag for encryption, or plaintext plus a pass/fail status byte for decryption. The design computes one permutation round per clock cycle and needs no large data buffers, because blocks stream through as they arrive.

Architecture

                  ┌──────────────────────────────────┐
   UART RX ──────►│  ascon_top                       │
                  │                                  │
                  │  ┌──────────┐    ┌────────────┐  │
                  │  │ uart_rx  │───►│            │  │
                  │  └──────────┘    │  16-state  │  │
                  │                  │  protocol  │  │
                  │  ┌──────────┐    │  FSM       │  │
                  │  │ uart_tx  │◄───│            │  │
                  │  └──────────┘    │            │  │
                  │                  └─────┬──────┘  │
                  │                        │         │
                  │                  ┌─────▼──────┐  │
                  │                  │   ascon    │  │
                  │                  │  320-bit   │  │
                  │                  │  state     │  │
                  │                  │  engine    │  │
                  │                  └────────────┘  │
                  │                                  │
   UART TX ◄──────│  ┌──────────┐                    │
                  │  │   por    │  Power-on reset    │
                  │  └──────────┘                    │
                  └──────────────────────────────────┘

UART Protocol

The host communicates at 115200 baud, 8N1. Data is sent in 8-byte blocks with the host reading each block's response before sending the next, since the FPGA has no RX FIFO.

Encrypt request: 'E' + 16-byte key + 16-byte nonce + 1-byte length + data bytes

Decrypt request: 'D' + 16-byte key + 16-byte nonce + 1-byte length + data bytes, then 16-byte tag (sent after reading plaintext)

Response: Streamed output data (per block) + status byte ('K' = success, 'F' = tag mismatch) + 16-byte tag (encrypt only)
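The exchange above can be sketched from the host side as follows. This is a minimal illustration, not the code in tools/ascon_test.py: the function names are hypothetical, and `port` stands for any object with `write(data)` and `read(n)` methods (a pyserial `Serial` object opened at 115200 baud would fit, assuming its timeout is long enough for `read(n)` to return all n bytes).

```python
def build_header(mode, key, nonce, length):
    """Return the 34-byte request header: command, key, nonce, length."""
    if mode not in ("E", "D"):
        raise ValueError("mode must be 'E' or 'D'")
    if len(key) != 16 or len(nonce) != 16:
        raise ValueError("key and nonce must be 16 bytes each")
    if not 0 <= length <= 255:
        raise ValueError("length must fit in one byte")
    return mode.encode("ascii") + key + nonce + bytes([length])


def split_blocks(data, block_size=8):
    """Split the payload into 8-byte blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def encrypt_over_uart(port, key, nonce, plaintext):
    """Run one encrypt exchange; returns (ciphertext, status, tag)."""
    port.write(build_header("E", key, nonce, len(plaintext)))
    ciphertext = b""
    for block in split_blocks(plaintext):
        port.write(block)
        # Read this block's output before sending the next block,
        # because the FPGA has no RX FIFO.
        ciphertext += port.read(len(block))
    status = port.read(1)   # b"K" or b"F"
    tag = port.read(16)     # encrypt only
    return ciphertext, status, tag
```

A decrypt exchange would follow the same loop with 'D' as the command, then write the 16-byte tag and read only the status byte.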

Modules

ascon

The core Ascon-128 AEAD engine. Maintains a 320-bit state split across five 64-bit registers (s0 through s4) plus a 128-bit key stored as k0/k1.

Permutation. One round of the Ascon permutation is computed combinationally in the ASYNCHRONOUS block as four chained stages: round constant addition plus pre-XOR (wires a0 to a4), chi-like S-box substitution (b0 to b4), post-XOR with bit inversion (c0 to c4), and linear diffusion via circular shifts (r0 to r4). Each clock cycle in ST_PERM commits one round's result to the state registers. Initialization and finalization use 12 rounds (a-rounds, round_cnt 0 to 11); intermediate blocks use 6 rounds (b-rounds, round_cnt 6 to 11), so both cases finish on the same round constants.
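As a software cross-check for the four stages, here is a minimal Python sketch of the permutation. It follows the published Ascon reference bitslice (in which x3 is XORed with x2 before x2 is inverted) rather than being transcribed from the HDL; the function names are illustrative.

```python
MASK64 = (1 << 64) - 1


def ror64(x, n):
    """Rotate a 64-bit word right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def round_constant(i):
    """Constant for round index i (0..11): high nibble 15 - i, low nibble i."""
    return ((0xF - i) << 4) | i


def sbox_layer(s):
    """Bitsliced 5-bit S-box applied to all 64 columns at once."""
    x0, x1, x2, x3, x4 = s
    # Stage 1b: pre-XOR
    x0 ^= x4
    x4 ^= x3
    x2 ^= x1
    # Stage 2: chi-like substitution
    t0 = x0 ^ (~x1 & x2)
    t1 = x1 ^ (~x2 & x3)
    t2 = x2 ^ (~x3 & x4)
    t3 = x3 ^ (~x4 & x0)
    t4 = x4 ^ (~x0 & x1)
    # Stage 3: post-XOR, then invert the third word
    t1 ^= t0
    t0 ^= t4
    t3 ^= t2
    t2 = ~t2
    return [w & MASK64 for w in (t0, t1, t2, t3, t4)]


def linear_layer(s):
    """Stage 4: each word is XORed with two rotated copies of itself."""
    x0, x1, x2, x3, x4 = s
    return [
        x0 ^ ror64(x0, 19) ^ ror64(x0, 28),
        x1 ^ ror64(x1, 61) ^ ror64(x1, 39),
        x2 ^ ror64(x2, 1) ^ ror64(x2, 6),
        x3 ^ ror64(x3, 10) ^ ror64(x3, 17),
        x4 ^ ror64(x4, 7) ^ ror64(x4, 41),
    ]


def ascon_round(s, i):
    """One round with round index i, as committed per clock in ST_PERM."""
    s = list(s)
    s[2] ^= round_constant(i)  # Stage 1a: round constant into the third word
    return linear_layer(sbox_layer(s))


def permute(s, rounds):
    """Run the last `rounds` round indices, e.g. 12 -> 0..11, 6 -> 6..11."""
    for i in range(12 - rounds, 12):
        s = ascon_round(s, i)
    return s
```

Calling `permute(state, 12)` corresponds to starting ST_PERM with round_cnt = 0, and `permute(state, 6)` to starting with round_cnt = 6.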

Encrypt vs. decrypt. For intermediate full blocks, encrypt XORs the plaintext into s0 while decrypt replaces s0 entirely with the ciphertext. For partial last blocks, decrypt requires special handling — only the ciphertext bytes are replaced in s0, the padding byte is XORed at the correct position, and the remaining state bytes are preserved. This is implemented with a 7-way SELECT on din_partial_len.
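The partial-block rule can be expressed with masks instead of a per-length case table. The sketch below is a Python model under that assumption, not a transcription of the HDL; `last_partial_block` is a hypothetical name, and `din` is taken to hold the data bytes in its top `nbytes` bytes followed by 0x80 and zeros.

```python
MASK64 = (1 << 64) - 1


def last_partial_block(s0, din, nbytes, decrypt):
    """Process a padded final block with 1..7 data bytes.

    Returns (new_s0, dout). Only the top nbytes bytes of dout are
    meaningful; the rest would not be transmitted.
    """
    if not 1 <= nbytes <= 7:
        raise ValueError("nbytes must be in 1..7")
    dout = (s0 ^ din) & MASK64
    if not decrypt:
        # Encrypt: s0 ^= {plaintext, 0x80, zeros}
        return dout, dout
    keep_bits = 64 - 8 * nbytes                 # width of the low part
    data_mask = (MASK64 << keep_bits) & MASK64  # selects the top nbytes bytes
    pad = 0x80 << (keep_bits - 8)               # byte right after the data
    # Decrypt: ciphertext bytes replace the top of s0, the padding byte
    # is XORed in, and the remaining state bytes are preserved.
    new_s0 = (din & data_mask) | ((s0 & ~data_mask & MASK64) ^ pad)
    return new_s0, dout
```

With this rule, encrypting a partial block and then decrypting the resulting ciphertext from the same starting s0 leaves both sides with an identical state, which is what lets the tags match.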

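Tag computation. After the final permutation, the tag is {s3 ^ k0, s4 ^ k1}. For decrypt, the engine also compares the computed tag against the expected tag_in and reports the result on tag_valid. Because the host sends the expected tag only after it has read the plaintext, ascon_top compares the captured tag against the received bytes again when it selects the status byte.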
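For a host-side model, the tag formula translates directly into Python. This is a sketch with hypothetical function names; it uses `hmac.compare_digest` from the standard library for the comparison, which is a software habit and not something the hardware needs.

```python
import hmac


def compute_tag(s3, s4, k0, k1):
    """Return the 16-byte tag {s3 ^ k0, s4 ^ k1}, most significant byte first."""
    return (((s3 ^ k0) << 64) | (s4 ^ k1)).to_bytes(16, "big")


def tag_matches(computed, expected):
    """Compare two tags without an early exit on the first differing byte."""
    return hmac.compare_digest(computed, expected)
```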

ascon_top

The streaming UART interface with a 16-state protocol FSM. Handles key/nonce/data reception, block accumulation into a 64-bit shift register, padding (0x80 followed by zeros for a partial last block, or a separate padding-only block when the message is empty or ends on a full block), and interleaved transmission of output bytes.
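The padding rule can be modeled in a few lines of Python. This is a sketch of the block sequence as the engine sees it, with a hypothetical generator name; in the hardware the padding-only block is signaled with din_empty and the engine applies the 0x80 itself, which is equivalent to absorbing the word shown here.

```python
def padded_blocks(data):
    """Yield (word, is_last, nbytes) for each 64-bit block of a message.

    Full blocks come first with is_last False. The final entry always
    carries the 0x80 padding byte: after the leftover bytes if there are
    any, otherwise as a padding-only block with nbytes == 0.
    """
    full, rem = divmod(len(data), 8)
    for i in range(full):
        yield int.from_bytes(data[8 * i:8 * i + 8], "big"), False, 8
    tail = data[8 * full:]
    padded = tail + b"\x80" + bytes(7 - rem)
    yield int.from_bytes(padded, "big"), True, rem
```

An 8-byte message therefore produces two entries (the data block, then the padding-only block), while an 11-byte message produces one full block and one padded 3-byte block.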

The design uses only two 64-bit shift registers for data — one for block accumulation (blk_acc) and one for TX output (tx_shift). No large buffers are needed because blocks are processed and transmitted as they stream through.

uart_rx / uart_tx

Standard 8N1 UART modules at a nominal 115200 baud (27 MHz / 234 ≈ 115385 baud, about 0.16% fast). The receiver includes a 2-stage metastability synchronizer and samples at the mid-bit point. The transmitter shifts data out LSB-first with a ready/valid handshake.
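The divider arithmetic is easy to check in Python. The sketch below assumes the divisor is obtained by integer division of the clock by the target baud rate, which yields 234 here; the source does not show how uart_rx and uart_tx actually derive it, and the function name is illustrative.

```python
def uart_divisor(clk_hz, baud):
    """Return (divisor, actual_baud, error_percent) for an integer divider."""
    divisor = clk_hz // baud  # clock cycles per bit (assumed truncating)
    actual = clk_hz / divisor
    error_percent = (actual - baud) / baud * 100.0
    return divisor, actual, error_percent


divisor, actual, error_percent = uart_divisor(27_000_000, 115_200)
mid_bit = divisor // 2  # cycle offset of a mid-bit sample point
```

An error of this size is far below the few percent that an 8N1 link can usually tolerate over a 10-bit frame.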

por

Power-on reset module. Waits for the FPGA's DONE signal, then counts 16 clock cycles before driving por_n high to release the reset (ascon_top ANDs por_n with rst_n to form its active-low reset). Uses clog2() for the counter width, so the compiler computes the minimum bit count at compile time.
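For reference, the conventional ceiling-log2 calculation looks like this in Python. It assumes clog2() follows the usual definition (the smallest b with 2**b >= n); the exact argument the por module passes is not shown in the source.

```python
def clog2(n):
    """Smallest number of bits b such that 2**b >= n, for n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1).bit_length()
```

Under that definition, a counter with 16 distinct values (0 to 15) needs clog2(16) = 4 bits, while 17 values would need 5.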

Test Tool

A Python test script (tools/ascon_test.py) communicates with the FPGA over serial. It supports interactive encrypt/decrypt of strings or files (chunked into 255-byte segments with per-chunk nonce derivation), and a self-test mode that verifies encrypt-decrypt round-trips and tag rejection for five test vectors.

bash
# Self-test (5 vectors: empty, 1-block, partial, 2-block, multi+partial)
python3 tools/ascon_test.py /dev/ttyUSB0 selftest

# Encrypt a file
python3 tools/ascon_test.py /dev/ttyUSB0 encrypt myfile.txt > myfile.enc

# Decrypt it back
python3 tools/ascon_test.py /dev/ttyUSB0 decrypt myfile.enc > recovered.txt
jz
@project(CHIP="GW2AR-18-QN88-C8-I7") ASCON_CRYPTO
    @import "por.jz"
    @import "uart_rx.jz"
    @import "uart_tx.jz"
    @import "ascon.jz"
    @import "top.jz"

    CONFIG {
        CLK_MHZ = 27;
    }

    CLOCKS {
        SCLK = { period=37.04 }; // 27MHz clock
    }

    IN_PINS {
        SCLK    = { standard=LVCMOS33 };
        DONE    = { standard=LVCMOS33 };
        KEY[2]  = { standard=LVCMOS33 };
        UART_RX = { standard=LVCMOS33 };
    }

    OUT_PINS {
        LED[6]  = { standard=LVCMOS33, drive=8 };
        UART_TX = { standard=LVCMOS33, drive=8 };
    }

    MAP {
        // System Clock 27MHz
        SCLK = 4;

        // Buttons
        KEY[0] = 87;
        KEY[1] = 88;

        // LEDs (active low)
        LED[0] = 15;
        LED[1] = 16;
        LED[2] = 17;
        LED[3] = 18;
        LED[4] = 19;
        LED[5] = 20;

        // DONE (POR)
        DONE = IOR32B;

        // UART
        UART_RX = 70;
        UART_TX = 69;
    }

    @top ascon_top {
        IN  [1] clk     = SCLK;
        IN  [1] por     = DONE;
        IN  [1] rst_n   = ~KEY[0];
        IN  [1] uart_rx = UART_RX;
        OUT [1] uart_tx = UART_TX;
        OUT [6] leds    = ~LED;
    }
@endproj
jz
// Ascon-128 AEAD Engine (NIST SP 800-232)
// Iterative permutation: 1 round per clock cycle
// 320-bit state, 128-bit key/nonce/tag
@module ascon
    PORT {
        IN  [1]   clk;
        IN  [1]   rst_n;

        IN  [1]   cmd_start;
        IN  [1]   cmd;           // 0=encrypt, 1=decrypt
        IN  [128] key;
        IN  [128] nonce;

        IN  [64]  din;
        IN  [1]   din_valid;
        IN  [1]   din_last;
        IN  [1]   din_empty;
        IN  [3]   din_partial_len; // 0=full block, 1-7=partial byte count

        IN  [128] tag_in;

        OUT [64]  dout;
        OUT [1]   dout_valid;
        OUT [128] tag_out;
        OUT [1]   tag_valid;
        OUT [1]   done;
        OUT [1]   ready;
    }

    CONST {
        ST_IDLE      = 0;
        ST_PERM      = 1;
        ST_INIT_KEY  = 2;
        ST_WAIT      = 3;
        ST_FIN       = 4;
        ST_TAG       = 5;
    }

    REGISTER {
        state       [3]   = 3'd0;
        perm_next   [3]   = 3'd0;

        s0          [64]  = 64'h0000000000000000;
        s1          [64]  = 64'h0000000000000000;
        s2          [64]  = 64'h0000000000000000;
        s3          [64]  = 64'h0000000000000000;
        s4          [64]  = 64'h0000000000000000;

        k0          [64]  = 64'h0000000000000000;
        k1          [64]  = 64'h0000000000000000;

        round_cnt   [4]   = 4'd0;
        cmd_reg     [1]   = 1'b0;

        dout_reg    [64]  = 64'h0000000000000000;
        dout_v      [1]   = 1'b0;
        tag_reg     [128] = 128'h00000000000000000000000000000000;
        tag_v       [1]   = 1'b0;
        done_reg    [1]   = 1'b0;
        ready_reg   [1]   = 1'b1;
    }

    WIRE {
        rc [8];
        a0 [64]; a1 [64]; a2 [64]; a3 [64]; a4 [64];
        b0 [64]; b1 [64]; b2 [64]; b3 [64]; b4 [64];
        c0 [64]; c1 [64]; c2 [64]; c3 [64]; c4 [64];
        r0 [64]; r1 [64]; r2 [64]; r3 [64]; r4 [64];
        rc_ext [64];
    }

    ASYNCHRONOUS {
        dout       <= dout_reg;
        dout_valid <= dout_v;
        tag_out    <= tag_reg;
        tag_valid  <= tag_v;
        done       <= done_reg;
        ready      <= ready_reg;

        rc = { ~round_cnt, round_cnt };
        rc_ext <= { 56'h00000000000000, rc };

        a0 = s0 ^ s4;
        a1 = s1;
        a2 = (s2 ^ rc_ext) ^ s1;
        a3 = s3;
        a4 = s4 ^ s3;

        b0 = a0 ^ ((~a1) & a2);
        b1 = a1 ^ ((~a2) & a3);
        b2 = a2 ^ ((~a3) & a4);
        b3 = a3 ^ ((~a4) & a0);
        b4 = a4 ^ ((~a0) & a1);

        c0 = b0 ^ b4;
        c1 = b1 ^ b0;
        c2 = ~b2;
        c3 = b3 ^ b2;            // Ascon reference: x3 ^= x2 uses x2 before it is inverted
        c4 = b4;

        r0 = c0 ^ { c0[18:0], c0[63:19] } ^ { c0[27:0], c0[63:28] };
        r1 = c1 ^ { c1[60:0], c1[63:61] } ^ { c1[38:0], c1[63:39] };
        r2 = c2 ^ { c2[0],    c2[63:1]  } ^ { c2[5:0],  c2[63:6]  };
        r3 = c3 ^ { c3[9:0],  c3[63:10] } ^ { c3[16:0], c3[63:17] };
        r4 = c4 ^ { c4[6:0],  c4[63:7]  } ^ { c4[40:0], c4[63:41] };
    }

    SYNCHRONOUS(CLK=clk RESET=rst_n RESET_ACTIVE=Low) {
        SELECT (state) {
            CASE (lit(3, ST_IDLE)) {
                dout_v   <= 1'b0;
                done_reg <= 1'b0;
                IF (cmd_start == 1'b1) {
                    k0 <= key[127:64];
                    k1 <= key[63:0];
                    cmd_reg <= cmd;
                    s0 <= 64'h80400c0600000000;
                    s1 <= key[127:64];
                    s2 <= key[63:0];
                    s3 <= nonce[127:64];
                    s4 <= nonce[63:0];
                    round_cnt <= 4'd0;
                    perm_next <= lit(3, ST_INIT_KEY);
                    state <= lit(3, ST_PERM);
                    ready_reg <= 1'b0;
                } ELSE {
                    ready_reg <= 1'b1;
                }
            }

            CASE (lit(3, ST_PERM)) {
                dout_v   <= 1'b0;
                done_reg <= 1'b0;
                ready_reg <= 1'b0;
                s0 <= r0;
                s1 <= r1;
                s2 <= r2;
                s3 <= r3;
                s4 <= r4;
                IF (round_cnt == 4'd11) {
                    state <= perm_next;
                } ELSE {
                    round_cnt <= round_cnt + 4'd1;
                }
            }

            CASE (lit(3, ST_INIT_KEY)) {
                dout_v   <= 1'b0;
                done_reg <= 1'b0;
                ready_reg <= 1'b1;
                s3 <= s3 ^ k0;
                s4 <= s4 ^ k1 ^ 64'h0000000000000001;
                state <= lit(3, ST_WAIT);
            }

            CASE (lit(3, ST_WAIT)) {
                done_reg <= 1'b0;
                IF (din_valid == 1'b1) {
                    IF (din_last == 1'b1 && din_empty == 1'b1) {
                        s0 <= s0 ^ 64'h8000000000000000;
                        dout_v <= 1'b0;
                        state <= lit(3, ST_FIN);
                        ready_reg <= 1'b0;
                    } ELIF (din_last == 1'b1) {
                        dout_reg <= s0 ^ din;
                        IF (cmd_reg == 1'b0) {
                            // Encrypt: s0 ^= {plaintext, 0x80, zeros}
                            s0 <= s0 ^ din;
                        } ELSE {
                            // Decrypt partial: replace CT bytes, XOR pad, keep tail
                            SELECT (din_partial_len) {
                                CASE (3'd1) {
                                    s0 <= { din[63:56], s0[55:48] ^ 8'h80, s0[47:0] };
                                }
                                CASE (3'd2) {
                                    s0 <= { din[63:48], s0[47:40] ^ 8'h80, s0[39:0] };
                                }
                                CASE (3'd3) {
                                    s0 <= { din[63:40], s0[39:32] ^ 8'h80, s0[31:0] };
                                }
                                CASE (3'd4) {
                                    s0 <= { din[63:32], s0[31:24] ^ 8'h80, s0[23:0] };
                                }
                                CASE (3'd5) {
                                    s0 <= { din[63:24], s0[23:16] ^ 8'h80, s0[15:0] };
                                }
                                CASE (3'd6) {
                                    s0 <= { din[63:16], s0[15:8] ^ 8'h80, s0[7:0] };
                                }
                                CASE (3'd7) {
                                    s0 <= { din[63:8], s0[7:0] ^ 8'h80 };
                                }
                                DEFAULT {
                                    // Full block (shouldn't reach here)
                                    s0 <= din;
                                }
                            }
                        }
                        dout_v <= 1'b1;
                        state <= lit(3, ST_FIN);
                        ready_reg <= 1'b0;
                    } ELSE {
                        IF (cmd_reg == 1'b0) {
                            dout_reg <= s0 ^ din;
                            s0 <= s0 ^ din;
                        } ELSE {
                            dout_reg <= s0 ^ din;
                            s0 <= din;
                        }
                        dout_v <= 1'b1;
                        round_cnt <= 4'd6;
                        perm_next <= lit(3, ST_WAIT);
                        state <= lit(3, ST_PERM);
                        ready_reg <= 1'b0;
                    }
                } ELSE {
                    dout_v <= 1'b0;
                    ready_reg <= 1'b1;
                }
            }

            CASE (lit(3, ST_FIN)) {
                dout_v   <= 1'b0;
                done_reg <= 1'b0;
                ready_reg <= 1'b0;
                s1 <= s1 ^ k0;
                s2 <= s2 ^ k1;
                round_cnt <= 4'd0;
                perm_next <= lit(3, ST_TAG);
                state <= lit(3, ST_PERM);
            }

            CASE (lit(3, ST_TAG)) {
                dout_v <= 1'b0;
                ready_reg <= 1'b1;
                tag_reg <= { s3 ^ k0, s4 ^ k1 };
                IF (cmd_reg == 1'b1) {
                    IF ({ s3 ^ k0, s4 ^ k1 } == tag_in) {
                        tag_v <= 1'b1;
                    } ELSE {
                        tag_v <= 1'b0;
                    }
                } ELSE {
                    tag_v <= 1'b1;
                }
                done_reg <= 1'b1;
                state <= lit(3, ST_IDLE);
            }

            DEFAULT {
                dout_v   <= 1'b0;
                done_reg <= 1'b0;
                ready_reg <= 1'b1;
                state <= lit(3, ST_IDLE);
            }
        }
    }
@endmod
jz
// Ascon-128 UART Interface — streaming architecture
//
// Protocol (115200 8N1):
//   REQUEST:
//     Byte 0:      'E' (0x45) encrypt, 'D' (0x44) decrypt
//     Bytes 1-16:  128-bit key (MSB first)
//     Bytes 17-32: 128-bit nonce (MSB first)
//     Byte 33:     Data length in bytes (0-255)
//     Bytes 34+:   Data bytes
//     Decrypt only: 16 more bytes of expected tag
//
//   RESPONSE:
//     Bytes 0+:    Output data (same length as input, streamed)
//     Next byte:   'K' success, 'F' tag mismatch
//     Encrypt only: 16 bytes of tag appended
//
// Data is processed in 8-byte blocks as it streams through.
// No large buffer needed; only two 64-bit shift registers.
@module ascon_top
    PORT {
        IN  [1] clk;
        IN  [1] por;
        IN  [1] rst_n;
        IN  [1] uart_rx;
        OUT [1] uart_tx;
        OUT [6] leds;
    }

    WIRE {
        reset      [1];
        por_n      [1];
        rx_data    [8];
        rx_valid   [1];
        tx_ready   [1];
        ac_dout    [64];
        ac_dout_v  [1];
        ac_tag_out [128];
        ac_tag_ok  [1];
        ac_done    [1];
        ac_ready   [1];
    }

    CONST {
        S_RX_CMD     = 0;
        S_RX_KEY     = 1;
        S_RX_NONCE   = 2;
        S_RX_LEN     = 3;
        S_INIT       = 4;
        S_INIT_WAIT  = 5;
        S_RX_BLK     = 6;
        S_FEED       = 7;
        S_ASCON_WAIT = 8;
        S_TX_BLK     = 9;
        S_RX_TAG     = 10;
        S_DONE_WAIT  = 11;
        S_TX_STATUS  = 12;
        S_TX_TAG     = 13;
        S_TX_DONE    = 14;
        S_EMPTY_FEED = 15;
    }

    REGISTER {
        state       [4]   = 4'd0;
        cmd_reg     [1]   = 1'b0;

        // Key and nonce (shift in MSB first)
        key_sr      [128] = 128'h00000000000000000000000000000000;
        nonce_sr    [128] = 128'h00000000000000000000000000000000;

        // Data tracking
        data_len    [8]   = 8'd0;     // total data bytes
        bytes_done  [8]   = 8'd0;     // bytes received/processed so far
        byte_cnt    [8]   = 8'd0;     // general purpose byte counter

        // Block accumulator (shift in 8 bytes MSB first)
        blk_acc     [64]  = 64'h0000000000000000;
        blk_pos     [3]   = 3'd0;     // bytes in current block (0-7)

        // TX shift register
        tx_shift    [64]  = 64'h0000000000000000;
        tx_cnt      [4]   = 4'd0;     // bytes to send from current block

        // Ascon control registers
        ac_start_r  [1]   = 1'b0;
        ac_cmd_r    [1]   = 1'b0;
        ac_din_r    [64]  = 64'h0000000000000000;
        ac_din_v_r  [1]   = 1'b0;
        ac_din_last_r [1] = 1'b0;
        ac_din_empty_r [1] = 1'b0;
        ac_din_plen_r  [3] = 3'd0;

        // Tag
        tag_in_sr   [128] = 128'h00000000000000000000000000000000;
        tag_result  [128] = 128'h00000000000000000000000000000000;
        tag_ok_reg  [1]   = 1'b0;

        // TX byte and send
        tx_byte_r   [8]   = 8'h00;
        tx_send_r   [1]   = 1'b0;
        tx_tag_cnt  [5]   = 5'd0;

        // Partial block tracking
        is_last_blk [1]   = 1'b0;
        partial_len [3]   = 3'd0;     // bytes in last block if partial

        // Latch ascon done (it may pulse while we're receiving tag)
        done_latch  [1]   = 1'b0;

        // LED
        led_reg     [6]   = 6'b000000;
    }

    ASYNCHRONOUS {
        reset = por_n & rst_n;
        leds <= led_reg;
    }

    @new por0 por {
        IN  [1] clk   = clk;
        IN  [1] done  = por;
        OUT [1] por_n = por_n;
    }

    @new urx0 uart_rx {
        OVERRIDE {
            CLK_MHZ = CONFIG.CLK_MHZ;
        }
        IN  [1] clk   = clk;
        IN  [1] rst_n = reset;
        IN  [1] rx    = uart_rx;
        OUT [8] data  = rx_data;
        OUT [1] valid = rx_valid;
    }

    @new utx0 uart_tx {
        OVERRIDE {
            CLK_MHZ = CONFIG.CLK_MHZ;
        }
        IN  [1] clk   = clk;
        IN  [1] rst_n = reset;
        IN  [8] data  = tx_byte_r;
        IN  [1] valid = tx_send_r;
        OUT [1] ready = tx_ready;
        OUT [1] tx    = uart_tx;
    }

    @new ac0 ascon {
        IN  [1]   clk       = clk;
        IN  [1]   rst_n     = reset;
        IN  [1]   cmd_start = ac_start_r;
        IN  [1]   cmd       = ac_cmd_r;
        IN  [128] key       = key_sr;
        IN  [128] nonce     = nonce_sr;
        IN  [64]  din       = ac_din_r;
        IN  [1]   din_valid = ac_din_v_r;
        IN  [1]   din_last  = ac_din_last_r;
        IN  [1]   din_empty = ac_din_empty_r;
        IN  [3]   din_partial_len = ac_din_plen_r;
        IN  [128] tag_in    = tag_in_sr;
        OUT [64]  dout      = ac_dout;
        OUT [1]   dout_valid = ac_dout_v;
        OUT [128] tag_out   = ac_tag_out;
        OUT [1]   tag_valid = ac_tag_ok;
        OUT [1]   done      = ac_done;
        OUT [1]   ready     = ac_ready;
    }

    SYNCHRONOUS(CLK=clk RESET=reset RESET_ACTIVE=Low) {
        SELECT (state) {
            // ---- Receive command byte ----
            CASE (lit(4, S_RX_CMD)) {
                led_reg <= 6'b000001;
                ac_start_r <= 1'b0;
                ac_din_v_r <= 1'b0;
                ac_din_last_r <= 1'b0;
                ac_din_empty_r <= 1'b0;
                tx_send_r <= 1'b0;
                IF (rx_valid == 1'b1) {
                    IF (rx_data == 8'h45) {
                        cmd_reg <= 1'b0;
                        byte_cnt <= 8'd0;
                        state <= lit(4, S_RX_KEY);
                    } ELIF (rx_data == 8'h44) {
                        cmd_reg <= 1'b1;
                        byte_cnt <= 8'd0;
                        state <= lit(4, S_RX_KEY);
                    }
                }
            }

            // ---- Receive 16-byte key ----
            CASE (lit(4, S_RX_KEY)) {
                led_reg <= 6'b000011;
                ac_start_r <= 1'b0;
                ac_din_v_r <= 1'b0;
                ac_din_last_r <= 1'b0;
                ac_din_empty_r <= 1'b0;
                tx_send_r <= 1'b0;
                IF (rx_valid == 1'b1) {
                    key_sr <= { key_sr[119:0], rx_data };
                    IF (byte_cnt == 8'd15) {
                        byte_cnt <= 8'd0;
                        state <= lit(4, S_RX_NONCE);
                    } ELSE {
                        byte_cnt <= byte_cnt + 8'd1;
                    }
                }
            }

            // ---- Receive 16-byte nonce ----
            CASE (lit(4, S_RX_NONCE)) {
                ac_start_r <= 1'b0;
                ac_din_v_r <= 1'b0;
                ac_din_last_r <= 1'b0;
                ac_din_empty_r <= 1'b0;
                tx_send_r <= 1'b0;
                IF (rx_valid == 1'b1) {
                    nonce_sr <= { nonce_sr[119:0], rx_data };
                    IF (byte_cnt == 8'd15) {
                        state <= lit(4, S_RX_LEN);
                    } ELSE {
                        byte_cnt <= byte_cnt + 8'd1;
                    }
                }
            }

            // ---- Receive data length byte ----
            CASE (lit(4, S_RX_LEN)) {
                ac_start_r <= 1'b0;
                ac_din_v_r <= 1'b0;
                ac_din_last_r <= 1'b0;
                ac_din_empty_r <= 1'b0;
                tx_send_r <= 1'b0;
                IF (rx_valid == 1'b1) {
                    data_len <= rx_data;
                    bytes_done <= 8'd0;
                    blk_pos <= 3'd0;
                    blk_acc <= 64'h0000000000000000;
                    partial_len <= rx_data[2:0];
                    state <= lit(4, S_INIT);
                }
            }

            // ---- Start Ascon init ----
            CASE (lit(4, S_INIT)) {
                led_reg <= 6'b001111;
                ac_start_r <= 1'b1;
                ac_cmd_r <= cmd_reg;
                ac_din_v_r <= 1'b0;
                ac_din_last_r <= 1'b0;
                ac_din_empty_r <= 1'b0;
                tx_send_r <= 1'b0;
                done_latch <= 1'b0;
                state <= lit(4, S_INIT_WAIT);
            }

            // ---- Wait for Ascon init to complete ----
            CASE (lit(4, S_INIT_WAIT)) {
                ac_start_r <= 1'b0;
                ac_din_v_r <= 1'b0;
                ac_din_last_r <= 1'b0;
                ac_din_empty_r <= 1'b0;
                tx_send_r <= 1'b0;
                IF (ac_ready == 1'b1) {
                    IF (data_len == 8'd0) {
                        // Empty message: send padding block
                        state <= lit(4, S_EMPTY_FEED);
                    } ELSE {
                        // Start receiving data
                        state <= lit(4, S_RX_BLK);
                    }
                }
            }

            // ---- Receive 8 data bytes into block accumulator ----
            CASE (lit(4, S_RX_BLK)) {
                ac_start_r <= 1'b0;
                ac_din_v_r <= 1'b0;
                ac_din_last_r <= 1'b0;
                ac_din_empty_r <= 1'b0;
                tx_send_r <= 1'b0;
                IF (rx_valid == 1'b1) {
                    blk_acc <= { blk_acc[55:0], rx_data };
                    bytes_done <= bytes_done + 8'd1;

                    IF (bytes_done + 8'd1 == data_len) {
                        // All data received
                        is_last_blk <= 1'b1;
                        blk_pos <= blk_pos + 3'd1;
                        state <= lit(4, S_FEED);
                    } ELIF (blk_pos == 3'd7) {
                        // Full 8-byte block ready
                        blk_pos <= 3'd0;
                        is_last_blk <= 1'b0;
                        state <= lit(4, S_FEED);
                    } ELSE {
                        blk_pos <= blk_pos + 3'd1;
                    }
                }
            }

            // ---- Feed block to Ascon ----
            CASE (lit(4, S_FEED)) {
                ac_start_r <= 1'b0;
                tx_send_r <= 1'b0;
                IF (is_last_blk == 1'b1 && blk_pos != 3'd0) {
                    // Partial last block: pad with 0x80 after last byte
                    SELECT (blk_pos) {
                        CASE (3'd1) {
                            ac_din_r <= { blk_acc[7:0],   8'h80, 48'h000000000000 };
                        }
                        CASE (3'd2) {
                            ac_din_r <= { blk_acc[15:0],  8'h80, 40'h0000000000 };
                        }
                        CASE (3'd3) {
                            ac_din_r <= { blk_acc[23:0],  8'h80, 32'h00000000 };
                        }
                        CASE (3'd4) {
                            ac_din_r <= { blk_acc[31:0],  8'h80, 24'h000000 };
                        }
                        CASE (3'd5) {
                            ac_din_r <= { blk_acc[39:0],  8'h80, 16'h0000 };
                        }
                        CASE (3'd6) {
                            ac_din_r <= { blk_acc[47:0],  8'h80, 8'h00 };
                        }
                        CASE (3'd7) {
                            ac_din_r <= { blk_acc[55:0],  8'h80 };
                        }
                        DEFAULT {
                            ac_din_r <= 64'h8000000000000000;
                        }
                    }
                    ac_din_v_r <= 1'b1;
                    ac_din_last_r <= 1'b1;
                    ac_din_empty_r <= 1'b0;
                    // Partial block: only send partial_len bytes in TX
                    tx_cnt <= { 1'b0, partial_len };
                    ac_din_plen_r <= partial_len;
                    state <= lit(4, S_ASCON_WAIT);
                } ELIF (is_last_blk == 1'b1) {
                    // Full last block: process normally, then need empty pad
                    ac_din_r <= blk_acc;
                    ac_din_v_r <= 1'b1;
                    ac_din_last_r <= 1'b0;
                    ac_din_empty_r <= 1'b0;
                    ac_din_plen_r <= 3'd0;
                    tx_cnt <= 4'd8;
                    state <= lit(4, S_ASCON_WAIT);
                } ELSE {
                    // Intermediate full block
                    ac_din_r <= blk_acc;
                    ac_din_v_r <= 1'b1;
                    ac_din_last_r <= 1'b0;
                    ac_din_empty_r <= 1'b0;
                    ac_din_plen_r <= 3'd0;
                    tx_cnt <= 4'd8;
                    state <= lit(4, S_ASCON_WAIT);
                }
            }

            // ---- Wait for Ascon, capture output, then TX ----
            CASE (lit(4, S_ASCON_WAIT)) {
                ac_start_r <= 1'b0;
                ac_din_v_r <= 1'b0;
                ac_din_last_r <= 1'b0;
                ac_din_empty_r <= 1'b0;
                tx_send_r <= 1'b0;
                IF (ac_dout_v == 1'b1) {
                    tx_shift <= ac_dout;
                    state <= lit(4, S_TX_BLK);
                }
            }

            // ---- TX: send bytes from shift register ----
            CASE (lit(4, S_TX_BLK)) {
                ac_start_r <= 1'b0;
                ac_din_v_r <= 1'b0;
                ac_din_last_r <= 1'b0;
                ac_din_empty_r <= 1'b0;
                IF (tx_ready == 1'b1 && tx_send_r == 1'b0) {
                    tx_byte_r <= tx_shift[63:56];
                    tx_shift <= { tx_shift[55:0], 8'h00 };
                    tx_send_r <= 1'b1;
                    IF (ac_done == 1'b1) {
                        done_latch <= 1'b1;
                    }
                    IF (tx_cnt == 4'd1) {
                        // Last byte of this block sent
                        IF (is_last_blk == 1'b1 && partial_len != 3'd0) {
                            // Partial last block was already din_last
                            // Wait for ascon done
                            IF (cmd_reg == 1'b1) {
                                byte_cnt <= 8'd0;
                                state <= lit(4, S_RX_TAG);
                            } ELSE {
                                state <= lit(4, S_DONE_WAIT);
                            }
                        } ELIF (is_last_blk == 1'b1) {
                            // Full last block: need empty padding block
                            state <= lit(4, S_EMPTY_FEED);
                        } ELSE {
                            // More blocks to receive
                            blk_pos <= 3'd0;
                            blk_acc <= 64'h0000000000000000;
                            state <= lit(4, S_RX_BLK);
                        }
                    } ELSE {
                        tx_cnt <= tx_cnt - 4'd1;
                    }
                } ELSE {
                    tx_send_r <= 1'b0;
                    IF (ac_done == 1'b1) {
                        done_latch <= 1'b1;
                    }
                }
            }

            // ---- Feed empty padding block ----
            CASE (lit(4, S_EMPTY_FEED)) {
                ac_start_r <= 1'b0;
                tx_send_r <= 1'b0;
                IF (ac_ready == 1'b1) {
                    ac_din_r <= 64'h0000000000000000;
                    ac_din_v_r <= 1'b1;
                    ac_din_last_r <= 1'b1;
                    ac_din_empty_r <= 1'b1;
                    IF (cmd_reg == 1'b1) {
                        byte_cnt <= 8'd0;
                        state <= lit(4, S_RX_TAG);
                    } ELSE {
                        state <= lit(4, S_DONE_WAIT);
                    }
                } ELSE {
                    ac_din_v_r <= 1'b0;
                    ac_din_last_r <= 1'b0;
                    ac_din_empty_r <= 1'b0;
                }
            }

            // ---- Receive 16-byte tag (decrypt only) ----
            CASE (lit(4, S_RX_TAG)) {
                ac_start_r <= 1'b0;
                ac_din_v_r <= 1'b0;
                ac_din_last_r <= 1'b0;
                ac_din_empty_r <= 1'b0;
                tx_send_r <= 1'b0;
                IF (ac_done == 1'b1) {
                    done_latch <= 1'b1;
                }
                IF (rx_valid == 1'b1) {
                    tag_in_sr <= { tag_in_sr[119:0], rx_data };
                    IF (byte_cnt == 8'd15) {
                        state <= lit(4, S_DONE_WAIT);
                    } ELSE {
                        byte_cnt <= byte_cnt + 8'd1;
                    }
                }
            }

            // ---- Wait for Ascon finalization ----
            CASE (lit(4, S_DONE_WAIT)) {
                ac_start_r <= 1'b0;
                ac_din_v_r <= 1'b0;
                ac_din_last_r <= 1'b0;
                ac_din_empty_r <= 1'b0;
                tx_send_r <= 1'b0;
                IF (done_latch == 1'b1) {
                    // Done already fired while we were in S_RX_TAG or S_TX_BLK
                    // tag_out and tag_valid persist in ascon (not cleared in IDLE)
                    tag_result <= ac_tag_out;
                    tag_ok_reg <= ac_tag_ok;
                    tx_tag_cnt <= 5'd0;
                    done_latch <= 1'b0;
                    state <= lit(4, S_TX_STATUS);
                } ELIF (ac_done == 1'b1) {
                    tag_result <= ac_tag_out;
                    tag_ok_reg <= ac_tag_ok;
                    tx_tag_cnt <= 5'd0;
                    state <= lit(4, S_TX_STATUS);
                }
            }

            // ---- TX: send status byte ----
            CASE (lit(4, S_TX_STATUS)) {
                led_reg <= 6'b011111;
                ac_start_r <= 1'b0;
                ac_din_v_r <= 1'b0;
                ac_din_last_r <= 1'b0;
                ac_din_empty_r <= 1'b0;
                IF (tx_ready == 1'b1 && tx_send_r == 1'b0) {
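                    IF (cmd_reg == 1'b1 && tag_result != tag_in_sr) {
                        // 'F': decrypt request whose received tag does not match
                        tx_byte_r <= 8'h46;
                    } ELSE {
                        // 'K': encrypt finished, or decrypt tag verified
                        tx_byte_r <= 8'h4B;
                    }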
                    tx_send_r <= 1'b1;
                    IF (cmd_reg == 1'b0) {
                        state <= lit(4, S_TX_TAG);
                    } ELSE {
                        state <= lit(4, S_TX_DONE);
                    }
                } ELSE {
                    tx_send_r <= 1'b0;
                }
            }

            // ---- TX: send 16-byte tag (encrypt only) ----
            CASE (lit(4, S_TX_TAG)) {
                ac_start_r <= 1'b0;
                ac_din_v_r <= 1'b0;
                ac_din_last_r <= 1'b0;
                ac_din_empty_r <= 1'b0;
                IF (tx_ready == 1'b1 && tx_send_r == 1'b0) {
                    tx_byte_r <= tag_result[127:120];
                    tag_result <= { tag_result[119:0], 8'h00 };
                    tx_send_r <= 1'b1;
                    IF (tx_tag_cnt == 5'd15) {
                        state <= lit(4, S_TX_DONE);
                    } ELSE {
                        tx_tag_cnt <= tx_tag_cnt + 5'd1;
                    }
                } ELSE {
                    tx_send_r <= 1'b0;
                }
            }

            // ---- Done, return to idle ----
            CASE (lit(4, S_TX_DONE)) {
                led_reg <= 6'b000000;
                ac_start_r <= 1'b0;
                ac_din_v_r <= 1'b0;
                ac_din_last_r <= 1'b0;
                ac_din_empty_r <= 1'b0;
                tx_send_r <= 1'b0;
                state <= lit(4, S_RX_CMD);
            }

            DEFAULT {
                ac_start_r <= 1'b0;
                ac_din_v_r <= 1'b0;
                ac_din_last_r <= 1'b0;
                ac_din_empty_r <= 1'b0;
                tx_send_r <= 1'b0;
                state <= lit(4, S_RX_CMD);
            }
        }
    }
@endmod
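
Seen from the host, the trailer that S_TX_STATUS, S_TX_TAG, and S_TX_DONE produce is a status byte, followed by the 16-byte tag only for an encrypt request. The Python sketch below is not part of the project. The names parse_trailer and shift_in_tag are hypothetical, and it assumes cmd_reg is 0 for encrypt and 1 for decrypt, as the state comments suggest.

```python
STATUS_OK = 0x4B    # 'K', sent by S_TX_STATUS on success
STATUS_FAIL = 0x46  # 'F', sent when the decrypt tag comparison fails
TAG_LEN = 16


def shift_in_tag(tag_bytes: bytes) -> int:
    """Model tag_in_sr <= { tag_in_sr[119:0], rx_data } applied once per byte."""
    sr = 0
    for b in tag_bytes:
        sr = ((sr << 8) | b) & ((1 << 128) - 1)
    return sr


def parse_trailer(is_decrypt: bool, trailer: bytes):
    """Split the bytes that follow the data blocks into (ok, tag).

    Encrypt: status byte, then the 16-byte tag, most significant byte first.
    Decrypt: status byte only, so tag is returned as None.
    """
    expected = 1 if is_decrypt else 1 + TAG_LEN
    if len(trailer) != expected:
        raise ValueError(f"expected {expected} trailer bytes, got {len(trailer)}")
    status = trailer[0]
    if status not in (STATUS_OK, STATUS_FAIL):
        raise ValueError(f"unexpected status byte 0x{status:02X}")
    tag = None if is_decrypt else trailer[1:]
    return status == STATUS_OK, tag
```

The first tag byte the host sends ends up in bits 127:120 of tag_in_sr, which is the same byte order S_TX_TAG uses when it sends tag_result[127:120] first.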
// Simple UART Receiver — 8N1, no FIFO
// Pulses valid for 1 cycle when a byte is received
@module uart_rx
    CONST {
        CLK_MHZ = 27;
        BAUD_DIV = (CLK_MHZ * 1000000 / 115200) - 1;
        HALF_BAUD = BAUD_DIV / 2;
    }

    PORT {
        IN  [1] clk;
        IN  [1] rst_n;
        IN  [1] rx;
        OUT [8] data;
        OUT [1] valid;
    }

    REGISTER {
        // Metastability synchronizer
        rx_sync1    [1] = 1'b1;
        rx_sync2    [1] = 1'b1;

        // State machine (0=IDLE, 1=START, 2=DATA, 3=STOP)
        state       [2] = 2'd0;
        baud_cnt    [16] = 16'd0;
        bit_cnt     [3] = 3'd0;
        shift       [8] = 8'h00;

        // Output
        data_out    [8] = 8'h00;
        valid_out   [1] = 1'b0;
    }

    ASYNCHRONOUS {
        data  <= data_out;
        valid <= valid_out;
    }

    SYNCHRONOUS(CLK=clk RESET=rst_n RESET_ACTIVE=Low) {
        // 2-stage synchronizer for async RX input
        rx_sync1 <= rx;
        rx_sync2 <= rx_sync1;

        SELECT (state) {
            CASE (2'd0) {
                // IDLE: wait for start bit (RX low; level-detected after the synchronizer)
                valid_out <= 1'b0;
                IF (rx_sync2 == 1'b0) {
                    baud_cnt <= lit(16, HALF_BAUD);
                    state <= 2'd1;
                }
            }

            CASE (2'd1) {
                // START: verify start bit at mid-point
                valid_out <= 1'b0;
                IF (baud_cnt == 16'd0) {
                    IF (rx_sync2 == 1'b0) {
                        baud_cnt <= lit(16, BAUD_DIV);
                        bit_cnt <= 3'd0;
                        shift <= 8'h00;
                        state <= 2'd2;
                    } ELSE {
                        // False start
                        state <= 2'd0;
                    }
                } ELSE {
                    baud_cnt <= baud_cnt - 16'd1;
                }
            }

            CASE (2'd2) {
                // DATA: sample 8 bits at mid-bit, LSB first
                valid_out <= 1'b0;
                IF (baud_cnt == 16'd0) {
                    shift <= { rx_sync2, shift[7:1] };
                    IF (bit_cnt == 3'd7) {
                        baud_cnt <= lit(16, BAUD_DIV);
                        state <= 2'd3;
                    } ELSE {
                        bit_cnt <= bit_cnt + 3'd1;
                        baud_cnt <= lit(16, BAUD_DIV);
                    }
                } ELSE {
                    baud_cnt <= baud_cnt - 16'd1;
                }
            }

            CASE (2'd3) {
                // STOP: wait until mid-stop-bit, then output the byte (stop-bit level is not checked)
                IF (baud_cnt == 16'd0) {
                    data_out <= shift;
                    valid_out <= 1'b1;
                    state <= 2'd0;
                } ELSE {
                    valid_out <= 1'b0;
                    baud_cnt <= baud_cnt - 16'd1;
                }
            }

            DEFAULT {
                valid_out <= 1'b0;
                state <= 2'd0;
            }
        }
    }
@endmod
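
The divider constants in uart_rx can be checked with a few lines of arithmetic. This Python sketch assumes that CONST expressions use truncating integer division. The sample points are counted in clock cycles from the cycle in which IDLE first sees rx_sync2 low, ignoring the two-cycle synchronizer latency.

```python
CLK_HZ = 27 * 1_000_000
BAUD = 115_200

# Assumption: CONST arithmetic truncates like Python's // operator.
BAUD_DIV = CLK_HZ // BAUD - 1   # reload value; the counter runs BAUD_DIV down to 0
HALF_BAUD = BAUD_DIV // 2

cycles_per_bit = BAUD_DIV + 1   # a reload of N takes N + 1 clocks to reach the zero test
actual_baud = CLK_HZ / cycles_per_bit
error_pct = (actual_baud - BAUD) / BAUD * 100

# START waits HALF_BAUD + 1 cycles, then each data bit is sampled one bit period later.
sample_points = [HALF_BAUD + 1 + cycles_per_bit * (k + 1) for k in range(8)]

if __name__ == "__main__":
    print(f"BAUD_DIV={BAUD_DIV} HALF_BAUD={HALF_BAUD}")
    print(f"actual baud={actual_baud:.1f} error={error_pct:.3f}%")
    print("sample points (cycles):", sample_points)
```

Under these assumptions each bit lasts 234 clocks, and data bit k is sampled k + 1.5 bit periods after the start bit is detected, which is the middle of that bit.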
// Simple UART Transmitter — 8N1, no FIFO
// Asserts ready when idle. When valid is pulsed with data, transmits one byte.
@module uart_tx
    CONST {
        CLK_MHZ = 27;
        BAUD_DIV = (CLK_MHZ * 1000000 / 115200) - 1;
    }

    PORT {
        IN  [1] clk;
        IN  [1] rst_n;
        IN  [8] data;
        IN  [1] valid;
        OUT [1] ready;
        OUT [1] tx;
    }

    REGISTER {
        // State machine (0=IDLE, 1=START, 2=DATA, 3=STOP)
        state     [2] = 2'd0;
        baud_cnt  [16] = 16'd0;
        bit_cnt   [3] = 3'd0;
        shift     [8] = 8'hFF;

        // Outputs
        tx_out    [1] = 1'b1;
        ready_out [1] = 1'b1;
    }

    ASYNCHRONOUS {
        tx    <= tx_out;
        ready <= ready_out;
    }

    SYNCHRONOUS(CLK=clk RESET=rst_n RESET_ACTIVE=Low) {
        SELECT (state) {
            CASE (2'd0) {
                // IDLE: line high, ready for data
                tx_out <= 1'b1;
                IF (valid == 1'b1) {
                    shift     <= data;
                    baud_cnt  <= lit(16, BAUD_DIV);
                    state     <= 2'd1;
                    ready_out <= 1'b0;
                } ELSE {
                    ready_out <= 1'b1;
                }
            }

            CASE (2'd1) {
                // START bit: hold TX low for one baud period
                tx_out    <= 1'b0;
                ready_out <= 1'b0;
                IF (baud_cnt == 16'd0) {
                    baud_cnt <= lit(16, BAUD_DIV);
                    bit_cnt  <= 3'd0;
                    state    <= 2'd2;
                } ELSE {
                    baud_cnt <= baud_cnt - 16'd1;
                }
            }

            CASE (2'd2) {
                // DATA: shift out 8 bits LSB first
                tx_out    <= shift[0];
                ready_out <= 1'b0;
                IF (baud_cnt == 16'd0) {
                    shift <= { 1'b1, shift[7:1] };
                    IF (bit_cnt == 3'd7) {
                        baud_cnt <= lit(16, BAUD_DIV);
                        state    <= 2'd3;
                    } ELSE {
                        bit_cnt  <= bit_cnt + 3'd1;
                        baud_cnt <= lit(16, BAUD_DIV);
                    }
                } ELSE {
                    baud_cnt <= baud_cnt - 16'd1;
                }
            }

            CASE (2'd3) {
                // STOP bit: hold TX high for one baud period
                tx_out <= 1'b1;
                IF (baud_cnt == 16'd0) {
                    state     <= 2'd0;
                    ready_out <= 1'b1;
                } ELSE {
                    ready_out <= 1'b0;
                    baud_cnt  <= baud_cnt - 16'd1;
                }
            }

            DEFAULT {
                tx_out    <= 1'b1;
                ready_out <= 1'b1;
                state     <= 2'd0;
            }
        }
    }
@endmod
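
The bit order that uart_tx puts on the wire (start bit, eight data bits LSB first, stop bit) can be modeled in Python. This is an illustrative sketch with hypothetical function names, not project code. Note that deframe_8n1 rejects a bad stop bit, which is stricter than uart_rx above, where the stop-bit level is not examined.

```python
def frame_8n1(byte: int) -> list:
    """Return the 10 line levels for one byte: start (0), 8 data bits LSB first, stop (1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def deframe_8n1(bits: list) -> int:
    """Recover the byte from 10 line levels, checking the start and stop bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    # The first data bit received is bit 0, matching shift <= { rx_sync2, shift[7:1] }.
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


if __name__ == "__main__":
    print(frame_8n1(0x4B))  # status byte 'K'
```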
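// Power-on reset generator
// Holds por_n low until done has stayed high for POR_CYCLES clock edges;
// any low level on done restarts the count.
@module por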
    PORT {
        IN  [1] clk;
        IN  [1] done;
        OUT [1] por_n;
    }

    CONST {
        POR_CYCLES   = 16;
        POR_CNT_BITS = clog2(POR_CYCLES);
        POR_MAX      = POR_CYCLES - 1;
    }

    REGISTER {
        por_reg [1] = 1'b0;
        cnt     [POR_CNT_BITS] = POR_CNT_BITS'b0;
    }

    ASYNCHRONOUS {
        por_n <= por_reg;
    }

    SYNCHRONOUS(CLK=clk) {
        IF (done == 1'b0) {
            por_reg <= 1'b0;
            cnt <= POR_CNT_BITS'b0;
        } ELIF (cnt == lit(POR_CNT_BITS, POR_MAX)) {
            por_reg <= 1'b1;
            cnt <= cnt;
        } ELSE {
            por_reg <= 1'b0;
            cnt <= cnt + POR_CNT_BITS'b1;
        }
    }
@endmod
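
The release timing of the por module can be checked with a small cycle-level model. This Python sketch mirrors the three branches of the SYNCHRONOUS block. The helper names are hypothetical, and clog2 is approximated with bit_length, which matches for POR_CYCLES of 2 or more.

```python
POR_CYCLES = 16
POR_CNT_BITS = (POR_CYCLES - 1).bit_length()  # stands in for clog2(POR_CYCLES)
POR_MAX = POR_CYCLES - 1


def por_step(por: int, cnt: int, done: int):
    """Compute the register values after one clock edge."""
    if not done:
        return 0, 0                     # done low: hold reset, restart the count
    if cnt == POR_MAX:
        return 1, cnt                   # count complete: release reset, hold cnt
    return 0, (cnt + 1) & ((1 << POR_CNT_BITS) - 1)


def edges_until_release(done_seq):
    """Return the 1-based clock edge at which por_n goes high, or None."""
    por, cnt = 0, 0
    for edge, done in enumerate(done_seq, start=1):
        por, cnt = por_step(por, cnt, done)
        if por:
            return edge
    return None


if __name__ == "__main__":
    print(edges_until_release([1] * 32))
    print(edges_until_release([1] * 10 + [0] + [1] * 32))
```

In this model a single low cycle on done restarts the full count, so reset is released only after an uninterrupted run of high cycles.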

JZ-HDL Language Features

  • Combinational permutation in ASYNCHRONOUS. All four stages of the Ascon S-box and linear layer are expressed as chained wire assignments. The compiler verifies that there are no combinational loops and that every wire is driven exactly once. In Verilog, an undriven or doubly driven wire is the kind of bug that can silently produce wrong ciphertext.

  • Width-safe concatenation and slicing. The padding logic, key splitting, and circular shifts use explicit widths throughout, for example { din[63:24], s0[23:16] ^ 8'h80, s0[15:0] }, which adds up to 40 + 8 + 16 = 64 bits. The compiler checks that every concatenation and slice produces exactly the declared width, preventing the off-by-one bit errors that plague hand-written cryptographic RTL.

  • clog2() compile-time evaluation. The POR module uses clog2(POR_CYCLES) to compute the counter width. With POR_CYCLES = 16 this gives a 4-bit counter that counts up to POR_MAX = 15. The compiler evaluates the expression at compile time, so the counter is exactly wide enough without manual calculation.

  • Single-driver enforcement. Each register and wire has exactly one driver. The Ascon engine's 320-bit state is modified in multiple FSM states, but the compiler verifies through control-flow analysis that no register can be driven from two places in the same cycle. In Verilog, the same guarantee would require manual review.
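
The width check described for concatenation and slicing can be illustrated with a small Python model of the quoted expression { din[63:24], s0[23:16] ^ 8'h80, s0[15:0] }. This is only a sketch of the idea, with hypothetical helper names, and says nothing about how the JZ-HDL compiler implements the check.

```python
def field(value: int, hi: int, lo: int):
    """Return (bits hi..lo of value, width), like value[hi:lo] in the listings."""
    if hi < lo or lo < 0:
        raise ValueError("bad slice bounds")
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1), width


def concat(expected_width: int, *fields):
    """Join (value, width) pairs MSB first and insist on the declared total width."""
    out, total = 0, 0
    for value, width in fields:
        if value >> width:
            raise ValueError("value wider than its field")
        out = (out << width) | value
        total += width
    if total != expected_width:
        raise ValueError(f"width mismatch: got {total}, expected {expected_width}")
    return out


def pad_example(din: int, s0: int) -> int:
    """Evaluate the quoted 64-bit expression for given din and s0 values."""
    hi, w_hi = field(din, 63, 24)     # 40 bits
    mid, w_mid = field(s0, 23, 16)    # 8 bits
    lo, w_lo = field(s0, 15, 0)       # 16 bits
    return concat(64, (hi, w_hi), (mid ^ 0x80, w_mid), (lo, w_lo))


if __name__ == "__main__":
    print(hex(pad_example(0xFFFFFFFFFF000000, 0x1234)))
```

Dropping a single bit from any slice, for instance using s0[14:0] for the last field, makes concat raise instead of silently producing a 63-bit value.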