Ascon-128 Encryption
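A hardware implementation of the Ascon-128 authenticated encryption cipher on the Tang Nano 20K. The engine uses the Ascon-128 v1.2 parameters (64-bit rate, 12/6 rounds, IV 0x80400c0600000000). NIST SP 800-232 standardizes the closely related Ascon-AEAD128, which uses a 128-bit rate, a different IV and little-endian byte order, so its outputs differ from this design's. The FPGA receives plaintext or ciphertext over UART and encrypts or decrypts it with a 128-bit key and nonce. For encryption it returns the ciphertext plus a 128-bit authentication tag; for decryption it returns the plaintext plus a pass/fail tag check. Associated data is not supported (it is always empty). The design computes one permutation round per clock cycle and needs no large data buffers: blocks stream through as they arrive.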
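Why streaming without buffers is comfortable can be seen from a rough cycle budget. The engine spends a handful of clock cycles per 8-byte block, while every byte on the wire costs 10 bit times of 234 clock cycles. The sketch below counts only permutation rounds and UART bit times. It ignores the few handshake cycles between FSM states and any host-side gaps, so the UART figure is a lower bound. Function names are illustrative.

```python
CYCLES_PER_BIT = 234   # 27 MHz clock, BAUD_DIV = 233 (see uart_rx / uart_tx)
BITS_PER_BYTE = 10     # 8N1: start bit + 8 data bits + stop bit


def round_cycles(n_bytes: int) -> int:
    """Permutation rounds for one message, one round per clock.

    12 initialization rounds, 6 rounds after every full 8-byte block
    (a final full block is followed by a separate padding block), and
    12 finalization rounds. Handshake cycles between FSM states are ignored.
    """
    return 12 + 6 * (n_bytes // 8) + 12


def uart_bytes(n_bytes: int, encrypt: bool = True) -> int:
    """Bytes crossing the UART (both directions) for one request and response."""
    request = 1 + 16 + 16 + 1 + n_bytes + (0 if encrypt else 16)
    response = n_bytes + 1 + (16 if encrypt else 0)
    return request + response


def uart_cycles(n_bytes: int, encrypt: bool = True) -> int:
    """Lower bound on clock cycles spent on the wire (host-side gaps ignored)."""
    return uart_bytes(n_bytes, encrypt) * BITS_PER_BYTE * CYCLES_PER_BIT
```

For a 255-byte encryption this gives 210 round cycles against at least 1,312,740 UART cycles (about 49 ms at 27 MHz). The link, not the cipher, sets the pace.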
Architecture
               ┌──────────────────────────────────┐
UART RX ──────►│ ascon_top                        │
               │                                  │
               │  ┌──────────┐    ┌────────────┐  │
               │  │ uart_rx  │───►│            │  │
               │  └──────────┘    │  16-state  │  │
               │                  │  protocol  │  │
               │  ┌──────────┐    │    FSM     │  │
               │  │ uart_tx  │◄───│            │  │
               │  └──────────┘    │            │  │
               │                  └─────┬──────┘  │
               │                        │         │
               │                  ┌─────▼──────┐  │
               │                  │   ascon    │  │
               │                  │  320-bit   │  │
               │                  │   state    │  │
               │                  │   engine   │  │
               │                  └────────────┘  │
               │                                  │
UART TX ◄──────│  ┌──────────┐                    │
               │  │   por    │ Power-on reset     │
               │  └──────────┘                    │
               └──────────────────────────────────┘
UART Protocol
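The host communicates at 115200 baud, 8N1. Data is sent in 8-byte blocks, and the host must read each block's output before sending the next block. The FPGA has no RX FIFO and ignores bytes that arrive while it is processing or transmitting a block.
Encrypt request: 'E' + 16-byte key + 16-byte nonce (both MSB first) + 1-byte length (0-255) + plaintext bytes
Decrypt request: 'D' + 16-byte key + 16-byte nonce (both MSB first) + 1-byte length (0-255) + ciphertext bytes, then the 16-byte expected tag, sent only after all plaintext bytes have been read back
Response: output data streamed back block by block, then a status byte ('K' = success, 'F' = tag mismatch on decrypt), then the 16-byte tag (encrypt only). Decrypted plaintext is streamed out before the tag is checked, so the host must discard it when the status is 'F'.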
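A minimal host-side sketch of one exchange follows. It assumes `port` behaves like a pyserial `serial.Serial` opened at 115200 8N1, and only its `write(data)` and `read(size)` methods are used. The helper names are hypothetical and are not taken from tools/ascon_test.py. The key point is the loop: each block's output is read before the next block is written.

```python
def build_header(cmd: str, key: bytes, nonce: bytes, length: int) -> bytes:
    """Command byte + 16-byte key + 16-byte nonce + 1-byte data length."""
    if cmd not in ("E", "D"):
        raise ValueError("cmd must be 'E' (encrypt) or 'D' (decrypt)")
    if len(key) != 16 or len(nonce) != 16:
        raise ValueError("key and nonce must be exactly 16 bytes")
    if not 0 <= length <= 255:
        raise ValueError("the length field is a single byte (0-255)")
    return cmd.encode("ascii") + key + nonce + bytes([length])


def split_blocks(data: bytes) -> list[bytes]:
    """Cut the payload into 8-byte blocks; the last one may be shorter."""
    return [data[i:i + 8] for i in range(0, len(data), 8)]


def encrypt_over_uart(port, key: bytes, nonce: bytes, plaintext: bytes):
    """One encrypt request: read each block's output before sending the next block."""
    port.write(build_header("E", key, nonce, len(plaintext)))
    ciphertext = bytearray()
    for block in split_blocks(plaintext):
        port.write(block)
        ciphertext += port.read(len(block))   # no RX FIFO on the FPGA: wait for the output
    status = port.read(1)                     # b"K" on success
    tag = port.read(16)                       # encrypt responses end with the tag
    return bytes(ciphertext), status, tag


def decrypt_over_uart(port, key: bytes, nonce: bytes, ciphertext: bytes, tag: bytes):
    """One decrypt request: the expected tag is sent only after all plaintext is read."""
    port.write(build_header("D", key, nonce, len(ciphertext)))
    plaintext = bytearray()
    for block in split_blocks(ciphertext):
        port.write(block)
        plaintext += port.read(len(block))
    port.write(tag)
    status = port.read(1)                     # b"K" if the tag matched, b"F" otherwise
    return bytes(plaintext), status           # discard plaintext unless status == b"K"
```

pyserial's `read()` can return fewer bytes than requested when a timeout is set. A production host should therefore check every read length.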
Modules
ascon
The core Ascon-128 AEAD engine. Maintains a 320-bit state split across five 64-bit registers (s0–s4) plus a 128-bit key stored as k0/k1.
Permutation. One round of the Ascon permutation is computed combinationally in the ASYNCHRONOUS block as four chained stages: round-constant addition plus pre-XOR, chi-like S-box substitution, post-XOR with bit inversion, and linear diffusion via 64-bit rotations. Each clock cycle in ST_PERM commits one round's result, so the full permutation is iterated over several cycles. Initialization and finalization use 12 rounds (a-rounds, round_cnt 0 to 11). Intermediate blocks use 6 rounds (b-rounds, starting from round_cnt = 6, so they reuse the last six round constants).
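The sketch below is a bit-level Python model of one round, written to mirror the a/b/c/r wire stages; its names and structure are illustrative. It follows the Ascon reference round, in which lane 3 is XORed with b2 before lane 2 is inverted. The `sbox5` helper evaluates the 5-bit S-box on a single bit column, so the model can be checked against entries of the published S-box table.

```python
MASK = (1 << 64) - 1


def ror(x: int, n: int) -> int:
    """64-bit rotate right; written {x[n-1:0], x[63:n]} in the HDL."""
    return ((x >> n) | (x << (64 - n))) & MASK


def round_constant(round_cnt: int) -> int:
    """rc = {~round_cnt, round_cnt} for a 4-bit counter."""
    return ((~round_cnt & 0xF) << 4) | (round_cnt & 0xF)


def sbox_layer(x0, x1, x2, x3, x4):
    # Stage 1: pre-XOR (the caller has already added the round constant to x2)
    a0, a1, a2, a3, a4 = x0 ^ x4, x1, x2 ^ x1, x3, x4 ^ x3
    # Stage 2: chi-like nonlinear layer, b_i = a_i ^ (~a_{i+1} & a_{i+2})
    b0 = a0 ^ (~a1 & a2)
    b1 = a1 ^ (~a2 & a3)
    b2 = a2 ^ (~a3 & a4)
    b3 = a3 ^ (~a4 & a0)
    b4 = a4 ^ (~a0 & a1)
    # Stage 3: post-XOR; lane 3 takes b2 *before* lane 2 is inverted
    return b0 ^ b4, b1 ^ b0, ~b2 & MASK, b3 ^ b2, b4


def linear_layer(c0, c1, c2, c3, c4):
    # Stage 4: per-lane diffusion with the Ascon rotation amounts
    return (c0 ^ ror(c0, 19) ^ ror(c0, 28),
            c1 ^ ror(c1, 61) ^ ror(c1, 39),
            c2 ^ ror(c2, 1) ^ ror(c2, 6),
            c3 ^ ror(c3, 10) ^ ror(c3, 17),
            c4 ^ ror(c4, 7) ^ ror(c4, 41))


def ascon_round(state, round_cnt):
    s0, s1, s2, s3, s4 = state
    return linear_layer(*sbox_layer(s0, s1, s2 ^ round_constant(round_cnt), s3, s4))


def permute(state, first_round):
    """first_round=0 gives 12 rounds (p^a); first_round=6 gives 6 rounds (p^b)."""
    for r in range(first_round, 12):
        state = ascon_round(state, r)
    return state


def sbox5(v: int) -> int:
    """Apply the S-box to one 5-bit column (x0 is the most significant bit)."""
    lanes = [(v >> (4 - i)) & 1 for i in range(5)]
    out = sbox_layer(*lanes)
    return sum((out[i] & 1) << (4 - i) for i in range(5))
```

For example, `sbox5(0)` returns 0x04 and `sbox5(1)` returns 0x0b, matching the first entries of the Ascon S-box table. The round constants for round_cnt 0 to 11 run from 0xF0 down to 0x4B.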
Encrypt vs. decrypt. For intermediate full blocks, encrypt XORs the plaintext into s0, while decrypt replaces s0 entirely with the ciphertext. Partial last blocks need special handling on decrypt. Only the ciphertext bytes replace the top of s0, the padding byte is XORed in at the position just after them, and the remaining state bytes are preserved. This leaves s0 exactly as encryption would (s0 XOR padded plaintext). It is implemented with a 7-way SELECT on din_partial_len.
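The encrypt and decrypt absorb rules can be checked against each other in software. This sketch models only the s0 update for one block; the function names are hypothetical. Decrypting a block from the same starting s0 should recover the plaintext and leave the identical state, whether the block is full or 1 to 7 bytes long.

```python
def to_block(chunk: bytes) -> int:
    """Big-endian 64-bit word; 1-7 byte chunks get 0x80 then zeros, as ascon_top pads them."""
    if not 1 <= len(chunk) <= 8:
        raise ValueError("a block holds 1 to 8 bytes")
    if len(chunk) < 8:
        chunk = chunk + b"\x80" + bytes(7 - len(chunk))
    return int.from_bytes(chunk, "big")


def absorb_encrypt(s0: int, pt: bytes):
    """Encrypt: s0 ^= padded plaintext; the ciphertext is the first len(pt) bytes of s0."""
    s0 ^= to_block(pt)
    return s0, s0.to_bytes(8, "big")[:len(pt)]


def absorb_decrypt(s0: int, ct: bytes):
    """Decrypt: plaintext = s0 ^ ciphertext, then rebuild s0 as encryption would leave it."""
    n = len(ct)
    din = to_block(ct)                       # ascon_top pads ciphertext the same way
    pt = (s0 ^ din).to_bytes(8, "big")[:n]
    if n == 8:
        return din, pt                       # full block: s0 is replaced by the ciphertext
    keep = 64 - 8 * n                        # state bits below the ciphertext bytes
    tail = s0 & ((1 << keep) - 1)
    new_s0 = ((din >> keep) << keep) | tail  # ciphertext bytes on top, old tail kept
    new_s0 ^= 0x80 << (keep - 8)             # XOR the pad byte into the kept tail
    return new_s0, pt
```

The `keep`/`tail` arithmetic expresses the same operation as the seven hard-coded concatenations in the HDL. For n = 3, for instance, it keeps s0[39:0] and XORs 0x80 into s0[39:32].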
Tag computation. After the final permutation, the tag is {s3 ^ k0, s4 ^ k1}. For decrypt, the engine also compares the computed tag against the expected tag_in.
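Below is a compact software model of the whole engine flow for the empty-associated-data case: initialization, block absorption, finalization and tag. It assumes the Ascon-128 v1.2 constants used in the HDL (IV 0x80400c0600000000, 12/6 rounds, 0x80 padding, big-endian lanes) and the reference Ascon round. A round-trip self-test on the device cannot catch an error in the permutation, because encryption and decryption would share it. A comparison against an independent model such as this one, or against published known-answer tests, can.

```python
import hmac

MASK = (1 << 64) - 1
IV = 0x80400C0600000000  # Ascon-128 v1.2 IV, the constant loaded into s0 in ST_IDLE


def _ror(x: int, n: int) -> int:
    return ((x >> n) | (x << (64 - n))) & MASK


def _permute(s: list, start: int) -> None:
    """Rounds start..11 in place: start=0 gives p^12, start=6 gives p^6."""
    for i in range(start, 12):
        s[2] ^= ((0xF - i) << 4) | i
        s[0] ^= s[4]
        s[4] ^= s[3]
        s[2] ^= s[1]
        t = [s[j] ^ (~s[(j + 1) % 5] & s[(j + 2) % 5]) for j in range(5)]
        t[1] ^= t[0]
        t[0] ^= t[4]
        t[3] ^= t[2]          # uses t[2] before the inversion below
        t[2] = ~t[2] & MASK
        s[0] = t[0] ^ _ror(t[0], 19) ^ _ror(t[0], 28)
        s[1] = t[1] ^ _ror(t[1], 61) ^ _ror(t[1], 39)
        s[2] = t[2] ^ _ror(t[2], 1) ^ _ror(t[2], 6)
        s[3] = t[3] ^ _ror(t[3], 10) ^ _ror(t[3], 17)
        s[4] = t[4] ^ _ror(t[4], 7) ^ _ror(t[4], 41)


def _init(key: bytes, nonce: bytes):
    k0, k1 = int.from_bytes(key[:8], "big"), int.from_bytes(key[8:], "big")
    s = [IV, k0, k1, int.from_bytes(nonce[:8], "big"), int.from_bytes(nonce[8:], "big")]
    _permute(s, 0)
    s[3] ^= k0
    s[4] ^= k1 ^ 1  # key XOR plus domain separation (associated data is always empty)
    return s, k0, k1


def _finalize(s: list, k0: int, k1: int) -> bytes:
    s[1] ^= k0
    s[2] ^= k1
    _permute(s, 0)
    return (((s[3] ^ k0) << 64) | (s[4] ^ k1)).to_bytes(16, "big")


def encrypt(key: bytes, nonce: bytes, pt: bytes):
    s, k0, k1 = _init(key, nonce)
    ct = bytearray()
    full = len(pt) // 8
    for i in range(full):
        s[0] ^= int.from_bytes(pt[8 * i:8 * i + 8], "big")
        ct += s[0].to_bytes(8, "big")
        _permute(s, 6)
    tail = pt[8 * full:]                     # 0-7 bytes; empty means a pure padding block
    s[0] ^= int.from_bytes(tail + b"\x80" + bytes(7 - len(tail)), "big")
    ct += s[0].to_bytes(8, "big")[:len(tail)]
    return bytes(ct), _finalize(s, k0, k1)


def decrypt(key: bytes, nonce: bytes, ct: bytes, tag: bytes):
    """Return (plaintext, ok); like the hardware, plaintext is produced before the check."""
    s, k0, k1 = _init(key, nonce)
    pt = bytearray()
    full = len(ct) // 8
    for i in range(full):
        c = int.from_bytes(ct[8 * i:8 * i + 8], "big")
        pt += (s[0] ^ c).to_bytes(8, "big")
        s[0] = c
        _permute(s, 6)
    tail = ct[8 * full:]
    pt += bytes(a ^ b for a, b in zip(s[0].to_bytes(8, "big"), tail))
    keep = 64 - 8 * len(tail)                # state bits not covered by ciphertext
    s[0] = (int.from_bytes(tail, "big") << keep) | (s[0] & ((1 << keep) - 1))
    s[0] ^= 0x80 << (keep - 8)               # pad byte right after the ciphertext
    ok = hmac.compare_digest(_finalize(s, k0, k1), tag)
    return bytes(pt), ok
```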
ascon_top
The streaming UART interface, built around a 16-state protocol FSM. It handles key/nonce/data reception and block accumulation into a 64-bit shift register. It also handles padding: 0x80 followed by zeros for a partial last block, or a separate empty padding block when the message length is a multiple of 8, including zero. Each block's output bytes are transmitted before the next block is received.
The design uses only two 64-bit shift registers for data — one for block accumulation (blk_acc) and one for TX output (tx_shift). No large buffers are needed because blocks are processed and transmitted as they stream through.
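The sequence of words ascon_top presents to the engine can be modeled in a few lines. This is a behavioral sketch, not a cycle-accurate one. The field names mirror the ac_din_* registers, and `tx_bytes` corresponds to tx_cnt; the empty padding block produces no output bytes.

```python
from typing import NamedTuple


class Feed(NamedTuple):
    din: int       # 64-bit word on ac_din_r
    last: int      # ac_din_last_r
    empty: int     # ac_din_empty_r
    plen: int      # ac_din_plen_r (0 = full block)
    tx_bytes: int  # output bytes sent back for this word


def feed_sequence(data: bytes) -> list[Feed]:
    """Words fed to the engine for one message (0-255 bytes)."""
    feeds = []
    full, rem = divmod(len(data), 8)
    for i in range(full):
        # Full blocks (including a full final block) are not flagged as last
        feeds.append(Feed(int.from_bytes(data[8 * i:8 * i + 8], "big"), 0, 0, 0, 8))
    if rem:
        tail = data[8 * full:]
        padded = tail + b"\x80" + bytes(7 - rem)   # S_FEED: 0x80 right after the data
        feeds.append(Feed(int.from_bytes(padded, "big"), 1, 0, rem, rem))
    else:
        feeds.append(Feed(0, 1, 1, 0, 0))          # S_EMPTY_FEED: the engine XORs in 0x80
    return feeds
```

A 13-byte message yields one full block followed by a 5-byte padded block. An 8-byte message yields the full block plus an empty padding block.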
uart_rx / uart_tx
Standard 8N1 UART modules at 115200 baud. With BAUD_DIV = 233, each bit lasts 234 clock cycles, giving 27 MHz / 234 ≈ 115,385 baud (about 0.16% fast). The receiver includes a 2-stage metastability synchronizer and samples at the mid-bit point. The transmitter shifts data out LSB-first with a ready/valid handshake.
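The divisor arithmetic behind those numbers is shown below. It assumes the CONST expression `(CLK_MHZ * 1000000 / 115200) - 1` uses integer division, which matches the 234-cycle bit time quoted above. `frame_bits` is an illustrative helper showing the line levels of one 8N1 frame.

```python
CLK_HZ = 27_000_000
BAUD = 115_200

BAUD_DIV = CLK_HZ // BAUD - 1   # counter reload value used by uart_rx and uart_tx
HALF_BAUD = BAUD_DIV // 2       # RX wait after the start edge, to land near mid-bit
CYCLES_PER_BIT = BAUD_DIV + 1   # the counter runs BAUD_DIV down to 0 inclusive
ACTUAL_BAUD = CLK_HZ / CYCLES_PER_BIT
ERROR_PCT = (ACTUAL_BAUD - BAUD) / BAUD * 100


def frame_bits(byte: int) -> list[int]:
    """Line levels for one 8N1 frame: start bit 0, data LSB first, stop bit 1."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]
```

`frame_bits(0x45)`, the 'E' command byte, gives `[0, 1, 0, 1, 0, 0, 0, 1, 0, 1]`. The 0.16% rate error is well under the few-percent mismatch an 8N1 link typically tolerates.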
por
Power-on reset module. It waits for the FPGA's DONE signal, then counts 16 clock cycles before driving por_n high to release reset. It uses clog2() for the counter width, so the compiler computes the minimum bit count at compile time.
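Below is a small behavioral model of the counter. It assumes JZ-HDL's clog2() follows the usual Verilog `$clog2` convention (ceiling of log2). It confirms that a 4-bit counter suffices for POR_CYCLES = 16 and that reset is released on the 16th clock edge after DONE goes high.

```python
def clog2(n: int) -> int:
    """Ceiling log2 (Verilog $clog2 convention, assumed to match JZ-HDL's clog2)."""
    return (n - 1).bit_length() if n > 1 else 0


def por_release_edge(por_cycles: int = 16) -> int:
    """Clock edge, counted from DONE going high, on which por_reg becomes 1."""
    width = clog2(por_cycles)
    por_max = por_cycles - 1
    cnt, por_reg, edge = 0, 0, 0
    while por_reg == 0:
        edge += 1
        if cnt == por_max:                     # ELIF (cnt == POR_MAX): release reset
            por_reg = 1
        else:                                  # ELSE: keep counting within the register width
            cnt = (cnt + 1) % (1 << width)
    return edge
```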
Test Tool
A Python test script (tools/ascon_test.py) communicates with the FPGA over serial. It supports interactive encrypt/decrypt of strings or files (chunked into 255-byte segments with per-chunk nonce derivation), and a self-test mode that verifies encrypt-decrypt round-trips and tag rejection for five test vectors.
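The chunking step can be sketched as follows. The 255-byte limit comes from the 1-byte length field. How tools/ascon_test.py actually derives per-chunk nonces is not described here. The derivation below treats the base nonce as a big-endian counter incremented per chunk, modulo 2^128, and is a hypothetical stand-in. What matters is that no two chunks encrypted under the same key share a nonce.

```python
from typing import Iterator, Tuple

CHUNK_SIZE = 255  # the request's length field is a single byte


def chunks_with_nonces(data: bytes, base_nonce: bytes,
                       chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (nonce, chunk) pairs, one request per chunk.

    HYPOTHETICAL nonce derivation: base nonce + chunk index as a
    128-bit big-endian counter (wrapping modulo 2**128).
    """
    if len(base_nonce) != 16:
        raise ValueError("nonce must be 16 bytes")
    if not 1 <= chunk_size <= 255:
        raise ValueError("chunk_size must fit the 1-byte length field")
    base = int.from_bytes(base_nonce, "big")
    for index, start in enumerate(range(0, len(data), chunk_size)):
        nonce = ((base + index) % (1 << 128)).to_bytes(16, "big")
        yield nonce, data[start:start + chunk_size]
```

In this sketch a 600-byte file becomes chunks of 255, 255 and 90 bytes with nonces base, base+1 and base+2. An empty file produces no requests at all, which the real tool may handle differently.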
bash
# Self-test (5 vectors: empty, 1-block, partial, 2-block, multi+partial)
python3 tools/ascon_test.py /dev/ttyUSB0 selftest
# Encrypt a file
python3 tools/ascon_test.py /dev/ttyUSB0 encrypt myfile.txt > myfile.enc
# Decrypt it back
python3 tools/ascon_test.py /dev/ttyUSB0 decrypt myfile.enc > recovered.txt
jz
@project(CHIP="GW2AR-18-QN88-C8-I7") ASCON_CRYPTO
@import "por.jz"
@import "uart_rx.jz"
@import "uart_tx.jz"
@import "ascon.jz"
@import "top.jz"
CONFIG {
CLK_MHZ = 27;
}
CLOCKS {
SCLK = { period=37.04 }; // 27MHz clock
}
IN_PINS {
SCLK = { standard=LVCMOS33 };
DONE = { standard=LVCMOS33 };
KEY[2] = { standard=LVCMOS33 };
UART_RX = { standard=LVCMOS33 };
}
OUT_PINS {
LED[6] = { standard=LVCMOS33, drive=8 };
UART_TX = { standard=LVCMOS33, drive=8 };
}
MAP {
// System Clock 27MHz
SCLK = 4;
// Buttons
KEY[0] = 87;
KEY[1] = 88;
// LEDs (active low)
LED[0] = 15;
LED[1] = 16;
LED[2] = 17;
LED[3] = 18;
LED[4] = 19;
LED[5] = 20;
// DONE (POR)
DONE = IOR32B;
// UART
UART_RX = 70;
UART_TX = 69;
}
@top ascon_top {
IN [1] clk = SCLK;
IN [1] por = DONE;
IN [1] rst_n = ~KEY[0];
IN [1] uart_rx = UART_RX;
OUT [1] uart_tx = UART_TX;
OUT [6] leds = ~LED;
}
@endproj
jz
// Ascon-128 v1.2 AEAD Engine (64-bit rate; not the SP 800-232 Ascon-AEAD128 variant)
// Iterative permutation: 1 round per clock cycle
// 320-bit state, 128-bit key/nonce/tag
@module ascon
PORT {
IN [1] clk;
IN [1] rst_n;
IN [1] cmd_start;
IN [1] cmd; // 0=encrypt, 1=decrypt
IN [128] key;
IN [128] nonce;
IN [64] din;
IN [1] din_valid;
IN [1] din_last;
IN [1] din_empty;
IN [3] din_partial_len; // 0=full block, 1-7=partial byte count
IN [128] tag_in;
OUT [64] dout;
OUT [1] dout_valid;
OUT [128] tag_out;
OUT [1] tag_valid;
OUT [1] done;
OUT [1] ready;
}
CONST {
ST_IDLE = 0;
ST_PERM = 1;
ST_INIT_KEY = 2;
ST_WAIT = 3;
ST_FIN = 4;
ST_TAG = 5;
}
REGISTER {
state [3] = 3'd0;
perm_next [3] = 3'd0;
s0 [64] = 64'h0000000000000000;
s1 [64] = 64'h0000000000000000;
s2 [64] = 64'h0000000000000000;
s3 [64] = 64'h0000000000000000;
s4 [64] = 64'h0000000000000000;
k0 [64] = 64'h0000000000000000;
k1 [64] = 64'h0000000000000000;
round_cnt [4] = 4'd0;
cmd_reg [1] = 1'b0;
dout_reg [64] = 64'h0000000000000000;
dout_v [1] = 1'b0;
tag_reg [128] = 128'h00000000000000000000000000000000;
tag_v [1] = 1'b0;
done_reg [1] = 1'b0;
ready_reg [1] = 1'b1;
}
WIRE {
rc [8];
a0 [64]; a1 [64]; a2 [64]; a3 [64]; a4 [64];
b0 [64]; b1 [64]; b2 [64]; b3 [64]; b4 [64];
c0 [64]; c1 [64]; c2 [64]; c3 [64]; c4 [64];
r0 [64]; r1 [64]; r2 [64]; r3 [64]; r4 [64];
rc_ext [64];
}
ASYNCHRONOUS {
dout <= dout_reg;
dout_valid <= dout_v;
tag_out <= tag_reg;
tag_valid <= tag_v;
done <= done_reg;
ready <= ready_reg;
rc = { ~round_cnt, round_cnt };
rc_ext <= { 56'h00000000000000, rc };
a0 = s0 ^ s4;
a1 = s1;
a2 = (s2 ^ rc_ext) ^ s1;
a3 = s3;
a4 = s4 ^ s3;
b0 = a0 ^ ((~a1) & a2);
b1 = a1 ^ ((~a2) & a3);
b2 = a2 ^ ((~a3) & a4);
b3 = a3 ^ ((~a4) & a0);
b4 = a4 ^ ((~a0) & a1);
c0 = b0 ^ b4;
c1 = b1 ^ b0;
c2 = ~b2;
c3 = b3 ^ b2; // reference S-box: x3 ^= x2 uses b2 before x2 is inverted
c4 = b4;
r0 = c0 ^ { c0[18:0], c0[63:19] } ^ { c0[27:0], c0[63:28] };
r1 = c1 ^ { c1[60:0], c1[63:61] } ^ { c1[38:0], c1[63:39] };
r2 = c2 ^ { c2[0], c2[63:1] } ^ { c2[5:0], c2[63:6] };
r3 = c3 ^ { c3[9:0], c3[63:10] } ^ { c3[16:0], c3[63:17] };
r4 = c4 ^ { c4[6:0], c4[63:7] } ^ { c4[40:0], c4[63:41] };
}
SYNCHRONOUS(CLK=clk RESET=rst_n RESET_ACTIVE=Low) {
SELECT (state) {
CASE (lit(3, ST_IDLE)) {
dout_v <= 1'b0;
done_reg <= 1'b0;
IF (cmd_start == 1'b1) {
k0 <= key[127:64];
k1 <= key[63:0];
cmd_reg <= cmd;
s0 <= 64'h80400c0600000000;
s1 <= key[127:64];
s2 <= key[63:0];
s3 <= nonce[127:64];
s4 <= nonce[63:0];
round_cnt <= 4'd0;
perm_next <= lit(3, ST_INIT_KEY);
state <= lit(3, ST_PERM);
ready_reg <= 1'b0;
} ELSE {
ready_reg <= 1'b1;
}
}
CASE (lit(3, ST_PERM)) {
dout_v <= 1'b0;
done_reg <= 1'b0;
ready_reg <= 1'b0;
s0 <= r0;
s1 <= r1;
s2 <= r2;
s3 <= r3;
s4 <= r4;
IF (round_cnt == 4'd11) {
state <= perm_next;
} ELSE {
round_cnt <= round_cnt + 4'd1;
}
}
CASE (lit(3, ST_INIT_KEY)) {
dout_v <= 1'b0;
done_reg <= 1'b0;
ready_reg <= 1'b1;
s3 <= s3 ^ k0;
s4 <= s4 ^ k1 ^ 64'h0000000000000001;
state <= lit(3, ST_WAIT);
}
CASE (lit(3, ST_WAIT)) {
done_reg <= 1'b0;
IF (din_valid == 1'b1) {
IF (din_last == 1'b1 && din_empty == 1'b1) {
s0 <= s0 ^ 64'h8000000000000000;
dout_v <= 1'b0;
state <= lit(3, ST_FIN);
ready_reg <= 1'b0;
} ELIF (din_last == 1'b1) {
dout_reg <= s0 ^ din;
IF (cmd_reg == 1'b0) {
// Encrypt: s0 ^= {plaintext, 0x80, zeros}
s0 <= s0 ^ din;
} ELSE {
// Decrypt partial: replace CT bytes, XOR pad, keep tail
SELECT (din_partial_len) {
CASE (3'd1) {
s0 <= { din[63:56], s0[55:48] ^ 8'h80, s0[47:0] };
}
CASE (3'd2) {
s0 <= { din[63:48], s0[47:40] ^ 8'h80, s0[39:0] };
}
CASE (3'd3) {
s0 <= { din[63:40], s0[39:32] ^ 8'h80, s0[31:0] };
}
CASE (3'd4) {
s0 <= { din[63:32], s0[31:24] ^ 8'h80, s0[23:0] };
}
CASE (3'd5) {
s0 <= { din[63:24], s0[23:16] ^ 8'h80, s0[15:0] };
}
CASE (3'd6) {
s0 <= { din[63:16], s0[15:8] ^ 8'h80, s0[7:0] };
}
CASE (3'd7) {
s0 <= { din[63:8], s0[7:0] ^ 8'h80 };
}
DEFAULT {
// Full block (shouldn't reach here)
s0 <= din;
}
}
}
dout_v <= 1'b1;
state <= lit(3, ST_FIN);
ready_reg <= 1'b0;
} ELSE {
IF (cmd_reg == 1'b0) {
dout_reg <= s0 ^ din;
s0 <= s0 ^ din;
} ELSE {
dout_reg <= s0 ^ din;
s0 <= din;
}
dout_v <= 1'b1;
round_cnt <= 4'd6;
perm_next <= lit(3, ST_WAIT);
state <= lit(3, ST_PERM);
ready_reg <= 1'b0;
}
} ELSE {
dout_v <= 1'b0;
ready_reg <= 1'b1;
}
}
CASE (lit(3, ST_FIN)) {
dout_v <= 1'b0;
done_reg <= 1'b0;
ready_reg <= 1'b0;
s1 <= s1 ^ k0;
s2 <= s2 ^ k1;
round_cnt <= 4'd0;
perm_next <= lit(3, ST_TAG);
state <= lit(3, ST_PERM);
}
CASE (lit(3, ST_TAG)) {
dout_v <= 1'b0;
ready_reg <= 1'b1;
tag_reg <= { s3 ^ k0, s4 ^ k1 };
IF (cmd_reg == 1'b1) {
IF ({ s3 ^ k0, s4 ^ k1 } == tag_in) {
tag_v <= 1'b1;
} ELSE {
tag_v <= 1'b0;
}
} ELSE {
tag_v <= 1'b1;
}
done_reg <= 1'b1;
state <= lit(3, ST_IDLE);
}
DEFAULT {
dout_v <= 1'b0;
done_reg <= 1'b0;
ready_reg <= 1'b1;
state <= lit(3, ST_IDLE);
}
}
}
@endmod
jz
// Ascon-128 UART Interface — streaming architecture
//
// Protocol (115200 8N1):
// REQUEST:
// Byte 0: 'E' (0x45) encrypt, 'D' (0x44) decrypt
// Bytes 1-16: 128-bit key (MSB first)
// Bytes 17-32: 128-bit nonce (MSB first)
// Byte 33: Data length in bytes (0-255)
// Bytes 34+: Data bytes
// Decrypt only: 16 more bytes of expected tag
//
// RESPONSE:
// Bytes 0+: Output data (same length as input, streamed)
// Next byte: 'K' success, 'F' tag mismatch
// Encrypt only: 16 bytes of tag appended
//
// Data is processed in 8-byte blocks as it streams through.
// No large buffer needed; only two 64-bit shift registers.
@module ascon_top
PORT {
IN [1] clk;
IN [1] por;
IN [1] rst_n;
IN [1] uart_rx;
OUT [1] uart_tx;
OUT [6] leds;
}
WIRE {
reset [1];
por_n [1];
rx_data [8];
rx_valid [1];
tx_ready [1];
ac_dout [64];
ac_dout_v [1];
ac_tag_out [128];
ac_tag_ok [1];
ac_done [1];
ac_ready [1];
}
CONST {
S_RX_CMD = 0;
S_RX_KEY = 1;
S_RX_NONCE = 2;
S_RX_LEN = 3;
S_INIT = 4;
S_INIT_WAIT = 5;
S_RX_BLK = 6;
S_FEED = 7;
S_ASCON_WAIT = 8;
S_TX_BLK = 9;
S_RX_TAG = 10;
S_DONE_WAIT = 11;
S_TX_STATUS = 12;
S_TX_TAG = 13;
S_TX_DONE = 14;
S_EMPTY_FEED = 15;
}
REGISTER {
state [4] = 4'd0;
cmd_reg [1] = 1'b0;
// Key and nonce (shift in MSB first)
key_sr [128] = 128'h00000000000000000000000000000000;
nonce_sr [128] = 128'h00000000000000000000000000000000;
// Data tracking
data_len [8] = 8'd0; // total data bytes
bytes_done [8] = 8'd0; // bytes received/processed so far
byte_cnt [8] = 8'd0; // general purpose byte counter
// Block accumulator (shift in 8 bytes MSB first)
blk_acc [64] = 64'h0000000000000000;
blk_pos [3] = 3'd0; // bytes in current block (0-7)
// TX shift register
tx_shift [64] = 64'h0000000000000000;
tx_cnt [4] = 4'd0; // bytes to send from current block
// Ascon control registers
ac_start_r [1] = 1'b0;
ac_cmd_r [1] = 1'b0;
ac_din_r [64] = 64'h0000000000000000;
ac_din_v_r [1] = 1'b0;
ac_din_last_r [1] = 1'b0;
ac_din_empty_r [1] = 1'b0;
ac_din_plen_r [3] = 3'd0;
// Tag
tag_in_sr [128] = 128'h00000000000000000000000000000000;
tag_result [128] = 128'h00000000000000000000000000000000;
tag_ok_reg [1] = 1'b0;
// TX byte and send
tx_byte_r [8] = 8'h00;
tx_send_r [1] = 1'b0;
tx_tag_cnt [5] = 5'd0;
// Partial block tracking
is_last_blk [1] = 1'b0;
partial_len [3] = 3'd0; // bytes in last block if partial
// Latch ascon done (it may pulse while we're receiving tag)
done_latch [1] = 1'b0;
// LED
led_reg [6] = 6'b000000;
}
ASYNCHRONOUS {
reset = por_n & rst_n;
leds <= led_reg;
}
@new por0 por {
IN [1] clk = clk;
IN [1] done = por;
OUT [1] por_n = por_n;
}
@new urx0 uart_rx {
OVERRIDE {
CLK_MHZ = CONFIG.CLK_MHZ;
}
IN [1] clk = clk;
IN [1] rst_n = reset;
IN [1] rx = uart_rx;
OUT [8] data = rx_data;
OUT [1] valid = rx_valid;
}
@new utx0 uart_tx {
OVERRIDE {
CLK_MHZ = CONFIG.CLK_MHZ;
}
IN [1] clk = clk;
IN [1] rst_n = reset;
IN [8] data = tx_byte_r;
IN [1] valid = tx_send_r;
OUT [1] ready = tx_ready;
OUT [1] tx = uart_tx;
}
@new ac0 ascon {
IN [1] clk = clk;
IN [1] rst_n = reset;
IN [1] cmd_start = ac_start_r;
IN [1] cmd = ac_cmd_r;
IN [128] key = key_sr;
IN [128] nonce = nonce_sr;
IN [64] din = ac_din_r;
IN [1] din_valid = ac_din_v_r;
IN [1] din_last = ac_din_last_r;
IN [1] din_empty = ac_din_empty_r;
IN [3] din_partial_len = ac_din_plen_r;
IN [128] tag_in = tag_in_sr;
OUT [64] dout = ac_dout;
OUT [1] dout_valid = ac_dout_v;
OUT [128] tag_out = ac_tag_out;
OUT [1] tag_valid = ac_tag_ok;
OUT [1] done = ac_done;
OUT [1] ready = ac_ready;
}
SYNCHRONOUS(CLK=clk RESET=reset RESET_ACTIVE=Low) {
SELECT (state) {
// ---- Receive command byte ----
CASE (lit(4, S_RX_CMD)) {
led_reg <= 6'b000001;
ac_start_r <= 1'b0;
ac_din_v_r <= 1'b0;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
tx_send_r <= 1'b0;
IF (rx_valid == 1'b1) {
IF (rx_data == 8'h45) {
cmd_reg <= 1'b0;
byte_cnt <= 8'd0;
state <= lit(4, S_RX_KEY);
} ELIF (rx_data == 8'h44) {
cmd_reg <= 1'b1;
byte_cnt <= 8'd0;
state <= lit(4, S_RX_KEY);
}
}
}
// ---- Receive 16-byte key ----
CASE (lit(4, S_RX_KEY)) {
led_reg <= 6'b000011;
ac_start_r <= 1'b0;
ac_din_v_r <= 1'b0;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
tx_send_r <= 1'b0;
IF (rx_valid == 1'b1) {
key_sr <= { key_sr[119:0], rx_data };
IF (byte_cnt == 8'd15) {
byte_cnt <= 8'd0;
state <= lit(4, S_RX_NONCE);
} ELSE {
byte_cnt <= byte_cnt + 8'd1;
}
}
}
// ---- Receive 16-byte nonce ----
CASE (lit(4, S_RX_NONCE)) {
ac_start_r <= 1'b0;
ac_din_v_r <= 1'b0;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
tx_send_r <= 1'b0;
IF (rx_valid == 1'b1) {
nonce_sr <= { nonce_sr[119:0], rx_data };
IF (byte_cnt == 8'd15) {
state <= lit(4, S_RX_LEN);
} ELSE {
byte_cnt <= byte_cnt + 8'd1;
}
}
}
// ---- Receive data length byte ----
CASE (lit(4, S_RX_LEN)) {
ac_start_r <= 1'b0;
ac_din_v_r <= 1'b0;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
tx_send_r <= 1'b0;
IF (rx_valid == 1'b1) {
data_len <= rx_data;
bytes_done <= 8'd0;
blk_pos <= 3'd0;
blk_acc <= 64'h0000000000000000;
partial_len <= rx_data[2:0];
state <= lit(4, S_INIT);
}
}
// ---- Start Ascon init ----
CASE (lit(4, S_INIT)) {
led_reg <= 6'b001111;
ac_start_r <= 1'b1;
ac_cmd_r <= cmd_reg;
ac_din_v_r <= 1'b0;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
tx_send_r <= 1'b0;
done_latch <= 1'b0;
state <= lit(4, S_INIT_WAIT);
}
// ---- Wait for Ascon init to complete ----
CASE (lit(4, S_INIT_WAIT)) {
ac_start_r <= 1'b0;
ac_din_v_r <= 1'b0;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
tx_send_r <= 1'b0;
IF (ac_ready == 1'b1) {
IF (data_len == 8'd0) {
// Empty message: send padding block
state <= lit(4, S_EMPTY_FEED);
} ELSE {
// Start receiving data
state <= lit(4, S_RX_BLK);
}
}
}
// ---- Receive 8 data bytes into block accumulator ----
CASE (lit(4, S_RX_BLK)) {
ac_start_r <= 1'b0;
ac_din_v_r <= 1'b0;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
tx_send_r <= 1'b0;
IF (rx_valid == 1'b1) {
blk_acc <= { blk_acc[55:0], rx_data };
bytes_done <= bytes_done + 8'd1;
IF (bytes_done + 8'd1 == data_len) {
// All data received
is_last_blk <= 1'b1;
blk_pos <= blk_pos + 3'd1;
state <= lit(4, S_FEED);
} ELIF (blk_pos == 3'd7) {
// Full 8-byte block ready
blk_pos <= 3'd0;
is_last_blk <= 1'b0;
state <= lit(4, S_FEED);
} ELSE {
blk_pos <= blk_pos + 3'd1;
}
}
}
// ---- Feed block to Ascon ----
CASE (lit(4, S_FEED)) {
ac_start_r <= 1'b0;
tx_send_r <= 1'b0;
IF (is_last_blk == 1'b1 && blk_pos != 3'd0) {
// Partial last block: pad with 0x80 after last byte
SELECT (blk_pos) {
CASE (3'd1) {
ac_din_r <= { blk_acc[7:0], 8'h80, 48'h000000000000 };
}
CASE (3'd2) {
ac_din_r <= { blk_acc[15:0], 8'h80, 40'h0000000000 };
}
CASE (3'd3) {
ac_din_r <= { blk_acc[23:0], 8'h80, 32'h00000000 };
}
CASE (3'd4) {
ac_din_r <= { blk_acc[31:0], 8'h80, 24'h000000 };
}
CASE (3'd5) {
ac_din_r <= { blk_acc[39:0], 8'h80, 16'h0000 };
}
CASE (3'd6) {
ac_din_r <= { blk_acc[47:0], 8'h80, 8'h00 };
}
CASE (3'd7) {
ac_din_r <= { blk_acc[55:0], 8'h80 };
}
DEFAULT {
ac_din_r <= 64'h8000000000000000;
}
}
ac_din_v_r <= 1'b1;
ac_din_last_r <= 1'b1;
ac_din_empty_r <= 1'b0;
// Partial block: only send partial_len bytes in TX
tx_cnt <= { 1'b0, partial_len };
ac_din_plen_r <= partial_len;
state <= lit(4, S_ASCON_WAIT);
} ELIF (is_last_blk == 1'b1) {
// Full last block: process normally, then need empty pad
ac_din_r <= blk_acc;
ac_din_v_r <= 1'b1;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
ac_din_plen_r <= 3'd0;
tx_cnt <= 4'd8;
state <= lit(4, S_ASCON_WAIT);
} ELSE {
// Intermediate full block
ac_din_r <= blk_acc;
ac_din_v_r <= 1'b1;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
ac_din_plen_r <= 3'd0;
tx_cnt <= 4'd8;
state <= lit(4, S_ASCON_WAIT);
}
}
// ---- Wait for Ascon, capture output, then TX ----
CASE (lit(4, S_ASCON_WAIT)) {
ac_start_r <= 1'b0;
ac_din_v_r <= 1'b0;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
tx_send_r <= 1'b0;
IF (ac_dout_v == 1'b1) {
tx_shift <= ac_dout;
state <= lit(4, S_TX_BLK);
}
}
// ---- TX: send bytes from shift register ----
CASE (lit(4, S_TX_BLK)) {
ac_start_r <= 1'b0;
ac_din_v_r <= 1'b0;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
IF (tx_ready == 1'b1 && tx_send_r == 1'b0) {
tx_byte_r <= tx_shift[63:56];
tx_shift <= { tx_shift[55:0], 8'h00 };
tx_send_r <= 1'b1;
IF (ac_done == 1'b1) {
done_latch <= 1'b1;
}
IF (tx_cnt == 4'd1) {
// Last byte of this block sent
IF (is_last_blk == 1'b1 && partial_len != 3'd0) {
// Partial last block was already din_last
// Wait for ascon done
IF (cmd_reg == 1'b1) {
byte_cnt <= 8'd0;
state <= lit(4, S_RX_TAG);
} ELSE {
state <= lit(4, S_DONE_WAIT);
}
} ELIF (is_last_blk == 1'b1) {
// Full last block: need empty padding block
state <= lit(4, S_EMPTY_FEED);
} ELSE {
// More blocks to receive
blk_pos <= 3'd0;
blk_acc <= 64'h0000000000000000;
state <= lit(4, S_RX_BLK);
}
} ELSE {
tx_cnt <= tx_cnt - 4'd1;
}
} ELSE {
tx_send_r <= 1'b0;
IF (ac_done == 1'b1) {
done_latch <= 1'b1;
}
}
}
// ---- Feed empty padding block ----
CASE (lit(4, S_EMPTY_FEED)) {
ac_start_r <= 1'b0;
tx_send_r <= 1'b0;
IF (ac_ready == 1'b1) {
ac_din_r <= 64'h0000000000000000;
ac_din_v_r <= 1'b1;
ac_din_last_r <= 1'b1;
ac_din_empty_r <= 1'b1;
IF (cmd_reg == 1'b1) {
byte_cnt <= 8'd0;
state <= lit(4, S_RX_TAG);
} ELSE {
state <= lit(4, S_DONE_WAIT);
}
} ELSE {
ac_din_v_r <= 1'b0;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
}
}
// ---- Receive 16-byte tag (decrypt only) ----
CASE (lit(4, S_RX_TAG)) {
ac_start_r <= 1'b0;
ac_din_v_r <= 1'b0;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
tx_send_r <= 1'b0;
IF (ac_done == 1'b1) {
done_latch <= 1'b1;
}
IF (rx_valid == 1'b1) {
tag_in_sr <= { tag_in_sr[119:0], rx_data };
IF (byte_cnt == 8'd15) {
state <= lit(4, S_DONE_WAIT);
} ELSE {
byte_cnt <= byte_cnt + 8'd1;
}
}
}
// ---- Wait for Ascon finalization ----
CASE (lit(4, S_DONE_WAIT)) {
ac_start_r <= 1'b0;
ac_din_v_r <= 1'b0;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
tx_send_r <= 1'b0;
IF (done_latch == 1'b1) {
// Done already fired while we were in S_RX_TAG or S_TX_BLK
// tag_out and tag_valid persist in ascon (not cleared in IDLE)
tag_result <= ac_tag_out;
tag_ok_reg <= ac_tag_ok;
tx_tag_cnt <= 5'd0;
done_latch <= 1'b0;
state <= lit(4, S_TX_STATUS);
} ELIF (ac_done == 1'b1) {
tag_result <= ac_tag_out;
tag_ok_reg <= ac_tag_ok;
tx_tag_cnt <= 5'd0;
state <= lit(4, S_TX_STATUS);
}
}
// ---- TX: send status byte ----
CASE (lit(4, S_TX_STATUS)) {
led_reg <= 6'b011111;
ac_start_r <= 1'b0;
ac_din_v_r <= 1'b0;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
IF (tx_ready == 1'b1 && tx_send_r == 1'b0) {
IF (cmd_reg == 1'b1 && tag_result != tag_in_sr) {
tx_byte_r <= 8'h46;
} ELSE {
tx_byte_r <= 8'h4B;
}
tx_send_r <= 1'b1;
IF (cmd_reg == 1'b0) {
state <= lit(4, S_TX_TAG);
} ELSE {
state <= lit(4, S_TX_DONE);
}
} ELSE {
tx_send_r <= 1'b0;
}
}
// ---- TX: send 16-byte tag (encrypt only) ----
CASE (lit(4, S_TX_TAG)) {
ac_start_r <= 1'b0;
ac_din_v_r <= 1'b0;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
IF (tx_ready == 1'b1 && tx_send_r == 1'b0) {
tx_byte_r <= tag_result[127:120];
tag_result <= { tag_result[119:0], 8'h00 };
tx_send_r <= 1'b1;
IF (tx_tag_cnt == 5'd15) {
state <= lit(4, S_TX_DONE);
} ELSE {
tx_tag_cnt <= tx_tag_cnt + 5'd1;
}
} ELSE {
tx_send_r <= 1'b0;
}
}
// ---- Done, return to idle ----
CASE (lit(4, S_TX_DONE)) {
led_reg <= 6'b000000;
ac_start_r <= 1'b0;
ac_din_v_r <= 1'b0;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
tx_send_r <= 1'b0;
state <= lit(4, S_RX_CMD);
}
DEFAULT {
ac_start_r <= 1'b0;
ac_din_v_r <= 1'b0;
ac_din_last_r <= 1'b0;
ac_din_empty_r <= 1'b0;
tx_send_r <= 1'b0;
state <= lit(4, S_RX_CMD);
}
}
}
@endmod
jz
// Simple UART Receiver — 8N1, no FIFO
// Pulses valid for 1 cycle when a byte is received
@module uart_rx
CONST {
CLK_MHZ = 27;
BAUD_DIV = (CLK_MHZ * 1000000 / 115200) - 1;
HALF_BAUD = BAUD_DIV / 2;
}
PORT {
IN [1] clk;
IN [1] rst_n;
IN [1] rx;
OUT [8] data;
OUT [1] valid;
}
REGISTER {
// Metastability synchronizer
rx_sync1 [1] = 1'b1;
rx_sync2 [1] = 1'b1;
// State machine (0=IDLE, 1=START, 2=DATA, 3=STOP)
state [2] = 2'd0;
baud_cnt [16] = 16'd0;
bit_cnt [3] = 3'd0;
shift [8] = 8'h00;
// Output
data_out [8] = 8'h00;
valid_out [1] = 1'b0;
}
ASYNCHRONOUS {
data <= data_out;
valid <= valid_out;
}
SYNCHRONOUS(CLK=clk RESET=rst_n RESET_ACTIVE=Low) {
// 2-stage synchronizer for async RX input
rx_sync1 <= rx;
rx_sync2 <= rx_sync1;
SELECT (state) {
CASE (2'd0) {
// IDLE: wait for start bit (falling edge)
valid_out <= 1'b0;
IF (rx_sync2 == 1'b0) {
baud_cnt <= lit(16, HALF_BAUD);
state <= 2'd1;
}
}
CASE (2'd1) {
// START: verify start bit at mid-point
valid_out <= 1'b0;
IF (baud_cnt == 16'd0) {
IF (rx_sync2 == 1'b0) {
baud_cnt <= lit(16, BAUD_DIV);
bit_cnt <= 3'd0;
shift <= 8'h00;
state <= 2'd2;
} ELSE {
// False start
state <= 2'd0;
}
} ELSE {
baud_cnt <= baud_cnt - 16'd1;
}
}
CASE (2'd2) {
// DATA: sample 8 bits at mid-bit
valid_out <= 1'b0;
IF (baud_cnt == 16'd0) {
shift <= { rx_sync2, shift[7:1] };
IF (bit_cnt == 3'd7) {
baud_cnt <= lit(16, BAUD_DIV);
state <= 2'd3;
} ELSE {
bit_cnt <= bit_cnt + 3'd1;
baud_cnt <= lit(16, BAUD_DIV);
}
} ELSE {
baud_cnt <= baud_cnt - 16'd1;
}
}
CASE (2'd3) {
// STOP: wait for stop bit, output byte
IF (baud_cnt == 16'd0) {
data_out <= shift;
valid_out <= 1'b1;
state <= 2'd0;
} ELSE {
valid_out <= 1'b0;
baud_cnt <= baud_cnt - 16'd1;
}
}
DEFAULT {
valid_out <= 1'b0;
state <= 2'd0;
}
}
}
@endmod
jz
// Simple UART Transmitter — 8N1, no FIFO
// Asserts ready when idle. When valid is pulsed with data, transmits one byte.
@module uart_tx
CONST {
CLK_MHZ = 27;
BAUD_DIV = (CLK_MHZ * 1000000 / 115200) - 1;
}
PORT {
IN [1] clk;
IN [1] rst_n;
IN [8] data;
IN [1] valid;
OUT [1] ready;
OUT [1] tx;
}
REGISTER {
// State machine (0=IDLE, 1=START, 2=DATA, 3=STOP)
state [2] = 2'd0;
baud_cnt [16] = 16'd0;
bit_cnt [3] = 3'd0;
shift [8] = 8'hFF;
// Outputs
tx_out [1] = 1'b1;
ready_out [1] = 1'b1;
}
ASYNCHRONOUS {
tx <= tx_out;
ready <= ready_out;
}
SYNCHRONOUS(CLK=clk RESET=rst_n RESET_ACTIVE=Low) {
SELECT (state) {
CASE (2'd0) {
// IDLE: line high, ready for data
tx_out <= 1'b1;
IF (valid == 1'b1) {
shift <= data;
baud_cnt <= lit(16, BAUD_DIV);
state <= 2'd1;
ready_out <= 1'b0;
} ELSE {
ready_out <= 1'b1;
}
}
CASE (2'd1) {
// START bit: hold TX low for one baud period
tx_out <= 1'b0;
ready_out <= 1'b0;
IF (baud_cnt == 16'd0) {
baud_cnt <= lit(16, BAUD_DIV);
bit_cnt <= 3'd0;
state <= 2'd2;
} ELSE {
baud_cnt <= baud_cnt - 16'd1;
}
}
CASE (2'd2) {
// DATA: shift out 8 bits LSB first
tx_out <= shift[0];
ready_out <= 1'b0;
IF (baud_cnt == 16'd0) {
shift <= { 1'b1, shift[7:1] };
IF (bit_cnt == 3'd7) {
baud_cnt <= lit(16, BAUD_DIV);
state <= 2'd3;
} ELSE {
bit_cnt <= bit_cnt + 3'd1;
baud_cnt <= lit(16, BAUD_DIV);
}
} ELSE {
baud_cnt <= baud_cnt - 16'd1;
}
}
CASE (2'd3) {
// STOP bit: hold TX high for one baud period
tx_out <= 1'b1;
IF (baud_cnt == 16'd0) {
state <= 2'd0;
ready_out <= 1'b1;
} ELSE {
ready_out <= 1'b0;
baud_cnt <= baud_cnt - 16'd1;
}
}
DEFAULT {
tx_out <= 1'b1;
ready_out <= 1'b1;
state <= 2'd0;
}
}
}
@endmod
jz
@module por
PORT {
IN [1] clk;
IN [1] done;
OUT [1] por_n;
}
CONST {
POR_CYCLES = 16;
POR_CNT_BITS = clog2(POR_CYCLES);
POR_MAX = POR_CYCLES - 1;
}
REGISTER {
por_reg [1] = 1'b0;
cnt [POR_CNT_BITS] = POR_CNT_BITS'b0;
}
ASYNCHRONOUS {
por_n <= por_reg;
}
SYNCHRONOUS(CLK=clk) {
IF (done == 1'b0) {
por_reg <= 1'b0;
cnt <= POR_CNT_BITS'b0;
} ELIF (cnt == lit(POR_CNT_BITS, POR_MAX)) {
por_reg <= 1'b1;
cnt <= cnt;
} ELSE {
por_reg <= 1'b0;
cnt <= cnt + POR_CNT_BITS'b1;
}
}
@endmod
JZ-HDL Language Features
Combinational permutation round in ASYNCHRONOUS. All four stages of the Ascon S-box and linear layer are expressed as chained wire assignments. The compiler verifies that there are no combinational loops and that every wire is driven exactly once. Both are bugs that could silently produce wrong ciphertext in Verilog.
Width-safe concatenation and slicing. The padding logic, key splitting, and circular shifts use explicit widths throughout (for example { din[63:24], s0[23:16] ^ 8'h80, s0[15:0] }). The compiler checks that every concatenation and slice produces exactly the declared width, preventing the off-by-one bit errors that plague hand-written cryptographic RTL.
clog2() compile-time evaluation. The POR module uses clog2(POR_CYCLES) to compute the counter width. The compiler evaluates this at compile time, so the counter is exactly wide enough without manual calculation.
Single-driver enforcement. Each register and wire has exactly one driver. The Ascon engine's 320-bit state is modified in multiple FSM states, but the compiler verifies through control-flow analysis that no register can be assigned from two places in the same cycle. In Verilog, this guarantee would require manual review.