finished the hub's logic, half done with the interfaces, and hopefully working on pi logic soon

2025-06-05 22:38:26 -04:00
parent f61de84b4a
commit a94823b44a
8 changed files with 304 additions and 266 deletions


@ -0,0 +1,71 @@
# Reworking Almost Done
Date: 2025-06-05
## Goals and expectations
Build the framework for the entire fabric's logic.
## Results
Nicely done.
This should've been split into several commits, but it's still work on
the same project. The following is all that's been done:
1. Completed the hub's logic.
2. Began reworking the interfaces' logic; the RX side is already done
(the RX side always seems simpler).
3. Built my own `git` server using `gitea` and `nginx` to host my
stuff (ROSE included).
4. Removed `specifications.md` and revised `protocol.md` to keep them
updated with the reworked routing logic.
## Is KISS elegance?
> Keep It Simple, Stupid.
Is this elegance?
It can be. When everything else works within strict bounds (e.g. the
hub's strict logic), leveraging that strictness to simplify other
logic (e.g. adding a cooldown period for queue slot allocation) is
elegance, and that simplicity is elegance in its most natural form.
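To make the cooldown idea concrete, here is a minimal behavioral sketch in Python (a model, not the SystemVerilog in this commit). It assumes a slot is granted only when the packet address is 0 and the cooldown counter has expired, and that the counter ticks down once per service turn. `SlotAllocator`, `tick`, and the small demo cooldown value are hypothetical names and numbers chosen for illustration.

```python
from collections import deque

NEW_SLOT_COOLDOWN = 3  # assumed small value for the demo; params.svh uses 500


class SlotAllocator:
    """Toy model: hand out at most one queue slot per cooldown window."""

    def __init__(self, queue_size, cooldown=NEW_SLOT_COOLDOWN):
        self.free = deque(range(queue_size))  # free slot addresses
        self.cooldown = cooldown
        self.remaining = 0  # turns left before another slot may be granted

    def tick(self, pkt_addr):
        """Advance one service turn; return a slot address or None."""
        granted = None
        if pkt_addr == 0 and self.remaining == 0 and self.free:
            granted = self.free.popleft()
            self.remaining = self.cooldown
        elif self.remaining > 0:
            self.remaining -= 1
        return granted


alloc = SlotAllocator(queue_size=4)
# The interface keeps presenting packet address 0 for several turns,
# but only one slot is handed out per cooldown window.
grants = [alloc.tick(pkt_addr=0) for _ in range(6)]
print(grants)
```

The point of the sketch is that no handshake is needed to avoid double allocation: as long as the surrounding logic is strict about timing, a plain countdown is enough.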
## Reflections
1. KISS. Letting the interfaces manage the packet addresses was a
great decision. It dumbed down the hub's logic at the cost of
adding a bit of complexity to the interfaces' logic. It also lets
each component do one thing and do that thing well.
2. Be organized. Organize the components by their functionalities.
It proves very useful when there are a dozen latches going in and
out of the interfaces/hub and you'd like to know whether you
missed something.
3. Get sidetracked, absorb outside stimulation. Even though my last
devlog said to focus, there's still the need to know how other
things operate, even if they may seem irrelevant. I remember
reading the code for `SMaRTT-REPS` and `STrack` and some new CC
algorithms for UET during my time at Huawei, and then reading one
of their proprietary CC algorithms, which had much simpler logic
but performed similarly to the more complex ones mentioned above.
Getting sidetracked would help you think - is there a simpler way
to achieve what I want?
4. Get started. I think I stress this every time two commits land
days apart, but I can never stress it enough. I spent hours
deciding whether or not to start coding with the time I had on
hand, but the real action only took a fraction of that time.
Begin. Progress is progress, no matter how small.
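As an illustration of reflection 1: once the interfaces track the packet address, the hub only has to concatenate a queue slot address with that packet address to get a memory pool address. The sketch below is a Python mirror of that idea, assuming the widths from `params.svh` (`PACKET_ADDR_LEN = 6`, `QUEUE_ADDR_LEN = 10`); the function names are hypothetical.

```python
PACKET_ADDR_LEN = 6   # from params.svh: 64-byte packets
QUEUE_ADDR_LEN = 10   # from params.svh: 1024 queue slots


def mem_addr(queue_addr, pkt_addr):
    """Mirror of {queue_addr, pkt_addr}: slot index in the high bits."""
    if not 0 <= queue_addr < (1 << QUEUE_ADDR_LEN):
        raise ValueError("queue_addr out of range")
    if not 0 <= pkt_addr < (1 << PACKET_ADDR_LEN):
        raise ValueError("pkt_addr out of range")
    return (queue_addr << PACKET_ADDR_LEN) | pkt_addr


def packet_complete(pkt_addr):
    """Mirror of &pkt_addr: true only when every address bit is 1."""
    return pkt_addr == (1 << PACKET_ADDR_LEN) - 1


print(mem_addr(queue_addr=3, pkt_addr=5))
print(packet_complete(63))
```

Because the packet size is a power of 2, the concatenation needs no multiplier and the "last byte" check is a single AND-reduction.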
## Final thoughts
There are a lot of ideas that are still "under validation" in my mind,
for example implementing a dual-issue round-robin scheme for the hub
to increase the maximum throughput and prepare ROSE for scale. They
don't go unnoticed: I add them to the devlog, hoping that someday,
when I get the leisure to return to them, I can implement them.
Also, finding time for ROSE has been hard; only fragments of time have
been found on most days. But nothing says I can't do something with
that time.
## Next steps
Complete the TX logic, figure out the byte syncing for the SPI side of
things, test everything on a testbench (TB), and hopefully start
making my way back to C land on the RPis.


@ -1,21 +0,0 @@
# Protocol Specifications for ROSE
## Protocol header
ROSE will use a 20-byte header for its packets. The header contains
the following fields:
1. *CMD* (8 bits): Contains the command to the fabric or
interconnected devices, see *Commands* for details.
2. *SRC* (16 bits): Contains the packet source address.
3. *DEST* (16 bits): Contains the packet destination address.
4. *ID* (32 bits): Contains a packet identifier unique to the source
over a large time frame.
5. *RESERVED* (64 bits): Reserved for future iterations.
6. *CRC* (8 bits): Reserved for future iterations incorporating CRC
checksums.
7. *LEN* (16 bits): Length of the payload.
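The field widths above add up to 160 bits, i.e. the stated 20 bytes. A minimal Python sketch of packing and unpacking such a header follows. It assumes the fields are laid out in the listed order, big-endian, with no padding; none of that is stated in the spec, and `pack_header`/`unpack_header` are hypothetical helper names.

```python
import struct

# Assumed layout: fields in the listed order, big-endian, no padding.
# B = 8 bits, H = 16 bits, I = 32 bits, Q = 64 bits.
HEADER_FORMAT = ">BHHIQBH"  # CMD, SRC, DEST, ID, RESERVED, CRC, LEN
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def pack_header(cmd, src, dest, pkt_id, length, crc=0):
    """Build a header; RESERVED is always written as zero."""
    return struct.pack(HEADER_FORMAT, cmd, src, dest, pkt_id, 0, crc, length)


def unpack_header(raw):
    """Split a raw header back into named fields."""
    cmd, src, dest, pkt_id, _reserved, crc, length = struct.unpack(
        HEADER_FORMAT, raw
    )
    return {
        "cmd": cmd,
        "src": src,
        "dest": dest,
        "id": pkt_id,
        "crc": crc,
        "len": length,
    }


header = pack_header(cmd=1, src=0x0001, dest=0x0002, pkt_id=42, length=44)
print(HEADER_SIZE, unpack_header(header))
```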
## SPI Specifications
We will be using SPI in mode 0 (CPOL = 0, CPHA = 0), where data is
sampled on the rising edge of the clock pulse and shifted out on the
falling edge.
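For reference, SPI mode 0 is conventionally defined as CPOL = 0, CPHA = 0: the clock idles low, the receiver samples on the rising edge, and the transmitter changes the data line on the falling edge. The toy Python model below walks one byte through that sequence. MSB-first ordering is an assumption here (it matches the shift registers in `spi_interface`), and `spi_mode0_transfer` is a hypothetical name.

```python
def spi_mode0_transfer(tx_byte):
    """Clock one byte MSB-first; return the byte the receiver sampled."""
    tx_shift = tx_byte & 0xFF
    rx_shift = 0
    # The first bit is already on the line before the first clock edge.
    line = (tx_shift >> 7) & 1
    for _ in range(8):
        # Rising edge: the receiver samples the data line.
        rx_shift = ((rx_shift << 1) | line) & 0xFF
        # Falling edge: the transmitter shifts out the next bit.
        tx_shift = (tx_shift << 1) & 0xFF
        line = (tx_shift >> 7) & 1
    return rx_shift


print(bin(spi_mode0_transfer(0b00101010)))
```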


@ -1,30 +1,35 @@
`include <params.sv>
`include <params.svh>
`include <routing.svh>
// IMPORTANT: interfaces are supposed to keep track of their own packet states
module hub(
input logic sys_clk,
input logic rst,
input logic [INTERFACE_CNT - 1][PACKET_ADDR_LEN - 1:0] rx_pkt_addr,
input logic [INTERFACE_CNT - 1:0][7:0] rx_byte,
input logic [INTERFACE_CNT - 1:0] rx_valid,
input logic [INTERFACE_CNT - 1:0] tx_ready,
input logic [INTERFACE_CNT - 1:0] tx_full,
input logic [INTERFACE_CNT - 1:0][PACKET_ADDR_LEN - 1:0] tx_pkt_addr,
input logic [INTERFACE_CNT - 1:0] rx_new_packet,
output logic [INTERFACE_CNT - 1:0] rx_ready,
output logic [INTERFACE_CNT - 1:0][PACKET_ADDR_LEN - 1:0] tx_queue_addr,
output logic [INTERFACE_CNT - 1:0] tx_queue_addr_valid,
output logic [INTERFACE_CNT - 1:0][7:0] tx_byte,
output logic [INTERFACE_CNT - 1:0] tx_valid);
input logic sys_clk,
input logic [7:0] rx_byte [INTERFACE_CNT],
input logic rx_valid [INTERFACE_CNT],
output logic rx_ready [INTERFACE_CNT],
input logic [PACKET_ADDR_LEN - 1:0] rx_pkt_addr [INTERFACE_CNT],
output logic [7:0] tx_byte [INTERFACE_CNT],
output logic tx_valid [INTERFACE_CNT],
input logic tx_ready [INTERFACE_CNT],
input logic [PACKET_ADDR_LEN - 1:0] tx_pkt_addr [INTERFACE_CNT],
input logic [QUEUE_ADDR_LEN - 1:0] tx_queue_addr [INTERFACE_CNT],
output logic [QUEUE_ADDR_LEN - 1:0] tx_new_queue[INTERFACE_CNT],
output logic tx_new_queue_valid [INTERFACE_CNT],
input logic tx_new_queue_ready [INTERFACE_CNT],
output logic free_queue_empty);
timeunit 1ns;
timeprecision 1ps;
logic [INTERFACE_CNT - 1:0] curr_service;
logic [INTERFACE_ADDR_LEN - 1:0] curr;
logic [INTERFACE_ADDR_LEN - 1:0] dest_buff [INTERFACE_CNT];
logic reuse_queue_slot [INTERFACE_CNT];
logic last_served_read;
logic request_new_slot;
logic [QUEUE_ADDR_LEN - 1:0] new_slot_addr;
logic free_queue_empty;
logic [QUEUE_ADDR_LEN - 1:0] empty_slot_addr;
logic [QUEUE_ADDR_LEN - 1:0] empty_slot_enqueue;
logic empty_slot_enqueue;
free_queue fqueue(.sys_clk(sys_clk),
.rst(rst),
@ -34,12 +39,13 @@ module hub(
.new_slot_addr(new_slot_addr),
.queue_empty(free_queue_empty));
logic [INTERFACE_CNT - 1:0][MEMORY_ADDR_LEN - 1:0] rx_mem_addr;
logic [QUEUE_ADDR_LEN - 1:0] rx_queue_addr [INTERFACE_CNT];
logic [MEMORY_POOL_ADDR_LEN - 1:0] mem_read_addr;
logic [7:0] mem_read_byte;
logic [MEMORY_POOL_ADDR_LEN - 1:0] mem_write_addr;
logic mem_write_enable;
logic [7:0] mem_write_byte;
shortint new_slot_cooldown [INTERFACE_CNT];
memory_pool mpool(.sys_clk(sys_clk),
.rst(rst),
@ -49,49 +55,88 @@ module hub(
.write_enable(mem_write_enable),
.read_byte(mem_read_byte));
always_ff @ (posedge sys_clk or rst) begin
if (rst) begin
tx_queue_addr <= '0;
tx_queue_addr_valid <= '0;
tx_byte <= '0;
tx_valid <= '0;
curr_service <= '0;
rx_ready <= '0;
rx_mem_addr <= '0;
curr <= '0;
for (int i = 0; i < INTERFACE_CNT; i++) begin
tx_new_queue[i] <= '0;
tx_new_queue_valid[i] <= 0;
tx_byte[i] <= '0;
tx_valid[i] <= 0;
rx_ready[i] <= 0;
rx_queue_addr[i] <= '0;
dest_buff[i] <= '0;
reuse_queue_slot[i] <= 0;
new_slot_cooldown[i] <= 0;
end
mem_read_addr <= '0;
mem_write_addr <= '0;
mem_write_enable <= 0;
mem_write_byte <= '0;
end else begin
// NOTE: signaled the servicing interface in the last cycle
rx_ready[curr_service] <= 0;
rx_ready[curr_service + 1] <= 1;
rx_ready[curr] <= 0;
rx_ready[curr + 1] <= 1;
tx_new_queue_valid[dest_buff[curr - 1]] <= 0;
// IMPORTANT: interfaces should send the byte no matter what, rx_ready is to prevent sending a new byte
if (rx_valid[curr_service]) begin
if (rx_valid[curr]) begin
// IMPORTANT: memory_write_addr is ready on the next cycle
if (rx_new_packet[curr_service]) begin
if (rx_pkt_addr[curr] == 0 && !reuse_queue_slot[curr] &&
!(|new_slot_cooldown[curr])) begin
if (free_queue_empty) begin
// TODO: handle the drop logic
end else begin
request_new_slot <= 1;
rx_mem_addr[{curr_service,
MEMORY_POOL_ADDR_SHIFT'd0}
+:MEMORY_POOL_ADDR_LEN
] <= {new_slot_addr, PACKET_ADDR_LEN'd0};
mem_write_addr <= {new_slot_addr, PACKET_ADDR_LEN'd0};
rx_queue_addr[curr] <= new_slot_addr;
mem_write_addr <= {new_slot_addr, rx_pkt_addr[curr]};
new_slot_cooldown[curr] <= NEW_SLOT_COOLDOWN;
end
end else begin // if (rx_new_packet[curr_service])
// NOTE: if memory
mem_write_addr <= mem_write_addr + 1;
end else begin // if (rx_new_packet[curr])
reuse_queue_slot[curr] <= 0;
mem_write_addr <= {rx_queue_addr[curr], rx_pkt_addr[curr]};
request_new_slot <= 0;
end // else: !if(rx_new_packet[curr_service])
mem_write_byte <= rx_byte[{curr_service, 3'd0}+:8];
if (rx_pkt_addr[curr] == ROSE_DEST_INDEX) begin
dest_buff[curr] <= next_hop(rx_byte[curr]);
end
end // else: !if(rx_new_packet[curr])
if (|new_slot_cooldown[curr]) begin
new_slot_cooldown[curr] <= new_slot_cooldown[curr] - 1;
end
mem_write_byte <= rx_byte[curr];
mem_write_enable <= 1;
end else // if (rx_valid[curr_service])
if (&rx_pkt_addr[curr]) begin // packet complete
if (tx_new_queue_ready[dest_buff[curr]]) begin
tx_new_queue[dest_buff[curr]] <= rx_queue_addr[curr];
tx_new_queue_valid[dest_buff[curr]] <= 1;
end else begin
reuse_queue_slot[curr] <= 1;
end
end
end else begin // if (rx_valid[curr])
mem_write_enable <= 0;
end // else: !if(rx_valid[curr])
// IMPORTANT: tx_ready is only signaled when tx_pkt_addr is valid
if (tx_ready[curr]) begin
last_served_read <= 1;
mem_read_addr <= {tx_queue_addr[curr], tx_pkt_addr[curr]};
if (&tx_pkt_addr[curr]) begin
empty_slot_addr <= tx_queue_addr[curr];
empty_slot_enqueue <= 1;
end else begin
empty_slot_enqueue <= 0;
end
end else begin
empty_slot_enqueue <= 0;
last_served_read <= 0;
end // else: !if(tx_ready[curr])
if (last_served_read) begin
tx_byte[curr - 1] <= mem_read_byte;
tx_valid[curr - 1] <= 1;
end else begin
tx_valid[curr - 1] <= 0;
end
end
end
endmodule // hub
@ -123,8 +168,8 @@ module free_queue(input logic sys_clk,
always_ff @ (posedge sys_clk or rst) begin
if (rst) begin
head <= '0;
tail <= QUEUE_ADDR_LEN'd1;
queue_size = QUEUE_SIZE;
tail <= {QUEUE_ADDR_LEN{1'd1}};
queue_size <= QUEUE_SIZE;
new_slot_addr <= '0;
end else begin
if (request_new_slot) begin


@ -1,23 +1,25 @@
// NOTE: The first byte is used for syncing due to using different clock domains
`define SYNC_2FF
`include <params.svh>
module spi_interface(
input logic rst,
input logic sys_clk,
input logic mosi,
input logic cs,
input logic sclk,
input logic rx_ready,
input logic tx_valid,
input logic [7:0] tx_byte,
input logic [1:0] tx_src,
input logic [1:0] packet_size,
output logic miso,
output logic tx_ready,
output logic rx_valid,
output logic [7:0] rx_byte,
output logic [7:0] rx_dest,
output logic [7:0] rx_cmd,
output logic rx_cmd_valid);
output logic rx_valid,
input logic rx_ready,
output logic [PACKET_ADDR_LEN - 1:0] rx_pkt_addr,
input logic [7:0] tx_byte,
input logic tx_valid,
output logic tx_ready,
output logic [PACKET_ADDR_LEN - 1:0] tx_pkt_addr,
output logic [QUEUE_ADDR_LEN - 1:0] tx_queue_addr,
input logic [QUEUE_ADDR_LEN - 1:0] tx_new_queue,
input logic tx_new_queue_valid,
output logic tx_new_queue_ready,
input logic free_queue_empty);
timeunit 1ns;
timeprecision 1ps;
@ -32,7 +34,7 @@ module spi_interface(
.clk_rising_edge(sclk_rising_edge),
.clk_falling_edge(sclk_falling_edge));
int bit_cnt = 0;
shortint bit_cnt = 0;
logic [7:0] rx_shift;
logic [7:0] tx_shift = 8'b00101010;
logic [7:0] rx_buff = '0;
@ -58,21 +60,45 @@ module spi_interface(
bit_cnt <= 0;
rx_buff <= {rx_shift[6:0], mosi};
byte_ready <= 1;
end else
end else begin
byte_ready <= 0;
end
end // else: !if(cs)
end // else: !if(rst)
$display("[%0d] current rx_shift: %b", $time, rx_shift);
$display("[%0d] current bit_cnt: %0d", $time, bit_cnt);
$display("[%0d] current rx_buff: %b", $time, rx_buff);
end // always_ff @ (posedge sclk)
always_ff @ (posedge sclk_falling_edge) begin
shortint idle_cntdn;
logic rx_drained;
always_ff @ (posedge sys_clk or rst) begin
if (rst) begin
tx_shift <= 0;
rx_drained <= 0;
rx_pkt_addr <= '1;
rx_byte <= '0;
rx_valid <= 0;
idle_cntdn <= 0;
end else begin
if (!rx_drained && byte_ready) begin
rx_byte <= rx_buff;
rx_valid <= 1;
idle_cntdn <= INTERFACE_IDLE_COUNTDOWN;
rx_drained <= 1;
rx_pkt_addr <= rx_pkt_addr + 1;
end else if (!byte_ready) begin
rx_drained <= 0;
if (!(|idle_cntdn)) begin
rx_valid <= 0;
end else begin
idle_cntdn <= idle_cntdn - 1;
end
else begin
end
end
end
always_ff @ (posedge sclk_falling_edge or rst) begin
if (rst) begin
tx_shift <= 8'b00101010;
end else begin
if (cs) begin
tx_shift <= 0;
end else begin
@ -90,62 +116,6 @@ module spi_interface(
assign miso = tx_shift[7];
// RX and TX logic
logic [9:0] rx_queue_head = 0;
logic [9:0] rx_queue_tail = 0;
logic [10:0] rx_size = 0;
logic rx_queue_write = 0;
logic [7:0] rx_read;
logic [7:0] dest_read;
logic packet_sending;
logic rx_queue_empty;
assign rx_size = (rx_queue_tail + 11'd1024 - rx_queue_head) & 11'h3FF;
assign rx_queue_empty = ~(|rx_size);
rx_queue_bram rx_queue (.sys_clk(sys_clk),
.write_enable(rx_queue_write),
.read_addr(rx_queue_head),
.write_addr(rx_queue_tail),
.write_data(rx_buff),
.read_data(rx_read),
.read_dest(dest_read));
always_ff @ (posedge sys_clk) begin
if (rst) begin
rx_queue_head <= '0;
rx_queue_tail <= '0;
rx_queue_write <= '0;
rx_read <= '0;
packet_sending <= 0;
end else begin
if (byte_ready)
rx_queue_write <= 1;
if (rx_queue_write) begin
rx_queue_write <= 0;
rx_queue_tail <= rx_queue_tail + 1;
end
if (!packet_sending) begin
if (rx_size > 2 && rx_ready) begin
rx_byte <= rx_read;
rx_dest <= dest_read;
rx_valid <= 1;
end else
rx_valid <= 0;
end else begin
if (is_packet_complete(rx_queue_head, packet_size))
packet_sending <= 0;
else if (rx_size > 0) begin
rx_byte <= rx_read;
rx_dest <= dest_read;
rx_valid <= 1;
end
end
end
end // always_ff @ (posedge sys_clk)
logic [13:0] tx_queue_head;
logic [13:0] tx_queue_tail;
endmodule // spi_interface
@ -157,7 +127,7 @@ module async_get_clk_edges(
output logic clk_falling_edge);
timeunit 1ns;
timeprecision 1ps;
`ifdef SYNC_2FF
logic sync_0 = 0;
logic sync_1 = 0;
@ -173,71 +143,4 @@ module async_get_clk_edges(
assign clk_rising_edge = sync_0 & ~sync_1;
assign clk_falling_edge = ~sync_0 & sync_1;
`else // !`ifdef SYNC_2FF
logic [2:0] clk_sync = 0;
always_ff @ (posedge sys_clk) begin
if (rst)
clk_sync <= {clk_sync[1:0], ext_clk};
end
assign clk_rising_edge = (clk_sync[2:1] == 2'b01);
assign clk_falling_edge = (clk_sync[2:1] == 2'b10);
`endif // !`ifdef SYNC_2FF
endmodule // async_get_clk_edges
module rx_queue_bram (
input logic sys_clk,
input logic write_enable,
input logic [9:0] read_addr,
input logic [9:0] write_addr,
input logic [7:0] write_data,
output logic [7:0] read_data,
output logic [7:0] read_dest);
timeunit 1ns;
timeprecision 1ps;
logic [7:0] mem [1023:0];
always_ff @ (posedge sys_clk) begin
if (write_enable)
mem[write_addr] <= write_data;
read_data <= mem[read_addr];
read_dest <= mem[read_addr + 1];
end
endmodule // rx_queue_bram
module tx_queue_bram(input logic sys_clk,
input logic write_enable,
input logic [13:0] read_addr,
input logic [13:0] write_addr,
input logic [7:0] write_data,
output logic [7:0] read_data);
timeunit 1ns;
timeprecision 1ps;
logic [7:0] mem [16 * 1023:0];
always_ff @ (posedge sys_clk) begin
if (write_enable)
mem[write_addr] <= write_data;
read_data <= mem[read_addr];
end
endmodule // tx_queue_bram
function automatic logic is_packet_complete(input logic [9:0] head,
input logic [1:0] packet_size);
case(packet_size)
2'b00:
return &(head & 'd64);
2'b01:
return &(head & 'd128);
2'b10:
return &(head & 'd256);
2'b11:
return &head;
endcase // case (packet_size)
endfunction // packet_complete


@ -1,6 +1,13 @@
`ifndef __PARAMS_SVH__
`define __PARAMS_SVH__
parameter int PACKET_SIZE = 64;
parameter int PACKET_ADDR_LEN = 6;
parameter int QUEUE_SIZE = 1024;
parameter int ROSE_ADDR_LEN = 8;
parameter logic [PACKET_ADDR_LEN - 1:0] ROSE_DEST_INDEX = 1;
parameter shortint QUEUE_SIZE = 1024;
parameter int QUEUE_ADDR_LEN = 10;
parameter int MEMORY_POOL_SIZE = QUEUE_SIZE * PACKET_SIZE;
parameter int MEMORY_POOL_ADDR_LEN = QUEUE_ADDR_LEN + PACKET_ADDR_LEN;
@ -8,4 +15,9 @@ parameter int MEMORY_POOL_ADDR_SHIFT = 4;
parameter int INTERFACE_QUEUE_SIZE = 512;
parameter int INTERFACE_QUEUE_ADDR_LEN = 9;
parameter int INTERFACE_CNT = 4;
parameter int INTERFACE_ADDR_LEN = 2;
parameter int CRC_BITS = 8;
parameter shortint NEW_SLOT_COOLDOWN = 500;
parameter shortint INTERFACE_IDLE_COUNTDOWN = 4;
`endif

fabric/src/routing.svh Normal file

@ -0,0 +1,21 @@
`ifndef __ROUTING__SVH__
`define __ROUTING__SVH__
`include <params.svh>
function automatic logic [INTERFACE_ADDR_LEN - 1:0] next_hop(input logic [ROSE_ADDR_LEN - 1:0] interface_addr);
case(interface_addr)
0:
return 0;
1:
return 1;
2:
return 2;
3:
return 3;
default:
return 0;
endcase // case (interface_addr)
endfunction // next_hop
`endif


@ -1,14 +1,15 @@
# The Specifications for ROSE
Extensions to the protocol may change the specifications, see the
Development on the protocol may change the specifications, see the
devlogs for specific decisions on changes.
## Packet specifications
### Packet size
This is determined by the users after considering their applications
during compilation/synthesis of their ROSE setup, the size must be a
power of 2 (and long enough, >16 bytes is recommended).
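A power-of-2 packet size means the in-packet byte address fits in exactly log2(size) bits, which is how `PACKET_SIZE = 64` pairs with `PACKET_ADDR_LEN = 6` in `params.svh`. A small Python sketch of that check, with `packet_addr_len` as a hypothetical helper name:

```python
def packet_addr_len(packet_size):
    """Return the address width for a packet size, or raise if invalid."""
    # A power of 2 has exactly one bit set, so n & (n - 1) is zero.
    if packet_size <= 0 or packet_size & (packet_size - 1):
        raise ValueError("packet size must be a power of 2")
    return packet_size.bit_length() - 1


print(packet_addr_len(64))
```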
### Header
#### (1 byte) Command + packet size
- Packet sizes are chosen out of 4 predetermined sizes. Using only 2
bits in this byte to be represented.
- Commands are 6 bits with 64 possibilities, see the **Commands**
#### (1 byte) Command
- Commands are 8 bits with 256 possibilities, see the **Commands**
section for details.
#### (1 byte) Destination address
@ -19,13 +20,14 @@ devlogs for specific decisions on changes.
### Payload
Via commands, leading or trailing bytes in the payload can also be
repurposed to timestamps or other feature extensions.
repurposed to feature extensions.
### (1 byte) CRC-8
To detect packets corrupted in transit.
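The spec does not name a polynomial or initial value for the CRC-8. As a sketch only, the Python below uses the common polynomial 0x07 with a zero initial value (the variant often called CRC-8/SMBus); both parameters are assumptions, not decisions recorded in this document.

```python
def crc8(data, poly=0x07, init=0x00):
    """Bitwise CRC-8, MSB-first, no reflection, no final XOR."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


payload = b"123456789"
checksum = crc8(payload)
print(hex(checksum))
```

With these parameters, a receiver can run the same function over the payload plus the trailing CRC byte and expect a result of zero for an intact packet.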
## Commands
TBD.
1. `[CMD: 0]` Idle.
2. `[CMD: 1]` Send packet.
### Feature Extensions
#### [CMD: TBD] Include timestamp
#### [CMD: TBD] Include sequence number


@ -44,6 +44,11 @@ each line.
Unless it's the bit-length of a byte or something that's commonly
known and obvious at first glance, use a constant to store it.
## Arrays
Avoid using arrays with more than 2 dimensions. If you need to
store multiple dimensions of data, consider using a `struct` to clarify
what each dimension stores.
## Naming schemes
Names are only meaningful to humans, and the rationale behind the
following guidelines is to allow anyone reading the code to know what