
Routing Logic

Date: 2025-05-17

Goals and expectations

The flu is mostly gone, so I expected to get some work done. The first goal is to complete the core routing logic (still without any congestion handling: if a TX buffer is full, the data is silently dropped).
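For reference, the current "drop silently when full" policy behaves like a drop-tail queue. Below is a minimal Python behavioral sketch of that, not the actual Verilog; the class name `DropTailQueue`, the depth of 4, and the `dropped` counter are hypothetical additions for illustration.

```python
from __future__ import annotations

from collections import deque


class DropTailQueue:
    """Hypothetical model of a bounded TX FIFO that silently discards writes when full."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.items: deque[int] = deque()
        self.dropped = 0  # tracked only for inspection; the described design just drops

    def push(self, byte: int) -> bool:
        """Enqueue a byte; if the queue is full, drop it and return False."""
        if len(self.items) >= self.depth:
            self.dropped += 1
            return False
        self.items.append(byte)
        return True

    def pop(self) -> int | None:
        """Dequeue the oldest byte, or return None if the queue is empty."""
        return self.items.popleft() if self.items else None


tx = DropTailQueue(depth=4)
for b in range(6):
    tx.push(b)
print(list(tx.items), tx.dropped)  # [0, 1, 2, 3] 2
```

Bytes 4 and 5 simply vanish, which is exactly the behavior congestion handling will eventually need to replace.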

Thought train

The RX queue should also be able to send high-priority messages to its interface when it becomes congested because the routing logic itself is congested. Congestion management methods could even be built around how full the RX queue is relative to the packet sizes and the number of connected devices.
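One possible shape for that idea is a watermark-based detector with hysteresis. This is only a sketch under my own assumptions: the high watermark leaves room for one maximum-size packet per connected device, the low watermark sits one packet below it so the flag doesn't toggle on every byte, and the names `RxCongestionMonitor` and `update` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RxCongestionMonitor:
    """Hypothetical watermark-based congestion detector for one RX queue.

    high = capacity - max_packet * num_devices (assumed headroom rule)
    low  = high - max_packet                  (assumed hysteresis band)
    """

    capacity: int      # RX queue size in bytes
    max_packet: int    # largest packet size in bytes
    num_devices: int   # number of connected interfaces
    congested: bool = False

    def update(self, fill: int) -> bool:
        high = self.capacity - self.max_packet * self.num_devices
        low = high - self.max_packet
        if not self.congested and fill >= high:
            self.congested = True   # would trigger a high-priority message to the interface
        elif self.congested and fill <= low:
            self.congested = False  # enough space freed up to clear the flag
        return self.congested


mon = RxCongestionMonitor(capacity=256, max_packet=16, num_devices=4)
print([mon.update(f) for f in (100, 192, 180, 176)])  # [False, True, True, False]
```

With these numbers the flag rises at 192 bytes and only clears once the fill level drops back to 176.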

Results

Trivial (not really)

Because my younger self blindly used verilog-mode's default 3-space indentation, the files have been reformatted to use 4-space indentation. I also renamed some files and modules.

Indentation matters in good code; 3 spaces is probably more evil than tabs.

Completed routing logic

I originally wanted to stream packets directly without any buffering, but to simplify the round-robin logic (and potentially allow multiple concurrent streams of data), I went with a small buffer that absorbs 1 byte from each interface.

I re-rethought the routing logic: all interfaces send their incoming data to the routing logic, so that I don't have to deal with synchronization issues on the RX side, and the service buffer inside the routing logic is put to work as follows:

On the RX side, if an interface's 1-byte buffer is full or about to be filled, rx_ready is deasserted. If the buffer is empty, the data received from the interface is moved into the service buffer.
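The RX rule above can be sketched as a small Python cycle model. It is a behavioral approximation rather than the RTL: the class `RxSlot`, the `drain` input (standing in for "the router takes the byte this cycle"), and sampling rx_ready before the clock edge are my own modeling assumptions.

```python
from __future__ import annotations


class RxSlot:
    """Hypothetical cycle model of one interface's 1-byte RX service buffer."""

    def __init__(self) -> None:
        self.in_buffer = False  # occupancy flag of the 1-byte buffer
        self.data = 0

    @property
    def rx_ready(self) -> bool:
        # The "safe" rule: rx_ready = ~in_buffer.
        return not self.in_buffer

    def clock(self, rx_valid: bool, rx_data: int, drain: bool) -> int | None:
        """Advance one clock edge and return the byte handed to the router, if any.

        rx_ready is sampled before the edge, so a byte written on this edge
        deasserts rx_ready for the next cycle ("full or going to be filled").
        """
        ready = self.rx_ready
        out = None
        if drain and self.in_buffer:
            out = self.data
            self.in_buffer = False
        if rx_valid and ready:
            self.data = rx_data
            self.in_buffer = True
        return out


slot = RxSlot()
slot.clock(rx_valid=True, rx_data=0xAB, drain=False)              # byte captured
print(slot.rx_ready)                                               # False
print(hex(slot.clock(rx_valid=True, rx_data=0xCD, drain=True)))   # 0xab (0xCD not accepted yet)
print(slot.rx_ready)                                               # True
```

The interface has to hold 0xCD until it sees rx_ready again, which costs one cycle after every drain.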

On the TX side, I implemented a round-robin approach that services only one buffer at any given time. If the destination is ready, the byte is sent and rx_ready is set to true. Because only one TX queue is serviced at any given time, I also have to update (deassert) the tx_valid bit of the previous destination to avoid sending duplicates.
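Here is a rough Python cycle model of the TX side, assuming 4 interfaces, a pointer that advances every cycle whether or not the current buffer holds data, and buffers that store `(dest, byte)` pairs. `RoundRobinTx` and its fields are hypothetical names; the point is the tx_valid deassertion for the previous destination.

```python
from __future__ import annotations


class RoundRobinTx:
    """Hypothetical cycle model of the TX side: one service buffer is visited per cycle."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.buffers: list[tuple[int, int] | None] = [None] * n  # (dest, byte) or empty
        self.ptr = 0                      # round-robin pointer
        self.tx_valid = [False] * n       # one valid flag per destination interface
        self.tx_data = [0] * n
        self.last_dest: int | None = None

    def clock(self, dest_ready: list[bool]) -> None:
        # Deassert the previous destination's tx_valid so it does not take the byte twice.
        if self.last_dest is not None:
            self.tx_valid[self.last_dest] = False
            self.last_dest = None
        entry = self.buffers[self.ptr]
        if entry is not None:
            dest, byte = entry
            if dest_ready[dest]:
                self.tx_valid[dest] = True
                self.tx_data[dest] = byte
                self.buffers[self.ptr] = None  # buffer freed, so the RX side may refill it
                self.last_dest = dest
        self.ptr = (self.ptr + 1) % self.n    # each buffer is visited once every n cycles


tx = RoundRobinTx(4)
tx.buffers[0] = (2, 0x11)
tx.buffers[1] = (2, 0x22)
tx.buffers[3] = (0, 0x33)
received = []
for cycle in range(4):
    tx.clock([True] * 4)
    received += [(cycle, d, tx.tx_data[d]) for d in range(4) if tx.tx_valid[d]]
print(received)  # [(0, 2, 17), (1, 2, 34), (3, 0, 51)]
```

Without the deassert at the top of `clock`, destination 2 would still see tx_valid high in cycle 2 and pick up 0x22 a second time.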

Potential problem

The rx_ready design is still under evaluation, and I have two approaches in mind. The safe one assigns rx_ready = ~in_buffer, which definitely prevents an interface from sending while the buffer isn't ready. However, it means skipping cycles when one interface could otherwise stream directly to another.

The other option is to turn off rx_ready only when the interface tries to write to a full buffer; the incoming byte is still held in a register, which enables continuous streaming.

However, since any given buffer is polled only once every 4 cycles, skipping 1 cycle on the RX side of one interface costs essentially nothing. So I went with the first approach.
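A quick way to sanity-check that reasoning is a simplified cycle count in Python. The assumptions are mine: one interface that always has data, a destination that is always ready, capacity 1 standing in for rx_ready = ~in_buffer, and capacity 2 approximating the second option (one extra held byte). The function name `bytes_forwarded` is hypothetical.

```python
def bytes_forwarded(cycles: int, capacity: int, n: int = 4) -> int:
    """Count bytes the router takes from one saturated interface.

    capacity=1 models rx_ready = ~in_buffer; capacity=2 approximates the second
    option (a held byte in an extra register). The router polls this interface's
    buffer once every n cycles and the destination is assumed always ready.
    """
    occupancy = 0
    sent = 0
    for cycle in range(cycles):
        rx_ready = occupancy < capacity        # sampled before the clock edge
        if cycle % n == 0 and occupancy > 0:   # this buffer's round-robin slot
            occupancy -= 1
            sent += 1
        if rx_ready:                           # the source always has a byte waiting
            occupancy += 1
    return sent


for n in (4, 1):
    print(n, bytes_forwarded(400, 1, n), bytes_forwarded(400, 2, n))
# 4 99 99
# 1 200 399
```

With a 4-way poll both policies forward the same number of bytes, since the one-cycle refill gap hides inside the 3 cycles spent on other buffers. Only if the buffer were serviced every cycle (n = 1, i.e. direct streaming) would the safe rule roughly halve throughput.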

That said, this inspired another idea: I could allow a direct stream mode so that one device can stream straight to another. Better yet, I could use a shared pool of memory to avoid streaming altogether, although that would significantly complicate the logic involved and reduce queue size flexibility.

BRAM access

Exciting stuff: I'm finally getting into BRAM land. A 1-cycle read delay is acceptable when the logic itself runs faster than the interface. I learned how to access it safely (within the same clock domain, of course, which is why there's an RX buffer) and wrote some (still incomplete) logic for the RX buffer.
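To make the 1-cycle delay concrete, here is a Python behavioral model of a single-port BRAM with a registered (synchronous) read port. It assumes read-first behavior on a simultaneous read and write to the same address; real BRAM write modes depend on the vendor and configuration. `SyncBram` is a hypothetical name.

```python
class SyncBram:
    """Behavioral model of a single-port block RAM with a registered read output.

    The address presented during a cycle is only reflected on dout after the
    next clock edge, i.e. one cycle later.
    """

    def __init__(self, depth: int) -> None:
        self.mem = [0] * depth
        self.dout = 0  # registered read-data output

    def clock(self, addr: int, we: bool = False, din: int = 0) -> None:
        # Read-first (assumed): dout gets the old contents even when writing addr.
        self.dout = self.mem[addr]
        if we:
            self.mem[addr] = din


ram = SyncBram(16)
ram.clock(3, we=True, din=0x5A)  # write 0x5A to address 3
seen = []
for addr in (3, 4):
    seen.append(ram.dout)  # what the surrounding logic sees while addr is presented
    ram.clock(addr)
seen.append(ram.dout)
print(seen)  # [0, 90, 0]
```

While address 3 is being presented, dout still holds the stale value; 0x5A (90) only shows up in the following cycle, so any logic consuming the read has to wait one cycle for it.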

Reflections

  1. Trade-offs are being made. With more complex logic, I could eliminate the need for central routing logic for data, but there are a few catches: (1) memory management would be more complex; it would allow more flexible memory allocation and handle bursts better, but it would also mean having 4 smaller queues inside a bigger memory pool and using a memory collection queue to keep track of which buffers are empty; (2) if one interface is congested, the entire fabric is probably going to be congested.
    • As always, there's also the design of reserved queues for each interface plus a shared central buffer for handling bursts. BUT WITH EVEN MORE COMPLEXITY!
  2. Reworking the design is acceptable, but I should still keep track of all of my ideas in case I want to go back to them one day. A lot of things came up as I gathered my thoughts for this devlog, combining unimplemented ideas with my current implementation. Best to save this devlog for future reference.
  3. FPGAs are restrictive, but as I dug deeper into constructing logic for them, I felt as inspired as when I first found out that programming is like teaching a child to do everything as explicitly and accurately as you can.
  4. Ideas are cheaper than implementation, but that doesn't mean they'll stick around. Keep track of them.
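The shared-pool idea from the first reflection (per-interface queues living inside one memory pool, with a queue of free buffers acting as the "memory collection queue") can be sketched in Python. Everything here is hypothetical: the name `SharedPool`, one byte per block instead of multi-byte blocks, and a FIFO free list.

```python
from __future__ import annotations

from collections import deque


class SharedPool:
    """Hypothetical model: per-interface queues carved out of one shared pool,
    with a FIFO of free block indices as the 'memory collection queue'."""

    def __init__(self, num_blocks: int, num_queues: int) -> None:
        self.blocks: list[int | None] = [None] * num_blocks  # one byte per block here
        self.free: deque[int] = deque(range(num_blocks))      # indices of empty blocks
        self.queues = [deque() for _ in range(num_queues)]    # per-interface index FIFOs

    def enqueue(self, q: int, byte: int) -> bool:
        if not self.free:
            return False  # pool exhausted: every queue is now effectively congested
        idx = self.free.popleft()
        self.blocks[idx] = byte
        self.queues[q].append(idx)
        return True

    def dequeue(self, q: int) -> int | None:
        if not self.queues[q]:
            return None
        idx = self.queues[q].popleft()
        byte = self.blocks[idx]
        self.blocks[idx] = None
        self.free.append(idx)  # hand the block back to the collection queue
        return byte


pool = SharedPool(num_blocks=4, num_queues=2)
print([pool.enqueue(0, b) for b in (1, 2, 3)])  # [True, True, True]
print(pool.enqueue(1, 9), pool.enqueue(1, 10))  # True False
print(pool.dequeue(0), len(pool.free))          # 1 1
```

Both catches show up: queue 0 absorbs a 3-byte burst that a fixed 2-per-queue split would have rejected, but that same burst leaves queue 1 with a single block, so congestion in one interface spills over to the others.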

Lessons learned

  1. Respect the hardware. Get to know it better, e.g. how BRAM access has a 1-cycle delay and how storage that isn't mapped to BRAM will use up your LUTs.
  2. There are multiple ways to do things; weigh them carefully and decide what to do with each. They can be ditched, implemented, or saved for the future.
  3. Rethink and connect. One design choice can lead to another, and one idea can be combined with another. Go back to previous thoughts and think about how you can refine the current implementation by taking a page out of those past books.

Final thoughts

As I continue working on ROSE, I see more of its potential, and I can see that I'm taking steps toward realizing much of it.

I might write down all of my ideas so that someone (perhaps a more distant future me) can implement them.

Next steps

Complete the RX and TX queues, and test them out on a testbench.