# Routing Logic

Date: 2025-05-17

## Goals and expectations

The flu has mostly let go of me, so I expected to get some work done.
The first goal is to complete the core routing logic. Congestion is
still not considered: if a TX buffer is full, the data is simply
dropped silently.

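The drop-on-full behaviour can be sketched as a small Python model.
This is only a software illustration, not the HDL; `DropTailQueue`,
`depth`, and the `dropped` counter are hypothetical names added for
the example.

```python
from collections import deque


class DropTailQueue:
    """Bounded FIFO that silently drops writes when full (hypothetical model)."""

    def __init__(self, depth):
        self.depth = depth
        self.items = deque()
        self.dropped = 0  # illustration only; the text describes a silent drop

    def push(self, byte):
        # No back-pressure: a write to a full queue is simply discarded.
        if len(self.items) >= self.depth:
            self.dropped += 1
            return False
        self.items.append(byte)
        return True

    def pop(self):
        # Returns None when there is nothing to send.
        return self.items.popleft() if self.items else None
```
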
## Thought train

The RX queue should also be able to send high-priority messages to the
interface when it is congested because the routing logic is congested.
Congestion management could even be built on how full the RX queue is
relative to the packet sizes and the number of connected devices.

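One possible reading of that idea, as a Python sketch. The rule itself
is an assumption made up for illustration (nothing here is
implemented): the queue counts as congested once its free space can no
longer hold one maximum-size packet from every connected device. All
names and parameters are hypothetical.

```python
def rx_congested(depth, used, max_packet_bytes, num_devices):
    """Hypothetical congestion rule, not part of the current design.

    depth and used are in bytes. The queue is flagged as congested when
    the remaining space is smaller than one maximum-size packet per
    connected device.
    """
    free = depth - used
    return free < max_packet_bytes * num_devices
```
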
## Results

### Trivial (not really)

Because the younger me was blind and used `verilog-mode`'s default
3-space indentation, I revamped the files to use 4 spaces for
indentation. I also renamed some files and modules.

Indentation is pretty relevant in good code, and 3 spaces is probably
more evil than using tabs.

### Completed routing logic

I had the idea of streaming packets directly without any buffer
involved, but to simplify the round-robin logic (and to potentially
allow multiple streams of data), I went with a small buffer that
absorbs 1 byte from each interface.

I did re-rethink the routing logic: all interfaces send their incoming
data to the routing logic, so I don't have to deal with sync-related
issues on the RX side of things, and the service buffer sits inside
the routing logic. It works like this:

On the RX side, if the 1-byte buffer for an interface is full or about
to be filled, `rx_ready` is turned off. If the buffer is empty, the
data received from the interface is moved into the service buffer.

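A minimal behavioural sketch of that RX handshake in Python (a software
model, not the actual HDL). `RxSlot`, `offer`, and `take` are
hypothetical names; `rx_ready` and `in_buffer` follow the identifiers
used in this log.

```python
class RxSlot:
    """One-byte holding buffer for a single interface (hypothetical model)."""

    def __init__(self):
        self.in_buffer = False  # True while a byte is waiting to be serviced
        self.data = 0

    @property
    def rx_ready(self):
        # The interface may only send while the slot is empty.
        return not self.in_buffer

    def offer(self, byte):
        """Interface presents a byte; it is accepted only if rx_ready."""
        if not self.rx_ready:
            return False
        self.data = byte
        self.in_buffer = True
        return True

    def take(self):
        """Routing logic drains the slot, which turns rx_ready back on."""
        if not self.in_buffer:
            return None
        self.in_buffer = False
        return self.data
```
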
On the TX side, I implemented a round-robin approach and only service
one buffer at any given time. If the destination is ready, the byte is
sent and `rx_ready` is set back to true. Because only 1 TX queue is
serviced at any given time, I also have to update the `tx_valid` bit
of the last destination to avoid sending duplicates.

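The round-robin service and the `tx_valid` bookkeeping can be sketched
as follows. This is a Python model under assumptions, not the real
module: `Router`, `step`, `slots`, `turn`, and `last_dest` are
hypothetical names, each slot holds a `(dest, byte)` pair, and
"updating" `tx_valid` is modelled as clearing the previous
destination's bit at the start of the next cycle.

```python
class Router:
    """Round-robin service of per-interface 1-byte slots (hypothetical model)."""

    def __init__(self, num_ports=4):
        self.num_ports = num_ports
        self.slots = [None] * num_ports      # per source: (dest, byte) or None
        self.tx_valid = [False] * num_ports  # per destination valid bit
        self.tx_data = [0] * num_ports
        self.turn = 0                        # source serviced this cycle
        self.last_dest = None

    def step(self, tx_ready):
        """Advance one cycle; tx_ready is one boolean per destination."""
        # Clear the previous destination's valid bit first, so the byte
        # sent last cycle is not seen a second time.
        if self.last_dest is not None:
            self.tx_valid[self.last_dest] = False
            self.last_dest = None

        src = self.turn
        self.turn = (self.turn + 1) % self.num_ports

        entry = self.slots[src]
        if entry is None:
            return None
        dest, byte = entry
        if not tx_ready[dest]:
            return None  # the byte waits in its slot until its next turn

        self.tx_data[dest] = byte
        self.tx_valid[dest] = True
        self.last_dest = dest
        self.slots[src] = None  # slot is empty again, so rx_ready can go high
        return (src, dest, byte)
```

Calling `step` once per clock cycle services sources 0, 1, 2, 3 in
turn, so any single slot is looked at once every `num_ports` cycles.
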
#### Potential problem

The `rx_ready` design is currently under evaluation, and I have two
approaches in mind. The first is the safe one: assign
`rx_ready = ~in_buffer`, which would definitely solve any problem
related to an interface sending when the buffer isn't ready. But it
would mean skipping cycles in cases where 1 interface could otherwise
stream directly to another.

The second option is to turn off `rx_ready` only when the interface
tries to write to a full buffer. The incoming byte still stays in a
register, which enables continuous streaming.

However, since any specific buffer is polled only once every 4 cycles,
skipping 1 cycle on the RX side of 1 interface is trivial. So I went
with the first approach.

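A rough way to check that reasoning is a small cycle-count model in
Python. It assumes an interface that always has a byte to send and a
buffer that is drained exactly once every 4 cycles; the function name
and the timing details are assumptions, not measurements from the real
design.

```python
def accepted_bytes(cycles, period=4, same_cycle_refill=False):
    """Count bytes accepted from an always-sending interface.

    same_cycle_refill=False models rx_ready = ~in_buffer (first approach).
    same_cycle_refill=True models accepting a new byte in the same cycle
    the buffer is drained (second approach).
    """
    in_buffer = False
    accepted = 0
    for cycle in range(cycles):
        ready_at_start = not in_buffer
        # The round-robin logic only visits this buffer every `period` cycles.
        drained = in_buffer and cycle % period == 0
        if drained:
            in_buffer = False
        if ready_at_start or (same_cycle_refill and drained):
            in_buffer = True
            accepted += 1
    return accepted
```

Under these assumptions both variants accept the same number of bytes
over 40 cycles, because the drain rate, not the refill timing, is the
bottleneck.
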
This did give me inspiration for something else: I could allow a
direct stream mode so that one device can just stream to another.
Better yet, I could use a shared pool of memory to avoid any kind of
streaming, although that would significantly impact the logic involved
and reduce queue size flexibility.

### BRAM access

Exciting stuff, finally getting into BRAM land. A 1-cycle delay is
acceptable when the logic itself is running faster than the interface.
I learned how to access it safely (within the same clock domain, of
course, which is why there's an RX buffer) and wrote some logic for
the RX buffer (incomplete).

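The 1-cycle read delay can be pictured with a tiny Python model of a
memory whose read output is registered. `SyncRam` and its port names
are hypothetical, and the read-before-write ordering on a same-address
access is an assumption; real block RAMs offer several modes.

```python
class SyncRam:
    """Behavioural model of a memory with a registered (1-cycle) read."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.rd_data = 0  # output register, only changes on a clock edge

    def clock(self, rd_addr, wr_en=False, wr_addr=0, wr_data=0):
        """One clock edge: capture the read, then apply the write."""
        # The read sees the contents from before this edge's write
        # (assumed "read-first" behaviour).
        self.rd_data = self.mem[rd_addr]
        if wr_en:
            self.mem[wr_addr] = wr_data
        return self.rd_data
```

An address presented in one cycle only shows its data on `rd_data`
after the clock edge, so logic consuming the data has to run one cycle
behind the logic generating the address.
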
## Reflections

1. Trade-offs are being made. If I construct more complex logic, then
   I can eliminate the need for a central routing logic for data, but
   there are a few catches to that: (1) the memory management would
   be more complex. It would allow more flexible memory allocation
   and handle bursts better, but it would also mean having 4 smaller
   queues inside of a bigger memory pool and using a memory
   collection queue to keep track of which buffers are empty; (2) if
   one interface is congested, the entire fabric is probably going to
   be congested.
   - As always, there's the design of using reserved queues for each
     interface and a shared central buffer for handling bursts. BUT
     WITH EVEN MORE COMPLEXITY!
2. Reworking the design is acceptable, but I should still keep track
   of all of my ideas just in case I want to go back to them one day.
   A lot of things came up as I gathered my thoughts for this devlog,
   combining unimplemented ideas and my current implementation. Best
   to save this devlog for future reference.
3. FPGAs are restrictive, but as I dug deeper into constructing logic
   for them, I felt as inspired as when I first found out that
   programming is like teaching a child to do everything as
   explicitly and as accurately as you can.
4. Ideas are cheaper than implementation, but that doesn't mean
   they'll stay. Keep track of the ideas.

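The shared-pool idea from the first reflection (several queues inside
one memory pool, plus a queue that tracks empty buffers) might look
roughly like this in a Python model. It is a sketch of an
unimplemented idea; `SharedPool`, the one-byte block size, and the
method names are all assumptions.

```python
from collections import deque


class SharedPool:
    """Several logical queues sharing one block pool (hypothetical model)."""

    def __init__(self, num_blocks, num_queues=4):
        self.blocks = [None] * num_blocks
        # The "memory collection queue": indices of currently empty blocks.
        self.free = deque(range(num_blocks))
        # Each logical queue only stores block indices, in FIFO order.
        self.queues = [deque() for _ in range(num_queues)]

    def enqueue(self, q, byte):
        if not self.free:
            return False  # pool exhausted: every queue is blocked at once
        idx = self.free.popleft()
        self.blocks[idx] = byte
        self.queues[q].append(idx)
        return True

    def dequeue(self, q):
        if not self.queues[q]:
            return None
        idx = self.queues[q].popleft()
        byte = self.blocks[idx]
        self.free.append(idx)  # hand the block back for reuse
        return byte
```

The model also shows the second catch: once one queue has eaten the
whole pool, `enqueue` fails for every other queue too.
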
## Lessons learned

1. Respect the hardware. Get to know it more, like how BRAM access
   has a 1-cycle delay and how non-BRAM variables will use up your
   LUTs.
2. There are multiple ways to do things. Weigh them carefully and
   decide what to do with them. They can be ditched, implemented, or
   saved for the future.
3. Rethink and connect. One design choice can lead to another, and
   one idea can be combined with another. Go back to previous
   thoughts and think about how you can refine the current
   implementation by taking a page out of those past books.

## Final thoughts

As I continue working on ROSE, I see more of its potential, and I can
see that I'm taking steps toward realizing much of it.

I might write down all of my ideas for someone (perhaps me in a more
distant future) to implement.

## Next steps

Complete the RX and TX queues, and test them in a testbench.