WORKING PROGRESS. revamped the files and naming, so git is a bit confused

mem_hub.sv -> hub.sv spi_slave.sv -> interface.sv
2025-05-17 22:06:41 -04:00
parent a83a8f9f95
commit 1f7c47a1fb
5 changed files with 392 additions and 149 deletions
--- a/devlog/2025-05-17-Routing-logic.md
+++ b/devlog/2025-05-17-Routing-logic.md
@ -0,0 +1,120 @@
+# Routing Logic
+Date: 2025-05-17
+
+## Goals and expectations
+The flu is mostly gone, so I expected to get some work done.  The
+first task is to complete the core routing logic.  Congestion is still
+not handled: if a TX buffer is full, the data is silently dropped.
+
+## Thought train
+The RX queue should also be able to send high-priority messages back
+to its interface when it backs up because the routing logic is
+congested.  Congestion management could even be built on how full the
+RX queue is relative to the packet sizes and the number of connected
+devices.
+
+## Results
+### Trivial (not really)
+Because my younger self blindly used `verilog-mode`'s default 3-space
+indentation, I reformatted the files to use 4 spaces.  I also renamed
+some files and modules.
+
+Indentation matters in good code, and 3 spaces is probably more evil
+than tabs.
+
+### Completed routing logic
+I originally planned to stream packets directly with no buffer
+involved.  To simplify the round-robin logic (and to potentially allow
+multiple streams of data), I went with a small buffer that absorbs
+1 byte from each interface.
+
+I also rethought the routing logic once more: every interface sends
+its incoming data straight to the routing logic, so I don't have to
+deal with sync-related issues on the RX side, and the service buffers
+live inside the routing logic.  It works as follows:
+
+On the RX side, if the buffer for that interface (1 byte) is full or
+about to be filled, `rx_ready` is turned off.  If the buffer is empty,
+the byte received from the interface is moved into the service buffer.
+
+On the TX side, I implemented a round-robin approach that services
+only one buffer at any given time.  If the destination is ready, the
+byte is sent and `rx_ready` for the source interface is set back to
+true.  Because only 1 TX queue is serviced at a time, I also have to
+update the `tx_valid` bit of the previous destination to avoid sending
+duplicates.
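+
+As an illustration only (this is not the code in `hub.sv`), the RX
+and TX behaviour described above could be sketched roughly as
+follows.  The module name, the port names other than `rx_ready` and
+`tx_valid`, the interface count `N = 4`, and the `rx_dest` input that
+says where each byte should go are all assumptions made for the
+sketch.  `tx_valid` is modelled as a one-cycle strobe, which assumes
+the destination can still accept a byte one cycle after it reported
+ready.
+
+```systemverilog
+// Hypothetical sketch; names and widths are assumptions, not hub.sv.
+module routing_sketch #(
+    parameter int N = 4                      // assumed interface count
+) (
+    input  logic                 clk,
+    input  logic                 rst,
+    // RX side: one byte-wide stream per interface
+    input  logic [N-1:0]         rx_valid,
+    input  logic [7:0]           rx_data [N],
+    input  logic [$clog2(N)-1:0] rx_dest [N], // assumed: destination per byte
+    output logic [N-1:0]         rx_ready,
+    // TX side: one byte-wide stream per destination
+    input  logic [N-1:0]         tx_ready,
+    output logic [N-1:0]         tx_valid,
+    output logic [7:0]           tx_data [N]
+);
+    logic [N-1:0]         in_buffer;          // service buffer i is occupied
+    logic [7:0]           buf_data [N];
+    logic [$clog2(N)-1:0] buf_dest [N];
+    logic [$clog2(N)-1:0] rr;                 // buffer serviced this cycle
+
+    // An interface may only send while its 1-byte buffer is empty.
+    assign rx_ready = ~in_buffer;
+
+    always_ff @(posedge clk) begin
+        if (rst) begin
+            in_buffer <= '0;
+            tx_valid  <= '0;
+            rr        <= '0;
+        end else begin
+            // Drop last cycle's strobe so no byte is sent twice.
+            tx_valid <= '0;
+
+            // RX: capture a byte into every empty buffer that is offered one.
+            for (int i = 0; i < N; i++) begin
+                if (rx_valid[i] && rx_ready[i]) begin
+                    buf_data[i]  <= rx_data[i];
+                    buf_dest[i]  <= rx_dest[i];
+                    in_buffer[i] <= 1'b1;
+                end
+            end
+
+            // TX: service exactly one buffer per cycle, round-robin.
+            if (in_buffer[rr] && tx_ready[buf_dest[rr]]) begin
+                tx_data[buf_dest[rr]]  <= buf_data[rr];
+                tx_valid[buf_dest[rr]] <= 1'b1;
+                in_buffer[rr]          <= 1'b0;  // rx_ready[rr] goes high again
+            end
+            rr <= (rr == N-1) ? '0 : rr + 1'b1;
+        end
+    end
+endmodule
+```
+
+A buffer is only written while it is empty and only drained while it
+is full, so the RX and TX branches never touch the same `in_buffer`
+bit in the same cycle.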
+
+#### Potential problem
+The `rx_ready` design is still under evaluation.  I have two
+approaches in mind.  The safe one is to assign `rx_ready =
+~in_buffer`, which rules out an interface sending while the buffer
+isn't ready, but it costs skipped cycles in cases where one interface
+could otherwise stream directly to another.
+
+The other option is to turn off `rx_ready` only when the interface
+tries to write to a full buffer, while holding that incoming byte in
+an extra register, which enables continuous streaming.
+
+However, since each buffer is polled only once every 4 cycles,
+skipping 1 cycle on the RX side of an interface costs almost nothing.
+So, I went with the first approach.
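+
+For comparison, the second approach (the one not chosen) resembles
+what is commonly called a skid buffer.  A minimal per-interface
+sketch, assuming hypothetical `out_valid`/`out_data`/`out_take`
+signals towards the routing logic and a hypothetical `skid_*`
+register pair for the parked byte:
+
+```systemverilog
+// Hypothetical sketch of the rejected option; names are assumptions.
+module rx_skid_sketch (
+    input  logic       clk,
+    input  logic       rst,
+    input  logic       rx_valid,
+    input  logic [7:0] rx_data,
+    output logic       rx_ready,
+    output logic       out_valid,   // main 1-byte buffer holds a byte
+    output logic [7:0] out_data,
+    input  logic       out_take     // routing logic consumes it this cycle
+);
+    logic       skid_valid;         // a byte is parked in the extra register
+    logic [7:0] skid_data;
+
+    // Stay ready until a byte has actually had to be parked.
+    assign rx_ready = ~skid_valid;
+
+    always_ff @(posedge clk) begin
+        if (rst) begin
+            out_valid  <= 1'b0;
+            skid_valid <= 1'b0;
+        end else if (!out_valid || out_take) begin
+            // Main buffer is free or being drained: refill it.
+            if (skid_valid) begin
+                out_data   <= skid_data;   // parked byte goes first
+                out_valid  <= 1'b1;
+                skid_valid <= 1'b0;
+            end else begin
+                out_data  <= rx_data;      // rx_ready is high here
+                out_valid <= rx_valid;
+            end
+        end else if (rx_valid && rx_ready) begin
+            // Main buffer is full and not drained: park the byte.
+            skid_data  <= rx_data;
+            skid_valid <= 1'b1;
+        end
+    end
+endmodule
+```
+
+The cost is one extra byte register per interface, in exchange for
+not dropping `rx_ready` on every accepted byte.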
+
+This also gave me inspiration for something else: I can allow a
+direct stream mode so that one device can just stream to another.
+Better yet, I could use a shared pool of memory and avoid streaming
+altogether, although that would significantly complicate the logic
+and reduce queue size flexibility.
+
+### BRAM access
+Exciting stuff: I'm finally getting into BRAM land.  A 1-cycle access
+delay is acceptable when the logic itself runs faster than the
+interface.  I learned how to access BRAM safely (within the same clock
+domain, of course, which is why there's an RX buffer) and wrote some
+(incomplete) logic for the RX buffer.
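+
+The 1-cycle delay comes from the read being registered.  A generic
+single-clock memory written in the style that synthesis tools
+typically map to block RAM might look like the sketch below.  The
+module name, port names, and `DEPTH` are assumptions, and whether a
+given tool actually infers BRAM from it depends on the vendor and the
+memory size.
+
+```systemverilog
+// Hypothetical sketch; not the RX buffer from this commit.
+module bram_sketch #(
+    parameter int DEPTH = 512              // assumed depth
+) (
+    input  logic                     clk,
+    input  logic                     we,
+    input  logic [$clog2(DEPTH)-1:0] waddr,
+    input  logic [7:0]               wdata,
+    input  logic [$clog2(DEPTH)-1:0] raddr,
+    output logic [7:0]               rdata
+);
+    logic [7:0] mem [DEPTH];
+
+    always_ff @(posedge clk) begin
+        if (we)
+            mem[waddr] <= wdata;
+        // Registered read: rdata is valid one cycle after raddr.
+        rdata <= mem[raddr];
+    end
+endmodule
+```
+
+Reading and writing the same address in the same cycle returns the
+old value in simulation here; real BRAM primitives differ on this, so
+it is safest not to rely on it.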
+
+## Reflections
+1. Trade-offs are being made.  If I construct more complex logic, then
+   I can eliminate the need for a central routing logic for data, but
+   there are a few catches: (a) memory management would be more
+   complex: it would allow more flexible memory allocation and
+   handle bursts better, but it would also mean having 4 smaller
+   queues inside a bigger memory pool and using a memory collection
+   queue to keep track of which buffers are empty; (b) if one
+   interface is congested, the entire fabric is probably going to be
+   congested.
+   - As always, there's the design of using reserved queues for each
+     interface and a shared central buffer for handling bursts.  BUT
+     WITH EVEN MORE COMPLEXITY!
+2. Reworking the design is acceptable, but I should still keep track
+   of all of my ideas just in case I want to go back to them one day.
+   A lot of things came up as I gathered my thoughts for this devlog,
+   combining unimplemented ideas and my current implementation.  Best
+   to save this devlog for future reference.
+3. FPGAs are restrictive, but as I dug deeper into constructing logic
+   for them, I felt as inspired as when I first found out that
+   programming is like teaching a child to do everything as
+   explicitly and as accurately as you can.
+4. Ideas are cheaper than implementations, but that doesn't mean
+   they'll stick around.  Keep track of the ideas.
+
+## Lessons learned
+1. Respect the hardware.  Get to know it better, like how BRAM access
+   has a 1-cycle delay and how non-BRAM variables will use up your
+   LUTs.
+2. There are multiple ways to do things.  Weigh them carefully and
+   decide what to do with them.  They can be ditched, implemented, or
+   saved for the future.
+3. Rethink and connect.  One design choice can lead to another, one
+   idea can be combined with another.  Go back to previous thoughts
+   and think about how you can refine the current implementation by
+   taking a page out of those past books.
+
+## Final thoughts
+As I continue working on ROSE, I see more of its potential, and I can
+see that I'm taking steps toward realizing much of it.
+
+I might write down all of my ideas for someone (perhaps me in a more
+distant future) to implement all of them.
+
+## Next steps
+Complete the RX and TX queues, and test them in a testbench.