# The Plan/Roadmap for ROSE

> Plans turn fear into focus, risk into reach, and steps into a path.

This plan has been modified over the course of ROSE's development,
and that was part of the plan itself: you plan at every step. See the
end of this document for the changes made to the plan.

## The roadmap

This is a rough summary of what I did and what I plan to do.

### [DONE] Learning RTL and HDL and getting familiar with SPI

Implement a functional SPI slave on the FPGA, add a small piece of
logic to manipulate the received data, and learn about clock domain
crossing (CDC) design and implementation.

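To make the SPI slave concrete, here's a minimal behavioral sketch in
Python (not RTL) of a mode-0 slave (CPOL=0, CPHA=0, MSB first) that
samples MOSI on each rising SCLK edge and shifts a response out on
MISO. The SPI mode, the bit order, the class and function names, and
the default `+1` transform (the increment check described further
down) are assumptions for illustration; it also ignores the clock
domain crossing that real hardware needs between SCLK and the FPGA
system clock.

```python
class SpiMode0Slave:
    """Behavioral model (not RTL) of an SPI mode-0 slave: CPOL=0, CPHA=0, MSB first.

    Each call to on_rising_edge() stands in for one SCLK rising edge while
    chip select is asserted. Clock domain crossing is not modeled.
    """

    def __init__(self, transform=lambda b: (b + 1) & 0xFF):
        self.transform = transform  # small logic applied to each received byte (assumed +1)
        self.rx_shift = 0           # receive shift register (MOSI, MSB first)
        self.bit_count = 0          # bits received in the current byte
        self.tx_shift = 0x00        # transmit shift register (MISO, MSB first)
        self.received = []          # completed bytes, kept for inspection

    def miso(self) -> int:
        """Bit currently driven on MISO (the MSB of the transmit register)."""
        return (self.tx_shift >> 7) & 1

    def on_rising_edge(self, mosi_bit: int) -> None:
        # Sample MOSI into the receive shift register.
        self.rx_shift = ((self.rx_shift << 1) | (mosi_bit & 1)) & 0xFF
        self.bit_count += 1
        if self.bit_count == 8:
            byte = self.rx_shift
            self.received.append(byte)
            # The response depends on the full byte, so it goes out on the next transfer.
            self.tx_shift = self.transform(byte) & 0xFF
            self.rx_shift = 0
            self.bit_count = 0
        else:
            # Present the next MISO bit (in hardware this happens on the falling edge).
            self.tx_shift = (self.tx_shift << 1) & 0xFF


def transfer(slave: SpiMode0Slave, tx_bytes) -> list:
    """Clock bytes through the slave like an SPI master and return the MISO bytes."""
    rx_bytes = []
    for byte in tx_bytes:
        rx = 0
        for i in range(7, -1, -1):            # MSB first
            rx = (rx << 1) | slave.miso()     # master samples MISO on the rising edge
            slave.on_rising_edge((byte >> i) & 1)
        rx_bytes.append(rx)
    return rx_bytes
```

Clocking `[0x10, 0x20, 0x30]` through a fresh slave returns
`[0x00, 0x11, 0x21]`: the first reply is whatever was preloaded, and
every later reply is the previous byte plus one. That one-byte lag is
inherent to this design, since the transformed byte only exists once
the whole input byte has arrived.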
### [DONE] Implement the routing logic along with the interfaces

This is the core of ROSE on the fabric side: a bare-minimum
implementation that disregards congestion control and inter-fabric
routing.

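With no congestion control and no inter-fabric routing, the routing
decision boils down to a fixed lookup from destination to output port.
The Python sketch below models only that behavior. The frame layout
(first byte is the destination ID), the port numbers, and the names
`ROUTES`, `route_frame`, and `route_all` are hypothetical, picked just
to illustrate the idea; the table is hardcoded, in line with the
decision to ditch dynamic routing described under "Changes to the
plan".

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical static routing table: destination ID -> output port.
# Fixed at build time; nothing updates it at runtime.
ROUTES: Dict[int, int] = {
    0x01: 0,  # e.g. the RPi wired to port 0
    0x02: 1,  # e.g. the RPi wired to port 1
}


def route_frame(frame: bytes, routes: Dict[int, int] = ROUTES) -> Optional[Tuple[int, bytes]]:
    """Return (output_port, frame), or None if the frame is dropped.

    Assumes a hypothetical frame layout whose first byte is the destination ID.
    There is no queueing or congestion control: a frame either has a route or is dropped.
    """
    if not frame:
        return None            # empty frame: nothing to route
    port = routes.get(frame[0])
    if port is None:
        return None            # unknown destination: drop
    return port, frame


def route_all(frames: List[bytes]) -> Dict[int, List[bytes]]:
    """Group frames by output port, preserving arrival order per port."""
    out: Dict[int, List[bytes]] = {}
    for frame in frames:
        result = route_frame(frame)
        if result is not None:
            port, routed = result
            out.setdefault(port, []).append(routed)
    return out
```

On the fabric, a table like this could map onto a small ROM or a
`case` statement, so every lookup takes the same fixed time regardless
of traffic.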
### [TODO] Test the logic in sims

This is the next step, and one of the most important parts of this
project. I need to test the logic in ideal scenarios so that I can
catch and fix bugs before I get to hard-to-debug bare metal.

### [TODO] Test on a RPi-FPGA setup

Getting the code to run in sims is one thing; getting it to run on
actual hardware is another. This entire step is about shipping the
code onto my setup and dealing with any synthesis and place-and-route
problems that sims won't reveal.

### [TODO] Implement logging to an external device via UART

This will lay the foundations for THORN and PETAL, and will also come
in handy when analyzing congestion and other anomalies.

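For reference, a UART link in the common 8N1 configuration sends each
byte as a start bit (0), eight data bits LSB first, and a stop bit
(1), with the line idling high. The Python sketch below shows that
framing plus the clock-divider arithmetic an FPGA transmitter would
need. The 12 MHz clock and 115200 baud figures, and all function
names, are assumptions for illustration, not ROSE's actual parameters.

```python
def uart_8n1_frame(byte: int) -> list:
    """Bits on the wire for one byte in 8N1: start, 8 data bits LSB first, stop."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # LSB first
    return [0] + data_bits + [1]


def uart_bitstream(message: bytes) -> list:
    """Concatenate 8N1 frames for a whole log message (no idle gaps between bytes)."""
    bits = []
    for b in message:
        bits.extend(uart_8n1_frame(b))
    return bits


def clocks_per_bit(clk_hz: int, baud: int) -> int:
    """Integer divider for one bit period, rounded to the nearest clock cycle."""
    return (clk_hz + baud // 2) // baud


# Assumed figures: a 12 MHz FPGA clock and a 115200 baud log link.
CLK_HZ = 12_000_000
BAUD = 115_200
```

With those assumed figures the divider comes out to 104 cycles per
bit, which gives roughly 115385 baud, about 0.16% off the nominal
rate.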
### [TODO] Test on a RPi-FPGA-RPi setup

This is where THORN branches off from ROSE. ROSE should keep a few
minimal unit-test testbenches, while fully functional test suites and
toolchains are implemented in THORN.

### [TODO] RPi's ROSE buffer implementation

Use `mmap`-ed memory for the SPI drivers on the RPis to *simulate*
zero-copy on the RPis.

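The idea behind an `mmap`-based buffer is that producer and consumer
work on the same mapped pages instead of copying data between buffers.
The Python sketch below illustrates only the user-space half of that
idea: one anonymous `mmap` region split into fixed-size slots, handed
out as `memoryview` slices so reads and writes go straight into the
mapping. The slot size, slot count, and the class name `SlotBuffer`
are hypothetical, and this sketch does not share the region with the
SPI driver, which the real buffer would have to do.

```python
import mmap


class SlotBuffer:
    """Fixed-size slots over one anonymous mmap region (hypothetical layout)."""

    def __init__(self, slot_size: int = 256, num_slots: int = 8):
        self.slot_size = slot_size
        self.num_slots = num_slots
        # Anonymous, zero-filled mapping; a real ROSE buffer would be shared with the driver.
        self._mm = mmap.mmap(-1, slot_size * num_slots)
        self._view = memoryview(self._mm)

    def slot(self, index: int) -> memoryview:
        """View of one slot without copying; writes land directly in the mapped pages."""
        if not 0 <= index < self.num_slots:
            raise IndexError("slot index out of range")
        start = index * self.slot_size
        return self._view[start:start + self.slot_size]

    def close(self) -> None:
        # All views must be released first, or mmap.close() raises BufferError.
        self._view.release()
        self._mm.close()


# Usage: write a payload in place, then read it back through a fresh view.
buf = SlotBuffer(slot_size=16, num_slots=4)
with buf.slot(1) as s:          # memoryview releases itself when the block exits
    s[:5] = b"hello"
with buf.slot(1) as s:
    payload = bytes(s[:5])      # copy out only when the consumer really needs a copy
buf.close()
```

Releasing views through `with` matters here: Python refuses to close
an `mmap` while any view into it is still alive.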
### [TODO] Modify the SPI kernel drivers for explicit DMA control

Allow ROSE's DMA to be implemented in the drivers.

### [TODO] Abstract ROSE into APIs or kernel devices

Note: This may be implemented as THORN's development gets underway, or
be facilitated by it.

### [TODO] Implement mesh networks allowing inter-fabric routing

ROSE shouldn't be limited to a single fabric.

## Changes to the plan

The plan is always changing, but it's important to remember what I
learned from every change.

### Ditching dynamic routing

In a datacenter or HFT setup, the connected devices are rarely
expected to change. Hardcoded routing paths are perfectly acceptable
and keep in line with the deterministic nature of ROSE.

#### The lesson learned

Figure out the exact target of ROSE: it's not meant for generic
networks, so shave off anything it doesn't need.

### Not reversing the input when testing out the SPI interface

A few things to note:

1. Sending back each byte incremented by 1 is sufficient to prove a
   stable connection.
2. Reversing the input would require a double-ended queue, increasing
   the complexity of the logic with little benefit to later steps.

So, I've decided to ditch this idea.

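The increment check is easy to verify from the master side. This
Python sketch assumes the slave replies to each byte with that byte
plus one, wrapping from 0xFF to 0x00, during the *following* transfer:
because the `+1` result depends on every bit of the incoming byte, it
can't be shifted out until the whole byte has arrived. The function
name and the `lag` parameter are hypothetical.

```python
from typing import List, Sequence


def check_increment_echo(sent: Sequence[int], received: Sequence[int], lag: int = 1) -> List[int]:
    """Return indices of sent bytes whose echo came back wrong.

    Assumes received[i + lag] should equal (sent[i] + 1) mod 256. The first
    `lag` received bytes carry no echo, and the echoes of the last `lag` sent
    bytes would arrive in a later transfer, so neither is checked.
    """
    if len(received) != len(sent):
        raise ValueError("SPI is full duplex: expected one received byte per sent byte")
    bad = []
    for i, byte in enumerate(sent):
        j = i + lag
        if j >= len(received):
            break  # echo not yet clocked back
        if received[j] != (byte + 1) & 0xFF:
            bad.append(i)
    return bad
```

Driving it with long runs of random bytes, rather than one fixed
pattern, makes it more likely to catch bit slips or glitches at the
clock domain crossing.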
#### The lesson learned

Plan with the next step in mind, and act with the next step in mind.
Know what is enough and what is too far.

### Postponing deployment onto the hardware

Originally, I planned to deploy the logic and test it on real hardware
as soon as I had a working SPI module. But that isn't really
practical: I'd be fixing synthesis issues with every step thereafter.
It's better to finalize the design in sims first, and then solve the
FPGA-specific problems as a step of its own.

I'd rather scratch my head over a bunch of problems at once than
scratch my head every time I push an update to the logic.

#### The lesson learned

Weigh testing against its cost in time and efficiency. If testing
hinders development, then it should be separated from the development
cycle.

### Ditching features

I ditched the plans for supporting AI clusters, along with the plans
for congestion control. The focus is on reducing latency with an
implementation that's elegant and simple.

#### The lesson learned

Focus. Know what ROSE really stands for, and stop spending thought on
unnecessary things like trying to dual-wield AI and HFT workloads.

### Added sims as part of the plan

Yes, that should be explicitly written in the plan!

#### The lesson learned

Every step is a step. Every step you've walked is solid, and every
step you plan should also follow sequentially. There's no jumping from
raw code to deployment, and there's no excuse for leaving this out of
the plan.