ROSE - RDMA Over SPI Engine
A zero-copy, RDMA-inspired transport layer built over SPI, designed for embedded systems and affordable hardware experimentation.
ROSE is part of the larger FLORA project, see [link here] for details.
Table of Contents
- What is a rose?
- What is ROSE?
- The goal
- Getting started
- Coding in style
- Protocol specifications
- The planning
- The action
- Why ROSE?
- Special thanks
What is a rose?
Roses are elegant. So must be the systems we grow.
This is the underlying philosophy of ROSE since the day the idea popped into my head.
What is ROSE?
ROSE is an open-source, RDMA-inspired data transfer protocol and engine built over SPI, originally developed on Raspberry Pi devices and an FPGA fabric. It simulates key properties of RDMA (low latency, memory-mapped access, and zero-copy semantics) on affordable, widely accessible hardware.
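To make "memory-mapped, zero-copy semantics" concrete, here is a minimal sketch that simulates the idea inside a single Python process. It is not ROSE's API or wire protocol: `MemoryRegion`, `rdma_write`, and `rdma_read` are hypothetical names invented for this illustration, and no SPI traffic is involved.

```python
# Illustrative only: simulates RDMA-style access to a registered memory
# region inside one process. All names are hypothetical, not ROSE's API.


class MemoryRegion:
    """A fixed buffer that a (simulated) remote peer may read or write."""

    def __init__(self, size: int) -> None:
        self._buf = bytearray(size)

    def view(self, offset: int, length: int) -> memoryview:
        # Slicing a memoryview does not copy; it aliases the same bytes.
        if offset < 0 or length < 0 or offset + length > len(self._buf):
            raise ValueError("access outside the registered region")
        return memoryview(self._buf)[offset:offset + length]


def rdma_write(region: MemoryRegion, offset: int, payload: bytes) -> None:
    """Place the payload directly into the region, with no staging buffer."""
    region.view(offset, len(payload))[:] = payload


def rdma_read(region: MemoryRegion, offset: int, length: int) -> memoryview:
    """Return a window onto the region's bytes without copying them."""
    return region.view(offset, length)


region = MemoryRegion(16)
rdma_write(region, 4, b"ROSE")
window = rdma_read(region, 4, 4)
print(bytes(window))  # b'ROSE'

rdma_write(region, 4, b"rose")
print(bytes(window))  # b'rose' (the window aliases the region)
```

The second `print` shows the point of the exercise: the reader holds a window onto the target memory rather than a private copy, so a later write is visible through it.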
The goal
To explore systems-level design, test automation and test-driven development cycles, and high-performance data movement through hardware-software co-design.
Getting started
Requirements
- Some SBCs with SPI interfaces, preferably with a DMA-enabled SPI controller (Raspberry Pis will do just fine)
- An FPGA with enough pins (5 per device connection)
- The willingness and courage to tinker with Linux and FPGAs.
- Some SystemVerilog knowledge (unless you're using the Tang Primer 20K, which is what I use)
- A good terminal emulator, a good shell, and a good code editor (emacs preferred).
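The pin requirement above turns into a quick budget check when picking an FPGA board. This is a rough sketch that assumes only the figure of 5 pins per device connection stated in the list; the function name and the example pin counts are hypothetical and not taken from any board's datasheet.

```python
PINS_PER_CONNECTION = 5  # from the requirements list above


def max_connections(available_io_pins: int, reserved_pins: int = 0) -> int:
    """How many device connections fit in the FPGA's free I/O pins."""
    usable = available_io_pins - reserved_pins
    if usable < 0:
        raise ValueError("more pins reserved than available")
    return usable // PINS_PER_CONNECTION


# Hypothetical board: 40 user I/O pins, 4 of them reserved for other uses.
print(max_connections(40, reserved_pins=4))  # 7
```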
Deploying the masters/controllers on the SBCs
TBD
Deploying the slaves/peripheral modules on the FPGA fabric
TBD
Coding in style
Coding style is important. Code is written for both humans and machines; the machine doesn't care about style, but humans do. A consistent style makes development much easier.
See style.md for details.
Protocol specifications
ROSE was designed to embrace newer possibilities as development continues.
See protocol.md for details.
The planning
See the plan in plan.md. This file also contains short summaries of what I did at each step.
Most of ROSE's behaviors and features were planned before the first source file was even created. A good plan serves as both a guideline and a reward mechanism: you know early when you're running into trouble, and you know when you've taken a solid step toward realizing the project, even if that step is just a simple shift register that sends back what your device sent.
Plans turn fear into focus, risk into reach, and steps into a path.
When you dream big, use a plan to ground it with smaller, more manageable structures. And most people like it when their dreams come true.
The action
See the devlog/ directory for a detailed record of the development process.
Writing down what I did, how I did it, what walls I ran into, and what I learned from my mistakes helped me realize this project.
Why ROSE?
RDMA hardware is quite inaccessible to the average person, or at least it was when this project started. I want to show that with a few hundred dollars and some tinkering, I can build a close simulation of an industrial-grade RDMA network: one with lower bandwidth and higher latency than the real thing, but one that still outperforms, in specific use cases, the Ethernet + TCP/IP stack we use day to day. The best way to learn the ways of the industry is to try to build a miniature version in your garage.
The inspiration
Before planning the project and laying out a roadmap for weeks and even months of development, I did a co-op at Nokia as a SR Platform Testing Dev. That taught me how networks work and how data is transmitted from one end of the world to the other (I mostly worked on the network and data link layers). The co-op helped me land an offer as a Data Center Network Engineer (intern), and it also piqued my interest in the lower levels of how the internet runs today.
I did my own research on how to quickly get into the world of data centers, and came across RDMA, InfiniBand, and RoCE. It was love at first sight. The idea of accessing a piece of data on a remote machine as if it were on the requesting machine felt like magic. Then came the cold, harsh truth: my PC simply could not run anything like that because of hardware and software constraints. So I sought another path, looking for a protocol with DMA support and RDMA potential, and ended up with SPI. It's not as powerful as messing around with PCIe lanes, but it is more cost-effective and easier to implement. And with the right design, the project can later be migrated to SERDES or other connections.
The idea bloomed like a flower, extending its petals, and before I knew it, I had designed an entire framework with RDMA running on SPI connections and a test-driven development process (which, with some more planning, became the foundation of THORN).
But designing is not enough. Anyone can come up with an idea, and some even come up with brilliant ones. I needed something concrete to keep me anchored to my goal of learning how this technology works. And so ROSE came into being.
Special thanks
I'd like to express my gratitude to ChatGPT and other AI-driven tools for helping me realize this project. I didn't use them to write the actual code, but I used them to explore my ideas, to plan my path, and to catch anything I overlooked in the process. They are powerful in that way: they can help expand what you have in mind, and they can offer insights into areas you've never even heard of. For that, I'm thankful to the ever-evolving world of technology, and to the countless researchers whose efforts help us live in a better world.