initial commit: figuring out SPI on the Tang Primer 20K, already 3 devlogs, will commit on a per-devlog/document change basis

2025-05-11 00:35:24 -04:00
commit 24bf28db9d
8 changed files with 549 additions and 0 deletions

# SPI slave implementation on the Tang Primer 20K
Date: 2025-05-03
## Goals and expectations
Today's goal is to get a simple SPI slave module running on the Tang
Primer 20K that receives bits from a master (a Raspberry Pi) and sends
them back, optionally incrementing each byte by one.
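A byte-level Python model (not HDL) makes the expected echo behaviour concrete. It assumes the slave loads its reply once the 8th bit of a byte has arrived and shifts it out during the *next* byte, because SPI clocks one byte in and one byte out at the same time. The function name and the `0x00` placeholder for the very first reply are illustrative assumptions.

```python
def spi_echo_plus_one(mosi_bytes: bytes, first_reply: int = 0x00) -> list[int]:
    """Byte-level model of a full-duplex SPI slave that echoes each byte + 1.

    SPI clocks one byte in and one byte out simultaneously, so the reply to
    byte n can only be shifted out while byte n + 1 is being received.
    """
    miso_bytes = []
    tx_buff = first_reply  # placeholder sent before any byte has arrived
    for byte in mosi_bytes:
        miso_bytes.append(tx_buff)    # shifted out while `byte` shifts in
        tx_buff = (byte + 1) & 0xFF   # loaded once the 8th bit has arrived
    return miso_bytes


reply = spi_echo_plus_one(b"HELLO\x00")  # trailing dummy byte
print(bytes(reply[1:]))  # b'IFMMP'
```

Under these assumptions the master only receives the reply to its last byte by clocking one extra dummy byte, which is why the example appends `\x00`.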
### Before diving in
- I have not formally learned SystemVerilog yet, and perhaps I won't
until the early stages of THORN, since simulation is one of its
concerns.
- Today's development follows a learn-as-I-go model. The point is to
get my hands dirty with the FPGA quickly and to give myself some
positive feedback after drawing up such a big plan and, honestly,
getting a bit scared by the details (even though I set pretty loose
deadlines).
- Hopefully I can figure out how to run their programmer on Linux as
well, and hopefully they provide command-line access to their
toolchain; otherwise I would have to ship the synthesized bitstream in
the repository so that I can automate flashing it.
## Results
We didn't get as far as pushing the logic to the FPGA, but we did pass
the simulation in `verilator`, with a testbench (`tb`) generating the
master's signals. This is a great step forward, since I implemented it
with a buffer in mind, which would correspond to buffering an entire
ROSE packet in memory.
## Reflections
1. Use elegant solutions: if something is ugly, it should not have
worked in the first place.
2. Programming in SystemVerilog is very different from ordinary,
sequential programming languages, especially when it comes to
non-blocking assignments.
   - The right-hand sides of all non-blocking assignments are
     evaluated first, using the values from before the clock edge, and
     then all of the left-hand sides are updated at once.
   - For example: if I want to load a buffer with the received byte
     incremented by one upon reception of the 8th bit, I have to write
     `tx_buff <= {rx_shift[6:0], mosi} + 1` instead of
     `tx_buff <= rx_shift + 1`, because `rx_shift` hasn't been updated
     yet in that cycle. It actually took me a while to figure out that
     the latter loads the buffer with older (stale) results.
3. **ALWAYS** test with longer inputs; every bit matters.
   - For starters, my initial (incorrect) logic ran fine against a
     single byte of data, but I didn't notice that it was actually
     updating the buffer upon reception of the 9th bit. It quickly
     fell apart when I tested it against something like "HELLO" and
     got gibberish back, and that was when I knew the buffer-update
     timing was wrong.
   - Then came the next problem with "HELLO": the "fixed" version
     returned results that were incremented by **2**. It was tempting
     to just decrement by 1 when updating `tx_shift` and when sending
     out the first bit of each byte (since `tx_shift` isn't updated at
     that point, that bit has to be drawn from `tx_buff`), or to simply
     not add 1 to `tx_buff`. But fixing the logic mattered more, and
     that's when I noticed that the characters had 0's in their
     second-to-last bit across the entire string. Yup, another sync
     issue. So I moved on to testing with inputs that have more
     coverage, like "ABCD", which cycles through every combination of
     the two lowest bits (and so covers the parity).
4. Plan out what to do at each clock edge. Clocks are confusing to mix
with combinational logic, especially since SPI uses both the rising
and the falling edges. It took me a while to realize that, with `sclk`
idling low (as in SPI mode 0), the rising edge of each bit happens
**before** the falling one, which means the data captured on the
rising edge has already been updated by the time the falling edge
arrives.
5. Use `$display` to poke around the bits and bytes. An oscilloscope
might be even better, so I should probably set one up. Displaying
debug messages has helped me catch many errors in my logic
(completely avoidable ones, just a newbie programmer's fault).
6. ChatGPT might not be the best tool for help. I think it did well
helping me plan this project, but not with the actual code (according
to my philosophy, it never should even try). It tried to write a 2FF
synchronizer for the slave module, which is completely unnecessary
since we're clocking directly off the master's clock, although it
would be helpful later on for passing data between clock domains.
   - Do your own research; use ChatGPT for suggestions and for
     analyzing error messages (thanks to `verilator` for giving me
     good error messages and kidnapping me like the Rust compiler).
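The non-blocking pitfall from reflection 2, and why reflection 3's longer tests catch it, can be reproduced with a small Python model of the receive path. This is an illustration, not synthesizable code: the register names mirror the ones above, while the 3-bit counter and the helper names are assumptions. Each loop iteration is one rising `sclk` edge; every right-hand side is computed from the old register values, then all registers are committed together, which is what `<=` does.

```python
def to_bits(data: bytes) -> list[int]:
    """Serialize bytes MSB first, the way an SPI master clocks them out."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]


def receive(bits: list[int], use_fresh_value: bool) -> list[int]:
    """Model the receive path, one loop iteration per rising sclk edge.

    Mimics non-blocking (<=) semantics: every right-hand side is evaluated
    with the register values from *before* the edge, then all registers
    are committed together.
    """
    rx_shift, bit_cnt, tx_buff = 0, 0, 0
    loaded = []  # every value written into tx_buff
    for mosi in bits:
        # 1) evaluate right-hand sides with the old register values
        next_rx = ((rx_shift << 1) | mosi) & 0xFF  # {rx_shift[6:0], mosi}
        next_cnt = (bit_cnt + 1) % 8               # 3-bit counter wraps
        next_tx = tx_buff
        if bit_cnt == 7:  # this edge delivers the 8th bit of a byte
            if use_fresh_value:
                next_tx = (next_rx + 1) & 0xFF     # {rx_shift[6:0], mosi} + 1
            else:
                next_tx = (rx_shift + 1) & 0xFF    # rx_shift + 1: stale value
            loaded.append(next_tx)
        # 2) commit: all registers change "at once"
        rx_shift, bit_cnt, tx_buff = next_rx, next_cnt, next_tx
    return loaded


print(bytes(receive(to_bits(b"ABCD"), use_fresh_value=True)))  # b'BCDE'
print([hex(b) for b in receive(to_bits(b"ABCD"), use_fresh_value=False)])
# ['0x21', '0xa2', '0x22', '0xa3']
```

The stale version loads the previous byte's last bit plus the current byte's first seven bits, so its error depends on the data and can't be patched with a constant offset.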
## Final thoughts
SystemVerilog is confusing. Combinational logic is confusing.
Designing logic and tests within that framework is confusing.
Fuddling with them befuddled me. But it's sweet to see some positive
feedback after all that planning. It's a great start (even if it's
just some simple receive-and-send-back logic); I feel like I'm
actually starting to learn the ropes here, in a way I never could have
by reading online tutorials or a reference book.
Even though most of today was me shooting myself in the foot with a
poor understanding of the principles behind SV, it helped me grasp how
the language works and what it produces. It felt like opening myself
up to a new domain, where everything can run all at once, something
achievable in Python or C only by running multiple threads and using
mutexes to prevent data corruption. It felt like a leap of faith into
abstracting the logic gates while keeping the logic alive. I still
have faith that ROSE will succeed, along with THORN and PETAL.
I still set up a testbench in ROSE, but after the completion of a
working prototype, it will be migrated/integrated into THORN.
For now, let the rose sprout on its own and let it gain the momentum
to grow its stems and leaves...
## The next step
Dump the logic onto the FPGA and see how it responds to the Raspberry
Pi.
Then, try to set up a buffer inside the FPGA's registers to hold
packets. Perhaps try to receive a number of ROSE packets and then send
them out in reverse order, with their contents modified (like
switching the source and destination fields).
Might as well enable UART dumps for debug messages; they would come in
handy once I hand the logic over to the FPGA. This is highly optional
for a "next step", but a must in the near term.

# SPI slave implementation on the Tang Primer 20K
Date: 2025-05-04
## Goals and expectations
Since I completed the simulated runs of the simple SPI slave's logic
yesterday, I wanted to test it on real hardware to get my hands dirty,
and also start adding queues to the logic to actually handle traffic
and, if time allows, play around with UART for a while.
## Results
Less than appealing; definitely room for improvement. I got stuck
because apparently `sclk` has to be connected to a dedicated
clock-capable pin if I want to treat the incoming signal as a clock.
That turned today into a deep dig through the FPGA's documentation for
usable pins, and usable `GCLK` pins were hard to find since I'm using
the Dock ext-board, which already repurposes several (most) of them
for onboard peripheral ports like Ethernet. I have exactly one pair of
`GCLK` pins for potentially eight connected RPis. That's not enough.
So I'm going with the other method: using an internal
(higher-frequency) clock to sample `sclk` through 2FF synchronizers.
This will come in handy when I start managing different queues for
multiple connections, so today wasn't a complete stall.
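A Python model of this approach, assuming the internal clock is several times faster than `sclk`: each list element is one internal clock cycle, the first two stages form the synchronizer, and the third holds the previous synchronized value so that a 0-to-1 or 1-to-0 pair marks an edge. It mirrors the common `sclk_r <= {sclk_r[1:0], sclk}` shift-register pattern; the function and register names are hypothetical, and metastability itself can't be modelled in software.

```python
def detect_edges(sclk_samples: list[int]) -> tuple[list[int], list[int]]:
    """Model a 2FF synchronizer plus edge detector on a fast internal clock.

    Each element of `sclk_samples` is the level of sclk at one internal
    clock edge. Returns the cycle indices where rising/falling edges of the
    synchronized sclk are flagged.
    """
    sclk_r = [0, 0, 0]  # [0], [1]: synchronizer stages; [2]: previous value
    rising, falling = [], []
    for cycle, level in enumerate(sclk_samples):
        # sclk_r <= {sclk_r[1:0], sclk}: each stage takes its neighbour's old value
        sclk_r = [level, sclk_r[0], sclk_r[1]]
        if (sclk_r[2], sclk_r[1]) == (0, 1):
            rising.append(cycle)
        elif (sclk_r[2], sclk_r[1]) == (1, 0):
            falling.append(cycle)
    return rising, falling


# sclk with a period of 8 internal clock cycles
samples = ([0] * 4 + [1] * 4) * 3
print(detect_edges(samples))  # ([5, 13, 21], [9, 17])
```

Every edge is flagged a fixed number of internal cycles after `sclk` actually changes; that latency is the price of not needing a `GCLK` pin.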
At the very least I got some research done on how the FPGA's pins are
assigned, and it's been a great learning experience.
So, in short, I did nothing tangible today, but paved the way for
tomorrow.
## Reflections
1. Get things into action, real action. Sims can only detect logical
errors, but throwing the code into Place and Route and letting the
FPGA-specific tools warn you about what you're doing wrong is
essential when building this close to actual hardware.
   - If yesterday opened my mind to logic that runs simultaneously,
     then today was me tripping over because I left the safe embrace
     of abstracted-away hardware.
2. More knowledge isn't always bad for the project; I need more
foresight from now on.
## Final Thoughts
When you port your code from one Linux machine to another, you'd
expect it to work with some minor tweaks to the dependencies (unless
they touch the kernel), but porting FPGA code from simulation to
hardware (on an unfamiliar platform) feels harder than writing the
logic itself.
The future is still looking great.
## The next step
Same as yesterday: test the logic on hardware, try setting up
buffers, and potentially tinker with UART.

# SPI slave implementation on the Tang Primer 20K
Date: 2025-05-04
## Goals and expectations
Get back on track.
I've been stalling even more because my co-op term started and I have
a lot to learn every day. But it has also given me a lot of
inspiration, especially DCTCP and UET's congestion-management methods,
with their adaptive adjustment of transmission windows.
It feels good to see some more ideas getting integrated into ROSE, but
also daunting because I know if I don't get moving soon, this will end
up in the trash.
## Results
Decent. Decent progress considering the past week: I managed to solve
the sync issue.
My first approach was to just use 2 flip-flops to detect the rising
and falling edges, but somehow the transmission was delayed by exactly
1 bit (one `sclk` cycle). The 2 flip-flop method is expected to add a
small, fixed latency of a couple of internal clock cycles to every
detected edge, but not to leave the output permanently one bit behind.
It would've been easy to just drive `miso` from the second most
significant bit instead, but that's not a solution if I want to pipe
entire bytes to a queue somewhere else.
Thanks to the tutorial at https://www.fpga4fun.com/SPI2.html, I found
the problem: I have to prepare the outgoing bit *before* the falling
edge, because by the time my edge detection reported a falling edge,
the testbench had already sampled the `miso` line.
With that out of the way, I can finally start thinking about how to
implement some simple routing logic.
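A bit-level Python model shows why the outgoing bit has to be ready early. It assumes SPI mode 0: the master samples `miso` on each rising edge and the slave shifts on each falling edge. If `miso` is driven straight from the MSB of a transmit shift register that is loaded before the first rising edge (similar to what the fpga4fun tutorial does), the byte arrives intact. If `miso` is instead a register that only picks up the next bit *at* the falling edge, every bit shows up one edge late. The `registered_miso` switch is a hypothetical reconstruction of such a one-bit lag, not necessarily the exact bug described above.

```python
def mode0_transfer(tx_byte: int, registered_miso: bool = False) -> int:
    """Bit-level SPI mode 0 model of one byte sent from slave to master.

    Per bit: the master samples miso on the rising edge, then the slave
    shifts on the falling edge.
    """
    tx_shift = tx_byte & 0xFF  # loaded before the first rising edge
    miso_reg = 0               # only used by the late variant
    received = 0
    for _ in range(8):
        # what is on the wire when the rising edge arrives
        miso = miso_reg if registered_miso else (tx_shift >> 7) & 1
        received = (received << 1) | miso  # master samples on rising edge
        # falling edge (non-blocking: both updates see the old tx_shift)
        miso_reg = (tx_shift >> 7) & 1      # miso <= tx_shift[7]
        tx_shift = (tx_shift << 1) & 0xFF   # tx_shift <= tx_shift << 1
    return received


print(hex(mode0_transfer(0x41)))                        # 0x41
print(hex(mode0_transfer(0x41, registered_miso=True)))  # 0x20, one bit late
```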
## Reflections
1. Elegance matters. The first approach was "okay" in the sense that
it actually did what I wanted, but "not okay" in the sense that it
fits terribly into a larger system. If it's not elegant, it probably
won't fit.
2. Plan small. I never thought that syncing across clock domains could
be such a problem, and it can even reveal problems in a working
design (e.g. the module I wrote on the first day, which uses `sclk`
directly: it assigns the new `miso` value on `negedge sclk` and
worked perfectly fine in the sim). Plan small and hope that you can
do more.
3. Read other people's code, see how other people do it, and find
good resources. I originally didn't try to use other people's code,
simply because the code I found on GitHub was too generalized and
complex for my current understanding, and because it was written in
Verilog, which I expected to be a pain to use from SystemVerilog. But
once I switched my search keywords from "SPI SystemVerilog" to "SPI
FPGA", the referenced tutorial came up, and it was a lifesaver: the
cleanest code for implementing the features I wanted.
4. Plan well. The first approach was exposed as a somehow-working bug
as soon as I tried to integrate it into the larger system. That's
planning helping you navigate the project and flagging pitfalls very
early on.
## Final thoughts
Hopefully co-op will ease up (as I digest all the incoming info from
the training), and we'll make progress.
I can't imagine how systems running on higher clocks could keep
metastability in check if everyone built stuff like I do ;)
## The next step
Implement FIFO modules. Try to send a byte stream backwards.
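For the FIFO step, one common hardware pattern is a circular buffer whose read and write pointers carry one extra wrap bit. "Empty" is then "pointers equal" and "full" is "same slot, different wrap bit", with no separate counter needed. The Python model below is a sketch of that pattern. The class name, the power-of-two depth and the drop-on-full behaviour are assumptions, not a design from this log.

```python
class Fifo:
    """Circular-buffer FIFO modelled on a hardware FIFO: the read and write
    pointers are one bit wider than the address, so full and empty can be
    told apart without a counter."""

    def __init__(self, addr_bits: int = 3):
        self.depth = 1 << addr_bits
        self.mask = (self.depth << 1) - 1  # pointer width = addr_bits + 1
        self.mem = [0] * self.depth
        self.wr = 0
        self.rd = 0

    def empty(self) -> bool:
        return self.wr == self.rd

    def full(self) -> bool:
        # same slot, opposite wrap bit
        return (self.wr ^ self.rd) == self.depth

    def push(self, byte: int) -> bool:
        if self.full():
            return False  # write is dropped, like ignoring wr_en when full
        self.mem[self.wr & (self.depth - 1)] = byte & 0xFF
        self.wr = (self.wr + 1) & self.mask
        return True

    def pop(self):
        if self.empty():
            return None
        byte = self.mem[self.rd & (self.depth - 1)]
        self.rd = (self.rd + 1) & self.mask
        return byte


fifo = Fifo(addr_bits=2)  # 4 entries
for b in b"ABCDE":
    fifo.push(b)          # 'E' is dropped: the FIFO is already full
print(bytes(fifo.pop() for _ in range(4)))  # b'ABCD'
```

In HDL the same checks reduce to single comparisons on the pointer registers, which keeps the full/empty logic small.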