llama.cpp/docs/build-riscv64-spacemit.md
alex-spacemit b77e6c18e1 ggml: riscv: add riscv spacemit backend (#15288)
2025-09-29 17:50:44 +03:00


Important

This build documentation applies only to RISC-V SpacemiT SoCs.

Build llama.cpp locally (for riscv64)

  1. Prepare Toolchain For RISCV
wget https://archive.spacemit.com/toolchain/spacemit-toolchain-linux-glibc-x86_64-v1.1.2.tar.xz
  2. Build
The build script below uses RISC-V vector instructions for acceleration, so the GGML_CPU_RISCV64_SPACEMIT compilation option must be enabled. The only optimization version currently supported is RISCV64_SPACEMIT_IME1, which is selected through the RISCV64_SPACEMIT_IME_SPEC compilation option. Compiler settings are defined in the riscv64-spacemit-linux-gnu-gcc.cmake file. Before building, make sure the RISC-V compiler is installed and point the environment variable at it with export RISCV_ROOT_PATH={your_compiler_path}.

cmake -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_CPU_RISCV64_SPACEMIT=ON \
    -DLLAMA_CURL=OFF \
    -DGGML_RVV=ON \
    -DGGML_RV_ZFH=ON \
    -DGGML_RV_ZICBOP=ON \
    -DRISCV64_SPACEMIT_IME_SPEC=RISCV64_SPACEMIT_IME1 \
    -DCMAKE_TOOLCHAIN_FILE=${PWD}/cmake/riscv64-spacemit-linux-gnu-gcc.cmake \
    -DCMAKE_INSTALL_PREFIX=build/installed

cmake --build build --parallel $(nproc) --config Release

pushd build
make install
popd
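
The script above depends on RISCV_ROOT_PATH being set, and an unset variable tends to show up only as an obscure CMake error. As a minimal sketch, a check like the following could be run before configuring. The function name check_riscv_root is hypothetical, and the only assumption about the toolchain layout is that the unpacked directory contains a bin/ subdirectory.

```shell
# Hypothetical helper: confirm that a toolchain root looks usable.
# Takes the path as $1, falling back to RISCV_ROOT_PATH.
check_riscv_root() {
    root="${1:-${RISCV_ROOT_PATH:-}}"
    if [ -z "$root" ]; then
        echo "RISCV_ROOT_PATH is not set" >&2
        return 1
    fi
    # Assumption: the unpacked toolchain keeps its compiler under bin/.
    if [ ! -d "$root/bin" ]; then
        echo "no bin/ directory under $root" >&2
        return 1
    fi
    echo "toolchain root: $root"
}

# Possible usage before the cmake step:
# export RISCV_ROOT_PATH={your_compiler_path}
# check_riscv_root || exit 1
```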

Simulation

You can use QEMU to perform emulation on non-RISC-V architectures.

  1. Download QEMU
wget https://archive.spacemit.com/spacemit-ai/qemu/jdsk-qemu-v0.0.14.tar.gz
  2. Run Simulation
After building llama.cpp, you can run the executable under QEMU, for example:
export QEMU_ROOT_PATH={your QEMU file path}
export RISCV_ROOT_PATH_IME1={your RISC-V compiler path}

${QEMU_ROOT_PATH}/bin/qemu-riscv64 -L ${RISCV_ROOT_PATH_IME1}/sysroot -cpu max,vlen=256,elen=64,vext_spec=v1.0 ${PWD}/build/bin/llama-cli -m ${PWD}/models/Qwen2.5-0.5B-Instruct-Q4_0.gguf -t 1
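
To run other binaries from build/bin the same way, the invocation above can be wrapped in a small function. This is only a sketch: the function name run_qemu and the DRY_RUN switch are hypothetical, while the QEMU flags are copied unchanged from the command above.

```shell
# Hypothetical wrapper around the qemu-riscv64 command shown above.
# Expects QEMU_ROOT_PATH and RISCV_ROOT_PATH_IME1 to be exported already.
run_qemu() {
    # Prepend the emulator, sysroot and CPU flags to the caller's arguments.
    set -- "${QEMU_ROOT_PATH}/bin/qemu-riscv64" \
        -L "${RISCV_ROOT_PATH_IME1}/sysroot" \
        -cpu max,vlen=256,elen=64,vext_spec=v1.0 \
        "$@"
    if [ -n "${DRY_RUN:-}" ]; then
        # DRY_RUN is illustrative: print the command instead of running it.
        echo "$*"
    else
        "$@"
    fi
}

# Same run as in the example above:
# run_qemu "${PWD}/build/bin/llama-cli" -m "${PWD}/models/Qwen2.5-0.5B-Instruct-Q4_0.gguf" -t 1
```

Setting DRY_RUN=1 prints the full command line, which makes it easy to confirm the sysroot path before starting a long emulated run.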

Performance

Quantization Support For Matrix

model name      : Spacemit(R) X60
isa             : rv64imafdcv_zicbom_zicboz_zicntr_zicond_zicsr_zifencei_zihintpause_zihpm_zfh_zfhmin_zca_zcd_zba_zbb_zbc_zbs_zkt_zve32f_zve32x_zve64d_zve64f_zve64x_zvfh_zvfhmin_zvkt_sscofpmf_sstc_svinval_svnapot_svpbmt
mmu             : sv39
uarch           : spacemit,x60
mvendorid       : 0x710
marchid         : 0x8000000058000001

Q4_0

| Model | Size | Params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen2.5 0.5B | 403.20 MiB | 630.17 M | cpu | 4 | pp512 | 64.12 ± 0.26 |
| Qwen2.5 0.5B | 403.20 MiB | 630.17 M | cpu | 4 | tg128 | 10.03 ± 0.01 |
| Qwen2.5 1.5B | 1011.16 MiB | 1.78 B | cpu | 4 | pp512 | 24.16 ± 0.02 |
| Qwen2.5 1.5B | 1011.16 MiB | 1.78 B | cpu | 4 | tg128 | 3.83 ± 0.06 |
| Qwen2.5 3B | 1.86 GiB | 3.40 B | cpu | 4 | pp512 | 12.08 ± 0.02 |
| Qwen2.5 3B | 1.86 GiB | 3.40 B | cpu | 4 | tg128 | 2.23 ± 0.02 |

Q4_1

| Model | Size | Params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen2.5 0.5B | 351.50 MiB | 494.03 M | cpu | 4 | pp512 | 62.07 ± 0.12 |
| Qwen2.5 0.5B | 351.50 MiB | 494.03 M | cpu | 4 | tg128 | 9.91 ± 0.01 |
| Qwen2.5 1.5B | 964.06 MiB | 1.54 B | cpu | 4 | pp512 | 22.95 ± 0.25 |
| Qwen2.5 1.5B | 964.06 MiB | 1.54 B | cpu | 4 | tg128 | 4.01 ± 0.15 |
| Qwen2.5 3B | 1.85 GiB | 3.09 B | cpu | 4 | pp512 | 11.55 ± 0.16 |
| Qwen2.5 3B | 1.85 GiB | 3.09 B | cpu | 4 | tg128 | 2.25 ± 0.04 |

Q4_K

| Model | Size | Params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen2.5 0.5B | 462.96 MiB | 630.17 M | cpu | 4 | pp512 | 9.29 ± 0.05 |
| Qwen2.5 0.5B | 462.96 MiB | 630.17 M | cpu | 4 | tg128 | 5.67 ± 0.04 |
| Qwen2.5 1.5B | 1.04 GiB | 1.78 B | cpu | 4 | pp512 | 10.38 ± 0.10 |
| Qwen2.5 1.5B | 1.04 GiB | 1.78 B | cpu | 4 | tg128 | 3.17 ± 0.08 |
| Qwen2.5 3B | 1.95 GiB | 3.40 B | cpu | 4 | pp512 | 4.23 ± 0.04 |
| Qwen2.5 3B | 1.95 GiB | 3.40 B | cpu | 4 | tg128 | 1.73 ± 0.00 |
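
The t/s figures are easier to judge as wall-clock time. Assuming the test labels follow the usual llama-bench convention (pp512 processes a 512-token prompt, tg128 generates 128 tokens), which this document does not state explicitly, the time per test is the token count divided by the throughput. A small sketch, where bench_seconds is a hypothetical helper:

```shell
# Hypothetical helper: seconds = tokens / (tokens per second).
bench_seconds() {
    awk -v n="$1" -v tps="$2" 'BEGIN { printf "%.1f\n", n / tps }'
}

# Qwen2.5 0.5B, Q4_0, values taken from the table above.
bench_seconds 512 64.12   # pp512: prints 8.0
bench_seconds 128 10.03   # tg128: prints 12.8
```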