Files
llama.cpp/docs/backend/zDNN.md
Aaron Teo 264f1b5187 zdnn: refactor codebase + add docs (#16178)
* zdnn: initial matmul refactor

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: rm static from funcs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: update ggml-zdnn.h

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: change header files to hpp

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: switch to common.hpp

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: move mulmat forward around

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: rm inline from utils

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: code cleanup

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* docs: add zDNN docs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-09-23 14:53:05 +08:00

62 lines
2.0 KiB
Markdown

# llama.cpp for IBM zDNN Accelerator
## Background
IBM zDNN (Z Deep Neural Network) is a hardware acceleration library designed specifically to leverage the IBM NNPA (Neural Network Processor Assist) accelerator located within IBM Telum I and II processors. It provides significant performance improvements for neural network inference operations.
### Llama.cpp + IBM zDNN
The llama.cpp zDNN backend is designed to enable llama.cpp on IBM z17 and later systems via the IBM zDNN hardware acceleration library.
## Software & Hardware Support
| Hardware Level | Status | Verified |
| -------------------- | ------------- | -------------------------- |
| IBM z17 / LinuxONE 5 | Supported | RHEL 9.6, IBM z17, 40 IFLs |
| IBM z16 / LinuxONE 4 | Not Supported | |
## Data Types Supported
| Data Type | Status |
| --------- | --------- |
| F32 | Supported |
| F16 | Supported |
| BF16 | Supported |
## CMake Options
The IBM zDNN backend has the following CMake options that control the behaviour of the backend.
| CMake Option | Default Value | Description |
| ------------ | ------------- | ----------------------------------- |
| `GGML_ZDNN` | `OFF` | Compile llama.cpp with zDNN support |
| `ZDNN_ROOT` | `""` | Override zDNN library lookup |
## 1. Install zDNN Library
Note: Using the zDNN library provided via `apt` or `yum` may not work correctly as reported in [#15772](https://github.com/ggml-org/llama.cpp/issues/15772). It is preferred that you compile from source.
```sh
git clone --recurse-submodules https://github.com/IBM/zDNN
cd zDNN
autoreconf .
./configure --prefix=/opt/zdnn-libs
make build
sudo make install
```
## 2. Build llama.cpp
```sh
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -S . -G Ninja -B build \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_ZDNN=ON \
-DZDNN_ROOT=/opt/zdnn-libs
cmake --build build --config Release -j$(nproc)
```