# llama.cpp for IBM zDNN Accelerator

## Background

IBM zDNN (Z Deep Neural Network) is a hardware acceleration library designed specifically to leverage the IBM NNPA (Neural Network Processor Assist) accelerator located within IBM Telum I and II processors. It provides significant performance improvements for neural network inference operations.

### Llama.cpp + IBM zDNN

The llama.cpp zDNN backend is designed to enable llama.cpp on IBM z17 and later systems via the IBM zDNN hardware acceleration library.

## Software & Hardware Support

| Hardware Level       | Status        | Verified                   |
| -------------------- | ------------- | -------------------------- |
| IBM z17 / LinuxONE 5 | Supported     | RHEL 9.6, IBM z17, 40 IFLs |
| IBM z16 / LinuxONE 4 | Not Supported |                            |

## Data Types Supported

| Data Type | Status    |
| --------- | --------- |
| F32       | Supported |
| F16       | Supported |
| BF16      | Supported |

## CMake Options

The following CMake options control the behaviour of the zDNN backend.

| CMake Option | Default Value | Description                         |
| ------------ | ------------- | ----------------------------------- |
| `GGML_ZDNN`  | `OFF`         | Compile llama.cpp with zDNN support |
| `ZDNN_ROOT`  | `""`          | Override zDNN library lookup        |

## 1. Install zDNN Library

Note: Using the zDNN library provided via `apt` or `yum` may not work correctly, as reported in [#15772](https://github.com/ggml-org/llama.cpp/issues/15772). Building from source is preferred.

```sh
git clone --recurse-submodules https://github.com/IBM/zDNN
cd zDNN

autoreconf .
./configure --prefix=/opt/zdnn-libs

make build
sudo make install
```
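
With `--prefix=/opt/zdnn-libs`, the headers and shared libraries should land under that prefix. A quick sanity check (the exact layout below is an assumption based on the configure flag, not something the install docs guarantee):

```sh
# Paths assume --prefix=/opt/zdnn-libs from the configure step above;
# adjust if you installed to a different prefix.
ls /opt/zdnn-libs/include/zdnn.h
ls /opt/zdnn-libs/lib
```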

## 2. Build llama.cpp

```sh
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

cmake -S . -G Ninja -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_ZDNN=ON \
    -DZDNN_ROOT=/opt/zdnn-libs
cmake --build build --config Release -j$(nproc)
```
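
After the build finishes, the backend can be smoke-tested with any GGUF model. A minimal sketch, assuming a model file at `./models/model.gguf` (a placeholder path) and that `libzdnn` may need to be added to the loader path:

```sh
# If libzdnn is not in a default loader path, point the dynamic loader
# at the install prefix used above (an assumption; skip if not needed).
export LD_LIBRARY_PATH=/opt/zdnn-libs/lib:$LD_LIBRARY_PATH

# Short generation run; the model path is a placeholder.
./build/bin/llama-cli -m ./models/model.gguf -p "Hello" -n 32
```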