mirror of https://github.com/ggml-org/llama.cpp.git, synced 2025-10-28 08:31:25 +00:00
	ggml-cpu: drop support for nnpa intrinsics (#15821)
		| @@ -42,18 +42,6 @@ cmake --build build --config Release -j $(nproc) | ||||
|     cmake --build build --config Release -j $(nproc) | ||||
|     ``` | ||||
|  | ||||
| -   NNPA is disabled by default. To enable it: | ||||
|  | ||||
|     ```bash | ||||
|     cmake -S . -B build             \ | ||||
|         -DCMAKE_BUILD_TYPE=Release  \ | ||||
|         -DGGML_BLAS=ON              \ | ||||
|         -DGGML_BLAS_VENDOR=OpenBLAS \ | ||||
|         -DGGML_NNPA=ON | ||||
|  | ||||
|     cmake --build build --config Release -j $(nproc) | ||||
|     ``` | ||||
|  | ||||
| -   For debug builds: | ||||
|  | ||||
|     ```bash | ||||
| @@ -164,15 +152,11 @@ All models need to be converted to Big-Endian. You can achieve this in three cas | ||||
|  | ||||
| Only available on IBM z15/LinuxONE 3 or later systems with the `-DGGML_VXE=ON` compile flag (turned on by default). Hardware acceleration is not possible with llama.cpp on older systems, such as IBM z14/arch12. On such systems, the APIs still run but use a scalar implementation. | ||||
|  | ||||
| ### 2. NNPA Vector Intrinsics Acceleration | ||||
|  | ||||
| Only available on IBM z16/LinuxONE 4 or later systems with the `-DGGML_NNPA=ON` compile flag (turned off by default). Hardware acceleration is not possible with llama.cpp on older systems, such as IBM z15/arch13. On such systems, the APIs still run but use a scalar implementation. | ||||
|  | ||||
| ### 3. zDNN Accelerator (WIP) | ||||
| ### 2. zDNN Accelerator (WIP) | ||||
|  | ||||
| Only available on IBM z17/LinuxONE 5 or later systems with the `-DGGML_ZDNN=ON` compile flag. Hardware acceleration is not possible with llama.cpp on older systems, such as IBM z15/arch13. On such systems, the APIs fall back to CPU routines. | ||||
|  | ||||
| ### 4. Spyre Accelerator | ||||
| ### 3. Spyre Accelerator | ||||
|  | ||||
| _Only available on IBM z17 / LinuxONE 5 or later systems. No support is currently available._ | ||||
|  | ||||
| @@ -230,10 +214,6 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl | ||||
|     CXXFLAGS="-include cstdint" pip3 install -r requirements.txt | ||||
|     ``` | ||||
|  | ||||
| 5. `-DGGML_NNPA=ON` generates gibberish output | ||||
|  | ||||
|     Answer: We are aware of this as detailed in [this issue](https://github.com/ggml-org/llama.cpp/issues/14877). Please either try reducing the number of threads, or disable the compile option using `-DGGML_NNPA=OFF`. | ||||
|  | ||||
| ## Getting Help on IBM Z & LinuxONE | ||||
|  | ||||
| 1. **Bugs, Feature Requests** | ||||
| @@ -258,38 +238,38 @@ IBM VXE/VXE2 SIMD acceleration depends on the BLAS implementation. It is strongl | ||||
|  | ||||
| ## Appendix B: SIMD Support Matrix | ||||
|  | ||||
| |            | VX/VXE/VXE2 | NNPA | zDNN | Spyre | | ||||
| | ---------- | ----------- | ---- | ---- | ----- | | ||||
| | FP32       | ✅          | ✅   | ✅   | ❓    | | ||||
| | FP16       | ✅          | ✅   | ❓   | ❓    | | ||||
| | BF16       | 🚫          | 🚫   | ❓   | ❓    | | ||||
| | Q4_0       | ✅          | ✅   | ❓   | ❓    | | ||||
| | Q4_1       | ✅          | ✅   | ❓   | ❓    | | ||||
| | MXFP4      | 🚫          | 🚫   | ❓   | ❓    | | ||||
| | Q5_0       | ✅          | ✅   | ❓   | ❓    | | ||||
| | Q5_1       | ✅          | ✅   | ❓   | ❓    | | ||||
| | Q8_0       | ✅          | ✅   | ❓   | ❓    | | ||||
| | Q2_K       | 🚫          | 🚫   | ❓   | ❓    | | ||||
| | Q3_K       | ✅          | ✅   | ❓   | ❓    | | ||||
| | Q4_K       | ✅          | ✅   | ❓   | ❓    | | ||||
| | Q5_K       | ✅          | ✅   | ❓   | ❓    | | ||||
| | Q6_K       | ✅          | ✅   | ❓   | ❓    | | ||||
| | TQ1_0      | 🚫          | 🚫   | ❓   | ❓    | | ||||
| | TQ2_0      | 🚫          | 🚫   | ❓   | ❓    | | ||||
| | IQ2_XXS    | 🚫          | 🚫   | ❓   | ❓    | | ||||
| | IQ2_XS     | 🚫          | 🚫   | ❓   | ❓    | | ||||
| | IQ2_S      | 🚫          | 🚫   | ❓   | ❓    | | ||||
| | IQ3_XXS    | 🚫          | 🚫   | ❓   | ❓    | | ||||
| | IQ3_S      | 🚫          | 🚫   | ❓   | ❓    | | ||||
| | IQ1_S      | 🚫          | 🚫   | ❓   | ❓    | | ||||
| | IQ1_M      | 🚫          | 🚫   | ❓   | ❓    | | ||||
| | IQ4_NL     | ✅          | ✅   | ❓   | ❓    | | ||||
| | IQ4_XS     | ✅          | ✅   | ❓   | ❓    | | ||||
| | FP32->FP16 | 🚫          | ✅   | ❓   | ❓    | | ||||
| | FP16->FP32 | 🚫          | ✅   | ❓   | ❓    | | ||||
| |            | VX/VXE/VXE2 | zDNN | Spyre | | ||||
| |------------|-------------|------|-------| | ||||
| | FP32       | ✅           | ✅    | ❓     | | ||||
| | FP16       | ✅           | ❓    | ❓     | | ||||
| | BF16       | 🚫           | ❓    | ❓     | | ||||
| | Q4_0       | ✅           | ❓    | ❓     | | ||||
| | Q4_1       | ✅           | ❓    | ❓     | | ||||
| | MXFP4      | 🚫           | ❓    | ❓     | | ||||
| | Q5_0       | ✅           | ❓    | ❓     | | ||||
| | Q5_1       | ✅           | ❓    | ❓     | | ||||
| | Q8_0       | ✅           | ❓    | ❓     | | ||||
| | Q2_K       | 🚫           | ❓    | ❓     | | ||||
| | Q3_K       | ✅           | ❓    | ❓     | | ||||
| | Q4_K       | ✅           | ❓    | ❓     | | ||||
| | Q5_K       | ✅           | ❓    | ❓     | | ||||
| | Q6_K       | ✅           | ❓    | ❓     | | ||||
| | TQ1_0      | 🚫           | ❓    | ❓     | | ||||
| | TQ2_0      | 🚫           | ❓    | ❓     | | ||||
| | IQ2_XXS    | 🚫           | ❓    | ❓     | | ||||
| | IQ2_XS     | 🚫           | ❓    | ❓     | | ||||
| | IQ2_S      | 🚫           | ❓    | ❓     | | ||||
| | IQ3_XXS    | 🚫           | ❓    | ❓     | | ||||
| | IQ3_S      | 🚫           | ❓    | ❓     | | ||||
| | IQ1_S      | 🚫           | ❓    | ❓     | | ||||
| | IQ1_M      | 🚫           | ❓    | ❓     | | ||||
| | IQ4_NL     | ✅           | ❓    | ❓     | | ||||
| | IQ4_XS     | ✅           | ❓    | ❓     | | ||||
| | FP32->FP16 | 🚫           | ❓    | ❓     | | ||||
| | FP16->FP32 | 🚫           | ❓    | ❓     | | ||||
|  | ||||
| -   ✅ - acceleration available | ||||
| -   🚫 - acceleration unavailable, will still run using scalar implementation | ||||
| -   ❓ - acceleration unknown, please contribute if you can test it yourself | ||||
|  | ||||
| Last Updated by **Aaron Teo (aaron.teo1@ibm.com)** on Aug 22, 2025. | ||||
| Last Updated by **Aaron Teo (aaron.teo1@ibm.com)** on Sep 6, 2025. | ||||
|   | ||||
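Since this commit removes the `GGML_NNPA` option, existing build scripts that still pass `-DGGML_NNPA=ON` or `-DGGML_NNPA=OFF` may need a small cleanup. The following is a minimal sketch of one way to do that, not part of the commit itself: `strip_nnpa_flag` is a hypothetical helper name, and it assumes the CMake arguments are passed as separate words. It drops any `-DGGML_NNPA=...` argument and prints the remaining ones, one per line.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of llama.cpp): print every argument except
# -DGGML_NNPA=<value>, one per line, so a wrapper script can rebuild its
# CMake argument list without the removed option.
strip_nnpa_flag() {
    local arg
    for arg in "$@"; do
        case "$arg" in
            -DGGML_NNPA=*) ;;              # option dropped by this commit: skip it
            *) printf '%s\n' "$arg" ;;     # keep every other argument unchanged
        esac
    done
}

# Example: the removed configure line from the diff, minus the NNPA flag.
strip_nnpa_flag \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_BLAS=ON \
    -DGGML_BLAS_VENDOR=OpenBLAS \
    -DGGML_NNPA=ON
```

The filtered list matches the default configure command that remains in the document. Leaving a stale `-DGGML_NNPA=...` in place would most likely only produce CMake's warning about manually specified variables that the project does not use, so the cleanup is about keeping scripts tidy rather than fixing a hard failure.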