ggml-cpu: implement MXFP4 SIMD for s390x (#16193)

* ggml-cpu: impl mxfp4 s390x

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: missing s = sumf

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: fix incorrect kval_mxfp4 type

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: rework mxfp4

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: missing delta calc

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: fix typo

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: fix typo for vec_splats

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: expand to 2 blocks per loop

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: add unroll to boost perf

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: back to 1 block per loop to test perf

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Revert "ggml-cpu: back to 1 block per loop to test perf"

This reverts commit 1fe55724e2.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: rm unroll from single block

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
This commit is contained in:
Aaron Teo
2025-09-26 18:27:25 +08:00
committed by GitHub
parent 00217cd413
commit 9b26511857
2 changed files with 95 additions and 1 deletions

View File

@@ -160,7 +160,6 @@
#define ggml_vec_dot_iq3_s_q8_K_generic ggml_vec_dot_iq3_s_q8_K
#define ggml_vec_dot_iq1_s_q8_K_generic ggml_vec_dot_iq1_s_q8_K
#define ggml_vec_dot_iq1_m_q8_K_generic ggml_vec_dot_iq1_m_q8_K
#define ggml_vec_dot_mxfp4_q8_0_generic ggml_vec_dot_mxfp4_q8_0
// repack.cpp
#define ggml_quantize_mat_q8_0_4x4_generic ggml_quantize_mat_q8_0_4x4
#define ggml_quantize_mat_q8_0_4x8_generic ggml_quantize_mat_q8_0_4x8