mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2025-11-02 09:12:03 +00:00
ggml-cpu: implement MXFP4 SIMD for s390x (#16193)
* ggml-cpu: impl mxfp4 s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: missing s = sumf
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix incorrect kval_mxfp4 type
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: rework mxfp4
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: missing delta calc
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix typo
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix typo for vec_splats
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: expand to 2 blocks per loop
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add unroll to boost perf
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: back to 1 block per loop to test perf
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: back to 1 block per loop to test perf"
This reverts commit 1fe55724e2.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: rm unroll from single block
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
This commit is contained in:
@@ -160,7 +160,6 @@
|
||||
#define ggml_vec_dot_iq3_s_q8_K_generic ggml_vec_dot_iq3_s_q8_K
|
||||
#define ggml_vec_dot_iq1_s_q8_K_generic ggml_vec_dot_iq1_s_q8_K
|
||||
#define ggml_vec_dot_iq1_m_q8_K_generic ggml_vec_dot_iq1_m_q8_K
|
||||
#define ggml_vec_dot_mxfp4_q8_0_generic ggml_vec_dot_mxfp4_q8_0
|
||||
// repack.cpp
|
||||
#define ggml_quantize_mat_q8_0_4x4_generic ggml_quantize_mat_q8_0_4x4
|
||||
#define ggml_quantize_mat_q8_0_4x8_generic ggml_quantize_mat_q8_0_4x8
|
||||
|
||||
Reference in New Issue
Block a user