Aaron Teo
4f017d718a
ggml-cpu: test fix for conversion failure
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:55:16 +08:00
Aaron Teo
5424d9e757
ggml-cpu: add breakpoint for debugging
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:51:05 +08:00
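A note on the debugging breakpoint above: a common way to get a hard stop at a known point under gdb is to raise SIGINT programmatically. A minimal sketch of that technique (not the commit's exact code):
```c
// Raising SIGINT stops the process under gdb exactly as if a breakpoint were
// set at the call site, which is handy inside hot loops that are otherwise
// hard to break on (e.g. after inlining).
#include <signal.h>

static inline void debug_break(void) {
    raise(SIGINT);
}
```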
Aaron Teo
bb9345ca8a
ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:50:05 +08:00
Aaron Teo
e0f8fb930b
ggml-cpu: clarify variable naming
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:43:41 +08:00
Aaron Teo
27b4c3f338
ggml-cpu: remove noop, general code cleanup
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:41:39 +08:00
Aaron Teo
8312adc980
ggml-cpu: rework noop
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:24:32 +08:00
Aaron Teo
6d507bbeb0
ggml-cpu: switch to vec_xst for 4 element loops also
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:23:23 +08:00
Aaron Teo
f9f6c7e897
ggml-cpu: nnpa switch to vec_xst test
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:16:35 +08:00
Aaron Teo
6a25fd8531
ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:10:44 +08:00
Aaron Teo
ebc1d19f62
ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:01:55 +08:00
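For context on the NNPA commits above: the conversion path loads eight fp16 values at a time, converts them through the NNPA dlfloat16 format, and stores two fp32 vectors with vec_xst. A minimal sketch of that loop shape, assuming GCC's z/Architecture NNP-data-type intrinsics (vec_convert_from_fp16, vec_extend_to_fp32_hi/lo from vecintrin.h); the exact intrinsic names and the kernel actually committed may differ:
```c
// Illustrative only: fp16 -> fp32 on s390x via NNPA, 8 elements per step.
#include <vecintrin.h>

void fp16_to_fp32_nnpa(const unsigned short * x, float * y, long n) {
    long i = 0;
    for (; i + 7 < n; i += 8) {
        // load 8 IEEE fp16 values (unaligned)
        __vector unsigned short v_x = vec_xl(0, x + i);
        // IEEE fp16 -> NNPA dlfloat16
        __vector unsigned short v_d = vec_convert_from_fp16(v_x, 0);
        // dlfloat16 -> fp32, high and low halves
        __vector float v_hi = vec_extend_to_fp32_hi(v_d, 0);
        __vector float v_lo = vec_extend_to_fp32_lo(v_d, 0);
        vec_xst(v_hi, 0, y + i);      // store elements 0..3
        vec_xst(v_lo, 0, y + i + 4);  // store elements 4..7
    }
    // a scalar tail loop handles the remaining n - i elements
}
```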
Aaron Teo
9330454cb8
ggml-cpu: remove sigint from fp16 store
...
For some reason, the function is not getting a hit when debugged with
gdb. We will need to investigate further.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 15:06:31 +08:00
Aaron Teo
575ea9f6c6
ggml-cpu: fp16 load ensured to hit
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 15:00:46 +08:00
Aaron Teo
8f3a5af6c0
ggml-cpu: ensure fp16 and fp32 load and stores are called
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 14:57:25 +08:00
Aaron Teo
94f10ca189
ggml-cpu: fix float placeholder
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 14:53:15 +08:00
Aaron Teo
d9cc63a94a
ggml-cpu: fix print vs printf
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 14:51:38 +08:00
Aaron Teo
48b820d05f
ggml-cpu: add debugging prints to see if dlf16 is correct
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 14:50:33 +08:00
Aaron Teo
ffe296457e
ggml-cpu: better variable names
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 2f58bbcbb8)
2025-06-21 14:47:46 +08:00
Aaron Teo
ebf9f34a38
ggml-cpu: add fp32->fp16
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 0ff0d65162)
2025-06-21 14:47:23 +08:00
Aaron Teo
45a4cf651c
ggml-cpu: add fp16->fp32 nnpa first
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 8d4a7987f9)
2025-06-21 14:47:12 +08:00
Aaron Teo
5801806f70
ggml-cpu: add nnpa compile flag
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 4a9f60c201)
2025-06-21 14:46:41 +08:00
Acly
b7147673f2
Add ggml_roll (ggml/1274)
...
* ggml : add ggml_roll
* use set/get_op_params & std::min
2025-06-20 21:02:47 +03:00
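For readers unfamiliar with the new op: ggml_roll rotates tensor elements along each dimension with wrap-around, in the spirit of numpy.roll. A hedged usage sketch follows; the prototype is assumed from the PR notes, so check ggml.h for the authoritative signature:
```c
// Hypothetical usage, given an initialized struct ggml_context * ctx:
// roll a 1-D tensor one element to the right along dim 0, wrapping the
// last element around to the front.
struct ggml_tensor * t      = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
struct ggml_tensor * rolled = ggml_roll(ctx, t, /*shift0=*/1, /*shift1=*/0,
                                        /*shift2=*/0, /*shift3=*/0);
```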
Christian Kastner
6369be0735
Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286)
...
* Add PowerPC feature detection and scoring
* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC
* ggml-cpu: Delay some initializations until function is called
When using GGML_BACKEND_DL=ON, these initializations might use
instructions that are not supported by the current CPU.
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-06-20 14:17:32 +02:00
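The detection-and-scoring half of this work boils down to probing Linux hwcaps at runtime. A minimal sketch of the idea on powerpc Linux, assuming getauxval() and the PPC_FEATURE2_* bits; the committed detector may probe and weight features differently:
```c
#include <sys/auxv.h>
#include <asm/cputable.h>   // PPC_FEATURE2_* hwcap bits (Linux, powerpc only)

int ggml_ppc_score(void) {
    unsigned long hwcap2 = getauxval(AT_HWCAP2);
    int score = 1;                                   // baseline variant
    if (hwcap2 & PPC_FEATURE2_ARCH_3_00) score += 1; // POWER9, ISA v3.0
    if (hwcap2 & PPC_FEATURE2_ARCH_3_1)  score += 2; // POWER10, ISA v3.1
    return score;  // the loader picks the highest-scoring variant
}
```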
Georgi Gerganov
d27b3ca175
ggml : fix repack work size for mul_mat_id (#14292)
...
ggml-ci
2025-06-20 11:19:15 +03:00
Charles Xu
9230dbe2c7
ggml: Update KleidiAI to v1.9.0 (#14277)
2025-06-20 10:51:01 +03:00
Diego Devesa
8f71d0f3e8
ggml-cpu : remove unnecessary arm feature detection (#14281)
...
Support for Arm runtime feature detection has now been added to GGML_CPU_ALL_VARIANTS. This removes the old and not very functional code.
2025-06-19 21:24:14 +02:00
Aaron Teo
faed5a5f5d
llamafile : support s390x SIMD instruction set (#14273)
2025-06-19 11:48:54 +02:00
Aaron Teo
50d2227953
ggml-cpu: reduce asm calls for hsum (#14037)
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-18 18:10:08 +01:00
Aaron Teo
6231c5cd6d
ggml-cpu: fix uncaught underscore terminators (#14023)
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-18 18:06:49 +01:00
Charles Xu
ef035803eb
ggml: Add Apple support for GGML_CPU_ALL_VARIANTS (#14258)
2025-06-18 12:40:07 +01:00
xctan
860a9e4eef
ggml-cpu : remove the weak alias trick (#14221)
2025-06-17 12:58:32 +03:00
Diego Devesa
6adc3c3ebc
llama : add thread safety test (#14035)
...
* llama : add thread safety test
* llamafile : remove global state
* llama : better LLAMA_SPLIT_MODE_NONE logic
when main_gpu < 0, GPU devices are not used
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-16 08:11:43 -07:00
Charles Xu
3ba0d843c6
ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206)
2025-06-16 11:47:57 +02:00
xctan
3555b3004b
ggml-cpu : rework weak alias on apple targets (#14146)
...
* ggml-cpu : rework weak alias on apple targets
* fix powerpc detection
* fix ppc detection
* fix powerpc detection on darwin
2025-06-16 13:54:15 +08:00
Christian Kastner
532802f938
Implement GGML_CPU_ALL_VARIANTS for ARM (#14080)
...
* ggml-cpu: Factor out feature detection build from x86
* ggml-cpu: Add ARM feature detection and scoring
This is analogous to cpu-feats-x86.cpp. However, to detect compile-time
activation of features, we rely on the GGML_USE_<FEAT> flags, which need to
be set in cmake, instead of the GGML_<FEAT> flags that users would set for x86.
This is because on ARM, users specify features with GGML_CPU_ARM_ARCH,
rather than with individual flags.
* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM
Like x86; however, to pass arch flags around within cmake, we use
GGML_INTERNAL_<FEAT>, as we don't have GGML_<FEAT>.
Some features are optional, so we may need to build multiple backends
per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring
function sort out which one can be used.
* ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now
The other platforms will need their own specific variants.
This also fixes the bug that the variant-building branch was always
being executed as the else-branch of GGML_NATIVE=OFF. The branch is
moved to an elseif-branch, which restores the previous behavior.
2025-06-11 21:07:44 +02:00
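The scoring approach described above reduces to: every built backend variant reports a runtime score, and the loader keeps the best one. An illustrative sketch of that selection idea (names are hypothetical, not the real ggml API):
```c
typedef struct {
    const char * name;    // e.g. "armv8.2_1", "armv8.2_2", ...
    int (*score)(void);   // runtime feature probe; 0 = unusable on this CPU
} cpu_variant;

const cpu_variant * pick_variant(const cpu_variant * v, int n) {
    const cpu_variant * best = 0;
    int best_score = 0;
    for (int i = 0; i < n; i++) {
        int s = v[i].score();
        if (s > best_score) {
            best_score = s;
            best       = &v[i];
        }
    }
    return best;  // NULL if no variant is supported on this machine
}
```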
Georgi Gerganov
b7ce1ad1e3
ggml : fix weak alias win32 (whisper/0)
...
ggml-ci
2025-06-10 18:39:33 +03:00
xctan
f470bc36be
ggml-cpu : split arch-specific implementations (#13892)
...
* move ggml-cpu-aarch64 to repack
* split quantize_row_q8_0/1
* split helper functions
* split ggml_vec_dot_q4_0_q8_0
* split ggml_vec_dot_q4_1_q8_1
* split ggml_vec_dot_q5_0_q8_0
* split ggml_vec_dot_q5_1_q8_1
* split ggml_vec_dot_q8_0_q8_0
* split ggml_vec_dot_tq1_0_q8_K
* split ggml_vec_dot_tq2_0_q8_K
* split ggml_vec_dot_q2_K_q8_K
* split ggml_vec_dot_q3_K_q8_K
* split ggml_vec_dot_q4_K_q8_K
* split ggml_vec_dot_q5_K_q8_K
* split ggml_vec_dot_q6_K_q8_K
* split ggml_vec_dot_iq2_xxs_q8_K
* split ggml_vec_dot_iq2_xs_q8_K
* split ggml_vec_dot_iq2_s_q8_K
* split ggml_vec_dot_iq3_xxs_q8_K
* split ggml_vec_dot_iq3_s_q8_K
* split ggml_vec_dot_iq1_s_q8_K
* split ggml_vec_dot_iq1_m_q8_K
* split ggml_vec_dot_iq4_nl_q8_0
* split ggml_vec_dot_iq4_xs_q8_K
* fix typos
* fix missing prototypes
* rename ggml-cpu-quants.c
* rename ggml-cpu-traits
* rename arm folder
* move cpu-feats-x86.cpp
* rename ggml-cpu-hbm
* update arm detection macro in quants.c
* move iq quant tables
* split ggml_quantize_mat_q8_0/K
* split ggml_gemv_*
* split ggml_gemm_*
* rename namespace aarch64 to repack
* use weak aliases to replace test macros
* rename GGML_CPU_AARCH64 to GGML_CPU_REPACK
* rename more aarch64 to repack
* clean up rebase leftover
* fix compilation errors
* remove trailing spaces
* try to fix clang compilation errors
* try to fix clang compilation errors again
* try to fix clang compilation errors, 3rd attempt
* try to fix clang compilation errors, 4th attempt
* try to fix clang compilation errors, 5th attempt
* try to fix clang compilation errors, 6th attempt
* try to fix clang compilation errors, 7th attempt
* try to fix clang compilation errors, 8th attempt
* try to fix clang compilation errors, 9th attempt
* more cleanup
* fix compilation errors
* fix apple targets
* fix a typo in arm version of ggml_vec_dot_q4_K_q8_K
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-09 16:47:13 +02:00
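The "use weak aliases to replace test macros" step relies on a standard GCC/Clang technique on ELF targets: the generic implementation is also exported under the public name as a weak alias, and an arch-specific translation unit may provide a strong definition of that name which wins at link time. A sketch with hypothetical function names:
```c
// Portable fallback, defined in the generic translation unit.
void ggml_vec_dot_example_generic(int n, float * s,
                                  const float * x, const float * y) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    *s = sum;
}

// Weak alias: any arch-optimized strong definition of ggml_vec_dot_example
// linked into the binary silently replaces the generic one.
__attribute__((weak, alias("ggml_vec_dot_example_generic")))
void ggml_vec_dot_example(int n, float * s, const float * x, const float * y);
```
Mach-O has no equivalent of the ELF alias attribute, which is why the apple targets needed the separate rework, and eventually the removal of the trick, recorded in the commits above.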
Diego Devesa
482548716f
releases : use dl backend for linux release, remove arm64 linux release (#13996)
2025-06-04 13:15:54 +02:00
shalinib-ibm
093e3f1feb
cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13966)
...
Some systems report the CPU implementation as "Power11" instead of "POWER11".
The existing CMake logic uses a case-sensitive regular expression to extract
the CPU generation, which fails when the casing doesn't exactly match "POWER".
This patch provides a fix by first converting the string to uppercase before applying the regex.
Signed-off-by: root <root@rheldb2v.pperf.tadn.ibm.com>
Co-authored-by: root <root@rheldb2v.pperf.tadn.ibm.com>
2025-06-02 15:18:36 +03:00
Max Krasnyansky
053b1539c0
threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995)
...
* threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling
We talked about adding LOW priority for GGML threads in the original threadpool PR.
It might be useful for some cases to avoid contention.
Latest Windows ARM64 releases started parking (offlining) the CPU cores
more aggressively, which results in suboptimal performance with n_threads > 4.
To deal with that, we now disable Power Throttling for our threads for the NORMAL
and higher priorities.
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* threading: disable SetThreadInfo() calls for older Windows versions
* Update tools/llama-bench/llama-bench.cpp
Co-authored-by: Diego Devesa <slarengh@gmail.com>
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-05-31 15:39:19 -07:00
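Opting a thread out of power throttling on Windows goes through the documented SetThreadInformation API (Windows 10 1709+). A minimal sketch of the likely shape; the commit additionally skips the call on older Windows versions where it is unavailable:
```c
#include <windows.h>

static void thread_disable_power_throttling(void) {
    THREAD_POWER_THROTTLING_STATE state;
    ZeroMemory(&state, sizeof(state));
    state.Version     = THREAD_POWER_THROTTLING_CURRENT_VERSION;
    // Take control of the execution-speed policy and leave its state bit
    // clear, i.e. do not throttle ("EcoQoS") this thread.
    state.ControlMask = THREAD_POWER_THROTTLING_EXECUTION_SPEED;
    state.StateMask   = 0;
    SetThreadInformation(GetCurrentThread(), ThreadPowerThrottling,
                         &state, sizeof(state));
}
```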
Yibo Cai
54a2c7a8cd
arm64: optimize q4_k_q8_k kernel with i8mm (#13886)
...
This PR improves the q4_k_q8_k gemm kernel with the arm64 i8mm instruction.
Tested on neoverse-n2 with a llama3 8b q4_k_m quantized model.
- 34% ~ 50% S_PP uplift for all batch sizes
- 12% ~ 37% S_TG uplift for batch size 4 and above
Perplexity doesn't change with this PR.
```
// tested on neoverse-n2
$ llama-batched-bench \
-m Meta-Llama-3-8B-Instruct-Q4_K_M.gguf \
--no-mmap -fa \
-c 8192 -b 4096 -ub 512 -npp 128 -ntg 128 \
-npl 1,2,4,8,16,32 \
-t 64
---------------------------------------------------------------------
| PP | TG | B | S_PP t/s | S_TG t/s |
| | | | original | this pr | original | this pr |
|-------|--------|------|----------|----------|----------|----------|
| 128 | 128 | 1 | 110.12 | 147.83 | 24.36 | 24.28 |
| 128 | 128 | 2 | 121.16 | 172.42 | 46.36 | 47.93 |
| 128 | 128 | 4 | 120.15 | 169.75 | 74.68 | 84.00 |
| 128 | 128 | 8 | 130.97 | 196.81 | 91.04 | 114.74 |
| 128 | 128 | 16 | 131.01 | 196.88 | 101.43 | 135.79 |
| 128 | 128 | 32 | 130.85 | 196.51 | 106.97 | 147.29 |
---------------------------------------------------------------------
```
2025-05-29 14:39:20 +03:00
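The i8mm speedup comes from the SMMLA instruction, which multiplies a 2x8 int8 block by the transpose of another 2x8 int8 block and accumulates the 2x2 int32 result. A tiny illustration of the primitive only, not the q4_k_q8_k kernel itself; compile for armv8.2-a+i8mm or later:
```c
#include <arm_neon.h>

int32x4_t mma_2x2(int32x4_t acc, int8x16_t a, int8x16_t b) {
    // acc(2x2) += A(2x8) * B(2x8)^T, all blocks row-major int8
    return vmmlaq_s32(acc, a, b);
}
```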
Christian Kastner
21fcc21ad5
cmake: Factor out CPU architecture detection (#13883)
...
* cmake: Define function for querying architecture
The tests and results match exactly those of ggml/src/CMakeLists.txt
* Switch arch detection over to new function
2025-05-29 12:50:25 +02:00
Vineel Abhinav
dd8ba93416
ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Algorithm (#13882)
...
* F32-Mamba-Seq_Scan-SVE
* Fix formatting
* ggml : missing space
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-05-29 12:18:43 +03:00
Vineel Abhinav
1b8fb8152d
ggml: aarch64: Implement SVE F32 kernels for vector functions (#13843)
...
* F32-Mamba-SVE
* F32-Mamba-SVE
* Resolve test errors-1
* Resolve test errors-2
* F32-vec-SVE
* F32-vec-SVE
* F32-vec-SVE
2025-05-29 09:01:33 +03:00
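SVE kernels like these follow the predicated, vector-length-agnostic loop pattern: the predicate masks off inactive lanes in the final iteration, so one binary serves any hardware vector width. A generic sketch of that pattern (not code from the PR):
```c
#include <stdint.h>
#include <arm_sve.h>

// y[i] += a * x[i] for i in [0, n), vector-length agnostic.
void vec_mad_f32(float * y, const float * x, float a, int64_t n) {
    for (int64_t i = 0; i < n; i += svcntw()) {
        svbool_t    pg = svwhilelt_b32(i, n);   // active lanes this step
        svfloat32_t vx = svld1_f32(pg, x + i);
        svfloat32_t vy = svld1_f32(pg, y + i);
        vy = svmla_n_f32_x(pg, vy, vx, a);      // vy += vx * a
        svst1_f32(pg, y + i, vy);
    }
}
```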
xctan
05f6ac6283
ggml : riscv: add xtheadvector support (#13720)
...
* ggml : riscv: add xtheadvector support
* ggml : clean up some macro usage
2025-05-27 16:21:36 +03:00
Christian Kastner
7fe03e7446
ggml-cpu: x86 feature detection is specific to x86 (#13811)
2025-05-27 13:18:39 +02:00
Diego Devesa
2bd1b30f69
ggml-cpu : set openmp wait time if not set (#13758)
2025-05-24 22:26:47 +02:00
Xuan-Son Nguyen
cf4cb59e64
ggml : add ggml_gelu_erf() (#13667)
...
* ggml : add ggml_gelu_na (not approximated)
* fix naming order
* rename na --> erf
* apply review suggestions
* revert naming order
2025-05-21 16:26:33 +02:00
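As the "not approximated" naming in the PR indicates, ggml_gelu_erf() computes GELU exactly via the error function rather than the usual tanh approximation. The reference scalar form (the op's actual kernel is not reproduced here):
```c
#include <math.h>

// GELU(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
static float gelu_erf_ref(float x) {
    return 0.5f * x * (1.0f + erff(x * 0.70710678f));  // 0.70710678 ~ 1/sqrt(2)
}
```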
Yibo Cai
5ab5d5fb25
arm64: optimize q6_k_q8_k kernel with i8mm (#13519)
...
This PR improves the q6_k_q8_k gemm kernel with the arm64 i8mm instruction.
Tested on neoverse-n2 with a llama3 8b q6_k quantized model.
- 40% ~ 54% S_PP uplift for all batch sizes
- 16% ~ 47% S_TG uplift for batch size 4 and above
Perplexity doesn't change with this PR.
```
// tested on neoverse-n2
$ llama-batched-bench \
-m Meta-Llama-3-8B-Instruct-Q6_K.gguf \
--no-mmap -fa \
-c 8192 -b 4096 -ub 512 -npp 128 -ntg 128 \
-npl 1,2,4,8,16,32 \
-t 64
---------------------------------------------------------------------
| PP | TG | B | S_PP t/s | S_TG t/s |
| | | | original | this pr | original | this pr |
|-------|--------|------|----------|----------|----------|----------|
| 128 | 128 | 1 | 78.52 | 109.18 | 18.63 | 18.88 |
| 128 | 128 | 2 | 84.62 | 123.94 | 34.54 | 36.92 |
| 128 | 128 | 4 | 84.36 | 122.49 | 52.65 | 61.32 |
| 128 | 128 | 8 | 90.52 | 138.87 | 63.46 | 84.41 |
| 128 | 128 | 16 | 90.11 | 138.56 | 71.04 | 101.33 |
| 128 | 128 | 32 | 89.81 | 137.79 | 75.14 | 110.47 |
---------------------------------------------------------------------
```
2025-05-14 21:53:52 +02:00
Dan Johansson
4f711afed5
ggml-cpu: Update KleidiAI to v1.6 and fix include directives (#13509)
...
Signed-off-by: Dan Johansson <dan.johansson@arm.com>
2025-05-13 18:02:28 +03:00
Dan Johansson
a71a4075cd
ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (#13053)
...
* ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel
Signed-off-by: Dan Johansson <dan.johansson@arm.com>
* * code review fixes
Signed-off-by: Dan Johansson <dan.johansson@arm.com>
* * adds a comment that clarifies barrier usage
Signed-off-by: Dan Johansson <dan.johansson@arm.com>
---------
Signed-off-by: Dan Johansson <dan.johansson@arm.com>
Co-authored-by: Charles Xu <charles.xu@arm.com>
2025-05-12 13:06:19 +02:00