llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-19 11:57:07 +00:00

Files

Francis Couture-Harpin e9719576c4 ggml : also faster TQ1_0

Same optimization as for TQ2_0 by offsetting the sum instead of the weights.
This makes TQ1_0 almost as fast as Q8_0 on AVX2.
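
The idea of offsetting the sum instead of the weights can be sketched in scalar C. This is only an illustration under stated assumptions, not the actual ggml kernel: it assumes ternary weights are stored as unsigned values {0, 1, 2} standing for {-1, 0, 1}, and the function names `dot_offset_weights` and `dot_offset_sum` are hypothetical. Since sum((q - 1) * y) = sum(q * y) - sum(y), the per-weight subtraction can be replaced by a single subtraction of the activation sum.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference form: shift each stored weight q in {0,1,2} to q - 1 in
 * {-1,0,1} before multiplying. Costs one subtraction per weight.
 * (Hypothetical helper, for illustration only.) */
int32_t dot_offset_weights(const uint8_t *q, const int8_t *y, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += ((int32_t)q[i] - 1) * (int32_t)y[i];
    }
    return acc;
}

/* Equivalent form: multiply by the unsigned stored values directly, then
 * subtract the sum of the activations once at the end.
 * (Hypothetical helper, for illustration only.) */
int32_t dot_offset_sum(const uint8_t *q, const int8_t *y, size_t n) {
    int32_t acc = 0;
    int32_t ysum = 0; /* could be precomputed if the activation format stores it */
    for (size_t i = 0; i < n; ++i) {
        acc += (int32_t)q[i] * (int32_t)y[i];
        ysum += (int32_t)y[i];
    }
    return acc - ysum;
}
```

Both functions return the same value for any input. The second form keeps the inner loop to an unsigned-by-signed multiply-accumulate, which is presumably the shape that vectorizes well on AVX2.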

2024-07-31 00:08:48 -04:00

.gitignore

2024-07-13 18:12:39 +02:00

CMakeLists.txt

2024-07-28 01:41:25 +02:00