llama.cpp/ggml.c at 955ef9a5d53d8f911fe00580ac9bd0caa56430af

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-15 11:17:31 +00:00

Georgi Gerganov 955ef9a5d5 ggml : alternative Q4_3 implementation using modified Q8_0 (#1109)

* ggml : prefer vzip to vuzp

This way, we always use the same type of instruction across all quantizations.

* ggml : alternative Q4_3 implementation using modified Q8_0

* ggml : fix Q4_3 scalar implementation

* ggml : slight improvement of Q4_3 - no need for loop unrolling

* ggml : fix AVX paths for Q8_0 quantization

2023-04-22 10:55:35 +03:00

382 KiB
