llama.cpp/ggml.c at dd0eabc049fb1efc631cab8eb0a646808d704e18

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-01 09:01:57 +00:00


unbounded dd0eabc049 ggml : use full range for Q4_0 and Q4_2 quantization (#729)

* Use full range for q4_0 quantization

By keeping the sign of the value with the highest magnitude, we can make
sure that value maps to -8, a quantization level that is currently unused.
This is a bit of a freebie, since it is fully backward compatible with
the current format.

* Update quantize_row_q4_0 for AVX/AVX2

* Update quantize_row_q4_0 for WASM

Untested

* Update quantize_row_q4_0 for Arm NEON

* Update quantize_row_q4_0 for PowerPC

Untested

* Use full range for q4_2 quantization

2023-04-25 20:20:46 +03:00

385 KiB
