mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-31 08:51:55 +00:00

Files

Georgi Gerganov 574406dc7e ggml : add Q5_0 and Q5_1 quantization (#1187)

* ggml : add Q5_0 quantization (cuBLAS only)

* ggml : fix Q5_0 qh -> uint32_t

* ggml : fix q5_0 histogram stats

* ggml : q5_0 scalar dot product

* ggml : q5_0 ARM NEON dot

* ggml : q5_0 more efficient ARM NEON using uint64_t masks

* ggml : rename Q5_0 -> Q5_1

* ggml : adding Q5_0 mode

* quantize : add Q5_0 and Q5_1 to map

* ggml : AVX2 optimizations for Q5_0, Q5_1 (#1195)

---------

Co-authored-by: Stephan Walter <stephan@walter.name>

2023-04-26 23:14:13 +03:00

CMakeLists.txt

llama : fix linkage with mingw (#551)

2023-03-28 21:23:09 +03:00

quantize.cpp

ggml : add Q5_0 and Q5_1 quantization (#1187)

2023-04-26 23:14:13 +03:00

README.md

Overhaul the examples structure

2023-03-25 20:26:40 +02:00

README.md

quantize

TODO