llama.cpp/ggml-cuda.cu at chunks

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-31 08:51:55 +00:00

Georgi Gerganov 2d5db48371 ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508)

* ggml : use F16 instead of F32 in Q4_0, Q4_1 and Q8_0

* llama : bump LLAMA_FILE_VERSION to 3

* cuda : update Q4 and Q8 dequantize kernels

* ggml : fix AVX dot products

* readme : update performance table + hot topics

2023-05-19 22:17:18 +03:00

30 KiB
