llama.cpp/ggml-metal.m at 17c10acfb44ecb7af25e37fb67b9501cbc0034d2

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-22 12:27:26 +00:00

Kawrakow e9b66ee982 metal : add Q4_1 implementation (#1785)

23.3 ms / token, so just ~1% slower than q4_0.
Achieves 290 GB/s memory throughput.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

2023-06-10 11:28:11 +03:00
