mirror of https://github.com/ggml-org/llama.cpp.git
synced 2025-11-22 12:27:26 +00:00
23.3 ms / token, so just ~1% slower than q4_0. Achieves 290 GB/s memory throughput.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
35 KiB