llama.cpp/ggml/src/ggml-cuda/quantize.cuh
Johannes Gäßler, commit 808aba3916: CUDA: optimize and refactor MMQ (#8416)

* CUDA: optimize and refactor MMQ
* explicit q8_1 memory layouts, add documentation
2024-07-11 16:47:47 +02:00

File size: 979 B
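
For context on the second bullet: q8_1 is the 8-bit block format that ggml's CUDA matrix multiplication quantizes its input activations into, and quantize.cuh is the header for that quantization step. Below is a minimal sketch of one q8_1 block, with the type and field names assumed from ggml-common.h rather than taken from this commit:

```c
#include <cuda_fp16.h>
#include <stdint.h>

// Sketch of a q8_1 block as assumed from ggml-common.h; field names and
// sizes are illustrative of the memory layout this commit documents.
#define QK8_1 32

typedef struct {
    half2  ds;          // ds.x = d (dequantization scale), ds.y = s = d * sum(qs[i])
    int8_t qs[QK8_1];   // 32 signed 8-bit quants; value i dequantizes to d * qs[i]
} block_q8_1;           // 4 + 32 = 36 bytes per block of 32 values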