llama.cpp/ggml/src/ggml-cuda/quantize.cuh
Johannes Gäßler, commit 808aba3916: CUDA: optimize and refactor MMQ (#8416)

* CUDA: optimize and refactor MMQ
* explicit q8_1 memory layouts, add documentation
2024-07-11 16:47:47 +02:00

File size: 979 B
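
For context on the second bullet: q8_1 is the 8-bit block format that ggml's CUDA matrix multiplication quantizes its input activations into, and quantize.cuh is the header for that quantization step. Below is a minimal sketch of one q8_1 block, with the type and field names assumed from ggml-common.h rather than taken from this commit:

```c
#include <cuda_fp16.h>
#include <stdint.h>

// Sketch of a q8_1 block as assumed from ggml-common.h; field names and
// sizes are illustrative of the memory layout this commit documents.
#define QK8_1 32

typedef struct {
    half2  ds;          // ds.x = d (dequantization scale), ds.y = s = d * sum(qs[i])
    int8_t qs[QK8_1];   // 32 signed 8-bit quants; value i dequantizes to d * qs[i]
} block_q8_1;           // 4 + 32 = 36 bytes per block of 32 values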