llama.cpp/ggml-cuda.cu at 62fa15bcd24a21f3d9aa705f8ef9c00beea83b83

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-10 10:27:03 +00:00

Johannes Gäßler d50f8897a7 CUDA: stream-k decomposition for MMQ (#8018)

* CUDA: stream-k decomposition for MMQ

* fix undefined memory reads for small matrices

2024-06-20 14:39:21 +02:00
