llama.cpp/ggml-cuda/mma.cuh
Johannes Gäßler 9a590c8226 CUDA: optimize MMQ int8 tensor core performance (#8062)
* CUDA: optimize MMQ int8 tensor core performance

* only a single get_mma_tile_x_k function

* simplify code, make functions constexpr
2024-06-24 12:41:23 +02:00
