llama.cpp/ggml-cuda/mma.cuh at dd047b476c8b904e0c25e5dbc5bee6ffde2f6e17

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-17 11:37:10 +00:00

Johannes Gäßler 9a590c8226 CUDA: optimize MMQ int8 tensor core performance (#8062)

* CUDA: optimize MMQ int8 tensor core performance

* only a single get_mma_tile_x_k function

* simplify code, make functions constexpr

2024-06-24 12:41:23 +02:00