llama.cpp/ggml-cuda/common.cuh at 7c26775adb579e92b59c82e8084c07a1d0f75e9c

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-03 09:22:01 +00:00

Johannes Gäßler 76d66ee0be CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921)

* CUDA: faster q2_K, q3_K MMQ + int8 tensor cores

* try CI fix

* try CI fix

* try CI fix

* fix data race

* revert q2_K precision related changes

2024-06-14 18:41:49 +02:00