llama.cpp/ggml-cuda.cu at 9d2382b3e45b5815fc6a054045a2f2c2b18c22a2

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-12 10:47:01 +00:00

Johannes Gäßler 11f3ca06b8 CUDA: Quantized matrix matrix multiplication (#2160)

* mmq implementation for non k-quants

* q6_K

* q2_K

* q3_k

* q4_K

* vdr

* q5_K

* faster q8_1 loading

* loop unrolling

* add __restrict__

* q2_K sc_high

* GGML_CUDA_MMQ_Y

* Updated Makefile

* Update Makefile

* DMMV_F16 -> F16

* Updated README, CMakeLists

* Fix CMakeLists.txt

* Fix CMakeLists.txt

* Fix multi GPU out-of-bounds

2023-07-29 23:04:44 +02:00

194 KiB
