Files
llama.cpp/ggml-cuda.cu
slaren f5ef5cfb18 ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412)
* ggml-cuda : perform cublas matrix multiplication of quantized types as fp16

* rename CC_TURING to CC_VOLTA

* disable fp16 mat mul completely with multi GPU
2023-09-30 18:12:57 +02:00

274 KiB