Files
llama.cpp/ggml-cuda.h
slaren 7fc50c051a cuBLAS: use host pinned memory and dequantize while copying (#1207)
* cuBLAS: dequantize simultaneously while copying memory

* cuBLAS: use host pinned memory

* cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory

* cuBLAS: also pin kv cache

* fix rebase
2023-04-29 02:04:18 +02:00

2.7 KiB