llama.cpp/ggml-cuda.h at ea3a0ad6b6b5ca4693b94acd4cb32e2803f66fae

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-31 08:51:55 +00:00

slaren 7fc50c051a cuBLAS: use host pinned memory and dequantize while copying (#1207)

* cuBLAS: dequantize simultaneously while copying memory

* cuBLAS: use host pinned memory

* cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory

* cuBLAS: also pin kv cache

* fix rebase

2023-04-29 02:04:18 +02:00

2.7 KiB
