llama.cpp/ggml-cuda.h at 76a884920aa1d2fc0dc7a7ac12dfc5ec5816377c

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-30 08:42:00 +00:00

slaren 7fc50c051a cuBLAS: use host pinned memory and dequantize while copying (#1207)

* cuBLAS: dequantize simultaneously while copying memory

* cuBLAS: use host pinned memory

* cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory

* cuBLAS: also pin kv cache

* fix rebase

2023-04-29 02:04:18 +02:00

2.7 KiB
