Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-11-01 09:01:57 +00:00)
* cuda : fix vmm pool with multi GPU
* hip
* use recommended granularity instead of minimum (see the granularity sketch below)
* better error checking
* fix mixtral
* use cudaMemcpy3DPeerAsync (see the peer-copy sketch below)
* use cuda_pool_alloc in ggml_cuda_op_mul_mat
* consolidate error checking in ggml_cuda_set_device (see the device-switch sketch below)
* remove unnecessary inlines

ggml-ci

* style fixes
* only use vmm for the main device
* fix scratch buffer size, re-enable vmm pool for all devices
* remove unnecessary check id != g_main_device
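For context on the granularity bullet: a CUDA VMM pool reserves a large virtual address range once (via cuMemAddressReserve) and maps physical chunks into it on demand, with each chunk rounded up to the allocation granularity the driver reports. The change queries CU_MEM_ALLOC_GRANULARITY_RECOMMENDED instead of CU_MEM_ALLOC_GRANULARITY_MINIMUM. Below is a minimal sketch of that pattern; the helper names (vmm_granularity, vmm_pool_grow) and the CU_CHECK macro are illustrative, not the actual ggml-cuda symbols.

```cpp
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

// Illustrative error-check macro for CUDA driver API calls.
#define CU_CHECK(x)                                                    \
    do {                                                               \
        CUresult res_ = (x);                                           \
        if (res_ != CUDA_SUCCESS) {                                    \
            fprintf(stderr, "CUDA driver error %d at %s:%d\n",         \
                    (int) res_, __FILE__, __LINE__);                   \
            exit(1);                                                   \
        }                                                              \
    } while (0)

// Ask the driver for the recommended (not minimum) granularity
// for physical allocations on `device`.
static size_t vmm_granularity(int device) {
    CUmemAllocationProp prop = {};
    prop.type          = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id   = device;

    size_t granularity = 0;
    CU_CHECK(cuMemGetAllocationGranularity(&granularity, &prop,
                                           CU_MEM_ALLOC_GRANULARITY_RECOMMENDED));
    return granularity;
}

// Grow the pool: round the request up to the granularity, create a
// physical chunk, map it at the end of the reserved range `base`
// (obtained earlier with cuMemAddressReserve), and enable access.
static void vmm_pool_grow(CUdeviceptr base, size_t * used, int device, size_t need) {
    size_t gran = vmm_granularity(device);
    size_t size = ((need + gran - 1) / gran) * gran;  // round up to granularity

    CUmemAllocationProp prop = {};
    prop.type          = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id   = device;

    CUmemGenericAllocationHandle handle;
    CU_CHECK(cuMemCreate(&handle, size, &prop, 0));
    CU_CHECK(cuMemMap(base + *used, size, 0, handle, 0));

    CUmemAccessDesc access = {};
    access.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    access.location.id   = device;
    access.flags         = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    CU_CHECK(cuMemSetAccess(base + *used, size, &access, 1));

    *used += size;
}
```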
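The cudaMemcpy3DPeerAsync bullet refers to copying pitched data between two GPUs in a single asynchronous call, which is useful when the source and destination row strides differ across devices. A hedged peer-copy sketch follows; copy_rows_peer_async is a made-up wrapper name, not a function from the repository.

```cpp
#include <cuda_runtime.h>

// Copy `rows` rows of `row_bytes` bytes from src_device to dst_device
// in one call, even when the two pitches differ.
static cudaError_t copy_rows_peer_async(
        void * dst, size_t dst_pitch, int dst_device,
        const void * src, size_t src_pitch, int src_device,
        size_t row_bytes, size_t rows, cudaStream_t stream) {
    cudaMemcpy3DPeerParms p = {};
    p.dstDevice = dst_device;
    p.dstPtr    = make_cudaPitchedPtr(dst, dst_pitch, row_bytes, rows);
    p.srcDevice = src_device;
    p.srcPtr    = make_cudaPitchedPtr((void *) src, src_pitch, row_bytes, rows);
    // Extent width is in bytes because no CUDA array is involved.
    p.extent    = make_cudaExtent(row_bytes, rows, 1);
    return cudaMemcpy3DPeerAsync(&p, stream);
}
```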
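Consolidating error checking in ggml_cuda_set_device likely means callers stop checking cudaSetDevice individually and instead go through one checked helper that can also skip redundant device switches. A rough device-switch sketch under that assumption (CUDA_CHECK here is a simplified stand-in for the project's macro):

```cpp
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Simplified stand-in for the project's CUDA_CHECK macro.
#define CUDA_CHECK(err)                                                \
    do {                                                               \
        cudaError_t err_ = (err);                                      \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            exit(1);                                                   \
        }                                                              \
    } while (0)

// One checked entry point for device switches: callers no longer
// check cudaSetDevice themselves, and no-op switches are skipped.
static void ggml_cuda_set_device(int device) {
    int current = -1;
    CUDA_CHECK(cudaGetDevice(&current));
    if (current == device) {
        return;
    }
    CUDA_CHECK(cudaSetDevice(device));
}
```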