llama.cpp/ggml-cuda.cu at 31f27758faf4a4bd08101a57c7ec3a473f771f86

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-01 09:01:57 +00:00

Erik Garrison 0f630fbc92 cuda : ROCm AMD Unified Memory Architecture (UMA) handling (#4449)

* AMD ROCm: handle UMA memory VRAM expansions

This resolves #2797 by allowing ROCm AMD GPU users with a UMA to
dynamically expand the VRAM allocated to the GPU.

Without this, AMD ROCm users with shared CPU/GPU memory are usually
stuck with the BIOS-set (or fixed) framebuffer VRAM, making it
impossible to load more than 1-2 layers.

Note that the model is duplicated in RAM because it's loaded once for
the CPU and then copied into a second set of allocations that are
managed by the HIP UMA system. We can fix this later.

* clarify build process for ROCm on Linux with CMake

* avoid using deprecated ROCm hipMallocHost

* keep simplifying the change required for UMA

* cmake: enable UMA-compatible allocation when LLAMA_HIP_UMA=ON
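
The commit message describes choosing a UMA-compatible allocation at build time. A minimal host-only sketch of that pattern follows, in plain C++ so it compiles without ROCm. It is not the code from this commit. The macro `SKETCH_HIP_UMA`, the wrapper `sketch_device_malloc`, and both `fake_*` allocators are hypothetical names for illustration. A real build would presumably call the HIP runtime's device and managed allocators where the stand-ins appear.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Records which allocation path the wrapper took (illustration only).
enum class AllocKind { DeviceOnly, Managed };

static AllocKind last_alloc_kind = AllocKind::DeviceOnly;

// Hypothetical stand-in for a device-only allocation (fixed framebuffer VRAM).
// It wraps std::malloc so the sketch runs on any host.
inline int fake_device_malloc(void ** ptr, std::size_t size) {
    last_alloc_kind = AllocKind::DeviceOnly;
    *ptr = std::malloc(size);
    return *ptr != nullptr ? 0 : 1;
}

// Hypothetical stand-in for a managed (UMA) allocation that can grow into
// shared system memory. It also wraps std::malloc here.
inline int fake_managed_malloc(void ** ptr, std::size_t size) {
    last_alloc_kind = AllocKind::Managed;
    *ptr = std::malloc(size);
    return *ptr != nullptr ? 0 : 1;
}

// Single entry point for "device" allocations. SKETCH_HIP_UMA is an assumed
// macro that a build option such as LLAMA_HIP_UMA=ON might define.
inline int sketch_device_malloc(void ** ptr, std::size_t size) {
    assert(ptr != nullptr);
#if defined(SKETCH_HIP_UMA)
    return fake_managed_malloc(ptr, size);
#else
    return fake_device_malloc(ptr, size);
#endif
}
```

Routing every allocation through one wrapper keeps the build-time switch in a single place, so call sites do not change when the option is toggled. Compiling with `-DSKETCH_HIP_UMA` selects the managed path; without it, the device-only path is used.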

2023-12-21 21:45:32 +02:00

365 KiB
