* Custom RoPE + better memory management for CUDA

* Adjusted look-ahead in ggml_cuda_pool_malloc to 5%

  This seems to be sufficient: we end up using about 200 MB less VRAM
  when running the 13B model with a context of 8192.

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
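For context, the look-ahead is the slack a pooled allocator adds on top of a requested size so that slightly larger follow-up requests can reuse an existing buffer instead of triggering a fresh cudaMalloc. The following is only a minimal sketch of that idea, not the actual ggml_cuda_pool_malloc code: the pool layout, names, and fixed-size buffer table are illustrative assumptions, with the margin set to the 5% mentioned above.

```c
// Minimal sketch of a pooled CUDA allocator with a 5% look-ahead margin.
// Illustrative only; not the llama.cpp implementation.
#include <cuda_runtime.h>
#include <stddef.h>

#define POOL_MAX_BUFFERS 256
#define POOL_LOOKAHEAD   0.05f   /* 5% slack on top of the requested size */

typedef struct {
    void  *ptr;
    size_t size;
} pool_buffer;

static pool_buffer g_pool[POOL_MAX_BUFFERS];

void *pool_malloc(size_t size, size_t *actual_size) {
    // First try to reuse a free buffer that is already large enough.
    for (int i = 0; i < POOL_MAX_BUFFERS; ++i) {
        pool_buffer *b = &g_pool[i];
        if (b->ptr != NULL && b->size >= size) {
            void *ptr = b->ptr;
            *actual_size = b->size;
            b->ptr  = NULL;
            b->size = 0;
            return ptr;
        }
    }
    // Otherwise allocate a fresh buffer, padded by the look-ahead margin,
    // so the next slightly larger request can still be served from the pool.
    size_t padded = (size_t)(size * (1.0f + POOL_LOOKAHEAD));
    void *ptr = NULL;
    if (cudaMalloc(&ptr, padded) != cudaSuccess) {
        return NULL;
    }
    *actual_size = padded;
    return ptr;
}

void pool_free(void *ptr, size_t size) {
    // Return the buffer to the first empty slot for later reuse.
    for (int i = 0; i < POOL_MAX_BUFFERS; ++i) {
        pool_buffer *b = &g_pool[i];
        if (b->ptr == NULL) {
            b->ptr  = ptr;
            b->size = size;
            return;
        }
    }
    // Pool is full: release the memory back to the driver.
    cudaFree(ptr);
}
```

A smaller margin trades a few extra allocations for lower peak VRAM, which matches the effect described above at long contexts.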