llama.cpp/ggml-cuda.h at 33a52448061cfd2ea44da9e6cb30b2ec22e2f6d0

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-30 08:42:00 +00:00

slaren 2bf8d0f7c4 backend : offload large batches to GPU (#6083)

* backend : offload large batches to GPU

* fix hip

* code cleanup

* fix CUDA split buffers

* Update ggml-backend-impl.h

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* cuda : fix memset without set_device

* imatrix : remove sched affix from weight names

* sched : add a new split if the current one has too many inputs
reduce max inputs per split
more cleanup

* update backends

ggml-ci

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

2024-03-18 11:03:04 +01:00

1.4 KiB
