backend : offload large batches to GPU (#6083)

* backend : offload large batches to GPU * fix hip * code cleanup * fix CUDA split buffers * Update ggml-backend-impl.h Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * cuda : fix memset without set_device * imatrix : remove sched affix from weight names * sched : add a new split if the current one has too many inputs reduce max inputs per split more cleanup * update backends ggml-ci --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-10-27 08:21:30 +00:00 · 2024-03-18 11:03:04 +01:00
parent 496bc79bc2
commit 2bf8d0f7c4
14 changed files with 349 additions and 396 deletions
--- a/examples/llama-bench/llama-bench.cpp
+++ b/examples/llama-bench/llama-bench.cpp
@@ -114,10 +114,10 @@ static std::string get_cpu_info() {
 static std::string get_gpu_info() {
    std::string id;
 #ifdef GGML_USE_CUBLAS
-    int count = ggml_cuda_get_device_count();
+    int count = ggml_backend_cuda_get_device_count();
    for (int i = 0; i < count; i++) {
        char buf[128];
-        ggml_cuda_get_device_description(i, buf, sizeof(buf));
+        ggml_backend_cuda_get_device_description(i, buf, sizeof(buf));
        id += buf;
        if (i < count - 1) {
            id += "/";