llama : reuse compute graphs (#14482)

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-27 08:21:30 +00:00

* llama : reuse compute graphs

ggml-ci

* llama-bench : add graph reuse parameter

ggml-ci

* cont : remove the parameter and the sched resets

ggml-ci

* graph : rename update() to can_reuse()

ggml-ci

* params : remove is_same()

ggml-ci

* graph : set res->params in llm_graph_context constructor

ggml-ci

* graph : avoid set_max_nodes in llm_graph_result

ggml-ci

* kv-cache : reuse llama_context's graph result instance

ggml-ci

* context : reset the previous graph result upon memory updates

ggml-ci

* batch : llama_ubatch now carries its data instead of pointing to balloc

ggml-ci

* merge : fix build

ggml-ci

* graph : fix can_reuse() checks when flash-attention is disabled

* graph : move llm_graph_result impl in source file + debug env

ggml-ci

Author: Georgi Gerganov
Date: 2025-07-17 19:08:33 +03:00
Committed by: GitHub

Parent: 086cf81e88
Commit: 01612b7409

12 changed files with 548 additions and 289 deletions

									
include/llama.h
@@ -1394,6 +1394,7 @@ extern "C" {
         int32_t n_p_eval;
         int32_t n_eval;
+        int32_t n_reused; // number of times a ggml compute graph had been reused
     };

     struct llama_perf_sampler_data {
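
To illustrate what a counter like `n_reused` could be measuring, here is a minimal, self-contained C++ sketch of the general reuse pattern that the commit message hints at (`can_reuse()`, and resetting the previous graph result upon memory updates). It is not the llama.cpp implementation: `graph_params`, `graph_cache`, `process()`, `n_built`, and the choice of which parameters are compared are all hypothetical names and assumptions made for this example only.

```cpp
#include <cstdint>

// Hypothetical stand-in for the parameters that determine a graph's shape.
// The fields chosen here are assumptions, not the ones llama.cpp compares.
struct graph_params {
    int32_t n_tokens;
    int32_t n_outputs;

    bool operator==(const graph_params & other) const {
        return n_tokens == other.n_tokens && n_outputs == other.n_outputs;
    }
};

// Hypothetical cache that remembers the parameters of the last built graph.
struct graph_cache {
    bool         has_prev = false; // false until a graph has been built
    graph_params prev     = {0, 0};
    int32_t      n_built  = 0;     // graphs built from scratch
    int32_t      n_reused = 0;     // graphs reused (same idea as the new counter)

    // True when the previous graph was built for identical parameters.
    bool can_reuse(const graph_params & params) const {
        return has_prev && prev == params;
    }

    // Reuse the previous graph when possible, otherwise record a rebuild.
    void process(const graph_params & params) {
        if (can_reuse(params)) {
            n_reused++;
        } else {
            prev     = params;
            has_prev = true;
            n_built++;
        }
    }

    // Forget the previous graph, for example after the memory state changed.
    void reset() {
        has_prev = false;
    }
};
```

In this sketch, calling `process()` repeatedly with identical parameters increments `n_reused` instead of `n_built`, and calling `reset()` forces the next call to rebuild even if the parameters are unchanged.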
