llama.cpp/ggml/include/ggml-backend.h at c0389dba43d50695f9d3f57dd1f1a14cbefc100c

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-04 09:32:00 +00:00

Diego Devesa 9777032dcc llama : separate compute buffer reserve from fattn check (#15696)

Exposes ggml_backend_sched_split_graph() to allow splitting the graph without allocating compute buffers and uses it to split the graph for the automatic Flash Attention check.

2025-08-31 15:49:03 +02:00
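The commit separates two steps: splitting a graph across backends, and reserving compute buffers for it. A check such as the automatic Flash Attention test can then inspect the split result without any allocation happening. The toy model below illustrates that idea. It is entirely hypothetical. `toy_node`, `toy_split_graph`, `toy_fattn_ok` and the two-backend CPU/accelerator setup are illustrative stand-ins, not ggml's types, and the real `ggml_backend_sched_split_graph()` is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy backend ids: 0 = CPU fallback, 1 = accelerator. */
#define TOY_CPU 0
#define TOY_GPU 1

/* Hypothetical toy op kinds, not ggml's enum ggml_op. */
enum toy_op { TOY_OP_MATMUL, TOY_OP_FLASH_ATTN, TOY_OP_ADD };

struct toy_node {
    enum toy_op op;
    bool gpu_supports;  /* whether the toy accelerator implements this op */
    int backend;        /* output: filled in by toy_split_graph */
};

struct toy_split {
    int backend;        /* backend that runs every node in this split */
    size_t first;       /* index of the first node in the split */
    size_t count;       /* number of consecutive nodes in the split */
};

/* Assign each node to a backend, then group consecutive nodes that share a
 * backend into splits. Only the caller's arrays are written: no compute
 * buffers are allocated. Returns the number of splits written; if max_splits
 * is reached, the remaining nodes are left out of the split list. */
static size_t toy_split_graph(struct toy_node *nodes, size_t n_nodes,
                              struct toy_split *splits, size_t max_splits) {
    assert(nodes != NULL || n_nodes == 0);

    /* Pass 1: prefer the accelerator, fall back to the CPU. */
    for (size_t i = 0; i < n_nodes; i++) {
        nodes[i].backend = nodes[i].gpu_supports ? TOY_GPU : TOY_CPU;
    }

    /* Pass 2: start a new split whenever the backend changes. */
    size_t n_splits = 0;
    for (size_t i = 0; i < n_nodes; i++) {
        if (n_splits > 0 && splits[n_splits - 1].backend == nodes[i].backend) {
            splits[n_splits - 1].count++;
        } else if (n_splits < max_splits) {
            splits[n_splits].backend = nodes[i].backend;
            splits[n_splits].first = i;
            splits[n_splits].count = 1;
            n_splits++;
        } else {
            break; /* caller's split array is full */
        }
    }
    return n_splits;
}

/* Toy check run after splitting: Flash Attention counts as usable only if
 * every flash-attention node was assigned to the expected backend. */
static bool toy_fattn_ok(const struct toy_node *nodes, size_t n_nodes,
                         int expected_backend) {
    for (size_t i = 0; i < n_nodes; i++) {
        if (nodes[i].op == TOY_OP_FLASH_ATTN &&
            nodes[i].backend != expected_backend) {
            return false;
        }
    }
    return true;
}
```

For example, if the toy accelerator lacks a flash-attention kernel, a three-node graph splits GPU/CPU/GPU. `toy_fattn_ok(nodes, 3, TOY_GPU)` then returns `false`, and nothing was allocated to reach that decision.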

20 KiB