llama.cpp/ggml/include/ggml-backend.h at 0a1b3982cd0bd18730d50a693053b88c13fd04a6

Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-11-02 09:12:03 +00:00)

Diego Devesa 9777032dcc llama : separate compute buffer reserve from fattn check (#15696)

Exposes ggml_backend_sched_split_graph() to allow splitting the graph without allocating compute buffers and uses it to split the graph for the automatic Flash Attention check.

2025-08-31 15:49:03 +02:00

20 KiB
