llama.cpp/ggml/include/ggml-backend.h at 856ed0947f27b4ec3ad269fceda0402fbab263d3

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-01 09:01:57 +00:00

Diego Devesa 9777032dcc llama : separate compute buffer reserve from fattn check (#15696)

Exposes ggml_backend_sched_split_graph() to allow splitting the graph without allocating compute buffers and uses it to split the graph for the automatic Flash Attention check.

2025-08-31 15:49:03 +02:00

20 KiB
