llama.cpp/ggml
Max Krasnyansky dcca0d3ab8 cpu: introduce chunking for flash attention (#16829)
Factor out the core FA loop into flash_atten_f16_one_chunk and add an outer loop
on top that handles the chunks.
2025-10-30 14:26:05 +02:00
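
Below is a minimal sketch of the chunking pattern described by the commit message: the core flash-attention loop is factored into a per-chunk function, and an outer loop hands out chunks of query rows until the work is exhausted. The names (fa_params, flash_attn_f16_one_chunk, the atomic chunk counter) are illustrative assumptions, not the actual ggml-cpu interfaces.

```cpp
#include <atomic>
#include <algorithm>
#include <cstdint>

struct fa_params {
    int64_t n_rows;      // total query rows to process
    int64_t chunk_size;  // rows per chunk
};

// Core flash-attention loop over a contiguous range of query rows.
// Placeholder body; the real kernel computes the attention output here.
static void flash_attn_f16_one_chunk(const fa_params & p, int64_t row0, int64_t row1) {
    for (int64_t r = row0; r < row1; ++r) {
        // ... per-row flash-attention work ...
        (void) p;
    }
}

// Outer loop: each worker thread repeatedly grabs the next chunk index from a
// shared atomic counter and processes the corresponding row range until no
// chunks remain, which balances load across threads.
static void flash_attn_f16(const fa_params & p, std::atomic<int64_t> & next_chunk) {
    const int64_t n_chunks = (p.n_rows + p.chunk_size - 1) / p.chunk_size;
    for (;;) {
        const int64_t chunk = next_chunk.fetch_add(1, std::memory_order_relaxed);
        if (chunk >= n_chunks) {
            break;
        }
        const int64_t row0 = chunk * p.chunk_size;
        const int64_t row1 = std::min(row0 + p.chunk_size, p.n_rows);
        flash_attn_f16_one_chunk(p, row0, row1);
    }
}
```

Splitting the work into chunks rather than one fixed slice per thread lets faster threads pick up extra chunks, which is the usual motivation for this kind of refactor on CPU back ends.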