llama.cpp/tests/test-backend-ops.cpp at 6bf28f0111ff9f21b3c1b1eace20c590281e7ba6

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-03 09:22:01 +00:00


Jeff Bolz f01bd02376 vulkan: Implement split_k for coopmat2 flash attention. (#12627)

When using group query attention, we have one workgroup per KV batch and this
can be very few workgroups (e.g. just 8 in some models). Enable split_k to
spread the work across SMs. This helps a lot when the KV cache is large.
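
The split_k idea described above can be sketched on the CPU as follows. This is a minimal illustration, not the Vulkan coopmat2 shader from this commit: it assumes a single query row, precomputed attention scores, and scalar values (head size 1). The names `Partial`, `attend_slice`, and `attention_split_k` are hypothetical. Each split handles one slice of the KV range and keeps its own running max and exponential sum, and a final reduction rescales and merges the partial results so the answer matches the unsplit computation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Partial result produced by one split over a slice of the KV range.
// (Hypothetical helper type for this sketch.)
struct Partial {
    float m;  // max score seen in this slice
    float l;  // sum of exp(score - m) over this slice
    float o;  // sum of exp(score - m) * value over this slice (unnormalized)
};

// Process scores[begin, end) and values[begin, end) for a single query row.
Partial attend_slice(const std::vector<float>& scores,
                     const std::vector<float>& values,
                     std::size_t begin, std::size_t end) {
    Partial p{-std::numeric_limits<float>::infinity(), 0.0f, 0.0f};
    for (std::size_t i = begin; i < end; ++i) {
        p.m = std::max(p.m, scores[i]);
    }
    for (std::size_t i = begin; i < end; ++i) {
        const float w = std::exp(scores[i] - p.m);
        p.l += w;
        p.o += w * values[i];
    }
    return p;
}

// Softmax-weighted average of values, computed in num_splits independent
// slices (each could be its own workgroup) followed by a reduction.
float attention_split_k(const std::vector<float>& scores,
                        const std::vector<float>& values,
                        std::size_t num_splits) {
    assert(num_splits > 0);
    assert(!scores.empty() && scores.size() == values.size());

    const std::size_t n = scores.size();
    const std::size_t chunk = (n + num_splits - 1) / num_splits;

    // Phase 1: independent partial results, one per non-empty slice.
    std::vector<Partial> parts;
    for (std::size_t s = 0; s < num_splits; ++s) {
        const std::size_t begin = std::min(n, s * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin < end) {
            parts.push_back(attend_slice(scores, values, begin, end));
        }
    }

    // Phase 2: reduction. Rescale every partial to the global max, then merge.
    float global_max = -std::numeric_limits<float>::infinity();
    for (const Partial& p : parts) {
        global_max = std::max(global_max, p.m);
    }
    float l = 0.0f;
    float o = 0.0f;
    for (const Partial& p : parts) {
        const float scale = std::exp(p.m - global_max);
        l += scale * p.l;
        o += scale * p.o;
    }
    return o / l;
}
```

Calling `attention_split_k(scores, values, 1)` gives the unsplit reference, and larger split counts should agree with it up to floating-point rounding. The cost of splitting is the extra reduction pass, which is why it pays off mainly when there are few workgroups and a long KV range.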

2025-04-02 14:25:08 -05:00

173 KiB
