Commit Graph

4 Commits

Author SHA1 Message Date
Jeff Bolz
c4f53563df vulkan: support fattn sinks (#15126) 2025-08-07 22:44:20 +02:00
Jeff Bolz
6efcd65945 vulkan: optimize flash attention split_k_reduce (#14554)
* vulkan: allow FA split_k with smaller KV values

* vulkan: spread split_k_reduce work across more threads

k_num can get rather large. Use the whole workgroup to reduce the M/L values.

Launch a thread for each element in the HSV dimension of the output. Helps a
lot for large HSV (like deepseek).
2025-07-08 20:11:42 +02:00
Jeff Bolz
8875523eb3 vulkan: support softmax/FA batch and broadcast (#14449) 2025-07-02 15:48:33 +03:00
Jeff Bolz
f01bd02376 vulkan: Implement split_k for coopmat2 flash attention. (#12627)
When using group query attention, we have one workgroup per KV batch and this
can be very few workgroups (e.g. just 8 in some models). Enable split_k to
spread the work across SMs. This helps a lot when the KV cache is large.
2025-04-02 14:25:08 -05:00