commit 015022bb53
The grouped query attention optimization doesn't require a power-of-two ratio; the only thing relying on it was the modulo operation written as a bitwise &. split_k need not depend on gqa_ratio: enable it any time there is only one workgroup in the X dimension. The shader gets the split index from the x coordinate, and multiple workgroups in the X dimension (pre-split) indicate a larger FA operation that wouldn't need splitting.
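A minimal C++ sketch of the two points above, with hypothetical names (not the actual llama.cpp shader or host code): the bitwise-& form of the modulo only matches `%` when `gqa_ratio` is a power of two, and the split_k decision can key off the X workgroup count alone.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration: map a flattened index to a query-head slot
// within its KV-head group. The bitwise-& form computes idx mod gqa_ratio
// only when gqa_ratio is a power of two; plain % works for any ratio.
static uint32_t group_slot_pow2(uint32_t idx, uint32_t gqa_ratio) {
    // Only valid when gqa_ratio is a power of two.
    assert((gqa_ratio & (gqa_ratio - 1)) == 0);
    return idx & (gqa_ratio - 1);
}

static uint32_t group_slot_any(uint32_t idx, uint32_t gqa_ratio) {
    // Works for any gqa_ratio, e.g. 6 query heads per KV head.
    return idx % gqa_ratio;
}

// Hypothetical split_k gate: enable splitting whenever the pre-split
// dispatch would have only one workgroup in X. More X workgroups means a
// larger FA operation that already has enough parallelism.
static bool use_split_k(uint32_t workgroups_x) {
    return workgroups_x == 1;
}

int main() {
    // For a power-of-two ratio the two forms agree: 13 & 3 == 13 % 4 == 1.
    assert(group_slot_pow2(13, 4) == group_slot_any(13, 4));
    // For gqa_ratio = 6 only the % form is correct: 13 & 5 == 5, 13 % 6 == 1.
    assert(group_slot_any(13, 6) == 1);
    assert(use_split_k(1) && !use_split_k(8));
    return 0;
}
```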