llama.cpp/tests/test-backend-ops.cpp at 1fa4551af069358e29fe4c497c801b0dee85cb49

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-21 12:16:57 +00:00


Jeff Bolz 1fa4551af0 vulkan: support larger argsort (#17313)

* vulkan: support larger argsort

This is an extension of the original bitonic sorting shader that puts the
temporary values in global memory. When more than 1024 threads are needed,
it runs multiple workgroups and synchronizes through a pipeline barrier.

To improve the memory access pattern, a copy of the float value is kept with
the index value. I've applied this same change to the original shared memory
version of the shader, which is still used when ncols <= 1024.

* Reduce the number of shader variants. Use smaller workgroups when doing a single pass, for a modest perf boost

* reduce loop overhead

* run multiple cols per invocation, to reduce barrier overhead

2025-11-19 17:25:50 +01:00

315 KiB
