llama.cpp/tests/test-backend-ops.cpp at 30649cab657d87ac46692332a76e1b75d5d22e00

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-04 09:32:00 +00:00


Jeff Bolz de5627910d vulkan: Optimize argsort (#15354)

- Launch an appropriate number of invocations (the next larger power of two).
32 invocations is common, and the barrier is much cheaper at that size.
- Specialize for "needs bounds checking" vs not.
- Make the code less branchy and [[unroll]] the loops. In the final code,
I see no branches inside the main loop (only predicated stores) when
needs_bounds_check is false.
- Always sort ascending, then apply the ascending vs descending option when
doing the final stores to memory.
- Copy the values into shared memory, which makes them slightly cheaper to access.
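The following is a minimal CPU-side C++ sketch of the ideas in the list above, assuming a bitonic sort over a power-of-two number of slots. The real change is a GLSL compute shader, which is not reproduced here. `next_pow2` and `argsort_row` are hypothetical names. The inner `i` loop stands in for the workgroup invocations, with an implicit barrier between passes, and `needs_bounds_check` is a runtime flag here rather than a shader specialization. Padded slots compare as larger than any real value, and the descending option is applied only when the results are written out.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: smallest power of two >= n (returns 1 for n == 0).
static uint32_t next_pow2(uint32_t n) {
    uint32_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Hypothetical bitonic argsort of one row. The sort always runs ascending;
// the requested order is applied only in the final store loop.
std::vector<int32_t> argsort_row(const std::vector<float> & row, bool descending) {
    const uint32_t ncols = (uint32_t) row.size();
    const uint32_t n     = next_pow2(ncols);   // one slot per "invocation"
    const bool needs_bounds_check = n != ncols; // padding only exists if ncols is not a power of two

    // Index array (stands in for shared memory); slots >= ncols are padding.
    std::vector<int32_t> idx(n);
    for (uint32_t i = 0; i < n; ++i) idx[i] = (int32_t) i;

    // Strict "a sorts after b" test; padded indices act as +infinity.
    auto greater = [&](int32_t a, int32_t b) {
        if (needs_bounds_check) {
            if (a >= (int32_t) ncols) return b < (int32_t) ncols;
            if (b >= (int32_t) ncols) return false;
        }
        return row[a] > row[b];
    };

    // Standard bitonic network: k = size of merged blocks, j = compare distance.
    for (uint32_t k = 2; k <= n; k <<= 1) {
        for (uint32_t j = k >> 1; j > 0; j >>= 1) {
            for (uint32_t i = 0; i < n; ++i) { // one iteration per invocation
                const uint32_t ixj = i ^ j;
                if (ixj > i) {
                    const bool up = (i & k) == 0; // direction of this sub-block
                    if (up ? greater(idx[i], idx[ixj]) : greater(idx[ixj], idx[i])) {
                        std::swap(idx[i], idx[ixj]);
                    }
                }
            }
            // (a workgroup barrier would go here in the shader)
        }
    }

    // Final store: padding has sunk past ncols, so only the first ncols
    // slots are real; descending order just reads them in reverse.
    std::vector<int32_t> dst(ncols);
    for (uint32_t i = 0; i < ncols; ++i) {
        dst[i] = descending ? idx[ncols - 1 - i] : idx[i];
    }
    return dst;
}
```

Because the network always sorts ascending, supporting descending order costs nothing more than a reversed read in the store loop, which keeps the compare-and-swap passes identical for both options. This sketch does not order ties stably.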

2025-08-17 10:41:45 +02:00

240 KiB
