Vulkan Optimizations and Fixes (#8959)

* Optimize Vulkan REPEAT performance
* Use Vulkan GLSL fused multiply-add instruction where possible
* Add GGML_VULKAN_PERF option to output performance data per operator
* Rework and fix Vulkan descriptor set and descriptor pool handling
* Fix float32 concat f16 shader validation error
* Add Vulkan GROUP_NORM eps parameter
* Fix validation error with transfer queue memory barrier flags
* Remove trailing whitespaces
2025-11-07 09:57:00 +00:00 · 2024-08-14 18:32:53 +02:00
parent 98a532d474
commit 5fd89a70ea
16 changed files with 781 additions and 851 deletions
--- a/ggml/src/vulkan-shaders/mul_mat_vec_p021.comp
+++ b/ggml/src/vulkan-shaders/mul_mat_vec_p021.comp
@@ -52,7 +52,7 @@ void main() {
        // y is not transposed but permuted
        const uint iy = channel*nrows_y + row_y;

-        tmp[tid] += xi * FLOAT_TYPE(data_b[iy]);
+        tmp[tid] = fma(xi, FLOAT_TYPE(data_b[iy]), tmp[tid]);
    }

    // dst is not transposed and not permuted