CUDA: fix race conditions in FlashAttention kernels (#13438)

This commit is contained in:
Johannes Gäßler
2025-05-10 22:22:48 +02:00
committed by GitHub
parent d2a4ef05c6
commit 0208355f42
2 changed files with 3 additions and 0 deletions


@@ -168,6 +168,7 @@ static __global__ void flash_attn_vec_ext_f16(
     for (int j = 0; j < ncols; ++j) {
         KQ[j*D + tid] = -HALF_MAX_HALF;
    }
+   __syncthreads();
    half2 VKQ[ncols] = {{0.0f, 0.0f}};
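The race the added barrier prevents follows a common CUDA pattern: shared memory is initialized cooperatively (each thread writes some slots), so a `__syncthreads()` must separate those writes from any later code that reads slots written by *other* threads. A minimal standalone sketch of this pattern (not the llama.cpp kernel itself; `D`, `init_then_reduce`, and the reduction are hypothetical illustration):

```cuda
#include <cuda_runtime.h>
#include <math.h>

#define D 64  // hypothetical block size / buffer length

__global__ void init_then_reduce(float *out) {
    __shared__ float KQ[D];
    const int tid = threadIdx.x;

    KQ[tid] = -1e30f;  // each thread initializes one shared-memory slot

    __syncthreads();   // barrier: without it, thread 0 below may read
                       // slots that other threads have not yet written

    if (tid == 0) {
        float m = KQ[0];
        for (int i = 1; i < D; ++i) {
            m = fmaxf(m, KQ[i]);  // reads slots written by other threads
        }
        out[0] = m;
    }
}
```

Launched as `init_then_reduce<<<1, D>>>(d_out)`, the kernel is correct only with the barrier in place; tools such as `compute-sanitizer --tool racecheck` flag the unsynchronized version as a shared-memory data race.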