tests : add -INF blocks to the KQ mask in the FA tests (#16380)

* tests : add -INF blocks to the KQ mask in the FA tests

* cont : bump -INF block size to 64

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>

* ggml : prevent division by zero in FA CPU op

---------

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
Georgi Gerganov
2025-10-07 08:22:35 +03:00
committed by GitHub
parent 8ae32dc9ec
commit 1d6092fc72
2 changed files with 47 additions and 1 deletion


@@ -8135,7 +8135,7 @@ static void ggml_compute_forward_flash_attn_ext_f16(
}
// V /= S
-const float S_inv = 1.0f/S;
+const float S_inv = S == 0.0f ? 0.0f : 1.0f/S;
ggml_vec_scale_f32(DV, VKQ32, S_inv);
// dst indices