ggml : fix padding in timestep embedding kernels (#15932)

* ggml : remove adding extra dim timestep embedding

This commit updates the ggml_timestep_embedding function to no longer
add an extra dimension when the specified dimension is odd.

The motivation for this change is that this introduces an unnecessary
dimension when the dimension is odd, which caused an issue in the
kernels which were not expecting this extra dimension and it resulted in
uninitialized memory for the second to last dimension.

* ggml-cuda : fix padding in timestep embedding kernel

This commit removes the zeroing out of the last dimension now that we
are not adding the extra padding dimension.

* ggml-metal : fix padding in timestep embedding kernel

This commit fixes the zero padding for odd dimensions in
the timestep embedding kernel

* ggml-opencl : fix padding in timestep embedding kernel

This commit fixes the zero padding for odd dimensions in
the timestep embedding kernel.

* ggml-sycl : fix padding in timestep embedding kernel

This commit fixes the zero padding for odd dimensions in
the timestep embedding kernel.

* ggml-vulkan : fix padding in timestep embedding kernel

This commit fixes the zero padding for odd dimensions in
the timestep embedding kernel.

* ggml-cpu : fix padding in timestep embedding function

This commit removes the zeroing out of the last dimension now that we
are not adding the extra padding dimension.
This commit is contained in:
Daniel Bevenius
2025-09-16 15:25:57 +02:00
committed by GitHub
parent 76888d202e
commit 3913f8730e
7 changed files with 15 additions and 18 deletions

View File

@@ -26,8 +26,8 @@ kernel void kernel_timestep_embedding(
local_half_dim = logical_dim / 2;
local_embed_data_ptr = (global float *)((global char *)local_dst_output_base_ptr + local_i * dst_nb1_bytes);
if (logical_dim % 2 != 0 && local_j == ((logical_dim + 1) / 2)) {
local_embed_data_ptr[logical_dim] = 0.0f;
if (logical_dim % 2 != 0 && local_j == local_half_dim) {
local_embed_data_ptr[2 * local_half_dim] = 0.0f;
}
if (local_j >= local_half_dim) {