ggml : implement REGLU/GEGLU/SWIGLU ops (#14158)

* implement unary REGLU/GEGLU/SWIGLU cpu ops

* relax constraints

* duplicate shape of source

* fix ggml_vec_geglu_f16

* special case gated ops

* implement unary REGLU/GEGLU/SWIGLU cuda ops

* tighten constraints again

* refactor into GGML_GLU_OP

* metal : add glu kernels

ggml-ci

* add CUDA_GLU_BLOCK_SIZE [no ci]

* more constraints and use 64bit ints

ggml-ci

* 64bit multiplication [no ci]

* implement swapped variants (cpu/cuda)

* update comment [no ci]

ggml-ci

* Vulkan: Add GLU ops and shaders

* SYCL: Implement fused GEGLU, SWIGLU and REGLU kernels for a single up+gate tensor

* ggml : implement GLU for split up/gate (#14181)

* implement GLU for split up/gate

* add tests for ggml_glu_split

* Vulkan: Implement glu_split logic and shader support

* add split to logging [no ci]

* SYCL: refactor element_size ops and add split up and gate support to gated kernels

* SYCL: switch GEGLU to use tanh approximation

---------

Co-authored-by: 0cc4m <picard12@live.de>
Co-authored-by: Akarshan <akarshan@menlo.ai>

* GGML: increase OP count in assertion

* Refactor: Optimize SYCL element-wise operations with unary function inlining

This commit refactors the SYCL element-wise operations to improve performance by:

- Inlining unary operations (sgn, abs, elu, gelu, silu, etc.) to reduce kernel launch overhead.
- Introducing helper functions `op_xxx` for each unary operation to encapsulate the logic.
- Replacing direct kernel calls with calls to these inlined functions.
- Using `__dpct_inline__` to encourage compiler inlining.
- Minor code cleanup and consistency improvements.

The changes aim to reduce kernel launch overhead and improve the overall efficiency of element-wise operations on SYCL devices.

* vulkan: Increase workgroup size for GLU, for performance (#14345)

* vulkan: Increase workgroup size for GLU, for performance

* vulkan: change GLU shaders to do one element per invocation rather than one row per workgroup

* merge fix

* metal : add support for split and swap

ggml-ci

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: 0cc4m <picard12@live.de>
Co-authored-by: Akarshan <akarshan@menlo.ai>
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
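The bullets above name three gated ops. Here is what they compute in their usual definitions, applied elementwise to each row. Two points are assumptions, not stated in this commit. First, the fused row gives the activation to its first half unless swapped. Second, in the split variant, u and v stand for the corresponding rows of the two input tensors. The GELU line is the common tanh approximation, which is the form the SYCL bullet says GEGLU was switched to.

```latex
% x: one input row, n: output row length (assumed notation)
(a, b) =
\begin{cases}
  (x_{[0,n)},\; x_{[n,2n)}) & \text{fused, } x \in \mathbb{R}^{2n} \\
  (x_{[n,2n)},\; x_{[0,n)}) & \text{fused, swapped} \\
  (u,\; v)                  & \text{split, } u, v \in \mathbb{R}^{n}
\end{cases}

\begin{aligned}
\mathrm{REGLU}(a, b)  &= \max(0, a) \odot b \\
\mathrm{GEGLU}(a, b)  &= \mathrm{GELU}(a) \odot b \\
\mathrm{SWIGLU}(a, b) &= \mathrm{SiLU}(a) \odot b,
  \qquad \mathrm{SiLU}(a) = \frac{a}{1 + e^{-a}} \\
\mathrm{GELU}(a) &\approx \tfrac{1}{2}\, a \left(1 + \tanh\!\left(\sqrt{2/\pi}\,\bigl(a + 0.044715\, a^{3}\bigr)\right)\right)
\end{aligned}
```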

Sigbjørn Skjæret authored 2025-06-29 11:04:10 +02:00, committed by GitHub
parent bd9c981d72
commit a0535ffa0d

26 changed files with 2126 additions and 1153 deletions

ggml/src/ggml-metal/ggml-metal-impl.h (11 lines changed)
@@ -422,6 +422,17 @@ typedef struct {
     int32_t  KHW; // KH * KW, pre-computed on CPU to save GPU resources
 } ggml_metal_kargs_im2col;
 
+typedef struct{
+    int32_t  ne00;
+    uint64_t nb01;
+    int32_t  ne10;
+    uint64_t nb11;
+    int32_t  ne0;
+    uint64_t nb1;
+    int32_t  i00;
+    int32_t  i10;
+} ggml_metal_kargs_glu;
+
 typedef struct {
     int64_t  ne00;
     int64_t  ne01;
