mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-10-30 08:42:00 +00:00)

	ggml : implement REGLU/GEGLU/SWIGLU ops (#14158)
* implement unary REGLU/GEGLU/SWIGLU cpu ops
* relax constraints
* duplicate shape of source
* fix ggml_vec_geglu_f16
* special case gated ops
* implement unary REGLU/GEGLU/SWIGLU cuda ops
* tighten constraints again
* refactor into GGML_GLU_OP
* metal : add glu kernels

ggml-ci

* add CUDA_GLU_BLOCK_SIZE [no ci]
* more constraints and use 64bit ints

ggml-ci

* 64bit multiplication [no ci]
* implement swapped variants (cpu/cuda)
* update comment [no ci]

ggml-ci

* Vulkan: Add GLU ops and shaders
* SYCL: Implement fused kernel GEGLU, SWIGLU and REGLU for single up+gate
* ggml : implement GLU for split up/gate (#14181)
* implement GLU for split up/gate
* add tests for ggml_glu_split
* Vulkan: Implement glu_split logic and shader support
* add split to logging [no ci]
* SYCL: refactor element_size ops and add split up and gate support to gated kernels
* SYCL: switch GEGLU to use tanh approximation

---------

Co-authored-by: 0cc4m <picard12@live.de>
Co-authored-by: Akarshan <akarshan@menlo.ai>

* GGML: increase OP count in assertion
* Refactor: Optimize SYCL element-wise operations with unary function inlining

This commit refactors the SYCL element-wise operations to improve performance by:

- Inlining unary operations (sgn, abs, elu, gelu, silu, etc.) to reduce kernel launch overhead.
- Introducing helper functions `op_xxx` for each unary operation to encapsulate the logic.
- Replacing direct kernel calls with calls to these inlined functions.
- Using `__dpct_inline__` to encourage compiler inlining.
- Minor code cleanup and consistency improvements.

The changes aim to reduce kernel launch overhead and improve the overall efficiency of element-wise operations on SYCL devices.

* vulkan: Increase workgroup size for GLU, for performance (#14345)
* vulkan: Increase workgroup size for GLU, for performance
* vulkan: change GLU shaders to do one element per invocation rather than one row per workgroup
* merge fix
* metal : add support for split and swap

ggml-ci

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: 0cc4m <picard12@live.de>
Co-authored-by: Akarshan <akarshan@menlo.ai>
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>

@@ -2303,6 +2303,21 @@ static bool ggml_cuda_compute_forward(ggml_backend_cuda_context & ctx, struct gg
                     return false;
             }
             break;
+        case GGML_OP_GLU:
+            switch (ggml_get_glu_op(dst)) {
+                case GGML_GLU_OP_REGLU:
+                    ggml_cuda_op_reglu(ctx, dst);
+                    break;
+                case GGML_GLU_OP_GEGLU:
+                    ggml_cuda_op_geglu(ctx, dst);
+                    break;
+                case GGML_GLU_OP_SWIGLU:
+                    ggml_cuda_op_swiglu(ctx, dst);
+                    break;
+                default:
+                    return false;
+            }
+            break;
         case GGML_OP_NORM:
             ggml_cuda_op_norm(ctx, dst);
             break;
@@ -3096,6 +3111,16 @@ static bool ggml_backend_cuda_device_supports_op(ggml_backend_dev_t dev, const g
                     return false;
             }
             break;
+        case GGML_OP_GLU:
+            switch (ggml_get_glu_op(op)) {
+                case GGML_GLU_OP_REGLU:
+                case GGML_GLU_OP_GEGLU:
+                case GGML_GLU_OP_SWIGLU:
+                    return ggml_is_contiguous_1(op->src[0]);
+                default:
+                    return false;
+            }
+            break;
         case GGML_OP_MUL_MAT:
         case GGML_OP_MUL_MAT_ID:
             {
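
For readers unfamiliar with the three ops dispatched above, here is a minimal CPU reference sketch of what a gated linear unit computes on one row. It is not code from this commit. The names `GluOp`, `glu_row` and `glu_split` are hypothetical. The layout (first half of the row is activated, second half is the multiplier, with `swapped` reversing the roles) and the tanh form of GELU are assumptions inferred from the commit message, not confirmed by the hunks shown.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical enum for illustration; it only mirrors the GGML_GLU_OP_* names.
enum class GluOp { REGLU, GEGLU, SWIGLU };

static float relu(float x) { return x > 0.0f ? x : 0.0f; }
static float silu(float x) { return x / (1.0f + std::exp(-x)); }

// tanh approximation of GELU (assumed here; exact erf-based GELU differs slightly).
static float gelu_tanh(float x) {
    const float k = 0.7978845608f;  // sqrt(2/pi)
    return 0.5f * x * (1.0f + std::tanh(k * x * (1.0f + 0.044715f * x * x)));
}

static float activate(GluOp op, float x) {
    switch (op) {
        case GluOp::REGLU:  return relu(x);
        case GluOp::GEGLU:  return gelu_tanh(x);
        case GluOp::SWIGLU: return silu(x);
    }
    return x;  // unreachable for valid enum values
}

// Split form: activated input and multiplier come from two separate rows.
// Processes min(a.size(), b.size()) elements.
std::vector<float> glu_split(GluOp op, const std::vector<float> & a,
                             const std::vector<float> & b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = activate(op, a[i]) * b[i];
    }
    return out;
}

// Fused form: one row of 2*n values produces n outputs.
// Assumed layout: [activated half | multiplier half]; `swapped` reverses it.
std::vector<float> glu_row(GluOp op, const std::vector<float> & row,
                           bool swapped = false) {
    const std::size_t n = row.size() / 2;
    const std::vector<float> lo(row.begin(), row.begin() + n);
    const std::vector<float> hi(row.begin() + n, row.begin() + 2 * n);
    return swapped ? glu_split(op, hi, lo) : glu_split(op, lo, hi);
}
```

Under these assumptions, `glu_row(GluOp::REGLU, {1, -2, 3, 4})` applies ReLU to `{1, -2}` and multiplies by `{3, 4}`. Passing `swapped = true` instead activates `{3, 4}` and multiplies by `{1, -2}`.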
	 Sigbjørn Skjæret