leejet
0a1b3982cd
ggml: add ops for WAN video model (cuda && cpu) ( #15669 )
...
* add conv3d support
* add ggml_pad_ext for cpu & cuda backend
* cuda/cpu: add im2col_3d support
* cuda: make im2col a little faster
* fix cuda pad/scale/im2col3d
* make im2col_3d faster
* gguf: support loading tensors which n_dims > GGML_MAX_DIMS
* fix cuda get_rows
* avoid ggml_conv_3d conflict
* correct GGML_OP_COUNT assertion
* avoid build failure
* avoid build failure on MacOS
* cuda: remove unnecessary MIN define
* fix cpu im2col_3d
* adjust the code style
* cuda: use simpler loop in get_rows
* add test_im2col_3d to test-backend-ops
* test-backend-ops.cpp: remove trailing whitespace
* cpu: im2col_3d support non-contiguous src
Co-authored-by: Jeff Bolz <jbolz@nvidia.com >
* fix test_im2col_3d
* remove unused variables
* cuda: get_rows: dfloat2 -> float2
* add test_pad_ext to test-backend-ops.cpp
* add gguf_init_from_file_ext impl
* Revert "gguf: support loading tensors which n_dims > GGML_MAX_DIMS"
This reverts commit d8377a0a37 .
* Revert "add gguf_init_from_file_ext impl"
This reverts commit d9f1d13208 .
* update ggml_backend_vk_device_supports_op
* fix ggml_backend_vk_device_supports_op
* update other backend supports op for ggml_pad_ext
* metal/opencl/sycl/vulkan: fix GGML_OP_PAD check in supports_op
---------
Co-authored-by: Jeff Bolz <jbolz@nvidia.com >
2025-09-04 10:38:49 +02:00
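As a rough illustration of the new padding op added in this change, a minimal sketch in C, assuming ggml_pad_ext takes separate left/right padding amounts for each of the four dimensions (shapes are illustrative; graph building and compute are omitted):

```c
#include "ggml.h"

// Minimal sketch (assumed signature): pad a 4-D activation asymmetrically.
// ggml_pad() only pads on the high side of each dimension; ggml_pad_ext() is
// assumed here to take separate left/right padding per dimension.
static struct ggml_tensor * pad_example(struct ggml_context * ctx) {
    // illustrative [W, H, C, N] activation
    struct ggml_tensor * x = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 32, 32, 16, 1);

    return ggml_pad_ext(ctx, x,
                        /*lp0=*/1, /*rp0=*/1,   // dim 0 (W): pad 1 on both sides
                        /*lp1=*/1, /*rp1=*/1,   // dim 1 (H): pad 1 on both sides
                        /*lp2=*/0, /*rp2=*/0,   // dim 2 (C): untouched
                        /*lp3=*/0, /*rp3=*/0);  // dim 3 (N): untouched
}
```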
rmatif
820bc98531
opencl: add hs=40 to FA ( #15758 )
2025-09-03 23:30:28 -07:00
rmatif
97669e4073
opencl: add attn sinks support for FA kernels ( #15706 )
2025-09-01 23:26:53 -07:00
rmatif
86076f92de
OpenCL: add fused group_norm/norm, mul, add ( #15314 )
...
* add fused group_norm/norm, mul, add
* fix spacing
* revert rms_norm logic
* fix trailing whitespace
2025-08-26 23:36:05 -07:00
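For context, a sketch of the unfused ggml graph pattern this fusion targets: a (group_)norm followed by an elementwise scale (mul) and shift (add); the epsilon and group count below are illustrative:

```c
#include "ggml.h"

// Sketch: the norm -> mul -> add chain that the backend can now execute as a
// single fused kernel instead of three separate ones.
static struct ggml_tensor * norm_mul_add(struct ggml_context * ctx,
                                         struct ggml_tensor  * x,
                                         struct ggml_tensor  * w,   // scale
                                         struct ggml_tensor  * b) { // shift
    const float eps = 1e-6f;
    struct ggml_tensor * cur = ggml_group_norm(ctx, x, /*n_groups=*/32, eps);
    cur = ggml_mul(ctx, cur, w);
    cur = ggml_add(ctx, cur, b);
    return cur;
}
```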
lhez
f7207b0415
opencl: fix support ops condition for rms_norm ( #15560 )
2025-08-25 14:18:09 -07:00
lhez
fb22dd07a6
opencl: mark argsort unsupported if cols exceed workgroup limit ( #15375 )
2025-08-19 11:25:51 -07:00
rmatif
912ff8c119
OpenCL: add initial FA support ( #14987 )
...
* add F16/F16 fa support
* fix kernel init
* use mad instead of fma
* use inline function
* mark FA with sinks as unsupported for now
* add pragma unroll to loops
2025-08-16 01:05:55 -07:00
lhez
e2c1bfff53
opencl: add initial mxfp4 support via mv ( #15270 )
...
* opencl: add reference `mul_mv_mxfp4_f32`
* opencl: add reference `mul_mv_id` for mxfp4
* Q4_0 transpose fix for Adreno
---------
Co-authored-by: shawngu-quic <shawngu@qti.qualcomm.com >
2025-08-15 09:52:14 -07:00
rmatif
60a7658810
opencl: allow mixed f16/f32 add ( #15140 )
2025-08-12 02:42:41 -07:00
AN Long
cd6983d56d
ggml : fix field name when creating a new ggml_backend ( #14944 )
2025-08-08 14:37:22 +02:00
lhez
aaa3d07ae7
opencl: support sink in soft_max (attn sinks) ( #15152 )
2025-08-07 21:47:03 -07:00
rmatif
756cfea826
fix profiling crash ( #15072 )
2025-08-06 14:17:51 -07:00
lhez
e725a1a982
opencl: add swiglu_oai and add_id ( #15121 )
...
* opencl: add `swiglu-oai`
* opencl: add `add_id`
* opencl: add missing `add_id.cl`
2025-08-06 12:12:17 -07:00
Georgi Gerganov
fd1234cb46
llama : add gpt-oss ( #15091 )
...
* oai moe
* compat with new checkpoint
* add attn sink impl
* add rope scaling yarn
* logits match with latest transformers code
* wip chat template
* rm trailing space
* use ggml_scale_bias
* rm redundant is_swa_all
* convert interleaved gate_up
* graph : fix activation function to match reference (#7 )
* vocab : handle o200k_harmony special tokens
* ggml : add attention sinks support (#1 )
* llama : add attn sinks
* ggml : add attn sinks
* cuda : add attn sinks
* vulkan : add support for sinks in softmax
remove unnecessary return
* ggml : add fused swiglu_oai op (#11 )
* ggml : add fused swiglu_oai op
* Update ggml/src/ggml-cpu/ops.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* update CUDA impl
* cont : metal impl
* add vulkan impl
* test-backend-ops : more test cases, clean up
* llama : remove unfused impl
* remove extra lines
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
---------
Co-authored-by: slaren <slarengh@gmail.com >
* repack mxfp4 upon conversion
* clean up a bit
* enable thinking
* add quick hack to render only some special tokens
* fix bf16 conversion
* remove vocab hack
* webui ok
* support chat parsing for gpt-oss
* fix webui
* direct mapping mxfp4, FINALLY
* force using mxfp4
* properly use lazy tensor
* ggml : add mxfp4
ggml : use e8m0 conversion instead of powf
Co-authored-by: Diego Devesa <slarengh@gmail.com >
change kvalues_mxfp4 table to match e2m1 (#6 )
metal : remove quantization for now (not used)
cuda : fix disabled CUDA graphs due to ffn moe bias
vulkan : add support for mxfp4
cont : add cm2 dequant
* ggml : add ggml_add_id (#13 )
* ggml : add ggml_add_id
* add cuda impl
* llama : add weight support check for add_id
* perf opt
* add vulkan impl
* rename cuda files
* add metal impl
* allow in-place ggml_add_id
* llama : keep biases on CPU with --cpu-moe
* llama : fix compile error
ggml-ci
* cuda : add fallback for __nv_cvt_e8m0_to_bf16raw
ggml-ci
* cleanup
ggml-ci
* sycl : fix supports_op for MXFP4
ggml-ci
* fix Unknown reasoning format
* ggml-cpu : fix AVX build
ggml-ci
* fix hip build
ggml-ci
* cuda : add mxfp4 dequantization support for cuBLAS
ggml-ci
* ggml-cpu : fix mxfp4 fallback definitions for some architectures
ggml-ci
* cuda : fix version required for __nv_cvt_e8m0_to_bf16raw
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co >
Co-authored-by: slaren <slarengh@gmail.com >
2025-08-05 22:10:36 +03:00
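One of the new ops introduced here, ggml_add_id, adds a per-expert bias selected by the MoE routing ids; a rough sketch under assumed shapes and argument order (both are assumptions, not taken from the source):

```c
#include "ggml.h"

// Rough sketch (shapes and argument order are assumptions): add a bias row,
// chosen per token by the routing ids, to the expert outputs without first
// gathering the bias tensor.
static struct ggml_tensor * moe_bias(struct ggml_context * ctx,
                                     struct ggml_tensor  * cur,   // [n_embd, n_expert_used, n_tokens]
                                     struct ggml_tensor  * bias,  // [n_embd, n_expert]
                                     struct ggml_tensor  * ids) { // [n_expert_used, n_tokens], GGML_TYPE_I32
    return ggml_add_id(ctx, cur, bias, ids);
}
```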
lhez
5c0eb5ef54
opencl: fix adreno compiler detection logic ( #15029 )
2025-08-02 19:51:18 +02:00
lhez
1c872f71fb
opencl: add f16 for add, sub, mul, div ( #14984 )
2025-08-01 13:15:44 +02:00
lhez
6e6725459a
opencl: add mul_mat_f32_f32_l4_lm and mul_mat_f16_f32_l4_lm ( #14809 )
2025-07-30 14:56:55 -07:00
lhez
ce111d39d6
opencl: add fused rms_norm_mul ( #14841 )
...
* opencl: add fused `rms_norm` + `mul`
* opencl: improve workgroup size for `rms_norm_mul`
2025-07-25 17:12:13 +02:00
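The fusion covers the common weighted RMS-norm pattern; a minimal sketch of the two-node graph the backend detects:

```c
#include "ggml.h"

// Sketch: RMS norm followed by an elementwise multiply with the norm weight.
// These two nodes are what the fused rms_norm_mul kernel replaces.
static struct ggml_tensor * rms_norm_weighted(struct ggml_context * ctx,
                                              struct ggml_tensor  * x,
                                              struct ggml_tensor  * w) {
    const float eps = 1e-5f; // illustrative epsilon
    return ggml_mul(ctx, ggml_rms_norm(ctx, x, eps), w);
}
```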
lhez
8e6f8bc875
opencl: remove unreachable return ( #14806 )
2025-07-22 08:53:30 +02:00
Sigbjørn Skjæret
38d3af1b73
opencl: fix im2col when KW!=KH ( #14803 )
2025-07-21 13:55:10 -07:00
rmatif
6c9ee3b17e
opencl: add conv2d kernel ( #14403 )
...
* add conv2d kernel
* fix trailing whitespace
* whitespace fix
* handle f16 input and f16 kernel, more opt
* resolve conflicts
* use enqueue_ndrange_kernel
2025-07-21 10:03:19 -07:00
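The kernel backs ggml's standard 2-D convolution op; a small sketch with illustrative shapes (kernel tensor first, then data, per the usual ggml_conv_2d convention):

```c
#include "ggml.h"

// Sketch: 3x3 convolution of a [W, H, C_in, N] input with C_out filters.
// ggml_conv_2d(ctx, kernel, data, s0, s1, p0, p1, d0, d1)
static struct ggml_tensor * conv3x3(struct ggml_context * ctx) {
    struct ggml_tensor * k = ggml_new_tensor_4d(ctx, GGML_TYPE_F16, 3, 3, 16, 32);  // [KW, KH, C_in, C_out]
    struct ggml_tensor * x = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 64, 64, 16, 1); // [W, H, C_in, N]
    return ggml_conv_2d(ctx, k, x, /*s0=*/1, /*s1=*/1, /*p0=*/1, /*p1=*/1, /*d0=*/1, /*d1=*/1);
}
```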
Georgi Gerganov
05fec5bd29
ggml : add build-time message to remind about ggml_set_rows ( #14661 )
...
ggml-ci
2025-07-13 10:36:33 +03:00
rmatif
6bdda13981
opencl: add tiled mul_mat_f16_f32 ( #14535 )
...
* add tiled mul_mat_f16_f32
* fix trailing whitespace
* add insightful comments
2025-07-10 14:58:12 -07:00
lhez
0b8855775c
opencl: add set_rows for f16 and f32 ( #14547 )
...
* opencl: add `set_rows` for `f16` and `f32`
* opencl: better choose workgroup size for `set_rows`
2025-07-10 11:48:52 -07:00
Xuan-Son Nguyen
98bab638fb
ggml : add ggml_scale_bias ( #14417 )
...
* ggml : add ggml_scale_bias
* ggml_vec_mad1_f32
* add more simd
* add CUDA
* sycl
* vulkan
* cann (placeholder)
* opencl
* will this fix cpu?
* fix cuda
* suggestions from coderabbit
* fix cann compile error
* vDSP_vsmsa
* rm __ARM_FEATURE_SVE
* use memcpy for op params
* make code looks more consistent
* use scalar for __ARM_FEATURE_SVE
* add x param to ggml_vec_mad1_f32
2025-07-09 18:16:12 +02:00
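A minimal sketch of the new op, assuming the argument order (tensor, scale, bias) so that it computes y = s*x + b in one node:

```c
#include "ggml.h"

// Sketch (argument order assumed): y = s*x + b as a single op, instead of a
// ggml_scale() followed by a separate add of the constant.
static struct ggml_tensor * scale_bias_example(struct ggml_context * ctx,
                                               struct ggml_tensor  * x) {
    return ggml_scale_bias(ctx, x, /*s=*/0.125f, /*b=*/1.0f);
}
```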
Sigbjørn Skjæret
6681688146
opencl: add GELU_ERF ( #14476 )
2025-07-04 23:24:56 -07:00
Sigbjørn Skjæret
28657a8229
ggml : implement GEGLU_ERF and GEGLU_QUICK ops ( #14445 )
2025-07-03 23:07:22 +02:00
lhez
bee28421be
opencl : broadcast for soft_max ( #14510 )
2025-07-03 20:22:24 +02:00
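The broadcast case is the masked softmax where the mask's batch dimensions broadcast over the input (e.g. one mask shared across all heads); a sketch of the call with illustrative shapes:

```c
#include "ggml.h"

// Sketch: scaled, masked softmax where a single mask broadcasts across heads.
// ggml_soft_max_ext(ctx, scores, mask, scale, max_bias)
static struct ggml_tensor * attn_softmax(struct ggml_context * ctx,
                                         struct ggml_tensor  * scores, // [n_kv, n_q, n_head, 1]
                                         struct ggml_tensor  * mask) { // [n_kv, n_q, 1, 1]
    const float scale = 0.125f; // e.g. 1/sqrt(head_dim), illustrative
    return ggml_soft_max_ext(ctx, scores, mask, scale, /*max_bias=*/0.0f);
}
```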
Georgi Gerganov
a70c8a0c4b
kv-cache : use ggml_set_rows ( #14285 )
...
* kv-cache : use ggml_set_rows
ggml-ci
* graph : separate k and v indices
ggml-ci
* cont : remove redundant ifs
ggml-ci
* kv-cache : improve find_slot impl
* kv-cache : bounds-check when accessing slot_info indices
* kv-cache : add comments
ggml-ci
* ggml : add TODOs for adding GGML_OP_SET_ROWS support in the backends
ggml-ci
2025-07-03 10:53:35 +03:00
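ggml_set_rows scatters rows of a source tensor into a destination at given row indices, which is what lets the KV cache write the current batch in one op; a sketch, assuming the destination-first argument order and an I64 index tensor:

```c
#include "ggml.h"

// Sketch (argument order and I64 index dtype assumed): write n_tokens rows
// from `src` into `dst` at the positions listed in `idx`, mirroring how the
// KV cache can now place the current batch without per-slot copies.
static struct ggml_tensor * write_rows(struct ggml_context * ctx,
                                       struct ggml_tensor  * dst,   // [n_embd, n_cache]
                                       struct ggml_tensor  * src,   // [n_embd, n_tokens]
                                       struct ggml_tensor  * idx) { // [n_tokens], GGML_TYPE_I64
    return ggml_set_rows(ctx, dst, src, idx);
}
```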
zhouwg
307e79d33d
opencl : fix possible buffer overflow in dump_tensor ( #14490 )
2025-07-02 14:38:10 +02:00
Eric Zhang
c8a4e470f6
opencl : skip empty nodes on cgraph compute ( #14491 )
2025-07-02 13:00:04 +02:00
lhez
603e43dc91
opencl : update upscale to support align corners ( #14488 )
2025-07-02 09:07:42 +02:00
lhez
79b33b2317
opencl : add GEGLU, REGLU, SWIGLU ( #14456 )
2025-07-01 09:19:16 +02:00
lhez
73e53dc834
opencl: ref count ggml_backend_opencl_context and refactor profiling ( #14254 )
...
* Move profiling info into `ggml_backend_opencl_context`
* Add `enqueue_ndrange_kernel` to launch kernel
2025-06-24 11:46:25 -07:00
lhez
4c763c8d1b
opencl: add mul_mv_id_q4_0_f32_8x_flat ( #14003 )
2025-06-10 16:55:58 -07:00
lhez
71e74a3ac9
opencl: add backend_synchronize ( #13939 )
...
* This is not needed for normal use, where the result is read
using `tensor_get`, but it allows the perf mode of `test-backend-ops`
to measure performance properly.
2025-06-02 16:54:58 -07:00
rmatif
bfb1e012a0
OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat ( #13840 )
...
* add concat, pad, repeat, tsembd, tanh, upscale
* small fixes
2025-06-02 16:53:36 -07:00
lhez
a3c30846e4
opencl: add new ops - argsort, div, sub, addrows, sigmoid, group_norm ( #13787 )
...
* opencl: add `argsort`
* opencl: add `div`
* opencl: add `add_rows`
* opencl: add `sub`
* opencl: add `sigmoid`, both `f16` and `f32`
* opencl: add `group_norm`
2025-05-27 12:56:08 -07:00
lhez
1701d4c54f
opencl: mark mul_mat f32f32 as supporting non-contiguous tensors ( #13790 )
2025-05-27 12:53:14 -07:00
Henry Linjamäki
a4e8912dfd
opencl: Add support for multiple devices ( #12622 )
...
* opencl: Add support for multiple devices
... but limited to one platform. A platform with a GPU will be preferred.
Additionally:
* Filter out devices that lack capabilities needed by the backend
implementation (half support, OpenCL 2.0+, etc).
* Make ggml_backend_opencl_reg() thread-safe.
* fixup: fix an error in sync_with_other_backends
... when there is only one OpenCL device available.
2025-05-21 16:21:45 -07:00
Henry Linjamäki
edbf42edfd
opencl: fix a couple of crashes ( #12795 )
...
* opencl: fix a couple of crashes
* fix kernel launches that failed on devices which do not support
non-uniform work-groups. When non-uniform work-groups are not
supported, set `local_work_size` to NULL (i.e. let the driver choose the
work-group sizes). This patch does not cover everything - just the
cases tested by test-backend-ops.
* fix sub-buffer creation that failed due to `cl_buffer_region::origin` not
being aligned to `CL_DEVICE_MEM_BASE_ADDR_ALIGN`.
* OpenCL: query non-uniform WG sizes only on OpenCL 3.0+
2025-05-21 13:21:17 -07:00
lhez
f0d46ef157
opencl: remove unnecessary assert for add ( #13257 )
2025-05-12 13:13:49 -07:00
kimminsu
12b17501e6
opencl: fix incorrect local_size index in profiling log ( #12868 )
2025-04-16 14:25:57 -07:00
lhez
80f19b4186
opencl: split ggml-opencl.cl into multiple files and cleanup ( #12886 )
...
* opencl: refactor - split the kernel files
---------
Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com >
* opencl: split more kernels into separate files
* opencl: specify subgroup size instead of querying it
* opencl: refine Adreno cl compiler version parsing
* opencl: skip some kernels not used by Adreno on old compilers
* opencl: refine logic for selecting Adreno kernels
* opencl: refine Adreno cl compiler version
* opencl: cleanup preprocessor for kernels
* opencl: consider Adreno CL compiler on Windows
* opencl: add final newline for `mul_mv_f16_f16.cl`
---------
Co-authored-by: Shangqing Gu <quic_shawngu@quicinc.com >
2025-04-15 12:26:00 -07:00
lhez
82974011f3
opencl: better identify Adreno GPU ( #12760 )
2025-04-07 13:22:54 -07:00
lhez
97a20c012b
opencl: use max_alloc_size in backend ctx instead of querying again ( #12705 )
2025-04-02 17:01:42 -07:00
Junil Kim
f423981ac8
opencl : fix memory allocation size ( #12649 )
...
issue:
https://github.com/CodeLinaro/llama.cpp/pull/17#issuecomment-2760611283
This patch ensures the memory allocation size
does not exceed the maximum allocation size of the OpenCL device.
2025-04-01 09:54:34 -07:00
lhez
5dec47dcd4
opencl: add multi and vision rope, gelu_quick and im2col ( #12600 )
...
* opencl: add `im2col`
* opencl: add `gelu_quick`
* opencl: add mrope
* opencl: add vision rope
2025-03-27 08:08:08 -07:00
lhez
2b65ae3029
opencl: simplify kernel embedding logic in cmakefile ( #12503 )
...
Co-authored-by: Max Krasnyansky <quic_maxk@quicinc.com >
2025-03-24 09:20:47 -07:00
lhez
d84635b1b0
opencl: improve profiling ( #12442 )
...
* opencl: more profiling timing
* opencl: generate trace for profiling
* opencl: reduce profiling overhead
* Populate profiling timing info at the end rather than after each
kernel run
* opencl: fix for chrome tracing
2025-03-18 12:54:55 -07:00