Aaron Teo
12e6b8b65d
Merge branch 'master' into feat/backend-zdnn
2025-07-31 02:00:01 +08:00
uvos
ad4a700117
HIP: enable mfma mmq on gfx908 and gfx90a for select datatypes and shapes (#14949)
2025-07-30 17:38:06 +02:00
Kai Pastor
e228de9449
cmake : Fix BLAS link interface (ggml/1316)
2025-07-30 17:33:11 +03:00
Kai Pastor
73a8e5ca03
vulkan : fix 32-bit builds (ggml/1313)
...
The pipeline member can be cast to VkPipeline.
This is a VkPipeline_T* on 64-bit builds but a uint64_t on 32-bit builds.
Cf. VK_DEFINE_NON_DISPATCHABLE_HANDLE documentation.
2025-07-30 17:33:11 +03:00
Johannes Gäßler
92b8810ec7
CUDA: skip masked KV slices for all FA kernels (#14924)
2025-07-30 15:46:13 +02:00
Aaron Teo
92a17ed9f3
ggml-zdnn: clean up project structure
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 17:36:38 +08:00
Aaron Teo
90d460c20b
ggml-zdnn: clean up matmul selection
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 17:34:15 +08:00
Aaron Teo
e67feafc65
ggml-zdnn: fix ztensor deallocation abort
...
stabilise ggml <-> zdnn api
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 17:27:49 +08:00
Aaron Teo
803dde3bbc
ggml-zdnn: code clean up
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 17:23:36 +08:00
Aaron Teo
70224e6cb7
ggml-zdnn: bring load ztensor back to init routine
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 17:21:04 +08:00
Aaron Teo
1eb7c35e3a
ggml-zdnn: code cleanup
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 16:57:14 +08:00
Aaron Teo
b7a77cf683
ggml-zdnn: add guards to prevent loading ztensor if transformed
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 16:15:20 +08:00
Aaron Teo
4d5edb2221
ggml-zdnn: fix erroneous output load tensor
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 16:11:07 +08:00
Aaron Teo
20d69b6cdf
ggml-zdnn: disable global load ztensor for now
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 16:05:58 +08:00
Aaron Teo
4fb6bee1f6
ggml-zdnn: attempt at using default nhwc format instead
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 16:04:19 +08:00
Aaron Teo
7b50d057dd
ggml-zdnn: attempt at manually changing the layout
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 15:33:13 +08:00
Aaron Teo
ad0cb30212
ggml-zdnn: disable logging and breakpoints for full test
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:52:13 +08:00
Aaron Teo
b4dffed954
ggml-zdnn: work on moving output ztensor as well
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:50:09 +08:00
Aaron Teo
fd766bdd44
ggml-zdnn: load ztensors in cgraph exec
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:40:36 +08:00
Aaron Teo
e30b1ffbde
ggml-zdnn: fix missing return from init_tensor
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:34:47 +08:00
Aaron Teo
4493b148d0
ggml-zdnn: disable op_none initialisation for testing
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:33:12 +08:00
Aaron Teo
213f1d2a3f
ggml-zdnn: add inputs logging
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:11:09 +08:00
Aaron Teo
e695e8577d
ggml-zdnn: add tensor to pre_tfm_desc logging
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:06:36 +08:00
uvos
aa79524c51
HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (#14945)
2025-07-29 20:23:04 +02:00
uvos
b77d11179d
HIP: add GGML_HIP_MMQ_MFMA option to allow disabling the MFMA path. (#14930)
...
This is useful for testing for regressions on GCN with CDNA hardware.
With GGML_HIP_MMQ_MFMA=Off and GGML_CUDA_FORCE_MMQ=On, we can conveniently test the GCN code path on CDNA. As CDNA is just GCN renamed with MFMA added and limited-use ACC registers, this provides a good alternative for regression testing when GCN hardware is not available.
2025-07-29 17:44:30 +02:00
uvos
c7aa1364fd
HIP: Ignore unsupported unroll transformation in fattn-vec (#14931)
...
LLVM with the amdgcn target does not support unrolling loops with conditional break statements when those statements cannot be resolved at compile time. As in other places in GGML, let's simply ignore this warning.
2025-07-29 17:43:43 +02:00
hipudding
204f2cf168
CANN: Add ggml_set_rows (#14943)
2025-07-29 22:36:43 +08:00
Sigbjørn Skjæret
138b288b59
cuda : add softcap fusion (#14907)
2025-07-29 14:22:03 +02:00
Aaron Teo
8dbca74fc7
ggml-zdnn: attempt to use unique ptr
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 17:03:58 +08:00
Aaron Teo
b1376ad051
ggml-zdnn: add weights logging to check
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 16:38:07 +08:00
Aaron Teo
b28b423801
ggml-zdnn: switch to using deque to fix pointer deref problem
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 15:55:33 +08:00
Aaron Teo
3446807452
ggml-zdnn: attempt at fixing invalid buffer
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 15:45:46 +08:00
Aaron Teo
2d45ee2536
ggml-zdnn: add init_tensor
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 15:36:42 +08:00
Aman Gupta
0a5036bee9
CUDA: add roll (#14919)
...
* CUDA: add roll
* Make everything const, use __restrict__
2025-07-29 14:45:18 +08:00
Aaron Teo
ab60ae6ca2
ggml-zdnn: add zdnn_init call for static libs
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 00:55:44 +08:00
Aaron Teo
0ae2d30302
ggml-zdnn: add nnpa installed detection
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 00:39:55 +08:00
Aaron Teo
a9438925f2
ggml-zdnn: add parmblkformat detections
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 00:36:55 +08:00
Aaron Teo
1c6ca76c2e
ggml-zdnn: remove free_buffer debug info
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 00:27:16 +08:00
Aaron Teo
1a0520a540
ggml-zdnn: add logging to debug free buffer
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 00:12:18 +08:00
Aaron Teo
2872276d8a
ggml-zdnn: fix invalid ztensor buffer release
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 00:09:00 +08:00
Aaron Teo
2cfa118fa9
ggml-zdnn: fix missing load tensor
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 23:42:24 +08:00
xctan
db16e2831c
ggml-cpu : deduplicate scalar implementations (#14897)
...
* remove redundant code in riscv
* remove redundant code in arm
* remove redundant code in loongarch
* remove redundant code in ppc
* remove redundant code in s390
* remove redundant code in wasm
* remove redundant code in x86
* remove fallback headers
* fix x86 ggml_vec_dot_q8_0_q8_0
2025-07-28 17:40:24 +02:00
Aaron Teo
fc9260deab
ggml-zdnn: attempt to fix sigsegv
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 23:37:50 +08:00
Aaron Teo
e0549c2925
ggml-zdnn: fix missing vector import in header
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 23:33:37 +08:00
Aaron Teo
f99b274cac
ggml-zdnn: fix missing vector import
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 23:30:48 +08:00
Aaron Teo
0905168388
ggml-zdnn: rewrite into mre
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 23:26:15 +08:00
Akarshan Biswas
cd1fce6d4f
SYCL: Add set_rows support for quantized types (#14883)
...
* SYCL: Add set_rows support for quantized types
This commit adds support for GGML_OP_SET_ROWS operation for various
quantized tensor types (Q8_0, Q5_1, Q5_0, Q4_1, Q4_0, IQ4_NL) and BF16
type in the SYCL backend.
The quantization/dequantization copy kernels were moved from cpy.cpp
to cpy.hpp to make them available for set_rows.cpp.
This addresses part of the TODOs mentioned in the code.
* Use get_global_linear_id() instead
ggml-ci
* Fix formatting
ggml-ci
* Use const for ne11 and size_t variables in set_rows_sycl_q
ggml-ci
* Increase block size for q kernel to 256
ggml-ci
* Cleanup imports
* Add float.h to cpy.hpp
2025-07-28 20:32:15 +05:30
Johannes Gäßler
946b1f6859
CUDA: fix pointer incrementation in FA (#14916)
2025-07-28 14:30:22 +02:00
Aaron Teo
03ec5d3ed3
ggml-zdnn: bring back working matmul
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 18:14:44 +08:00
Aaron Teo
4cc62cb693
ggml-zdnn: move bias data to local also
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 18:10:14 +08:00