Giuseppe Scrivano
1568d13c2c
vulkan: implement ABS and NEG ( #17245 )
...
* docs: update Vulkan ops
* vulkan: add NEG op
* vulkan: add ABS op
---------
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com >
2025-11-15 12:00:29 +01:00
Piotr Wilkin (ilintar)
389ac78b26
ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM ( #17063 )
...
* Add ops needed for new hybrid models: SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM
* Update ggml/include/ggml.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update tests/test-backend-ops.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Code review
* Whitespace
* Update tests/test-backend-ops.cpp
Co-authored-by: Diego Devesa <slarengh@gmail.com >
* This is actually sigmoid, duh.
* Add CONST, remove TRI_KEEP, other changes from review
* Update tests/test-backend-ops.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update ggml/src/ggml.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update ggml/src/ggml.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update ggml/src/ggml-cuda/unary.cu
Co-authored-by: Aman Gupta <amangupta052@gmail.com >
* Remove extra script
* Update ggml/src/ggml.c
Co-authored-by: Diego Devesa <slarengh@gmail.com >
* Update tests/test-backend-ops.cpp
Co-authored-by: Diego Devesa <slarengh@gmail.com >
* moving changes from laptop [no ci]
* pre-rebase
* Update tests/test-backend-ops.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update tests/test-backend-ops.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Refactor tests
* ggml : cleanup
* cont : fix ggml_fill srcs
* tests : add note
* ggml : add ggml_fill_inplace
* ggml : add asserts
* ggml : fix ggml_fill constant cast
* cont : ggml_tri minor
* Use TENSOR_LOCALS
* Fix regression from #14596 , regenerate
* Don't make commits at night...
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
Co-authored-by: Diego Devesa <slarengh@gmail.com >
Co-authored-by: Aman Gupta <amangupta052@gmail.com >
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
2025-11-13 20:54:47 +02:00
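Note on #17063: the log names the new ops but does not define them. The following is a minimal reference sketch of the usual mathematical meaning of SOFTPLUS, EXPM1, CUMSUM, TRI and SOLVE_TRI on plain row-major arrays. All function names here are hypothetical and are not the ggml API; which triangle TRI keeps, the tensor layout, and the numerical details of the real kernels are assumptions that the log does not confirm.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// softplus(x) = log(1 + exp(x)), split by sign so exp() never overflows.
inline float softplus_ref(float x) {
    return x > 0.0f ? x + std::log1p(std::exp(-x)) : std::log1p(std::exp(x));
}

// expm1(x) = exp(x) - 1, computed accurately near zero by the standard library.
inline float expm1_ref(float x) {
    return std::expm1(x);
}

// Inclusive cumulative sum along one row.
inline std::vector<float> cumsum_ref(const std::vector<float>& row) {
    std::vector<float> out(row.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < row.size(); ++i) {
        acc += row[i];
        out[i] = acc;
    }
    return out;
}

// Keep the lower triangle (diagonal included) of an n x n matrix, zero the rest.
// Assumption: "lower, with diagonal" is only one of several possible TRI modes.
inline std::vector<float> tril_ref(const std::vector<float>& a, std::size_t n) {
    std::vector<float> out(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            out[i * n + j] = a[i * n + j];
        }
    }
    return out;
}

// Solve L * x = b by forward substitution, where L is an n x n lower-triangular
// matrix with a non-zero diagonal.
inline std::vector<float> solve_lower_ref(const std::vector<float>& L,
                                          const std::vector<float>& b,
                                          std::size_t n) {
    std::vector<float> x(n);
    for (std::size_t i = 0; i < n; ++i) {
        float s = b[i];
        for (std::size_t j = 0; j < i; ++j) {
            s -= L[i * n + j] * x[j];
        }
        x[i] = s / L[i * n + i];
    }
    return x;
}
```

For example, solving with L = [[2, 0], [1, 1]] and b = [4, 5] gives x = [2, 3], since 2*2 = 4 and 1*2 + 1*3 = 5.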
Neo Zhang Jianyu
07751f8d44
update SYCL support OPs ( #17208 )
...
Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com >
2025-11-13 08:42:23 +08:00
Raul Torres
23a46ce972
CANN: GGML_CANN_ACL_GRAPH works only when USE_ACL_GRAPH is enabled ( #16861 )
...
The documentation should state that `GGML_CANN_ACL_GRAPH` is only effective if `USE_ACL_GRAPH` was enabled at compilation time.
2025-11-12 14:37:52 +08:00
YehuditE
9d7c518d64
sycl: add CONCAT operator support ( #16047 )
...
* sycl: add CONCAT operator support
* cleanup: remove stray lines added by mistake
* fix: code format issues in concat.cpp and tests/test-backend-ops.cpp
* chore: fix editorconfig violations
* cleanup: drop unnecessary i16 type support
* docs: update sycl-csv and regenerate ops.md
* update docs/ops.md
* fix: adapt to upstream master changes after rebase
* fix: remove empty files
* fix: drop whitespace
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
2025-11-06 11:02:33 +01:00
Johannes Gäßler
22c8c3c6ad
docs: explain CUDA 11 compilation [no ci] ( #16824 )
2025-11-06 08:14:35 +01:00
mnehete32
9aa63374f2
CUDA: update ops.md ( #17005 )
2025-11-05 11:01:15 +08:00
lhez
5e90233bdb
opencl: update doc ( #17011 )
...
* opencl: update docs
* opencl: update docs
* opencl: fix link
* opencl: update doc
2025-11-04 16:02:36 -08:00
Aaron Teo
a864132ba5
devops: fix failing s390x docker build ( #16918 )
2025-11-02 08:48:46 +08:00
YaelLogic
338074c383
sycl: add RMS_NORM_BACK operation support ( #16808 )
...
* sycl: add RMS_NORM_BACK operation support
* sycl: rms_norm_back: add dual reduction paths (FP64 and FP32) and savepoint before further changes
* sycl: add RMS_NORM_BACK support
Implement RMS_NORM_BACK for the SYCL backend using FP32 compensated parallel reduction. Minimal docs updates (ops.md / SYCL.csv).
* revert: restore .gitignore and tools/run/CMakeLists.txt to upstream
* revert: restore tests/CMakeLists.txt to upstream
* sycl: optimize rms_norm_back
* fix: restore SYCL.csv to correct state with RMS_NORM_BACK support
* Update ggml/src/ggml-sycl/norm.cpp
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com >
* fix: remove trailing whitespace and add missing newline (EditorConfig)
---------
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com >
2025-10-29 14:14:39 +08:00
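Note on #16808: the message mentions an "FP32 compensated" reduction. As a rough illustration of what compensation means, here is a serial Kahan summation of squared values, the quantity an RMS norm accumulates. This is only a sketch of the general technique: the function name is hypothetical, and the actual SYCL kernel reduces in parallel with a scheme that this log does not show.

```cpp
#include <vector>

// Sum of squares in FP32 with Kahan compensation.
// `comp` carries the low-order bits lost by each addition into the next step.
inline float kahan_sum_sq(const std::vector<float>& x) {
    float sum  = 0.0f;
    float comp = 0.0f;
    for (float v : x) {
        float y = v * v - comp;  // apply the correction from the previous step
        float t = sum + y;       // low-order bits of y may be lost here
        comp = (t - sum) - y;    // recover what was lost (algebraically zero)
        sum = t;
    }
    return sum;
}
```

Aggressive floating-point optimization flags such as `-ffast-math` can simplify the compensation term away, so a routine like this is normally built without them.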
Johannes Gäßler
80d28f104c
HIP: fix AMDGPU_TARGETS, update documentation ( #16803 )
2025-10-27 21:39:49 +01:00
Max Krasnyansky
63d2fc46e1
Add experimental ggml-hexagon backend for the Hexagon NPU ( #16547 )
...
* model: add support for extra bufs for all devices
* hexagon: add experimental ggml-hexagon backend for the Hexagon NPU
This commit introduces a new experimental backend `ggml-hexagon` with support for the Hexagon NPU.
Highlights:
- Supports Hexagon versions: v73, v75, v79, and v81
- Targets Android devices based on Snapdragon SoCs: Gen3, 8-Elite, and 8-Elite Gen5
- Supports Q4_0, Q8_0, MXFP4, and FP32 data types
- Implements core LLM ops: MUL_MAT/MUL_MAT_ID, ADD/SUB/MUL/ADD_ID, RMS_NORM, ROPE, GLU/SWIGLU, SOFTMAX
**Note:** This backend is experimental and may exhibit instability or limited performance across supported devices.
It is intended for early testing and feedback from the llama.cpp/ggml developer and user community.
Co-Authored-By: Rajdeep Ganguly <rganguly@qti.qualcomm.com >
Co-Authored-By: Todor Boinovski <todorb@qti.qualcomm.com >
* hexagon: fix format checker errors
* hexagon: update readme and cmake presets
* ci: add android-ndk-build jobs that build plain ARM64 and Snapdragon versions
* hexagon: add simple graph optimizer for stacking MUL_MAT ops with the same input
* hexagon: move ADB helper scripts into scripts/snapdragon/adb
* hexagon: replace all f/printfs with GGML_LOG_...
* readme: add hexagon to the list of supported backends
* hexagon: stack matmuls with quantized inputs only
* hexagon: add TODO for fixing issues in hexagon_graph_optimize
* hexagon: update to hex-sdk 6.4.0 and add scripts for running on QDC
* scripts: fix lint errors
* scripts: update qdc pytest script to make linter happy
* hexagon: add reduce sum in fp32
* hexagon: reduce number of vector stores in matmul output
* hexagon: remove the need for vdelta in reduce-multiply-x8
* hexagon: consistent use of reduce_sum_fp32 for row_sums
* hexagon: some more matmul optimizations and comments
Optimize cases where tensor dims are not a multiple of 1024 (e.g. in Qwen models).
We already handled those cases, but at a higher overhead.
* hexagon: update cmake presets
* hexagon: add OPMASK support for run-bench.sh wrapper
* hexagon: update to use GGML_BACKEND_API
* hexagon: remove unused logic for setting tensor flags for the views
* hexagon: add asserts to set/get_tensor to make sure we handle complete tensors
Same asserts as the CPU backend.
* hexagon: use cpy_tensor slow path for non-host buffers
* hexagon: error checks in the buffer allocator
* cmake: move include(extProj) under ggml-hexagon
* hexagon: don't forget to delete the backend on free
* hexagon: set/get_tensor size assert apply only to quantized tensors
* hexagon: reintroduce HEX_VERBOSE wrapper for GGML_LOG_DEBUG for now
GGML_LOG_DEBUG is always enabled for test-backend-ops and the output gets in the way.
Ideally we need somewhat finer-grained log levels.
* docs: typos in hexagon developer docs (libggm-...)
* hexagon: overhaul error handling in the session/device allocation
This should handle all failure paths in the session allocation.
* hexagon: update cmake presets to enable fp16 vectors
* hexagon: remove unused time_usec function
* hexagon: don't forget to release buffer contexts
* hexagon: fixed indents in hvx-utils (missed clang-format auto-format failure)
* hexagon: remove custom can_repeat function and use ggml_can_repeat
---------
Co-authored-by: Rajdeep Ganguly <rganguly@qti.qualcomm.com >
Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com >
2025-10-22 13:47:09 -07:00
YehuditE
6de8ed7519
sycl : add PAD_REFLECT_D1 operator support ( #16145 )
...
* sycl: add PAD_REFLECT_D1 operator support
* docs(ops): regenerate docs/ops.md
* remove trailing whitespaces
* style: fix editorconfig issues — trim trailing spaces and normalize EOLs
* fix: move PAD_REFLECT_1D case outside of fall-through block
2025-10-21 00:21:12 +02:00
safranowith
2330de7b84
SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators ( #16613 )
...
* SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators
Clean up unrelated changes from previous commit
* Chore: remove empty lines and fix indentation
* Clean up: remove leftover blank lines and fix spacing
* chore: fix trailing whitespace and ensure final newline
* Cleanup: remove redundant declarations already defined in header
* Sync docs/ops.md with updated backend operation support
* docs: update ops.md after rebase
* docs: update ops.md - Vulkan supports SSM_CONV and SSM_SCAN
2025-10-20 11:08:32 +03:00
Giuseppe Scrivano
3d4e86bbeb
vulkan: Add State Space Model (SSM) Operations Support ( #16463 )
...
* vulkan: implement SSM scan operation
Add State Space Model scan operation to the Vulkan backend.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com >
* vulkan: implement SSM conv operation
Add State Space Model conv operation to the Vulkan backend.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com >
---------
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com >
2025-10-17 14:23:47 +02:00
safranowith
466c1911ab
cpu : add FLOOR, CEIL, ROUND and TRUNC unary operators ( #16083 )
...
* CPU: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators
- Added the operators to unary op enum
- Implemented API functions
- Implemented forward and unary-op logic in CPU backend
- Updated ggml_get_n_tasks
- Updated operators names array and static_assert
- Updated docs and enabled automatic tests
* docs: add documentation for ggml_trunc and ggml_trunc_inplace in ggml.h
* chore: remove trailing whitespace from ggml.h
* Remove unresolved merge markers
* Apply review suggestions: cleanup formatting, enum order and leftover artifacts
* Regenerate ops.md using create_ops_docs.py
2025-10-15 21:24:51 +02:00
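Note on #16083: the four unary operators correspond to familiar rounding functions. The sketch below shows their element-wise behaviour using the C++ standard library. The enum and function names are hypothetical rather than the ggml API, and it is an assumption that ROUND follows `std::round` semantics (halfway cases rounded away from zero).

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for the unary op selector.
enum class unary_op { FLOOR, CEIL, ROUND, TRUNC };

// Apply one rounding op to a single value.
inline float apply_unary(unary_op op, float x) {
    switch (op) {
        case unary_op::FLOOR: return std::floor(x);  // toward negative infinity
        case unary_op::CEIL:  return std::ceil(x);   // toward positive infinity
        case unary_op::ROUND: return std::round(x);  // nearest, ties away from zero
        case unary_op::TRUNC: return std::trunc(x);  // toward zero
    }
    return x;
}

// Apply the op to every element of a buffer.
inline std::vector<float> map_unary(unary_op op, const std::vector<float>& src) {
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = apply_unary(op, src[i]);
    }
    return dst;
}
```

The operators differ only on non-integer inputs: for -1.5, FLOOR gives -2, CEIL and TRUNC give -1, and ROUND gives -2.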
Neo Zhang Jianyu
c7be9febcb
[SYCL] fix UT fault cases: count-equal, argsort, pad OPs ( #16521 )
...
* fix/refactor OP argsort, pad
* fix count-equal op
* update SYCL OP list
* fix format issue
---------
Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com >
2025-10-12 21:53:35 +08:00
Neo Zhang Jianyu
2be72c2b12
SYCL: Update to oneAPI 2025.2 ( #16371 )
...
* update oneapi to 2025.2, use deep-learning-essentials to replace base-tool
* update to 2025.2, use deep-learning-essentials to replace base toolkit
* add missed dll
* add deep learning essentials
* add sycl-ls
---------
Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com >
2025-10-02 10:16:25 +03:00
alex-spacemit
b77e6c18e1
ggml: riscv: add riscv spacemit backend ( #15288 )
...
* ggml: add spacemit backend
Change-Id: I249bdc043485d815a9c351867137bc1e27cc2e23
* add new line at end of file
Change-Id: I889ed1c85fb45e62350ecde0c06f70450cadfbe2
* add riscv zba extension limit
Change-Id: I321eb200f859751727afe5cae13074dfce2bb0ce
* fixed for review comments, file renamed and format
Change-Id: Ia20b6ec24a36638e62e0fe07cf100916a7cce3ce
* fixed for code format, after clang-format
Change-Id: I5dc33a0412da3d3f2d77075d8939185d3009eca2
* use _Float16 instead of __fp16
Change-Id: I039fb02bb95270e641bc4442204e658735859d43
* add ci for riscv64-spacemit-ime-native
Change-Id: I711c1033061df1a289ea77891b2997599dfe8279
* update debian-13-riscv64-spacemit-ime-native ci label
Change-Id: Ifb2b891e2fca57b5da604fce2ac255f27731179a
* remove license comment for spacemit ime
Change-Id: If0dc3ca30a958631ccca0a28b62e0b825f9fb0c3
* upgrade binutils for gcc ime
Change-Id: Ibf2fa74c1064408974cb5b45f044d40987e5fb45
* add spacemit ime cross jobs
Change-Id: I80d74909941d41cb9cd09e51d8baf01c985cbfc6
* remove native compile for riscv64-spacemit-ime
Change-Id: I01920afafdc73fa7424014fd648d243f8ec9e25e
* ci : add caching for spacemit ime cross toolchain
Change-Id: Ic54a192019a2fd982bbd58225ce3bbc38f4053de
* ci: bug fixed for cache path and env
Change-Id: I28c42e10b6fff053bb6580926ca2353448cb042a
* Update .github/workflows/build-linux-cross.yml for cache path
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* bugfixed for build-linux-cross.yml, syntax error
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
---------
Co-authored-by: cailinxi <linxi.cai@spacemit.com >
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
2025-09-29 17:50:44 +03:00
R0CKSTAR
a86a580a66
musa: upgrade musa sdk to 4.3.0 ( #16240 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
2025-09-26 02:56:38 +02:00
Aaron Teo
264f1b5187
zdnn: refactor codebase + add docs ( #16178 )
...
* zdnn: initial matmul refactor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: rm static from funcs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: update ggml-zdnn.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: change header files to hpp
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: switch to common.hpp
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: move mulmat forward around
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: rm inline from utils
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: add zDNN docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
2025-09-23 14:53:05 +08:00
Aaron Teo
40be51152d
ggml-zdnn: fix #15414 , activate FP16 and BF16 acceleration and incorrect zTensor free ( #15839 )
2025-09-13 02:39:52 +08:00
hipudding
c0389dba43
CANN: Disable acl_graph for prefill stage ( #15933 )
...
Since the prefill length is not fixed, graphs constructed for the
prefill stage cannot be reused. For this reason, ACL graph
execution is disabled by default during prefill.
2025-09-11 15:59:37 +08:00
Chenguang Li
28b5f190ef
CANN: implement LRU cache for ACL graphs ( #15814 )
...
* CANN: implement LRU cache for ACL graphs in CANN backend
- Introduce ggml_cann_graph_lru_cache to store multiple ggml_cann_graph objects.
- Graphs are loaded on demand and evicted using LRU policy when capacity is exceeded.
- Updated push, move_to_front, and clear methods to manage cached graphs efficiently.
- Ensures reuse of graphs, reducing graph reconstruction overhead in CANN backend.
* fix typo
* The LRU cache capacity can be configured via an env variable
Signed-off-by: noemotiovon <757486878@qq.com >
* refactor acl graph
* refactor && fix review comments
Signed-off-by: noemotiovon <757486878@qq.com >
---------
Signed-off-by: noemotiovon <757486878@qq.com >
2025-09-10 15:29:12 +08:00
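Note on #15814: the message describes a cache with `push`, `move_to_front` and `clear` operations and LRU eviction once capacity is exceeded. A minimal sketch of that idea follows, assuming a list ordered from most to least recently used. `graph_stub`, `graph_lru_cache` and the `find` helper are hypothetical names for illustration; this is not the real `ggml_cann_graph_lru_cache`, whose fields and matching logic are not shown in this log.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a cached graph object, identified by a key.
struct graph_stub {
    std::string key;
};

class graph_lru_cache {
public:
    explicit graph_lru_cache(std::size_t capacity) : capacity_(capacity) {}

    // Insert as most recently used; evict from the back while over capacity.
    void push(std::unique_ptr<graph_stub> g) {
        items_.push_front(std::move(g));
        while (items_.size() > capacity_) {
            items_.pop_back();
        }
    }

    // Look up by key. A hit is moved to the front so it is evicted last.
    graph_stub* find(const std::string& key) {
        for (auto it = items_.begin(); it != items_.end(); ++it) {
            if ((*it)->key == key) {
                move_to_front(it);
                return items_.front().get();
            }
        }
        return nullptr;
    }

    void clear() { items_.clear(); }
    std::size_t size() const { return items_.size(); }

private:
    using list_t = std::list<std::unique_ptr<graph_stub>>;

    // splice relinks the node in O(1) without copying or destroying the graph.
    void move_to_front(list_t::iterator it) {
        items_.splice(items_.begin(), items_, it);
    }

    std::size_t capacity_;
    list_t items_;  // front = most recently used, back = next to evict
};
```

With capacity 2, pushing "a" and "b", looking up "a", and then pushing "c" evicts "b", because the lookup made "a" the more recently used entry.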
Aaron Teo
186415d595
ggml-cpu: drop support for nnpa intrinsics ( #15821 )
2025-09-06 11:27:28 +08:00
hipudding
5421f63ab0
CANN: Fix precision issue on 310I DUO multi-devices ( #15784 )
2025-09-04 15:12:30 +08:00
Chenguang Li
2f853687b3
CANN: Support eager execution mode under ACL graph compilation ( #15712 )
...
* [CANN] Support eager execution mode under ACL graph compilation
Add support for running operators in eager mode while ACL graph
compilation is enabled. This allows bypassing graph execution
and directly submitting ops, which is useful for debugging and
reducing graph build overhead in certain scenarios.
Signed-off-by: noemotiovon <757486878@qq.com >
* fix typo
Signed-off-by: noemotiovon <757486878@qq.com >
* rename to acl_graph_mode
Signed-off-by: noemotiovon <757486878@qq.com >
---------
Signed-off-by: noemotiovon <757486878@qq.com >
2025-09-02 14:07:48 +08:00
Diego Devesa
dd892555b0
Update build.md to remove MSVC arm64 notes ( #15684 )
...
Removed information about MSVC compiler limitations for arm64 builds.
2025-08-30 23:51:28 +08:00
ExtReMLapin
792b44f2ed
server : add documentation for parallel_tool_calls param ( #15647 )
...
Co-authored-by: Pierre F <no@p.e>
2025-08-29 20:25:40 +03:00
tc-mb
c4e9239064
model : support MiniCPM-V 4.5 ( #15575 )
2025-08-26 10:05:55 +02:00
Aaron Teo
ad5c975c2d
ggml-cpu: Support Q5_0 and Q5_1 on s390x ( #15486 )
...
* ggml-cpu: initial q5_0 impl for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: updated q5_0 code for better performance
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: use optimised hsum for better performance
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: introduce q5_1 simd + refactor q5_0
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: fix incorrect return type vec_hsum
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: q5_0 incomplete refactor + table_b2b_0 activation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: refactor q5_1
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: q5_1 update loop unroll to 4
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: update q5_0 unroll to 4
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: update build-s390x docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-cpu: update unused variables q5_0
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: update the last update date
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
2025-08-22 16:11:04 +08:00
Johannes Gäßler
7a6e91ad26
CUDA: replace GGML_CUDA_F16 with CUDA arch checks ( #15433 )
2025-08-20 16:58:49 +02:00
Aaron Teo
ff27f80a74
ggml: initial IBM zDNN backend ( #14975 )
...
* ggml-zdnn: initial backend impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: temp change z17 to arch15
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: fix build bugs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: tensor->extra logging check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: add layout name mapping, ztensor information
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: separate logging into its own line
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: add shape comparison
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: add ggml_tensor shape log
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
ggml-zdnn: fix incorrect shape logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add output buffer check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: run compute and store into tensor->extra
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add set_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add more loggers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: update set_tensor logging to check only for matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: last working matmul version
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add comments to prevent accidentally deleting lines
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: support op out_prod
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: update op out_prod to use tensor->extra
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: rewrite the backend implementation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: bugfix new impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix compiler warnings and bugfixes
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: test ztensor finding in init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: implement at least 1 op to test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: assign tensor->extra to buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add check for view tensors to prevent init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: rework init_tensor to create new buffers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: switch to std vector instead of array
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: switch buffers back and set to arbitrary number
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: impl init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: update supports_op matmul matrix
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix incorrect ztensor shape, reduce memory padding
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: impl matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix compiler error missing type
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix missing data transform call
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add bias init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: tighten memory usage, change string allocation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add bias ztensor and data free
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add bias data transform
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add more debug info for extra buffer transform
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add logger to check if mat mul ops go through set_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: activate bias transform in matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: move weights transform into mulmat
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add more safeguards in matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix sequencing of transforms
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: bugfix transform ztensor vs origtensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: figure out why sigtrap is happening
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix sigsegv
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: move everything back to local declaration
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: move bias data to local also
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: bring back working matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: rewrite into mre
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix missing vector import
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix missing vector import in header
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt to fix sigsegv
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix missing load tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix invalid ztensor buffer release
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add logging to debug free buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: remove free_buffer debug info
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add parmblkformat detections
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add nnpa installed detection
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add zdnn_init call for static libs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt at fixing invalid buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: switch to using deque to fix pointer deref problem
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add weights logging to check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt to use unique ptr
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add tensor to pre_tfm_desc logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add inputs logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: disable op_none initialisation for testing
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix missing return from init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: load ztensors in cgraph exec
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: work on moving output ztensor as well
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: disable logging and breakpoints for full test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt at manually changing the layout
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt at using default nhwc format instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: disable global load ztensor for now
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix erroneous output load tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: add guards to prevent loading ztensor if transformed
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: bring load ztensor back to init routine
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix ztensor deallocation abort
stabilise ggml <-> zdnn api
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: clean up matmul selection
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: clean up project structure
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: update documentation, prepare for upstream
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* chore: add codeowners
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: disable batched matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: attempt at fixing tensor views during matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: deny all view tensors directly
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix pr comments
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: update ops docs for zdnn
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: redo test-backend-ops for ops.md
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* ggml-zdnn: fix typo in build-s390x.md
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* codeowners: remove taronaeo for now
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* Revert "codeowners: remove taronaeo for now"
This reverts commit 411ea4ed78 .
* ggml-zdnn: remove unused ggml_zdnn macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
2025-08-15 21:11:22 +08:00
rainred
cf9e5648a7
mtmd : Fix MinicpmV model converter and clip to avoid hardcoded values. ( #14750 )
...
* Fix MinicpmV model converter and clip to avoid hardcoded values.
* Code update for pr/14750
* Remove unused field, update script path in docs.
* Add version 5 for fallback code.
---------
Co-authored-by: lzhang <zhanglei@modelbest.cn >
2025-08-11 16:12:12 +02:00
tc-mb
952a47f455
mtmd : support MiniCPM-V 4.0 ( #14983 )
...
* support minicpm-v 4
* add md
* support MiniCPM-o 4.0
* add default location
* temp rm MiniCPM-o 4.0
* fix code
* fix "minicpmv_projector" default path
2025-07-31 17:22:17 +02:00
hipudding
11490b3672
CANN: Improve loading efficiency after converting weights to NZ format. ( #14985 )
...
* CANN: Improve loading efficiency after converting weights to NZ format.
* CANN: fix typo
2025-07-31 19:47:20 +08:00
Xinpeng Dou
61550f8231
CANN: update ops docs ( #14935 )
...
* CANN:add ops docs
* CANN: update ops docs
2025-07-30 08:39:24 +08:00
lhez
8ad7b3e65b
opencl : add ops docs ( #14910 )
2025-07-28 18:50:17 +02:00
Xuan-Son Nguyen
00fa15fedc
mtmd : add support for Voxtral ( #14862 )
...
* mtmd : add support for Voxtral
* clean up
* fix python requirements
* add [BEGIN_AUDIO] token
* also support Devstral conversion
* add docs and tests
* fix regression for ultravox
* minor coding style improvement
* correct project activation fn
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
2025-07-28 15:01:48 +02:00
Georgi Gerganov
a5771c9eea
ops : update BLAS ( #14914 )
2025-07-28 10:01:03 +02:00
Georgi Gerganov
c35f9eaf09
ops : update Metal ( #14912 )
2025-07-28 08:22:56 +03:00
Ruben Ortlam
bf78f5439e
vulkan: add ops docs ( #14900 )
2025-07-27 15:33:08 +02:00
Akarshan Biswas
bbfc849274
SYCL: add ops doc ( #14901 )
2025-07-27 17:52:58 +05:30
Aman Gupta
446595b9b3
Docs: add instructions for adding backends ( #14889 )
2025-07-27 09:36:43 +08:00
Aaron Teo
c7f3169cd5
ggml-cpu : disable GGML_NNPA by default due to instability ( #14880 )
...
* docs: update s390x document for sentencepiece
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit e086c5e3a7 )
* docs: update huggingface links + reword
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 8410b085ea )
* ggml-cpu: disable ggml-nnpa compile flag by default
fixes #14877
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit 412f4c7c88 )
* docs: update s390x build docs to reflect nnpa disable
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
(cherry picked from commit c1eeae1d0c )
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
2025-07-25 19:09:03 +02:00
wooksong
e7fecba934
docs : update HOWTO‑add‑model.md for ModelBase and new model classes ( #14874 )
...
This patch updates the example in docs/development/HOWTO-add-model.md to
reflect recent changes after `TextModel` and `MmprojModel` were introduced.
It replaces the outdated `Model` base class with `TextModel` or `MmprojModel`
and updates the registration example accordingly.
Signed-off-by: Wook Song <wook16.song@samsung.com >
2025-07-25 16:25:05 +02:00
R0CKSTAR
3f4fc97f1d
musa: upgrade musa sdk to rc4.2.0 ( #14498 )
...
* musa: apply mublas API changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: update musa version to 4.2.0
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: restore MUSA graph settings in CMakeLists.txt
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: disable mudnnMemcpyAsync by default
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: switch back to non-mudnn images
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* minor changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: restore rc in docker image tag
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
2025-07-24 20:05:37 +01:00
Pouya
39cffdf188
docs: add libcurl-dev install hint for Linux distros ( #14801 )
...
* docs: add libcurl-dev install hint for Linux distros
Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com >
* Update docs/build.md
---------
Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com >
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
2025-07-24 11:26:44 +02:00
rspOverflow
b526ad2668
Documentation: Further revisions to the Vulkan section in build.md ( #14785 )
...
* Documentation: Revised and further improved the Vulkan instructions for Linux users in build.md.
* Minor: Revise step 2 of the Vulkan instructions for Linux users in build.md
2025-07-20 18:55:32 +02:00
rspOverflow
f0d4d176df
Documentation: Update build.md's Vulkan section ( #14736 )
...
* Documentation: Rewrote and updated the "Without docker" portion of the Vulkan backend build documentation.
* Documentation: Reorganize build.md's Vulkan section.
2025-07-19 12:18:36 +02:00