Ankur Verma
c7b7db0445
mtmd-cli: Avoid logging to stdout for model loading messages in mtmd-cli ( #17277 )
b7067
2025-11-15 12:41:16 +01:00
Giuseppe Scrivano
1568d13c2c
vulkan: implement ABS and NEG ( #17245 )
...
* docs: update Vulkan ops
* vulkan: add NEG op
* vulkan: add ABS op
---------
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com >
b7066
2025-11-15 12:00:29 +01:00
Jeff Bolz
439342ea0b
vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths ( #17244 )
...
* vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths
* set allow_misalign
b7065
2025-11-15 11:56:15 +01:00
Jeff Bolz
234ae7d7bd
vulkan: skip all-negative-inf blocks in FA ( #17186 )
b7064
2025-11-15 10:37:25 +01:00
Jeff Bolz
38eaf32af1
vulkan: change graph_compute to be async and enable get_tensor_async ( #17158 )
...
* vulkan: change graph_compute to be async and enable get_tensor_async
This allows some additional CPU/GPU overlap for large pp workloads. Also seems
to help a bit for token gen, maybe getting rid of a small bubble between
graph_compute and get_tensor.
Async set and copy functions seem to be very rarely used, so I didn't enable
them because I didn't have a good way to test them.
The async commands need to be ordered against each other, so put them all on
the compute queue. The non-async commands still use the transfer queue.
The fence for graph_compute/get_tensor_async is submitted and waited on in
ggml_vk_synchronize.
* fix thread safety errors
* teardown context cleanly
* Handle async read to non-pinned dst
b7063
2025-11-15 09:06:41 +01:00
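The ordering rule described in the #17158 message above (async commands all go on one queue, and results are only guaranteed once the synchronize step has waited on the fence) can be illustrated with a toy model. This is not Vulkan code and not the ggml implementation: `ToyQueue`, `submit`, `synchronize`, and `run_graph_then_read` are hypothetical names, and the "async" work is simply deferred until `synchronize` runs it in submission order.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a single in-order command queue (hypothetical, not a Vulkan API).
struct ToyQueue {
    std::vector<std::function<void()>> pending;

    // Record work and return immediately, like an async submission.
    void submit(std::function<void()> cmd) {
        pending.push_back(std::move(cmd));
    }

    // Stand-in for waiting on a fence: run everything recorded so far, in order.
    void synchronize() {
        for (auto & cmd : pending) {
            cmd();
        }
        pending.clear();
    }
};

// Queue a "compute" step and a "read back" step on the same queue.
int run_graph_then_read() {
    ToyQueue compute_queue;
    int device = 0;   // pretend device-side result
    int host   = -1;  // pretend host-side destination

    compute_queue.submit([&] { device = 42; });    // plays the role of graph_compute
    compute_queue.submit([&] { host = device; });  // plays the role of get_tensor_async

    // host is still -1 here: nothing is guaranteed before synchronize().
    compute_queue.synchronize();
    return host;
}
```

Because both steps share one queue, the read back cannot overtake the compute step, which is the reason the message gives for keeping all async commands on the compute queue.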
Xuan-Son Nguyen
9b17d74ab7
mtmd: add mtmd_log_set ( #17268 )
b7062
2025-11-14 15:56:19 +01:00
Bartowski
e1fcf8b09b
model : add AfmoeForCausalLM support ( #16477 )
...
* Add AFMOE model support
* Update to vocab
* Add model sizing
* Undo Rope change for ARCEE model
* Address review comments
* Update modeling code is_sliding -> use_rope, replace hard-coded logic
* Fix AFMOE tokenizer
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update AFMoE tokenizer class identification to be more unique
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
b7061
2025-11-14 13:54:10 +01:00
Marek Hradil jr.
6cd0cf72ce
fix : Dangling pointer for non-empty trigger words in lazy grammar construction ( #17048 )
...
* fix : Dangling pointer for non-empty trigger words in llama_sampler_init_grammar_impl (#17047)
* Replace 'static' workaround, with keeping variable in scope for longer
* Create std::array directly and pass into llama_grammar_init_impl
* Add back the trigger pattern
* Missed array include
b7060
2025-11-14 14:35:26 +02:00
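The class of bug named in #17048 above can be sketched as follows: a `const char *` taken from a `std::string` that lives in an inner scope dangles once that scope ends. The sketch keeps the owning string in the outer scope and passes a `std::array` of borrowed pointers, which is the shape of fix the bullet points describe. All names here (`consume_patterns`, `join_trigger_words`, `init_with_triggers`) are hypothetical stand-ins and not the llama.cpp functions, and the pattern building is simplified (no regex escaping).

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical consumer that borrows C strings for the duration of the call.
static std::size_t consume_patterns(const char * const * patterns, std::size_t n) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += std::strlen(patterns[i]);
    }
    return total;
}

// Join words into a single alternation such as "(foo|bar)".
static std::string join_trigger_words(const std::vector<std::string> & words) {
    std::string pattern = "(";
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i > 0) {
            pattern += "|";
        }
        pattern += words[i];
    }
    pattern += ")";
    return pattern;
}

std::size_t init_with_triggers(const std::vector<std::string> & words) {
    // Declared in the outer scope so its buffer outlives the call below.
    // Declaring it inside the `if` block would leave patterns[0] dangling.
    std::string pattern;
    std::array<const char *, 1> patterns = { nullptr };
    std::size_t n = 0;

    if (!words.empty()) {
        pattern     = join_trigger_words(words);
        patterns[0] = pattern.c_str();
        n           = 1;
    }
    return consume_patterns(patterns.data(), n);
}
```

Making the string `static` also keeps the buffer alive, but it shares state across calls; extending the variable's scope avoids that.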
Georgi Gerganov
d396b43748
server : fix "can batch with" bug ( #17263 )
b7059
2025-11-14 14:03:45 +02:00
Georgi Gerganov
45c6ef7307
metal : support argsort for ne00 > 1024 ( #17247 )
...
* metal : refactor argsort
* cont : sort chunks
* cont : merge sorted buckets
* cont : cleanup
b7058
2025-11-14 09:36:06 +02:00
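The "sort chunks, then merge sorted buckets" steps listed in #17247 above follow a general pattern that can be sketched on the CPU. This is only an illustration of that pattern under assumed details (ascending order, a caller-chosen chunk size), not the Metal kernel; `argsort_chunked` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Return the indices that sort x in ascending order, by sorting fixed-size
// chunks independently and then merging neighbouring sorted runs.
std::vector<int> argsort_chunked(const std::vector<float> & x, std::size_t chunk) {
    const std::size_t n = x.size();
    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);
    if (chunk == 0) {
        chunk = 1;  // guard against a zero chunk size
    }

    auto by_value = [&x](int a, int b) { return x[a] < x[b]; };

    // Step 1: sort each chunk on its own.
    for (std::size_t s = 0; s < n; s += chunk) {
        const std::size_t e = std::min(s + chunk, n);
        std::sort(idx.begin() + s, idx.begin() + e, by_value);
    }

    // Step 2: merge runs of width w into runs of width 2*w until one run remains.
    for (std::size_t w = chunk; w < n; w *= 2) {
        for (std::size_t s = 0; s + w < n; s += 2 * w) {
            const std::size_t e = std::min(s + 2 * w, n);
            std::inplace_merge(idx.begin() + s, idx.begin() + s + w, idx.begin() + e, by_value);
        }
    }
    return idx;
}
```

Splitting the work this way lets a row longer than the per-chunk limit still be sorted, since only the chunk sort is bounded by the chunk size.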
Georgi Gerganov
2606b0adab
metal : make the FA extra sizes consistent ( #17143 )
b7057
2025-11-14 09:13:34 +02:00
ixgbe
307772fcda
readme : add RVV,ZVFH,ZFH,ZICBOP support for RISC-V ( #17259 )
...
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn >
2025-11-14 09:12:56 +02:00
Aleksander Grygier
f1bad23f88
Better UX for handling multiple attachments in WebUI ( #17246 )
b7055
2025-11-14 01:19:08 +01:00
Alberto Cabrera Pérez
becc4816dd
ggml-cpu: handle 3d tensors in repack mat_mul ( #17241 )
...
* ggml-cpu: handle 3d tensors in repack mul_mat
* Removed unnecessary branch, removed need for <algorithm>
* Fixed dst_ptr pointer in chunk + clang_format
* GGML_ASSERT to check wdata within bounds
* Accidental ggml.h inclusion
* Improved GGML_ASSERT on wdata boundaries
* Address performance regression in Qwen and llama.cpp due to chunking
b7054
2025-11-13 12:53:00 -08:00
Xuan-Son Nguyen
c4abcb2457
server: fixing naming conflict res_error ( #17243 )
b7053
2025-11-13 20:53:47 +01:00
Piotr Wilkin (ilintar)
389ac78b26
ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM ( #17063 )
...
* Add ops needed for new hybrid models: SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM
* Update ggml/include/ggml.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update tests/test-backend-ops.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Code review
* Whitespace
* Update tests/test-backend-ops.cpp
Co-authored-by: Diego Devesa <slarengh@gmail.com >
* This is actually sigmoid, duh.
* Add CONST, remove TRI_KEEP, other changes from review
* Update tests/test-backend-ops.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update ggml/src/ggml.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update ggml/src/ggml.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update ggml/src/ggml-cuda/unary.cu
Co-authored-by: Aman Gupta <amangupta052@gmail.com >
* Remove extra script
* Update ggml/src/ggml.c
Co-authored-by: Diego Devesa <slarengh@gmail.com >
* Update tests/test-backend-ops.cpp
Co-authored-by: Diego Devesa <slarengh@gmail.com >
* moving changes from laptop [no ci]
* pre-rebase
* Update tests/test-backend-ops.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update tests/test-backend-ops.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Refactor tests
* ggml : cleanup
* cont : fix ggml_fill srcs
* tests : add note
* ggml : add ggml_fill_inplace
* ggml : add asserts
* ggml : fix ggml_fill constant cast
* cont : ggml_tri minor
* Use TENSOR_LOCALS
* Fix regression from #14596, regenerate
* Don't make commits at night...
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
Co-authored-by: Diego Devesa <slarengh@gmail.com >
Co-authored-by: Aman Gupta <amangupta052@gmail.com >
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
b7052
2025-11-13 20:54:47 +02:00
Ruben Ortlam
a19bd6f7ce
vulkan: remove shell call from vulkan-shaders-gen tool, revert file check ( #17219 )
...
* vulkan: remove shell call from vulkan-shaders-gen tool
* use string vector for command execution
* Fix condition
* use string, remove const_cast
* Fix dependency file quotation on Windows
---------
Co-authored-by: Jeff Bolz <jbolz@nvidia.com >
b7051
2025-11-13 14:51:21 +01:00
Diego Devesa
dd091e52f8
sched : fix reserve ignoring user tensor assignments ( #17232 )
b7050
2025-11-13 13:14:02 +01:00
ixgbe
1215dde7b0
ggml-cpu : add RISC-V vector intrinsic support for silu and cvar operations ( #17227 )
...
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn >
b7049
2025-11-13 13:13:32 +01:00
bagheera
0cfb19166b
metal: accelerated conv2d ( #17175 )
...
* metal: accelerated conv2d
* cont : cleanup
---------
Co-authored-by: bghira <bghira@users.github.com >
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
b7048
2025-11-13 13:32:44 +02:00
Georgi Gerganov
2776db6c81
Revert "ggml-cpu: handle 3d tensors in repack mat_mul ( #17030 )" ( #17233 )
...
This reverts commit 1c398dc9ec.
b7047
2025-11-13 12:59:37 +02:00
Diego Devesa
879dec341a
ggml-cpu : use template for argsort ( #17222 )
b7046
2025-11-13 10:59:05 +02:00
TecJesh
97d5117217
CANN: Add cross_entropy_loss op support ( #16886 )
...
* update L2_NORM op support
* update L2_NORM op support
* remove extra whitespace
* cann: update cross_entropy_loss op support
* remove trailing whitespaces
* rebase the latest code in the main repository and remove the l2_norm operator that already exists in another pull request.
* undo the l2_norm operator deletion
b7045
2025-11-13 09:39:51 +08:00
Aman Gupta
a90eb94ca9
CUDA: fuse rope + set_rows ( #16884 )
...
* CUDA: add fused rope
* move k forward_expand up
* create helper function instead of re-using params
* make assert statement more in line with comment
* rope_norm: coalesced writes to global mem
b7044
2025-11-13 08:50:01 +08:00
Neo Zhang Jianyu
07751f8d44
update SYCL support OPs ( #17208 )
...
Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com >
2025-11-13 08:42:23 +08:00
o7si
ffb6f3d921
vocab : correct bounds check for UGM XCDA array access ( #17215 )
b7042
2025-11-12 23:41:02 +01:00
Johannes Gäßler
5d6838b74f
CUDA: static assert to prevent misuse of memcpy_1 ( #17198 )
b7041
2025-11-12 23:13:55 +01:00
Mike Abbott
92bb442ad9
docker : preserve .so symlinks for docker container builds ( #17214 )
2025-11-12 20:33:55 +01:00
Georgi Gerganov
374fe09cdd
ggml : use std::sort in ggml_argsort CPU implementation ( #17211 )
...
* ggml : use std::sort in ggml_argsort CPU implementation
* cont : add missing header
b7039
2025-11-12 20:43:38 +02:00
Aleksander Grygier
8e878f0cb4
Update packages + upgrade Storybook to v10 ( #17201 )
...
* chore: Update packages + upgrade Storybook to v10
* fix: Increase timeout for UI tests
2025-11-12 19:01:48 +01:00
Xuan-Son Nguyen
00c94083b3
server: (refactor) implement generator-based API for task results ( #17174 )
...
* server: (refactor) implement generator-based API for task results
* improve
* moving some code
* fix "Response ended prematurely"
* add sink.done before return false
* rm redundant check
* rm unused var
* rename generator --> reader
b7037
2025-11-12 18:50:52 +01:00
Xuan-Son Nguyen
017eceed61
ci: add check vendor job ( #17179 )
...
* ci: add check vendor job
* use dev version of miniaudio
* move to dedicated workflow, only run on related files changed
2025-11-12 14:56:02 +01:00
Xuan-Son Nguyen
ee8dd5c658
server: move res_error/res_ok to static function ( #17167 )
b7035
2025-11-12 14:17:24 +01:00
Alberto Cabrera Pérez
1c398dc9ec
ggml-cpu: handle 3d tensors in repack mat_mul ( #17030 )
...
* ggml-cpu: handle 3d tensors in repack mul_mat
* Removed unnecessary branch, removed need for <algorithm>
* Fixed dst_ptr pointer in chunk + clang_format
* GGML_ASSERT to check wdata within bounds
* Accidental ggml.h inclusion
* Improved GGML_ASSERT on wdata boundaries
b7034
2025-11-12 14:52:19 +02:00
Adrien Gallouët
52cf111b31
cmake : cleanup ( #17199 )
b7033
2025-11-12 14:48:30 +02:00
Adrien Gallouët
78010a0d52
cmake : move OpenSSL linking to vendor/cpp-httplib ( #17177 )
...
* cmake : move OpenSSL linking to vendor/cpp-httplib
Signed-off-by: Adrien Gallouët <angt@huggingface.co >
* bring back httplib 0.27.0
* add -DLLAMA_HTTPLIB
* update cmake config for visionos
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co >
Co-authored-by: Xuan Son Nguyen <son@huggingface.co >
b7032
2025-11-12 12:32:50 +01:00
TecJesh
655cddd174
CANN: Add L2_NORM op support ( #16856 )
...
* update L2_NORM op support
* update L2_NORM op support
* remove extra whitespace
b7031
2025-11-12 15:11:42 +08:00
Neo Zhang Jianyu
5da7664960
[SYCL]fix ci crash about SSM_CONV ( #17169 )
...
* fix ci crash
* Update ggml-sycl.cpp
* Update ggml/src/ggml-sycl/ggml-sycl.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
---------
Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com >
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
b7030
2025-11-12 14:44:29 +08:00
Raul Torres
23a46ce972
CANN: GGML_CANN_ACL_GRAPH works only USE_ACL_GRAPH enabled ( #16861 )
...
The documentation should state that `GGML_CANN_ACL_GRAPH` is only effective if `USE_ACL_GRAPH` was enabled at compilation time.
2025-11-12 14:37:52 +08:00
Max Krasnyansky
c273d75375
hexagon: various Op fixes ( #17135 )
...
* hexagon: explicitly check for ops with zero nrows
llm_graph_context::build_inp_out_ids() can generate tensors with zero nrows.
Somehow other backends seem to handle this without obvious explicit checks.
In the hexagon case we need to check explicitly and skip them.
* hexagon: introduce fastdiv, fix test-backend-ops for ADD/SUB/MUL
Co-authored-by: chraac <chraac@gmail.com >
* hexagon: use fastdiv in ADD_ID
* hexagon: use ggml_op_is_empty and ggml_is_empty to check for NOPs
---------
Co-authored-by: chraac <chraac@gmail.com >
b7028
2025-11-11 15:25:04 -08:00
Eve
7d019cff74
disable rms norm mul rope for chips with no fp16 rte ( #17134 )
b7027
2025-11-11 12:53:30 -06:00
sudhiarm
3fe36c3238
ci: add Arm-hosted Graviton4 runner ( #17021 )
...
* ci: add Arm-hosted Graviton4 runner
* ci: add missing dependencies for graviton4 build
* ci: enable LFS checkout on graviton4
* ci: move git-lfs install to dependencies in Graviton4 workflow
2025-11-11 17:58:05 +02:00
Xuan-Son Nguyen
1d45b4228f
vendor: split httplib to cpp/h files ( #17150 )
...
* vendor: split httplib to cpp/h files
* move defines
* include httplib if curl is not used
* add TODO
* fix build ios
* fix build visionos instead
b7025
2025-11-11 13:32:58 +01:00
ixgbe
ca4844062b
ggml-cpu : add RISC-V RVV (Zvfh) optimization for FP16 to FP32 conversion ( #17161 )
...
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn >
b7024
2025-11-11 13:41:51 +02:00
duduta
73460f6278
ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 ( #16805 )
...
* extract rotate_pairs logic from ggml_compute_forward_rope_f32
* templateify ggml_compute_forward_rope_f32 and _f16
* abort when rope type not supported, remove GLM from test-rope
* add imrope branch to switch
* add rope tests for perf
* Update ggml/src/ggml-cpu/ops.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update ggml/src/ggml-cpu/ops.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
b7023
2025-11-11 13:33:24 +02:00
Charles Xu
8c583242ad
kleidiai: add optimized per-channel kernels for Q8_0 ( #16993 )
b7022
2025-11-11 13:20:31 +02:00
Mike Abbott
4a5b8aff40
cmake : add version to all shared object files ( #17091 )
...
Compiling llama.cpp in Yocto fails QA checks because the generated .so files are not versioned. This change applies a version to all generated .so files, allowing the package to build without errors.
b7021
2025-11-11 13:19:50 +02:00
Nicolas B. Pierron
d2d626938a
Install rpc-server when GGML_RPC is ON. ( #17149 )
b7020
2025-11-11 10:53:59 +00:00
levkropp
2fc392ce35
convert : register UMT5Model architecture for T5 conversion ( #17160 )
...
Register UMT5Model as a supported architecture variant for T5 model conversion.
This allows the conversion to work for models downloaded with AutoModel.
2025-11-11 09:38:30 +01:00
lhez
ece0f5c177
opencl: add fastdiv and use it in set_rows, ported from cuda ( #17090 )
...
* opencl: add fastdiv for mm q8_0
* opencl: use uint4 for fastdiv vals
* opencl: use fastdiv for set_rows
* opencl: do not use fastdiv for q8_0 mm
b7018
2025-11-10 15:00:13 -08:00
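The "fastdiv" named in #17090 and #17135 above refers to replacing integer division by a fixed divisor with a multiply and a shift. A minimal host-side sketch of that technique follows, using the well-known precomputed-multiplier scheme for 32-bit unsigned values. The names `FastDiv`, `make_fastdiv`, `fastdiv`, and `fastmodulo` are chosen for illustration and are not claimed to match the OpenCL, Hexagon, or CUDA code; the sketch also uses a 64-bit intermediate sum to sidestep overflow, which a device kernel may handle differently.

```cpp
#include <cstdint>

// Precomputed constants for dividing by a fixed divisor d (requires d >= 1).
struct FastDiv {
    uint32_t mp;  // "magic" multiplier
    uint32_t L;   // shift amount, ceil(log2(d))
};

// Compute the constants once per divisor (typically on the host).
FastDiv make_fastdiv(uint32_t d) {
    uint32_t L = 0;
    while (L < 32 && (uint64_t(1) << L) < d) {
        ++L;
    }
    const uint64_t mp = ((uint64_t(1) << 32) * ((uint64_t(1) << L) - d)) / d + 1;
    return { static_cast<uint32_t>(mp), L };
}

// n / d using a multiply-high, an add, and a shift instead of a divide.
uint32_t fastdiv(uint32_t n, FastDiv fd) {
    const uint64_t hi = (uint64_t(n) * fd.mp) >> 32;  // high 32 bits of n * mp
    return static_cast<uint32_t>((hi + n) >> fd.L);
}

// n % d derived from the fast quotient.
uint32_t fastmodulo(uint32_t n, uint32_t d, FastDiv fd) {
    return n - fastdiv(n, fd) * d;
}
```

The payoff comes when the same divisor (for example a tensor dimension) is reused for many index computations: the constants are computed once and each per-element division becomes a multiply and a shift.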