Georgi Gerganov
f01f9f6db4
Revert "memory : move the recurrent state into the memory context"
...
This reverts commit 00f115fe81.
2025-11-14 14:36:12 +02:00
Georgi Gerganov
d05215c59f
memory : move the recurrent state into the memory context
2025-11-14 14:36:12 +02:00
Georgi Gerganov
d1255e9b4d
graph : fix reuse check for recurrent inputs
2025-11-14 14:36:12 +02:00
Georgi Gerganov
3e898f119c
graph : reuse recurrent graphs
2025-11-14 14:36:12 +02:00
Georgi Gerganov
ede13c43cb
graph : reuse hybrid graphs
2025-11-14 14:36:12 +02:00
Marek Hradil jr.
6cd0cf72ce
fix : Dangling pointer for non-empty trigger words in lazy grammar construction ( #17048 )
...
* fix : Dangling pointer for non-empty trigger words in llama_sampler_init_grammar_impl (#17047 )
* Replace 'static' workaround, with keeping variable in scope for longer
* Create std::array directly and pass into llama_grammar_init_impl
* Add back the trigger pattern
* Missed array include
b7060
2025-11-14 14:35:26 +02:00
Georgi Gerganov
d396b43748
server : fix "can batch with" bug ( #17263 )
b7059
2025-11-14 14:03:45 +02:00
Georgi Gerganov
45c6ef7307
metal : support argsort for ne00 > 1024 ( #17247 )
...
* metal : refactor argsort
* cont : sort chunks
* cont : merge sorted buckets
* cont : cleanup
b7058
2025-11-14 09:36:06 +02:00
Georgi Gerganov
2606b0adab
metal : make the FA extra sizes consistent ( #17143 )
b7057
2025-11-14 09:13:34 +02:00
ixgbe
307772fcda
readme : add RVV,ZVFH,ZFH,ZICBOP support for RISC-V ( #17259 )
...
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn >
2025-11-14 09:12:56 +02:00
Aleksander Grygier
f1bad23f88
Better UX for handling multiple attachments in WebUI ( #17246 )
b7055
2025-11-14 01:19:08 +01:00
Alberto Cabrera Pérez
becc4816dd
ggml-cpu: handle 3d tensors in repack mat_mul ( #17241 )
...
* ggml-cpu: handle 3d tensors in repack mul_mat
* Removed unnecessary branch, removed need for <algorithm>
* Fixed dst_ptr pointer in chunk + clang_format
* GGML_ASSERT to check wdata within bounds
* Accidental ggml.h inclusion
* Improved GGML_ASSERT on wdata boundaries
* Address performance regression in Qwen and llama.cpp due to chunking
b7054
2025-11-13 12:53:00 -08:00
Xuan-Son Nguyen
c4abcb2457
server: fixing naming conflict res_error ( #17243 )
b7053
2025-11-13 20:53:47 +01:00
Piotr Wilkin (ilintar)
389ac78b26
ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM ( #17063 )
...
* Add ops needed for new hybrid models: SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM
* Update ggml/include/ggml.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update tests/test-backend-ops.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Code review
* Whitespace
* Update tests/test-backend-ops.cpp
Co-authored-by: Diego Devesa <slarengh@gmail.com >
* This is actually sigmoid, duh.
* Add CONST, remove TRI_KEEP, other changes from review
* Update tests/test-backend-ops.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update ggml/src/ggml.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update ggml/src/ggml.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update ggml/src/ggml-cuda/unary.cu
Co-authored-by: Aman Gupta <amangupta052@gmail.com >
* Remove extra script
* Update ggml/src/ggml.c
Co-authored-by: Diego Devesa <slarengh@gmail.com >
* Update tests/test-backend-ops.cpp
Co-authored-by: Diego Devesa <slarengh@gmail.com >
* moving changes from laptop [no ci]
* pre-rebase
* Update tests/test-backend-ops.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update tests/test-backend-ops.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Refactor tests
* ggml : cleanup
* cont : fix ggml_fill srcs
* tests : add note
* ggml : add ggml_fill_inplace
* ggml : add asserts
* ggml : fix ggml_fill constant cast
* cont : ggml_tri minor
* Use TENSOR_LOCALS
* Fix regression from #14596, regenerate
* Don't make commits at night...
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
Co-authored-by: Diego Devesa <slarengh@gmail.com >
Co-authored-by: Aman Gupta <amangupta052@gmail.com >
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
b7052
2025-11-13 20:54:47 +02:00
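For orientation, three of the op names in the entry above correspond to well-known element-wise or scan operations. The sketch below shows their conventional mathematical meaning only. The function names (`softplus_ref`, `expm1_ref`, `cumsum_ref`) are hypothetical, the inclusive form of the prefix sum is an assumption, and none of this is taken from the ggml implementation, which may differ in details. TRI and SOLVE_TRI are left out because the log does not describe their semantics.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// softplus(x) = log(1 + exp(x)).
// Split on the sign of x so that exp() never overflows for large positive x.
float softplus_ref(float x) {
    return x > 0.0f ? x + std::log1p(std::exp(-x))
                    : std::log1p(std::exp(x));
}

// expm1(x) = exp(x) - 1, computed so that precision is kept near x = 0.
float expm1_ref(float x) {
    return std::expm1(x);
}

// Inclusive prefix sum over one row: out[i] = v[0] + ... + v[i].
// (Assumption: the conventional inclusive definition.)
std::vector<float> cumsum_ref(const std::vector<float> & v) {
    std::vector<float> out(v.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) {
        acc += v[i];
        out[i] = acc;
    }
    return out;
}
```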
Ruben Ortlam
a19bd6f7ce
vulkan: remove shell call from vulkan-shaders-gen tool, revert file check ( #17219 )
...
* vulkan: remove shell call from vulkan-shaders-gen tool
* use string vector for command execution
* Fix condition
* use string, remove const_cast
* Fix dependency file quotation on Windows
---------
Co-authored-by: Jeff Bolz <jbolz@nvidia.com >
b7051
2025-11-13 14:51:21 +01:00
Diego Devesa
dd091e52f8
sched : fix reserve ignoring user tensor assignments ( #17232 )
b7050
2025-11-13 13:14:02 +01:00
ixgbe
1215dde7b0
ggml-cpu : add RISC-V vector intrinsic support for silu and cvar operations ( #17227 )
...
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn >
b7049
2025-11-13 13:13:32 +01:00
bagheera
0cfb19166b
metal: accelerated conv2d ( #17175 )
...
* metal: accelerated conv2d
* cont : cleanup
---------
Co-authored-by: bghira <bghira@users.github.com >
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
b7048
2025-11-13 13:32:44 +02:00
Georgi Gerganov
2776db6c81
Revert "ggml-cpu: handle 3d tensors in repack mat_mul ( #17030 )" ( #17233 )
...
This reverts commit 1c398dc9ec.
b7047
2025-11-13 12:59:37 +02:00
Diego Devesa
879dec341a
ggml-cpu : use template for argsort ( #17222 )
b7046
2025-11-13 10:59:05 +02:00
TecJesh
97d5117217
CANN: Add cross_entropy_loss op support ( #16886 )
...
* update L2_NORM op support
* update L2_NORM op support
* remove extra whitespace
* cann: update cross_entropy_loss op support
* remove trailing whitespaces
* rebase the latest code in the main repository and remove the l2_norm operator that already exists in another pull request.
* undo the l2_norm operator deletion
b7045
2025-11-13 09:39:51 +08:00
Aman Gupta
a90eb94ca9
CUDA: fuse rope + set_rows ( #16884 )
...
* CUDA: add fused rope
* move k forward_expand up
* create helper function instead of re-using params
* make assert statement more in line with comment
* rope_norm: coalesced writes to global mem
b7044
2025-11-13 08:50:01 +08:00
Neo Zhang Jianyu
07751f8d44
update SYCL support OPs ( #17208 )
...
Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com >
2025-11-13 08:42:23 +08:00
o7si
ffb6f3d921
vocab : correct bounds check for UGM XCDA array access ( #17215 )
b7042
2025-11-12 23:41:02 +01:00
Johannes Gäßler
5d6838b74f
CUDA: static assert to prevent misuse of memcpy_1 ( #17198 )
b7041
2025-11-12 23:13:55 +01:00
Mike Abbott
92bb442ad9
docker : preserve .so symlinks for docker container builds ( #17214 )
2025-11-12 20:33:55 +01:00
Georgi Gerganov
374fe09cdd
ggml : use std::sort in ggml_argsort CPU implementation ( #17211 )
...
* ggml : use std::sort in ggml_argsort CPU implementation
* cont : add missing header
b7039
2025-11-12 20:43:38 +02:00
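The general idea named in the entry above, computing an argsort with `std::sort`, can be sketched as sorting an index array with a comparator that looks up the underlying values. This is a minimal illustration, not the code from the commit. The helper name `argsort_row` and its signature are hypothetical, and it assumes the row contains no NaN values, since NaN would violate the strict weak ordering that `std::sort` requires.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not a ggml function): returns the indices that would
// sort `row` in ascending or descending order.
// Assumes `row` contains no NaN values.
std::vector<int32_t> argsort_row(const std::vector<float> & row, bool ascending) {
    std::vector<int32_t> idx(row.size());
    std::iota(idx.begin(), idx.end(), 0);  // idx = 0, 1, 2, ...

    if (ascending) {
        std::sort(idx.begin(), idx.end(),
                  [&row](int32_t a, int32_t b) { return row[a] < row[b]; });
    } else {
        std::sort(idx.begin(), idx.end(),
                  [&row](int32_t a, int32_t b) { return row[a] > row[b]; });
    }
    return idx;
}
```

Note that `std::sort` is not stable, so the relative order of indices for equal values is unspecified in this sketch.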
Aleksander Grygier
8e878f0cb4
Update packages + upgrade Storybook to v10 ( #17201 )
...
* chore: Update packages + upgrade Storybook to v10
* fix: Increase timeout for UI tests
2025-11-12 19:01:48 +01:00
Xuan-Son Nguyen
00c94083b3
server: (refactor) implement generator-based API for task results ( #17174 )
...
* server: (refactor) implement generator-based API for task results
* improve
* moving some code
* fix "Response ended prematurely"
* add sink.done before return false
* rm redundant check
* rm unused var
* rename generator --> reader
b7037
2025-11-12 18:50:52 +01:00
Xuan-Son Nguyen
017eceed61
ci: add check vendor job ( #17179 )
...
* ci: add check vendor job
* use dev version of miniaudio
* move to dedicated workflow, only run on related files changed
2025-11-12 14:56:02 +01:00
Xuan-Son Nguyen
ee8dd5c658
server: move res_error/res_ok to static function ( #17167 )
b7035
2025-11-12 14:17:24 +01:00
Alberto Cabrera Pérez
1c398dc9ec
ggml-cpu: handle 3d tensors in repack mat_mul ( #17030 )
...
* ggml-cpu: handle 3d tensors in repack mul_mat
* Removed unnecessary branch, removed need for <algorithm>
* Fixed dst_ptr pointer in chunk + clang_format
* GGML_ASSERT to check wdata within bounds
* Accidental ggml.h inclusion
* Improved GGML_ASSERT on wdata boundaries
b7034
2025-11-12 14:52:19 +02:00
Adrien Gallouët
52cf111b31
cmake : cleanup ( #17199 )
b7033
2025-11-12 14:48:30 +02:00
Adrien Gallouët
78010a0d52
cmake : move OpenSSL linking to vendor/cpp-httplib ( #17177 )
...
* cmake : move OpenSSL linking to vendor/cpp-httplib
Signed-off-by: Adrien Gallouët <angt@huggingface.co >
* bring back httplib 0.27.0
* add -DLLAMA_HTTPLIB
* update cmake config for visionos
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co >
Co-authored-by: Xuan Son Nguyen <son@huggingface.co >
b7032
2025-11-12 12:32:50 +01:00
TecJesh
655cddd174
CANN: Add L2_NORM op support ( #16856 )
...
* update L2_NORM op support
* update L2_NORM op support
* remove extra whitespace
b7031
2025-11-12 15:11:42 +08:00
Neo Zhang Jianyu
5da7664960
[SYCL]fix ci crash about SSM_CONV ( #17169 )
...
* fix ci crash
* Update ggml-sycl.cpp
* Update ggml/src/ggml-sycl/ggml-sycl.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
---------
Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com >
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
b7030
2025-11-12 14:44:29 +08:00
Raul Torres
23a46ce972
CANN: GGML_CANN_ACL_GRAPH works only USE_ACL_GRAPH enabled ( #16861 )
...
The documentation should state that `GGML_CANN_ACL_GRAPH` is only effective if `USE_ACL_GRAPH` was enabled at compilation time.
2025-11-12 14:37:52 +08:00
Max Krasnyansky
c273d75375
hexagon: various Op fixes ( #17135 )
...
* hexagon: explicitly check for ops with zero nrows
llm_graph_context::build_inp_out_ids() can generate tensors with zero nrows.
Somehow other backends seem to handle this without obvious explicit checks.
In the hexagon case we need to check explicitly and skip them.
* hexagon: introduce fastdiv, fix test-backend-ops for ADD/SUB/MUL
Co-authored-by: chraac <chraac@gmail.com >
* hexagon: use fastdiv in ADD_ID
* hexagon: use ggml_op_is_empty and ggml_is_empty to check for NOPs
---------
Co-authored-by: chraac <chraac@gmail.com >
b7028
2025-11-11 15:25:04 -08:00
Eve
7d019cff74
disable rms norm mul rope for chips with no fp16 rte ( #17134 )
b7027
2025-11-11 12:53:30 -06:00
sudhiarm
3fe36c3238
ci: add Arm-hosted Graviton4 runner ( #17021 )
...
* ci: add Arm-hosted Graviton4 runner
* ci: add missing dependencies for graviton4 build
* ci: enable LFS checkout on graviton4
* ci: move git-lfs install to dependencies in Graviton4 workflow
2025-11-11 17:58:05 +02:00
Xuan-Son Nguyen
1d45b4228f
vendor: split httplib to cpp/h files ( #17150 )
...
* vendor: split httplib to cpp/h files
* move defines
* include httplib if curl is not used
* add TODO
* fix build ios
* fix build visionos instead
b7025
2025-11-11 13:32:58 +01:00
ixgbe
ca4844062b
ggml-cpu : add RISC-V RVV (Zvfh) optimization for FP16 to FP32 conversion ( #17161 )
...
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn >
b7024
2025-11-11 13:41:51 +02:00
duduta
73460f6278
ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 ( #16805 )
...
* extract rotate_pairs logic from ggml_compute_forward_rope_f32
* templateify ggml_compute_forward_rope_f32 and _f16
* abort when rope type not supported, remove GLM from test-rope
* add imrope branch to switch
* add rope tests for perf
* Update ggml/src/ggml-cpu/ops.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update ggml/src/ggml-cpu/ops.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
b7023
2025-11-11 13:33:24 +02:00
Charles Xu
8c583242ad
kleidiai: add optimized per-channel kernels for Q8_0 ( #16993 )
b7022
2025-11-11 13:20:31 +02:00
Mike Abbott
4a5b8aff40
cmake : add version to all shared object files ( #17091 )
...
When compiling llama.cpp in Yocto, it fails QA checks because the generated so files aren't versioned. This applies a version to all generated so files, allowing the package to build without errors.
b7021
2025-11-11 13:19:50 +02:00
Nicolas B. Pierron
d2d626938a
Install rpc-server when GGML_RPC is ON. ( #17149 )
b7020
2025-11-11 10:53:59 +00:00
levkropp
2fc392ce35
convert : register UMT5Model architecture for T5 conversion ( #17160 )
...
Register UMT5Model as a supported architecture variant for T5 model conversion.
This allows the conversion to work for models downloaded with AutoModel.
2025-11-11 09:38:30 +01:00
lhez
ece0f5c177
opencl: add fastdiv and use it in set_rows, ported from cuda ( #17090 )
...
* opencl: add fastdiv for mm q8_0
* opencl: use uint4 for fastdiv vals
* opencl: use fastdiv for set_rows
* opencl: do not use fastdiv for q8_0 mm
b7018
2025-11-10 15:00:13 -08:00
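"fastdiv" in the entry above refers to replacing integer division by a value that stays fixed across many operations with a multiply and a shift, using constants precomputed once on the host. The sketch below shows that general technique for 32-bit unsigned values in plain C++. The names `fastdiv_magic`, `make_fastdiv` and `fastdiv_u32` are hypothetical, and the code is not taken from the OpenCL or CUDA kernels. It uses 64-bit intermediates so the result holds for every 32-bit `n` and every nonzero 32-bit `d`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the precomputed constants.
struct fastdiv_magic {
    uint32_t mp;     // magic multiplier
    uint32_t shift;  // L = ceil(log2(d))
};

// Precompute the constants for a fixed divisor d (done once, e.g. on the host).
fastdiv_magic make_fastdiv(uint32_t d) {
    assert(d != 0);
    uint32_t L = 0;
    while (L < 32 && (uint64_t(1) << L) < d) {
        ++L;  // smallest L with 2^L >= d
    }
    // mp = floor(2^32 * (2^L - d) / d) + 1; this always fits in 32 bits.
    const uint64_t mp = ((uint64_t(1) << 32) * ((uint64_t(1) << L) - d)) / d + 1;
    return { uint32_t(mp), L };
}

// Compute n / d using only a multiply, an add and a shift.
uint32_t fastdiv_u32(uint32_t n, fastdiv_magic m) {
    const uint64_t hi = (uint64_t(n) * m.mp) >> 32;  // high half of n * mp
    return uint32_t((hi + n) >> m.shift);            // 64-bit add avoids overflow
}
```

A GPU kernel would typically obtain `hi` from a multiply-high instruction rather than a 64-bit product; the 64-bit form is used here only to keep the sketch portable.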
Sigbjørn Skjæret
7bef684118
models : move build_inp_out_ids outside loop ( #17151 )
...
* move build_inp_out_ids outside loop
* realign
b7017
2025-11-10 22:55:30 +01:00
Max Krasnyansky
395e286bc9
cpu: skip NOPs to avoid barriers ( #17133 )
...
* cpu: skip NOPs to avoid barriers
* cpu: use ggml_op_is_empty
b7016
2025-11-10 12:44:49 -08:00