llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-15 11:17:31 +00:00

Author	SHA1	Message	Date
Georgi Gerganov	45c6ef7307	metal : support argsort for ne00 > 1024 (#17247 ) * metal : refactor argsort * cont : sort chunks * cont : merge sorted buckets * cont : cleanup b7058	2025-11-14 09:36:06 +02:00
Georgi Gerganov	2606b0adab	metal : make the FA extra sizes consistent (#17143 ) b7057	2025-11-14 09:13:34 +02:00
ixgbe	307772fcda	readme : add RVV,ZVFH,ZFH,ZICBOP support for RISC-V (#17259 ) Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>	2025-11-14 09:12:56 +02:00
Aleksander Grygier	f1bad23f88	Better UX for handling multiple attachments in WebUI (#17246 ) b7055	2025-11-14 01:19:08 +01:00
Alberto Cabrera Pérez	becc4816dd	ggml-cpu: handle 3d tensors in repack mat_mul (#17241 ) * ggml-cpu: handle 3d tensors in repack mul_mat * Removed unnecessary branch, removed need for <algorithm> * Fixed dst_ptr pointer in chunk + clang_format * GGML_ASSERT to check wdata within bounds * Accidental ggml.h inclusion * Improved GGML_ASSERT on wdata boundaries * Address performance regression in Qwen and llama.cpp due to chunking b7054	2025-11-13 12:53:00 -08:00
Xuan-Son Nguyen	c4abcb2457	server: fixing naming conflict res_error (#17243 ) b7053	2025-11-13 20:53:47 +01:00
Piotr Wilkin (ilintar)	389ac78b26	ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM (#17063 ) * Add ops needed for new hybrid models: SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM * Update ggml/include/ggml.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update tests/test-backend-ops.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Code review * Whitespace * Update tests/test-backend-ops.cpp Co-authored-by: Diego Devesa <slarengh@gmail.com> * This is actually sigmoid, duh. * Add CONST, remove TRI_KEEP, other changes from review * Update tests/test-backend-ops.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-cuda/unary.cu Co-authored-by: Aman Gupta <amangupta052@gmail.com> * Remove extra script * Update ggml/src/ggml.c Co-authored-by: Diego Devesa <slarengh@gmail.com> * Update tests/test-backend-ops.cpp Co-authored-by: Diego Devesa <slarengh@gmail.com> * moving changes from laptop [no ci] * pre-rebase * Update tests/test-backend-ops.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update tests/test-backend-ops.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Refactor tests * ggml : cleanup * cont : fix ggml_fill srcs * tests : add note * ggml : add ggml_fill_inplace * ggml : add asserts * ggml : fix ggml_fill constant cast * cont : ggml_tri minor * Use TENSOR_LOCALS * Fix regression from #14596, regenerate * Don't make commits at night... --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: Aman Gupta <amangupta052@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> b7052	2025-11-13 20:54:47 +02:00
Ruben Ortlam	a19bd6f7ce	vulkan: remove shell call from vulkan-shaders-gen tool, revert file check (#17219 ) * vulkan: remove shell call from vulkan-shaders-gen tool * use string vector for command execution * Fix condition * use string, remove const_cast * Fix dependency file quotation on Windows --------- Co-authored-by: Jeff Bolz <jbolz@nvidia.com> b7051	2025-11-13 14:51:21 +01:00
Diego Devesa	dd091e52f8	sched : fix reserve ignoring user tensor assignments (#17232 ) b7050	2025-11-13 13:14:02 +01:00
ixgbe	1215dde7b0	ggml-cpu : add RISC-V vector intrinsic support for silu and cvar operations (#17227 ) Signed-off-by: Wang Yang <yangwang@iscas.ac.cn> b7049	2025-11-13 13:13:32 +01:00
bagheera	0cfb19166b	metal: accelerated conv2d (#17175 ) * metal: accelerated conv2d * cont : cleanup --------- Co-authored-by: bghira <bghira@users.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> b7048	2025-11-13 13:32:44 +02:00
Georgi Gerganov	2776db6c81	Revert "ggml-cpu: handle 3d tensors in repack mat_mul (#17030 )" (#17233 ) This reverts commit `1c398dc9ec`. b7047	2025-11-13 12:59:37 +02:00
Diego Devesa	879dec341a	ggml-cpu : use template for argsort (#17222 ) b7046	2025-11-13 10:59:05 +02:00
TecJesh	97d5117217	CANN: Add cross_entropy_loss op support (#16886 ) * update L2_NORM op support * update L2_NORM op support * remove extra whitespace * cann: update cross_entropy_loss op support * remove trailing whitespaces * rebase the latest code in the main repository and remove the l2_norm operator that already exists in another pull request. * undo the l2_norm operator deletion b7045	2025-11-13 09:39:51 +08:00
Aman Gupta	a90eb94ca9	CUDA: fuse rope + set_rows (#16884 ) * CUDA: add fused rope * move k forward_expand up * create helper function instead of re-using params * make assert statement more in line with comment * rope_norm: coalesced writes to global mem b7044	2025-11-13 08:50:01 +08:00
Neo Zhang Jianyu	07751f8d44	update SYCL support OPs (#17208 ) Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com>	2025-11-13 08:42:23 +08:00
o7si	ffb6f3d921	vocab : correct bounds check for UGM XCDA array access (#17215 ) b7042	2025-11-12 23:41:02 +01:00
Johannes Gäßler	5d6838b74f	CUDA: static assert to prevent misuse of memcpy_1 (#17198 ) b7041	2025-11-12 23:13:55 +01:00
Mike Abbott	92bb442ad9	docker : preserve .so symlinks for docker container builds (#17214 )	2025-11-12 20:33:55 +01:00
Georgi Gerganov	374fe09cdd	ggml : use std::sort in ggml_argsort CPU implementation (#17211 ) * ggml : use std::sort in ggml_argsort CPU implementation * cont : add missing header b7039	2025-11-12 20:43:38 +02:00
Aleksander Grygier	8e878f0cb4	Update packages + upgrade Storybook to v10 (#17201 ) * chore: Update packages + upgrade Storybook to v10 * fix: Increase timeout for UI tests	2025-11-12 19:01:48 +01:00
Xuan-Son Nguyen	00c94083b3	server: (refactor) implement generator-based API for task results (#17174 ) * server: (refactor) implement generator-based API for task results * improve * moving some code * fix "Response ended prematurely" * add sink.done before return false * rm redundant check * rm unused var * rename generator --> reader b7037	2025-11-12 18:50:52 +01:00
Xuan-Son Nguyen	017eceed61	ci: add check vendor job (#17179 ) * ci: add check vendor job * use dev version of miniaudio * move to dedicated workflow, only run on related files changed	2025-11-12 14:56:02 +01:00
Xuan-Son Nguyen	ee8dd5c658	server: move res_error/res_ok to static function (#17167 ) b7035	2025-11-12 14:17:24 +01:00
Alberto Cabrera Pérez	1c398dc9ec	ggml-cpu: handle 3d tensors in repack mat_mul (#17030 ) * ggml-cpu: handle 3d tensors in repack mul_mat * Removed unnecessary branch, removed need for <algorithm> * Fixed dst_ptr pointer in chunk + clang_format * GGML_ASSERT to check wdata within bounds * Accidental ggml.h inclusion * Improved GGML_ASSERT on wdata boundaries b7034	2025-11-12 14:52:19 +02:00
Adrien Gallouët	52cf111b31	cmake : cleanup (#17199 ) b7033	2025-11-12 14:48:30 +02:00
Adrien Gallouët	78010a0d52	cmake : move OpenSSL linking to vendor/cpp-httplib (#17177 ) * cmake : move OpenSSL linking to vendor/cpp-httplib Signed-off-by: Adrien Gallouët <angt@huggingface.co> * bring back httplib 0.27.0 * add -DLLAMA_HTTPLIB * update cmake config for visionos --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co> Co-authored-by: Xuan Son Nguyen <son@huggingface.co> b7032	2025-11-12 12:32:50 +01:00
TecJesh	655cddd174	CANN: Add L2_NORM op support (#16856 ) * update L2_NORM op support * update L2_NORM op support * remove extra whitespace b7031	2025-11-12 15:11:42 +08:00
Neo Zhang Jianyu	5da7664960	[SYCL]fix ci crash about SSM_CONV (#17169 ) * fix ci crash * Update ggml-sycl.cpp * Update ggml/src/ggml-sycl/ggml-sycl.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> b7030	2025-11-12 14:44:29 +08:00
Raul Torres	23a46ce972	CANN: GGML_CANN_ACL_GRAPH works only USE_ACL_GRAPH enabled (#16861 ) The documentation should state that `GGML_CANN_ACL_GRAPH` is only effective if `USE_ACL_GRAPH` was enabled at compilation time.	2025-11-12 14:37:52 +08:00
Max Krasnyansky	c273d75375	hexagon: various Op fixes (#17135 ) * hexagon: explicitly check for ops with zero nrows llm_graph_context::build_inp_out_ids() can generate tensors with zero nrows. Somehow other backends seems to handle this without obvious explicit checks. In the hexagon case we need to check explicitly and skip them. * hexagon: introduce fastdiv, fix test-backend-ops for ADD/SUB/MUL Co-authored-by: chraac <chraac@gmail.com> * hexagon: use fastdiv in ADD_ID * hexagon: use ggml_op_is_empty and ggml_is_empty to check for NOPs --------- Co-authored-by: chraac <chraac@gmail.com> b7028	2025-11-11 15:25:04 -08:00
Eve	7d019cff74	disable rms norm mul rope for chips with no fp16 rte (#17134 ) b7027	2025-11-11 12:53:30 -06:00
sudhiarm	3fe36c3238	ci: add Arm-hosted Graviton4 runner (#17021 ) * ci: add Arm-hosted Graviton4 runner * ci: add missing dependencies for graviton4 build * ci: enable LFS checkout on graviton4 * ci: move git-lfs install to dependencies in Graviton4 workflow	2025-11-11 17:58:05 +02:00
Xuan-Son Nguyen	1d45b4228f	vendor: split httplib to cpp/h files (#17150 ) * vendor: split httplib to cpp/h files * move defines * include httplib if curl is not used * add TODO * fix build ios * fix build visionos instead b7025	2025-11-11 13:32:58 +01:00
ixgbe	ca4844062b	ggml-cpu : add RISC-V RVV (Zvfh) optimization for FP16 to FP32 conversion (#17161 ) Signed-off-by: Wang Yang <yangwang@iscas.ac.cn> b7024	2025-11-11 13:41:51 +02:00
duduta	73460f6278	ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (#16805 ) * extract rotate_pairs logic from ggml_compute_forward_rope_f32 * templateify ggml_compute_forward_rope_f32 and _f16 * abort when rope type not supported, remove GLM from test-rope * add imrope branch to switch * add rope tests for perf * Update ggml/src/ggml-cpu/ops.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-cpu/ops.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> b7023	2025-11-11 13:33:24 +02:00
Charles Xu	8c583242ad	kleidiai: add optimized per-channel kernels for Q8_0 (#16993 ) b7022	2025-11-11 13:20:31 +02:00
Mike Abbott	4a5b8aff40	cmake : add version to all shared object files (#17091 ) When compiling llama.cpp in Yocto, it fails QA checks because the generated so files aren't versioned. This applies a version to all generated so files, allowing the package to build without errors. b7021	2025-11-11 13:19:50 +02:00
Nicolas B. Pierron	d2d626938a	Install rpc-server when GGML_RPC is ON. (#17149 ) b7020	2025-11-11 10:53:59 +00:00
levkropp	2fc392ce35	convert : register UMT5Model architecture for T5 conversion (#17160 ) Register UMT5Model as a supported architecture variant for T5 model conversion. This allows the conversion to work for models downloaded with AutoModel.	2025-11-11 09:38:30 +01:00
lhez	ece0f5c177	opencl: add fastdiv and use it in set_rows, ported from cuda (#17090 ) * opencl: add fastdiv for mm q8_0 * opencl: use uint4 for fastdiv vals * opencl: use fastdiv for set_rows * opencl: do not use fastdiv for q8_0 mm b7018	2025-11-10 15:00:13 -08:00
Sigbjørn Skjæret	7bef684118	models : move build_inp_out_ids outside loop (#17151 ) * move build_inp_out_ids outside loop * realign b7017	2025-11-10 22:55:30 +01:00
Max Krasnyansky	395e286bc9	cpu: skip NOPs to avoid barriers (#17133 ) * cpu: skip NOPs to avoid barriers * cpu: use ggml_op_is_empty b7016	2025-11-10 12:44:49 -08:00
Georgi Gerganov	13730c183b	metal : cap threadgroups size of set_rows (#17146 ) b7015	2025-11-10 21:33:35 +02:00
Adrien Gallouët	967eb4b2bf	ggml-cpu : inspect -march and -mcpu to found the CPU (#16333 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co> b7014	2025-11-10 21:03:36 +02:00
Ruben Ortlam	f117be185e	vulkan: check glslc executable string (#17144 ) b7013	2025-11-10 16:59:26 +01:00
Ruben Ortlam	85234a4b3a	vulkan: fix validation issue introduced by #16868 (#17145 ) b7012	2025-11-10 16:59:10 +01:00
Gabe Goodhart	0c74f32632	memory: Hybrid context shift (#17009 ) * feat(memory): Only fail partial erasure of recurrent tail The recurrent state is always assumed to be the state as of the last update from the final token in the sequence. When doing a partial erasure, if the range does not include the final token, the erasure can be considered a success since any memory used for the sequence prior to the final token (which is no memory) has been successfully removed. There is one potential case that this doesn't address which is the pruning of cache to remove sensitive data from the context. This wouldn't work for attention cache partial removal (in the middle) either since the KV state is linearly-dependent and states in later sequence positions would still be based on the state from the sensitive data, even if that data is no longer cached, so I don't think this is relevant, but it is worth noting that the semantics of this change for a partial erasure in the middle of the cache are essentially "my context is already compressed" and not "all trace of the removed tokens has been removed." https://github.com/ggml-org/llama.cpp/issues/16768 Branch: HybridContextShift-16768 Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix(main): Check the output of seq_rm for prefix matching This prefix matching is explicitly attempting to remove the tokens at the end of the sequence that don't match. This is the operation that can't be performed on a recurrent cache due to the state being updated in place, so if this removal fails, we need to clear the whole cache. https://github.com/ggml-org/llama.cpp/issues/16768 Branch: HybridContextShift-16768 Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix(memory): Fix condition for partial erasure failure if p0 > pos Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: compilade <git@compilade.net> * style: Fix extra parens Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix(main.cpp): Set n_matching_session_tokens to 0 on cache clear https://github.com/ggml-org/llama.cpp/issues/16768 Branch: HybridContextShift-16768 Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: compilade <git@compilade.net> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> b7011	2025-11-10 17:14:23 +02:00
Georgi Gerganov	c27efd2bd1	metal : enable tensor API for A19 (#17087 ) b7010	2025-11-10 15:38:42 +02:00
fj-y-saito	df70bedda7	arm64: add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_… (#15277 ) * add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_q8_K * Surround SVE function with compiler directive * fix compile switch * fix coding style * ggml : fix indent --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> b7009	2025-11-10 15:12:59 +02:00


7058 Commits