llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-27 08:21:30 +00:00

Author	SHA1	Message	Date
Aleksander Grygier	aa9538a63a	webui: Remove running `llama-server` within WebUI `dev.sh` script (#16363 )	2025-10-01 08:40:26 +03:00
Bartowski	e74c92e842	model : support GLM 4.6 (make a few NextN/MTP tensors not required) (#16359 ) * Make a few GLM tensors not required layer.nextn.shared_head_head and layer.nextn.embed_tokens are both excluded from GLM 4.6 resulting in the model not loading after conversion/quantization, this marks those tensors as not required which makes it work * Update llama-model.cpp layer.nextn.shared_head_norm also not required in case of future models b6653	2025-09-30 22:24:36 +02:00
Sigbjørn Skjæret	b2ba81dbe0	ci : fix ccache key for ubuntu-cpu-cmake (#16355 ) * fix ccache key for ubuntu-cpu-cmake * set it for release as well [no ci]	2025-09-30 21:41:42 +02:00
Adrien Gallouët	bf6f3b3a19	common : disable progress bar without a tty (#16352 ) * common : disable progress bar without a tty Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Add missing headers Signed-off-by: Adrien Gallouët <angt@huggingface.co> --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co> b6651	2025-09-30 20:52:41 +03:00
lhez	7c156df414	opencl: support pad_ext (#15888 ) b6650	2025-09-30 10:45:45 -07:00
Pascal	16b0ca0d2e	Chatapi ignore empty sampling (#16330 ) * fix: skip empty sampling fields instead of coercing to 0 in chat API options * chore: update webui build output	2025-09-30 19:18:54 +02:00
Reese Levine	8d78cd2613	ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187 ) * Work on rope * Simplify inplace operation generation and combine mul/add generation * Work on rope variants * implement neox rope * rope complete * Add sub,div,glu operators * implement scale op * Update cpy shader to handle cont/more types * formatting * Update test vars printing for rope,rms_norm * Avoid ROPE hardcoded constants * Add TODO to change ROPE constants to enum Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix TODO comment --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> b6648	2025-09-30 09:57:51 -07:00
lhez	d1c84a662d	opencl: support ne3 in get_rows (#15866 ) b6647	2025-09-30 09:55:13 -07:00
Adrien Gallouët	364a7a6d4a	common : remove common_has_curl() (#16351 ) `test-arg-parser.cpp` has been updated to work consistently, regardless of whether CURL or SSL support is available, and now always points to `ggml.ai`. The previous timeout test has been removed, but it can be added back by providing a dedicated URL under `ggml.ai`. Signed-off-by: Adrien Gallouët <angt@huggingface.co> b6646	2025-09-30 17:39:44 +03:00
Sigbjørn Skjæret	2df5bcf357	ci : disable ccache for android (#16348 )	2025-09-30 15:38:01 +02:00
Georgi Gerganov	075c01567b	ggml : bump version to 0.9.4 (ggml/1363) b6644	2025-09-30 13:53:55 +03:00
anavp-nvidia	a014310374	cuda : Enable CUDA Graph usage for Nemotron Nano v2 (NemotronH) (#16328 ) * Fix Nemotron Nano v2 9B not executing as CUDA Graph on NVIDIA GPUs * fix to ensure test-backend-ops check passes b6643	2025-09-30 11:13:22 +03:00
Georgi Gerganov	35fb82497e	metal : dynamic simdgroups for MV kernels (#16340 ) * metal : dynamic simdgroups for MV kernels * cont : minor b6642	2025-09-30 11:03:23 +03:00
Adrien Gallouët	3c62aed89f	common : simplify etag tracking by removing json (#16342 ) The JSON parser is temporarily kept only for backward compatibility. It reads the etag from old .json files to prevent unnecessary re-downloads for existing users. This legacy code can be removed in a future version. Signed-off-by: Adrien Gallouët <angt@huggingface.co> b6641	2025-09-30 10:36:33 +03:00
Charles Xu	f1eb1cb1eb	kleidiai : fix work size and threads sync for fp16 (#16246 ) b6640	2025-09-30 10:07:20 +03:00
lhez	de41f2b7bf	codeowners: add codeowners for opencl backend (#16344 )	2025-09-30 08:30:16 +03:00
Jeff Bolz	a74a0d69f3	tests: override test_set_rows::max_nmse_err to allow for occasional rounding differences (#16295 ) * tests: override test_set_rows::max_nmse_err to allow for occasional rounding differences * apply similar error bounds to test_cpy b6638	2025-09-29 19:26:34 -05:00
Pascal	5f7e166cbf	Fix thinking blocks with quotes + add handling `[THINK]...[/THINK]` blocks (#16326 ) * fix: prevent reasoning blocks with quotes from being truncated * chore: update webui build output * feat: Improve thinking content parsing * test: Adds ChatMessage component stories for different thinking blocks * chore: update webui build output * fix: ChatMessage story fix --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2025-09-29 18:49:47 +02:00
Georgi Gerganov	d72f5f7ba2	ci : add AMD runners and workflows (#16249 ) * ci : add AMD runners and workflows * ci : move AMD jobs to separate workflow * cont : fix paths	2025-09-29 17:51:48 +03:00
alex-spacemit	b77e6c18e1	ggml: riscv: add riscv spacemit backend (#15288 ) * ggml: add spacemit backend Change-Id: I249bdc043485d815a9c351867137bc1e27cc2e23 * add new line at end of file Change-Id: I889ed1c85fb45e62350ecde0c06f70450cadfbe2 * add riscv zba extension limit Change-Id: I321eb200f859751727afe5cae13074dfce2bb0ce * fixed for review comments, file renamed and format Change-Id: Ia20b6ec24a36638e62e0fe07cf100916a7cce3ce * fixed for code format, after clang-format Change-Id: I5dc33a0412da3d3f2d77075d8939185d3009eca2 * use _Float16 instead of __fp16 Change-Id: I039fb02bb95270e641bc4442204e658735859d43 * add ci for riscv64-spacemit-ime-native Change-Id: I711c1033061df1a289ea77891b2997599dfe8279 * update debian-13-riscv64-spacemit-ime-native ci label Change-Id: Ifb2b891e2fca57b5da604fce2ac255f27731179a * remove license comment for spacemit ime Change-Id: If0dc3ca30a958631ccca0a28b62e0b825f9fb0c3 * upgrade binutils for gcc ime Change-Id: Ibf2fa74c1064408974cb5b45f044d40987e5fb45 * add spacemit ime cross jobs Change-Id: I80d74909941d41cb9cd09e51d8baf01c985cbfc6 * remove native compile for riscv64-spacemit-ime Change-Id: I01920afafdc73fa7424014fd648d243f8ec9e25e * ci : add caching for spacemit ime cross toolchain Change-Id: Ic54a192019a2fd982bbd58225ce3bbc38f4053de * ci: bug fixed for cache path and env Change-Id: I28c42e10b6fff053bb6580926ca2353448cb042a * Update .github/workflows/build-linux-cross.yml for cache path Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * bugfixed for build-linux-cross.yml, syntax error Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: cailinxi <linxi.cai@spacemit.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> b6635	2025-09-29 17:50:44 +03:00
Georgi Gerganov	2ddd3f2356	sync : ggml b6634	2025-09-29 17:43:58 +03:00
Georgi Gerganov	4d3d455d3c	sync : whisper.cpp (ggml/1359) * ggml : Fix MKL detection by quoting BLAS_INCLUDE_DIRS (whisper/3426) * sync : whisper.cpp	2025-09-29 17:43:58 +03:00
Daniel Bevenius	c9b1c06467	ggml : remove -dev suffix from release version (ggml/1355) This commit removes the `-dev` suffix from the version string in CMakeLists.txt and the release script. The version will now be just be formatted as `MAJOR.MINOR.PATCH`.	2025-09-29 17:43:58 +03:00
Daniel Bevenius	b6ae75afb4	ggml : bump version to 0.9.3 (ggml/1353)	2025-09-29 17:43:58 +03:00
Georgi Gerganov	b6dff20e2f	ggml : prepare for development of 0.9.2-dev	2025-09-29 17:43:58 +03:00
Georgi Gerganov	2db78c75e4	ggml : bump version to 0.9.1	2025-09-29 17:43:58 +03:00
Rafal Lewczuk	02463ab27b	ggml-backend : add root cause in error message if loading backend library fails (#16172 ) This PR adds additional information to an error message when loading backend library via ld_load_library() fails. This helps spotting why backend library did not load (missing library, missing dependency or unresolved symbol etc.). b6628	2025-09-29 13:17:09 +02:00
Sigbjørn Skjæret	adc76347d7	ggml : check cuda and metal argsort limits and add test (#16323 ) * check cuda argsort limits and add test * add metal check b6627	2025-09-29 11:09:00 +02:00
Aleksander Grygier	3a2bdcda0b	Improve Mobile UI for dialogs and action dropdowns (#16222 ) * fix: Always show conversation item actions * feat: Improve Alert Dialog and Dialog mobile UI * feat: Add settings reset to default confirmation * fix: Close Edit dialog on save * chore: update webui build output * webui: implement proper z-index system and scroll management - Add CSS variable for centralized z-index control - Fix dropdown positioning with Settings dialog conflicts - Prevent external scroll interference with proper event handling - Clean up hardcoded z-index values for maintainable architecture * webui: ensured the settings dialog enforces dynamic viewport height on mobile while retaining existing desktop sizing overrides * feat: Use `dvh` instead of computed px height for dialogs max height on mobile * chore: update webui build output * feat: Improve Settings fields UI * chore: update webui build output * chore: update webui build output --------- Co-authored-by: Pascal <admin@serveurperso.com>	2025-09-29 10:37:20 +02:00
Pascal	66bb7985c3	fix: preserved zero values in chat settings inputs and textareas by switching to nullish coalescing for field values and default placeholders (#16312 )	2025-09-29 09:08:41 +02:00
Vinkal	2f61c0f5bf	llama-cli: prevent spurious assistant token (#16202 ) * tools/main: llama-cli: prevent spurious assistant token (#13402) During prompt ingestion, prompt tokens are accepted into the sampler history (for repetition penalties). The conversation-mode path then appended `common_sampler_last(smpl)` to `assistant_ss` before any new token was sampled. At that point, "last" was a prompt-side token (e.g., an input prefix), so the assistant chat message began with an extra piece. Fix: append to `assistant_ss` only for a newly sampled (non-EOG) token. This affects only chat message assembly (`assistant_ss` / `chat_msgs` / `common_chat_format_single`); terminal stdout is unchanged. Sampling order/logits are unchanged. Fixes #13402. Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com> * Update tools/main/main.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * tools/main: remove outdated comment Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com> --------- Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> b6624	2025-09-29 10:03:12 +03:00
ddh0	3ffd0fae47	perplexity : show more kl-divergence data (#16321 ) Adds additional percentile data for displayed in the output of `llama-perplexity --kl-divergence`: - Added 95 percentile (mirroring existing 5 percentile) - Added 0.1 percentile (mirroring existing 99.9 percentile) b6623	2025-09-29 09:30:45 +03:00
Georgi Gerganov	a4a0aa5ea2	ggml : fix dependencies for ggml_set_rows (#16318 ) b6622	2025-09-29 08:41:28 +03:00
Jeff Bolz	92cd103f62	vulkan: Fix validation failure in quantized flash attention (#16292 ) b6621	2025-09-29 06:50:37 +02:00
Sigbjørn Skjæret	b887d2f341	ggml : fix GGML_F32_VEC_FMA argument order in ggml_vec_mad1_f32 (#16307 ) * fix GGML_F32_VEC_FMA argument order in ggml_vec_mad1_f32 * add test that fails on simd b6620	2025-09-28 23:15:03 +02:00
crat0z	bd0af02fc9	common : fix reasoning before forced tool call via tool_choice = required (#16264 ) * common : fix reasoning before forced tool call via tool_choice = required * common : improve reasoning and commentary handling when tool_choice is required (cherry picked from commit c746984956d6882c2de73d53ae2bb3bdf889e475) --------- Co-authored-by: Alde Rojas <hello@alde.dev> b6619	2025-09-28 21:13:50 +03:00
R0CKSTAR	d9e0e7c819	ci : fix musa docker build (#16306 ) Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>	2025-09-28 16:38:15 +02:00
Aaron Teo	0124ac989f	devops: switch to using ubuntu-22.04-s390x image (#16302 ) Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> b6617	2025-09-28 19:25:58 +08:00
Imad Saddik	2811c65286	Fixed a few typos in the README of the LLaMA.cpp HTTP Server [no ci] (#16297 )	2025-09-28 13:04:46 +02:00
Jeff Bolz	d8359f5fde	vulkan: 64-bit im2col (#16135 ) * vulkan: 64-bit im2col Add variants of the im2col shaders that use buffer_device_address/buffer_reference, and use 64-bit address calculations. This is needed for large convolutions used in stable-diffusion.cpp. * fix validation error for large im2col b6615	2025-09-28 08:38:37 +02:00
Georgi Gerganov	6a2c6145a0	metal : extend mat-mat multiplication support (#16225 ) * metal : support mul_mm with src1->type == GGML_TYPE_F16 * metal : support mul_mm_id with src1->type == GGML_TYPE_F16 [no ci] * metal : mul_mm support ne00 % 32 != 0 * metal : support mul_mm_id with ne00 % 32 != 0 * cont : remove unnecessary unrolls * cont : simplify data loading * metal : optimize mul_mm when output bounds checks are not needed	2025-09-28 09:34:44 +03:00
Georgi Gerganov	3b53634fe3	metal : fuse non-sequential nodes (#16102 ) * metal : fuse non-sequential nodes * cont : add comment * cont : simplify bounds checks b6613	2025-09-28 09:34:05 +03:00
Jeff Bolz	1384abf8b8	vulkan: handle mat_mul with A matrix > 4GB (#16176 ) * vulkan: handle mat_mul with A matrix > 4GB This change splits mat_mul operations with huge A matrix into chunks in the M dimension. This works well for stable-diffusion use cases where the im2col matrix has very large M. Fix the order of setting the stride in mul_mm_cm2 - setting the dimension clobbers the stride, so stride should be set after. * build fixes b6612	2025-09-27 20:36:34 -05:00
Jeff Bolz	e6d65fb02d	vulkan: support arbitrary KV dimension in flash attention (#16160 ) The "Clamp" spec constant is already based on whether KV is a multiple of Bc, so use that to control whether bounds checking is performed. Add bounds checking to the scalar and coopmat1 paths. Coopmat2 didn't need any changes (the K/V tensors are already optionally clamped, nothing else needed to be changed). b6611	2025-09-27 22:43:39 +02:00
Acly	8656f5de68	vulkan : make the vulkan.hpp dynamic dispatcher instance private (#16224 ) * don't use VULKAN_HPP_DEFAULT_DISPATCH_LOADER_DYNAMIC_STORAGE which can cause conflicts if application or other libraries do the same b6610	2025-09-27 22:41:03 +02:00
Aleksander Grygier	4807e8f96a	Show message actions by default (#16289 )	2025-09-27 19:56:40 +02:00
Aman Gupta	c0bfc57af4	CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 (#16277 ) * CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 This commit adds mul_mat_id support for ncols_dst >= 16. It does this by packing ncols_dst tiles into the blockDim.y. My tests on a RTX 3090 show that this is faster than the cuBLAS fallback for f16 till bs=64, and for f32 till bs=32 * Review: refactor if statement b6608	2025-09-27 18:49:32 +02:00
Johannes Gäßler	75a3a6c2cd	CUDA: refactor and deduplicate vector FA kernels (#16208 ) * CUDA: refactor and deduplicate vector FA kernels b6607	2025-09-27 18:45:07 +02:00
Dmytro Minochkin	0499b29c6f	vulkan: throw system error instead of SIGABRT during init on older devices (#16156 ) * Throw system error on old Vulkan driver rather than SIGABRT * Optionally handle any potential error in vulkan init b6606	2025-09-27 18:26:46 +02:00
Adrien Gallouët	234e2ff8ed	server : remove old LLAMA_SERVER_SSL (#16290 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co> b6605	2025-09-27 19:17:08 +03:00

1 2 3 4 5 ...

6704 Commits