Commit Graph

329 Commits

Author SHA1 Message Date
Georgi Gerganov
9dd7a0390f llama : add log about loading model tensors (#11699) 2025-02-06 13:41:37 +02:00
Georgi Gerganov
0f1c1cab2c Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-06 10:04:33 +02:00
Georgi Gerganov
e0d913fccb llama : clear whitespaces 2025-02-06 10:02:50 +02:00
Johannes Gäßler
fd08255d0d CUDA: non-contiguous (RMS) norm support (#11659)
* CUDA: non-contiguous (RMS) norm support

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-04 22:21:42 +01:00
Molly Sophia
1eca8916b5 llama : fix rwkv inference (#11618)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-02-03 14:17:50 +02:00
Olivier Chafik
90f9b88afb nit: more informative crash when grammar sampler fails (#11593) 2025-02-02 19:58:34 +00:00
Georgi Gerganov
74b0807245 Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-02 11:07:05 +02:00
Georgi Gerganov
3e23be7911 context : store graph build function callback
ggml-ci
2025-02-02 10:49:32 +02:00
piDack
0cec062a63 llama : add support for GLM-Edge and GLM-Edge-V series models (#10573)
* add glm edge chat model

* use config partial_rotary_factor as rope ratio

* support for glm edge model

* vision model support

* remove debug info

* fix format

* llava.cpp trailing whitespace

* remove unused AutoTokenizer

* Update src/llama.cpp to not contain <|end|> or </s>

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* add edge template

* fix chat template

* fix conflict

* fix conflict

* fix ci err

* fix format err

* fix template err

* 9b hf chat support

* format

* format clip.cpp

* fix format

* Apply suggestions from code review

* Apply suggestions from code review

* Update examples/llava/clip.cpp

* fix format

* minor : style

---------

Co-authored-by: liyuhang <yuhang.li@zhipuai.cn>
Co-authored-by: piDack <pcdack@hotmail.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: liyuhang <yuhang.li@aminer.cn>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-02 09:48:46 +02:00
Georgi Gerganov
5d3491e789 Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-01-31 15:11:11 +02:00
Olivier Chafik
8b576b6c55 Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-30 19:13:58 +00:00
Georgi Gerganov
a40ba49fa6 Merge branch 'master' into gg/llama-kv-cache 2025-01-30 16:39:58 +02:00
mgroeber9110
ffd0821c57 vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496) 2025-01-30 12:10:59 +02:00
Georgi Gerganov
c30e34cdba Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-01-29 15:01:26 +02:00
Georgi Gerganov
918885697e llama : resolve rwkv conflict
ggml-ci
2025-01-29 14:45:04 +02:00
Molly Sophia
325afb370a llama: fix missing k_cache store for rwkv6qwen2 (#11445)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-01-29 12:07:21 +08:00
lexasub
a5203b4465 llama : minor fixes to speed up llama model loading (#11448)
* impl::load: change the bpe_ranks map to an unordered map, reducing impl::load time by ~30%

* llama_model_loader::init_mapping: replace new llama_mmap with std::make_unique<llama_mmap> for cleaner code and to roughly halve the time spent in init_mappings

* Update src/llama-vocab.cpp

---------

Co-authored-by: lexasub <empty@empty.ru>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-27 14:42:09 +01:00
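For context on the first change in the commit above: BPE merge-rank lookups through a std::map cost O(log n) per query, while a std::unordered_map with a hash over the merge pair gives amortized O(1) lookups, which is where the load-time reduction comes from. The key type and hash below are an illustrative sketch, not the exact code in src/llama-vocab.cpp.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Illustrative hash for a (left, right) BPE merge pair; the actual key and
// hash types used by llama.cpp may differ.
struct pair_hash {
    size_t operator()(const std::pair<std::string, std::string> & p) const {
        const size_t h1 = std::hash<std::string>{}(p.first);
        const size_t h2 = std::hash<std::string>{}(p.second);
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2)); // boost-style combine
    }
};

// Before: ordered map, O(log n) lookups during BPE merging.
// std::map<std::pair<std::string, std::string>, int> bpe_ranks;

// After: hash map, amortized O(1) lookups - this is what cuts load time.
std::unordered_map<std::pair<std::string, std::string>, int, pair_hash> bpe_ranks;
```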
Georgi Gerganov
e665b57fa2 Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-01-27 14:09:22 +02:00
Johannes Gäßler
df984e0147 llama: refactor llama_decode_impl (#11381) 2025-01-27 12:07:12 +01:00
Georgi Gerganov
a0c500b4dc context : prepare for abstraction
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov
99422dfa3f context : introduce llama_batch_manager
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov
cb8f2095c6 wip 2025-01-26 20:16:22 +02:00
Georgi Gerganov
133ad6a723 context : initial need_reserve logic
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov
c75ba6851e context : move adapter code in the implementation [no ci] 2025-01-26 20:16:22 +02:00
Georgi Gerganov
f0713498fd context : add get_ctx_padding()
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov
b4ec1d4429 cont : move kv_self update to llama_context
ggml-ci
2025-01-26 20:16:21 +02:00
Georgi Gerganov
f2524c0e41 llama : remove references to llama_kv_cache (wip)
Intermediate step necessary to abstract the `llama_context` and
`llama_kv_cache`.

ggml-ci
2025-01-26 20:16:21 +02:00
Georgi Gerganov
ae274f9747 llama : fix names [no ci] 2025-01-26 20:16:21 +02:00
Georgi Gerganov
a19f671fe0 context : minor
ggml-ci
2025-01-26 20:16:21 +02:00
Georgi Gerganov
17b363afd3 llama : update llama_kv_self API
ggml-ci
2025-01-26 20:16:20 +02:00
Georgi Gerganov
fd05ab87aa kv_cache : move state read/write to llama_kv_cache
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov
4cd1b6fa4c context : prepare kv_cache_read/write to be moved to kv_cache
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov
73a14eccc9 kv_cache : minor 2025-01-26 20:14:36 +02:00
Georgi Gerganov
fef90cb3d7 kv_cache : fix
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov
4d7bd03e65 kv_cache : functions -> members
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov
f78b396ee7 llama : add struct llama_kv_cache (wip) [no ci] 2025-01-26 20:12:06 +02:00
Frank Mai
1d8ee06000 rpc: fix register position (#11424)
Signed-off-by: thxCode <thxcode0824@gmail.com>
2025-01-26 16:20:34 +01:00
Olivier Chafik
6171c9d258 Add Jinja template support (#11016)
* Copy minja from 58f0ca6dd7

* Add --jinja and --chat-template-file flags

* Add missing <optional> include

* Avoid print in get_hf_chat_template.py

* No designated initializers yet

* Try and work around msvc++ non-macro max resolution quirk

* Update test_chat_completion.py

* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template

* Refactor test-chat-template

* Test templates w/ minja

* Fix deprecation

* Add --jinja to llama-run

* Update common_chat_format_example to use minja template wrapper

* Test chat_template in e2e test

* Update utils.py

* Update test_chat_completion.py

* Update run.cpp

* Update arg.cpp

* Refactor common_chat_* functions to accept minja template + use_jinja option

* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE

* Revert LLAMA_CHATML_TEMPLATE refactor

* Normalize newlines in test-chat-templates for windows tests

* Forward decl minja::chat_template to avoid eager json dep

* Flush stdout in chat template before potential crash

* Fix copy elision warning

* Rm unused optional include

* Add missing optional include to server.cpp

* Disable jinja test that has a cryptic windows failure

* minja: fix vigogne (https://github.com/google/minja/pull/22)

* Apply suggestions from code review

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Finish suggested renamings

* Move chat_templates inside server_context + remove mutex

* Update --chat-template-file w/ recent change to --chat-template

* Refactor chat template validation

* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)

* Warn against missing eos / bos tokens when jinja template references them

* rename: common_chat_template[s]

* reinstate assert on chat_templates.template_default

* Update minja to b8437df626

* Update minja to https://github.com/google/minja/pull/25

* Update minja from https://github.com/google/minja/pull/27

* rm unused optional header

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-21 13:18:51 +00:00
Christopher Nielsen
90d987b105 mmap: add include for cerrno (#11296)
ggml-ci

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-20 16:02:43 +02:00
Xuan Son Nguyen
ec7f3ac9ab llama : add support for Deepseek-R1-Qwen distill model (#11310)
* llama : add support for Deepseek-R1-Qwen distill model

* coding style
2025-01-20 14:35:07 +01:00
Georgi Gerganov
ef6dada60c cont : fix whitespaces (#11305) 2025-01-20 09:29:32 +02:00
Kyle Bruene
ae3c1db2f9 llama : re-add LLM_ARCH_PHIMOE (#11305)
Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor.
2025-01-20 09:21:01 +02:00
Georgi Gerganov
4dd34ff831 cmake : add sanitizer flags for llama.cpp (#11279)
* cmake : add sanitizer flags for llama.cpp

ggml-ci

* tests : fix compile warnings

ggml-ci

* cmake : move sanitizer flags to llama_add_compile_flags

ggml-ci

* cmake : move llama.cpp compile flags to top level lists

ggml-ci

* cmake : apply only sanitizer flags at top level

ggml-ci

* tests : fix gguf context use in same_tensor_data

* gguf-test: tensor data comparison

* dummy : trigger ggml-ci

* unicode : silence gcc warnings

ggml-ci

* ci : use sanitizer builds only in Debug mode

ggml-ci

* cmake : add status messages [no ci]

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-01-18 16:18:15 +02:00
Radoslav Gerganov
667d72846c rpc : early register backend devices (#11262)
Early register RPC devices and do not propagate RPC specifics in the
llama model structures.

ref: #10609
2025-01-17 10:57:09 +02:00
Georgi Gerganov
a133566d34 vocab : fix double-eos check (#11273)
ggml-ci
2025-01-17 09:28:00 +02:00
Xuan Son Nguyen
681149ced2 llama : add llama_model_load_from_splits (#11255)
* llama : add `llama_model_load_from_splits`

* update
2025-01-16 13:54:08 +01:00
Johannes Gäßler
432df2d5f9 RoPE: fix back, CUDA support for back + noncont. (#11240)
* RoPE: fix back, CUDA support for back + noncont.

* fix comments reg. non-cont. RoPE support [no-ci]
2025-01-15 12:51:37 +01:00
Georgi Gerganov
bbf3e55e35 vocab : add dummy tokens for "no_vocab" type (#11231)
* vocab : add dummy tokens for "no_vocab" type

ggml-ci

* vocab : minor [no ci]
2025-01-14 11:54:58 +01:00
Daniel Bevenius
8f70fc3d1b llama : remove 'd' from bad special token log (#11212)
This commit removes the 'd' from the log message in llama-vocab.cpp
when logging a bad special token.

The motivation for this is that currently the output can look something
like the following:
```console
load: bad special token:
    'tokenizer.ggml.image_token_id' = 128256d, using default id -1
```
2025-01-13 13:38:20 +01:00
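As an aside on the fix above: a stray literal character after a printf-style conversion specifier is printed verbatim, which is how a token id such as 128256 ends up rendered as "128256d". A minimal sketch of the effect (the actual format string in llama-vocab.cpp may differ):

```cpp
#include <cstdio>

int main() {
    const unsigned int id = 128256;
    // Stray 'd' after the conversion specifier is treated as literal text:
    printf("'tokenizer.ggml.image_token_id' = %ud, using default id %d\n", id, -1);
    // prints: 'tokenizer.ggml.image_token_id' = 128256d, using default id -1
    printf("'tokenizer.ggml.image_token_id' = %u, using default id %d\n", id, -1);
    // prints: 'tokenizer.ggml.image_token_id' = 128256, using default id -1
    return 0;
}
```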
Xuan Son Nguyen
9a483999a6 llama : fix chat template gguf key (#11201) 2025-01-12 13:45:14 +01:00