llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-06 09:46:50 +00:00

Author	SHA1	Message	Date
younesbelkada	e96cc73390	clean ups	2025-07-07 15:13:06 +04:00
younesbelkada	a9f3a63dc1	injected mup	2025-07-07 15:00:25 +04:00
ibrahimkhadraoui	b3bc1fb237	Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased	2025-07-07 14:36:55 +04:00
ibrahimkhadraoui	286e1fa569	fix rope_theta	2025-07-07 14:36:51 +04:00
ibrahimkhadraoui	97011d7a1f	mup_vec create as float64	2025-07-07 14:25:32 +04:00
Younes B	6c39e775dd	fix conversion and d_inner	2025-07-07 10:56:49 +02:00
ibrahimkhadraoui	441d8d66bd	override modify_tensors instead of get_tensors	2025-07-07 12:00:57 +04:00
ibrahimkhadraoui	c4af0f3ca5	mamba_d_ssm added to d_inner find_hparam	2025-07-07 11:17:31 +04:00
ibrahimkhadraoui	c56ec07a9a	read arch from gguf.MODEL_ARCH	2025-07-07 10:34:46 +04:00
ibrahimkhadraoui	280dd2dcb7	falcon-h1 specific vocab resolved	2025-07-07 10:25:57 +04:00
ibrahimkhadraoui	7a25441e13	fixed multipliers	2025-07-04 17:41:03 +04:00
ibrahimkhadraoui	9760c8bc9d	conflict solve	2025-07-04 16:28:48 +04:00
ibrahimkhadraoui	2aa48dd853	Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased	2025-07-04 16:25:54 +04:00
ibrahimkhadraoui	3ee7983961	fix vocab size	2025-07-04 16:25:27 +04:00
younesbelkada	250b4f1074	mix instead of max	2025-07-04 15:53:47 +04:00
younesbelkada	1fd0574adc	try	2025-07-04 15:50:43 +04:00
ibrahimkhadraoui	2fe057cc40	Revert "fix" This reverts commit `243e4d1a50`.	2025-07-04 15:04:13 +04:00
younesbelkada	243e4d1a50	fix	2025-07-04 14:55:31 +04:00
younesbelkada	1415cd8782	another fix	2025-07-04 14:49:59 +04:00
younesbelkada	a39a8423f7	merge	2025-07-04 14:48:22 +04:00
younesbelkada	50eadc7b33	fixes	2025-07-04 14:47:31 +04:00
ibrahimkhadraoui	071f4b7fd8	changed precision for multipliers float 32->64	2025-07-04 14:37:02 +04:00
ibrahimkhadraoui	8bea92261e	python fixes	2025-07-04 14:32:11 +04:00
younesbelkada	14c37ec047	more cleaning on python code	2025-07-03 18:09:30 +04:00
Sigbjørn Skjæret	e75ba4c043	gguf-py : add support for chat template jinja files (#14508) * add support for chat template jinja files * remove gemma3n hack	2025-07-02 21:02:35 +02:00
compilade	5d46babdc2	llama : initial Mamba-2 support (#9126) * llama : initial Mamba-2 support * ggml : SIMD ggml_ssm_scan for Mamba-2 * ggml : improve ggml_mul speed when masking recurrent states * llama : support running Mamba-Codestral-7B-v0.1 * llama : fix Mamba-2 conv state saving * ggml : make the ggml_mul fast broadcast path more consistently formatted * llama : remove unused variable * llama : add missing break * convert_hf : prefer SentencePiece tokenizer for Mamba-2 when present The tokenizer.json of Mamba-Codestral-7B-v0.1 otherwise requires workarounds to work correctly. * llama : avoid redundant state copy for Mamba 1 and 2 * metal : attempt to adapt SSM_SCAN for Mamba-2 * metal : fix SSM_SCAN pipeline scope * metal : use log and exp instead of log1pf and expf in SSM_SCAN * metal : remove unused arguments for SSM_SCAN The max index is 31, so trimming the arguments is necessary. * metal : add back n_seqs to SSM_SCAN args Whoops, this is needed for the offset in the concatenated output. * metal : fix SSM_SCAN state head offset * metal : fix wrong number of tokens per sequence in SSM_SCAN * ggml : remove unused fast broadcast path in GGML_MUL This was initially added because states were masked with ggml_mul, but this is no longer done and so this "optimisation" is no longer necessary, or at least not worth the additional code complexity. * ggml : avoid multiply by D in GGML_OP_SSM_SCAN This makes the weight buft detection in src/llama.cpp simpler. * convert : transpose Mamba-2 A, D and reshape SSM_NORM This breaks existing conversions of Mamba-2 models to avoid some reshapes. Not sure if it's a good idea, but it makes the graph slightly cleaner. * llama : more appropriate SSM_SCAN and SSM_CONV buft support checks * convert : fix flake8 lint * metal : fix confusion between ; and , * metal : add missing args for nb references in ssm_scan_f32_group * metal : single-user mamba2 inference works * kv-cache : remove const_cast when setting inputs for s_copy And also fix multi-user inference for recurrent models by using cell_id instead of i as the kv cell index when populating s_copy. * convert : avoid AutoConfig for Mamba and Mamba2 hparams * kv-cache : allow context shift for recurrent models * graph : fix recurrent state copies when avoiding copies Works, but using lambda functions might not be that clean. * ggml : fix mamba2 ssm scan when compiled with SVE * ggml-cpu : reorder SVE FMA for consistency with other SIMD arches * cuda : implement ssm scan for Mamba2 There is still room for improvement, but it works! * cuda : adapt Mamba1 ssm scan to shape changes from Mamba2 * mamba : fix mismatched new and delete size for llm_build_mamba Subclasses of llm_graph_context cannot have extra fields, because the called destructor is not the one from the subclass. This otherwise would cause problems when running Mamba-(1|2) inference when compiled -DGGML_SANITIZE_ADDRESS=ON * cuda : graceful fallback for Mamba-1 models with weird embd size	2025-07-02 13:10:24 -04:00
Weizhao Ouyang	566c16fcce	model : add support for ERNIE 4.5 0.3B model (#14408) Add Day-0 support for Baidu ERNIE 4.5 0.3B model. Signed-off-by: Weizhao Ouyang <weizhao.ouyang@arm.com>	2025-06-28 16:08:21 +02:00
Sigbjørn Skjæret	f667f1e624	convert : fix broken sentencepiece vocab (#14416)	2025-06-27 10:42:19 +02:00
Xuan-Son Nguyen	8846aace49	model : gemma3n text-only (#14400) * gemma3n * add llm_graph_input_one	2025-06-26 20:34:02 +03:00
Daniel Han	b23fa0b3f4	convert : fix Llama 4 conversion (#14311)	2025-06-21 06:32:01 +02:00
Sigbjørn Skjæret	88fc854b4b	llama : improve sep token handling (#14272)	2025-06-20 14:04:09 +02:00
pqnet	5fc7856815	convert : fix remote option in Windows (#14100)	2025-06-19 12:21:40 +02:00
Sigbjørn Skjæret	3865cff4f5	convert : fix null head_dim AutoConfig regression (#14248)	2025-06-18 09:52:07 +02:00
Đinh Trọng Huy	ad590be98c	model : add NeoBERT (#14164) * convert neobert model to gguf * add inference graph * fix flake8 lint * followed reviewer suggestions Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * follow reviewers suggestions Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * override NeoBERT feed-forward length --------- Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-06-16 14:53:41 +02:00
Bartowski	d7da8dc83a	model : Add support for Arcee AI's upcoming AFM model (#14185) * Add Arcee AFM support * Add draft update code * Fix linter and update URL, may still not be final * Update src/llama-model.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Remove accidental blank line --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-06-16 01:04:06 +02:00
Mikko Juola	9ae4143bc6	model : add dots.llm1 architecture support (#14044) (#14118) Adds: * Dots1Model to convert_hf_to_gguf.py * Computation graph code to llama-model.cpp * Chat template to llama-chat.cpp to detect this model's template. --- The model is called "dots.llm1" (I decided to shorten it to dots1 or DOTS1 in the code generally) architecture. The only models that exist as of writing of this commit that follow this architecture are "dots.llm1.inst" and "dots.llm1.base" from here: * https://huggingface.co/rednote-hilab/dots.llm1.inst * https://huggingface.co/rednote-hilab/dots.llm1.base The model architecture is a combination of Qwen and Deepseek parts, as seen here: `ffe12627b4/src/transformers/models/dots1/modular_dots1.py`	2025-06-15 09:52:06 +02:00
Sigbjørn Skjæret	55f6b9fa65	convert : fix duplicate key DeepSeek-R1 conversion error (#14103)	2025-06-10 23:29:52 +02:00
Sigbjørn Skjæret	3678b838bb	llama : support GEGLU for jina-bert-v2 (#14090)	2025-06-10 18:02:08 +02:00
Sigbjørn Skjæret	1caae7fc6c	gguf-py : add add_classifier_output_labels method to writer (#14031) * add add_classifier_output_labels * use add_classifier_output_labels	2025-06-05 17:42:31 +02:00
Sigbjørn Skjæret	5e1c3aed40	convert : fix nomic-bert-moe mask token (#13757)	2025-06-01 18:07:21 +02:00
Sigbjørn Skjæret	c496fe0b1d	convert : fix vocab padding code for bert models (#13954)	2025-06-01 17:23:11 +02:00
Sigbjørn Skjæret	db38704f01	convert : fix rwkv bos/eos token (#13844)	2025-05-30 14:50:43 +02:00
Xuan-Son Nguyen	07e4351ce6	convert : allow partial update to the chkhsh pre-tokenizer list (#13847) * convert : allow partial update to the chkhsh pre-tokenizer list * code style * update tokenizer out * rm inp/out files for models not having gguf * fixed hash for glm * skip nomic-bert-moe test * Update convert_hf_to_gguf_update.py * fix minerva-7b hash * rm redundant import	2025-05-30 12:24:37 +02:00
Đinh Trọng Huy	291f2b6913	llama : add support for DistilBert (#13907) * add distilbert * small fixes * add note for LLM_ARCH_DISTIL_BERT * Use MODEL_ARCH.BERT for DistilBert --------- Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>	2025-05-30 11:56:02 +02:00
Sigbjørn Skjæret	e83ba3e460	llama : add support for jina-reranker-v2 (#13900)	2025-05-29 21:42:31 +02:00
Sigbjørn Skjæret	5ca82fc1d7	convert : workaround for AutoConfig dummy labels (#13881)	2025-05-29 10:00:57 +02:00
Sigbjørn Skjæret	6385b843a8	llama : add RobertaForSequenceClassification reranker support (#13875)	2025-05-29 08:15:01 +02:00
Đinh Trọng Huy	e0e3aa231d	llama : add support for BertForSequenceClassification reranker (#13858) * convert: add support for BertForSequenceClassification * add support for reranking using BertForSequenceClassification * merge checks of eos and sep * fix lint --------- Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>	2025-05-28 19:01:58 +02:00
Đinh Trọng Huy	aa6dff05be	convert: small addition to support LlamaModel (#13838) Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>	2025-05-28 16:34:18 +02:00
Xuan-Son Nguyen	a3938fb53d	convert : fix qwen omni conversion (#13859) * convert : fix qwen omni conversion * fix typo	2025-05-28 16:12:35 +02:00


187 Commits