Commit Graph

200 Commits

ibrahimkhadraoui
c3c5d51c6a added hashes 2025-07-08 13:37:14 +04:00
ibrahimkhadraoui
9b92648302 flake8 fixes 2025-07-08 13:14:47 +04:00
Younes B
d28c31a90c Merge branch 'master' into add-fh1-rebased 2025-07-08 10:37:13 +02:00
Xuan-Son Nguyen
8f22dc0a53 model : add hunyuan moe (#14425)
* model : add hunyuan moe

* tokenizer ok

* fix tensor name

* cgraph init

* chat template

* wip

* almost working

* skip embed, fix bos

* cleanup

* yarn scaling

* cleanup

* correct rope type

* failed token fix

* ntk alpha freq_base

* tokenization working

* cleanup and pr changes

* vocab_size sanity check

* ntk alpha generic (see the sketch after this entry)

* Update convert_hf_to_gguf.py

* Apply suggestions from code review

* fix regression

* fix style

---------

Co-authored-by: kooshi <1934337+kooshi@users.noreply.github.com>
2025-07-08 11:24:06 +03:00
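
The "ntk alpha freq_base" and "ntk alpha generic" steps above refer to NTK-aware RoPE scaling, where long-context support comes from enlarging the rotary base frequency rather than interpolating positions. A minimal Python sketch of the commonly used form of that adjustment (illustrative names and values; not necessarily the exact expression this commit implements):

    # NTK-aware RoPE scaling sketch: grow the rotary base (freq_base) by
    # alpha ** (head_dim / (head_dim - 2)) so higher positions reuse the
    # existing frequency range instead of extrapolating past it.
    def ntk_scaled_freq_base(freq_base: float, ntk_alpha: float, head_dim: int) -> float:
        return freq_base * ntk_alpha ** (head_dim / (head_dim - 2))

    # Example: base 10000, alpha 8, 128-dim heads.
    print(ntk_scaled_freq_base(10000.0, 8.0, 128))  # ~82611.0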
ibrahimkhadraoui
9a048d8de9 flake8 fixes 2025-07-08 11:45:58 +04:00
younesbelkada
adff470c8a more cleanups and fixed conversion 2025-07-08 11:19:38 +04:00
ibrahimkhadraoui
2834a4ac10 clean 2025-07-08 11:00:30 +04:00
younesbelkada
8555ee8b2c more cleanups on python conversion 2025-07-08 10:41:33 +04:00
younesbelkada
d473d42832 more cleanups 2025-07-08 10:39:12 +04:00
younesbelkada
7d7da0b37e d_ssm -> d_inner 2025-07-08 10:18:43 +04:00
younesbelkada
632861e6c1 some cleanups 2025-07-07 17:27:34 +04:00
ibrahimkhadraoui
b6df0a49d5 add bos False 2025-07-07 16:57:52 +04:00
ibrahimkhadraoui
53446f7e42 rm unused MAMBA_CHUNK_SIZE 2025-07-07 15:29:56 +04:00
younesbelkada
e96cc73390 clean ups 2025-07-07 15:13:06 +04:00
younesbelkada
a9f3a63dc1 injected mup 2025-07-07 15:00:25 +04:00
ibrahimkhadraoui
b3bc1fb237 Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased 2025-07-07 14:36:55 +04:00
ibrahimkhadraoui
286e1fa569 fix rope_theta 2025-07-07 14:36:51 +04:00
ibrahimkhadraoui
97011d7a1f mup_vec create as float64 2025-07-07 14:25:32 +04:00
Younes B
6c39e775dd fix conversion and d_inner 2025-07-07 10:56:49 +02:00
ibrahimkhadraoui
441d8d66bd override modify_tensors instead of get_tensors 2025-07-07 12:00:57 +04:00
ibrahimkhadraoui
c4af0f3ca5 mamba_d_ssm added to d_inner find_hparam 2025-07-07 11:17:31 +04:00
ibrahimkhadraoui
c56ec07a9a read arch from gguf.MODEL_ARCH 2025-07-07 10:34:46 +04:00
ibrahimkhadraoui
280dd2dcb7 falcon-h1 specific vocab resolved 2025-07-07 10:25:57 +04:00
ibrahimkhadraoui
7a25441e13 fixed multipliers 2025-07-04 17:41:03 +04:00
ibrahimkhadraoui
9760c8bc9d conflict resolved 2025-07-04 16:28:48 +04:00
ibrahimkhadraoui
2aa48dd853 Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased 2025-07-04 16:25:54 +04:00
ibrahimkhadraoui
3ee7983961 fix vocab size 2025-07-04 16:25:27 +04:00
younesbelkada
250b4f1074 mix instead of max 2025-07-04 15:53:47 +04:00
younesbelkada
1fd0574adc try 2025-07-04 15:50:43 +04:00
ibrahimkhadraoui
2fe057cc40 Revert "fix"
This reverts commit 243e4d1a50.
2025-07-04 15:04:13 +04:00
younesbelkada
243e4d1a50 fix 2025-07-04 14:55:31 +04:00
younesbelkada
1415cd8782 another fix 2025-07-04 14:49:59 +04:00
younesbelkada
a39a8423f7 merge 2025-07-04 14:48:22 +04:00
younesbelkada
50eadc7b33 fixes 2025-07-04 14:47:31 +04:00
ibrahimkhadraoui
071f4b7fd8 changed precision for multipliers float 32->64 2025-07-04 14:37:02 +04:00
ibrahimkhadraoui
8bea92261e python fixes 2025-07-04 14:32:11 +04:00
younesbelkada
14c37ec047 more cleaning on python code 2025-07-03 18:09:30 +04:00
Sigbjørn Skjæret
e75ba4c043 gguf-py : add support for chat template jinja files (#14508)
* add support for chat template jinja files

* remove gemma3n hack
2025-07-02 21:02:35 +02:00
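
As a rough illustration of what this gguf-py change enables (a sketch under assumptions, not the convert script's actual code path; file names are illustrative), a standalone .jinja template can be read and embedded into GGUF metadata along these lines:

    # Sketch: store a chat template read from a Jinja file under the
    # tokenizer.chat_template metadata key using gguf-py's GGUFWriter.
    from pathlib import Path
    import gguf

    template = Path("chat_template.jinja").read_text(encoding="utf-8")

    writer = gguf.GGUFWriter("model.gguf", arch="llama")
    writer.add_chat_template(template)
    writer.write_header_to_file()
    writer.write_kv_data_to_file()
    writer.write_tensors_to_file()  # empty tensor section in this sketch
    writer.close()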
compilade
5d46babdc2 llama : initial Mamba-2 support (#9126)
* llama : initial Mamba-2 support

* ggml : SIMD ggml_ssm_scan for Mamba-2

* ggml : improve ggml_mul speed when masking recurrent states

* llama : support running Mamba-Codestral-7B-v0.1

* llama : fix Mamba-2 conv state saving

* ggml : make the ggml_mul fast broadcast path more consistently formatted

* llama : remove unused variable

* llama : add missing break

* convert_hf : prefer SentencePiece tokenizer for Mamba-2 when present

The tokenizer.json of Mamba-Codestral-7B-v0.1 otherwise requires
workarounds to work correctly.

* llama : avoid redundant state copy for Mamba 1 and 2

* metal : attempt to adapt SSM_SCAN for Mamba-2

* metal : fix SSM_SCAN pipeline scope

* metal : use log and exp instead of log1pf and expf in SSM_SCAN

* metal : remove unused arguments for SSM_SCAN

The max index is 31, so trimming the arguments is necessary.

* metal : add back n_seqs to SSM_SCAN args

Whoops, this is needed for the offset in the concatenated output.

* metal : fix SSM_SCAN state head offset

* metal : fix wrong number of tokens per sequence in SSM_SCAN

* ggml : remove unused fast broadcast path in GGML_MUL

This was initially added because states were masked with ggml_mul,
but this is no longer done and so this "optimisation" is no longer
necessary, or at least not worth the additional code complexity.

* ggml : avoid multiply by D in GGML_OP_SSM_SCAN

This makes the weight buft detection in src/llama.cpp simpler.

* convert : transpose Mamba-2 A, D and reshape SSM_NORM

This breaks existing conversions of Mamba-2 models
to avoid some reshapes.

Not sure if it's a good idea,
but it makes the graph slightly cleaner.

* llama : more appropriate SSM_SCAN and SSM_CONV buft support checks

* convert : fix flake8 lint

* metal : fix confusion between ; and ,

* metal : add missing args for nb references in ssm_scan_f32_group

* metal : single-user mamba2 inference works

* kv-cache : remove const_cast when setting inputs for s_copy

And also fix multi-user inference for recurrent models
by using cell_id instead of i as the kv cell index
when populating s_copy.

* convert : avoid AutoConfig for Mamba and Mamba2 hparams

* kv-cache : allow context shift for recurrent models

* graph : fix recurrent state copies when avoiding copies

Works, but using lambda functions might not be that clean.

* ggml : fix mamba2 ssm scan when compiled with SVE

* ggml-cpu : reorder SVE FMA for consistency with other SIMD arches

* cuda : implement ssm scan for Mamba2

There is still room for improvement, but it works!

* cuda : adapt Mamba1 ssm scan to shape changes from Mamba2

* mamba : fix mismatched new and delete size for llm_build_mamba

Subclasses of llm_graph_context cannot have extra fields,
because the called destructor is not the one from the subclass.
This would otherwise cause problems when running Mamba-(1|2) inference
when compiled with -DGGML_SANITIZE_ADDRESS=ON (see the sketch after this entry).

* cuda : graceful fallback for Mamba-1 models with weird embd size
2025-07-02 13:10:24 -04:00
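
For readers unfamiliar with the mismatched new/delete issue fixed above: deleting a derived object through a base pointer whose destructor is not virtual runs only the base destructor, and with sized deallocation the delete is issued with the base class size. A minimal C++ sketch of the pitfall (hypothetical type names, not the actual llama.cpp classes):

    #include <cstdio>

    struct base_ctx {
        int n_tokens = 0;
        ~base_ctx() {}              // non-virtual: wrong destructor via base_ctx*
    };

    struct derived_ctx : base_ctx {
        float extra_state[64] = {}; // extra fields: sizeof(derived_ctx) > sizeof(base_ctx)
    };

    int main() {
        base_ctx * ctx = new derived_ctx();
        delete ctx;                 // UB: ~derived_ctx() never runs, and the sized
                                    // operator delete receives sizeof(base_ctx);
                                    // AddressSanitizer reports a size mismatch.
        std::puts("done");
        return 0;
    }

Keeping subclasses of llm_graph_context free of extra fields sidesteps this without requiring a virtual destructor.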
Weizhao Ouyang
566c16fcce model : add support for ERNIE 4.5 0.3B model (#14408)
Add Day-0 support for the Baidu ERNIE 4.5 0.3B model.

Signed-off-by: Weizhao Ouyang <weizhao.ouyang@arm.com>
2025-06-28 16:08:21 +02:00
Sigbjørn Skjæret
f667f1e624 convert : fix broken sentencepiece vocab (#14416) 2025-06-27 10:42:19 +02:00
Xuan-Son Nguyen
8846aace49 model : gemma3n text-only (#14400)
* gemma3n

* add llm_graph_input_one
2025-06-26 20:34:02 +03:00
Daniel Han
b23fa0b3f4 convert : fix Llama 4 conversion (#14311) 2025-06-21 06:32:01 +02:00
Sigbjørn Skjæret
88fc854b4b llama : improve sep token handling (#14272) 2025-06-20 14:04:09 +02:00
pqnet
5fc7856815 convert : fix remote option in Windows (#14100) 2025-06-19 12:21:40 +02:00
Sigbjørn Skjæret
3865cff4f5 convert : fix null head_dim AutoConfig regression (#14248) 2025-06-18 09:52:07 +02:00
Đinh Trọng Huy
ad590be98c model : add NeoBERT (#14164)
* convert neobert model to gguf

* add inference graph

* fix flake8 lint

* followed reviewer suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* follow reviewers' suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* override NeoBERT feed-forward length

---------

Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-16 14:53:41 +02:00
Bartowski
d7da8dc83a model : Add support for Arcee AI's upcoming AFM model (#14185)
* Add Arcee AFM support

* Add draft update code

* Fix linter and update URL, may still not be final

* Update src/llama-model.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Remove accidental blank line

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-06-16 01:04:06 +02:00
Mikko Juola
9ae4143bc6 model : add dots.llm1 architecture support (#14044) (#14118)
Adds:

* Dots1Model to convert_hf_to_gguf.py (registration pattern sketched after this entry)

* Computation graph code to llama-model.cpp

* Chat template to llama-chat.cpp to detect this model's template.

---

The model is called the "dots.llm1" architecture (I decided to shorten it to
dots1 or DOTS1 in the code).

The only models that follow this architecture and exist as of the writing of
this commit are "dots.llm1.inst" and "dots.llm1.base", from here:

* https://huggingface.co/rednote-hilab/dots.llm1.inst

* https://huggingface.co/rednote-hilab/dots.llm1.base

The model architecture is a combination of Qwen and DeepSeek parts, as
seen here:

ffe12627b4/src/transformers/models/dots1/modular_dots1.py
2025-06-15 09:52:06 +02:00
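
The Dots1Model hook mentioned in the first bullet follows convert_hf_to_gguf.py's decorator-based registry, where each Hugging Face architecture string maps to a converter class. A self-contained, simplified Python sketch of that pattern (stand-in names; the real classes define many more overrides):

    from typing import Callable, Dict, Type

    _model_classes: Dict[str, Type["ModelBase"]] = {}

    class ModelBase:
        model_arch: str = "unknown"

        @classmethod
        def register(cls, *names: str) -> Callable[[type], type]:
            # Map each HF architecture string to the decorated converter class.
            def wrapper(model_cls: type) -> type:
                for name in names:
                    _model_classes[name] = model_cls
                return model_cls
            return wrapper

    @ModelBase.register("Dots1ForCausalLM")
    class Dots1Model(ModelBase):
        model_arch = "dots1"  # the real script uses gguf.MODEL_ARCH.DOTS1

    # Lookup, as the converter does from config.json's "architectures" field:
    print(_model_classes["Dots1ForCausalLM"].__name__)  # Dots1Model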
Sigbjørn Skjæret
55f6b9fa65 convert : fix duplicate key DeepSeek-R1 conversion error (#14103) 2025-06-10 23:29:52 +02:00