Commit Graph

200 Commits

ibrahimkhadraoui
c3c5d51c6a added hashes 2025-07-08 13:37:14 +04:00
ibrahimkhadraoui
9b92648302 flake8 fixes 2025-07-08 13:14:47 +04:00
Younes B
d28c31a90c Merge branch 'master' into add-fh1-rebased 2025-07-08 10:37:13 +02:00
Xuan-Son Nguyen
8f22dc0a53 model : add hunyuan moe (#14425)
* model : add hunyuan moe

* tokenizer ok

* fix tensor name

* cgraph init

* chat template

* wip

* almost working

* skip embed, fix bos

* cleanup

* yarn scaling

* cleanup

* correct rope type

* failed token fix

* ntk alpha freq_base

* tokenization working

* cleanup and pr changes

* vocab_size sanity check

* ntk alpha generic (see the sketch after this entry)

* Update convert_hf_to_gguf.py

* Apply suggestions from code review

* fix regression

* fix style

---------

Co-authored-by: kooshi <1934337+kooshi@users.noreply.github.com>
2025-07-08 11:24:06 +03:00
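
The "ntk alpha freq_base" and "ntk alpha generic" steps above refer to NTK-aware RoPE scaling, where long-context support comes from enlarging the rotary base frequency rather than interpolating positions. A minimal Python sketch of the commonly used form of that adjustment (illustrative names and values; not necessarily the exact expression this commit implements):

    # NTK-aware RoPE scaling sketch: grow the rotary base (freq_base) by
    # alpha ** (head_dim / (head_dim - 2)) so higher positions reuse the
    # existing frequency range instead of extrapolating past it.
    def ntk_scaled_freq_base(freq_base: float, ntk_alpha: float, head_dim: int) -> float:
        return freq_base * ntk_alpha ** (head_dim / (head_dim - 2))

    # Example: base 10000, alpha 8, 128-dim heads.
    print(ntk_scaled_freq_base(10000.0, 8.0, 128))  # ~82611.0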
ibrahimkhadraoui
9a048d8de9 flake8 fixes 2025-07-08 11:45:58 +04:00
younesbelkada
adff470c8a more cleanups and fixed conversion 2025-07-08 11:19:38 +04:00
ibrahimkhadraoui
2834a4ac10 clean 2025-07-08 11:00:30 +04:00
younesbelkada
8555ee8b2c more cleanups on python conversion 2025-07-08 10:41:33 +04:00
younesbelkada
d473d42832 more cleanups 2025-07-08 10:39:12 +04:00
younesbelkada
7d7da0b37e d_ssm -> d_inner 2025-07-08 10:18:43 +04:00
younesbelkada
632861e6c1 some cleanups 2025-07-07 17:27:34 +04:00
ibrahimkhadraoui
b6df0a49d5 add bos False 2025-07-07 16:57:52 +04:00
ibrahimkhadraoui
53446f7e42 rm unused MAMBA_CHUNK_SIZE 2025-07-07 15:29:56 +04:00
younesbelkada
e96cc73390 clean ups 2025-07-07 15:13:06 +04:00
younesbelkada
a9f3a63dc1 injected mup 2025-07-07 15:00:25 +04:00
ibrahimkhadraoui
b3bc1fb237 Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased 2025-07-07 14:36:55 +04:00
ibrahimkhadraoui
286e1fa569 fix rope_theta 2025-07-07 14:36:51 +04:00
ibrahimkhadraoui
97011d7a1f mup_vec create as float64 2025-07-07 14:25:32 +04:00
Younes B
6c39e775dd fix conversion and d_inner 2025-07-07 10:56:49 +02:00
ibrahimkhadraoui
441d8d66bd override modify_tensors instead of get_tensors 2025-07-07 12:00:57 +04:00
ibrahimkhadraoui
c4af0f3ca5 mamba_d_ssm added to d_inner find_hparam 2025-07-07 11:17:31 +04:00
ibrahimkhadraoui
c56ec07a9a read arch from gguf.MODEL_ARCH 2025-07-07 10:34:46 +04:00
ibrahimkhadraoui
280dd2dcb7 falcon-h1 specific vocab resolved 2025-07-07 10:25:57 +04:00
ibrahimkhadraoui
7a25441e13 fixed multipliers 2025-07-04 17:41:03 +04:00
ibrahimkhadraoui
9760c8bc9d conflict resolved 2025-07-04 16:28:48 +04:00
ibrahimkhadraoui
2aa48dd853 Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased 2025-07-04 16:25:54 +04:00
ibrahimkhadraoui
3ee7983961 fix vocab size 2025-07-04 16:25:27 +04:00
younesbelkada
250b4f1074 mix instead of max 2025-07-04 15:53:47 +04:00
younesbelkada
1fd0574adc try 2025-07-04 15:50:43 +04:00
ibrahimkhadraoui
2fe057cc40 Revert "fix"
This reverts commit 243e4d1a50.
2025-07-04 15:04:13 +04:00
younesbelkada
243e4d1a50 fix 2025-07-04 14:55:31 +04:00
younesbelkada
1415cd8782 another fix 2025-07-04 14:49:59 +04:00
younesbelkada
a39a8423f7 merge 2025-07-04 14:48:22 +04:00
younesbelkada
50eadc7b33 fixes 2025-07-04 14:47:31 +04:00
ibrahimkhadraoui
071f4b7fd8 changed precision for multipliers float 32->64 2025-07-04 14:37:02 +04:00
ibrahimkhadraoui
8bea92261e python fixes 2025-07-04 14:32:11 +04:00
younesbelkada
14c37ec047 more cleaning on python code 2025-07-03 18:09:30 +04:00
Sigbjørn Skjæret
e75ba4c043 gguf-py : add support for chat template jinja files (#14508)
* add support for chat template jinja files

* remove gemma3n hack
2025-07-02 21:02:35 +02:00
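
As a rough illustration of what this gguf-py change enables (a sketch under assumptions, not the convert script's actual code path; file names are illustrative), a standalone .jinja template can be read and embedded into GGUF metadata along these lines:

    # Sketch: store a chat template read from a Jinja file under the
    # tokenizer.chat_template metadata key using gguf-py's GGUFWriter.
    from pathlib import Path
    import gguf

    template = Path("chat_template.jinja").read_text(encoding="utf-8")

    writer = gguf.GGUFWriter("model.gguf", arch="llama")
    writer.add_chat_template(template)
    writer.write_header_to_file()
    writer.write_kv_data_to_file()
    writer.write_tensors_to_file()  # empty tensor section in this sketch
    writer.close()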
compilade
5d46babdc2 llama : initial Mamba-2 support (#9126)
* llama : initial Mamba-2 support

* ggml : SIMD ggml_ssm_scan for Mamba-2

* ggml : improve ggml_mul speed when masking recurrent states

* llama : support running Mamba-Codestral-7B-v0.1

* llama : fix Mamba-2 conv state saving

* ggml : make the ggml_mul fast broadcast path more consistently formatted

* llama : remove unused variable

* llama : add missing break

* convert_hf : prefer SentencePiece tokenizer for Mamba-2 when present

The tokenizer.json of Mamba-Codestral-7B-v0.1 otherwise requires
workarounds to work correctly.

* llama : avoid redundant state copy for Mamba 1 and 2

* metal : attempt to adapt SSM_SCAN for Mamba-2

* metal : fix SSM_SCAN pipeline scope

* metal : use log and exp instead of log1pf and expf in SSM_SCAN

* metal : remove unused arguments for SSM_SCAN

The max index is 31, so trimming the arguments is necessary.

* metal : add back n_seqs to SSM_SCAN args

Whoops, this is needed for the offset in the concatenated output.

* metal : fix SSM_SCAN state head offset

* metal : fix wrong number of tokens per sequence in SSM_SCAN

* ggml : remove unused fast broadcast path in GGML_MUL

This was initially added because states were masked with ggml_mul,
but this is no longer done and so this "optimisation" is no longer
necessary, or at least not worth the additional code complexity.

* ggml : avoid multiply by D in GGML_OP_SSM_SCAN

This makes the weight buft detection in src/llama.cpp simpler.

* convert : transpose Mamba-2 A, D and reshape SSM_NORM

This breaks existing conversions of Mamba-2 models
to avoid some reshapes.

Not sure if it's a good idea,
but it makes the graph slightly cleaner.

* llama : more appropriate SSM_SCAN and SSM_CONV buft support checks

* convert : fix flake8 lint

* metal : fix confusion between ; and ,

* metal : add missing args for nb references in ssm_scan_f32_group

* metal : single-user mamba2 inference works

* kv-cache : remove const_cast when setting inputs for s_copy

And also fix multi-user inference for recurrent models
by using cell_id instead of i as the kv cell index
when populating s_copy.

* convert : avoid AutoConfig for Mamba and Mamba2 hparams

* kv-cache : allow context shift for recurrent models

* graph : fix recurrent state copies when avoiding copies

Works, but using lambda functions might not be that clean.

* ggml : fix mamba2 ssm scan when compiled with SVE

* ggml-cpu : reorder SVE FMA for consistency with other SIMD arches

* cuda : implement ssm scan for Mamba2

There is still room for improvement, but it works!

* cuda : adapt Mamba1 ssm scan to shape changes from Mamba2

* mamba : fix mismatched new and delete size for llm_build_mamba

Subclasses of llm_graph_context cannot have extra fields,
because the called destructor is not the one from the subclass.
This would otherwise cause problems when running Mamba-(1|2) inference
when compiled with -DGGML_SANITIZE_ADDRESS=ON (see the sketch after this entry).

* cuda : graceful fallback for Mamba-1 models with weird embd size
2025-07-02 13:10:24 -04:00
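
For readers unfamiliar with the mismatched new/delete issue fixed above: deleting a derived object through a base pointer whose destructor is not virtual runs only the base destructor, and with sized deallocation the delete is issued with the base class size. A minimal C++ sketch of the pitfall (hypothetical type names, not the actual llama.cpp classes):

    #include <cstdio>

    struct base_ctx {
        int n_tokens = 0;
        ~base_ctx() {}              // non-virtual: wrong destructor via base_ctx*
    };

    struct derived_ctx : base_ctx {
        float extra_state[64] = {}; // extra fields: sizeof(derived_ctx) > sizeof(base_ctx)
    };

    int main() {
        base_ctx * ctx = new derived_ctx();
        delete ctx;                 // UB: ~derived_ctx() never runs, and the sized
                                    // operator delete receives sizeof(base_ctx);
                                    // AddressSanitizer reports a size mismatch.
        std::puts("done");
        return 0;
    }

Keeping subclasses of llm_graph_context free of extra fields sidesteps this without requiring a virtual destructor.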
Weizhao Ouyang
566c16fcce model : add support for ERNIE 4.5 0.3B model (#14408)
Add Day-0 support for the Baidu ERNIE 4.5 0.3B model.

Signed-off-by: Weizhao Ouyang <weizhao.ouyang@arm.com>
2025-06-28 16:08:21 +02:00
Sigbjørn Skjæret
f667f1e624 convert : fix broken sentencepiece vocab (#14416) 2025-06-27 10:42:19 +02:00
Xuan-Son Nguyen
8846aace49 model : gemma3n text-only (#14400)
* gemma3n

* add llm_graph_input_one
2025-06-26 20:34:02 +03:00
Daniel Han
b23fa0b3f4 convert : fix Llama 4 conversion (#14311) 2025-06-21 06:32:01 +02:00
Sigbjørn Skjæret
88fc854b4b llama : improve sep token handling (#14272) 2025-06-20 14:04:09 +02:00
pqnet
5fc7856815 convert : fix remote option in Windows (#14100) 2025-06-19 12:21:40 +02:00
Sigbjørn Skjæret
3865cff4f5 convert : fix null head_dim AutoConfig regression (#14248) 2025-06-18 09:52:07 +02:00
Đinh Trọng Huy
ad590be98c model : add NeoBERT (#14164)
* convert neobert model to gguf

* add inference graph

* fix flake8 lint

* followed reviewer suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* follow reviewers' suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* override NeoBERT feed-forward length

---------

Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-16 14:53:41 +02:00
Bartowski
d7da8dc83a model : Add support for Arcee AI's upcoming AFM model (#14185)
* Add Arcee AFM support

* Add draft update code

* Fix linter and update URL, may still not be final

* Update src/llama-model.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Remove accidental blank line

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-06-16 01:04:06 +02:00
Mikko Juola
9ae4143bc6 model : add dots.llm1 architecture support (#14044) (#14118)
Adds:

* Dots1Model to convert_hf_to_gguf.py (registration pattern sketched after this entry)

* Computation graph code to llama-model.cpp

* Chat template to llama-chat.cpp to detect this model's template.

---

The model is called the "dots.llm1" architecture (I decided to shorten it to
dots1 or DOTS1 in the code).

The only models that follow this architecture and exist as of the writing of
this commit are "dots.llm1.inst" and "dots.llm1.base", from here:

* https://huggingface.co/rednote-hilab/dots.llm1.inst

* https://huggingface.co/rednote-hilab/dots.llm1.base

The model architecture is a combination of Qwen and DeepSeek parts, as
seen here:

ffe12627b4/src/transformers/models/dots1/modular_dots1.py
2025-06-15 09:52:06 +02:00
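
The Dots1Model hook mentioned in the first bullet follows convert_hf_to_gguf.py's decorator-based registry, where each Hugging Face architecture string maps to a converter class. A self-contained, simplified Python sketch of that pattern (stand-in names; the real classes define many more overrides):

    from typing import Callable, Dict, Type

    _model_classes: Dict[str, Type["ModelBase"]] = {}

    class ModelBase:
        model_arch: str = "unknown"

        @classmethod
        def register(cls, *names: str) -> Callable[[type], type]:
            # Map each HF architecture string to the decorated converter class.
            def wrapper(model_cls: type) -> type:
                for name in names:
                    _model_classes[name] = model_cls
                return model_cls
            return wrapper

    @ModelBase.register("Dots1ForCausalLM")
    class Dots1Model(ModelBase):
        model_arch = "dots1"  # the real script uses gguf.MODEL_ARCH.DOTS1

    # Lookup, as the converter does from config.json's "architectures" field:
    print(_model_classes["Dots1ForCausalLM"].__name__)  # Dots1Model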
Sigbjørn Skjæret
55f6b9fa65 convert : fix duplicate key DeepSeek-R1 conversion error (#14103) 2025-06-10 23:29:52 +02:00