Piotr Wilkin (ilintar)
b1afcab804
model : add support for Seed-OSS ( #15490 )
...
* First draft
* Fix linter errors
* Added missing sinks nullptr
* Don't forget the llama-arch!
* We're through to the generation stage.
* Fix post-attention norm
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Fix RoPE type
* Fix tensor name and reorder llm_types
* Update gguf-py/gguf/constants.py
Remove nonexistent FFN_POST_NORM tensor
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-model.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Add basic chat template
* Add chat template tests
* Remake chat template test
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-chat.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Reorder llm type descriptions
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
2025-08-23 15:21:52 +02:00
Georgi Gerganov
fd1234cb46
llama : add gpt-oss ( #15091 )
...
* oai moe
* compat with new checkpoint
* add attn sink impl
* add rope scaling yarn
* logits match with latest transformers code
* wip chat template
* rm trailing space
* use ggml_scale_bias
* rm redundant is_swa_all
* convert interleaved gate_up
* graph : fix activation function to match reference (#7 )
* vocab : handle o200k_harmony special tokens
* ggml : add attention sinks support (#1 )
* llama : add attn sinks
* ggml : add attn sinks
* cuda : add attn sinks
* vulkan : add support for sinks in softmax
remove unnecessary return
* ggml : add fused swiglu_oai op (#11 )
* ggml : add fused swiglu_oai op
* Update ggml/src/ggml-cpu/ops.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* update CUDA impl
* cont : metal impl
* add vulkan impl
* test-backend-ops : more test cases, clean up
* llama : remove unfused impl
* remove extra lines
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
---------
Co-authored-by: slaren <slarengh@gmail.com >
* repack mxfp4 upon conversion
* clean up a bit
* enable thinking
* add quick hack to render only some special tokens
* fix bf16 conversion
* remove vocab hack
* webui ok
* support chat parsing for gpt-oss
* fix webui
* direct mapping mxfp4, FINALLY
* force using mxfp4
* properly use lazy tensor
* ggml : add mxfp4
ggml : use e8m0 conversion instead of powf
Co-authored-by: Diego Devesa <slarengh@gmail.com >
change kvalues_mxfp4 table to match e2m1 (#6 )
metal : remove quantization for now (not used)
cuda : fix disabled CUDA graphs due to ffn moe bias
vulkan : add support for mxfp4
cont : add cm2 dequant
* ggml : add ggml_add_id (#13 )
* ggml : add ggml_add_id
* add cuda impl
* llama : add weight support check for add_id
* perf opt
* add vulkan impl
* rename cuda files
* add metal impl
* allow in-place ggml_add_id
* llama : keep biases on CPU with --cpu-moe
* llama : fix compile error
ggml-ci
* cuda : add fallback for __nv_cvt_e8m0_to_bf16raw
ggml-ci
* cleanup
ggml-ci
* sycl : fix supports_op for MXFP4
ggml-ci
* fix Unknown reasoning format
* ggml-cpu : fix AVX build
ggml-ci
* fix hip build
ggml-ci
* cuda : add mxfp4 dequantization support for cuBLAS
ggml-ci
* ggml-cpu : fix mxfp4 fallback definitions for some architectures
ggml-ci
* cuda : fix version required for __nv_cvt_e8m0_to_bf16raw
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co >
Co-authored-by: slaren <slarengh@gmail.com >
2025-08-05 22:10:36 +03:00
stevenkuang
0f5ccd6fd1
model : add hunyuan dense ( #14878 )
...
* support hunyuan_v1_dense
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
* update hunyuan_moe to hunyuan_v1_moe
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
* fix rope alpha assert and bos token
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
* add blank line
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
* Revert "update hunyuan_moe to hunyuan_v1_moe"
This reverts commit aa973ca219 .
* use hunyuan_dense instead of hunyuan_v1_dense
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
* fix hunyuan_moe chat template
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
* remove leftover code
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
* update hunyuan dense chat template
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
* fix hunyuan dense vocab and chat template
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
---------
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
2025-08-01 15:31:12 +02:00
lgai-exaone
e0cb5c5cb8
model : add EXAONE 4.0 support ( #14630 )
2025-07-18 10:45:49 +02:00
Gabriel Larson
4a4f426944
model : add Kimi-K2 support ( #14654 )
...
* Kimi-K2 conversion
* add Kimi_K2 pre type
* Kimi-K2
* Kimi-K2 unicode
* Kimi-K2
* LLAMA_MAX_EXPERTS 384
* fix vocab iteration
* regex space fix
* add kimi-k2 to pre_computed_hashes
* Updated with kimi-k2 get_vocab_base_pre hash
* fix whitespaces
* fix flake errors
* remove more unicode.cpp whitespaces
* change set_vocab() flow
* add moonshotai-Kimi-K2.jinja to /models/templates/
* update moonshotai-Kimi-K2.jinja
* add kimi-k2 chat template
* add kimi-k2
* update NotImplementedError
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* except Exception
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* LLM_CHAT_TEMPLATE_KIMI_K2 if(add_ass){}
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
2025-07-15 21:54:22 +02:00
Xuan-Son Nguyen
8f22dc0a53
model : add hunyuan moe ( #14425 )
...
* model : add hunyuan moe
* tokenizer ok
* fix tensor name
* cgraph init
* chat template
* wip
* almost working
* skip embed, fix bos
* cleanup
* yarn scaling
* cleanup
* correct rope type
* failed token fix
* ntk alpha freq_base
* tokenization working
* cleanup and pr changes
* vocab_size sanity check
* ntk alpha generic
* Update convert_hf_to_gguf.py
* Apply suggestions from code review
* fix regression
* fix style
---------
Co-authored-by: kooshi <1934337+kooshi@users.noreply.github.com >
2025-07-08 11:24:06 +03:00
Mikko Juola
9ae4143bc6
model : add dots.llm1 architecture support ( #14044 ) ( #14118 )
...
Adds:
* Dots1Model to convert_hf_to_gguf.py
* Computation graph code to llama-model.cpp
* Chat template to llama-chat.cpp to detect this model's template.
---
The model is called "dots.llm1" (I decided to shorten it to dots1 or
DOTS1 in the code generally) architecture.
The only models that exist as of writing of this commit that follow this
architecture are "dots.llm1.inst" and "dots.llm1.base" from here:
* https://huggingface.co/rednote-hilab/dots.llm1.inst
* https://huggingface.co/rednote-hilab/dots.llm1.base
The model architecture is a combination of Qwen and Deepseek parts, as
seen here:
ffe12627b4/src/transformers/models/dots1/modular_dots1.py
2025-06-15 09:52:06 +02:00
Xuan-Son Nguyen
3f96aeff39
llama : one-off chat template fix for Mistral-Small-2503 ( #13398 )
...
* llama : one-off chat template fix for Mistral-Small-2503
* update readme
* add mistral-v7-tekken
2025-05-09 11:17:51 +02:00
Xuan-Son Nguyen
e5d6c2554e
llama-chat : fix typo GML --> GLM ( #13143 )
2025-04-28 10:11:58 +02:00
Xuan-Son Nguyen
dc39a5e7a8
mtmd : support SmolVLM (version 1 and 2) ( #13050 )
...
* mtmd : support SmolVLM (version 1 and 2)
* correct chat template
* fix n_patches
* scale_factor is an int
* add more models to test
2025-04-22 16:24:54 +02:00
Xuan-Son Nguyen
1466621e73
llama : Support llama 4 text-only ( #12791 )
...
* llama4 conversion
* initial support, no chat template
* clean up a bit
* fix tokenizer conversion
* correct hparams
* try this
* fix shexp
* ffn_inp_normed
* chat template
* clean up model conversion
* add_bos
* add scale_before_ffn
* fix order
* weight_before_ffn
* llm_graph_input_attn_temp
* add chunk attn mask
* build_inp_attn_scale()
* add comment about ggml_repeat
* clarify comments
* fix build
2025-04-07 23:06:44 +02:00
Sigbjørn Skjæret
2c3f8b850a
llama : support BailingMoE (Ling) ( #12634 )
2025-03-30 22:21:03 +02:00
Sergei Vorobyov
7242dd9675
llama-chat : Add Yandex instruct model template support ( #12621 )
...
* add yandex template
* update yandex chat template
* fix tests
* adjust chat template
* fix style
* fix tool macro in template
* add clarify comment
---------
Co-authored-by: Sergei Vorobev <serv01@yandex-team.ru >
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
2025-03-30 20:12:03 +02:00
piDack
0cec062a63
llama : add support for GLM-Edge and GLM-Edge-V series models ( #10573 )
...
* add glm edge chat model
* use config partial_rotary_factor as rope ratio
* support for glm edge model
* vision model support
* remove debug info
* fix format
* llava.cpp trailing whitespace
* remove unused AutoTokenizer
* Update src/llama.cpp for not contain <|end|> or </s>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com >
* add edge template
* fix chat template
* fix confict
* fix confict
* fix ci err
* fix format err
* fix template err
* 9b hf chat support
* format
* format clip.cpp
* fix format
* Apply suggestions from code review
* Apply suggestions from code review
* Update examples/llava/clip.cpp
* fix format
* minor : style
---------
Co-authored-by: liyuhang <yuhang.li@zhipuai.cn >
Co-authored-by: piDack <pcdack@hotmail.co >
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com >
Co-authored-by: liyuhang <yuhang.li@aminer.cn >
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2025-02-02 09:48:46 +02:00
Xuan Son Nguyen
d9feae1c06
llama-chat : add phi 4 template ( #11148 )
2025-01-09 10:07:33 +01:00
fairydreaming
9394bbd484
llama : Add support for DeepSeek V3 ( #11049 )
...
* convert : extend DEEPSEEK2 model architecture to support DeepseekV3ForCausalLM by adding EXPERT_WEIGHTS_NORM and EXPERT_GATING_FUNC model parameters and FFN_EXP_PROBS_B tensor type
* vocab : add DeepSeek V3 pre-tokenizer regexes
* unicode : handle ACCENT_MARK and SYMBOL categories in regex
* llama : add DeepSeek V3 chat template, handle new model parameters and tensor types
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com >
2025-01-04 21:06:11 +01:00
Georgi Gerganov
f66f582927
llama : refactor src/llama.cpp ( #10902 )
...
* llama : scatter llama.cpp into multiple modules (wip)
* llama : control-vector -> adapter
* llama : arch
* llama : mmap
ggml-ci
* ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
* llama : arch (cont)
ggml-ci
* llama : chat
ggml-ci
* llama : model
ggml-ci
* llama : hparams
ggml-ci
* llama : adapter
ggml-ci
* examples : fix
ggml-ci
* rebase
ggml-ci
* minor
* llama : kv cache
ggml-ci
* llama : impl
ggml-ci
* llama : batch
ggml-ci
* cont
ggml-ci
* llama : context
ggml-ci
* minor
* llama : context (cont)
ggml-ci
* llama : model loader
ggml-ci
* common : update lora
ggml-ci
* llama : quant
ggml-ci
* llama : quant (cont)
ggml-ci
* minor [no ci]
2025-01-03 10:18:53 +02:00