Sigbjørn Skjæret
84ab83cc0b
model : jina-embeddings-v3 support ( #13693 )
...
* initial jina-embeddings-v3 support
* initial jina-embeddings-v3 support
* initial jina-embeddings-v3 support
* fix vocab parsing with only tokenizer.json
* set mask token lstrip attribute
* additional unk_token_id fallback just in case [no ci]
* revert vocab_size() change [no ci]
* merge tensor loading into general bert
* rope
* add lora embedding and loading (non-functional)
* export separate lora ggufs instead
* add adapter metadata api
* use std::string
* convert_hf_to_lora compatibility
* fix assert
* apply suggestions from review
* apply suggestion from review
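The "export separate lora ggufs" and "add adapter metadata api" steps mean each jina-v3 task adapter ships as its own GGUF carrying `adapter.*` metadata. A minimal sketch of inspecting such a file with the gguf-py package from this repo; the file name is hypothetical and the exact `adapter.*` key set is an assumption:
```python
# Sketch: list adapter.* metadata in an exported LoRA GGUF using gguf-py
# (ships in llama.cpp under gguf-py/). The file name is hypothetical and
# the adapter.* keys actually present depend on the converter.
from gguf import GGUFReader

reader = GGUFReader("jina-embeddings-v3-retrieval.lora.gguf")
for name, field in reader.fields.items():
    if name.startswith("adapter."):
        print(name, field.parts[-1])  # raw value part (bytes/array)
```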
2025-08-28 15:49:50 +02:00
Aldehir Rojas
b204a5a234
gpt-oss: implement harmony parsing ( #15181 )
...
* model : add harmony parser for gpt-oss
* gpt-oss : fix grammar trigger from causing empty stack
* gpt-oss: tweak the grammar trigger again
* gpt-oss : add support for recipient in role header
* gpt-oss : fix ungrouped tool calls in grammar
* gpt-oss : loosen function name matching during parse
* gpt-oss : clean up workarounds
* gpt-oss : add template tests
* gpt-oss : simulate thinking and tool call tags
* gpt-oss : undo think tags when reasoning_format is none
* gpt-oss : set special tokens back to user defined
* gpt-oss : update openai-gpt-oss template
* server : filter out harmony thought messages
* gpt-oss : simplify parsing
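The bullets above (role header, recipient, channels, tool calls) all concern the harmony message layout. A rough illustration of what the parser has to split apart, using the token spellings from the published harmony format; this is a toy regex, not the parser from the PR:
```python
# Rough sketch of the harmony message layout: a role header with an
# optional "to=" recipient and channel, a message body, and one of the
# end tokens. Illustrative only; real messages can vary in field order.
import re

transcript = (
    "<|start|>assistant<|channel|>analysis<|message|>thinking...<|end|>"
    "<|start|>assistant to=functions.get_weather<|channel|>commentary"
    "<|message|>{\"location\": \"Oslo\"}<|call|>"
)

pattern = re.compile(
    r"<\|start\|>(?P<role>[^< ]+)(?: to=(?P<recipient>[^<\s]+))?"
    r"(?:<\|channel\|>(?P<channel>[^<]+))?"
    r"<\|message\|>(?P<body>.*?)(?:<\|end\|>|<\|call\|>|<\|return\|>)",
    re.S,
)
for m in pattern.finditer(transcript):
    print(m.groupdict())
```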
2025-08-14 17:23:11 +03:00
Georgi Gerganov
fd1234cb46
llama : add gpt-oss ( #15091 )
...
* oai moe
* compat with new checkpoint
* add attn sink impl
* add rope scaling yarn
* logits match with latest transformers code
* wip chat template
* rm trailing space
* use ggml_scale_bias
* rm redundant is_swa_all
* convert interleaved gate_up
* graph : fix activation function to match reference (#7 )
* vocab : handle o200k_harmony special tokens
* ggml : add attention sinks support (#1 )
* llama : add attn sinks
* ggml : add attn sinks
* cuda : add attn sinks
* vulkan : add support for sinks in softmax
remove unnecessary return
* ggml : add fused swiglu_oai op (#11 )
* ggml : add fused swiglu_oai op
* Update ggml/src/ggml-cpu/ops.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* update CUDA impl
* cont : metal impl
* add vulkan impl
* test-backend-ops : more test cases, clean up
* llama : remove unfused impl
* remove extra lines
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
---------
Co-authored-by: slaren <slarengh@gmail.com >
* repack mxfp4 upon conversion
* clean up a bit
* enable thinking
* add quick hack to render only some special tokens
* fix bf16 conversion
* remove vocab hack
* webui ok
* support chat parsing for gpt-oss
* fix webui
* direct mapping mxfp4, FINALLY
* force using mxfp4
* properly use lazy tensor
* ggml : add mxfp4
ggml : use e8m0 conversion instead of powf
Co-authored-by: Diego Devesa <slarengh@gmail.com >
change kvalues_mxfp4 table to match e2m1 (#6 )
metal : remove quantization for now (not used)
cuda : fix disabled CUDA graphs due to ffn moe bias
vulkan : add support for mxfp4
cont : add cm2 dequant
* ggml : add ggml_add_id (#13 )
* ggml : add ggml_add_id
* add cuda impl
* llama : add weight support check for add_id
* perf opt
* add vulkan impl
* rename cuda files
* add metal impl
* allow in-place ggml_add_id
* llama : keep biases on CPU with --cpu-moe
* llama : fix compile error
ggml-ci
* cuda : add fallback for __nv_cvt_e8m0_to_bf16raw
ggml-ci
* cleanup
ggml-ci
* sycl : fix supports_op for MXFP4
ggml-ci
* fix Unknown reasoning format
* ggml-cpu : fix AVX build
ggml-ci
* fix hip build
ggml-ci
* cuda : add mxfp4 dequantization support for cuBLAS
ggml-ci
* ggml-cpu : fix mxfp4 fallback definitions for some architectures
ggml-ci
* cuda : fix version required for __nv_cvt_e8m0_to_bf16raw
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co >
Co-authored-by: slaren <slarengh@gmail.com >
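The mxfp4 work above follows the OCP MX layout: blocks of 32 FP4 (E2M1) values share a single E8M0 scale byte, and the "e8m0 conversion instead of powf" bullet refers to that exponent-only scale, 2**(e - 127). A numpy sketch of the dequantization under those assumptions; the nibble packing order here is an assumption, and ggml's type traits are the authoritative reference:
```python
# Sketch of MXFP4 dequantization: 16 packed bytes hold 32 E2M1 indices,
# all scaled by one E8M0 byte (an exponent-only scale, 2**(e - 127)).
import numpy as np

E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
                 -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0])

def dequant_mxfp4_block(scale_byte: int, packed: np.ndarray) -> np.ndarray:
    """packed: 16 bytes holding 32 4-bit indices (low nibble first, assumed)."""
    idx = np.empty(32, dtype=np.uint8)
    idx[0::2] = packed & 0x0F
    idx[1::2] = packed >> 4
    scale = 2.0 ** (int(scale_byte) - 127)  # E8M0 scale
    return E2M1[idx] * scale

packed = np.full(16, 0x32, dtype=np.uint8)  # each byte decodes to (1.0, 1.5)
print(dequant_mxfp4_block(127, packed))     # scale 2**0 == 1.0
```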
2025-08-05 22:10:36 +03:00
Sam
ef0144c087
model: support GLM 4.5 family of models ( #14939 )
...
* model: Add GLM 4.5 (#14921 )
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Merge in PR suggestions
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* model: Add GLM 4.5 family of models (#14921 )
1. Updated tensor_mapping.py with NextN tensor mappings
- Added proper tensor mappings for all NextN/MTP tensors in gguf-py/gguf/tensor_mapping.py
- Added mappings for: eh_proj, embed_tokens, enorm, hnorm, shared_head.head, shared_head.norm
2. Added num_nextn_predict_layers configuration
- Added LLM_KV_NUM_NEXTN_PREDICT_LAYERS constant to llama-arch.h and llama-arch.cpp
- Added num_nextn_predict_layers field to llama_hparams struct
- Updated GLM4_MOE parameter loading in llama-model.cpp to read this parameter
- Modified tensor loading logic to conditionally load NextN tensors based on num_nextn_predict_layers
- Added GGUF writer support in gguf_writer.py with add_num_nextn_predict_layers() method
- Updated conversion script to extract and write this parameter from HuggingFace config
3. Added FIM tokens for GLM4_MOE
- Added GLM-4.5's FIM tokens to llama-vocab.cpp:
- <|code_prefix|> for FIM_PRE
- <|code_suffix|> for FIM_SUF
- <|code_middle|> for FIM_MID
4. Removed manual NextN tensor handling
- Removed the special-case handling in convert_hf_to_gguf.py that manually mapped NextN tensors
- NextN tensors are now handled automatically through the proper tensor mapping system
* glm 4.5 update tensors names
* model: glm 4.5 apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* model: glm 4.5 apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* model: glm 4.5 apply suggestions from code review
* Apply suggestions from code review
* patch broken chat template
* typings fix
* add TENSOR_SKIP flag
Co-authored-by: Diego Devesa <slarengh@gmail.com >
* Update src/llama-model-loader.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
Co-authored-by: Diego Devesa <slarengh@gmail.com >
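The num_nextn_predict_layers plumbing described above runs from the HuggingFace config through gguf-py into llama-model.cpp. A conversion-side sketch; the key string below is an illustrative assumption (the real name lives in gguf-py's constants and is written via add_num_nextn_predict_layers()):
```python
# Sketch: read the MTP depth from the HF config and emit it as GGUF
# metadata. Key string is assumed for illustration; header/tensor writes
# of the full conversion flow are omitted here.
import json
from gguf import GGUFWriter

config = json.load(open("config.json"))            # HF model config
n_nextn = config.get("num_nextn_predict_layers", 0)

writer = GGUFWriter("glm-4.5.gguf", arch="glm4moe")
writer.add_uint32("glm4moe.nextn_predict_layers", n_nextn)  # assumed key
```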
2025-08-04 20:29:25 +02:00
Csaba Kecskemeti
97366dc6ab
vocab : JetBrains Mellum pre-tokenizer ( #15045 )
2025-08-03 21:38:18 +02:00
stevenkuang
0f5ccd6fd1
model : add hunyuan dense ( #14878 )
...
* support hunyuan_v1_dense
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
* update hunyuan_moe to hunyuan_v1_moe
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
* fix rope alpha assert and bos token
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
* add blank line
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
* Revert "update hunyuan_moe to hunyuan_v1_moe"
This reverts commit aa973ca219 .
* use hunyuan_dense instead of hunyuan_v1_dense
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
* fix hunyuan_moe chat template
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
* remove leftover code
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
* update hunyuan dense chat template
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
* fix hunyuan dense vocab and chat template
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
---------
Signed-off-by: stevenkuang <stevenkuang@tencent.com >
2025-08-01 15:31:12 +02:00
lgai-exaone
e0cb5c5cb8
model : add EXAONE 4.0 support ( #14630 )
2025-07-18 10:45:49 +02:00
Aman Gupta
ab14019821
Support diffusion models: Add Dream 7B ( #14644 )
...
* Support diffusion models: Add Dream 7B
* Move diffusion to examples
* Move stuff to examples. Add patch to not use kv-cache
* Address review comments
* Make sampling fast
* llama: remove diffusion functions
* Add basic timings + cleanup
* More cleanup
* Review comments: better formatting, use LOG instead of std::cerr, re-use batch, use ubatch instead of max_length
* fixup!
* Review: move everything to diffusion-cli for now
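Dream is a mask-based diffusion LM, which is why the example patches out the kv-cache: every step re-scores the whole sequence and commits the most confident predictions. A toy sketch of that decoding loop, purely illustrative (the stand-in model() bears no relation to Dream's forward pass):
```python
# Toy sketch of mask-based diffusion decoding: start from an all-[MASK]
# canvas, run the full sequence each step (hence no kv-cache), and
# unmask the most confident position until nothing is masked.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, LEN, MASK = 100, 8, -1

def model(tokens):                      # stand-in for the real forward pass
    return rng.random((LEN, VOCAB))     # fake per-position logits

tokens = np.full(LEN, MASK)
while (tokens == MASK).any():
    logits = model(tokens)
    conf, pred = logits.max(-1), logits.argmax(-1)
    conf[tokens != MASK] = -np.inf      # only consider still-masked slots
    i = int(conf.argmax())              # commit the most confident position
    tokens[i] = pred[i]
print(tokens)
```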
2025-07-16 20:03:51 +08:00
Gabriel Larson
4a4f426944
model : add Kimi-K2 support ( #14654 )
...
* Kimi-K2 conversion
* add Kimi_K2 pre type
* Kimi-K2
* Kimi-K2 unicode
* Kimi-K2
* LLAMA_MAX_EXPERTS 384
* fix vocab iteration
* regex space fix
* add kimi-k2 to pre_computed_hashes
* Updated with kimi-k2 get_vocab_base_pre hash
* fix whitespaces
* fix flake errors
* remove more unicode.cpp whitespaces
* change set_vocab() flow
* add moonshotai-Kimi-K2.jinja to /models/templates/
* update moonshotai-Kimi-K2.jinja
* add kimi-k2 chat template
* add kimi-k2
* update NotImplementedError
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* except Exception
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* LLM_CHAT_TEMPLATE_KIMI_K2 if(add_ass){}
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
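The "pre_computed_hashes" and "get_vocab_base_pre hash" bullets refer to the tokenizer fingerprint scheme in convert_hf_to_gguf.py: encode a fixed check string and hash the resulting token ids, so each pre-tokenizer variant gets a stable identifier. A sketch of that scheme; the check text below is a stand-in, not the one from the script:
```python
# Sketch of the "chkhsh" tokenizer fingerprint: hash the token ids that
# a fixed check string produces. The real check text and hash table live
# in convert_hf_to_gguf.py / convert_hf_to_gguf_update.py.
from hashlib import sha256
from transformers import AutoTokenizer

chktxt = "Hello 世界 \n\n  123 !!"            # stand-in check text
tok = AutoTokenizer.from_pretrained("moonshotai/Kimi-K2-Instruct",
                                    trust_remote_code=True)
chkhsh = sha256(str(tok.encode(chktxt)).encode()).hexdigest()
print(chkhsh)  # compared against the table of known hashes
```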
2025-07-15 21:54:22 +02:00
Shunta Saito
68e37a61a7
model : add PLaMo-2 support ( #14560 )
...
* Add PLaMo-2 model using hybrid memory module
* Fix z shape
* Add cmath to include from llama-vocab.h
* Explicitly dequantize normalization weights before RoPE apply
* Revert unnecessary cast because the problem can be solved by excluding attn_k, attn_q when quantizing
* Use ATTN_K/Q_NORM for k,q weights to prevent quantization
* Remove SSM_BCDT, which is not used anywhere
* Do not duplicate embedding weights for output.weight
* Fix tokenizer encoding problem for multibyte strings
* Apply suggestion from @CISC
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Use LLM_FFN_SWIGLU instead of splitting ffn_gate and ffn_up
* Remove unnecessary part for Grouped Query Attention
* Fix how to load special token id to gguf
* Remove unused tensor mapping
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Remove llama_vocab_plamo2 class and replace it with llm_tokenizer_plamo2_session to follow the other tokenizer implementations
* Update src/llama-vocab.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Fix plamo2 tokenizer session to prevent multiple calls of build()
---------
Co-authored-by: Francis Couture-Harpin <git@compilade.net >
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
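The "Use LLM_FFN_SWIGLU instead of splitting ffn_gate and ffn_up" bullet refers to the fused SwiGLU FFN path. What that computes, sketched in numpy; shapes are arbitrary and only the dataflow mirrors the fused FFN:
```python
# SwiGLU FFN dataflow: silu(x @ W_gate) * (x @ W_up), then W_down.
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
print(swiglu_ffn(x, rng.standard_normal((16, 64)),
                 rng.standard_normal((16, 64)),
                 rng.standard_normal((64, 16))).shape)  # (4, 16)
```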
2025-07-15 18:11:42 +02:00
Tarek Dakhran
f5e96b368f
model : support LiquidAI LFM2 hybrid family ( #14620 )
...
**Important**
LFM2 was [merged](https://github.com/huggingface/transformers/pull/39340) into transformers, but has not yet been released.
To convert to GGUF, install transformers from source:
```shell
pip install "transformers @ git+https://github.com/huggingface/transformers.git@main "
```
2025-07-11 20:27:01 +02:00
Dowon
576c82eda2
vocab : add midm-2.0 model pre-tokenizer ( #14626 )
2025-07-11 09:36:04 +02:00
Ryan Mangeno
4bb625b713
Smoldocling support ( #14597 )
...
* support for smoldocling
* fixed merge conflicts
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com >
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com >
* merge conflicts
* pre tokenizer merge fix
* convert : fix smollm3 jinja template (#14586 )
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com >
* support for smoldocling
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com >
* fixed merge conflicts
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com >
* Update src/llama-vocab.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-model.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* safetensors tensor mapping
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com >
* restored the accidentally removed clean-spaces handling for hunyuan
* Update src/llama-vocab.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* updated hash and reordered model list
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-vocab.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update include/llama.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update convert_hf_to_gguf_update.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-vocab.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* removed old tensor name
* removed tensor mappings -> handled by smolvlm
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
---------
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com >
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com >
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co >
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
Co-authored-by: compilade <git@compilade.net >
2025-07-10 19:41:00 +02:00
Dowon
ffd59e7d18
model : add skt/A.X-4.0 model vocabulary ( #14589 )
2025-07-09 11:22:31 +03:00
ibrahim khadraoui
04655063c4
model : add support for Falcon-H1 family ( #14534 )
...
* v1
* push more fixes
* another fix
* fix
* more fixes
* minor fix
* more cleaning on python code
* python fixes
* changed precision for multipliers float 32->64
* fixes
* another fix
* fix
* pre-norm -> norm
* fix
* Revert "fix"
This reverts commit 243e4d1a50 .
* fix
* small fix ffn_norm
* try
* mix instead of max
* fix vocab size
* conflict solve
* fixed multipliers
* falcon-h1 specific vocab resolved
* read arch from gguf.MODEL_ARCH
* mamba_d_ssm added to d_inner find_hparam
* remove unused functions from gguf_writer.py
* override modify_tensors instead of get_tensors
* fix conversion and d_inner
* added some cb functions for debugging purposes
* inp_out_ids moved outside of layers loop
* mup_vec create as float64
* fix rope_theta
* injected mup
* clean ups
* rm extra space
* rm unused MAMBA_CHUNK_SIZE
* rm unused key
* add bos False
* changed ROPE_TYPE
* cleaning debugging stuff
* cleaning debug quant
* fix comment
* some cleanups
* some cleanups
* Update src/llama-model-loader.cpp
* more cleanups
* moe cleanups
* d_ssm -> d_inner;
* cleaning unused hparams
* cleanup
* more cleanups
* more cleanups on python conversion;
* minor cleanups
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* remove todo
* added falcon-h1
* tensor not required
* clean
* remove unneeded attributes
* more cleanups and fixed conversion
* remove final_norm
* flake8 fixes
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* flake8 fixes
* Update src/llama-hparams.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-arch.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* added hashes
* Update src/llama-arch.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update src/llama-vocab.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* update the update file
* Revert "update the update file"
This reverts commit 082ab4ad2a .
* fix: address suggestions
* fix: update convert_hf_to_gguf.py
* Update gguf-py/gguf/constants.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-model-loader.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* d_inner fixed
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* reshaping ssm_norm for 34B
* removing generate_mup
* remove duplicates metadata keys
* rm comment
* final comment
* fix unused args
* fix constants
* fix bad merge
* Update src/llama-model.cpp
Co-authored-by: compilade <git@compilade.net >
* falcon-h1: remove unused ssm_in_b and bad merge
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* falcon-h1: fix last comment
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <git@compilade.net >
* falcon-h1: revert add_add_bos(False)
* falcon-h1: fix tied weights
* falcon-h1: remove whitespace
* falcon-h1: fix wrong size param
* falcon-h1: fix whitespace issues
---------
Co-authored-by: younesbelkada <younes.belkada@tii.ae >
Co-authored-by: Younes B <49240599+younesbelkada@users.noreply.github.com >
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
Co-authored-by: compilade <git@compilade.net >
2025-07-09 10:03:49 +02:00
Xuan-Son Nguyen
8f22dc0a53
model : add hunyuan moe ( #14425 )
...
* model : add hunyuan moe
* tokenizer ok
* fix tensor name
* cgraph init
* chat template
* wip
* almost working
* skip embed, fix bos
* cleanup
* yarn scaling
* cleanup
* correct rope type
* failed token fix
* ntk alpha freq_base
* tokenization working
* cleanup and pr changes
* vocab_size sanity check
* ntk alpha generic
* Update convert_hf_to_gguf.py
* Apply suggestions from code review
* fix regression
* fix style
---------
Co-authored-by: kooshi <1934337+kooshi@users.noreply.github.com >
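The "ntk alpha freq_base" bullet refers to NTK-aware RoPE scaling: rather than interpolating positions, the rotary base is stretched by an alpha factor. A one-liner under the commonly used derivation, base' = base * alpha ** (d / (d - 2)) with d the rotary dimension; the values below are illustrative, not Hunyuan's actual hyperparameters:
```python
# NTK-aware RoPE scaling of the rotary base (common derivation; the
# alpha and rot_dim values here are illustrative only).
def ntk_freq_base(base: float, alpha: float, rot_dim: int) -> float:
    return base * alpha ** (rot_dim / (rot_dim - 2))

print(ntk_freq_base(10000.0, alpha=8.0, rot_dim=128))  # ~82,700
```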
2025-07-08 11:24:06 +03:00
Sigbjørn Skjæret
22015b2092
lint : remove trailing whitespace ( #14304 )
2025-06-20 16:37:44 +02:00
Ruikai Peng
dd6e6d0b6a
vocab : prevent tokenizer overflow ( #14301 )
...
* vocab : prevent stack overflow in tokenize
* vocab : return error instead of aborting on oversized token count
* vocab : INT32_MIN from llama_tokenize on overflow
2025-06-20 07:13:06 -07:00
Sigbjørn Skjæret
88fc854b4b
llama : improve sep token handling ( #14272 )
2025-06-20 14:04:09 +02:00
fanyang
456af35eb7
build : suppress gcc15 compile warnings ( #14261 )
...
* Change _contains_any() substrs to std::string_view and fix the find comparison logic.
2025-06-19 14:49:48 +02:00
Bartowski
d7da8dc83a
model : Add support for Arcee AI's upcoming AFM model ( #14185 )
...
* Add Arcee AFM support
* Add draft update code
* Fix linter and update URL, may still not be final
* Update src/llama-model.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
* Remove accidental blank line
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
2025-06-16 01:04:06 +02:00
Georgi Gerganov
fb85a288d7
vocab : fix build ( #14175 )
...
ggml-ci
2025-06-13 20:03:05 +03:00
Guy Goldenberg
3cfbbdb44e
Merge commit from fork
...
* vocab : prevent integer overflow during load
* Add static cast and GGML_ABORT
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2025-06-13 19:20:25 +03:00
Georgi Gerganov
c33fe8b8c4
vocab : prevent heap overflow when vocab is too small ( #14145 )
...
ggml-ci
2025-06-13 08:03:54 +03:00
Sigbjørn Skjæret
9f47fa5792
vocab : warn about missing mask token ( #14022 )
2025-06-05 09:29:18 +02:00
Sigbjørn Skjæret
5e1c3aed40
convert : fix nomic-bert-moe mask token ( #13757 )
2025-06-01 18:07:21 +02:00
Sigbjørn Skjæret
c3a2624339
vocab : fix ugm tokenizer precision ( #13743 )
2025-05-24 12:29:09 +02:00
Johannes Gäßler
10d2af0eaa
llama/ggml: add LLM training support ( #10544 )
...
* llama/ggml: add LLM training support
more compact progress bar
llama_save_model_to_file
llama_opt_param_filter
ggml_graph_dup force_grads
refactor ggml_opt, fix test-opt
* remove logits_all
* refactor CUDA implementation for ACC
* reset graph at beginning of opt period
2025-05-12 14:44:49 +02:00
Sigbjørn Skjæret
d2a4ef05c6
vocab : add ByteDance-Seed/Seed-Coder ( #13423 )
2025-05-10 22:08:07 +02:00
Xuan-Son Nguyen
ecda2ec4b3
mtmd : Support Pixtral 12B ( #13065 )
...
* add pixtral text model (vision is wip)
* cgraph ok, just missing 2D RoPE
* fix bad rebase
* first working version
* fix problem with img_break token
* support dynamic image size
* update docs
* update test script
2025-04-23 20:21:59 +02:00
Mikko Juola
971f245b3b
llama : recognize IBM Granite 3.3 FIM tokens ( #12988 )
...
Granite's FIM tokens are very similar to Qwen's; they just use an
underscore instead of a dash, e.g. <fim_middle> instead of <fim-middle>.
Opening up tokenizer_config.json in ibm-granite/granite-3.3-8b-base
shows:
```
"<fim_prefix>",
"<fim_middle>",
"<fim_suffix>",
"<fim_pad>",
...
"<reponame>",
```
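With these tokens recognized, an infill prompt follows the usual prefix/suffix/middle layout. A sketch; exact whitespace conventions vary per model family, so treat this as illustrative:
```python
# FIM prompt layout with Granite's underscore-spelled tokens: the model
# generates the span that belongs at <fim_middle>.
prefix = 'def helloworld():\n    print("hell'
suffix = '\n    print("goodbye world")\n'
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
print(prompt)
```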
2025-04-17 11:37:05 +03:00
Yuxuan Zhang
06bb53ad9b
llama-model : add Glm4Model implementation for GLM-4-0414 ( #12867 )
...
* GLM-4-0414
* use original one
* Using with tensor map
* fix bug
* change order
* change order
* format with flake8
2025-04-11 12:10:10 +02:00
Xuan-Son Nguyen
1466621e73
llama : Support llama 4 text-only ( #12791 )
...
* llama4 conversion
* initial support, no chat template
* clean up a bit
* fix tokenizer conversion
* correct hparams
* try this
* fix shexp
* ffn_inp_normed
* chat template
* clean up model conversion
* add_bos
* add scale_before_ffn
* fix order
* weight_before_ffn
* llm_graph_input_attn_temp
* add chunk attn mask
* build_inp_attn_scale()
* add comment about ggml_repeat
* clarify comments
* fix build
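The "add chunk attn mask" bullet refers to Llama 4's chunked local attention: in the chunked layers a query may only attend to earlier positions within its own chunk (Llama 4 uses 8192-token chunks). A numpy sketch of that mask with a toy chunk size:
```python
# Chunked causal mask: position i attends to j iff j <= i and both fall
# in the same chunk. Toy size here; Llama 4 uses 8192-token chunks.
import numpy as np

def chunked_causal_mask(n: int, chunk: int) -> np.ndarray:
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (i // chunk == j // chunk)

print(chunked_causal_mask(8, chunk=4).astype(int))
```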
2025-04-07 23:06:44 +02:00
yumeyao
5dd5d1ab00
vocab : use string_view::find() to avoid unnecessary looking up beyond the fragment range ( #12706 )
2025-04-03 18:32:54 +03:00
Sigbjørn Skjæret
83a88bd6af
vocab : BailingMoE : change possessive quantifiers to greedy ( #12677 )
2025-04-02 11:21:48 +02:00
Daniel Bevenius
c80a7759da
vocab : add special infill tokens for CodeLlama ( #11850 )
...
* vocab : add special infill tokens for CodeLlama
The commit adds the following special tokens for CodeLlama infill:
- `▁<PRE>`
- `▁<SUF>`
- `▁<MID>`
The motivation for this is that currently the infill example uses
CodeLlama as a suggested model. But when using this model the following
error is generated:
```console
/llama.cpp-debug/examples/infill/infill.cpp:165: GGML_ASSERT(llama_vocab_fim_pre(vocab) >= 0) failed
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
305251 Aborted (core dumped)
./build/bin/llama-infill -t 10 -ngl 0 -m models/codellama-13b.Q5_K_S.gguf \
-c 4096 --temp 0.7 --repeat_penalty 1.1 -n 20 \
--in-prefix "def helloworld():\n print(\"hell" \
--in-suffix "\n print(\"goodbye world\")\n "
```
* squash! vocab : add special infill tokens for CodeLlama
Add _<EOT> as well.
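For reference, CodeLlama's infill layout with the tokens added above, as a plain-text sketch (the real tokens carry the leading "▁" and are added by the tokenizer as single ids, so the spacing here is only illustrative):
```python
# CodeLlama infill prompt: <PRE> prefix <SUF> suffix <MID>; generation
# continues after <MID> and stops at <EOT>.
prefix = 'def helloworld():\n    print("hell'
suffix = '\n    print("goodbye world")\n    '
prompt = f"<PRE> {prefix} <SUF>{suffix} <MID>"
print(prompt)
```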
2025-03-31 18:40:56 +02:00
Sigbjørn Skjæret
2c3f8b850a
llama : support BailingMoE (Ling) ( #12634 )
2025-03-30 22:21:03 +02:00
Juyoung Suk
b3de7cac73
llama : add Trillion 7B model support ( #12556 )
...
* Support Trillion 7B
* Update llama.h
* Update llama.h
* Update llama-vocab.cpp for Trillion
* Update llama-vocab.cpp
2025-03-30 20:38:33 +02:00
compilade
00d53800e0
llama-vocab : add SuperBPE pre-tokenizer ( #12532 )
2025-03-24 11:47:24 +01:00
mgroeber9110
5bbe6a9fe9
ggml : portability fixes for VS 2017 ( #12150 )
...
* Add include files for std::min/max and std::toupper/tolower
* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined
* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode
* win32: only use __restrict in MSVC if C11/C17 support is not enabled
---------
Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com >
2025-03-04 18:53:26 +02:00
Xuan-Son Nguyen
c43a3e7996
llama : add Phi-4-mini support (supersede #12099 ) ( #12108 )
...
* Added Phi-4-mini-instruct support
* Update regex per ngxson
* Change the vocab base to Xenova/gpt-4o
* fix conversion update script
* no need to check longrope
* minor style fix
* fix python style
---------
Co-authored-by: Nicholas Sparks <nisparks@microsoft.com >
2025-02-28 12:44:11 +01:00
mgroeber9110
ffd0821c57
vocab : correctly identify LF token for GPT-2 style BPE tokenizer ( #11496 )
2025-01-30 12:10:59 +02:00
lexasub
a5203b4465
llama : minor fixes for up llama load model speed ( #11448 )
...
* impl::load : change bpe_ranks from std::map to std::unordered_map, reducing impl::load time by ~30%
* llama_model_loader::init_mapping : replace new llama_mmap with std::make_unique<llama_mmap> for cleaner code and to halve init_mappings time
* Update src/llama-vocab.cpp
---------
Co-authored-by: lexasub <empty@empty.ru >
Co-authored-by: Diego Devesa <slarengh@gmail.com >
2025-01-27 14:42:09 +01:00
Xuan Son Nguyen
ec7f3ac9ab
llama : add support for Deepseek-R1-Qwen distill model ( #11310 )
...
* llama : add support for Deepseek-R1-Qwen distill model
* coding style
2025-01-20 14:35:07 +01:00
Georgi Gerganov
a133566d34
vocab : fix double-eos check ( #11273 )
...
ggml-ci
2025-01-17 09:28:00 +02:00
Georgi Gerganov
bbf3e55e35
vocab : add dummy tokens for "no_vocab" type ( #11231 )
...
* vocab : add dummy tokens for "no_vocab" type
ggml-ci
* vocab : minor [no ci]
2025-01-14 11:54:58 +01:00
Daniel Bevenius
8f70fc3d1b
llama : remove 'd' from bad special token log ( #11212 )
...
This commit removes the 'd' from the log message in llama-vocab.cpp
when logging a bad special token.
The motivation for this is that currently the output can look something
like the following:
```console
load: bad special token:
'tokenizer.ggml.image_token_id' = 128256d, using default id -1
```
2025-01-13 13:38:20 +01:00
Georgi Gerganov
08f10f69c3
llama : remove notion of CLS token ( #11064 )
...
ggml-ci
2025-01-12 12:15:53 +02:00
Georgi Gerganov
afa8a9ec9b
llama : add llama_vocab, functions -> methods, naming ( #11110 )
...
* llama : functions -> methods (#11110 )
* llama : add struct llama_vocab to the API (#11156 )
ggml-ci
* hparams : move vocab params to llama_vocab (#11159 )
ggml-ci
* vocab : more pimpl (#11165 )
ggml-ci
* vocab : minor tokenization optimizations (#11160 )
ggml-ci
Co-authored-by: Diego Devesa <slarengh@gmail.com >
* lora : update API names (#11167 )
ggml-ci
* llama : update API names to use correct prefix (#11174 )
* llama : update API names to use correct prefix
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* minor [no ci]
* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174 )
ggml-ci
* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174 )
ggml-ci
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com >
2025-01-12 11:32:42 +02:00
Georgi Gerganov
727368c60f
llama : use LLAMA_TOKEN_NULL ( #11062 )
...
ggml-ci
2025-01-06 10:52:15 +02:00