f5e96b368f | Tarek Dakhran | 2025-07-11 20:27:01 +02:00
model : support LiquidAI LFM2 hybrid family (#14620)

**Important**
LFM2 was [merged](https://github.com/huggingface/transformers/pull/39340) into transformers, but has not yet been released.
To convert to GGUF, install transformers from source:
```shell
pip install "transformers @ git+https://github.com/huggingface/transformers.git@main"
```

576c82eda2 | Dowon | 2025-07-11 09:36:04 +02:00
vocab : add midm-2.0 model pre-tokenizer (#14626)

4bb625b713 | Ryan Mangeno | 2025-07-10 19:41:00 +02:00
Smoldocling support (#14597)

						* support for smoldocling
* fixed merge conflicts
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com >
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com >
* merge conflicts
* pre tokenizer merge fix
* convert : fix smollm3 jinja template (#14586 )
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com >
* support for smoldocling
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com >
* fixed merge conflicts
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com >
* Update src/llama-vocab.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-model.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* safetensors tensor mapping
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com >
* added back accidental removal of clean spaces for hunyuan
* Update src/llama-vocab.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* updated hash and reordered model list
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-vocab.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update include/llama.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update convert_hf_to_gguf_update.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-vocab.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* removed old tensor name
* removed tensor mappings -> handled by smolvlm
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
---------
Signed-off-by: ryan-mangeno <ryanmangeno@gmail.com >
Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com >
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co >
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
Co-authored-by: compilade <git@compilade.net > 
ffd59e7d18 | Dowon | 2025-07-09 11:22:31 +03:00
model : add skt/A.X-4.0 model vocabulary (#14589)

04655063c4 | ibrahim khadraoui | 2025-07-09 10:03:49 +02:00
model : add support for Falcon-H1 family (#14534)

						* v1
* push more fixes
* another fix
* fix
* more fixes
* minor fix
* more cleaning on python code
* python fixes
* changed precision for multipliers float 32->64
* fixes
* another fix
* fix
* pre-norm -> norm
* fix
* Revert "fix"
This reverts commit 243e4d1a50 .
* fix
* small fix ffn_norm
* try
* mix instead of max
* fix vocab size
* conflict solve
* fixed multipliers
* falcon-h1 specific vocab resolved
* read arch from gguf.MODEL_ARCH
* mamba_d_ssm added to d_inner find_hparam
* remove unused functions from gguf_writer.py
* override modify_tensors instead of get_tensors
* fix conversion and d_inner
* added some cb functions for debugging purposes
* inp_out_ids moved outside of layers loop
* mup_vec create as float64
* fix rope_theta
* injected mup
* clean ups
* rm extra space
* rm unused MAMBA_CHUNK_SIZE
* rm unused key
* add bos False
* changed ROPE_TYPE
* cleaning debugging stuff
* cleaning debug quant
* fix comment
* some cleanups
* some cleanups
* Update src/llama-model-loader.cpp
* more cleanups
* moe cleanups
* d_ssm -> d_inner;
* cleaning unused hparams
* cleanup
* more cleanups
* more cleanups on python conversion;
* minor cleanups
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* remove todo
* added falcon-h1
* tensor not required
* clean
* remove unneeded attributes
* more cleanups and fixed conversion
* remove final_norm
* flake8 fixes
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* flake8 fixes
* Update src/llama-hparams.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-arch.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* added hashes
* Update src/llama-arch.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update src/llama-vocab.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* update the update file
* Revert "update the update file"
This reverts commit 082ab4ad2a .
* fix: address suggestions
* fix: update convert_hf_to_gguf.py
* Update gguf-py/gguf/constants.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* Update src/llama-model-loader.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* d_inner fixed
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* reshaping ssm_norm for 34B
* removing generate_mup
* remove duplicates metadata keys
* rm comment
* final comment
* fix unused args
* fix constants
* fix bad merge
* Update src/llama-model.cpp
Co-authored-by: compilade <git@compilade.net >
* falcon-h1: remove unused ssm_in_b and bad merge
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
* falcon-h1: fix last comment
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <git@compilade.net >
* falcon-h1: revert add_add_bos(False)
* falcon-h1: fix tied weights
* falcon-h1: remove whitespace
* falcon-h1: fix wrong size param
* falcon-h1: fix whitespace issues
---------
Co-authored-by: younesbelkada <younes.belkada@tii.ae >
Co-authored-by: Younes B <49240599+younesbelkada@users.noreply.github.com >
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com >
Co-authored-by: compilade <git@compilade.net > 
8f22dc0a53 | Xuan-Son Nguyen | 2025-07-08 11:24:06 +03:00
model : add hunyuan moe (#14425)

						* model : add hunyuan moe
* tokenizer ok
* fix tensor name
* cgraph init
* chat template
* wip
* almost working
* skip embed, fix bos
* cleanup
* yarn scaling
* cleanup
* correct rope type
* failed token fix
* ntk alpha freq_base
* tokenization working
* cleanup and pr changes
* vocab_size sanity check
* ntk alpha generic
* Update convert_hf_to_gguf.py
* Apply suggestions from code review
* fix regression
* fix style
---------
Co-authored-by: kooshi <1934337+kooshi@users.noreply.github.com > 
22015b2092 | Sigbjørn Skjæret | 2025-06-20 16:37:44 +02:00
lint : remove trailing whitespace (#14304)

dd6e6d0b6a | Ruikai Peng | 2025-06-20 07:13:06 -07:00
vocab : prevent tokenizer overflow (#14301)

						* vocab : prevent stack overflow in tokenize
* vocab : return error instead of aborting on oversized token count
* vocab : INT32_MIN from llama_tokenize on overflow 
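The last bullet defines a new sentinel for callers. Below is a minimal sketch of handling it, assuming the llama.h C API in which an ordinary negative return from llama_tokenize() means the output buffer was too small (with the magnitude giving the required count); the helper name tokenize_checked is illustrative, not part of the library.
```cpp
#include <climits>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>
#include "llama.h"

// illustrative helper: tokenize with an explicit overflow check
std::vector<llama_token> tokenize_checked(const llama_vocab * vocab, const std::string & text) {
    std::vector<llama_token> tokens(text.size() + 2); // rough initial guess
    int32_t n = llama_tokenize(vocab, text.c_str(), (int32_t) text.size(),
                               tokens.data(), (int32_t) tokens.size(),
                               /*add_special=*/true, /*parse_special=*/false);
    if (n == INT32_MIN) {
        // new sentinel from this change: the token count itself overflows int32_t
        throw std::runtime_error("input too large to tokenize");
    }
    if (n < 0) {
        tokens.resize((size_t) -n); // buffer too small: -n is the required size
        n = llama_tokenize(vocab, text.c_str(), (int32_t) text.size(),
                           tokens.data(), (int32_t) tokens.size(), true, false);
    }
    tokens.resize((size_t) n);
    return tokens;
}
```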
88fc854b4b | Sigbjørn Skjæret | 2025-06-20 14:04:09 +02:00
llama : improve sep token handling (#14272)

456af35eb7 | fanyang | 2025-06-19 14:49:48 +02:00
build : suppress gcc15 compile warnings (#14261)

						* Change _contains_any() substrs to std::string_view and fix the find comparison logic. 
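_contains_any() is an internal helper, so the reconstruction below is an assumption based on the bullet above; it illustrates both halves of the fix: std::string_view parameters avoid temporary std::string copies, and the result of find() is compared against npos.
```cpp
#include <initializer_list>
#include <string_view>

// hypothetical reconstruction of the helper described above
static bool contains_any(std::string_view haystack,
                         std::initializer_list<std::string_view> needles) {
    for (std::string_view needle : needles) {
        if (haystack.find(needle) != std::string_view::npos) { // npos means "not found"
            return true;
        }
    }
    return false;
}
```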
d7da8dc83a | Bartowski | 2025-06-16 01:04:06 +02:00
model : Add support for Arcee AI's upcoming AFM model (#14185)

* Add Arcee AFM support
* Add draft update code
* Fix linter and update URL, may still not be final
* Update src/llama-model.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Remove accidental blank line
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

fb85a288d7 | Georgi Gerganov | 2025-06-13 20:03:05 +03:00
vocab : fix build (#14175)

ggml-ci

3cfbbdb44e | Guy Goldenberg | 2025-06-13 19:20:25 +03:00
Merge commit from fork

						* vocab : prevent integer overflow during load
* Add static cast and GGML_ABORT
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com > 
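A minimal sketch of the guard pattern those bullets describe, assuming ggml's GGML_ABORT macro; the function name and the specific value being validated are illustrative, not taken from the patch.
```cpp
#include <climits>
#include <cstdint>
#include "ggml.h"

// illustrative: validate a count read from the model file before it feeds
// 32-bit arithmetic, aborting loudly instead of overflowing silently
static void check_loaded_count(uint64_t n) {
    if (n > (uint64_t) INT32_MAX) {
        GGML_ABORT("loaded count %llu overflows int32_t", (unsigned long long) n);
    }
}
```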
c33fe8b8c4 | Georgi Gerganov | 2025-06-13 08:03:54 +03:00
vocab : prevent heap overflow when vocab is too small (#14145)

ggml-ci

9f47fa5792 | Sigbjørn Skjæret | 2025-06-05 09:29:18 +02:00
vocab : warn about missing mask token (#14022)

5e1c3aed40 | Sigbjørn Skjæret | 2025-06-01 18:07:21 +02:00
convert : fix nomic-bert-moe mask token (#13757)

c3a2624339 | Sigbjørn Skjæret | 2025-05-24 12:29:09 +02:00
vocab : fix ugm tokenizer precision (#13743)

10d2af0eaa | Johannes Gäßler | 2025-05-12 14:44:49 +02:00
llama/ggml: add LLM training support (#10544)

						* llama/ggml: add LLM training support
more compact progress bar
llama_save_model_to_file
llama_opt_param_filter
ggml_graph_dup force_grads
refactor ggml_opt, fix test-opt
* remove logits_all
* refactor CUDA implementation for ACC
* reset graph at beginning of opt period 
d2a4ef05c6 | Sigbjørn Skjæret | 2025-05-10 22:08:07 +02:00
vocab : add ByteDance-Seed/Seed-Coder (#13423)

ecda2ec4b3 | Xuan-Son Nguyen | 2025-04-23 20:21:59 +02:00
mtmd : Support Pixtral 12B (#13065)

						* add pixtral text model (vision is wip)
* cgraph ok, just missing 2D RoPE
* fix bad rebase
* first working version
* fix problem with img_break token
* support dynamic image size
* update docs
* update test script 
971f245b3b | Mikko Juola | 2025-04-17 11:37:05 +03:00
llama : recognize IBM Granite 3.3 FIM tokens (#12988)

Granite's FIM tokens are very similar to Qwen's; they just use an underscore
instead of a dash, e.g. <fim_middle> instead of <fim-middle>.
Opening tokenizer_config.json in ibm-granite/granite-3.3-8b-base shows:
```
    "<fim_prefix>",
    "<fim_middle>",
    "<fim_suffix>",
    "<fim_pad>",
    ...
    "<reponame>",
```
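Once recognized, the tokens surface through the usual vocab accessors. A short sketch, assuming the llama.h API where LLAMA_TOKEN_NULL marks a special token the vocab does not define:
```cpp
#include "llama.h"

// check that a loaded model exposes the FIM special tokens
bool has_fim_tokens(const llama_model * model) {
    const llama_vocab * vocab = llama_model_get_vocab(model);
    return llama_vocab_fim_pre(vocab) != LLAMA_TOKEN_NULL   // <fim_prefix>
        && llama_vocab_fim_suf(vocab) != LLAMA_TOKEN_NULL   // <fim_suffix>
        && llama_vocab_fim_mid(vocab) != LLAMA_TOKEN_NULL;  // <fim_middle>
}
```
The usual FIM prompt layout then places the code before the cursor after <fim_prefix>, the code after the cursor after <fim_suffix>, and starts generation after <fim_middle>.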
06bb53ad9b | Yuxuan Zhang | 2025-04-11 12:10:10 +02:00
llama-model : add Glm4Model implementation for GLM-4-0414 (#12867)

* GLM-4-0414
* use original one
* Using with tensor map
* fix bug
* change order
* change order
* format with flake8

1466621e73 | Xuan-Son Nguyen | 2025-04-07 23:06:44 +02:00
llama : Support llama 4 text-only (#12791)

						* llama4 conversion
* initial support, no chat template
* clean up a bit
* fix tokenizer conversion
* correct hparams
* try this
* fix shexp
* ffn_inp_normed
* chat template
* clean up model conversion
* add_bos
* add scale_before_ffn
* fix order
* weight_before_ffn
* llm_graph_input_attn_temp
* add chunk attn mask
* build_inp_attn_scale()
* add comment about ggml_repeat
* clarify comments
* fix build 
5dd5d1ab00 | yumeyao | 2025-04-03 18:32:54 +03:00
vocab : use string_view::find() to avoid unnecessary looking up beyond the fragment range (#12706)

83a88bd6af | Sigbjørn Skjæret | 2025-04-02 11:21:48 +02:00
vocab : BailingMoE : change possessive quantifiers to greedy (#12677)

c80a7759da | Daniel Bevenius | 2025-03-31 18:40:56 +02:00
vocab : add special infill tokens for CodeLlama (#11850)

						* vocab : add special infill tokens for CodeLlama
The commit adds the following special tokens for CodeLlama infill:
- `▁<PRE>`
- `▁<SUF>`
- `▁<MID>`
The motivation for this is that currently the infill example uses
CodeLlama as a suggested model. But when using this model the following
error is generated:
```console
/llama.cpp-debug/examples/infill/infill.cpp:165: GGML_ASSERT(llama_vocab_fim_pre(vocab) >= 0) failed
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
305251 Aborted                 (core dumped)
./build/bin/llama-infill -t 10 -ngl 0 -m models/codellama-13b.Q5_K_S.gguf \
  -c 4096 --temp 0.7 --repeat_penalty 1.1 -n 20 \
  --in-prefix "def helloworld():\n    print(\"hell" \
  --in-suffix "\n   print(\"goodbye world\")\n    "
```
* squash! vocab : add special infill tokens for CodeLlama
Add _<EOT> as well. 
2c3f8b850a | Sigbjørn Skjæret | 2025-03-30 22:21:03 +02:00
llama : support BailingMoE (Ling) (#12634)

b3de7cac73 | Juyoung Suk | 2025-03-30 20:38:33 +02:00
llama : add Trillion 7B model support (#12556)

* Support Trillion 7B
* Update llama.h
* Update llama.h
* Update llama-vocab.cpp for Trillion
* Update llama-vocab.cpp

00d53800e0 | compilade | 2025-03-24 11:47:24 +01:00
llama-vocab : add SuperBPE pre-tokenizer (#12532)

5bbe6a9fe9 | mgroeber9110 | 2025-03-04 18:53:26 +02:00
ggml : portability fixes for VS 2017 (#12150)

* Add include files for std::min/max and std::toupper/tolower
* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined
* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode
* win32: only use __restrict in MSVC if C11/C17 support is not enabled
---------
Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com>

c43a3e7996 | Xuan-Son Nguyen | 2025-02-28 12:44:11 +01:00
llama : add Phi-4-mini support (supersede #12099) (#12108)

* Added Phi-4-mini-instruct support
* Update regex per ngxson
* Change the vocab base to Xenova/gpt-4o
* fix conversion update script
* no need to check longrope
* minor style fix
* fix python style
---------
Co-authored-by: Nicholas Sparks <nisparks@microsoft.com>

ffd0821c57 | mgroeber9110 | 2025-01-30 12:10:59 +02:00
vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496)

a5203b4465 | lexasub | 2025-01-27 14:42:09 +01:00
llama : minor fixes to speed up model loading (#11448)

* impl::load: change bpe_ranks from std::map to std::unordered_map, reducing impl::load time by ~30%
* llama_model_loader::init_mapping: replace `new llama_mmap` with std::make_unique<llama_mmap> for cleaner code, halving the running time of init_mappings
* Update src/llama-vocab.cpp
---------
Co-authored-by: lexasub <empty@empty.ru>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
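A sketch of the first change, under the assumption that bpe_ranks is keyed by a pair of merge strings as in common BPE implementations; std::unordered_map needs a caller-supplied hash for std::pair, and trades std::map's ordering (unused for rank lookups) for O(1) average access.
```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// std::hash has no std::pair specialization, so supply one (sketch-quality)
struct pair_hash {
    size_t operator()(const std::pair<std::string, std::string> & p) const {
        return std::hash<std::string>{}(p.first) * 31u + std::hash<std::string>{}(p.second);
    }
};

// before: std::map<std::pair<std::string, std::string>, int> bpe_ranks;
using bpe_ranks_t = std::unordered_map<std::pair<std::string, std::string>, int, pair_hash>;
```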
ec7f3ac9ab | Xuan Son Nguyen | 2025-01-20 14:35:07 +01:00
llama : add support for Deepseek-R1-Qwen distill model (#11310)

* llama : add support for Deepseek-R1-Qwen distill model
* coding style

a133566d34 | Georgi Gerganov | 2025-01-17 09:28:00 +02:00
vocab : fix double-eos check (#11273)

ggml-ci

bbf3e55e35 | Georgi Gerganov | 2025-01-14 11:54:58 +01:00
vocab : add dummy tokens for "no_vocab" type (#11231)

* vocab : add dummy tokens for "no_vocab" type
ggml-ci
* vocab : minor [no ci]

8f70fc3d1b | Daniel Bevenius | 2025-01-13 13:38:20 +01:00
llama : remove 'd' from bad special token log (#11212)

This commit removes the 'd' from the log message in llama-vocab.cpp
when logging a bad special token.
The motivation for this is that currently the output can look something
like the following:
```console
load: bad special token:
    'tokenizer.ggml.image_token_id' = 128256d, using default id -1
```

08f10f69c3 | Georgi Gerganov | 2025-01-12 12:15:53 +02:00
llama : remove notion of CLS token (#11064)

ggml-ci

afa8a9ec9b | Georgi Gerganov | 2025-01-12 11:32:42 +02:00
llama : add llama_vocab, functions -> methods, naming (#11110)

						* llama : functions -> methods (#11110 )
* llama : add struct llama_vocab to the API (#11156 )
ggml-ci
* hparams : move vocab params to llama_vocab (#11159 )
ggml-ci
* vocab : more pimpl (#11165 )
ggml-ci
* vocab : minor tokenization optimizations (#11160 )
ggml-ci
Co-authored-by: Diego Devesa <slarengh@gmail.com >
* lora : update API names (#11167 )
ggml-ci
* llama : update API names to use correct prefix (#11174 )
* llama : update API names to use correct prefix
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* minor [no ci]
* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174 )
ggml-ci
* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174 )
ggml-ci
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com > 
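In caller code the rename looks roughly like this; a sketch built from the names listed above, plus llama_model_get_vocab() to obtain the vocab handle.
```cpp
#include <cstdio>
#include "llama.h"

void print_vocab_info(const llama_model * model) {
    const llama_vocab * vocab = llama_model_get_vocab(model);
    printf("n_tokens: %d\n", llama_vocab_n_tokens(vocab));    // was llama_vocab_n_vocab
    printf("add_bos:  %d\n", llama_vocab_get_add_bos(vocab)); // was llama_vocab_add_bos
    printf("add_eos:  %d\n", llama_vocab_get_add_eos(vocab)); // was llama_vocab_add_eos
}
```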
727368c60f | Georgi Gerganov | 2025-01-06 10:52:15 +02:00
llama : use LLAMA_TOKEN_NULL (#11062)

ggml-ci

9394bbd484 | fairydreaming | 2025-01-04 21:06:11 +01:00
llama : Add support for DeepSeek V3 (#11049)

* convert : extend DEEPSEEK2 model architecture to support DeepseekV3ForCausalLM by adding EXPERT_WEIGHTS_NORM and EXPERT_GATING_FUNC model parameters and FFN_EXP_PROBS_B tensor type
* vocab : add DeepSeek V3 pre-tokenizer regexes
* unicode : handle ACCENT_MARK and SYMBOL categories in regex
* llama : add DeepSeek V3 chat template, handle new model parameters and tensor types
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

f66f582927 | Georgi Gerganov | 2025-01-03 10:18:53 +02:00
llama : refactor src/llama.cpp (#10902)

						* llama : scatter llama.cpp into multiple modules (wip)
* llama : control-vector -> adapter
* llama : arch
* llama : mmap
ggml-ci
* ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
* llama : arch (cont)
ggml-ci
* llama : chat
ggml-ci
* llama : model
ggml-ci
* llama : hparams
ggml-ci
* llama : adapter
ggml-ci
* examples : fix
ggml-ci
* rebase
ggml-ci
* minor
* llama : kv cache
ggml-ci
* llama : impl
ggml-ci
* llama : batch
ggml-ci
* cont
ggml-ci
* llama : context
ggml-ci
* minor
* llama : context (cont)
ggml-ci
* llama : model loader
ggml-ci
* common : update lora
ggml-ci
* llama : quant
ggml-ci
* llama : quant (cont)
ggml-ci
* minor [no ci] 
30caac3a68 | Georgi Gerganov | 2024-12-24 09:44:20 +02:00
llama : the WPM vocabs use the CLS token as BOS (#10930)

* llama : the WPM vocabs use the CLS token as BOS
ggml-ci
* llama : add comment

0bf2d10c55 | Georgi Gerganov | 2024-12-18 19:27:21 +02:00
tts : add OuteTTS support (#10784)

						* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : be explicit about the pooling type in the tests
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* llama : add OuteTTS support (wip)
* wip
* extract features
* first conv
* group norm
* resnet conv
* resnet
* attn
* pos net
* layer norm
* convnext
* head
* hann window
* fix n_embd + remove llama.cpp hacks
* compute hann window
* fft
* spectrum processing
* clean-up
* tts : receive input text and generate codes
* clip : fix new conv name
* tts : minor fix
* tts : add header + minor fixes
ggml-ci
* tts : add mathematical constant
ggml-ci
* tts : fix sampling + cut initial noise
* tts : fixes
* tts : update default samplers
ggml-ci
* tts : text pre-processing
* tts : outetts-voc -> wavtokenizer-dec
* tts : remove hardcoded constants
ggml-ci
* tts : fix tensor shapes
* llama : refactor wavtokenizer tensors
ggml-ci
* cont
ggml-ci
* cont [no ci]
* llama : update WavTokenizer to non-causal attn
* llama : handle no-vocab detokenization
* tts : add Python example for OuteTTS (wip)
* tts : extend python example to generate spectrogram
ggml-ci
* server : fix rebase artifacts
* tts : enable "return_tokens" in Python example
ggml-ci
* tts : minor fixes
* common : support HF download for vocoder 
08ea539df2 | Georgi Gerganov | 2024-12-16 12:31:45 +02:00
unicode : improve naming style (#10838)

* unicode : improve naming style
ggml-ci
* cont [no ci]

6fe6247831 | Riccardo Orlando | 2024-12-05 20:30:59 +02:00
llama : add Minerva 7B model support (#10673)

* Support for Minerva 7B
* Update convert_hf_to_gguf_update.py

ff252ea48e | wwoodsTM | 2024-10-25 19:07:34 +03:00
llama : add DRY sampler (#9702)

* sampling : add DRY sampler (post-refactor)
* DRY: Trying to fix coauthors, removed unneeded line
* DRY: Fixed redundant code
* DRY: Fixed crash issue due to DRY being in chain but uninitialized
---------
Co-authored-by: l3utterfly <gc.pthzfoldr@gmail.com>
Co-authored-by: pi6am <34464159+pi6am@users.noreply.github.com>

99bd4ac28c | Georgi Gerganov | 2024-10-17 22:32:47 +03:00
llama : infill sampling handle very long tokens (#9924)

* llama : infill sampling handle very long tokens
ggml-ci
* cont : better indices
ggml-ci

9e04102448 | Daniel Bevenius | 2024-10-16 20:34:28 +03:00
llama : suppress conversion from 'size_t' to 'int' (#9046)

						* llama : suppress conversion from 'size_t' to 'int'
This commit updates llm_tokenizer_spm.tokenize to suppress/remove the
following warnings that are generated on Windows when using MSVC:
```console
src\llama-vocab.cpp(211,1): warning C4267: 'argument':
    conversion from 'size_t' to 'int', possible loss of data
src\llama-vocab.cpp(517,1): warning C4267: 'argument':
    conversion from 'size_t' to 'int', possible loss of data
```
This is done by adding a cast for the size_t returned from
symbols.size(). I believe this is safe as it seems unlikely that
symbols, which stores an entry for each UTF8 character, would become
larger than INT_MAX.
The motivation for this change is to reduce the number of warnings that
are currently generated when building on Windows.
* squash! llama : suppress conversion from 'size_t' to 'int'
Move cast into for loop. 
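The fix reduces to one explicit cast in the loop header; a hedged sketch, since the surrounding tokenizer code is abbreviated here and llm_symbol stands in for the real symbol type.
```cpp
#include <vector>

struct llm_symbol { /* one entry per UTF-8 character */ };

static void process(const std::vector<llm_symbol> & symbols) {
    // the (int) cast silences MSVC C4267; safe under the stated assumption
    // that the symbol count stays below INT_MAX
    for (int i = 0; i < (int) symbols.size(); ++i) {
        // ... tokenizer work on symbols[i] ...
    }
}
```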
755a9b2bf0 | Georgi Gerganov | 2024-10-15 16:35:33 +03:00
llama : add infill sampler (#9896)

ggml-ci