Sam · ef0144c087 · 2025-08-04 20:29:25 +02:00
model: support GLM 4.5 family of models (#14939)

* model: Add GLM 4.5 (#14921)
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Merge in PR suggestions
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* model: Add GLM 4.5 family of models (#14921)
1. Updated tensor_mapping.py with NextN tensor mappings
   - Added proper tensor mappings for all NextN/MTP tensors in /Users/samm/git/llama.cpp/gguf-py/gguf/tensor_mapping.py
   - Added mappings for: eh_proj, embed_tokens, enorm, hnorm, shared_head.head, shared_head.norm
2. Added num_nextn_predict_layers configuration
   - Added LLM_KV_NUM_NEXTN_PREDICT_LAYERS constant to llama-arch.h and llama-arch.cpp
   - Added num_nextn_predict_layers field to llama_hparams struct
   - Updated GLM4_MOE parameter loading in llama-model.cpp to read this parameter
   - Modified tensor loading logic to conditionally load NextN tensors based on num_nextn_predict_layers
   - Added GGUF writer support in gguf_writer.py with add_num_nextn_predict_layers() method
   - Updated conversion script to extract and write this parameter from HuggingFace config
3. Added FIM tokens for GLM4_MOE
   - Added GLM-4.5's FIM tokens to llama-vocab.cpp:
     - <|code_prefix|> for FIM_PRE
     - <|code_suffix|> for FIM_SUF
     - <|code_middle|> for FIM_MID
4. Removed manual NextN tensor handling
   - Removed the special-case handling in convert_hf_to_gguf.py that manually mapped NextN tensors
   - NextN tensors are now handled automatically through the proper tensor mapping system
* glm 4.5 update tensors names
* model: glm 4.5 apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* model: glm 4.5 apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* model: glm 4.5 apply suggestions from code review
* Apply suggestions from code review
* patch broken chat template
* typings fix
* add TENSOR_SKIP flag
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* Update src/llama-model-loader.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>

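The three FIM tokens added above can be sketched in use as follows. This is only an illustration: it assumes the common prefix/suffix/middle (PSM) prompt layout, and the token-order assumption is mine, not taken from the commit — the layout GLM-4.5 actually expects is determined by its chat template.

```python
# Illustrative sketch only: assembling a fill-in-the-middle (FIM) prompt
# from the GLM-4.5 special tokens named in the commit above.
# The PSM (prefix, suffix, middle) ordering is an assumption.
FIM_PRE = "<|code_prefix|>"
FIM_SUF = "<|code_suffix|>"
FIM_MID = "<|code_middle|>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    # The model is expected to generate the "middle" span after FIM_MID.
    return f"{FIM_PRE}{prefix}{FIM_SUF}{suffix}{FIM_MID}"

print(build_fim_prompt("def add(a, b):\n", "    return result\n"))
```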
Csaba Kecskemeti · 97366dc6ab · 2025-08-03 21:38:18 +02:00
vocab : JetBrains Mellum pre-tokenizer (#15045)

Sigbjørn Skjæret · 2bf3fbf0b5 · 2025-08-02 14:39:01 +02:00
ci : check that pre-tokenizer hashes are up-to-date (#15032)

* torch is not required for convert_hf_to_gguf_update
* add --check-missing parameter
* check that pre-tokenizer hashes are up-to-date

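The hash being checked here is the "chkhsh" that convert_hf_to_gguf.py uses to identify a model's pre-tokenizer: it tokenizes a fixed probe text and hashes the resulting token IDs. A minimal sketch of the idea, with illustrative names (the real script has its own probe text and hash bookkeeping):

```python
# Sketch of the pre-tokenizer "chkhsh" idea verified by this CI job.
# convert_hf_to_gguf.py fingerprints a tokenizer by hashing the token IDs
# it produces for a fixed probe string; names here are illustrative.
import hashlib

def chkhsh(token_ids: list[int]) -> str:
    # sha256 over the repr of the token-ID list, as a hex digest
    return hashlib.sha256(str(token_ids).encode()).hexdigest()

# Two tokenizers that split the probe text identically share a hash;
# a changed split yields a hash missing from the pre-computed list,
# which is what the CI check would flag.
print(chkhsh([1, 15043, 29892]))
```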
Douglas Hanley · 711d5e6fe6 · 2025-08-02 12:51:02 +02:00
convert : fix Qwen3-Embedding pre-tokenizer hash (#15030)

stevenkuang · 0f5ccd6fd1 · 2025-08-01 15:31:12 +02:00
model : add hunyuan dense (#14878)

* support hunyuan_v1_dense
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
* update hunyuan_moe to hunyuan_v1_moe
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
* fix rope alpha assert and bos token
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
* add blank line
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
* Revert "update hunyuan_moe to hunyuan_v1_moe"
This reverts commit aa973ca219.
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
* fix hunyuan_moe chat template
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
* remove leftover code
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
* update hunyuan dense chat template
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
* fix hunyuan dense vocab and chat template
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
---------
Signed-off-by: stevenkuang <stevenkuang@tencent.com>

lgai-exaone · e0cb5c5cb8 · 2025-07-18 10:45:49 +02:00
model : add EXAONE 4.0 support (#14630)

Sigbjørn Skjæret · 19e5943d9e · 2025-07-16 23:17:43 +02:00
convert : make hf token optional (#14717)

* make hf token optional
* fail if we can't get necessary tokenizer config

Sigbjørn Skjæret · 4b91d6f71f · 2025-07-16 08:52:04 +02:00
convert : only check for tokenizer folder if we need it (#14704)

Sigbjørn Skjæret · cf91f217f1 · 2025-07-16 08:51:12 +02:00
convert : add pre-computed hashes first to prevent order mishaps (#14701)

Gabriel Larson · 4a4f426944 · 2025-07-15 21:54:22 +02:00
model : add Kimi-K2 support (#14654)

* Kimi-K2 conversion
* add Kimi_K2 pre type
* Kimi-K2
* Kimi-K2 unicode
* Kimi-K2
* LLAMA_MAX_EXPERTS 384
* fix vocab iteration
* regex space fix
* add kimi-k2 to pre_computed_hashes
* Updated with kimi-k2 get_vocab_base_pre hash
* fix whitespaces
* fix flake errors
* remove more unicode.cpp whitespaces
* change set_vocab() flow
* add moonshotai-Kimi-K2.jinja to /models/templates/
* update moonshotai-Kimi-K2.jinja
* add kimi-k2 chat template
* add kimi-k2
* update NotImplementedError
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* except Exception
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* LLM_CHAT_TEMPLATE_KIMI_K2 if(add_ass){}
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Tarek Dakhran · f5e96b368f · 2025-07-11 20:27:01 +02:00
model : support LiquidAI LFM2 hybrid family (#14620)

**Important**
LFM2 was [merged](https://github.com/huggingface/transformers/pull/39340) into transformers, but has not yet been released.
To convert into gguf, install transformers from source:
```shell
pip install "transformers @ git+https://github.com/huggingface/transformers.git@main"
```

Dowon · 576c82eda2 · 2025-07-11 09:36:04 +02:00
vocab : add midm-2.0 model pre-tokenizer (#14626)

Dowon · ffd59e7d18 · 2025-07-09 11:22:31 +03:00
model : add skt/A.X-4.0 model vocabulary (#14589)

ibrahim khadraoui · 04655063c4 · 2025-07-09 10:03:49 +02:00
model : add support for Falcon-H1 family (#14534)

* v1
* push more fixes
* another fix
* fix
* more fixes
* minor fix
* more cleaning on python code
* python fixes
* changed precision for multipliers float 32->64
* fixes
* another fix
* fix
* pre-norm -> norm
* fix
* Revert "fix"
This reverts commit 243e4d1a50.
* remove todo
* added falcon-h1
* tensor not required
* clean
* remove unneeded attributes
* more cleanups and fixed conversion
* remove final_norm
* flake8 fixes
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* flake8 fixes
* Update src/llama-hparams.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-arch.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* added hashes
* Update src/llama-arch.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update src/llama-vocab.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* update the update file
* Revert "update the update file"
This reverts commit 082ab4ad2a.
* Update src/llama-model-loader.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* d_inner fixed
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* reshaping ssm_norm for 34B
* removing generate_mup
* remove duplicates metadata keys
* rm comment
* final comment
* fix unused args
* fix constants
* fix bad merge
* Update src/llama-model.cpp
Co-authored-by: compilade <git@compilade.net>
* falcon-h1: remove unused ssm_in_b and bad merge
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* falcon-h1: fix last comment
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <git@compilade.net>
* falcon-h1: revert add_add_bos(False)
* falcon-h1: fix tied weights
* falcon-h1: remove whitespace
* falcon-h1: fix wrong size param
* falcon-h1: fix whitespace issues
---------
Co-authored-by: younesbelkada <younes.belkada@tii.ae>
Co-authored-by: Younes B <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: compilade <git@compilade.net>

Xuan-Son Nguyen · 8f22dc0a53 · 2025-07-08 11:24:06 +03:00
model : add hunyuan moe (#14425)

* model : add hunyuan moe
* tokenizer ok
* fix tensor name
* cgraph init
* chat template
* wip
* almost working
* skip embed, fix bos
* cleanup
* yarn scaling
* cleanup
* correct rope type
* failed token fix
* ntk alpha freq_base
* tokenization working
* cleanup and pr changes
* vocab_size sanity check
* ntk alpha generic
* Update convert_hf_to_gguf.py
* Apply suggestions from code review
* fix regression
* fix style
---------
Co-authored-by: kooshi <1934337+kooshi@users.noreply.github.com>

Bartowski · 0bf49eb668 · 2025-06-16 10:16:06 +02:00
convert : remove arcee change in convert_hf_to_gguf_update.py (#14207)

Bartowski · d7da8dc83a · 2025-06-16 01:04:06 +02:00
model : Add support for Arcee AI's upcoming AFM model (#14185)

* Add Arcee AFM support
* Add draft update code
* Fix linter and update URL, may still not be final
* Update src/llama-model.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Remove accidental blank line
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

Xuan-Son Nguyen · 07e4351ce6 · 2025-05-30 12:24:37 +02:00
convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

* convert : allow partial update to the chkhsh pre-tokenizer list
* code style
* update tokenizer out
* rm inp/out files for models not having gguf
* fixed hash for glm
* skip nomic-bert-moe test
* Update convert_hf_to_gguf_update.py
* fix minerva-7b hash
* rm redundant import

Alex Fanthome · f7873fc698 · 2025-05-28 15:49:28 +02:00
tests : change umlaut test (#11600)

Sigbjørn Skjæret · d2a4ef05c6 · 2025-05-10 22:08:07 +02:00
vocab : add ByteDance-Seed/Seed-Coder (#13423)

Xuan-Son Nguyen · ecda2ec4b3 · 2025-04-23 20:21:59 +02:00
mtmd : Support Pixtral 12B (#13065)

* add pixtral text model (vision is wip)
* cgraph ok, just missing 2D RoPE
* fix bad rebase
* first working version
* fix problem with img_break token
* support dynamic image size
* update docs
* update test script

Yuxuan Zhang · 06bb53ad9b · 2025-04-11 12:10:10 +02:00
llama-model : add Glm4Model implementation for GLM-4-0414 (#12867)

* GLM-4-0414
* use original one
* Using with tensor map
* fix bug
* change order
* change order
* format with flake8

Xuan-Son Nguyen · 1466621e73 · 2025-04-07 23:06:44 +02:00
llama : Support llama 4 text-only (#12791)

* llama4 conversion
* initial support, no chat template
* clean up a bit
* fix tokenizer conversion
* correct hparams
* try this
* fix shexp
* ffn_inp_normed
* chat template
* clean up model conversion
* add_bos
* add scale_before_ffn
* fix order
* weight_before_ffn
* llm_graph_input_attn_temp
* add chunk attn mask
* build_inp_attn_scale()
* add comment about ggml_repeat
* clarify comments
* fix build

Sigbjørn Skjæret · 2c3f8b850a · 2025-03-30 22:21:03 +02:00
llama : support BailingMoE (Ling) (#12634)

Juyoung Suk · b3de7cac73 · 2025-03-30 20:38:33 +02:00
llama : add Trillion 7B model support (#12556)

* Support Trillion 7B
* Update llama.h
* Update llama.h
* Update llama-vocab.cpp for Trillion
* Update llama-vocab.cpp

compilade · 00d53800e0 · 2025-03-24 11:47:24 +01:00
llama-vocab : add SuperBPE pre-tokenizer (#12532)

Xuan-Son Nguyen · c43a3e7996 · 2025-02-28 12:44:11 +01:00
llama : add Phi-4-mini support (supersede #12099) (#12108)

* Added Phi-4-mini-instruct support
* Update regex per ngxson
* Change the vocab base to Xenova/gpt-4o
* fix conversion update script
* no need to check longrope
* minor style fix
* fix python style
---------
Co-authored-by: Nicholas Sparks <nisparks@microsoft.com>

Georgi Gerganov · 68ff663a04 · 2025-02-15 16:40:57 +02:00
repo : update links to new url (#11886)

* repo : update links to new url
ggml-ci
* cont : more urls
ggml-ci

Xuan Son Nguyen · ec7f3ac9ab · 2025-01-20 14:35:07 +01:00
llama : add support for Deepseek-R1-Qwen distill model (#11310)

* llama : add support for Deepseek-R1-Qwen distill model
* coding style

fairydreaming · 9394bbd484 · 2025-01-04 21:06:11 +01:00
llama : Add support for DeepSeek V3 (#11049)

* convert : extend DEEPSEEK2 model architecture to support DeepseekV3ForCausalLM by adding EXPERT_WEIGHTS_NORM and EXPERT_GATING_FUNC model parameters and FFN_EXP_PROBS_B tensor type
* vocab : add DeepSeek V3 pre-tokenizer regexes
* unicode : handle ACCENT_MARK and SYMBOL categories in regex
* llama : add DeepSeek V3 chat template, handle new model parameters and tensor types
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

Yun Dou · b92a14a841 · 2024-12-23 01:35:44 +01:00
llama : support InfiniAI Megrez 3b (#10893)

* Support InfiniAI Megrez 3b
* Fix tokenizer_clean_spaces for megrez

Billel Mokeddem · 7ae33a616f · 2024-12-23 00:09:58 +02:00
llama : add Falcon3 support (#10883)

* Add Falcon3 model support
* Add fix for adding bos to added special tokens
* Add comment explaining the logic behind the if statement
* Add a log message to better track when the following line of code is triggered
* Update log to only print when input and output characters are different
* Fix handling pre-normalized tokens
* Refactoring

Diego Devesa · 4da69d1abd · 2024-12-18 01:36:46 +01:00
Revert "llama : add Falcon3 support (#10864)" (#10876)

This reverts commit 382bc7f2e8.

Billel Mokeddem · 382bc7f2e8 · 2024-12-17 17:24:56 +02:00
llama : add Falcon3 support (#10864)

Valentin Mamedov · a0974156f3 · 2024-12-15 19:02:46 +02:00
llama : add Deepseek MoE v1 & GigaChat models (#10827)

* Add deepseek v1 arch & gigachat template
* improve template code
* add readme
* delete comments
* remove comment
* fix format
* lint llama.cpp
* fix order of deepseek and deepseek2, move gigachat template to the end of func
* fix order of deepseek and deepseek2 in constants; mark shared exp as deepseek arch need
* remove comments
* move deepseek above deepseek2
* change placement of gigachat chat template

Sukriti Sharma · 784a14aa49 · 2024-12-07 09:02:14 +02:00
convert : add support for Roberta embeddings (#10695)

Riccardo Orlando · 6fe6247831 · 2024-12-05 20:30:59 +02:00
llama : add Minerva 7B model support (#10673)

* Support for Minerva 7B
* Update convert_hf_to_gguf_update.py

Daniel Bevenius · d405804be8 · 2024-12-05 09:47:55 +02:00
py : update outdated copy-paste instructions [no ci] (#10667)

This commit updates the copy-paste instruction in convert_hf_to_gguf_update.py to reflect that convert_hf_to_gguf.py will have already been updated with the new get_vocab_base_pre() function when this script completes.

Georgi Gerganov · bc5ba007b2 · 2024-10-25 10:13:46 +03:00
server : check that the prompt fits in the slot's context (#10030)

ggml-ci

Georgi Gerganov · f4d2b8846a · 2024-09-28 17:42:03 +03:00
llama : add reranking support (#9510)

* py : add XLMRobertaForSequenceClassification [no ci]
* py : fix scalar-tensor conversion [no ci]
* py : fix position embeddings chop [no ci]
* llama : read new cls tensors [no ci]
* llama : add classification head (wip) [no ci]
* llama : add "rank" pooling type
ggml-ci
* server : add rerank endpoint
ggml-ci
* llama : avoid ggml_repeat during classification
* rerank : cleanup + comments
* server : accept /rerank endpoint in addition to /v1/rerank [no ci]
* embedding : parse special tokens
* jina : support v1 reranker
* vocab : minor style
ggml-ci
* server : initiate tests for later
ggml-ci
* server : add docs
* llama : add comment [no ci]
* llama : fix uninitialized tensors
* ci : add rerank tests
ggml-ci
* add reranking test
* change test data
* Update examples/server/server.cpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* add `--reranking` argument
* update server docs
* llama : fix comment [no ci]
ggml-ci
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

nopperl · 9a913110cf · 2024-09-28 15:08:43 +03:00
llama : add support for Chameleon (#8543)

* convert chameleon hf to gguf
* add chameleon tokenizer tests
* fix lint
* implement chameleon graph
* add swin norm param
* return qk norm weights and biases to original format
* implement swin norm
* suppress image token output
* rem tabs
* add comment to conversion
* fix ci
* check for k norm separately
* adapt to new lora implementation
* fix layer input for swin norm
* move swin_norm in gguf writer
* add comment regarding special token regex in chameleon pre-tokenizer
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* fix punctuation regex in chameleon pre-tokenizer (@compilade)
Co-authored-by: compilade <git@compilade.net>
* fix lint
* trigger ci
---------
Co-authored-by: compilade <git@compilade.net>

daminho · c837981bba · 2024-09-12 14:28:20 +03:00
py : add Phi-1.5/Phi-2 tokenizer (#9361)

* add phi2 tokenizer
* add phi name to convert_hf_to_gguf_update.py
* make tokenizer_pre consistent; llama.cpp works

Pavel Zloi · 8db003a19d · 2024-09-11 15:29:51 +03:00
py : support converting local models (#7547)

* Support of converting local models added to convert-hf-to-gguf-update.py
* Description fixed
* shutil added to imports

Minsoo Cheong · c679e0cb5c · 2024-08-16 09:35:18 +03:00
llama : add EXAONE model support (#9025)

* add exaone model support
* add chat template
* fix whitespace
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add ftype
* add exaone pre-tokenizer in `llama-vocab.cpp`
Co-authored-by: compilade <113953597+compilade@users.noreply.github.com>
* fix lint
Co-authored-by: compilade <113953597+compilade@users.noreply.github.com>
* add `EXAONE` to supported models in `README.md`
* fix space
Co-authored-by: compilade <git@compilade.net>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <113953597+compilade@users.noreply.github.com>
Co-authored-by: compilade <git@compilade.net>

Esko Toivonen · 6bda7ce6c3 · 2024-08-15 10:17:12 +03:00
llama : add pre-tokenizer regexes for BLOOM and gpt3-finnish (#8850)

Keke Han · 081fe431aa · 2024-07-22 19:43:43 +03:00
llama : fix codeshell support (#8599)

* llama : fix codeshell support
* llama : move codeshell after smollm below to respect the enum order

Jason Stillerman · d94c6e0ccb · 2024-07-22 17:43:01 +03:00
llama : add support for SmolLm pre-tokenizer (#8609)

* Adding SmolLM Pre Tokenizer
* Update convert_hf_to_gguf_update.py
Co-authored-by: compilade <git@compilade.net>
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* handle regex
* removed .inp and .out ggufs
---------
Co-authored-by: compilade <git@compilade.net>

Jiří Podivín · 566daa5a5b · 2024-07-22 23:44:53 +10:00
*.py: Stylistic adjustments for python (#8233)

* Superfluous parens in conditionals were removed.
* Unused args in function were removed.
* Replaced unused `idx` var with `_`
* Initializing file_format and format_version attributes
* Renaming constant to capitals
* Preventing redefinition of the `f` var
Signed-off-by: Jiri Podivin <jpodivin@redhat.com>

Michael Coppola · 940362224d · 2024-07-20 16:43:51 +03:00
llama : add support for Tekken pre-tokenizer (#8579)

* llama : Added support for Tekken pre-tokenizer (#8577)
Removed unneeded `vocab.tokenizer_clean_spaces` assignment
* llama : fix order of pre-tokenizers
* Tekken pre-tokenizer no longer uses clean_up_tokenization_spaces
* Updated chkhsh for Tekken tokenizer
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Georgi Gerganov · e235b267a2 · 2024-07-05 07:53:33 +03:00
py : switch to snake_case (#8305)

* py : switch to snake_case
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* cont : fix link
* gguf-py : use snake_case in scripts entrypoint export
* py : rename requirements for convert_legacy_llama.py
Needed for scripts/check-requirements.sh
---------
Co-authored-by: Francis Couture-Harpin <git@compilade.net>