commit 403fbacbbc
Author: Sigbjørn Skjæret
Date:   2025-03-31 16:36:25 +02:00

    convert : Qwerky : use lora_rank_tokenshift and lora_rank_decay if present (#12667)
						 
				 
			
				
					
						
							
							
commit 2c3f8b850a
Author: Sigbjørn Skjæret
Date:   2025-03-30 22:21:03 +02:00

    llama : support BailingMoE (Ling) (#12634)
						 
				 
			
				
					
						
							
							
commit b3de7cac73
Author: Juyoung Suk
Date:   2025-03-30 20:38:33 +02:00

    llama : add Trillion 7B model support (#12556)

    * Support Trillion 7B
    * Update llama.h
    * Update llama.h
    * Update llama-vocab.cpp for Trillion
    * Update llama-vocab.cpp
						 
				 
			
				
					
						
							
							
commit f125b8dccf
Author: Si1w
Date:   2025-03-27 12:49:15 +02:00

    llama : add PLM GGUF Conversion & Inference Support (#12457)

    * add edgellm model arch [conversation feature doesn't work]
    * remove output.weight layer for edgellm arch
    * [Model] update the name of the model
    * update the name of model arch in convert gguf
    * [Model] Refactor the model arch into llama-model
    * [Bug] Fix the bug in create attn kv
    * [Code] Fix editorconfig errors
    * [Code] Remove trailing whitespace
    * [Code] Remove trailing whitespace
    * [Code] Change the order of model arch in list
    * [Code] Fix flake8 lint errors
    * Remove trailing whitespace
    * [Code] Remove call in model arch
						 
				 
			
				
					
						
							
							
commit d5c6309d91
Author: Csaba Kecskemeti
Date:   2025-03-27 11:11:23 +01:00

    convert : Support Qwen2_5_VLForConditionalGeneration (#12595)
						 
				 
			
				
					
						
							
							
commit df4d20cd53
Author: Georgi Gerganov
Date:   2025-03-26 08:21:05 -04:00

    convert : fix squeeze for ssm_conv tensors (#12573)

    * convert : fix squeeze for ssm_conv tensors
    * convert : match ssm_conv tensors by type

    Co-authored-by: Francis Couture-Harpin <git@compilade.net>
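
The squeeze here drops a stray singleton dimension from the convolution weights. A minimal sketch of the idea, with made-up shapes (not the converter's actual code):

```python
import torch

# An ssm_conv weight stored as (d_inner, 1, d_conv) has its singleton
# middle dimension removed before export; squeezing dim 1 specifically
# means no other axis can collapse by accident.
w = torch.randn(1536, 1, 4)   # illustrative shape
print(w.squeeze(1).shape)     # torch.Size([1536, 4])
```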
						 
				 
			
				
					
						
							
							
commit 53af4dba42
Author: Sigbjørn Skjæret
Date:   2025-03-25 23:03:10 +01:00

    convert: fix Mistral3/Gemma3 model hparams init (#12571)

    * Fix Mistral3/Gemma3 model hparams init
    * set positional args correctly
    * use existing hparams if passed
						 
				 
			
				
					
						
							
							
commit 00d53800e0
Author: compilade
Date:   2025-03-24 11:47:24 +01:00

    llama-vocab : add SuperBPE pre-tokenizer (#12532)
						 
				 
			
				
					
						
							
							
commit 732b5fbf5e
Author: Bartowski
Date:   2025-03-20 08:36:37 +02:00

    convert : avoid calls to tokenizer.added_tokens_decoder (#12473)

    tokenizer.added_tokens_decoder returns a fresh dict on every call, and
    each call is relatively slow (~0.04 s on average), which results in
    massive slowdowns when a model has a huge number of added tokens.
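
A minimal sketch of the resulting pattern, assuming a Hugging Face tokenizer ("gpt2" is only a placeholder checkpoint): read the property once and iterate over the cached dict instead of touching it inside a loop.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model

# Each access to added_tokens_decoder rebuilds the dict (~0.04 s), so
# call it once and reuse the plain dict it returns.
added_tokens = tokenizer.added_tokens_decoder
for token_id, added_token in added_tokens.items():
    print(token_id, added_token.content)
```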
						 
				 
			
				
					
						
							
							
commit 108e53c2f1
Author: Sigbjørn Skjæret
Date:   2025-03-19 09:08:49 +01:00

    llama : add support for GPT2, Bloom and CodeShell tied word embeddings (#12456)

    * Add support for GPT2, Bloom and CodeShell tied word embeddings
    * Deduplicate tied word embeddings weights
    * Workaround for incorrect weight map

      It appears transformer.wte.weight is in the weight map even though
      the weights are not there; remove it if output weights are
      encountered first.
    * check++
    * fatfingers--
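
For context, tied word embeddings reuse the token-embedding matrix as the output projection, so a converter only needs to store one copy. An illustrative PyTorch sketch (sizes made up, not the llama.cpp code):

```python
import torch.nn as nn

vocab_size, n_embd = 32, 8
wte = nn.Embedding(vocab_size, n_embd)               # transformer.wte.weight in GPT-2 naming
lm_head = nn.Linear(n_embd, vocab_size, bias=False)
lm_head.weight = wte.weight                          # tie: both names point at one tensor

# The output projection shares storage with the embedding table.
assert lm_head.weight.data_ptr() == wte.weight.data_ptr()
```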
						 
				 
			
				
					
						
							
							
commit 29fff308c7
Author: Xuan-Son Nguyen
Date:   2025-03-18 19:16:19 +01:00

    llama : support converting Mistral Small text-only (#12450)
						 
				 
			
				
					
						
							
							
commit 7dfad387e3
Author: Molly Sophia
Date:   2025-03-18 07:27:50 +08:00

    llama: Add support for RWKV v7 architecture (#12412)

    * ggml: Add op l2_norm
    * ggml: Add op rwkv_wkv7
    * llama: Add support for RWKV7 and ARWKV7 models
    * llama: fix inference with RWKV6Qwen2
    * llama: add more (a)rwkv7 variants in size
    * Apply code-format changes
    * fix MUSA build
    * llama: fix shape error with rwkv using llama-parallel

    Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
						 
				 
			
				
					
						
							
							
commit 7841fc723e
Author: Xuan-Son Nguyen
Date:   2025-03-12 09:30:24 +01:00

    llama : Add Gemma 3 support (+ experimental vision capability) (#12343)

    * llama : Add Gemma 3 text-only support
    * fix python coding style
    * fix compile on ubuntu
    * python: fix style
    * fix ubuntu compile
    * fix build on ubuntu (again)
    * fix ubuntu build, finally
    * clip : Experimental support for Gemma 3 vision (#12344)
    * clip : Experimental support for Gemma 3 vision
    * fix build
    * PRId64
						 
				 
			
				
					
						
							
							
commit c43a3e7996
Author: Xuan-Son Nguyen
Date:   2025-02-28 12:44:11 +01:00

    llama : add Phi-4-mini support (supersede #12099) (#12108)

    * Added Phi-4-mini-instruct support
    * Update regex per ngxson
    * Change the vocab base to Xenova/gpt-4o
    * fix conversion update script
    * no need to check longrope
    * minor style fix
    * fix python style

    Co-authored-by: Nicholas Sparks <nisparks@microsoft.com>
						 
				 
			
				
					
						
							
							
commit 68ff663a04
Author: Georgi Gerganov
Date:   2025-02-15 16:40:57 +02:00

    repo : update links to new url (#11886)

    * repo : update links to new url
      ggml-ci
    * cont : more urls
      ggml-ci
						 
				 
			
				
					
						
							
							
commit 0cec062a63
Author: piDack
Date:   2025-02-02 09:48:46 +02:00

    llama : add support for GLM-Edge and GLM-Edge-V series models (#10573)

    * add glm edge chat model
    * use config partial_rotary_factor as rope ratio
    * support for glm edge model
    * vision model support
    * remove debug info
    * fix format
    * llava.cpp trailing whitespace
    * remove unused AutoTokenizer
    * Update src/llama.cpp to not contain <|end|> or </s>
      Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
    * add edge template
    * fix chat template
    * fix conflict
    * fix conflict
    * fix ci err
    * fix format err
    * fix template err
    * 9b hf chat support
    * format
    * format clip.cpp
    * fix format
    * Apply suggestions from code review
    * Apply suggestions from code review
    * Update examples/llava/clip.cpp
    * fix format
    * minor : style

    Co-authored-by: liyuhang <yuhang.li@zhipuai.cn>
    Co-authored-by: piDack <pcdack@hotmail.co>
    Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
    Co-authored-by: liyuhang <yuhang.li@aminer.cn>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
						 
				 
			
				
					
						
							
							
commit ec7f3ac9ab
Author: Xuan Son Nguyen
Date:   2025-01-20 14:35:07 +01:00

    llama : add support for Deepseek-R1-Qwen distill model (#11310)

    * llama : add support for Deepseek-R1-Qwen distill model
    * coding style
						 
				 
			
				
					
						
							
							
commit 4dbc8b9cb7
Author: RunningLeon
Date:   2025-01-16 20:10:38 +02:00

    llama : add internlm3 support (#11233)

    * support internlm3
    * fix lint
						 
				 
			
				
					
						
							
							
commit 2739a71e4b
Author: Daniel Bevenius
Date:   2025-01-11 05:50:33 +01:00

    convert : sort print supported models [no ci] (#11179)

    This commit sorts the list of supported models when printing them out.
    The motivation for this change is to make it easier to find a specific
    model in the list of supported models. For example:

    ```console
    $ ./convert_hf_to_gguf.py --print-supported-models
    Supported models:
    - ArcticForCausalLM
    - BaiChuanForCausalLM
    - BaichuanForCausalLM
    - BertForMaskedLM
    - BertModel
    - BitnetForCausalLM
    - BloomForCausalLM
    - BloomModel
    - CamembertModel
    - ChameleonForCausalLM
    - ChameleonForConditionalGeneration
    - ChatGLMForConditionalGeneration
    - ChatGLMModel
    - CodeShellForCausalLM
    - Cohere2ForCausalLM
    - CohereForCausalLM
    - DbrxForCausalLM
    - DeciLMForCausalLM
    - DeepseekForCausalLM
    - DeepseekV2ForCausalLM
    - DeepseekV3ForCausalLM
    - ExaoneForCausalLM
    - FalconForCausalLM
    - FalconMambaForCausalLM
    - GPT2LMHeadModel
    - GPTBigCodeForCausalLM
    - GPTNeoXForCausalLM
    - GPTRefactForCausalLM
    - Gemma2ForCausalLM
    - GemmaForCausalLM
    - GraniteForCausalLM
    - GraniteMoeForCausalLM
    - GrokForCausalLM
    - InternLM2ForCausalLM
    - JAISLMHeadModel
    - JinaBertForMaskedLM
    - JinaBertModel
    - LLaMAForCausalLM
    - LlamaForCausalLM
    - LlavaStableLMEpochForCausalLM
    - MPTForCausalLM
    - MT5ForConditionalGeneration
    - MambaForCausalLM
    - MambaLMHeadModel
    - MiniCPM3ForCausalLM
    - MiniCPMForCausalLM
    - MistralForCausalLM
    - MixtralForCausalLM
    - NemotronForCausalLM
    - NomicBertModel
    - OLMoForCausalLM
    - Olmo2ForCausalLM
    - OlmoForCausalLM
    - OlmoeForCausalLM
    - OpenELMForCausalLM
    - OrionForCausalLM
    - Phi3ForCausalLM
    - PhiForCausalLM
    - PhiMoEForCausalLM
    - PlamoForCausalLM
    - QWenLMHeadModel
    - Qwen2ForCausalLM
    - Qwen2MoeForCausalLM
    - Qwen2VLForConditionalGeneration
    - RWForCausalLM
    - RWKV6Qwen2ForCausalLM
    - RobertaModel
    - Rwkv6ForCausalLM
    - StableLMEpochForCausalLM
    - StableLmForCausalLM
    - Starcoder2ForCausalLM
    - T5EncoderModel
    - T5ForConditionalGeneration
    - T5WithLMHeadModel
    - UMT5ForConditionalGeneration
    - WavTokenizerDec
    - XLMRobertaForSequenceClassification
    - XLMRobertaModel
    - XverseForCausalLM
    ```
						 
				 
			
				
					
						
							
							
commit ff3fcabc72
Author: Daniel Bevenius
Date:   2025-01-10 11:30:53 +01:00

    convert : add --print-supported-models option (#11172)

    * convert : add --print-supported-models option

    This commit adds a new option to the convert_hf_to_gguf.py script to
    print the supported models.

    The motivation for this is that it can be useful to know which models
    are supported by the script without having to look at the code.

    Example usage:
    ```console
    $ ./convert_hf_to_gguf.py --print-supported-models
    Supported models:
    - GPTNeoXForCausalLM
    - BloomForCausalLM
    - BloomModel
    - MPTForCausalLM
    - OrionForCausalLM
    - BaichuanForCausalLM
    - BaiChuanForCausalLM
    - XverseForCausalLM
    - FalconForCausalLM
    - RWForCausalLM
    - GPTBigCodeForCausalLM
    - GPTRefactForCausalLM
    - StableLmForCausalLM
    - StableLMEpochForCausalLM
    - LlavaStableLMEpochForCausalLM
    - LLaMAForCausalLM
    - LlamaForCausalLM
    - MistralForCausalLM
    - MixtralForCausalLM
    - DeciLMForCausalLM
    - BitnetForCausalLM
    - GrokForCausalLM
    - DbrxForCausalLM
    - MiniCPMForCausalLM
    - MiniCPM3ForCausalLM
    - QWenLMHeadModel
    - Qwen2ForCausalLM
    - Qwen2VLForConditionalGeneration
    - WavTokenizerDec
    - Qwen2MoeForCausalLM
    - GPT2LMHeadModel
    - PhiForCausalLM
    - Phi3ForCausalLM
    - PhiMoEForCausalLM
    - PlamoForCausalLM
    - CodeShellForCausalLM
    - InternLM2ForCausalLM
    - BertModel
    - BertForMaskedLM
    - CamembertModel
    - RobertaModel
    - NomicBertModel
    - XLMRobertaModel
    - XLMRobertaForSequenceClassification
    - GemmaForCausalLM
    - Gemma2ForCausalLM
    - Starcoder2ForCausalLM
    - Rwkv6ForCausalLM
    - RWKV6Qwen2ForCausalLM
    - MambaForCausalLM
    - MambaLMHeadModel
    - FalconMambaForCausalLM
    - CohereForCausalLM
    - Cohere2ForCausalLM
    - OLMoForCausalLM
    - OlmoForCausalLM
    - Olmo2ForCausalLM
    - OlmoeForCausalLM
    - JinaBertModel
    - JinaBertForMaskedLM
    - OpenELMForCausalLM
    - ArcticForCausalLM
    - DeepseekForCausalLM
    - DeepseekV3ForCausalLM
    - DeepseekV2ForCausalLM
    - UMT5ForConditionalGeneration
    - MT5ForConditionalGeneration
    - T5ForConditionalGeneration
    - T5WithLMHeadModel
    - T5EncoderModel
    - JAISLMHeadModel
    - ChatGLMModel
    - ChatGLMForConditionalGeneration
    - NemotronForCausalLM
    - ExaoneForCausalLM
    - GraniteForCausalLM
    - GraniteMoeForCausalLM
    - ChameleonForCausalLM
    - ChameleonForConditionalGeneration
    ```

    * squash! convert : add --print-supported-models option

    Fix flake8 error.
						 
				 
			
				
					
						
							
							
commit ee7136c6d1
Author: Molly Sophia
Date:   2025-01-10 09:58:08 +08:00

    llama: add support for QRWKV6 model architecture (#11001)

    * WIP: Add support for RWKV6Qwen2
    * RWKV: Some graph simplification
    * Add support for RWKV6Qwen2 with cpu and cuda GLA
    * RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead
    * Fix some typos
    * code format changes
    * Fix wkv test & add gla test
    * Fix cuda warning
    * Update README.md
    * Update ggml/src/ggml-cuda/gla.cu
      Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    * Fix fused lerp weights loading with RWKV6
    * better sanity check skipping for QRWKV6 in llama-quant
      thanks @compilade

    Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    Co-authored-by: compilade <git@compilade.net>
						 
				 
			
				
					
						
							
							
commit f8feb4b01a
Author: Pierrick Hymbert
Date:   2025-01-09 11:21:41 +01:00

    model: Add support for PhiMoE arch (#11003)

    * model: support phimoe
    * python linter
    * doc: minor
      Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
    * doc: minor
      Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
    * doc: add phimoe as supported model
      ggml-ci

    Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
						 
				 
			
				
					
						
							
							
commit 9394bbd484
Author: fairydreaming
Date:   2025-01-04 21:06:11 +01:00

    llama : Add support for DeepSeek V3 (#11049)

    * convert : extend DEEPSEEK2 model architecture to support
      DeepseekV3ForCausalLM by adding EXPERT_WEIGHTS_NORM and
      EXPERT_GATING_FUNC model parameters and FFN_EXP_PROBS_B tensor type
    * vocab : add DeepSeek V3 pre-tokenizer regexes
    * unicode : handle ACCENT_MARK and SYMBOL categories in regex
    * llama : add DeepSeek V3 chat template, handle new model parameters
      and tensor types

    Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
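
A hedged sketch of the routing behavior behind the new parameters, assuming sigmoid expert gating with normalized top-k weights as described for DeepSeek V3; names and shapes are illustrative, not the llama.cpp implementation:

```python
import torch

def route(logits: torch.Tensor, top_k: int = 8):
    scores = torch.sigmoid(logits)                         # EXPERT_GATING_FUNC: sigmoid
    weights, experts = torch.topk(scores, top_k, dim=-1)   # select top-k experts
    weights = weights / weights.sum(dim=-1, keepdim=True)  # EXPERT_WEIGHTS_NORM
    return weights, experts

w, e = route(torch.randn(2, 64))  # 2 tokens, 64 routed experts
```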
						 
				 
			
				
					
						
							
							
commit 46be942214
Author: DAN™
Date:   2025-01-04 16:33:31 +02:00

    llama : add support for the cohere2 model architecture (#10900)
						 
				 
			
				
					
						
							
							
commit bc7b1f8632
Author: ymcki
Date:   2024-12-31 13:04:48 +02:00

    convert : fix Llama-3_1-Nemotron-51B rope settings (#11008)

    * conflict resolution
    * move comments after bracket to its own line
    * DeciLMCausalModel now reads rope_theta from config.json properly
						 
				 
			
				
					
						
							
							
commit b92a14a841
Author: Yun Dou
Date:   2024-12-23 01:35:44 +01:00

    llama : support InfiniAI Megrez 3b (#10893)

    * Support InfiniAI Megrez 3b
    * Fix tokenizer_clean_spaces for megrez
						 
				 
			
				
					
						
							
							
commit 6f0c9e034b
Author: ymcki
Date:   2024-12-23 01:22:33 +01:00

    llama : support for Llama-3_1-Nemotron-51B (#10669)

    * conflict resolution
    * move comments after bracket to its own line
						 
				 
			
				
					
						
							
							
commit 7ae33a616f
Author: Billel Mokeddem
Date:   2024-12-23 00:09:58 +02:00

    llama : add Falcon3 support (#10883)

    * Add Falcon3 model support
    * Add fix for adding bos to added special tokens
    * Add comment explaining the logic behind the if statement
    * Add a log message to better track when the following line of code is triggered
    * Update log to only print when input and output characters are different
    * Fix handling pre-normalized tokens
    * Refactoring
						 
				 
			
				
					
						
							
							
commit 5cd85b5e00
Author: Georgi Gerganov
Date:   2024-12-21 10:10:18 +02:00

    convert : add BertForMaskedLM (#10919)
						 
				 
			
				
					
						
							
							
commit 0a11f8b7b5
Author: Molly Sophia
Date:   2024-12-20 11:44:58 +02:00

    convert : fix RWKV v6 model conversion (#10913)

    * Enable --no-context-shift for llama-perplexity example
    * RWKV 6: Fix error in ggml_cuda_op_bin_bcast

    Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
						 
				 
			
				
					
						
							
							
commit 2fffc52b50
Author: Sukriti Sharma
Date:   2024-12-19 15:04:51 +02:00

    llama : fix Roberta embeddings (#10856)

    * fix: Use gpt2 tokenizer for roberta and add eos/bos tokens
      Branch: RobertaTokenizer
    * fixes to position embeddings
    * map roberta-bpe to gpt-2
    * fix linting

    Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
    Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
    Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>
						 
				 
			
				
					
						
							
							
commit 7585edbdeb
Author: fairydreaming
Date:   2024-12-19 10:37:12 +01:00

    convert : Add support for Microsoft Phi-4 model (#10817)

    * convert : use GPT2 vocab for Phi-4 model
    * convert : use null value of sliding_window to distinguish Phi-4 from
      other Phi-3-based models
    * llama : do not use sliding window attention mask for Phi-4 model

    Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
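
A sketch of the detection rule named above, assuming the converter inspects the checkpoint's config.json: a null sliding_window marks Phi-4, while other Phi-3-based models carry a concrete value.

```python
import json

with open("config.json") as f:  # path is illustrative
    hparams = json.load(f)

if hparams.get("sliding_window") is None:
    print("Phi-4: skip the sliding-window attention mask")
else:
    print("Phi-3 family: sliding window =", hparams["sliding_window"])
```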
						 
				 
			
				
					
						
							
							
commit 0bf2d10c55
Author: Georgi Gerganov
Date:   2024-12-18 19:27:21 +02:00

    tts : add OuteTTS support (#10784)

    * server : add "tokens" output
      ggml-ci
    * server : output embeddings for all tokens when pooling = none
      ggml-ci
    * server : be explicit about the pooling type in the tests
      ggml-ci
    * server : do not normalize embeddings when there is no pooling
      ggml-ci
    * llama : add OuteTTS support (wip)
    * wip
    * extract features
    * first conv
    * group norm
    * resnet conv
    * resnet
    * attn
    * pos net
    * layer norm
    * convnext
    * head
    * hann window
    * fix n_embd + remove llama.cpp hacks
    * compute hann window
    * fft
    * spectrum processing
    * clean-up
    * tts : receive input text and generate codes
    * clip : fix new conv name
    * tts : minor fix
    * tts : add header + minor fixes
      ggml-ci
    * tts : add mathematical constant
      ggml-ci
    * tts : fix sampling + cut initial noise
    * tts : fixes
    * tts : update default samplers
      ggml-ci
    * tts : text pre-processing
    * tts : outetts-voc -> wavtokenizer-dec
    * tts : remove hardcoded constants
      ggml-ci
    * tts : fix tensor shapes
    * llama : refactor wavtokenizer tensors
      ggml-ci
    * cont
      ggml-ci
    * cont [no ci]
    * llama : update WavTokenizer to non-causal attn
    * llama : handle no-vocab detokenization
    * tts : add Python example for OuteTTS (wip)
    * tts : extend python example to generate spectrogram
      ggml-ci
    * server : fix rebase artifacts
    * tts : enable "return_tokens" in Python example
      ggml-ci
    * tts : minor fixes
    * common : support HF download for vocoder
						 
				 
			
				
					
						
							
							
commit 4da69d1abd
Author: Diego Devesa
Date:   2024-12-18 01:36:46 +01:00

    Revert "llama : add Falcon3 support (#10864)" (#10876)

    This reverts commit 382bc7f2e8.
						 
				 
			
				
					
						
							
							
commit 382bc7f2e8
Author: Billel Mokeddem
Date:   2024-12-17 17:24:56 +02:00

    llama : add Falcon3 support (#10864)
						 
				 
			
				
					
						
							
							
commit a0974156f3
Author: Valentin Mamedov
Date:   2024-12-15 19:02:46 +02:00

    llama : add Deepseek MoE v1 & GigaChat models (#10827)

    * Add deepseek v1 arch & gigachat template
    * improve template code
    * add readme
    * delete comments
    * remove comment
    * fix format
    * lint llama.cpp
    * fix order of deepseek and deepseek2, move gigachat template to the end of func
    * fix order of deepseek and deepseek2 in constants; mark shared exp as deepseek arch need
    * remove comments
    * move deepseek above deepseek2
    * change placement of gigachat chat template
						 
				 
			
				
					
						
							
							
commit ba1cb19cdd
Author: HimariO
Date:   2024-12-14 14:43:46 +02:00

    llama : add Qwen2VL support + multimodal RoPE (#10361)

    * Barebone Qwen2VL LLM convertor
    * Add Qwen2VL cli entrypoint
    * [WIP] add qwen2vl arch
    * Verify m-rope output
    * Add vl-rope/2d-rope support for qwen2vl ViT
    * update qwen2vl cli tool
    * update 5D tensor op workaround
    * [WIP] qwen2vl vision model
    * make batch and clip utils compatible with qwen2vl
    * [WIP] create inference workflow, gguf convert script but fix
    * correcting vision-rope behavior, add the missing last layer back to ViT
    * add arg parser to qwen2vl_surgery
    * replace variable size array with vector
    * cuda-gdb cmake preset
    * add fp32 mrope, vision rope kernel
    * add fp16 support for qwen2vl and m-rope
    * add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`
    * fix rope op mode switching, outdated func args
    * update `llama_hparams`
    * update to keep up with upstream changes
    * resolve linter, test errors
    * add makefile entry, update special image padding token
    * add mrope unit test, fix a few compiler warnings
    * rename `mrope` related functions, params
    * minor updates on debug util, bug fixes
    * add `m-rope` testcase to `test-backend-ops`
    * Apply suggestions from code review
      Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    * fix trailing whitespace
    * store `llama_hparams.rope_sections` with fixed size array
    * update position id tensor size check in GGML_OP_ROPE
    * minor updates
    * update `ggml_backend_*_supports_op` of unsupported backends
    * remove old `rope_section` compare operator

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
						 
				 
			
				
					
						
							
							
commit 62e84d9848
Author: Robert Collins
Date:   2024-12-07 23:12:27 +02:00

    llama : add 128k yarn context for Qwen (#10698)

    * add 128k yarn context for Qwen
    * added property for model tensors
    * removing useless line
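
For reference, YaRN context extension is driven by a rope_scaling block in the model config; the values below follow what Qwen documents for extending its 2.x models to 128k (32768 * 4 = 131072 tokens) and should be treated as illustrative:

```python
# Illustrative rope_scaling block for YaRN context extension.
rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
print(int(rope_scaling["factor"] * rope_scaling["original_max_position_embeddings"]))  # 131072
```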
						 
				 
			
				
					
						
							
							
commit 784a14aa49
Author: Sukriti Sharma
Date:   2024-12-07 09:02:14 +02:00

    convert : add support for Roberta embeddings (#10695)
						 
				 
			
				
					
						
							
							
commit 6fe6247831
Author: Riccardo Orlando
Date:   2024-12-05 20:30:59 +02:00

    llama : add Minerva 7B model support (#10673)

    * Support for Minerva 7B
    * Update convert_hf_to_gguf_update.py
						 
				 
			
				
					
						
							
							
commit 8d0cfd554a
Author: JFLFY2255
Date:   2024-12-04 11:42:50 +02:00

    llama: Support MiniCPM-1B (with & w/o longrope) (#10559)
						 
				 
			
				
					
						
							
							
commit 80acb7b430
Author: Shane A
Date:   2024-11-25 19:36:09 +01:00

    Rename Olmo1124 to Olmo2 (#10500)
						 
				 
			
				
					
						
							
							
commit 9336db462c
Author: Gabe Goodhart
Date:   2024-11-24 11:02:34 +02:00

    convert : XLMRoberta Type Vocab Size (#10458)

    This matches the key in common bert-based embedding models and may
    have a value other than 1 in it.

    Branch: XLMRobertaTypeVocabSize
    Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
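
The key in question is type_vocab_size from BERT-style configs. A hedged sketch of reading it with a default of 1 ("xlm-roberta-base" is only a placeholder checkpoint):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("xlm-roberta-base")
# XLM-R reports 1 here; many BERT variants use 2 (sentence A/B types).
print(getattr(config, "type_vocab_size", 1))
```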
						 
				 
			
				
					
						
							
							
commit a88ad007de
Author: Shane A
Date:   2024-11-19 11:04:08 +02:00

    llama : add OLMo November 2024 support (#10394)

    * Add OLMo November 2024 constants
    * Add OLMo November 2024 converter
    * Add loading of OLMo November 2024 tensors and hyper parameters
    * Add building of OLMo November 2024 model
						 
				 
			
				
					
						
							
							
commit 60e17ce23c
Author: Faisal Zaghloul
Date:   2024-11-07 08:46:12 -08:00

    Remove identical wte/etw logic for jais (#10203)
						 
				 
			
				
					
						
							
							
commit 7554aa4655
Author: Xuan Son Nguyen
Date:   2024-11-02 12:53:17 +01:00

    convert-lora : make --base optional (#10110)

    * convert-lora : make `--base` optional
    * lint
    * handle case where base_model_name_or_path is invalid
    * do not include metadata from base model
    * clarify unspecified --base
    * add small comment [no ci]
    * trigger ci
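
A sketch of the fallback this change describes, assuming the standard PEFT adapter layout: when --base is not given, the base model is taken from base_model_name_or_path in adapter_config.json. The resolve_base helper is hypothetical, not the script's actual function.

```python
import json
import os

def resolve_base(cli_base: str | None, adapter_dir: str) -> str | None:
    """Prefer an explicit --base; otherwise read it from the adapter config."""
    if cli_base is not None:
        return cli_base
    with open(os.path.join(adapter_dir, "adapter_config.json")) as f:
        return json.load(f).get("base_model_name_or_path")

print(resolve_base(None, "my-lora-adapter"))  # directory name is illustrative
```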
						 
				 
			
				
					
						
							
							
commit bc5ba007b2
Author: Georgi Gerganov
Date:   2024-10-25 10:13:46 +03:00

    server : check that the prompt fits in the slot's context (#10030)

    ggml-ci
						 
				 
			
				
					
						
							
							
commit 11d47057a5
Author: Molly Sophia
Date:   2024-10-22 15:22:26 +02:00

    Rwkv chat template fix (#10001)

    * llama: remove useless template matching for rwkv-world
    * converter: Add comment about the hack for rwkv models
    * Update src/llama.cpp
      Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

    Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
    Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
						 
				 
			
				
					
						
							
							
commit 4ff7fe1fb3
Author: Molly Sophia
Date:   2024-10-22 13:33:37 +03:00

    llama : add chat template for RWKV-World + fix EOT (#9968)

    * Add chat template for RWKV-World
    * RWKV: Fix the chat template not being used
    * RWKV v6: Set EOT token to ``\n\n``
    * readme: add rwkv into supported model list

    Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
						 
				 
			
				
					
						
							
							
commit 1927378bcc
Author: compilade
Date:   2024-10-01 09:31:36 +03:00

    convert : refactor rope_freqs generation (#9396)

    * convert : refactor rope_freqs generation

      This should also fix vocab-only conversion for Phi-3.
    * convert : adapt MiniCPM3 to separate rope_freqs insertion

      MiniCPM3's tokenizer is treated as a SentencePiece tokenizer to avoid
      having to run its custom Python code, which mixes tokenization into
      the same file as tool calls.
    * gguf-py : add long and short RoPE factors to tensor mappings

      Empty, but the key names are used to populate the mappings.
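
For orientation, a rope_freqs tensor holds the inverse rotary frequencies. A generic sketch follows; the real converter additionally applies model-specific long/short scaling factors, which are omitted here.

```python
import numpy as np

def rope_freqs(head_dim: int, base: float = 10000.0) -> np.ndarray:
    # Standard RoPE inverse frequencies: base^(-2i/d) for i = 0 .. d/2 - 1.
    return 1.0 / base ** (np.arange(0, head_dim, 2, dtype=np.float64) / head_dim)

print(rope_freqs(8))
```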