Olivier Chafik  90f9b88afb  2025-02-02 19:58:34 +00:00
nit: more informative crash when grammar sampler fails (#11593)

piDack  0cec062a63  2025-02-02 09:48:46 +02:00
llama : add support for GLM-Edge and GLM-Edge-V series models (#10573)
* add glm edge chat model
* use config partial_rotary_factor as rope ratio
* support for glm edge model
* vision model support
* remove debug info
* fix format
* llava.cpp trailing whitespace
* remove unused AutoTokenizer
* Update src/llama.cpp for not contain <|end|> or </s>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* add edge template
* fix chat template
* fix conflict
* fix conflict
* fix ci err
* fix format err
* fix template err
* 9b hf chat support
* format
* format clip.cpp
* fix format
* Apply suggestions from code review
* Apply suggestions from code review
* Update examples/llava/clip.cpp
* fix format
* minor : style
---------
Co-authored-by: liyuhang <yuhang.li@zhipuai.cn>
Co-authored-by: piDack <pcdack@hotmail.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: liyuhang <yuhang.li@aminer.cn>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Olivier Chafik  8b576b6c55  2025-01-30 19:13:58 +00:00
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

mgroeber9110  ffd0821c57  2025-01-30 12:10:59 +02:00
vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496)

Molly Sophia  325afb370a  2025-01-29 12:07:21 +08:00
llama: fix missing k_cache store for rwkv6qwen2 (#11445)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

lexasub  a5203b4465  2025-01-27 14:42:09 +01:00
llama : minor fixes to speed up llama model loading (#11448)
* impl::load: change the bpe_ranks map to an unordered map, reducing impl::load time by ~30%
* llama_model_loader::init_mapping: replace `new llama_mmap` with std::make_unique<llama_mmap> for cleaner code, roughly halving init_mappings run time
* Update src/llama-vocab.cpp
---------
Co-authored-by: lexasub <empty@empty.ru>
Co-authored-by: Diego Devesa <slarengh@gmail.com>

Johannes Gäßler  df984e0147  2025-01-27 12:07:12 +01:00
llama: refactor llama_decode_impl (#11381)

Frank Mai  1d8ee06000  2025-01-26 16:20:34 +01:00
rpc: fix register position (#11424)
Signed-off-by: thxCode <thxcode0824@gmail.com>

Olivier Chafik  6171c9d258  2025-01-21 13:18:51 +00:00
Add Jinja template support (#11016)
* Copy minja from 58f0ca6dd7
* Add --jinja and --chat-template-file flags
* Add missing <optional> include
* Avoid print in get_hf_chat_template.py
* No designated initializers yet
* Try and work around msvc++ non-macro max resolution quirk
* Update test_chat_completion.py
* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template
* Refactor test-chat-template
* Test templates w/ minja
* Fix deprecation
* Add --jinja to llama-run
* Update common_chat_format_example to use minja template wrapper
* Test chat_template in e2e test
* Update utils.py
* Update test_chat_completion.py
* Update run.cpp
* Update arg.cpp
* Refactor common_chat_* functions to accept minja template + use_jinja option
* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE
* Revert LLAMA_CHATML_TEMPLATE refactor
* Normalize newlines in test-chat-templates for windows tests
* Forward decl minja::chat_template to avoid eager json dep
* Flush stdout in chat template before potential crash
* Fix copy elision warning
* Rm unused optional include
* Add missing optional include to server.cpp
* Disable jinja test that has a cryptic windows failure
* minja: fix vigogne (https://github.com/google/minja/pull/22)
* Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos / bos tokens when jinja template references them
* rename: common_chat_template[s]
* reinstate assert on chat_templates.template_default
* Update minja to b8437df626
* Update minja to https://github.com/google/minja/pull/25
* Update minja from https://github.com/google/minja/pull/27
* rm unused optional header
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Christopher Nielsen  90d987b105  2025-01-20 16:02:43 +02:00
mmap: add include for cerrno (#11296)
ggml-ci
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

Xuan Son Nguyen  ec7f3ac9ab  2025-01-20 14:35:07 +01:00
llama : add support for Deepseek-R1-Qwen distill model (#11310)
* llama : add support for Deepseek-R1-Qwen distill model
* coding style

Georgi Gerganov  ef6dada60c  2025-01-20 09:29:32 +02:00
cont : fix whitespaces (#11305)

Kyle Bruene  ae3c1db2f9  2025-01-20 09:21:01 +02:00
llama : re-add LLM_ARCH_PHIMOE (#11305)
Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor.

Georgi Gerganov  4dd34ff831  2025-01-18 16:18:15 +02:00
cmake : add sanitizer flags for llama.cpp (#11279)
* cmake : add sanitizer flags for llama.cpp
ggml-ci
* tests : fix compile warnings
ggml-ci
* cmake : move sanitizer flags to llama_add_compile_flags
ggml-ci
* cmake : move llama.cpp compile flags to top level lists
ggml-ci
* cmake : apply only sanitizer flags at top level
ggml-ci
* tests : fix gguf context use in same_tensor_data
* gguf-test: tensor data comparison
* dummy : trigger ggml-ci
* unicode : silence gcc warnings
ggml-ci
* ci : use sanitizer builds only in Debug mode
ggml-ci
* cmake : add status messages [no ci]
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

Radoslav Gerganov  667d72846c  2025-01-17 10:57:09 +02:00
rpc : early register backend devices (#11262)
Early register RPC devices and do not propagate RPC specifics in the llama model structures.
ref: #10609

Georgi Gerganov  a133566d34  2025-01-17 09:28:00 +02:00
vocab : fix double-eos check (#11273)
ggml-ci

Xuan Son Nguyen  681149ced2  2025-01-16 13:54:08 +01:00
llama : add llama_model_load_from_splits (#11255)
* llama : add `llama_model_load_from_splits`
* update

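For illustration only, a minimal sketch of how the new split-loading entry point might be called, assuming the llama.h C API of this period (llama_model_default_params, llama_model_load_from_splits, llama_model_free); the split file names are placeholders:

```cpp
#include "llama.h"

int main() {
    // Hypothetical split file names; pass them in split order.
    const char * paths[] = {
        "model-00001-of-00002.gguf",
        "model-00002-of-00002.gguf",
    };

    llama_model_params mparams = llama_model_default_params();

    // Load a single logical model from its split GGUF files.
    llama_model * model = llama_model_load_from_splits(paths, 2, mparams);
    if (model == nullptr) {
        return 1;
    }

    llama_model_free(model);
    return 0;
}
```
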
Johannes Gäßler  432df2d5f9  2025-01-15 12:51:37 +01:00
RoPE: fix back, CUDA support for back + noncont. (#11240)
* RoPE: fix back, CUDA support for back + noncont.
* fix comments reg. non-cont. RoPE support [no-ci]

Georgi Gerganov  bbf3e55e35  2025-01-14 11:54:58 +01:00
vocab : add dummy tokens for "no_vocab" type (#11231)
* vocab : add dummy tokens for "no_vocab" type
ggml-ci
* vocab : minor [no ci]

Daniel Bevenius  8f70fc3d1b  2025-01-13 13:38:20 +01:00
llama : remove 'd' from bad special token log (#11212)
This commit removes the 'd' from the log message in llama-vocab.cpp when logging a bad special token.
The motivation for this is that currently the output can look something like the following:
```console
load: bad special token:
    'tokenizer.ggml.image_token_id' = 128256d, using default id -1
```

Xuan Son Nguyen  9a483999a6  2025-01-12 13:45:14 +01:00
llama : fix chat template gguf key (#11201)

Georgi Gerganov  08f10f69c3  2025-01-12 12:15:53 +02:00
llama : remove notion of CLS token (#11064)
ggml-ci

Georgi Gerganov  afa8a9ec9b  2025-01-12 11:32:42 +02:00
llama : add llama_vocab, functions -> methods, naming (#11110)
* llama : functions -> methods (#11110)
* llama : add struct llama_vocab to the API (#11156)
ggml-ci
* hparams : move vocab params to llama_vocab (#11159)
ggml-ci
* vocab : more pimpl (#11165)
ggml-ci
* vocab : minor tokenization optimizations (#11160)
ggml-ci
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* lora : update API names (#11167)
ggml-ci
* llama : update API names to use correct prefix (#11174)
* llama : update API names to use correct prefix
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* minor [no ci]
* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174)
ggml-ci
* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174)
ggml-ci
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>

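As a rough sketch of the renamed vocab accessors mentioned above, assuming `model` is a loaded llama_model * and the post-rename llama.h declarations:

```cpp
// Sketch only: `model` is assumed to be a loaded llama_model *.
const llama_vocab * vocab = llama_model_get_vocab(model);

const int32_t n_tokens = llama_vocab_n_tokens(vocab);    // was llama_vocab_n_vocab
const bool    add_bos  = llama_vocab_get_add_bos(vocab); // was llama_vocab_add_bos
const bool    add_eos  = llama_vocab_get_add_eos(vocab); // was llama_vocab_add_eos
```
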
Molly Sophia  ee7136c6d1  2025-01-10 09:58:08 +08:00
llama: add support for QRWKV6 model architecture (#11001)
* WIP: Add support for RWKV6Qwen2
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV: Some graph simplification
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Add support for RWKV6Qwen2 with cpu and cuda GLA
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix some typos
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* code format changes
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix wkv test & add gla test
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix cuda warning
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update README.md
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update ggml/src/ggml-cuda/gla.cu
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Fix fused lerp weights loading with RWKV6
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* better sanity check skipping for QRWKV6 in llama-quant
thanks @compilade
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: compilade <git@compilade.net>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>

Pierrick Hymbert  f8feb4b01a  2025-01-09 11:21:41 +01:00
model: Add support for PhiMoE arch (#11003)
* model: support phimoe
* python linter
* doc: minor
Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
* doc: minor
Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
* doc: add phimoe as supported model
ggml-ci
---------
Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>

Xuan Son Nguyen  d9feae1c06  2025-01-09 10:07:33 +01:00
llama-chat : add phi 4 template (#11148)

Xuan Son Nguyen  4d2b3d8804  2025-01-08 15:59:53 +01:00
lora : improve compat with mergekit-extract-lora (#11131)
* (wip) support mergekit-extracted lora
* support mergekit-extract-lora
* use lora->get_scale
* correct comment
* correct norm name & condition
* add some hints

Georgi Gerganov  c07d437bbd  2025-01-08 16:19:36 +02:00
llama : avoid hardcoded QK_K (#11061)
ggml-ci

Johannes Gäßler  53ff6b9b9f  2025-01-07 18:01:58 +01:00
GGUF: C++ refactor, backend support, misc fixes (#11030)
* GGUF: C++ refactor, backend support, misc fixes
remove ggml_tensor.backend
update CODEOWNERS [no ci]
remove gguf_get_data from API
revise GGUF API data types

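A hedged sketch of read-only GGUF metadata inspection against the refactored API; the header name, the field layout of gguf_init_params, and the file name are assumptions, not details confirmed by this entry:

```cpp
#include "gguf.h"  // assumed header location after the refactor

int main() {
    // Read metadata only; do not allocate tensor data.
    gguf_init_params params = { /*no_alloc =*/ true, /*ctx =*/ nullptr };

    gguf_context * ctx = gguf_init_from_file("model.gguf", params);
    if (ctx == nullptr) {
        return 1;
    }

    const int64_t n_kv = gguf_get_n_kv(ctx); // number of key/value metadata pairs
    (void) n_kv;

    gguf_free(ctx);
    return 0;
}
```
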
Georgi Gerganov  ecebbd292d  2025-01-06 17:52:35 +02:00
llama : remove unused headers (#11109)
ggml-ci

Xuan Son Nguyen  09186fabbe  2025-01-06 13:41:12 +01:00
llama : remove check flash_attn with lora (#11104)

Asghar Ghorbani  96a1dc27c3  2025-01-06 13:21:46 +02:00
llama : prevent system info string accumulation across calls (#11101)

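Presumably this fixes llama_print_system_info() so that repeated calls return a fresh report instead of appending to the previous one; a tiny sketch of the call in question:

```cpp
#include <cstdio>
#include "llama.h"

int main() {
    // Reports the enabled CPU/backend features as a single string.
    // With this fix, a second call prints the same report rather than
    // the previous output with another copy appended.
    std::printf("%s\n", llama_print_system_info());
    std::printf("%s\n", llama_print_system_info());
    return 0;
}
```
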
Daniel Bevenius  6369f867a4  2025-01-06 11:28:17 +02:00
llama : rename missed batch params/vars to ubatch (#10059)
This commit renames the `batch` parameter to `ubatch` in the `llama_kv_cache_find_slot`, `llm_build_inp_embd`, and `llm_build_mamba` functions.
The motivation for this is that this should have been done as part of commit 19d900a756 ("llama : rename batch to ubatch (#9950)") but for some reason I missed these functions in that commit and only noticed them now (sorry).

Georgi Gerganov  47182dd03f  2025-01-06 10:55:18 +02:00
llama : update llama_model API names (#11063)
* llama : deprecate llama_free_model, add llama_model_free
ggml-ci
* llama : change `llama_load_model_from_file` -> `llama_model_load_from_file`
ggml-ci

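A minimal sketch of the renamed load/free pair; the GGUF path is a placeholder and the exact declarations are assumed from the rename described above:

```cpp
#include "llama.h"

int main() {
    llama_model_params mparams = llama_model_default_params();

    // Renamed in this change: llama_load_model_from_file -> llama_model_load_from_file.
    llama_model * model = llama_model_load_from_file("model.gguf", mparams);
    if (model == nullptr) {
        return 1;
    }

    // Renamed in this change: llama_free_model is deprecated in favour of llama_model_free.
    llama_model_free(model);
    return 0;
}
```
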
Georgi Gerganov  ae2f606bb5  2025-01-06 10:52:38 +02:00
mmap : fix fileno macro clash (#11076)
* mmap : fix fileno macro clash
ggml-ci
* cont
ggml-ci

Georgi Gerganov  727368c60f  2025-01-06 10:52:15 +02:00
llama : use LLAMA_TOKEN_NULL (#11062)
ggml-ci

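LLAMA_TOKEN_NULL is the named constant for an absent token id (-1 in llama.h). A hedged sketch of a typical check, assuming a loaded model and the later vocab accessors:

```cpp
// Sketch only: `model` is assumed to be a loaded llama_model *.
const llama_vocab * vocab = llama_model_get_vocab(model);

// Prefer the named constant over a literal -1 when testing for "no such token".
if (llama_vocab_eos(vocab) == LLAMA_TOKEN_NULL) {
    // this vocab does not define an EOS token
}
```
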
Georgi Gerganov  5047dd3546  2025-01-06 10:52:01 +02:00
llama : use _impl suffix instead of _internal (#11060)
ggml-ci

fairydreaming  9394bbd484  2025-01-04 21:06:11 +01:00
llama : Add support for DeepSeek V3 (#11049)
* convert : extend DEEPSEEK2 model architecture to support DeepseekV3ForCausalLM by adding EXPERT_WEIGHTS_NORM and EXPERT_GATING_FUNC model parameters and FFN_EXP_PROBS_B tensor type
* vocab : add DeepSeek V3 pre-tokenizer regexes
* unicode : handle ACCENT_MARK and SYMBOL categories in regex
* llama : add DeepSeek V3 chat template, handle new model parameters and tensor types
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

DAN™  46be942214  2025-01-04 16:33:31 +02:00
llama : add support for the cohere2 model architecture (#10900)

Georgi Gerganov  f66f582927  2025-01-03 10:18:53 +02:00
llama : refactor src/llama.cpp (#10902)
* llama : scatter llama.cpp into multiple modules (wip)
* llama : control-vector -> adapter
* llama : arch
* llama : mmap
ggml-ci
* ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
* llama : arch (cont)
ggml-ci
* llama : chat
ggml-ci
* llama : model
ggml-ci
* llama : hparams
ggml-ci
* llama : adapter
ggml-ci
* examples : fix
ggml-ci
* rebase
ggml-ci
* minor
* llama : kv cache
ggml-ci
* llama : impl
ggml-ci
* llama : batch
ggml-ci
* cont
ggml-ci
* llama : context
ggml-ci
* minor
* llama : context (cont)
ggml-ci
* llama : model loader
ggml-ci
* common : update lora
ggml-ci
* llama : quant
ggml-ci
* llama : quant (cont)
ggml-ci
* minor [no ci]

Georgi Gerganov  30caac3a68  2024-12-24 09:44:20 +02:00
llama : the WPM vocabs use the CLS token as BOS (#10930)
* llama : the WPM vocabs use the CLS token as BOS
ggml-ci
* llama : add comment

Yun Dou  b92a14a841  2024-12-23 01:35:44 +01:00
llama : support InfiniAI Megrez 3b (#10893)
* Support InfiniAI Megrez 3b
* Fix tokenizer_clean_spaces for megrez

ymcki  6f0c9e034b  2024-12-23 01:22:33 +01:00
llama : support for Llama-3_1-Nemotron-51B (#10669)
* conflict resolution
* move comments after bracket to their own lines

Billel Mokeddem  7ae33a616f  2024-12-23 00:09:58 +02:00
llama : add Falcon3 support (#10883)
* Add Falcon3 model support
* Add fix for adding bos to added special tokens
* Add comment explaining the logic behind the if statement
* Add a log message to better track when the following line of code is triggered
* Update log to only print when input and output characters are different
* Fix handling pre-normalized tokens
* Refactoring

Georgi Gerganov  5cab3e4aaa  2024-12-19 17:42:13 +02:00
llama : minor grammar refactor (#10897)
ggml-ci

Sukriti Sharma  2fffc52b50  2024-12-19 15:04:51 +02:00
llama : fix Roberta embeddings (#10856)
* fix: Use gpt2 tokenizer for roberta and add eos/bos tokens
Branch: RobertaTokenizer
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fixes to position embeddings
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* map roberta-bpe to gpt-2
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
* fix linting
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Gabe Goodhart <ghart@us.ibm.com>

fairydreaming  7585edbdeb  2024-12-19 10:37:12 +01:00
convert : Add support for Microsoft Phi-4 model (#10817)
* convert : use GPT2 vocab for Phi-4 model
* convert : use null value of sliding_window to distinguish Phi-4 from other PHI3-based models
* llama : do not use sliding window attention mask for Phi-4 model
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

Georgi Gerganov  0bf2d10c55  2024-12-18 19:27:21 +02:00
tts : add OuteTTS support (#10784)
* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : be explicit about the pooling type in the tests
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* llama : add OuteTTS support (wip)
* wip
* extract features
* first conv
* group norm
* resnet conv
* resnet
* attn
* pos net
* layer norm
* convnext
* head
* hann window
* fix n_embd + remove llama.cpp hacks
* compute hann window
* fft
* spectrum processing
* clean-up
* tts : receive input text and generate codes
* clip : fix new conv name
* tts : minor fix
* tts : add header + minor fixes
ggml-ci
* tts : add mathematical constant
ggml-ci
* tts : fix sampling + cut initial noise
* tts : fixes
* tts : update default samplers
ggml-ci
* tts : text pre-processing
* tts : outetts-voc -> wavtokenizer-dec
* tts : remove hardcoded constants
ggml-ci
* tts : fix tensor shapes
* llama : refactor wavtokenizer tensors
ggml-ci
* cont
ggml-ci
* cont [no ci]
* llama : update WavTokenizer to non-causal attn
* llama : handle no-vocab detokenization
* tts : add Python example for OuteTTS (wip)
* tts : extend python example to generate spectrogram
ggml-ci
* server : fix rebase artifacts
* tts : enable "return_tokens" in Python example
ggml-ci
* tts : minor fixes
* common : support HF download for vocoder

Diego Devesa  4da69d1abd  2024-12-18 01:36:46 +01:00
Revert "llama : add Falcon3 support (#10864)" (#10876)
This reverts commit 382bc7f2e8.

DAN™  d62b532c52  2024-12-17 23:24:22 +01:00
Use model->gguf_kv for loading the template instead of using the C API. (#10868)
* Bump model_template to 16384 bytes to support larger chat templates.
* Use `model->gguf_kv` for efficiency.