953c2a62cf  2025-03-27 11:43:33 +01:00  HighDoping
    model : restore support for T5Encoder (#12590)

fbdfefe74e  2025-03-22 23:28:19 +01:00  Xuan-Son Nguyen
    llama : gemma3 : use output tensor if it exists in model weight (#12506)
    * also add to the llm_tensor_names

af04481e6b  2025-03-21 16:14:29 +02:00  Georgi Gerganov
    model : do not repack if a GPU device is present (#12498)

960e726077  2025-03-21 10:21:36 +01:00  Sigbjørn Skjæret
    chore : cleanup llama_model_loader::TENSOR_ usage (#12492)

dbb3a4739e  2025-03-20 12:49:59 +01:00  Sigbjørn Skjæret
    llama : make Qwen2MoE QKV bias optional (#12477)

108e53c2f1  2025-03-19 09:08:49 +01:00  Sigbjørn Skjæret
    llama : add support for GPT2, Bloom and CodeShell tied word embeddings (#12456)
    * deduplicate tied word embeddings weights
    * workaround for an incorrect weight map: transformer.wte.weight appears in
      the weight map even though the weights are not there, so remove it if the
      output weights are encountered first

75422e8bc4  2025-03-18 21:35:19 +02:00  Georgi Gerganov
    graph : normalize Q, K, V shapes + sync cross attention (#12449)
    * graph : normalize Q, K, V shapes and add comments
    * context : synchronize before getting cross attention data
    * model : fix command-r attention norm check

99aa304fb9  2025-03-18 17:24:33 +01:00  Xuan-Son Nguyen
    llama : add support for EXAONE tied word embeddings (#12451)

7dfad387e3  2025-03-18 07:27:50 +08:00  Molly Sophia
    llama: Add support for RWKV v7 architecture (#12412)
    * ggml : add ops l2_norm and rwkv_wkv7
    * llama : add support for RWKV7 and ARWKV7 models
    * llama : fix inference with RWKV6Qwen2
    * llama : add more (a)rwkv7 variants in size
    * fix MUSA build; fix shape error with rwkv using llama-parallel
    Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

8ba95dca20  2025-03-16 19:46:36 +02:00  Sigbjørn Skjæret
    llama : fix OLMo-2-0325-32B-Instruct K-norm size (#12400)

c522ce4143  2025-03-14 10:47:44 +02:00  Georgi Gerganov
    graph : simplify attn input build for unified KV cache (#12381)

081bee8c64  2025-03-14 09:03:24 +02:00  Georgi Gerganov
    hparams : add SWA rope parameters (#12374)

84d5475541  2025-03-13 19:08:07 +02:00  Georgi Gerganov
    llama : fix Gemma3 SWA KV cache shift (#12373)
    * hparams : add comment

e0dbec0bc6  2025-03-13 12:35:44 +02:00  Georgi Gerganov
    llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)
    * graph : don't mutate the KV cache during defrag
    * context : reduce virtuals + remove test function
    * context : move interface implementation to source file + factory
    * graph : move KV cache build functions to llama_context impl
    * graph : remove model reference from build_pooling
    * graph : remove llama_model reference
    * kv_cache : provide rope factors
    * graph : rework inputs to use only unique_ptr, remove attn input abstraction
    * context : remove llama_context_i abstraction
    * llama : remove redundant keywords (struct, enum)
    * model : adapt gemma3
    * graph : restore same attention ops as on master

7841fc723e  2025-03-12 09:30:24 +01:00  Xuan-Son Nguyen
    llama : Add Gemma 3 support (+ experimental vision capability) (#12343)
    * llama : add Gemma 3 text-only support
    * clip : experimental support for Gemma 3 vision (#12344)
    * assorted Python style and Ubuntu build fixes

c43a3e7996  2025-02-28 12:44:11 +01:00  Xuan-Son Nguyen
    llama : add Phi-4-mini support (supersedes #12099) (#12108)
    * change the vocab base to Xenova/gpt-4o; update regex per ngxson
    * fix the conversion update script; no need to check longrope
    Co-authored-by: Nicholas Sparks <nisparks@microsoft.com>
3e9a2860e9  2025-02-25 11:29:33 +02:00  Vitali Lovich
    llama : expose llama_model_n_head_kv in the API (#11997)
    It is useful to have this at the library layer, since it is a key parameter
    of the model (e.g. for figuring out how much KV cache memory is needed).
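    Note: a minimal sketch of what the new getter enables, assuming an f16 K/V
    cache and a uniform head size of n_embd / n_head (real caches may be
    quantized or laid out differently); estimate_kv_bytes is a hypothetical
    helper, not part of the library:

        // Rough KV cache size estimate built on llama_model_n_head_kv().
        // Assumes an f16 cache (2 bytes per element, K and V stored separately).
        #include "llama.h"

        static size_t estimate_kv_bytes(const llama_model * model, size_t n_ctx) {
            const size_t n_layer   = llama_model_n_layer(model);
            const size_t n_head_kv = llama_model_n_head_kv(model);
            // assumption: uniform head size across layers
            const size_t head_dim  = llama_model_n_embd(model) / llama_model_n_head(model);
            return n_layer * n_ctx * n_head_kv * head_dim * 2 /* K and V */ * 2 /* f16 */;
        }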
51f311e057  2025-02-21 18:33:18 +02:00  Georgi Gerganov
    llama : skip loading unused tensors (#12004)
    * llama : assign unknown/unused tensors to host buffer type
    * llama : skip unused tensors

bdcf8b6a56  2025-02-08 16:49:38 +02:00  Georgi Gerganov
    cont : fix mmap flag print (#11699)

9dd7a0390f  2025-02-06 13:41:37 +02:00  Georgi Gerganov
    llama : add log about loading model tensors (#11699)

0cec062a63  2025-02-02 09:48:46 +02:00  piDack
    llama : add support for GLM-Edge and GLM-Edge-V series models (#10573)
    * add GLM-Edge chat model; use config partial_rotary_factor as rope ratio
    * vision model support
    * add and fix the chat template; support the 9B HF chat model
    * make sure the output does not contain <|end|> or </s>
    * assorted format, conflict and CI fixes
    Co-authored-by: liyuhang <yuhang.li@zhipuai.cn>
    Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

1d8ee06000  2025-01-26 16:20:34 +01:00  Frank Mai
    rpc: fix register position (#11424)
    Signed-off-by: thxCode <thxcode0824@gmail.com>

6171c9d258  2025-01-21 13:18:51 +00:00  Olivier Chafik
    Add Jinja template support (#11016)
    * copy minja from 58f0ca6dd7 (https://github.com/google/minja/pull/22), later
      updated to b8437df626 (https://github.com/google/minja/pull/25) and
      https://github.com/google/minja/pull/27
    * move chat_templates inside server_context + remove mutex
    * update --chat-template-file w/ recent change to --chat-template
    * refactor chat template validation
    * guard against missing eos/bos tokens (a null token otherwise throws in
      llama_vocab::impl::token_get_attr) and warn when the jinja template
      references them
    * rename: common_chat_template[s]
    Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
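    Note: a typical invocation with the flags this change touches, assuming a
    llama-server build from this revision (model and template paths are
    illustrative):

        llama-server -m model.gguf --jinja --chat-template-file chat.jinja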
ef6dada60c  2025-01-20 09:29:32 +02:00  Georgi Gerganov
    cont : fix whitespaces (#11305)

ae3c1db2f9  2025-01-20 09:21:01 +02:00  Kyle Bruene
    llama : re-add LLM_ARCH_PHIMOE (#11305)
    Phi 3.5 MoE was partially removed during a refactor. The code was originally
    in llama.cpp and should live in llama-model.cpp after the refactor.

667d72846c  2025-01-17 10:57:09 +02:00  Radoslav Gerganov
    rpc : early register backend devices (#11262)
    Register RPC devices early and do not propagate RPC specifics into the
    llama model structures. (ref: #10609)
afa8a9ec9b  2025-01-12 11:32:42 +02:00  Georgi Gerganov
    llama : add llama_vocab, functions -> methods, naming (#11110)
    * llama : functions -> methods (#11110)
    * llama : add struct llama_vocab to the API (#11156)
    * hparams : move vocab params to llama_vocab (#11159)
    * vocab : more pimpl (#11165)
    * vocab : minor tokenization optimizations (#11160)
    * lora : update API names (#11167)
    * llama : update API names to use correct prefix (#11174)
    * vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174)
    * vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174)
    Co-authored-by: Diego Devesa <slarengh@gmail.com>
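    Note: a short sketch of the resulting vocab API, assuming an already loaded
    model; function names as in llama.h after this change:

        #include <cstdio>
        #include "llama.h"

        static void print_vocab_info(const llama_model * model) {
            const llama_vocab * vocab = llama_model_get_vocab(model);
            printf("n_tokens: %d\n", llama_vocab_n_tokens(vocab));    // was llama_vocab_n_vocab
            printf("add_bos : %d\n", llama_vocab_get_add_bos(vocab)); // was llama_vocab_add_bos
        }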
ee7136c6d1  2025-01-10 09:58:08 +08:00  Molly Sophia
    llama: add support for QRWKV6 model architecture (#11001)
    * add support for RWKV6Qwen2 with CPU and CUDA GLA; some graph simplification
    * RWKV6[QWEN2] : concat lerp weights together to reduce CPU overhead
    * fix wkv test & add gla test; fix CUDA warning
    * fix fused lerp weights loading with RWKV6
    * better sanity-check skipping for QRWKV6 in llama-quant (thanks @compilade)
    Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    Co-authored-by: compilade <git@compilade.net>

f8feb4b01a  2025-01-09 11:21:41 +01:00  Pierrick Hymbert
    model: Add support for PhiMoE arch (#11003)
    * model : support phimoe
    * doc : add phimoe as a supported model
    Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
47182dd03f  2025-01-06 10:55:18 +02:00  Georgi Gerganov
    llama : update llama_model API names (#11063)
    * llama : deprecate llama_free_model, add llama_model_free
    * llama : change `llama_load_model_from_file` -> `llama_model_load_from_file`
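    Note: a migration sketch for the renamed pair (the model path is
    illustrative; old names remain as deprecated aliases per the commit):

        #include "llama.h"

        int main() {
            llama_model_params mparams = llama_model_default_params();
            // was: llama_load_model_from_file(...)
            llama_model * model = llama_model_load_from_file("model.gguf", mparams);
            if (model == nullptr) {
                return 1;
            }
            // was: llama_free_model(model)
            llama_model_free(model);
            return 0;
        }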
727368c60f  2025-01-06 10:52:15 +02:00  Georgi Gerganov
    llama : use LLAMA_TOKEN_NULL (#11062)
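    Note: a minimal sketch of the intended usage, written against the vocab-API
    names from the #11110 entry above; the sentinel replaces scattered bare -1
    checks:

        #include "llama.h"

        // LLAMA_TOKEN_NULL (-1) marks "no such token", e.g. a missing EOS.
        static bool vocab_has_eos(const llama_vocab * vocab) {
            return llama_vocab_eos(vocab) != LLAMA_TOKEN_NULL;
        }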
9394bbd484  2025-01-04 21:06:11 +01:00  fairydreaming
    llama : Add support for DeepSeek V3 (#11049)
    * convert : extend the DEEPSEEK2 model architecture to support
      DeepseekV3ForCausalLM by adding the EXPERT_WEIGHTS_NORM and
      EXPERT_GATING_FUNC model parameters and the FFN_EXP_PROBS_B tensor type
    * vocab : add DeepSeek V3 pre-tokenizer regexes
    * unicode : handle ACCENT_MARK and SYMBOL categories in regex
    * llama : add the DeepSeek V3 chat template; handle the new model
      parameters and tensor types
    Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

46be942214  2025-01-04 16:33:31 +02:00  DAN™
    llama : add support for the cohere2 model architecture (#10900)

f66f582927  2025-01-03 10:18:53 +02:00  Georgi Gerganov
    llama : refactor src/llama.cpp (#10902)
    * llama : scatter llama.cpp into multiple modules (arch, chat, model,
      hparams, adapter, kv cache, impl, batch, context, model loader, quant)
    * llama : control-vector -> adapter
    * common : update lora
    * ci : remove BUILD_SHARED_LIBS=OFF
    * examples : fix