dd6e6d0b6a  vocab : prevent tokenizer overflow (#14301)
  Ruikai Peng, 2025-06-20 07:13:06 -07:00
  * vocab : prevent stack overflow in tokenize
  * vocab : return error instead of aborting on oversized token count
  * vocab : INT32_MIN from llama_tokenize on overflow
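This change turns an abort into a reportable error: `llama_tokenize` keeps the usual "negative return means required buffer size" convention, and now returns INT32_MIN when the token count itself overflows int32. A minimal two-pass sketch (the helper name and error handling are this sketch's assumptions, not part of the API):

```cpp
#include <climits>
#include <string>
#include <vector>

#include "llama.h"

// Two-pass tokenization: probe for the required size, then fill the buffer.
// After #14301, an input whose token count does not fit in int32_t yields
// INT32_MIN instead of aborting the process.
static bool tokenize_safe(const llama_vocab * vocab, const std::string & text,
                          std::vector<llama_token> & out) {
    const int32_t n_probe = llama_tokenize(vocab, text.c_str(), (int32_t) text.size(),
                                           nullptr, 0,
                                           /*add_special  =*/ true,
                                           /*parse_special=*/ true);
    if (n_probe == INT32_MIN) {
        return false; // token count does not fit in int32_t - reject this input
    }
    out.resize(n_probe < 0 ? -n_probe : n_probe);
    const int32_t n = llama_tokenize(vocab, text.c_str(), (int32_t) text.size(),
                                     out.data(), (int32_t) out.size(), true, true);
    if (n < 0) {
        return false;
    }
    out.resize(n);
    return true;
}
```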
						 
				 
			
				
					
						
							
							
88fc854b4b  llama : improve sep token handling (#14272)
  Sigbjørn Skjæret, 2025-06-20 14:04:09 +02:00

89fea80d29  server : fix incorrect usage of llama_get_embeddings() (#14225)
  Georgi Gerganov, 2025-06-16 22:33:27 +03:00
  * server : fix incorrect usage of llama_get_embeddings()
  * cont : fix the fix

d3e64b9f49  llama : rework embeddings logic (#14208)
  Georgi Gerganov, 2025-06-16 14:14:00 +03:00
  * llama : rework embeddings logic
  * cont : fix rerank
  * cont : engrish [no ci]
  * cont : fix rerank
  * server : support both embeddings and completions with single model
  * cont : avoid embeddings_org
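For context on the two embedding commits above: which getter is correct depends on the context's pooling type, which is exactly the kind of misuse #14225 fixes. A small retrieval sketch against the public API (setup and decode are assumed to have happened already):

```cpp
#include <vector>

#include "llama.h"

// Read embeddings after a successful llama_decode()/llama_encode() call.
// With LLAMA_POOLING_TYPE_NONE the result is per-token; otherwise the
// context pools one vector per sequence.
static std::vector<float> read_embd(llama_context * ctx, const llama_model * model,
                                    llama_seq_id seq, int32_t i_last) {
    const int n_embd = llama_model_n_embd(model);
    const float * src = llama_pooling_type(ctx) == LLAMA_POOLING_TYPE_NONE
        ? llama_get_embeddings_ith(ctx, i_last)  // embedding of output token i_last
        : llama_get_embeddings_seq(ctx, seq);    // pooled embedding of sequence `seq`
    if (src == nullptr) {
        return {};
    }
    return std::vector<float>(src, src + n_embd);
}
```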
						 
				 
			
				
					
						
							
							
b9912ac570  batch : auto-gen positions + verify multi-sequence input (#14177)
  Georgi Gerganov, 2025-06-15 09:18:37 +03:00
  * batch : verify multi-sequence input batches
  * cont : auto-gen positions + verify multi-seq input
  * cont : first print debug info, then perform validation
  * cont : fix position auto-gen + add comments

745aa5319b  llama : deprecate llama_kv_self_ API (#14030)
  Georgi Gerganov, 2025-06-06 14:11:15 +03:00
  * llama : deprecate llama_kv_self_ API
  * llama : allow llama_memory_(nullptr)
  * memory : add flag for optional data clear in llama_memory_clear
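The replacement surface is the llama_memory_* family from #14006 below. A rough migration sketch, assuming the post-deprecation API (llama_get_memory, llama_memory_clear with the optional-data flag this commit mentions, llama_memory_seq_rm):

```cpp
#include "llama.h"

// Clear context memory through the llama_memory_* API instead of the
// deprecated llama_kv_self_* calls.
static void reset_memory(llama_context * ctx) {
    llama_memory_t mem = llama_get_memory(ctx);

    // data == true also clears the buffers, not just the metadata
    // (the flag added by this commit).
    llama_memory_clear(mem, /*data=*/ true);
}

// Remove positions [0, n_drop) of sequence `seq`; a negative p1 would mean
// "to the end of the sequence".
static bool drop_prefix(llama_context * ctx, llama_seq_id seq, llama_pos n_drop) {
    return llama_memory_seq_rm(llama_get_memory(ctx), seq, 0, n_drop);
}
```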
						 
				 
			
				
					
						
							
							
d17a809ef0  llama : support multiple classifier outputs and labels (#13940)
  Sigbjørn Skjæret, 2025-06-06 09:03:25 +02:00

7f37b6cf1e  memory : migrate from llama_kv_cache to more generic llama_memory (#14006)
  Georgi Gerganov, 2025-06-05 15:29:22 +03:00
  * memory : merge llama_kv_cache into llama_memory + new `llama_memory` API
  * context : fix casts

803f8baf4f  llama : deprecate explicit kv_self defrag/update calls (#13921)
  Georgi Gerganov, 2025-05-31 15:58:33 +03:00

3600cc2886  llama : use n_swa + n_ubatch cells for SWA cache (#13833)
  Georgi Gerganov, 2025-05-31 15:57:44 +03:00
  * llama : use n_swa + n_ubatch cells for SWA cache
  * llama : add warning about multi-sequence SWA contexts

12d0188c0d  kv-cache : refactor + add llama_memory_state_i (#13746)
  Georgi Gerganov, 2025-05-31 10:24:04 +03:00
  * kv-cache : simplify the "struct llama_kv_cache" interface
  * kv-cache : revert the (n_swa + n_ubatch) change (for next PR)
  * kv-cache : some comments
  * context : fix graph reserve for multiple sequences
  * kv-cache : fix typo [no ci]
  * kv-cache : fix find_slot() logic for free slots
  * llama : add TODO for deprecating the defrag API in the future
  * kv-cache : improve find_slot() using min/max seq pos info
  * llama : handle aborts and compute errors
  * memory : extract state into llama_memory_state
  * kv-cache : add comments
  * server : update batching logic to reset n_batch on successful decode
  * server : upon full re-processing, remove the sequence from the cache
  * kv-cache : add TODO for doing split_equal when split_simple fails

22229314fc  llama : clarify deprecation message (#13794)
  Georgi Gerganov, 2025-05-26 12:57:50 +03:00

de2ef53a4b  kv-cache : rework kv_cell (#13706)
  Georgi Gerganov, 2025-05-25 16:34:36 +03:00
  * kv-cache : rework kv_cell
  * kv-cells : use "shift" instead of "delta" consistently
  * llama : add llama_max_parallel_sequences()
  * kv-cells : update comments [no ci]
  * context : fail upon construction if sequences exceed max value
  * kv-cells : get_pos() -> pos_get() + comments
  * kv-cells : fix tracking of "used" cells

797f2ac062  kv-cache : simplify the interface (#13660)
  Georgi Gerganov, 2025-05-21 15:11:13 +03:00
  * kv-cache : simplify the interface
  * context : revert llama_batch_allocr position change

a4090d1174  llama : remove llama_kv_cache_view API + remove deprecated (#13653)
  Georgi Gerganov, 2025-05-20 16:13:16 +03:00

e298d2fbd0  kv-cache : add SWA support (#13194)
  Georgi Gerganov, 2025-05-20 08:05:46 +03:00
  * kv-cache : prepare for SWA
  * kv-cache : initial iSWA implementation
  * kv-cache : rework error recovery logic
  * models : fix Phi-3 SWA parameters
  * model : adjust Granite to rope factor changes
  * server : check if context can do shifts
  * iswa : for now, always enable shifts (experiment)
  * kv-cache : simplify SWA logic
  * kv-cache : apply defrag when we fail to find slots for the batch
  * llama : update docs about llama_decode
  * kv-cache : update warning logs when no space for the batch is available
  * llama : add llama_kv_self_seq_pos_min()
  * kv-cache : keep track of partial SWA computes and print warnings
  * server : disallow use cases involving partial SWA context
  * llama : add param to control SWA cache size
  * minor : clean-up
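The "param to control SWA cache size" named in the last bullet above surfaced in the context parameters. Assuming it is the `swa_full` boolean added around this change (treat the field name as this sketch's assumption), opting back into a full-size cache looks roughly like:

```cpp
#include "llama.h"

// Sketch: request a full-size cache for a sliding-window-attention model so
// that context shifting and partial reuse keep working, at the cost of memory.
static llama_context_params make_swa_params() {
    llama_context_params cparams = llama_context_default_params();
    cparams.swa_full = true; // full-size SWA cache instead of n_swa + n_ubatch cells
    return cparams;
}
```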
						 
				 
			
				
					
						
							
							
cf0a43bb64  llama-bench : add defrag-thold, check for invalid ranges (#13487)
  Diego Devesa, 2025-05-13 00:31:37 +02:00

10d2af0eaa  llama/ggml: add LLM training support (#10544)
  Johannes Gäßler, 2025-05-12 14:44:49 +02:00
  * llama/ggml: add LLM training support
    (more compact progress bar, llama_save_model_to_file, llama_opt_param_filter,
    ggml_graph_dup force_grads, refactor ggml_opt, fix test-opt)
  * remove logits_all
  * refactor CUDA implementation for ACC
  * reset graph at beginning of opt period

7f323a589f  Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B (#13386)
  David Huang, 2025-05-11 14:18:39 +02:00

d2a4ef05c6  vocab : add ByteDance-Seed/Seed-Coder (#13423)
  Sigbjørn Skjæret, 2025-05-10 22:08:07 +02:00

6562e5a4d6  context : allow cache-less context for embeddings (#13108)
  Georgi Gerganov, 2025-05-08 14:28:33 +03:00
  * context : allow cache-less context for embeddings
  * context : enable reranking with encode()
  * context : encode() clears embd_seq
  * examples : use llama_encode() when appropriate
  * models : nomic bert moe does not require KV cache
  * llama : update comments for llama_decode/llama_encode
  * context : update warning log [no ci]
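With a cache-less context, encoder-style embedding models run through llama_encode() rather than llama_decode(). A minimal sketch (batch construction via llama_batch_get_one; the embeddings flag and pooling type are assumed to be set in the context params):

```cpp
#include <vector>

#include "llama.h"

// Embed a pre-tokenized input with a cache-less, embeddings-only context.
static std::vector<float> embed_tokens(llama_context * ctx, const llama_model * model,
                                       std::vector<llama_token> & tokens) {
    // single-sequence batch over the whole input (defaults to seq 0)
    llama_batch batch = llama_batch_get_one(tokens.data(), (int32_t) tokens.size());

    if (llama_encode(ctx, batch) != 0) {
        return {}; // encode failed
    }

    const float * embd = llama_get_embeddings_seq(ctx, 0); // pooled, sequence 0
    const int     n    = llama_model_n_embd(model);
    return embd ? std::vector<float>(embd, embd + n) : std::vector<float>();
}
```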
						 
				 
			
				
					
						
							
							
51fb96b1ff  context : remove logits_all flag (#13284)
  Georgi Gerganov, 2025-05-08 14:26:50 +03:00
  * context : remove logits_all flag
  * llama : remove logits_all flag + reorder llama_context_params

d9d398f84f  sampling : when top-k <= 0 -> noop (#13173)
  Georgi Gerganov, 2025-04-29 20:22:57 +03:00
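After #13173, a non-positive k passes all candidates through instead of clamping to a tiny set, so k <= 0 can be used uniformly to mean "disabled" when building a chain. A small sketch:

```cpp
#include "llama.h"

// Build a sampler chain where top_k <= 0 now disables the top-k step
// (pass-through) rather than keeping only a degenerate candidate set.
static llama_sampler * make_chain(int32_t top_k, float temp, uint32_t seed) {
    llama_sampler * chain = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(chain, llama_sampler_init_top_k(top_k)); // noop when top_k <= 0
    llama_sampler_chain_add(chain, llama_sampler_init_temp(temp));
    llama_sampler_chain_add(chain, llama_sampler_init_dist(seed));
    return chain; // caller frees with llama_sampler_free()
}
```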
						 
				 
			
				
					
						
							
							
ecda2ec4b3  mtmd : Support Pixtral 12B (#13065)
  Xuan-Son Nguyen, 2025-04-23 20:21:59 +02:00
  * add pixtral text model (vision is wip)
  * cgraph ok, just missing 2D RoPE
  * fix bad rebase
  * first working version
  * fix problem with img_break token
  * support dynamic image size
  * update docs
  * update test script

71e90e8813  quantize: Handle user-defined quantization levels for additional tensors (#12511)
  Ed Addario, 2025-04-13 21:29:28 +03:00
  * Add llama_model_quantize_params parameters
  * Add new quantize parameters parsing and validation
  * Update usage
  * Add new parameters defaults
  * Add new quantization parameters logic
  * Minor refactoring as per the contributors' coding guidelines
  * Update descriptions to match existing style
  * Implement general --tensor-type instead of tensor-specific command option
  * Fix implied type bug
  * Restore missing #includes
  * Add regex capability for tensor selection
  * Refactor function name and update ALLOWED_TENSOR_TYPE
  * Add missing #include
  * Handle edge case when tensor name is cls.output
  * Minor logging improvement

1466621e73  llama : Support llama 4 text-only (#12791)
  Xuan-Son Nguyen, 2025-04-07 23:06:44 +02:00
  * llama4 conversion
  * initial support, no chat template
  * clean up a bit
  * fix tokenizer conversion
  * correct hparams
  * try this
  * fix shexp
  * ffn_inp_normed
  * chat template
  * clean up model conversion
  * add_bos
  * add scale_before_ffn
  * fix order
  * weight_before_ffn
  * llm_graph_input_attn_temp
  * add chunk attn mask
  * build_inp_attn_scale()
  * add comment about ggml_repeat
  * clarify comments
  * fix build

e0e912f49b  llama : add option to override model tensor buffers (#11397)
  Diego Devesa, 2025-04-02 14:52:01 +02:00
  * llama : add option to override tensor buffers
  * ggml : fix possible underflow in ggml_nbytes

2c3f8b850a  llama : support BailingMoE (Ling) (#12634)
  Sigbjørn Skjæret, 2025-03-30 22:21:03 +02:00

b3de7cac73  llama : add Trillion 7B model support (#12556)
  Juyoung Suk, 2025-03-30 20:38:33 +02:00
  * Support Trillion 7B
  * Update llama.h
  * Update llama-vocab.cpp for Trillion

dd373dd3bf  llama: fix error on bad grammar (#12628)
  Johannes Gäßler, 2025-03-28 18:08:52 +01:00

00d53800e0  llama-vocab : add SuperBPE pre-tokenizer (#12532)
  compilade, 2025-03-24 11:47:24 +01:00

8fcb563613  Load all MoE experts during warmup (#11571)
  fairydreaming, 2025-03-14 13:47:05 +01:00
  * llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup
  * common : use new API to enable warmup mode during model warmup
  Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
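The llama_set_warmup() call named in the commit body gates warmup behavior on the context. A sketch of how a warmup decode that touches all MoE experts would be wrapped (batch construction and result handling elided):

```cpp
#include "llama.h"

// Run one decode with warmup mode on, so MoE models load all experts up
// front, then return to normal inference behavior.
static void warmup(llama_context * ctx, llama_batch batch) {
    llama_set_warmup(ctx, true);   // warmup mode: all experts are exercised
    llama_decode(ctx, batch);      // output logits are discarded
    llama_set_warmup(ctx, false);  // back to normal decoding
}
```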
						 
				 
			
				
					
						
							
							
e0dbec0bc6  llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)
  Georgi Gerganov, 2025-03-13 12:35:44 +02:00
  * llama : refactor llama_context, llama_kv_cache, llm_build_context
  * graph : don't mutate the KV cache during defrag
  * context : reduce virtuals + remove test function
  * context : move interface implementation to source file + factory
  * graph : move KV cache build functions to llama_context impl
  * graph : remove model reference from build_pooling
  * graph : remove llama_model reference
  * kv_cache : provide rope factors
  * graph : rework inputs to use only unique_ptr, remove attn input abstraction
  * context : remove llama_context_i abstraction
  * context : clean-up
  * graph : clean-up
  * llama : remove redundant keywords (struct, enum)
  * model : adapt gemma3
  * graph : restore same attention ops as on master
  * llama : remove TODO + fix indent

669912d9a5  tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
  Olivier Chafik, 2025-03-05 13:05:13 +00:00
  * sampler: turn lazy grammar trigger words to regexes
  * add scripts/tool_bench.sh & .py
  * constrain llama json output regardless of function name if matches at beginning
  * update relaxed newline space rule in grammar tests
  * support add_generation_prompt query parameter (useful for /apply_template)
  * Update src/llama-grammar.cpp
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

c43a3e7996  llama : add Phi-4-mini support (supersede #12099) (#12108)
  Xuan-Son Nguyen, 2025-02-28 12:44:11 +01:00
  * Added Phi-4-mini-instruct support
  * Update regex per ngxson
  * Change the vocab base to Xenova/gpt-4o
  * fix conversion update script
  * no need to check longrope
  * minor style fix
  * fix python style
  Co-authored-by: Nicholas Sparks <nisparks@microsoft.com>

3e9a2860e9  llama : expose llama_model_n_head_kv in the API (#11997)
  Vitali Lovich, 2025-02-25 11:29:33 +02:00
  It's useful to be able to have this from the library layer as it's a key
  parameter of the model (e.g. to figure out how much KV cache memory is needed).
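As that commit message notes, n_head_kv feeds directly into KV-cache sizing. A back-of-the-envelope estimate, assuming uniform head counts across layers, f16 K/V, and head_dim approximated as n_embd / n_head (real caches vary with cache quantization, per-layer head counts, and SWA):

```cpp
#include <cstdint>

#include "llama.h"

// Rough f16 KV cache size: 2 (K and V) * n_layer * n_ctx * n_head_kv *
// head_dim * sizeof(uint16_t). The head_dim approximation below does not
// hold for every architecture - treat the result as an estimate only.
static int64_t kv_bytes_estimate(const llama_model * model, int64_t n_ctx) {
    const int64_t n_layer   = llama_model_n_layer(model);
    const int64_t n_head_kv = llama_model_n_head_kv(model);
    const int64_t head_dim  = llama_model_n_embd(model) / llama_model_n_head(model);
    return 2 * n_layer * n_ctx * n_head_kv * head_dim * (int64_t) sizeof(uint16_t);
}
```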
						 
				 
			
				
					
						
							
							
68ff663a04  repo : update links to new url (#11886)
  Georgi Gerganov, 2025-02-15 16:40:57 +02:00
  * repo : update links to new url
  * cont : more urls

27e8a23300  sampling: add Top-nσ sampler (#11223)
  Vinesh Janarthanan, 2025-02-13 08:45:57 +02:00
  * initial sampling changes
  * completed top nsigma sampler implementation
  * apply parameter to only llama-cli
  * updated readme
  * added tests and fixed nsigma impl
  * cleaned up pr
  * format
  * removed commented tests
  * cleanup pr and remove explicit floats
  * added top-k sampler to improve performance
  * changed sigma to float
  * fixed string format to float
  * Update src/llama-sampling.cpp
  * Update common/sampling.cpp
  * added llama_sampler_init
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
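Top-nσ keeps only candidates whose logit lies within n standard deviations of the maximum logit before sampling. Assuming the constructor follows the usual llama_sampler_init_* naming (llama_sampler_init_top_n_sigma; treat the exact name as this sketch's assumption), wiring it into a chain looks like:

```cpp
#include "llama.h"

// Top-n-sigma: discard candidates more than n_sigma standard deviations
// below the max logit, then sample from the survivors.
static llama_sampler * make_top_n_sigma_chain(float n_sigma, uint32_t seed) {
    llama_sampler * chain = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(chain, llama_sampler_init_top_n_sigma(n_sigma));
    llama_sampler_chain_add(chain, llama_sampler_init_dist(seed));
    return chain;
}
```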
						 
				 
			
				
					
						
							
							
7ee953a64a  llama : add llama_sampler_init for safe usage of llama_sampler_free (#11727)
  Christian Fillion, 2025-02-07 11:33:27 +02:00
  The C API in llama.h claims users can implement `llama_sampler_i` to create
  custom `llama_sampler`. The sampler chain takes ownership and calls
  `llama_sampler_free` on them. However, `llama_sampler_free` is hard-coded to
  use `delete`. This is undefined behavior if the object wasn't also allocated
  via `new` from libllama's C++ runtime. Callers in C and C-compatible languages
  do not use C++'s `new` operator. C++ callers may not be sharing the same heap
  as libllama.
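llama_sampler_init pairs a user-supplied iface with its state so allocation and the library's `free` never cross heap boundaries. A minimal custom-sampler sketch (the identity `apply` and the counter state are illustrative only; the optional hooks are left null, assuming the chain tolerates null as in upstream):

```cpp
#include "llama.h"

// Per-instance state for the custom sampler, released by our own free hook.
struct counter_ctx {
    int64_t n_accepted = 0;
};

static const char * counter_name(const llama_sampler * /*smpl*/) { return "counter"; }

static void counter_accept(llama_sampler * smpl, llama_token /*token*/) {
    ((counter_ctx *) smpl->ctx)->n_accepted++; // count tokens accepted downstream
}

static void counter_apply(llama_sampler * /*smpl*/, llama_token_data_array * /*cur_p*/) {
    // identity: a real sampler would reorder or mask the candidate array here
}

static void counter_free(llama_sampler * smpl) {
    delete (counter_ctx *) smpl->ctx; // pairs with the `new` below - same heap
}

static const llama_sampler_i counter_iface = {
    /* .name   = */ counter_name,
    /* .accept = */ counter_accept,
    /* .apply  = */ counter_apply,
    /* .reset  = */ nullptr,
    /* .clone  = */ nullptr,
    /* .free   = */ counter_free,
};

// llama_sampler_init couples the vtable with the state, so a later
// llama_sampler_free on the returned object is well-defined even when the
// caller is C or another language runtime.
static llama_sampler * counter_sampler_init() {
    return llama_sampler_init(&counter_iface, new counter_ctx());
}
```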
						 
				 
			
				
					
						
							
							
8b576b6c55  Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
  Olivier Chafik, 2025-01-30 19:13:58 +00:00
  Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
  Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

6171c9d258  Add Jinja template support (#11016)
  Olivier Chafik, 2025-01-21 13:18:51 +00:00
  * Copy minja from 58f0ca6dd7 (https://github.com/google/minja/pull/22)
  * Apply suggestions from code review
  * Finish suggested renamings
  * Move chat_templates inside server_context + remove mutex
  * Update --chat-template-file w/ recent change to --chat-template
  * Refactor chat template validation
  * Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
  * Warn against missing eos / bos tokens when jinja template references them
  * rename: common_chat_template[s]
  * reinstate assert on chat_templates.template_default
  * Update minja to b8437df626 (https://github.com/google/minja/pull/25)
  * Update minja from https://github.com/google/minja/pull/27
  * rm unused optional header
  Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

667d72846c  rpc : early register backend devices (#11262)
  Radoslav Gerganov, 2025-01-17 10:57:09 +02:00
  Early register RPC devices and do not propagate RPC specifics in the
  llama model structures. ref: #10609

960ec65273  llama : fix deprecation message: vocabable -> vocab (#11269)
  David Renshaw, 2025-01-17 08:12:01 +01:00

681149ced2  llama : add llama_model_load_from_splits (#11255)
  Xuan Son Nguyen, 2025-01-16 13:54:08 +01:00
  * llama : add `llama_model_load_from_splits`
  * update
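A usage sketch for loading a model whose GGUF is split across several files, listing the splits explicitly instead of deriving names from the first split (the file names below are placeholders):

```cpp
#include "llama.h"

// Load from explicitly enumerated split files rather than relying on the
// "-00001-of-000NN" naming pattern.
static llama_model * load_split_model() {
    const char * paths[] = {
        "model-00001-of-00002.gguf", // placeholder paths
        "model-00002-of-00002.gguf",
    };
    llama_model_params mparams = llama_model_default_params();
    return llama_model_load_from_splits(paths, 2, mparams);
}
```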
						 
				 
			
				
					
						
							
							
08f10f69c3  llama : remove notion of CLS token (#11064)
  Georgi Gerganov, 2025-01-12 12:15:53 +02:00

afa8a9ec9b  llama : add llama_vocab, functions -> methods, naming (#11110)
  Georgi Gerganov, 2025-01-12 11:32:42 +02:00
  * llama : functions -> methods (#11110)
  * llama : add struct llama_vocab to the API (#11156)
  * hparams : move vocab params to llama_vocab (#11159)
  * vocab : more pimpl (#11165)
  * vocab : minor tokenization optimizations (#11160)
  * lora : update API names (#11167)
  * llama : update API names to use correct prefix (#11174)
  * vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174)
  * vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174)
  Co-authored-by: Diego Devesa <slarengh@gmail.com>
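After this refactor, vocab queries hang off a llama_vocab handle obtained from the model, with the renames listed in the bullets (n_vocab -> n_tokens, add_bos -> get_add_bos). A short sketch:

```cpp
#include <cstdio>

#include "llama.h"

// Vocab queries now go through the llama_vocab handle.
static void print_vocab_info(const llama_model * model) {
    const llama_vocab * vocab = llama_model_get_vocab(model);

    printf("n_tokens: %d\n", llama_vocab_n_tokens(vocab));         // was llama_vocab_n_vocab
    printf("add_bos : %d\n", (int) llama_vocab_get_add_bos(vocab)); // was llama_vocab_add_bos
    printf("bos     : %d\n", llama_vocab_bos(vocab));
}
```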
						 
				 
			
				
					
						
							
							
47182dd03f  llama : update llama_model API names (#11063)
  Georgi Gerganov, 2025-01-06 10:55:18 +02:00
  * llama : deprecate llama_free_model, add llama_model_free
  * llama : change `llama_load_model_from_file` -> `llama_model_load_from_file`
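The renamed entry points pair load and free under the llama_model_ prefix, with the old names kept as deprecated aliases. A minimal sketch of the new-style pair:

```cpp
#include "llama.h"

// New-style load/free; llama_load_model_from_file and llama_free_model
// remain only as deprecated aliases of these.
static llama_model * open_model(const char * path) {
    llama_model_params mparams = llama_model_default_params();
    return llama_model_load_from_file(path, mparams);
}

static void close_model(llama_model * model) {
    llama_model_free(model);
}
```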
						 
				 
			
				
					
						
							
							
727368c60f  llama : use LLAMA_TOKEN_NULL (#11062)
  Georgi Gerganov, 2025-01-06 10:52:15 +02:00

9394bbd484  llama : Add support for DeepSeek V3 (#11049)
  fairydreaming, 2025-01-04 21:06:11 +01:00
  * convert : extend DEEPSEEK2 model architecture to support DeepseekV3ForCausalLM by adding EXPERT_WEIGHTS_NORM and EXPERT_GATING_FUNC model parameters and FFN_EXP_PROBS_B tensor type
  * vocab : add DeepSeek V3 pre-tokenizer regexes
  * unicode : handle ACCENT_MARK and SYMBOL categories in regex
  * llama : add DeepSeek V3 chat template, handle new model parameters and tensor types
  Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

f66f582927  llama : refactor src/llama.cpp (#10902)
  Georgi Gerganov, 2025-01-03 10:18:53 +02:00
  * llama : scatter llama.cpp into multiple modules (wip)
  * llama : control-vector -> adapter
  * llama : arch
  * llama : mmap
  * ci : remove BUILD_SHARED_LIBS=OFF
  * llama : chat
  * llama : model
  * llama : hparams
  * llama : adapter
  * examples : fix
  * llama : kv cache
  * llama : impl
  * llama : batch
  * llama : context
  * llama : model loader
  * common : update lora
  * llama : quant