Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						e298d2fbd0 
					 
					
						
						
							
							kv-cache : add SWA support ( #13194 )  
						
						... 
						
						
						
						* kv-cache : prepare for SWA
ggml-ci
* kv-cache : initial iSWA implementation
ggml-ci
* kv-cache : rework error recovery logic
ggml-ci
* models : fix Phi-3 SWA parameters
ggml-ci
* model : adjust Granite to rope factor changes
ggml-ci
* server : check if context can do shifts
ggml-ci
* iswa : for now, always enable shifts (experiment)
ggml-ci
* kv-cache : simplify SWA logic
ggml-ci
* kv-cache : apply defrag when we fail to find slots for the batch
ggml-ci
* llama : update docs about llama_decode
ggml-ci
* kv-cache : update warning logs when no space for the batch is available
ggml-ci
* llama : add llama_kv_self_seq_pos_min()
* kv-cache : keep track of partial SWA computes and print warnings
* server : disallow use cases involving partial SWA context
ggml-ci
* llama : add param to control SWA cache size
ggml-ci
* minor : clean-up
ggml-ci 
						
						
					 
					
						2025-05-20 08:05:46 +03:00 
						 
				 
			
				
					
						
							
							
								psocolovsky 
							
						 
					 
					
						
						
							
						
						1dfbf2cf3a 
					 
					
						
						
							
							common : add load_progress_callback ( #13617 )  
						
						
						
						
					 
					
						2025-05-19 21:17:36 +02:00 
						 
				 
			
				
					
						
							
							
								Olivier Chafik 
							
						 
					 
					
						
						
							
						
						3198405e98 
					 
					
						
						
							
							common: add partial regex support (#12808 )  
						
						... 
						
						
						
						* move string_find_partial_stop & string_ends_with to common
* add common_regex (supports partial matches)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update common/regex-partial.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update common/regex-partial.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Update common/regex-partial.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* partial regex: add missing iterator end checks
* string utils: use string_views
* direct throw to avoid ggml.h include
* regex-partial: replace missed ggml_asserts
---------
Co-authored-by: ochafik <ochafik@google.com >
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com > 
						
						
					 
					
						2025-05-14 19:50:57 +01:00 
						 
				 
			
				
					
						
							
							
								Johannes Gäßler 
							
						 
					 
					
						
						
							
						
						10d2af0eaa 
					 
					
						
						
							
							llama/ggml: add LLM training support ( #10544 )  
						
						... 
						
						
						
						* llama/ggml: add LLM training support
more compact progress bar
llama_save_model_to_file
llama_opt_param_filter
ggml_graph_dup force_grads
refactor ggml_opt, fix test-opt
* remove logits_all
* refactor CUDA implementation for ACC
* reset graph at beginning of opt period 
						
						
					 
					
						2025-05-12 14:44:49 +02:00 
						 
				 
			
				
					
						
							
							
								David Huang 
							
						 
					 
					
						
						
							
						
						7f323a589f 
					 
					
						
						
							
							Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B ( #13386 )  
						
						
						
						
					 
					
						2025-05-11 14:18:39 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						51fb96b1ff 
					 
					
						
						
							
							context : remove logits_all flag ( #13284 )  
						
						... 
						
						
						
						* context : remove logits_all flag
ggml-ci
* llama : remove logits_all flag + reorder llama_context_params
ggml-ci 
						
						
					 
					
						2025-05-08 14:26:50 +03:00 
						 
				 
			
				
					
						
							
							
								Prajwal B Mehendarkar 
							
						 
					 
					
						
						
							
						
						bc091a4dc5 
					 
					
						
						
							
							common : Define cache directory on AIX ( #12915 )  
						
						
						
						
					 
					
						2025-04-12 17:33:39 +02:00 
						 
				 
			
				
					
						
							
							
								yuri@FreeBSD 
							
						 
					 
					
						
						
							
						
						68b08f36d0 
					 
					
						
						
							
							common : Define cache directory on FreeBSD ( #12892 )  
						
						
						
						
					 
					
						2025-04-11 21:45:44 +02:00 
						 
				 
			
				
					
						
							
							
								tastelikefeet 
							
						 
					 
					
						
						
							
						
						b2034c2b55 
					 
					
						
						
							
							contrib: support modelscope community ( #12664 )  
						
						... 
						
						
						
						* support download from modelscope
* support login
* remove comments
* add arguments
* fix code
* fix win32
* test passed
* fix readme
* revert readme
* change to MODEL_ENDPOINT
* revert tail line
* fix readme
* refactor model endpoint
* remove blank line
* fix header
* fix as comments
* update comment
* update readme
---------
Co-authored-by: tastelikefeet <yuze.zyz@alibaba-inc/com> 
						
						
					 
					
						2025-04-11 14:01:56 +02:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						e0e912f49b 
					 
					
						
						
							
							llama : add option to override model tensor buffers ( #11397 )  
						
						... 
						
						
						
						* llama : add option to override tensor buffers
* ggml : fix possible underflow in ggml_nbytes 
						
						
					 
					
						2025-04-02 14:52:01 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						42eb248f46 
					 
					
						
						
							
							common : remove json.hpp from common.cpp ( #12697 )  
						
						... 
						
						
						
						* common : remove json.hpp from common.cpp
* fix comment 
						
						
					 
					
						2025-04-02 09:58:34 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						267c1399f1 
					 
					
						
						
							
							common : refactor downloading system, handle mmproj with -hf option ( #12694 )  
						
						... 
						
						
						
						* (wip) refactor downloading system [no ci]
* fix all examples
* fix mmproj with -hf
* gemma3: update readme
* only handle mmproj in llava example
* fix multi-shard download
* windows: fix problem with std::min and std::max
* fix 2 
						
						
					 
					
						2025-04-01 23:44:05 +02:00 
						 
				 
			
				
					
						
							
							
								fairydreaming 
							
						 
					 
					
						
						
							
						
						8fcb563613 
					 
					
						
						
							
							Load all MoE experts during warmup ( #11571 )  
						
						... 
						
						
						
						* llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup
* common : use new API to enable warmup mode during model warmup
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com > 
						
						
					 
					
						2025-03-14 13:47:05 +01:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						e0dbec0bc6 
					 
					
						
						
							
							llama : refactor llama_context, llama_kv_cache, llm_build_context ( #12181 )  
						
						... 
						
						
						
						* llama : refactor llama_context, llama_kv_cache, llm_build_context
ggml-ci
* graph : don't mutate the KV cache during defrag
ggml-ci
* context : reduce virtuals + remove test function
ggml-ci
* context : move interface implementation to source file + factory
ggml-ci
* graph : move KV cache build functions to llama_context impl
ggml-ci
* graph : remove model reference from build_pooling
ggml-ci
* graph : remove llama_model reference
ggml-ci
* kv_cache : provide rope factors
ggml-ci
* graph : rework inputs to use only unique_ptr, remove attn input abstraction
ggml-ci
* context : remove llama_context_i abstraction
ggml-ci
* context : clean-up
ggml-ci
* graph : clean-up
ggml-ci
* llama : remove redundant keywords (struct, enum)
ggml-ci
* model : adapt gemma3
ggml-ci
* graph : restore same attention ops as on master
ggml-ci
* llama : remove TODO + fix indent
ggml-ci 
						
						
					 
					
						2025-03-13 12:35:44 +02:00 
						 
				 
			
				
					
						
							
							
								Olivier Chafik 
							
						 
					 
					
						
						
							
						
						669912d9a5 
					 
					
						
						
							
							tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034 )  
						
						... 
						
						
						
						* sampler: turn lazy grammar trigger words to regexes
* add scripts/tool_bench.sh & .py
* constrain llama json output regardless of function name if matches at beginning
* update relaxed newline space rule in grammar tests
* support add_generation_prompt query parameter (useful for /apply_template)
* Update src/llama-grammar.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com > 
						
						
					 
					
						2025-03-05 13:05:13 +00:00 
						 
				 
			
				
					
						
							
							
								Olivier Chafik 
							
						 
					 
					
						
						
							
						
						63e489c025 
					 
					
						
						
							
							tool-call: refactor common chat / tool-call api (+ tests / fixes) ( #11900 )  
						
						... 
						
						
						
						* tool-call refactoring: moved common_chat_* to chat.h, common_chat_templates_init return a unique_ptr to opaque type
* addressed clang-tidy lints in [test-]chat.*
* rm minja deps from util & common & move it to common/minja/
* add name & tool_call_id to common_chat_msg
* add common_chat_tool
* added json <-> tools, msgs conversions to chat.h
* fix double bos/eos jinja avoidance hack (was preventing inner bos/eos tokens)
* fix deepseek r1 slow test (no longer <think> opening w/ new template)
* allow empty tools w/ auto + grammar
* fix & test server grammar & json_schema params w/ & w/o --jinja 
						
						
					 
					
						2025-02-18 18:03:23 +00:00 
						 
				 
			
				
					
						
							
							
								Olivier Chafik 
							
						 
					 
					
						
						
							
						
						9f4cc8f8d3 
					 
					
						
						
							
							sync: minja (#11641 )  
						
						... 
						
						
						
						* `sync`: minja
182de30cdahttps://github.com/google/minja/pull/46 
https://github.com/google/minja/pull/45  
						
						
					 
					
						2025-02-05 01:00:12 +00:00 
						 
				 
			
				
					
						
							
							
								Olivier Chafik 
							
						 
					 
					
						
						
							
						
						cde3833239 
					 
					
						
						
							
							tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616 )  
						
						... 
						
						
						
						* tool-call: allow `--jinja --chat-template chatml`
* fix double bos issue (drop bos/eos tokens from jinja template)
* add missing try catch around jinja parsing to default to chatml
* Simplify default chatml logic 
						
						
					 
					
						2025-02-03 23:49:27 +00:00 
						 
				 
			
				
					
						
							
							
								Olivier Chafik 
							
						 
					 
					
						
						
							
						
						8b576b6c55 
					 
					
						
						
							
							Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars ( #9639 )  
						
						... 
						
						
						
						---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com >
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
Co-authored-by: Xuan Son Nguyen <son@huggingface.co > 
						
						
					 
					
						2025-01-30 19:13:58 +00:00 
						 
				 
			
				
					
						
							
							
								Olivier Chafik 
							
						 
					 
					
						
						
							
						
						a94f3b2727 
					 
					
						
						
							
							common: utils to split / join / repeat strings (from json converter) (#11342 )  
						
						... 
						
						
						
						* Factor string_join, string_split, string_repeat into common
* json: refactor to surface a versatile builder
* Update common.cpp 
						
						
					 
					
						2025-01-22 09:51:44 +00:00 
						 
				 
			
				
					
						
							
							
								Olivier Chafik 
							
						 
					 
					
						
						
							
						
						6171c9d258 
					 
					
						
						
							
							Add Jinja template support ( #11016 )  
						
						... 
						
						
						
						* Copy minja from 58f0ca6dd7https://github.com/google/minja/pull/22 )
* Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com >
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos / bos tokens when jinja template references them
* rename: common_chat_template[s]
* reinstate assert on chat_templates.template_default
* Update minja to b8437df626https://github.com/google/minja/pull/25 
* Update minja from https://github.com/google/minja/pull/27 
* rm unused optional header
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com >
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com > 
						
						
					 
					
						2025-01-21 13:18:51 +00:00 
						 
				 
			
				
					
						
							
							
								Radoslav Gerganov 
							
						 
					 
					
						
						
							
						
						667d72846c 
					 
					
						
						
							
							rpc : early register backend devices ( #11262 )  
						
						... 
						
						
						
						Early register RPC devices and do not propagate RPC specifics in the
llama model structures.
ref: #10609  
						
						
					 
					
						2025-01-17 10:57:09 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						00b4c3da62 
					 
					
						
						
							
							common : support tag-based --hf-repo like on ollama ( #11195 )  
						
						... 
						
						
						
						* common : support tag-based hf_repo like on ollama
* fix build
* various fixes
* small fixes
* fix style
* fix windows build?
* move common_get_hf_file to common.cpp
* fix complain with noreturn 
						
						
					 
					
						2025-01-13 13:56:23 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						9a483999a6 
					 
					
						
						
							
							llama : fix chat template gguf key ( #11201 )  
						
						
						
						
					 
					
						2025-01-12 13:45:14 +01:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						afa8a9ec9b 
					 
					
						
						
							
							llama : add llama_vocab, functions -> methods, naming ( #11110 )  
						
						... 
						
						
						
						* llama : functions -> methods (#11110 )
* llama : add struct llama_vocab to the API (#11156 )
ggml-ci
* hparams : move vocab params to llama_vocab (#11159 )
ggml-ci
* vocab : more pimpl (#11165 )
ggml-ci
* vocab : minor tokenization optimizations (#11160 )
ggml-ci
Co-authored-by: Diego Devesa <slarengh@gmail.com >
* lora : update API names (#11167 )
ggml-ci
* llama : update API names to use correct prefix (#11174 )
* llama : update API names to use correct prefix
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* minor [no ci]
* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174 )
ggml-ci
* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174 )
ggml-ci
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com > 
						
						
					 
					
						2025-01-12 11:32:42 +02:00 
						 
				 
			
				
					
						
							
							
								Johannes Gäßler 
							
						 
					 
					
						
						
							
						
						53ff6b9b9f 
					 
					
						
						
							
							GGUF: C++ refactor, backend support, misc fixes ( #11030 )  
						
						... 
						
						
						
						* GGUF: C++ refactor, backend support, misc fixes
remove ggml_tensor.backend
update CODEOWNERS [no ci]
remove gguf_get_data from API
revise GGUF API data types 
						
						
					 
					
						2025-01-07 18:01:58 +01:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						47182dd03f 
					 
					
						
						
							
							llama : update llama_model API names ( #11063 )  
						
						... 
						
						
						
						* llama : deprecate llama_free_model, add llama_model_free
ggml-ci
* llama : change `llama_load_model_from_file` -> `llama_model_load_from_file`
ggml-ci 
						
						
					 
					
						2025-01-06 10:55:18 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						727368c60f 
					 
					
						
						
							
							llama : use LLAMA_TOKEN_NULL ( #11062 )  
						
						... 
						
						
						
						ggml-ci 
						
						
					 
					
						2025-01-06 10:52:15 +02:00 
						 
				 
			
				
					
						
							
							
								Molly Sophia 
							
						 
					 
					
						
						
							
						
						4b0c638b9a 
					 
					
						
						
							
							common : disable KV cache shifting automatically for unsupported models ( #11053 )  
						
						... 
						
						
						
						* Disable KV cache shifting automatically for unsupported models
instead of exiting directly
Signed-off-by: Molly Sophia <mollysophia379@gmail.com >
* Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com >
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com > 
						
						
					 
					
						2025-01-03 14:13:18 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						f66f582927 
					 
					
						
						
							
							llama : refactor src/llama.cpp ( #10902 )  
						
						... 
						
						
						
						* llama : scatter llama.cpp into multiple modules (wip)
* llama : control-vector -> adapter
* llama : arch
* llama : mmap
ggml-ci
* ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
* llama : arch (cont)
ggml-ci
* llama : chat
ggml-ci
* llama : model
ggml-ci
* llama : hparams
ggml-ci
* llama : adapter
ggml-ci
* examples : fix
ggml-ci
* rebase
ggml-ci
* minor
* llama : kv cache
ggml-ci
* llama : impl
ggml-ci
* llama : batch
ggml-ci
* cont
ggml-ci
* llama : context
ggml-ci
* minor
* llama : context (cont)
ggml-ci
* llama : model loader
ggml-ci
* common : update lora
ggml-ci
* llama : quant
ggml-ci
* llama : quant (cont)
ggml-ci
* minor [no ci] 
						
						
					 
					
						2025-01-03 10:18:53 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						45095a61bf 
					 
					
						
						
							
							server : clean up built-in template detection ( #11026 )  
						
						... 
						
						
						
						* server : clean up built-in template detection
* fix compilation
* add chat template test
* fix condition 
						
						
					 
					
						2024-12-31 15:22:01 +01:00 
						 
				 
			
				
					
						
							
							
								Peter 
							
						 
					 
					
						
						
							
						
						6e1531aca5 
					 
					
						
						
							
							common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON ( #11013 )  
						
						... 
						
						
						
						In common/common.cpp:
* Convert usage of stat() function call to check if file exists to standard library function std::filesystem::exists (error unable to match to correct function signature)
* Additional conditions to check if PATH_MAX is already defined in WIN32 environment (warning it is already defined in MSYS2)
In examples/run/run.cpp:
* Add io.h header inclusion (error cannot find function _get_osfhandle)
* Change initialisers for OVERLAPPED to empty struct (warning about uninitialised members)
* Add initialiser for hFile (warning it may be uninitialised)
* Add cast for curl_off_t percentage value to long int in generate_progress_prefix function (warning that curl_off_t is long long int)
In ggml/src/ggml-opencl/ggml-opencl.cpp:
* Initialise certain declared cl_mem variables to nullptr for greater safety (warning about B_d variable possibly used unassigned) 
						
						
					 
					
						2024-12-31 01:46:06 +01:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						0bf2d10c55 
					 
					
						
						
							
							tts : add OuteTTS support ( #10784 )  
						
						... 
						
						
						
						* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : be explicit about the pooling type in the tests
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* llama : add OuteTTS support (wip)
* wip
* extract features
* first conv
* group norm
* resnet conv
* resnet
* attn
* pos net
* layer norm
* convnext
* head
* hann window
* fix n_embd + remove llama.cpp hacks
* compute hann window
* fft
* spectrum processing
* clean-up
* tts : receive input text and generate codes
* clip : fix new conv name
* tts : minor fix
* tts : add header + minor fixes
ggml-ci
* tts : add matchematical constant
ggml-ci
* tts : fix sampling + cut initial noise
* tts : fixes
* tts : update default samplers
ggml-ci
* tts : text pre-processing
* tts : outetts-voc -> wavtokenizer-dec
* tts : remove hardcoded constants
ggml-ci
* tts : fix tensor shapes
* llama : refactor wavtokenizer tensors
ggml-ci
* cont
ggml-ci
* cont [no ci]
* llama : update WavTokenizer to non-causal attn
* llama : handle no-vocab detokenization
* tts : add Python example for OuteTTS (wip)
* tts : extend python example to generate spectrogram
ggml-ci
* server : fix rebase artifacts
* tts : enable "return_tokens" in Python example
ggml-ci
* tts : minor fixes
* common : support HF download for vocoder 
						
						
					 
					
						2024-12-18 19:27:21 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						152610eda9 
					 
					
						
						
							
							server : output embeddings for all tokens when pooling = none ( #10861 )  
						
						... 
						
						
						
						* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : update readme [no ci]
* server : fix spacing [no ci]
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com >
* server : be explicit about the pooling type in the tests
ggml-ci
* server : update /embeddings and /v1/embeddings endpoints
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* server : update readme
ggml-ci
* server : fixes
* tests : update server tests
ggml-ci
* server : update readme [no ci]
* server : remove rebase artifact
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com > 
						
						
					 
					
						2024-12-18 13:01:41 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						644fd71b44 
					 
					
						
						
							
							sampling : refactor + optimize penalties sampler ( #10803 )  
						
						... 
						
						
						
						* sampling : refactor + optimize penalties sampler
ggml-ci
* common : apply ignore_eos as logit bias
ggml-ci
* batched : remove penalties sampler
* params : allow penalty_last_n == -1 to be equal to context size
ggml-ci
* common : by default, move the penalties at the end of the sampling chain
ggml-ci
* common : ignore all EOG tokens
Co-authored-by: Diego Devesa <slarengh@gmail.com >
* common : move back the penalties at the front of the sampling chain
ggml-ci
* readme : restore hint about --ignore-eos flag [no ci]
* llama : minor
ggml-ci
* webui : update
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com > 
						
						
					 
					
						2024-12-16 12:31:14 +02:00 
						 
				 
			
				
					
						
							
							
								Eric Curtin 
							
						 
					 
					
						
						
							
						
						c27ac678dd 
					 
					
						
						
							
							Opt class for positional argument handling ( #10508 )  
						
						... 
						
						
						
						Added support for positional arguments `model` and `prompt`. Added
functionality to download via strings like:
  llama-run llama3
  llama-run ollama://granite-code
  llama-run ollama://granite-code:8b
  llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
  llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
  llama-run https://example.com/some-file1.gguf 
  llama-run some-file2.gguf
  llama-run file://some-file3.gguf
Signed-off-by: Eric Curtin <ecurtin@redhat.com > 
						
						
					 
					
						2024-12-13 19:34:25 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						adffa6ffd5 
					 
					
						
						
							
							common : improve -ctv -ctk CLI arguments ( #10806 )  
						
						... 
						
						
						
						* common : improve ctv ctk cli argument
* regenerate docs
* even better approach
* use std::vector 
						
						
					 
					
						2024-12-12 22:53:05 +01:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						7cc2d2c889 
					 
					
						
						
							
							ggml : move AMX to the CPU backend ( #10570 )  
						
						... 
						
						
						
						* ggml : move AMX to the CPU backend
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com > 
						
						
					 
					
						2024-11-29 21:54:58 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						9f912511bc 
					 
					
						
						
							
							common : fix duplicated file name with hf_repo and hf_file ( #10550 )  
						
						
						
						
					 
					
						2024-11-27 22:30:52 +01:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						10bce0450f 
					 
					
						
						
							
							llama : accept a list of devices to use to offload a model ( #10497 )  
						
						... 
						
						
						
						* llama : accept a list of devices to use to offload a model
* accept `--dev none` to completely disable offloading
* fix dev list with dl backends
* rename env parameter to LLAMA_ARG_DEVICE for consistency 
						
						
					 
					
						2024-11-25 19:30:06 +01:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						5931c1f233 
					 
					
						
						
							
							ggml : add support for dynamic loading of backends ( #10469 )  
						
						... 
						
						
						
						* ggml : add support for dynamic loading of backends
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com > 
						
						
					 
					
						2024-11-25 15:13:39 +01:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						d9d54e498d 
					 
					
						
						
							
							speculative : refactor and add a simpler example ( #10362 )  
						
						... 
						
						
						
						* speculative : refactor and add a simpler example
ggml-ci
* speculative : clean-up and add comments and TODOs [no ci]
* speculative : manage context in common_speculative
ggml-ci
* speculative : simplify
ggml-ci
* speculative : simplify (cont)
ggml-ci
* speculative : add --draft-min CLI arg
* speculative : minor fixup
* make : build fixes
* speculative : do not redraft previous drafts
ggml-ci
* speculative : fix the draft sampling
ggml-ci
* speculative : fix compile warning
* common : refactor args
ggml-ci
* common : change defaults [no ci]
* common : final touches
ggml-ci 
						
						
					 
					
						2024-11-25 09:58:41 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						8e752a777b 
					 
					
						
						
							
							llama : add check for KV cache shifts ( #10401 )  
						
						... 
						
						
						
						ggml-ci 
						
						
					 
					
						2024-11-19 13:29:26 +02:00 
						 
				 
			
				
					
						
							
							
								Johannes Gäßler 
							
						 
					 
					
						
						
							
						
						4e54be0ec6 
					 
					
						
						
							
							llama/ex: remove --logdir argument ( #10339 )  
						
						
						
						
					 
					
						2024-11-16 23:00:41 +01:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						ae8de6d50a 
					 
					
						
						
							
							ggml : build backends as libraries ( #10256 )  
						
						... 
						
						
						
						* ggml : build backends as libraries
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com > 
						
						
					 
					
						2024-11-14 18:04:35 +01:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						5c333e0140 
					 
					
						
						
							
							metal : add BF16 support ( #8439 )  
						
						... 
						
						
						
						* ggml : add initial BF16 support
ggml-ci
* metal : add mul_mat_id BF16 support
ggml-ci
* metal : check for bfloat support on the Metal device
ggml-ci
* metal : better var names [no ci]
* metal : do not build bfloat kernels when not supported
ggml-ci
* metal : try to fix BF16 support check
ggml-ci
* metal : this should correctly check bfloat support 
						
						
					 
					
						2024-11-06 19:53:51 +02:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						9f40989351 
					 
					
						
						
							
							ggml : move CPU backend to a separate file ( #10144 )  
						
						
						
						
					 
					
						2024-11-03 19:34:08 +01:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						8d8ff71536 
					 
					
						
						
							
							llama : remove Tail-Free sampling ( #10071 )  
						
						... 
						
						
						
						ggml-ci 
						
						
					 
					
						2024-10-29 10:42:05 +02:00 
						 
				 
			
				
					
						
							
							
								wwoodsTM 
							
						 
					 
					
						
						
							
						
						ff252ea48e 
					 
					
						
						
							
							llama : add DRY sampler ( #9702 )  
						
						... 
						
						
						
						* sampling : add DRY sampler (post-refactor)
* DRY: Trying to fix coauthors, removed unneeded line
* DRY: Fixed redundant code
* DRY: Fixed crash issue due to DRY being in chain but uninitialized
---------
Co-authored-by: l3utterfly <gc.pthzfoldr@gmail.com >
Co-authored-by: pi6am <34464159+pi6am@users.noreply.github.com > 
						
						
					 
					
						2024-10-25 19:07:34 +03:00 
						 
				 
			
				
					
						
							
							
								Michael Podvitskiy 
							
						 
					 
					
						
						
							
						
						d80fb71f8b 
					 
					
						
						
							
							llama: string_split fix ( #10022 )  
						
						... 
						
						
						
						* llama: Refactor string_split to use template specialization,  fixes parsing strings with spaces
* llama: Add static_assert in the string_split template to ensure the correct template specialization is used for std::string 
						
						
					 
					
						2024-10-25 17:57:54 +02:00