15e03282bb | Diego Devesa | 2025-05-08 23:45:22 +02:00
ci : limit write permission to only the release step + fixes (#13392)
* ci : limit write permission to only the release step
* fix win cuda file name
* fix license file copy on multi-config generators

8c83449cb7 | Xuan-Son Nguyen | 2025-05-08 15:37:29 +02:00
server : (webui) revamp the input area, plus many small UI improvements (#13365)
* rework the input area
* process selected file
* change all icons to heroicons
* fix thought process collapse
* move conversation more menu to sidebar
* sun icon --> moon icon
* rm default system message
* stricter upload file check, only allow image if server has mtmd
* build it
* add renaming
* better autoscroll
* build
* add conversation group
* fix scroll
* extra context first, then user input in the end
* fix <hr> tag
* clean up a bit
* build
* add mb-3 for <pre>
* throttle adjustTextareaHeight to make it less laggy
* (nits) missing padding in sidebar
* rm stray console log

51fb96b1ff | Georgi Gerganov | 2025-05-08 14:26:50 +03:00
context : remove logits_all flag (#13284)
* context : remove logits_all flag
* llama : remove logits_all flag + reorder llama_context_params
ggml-ci

39e73ae0d6 | Ycros | 2025-05-07 11:23:28 +03:00
common : Add a warning when we can't match samplers from a string or char. (#13330)

4773d7a02f | Georgi Gerganov | 2025-05-07 10:28:02 +03:00
examples : remove infill (#13283)
ggml-ci

233461f812 | oobabooga | 2025-05-05 22:12:19 +02:00
sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (#13264)
* sampling: add Top-nσ sampler to `llama-server` and sampler ordering
* revert: sampler ordering
* revert: VS' crappy auto-formatting
* revert: VS' crappy auto-formatting pt.2
* revert: my crappy eye sight...
* sampling: add XTC to Top-nσ sampler chain
* sampling: add Dyna. Temp. to Top-nσ sampler chain
* sampling: actually remove Top-nσ from sampler (oops)
* Integrate top_n_sigma into main sampler chain
* Define COMMON_SAMPLER_TYPE_TOP_N_SIGMA
* Formatting
* Lint
* Exit early in the sampler if nsigma < 0
Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>

9b61acf060 | Xuan-Son Nguyen | 2025-05-05 16:02:55 +02:00
mtmd : rename llava directory to mtmd (#13311)
* mv llava to mtmd
* change ref everywhere

1d36b3670b | Diego Devesa | 2025-05-02 20:27:13 +02:00
llama : move end-user examples to tools directory (#13249)
* llama : move end-user examples to tools directory
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						fab647e884 
					 
					
						
						
							
							server : add cache reuse card link to help ( #13230 )  
						
						... 
						
						
						
						* server : add cache reuse card link to help
* args : use short url 
						
						
					 
					
						2025-05-02 09:48:31 +03:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						d7a14c42a1 
					 
					
						
						
							
							build : fix build info on windows ( #13239 )  
						
						... 
						
						
						
						* build : fix build info on windows
* fix cuda host compiler msg 
						
						
					 
					
						2025-05-01 21:48:08 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						13c9a3319b 
					 
					
						
						
							
							arg : remove CURLINFO_EFFECTIVE_METHOD ( #13228 )  
						
						
						
						
					 
					
						2025-05-01 10:23:25 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						6f67cf1f48 
					 
					
						
						
							
							arg : -hf do not fail if url mismatch ( #13219 )  
						
						... 
						
						
						
						* arg : -hf do not fail if url mismatch
* do not return if cannot parse metadata json 
						
						
					 
					
						2025-04-30 21:29:15 +01:00 
						 
				 
			
				
					
						
							
							
								Olivier Chafik 
							
						 
					 
					
						
						
							
						
						3b127c7385 
					 
					
						
						
							
							common : add -jf / --json-schema-file flag ( #12011 )  
						
						
						
						
					 
					
						2025-04-30 14:52:35 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						5933e6fdc9 
					 
					
						
						
							
							arg : allow using -hf offline ( #13202 )  
						
						... 
						
						
						
						* arg : allow using -hf offline
* add more comments in code [no ci] 
						
						
					 
					
						2025-04-30 10:46:32 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						43f2b07193 
					 
					
						
						
							
							common : fix noreturn compile warning ( #13151 )  
						
						... 
						
						
						
						ggml-ci 
						
						
					 
					
						2025-04-28 11:57:19 +03:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						85f36e5e71 
					 
					
						
						
							
							arg : fix unused variable ( #13142 )  
						
						
						
						
					 
					
						2025-04-28 08:16:59 +03:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						2d451c8059 
					 
					
						
						
							
							common : add common_remote_get_content ( #13123 )  
						
						... 
						
						
						
						* common : add common_remote_get_content
* support max size and timeout
* add tests 
						
						
					 
					
						2025-04-26 22:58:12 +02:00 
						 
				 
			
				
					
						
							
							
								frob 
							
						 
					 
					
						
						
							
						
						d5fe4e81bd 
					 
					
						
						
							
							grammar : handle maxItems == 0 in JSON schema ( #13117 )  
						
						... 
						
						
						
						Co-authored-by: Richard Lyons <frob@cloudstaff.com > 
						
						
					 
					
						2025-04-26 10:10:20 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						13b4548877 
					 
					
						
						
							
							cmake : do not include ./src as public for libllama ( #13062 )  
						
						... 
						
						
						
						* cmake : do not include ./src as public for libllama
ggml-ci
* cmake : rework tests
ggml-ci
* llguidance : remove unicode include
ggml-ci
* cmake : make c++17 private
ggml-ci 
						
						
					 
					
						2025-04-24 16:00:10 +03:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						7c727fbe39 
					 
					
						
						
							
							arg : add --no-mmproj-offload ( #13093 )  
						
						... 
						
						
						
						* arg : add --no-mmproj-offload
* Update common/arg.cpp 
						
						
					 
					
						2025-04-24 14:04:14 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						80982e815e 
					 
					
						
						
							
							arg : clean up handling --mmproj with -hf ( #13082 )  
						
						... 
						
						
						
						* arg : clean up handling --mmproj with -hf
* rm change about no_mmproj
* Revert "rm change about no_mmproj"
This reverts commit 2cac8e0efb 
						
						
					 
					
						2025-04-24 12:14:13 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						243453533e 
					 
					
						
						
							
							llava : update documentations ( #13055 )  
						
						... 
						
						
						
						* llava : update documentations
* fix typo 
						
						
					 
					
						2025-04-22 10:37:00 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						84a9bf2fc2 
					 
					
						
						
							
							mtmd : merge llava, gemma3 and minicpmv CLI into single llama-mtmd-cli ( #13012 )  
						
						... 
						
						
						
						* mtmd : merge `llava-cli` and `gemma3-cli` into single `mtmd-cli`
* support for minicpmv
* remove cpp files of llava and minicpmv
* update hot topics
* mtmd : add not supported msg for qwen2vl
* Update examples/llava/mtmd.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com > 
						
						
					 
					
						2025-04-21 15:32:58 +02:00 
						 
				 
			
				
					
						
							
							
								Prajwal B Mehendarkar 
							
						 
					 
					
						
						
							
						
						bc091a4dc5 
					 
					
						
						
							
							common : Define cache directory on AIX ( #12915 )  
						
						
						
						
					 
					
						2025-04-12 17:33:39 +02:00 
						 
				 
			
				
					
						
							
							
								Olivier Chafik 
							
						 
					 
					
						
						
							
						
						b6930ebc42 
					 
					
						
						
							
							tool-call: fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 templates (#12900 )  
						
						... 
						
						
						
						* `tool-call`: don't call common_chat_params_init_hermes_2_pro when there aren't tools (or when there's a schema)
* test all chat formats w/o tools 
						
						
					 
					
						2025-04-11 21:47:52 +02:00 
						 
				 
			
				
					
						
							
							
								yuri@FreeBSD 
							
						 
					 
					
						
						
							
						
						68b08f36d0 
					 
					
						
						
							
							common : Define cache directory on FreeBSD ( #12892 )  
						
						
						
						
					 
					
						2025-04-11 21:45:44 +02:00 
						 
				 
			
				
					
						
							
							
								tastelikefeet 
							
						 
					 
					
						
						
							
						
						b2034c2b55 
					 
					
						
						
							
							contrib: support modelscope community ( #12664 )  
						
						... 
						
						
						
						* support download from modelscope
* support login
* remove comments
* add arguments
* fix code
* fix win32
* test passed
* fix readme
* revert readme
* change to MODEL_ENDPOINT
* revert tail line
* fix readme
* refactor model endpoint
* remove blank line
* fix header
* fix as comments
* update comment
* update readme
---------
Co-authored-by: tastelikefeet <yuze.zyz@alibaba-inc/com> 
						
						
					 
					
						2025-04-11 14:01:56 +02:00 
						 
				 
			
				
					
						
							
							
								Prajwal B Mehendarkar 
							
						 
					 
					
						
						
							
						
						1d343b4069 
					 
					
						
						
							
							arg : Including limits file on AIX ( #12822 )  
						
						
						
						
					 
					
						2025-04-08 14:30:59 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						bd3f59f812 
					 
					
						
						
							
							cmake : enable curl by default ( #12761 )  
						
						... 
						
						
						
						* cmake : enable curl by default
* no curl if no examples
* fix build
* fix build-linux-cross
* add windows-setup-curl
* fix
* shell
* fix path
* fix windows-latest-cmake*
* run: include_directories
* LLAMA_RUN_EXTRA_LIBS
* sycl: no llama_curl
* no test-arg-parser on windows
* clarification
* try riscv64 / arm64
* windows: include libcurl inside release binary
* add msg
* fix mac / ios / android build
* will this fix xcode?
* try clearing the cache
* add bunch of licenses
* revert clear cache
* fix xcode
* fix xcode (2)
* fix typo 
						
						
					 
					
						2025-04-07 13:35:19 +02:00 
						 
				 
			
				
					
						
							
							
								Sergey Fedorov 
							
						 
					 
					
						
						
							
						
						f1e3eb4249 
					 
					
						
						
							
							common : fix includes in arg.cpp and gemma3-cli.cpp ( #12766 )  
						
						... 
						
						
						
						* arg.cpp: add a missing include
* gemma3-cli.cpp: fix cinttypes include 
						
						
					 
					
						2025-04-05 17:46:00 +02:00 
						 
				 
			
				
					
						
							
							
								エシュナヴァリシア 
							
						 
					 
					
						
						
							
						
						c6ff5d2a8d 
					 
					
						
						
							
							common: custom hf endpoint support ( #12769 )  
						
						... 
						
						
						
						* common: custom hf endpoint support
Add support for custom huggingface endpoints via HF_ENDPOINT environment variable
You can now specify a custom huggingface endpoint using the HF_ENDPOINT environment variable when using the --hf-repo flag, which works similarly to huggingface-cli's endpoint configuration.
Example usage:
HF_ENDPOINT=https://hf-mirror.com/  ./bin/llama-cli --hf-repo Qwen/Qwen1.5-0.5B-Chat-GGUF --hf-file qwen1_5-0_5b-chat-q2_k.gguf -p "The meaning to life and the universe is"
The trailing slash in the URL is optional:
HF_ENDPOINT=https://hf-mirror.com  ./bin/llama-cli --hf-repo Qwen/Qwen1.5-0.5B-Chat-GGUF --hf-file qwen1_5-0_5b-chat-q2_k.gguf -p "The meaning to life and the universe is"
* Update common/arg.cpp
readability Improvement
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
* Apply suggestions from code review
---------
Co-authored-by: ベアトリーチェ <148695646+MakiSonomura@users.noreply.github.com >
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com > 
						
						
					 
					
						2025-04-05 15:31:42 +02:00 
						 
				 
			
				
					
						
							
							
7a84777f42 | Olivier Chafik | 2025-04-04 21:16:39 +01:00
sync : minja (#12739)
* sync: minja (https://github.com/google/minja/pull/57)
* fix json include

5f696e88e0 | R0CKSTAR | 2025-04-03 13:51:35 +02:00
sync : minja (inclusionAI/Ling) and update tests (#12699)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

e0e912f49b | Diego Devesa | 2025-04-02 14:52:01 +02:00
llama : add option to override model tensor buffers (#11397)
* llama : add option to override tensor buffers
* ggml : fix possible underflow in ggml_nbytes

42eb248f46 | Xuan-Son Nguyen | 2025-04-02 09:58:34 +02:00
common : remove json.hpp from common.cpp (#12697)
* common : remove json.hpp from common.cpp
* fix comment

267c1399f1 | Xuan-Son Nguyen | 2025-04-01 23:44:05 +02:00
common : refactor downloading system, handle mmproj with -hf option (#12694)
* (wip) refactor downloading system [no ci]
* fix all examples
* fix mmproj with -hf
* gemma3: update readme
* only handle mmproj in llava example
* fix multi-shard download
* windows: fix problem with std::min and std::max
* fix 2

a6f32f0b34 | R0CKSTAR | 2025-04-01 13:12:53 +02:00
Fix clang warning in gguf_check_reserved_keys (#12686)
* Fix clang warning in gguf_check_reserved_keys
* Fix typo
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

dd373dd3bf | Johannes Gäßler | 2025-03-28 18:08:52 +01:00
llama: fix error on bad grammar (#12628)

2099a9d5db | Piotr | 2025-03-27 23:41:04 +01:00
server : Support listening on a unix socket (#12613)
* server : Bump cpp-httplib to include AF_UNIX windows support
* server : Allow running the server example on a unix socket
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>

2447ad8a98 | Michał Moskal | 2025-03-26 11:06:09 -07:00
upgrade to llguidance 0.7.10 (#12576)

f4c3dd5daa | marcoStocchi | 2025-03-15 17:23:11 +01:00
llama-tts : add '-o' option (#12398)
* added -o option to specify an output file name
* llama-tts returns ENOENT in case of file write error
note : PR #12042 is closed as superseded with this one.

774973b8f3 | Sigbjørn Skjæret | 2025-03-14 16:57:05 +01:00
main : add -sysf / --system-prompt-file (#12249) (#12250)
* add system_prompt_file
* add -sysf / --system-prompt-file
* remove system_prompt_file

8fcb563613 | fairydreaming | 2025-03-14 13:47:05 +01:00
Load all MoE experts during warmup (#11571)
* llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup
* common : use new API to enable warmup mode during model warmup
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

be7c303410 | Xuan-Son Nguyen | 2025-03-13 12:34:54 +01:00
arg : no n_predict = -2 for examples except for main and infill (#12364)

e0dbec0bc6 | Georgi Gerganov | 2025-03-13 12:35:44 +02:00
llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)
* llama : refactor llama_context, llama_kv_cache, llm_build_context
* graph : don't mutate the KV cache during defrag
* context : reduce virtuals + remove test function
* context : move interface implementation to source file + factory
* graph : move KV cache build functions to llama_context impl
* graph : remove model reference from build_pooling
* graph : remove llama_model reference
* kv_cache : provide rope factors
* graph : rework inputs to use only unique_ptr, remove attn input abstraction
* context : remove llama_context_i abstraction
* context : clean-up
* graph : clean-up
* llama : remove redundant keywords (struct, enum)
* model : adapt gemma3
* graph : restore same attention ops as on master
* llama : remove TODO + fix indent
ggml-ci

6ef79a67ca | marcoStocchi | 2025-03-10 13:34:13 +02:00
common : refactor '-o' option (#12278)
As discussed in PR 'llama-tts : add -o option' (#12042):
* common_params : 'out_file' string is the only output file name parameter left in common_params. It's intended to be used in all example programs implementing an '-o' option.
* cvector-generator, export-lora, imatrix : default output filenames moved from 'common_params' to the 'main()' of each example program.

4e39a3c332 | Olivier Chafik | 2025-03-10 10:59:03 +00:00
server : extract <think> tags from qwq outputs (#12297)
* extract <think> tags from qwq outputs
* const for all static regexes in chat.cpp

87c2630546 | Olivier Chafik | 2025-03-10 09:45:07 +00:00
allow missing content in message if tool_calls provided (#12293)

1e2f78a004 | Georgi Gerganov | 2025-03-09 19:08:20 +02:00
server : add speculative decoding presets for FIM (#12287)

7cf64f6bee | Olivier Chafik | 2025-03-07 09:33:37 +00:00
sync : minja - support QwQ-32B (#12235)
8a76f7815e