commit d785f9c1fd
Author: Olivier Chafik
Date:   2025-05-25 10:45:49 +01:00

    server: fix/test add_generation_prompt (#13770)

    Co-authored-by: ochafik <ochafik@google.com>
commit f5cd27b71d
Author: Olivier Chafik
Date:   2025-05-25 01:48:08 +01:00

    server: streaming of tool calls and thoughts when --jinja is on (#12379)

    * add common_json w/ support for truncated json healing
    * add common_chat_msg_diff
    * partial common_chat_parse
    * refactor parser w/ optionals
    * server: wire chat diffs in stream mode
    * fix trigger of thinking models (must happen after thoughts are closed)
    * fix functionary v3.2 raw python!
    * rename: common_chat_syntax (now contains format)
    * rm common_regex.at_start
    * don't return empty <think></think>
    * accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)
    * fix QwQ 32B tool call parsing after thoughts (hermes2)
    * better logs for grammar triggers
    * consume spaces after parse_json_tool_calls
    * fix required tool calls w/ thinking models that have pre-opened thinking tags
    * fix thinking model's initial trigger + test qwq's template
    * run most test_tool_call tests in stream + non-stream modes
    * make functionary v3.2 parsing more strict (differentiate first match from others)
    * send final diff from server, to close off raw python arguments
    * support partial content streaming in Generic mode
    * tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)
    * Update function-calling.md
    * Update tool_bench.py
    * chat-parser: remove input from exception (llm output may contain PII)

    Co-authored-by: ochafik <ochafik@google.com>
    Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>
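The "truncated json healing" mentioned above can be illustrated with a short sketch. This is an illustrative Python reimplementation of the idea, not the actual `common_json` C++ code: while a tool-call argument object is still streaming, scan the prefix, track any unterminated string and the stack of unclosed containers, and append the missing closers so the prefix parses.

```python
def heal_truncated_json(s: str) -> str:
    """Close any unfinished string, arrays and objects in a JSON prefix."""
    stack = []          # open '{' / '[' delimiters, innermost last
    in_string = False
    escape = False
    for ch in s:
        if in_string:
            if escape:
                escape = False
            elif ch == "\\":
                escape = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append(ch)
        elif ch in "}]":
            stack.pop()
    healed = s
    if in_string:
        healed += '"'                      # terminate a dangling string
    for opener in reversed(stack):
        healed += "}" if opener == "{" else "]"
    return healed
```

A real implementation also has to cope with prefixes this sketch would leave invalid, such as a trailing comma or a half-emitted literal like `tru`.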
commit 9ecf3e66a3
Author: Xuan-Son Nguyen
Date:   2025-05-23 11:03:47 +02:00

    server : support audio input (#13714)

    * server : support audio input
    * add audio support on webui
commit 797990c4bc
Author: Xuan-Son Nguyen
Date:   2025-05-22 20:42:48 +02:00

    mtmd : add ultravox audio input (#13623)

    * convert ok, load ok
    * warmup ok
    * test
    * still does not work?
    * fix padding
    * temporary give up
    * fix merge conflict
    * build_ultravox()
    * rm test
    * fix merge conflict
    * add necessary mtmd APIs
    * first working version (only 4s of audio)
    * will this monster compile?
    * fix compile
    * please compile
    * fPIC
    * fix windows
    * various fixes
    * clean up audio_helpers
    * fix conversion
    * add some debug stuff
    * long audio input ok
    * adapt the api
    * add --audio arg
    * final touch UX
    * add miniaudio to readme
    * fix typo
    * refactor kv metadata
    * mtmd_default_marker()
commit cc74d5be99
Author: Georgi Gerganov
Date:   2025-05-22 16:33:39 +03:00

    server : pad small embedding batches (#13692)
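Padding small batches up to a uniform size keeps the compute-graph shape stable across requests, which helps backends reuse compiled graphs instead of rebuilding them per batch size. A rough sketch of the idea (hypothetical helper, not the server's actual code; the padding granularity is an assumption):

```python
def pad_to_multiple(batch: list[int], pad_id: int, multiple: int) -> list[int]:
    """Pad a token batch with pad_id so its length is a multiple of `multiple`.

    The padded positions carry no information; they only round the batch up
    to a bucketed size so graph shapes repeat across requests.
    """
    remainder = len(batch) % multiple
    if remainder == 0:
        return batch
    return batch + [pad_id] * (multiple - remainder)
```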
commit 5fbfe384d4
Author: Georgi Gerganov
Date:   2025-05-21 19:46:56 +03:00

    server : improve error reporting (#13680)
commit 0d5c742161
Author: Robin Davidsson
Date:   2025-05-21 15:15:27 +02:00

    server : add the endpoints /api/tags and /api/chat (#13659)

    * Add the endpoints /api/tags and /api/chat, and improve the model metadata response
    * Remove trailing whitespace
    * Remove code that is not needed for Copilot to work
commit 42158ae2e8
Author: Dorin-Andrei Geman
Date:   2025-05-21 15:07:57 +02:00

    server : fix first message identification (#13634)

    * server : fix first message identification

      When using the OpenAI SDK
      (https://github.com/openai/openai-node/blob/master/src/lib/ChatCompletionStream.ts#L623-L626)
      we noticed that the expected assistant role is missing in the first
      streaming message. Fix this by correctly checking for the first message.

    * server : fix checks for first role message for stream=True

    Signed-off-by: Dorin Geman <dorin.geman@docker.com>
    Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
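The fix above concerns the shape of OpenAI-style streaming chunks: the SDK code linked in the commit only records an assistant message if the first chunk's `delta` carries the role. A minimal sketch of the expected chunk shapes (field names follow the OpenAI chat-completions chunk format; the helper itself is hypothetical):

```python
def make_stream_chunk(content: str, is_first: bool) -> dict:
    """Build one OpenAI-style chat.completion.chunk payload (sketch)."""
    delta = {"content": content}
    if is_first:
        # clients such as openai-node expect the assistant role to be
        # present on the first streamed delta, and only on the first
        delta["role"] = "assistant"
    return {
        "object": "chat.completion.chunk",
        "choices": [{"index": 0, "delta": delta, "finish_reason": None}],
    }
```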
commit 797f2ac062
Author: Georgi Gerganov
Date:   2025-05-21 15:11:13 +03:00

    kv-cache : simplify the interface (#13660)

    * kv-cache : simplify the interface
    * context : revert llama_batch_allocr position change
commit e298d2fbd0
Author: Georgi Gerganov
Date:   2025-05-20 08:05:46 +03:00

    kv-cache : add SWA support (#13194)

    * kv-cache : prepare for SWA
    * kv-cache : initial iSWA implementation
    * kv-cache : rework error recovery logic
    * models : fix Phi-3 SWA parameters
    * model : adjust Granite to rope factor changes
    * server : check if context can do shifts
    * iswa : for now, always enable shifts (experiment)
    * kv-cache : simplify SWA logic
    * kv-cache : apply defrag when we fail to find slots for the batch
    * llama : update docs about llama_decode
    * kv-cache : update warning logs when no space for the batch is available
    * llama : add llama_kv_self_seq_pos_min()
    * kv-cache : keep track of partial SWA computes and print warnings
    * server : disallow use cases involving partial SWA context
    * llama : add param to control SWA cache size
    * minor : clean-up
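Sliding-window attention (SWA) bounds how far back a token can attend, which is what allows the dedicated SWA cache to stay small. A toy sketch of the visibility rule and the resulting cache requirement (illustrative only; the function names are made up, not llama.cpp's API):

```python
def swa_can_attend(pos_i: int, pos_j: int, window: int) -> bool:
    """Causal sliding-window rule: token at pos_i sees token at pos_j iff
    pos_j is not in the future and lies within the last `window` positions."""
    return 0 <= pos_i - pos_j < window

def swa_cells_needed(n_past: int, window: int) -> int:
    """Only the last `window` positions can ever be read again, so an SWA
    cache never needs more cells than the window size."""
    return min(n_past, window)
```

This also explains why the commit disallows partial SWA context reuse on the server: once a position slides out of the window its KV entries may be evicted, so computations that would need them cannot be resumed.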
commit 6a2bc8bfb7
Author: Isaac McFadyen
Date:   2025-05-17 23:59:48 +02:00

    server : added --no-prefill-assistant flag (#13608)

    * added no-prefill-assistant flag
    * reworded documentation comment
    * updated server README.md
commit 6aa892ec2a
Author: Xuan-Son Nguyen
Date:   2025-05-16 21:50:00 +02:00

    server : do not return error out of context (with ctx shift disabled) (#13577)
commit aea9f8b4e7
Author: Xuan-Son Nguyen
Date:   2025-05-16 21:49:01 +02:00

    webui : improve accessibility for visually impaired people (#13551)

    * webui : improve accessibility for visually impaired people
    * add a11y for extra contents
    * fix some labels being read twice
    * add skip to main content
commit 3cc1f1f1d2
Author: Xuan-Son Nguyen
Date:   2025-05-15 14:24:50 +02:00

    webui : handle PDF input (as text or image) + convert pasted long content to file (#13562)

    * webui : handle PDF input (as text or image)
    * handle the case where pdf image + server without mtmd
    * fix bug missing pages
commit c753d7bed0
Author: Piotr Wilkin (ilintar)
Date:   2025-05-15 08:40:58 +02:00

    server : proper error handling for missing elements in messages array (OpenAI-compatible backend) (#13540)
commit aa48e373f2
Author: Olivier Chafik
Date:   2025-05-15 02:39:51 +01:00

    server: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802)

    * Inject date_string in llama 3.x + fix for functionary v2
      https://github.com/ggml-org/llama.cpp/issues/12729
    * move/fix detection of functionary v3.1 before llama 3.x, fix & test their non-tool mode
    * generate more tokens in test_completion_with_required_tool_tiny_fast to avoid truncation

    Co-authored-by: ochafik <ochafik@google.com>
    Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
commit 3198405e98
Author: Olivier Chafik
Date:   2025-05-14 19:50:57 +01:00

    common: add partial regex support (#12808)

    * move string_find_partial_stop & string_ends_with to common
    * add common_regex (supports partial matches)
    * Update common/regex-partial.cpp
    * Update common/regex-partial.h
    * partial regex: add missing iterator end checks
    * string utils: use string_views
    * direct throw to avoid ggml.h include
    * regex-partial: replace missed ggml_asserts

    Co-authored-by: ochafik <ochafik@google.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
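`string_find_partial_stop` addresses the streaming problem that `common_regex` generalizes: when a stop sequence (or grammar trigger) may arrive split across chunks, the server must hold back any buffer suffix that could be the start of a match. A Python sketch of the literal-string case (illustrative, not the C++ implementation):

```python
def find_partial_stop(buffer: str, stop: str) -> int:
    """Return the index where a partial occurrence of `stop` begins at the
    end of `buffer`, or -1 if the buffer cannot end mid-stop-word.

    Text before that index is safe to stream to the client; the suffix must
    be withheld until more output resolves the ambiguity.
    """
    # try the longest proper prefix of `stop` first
    for n in range(min(len(buffer), len(stop) - 1), 0, -1):
        if buffer.endswith(stop[:n]):
            return len(buffer) - n
    return -1
```

Partial *regex* matching extends the same idea: report not just "matched / not matched" but also "could still match given more input".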
commit 053174436f
Author: Georgi Gerganov
Date:   2025-05-14 15:42:10 +03:00

    server : passthrough the /models endpoint during loading (#13535)

    * server : passthrough the /models endpoint during loading
    * server : update readme + return json for "meta" field
commit 360a9c98e1
Author: Xuan-Son Nguyen
Date:   2025-05-14 13:35:07 +02:00

    server : fix cache_tokens bug with no cache_prompt (#13533)
commit bb1681fbd5
Author: Xuan-Son Nguyen
Date:   2025-05-14 10:26:12 +02:00

    webui : use fflate for more deterministic gzip compress (#13525)

    * webui : use pako for more deterministic gzip compress
    * simpler code
    * use fflate instead of pako
commit d486dd3e8e
Author: Luca Stefani
Date:   2025-05-14 10:07:31 +02:00

    webui : allow pasting file from clipboard (#13526)

    * server: Allow pasting file from clipboard
    * server: Prevent default action on file paste
    * update build
    * format then build combined

    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
commit 91159ee9df
Author: Anudit Nagar
Date:   2025-05-12 13:56:42 +02:00

    server : allow content to be null in oaicompat_completion_params_parse (#13477)
commit 9a390c4829
Author: Anthony Umfer
Date:   2025-05-11 17:08:26 +02:00

    tools : fix uninitialized llama_batch in server (#13436)

    * add constructor to initialize server_context::batch, preventing destructor's call to llama_batch_free from causing an invalid free()
    * Update tools/server/server.cpp
    * use C++11 initializer syntax
    * switch from copy-list-initialization to direct-list-initialization

    Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
commit 3b24d26c22
Author: Xuan-Son Nguyen
Date:   2025-05-10 18:44:49 +02:00

    server : update docs (#13432)
commit 33eff40240
Author: Xuan-Son Nguyen
Date:   2025-05-09 19:29:37 +02:00

    server : vision support via libmtmd (#12898)

    * server : (experimental) vision support via libmtmd
    * mtmd : add more api around mtmd_image_tokens
    * mtmd : ability to calc image hash
    * shared_ptr for mtmd_image_tokens
    * move hash to user-define ID (fixed)
    * abstract out the batch management
    * refactor logic adding tokens to batch
    * implement hashing image
    * use FNV hash, now hash bitmap instead of file data
    * allow decoding image embedding to be split into batches
    * disable some features when mtmd is on
    * fix --no-mmproj-offload
    * mtmd_context_params no timings
    * refactor server_inp to server_tokens
    * fix the failing test case
    * add mtmd::bitmaps
    * add test target
    * test: mtmd_input_chunks_free
    * fix merging issue
    * explicitly create mtmd::input_chunks
    * mtmd_input_chunk_copy
    * add clone()
    * improve server_input struct
    * clip : fix confused naming ffn_up and ffn_down
    * rm ffn_i/o/g naming
    * rename n_embd, n_ff
    * fix detokenize
    * add const to various places
    * add warning about breaking changes
    * add c api
    * helper: use mtmd_image_tokens_get_n_pos
    * fix ctx_shift
    * fix name shadowing
    * more strict condition
    * support remote image_url
    * remote image_url log
    * add CI test
    * do not log base64
    * add "has_multimodal" to /props
    * remove dangling image
    * speculative: use slot.cache_tokens.insert
    * Apply suggestions from code review
    * rm can_be_detokenized
    * on prompt processing done, assert cache_tokens.size
    * handle_completions_impl returns void
    * adapt the new web ui
    * update docs and hot topics

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
commit d9c4accaff
Author: Xuan-Son Nguyen
Date:   2025-05-09 09:06:37 +02:00

    server : (webui) rename has_multimodal --> modalities (#13393)

    * server : (webui) rename has_multimodal --> modalities
    * allow converting SVG to PNG
    * less complicated code
commit ee01d71e58
Author: Xuan-Son Nguyen
Date:   2025-05-08 18:51:45 +02:00

    server : (webui) fix a very small misalignment (#13387)

    * server : (webui) fix a very small misalignment
    * restore font-bold
commit 8c83449cb7
Author: Xuan-Son Nguyen
Date:   2025-05-08 15:37:29 +02:00

    server : (webui) revamp the input area, plus many small UI improvements (#13365)

    * rework the input area
    * process selected file
    * change all icons to heroicons
    * fix thought process collapse
    * move conversation more menu to sidebar
    * sun icon --> moon icon
    * rm default system message
    * stricter upload file check, only allow image if server has mtmd
    * add renaming
    * better autoscroll
    * add conversation group
    * fix scroll
    * extra context first, then user input in the end
    * fix <hr> tag
    * add mb-3 for <pre>
    * throttle adjustTextareaHeight to make it less laggy
    * (nits) missing padding in sidebar
    * rm stray console log
commit 6562e5a4d6
Author: Georgi Gerganov
Date:   2025-05-08 14:28:33 +03:00

    context : allow cache-less context for embeddings (#13108)

    * context : allow cache-less context for embeddings
    * context : enable reranking with encode()
    * context : encode() clears embd_seq
    * examples : use llama_encode() when appropriate
    * models : nomic bert moe does not require KV cache
    * llama : update comments for llama_decode/llama_encode
    * context : update warning log [no ci]
commit 233461f812
Author: oobabooga
Date:   2025-05-05 22:12:19 +02:00

    sampling : integrate Top-nσ into main sampling chain (and add it to the server) (#13264)

    * sampling: add Top-nσ sampler to `llama-server` and sampler ordering
    * revert: sampler ordering
    * sampling: add XTC to Top-nσ sampler chain
    * sampling: add Dyna. Temp. to Top-nσ sampler chain
    * sampling: actually remove Top-nσ from sampler (oops)
    * Integrate top_n_sigma into main sampler chain
    * Define COMMON_SAMPLER_TYPE_TOP_N_SIGMA
    * Formatting and lint fixes
    * Exit early in the sampler if nsigma < 0

    Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>
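Top-nσ sampling keeps only the tokens whose logit lies within n standard deviations of the maximum logit, on the observation that the informative part of the logit distribution sits near the top while the rest is noise. A compact sketch of the filtering rule (illustrative, not the llama.cpp sampler code):

```python
import math

def top_n_sigma_filter(logits: list[float], n_sigma: float) -> list[int]:
    """Return the indices kept by Top-nσ: logits within n_sigma standard
    deviations of the max logit. Masked-out tokens would get -inf before
    softmax in a real sampler."""
    mean = sum(logits) / len(logits)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max(logits) - n_sigma * sigma
    return [i for i, x in enumerate(logits) if x >= threshold]
```

Per the commit, the sampler is a no-op (exits early) when nsigma < 0, which serves as the "disabled" setting.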
commit b34c859146
Author: igardev
Date:   2025-05-05 16:03:31 +02:00

    server : webui - change setText command from parent window to also send the message (#13309)

    * setText command from parent window for llama-vscode now sends the message automatically.
    * Upgrade packages versions to fix vulnerabilities with "npm audit fix" command.
    * Fix code formatting.
    * Add index.html.gz changes.
    * Revert "Upgrade packages versions to fix vulnerabilities with "npm audit fix" command."
      This reverts commit 67687b7fda.
    * easier approach
    * add setTimeout

    Co-authored-by: igardev <ivailo.gardev@akros.ch>
    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
commit 1d36b3670b
Author: Diego Devesa
Date:   2025-05-02 20:27:13 +02:00

    llama : move end-user examples to tools directory (#13249)

    * llama : move end-user examples to tools directory

    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>