89fea80d29 | Georgi Gerganov | 2025-06-16 22:33:27 +03:00
server : fix incorrect usage of llama_get_embeddings() (#14225)
  * server : fix incorrect usage of llama_get_embeddings()
  * cont : fix the fix

d3e64b9f49 | Georgi Gerganov | 2025-06-16 14:14:00 +03:00
llama : rework embeddings logic (#14208)
  * llama : rework embeddings logic
  * cont : fix rerank
  * cont : engrish [no ci]
  * cont : fix rerank
  * server : support both embeddings and completions with single model
  * cont : avoid embeddings_org
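With the reworked logic, one server model can serve both embeddings and completions. As a purely illustrative sketch (plain Python on hypothetical vectors, not llama.cpp code), client-side code typically compares two returned embedding vectors by cosine similarity:

```python
import math

def cosine_similarity(a, b):
    # Embedding endpoints return plain float vectors; similarity between
    # two of them is the dot product normalized by both magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embedding vectors, stand-ins for real server responses.
v1 = [0.1, 0.2, 0.3]
v2 = [0.2, 0.4, 0.6]
print(cosine_similarity(v1, v2))  # parallel vectors, similarity ~ 1.0
```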

cd355eda7d | Eric Curtin | 2025-06-15 23:36:22 +02:00
server : when listening on a unix domain socket, don't print http:// and port (#14180)
  Instead, show something like:
    main: server is listening on file.sock - starting the main loop
  Signed-off-by: Eric Curtin <ecurtin@redhat.com>
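For reference, serving HTTP over a unix domain socket means there is no host:port pair to print, only a filesystem path. A self-contained Python sketch of the idea (everything here, including paths and the handler, is illustrative and unrelated to the server's actual C++ implementation):

```python
import http.client
import http.server
import os
import socket
import socketserver
import tempfile
import threading

class UnixHTTPServer(socketserver.UnixStreamServer):
    # Reuse the stdlib HTTP machinery, but accept connections on a
    # filesystem socket instead of a TCP host:port pair.
    def get_request(self):
        request, _ = self.socket.accept()
        return request, ("unix", 0)  # dummy client address

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the example quiet

class UnixHTTPConnection(http.client.HTTPConnection):
    # Client side: same HTTP protocol, connected over AF_UNIX.
    def __init__(self, sock_path):
        super().__init__("localhost")
        self.sock_path = sock_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.sock_path)

sock_path = os.path.join(tempfile.mkdtemp(), "file.sock")
server = UnixHTTPServer(sock_path, Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
print(f"listening on {sock_path} - starting the main loop")

conn = UnixHTTPConnection(sock_path)
conn.request("GET", "/health")
body = conn.getresponse().read()
server.shutdown()
print(body)  # b'ok'
```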

ffad043973 | Georgi Gerganov | 2025-06-13 11:18:25 +03:00
server : fix SWA condition for full context reprocess (#14163)

7d516443dd | Georgi Gerganov | 2025-06-12 11:51:38 +03:00
server : re-enable SWA speculative decoding (#14131)

7781e5fe99 | Aman | 2025-06-11 16:42:25 +02:00
webui: Wrap long numbers instead of infinite horizontal scroll (#14062)
  * webui: Wrap long numbers instead of infinite horizontal scroll
  * Use tailwind class
  * update index.html.gz

2baf07727f | Taylor | 2025-06-11 13:43:43 +03:00
server : pass default --keep argument (#14120)

3a12db23b6 | Juk Armstrong | 2025-06-10 16:48:07 +01:00
Fix spec timings to report accepted/tested instead of accepted/drafted (#14104)

dc0623fddb | R0CKSTAR | 2025-06-09 12:01:17 +02:00
webui: fix sidebar being covered by main content (#14082)
  * webui: fix sidebar being covered by main content
  * webui: update index.html.gz
  Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

87d34b381d | Georgi Gerganov | 2025-06-09 12:57:58 +03:00
server : fix LRU check (#14079)

745aa5319b | Georgi Gerganov | 2025-06-06 14:11:15 +03:00
llama : deprecate llama_kv_self_ API (#14030)
  * llama : deprecate llama_kv_self_ API
  * llama : allow llama_memory_(nullptr)
  * memory : add flag for optional data clear in llama_memory_clear

3637576288 | Georgi Gerganov | 2025-06-02 21:34:40 +03:00
server : disable speculative decoding for SWA models (#13970)
  * server : use swa-full for draft context
  * server : disable speculative decoding for SWA models

c9bbc77931 | Olivier Chafik | 2025-06-02 10:15:44 -07:00
server : update deepseek reasoning format (pass reasoning_content as diffs) (#13933)
  * server : update deepseek reasoning format (now in reasoning_content diffs), add legacy option for compat
  * update unit/test_tool_call.py::test_thoughts

3600cc2886 | Georgi Gerganov | 2025-05-31 15:57:44 +03:00
llama : use n_swa + n_ubatch cells for SWA cache (#13833)
  * llama : use n_swa + n_ubatch cells for SWA cache
  * llama : add warning about multi-sequence SWA contexts

c7e0a2054b | igardev | 2025-05-31 11:56:08 +02:00
webui : replace alert and confirm with custom modals (#13711)
  * Replace alert and confirm with custom modals. This is needed because Webview in VS Code doesn't permit alert and confirm for security reasons.
  * Use a Modal Provider to simplify the use of confirm and alert modals.
  * Increase the z-index of the modal dialogs.
  * Update index.html.gz
  * Also add showPrompt
  * Rebuild
  Co-authored-by: igardev <ivailo.gardev@akros.ch>
  Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

3f55f781f1 | Georgi Gerganov | 2025-05-31 12:55:57 +03:00
llama : auto-batch preparation (#13845)
  * llama : auto-batch
  * context : simplify if branching

51fa76f172 | Xuan-Son Nguyen | 2025-05-31 10:14:29 +02:00
mtmd : drop _shared from libmtmd name, merge helpers into libmtmd (⚠️ breaking change) (#13917)
  * mtmd : fix missing public header
  * no object
  * apply suggestion from Georgi
  * rm mtmd-helper, merge it to mtmd
  * missing vendor include dir

12d0188c0d | Georgi Gerganov | 2025-05-31 10:24:04 +03:00
kv-cache : refactor + add llama_memory_state_i (#13746)
  * kv-cache : simplify the "struct llama_kv_cache" interface
  * kv-cache : revert the (n_swa + n_ubatch) change (for next PR)
  * kv-cache : some comments
  * context : fix graph reserve for multiple sequences
  * kv-cache : fix typo [no ci]
  * kv-cache : fix find_slot() logic for free slots
  * llama : add TODO for deprecating the defrag API in the future
  * kv-cache : improve find_slot() using min/max seq pos info
  * llama : handle aborts and compute errors
  * memory : extract state into llama_memory_state
  * kv-cache : add comments
  * server : update batching logic to reset n_batch on successful decode
  * server : upon full re-processing, remove the sequence from the cache
  * kv-cache : add TODO for doing split_equal when split_simple fails

53f925074d | Georgi Gerganov | 2025-05-30 16:25:45 +03:00
sync : vendor (#13901)
  * sync : vendor
  * cont : fix httplib version
  * cont : fix lint
  * cont : fix lint
  * vendor : move to common folder /vendor
  * cont : fix lint
  * cont : move httplib to /vendor + use json_fwd.hpp
  * cont : fix server build
  * cont : add missing headers
  * cont : header clean-up

10961339b2 | Xuan-Son Nguyen | 2025-05-28 22:35:22 +02:00
mtmd : move helpers to dedicated library (⚠️ breaking change) (#13866)
  * mtmd : move helpers to dedicated library
  * fix server build
  * rm leftover cmakelist code

e0e3aa231d | Đinh Trọng Huy | 2025-05-28 19:01:58 +02:00
llama : add support for BertForSequenceClassification reranker (#13858)
  * convert : add support for BertForSequenceClassification
  * add support for reranking using BertForSequenceClassification
  * merge checks of eos and sep
  * fix lint
  Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>

c962ae3382 | Sky | 2025-05-28 16:33:54 +02:00
server : effectively remove 'image_url'/'input_audio' JSON objects from 'llama_params' in multimodal model mode (#13853)

03f582ae8f | Olivier Chafik | 2025-05-26 16:03:57 +01:00
server : fix streaming crashes (#13786)
  * add preludes to content on partial regex match
  * allow all parsers to parse non-tool-call content
  * tweak order of <|python_tag|> vs <function= parsing for functionary v3.1 format; still not ideal, but hopefully less prone to crash

d74e94c1b3 | Olivier Chafik | 2025-05-26 14:56:49 +01:00
server : fix format of streamed tool call deltas (diff name, fix id location) (#13800)
  * fix deltas of tool_call.function.name
  * fix tool_call.id (was in tool_call.function.id!) + add function type
  * add tool_call.type
  * populate empty tool_call.function.arguments on first delta
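These streamed deltas follow the OpenAI chat-completions shape, where each chunk carries an `index` plus fragments of `id`, `function.name`, and `function.arguments`. A hedged client-side sketch of reassembling them (the delta data below is hypothetical; this is not the server's code):

```python
def merge_tool_call_deltas(deltas):
    """Accumulate OpenAI-style streamed tool-call deltas into full calls."""
    calls = {}
    for delta in deltas:
        call = calls.setdefault(delta["index"], {
            "id": None,
            "type": "function",
            "function": {"name": "", "arguments": ""},
        })
        if "id" in delta:
            call["id"] = delta["id"]  # the id arrives once, on the first delta
        fn = delta.get("function", {})
        # name and arguments stream incrementally and are concatenated
        call["function"]["name"] += fn.get("name", "")
        call["function"]["arguments"] += fn.get("arguments", "")
    return [calls[i] for i in sorted(calls)]

# Hypothetical stream for a single weather-lookup tool call.
chunks = [
    {"index": 0, "id": "call_1", "type": "function",
     "function": {"name": "get_weather", "arguments": ""}},
    {"index": 0, "function": {"arguments": '{"city": '}},
    {"index": 0, "function": {"arguments": '"Paris"}'}},
]
merged = merge_tool_call_deltas(chunks)
print(merged[0]["function"]["arguments"])  # {"city": "Paris"}
```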

f13847cfb5 | Olivier Chafik | 2025-05-26 14:16:37 +01:00
server : fix regression on streamed non-chat completion w/ stops (#13785)
  * more forgiving message diffs: partial stop words aren't erased, full stops are
  * add (slow) server test for completion + stream + stop

79c137f776 | Georgi Gerganov | 2025-05-26 14:03:54 +03:00
examples : allow extracting embeddings from decoder contexts (#13797)

e121edc432 | Olivier Chafik | 2025-05-26 00:30:51 +01:00
server : add --reasoning-budget 0 to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771)
  Co-authored-by: ochafik <ochafik@google.com>
  Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

2f099b510f | Xuan-Son Nguyen | 2025-05-25 18:02:18 +01:00
webui : bump max upload file size to 500MB (#13779)

d785f9c1fd | Olivier Chafik | 2025-05-25 10:45:49 +01:00
server : fix/test add_generation_prompt (#13770)
  Co-authored-by: ochafik <ochafik@google.com>

f5cd27b71d | Olivier Chafik | 2025-05-25 01:48:08 +01:00
server : streaming of tool calls and thoughts when --jinja is on (#12379)
  * add common_json w/ support for truncated json healing
  * add common_chat_msg_diff
  * partial common_chat_parse
  * refactor parser w/ optionals
  * server : wire chat diffs in stream mode
  * fix trigger of thinking models (must happen after thoughts are closed)
  * fix functionary v3.2 raw python!
  * rename: common_chat_syntax (now contains format)
  * rm common_regex.at_start
  * don't return empty <think></think>
  * accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)
  * fix QwQ 32B tool call parsing after thoughts (hermes2)
  * better logs for grammar triggers
  * consume spaces after parse_json_tool_calls
  * fix required tool calls w/ thinking models that have pre-opened thinking tags
  * fix thinking model's initial trigger + test qwq's template
  * run most test_tool_call tests in stream + non-stream modes
  * make functionary v3.2 parsing more strict (differentiate first match from others)
  * send final diff from server, to close off raw python arguments
  * support partial content streaming in Generic mode
  * tool-call : allow content prelude before hermes2 tool calls (for Qwen2.5)
  * update function-calling.md
  * update tool_bench.py
  * chat-parser : remove input from exception (llm output may contain PII)
  Co-authored-by: ochafik <ochafik@google.com>
  Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>

9ecf3e66a3 | Xuan-Son Nguyen | 2025-05-23 11:03:47 +02:00
server : support audio input (#13714)
  * server : support audio input
  * add audio support on webui

797990c4bc | Xuan-Son Nguyen | 2025-05-22 20:42:48 +02:00
mtmd : add ultravox audio input (#13623)
  * convert ok, load ok
  * warmup ok
  * fix padding
  * fix merge conflict
  * build_ultravox()
  * add necessary mtmd APIs
  * first working version (only 4s of audio)
  * fix compile
  * fPIC
  * fix windows
  * various fixes
  * clean up audio_helpers
  * fix conversion
  * long audio input ok
  * adapt the api
  * add --audio arg
  * final touch UX
  * add miniaudio to readme
  * fix typo
  * refactor kv metadata
  * mtmd_default_marker()

cc74d5be99 | Georgi Gerganov | 2025-05-22 16:33:39 +03:00
server : pad small embedding batches (#13692)

5fbfe384d4 | Georgi Gerganov | 2025-05-21 19:46:56 +03:00
server : improve error reporting (#13680)

0d5c742161 | Robin Davidsson | 2025-05-21 15:15:27 +02:00
server : add the endpoints /api/tags and /api/chat (#13659)
  * Add the endpoints /api/tags and /api/chat, and improve the model metadata response
  * Remove trailing whitespace
  * Remove code that is not needed for Copilot to work

42158ae2e8 | Dorin-Andrei Geman | 2025-05-21 15:07:57 +02:00
server : fix first message identification (#13634)
  * server : fix first message identification
    When using the OpenAI SDK (https://github.com/openai/openai-node/blob/master/src/lib/ChatCompletionStream.ts#L623-L626) we noticed that the expected assistant role was missing in the first streaming message. Fix this by correctly checking for the first message.
  * server : fix checks for first role message for stream=True
  Signed-off-by: Dorin Geman <dorin.geman@docker.com>
  Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>

797f2ac062 | Georgi Gerganov | 2025-05-21 15:11:13 +03:00
kv-cache : simplify the interface (#13660)
  * kv-cache : simplify the interface
  * context : revert llama_batch_allocr position change

e298d2fbd0 | Georgi Gerganov | 2025-05-20 08:05:46 +03:00
kv-cache : add SWA support (#13194)
  * kv-cache : prepare for SWA
  * kv-cache : initial iSWA implementation
  * kv-cache : rework error recovery logic
  * models : fix Phi-3 SWA parameters
  * model : adjust Granite to rope factor changes
  * server : check if context can do shifts
  * iswa : for now, always enable shifts (experiment)
  * kv-cache : simplify SWA logic
  * kv-cache : apply defrag when we fail to find slots for the batch
  * llama : update docs about llama_decode
  * kv-cache : update warning logs when no space for the batch is available
  * llama : add llama_kv_self_seq_pos_min()
  * kv-cache : keep track of partial SWA computes and print warnings
  * server : disallow use cases involving partial SWA context
  * llama : add param to control SWA cache size
  * minor : clean-up

6a2bc8bfb7 | Isaac McFadyen | 2025-05-17 23:59:48 +02:00
server : added --no-prefill-assistant flag (#13608)
  * added no-prefill-assistant flag
  * reworded documentation comment
  * updated server README.md

6aa892ec2a | Xuan-Son Nguyen | 2025-05-16 21:50:00 +02:00
server : do not return an error when out of context (with ctx shift disabled) (#13577)

aea9f8b4e7 | Xuan-Son Nguyen | 2025-05-16 21:49:01 +02:00
webui : improve accessibility for visually impaired people (#13551)
  * webui : improve accessibility for visually impaired people
  * add a11y for extra contents
  * fix some labels being read twice
  * add skip to main content

3cc1f1f1d2 | Xuan-Son Nguyen | 2025-05-15 14:24:50 +02:00
webui : handle PDF input (as text or image) + convert pasted long content to file (#13562)
  * webui : handle PDF input (as text or image)
  * handle the case of PDF-as-image + server without mtmd
  * fix bug with missing pages

c753d7bed0 | Piotr Wilkin (ilintar) | 2025-05-15 08:40:58 +02:00
server : proper error handling for missing elements in messages array (OpenAI-compatible backend) (#13540)

aa48e373f2 | Olivier Chafik | 2025-05-15 02:39:51 +01:00
server : inject date_string in llama 3.x template + fix date for firefunction v2 (#12802)
  * inject date_string in llama 3.x + fix for functionary v2 (https://github.com/ggml-org/llama.cpp/issues/12729)
  * move/fix detection of functionary v3.1 before llama 3.x; fix & test their non-tool mode
  * generate more tokens in test_completion_with_required_tool_tiny_fast to avoid truncation
  Co-authored-by: ochafik <ochafik@google.com>
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

3198405e98 | Olivier Chafik | 2025-05-14 19:50:57 +01:00
common : add partial regex support (#12808)
  * move string_find_partial_stop & string_ends_with to common
  * add common_regex (supports partial matches)
  * partial regex : add missing iterator end checks
  * string utils : use string_views
  * direct throw to avoid ggml.h include
  * regex-partial : replace missed ggml_asserts
  Co-authored-by: ochafik <ochafik@google.com>
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
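The partial-match idea also underlies stop-word handling in streaming: if the tail of the generated text could still be the beginning of a stop word, it must be held back rather than emitted. A minimal sketch of that check (the helper name and logic here are illustrative, not the actual `string_find_partial_stop` implementation):

```python
def find_partial_stop(text, stop):
    """Return the index where a partial occurrence of `stop` begins at the
    end of `text`, or -1 if the tail cannot be a prefix of `stop`."""
    # Try the longest possible prefix of the stop word first.
    for length in range(min(len(stop), len(text)), 0, -1):
        if text.endswith(stop[:length]):
            return len(text) - length
    return -1

print(find_partial_stop("Hello <|", "<|eot_id|>"))    # 6: "<|" may start the stop word, hold it back
print(find_partial_stop("Hello there", "<|eot_id|>"))  # -1: tail cannot match, safe to emit
```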

053174436f | Georgi Gerganov | 2025-05-14 15:42:10 +03:00
server : passthrough the /models endpoint during loading (#13535)
  * server : passthrough the /models endpoint during loading
  * server : update readme + return json for "meta" field

360a9c98e1 | Xuan-Son Nguyen | 2025-05-14 13:35:07 +02:00
server : fix cache_tokens bug with no cache_prompt (#13533)

bb1681fbd5 | Xuan-Son Nguyen | 2025-05-14 10:26:12 +02:00
webui : use fflate for more deterministic gzip compress (#13525)
  * webui : use pako for more deterministic gzip compress
  * simpler code
  * use fflate instead of pako

d486dd3e8e | Luca Stefani | 2025-05-14 10:07:31 +02:00
webui : allow pasting file from clipboard (#13526)
  * server : allow pasting file from clipboard
  * server : prevent default action on file paste
  * update build
  * format then build combined
  Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

91159ee9df | Anudit Nagar | 2025-05-12 13:56:42 +02:00
server : allow content to be null in oaicompat_completion_params_parse (#13477)