Daniel Bevenius 
							
						 
					 
					
						
						
							
						
						4314e56c4f 
					 
					
						
						
							
							server : use lambda instead of std::bind ( #11507 )  
						
						... 
						
						
						
						This commit replaces the two usages of `std::bind` with lambdas for
the callback functions `callback_new_task` and
`callback_update_slots`.
The motivation for this change is consistency with the rest of the code
in server.cpp (lambdas are used for all other callbacks/handlers). Lambdas
are also arguably more readable, and they are
recommended over `std::bind` in modern C++.
Ref: https://github.com/LithoCoders/dailycpp/blob/master/EffectiveModernC%2B%2B/chapter6/Item34_Prefer_lambdas_to_std::bind.md  
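A minimal sketch of the kind of change this commit message describes, not the actual server.cpp code. The types `task`, `task_queue`, and `context`, the member `on_new_task`, and both `register_*` helpers are hypothetical stand-ins chosen for illustration; only the `std::bind`-versus-lambda contrast is taken from the message.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the server's task and queue types.
struct task {
    int id;
};

struct task_queue {
    std::function<void(task)> on_new_task;

    // Hands a task to whichever callback was registered.
    void post(task t) {
        assert(on_new_task != nullptr);
        on_new_task(std::move(t));
    }
};

struct context {
    std::vector<int> processed_ids;

    void process_single_task(task t) {
        processed_ids.push_back(t.id);
    }
};

// Before: std::bind with a placeholder for the task argument.
inline void register_with_bind(task_queue & q, context & ctx) {
    q.on_new_task = std::bind(&context::process_single_task, &ctx, std::placeholders::_1);
}

// After: a lambda that captures the context by reference and forwards the task.
inline void register_with_lambda(task_queue & q, context & ctx) {
    q.on_new_task = [&ctx](task t) {
        ctx.process_single_task(std::move(t));
    };
}
```

Both registrations behave the same here. The lambda version makes the capture (`&ctx`) and the parameter type visible at the call site, which is the readability argument the message alludes to.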
						
						
					 
					
						2025-01-30 11:05:00 +01:00 
						 
				 
			
				
					
						
							
							
								Nigel Bosch 
							
						 
					 
					
						
						
							
						
						eb7cf15a80 
					 
					
						
						
							
							server : add /apply-template endpoint for additional use cases of Minja functionality ( #11489 )  
						
						... 
						
						
						
						* add /apply-template endpoint to server
* remove unnecessary line
* add /apply-template documentation
* return only "prompt" field in /apply-template
* use suggested idea instead of my overly verbose way 
						
						
					 
					
						2025-01-29 19:45:44 +01:00 
						 
				 
			
				
					
						
							
							
								Daniel Bevenius 
							
						 
					 
					
						
						
							
						
						e51c47b401 
					 
					
						
						
							
							server : update auto gen files comments [no ci] ( #11484 )  
						
						... 
						
						
						
						* server : update auto gen files comments
This commit updates the 'auto generated files' comments in server.cpp
and removes `deps.sh` from the comment.
The motivation for this change is that `deps.sh` was removed in
commit 91c36c269b ("server : (web ui) Various improvements, now use vite as bundler (#10599)").
* squash! server : update auto gen files comments [no ci]
Move comments about file generation to README.md.
* squash! server : update auto gen files comments [no ci]
Remove the comments in server.cpp that mention that information
can be found in the README.md file. 
						
						
					 
					
						2025-01-29 16:34:18 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						49b0e3cec4 
					 
					
						
						
							
							server : fix cleaning up stream task ( #11418 )  
						
						... 
						
						
						
						* server : fix cleaning up stream task
* one more spot 
						
						
					 
					
						2025-01-25 16:36:44 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						5845661640 
					 
					
						
						
							
							server : add more clean up when cancel_tasks is called ( #11340 )  
						
						... 
						
						
						
						* server : add more clean up when cancel_tasks is called
* fix recv_with_timeout
* std::remove_if
* fix std::remove_if 
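The bullets above mention `std::remove_if` in the context of task clean-up. As a hedged illustration only, here is the usual erase-remove idiom applied to a queue of tasks. The `queued_task` type and the `drop_cancelled` helper are hypothetical, and the commit message does not say what the actual fix was.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical queued task; the real server task type carries more state.
struct queued_task {
    int id;
};

// Removes every queued task whose id is in `cancelled` and returns how many were dropped.
inline std::size_t drop_cancelled(std::vector<queued_task> & queue,
                                  const std::unordered_set<int> & cancelled) {
    const std::size_t old_size = queue.size();

    // std::remove_if only moves the kept elements to the front and returns the new
    // logical end; the container still has its old size until erase() is called.
    auto new_end = std::remove_if(queue.begin(), queue.end(),
        [&cancelled](const queued_task & t) {
            return cancelled.count(t.id) > 0;
        });
    queue.erase(new_end, queue.end());

    return old_size - queue.size();
}
```

Calling `std::remove_if` without the following `erase` is a common mistake, because the elements past the returned iterator are left in a valid but unspecified state.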
						
						
					 
					
						2025-01-23 13:56:05 +01:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						12c2bdf2de 
					 
					
						
						
							
							server : fix draft context not being released ( #11354 )  
						
						
						
						
					 
					
						2025-01-22 17:44:40 +01:00 
						 
				 
			
				
					
						
							
							
								Jiří Podivín 
							
						 
					 
					
						
						
							
						
						96f4053934 
					 
					
						
						
							
							Adding logprobs to /v1/completions ( #11344 )  
						
						... 
						
						
						
						Signed-off-by: Jiri Podivin <jpodivin@redhat.com>
						
						
					 
					
						2025-01-22 12:51:32 +01:00 
						 
				 
			
				
					
						
							
							
								Olivier Chafik 
							
						 
					 
					
						
						
							
						
						6171c9d258 
					 
					
						
						
							
							Add Jinja template support ( #11016 )  
						
						... 
						
						
						
						* Copy minja from 58f0ca6dd7 (https://github.com/google/minja/pull/22)
* Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos / bos tokens when jinja template references them
* rename: common_chat_template[s]
* reinstate assert on chat_templates.template_default
* Update minja to b8437df626 (https://github.com/google/minja/pull/25)
* Update minja from https://github.com/google/minja/pull/27
* rm unused optional header
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
						
						
					 
					
						2025-01-21 13:18:51 +00:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						80d0d6b4b7 
					 
					
						
						
							
							common : add -hfd option for the draft model ( #11318 )  
						
						... 
						
						
						
						* common : add -hfd option for the draft model
* cont : fix env var
* cont : more fixes 
						
						
					 
					
						2025-01-20 22:29:43 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						f30f099228 
					 
					
						
						
							
							server : implement cancellable request ( #11285 )  
						
						... 
						
						
						
						* server : implement cancellable request
* fix typo
* httplib 0.18.5
* fix i underflow 
						
						
					 
					
						2025-01-18 14:12:05 +01:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						afa8a9ec9b 
					 
					
						
						
							
							llama : add llama_vocab, functions -> methods, naming ( #11110 )  
						
						... 
						
						
						
						* llama : functions -> methods (#11110)
* llama : add struct llama_vocab to the API (#11156)
ggml-ci
* hparams : move vocab params to llama_vocab (#11159)
ggml-ci
* vocab : more pimpl (#11165)
ggml-ci
* vocab : minor tokenization optimizations (#11160)
ggml-ci
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* lora : update API names (#11167)
ggml-ci
* llama : update API names to use correct prefix (#11174)
* llama : update API names to use correct prefix
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* minor [no ci]
* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174)
ggml-ci
* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174)
ggml-ci
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
						
						
					 
					
						2025-01-12 11:32:42 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						e6e7c75d94 
					 
					
						
						
							
							server : fix extra BOS in infill endpoint ( #11106 )  
						
						... 
						
						
						
						* server : fix extra BOS in infill endpoint
ggml-ci
* server : update infill tests 
						
						
					 
					
						2025-01-06 15:36:08 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						f66f582927 
					 
					
						
						
							
							llama : refactor src/llama.cpp ( #10902 )  
						
						... 
						
						
						
						* llama : scatter llama.cpp into multiple modules (wip)
* llama : control-vector -> adapter
* llama : arch
* llama : mmap
ggml-ci
* ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
* llama : arch (cont)
ggml-ci
* llama : chat
ggml-ci
* llama : model
ggml-ci
* llama : hparams
ggml-ci
* llama : adapter
ggml-ci
* examples : fix
ggml-ci
* rebase
ggml-ci
* minor
* llama : kv cache
ggml-ci
* llama : impl
ggml-ci
* llama : batch
ggml-ci
* cont
ggml-ci
* llama : context
ggml-ci
* minor
* llama : context (cont)
ggml-ci
* llama : model loader
ggml-ci
* common : update lora
ggml-ci
* llama : quant
ggml-ci
* llama : quant (cont)
ggml-ci
* minor [no ci] 
						
						
					 
					
						2025-01-03 10:18:53 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						0da5d86026 
					 
					
						
						
							
							server : allow using LoRA adapters per-request ( #10994 )  
						
						... 
						
						
						
						* slot.can_batch_with
* lora per request
* test: force disable cache prompt
* move can_batch_with check
* fix condition
* add slow test with llama 8b
* update docs
* move lora change task to queue
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* lora_base
* remove redundant check
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
						
						
					 
					
						2025-01-02 15:05:18 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						45095a61bf 
					 
					
						
						
							
							server : clean up built-in template detection ( #11026 )  
						
						... 
						
						
						
						* server : clean up built-in template detection
* fix compilation
* add chat template test
* fix condition 
						
						
					 
					
						2024-12-31 15:22:01 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						5896c65232 
					 
					
						
						
							
							server : add OAI compat for /v1/completions ( #10974 )  
						
						... 
						
						
						
						* server : add OAI compat for /v1/completions
* add test
* add docs
* better docs 
						
						
					 
					
						2024-12-31 12:34:13 +01:00 
						 
				 
			
				
					
						
							
							
								Alexey Parfenov 
							
						 
					 
					
						
						
							
						
						16cdce7b68 
					 
					
						
						
							
							server : fix token duplication when streaming with stop strings ( #10997 )  
						
						
						
						
					 
					
						2024-12-28 16:08:54 +01:00 
						 
				 
			
				
					
						
							
							
								Reza Kakhki 
							
						 
					 
					
						
						
							
						
						9ba399dfa7 
					 
					
						
						
							
							server : add support for "encoding_format": "base64" to the */embeddings endpoints ( #10967 )  
						
						... 
						
						
						
						* add support for base64
* fix base64 test
* improve test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
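As a rough sketch of what a base64 `encoding_format` for embeddings can look like, the helper below encodes the raw bytes of a float vector with standard base64. The function names and the choice to encode the floats in host byte order are assumptions for illustration; the commit message does not specify the byte layout the server uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Standard base64 (RFC 4648 alphabet) with '=' padding.
inline std::string base64_encode(const unsigned char * data, std::size_t len) {
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((len + 2) / 3) * 4);
    for (std::size_t i = 0; i < len; i += 3) {
        // Pack up to three bytes into a 24-bit group, zero-filling past the end.
        const std::uint32_t b0 = data[i];
        const std::uint32_t b1 = (i + 1 < len) ? data[i + 1] : 0;
        const std::uint32_t b2 = (i + 2 < len) ? data[i + 2] : 0;
        const std::uint32_t triple = (b0 << 16) | (b1 << 8) | b2;

        out.push_back(table[(triple >> 18) & 0x3F]);
        out.push_back(table[(triple >> 12) & 0x3F]);
        out.push_back(i + 1 < len ? table[(triple >> 6) & 0x3F] : '=');
        out.push_back(i + 2 < len ? table[triple & 0x3F] : '=');
    }
    return out;
}

// Hypothetical helper: encodes an embedding as base64 over its raw float bytes
// (host byte order, which is an assumption of this sketch).
inline std::string embedding_to_base64(const std::vector<float> & embedding) {
    std::vector<unsigned char> bytes(embedding.size() * sizeof(float));
    if (!bytes.empty()) {
        std::memcpy(bytes.data(), embedding.data(), bytes.size());
    }
    return base64_encode(bytes.data(), bytes.size());
}
```

A client would reverse the process: base64-decode the string and reinterpret each group of four bytes as one 32-bit float.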
						
						
					 
					
						2024-12-24 21:33:04 +01:00 
						 
				 
			
				
					
						
							
							
								NeverLucky 
							
						 
					 
					
						
						
							
						
						09fe2e7613 
					 
					
						
						
							
							server:  allow filtering llama server response fields ( #10940 )  
						
						... 
						
						
						
						* llama_server_response_fields
* llama_server_response_fields_fix_issues
* params fixes
* fix
* clarify docs
* change to "response_fields"
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
						
						
					 
					
						2024-12-24 17:39:49 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						14b699ecde 
					 
					
						
						
							
							server : fix missing model id in /model endpoint ( #10957 )  
						
						... 
						
						
						
						* server : fix missing model id in /model endpoint
* fix ci 
						
						
					 
					
						2024-12-23 12:52:25 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						485dc01214 
					 
					
						
						
							
							server : add system_fingerprint to chat/completion ( #10917 )  
						
						... 
						
						
						
						* server : add system_fingerprint to chat/completion
* update README 
						
						
					 
					
						2024-12-23 12:02:44 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						57bb2c40cd 
					 
					
						
						
							
							server : fix logprobs, make it OAI-compatible ( #10783 )  
						
						... 
						
						
						
						* server : fix logprobs, make it openai-compatible
* update docs
* add std::log
* return pre-sampling p
* sort before apply softmax
* add comment
* fix test
* set p for sampled token
* update docs
* add --multi-token-probs
* update docs
* add `post_sampling_probs` option
* update docs [no ci]
* remove --multi-token-probs
* "top_probs" with "post_sampling_probs"
* resolve review comments
* rename struct token_prob to prob_info
* correct comment placement
* fix setting prob for sampled token 
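Several bullets above touch on turning logits into log-probabilities ("add std::log", "sort before apply softmax"). The following is a minimal sketch of that computation under stated assumptions, not the server's implementation: `token_logprob` and `top_logprobs` are hypothetical names, and the softmax is taken over the full logit vector.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: a token id with its natural-log probability.
struct token_logprob {
    int   id;
    float logprob;
};

// Computes log-softmax over `logits` and returns the `n_top` most likely tokens,
// most likely first.
inline std::vector<token_logprob> top_logprobs(const std::vector<float> & logits,
                                               std::size_t n_top) {
    std::vector<token_logprob> out;
    if (logits.empty()) {
        return out;
    }

    // Subtract the maximum logit before exponentiating for numerical stability.
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    double sum = 0.0;
    for (float l : logits) {
        sum += std::exp(static_cast<double>(l - max_logit));
    }
    const double log_sum = std::log(sum);

    out.reserve(logits.size());
    for (std::size_t i = 0; i < logits.size(); ++i) {
        const double lp = static_cast<double>(logits[i] - max_logit) - log_sum;
        out.push_back({static_cast<int>(i), static_cast<float>(lp)});
    }

    std::sort(out.begin(), out.end(),
        [](const token_logprob & a, const token_logprob & b) {
            return a.logprob > b.logprob;
        });
    if (out.size() > n_top) {
        out.resize(n_top);
    }
    return out;
}
```

Exponentiating the returned values recovers probabilities, which sum to 1 when `n_top` covers the whole vocabulary.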
						
						
					 
					
						2024-12-19 15:40:08 +01:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						152610eda9 
					 
					
						
						
							
							server : output embeddings for all tokens when pooling = none ( #10861 )  
						
						... 
						
						
						
						* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : update readme [no ci]
* server : fix spacing [no ci]
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* server : be explicit about the pooling type in the tests
ggml-ci
* server : update /embeddings and /v1/embeddings endpoints
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* server : update readme
ggml-ci
* server : fixes
* tests : update server tests
ggml-ci
* server : update readme [no ci]
* server : remove rebase artifact
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
						
						
					 
					
						2024-12-18 13:01:41 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						0e70ba686e 
					 
					
						
						
							
							server : add "tokens" output ( #10853 )  
						
						... 
						
						
						
						* server : add "tokens" output
ggml-ci
* server : update readme
ggml-ci
* server : return tokens ids only if requested
ggml-ci
* tests : improve "tokens" type check
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* server : remove "tokens" from the OAI endpoint
ggml-ci
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
						
						
					 
					
						2024-12-18 11:05:29 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						46828872c3 
					 
					
						
						
							
							server : (embeddings) using same format for "input" and "content" ( #10872 )  
						
						... 
						
						
						
						* server : (embeddings) using same format for "input" and "content"
* fix test case
* handle empty input case
* fix test 
						
						
					 
					
						2024-12-18 10:55:09 +02:00 
						 
				 
			
				
					
						
							
							
								krystiancha 
							
						 
					 
					
						
						
							
						
						05c3a444b8 
					 
					
						
						
							
							server : fill usage info in embeddings and rerank responses ( #10852 )  
						
						... 
						
						
						
						* server : fill usage info in embeddings response
* server : fill usage info in reranking response 
						
						
					 
					
						2024-12-17 18:00:24 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						644fd71b44 
					 
					
						
						
							
							sampling : refactor + optimize penalties sampler ( #10803 )  
						
						... 
						
						
						
						* sampling : refactor + optimize penalties sampler
ggml-ci
* common : apply ignore_eos as logit bias
ggml-ci
* batched : remove penalties sampler
* params : allow penalty_last_n == -1 to be equal to context size
ggml-ci
* common : by default, move the penalties at the end of the sampling chain
ggml-ci
* common : ignore all EOG tokens
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* common : move back the penalties at the front of the sampling chain
ggml-ci
* readme : restore hint about --ignore-eos flag [no ci]
* llama : minor
ggml-ci
* webui : update
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
						
						
					 
					
						2024-12-16 12:31:14 +02:00 
						 
				 
			
				
					
						
							
							
								Vinesh Janarthanan 
							
						 
					 
					
						
						
							
						
						5478bbcd17 
					 
					
						
						
							
							server: (UI) add syntax highlighting and latex math rendering ( #10808 )  
						
						... 
						
						
						
						* add code highlighting and math formatting
* code cleanup
* build public/index.html
* rebuild public/index.html
* fixed coding style
* fixed coding style
* style fixes
* highlight: smaller bundle size, fix light & dark theme
* remove katex
* add bundle size check
* add more languages
* add php
* reuse some langs
* use gzip
* Revert "remove katex"
This reverts commit c0e5046acc.
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
						
						
					 
					
						2024-12-15 12:55:54 +01:00 
						 
				 
			
				
					
						
							
							
								Michelle Tan 
							
						 
					 
					
						
						
							
						
						89d604f2c8 
					 
					
						
						
							
							server: Fix has_next_line in JSON response ( #10818 )  
						
						... 
						
						
						
						* Update server JSON response.
* Add unit test to check `has_new_line` JSON response
* Remove `has_new_line` unit test changes.
* Address code review comment: type check for `has_new_line` in unit test 
						
						
					 
					
						2024-12-14 23:29:45 +01:00 
						 
				 
			
				
					
						
							
							
								cduk 
							
						 
					 
					
						
						
							
						
						56eea0781c 
					 
					
						
						
							
							Removes spurious \r in output that causes logging in journalctl to treat lines as binary and therefore hidden by default ( #10771 )  
						
						... 
						
						
						
						Signed-off-by: Charles Darke <s.cduk@toodevious.com>
Co-authored-by: Charles Darke <s.cduk@toodevious.com>
						
						
					 
					
						2024-12-13 23:21:49 +01:00 
						 
				 
			
				
					
						
							
							
								Yüg 
							
						 
					 
					
						
						
							
						
						a86ad841f1 
					 
					
						
						
							
							server : add flag to disable the web-ui ( #10762 ) ( #10751 )  
						
						... 
						
						
						
						Co-authored-by: eugenio.segala <esegala@deloitte.co.uk>
						
						
					 
					
						2024-12-10 18:22:34 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						ce8784bdb1 
					 
					
						
						
							
							server : fix format_infill ( #10724 )  
						
						... 
						
						
						
						* server : fix format_infill
* fix
* rename
* update test
* use another model
* update test
* update test
* test_invalid_input_extra_req 
						
						
					 
					
						2024-12-08 23:04:29 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						e52522b869 
					 
					
						
						
							
							server : bring back info of final chunk in stream mode ( #10722 )  
						
						... 
						
						
						
						* server : bring back info to final chunk in stream mode
* clarify a bit
* trailing space
						
						
					 
					
						2024-12-08 20:38:51 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						3573fa8e7b 
					 
					
						
						
							
							server : (refactor) no more json in server_task input ( #10691 )  
						
						... 
						
						
						
						* server : (refactor) no more json in server_task input
* add test for slots endpoint
* add tests for /props and /slots
* remove task inf_type
* fix CI by adding safe_json_to_str
* add "model_path" to /props
* update readme 
						
						
					 
					
						2024-12-07 20:21:09 +01:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						ce4a7b8493 
					 
					
						
						
							
							server : various fixes ( #10704 )  
						
						... 
						
						
						
						* server : various fixes
ggml-ci
* server : show current seed in slot_params
ggml-ci
* fix /slots endpoint
* Update examples/server/server.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : reflect endpoint response changes in the readme
ggml-ci
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
						
						
					 
					
						2024-12-07 18:02:05 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						c2a16c0bdb 
					 
					
						
						
							
							server : fix free of spec context and batch ( #10651 )  
						
						... 
						
						
						
						ggml-ci 
						
						
					 
					
						2024-12-07 11:52:44 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						6c5bc0625f 
					 
					
						
						
							
							server : (refactoring) do not rely on JSON internally ( #10643 )  
						
						... 
						
						
						
						* server : (refactoring) reduce usage of json internally
* move all response types to struct
* wip [no ci]
* many fixes
* add virtual function
* fix index
* minor style fix
* add std::move
* refactor handle_completions_generic
* add virtual functions
* remove server.hpp
* clarify server_sent_event RFC specs
* apply review comments
* fix model_alias and completion_probabilities
* small clean up
* remove virtual for to_json_oai_compat()
* naming oai_compat --> oaicompat
* fix unwanted recursive call
* update docs 
						
						
					 
					
						2024-12-06 11:14:32 +01:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						1da7b76569 
					 
					
						
						
							
							server : fix speculative decoding with context shift ( #10641 )  
						
						... 
						
						
						
						* server : fix speculative decoding with context shift
ggml-ci
* server : take into account speculative limits
ggml-ci
* server : add tests 
						
						
					 
					
						2024-12-04 22:38:20 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						91c36c269b 
					 
					
						
						
							
							server : (web ui) Various improvements, now use vite as bundler ( #10599 )  
						
						... 
						
						
						
						* hide buttons in dropdown menu
* use npm as deps manager and vite as bundler
* fix build
* fix build (2)
* fix responsive on mobile
* fix more problems on mobile
* sync build
* (test) add CI step for verifying build
* fix ci
* force rebuild .hpp files
* cmake: clean up generated files pre build 
						
						
					 
					
						2024-12-03 19:38:44 +01:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						70b98fadbc 
					 
					
						
						
							
							server : fix default draft model parameters ( #10586 )  
						
						... 
						
						
						
						* server : force F16 KV cache for the draft model
ggml-ci
* server : fix draft params
ggml-ci
* server : various params fixes
ggml-ci 
						
						
					 
					
						2024-12-03 11:20:00 +02:00 
						 
				 
			
				
					
						
							
							
								haopeng 
							
						 
					 
					
						
						
							
						
						64ed2091b2 
					 
					
						
						
							
							server: Add "tokens per second" information in the backend ( #10548 )  
						
						... 
						
						
						
						* add cmake rvv support
* add timings
* remove space
* update readme
* fix
* fix code
* remove empty line
* add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
						
						
					 
					
						2024-12-02 14:45:54 +01:00 
						 
				 
			
				
					
						
							
							
								alek3y 
							
						 
					 
					
						
						
							
						
						86dc11c5bc 
					 
					
						
						
							
							server : bind to any port when specified ( #10590 )  
						
						
						
						
					 
					
						2024-12-01 13:33:12 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						84e1c33cde 
					 
					
						
						
							
							server : fix parallel speculative decoding ( #10513 )  
						
						... 
						
						
						
						ggml-ci 
						
						
					 
					
						2024-11-26 13:36:40 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						47f931c8f9 
					 
					
						
						
							
							server : enable cache_prompt by default ( #10501 )  
						
						... 
						
						
						
						ggml-ci 
						
						
					 
					
						2024-11-25 21:50:07 +02:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						10bce0450f 
					 
					
						
						
							
							llama : accept a list of devices to use to offload a model ( #10497 )  
						
						... 
						
						
						
						* llama : accept a list of devices to use to offload a model
* accept `--dev none` to completely disable offloading
* fix dev list with dl backends
* rename env parameter to LLAMA_ARG_DEVICE for consistency 
						
						
					 
					
						2024-11-25 19:30:06 +01:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						9ca2e67762 
					 
					
						
						
							
							server : add speculative decoding support ( #10455 )  
						
						... 
						
						
						
						* server : add speculative decoding support
ggml-ci
* server : add helper function slot.can_speculate()
ggml-ci 
						
						
					 
					
						2024-11-25 16:31:38 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						d9d54e498d 
					 
					
						
						
							
							speculative : refactor and add a simpler example ( #10362 )  
						
						... 
						
						
						
						* speculative : refactor and add a simpler example
ggml-ci
* speculative : clean-up and add comments and TODOs [no ci]
* speculative : manage context in common_speculative
ggml-ci
* speculative : simplify
ggml-ci
* speculative : simplify (cont)
ggml-ci
* speculative : add --draft-min CLI arg
* speculative : minor fixup
* make : build fixes
* speculative : do not redraft previous drafts
ggml-ci
* speculative : fix the draft sampling
ggml-ci
* speculative : fix compile warning
* common : refactor args
ggml-ci
* common : change defaults [no ci]
* common : final touches
ggml-ci 
						
						
					 
					
						2024-11-25 09:58:41 +02:00 
						 
				 
			
				
					
						
							
							
								MaggotHATE 
							
						 
					 
					
						
						
							
						
						bcdb7a2386 
					 
					
						
						
							
							server: (web UI) Add samplers sequence customization ( #10255 )  
						
						... 
						
						
						
						* Samplers sequence: simplified and input field.
* Removed unused function
* Modify and use `settings-modal-short-input`
* rename "name" --> "label"
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
						
						
					 
					
						2024-11-16 14:26:54 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						9901068ac7 
					 
					
						
						
							
							server : (web UI) add copy button for code block, fix api key ( #10242 )  
						
						... 
						
						
						
						* server : (web ui) add copy btn for code blocks
* fix problem with api key
* use settings-modal-short-input component
* always show copy btn for code snippet 
						
						
					 
					
						2024-11-15 10:48:49 +01:00 
						 
				 
			
				
					
						
							
							
								Jhen-Jie Hong 
							
						 
					 
					
						
						
							
						
						0e712a5acb 
					 
					
						
						
							
							server : fix incorrect res in validate_model_chat_template ( #10272 )  
						
						... 
						
						
						
						* server : fix validate_model_chat_template
* server : fix chat res 
						
						
					 
					
						2024-11-13 13:15:23 +02:00