Olivier Chafik 
							
						 
					 
					
						
						
							
						
						6171c9d258 
					 
					
						
						
							
							Add Jinja template support (#11016)
						
						
						
						
* Copy minja from 58f0ca6dd7 (https://github.com/google/minja/pull/22)
* Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos / bos tokens when jinja template references them
* rename: common_chat_template[s]
* reinstate assert on chat_templates.template_default
* Update minja to b8437df626 https://github.com/google/minja/pull/25
* Update minja from https://github.com/google/minja/pull/27 
* rm unused optional header
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
						
						
					 
					
						2025-01-21 13:18:51 +00:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						92bc493917 
					 
					
						
						
							
							tests : increase timeout when sanitizers are enabled (#11300)
						
						
						
						
						* tests : increase timeout when sanitizers are enabled
* tests : add DEFAULT_HTTP_TIMEOUT 
						
						
					 
					
						2025-01-19 20:22:30 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						f30f099228 
					 
					
						
						
							
							server : implement cancellable request (#11285)
						
						
						
						
						* server : implement cancellable request
* fix typo
* httplib 0.18.5
* fix i underflow 
						
						
					 
					
						2025-01-18 14:12:05 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						0da5d86026 
					 
					
						
						
							
							server : allow using LoRA adapters per-request (#10994)
						
						
						
						
						* slot.can_batch_with
* lora per request
* test: force disable cache prompt
* move can_batch_with check
* fix condition
* add slow test with llama 8b
* update docs
* move lora change task to queue
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* lora_base
* remove redundant check
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
						
						
					 
					
						2025-01-02 15:05:18 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						45095a61bf 
					 
					
						
						
							
							server : clean up built-in template detection (#11026)
						
						
						
						
						* server : clean up built-in template detection
* fix compilation
* add chat template test
* fix condition 
						
						
					 
					
						2024-12-31 15:22:01 +01:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						152610eda9 
					 
					
						
						
							
							server : output embeddings for all tokens when pooling = none (#10861)
						
						
						
						
						* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : update readme [no ci]
* server : fix spacing [no ci]
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* server : be explicit about the pooling type in the tests
ggml-ci
* server : update /embeddings and /v1/embeddings endpoints
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* server : update readme
ggml-ci
* server : fixes
* tests : update server tests
ggml-ci
* server : update readme [no ci]
* server : remove rebase artifact
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
						
						
					 
					
						2024-12-18 13:01:41 +02:00 
						 
				 
			
				
					
						
							
							
								Yüg 
							
						 
					 
					
						
						
							
						
						a86ad841f1 
					 
					
						
						
							
							server : add flag to disable the web-ui (#10762) (#10751)
						
						
						
						
						Co-authored-by: eugenio.segala <esegala@deloitte.co.uk>
						
						
					 
					
						2024-12-10 18:22:34 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						ce8784bdb1 
					 
					
						
						
							
							server : fix format_infill (#10724)
						
						
						
						
						* server : fix format_infill
* fix
* rename
* update test
* use another model
* update test
* update test
* test_invalid_input_extra_req 
						
						
					 
					
						2024-12-08 23:04:29 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						3573fa8e7b 
					 
					
						
						
							
							server : (refactor) no more json in server_task input (#10691)
						
						
						
						
						* server : (refactor) no more json in server_task input
* add test for slots endpoint
* add tests for /props and /slots
* remove task inf_type
* fix CI by adding safe_json_to_str
* add "model_path" to /props
* update readme 
						
						
					 
					
						2024-12-07 20:21:09 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						b782e5c7d4 
					 
					
						
						
							
							server : add more test cases (#10569)
						
						
						
						
						* server : add split model test
* add test speculative
* add invalid cases 
						
						
					 
					
						2024-11-29 21:48:56 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						6c59567689 
					 
					
						
						
							
							server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (#10568)
						
						
						
						
						* server : (tests) don't use thread for capturing stdout/stderr
* test: bump openai to 1.55.2
* bump openai to 1.55.3 
						
						
					 
					
						2024-11-28 19:17:49 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						9f912511bc 
					 
					
						
						
							
							common : fix duplicated file name with hf_repo and hf_file (#10550)
						
						
						
						
					 
					
						2024-11-27 22:30:52 +01:00 
						 
				 
			
				
					
						
							
							
								Xuan Son Nguyen 
							
						 
					 
					
						
						
							
						
						45abe0f74e 
					 
					
						
						
							
							server : replace behave with pytest (#10416)
						
						
						
						
						* server : replace behave with pytest
* fix test on windows
* misc
* add more tests
* more tests
* styling
* log less, fix embd test
* added all sequential tests
* fix coding style
* fix save slot test
* add parallel completion test
* fix parallel test
* remove feature files
* update test docs
* no cache_prompt for some tests
* add test_cache_vs_nocache_prompt 
						
						
					 
					
						2024-11-26 16:20:18 +01:00