Olivier Chafik
cde3833239
tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616)

* tool-call: allow `--jinja --chat-template chatml`
* fix double bos issue (drop bos/eos tokens from jinja template)
* add missing try catch around jinja parsing to default to chatml
* Simplify default chatml logic

2025-02-03 23:49:27 +00:00

Olivier Chafik
4a2b196d03
server : fix --jinja when there's no tools or schema (typo was forcing JSON) (#11531)

2025-01-31 10:12:40 +02:00

Olivier Chafik
8b576b6c55
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)

---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

2025-01-30 19:13:58 +00:00

Nigel Bosch
eb7cf15a80
server : add /apply-template endpoint for additional use cases of Minja functionality (#11489)

* add /apply-template endpoint to server
* remove unnecessary line
* add /apply-template documentation
* return only "prompt" field in /apply-template
* use suggested idea instead of my overly verbose way

2025-01-29 19:45:44 +01:00

Olivier Chafik
6171c9d258
Add Jinja template support (#11016)

* Copy minja from 58f0ca6dd7 (https://github.com/google/minja/pull/22)
* Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos / bos tokens when jinja template references them
* rename: common_chat_template[s]
* reinstate assert on chat_templates.template_default
* Update minja to b8437df626 (https://github.com/google/minja/pull/25)
* Update minja from https://github.com/google/minja/pull/27
* rm unused optional header
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2025-01-21 13:18:51 +00:00

Xuan Son Nguyen
45095a61bf
server : clean up built-in template detection (#11026)

* server : clean up built-in template detection
* fix compilation
* add chat template test
* fix condition

2024-12-31 15:22:01 +01:00

Xuan Son Nguyen
5896c65232
server : add OAI compat for /v1/completions (#10974)

* server : add OAI compat for /v1/completions
* add test
* add docs
* better docs

2024-12-31 12:34:13 +01:00

Xuan Son Nguyen
485dc01214
server : add system_fingerprint to chat/completion (#10917)

* server : add system_fingerprint to chat/completion
* update README

2024-12-23 12:02:44 +01:00

Xuan Son Nguyen
57bb2c40cd
server : fix logprobs, make it OAI-compatible (#10783)

* server : fix logprobs, make it openai-compatible
* update docs
* add std::log
* return pre-sampling p
* sort before apply softmax
* add comment
* fix test
* set p for sampled token
* update docs
* add --multi-token-probs
* update docs
* add `post_sampling_probs` option
* update docs [no ci]
* remove --multi-token-probs
* "top_probs" with "post_sampling_probs"
* resolve review comments
* rename struct token_prob to prob_info
* correct comment placement
* fix setting prob for sampled token

2024-12-19 15:40:08 +01:00

Xuan Son Nguyen
3573fa8e7b
server : (refactor) no more json in server_task input (#10691)

* server : (refactor) no more json in server_task input
* add test for slots endpoint
* add tests for /props and /slots
* remove task inf_type
* fix CI by adding safe_json_to_str
* add "model_path" to /props
* update readme

2024-12-07 20:21:09 +01:00

Xuan Son Nguyen
6c5bc0625f
server : (refactoring) do not rely on JSON internally (#10643)

* server : (refactoring) reduce usage of json internally
* move all response types to struct
* wip [no ci]
* many fixes
* add virtual function
* fix index
* minor style fix
* add std::move
* refactor handle_completions_generic
* add virtual functions
* remove server.hpp
* clarify server_sent_event RFC specs
* apply review comments
* fix model_alias and completion_probabilities
* small clean up
* remove virtual for to_json_oai_compat()
* naming oai_compat --> oaicompat
* fix unwanted recursive call
* update docs

2024-12-06 11:14:32 +01:00

haopeng
64ed2091b2
server: Add "tokens per second" information in the backend (#10548)

* add cmake rvv support
* add timings
* remove space
* update readme
* fix
* fix code
* remove empty line
* add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

2024-12-02 14:45:54 +01:00

Xuan Son Nguyen
b782e5c7d4
server : add more test cases (#10569)

* server : add split model test
* add test speculative
* add invalid cases

2024-11-29 21:48:56 +01:00

Xuan Son Nguyen
45abe0f74e
server : replace behave with pytest (#10416)

* server : replace behave with pytest
* fix test on windows
* misc
* add more tests
* more tests
* styling
* log less, fix embd test
* added all sequential tests
* fix coding style
* fix save slot test
* add parallel completion test
* fix parallel test
* remove feature files
* update test docs
* no cache_prompt for some tests
* add test_cache_vs_nocache_prompt

2024-11-26 16:20:18 +01:00