commit add2a3aa5a
Author: Victor
Date:   2025-03-14 11:21:17 +01:00

    server: fix "--grammar-file" parameter (#12285)
commit be421fc429
Author: Olivier Chafik
Date:   2025-03-10 09:45:29 +00:00

    tool-call: ensure there's always a non-empty tool call id (#12292)
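The change in be421fc429 amounts to never emitting an empty `id` field for a tool call, since some client libraries reject it. A minimal sketch of the idea, assuming a hypothetical helper name and an arbitrary 9-character fallback (neither is the actual implementation):

```cpp
#include <cassert>
#include <random>
#include <string>

// Return the given id unchanged, or generate a random alphanumeric fallback
// when it is empty, so clients always receive a non-empty tool call id.
// (Sketch only: the function name, length, and alphabet are assumptions.)
std::string ensure_tool_call_id(const std::string & id) {
    if (!id.empty()) {
        return id;
    }
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<size_t> pick(0, sizeof(alphabet) - 2);
    std::string out;
    for (int i = 0; i < 9; i++) {
        out += alphabet[pick(rng)];
    }
    return out;
}
```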
commit 669912d9a5
Author: Olivier Chafik
Date:   2025-03-05 13:05:13 +00:00

    tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)

    * sampler: turn lazy grammar trigger words to regexes
    * add scripts/tool_bench.sh & .py
    * constrain llama json output regardless of function name if matches at beginning
    * update relaxed newline space rule in grammar tests
    * support add_generation_prompt query parameter (useful for /apply_template)
    * Update src/llama-grammar.cpp
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    ---------
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
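A lazy grammar stays dormant until the model's output matches a trigger; turning trigger words into regexes lets one pattern cover a family of spellings. A sketch of the triggering check, assuming patterns are anchored at the start of the generated text (names and anchoring policy are illustrative, not the actual sampler code):

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Lazy-grammar triggering, sketched: the grammar is only activated once the
// generated text matches one of the trigger patterns at its very beginning.
// (Illustrative only; pattern syntax and anchoring are assumptions.)
bool grammar_triggered(const std::string & text, const std::vector<std::string> & patterns) {
    for (const auto & pat : patterns) {
        // match_continuous requires the match to begin at the first character
        if (std::regex_search(text, std::regex(pat), std::regex_constants::match_continuous)) {
            return true;
        }
    }
    return false;
}
```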
commit 1a24c4621f
Author: Olivier Chafik
Date:   2025-03-04 08:24:07 +02:00

    server: fix deadly typo in response_format.json_schema.schema handling (#12168)
commit 401af80b54
Author: rhjdvsgsgks
Date:   2025-02-25 12:52:52 +01:00

    server: handle echo=false on /v1/completions (#12060)
commit 0b52745649
Author: Olivier Chafik
Date:   2025-02-25 10:40:22 +00:00

    server: support add_generation_prompt query param (#12062)
commit cf756d6e0a
Author: Georgi Gerganov
Date:   2025-02-22 11:46:31 +01:00

    server : disable Nagle's algorithm (#12020)
commit 63e489c025
Author: Olivier Chafik
Date:   2025-02-18 18:03:23 +00:00

    tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900)

    * tool-call refactoring: moved common_chat_* to chat.h, common_chat_templates_init return a unique_ptr to opaque type
    * addressed clang-tidy lints in [test-]chat.*
    * rm minja deps from util & common & move it to common/minja/
    * add name & tool_call_id to common_chat_msg
    * add common_chat_tool
    * added json <-> tools, msgs conversions to chat.h
    * fix double bos/eos jinja avoidance hack (was preventing inner bos/eos tokens)
    * fix deepseek r1 slow test (no longer <think> opening w/ new template)
    * allow empty tools w/ auto + grammar
    * fix & test server grammar & json_schema params w/ & w/o --jinja
commit 63ac128563
Author: Xuan-Son Nguyen
Date:   2025-02-18 14:21:41 +01:00

    server : add TEI API format for /rerank endpoint (#11942)

    * server : add TEI API format for /rerank endpoint
    * Apply suggestions from code review
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    * fix
    * also gitignore examples/server/*.gz.hpp
    ---------
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
commit 68ff663a04
Author: Georgi Gerganov
Date:   2025-02-15 16:40:57 +02:00

    repo : update links to new url (#11886)

    * repo : update links to new url
    ggml-ci
    * cont : more urls
    ggml-ci
commit c7f460ab88
Author: Olivier Chafik
Date:   2025-02-13 10:05:16 +00:00

    server: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command R7B & DeepSeek R1) unless --reasoning-format none (#11607)

    * extract & return thoughts in reasoning_content field (unless --reasoning-format) for DeepSeek R1 & Command R7B
    * tool-calls: add deepseek r1 template (models/templates/llama-cpp-deepseek-r1.jinja) + hackommodate broken official template
    * tool-calls: accommodate variety of wrong tool call opening tags both R1 Qwen 32B and 7B distills like to spit out
    * server/oai: ensure content is null when there are tool calls, and reasoning_content appears before content for readability
    * tool-calls: add DeepSeek R1 Qwen distills to server/README.md & server tests
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    ---------
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
commit 5598f475be
Author: Daniel Bevenius
Date:   2025-02-03 16:45:38 +01:00

    server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622)

    This commit removes the CPPHTTPLIB_NO_EXCEPTIONS define from the server
    code.
    The motivation for this is that when using a debug build the server
    would crash when an exception was thrown and terminate the server
    process, as it was unhandled. When CPPHTTPLIB_NO_EXCEPTIONS is set,
    cpp_httplib will not call the exception handler, which would normally
    return a 500 error to the client. This caused tests to fail when using
    a debug build.
    Fixes: https://github.com/ggerganov/llama.cpp/issues/11613
commit bfcce4d693
Author: Olivier Chafik
Date:   2025-02-02 09:25:38 +00:00

    tool-call: support Command R7B (+ return tool_plan "thoughts" in API) (#11585)

    * `tool-call`: support Command R7B (w/ tool_plan return)
    * `tool-call`: cleaner preservation of tokens + warn when likely bad chat template override
    * `tool-call`: test cleanup / handle lazy grammar triggers
commit a83f528688
Author: Olivier Chafik
Date:   2025-01-31 14:15:25 +00:00

    tool-call: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539)

    * An empty tool_call_id is better than none!
    * sync: minja (tool call name optional https://github.com/google/minja/pull/36)
    * Force-disable parallel_tool_calls if template doesn't support it
    * More debug logs
    * Llama 3.x tools: accept / trigger on more varied spaced outputs
    * Fix empty content for functionary v3.2 tool call
    * Add proper tool call docs to server README
    * readme: function calling *is* supported now
    * Apply suggestions from code review
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    ---------
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
commit b1bcd309fc
Author: Olivier Chafik
Date:   2025-01-31 13:48:31 +00:00

    fix stop regression (#11543)
commit 4a2b196d03
Author: Olivier Chafik
Date:   2025-01-31 10:12:40 +02:00

    server : fix --jinja when there's no tools or schema (typo was forcing JSON) (#11531)
commit 8b576b6c55
Author: Olivier Chafik
Date:   2025-01-30 19:13:58 +00:00

    Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)

    ---------
    Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
commit 6171c9d258
Author: Olivier Chafik
Date:   2025-01-21 13:18:51 +00:00

    Add Jinja template support (#11016)

    * Copy minja from 58f0ca6dd7 (https://github.com/google/minja/pull/22)
    * Apply suggestions from code review
    Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    * Finish suggested renamings
    * Move chat_templates inside server_context + remove mutex
    * Update --chat-template-file w/ recent change to --chat-template
    * Refactor chat template validation
    * Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
    * Warn against missing eos / bos tokens when jinja template references them
    * rename: common_chat_template[s]
    * reinstate assert on chat_templates.template_default
    * Update minja to b8437df626 (https://github.com/google/minja/pull/25)
    * Update minja from https://github.com/google/minja/pull/27
    * rm unused optional header
    ---------
    Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
commit afa8a9ec9b
Author: Georgi Gerganov
Date:   2025-01-12 11:32:42 +02:00

    llama : add llama_vocab, functions -> methods, naming (#11110)

    * llama : functions -> methods (#11110)
    * llama : add struct llama_vocab to the API (#11156)
    ggml-ci
    * hparams : move vocab params to llama_vocab (#11159)
    ggml-ci
    * vocab : more pimpl (#11165)
    ggml-ci
    * vocab : minor tokenization optimizations (#11160)
    ggml-ci
    Co-authored-by: Diego Devesa <slarengh@gmail.com>
    * lora : update API names (#11167)
    ggml-ci
    * llama : update API names to use correct prefix (#11174)
    ggml-ci
    * cont
    ggml-ci
    * cont
    ggml-ci
    * minor [no ci]
    * vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174)
    ggml-ci
    * vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174)
    ggml-ci
    ---------
    Co-authored-by: Diego Devesa <slarengh@gmail.com>
commit 727368c60f
Author: Georgi Gerganov
Date:   2025-01-06 10:52:15 +02:00

    llama : use LLAMA_TOKEN_NULL (#11062)

    ggml-ci
commit f66f582927
Author: Georgi Gerganov
Date:   2025-01-03 10:18:53 +02:00

    llama : refactor src/llama.cpp (#10902)

    * llama : scatter llama.cpp into multiple modules (wip)
    * llama : control-vector -> adapter
    * llama : arch
    * llama : mmap
    ggml-ci
    * ci : remove BUILD_SHARED_LIBS=OFF
    ggml-ci
    * llama : arch (cont)
    ggml-ci
    * llama : chat
    ggml-ci
    * llama : model
    ggml-ci
    * llama : hparams
    ggml-ci
    * llama : adapter
    ggml-ci
    * examples : fix
    ggml-ci
    * rebase
    ggml-ci
    * minor
    * llama : kv cache
    ggml-ci
    * llama : impl
    ggml-ci
    * llama : batch
    ggml-ci
    * cont
    ggml-ci
    * llama : context
    ggml-ci
    * minor
    * llama : context (cont)
    ggml-ci
    * llama : model loader
    ggml-ci
    * common : update lora
    ggml-ci
    * llama : quant
    ggml-ci
    * llama : quant (cont)
    ggml-ci
    * minor [no ci]
commit 0da5d86026
Author: Xuan Son Nguyen
Date:   2025-01-02 15:05:18 +01:00

    server : allow using LoRA adapters per-request (#10994)

    * slot.can_batch_with
    * lora per request
    * test: force disable cache prompt
    * move can_batch_with check
    * fix condition
    * add slow test with llama 8b
    * update docs
    * move lora change task to queue
    * Apply suggestions from code review
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    * lora_base
    * remove redundant check
    ---------
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
commit 45095a61bf
Author: Xuan Son Nguyen
Date:   2024-12-31 15:22:01 +01:00

    server : clean up built-in template detection (#11026)

    * server : clean up built-in template detection
    * fix compilation
    * add chat template test
    * fix condition
commit 5896c65232
Author: Xuan Son Nguyen
Date:   2024-12-31 12:34:13 +01:00

    server : add OAI compat for /v1/completions (#10974)

    * server : add OAI compat for /v1/completions
    * add test
    * add docs
    * better docs
commit 9ba399dfa7
Author: Reza Kakhki
Date:   2024-12-24 21:33:04 +01:00

    server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967)

    * add support for base64
    * fix base64 test
    * improve test
    ---------
    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
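With `"encoding_format": "base64"` (the OpenAI-compatible behavior), the embedding vector's raw float bytes are returned as a single base64 string instead of a JSON array of numbers, which is both smaller and faster to parse. A generic sketch of the encoding step (standard RFC 4648 base64, not the server's actual code):

```cpp
#include <cstdint>
#include <string>

// Standard base64 encoding of a byte buffer. For the embeddings endpoint,
// the input would be the raw little-endian bytes of the float vector.
// (Generic sketch of the technique, not llama.cpp server code.)
std::string base64_encode(const uint8_t * data, size_t len) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((len + 2) / 3 * 4);
    for (size_t i = 0; i < len; i += 3) {
        // pack up to 3 input bytes into a 24-bit group
        uint32_t v = uint32_t(data[i]) << 16;
        if (i + 1 < len) v |= uint32_t(data[i + 1]) << 8;
        if (i + 2 < len) v |= uint32_t(data[i + 2]);
        // emit 4 sextets, padding with '=' when input bytes run out
        out += tbl[(v >> 18) & 63];
        out += tbl[(v >> 12) & 63];
        out += (i + 1 < len) ? tbl[(v >> 6) & 63] : '=';
        out += (i + 2 < len) ? tbl[v & 63] : '=';
    }
    return out;
}
```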
commit 09fe2e7613
Author: NeverLucky
Date:   2024-12-24 17:39:49 +01:00

    server: allow filtering llama server response fields (#10940)

    * llama_server_response_fields
    * llama_server_response_fields_fix_issues
    * params fixes
    * fix
    * clarify docs
    * change to "response_fields"
    ---------
    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
commit 485dc01214
Author: Xuan Son Nguyen
Date:   2024-12-23 12:02:44 +01:00

    server : add system_fingerprint to chat/completion (#10917)

    * server : add system_fingerprint to chat/completion
    * update README
commit 57bb2c40cd
Author: Xuan Son Nguyen
Date:   2024-12-19 15:40:08 +01:00

    server : fix logprobs, make it OAI-compatible (#10783)

    * server : fix logprobs, make it openai-compatible
    * update docs
    * add std::log
    * return pre-sampling p
    * sort before apply softmax
    * add comment
    * fix test
    * set p for sampled token
    * update docs
    * add --multi-token-probs
    * update docs
    * add `post_sampling_probs` option
    * update docs [no ci]
    * remove --multi-token-probs
    * "top_probs" with "post_sampling_probs"
    * resolve review comments
    * rename struct token_prob to prob_info
    * correct comment placement
    * fix setting prob for sampled token
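The "sort before apply softmax" and "return pre-sampling p" bullets describe the core of an OAI-style logprobs response: for each position, take the logit distribution, sort, normalize, and report the top-k tokens with their probabilities. A sketch of that computation with a numerically stable softmax (function name and shapes are assumptions, not the server's actual code):

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Turn raw (token, logit) pairs into the top-k (token, probability) pairs:
// sort by logit descending, then apply a max-subtracted softmax for numerical
// stability. (Illustrative sketch; names and shapes are assumptions.)
std::vector<std::pair<int, double>> top_probs(std::vector<std::pair<int, float>> logits, size_t k) {
    if (logits.empty()) {
        return {};
    }
    std::sort(logits.begin(), logits.end(),
              [](const auto & a, const auto & b) { return a.second > b.second; });
    const float max_logit = logits.front().second; // subtract max for stability
    double sum = 0.0;
    for (const auto & [tok, logit] : logits) {
        sum += std::exp(double(logit - max_logit));
    }
    std::vector<std::pair<int, double>> out;
    for (size_t i = 0; i < k && i < logits.size(); i++) {
        out.emplace_back(logits[i].first,
                         std::exp(double(logits[i].second - max_logit)) / sum);
    }
    return out;
}
```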
commit 46828872c3
Author: Xuan Son Nguyen
Date:   2024-12-18 10:55:09 +02:00

    server : (embeddings) using same format for "input" and "content" (#10872)

    * server : (embeddings) using same format for "input" and "content"
    * fix test case
    * handle empty input case
    * fix test
commit 05c3a444b8
Author: krystiancha
Date:   2024-12-17 18:00:24 +02:00

    server : fill usage info in embeddings and rerank responses (#10852)

    * server : fill usage info in embeddings response
    * server : fill usage info in reranking response
commit 89d604f2c8
Author: Michelle Tan
Date:   2024-12-14 23:29:45 +01:00

    server: Fix has_next_line in JSON response (#10818)

    * Update server JSON response.
    * Add unit test to check `has_new_line` JSON response
    * Remove `has_new_line` unit test changes.
    * Address code review comment: type check for `has_new_line` in unit test
commit 484d2f31ae
Author: kallewoof
Date:   2024-12-11 14:48:04 +01:00

    bug-fix: snprintf prints NULL in place of the last character (#10419)

    * bug-fix: snprintf prints NULL in place of the last character
    We need to give snprintf enough space to print the last character and the null character, thus we allocate one extra byte and then ignore it when converting to std::string.
    * add comment about extra null-term byte requirement
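The bug follows from `snprintf` writing at most `size - 1` characters plus a trailing `'\0'`: a buffer sized exactly to the payload silently drops the last character. The fix described above, sketched generically (the helper name is illustrative):

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// snprintf(buf, n, ...) writes at most n-1 characters plus a trailing '\0',
// so a buffer sized exactly to the payload truncates the last character.
// Allocate one extra byte for the terminator and ignore it when building the
// std::string. (Generic sketch of the fix, not the actual code.)
std::string format_exact(const char * fmt, int value, size_t payload_len) {
    std::vector<char> buf(payload_len + 1); // +1 for the null terminator
    const int written = snprintf(buf.data(), buf.size(), fmt, value);
    assert(written >= 0 && size_t(written) == payload_len);
    return std::string(buf.data(), payload_len); // drop the terminator
}
```

Without the extra byte, `snprintf(buf, 4, "%d", 1234)` would yield `"123"` with the final digit replaced by the terminator.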
commit 3573fa8e7b
Author: Xuan Son Nguyen
Date:   2024-12-07 20:21:09 +01:00

    server : (refactor) no more json in server_task input (#10691)

    * server : (refactor) no more json in server_task input
    * add test for slots endpoint
    * add tests for /props and /slots
    * remove task inf_type
    * fix CI by adding safe_json_to_str
    * add "model_path" to /props
    * update readme
commit ce4a7b8493
Author: Georgi Gerganov
Date:   2024-12-07 18:02:05 +02:00

    server : various fixes (#10704)

    * server : various fixes
    ggml-ci
    * server : show current seed in slot_params
    ggml-ci
    * fix /slots endpoint
    * Update examples/server/server.cpp
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    * server : reflect endpoint response changes in the readme
    ggml-ci
    ---------
    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
    Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
commit 6c5bc0625f
Author: Xuan Son Nguyen
Date:   2024-12-06 11:14:32 +01:00

    server : (refactoring) do not rely on JSON internally (#10643)

    * server : (refactoring) reduce usage of json internally
    * move all response types to struct
    * wip [no ci]
    * many fixes
    * add virtual function
    * fix index
    * minor style fix
    * add std::move
    * refactor handle_completions_generic
    * add virtual functions
    * remove server.hpp
    * clarify server_sent_event RFC specs
    * apply review comments
    * fix model_alias and completion_probabilities
    * small clean up
    * remove virtual for to_json_oai_compat()
    * naming oai_compat --> oaicompat
    * fix unwanted recursive call
    * update docs
commit 64ed2091b2
Author: haopeng
Date:   2024-12-02 14:45:54 +01:00

    server: Add "tokens per second" information in the backend (#10548)

    * add cmake rvv support
    * add timings
    * remove space
    * update readme
    * fix
    * fix code
    * remove empty line
    * add test
    ---------
    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
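The reported figure reduces to a token count over elapsed wall time. A formula sketch (the function name is illustrative; the actual response field names are documented in the server README):

```cpp
#include <cmath>

// "Tokens per second" reduces to token count over elapsed wall time in
// milliseconds; guard against a zero-duration measurement.
// (Formula sketch only; names are assumptions.)
double tokens_per_second(int n_tokens, double t_ms) {
    return t_ms > 0.0 ? 1e3 * n_tokens / t_ms : 0.0;
}
```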
commit d9d54e498d
Author: Georgi Gerganov
Date:   2024-11-25 09:58:41 +02:00

    speculative : refactor and add a simpler example (#10362)

    * speculative : refactor and add a simpler example
    ggml-ci
    * speculative : clean-up and add comments and TODOs [no ci]
    * speculative : manage context in common_speculative
    ggml-ci
    * speculative : simplify
    ggml-ci
    * speculative : simplify (cont)
    ggml-ci
    * speculative : add --draft-min CLI arg
    * speculative : minor fixup
    * make : build fixes
    * speculative : do not redraft previous drafts
    ggml-ci
    * speculative : fix the draft sampling
    ggml-ci
    * speculative : fix compile warning
    * common : refactor args
    ggml-ci
    * common : change defaults [no ci]
    * common : final touches
    ggml-ci
commit 42cadc74bd
Author: sasha0552
Date:   2024-11-02 18:34:56 +02:00

    server : fix slot selection by lru (#10126)

    * server : fix slot selection by lru, migrate lcs to `size_t`
    * minor debug log fix
commit d865d1478c
Author: sasha0552
Date:   2024-11-01 14:33:14 +01:00

    server : fix smart selection of available slot (#10120)

    * Fix smart selection of available slot
    * minor fix
    * replace vectors of tokens with shorthands
commit 8d8ff71536
Author: Georgi Gerganov
Date:   2024-10-29 10:42:05 +02:00

    llama : remove Tail-Free sampling (#10071)

    ggml-ci
commit 8125e6cbfc
Author: Georgi Gerganov
Date:   2024-10-28 08:49:32 +02:00

    server : don't overfill the batch during infill (#10018)

    ggml-ci
commit 958367bf53
Author: Xuan Son Nguyen
Date:   2024-10-24 21:51:22 +02:00

    server : refactor slot input data, move tokenizer to HTTP thread (#10023)

    * server : refactor slot input data, move tokenizer to HTTP thread
    * move prompt_tokens.empty() check
    * fix incorrect if branch
    * fix infinite generation loop
    * bring back infill validation
    * add infill test
    * try fixing format_infill
    * fix test
    * remove redundant code
    * rename completion to inference
    * update docs
    * use llama_tokens everywhere
commit a89f75e1b7
Author: VoidIsVoid
Date:   2024-10-14 10:04:36 +03:00

    server : handle "logprobs" field with false value (#9871)

    Co-authored-by: Gimling <huangjl@ruyi.ai>
commit c7181bd294
Author: Georgi Gerganov
Date:   2024-10-13 18:52:48 +03:00

    server : reuse cached context chunks (#9866)

    ggml-ci
commit 7eee341bee
Author: Diego Devesa
Date:   2024-10-10 22:57:42 +02:00

    common : use common_ prefix for common library functions (#9805)

    * common : use common_ prefix for common library functions
    ---------
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
commit 458367a906
Author: Xuan Son Nguyen
Date:   2024-10-08 13:27:04 +02:00

    server : better security control for public deployments (#9776)

    * server : more explicit endpoint access settings
    * protect /props endpoint
    * fix tests
    * update server docs
    * fix typo
    * fix tests
commit f4d2b8846a
Author: Georgi Gerganov
Date:   2024-09-28 17:42:03 +03:00

    llama : add reranking support (#9510)

    * py : add XLMRobertaForSequenceClassification [no ci]
    * py : fix scalar-tensor conversion [no ci]
    * py : fix position embeddings chop [no ci]
    * llama : read new cls tensors [no ci]
    * llama : add classification head (wip) [no ci]
    * llama : add "rank" pooling type
    ggml-ci
    * server : add rerank endpoint
    ggml-ci
    * llama : avoid ggml_repeat during classification
    * rerank : cleanup + comments
    * server : accept /rerank endpoint in addition to /v1/rerank [no ci]
    * embedding : parse special tokens
    * jina : support v1 reranker
    * vocab : minor style
    ggml-ci
    * server : initiate tests for later
    ggml-ci
    * server : add docs
    * llama : add comment [no ci]
    * llama : fix uninitialized tensors
    * ci : add rerank tests
    ggml-ci
    * add reranking test
    * change test data
    * Update examples/server/server.cpp
    Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
    * add `--reranking` argument
    * update server docs
    * llama : fix comment [no ci]
    ggml-ci
    ---------
    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
    Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
commit 8a308354f6
Author: Vinesh Janarthanan
Date:   2024-09-18 09:50:34 +03:00

    server : match OAI structured output response (#9527)
commit 6262d13e0b
Author: Georgi Gerganov
Date:   2024-09-15 20:46:12 +03:00

    common : reimplement logging (#9418)

    https://github.com/ggerganov/llama.cpp/pull/9418
commit 78203641fe
Author: Mathijs Henquet
Date:   2024-09-12 22:30:11 +02:00

    server : Add option to return token pieces in /tokenize endpoint (#9108)

    * server : added with_pieces functionality to /tokenize endpoint
    * server : Add tokenize with pieces tests to server.feature
    * Handle case if tokenizer splits along utf8 continuation bytes
    * Add example of token splitting
    * Remove trailing ws
    * Fix trailing ws
    * Maybe fix ci
    * maybe this fix windows ci?
    ---------
    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>