commit 727368c60f
Author: Georgi Gerganov
Date:   2025-01-06 10:52:15 +02:00

    llama : use LLAMA_TOKEN_NULL (#11062)

    ggml-ci

commit f66f582927
Author: Georgi Gerganov
Date:   2025-01-03 10:18:53 +02:00

    llama : refactor src/llama.cpp (#10902)

    * llama : scatter llama.cpp into multiple modules (wip)
    * llama : control-vector -> adapter
    * llama : arch
    * llama : mmap
    * ci : remove BUILD_SHARED_LIBS=OFF
    * llama : arch (cont)
    * llama : chat
    * llama : model
    * llama : hparams
    * llama : adapter
    * examples : fix
    * rebase
    * minor
    * llama : kv cache
    * llama : impl
    * llama : batch
    * cont
    * llama : context
    * minor
    * llama : context (cont)
    * llama : model loader
    * common : update lora
    * llama : quant
    * llama : quant (cont)
    * minor [no ci]

commit 0da5d86026
Author: Xuan Son Nguyen
Date:   2025-01-02 15:05:18 +01:00

    server : allow using LoRA adapters per-request (#10994)

    * slot.can_batch_with
    * lora per request
    * test: force disable cache prompt
    * move can_batch_with check
    * fix condition
    * add slow test with llama 8b
    * update docs
    * move lora change task to queue
    * Apply suggestions from code review
    * lora_base
    * remove redundant check

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

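As a rough sketch of what this change enables: a request can now carry its own `lora` list of `{id, scale}` pairs instead of relying on the adapter scales configured at server startup. The field names follow the server README; the prompt and scale values below are purely illustrative.

```python
import json

# Hypothetical /completion request body with a per-request adapter scale:
# adapter 0 is applied at 80% strength for this request only.
payload = {
    "prompt": "Translate to French: Hello",
    "n_predict": 32,
    "lora": [{"id": 0, "scale": 0.8}],
}

# The body serializes to plain JSON; nothing adapter-related needs to be
# configured globally anymore.
body = json.dumps(payload)
print(json.loads(body)["lora"])
```

A request that omits the `lora` field keeps the server-wide adapter configuration, so existing clients are unaffected.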
commit 45095a61bf
Author: Xuan Son Nguyen
Date:   2024-12-31 15:22:01 +01:00

    server : clean up built-in template detection (#11026)

    * server : clean up built-in template detection
    * fix compilation
    * add chat template test
    * fix condition

commit 5896c65232
Author: Xuan Son Nguyen
Date:   2024-12-31 12:34:13 +01:00

    server : add OAI compat for /v1/completions (#10974)

    * server : add OAI compat for /v1/completions
    * add test
    * add docs
    * better docs

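The gist of OAI compatibility is translating OpenAI-style request fields into the server's native parameters. A minimal sketch of that mapping, assuming only the best-known rename (`max_tokens` to the native `n_predict`; the real translation layer covers many more keys):

```python
def oai_to_native(body: dict) -> dict:
    # Translate an OpenAI-style /v1/completions body into native
    # llama-server parameters. Only the max_tokens -> n_predict rename
    # is shown; treat this as an illustrative subset.
    out = dict(body)
    if "max_tokens" in out:
        out["n_predict"] = out.pop("max_tokens")
    return out

print(oai_to_native({"prompt": "The capital of France is", "max_tokens": 8}))
# → {'prompt': 'The capital of France is', 'n_predict': 8}
```

Clients written against the OpenAI completions API can thus point at a local server without changing their request shape.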
commit 9ba399dfa7
Author: Reza Kakhki
Date:   2024-12-24 21:33:04 +01:00

    server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967)

    * add support for base64
    * fix base64 test
    * improve test

    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

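When `"encoding_format": "base64"` is requested, the embedding comes back as a base64 string rather than a JSON float array — in OpenAI's convention, a packed little-endian float32 buffer. A self-contained round-trip sketch of decoding that format (the wire format is an assumption based on the OpenAI convention this endpoint mirrors):

```python
import base64
import struct

def decode_base64_embedding(b64: str) -> list[float]:
    # Decode a base64-encoded embedding: the payload is assumed to be a
    # packed array of little-endian float32 values.
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip check with a synthetic embedding (values chosen to be exactly
# representable in float32, so the round trip is lossless).
vec = [0.25, -1.5, 3.0]
b64 = base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode()
print(decode_base64_embedding(b64))  # → [0.25, -1.5, 3.0]
```

Base64 roughly halves the response size versus a decimal float array, which matters for large embedding batches.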
commit 09fe2e7613
Author: NeverLucky
Date:   2024-12-24 17:39:49 +01:00

    server : allow filtering llama server response fields (#10940)

    * llama_server_response_fields
    * llama_server_response_fields_fix_issues
    * params fixes
    * fix
    * clarify docs
    * change to "response_fields"
    ---------
    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

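Conceptually, `response_fields` lets a client ask for only the keys it needs. A minimal sketch of the server-side filtering, assuming top-level keys only (the field names in the sample response are illustrative):

```python
def filter_response(response: dict, fields: list[str]) -> dict:
    # Keep only the requested top-level keys of a completion response,
    # mirroring what "response_fields" does; nested/dotted paths omitted.
    return {k: response[k] for k in fields if k in response}

full = {"content": "Hello!", "tokens_predicted": 2, "model": "llama", "timings": {}}
print(filter_response(full, ["content", "tokens_predicted"]))
# → {'content': 'Hello!', 'tokens_predicted': 2}
```

Requesting a field that does not exist simply omits it, so clients can filter defensively without version-checking the server.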
commit 485dc01214
Author: Xuan Son Nguyen
Date:   2024-12-23 12:02:44 +01:00

    server : add system_fingerprint to chat/completion (#10917)

    * server : add system_fingerprint to chat/completion
    * update README

commit 57bb2c40cd
Author: Xuan Son Nguyen
Date:   2024-12-19 15:40:08 +01:00

    server : fix logprobs, make it OAI-compatible (#10783)

    * server : fix logprobs, make it openai-compatible
    * update docs
    * add std::log
    * return pre-sampling p
    * sort before apply softmax
    * add comment
    * fix test
    * set p for sampled token
    * update docs
    * add --multi-token-probs
    * update docs
    * add `post_sampling_probs` option
    * update docs [no ci]
    * remove --multi-token-probs
    * "top_probs" with "post_sampling_probs"
    * resolve review comments
    * rename struct token_prob to prob_info
    * correct comment placement
    * fix setting prob for sampled token

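The "sort before apply softmax" and `post_sampling_probs` items revolve around one core step: converting raw logits into normalized probabilities before reporting them. A simplified sketch of that step (a numerically stable softmax; the server's actual pipeline also sorts candidates and distinguishes pre- vs post-sampling values):

```python
import math

def softmax_probs(logits: list[float]) -> list[float]:
    # Numerically stable softmax: subtract the max logit before
    # exponentiating so large logits cannot overflow.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax_probs([2.0, 1.0, 0.0])
print([round(p, 3) for p in probs])  # probabilities sum to 1, in logit order
```

Reporting probabilities after this normalization is what makes the `logprobs` output comparable to what OpenAI-compatible clients expect.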
commit 46828872c3
Author: Xuan Son Nguyen
Date:   2024-12-18 10:55:09 +02:00

    server : (embeddings) using same format for "input" and "content" (#10872)

    * server : (embeddings) using same format for "input" and "content"
    * fix test case
    * handle empty input case
    * fix test

commit 05c3a444b8
Author: krystiancha
Date:   2024-12-17 18:00:24 +02:00

    server : fill usage info in embeddings and rerank responses (#10852)

    * server : fill usage info in embeddings response
    * server : fill usage info in reranking response

commit 89d604f2c8
Author: Michelle Tan
Date:   2024-12-14 23:29:45 +01:00

    server: Fix has_next_line in JSON response (#10818)

    * Update server JSON response.
    * Add unit test to check `has_new_line` JSON response
    * Remove `has_new_line` unit test changes.
    * Address code review comment: type check for `has_new_line` in unit test

commit 484d2f31ae
Author: kallewoof
Date:   2024-12-11 14:48:04 +01:00

    bug-fix: snprintf prints NULL in place of the last character (#10419)

    * bug-fix: snprintf prints NULL in place of the last character
      We need to give snprintf enough space to print the last character and
      the null character, thus we allocate one extra byte and then ignore it
      when converting to std::string.
    * add comment about extra null-term byte requirement

commit 3573fa8e7b
Author: Xuan Son Nguyen
Date:   2024-12-07 20:21:09 +01:00

    server : (refactor) no more json in server_task input (#10691)

    * server : (refactor) no more json in server_task input
    * add test for slots endpoint
    * add tests for /props and /slots
    * remove task inf_type
    * fix CI by adding safe_json_to_str
    * add "model_path" to /props
    * update readme

commit ce4a7b8493
Author: Georgi Gerganov
Date:   2024-12-07 18:02:05 +02:00

    server : various fixes (#10704)

    * server : various fixes
    * server : show current seed in slot_params
    * fix /slots endpoint
    * Update examples/server/server.cpp
    * server : reflect endpoint response changes in the readme

    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
    Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

commit 6c5bc0625f
Author: Xuan Son Nguyen
Date:   2024-12-06 11:14:32 +01:00

    server : (refactoring) do not rely on JSON internally (#10643)

    * server : (refactoring) reduce usage of json internally
    * move all response types to struct
    * wip [no ci]
    * many fixes
    * add virtual function
    * fix index
    * minor style fix
    * add std::move
    * refactor handle_completions_generic
    * add virtual functions
    * remove server.hpp
    * clarify server_sent_event RFC specs
    * apply review comments
    * fix model_alias and completion_probabilities
    * small clean up
    * remove virtual for to_json_oai_compat()
    * naming oai_compat --> oaicompat
    * fix unwanted recursive call
    * update docs

commit 64ed2091b2
Author: haopeng
Date:   2024-12-02 14:45:54 +01:00

    server: Add "tokens per second" information in the backend (#10548)

    * add cmake rvv support
    * add timings
    * remove space
    * update readme
    * fix
    * fix code
    * remove empty line
    * add test

    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

commit d9d54e498d
Author: Georgi Gerganov
Date:   2024-11-25 09:58:41 +02:00

    speculative : refactor and add a simpler example (#10362)

    * speculative : refactor and add a simpler example
    * speculative : clean-up and add comments and TODOs [no ci]
    * speculative : manage context in common_speculative
    * speculative : simplify
    * speculative : simplify (cont)
    * speculative : add --draft-min CLI arg
    * speculative : minor fixup
    * make : build fixes
    * speculative : do not redraft previous drafts
    * speculative : fix the draft sampling
    * speculative : fix compile warning
    * common : refactor args
    * common : change defaults [no ci]
    * common : final touches

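The core loop being refactored here is speculative decoding: a small draft model proposes several tokens, the target model verifies the whole batch in one pass, and the longest prefix the target agrees with is kept. A greedy-verification sketch (simplified; `common_speculative` in llama.cpp also handles probabilistic acceptance and draft management):

```python
def accept_draft(draft: list[int], target_choice: list[int]) -> list[int]:
    # Keep draft tokens only as long as they match what the target model
    # would itself have produced at each position; stop at the first
    # disagreement. All token ids here are synthetic examples.
    accepted = []
    for d, t in zip(draft, target_choice):
        if d != t:
            break
        accepted.append(d)
    return accepted

print(accept_draft([5, 9, 2, 7], [5, 9, 3, 7]))  # → [5, 9]
```

When the draft model is usually right, several tokens are committed per target-model forward pass, which is where the speedup comes from.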
commit 42cadc74bd
Author: sasha0552
Date:   2024-11-02 18:34:56 +02:00

    server : fix slot selection by lru (#10126)

    * server : fix slot selection by lru, migrate lcs to `size_t`
    * minor debug log fix

commit d865d1478c
Author: sasha0552
Date:   2024-11-01 14:33:14 +01:00

    server : fix smart selection of available slot (#10120)

    * Fix smart selection of available slot
    * minor fix
    * replace vectors of tokens with shorthands

commit 8d8ff71536
Author: Georgi Gerganov
Date:   2024-10-29 10:42:05 +02:00

    llama : remove Tail-Free sampling (#10071)

    ggml-ci

commit 8125e6cbfc
Author: Georgi Gerganov
Date:   2024-10-28 08:49:32 +02:00

    server : don't overfill the batch during infill (#10018)

    ggml-ci

commit 958367bf53
Author: Xuan Son Nguyen
Date:   2024-10-24 21:51:22 +02:00

    server : refactor slot input data, move tokenizer to HTTP thread (#10023)

    * server : refactor slot input data, move tokenizer to HTTP thread
    * move prompt_tokens.empty() check
    * fix incorrect if branch
    * fix infinite generation loop
    * bring back infill validation
    * add infill test
    * try fixing format_infill
    * fix test
    * remove redundant code
    * rename completion to inference
    * update docs
    * use llama_tokens everywhere

commit a89f75e1b7
Author: VoidIsVoid
Date:   2024-10-14 10:04:36 +03:00

    server : handle "logprobs" field with false value (#9871)

    Co-authored-by: Gimling <huangjl@ruyi.ai>

commit c7181bd294
Author: Georgi Gerganov
Date:   2024-10-13 18:52:48 +03:00

    server : reuse cached context chunks (#9866)

    ggml-ci

commit 7eee341bee
Author: Diego Devesa
Date:   2024-10-10 22:57:42 +02:00

    common : use common_ prefix for common library functions (#9805)

    * common : use common_ prefix for common library functions

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

commit 458367a906
Author: Xuan Son Nguyen
Date:   2024-10-08 13:27:04 +02:00

    server : better security control for public deployments (#9776)

    * server : more explicit endpoint access settings
    * protect /props endpoint
    * fix tests
    * update server docs
    * fix typo
    * fix tests

commit f4d2b8846a
Author: Georgi Gerganov
Date:   2024-09-28 17:42:03 +03:00

    llama : add reranking support (#9510)

    * py : add XLMRobertaForSequenceClassification [no ci]
    * py : fix scalar-tensor conversion [no ci]
    * py : fix position embeddings chop [no ci]
    * llama : read new cls tensors [no ci]
    * llama : add classification head (wip) [no ci]
    * llama : add "rank" pooling type
    * server : add rerank endpoint
    * llama : avoid ggml_repeat during classification
    * rerank : cleanup + comments
    * server : accept /rerank endpoint in addition to /v1/rerank [no ci]
    * embedding : parse special tokens
    * jina : support v1 reranker
    * vocab : minor style
    * server : initiate tests for later
    * server : add docs
    * llama : add comment [no ci]
    * llama : fix uninitialized tensors
    * ci : add rerank tests
    * add reranking test
    * change test data
    * Update examples/server/server.cpp
    * add `--reranking` argument
    * update server docs
    * llama : fix comment [no ci]

    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
    Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

commit 8a308354f6
Author: Vinesh Janarthanan
Date:   2024-09-18 09:50:34 +03:00

    server : match OAI structured output response (#9527)

commit 6262d13e0b
Author: Georgi Gerganov
Date:   2024-09-15 20:46:12 +03:00

    common : reimplement logging (#9418)

    https://github.com/ggerganov/llama.cpp/pull/9418

commit 78203641fe
Author: Mathijs Henquet
Date:   2024-09-12 22:30:11 +02:00

    server : Add option to return token pieces in /tokenize endpoint (#9108)

    * server : added with_pieces functionality to /tokenize endpoint
    * server : Add tokenize with pieces tests to server.feature
    * Handle case if tokenizer splits along utf8 continuation bytes
    * Add example of token splitting
    * Remove trailing ws
    * Fix trailing ws
    * Maybe fix ci
    * maybe this fix windows ci?

    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

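The "utf8 continuation bytes" item covers a real edge case: a tokenizer may split a multi-byte UTF-8 character across tokens, so a token's piece is not always a valid string. A sketch of the handling (the fallback representation as a list of byte values mirrors the server's behavior as described in its docs; treat the exact shape as an assumption):

```python
def piece_to_json(piece: bytes):
    # A piece that ends mid-way through a UTF-8 character cannot be
    # represented as a JSON string, so fall back to raw byte values.
    try:
        return piece.decode("utf-8")
    except UnicodeDecodeError:
        return list(piece)

print(piece_to_json(b"hello"))     # a normal piece decodes to a string
print(piece_to_json(b"\xe6\xad"))  # truncated UTF-8 char → [230, 173]
```

Clients consuming `with_pieces` output therefore have to accept either a string or a byte array per token.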
commit 6e7d133a5f
Author: Xuan Son Nguyen
Date:   2024-09-02 17:11:51 +02:00

    server : refactor multitask handling (#9274)

    * server : remove multitask from server_task
    * refactor completions handler
    * fix embeddings
    * use res_ok everywhere
    * small change for handle_slots_action
    * use unordered_set everywhere
    * (try) fix test
    * no more "mutable" lambda
    * Apply suggestions from code review
    * use deque

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

commit 978ba3d83d
Author: ardfork
Date:   2024-08-04 20:16:23 +02:00

    Server: Don't ignore llama.cpp params (#8754)

    * Don't ignore llama.cpp params
    * Add fallback for max_tokens

commit 4e24cffd8c
Author: Georgi Gerganov
Date:   2024-07-12 14:48:15 +03:00

    server : handle content array in chat API (#8449)

    * server : handle content array in chat API
    * Update examples/server/utils.hpp

    Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

commit 48e6b92cc3
Author: Xuan Son Nguyen
Date:   2024-06-25 21:56:49 +10:00

    Add chat template support for llama-cli (#8068)

    * add chat template support for llama-cli
    * add help message
    * server: simplify format_chat
    * more consistent naming
    * improve
    * add llama_chat_format_example
    * fix server
    * code style
    * code style
    * Update examples/main/main.cpp

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

commit 7a16ce7db2
Author: sasha0552
Date:   2024-06-08 10:50:31 +03:00

    server : smart slot selection using Longest Common Prefix (#7728)

    * server : Smart selection of available slot using Longest Common Substring
    * add usage
    * remove trailing whitespaces
    * Use Longest Common Prefix (LCP) instead of LCS
    * Rename argument

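The selection rule is simple to state: among the available slots, reuse the one whose cached tokens share the longest common prefix with the incoming prompt, so the largest chunk of KV cache can be reused. A minimal sketch (tie-breaking and the LRU interaction from the later fixes are omitted; all token ids are synthetic):

```python
def common_prefix_len(a: list[int], b: list[int]) -> int:
    # Number of leading tokens two sequences share.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def pick_slot(cached: list[list[int]], prompt: list[int]) -> int:
    # Choose the slot maximizing KV-cache reuse for the new prompt.
    return max(range(len(cached)), key=lambda i: common_prefix_len(cached[i], prompt))

slots = [[1, 2, 3, 4], [1, 2, 9], [7, 8]]
print(pick_slot(slots, [1, 2, 3, 5]))  # → 0 (shares prefix [1, 2, 3])
```

A prefix (rather than a general common substring) is what matters here, because the KV cache is only valid for tokens that match from position zero onward.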
commit 1442677f92
Author: Georgi Gerganov
Date:   2024-06-04 21:23:39 +03:00

    common : refactor cli arg parsing (#7675)

    * common : gpt_params_parse do not print usage
    * common : rework usage print (wip)
    * common : valign
    * common : rework print_usage
    * infill : remove cfg support
    * common : reorder args
    * server : deduplicate parameters
    * common : add missing header
    * common : remove --random-prompt usages
    * examples : migrate to gpt_params
    * batched-bench : migrate to gpt_params
    * retrieval : migrate to gpt_params
    * common : change defaults for escape and n_ctx
    * common : remove chatml and instruct params
    * common : passkey use gpt_params

commit e586ee4259
Author: Benjamin Findley
Date:   2024-05-13 12:40:08 +10:00

    change default temperature of OAI compat API from 0 to 1 (#7226)

    * change default temperature of OAI compat API from 0 to 1
    * make tests explicitly send temperature to OAI API

commit c12452c7ae
Author: Johannes Gäßler
Date:   2024-05-08 21:53:08 +02:00

    JSON: [key] -> .at(key), assert() -> GGML_ASSERT (#7143)

commit 1fd9c1741d
Author: Xuan Son Nguyen
Date:   2024-05-08 13:24:14 +02:00

    clean up json_value & server_log (#7142)

commit b97bc3966e
Author: Pedro Cuenca
Date:   2024-04-21 14:50:41 +03:00

    llama : support Llama 3 HF conversion (#6745)

    * Support Llama 3 conversion
      The tokenizer is BPE.
    * style
    * Accept suggestion
    * llama : add llama_token_is_eog()
    * llama : auto-detect more EOT tokens when missing in KV data
    * convert : replacing EOS token is a hack
    * llama : fix codegemma EOT token + add TODOs
    * llama : fix model type string for 8B model

    Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

commit 75cd4c7729
Author: Pierrick Hymbert
Date:   2024-04-06 05:40:47 +02:00

    ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495)

    * ci: bench: support sse and fix prompt processing time
      server: add tokens usage in stream mode
    * ci: bench: README.md EOL
    * ci: bench: remove total pp and tg as it is not accurate
    * ci: bench: fix case when there is no token generated
    * ci: bench: change to the 95 percentile for pp and tg as it is closer to what the server exports in metrics
    * ci: bench: fix finish reason rate

commit 60cdf40cc3
Author: JH23X
Date:   2024-04-03 21:09:52 +03:00

    server : handle exception on wrong type in request (#6452)

    Co-authored-by: Jonas Holzner <jonas.holzner.external@hensoldt.net>

commit ad3a0505e3
Author: Xuan Son Nguyen
Date:   2024-03-25 09:42:17 +01:00

    Server: clean up OAI params parsing function (#6284)

    * server: clean up oai parsing function
    * fix response_format
    * fix empty response_format
    * minor fixes
    * add TODO for logprobs
    * update docs

commit 1b26aebe4d
Author: Pierrick Hymbert
Date:   2024-03-23 13:18:45 +01:00

    server: flush stdout after logging in both text and json layout (#6253)

commit 72114edf06
Author: Olivier Chafik
Date:   2024-03-22 15:07:44 +02:00

    json-schema-to-grammar : fix order of props + non-str const/enum (#6232)

    * json: ordered json in server/schema converter to respect orig order
    * json: ws nits
    * json: support non-string const / enums

commit 5b7b0ac8df
Author: Olivier Chafik
Date:   2024-03-21 11:50:43 +00:00

    json-schema-to-grammar improvements (+ added to server) (#5978)

    * json: fix arrays (disallow `[,1]`)
    * json: support tuple types (`[number, string]`)
    * json: support additionalProperties (`{[k: string]: [string,number][]}`)
    * json: support required / optional properties
    * json: add support for pattern
    * json: resolve $ref (and support https schema urls)
    * json: fix $ref resolution
    * json: support union types (mostly for nullable types I think)
    * json: support allOf + nested anyOf
    * json: support any (`{}` or `{type: object}`)
    * json: fix merge
    * json: temp fix for escapes
    * json: spaces in output and unrestricted output spaces
    * json: add typings
    * json: fix typo
    * Create ts-type-to-grammar.sh
    * json: fix _format_literal (json.dumps already escapes quotes)
    * json: merge lit sequences and handle negatives
      {"type": "string", "pattern": "^({\"question\": \"[^\"]+\", \"response\": \"[^\"]+\"}\\n)+$"}
    * json: handle pattern repetitions
    * Update json-schema-to-grammar.mjs
    * Create regex-to-grammar.py
    * json: extract repeated regexp patterns to subrule
    * Update json-schema-to-grammar.py
    * Update json-schema-to-grammar.py
    * Update json-schema-to-grammar.py
    * json: handle schema from pydantic Optional fields
    * Update json-schema-to-grammar.py
    * Update json-schema-to-grammar.py
    * Update ts-type-to-grammar.sh
    * Update ts-type-to-grammar.sh
    * json: simplify nullable fields handling
    * json: accept duplicate identical rules
    * json: revert space to 1 at most
    * json: reuse regexp pattern subrules
    * json: handle uuid string format
    * json: fix literal escapes
    * json: add --allow-fetch
    * json: simplify range escapes
    * json: support negative ranges in patterns
    * Delete commit.txt
    * json: custom regex parser, adds dot support & JS-portable
    * json: rm trailing spaces
    * Update json-schema-to-grammar.mjs
    * json: updated server & chat `( cd examples/server && ./deps.sh )`
    * json: port fixes from mjs to python
    * Update ts-type-to-grammar.sh
    * json: support prefixItems alongside array items
    * json: add date format + fix uuid
    * json: add date, time, date-time formats
    * json: preserve order of props from TS defs
    * json: port schema converter to C++, wire in ./server
    * json: nits
    * Update json-schema-to-grammar.cpp
    * Update json-schema-to-grammar.cpp
    * Update json-schema-to-grammar.cpp
    * json: fix mjs implementation + align outputs
    * Update json-schema-to-grammar.mjs.hpp
    * json: test C++, JS & Python versions
    * json: nits + regen deps
    * json: cleanup test
    * json: revert from c++17 to 11
    * json: nit fixes
    * json: dirty include for test
    * json: fix zig build
    * json: pass static command to std::system in tests (fixed temp files)
    * json: fix top-level $refs
    * json: don't use c++20 designated initializers
    * nit
    * json: basic support for reserved names `{number:{number:{root:number}}}`
    * Revamp test cmake to allow args (WORKING_DIRECTORY needed for JSON test)
    * json: re-ran server deps.sh
    * json: simplify test
    * json: support mix of additional props & required/optional
    * json: add tests for some expected failures
    * json: fix type=const in c++, add failure expectations for non-str const&enum
    * json: test (& simplify output of) empty schema
    * json: check parsing in test + fix value & string refs
    * json: add server tests for OAI JSON response_format
    * json: test/fix top-level anyOf
    * json: improve grammar parsing failures
    * json: test/fix additional props corner cases
    * json: fix string patterns (was missing quotes)
    * json: ws nit
    * json: fix json handling in server when there's no response_format
    * json: catch schema conversion errors in server
    * json: don't complain about unknown format type in server if unset
    * json: cleaner build of test
    * json: create examples/json-schema-pydantic-example.py
    * json: fix date pattern
    * json: move json.hpp & json-schema-to-grammar.{cpp,h} to common
    * json: indent 4 spaces
    * json: fix naming of top-level c++ function (+ drop unused one)
    * json: avoid using namespace std
    * json: fix zig build
    * Update server.feature
    * json: iostream -> fprintf
    * json: space before & refs for consistency
    * json: nits

commit 47cc7a7bf9
Author: Karthick
Date:   2024-03-20 12:02:34 +01:00

    Server: Handle n_keep parameter in the request (#6174)

commit 99b71c068f
Author: Xuan Son Nguyen
Date:   2024-03-13 11:39:11 +01:00

    Server: Use multi-task for embeddings endpoint (#6001)

    * use multitask for embd endpoint
    * specify types
    * remove redundant {"n_predict", 0}

commit caa106d4e0
Author: Xuan Son Nguyen
Date:   2024-03-11 10:56:41 +01:00

    Server: format error to json (#5961)

    * server: format error to json
    * server: do not crash on grammar error
    * fix api key test case
    * revert limit max n_predict
    * small fix
    * correct coding style
    * update completion.js
    * launch_slot_with_task
    * update docs
    * update_slots
    * update webui
    * update readme