llama.cpp commit 5b7b0ac8df (mirror of https://github.com/ggml-org/llama.cpp.git)

			* json: fix arrays (disallow `[,1]`)
* json: support tuple types (`[number, string]`)
* json: support additionalProperties (`{[k: string]: [string,number][]}`)
* json: support required / optional properties
* json: add support for pattern
* json: resolve $ref (and support https schema urls)
* json: fix $ref resolution
* json: support union types (mostly for nullable types, I think)
* json: support allOf + nested anyOf
* json: support any (`{}` or `{type: object}`)
* json: fix merge
* json: temp fix for escapes
* json: spaces in output and unrestricted output spaces
* json: add typings
* json: fix typo
* Create ts-type-to-grammar.sh
* json: fix _format_literal (json.dumps already escapes quotes)
* json: merge lit sequences and handle negatives, e.g.
  `{"type": "string", "pattern": "^({\"question\": \"[^\"]+\", \"response\": \"[^\"]+\"}\\n)+$"}`
* json: handle pattern repetitions
* Update json-schema-to-grammar.mjs
* Create regex-to-grammar.py
* json: extract repeated regexp patterns to subrule
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* json: handle schema from pydantic Optional fields
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* Update ts-type-to-grammar.sh
* Update ts-type-to-grammar.sh
* json: simplify nullable fields handling
* json: accept duplicate identical rules
* json: revert space to 1 at most
* json: reuse regexp pattern subrules
* json: handle uuid string format
* json: fix literal escapes
* json: add --allow-fetch
* json: simplify range escapes
* json: support negative ranges in patterns
* Delete commit.txt
* json: custom regex parser, adds dot support & JS-portable
* json: rm trailing spaces
* Update json-schema-to-grammar.mjs
* json: updated server & chat `( cd examples/server && ./deps.sh )`
* json: port fixes from mjs to python
* Update ts-type-to-grammar.sh
* json: support prefixItems alongside array items
* json: add date format + fix uuid
* json: add date, time, date-time formats
* json: preserve order of props from TS defs
* json: port schema converter to C++, wire in ./server
* json: nits
* Update json-schema-to-grammar.cpp
* Update json-schema-to-grammar.cpp
* Update json-schema-to-grammar.cpp
* json: fix mjs implementation + align outputs
* Update json-schema-to-grammar.mjs.hpp
* json: test C++, JS & Python versions
* json: nits + regen deps
* json: cleanup test
* json: revert from c++17 to 11
* json: nit fixes
* json: dirty include for test
* json: fix zig build
* json: pass static command to std::system in tests (fixed temp files)
* json: fix top-level $refs
* json: don't use c++20 designated initializers
* nit
* json: basic support for reserved names `{number:{number:{root:number}}}`
* Revamp test cmake to allow args (WORKING_DIRECTORY needed for JSON test)
* json: re-ran server deps.sh
* json: simplify test
* json: support mix of additional props & required/optional
* json: add tests for some expected failures
* json: fix type=const in c++, add failure expectations for non-str const&enum
* json: test (& simplify output of) empty schema
* json: check parsing in test + fix value & string refs
* json: add server tests for OAI JSON response_format
* json: test/fix top-level anyOf
* json: improve grammar parsing failures
* json: test/fix additional props corner cases
* json: fix string patterns (was missing quotes)
* json: ws nit
* json: fix json handling in server when there's no response_format
* json: catch schema conversion errors in server
* json: don't complain about unknown format type in server if unset
* json: cleaner build of test
* json: create examples/json-schema-pydantic-example.py
* json: fix date pattern
* json: move json.hpp & json-schema-to-grammar.{cpp,h} to common
* json: indent 4 spaces
* json: fix naming of top-level c++ function (+ drop unused one)
* json: avoid using namespace std
* json: fix zig build
* Update server.feature
* json: iostream -> fprintf
* json: space before & refs for consistency
* json: nits
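Taken together, these changes let a client constrain server output with a plain JSON schema: the schema is converted to a GBNF grammar server-side and applied during sampling. Below is a minimal sketch of that workflow driven from a pydantic model, in the spirit of `examples/json-schema-pydantic-example.py`; the server URL and model alias mirror the test setup in server.feature below, while the field names, prompt, and exact `response_format` shape are illustrative assumptions to verify against the server docs.

```python
# Sketch: constrain llama.cpp server output with a JSON schema derived from a
# pydantic model, via the OpenAI-compatible chat completions endpoint.
# Assumes a server is already running on localhost:8080 (as in the Background
# of server.feature below); model fields and the prompt are illustrative only.
from typing import Optional

import requests  # any HTTP client works
from pydantic import BaseModel


class QAPair(BaseModel):
    question: str
    response: str
    confidence: Optional[float] = None  # Optional fields map to nullable/union schema types


schema = QAPair.model_json_schema()  # pydantic v2; on v1 use QAPair.schema()

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "tinyllama-2",
        "messages": [{"role": "user", "content": "Ask and answer one trivia question."}],
        # The server converts this schema to a GBNF grammar, so the completion
        # is forced to be valid JSON matching the schema.
        "response_format": {"type": "json_object", "schema": schema},
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["message"]["content"])  # OpenAI-style response shape
```

The `Optional` field exercises the union/nullable and `$ref` handling added above; the bundled `examples/json-schema-pydantic-example.py` is the fuller version of this idea.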
server.feature (Gherkin, 101 lines, 4.7 KiB)

@llama.cpp
@server
Feature: llama.cpp server

  Background: Server startup
    Given a server listening on localhost:8080
    And   a model url https://huggingface.co/ggml-org/models/resolve/main/tinyllamas/stories260K.gguf
    And   a model file stories260K.gguf
    And   a model alias tinyllama-2
    And   42 as server seed
      # KV Cache corresponds to the total amount of tokens
      # that can be stored across all independent sequences: #4130
      # see --ctx-size and #5568
    And   256 KV cache size
    And   32 as batch size
    And   2 slots
    And   64 server max tokens to predict
    And   prometheus compatible metrics exposed
    Then  the server is starting
    Then  the server is healthy

  Scenario: Health
    Then the server is ready
    And  all slots are idle


  Scenario Outline: Completion
    Given a prompt <prompt>
    And   <n_predict> max tokens to predict
    And   a completion request with no api error
    Then  <n_predicted> tokens are predicted matching <re_content>
    And   the completion is <truncated> truncated
    And   <n_prompt> prompt tokens are processed
    And   prometheus metrics are exposed
    And   metric llamacpp:tokens_predicted is <n_predicted>

    Examples: Prompts
      | prompt                                                                    | n_predict | re_content                                  | n_prompt | n_predicted | truncated |
      | I believe the meaning of life is                                          | 8         | (read\|going)+                              | 18       | 8           | not       |
      | Write a joke about AI from a very long prompt which will not be truncated | 256       | (princesses\|everyone\|kids\|Anna\|forest)+ | 46       | 64          | not       |

  Scenario: Completion prompt truncated
    Given a prompt:
    """
    Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
    Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
    Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
    Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
    """
    And   a completion request with no api error
    Then  64 tokens are predicted matching fun|Annaks|popcorns|pictry|bowl
    And   the completion is  truncated
    And   109 prompt tokens are processed


  Scenario Outline: OAI Compatibility
    Given a model <model>
    And   a system prompt <system_prompt>
    And   a user prompt <user_prompt>
    And   <max_tokens> max tokens to predict
    And   streaming is <enable_streaming>
    Given an OAI compatible chat completions request with no api error
    Then  <n_predicted> tokens are predicted matching <re_content>
    And   <n_prompt> prompt tokens are processed
    And   the completion is <truncated> truncated

    Examples: Prompts
      | model        | system_prompt               | user_prompt                          | max_tokens | re_content                        | n_prompt | n_predicted | enable_streaming | truncated |
      | llama-2      | Book                        | What is the best book                | 8          | (Here\|what)+                     | 77       | 8           | disabled         | not       |
      | codellama70b | You are a coding assistant. | Write the fibonacci function in c++. | 128        | (thanks\|happy\|bird\|Annabyear)+ | -1       | 64          | enabled          |           |


  Scenario Outline: OAI Compatibility w/ response format
    Given a model test
    And   a system prompt test
    And   a user prompt test
    And   a response format <response_format>
    And   10 max tokens to predict
    Given an OAI compatible chat completions request with no api error
    Then  <n_predicted> tokens are predicted matching <re_content>

    Examples: Prompts
      | response_format                                                     | n_predicted | re_content             |
      | {"type": "json_object", "schema": {"const": "42"}}                  | 5           | "42"                   |
      | {"type": "json_object", "schema": {"items": [{"type": "integer"}]}} | 10          | \[ -300 \]             |
      | {"type": "json_object"}                                             | 10          | \{ " Jacky.            |


  Scenario: Tokenize / Detokenize
    When tokenizing:
    """
    What is the capital of France ?
    """
    Then tokens can be detokenize

  Scenario: Models available
    Given available models
    Then  1 models are supported
    Then  model 0 is identified by tinyllama-2
    Then  model 0 is trained on 128 tokens context
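For completeness, here is a rough standalone equivalent of the Health and Tokenize / Detokenize scenarios above, written against the server's HTTP API rather than the behave step definitions. The `/health`, `/tokenize` and `/detokenize` routes and their JSON fields reflect my reading of the llama.cpp server API and should be treated as assumptions to verify against your build.

```python
# Sketch of the Health and Tokenize / Detokenize scenarios as a plain script.
# Assumes a llama.cpp server already running on localhost:8080, matching the
# Background: Server startup section of the feature file.
import requests

BASE = "http://localhost:8080"

# "Then the server is healthy"
assert requests.get(f"{BASE}/health").status_code == 200

# "When tokenizing: What is the capital of France ?"
text = "What is the capital of France ?"
tokens = requests.post(f"{BASE}/tokenize", json={"content": text}).json()["tokens"]

# "Then tokens can be detokenize": round-trip the ids back to text
detok = requests.post(f"{BASE}/detokenize", json={"tokens": tokens}).json()["content"]
assert detok.strip() == text.strip()
print(f"{len(tokens)} tokens round-tripped OK")
```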