* server: tests: init scenarios
  - health and slots endpoints
  - completion endpoint
  - OAI compatible chat completion requests with and without streaming
  - completion multi users scenario
  - multi users scenario on OAI compatible endpoint with streaming
  - multi users with total number of tokens to predict exceeding the KV cache size
  - server wrong usage scenario, like in Infinite loop of "context shift" #3969
  - slots shifting
  - continuous batching
  - embeddings endpoint
  - multi users embedding endpoint: Segmentation fault #5655
  - OpenAI-compatible embeddings API
  - tokenize endpoint
  - CORS and api key scenario
* server: CI GitHub workflow

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
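The api key scenarios in the feature file below reduce to plain HTTP calls against the server. Here is a minimal sketch of what a passing and a failing request might look like, assuming the server is launched as in the Background (stories260K.gguf on localhost:8080 with --api-key llama.cpp), that the usual llama.cpp server routes /health and /completion are used, and that a wrong or missing key is rejected with HTTP 401; none of these endpoint or header details are spelled out in the feature file itself:

import requests

BASE = "http://localhost:8080"

# "the server is healthy": the suite polls until the server reports it is ready.
print(requests.get(f"{BASE}/health").status_code)  # 200 once the model is loaded

# "Completion with some user api key", api_error = no: correct key, request succeeds.
ok = requests.post(
    f"{BASE}/completion",
    headers={"Authorization": "Bearer llama.cpp"},
    json={"prompt": "test", "n_predict": 4},
)
print(ok.status_code)  # expected 200

# api_error = raised: wrong (or missing) key, the server rejects the request.
denied = requests.post(
    f"{BASE}/completion",
    headers={"Authorization": "Bearer hackeme"},
    json={"prompt": "test", "n_predict": 4},
)
print(denied.status_code)  # expected 401 (assumption)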
@llama.cpp
Feature: Security

  Background: Server startup with an api key defined
    Given a server listening on localhost:8080
    And   a model file stories260K.gguf
    And   a server api key llama.cpp
    Then  the server is starting
    Then  the server is healthy

  Scenario Outline: Completion with some user api key
    Given a prompt test
    And   a user api key <api_key>
    And   4 max tokens to predict
    And   a completion request with <api_error> api error

    Examples: Prompts
      | api_key   | api_error |
      | llama.cpp | no        |
      | llama.cpp | no        |
      | hackeme   | raised    |
      |           | raised    |

  Scenario Outline: OAI Compatibility
    Given a system prompt test
    And   a user prompt test
    And   a model test
    And   2 max tokens to predict
    And   streaming is disabled
    And   a user api key <api_key>
    Given an OAI compatible chat completions request with <api_error> api error

    Examples: Prompts
      | api_key   | api_error |
      | llama.cpp | no        |
      | llama.cpp | no        |
      | hackme    | raised    |

  Scenario Outline: CORS Options
    When an OPTIONS request is sent from <origin>
    Then CORS header <cors_header> is set to <cors_header_value>

    Examples: Headers
      | origin          | cors_header                      | cors_header_value |
      | localhost       | Access-Control-Allow-Origin      | localhost         |
      | web.mydomain.fr | Access-Control-Allow-Origin      | web.mydomain.fr   |
      | origin          | Access-Control-Allow-Credentials | true              |
      | web.mydomain.fr | Access-Control-Allow-Methods     | POST              |
      | web.mydomain.fr | Access-Control-Allow-Headers     | *                 |
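Similarly, the CORS Options table can be reproduced by hand with a preflight request. A rough sketch, assuming the same server as in the Background and that the CORS headers are returned regardless of the route (the exact route hit by the test step is not specified in the feature file):

import requests

resp = requests.options(
    "http://localhost:8080/completion",  # assumed route; the preflight response should not depend on it
    headers={
        "Origin": "web.mydomain.fr",
        "Access-Control-Request-Method": "POST",  # typical preflight header; the test step may omit it
    },
)
# Expected per the Examples: Headers table above:
print(resp.headers.get("Access-Control-Allow-Origin"))       # web.mydomain.fr (origin echoed back)
print(resp.headers.get("Access-Control-Allow-Credentials"))  # true
print(resp.headers.get("Access-Control-Allow-Methods"))      # POST
print(resp.headers.get("Access-Control-Allow-Headers"))      # *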