	server : update readme about token probs (#4777)
* updated server readme to reflect the gg/server-token-probs-4088 commit

  added explanation for the API's completion result which now includes `completion_probabilities`. Also added a JSON schema that shows the type/structure of `completion_probabilities`.

* simplified the `completion_probabilities` JSON schema

  It's now easier to understand what the structure of `completion_probabilities` looks like.

* minor : fix trailing whitespace

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
@@ -175,35 +175,44 @@ node index.js
 
     `system_prompt`: Change the system prompt (initial prompt of all slots), this is useful for chat applications. [See more](#change-system-prompt-on-runtime)
 
-    *Result JSON:*
+### Result JSON:
 
-    Note: When using streaming mode (`stream`) only `content` and `stop` will be returned until end of completion.
+* Note: When using streaming mode (`stream`) only `content` and `stop` will be returned until end of completion.
 
-    `content`: Completion result as a string (excluding `stopping_word` if any). In case of streaming mode, will contain the next token as a string.
-
-    `stop`: Boolean for use with `stream` to check whether the generation has stopped (Note: This is not related to stopping words array `stop` from input options)
+- `completion_probabilities`: An array of token probabilities for each completion. The array's length is `n_predict`. Each item in the array has the following structure:
 
-    `generation_settings`: The provided options above excluding `prompt` but including `n_ctx`, `model`
+```
+{
+  "content": "<the token selected by the model>",
+  "probs": [
+    {
+      "prob": float,
+      "tok_str": "<most likely token>"
+    },
+    {
+      "prob": float,
+      "tok_str": "<second most likely token>"
+    },
+    ...
+  ]
+},
+```
+Notice that each `probs` is an array of length `n_probs`.
 
-    `model`: The path to the model loaded with `-m`
-
-    `prompt`: The provided `prompt`
-
-    `stopped_eos`: Indicating whether the completion has stopped because it encountered the EOS token
-
-    `stopped_limit`: Indicating whether the completion stopped because `n_predict` tokens were generated before stop words or EOS was encountered
-
-    `stopped_word`: Indicating whether the completion stopped due to encountering a stopping word from `stop` JSON array provided
-
-    `stopping_word`: The stopping word encountered which stopped the generation (or "" if not stopped due to a stopping word)
-
-    `timings`: Hash of timing information about the completion such as the number of tokens `predicted_per_second`
-
-    `tokens_cached`: Number of tokens from the prompt which could be re-used from previous completion (`n_past`)
-
-    `tokens_evaluated`: Number of tokens evaluated in total from the prompt
-
-    `truncated`: Boolean indicating if the context size was exceeded during generation, i.e. the number of tokens provided in the prompt (`tokens_evaluated`) plus tokens generated (`tokens predicted`) exceeded the context size (`n_ctx`)
+- `content`: Completion result as a string (excluding `stopping_word` if any). In case of streaming mode, will contain the next token as a string.
+- `stop`: Boolean for use with `stream` to check whether the generation has stopped (Note: This is not related to stopping words array `stop` from input options)
+- `generation_settings`: The provided options above excluding `prompt` but including `n_ctx`, `model`
+- `model`: The path to the model loaded with `-m`
+- `prompt`: The provided `prompt`
+- `stopped_eos`: Indicating whether the completion has stopped because it encountered the EOS token
+- `stopped_limit`: Indicating whether the completion stopped because `n_predict` tokens were generated before stop words or EOS was encountered
+- `stopped_word`: Indicating whether the completion stopped due to encountering a stopping word from `stop` JSON array provided
+- `stopping_word`: The stopping word encountered which stopped the generation (or "" if not stopped due to a stopping word)
+- `timings`: Hash of timing information about the completion such as the number of tokens `predicted_per_second`
+- `tokens_cached`: Number of tokens from the prompt which could be re-used from previous completion (`n_past`)
+- `tokens_evaluated`: Number of tokens evaluated in total from the prompt
+- `truncated`: Boolean indicating if the context size was exceeded during generation, i.e. the number of tokens provided in the prompt (`tokens_evaluated`) plus tokens generated (`tokens predicted`) exceeded the context size (`n_ctx`)
 
 -   **POST** `/tokenize`: Tokenize a given text.
 
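The schema documented above is enough to write a small client against. Below is a minimal sketch, not part of the commit, that requests token probabilities and walks the resulting `completion_probabilities` array. It assumes the server is listening on its usual `http://localhost:8080` address, that the `/completion` endpoint accepts an `n_probs` option controlling how many candidates are reported per token, and that the response matches the structure shown above; adjust these details for your setup.

```
// Sketch: request a completion with token probabilities and print the
// top candidates for each generated token. Requires Node 18+ (global fetch)
// or a browser. Field names follow the schema documented in the diff above.

interface TokenProb {
  prob: number;       // probability assigned to this candidate token
  tok_str: string;    // candidate token as a string
}

interface CompletionToken {
  content: string;    // token actually selected by the model
  probs: TokenProb[]; // top candidates, length n_probs
}

async function main(): Promise<void> {
  const res = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      prompt: "Building a website can be done in 10 simple steps:",
      n_predict: 8, // number of tokens to generate
      n_probs: 3,   // candidates reported per token (assumed option name)
    }),
  });

  const data = await res.json();
  console.log("content:", data.content);

  // completion_probabilities has one entry per generated token.
  for (const tok of (data.completion_probabilities ?? []) as CompletionToken[]) {
    const candidates = tok.probs
      .map((p) => `${JSON.stringify(p.tok_str)}=${p.prob.toFixed(3)}`)
      .join(", ");
    console.log(`selected ${JSON.stringify(tok.content)} | top candidates: ${candidates}`);
  }
}

main().catch(console.error);
```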
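For the streaming note above (only `content` and `stop` arrive until the end of completion), here is a hedged sketch of a consumer. It assumes that with `stream` set the server emits server-sent-event style lines of the form `data: {...}`, which is not shown in this excerpt; if your build uses a different wire format, only the line parsing changes.

```
// Sketch: consume /completion in streaming mode, accumulating `content`
// from each message until one arrives with `stop` set to true.
async function streamCompletion(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, n_predict: 64, stream: true }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffered = "";
  let full = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });

    // Process complete lines; keep any partial line for the next chunk.
    const lines = buffered.split("\n");
    buffered = lines.pop() ?? "";
    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const msg = JSON.parse(line.slice("data: ".length));
      full += msg.content;       // only content/stop until the end
      if (msg.stop) return full; // final message carries the remaining fields
    }
  }
  return full;
}

streamCompletion("Building a website can be done in 10 simple steps:")
  .then((text) => console.log(text))
  .catch(console.error);
```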