---
base_model:
- {base_model}
---
# {model_name} GGUF

Recommended way to run this model:

```sh
llama-server -hf {namespace}/{model_name}-GGUF --embeddings
```

Once the server is running, the embeddings endpoint is available at
http://localhost:8080/embedding (8080 is the default `llama-server` port). For
example, using `curl`:

```sh
curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{{"input": "Hello embeddings"}}' \
    --silent
```
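
If you would rather call the endpoint from Python, the same request can be built with the standard library alone. This is a minimal sketch, assuming `llama-server` is listening on its default `http://localhost:8080`. The helper names `build_embedding_request` and `fetch_embedding`, and their optional `embd_normalize` argument (see the `embd_normalize` section), are illustrative rather than part of llama.cpp, and the structure of the returned JSON is not described here, so inspect it before relying on particular fields. For example, `fetch_embedding("Hello embeddings")` sends the same body as the `curl` command above.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port (8080).
EMBEDDING_URL = "http://localhost:8080/embedding"


def build_embedding_request(text, embd_normalize=None, url=EMBEDDING_URL):
    """Build a POST request with the same JSON body as the curl example."""
    payload = dict(input=text)
    if embd_normalize is not None:
        # Optional normalization mode, see the embd_normalize section
        payload["embd_normalize"] = embd_normalize
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), method="POST"
    )
    request.add_header("Content-Type", "application/json")
    return request


def fetch_embedding(text, embd_normalize=None, url=EMBEDDING_URL, timeout=30.0):
    """Send the request and return the decoded JSON response body."""
    request = build_embedding_request(text, embd_normalize, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```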

Alternatively, the `llama-embedding` command-line tool can be used:

```sh
llama-embedding -hf {namespace}/{model_name}-GGUF --verbose-prompt -p "Hello embeddings"
```

#### embd_normalize

When a model uses pooling, or a pooling method is specified with `--pooling`,
the normalization of the output embeddings can be controlled with the
`embd_normalize` parameter.

The default value is `2`, which means the embeddings are normalized using the
Euclidean (L2) norm. The available options are:

* `-1`: no normalization
* `0`: max absolute
* `1`: taxicab (L1)
* `2`: Euclidean (L2)
* `>2`: p-norm
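
With `llama-server`, `embd_normalize` is passed in the request body, for example:

```sh
curl --silent --request POST --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" --data '{{"input": "Hello embeddings", "embd_normalize": -1}}'
```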

With `llama-embedding`, use the `--embd-normalize <value>` option instead, for example:

```sh
llama-embedding -hf {namespace}/{model_name}-GGUF --embd-normalize -1 -p "Hello embeddings"
```
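
To make the `embd_normalize` options listed above concrete, the following minimal Python sketch shows how each value scales a vector. It uses the textbook definitions of the max-absolute, taxicab, Euclidean and general p-norms. The helper name `normalize_embedding` is hypothetical, and llama.cpp's own implementation may apply a different scale factor in places (notably for `0`), so treat this as illustrative rather than a reference.

```python
def normalize_embedding(values, embd_normalize=2):
    """Return a copy of `values` scaled according to an embd_normalize mode.

    Textbook definitions only; llama.cpp itself may scale differently,
    particularly for mode 0.
    """
    values = list(values)  # allow any iterable; it is traversed twice below
    if embd_normalize < -1:
        raise ValueError("embd_normalize must be -1 or greater")
    if embd_normalize == -1:
        # -1: no normalization, values are returned unchanged
        return values
    if embd_normalize == 0:
        # 0: divide by the largest absolute component ("max absolute")
        norm = max((abs(v) for v in values), default=0.0)
    else:
        # 1 = taxicab (L1), 2 = Euclidean (L2), >2 = p-norm with p = embd_normalize
        p = embd_normalize
        norm = sum(abs(v) ** p for v in values) ** (1.0 / p)
    if norm == 0:
        # An all-zero (or empty) vector cannot be scaled; avoid dividing by zero
        return [0.0 for _ in values]
    return [v / norm for v in values]
```

For example, `normalize_embedding([3.0, 4.0])` uses the default `2` and yields `[0.6, 0.8]`, a vector of Euclidean length 1, while passing `-1` returns the input unchanged.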