mirror of https://github.com/ggml-org/llama.cpp.git
synced 2025-10-30 08:42:00 +00:00

Commit 5d6688de08
This commit updates the modelcard.template file used in the model conversion scripts for embedding models to include the llama-server --embeddings flag in the recommended command to run the model. The motivation for this change is that when the model-conversion tool was used to upload the EmbeddingGemma models to Hugging Face, this flag was missing, so the embedding endpoint was not available when copy-pasting the command.
49 lines · 1.3 KiB · Plaintext
---
base_model:
- {base_model}
---
# {model_name} GGUF

Recommended way to run this model:

```sh
llama-server -hf {namespace}/{model_name}-GGUF --embeddings
```

Then the endpoint can be accessed at http://localhost:8080/embedding, for
example using `curl`:

```console
curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{{"input": "Hello embeddings"}}' \
    --silent
```

Alternatively, the `llama-embedding` command line tool can be used:

```sh
llama-embedding -hf {namespace}/{model_name}-GGUF --verbose-prompt -p "Hello embeddings"
```

#### embd_normalize
When a model uses pooling, or the pooling method is specified using `--pooling`,
the normalization can be controlled by the `embd_normalize` parameter.

The default value is `2`, which means that the embeddings are normalized using
the Euclidean norm (L2). Other options are:
* -1 No normalization
*  0 Max absolute
*  1 Taxicab
*  2 Euclidean/L2
* \>2 P-Norm

This can be passed in the request body to `llama-server`, for example:

```sh
    --data '{{"input": "Hello embeddings", "embd_normalize": -1}}' \
```

And for `llama-embedding`, by passing `--embd-normalize <value>`, for example:

```sh
llama-embedding -hf {namespace}/{model_name}-GGUF --embd-normalize -1 -p "Hello embeddings"
```
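To make the `embd_normalize` options in the template concrete, here is a small Python sketch of what each mode computes. This is an illustration of the norm definitions only, not llama.cpp's actual implementation; the function name `normalize` is hypothetical.

```python
import math

def normalize(vec, embd_normalize=2):
    """Illustrative sketch of the embd_normalize modes described above:
    -1: no normalization, 0: max absolute, 1: taxicab (L1),
     2: Euclidean (L2, the default), >2: p-norm."""
    if embd_normalize < 0:
        return list(vec)  # -1: return the embedding unchanged
    if embd_normalize == 0:
        norm = max(abs(x) for x in vec)            # max absolute value
    elif embd_normalize == 1:
        norm = sum(abs(x) for x in vec)            # taxicab / L1
    elif embd_normalize == 2:
        norm = math.sqrt(sum(x * x for x in vec))  # Euclidean / L2
    else:
        p = embd_normalize                         # p-norm for p > 2
        norm = sum(abs(x) ** p for x in vec) ** (1.0 / p)
    return [x / norm for x in vec] if norm > 0 else list(vec)

v = [3.0, 4.0]
print(normalize(v))      # L2 norm is 5 -> [0.6, 0.8]
print(normalize(v, 1))   # L1 norm is 7 -> [3/7, 4/7]
print(normalize(v, -1))  # unchanged
```

After L2 normalization the vector has unit Euclidean length, which is why `2` is a convenient default for cosine-similarity comparisons between embeddings.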