docs : Quantum -> Quantized (#8666)

* docfix: imatrix readme, quantum models -> quantized models.
* docfix: server readme: quantum models -> quantized models.
examples/imatrix/README.md:

@@ -1,6 +1,6 @@
 # llama.cpp/examples/imatrix
 
-Compute an importance matrix for a model and given text dataset. Can be used during quantization to enchance the quality of the quantum models.
+Compute an importance matrix for a model and given text dataset. Can be used during quantization to enchance the quality of the quantized models.
 More information is available here: https://github.com/ggerganov/llama.cpp/pull/4861
 
 ## Usage
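For context on the workflow this README describes: the importance matrix is computed from a calibration text file and then passed to the quantizer. A minimal sketch of that flow, assuming placeholder model and dataset file names (exact flags can vary between llama.cpp versions):

```sh
# Compute an importance matrix from a calibration dataset
# (ggml-model-f16.gguf and calibration-data.txt are placeholders).
./llama-imatrix -m ggml-model-f16.gguf -f calibration-data.txt -o imatrix.dat

# Feed the matrix to the quantizer to improve the quality
# of the resulting quantized model.
./llama-quantize --imatrix imatrix.dat ggml-model-f16.gguf ggml-model-q4_k_m.gguf q4_k_m
```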
examples/server/README.md:

@@ -5,7 +5,7 @@ Fast, lightweight, pure C/C++ HTTP server based on [httplib](https://github.com/
 Set of LLM REST APIs and a simple web front end to interact with llama.cpp.
 
 **Features:**
-* LLM inference of F16 and quantum models on GPU and CPU
+* LLM inference of F16 and quantized models on GPU and CPU
 * [OpenAI API](https://github.com/openai/openai-openapi) compatible chat completions and embeddings routes
 * Parallel decoding with multi-user support
 * Continuous batching
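As an illustration of the OpenAI-compatible route mentioned in the feature list, a sketch of starting the server and querying its chat completions endpoint (the model file name and port are placeholder assumptions):

```sh
# Start the server on a quantized model (file name is a placeholder).
./llama-server -m ggml-model-q4_k_m.gguf --port 8080

# Query the OpenAI-compatible chat completions route.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```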
Author: Ujjawal Panchal