mirror of https://github.com/ggml-org/llama.cpp.git
	Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183)
* k_cache: be able to use Q5_0
* k_cache: be able to use Q5_1 on CUDA
* k_cache: be able to use Q5_0 on Metal
* k_cache: be able to use Q5_1 on Metal
* k_cache: be able to use IQ4_NL - just CUDA for now
* k_cache: be able to use IQ4_NL on Metal
* k_cache: add newly added supported types to llama-bench and CUDA supports_op

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
@@ -1590,6 +1590,9 @@ static ggml_type kv_cache_type_from_str(const std::string & s) {
     if (s == "q4_1") {
         return GGML_TYPE_Q4_1;
     }
+    if (s == "iq4_nl") {
+        return GGML_TYPE_IQ4_NL;
+    }
     if (s == "q5_0") {
         return GGML_TYPE_Q5_0;
     }
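For illustration only, a minimal sketch of how a caller could request one of these K-cache types through the public llama.h API rather than the command line. It assumes the existing llama_context_params::type_k field and the GGML_TYPE_IQ4_NL enum value from ggml.h; the model path and the surrounding boilerplate are placeholders, not part of this commit.

// Sketch only: select a quantized K cache type via llama_context_params.
// Assumes the public llama.h API (type_k field); "model.gguf" is a placeholder path.
#include "llama.h"

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file("model.gguf", mparams);
    if (model == NULL) {
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    cparams.type_k = GGML_TYPE_IQ4_NL; // K cache stored in one of the newly enabled types
    // cparams.type_v can be set the same way where the backend supports it

    llama_context * ctx = llama_new_context_with_model(model, cparams);
    // ... evaluate tokens as usual; the K cache is quantized to IQ4_NL ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}

On the command line, the same type names ("q5_0", "q5_1", "iq4_nl") are the strings that the kv_cache_type_from_str parser shown above accepts, e.g. via the -ctk (K-cache type) option in the CLI tools and in llama-bench, per the commit message.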