	Vulkan k-quant mmq and ggml-backend offload functionality (#6155)
* Fix Vulkan no-kv-offload incoherence
* Add k-quant mul mat mat shaders
* Rework working buffer allocation, reduces VRAM use noticeably; clean up CPU assist code, replaced with ggml-backend offload function
* Default to all dedicated GPUs
* Add fallback to integrated GPUs if no dedicated GPUs are found
* Add debug info showing which device is allocating memory
* Fix Intel dequant issue; fix validation issue
* Fix Vulkan GGML_OP_GET_ROWS implementation
* Clean up merge artifacts
* Remove Vulkan warning
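The device-selection change described above amounts to preferring every discrete GPU and using integrated GPUs only when no discrete one exists. A minimal sketch of that policy using the standard Vulkan API; `select_devices` is an illustrative name, not a function from the actual ggml_vulkan code:

```cpp
// Hypothetical sketch of "default to all dedicated GPUs, fall back to
// integrated if none are found", built on standard Vulkan calls.
#include <vulkan/vulkan.h>
#include <vector>

static std::vector<VkPhysicalDevice> select_devices(VkInstance instance) {
    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);
    std::vector<VkPhysicalDevice> all(count);
    vkEnumeratePhysicalDevices(instance, &count, all.data());

    std::vector<VkPhysicalDevice> discrete;
    std::vector<VkPhysicalDevice> integrated;
    for (VkPhysicalDevice dev : all) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(dev, &props);
        if (props.deviceType == VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU) {
            discrete.push_back(dev);   // dedicated GPU: preferred
        } else if (props.deviceType == VK_PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU) {
            integrated.push_back(dev); // integrated GPU: fallback only
        }
    }
    // Use every dedicated GPU by default; integrated GPUs only if none exist.
    return discrete.empty() ? integrated : discrete;
}
```

Preferring discrete devices first is the natural ordering here, since dedicated VRAM bandwidth dominates mul-mat throughput; the integrated fallback keeps the backend usable on machines with no dedicated GPU.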
```diff
@@ -636,15 +636,6 @@ Building the program with BLAS support may lead to some performance improvements
 
 - #### Vulkan
 
-> [!WARNING]
->
-> Vulkan support has been broken in https://github.com/ggerganov/llama.cpp/pull/6122
-> due to relying on `GGML_OP_GET_ROWS` which is not yet properly supported by the Vulkan backend,
-> but should be fixed relatively soon (possibly in https://github.com/ggerganov/llama.cpp/pull/6155
-> (ref: https://github.com/ggerganov/llama.cpp/pull/6122#issuecomment-2015327635)).
->
-> Meanwhile, if you want to use the Vulkan backend, you should use the commit right before the breaking change, https://github.com/ggerganov/llama.cpp/commit/55c1b2a3bbd470e9e2a3a0618b92cf64a885f806
-
   **With docker**:
 
   You don't need to install Vulkan SDK. It will be installed inside the container.
```
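For context on the warning this diff removes: `GGML_OP_GET_ROWS` is ggml's row-gather op, used e.g. for token-embedding lookup, and the Vulkan backend's missing support for it is what the breakage referred to. A conceptual, self-contained sketch of the op's semantics; this is not ggml's actual implementation or API:

```cpp
// Gather the rows of src selected by ids into dst: dst[i] = src[ids[i]].
#include <cstdint>
#include <cstring>

static void get_rows(const float * src, int64_t n_cols,
                     const int32_t * ids, int64_t n_ids,
                     float * dst) {
    for (int64_t i = 0; i < n_ids; ++i) {
        // dst row i is a copy of src row ids[i]
        std::memcpy(dst + i * n_cols,
                    src + (int64_t) ids[i] * n_cols,
                    n_cols * sizeof(float));
    }
}
```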
Author: 0cc4m