llama.cpp/ggml-vulkan-shaders.hpp at 65c64dc36f9bca5b3f100614cdd02bf12d6b3e49

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-30 08:42:00 +00:00


0cc4m ba0c7c70ab Vulkan k-quant mmq and ggml-backend offload functionality (#6155)

* Fix Vulkan no kv offload incoherence

* Add k-quant mul mat mat shaders

* Rework working buffer allocation, reduces vram use noticeably

Clean up cpu assist code, replaced with ggml-backend offload function

* Default to all dedicated GPUs

* Add fallback for integrated GPUs if no dedicated GPUs are found

* Add debug info which device is allocating memory

* Fix Intel dequant issue

Fix validation issue

* Fix Vulkan GGML_OP_GET_ROWS implementation

* Clean up merge artifacts

* Remove Vulkan warning

2024-03-29 17:29:21 +01:00

4.0 MiB
