* Use events instead of clFinish, where possible (see the first sketch below)
* OpenCL: Don't load GPU layers into RAM; add a mul_f32 kernel
* Reduce queueing overhead for contiguous tensors by using a single mul kernel call (see the second sketch below)
* Adapt to #1612 cl_mem malloc changes
* Reduce code duplication between the CUDA and OpenCL branches
* Improve the implementation
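A minimal sketch of the events-over-clFinish pattern, not the actual ggml-opencl code; the function and buffer names here are hypothetical. Instead of draining the entire queue with clFinish(), each enqueue records a cl_event, each later command waits only on the events it actually depends on, and the host blocks on the final event alone:

```c
#include <CL/cl.h>

// Hypothetical dispatch of a mul_f32-style kernel using event dependencies
// instead of clFinish(). Error checking omitted for brevity.
void mul_f32_async(cl_command_queue queue, cl_kernel kernel,
                   cl_mem src0, cl_mem src1, cl_mem dst,
                   size_t n, const float *host_src1, float *host_dst) {
    cl_event write_ev, kernel_ev, read_ev;

    // Enqueue the host->device copy without blocking; record an event.
    clEnqueueWriteBuffer(queue, src1, CL_FALSE, 0, n * sizeof(float),
                         host_src1, 0, NULL, &write_ev);

    clSetKernelArg(kernel, 0, sizeof(cl_mem), &src0);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &src1);
    clSetKernelArg(kernel, 2, sizeof(cl_mem), &dst);

    // The kernel waits only on the write it depends on, not the whole queue.
    size_t global = n;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL,
                           1, &write_ev, &kernel_ev);

    // The read-back waits only on the kernel; no clFinish() needed.
    clEnqueueReadBuffer(queue, dst, CL_FALSE, 0, n * sizeof(float),
                        host_dst, 1, &kernel_ev, &read_ev);

    // Block on the last event alone instead of draining the queue,
    // so unrelated work on the same queue can keep flowing.
    clWaitForEvents(1, &read_ev);

    clReleaseEvent(write_ev);
    clReleaseEvent(kernel_ev);
    clReleaseEvent(read_ev);
}
```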
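A hedged sketch of the contiguous-tensor fast path, under the assumption that the kernel takes a per-row offset as its fourth argument; the names (rows, row_size, the arg index) are illustrative, not the actual ggml-opencl code. When the data is contiguous, one NDRange launch over all elements replaces a per-row loop, paying the enqueue overhead once:

```c
#include <CL/cl.h>

// Hypothetical row-wise multiply dispatch: one launch for contiguous
// tensors, one launch per row otherwise. Error checking omitted.
void mul_rows(cl_command_queue queue, cl_kernel mul_f32,
              int contiguous, size_t rows, size_t row_size) {
    if (contiguous) {
        // Fast path: a single kernel call covers the whole tensor.
        size_t global = rows * row_size;
        clEnqueueNDRangeKernel(queue, mul_f32, 1, NULL, &global, NULL,
                               0, NULL, NULL);
    } else {
        // Slow path: one enqueue per row, each with its own offset.
        // Kernel args are captured at enqueue time, so updating the
        // offset between enqueues is safe.
        for (size_t i = 0; i < rows; ++i) {
            cl_ulong offset = (cl_ulong)(i * row_size); // assumed arg 3
            clSetKernelArg(mul_f32, 3, sizeof(cl_ulong), &offset);
            size_t global = row_size;
            clEnqueueNDRangeKernel(queue, mul_f32, 1, NULL, &global, NULL,
                                   0, NULL, NULL);
        }
    }
}
```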