Mirror of https://github.com/ggml-org/llama.cpp.git
* cuBLAS: refactor, convert fp16 to fp32 on device
* cuBLAS: use multiple streams, choose smartly between mul_mat_q and mul_mat_f16
* fix build
* cuBLAS: update block_q5_1
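The sketch below is not the llama.cpp implementation; it is a minimal CUDA illustration of the two general techniques named in the commit: converting fp16 data to fp32 on the device (avoiding a host round trip) and overlapping independent GEMMs on multiple CUDA streams via `cublasSetStream`. All function and parameter names here are illustrative, and the commit's heuristic for choosing between the quantized `mul_mat_q` path and the cuBLAS `mul_mat_f16` path is internal to ggml-cuda and not reproduced.

```cpp
// Hedged sketch, not llama.cpp code: on-device fp16 -> fp32 conversion plus
// round-robin dispatch of independent SGEMMs over several CUDA streams.
#include <cuda_runtime.h>
#include <cuda_fp16.h>
#include <cublas_v2.h>

// Convert a contiguous fp16 buffer to fp32 on the device.
__global__ void convert_fp16_to_fp32(const __half * x, float * y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = __half2float(x[i]);
    }
}

// Run `count` independent (m x k) * (k x n) GEMMs, one per stream, round-robin.
// a16[i] holds fp16 inputs on device; a32[i] is fp32 scratch; b[i], c[i] are fp32.
void gemm_f16_multi_stream(cublasHandle_t handle,
                           cudaStream_t * streams, int n_streams,
                           const __half * const * a16, float * const * a32,
                           const float * const * b, float * const * c,
                           int m, int n, int k, int count) {
    const float alpha = 1.0f, beta = 0.0f;
    for (int i = 0; i < count; ++i) {
        cudaStream_t stream = streams[i % n_streams];

        // The conversion kernel runs on the same stream as the GEMM that
        // consumes its output, so no explicit event synchronization is needed.
        const int n_elems = m * k;
        const int block   = 256;
        convert_fp16_to_fp32<<<(n_elems + block - 1) / block, block, 0, stream>>>(
            a16[i], a32[i], n_elems);

        // Bind the cuBLAS handle to this stream before issuing the GEMM.
        cublasSetStream(handle, stream);
        // Column-major SGEMM: C = A * B, with leading dimensions equal to row counts.
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    m, n, k,
                    &alpha, a32[i], m, b[i], k,
                    &beta,  c[i], m);
    }
    cudaDeviceSynchronize();
}
```

Keeping each conversion and its consuming GEMM on one stream preserves ordering without events, while distributing independent multiplications across streams lets their kernels overlap on the GPU.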