llama.cpp/ggml-cuda.h at 738ace394a6f8cf0174e90a97185d9e512c0e200

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-31 08:51:55 +00:00

slaren 58b367c2d7 cuBLAS: refactor and optimize f16 mat mul performance (#1259)

* cuBLAS: refactor, convert fp16 to fp32 on device

* cuBLAS: use multiple streams, choose smartly between mul_mat_q and mul_mat_f16

* fix build

* cuBLAS: update block_q5_1

2023-05-01 18:11:07 +02:00

638 B
