llama.cpp/ggml-cuda.cu at a6704643b62243bc4b6bbcd727d63d44e01a1002

Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-11-02 09:12:03 +00:00)


Johannes Gäßler 1fcdcc28b1 cuda : performance optimizations (#1530)

* xor hack

* block y dim

* loop unrolling

* Fixed cmake LLAMA_CUDA_BY option

* Removed hipblas compatibility code

* Define GGML_CUDA_DMMV_BLOCK_Y if not defined

* Fewer iters, more ops per iter

* Renamed DMMV X/Y compilation options

2023-05-26 00:07:29 +03:00

35 KiB
