llama.cpp/ggml-cuda.cu at eaa13a48ff4136f01c1cdb79cacd61b67ec53095

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-12 10:47:01 +00:00


Georgi Gerganov eaa13a48ff falcon : fix CUDA inference by making K and Q contiguous (#2830)

* falcon : fix CUDA inference by making K and Q contiguous

ggml-ci

* cuda : add assert to guard from non-cont ropes

2023-08-27 16:40:48 +03:00
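
The commit message mentions an assert that guards against non-contiguous inputs to rope. As a rough illustration of what such a guard checks, here is a minimal host-side sketch. It assumes a simplified 4-D tensor descriptor with per-dimension element counts and byte strides; `TensorView`, `is_contiguous`, `make_dense`, and `permute_01` are hypothetical names for this example, not the structs or helpers used in ggml-cuda.cu.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for a 4-D tensor descriptor (not the real ggml struct).
struct TensorView {
    int64_t ne[4];      // number of elements in each dimension
    size_t  nb[4];      // stride in bytes for each dimension
    size_t  elem_size;  // size of one element in bytes
};

// Returns true when the strides describe a densely packed layout:
// nb[0] equals the element size and each later stride equals the
// previous stride multiplied by the previous dimension's length.
bool is_contiguous(const TensorView & t) {
    size_t expected = t.elem_size;
    for (int i = 0; i < 4; ++i) {
        if (t.nb[i] != expected) {
            return false;
        }
        expected *= static_cast<size_t>(t.ne[i]);
    }
    return true;
}

// Builds a densely packed view with the given shape.
TensorView make_dense(int64_t n0, int64_t n1, int64_t n2, int64_t n3, size_t elem_size) {
    TensorView t{{n0, n1, n2, n3}, {0, 0, 0, 0}, elem_size};
    t.nb[0] = elem_size;
    for (int i = 1; i < 4; ++i) {
        t.nb[i] = t.nb[i - 1] * static_cast<size_t>(t.ne[i - 1]);
    }
    return t;
}

// Swaps dimensions 0 and 1 without moving any data. The result addresses
// the same memory, but its strides no longer match a dense layout.
TensorView permute_01(TensorView t) {
    std::swap(t.ne[0], t.ne[1]);
    std::swap(t.nb[0], t.nb[1]);
    return t;
}
```

In this sketch, a view produced by `make_dense` passes the check, while the same view after `permute_01` fails it (when the two swapped dimensions differ in length). A kernel that indexes its input as a flat dense array would read the wrong elements from such a permuted view, which is the kind of situation an assert on contiguity is meant to catch before launch.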
