llama.cpp/ggml-cuda.cu at 16090a5ddeb53783ca29fcc0b4ee3893fed64f90

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-17 11:37:10 +00:00

slaren 7e2b9974d1 ggml-cuda : update rope implementation for parallel decoding (#3254)

* ggml-cuda : update rope implementation for parallel decoding

* better solution for p0 computation

* fix rope

* simpler rope implementation

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2023-09-19 11:31:36 +03:00

268 KiB