llama.cpp/ggml-cuda.cu
slaren 7e2b9974d1 ggml-cuda : update rope implementation for parallel decoding (#3254)
* ggml-cuda : update rope implementation for parallel decoding

* better solution for p0 computation

* fix rope

* simpler rope implementation

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-19 11:31:36 +03:00

268 KiB