llama.cpp/llama.cpp at d2beca95dcfcd6f1145886e914b879ffc3604b7a

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-29 08:41:22 +00:00

Georgi Gerganov 986b6ce9f9 ggml, llama : avoid heavy V transpose + improvements (#775)

ggml :

- added ggml_view_3d()
- ggml_view_tensor() now inherits the stride too
- reimplement ggml_cpy() to account for dst stride
- no longer require tensor->data to be memory aligned
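As a rough illustration of what "account for dst stride" can mean, the sketch below copies a contiguous matrix into a destination that is addressed through its own strides. It is not the ggml implementation: `copy_strided` and `transpose_via_strided_copy` are hypothetical names, and strides are counted in elements here for simplicity, whereas ggml tensors carry byte strides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a ggml function): element-wise copy of a
// rows x cols region where source and destination each have their own
// row stride (rs) and column stride (cs), measured in elements.
inline void copy_strided(const float * src, size_t src_rs, size_t src_cs,
                         float * dst, size_t dst_rs, size_t dst_cs,
                         size_t rows, size_t cols) {
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; ++c) {
            dst[r*dst_rs + c*dst_cs] = src[r*src_rs + c*src_cs];
        }
    }
}

// Copy a contiguous row-major rows x cols matrix so that the output
// buffer holds its transpose. The transpose falls out of the swapped
// destination strides; no separate transpose pass is needed.
inline std::vector<float> transpose_via_strided_copy(const std::vector<float> & m,
                                                     size_t rows, size_t cols) {
    assert(m.size() == rows*cols);
    std::vector<float> out(rows*cols);
    copy_strided(m.data(), cols, 1,    // source: contiguous rows
                 out.data(), 1, rows,  // destination: strided (transposed) view
                 rows, cols);
    return out;
}
```

Because the copy honours the destination layout, a view with non-trivial strides can be written to directly instead of first being made contiguous.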

llama :

- compute RoPE on 32-bit tensors (should be more accurate)
- store RoPE-ed K in the KV cache
- store transposed V in the KV cache (significant speed-up)
- avoid unnecessary Q copy
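The idea behind storing V transposed can be sketched as follows. This is a simplified illustration, not the llama.cpp code: `VCacheT` and its methods are hypothetical, it handles a single head, and the layout `[n_embd][n_ctx]` is an assumption made for the example. Each token's V vector is written once with a strided store, so the later weighted sum over tokens reads contiguous memory and the cached values need not be transposed again on each evaluation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical V cache kept in transposed layout [n_embd][n_ctx]:
// the values of one embedding dimension across all tokens are contiguous.
struct VCacheT {
    size_t n_embd;
    size_t n_ctx;
    std::vector<float> data;

    VCacheT(size_t n_embd, size_t n_ctx)
        : n_embd(n_embd), n_ctx(n_ctx), data(n_embd*n_ctx, 0.0f) {}

    // Store the V vector of the token at position `pos`.
    // One strided write per embedding dimension.
    void store(size_t pos, const std::vector<float> & v) {
        assert(pos < n_ctx && v.size() == n_embd);
        for (size_t i = 0; i < n_embd; ++i) {
            data[i*n_ctx + pos] = v[i];
        }
    }

    // out[i] = sum over t of w[t] * V[t][i], for the first w.size() tokens.
    // Each inner loop walks one contiguous row of the cache.
    std::vector<float> weighted_sum(const std::vector<float> & w) const {
        assert(w.size() <= n_ctx);
        std::vector<float> out(n_embd, 0.0f);
        for (size_t i = 0; i < n_embd; ++i) {
            const float * row = &data[i*n_ctx];
            for (size_t t = 0; t < w.size(); ++t) {
                out[i] += w[t]*row[t];
            }
        }
        return out;
    }
};
```

The cost of the transpose is paid once per token at store time rather than over the whole cache on every step, which is one plausible reading of the "significant speed-up" noted above.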

2023-04-05 22:07:33 +03:00

59 KiB
