Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-11-02 09:12:03 +00:00)
ggml :

- added ggml_view_3d()
- ggml_view_tensor() now inherits the stride too
- reimplement ggml_cpy() to account for dst stride
- no longer require tensor->data to be memory aligned

llama :

- compute RoPE on 32-bit tensors (should be more accurate)
- store RoPE-ed K in the KV cache
- store transposed V in the KV cache (significant speed-up)
- avoid unnecessary Q copy
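To illustrate the idea behind stride-inheriting views and the transposed V cache, here is a minimal standalone C sketch of a strided 3D view over a shared buffer. The `view3d` struct and `view3d_get` helper are hypothetical stand-ins, not the ggml API; they only show how swapping shape and byte strides yields a "transposed" view without copying the underlying data.

```c
// Minimal sketch of the strided 3D "view" idea: a view does not copy data,
// it reinterprets an existing buffer with its own shape (ne) and byte
// strides (nb). Names here are illustrative, not the actual ggml API.
#include <stdio.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    float  *data;      // shared underlying buffer (not owned)
    int64_t ne[3];     // number of elements per dimension
    size_t  nb[3];     // byte stride per dimension
} view3d;

// element (i0, i1, i2) of a view, resolved through its byte strides
static float view3d_get(const view3d *v, int64_t i0, int64_t i1, int64_t i2) {
    const char *p = (const char *)v->data
                  + i0 * v->nb[0] + i1 * v->nb[1] + i2 * v->nb[2];
    float x;
    memcpy(&x, p, sizeof(x));
    return x;
}

int main(void) {
    // a contiguous [rows=3][cols=4] buffer standing in for a cached tensor
    float buf[3 * 4];
    for (int i = 0; i < 12; i++) buf[i] = (float)i;

    // contiguous view: shape {4, 3, 1}, row-major strides
    view3d v  = { buf, {4, 3, 1},
                  {sizeof(float), 4 * sizeof(float), 12 * sizeof(float)} };

    // "transposed" view over the same memory: shape and strides of the
    // first two dimensions are swapped; no data is moved
    view3d vt = { buf, {3, 4, 1},
                  {4 * sizeof(float), sizeof(float), 12 * sizeof(float)} };

    printf("v (1,2) = %.1f\n", view3d_get(&v,  1, 2, 0)); // 9.0
    printf("vt(2,1) = %.1f\n", view3d_get(&vt, 2, 1, 0)); // same element, 9.0
    return 0;
}
```

In this sketch the transposed view makes the former column dimension the fastest-varying one, which is the same reason storing V transposed in the KV cache lets attention read it sequentially.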