llama.cpp/ggml.h at 4953e9007f86327aabc8312a7211c18019a3a40e

Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-11-03 09:22:01 +00:00)

Georgi Gerganov 986b6ce9f9 ggml, llama : avoid heavy V transpose + improvements (#775)

ggml :

- added ggml_view_3d()
- ggml_view_tensor() now inherits the stride too
- reimplement ggml_cpy() to account for dst stride
- no longer require tensor->data to be memory aligned
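The stride-related ggml changes above can be illustrated with a minimal sketch. They are a 3-D view, views that keep the parent's strides, a copy that honours the destination stride, and no alignment requirement on `data`. The sketch assumes a hypothetical `view3d` struct that borrows ggml's `ne` (elements per dimension) and `nb` (bytes per dimension) naming. `make_view3d`, `copy_into_view` and `read_view` are illustrative helpers, not ggml functions, and the real `ggml_cpy()` handles far more cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 3-D float view (not the ggml struct, same idea):
   ne[] = elements per dimension, nb[] = stride in BYTES per dimension. */
typedef struct {
    int64_t ne[3];
    size_t  nb[3];
    char   *data;   /* may point at any byte, no float alignment assumed */
} view3d;

/* Build a view into an existing buffer, similar in spirit to a 3-D view op. */
static view3d make_view3d(void *base, int64_t ne0, int64_t ne1, int64_t ne2,
                          size_t nb1, size_t nb2, size_t offset) {
    view3d v;
    v.ne[0] = ne0; v.ne[1] = ne1; v.ne[2] = ne2;
    v.nb[0] = sizeof(float);   /* elements inside a row are contiguous */
    v.nb[1] = nb1;             /* row stride, may include padding */
    v.nb[2] = nb2;             /* plane stride */
    v.data  = (char *)base + offset;
    return v;
}

/* Copy a contiguous float array into a strided destination view.
   Addresses come from nb[], and memcpy avoids unaligned float loads/stores. */
static void copy_into_view(view3d *dst, const float *src) {
    int64_t k = 0;
    for (int64_t i2 = 0; i2 < dst->ne[2]; i2++)
        for (int64_t i1 = 0; i1 < dst->ne[1]; i1++)
            for (int64_t i0 = 0; i0 < dst->ne[0]; i0++) {
                char *p = dst->data + i0*dst->nb[0] + i1*dst->nb[1] + i2*dst->nb[2];
                memcpy(p, &src[k++], sizeof(float));
            }
}

/* Read one element through the view's strides. */
static float read_view(const view3d *v, int64_t i0, int64_t i1, int64_t i2) {
    float x;
    memcpy(&x, v->data + i0*v->nb[0] + i1*v->nb[1] + i2*v->nb[2], sizeof(float));
    return x;
}
```

Every address is computed from `nb[]` rather than assumed contiguous. The same loop therefore works for a view whose rows are padded or which starts at an odd byte offset.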

llama :

- compute RoPE on 32-bit tensors (should be more accurate)
- store RoPE-ed K in the KV cache
- store transposed V in the KV cache (significant speed-up)
- avoid unnecessary Q copy
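Here is a minimal sketch of why a transposed V can help, assuming a plain row-major C layout. It is not llama.cpp's code, and the function names are hypothetical. `P` stands for the attention weights (softmax of KQ), shape `[n_q][n_kv]`. The output is `out[i][d] = sum_j P[i][j] * V[j][d]`. With V stored as `[n_kv][d_head]`, the inner loop jumps `d_head` floats at a time. With V stored transposed as `[d_head][n_kv]`, each output element becomes a dot product of two contiguous rows, which is friendlier to caches and SIMD.

```c
#include <assert.h>
#include <stddef.h>

/* V stored as V[n_kv][d_head]: inner loop over j strides by d_head floats. */
static void attn_v_rowmajor(const float *P, const float *V, float *out,
                            size_t n_q, size_t n_kv, size_t d_head) {
    for (size_t i = 0; i < n_q; i++)
        for (size_t d = 0; d < d_head; d++) {
            float acc = 0.0f;
            for (size_t j = 0; j < n_kv; j++)
                acc += P[i*n_kv + j] * V[j*d_head + d];
            out[i*d_head + d] = acc;
        }
}

/* V stored transposed as Vt[d_head][n_kv]: inner loop is a dot product
   of two contiguous rows (row i of P and row d of Vt). */
static void attn_v_transposed(const float *P, const float *Vt, float *out,
                              size_t n_q, size_t n_kv, size_t d_head) {
    for (size_t i = 0; i < n_q; i++)
        for (size_t d = 0; d < d_head; d++) {
            const float *p = P + i*n_kv;
            const float *v = Vt + d*n_kv;
            float acc = 0.0f;
            for (size_t j = 0; j < n_kv; j++)
                acc += p[j] * v[j];
            out[i*d_head + d] = acc;
        }
}
```

Both functions compute the same product. Writing V into the cache already transposed pays the layout cost once per token. The alternative is transposing the whole cached V on every evaluation.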

2023-04-05 22:07:33 +03:00

22 KiB
