llama.cpp/src/llama-kv-cache.cpp at c0b45097c33e2667a94444f08cc9e36bec0a5e2e

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-08 10:07:01 +00:00

Georgi Gerganov cf0e3ba150 model : avoid ggml_cont_3d for fused QKV weights (#15662)

* model : avoid ggml_cont_3d for fused QKV weights

ggml-ci

* kv-cache : make cpy_k and cpy_v implementation more readable

ggml-ci

* cont : add comments

ggml-ci

* cont : minor fix [no ci]

* cont : one more fix

* cont : clarity

ggml-ci

* kv-cache : require contiguous heads of k_cur and v_cur

ggml-ci

2025-09-08 10:25:33 +03:00

64 KiB
