llama.cpp/tools/server/server.cpp at 229bf686287d18f82c44e89888cc662145ecfdb4

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-31 08:51:55 +00:00

Georgi Gerganov 85a7d8677b memory : remove KV cache size padding (#16812)

* memory : remove KV cache size padding

* cont : restore padding for n_kv tensor shape

* server : use slot context size instead of training context size

* server : simplify context limit logic

2025-10-28 20:19:44 +02:00

228 KiB
