kv-cache : pad the cache size to 256 for performance (#17046)

* kv-cache : pad the size of the small SWA cache for performance

* context : pad the total context to 256

* cont : future-proof the swa pad

* server : adjust test params to new logic
commit 16bcc1259d (parent 9eb9a1331d)
Author: Georgi Gerganov, committed by GitHub
Date: 2025-11-07 20:03:25 +02:00
4 changed files with 14 additions and 7 deletions

@@ -463,6 +463,7 @@ extern "C" {
// NOTE: After creating a llama_context, it is recommended to query the actual values using these functions
// In some cases the requested values via llama_context_params may differ from the actual values used by the context
// ref: https://github.com/ggml-org/llama.cpp/pull/17046#discussion_r2503085732
LLAMA_API uint32_t llama_n_ctx (const struct llama_context * ctx);
LLAMA_API uint32_t llama_n_ctx_seq (const struct llama_context * ctx);
LLAMA_API uint32_t llama_n_batch (const struct llama_context * ctx);