llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-09 10:17:06 +00:00

Files

Georgi Gerganov 16bcc1259d kv-cache : pad the cache size to 256 for performance (#17046)

* kv-cache : pad the size of the small SWA cache for performance

* context : pad the total context to 256

* cont : future-proof the swa pad

* server : adjust test params to new logic

2025-11-07 20:03:25 +02:00
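
A minimal sketch of what "pad to 256" means arithmetically, assuming the usual round-up-to-a-multiple interpretation. The helper name `pad_to` and its default are hypothetical and are not taken from the llama.cpp sources; the actual padding logic lives in the commit above.

```python
def pad_to(n: int, multiple: int = 256) -> int:
    """Round n up to the nearest multiple of `multiple` (hypothetical helper)."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    # Integer ceiling division, then scale back up.
    return ((n + multiple - 1) // multiple) * multiple


if __name__ == "__main__":
    # A requested size of 1000 would be padded up to 1024 under this assumption.
    for requested in (1, 256, 257, 1000):
        print(requested, "->", pad_to(requested))
```

Under this assumption, tests that assert on exact context or cache sizes would see sizes rounded up to a multiple of 256, which is consistent with the "adjust test params to new logic" item in the commit message.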

test_basic.py

server : host-memory prompt caching (#16391)

2025-10-09 18:54:51 +03:00

test_chat_completion.py

server : support unified cache across slots (#16736)

2025-11-02 18:14:04 +02:00

test_completion.py

server : support unified cache across slots (#16736)

2025-11-02 18:14:04 +02:00

test_ctx_shift.py

memory : remove KV cache size padding (#16812)

2025-10-28 20:19:44 +02:00

test_embedding.py

server : disable context shift by default (#15416)

2025-08-19 16:46:37 +03:00

test_infill.py

server : support unified cache across slots (#16736)

2025-11-02 18:14:04 +02:00

test_lora.py

server : disable context shift by default (#15416)

2025-08-19 16:46:37 +03:00

test_rerank.py

server / ranking : add sorting and management of top_n (#16403)

2025-10-11 16:39:04 +03:00

test_security.py

server : disable context shift by default (#15416)

2025-08-19 16:46:37 +03:00

test_slot_save.py

server : disable context shift by default (#15416)

2025-08-19 16:46:37 +03:00

test_speculative.py

kv-cache : pad the cache size to 256 for performance (#17046)

2025-11-07 20:03:25 +02:00

test_template.py

server : speed up tests (#15836)

2025-09-06 14:45:24 +02:00

test_tokenize.py

server : disable context shift by default (#15416)

2025-08-19 16:46:37 +03:00

test_tool_call.py

server : speed up tests (#15836)

2025-09-06 14:45:24 +02:00

test_vision_api.py

server : speed up tests (#15836)

2025-09-06 14:45:24 +02:00