llama.cpp/tests/test-thread-safety.cpp at ee3a5a10adf9e83722d1914dddc56a0623ececaf

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-06 09:46:50 +00:00

Georgi Gerganov cd5e3b5754 server : support unified cache across slots (#16736)

* server : support unified context across slots

* cont : fix speculative decoding initialization

* context : fix n_ctx_per_seq computation

* server : purge slots one by one

* tests : add unified cache server tests

* llama : update per-seq context computation

* test-thread-safety : handle tiny training context of the input model

* server : fix server_tokens clear()

* server : use 4 slots + unified KV by default

* llama : add note about context size queries

* cont : update todos [no ci]

* context : do not cap the size of the context

* tests : adjust parameters to be CI friendlier

* context : add warning

2025-11-02 18:14:04 +02:00

5.5 KiB
