llama.cpp

CS348Project/llama.cpp

Fork 0

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-06 09:46:50 +00:00

Commit Graph

Author	SHA1	Message	Date
Georgi Gerganov	cd5e3b5754	server : support unified cache across slots (#16736 ) * server : support unified context across slots * cont : fix speculative decoding initialization * context : fix n_ctx_per_seq computation * server : purge slots one by one * tests : add unified cache server tests * llama : update per-seq context computation * test-thread-safety : handle tiny training context of the input model * server : fix server_tokens clear() * server : use 4 slots + unified KV by default * llama : add note about context size queries * cont : update todos [no ci] * context : do not cap the size of the context * tests : adjust parameters to be CI friendlier * context : add warning	2025-11-02 18:14:04 +02:00
Georgi Gerganov	d2fcd91cf9	server : disable context shift by default (#15416 ) * server : disable context shift by default ggml-ci * server : make scopr of test parameters local	2025-08-19 16:46:37 +03:00
Diego Devesa	1d36b3670b	llama : move end-user examples to tools directory (#13249 ) * llama : move end-user examples to tools directory --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-05-02 20:27:13 +02:00

Author

SHA1

Message

Date

Georgi Gerganov

cd5e3b5754

server : support unified cache across slots (#16736 )

* server : support unified context across slots

* cont : fix speculative decoding initialization

* context : fix n_ctx_per_seq computation

* server : purge slots one by one

* tests : add unified cache server tests

* llama : update per-seq context computation

* test-thread-safety : handle tiny training context of the input model

* server : fix server_tokens clear()

* server : use 4 slots + unified KV by default

* llama : add note about context size queries

* cont : update todos [no ci]

* context : do not cap the size of the context

* tests : adjust parameters to be CI friendlier

* context : add warning

2025-11-02 18:14:04 +02:00

Georgi Gerganov

d2fcd91cf9

server : disable context shift by default (#15416 )

* server : disable context shift by default

ggml-ci

* server : make scopr of test parameters local

2025-08-19 16:46:37 +03:00

Diego Devesa

1d36b3670b

llama : move end-user examples to tools directory (#13249 )

* llama : move end-user examples to tools directory

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

2025-05-02 20:27:13 +02:00

3 Commits