Georgi Gerganov
16bcc1259d
kv-cache : pad the cache size to 256 for performance (#17046)
* kv-cache : pad the size of the small SWA cache for performance
* context : pad the total context to 256
* cont : future-proof the swa pad
* server : adjust test params to new logic
2025-11-07 20:03:25 +02:00
Johannes Gäßler
e81b8e4b7f
llama: use FA + max. GPU layers by default (#15434)
* llama: use max. GPU layers by default, auto -fa
* ggml-backend: abort instead of segfault
2025-08-30 16:32:10 +02:00
Georgi Gerganov
d2fcd91cf9
server : disable context shift by default (#15416)
* server : disable context shift by default
ggml-ci
* server : make scope of test parameters local
2025-08-19 16:46:37 +03:00
Diego Devesa
1d36b3670b
llama : move end-user examples to tools directory (#13249)
* llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00