llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-27 08:21:30 +00:00

Files

ddh0 f6dcda3900 server : context checkpointing for hybrid and recurrent models (#16382)

* initial commit for branch 3

* generalize `swa_checkpoint` to `ctx_checkpoint`

this extends `llama-server`'s SWA checkpointing logic to include
hybrid/recurrent models such as Jamba, Granite

* oops

* disable debug prints

* keep backwards compat with `--swa-checkpoints`

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* update prompt re-processing message

* fix off-by-one error per GG

* keep `seq_rm` log per GG

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* server : fix checkpoint logic to support recurrent caches

* server : cleanup and fixes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2025-10-03 21:34:51 +03:00

batched-bench

cmake : Do not install tools on iOS targets (#15903)

2025-09-16 09:54:44 +07:00

cvector-generator

cmake : Do not install tools on iOS targets (#15903)

2025-09-16 09:54:44 +07:00

export-lora

cmake : Do not install tools on iOS targets (#15903)

2025-09-16 09:54:44 +07:00

gguf-split

ci : use smaller model (#16168)