llama.cpp/tools/server/server.cpp at ca71fb9b368e3db96e028f80c4c9df6b6b370edd

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-30 08:42:00 +00:00

ddh0 f6dcda3900 server : context checkpointing for hybrid and recurrent models (#16382)

* initial commit for branch 3

* generalize `swa_checkpoint` to `ctx_checkpoint`

this extends `llama-server`'s SWA checkpointing logic to include
hybrid/recurrent models such as Jamba and Granite

* oops

* disable debug prints

* keep backwards compat with `--swa-checkpoints`

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* update prompt re-processing message

* fix off-by-one error per GG

* keep `seq_rm` log per GG

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* server : fix checkpoint logic to support recurrent caches

* server : cleanup and fixes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2025-10-03 21:34:51 +03:00

219 KiB
