llama.cpp/examples/save-load-state/save-load-state.cpp at e4640d8fdf56f14a6db3d092bcd3d2d315cb5d04

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-30 08:42:00 +00:00


David Friehs df845cc982 llama : minimize size used for state save/load (#4820)

* examples : save-load-state: save only required state

* llama : only reserve n_vocab * n_batch at most for logits

llama_decode asserts that at most n_batch tokens are passed per call, and
n_ctx is expected to be bigger than n_batch.

* llama : always reserve n_vocab * n_batch for logits

llama_context de-serialization breaks if the contexts have differing
capacity for logits and llama_decode will at maximum resize to
n_vocab * n_batch.

* llama : only save and restore used logits

for a batch size of 512 this reduces the saved state in the best case by
around 62 MB, which adds up when saving state after each message to
allow regenerating messages.

* llama : use ostringstream and istringstream for save and load

* llama : serialize rng into minimum amount of space required

* llama : break session version due to serialization changes

2024-01-13 18:29:43 +02:00

4.7 KiB
