llama.cpp/tools/server/server.cpp at ae532eac2c1df1d8edc3d2719145895b966de1bf

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-04 09:32:00 +00:00

Oleksandr Kuvshynov e5155e6986 server : export max observed n_past value (#15361)

Add tracking for high-watermark cache usage and make it available in the /metrics endpoint.

Use case: tracking the largest cache usage needed under a realistic workload,
to better understand memory requirements and to adjust the
cache size/quantization for the model/cache accordingly.
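
The high-watermark idea in the commit message can be sketched as follows. This is a minimal illustration only, not the code from server.cpp: the struct `cache_metrics`, its members, and the metric name `example_n_past_max` are hypothetical, and the output assumes a Prometheus-style text format for the /metrics endpoint.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Hypothetical metrics holder; names are illustrative and not taken from server.cpp.
struct cache_metrics {
    int32_t n_past_max = 0; // largest n_past value observed so far (high watermark)

    // Record the current n_past of a slot; only the maximum is retained.
    void observe(int32_t n_past) {
        n_past_max = std::max(n_past_max, n_past);
    }

    // Render the watermark as a single gauge in Prometheus-style text format.
    std::string to_prometheus() const {
        return "# TYPE example_n_past_max gauge\n"
               "example_n_past_max " + std::to_string(n_past_max) + "\n";
    }
};
```

Calling `observe()` after each processed batch and reading the gauge after a representative workload would give the peak number of cached tokens any request needed, which is the figure the commit message proposes using to size the cache.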

2025-08-18 00:28:58 +02:00

205 KiB
