server: update readme to mention n_past_max metric (#16436)

https://github.com/ggml-org/llama.cpp/pull/15361 added a new exported
metric, but I missed documenting it.
commit c5fef0fcea
parent ca71fb9b36
Author: Oleksandr Kuvshynov
Date: 2025-10-06 03:53:31 -04:00
Committed by: GitHub


@@ -1045,6 +1045,7 @@ Available metrics:
- `llamacpp:kv_cache_tokens`: KV-cache tokens.
- `llamacpp:requests_processing`: Number of requests processing.
- `llamacpp:requests_deferred`: Number of requests deferred.
- `llamacpp:n_past_max`: High watermark of the context size observed.
### POST `/slots/{id_slot}?action=save`: Save the prompt cache of the specified slot to a file.
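The metrics listed in the diff above are exposed in the Prometheus text exposition format on the server's `/metrics` endpoint (available when the server is started with `--metrics`). As a sketch, the snippet below parses a sample payload of that format and reads out `llamacpp:n_past_max`; the payload shown is illustrative, not captured server output, and the parser ignores labels for simplicity.

```python
def parse_metrics(text):
    """Return {metric_name: float_value} from Prometheus text exposition format."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comment lines
        # Each sample line is "<name> <value>"; take the last token as the value.
        name, _, value = line.rpartition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            pass  # skip malformed lines
    return metrics

# Illustrative payload resembling what /metrics might return.
sample = """\
# HELP llamacpp:n_past_max High watermark of the context size observed.
# TYPE llamacpp:n_past_max gauge
llamacpp:n_past_max 4096
llamacpp:requests_processing 1
"""

metrics = parse_metrics(sample)
print(metrics["llamacpp:n_past_max"])  # high watermark of the context size
```

In a real deployment you would fetch the payload with an HTTP GET against the server's `/metrics` endpoint instead of using a hard-coded string.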