server: fix time_ms calculation in prompt_progress (#17093)

* fix: correct time_ms calculation in send_partial_response

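The time_ms field was calculated incorrectly. Because division binds
tighter than subtraction, only slot.t_start_process_prompt was divided
by 1000, and that quotient was then subtracted from the undivided
ggml_time_us() value, so the result was not an elapsed time in ms.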

Before: (ggml_time_us() - slot.t_start_process_prompt / 1000)
After:  (ggml_time_us() - slot.t_start_process_prompt) / 1000
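
To make the precedence issue concrete, here is a minimal sketch of the two
expressions with fixed inputs. The helper names elapsed_ms_buggy and
elapsed_ms_fixed are hypothetical and do not appear in the server code; the
sketch also assumes both timestamps are int64_t microsecond values, as
ggml_time_us() returns.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not part of the server source.
// Both arguments are assumed to be timestamps in microseconds.

// Old expression: `/` binds tighter than `-`, so only t_start_us is divided.
constexpr std::int64_t elapsed_ms_buggy(std::int64_t now_us, std::int64_t t_start_us) {
    return now_us - t_start_us / 1000;
}

// Fixed expression: subtract first (microseconds), then convert to milliseconds.
constexpr std::int64_t elapsed_ms_fixed(std::int64_t now_us, std::int64_t t_start_us) {
    return (now_us - t_start_us) / 1000;
}

// Example: processing started at 5,000,000 us and "now" is 5,250,000 us,
// so 250 ms have elapsed.
static_assert(elapsed_ms_fixed(5250000, 5000000) == 250,
              "fixed form yields elapsed milliseconds");
static_assert(elapsed_ms_buggy(5250000, 5000000) == 5245000,
              "buggy form yields now_us minus 5000, not an elapsed time");
```

With the old expression the reported value stays close to the raw
microsecond clock reading rather than the time spent processing the prompt.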

* docs : document time_ms field in prompt_progress
Aidan
2025-11-08 13:12:11 +00:00
committed by GitHub
parent 64fe17fbb8
commit eeee367de5
2 changed files with 2 additions and 2 deletions

@@ -3078,7 +3078,7 @@ struct server_context {
res->progress.total = slot.task->n_tokens();
res->progress.cache = slot.n_prompt_tokens_cache;
res->progress.processed = slot.prompt.tokens.size();
-res->progress.time_ms = (ggml_time_us() - slot.t_start_process_prompt / 1000);
+res->progress.time_ms = (ggml_time_us() - slot.t_start_process_prompt) / 1000;
} else {
res->content = tkn.text_to_send;
res->tokens = { tkn.tok };