llama.cpp/tools/server/server.cpp at c4df49a42d396bdf7344501813e7de53bc9e7bb3

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-30 08:42:00 +00:00

Xuan-Son Nguyen 61bdfd5298 server : implement prompt processing progress report in stream mode (#15827)

* server : implement `return_progress`

* add timings.cache_n

* add progress.time_ms

* add test

* fix test for chat/completions

* readme: add docs on timings

* use ggml_time_us

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2025-09-06 13:35:04 +02:00

218 KiB
