* server : include usage statistics only when the user requests them
When serving the OpenAI-compatible API, we should check whether
{"stream_options": {"include_usage": true}} is set in the request when
deciding whether to send usage statistics (a sketch of this check follows below)
closes: #16048
* add unit test
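
A minimal, self-contained sketch of the check described above, not the actual llama.cpp server code: it assumes the request body has already been parsed with nlohmann::json (which llama.cpp vendors), and should_include_usage is a hypothetical helper name. The asserts at the end illustrate the kind of cases a unit test for this behavior might cover.

```cpp
// Minimal sketch, not the real server implementation. Assumes the incoming
// OpenAI-compatible request body is already parsed into a nlohmann::json
// object; should_include_usage is a hypothetical helper name.
#include <cassert>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Return true only when the client explicitly asked for usage statistics
// via "stream_options": {"include_usage": true}.
static bool should_include_usage(const json & request) {
    const auto it = request.find("stream_options");
    if (it == request.end() || !it->is_object()) {
        return false;
    }
    return it->value("include_usage", false);
}

int main() {
    // Cases a unit test for this behavior could exercise.
    assert( should_include_usage(json::parse(
        R"({"stream": true, "stream_options": {"include_usage": true}})")));
    assert(!should_include_usage(json::parse(
        R"({"stream": true, "stream_options": {"include_usage": false}})")));
    assert(!should_include_usage(json::parse(
        R"({"stream": true})")));
    return 0;
}
```

With this check in place, usage statistics are only appended to the streamed response when the client opted in, which matches the OpenAI API's stream_options semantics.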