webui: auto-refresh /props on inference start to resync model metadata (#16784)

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-13 10:57:15 +00:00

* webui: auto-refresh /props on inference start to resync model metadata

- Add no-cache headers to /props and /slots
- Throttle slot checks to 30s
- Prevent concurrent fetches with promise guard
- Trigger refresh from chat streaming for legacy and ModelSelector
- Show dynamic serverWarning when using cached data
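
The no-cache fetch and the promise guard from the list above can be sketched as follows. This is a minimal illustration, not the webui's actual code: `createGuardedRefresh`, `fetchProps`, and the `ServerProps` shape are hypothetical names, and the exact headers the commit sets are not shown in this message.

```typescript
// Hypothetical sketch: none of these names come from the webui source.
type ServerProps = Record<string, unknown>;
type PropsFetcher = () => Promise<ServerProps>;

// Wraps a fetcher so that overlapping calls share one in-flight promise
// instead of issuing a second request (the "promise guard").
function createGuardedRefresh(fetcher: PropsFetcher): () => Promise<ServerProps> {
  let inFlight: Promise<ServerProps> | null = null;
  return () => {
    if (inFlight) return inFlight;
    // Clear the guard once the request settles, whether it succeeded or failed.
    inFlight = fetcher().finally(() => {
      inFlight = null;
    });
    return inFlight;
  };
}

// Assumed fetcher: asks for /props while bypassing the HTTP cache.
// The base URL parameter and error handling are illustrative only.
async function fetchProps(baseUrl = ''): Promise<ServerProps> {
  const res = await fetch(`${baseUrl}/props`, {
    cache: 'no-store',
    headers: { 'Cache-Control': 'no-cache' }
  });
  if (!res.ok) throw new Error(`GET /props failed with status ${res.status}`);
  return (await res.json()) as ServerProps;
}

const refreshProps = createGuardedRefresh(() => fetchProps());
```

Calling `refreshProps()` twice before the first call settles returns the same promise, so only one request goes out; a later call after settlement starts a fresh request.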

* fix: restore proper legacy behavior in webui by using unified /props refresh

Assistant message bubbles now show each message's stored model when available,
falling back to the current server model only when the per-message value is missing.

When the model selector is disabled, the webui now fetches /props and prioritizes that
model name over chunk metadata, then persists it with the streamed message so that
legacy mode properly reflects the backend configuration.
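
The precedence rules described above might look like the following sketch. The function names and the `StoredMessage` shape are hypothetical, and the branch for an enabled model selector is an assumption, since the commit message only describes the disabled (legacy) case.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not from the webui.
interface StoredMessage {
  model?: string | null;
}

// Display rule: prefer the model stored with the message; use the current
// server model only when the message carries no model of its own.
function displayedModel(message: StoredMessage, serverModel: string | null): string | null {
  return message.model ? message.model : serverModel;
}

// Persist rule for a streamed message.
function modelToPersist(
  selectorEnabled: boolean,
  propsModel: string | null,
  chunkModel: string | null
): string | null {
  if (!selectorEnabled) {
    // Legacy mode, as described in the commit: /props wins over chunk metadata.
    return propsModel ?? chunkModel;
  }
  // Assumption: with the selector enabled, chunk metadata is used first.
  return chunkModel ?? propsModel;
}
```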

* fix: detect first valid SSE chunk and refresh server props once
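
One way to fire a refresh exactly once on the first valid SSE chunk is sketched below. `createFirstChunkTrigger` is a hypothetical helper, and treating "valid" as "a `data:` line whose payload parses as JSON" is an assumption; the commit message does not spell out the actual check.

```typescript
// Hypothetical sketch: the helper name and the validity rule are assumptions.
function createFirstChunkTrigger(onFirstValidChunk: () => void): (line: string) => unknown {
  let fired = false;
  // Returns the parsed payload, or undefined for lines that are skipped.
  return (line: string): unknown => {
    if (!line.startsWith('data:')) return undefined; // comments, blank lines, other fields
    const payload = line.slice('data:'.length).trim();
    if (payload === '' || payload === '[DONE]') return undefined; // end-of-stream sentinel
    let parsed: unknown;
    try {
      parsed = JSON.parse(payload);
    } catch {
      return undefined; // malformed chunk: not counted as valid
    }
    if (!fired) {
      fired = true; // later chunks never trigger the callback again
      onFirstValidChunk();
    }
    return parsed;
  };
}
```

In use, the callback would be the /props refresh, so the metadata resync happens once per stream, as soon as the server has demonstrably started producing output.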

* fix: removed the slots availability throttle constant and state

* webui: purge ai-generated cruft

* chore: update webui static build

Pascal, 2025-11-01 19:49:51 +01:00 (committed by GitHub)

parent e4a71599e5

commit 2f68ce7cfd

7 changed files with 180 additions and 70 deletions

BIN tools/server/public/index.html.gz

Binary file not shown.