llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-06 09:46:50 +00:00

Files

Pascal 2f68ce7cfd webui: auto-refresh /props on inference start to resync model metadata (#16784)

* webui: auto-refresh /props on inference start to resync model metadata

- Add no-cache headers to /props and /slots
- Throttle slot checks to 30s
- Prevent concurrent fetches with promise guard
- Trigger refresh from chat streaming for legacy and ModelSelector
- Show dynamic serverWarning when using cached data

* fix: restore proper legacy behavior in webui by using unified /props refresh

Updated assistant message bubbles to show each message's stored model when available,
falling back to the current server model only when the per-message value is missing.

When the model selector is disabled, the webui now fetches /props and prioritizes that
model name over chunk metadata, then persists it with the streamed message so legacy
mode properly reflects the backend configuration.

* fix: detect first valid SSE chunk and refresh server props once

* fix: removed the slots availability throttle constant and state

* webui: purge ai-generated cruft

* chore: update webui static build

2025-11-01 19:49:51 +01:00
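
The commit message above mentions two patterns: a promise guard that prevents concurrent fetches, and a one-time /props refresh on the first valid SSE chunk. The sketch below illustrates both in isolation. It is not the webui's actual code. The names `createGuardedFetch` and `createFirstChunkTrigger` are hypothetical, and treating "valid" as "a `data:` line whose payload parses as JSON" is an assumption made for the example.

```typescript
// Illustrative sketch only: every name here is hypothetical and not taken
// from the llama.cpp webui source.

// Wraps an async fetcher so that overlapping calls share one in-flight promise.
function createGuardedFetch<T>(fetcher: () => Promise<T>): () => Promise<T> {
  let inFlight: Promise<T> | null = null;
  return () => {
    if (inFlight !== null) {
      return inFlight; // a fetch is already running, so reuse it
    }
    const pending = fetcher().then(
      (value) => {
        inFlight = null; // allow a fresh fetch once this one resolves
        return value;
      },
      (err) => {
        inFlight = null; // also release the guard on failure
        throw err;
      }
    );
    inFlight = pending;
    return pending;
  };
}

// Returns a handler for raw SSE lines that calls `onFirstValid` exactly once,
// on the first `data:` line whose payload parses as JSON (assumed definition
// of "valid" for this sketch).
function createFirstChunkTrigger(onFirstValid: () => void): (line: string) => boolean {
  let fired = false;
  return (line: string): boolean => {
    if (fired || !line.startsWith("data:")) return false;
    const payload = line.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") return false;
    try {
      JSON.parse(payload);
    } catch {
      return false; // malformed chunk, keep waiting
    }
    fired = true;
    onFirstValid();
    return true;
  };
}

// Example wiring (hypothetical, not executed here):
//   const refreshProps = createGuardedFetch(() =>
//     fetch("/props", { headers: { "Cache-Control": "no-cache" } }).then((r) => r.json()));
//   const onLine = createFirstChunkTrigger(() => { void refreshProps(); });
```

With this shape, any number of callers can request a refresh while one is pending and only a single request is issued. The trigger ignores comments, malformed chunks, and every chunk after the first valid one.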

batched-bench

cmake : Do not install tools on iOS targets (#15903)

2025-09-16 09:54:44 +07:00

cvector-generator

cmake : Do not install tools on iOS targets (#15903)

2025-09-16 09:54:44 +07:00

export-lora

cmake : Do not install tools on iOS targets (#15903)

2025-09-16 09:54:44 +07:00

gguf-split

ci : use smaller model (#16168)