Xuan-Son Nguyen
|
3c3635d2f2
|
server : speed up tests (#15836)
* server : speed up tests
* clean up
* restore timeout_seconds in some places
* flake8
* explicit offline
|
2025-09-06 14:45:24 +02:00 |
|
Olivier Chafik
|
e121edc432
|
server: add --reasoning-budget 0 to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771)
---------
Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
|
2025-05-26 00:30:51 +01:00 |
|
Olivier Chafik
|
d785f9c1fd
|
server: fix/test add_generation_prompt (#13770)
Co-authored-by: ochafik <ochafik@google.com>
|
2025-05-25 10:45:49 +01:00 |
|
Olivier Chafik
|
aa48e373f2
|
server: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802)
* Inject date_string in llama 3.x + fix for functionary v2
https://github.com/ggml-org/llama.cpp/issues/12729
* move/fix detection of functionary v3.1 before llama 3.x, fix & test their non-tool mode
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* generate more tokens in test_completion_with_required_tool_tiny_fast to avoid truncation
---------
Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
|
2025-05-15 02:39:51 +01:00 |
|