Georgi Gerganov
432cf4304c
codeowners : update + cleanup (#16174)
...
---------
Co-authored-by: slaren <slarengh@gmail.com>
2025-09-22 18:20:21 +03:00
Georgi Gerganov
7f766929ca
sync : ggml
2025-09-20 13:02:14 +03:00
Xuan-Son Nguyen
3c3635d2f2
server : speed up tests (#15836)
...
* server : speed up tests
* clean up
* restore timeout_seconds in some places
* flake8
* explicit offline
2025-09-06 14:45:24 +02:00
Piotr Wilkin (ilintar)
9e2b1e83c6
scripts : add Jinja tester PySide6 simple app (#15756)
...
* feat: add Jinja tester PySide6 simple app
* Linter fixes
* Pylint fixes
* Whitespace
* Add commandline support; add formatter; add extensions
* Remove testing actions
* Silence flake8 warnings for commandline mode
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Fix trailing whitespace/newline logic
* Update scripts/jinja/jinja-tester.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update scripts/jinja/jinja-tester.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-09-05 01:05:12 +02:00
Johannes Gäßler
e81b8e4b7f
llama: use FA + max. GPU layers by default (#15434)
...
* llama: use max. GPU layers by default, auto -fa
* ggml-backend: abort instead of segfault
2025-08-30 16:32:10 +02:00
Johannes Gäßler
3d16b29c3b
scripts: strip "AMD Instinct" from GPU name (#15668)
2025-08-29 22:04:08 +02:00
Aman Gupta
55042b3692
scripts: add sqlite3 check for compare-commits.sh (#15633)
2025-08-28 19:23:22 +08:00
Johannes Gäßler
9ef536907d
scripts: fix compare-llama-bench.py (#15521)
2025-08-23 13:58:58 +03:00
Georgi Gerganov
9ebebef62f
llama : remove KV cache defragmentation logic (#15473)
...
ggml-ci
2025-08-22 12:22:13 +03:00
Georgi Gerganov
60212f1ead
sync : ggml
2025-08-18 22:06:44 +03:00
Georgi Gerganov
f0c541d315
scripts : update sync scripts
2025-08-18 22:06:44 +03:00
Georgi Gerganov
3973163bff
sync : ggml
...
ggml-ci
2025-08-14 14:59:27 +03:00
Johannes Gäßler
4850b52aed
server-bench: external OAI servers, sqlite (#15179)
...
* server-bench: external OAI servers, sqlite
* Update scripts/server-bench.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update scripts/server-bench.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update scripts/server-bench.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* raise_for_status
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-08-08 23:04:36 +02:00
Johannes Gäßler
20638e4f16
scripts: fix crash when --tool is not set (#15133)
2025-08-07 08:50:30 +02:00
R0CKSTAR
3025b621d1
llama-bench: rename DB table name from test to llama_bench (#15003)
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-08-02 17:20:40 +08:00
R0CKSTAR
484b2091ce
compare-commits.sh: support both llama-bench and test-backend-ops (#14392)
...
* compare-commits.sh: support both llama-bench and test-backend-ops
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Speed up the build by specifying -j 12
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Remove build_number from test-backend-ops db
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Apply suggestion from @JohannesGaessler
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Refine tool selection logic
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Address review comments
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-08-01 08:47:27 +08:00
Georgi Gerganov
e32a4ec60e
sync : ggml
...
ggml-ci
2025-07-30 17:33:11 +03:00
Johannes Gäßler
bbd0f91779
server-bench: make seed choice configurable (#14929)
...
* server-bench: make seed choice configurable
* Update scripts/server-bench.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update scripts/server-bench.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* fix error formatting
* Update scripts/server-bench.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-07-29 10:40:50 +02:00
Georgi Gerganov
1f45f2890e
sync : ggml
2025-07-28 08:15:01 +03:00
Aman Gupta
446595b9b3
Docs: add instructions for adding backends (#14889)
2025-07-27 09:36:43 +08:00
Georgi Gerganov
2df255da3c
sync : ggml
...
ggml-ci
2025-07-24 20:27:23 +03:00
Georgi Gerganov
b17230917c
sync : ggml
2025-07-19 11:46:50 +03:00
Johannes Gäßler
5cae766541
scripts: synthetic prompt mode for server-bench.py (#14695)
2025-07-16 09:33:28 +02:00
Johannes Gäßler
494c5899cb
scripts: benchmark for HTTP server throughput (#14668)
...
* scripts: benchmark for HTTP server throughput
* fix server connection reset
2025-07-14 13:14:30 +02:00
Georgi Gerganov
8eff95544e
sync : ggml
2025-07-12 16:13:27 +03:00
Georgi Gerganov
215535701d
sync : ggml
...
ggml-ci
2025-07-12 14:25:44 +03:00
Aman Gupta
11ee0fea2a
Docs: script to auto-generate ggml operations docs (#14598)
...
* Docs: script to auto-generate ggml operations docs
* Review: formatting changes + change github action
* Use built-in types instead of typing
* docs : add BLAS and Metal ops
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-07-10 23:29:01 +08:00
Georgi Gerganov
d4cdd9c1c3
ggml : remove kompute backend (#14501)
...
ggml-ci
2025-07-03 07:48:32 +03:00
Georgi Gerganov
e17991c466
sync : ggml
...
ggml-ci
2025-07-02 20:08:45 +03:00
Georgi Gerganov
f61c05d4b1
sync : ggml
...
ggml-ci
2025-07-01 11:06:39 +03:00
Vedran Miletić
e9b6350e61
scripts : make the shell scripts cross-platform (#14341)
2025-06-30 10:17:18 +02:00
Georgi Gerganov
06cbedfca1
sync : ggml
...
ggml-ci
2025-06-20 21:02:47 +03:00
Georgi Gerganov
d03172cc79
sync : ggml
...
ggml-ci
2025-06-18 09:59:21 +03:00
Aman Gupta
2e42be42bd
compare-llama-bench: add option to plot (#14169)
...
* compare llama-bench: add option to plot
* Address review comments: convert case + add type hints
* Add matplotlib to requirements
* fix tests
* Improve comment and fix assert condition for test
* Add back default test_name, add --plot_log_scale
* use log_scale regardless of x_values
2025-06-14 10:34:20 +02:00
Georgi Gerganov
ae92c1855b
sync : ggml
...
ggml-ci
2025-06-10 18:39:33 +03:00
Georgi Gerganov
b8e2194efc
sync : ggml
...
ggml-ci
2025-06-10 09:21:56 +03:00
Georgi Gerganov
f3a4b1659c
sync : ggml
...
ggml-ci
2025-06-01 13:43:57 +03:00
Georgi Gerganov
53f925074d
sync : vendor (#13901)
...
* sync : vendor
ggml-ci
* cont : fix httplib version
ggml-ci
* cont : fix lint
* cont : fix lint
* vendor : move to common folder /vendor
ggml-ci
* cont : fix lint
* cont : move httplib to /vendor + use json_fwd.hpp
ggml-ci
* cont : fix server build
ggml-ci
* cont : add missing headers
ggml-ci
* cont : header clean-up
ggml-ci
2025-05-30 16:25:45 +03:00
Georgi Gerganov
1c49c70d07
sync : ggml
2025-05-27 18:05:33 +03:00
Georgi Gerganov
a26c4cc11e
scripts : add option to compare commits in Debug (#13806)
...
* scripts : add option to compare commits in Debug
* cont : reuse existing CMAKE_OPTS
2025-05-26 22:24:01 +03:00
Olivier Chafik
f5cd27b71d
server: streaming of tool calls and thoughts when --jinja is on (#12379)
...
* add common_json w/ support for truncated json healing
* add common_chat_msg_diff
* partial common_chat_parse
* refactor parser w/ optionals
* server: wire chat diffs in stream mode
* fix trigger of thinking models (must happen after thoughts are closed)
* fix functionary v3.2 raw python!
* rename: common_chat_syntax (now contains format)
* rm common_regex.at_start
* don't return empty <think></think>
* accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)
* fix QwQ 32B tool call parsing after thoughts (hermes2)
* better logs for grammar triggers
* consume spaces after parse_json_tool_calls
* fix required tool calls w/ thinking models that have pre-opened thinking tags
* fix thinking model's initial trigger + test qwq's template
* run most test_tool_call tests in stream + non-stream modes
* make functionary v3.2 parsing more strict (differentiate first match from others)
* send final diff from server, to close off raw python arguments
* support partial content streaming in Generic mode
* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)
* Update function-calling.md
* Update tool_bench.py
* chat-parser: remove input from exception (llm output may contain PII)
---------
Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>
2025-05-25 01:48:08 +01:00
Georgi Gerganov
d30cb5a7fa
sync : ggml
...
ggml-ci
2025-05-19 13:29:56 +03:00
Sigbjørn Skjæret
be1d4a13db
scripts : fix compare-llama-bench.py show parameter (#13514)
2025-05-14 08:41:01 +02:00
Sigbjørn Skjæret
bf79371120
scripts : support arbitrary input file formats in compare-llama-bench.py (#13455)
2025-05-13 15:31:12 +02:00
Georgi Gerganov
1e2809bc4b
sync : ggml
2025-05-13 14:02:28 +03:00
Sigbjørn Skjæret
09232370fc
scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451)
2025-05-11 16:20:39 +02:00
Georgi Gerganov
d879433824
sync : ggml
...
ggml-ci
2025-05-07 17:28:36 +03:00
Diego Devesa
1d36b3670b
llama : move end-user examples to tools directory (#13249)
...
* llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00
Georgi Gerganov
b34443923c
sync : ggml (#13268)
...
* vulkan : kernels for depthwise 2D convolution (CONV_2D_DW) (ggml/1204)
* vulkan : add kernels for depthwise 2d convolution (OP_CONV_2D_DW)
* review: remove src_x/y < 0 checks; add performance tests
* sync : ggml
ggml-ci
* vulkan : fix lint (#0)
---------
Co-authored-by: Acly <aclysia@gmail.com>
2025-05-02 20:54:30 +03:00
Georgi Gerganov
b1dd4d08e8
sync : ggml
...
ggml-ci
2025-05-01 20:15:34 +03:00