Daniel Bevenius
04e632a4aa
ci : remove missing reranker model files ( #16444 )
...
This commit removes jina-reranker-v1-tiny-en model files that are no
longer present on Hugging Face.
The motivation for this is that it clears the 404 errors out of the CI logs,
which can be a little confusing when looking at the logs for the first time.
Refs: https://github.com/ggml-org/llama.cpp/actions/runs/18070620247/job/51419855630#step:5:2649
2025-10-06 14:56:59 +02:00
Georgi Gerganov
bbd32bc038
ci : fix clean-up of old logs ( #16381 )
2025-10-02 10:35:43 +03:00
Georgi Gerganov
d72f5f7ba2
ci : add AMD runners and workflows ( #16249 )
...
* ci : add AMD runners and workflows
* ci : move AMD jobs to separate workflow
* cont : fix paths
2025-09-29 17:51:48 +03:00
R0CKSTAR
a86a580a66
musa: upgrade musa sdk to 4.3.0 ( #16240 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-09-26 02:56:38 +02:00
Eve
bee378e098
ci: run the x64 and arm ci on the github machines instead ( #16183 )
...
* run the x64 ci on regular machines
* set up the same thing for arm
fix test-quantize-perf just like #12306
* try to disable sve
* add another sve run
2025-09-25 08:06:06 +03:00
Georgi Gerganov
0889589dbe
ci : enable Vulkan workflow on Mac ( #16194 )
2025-09-23 13:44:25 +03:00
Georgi Gerganov
1d660d2fae
ci : use smaller model ( #16168 )
...
* ci : switch from gemma to qwen3 0.6b
* ci : use smaller model for some tests
2025-09-22 09:11:39 +03:00
Georgi Gerganov
4d0a7cbc61
ci : adjust params for less runtime ( #16167 )
...
* ci : adjust params for less runtime
* ci : gate BF16 on some hardware
* ci : move extra tests to Arm runner
2025-09-22 08:31:40 +03:00
Georgi Gerganov
28baac9c9f
ci : migrate ggml ci to self-hosted runners ( #16116 )
...
* ci : migrate ggml ci to self-hosted runners
* ci : add T4 runner
* ci : add instructions for adding self-hosted runners
* ci : disable test-backend-ops from debug builds due to slowness
* ci : add AMD V710 runner (vulkan)
* cont : add ROCM workflow
* ci : switch to qwen3 0.6b model
* cont : fix the context size
2025-09-21 16:50:45 +03:00
Georgi Gerganov
0320ac5264
metal : refactor + optimize v2 ( #15995 )
...
* metal : improve naming
* metal : refactor device
ggml-ci
* cont : props
ggml-ci
* metal : apply ggml_mem_ranges_t
ggml-ci
* metal : remove GGML_METAL_USE_BF16
ggml-ci
* metal : refactor device buffer
ggml-ci
* cont : fix naming
* metal : sync before destroying the backend
ggml-ci
* metal : refactor context
ggml-ci
* metal : migrate ggml-metal.m to ggml-metal.cpp
ggml-ci
* metal : adjust ops API
ggml-ci
* metal : use C++ to store pipelines
ggml-ci
* metal : migrate ops to separate functions
ggml-ci
* metal : add ggml_metal_library_t
ggml-ci
* metal : improve naming
ggml-ci
* metal : cleanup
ggml-ci
* metal : add support for GGML_OP_LOG
ggml-ci
* metal : fix error handling
ggml-ci
2025-09-17 20:38:12 +03:00
Georgi Gerganov
55758b00ca
metal : refactor kernel loading ( #15964 )
...
* metal : refactor bin kernels loading
ggml-ci
* metal : refactor rms kernel loading
ggml-ci
* ci : try to add memory leaks check
ggml-ci
* ci : try to enable memory leak detection for Mac
* cont : seems to be working
2025-09-13 16:24:22 +03:00
Sigbjørn Skjæret
7d3c9f2b21
ci : explicitly set fa off or on ( #15692 )
2025-08-31 15:30:20 +02:00
Georgi Gerganov
30649cab65
ci : continue file download with wget ( #15471 )
...
ggml-ci
2025-08-21 13:42:55 +03:00
R0CKSTAR
3f4fc97f1d
musa: upgrade musa sdk to rc4.2.0 ( #14498 )
...
* musa: apply mublas API changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: update musa version to 4.2.0
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: restore MUSA graph settings in CMakeLists.txt
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: disable mudnnMemcpyAsync by default
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: switch back to non-mudnn images
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* minor changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: restore rc in docker image tag
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-07-24 20:05:37 +01:00
Reese Levine
21c021745d
ggml: Add initial WebGPU backend ( #14521 )
...
* Minimal setup of webgpu backend with dawn. Just prints out the adapter and segfaults
* Initialize webgpu device
* Making progress on setting up the backend
* Finish more boilerplate/utility functions
* Organize file and work on alloc buffer
* Add webgpu_context to prepare for actually running some shaders
* Work on memset and add shader loading
* Work on memset polyfill
* Implement set_tensor as webgpu WriteBuffer, remove host_buffer stubs since webgpu doesn't support it
* Implement get_tensor and buffer_clear
* Finish rest of setup
* Start work on compute graph
* Basic mat mul working
* Work on emscripten build
* Basic WebGPU backend instructions
* Use EMSCRIPTEN flag
* Work on passing ci, implement 4d tensor multiplication
* Pass thread safety test
* Implement permuting for mul_mat and cpy
* minor cleanups
* Address feedback
* Remove division by type size in cpy op
* Fix formatting and add github action workflows for vulkan and metal (m-series) webgpu backends
* Fix name
* Fix macos dawn prefix path
2025-07-16 18:18:51 +03:00
Vedran Miletić
e9b6350e61
scripts : make the shell scripts cross-platform ( #14341 )
2025-06-30 10:17:18 +02:00
Sigbjørn Skjæret
88fc854b4b
llama : improve sep token handling ( #14272 )
2025-06-20 14:04:09 +02:00
Diego Devesa
6adc3c3ebc
llama : add thread safety test ( #14035 )
...
* llama : add thread safety test
* llamafile : remove global state
* llama : better LLAMA_SPLIT_MODE_NONE logic
when main_gpu < 0 GPU devices are not used
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-16 08:11:43 -07:00
pockers21
146b88e8b3
ci: fix CUDA build failure on autodl cloud machines ( #14005 )
...
Replace CMAKE_CUDA_ARCHITECTURES=native with nvidia-smi detection
as 'native' fails on autodl cloud environments.
Co-authored-by: pockers21 <liyang2@uniontech.com>
2025-06-05 16:25:29 +03:00
R0CKSTAR
33983057d0
musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy ( #13647 )
...
* musa: fix build warning (unused parameter)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: upgrade MUSA SDK version to rc4.0.1
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Update ggml/src/ggml-cuda/cpy.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-05-21 09:58:49 +08:00
Diego Devesa
1d36b3670b
llama : move end-user examples to tools directory ( #13249 )
...
* llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00
Xuan-Son Nguyen
e391d3ee8d
ci : no curl on ggml-ci ( #12796 )
2025-04-07 15:37:28 +03:00
Atharva Dubey
2004644b7a
ci : add env variable in ggml-ci and document the same in SYCL.md ( #12736 )
2025-04-03 15:12:39 +03:00
R0CKSTAR
492d7f1ff7
musa: fix all warnings, re-enable -DLLAMA_FATAL_WARNINGS=ON in ci and update doc ( #12611 )
...
* musa: fix all warnings
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: enable -DLLAMA_FATAL_WARNINGS=ON in run.sh
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: update ci doc (install ccache)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* fix Windows build issue
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Address review comments
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Address review comments
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-03-30 10:59:38 +02:00
R0CKSTAR
fd7855f8f5
doc: [MUSA] minor changes ( #12583 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-03-26 09:09:48 +02:00
R0CKSTAR
3cd3a39532
ci: [MUSA] add CI and update doc ( #12562 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-03-25 09:45:08 +02:00
Akarshan Biswas
c95fa362b3
ci: [SYCL] ggml-ci Use main GPU and enable sysman ( #12547 )
2025-03-24 19:35:38 +02:00
Akarshan Biswas
48d7021c61
CI: fix SYCL build ( #12546 )
2025-03-24 14:58:32 +02:00
Georgi Gerganov
ea002810a2
ci : fix save-load test invocations ( #12245 )
2025-03-07 12:19:31 +02:00
Georgi Gerganov
68ff663a04
repo : update links to new url ( #11886 )
...
* repo : update links to new url
ggml-ci
* cont : more urls
ggml-ci
2025-02-15 16:40:57 +02:00
Xuan Son Nguyen
b4d92a59a2
ci : add -no-cnv for tests ( #11238 )
2025-01-14 16:42:23 +02:00
Wang Qin
5c7a5aa0c3
ci: add error handling for Python venv creation in run.sh ( #10608 )
2024-12-01 20:11:42 +02:00
Georgi Gerganov
ec450d3bbf
metal : opt-in compile flag for BF16 ( #10218 )
...
* metal : opt-in compile flag for BF16
ggml-ci
* ci : use BF16
ggml-ci
* swift : switch back to v12
* metal : has_float -> use_float
ggml-ci
* metal : fix BF16 check in MSL
ggml-ci
2024-11-08 21:59:46 +02:00
Georgi Gerganov
1926d6e39d
llama : adjust default context size + print warnings ( #10136 )
...
* llama : adjust default context size + print warnings
ggml-ci
* ggml-ci : add missing gpu-layers + adjust context sizes
2024-11-02 15:18:56 +02:00
Georgi Gerganov
40f2555797
ci : fix cmake flags for SYCL
2024-10-24 21:23:33 +03:00
Georgi Gerganov
8c475b97b8
rerank : use [SEP] token instead of [BOS] ( #9737 )
...
* rerank : use [SEP] token instead of [BOS]
ggml-ci
* common : sanity check for non-NULL tokens
ggml-ci
* ci : adjust rank score interval
ggml-ci
* ci : add shebang to run.sh
ggml-ci
2024-10-05 15:55:04 +03:00
Georgi Gerganov
f4d2b8846a
llama : add reranking support ( #9510 )
...
* py : add XLMRobertaForSequenceClassification [no ci]
* py : fix scalar-tensor conversion [no ci]
* py : fix position embeddings chop [no ci]
* llama : read new cls tensors [no ci]
* llama : add classification head (wip) [no ci]
* llama : add "rank" pooling type
ggml-ci
* server : add rerank endpoint
ggml-ci
* llama : avoid ggml_repeat during classification
* rerank : cleanup + comments
* server : accept /rerank endpoint in addition to /v1/rerank [no ci]
* embedding : parse special tokens
* jina : support v1 reranker
* vocab : minor style
ggml-ci
* server : initiate tests for later
ggml-ci
* server : add docs
* llama : add comment [no ci]
* llama : fix uninitialized tensors
* ci : add rerank tests
ggml-ci
* add reranking test
* change test data
* Update examples/server/server.cpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* add `--reranking` argument
* update server docs
* llama : fix comment [no ci]
ggml-ci
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-09-28 17:42:03 +03:00
Georgi Gerganov
6262d13e0b
common : reimplement logging ( #9418 )
...
https://github.com/ggerganov/llama.cpp/pull/9418
2024-09-15 20:46:12 +03:00
Georgi Gerganov
7a3df798fc
ci : add VULKAN support to ggml-ci ( #9055 )
2024-08-26 12:19:39 +03:00
slaren
f12ceaca0c
ggml-ci : try to improve build time ( #9160 )
2024-08-26 11:03:30 +02:00
Alex Tuddenham
4090ea5501
ci : add checks for cmake,make and ctest in ci/run.sh ( #8200 )
...
* Added checks for cmake,make and ctest
* Removed erroneous whitespace
2024-07-07 17:59:14 +03:00
Georgi Gerganov
e235b267a2
py : switch to snake_case ( #8305 )
...
* py : switch to snake_case
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* cont : fix link
* gguf-py : use snake_case in scripts entrypoint export
* py : rename requirements for convert_legacy_llama.py
Needed for scripts/check-requirements.sh
---------
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-07-05 07:53:33 +03:00
ditsuke
821922916f
fix: Update script paths in CI scripts
2024-07-04 15:39:13 +00:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake ( #8006 )
...
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122 )
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00
Olivier Chafik
1c641e6aac
build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809 )
...
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew
* server: update refs -> llama-server
gitignore llama-server
* server: simplify nix package
* main: update refs -> llama
fix examples/main ref
* main/server: fix targets
* update more names
* Update build.yml
* rm accidentally checked in bins
* update straggling refs
* Update .gitignore
* Update server-llm.sh
* main: target name -> llama-cli
* Prefix all example bins w/ llama-
* fix main refs
* rename {main->llama}-cmake-pkg binary
* prefix more cmake targets w/ llama-
* add/fix gbnf-validator subfolder to cmake
* sort cmake example subdirs
* rm bin files
* fix llama-lookup-* Makefile rules
* gitignore /llama-*
* rename Dockerfiles
* rename llama|main -> llama-cli; consistent RPM bin prefixes
* fix some missing -cli suffixes
* rename dockerfile w/ llama-cli
* rename(make): llama-baby-llama
* update dockerfile refs
* more llama-cli(.exe)
* fix test-eval-callback
* rename: llama-cli-cmake-pkg(.exe)
* address gbnf-validator unused fread warning (switched to C++ / ifstream)
* add two missing llama- prefixes
* Updating docs for eval-callback binary to use new `llama-` prefix.
* Updating a few lingering doc references for rename of main to llama-cli
* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.
* Updating documentation references for lookup-merge and export-lora
* Updating two small `main` references missed earlier in the finetune docs.
* Update apps.nix
* update grammar/README.md w/ new llama-* names
* update llama-rpc-server bin name + doc
* Revert "update llama-rpc-server bin name + doc"
This reverts commit e474ef1df4.
* add hot topic notice to README.md
* Update README.md
* Update README.md
* rename gguf-split & quantize bins refs in **/tests.sh
---------
Co-authored-by: HanClinto <hanclinto@gmail.com>
2024-06-13 00:41:52 +01:00
Galunid
9c4c9cc83f
Move convert.py to examples/convert-legacy-llama.py ( #7430 )
...
* Move convert.py to examples/convert-no-torch.py
* Fix CI, scripts, readme files
* convert-no-torch -> convert-legacy-llama
* Move vocab thing to vocab.py
* Fix convert-no-torch -> convert-legacy-llama
* Fix lost convert.py in ci/run.sh
* Fix imports
* Fix gguf not imported correctly
* Fix flake8 complaints
* Fix check-requirements.sh
* Get rid of ADDED_TOKENS_FILE, FAST_TOKENIZER_FILE
* Review fixes
2024-05-30 21:40:00 +10:00
Georgi Gerganov
55ac3b7aea
ci : use Pythia models instead of OpenLlama ( #7470 )
...
* ci : start using Pythia models over OpenLlama
ggml-ci
* ci : disable q2_k ppl tests
* ci : use convert-hf-to-gguf.py
* ci : update gg_get_model
* ci : fix convert outfile name
ggml-ci
* llama : gptneox arch use F32 attn prec
ggml-ci
2024-05-23 15:28:14 +03:00
Georgi Gerganov
e84b71c2c6
ggml : drop support for QK_K=64 ( #7473 )
...
* ggml : drop support for QK_K=64
ggml-ci
* opencl : restore QK_K=256 define
2024-05-23 10:00:21 +03:00
slaren
b228aba91a
remove convert-lora-to-ggml.py ( #7204 )
2024-05-12 02:29:33 +02:00
Georgi Gerganov
947d3ad27d
ci : add GG_BUILD_EXTRA_TESTS_0 env ( #7098 )
...
* ci : add GG_BUILD_EXTRA_TESTS_0 env
ggml-ci
* Update run.sh
ggml-ci
2024-05-07 11:08:49 +03:00