llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-11-03 09:22:01 +00:00)

Files

Johannes Gäßler e81b8e4b7f llama: use FA + max. GPU layers by default (#15434)

* llama: use max. GPU layers by default, auto -fa

* ggml-backend: abort instead of segfault

2025-08-30 16:32:10 +02:00

test_basic.py

test_chat_completion.py

test_completion.py

test_ctx_shift.py

test_embedding.py

test_infill.py

test_lora.py

test_rerank.py

test_security.py

test_slot_save.py

test_speculative.py

test_template.py

test_tokenize.py

test_tool_call.py

test_vision_api.py