llama: use FA + max. GPU layers by default (#15434)

* llama: use max. GPU layers by default, auto -fa

* ggml-backend: abort instead of segfault
This commit is contained in:
Johannes Gäßler
2025-08-30 16:32:10 +02:00
committed by GitHub
parent 38ad381f9f
commit e81b8e4b7f
19 changed files with 235 additions and 72 deletions

View File

@@ -59,3 +59,5 @@ std::string llama_format_tensor_shape(const std::vector<int64_t> & ne);
std::string llama_format_tensor_shape(const struct ggml_tensor * t);
std::string gguf_kv_to_str(const struct gguf_context * ctx_gguf, int i);
#define LLAMA_TENSOR_NAME_FATTN "__fattn__"