llama.cpp/examples/server/tests/utils.py at b32efad2bc42460637c3a364c9554ea8217b3d7f

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-30 08:42:00 +00:00

Georgi Gerganov a19b5cef16 llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)

* ggml : FA supports F32 V

* graph : cast KV to F16 when the KV cache is not used

ggml-ci

* server : add test that exercises embeddings with FA enabled

ggml-ci

2025-04-08 19:54:51 +03:00

15 KiB
