llama.cpp/examples/server/tests/utils.py at b3b6d862cfdf190e1b9ad961639a25f5ebc0c7e3

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-03 09:22:01 +00:00

Georgi Gerganov a19b5cef16 llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)

* ggml : FA supports F32 V

* graph : cast KV to F16 when the KV cache is not used

ggml-ci

* server : add test that exercises embeddings with FA enabled

ggml-ci

2025-04-08 19:54:51 +03:00

15 KiB
