llama.cpp/examples/server_embd.py at 34c9d765bf173c551398f1e7fa4595019bc53bab

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-29 08:41:22 +00:00

Georgi Gerganov a19b5cef16 llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)

* ggml : FA supports F32 V

* graph : cast KV to F16 when the KV cache is not used

ggml-ci

* server : add test that exercises embeddings with FA enabled

ggml-ci

2025-04-08 19:54:51 +03:00

969 B
