llama.cpp/examples/retrieval/retrieval.cpp at 3a7ac5300a7e8ebbe4a3eb5aff9dba11ed76ea61

Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2025-10-31 08:51:55 +00:00


Douglas Hanley 80ea089d77 llama : allow pooled embeddings on any model (#7477)

* create append_pooling operation; allow specifying attention_type; add last token pooling; update examples

* find result_norm/result_embd tensors properly; update output allocation logic

* only use embd output for pooling_type NONE

* get rid of old causal_attn accessor

* take out attention_type; add in llama_set_embeddings

* bypass logits when doing non-NONE pooling

2024-06-21 08:38:22 +03:00
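The commit above concerns how per-token embeddings are pooled into a single sequence embedding, including the newly added last-token pooling. As a rough illustration only, the following self-contained C++ sketch shows mean, first-token and last-token pooling over a row-major buffer of token embeddings, followed by L2 normalization. The `Pooling` enum and the `pool` and `normalize` helpers are hypothetical and are not part of the llama.cpp API; they do not reproduce llama.cpp's own implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical pooling modes for illustration; not the llama.cpp enum.
enum class Pooling { Mean, First, Last };

// Pool row-major token embeddings [n_tokens x n_embd] into one vector.
std::vector<float> pool(const std::vector<float>& tok, std::size_t n_tokens,
                        std::size_t n_embd, Pooling mode) {
    assert(n_tokens > 0 && tok.size() == n_tokens * n_embd);
    std::vector<float> out(n_embd, 0.0f);
    switch (mode) {
        case Pooling::Mean:
            // Sum every token's vector, then divide by the token count.
            for (std::size_t t = 0; t < n_tokens; ++t)
                for (std::size_t d = 0; d < n_embd; ++d)
                    out[d] += tok[t * n_embd + d];
            for (float& v : out) v /= static_cast<float>(n_tokens);
            break;
        case Pooling::First:
            // Take the first token's vector (CLS-style pooling).
            for (std::size_t d = 0; d < n_embd; ++d) out[d] = tok[d];
            break;
        case Pooling::Last:
            // Take the final token's vector (last-token pooling).
            for (std::size_t d = 0; d < n_embd; ++d)
                out[d] = tok[(n_tokens - 1) * n_embd + d];
            break;
    }
    return out;
}

// L2-normalize in place so cosine similarity reduces to a dot product.
void normalize(std::vector<float>& v) {
    double s = 0.0;
    for (float x : v) s += static_cast<double>(x) * x;
    const double n = std::sqrt(s);
    if (n > 0.0) {
        for (float& x : v) x = static_cast<float>(x / n);
    }
}
```

Last-token pooling typically suits causal (decoder-only) models, because only the final position has attended to the whole sequence; mean pooling is a common default for bidirectional encoders.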

10 KiB
