	 fb15d649ed
			
		
	
This commit adds support for the EmbeddingGemma 300m model. This model supports sliding window attention (SWA), and a new swa_type is introduced to support symmetric SWA masking.

This commit also extracts the code from the function llama_is_masked_swa in llama-impl.h, so that the logic can be shared by both llm_graph_input_attn_no_cache::set_input and llama_kv_cache::set_input_kq_mask.

With this commit the EmbeddingGemma 300m model can be converted to GGUF and used with llama.cpp. Once the model has been uploaded to HuggingFace it can be used like this:

```console
./build/bin/llama-cli -hf ggml-org/embeddinggemma-300m-GGUF:Q8_0
```
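For illustration, here is a minimal sketch of how a shared SWA masking helper could distinguish the standard (causal) window from a symmetric one. The enum values, the helper name, and the exact window arithmetic below are illustrative assumptions, not the actual llama_is_masked_swa implementation; the real logic lives in llama-impl.h and the two set_input callers mentioned above.

```cpp
// Illustrative sketch only: names and exact semantics are assumptions,
// not the actual llama.cpp implementation.
#include <cstdint>
#include <cstdlib>

enum swa_type_t {
    SWA_TYPE_NONE,
    SWA_TYPE_STANDARD,  // causal window: a token sees only the previous n_swa positions
    SWA_TYPE_SYMMETRIC, // symmetric window: a token sees roughly n_swa/2 positions on each side
};

// Returns true if attention from query position p1 to key position p0 should be masked.
static bool is_masked_swa_sketch(swa_type_t type, int32_t n_swa, int32_t p0, int32_t p1) {
    switch (type) {
        case SWA_TYPE_STANDARD:
            // key is too far in the past relative to the query
            return p1 - p0 >= n_swa;
        case SWA_TYPE_SYMMETRIC:
            // key lies outside a window centered on the query position
            return std::abs(p1 - p0) > n_swa / 2;
        case SWA_TYPE_NONE:
        default:
            return false;
    }
}

// Both the no-cache graph input and the KV-cache mask builder could then call
// the same helper when filling the KQ mask, which is the sharing described above.
int main() {
    // with a window of 4: standard SWA masks keys 4+ steps behind the query,
    // symmetric SWA masks keys more than 2 steps away in either direction
    const bool masked_standard  = is_masked_swa_sketch(SWA_TYPE_STANDARD,  4, /*p0=*/0, /*p1=*/5); // true
    const bool masked_symmetric = is_masked_swa_sketch(SWA_TYPE_SYMMETRIC, 4, /*p0=*/3, /*p1=*/5); // false
    return (masked_standard && !masked_symmetric) ? 0 : 1;
}
```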
		
			
				
	
	
	
		