llama.cpp/gguf-py/gguf/tensor_mapping.py at 4d197edebea1e22394206b947c8e4f3a24dcffbe

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-30 08:42:00 +00:00

Sigbjørn Skjæret 84bf3c6778 model : add BailingMoeV2 support (#16063)

* add BailingMoeV2 support

* update llm types

* undo

* undo

* update llm types

* add model collection link

* update

* almost working

* correct group selection and rename n_group_exp

* avoid large top_k and use argmax instead for now

if we had something like argmax2 that would be equivalent, but this works fine until then

* poke

* skip group selection when there are no tokens

* fix 1T conversion

* hopefully fixed expert group selection

third time's the charm?

* make expert group selection generally available

The new LLaDA2Moe model uses this method too, so make it generally available regardless of architecture.

* allow n_expert_groups to be 1 (Kimi K2)

* address review suggestions

2025-10-20 21:38:20 +02:00

71 KiB
