llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-27 08:21:30 +00:00

Files

Sigbjørn Skjæret 84bf3c6778 model : add BailingMoeV2 support (#16063)

* add BailingMoeV2 support

* update llm types

* undo

* undo

* update llm types

* add model collection link

* update

* almost working

* correct group selection and rename n_group_exp

* avoid large top_k and use argmax instead for now

if we had something like argmax2 that would be equivalent, but this works fine until then

* poke

* skip group selection when there are no tokens

* fix 1T conversion

* hopefully fixed expert group selection

third time's the charm?

* make expert group selection generally available

The new LLaDA2Moe model uses this method too, so make it generally available regardless of architecture.

* allow n_expert_groups to be 1 (Kimi K2)

* address review suggestions

2025-10-20 21:38:20 +02:00

scripts

gguf-py : add support for endian conversion of BF16 data (#16594)

2025-10-15 22:43:08 +02:00

__init__.py

convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499)

2024-07-18 20:40:15 +10:00

constants.py

model : add BailingMoeV2 support (#16063)

2025-10-20 21:38:20 +02:00

gguf_reader.py

gguf-py : display the invalid gguf type (#13687)