llama.cpp/gguf-py/gguf/constants.py at c429b33beb35f13934a4dfbe0c138d30b45e5d54

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-30 08:42:00 +00:00

fairydreaming fbca2f27fc Add support for ArcticForCausalLM (#7020)

* common : increase max number of experts to 128

* common : add tensor LLM_TENSOR_FFN_NORM_EXPS for normalization before MoE that runs in parallel to attention + ffn

* gguf-py : add architecture-specific block mappings that override selected general block mappings

* convert-hf : add model conversion support for ArcticForCausalLM

* convert-hf : use added_tokens_decoder from tokenizer_config.json to redefine tokens from SentencePiece model (only for ArcticForCausalLM)

* llama : add inference support for LLM_ARCH_ARCTIC

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

2024-05-24 14:31:13 +02:00
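
The gguf-py item in the commit message above (architecture-specific block mappings that override selected general block mappings) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the table names, the helper functions, and the sample tensor names are hypothetical and are not copied from gguf-py.

```python
# Minimal sketch of "architecture-specific block mappings that override
# selected general block mappings". All names here are hypothetical.

# General mapping: target tensor name -> candidate source tensor names.
GENERAL_BLOCK_MAPPINGS = {
    "attn_norm": ("model.layers.{bid}.input_layernorm",),
    "ffn_norm": ("model.layers.{bid}.post_attention_layernorm",),
}

# Per-architecture overrides. Only the listed keys are replaced or added;
# every other general entry stays in effect.
ARCH_BLOCK_MAPPINGS = {
    "arctic": {
        "ffn_norm": ("model.layers.{bid}.residual_layernorm",),
        "ffn_norm_exps": ("model.layers.{bid}.post_attention_layernorm",),
    },
}


def resolve_block_mappings(arch):
    """Return the general mappings with the overrides for `arch` applied."""
    merged = dict(GENERAL_BLOCK_MAPPINGS)  # copy, so the general table is untouched
    merged.update(ARCH_BLOCK_MAPPINGS.get(arch, {}))
    return merged


def source_to_target(arch, n_blocks):
    """Expand the merged mappings into a source-name -> target-name lookup."""
    table = {}
    for target, sources in resolve_block_mappings(arch).items():
        for bid in range(n_blocks):
            for src in sources:
                table[src.format(bid=bid)] = f"blk.{bid}.{target}"
    return table


if __name__ == "__main__":
    for name, target in sorted(source_to_target("arctic", 1).items()):
        print(name, "->", target)
```

In this sketch the same source tensor name resolves to a different target depending on the architecture, which is the reason an override table is needed instead of a single global mapping.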

34 KiB
