	model : Granite MoE shared (#13269)
* feat: Add GGUF conversion for granitemoeshared
  Branch: GraniteMoEShared
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: hparam and arch plumbing for granitemoeshared
  Branch: GraniteMoEShared
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Split MoE fused tensors for shared experts in conversion
  Branch: GraniteMoEShared
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: First WIP cut at model arch in cpp
  The hparam and architecture plumbing should be correct, but the implementation of the shared experts seems to still be broken.
  Branch: GraniteMoEShared
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Cleaner (maybe more correct?) splitting for gate/up
  Branch: GraniteMoEShared
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Fix the input to the shared experts
  I had misread the model: the shared experts take the input _before_ the standard MoE layer, but I was feeding the output of the MoE to the shared experts (a sketch of the corrected dataflow follows this message).
  Branch: GraniteMoEShared
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Avoid architecture-specific checks for Granite MoE Shared
  This is a cleaner approach that will allow more flexibility in architecture strings going forward.
  Branch: GraniteMoEShared
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* refactor: Split granite architectures out of llm_build_llama
  This helps de-clutter the llama-family graph construction and allows granite to diverge further (in preparation for Granite 4).
  NOTE: I removed the granite scale factors from llm_build_deci because they appear to be there only as copy-paste from llm_build_llama. The HF config does not seem to set those values: https://huggingface.co/Deci/DeciLM-7B/blob/main/config.json
  Branch: GraniteMoEShared
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Fix compiler warning about uninitialized inp_pos
  This should not have been reachable, but it warns on some compilers.
  Branch: GraniteMoEShared
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Consolidate GraniteMoEShared into GraniteMoE for conversion
  Branch: GraniteMoEShared
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Consolidate GraniteMoEShared into GraniteMoE on the C++ side
  Branch: GraniteMoEShared
  Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
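Editor's sketch of the "Fix the input to the shared experts" change: the shared-expert branch is an ordinary SwiGLU FFN applied to the same pre-MoE input that feeds the routed experts, with its result summed into the routed output. This is a minimal illustration using plain ggml ops; the helper name build_shexp_ffn and its signature are hypothetical, not the actual llama.cpp graph code.

    #include "ggml.h"

    // Hypothetical sketch of the shared-expert dataflow, NOT llama.cpp's real code.
    // `cur` is the normalized attention output that also feeds the routed MoE;
    // `moe_out` is the output of the routed experts for the same layer.
    static ggml_tensor * build_shexp_ffn(
            ggml_context * ctx,
            ggml_tensor  * cur,             // pre-MoE input (not the MoE output!)
            ggml_tensor  * moe_out,         // routed experts' output
            ggml_tensor  * ffn_gate_shexp,  // blk.%d.ffn_gate_shexp
            ggml_tensor  * ffn_up_shexp,    // blk.%d.ffn_up_shexp
            ggml_tensor  * ffn_down_shexp)  // blk.%d.ffn_down_shexp
    {
        // standard SwiGLU FFN over the shared-expert weights
        ggml_tensor * gate = ggml_silu(ctx, ggml_mul_mat(ctx, ffn_gate_shexp, cur));
        ggml_tensor * up   = ggml_mul_mat(ctx, ffn_up_shexp, cur);
        ggml_tensor * ffn  = ggml_mul_mat(ctx, ffn_down_shexp, ggml_mul(ctx, gate, up));

        // shared-expert output is added to the routed experts' output
        return ggml_add(ctx, ffn, moe_out);
    }

The bug described above amounts to passing moe_out instead of cur into the gate/up matmuls; the fix feeds both branches from the same input and only combines them at the end.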
@@ -1481,6 +1481,9 @@ static const std::map<llm_arch, std::map<llm_tensor, const char *>> LLM_TENSOR_N
             { LLM_TENSOR_FFN_GATE_EXPS,   "blk.%d.ffn_gate_exps" },
             { LLM_TENSOR_FFN_DOWN_EXPS,   "blk.%d.ffn_down_exps" },
             { LLM_TENSOR_FFN_UP_EXPS,     "blk.%d.ffn_up_exps" },
+            { LLM_TENSOR_FFN_GATE_SHEXP,  "blk.%d.ffn_gate_shexp" },
+            { LLM_TENSOR_FFN_DOWN_SHEXP,  "blk.%d.ffn_down_shexp" },
+            { LLM_TENSOR_FFN_UP_SHEXP,    "blk.%d.ffn_up_shexp" },
         },
     },
     {
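For readers unfamiliar with this table: each entry is a printf-style pattern in which %d is the layer index, so the new shared-expert tensors resolve to per-block names such as blk.0.ffn_gate_shexp. The tiny helper below is only an illustration of that naming scheme (blk_tensor_name is a hypothetical function, not llama.cpp's loader code).

    #include <cstdio>
    #include <string>

    // Hypothetical illustration of how the patterns above expand per layer.
    static std::string blk_tensor_name(const char * pattern, int il) {
        char buf[256];
        snprintf(buf, sizeof(buf), pattern, il);  // e.g. pattern = "blk.%d.ffn_gate_shexp"
        return buf;
    }

    // blk_tensor_name("blk.%d.ffn_gate_shexp", 2) == "blk.2.ffn_gate_shexp"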