	model: Add support for PhiMoE arch (#11003)
* model: support phimoe
* python linter
* doc: minor
* doc: add phimoe as supported model

ggml-ci

Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
@@ -28,7 +28,7 @@ The required steps to implement for an HF model are:
 ```python
 @Model.register("MyModelForCausalLM")
 class MyModel(Model):
-    model_arch = gguf.MODEL_ARCH.GROK
+    model_arch = gguf.MODEL_ARCH.MYMODEL
 ```

 2. Define the layout of the GGUF tensors in [constants.py](/gguf-py/gguf/constants.py)
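For step 2, the layout is declared in the `MODEL_ARCH`, `MODEL_ARCH_NAMES` and `MODEL_TENSORS` tables of `constants.py`. Below is a minimal sketch for the hypothetical `MYMODEL` architecture from the snippet above; the tensor list is illustrative (a typical decoder-only layout), not the actual PhiMoE entry.

```python
# gguf-py/gguf/constants.py -- illustrative additions for the hypothetical
# MYMODEL architecture (a sketch; the file already imports IntEnum/auto)

class MODEL_ARCH(IntEnum):
    # ... existing architectures ...
    MYMODEL = auto()  # register the new architecture enum

# the name under which the architecture is written into GGUF metadata
MODEL_ARCH_NAMES[MODEL_ARCH.MYMODEL] = "mymodel"

# tensors the architecture may contain -- a typical decoder-only layout
MODEL_TENSORS[MODEL_ARCH.MYMODEL] = [
    MODEL_TENSOR.TOKEN_EMBD,
    MODEL_TENSOR.OUTPUT_NORM,
    MODEL_TENSOR.OUTPUT,
    MODEL_TENSOR.ATTN_NORM,
    MODEL_TENSOR.ATTN_Q,
    MODEL_TENSOR.ATTN_K,
    MODEL_TENSOR.ATTN_V,
    MODEL_TENSOR.ATTN_OUT,
    MODEL_TENSOR.FFN_NORM,
    MODEL_TENSOR.FFN_GATE,
    MODEL_TENSOR.FFN_DOWN,
    MODEL_TENSOR.FFN_UP,
]
```

Each `MODEL_TENSOR` entry maps to a canonical GGUF tensor name through the existing `TENSOR_NAMES` table in the same file, so no per-architecture name strings are needed for a standard layout.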
@@ -79,14 +79,14 @@ Depending on the model configuration, tokenizer, code and tensors layout, you wi
 - `Model#set_vocab`
 - `Model#write_tensors`

-NOTE: Tensor names must end with `.weight` suffix, that is the convention and several tools like `quantize` expect this to proceed the weights.
+NOTE: Tensor names must end with a `.weight` or `.bias` suffix; that is the convention, and several tools like `quantize` rely on it when processing the weights.

 ### 2. Define the model architecture in `llama.cpp`

 The model params and tensors layout must be defined in `llama.cpp`:
 1. Define a new `llm_arch`
 2. Define the tensors layout in `LLM_TENSOR_NAMES`
-3. Add any non standard metadata in `llm_load_hparams`
+3. Add any non-standard metadata in `llm_load_hparams`
 4. Create the tensors for inference in `llm_load_tensors`
 5. If the model has a RoPE operation, add the rope type in `llama_rope_type`

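The hunk above also lists the `Model` methods that commonly need overriding during conversion. As a hedged sketch of `Model#set_gguf_parameters` for a mixture-of-experts model such as PhiMoE — the `config.json` keys used below (`num_local_experts`, `num_experts_per_tok`) are assumptions for illustration, not the exact PhiMoE change:

```python
# convert_hf_to_gguf.py -- a minimal sketch of overriding set_gguf_parameters
# for the hypothetical MyModel class registered earlier

@Model.register("MyModelForCausalLM")
class MyModel(Model):
    model_arch = gguf.MODEL_ARCH.MYMODEL

    def set_gguf_parameters(self):
        super().set_gguf_parameters()  # write the common metadata first
        # add metadata the base implementation does not cover, e.g. the
        # expert counts of a mixture-of-experts model (keys are assumptions)
        self.gguf_writer.add_expert_count(self.hparams["num_local_experts"])
        self.gguf_writer.add_expert_used_count(self.hparams["num_experts_per_tok"])
```

For metadata without a dedicated helper, check `gguf-py/gguf/gguf_writer.py` for the closest `add_*` method before adding a new key.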
@@ -96,9 +96,9 @@ NOTE: The dimensions in `ggml` are typically in the reverse order of the `pytorc

 This is the funniest part: you have to provide the inference graph implementation of the new model architecture in `llama_build_graph`.

-Have a look at existing implementation like `build_llama`, `build_dbrx` or `build_bert`.
+Have a look at existing implementations like `build_llama`, `build_dbrx` or `build_bert`.

-When implementing a new graph, please note that the underlying `ggml` backends might not support them all, support for missing backend operations can be added in another PR.
+Some `ggml` backends do not support all operations. Backend implementations can be added in a separate PR.

 Note: to debug the inference graph, you can use [llama-eval-callback](/examples/eval-callback/).
Author: Pierrick Hymbert