commit ef0144c087
Author: Sam
Date:   2025-08-04 20:29:25 +02:00

model: support GLM 4.5 family of models (#14939)

						* model: Add GLM 4.5 (#14921)
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Merge in PR suggestions
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* model: Add GLM 4.5 family of models (#14921)
1. Updated tensor_mapping.py with NextN tensor mappings
- Added proper tensor mappings for all NextN/MTP tensors in gguf-py/gguf/tensor_mapping.py
- Added mappings for: eh_proj, embed_tokens, enorm, hnorm, shared_head.head, shared_head.norm
2. Added num_nextn_predict_layers configuration (see the sketch after this list)
- Added LLM_KV_NUM_NEXTN_PREDICT_LAYERS constant to llama-arch.h and llama-arch.cpp
- Added num_nextn_predict_layers field to llama_hparams struct
- Updated GLM4_MOE parameter loading in llama-model.cpp to read this parameter
- Modified tensor loading logic to conditionally load NextN tensors based on num_nextn_predict_layers
- Added GGUF writer support in gguf_writer.py with add_num_nextn_predict_layers() method
- Updated conversion script to extract and write this parameter from HuggingFace config
3. Added FIM tokens for GLM4_MOE
- Added GLM-4.5's FIM tokens to llama-vocab.cpp:
  - <|code_prefix|> for FIM_PRE
  - <|code_suffix|> for FIM_SUF
  - <|code_middle|> for FIM_MID
4. Removed manual NextN tensor handling
- Removed the special-case handling in convert_hf_to_gguf.py that manually mapped NextN tensors
- NextN tensors are now handled automatically through the proper tensor mapping system
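
The conditional loading described in item 2 can be illustrated with a small self-contained sketch. The struct and helper below are hypothetical stand-ins for the real `llama_hparams` field and loader logic, not the actual llama-model.cpp code; in particular, the assumption that NextN/MTP tensors live in the trailing `num_nextn_predict_layers` layers is an illustration, not a statement about the GGUF layout:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for the hparams fields described in item 2;
// the new field defaults to 0, so models without NextN/MTP layers
// skip the extra tensors entirely.
struct hparams_t {
    uint32_t n_layer                  = 32; // illustrative layer count
    uint32_t num_nextn_predict_layers = 0;  // read from LLM_KV_NUM_NEXTN_PREDICT_LAYERS
};

// Sketch: NextN/MTP tensors (eh_proj, embed_tokens, enorm, hnorm,
// shared_head.*) are assumed to exist only in the trailing
// num_nextn_predict_layers layers.
static bool has_nextn_tensors(const hparams_t & hp, uint32_t il) {
    return il >= hp.n_layer - hp.num_nextn_predict_layers;
}

int main() {
    hparams_t hp;
    hp.num_nextn_predict_layers = 1; // e.g. a single MTP layer

    for (uint32_t il = hp.n_layer - 2; il < hp.n_layer; ++il) {
        std::printf("layer %2u: load NextN tensors? %s\n",
                    il, has_nextn_tensors(hp, il) ? "yes" : "no");
    }
    return 0;
}
```
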
* glm 4.5: update tensor names
* model: glm 4.5 apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* model: glm 4.5 apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* model: glm 4.5 apply suggestions from code review
* Apply suggestions from code review
* patch broken chat template
* typings fix
* add TENSOR_SKIP flag
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* Update src/llama-model-loader.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com> 
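
A minimal sketch of how the three FIM sentinels named in item 3 compose into an infill prompt, assuming the common prefix/suffix/middle (PSM) layout; in practice llama.cpp resolves these token IDs through the vocab API (e.g. `llama_vocab_fim_pre`) rather than splicing literal strings:

```cpp
#include <iostream>
#include <string>

int main() {
    // The three GLM-4.5 FIM sentinels from the commit message:
    const std::string fim_pre = "<|code_prefix|>"; // mapped to FIM_PRE
    const std::string fim_suf = "<|code_suffix|>"; // mapped to FIM_SUF
    const std::string fim_mid = "<|code_middle|>"; // mapped to FIM_MID

    const std::string prefix = "def add(a, b):\n    ";
    const std::string suffix = "\n";

    // PSM layout: the model generates the missing "middle" after
    // seeing the prefix and the suffix, i.e. after fim_mid.
    const std::string prompt = fim_pre + prefix + fim_suf + suffix + fim_mid;
    std::cout << prompt << std::endl;
    return 0;
}
```
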

commit e0e912f49b
Author: Diego Devesa
Date:   2025-04-02 14:52:01 +02:00

llama : add option to override model tensor buffers (#11397)

						* llama : add option to override tensor buffers
* ggml : fix possible underflow in ggml_nbytes 
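
The second bullet refers to a classic unsigned-arithmetic hazard: size formulas of the shape `(ne - 1) * stride` wrap around when an extent is zero. The sketch below reproduces the failure mode and a guard in isolation; the formula and constants are illustrative, not the actual `ggml_nbytes` code:

```cpp
#include <cstdint>
#include <cstdio>

// The usual "offset of the last element plus its size" formula:
// type_size + (ne - 1)*nb. When ne == 0, (ne - 1) is -1, which the
// size_t multiplication converts to SIZE_MAX, so the result wraps.
static size_t nbytes_unguarded(int64_t ne, size_t nb, size_t type_size) {
    return type_size + (ne - 1) * nb;
}

// The guard: an empty tensor occupies no bytes, so return 0 before
// the subtraction can underflow.
static size_t nbytes_guarded(int64_t ne, size_t nb, size_t type_size) {
    if (ne <= 0) {
        return 0;
    }
    return type_size + (ne - 1) * nb;
}

int main() {
    // zero elements, 32-byte stride (e.g. a row of a padded view)
    std::printf("unguarded: %zu\n", nbytes_unguarded(0, 32, 4)); // near SIZE_MAX
    std::printf("guarded:   %zu\n", nbytes_guarded(0, 32, 4));   // 0
    return 0;
}
```
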

commit 681149ced2
Author: Xuan Son Nguyen
Date:   2025-01-16 13:54:08 +01:00

llama : add llama_model_load_from_splits (#11255)

						* llama : add `llama_model_load_from_splits`
* update 
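
A minimal usage sketch for the new entry point, assuming the llama.h signature of an array of split paths plus the usual model params; the file names are placeholders and error handling is kept to the bare minimum:

```cpp
#include <cstdio>
#include "llama.h"

int main() {
    llama_backend_init();

    // The caller lists the split files explicitly instead of passing a
    // single split path and letting the loader derive the siblings.
    const char * paths[] = {
        "model-00001-of-00002.gguf", // placeholder split names
        "model-00002-of-00002.gguf",
    };

    llama_model_params params = llama_model_default_params();

    llama_model * model = llama_model_load_from_splits(paths, /*n_paths=*/2, params);
    if (model == nullptr) {
        std::fprintf(stderr, "failed to load model from splits\n");
        llama_backend_free();
        return 1;
    }

    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```
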

commit afa8a9ec9b
Author: Georgi Gerganov
Date:   2025-01-12 11:32:42 +02:00

llama : add llama_vocab, functions -> methods, naming (#11110)

						* llama : functions -> methods (#11110)
* llama : add struct llama_vocab to the API (#11156)
ggml-ci
* hparams : move vocab params to llama_vocab (#11159)
ggml-ci
* vocab : more pimpl (#11165)
ggml-ci
* vocab : minor tokenization optimizations (#11160)
ggml-ci
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* lora : update API names (#11167)
ggml-ci
* llama : update API names to use correct prefix (#11174)
* llama : update API names to use correct prefix
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* minor [no ci]
* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174)
ggml-ci
* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174)
ggml-ci
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com> 
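
A short sketch of the renamed API surface, assuming the post-rename entry points in llama.h (`llama_model_get_vocab`, `llama_vocab_n_tokens`, `llama_vocab_get_add_bos`); vocab metadata queries now go through a `llama_vocab` handle instead of taking the model directly:

```cpp
#include <cstdio>
#include "llama.h"

int main(int argc, char ** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    llama_backend_init();

    llama_model * model = llama_model_load_from_file(argv[1], llama_model_default_params());
    if (model == nullptr) {
        llama_backend_free();
        return 1;
    }

    // Vocab queries are now methods on a llama_vocab handle:
    const llama_vocab * vocab = llama_model_get_vocab(model);

    std::printf("n_tokens: %d\n", llama_vocab_n_tokens(vocab));    // was llama_vocab_n_vocab
    std::printf("add_bos : %d\n", llama_vocab_get_add_bos(vocab)); // was llama_vocab_add_bos
    std::printf("bos id  : %d\n", llama_vocab_bos(vocab));

    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```
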

commit f66f582927
Author: Georgi Gerganov
Date:   2025-01-03 10:18:53 +02:00

llama : refactor src/llama.cpp (#10902)

						* llama : scatter llama.cpp into multiple modules (wip)
* llama : control-vector -> adapter
* llama : arch
* llama : mmap
ggml-ci
* ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
* llama : arch (cont)
ggml-ci
* llama : chat
ggml-ci
* llama : model
ggml-ci
* llama : hparams
ggml-ci
* llama : adapter
ggml-ci
* examples : fix
ggml-ci
* rebase
ggml-ci
* minor
* llama : kv cache
ggml-ci
* llama : impl
ggml-ci
* llama : batch
ggml-ci
* cont
ggml-ci
* llama : context
ggml-ci
* minor
* llama : context (cont)
ggml-ci
* llama : model loader
ggml-ci
* common : update lora
ggml-ci
* llama : quant
ggml-ci
* llama : quant (cont)
ggml-ci
* minor [no ci] 