be42642581  readme : update hot topics (#15097)
    Georgi Gerganov, 2025-08-05 20:19:33 +03:00

3306ceabf0  sycl: fix mul_mat selection (#15092)
    Romain Biessy, 2025-08-05 18:39:55 +02:00

c81de6e107  Fix glm4moe bug (#15088)
    Juk Armstrong, 2025-08-05 13:56:44 +01:00

22f060c9c4  webui: fix markdown table (#15081)
    Alex Wu, 2025-08-05 13:56:44 +02:00
    * webui: fix markdown table
    * webui: fix table display with themes

ee3a9fcf88  context : fix index overflow on huge outputs (#15080)
    compilade, 2025-08-05 11:27:45 +02:00
    * context : fix overflow when re-ordering huge outputs
    * context : fix logits size overflow for huge batches

ec428b02c3  llama : add --n-cpu-moe option (#15077)
    Diego Devesa, 2025-08-05 01:05:36 +02:00
    * llama : add --n-cpu-moe option
      Keeps the MoE weights of the first N layers on the CPU
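
    A minimal usage sketch (model path, layer count, and prompt are illustrative, not from the commit):
    ```console
    # keep the MoE expert weights of the first 8 layers on the CPU,
    # offloading the remaining layers to the GPU as usual
    $ llama-cli -m model.gguf -ngl 99 --n-cpu-moe 8 -p "Hello"
    ```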
19f68fa5a4  imatrix : warn when GGUF imatrix is saved without .gguf suffix (#15076)
    compilade, 2025-08-04 23:26:52 +02:00
    * imatrix : add warning when suffix is not .gguf for GGUF imatrix
    * imatrix : only warn about suffix when output format is unspecified
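
    Going by the commit message, a run like the following (filenames illustrative) would now emit the warning, since the output name lacks a .gguf suffix and no output format is given:
    ```console
    # GGUF imatrix written to a file without a .gguf suffix -> warning
    $ llama-imatrix -m model.gguf -f calibration.txt -o imatrix.out
    ```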
41613437ff  cmake: Add GGML_BACKEND_DIR option (#15074)
    Christian Kastner, 2025-08-04 21:29:14 +02:00
    * cmake: Add GGML_BACKEND_DIR option
      This can be used by distributions to specify where to look for backends
      when ggml is built with GGML_BACKEND_DL=ON.
    * Fix phrasing
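
    A sketch of how a distribution might combine the two options (the install path is hypothetical):
    ```console
    # build with dynamically loadable backends and point the loader at a
    # fixed, system-wide backend directory
    $ cmake -B build -DGGML_BACKEND_DL=ON -DGGML_BACKEND_DIR=/usr/lib/ggml
    $ cmake --build build
    ```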
e5bebe5251  gguf-py : add --chat-template-file to gguf_new_metadata (#15075)
    Sigbjørn Skjæret, 2025-08-04 21:01:48 +02:00
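
    A hypothetical invocation (filenames illustrative; assumes the gguf-new-metadata entry point installed by gguf-py):
    ```console
    # write a copy of the model with its chat template read from a file
    $ gguf-new-metadata input.gguf output.gguf --chat-template-file chat_template.jinja
    ```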
ef0144c087  model: support GLM 4.5 family of models (#14939)
    Sam, 2025-08-04 20:29:25 +02:00
    * model: Add GLM 4.5 (#14921)
    * Merge in PR suggestions
    * model: Add GLM 4.5 family of models (#14921)
      1. Updated tensor_mapping.py with NextN tensor mappings
         - Added proper tensor mappings for all NextN/MTP tensors in gguf-py/gguf/tensor_mapping.py
         - Added mappings for: eh_proj, embed_tokens, enorm, hnorm, shared_head.head, shared_head.norm
      2. Added num_nextn_predict_layers configuration
         - Added LLM_KV_NUM_NEXTN_PREDICT_LAYERS constant to llama-arch.h and llama-arch.cpp
         - Added num_nextn_predict_layers field to llama_hparams struct
         - Updated GLM4_MOE parameter loading in llama-model.cpp to read this parameter
         - Modified tensor loading logic to conditionally load NextN tensors based on num_nextn_predict_layers
         - Added GGUF writer support in gguf_writer.py with add_num_nextn_predict_layers() method
         - Updated conversion script to extract and write this parameter from HuggingFace config
      3. Added FIM tokens for GLM4_MOE
         - Added GLM-4.5's FIM tokens to llama-vocab.cpp:
           - <|code_prefix|> for FIM_PRE
           - <|code_suffix|> for FIM_SUF
           - <|code_middle|> for FIM_MID
      4. Removed manual NextN tensor handling
         - Removed the special-case handling in convert_hf_to_gguf.py that manually mapped NextN tensors
         - NextN tensors are now handled automatically through the proper tensor mapping system
    * glm 4.5 update tensors names
    * Update src/llama-model.cpp
    * model: glm 4.5 apply suggestions from code review
    * Apply suggestions from code review
    * patch broken chat template
    * typings fix
    * add TENSOR_SKIP flag
    * Update src/llama-model-loader.h
    Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
    Co-authored-by: Diego Devesa <slarengh@gmail.com>

2721257e3e  quantize : fix confusing error message if ftype is invalid (#15071)
    Sigbjørn Skjæret, 2025-08-04 18:11:02 +02:00

587d0118f5  ggml: WebGPU backend host improvements and style fixing (#14978)
    Reese Levine, 2025-08-04 08:52:43 -07:00
    * Add parameter buffer pool, batching of submissions, refactor command building/submission
    * Add header for linux builds
    * Free staged parameter buffers at once
    * Format with clang-format
    * Fix thread-safe implementation
    * Use device implicit synchronization
    * Update workflow to use custom release
    * Remove testing branch workflow

5aa1105da2  vulkan: fix build when using glslang that does not support coopmat2 (#15062)
    Jeff Bolz, 2025-08-04 07:09:19 +02:00

d31192b4ee  imatrix : use GGUF by default (#14842)
    compilade, 2025-08-03 22:00:05 +02:00
    * imatrix : use GGUF by default
    * imatrix : use GGUF regardless of the output filename
      The legacy format can only be produced with --output-format dat
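
    Per the note above, the legacy binary format now has to be requested explicitly; a sketch (filenames illustrative):
    ```console
    # GGUF is the default output; force the legacy .dat format explicitly
    $ llama-imatrix -m model.gguf -f calibration.txt --output-format dat -o imatrix.dat
    ```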
0a2f5496be  imatrix : fix 3d activation handling for hybrid and recurrent models (#14994)
    compilade, 2025-08-03 21:49:13 +02:00
    * imatrix : use a single count for dense 3d tensors
    * imatrix : fix 3d activations when model tensor is 2d
    * imatrix : fix 3d tensor counts

11a3811164  memory : handle kv_unified for hybrid models (#15050)
    compilade, 2025-08-03 21:43:07 +02:00

97366dc6ab  vocab : JetBrains Mellum pre-tokenizer (#15045)
    Csaba Kecskemeti, 2025-08-03 21:38:18 +02:00

83bc2f288c  model : add text-only support for Kimi-VL (and find special tokens in text_config) (#15051)
    Gabriel Larson, 2025-08-03 16:56:25 +02:00
    * basic kimi-vl textmodel conversion
    * check config["text_config"] for special tokens

6c7a441161  vulkan: Use coopmat2 for conv2d (#14982)
    Jeff Bolz, 2025-08-03 14:23:57 +02:00

5c0eb5ef54  opencl: fix adreno compiler detection logic (#15029)
    lhez, 2025-08-02 19:51:18 +02:00

03d4698218  CUDA: use mma FA kernel for gqa > 4 on RTX 4000 (#15035)
    Johannes Gäßler, 2025-08-02 16:37:08 +02:00

3303c19b16  cuda: make im2col a little faster (#15025)
    leejet, 2025-08-02 17:15:36 +03:00

4fdea540bd  kv-cache : skip alignment of n_stream in kv-cache log msg [no ci] (#15040)
    Daniel Bevenius, 2025-08-02 17:14:57 +03:00
    This commit removes the right alignment of the `n_stream` value in the
    log message in the `llama_kv_cache_unified` constructor.
    The motivation for this change is to enhance the readability of the log
    message. Currently the output looks like this:
    ```console
    llama_kv_cache_unified: size = 2048.00 MiB (  4096 cells,  32 layers,  1/ 1 seqs), K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
    ```
    Notice that the `n_stream` value is right aligned, which makes it a
    little harder to read.
    With the change in this commit the output will look like this:
    ```console
    llama_kv_cache_unified: size = 2048.00 MiB (  4096 cells,  32 layers, 1/1 seqs), K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
    ```

a4569c41fd  llama : enable LLAMA_SET_ROWS=1 by default (#14959)
    Georgi Gerganov, 2025-08-02 17:14:21 +03:00
    ggml-ci
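
    Assuming LLAMA_SET_ROWS is still read from the environment (as it was while the feature was opt-in), the previous behavior could presumably be restored per run, along these lines:
    ```console
    # opt back out of the ggml_set_rows-based code path for a single run
    # (assumes the environment variable still toggles the feature)
    $ LLAMA_SET_ROWS=0 llama-cli -m model.gguf -p "Hello"
    ```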
15e92fd337  cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 (#15038)
    Georgi Gerganov, 2025-08-02 17:13:05 +03:00
    * cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1
    * cont : fix cont types
    * cont : adopt variable names and comment from the other branch
    ggml-ci

2bf3fbf0b5  ci : check that pre-tokenizer hashes are up-to-date (#15032)
    Sigbjørn Skjæret, 2025-08-02 14:39:01 +02:00
    * torch is not required for convert_hf_to_gguf_update
    * add --check-missing parameter
    * check that pre-tokenizer hashes are up-to-date
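
    A sketch of the new check (the script name is from the repo and the flag from this commit; the exact behavior around downloads is an assumption):
    ```console
    # verify that the pre-tokenizer hash table in convert_hf_to_gguf.py is current
    $ python convert_hf_to_gguf_update.py --check-missing
    ```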
711d5e6fe6  convert : fix Qwen3-Embedding pre-tokenizer hash (#15030)
    Douglas Hanley, 2025-08-02 12:51:02 +02:00

f738989dcb  chat : fix multiple tool_calls on hermes-2-pro (#14962)
    Jhen-Jie Hong, 2025-08-02 18:04:48 +08:00

4cb208c93c  vulkan: coopmat2 mul_mat optimizations (#14934)
    Jeff Bolz, 2025-08-02 11:21:37 +02:00
    - Increase tile size for k-quants, to match non-k-quants
    - Choose more carefully between large and medium tiles, considering how it
      interacts with split_k
    - Allow larger/non-power-of-two split_k, and make the splits a multiple of 256
    - Use split_k==3 when >1/2 and <=2/3 of the SMs would have been used

3025b621d1  llama-bench: rename DB table name from test to llama_bench (#15003)
    R0CKSTAR, 2025-08-02 17:20:40 +08:00
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
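
    Illustrative only (column names assumed from llama-bench's usual SQL output): results should now land in a table named llama_bench instead of test:
    ```console
    $ llama-bench -m model.gguf -o sql | sqlite3 bench.db
    $ sqlite3 bench.db "SELECT build_commit, avg_ts FROM llama_bench;"
    ```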
ec0b18802c  vulkan: Support ne[3]>1 in noncontig matrix-vector multiply (#15015)
    Jeff Bolz, 2025-08-02 10:48:30 +02:00

339bd0268c  model : support Qwen3-Embedding (#15023)
    Douglas Hanley, 2025-08-02 10:44:50 +02:00

f906275537  server: enable token array inputs for OAI API (#15001)
    Johannes Gäßler, 2025-08-02 10:12:41 +02:00
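
    A hypothetical request shape (server address and token ids illustrative): the OAI-compatible endpoint now accepts the prompt as an array of token ids rather than only a string:
    ```console
    $ curl http://localhost:8080/v1/completions \
        -H "Content-Type: application/json" \
        -d '{"prompt": [9906, 1917], "max_tokens": 16}'
    ```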
a9f7541ec2  vulkan: optimizations for direct convolution (#14933)
    Jeff Bolz, 2025-08-02 09:57:04 +02:00
    * vulkan: optimizations for direct convolution
      - Empirically choose a better tile size. Reducing BS_K/BS_NPQ helps fill
        the GPU. The new size should be amenable to using coopmat, too.
      - Fix shmem bank conflicts. 16B padding should work with coopmat.
      - Some explicit loop unrolling.
      - Skip math/stores work for parts of the tile that are OOB.
      - Apply fastdiv opt.
      - Disable shuffles for NV.
    * Three tile sizes for CONV_2D, and a heuristic to choose
    * reallow collectives for pre-Turing
    * make SHMEM_PAD a spec constant
    * fixes for intel perf - no shmem padding, placeholder shader core count
    * shader variants with/without unrolling
    * 0cc4m's fixes for AMD perf
    Co-authored-by: 0cc4m <picard12@live.de>

9c35706b98  CUDA: fix MMQ nwarps for AMD with warp_size==32 (#15014)
    Johannes Gäßler, 2025-08-01 20:47:32 +02:00

c76b420e4c  vendor : update vendored copy of google/minja (#15011)
    l-austenfeld, 2025-08-01 16:59:06 +02:00
    * vendor : update vendored copy of google/minja
    * Re-remove trailing whitespace
    * Remove another trailing whitespace
    Signed-off-by: Lennart Austenfeld <l.austenfeld@googlemail.com>

0f5ccd6fd1  model : add hunyuan dense (#14878)
    stevenkuang, 2025-08-01 15:31:12 +02:00
    * support hunyuan_v1_dense
    * update hunyuan_moe to hunyuan_v1_moe
    * fix rope alpha assert and bos token
    * add blank line
    * Revert "update hunyuan_moe to hunyuan_v1_moe"
      This reverts commit aa973ca219
    * fix hunyuan_moe chat template
    * remove leftover code
    * update hunyuan dense chat template
    * fix hunyuan dense vocab and chat template
    Signed-off-by: stevenkuang <stevenkuang@tencent.com>

1c872f71fb  opencl: add f16 for add, sub, mul, div (#14984)
    lhez, 2025-08-01 13:15:44 +02:00

baad94885d  ggml : Q2k interleaving implementation - x86/x64 SIMD (#14373)
    Srihari-mcw, 2025-08-01 09:20:33 +03:00
    * Initial Q2_K Block Interleaving Implementation
    * Addressed review comments and clean up of the code
    * Post rebase fixes
    * Initial CI/CD fixes
    * Update declarations in arch-fallback.h
    * Changes for GEMV Q2_K in arch-fallback.h
    * Enable repacking only on AVX-512 machines
    * Update comments in repack.cpp
    * Address q2k comments
    Co-authored-by: Manogna-Sree <elisetti.manognasree@multicorewareinc.com>

ba42794c9e  graph : fix equal_seq() check (#14986)
    Georgi Gerganov, 2025-08-01 06:38:12 +03:00
    ggml-ci

2860d479b4  docker : add cann build pipeline (#14591)
    diannao, 2025-08-01 10:02:34 +08:00
    * docker: add cann build pipeline
    * docker: fix cann devops
    * cann : fix multi card hccl
    * Update ggml/src/ggml-cann/ggml-cann.cpp
    * Update ggml-cann.cpp
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
    Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

484b2091ce  compare-commits.sh: support both llama-bench and test-backend-ops (#14392)
    R0CKSTAR, 2025-08-01 08:47:27 +08:00
    * compare-commits.sh: support both llama-bench and test-backend-ops
    * Speed up the build by specifying -j 12
    * Remove build_number from test-backend-ops db
    * Apply suggestion from @JohannesGaessler
    * Refine tool selection logic
    * Address review comments
    Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
    Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
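
    Baseline usage of the script (branch names illustrative; how the tool is selected between llama-bench and test-backend-ops is not shown here, see the script itself):
    ```console
    # compare performance of two commits
    $ ./scripts/compare-commits.sh master pr-branch
    ```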
daf2dd7880  quantize : skip tensor override when in fallback mode (#14995)
    Ed Addario, 2025-07-31 21:32:18 +02:00

a06ed5feae  llama : add simple option to enable CPU for MoE weights (--cpu-moe) (#14992)
    Diego Devesa, 2025-07-31 20:15:41 +02:00
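
    Presumably the blanket counterpart to --n-cpu-moe above; a sketch (model path and prompt illustrative):
    ```console
    # keep all MoE expert weights on the CPU while offloading the rest
    $ llama-cli -m model.gguf -ngl 99 --cpu-moe -p "Hello"
    ```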
784524053d  Fix params bug in diffusion example (#14993)
    Aman Gupta, 2025-08-01 01:22:58 +08:00

d6818d06a6  llama : allow other bufts when overriding to CPU, add --no-repack option (#14990)
    Diego Devesa, 2025-07-31 18:11:34 +02:00
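
    A sketch of the new flag (model path illustrative; assumes --no-repack simply disables the CPU backend's weight repacking):
    ```console
    # keep expert weights on the CPU but skip repacking them into the
    # CPU backend's optimized layout
    $ llama-cli -m model.gguf --cpu-moe --no-repack -p "Hello"
    ```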
e08a98826b  Vulkan: Fix minor debug mode issues (#14899)
    Ruben Ortlam, 2025-07-31 17:46:54 +02:00
    * vulkan: fix debug mode issues
    * vulkan: remove broken check_results GGML_OP_SET_ROWS support

952a47f455  mtmd : support MiniCPM-V 4.0 (#14983)
    tc-mb, 2025-07-31 17:22:17 +02:00
    * support minicpm-v 4
    * add md
    * support MiniCPM-o 4.0
    * add default location
    * temp rm MiniCPM-o 4.0
    * fix code
    * fix "minicpmv_projector" default path

36e5fe7bcd  MODEL_TENSOR.SSM_DT_NORM was defined twice (#14991)
    Csaba Kecskemeti, 2025-07-31 10:59:49 -04:00
    * MODEL_TENSOR.SSM_DT_NORM was defined twice; the second definition
      overwrote the Jamba model's layer name
    * correct order

94933c8c2e  server : implement universal assisted decoding (#12635)
    g2mt, 2025-07-31 14:25:23 +02:00
    * llama-server : implement universal assisted decoding
    * Erase prompt tail for kv-cache
    * set vocab_dft_compatible in common_speculative
    * rename ctx_main to ctx_tgt
    * move vocab_dft_compatible to spec struct
    * clear mem_dft, remove mem
    * detokenize id_last for incompatible models
    * update comment
    * add --spec-replace flag
    * accept special tokens when translating between draft/main models
    * Escape spec-replace
    * clamp draft result size to params.n_draft
    * fix comment
    * clean up code
    * restore old example
    * log common_speculative_are_compatible in speculative example
    * fix
    * Update common/speculative.cpp
    * Update common/speculative.cpp
    * Update common/speculative.cpp
    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>