Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						db2bb378b1 
					 
					
						
						
							
							cont : gate the ggml_set_rows usage with env var  
						
						
						
						
						ggml-ci 
						
						
							
						
					 
					
						2025-06-23 13:21:36 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						79dac3c861 
					 
					
						
						
							
							kv-cache : use ggml_set_rows  
						
						
						
						
						ggml-ci 
						
						
							
						
					 
					
						2025-06-23 13:21:36 +03:00 
						 
				 
			
				
					
						
							
							
								Radoslav Gerganov 
							
						 
					 
					
						
						
							
						
						1f647b5992 
					 
					
						
						
							
							ggml : fix supports_op  
						
						
						
						
							
						
					 
					
						2025-06-23 13:21:36 +03:00 
						 
				 
			
				
					
						
							
							
								Radoslav Gerganov 
							
						 
					 
					
						
						
							
						
						eba97574da 
					 
					
						
						
							
							ggml : simplify forward_dup_f32  
						
						
						
						
							
						
					 
					
						2025-06-23 13:21:36 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						c0cfc2f78b 
					 
					
						
						
							
							metal : add ggml_set_rows implementation  
						
						
						
						
						ggml-ci 
						
						
							
						
					 
					
						2025-06-23 13:21:36 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						828e5d2fcd 
					 
					
						
						
							
							tests : add ggml_set_rows  
						
						
						
						
							
						
					 
					
						2025-06-23 13:21:35 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						e73690a69d 
					 
					
						
						
							
							ggml : ggml_set_rows update comment + better index name  
						
						
						
						
							
						
					 
					
						2025-06-23 13:21:35 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						e89709721b 
					 
					
						
						
							
							ggml : support GGML_TYPE_F32 ".from_float" trait  
						
						
						
						
							
						
					 
					
						2025-06-23 13:21:35 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						630c84a2bd 
					 
					
						
						
							
							ggml : ggml_set_rows support quantized dst  
						
						
						
						
						ggml-ci 
						
						
							
						
					 
					
						2025-06-23 13:21:35 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						df71c803b4 
					 
					
						
						
							
							ggml : ggml_set_rows support broadcast  
						
						
						
						
							
						
					 
					
						2025-06-23 13:21:35 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						313a444b22 
					 
					
						
						
							
							ggml : add ggml_is_contiguous_rows  
						
						
						
						
							
						
					 
					
						2025-06-23 13:21:35 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						695b6b7025 
					 
					
						
						
							
							ggml : add repeat impl for i64  
						
						
						
						
							
						
					 
					
						2025-06-23 13:21:34 +03:00 
						 
				 
			
				
					
						
							
							
								Radoslav Gerganov 
							
						 
					 
					
						
						
							
						
						f2cd962fe2 
					 
					
						
						
							
							use I64 for indices  
						
						
						
						
							
						
					 
					
						2025-06-23 13:21:34 +03:00 
						 
				 
			
				
					
						
							
							
								Radoslav Gerganov 
							
						 
					 
					
						
						
							
						
						c1a581a10b 
					 
					
						
						
							
							ggml : add ggml_set_rows  
						
						
						
						
						Add ggml_set_rows(a, b, c) which copies rows from 'b' into 'a' using
indices from 'c'.
ref: #8366  
						
						
							
						
					 
					
						2025-06-23 13:21:32 +03:00 
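
The row-copy semantics described in the ggml_set_rows commit message above ("copies rows from 'b' into 'a' using indices from 'c'") can be illustrated with a small sketch. This is not the ggml implementation or its API: the function name `set_rows`, the use of plain Python lists, and the bounds and shape checks are all assumptions made for illustration only.

```python
# Illustrative sketch only: plain-Python model of "copy rows from b into a
# using indices from c". The name set_rows and the list-of-lists layout are
# hypothetical and are not taken from ggml.

def set_rows(a, b, c):
    """Copy row i of b into row c[i] of a, modifying a in place."""
    if len(b) != len(c):
        raise ValueError("b and c must have the same number of rows")
    for src_row, dst_index in zip(b, c):
        if not 0 <= dst_index < len(a):
            raise IndexError(f"row index {dst_index} is out of range")
        if len(src_row) != len(a[dst_index]):
            raise ValueError("row length mismatch between a and b")
        # Copy the values so that a does not alias rows of b.
        a[dst_index] = list(src_row)
    return a


a = [[0.0, 0.0] for _ in range(4)]   # destination: 4 rows of 2 values
b = [[1.0, 2.0], [3.0, 4.0]]         # source rows
c = [3, 1]                           # b[0] goes to a[3], b[1] goes to a[1]

set_rows(a, b, c)
print(a)
```

In this sketch, rows of `a` that are not named in `c` are left untouched, and a repeated index in `c` simply keeps the last row written. How the real operation treats repeated indices is not stated in the commit message.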
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						7b50d589a8 
					 
					
						
						
							
							kv-cells : fix tracking of seq_pos ( #14339 )  
						
						
						
						
						* kv-cells : fix tracking of seq_pos during cache reuse
ggml-ci
* cont : improve error message
ggml-ci
* cont : add more comments 
						
						
							
 
						
					 
					
						2025-06-23 12:27:35 +03:00 
						 
				 
			
				
					
						
							
							
								Jeff Bolz 
							
						 
					 
					
						
						
							
						
						3a9457df96 
					 
					
						
						
							
							vulkan: update windows SDK in CI ( #14334 )  
						
						
						
						
							
						
					 
					
						2025-06-23 10:19:24 +02:00 
						 
				 
			
				
					
						
							
							
								Ed Addario 
							
						 
					 
					
						
						
							
						
						fa4a9f2a1c 
					 
					
						
						
							
							quantize : handle user-defined pruning of whole layers (blocks) ( #13037 )  
						
						
						
						
							
 
						
					 
					
						2025-06-22 23:16:26 +02:00 
						 
				 
			
				
					
						
							
							
								Sigbjørn Skjæret 
							
						 
					 
					
						
						
							
						
						238005c2dc 
					 
					
						
						
							
							gguf-py : fix SpecialVocab parsing when post_processor is null ( #14330 )  
						
						
						
						
							
						
					 
					
						2025-06-22 19:46:17 +02:00 
						 
				 
			
				
					
						
							
							
								Ruikai Peng 
							
						 
					 
					
						
						
							
						
						66aba7aca9 
					 
					
						
						
							
							run : avoid double tokenization ( #14327 )  
						
						
						
						
						* run : avoid double tokenization by adopting common_tokenize heuristic
* build : fix windows gcc and clang warnings
* lint : fixed trailing whitespace
* run : fix is_first flag 
						
						
							
 
						
					 
					
						2025-06-23 01:28:06 +08:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						f1f5e82df6 
					 
					
						
						
							
							examples : fix is_first logic for tokenization ( #14329 )  
						
						
						
						
						ggml-ci 
						
						
							
 
						
					 
					
						2025-06-22 20:10:07 +03:00 
						 
				 
			
				
					
						
							
							
								uvos 
							
						 
					 
					
						
						
							
						
						af3373f1ad 
					 
					
						
						
							
							HIP: enable vec fattn on RDNA4 ( #14323 )  
						
						
						
						
							
 
						
					 
					
						2025-06-22 16:51:23 +02:00 
						 
				 
			
				
					
						
							
							
								yuiseki 
							
						 
					 
					
						
						
							
						
						5d5c066de8 
					 
					
						
						
							
							mtmd : fix Pixtral OOM with large images by capping image_size to 1024 ( #14326 )  
						
						
						
						
						Mistral Small 2506 models using Pixtral vision encoder were running out
of GPU memory when processing images larger than 1024x1024 pixels due to
exponential memory growth from unlimited image size.
This fix applies the same 1024x1024 limit used by Qwen2VL models to
prevent OOM issues while maintaining compatibility with existing models. 
						
						
							
 
						
					 
					
						2025-06-22 14:44:57 +02:00 
						 
				 
			
				
					
						
							
							
								Sigbjørn Skjæret 
							
						 
					 
					
						
						
							
						
						40bfa04c95 
					 
					
						
						
							
							common : use std::string_view now that we target c++17 ( #14319 )  
						
						
						
						
							
 
						
					 
					
						2025-06-22 08:37:43 +03:00 
						 
				 
			
				
					
						
							
							
								Aman Gupta 
							
						 
					 
					
						
						
							
						
						aa064b2eb7 
					 
					
						
						
							
							CUDA: add mean operation ( #14313 )  
						
						
						
						
						* CUDA: add mean operation
* add back sum_rows_f32_cuda
* Review: early exit if col!=0 
						
						
							
 
						
					 
					
						2025-06-22 12:39:54 +08:00 
						 
				 
			
				
					
						
							
							
								Sigbjørn Skjæret 
							
						 
					 
					
						
						
							
						
						aa0ef5c578 
					 
					
						
						
							
							gguf-py : fix Qwen3-Embedding eos token ( #14314 )  
						
						
						
						
							
						
					 
					
						2025-06-21 18:12:05 +02:00 
						 
				 
			
				
					
						
							
							
								Markus Tavenrath 
							
						 
					 
					
						
						
							
						
						bb16041cae 
					 
					
						
						
							
							Add support for VK_EXT_debug_utils to add labels to Vulkan objects. ( #13792 )  
						
						
						
						
						* Add support for VK_EXT_debug_utils to add labels to Vulkan objects. In step 1 compute pipelines are getting labeled.
* remove #ifdef for debug utils and add queue marker. 
						
						
							
 
						
					 
					
						2025-06-21 08:17:12 +02:00 
						 
				 
			
				
					
						
							
							
								Sigbjørn Skjæret 
							
						 
					 
					
						
						
							
						
						58cba76a9a 
					 
					
						
						
							
							gguf-py : fix TemplateProcessing pair when bos/eos is missing ( #14312 )  
						
						
						
						
							
						
					 
					
						2025-06-21 07:33:21 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						67ae5312e2 
					 
					
						
						
							
							metal : fix thread-safety ( #14300 )  
						
						
						
						
						ggml-ci 
						
						
							
 
						
					 
					
						2025-06-21 08:04:18 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						692e3cdd0a 
					 
					
						
						
							
							memory : rename interface to llama_memory_context_i ( #14296 )  
						
						
						
						
						* memory : rename interface to llama_memory_context_i
ggml-ci
* cont : fix comments
* cont : use "mctx" for referencing a memory context
ggml-ci 
						
						
							
 
						
					 
					
						2025-06-21 08:03:46 +03:00 
						 
				 
			
				
					
						
							
							
								Daniel Han 
							
						 
					 
					
						
						
							
						
						b23fa0b3f4 
					 
					
						
						
							
							convert : fix Llama 4 conversion ( #14311 )  
						
						
						
						
							
						
					 
					
						2025-06-21 06:32:01 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						06cbedfca1 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
							
 
						
					 
					
						2025-06-20 21:02:47 +03:00 
						 
				 
			
				
					
						
							
							
								Acly 
							
						 
					 
					
						
						
							
						
						b7147673f2 
					 
					
						
						
							
							Add ggml_roll (ggml/1274)  
						
						
						
						
						* ggml : add ggml_roll
* use set/get_op_params & std::min 
						
						
							
						
					 
					
						2025-06-20 21:02:47 +03:00 
						 
				 
			
				
					
						
							
							
								David Chiu 
							
						 
					 
					
						
						
							
						
						d860dd99a4 
					 
					
						
						
							
							docs : fix the link to llama.h ( #14293 )  
						
						
						
						
							
						
					 
					
						2025-06-20 19:43:35 +02:00 
						 
				 
			
				
					
						
							
							
								Aman Gupta 
							
						 
					 
					
						
						
							
						
						c959f462a0 
					 
					
						
						
							
							CUDA: add conv_2d_transpose ( #14287 )  
						
						
						
						
						* CUDA: add conv_2d_transpose
* remove direct include of cuda_fp16
* Review: add brackets for readability, remove ggml_set_param and add asserts 
						
						
							
 
						
					 
					
						2025-06-20 22:48:24 +08:00 
						 
				 
			
				
					
						
							
							
								Sigbjørn Skjæret 
							
						 
					 
					
						
						
							
						
						22015b2092 
					 
					
						
						
							
							lint : remove trailing whitespace ( #14304 )  
						
						
						
						
							
 
						
					 
					
						2025-06-20 16:37:44 +02:00 
						 
				 
			
				
					
						
							
							
								Ruikai Peng 
							
						 
					 
					
						
						
							
						
						dd6e6d0b6a 
					 
					
						
						
							
							vocab : prevent tokenizer overflow ( #14301 )  
						
						
						
						
						* vocab : prevent stack overflow in tokenize
* vocab : return error instead of aborting on oversized token count
* vocab : return INT32_MIN from llama_tokenize on overflow 
						
						
							
 
						
					 
					
						2025-06-20 07:13:06 -07:00 
						 
				 
			
				
					
						
							
							
								Nicolò Scipione 
							
						 
					 
					
						
						
							
						
						8308f98c7f 
					 
					
						
						
							
							sycl: add usage of enqueue_functions extension ( #14244 )  
						
						
						
						
						* Add header and namespace to use enqueue_functions extension
* Convert submit and parallel_for to use new extension in convert.cpp
* Convert submit and parallel_for to use extension in ggml-sycl.cpp
* Convert submit and parallel_for to use extension in gla.cpp
* Convert submit and parallel_for in mmq.cpp
* Convert submit and parallel_for in mmvq.cpp
* Convert submit and parallel_for in remaining files
* Convert all simple parallel_for to nd_launch from enqueue_functions
extension
* Wrapping extension in general function
Create a general function that uses the enqueue_functions extension if
it is enabled in the compiler; otherwise, call the general SYCL function
to launch kernels.
---------
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
						
						
							
 
						
					 
					
						2025-06-20 15:07:21 +02:00 
						 
				 
			
				
					
						
							
							
								Christian Kastner 
							
						 
					 
					
						
						
							
						
						6369be0735 
					 
					
						
						
							
							Implement GGML_CPU_ALL_VARIANTS for PowerPC ( #14286 )  
						
						
						
						
						* Add PowerPC feature detection and scoring
* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC
* ggml-cpu: Delay some initializations until function is called
When using GGML_BACKEND_DL=ON, these initializations might use
instructions that are not supported by the current CPU.
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
						
						
							
 
						
					 
					
						2025-06-20 14:17:32 +02:00 
						 
				 
			
				
					
						
							
							
								Sigbjørn Skjæret 
							
						 
					 
					
						
						
							
						
						88fc854b4b 
					 
					
						
						
							
							llama : improve sep token handling ( #14272 )  
						
						
						
						
							
 
						
					 
					
						2025-06-20 14:04:09 +02:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						e28c1b93fd 
					 
					
						
						
							
							cuda : synchronize graph capture and cublas handle destruction ( #14288 )  
						
						
						
						
						Works around an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread. 
						
						
							
 
						
					 
					
						2025-06-20 13:57:36 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						d27b3ca175 
					 
					
						
						
							
							ggml : fix repack work size for mul_mat_id ( #14292 )  
						
						
						
						
						ggml-ci 
						
						
							
 
						
					 
					
						2025-06-20 11:19:15 +03:00 
						 
				 
			
				
					
						
							
							
								Charles Xu 
							
						 
					 
					
						
						
							
						
						9230dbe2c7 
					 
					
						
						
							
							ggml: Update KleidiAI to v1.9.0 ( #14277 )  
						
						
						
						
							
 
						
					 
					
						2025-06-20 10:51:01 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						812939a9e9 
					 
					
						
						
							
							model : more uniform output id handling ( #14275 )  
						
						
						
						
						* model : more uniform output id handling
ggml-ci
* cont : revert n_outputs < n_tokens optimization
ggml-ci
* cont : fix out_ids initialization
ggml-ci 
						
						
							
 
						
					 
					
						2025-06-20 10:50:27 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						4c9fdfbe15 
					 
					
						
						
							
							ubatch : new splitting logic ( #14217 )  
						
						
						
						
						ggml-ci 
						
						
							
 
						
					 
					
						2025-06-20 10:14:14 +03:00 
						 
				 
			
				
					
						
							
							
								Aman Gupta 
							
						 
					 
					
						
						
							
						
						9eaa51e7f0 
					 
					
						
						
							
							CUDA: add conv_2d_dw ( #14265 )  
						
						
						
						
						* CUDA: add conv_2d_dw
* better naming
* simplify using template
* Review: fix operation ordering in ggml-cuda, use __forceinline__, use more const 
						
						
							
 
						
					 
					
						2025-06-20 09:50:24 +08:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						8f71d0f3e8 
					 
					
						
						
							
							ggml-cpu : remove unnecessary arm feature detection ( #14281 )  
						
						
						
						
						Support for Arm runtime feature detection has now been added to GGML_CPU_ALL_VARIANTS. This removes the old and not very functional code. 
						
						
							
 
						
					 
					
						2025-06-19 21:24:14 +02:00 
						 
				 
			
				
					
						
							
							
								Alex Trotta 
							
						 
					 
					
						
						
							
						
						381174bbda 
					 
					
						
						
							
							gguf-py : make sentencepiece optional ( #14200 )  
						
						
						
						
						* Make sentencepiece optional
* Bump to 0.18.0
* Bump patch instead of minor
Co-authored-by: compilade <git@compilade.net>
---------
Co-authored-by: compilade <git@compilade.net>
						
						
							
 
						
					 
					
						2025-06-19 15:56:12 +02:00 
						 
				 
			
				
					
						
							
							
								aa956 
							
						 
					 
					
						
						
							
						
						d67341dc18 
					 
					
						
						
							
							server : add server parameters for draft model cache type ( #13782 )  
						
						
						
						
						Co-authored-by: aa956 <27946957+aa956@users.noreply.github.com>
						
						
							
 
						
					 
					
						2025-06-19 16:01:03 +03:00 
						 
				 
			
				
					
						
							
							
								fanyang 
							
						 
					 
					
						
						
							
						
						456af35eb7 
					 
					
						
						
							
							build : suppress gcc15 compile warnings ( #14261 )  
						
						
						
						
						* Change _contains_any() substrs to std::string_view and fix the find comparison logic. 
						
						
							
 
						
					 
					
						2025-06-19 14:49:48 +02:00 
						 
				 
			
				
					
						
							
							
								Anton Mitkov 
							
						 
					 
					
						
						
							
						
						600e3e9b50 
					 
					
						
						
							
							sycl: Cleanup codepaths in Get Rows in sycl backend ( #14215 )  
						
						
						
						
						Addresses unused reorder path 
						
						
							
 
						
					 
					
						2025-06-19 11:40:21 +01:00