commit 90f17bba01
Author: 0cc4m
Date:   2025-03-17 19:41:11 +00:00

    Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentation and driver issues

commit b1b132efcb
Author: Gaurav Garg
Date:   2025-03-17 20:25:13 +02:00

    cuda : enable CUDA Graph on CUDA Toolkit < 12.x (#12394)

    * Enable CUDA Graph on CTK < 12.x
      The `cudaGraphExecUpdate` API changed in CUDA 12.x. For this reason, CUDA
      graph support was disabled on older toolkits. This change enables CUDA
      graph support on CTK < 12.x by using the older API there.
    * Fix compilation errors with MUSA
    * Disable CUDA Graph for MUSA

commit 01e8f2138b
Author: Guus Waals
Date:   2025-03-17 13:35:43 -03:00

    ggml-vulkan: remove unused find_program(glslc) (#12416)

    It's already found by FindVulkan.cmake in the parent CMakeLists.

commit 484a8ab513
Author: Jeff Bolz
Date:   2025-03-17 09:26:18 -05:00

    vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (#12312)

commit cf2270e4d3
Author: Daniele
Date:   2025-03-17 12:42:33 +01:00

    vulkan: subgroup size tuning (#12087)

    * vulkan: subgroup size test
    * Vulkan: Add device architecture enum and logic to recognize AMD generations
    * vulkan: use new architecture logic to specify subgroup size
    * Initial vulkan subgroup size tuning for RDNA3
    * vulkan: commonize RDNA subgroup tuning
    * vulkan: override subgroup size if required_subgroup_size = 0
    * vulkan: disable warp 32 for RDNA3
    * vulkan: fine tuned RDNA1 subgroup sizes
    * vulkan: adjusted subgroup size map
    * vulkan: fixed RDNA2 subgroup map

    Co-authored-by: 0cc4m <picard12@live.de>

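The tuning above amounts to selecting a subgroup size per recognized GPU generation, with an escape hatch when a shader requires a specific size. A minimal Python sketch of that selection logic, using entirely hypothetical enum names and placeholder size values (the real architecture detection and tuned tables live in the Vulkan backend's C++ code):

```python
from enum import Enum, auto

class VkDeviceArch(Enum):
    # Hypothetical stand-in for the device-architecture enum the commit adds
    AMD_RDNA1 = auto()
    AMD_RDNA2 = auto()
    AMD_RDNA3 = auto()
    OTHER = auto()

# Illustrative subgroup-size map; the actual tuned values are in the PR, not here
SUBGROUP_SIZE_MAP = {
    VkDeviceArch.AMD_RDNA1: 64,
    VkDeviceArch.AMD_RDNA2: 64,
    VkDeviceArch.AMD_RDNA3: 64,
}

def pick_subgroup_size(arch: VkDeviceArch, required_subgroup_size: int) -> int:
    # A shader that requires a specific subgroup size wins; otherwise fall
    # back to the per-architecture tuned map, defaulting to 32
    if required_subgroup_size != 0:
        return required_subgroup_size
    return SUBGROUP_SIZE_MAP.get(arch, 32)
```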
commit f07690c930
Author: Jeff Bolz
Date:   2025-03-17 10:43:35 +01:00

    vulkan: use fp32 in coopmat2 q4_k dequant function (#12309)

commit 891c63956d
Author: Jeff Bolz
Date:   2025-03-17 10:41:59 +01:00

    vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking (#12273)

commit 2f21123c1d
Author: Jeff Bolz
Date:   2025-03-17 10:35:00 +01:00

    vulkan: Adjust coopmat2 tile sizes and selection heuristic (#12258)

commit 374101fd74
Author: Christian Kastner
Date:   2025-03-17 11:05:23 +02:00

    cmake : enable building llama.cpp using system libggml (#12321)

    * cmake: Factor out compiler flag function from ggml
      llama.cpp's build requires it too, and we may want to make use of it
      without add_subdirectory(ggml).
    * cmake: Enable building against system ggml
      This facilitates package maintenance for Linux distributions, where the
      libggml library will most likely be shipped as an individual package
      upon which a llama.cpp package depends.

commit b3c9a65673
Author: Akarshan Biswas
Date:   2025-03-17 09:45:12 +08:00

    SYCL: set extras only on GGML_TYPE_Q4_0 (#12366)

    * SYCL: set extras only on GGML_TYPE_Q4_0
    * release tensor_extras in reset buffer interface

commit 8ba95dca20
Author: Sigbjørn Skjæret
Date:   2025-03-16 19:46:36 +02:00

    llama : fix OLMo-2-0325-32B-Instruct K-norm size (#12400)

commit dc079cfdff
Author: Georgi Gerganov
Date:   2025-03-16 19:29:36 +02:00

    context : fix init of n_outputs (#12397)

    ggml-ci

commit 7b61bcc87c
Author: Daniel Bevenius
Date:   2025-03-16 18:22:05 +01:00

    ci : add --symlinks to xcframework zip command (#12409)

    This commit adds the --symlinks option to the zip command used to create
    the xcframework zip file. This is necessary to preserve symlinks in the
    zip file. Without this option, the Versions symlink is stored as a
    regular directory entry in the zip file rather than as a symlink, which
    causes the following error in Xcode:

    ```console
    Couldn't resolve framework symlink for '/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current': readlink(/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current): Invalid argument (22)
    ```

    Refs: https://github.com/ggml-org/llama.cpp/pull/11996#issuecomment-2727026377

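The symlink problem described above is easy to reproduce outside of zip(1): an archiver that follows a link stores an ordinary file, whereas preserving the link requires recording the S_IFLNK mode bits on the entry. A small Python illustration of both behaviours, unrelated to the actual CI script:

```python
import os
import stat
import tempfile
import zipfile

tmp = tempfile.mkdtemp()
target = os.path.join(tmp, "target.txt")
link = os.path.join(tmp, "link.txt")
with open(target, "w") as f:
    f.write("hello")
os.symlink("target.txt", link)

archive = os.path.join(tmp, "out.zip")
with zipfile.ZipFile(archive, "w") as zf:
    # Naive add: follows the symlink and stores a regular file, which is
    # what zip does by default without --symlinks
    zf.write(link, "naive_link.txt")
    # Preserving the symlink: store the link *target path* as the entry's
    # data and mark the entry as a symlink via the Unix mode bits that
    # live in the high 16 bits of external_attr
    info = zipfile.ZipInfo("real_link.txt")
    info.external_attr = (stat.S_IFLNK | 0o777) << 16
    zf.writestr(info, os.readlink(link))

with zipfile.ZipFile(archive) as zf:
    naive = zf.getinfo("naive_link.txt")
    real = zf.getinfo("real_link.txt")
    link_target = zf.read("real_link.txt")
    # The naive entry lost the symlink; the marked entry kept it
    print(stat.S_ISLNK(naive.external_attr >> 16),
          stat.S_ISLNK(real.external_attr >> 16))  # prints: False True
```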
commit f4c3dd5daa
Author: marcoStocchi
Date:   2025-03-15 17:23:11 +01:00

    llama-tts : add '-o' option (#12398)

    * added -o option to specify an output file name
    * llama-tts returns ENOENT in case of file write error

    Note: PR #12042 is closed as superseded by this one.

commit 3d35d87b41
Author: aubreyli
Date:   2025-03-15 15:49:03 +01:00

    SYCL: Delete redundant plus sign and space (#12391)

commit b19bd064c0
Author: fairydreaming
Date:   2025-03-15 22:19:30 +08:00

    SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (#12399)

    * sycl : support non-contiguous tensors in binary ops
    * sycl : silence unused variable warning

    Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

commit 92a391327e
Author: Chenguang Li
Date:   2025-03-15 09:31:08 +08:00

    [CANN] MUL_MAT optimization (#12382)

commit 9f2250ba72
Author: Eric Curtin
Date:   2025-03-14 16:41:20 +00:00

    Add CLI arg to llama-run to adjust the number of threads used (#12370)

    We default to 4; sometimes we want to adjust this manually.

    Signed-off-by: Eric Curtin <ecurtin@redhat.com>

commit 774973b8f3
Author: Sigbjørn Skjæret
Date:   2025-03-14 16:57:05 +01:00

    main : add -sysf / --system-prompt-file (#12249) (#12250)

    * add system_prompt_file
    * add -sysf / --system-prompt-file
    * remove system_prompt_file

commit 8fcb563613
Author: fairydreaming
Date:   2025-03-14 13:47:05 +01:00

    Load all MoE experts during warmup (#11571)

    * llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup
    * common : use new API to enable warmup mode during model warmup

    Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

commit add2a3aa5a
Author: Victor
Date:   2025-03-14 11:21:17 +01:00

    server: fix "--grammar-file" parameter (#12285)

commit c522ce4143
Author: Georgi Gerganov
Date:   2025-03-14 10:47:44 +02:00

    graph : simplify attn input build for unified KV cache (#12381)

    ggml-ci

commit 081bee8c64
Author: Georgi Gerganov
Date:   2025-03-14 09:03:24 +02:00

    hparams : add SWA rope parameters (#12374)

    ggml-ci

commit 84d5475541
Author: Georgi Gerganov
Date:   2025-03-13 19:08:07 +02:00

    llama : fix Gemma3 SWA KV cache shift (#12373)

    * llama : fix Gemma3 SWA KV cache shift
    * hparams : add comment [no ci]

    ggml-ci

commit be7c303410
Author: Xuan-Son Nguyen
Date:   2025-03-13 12:34:54 +01:00

    arg : no n_predict = -2 for examples except for main and infill (#12364)

commit e0dbec0bc6
Author: Georgi Gerganov
Date:   2025-03-13 12:35:44 +02:00

    llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)

    * llama : refactor llama_context, llama_kv_cache, llm_build_context
    * graph : don't mutate the KV cache during defrag
    * context : reduce virtuals + remove test function
    * context : move interface implementation to source file + factory
    * graph : move KV cache build functions to llama_context impl
    * graph : remove model reference from build_pooling
    * graph : remove llama_model reference
    * kv_cache : provide rope factors
    * graph : rework inputs to use only unique_ptr, remove attn input abstraction
    * context : remove llama_context_i abstraction
    * context : clean-up
    * graph : clean-up
    * llama : remove redundant keywords (struct, enum)
    * model : adapt gemma3
    * graph : restore same attention ops as on master
    * llama : remove TODO + fix indent

    ggml-ci

commit 2048b5913d
Author: Ishaan Gandhi
Date:   2025-03-13 11:10:05 +01:00

    server : fix crash when using verbose output with input tokens that are not in printable range (#12178) (#12338)

    * Fix DOS index bug
    * Remove new APIs
    * remove extra line
    * Remove from API
    * Add extra newline
    * Update examples/server/server.cpp

    Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

commit f08f4b3187
Author: Oscar Barenys
Date:   2025-03-12 20:06:58 +01:00

    Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK for VK_NV_cooperative_matrix2 support (#12301)

commit 80a02aa858
Author: Daniel Bevenius
Date:   2025-03-12 13:45:32 +01:00

    llama.swiftui : fix xcframework dir in README [no ci] (#12353)

    This commit fixes the path to the xcframework in the README file, which I
    had forgotten to change after renaming the build directory.

commit 363f8c5d67
Author: Alberto Cabrera Pérez
Date:   2025-03-12 09:57:32 +00:00

    sycl : variable sg_size support for mmvq kernels (#12336)

commit 34c961b181
Author: uvos
Date:   2025-03-12 10:14:11 +01:00

    CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (#12315)

    When fattn-wmma was ported over to warp64, various bits that also touch
    fattn-vec were converted to a selectable warp size. However, the
    fattn-vec kernels don't work with 64-wide warps for now, so we need to
    avoid launching them with parameters for warp64.

commit 7841fc723e
Author: Xuan-Son Nguyen
Date:   2025-03-12 09:30:24 +01:00

    llama : Add Gemma 3 support (+ experimental vision capability) (#12343)

    * llama : Add Gemma 3 text-only support
    * fix python coding style
    * fix compile on ubuntu
    * python: fix style
    * fix ubuntu compile
    * fix build on ubuntu (again)
    * fix ubuntu build, finally
    * clip : Experimental support for Gemma 3 vision (#12344)
    * fix build
    * PRId64

commit bf69cfe62f
Author: Jeff Bolz
Date:   2025-03-12 06:59:19 +01:00

    vulkan: fix bug in coopmat1 mul_mat_id (#12316)

    * tests: run mul_mat_id with a larger N
    * vulkan: fix bug in coopmat1 mul_mat_id

commit 10f2e81809
Author: uvos
Date:   2025-03-11 20:16:03 +01:00

    CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code (#12177)

    Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

commit ba7654380a
Author: jklincn
Date:   2025-03-11 14:25:17 +01:00

    ggml-backend : fix backend search path (#12330)

    * Fix backend search path
    * replace .native() with '/'
    * reverted .native()

commit 6ab2e4765a
Author: BB-fat
Date:   2025-03-11 13:45:02 +02:00

    metal : Cache the Metal library at the device context level (#12265)

commit 96e1280839
Author: Xuan-Son Nguyen
Date:   2025-03-11 09:20:16 +01:00

    clip : bring back GPU support (#12322)

    * clip : bring back GPU support
    * use n_gpu_layers param
    * fix double free
    * ggml_backend_init_by_type
    * clean up

commit 2c9f833d17
Author: Eve
Date:   2025-03-10 19:28:11 +00:00

    mat vec double buffer (#12188)

commit 251364549f
Author: R0CKSTAR
Date:   2025-03-10 18:18:25 +01:00

    musa: support new arch mp_31 and update doc (#12296)

    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

commit 8acdacb3ea
Author: Henry Linjamäki
Date:   2025-03-10 09:57:00 -07:00

    opencl: use OpenCL C standard supported by the device (#12221)

    This patch nudges llama.cpp a bit so it is supported on PoCL, which
    doesn't support OpenCL C 2.0. The issue is solved by querying the device
    for the supported OpenCL C versions and using the highest one available.

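The selection step described above, pick the highest OpenCL C version the device reports, is just a version-string comparison. A Python sketch of only that comparison, with made-up input; real code would obtain the version strings from the device via `clGetDeviceInfo`:

```python
def parse_opencl_c_version(s: str) -> tuple:
    # Version strings have the form "OpenCL C <major>.<minor>",
    # e.g. "OpenCL C 1.2" or "OpenCL C 3.0"
    major, minor = s.removeprefix("OpenCL C ").split(".")
    return (int(major), int(minor))

def pick_highest(supported: list) -> str:
    # Use the highest OpenCL C version the device reports; a platform like
    # PoCL may not report 2.0, so hardcoding one -cl-std would fail there
    return max(supported, key=parse_opencl_c_version)
```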
commit 89b2b56e86
Author: John Bean
Date:   2025-03-10 16:13:09 +02:00

    readme: added Sidekick to available UIs (#12311)

commit e128a1bf5b
Author: Georgi Gerganov
Date:   2025-03-10 14:07:15 +02:00

    tests : fix test-quantize-fns to init the CPU backend (#12306)

    ggml-ci

commit 6ef79a67ca
Author: marcoStocchi
Date:   2025-03-10 13:34:13 +02:00

    common : refactor '-o' option (#12278)

    As discussed in PR 'llama-tts : add -o option' (#12042):
    * common_params: the 'out_file' string is the only output file name
      parameter left in common_params. It is intended to be used by all
      example programs that implement an '-o' option.
    * cvector-generator, export-lora, imatrix: default output filenames
      moved from common_params to the main() of each example program.

commit 4e39a3c332
Author: Olivier Chafik
Date:   2025-03-10 10:59:03 +00:00

    server: extract <think> tags from qwq outputs (#12297)

    * extract <think> tags from qwq outputs
    * const for all static regexes in chat.cpp

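Splitting a QwQ-style output into its <think> reasoning and the visible answer is essentially a regex job. A hedged Python sketch of the idea; the server's actual implementation is in chat.cpp and additionally deals with streaming and partial tags:

```python
import re

# Compiled once at module level, mirroring the "const for all static
# regexes" cleanup in the commit; DOTALL lets the block span newlines
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str):
    """Return (reasoning, visible_content) for a model output string."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    content = THINK_RE.sub("", text).strip()
    return reasoning, content
```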
commit be421fc429
Author: Olivier Chafik
Date:   2025-03-10 09:45:29 +00:00

    tool-call: ensure there's always a non-empty tool call id (#12292)

commit 87c2630546
Author: Olivier Chafik
Date:   2025-03-10 09:45:07 +00:00

    allow missing content in message if tool_calls provided (#12293)

commit 2b3a25c212
Author: Olivier Chafik
Date:   2025-03-10 09:44:42 +00:00

    sampler: fixes trigger tokens + lazy grammars (fix typo cast from token to string) (#12291)

    * Fix typo in lazy grammar handling (fixes trigger tokens)

    Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

commit 8352cdc87b
Author: tc-mb
Date:   2025-03-10 10:33:24 +02:00

    llava : fix bug in minicpm-v code (#11513)

    * fix bug in minicpm-v code
    * update readme of minicpm-v

commit 1e2f78a004
Author: Georgi Gerganov
Date:   2025-03-09 19:08:20 +02:00

    server : add speculative decoding presets for FIM (#12287)

commit 0fd7ca7a21
Author: Georgi Gerganov
Date:   2025-03-08 18:26:00 +02:00

    authors : update (#12271)