Matt Clayton 
							
						 
					 
					
						
						
							
						
						f05a6d71a0 
					 
					
						
						
							
							mtmd : Expose helper_decode_image_chunk (#13366)
						
						
						
						
						* mtmd: Expose helper_decode_image, output_embd_copy, image_tokens_copy/free
* Slim down
* Cleanups 
						
						
							
 
						
					 
					
						2025-05-08 20:25:39 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						ee01d71e58 
					 
					
						
						
							
							server : (webui) fix a very small misalignment (#13387)
						
						
						
						
						* server : (webui) fix a very small misalignment
* restore font-bold 
						
						
							
						
					 
					
						2025-05-08 18:51:45 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						8c83449cb7 
					 
					
						
						
							
							server : (webui) revamp the input area, plus many small UI improvements (#13365)
						
						
						
						
						* rework the input area
* process selected file
* change all icons to heroicons
* fix thought process collapse
* move conversation more menu to sidebar
* sun icon --> moon icon
* rm default system message
* stricter upload file check, only allow image if server has mtmd
* build it
* add renaming
* better autoscroll
* build
* add conversation group
* fix scroll
* extra context first, then user input in the end
* fix <hr> tag
* clean up a bit
* build
* add mb-3 for <pre>
* throttle adjustTextareaHeight to make it less laggy
* (nits) missing padding in sidebar
* rm stray console log 
						
						
							
 
						
					 
					
						2025-05-08 15:37:29 +02:00 
						 
				 
			
				
					
						
							
							
								Sigbjørn Skjæret 
							
						 
					 
					
						
						
							
						
						1a844be132 
					 
					
						
						
							
							convert : support rope_scaling type and rope_type (#13349)
						
						
						
						
							
						
					 
					
						2025-05-08 15:34:29 +02:00 
						 
				 
			
				
					
						
							
							
								welix 
							
						 
					 
					
						
						
							
						
						0ccc121354 
					 
					
						
						
							
							mtmd : fix the calculation of n_tokens for smolvlm (#13381)
						
						
						
						
						Co-authored-by: Taichi Nishimura <Taichi.A.Nishimura@sony.com>
						
						
							
 
						
					 
					
						2025-05-08 15:03:53 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						6562e5a4d6 
					 
					
						
						
							
							context : allow cache-less context for embeddings (#13108)
						
						
						
						
						* context : allow cache-less context for embeddings
ggml-ci
* context : enable reranking with encode()
ggml-ci
* context : encode() clears embd_seq
ggml-ci
* examples : use llama_encode() when appropriate
ggml-ci
* models : nomic bert moe does not require KV cache
* llama : update comments for llama_decode/llama_encode
ggml-ci
* context : update warning log [no ci] 
						
						
							
						
					 
					
						2025-05-08 14:28:33 +03:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						51fb96b1ff 
					 
					
						
						
							
							context : remove logits_all flag (#13284)
						
						
						
						
						* context : remove logits_all flag
ggml-ci
* llama : remove logits_all flag + reorder llama_context_params
ggml-ci 
						
						
							
 
						
					 
					
						2025-05-08 14:26:50 +03:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						70a6991edf 
					 
					
						
						
							
							ci : move release workflow to a separate file (#13362)
						
						
						
						
							
 
						
					 
					
						2025-05-08 13:15:28 +02:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						f061021206 
					 
					
						
						
							
							llama : print size and type of overridden tensors (#13364)
						
						
						
						
							
 
						
					 
					
						2025-05-08 13:15:15 +02:00 
						 
				 
			
				
					
						
							
							
								Alberto Cabrera Pérez 
							
						 
					 
					
						
						
							
						
						8733e0cf6e 
					 
					
						
						
							
							sycl: addressing non-contiguous src1 mul_mats (nc and batched) (#13343)
						
						
						
						
						* sycl: fixed non-contiguous src1 mul_mats (nc and batched)
* Fixed wrong static_cast inside kernel 
						
						
							
 
						
					 
					
						2025-05-08 10:08:01 +01:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						814f795e06 
					 
					
						
						
							
							docker : disable arm64 and intel images (#13356)
						
						
						
						
							
						
					 
					
						2025-05-07 16:36:33 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						d879433824 
					 
					
						
						
							
							sync : ggml  
						
						
						
						
						ggml-ci 
						
						
							
 
						
					 
					
						2025-05-07 17:28:36 +03:00 
						 
				 
			
				
					
						
							
							
								Daniel Bevenius 
							
						 
					 
					
						
						
							
						
						13b0a04597 
					 
					
						
						
							
							whisper: remove MSVC warnings pragmas (whisper/3090)  
						
						
						
						
						* ggml : remove MSVC warnings pragmas
This commit removes the MSVC-specific pragmas as these are now handled
in ggml/CMakeLists.txt.
* whisper : remove MSVC warning pragmas
This commit removes the MSVC-specific pragmas. These are now handled in
the ggml/CMakeLists.txt file. 
						
						
							
						
					 
					
						2025-05-07 17:28:36 +03:00 
						 
				 
			
				
					
						
							
							
								Jared Tweed 
							
						 
					 
					
						
						
							
						
						bba9d945c1 
					 
					
						
						
							
							cmake : removed stdc++fs (whisper/3097)  
						
						
						
						
						* removed stdc++fs
* kept line, but removed stdc++fs 
						
						
							
						
					 
					
						2025-05-07 17:28:36 +03:00 
						 
				 
			
				
					
						
							
							
								Sigbjørn Skjæret 
							
						 
					 
					
						
						
							
						
						bc4e1128f7 
					 
					
						
						
							
							llama : deci : support ffn-free with attention (#13296)
						
						
						
						
							
 
						
					 
					
						2025-05-07 12:49:27 +02:00 
						 
				 
			
				
					
						
							
							
								Ycros 
							
						 
					 
					
						
						
							
						
						39e73ae0d6 
					 
					
						
						
							
							common : Add a warning when we can't match samplers from a string or char. (#13330)
						
						
						
						
							
 
						
					 
					
						2025-05-07 11:23:28 +03:00 
						 
				 
			
				
					
						
							
							
								R0CKSTAR 
							
						 
					 
					
						
						
							
						
						1f73301b63 
					 
					
						
						
							
							cuda : remove nrows_x in mul_mat_q_process_tile (#13325)
						
						
						
						
						Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
						
						
							
 
						
					 
					
						2025-05-07 09:48:23 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						4773d7a02f 
					 
					
						
						
							
							examples : remove infill (#13283)
						
						
						
						
						ggml-ci 
						
						
							
 
						
					 
					
						2025-05-07 10:28:02 +03:00 
						 
				 
			
				
					
						
							
							
								piDack 
							
						 
					 
					
						
						
							
						
						6c7fd67b64 
					 
					
						
						
							
							llama : support tie embedding for chatglm models (#13328)
						
						
						
						
							
 
						
					 
					
						2025-05-07 09:23:11 +02:00 
						 
				 
			
				
					
						
							
							
								Johannes Gäßler 
							
						 
					 
					
						
						
							
						
						141a908a59 
					 
					
						
						
							
							CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (#13135)
						
						
						
						
							
 
						
					 
					
						2025-05-06 23:35:51 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						32916a4907 
					 
					
						
						
							
							clip : refactor graph builder (#13321)
						
						
						
						
						* mtmd : refactor graph builder
* fix qwen2vl
* clean up siglip cgraph
* pixtral migrated
* move minicpmv to a dedicated build function
* move max_feature_layer to build_llava
* use build_attn for minicpm resampler
* fix windows build
* add comment for batch_size
* also support tinygemma3 test model
* qwen2vl does not use RMS norm
* fix qwen2vl norm (2) 
						
						
							
 
						
					 
					
						2025-05-06 22:40:24 +02:00 
						 
				 
			
				
					
						
							
							
								DocShotgun 
							
						 
					 
					
						
						
							
						
						ffc727203a 
					 
					
						
						
							
							sampling : make top_n_sigma no-op at <=0 or a single candidate (#13345)
						
						
						
						
							
 
						
					 
					
						2025-05-06 22:36:24 +02:00 
						 
				 
			
				
					
						
							
							
								oobabooga 
							
						 
					 
					
						
						
							
						
						91a86a6f35 
					 
					
						
						
							
							sampling : don't consider -infinity values in top_n_sigma (#13344)
						
						
						
						
							
 
						
					 
					
						2025-05-06 20:24:15 +02:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						f4ed10b69c 
					 
					
						
						
							
							cmake : remove arm64 msvc presets (#13342)
						
						
						
						
							
						
					 
					
						2025-05-06 20:15:31 +02:00 
						 
				 
			
				
					
						
							
							
								Akarshan Biswas 
							
						 
					 
					
						
						
							
						
						1e333d5bba 
					 
					
						
						
							
							SYCL: Disable reorder optimize by default and stop setting tensor extras when optimize is disabled (#13254)
						
						
						
						
						* SYCL: Do not set tensor extras when reorder optimize is disabled
* SYCL: Disable reorder optimize by default 
						
						
							
 
						
					 
					
						2025-05-06 20:27:06 +05:30 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						2f54e348ad 
					 
					
						
						
							
							llama : fix build_ffn without gate (#13336)
						
						
						
						
						* llama : fix build_ffn without gate
* fix build on windows
* Revert "fix build on windows"
This reverts commit fc420d3c7e 
						
						
							
 
						
					 
					
						2025-05-06 14:25:40 +02:00 
						 
				 
			
				
					
						
							
							
								Johannes Gäßler 
							
						 
					 
					
						
						
							
						
						2356fb1d53 
					 
					
						
						
							
							CUDA: fix bad asserts for partial offload (#13337)
						
						
						
						
							
						
					 
					
						2025-05-06 13:58:51 +02:00 
						 
				 
			
				
					
						
							
							
								Sigbjørn Skjæret 
							
						 
					 
					
						
						
							
						
						764b85627b 
					 
					
						
						
							
							convert : qwen2/3moe : set yarn metadata if present (#13331)
						
						
						
						
						* set yarn metadata if present
* add comment about enabling YaRN
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
---------
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
						
						
							
						
					 
					
						2025-05-06 11:12:06 +02:00 
						 
				 
			
				
					
						
							
							
								Johannes Gäßler 
							
						 
					 
					
						
						
							
						
						15a28ec8c7 
					 
					
						
						
							
							CUDA: fix --split-mode row for MMQ (#13323)
						
						
						
						
							
 
						
					 
					
						2025-05-06 08:36:46 +02:00 
						 
				 
			
				
					
						
							
							
								compilade 
							
						 
					 
					
						
						
							
						
						a7366faa5b 
					 
					
						
						
							
							gguf-py : avoid requiring pyside6 for other scripts (#13036)
						
						
						
						
						- gguf-py : remove gguf-py/gguf/scripts/__init__.py because it's not needed
Implicit namespaces are supported since Python 3.3 (https://peps.python.org/pep-0420/),
and the entrypoints in pyproject.toml can directly refer to the main functions. 
						
						
							
 
						
					 
					
						2025-05-05 22:27:31 -04:00 
						 
				 
			
				
					
						
							
							
								Johannes Gäßler 
							
						 
					 
					
						
						
							
						
						9070365020 
					 
					
						
						
							
							CUDA: fix logic for clearing padding with -ngl 0 (#13320)
						
						
						
						
							
 
						
					 
					
						2025-05-05 22:32:13 +02:00 
						 
				 
			
				
					
						
							
							
								oobabooga 
							
						 
					 
					
						
						
							
						
						233461f812 
					 
					
						
						
							
							sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (#13264)
						
						
						
						
						* sampling: add Top-nσ sampler to `llama-server` and sampler ordering
* revert: sampler ordering
* revert: VS' crappy auto-formatting
* revert: VS' crappy auto-formatting pt.2
* revert: my crappy eye sight...
* sampling: add XTC to Top-nσ sampler chain
* sampling: add Dyna. Temp. to Top-nσ sampler chain
* sampling: actually remove Top-nσ from sampler(oops)
* Integrate top_n_sigma into main sampler chain
* Define COMMON_SAMPLER_TYPE_TOP_N_SIGMA
* Formatting
* Lint
* Exit early in the sampler if nsigma < 0
---------
Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>
						
						
							
 
						
					 
					
						2025-05-05 22:12:19 +02:00 
						 
				 
			
				
					
						
							
							
								igardev 
							
						 
					 
					
						
						
							
						
						b34c859146 
					 
					
						
						
							
							server : Webui - change setText command from parent window to also send the message. (#13309)
						
						
						
						
						* setText command from parent window for llama-vscode now sends the message automatically.
* Upgrade packages versions to fix vulnerabilities with "npm audit fix" command.
* Fix code formatting.
* Add index.html.gz changes.
* Revert "Upgrade packages versions to fix vulnerabilities with "npm audit fix" command."
This reverts commit 67687b7fda
Co-authored-by: <ivailo.gardev@akros.ch>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
						
						
							
						
					 
					
						2025-05-05 16:03:31 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						9b61acf060 
					 
					
						
						
							
							mtmd : rename llava directory to mtmd (#13311)
						
						
						
						
						* mv llava to mtmd
* change ref everywhere 
						
						
							
 
						
					 
					
						2025-05-05 16:02:55 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						5215b91e93 
					 
					
						
						
							
							clip :  fix confused naming ffn_up and ffn_down (#13290)
						
						
						
						
						* clip :  fix confused naming ffn_up and ffn_down
* rm ffn_i/o/g naming
* rename n_embd, n_ff
* small fix
* no check n_ff 
						
						
							
 
						
					 
					
						2025-05-05 12:54:44 +02:00 
						 
				 
			
				
					
						
							
							
								Sigbjørn Skjæret 
							
						 
					 
					
						
						
							
						
						ae803bfc3d 
					 
					
						
						
							
							convert : bailingmoe : set yarn metadata if present (#13312)
						
						
						
						
							
						
					 
					
						2025-05-05 12:34:26 +02:00 
						 
				 
			
				
					
						
							
							
								Akarshan Biswas 
							
						 
					 
					
						
						
							
						
						66645a5285 
					 
					
						
						
							
							SYCL: Disable mul_mat kernels for noncontiguous tensor b (#13308)
						
						
						
						
						ggml-ci 
						
						
							
 
						
					 
					
						2025-05-05 13:39:10 +05:30 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						27aa259532 
					 
					
						
						
							
							mtmd : add C public API (#13184)
						
						
						
						
						* init
* wip
* working version
* add mtmd::bitmaps
* add test target
* rm redundant define
* test: mtmd_input_chunks_free
* rm outdated comment
* fix merging issue
* explicitly create mtmd::input_chunks
* mtmd_input_chunk_copy
* add clone()
* add const to various places
* add warning about breaking changes
* helper: use mtmd_image_tokens_get_n_pos 
						
						
							
 
						
					 
					
						2025-05-04 23:43:42 +02:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						9fdfcdaedd 
					 
					
						
						
							
							rpc : use backend registry, support dl backends (#13304)
						
						
						
						
							
 
						
					 
					
						2025-05-04 21:25:43 +02:00 
						 
				 
			
				
					
						
							
							
								Aaron Teo 
							
						 
					 
					
						
						
							
						
						6eb7d25c70 
					 
					
						
						
							
							ggml : activate s390x simd for Q3_K (#13301)
						
						
						
						
						Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
						
						
							
 
						
					 
					
						2025-05-04 19:49:12 +02:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						86bd60d3fe 
					 
					
						
						
							
							llava/mtmd : fixes to fully support dl backends (#13303)
						
						
						
						
							
 
						
					 
					
						2025-05-04 17:05:20 +02:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						9f2da5871f 
					 
					
						
						
							
							llama : build windows releases with dl backends (#13220)
						
						
						
						
							
 
						
					 
					
						2025-05-04 14:20:49 +02:00 
						 
				 
			
				
					
						
							
							
								Johannes Gäßler 
							
						 
					 
					
						
						
							
						
						93c4e23905 
					 
					
						
						
							
							CUDA: fix race condition in MMQ stream-k fixup (#13299)
						
						
						
						
							
 
						
					 
					
						2025-05-04 14:16:39 +02:00 
						 
				 
			
				
					
						
							
							
								Johannes Gäßler 
							
						 
					 
					
						
						
							
						
						8afbd96818 
					 
					
						
						
							
							CUDA: fix race condition in MMQ ids_dst (#13294)
						
						
						
						
							
 
						
					 
					
						2025-05-04 13:58:38 +02:00 
						 
				 
			
				
					
						
							
							
								Jeff Bolz 
							
						 
					 
					
						
						
							
						
						8ae5ebcf85 
					 
					
						
						
							
							vulkan: Additional type support for unary, binary, and copy (#13266)
						
						
						
						
						Support f16->f32 copy.
Support f16->f16 and f32->f32 unary ops.
Support all combinations of f16/f32 for src0/src1/dst for add/sub/mul/div. 
						
						
							
 
						
					 
					
						2025-05-04 07:17:16 +02:00 
						 
				 
			
				
					
						
							
							
								Johannes Gäßler 
							
						 
					 
					
						
						
							
						
						3e959f0976 
					 
					
						
						
							
							imatrix: fix oob writes if src1 is not contiguous (#13286)
						
						
						
						
							
 
						
					 
					
						2025-05-04 00:50:37 +02:00 
						 
				 
			
				
					
						
							
							
								Xuan-Son Nguyen 
							
						 
					 
					
						
						
							
						
						36667c8edc 
					 
					
						
						
							
							clip : revert the change of BOI/EOI token for GLM-edge (⚠️ breaking change) (#13259)
						
						
						
						
							
 
						
					 
					
						2025-05-03 20:07:54 +02:00 
						 
				 
			
				
					
						
							
							
								ymcki 
							
						 
					 
					
						
						
							
						
						3bf785f3ef 
					 
					
						
						
							
							llama : Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843)
						
						
						
						
							
 
						
					 
					
						2025-05-03 17:39:51 +02:00 
						 
				 
			
				
					
						
							
							
								Diego Devesa 
							
						 
					 
					
						
						
							
						
						1d36b3670b 
					 
					
						
						
							
							llama : move end-user examples to tools directory (#13249)
						
						
						
						
						* llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
						
						
							
 
						
					 
					
						2025-05-02 20:27:13 +02:00 
						 
				 
			
				
					
						
							
							
								Georgi Gerganov 
							
						 
					 
					
						
						
							
						
						b34443923c 
					 
					
						
						
							
							sync : ggml (#13268)
						
						
						
						
						* vulkan : kernels for depthwise 2D convolution (CONV_2D_DW) (ggml/1204)
* vulkan : add kernels for depthwise 2d convolution (OP_CONV_2D_DW)
* review: remove src_x/y < 0 checks; add performance tests
* sync : ggml
ggml-ci
* vulkan : fix lint (#0)
---------
Co-authored-by: Acly <aclysia@gmail.com>
						
						
							
						
					 
					
						2025-05-02 20:54:30 +03:00