commit cc74d5be99
Author: Georgi Gerganov
Date:   2025-05-22 16:33:39 +03:00

    server : pad small embedding batches (#13692)

    ggml-ci

commit 5be24af73d
Author: Sigbjørn Skjæret
Date:   2025-05-22 14:25:05 +02:00

    gguf-py : correct charsmap parameter typing (#13701)

commit d394a9aedc
Author: Nicolò Scipione
Date:   2025-05-22 12:54:43 +01:00

    sycl : Remove waits from function calls (#13702)

    * removes the waits in async memcpy functions

commit 6b56a64690
Author: Ewan Crawford
Date:   2025-05-22 16:24:09 +08:00

    SYCL: Avoid using with SYCL-Graph for unsupported nodes (#13587)

    Currently on a CUDA backend to SYCL when running
    `GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0` there are
    two operations that throw an exception from the blocking waits during
    queue recording.
    * `-o CONCAT`: Use of blocking waits on a queue that's being recorded
      https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187
    * `-o MUL_MAT_ID`: Blocking wait on a recording queue for a copy to host memory
      https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074
    We've noticed that `ggml-cuda.cu` has the
    [check_node_graph_compatibility_and_refresh_copy_ops](39e73ae0d6/ggml/src/ggml-cuda/ggml-cuda.cu (L2458-L2458)

commit a4e8912dfd
Author: Henry Linjamäki
Date:   2025-05-21 16:21:45 -07:00

    opencl: Add support for multiple devices (#12622)

    * opencl: Add support for multiple devices
    ... but limited to one platform. A platform with a GPU will be preferred.
    Additionally:
    * Filter out devices that lack capabilities needed by the backend
      implementation (half support, OpenCL 2.0+, etc).
    * Make ggml_backend_opencl_reg() thread-safe.
    * fixup: fix an error in sync_with_other_backends
    ... when there is only one OpenCL device available.

commit edbf42edfd
Author: Henry Linjamäki
Date:   2025-05-21 13:21:17 -07:00

    opencl: fix couple crashes (#12795)

    * opencl: fix couple crashes
    * fix kernel launches failed on devices which do not support
      non-uniform work-groups. When non-uniform work-groups are not
      supported, set `local_work_size` to NULL (= let driver choose the
      work-group sizes). This patch does not cover everything - just the
      cases tested by test-backend-ops.
    * fix sub-buffer creation failed due to `cl_buffer_region::origin` not
      being aligned to `CL_DEVICE_MEM_BASE_ADDR_ALIGN`.
    * OpenCL: query non-uniform WG sizes only on OpenCL 3.0+

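The sub-buffer fix above hinges on one detail of the OpenCL spec: `CL_DEVICE_MEM_BASE_ADDR_ALIGN` is reported in *bits*, and `cl_buffer_region::origin` must be a multiple of it in bytes. A minimal sketch of that alignment arithmetic, assuming a hypothetical helper name (`align_origin_down` is illustrative, not llama.cpp's actual code):

```c
#include <stddef.h>

/* Round a sub-buffer origin down to the device's base address alignment.
 * base_addr_align_bits is the value queried via CL_DEVICE_MEM_BASE_ADDR_ALIGN,
 * which the OpenCL spec defines in bits, so convert to bytes first. */
size_t align_origin_down(size_t origin, unsigned base_addr_align_bits) {
    size_t align_bytes = base_addr_align_bits / 8; /* bits -> bytes */
    return origin - (origin % align_bytes);
}
```

For example, on a device reporting a 1024-bit (128-byte) alignment, an origin of 300 would be rounded down to 256 before creating the sub-buffer.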
						 
				 
			
				
					
						
							
							
commit d643bb2c79
Author: Diego Devesa
Date:   2025-05-21 22:09:57 +02:00

    releases : build CPU backend separately (windows) (#13642)

commit 8e186ef0e7
Author: Georgi Gerganov
Date:   2025-05-21 20:00:49 +03:00

    hparams : support models for which all layers use SWA (#13682)

    ggml-ci

commit 5fbfe384d4
Author: Georgi Gerganov
Date:   2025-05-21 19:46:56 +03:00

    server : improve error reporting (#13680)

commit c76532e7ba
Author: antichristHater
Date:   2025-05-21 18:40:35 +02:00

    convert : add qwen2vl support for unsloth merges (#13686)

commit 2aa777d86d
Author: Sigbjørn Skjæret
Date:   2025-05-21 16:57:38 +02:00

    examples : switch retrieval to llama_encode (#13685)

    * switch retrieval to llama_encode
    * enable --no-warmup for retrieval

commit eb0f5c28d3
Author: Emmanuel Ferdman
Date:   2025-05-21 16:33:54 +02:00

    gguf-py : display the invalid gguf type (#13687)

    Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

commit cf4cb59e64
Author: Xuan-Son Nguyen
Date:   2025-05-21 16:26:33 +02:00

    ggml : add ggml_gelu_erf() (#13667)

    * ggml : add ggml_gelu_na (not approximated)
    * fix naming order
    * rename na --> erf
    * apply review suggesions
    * revert naming order

						 
				 
			
				
					
						
							
							
commit 0d5c742161
Author: Robin Davidsson
Date:   2025-05-21 15:15:27 +02:00

    server : Add the endpoints /api/tags and /api/chat (#13659)

    * Add the endpoints /api/tags and /api/chat
    Add the endpoints /api/tags and /api/chat, and improved the model metadata response
    * Remove trailing whitespaces
    * Removed code that is not needed for copilot to work.

commit 42158ae2e8
Author: Dorin-Andrei Geman
Date:   2025-05-21 15:07:57 +02:00

    server : fix first message identification (#13634)

    * server : fix first message identification
    When using the OpenAI SDK (https://github.com/openai/openai-node/blob/master/src/lib/ChatCompletionStream.ts#L623-L626)
    we noticed that the expected assistant role is missing in the first
    streaming message. Fix this by correctly checking for the first message.
    Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
    Signed-off-by: Dorin Geman <dorin.geman@docker.com>
    * server : Fix checks for first role message for stream=True
    Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
    Signed-off-by: Dorin Geman <dorin.geman@docker.com>
    ---------
    Signed-off-by: Dorin Geman <dorin.geman@docker.com>
    Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>

commit 797f2ac062
Author: Georgi Gerganov
Date:   2025-05-21 15:11:13 +03:00

    kv-cache : simplify the interface (#13660)

    * kv-cache : simplify the interface
    ggml-ci
    * context : revert llama_batch_allocr position change
    ggml-ci

commit b44890df2e
Author: Georgi Gerganov
Date:   2025-05-21 13:09:21 +03:00

    model : disable SWA for Phi models (#13676)

    * model : disable SWA for Phi models
    ggml-ci
    * model : update warning message
    * model : print warning only if n_swa > 0
    * model : fix typo

commit 33983057d0
Author: R0CKSTAR
Date:   2025-05-21 09:58:49 +08:00

    musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (#13647)

    * musa: fix build warning (unused parameter)
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
    * musa: upgrade MUSA SDK version to rc4.0.1
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
    * musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
    * Update ggml/src/ggml-cuda/cpy.cu
    Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
    * musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
    ---------
    Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
    Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

commit fb1cab201c
Author: Eve
Date:   2025-05-20 21:35:16 +00:00

    vulkan: fix warnings (#13626)

    * small fixes
    * remove ifdef

commit b7a17463ec
Author: l3utterfly
Date:   2025-05-20 18:55:30 +02:00

    mtmd-helper : bug fix to token batching in mtmd (#13650)

    * Update mtmd-helper.cpp
    * Update tools/mtmd/mtmd-helper.cpp
    Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
    ---------
    Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

commit be0239693c
Author: Georgi Gerganov
Date:   2025-05-20 19:21:04 +03:00

    model : fix llama4 graph (#13663)

    ggml-ci

commit a4090d1174
Author: Georgi Gerganov
Date:   2025-05-20 16:13:16 +03:00

    llama : remove llama_kv_cache_view API + remove deprecated (#13653)

    ggml-ci

commit b69f1647f9
Author: Johannes Gäßler
Date:   2025-05-20 14:45:07 +02:00

    CUDA: skip fully masked-out KV in FA vec kernel (#13584)

    * CUDA: skip fully masked-out KV in FA vec kernel

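The optimization above rests on a simple observation: if the attention mask assigns -inf to every position of a KV slice, that slice contributes nothing to the softmax, so the kernel can skip it entirely. A plain-C sketch of the check under that assumption (illustrative only; the real code lives inside the CUDA FlashAttention vec kernel):

```c
#include <math.h>
#include <stdbool.h>

/* Returns true when every mask entry in the slice is -inf, i.e. the whole
 * KV slice is masked out: exp(-inf) = 0, so it adds nothing to the
 * softmax numerator or denominator and can be skipped. */
bool kv_slice_fully_masked(const float * mask, int n) {
    for (int i = 0; i < n; ++i) {
        if (mask[i] != -INFINITY) {
            return false; /* at least one attendable position */
        }
    }
    return true; /* safe to skip this slice */
}
```

This is the situation causal masks produce for whole blocks of future tokens, which is why the skip pays off in practice.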
						 
				 
			
				
					
						
							
							
commit 759e37b0d8
Author: Sigbjørn Skjæret
Date:   2025-05-20 12:03:17 +02:00

    tests : avoid github urls due to throttling (#13654)

commit 4245e622e0
Author: Svetlozar Georgiev
Date:   2025-05-20 11:34:15 +02:00

    sycl: disable reorder for sycl mulmat (#13536)

commit c9c64dee57
Author: 0cc4m
Date:   2025-05-20 10:11:56 +02:00

    Set GLM4 blk.*.attn_output.weight, kqv_out-* matmul to GGML_PREC_F32 to fix infinity values in output (#13639)

commit c00a2634be
Author: Georgi Gerganov
Date:   2025-05-20 10:41:40 +03:00

    metal : fix typo in FA kernel comments (#13651)

commit e298d2fbd0
Author: Georgi Gerganov
Date:   2025-05-20 08:05:46 +03:00

    kv-cache : add SWA support (#13194)

    * kv-cache : prepare for SWA
    ggml-ci
    * kv-cache : initial iSWA implementation
    ggml-ci
    * kv-cache : rework error recovery logic
    ggml-ci
    * models : fix Phi-3 SWA parameters
    ggml-ci
    * model : adjust Granite to rope factor changes
    ggml-ci
    * server : check if context can do shifts
    ggml-ci
    * iswa : for now, always enable shifts (experiment)
    ggml-ci
    * kv-cache : simplify SWA logic
    ggml-ci
    * kv-cache : apply defrag when we fail to find slots for the batch
    ggml-ci
    * llama : update docs about llama_decode
    ggml-ci
    * kv-cache : update warning logs when no space for the batch is available
    ggml-ci
    * llama : add llama_kv_self_seq_pos_min()
    * kv-cache : keep track of partial SWA computes and print warnings
    * server : disallow use cases involving partial SWA context
    ggml-ci
    * llama : add param to control SWA cache size
    ggml-ci
    * minor : clean-up
    ggml-ci

commit f0adb80bf7
Author: Xinpeng Dou
Date:   2025-05-20 11:43:43 +08:00

    CANN: Update CANN model support (#13162)

    * Update CANN model support status
    * Update of model support
    * update
    * update
    * update
    * fix format of CANN.md
    * fix format of CANN.md
    * fix format of CANN.md

commit f7c9429c85
Author: Nicolò Scipione
Date:   2025-05-20 08:54:43 +08:00

    sycl : Overcoming workaround for mmap() allocation on Windows (#13482)

    * Remove mmap workaround on windows
    After some testing I found that mmap is supported on windows and for
    many GPUs on Linux. Therefore I remove the workaround for windows since
    it is not necessary.
    * Update llama-bench README
    SYCL backend introduced a workaround that allows execution of
    llama-bench also without specifying `--mmp 0` flag

commit 1dfbf2cf3a
Author: psocolovsky
Date:   2025-05-19 21:17:36 +02:00

    common : add load_progress_callback (#13617)

commit 8960efd0a6
Author: 0cc4m
Date:   2025-05-19 17:54:08 +02:00

    Vulkan: Add f32 accumulator support to quantized mul mat to fix GLM4 32B incoherence (#13607)

commit 725f23f1f3
Author: Alberto Cabrera Pérez
Date:   2025-05-19 14:38:20 +01:00

    sycl : backend documentation review (#13544)

    * sycl: reviewing and updating docs
    * Updates Runtime error codes
    * Improves OOM troubleshooting entry
    * Added a llama 3 sample
    * Updated supported models
    * Updated releases table

commit 92ecdcc06a
Author: Xuan-Son Nguyen
Date:   2025-05-19 13:04:14 +02:00

    mtmd : add vision support for llama 4 (#13282)

    * wip llama 4 conversion
    * rm redundant __init__
    * fix conversion
    * fix conversion
    * test impl
    * try this
    * reshape patch_embeddings_0
    * fix view
    * rm ffn_post_norm
    * cgraph ok
    * f32 for pos embd
    * add image marker tokens
    * Llama4UnfoldConvolution
    * correct pixel shuffle
    * fix merge conflicts
    * correct
    * add debug_graph
    * logits matched, but it still preceives the image incorrectly
    * fix style
    * add image_grid_pinpoints
    * handle llama 4 preprocessing
    * rm load_image_size
    * rm unused line
    * fix
    * small fix 2
    * add test & docs
    * fix llava-1.6 test
    * test: add notion of huge models
    * add comment
    * add warn about degraded quality

commit f71f40a284
Author: Alberto Cabrera Pérez
Date:   2025-05-19 11:46:09 +01:00

    ci : upgraded oneAPI version in SYCL workflows and dockerfile (#13532)

commit d30cb5a7fa
Author: Georgi Gerganov
Date:   2025-05-19 13:29:56 +03:00

    sync : ggml

    ggml-ci

commit 6c35981a64
Author: Johannes Gäßler
Date:   2025-05-19 13:29:56 +03:00

    mnist: fix segmentation fault (ggml/1227)

commit 8b5e19aea6
Author: Diego Devesa
Date:   2025-05-19 13:29:56 +03:00

    ggml : fix apple OS check in ggml_print_backtrace (ggml/1229)

commit 60aea028b5
Author: Daniel Tang
Date:   2025-05-19 13:29:56 +03:00

    ggml : Fix missing backtrace on Linux (ggml/1228)

    * Modern Linux defaults /proc/sys/kernel/yama/ptrace_scope to 1
    * Fixed lldb attach
    * Simplify by having the child do ggml_print_backtrace_symbols

commit 9c55e5c5c2
Author: Nick
Date:   2025-05-19 13:25:41 +03:00

    fix: check model pointer validity before use (#13631)

commit 33d7aed4a8
Author: Chenguang Li
Date:   2025-05-19 14:21:17 +08:00

    CANN: Support MOE Model MUL_MAT_ID (#13042)

    Signed-off-by: noemotiovon <757486878@qq.com>

commit 6a2bc8bfb7
Author: Isaac McFadyen
Date:   2025-05-17 23:59:48 +02:00

    server : added --no-prefill-assistant flag (#13608)

    * added no-prefill-assistant flag
    * reworded documentation comment
    * updated server README.md

commit e3a7cf6c5b
Author: Gilad S.
Date:   2025-05-17 15:26:43 -03:00

    cmake: use the current build config for vulkan-shaders-gen (#13595)

    * fix: use the current build config for `vulkan-shaders-gen`
    * fix: only pass a valid build type to `--config`

commit 518329b2d4
Author: Georgi Gerganov
Date:   2025-05-17 12:58:55 +03:00

    parallel : add option for non-shared and larger prompts (#13598)

    * parallel : add option for non-shared and larger prompts
    * parallel : update readme [no ci]
    * cont : add note about base models [no ci]
    * parallel : better var name
    ggml-ci

commit 2f5a4e1e09
Author: Jeff Bolz
Date:   2025-05-17 09:14:55 +02:00

    vulkan: move common FA code to flash_attn_base.comp (#13556)

    * vulkan: move common FA code to flash_attn_base.comp
    * vulkan: move common FA index/stride setup code to flash_attn_base.comp
    * build fix

commit 4f41ee11d6
Author: Jeff Bolz
Date:   2025-05-17 08:35:47 +02:00

    vulkan: use scalar FA rather than coopmat2 when N==1 (#13554)

commit 3e0be1cace
Author: Z
Date:   2025-05-16 22:56:28 +02:00

    llguidance : official v0.7.20 release (no actual changes) [noci] (#13594)

commit 6aa892ec2a
Author: Xuan-Son Nguyen
Date:   2025-05-16 21:50:00 +02:00

    server : do not return error out of context (with ctx shift disabled) (#13577)

commit aea9f8b4e7
Author: Xuan-Son Nguyen
Date:   2025-05-16 21:49:01 +02:00

    webui : improve accessibility for visually impaired people (#13551)

    * webui : improve accessibility for visually impaired people
    * add a11y for extra contents
    * fix some labels being read twice
    * add skip to main content

commit 06c1e4abc1
Author: Xuan-Son Nguyen
Date:   2025-05-16 20:04:18 +02:00

    readme : add list of dependencies and their license (#13591)