Aaron Teo
ff27f80a74
ggml: initial IBM zDNN backend (#14975)

* ggml-zdnn: inital backend impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: temp change z17 to arch15
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: fix build bugs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: tensor->extra logging check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: add layout name mapping, ztensor information
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: separate logging into its own line
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: add shape comparison
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: add ggml_tensor shape log
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: fix incorrect shape logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add output buffer check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: run compute and store into tensor->extra
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add set_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add more loggers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: update set_tensor logging to check only for matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: last working matmul version
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add comments to prevent accidentally deleting lines
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: support op out_prod
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: update op out_prod to use tensor->extra
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: rewrite the backend implementation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: bugfix new impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix compiler warnings and bugfixes
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: test ztensor finding in init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: implement at least 1 op to test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: assign tensor->extra to buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add check for view tensors to prevent init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: rework init_tensor to create new buffers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: switch to std vector instead of array
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: switch buffers back and set to arbitrary number
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: impl init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: update supports_op matmul matrix
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix incorrect ztensor shape, reduce memory padding
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: impl matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix compiler error missing type
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix missing data transform call
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add bias init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: tighten memory usage, change string allocation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add bias ztensor and data free
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add bias data transform
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add more debug info for extra buffer transform
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add logger to check if mat mul ops go through set_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: activate bias transform in matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: move weights transform into mulmat
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add more safeguards in matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix sequencing of transforms
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: bugfix transform ztensor vs origtensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: figure out why sigtrap is happening
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix sigsegv
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: move everything back to local declaration
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: move bias data to local also
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: bring back working matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: rewrite into mre
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix missing vector import
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix missing vector import in header
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt to fix sigsegv
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix missing load tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix invalid ztensor buffer release
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add logging to debug free buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: remove free_buffer debug info
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add parmblkformat detections
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add nnpa installed detection
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add zdnn_init call for static libs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt at fixing invalid buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: switch to using deque to fix pointer deref problem
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add weights logging to check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt to use unique ptr
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add tensor to pre_tfm_desc logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add inputs logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: disable op_none initialisation for testing
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix missing return from init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: load ztensors in cgraph exec
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: work on moving output ztensor as well
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: disable logging and breakpoints for full test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt at manually changing the layout
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt at using default nwhc format instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: disable global load ztensor for now
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix errorenous output load tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add guards to prevent loading ztensor if transformed
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: bring load ztensor back to init routine
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix ztensor deallocation abort
stabilise ggml <-> zdnn api
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: clean up matmul selection
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: clean up project structure
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: update documentation, prepare for upstream
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* chore: add codeowners
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: disable batched matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt at fixing tensor views during matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: deny all view tensors directly
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix pr comments
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: update ops docs for zdnn
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: redo test-backend-ops for ops.md
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix typo in build-s390x.md
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* codeowners: remove taronaeo for now
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "codeowners: remove taronaeo for now"
This reverts commit 411ea4ed78 [...] <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

2025-08-15 21:11:22 +08:00

rainred
cf9e5648a7
mtmd : Fix MinicpmV model converter and clip to avoid using hardcode. (#14750)

* Fix MinicpmV model converter and clip to avoid using hardcode.
* Code update for pr/14750
* Remove unused field, update script path in docs.
* Add version 5 for fallback code.
---------
Co-authored-by: lzhang <zhanglei@modelbest.cn>

2025-08-11 16:12:12 +02:00

tc-mb
952a47f455
mtmd : support MiniCPM-V 4.0 (#14983)

* support minicpm-v 4
* add md
* support MiniCPM-o 4.0
* add default location
* temp rm MiniCPM-o 4.0
* fix code
* fix "minicpmv_projector" default path

2025-07-31 17:22:17 +02:00

hipudding
11490b3672
CANN: Improve loading efficiency after converting weights to NZ format. (#14985)

* CANN: Improve loading efficiency after converting weights to NZ format.
* CANN: fix typo

2025-07-31 19:47:20 +08:00

Xinpeng Dou
61550f8231
CANN: update ops docs (#14935)

* CANN:add ops docs
* CANN: update ops docs

2025-07-30 08:39:24 +08:00

lhez
8ad7b3e65b
opencl : add ops docs (#14910)

2025-07-28 18:50:17 +02:00

Xuan-Son Nguyen
00fa15fedc
mtmd : add support for Voxtral (#14862)

* mtmd : add support for Voxtral
* clean up
* fix python requirements
* add [BEGIN_AUDIO] token
* also support Devstral conversion
* add docs and tests
* fix regression for ultravox
* minor coding style improvement
* correct project activation fn
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

2025-07-28 15:01:48 +02:00

Georgi Gerganov
a5771c9eea
ops : update BLAS (#14914)

2025-07-28 10:01:03 +02:00

Georgi Gerganov
c35f9eaf09
ops : update Metal (#14912)

2025-07-28 08:22:56 +03:00

Ruben Ortlam
bf78f5439e
vulkan: add ops docs (#14900)

2025-07-27 15:33:08 +02:00

Akarshan Biswas
bbfc849274
SYCL: add ops doc (#14901)

2025-07-27 17:52:58 +05:30

Aman Gupta
446595b9b3
Docs: add instructions for adding backends (#14889)

2025-07-27 09:36:43 +08:00

Aaron Teo
c7f3169cd5
ggml-cpu : disable GGML_NNPA by default due to instability (#14880)

* docs: update s390x document for sentencepiece
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit e086c5e3a7 [...] <aaron.teo1@ibm.com>
(cherry picked from commit 8410b085ea [...] fixes #14877
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 412f4c7c88 [...] <aaron.teo1@ibm.com>
(cherry picked from commit c1eeae1d0c [...] <aaron.teo1@ibm.com>

2025-07-25 19:09:03 +02:00

wooksong
e7fecba934
docs : update HOWTO‑add‑model.md for ModelBase and new model classes (#14874)

This patch updates the example in docs/development/HOWTO-add-model.md to
reflect recent changes after `TextModel` and `MmprojModel` were introduced.
It replaces the outdated `Model` base class with `TextModel` or `MmprojModel`
and updates the registration example accordingly.
Signed-off-by: Wook Song <wook16.song@samsung.com>

2025-07-25 16:25:05 +02:00

R0CKSTAR
3f4fc97f1d
musa: upgrade musa sdk to rc4.2.0 (#14498)

* musa: apply mublas API changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: update musa version to 4.2.0
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: restore MUSA graph settings in CMakeLists.txt
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: disable mudnnMemcpyAsync by default
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: switch back to non-mudnn images
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* minor changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: restore rc in docker image tag
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

2025-07-24 20:05:37 +01:00

Pouya
39cffdf188
docs: add libcurl-dev install hint for Linux distros (#14801)

* docs: add libcurl-dev install hint for Linux distros
Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com>
* Update docs/build.md
---------
Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

2025-07-24 11:26:44 +02:00

rspOverflow
b526ad2668
Documentation: Further revisions to the Vulkan section in build.md (#14785)

* Documentation: Revised and further improved the Vulkan instructions for Linux users in build.md.
* Minor: Revise step 2 of the Vulkan instructions for Linux users in build.md

2025-07-20 18:55:32 +02:00

rspOverflow
f0d4d176df
Documentation: Update build.md's Vulkan section (#14736)

* Documentation: Rewrote and updated the "Without docker" portion of the Vulkan backend build documentation.
* Documentation: Reorganize build.md's Vulkan section.

2025-07-19 12:18:36 +02:00

Reese Levine
21c021745d
ggml: Add initial WebGPU backend (#14521)

* Minimal setup of webgpu backend with dawn. Just prints out the adapter and segfaults
* Initialize webgpu device
* Making progress on setting up the backend
* Finish more boilerplate/utility functions
* Organize file and work on alloc buffer
* Add webgpu_context to prepare for actually running some shaders
* Work on memset and add shader loading
* Work on memset polyfill
* Implement set_tensor as webgpu WriteBuffer, remove host_buffer stubs since webgpu doesn't support it
* Implement get_tensor and buffer_clear
* Finish rest of setup
* Start work on compute graph
* Basic mat mul working
* Work on emscripten build
* Basic WebGPU backend instructions
* Use EMSCRIPTEN flag
* Work on passing ci, implement 4d tensor multiplication
* Pass thread safety test
* Implement permuting for mul_mat and cpy
* minor cleanups
* Address feedback
* Remove division by type size in cpy op
* Fix formatting and add github action workflows for vulkan and metal (m-series) webgpu backends
* Fix name
* Fix macos dawn prefix path

2025-07-16 18:18:51 +03:00

Aman Gupta
11ee0fea2a
Docs: script to auto-generate ggml operations docs (#14598)

* Docs: script to auto-generate ggml operations docs
* Review: formatting changes + change github action
* Use built-in types instead of typing
* docs : add BLAS and Metal ops
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2025-07-10 23:29:01 +08:00

Xuan-Son Nguyen
08382869a2
model : add SmolLM3 (#14581)

* Init - first pass.
* Model -> ModelBase.
* fix errors in conversion.
* Update the graph.
* up.
* up.
* wip
* cgraph ok
* rm redundant code
---------
Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com>

2025-07-08 18:07:01 +02:00

Grzegorz Grasza
1b2aaf28ac
Add Vulkan images to docker.md (#14472)

Right now it's not easy to find those.

2025-07-01 15:44:11 +02:00

Aaron Teo
bf5bcd0b85
docs: update s390x documentation + add faq (#14389)

* docs: update s390x documentation + add faq
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: add s390x z17 build q&a
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

2025-06-26 12:41:41 +02:00

Aaron Teo
60ef23d6c1
ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317)

* ggml-cpu: add nnpa compile flag
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 4a9f60c201 [...] <aaron.teo1@ibm.com>
(cherry picked from commit 8d4a7987f9 [...] <aaron.teo1@ibm.com>
(cherry picked from commit 0ff0d65162 [...] <aaron.teo1@ibm.com>
(cherry picked from commit 2f58bbcbb8 [...] <aaron.teo1@ibm.com>
(cherry picked from commit 01b929491b [...] <aaron.teo1@ibm.com>
* ggml-cpu: fix print vs printf
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix float placeholder
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: ensure fp16 and fp32 load and stores are called
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fp16 load ensured to hit
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove sigint from fp16 store
for some reason, the function is not getting a hit when debugged with gdb. we will need to investigate further
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: nnpa switch to vec_xst test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to vec_xst for 4 element loops also
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: rework noop
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove noop, general code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: clarify variable naming
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add breakpoint for debugging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: test fix for conversion failure
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: disable fp32->fp16 nnpa conversions for now
there are some conversion failures in nnpa that requires the eyes of an
ibm stsm. will create a separate pr to introduce the fp32->fp16 change.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to elif macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: reattempt fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix typo
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: reattempt fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix compiler types
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: change to typedef vector types
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add 4 element loops for fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: clarified vector naming
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back fp32->fp16 store nnpa
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: activate nnpa fp32->fp16 or fp16->fp32 compute
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add nnpa macro check in ggml-impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add missing __func__
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: diagnose why __NNPA__ macro is not being defined
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: import vecintrin.h to fix compiler errors
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: update macro tests
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move s390x typedef to own header file
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: move s390x typedef to own header file"
This reverts commit 157f856c34 [...] <aaron.teo1@ibm.com>
* ggml-cpu: switch to importing ggml-cpu-impl instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix macro declaration
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: test more macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add debug prints
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bruteforce macro definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move macro definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add ggml-impl.h to cmakelists
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to private macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move s390x typedef to own header file
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 157f856c34 [...] <aaron.teo1@ibm.com>
* ggml-cpu: bring back compile macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to quotes for import
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add compiler error macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add s390x detection in ggml-src
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back compile definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: undo cmakelists work
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: move s390x typedef to own header file"
This reverts commit 18d79e1a30 [...] <aaron.teo1@ibm.com>
* ggml-cpu: remove typedefs.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove typedef from cmakelists
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add ggml-impl.h future notes
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add todo comment for future reference
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: clarify naming of dlf16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove unnecessary target compile definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move nnpa fp16->fp32 and fp32->fp16 to simd-mappings
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: update broken huggingface link for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix duplicate func names during compile
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: fix duplicate func names during compile"
This reverts commit fbb733451f [...] <aaron.teo1@ibm.com>
* Revert "ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu"
This reverts commit bd288e8fa5 [...] <aaron.teo1@ibm.com>
* ggml: refactor fp16<->fp32 simd to ggml-cpu
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix missing simd-mappings.h import in quants.c
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix missing simd-mappings.h within repack
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix amx mmq missing simd-mappings.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: attempt at fixing loongarch failing build
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move nnpa together with other fp16<->fp32 simd
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix wrong refactor of ggml-base
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164176555
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: remove dependency on ggml-cpu from ggml-base
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: rename all fp16<->fp32 macros to prefix with ggml_cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164449406
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove mistaken fallback macro
fallback logic was already implemented but i was too sleepy to realise
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: move ggml_table_f32_f16 to ggml-cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures"
This reverts commit 32a3533564 [...] <aaron.teo1@ibm.com>
* Revert "ggml: move ggml_table_f32_f16 to ggml-cpu"
This reverts commit 9e40d984ad [...] <aaron.teo1@ibm.com>
* ggml: move ggml_table_f32_f16 to ggml-cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 9e40d984ad [...] <aaron.teo1@ibm.com>
* ggml-cpu: extern c ggml_table_f32_f16 + chore docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h
we rely on the variable declaration in ggml-cpu.c instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h"
This reverts commit f71b21d2f7 [...] <aaron.teo1@ibm.com>
* ggml-cpu: bring back ggml_table_f32_f16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: bring back ggml_table_f32_f16"
This reverts commit 2dce119178 [...] <aaron.teo1@ibm.com>
* fix ggml time initialization
* fix f32_f16 table init
* remove extra line
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: slaren <slarengh@gmail.com>

2025-06-25 23:49:04 +02:00

Anton Mitkov
2bf9d539dd
sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (#13973)

2025-06-25 18:09:55 +02:00

David Chiu
d860dd99a4
docs : fix the link to llama.h (#14293)

2025-06-20 19:43:35 +02:00

								Aaron Teo 
							
						 
					 
					
						
						
							
						
						8d94713654 
					 
					
						
						
							
							docs: add s390x build documentation ( #14264 )  
						
						... 
						
						
						
						* docs: add s390x-specific build docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: add s390x model conversion steps
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: s390x build indent
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: update hyperlinks for s390x docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: update llama.h docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: s390x add accelerator and perf optimizations
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: s390x indent blocks
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: revert block indentation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: add support information for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: s390x reword
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: remove indentation for accelerator section s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: remove redundant words s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: reword for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: s390x reword simd
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
* docs: fix trailing whitespace for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com >
2025-06-18 18:10:26 +01:00

Pepijn de Vos
00ba772610
docs : remove WIP since PR has been merged ( #13912 )
2025-06-15 08:06:37 +02:00

ddpasa
26ff3685bf
docs : Update multimodal.md ( #14122 )
* Update multimodal.md
* Update multimodal.md
2025-06-13 15:17:53 +02:00

Xinpeng Dou
e21d2d4ae2
CANN: Simplify the environment variable setting ( #13104 )
* Simplify the environment variable setting to specify the memory pool type.
* Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options.
* update
* fix CI
* update
* delete whitespace
* fix according to review
* update CANN.md
* update CANN.md
2025-06-09 19:47:39 +08:00

Xuan-Son Nguyen
ea1431b0fa
docs : add "Quick start" section for new users ( #13862 )
* docs : add "Quick start" section for non-technical users
* rm flox
* Update README.md
2025-06-03 13:09:36 +02:00

Jiří Podivín
b3a89c3d9e
docs : Note about necessity of having libcurl installed for standard build. ( #13945 )
Signed-off-by: Jiri Podivin <jpodivin@gmail.com >
2025-05-31 18:58:35 +02:00

Xuan-Son Nguyen
bc583e3c63
mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) ( #13784 )
* mtmd : allow multiple modalities at the same time
* refactor mtmd tokenizer
* fix compile
* ok, missing SinusoidsPositionEmbedding
* first working version
* fix style
* more strict validate of n_embd
* refactor if..else to switch
* fix regression
* add test for 3B
* update docs
* fix tokenizing with add_special
* add more tests
* fix test case "huge"
* rm redundant code
* set_position_mrope_1d rm n_tokens
2025-05-27 14:06:10 +02:00

bandoti
72b090da2c
docs: remove link for llama-cli function calling ( #13810 )
2025-05-27 08:52:40 -03:00

Bizhao Shi
2d38b6e400
CANN: Add the basic supports of Flash Attention kernel ( #13627 )
* cann: add the basic FA support
* cann: update the readme
* cann: update the FlashAttention with PSEShift
* cann: update the input parameters in FA
* cann: update the alibi with max_bias
* cann: add the constraints of softcap
* cann: update the docs CANN.md
* cann: update the docs CANN.md
* cann: fix typo of CANN.md
* cann: add some comments and update the CANN.md
* cann: update the CANN.md
* cann: update the inner precise for fusedInferAttention
* cann: update the constraints of flash_attn_ext on ggml-cann.cpp
* cann: clean the whitespace
* cann: clean the whitespace
* cann: add a new endline
2025-05-26 10:20:18 +08:00

Xuan-Son Nguyen
40aaa8a403
mtmd : add support for Qwen2-Audio and SeaLLM-Audio ( #13760 )
* mtmd : add Qwen2-Audio support
* small clean up
* update discussion link
* clarify mtmd_get_output_embd
* clarification in multimodal.md
* fix ultravox bug
* ggml_cont
2025-05-25 14:06:32 +02:00

ddpasa
a08c1d2845
docs : add Moondream2 pre-quantized link ( #13745 )
* Multimodal: Added Moondream2 model and fixed ggml.org link
* Apply suggestions from code review
---------
Co-authored-by: name <none@none.com >
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
2025-05-25 14:04:49 +02:00

Olivier Chafik
f5cd27b71d
server: streaming of tool calls and thoughts when --jinja is on ( #12379 )
* add common_json w/ support for truncated json healing
* add common_chat_msg_diff
* partial common_chat_parse
* refactor parser w/ optionals
* server: wire chat diffs in stream mode
* fix trigger of thinking models (must happen after thoughts are closed)
* fix functionary v3.2 raw python!
* rename: common_chat_syntax (now contains format)
* rm common_regex.at_start
* don't return empty <think></think>
* accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)
* fix QwQ 32B tool call parsing after thoughts (hermes2)
* better logs for grammar triggers
* consume spaces after parse_json_tool_calls
* fix required tool calls w/ thinking models that have pre-opened thinking tags
* fix thinking model's initial trigger + test qwq's template
* run most test_tool_call tests in stream + non-stream modes
* make functionary v3.2 parsing more strict (differentiate first match from others)
* send final diff from server, to close off raw python arguments
* support partial content streaming in Generic mode
* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)
* Update function-calling.md
* Update tool_bench.py
* chat-parser: remove input from exception (llm output may contain PII)
---------
Co-authored-by: ochafik <ochafik@google.com >
Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com >
2025-05-25 01:48:08 +01:00

Xuan-Son Nguyen
797990c4bc
mtmd : add ultravox audio input ( #13623 )
* convert ok, load ok
* warmup ok
* test
* still does not work?
* fix padding
* temporary give up
* fix merge conflict
* build_ultravox()
* rm test
* fix merge conflict
* add necessary mtmd APIs
* first working version (only 4s of audio)
* will this monster compile?
* fix compile
* please compile
* fPIC
* fix windows
* various fixes
* clean up audio_helpers
* fix conversion
* add some debug stuff
* long audio input ok
* adapt the api
* add --audio arg
* final touch UX
* add miniaudio to readme
* fix typo
* refactor kv metadata
* mtmd_default_marker()
2025-05-22 20:42:48 +02:00

R0CKSTAR
33983057d0
musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy ( #13647 )
* musa: fix build warning (unused parameter)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: upgrade MUSA SDK version to rc4.0.1
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
* Update ggml/src/ggml-cuda/cpy.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de >
* musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com >
Co-authored-by: Johannes Gäßler <johannesg@5d6.de >
2025-05-21 09:58:49 +08:00

Xinpeng Dou
f0adb80bf7
CANN: Update CANN model support ( #13162 )
* Update CANN model support status
* Update of model support
* update
* update
* update
* fix format of CANN.md
* fix format of CANN.md
* fix format of CANN.md
2025-05-20 11:43:43 +08:00

Alberto Cabrera Pérez
725f23f1f3
sycl : backend documentation review ( #13544 )
* sycl: reviewing and updating docs
* Updates Runtime error codes
* Improves OOM troubleshooting entry
* Added a llama 3 sample
* Updated supported models
* Updated releases table
2025-05-19 14:38:20 +01:00

Xuan-Son Nguyen
92ecdcc06a
mtmd : add vision support for llama 4 ( #13282 )
* wip llama 4 conversion
* rm redundant __init__
* fix conversion
* fix conversion
* test impl
* try this
* reshape patch_embeddings_0
* fix view
* rm ffn_post_norm
* cgraph ok
* f32 for pos embd
* add image marker tokens
* Llama4UnfoldConvolution
* correct pixel shuffle
* fix merge conflicts
* correct
* add debug_graph
* logits matched, but it still perceives the image incorrectly
* fix style
* add image_grid_pinpoints
* handle llama 4 preprocessing
* rm load_image_size
* rm unused line
* fix
* small fix 2
* add test & docs
* fix llava-1.6 test
* test: add notion of huge models
* add comment
* add warn about degraded quality
2025-05-19 13:04:14 +02:00

Łukasz Ślusarczyk
9c404ed54c
sycl: use oneDNN for matrices multiplication ( #12972 )
2025-05-15 16:53:41 +02:00

ddpasa
21ca987fba
docs: Update link to ggml-org in multimodal.md ( #13513 )
* Update multimodal.md
Minor change to include the huggingface link
* Update docs/multimodal.md
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com >
2025-05-14 09:59:12 +02:00

Thomas Germer
62d4250e52
docs : Fix typo in InternVL3 model name ( #13440 )
2025-05-10 22:26:46 +02:00

Xuan-Son Nguyen
3b24d26c22
server : update docs ( #13432 )
2025-05-10 18:44:49 +02:00

Xuan-Son Nguyen
053367d149
mtmd : support InternVL 2.5 and 3 ( #13422 )
* convert : internvl support
* InternVL3-1B working
* fix regression
* rm mobilevlm from test
* fix conversion
* add test for internvl
* add to list of pre-quant
* restore boi/eoi check
* add clarify comment for norm eps
2025-05-10 16:26:42 +02:00

Xuan-Son Nguyen
33eff40240
server : vision support via libmtmd ( #12898 )
* server : (experimental) vision support via libmtmd
* mtmd : add more api around mtmd_image_tokens
* mtmd : add more api around mtmd_image_tokens
* mtmd : ability to calc image hash
* shared_ptr for mtmd_image_tokens
* move hash to user-define ID (fixed)
* abstract out the batch management
* small fix
* refactor logic adding tokens to batch
* implement hashing image
* use FNV hash, now hash bitmap instead of file data
* allow decoding image embedding to be split into batches
* rm whitespace
* disable some features when mtmd is on
* fix --no-mmproj-offload
* mtmd_context_params no timings
* refactor server_inp to server_tokens
* fix the failing test case
* init
* wip
* working version
* add mtmd::bitmaps
* add test target
* rm redundant define
* test: mtmd_input_chunks_free
* rm outdated comment
* fix merging issue
* explicitly create mtmd::input_chunks
* mtmd_input_chunk_copy
* add clone()
* improve server_input struct
* clip :  fix confused naming ffn_up and ffn_down
* rm ffn_i/o/g naming
* rename n_embd, n_ff
* small fix
* no check n_ff
* fix detokenize
* add const to various places
* add warning about breaking changes
* add c api
* helper: use mtmd_image_tokens_get_n_pos
* fix ctx_shift
* fix name shadowing
* more strict condition
* support remote image_url
* remote image_url log
* add CI test
* do not log base64
* add "has_multimodal" to /props
* remove dangling image
* speculative: use slot.cache_tokens.insert
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
* rm can_be_detokenized
* on prompt processing done, assert cache_tokens.size
* handle_completions_impl returns void
* adapt the new web ui
* update docs and hot topics
* rm assert
* small fix (2)
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com >
2025-05-09 19:29:37 +02:00

Xuan-Son Nguyen
9b61acf060
mtmd : rename llava directory to mtmd ( #13311 )
* mv llava to mtmd
* change ref everywhere
2025-05-05 16:02:55 +02:00