commit b64d7cc272 (cmdr2, 2025-03-03 18:18:11 +02:00)
cuda: unary ops as float + de-duplicate (ggml/1130)
commit 0cbee131ad (cmdr2, 2025-03-03 18:18:11 +02:00)
cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129)

ggml-ci

commit 87abb7e903 (cmdr2, 2025-03-03 18:18:11 +02:00)
cuda/cpu: Increase support for fp16 unary operations (ggml/1125)

* Support fp16 unary operations in the CUDA backend
* cpu: increase fp16 support for unary operators in the CPU backend
* cuda: increase fp16 support for unary operators in the CUDA backend
* Add test cases for fp16 unary operators
* metal: update supports_op for unary operators that don't support fp16, to prevent test-backend-ops from failing
* metal: fix PR comments for unary op support after fp16 unary tests
commit 6d4c23b81b (Diego Devesa, 2025-03-03 18:18:11 +02:00)
whisper : support GGML_BACKEND_DL (whisper/2843)

* whisper : support GGML_BACKEND_DL
* fix DTW crash
* whisper.objc : fix build - add ggml-cpp.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

commit 6512a90037 (midnight, 2025-03-03 18:18:11 +02:00)
cmake : fix compile assumptions for power9/etc (whisper/2777)

* Add small comment re: VSX to readme

Co-authored-by: midnight <midnight@example.com>
commit 4512055792 (petterreinholdtsen, 2025-03-03 18:18:11 +02:00)
Told cmake to install ggml-cpp.h as a public header file. (ggml/1126)

It is used by the Whisper talk-llama example.

Co-authored-by: Petter Reinholdtsen <pere@debian.org>

commit f54a4ba11e (cmdr2, 2025-03-03 18:18:11 +02:00)
Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (ggml/1121)

* Support float16-to-float16 add/sub/mul/div operations in the CUDA backend
* Add fp16 support for add/sub/mul/div on the CPU backend
* Add test cases for fp16 add/sub/mul/div
commit 9660ffef58 (ag2s20150909, 2025-03-03 13:54:08 +01:00)
ggml : fix kleidiai build (#12159)

The libggml API has changed, but this has not been updated.

commit ece9745bb8 (Akarshan Biswas, 2025-03-03 11:07:22 +01:00)
SYCL: Move CPY kernels to a separate file and add a few missing kernels (#12133)

* SYCL: refactor and move cpy kernels to a separate file
* Add a few missing cpy kernels
* refactor and add debug logs
commit cc473cac7c (Diego Devesa, 2025-03-02 22:11:00 +01:00)
ggml-backend : keep paths in native string type when possible (#12144)

commit 80c41ddd8f (Erik Scholz, 2025-03-01 12:57:22 +01:00)
CUDA: compress mode option and default to size (#12029)

CUDA 12.8 added the option to specify stronger compression for binaries, so we now default to "size".
commit 70680c48e5 (William Tambellini, 2025-02-28 14:41:47 +01:00)
ggml : upgrade init_tensor API to return a ggml_status (#11854)

* Upgrade init_tensor API to return a ggml_status
  To prepare for an 'abort-free' ggml (ggml not aborting on OOM but returning an OOM status), as agreed with Diego in the ggml repo, upgrade the init_tensor() and view_init() APIs to return a ggml_status.
* misc fixes

Co-authored-by: slaren <slarengh@gmail.com>
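The "abort-free" pattern this commit describes (return an out-of-memory status instead of aborting) can be illustrated with a minimal Python sketch. The names here (`GgmlStatus`, `init_tensor`, `budget`) are illustrative stand-ins, not the actual C API:

```python
from enum import Enum

class GgmlStatus(Enum):
    # Hypothetical mirror of a ggml_status-style enum; names are
    # illustrative only, not the real ggml identifiers.
    SUCCESS = 0
    ALLOC_FAILED = 1

def init_tensor(nbytes: int, budget: int):
    # Sketch of the abort-free pattern: when allocation would exceed the
    # budget, return a status the caller can handle instead of aborting.
    if nbytes > budget:
        return GgmlStatus.ALLOC_FAILED, None
    return GgmlStatus.SUCCESS, bytearray(nbytes)

status, buf = init_tensor(64, budget=1024)
assert status is GgmlStatus.SUCCESS and len(buf) == 64
```

The point of the API change is that every caller now sees the failure as a value it must check, rather than as a process abort.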
commit 438a83926a (Rémy O, 2025-02-28 09:42:52 +01:00)
vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (#11595)

* vulkan: implement specialized MMV kernels for IQ2 quantizations
* vulkan: add MMV kernels for IQ3 quants
* vulkan: Increase MMV batch size and unroll IQ LUT setup
* vulkan: fix init_iq_shmem for WG sizes larger than tables
* vulkan: common batch size for all I-quants

commit 9c42b1718c (Johannes Gäßler, 2025-02-28 09:26:43 +01:00)
CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (#12098)
commit 05e6f5aad0 (Prashant Vithule, 2025-02-28 09:36:12 +02:00)
ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (#12064)

* Added SVE support for Q2_K quantized models
* Use 4-space indentation in the switch cases
* Removed comment lines
* Removed the loop; retained the curly braces for better readability
* Removed the comment line added for the q3_k_q8_k kernel

Co-authored-by: vithulep <p.m.vithule1517@gmail.com>
commit 673cfef9aa (hipudding, 2025-02-28 15:23:47 +08:00)
CANN: Fix build error with GCC 13 (#11990)

Remove an unused header file that causes a compilation failure on the ARM platform with GCC 13.

commit fbeda9002d (Eve, 2025-02-28 08:20:08 +01:00)
vulkan: matmul dequantization improvements (#12015)

* faster dequant for old quants
* don't use unpack for iq4_nl
* vec2 unpack for q8
commit 581650b7ca (Daniele, 2025-02-28 07:52:51 +01:00)
vulkan: improve im2col (#11826)

* vulkan: improve im2col performance

commit b95c8af37c (Vladimir Vuksanovic, 2025-02-27 09:42:48 +02:00)
cmake: Fix ggml backend dependencies and installation (#11818)

* Fix dependencies between ggml and backends: ggml backends link only to ggml-base, and ggml links to all backends.
* Fix installation of ggml backends: set up GNUInstallDirs before setting the installation directory of ggml backends.
commit a82c9e7c23 (Jeff Bolz, 2025-02-25 16:30:21 +01:00)
vulkan: fix assertion when qy_needs_dequant (#12068)

Looks like a copy/paste bug from qx_needs_dequant.

commit c132239bfb (Judd, 2025-02-25 12:32:20 +01:00)
add OP sigmoid (#12056)

Co-authored-by: Judd <foldl@boxvest.com>
commit 393fca629e (Molly Sophia, 2025-02-25 19:28:22 +08:00)
ggml-cpu: Fix build with sve (#12059)

* ggml-cpu: Fix build with sve
* ggml-cpu: Remove unused variable in sve q3_k vec dot

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

commit 61d4f39dfe (Rémy O, 2025-02-25 12:04:45 +01:00)
vulkan: implement more backpropagation operators (#11914)

* vulkan: implement GGML_OP_ROPE_BACK
* vulkan: implement GGML_OP_RMS_NORM_BACK
* vulkan: implement GGML_OP_SILU_BACK
* vulkan: implement GGML_OP_SOFTMAX_BACK
commit 58d07a8043 (Gian-Carlo Pascutto, 2025-02-25 11:27:58 +02:00)
metal : copy kernels for quant to F32/F16 conversions (#12017)

metal: use dequantize_q templates

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

commit 34a846b584 (lhez, 2025-02-24 14:47:07 -07:00)
opencl: fix for small models (#11950)

* opencl: fix small shape gemv, remove unused extensions
* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size
* opencl: fix for token length < 4
* opencl: use wave size of 64 for all Adreno GPUs

Co-authored-by: Shawn Gu <quic_shawngu@quicinc.com>
Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>
commit 08d5986290 (Neo Zhang Jianyu, 2025-02-24 22:33:23 +08:00)
[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)

* optimize performance by reordering for Intel GPU
* detect hw type, save the opt feature, and print the opt feature
* correct name
* support optimizing the graph once when computing the graph; record the opt status in tensor->extra; make CI pass
* add env variable GGML_SYCL_DISABLE_OPT for debugging
* use syclex::architecture in place of the custom hw define; update the guide for GGML_SYCL_DISABLE_OPT
* add performance data
* move getrows functions to separate files
* fix global variables

Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
commit 8303e8b0fb (Akarshan Biswas, 2025-02-24 10:18:25 +00:00)
SYCL: Fix GGML_SYCL_DEBUG macro (#11995)
commit af7747c95a (Aaron Teo, 2025-02-22 21:39:24 +00:00)
ggml-cpu: Support s390x SIMD Instruction Set (#12019)

* ggml: add s390x ARCH_FLAGS for compilation
* ggml: add SIMD for s390x using vector intrinsics
  SIMD is activated for:
  * ggml_vec_dot_f32
  * ggml_vec_dot_f16
  * ggml_vec_mad_f32
  * ggml_vec_mad_f16
  * ggml_vec_mad_f32_unroll
  * ggml_vec_scale_f32
  * ggml_vec_scale_f16
  SIMD is NOT activated for:
  * ggml_vec_dot_f16_unroll (pending bugfix)
* ggml: fix missing escape character in GGML_F32x4_REDUCE
* ggml: add temporary patch for GGML_F32_ARR and GGML_F16_ARR
* ggml: fix s390x GGML_F32x4_REDUCE
* ggml: full SIMD activation for F32, F16 s390x
* ggml: add option to disable s390x VXE/VXE2
* ggml: change vecintrin.h include to ggml-cpu-impl; add __VXE__ and __VXE2__ macros
* cmake: add s390x target detection for VX/VXE/VXE2
* ggml: move s390x vector intrinsics to ggml-cpu-impl.h
* ggml: s390x Q8_0 SIMD
* ggml: correct documentation for Q8_0
* ggml: s390x reduce code complexity Q8_0
* ggml: s390x bugfix typo Q8_0
* ggml: s390x SIMD activated for Q4_1
* ggml: s390x inline vec_reve
* ggml: s390x SIMD activation for Q4_0
* ggml: add VXE backend feature
* ggml: remove test.py
* ggml: s390x SIMD activation for quantize_row_q8_0
* ggml: s390x SIMD activation for quantize_row_q8_1
* ggml: s390x SIMD activation for iq4_xs
* ggml: bugfix iq4_xs
* ggml: s390x SIMD activation for iq4_nl
* ggml: add float, double, and long vector data types
* ggml: clean up iq4_xs SIMD
* ggml: fix improper use of restrict keyword
* ggml: update warning message for ggml_vec_tbl
* ggml: untested implementation of ggml_vec_dot_iq2_xxs_q8_K
* ggml: update ggml_vec_dot_q4_1_q8_1 to use typedefs
* ggml: switch to restrict for iq4_nl
* ggml: slight dot product speed improvement for q4_1_q8_1
* ggml: s390x SIMD activation for q6_K
* ggml: add missing `_t` to ggml_int8x16x4_t
* ggml: fix missing `_t` for ggml_vec_xl_s8x4
* ggml: fix more missing `_t`
* ggml: add unroll and prefetch to Q8_0 (increase of 3.86% for prompt processing and 32.22% for token generation)
* ggml: patch Q8_0 to use proper vector sizes
* ggml: optimise Q8_0 dot prod compute kernel further
* ggml: add unroll and prefetch to Q4_1
* ggml: refactor Q6_K variable naming for readability
* ggml: fix Q6_K typos
* ggml: s390x SIMD activation for Q5_K
* ggml: fix wrong char*x16_t naming
* ggml: Q5_K y0 wrong signness
* ggml: fix Q5_K invalid uchar type
* ggml: s390x SIMD activation for Q4_K
* ggml: fix Q4_K invalid vector intrinsics
* ggml: simplify ggml_padd_s16 compute kernel
* ggml: correct ggml-cpu vxe wording
* ggml: change ggml_aligned_malloc alignment to 256 (256 is the cache line size for s390x platforms)
* ggml: resolve pr merge via cherry-pick 225bbbf
* ggml : fix LoongArch compile error with 128-bit SIMD (#11701)
* ggml: resolve pr merge via cherry-pick 4571953
* ggml: cmake remove fork when determining s390x machine type (thank you @ericcurtin)

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Jinyang He <hejinyang@loongson.cn>
Co-authored-by: junchao-zhao <68935141+junchao-loongson@users.noreply.github.com>
commit a28e0d5eb1 (Johannes Gäßler, 2025-02-22 20:44:34 +01:00)
CUDA: add option to compile without FlashAttention (#12025)
commit 5fa07c2f93 (Johannes Gäßler, 2025-02-22 12:20:17 +01:00)
CUDA: optimize FA for GQA + large batches (#12014)

commit d70908421f (Gian-Carlo Pascutto, 2025-02-22 09:43:24 +01:00)
cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (#12000)
commit ecc8e3aeff (PureJourney, 2025-02-21 12:21:05 +01:00)
CUDA: correct the lowest Maxwell supported by CUDA 12 (#11984)

* CUDA: correct the lowest Maxwell supported by CUDA 12

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

commit 0b3863ff95 (Bodhi, 2025-02-21 09:46:23 +02:00)
MUSA: support ARM64 and enable __dp4a etc. (#11843)

* MUSA: support ARM64 and enable __dp4a etc.
* fix cross entropy loss op for musa
* update
* add cc info log for musa
* add comment for the MUSA .cc calculation block

Co-authored-by: Bodhi Hu <huaishun.hu@mthreads.com>
commit c5d91a7400 (Charles Xu, 2025-02-20 15:06:51 +02:00)
ggml-cpu: Add CPU backend support for KleidiAI library (#11390)

* ggml-cpu: Add CPU backend support for KleidiAI library
* Add environment variable GGML_KLEIDIAI_SME
* Add support for multithread LHS conversion
* Switch kernel selection order to dotprod and i8mm
* updates for review comments
* More updates for review comments
* Reorganize and rename KleidiAI files
* Move ggml-cpu-traits.h to source file
* Update cmake for SME build and add alignment for SME
* Stop appending GGML_USE_CPU_KLEIDIAI to the GGML_CDEF_PUBLIC list
commit 4806498bf1 (Prashant Vithule, 2025-02-20 12:08:32 +02:00)
ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (#11917)

* Added SVE implementation for the Q3_K kernel in ggml-cpu-quants.c
* Improved formatting of code in ggml-cpu-quants.c
* style : minor fixes
* style : less whitespace
* style : ptr spacing

Co-authored-by: vithulep <p.m.vithule1517@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
commit 73e2ed3ce3 (Johannes Gäßler, 2025-02-17 14:03:24 +01:00)
CUDA: use async data loading for FlashAttention (#11894)

* CUDA: use async data loading for FlashAttention

Co-authored-by: Diego Devesa <slarengh@gmail.com>
commit 2eea03d86a (Rémy O, 2025-02-17 07:55:57 +01:00)
vulkan: implement several ops relevant for ggml_opt (#11769)

* vulkan: support memset_tensor
* vulkan: support GGML_OP_SUM
* vulkan: implement GGML_OP_ARGMAX
* vulkan: implement GGML_OP_SUB
* vulkan: implement GGML_OP_COUNT_EQUAL
* vulkan: implement GGML_OP_OPT_STEP_ADAMW
* vulkan: fix check_results RWKV_WKV6 crash and memory leaks
* vulkan: implement GGML_OP_REPEAT_BACK
* tests: remove invalid test-backend-ops REPEAT_BACK tests
* vulkan: fix COUNT_EQUAL memset using a fillBuffer command
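Of the ops listed in that commit, GGML_OP_COUNT_EQUAL has especially simple reference semantics; a Python sketch, assuming (from the op name and its role in ggml_opt) that it counts positions where two same-shape inputs match, as one would when tracking classification accuracy during training:

```python
def count_equal(a, b):
    # Assumed reference semantics for a COUNT_EQUAL-style op: count the
    # positions where the two same-length inputs hold equal values.
    assert len(a) == len(b)
    return sum(int(x == y) for x, y in zip(a, b))

predictions = [2, 0, 1, 1]
labels      = [2, 1, 1, 0]
matches = count_equal(predictions, labels)  # 2 of 4 positions match
```

The actual Vulkan kernel reduces this count on the GPU; the sketch only pins down what the result should be.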
commit bf42a23d0a (Jeff Bolz, 2025-02-16 08:52:23 +01:00)
vulkan: support multi/vision rope, and noncontiguous rope (#11902)

commit c2ea16f260 (Hale Chan, 2025-02-16 08:50:26 +02:00)
metal : fix the crash caused by the lack of residency set support on Intel Macs. (#11904)
commit 22885105a6 (Adrian Kretz, 2025-02-15 20:39:20 +02:00)
metal : optimize dequant q6_K kernel (#11892)

commit 68ff663a04 (Georgi Gerganov, 2025-02-15 16:40:57 +02:00)
repo : update links to new url (#11886)

* repo : update links to new url
* cont : more urls

ggml-ci
commit fc1b0d0936 (Rémy O, 2025-02-15 09:01:40 +01:00)
vulkan: initial support for IQ1_S and IQ1_M quantizations (#11528)

* vulkan: initial support for IQ1_S and IQ1_M quantizations
* vulkan: define MMV kernels for IQ1 quantizations
* devops: increase timeout of Vulkan tests again
* vulkan: simplify ifdef for init_iq_shmem

commit 300907b211 (lhez, 2025-02-14 12:12:23 -07:00)
opencl: Fix rope and softmax (#11833)

* opencl: fix `ROPE`
* opencl: fix `SOFT_MAX`
* Add fp16 variant
* opencl: enforce subgroup size for `soft_max`
commit 94b87f87b5 (Diego Devesa, 2025-02-14 15:33:52 +01:00)
cuda : add ampere to the list of default architectures (#11870)

commit 38e32eb6a0 (Jinyang He, 2025-02-14 10:54:27 +02:00)
ggml: optimize some vec dot functions for LoongArch ASX (#11842)

* Optimize ggml_vec_dot_q3_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q4_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q6_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q5_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q2_K_q8_K for LoongArch ASX
* Optimize mul_sum_i8_pairs_float for LoongArch ASX
* Optimize ggml_vec_dot_iq4_xs_q8_K for LoongArch ASX
commit a4f011e8d0 (Eve, 2025-02-14 02:59:40 +00:00)
vulkan: linux builds + small subgroup size fixes (#11767)

* mm subgroup size
* upload vulkan x86 builds

commit 8a8c4ceb60 (Jeffrey Morgan, 2025-02-13 18:05:04 +01:00)
llamafile: use member variable instead of constant for iq4nlt (#11780)
commit bd6e55bfd3 (R0CKSTAR, 2025-02-13 13:28:18 +01:00)
musa: bump MUSA SDK version to rc3.1.1 (#11822)

* musa: Update MUSA SDK version to rc3.1.1
* musa: Remove workaround in PR #10042

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
commit a394039db0 (Diego Devesa, 2025-02-13 01:02:38 +01:00)
ggml-cpu : add chunking support to mul_mat_id (#11666)

* ggml-cpu : add chunking support to mul_mat_id
* allocate chunk counter in wdata; parallelize src1 quantization by column to allow parallelization even when there is only one row
* disable for arm
* cleanup
* better way to disable for arm
* fix uninitialized counter when using 1 thread only
* revert test-backend-ops changes
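The chunking scheme this commit describes (threads claim work by bumping a shared chunk counter) can be sketched as a simplified Python analogue; this is not the ggml-cpu code, and the names (`run_chunked`, `process_row`) are illustrative:

```python
import threading

def run_chunked(n_rows, n_threads, chunk_size, process_row):
    # Simplified analogue of a shared chunk counter: each worker atomically
    # claims the next chunk of rows, so faster threads naturally process
    # more chunks and no static partitioning is needed.
    counter = {"next": 0}
    lock = threading.Lock()

    def worker():
        while True:
            with lock:
                start = counter["next"]
                counter["next"] += chunk_size
            if start >= n_rows:
                return
            for row in range(start, min(start + chunk_size, n_rows)):
                process_row(row)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

done, done_lock = [], threading.Lock()
def mark(row):
    with done_lock:
        done.append(row)

run_chunked(n_rows=100, n_threads=4, chunk_size=16, process_row=mark)
assert sorted(done) == list(range(100))
```

In the real backend the counter lives in the shared `wdata` scratch buffer and is bumped with an atomic instead of a lock, which is why the commit also fixes the counter being left uninitialized in the single-thread path.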
commit be3bbd6215 (Xuan-Son Nguyen, 2025-02-13 00:33:45 +01:00)
ggml : x2 speed for WASM by optimizing SIMD (#11453)

* ggml : x2 speed for WASM by optimizing SIMD
* fix bad merging
* rm trailing spaces
* rm redundant clamp
* better quantize_row_q8_K
* remove memset that causes buffer overflow

Co-authored-by: camel-cdr <camel-cdr@protonmail.com>