70680c48e5 | William Tambellini | 2025-02-28 14:41:47 +01:00
ggml : upgrade init_tensor API to return a ggml_status (#11854)
* Upgrade init_tensor API to return a ggml_status. To prepare for an 'abort-free' ggml
  (ggml should not abort on OOM but return an OOM status), as agreed with Diego in the
  ggml repo, upgrade the init_tensor() and view_init() APIs to return a ggml_status.
* misc fixes
Co-authored-by: slaren <slarengh@gmail.com>

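For context, here is a minimal, self-contained sketch of the calling pattern this change enables: the enum values are mirrored from ggml.h, while init_tensor_checked() is a hypothetical stand-in for a backend's init_tensor()/view_init() callback, not a real ggml function.

```c
#include <stdio.h>

/* Values mirror enum ggml_status in ggml.h. */
enum ggml_status {
    GGML_STATUS_ALLOC_FAILED = -2,
    GGML_STATUS_FAILED       = -1,
    GGML_STATUS_SUCCESS      =  0,
    GGML_STATUS_ABORTED      =  1,
};

/* Hypothetical stand-in for a backend init_tensor()/view_init() callback:
 * it reports an allocation failure instead of aborting the process. */
static enum ggml_status init_tensor_checked(size_t nbytes, size_t budget) {
    if (nbytes > budget) {
        return GGML_STATUS_ALLOC_FAILED;
    }
    return GGML_STATUS_SUCCESS;
}

int main(void) {
    enum ggml_status st = init_tensor_checked((size_t) 1 << 30, (size_t) 1 << 20);
    if (st != GGML_STATUS_SUCCESS) {
        fprintf(stderr, "tensor init failed (status %d), caller can fall back\n", (int) st);
        return 1;
    }
    return 0;
}
```

The point of the API change is that callers can propagate such a status upward and recover, instead of the library aborting on the first failed allocation.
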
438a83926a | Rémy O | 2025-02-28 09:42:52 +01:00
vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (#11595)
* vulkan: implement specialized MMV kernels for IQ2 quantizations
* vulkan: add MMV kernels for IQ3 quants
* vulkan: Increase MMV batch size and unroll IQ LUT setup
* vulkan: fix init_iq_shmem for WG sizes larger than tables
* vulkan: common batch size for all I-quants

9c42b1718c | Johannes Gäßler | 2025-02-28 09:26:43 +01:00
CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (#12098)

05e6f5aad0 | Prashant Vithule | 2025-02-28 09:36:12 +02:00
ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (#12064)
* Added SVE support for Q2_K quantized models
* Use 4-space indentation in the switch cases
* Removed comment lines
* Remove the loop; retain the curly braces for better readability
* Remove the comment line added for the q3_k_q8_k kernel
Co-authored-by: vithulep <p.m.vithule1517@gmail.com>

673cfef9aa | hipudding | 2025-02-28 15:23:47 +08:00
CANN: Fix build error with GCC 13 (#11990)
Remove an unused header file that causes a compilation failure on the ARM platform with GCC 13.

fbeda9002d | Eve | 2025-02-28 08:20:08 +01:00
vulkan: matmul dequantization improvements (#12015)
* faster dequant for old quants
* don't use unpack for iq4_nl
* vec2 unpack for q8

581650b7ca | Daniele | 2025-02-28 07:52:51 +01:00
vulkan: improve im2col (#11826)
* vulkan: improve im2col performance

a82c9e7c23 | Jeff Bolz | 2025-02-25 16:30:21 +01:00
vulkan: fix assertion when qy_needs_dequant (#12068)
Looks like a copy/paste bug from qx_needs_dequant.

c132239bfb | Judd | 2025-02-25 12:32:20 +01:00
add OP sigmoid (#12056)
Co-authored-by: Judd <foldl@boxvest.com>

393fca629e | Molly Sophia | 2025-02-25 19:28:22 +08:00
ggml-cpu: Fix build with sve (#12059)
* ggml-cpu: Fix build with sve
* ggml-cpu: Remove unused variable in sve q3_k vec dot
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

61d4f39dfe | Rémy O | 2025-02-25 12:04:45 +01:00
vulkan: implement more backpropagation operators (#11914)
* vulkan: implement GGML_OP_ROPE_BACK
* vulkan: implement GGML_OP_RMS_NORM_BACK
* vulkan: implement GGML_OP_SILU_BACK
* vulkan: implement GGML_OP_SOFTMAX_BACK

58d07a8043 | Gian-Carlo Pascutto | 2025-02-25 11:27:58 +02:00
metal : copy kernels for quant to F32/F16 conversions (#12017)
metal: use dequantize_q templates
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

34a846b584 | lhez | 2025-02-24 14:47:07 -07:00
opencl: fix for small models (#11950)
* opencl: fix small shape gemv, remove unused extensions
* opencl: fix `transpose_16`, `dump_tensor`, enforce subgroup size
* opencl: fix for token length < 4
* opencl: use wave size of 64 for all Adreno GPUs
Co-authored-by: Shawn Gu <quic_shawngu@quicinc.com>
Co-authored-by: Skyler Szot <quic_sszot@quicinc.com>

08d5986290 | Neo Zhang Jianyu | 2025-02-24 22:33:23 +08:00
[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)
* optimize performance by reordering for Intel GPU
* detect hw type, save the opt feature, and print it
* correct name
* optimize the graph once when computing the graph, record the opt status in tensor->extra, make CI pass
* add env variable GGML_SYCL_DISABLE_OPT for debug
* use syclex::architecture to replace the custom hw define, update the guide for GGML_SYCL_DISABLE_OPT
* add performance data
* move getrows functions to separate files
* fix global variables
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>

8303e8b0fb | Akarshan Biswas | 2025-02-24 10:18:25 +00:00
SYCL: Fix GGML_SYCL_DEBUG macro (#11995)

af7747c95a | Aaron Teo | 2025-02-22 21:39:24 +00:00
ggml-cpu: Support s390x SIMD Instruction Set (#12019)
* ggml: add s390x ARCH_FLAGS for compilation
* ggml: add SIMD for s390x using vector intrinsics
  SIMD is activated for: ggml_vec_dot_f32, ggml_vec_dot_f16, ggml_vec_mad_f32,
  ggml_vec_mad_f16, ggml_vec_mad_f32_unroll, ggml_vec_scale_f32, ggml_vec_scale_f16.
  SIMD is NOT activated for: ggml_vec_dot_f16_unroll (pending bugfix).
* ggml: fix missing escape character in GGML_F32x4_REDUCE
* ggml: add temporary patch for GGML_F32_ARR and GGML_F16_ARR
* ggml: fix s390x GGML_F32x4_REDUCE
* ggml: full SIMD activation for F32, F16 s390x
* ggml: add option to disable s390x VXE/VXE2
* ggml: change vecintrin.h include to ggml-cpu-impl, add __VXE__ and __VXE2__ macros
* cmake: add s390x target detection for VX/VXE/VXE2
* ggml: move s390x vector intrinsics to ggml-cpu-impl.h
* ggml: s390x Q8_0 SIMD
* ggml: correct documentation for Q8_0
* ggml: s390x reduce code complexity Q8_0
* ggml: s390x fix typo in Q8_0
* ggml: s390x SIMD activated for Q4_1
* ggml: s390x inline vec_reve
* ggml: s390x SIMD activation for Q4_0
* ggml: add VXE backend feature
* ggml: remove test.py
* ggml: s390x SIMD activation for quantize_row_q8_0
* ggml: s390x SIMD activation for quantize_row_q8_1
* ggml: s390x SIMD activation for iq4_xs
* ggml: bugfix iq4_xs
* ggml: s390x SIMD activation for iq4_nl
* ggml: add float, double, and long vector data types
* ggml: clean up iq4_xs SIMD
* ggml: fix improper use of restrict keyword
* ggml: update warning message for ggml_vec_tbl
* ggml: untested implementation of ggml_vec_dot_iq2_xxs_q8_K
* ggml: update ggml_vec_dot_q4_1_q8_1 to use typedefs
* ggml: switch to restrict for iq4_nl
* ggml: slight dot product speed improvement for q4_1_q8_1
* ggml: s390x SIMD activation for q6_K
* ggml: add missing `_t` to ggml_int8x16x4_t
* ggml: fix missing `_t` for ggml_vec_xl_s8x4
* ggml: fix more missing `_t`
* ggml: add unroll and prefetch to Q8_0 (increase of 3.86% for prompt processing and 32.22% for token generation)
* ggml: patch Q8_0 to use proper vector sizes
* ggml: optimise Q8_0 dot prod compute kernel further
* ggml: add unroll and prefetch to Q4_1
* ggml: refactor Q6_K variable naming for readability
* ggml: fix Q6_K typos
* ggml: s390x SIMD activation for Q5_K
* ggml: fix wrong char*x16_t naming
* ggml: fix Q5_K y0 wrong signedness
* ggml: fix Q5_K invalid uchar type
* ggml: s390x SIMD activation for Q4_K
* ggml: fix Q4_K invalid vector intrinsics
* ggml: simplify ggml_padd_s16 compute kernel
* ggml: correct ggml-cpu vxe wording
* ggml: change ggml_aligned_malloc alignment to 256 (the cache line size on s390x)
* ggml: resolve pr merge via cherry-pick 225bbbf
* ggml : fix LoongArch compile error with 128-bit SIMD (#11701)
* ggml: resolve pr merge via cherry-pick 4571953
* cmake: remove fork when determining s390x machine type (thanks @ericcurtin)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Jinyang He <hejinyang@loongson.cn>
Co-authored-by: junchao-zhao <68935141+junchao-loongson@users.noreply.github.com>

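The commit above introduces __VXE__ and __VXE2__ macros to gate the s390x vector code paths. As a rough illustration of that guard pattern (only the macro names come from the commit message; the program itself is a generic sketch, not ggml code):

```c
#include <stdio.h>

int main(void) {
    /* __VXE__/__VXE2__ are the s390x vector-extension macros mentioned above;
     * which branch is compiled depends on the target and compiler flags. */
#if defined(__VXE2__)
    puts("building the s390x VXE2 SIMD path");
#elif defined(__VXE__)
    puts("building the s390x VXE SIMD path");
#else
    puts("building the scalar fallback path");
#endif
    return 0;
}
```
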
a28e0d5eb1 | Johannes Gäßler | 2025-02-22 20:44:34 +01:00
CUDA: add option to compile without FlashAttention (#12025)

5fa07c2f93 | Johannes Gäßler | 2025-02-22 12:20:17 +01:00
CUDA: optimize FA for GQA + large batches (#12014)

d70908421f | Gian-Carlo Pascutto | 2025-02-22 09:43:24 +01:00
cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support (#12000)

ecc8e3aeff | PureJourney | 2025-02-21 12:21:05 +01:00
CUDA: correct the lowest Maxwell supported by CUDA 12 (#11984)
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

0b3863ff95 | Bodhi | 2025-02-21 09:46:23 +02:00
MUSA: support ARM64 and enable dp4a, etc. (#11843)
* MUSA: support ARM64 and enable __dp4a, etc.
* fix cross entropy loss op for musa
* update
* add cc info log for musa
* add comment for the MUSA .cc calculation block
Co-authored-by: Bodhi Hu <huaishun.hu@mthreads.com>

c5d91a7400 | Charles Xu | 2025-02-20 15:06:51 +02:00
ggml-cpu: Add CPU backend support for KleidiAI library (#11390)
* ggml-cpu: Add CPU backend support for KleidiAI library
* Add environment variable GGML_KLEIDIAI_SME
* Add support for multithreaded LHS conversion
* Switch kernel selection order to dotprod and i8mm
* Updates for review comments
* More updates for review comments
* Reorganize and rename KleidiAI files
* Move ggml-cpu-traits.h to source file
* Update cmake for SME build and add alignment for SME
* Remove appending GGML_USE_CPU_KLEIDIAI to the GGML_CDEF_PUBLIC list

4806498bf1 | Prashant Vithule | 2025-02-20 12:08:32 +02:00
ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (#11917)
* Added SVE implementation for the Q3_K kernel in ggml-cpu-quants.c
* Improved formatting of code in ggml-cpu-quants.c
* style : minor fixes
* style : less whitespace
* style : ptr spacing
Co-authored-by: vithulep <p.m.vithule1517@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

73e2ed3ce3 | Johannes Gäßler | 2025-02-17 14:03:24 +01:00
CUDA: use async data loading for FlashAttention (#11894)
Co-authored-by: Diego Devesa <slarengh@gmail.com>

2eea03d86a | Rémy O | 2025-02-17 07:55:57 +01:00
vulkan: implement several ops relevant for ggml_opt (#11769)
* vulkan: support memset_tensor
* vulkan: support GGML_OP_SUM
* vulkan: implement GGML_OP_ARGMAX
* vulkan: implement GGML_OP_SUB
* vulkan: implement GGML_OP_COUNT_EQUAL
* vulkan: implement GGML_OP_OPT_STEP_ADAMW
* vulkan: fix check_results RWKV_WKV6 crash and memory leaks
* vulkan: implement GGML_OP_REPEAT_BACK
* tests: remove invalid test-backend-ops REPEAT_BACK tests
* vulkan: fix COUNT_EQUAL memset using a fillBuffer command

bf42a23d0a | Jeff Bolz | 2025-02-16 08:52:23 +01:00
vulkan: support multi/vision rope, and noncontiguous rope (#11902)

c2ea16f260 | Hale Chan | 2025-02-16 08:50:26 +02:00
metal : fix the crash caused by the lack of residency set support on Intel Macs (#11904)

22885105a6 | Adrian Kretz | 2025-02-15 20:39:20 +02:00
metal : optimize dequant q6_K kernel (#11892)

68ff663a04 | Georgi Gerganov | 2025-02-15 16:40:57 +02:00
repo : update links to new url (#11886)
* repo : update links to new url
* cont : more urls

fc1b0d0936 | Rémy O | 2025-02-15 09:01:40 +01:00
vulkan: initial support for IQ1_S and IQ1_M quantizations (#11528)
* vulkan: initial support for IQ1_S and IQ1_M quantizations
* vulkan: define MMV kernels for IQ1 quantizations
* devops: increase timeout of Vulkan tests again
* vulkan: simplify ifdef for init_iq_shmem

300907b211 | lhez | 2025-02-14 12:12:23 -07:00
opencl: Fix rope and softmax (#11833)
* opencl: fix `ROPE`
* opencl: fix `SOFT_MAX`
* Add fp16 variant
* opencl: enforce subgroup size for `soft_max`

94b87f87b5 | Diego Devesa | 2025-02-14 15:33:52 +01:00
cuda : add ampere to the list of default architectures (#11870)

38e32eb6a0 | Jinyang He | 2025-02-14 10:54:27 +02:00
ggml: optimize some vec dot functions for LoongArch ASX (#11842)
* Optimize ggml_vec_dot_q3_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q4_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q6_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q5_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q2_K_q8_K for LoongArch ASX
* Optimize mul_sum_i8_pairs_float for LoongArch ASX
* Optimize ggml_vec_dot_iq4_xs_q8_K for LoongArch ASX

a4f011e8d0 | Eve | 2025-02-14 02:59:40 +00:00
vulkan: linux builds + small subgroup size fixes (#11767)
* mm subgroup size
* upload vulkan x86 builds

8a8c4ceb60 | Jeffrey Morgan | 2025-02-13 18:05:04 +01:00
llamafile: use member variable instead of constant for iq4nlt (#11780)

bd6e55bfd3 | R0CKSTAR | 2025-02-13 13:28:18 +01:00
musa: bump MUSA SDK version to rc3.1.1 (#11822)
* musa: Update MUSA SDK version to rc3.1.1
* musa: Remove workaround in PR #10042
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

a394039db0 | Diego Devesa | 2025-02-13 01:02:38 +01:00
ggml-cpu : add chunking support to mul_mat_id (#11666)
* ggml-cpu : add chunking support to mul_mat_id
* allocate chunk counter in wdata; parallelize src1 quantization by column to allow parallelization even when there is only one row
* disable for arm
* cleanup
* better way to disable for arm
* fix uninitialized counter when using 1 thread only
* revert test-backend-ops changes

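For intuition, a minimal self-contained sketch of the chunk-counter idea mentioned above (threads repeatedly grab the next chunk of rows from a shared atomic counter); the sizes and helper names here are illustrative, not the actual ggml-cpu code.

```c
#include <stdatomic.h>
#include <stdio.h>

#define N_ROWS 1000
#define CHUNK    64

/* In ggml-cpu the counter lives in the shared wdata buffer; a file-scope
 * atomic is enough for this sketch. */
static atomic_int current_chunk;

/* Body that each compute thread would run: claim chunks until none remain. */
static long process_rows(void) {
    long done = 0;
    for (;;) {
        const int chunk = atomic_fetch_add(&current_chunk, 1);
        const int ir0   = chunk * CHUNK;
        if (ir0 >= N_ROWS) {
            break;
        }
        const int ir1 = ir0 + CHUNK < N_ROWS ? ir0 + CHUNK : N_ROWS;
        done += ir1 - ir0;  /* real code would run the mat-mul on rows [ir0, ir1) */
    }
    return done;
}

int main(void) {
    atomic_store(&current_chunk, 0);
    printf("processed %ld rows\n", process_rows());
    return 0;
}
```
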
be3bbd6215 | Xuan-Son Nguyen | 2025-02-13 00:33:45 +01:00
ggml : x2 speed for WASM by optimizing SIMD (#11453)
* ggml : x2 speed for WASM by optimizing SIMD
* fix bad merging
* rm trailing spaces
* rm redundant clamp
* better quantize_row_q8_K
* remove memset that causes buffer overflow
Co-authored-by: camel-cdr <camel-cdr@protonmail.com>

5c4284d57b | uvos | 2025-02-12 22:25:28 +01:00
HIP: Remove GCN from list of devices that avoid MMQ (#11831)

e598697d63 | uvos | 2025-02-12 17:25:03 +01:00
HIP: Switch to std::vector in rocblas version check (#11820)

748ee9fe93 | Richard | 2025-02-12 15:57:33 +02:00
ggml : fix multi-threaded clamp_f32 (#11824)
* Bug fix for clamp_f32: with tensors larger than 1-D, the clamp operation did not work because the kernel returned early whenever ith was not 0, so only one thread's share of the rows was processed.

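For context, the usual ggml-cpu idiom for splitting rows across threads, which is what such a fix switches to instead of returning when ith != 0, looks roughly like the following. The ith/nth/nr names follow ggml conventions, while clamp_rows and the sequential driver loop are illustrative only.

```c
#include <stdio.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Clamp only the rows assigned to thread `ith` out of `nth` threads. */
static void clamp_rows(float * data, int nr, int nc, float lo, float hi, int ith, int nth) {
    const int dr  = (nr + nth - 1) / nth;  /* rows per thread            */
    const int ir0 = dr * ith;              /* first row for this thread  */
    const int ir1 = MIN(ir0 + dr, nr);     /* one past the last row      */
    for (int ir = ir0; ir < ir1; ir++) {
        for (int ic = 0; ic < nc; ic++) {
            const float v = data[ir*nc + ic];
            data[ir*nc + ic] = v < lo ? lo : (v > hi ? hi : v);
        }
    }
}

int main(void) {
    float x[4*3] = { -5, 0, 5,  -5, 0, 5,  -5, 0, 5,  -5, 0, 5 };
    const int nth = 2;  /* simulate two threads sequentially */
    for (int ith = 0; ith < nth; ith++) {
        clamp_rows(x, 4, 3, -1.0f, 1.0f, ith, nth);
    }
    printf("%g %g %g\n", x[0], x[1], x[2]);  /* -1 0 1 */
    return 0;
}
```

Every thread computes its own [ir0, ir1) range, so all rows are covered without any thread returning early.
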
198b1ec611 | Weizhao Ouyang | 2025-02-12 13:22:58 +01:00
ggml-cpu: Fix duplicate MATMUL_INT8 (#11817)
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>

c3d6af7cd2 | Johannes Gäßler | 2025-02-12 13:16:39 +01:00
CUDA: fix CUDART_VERSION checks (#11821)

90e4dba461 | Sheldon Robinson | 2025-02-11 16:55:45 +01:00
Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx (#11803)
* Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx
* Fix #11802: PR #11803 - keep RegQueryValueExA, remove TEXT macro; the description needs to be an ANSI string

b9ab0a4d0b | Johannes Gäßler | 2025-02-11 00:17:22 +01:00
CUDA: use arch list for compatibility check (#11775)
* CUDA: use arch list for feature availability check
Co-authored-by: Diego Devesa <slarengh@gmail.com>

7b891bdc86 | Maxim Evtush | 2025-02-10 23:21:31 +01:00
fix: typos in documentation files (#11791)
* Update ggml.c
* Update arg.cpp
* Update speculative.h

c2a67efe38 | Danny Milosavljevic | 2025-02-10 07:17:21 +01:00
vulkan: Make Vulkan optional at runtime (#11493). (#11494)
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>

b044a0fe3c | Wagner Bruna | 2025-02-10 07:08:22 +01:00
vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid VRAM allocation (#11592)

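As a rough illustration of how such an environment-variable toggle is typically read: only the variable name comes from the commit above, and the check itself is a generic sketch, not the actual Vulkan backend code.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Generic getenv-based toggle; unset or "0" keeps the default behavior. */
    const char * v = getenv("GGML_VK_PREFER_HOST_MEMORY");
    const int prefer_host = (v != NULL && v[0] != '\0' && v[0] != '0');
    printf("prefer host memory: %s\n", prefer_host ? "yes" : "no");
    return 0;
}
```
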
98f6b0fd1e | Jeff Bolz | 2025-02-09 08:43:51 +01:00
vulkan: account for lookup tables when checking shared memory size (#11502)

4d3465c5ae | Karol Kontny | 2025-02-08 15:30:53 +01:00
ggml: Fix data race in ggml threadpool (#11736)
After the barrier in the last iteration is executed, the loop termination condition is still evaluated.
By then the main thread may already have destroyed the cgraph object and its nodes, so another thread
can access memory that is already gone. Trouble can also happen when n_nodes == 0 or when abort is
called, though it is unclear whether the former situation can actually occur. The last synchronization
should therefore be done after the loop, to ensure the cgraph/cplan are not accessed after the main
thread exits the function.