Georgi Gerganov
7a32fcb3b2
ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (#1179)

* ggml : add Q8_0 quantization format (rename the old one to Q8_1)
* tests : fix test-quantize-fns
* ggml : finalize Q8_0 implementation
* ggml : use q4_0_q8_0 and q4_2_q8_0
* ggml : fix Q8_0 dot product bug (ARM)
* ggml : Q8_0 unroll x2
* ggml : fix bug - using wrong block type
* ggml : extend quantize_fns_t with "vec_dot_type"
* ggml : fix Q8_0 to use 255 values out of 256
* ggml : fix assert using wrong QK4_2 instead of QK4_3
2023-04-25 23:40:51 +03:00

slaren
50cb666b8a
Improve cuBLAS performance by using a memory pool (#1094)

* Improve cuBLAS performance by using a memory pool
* Move cuda specific definitions to ggml-cuda.h/cu
* Add CXX flags to nvcc
* Change memory pool synchronization mechanism to a spin lock
* General code cleanup
2023-04-21 21:59:17 +02:00

slaren
2005469ea1
Add Q4_3 support to cuBLAS (#1086)
2023-04-20 20:49:53 +02:00

slaren
02d6988121
Improve cuBLAS performance by dequantizing on the GPU (#1065)
2023-04-20 03:14:14 +02:00