PAB
667d70d170
metal : add GGML_OP_CONV_TRANSPOSE_1D kernels (ggml/1026)
* wip
* wip implementation f32
* kernel conv transpose 1d f32 working
* initial commit
2024-12-03 20:04:49 +02:00

Xuan Son Nguyen
3b4f2e33e2
llama : add missing LLAMA_API for llama_chat_builtin_templates (#10636)
2024-12-03 12:54:30 +01:00

Nikolaos Pothitos
82bca2257b
readme : add option, update default value, fix formatting (#10271)
* readme : document --no-display-prompt
* readme : update default prompt context size
* readme : remove unnecessary indentation
  Indenting a line with four spaces makes Markdown treat that section as
  plain text.
* readme : indent commands under bullets
* readme : indent commands in lettered list
2024-12-03 12:50:08 +02:00

Georgi Gerganov
0115df2f65
metal : small-batch mat-mul kernels (#10581)
* metal : small-batch mat-mul kernels
  ggml-ci
* metal : add rest of types
  ggml-ci
* metal : final adjustments
  ggml-ci
* metal : add comments
  ggml-ci
2024-12-03 11:52:33 +02:00

Georgi Gerganov
515d4e5372
github : minify link [no ci] (revert)
this doesn't work as expected
2024-12-03 11:21:43 +02:00

Georgi Gerganov
844e2e1fee
github : minify link [no ci]
2024-12-03 11:20:35 +02:00

Georgi Gerganov
70b98fadbc
server : fix default draft model parameters (#10586)
* server : force F16 KV cache for the draft model
  ggml-ci
* server : fix draft params
  ggml-ci
* server : various params fixes
  ggml-ci
2024-12-03 11:20:00 +02:00

Xuan Son Nguyen
642330ac7c
llama : add enum for built-in chat templates (#10623)
* llama : add enum for supported chat templates
* use "built-in" instead of "supported"
* arg: print list of built-in templates
* fix test
* update server README
2024-12-02 22:10:19 +01:00

Georgi Gerganov
8648c52101
make : deprecate (#10514)
* make : deprecate
  ggml-ci
* ci : disable Makefile builds
  ggml-ci
* docs : remove make references [no ci]
* ci : disable swift build
  ggml-ci
* docs : remove obsolete make references, scripts, examples
  ggml-ci
* basic fix for compare-commits.sh
* update build.md
* more build.md updates
* more build.md updates
* more build.md updates
* Update Makefile
  Co-authored-by: Diego Devesa <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-12-02 21:22:53 +02:00

haopeng
64ed2091b2
server: Add "tokens per second" information in the backend (#10548)
* add cmake rvv support
* add timings
* remove space
* update readme
* fix
* fix code
* remove empty line
* add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-12-02 14:45:54 +01:00

Akarshan Biswas
991f8aabee
SYCL: Fix and switch to GGML_LOG system instead of fprintf (#10579)
* Switched to GGML_LOG
* Fix missing semicolon
2024-12-02 15:04:11 +08:00

Georgi Gerganov
4cb003dd8d
contrib : refresh (#10593)
* contrib : refresh
* contrib : expand [no ci]
* contrib : expand test-backend-ops instructions
* contrib : add CODEOWNERS
* prs : update template to not have checkbox [no ci]
2024-12-02 08:53:27 +02:00

Juk Armstrong
917786f43d
Add mistral-v1, mistral-v3, mistral-v3-tekken and mistral-v7 chat template types (#10572)
* Templates: `mistral-v1`, `mistral-v2`, `mistral-v3`, `mistral-v3-tekken`
* Changed system message logic and added tests for all 4
* Invalid `system_message` instead of `content` fixed
* Removed tab-indented lines
* Added template code and test for `mistral-v7`
* Added all tests. Fixed bug with `tmpl == "llama2"` test.
* Replaced tabs with spaces.
* Removed `'mistral-v2'` option as no (open) models ever used it
* Removed all references to 'v2' template from comments
* Update llama.cpp
  Fixed `trim_assistant_message` bug
2024-12-01 23:09:49 +01:00

Georgi Gerganov
5e1ed95583
grammars : add English-only grammar (#10612)
2024-12-01 21:37:54 +02:00

Wang Qin
5c7a5aa0c3
ci: add error handling for Python venv creation in run.sh (#10608)
2024-12-01 20:11:42 +02:00

Diego Devesa
3420909dff
ggml : automatic selection of best CPU backend (#10606)
* ggml : automatic selection of best CPU backend
* amx : minor opt
* add GGML_AVX_VNNI to enable avx-vnni, fix checks
2024-12-01 16:12:41 +01:00

alek3y
86dc11c5bc
server : bind to any port when specified (#10590)
2024-12-01 13:33:12 +02:00

Georgi Gerganov
6acce39710
readme : update the usage section with examples (#10596)
* readme : update the usage section with examples
* readme : more examples
2024-12-01 11:25:17 +02:00

Wang Qin
43957ef203
build: update Makefile comments for C++ version change (#10598)
2024-12-01 04:19:44 +01:00

Adrien Gallouët
0c39f44d70
ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_q4_0_4x4_q8_0() (#10567)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2024-11-30 09:13:18 -08:00

Georgi Gerganov
3e0ba0e604
readme : remove old badge
2024-11-30 10:09:21 +02:00

Georgi Gerganov
abadba05be
readme : refresh (#10587)
* readme : refresh
* readme : move section [no ci]
* readme : clarify [no ci]
* readme : fixes [no ci]
* readme : more fixes [no ci]
* readme : simplify [no ci]
* readme : clarify GGUF
2024-11-30 09:47:07 +02:00

Eve
0533e7fb38
vulkan: Dynamic subgroup size support for Q6_K mat_vec (#10536)
* subgroup 64 version with subgroup add. 15% faster
  scalable version
  tested for subgroup sizes 16-128
* check for subgroup multiple of 16 and greater than 16
* subgroup sizes are always a power of 2 (https://github.com/KhronosGroup/GLSL/issues/45)
* force 16 sequential threads per block
* make 16 subgroup size a constant
2024-11-30 08:00:02 +01:00

Diego Devesa
7cc2d2c889
ggml : move AMX to the CPU backend (#10570)
* ggml : move AMX to the CPU backend
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-29 21:54:58 +01:00

Xuan Son Nguyen
b782e5c7d4
server : add more test cases (#10569)
* server : add split model test
* add test speculative
* add invalid cases
2024-11-29 21:48:56 +01:00

Robert Collins
3a8e9af402
imatrix : support combine-only (#10492)
* imatrix-combine-only idea
* ensured that behavior is consistent with log
2024-11-29 19:21:37 +02:00

Diego Devesa
a3a3048e7a
cleanup UI link list (#10577)
* cleanup UI link list
* sort list alphabetically
* add missing licenses
2024-11-29 17:45:08 +01:00

Georgi Gerganov
f0678c5ff4
ggml : fix I8MM Q4_1 scaling factor conversion (#10562)
ggml-ci
2024-11-29 16:25:39 +02:00

Shupei Fan
4b3242bbea
ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (#10580)
2024-11-29 14:49:02 +01:00

Alberto Cabrera Pérez
0f77aae560
sycl : offload of get_rows set to 0 (#10432)
2024-11-29 20:38:45 +08:00

Alberto Cabrera Pérez
266b8519ee
sycl : Reroute permuted mul_mats through oneMKL (#10408)
This PR fixes the failing MUL_MAT tests for the sycl backend.
2024-11-29 09:49:43 +00:00

Chenguang Li
938f608742
CANN: RoPE operator optimization (#10563)
* [cann] RoPE operator optimization
* [CANN] Code Formatting
---------
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-11-29 14:46:55 +08:00

Jeff Bolz
f095a649ec
vulkan: get the first command buffer submitted sooner (#10499)
This is an incremental improvement over #9118 to get work to the GPU a bit
sooner. The first part is to start with a smaller number of nodes before
the first submit, and ramp it up to the current 100 nodes/submit. The
second part is to reduce the dryrun overhead for all the nodes that just
need to request descriptor space.
With these changes I get around 1-2% speedup on RTX 4070 combined with my
old Haswell-era CPU.
2024-11-29 07:18:02 +01:00

Ting Lou
678d7994f4
llava: return false instead of exit (#10546)
2024-11-29 01:09:46 +01:00

Georgi Gerganov
dc22344088
ggml : remove redundant copyright notice + update authors
2024-11-28 20:46:40 +02:00

Georgi Gerganov
4c0a95b107
llama : add missing model types
2024-11-28 20:45:07 +02:00

Xuan Son Nguyen
6c59567689
server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (#10568)
* server : (tests) don't use thread for capturing stdout/stderr
* test: bump openai to 1.55.2
* bump openai to 1.55.3
2024-11-28 19:17:49 +01:00

Johannes Gäßler
890719311b
common: fix warning message when no GPU found (#10564)
2024-11-28 18:15:25 +01:00

Random Fly
7281cf13ad
docs: fix outdated usage of llama-simple (#10565)
2024-11-28 16:03:11 +01:00

Diego Devesa
e90688edd0
ci : fix tag name in cuda and hip releases (#10566)
2024-11-28 15:58:54 +01:00

Georgi Gerganov
76b27d29c2
ggml : fix row condition for i8mm kernels (#10561)
ggml-ci
2024-11-28 14:56:37 +02:00

Georgi Gerganov
eea986f215
cmake : fix ARM feature detection (#10543)
ggml-ci
2024-11-28 14:56:23 +02:00

Shupei Fan
c202cef168
ggml-cpu: support IQ4_NL_4_4 by runtime repack (#10541)
* ggml-cpu: support IQ4_NL_4_4 by runtime repack
* ggml-cpu: add __ARM_FEATURE_DOTPROD guard
2024-11-28 13:52:03 +01:00

Sergio López
2025fa67e9
kompute : improve backend to pass test_backend_ops (#10542)
* kompute: op_unary: reject unsupported parameters
  Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: softmax: implement ALiBi support
  Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: rope: implement neox and phi3 support
  Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_q4_k permuted support
  Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_[q4_0|q4_1|q8_0] permuted support
  Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_f16 permuted support
  Signed-off-by: Sergio Lopez <slp@redhat.com>
* kompute: op_mul_mat_q6_k permuted support
  Signed-off-by: Sergio Lopez <slp@redhat.com>
---------
Signed-off-by: Sergio Lopez <slp@redhat.com>
2024-11-28 12:51:38 +01:00

Ruixin Huang
c6bc73951e
CANN: Update cann.md to display correctly in CLion (#10538)
2024-11-28 15:27:11 +08:00

leo-pony
605fa66c50
CANN: Fix SOC_TYPE compile bug (#10519)
* CANN: Fix the build failure on Ascend310P in two cases:
  1) SOC_TYPE specified manually
  2) some unusual compile environments
* Update the CANN backend News content: support F16 and F32 data type models on the Ascend 310P NPU.
* Fix CANN compile failure: the assert in the Ascend kernel function is not supported on some CANN versions
2024-11-28 15:25:24 +08:00

Chenguang Li
b7420131bf
CANN: ROPE operator optimization (#10540)
* [cann] ROPE operator optimization
  Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-11-28 14:24:46 +08:00

Xuan Son Nguyen
9f912511bc
common : fix duplicated file name with hf_repo and hf_file (#10550)
2024-11-27 22:30:52 +01:00

uvos
3ad5451f3b
Add some minimal optimizations for CDNA (#10498)
* Add some minimal optimizations for CDNA
* ggml_cuda: set launch bounds also for GCN as it helps there too
2024-11-27 17:10:08 +01:00

Diego Devesa
46c69e0e75
ci : faster CUDA toolkit installation method and use ccache (#10537)
* ci : faster CUDA toolkit installation method and use ccache
* remove fetch-depth
* only pack CUDA runtime on master
2024-11-27 11:03:25 +01:00