llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-13 10:57:15 +00:00

Author	SHA1	Message	Date
Xuan Son Nguyen	34bacc8365	ggml-ci	2025-07-09 12:09:36 +02:00
Xuan Son Nguyen	4ea74b04e5	make code looks more consistent	2025-07-09 12:07:05 +02:00
Xuan Son Nguyen	0d70ca81e8	use memcpy for op params	2025-07-09 12:05:34 +02:00
Xuan Son Nguyen	50c678f6da	rm __ARM_FEATURE_SVE	2025-07-09 11:56:48 +02:00
Xuan Son Nguyen	563aca0b56	vDSP_vsmsa	2025-07-09 11:55:56 +02:00
Xuan Son Nguyen	265cb43538	fix cann compile error	2025-07-09 11:52:58 +02:00
Xuan Son Nguyen	c8d89317c9	suggestions from coderabbit	2025-07-09 00:06:53 +02:00
Xuan Son Nguyen	b22708fd90	fix cuda	2025-07-09 00:00:44 +02:00
Xuan Son Nguyen	4d0195324e	will this fix cpu?	2025-07-09 00:00:31 +02:00
Xuan Son Nguyen	0e51a0a8b0	opencl	2025-07-08 23:36:47 +02:00
Xuan Son Nguyen	477a97ad87	cann (placeholder)	2025-07-08 23:34:15 +02:00
Xuan Son Nguyen	782b58fa06	vulkan	2025-07-08 23:31:04 +02:00
Xuan Son Nguyen	a28df6f00c	sycl	2025-07-08 23:27:32 +02:00
Xuan Son Nguyen	92a8738452	add CUDA	2025-07-08 23:26:21 +02:00
Xuan Son Nguyen	e427af75fb	add more simd	2025-07-08 23:19:16 +02:00
Xuan Son Nguyen	a5ccf168f1	ggml_vec_mad1_f32	2025-07-08 23:13:42 +02:00
Xuan Son Nguyen	7af3fd98a1	Merge branch 'master' into xsn/ggml_scale_bias	2025-07-08 23:02:15 +02:00
Jeff Bolz	6efcd65945	vulkan: optimize flash attention split_k_reduce (#14554) * vulkan: allow FA split_k with smaller KV values * vulkan: spread split_k_reduce work across more threads k_num can get rather large. Use the whole workgroup to reduce the M/L values. Launch a thread for each element in the HSV dimension of the output. Helps a lot for large HSV (like deepseek). b5849	2025-07-08 20:11:42 +02:00
stevenkuang	699f4392a3	model : fix hunyuan moe chat template (#14584) Signed-off-by: stevenkuang <stevenkuang@tencent.com> b5848	2025-07-08 18:29:29 +02:00
Xuan-Son Nguyen	08382869a2	model : add SmolLM3 (#14581) * Init - first pass. * Model -> ModelBase. * fix errors in conversion. * Update the graph. * up. * up. * wip * cgraph ok * rm redundant code --------- Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com> b5847	2025-07-08 18:07:01 +02:00
compilade	bb4f7a9e4e	memory : fix broken batch splits for recurrent cache (#14575) Splits producing more than one ubatch per batch for recurrent models were broken with #14512. This fixes it by moving the completeness check after the ubatch split loop. b5846	2025-07-08 18:37:47 +03:00
Jeff Bolz	b8eeb8741d	vulkan : fix rope with partial rotation and non-cont src (#14582) b5845	2025-07-08 15:21:21 +02:00
Alawode Oluwandabira	17a1f0d2d4	server: Add ability to mount server at prefix (#14544) * Add server_prefix * Correct server path env * Rename cli flag to --api-prefix * Change all to api_prefix b5844	2025-07-08 11:47:33 +03:00
Xuan-Son Nguyen	8f22dc0a53	model : add hunyuan moe (#14425) * model : add hunyuan moe * tokenizer ok * fix tensor name * cgraph init * chat template * wip * almost working * skip embed, fix bos * cleanup * yarn scaling * cleanup * correct rope type * failed token fix * ntk alpha freq_base * tokenization working * cleanup and pr changes * vocab_size sanity check * ntk alpha generic * Update convert_hf_to_gguf.py * Apply suggestions from code review * fix regression * fix style --------- Co-authored-by: kooshi <1934337+kooshi@users.noreply.github.com> b5843	2025-07-08 11:24:06 +03:00
Jeff Bolz	53903ae6fa	vulkan: increase timeout for CI (#14574)	2025-07-08 09:38:31 +02:00
Georgi Gerganov	4d0dcd4a06	cuda : fix rope with partial rotation and non-cont src (#14580) * cuda : fix rope non-cont ggml-ci * cont : fix multi-rope + add test ggml-ci * sycl : try fix ggml-ci * cont : fix sycl + clean-up cuda ggml-ci b5841	2025-07-08 10:15:21 +03:00
Aman Gupta	75c91de6e9	CUDA: add bilinear interpolation for upscale (#14563) b5840	2025-07-08 10:11:18 +08:00
R0CKSTAR	68155c66f0	musa: fix build warnings (unused variable) (#14561) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> b5839	2025-07-08 07:58:30 +08:00
Sigbjørn Skjæret	e1a7059053	llama : fix incorrect minicpm3 v_states shape (#14571) b5838	2025-07-07 23:35:35 +02:00
Sigbjørn Skjæret	12f55c302b	llama : remove ggml_cont where possible (#14568) b5837	2025-07-07 21:35:08 +02:00
Aman Gupta	b9c3eefde1	CUDA: add bf16 and i32 to getrows (#14529) b5836	2025-07-07 21:45:43 +08:00
Eve	6491d6e4f1	vulkan: increase LOAD_VEC_A to 8 (IQ1/IQ2) or 4 (IQ3) (#14485) Commit taken from remyoudompheng's PR https://github.com/ggml-org/llama.cpp/pull/12260 Co-authored-by: Rémy Oudompheng <remyoudompheng@gmail.com> b5835	2025-07-06 12:29:36 +02:00
Jeff Bolz	e592be1575	vulkan: fix rms_norm+mul fusion (#14545) The fused operation was grabbing the epsilon value from the wrong place. Add an env var to disable fusion. Add some missing checks for supported shapes/types. Handle fused rms_norm+mul in check_results. b5834	2025-07-06 10:08:16 +02:00
Jeff Bolz	a0374a67e2	vulkan: Handle updated FA dim2/3 definition (#14518) * vulkan: Handle updated FA dim2/3 definition Pack mask boolean and n_head_log2 into a single dword to keep the push constant block under the 128B limit. * handle null mask for gqa * allow gqa with dim3>1 b5833	2025-07-05 09:26:04 +02:00
Sigbjørn Skjæret	ddef99522d	server : fix assistant prefilling when content is an array (#14360) b5832	2025-07-05 09:17:14 +02:00
Sigbjørn Skjæret	6681688146	opencl: add GELU_ERF (#14476) b5831	2025-07-04 23:24:56 -07:00
Georgi Gerganov	bac8bed248	eval-callback : check for empty input (#14539) b5830	2025-07-05 07:18:09 +03:00
R0CKSTAR	b81510a7b7	test-backend-ops: add support for specifying output format (#14368) * test-backend-ops: add support for specifying output format Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> * Address review comments Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> * Add build_commit and build_number in test_result Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> * Address review comments Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> * refactor Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> * Get build commit from ggml_commit() Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> * Merge errors into test_operation_info && address review comments Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> * Address review comments Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> * Address review comments Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> * remove visitor nonsense * remove visitor comment Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> * Address review comments Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> --------- Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com> Co-authored-by: slaren <slarengh@gmail.com> b5829	2025-07-05 12:10:53 +08:00
Georgi Gerganov	ef797db357	metal : disable fast math in all quantize kernels (#14528) ggml-ci b5828	2025-07-04 19:19:09 +03:00
Georgi Gerganov	67d1ef23c6	batch : add optional for sequential equal split (#14511) ggml-ci b5827	2025-07-04 09:08:59 +03:00
Georgi Gerganov	7b50f7c025	graph : prepare for 4D mask (#14515) ggml-ci b5826	2025-07-04 09:05:36 +03:00
Georgi Gerganov	c79184d2d1	batch : add n_used count (#14512) ggml-ci b5825	2025-07-04 09:04:59 +03:00
luyhcsu	499a8f5a78	CANN: Replace aclrtMemsetSync with aclnnInplaceZero operator (#14002) Co-authored-by: luyuhong <luyuhong@kylinos.cn> b5824	2025-07-04 11:50:07 +08:00
Sigbjørn Skjæret	28657a8229	ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445) b5823	2025-07-03 23:07:22 +02:00
lhez	bee28421be	opencl : broadcast for soft_max (#14510) b5822	2025-07-03 20:22:24 +02:00
Jeff Bolz	2b72bedec1	vulkan: support mixed/deepseekR1 FA head sizes (#14509) * vulkan: better parameterize FA by head sizes * vulkan: support mixed/deepseekR1 FA head sizes b5821	2025-07-03 20:21:14 +02:00
Johannes Gäßler	c8c4495b8d	ggml: backward pass for split swiglu (#14483) b5820	2025-07-03 17:05:18 +02:00
Nicolò Scipione	7b63a71a6b	Fix conditional enabling following arch checks for ggml-sycl (#14504) Signed-off-by: nscipione <nicolo.scipione@codeplay.com> b5819	2025-07-03 11:00:03 +02:00
Xuan-Son Nguyen	0c2ee38ab7	convert : correct gemma 3n conversion (#14450) * convert : correct gemma 3n conversion * rm redundant code	2025-07-03 10:03:06 +02:00
Georgi Gerganov	a70c8a0c4b	kv-cache : use ggml_set_rows (#14285) * kv-cache : use ggml_set_rows ggml-ci * graph : separate k and v indices ggml-ci * cont : remove redundant ifs ggml-ci * kv-cache : improve find_slot impl * kv-cache : bounds-check when accessing slot_info indices * kv-cache : add comments ggml-ci * ggml : add TODOs for adding GGML_OP_SET_ROWS support in the backends ggml-ci b5817	2025-07-03 10:53:35 +03:00

5867 Commits