ibrahimkhadraoui
0ad3502839
rm extra space
2025-07-07 15:26:46 +04:00
ibrahim khadraoui
3afb2a89eb
Merge pull request #1 from tiiuae/injected-mup
injected mup
2025-07-07 15:20:08 +04:00
younesbelkada
e96cc73390
clean ups
2025-07-07 15:13:06 +04:00
younesbelkada
a9f3a63dc1
injected mup
2025-07-07 15:00:25 +04:00
ibrahimkhadraoui
b3bc1fb237
Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased
2025-07-07 14:36:55 +04:00
ibrahimkhadraoui
286e1fa569
fix rope_theta
2025-07-07 14:36:51 +04:00
ibrahimkhadraoui
97011d7a1f
mup_vec create as float64
2025-07-07 14:25:32 +04:00
ibrahimkhadraoui
49d7420964
inp_out_ids moved outside of layers loop
2025-07-07 14:18:48 +04:00
ibrahimkhadraoui
8c50893820
added some cb functions for debugging purposes
2025-07-07 14:10:45 +04:00
Younes B
6c39e775dd
fix conversion and d_inner
2025-07-07 10:56:49 +02:00
ibrahimkhadraoui
441d8d66bd
override modify_tensors instead of get_tensors
2025-07-07 12:00:57 +04:00
ibrahimkhadraoui
53304c84db
remove unused functions from gguf_writer.py
2025-07-07 11:18:14 +04:00
ibrahimkhadraoui
c4af0f3ca5
mamba_d_ssm added to d_inner find_hparam
2025-07-07 11:17:31 +04:00
ibrahimkhadraoui
c56ec07a9a
read arch from gguf.MODEL_ARCH
2025-07-07 10:34:46 +04:00
ibrahimkhadraoui
280dd2dcb7
falcon-h1 specific vocab resolved
2025-07-07 10:25:57 +04:00
Eve
6491d6e4f1
vulkan: increase LOAD_VEC_A to 8 (IQ1/IQ2) or 4 (IQ3) (#14485)
Commit taken from remyoudompheng's PR https://github.com/ggml-org/llama.cpp/pull/12260
Co-authored-by: Rémy Oudompheng <remyoudompheng@gmail.com>
b5835
2025-07-06 12:29:36 +02:00
Jeff Bolz
e592be1575
vulkan: fix rms_norm+mul fusion (#14545)
The fused operation was grabbing the epsilon value from the wrong place.
Add an env var to disable fusion.
Add some missing checks for supported shapes/types.
Handle fused rms_norm+mul in check_results.
b5834
2025-07-06 10:08:16 +02:00
Jeff Bolz
a0374a67e2
vulkan: Handle updated FA dim2/3 definition (#14518)
* vulkan: Handle updated FA dim2/3 definition
Pack mask boolean and n_head_log2 into a single dword to keep the push
constant block under the 128B limit.
* handle null mask for gqa
* allow gqa with dim3>1
b5833
2025-07-05 09:26:04 +02:00
Sigbjørn Skjæret
ddef99522d
server : fix assistant prefilling when content is an array (#14360)
b5832
2025-07-05 09:17:14 +02:00
Sigbjørn Skjæret
6681688146
opencl: add GELU_ERF (#14476)
b5831
2025-07-04 23:24:56 -07:00
Georgi Gerganov
bac8bed248
eval-callback : check for empty input (#14539)
b5830
2025-07-05 07:18:09 +03:00
R0CKSTAR
b81510a7b7
test-backend-ops: add support for specifying output format (#14368)
* test-backend-ops: add support for specifying output format
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Add build_commit and build_number in test_result
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* refactor
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Get build commit from ggml_commit()
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Merge errors into test_operation_info && address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* remove visitor nonsense
* remove visitor comment
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
---------
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
b5829
2025-07-05 12:10:53 +08:00
Georgi Gerganov
ef797db357
metal : disable fast math in all quantize kernels (#14528)
ggml-ci
b5828
2025-07-04 19:19:09 +03:00
ibrahimkhadraoui
7a25441e13
fixed multipliers
2025-07-04 17:41:03 +04:00
ibrahimkhadraoui
9760c8bc9d
conflict solve
2025-07-04 16:28:48 +04:00
ibrahimkhadraoui
2aa48dd853
Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased
2025-07-04 16:25:54 +04:00
ibrahimkhadraoui
3ee7983961
fix vocab size
2025-07-04 16:25:27 +04:00
younesbelkada
250b4f1074
mix instead of max
2025-07-04 15:53:47 +04:00
younesbelkada
1fd0574adc
try
2025-07-04 15:50:43 +04:00
ibrahimkhadraoui
a6d0067dd7
Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased
2025-07-04 15:37:44 +04:00
ibrahimkhadraoui
15138df48f
small fix ffn_norm
2025-07-04 15:37:40 +04:00
younesbelkada
6c7d9e26e7
fix
2025-07-04 15:25:59 +04:00
ibrahimkhadraoui
d22b4ea425
Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased
2025-07-04 15:10:11 +04:00
ibrahimkhadraoui
2fe057cc40
Revert "fix"
This reverts commit 243e4d1a50.
2025-07-04 15:04:13 +04:00
younesbelkada
22de62cf56
fix
2025-07-04 15:02:14 +04:00
younesbelkada
cce35498d5
pre-norm -> norm
2025-07-04 14:58:33 +04:00
younesbelkada
243e4d1a50
fix
2025-07-04 14:55:31 +04:00
younesbelkada
1415cd8782
another fix
2025-07-04 14:49:59 +04:00
younesbelkada
a39a8423f7
merge
2025-07-04 14:48:22 +04:00
younesbelkada
50eadc7b33
fixes
2025-07-04 14:47:31 +04:00
ibrahimkhadraoui
071f4b7fd8
changed precision for multipliers float 32->64
2025-07-04 14:37:02 +04:00
ibrahimkhadraoui
8bea92261e
python fixes
2025-07-04 14:32:11 +04:00
Georgi Gerganov
67d1ef23c6
batch : add optional for sequential equal split (#14511)
ggml-ci
b5827
2025-07-04 09:08:59 +03:00
Georgi Gerganov
7b50f7c025
graph : prepare for 4D mask (#14515)
ggml-ci
b5826
2025-07-04 09:05:36 +03:00
Georgi Gerganov
c79184d2d1
batch : add n_used count (#14512)
ggml-ci
b5825
2025-07-04 09:04:59 +03:00
luyhcsu
499a8f5a78
CANN: Replace aclrtMemsetSync with aclnnInplaceZero operator (#14002)
Co-authored-by: luyuhong <luyuhong@kylinos.cn>
b5824
2025-07-04 11:50:07 +08:00
Sigbjørn Skjæret
28657a8229
ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445)
b5823
2025-07-03 23:07:22 +02:00
lhez
bee28421be
opencl : broadcast for soft_max (#14510)
b5822
2025-07-03 20:22:24 +02:00
Jeff Bolz
2b72bedec1
vulkan: support mixed/deepseekR1 FA head sizes (#14509)
* vulkan: better parameterize FA by head sizes
* vulkan: support mixed/deepseekR1 FA head sizes
b5821
2025-07-03 20:21:14 +02:00
Johannes Gäßler
c8c4495b8d
ggml: backward pass for split swiglu (#14483)
b5820
2025-07-03 17:05:18 +02:00