Commit Graph

5926 Commits

Author SHA1 Message Date
ibrahimkhadraoui
0ad3502839 rm extra space 2025-07-07 15:26:46 +04:00
ibrahim khadraoui
3afb2a89eb Merge pull request #1 from tiiuae/injected-mup
injected mup
2025-07-07 15:20:08 +04:00
younesbelkada
e96cc73390 clean ups 2025-07-07 15:13:06 +04:00
younesbelkada
a9f3a63dc1 injected mup 2025-07-07 15:00:25 +04:00
ibrahimkhadraoui
b3bc1fb237 Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased 2025-07-07 14:36:55 +04:00
ibrahimkhadraoui
286e1fa569 fix rope_theta 2025-07-07 14:36:51 +04:00
ibrahimkhadraoui
97011d7a1f mup_vec create as float64 2025-07-07 14:25:32 +04:00
ibrahimkhadraoui
49d7420964 inp_out_ids moved outside of layers loop 2025-07-07 14:18:48 +04:00
ibrahimkhadraoui
8c50893820 added some cb functions for debugging purposes 2025-07-07 14:10:45 +04:00
Younes B
6c39e775dd fix conversion and d_inner 2025-07-07 10:56:49 +02:00
ibrahimkhadraoui
441d8d66bd override modify_tensors instead of get_tensors 2025-07-07 12:00:57 +04:00
ibrahimkhadraoui
53304c84db remove unused functions from gguf_writer.py 2025-07-07 11:18:14 +04:00
ibrahimkhadraoui
c4af0f3ca5 mamba_d_ssm added to d_inner find_hparam 2025-07-07 11:17:31 +04:00
ibrahimkhadraoui
c56ec07a9a read arch from gguf.MODEL_ARCH 2025-07-07 10:34:46 +04:00
ibrahimkhadraoui
280dd2dcb7 falcon-h1 specific vocab resolved 2025-07-07 10:25:57 +04:00
Eve
6491d6e4f1 vulkan: increase LOAD_VEC_A to 8 (IQ1/IQ2) or 4 (IQ3) (#14485)
Commit taken from remyoudompheng's PR https://github.com/ggml-org/llama.cpp/pull/12260

Co-authored-by: Rémy Oudompheng <remyoudompheng@gmail.com>
b5835
2025-07-06 12:29:36 +02:00
Jeff Bolz
e592be1575 vulkan: fix rms_norm+mul fusion (#14545)
The fused operation was grabbing the epsilon value from the wrong place.

Add an env var to disable fusion.

Add some missing checks for supported shapes/types.

Handle fused rms_norm+mul in check_results.
b5834
2025-07-06 10:08:16 +02:00
Jeff Bolz
a0374a67e2 vulkan: Handle updated FA dim2/3 definition (#14518)
* vulkan: Handle updated FA dim2/3 definition

Pack mask boolean and n_head_log2 into a single dword to keep the push
constant block under the 128B limit.

* handle null mask for gqa

* allow gqa with dim3>1
b5833
2025-07-05 09:26:04 +02:00
Sigbjørn Skjæret
ddef99522d server : fix assistant prefilling when content is an array (#14360) b5832 2025-07-05 09:17:14 +02:00
Sigbjørn Skjæret
6681688146 opencl: add GELU_ERF (#14476) b5831 2025-07-04 23:24:56 -07:00
Georgi Gerganov
bac8bed248 eval-callback : check for empty input (#14539) b5830 2025-07-05 07:18:09 +03:00
R0CKSTAR
b81510a7b7 test-backend-ops: add support for specifying output format (#14368)
* test-backend-ops: add support for specifying output format

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* Address review comments

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* Add build_commit and build_number in test_result

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* Address review comments

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* refactor

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* Get build commit from ggml_commit()

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* Merge errors into test_operation_info && address review comments

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* Address review comments

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* Address review comments

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* remove visitor nonsense

* remove visitor comment

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* Address review comments

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

---------

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
b5829
2025-07-05 12:10:53 +08:00
Georgi Gerganov
ef797db357 metal : disable fast math in all quantize kernels (#14528)
ggml-ci
b5828
2025-07-04 19:19:09 +03:00
ibrahimkhadraoui
7a25441e13 fixed multipliers 2025-07-04 17:41:03 +04:00
ibrahimkhadraoui
9760c8bc9d conflict solve 2025-07-04 16:28:48 +04:00
ibrahimkhadraoui
2aa48dd853 Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased 2025-07-04 16:25:54 +04:00
ibrahimkhadraoui
3ee7983961 fix vocab size 2025-07-04 16:25:27 +04:00
younesbelkada
250b4f1074 mix instead of max 2025-07-04 15:53:47 +04:00
younesbelkada
1fd0574adc try 2025-07-04 15:50:43 +04:00
ibrahimkhadraoui
a6d0067dd7 Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased 2025-07-04 15:37:44 +04:00
ibrahimkhadraoui
15138df48f small fix ffn_norm 2025-07-04 15:37:40 +04:00
younesbelkada
6c7d9e26e7 fix 2025-07-04 15:25:59 +04:00
ibrahimkhadraoui
d22b4ea425 Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased 2025-07-04 15:10:11 +04:00
ibrahimkhadraoui
2fe057cc40 Revert "fix"
This reverts commit 243e4d1a50.
2025-07-04 15:04:13 +04:00
younesbelkada
22de62cf56 fix 2025-07-04 15:02:14 +04:00
younesbelkada
cce35498d5 pre-norm -> norm 2025-07-04 14:58:33 +04:00
younesbelkada
243e4d1a50 fix 2025-07-04 14:55:31 +04:00
younesbelkada
1415cd8782 another fix 2025-07-04 14:49:59 +04:00
younesbelkada
a39a8423f7 merge 2025-07-04 14:48:22 +04:00
younesbelkada
50eadc7b33 fixes 2025-07-04 14:47:31 +04:00
ibrahimkhadraoui
071f4b7fd8 changed precision for multipliers float 32->64 2025-07-04 14:37:02 +04:00
ibrahimkhadraoui
8bea92261e python fixes 2025-07-04 14:32:11 +04:00
Georgi Gerganov
67d1ef23c6 batch : add optional for sequential equal split (#14511)
ggml-ci
b5827
2025-07-04 09:08:59 +03:00
Georgi Gerganov
7b50f7c025 graph : prepare for 4D mask (#14515)
ggml-ci
b5826
2025-07-04 09:05:36 +03:00
Georgi Gerganov
c79184d2d1 batch : add n_used count (#14512)
ggml-ci
b5825
2025-07-04 09:04:59 +03:00
luyhcsu
499a8f5a78 CANN: Replace aclrtMemsetSync with aclnnInplaceZero operator (#14002)
Co-authored-by: luyuhong <luyuhong@kylinos.cn>
b5824
2025-07-04 11:50:07 +08:00
Sigbjørn Skjæret
28657a8229 ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445) b5823 2025-07-03 23:07:22 +02:00
lhez
bee28421be opencl : broadcast for soft_max (#14510) b5822 2025-07-03 20:22:24 +02:00
Jeff Bolz
2b72bedec1 vulkan: support mixed/deepseekR1 FA head sizes (#14509)
* vulkan: better parameterize FA by head sizes

* vulkan: support mixed/deepseekR1 FA head sizes
b5821
2025-07-03 20:21:14 +02:00
Johannes Gäßler
c8c4495b8d ggml: backward pass for split swiglu (#14483) b5820 2025-07-03 17:05:18 +02:00