ibrahimkhadraoui
0ad3502839
rm extra space
2025-07-07 15:26:46 +04:00
ibrahim khadraoui
3afb2a89eb
Merge pull request #1 from tiiuae/injected-mup
injected mup
2025-07-07 15:20:08 +04:00
younesbelkada
e96cc73390
clean ups
2025-07-07 15:13:06 +04:00
younesbelkada
a9f3a63dc1
injected mup
2025-07-07 15:00:25 +04:00
ibrahimkhadraoui
b3bc1fb237
Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased
2025-07-07 14:36:55 +04:00
ibrahimkhadraoui
286e1fa569
fix rope_theta
2025-07-07 14:36:51 +04:00
ibrahimkhadraoui
97011d7a1f
mup_vec create as float64
2025-07-07 14:25:32 +04:00
ibrahimkhadraoui
49d7420964
inp_out_ids moved outside of layers loop
2025-07-07 14:18:48 +04:00
ibrahimkhadraoui
8c50893820
added some cb functions for debugging purposes
2025-07-07 14:10:45 +04:00
Younes B
6c39e775dd
fix conversion and d_inner
2025-07-07 10:56:49 +02:00
ibrahimkhadraoui
441d8d66bd
override modify_tensors instead of get_tensors
2025-07-07 12:00:57 +04:00
ibrahimkhadraoui
53304c84db
remove unused functions from gguf_writer.py
2025-07-07 11:18:14 +04:00
ibrahimkhadraoui
c4af0f3ca5
mamba_d_ssm added to d_inner find_hparam
2025-07-07 11:17:31 +04:00
ibrahimkhadraoui
c56ec07a9a
read arch from gguf.MODEL_ARCH
2025-07-07 10:34:46 +04:00
ibrahimkhadraoui
280dd2dcb7
falcon-h1 specific vocab resolved
2025-07-07 10:25:57 +04:00
Eve
6491d6e4f1
vulkan: increase LOAD_VEC_A to 8 (IQ1/IQ2) or 4 (IQ3) (#14485)
Commit taken from remyoudompheng's PR https://github.com/ggml-org/llama.cpp/pull/12260
Co-authored-by: Rémy Oudompheng <remyoudompheng@gmail.com>
b5835
2025-07-06 12:29:36 +02:00
Jeff Bolz
e592be1575
vulkan: fix rms_norm+mul fusion (#14545)
The fused operation was grabbing the epsilon value from the wrong place.
Add an env var to disable fusion.
Add some missing checks for supported shapes/types.
Handle fused rms_norm+mul in check_results.
b5834
2025-07-06 10:08:16 +02:00
Jeff Bolz
a0374a67e2
vulkan: Handle updated FA dim2/3 definition (#14518)
* vulkan: Handle updated FA dim2/3 definition
Pack mask boolean and n_head_log2 into a single dword to keep the push
constant block under the 128B limit.
* handle null mask for gqa
* allow gqa with dim3>1
b5833
2025-07-05 09:26:04 +02:00
Sigbjørn Skjæret
ddef99522d
server : fix assistant prefilling when content is an array (#14360)
b5832
2025-07-05 09:17:14 +02:00
Sigbjørn Skjæret
6681688146
opencl: add GELU_ERF (#14476)
b5831
2025-07-04 23:24:56 -07:00
Georgi Gerganov
bac8bed248
eval-callback : check for empty input (#14539)
b5830
2025-07-05 07:18:09 +03:00
R0CKSTAR
b81510a7b7
test-backend-ops: add support for specifying output format (#14368)
* test-backend-ops: add support for specifying output format
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Add build_commit and build_number in test_result
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* refactor
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Get build commit from ggml_commit()
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Merge errors into test_operation_info && address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* remove visitor nonsense
* remove visitor comment
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
---------
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
b5829
2025-07-05 12:10:53 +08:00
Georgi Gerganov
ef797db357
metal : disable fast math in all quantize kernels (#14528)
ggml-ci
b5828
2025-07-04 19:19:09 +03:00
ibrahimkhadraoui
7a25441e13
fixed multipliers
2025-07-04 17:41:03 +04:00
ibrahimkhadraoui
9760c8bc9d
conflict solve
2025-07-04 16:28:48 +04:00
ibrahimkhadraoui
2aa48dd853
Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased
2025-07-04 16:25:54 +04:00
ibrahimkhadraoui
3ee7983961
fix vocab size
2025-07-04 16:25:27 +04:00
younesbelkada
250b4f1074
mix instead of max
2025-07-04 15:53:47 +04:00
younesbelkada
1fd0574adc
try
2025-07-04 15:50:43 +04:00
ibrahimkhadraoui
a6d0067dd7
Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased
2025-07-04 15:37:44 +04:00
ibrahimkhadraoui
15138df48f
small fix ffn_norm
2025-07-04 15:37:40 +04:00
younesbelkada
6c7d9e26e7
fix
2025-07-04 15:25:59 +04:00
ibrahimkhadraoui
d22b4ea425
Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased
2025-07-04 15:10:11 +04:00
ibrahimkhadraoui
2fe057cc40
Revert "fix"
This reverts commit 243e4d1a50.
2025-07-04 15:04:13 +04:00
younesbelkada
22de62cf56
fix
2025-07-04 15:02:14 +04:00
younesbelkada
cce35498d5
pre-norm -> norm
2025-07-04 14:58:33 +04:00
younesbelkada
243e4d1a50
fix
2025-07-04 14:55:31 +04:00
younesbelkada
1415cd8782
another fix
2025-07-04 14:49:59 +04:00
younesbelkada
a39a8423f7
merge
2025-07-04 14:48:22 +04:00
younesbelkada
50eadc7b33
fixes
2025-07-04 14:47:31 +04:00
ibrahimkhadraoui
071f4b7fd8
changed precision for multipliers float 32->64
2025-07-04 14:37:02 +04:00
ibrahimkhadraoui
8bea92261e
python fixes
2025-07-04 14:32:11 +04:00
Georgi Gerganov
67d1ef23c6
batch : add optional for sequential equal split (#14511)
ggml-ci
b5827
2025-07-04 09:08:59 +03:00
Georgi Gerganov
7b50f7c025
graph : prepare for 4D mask (#14515)
ggml-ci
b5826
2025-07-04 09:05:36 +03:00
Georgi Gerganov
c79184d2d1
batch : add n_used count (#14512)
ggml-ci
b5825
2025-07-04 09:04:59 +03:00
luyhcsu
499a8f5a78
CANN: Replace aclrtMemsetSync with aclnnInplaceZero operator (#14002)
Co-authored-by: luyuhong <luyuhong@kylinos.cn>
b5824
2025-07-04 11:50:07 +08:00
Sigbjørn Skjæret
28657a8229
ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445)
b5823
2025-07-03 23:07:22 +02:00
lhez
bee28421be
opencl : broadcast for soft_max (#14510)
b5822
2025-07-03 20:22:24 +02:00
Jeff Bolz
2b72bedec1
vulkan: support mixed/deepseekR1 FA head sizes (#14509)
* vulkan: better parameterize FA by head sizes
* vulkan: support mixed/deepseekR1 FA head sizes
b5821
2025-07-03 20:21:14 +02:00
Johannes Gäßler
c8c4495b8d
ggml: backward pass for split swiglu (#14483)
b5820
2025-07-03 17:05:18 +02:00