Xuan Son Nguyen
34bacc8365
ggml-ci
2025-07-09 12:09:36 +02:00
Xuan Son Nguyen
4ea74b04e5
make code looks more consistent
2025-07-09 12:07:05 +02:00
Xuan Son Nguyen
0d70ca81e8
use memcpy for op params
2025-07-09 12:05:34 +02:00
Xuan Son Nguyen
50c678f6da
rm __ARM_FEATURE_SVE
2025-07-09 11:56:48 +02:00
Xuan Son Nguyen
563aca0b56
vDSP_vsmsa
2025-07-09 11:55:56 +02:00
Xuan Son Nguyen
265cb43538
fix cann compile error
2025-07-09 11:52:58 +02:00
Xuan Son Nguyen
c8d89317c9
suggestions from coderabbit
2025-07-09 00:06:53 +02:00
Xuan Son Nguyen
b22708fd90
fix cuda
2025-07-09 00:00:44 +02:00
Xuan Son Nguyen
4d0195324e
will this fix cpu?
2025-07-09 00:00:31 +02:00
Xuan Son Nguyen
0e51a0a8b0
opencl
2025-07-08 23:36:47 +02:00
Xuan Son Nguyen
477a97ad87
cann (placeholder)
2025-07-08 23:34:15 +02:00
Xuan Son Nguyen
782b58fa06
vulkan
2025-07-08 23:31:04 +02:00
Xuan Son Nguyen
a28df6f00c
sycl
2025-07-08 23:27:32 +02:00
Xuan Son Nguyen
92a8738452
add CUDA
2025-07-08 23:26:21 +02:00
Xuan Son Nguyen
e427af75fb
add more simd
2025-07-08 23:19:16 +02:00
Xuan Son Nguyen
a5ccf168f1
ggml_vec_mad1_f32
2025-07-08 23:13:42 +02:00
Xuan Son Nguyen
7af3fd98a1
Merge branch 'master' into xsn/ggml_scale_bias
2025-07-08 23:02:15 +02:00
Jeff Bolz
6efcd65945
vulkan: optimize flash attention split_k_reduce (#14554)
* vulkan: allow FA split_k with smaller KV values
* vulkan: spread split_k_reduce work across more threads
k_num can get rather large. Use the whole workgroup to reduce the M/L values.
Launch a thread for each element in the HSV dimension of the output. Helps a
lot for large HSV (like deepseek).
b5849
2025-07-08 20:11:42 +02:00
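The reduction described in the split_k_reduce commit above (6efcd65945) can be pictured with a serial CPU sketch. This is not the Vulkan shader: the struct and function names are hypothetical, and it assumes each split stores a running maximum M, a running sum L, and an unnormalized output row. The outer loop over `d` stands in for the one-thread-per-HSV-element launch that the commit mentions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical per-split partial result (illustrative names, not from the shader).
struct SplitPartial {
    float m;               // maximum attention score seen in this split
    float l;               // sum of exp(score - m) over this split
    std::vector<float> o;  // unnormalized output row, one value per HSV element
};

// Merge all splits into one normalized output row.
// Assumes every split has an output row of the same length.
std::vector<float> reduce_splits(const std::vector<SplitPartial> & parts) {
    if (parts.empty()) {
        return {};
    }

    // The global maximum keeps the exponentials numerically stable.
    float m_max = -std::numeric_limits<float>::infinity();
    for (const SplitPartial & p : parts) {
        m_max = std::max(m_max, p.m);
    }

    // Rescale each split's sum to the global maximum and accumulate.
    float l_total = 0.0f;
    for (const SplitPartial & p : parts) {
        l_total += std::exp(p.m - m_max) * p.l;
    }

    // Each iteration over d is independent, so it maps to one thread per element.
    std::vector<float> out(parts[0].o.size(), 0.0f);
    for (std::size_t d = 0; d < out.size(); ++d) {
        float acc = 0.0f;
        for (const SplitPartial & p : parts) {
            acc += std::exp(p.m - m_max) * p.o[d];
        }
        out[d] = acc / l_total;
    }
    return out;
}
```

With many splits (large k_num), the two scalar loops over `parts` are the part that benefits from being spread across a whole workgroup.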
stevenkuang
699f4392a3
model : fix hunyuan moe chat template (#14584)
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
b5848
2025-07-08 18:29:29 +02:00
Xuan-Son Nguyen
08382869a2
model : add SmolLM3 (#14581)
* Init - first pass.
* Model -> ModelBase.
* fix errors in conversion.
* Update the graph.
* up.
* up.
* wip
* cgraph ok
* rm redundant code
---------
Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com>
b5847
2025-07-08 18:07:01 +02:00
compilade
bb4f7a9e4e
memory : fix broken batch splits for recurrent cache (#14575)
Splits producing more than one ubatch per batch for recurrent models
were broken with #14512.
This fixes it by moving the completeness check after the ubatch split loop.
b5846
2025-07-08 18:37:47 +03:00
Jeff Bolz
b8eeb8741d
vulkan : fix rope with partial rotation and non-cont src (#14582)
b5845
2025-07-08 15:21:21 +02:00
Alawode Oluwandabira
17a1f0d2d4
server: Add ability to mount server at prefix (#14544)
* Add server_prefix
* Correct server path env
* Rename cli flag to --api-prefix
* Change all to api_prefix
b5844
2025-07-08 11:47:33 +03:00
Xuan-Son Nguyen
8f22dc0a53
model : add hunyuan moe (#14425)
* model : add hunyuan moe
* tokenizer ok
* fix tensor name
* cgraph init
* chat template
* wip
* almost working
* skip embed, fix bos
* cleanup
* yarn scaling
* cleanup
* correct rope type
* failed token fix
* ntk alpha freq_base
* tokenization working
* cleanup and pr changes
* vocab_size sanity check
* ntk alpha generic
* Update convert_hf_to_gguf.py
* Apply suggestions from code review
* fix regression
* fix style
---------
Co-authored-by: kooshi <1934337+kooshi@users.noreply.github.com>
b5843
2025-07-08 11:24:06 +03:00
Jeff Bolz
53903ae6fa
vulkan: increase timeout for CI (#14574)
2025-07-08 09:38:31 +02:00
Georgi Gerganov
4d0dcd4a06
cuda : fix rope with partial rotation and non-cont src (#14580)
* cuda : fix rope non-cont
ggml-ci
* cont : fix multi-rope + add test
ggml-ci
* sycl : try fix
ggml-ci
* cont : fix sycl + clean-up cuda
ggml-ci
b5841
2025-07-08 10:15:21 +03:00
Aman Gupta
75c91de6e9
CUDA: add bilinear interpolation for upscale (#14563)
b5840
2025-07-08 10:11:18 +08:00
R0CKSTAR
68155c66f0
musa: fix build warnings (unused variable) (#14561)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
b5839
2025-07-08 07:58:30 +08:00
Sigbjørn Skjæret
e1a7059053
llama : fix incorrect minicpm3 v_states shape (#14571)
b5838
2025-07-07 23:35:35 +02:00
Sigbjørn Skjæret
12f55c302b
llama : remove ggml_cont where possible (#14568)
b5837
2025-07-07 21:35:08 +02:00
Aman Gupta
b9c3eefde1
CUDA: add bf16 and i32 to getrows (#14529)
b5836
2025-07-07 21:45:43 +08:00
Eve
6491d6e4f1
vulkan: increase LOAD_VEC_A to 8 (IQ1/IQ2) or 4 (IQ3) (#14485)
Commit taken from remyoudompheng's PR https://github.com/ggml-org/llama.cpp/pull/12260
Co-authored-by: Rémy Oudompheng <remyoudompheng@gmail.com>
b5835
2025-07-06 12:29:36 +02:00
Jeff Bolz
e592be1575
vulkan: fix rms_norm+mul fusion (#14545)
The fused operation was grabbing the epsilon value from the wrong place.
Add an env var to disable fusion.
Add some missing checks for supported shapes/types.
Handle fused rms_norm+mul in check_results.
b5834
2025-07-06 10:08:16 +02:00
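For readers unfamiliar with the fused operation named in the rms_norm+mul commit above (e592be1575), here is a minimal CPU sketch of what such a fusion computes. It is not the Vulkan implementation; the function name is hypothetical, and it assumes `x` and `w` have the same length. It makes explicit that epsilon is a parameter of the normalization step, which is the value the fused path has to pick up.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Fused RMS norm followed by an element-wise multiply:
//   y[i] = x[i] / sqrt(mean(x^2) + eps) * w[i]
// Hypothetical helper; assumes x.size() == w.size().
std::vector<float> rms_norm_mul(const std::vector<float> & x,
                                const std::vector<float> & w,
                                float eps) {
    // Accumulate the sum of squares in double to limit rounding error.
    double sum_sq = 0.0;
    for (float v : x) {
        sum_sq += static_cast<double>(v) * v;
    }

    // eps belongs to the norm step, not to the multiply that follows it.
    const float mean_sq = static_cast<float>(sum_sq / static_cast<double>(x.size()));
    const float scale   = 1.0f / std::sqrt(mean_sq + eps);

    // Apply the normalization and the multiply in a single pass.
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale * w[i];
    }
    return y;
}
```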
Jeff Bolz
a0374a67e2
vulkan: Handle updated FA dim2/3 definition (#14518)
* vulkan: Handle updated FA dim2/3 definition
Pack mask boolean and n_head_log2 into a single dword to keep the push
constant block under the 128B limit.
* handle null mask for gqa
* allow gqa with dim3>1
b5833
2025-07-05 09:26:04 +02:00
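The packing described in the FA dim2/3 commit above (a0374a67e2) saves one 32-bit slot in the push constant block. The sketch below shows one way to fold a boolean and a small integer into a single dword. The bit layout (flag in bit 16, value in the low 16 bits) and all names are assumptions made for illustration; the commit message does not state the layout.

```cpp
#include <cstdint>

// Assumed layout: bit 16 holds the mask flag, bits 0..15 hold n_head_log2.
constexpr uint32_t MASK_FLAG_SHIFT  = 16;
constexpr uint32_t N_HEAD_LOG2_BITS = 0xFFFFu;

// Pack the flag and the value into one dword.
inline uint32_t pack_mask_n_head_log2(bool has_mask, uint32_t n_head_log2) {
    return (static_cast<uint32_t>(has_mask) << MASK_FLAG_SHIFT) |
           (n_head_log2 & N_HEAD_LOG2_BITS);
}

// Recover the mask flag from the packed dword.
inline bool unpack_has_mask(uint32_t packed) {
    return ((packed >> MASK_FLAG_SHIFT) & 1u) != 0;
}

// Recover n_head_log2 from the packed dword.
inline uint32_t unpack_n_head_log2(uint32_t packed) {
    return packed & N_HEAD_LOG2_BITS;
}
```

The shader side would apply the same shift and mask to read both fields back from the single value.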
Sigbjørn Skjæret
ddef99522d
server : fix assistant prefilling when content is an array (#14360)
b5832
2025-07-05 09:17:14 +02:00
Sigbjørn Skjæret
6681688146
opencl: add GELU_ERF (#14476)
b5831
2025-07-04 23:24:56 -07:00
Georgi Gerganov
bac8bed248
eval-callback : check for empty input (#14539)
b5830
2025-07-05 07:18:09 +03:00
R0CKSTAR
b81510a7b7
test-backend-ops: add support for specifying output format (#14368)
* test-backend-ops: add support for specifying output format
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Add build_commit and build_number in test_result
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* refactor
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Get build commit from ggml_commit()
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Merge errors into test_operation_info && address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* remove visitor nonsense
* remove visitor comment
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
* Address review comments
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
---------
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
b5829
2025-07-05 12:10:53 +08:00
Georgi Gerganov
ef797db357
metal : disable fast math in all quantize kernels (#14528)
ggml-ci
b5828
2025-07-04 19:19:09 +03:00
Georgi Gerganov
67d1ef23c6
batch : add optional for sequential equal split (#14511)
ggml-ci
b5827
2025-07-04 09:08:59 +03:00
Georgi Gerganov
7b50f7c025
graph : prepare for 4D mask (#14515)
ggml-ci
b5826
2025-07-04 09:05:36 +03:00
Georgi Gerganov
c79184d2d1
batch : add n_used count (#14512)
ggml-ci
b5825
2025-07-04 09:04:59 +03:00
luyhcsu
499a8f5a78
CANN: Replace aclrtMemsetSync with aclnnInplaceZero operator (#14002)
Co-authored-by: luyuhong <luyuhong@kylinos.cn>
b5824
2025-07-04 11:50:07 +08:00
Sigbjørn Skjæret
28657a8229
ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445)
b5823
2025-07-03 23:07:22 +02:00
lhez
bee28421be
opencl : broadcast for soft_max (#14510)
b5822
2025-07-03 20:22:24 +02:00
Jeff Bolz
2b72bedec1
vulkan: support mixed/deepseekR1 FA head sizes (#14509)
* vulkan: better parameterize FA by head sizes
* vulkan: support mixed/deepseekR1 FA head sizes
b5821
2025-07-03 20:21:14 +02:00
Johannes Gäßler
c8c4495b8d
ggml: backward pass for split swiglu (#14483)
b5820
2025-07-03 17:05:18 +02:00
Nicolò Scipione
7b63a71a6b
Fix conditional enabling following arch checks for ggml-sycl (#14504)
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
b5819
2025-07-03 11:00:03 +02:00
Xuan-Son Nguyen
0c2ee38ab7
convert : correct gemma 3n conversion (#14450)
* convert : correct gemma 3n conversion
* rm redundant code
2025-07-03 10:03:06 +02:00
Georgi Gerganov
a70c8a0c4b
kv-cache : use ggml_set_rows (#14285)
* kv-cache : use ggml_set_rows
ggml-ci
* graph : separate k and v indices
ggml-ci
* cont : remove redundant ifs
ggml-ci
* kv-cache : improve find_slot impl
* kv-cache : bounds-check when accessing slot_info indices
* kv-cache : add comments
ggml-ci
* ggml : add TODOs for adding GGML_OP_SET_ROWS support in the backends
ggml-ci
b5817
2025-07-03 10:53:35 +03:00