Aaron Teo
a1d8568c14
ggml-zdnn: impl matmul
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 18:13:07 +08:00
Aaron Teo
59e9805ab0
ggml-zdnn: code clean up
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 16:26:29 +08:00
Aaron Teo
c1653ab639
ggml-zdnn: fix incorrect ztensor shape, reduce memory padding
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 16:22:06 +08:00
Aaron Teo
828519659b
ggml-zdnn: update supports_op matmul matrix
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 16:11:37 +08:00
Aaron Teo
18658b8607
ggml-zdnn: impl init_tensor
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 12:02:20 +08:00
Aaron Teo
da2e0e70ba
ggml-zdnn: switch buffers back and set to arbitrary number
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 02:31:22 +08:00
Aaron Teo
63fbc45ed6
ggml-zdnn: switch to std vector instead of array
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 01:09:01 +08:00
Aaron Teo
b7f4b6fde3
ggml-zdnn: rework init_tensor to create new buffers
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 01:03:53 +08:00
Aaron Teo
ee0ed78d54
ggml-zdnn: add check for view tensors to prevent init_tensor
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:56:32 +08:00
Aaron Teo
13c64448bd
ggml-zdnn: assign tensor->extra to buffer
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:48:32 +08:00
Aaron Teo
13c05872f2
ggml-zdnn: implement at least 1 op to test
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:44:05 +08:00
Aaron Teo
9e84742e72
ggml-zdnn: test ztensor finding in init_tensor
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:40:22 +08:00
Aaron Teo
af9f4f0039
ggml-zdnn: fix compiler warnings and bugfixes
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:25:41 +08:00
Aaron Teo
ae2f656d7e
ggml-zdnn: bugfix new impl
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:18:53 +08:00
Aaron Teo
7c6395f826
ggml-zdnn: rewrite the backend implementation
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:14:45 +08:00
Aaron Teo
04ddb2ac95
ggml-zdnn: update op out_prod to use tensor->extra
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-23 19:51:37 +08:00
Aaron Teo
77a753297b
ggml-zdnn: support op out_prod
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-23 19:28:51 +08:00
Aaron Teo
11d58d29de
ggml-zdnn: add comments to prevent accidentally deleting lines
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-23 14:54:44 +08:00
Aaron Teo
529bdb9fbd
ggml-zdnn: last working matmul version
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-22 00:29:47 +08:00
Aaron Teo
60b9874dea
ggml-zdnn: update set_tensor logging to check only for matmul
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 21:11:39 +08:00
Aaron Teo
b9756b6dd4
ggml-zdnn: add more loggers
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 21:09:21 +08:00
Aaron Teo
1989fc9bf4
ggml-zdnn: add set_tensor
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 20:37:53 +08:00
Aaron Teo
36d76c30fb
ggml-zdnn: run compute and store into tensor->extra
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 20:30:54 +08:00
Aaron Teo
02cfcfb270
ggml-zdnn: add output buffer check
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 20:19:20 +08:00
Aaron Teo
fd4914b060
ggml-zdnn: tensor->extra logging check
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: add layout name mapping, ztensor information
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: separate logging into its own line
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: add shape comparison
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: add ggml_tensor shape log
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: fix incorrect shape logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-18 21:00:30 +08:00
Aaron Teo
e084821a3f
ggml-zdnn: initial backend impl
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: temp change z17 to arch15
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: fix build bugs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-18 20:04:20 +08:00
Reese Levine
21c021745d
ggml: Add initial WebGPU backend (#14521)
...
* Minimal setup of webgpu backend with dawn. Just prints out the adapter and segfaults
* Initialize webgpu device
* Making progress on setting up the backend
* Finish more boilerplate/utility functions
* Organize file and work on alloc buffer
* Add webgpu_context to prepare for actually running some shaders
* Work on memset and add shader loading
* Work on memset polyfill
* Implement set_tensor as webgpu WriteBuffer, remove host_buffer stubs since webgpu doesn't support it
* Implement get_tensor and buffer_clear
* Finish rest of setup
* Start work on compute graph
* Basic mat mul working
* Work on emscripten build
* Basic WebGPU backend instructions
* Use EMSCRIPTEN flag
* Work on passing ci, implement 4d tensor multiplication
* Pass thread safety test
* Implement permuting for mul_mat and cpy
* minor cleanups
* Address feedback
* Remove division by type size in cpy op
* Fix formatting and add github action workflows for vulkan and metal (m-series) webgpu backends
* Fix name
* Fix macos dawn prefix path
2025-07-16 18:18:51 +03:00
Georgi Gerganov
225e7a1438
llama : add high-throughput mode (#14363)
...
* kv-cache : prepare K/V buffers for separation
ggml-ci
* batched-bench : fix oob write
ggml-ci
* llama : add "virtual sequences"
ggml-ci
* llama : use "stream" vs "virtual sequence"
ggml-ci
* graph : fix stream splitting when KV cache is not used
ggml-ci
* kv-cache : add multi-stream save/load support
ggml-ci
* llama : add "--attn-streams" flag
ggml-ci
* kv-cache : fix handling when find_slot fails
ggml-ci
* kv-cache : restore find_slot impl
ggml-ci
* kv-cache : add comments
* kv-cache : add bounds checks for sequence id
ggml-ci
* cont : add n_seq_max to batch allocr
ggml-ci
* kv-cache : perform stream copies lazily after llama_synchronize
ggml-ci
* kv-cache : avoid throwing exceptions across the C boundary
ggml-ci
* CUDA: 4D FlashAttention support (#14628)
* CUDA: 4D FlashAttention support
* CUDA: fix WMMA FA kernel
* llama : rename attn_streams -> kv_unified
ggml-ci
* common : rename kv_split -> kv_unified
ggml-ci
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-07-16 16:35:42 +03:00
Georgi Gerganov
64978340b0
ggml : add asserts (#14720)
...
* ggml : add asserts
ggml-ci
* cont : fix constant type
Co-authored-by: Diego Devesa <slarengh@gmail.com>
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-07-16 14:43:32 +03:00
Jeff Bolz
ba1ceb3456
vulkan: fix noncontig check for mat_mul_id splitting (#14683)
...
* vulkan: fix noncontig check for mat_mul_id splitting
Remove supports_op check for > 4096 (splitting fixes this)
* vulkan: fix batched matmul dequant for Q*_K
2025-07-15 21:51:09 +02:00
Jeff Bolz
10a0351a97
vulkan: add RTE variants for glu/add/sub/mul/div (#14653)
2025-07-15 21:32:11 +02:00
R0CKSTAR
cbc68be51d
cuda: fix build warnings in set-rows.cu (unused variable) (#14687)
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-07-15 15:28:53 +08:00
Anton Mitkov
bdca38376f
sycl: Hotfix for non dnnl codepath (#14677)
2025-07-14 18:12:42 +01:00
shalinib-ibm
55c509daf5
ggml : refactor llamafile_sgemm PPC code (#14673)
...
Remove unnecessary templates from class definition and packing functions
Reduce deeply nested conditionals, if-else switching in mnpack function
Replace repetitive code with inline functions in packing functions
2 ~ 7% improvement in Q8 Model
15 ~ 50% improvement in Q4 Model
Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>
2025-07-14 16:16:42 +03:00
Akarshan Biswas
0f4c6ec0f1
SYCL: use 1D kernel for set_rows (#14618)
...
* SYCL: Use 1D kernel for set_rows
* Remove dangling comment
* Refactor and use ceil_div
2025-07-14 10:37:55 +01:00
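The `ceil_div` used in the refactor above is the standard integer ceiling-division idiom (how many blocks of size `b` are needed to cover `a` elements); a minimal Python sketch, not the actual SYCL backend helper:

```python
def ceil_div(a: int, b: int) -> int:
    # Smallest n such that n * b >= a, for positive a and b.
    # Avoids floating point: (a + b - 1) // b rounds the quotient up.
    return (a + b - 1) // b
```

This is the usual way a kernel launch computes its workgroup count from a problem size.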
Anton Mitkov
65a3ebb0aa
sycl: Batched mulmat rework for oneDNN dispatch (#14617)
2025-07-14 10:37:35 +01:00
Sigbjørn Skjæret
923e3ea2e3
cuda : add set rows for bf16 (#14664)
2025-07-13 15:01:24 +02:00
Yavor Ivanov
e743cddb60
cuda : add ELU support (#14657)
2025-07-13 11:33:16 +02:00
Georgi Gerganov
05fec5bd29
ggml : add build-time message to remind about ggml_set_rows (#14661)
...
ggml-ci
2025-07-13 10:36:33 +03:00
Yavor Ivanov
dcf7f2ea3c
metal : Add missing unary ops Metal support (#14660)
2025-07-13 08:38:13 +03:00
Aman Gupta
7de5c7cab6
CUDA: add set rows for f32 and f16 (#14551)
...
* CUDA: add set rows for f32 and f16
* Review: change kernel params, use strides from host
* Use 1-d kernel
* Review: use int64_t for blockDim.x, rename nb->s for clarity
2025-07-12 16:31:38 +03:00
Georgi Gerganov
3120413ccd
vulkan : remove unused vars (#0)
...
ggml-ci
2025-07-12 14:25:44 +03:00
Acly
74bb294591
vulkan : implement bilinear interpolation (ggml/1291)
...
ggml-ci
2025-07-12 14:25:44 +03:00
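Bilinear interpolation, as implemented in the Vulkan commit above, samples a grid at fractional coordinates by blending the four nearest values; a textbook scalar sketch in Python, not the shader itself:

```python
def bilerp(img, x, y):
    # Sample 2D grid `img` (list of rows) at fractional (x, y).
    # Clamp the upper neighbors at the grid edge.
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(img[0]) - 1)
    y1 = min(y0 + 1, len(img) - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally on the two rows, then vertically.
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy
```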
Acly
3e303b1107
vulkan : implement ggml_roll (ggml/1290)
...
ggml-ci
2025-07-12 14:25:44 +03:00
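Assuming `ggml_roll` shifts elements along a dimension with wraparound, analogous to `np.roll`, a one-dimensional Python sketch of that semantics (hedged; the actual op works on tensors):

```python
def roll(xs, shift):
    # Rotate list `xs` by `shift` positions with wraparound.
    n = len(xs)
    if n == 0:
        return xs
    shift %= n  # normalize negative/oversized shifts
    return xs[-shift:] + xs[:-shift]
```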
Jeff Bolz
b3ad3a0191
vulkan: support SET_ROWS (#14587)
...
* vulkan: support SET_ROWS
Add variants of the copy_to_quant shader that do the SET_ROWS operation.
Change these shaders to spread the work across the workgroup.
The memory access pattern is probably not great (one thread per quant block),
but should be fine for now.
* vulkan: optimize set_rows
Larger workgroups for non-quant types.
Set "norepeat" (there is manual repeat logic).
Use fastmod.
2025-07-12 12:12:26 +02:00
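The SET_ROWS op that recurs in these commits scatters source rows into a destination at given row indices; a minimal Python sketch of the assumed semantics (the real GGML op additionally handles type conversion, which is why the Vulkan commit extends the copy_to_quant shaders):

```python
def set_rows(dst, src, idx):
    # For each source row i, write it into dst at row idx[i].
    # dst and src are lists of rows; idx maps source row -> dest row.
    for i, r in enumerate(idx):
        dst[r] = list(src[i])
    return dst
```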
Jeff Bolz
98197e5c98
vulkan: optimizations for deepseek prompt processing (#14555)
...
* vulkan: allow unclamped loads in coopmat2 mul_mat_id shader
* vulkan: increase coopmat2 mul_mat_id tile size
* vulkan: optimize mat_mul_id row_ids search to batch loads, and port to coopmat1 path
* vulkan: use smaller FA row size when head size is large. applies to both scalar and CM2 paths (CM1 isn't used due to shared memory limits)
2025-07-12 11:51:58 +02:00
Tarek Dakhran
f5e96b368f
model : support LiquidAI LFM2 hybrid family (#14620)
...
**Important**
LFM2 was [merged](https://github.com/huggingface/transformers/pull/39340) into transformers, but has not yet been released.
To convert into gguf, install transformers from source
```shell
pip install "transformers @ git+https://github.com/huggingface/transformers.git@main"
```
2025-07-11 20:27:01 +02:00
Slobodan Josic
756aa1020a
HIP : Add HIP 7.0+ compatibility for hipBLAS compute types (#14634)
2025-07-11 18:55:00 +02:00
rmatif
6bdda13981
opencl: add tiled mul_mat_f16_f32 (#14535)
...
* add tiled mul_mat_f16_f32
* fix trailing whitespace
* add insightful comments
2025-07-10 14:58:12 -07:00
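The tiled mul_mat kernel above processes the matrices in small blocks to improve cache/local-memory locality; a scalar Python sketch of the tiling idea (hypothetical tile size, not the actual OpenCL kernel):

```python
def matmul_tiled(A, B, tile=2):
    # C = A @ B, computed tile-by-tile so each block of A and B
    # is reused while it is hot, mirroring workgroup-local tiling.
    m, k, n = len(A), len(A[0]), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        for kk in range(k0, min(k0 + tile, k)):
                            C[i][j] += A[i][kk] * B[kk][j]
    return C
```

On a GPU each (i0, j0) tile maps to a workgroup and the k0 loop stages tiles through local memory; the arithmetic is identical to the naive triple loop.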
lhez
0b8855775c
opencl: add set_rows for f16 and f32 (#14547)
...
* opencl: add `set_rows` for `f16` and `f32`
* opencl: better choose workgroup size for `set_rows`
2025-07-10 11:48:52 -07:00