Aaron Teo
fc692ed498
ggml-zdnn: figure out why sigtrap is happening
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 18:00:28 +08:00
Aaron Teo
08de84ef85
ggml-zdnn: bugfix transform ztensor vs origtensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 16:57:57 +08:00
Aaron Teo
032dce5a6a
ggml-zdnn: fix sequencing of transforms
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 16:46:17 +08:00
Aaron Teo
cf0e190c40
ggml-zdnn: add more safeguards in matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 16:44:39 +08:00
Aaron Teo
f239bbb02d
ggml-zdnn: move weights transform into mulmat
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 16:38:44 +08:00
Aaron Teo
092fa3a328
ggml-zdnn: activate bias transform in matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 16:27:35 +08:00
Aaron Teo
f7e8d6f2b2
ggml-zdnn: add logger to check if mat mul ops go through set_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 16:17:12 +08:00
Aaron Teo
6d71749c26
ggml-zdnn: add more debug info for extra buffer transform
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 16:10:07 +08:00
Aaron Teo
4b2f1cb1b8
ggml-zdnn: add bias data transform
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 16:05:53 +08:00
Aaron Teo
f800c80281
ggml-zdnn: add bias ztensor and data free
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 15:59:52 +08:00
Aaron Teo
bee7dd3020
ggml-zdnn: tighten memory usage, change string allocation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 15:55:42 +08:00
Aaron Teo
aef93b3908
ggml-zdnn: add bias init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 15:41:56 +08:00
Aaron Teo
f263f5d9ae
ggml-zdnn: fix missing data transform call
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 18:30:10 +08:00
Aaron Teo
1c75ed63e5
ggml-zdnn: fix compiler error missing type
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 18:22:34 +08:00
Aaron Teo
a1d8568c14
ggml-zdnn: impl matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 18:13:07 +08:00
Aaron Teo
59e9805ab0
ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 16:26:29 +08:00
Aaron Teo
c1653ab639
ggml-zdnn: fix incorrect ztensor shape, reduce memory padding
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 16:22:06 +08:00
Aaron Teo
828519659b
ggml-zdnn: update supports_op matmul matrix
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 16:11:37 +08:00
Aaron Teo
18658b8607
ggml-zdnn: impl init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 12:02:20 +08:00
Aaron Teo
da2e0e70ba
ggml-zdnn: switch buffers back and set to arbitrary number
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 02:31:22 +08:00
Aaron Teo
63fbc45ed6
ggml-zdnn: switch to std vector instead of array
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 01:09:01 +08:00
Aaron Teo
b7f4b6fde3
ggml-zdnn: rework init_tensor to create new buffers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 01:03:53 +08:00
Aaron Teo
ee0ed78d54
ggml-zdnn: add check for view tensors to prevent init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:56:32 +08:00
Aaron Teo
13c64448bd
ggml-zdnn: assign tensor->extra to buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:48:32 +08:00
Aaron Teo
13c05872f2
ggml-zdnn: implement at least 1 op to test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:44:05 +08:00
Aaron Teo
9e84742e72
ggml-zdnn: test ztensor finding in init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:40:22 +08:00
Aaron Teo
af9f4f0039
ggml-zdnn: fix compiler warnings and bugfixes
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:25:41 +08:00
Aaron Teo
ae2f656d7e
ggml-zdnn: bugfix new impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:18:53 +08:00
Aaron Teo
7c6395f826
ggml-zdnn: rewrite the backend implementation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:14:45 +08:00
Aaron Teo
04ddb2ac95
ggml-zdnn: update op out_prod to use tensor->extra
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-23 19:51:37 +08:00
Aaron Teo
77a753297b
ggml-zdnn: support op out_prod
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-23 19:28:51 +08:00
Aaron Teo
11d58d29de
ggml-zdnn: add comments to prevent accidentally deleting lines
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-23 14:54:44 +08:00
Aaron Teo
529bdb9fbd
ggml-zdnn: last working matmul version
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-22 00:29:47 +08:00
Aaron Teo
60b9874dea
ggml-zdnn: update set_tensor logging to check only for matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 21:11:39 +08:00
Aaron Teo
b9756b6dd4
ggml-zdnn: add more loggers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 21:09:21 +08:00
Aaron Teo
1989fc9bf4
ggml-zdnn: add set_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 20:37:53 +08:00
Aaron Teo
36d76c30fb
ggml-zdnn: run compute and store into tensor->extra
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 20:30:54 +08:00
Aaron Teo
02cfcfb270
ggml-zdnn: add output buffer check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 20:19:20 +08:00
Aaron Teo
fd4914b060
ggml-zdnn: tensor->extra logging check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: add layout name mapping, ztensor information
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: separate logging into its own line
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: add shape comparison
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: add ggml_tensor shape log
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: fix incorrect shape logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-18 21:00:30 +08:00
Aaron Teo
e084821a3f
ggml-zdnn: inital backend impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: temp change z17 to arch15
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: fix build bugs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-18 20:04:20 +08:00
Georgi Gerganov
01612b7409
llama : reuse compute graphs (#14482)
* llama : reuse compute graphs
ggml-ci
* llama-bench : add graph reuse parameter
ggml-ci
* cont : remove the parameter and the sched resets
ggml-ci
* graph : rename update() to can_reuse()
ggml-ci
* params : remove is_same()
ggml-ci
* graph : set res->params in llm_graph_context constructor
ggml-ci
* graph : avoid set_max_nodes in llm_graph_result
ggml-ci
* kv-cache : reuse llama_context's graph result instance
ggml-ci
* context : reset the previous graph result upon memory updates
ggml-ci
* batch : llama_ubatch now carries its data instead of pointing to balloc
ggml-ci
* merge : fix build
ggml-ci
* graph : fix can_reuse() checks when flash-attention is disabled
* graph : move llm_graph_result impl in source file + debug env
ggml-ci
b5922
2025-07-17 19:08:33 +03:00
Tarek Dakhran
086cf81e88
llama : fix parallel processing for lfm2 (#14705)
b5921
2025-07-17 09:22:11 +02:00
Georgi Gerganov
d9b691081c
kv-cache : opt mask set input (#14600)
ggml-ci
b5920
2025-07-17 09:49:15 +03:00
Georgi Gerganov
ad57d3edd2
batch : fix uninitialized has_cpl flag (#14733)
ggml-ci
b5919
2025-07-17 09:45:54 +03:00
Sigbjørn Skjæret
1ba45d4982
ci : disable failing vulkan crossbuilds (#14723)
2025-07-16 20:52:08 -03:00
Sigbjørn Skjæret
19e5943d9e
convert : make hf token optional (#14717)
* make hf token optional
* fail if we can't get necessary tokenizer config
2025-07-16 23:17:43 +02:00
Diner Burger
496957e1cb
llama : fix parameter order for hybrid memory initialization (#14725)
b5916
2025-07-16 21:17:25 +02:00
Reese Levine
21c021745d
ggml: Add initial WebGPU backend (#14521)
* Minimal setup of webgpu backend with dawn. Just prints out the adapter and segfaults
* Initialize webgpu device
* Making progress on setting up the backend
* Finish more boilerplate/utility functions
* Organize file and work on alloc buffer
* Add webgpu_context to prepare for actually running some shaders
* Work on memset and add shader loading
* Work on memset polyfill
* Implement set_tensor as webgpu WriteBuffer, remove host_buffer stubs since webgpu doesn't support it
* Implement get_tensor and buffer_clear
* Finish rest of setup
* Start work on compute graph
* Basic mat mul working
* Work on emscripten build
* Basic WebGPU backend instructions
* Use EMSCRIPTEN flag
* Work on passing ci, implement 4d tensor multiplication
* Pass thread safety test
* Implement permuting for mul_mat and cpy
* minor cleanups
* Address feedback
* Remove division by type size in cpy op
* Fix formatting and add github action workflows for vulkan and metal (m-series) webgpu backends
* Fix name
* Fix macos dawn prefix path
b5915
2025-07-16 18:18:51 +03:00
tempstudio
b0f0ecc3dc
model : support output bias for qwen2 (#14711)
Co-authored-by: qwaqrm <qwaqrm@126.com>
b5914
2025-07-16 18:02:06 +03:00
Georgi Gerganov
225e7a1438
llama : add high-throughput mode (#14363)
* kv-cache : prepare K/V buffers for separation
ggml-ci
* batched-bench : fix oob write
ggml-ci
* llama : add "virtual sequences"
ggml-ci
* llama : use "stream" vs "virtual sequence"
ggml-ci
* graph : fix stream splitting when KV cache is not used
ggml-ci
* kv-cache : add multi-stream save/load support
ggml-ci
* llama : add "--attn-streams" flag
ggml-ci
* kv-cache : fix handling when find_slot fails
ggml-ci
* kv-cache : restore find_slot impl
ggml-ci
* kv-cache : add comments
* kv-cache : add bounds checks for sequence id
ggml-ci
* cont : add n_seq_max to batch allocr
ggml-ci
* kv-cache : perform stream copies lazily after llama_synchronize
ggml-ci
* kv-cache : avoid throwing exceptions across the C boundary
ggml-ci
* CUDA: 4D FlashAttention support (#14628)
* CUDA: 4D FlashAttention support
* CUDA: fix WMMA FA kernel
* llama : rename attn_streams -> kv_unified
ggml-ci
* common : rename kv_split -> kv_unified
ggml-ci
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
b5913
2025-07-16 16:35:42 +03:00