Commit Graph

6118 Commits

Author SHA1 Message Date
Aaron Teo
12e6b8b65d Merge branch 'master' into feat/backend-zdnn 2025-07-31 02:00:01 +08:00
Aaron Teo
867d3f325d chore: add codeowners
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-31 01:48:55 +08:00
Aaron Teo
cf8cdcd372 ggml-zdnn: update documentation, prepare for upstream
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-31 01:26:30 +08:00
Daniel Bevenius
41e78c567e server : add support for embd_normalize parameter (#14964)
This commit adds support for the `embd_normalize` parameter in the
server code.

The motivation for this is that currently if the server is started with
a pooling type that is not `none`, then Euclidean/L2 normalization will
be the normalization method used for embeddings. However, this is not
always the desired behavior, and users may want to use other
normalization (or none) and this commit allows that.

Example usage:
```console
curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{"input": "Hello world today", "embd_normalize": -1}
```
b6037
2025-07-30 18:07:11 +02:00
uvos
ad4a700117 HIP: enable mfma mmq on gfx908 and gfx90a for select datatypes and shapes (#14949) b6036 2025-07-30 17:38:06 +02:00
Georgi Gerganov
e32a4ec60e sync : ggml
ggml-ci
b6035
2025-07-30 17:33:11 +03:00
Kai Pastor
e228de9449 cmake : Fix BLAS link interface (ggml/1316) 2025-07-30 17:33:11 +03:00
Kai Pastor
73a8e5ca03 vulkan : fix 32-bit builds (ggml/1313)
The pipeline member can be cast to VkPipeline.
This is a VkPipeline_T* on 64 bit but a uint64_t on 32 bit.
Cf. VK_DEFINE_NON_DISPATCHABLE_HANDLE documentation.
2025-07-30 17:33:11 +03:00
Johannes Gäßler
92b8810ec7 CUDA: skip masked KV slices for all FA kernels (#14924) b6032 2025-07-30 15:46:13 +02:00
Georgi Gerganov
00131d6eaf tests : update for LLAMA_SET_ROWS=1 (#14961)
* test-thread-safety : each context uses a single sequence

* embedding : handle --parallel argument

ggml-ci

* save-load : handle -np 1

ggml-ci

* thread-safety : avoid overriding threads, reduce test case arg

ggml-ci
b6031
2025-07-30 15:12:02 +03:00
Georgi Gerganov
1e15bfd42c graph : fix stack-use-after-return (#14960)
ggml-ci
b6030
2025-07-30 13:52:11 +03:00
Aaron Teo
92a17ed9f3 ggml-zdnn: clean up project structure
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 17:36:38 +08:00
Aaron Teo
90d460c20b ggml-zdnn: clean up matmul selection
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 17:34:15 +08:00
Aaron Teo
e67feafc65 ggml-zdnn: fix ztensor deallocation abort
stabilise ggml <-> zdnn api

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 17:27:49 +08:00
Aaron Teo
803dde3bbc ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 17:23:36 +08:00
Aaron Teo
70224e6cb7 ggml-zdnn: bring load ztensor back to init routine
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 17:21:04 +08:00
Aaron Teo
1eb7c35e3a ggml-zdnn: code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 16:57:14 +08:00
Aaron Teo
b7a77cf683 ggml-zdnn: add guards to prevent loading ztensor if transformed
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 16:15:20 +08:00
Aaron Teo
4d5edb2221 ggml-zdnn: fix errorenous output load tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 16:11:07 +08:00
Aaron Teo
20d69b6cdf ggml-zdnn: disable global load ztensor for now
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 16:05:58 +08:00
Aaron Teo
4fb6bee1f6 ggml-zdnn: attempt at using default nwhc format instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 16:04:19 +08:00
Aaron Teo
7b50d057dd ggml-zdnn: attempt at manually changing the layout
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 15:33:13 +08:00
Aaron Teo
ad0cb30212 ggml-zdnn: disable logging and breakpoints for full test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:52:13 +08:00
Aaron Teo
b4dffed954 ggml-zdnn: work on moving output ztensor as well
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:50:09 +08:00
Aaron Teo
fd766bdd44 ggml-zdnn: load ztensors in cgraph exec
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:40:36 +08:00
Aaron Teo
e30b1ffbde ggml-zdnn: fix missing return from init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:34:47 +08:00
Aaron Teo
4493b148d0 ggml-zdnn: disable op_none initialisation for testing
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:33:12 +08:00
Douglas Hanley
a118d80233 embeddings: fix extraction of CLS pooling results (#14927)
* embeddings: fix extraction of CLS pooling results

* merge RANK pooling into CLS case for inputs
b6029
2025-07-30 08:25:05 +03:00
Aaron Teo
213f1d2a3f ggml-zdnn: add inputs logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:11:09 +08:00
Aaron Teo
e695e8577d ggml-zdnn: add tensor to pre_tfm_desc logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:06:36 +08:00
Xinpeng Dou
61550f8231 CANN: update ops docs (#14935)
* CANN:add ops docs

* CANN: update ops docs
2025-07-30 08:39:24 +08:00
uvos
aa79524c51 HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (#14945) b6027 2025-07-29 20:23:04 +02:00
uvos
b77d11179d HIP: add GGML_HIP_MMQ_MFMA option to allow disableing the MFMA path. (#14930)
This is useful for testing for regressions on GCN with CDNA hardware.

With GGML_HIP_MMQ_MFMA=Off and GGML_CUDA_FORCE_MMQ=On we can conveniently test the GCN code path on CDNA. As CDNA is just GCN renamed with MFMA added and limited use ACC registers, this provides a good alternative for regression testing when GCN hardware is not available.
b6026
2025-07-29 17:44:30 +02:00
uvos
c7aa1364fd HIP: Ignore unsupported unroll transformation in fattn-vec (#14931)
llvm with the amdgcn target dose not support unrolling loops with conditional break statements, when those statements can not be resolved at compile time. Similar to other places in GGML lets simply ignore this warning.
b6025
2025-07-29 17:43:43 +02:00
kallewoof
1a67fcc306 common : avoid logging partial messages (which can contain broken UTF-8 sequences) (#14937)
* bug-fix: don't attempt to log partial parsed messages to avoid crash due to unfinished UTF-8 sequences
b6024
2025-07-29 17:05:38 +02:00
hipudding
204f2cf168 CANN: Add ggml_set_rows (#14943) b6023 2025-07-29 22:36:43 +08:00
Sigbjørn Skjæret
138b288b59 cuda : add softcap fusion (#14907) b6022 2025-07-29 14:22:03 +02:00
Aaron Teo
8dbca74fc7 ggml-zdnn: attempt to use unique ptr
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 17:03:58 +08:00
Johannes Gäßler
bbd0f91779 server-bench: make seed choice configurable (#14929)
* server-bench: make seed choice configurable

* Update scripts/server-bench.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update scripts/server-bench.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix error formatting

* Update scripts/server-bench.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-07-29 10:40:50 +02:00
Aaron Teo
b1376ad051 ggml-zdnn: add weights logging to check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 16:38:07 +08:00
Aaron Teo
b28b423801 ggml-zdnn: switch to using deque to fix pointer deref problem
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 15:55:33 +08:00
Aaron Teo
3446807452 ggml-zdnn: attempt at fixing invalid buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 15:45:46 +08:00
Aaron Teo
2d45ee2536 ggml-zdnn: add init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 15:36:42 +08:00
Aman Gupta
0a5036bee9 CUDA: add roll (#14919)
* CUDA: add roll

* Make everything const, use __restrict__
b6020
2025-07-29 14:45:18 +08:00
Aaron Teo
ab60ae6ca2 ggml-zdnn: add zdnn_init call for static libs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 00:55:44 +08:00
lhez
8ad7b3e65b opencl : add ops docs (#14910) 2025-07-28 18:50:17 +02:00
Aaron Teo
0ae2d30302 ggml-zdnn: add nnpa installed detection
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 00:39:55 +08:00
Aaron Teo
a9438925f2 ggml-zdnn: add parmblkformat detections
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 00:36:55 +08:00
Aaron Teo
1c6ca76c2e ggml-zdnn: remove free_buffer debug info
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 00:27:16 +08:00
Aaron Teo
1a0520a540 ggml-zdnn: add logging to debug free buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 00:12:18 +08:00