llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-08 10:07:01 +00:00

Author	SHA1	Message	Date
Aaron Teo	12e6b8b65d	Merge branch 'master' into feat/backend-zdnn	2025-07-31 02:00:01 +08:00
Aaron Teo	867d3f325d	chore: add codeowners Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-31 01:48:55 +08:00
Aaron Teo	cf8cdcd372	ggml-zdnn: update documentation, prepare for upstream Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-31 01:26:30 +08:00
Daniel Bevenius	41e78c567e	server : add support for `embd_normalize` parameter (#14964 ) This commit adds support for the `embd_normalize` parameter in the server code. The motivation for this is that currently if the server is started with a pooling type that is not `none`, then Euclidean/L2 normalization will be the normalization method used for embeddings. However, this is not always the desired behavior, and users may want to use other normalization (or none) and this commit allows that. Example usage: ```console curl --request POST \ --url http://localhost:8080/embedding \ --header "Content-Type: application/json" \ --data '{"input": "Hello world today", "embd_normalize": -1}' ``` b6037	2025-07-30 18:07:11 +02:00
uvos	ad4a700117	HIP: enable mfma mmq on gfx908 and gfx90a for select datatypes and shapes (#14949 ) b6036	2025-07-30 17:38:06 +02:00
Georgi Gerganov	e32a4ec60e	sync : ggml ggml-ci b6035	2025-07-30 17:33:11 +03:00
Kai Pastor	e228de9449	cmake : Fix BLAS link interface (ggml/1316)	2025-07-30 17:33:11 +03:00
Kai Pastor	73a8e5ca03	vulkan : fix 32-bit builds (ggml/1313) The pipeline member can be cast to VkPipeline. This is a VkPipeline_T* on 64 bit but a uint64_t on 32 bit. Cf. VK_DEFINE_NON_DISPATCHABLE_HANDLE documentation.	2025-07-30 17:33:11 +03:00
Johannes Gäßler	92b8810ec7	CUDA: skip masked KV slices for all FA kernels (#14924 ) b6032	2025-07-30 15:46:13 +02:00
Georgi Gerganov	00131d6eaf	tests : update for LLAMA_SET_ROWS=1 (#14961 ) * test-thread-safety : each context uses a single sequence * embedding : handle --parallel argument ggml-ci * save-load : handle -np 1 ggml-ci * thread-safety : avoid overriding threads, reduce test case arg ggml-ci b6031	2025-07-30 15:12:02 +03:00
Georgi Gerganov	1e15bfd42c	graph : fix stack-use-after-return (#14960 ) ggml-ci b6030	2025-07-30 13:52:11 +03:00
Aaron Teo	92a17ed9f3	ggml-zdnn: clean up project structure Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 17:36:38 +08:00
Aaron Teo	90d460c20b	ggml-zdnn: clean up matmul selection Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 17:34:15 +08:00
Aaron Teo	e67feafc65	ggml-zdnn: fix ztensor deallocation abort stabilise ggml <-> zdnn api Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 17:27:49 +08:00
Aaron Teo	803dde3bbc	ggml-zdnn: code clean up Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 17:23:36 +08:00
Aaron Teo	70224e6cb7	ggml-zdnn: bring load ztensor back to init routine Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 17:21:04 +08:00
Aaron Teo	1eb7c35e3a	ggml-zdnn: code cleanup Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 16:57:14 +08:00
Aaron Teo	b7a77cf683	ggml-zdnn: add guards to prevent loading ztensor if transformed Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 16:15:20 +08:00
Aaron Teo	4d5edb2221	ggml-zdnn: fix erroneous output load tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 16:11:07 +08:00
Aaron Teo	20d69b6cdf	ggml-zdnn: disable global load ztensor for now Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 16:05:58 +08:00
Aaron Teo	4fb6bee1f6	ggml-zdnn: attempt at using default nhwc format instead Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 16:04:19 +08:00
Aaron Teo	7b50d057dd	ggml-zdnn: attempt at manually changing the layout Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 15:33:13 +08:00
Aaron Teo	ad0cb30212	ggml-zdnn: disable logging and breakpoints for full test Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 13:52:13 +08:00
Aaron Teo	b4dffed954	ggml-zdnn: work on moving output ztensor as well Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 13:50:09 +08:00
Aaron Teo	fd766bdd44	ggml-zdnn: load ztensors in cgraph exec Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 13:40:36 +08:00
Aaron Teo	e30b1ffbde	ggml-zdnn: fix missing return from init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 13:34:47 +08:00
Aaron Teo	4493b148d0	ggml-zdnn: disable op_none initialisation for testing Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 13:33:12 +08:00
Douglas Hanley	a118d80233	embeddings: fix extraction of CLS pooling results (#14927 ) * embeddings: fix extraction of CLS pooling results * merge RANK pooling into CLS case for inputs b6029	2025-07-30 08:25:05 +03:00
Aaron Teo	213f1d2a3f	ggml-zdnn: add inputs logging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 13:11:09 +08:00
Aaron Teo	e695e8577d	ggml-zdnn: add tensor to pre_tfm_desc logging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 13:06:36 +08:00
Xinpeng Dou	61550f8231	CANN: update ops docs (#14935 ) * CANN:add ops docs * CANN: update ops docs	2025-07-30 08:39:24 +08:00
uvos	aa79524c51	HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (#14945 ) b6027	2025-07-29 20:23:04 +02:00
uvos	b77d11179d	HIP: add GGML_HIP_MMQ_MFMA option to allow disabling the MFMA path. (#14930 ) This is useful for testing for regressions on GCN with CDNA hardware. With GGML_HIP_MMQ_MFMA=Off and GGML_CUDA_FORCE_MMQ=On we can conveniently test the GCN code path on CDNA. As CDNA is just GCN renamed with MFMA added and limited use ACC registers, this provides a good alternative for regression testing when GCN hardware is not available. b6026	2025-07-29 17:44:30 +02:00
uvos	c7aa1364fd	HIP: Ignore unsupported unroll transformation in fattn-vec (#14931 ) llvm with the amdgcn target does not support unrolling loops with conditional break statements, when those statements cannot be resolved at compile time. Similar to other places in GGML let's simply ignore this warning. b6025	2025-07-29 17:43:43 +02:00
kallewoof	1a67fcc306	common : avoid logging partial messages (which can contain broken UTF-8 sequences) (#14937 ) * bug-fix: don't attempt to log partial parsed messages to avoid crash due to unfinished UTF-8 sequences b6024	2025-07-29 17:05:38 +02:00
hipudding	204f2cf168	CANN: Add ggml_set_rows (#14943 ) b6023	2025-07-29 22:36:43 +08:00
Sigbjørn Skjæret	138b288b59	cuda : add softcap fusion (#14907 ) b6022	2025-07-29 14:22:03 +02:00
Aaron Teo	8dbca74fc7	ggml-zdnn: attempt to use unique ptr Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 17:03:58 +08:00
Johannes Gäßler	bbd0f91779	server-bench: make seed choice configurable (#14929 ) * server-bench: make seed choice configurable * Update scripts/server-bench.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update scripts/server-bench.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix error formatting * Update scripts/server-bench.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-07-29 10:40:50 +02:00
Aaron Teo	b1376ad051	ggml-zdnn: add weights logging to check Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 16:38:07 +08:00
Aaron Teo	b28b423801	ggml-zdnn: switch to using deque to fix pointer deref problem Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 15:55:33 +08:00
Aaron Teo	3446807452	ggml-zdnn: attempt at fixing invalid buffer Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 15:45:46 +08:00
Aaron Teo	2d45ee2536	ggml-zdnn: add init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 15:36:42 +08:00
Aman Gupta	0a5036bee9	CUDA: add roll (#14919 ) * CUDA: add roll * Make everything const, use __restrict__ b6020	2025-07-29 14:45:18 +08:00
Aaron Teo	ab60ae6ca2	ggml-zdnn: add zdnn_init call for static libs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 00:55:44 +08:00
lhez	8ad7b3e65b	opencl : add ops docs (#14910 )	2025-07-28 18:50:17 +02:00
Aaron Teo	0ae2d30302	ggml-zdnn: add nnpa installed detection Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 00:39:55 +08:00
Aaron Teo	a9438925f2	ggml-zdnn: add parmblkformat detections Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 00:36:55 +08:00
Aaron Teo	1c6ca76c2e	ggml-zdnn: remove free_buffer debug info Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 00:27:16 +08:00
Aaron Teo	1a0520a540	ggml-zdnn: add logging to debug free buffer Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 00:12:18 +08:00
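
Note on commit 41e78c567e (`embd_normalize`): as a rough illustration of what the parameter changes, the Python sketch below applies a few normalization modes to a vector. The mode numbering (-1 = none, 0 = max-absolute scaled to an int16 range, 2 = Euclidean, other positive values = p-norm) is an assumption taken from the help text of llama.cpp's `--embd-normalize` option, not something stated in this log. The function name `normalize_embedding` is hypothetical, and this is not the server's implementation.

```python
import math


def normalize_embedding(vec, mode=2):
    """Return a normalized copy of `vec`.

    `mode` follows the ASSUMED embd_normalize convention:
      -1 -> no normalization
       0 -> divide by (max absolute value / 32760), i.e. scale to an int16 range
       2 -> Euclidean / L2 norm (the default described in the commit message)
       p -> p-norm for any other positive integer p (p = 1 is the taxicab norm)
    """
    if mode < -1:
        raise ValueError("mode must be >= -1")
    if mode == -1:
        # No normalization: hand back the values unchanged.
        return list(vec)
    if mode == 0:
        scale = max((abs(x) for x in vec), default=0.0) / 32760.0
    elif mode == 2:
        scale = math.sqrt(sum(x * x for x in vec))
    else:
        scale = sum(abs(x) ** mode for x in vec) ** (1.0 / mode)
    if scale <= 0.0:
        # All-zero input: avoid dividing by zero.
        return [0.0 for _ in vec]
    return [x / scale for x in vec]


if __name__ == "__main__":
    v = [3.0, 4.0]
    print(normalize_embedding(v, -1))  # raw values
    print(normalize_embedding(v, 2))   # unit L2 length
    print(normalize_embedding(v, 1))   # absolute values sum to 1
```

Under these assumptions, sending `"embd_normalize": -1` as in the commit's curl example would correspond to the first call, where the embedding is returned unscaled.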


6118 Commits