llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-27 08:21:30 +00:00

Files

Jeff Bolz e68aa10d8f vulkan: sort graph to allow more parallel execution (#15850)

* vulkan: sort graph to allow more parallel execution

Add a backend proc that allows the backend to modify the graph. The
Vulkan implementation looks at which nodes depend on each other
and greedily reorders them so that nodes that don't depend on each
other are grouped together. It only reorders the nodes; it doesn't
change the contents of any of them.

With #15489, this reduces the number of synchronizations needed.

* call optimize_graph per-split

2025-09-09 02:10:07 +08:00
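
The greedy reordering described in the commit message above can be illustrated with a small sketch. This is not the Vulkan backend's code (which lives in the C++ sources); it is a simplified Python model under stated assumptions: each node is a hypothetical `(name, inputs)` pair, the input list is already in a valid topological order, and only read-after-write dependencies are considered. The function name `greedy_group_reorder` is made up for illustration, and the real implementation may apply further constraints that this sketch does not model.

```python
def greedy_group_reorder(nodes):
    """Group mutually independent nodes together without changing them.

    nodes: list of (name, [input names]) in a valid topological order.
    Returns a list of groups; concatenating the groups gives the new order.
    Hypothetical model only, not the ggml/Vulkan data structures.
    """
    names = {name for name, _ in nodes}
    done = set()                  # nodes emitted in earlier, finished groups
    placed = [False] * len(nodes)
    groups = []

    for i, (name, _) in enumerate(nodes):
        if placed[i]:
            continue
        # The first unplaced node starts a new group. Everything before it
        # is already in a finished group, so its inputs are satisfied.
        group = [name]
        placed[i] = True

        # Greedily pull forward later nodes whose in-graph inputs were all
        # produced by earlier groups (so they cannot depend on this group).
        for j in range(i + 1, len(nodes)):
            if placed[j]:
                continue
            other, other_inputs = nodes[j]
            if all(inp in done or inp not in names for inp in other_inputs):
                group.append(other)
                placed[j] = True

        done.update(group)
        groups.append(group)
    return groups


# Two independent chains (a -> b and c -> d) that join in e.
example = [
    ("a", ["x"]),
    ("b", ["a"]),
    ("c", ["y"]),
    ("d", ["c"]),
    ("e", ["b", "d"]),
]
groups = greedy_group_reorder(example)
print(groups)
```

In this toy graph the two chains are interleaved, so each group holds one node from each chain. A backend that only needs to synchronize between groups would, in this model, synchronize at two group boundaries rather than before each of `b`, `d`, and `e`.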

cmake

ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)

2025-08-07 13:45:41 +02:00

include

cuda : fix supports_op condition for get_rows when number of blocks is too large (#15868)

2025-09-08 13:56:51 +03:00

src

vulkan: sort graph to allow more parallel execution (#15850)

2025-09-09 02:10:07 +08:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml-cpu: drop support for nnpa intrinsics (#15821)

2025-09-06 11:27:28 +08:00