llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-11-02 09:12:03 +00:00)

Files

mahorozte e9e661bd59 CUDA: remove unnecessary warp reduce in FA (ggml/1032)

* kqmax_new_j is already the same in every thread of the warp after the operation at line 199, so this reduce can be omitted

* The same problem exists in vec32

---------

Co-authored-by: ZhaoXiaoYu <zhao.xiaoyu@zte.com.cn>

2024-12-03 20:04:49 +02:00
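The commit's reasoning is that once a warp-wide max reduction has run, every lane already holds the same `kqmax_new_j`, so a second reduce returns the value unchanged. The sketch below simulates this on the host in C++ under two assumptions: the reduction is a butterfly (XOR-shuffle) max over a 32-lane warp, and it follows the common CUDA pattern `x = fmaxf(x, __shfl_xor_sync(0xffffffff, x, offset, 32))` for offsets 16, 8, 4, 2, 1. The names `Warp`, `warp_reduce_max_sim`, `all_lanes_equal` and `check_second_reduce_is_redundant` are hypothetical. The kernel code around line 199 is not reproduced.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical host-side model of one 32-lane warp: one float per lane.
constexpr int kWarpSize = 32;
using Warp = std::array<float, kWarpSize>;

// Simulates a butterfly max-reduction. At each step, every lane reads the
// value held by lane (lane ^ offset), as __shfl_xor_sync would, and keeps
// the max. All reads happen before any writes, mirroring the simultaneous
// shuffle.
Warp warp_reduce_max_sim(Warp x) {
    for (int offset = kWarpSize / 2; offset > 0; offset >>= 1) {
        Warp shuffled{};
        for (int lane = 0; lane < kWarpSize; ++lane) {
            shuffled[lane] = x[lane ^ offset];  // value from partner lane
        }
        for (int lane = 0; lane < kWarpSize; ++lane) {
            x[lane] = std::max(x[lane], shuffled[lane]);
        }
    }
    return x;
}

// True if every lane holds the same value.
bool all_lanes_equal(const Warp& x) {
    return std::all_of(x.begin(), x.end(), [&](float v) { return v == x[0]; });
}

// Checks the property the commit relies on. After one reduce all lanes
// agree, and a second reduce leaves every lane unchanged.
void check_second_reduce_is_redundant(const Warp& x) {
    const Warp once = warp_reduce_max_sim(x);
    assert(all_lanes_equal(once));
    assert(warp_reduce_max_sim(once) == once);
}
```

Usage: fill a `Warp` with arbitrary per-lane values and pass it to `check_second_reduce_is_redundant`. After the five XOR steps, each lane has combined values from all 32 lanes, so every lane holds the warp maximum. Taking the max of identical values again is a no-op, which is why the second warp reduce can be dropped.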

include

ggml : move AMX to the CPU backend (#10570)

2024-11-29 21:54:58 +01:00

src

CUDA: remove unnecessary warp reduce in FA (ggml/1032)

2024-12-03 20:04:49 +02:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : automatic selection of best CPU backend (#10606)

2024-12-01 16:12:41 +01:00