Updated the n_task calculation to use the maximum number of threads possible. This improved prompt-eval performance by around 5% for DOT kernels and by around 10% for MMLA kernels on AWS Graviton3.
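A minimal sketch of what the change amounts to, assuming a hypothetical `get_n_tasks` helper with an illustrative `op_info` struct (ggml's actual routine and its signature may differ): instead of capping the task count with a per-op heuristic, the op fans out across all available threads.

```c
// Illustrative only, not the actual ggml source: shows the idea of
// letting a matrix-multiplication op use the maximum number of threads
// rather than capping n_tasks with a shape-based heuristic.

#include <stdio.h>

// Hypothetical op descriptor; the field is illustrative only.
struct op_info {
    int rows;   // number of output rows the kernel processes
};

// Before: n_tasks could be limited by a heuristic based on op shape.
// After: return the full thread count, so DOT/MMLA kernels on chips
// like AWS Graviton3 keep every core busy.
static int get_n_tasks(const struct op_info *op, int n_threads) {
    (void) op;          // previous shape-based cap removed
    return n_threads;   // fan out across all available threads
}

int main(void) {
    struct op_info mul_mat = { .rows = 4096 };
    printf("n_tasks = %d\n", get_n_tasks(&mul_mat, 8)); // prints: n_tasks = 8
    return 0;
}
```

With more tasks per op, each thread handles a smaller slice of the output, which helps the throughput-oriented DOT and MMLA kernels in particular.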