updated the n_task calculation to use the maximum number of threads possible. This improves prompt eval performance by around 5% for DOT kernels and by around 10% for MMLA kernels on AWS Graviton3.
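A minimal sketch of the idea, assuming a simplified node/op model: the per-op task count for mat-mul nodes is taken directly from the thread-pool size rather than being capped by a smaller heuristic, so every worker thread gets a slice of the output. The names here (`get_n_tasks`, `struct node`, `OP_MUL_MAT`) are hypothetical stand-ins; the real task-count logic in ggml is more involved.

```c
#include <stdio.h>

/* Hypothetical op and tensor types for illustration only;
 * the actual ggml structures carry far more state. */
enum op_type { OP_MUL_MAT, OP_ADD };

struct node {
    enum op_type op;
    int ne0; /* number of rows in the result tensor */
};

/* Sketch of the change described in the commit message: for mat-mul
 * nodes, return the full thread count instead of a smaller cap, so
 * the work is split across the maximum number of threads possible. */
static int get_n_tasks(const struct node *t, int n_threads) {
    switch (t->op) {
        case OP_MUL_MAT:
            /* old (illustrative): return n_threads < t->ne0 ? n_threads : t->ne0; */
            return n_threads; /* new: always use all available threads */
        default:
            return 1; /* ops not parallelized in this sketch */
    }
}

int main(void) {
    struct node mm = { OP_MUL_MAT, 4 };
    /* With 8 worker threads, the mat-mul is now split into 8 tasks
     * even though the old row-based cap would have allowed only 4. */
    printf("n_tasks = %d\n", get_n_tasks(&mm, 8));
    return 0;
}
```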