llama.cpp/ggml-cuda.cu at ebd062bc19b92ff9860a12e4f789b305015fb18b - llama.cpp - Gitea - Peisong Xiao

CS348Project/llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-02 09:12:03 +00:00

Georgi Gerganov ebd062bc19 cuda : use 512 threads for soft_max instead of 32

2023-11-30 18:07:51 +02:00

312 KiB
