* CUDA: use async data loading for FlashAttention

  Co-authored-by: Diego Devesa <slarengh@gmail.com>
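For context, the commit title describes switching the FlashAttention kernels to asynchronous data loading, i.e. overlapping the copy of the next data tile from global to shared memory with computation on the current tile. Below is a minimal sketch of that general pattern using `cuda::memcpy_async` with a double-buffered block-scope `cuda::pipeline`; it is not the actual llama.cpp kernel. The kernel name `async_tile_consume`, the `TILE`/`STAGES` parameters, and the stand-in "compute" step (a simple sum) are all illustrative assumptions. Hardware-accelerated `cp.async` needs compute capability 8.0+; on older GPUs the copies fall back to synchronous behavior.

```cuda
// Sketch only: double-buffered async global->shared loads overlapped with
// per-tile compute, the general technique behind "async data loading".
#include <cooperative_groups.h>
#include <cuda/pipeline>

namespace cg = cooperative_groups;

// Each block streams n_tiles tiles of TILE floats and accumulates a sum
// (a stand-in for the per-tile attention math). Caller must zero `out`.
template <int TILE>
__global__ void async_tile_consume(const float* __restrict__ in,
                                   float* __restrict__ out,
                                   int n_tiles) {
    auto block = cg::this_thread_block();

    constexpr int STAGES = 2;  // double buffering: load stage N+1 while using stage N
    __shared__ float smem[STAGES][TILE];
    __shared__ cuda::pipeline_shared_state<cuda::thread_scope_block, STAGES> state;
    auto pipe = cuda::make_pipeline(block, &state);

    const float* base = in + (size_t) blockIdx.x * n_tiles * TILE;
    float acc = 0.0f;

    int fetched  = 0;
    int consumed = 0;

    // Prime the pipeline: kick off the first STAGES copies before computing.
    for (; fetched < n_tiles && fetched < STAGES; ++fetched) {
        pipe.producer_acquire();
        cuda::memcpy_async(block, smem[fetched % STAGES],
                           base + (size_t) fetched * TILE,
                           sizeof(float) * TILE, pipe);
        pipe.producer_commit();
    }

    for (; consumed < n_tiles; ++consumed) {
        pipe.consumer_wait();  // block until the oldest in-flight copy lands

        // "Compute" on the tile that just arrived in shared memory, while the
        // copy for the next tile (if any) is still in flight.
        for (int i = threadIdx.x; i < TILE; i += blockDim.x) {
            acc += smem[consumed % STAGES][i];
        }

        block.sync();            // all threads done reading before recycling
        pipe.consumer_release(); // free the stage for reuse

        if (fetched < n_tiles) { // refill the freed stage with the next tile
            pipe.producer_acquire();
            cuda::memcpy_async(block, smem[fetched % STAGES],
                               base + (size_t) fetched * TILE,
                               sizeof(float) * TILE, pipe);
            pipe.producer_commit();
            ++fetched;
        }
    }

    // Simplified block-level reduction of the per-thread accumulators.
    atomicAdd(&out[blockIdx.x], acc);
}
```

The point of the double buffer is latency hiding: in an attention kernel the global-memory loads of the next K/V tile can proceed in the copy engine while the SM does the matmul/softmax work on the current tile, instead of the warp stalling on each load. A launch would look like `async_tile_consume<256><<<blocks, 128>>>(in, out, n_tiles);` (dimensions here are illustrative, not taken from the commit).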