Johannes Gäßler
|
0cf6725e9f
|
CUDA: FA support for Deepseek (Ampere or newer) (#13306)
* do loop unrolling via C++ template
|
2025-05-09 13:34:58 +02:00 |
|
Johannes Gäßler
|
5fa07c2f93
|
CUDA: optimize FA for GQA + large batches (#12014)
|
2025-02-22 12:20:17 +01:00 |
|
Johannes Gäßler
|
73e2ed3ce3
|
CUDA: use async data loading for FlashAttention (#11894)
Co-authored-by: Diego Devesa <slarengh@gmail.com>
|
2025-02-17 14:03:24 +01:00 |
|