commit 864a0b67a6
CUDA: use mma PTX instructions for FlashAttention

* __shfl_sync workaround for movmatrix
* add __shfl_sync to HIP

Co-authored-by: Diego Devesa <slarengh@gmail.com>
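For context, here is a minimal sketch of the two techniques the message names. It is not the commit's actual code (that lives in ggml's CUDA FlashAttention kernels and may differ); the helper name `transpose_m8n8_b16` is hypothetical, the snippet is written for nvcc, and the fragment layout is assumed to be the standard `ldmatrix.m8n8` b16 layout, where lane i of a warp holds the two elements of row i/4 at columns 2*(i%4) and 2*(i%4)+1. Under those assumptions, a `movmatrix`-style 8x8 transpose can be emulated with `__shfl_sync`, and `__shfl_sync` itself can be mapped onto HIP's unmasked `__shfl`:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

#if defined(__HIP_PLATFORM_AMD__)
// Sketch of "add __shfl_sync to HIP": HIP only provides the unmasked
// __shfl, and wavefronts execute in lockstep, so the mask can be dropped.
#define __shfl_sync(mask, var, srcLane) __shfl((var), (srcLane))
#endif

// Sketch of the "__shfl_sync workaround for movmatrix": emulate
// movmatrix.sync.aligned.m8n8.trans.b16 with warp shuffles.
// Assumed fragment layout (as for ldmatrix.m8n8 b16): lane i holds the two
// elements of row i/4 at columns 2*(i%4) and 2*(i%4)+1, packed into one
// 32-bit register (low half = even column).
static __device__ unsigned transpose_m8n8_b16(unsigned frag) {
    const int lane = threadIdx.x % 32;
    const int c    = lane / 4;        // source column this lane's output needs
    const int r0   = (lane % 4) * 2;  // first of the two source rows it needs
    // Registers of the lanes holding A[r0][c] and A[r0+1][c]:
    const unsigned v0 = __shfl_sync(0xFFFFFFFFu, frag,  r0      * 4 + c / 2);
    const unsigned v1 = __shfl_sync(0xFFFFFFFFu, frag, (r0 + 1) * 4 + c / 2);
    // Pick the 16-bit half that corresponds to column c and repack:
    const unsigned lo = (c % 2) ? (v0 >> 16) : (v0 & 0xFFFFu);
    const unsigned hi = (c % 2) ? (v1 >> 16) : (v1 & 0xFFFFu);
    return lo | (hi << 16);
}

// Quick check: fill A[r][c] = 8*r + c and print the transposed fragment.
__global__ void test_transpose() {
    const int lane = threadIdx.x;
    const int r = lane / 4, c0 = 2 * (lane % 4);
    const unsigned frag = (unsigned)(8*r + c0) | ((unsigned)(8*r + c0 + 1) << 16);
    const unsigned t = transpose_m8n8_b16(frag);
    // Expect lane i to now hold A[2*(i%4)][i/4] and A[2*(i%4)+1][i/4].
    printf("lane %2d: %2u %2u\n", lane, t & 0xFFFFu, t >> 16);
}

int main() {
    test_transpose<<<1, 32>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

The real `movmatrix` PTX instruction performs this transpose in hardware on sm_75 (Turing) and newer; a shuffle-based fallback along these lines keeps an mma-based FlashAttention path usable where that instruction is unavailable or where, as on HIP, `__shfl_sync` itself has to be supplied.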