Mirror of https://github.com/ggml-org/llama.cpp.git
* Added gqa8 kernel to allow llama-2-70B to run on Metal

* Update ggml-metal.m

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>

* Extend kernel_mul_mat_f16_f32 to handle gqa broadcast

* Added ne03 == ne13 assertion

---------

Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
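For context, the "gqa broadcast" in the commit above refers to grouped-query attention, where the K/V tensor has fewer heads (ne02) than the Q tensor (ne12), so each K/V head must be shared by ne12/ne02 query heads inside the mat-mul kernel. Below is a minimal sketch of just that index mapping, assuming ggml's usual mul_mat broadcast convention (src0 broadcast along dim 2 to match src1, with dim 3 required to match per the new ne03 == ne13 assertion). The helper `src0_head_for` and the demo `main` are illustrative, not code from the commit or the Metal kernel.

```c
// Sketch of the gqa broadcast index mapping, assuming ggml's mul_mat
// convention: src0 (K/V, ne02 heads) is broadcast along dim 2 to match
// src1 (Q, ne12 heads). src0_head_for is a hypothetical helper name.
#include <assert.h>
#include <stdio.h>

static int src0_head_for(int i12, int ne02, int ne12) {
    assert(ne12 % ne02 == 0);   // broadcast ratio must divide evenly
    const int r = ne12 / ne02;  // query heads per kv head (8 for llama-2-70B, hence "gqa8")
    return i12 / r;             // query head i12 reads kv head i12 / r
}

int main(void) {
    // llama-2-70B: 64 query heads share 8 kv heads
    for (int i12 = 0; i12 < 64; i12 += 8) {
        printf("q head %2d -> kv head %d\n", i12, src0_head_for(i12, 8, 64));
    }
    return 0;
}
```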