mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-10-27 08:21:30 +00:00)
CUDA: Add mul_mat_id support to the mmf kernel

* Add support for mul_mat_id for bs < 16
* Review: use warp_size, fix should_use_mmf condition
* Launch one block per expert, stride along n_expert_used
* Templatize mul_mat_id
* Pad shmem to 16 bytes, add helper function mul_mat_f_switch_ids
* Reduce compile times by dividing mmf into f16, bf16 and f32 variants
* Divide mmf by ncols_dst (a sketch of this splitting pattern follows the list)
* Add missing files
* Fix MUSA/HIP builds
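The last few bullets describe a common compile-time splitting pattern: template the kernel on ncols_dst, then explicitly instantiate one case per generated .cu file so nvcc builds the variants in parallel instead of in one huge translation unit. Below is a minimal sketch of that pattern, not the actual llama.cpp code; mmf_kernel and mul_mat_f_case are simplified stand-ins, and only DECL_MMF_CASE is a name that appears in the generated file shown further down.

// Illustrative sketch only (think "mmf.cuh"): a kernel templated on the
// number of destination columns, so each column count gets specialized code.
#include <cuda_fp16.h>
#include <cuda_runtime.h>

template <typename T, int ncols_dst>
__global__ void mmf_kernel(const T * x, float * dst, int nrows) {
    const int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= nrows) {
        return;
    }
#pragma unroll
    for (int col = 0; col < ncols_dst; ++col) {
        // Stand-in arithmetic; the real kernel performs the mat-mul math.
        dst[col * nrows + row] = 2.0f * static_cast<float>(x[col * nrows + row]);
    }
}

template <typename T, int ncols_dst>
void mul_mat_f_case(const T * x, float * dst, int nrows, cudaStream_t stream) {
    const int block = 256;
    mmf_kernel<T, ncols_dst><<<(nrows + block - 1) / block, block, 0, stream>>>(x, dst, nrows);
}

// One macro invocation per generated .cu file: each translation unit
// explicitly instantiates a single ncols_dst case (and could be split
// further by element type, as the f16/bf16/f32 bullet describes), so the
// variants compile in parallel.
#define DECL_MMF_CASE(ncols_dst) \
    template void mul_mat_f_case<half,  ncols_dst>(const half  *, float *, int, cudaStream_t); \
    template void mul_mat_f_case<float, ncols_dst>(const float *, float *, int, cudaStream_t)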
6 lines
125 B
Plaintext
// This file has been autogenerated by generate_cu_files.py, do not edit manually.

#include "../mmf.cuh"

DECL_MMF_CASE(3);
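For context on how such generated cases plug into a caller, here is a hypothetical dispatcher built on the sketch above. The name mul_mat_f_switch_ncols is invented for illustration; the helper the commit actually names, mul_mat_f_switch_ids, switches on the expert ids used by mul_mat_id, and this sketch only shows the general switch-over-template-parameter pattern.

// Hypothetical dispatch sketch, assuming the declarations from the sketch
// above: select the explicit instantiation that matches the runtime
// ncols_dst. Not the real llama.cpp dispatcher.
template <typename T>
void mul_mat_f_switch_ncols(const T * x, float * dst, int nrows,
                            int ncols_dst, cudaStream_t stream) {
    switch (ncols_dst) {
        case 1: mul_mat_f_case<T, 1>(x, dst, nrows, stream); break;
        case 2: mul_mat_f_case<T, 2>(x, dst, nrows, stream); break;
        case 3: mul_mat_f_case<T, 3>(x, dst, nrows, stream); break; // the case this file generates
        default: break; // the real code would fall back to a generic path here
    }
}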