mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-10-27 08:21:30 +00:00)
CUDA: Add mul_mat_id support to the mmf kernel

* Add support for mul_mat_id for bs < 16
* Review: use warp_size, fix should_use_mmf condition
* Launch one block per expert, stride along n_expert_used
* Templatize mul_mat_id
* Pad shmem to 16 bytes, add helper function mul_mat_f_switch_ids
* Reduce compile times by dividing mmf into f16, bf16 and f32 variants
* Divide mmf by ncols_dst (a sketch of this splitting pattern follows the list)
* Add missing files
* Fix MUSA/HIP builds
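The last few bullets describe a common compile-time splitting pattern: template the kernel on ncols_dst, then explicitly instantiate one case per generated .cu file so nvcc builds the variants in parallel instead of in one huge translation unit. Below is a minimal sketch of that pattern, not the actual llama.cpp code; mmf_kernel and mul_mat_f_case are simplified stand-ins, and only DECL_MMF_CASE is a name that appears in the generated file shown further down.

// Illustrative sketch only (think "mmf.cuh"): a kernel templated on the
// number of destination columns, so each column count gets specialized code.
#include <cuda_fp16.h>
#include <cuda_runtime.h>

template <typename T, int ncols_dst>
__global__ void mmf_kernel(const T * x, float * dst, int nrows) {
    const int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= nrows) {
        return;
    }
#pragma unroll
    for (int col = 0; col < ncols_dst; ++col) {
        // Stand-in arithmetic; the real kernel performs the mat-mul math.
        dst[col * nrows + row] = 2.0f * static_cast<float>(x[col * nrows + row]);
    }
}

template <typename T, int ncols_dst>
void mul_mat_f_case(const T * x, float * dst, int nrows, cudaStream_t stream) {
    const int block = 256;
    mmf_kernel<T, ncols_dst><<<(nrows + block - 1) / block, block, 0, stream>>>(x, dst, nrows);
}

// One macro invocation per generated .cu file: each translation unit
// explicitly instantiates a single ncols_dst case (and could be split
// further by element type, as the f16/bf16/f32 bullet describes), so the
// variants compile in parallel.
#define DECL_MMF_CASE(ncols_dst) \
    template void mul_mat_f_case<half,  ncols_dst>(const half  *, float *, int, cudaStream_t); \
    template void mul_mat_f_case<float, ncols_dst>(const float *, float *, int, cudaStream_t)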
6 lines
125 B
Plaintext
// This file has been autogenerated by generate_cu_files.py, do not edit manually.

#include "../mmf.cuh"

DECL_MMF_CASE(3);
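For context on how such generated cases plug into a caller, here is a hypothetical dispatcher built on the sketch above. The name mul_mat_f_switch_ncols is invented for illustration; the helper the commit actually names, mul_mat_f_switch_ids, switches on the expert ids used by mul_mat_id, and this sketch only shows the general switch-over-template-parameter pattern.

// Hypothetical dispatch sketch, assuming the declarations from the sketch
// above: select the explicit instantiation that matches the runtime
// ncols_dst. Not the real llama.cpp dispatcher.
template <typename T>
void mul_mat_f_switch_ncols(const T * x, float * dst, int nrows,
                            int ncols_dst, cudaStream_t stream) {
    switch (ncols_dst) {
        case 1: mul_mat_f_case<T, 1>(x, dst, nrows, stream); break;
        case 2: mul_mat_f_case<T, 2>(x, dst, nrows, stream); break;
        case 3: mul_mat_f_case<T, 3>(x, dst, nrows, stream); break; // the case this file generates
        default: break; // the real code would fall back to a generic path here
    }
}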