* CUDA: kernel for larger batch sizes for MoE
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* fixup
* tests
* Move mmq_ids_helper to mmid
* cleanup
* Remove redundant checks
6 lines · 258 B · Plaintext
#pragma once

// Helper for MoE (mul_mat_id) kernels: given the per-token expert ids, this
// launcher fills the src1/dst row index mappings and the per-expert bounds
// consumed by the batched matrix multiplication kernels.
void ggml_cuda_launch_mm_ids_helper(
        const int32_t * ids, int32_t * ids_src1, int32_t * ids_dst, int32_t * expert_bounds,
        int n_experts, int n_tokens, int n_expert_used, int nchannels_y, int si1, int sis1, cudaStream_t stream);
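
For context, a minimal usage sketch of the launcher declared above. Everything beyond the declaration itself — the buffer sizes, the expert/token counts, the stride values si1/sis1, and the example_launch wrapper — is a hypothetical assumption made for illustration, not code from the repository.

// Hypothetical usage sketch; all concrete values are assumptions.
#include <cuda_runtime.h>
#include <cstdint>

// Declaration from the header above.
void ggml_cuda_launch_mm_ids_helper(
        const int32_t * ids, int32_t * ids_src1, int32_t * ids_dst, int32_t * expert_bounds,
        int n_experts, int n_tokens, int n_expert_used, int nchannels_y, int si1, int sis1, cudaStream_t stream);

void example_launch(const int32_t * ids_dev, cudaStream_t stream) {
    const int n_experts     = 8;   // assumed: experts in the MoE layer
    const int n_tokens      = 512; // assumed: tokens in the batch
    const int n_expert_used = 2;   // assumed: experts selected per token
    const int nchannels_y   = 1;   // assumed: channels of the src1 tensor

    int32_t * ids_src1      = nullptr;
    int32_t * ids_dst       = nullptr;
    int32_t * expert_bounds = nullptr;

    // Assumed layout: one mapping entry per (token, used-expert) pair, and
    // n_experts + 1 prefix offsets so expert e covers [bounds[e], bounds[e + 1]).
    cudaMalloc(&ids_src1,      n_tokens * n_expert_used * sizeof(int32_t));
    cudaMalloc(&ids_dst,       n_tokens * n_expert_used * sizeof(int32_t));
    cudaMalloc(&expert_bounds, (n_experts + 1) * sizeof(int32_t));

    // si1/sis1 are strides of the ids tensor; the values passed here are assumptions.
    ggml_cuda_launch_mm_ids_helper(
        ids_dev, ids_src1, ids_dst, expert_bounds,
        n_experts, n_tokens, n_expert_used, nchannels_y,
        /*si1=*/n_expert_used, /*sis1=*/0, stream);

    cudaStreamSynchronize(stream);
    cudaFree(ids_src1);
    cudaFree(ids_dst);
    cudaFree(expert_bounds);
}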