Aman Gupta
|
48e2fa9fb7
|
CUDA: add fp kernel for larger batch size MoE (#16512)
* CUDA: kernel for larger batch sizes for MoE
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP
* fixup
* tests
* Move mmq_ids_helper to mmid
* cleanup
* Remove redundant checks
|
2025-10-14 13:15:15 +02:00
|