CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID (#13014)

* CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID

* fix logic for RoPE support, CUDA graphs
Johannes Gäßler
2025-04-22 21:27:40 +02:00
committed by GitHub
parent dc39a5e7a8
commit 658987cfc9
9 changed files with 548 additions and 426 deletions

@@ -1,3 +1,5 @@
#pragma once
#include "common.cuh"
#include <cstdint>