Aman Gupta
c0bfc57af4
CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 ( #16277 )
* CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32
This commit adds mul_mat_id support for ncols_dst >= 16. It does this by
packing ncols_dst tiles into blockDim.y.
My tests on an RTX 3090 show that this is faster than the cuBLAS fallback
up to bs=64 for f16 and up to bs=32 for f32.
* Review: refactor if statement
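
The batch-size cutoffs in the commit message can be read as a small host-side dispatch rule. The following is only a sketch of that rule, not the code from the commit: every name in it (`src_type`, `max_batch_for_mmf`, `use_mmf_for_mul_mat_id`, `tiles_in_block_y`) is hypothetical, treating "bs" as the same quantity as ncols_dst is an assumption, and so is the tile width of 16 columns, which is inferred from the "ncols_dst >= 16" wording. The cutoffs themselves (64 for f16, 32 for f32) are the ones the author measured on an RTX 3090 and may differ on other GPUs.

```cpp
#include <cstdint>

// Hypothetical type tag; the real code uses ggml's own type enum.
enum class src_type { f16, f32 };

// Assumed tile width in dst columns (inferred from "ncols_dst >= 16").
constexpr int64_t tile_cols = 16;

// Largest batch size for which the commit message reports mmf beating
// the cuBLAS fallback on an RTX 3090.
constexpr int64_t max_batch_for_mmf(src_type t) {
    return t == src_type::f16 ? 64 : 32;
}

// Hypothetical dispatch check: take the mmf path only within the measured range.
constexpr bool use_mmf_for_mul_mat_id(src_type t, int64_t ncols_dst) {
    return ncols_dst > 0 && ncols_dst <= max_batch_for_mmf(t);
}

// Number of column tiles that would be packed along blockDim.y,
// rounding up so a partial tile still gets a slot.
constexpr int64_t tiles_in_block_y(int64_t ncols_dst) {
    return (ncols_dst + tile_cols - 1) / tile_cols;
}
```

Under these assumptions, an f16 batch of 64 stays on the mmf path with four tiles along blockDim.y, while an f16 batch of 65 or an f32 batch of 33 would fall back to cuBLAS.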
2025-09-27 18:49:32 +02:00