llama.cpp/ggml-cuda/quantize.cuh at beea6e1b16e783a0886e78dec01002a8c00db24d

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-30 08:42:00 +00:00

slaren ae1f211ce2 cuda : refactor into multiple files (#6269)

2024-03-25 13:50:23 +01:00

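#include "common.cuh"
// Threads per CUDA block used when launching the Q8_1 quantization kernel.
#define CUDA_QUANTIZE_BLOCK_SIZE 256

// Quantizes ky rows of kx floats from x into Q8_1 blocks in vy (each row padded to kx_padded values); work is enqueued on stream.
void quantize_row_q8_1_cuda(const float * x, void * vy, const int kx, const int ky, const int kx_padded, cudaStream_t stream);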
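
The header only declares the entry point, so the following is a minimal host-side sketch of what a Q8_1 row quantizer of this shape might compute. It is not the code from this repository. It assumes 32 values per block, a per-block scale d = max|x| / 127, a stored sum of the unquantized inputs, and zero padding for columns between kx and kx_padded. The names BlockQ8_1Ref, kBlockValues and quantize_rows_q8_1_ref are hypothetical, and the sketch stores d and s as float where the CUDA type would use half precision.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed number of values per Q8_1 block (hypothetical name).
constexpr int kBlockValues = 32;

// Hypothetical host-side stand-in for a Q8_1 block.
struct BlockQ8_1Ref {
    float  d;                 // scale: max |x| in the block divided by 127
    float  s;                 // assumed: sum of the block's unquantized inputs
    int8_t qs[kBlockValues];  // quantized values in [-127, 127]
};

// Quantize ky rows of kx floats. Each row is treated as zero-padded to
// kx_padded values; kx_padded is assumed to be a multiple of kBlockValues.
std::vector<BlockQ8_1Ref> quantize_rows_q8_1_ref(const std::vector<float> & x,
                                                 int kx, int ky, int kx_padded) {
    const int blocks_per_row = kx_padded / kBlockValues;
    std::vector<BlockQ8_1Ref> y(static_cast<std::size_t>(ky) * blocks_per_row);

    for (int iy = 0; iy < ky; ++iy) {
        for (int ib = 0; ib < blocks_per_row; ++ib) {
            float vals[kBlockValues];
            float amax = 0.0f;
            float sum  = 0.0f;

            // Gather the block, reading zeros past the end of the real row.
            for (int i = 0; i < kBlockValues; ++i) {
                const int ix = ib * kBlockValues + i;
                vals[i] = ix < kx ? x[static_cast<std::size_t>(iy) * kx + ix] : 0.0f;
                amax = std::max(amax, std::fabs(vals[i]));
                sum += vals[i];
            }

            BlockQ8_1Ref & blk = y[static_cast<std::size_t>(iy) * blocks_per_row + ib];
            blk.d = amax / 127.0f;
            blk.s = sum;

            // An all-zero block has d == 0, so skip the division in that case.
            for (int i = 0; i < kBlockValues; ++i) {
                blk.qs[i] = amax == 0.0f
                    ? 0
                    : static_cast<int8_t>(std::round(vals[i] / blk.d));
            }
        }
    }
    return y;
}
```

A caller would size the output as ky * (kx_padded / 32) blocks. A GPU version would typically assign one thread per input value and reduce the maximum and the sum within each block of 32 threads, which is where a launch constant such as CUDA_QUANTIZE_BLOCK_SIZE would come in.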