llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-27 08:21:30 +00:00

Files

Johannes Gäßler 07a19e27a2 CUDA: fix quantized KV cache + multiple sequences (#14822)

* CUDA: fix quantized KV cache + multiple sequences

* Update ggml/src/ggml-cuda/fattn-common.cuh

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2025-07-23 14:08:09 +03:00

cmake

ggml-cpu : rework weak alias on apple targets (#14146)

2025-06-16 13:54:15 +08:00

include

ggml: Add initial WebGPU backend (#14521)

2025-07-16 18:18:51 +03:00

src

CUDA: fix quantized KV cache + multiple sequences (#14822)

2025-07-23 14:08:09 +03:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml: Add initial WebGPU backend (#14521)

2025-07-16 18:18:51 +03:00