llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-02 09:12:03 +00:00

Files

Francis Couture-Harpin 96b3d411e0 ggml-quants : allow using vdotq_s32 in TQ2_0 vec_dot

Not yet tested on hardware which supports it,
might not work or might not even compile. But also it might.
It should make the performance better on recent ARM CPUs.

* ggml-quants : remove comment about possible format change of TQ2_0

Making it slightly more convenient for AVX512
but less convenient for everything else is not worth the trouble.

2024-08-07 15:08:41 -04:00

cmake

llama : reorganize source code + improve CMake (#8006)

2024-06-26 18:33:02 +03:00

include

ggml : remove q1_3 and q2_2

2024-08-02 20:16:26 -04:00

src

ggml-quants : allow using vdotq_s32 in TQ2_0 vec_dot

2024-08-07 15:08:41 -04:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

feat: Support Moore Threads GPU (#8383)

2024-07-28 01:41:25 +02:00