llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-03 09:22:01 +00:00

Files

Max Krasnyansky 517b7170e1 cpu: introduce chunking for repack matmuls and enable matmul-id chunking on ARM64 (#16833)

Very similar implementation to the flash-attention chunking, with similar benefits.

2025-10-30 09:06:13 -07:00

.gitignore

2024-07-13 18:12:39 +02:00

CMakeLists.txt

2025-10-22 13:47:09 -07:00