mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-29 08:41:22 +00:00

Georgi Gerganov a10b36c91a llama : refactor kv cache guard (#12695)

* llama : refactor kv cache guard

ggml-ci

* cont : fix comment [no ci]

* llama : fix kv_cache restore logic

ggml-ci

* context : simplify kv cache updates

ggml-ci

* cont : better name [no ci]

* llama : fix llama_decode return code when could not find KV slot

ggml-ci

* context : change log err -> warn [no ci]

* kv-cache : add comment + warning

2025-04-02 14:32:59 +03:00

CMakeLists.txt

ggml : move AMX to the CPU backend (#10570)

2024-11-29 21:54:58 +01:00

parallel.cpp

llama : refactor kv cache guard (#12695)

2025-04-02 14:32:59 +03:00

README.md

Fix some documentation typos/grammar mistakes (#4032)

2023-11-11 23:04:58 -07:00

README.md

llama.cpp/examples/parallel

Simplified simulation of serving incoming requests in parallel
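
To make the idea concrete, here is a minimal, self-contained sketch of what "serving incoming requests in parallel" can look like as a simulation. It is not taken from `parallel.cpp` and does not use the llama.cpp API: the `Request` and `Slot` types and the `simulate` function are hypothetical names introduced only for illustration. The sketch assumes a fixed number of client slots, that each batched step produces one token for every active slot, and that a finished slot picks up the next pending request.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical types for illustration; not part of llama.cpp.
struct Request {
    int id;
    int n_tokens; // number of tokens this request needs to generate
};

struct Slot {
    int request_id  = -1; // -1 means the slot is idle
    int n_remaining = 0;  // tokens still to generate for the current request
};

// Simulates serving `pending` with `n_slots` concurrent slots and returns
// the number of batched steps needed to finish every request.
int simulate(std::deque<Request> pending, int n_slots) {
    assert(n_slots >= 1); // assumption: at least one slot is available

    std::vector<Slot> slots(n_slots);
    int n_steps = 0;

    while (true) {
        // Hand pending requests to idle slots, in arrival order.
        for (auto & slot : slots) {
            if (slot.request_id == -1 && !pending.empty()) {
                slot.request_id  = pending.front().id;
                slot.n_remaining = pending.front().n_tokens;
                pending.pop_front();
            }
        }

        // One batched step: every active slot produces one token.
        bool any_active = false;
        for (auto & slot : slots) {
            if (slot.request_id == -1) {
                continue;
            }
            any_active = true;
            if (--slot.n_remaining <= 0) {
                slot.request_id = -1; // request finished, free the slot
            }
        }

        if (!any_active) {
            break; // nothing pending and nothing running
        }
        n_steps++;
    }

    return n_steps;
}
```

With four requests needing 3, 5, 2, and 4 tokens, `simulate` returns 14 for one slot and 9 for two, which shows how sharing each step across slots shortens the total run.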