wip: llama : separate recurrent states from the KV cache
This will be necessary to support Jamba (and other models that mix recurrent layers with attention). It doesn't compile yet, and slot-finding isn't yet handled correctly for recurrent states.
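For context, here is a minimal sketch of the distinction this commit works toward: the attention KV cache grows by one cell per decoded token, while a recurrent layer's state is fixed-size per sequence and overwritten in place, so the per-token slot search doesn't map onto it cleanly. All type and field names below (kv_cell, recurrent_cache, get_or_create, the size parameters) are hypothetical illustrations under that assumption, not llama.cpp's actual API.

```cpp
// Hypothetical sketch (not llama.cpp's actual code): why recurrent states
// don't fit the per-token KV-cell model, and what a separate store might look like.
#include <cstdint>
#include <unordered_map>
#include <vector>

// Attention KV cache: grows by one cell per decoded token.
struct kv_cell {
    std::vector<float> k; // key   for this token
    std::vector<float> v; // value for this token
};

struct kv_cache {
    std::vector<kv_cell> cells; // one cell per past token, located via slot search
};

// Recurrent state: fixed size per sequence, updated in place each step
// (e.g. a Mamba-style layer's conv state + SSM state; sizes are illustrative).
struct recurrent_state {
    std::vector<float> conv_state; // d_conv  * d_inner floats
    std::vector<float> ssm_state;  // d_state * d_inner floats
};

// Separate store keyed by sequence id: no slot search over token cells,
// just one state per sequence.
struct recurrent_cache {
    std::unordered_map<int32_t, recurrent_state> seq_states;

    recurrent_state & get_or_create(int32_t seq_id, size_t conv_sz, size_t ssm_sz) {
        auto it = seq_states.find(seq_id);
        if (it == seq_states.end()) {
            it = seq_states.emplace(seq_id,
                recurrent_state{std::vector<float>(conv_sz, 0.0f),
                                std::vector<float>(ssm_sz,  0.0f)}).first;
        }
        return it->second;
    }
};

int main() {
    recurrent_cache rs;
    // Decoding a token for sequence 0 overwrites its state in place;
    // an attention KV cache would instead append a new cell.
    recurrent_state & st = rs.get_or_create(0, /*conv_sz=*/4*64, /*ssm_sz=*/16*64);
    st.ssm_state[0] = 1.0f; // placeholder update
    return 0;
}
```

This separation is also why the commit notes that slot-finding isn't correct yet: a hybrid model like Jamba needs both stores side by side, with only the attention layers going through the KV-cell slot search.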