llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-01 09:01:57 +00:00

Author	SHA1	Message	Date
Georgi Gerganov	59fee24c72	recurrent : rework graph inputs + add TODOs ggml-ci	2025-06-18 09:29:51 +03:00
Gabe Goodhart	5046d412ef	fix: Fix initialization of child states. Since initially writing this PR, the logic in the child state types changed such that using the "init full" signature and keeping the ubatches on the parent struct no longer worked. Branch: HybridRecurrentCache Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>	2025-06-17 14:54:19 -06:00
Gabe Goodhart	4ec4e6a801	refactor: Use llama_memory_state_ptr for child states in hybrid memory state Branch: HybridRecurrentCache Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>	2025-06-17 14:54:19 -06:00
Gabe Goodhart	1510016ea4	fix: Remove logits_all after rebase Branch: HybridRecurrentCache Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>	2025-06-17 14:54:19 -06:00
Gabe Goodhart	d8c929ff5d	feat: Allow custom layer filters for hybrid recurrent. This should help support architectures like Falcon H1 where there is overlap between layers that need attention and recurrent caches. https://github.com/ggml-org/llama.cpp/pull/13979#discussion_r2140748922 Branch: HybridRecurrentCache Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>	2025-06-17 14:54:19 -06:00
Gabe Goodhart	9c1a604af8	fix: Update clear signature for data argument after rebase Branch: HybridRecurrentCache Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>	2025-06-17 14:54:18 -06:00
Gabe Goodhart	911e694476	fix: Fix status for init_update sig for recurrent cache state Branch: GraniteFour Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>	2025-06-17 14:54:18 -06:00
Gabe Goodhart	d3699366e6	fix: Update recurrent cache for changes to remove intermediate kv_cache interface Branch: HybridRecurrentCache Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>	2025-06-17 14:54:18 -06:00
Gabe Goodhart	cf03d4ae5c	fix: Fix shift logic to defer to unified cache Branch: HybridRecurrentCache Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>	2025-06-17 14:54:18 -06:00
Gabe Goodhart	6c6ec0003a	fix: Fix wrong bool condition for split equal in hybrid cache Branch: HybridRecurrentCache Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>	2025-06-17 14:54:18 -06:00
Gabe Goodhart	c71eaa37a0	feat: First pass at llama_kv_cache_hybrid_recurrent. This follows the pattern in iswa where the two child caches are held explicitly to support the case where a model requires a single attention cache and a single recurrent cache where each layer uses exactly one of the caches. This is a rewrite of the more generic approach in the original hybrid cache PR: https://github.com/ggml-org/llama.cpp/pull/13276 Branch: HybridRecurrentCache Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>	2025-06-17 14:54:18 -06:00

11 Commits
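
The design described in commits c71eaa37a0 and d8c929ff5d (two child caches held explicitly, with optional per-child layer filters) can be pictured roughly as below. This is a minimal sketch, not the code from the branch: `layer_filter`, `child_cache`, and `hybrid_cache` are hypothetical names, and the children only record which layer indices they were assigned. The default split, where each layer goes to exactly one child, is inferred from the commit message; the overlap case corresponds to the Falcon H1 motivation.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the llama.cpp types.
using layer_filter = std::function<bool(int32_t il)>;

// A child cache that only records which layer indices it was created for.
struct child_cache {
    std::string          name;
    std::vector<int32_t> layers;

    child_cache(std::string name_, int32_t n_layer, const layer_filter & filter)
        : name(std::move(name_)) {
        assert(filter && "a layer filter is required");
        for (int32_t il = 0; il < n_layer; ++il) {
            if (filter(il)) {
                layers.push_back(il);
            }
        }
    }
};

// Hybrid cache that holds one attention child and one recurrent child.
struct hybrid_cache {
    std::unique_ptr<child_cache> attn;
    std::unique_ptr<child_cache> recr;

    hybrid_cache(int32_t              n_layer,
                 const layer_filter & is_recurrent,
                 layer_filter         filter_attn = nullptr,
                 layer_filter         filter_recr = nullptr) {
        assert(is_recurrent && "is_recurrent is required");
        // Assumed default: every layer belongs to exactly one child.
        if (!filter_attn) {
            filter_attn = [is_recurrent](int32_t il) { return !is_recurrent(il); };
        }
        if (!filter_recr) {
            filter_recr = is_recurrent;
        }
        // Custom filters may overlap, so a layer can appear in both children.
        attn = std::make_unique<child_cache>("attention", n_layer, filter_attn);
        recr = std::make_unique<child_cache>("recurrent", n_layer, filter_recr);
    }
};
```

With only `is_recurrent` supplied, the two children partition the layers. Passing both filters explicitly lets a model place the same layer in the attention cache and the recurrent cache.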