| Georgi Gerganov | e298d2fbd0 | kv-cache : add SWA support (#13194) * kv-cache : prepare for SWA
ggml-ci
* kv-cache : initial iSWA implementation
ggml-ci
* kv-cache : rework error recovery logic
ggml-ci
* models : fix Phi-3 SWA parameters
ggml-ci
* model : adjust Granite to rope factor changes
ggml-ci
* server : check if context can do shifts
ggml-ci
* iswa : for now, always enable shifts (experiment)
ggml-ci
* kv-cache : simplify SWA logic
ggml-ci
* kv-cache : apply defrag when we fail to find slots for the batch
ggml-ci
* llama : update docs about llama_decode
ggml-ci
* kv-cache : update warning logs when no space for the batch is available
ggml-ci
* llama : add llama_kv_self_seq_pos_min()
* kv-cache : keep track of partial SWA computes and print warnings
* server : disallow use cases involving partial SWA context
ggml-ci
* llama : add param to control SWA cache size
ggml-ci
* minor : clean-up
ggml-ci | 2025-05-20 08:05:46 +03:00 |  | 
			
				
| Diego Devesa | 6c8b91500e | llama-bench : fix -ot with dl backends (#13563) | 2025-05-15 15:46:55 +02:00 |  |
			
				
| Georgi Gerganov | b2838049cc | bench : handle decode errors (#13548) ggml-ci | 2025-05-15 05:57:02 +03:00 |  |
			
				
| Diego Devesa | cf0a43bb64 | llama-bench : add defrag-thold, check for invalid ranges (#13487) | 2025-05-13 00:31:37 +02:00 |  |
			
				
| Diego Devesa | 22cdab343b | llama-bench : accept ranges for integer parameters (#13410) | 2025-05-12 13:08:22 +02:00 |  |
			
				
| David Huang | 7f323a589f | Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B (#13386) | 2025-05-11 14:18:39 +02:00 |  |
			
				
| Diego Devesa | 1d36b3670b | llama : move end-user examples to tools directory (#13249) * llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co> | 2025-05-02 20:27:13 +02:00 |  |