Georgi Gerganov | e298d2fbd0 | kv-cache : add SWA support (#13194)
* kv-cache : prepare for SWA
ggml-ci
* kv-cache : initial iSWA implementation
ggml-ci
* kv-cache : rework error recovery logic
ggml-ci
* models : fix Phi-3 SWA parameters
ggml-ci
* model : adjust Granite to rope factor changes
ggml-ci
* server : check if context can do shifts
ggml-ci
* iswa : for now, always enable shifts (experiment)
ggml-ci
* kv-cache : simplify SWA logic
ggml-ci
* kv-cache : apply defrag when we fail to find slots for the batch
ggml-ci
* llama : update docs about llama_decode
ggml-ci
* kv-cache : update warning logs when no space for the batch is available
ggml-ci
* llama : add llama_kv_self_seq_pos_min()
* kv-cache : keep track of partial SWA computes and print warnings
* server : disallow use cases involving partial SWA context
ggml-ci
* llama : add param to control SWA cache size
ggml-ci
* minor : clean-up
ggml-ci
| 2025-05-20 08:05:46 +03:00 |

Diego Devesa | 6c8b91500e | llama-bench : fix -ot with dl backends (#13563) | 2025-05-15 15:46:55 +02:00 |

Georgi Gerganov | b2838049cc | bench : handle decode errors (#13548)
ggml-ci
| 2025-05-15 05:57:02 +03:00 |

Diego Devesa | cf0a43bb64 | llama-bench : add defrag-thold, check for invalid ranges (#13487) | 2025-05-13 00:31:37 +02:00 |

Diego Devesa | 22cdab343b | llama-bench : accept ranges for integer parameters (#13410) | 2025-05-12 13:08:22 +02:00 |

David Huang | 7f323a589f | Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B (#13386) | 2025-05-11 14:18:39 +02:00 |

Diego Devesa | 1d36b3670b | llama : move end-user examples to tools directory (#13249)
* llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
| 2025-05-02 20:27:13 +02:00 |