mirror of
				https://github.com/ggml-org/llama.cpp.git
				synced 2025-10-31 08:51:55 +00:00 
			
		
		
		
	 80ea089d77
			
		
	
	80ea089d77
	
	
	
		
			
			* create append_pooling operation; allow to specify attention_type; add last token pooling; update examples * find result_norm/result_embd tensors properly; update output allocation logic * only use embd output for pooling_type NONE * get rid of old causal_attn accessor * take out attention_type; add in llama_set_embeddings * bypass logits when doing non-NONE pooling
		
			
				
	
	
	
		
			10 KiB
		
	
	
	
	
	
	
	
			
		
		
	
	
			10 KiB