Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-10-30 08:42:00 +00:00)
# perplexity

TODO

## Llama 2 70B Scorechart

| Quantization | Model size (GiB) | Perplexity | Delta to fp16 |
|--------------|------------------|------------|---------------|
| Q4_0         | 36.20            | 3.5550     | 3.61%         |
| Q4_1         | 40.20            | 3.5125     | 2.37%         |
| Q5_0         | 44.20            | 3.4744     | 1.26%         |
| Q2_K         | 27.27            | 3.7339     | 8.82%         |
| Q3_K_S       | 27.86            | 3.7019     | 7.89%         |
| Q3_K_M       | 30.83            | 3.5932     | 4.72%         |
| Q3_K_L       | 33.67            | 3.5617     | 3.80%         |
| Q4_K_S       | 36.39            | 3.4852     | 1.57%         |
| Q4_K_M       | 38.54            | 3.4725     | 1.20%         |
| Q5_K_S       | 44.20            | 3.4483     | 0.50%         |
| Q5_K_M       | 45.41            | 3.4451     | 0.40%         |
| Q6_K         | 52.70            | 3.4367     | 0.16%         |
| fp16         | 128.5            | 3.4313     | -             |
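For reference, perplexity is the exponential of the mean negative log-likelihood over the evaluated tokens, and the "Delta to fp16" column is the relative increase over the fp16 baseline perplexity of 3.4313. A minimal sketch of both calculations (the function names here are illustrative, not part of llama.cpp):

```python
import math

def perplexity(token_logprobs):
    """PPL = exp(-mean(log p)) over the evaluated tokens."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

def delta_to_fp16(ppl_q, ppl_fp16=3.4313):
    """Relative perplexity increase vs. the fp16 baseline, in percent."""
    return 100.0 * (ppl_q - ppl_fp16) / ppl_fp16

# Reproduce the Q4_0 row: (3.5550 - 3.4313) / 3.4313 * 100 ≈ 3.61
print(f"{delta_to_fp16(3.5550):.2f}%")
```

Lower perplexity means the model is less "surprised" by the evaluation text, so a smaller delta indicates the quantization preserves more of the fp16 model's quality.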
