# perplexity
TODO
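As a brief recap (this is the standard definition; the tool's exact evaluation windowing is not described here), the perplexity of a model over a tokenized text $x_1, \dots, x_N$ is:

```math
\mathrm{PPL}(x_{1..N}) = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log p\left(x_i \mid x_{<i}\right) \right)
```

i.e. the exponentiated average negative log-likelihood per token, so lower is better.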
## Llama 2 70B Scorechart

| Quantization | Model size (GiB) | Perplexity | Delta to fp16 |
|--------------|------------------|------------|---------------|
| Q4_0         | 36.20            | 3.5550     | 3.61%         |
| Q4_1         | 40.20            | 3.5125     | 2.37%         |
| Q5_0         | 44.20            | 3.4744     | 1.26%         |
| Q2_K         | 27.27            | 3.7339     | 8.82%         |
| Q3_K_S       | 27.86            | 3.7019     | 7.89%         |
| Q3_K_M       | 30.83            | 3.5932     | 4.72%         |
| Q3_K_L       | 33.67            | 3.5617     | 3.80%         |
| Q4_K_S       | 36.39            | 3.4852     | 1.57%         |
| Q4_K_M       | 38.54            | 3.4725     | 1.20%         |
| Q5_K_S       | 44.20            | 3.4483     | 0.50%         |
| Q5_K_M       | 45.41            | 3.4451     | 0.40%         |
| Q6_K         | 52.70            | 3.4367     | 0.16%         |
| fp16         | 128.5            | 3.4313     | -             |
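
The "Delta to fp16" column is, presumably, the relative increase in perplexity over the fp16 baseline, i.e. `(PPL_quant - PPL_fp16) / PPL_fp16`. A minimal C++ sketch of that arithmetic, with values hard-coded from the table above (the row selection is illustrative only):

```cpp
// Sketch: recompute "Delta to fp16" as the relative perplexity increase
// over the fp16 baseline. Values are copied from the table above; the
// assumed formula is delta = (ppl_quant - ppl_fp16) / ppl_fp16 * 100.
#include <cstdio>

int main() {
    const double ppl_fp16 = 3.4313; // fp16 baseline perplexity
    const struct { const char *name; double ppl; } rows[] = {
        {"Q4_0",   3.5550},
        {"Q5_K_M", 3.4451},
        {"Q6_K",   3.4367},
    };
    for (const auto & r : rows) {
        const double delta = (r.ppl - ppl_fp16) / ppl_fp16 * 100.0;
        std::printf("%-8s delta to fp16: %.2f%%\n", r.name, delta);
    }
    return 0;
}
```

Compiled and run, this reproduces the tabulated deltas (e.g. Q4_0 -> 3.61%).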