Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2025-10-30 08:42:00 +00:00
	perplexity: add BF16 vs. FP16 results (#7150)
@@ -7,6 +7,8 @@ Also note that finetunes typically result in a higher perplexity value even thou

Within llama.cpp the perplexity of base models is used primarily to judge the quality loss from e.g. quantized models vs. FP16.
The convention among contributors is to use the Wikitext-2 test set for testing unless noted otherwise (can be obtained with `scripts/get-wikitext-2.sh`).
When numbers are listed, all command line arguments and compilation options are left at their defaults unless noted otherwise.
llama.cpp numbers are **not** directly comparable to those of other projects because the exact values depend strongly on the implementation details.

By default only the mean perplexity value and the corresponding uncertainty are calculated.
The uncertainty is determined empirically by assuming a Gaussian distribution of the "correct" logits per token and then applying error propagation.
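The mean-and-uncertainty calculation described above can be sketched as follows. This is a minimal illustration with made-up per-token negative log-likelihood values, not the actual llama.cpp implementation:

```python
import math

def ppl_with_uncertainty(nlls):
    """Mean perplexity and its uncertainty from per-token negative log-likelihoods.

    PPL = exp(mean NLL). The uncertainty of the mean NLL (standard error,
    assuming an approximately Gaussian distribution) is propagated through
    exp() to first order: delta_PPL = PPL * delta_mean_NLL.
    """
    n = len(nlls)
    mean = sum(nlls) / n
    var = sum((x - mean) ** 2 for x in nlls) / (n - 1)  # sample variance
    se_mean = math.sqrt(var / n)                        # std. error of the mean
    ppl = math.exp(mean)
    return ppl, ppl * se_mean                           # error propagation

# Hypothetical per-token NLL values, for illustration only:
nlls = [1.9, 2.1, 1.7, 2.3, 1.8, 2.0]
ppl, err = ppl_with_uncertainty(nlls)
print(f"PPL = {ppl:.4f} ± {err:.4f}")
```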
@@ -32,7 +34,13 @@ In addition to the KL divergence the following statistics are calculated with `-

## LLaMA 3 8b Scoreboard

| Revision | f364eb6f           |
|:---------|:-------------------|
| Backend  | CUDA               |
| CPU      | AMD Epyc 7742      |
| GPU      | 1x NVIDIA RTX 4090 |

Results were generated using the CUDA backend and are sorted by Kullback-Leibler divergence relative to FP16.
The "WT" importance matrices were created using varying numbers of Wikitext tokens and can be found [here](https://huggingface.co/JohannesGaessler/llama.cpp_importance_matrices/blob/main/imatrix-llama_3-8b-f16-2.7m_tokens.dat).

| Quantization | imatrix | Model size [GiB] | PPL                    | ΔPPL                   | KLD                   | Mean Δp           | RMS Δp           |
@@ -89,6 +97,12 @@ K-quants score better on mean Δp than the legacy quants than e.g. KL divergence

## LLaMA 2 vs. LLaMA 3 Quantization comparison

| Revision | f364eb6f           |
|:---------|:-------------------|
| Backend  | CUDA               |
| CPU      | AMD Epyc 7742      |
| GPU      | 1x NVIDIA RTX 4090 |

| Metric          |          L2 7b q2_K |          L3 8b q2_K |        L2 7b q4_K_M |        L3 8b q4_K_M |          L2 7b q6_K |          L3 8b q6_K |          L2 7b q8_0 |          L3 8b q8_0 |
|-----------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|---------------------|
| Mean PPL        | 5.794552 ± 0.032298 | 9.751568 ± 0.063312 | 5.877078 ± 0.032781 | 6.407115 ± 0.039119 | 5.808494 ± 0.032425 | 6.253382 ± 0.038078 | 5.798542 ± 0.032366 | 6.234284 ± 0.037878 |
@@ -107,6 +121,50 @@ K-quants score better on mean Δp than the legacy quants than e.g. KL divergence
| RMS Δp          |     9.762 ± 0.053 % |    21.421 ± 0.079 % |     3.252 ± 0.024 % |     5.519 ± 0.050 % |     1.339 ± 0.010 % |     2.295 ± 0.019 % |     0.618 ± 0.011 % |     1.198 ± 0.007 % |
| Same top p      |    85.584 ± 0.086 % |    71.138 ± 0.119 % |    94.665 ± 0.055 % |    91.901 ± 0.072 % |    97.520 ± 0.038 % |    96.031 ± 0.051 % |    98.846 ± 0.026 % |    97.674 ± 0.040 % |

## LLaMA 3 BF16 vs. FP16 comparison

| Revision | 83330d8c      |
|:---------|:--------------|
| Backend  | CPU           |
| CPU      | AMD Epyc 7742 |
| GPU      | N/A           |

Results were calculated with LLaMA 3 8b BF16 as `--kl-divergence-base` and LLaMA 3 8b FP16 as the `--model` for comparison.

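The KLD, Δp, and "same top p" statistics reported here can in principle be reproduced from two sets of per-token probability distributions (base vs. quantized model). A minimal sketch with tiny made-up distributions, not the actual llama.cpp implementation; `base_probs`, `quant_probs`, and `correct_ids` are hypothetical names:

```python
import math

def kld(p, q):
    """Kullback-Leibler divergence KL(p || q) for one token position."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def compare(base_probs, quant_probs, correct_ids):
    """Aggregate KLD, Δp for the "correct" token, and top-token agreement."""
    klds, dps, same_top = [], [], []
    for p, q, tok in zip(base_probs, quant_probs, correct_ids):
        klds.append(kld(p, q))
        dps.append(q[tok] - p[tok])  # Δp as a fraction (tables report %)
        same_top.append(max(range(len(p)), key=p.__getitem__) ==
                        max(range(len(q)), key=q.__getitem__))
    n = len(klds)
    return {
        "mean_kld": sum(klds) / n,
        "mean_dp": sum(dps) / n,
        "rms_dp": math.sqrt(sum(d * d for d in dps) / n),
        "same_top_p": 100.0 * sum(same_top) / n,  # percent of positions
    }

# Two made-up 3-token-vocabulary distributions over 2 positions:
base_probs  = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
quant_probs = [[0.6, 0.3, 0.1], [0.1, 0.7, 0.2]]
stats = compare(base_probs, quant_probs, correct_ids=[0, 1])
```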
| Metric                         |                    Value |
|--------------------------------|--------------------------|
| Mean PPL(Q)                    |      6.227711 ± 0.037833 |
| Mean PPL(base)                 |      6.225194 ± 0.037771 |
| Cor(ln(PPL(Q)), ln(PPL(base))) |                  99.990% |
| Mean ln(PPL(Q)/PPL(base))      |      0.000404 ± 0.000086 |
| Mean PPL(Q)/PPL(base)          |      1.000404 ± 0.000086 |
| Mean PPL(Q)-PPL(base)          |      0.002517 ± 0.000536 |
| Mean    KLD                    |  0.00002515 ± 0.00000020 |
| Maximum KLD                    |                 0.012206 |
| 99.9%   KLD                    |                 0.000799 |
| 99.0%   KLD                    |                 0.000222 |
| Median  KLD                    |                 0.000013 |
| 10.0%   KLD                    |                -0.000002 |
| 5.0%    KLD                    |                -0.000008 |
| 1.0%    KLD                    |                -0.000023 |
| Minimum KLD                    |                -0.000059 |
| Mean    Δp                     | -0.0000745 ± 0.0003952 % |
| Maximum Δp                     |                   4.186% |
| 99.9%   Δp                     |                   1.049% |
| 99.0%   Δp                     |                   0.439% |
| 95.0%   Δp                     |                   0.207% |
| 90.0%   Δp                     |                   0.125% |
| 75.0%   Δp                     |                   0.029% |
| Median  Δp                     |                   0.000% |
| 25.0%   Δp                     |                  -0.030% |
| 10.0%   Δp                     |                  -0.126% |
| 5.0%    Δp                     |                  -0.207% |
| 1.0%    Δp                     |                  -0.434% |
| 0.1%    Δp                     |                  -1.016% |
| Minimum Δp                     |                  -4.672% |
| RMS Δp                         |          0.150 ± 0.001 % |
| Same top p                     |         99.739 ± 0.013 % |

## Old Numbers

Johannes Gäßler