mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-30 08:42:00 +00:00

Files

Kerfuffle 6e08281e58 Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843)

* Extend llama_kv_cache_seq_rm to allow matching any sequence

* Replace llama_kv_cache_tokens_rm with llama_kv_cache_clear

Use llama_kv_cache_clear for cache clearing

Change calls to llama_kv_cache_tokens_rm that want to delete by position to use llama_kv_cache_seq_rm functionality

2023-10-29 11:31:40 -06:00

CMakeLists.txt

cmake : install targets (#2256)

2023-07-19 10:01:11 +03:00

perplexity.cpp

Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843)

2023-10-29 11:31:40 -06:00

README.md

readme : add some recent perplexity and bpw measurements to READMES, link for k-quants (#3340)

2023-09-27 18:30:36 +03:00

README.md

perplexity

TODO

Llama 2 70B Scorechart

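| Quantization | Model size (GiB) | Perplexity | Delta to fp16 |
|--------------|------------------|------------|---------------|
| Q4_0         | 36.20            | 3.5550     | 3.61%         |
| Q4_1         | 40.20            | 3.5125     | 2.37%         |
| Q5_0         | 44.20            | 3.4744     | 1.26%         |
| Q2_K         | 27.27            | 3.7339     | 8.82%         |
| Q3_K_S       | 27.86            | 3.7019     | 7.89%         |
| Q3_K_M       | 30.83            | 3.5932     | 4.72%         |
| Q3_K_L       | 33.67            | 3.5617     | 3.80%         |
| Q4_K_S       | 36.39            | 3.4852     | 1.57%         |
| Q4_K_M       | 38.54            | 3.4725     | 1.20%         |
| Q5_K_S       | 44.20            | 3.4483     | 0.50%         |
| Q5_K_M       | 45.41            | 3.4451     | 0.40%         |
| Q6_K         | 52.70            | 3.4367     | 0.16%         |
| fp16         | 128.5            | 3.4313     | -             |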
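
The README does not say how the "Delta to fp16" column is computed. The values are consistent with the relative perplexity increase over the fp16 row, expressed as a percentage. The sketch below works under that assumption; the function name `delta_to_fp16` and the `scores` dictionary are illustrative and not part of llama.cpp.

```python
# Baseline perplexity taken from the fp16 row of the scorechart.
FP16_PERPLEXITY = 3.4313


def delta_to_fp16(perplexity, baseline=FP16_PERPLEXITY):
    """Relative perplexity increase over the baseline, in percent.

    Assumption: this formula is inferred from the table values,
    not documented in the README.
    """
    return (perplexity - baseline) / baseline * 100.0


# A few rows copied from the scorechart (quantization -> perplexity).
scores = {"Q4_0": 3.5550, "Q2_K": 3.7339, "Q6_K": 3.4367}

for name, ppl in scores.items():
    print(f"{name}: {delta_to_fp16(ppl):.2f}%")
```

Rounded to two decimals, the three rows above come out at 3.61%, 8.82% and 0.16%, matching the scorechart.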