CS348Project/llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-28 08:31:25 +00:00

# quantize

TODO

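## Llama 2 7B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.35                  |
| Q3_K_S       | 3.50                  |
| Q3_K_M       | 3.91                  |
| Q3_K_L       | 4.27                  |
| Q4_K_S       | 4.58                  |
| Q4_K_M       | 4.84                  |
| Q5_K_S       | 5.52                  |
| Q5_K_M       | 5.68                  |
| Q6_K         | 6.56                  |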

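## Llama 2 13B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.34                  |
| Q3_K_S       | 3.48                  |
| Q3_K_M       | 3.89                  |
| Q3_K_L       | 4.26                  |
| Q4_K_S       | 4.56                  |
| Q4_K_M       | 4.83                  |
| Q5_K_S       | 5.51                  |
| Q5_K_M       | 5.67                  |
| Q6_K         | 6.56                  |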

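## Llama 2 70B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.40                  |
| Q3_K_S       | 3.47                  |
| Q3_K_M       | 3.85                  |
| Q3_K_L       | 4.19                  |
| Q4_K_S       | 4.53                  |
| Q4_K_M       | 4.80                  |
| Q5_K_S       | 5.50                  |
| Q5_K_M       | 5.65                  |
| Q6_K         | 6.56                  |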
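
The BPW figures above can be turned into a rough size estimate, since size in bytes is approximately the parameter count times BPW divided by 8. The sketch below does that arithmetic for the Llama 2 7B table. The parameter count (`ASSUMED_PARAMS_7B`, about 6.74 billion) and the helper name `estimated_size_gib` are assumptions for illustration and do not come from this README; the estimate also ignores file metadata, so real files may differ somewhat.

```python
# Rough size estimate from bits per weight (BPW).
# ASSUMED_PARAMS_7B is an assumption (approximate Llama 2 7B parameter count),
# not a value taken from the tables above.
ASSUMED_PARAMS_7B = 6.74e9

# BPW values copied from the Llama 2 7B table.
BPW_7B = {
    "Q2_K": 3.35,
    "Q3_K_S": 3.50,
    "Q3_K_M": 3.91,
    "Q3_K_L": 4.27,
    "Q4_K_S": 4.58,
    "Q4_K_M": 4.84,
    "Q5_K_S": 5.52,
    "Q5_K_M": 5.68,
    "Q6_K": 6.56,
}


def estimated_size_gib(n_params: float, bpw: float) -> float:
    """Approximate size in GiB: n_params * bpw bits, converted to bytes, then GiB."""
    return n_params * bpw / 8 / 2**30


if __name__ == "__main__":
    for name, bpw in BPW_7B.items():
        size = estimated_size_gib(ASSUMED_PARAMS_7B, bpw)
        print(f"{name:8s} {bpw:5.2f} BPW  ~{size:.2f} GiB")
```

The same function applies to the 13B and 70B tables by substituting their BPW values and a suitable parameter count.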