mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-10-27 08:21:30 +00:00)
drop quantize.py (now that models are using a single file)
@@ -155,8 +155,8 @@ python3 -m pip install torch numpy sentencepiece
 # convert the 7B model to ggml FP16 format
 python3 convert-pth-to-ggml.py models/7B/ 1
 
-# quantize the model to 4-bits
-python3 quantize.py 7B
+# quantize the model to 4-bits (using method 2 = q4_0)
+./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
 
 # run the inference
 ./main -m ./models/7B/ggml-model-q4_0.bin -n 128
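For context, a wrapper like the dropped quantize.py was only needed while FP16 models were split across multiple files; now that each model is a single ggml file, the compiled ./quantize tool can be invoked directly, as the updated README line shows. Below is a minimal sketch of the kind of per-part loop such a wrapper would have performed. The file-name pattern and the quantize_all_parts helper are illustrative assumptions, not the script's actual code:

```python
import glob
import subprocess

def quantize_all_parts(model_dir: str = "models/7B", method: int = 2) -> None:
    """Illustrative only: quantize every FP16 part of a multi-file model.

    Assumes parts were named ggml-model-f16.bin, ggml-model-f16.bin.1, ...
    (an assumption about the old multi-file layout, not the script's
    verified behavior).
    """
    for src in sorted(glob.glob(f"{model_dir}/ggml-model-f16.bin*")):
        dst = src.replace("f16", "q4_0")
        # shell out to the compiled C++ quantizer for each part
        # (method 2 = q4_0, matching the README example)
        subprocess.run(["./quantize", src, dst, str(method)], check=True)

if __name__ == "__main__":
    quantize_all_parts()
```

With a single ggml file per model, the one-line ./quantize invocation in the diff replaces this wrapper entirely, which is why the script could be dropped.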