llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-04 09:32:00 +00:00

Files

Daniel Bevenius 62cef26ac5 model-conversion : add qat-q4 quantization targets (#15588)

This commit adds two targets to the Makefile for quantizing
Quantization Aware Trained (QAT) models to Q4_0 format.

The motivation is that these targets set the data types of the token
embedding and output tensors to Q8_0 instead of the default Q6_K. This
is something that we wish to enforce for QAT Q4_0 models that are to be
uploaded to ggml-org on Hugging Face to guarantee the best quality.
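
As a rough illustration of what such a target might run, the sketch below
builds a llama-quantize command line that overrides the token embedding and
output tensor types. The Makefile targets themselves are not shown on this
page, so the function name, the file names, and the binary location are
assumptions made for the example; only the --token-embedding-type and
--output-tensor-type options and the Q4_0/Q8_0 type names come from
llama.cpp's quantize tool.

```python
import shlex


def qat_q4_0_quantize_cmd(src_gguf, dst_gguf, quantize_bin="llama-quantize"):
    """Build an argument list for quantizing a QAT model to Q4_0.

    Hypothetical helper: the real Makefile targets may pass different
    paths or extra options. The two overrides keep the token embedding
    and output tensors at Q8_0 instead of the quantizer's default.
    """
    return [
        quantize_bin,
        "--token-embedding-type", "Q8_0",  # override for the token embedding tensor
        "--output-tensor-type", "Q8_0",    # override for the output tensor
        src_gguf,                          # input GGUF (assumed file name)
        dst_gguf,                          # output GGUF (assumed file name)
        "Q4_0",                            # target quantization type
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    cmd = qat_q4_0_quantize_cmd("model-f16.gguf", "model-Q4_0.gguf")
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell
quoting problems if it is later passed to subprocess.run.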

2025-08-26 16:12:29 +02:00

check-nmse.py

examples : add model conversion tool/example (#15455)

2025-08-21 12:16:54 +02:00

create-collection-add-model.sh

examples : add model conversion tool/example (#15455)

2025-08-21 12:16:54 +02:00

hf-add-model-to-collection.py

examples : add model conversion tool/example (#15455)

2025-08-21 12:16:54 +02:00

hf-create-collection.py

examples : add model conversion tool/example (#15455)