llama.cpp/convert_hf_to_gguf.py at 69f772682e5b7cdd9fa48f6a0745aa9794050e43

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-11 10:36:54 +00:00

Francis Couture-Harpin 895004f3f8 convert : allow direct conversion to TQ1_0 and TQ2_0

The token embeddings and output tensors are kept in F16
to allow quantizing them to Q4_K and Q6_K with llama-quantize.

* llama : handle fallback for TQ1_0 and TQ2_0 with Q4_0

Q4_0 is not completely symmetric (so not lossless for ternary models),
but it should be good enough.
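
The asymmetry mentioned above can be illustrated with a small round-trip sketch. It is not code from this commit. It assumes the usual Q4_0 reference scheme, in which each block stores one scale equal to the signed largest-magnitude value divided by -8, plus 4-bit codes 0..15 that stand for the levels -8..7. The function name `q4_0_roundtrip` is hypothetical, and real Q4_0 blocks hold 32 values with an F16 scale, both of which are ignored here.

```python
def q4_0_roundtrip(block):
    """Quantize a block to 4-bit codes and dequantize it again (sketch only)."""
    # Assumed scheme: take the element with the largest magnitude, keeping
    # its sign (the first one wins on ties), and map it to level -8.
    vmax = max(block, key=abs)
    d = vmax / -8.0
    inv_d = 1.0 / d if d != 0.0 else 0.0

    restored = []
    for x in block:
        # x * inv_d + 8.5 is never negative here, so int() acts as floor.
        code = min(15, int(x * inv_d + 8.5))  # 4-bit code in 0..15
        restored.append((code - 8) * d)       # level in -8..7 times the scale
    return restored


ternary = [1.0, 0.0, -1.0]
restored = q4_0_roundtrip(ternary)
# Largest absolute round-trip error across the block.
worst_error = max(abs(a - b) for a, b in zip(ternary, restored))
```

Under these assumptions, the value that sets the scale (+1 here) and 0 survive exactly, while -1 would need level +8, which does not exist, and comes back as -0.875. That one-sided clipping is why the fallback is not lossless for ternary weights.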

2024-08-13 17:17:43 -04:00

176 KiB

Executable File
