llama.cpp/convert_hf_to_gguf.py
Francis Couture-Harpin 895004f3f8 convert : allow direct conversion to TQ1_0 and TQ2_0
The token embeddings and output tensors are kept in F16
to allow quantizing them to Q4_K and Q6_K with llama-quantize.
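The type-selection rule described above can be sketched as a small Python function. This is a hypothetical illustration, not the converter's actual code. `pick_tensor_type`, `TERNARY_TYPES` and `KEEP_F16` are invented names, and it assumes the conventional GGUF tensor names `token_embd.weight` and `output.weight`.

```python
# Hypothetical sketch: which storage type a tensor gets when TQ1_0/TQ2_0 is requested.
TERNARY_TYPES = {"TQ1_0", "TQ2_0"}
# Assumed GGUF names for the token embeddings and the output (lm_head) tensor.
KEEP_F16 = {"token_embd.weight", "output.weight"}


def pick_tensor_type(tensor_name: str, requested: str) -> str:
    """Return the storage type for one tensor given the requested output type."""
    if requested in TERNARY_TYPES and tensor_name in KEEP_F16:
        # Keep these in F16 so llama-quantize can later re-quantize them
        # (e.g. to Q4_K or Q6_K) instead of forcing them to be ternary.
        return "F16"
    return requested
```

With this rule, `pick_tensor_type("token_embd.weight", "TQ2_0")` yields `"F16"`, while a block tensor such as `"blk.0.ffn_up.weight"` keeps the requested `"TQ2_0"`. Non-ternary requests pass through unchanged.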

* llama : handle fallback for TQ1_0 and TQ2_0 with Q4_0

Q4_0 is not completely symmetric (so not lossless for ternary models),
but it should be good enough.
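The asymmetry that makes the Q4_0 fallback lossy for ternary weights can be shown with a small Python sketch. It is modeled on ggml's reference Q4_0 block quantizer as we understand it: 32 values per block, the scale taken from the signed value with the largest magnitude as `d = max / -8`, codes clamped to 0..15, and decoding as `(q - 8) * d`. It ignores the FP16 storage of `d` and the nibble packing. The function names are hypothetical.

```python
QK4_0 = 32  # values per Q4_0 block


def quantize_q4_0_block(xs):
    """Quantize one block; returns (scale d, list of 4-bit codes in 0..15)."""
    assert len(xs) == QK4_0
    amax, vmax = 0.0, 0.0
    for v in xs:
        # Remember the signed value with the largest magnitude (first one wins).
        if abs(v) > amax:
            amax, vmax = abs(v), v
    d = vmax / -8  # that value maps exactly to code 0, i.e. level -8
    inv_d = 1.0 / d if d else 0.0
    # Round to nearest level; codes only reach 15 (level +7), so the
    # opposite extreme would need level +8 and gets clamped.
    codes = [min(15, int(v * inv_d + 8.5)) for v in xs]
    return d, codes


def dequantize_q4_0_block(d, codes):
    """Map codes back to floats as (q - 8) * d."""
    return [(q - 8) * d for q in codes]


# A ternary block {+1, 0, -1} whose first largest-magnitude value is +1.
block = ([1.0, 0.0, -1.0] * 11)[:QK4_0]
d, codes = quantize_q4_0_block(block)
restored = dequantize_q4_0_block(d, codes)
```

In this block the +1 entries and the zeros round-trip exactly, but the -1 entries come back as -0.875. If the block's first largest-magnitude value were -1 instead, the +1 entries would lose the same 1/8. Because the levels run from -8 to +7, only one sign of the extreme value can be represented exactly.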
2024-08-13 17:17:43 -04:00
