llama.cpp/convert-pth-to-ggml.py at 956dfda8ad8cea7961e22e0384bbc315bf79aed2

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-31 08:51:55 +00:00

Ronsor 956dfda8ad Use tokenizer.vocab_size() instead of hardcoding 32000 in convert-pth-to-ggml.py (#142)

There are ways that special tokens or other new tokens could be added to the tokenizer; therefore it's probably best not to assume the vocabulary is only 32000 tokens.

2023-03-15 21:37:50 +02:00
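
To illustrate the idea in the commit message, here is a minimal sketch of writing a vocabulary whose size is queried from the tokenizer rather than assumed to be 32000. It is not the code from convert-pth-to-ggml.py: `FakeTokenizer` and `write_vocab` are hypothetical names, and the byte layout (a little-endian count followed by length-prefixed UTF-8 tokens) is an assumption chosen for illustration. The stand-in only mirrors the `vocab_size()` and `id_to_piece()` methods that a SentencePiece-style tokenizer exposes.

```python
import struct
from io import BytesIO


class FakeTokenizer:
    """Hypothetical stand-in for a SentencePiece-style tokenizer (not part of llama.cpp)."""

    def __init__(self, pieces):
        self._pieces = list(pieces)

    def vocab_size(self):
        # Number of tokens currently known to the tokenizer.
        return len(self._pieces)

    def id_to_piece(self, i):
        # Text of the token with id i.
        return self._pieces[i]


def write_vocab(tokenizer, fout):
    """Write the token count, then each token as (length, UTF-8 bytes).

    The layout is an assumption for this sketch, not the real ggml format.
    """
    n_vocab = tokenizer.vocab_size()  # queried from the tokenizer, not hardcoded
    fout.write(struct.pack("<i", n_vocab))
    for i in range(n_vocab):
        text = tokenizer.id_to_piece(i).encode("utf-8")
        fout.write(struct.pack("<i", len(text)))
        fout.write(text)
    return n_vocab


if __name__ == "__main__":
    # A tokenizer with one extra special token still round-trips correctly.
    tok = FakeTokenizer(["<s>", "</s>", "a", "b", "<extra>"])
    buf = BytesIO()
    print(write_vocab(tok, buf), "tokens written")
```

Because the count comes from the tokenizer, adding a special token changes the header and the loop bound together, so the written file stays consistent with the tokenizer that produced it.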

5.3 KiB
