llama : improve token type support (#2668)

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-02 09:12:03 +00:00

* Merge tokenizer fixes into the gguf branch.

* Add test vocabularies

* Adapt convert-new.py (and fix a clang-cl compiler error on Windows)

* Improved tokenizer test

But does it work on macOS?

* Improve token type support

- Added @klosax's code to convert.py
- Improved token type support in vocabulary

* Exclude platform dependent tests

* More sentencepiece compatibility by eliminating magic numbers

* Restored accidentally removed comment
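
As an illustration of the "eliminating magic numbers" item above, here is a minimal sketch of what named token types can look like in a Python conversion script. It is not the code from this commit: `TokenType` and `classify_piece` are hypothetical names, and the numeric values are an assumption chosen to follow sentencepiece's piece-type numbering.

```python
import re
from enum import IntEnum


class TokenType(IntEnum):
    # Hypothetical enum; values assumed to mirror sentencepiece piece types.
    NORMAL = 1
    UNKNOWN = 2
    CONTROL = 3
    USER_DEFINED = 4
    UNUSED = 5
    BYTE = 6


# Byte-fallback pieces are conventionally written like "<0x0A>".
_BYTE_PIECE = re.compile(r"<0x[0-9A-Fa-f]{2}>")


def classify_piece(piece: str, is_unknown: bool = False,
                   is_control: bool = False) -> TokenType:
    """Return a named token type instead of a bare integer."""
    if is_unknown:
        return TokenType.UNKNOWN
    if is_control:
        return TokenType.CONTROL
    if _BYTE_PIECE.fullmatch(piece):
        return TokenType.BYTE
    return TokenType.NORMAL


if __name__ == "__main__":
    for piece, kwargs in [("hello", {}), ("<0x0A>", {}),
                          ("<s>", {"is_control": True})]:
        print(piece, classify_piece(piece, **kwargs).name)
```

Comparing against `TokenType.BYTE` rather than a literal `6` keeps the intent readable and makes a mismatch with the tokenizer's own numbering easier to spot.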

goerch
2023-08-21 17:56:02 +02:00
committed by GitHub
parent e06cbcee73
commit 8d177eddeb

4 changed files with 94 additions and 98 deletions

BIN models/ggml-vocab-llama.gguf

Binary file not shown.
