Inference support for T5 and FLAN-T5 model families (#5763)

* llama : add inference support and model types for T5 and FLAN-T5 model families

* llama : add new API functions to support encoder-decoder models: llama_encode(), llama_model_has_encoder(), llama_model_decoder_start_token()

* common, llama-cli, llama-batched : add support for encoder-decoder models

* convert-hf : handle shared token embeddings tensors in T5Model

* convert-hf : add support for SentencePiece BPE tokenizer in T5Model (for Pile-T5 models)

* convert-hf : add MT5ForConditionalGeneration and UMT5ForConditionalGeneration to architectures supported by T5Model

* convert : add t5 tokenizer tests, use "slow" HF tokenizer for t5
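The encoder-decoder flow that the new API functions enable can be sketched as a small Python simulation. This is not llama.cpp code. `StubModel`, `prepare_decoder_input`, and the use of -1 to mean "no decoder start token" are hypothetical, chosen only for illustration. The assumed logic is as follows. If `llama_model_has_encoder()` reports an encoder, the whole prompt goes through `llama_encode()` once. Generation then starts from the token returned by `llama_model_decoder_start_token()` rather than from the prompt itself. In this sketch, the model falls back to BOS when no start token is set.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StubModel:
    """Hypothetical stand-in for a loaded model (not a llama.cpp type)."""
    has_encoder: bool
    decoder_start_token: int  # -1 means "not set" (assumed convention)
    bos_token: int
    encoded: List[List[int]] = field(default_factory=list)

    def encode(self, tokens: List[int]) -> int:
        # Record the encoder input; a real encoder would compute hidden states
        # that the decoder later attends to via cross-attention.
        self.encoded.append(list(tokens))
        return 0  # 0 signals success


def prepare_decoder_input(model: StubModel, prompt: List[int]) -> List[int]:
    """Return the tokens the decoder should receive first."""
    if not model.has_encoder:
        # Decoder-only model: the prompt is decoded directly.
        return list(prompt)
    # Encoder-decoder model: run the encoder once over the whole prompt.
    if model.encode(prompt) != 0:
        raise RuntimeError("encoder pass failed")
    # Start generation from the decoder start token, falling back to BOS.
    start = model.decoder_start_token
    if start == -1:
        start = model.bos_token
    return [start]


t5_like = StubModel(has_encoder=True, decoder_start_token=0, bos_token=1)
print(prepare_decoder_input(t5_like, [37, 1782, 5]))  # [0]
print(t5_like.encoded)  # [[37, 1782, 5]]
```

This is the key difference from decoder-only models. For T5-style models, the prompt never enters the decoder's token stream. The decoder sees the prompt only through the encoder output.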

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Author: fairydreaming, 2024-07-04 15:46:11 +02:00
Committed by: GitHub
Parent: f8c4c0738d
Commit: 807b0c49ff

33 changed files with 946 additions and 31 deletions

models/ggml-vocab-llama-bpe.gguf.out (1 line added)

@@ -31,6 +31,7 @@
 284
 11639
 11 379 65948 0 2650 527 499 27623 223 949 37046 101067 19000 23182 102301 9263 18136 16 36827 21909
 3001

Inference support for T5 and FLAN-T5 model families (#5763)

1 models/ggml-vocab-llama-bpe.gguf.out Unescape Escape View File

1

models/ggml-vocab-llama-bpe.gguf.out

View File