llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-31 08:51:55 +00:00

Files

Xuan Son Nguyen 958367bf53 server : refactor slot input data, move tokenizer to HTTP thread (#10023 )

* server : refactor slot input data, move tokenizer to HTTP thread

* move prompt_tokens.empty() check

* fix incorrect if branch

* fix infinite generation loop

* bring back infill validation

* add infill test

* try fixing format_infill

* fix test

* remove redundant code

* rename completion to inference

* update docs

* use llama_tokens everywhere

2024-10-24 21:51:22 +02:00

steps.py

server : refactor slot input data, move tokenizer to HTTP thread (#10023 )

2024-10-24 21:51:22 +02:00