llama.cpp/examples/server/server.cpp at 3dfda05956befb350745c5c2f7134d06adfe8724

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-01 09:01:57 +00:00

Douglas Hanley c3ebcfa148 server : ensure batches are either all embed or all completion (#8420)

* make sure batches are all embed or all non-embed

* non-embedding batch for sampled tokens; fix unused params warning

2024-07-12 11:14:12 +03:00