From 2f61c0f5bf8a620ca4c3872408803ab38cfb9613 Mon Sep 17 00:00:00 2001
From: Vinkal
Date: Mon, 29 Sep 2025 12:33:12 +0530
Subject: [PATCH] llama-cli: prevent spurious assistant token (#16202)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* tools/main: llama-cli: prevent spurious assistant token (#13402)

During prompt ingestion, prompt tokens are accepted into the sampler history
(for repetition penalties). The conversation-mode path then appended
`common_sampler_last(smpl)` to `assistant_ss` before any new token was sampled.
At that point, "last" was a prompt-side token (e.g., an input prefix), so the
assistant chat message began with an extra piece.

Fix: append to `assistant_ss` only for a newly sampled (non-EOG) token.
This affects only chat message assembly (`assistant_ss` / `chat_msgs` /
`common_chat_format_single`); terminal stdout is unchanged. Sampling
order/logits are unchanged.

Fixes #13402.

Signed-off-by: Vinkal Chudgar

* Update tools/main/main.cpp

Co-authored-by: Sigbjørn Skjæret

* tools/main: remove outdated comment

Signed-off-by: Vinkal Chudgar

---------

Signed-off-by: Vinkal Chudgar
Co-authored-by: Sigbjørn Skjæret
---
 tools/main/main.cpp | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/main/main.cpp b/tools/main/main.cpp
index 083fc0cf26..498e00e3a5 100644
--- a/tools/main/main.cpp
+++ b/tools/main/main.cpp
@@ -707,6 +707,10 @@ int main(int argc, char ** argv) {
 
             embd.push_back(id);
 
+            if (params.conversation_mode && !waiting_for_first_input && !llama_vocab_is_eog(vocab, id)) {
+                assistant_ss << common_token_to_piece(ctx, id, false);
+            }
+
             // echo this to console
             input_echo = true;
 
@@ -824,11 +828,7 @@ int main(int argc, char ** argv) {
                     }
                 }
 
-                // if current token is not EOG, we add it to current assistant message
                 if (params.conversation_mode && !waiting_for_first_input) {
-                    const auto id = common_sampler_last(smpl);
-                    assistant_ss << common_token_to_piece(ctx, id, false);
-
                     if (!prompt.empty()) {
                         prompt.clear();
                         is_interacting = false;
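
Reviewer note: the failure mode described in the commit message can be reproduced with a small standalone model. The sketch below is not llama.cpp code. `ToySampler`, `assemble_old`, `assemble_fixed`, and the `"<eog>"` marker are hypothetical stand-ins, with tokens represented directly as string pieces. It assumes only what the message states: prompt tokens enter the sampler history, and the old path read the "last" history entry once before anything had been sampled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the sampler: it only records accepted tokens.
struct ToySampler {
    std::vector<std::string> history;
    void accept(const std::string & tok) { history.push_back(tok); }
    const std::string & last() const { return history.back(); }
};

// Hypothetical end-of-generation marker (not a real vocabulary token).
static const std::string EOG = "<eog>";

// Old behaviour (modelled): at the end of every loop iteration, append
// whatever the sampler saw last, including right after prompt ingestion.
std::string assemble_old(const std::vector<std::string> & prompt,
                         const std::vector<std::string> & sampled) {
    ToySampler smpl;
    std::string assistant;
    bool prompt_pending = true;
    std::size_t next = 0;
    while (true) {
        if (prompt_pending) {
            // Prompt tokens enter the history (e.g. for repetition penalties).
            for (const auto & t : prompt) smpl.accept(t);
            prompt_pending = false;
            if (smpl.history.empty()) continue; // nothing to read yet
        } else {
            if (next == sampled.size()) break;
            smpl.accept(sampled[next++]);
        }
        if (smpl.last() == EOG) break;
        assistant += smpl.last(); // bug: first append is a prompt-side piece
    }
    return assistant;
}

// Fixed behaviour (modelled): append only at the point where a token is
// newly sampled, and only if it is not the end-of-generation marker.
std::string assemble_fixed(const std::vector<std::string> & prompt,
                           const std::vector<std::string> & sampled) {
    ToySampler smpl;
    std::string assistant;
    for (const auto & t : prompt) smpl.accept(t);
    for (const auto & t : sampled) {
        smpl.accept(t);
        if (t == EOG) break;
        assistant += t;
    }
    return assistant;
}
```

With prompt pieces `{"Q:", " hi", " A:"}` and sampled pieces `{" Hello", "!", "<eog>"}`, the old model yields `" A: Hello!"` (the trailing prompt piece leaks into the assistant message) while the fixed model yields `" Hello!"`. In both models the history and the sampled sequence are identical, which mirrors the claim that only chat message assembly changes.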