Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-10-27 08:21:30 +00:00)
refactor: centralize CoT parsing in backend for streaming mode (#16394)
* refactor: unify reasoning handling via backend reasoning_content, drop frontend tag parsing

  - Updated the chat message component to surface backend-supplied reasoning via message.thinking while showing the raw assistant content without inline tag scrubbing
  - Simplified chat streaming to append content chunks directly, stream reasoning into the message model, and persist any partial reasoning when generation stops
  - Refactored the chat service SSE handler to rely on server-provided reasoning_content, removing legacy <think> parsing logic
  - Refreshed Storybook data and streaming flows to populate the thinking field explicitly for static and streaming assistant messages

* refactor: implement streaming-aware universal reasoning parser

  Remove the streaming mode limitation from --reasoning-format by refactoring try_parse_reasoning() to handle incremental parsing of <think> tags across all formats.

  - Rework try_parse_reasoning() to track whitespace, partial tags, and multiple reasoning segments, allowing proper separation of reasoning_content and content in streaming mode
  - Parse reasoning tags before tool call handling in content-only and Llama 3.x formats to ensure inline <think> blocks are captured correctly
  - Change default reasoning_format from 'auto' to 'deepseek' for consistent behavior
  - Add 'deepseek-legacy' option to preserve old inline behavior when needed
  - Update CLI help and documentation to reflect streaming support
  - Add parser tests for inline <think>...</think> segments

  The parser now continues processing content after </think> closes instead of stopping, enabling proper message.reasoning_content and message.content separation in both streaming and non-streaming modes. Fixes the issue where streaming responses would dump everything (including post-thinking content) into reasoning_content while leaving content empty.

* refactor: address review feedback from allozaur

  - Passed the assistant message content directly to ChatMessageAssistant to drop the redundant derived state in the chat message component
  - Simplified chat streaming updates by removing unused partial-thinking handling and persisting partial responses straight from currentResponse
  - Refreshed the ChatMessage stories to cover standard and reasoning scenarios without the old THINK-tag parsing examples

  Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* refactor: restore forced reasoning prefix to pass test-chat ([chat] All tests passed)

  - store the exact sequence seen on input when 'thinking_forced_open' enforces a reasoning block
  - inject this prefix before the first accumulated segment in 'reasoning_content', then clear it to avoid duplication
  - repeat the capture on every new 'start_think' detection to properly handle partial/streaming flows

* refactor: address review feedback from ngxson

* debug: say goodbye to curl -N, hello one-click raw stream

  - adds a new checkbox in the WebUI to display raw LLM output without backend parsing or frontend Markdown rendering

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessage.svelte

  Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: add Storybook example for raw LLM output and scope reasoning format toggle per story

  - Added a Storybook example that showcases the chat message component in raw LLM output mode with the provided trace sample
  - Updated every ChatMessage story to toggle the disableReasoningFormat setting so the raw-output rendering remains scoped to its own example

* npm run format

* chat-parser: address review feedback from ngxson

  Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
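For reference, the behaviour this change targets can be exercised directly against the parser. The sketch below mirrors the new test cases added in tests/test-chat.cpp (the common_chat_syntax field order is taken from that file); it is illustrative, not an exact excerpt of the diff:

    // Parse an assistant reply that carries an inline reasoning block.
    common_chat_syntax syntax = {
        /* .format = */ COMMON_CHAT_FORMAT_CONTENT_ONLY,
        /* .reasoning_format = */ COMMON_REASONING_FORMAT_DEEPSEEK,
        /* .reasoning_in_content = */ false,
        /* .thinking_forced_open = */ false,
        /* .parse_tool_calls = */ false,
    };
    auto msg = common_chat_parse("<think>Pense</think>Bonjour", /* is_partial */ false, syntax);
    // msg.reasoning_content == "Pense", msg.content == "Bonjour"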
@@ -3432,7 +3432,8 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
{"--reasoning-format"}, "FORMAT",
"controls whether thought tags are allowed and/or extracted from the response, and in which format they're returned; one of:\n"
"- none: leaves thoughts unparsed in `message.content`\n"
"- deepseek: puts thoughts in `message.reasoning_content` (except in streaming mode, which behaves as `none`)\n"
"- deepseek: puts thoughts in `message.reasoning_content`\n"
"- deepseek-legacy: keeps `<think>` tags in `message.content` while also populating `message.reasoning_content`\n"
"(default: auto)",
[](common_params & params, const std::string & value) {
params.reasoning_format = common_reasoning_format_from_name(value);
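As a usage note (illustration only, not part of the diff): the parse callback above resolves the CLI string through common_reasoning_format_from_name(), so the accepted values map directly onto the internal reasoning-format enum. For example:

    // "none"            -> leave thoughts unparsed in message.content
    // "deepseek"        -> put thoughts in message.reasoning_content
    // "deepseek-legacy" -> keep <think> tags in message.content while also
    //                      populating message.reasoning_content (per the help text)
    params.reasoning_format = common_reasoning_format_from_name("deepseek-legacy");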
@@ -3,9 +3,12 @@
#include "log.h"
#include "regex-partial.h"

#include <algorithm>
#include <cctype>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

using json = nlohmann::ordered_json;
@@ -166,6 +169,27 @@ void common_chat_msg_parser::consume_literal(const std::string & literal) {
}

bool common_chat_msg_parser::try_parse_reasoning(const std::string & start_think, const std::string & end_think) {
std::string pending_reasoning_prefix;

if (syntax_.reasoning_format == COMMON_REASONING_FORMAT_NONE) {
return false;
}

auto set_reasoning_prefix = [&](size_t prefix_pos) {
if (!syntax_.thinking_forced_open || syntax_.reasoning_in_content) {
return;
}
if (prefix_pos + start_think.size() > input_.size()) {
pending_reasoning_prefix.clear();
return;
}
// Capture the exact literal that opened the reasoning section so we can
// surface it back to callers. This ensures formats that force the
// reasoning tag open (e.g. DeepSeek R1) retain their original prefix
// instead of dropping it during parsing.
pending_reasoning_prefix = input_.substr(prefix_pos, start_think.size());
};

auto handle_reasoning = [&](const std::string & reasoning, bool closed) {
auto stripped_reasoning = string_strip(reasoning);
if (stripped_reasoning.empty()) {
@@ -178,28 +202,116 @@ bool common_chat_msg_parser::try_parse_reasoning(const std::string & start_think
add_content(syntax_.reasoning_format == COMMON_REASONING_FORMAT_DEEPSEEK ? "</think>" : end_think);
}
} else {
if (!pending_reasoning_prefix.empty()) {
add_reasoning_content(pending_reasoning_prefix);
pending_reasoning_prefix.clear();
}
add_reasoning_content(stripped_reasoning);
}
};
if (syntax_.reasoning_format != COMMON_REASONING_FORMAT_NONE) {
if (syntax_.thinking_forced_open || try_consume_literal(start_think)) {
if (auto res = try_find_literal(end_think)) {
handle_reasoning(res->prelude, /* closed */ true);
consume_spaces();
return true;
}
auto rest = consume_rest();

const size_t saved_pos = pos_;
const size_t saved_content_size = result_.content.size();
const size_t saved_reasoning_size = result_.reasoning_content.size();

auto restore_state = [&]() {
move_to(saved_pos);
result_.content.resize(saved_content_size);
result_.reasoning_content.resize(saved_reasoning_size);
};

// Allow leading whitespace to be preserved as content when reasoning is present at the start
size_t cursor = pos_;
size_t whitespace_end = cursor;
while (whitespace_end < input_.size() && std::isspace(static_cast<unsigned char>(input_[whitespace_end]))) {
++whitespace_end;
}

if (whitespace_end >= input_.size()) {
restore_state();
if (syntax_.thinking_forced_open) {
auto rest = input_.substr(saved_pos);
if (!rest.empty()) {
handle_reasoning(rest, /* closed */ !is_partial());
}
// Allow unclosed thinking tags, for now (https://github.com/ggml-org/llama.cpp/issues/13812, https://github.com/ggml-org/llama.cpp/issues/13877)
// if (!syntax_.thinking_forced_open) {
// throw common_chat_msg_partial_exception(end_think);
// }
move_to(input_.size());
return true;
}
return false;
}

cursor = whitespace_end;
const size_t remaining = input_.size() - cursor;
const size_t start_prefix = std::min(start_think.size(), remaining);
const bool has_start_tag = input_.compare(cursor, start_prefix, start_think, 0, start_prefix) == 0;

if (has_start_tag && start_prefix < start_think.size()) {
move_to(input_.size());
return true;
}

if (has_start_tag) {
if (whitespace_end > pos_) {
add_content(input_.substr(pos_, whitespace_end - pos_));
}
set_reasoning_prefix(cursor);
cursor += start_think.size();
} else if (syntax_.thinking_forced_open) {
cursor = whitespace_end;
} else {
restore_state();
return false;
}
while (true) {
if (cursor >= input_.size()) {
move_to(input_.size());
return true;
}

size_t end_pos = input_.find(end_think, cursor);
if (end_pos == std::string::npos) {
std::string_view remaining_view(input_.data() + cursor, input_.size() - cursor);
size_t partial_off = string_find_partial_stop(remaining_view, end_think);
size_t reasoning_end = partial_off == std::string::npos ? input_.size() : cursor + partial_off;
if (reasoning_end > cursor) {
handle_reasoning(input_.substr(cursor, reasoning_end - cursor), /* closed */ partial_off == std::string::npos && !is_partial());
}
move_to(input_.size());
return true;
}

if (end_pos > cursor) {
handle_reasoning(input_.substr(cursor, end_pos - cursor), /* closed */ true);
} else {
handle_reasoning("", /* closed */ true);
}

cursor = end_pos + end_think.size();

while (cursor < input_.size() && std::isspace(static_cast<unsigned char>(input_[cursor]))) {
++cursor;
}

const size_t next_remaining = input_.size() - cursor;
if (next_remaining == 0) {
move_to(cursor);
return true;
}

const size_t next_prefix = std::min(start_think.size(), next_remaining);
if (input_.compare(cursor, next_prefix, start_think, 0, next_prefix) == 0) {
if (next_prefix < start_think.size()) {
move_to(input_.size());
return true;
}
set_reasoning_prefix(cursor);
cursor += start_think.size();
continue;
}

move_to(cursor);
return true;
}
return false;
}

std::string common_chat_msg_parser::consume_rest() {
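A rough sketch of how the reworked parser behaves on a partial (streaming) chunk, assuming the same common_chat_parse(input, is_partial, syntax) entry point used by tests/test-chat.cpp; the behaviour described in the comments follows the string_find_partial_stop handling above:

    common_chat_syntax syntax = {
        /* .format = */ COMMON_CHAT_FORMAT_CONTENT_ONLY,
        /* .reasoning_format = */ COMMON_REASONING_FORMAT_DEEPSEEK,
        /* .reasoning_in_content = */ false,
        /* .thinking_forced_open = */ false,
        /* .parse_tool_calls = */ false,
    };
    // The closing tag has not fully arrived yet ("</thi"), so the reasoning segment
    // stops before the partial tag: reasoning_content holds "Plan so far" and
    // content stays empty until the rest of "</think>" streams in.
    auto partial = common_chat_parse("<think>Plan so far</thi", /* is_partial */ true, syntax);
    // Once the closing tag and the remainder arrive, content receives everything after </think>.
    auto full = common_chat_parse("<think>Plan so far</think>Answer", /* is_partial */ false, syntax);
    // full.reasoning_content == "Plan so far", full.content == "Answer"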
@@ -1408,6 +1408,8 @@ static common_chat_params common_chat_params_init_apertus(const common_chat_temp
return data;
}
static void common_chat_parse_llama_3_1(common_chat_msg_parser & builder, bool with_builtin_tools = false) {
builder.try_parse_reasoning("<think>", "</think>");

if (!builder.syntax().parse_tool_calls) {
builder.add_content(builder.consume_rest());
return;
@@ -2862,6 +2864,7 @@ common_chat_params common_chat_templates_apply(
}

static void common_chat_parse_content_only(common_chat_msg_parser & builder) {
builder.try_parse_reasoning("<think>", "</think>");
builder.add_content(builder.consume_rest());
}
@@ -433,7 +433,7 @@ struct common_params {
std::string chat_template = ""; // NOLINT
bool use_jinja = false; // NOLINT
bool enable_chat_template = true;
common_reasoning_format reasoning_format = COMMON_REASONING_FORMAT_AUTO;
common_reasoning_format reasoning_format = COMMON_REASONING_FORMAT_DEEPSEEK;
int reasoning_budget = -1;
bool prefill_assistant = true; // if true, any trailing assistant message will be prefilled into the response
@@ -106,6 +106,34 @@ static void test_reasoning() {
assert_equals("<think>Cogito</think>", builder.result().content);
assert_equals("Ergo sum", builder.consume_rest());
}
{
const std::string variant("content_only_inline_think");
common_chat_syntax syntax = {
/* .format = */ COMMON_CHAT_FORMAT_CONTENT_ONLY,
/* .reasoning_format = */ COMMON_REASONING_FORMAT_DEEPSEEK,
/* .reasoning_in_content = */ false,
/* .thinking_forced_open = */ false,
/* .parse_tool_calls = */ false,
};
const std::string input = "<think>Pense</think>Bonjour";
auto msg = common_chat_parse(input, false, syntax);
assert_equals(variant, std::string("Pense"), msg.reasoning_content);
assert_equals(variant, std::string("Bonjour"), msg.content);
}
{
const std::string variant("llama_3_inline_think");
common_chat_syntax syntax = {
/* .format = */ COMMON_CHAT_FORMAT_LLAMA_3_X,
/* .reasoning_format = */ COMMON_REASONING_FORMAT_DEEPSEEK,
/* .reasoning_in_content = */ false,
/* .thinking_forced_open = */ false,
/* .parse_tool_calls = */ false,
};
const std::string input = "<think>Plan</think>Réponse";
auto msg = common_chat_parse(input, false, syntax);
assert_equals(variant, std::string("Plan"), msg.reasoning_content);
assert_equals(variant, std::string("Réponse"), msg.content);
}
// Test DeepSeek V3.1 parsing - reasoning content followed by "</think>" and then regular content
{
common_chat_syntax syntax = {
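The new cases above cover inline <think> blocks that the model opens itself. A minimal companion sketch for the thinking_forced_open path (templates that leave the reasoning block already open, so the output starts mid-reasoning without its own "<think>") using the same helpers; illustrative only, not part of the diff:

    common_chat_syntax forced = {
        /* .format = */ COMMON_CHAT_FORMAT_CONTENT_ONLY,
        /* .reasoning_format = */ COMMON_REASONING_FORMAT_DEEPSEEK,
        /* .reasoning_in_content = */ false,
        /* .thinking_forced_open = */ true,
        /* .parse_tool_calls = */ false,
    };
    // No leading "<think>" in the input, yet the text up to "</think>" is treated as reasoning.
    auto msg = common_chat_parse("Cogito</think>Ergo sum", /* is_partial */ false, forced);
    // msg.reasoning_content == "Cogito", msg.content == "Ergo sum"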
@@ -190,7 +190,7 @@ The project is under active development, and we are [looking for feedback and co
| `--no-slots` | disables slots monitoring endpoint<br/>(env: LLAMA_ARG_NO_ENDPOINT_SLOTS) |
| `--slot-save-path PATH` | path to save slot kv cache (default: disabled) |
| `--jinja` | use jinja template for chat (default: disabled)<br/>(env: LLAMA_ARG_JINJA) |
| `--reasoning-format FORMAT` | controls whether thought tags are allowed and/or extracted from the response, and in which format they're returned; one of:<br/>- none: leaves thoughts unparsed in `message.content`<br/>- deepseek: puts thoughts in `message.reasoning_content` (except in streaming mode, which behaves as `none`)<br/>(default: auto)<br/>(env: LLAMA_ARG_THINK) |
| `--reasoning-format FORMAT` | controls whether thought tags are allowed and/or extracted from the response, and in which format they're returned; one of:<br/>- none: leaves thoughts unparsed in `message.content`<br/>- deepseek: puts thoughts in `message.reasoning_content`<br/>- deepseek-legacy: keeps `<think>` tags in `message.content` while also populating `message.reasoning_content`<br/>(default: deepseek)<br/>(env: LLAMA_ARG_THINK) |
| `--reasoning-budget N` | controls the amount of thinking allowed; currently only one of: -1 for unrestricted thinking budget, or 0 to disable thinking (default: -1)<br/>(env: LLAMA_ARG_THINK_BUDGET) |
| `--chat-template JINJA_TEMPLATE` | set custom jinja chat template (default: template taken from model's metadata)<br/>if suffix/prefix are specified, template will be disabled<br/>only commonly used templates are accepted (unless --jinja is set before this flag):<br/>list of built-in templates:<br/>bailing, chatglm3, chatglm4, chatml, command-r, deepseek, deepseek2, deepseek3, exaone3, exaone4, falcon3, gemma, gigachat, glmedge, gpt-oss, granite, hunyuan-dense, hunyuan-moe, kimi-k2, llama2, llama2-sys, llama2-sys-bos, llama2-sys-strip, llama3, llama4, megrez, minicpm, mistral-v1, mistral-v3, mistral-v3-tekken, mistral-v7, mistral-v7-tekken, monarch, openchat, orion, phi3, phi4, rwkv-world, seed_oss, smolvlm, vicuna, vicuna-orca, yandex, zephyr<br/>(env: LLAMA_ARG_CHAT_TEMPLATE) |
| `--chat-template-file JINJA_TEMPLATE_FILE` | set custom jinja chat template file (default: template taken from model's metadata)<br/>if suffix/prefix are specified, template will be disabled<br/>only commonly used templates are accepted (unless --jinja is set before this flag):<br/>list of built-in templates:<br/>bailing, chatglm3, chatglm4, chatml, command-r, deepseek, deepseek2, deepseek3, exaone3, exaone4, falcon3, gemma, gigachat, glmedge, gpt-oss, granite, hunyuan-dense, hunyuan-moe, kimi-k2, llama2, llama2-sys, llama2-sys-bos, llama2-sys-strip, llama3, llama4, megrez, minicpm, mistral-v1, mistral-v3, mistral-v3-tekken, mistral-v7, mistral-v7-tekken, monarch, openchat, orion, phi3, phi4, rwkv-world, seed_oss, smolvlm, vicuna, vicuna-orca, yandex, zephyr<br/>(env: LLAMA_ARG_CHAT_TEMPLATE_FILE) |
@@ -1,7 +1,6 @@
<script lang="ts">
import { getDeletionInfo } from '$lib/stores/chat.svelte';
import { copyToClipboard } from '$lib/utils/copy';
import { parseThinkingContent } from '$lib/utils/thinking';
import ChatMessageAssistant from './ChatMessageAssistant.svelte';
import ChatMessageUser from './ChatMessageUser.svelte';
@@ -47,26 +46,13 @@

let thinkingContent = $derived.by(() => {
if (message.role === 'assistant') {
if (message.thinking) {
return message.thinking;
}
const trimmedThinking = message.thinking?.trim();

const parsed = parseThinkingContent(message.content);

return parsed.thinking;
return trimmedThinking ? trimmedThinking : null;
}
return null;
});

let messageContent = $derived.by(() => {
if (message.role === 'assistant') {
const parsed = parseThinkingContent(message.content);
return parsed.cleanContent?.replace('<|channel|>analysis', '');
}

return message.content?.replace('<|channel|>analysis', '');
});

function handleCancelEdit() {
isEditing = false;
editedContent = message.content;
@@ -165,7 +151,7 @@
{editedContent}
{isEditing}
{message}
{messageContent}
messageContent={message.content}
onCancelEdit={handleCancelEdit}
onConfirmDelete={handleConfirmDelete}
onCopy={handleCopy}
@@ -131,7 +131,11 @@
</div>
</div>
{:else if message.role === 'assistant'}
<MarkdownContent content={messageContent || ''} />
{#if config().disableReasoningFormat}
<pre class="raw-output">{messageContent || ''}</pre>
{:else}
<MarkdownContent content={messageContent || ''} />
{/if}
{:else}
<div class="text-sm whitespace-pre-wrap">
{messageContent}
@@ -203,4 +207,21 @@
background-position: -200% 0;
}
}

.raw-output {
width: 100%;
max-width: 48rem;
margin-top: 1.5rem;
padding: 1rem 1.25rem;
border-radius: 1rem;
background: hsl(var(--muted) / 0.3);
color: var(--foreground);
font-family:
ui-monospace, SFMono-Regular, 'SF Mono', Monaco, 'Cascadia Code', 'Roboto Mono', Consolas,
'Liberation Mono', Menlo, monospace;
font-size: 0.875rem;
line-height: 1.6;
white-space: pre-wrap;
word-break: break-word;
}
</style>
@@ -148,6 +148,12 @@
key: 'showThoughtInProgress',
label: 'Show thought in progress',
type: 'checkbox'
},
{
key: 'disableReasoningFormat',
label:
'Show raw LLM output without backend parsing and frontend Markdown rendering to inspect streaming across different models.',
type: 'checkbox'
}
]
},
@@ -6,6 +6,7 @@ export const SETTING_CONFIG_DEFAULT: Record<string, string | number | boolean> =
theme: 'system',
showTokensPerSecond: false,
showThoughtInProgress: false,
disableReasoningFormat: false,
keepStatsVisible: false,
askForTitleConfirmation: false,
pasteLongTextToFileLen: 2500,
@@ -76,6 +77,8 @@ export const SETTING_CONFIG_INFO: Record<string, string> = {
custom: 'Custom JSON parameters to send to the API. Must be valid JSON format.',
showTokensPerSecond: 'Display generation speed in tokens per second during streaming.',
showThoughtInProgress: 'Expand thought process by default when generating messages.',
disableReasoningFormat:
'Show raw LLM output without backend parsing and frontend Markdown rendering to inspect streaming across different models.',
keepStatsVisible: 'Keep processing statistics visible after generation finishes.',
askForTitleConfirmation:
'Ask for confirmation before automatically changing conversation title when editing the first message.',
@@ -78,6 +78,8 @@ export class ChatService {
timings_per_token
} = options;

const currentConfig = config();

// Cancel any ongoing request and create a new abort controller
this.abort();
this.abortController = new AbortController();
@@ -117,7 +119,7 @@
stream
};

requestBody.reasoning_format = 'auto';
requestBody.reasoning_format = currentConfig.disableReasoningFormat ? 'none' : 'auto';

if (temperature !== undefined) requestBody.temperature = temperature;
// Set max_tokens to -1 (infinite) if not provided or empty
@@ -161,7 +163,6 @@
}

try {
const currentConfig = config();
const apiKey = currentConfig.apiKey?.toString().trim();

const response = await fetch(`./v1/chat/completions`, {
@@ -256,10 +257,8 @@
}

const decoder = new TextDecoder();
let fullResponse = '';
let aggregatedContent = '';
let fullReasoningContent = '';
let regularContent = '';
let insideThinkTag = false;
let hasReceivedData = false;
let lastTimings: ChatMessageTimings | undefined;
@@ -277,7 +276,7 @@
if (line.startsWith('data: ')) {
const data = line.slice(6);
if (data === '[DONE]') {
if (!hasReceivedData && fullResponse.length === 0) {
if (!hasReceivedData && aggregatedContent.length === 0) {
const contextError = new Error(
'The request exceeds the available context size. Try increasing the context size or enable context shift.'
);
@@ -286,7 +285,7 @@
return;
}

onComplete?.(regularContent, fullReasoningContent || undefined, lastTimings);
onComplete?.(aggregatedContent, fullReasoningContent || undefined, lastTimings);

return;
}
@@ -310,27 +309,8 @@

if (content) {
hasReceivedData = true;
fullResponse += content;

// Track the regular content before processing this chunk
const regularContentBefore = regularContent;

// Process content character by character to handle think tags
insideThinkTag = this.processContentForThinkTags(
content,
insideThinkTag,
() => {
// Think content is ignored - we don't include it in API requests
},
(regularChunk) => {
regularContent += regularChunk;
}
);

const newRegularContent = regularContent.slice(regularContentBefore.length);
if (newRegularContent) {
onChunk?.(newRegularContent);
}
aggregatedContent += content;
onChunk?.(content);
}

if (reasoningContent) {
@@ -345,7 +325,7 @@
}
}

if (!hasReceivedData && fullResponse.length === 0) {
if (!hasReceivedData && aggregatedContent.length === 0) {
const contextError = new Error(
'The request exceeds the available context size. Try increasing the context size or enable context shift.'
);
@@ -552,51 +532,6 @@
}
}

/**
* Processes content to separate thinking tags from regular content.
* Parses <think> and </think> tags to route content to appropriate handlers.
*
* @param content - The content string to process
* @param currentInsideThinkTag - Current state of whether we're inside a think tag
* @param addThinkContent - Callback to handle content inside think tags
* @param addRegularContent - Callback to handle regular content outside think tags
* @returns Boolean indicating if we're still inside a think tag after processing
* @private
*/
private processContentForThinkTags(
content: string,
currentInsideThinkTag: boolean,
addThinkContent: (chunk: string) => void,
addRegularContent: (chunk: string) => void
): boolean {
let i = 0;
let insideThinkTag = currentInsideThinkTag;

while (i < content.length) {
if (!insideThinkTag && content.substring(i, i + 7) === '<think>') {
insideThinkTag = true;
i += 7; // Skip the <think> tag
continue;
}

if (insideThinkTag && content.substring(i, i + 8) === '</think>') {
insideThinkTag = false;
i += 8; // Skip the </think> tag
continue;
}

if (insideThinkTag) {
addThinkContent(content[i]);
} else {
addRegularContent(content[i]);
}

i++;
}

return insideThinkTag;
}

/**
* Aborts any ongoing chat completion request.
* Cancels the current request and cleans up the abort controller.
@@ -5,7 +5,6 @@ import { config } from '$lib/stores/settings.svelte';
import { filterByLeafNodeId, findLeafNode, findDescendantMessages } from '$lib/utils/branching';
import { browser } from '$app/environment';
import { goto } from '$app/navigation';
import { extractPartialThinking } from '$lib/utils/thinking';
import { toast } from 'svelte-sonner';
import type { ExportedConversations } from '$lib/types/database';
@@ -344,11 +343,9 @@ class ChatStore {
this.currentResponse = streamedContent;

captureModelIfNeeded();

const partialThinking = extractPartialThinking(streamedContent);
const messageIndex = this.findMessageIndex(assistantMessage.id);
this.updateMessageAtIndex(messageIndex, {
content: partialThinking.remainingContent || streamedContent
content: streamedContent
});
},
@@ -696,18 +693,16 @@ class ChatStore {

if (lastMessage && lastMessage.role === 'assistant') {
try {
const partialThinking = extractPartialThinking(this.currentResponse);

const updateData: {
content: string;
thinking?: string;
timings?: ChatMessageTimings;
} = {
content: partialThinking.remainingContent || this.currentResponse
content: this.currentResponse
};

if (partialThinking.thinking) {
updateData.thinking = partialThinking.thinking;
if (lastMessage.thinking?.trim()) {
updateData.thinking = lastMessage.thinking;
}

const lastKnownState = await slotsService.getCurrentState();
@@ -727,7 +722,10 @@

await DatabaseStore.updateMessage(lastMessage.id, updateData);

lastMessage.content = partialThinking.remainingContent || this.currentResponse;
lastMessage.content = this.currentResponse;
if (updateData.thinking !== undefined) {
lastMessage.thinking = updateData.thinking;
}
if (updateData.timings) {
lastMessage.timings = updateData.timings;
}
@@ -1,143 +0,0 @@
/**
* Parses thinking content from a message that may contain <think> tags or [THINK] tags
* Returns an object with thinking content and cleaned message content
* Handles both complete blocks and incomplete blocks (streaming)
* Supports formats: <think>...</think> and [THINK]...[/THINK]
* @param content - The message content to parse
* @returns An object containing the extracted thinking content and the cleaned message content
*/
export function parseThinkingContent(content: string): {
thinking: string | null;
cleanContent: string;
} {
const incompleteThinkMatch = content.includes('<think>') && !content.includes('</think>');
const incompleteThinkBracketMatch = content.includes('[THINK]') && !content.includes('[/THINK]');

if (incompleteThinkMatch) {
const cleanContent = content.split('</think>')?.[1]?.trim();
const thinkingContent = content.split('<think>')?.[1]?.trim();

return {
cleanContent,
thinking: thinkingContent
};
}

if (incompleteThinkBracketMatch) {
const cleanContent = content.split('[/THINK]')?.[1]?.trim();
const thinkingContent = content.split('[THINK]')?.[1]?.trim();

return {
cleanContent,
thinking: thinkingContent
};
}

const completeThinkMatch = content.match(/<think>([\s\S]*?)<\/think>/);
const completeThinkBracketMatch = content.match(/\[THINK\]([\s\S]*?)\[\/THINK\]/);

if (completeThinkMatch) {
const thinkingContent = completeThinkMatch[1]?.trim() ?? '';
const cleanContent = `${content.slice(0, completeThinkMatch.index ?? 0)}${content.slice(
(completeThinkMatch.index ?? 0) + completeThinkMatch[0].length
)}`.trim();

return {
thinking: thinkingContent,
cleanContent
};
}

if (completeThinkBracketMatch) {
const thinkingContent = completeThinkBracketMatch[1]?.trim() ?? '';
const cleanContent = `${content.slice(0, completeThinkBracketMatch.index ?? 0)}${content.slice(
(completeThinkBracketMatch.index ?? 0) + completeThinkBracketMatch[0].length
)}`.trim();

return {
thinking: thinkingContent,
cleanContent
};
}

return {
thinking: null,
cleanContent: content
};
}

/**
* Checks if content contains an opening thinking tag (for streaming)
* Supports both <think> and [THINK] formats
* @param content - The message content to check
* @returns True if the content contains an opening thinking tag
*/
export function hasThinkingStart(content: string): boolean {
return (
content.includes('<think>') ||
content.includes('[THINK]') ||
content.includes('<|channel|>analysis')
);
}

/**
* Checks if content contains a closing thinking tag (for streaming)
* Supports both </think> and [/THINK] formats
* @param content - The message content to check
* @returns True if the content contains a closing thinking tag
*/
export function hasThinkingEnd(content: string): boolean {
return content.includes('</think>') || content.includes('[/THINK]');
}

/**
* Extracts partial thinking content during streaming
* Supports both <think> and [THINK] formats
* Used when we have opening tag but not yet closing tag
* @param content - The message content to extract partial thinking from
* @returns An object containing the extracted partial thinking content and the remaining content
*/
export function extractPartialThinking(content: string): {
thinking: string | null;
remainingContent: string;
} {
const thinkStartIndex = content.indexOf('<think>');
const thinkEndIndex = content.indexOf('</think>');

const bracketStartIndex = content.indexOf('[THINK]');
const bracketEndIndex = content.indexOf('[/THINK]');

const useThinkFormat =
thinkStartIndex !== -1 && (bracketStartIndex === -1 || thinkStartIndex < bracketStartIndex);
const useBracketFormat =
bracketStartIndex !== -1 && (thinkStartIndex === -1 || bracketStartIndex < thinkStartIndex);

if (useThinkFormat) {
if (thinkEndIndex === -1) {
const thinkingStart = thinkStartIndex + '<think>'.length;

return {
thinking: content.substring(thinkingStart),
remainingContent: content.substring(0, thinkStartIndex)
};
}
} else if (useBracketFormat) {
if (bracketEndIndex === -1) {
const thinkingStart = bracketStartIndex + '[THINK]'.length;

return {
thinking: content.substring(thinkingStart),
remainingContent: content.substring(0, bracketStartIndex)
};
}
} else {
return { thinking: null, remainingContent: content };
}

const parsed = parseThinkingContent(content);

return {
thinking: parsed.thinking,
remainingContent: parsed.cleanContent
};
}
@@ -36,6 +36,31 @@
children: []
};

const assistantWithReasoning: DatabaseMessage = {
id: '3',
convId: 'conv-1',
type: 'message',
timestamp: Date.now() - 1000 * 60 * 2,
role: 'assistant',
content: "Here's the concise answer, now that I've thought it through carefully for you.",
parent: '1',
thinking:
"Let's consider the user's question step by step:\\n\\n1. Identify the core problem\\n2. Evaluate relevant information\\n3. Formulate a clear answer\\n\\nFollowing this process ensures the final response stays focused and accurate.",
children: []
};
const rawOutputMessage: DatabaseMessage = {
id: '6',
convId: 'conv-1',
type: 'message',
timestamp: Date.now() - 1000 * 60,
role: 'assistant',
content:
'<|channel|>analysis<|message|>User greeted me. Initiating overcomplicated analysis: Is this a trap? No, just a normal hello. Respond calmly, act like a helpful assistant, and do not start explaining quantum physics again. Confidence 0.73. Engaging socially acceptable greeting protocol...<|end|>Hello there! How can I help you today?',
parent: '1',
thinking: '',
children: []
};

let processingMessage = $state({
id: '4',
convId: 'conv-1',
@@ -59,60 +84,6 @@
thinking: '',
children: []
});

// Message with <think> format thinking content
const thinkTagMessage: DatabaseMessage = {
id: '6',
convId: 'conv-1',
type: 'message',
timestamp: Date.now() - 1000 * 60 * 2,
role: 'assistant',
content:
"<think>\nLet me analyze this step by step:\n\n1. The user is asking about thinking formats\n2. I need to demonstrate the <think> tag format\n3. This content should be displayed in the thinking section\n4. The main response should be separate\n\nThis is a good example of reasoning content.\n</think>\n\nHere's my response after thinking through the problem. The thinking content above should be displayed separately from this main response content.",
parent: '1',
thinking: '',
children: []
};

// Message with [THINK] format thinking content
const thinkBracketMessage: DatabaseMessage = {
id: '7',
convId: 'conv-1',
type: 'message',
timestamp: Date.now() - 1000 * 60 * 1,
role: 'assistant',
content:
'[THINK]\nThis is the DeepSeek-style thinking format:\n\n- Using square brackets instead of angle brackets\n- Should work identically to the <think> format\n- Content parsing should extract this reasoning\n- Display should be the same as <think> format\n\nBoth formats should be supported seamlessly.\n[/THINK]\n\nThis is the main response content that comes after the [THINK] block. The reasoning above should be parsed and displayed in the thinking section.',
parent: '1',
thinking: '',
children: []
};

// Streaming message for <think> format
let streamingThinkMessage = $state({
id: '8',
convId: 'conv-1',
type: 'message',
timestamp: 0, // No timestamp = streaming
role: 'assistant',
content: '',
parent: '1',
thinking: '',
children: []
});

// Streaming message for [THINK] format
let streamingBracketMessage = $state({
id: '9',
convId: 'conv-1',
type: 'message',
timestamp: 0, // No timestamp = streaming
role: 'assistant',
content: '',
parent: '1',
thinking: '',
children: []
});
</script>

<Story
@@ -120,6 +91,10 @@
args={{
message: userMessage
}}
play={async () => {
const { updateConfig } = await import('$lib/stores/settings.svelte');
updateConfig('disableReasoningFormat', false);
}}
/>

<Story
@@ -128,15 +103,45 @@
class: 'max-w-[56rem] w-[calc(100vw-2rem)]',
message: assistantMessage
}}
play={async () => {
const { updateConfig } = await import('$lib/stores/settings.svelte');
updateConfig('disableReasoningFormat', false);
}}
/>

<Story
name="WithThinkingBlock"
name="AssistantWithReasoning"
args={{
class: 'max-w-[56rem] w-[calc(100vw-2rem)]',
message: assistantWithReasoning
}}
play={async () => {
const { updateConfig } = await import('$lib/stores/settings.svelte');
updateConfig('disableReasoningFormat', false);
}}
/>

<Story
name="RawLlmOutput"
args={{
class: 'max-w-[56rem] w-[calc(100vw-2rem)]',
message: rawOutputMessage
}}
play={async () => {
const { updateConfig } = await import('$lib/stores/settings.svelte');
updateConfig('disableReasoningFormat', true);
}}
/>

<Story
name="WithReasoningContent"
args={{
message: streamingMessage
}}
asChild
play={async () => {
const { updateConfig } = await import('$lib/stores/settings.svelte');
updateConfig('disableReasoningFormat', false);
// Phase 1: Stream reasoning content in chunks
let reasoningText =
'I need to think about this carefully. Let me break down the problem:\n\n1. The user is asking for help with something complex\n2. I should provide a thorough and helpful response\n3. I need to consider multiple approaches\n4. The best solution would be to explain step by step\n\nThis approach will ensure clarity and understanding.';
@@ -187,126 +192,16 @@
message: processingMessage
}}
play={async () => {
const { updateConfig } = await import('$lib/stores/settings.svelte');
updateConfig('disableReasoningFormat', false);
// Import the chat store to simulate loading state
const { chatStore } = await import('$lib/stores/chat.svelte');

// Set loading state to true to trigger the processing UI
chatStore.isLoading = true;

// Simulate the processing state hook behavior
// This will show the "Generating..." text and parameter details
await new Promise(resolve => setTimeout(resolve, 100));
await new Promise((resolve) => setTimeout(resolve, 100));
}}
/>

<Story
name="ThinkTagFormat"
args={{
class: 'max-w-[56rem] w-[calc(100vw-2rem)]',
message: thinkTagMessage
}}
/>

<Story
name="ThinkBracketFormat"
args={{
class: 'max-w-[56rem] w-[calc(100vw-2rem)]',
message: thinkBracketMessage
}}
/>

<Story
name="StreamingThinkTag"
args={{
message: streamingThinkMessage
}}
parameters={{
test: {
timeout: 30000
}
}}
asChild
play={async () => {
// Phase 1: Stream <think> reasoning content
const thinkingContent =
'Let me work through this problem systematically:\n\n1. First, I need to understand what the user is asking\n2. Then I should consider different approaches\n3. I need to evaluate the pros and cons\n4. Finally, I should provide a clear recommendation\n\nThis step-by-step approach will ensure accuracy.';

let currentContent = '<think>\n';
streamingThinkMessage.content = currentContent;

for (let i = 0; i < thinkingContent.length; i++) {
currentContent += thinkingContent[i];
streamingThinkMessage.content = currentContent;
await new Promise((resolve) => setTimeout(resolve, 5));
}

// Close the thinking block
currentContent += '\n</think>\n\n';
streamingThinkMessage.content = currentContent;
await new Promise((resolve) => setTimeout(resolve, 200));

// Phase 2: Stream main response content
const responseContent =
"Based on my analysis above, here's the solution:\n\n**Key Points:**\n- The approach should be systematic\n- We need to consider all factors\n- Implementation should be step-by-step\n\nThis ensures the best possible outcome.";

for (let i = 0; i < responseContent.length; i++) {
currentContent += responseContent[i];
streamingThinkMessage.content = currentContent;
await new Promise((resolve) => setTimeout(resolve, 10));
}

streamingThinkMessage.timestamp = Date.now();
}}
>
<div class="w-[56rem]">
<ChatMessage message={streamingThinkMessage} />
</div>
</Story>

<Story
name="StreamingThinkBracket"
args={{
message: streamingBracketMessage
}}
parameters={{
test: {
timeout: 30000
}
}}
asChild
play={async () => {
// Phase 1: Stream [THINK] reasoning content
const thinkingContent =
'Using the DeepSeek format now:\n\n- This demonstrates the [THINK] bracket format\n- Should parse identically to <think> tags\n- The UI should display this in the thinking section\n- Main content should be separate\n\nBoth formats provide the same functionality.';

let currentContent = '[THINK]\n';
streamingBracketMessage.content = currentContent;

for (let i = 0; i < thinkingContent.length; i++) {
currentContent += thinkingContent[i];
streamingBracketMessage.content = currentContent;
await new Promise((resolve) => setTimeout(resolve, 5));
}

// Close the thinking block
currentContent += '\n[/THINK]\n\n';
streamingBracketMessage.content = currentContent;
await new Promise((resolve) => setTimeout(resolve, 200));

// Phase 2: Stream main response content
const responseContent =
"Here's my response after using the [THINK] format:\n\n**Observations:**\n- Both <think> and [THINK] formats work seamlessly\n- The parsing logic handles both cases\n- UI display is consistent across formats\n\nThis demonstrates the enhanced thinking content support.";

for (let i = 0; i < responseContent.length; i++) {
currentContent += responseContent[i];
streamingBracketMessage.content = currentContent;
await new Promise((resolve) => setTimeout(resolve, 10));
}

streamingBracketMessage.timestamp = Date.now();
}}
>
<div class="w-[56rem]">
<ChatMessage message={streamingBracketMessage} />
</div>
</Story>