mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2025-10-27 08:21:30 +00:00
chat : DeepSeek V3.1 reasoning and tool calling support (OpenAI Style) (#15533)
* Add DeepSeek V3.1 thinking mode support
  - Added COMMON_CHAT_FORMAT_DEEPSEEK_V3_1 enum value
  - Created common_chat_params_init_deepseek_v3_1() function (currently uses R1 implementation)
  - Created common_chat_parse_deepseek_v3_1() function that handles V3.1 thinking format:
    - Extracts reasoning content before '</think>' tag into reasoning_content
    - Extracts regular content after '</think>' tag into content
    - No opening '<think>' tag in V3.1 format
  - Added detection logic for V3.1 templates based on pattern: `message['prefix'] is defined and message['prefix'] and thinking`
  - Added V3.1 case to parsing switch statement

  This addresses the issue where V3.1 outputs reasoning content followed by '</think>' and then regular content without the opening '<think>' tag.
* Another attempt at V3.1 non-thinking
* Fix test, but it's not asserting anything.
* Ignore vim swap files in tests dir
* Update the test
* Try using try_find_literal instead of regex
* passing test
* Revert "Try using try_find_literal instead of regex"
This reverts commit c50d887ec2.
* Remove unnecessary change
* Remove comment
* Add code to handle non-thinking mode.
* Try to set message['prefix'] when thinking is enabled.
* This fixes reasoning, but breaks normal content. We need state in the
chat parser.
* DeepSeek V3.1 thinking is now the default. Disable with `--reasoning-budget 0`.
* Simplify (DeepSeek V3.1 reasoning)
* Fix sign inversion bug
* Add some tool calling code (not working).
* Tool calls working in non-reasoning mode.
* Attempt a unit test for tool call parsing.
* Passing test
* Add tests for both happy path and broken fenced DeepSeek V3.1 tool call variants.
* Passing DeepSeek V3.1 tool call tests, but model is not working.
* Revert assistant response prefill change. Not my monkeys.
* Add fenced_thinking unit test variant. Passes, but thinking tool calling
still isn't working for some reason.
* Tests pass in reasoning mode. Also e2e tool test passes.
* Make a copy of the parse_json_tool_calls function for deepseek-v3.1 so
as to not accidentally introduce regressions.
* Fix thinking_forced_open logic. Tool calling is broken. Need to add another
test case.
* That's what I get for cargo culting a newline.
* Add multi tool call test for deepseek v3.1 non-reasoning
* Move test, remove .gitignore change
* Place deepseek-v3.1 reasoning test directly into existing reasoning
function per CISC's request.
* Address whitespace CI failure.
* Merge two assert_equals per CISC's request.
* Add DeepSeek-V3.1 tests to tests/test-chat.cpp per CISC's request.
* Merge deepseek V3.1 and regular parse_json_tool_calls() function
behaviors by adding optional update_cursor argument.
* Update tests/test-chat-parser.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-chat-parser.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-chat-parser.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-chat-parser.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-chat-parser.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-chat-parser.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-chat-parser.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-chat-parser.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-chat-parser.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* DeepSeek V3.1 fix reasoning_format none
* Strip grammar down to strictly what we expect based on model card. Throw
out parts we cargo culted from R1 that don't make sense.
* Update tests/test-chat-parser.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* DeepSeek V3.1 - Add edge case where thinking is forced open, there is
tool calling in the reasoning content, but then the model just stops the
output without closing the </think> tag, so it's not a partial. In this
case, use the tool call in the reasoning content.
* DeepSeek V3.1 - simplify update_cursor
* Update common/chat.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update common/chat.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update common/chat.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Fix indent
---------
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
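
The V3.1 thinking format described in the first item of this message (reasoning, then `</think>`, then regular content, with no opening `<think>` tag) can be sketched roughly as below. This is only an illustration, not the code in common/chat.cpp: `split_result`, `split_v3_1_output()` and `run_examples()` are hypothetical names, and the handling of output that has no `</think>` at all is an assumption that ignores tool calls and partial output.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for this sketch; not llama.cpp's common_chat_msg.
struct split_result {
    std::string reasoning;
    std::string content;
};

// Split raw model output at the first "</think>".
// thinking_forced_open mirrors the flag of the same name used in the tests:
// when true, the prompt already opened the thinking block, so output without
// a closing tag is treated as reasoning (an assumption of this sketch).
static split_result split_v3_1_output(const std::string & output, bool thinking_forced_open) {
    static const std::string end_tag = "</think>";
    const size_t pos = output.find(end_tag);
    if (pos != std::string::npos) {
        // Text before the tag is reasoning, text after it is regular content.
        return { output.substr(0, pos), output.substr(pos + end_tag.size()) };
    }
    if (thinking_forced_open) {
        return { output, "" };
    }
    return { "", output };
}

// A few checks modelled on the inputs used in the test diff.
static void run_examples() {
    const split_result a = split_v3_1_output("I'm\nthinking</think>Hello, world!\nWhat's up?", true);
    assert(a.reasoning == "I'm\nthinking");
    assert(a.content == "Hello, world!\nWhat's up?");

    const split_result b = split_v3_1_output("CONTENT", false);
    assert(b.reasoning.empty());
    assert(b.content == "CONTENT");
}
```

The actual parser additionally honours `reasoning_format`, streaming (`is_partial`) and tool call blocks, which is why the tests in the diff cover many more variants than this sketch does.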
@@ -1757,7 +1757,6 @@ static void test_template_output_parsers() {
                /* is_partial= */ false,
                {COMMON_CHAT_FORMAT_SEED_OSS}));
    }

    {
        auto tmpls = read_templates("models/templates/NVIDIA-Nemotron-Nano-v2.jinja");
        std::vector<std::string> end_tokens{ "<SPECIAL_12>" };
@@ -1828,6 +1827,142 @@ static void test_template_output_parsers() {
            /* expect_grammar_triggered= */ true
        );
    }
    {
        auto tmpls = read_templates("models/templates/deepseek-ai-DeepSeek-V3.1.jinja");
        std::vector<std::string> end_tokens{ "<|end▁of▁sentence|>" };

        for (const auto & inputs : { inputs_no_tools, inputs_tools }) {
            auto params = common_chat_templates_apply(tmpls.get(), inputs);
            assert_equals(COMMON_CHAT_FORMAT_DEEPSEEK_V3_1, params.format);
            assert_equals(true, params.thinking_forced_open);
        }

        test_templates(tmpls.get(), end_tokens, message_assist, tools, "</think>Hello, world!\nWhat's up?", /* expect_grammar_triggered= */ false);
        test_templates(tmpls.get(), end_tokens, message_assist_thoughts, tools, "</think>Hello, world!\nWhat's up?", /* expect_grammar_triggered= */ false);
        assert_msg_equals(
            simple_assist_msg("Hello, world!\nWhat's up?", "I'm\nthinking"),
            common_chat_parse(
                "I'm\nthinking</think>Hello, world!\nWhat's up?",
                /* is_partial= */ false,
                {
                    COMMON_CHAT_FORMAT_DEEPSEEK_V3_1,
                    /* .reasoning_format = */ COMMON_REASONING_FORMAT_DEEPSEEK,
                    /* .reasoning_in_content = */ false,
                    /* .thinking_forced_open = */ true,
                }));
        // variant: thinking forced open, reasoning_format none
        assert_msg_equals(
            simple_assist_msg("REASONING</think>ok", ""),
            common_chat_parse(
                "REASONING</think>ok",
                /* is_partial= */ false,
                {
                    COMMON_CHAT_FORMAT_DEEPSEEK_V3_1,
                    /* .reasoning_format = */ COMMON_REASONING_FORMAT_NONE,
                    /* .reasoning_in_content = */ false,
                    /* .thinking_forced_open = */ true,
                    /* .parse_tool_calls = */ true,
                }));
        // variant: happy path for when it works as the model card says it should
        assert_msg_equals(
            simple_assist_msg("", "", "get_time", "{\"city\":\"Tokyo\"}"),
            common_chat_parse(
                "<|tool▁calls▁begin|><|tool▁call▁begin|>get_time<|tool▁sep|>{\"city\": \"Tokyo\"}<|tool▁call▁end|><|tool▁calls▁end|>",
                /* is_partial= */ false,
                {
                    COMMON_CHAT_FORMAT_DEEPSEEK_V3_1,
                    /* .reasoning_format = */ COMMON_REASONING_FORMAT_DEEPSEEK,
                    /* .reasoning_in_content = */ false,
                    /* .thinking_forced_open = */ false,
                    /* .parse_tool_calls = */ true,
                }));
        // variant: simple + thinking open
        assert_msg_equals(
            simple_assist_msg("", "REASONING", "get_time", "{\"city\":\"Tokyo\"}"),
            common_chat_parse(
                "REASONING</think><|tool▁calls▁begin|><|tool▁call▁begin|>get_time<|tool▁sep|>{\"city\": \"Tokyo\"}<|tool▁call▁end|><|tool▁calls▁end|>",
                /* is_partial= */ false,
                {
                    COMMON_CHAT_FORMAT_DEEPSEEK_V3_1,
                    /* .reasoning_format = */ COMMON_REASONING_FORMAT_DEEPSEEK,
                    /* .reasoning_in_content = */ false,
                    /* .thinking_forced_open = */ true,
                    /* .parse_tool_calls = */ true,
                }));
        // variant: simple + multiple tool calls
        common_chat_msg message_assist_multiple_calls;
        message_assist_multiple_calls.role = "assistant";
        message_assist_multiple_calls.content = "CONTENT";
        message_assist_multiple_calls.tool_calls.push_back({"get_time", "{\"city\":\"Paris\"}", ""});
        message_assist_multiple_calls.tool_calls.push_back({"get_weather", "{\"city\":\"Paris\"}", ""});
        assert_msg_equals(
            message_assist_multiple_calls,
            common_chat_parse(
                "CONTENT<|tool▁calls▁begin|><|tool▁call▁begin|>get_time<|tool▁sep|>{\"city\": \"Paris\"}<|tool▁call▁end|><|tool▁call▁begin|>get_weather<|tool▁sep|>{\"city\": \"Paris\"}<|tool▁call▁end|><|tool▁calls▁end|>",
                /* is_partial= */ false,
                {
                    COMMON_CHAT_FORMAT_DEEPSEEK_V3_1,
                    /* .reasoning_format = */ COMMON_REASONING_FORMAT_DEEPSEEK,
                    /* .reasoning_in_content = */ false,
                    /* .thinking_forced_open = */ false,
                    /* .parse_tool_calls = */ true,
                }));
        // variant: thinking forced open + tool call in reasoning content
        assert_msg_equals(
            simple_assist_msg("", "REASONING<|tool▁calls▁begin|><|tool▁call▁begin|>get_time2<|tool▁sep|>{\"city\": \"Tokyo2\"}<|tool▁call▁end|><|tool▁calls▁end|>REASONING", "get_time", "{\"city\":\"Tokyo\"}"),
            common_chat_parse(
                "REASONING<|tool▁calls▁begin|><|tool▁call▁begin|>get_time2<|tool▁sep|>{\"city\": \"Tokyo2\"}<|tool▁call▁end|><|tool▁calls▁end|>REASONING</think><|tool▁calls▁begin|><|tool▁call▁begin|>get_time<|tool▁sep|>{\"city\": \"Tokyo\"}<|tool▁call▁end|><|tool▁calls▁end|>",
                /* is_partial= */ false,
                {
                    COMMON_CHAT_FORMAT_DEEPSEEK_V3_1,
                    /* .reasoning_format = */ COMMON_REASONING_FORMAT_DEEPSEEK,
                    /* .reasoning_in_content = */ false,
                    /* .thinking_forced_open = */ true,
                    /* .parse_tool_calls = */ true,
                }));
        // variant: thinking forced open + tool call in reasoning content + no closing think + not partial
        // This is a bit of a fine tuning issue on the model's part IMO. It really should not be attempting
        // to make tool calls in reasoning content according to the model card, but it does sometimes, so
        // add the reasoning content as regular content and parse the tool calls.
        assert_msg_equals(
            simple_assist_msg("REASONING", "", "get_time", "{\"city\":\"Tokyo\"}"),
            common_chat_parse(
                "REASONING<|tool▁calls▁begin|><|tool▁call▁begin|>get_time<|tool▁sep|>{\"city\": \"Tokyo\"}<|tool▁call▁end|><|tool▁calls▁end|>",
                /* is_partial= */ false,
                {
                    COMMON_CHAT_FORMAT_DEEPSEEK_V3_1,
                    /* .reasoning_format = */ COMMON_REASONING_FORMAT_DEEPSEEK,
                    /* .reasoning_in_content = */ false,
                    /* .thinking_forced_open = */ true,
                    /* .parse_tool_calls = */ true,
                }));
        // variant: thinking forced open + tool call in reasoning content + no closing think + partial
        assert_msg_equals(
            simple_assist_msg("", "REASONING<|tool▁calls▁begin|><|tool▁call▁begin|>get_time<|tool▁sep|>{\"city\": \"Tokyo\"}<|tool▁call▁end|><|tool▁calls▁end|>", "", ""),
            common_chat_parse(
                "REASONING<|tool▁calls▁begin|><|tool▁call▁begin|>get_time<|tool▁sep|>{\"city\": \"Tokyo\"}<|tool▁call▁end|><|tool▁calls▁end|>",
                /* is_partial= */ true,
                {
                    COMMON_CHAT_FORMAT_DEEPSEEK_V3_1,
                    /* .reasoning_format = */ COMMON_REASONING_FORMAT_DEEPSEEK,
                    /* .reasoning_in_content = */ false,
                    /* .thinking_forced_open = */ true,
                    /* .parse_tool_calls = */ true,
                }));
        // variant: thinking not forced open + missing reasoning + no tool calls
        assert_msg_equals(
            simple_assist_msg("CONTENT", ""),
            common_chat_parse(
                "CONTENT",
                /* is_partial= */ false,
                {
                    COMMON_CHAT_FORMAT_DEEPSEEK_V3_1,
                    /* .reasoning_format = */ COMMON_REASONING_FORMAT_DEEPSEEK,
                    /* .reasoning_in_content = */ false,
                    /* .thinking_forced_open = */ false,
                    /* .parse_tool_calls = */ true,
                }));
    }
}

static void test_msg_diffs_compute() {