Georgi Gerganov
f95b04a21c
model : fix order kvq -> qkv
ggml-ci
2025-02-19 18:52:20 +02:00
Georgi Gerganov
2eacb4c1bf
graph : simplify attention api
ggml-ci
2025-02-19 18:43:49 +02:00
Georgi Gerganov
e17e4b72d1
context : add llama_context_recurrent
ggml-ci
2025-02-19 16:07:27 +02:00
Georgi Gerganov
5f11a5502a
kv-cache : remove llama_kv_cache_i
2025-02-19 14:36:27 +02:00
Georgi Gerganov
f5cedbcaaa
kv-cache : prepare for abstraction
ggml-ci
2025-02-18 21:28:58 +02:00
Georgi Gerganov
2bffc2d514
model : pass llama_graph_i as ptr
ggml-ci
2025-02-18 14:57:26 +02:00
Georgi Gerganov
9e50456e19
context : minor simplify
ggml-ci
2025-02-18 14:53:02 +02:00
Georgi Gerganov
befe14f06f
llama : reorder encode/decode in sources
2025-02-18 14:47:53 +02:00
Georgi Gerganov
bc6f187e9c
cont : use returned tensors from the graph build
ggml-ci
2025-02-18 14:24:17 +02:00
Georgi Gerganov
172f61690c
cont : return important tensors
ggml-ci
2025-02-18 13:48:43 +02:00
Georgi Gerganov
c23590319a
graph : add llama_graph_result
ggml-ci
2025-02-18 13:48:21 +02:00
Georgi Gerganov
f0d3ff2388
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-18 10:14:37 +02:00
Georgi Gerganov
68ff663a04
repo : update links to new url (#11886)
* repo : update links to new url
ggml-ci
* cont : more urls
ggml-ci
2025-02-15 16:40:57 +02:00
Georgi Gerganov
1d801d27b9
graph : update attn/kv_self names
2025-02-14 17:22:55 +02:00
Georgi Gerganov
828064564c
context : move common inputs to base class
ggml-ci
2025-02-14 16:48:21 +02:00
Georgi Gerganov
d5e8e1a2ba
context : remove batch_manager
ggml-ci
2025-02-14 16:10:55 +02:00
Georgi Gerganov
131743ff4f
context : abstract constructor and init
ggml-ci
2025-02-13 17:17:51 +02:00
Georgi Gerganov
ed3cb55abe
context : abstract input
ggml-ci
2025-02-13 15:53:15 +02:00
Georgi Gerganov
107d1e2c32
context : move output functionality to base class
ggml-ci
2025-02-13 15:42:14 +02:00
Georgi Gerganov
e08f38df69
context : minor cleanup
ggml-ci
2025-02-13 12:50:53 +02:00
Georgi Gerganov
f7c7757bab
context : abstract state read/write
ggml-ci
2025-02-13 12:37:28 +02:00
Georgi Gerganov
3a504d9a0b
llama : introduce llama_io interfaces
ggml-ci
2025-02-13 12:25:54 +02:00
Olivier Chafik
c7f460ab88
server: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command R7B & DeepSeek R1) unless --reasoning-format none (#11607)
* extract & return thoughts in reasoning_content field (unless --reasoning-format none) for DeepSeek R1 & Command R7B
* tool-calls: add deepseek r1 template (models/templates/llama-cpp-deepseek-r1.jinja) + accommodate broken official template
* tool-calls: accommodate the variety of wrong tool-call opening tags that both the R1 Qwen 32B and 7B distills like to spit out
* server/oai: ensure content is null when there are tool calls, and that reasoning_content appears before content for readability
* tool-calls: add DeepSeek R1 Qwen distills to server/README.md & server tests
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-13 10:05:16 +00:00
Vinesh Janarthanan
27e8a23300
sampling: add Top-nσ sampler (#11223)
* initial sampling changes
* completed top-nσ sampler implementation
* apply parameter to llama-cli only
* updated readme
* added tests and fixed nsigma impl
* cleaned up pr
* format
* format
* format
* removed commented tests
* cleanup pr and remove explicit floats
* added top-k sampler to improve performance
* changed sigma to float
* fixed string format to float
* Update src/llama-sampling.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update common/sampling.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update src/llama-sampling.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update src/llama-sampling.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update src/llama-sampling.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update src/llama-sampling.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* added llama_sampler_init
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-13 08:45:57 +02:00
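
[editor's note] For reference, a minimal sketch of the top-nσ rule the sampler above implements, assuming the standard formulation (keep only tokens whose logit lies within n standard deviations of the maximum logit). This is an illustration only, not the code in src/llama-sampling.cpp:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// top-nσ: keep token i iff logits[i] >= max(logits) - n * stddev(logits)
std::vector<size_t> top_n_sigma(const std::vector<float> & logits, float n) {
    const float max_l = *std::max_element(logits.begin(), logits.end());
    double sum = 0.0, sum2 = 0.0;
    for (float l : logits) { sum += l; sum2 += (double) l * l; }
    const double mean  = sum / logits.size();
    const double sigma = std::sqrt(std::max(0.0, sum2 / logits.size() - mean * mean));
    std::vector<size_t> kept;
    for (size_t i = 0; i < logits.size(); ++i) {
        if (logits[i] >= max_l - n * (float) sigma) {
            kept.push_back(i); // survives the filter
        }
    }
    return kept;
}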
Daniel Bevenius
3e69319772
llama : update llama_decode_internal ref [no ci] (#11840)
This commit updates the comment in llama_kv_cache.h to reflect the
change of the function name from llama_decode_internal to
llama_decode_impl.
2025-02-13 08:07:51 +02:00
Georgi Gerganov
fbe6a07256
context : rename to llama_context_kv_self
2025-02-12 17:16:44 +02:00
Georgi Gerganov
6ee86e5e0f
graph : restore ubatch in build_cb
ggml-ci
2025-02-12 16:29:15 +02:00
bandoti
fef0cbeadf
cleanup: fix compile warnings associated with gnu_printf (#11811)
2025-02-12 10:06:53 -04:00
Georgi Gerganov
f63aeecce6
llama : models now build their graphs using llama_graph_i
ggml-ci
2025-02-12 15:08:40 +02:00
Georgi Gerganov
0ab50f1bbb
context : prepare llama_model graph build
ggml-ci
2025-02-12 14:09:55 +02:00
Georgi Gerganov
e633dc171a
context : introduce llama_graph_i
ggml-ci
2025-02-12 13:49:44 +02:00
Georgi Gerganov
5eae8e5183
context : move build_rope_factors to base class
ggml-ci
2025-02-12 13:32:02 +02:00
Georgi Gerganov
d146a14f77
context : minor naming fix
2025-02-12 12:41:36 +02:00
Georgi Gerganov
8da7f612b7
context : improve llama_context encapsulation
ggml-ci
2025-02-12 12:15:04 +02:00
Georgi Gerganov
b52b79b048
context : move encode/decode to llama-context.cpp
2025-02-12 11:23:38 +02:00
Daniel Bevenius
369be5598a
llama : fix typo in llama-grammar.h [no ci] (#11816)
2025-02-12 09:40:01 +02:00
Georgi Gerganov
02ef4be975
context : initial abstraction
ggml-ci
2025-02-11 22:27:21 +02:00
Wilken Gottwalt
19b392d58d
llama-mmap: fix missing include (#11796)
Technically, the fixed-width types come only from the iostream and cstdint/stdint.h headers; the memory and vector headers are not required to provide them. In GCC 15 the standard library headers were cleaned up, so the proper header, cstdint, must be included explicitly.
src/llama-mmap.h:26:5: error: ‘uint32_t’ does not name a type
26 | uint32_t read_u32() const;
| ^~~~~~~~
2025-02-10 20:58:18 +02:00
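
[editor's note] A minimal sketch of the shape of that fix, with a hypothetical struct name standing in for the one in src/llama-mmap.h; the point is to include <cstdint> explicitly rather than relying on <memory> or <vector> to pull it in:

#include <cstdint> // provides uint32_t; no longer implied by <memory>/<vector> in GCC 15

struct file_reader {           // hypothetical stand-in for the class in src/llama-mmap.h
    uint32_t read_u32() const; // now compiles: uint32_t names a type
};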
Georgi Gerganov
2cd8a903c8
context : make output functions members
ggml-ci
2025-02-10 17:01:27 +02:00
Georgi Gerganov
d1d8d53008
bman : remove ubatch member
ggml-ci
2025-02-10 16:50:14 +02:00
Georgi Gerganov
ef358ee78f
context : add decode/encode
ggml-ci
2025-02-10 16:14:13 +02:00
Georgi Gerganov
f9971ef2e1
llama : dedup reserve code
2025-02-10 14:59:51 +02:00
Georgi Gerganov
972f91c7d7
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-10 14:45:54 +02:00
Georgi Gerganov
bdcf8b6a56
cont : fix mmap flag print (#11699)
2025-02-08 16:49:38 +02:00
Georgi Gerganov
ed926d8833
llama : fix defrag logic (#11707)
* llama : fix defrag logic
ggml-ci
* cont : better logic
ggml-ci
* cont : clamp fragmentation to 0.0
ggml-ci
2025-02-07 16:05:34 +02:00
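
[editor's note] A hedged illustration of the clamping in the last bullet, with hypothetical variable names (the real condition lives in the KV-cache defrag check): a naive fragmentation ratio can come out negative, so it is clamped to 0.0 before being compared against the defrag threshold:

#include <algorithm>

// hypothetical helper: n_used = occupied KV cells, n_total = total cells in the cache
static float kv_fragmentation(int n_used, int n_total) {
    const float frag = 1.0f - (float) n_used / (float) n_total;
    return std::max(0.0f, frag); // clamp fragmentation to 0.0, as in the fix above
}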
Christian Fillion
2d219b389e
vocab : ignore invalid UTF-8 input in the BPE tokenizer (#11729)
Silently insert U+FFFD(s) (the Unicode replacement character) instead, until the
next valid codepoint can be found.
This fixes `llama_tokenize` throwing an exception across the C API boundary
or libllama's module boundary (the caller's runtime might be incompatible!).
Returning a proper error code might be desirable; however, the signature
of `llama_tokenize` doesn't allow it, as all return values already have
an existing meaning.
2025-02-07 15:55:47 +02:00
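
[editor's note] A simplified sketch of that replacement behavior (not the actual tokenizer code): scan the bytes, copy valid UTF-8 sequences through, and emit U+FFFD for anything else until a valid codepoint resumes:

#include <cstddef>
#include <string>

// replace invalid UTF-8 with U+FFFD (EF BF BD); simplified — a full decoder
// also rejects overlong encodings, surrogates, and out-of-range codepoints
static std::string sanitize_utf8(const std::string & in) {
    std::string out;
    for (size_t i = 0; i < in.size(); ) {
        const unsigned char c = in[i];
        const size_t len = c < 0x80 ? 1 :
            (c & 0xE0) == 0xC0 ? 2 :
            (c & 0xF0) == 0xE0 ? 3 :
            (c & 0xF8) == 0xF0 ? 4 : 0;
        bool ok = len > 0 && i + len <= in.size();
        for (size_t j = 1; ok && j < len; ++j) {
            ok = (static_cast<unsigned char>(in[i + j]) & 0xC0) == 0x80; // continuation byte?
        }
        if (ok) { out.append(in, i, len); i += len; }
        else    { out += "\xEF\xBF\xBD"; i += 1; } // U+FFFD, resync at the next byte
    }
    return out;
}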
magicse
333820d749
llama : fix progress dots (#11730)
* Update llama.cpp
Display progress dots in the terminal; without this change, the dots were not shown while loading the model from a file.
* Update llama.cpp
removed trailing spaces
2025-02-07 15:48:47 +02:00
Christian Fillion
7ee953a64a
llama : add llama_sampler_init for safe usage of llama_sampler_free (#11727)
The C API in llama.h claims users can implement `llama_sampler_i` to
create custom `llama_sampler`. The sampler chain takes ownership and
calls `llama_sampler_free` on them. However, `llama_sampler_free` is
hard-coded to use `delete`. This is undefined behavior if the object
wasn't also allocated via `new` from libllama's C++ runtime. Callers
in C and C-compatible languages do not use C++'s `new` operator. C++
callers may not be sharing the same heap as libllama.
2025-02-07 11:33:27 +02:00
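
[editor's note] A hedged sketch of the pattern this enables, assuming the llama_sampler_i and llama_sampler_init declarations from llama.h; the callback bodies and names here are illustrative only:

#include "llama.h"

// minimal custom sampler built via llama_sampler_init, so the wrapper object is
// allocated (and later freed) inside libllama rather than by the caller
static const char * my_name(const struct llama_sampler * /*smpl*/) { return "my-sampler"; }

static void my_apply(struct llama_sampler * /*smpl*/, llama_token_data_array * cur_p) {
    // illustrative: select the highest-logit candidate
    size_t best = 0;
    for (size_t i = 1; i < cur_p->size; ++i) {
        if (cur_p->data[i].logit > cur_p->data[best].logit) {
            best = i;
        }
    }
    cur_p->selected = (int64_t) best;
}

static const struct llama_sampler_i my_iface = {
    /*.name   =*/ my_name,
    /*.accept =*/ nullptr, // optional callbacks may be left null
    /*.apply  =*/ my_apply,
    /*.reset  =*/ nullptr,
    /*.clone  =*/ nullptr,
    /*.free   =*/ nullptr, // no custom context of ours to release
};

int main() {
    struct llama_sampler * smpl = llama_sampler_init(&my_iface, /*ctx=*/nullptr);
    llama_sampler_free(smpl); // safe: allocation and free both happen inside libllama
}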
tv1wnd
855cd0734a
llama : fix old glm4 models (#11670)
2025-02-06 22:48:51 +01:00
Georgi Gerganov
b15fede7a9
kv-cache : fix defrag condition
ggml-ci
2025-02-06 14:35:19 +02:00