Georgi Gerganov
952feedfca
context : disable encoder embd tensor for now
ggml-ci
2025-02-27 15:07:10 +02:00
Georgi Gerganov
4efe989886
context : pass embeddings tensor from encoder to decoder
ggml-ci
2025-02-25 16:11:17 +02:00
Georgi Gerganov
e2b3294f2c
context : fix enc-dec state save/load
ggml-ci
2025-02-25 12:14:34 +02:00
Georgi Gerganov
e5bc5f8e02
context : enc-dec is now working
ggml-ci
2025-02-25 12:10:34 +02:00
Georgi Gerganov
be58e30017
enc-dec : compose wip
ggml-ci
2025-02-24 18:12:24 +02:00
Georgi Gerganov
9cd78f11a1
context : explicit llama_context_i abstract interface
ggml-ci
2025-02-24 13:38:11 +02:00
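The commit above makes the context an explicit abstract interface. As a rough illustration only, here is a minimal sketch of what such an interface could look like; the member names (llama_batch_stub, n_ctx, decode) are assumptions for the sketch, not the actual llama.cpp declarations:

```cpp
#include <cstdint>

struct llama_batch_stub { int32_t n_tokens; }; // stand-in for the real llama_batch

// hypothetical abstract interface: pure virtuals force each concrete
// context (kv-self, recurrent, cache-less, enc-dec) to provide its own impl
struct llama_context_i {
    virtual ~llama_context_i() = default;

    virtual uint32_t n_ctx() const = 0;
    virtual int decode(const llama_batch_stub & batch) = 0;
};

// a concrete context then derives from the interface
struct llama_context_kv_self_sketch : llama_context_i {
    uint32_t n_ctx() const override { return 4096; }
    int decode(const llama_batch_stub & /*batch*/) override { return 0; }
};
```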
Georgi Gerganov
4a1054b552
context : reuse built_attn_mha
ggml-ci
2025-02-24 11:29:52 +02:00
Georgi Gerganov
a5a85a3bc0
context : fix recurrent reserve
ggml-ci
2025-02-24 08:59:12 +02:00
Georgi Gerganov
0699a44c83
context : remove redundant virtual, protected -> private
ggml-ci
2025-02-23 20:02:11 +02:00
Georgi Gerganov
6378112cb5
graph : remove the build_kv_... API from llama_graph_i
ggml-ci
2025-02-23 19:39:22 +02:00
Georgi Gerganov
372fa3a894
cont : enc should work now, next is dec
ggml-ci
2025-02-23 12:20:23 +02:00
Georgi Gerganov
f5e80208c5
wip enc-dec
2025-02-21 19:17:47 +02:00
Georgi Gerganov
c4c0a4d13c
Merge branch 'master' into gg/llama-kv-cache
2025-02-21 19:14:07 +02:00
Georgi Gerganov
51f311e057
llama : skip loading unused tensors (#12004)
* llama : assign unknown/unused tensors to host buffer type
ggml-ci
* llama : skip unused tensors
ggml-ci
2025-02-21 18:33:18 +02:00
Georgi Gerganov
3753b30d65
context : fix n_outputs init
ggml-ci
2025-02-21 15:53:26 +02:00
Georgi Gerganov
f588a70da3
context : wrap input tensors in struct
ggml-ci
2025-02-21 15:09:28 +02:00
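A small illustration of the "wrap input tensors in struct" idea from the entry above: the per-batch input tensors are grouped into one struct instead of living as loose members on the context. All names below are hypothetical, not the actual llama.cpp code:

```cpp
struct ggml_tensor; // opaque ggml type; pointer members only

// hypothetical grouping of the per-batch input tensors
struct llama_input_tensors_sketch {
    ggml_tensor * tokens  = nullptr; // I32 [n_batch] token ids
    ggml_tensor * embd    = nullptr; // F32 [n_embd, n_batch] embedding input
    ggml_tensor * pos     = nullptr; // I32 [n_batch] positions
    ggml_tensor * kq_mask = nullptr; // attention mask built per batch
};
```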
Georgi Gerganov
ebf1bdf97b
context : add logs
ggml-ci
2025-02-21 14:35:23 +02:00
Georgi Gerganov
548c230dff
graph : remove worst_case from the API
ggml-ci
2025-02-21 13:29:25 +02:00
Georgi Gerganov
2645a7d9a9
context : add save/load for recurrent context
ggml-ci
2025-02-21 10:28:42 +02:00
Georgi Gerganov
08011c2ca1
context : add llama_kv_cache_recurrent prototype
ggml-ci
2025-02-20 20:55:13 +02:00
Georgi Gerganov
ad870c49f4
context : fix causal input for cache-less case
ggml-ci
2025-02-20 20:01:02 +02:00
Georgi Gerganov
b1554be1d7
context : add cache-less llama_context
ggml-ci
2025-02-20 18:30:04 +02:00
Georgi Gerganov
072280ea6b
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-20 14:26:43 +02:00
Georgi Gerganov
f95b04a21c
model : fix order kvq -> qkv
ggml-ci
2025-02-19 18:52:20 +02:00
Georgi Gerganov
2eacb4c1bf
graph : simplify attention api
ggml-ci
2025-02-19 18:43:49 +02:00
Georgi Gerganov
e17e4b72d1
context : add llama_context_recurrent
ggml-ci
2025-02-19 16:07:27 +02:00
Georgi Gerganov
5f11a5502a
kv-cache : remove llama_kv_cache_i
2025-02-19 14:36:27 +02:00
Daniel Bevenius
9626d9351a
llama : fix indentation in llama-grammar [no ci] (#11943)
This commit adjusts the indentation for the functions `parse_sequence`
and `parse_rule` in src/llama-grammar.cpp.
The motivation is consistency and improved readability.
2025-02-19 06:16:23 +01:00
Georgi Gerganov
f5cedbcaaa
kv-cache : prepare for abstraction
ggml-ci
2025-02-18 21:28:58 +02:00
Georgi Gerganov
2bffc2d514
model : pass llama_graph_i as ptr
ggml-ci
2025-02-18 14:57:26 +02:00
Georgi Gerganov
9e50456e19
context : minor simplify
ggml-ci
2025-02-18 14:53:02 +02:00
Georgi Gerganov
befe14f06f
llama : reorder encode/decode in sources
2025-02-18 14:47:53 +02:00
Georgi Gerganov
bc6f187e9c
cont : use returned tensors from the graph build
ggml-ci
2025-02-18 14:24:17 +02:00
Georgi Gerganov
172f61690c
cont : return important tensors
ggml-ci
2025-02-18 13:48:43 +02:00
Georgi Gerganov
c23590319a
graph : add llama_graph_result
ggml-ci
2025-02-18 13:48:21 +02:00
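For the entry above (together with "cont : return important tensors"), a minimal sketch of what a graph-build result type could look like: the build step hands the context back the tensors it needs, instead of the context searching the graph by tensor name. The field names are assumptions, not the actual llama.cpp definition:

```cpp
struct ggml_tensor; // opaque ggml types
struct ggml_cgraph;

// hypothetical result returned by the graph build
struct llama_graph_result_sketch {
    ggml_cgraph * gf       = nullptr; // the built compute graph
    ggml_tensor * t_logits = nullptr; // output logits tensor
    ggml_tensor * t_embd   = nullptr; // output embeddings tensor
};
```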
Georgi Gerganov
f0d3ff2388
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-18 10:14:37 +02:00
Georgi Gerganov
68ff663a04
repo : update links to new url (#11886)
* repo : update links to new url
ggml-ci
* cont : more urls
ggml-ci
2025-02-15 16:40:57 +02:00
Georgi Gerganov
1d801d27b9
graph : update attn/kv_self names
2025-02-14 17:22:55 +02:00
Georgi Gerganov
828064564c
context : move common inputs to base class
ggml-ci
2025-02-14 16:48:21 +02:00
Georgi Gerganov
d5e8e1a2ba
context : remove batch_manager
ggml-ci
2025-02-14 16:10:55 +02:00
Georgi Gerganov
131743ff4f
context : abstract constructor and init
ggml-ci
2025-02-13 17:17:51 +02:00
Georgi Gerganov
ed3cb55abe
context : abstract input
ggml-ci
2025-02-13 15:53:15 +02:00
Georgi Gerganov
107d1e2c32
context : move output functionality to base class
ggml-ci
2025-02-13 15:42:14 +02:00
Georgi Gerganov
e08f38df69
context : minor cleanup
ggml-ci
2025-02-13 12:50:53 +02:00
Georgi Gerganov
f7c7757bab
context : abstract state read/write
ggml-ci
2025-02-13 12:37:28 +02:00
Georgi Gerganov
3a504d9a0b
llama : introduce llama_io interfaces
ggml-ci
2025-02-13 12:25:54 +02:00
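A hedged sketch of the llama_io idea from the entry above: a pair of abstract read/write interfaces that the state save/load code targets, so the same serialization logic could write to a file, a memory buffer, and so on. The class and method names below are assumptions, not the actual llama.cpp API:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// hypothetical write-side interface for state serialization
struct llama_io_write_i {
    virtual ~llama_io_write_i() = default;
    virtual void write(const void * src, size_t size) = 0;
};

// hypothetical read-side counterpart for state deserialization
struct llama_io_read_i {
    virtual ~llama_io_read_i() = default;
    virtual void read(void * dst, size_t size) = 0;
};

// one possible concrete writer: serialize into a growable memory buffer
struct llama_io_write_buffer_sketch : llama_io_write_i {
    std::vector<uint8_t> buf;

    void write(const void * src, size_t size) override {
        const uint8_t * p = static_cast<const uint8_t *>(src);
        buf.insert(buf.end(), p, p + size); // append raw bytes
    }
};
```

A file-backed writer would implement the same interface, which is what would let different save/load targets share one serialization code path.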
Olivier Chafik
c7f460ab88
server: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command R7B & DeepSeek R1) unless --reasoning-format none (#11607)
* extract & return thoughts in reasoning_content field (unless --reasoning-format none) for DeepSeek R1 & Command R7B
* tool-calls: add deepseek r1 template (models/templates/llama-cpp-deepseek-r1.jinja) + hackommodate broken official template
* tool-calls: accommodate variety of wrong tool call opening tags both R1 Qwen 32B and 7B distills like to spit out
* server/oai: ensure content is null when there are tool calls, and reasoning_content appears before content for readability
* tool-calls: add DeepSeek R1 Qwen distills to server/README.md & server tests
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-13 10:05:16 +00:00
Vinesh Janarthanan
27e8a23300
sampling: add Top-nσ sampler (#11223)
* initial sampling changes:
* completed top nsigma sampler implementation
* apply parameter to only llama-cli
* updated readme
* added tests and fixed nsigma impl
* cleaned up pr
* format
* format
* format
* removed commented tests
* cleanup pr and remove explicit floats
* added top-k sampler to improve performance
* changed sigma to float
* fixed string format to float
* Update src/llama-sampling.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update common/sampling.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update src/llama-sampling.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update src/llama-sampling.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update src/llama-sampling.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update src/llama-sampling.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* added llama_sampler_init
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-13 08:45:57 +02:00
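Background for the entry above: top-nσ sampling keeps only the tokens whose logit lies within n standard deviations of the maximum logit, then renormalizes the survivors. A standalone sketch of that filtering step (assuming a non-empty logits vector); this illustrates the technique, not the actual #11223 implementation in src/llama-sampling.cpp:

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Filter logits in place: keep tokens whose logit is within n standard
// deviations of the maximum; a later softmax renormalizes the rest.
void top_n_sigma_sketch(std::vector<float> & logits, float n) {
    // max and mean of the logits in one pass
    float  max_l = -std::numeric_limits<float>::infinity();
    double sum   = 0.0;
    for (float l : logits) {
        sum  += l;
        max_l = std::max(max_l, l);
    }
    const double mean = sum / logits.size();

    // population standard deviation
    double var = 0.0;
    for (float l : logits) {
        var += (l - mean) * (l - mean);
    }
    const float sigma = (float) std::sqrt(var / logits.size());

    // mask everything below max - n*sigma
    const float thresh = max_l - n * sigma;
    for (float & l : logits) {
        if (l < thresh) {
            l = -std::numeric_limits<float>::infinity();
        }
    }
}
```

Because the cutoff scales with the spread of the logits, the filter adapts per step: a peaked distribution keeps few candidates, a flat one keeps many.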
Daniel Bevenius
3e69319772
llama : update llama_decode_internal ref [no ci] (#11840)
This commit updates the comment in llama_kv_cache.h to reflect the
change of the function name from llama_decode_internal to
llama_decode_impl.
2025-02-13 08:07:51 +02:00
Georgi Gerganov
fbe6a07256
context : rename to llama_context_kv_self
2025-02-12 17:16:44 +02:00