Georgi Gerganov
9dd7a0390f
llama : add log about loading model tensors (#11699)
2025-02-06 13:41:37 +02:00
Georgi Gerganov
0f1c1cab2c
Merge branch 'master' into gg/llama-kv-cache
...
ggml-ci
2025-02-06 10:04:33 +02:00
Georgi Gerganov
e0d913fccb
llama : clear whitespaces
2025-02-06 10:02:50 +02:00
Johannes Gäßler
fd08255d0d
CUDA: non-contiguous (RMS) norm support (#11659)
...
* CUDA: non-contiguous (RMS) norm support
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-04 22:21:42 +01:00
Molly Sophia
1eca8916b5
llama : fix rwkv inference (#11618)
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-02-03 14:17:50 +02:00
Olivier Chafik
90f9b88afb
nit: more informative crash when grammar sampler fails (#11593)
2025-02-02 19:58:34 +00:00
Georgi Gerganov
74b0807245
Merge branch 'master' into gg/llama-kv-cache
...
ggml-ci
2025-02-02 11:07:05 +02:00
Georgi Gerganov
3e23be7911
context : store graph build function callback
...
ggml-ci
2025-02-02 10:49:32 +02:00
piDack
0cec062a63
llama : add support for GLM-Edge and GLM-Edge-V series models (#10573)
...
* add glm edge chat model
* use config partial_rotary_factor as rope ratio
* support for glm edge model
* vision model support
* remove debug info
* fix format
* llava.cpp trailing whitespace
* remove unused AutoTokenizer
* Update src/llama.cpp for not contain <|end|> or </s>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* add edge template
* fix chat template
* fix conflict
* fix conflict
* fix ci err
* fix format err
* fix template err
* 9b hf chat support
* format
* format clip.cpp
* fix format
* Apply suggestions from code review
* Apply suggestions from code review
* Update examples/llava/clip.cpp
* fix format
* minor : style
---------
Co-authored-by: liyuhang <yuhang.li@zhipuai.cn>
Co-authored-by: piDack <pcdack@hotmail.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: liyuhang <yuhang.li@aminer.cn>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-02 09:48:46 +02:00
Georgi Gerganov
5d3491e789
Merge branch 'master' into gg/llama-kv-cache
...
ggml-ci
2025-01-31 15:11:11 +02:00
Olivier Chafik
8b576b6c55
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
...
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-30 19:13:58 +00:00
Georgi Gerganov
a40ba49fa6
Merge branch 'master' into gg/llama-kv-cache
2025-01-30 16:39:58 +02:00
mgroeber9110
ffd0821c57
vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496)
2025-01-30 12:10:59 +02:00
Georgi Gerganov
c30e34cdba
Merge branch 'master' into gg/llama-kv-cache
...
ggml-ci
2025-01-29 15:01:26 +02:00
Georgi Gerganov
918885697e
llama : resolve rwkv conflict
...
ggml-ci
2025-01-29 14:45:04 +02:00
Molly Sophia
325afb370a
llama: fix missing k_cache store for rwkv6qwen2 (#11445)
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-01-29 12:07:21 +08:00
lexasub
a5203b4465
llama : minor fixes for up llama load model speed (#11448)
...
* impl::load: change the bpe_ranks map to an unordered map, reducing impl::load time by ~30%
* llama_model_loader::init_mapping: replace new llama_mmap with std::make_unique<llama_mmap> for cleaner code and to roughly halve the time of init_mappings
* Update src/llama-vocab.cpp
---------
Co-authored-by: lexasub <empty@empty.ru>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-27 14:42:09 +01:00
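The commit above describes two general speedups; the following is a minimal, hypothetical sketch of those patterns, not the actual llama.cpp diff (pair_hash and my_mmap are illustrative stand-ins for the real types):

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical hash for the pair key; the real bpe_ranks key/hash may differ.
struct pair_hash {
    size_t operator()(const std::pair<std::string, std::string> & p) const {
        return std::hash<std::string>{}(p.first) ^ (std::hash<std::string>{}(p.second) << 1);
    }
};

struct my_mmap { };  // stand-in for llama_mmap

int main() {
    // 1) ordered map (O(log n) lookups) -> unordered map (O(1) average lookups)
    //    for the merge-rank table consulted repeatedly while loading the vocab.
    std::unordered_map<std::pair<std::string, std::string>, int, pair_hash> bpe_ranks;
    bpe_ranks[{"he", "llo"}] = 42;

    // 2) raw `new` -> std::make_unique for the file-mapping object.
    std::unique_ptr<my_mmap> mapping = std::make_unique<my_mmap>();
    return 0;
}
```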
Georgi Gerganov
e665b57fa2
Merge branch 'master' into gg/llama-kv-cache
...
ggml-ci
2025-01-27 14:09:22 +02:00
Johannes Gäßler
df984e0147
llama: refactor llama_decode_impl (#11381)
2025-01-27 12:07:12 +01:00
Georgi Gerganov
a0c500b4dc
context : prepare for abstraction
...
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov
99422dfa3f
context : introduce llama_batch_manager
...
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov
cb8f2095c6
wip
2025-01-26 20:16:22 +02:00
Georgi Gerganov
133ad6a723
context : initial need_reserve logic
...
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov
c75ba6851e
context : move adapter code in the implementation [no ci]
2025-01-26 20:16:22 +02:00
Georgi Gerganov
f0713498fd
context : add get_ctx_padding()
...
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov
b4ec1d4429
cont : move kv_self update to llama_context
...
ggml-ci
2025-01-26 20:16:21 +02:00
Georgi Gerganov
f2524c0e41
llama : remove references to llama_kv_cache (wip)
...
Intermediate step necessary to abstract the `llama_context` and
`llama_kv_cache`.
ggml-ci
2025-01-26 20:16:21 +02:00
Georgi Gerganov
ae274f9747
llama : fix names [no ci]
2025-01-26 20:16:21 +02:00
Georgi Gerganov
a19f671fe0
context : minor
...
ggml-ci
2025-01-26 20:16:21 +02:00
Georgi Gerganov
17b363afd3
llama : update llama_kv_self API
...
ggml-ci
2025-01-26 20:16:20 +02:00
Georgi Gerganov
fd05ab87aa
kv_cache : move state read/write to llama_kv_cache
...
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov
4cd1b6fa4c
context : prepare kv_cache_read/write to be moved to kv_cache
...
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov
73a14eccc9
kv_cache : minor
2025-01-26 20:14:36 +02:00
Georgi Gerganov
fef90cb3d7
kv_cache : fix
...
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov
4d7bd03e65
kv_cache : functions -> members
...
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov
f78b396ee7
llama : add struct llama_kv_cache (wip) [no ci]
2025-01-26 20:12:06 +02:00
Frank Mai
1d8ee06000
rpc: fix register position (#11424)
...
Signed-off-by: thxCode <thxcode0824@gmail.com>
2025-01-26 16:20:34 +01:00
Olivier Chafik
6171c9d258
Add Jinja template support (#11016)
...
* Copy minja from 58f0ca6dd7
* Add --jinja and --chat-template-file flags
* Add missing <optional> include
* Avoid print in get_hf_chat_template.py
* No designated initializers yet
* Try and work around msvc++ non-macro max resolution quirk
* Update test_chat_completion.py
* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template
* Refactor test-chat-template
* Test templates w/ minja
* Fix deprecation
* Add --jinja to llama-run
* Update common_chat_format_example to use minja template wrapper
* Test chat_template in e2e test
* Update utils.py
* Update test_chat_completion.py
* Update run.cpp
* Update arg.cpp
* Refactor common_chat_* functions to accept minja template + use_jinja option
* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE
* Revert LLAMA_CHATML_TEMPLATE refactor
* Normalize newlines in test-chat-templates for windows tests
* Forward decl minja::chat_template to avoid eager json dep
* Flush stdout in chat template before potential crash
* Fix copy elision warning
* Rm unused optional include
* Add missing optional include to server.cpp
* Disable jinja test that has a cryptic windows failure
* minja: fix vigogne (https://github.com/google/minja/pull/22)
* Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos / bos tokens when jinja template references them
* rename: common_chat_template[s]
* reinstate assert on chat_templates.template_default
* Update minja to b8437df626
* Update minja to https://github.com/google/minja/pull/25
* Update minja from https://github.com/google/minja/pull/27
* rm unused optional header
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-21 13:18:51 +00:00
Christopher Nielsen
90d987b105
mmap: add include for cerrno (#11296)
...
ggml-ci
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-20 16:02:43 +02:00
Xuan Son Nguyen
ec7f3ac9ab
llama : add support for Deepseek-R1-Qwen distill model (#11310)
...
* llama : add support for Deepseek-R1-Qwen distill model
* coding style
2025-01-20 14:35:07 +01:00
Georgi Gerganov
ef6dada60c
cont : fix whitespaces (#11305)
2025-01-20 09:29:32 +02:00
Kyle Bruene
ae3c1db2f9
llama : re-add LLM_ARCH_PHIMOE (#11305)
...
Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor.
2025-01-20 09:21:01 +02:00
Georgi Gerganov
4dd34ff831
cmake : add sanitizer flags for llama.cpp (#11279)
...
* cmake : add sanitizer flags for llama.cpp
ggml-ci
* tests : fix compile warnings
ggml-ci
* cmake : move sanitizer flags to llama_add_compile_flags
ggml-ci
* cmake : move llama.cpp compile flags to top level lists
ggml-ci
* cmake : apply only sanitizer flags at top level
ggml-ci
* tests : fix gguf context use in same_tensor_data
* gguf-test: tensor data comparison
* dummy : trigger ggml-ci
* unicode : silence gcc warnings
ggml-ci
* ci : use sanitizer builds only in Debug mode
ggml-ci
* cmake : add status messages [no ci]
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-01-18 16:18:15 +02:00
Radoslav Gerganov
667d72846c
rpc : early register backend devices (#11262)
...
Early register RPC devices and do not propagate RPC specifics in the
llama model structures.
ref: #10609
2025-01-17 10:57:09 +02:00
Georgi Gerganov
a133566d34
vocab : fix double-eos check (#11273)
...
ggml-ci
2025-01-17 09:28:00 +02:00
Xuan Son Nguyen
681149ced2
llama : add llama_model_load_from_splits (#11255)
...
* llama : add `llama_model_load_from_splits`
* update
2025-01-16 13:54:08 +01:00
Johannes Gäßler
432df2d5f9
RoPE: fix back, CUDA support for back + noncont. (#11240)
...
* RoPE: fix back, CUDA support for back + noncont.
* fix comments reg. non-cont. RoPE support [no-ci]
2025-01-15 12:51:37 +01:00
Georgi Gerganov
bbf3e55e35
vocab : add dummy tokens for "no_vocab" type (#11231)
...
* vocab : add dummy tokens for "no_vocab" type
ggml-ci
* vocab : minor [no ci]
2025-01-14 11:54:58 +01:00
Daniel Bevenius
8f70fc3d1b
llama : remove 'd' from bad special token log (#11212)
...
This commit removes the 'd' from the log message in llama-vocab.cpp
when logging a bad special token.
The motivation for this is that currently the output can look something
like the following:
```console
load: bad special token:
'tokenizer.ggml.image_token_id' = 128256d, using default id -1
```
2025-01-13 13:38:20 +01:00
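The fix described in the commit above amounts to dropping a stray character from a printf-style format string. This is a minimal, hypothetical reconstruction of that bug class; the actual llama.cpp logging macro and format string may differ:

```cpp
#include <cstdio>

int main() {
    unsigned int token_id = 128256;
    // Buggy format: a literal 'd' right after the %u specifier makes the id print as "128256d".
    std::printf("load: bad special token: '%s' = %ud, using default id %d\n",
                "tokenizer.ggml.image_token_id", token_id, -1);
    // Fixed format: drop the stray 'd' so the id prints as "128256".
    std::printf("load: bad special token: '%s' = %u, using default id %d\n",
                "tokenizer.ggml.image_token_id", token_id, -1);
    return 0;
}
```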
Xuan Son Nguyen
9a483999a6
llama : fix chat template gguf key (#11201)
2025-01-12 13:45:14 +01:00