llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-02 09:12:03 +00:00

Georgi Gerganov d4c19c0f5c server : accept extra_context for the infill endpoint (#9874)

* server : accept extra_context for the infill endpoint

ggml-ci

* server : update readme [no ci]

* server : use repo-level FIM pattern if possible

ggml-ci

2024-10-13 21:31:35 +03:00

CMakeLists.txt

llama : move vocab, grammar and sampling into separate files (#8508)

2024-07-23 13:10:17 +03:00

llama-grammar.cpp

llama : refactor sampling v2 (#9294)

2024-09-07 15:16:19 +03:00

llama-grammar.h

llama : refactor sampling v2 (#9294)

2024-09-07 15:16:19 +03:00

llama-impl.h

log : add CONT level for continuing previous log entry (#9610)

2024-09-24 10:15:35 +03:00

llama-sampling.cpp

sampling : avoid expensive softmax during greedy sampling (#9605)

2024-09-24 09:03:17 +03:00

llama-sampling.h

llama : refactor samplers internal implementation (#9370)

2024-09-08 15:52:07 +02:00

llama-vocab.cpp

llama : improve infill support and special token detection (#9798)

2024-10-12 08:21:51 +03:00

llama-vocab.h

llama : improve infill support and special token detection (#9798)

2024-10-12 08:21:51 +03:00

llama.cpp

server : accept extra_context for the infill endpoint (#9874)

2024-10-13 21:31:35 +03:00

unicode-data.cpp

server : better security control for public deployments (#9776)

2024-10-08 13:27:04 +02:00

unicode-data.h

llama : reduce compile time and binary size (#9712)

2024-10-02 15:49:55 +02:00

unicode.cpp

llama : reduce compile time and binary size (#9712)

2024-10-02 15:49:55 +02:00

unicode.h

llama : move vocab, grammar and sampling into separate files (#8508)

2024-07-23 13:10:17 +03:00