llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git, synced 2025-10-30 08:42:00 +00:00

Files

Georgi Gerganov d1031cf49c sampling : refactor init to use llama_sampling_params (#3696)

* sampling : refactor init to use llama_sampling_params

* llama : combine repetition, frequency and presence penalties in 1 call

* examples : remove embd-input and gptneox-wip

* sampling : rename penalty params + reduce size of "prev" vector

* sampling : add llama_sampling_print helper

* sampling : hide prev behind API and apply #3661

ggml-ci

2023-10-20 21:07:23 +03:00

CMakeLists.txt

speculative : PoC for speeding-up inference via speculative sampling (#2926)

2023-09-03 15:12:08 +03:00

speculative.cpp

sampling : refactor init to use llama_sampling_params (#3696)

2023-10-20 21:07:23 +03:00