mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-10-27 08:21:30 +00:00

Files

Georgi Gerganov e92d53b29e sampling : optimize samplers by reusing bucket sort (#15665)

* sampling : optimize sorting using bucket sort in more places

ggml-ci

* sampling : do not sort in dist sampler

ggml-ci

* sampling : avoid heap allocations for sort buffers

ggml-ci

* common : add option to sort sampling candidates by probability

ggml-ci

* sampling : revert the change for preserving sort buffers

* sampling : use std::copy instead of memcpy

* sampling : clarify purpose of partial sort helpers

ggml-ci

* cont : remove wrong comment [no ci]

* common : update comment

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

2025-08-31 20:41:02 +03:00

CMakeLists.txt

ggml : move AMX to the CPU backend (#10570)

2024-11-29 21:54:58 +01:00

README.md

repo : update links to new url (#11886)

2025-02-15 16:40:57 +02:00

speculative.cpp

sampling : optimize samplers by reusing bucket sort (#15665)

2025-08-31 20:41:02 +03:00

README.md

llama.cpp/examples/speculative

Demonstration of speculative decoding and tree-based speculative decoding techniques
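
As a rough illustration of the basic (non-tree) technique, the sketch below shows greedy speculative decoding on toy "models". It is not the code in `speculative.cpp` and uses no llama.cpp APIs: `next_token_fn`, `greedy_decode`, and `speculative_decode` are hypothetical names invented for this example, and each model is assumed to be a plain function that returns its greedy next token for a sequence. A cheap draft model proposes a few tokens, and the target model keeps the longest prefix it agrees with.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using token = int;

// Hypothetical stand-in for a model: maps the current sequence to its greedy next token.
using next_token_fn = std::function<token(const std::vector<token> &)>;

// Reference behaviour: plain greedy decoding with a single model.
std::vector<token> greedy_decode(const next_token_fn & model, std::vector<token> seq, std::size_t n_new) {
    for (std::size_t i = 0; i < n_new; ++i) {
        seq.push_back(model(seq));
    }
    return seq;
}

// Greedy speculative decoding: the draft proposes up to n_draft tokens per round,
// the target checks them in order and keeps the longest agreeing prefix.
// A real implementation would verify the whole proposal in one batched target pass;
// here the target is called token by token to keep the sketch short.
std::vector<token> speculative_decode(const next_token_fn & target, const next_token_fn & draft,
                                      std::vector<token> seq, std::size_t n_new, std::size_t n_draft) {
    const std::size_t n_end = seq.size() + n_new;

    while (seq.size() < n_end) {
        // 1. Draft phase: let the cheap model run ahead on its own output.
        std::vector<token> ctx = seq;
        std::vector<token> proposal;
        for (std::size_t i = 0; i < n_draft; ++i) {
            const token t = draft(ctx);
            proposal.push_back(t);
            ctx.push_back(t);
        }

        // 2. Verify phase: the token that is kept is always the target's own choice,
        //    so the result matches plain greedy decoding with the target.
        std::size_t i = 0;
        for (; i < proposal.size() && seq.size() < n_end; ++i) {
            const token expected = target(seq);
            seq.push_back(expected);
            if (expected != proposal[i]) {
                break; // first disagreement: the rest of the proposal is discarded
            }
        }

        // 3. If every drafted token was accepted, the target contributes one more token.
        if (i == proposal.size() && seq.size() < n_end) {
            seq.push_back(target(seq));
        }
    }
    return seq;
}
```

Under these assumptions the output is identical to decoding with the target alone; the draft only changes how many target evaluations could be batched together per round. Tree-based speculation, where several draft branches are verified at once, is not covered by this sketch.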

More info: