sampling : optimize samplers by reusing bucket sort (#15665)

* sampling : optimize sorting using bucket sort in more places

  ggml-ci

* sampling : do not sort in dist sampler

  ggml-ci

* sampling : avoid heap allocations for sort buffers

  ggml-ci

* common : add option to sort sampling candidates by probability

  ggml-ci

* sampling : revert the change for preserving sort buffers

* sampling : use std::copy instead of memcpy

* sampling : clarify purpose of partial sort helpers

  ggml-ci

* cont : remove wrong comment [no ci]

* common : update comment

  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
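
For readers unfamiliar with the idea named in the commit message, the following is a minimal sketch of a bucket-based partial sort, not the code from this commit. The `candidate` struct, the `top_k_bucket` function, and the choice of 128 buckets over the observed logit range are all assumptions made for illustration. The point it tries to show is that only the buckets needed to cover the top `k` entries get sorted, so most candidates are never compared against each other.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical candidate type for illustration (not the llama.cpp struct).
struct candidate {
    int   id;
    float logit;
};

// Return the k highest-logit candidates in descending order.
// Sketch only: candidates are scattered into buckets by logit, then buckets
// are visited from highest to lowest and sorted individually until k
// candidates have been collected.
std::vector<candidate> top_k_bucket(const std::vector<candidate> & cands, size_t k) {
    k = std::min(k, cands.size());
    if (k == 0) {
        return {};
    }

    // Find the logit range so it can be mapped onto bucket indices.
    float lo = cands[0].logit;
    float hi = cands[0].logit;
    for (const auto & c : cands) {
        lo = std::min(lo, c.logit);
        hi = std::max(hi, c.logit);
    }

    constexpr int n_buckets = 128; // assumed bucket count
    const float scale = hi > lo ? (n_buckets - 1) / (hi - lo) : 0.0f;

    // Monotone mapping: a larger logit never lands in a lower bucket.
    auto bucket_of = [&](float x) {
        const int b = (int) ((x - lo) * scale);
        return std::clamp(b, 0, n_buckets - 1);
    };

    std::vector<std::vector<candidate>> buckets(n_buckets);
    for (const auto & c : cands) {
        buckets[bucket_of(c.logit)].push_back(c);
    }

    std::vector<candidate> out;
    out.reserve(k);
    for (int b = n_buckets - 1; b >= 0 && out.size() < k; --b) {
        auto & bk = buckets[b];
        // Sort only the buckets that are actually needed.
        std::sort(bk.begin(), bk.end(),
                  [](const candidate & x, const candidate & y) { return x.logit > y.logit; });
        for (const auto & c : bk) {
            if (out.size() == k) {
                break;
            }
            out.push_back(c);
        }
    }
    return out;
}
```

Calling `top_k_bucket(cands, 3)` on a small vector returns the three highest-logit entries in descending order. This sketch allocates one vector per bucket for readability; a tighter version would use a histogram pass and a single output buffer.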
2025-10-27 08:21:30 +00:00 · 2025-08-31 20:41:02 +03:00
parent 0d161f021a
commit e92d53b29e
9 changed files with 227 additions and 171 deletions
--- a/tools/tts/tts.cpp
+++ b/tools/tts/tts.cpp
@@ -895,7 +895,7 @@ lovely<|t_0.56|><|code_start|><|634|><|596|><|1766|><|1556|><|1306|><|1285|><|14

                codes.push_back(new_token_id);

-                const auto * cands = common_sampler_get_candidates(smpl[i]);
+                const auto * cands = common_sampler_get_candidates(smpl[i], false);

                // is it an end of generation? -> mark the stream as finished
                if (llama_vocab_is_eog(vocab, new_token_id) || n_decode == n_predict) {