llama.cpp/common/sampling.cpp at b57eb9ca4fb79f3163c6a69e154ff76157a4f716

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-04 09:32:00 +00:00

Kevin Wang 470939d483 common : preallocate sampling token data vector (#8363)

Calling `emplace_back` repeatedly is slower than preallocating the vector to the vocab size and assigning the data directly. In some rudimentary profiling with `chrono`, this change improves the performance of this block of code from ~500us/op to ~40us/op.
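The two approaches compared above can be sketched as follows. This is a minimal illustration, not the code from the commit: `token_data`, `build_emplace_back`, `build_preallocated`, and `time_us_per_op` are hypothetical names introduced here, and the struct is only assumed to resemble the per-token candidate record (id, logit, probability) used by the sampler.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-token candidate record (assumption, not the real type).
struct token_data {
    int32_t id;
    float   logit;
    float   p;
};

// Variant A: grow the vector one element at a time with emplace_back.
std::vector<token_data> build_emplace_back(const std::vector<float> & logits) {
    std::vector<token_data> cur;
    for (int32_t id = 0; id < (int32_t) logits.size(); ++id) {
        cur.emplace_back(token_data{id, logits[id], 0.0f});
    }
    return cur;
}

// Variant B: size the vector to the vocab up front, then assign by index.
std::vector<token_data> build_preallocated(const std::vector<float> & logits) {
    std::vector<token_data> cur(logits.size());
    for (int32_t id = 0; id < (int32_t) logits.size(); ++id) {
        cur[id] = token_data{id, logits[id], 0.0f};
    }
    return cur;
}

// Rudimentary chrono-based timing: average microseconds per call over `iters` runs.
template <typename F>
double time_us_per_op(F && fn, int iters) {
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        volatile std::size_t sink = fn().size(); // keep the call from being optimized away
        (void) sink;
    }
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count() / iters;
}
```

Calling `time_us_per_op([&] { return build_preallocated(logits); }, 1000)` with a vocab-sized `logits` vector, and likewise for `build_emplace_back`, gives a rough per-call comparison. The absolute numbers depend on the compiler, optimization level, and vocab size, so they will not necessarily match the figures quoted above.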

Overall, this slightly improves sampling performance. The impact is more substantial for the `examples/lookahead` implementation, where I am able to see a ~10% performance boost in lookahead inference.

2024-07-08 10:26:53 +03:00

18 KiB
