llama.cpp/examples/server/server.cpp at 38b3de4658292582a8941a2be5c77b40ce6ac0f2

mirror of https://github.com/ggml-org/llama.cpp.git synced 2025-11-03 09:22:01 +00:00

Justine Tunney 65e5f6dadb Fix OpenAI server sampling w.r.t. temp and seed (#4668)

The default values for tfs_z and typical_p were being set to zero, which
caused the token candidates array to get shrunk down to one element thus
preventing any sampling. Note this only applies to OpenAI API compatible
HTTP server requests.
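
To illustrate why a zero threshold leaves a single candidate, here is a
minimal sketch of a cumulative-cutoff filter. It is a simplified stand-in,
not the actual tail-free or typical sampler in llama.cpp; the name
cutoff_filter and its signature are hypothetical, and the input is assumed
to be sorted in descending order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified cutoff filter (NOT llama.cpp's real sampler).
// Keeps the leading candidates until their cumulative probability exceeds p.
// Assumes sorted_probs is sorted in descending order.
std::vector<float> cutoff_filter(const std::vector<float>& sorted_probs, float p) {
    if (p >= 1.0f) {
        return sorted_probs;  // a threshold of 1.0 disables the filter
    }
    std::size_t keep = sorted_probs.size();
    float cum = 0.0f;
    for (std::size_t i = 0; i < sorted_probs.size(); ++i) {
        cum += sorted_probs[i];
        if (cum > p) {        // with p == 0 this is true for the first candidate
            keep = i + 1;
            break;
        }
    }
    return std::vector<float>(sorted_probs.begin(), sorted_probs.begin() + keep);
}
```

With p set to 0 the loop stops at the first candidate, so nothing is left
to sample from; with p set to 1.0 every candidate survives.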

The solution is to use the default values that OpenAI documents, and to
use the llama.cpp defaults for the rest. I've tested that output is still
deterministic by default. If a "temperature" greater than 0 is passed
explicitly, the output differs on each run. If "seed" is specified in
addition to "temperature", the output becomes deterministic once more.
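
The interaction between "temperature" and "seed" described above can be
sketched as follows. This is an illustration under stated assumptions, not
the server's code: SamplingRequest and pick_token are hypothetical names,
the zero default for temperature is assumed from the "deterministic by
default" behavior, and the token probabilities are passed in directly.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <optional>
#include <random>
#include <vector>

// Hypothetical request parameters; not the server's actual struct.
struct SamplingRequest {
    float temperature = 0.0f;           // assumed default: greedy decoding
    std::optional<std::uint32_t> seed;  // unset means "use a fresh random seed"
};

// Picks a token index from a probability vector.
std::size_t pick_token(const std::vector<float>& probs, const SamplingRequest& req) {
    if (req.temperature <= 0.0f) {
        // Greedy: always the most likely token, hence deterministic.
        return static_cast<std::size_t>(std::distance(
            probs.begin(), std::max_element(probs.begin(), probs.end())));
    }
    // Temperature scaling: p^(1/T), renormalized by discrete_distribution.
    std::vector<double> weights;
    weights.reserve(probs.size());
    for (float p : probs) {
        weights.push_back(std::pow(static_cast<double>(p), 1.0 / req.temperature));
    }
    // A fixed seed reproduces the same draw; no seed gives a fresh one each call.
    std::mt19937 rng(req.seed ? *req.seed : std::random_device{}());
    std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
    return dist(rng);
}
```

Calling pick_token twice with the same positive temperature and the same
seed returns the same index, while omitting the seed lets the result vary
between calls.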

See mozilla-Ocho/llamafile#117
See mozilla-Ocho/llamafile@9e4bf29

2023-12-28 15:20:00 -04:00

114 KiB
