In `llama-perplexity`, when using `--kl-divergence`, the KL divergence statistics output mistakenly displays the 99th percentile twice. This change fixes that and correctly displays the 90th percentile as originally intended (presumably).
* perplexity: give more information about constraints on failure
This checks whether -np is insufficient vs context, and provides clues as to how much is needed for each.
* log formatting
* log error and return instead of storing max_seq_exceeded int
* check if s0 is zero for -np check
This commit updates comments and error messages to use "decode" instead
of "eval" in perplexity.cpp.
The motivation for this is that `llama_eval` was renamed to
`llama_decode` a while ago, but the comments and error messages
still referred to "eval". This change ensures consistency and clarity.
* llama : deprecate llama_kv_self_ API
ggml-ci
* llama : allow llama_memory_(nullptr)
ggml-ci
* memory : add flag for optional data clear in llama_memory_clear
ggml-ci