From a8ca18b4b815a2abdbecb958ee5f4c542d69aac7 Mon Sep 17 00:00:00 2001
From: Georgi Gerganov <ggerganov@gmail.com>
Date: Tue, 28 Oct 2025 19:41:43 +0200
Subject: [PATCH] llama-bench : clarify benchmarked parts of the computation
 (#16823)

---
 tools/llama-bench/README.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/llama-bench/README.md b/tools/llama-bench/README.md
index ead4da45e2..87d9c0a219 100644
--- a/tools/llama-bench/README.md
+++ b/tools/llama-bench/README.md
@@ -82,6 +82,9 @@ Using the `-d <n>` option, each test can be run at a specified context depth, pr
 
 For a description of the other options, see the [main example](../main/README.md).
 
+> [!NOTE]
+> Measurements reported by `llama-bench` do not include the time spent on tokenization or sampling.
+
 ## Examples
 
 ### Text generation with different models
@@ -131,7 +134,7 @@ $ ./llama-bench -n 0 -n 16 -p 64 -t 1,2,4,8,16,32
 | llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         16 | pp 64      |     33.52 ± 0.03 |
 | llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         16 | tg 16      |     15.32 ± 0.05 |
 | llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         32 | pp 64      |     59.00 ± 1.11 |
-| llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         32 | tg 16      |     16.41 ± 0.79 ||
+| llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         32 | tg 16      |     16.41 ± 0.79 |
 
 ### Different numbers of layers offloaded to the GPU