llama-bench : clarify benchmarked parts of the computation (#16823)
```diff
@@ -82,6 +82,9 @@ Using the `-d <n>` option, each test can be run at a specified context depth, pr
 For a description of the other options, see the [main example](../main/README.md).
 
+> [!NOTE]
+> The measurements with `llama-bench` do not include the times for tokenization and for sampling.
+
 ## Examples
 
 ### Text generation with different models
 
```
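The added note pins down what the reported `t/s` actually covers: essentially the prompt-processing (`pp`) and token-generation (`tg`) passes themselves, with tokenization and sampling excluded. As a minimal sketch of a run this applies to (the model path below is a placeholder, not taken from this commit):

```sh
# Benchmark prompt processing of 512 tokens and generation of 128 tokens.
# The reported t/s excludes tokenization and sampling time, per the note above.
./llama-bench -m models/llama-7b/ggml-model-q4_0.gguf -p 512 -n 128
```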
```diff
@@ -131,7 +134,7 @@ $ ./llama-bench -n 0 -n 16 -p 64 -t 1,2,4,8,16,32
 | llama 7B mostly Q4_0 | 3.56 GiB | 6.74 B | CPU | 16 | pp 64 | 33.52 ± 0.03 |
 | llama 7B mostly Q4_0 | 3.56 GiB | 6.74 B | CPU | 16 | tg 16 | 15.32 ± 0.05 |
 | llama 7B mostly Q4_0 | 3.56 GiB | 6.74 B | CPU | 32 | pp 64 | 59.00 ± 1.11 |
-| llama 7B mostly Q4_0 | 3.56 GiB | 6.74 B | CPU | 32 | tg 16 | 16.41 ± 0.79 ||
+| llama 7B mostly Q4_0 | 3.56 GiB | 6.74 B | CPU | 32 | tg 16 | 16.41 ± 0.79 |
 
 ### Different numbers of layers offloaded to the GPU
 
```
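In the table, `pp 64` is a prompt-processing test over a 64-token prompt, `tg 16` is generation of 16 tokens, and `t/s` is tokens per second averaged over repeated runs (hence the ± spread). For the GPU-offload section the hunk ends on, a plausible invocation would be the following sketch, assuming a GPU-enabled build; the model path and layer counts are illustrative, not taken from this commit:

```sh
# Sweep the number of layers offloaded to the GPU; llama-bench runs
# one pp and one tg test per -ngl value and reports t/s for each.
./llama-bench -m models/llama-7b/ggml-model-q4_0.gguf -ngl 0,8,16,24,32
```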