	llama-bench : clarify benchmarked parts of the computation (#16823)
@@ -82,6 +82,9 @@ Using the `-d <n>` option, each test can be run at a specified context depth, pr
 For a description of the other options, see the [main example](../main/README.md).
 
+> [!NOTE]
+> The measurements with `llama-bench` do not include the times for tokenization and for sampling.
+
 ## Examples
 
 ### Text generation with different models
 
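The added NOTE pins down what the benchmark numbers mean: the `pp` and `tg` values in tables like the one in the next hunk measure prompt processing and token generation throughput only. As a minimal sketch of a run that produces such rows (the model path is illustrative, not from this commit):

```sh
# Benchmark prompt processing of 64 tokens (pp 64) and generation of
# 16 tokens (tg 16) at 16 and 32 threads. Per the NOTE above, the time
# spent tokenizing the prompt and sampling each token is excluded from
# the reported throughput.
./llama-bench -m llama-7b-q4_0.gguf -p 64 -n 16 -t 16,32
```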
@@ -131,7 +134,7 @@ $ ./llama-bench -n 0 -n 16 -p 64 -t 1,2,4,8,16,32
 | llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         16 | pp 64      |     33.52 ± 0.03 |
 | llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         16 | tg 16      |     15.32 ± 0.05 |
 | llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         32 | pp 64      |     59.00 ± 1.11 |
-| llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         32 | tg 16      |     16.41 ± 0.79 ||
+| llama 7B mostly Q4_0           |   3.56 GiB |     6.74 B | CPU        |         32 | tg 16      |     16.41 ± 0.79 |
 
 ### Different numbers of layers offloaded to the GPU
 
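The first hunk's header also mentions the `-d <n>` option for running each test at a specified context depth. A sketch of combining it with the flags above, assuming a single illustrative depth value:

```sh
# Benchmark generation of 16 tokens after a context of 1024 tokens has
# already been processed; deeper contexts generally lower tg throughput.
./llama-bench -m llama-7b-q4_0.gguf -n 16 -d 1024
```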