Commit `09aaf4f1f5`: this commit fixes an issue in the llama.cpp project where the command for testing llama-server contained a duplicated file extension. The original command was `./tests.sh unit/test_chat_completion.py.py -v -x`; it has been corrected to `./tests.sh unit/test_chat_completion.py -v -x`. This change ensures that the test script correctly locates and executes the intended test file, preventing test failures due to an incorrect file name.

# Server tests

Python-based server test scenarios using [pytest](https://docs.pytest.org/en/stable/).

Tests target GitHub workflow job runners with 4 vCPUs.

Note: if inference on the host is faster than on the GitHub runners, the parallel scenarios may randomly fail. To mitigate this, you can increase the values of `n_predict` and `kv_size`.

### Install dependencies

`pip install -r requirements.txt`
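
If you prefer an isolated environment, a typical optional setup looks like the following; the virtual environment name `venv` is just an illustration:

```shell
# optional: install the test dependencies into a virtual environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```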

### Run tests

1. Build the server

```shell
cd ../../..
cmake -B build -DLLAMA_CURL=ON
cmake --build build --target llama-server
```

2. Start the tests: `./tests.sh`

It's possible to override some scenario step values with environment variables:

| variable                | description                                                                                   |
|-------------------------|-----------------------------------------------------------------------------------------------|
| `PORT`                  | `context.server_port`, the listening port of the server during the scenario (default: `8080`) |
| `LLAMA_SERVER_BIN_PATH` | path to the server binary (default: `../../../build/bin/llama-server`)                        |
| `DEBUG`                 | enable step and server verbose mode (`--verbose`)                                             |
| `N_GPU_LAYERS`          | number of model layers to offload to VRAM (`-ngl`, `--n-gpu-layers`)                          |
| `LLAMA_CACHE`           | by default, server tests re-download models to the `tmp` subfolder; set this to your cache directory (e.g. `$HOME/Library/Caches/llama.cpp` on macOS or `$HOME/.cache/llama.cpp` on Linux) to avoid this |
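
For example, to run the suite on a non-default port with verbose logging and a persistent model cache (the cache path below is just an illustration), the variables from the table can be combined:

```shell
PORT=8081 DEBUG=1 LLAMA_CACHE=$HOME/.cache/llama.cpp ./tests.sh -v -x
```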

To run slow tests (these will download many models; make sure to set `LLAMA_CACHE` if needed):

```shell
SLOW_TESTS=1 ./tests.sh
```

To run with stdout/stderr displayed in real time (verbose output, but useful for debugging):

```shell
DEBUG=1 ./tests.sh -s -v -x
```

To run all the tests in a file:

```shell
./tests.sh unit/test_chat_completion.py -v -x
```

To run a single test:

```shell
./tests.sh unit/test_chat_completion.py::test_invalid_chat_completion_req
```
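
Under the hood, a test like this boils down to sending an HTTP request to the running `llama-server` instance and asserting on the response. The snippet below is a minimal, hypothetical sketch using the `requests` library directly against the OpenAI-compatible endpoint; the actual suite uses its own server fixtures and helpers:

```python
# Hypothetical sketch -- not the suite's real fixtures or helpers.
import requests

def test_invalid_chat_completion_req():
    # Send an intentionally malformed body: "messages" should be a list.
    res = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={"messages": "not-a-list"},
    )
    # The server is expected to reject the request with a client error.
    assert 400 <= res.status_code < 500
```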

Hint: you can compile and run the tests in a single command, which is useful for local development:

```shell
cmake --build build -j --target llama-server && ./examples/server/tests/tests.sh
```

To see all available arguments, please refer to the [pytest documentation](https://docs.pytest.org/en/stable/how-to/usage.html).
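
For instance, standard pytest selection flags work here as well, assuming `tests.sh` forwards its arguments to pytest (as the examples above suggest):

```shell
# run only the tests whose names match a keyword expression
./tests.sh -k "chat_completion" -v
```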