mirror of https://github.com/ggml-org/llama.cpp.git (synced 2025-10-30 08:42:00 +00:00)
arg : add env variable for parallel (#9513)

* add env variable for parallel
* Update README.md with env: LLAMA_ARG_N_PARALLEL
@@ -1312,7 +1312,7 @@ gpt_params_context gpt_params_parser_init(gpt_params & params, llama_example ex,
         [](gpt_params & params, int value) {
             params.n_parallel = value;
         }
-    ));
+    ).set_env("LLAMA_ARG_N_PARALLEL"));
     add_opt(llama_arg(
         {"-ns", "--sequences"}, "N",
         format("number of sequences to decode (default: %d)", params.n_sequences),
@@ -87,7 +87,7 @@ The project is under active development, and we are [looking for feedback and co
 | `-ctk, --cache-type-k TYPE` | KV cache data type for K (default: f16) |
 | `-ctv, --cache-type-v TYPE` | KV cache data type for V (default: f16) |
 | `-dt, --defrag-thold N` | KV cache defragmentation threshold (default: -1.0, < 0 - disabled)<br/>(env: LLAMA_ARG_DEFRAG_THOLD) |
-| `-np, --parallel N` | number of parallel sequences to decode (default: 1) |
+| `-np, --parallel N` | number of parallel sequences to decode (default: 1)<br/>(env: LLAMA_ARG_N_PARALLEL) |
 | `-cb, --cont-batching` | enable continuous batching (a.k.a dynamic batching) (default: enabled)<br/>(env: LLAMA_ARG_CONT_BATCHING) |
 | `-nocb, --no-cont-batching` | disable continuous batching<br/>(env: LLAMA_ARG_NO_CONT_BATCHING) |
 | `--mlock` | force system to keep model in RAM rather than swapping or compressing |
Bert Wagner