	rpc : update README for cache usage (#12620)
This commit is contained in:

Radoslav Gerganov, committed by GitHub

parent 13731766db
commit ef03229ff4
@@ -72,3 +72,14 @@ $ bin/llama-cli -m ../models/tinyllama-1b/ggml-model-f16.gguf -p "Hello, my name

This way you can offload model layers to both local and remote devices.

### Local cache

The RPC server can use a local cache to store large tensors and avoid transferring them over the network.
This can speed up model loading significantly, especially when using large models.
To enable the cache, use the `-c` option:

```bash
$ bin/rpc-server -c
```

By default, the cache is stored in the `$HOME/.cache/llama.cpp/rpc` directory, and this location can be controlled via the `LLAMA_CACHE` environment variable.
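As a usage note beyond the diff itself, here is a minimal sketch of combining the `-c` flag with the `LLAMA_CACHE` override described above; the cache path used is an illustrative assumption, not something specified by the commit:

```bash
# Sketch: point the RPC server's tensor cache at a custom directory via
# LLAMA_CACHE (the path below is a made-up example), then enable caching with -c.
$ LLAMA_CACHE=/mnt/fast-ssd/llama-cache bin/rpc-server -c
```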