mirror of https://github.com/ggml-org/llama.cpp.git
synced 2025-11-03 09:22:01 +00:00
	Implement server mode.
This new mode first loads the model, then listens for TCP connections on a port. When a connection is accepted, arguments are parsed using a simple protocol:

- First, the number of arguments is read, followed by a newline character.
- Then each argument is read, separated by the 0 byte.

From this we build an argument vector, similar to what is passed to the program entry point, and pass it to `gpt_params_parse`. Finally, `run` is executed with its input/output streams connected to the socket.

Signed-off-by: Thiago Padilha <thiago@padilha.cc>
 tcp_server.h | 7 +++++++ (new file)
							@@ -0,0 +1,7 @@
#pragma once

#include "utils.h"
#include "llama.h"
#include "run.h"

int listen_tcp(llama_context * ctx, gpt_params params);