mirror of https://github.com/ggml-org/llama.cpp.git
synced 2025-11-03 09:22:01 +00:00
	Implement server mode.
This new mode first loads the model, then listens for TCP connections on a port. When a connection is accepted, arguments are parsed using a simple protocol:

- First, the number of arguments is read, followed by a newline character.
- Then each argument is read, separated by the 0 byte.

From this we build an argument vector, similar to what is passed to the program entry point, and pass it to `gpt_params_parse`. Finally, `run` is executed with its input/output streams connected to the socket.

Signed-off-by: Thiago Padilha <thiago@padilha.cc>
 tcp_server.h | 7 +++++++ (new file)
							@@ -0,0 +1,7 @@
#pragma once

#include "utils.h"
#include "llama.h"
#include "run.h"

int listen_tcp(llama_context * ctx, gpt_params params);