Implement server mode.

This new mode works by first loading the model and then listening for TCP
connections on a port. When a connection is received, arguments are
parsed using a simple protocol:

- First, the number of arguments is read, terminated by a newline
  character.
- Then each argument is read, with arguments separated by a 0 byte.
- From these we build an argument vector, similar to the one passed to
  the program entry point, and pass it to gpt_params_parse (see the
  sketch below).
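
A rough sketch of the receiving side is shown below. The helper name
`parse_tcp_args`, the POSIX `read` plumbing, and the assumption that every
argument (including the last) is terminated by a 0 byte are illustrative;
only `gpt_params` and `gpt_params_parse` come from the existing code.

```cpp
// Illustrative only, not the code from this commit.
#include <unistd.h>   // read
#include <cstdlib>    // atoi
#include <string>
#include <vector>

#include "utils.h"    // gpt_params, gpt_params_parse (as laid out at this commit)

// Reads "<argc>\n" followed by <argc> NUL-terminated arguments from sock_fd,
// builds an argv-style vector and hands it to gpt_params_parse.
static bool parse_tcp_args(int sock_fd, gpt_params & params) {
    // 1. argument count, terminated by a newline
    std::string count_str;
    char c;
    while (read(sock_fd, &c, 1) == 1 && c != '\n') {
        count_str += c;
    }
    const int argc = atoi(count_str.c_str());
    if (argc <= 0) {
        return false;
    }

    // 2. the arguments themselves, each terminated by a 0 byte
    std::vector<std::string> args;
    std::string cur;
    while ((int) args.size() < argc && read(sock_fd, &c, 1) == 1) {
        if (c == '\0') {
            args.push_back(cur);
            cur.clear();
        } else {
            cur += c;
        }
    }
    if ((int) args.size() < argc) {
        return false; // connection closed early
    }

    // 3. build the argument vector; like a normal argv, the first entry is
    //    expected to be a program name, which gpt_params_parse skips
    std::vector<char *> argv;
    for (auto & arg : args) {
        argv.push_back(&arg[0]);
    }
    return gpt_params_parse((int) argv.size(), argv.data(), params);
}
```

Under these assumptions a client would send, for example, `3\nmain\0-t\08\0`
and then use the rest of the connection for input and output.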

Finally, `run` is executed with its input/output streams connected to
the socket.
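
One way to make that connection, assuming `run` can take explicit
input/output streams, is to wrap the socket file descriptor in stdio
streams. The `serve_connection` helper and the commented
`run(params, fin, fout)` signature below are assumptions for illustration,
not the commit's actual code.

```cpp
// Illustrative only, not the code from this commit.
#include <cstdio>     // fdopen, setvbuf, fclose
#include <unistd.h>   // dup, close

#include "utils.h"    // gpt_params (as laid out at this commit)

// Sketch: expose the accepted connection as stdio streams so the generation
// loop can read input from the client and write tokens back to it.
static void serve_connection(int conn_fd, gpt_params & params) {
    FILE * fin  = fdopen(conn_fd, "rb");
    FILE * fout = fdopen(dup(conn_fd), "wb");
    if (fin == NULL || fout == NULL) {
        // error handling is elided in this sketch; just drop the connection
        close(conn_fd);
        return;
    }
    setvbuf(fout, NULL, _IONBF, 0); // push generated tokens to the client immediately

    // Hypothetical signature: a `run` that reads from / writes to explicit streams.
    // run(params, fin, fout);
    (void) params; // consumed by the real run call above

    fclose(fin);   // also closes conn_fd
    fclose(fout);  // closes the dup'ed descriptor
}
```

Leaving the output stream unbuffered matters here so that tokens reach the
client as they are generated instead of only when the buffer fills.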

Signed-off-by: Thiago Padilha <thiago@padilha.cc>
commit 3a0dcb3920 (parent bf44faa0ee)
Thiago Padilha, 2023-03-22 10:41:26 -03:00
9 changed files with 331 additions and 2 deletions

@@ -42,6 +42,10 @@ struct gpt_params {
     bool instruct = false; // instruction mode (used for Alpaca models)
     bool ignore_eos = false; // do not stop generating after eos
     bool perplexity = false; // compute perplexity over the prompt
+
+#ifndef _WIN32
+    std::string listen_port = ""; // TCP port for when running in server mode
+#endif
 };
 
 bool gpt_params_parse(int argc, char ** argv, gpt_params & params);