Implement server mode.

This new mode works by first loading the model then listening for TCP connections on a port. When a connection is received, arguments will be parsed using a simple protocol:

- First the number of arguments will be read followed by a newline character.
- Then each argument will be read, separated by the 0 byte.
- With this we build an argument vector, similar to what is passed to the program entry point. We pass this to gpt_params_parse.

Finally `run` will be executed with the input/output streams connected to the socket.

Signed-off-by: Thiago Padilha <thiago@padilha.cc>
2025-11-02 09:12:03 +00:00 · 2023-03-22 10:41:26 -03:00
parent bf44faa0ee
commit 3a0dcb3920
9 changed files with 331 additions and 2 deletions
--- a/main.cpp
+++ b/main.cpp
@@ -1,5 +1,6 @@
 #include "run.h"
 #include "ggml.h"
+#include "tcp_server.h"

 #include <iostream>

@@ -125,5 +126,11 @@ int main(int argc, char ** argv) {
        exit(0);
    }

+#ifndef _WIN32
+    if (params.listen_port != "") {
+      return listen_tcp(ctx, params);
+    }
+#endif
+
    return run(ctx, params, std::cin, stdout, stderr);
 }