# MIND Backend
## Setup
The steps below set up the backend for local testing: the Go orchestration layer on `localhost:8080` and a llama.cpp inference server on `localhost:8081`.
### Building llama.cpp
See the [llama.cpp documentation](https://github.com/ggerganov/llama.cpp) for build details.
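For reference, a typical CPU-only CMake build looks roughly like the sketch below; the exact flags (and GPU backends such as CUDA or Metal) are covered in the upstream README, so treat this as an illustration rather than the canonical recipe.

```sh
# Clone and build llama.cpp (CPU build; see upstream docs for GPU backends)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j
# The llama-server binary typically ends up under build/bin/
```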
### Running llama.cpp
#### Getting a GGUF format model
Run `./backend/get-qwen3-1.7b.sh` to download the Qwen 3 1.7B model from Hugging Face.
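For example (the output location below is an assumption; check the script to see where it actually writes the file):

```sh
# Fetch the Qwen 3 1.7B GGUF from Hugging Face
./backend/get-qwen3-1.7b.sh

# Verify the download (assumed output directory; adjust to match the script)
ls -lh backend/*.gguf
```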
#### Running the inference server
Run `./llama-server -m <path-to-gguf-model> --port 8081` to start the inference server at `localhost:8081`.
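A minimal sketch, with an illustrative model path (substitute the GGUF file produced by the download step):

```sh
# Start the llama.cpp inference server on port 8081
./llama-server -m backend/qwen3-1.7b.gguf --port 8081

# In another terminal, confirm the server is up via its /health endpoint
curl http://localhost:8081/health
```

The filename `backend/qwen3-1.7b.gguf` is only a placeholder; use whatever path the download script actually produced.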
### Running the backend layer
Run `go run main.go`. This starts the backend layer at `localhost:8080`.
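Assuming `main.go` lives in the `backend/` directory (adjust the path if the repository layout differs), and with the inference server from the previous step already running on port 8081:

```sh
# Start the Go orchestration layer on port 8080
cd backend
go run main.go
```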
## A simple CLI client
A simple CLI-based client can be found under `backend/cli.py`; it connects to the backend layer at `localhost:8080`. Use the `\help` command to list the available operations.
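A minimal invocation, assuming a standard Python 3 interpreter (any dependencies the script needs are not listed here):

```sh
# Start the CLI client; it talks to the backend layer at localhost:8080
python3 backend/cli.py
# At the client prompt, type \help to list the available operations
```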