# MIND Backend

## Setup

The steps below set up the backend for local testing: the Go orchestration layer on `localhost:8080` and a `llama.cpp` inference server on `localhost:8081`.

### Building `llama.cpp`

See the `llama.cpp` documentation for build instructions.

### Running `llama.cpp`

#### Getting a `GGUF`-format model

Run `./backend/get-qwen3-1.7b.sh` to download the Qwen 3 1.7B model from Hugging Face.

#### Running the inference server

Run `./llama-server -m <path-to-gguf> --port 8081` to start the inference server at `localhost:8081`, where `<path-to-gguf>` is the path to the model file downloaded above.

### Running the backend layer

Run `go run main.go`. This starts the backend layer at `localhost:8080`.

## A simple CLI client

A simple CLI client can be found at `backend/cli.py`; it connects to the backend layer at `localhost:8080`. Use the `\help` command to list the available operations.
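
## Quick start (example)

Putting the steps above together, the full local workflow looks roughly like the sketch below. The `llama.cpp` build commands, the location of the `llama-server` binary, the GGUF filename, the working directory for `go run main.go`, and the `python3` invocation are assumptions; adjust the paths to match your checkout and whatever the download script actually produces.

```bash
# Build llama.cpp (see upstream docs; a CMake build is assumed here)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
cd ..

# Download the Qwen 3 1.7B model using the script from this repo
./backend/get-qwen3-1.7b.sh

# Start the inference server on port 8081
# (the GGUF path below is an assumption; use the file the script downloaded)
./llama.cpp/build/bin/llama-server -m ./backend/qwen3-1.7b.gguf --port 8081 &

# Start the Go orchestration layer on port 8080
# (running from the backend directory is assumed)
(cd backend && go run main.go) &

# Connect with the CLI client, then type \help to list operations
python3 backend/cli.py
```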