# MIND Backend
## Setup
The steps below set up the backend for local testing: a `go`
orchestration layer on `localhost:8080` and a `llama.cpp` inference
server on `localhost:8081`.
### Building `llama.cpp`
See the `llama.cpp` documentation for detailed build instructions.
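For reference, recent versions of `llama.cpp` build with CMake; a typical release build (per the upstream README at the time of writing, so check there if the build system has changed) looks like:

```sh
# Standard CMake release build of llama.cpp.
cmake -B build
cmake --build build --config Release
# Binaries, including llama-server, end up under build/bin/.
```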
### Running `llama.cpp`
#### Getting a `GGUF` format model
Run `./backend/get-qwen3-1.7b.sh` to download the Qwen 3 1.7B model
from Hugging Face.
#### Running the inference server
Run `./llama-server -m <path-to-gguf-model> --port 8081` to start the
inference server on `localhost:8081`.
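Once the server is up, you can sanity-check it with `curl`; the `llama.cpp` server exposes `/health` and `/completion` endpoints, among others:

```sh
# Check that the model has finished loading.
curl http://localhost:8081/health

# Request a short completion (n_predict caps the number of generated tokens).
curl http://localhost:8081/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Hello", "n_predict": 16}'
```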
### Running the backend layer
Run `go run main.go` to start the backend layer on
`localhost:8080`.
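To confirm the layer is listening, a quick `curl` against the port works. Note that the route below is hypothetical; the actual endpoints depend on what `main.go` registers:

```sh
# Hypothetical smoke test: replace / with a route that main.go actually serves.
curl -i http://localhost:8080/
```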
## A simple CLI client
A simple CLI client can be found at `backend/cli.py`; it connects
to the backend layer at `localhost:8080`.
Use the `\help` command inside the client to list the available operations.
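A typical session (assuming a Python 3 interpreter is on your `PATH`) looks like:

```sh
python3 backend/cli.py   # connects to the backend layer on localhost:8080
# Then, at the client prompt:
#   \help    lists the available operations
```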