# MIND Backend
## Setup
The steps below set up the backend for local testing: the `go`
orchestration layer on `localhost:8080` and a `llama.cpp` inference
server on `localhost:8081`.
### Building `llama.cpp`
See the `llama.cpp` documentation for build instructions.

### Running `llama.cpp`
#### Getting a `GGUF` format model
Run `./backend/get-qwen3-1.7b.sh` to download the Qwen 3 1.7B model
from Hugging Face.
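
After the download, a quick sanity check can confirm that the file really is in GGUF format, since GGUF files begin with the four ASCII bytes `GGUF`. The snippet below is a minimal sketch; the helper name `is_gguf` is hypothetical and not part of this repository.

```python
import sys
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def is_gguf(path: str) -> bool:
    """Return True if the file at `path` starts with the GGUF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC


if __name__ == "__main__":
    # Pass the path to the downloaded model as the first argument.
    model_path = sys.argv[1] if len(sys.argv) > 1 else ""
    print("looks like GGUF" if is_gguf(model_path) else "not a GGUF file (or missing)")
```

This only inspects the file header; it does not validate that the model loads.
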
#### Running the inference server
Run `./llama-server -m <path-to-gguf-model> --port 8081` to start the
inference server at `localhost:8081`.

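
To confirm that the inference server has finished loading the model, one option is to poll its `/health` endpoint, which `llama-server` is expected to answer with HTTP 200 once it is ready. The following is a minimal sketch under that assumption; the function name `server_ready` is hypothetical.

```python
import urllib.error
import urllib.request


def server_ready(base_url: str = "http://localhost:8081", timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an error status such as 503.
        return False


if __name__ == "__main__":
    print("ready" if server_ready() else "not ready")
```
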
### Running the backend layer
Run `go run main.go` to start the backend layer at `localhost:8080`.

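
With both processes started, a plain TCP check can confirm that something is listening on each port. The snippet below is a minimal sketch that uses the ports named in this README; the `SERVICES` mapping and the `is_listening` helper are hypothetical names, and the check says nothing about whether the services respond correctly.

```python
import socket

# Ports as given in this README; adjust if you changed them.
SERVICES = {"llama.cpp inference server": 8081, "backend layer": 8080}


def is_listening(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if is_listening(port) else "down"
        print(f"{name} (localhost:{port}): {state}")
```
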
## A simple CLI client
A simple CLI-based client is available at `backend/cli.py`; it
connects to the backend layer at `localhost:8080`.

Use the `\help` command to list the available operations.