
MIND Backend

Setup

The steps below set up the backend for local testing: a llama.cpp inference server at localhost:8081 and the Go orchestration layer at localhost:8080.

Building llama.cpp

See the llama.cpp documentation for build instructions.

Running llama.cpp

Getting a GGUF format model

Run ./backend/get-qwen3-1.7b.sh to download the Qwen 3 1.7B model from HuggingFace.

Running the inference server

Run ./llama-server -m <path-to-gguf-model> --port 8081 to start the inference server at localhost:8081.
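
Once the server is up, you can sanity-check it independently of the backend layer. The sketch below is illustrative and assumes the stock llama-server HTTP endpoints (/health and the OpenAI-compatible /v1/chat/completions); adjust the host, port, and payload to match your setup.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	base := "http://localhost:8081" // assumed inference server address

	// Check that the inference server is up.
	resp, err := http.Get(base + "/health")
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	fmt.Println("health status:", resp.Status)

	// Send a single chat completion request to the OpenAI-compatible endpoint.
	payload := []byte(`{
		"messages": [{"role": "user", "content": "Say hello in one sentence."}],
		"max_tokens": 32
	}`)
	resp, err = http.Post(base+"/v1/chat/completions", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}
```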

Running the backend layer

Run go run main.go. This starts the backend layer at localhost:8080.
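
As an illustration of what the orchestration layer does, here is a minimal sketch of a handler that forwards a prompt to the inference server and relays the response. The /chat route, the ChatRequest shape, and the use of the OpenAI-compatible /v1/chat/completions endpoint are assumptions made for this example; the actual routes and request types are defined in main.go.

```go
// Illustrative sketch only; not the repository's main.go.
// The /chat route and ChatRequest type are hypothetical names.
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log"
	"net/http"
)

type ChatRequest struct {
	Prompt string `json:"prompt"`
}

func main() {
	http.HandleFunc("/chat", func(w http.ResponseWriter, r *http.Request) {
		var req ChatRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}

		// Forward the prompt to the llama.cpp server's OpenAI-compatible endpoint.
		payload, _ := json.Marshal(map[string]any{
			"messages": []map[string]string{{"role": "user", "content": req.Prompt}},
		})
		resp, err := http.Post("http://localhost:8081/v1/chat/completions",
			"application/json", bytes.NewReader(payload))
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()

		// Relay the inference server's JSON response to the client.
		w.Header().Set("Content-Type", "application/json")
		io.Copy(w, resp.Body)
	})

	log.Println("orchestration layer listening on localhost:8080")
	log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
```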

A simple CLI client

A simple CLI client is available at backend/cli.py; it connects to the backend layer at localhost:8080.

Use the \help command inside the client to list the available operations.