MIND - Modular Inference & Node Database - Server-Side Design
High-Level Overview
Inference Engine - llama.cpp
A modified version of llama.cpp whose completion API accepts extra fields
specifying the use of the on-disk kv-cache, and whose responses tell the
client where the new kv-cache blocks are located.
Database - MySQL
This will store user information, the conversation histories, and the index of the kv-cache chunks stored on disk.
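A minimal sketch of the three kinds of records the schema would need, written as Go structs; the field names and types are illustrative assumptions, not a final schema.

```go
package main

import "fmt"

// User is one account row.
type User struct {
	ID   int64
	Name string
}

// Node is one prompt or answer in a conversation DAG.
type Node struct {
	ID       int64
	OwnerID  int64  // owning user, since all APIs encode the node's owner
	ParentID *int64 // nil for a conversation root
	Role     string // "prompt" or "answer"
	Text     string
}

// CacheChunk indexes one on-disk kv-cache chunk belonging to a node.
type CacheChunk struct {
	NodeID int64
	Seq    int    // chunk order within the node's cache
	Path   string // location on disk
}

func main() {
	root := Node{ID: 1, OwnerID: 7, Role: "prompt", Text: "Hello"}
	fmt.Println(root.ParentID == nil) // roots have no parent
}
```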
Backend Server - Go Layer
This will provide the APIs used by the frontend. It will talk to the inference engine to load the correct kv-cache chunks into memory or reconstruct a conversation from cache, and it will manage the life cycle of the caches stored on disk.
It will also handle authentication (add-on feature).
CLI Interface - Go/Python
This will provide a simple interface to access all the features provided by the backend, for ease of testing and prototyping.
Supported APIs For the Backend
Note that all APIs will need to encode the owner of the node.
POST /conversations
This will start a new conversation tree. The Go backend should
handle the node creation.
GET /conversations
This will return all the root nodes of the conversation trees, to provide context for the user to switch conversation trees.
GET /tree
This will return the DAG under a root, or within a specified depth from the root or reverse depth from the leaves, providing context for the user to switch between branches of a given tree.
POST /branches
Creates a new fork from a given commit.
GET /branches
List the branches related to the current branch. Can also specify the maximum number of branch-off points to list.
POST /graft
Attach a range of nodes from another conversation.
POST /detach
Detaches a branch into a new conversation.
GET /linearize
Reconstruct a linear conversation history from a branch or node.
POST /completion
Trigger inference from the last node of a branch, creating two new nodes, one for the prompt and one for the answer.
Note that this endpoint talks to the Go backend, so the Go backend is responsible for bookkeeping the kv-cache on disk; the frontend does not need to track it.
POST /login
Logs in as a given user.
POST /logout
Logs out a user.
GET /me
Get the current user.
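The route surface above can be sketched with the standard library mux; the handler bodies here are stubs, and splitting POST from GET per path is left to the real handlers. This is a sketch of the wiring, not the actual backend.

```go
package main

import (
	"fmt"
	"net/http"
)

// newMux registers every endpoint listed in this design with stub
// handlers that just echo which route was hit.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	stub := func(name string) http.HandlerFunc {
		return func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintf(w, "%s %s -> %s\n", r.Method, r.URL.Path, name)
		}
	}
	mux.HandleFunc("/conversations", stub("conversations")) // POST creates, GET lists roots
	mux.HandleFunc("/tree", stub("tree"))
	mux.HandleFunc("/branches", stub("branches")) // POST forks, GET lists
	mux.HandleFunc("/graft", stub("graft"))
	mux.HandleFunc("/detach", stub("detach"))
	mux.HandleFunc("/linearize", stub("linearize"))
	mux.HandleFunc("/completion", stub("completion"))
	mux.HandleFunc("/login", stub("login"))
	mux.HandleFunc("/logout", stub("logout"))
	mux.HandleFunc("/me", stub("me"))
	return mux
}

func main() {
	mux := newMux()
	fmt.Println(mux != nil)
	// For real use: http.ListenAndServe(":8080", mux)
}
```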
Database-Specific
The database should keep track of reachability and the backend should automatically remove orphaned nodes and caches.
It should also keep track of the DAG generated by the prompts and answers and different root nodes.
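The orphan removal above amounts to a mark-and-sweep over the DAG: mark everything reachable from the root set, then delete the rest (rows and their cache chunks). A minimal sketch of that traversal, with made-up node IDs:

```go
package main

import "fmt"

// orphans marks every node reachable from the roots via child links,
// then returns the unmarked nodes so their rows and cache chunks can
// be deleted.
func orphans(children map[int64][]int64, roots []int64, all []int64) []int64 {
	reachable := map[int64]bool{}
	stack := append([]int64(nil), roots...)
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if reachable[n] {
			continue
		}
		reachable[n] = true
		stack = append(stack, children[n]...)
	}
	var dead []int64
	for _, n := range all {
		if !reachable[n] {
			dead = append(dead, n)
		}
	}
	return dead
}

func main() {
	children := map[int64][]int64{1: {2, 3}} // node 1 has children 2 and 3
	fmt.Println(orphans(children, []int64{1}, []int64{1, 2, 3, 4})) // [4]
}
```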
Cache
For a single user, the kv-cache on disk only needs to cover the working node and all of its ancestor nodes.
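Computing that working set is just a walk up the parent links from the working node to its root; a short sketch:

```go
package main

import "fmt"

// ancestorChain returns the working node followed by its ancestors up
// to the root: exactly the nodes whose kv-cache must be kept on disk.
func ancestorChain(parent map[int64]int64, node int64) []int64 {
	chain := []int64{node}
	for {
		p, ok := parent[node]
		if !ok {
			return chain // reached a root
		}
		chain = append(chain, p)
		node = p
	}
}

func main() {
	parent := map[int64]int64{3: 2, 2: 1} // 1 is the root
	fmt.Println(ancestorChain(parent, 3)) // [3 2 1]
}
```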
Multiple users
Authentication
JWT-based authentication and multi-user switching; all API calls
except POST /login will require a token.
A default token will be issued in earlier stages.
Queuing
The Go layer should also be responsible for tracking the availability
of the llama.cpp service and for queuing prompts when multiple users
are active.