
MIND - Modular Inference & Node Database - Server-Side Design

High-Level Overview

Inference Engine - llama.cpp

A modified version of llama.cpp whose completion API accepts extra fields that specify the use of an on-disk kv-cache, and whose responses tell the client where the new kv-cache blocks are located.

Database - MySQL

This will store information about users, the conversation histories, and an index of the kv-cache chunks stored on disk.

Backend Server - Go Layer

This will provide the APIs used by the frontend. It talks to the inference engine so that it can load the correct kv-cache chunks into memory or reconstruct a conversation from cache, and it manages the life cycle of caches stored on disk.

It will also handle authentication (add-on feature).

CLI Interface - Go/Python

This will provide a simple interface to all the features of the backend, for ease of testing and prototyping.

Supported APIs For the Backend

Note that all APIs will need to encode the owner of the node.

POST /conversations

This will start a new conversation tree. The Go backend should handle the node creation.

GET /conversations

This will return all the root nodes of the conversation trees, to provide context for the user to switch conversation trees.

GET /tree

This will return the DAG under a root, optionally limited to a specified depth from the root or a reverse depth from the leaves, providing context for the user to switch between branches on a given tree.
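A depth-limited traversal of that kind could be sketched as follows. The `Node` shape here is illustrative, not the actual database schema:

```go
package main

import "fmt"

// Node is a minimal conversation node; the fields are illustrative.
type Node struct {
	ID       int
	Children []*Node
}

// subtree returns the IDs reachable from root within maxDepth edges,
// a sketch of what GET /tree could compute before serializing.
func subtree(root *Node, maxDepth int) []int {
	ids := []int{root.ID}
	if maxDepth == 0 {
		return ids
	}
	for _, c := range root.Children {
		ids = append(ids, subtree(c, maxDepth-1)...)
	}
	return ids
}

func main() {
	leaf := &Node{ID: 3}
	mid := &Node{ID: 2, Children: []*Node{leaf}}
	root := &Node{ID: 1, Children: []*Node{mid}}
	fmt.Println(subtree(root, 1)) // depth 1: the root and its direct children
}
```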

POST /branches

Creates a new fork from a given node.

GET /branches

Lists the branches related to the current branch. The maximum number of branch-off points to list can also be specified.

POST /graft

Attach a range of nodes from another conversation.

POST /detach

Detaches a branch into a new conversation.

GET /linearize

Reconstruct a linear conversation history from a branch or node.
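Since each node stores a link to its parent, linearization reduces to walking from the given node back to the root and reversing. A minimal sketch, with an illustrative node type:

```go
package main

import "fmt"

// Msg is an illustrative node with a parent pointer, as the database
// would store prompt/answer nodes.
type Msg struct {
	Parent *Msg
	Text   string
}

// linearize walks parent links from a node up to the root and returns
// the messages in chronological order — a sketch of GET /linearize.
func linearize(leaf *Msg) []string {
	var rev []string
	for n := leaf; n != nil; n = n.Parent {
		rev = append(rev, n.Text)
	}
	// reverse so the root comes first
	out := make([]string, len(rev))
	for i, s := range rev {
		out[len(rev)-1-i] = s
	}
	return out
}

func main() {
	root := &Msg{Text: "prompt 1"}
	a := &Msg{Parent: root, Text: "answer 1"}
	p2 := &Msg{Parent: a, Text: "prompt 2"}
	fmt.Println(linearize(p2)) // root-first chronological order
}
```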

POST /completion

Triggers inference from the last node of a branch, creating two new nodes: one for the prompt and one for the answer.

Note that the frontend talks only to the Go backend here, so the Go backend is responsible for bookkeeping the kv-cache on disk; the frontend does not need to worry about it.
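The two-node bookkeeping can be sketched as below. Actual inference and kv-cache writes are elided; the `node` fields are illustrative, not the real schema:

```go
package main

import "fmt"

// node mirrors a row in the conversation table; fields are illustrative.
type node struct {
	id     int
	parent int
	role   string // "prompt" or "answer"
	text   string
}

var nextID = 1

// appendTurn records one completion round as two nodes: the user's
// prompt as a child of the branch tip, and the model's answer as a
// child of that prompt — the bookkeeping POST /completion performs.
func appendTurn(tip int, prompt, answer string) (node, node) {
	p := node{id: nextID, parent: tip, role: "prompt", text: prompt}
	nextID++
	a := node{id: nextID, parent: p.id, role: "answer", text: answer}
	nextID++
	return p, a
}

func main() {
	p, a := appendTurn(0, "Hi", "Hello!")
	fmt.Println(p.id, a.parent) // the answer's parent is the prompt node
}
```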

POST /login

Logs in as a given user.

POST /logout

Logs out a user.

GET /me

Get the current user.

Database-Specific

The database should keep track of reachability and the backend should automatically remove orphaned nodes and caches.

It should also keep track of the DAG generated by the prompts and answers and different root nodes.
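The orphan-removal pass amounts to a reachability sweep over the DAG: anything not reachable from a root can be deleted along with its cache chunks. A minimal sketch over integer IDs:

```go
package main

import (
	"fmt"
	"sort"
)

// orphans returns node IDs unreachable from any root — a sketch of the
// cleanup pass that deletes orphaned nodes and their cache chunks.
func orphans(all, roots []int, children map[int][]int) []int {
	seen := map[int]bool{}
	stack := append([]int{}, roots...)
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[n] {
			continue
		}
		seen[n] = true
		stack = append(stack, children[n]...)
	}
	var out []int
	for _, id := range all {
		if !seen[id] {
			out = append(out, id)
		}
	}
	sort.Ints(out)
	return out
}

func main() {
	children := map[int][]int{1: {2}, 4: {5}}
	fmt.Println(orphans([]int{1, 2, 4, 5}, []int{1}, children)) // nodes 4 and 5 are orphaned
}
```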

Cache

For a single user, the kv-cache on disk only needs to cover the working node's path, that is, the working node and all of its ancestor nodes.

Multiple users

Authentication

JWT-based authentication with multi-user switching; all API calls except POST /login require a token.

A default token will be issued during the earlier stages of development.
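The gate applied to each request can be sketched as a pure check. `validateJWT` here is a placeholder for real signature and expiry verification, and the hard-coded default token is only a stand-in for the earlier stages:

```go
package main

import "fmt"

// authorized sketches the auth gate: POST /login is open; every other
// call needs a valid token. validateJWT stands in for real JWT
// signature/expiry verification.
func authorized(method, path, token string, validateJWT func(string) bool) bool {
	if method == "POST" && path == "/login" {
		return true
	}
	return validateJWT(token)
}

func main() {
	valid := func(tok string) bool { return tok == "default-token" } // stand-in check
	fmt.Println(authorized("POST", "/login", "", valid)) // login is always open
	fmt.Println(authorized("GET", "/me", "", valid))     // rejected without a token
}
```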

Queuing

The Go layer is also responsible for tracking the availability of the llama.cpp service and for queuing prompts when multiple users submit requests concurrently.
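One natural shape for this in Go is a buffered channel drained by a single consumer, modeling one llama.cpp instance shared by many users. A minimal, synchronous sketch:

```go
package main

import "fmt"

// drain models the prompt queue: prompts are buffered on a channel and
// a single consumer serves them in FIFO order, standing in for one
// llama.cpp instance shared by multiple users.
func drain(prompts []string) []string {
	ch := make(chan string, len(prompts))
	for _, p := range prompts {
		ch <- p
	}
	close(ch)
	var served []string
	for p := range ch {
		served = append(served, p) // a real server would call the llama.cpp completion API here
	}
	return served
}

func main() {
	fmt.Println(drain([]string{"user1: hi", "user2: hello"})) // served in arrival order
}
```

In the real server the consumer would run as a long-lived goroutine and requests would block (or poll) until their prompt is served.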