MIND - Modular Inference & Node Database - Server-Side Design
High-Level Overview
Inference Engine - llama.cpp
A modified version of llama.cpp whose completion API accepts extra fields
specifying the use of the on-disk kv-cache, and whose responses tell the
client where the new kv-cache blocks are located.
Database - MySQL
This will store user information, the conversation histories, and the index of the kv-cache chunks stored on disk.
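A minimal sketch of the three kinds of records the schema would need, written as Go structs; the field names and types are illustrative assumptions, not a final schema.

```go
package main

import "fmt"

// User is one account row.
type User struct {
	ID   int64
	Name string
}

// Node is one prompt or answer in a conversation DAG.
type Node struct {
	ID       int64
	OwnerID  int64  // owning user, since all APIs encode the node's owner
	ParentID *int64 // nil for a conversation root
	Role     string // "prompt" or "answer"
	Text     string
}

// CacheChunk indexes one on-disk kv-cache chunk belonging to a node.
type CacheChunk struct {
	NodeID int64
	Seq    int    // chunk order within the node's cache
	Path   string // location on disk
}

func main() {
	root := Node{ID: 1, OwnerID: 7, Role: "prompt", Text: "Hello"}
	fmt.Println(root.ParentID == nil) // roots have no parent
}
```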
Backend Server - Go Layer
This will provide the APIs used by the frontend. It will talk to the inference engine to load the correct kv-cache chunks into memory or reconstruct a conversation from cache, and it will manage the life cycle of the caches stored on disk.
It will also handle authentication (add-on feature).
CLI Interface - Go/Python
This will provide a simple interface to access all the features provided by the backend, for ease of testing and prototyping.
Supported APIs For the Backend
Note that all APIs will need to encode the owner of the node.
POST /conversations
This will start a new conversation tree. The Go backend should
handle the node creation.
GET /conversations
This will return all the root nodes of the conversation trees, to provide context for the user to switch conversation trees.
GET /tree
This will return the DAG under a root, or within a specified depth from the root or reverse depth from the leaves, providing context for the user to switch between branches of a given tree.
POST /branches
Creates a new fork from a given commit.
GET /branches
List the branches related to the current branch. Can also specify the maximum number of branch-off points to list.
POST /graft
Attach a range of nodes from another conversation.
POST /detach
Detaches a branch into a new conversation.
GET /linearize
Reconstruct a linear conversation history from a branch or node.
POST /completion
Trigger inference from the last node of a branch, creating two new nodes, one for the prompt and one for the answer.
Note that this endpoint talks to the Go backend, so the Go backend is responsible for bookkeeping the kv-cache on disk; the frontend does not need to track it.
POST /login
Logs in as a given user.
POST /logout
Logs out a user.
GET /me
Get the current user.
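The route surface above can be sketched with the standard library mux; the handler bodies here are stubs, and splitting POST from GET per path is left to the real handlers. This is a sketch of the wiring, not the actual backend.

```go
package main

import (
	"fmt"
	"net/http"
)

// newMux registers every endpoint listed in this design with stub
// handlers that just echo which route was hit.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	stub := func(name string) http.HandlerFunc {
		return func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintf(w, "%s %s -> %s\n", r.Method, r.URL.Path, name)
		}
	}
	mux.HandleFunc("/conversations", stub("conversations")) // POST creates, GET lists roots
	mux.HandleFunc("/tree", stub("tree"))
	mux.HandleFunc("/branches", stub("branches")) // POST forks, GET lists
	mux.HandleFunc("/graft", stub("graft"))
	mux.HandleFunc("/detach", stub("detach"))
	mux.HandleFunc("/linearize", stub("linearize"))
	mux.HandleFunc("/completion", stub("completion"))
	mux.HandleFunc("/login", stub("login"))
	mux.HandleFunc("/logout", stub("logout"))
	mux.HandleFunc("/me", stub("me"))
	return mux
}

func main() {
	mux := newMux()
	fmt.Println(mux != nil)
	// For real use: http.ListenAndServe(":8080", mux)
}
```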
Database-Specific
The database should keep track of reachability and the backend should automatically remove orphaned nodes and caches.
It should also keep track of the DAG generated by the prompts and answers and different root nodes.
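The orphan removal above amounts to a mark-and-sweep over the DAG: mark everything reachable from the root set, then delete the rest (rows and their cache chunks). A minimal sketch of that traversal, with made-up node IDs:

```go
package main

import "fmt"

// orphans marks every node reachable from the roots via child links,
// then returns the unmarked nodes so their rows and cache chunks can
// be deleted.
func orphans(children map[int64][]int64, roots []int64, all []int64) []int64 {
	reachable := map[int64]bool{}
	stack := append([]int64(nil), roots...)
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if reachable[n] {
			continue
		}
		reachable[n] = true
		stack = append(stack, children[n]...)
	}
	var dead []int64
	for _, n := range all {
		if !reachable[n] {
			dead = append(dead, n)
		}
	}
	return dead
}

func main() {
	children := map[int64][]int64{1: {2, 3}} // node 1 has children 2 and 3
	fmt.Println(orphans(children, []int64{1}, []int64{1, 2, 3, 4})) // [4]
}
```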
Cache
For a single user, the kv-cache on disk only needs to cover the working node and all of its ancestor nodes.
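Computing that working set is just a walk up the parent links from the working node to its root; a short sketch:

```go
package main

import "fmt"

// ancestorChain returns the working node followed by its ancestors up
// to the root: exactly the nodes whose kv-cache must be kept on disk.
func ancestorChain(parent map[int64]int64, node int64) []int64 {
	chain := []int64{node}
	for {
		p, ok := parent[node]
		if !ok {
			return chain // reached a root
		}
		chain = append(chain, p)
		node = p
	}
}

func main() {
	parent := map[int64]int64{3: 2, 2: 1} // 1 is the root
	fmt.Println(ancestorChain(parent, 3)) // [3 2 1]
}
```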
Multiple users
Authentication
JWT-based authentication and multi-user switching; all API calls
except POST /login will require a token.
A default token will be issued in earlier stages.
Queuing
The Go layer should also be responsible for tracking the availability
of the llama.cpp service and for queuing prompts when multiple users
are active.