# MIND - Modular Inference & Node Database - Server-Side Design
## High-Level Overview
### Inference Engine - `llama.cpp`

A modified version of `llama.cpp` whose completion API accepts extra
fields for requesting the on-disk kv-cache. It also tells the client
where the new kv-cache blocks are located.

### Database - MySQL

This will store user information, the conversation histories, and the
index of the kv-cache chunks stored on disk.

### Backend Server - Go Layer

This will provide the APIs used by the frontend and will talk to the
inference engine so that it can load the correct kv-cache chunks into
memory or reconstruct a conversation from the cache. It will also
manage the life cycle of the caches stored on disk.

It will also handle authentication (add-on feature).

### CLI Interface - Go/Python

This will provide a simple interface to all the features of the
backend, for ease of testing and prototyping.

## Supported APIs For the Backend

Note that all APIs will need to encode the owner of the node.

### `POST /conversations`

This will start a new conversation tree. The Go backend should
handle the node creation.

### `GET /conversations`

This will return the root nodes of all conversation trees, giving the
user the context needed to switch between conversation trees.

### `GET /tree`

This will return the DAG under a root, optionally limited to a
specified depth from the root or a reversed depth from the leaves.
This gives the user the context needed to switch between branches of
a given tree.

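
The depth-limited variant could be implemented as a breadth-first walk that stops after a given number of levels. The sketch below is only an illustration under assumptions: the `Node` type, its `Children` field, and the `subgraph` function are hypothetical names, the graph lives in an in-memory map rather than MySQL, and only the forward depth limit is shown (not the reversed depth from the leaves).

```go
package main

import "fmt"

// Node is a hypothetical in-memory stand-in for a row in the node table.
type Node struct {
	ID       int
	Children []int
}

// subgraph returns the IDs reachable from root within maxDepth edges,
// in breadth-first order. A negative maxDepth means "no limit".
// Each node is listed once even if the DAG reaches it by several paths.
func subgraph(nodes map[int]Node, root, maxDepth int) []int {
	if _, ok := nodes[root]; !ok {
		return nil
	}
	seen := map[int]bool{root: true}
	order := []int{root}
	frontier := []int{root}
	for depth := 0; len(frontier) > 0 && (maxDepth < 0 || depth < maxDepth); depth++ {
		var next []int
		for _, id := range frontier {
			for _, child := range nodes[id].Children {
				if _, ok := nodes[child]; !ok || seen[child] {
					continue
				}
				seen[child] = true
				order = append(order, child)
				next = append(next, child)
			}
		}
		frontier = next
	}
	return order
}

func main() {
	// 1 -> 2 -> 4 and 1 -> 3 -> 4 (node 4 is shared by both branches).
	nodes := map[int]Node{
		1: {ID: 1, Children: []int{2, 3}},
		2: {ID: 2, Children: []int{4}},
		3: {ID: 3, Children: []int{4}},
		4: {ID: 4},
	}
	fmt.Println(subgraph(nodes, 1, 1))  // root plus its direct children
	fmt.Println(subgraph(nodes, 1, -1)) // everything under the root
}
```

The `seen` set is there because the design calls the structure a DAG, so a node may be reachable through more than one path.
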
### `POST /branches`

Creates a new fork from a given commit.

### `GET /branches`

Lists the branches related to the current branch. The maximum number
of branch-off points to list can also be specified.

### `POST /graft`

Attaches a range of nodes from another conversation.

### `POST /detach`

Detaches a branch into a new conversation.

### `GET /linearize`

Reconstructs a linear conversation history from a branch or node.

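
Linearization can be pictured as following parent links from the chosen node up to the root and then reversing the result. The sketch below assumes each node records a single parent (with `0` meaning "no parent"); the `Node` fields and the `linearize` function are hypothetical names, not part of the design above.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical record; Parent is 0 for a root node.
type Node struct {
	ID     int
	Parent int
	Role   string // "prompt" or "answer"
	Text   string
}

// linearize walks parent links from the given node up to the root and
// returns the nodes in root-first (chronological) order.
func linearize(nodes map[int]Node, from int) ([]Node, error) {
	var path []Node
	seen := map[int]bool{}
	for id := from; id != 0; {
		n, ok := nodes[id]
		if !ok {
			return nil, fmt.Errorf("node %d not found", id)
		}
		if seen[id] {
			return nil, errors.New("cycle detected in parent links")
		}
		seen[id] = true
		path = append(path, n)
		id = n.Parent
	}
	// Reverse in place so the root comes first.
	for i, j := 0, len(path)-1; i < j; i, j = i+1, j-1 {
		path[i], path[j] = path[j], path[i]
	}
	return path, nil
}

func main() {
	nodes := map[int]Node{
		1: {ID: 1, Parent: 0, Role: "prompt", Text: "Hi"},
		2: {ID: 2, Parent: 1, Role: "answer", Text: "Hello!"},
		3: {ID: 3, Parent: 2, Role: "prompt", Text: "Tell me a joke"},
	}
	history, err := linearize(nodes, 3)
	if err != nil {
		panic(err)
	}
	for _, n := range history {
		fmt.Printf("%s: %s\n", n.Role, n.Text)
	}
}
```
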
### `POST /completion`

Triggers inference from the last node of a branch, creating two new
nodes: one for the prompt and one for the answer.

Note that this endpoint talks to the Go backend, so the Go backend is
responsible for bookkeeping the kv-cache on disk, and the frontend
doesn't need to worry about it.

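
The two-node bookkeeping could look roughly like the following. This is a sketch under assumptions: `Store`, `Node`, and `Complete` are hypothetical names, the store is an in-memory map instead of MySQL, and the inference engine is replaced by a function argument. Storing both nodes only after inference succeeds is one possible choice, not something the design prescribes.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical record; Parent is 0 for the first node of a tree.
type Node struct {
	ID     int
	Parent int
	Role   string // "prompt" or "answer"
	Text   string
}

// Store is a hypothetical in-memory stand-in for the MySQL node table.
type Store struct {
	nodes  map[int]Node
	nextID int
}

func NewStore() *Store {
	return &Store{nodes: map[int]Node{}, nextID: 1}
}

func (s *Store) add(parent int, role, text string) Node {
	n := Node{ID: s.nextID, Parent: parent, Role: role, Text: text}
	s.nodes[n.ID] = n
	s.nextID++
	return n
}

// Complete runs inference for prompt and, on success, appends a prompt node
// under leaf and an answer node under that prompt node.
func (s *Store) Complete(leaf int, prompt string, infer func(string) (string, error)) (Node, Node, error) {
	if _, ok := s.nodes[leaf]; leaf != 0 && !ok {
		return Node{}, Node{}, errors.New("leaf node not found")
	}
	answer, err := infer(prompt)
	if err != nil {
		return Node{}, Node{}, err // nothing is stored on failure
	}
	p := s.add(leaf, "prompt", prompt)
	a := s.add(p.ID, "answer", answer)
	return p, a, nil
}

func main() {
	s := NewStore()
	// fake stands in for the call to the inference engine.
	fake := func(prompt string) (string, error) { return "echo: " + prompt, nil }
	p, a, err := s.Complete(0, "Hi", fake)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.ID, a.ID, a.Parent == p.ID)
}
```
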
### `POST /login`

Logs in as a given user.

### `POST /logout`

Logs out a user.

### `GET /me`

Returns the current user.

## Database-Specific

The database should keep track of reachability, and the backend should
automatically remove orphaned nodes and caches.

It should also keep track of the DAG formed by the prompts and
answers, along with the different root nodes.

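
Orphan detection can be thought of as a mark-and-sweep pass: mark everything reachable from the root nodes, then treat the rest as orphaned. The sketch below is a minimal in-memory illustration; the `orphans` function and the shape of its arguments are assumptions, and a real implementation would likely run against the MySQL tables and also delete the matching cache chunks.

```go
package main

import (
	"fmt"
	"sort"
)

// orphans returns the sorted IDs in all that cannot be reached from any
// root by following child edges.
func orphans(children map[int][]int, all, roots []int) []int {
	reachable := map[int]bool{}
	stack := append([]int(nil), roots...)
	for len(stack) > 0 {
		id := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if reachable[id] {
			continue
		}
		reachable[id] = true
		stack = append(stack, children[id]...)
	}
	var out []int
	for _, id := range all {
		if !reachable[id] {
			out = append(out, id)
		}
	}
	sort.Ints(out)
	return out
}

func main() {
	// Root 1 owns 1 -> 2 -> 3. Nodes 4 -> 5 are no longer attached to a root.
	children := map[int][]int{1: {2}, 2: {3}, 4: {5}}
	all := []int{1, 2, 3, 4, 5}
	fmt.Println(orphans(children, all, []int{1}))
}
```
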
## Cache

For a single user, the kv-cache on disk only needs to cover the
working node, that is, the chain of all of its parent nodes.

## Multiple Users

### Authentication

Authentication is JWT-based and supports multi-user switching. All API
calls except `POST /login` require a token.

A default token will be provided during the earlier stages.

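
The "token on everything except `POST /login`" rule could be enforced by a small piece of HTTP middleware. The sketch below uses only the standard library and makes several assumptions: the token is carried in an `Authorization: Bearer <token>` header (the design does not say where it travels), and `verify` is a hypothetical hook where real JWT validation would go. The names `requiresToken`, `bearerToken`, and `requireAuth` are illustrative.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// requiresToken reports whether a request must carry a token.
// Only POST /login is exempt.
func requiresToken(method, path string) bool {
	return !(method == http.MethodPost && path == "/login")
}

// bearerToken extracts the token from an "Authorization" header value of
// the form "Bearer <token>". The header format is an assumption.
func bearerToken(header string) (string, bool) {
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return "", false
	}
	token := strings.TrimSpace(header[len(prefix):])
	return token, token != ""
}

// requireAuth wraps next and rejects requests without a valid token.
// verify is a hypothetical hook for JWT signature and expiry checks.
func requireAuth(verify func(token string) bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if requiresToken(r.Method, r.URL.Path) {
			token, ok := bearerToken(r.Header.Get("Authorization"))
			if !ok || !verify(token) {
				http.Error(w, "unauthorized", http.StatusUnauthorized)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	fmt.Println(requiresToken("POST", "/login"))
	fmt.Println(requiresToken("GET", "/me"))
	fmt.Println(bearerToken("Bearer abc.def.ghi"))
}
```

For the earlier stages, `verify` could simply compare against the default token.
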
### Queuing

The Go layer should also keep track of the availability of the
`llama.cpp` services and queue prompts when multiple users are
active.
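
One simple way to queue prompts is a counting semaphore that caps how many requests reach the inference engine at once, with additional callers blocking until a slot frees up. The sketch below is an assumption-laden illustration: `Dispatcher`, `NewDispatcher`, and `Submit` are hypothetical names, the engine is replaced by a function argument, and the order in which waiting callers are served is left to the Go runtime.

```go
package main

import (
	"fmt"
	"sync"
)

// Dispatcher limits how many prompts are sent to the inference engine
// at the same time.
type Dispatcher struct {
	slots chan struct{}
	infer func(prompt string) string
}

// NewDispatcher allows at most slots concurrent calls (minimum 1).
func NewDispatcher(slots int, infer func(string) string) *Dispatcher {
	if slots < 1 {
		slots = 1
	}
	return &Dispatcher{slots: make(chan struct{}, slots), infer: infer}
}

// Submit blocks until a slot is free, runs inference, then frees the slot.
func (d *Dispatcher) Submit(prompt string) string {
	d.slots <- struct{}{}
	defer func() { <-d.slots }()
	return d.infer(prompt)
}

func main() {
	// A single slot models one llama.cpp service shared by several users.
	d := NewDispatcher(1, func(p string) string { return "echo: " + p })
	prompts := []string{"a", "b", "c"}
	results := make([]string, len(prompts))
	var wg sync.WaitGroup
	for i, p := range prompts {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			results[i] = d.Submit(p)
		}(i, p)
	}
	wg.Wait()
	fmt.Println(results)
}
```
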