Added code for backend glue
98
backend/design.md
Normal file
@@ -0,0 +1,98 @@
# MIND - Modular Inference & Node Database - Server-Side Design

## High-Level Overview

### Inference Engine - `llama.cpp`
A modified version of `llama.cpp` whose completion API takes extra
fields that specify the use of an on-disk kv-cache, and whose
responses tell the client where the new kv-cache blocks are located.

### Database - MySQL
This will store user information, the conversation histories, and
the index to the kv-cache stored as chunks on disk.

### Backend Server - Go Layer
This will provide the APIs used by the frontend. It will talk to the
inference engine so that it can load the correct chunks of kv-cache
into memory or reconstruct a conversation out of cache, and it will
handle the life cycle of caches stored on disk.

It will also handle authentication (add-on feature).

### CLI Interface - Go/Python
This will provide a simple interface to access all the features
provided by the backend, for ease of testing and prototyping.

## Supported APIs For the Backend
Note that all APIs will need to encode the owner of the node.

### `POST /conversations`
This will start a new conversation tree. The Go backend should
handle the node creation.

### `GET /conversations`
This will return the root nodes of all conversation trees, to
provide context for the user to switch conversation trees.

### `GET /tree`
This will return the DAG under a root, optionally limited to a
specified depth from the root or a reversed depth from the leaves,
which provides context for the user to switch between branches on a
given tree.
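
The depth-limited case can be sketched as a breadth-first walk. This is a minimal sketch, assuming the DAG is available as an adjacency map from node ID to child IDs; the `Subgraph` name, the string IDs, and the convention that a negative depth means "no limit" are illustrative assumptions, not part of the design.

```go
package main

// Subgraph returns the IDs reachable from root within maxDepth edges,
// in breadth-first order. A negative maxDepth means no limit.
// children maps a node ID to the IDs of its direct children
// (hypothetical in-memory shape, not the MySQL schema).
func Subgraph(children map[string][]string, root string, maxDepth int) []string {
	type item struct {
		id    string
		depth int
	}
	seen := map[string]bool{root: true}
	queue := []item{{root, 0}}
	var out []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		out = append(out, cur.id)
		if maxDepth >= 0 && cur.depth == maxDepth {
			continue // do not expand past the requested depth
		}
		for _, c := range children[cur.id] {
			if !seen[c] { // a DAG node may be reachable through several parents
				seen[c] = true
				queue = append(queue, item{c, cur.depth + 1})
			}
		}
	}
	return out
}
```

Calling `Subgraph(children, rootID, 1)` would return the root and its direct children only. The reversed-depth variant (counting from the leaves) is not covered by this sketch.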

### `POST /branches`
Creates a new fork from a given commit.

### `GET /branches`
Lists the branches related to the current branch. Can also specify
the maximum number of branch-off points to list.

### `POST /graft`
Attaches a range of nodes from another conversation.

### `POST /detach`
Detaches a branch into a new conversation.

### `GET /linearize`
Reconstructs a linear conversation history from a branch or node.
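
One way to do this reconstruction is to follow parent links from the given node up to the root and then reverse the result. The sketch below assumes each node records a single parent (a grafted node with several parents would need a rule for choosing one); the `Node` struct and its field names are hypothetical.

```go
package main

import "errors"

// Node is a hypothetical in-memory view of one prompt or answer.
type Node struct {
	ID       string
	ParentID string // empty for a root node
	Text     string
}

// Linearize walks parent links from leaf up to the root and returns
// the nodes in root-to-leaf order.
func Linearize(nodes map[string]Node, leaf string) ([]Node, error) {
	var path []Node
	seen := map[string]bool{}
	for id := leaf; id != ""; {
		n, ok := nodes[id]
		if !ok {
			return nil, errors.New("unknown node: " + id)
		}
		if seen[id] {
			return nil, errors.New("cycle detected at node: " + id)
		}
		seen[id] = true
		path = append(path, n)
		id = n.ParentID
	}
	// Reverse in place so the history reads oldest first.
	for i, j := 0, len(path)-1; i < j; i, j = i+1, j-1 {
		path[i], path[j] = path[j], path[i]
	}
	return path, nil
}
```

The cycle check is defensive: a well-formed tree never triggers it, but it keeps a corrupted parent link from looping forever.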

### `POST /completion`
Triggers inference from the last node of a branch, creating two new
nodes, one for the prompt and one for the answer.

Note that this endpoint talks to the Go backend, so the Go backend
is responsible for bookkeeping the kv-cache on disk, and the
frontend doesn't need to worry about it.
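
The two-node bookkeeping can be illustrated with a toy in-memory store. Everything here is an assumption made for illustration: the `Store` type stands in for the MySQL tables, the `"prompt"` and `"answer"` role strings are invented, and `infer` is a placeholder for the call to the modified `llama.cpp` server (kv-cache handling is left out).

```go
package main

import "fmt"

// Node is a hypothetical record for one prompt or answer.
type Node struct {
	ID, ParentID, Role, Text string
}

// Store is a toy in-memory stand-in for the database.
type Store struct {
	nodes map[string]Node
	next  int
}

func NewStore() *Store { return &Store{nodes: map[string]Node{}} }

// add creates a node under parent and returns it.
func (s *Store) add(parent, role, text string) Node {
	s.next++
	n := Node{ID: fmt.Sprintf("n%d", s.next), ParentID: parent, Role: role, Text: text}
	s.nodes[n.ID] = n
	return n
}

// Complete appends a prompt node under tip, runs infer on the prompt,
// and appends the answer node under the prompt node.
func (s *Store) Complete(tip, prompt string, infer func(string) (string, error)) (Node, Node, error) {
	p := s.add(tip, "prompt", prompt)
	answer, err := infer(prompt)
	if err != nil {
		delete(s.nodes, p.ID) // roll back so no dangling prompt node remains
		return Node{}, Node{}, err
	}
	a := s.add(p.ID, "answer", answer)
	return p, a, nil
}
```

Rolling back the prompt node on failure is one possible policy; keeping it so the user can retry would be equally reasonable.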

### `POST /login`
Logs in as a given user.

### `POST /logout`
Logs out a user.

### `GET /me`
Gets the current user.

## Database-Specific
The database should keep track of reachability, and the backend
should automatically remove orphaned nodes and caches.

It should also keep track of the DAG generated by the prompts and
answers, and of the different root nodes.
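
Orphan detection can be sketched as a mark phase over the DAG: mark everything reachable from the root nodes, then report whatever was not marked. The function name and the map-based inputs below are assumptions for illustration; a real implementation would read them from the database.

```go
package main

import "sort"

// Orphans returns the IDs of nodes that cannot be reached from any
// root. all lists every known node ID; children maps a node ID to
// its direct children.
func Orphans(all []string, children map[string][]string, roots []string) []string {
	reachable := map[string]bool{}
	stack := append([]string(nil), roots...) // copy so roots is not modified
	for len(stack) > 0 {
		id := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if reachable[id] {
			continue
		}
		reachable[id] = true
		stack = append(stack, children[id]...)
	}
	var orphans []string
	for _, id := range all {
		if !reachable[id] {
			orphans = append(orphans, id)
		}
	}
	sort.Strings(orphans) // stable output for logging and tests
	return orphans
}
```

The backend could then delete the returned nodes together with their cache chunks.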

## Cache
For a single user, the kv-cache on disk should only cover the
context of the working node, that is, the working node and all of
its parent nodes.
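
Under that rule, the set of chunks worth keeping follows from the ancestor chain of the working node. The sketch below assumes a `parent` map (node ID to parent ID) and a `chunks` map (node ID to chunk paths); both shapes and the `LiveChunks` name are hypothetical.

```go
package main

// LiveChunks returns the set of cache chunk paths that belong to the
// working node or to any of its ancestors. Chunks outside this set
// are candidates for eviction.
func LiveChunks(parent map[string]string, chunks map[string][]string, working string) map[string]bool {
	live := map[string]bool{}
	visited := map[string]bool{}
	// A missing parent entry yields "", which ends the walk at the root.
	for id := working; id != "" && !visited[id]; id = parent[id] {
		visited[id] = true
		for _, c := range chunks[id] {
			live[c] = true
		}
	}
	return live
}
```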

## Multiple users

### Authentication
JWT-based authentication and multi-user switching. All API calls
except for `POST /login` will require a token.

A default token will be provided during the earlier stages.
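
The "every call except `POST /login`" rule fits naturally into an HTTP middleware. This is a rough sketch using only the standard library: it assumes the token travels in an `Authorization: Bearer <token>` header (the design does not say how the token is sent), and `verify` is a hypothetical hook where the actual JWT signature and expiry checks would go.

```go
package main

import (
	"net/http"
	"strings"
)

// needsToken reports whether a request must carry a token.
func needsToken(method, path string) bool {
	return !(method == http.MethodPost && path == "/login")
}

// bearerToken extracts the token from an Authorization header value
// of the form "Bearer <token>" (assumed transport for the JWT).
func bearerToken(header string) (string, bool) {
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return "", false
	}
	tok := strings.TrimSpace(header[len(prefix):])
	return tok, tok != ""
}

// RequireToken rejects requests that need a token but lack a valid one.
// verify is a placeholder for real JWT validation.
func RequireToken(verify func(token string) bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if needsToken(r.Method, r.URL.Path) {
			tok, ok := bearerToken(r.Header.Get("Authorization"))
			if !ok || !verify(tok) {
				http.Error(w, "missing or invalid token", http.StatusUnauthorized)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}
```

During the earlier stages, `verify` could simply compare against the default token.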

### Queuing
The Go layer should also be responsible for keeping track of the
availability of the `llama.cpp` services, and for queuing prompts
when multiple users are active.
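
A simple way to queue prompts is a buffered channel drained by one worker per available inference slot. The sketch below is only one possible shape; the `Queue` type, the slot count, and the use of plain closures as jobs are all assumptions, and it does not track service health.

```go
package main

import "sync"

// Queue runs submitted jobs on a fixed number of workers, one per
// available inference slot.
type Queue struct {
	jobs chan func()
	wg   sync.WaitGroup
}

// NewQueue starts slots workers; capacity bounds how many jobs may
// wait before Submit blocks.
func NewQueue(slots, capacity int) *Queue {
	q := &Queue{jobs: make(chan func(), capacity)}
	for i := 0; i < slots; i++ {
		q.wg.Add(1)
		go func() {
			defer q.wg.Done()
			for job := range q.jobs { // exits once the channel is closed and drained
				job()
			}
		}()
	}
	return q
}

// Submit enqueues a job; it blocks while the queue is full.
func (q *Queue) Submit(job func()) { q.jobs <- job }

// Close stops accepting jobs and waits for queued ones to finish.
func (q *Queue) Close() {
	close(q.jobs)
	q.wg.Wait()
}
```

With a single slot, jobs run strictly in submission order, which matches a single `llama.cpp` instance serving one prompt at a time.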