# MIND - Modular Inference & Node Database - Server-Side Design

## High-Level Overview

### Inference Engine - `llama.cpp`

A modified version of `llama.cpp` whose completion API accepts extra fields specifying which on-disk kv-cache to use, and reports back to the client where the new kv-cache blocks are located.

### Database - MySQL

Stores user information, conversation histories, and the index to the kv-cache stored as chunks on disk.

### Backend Server - Go Layer

Provides the APIs used by the frontend and talks to the inference engine: it loads the correct kv-cache chunks into memory, reconstructs conversations out of cache, and handles the life cycle of caches stored on disk. It will also handle authentication (an add-on feature).

### CLI Interface - Go/Python

Provides a simple interface to all the features exposed by the backend, for ease of testing and prototyping.

## Supported APIs For the Backend

Note that all APIs will need to encode the owner of the node.

### `POST /conversations`

Starts a new conversation tree. The Go backend handles the node creation.

### `GET /conversations`

Returns all the root nodes of the conversation trees, giving the user context for switching between conversation trees.

### `GET /tree`

Returns the DAG under a root, optionally limited to a specified depth (or reverse depth from the leaves), giving the user context for switching between branches on a given tree.

### `POST /branches`

Creates a new fork from a given node.

### `GET /branches`

Lists the branches related to the current branch. The maximum number of branch-off points to list can also be specified.

### `POST /graft`

Attaches a range of nodes from another conversation.

### `POST /detach`

Detaches a branch into a new conversation.

### `GET /linearize`

Reconstructs a linear conversation history from a branch or node.
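The `GET /linearize` reconstruction can be sketched as a walk up the DAG's parent pointers. The `Node` shape below is illustrative only; the real schema lives in MySQL and is not specified here.

```go
package main

import "fmt"

// Node is a single prompt or answer in the conversation DAG.
// Field names here are an assumption, not the actual schema.
type Node struct {
	ID     int
	Parent *Node // nil for a root node
	Role   string
	Text   string
}

// linearize walks parent pointers from a leaf back to the root and
// returns the nodes in root-to-leaf order, as GET /linearize would.
func linearize(leaf *Node) []*Node {
	var path []*Node
	for n := leaf; n != nil; n = n.Parent {
		path = append(path, n)
	}
	// Reverse in place so the root comes first.
	for i, j := 0, len(path)-1; i < j; i, j = i+1, j-1 {
		path[i], path[j] = path[j], path[i]
	}
	return path
}

func main() {
	root := &Node{ID: 1, Role: "user", Text: "hello"}
	answer := &Node{ID: 2, Parent: root, Role: "assistant", Text: "hi"}
	follow := &Node{ID: 3, Parent: answer, Role: "user", Text: "how are you?"}
	for _, n := range linearize(follow) {
		fmt.Printf("%d %s: %s\n", n.ID, n.Role, n.Text)
	}
	// Prints the three nodes in root-to-leaf order.
}
```

Because each node stores only its parent, branches and grafts fall out of the same structure: a fork is just a second child of the same parent node.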
### `POST /completion`

Triggers inference from the last node of a branch, creating two new nodes: one for the prompt and one for the answer. Note that this endpoint talks to the Go backend, so the Go backend is responsible for bookkeeping the kv-cache on disk; the frontend does not need to worry about it.

### `POST /login`

Logs in as a given user.

### `POST /logout`

Logs out the current user.

### `GET /me`

Returns the current user.

## Database-Specific

The database should keep track of reachability, and the backend should automatically remove orphaned nodes and caches. It should also keep track of the DAG formed by the prompts, the answers, and the different root nodes.

## Cache

For a single user, the kv-cache on disk should only concern the working node, that is, the working node and all of its ancestor nodes.

## Multiple Users

### Authentication

JWT-based authentication and multi-user switching; all API calls except `POST /login` require a token. A default token will be provided in the earlier stages.

### Queuing

The Go layer should also be responsible for tracking the availability of the `llama.cpp` services and queuing prompts when there are multiple users.
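The queuing described above can be sketched with a single channel that serializes requests to one busy `llama.cpp` instance. Everything here is an assumed shape, not the real service contract: `job`, `serveLlama`, and the echoed result are placeholders for the actual completion call.

```go
package main

import (
	"fmt"
	"sync"
)

// job represents one queued completion request. The payload shape is
// a placeholder, not the real llama.cpp API contract.
type job struct {
	prompt string
	result chan string
}

// serveLlama drains the queue one job at a time, standing in for a
// single llama.cpp instance that can only serve one prompt at once.
func serveLlama(queue <-chan job) {
	for j := range queue {
		// In the real backend this would call the modified
		// llama.cpp completion endpoint; here we just echo.
		j.result <- "echo: " + j.prompt
	}
}

func main() {
	// Unbuffered channel: senders block until the engine is free,
	// which is exactly the queuing behavior we want.
	queue := make(chan job)
	go serveLlama(queue)

	var wg sync.WaitGroup
	for _, p := range []string{"a", "b", "c"} {
		wg.Add(1)
		go func(prompt string) {
			defer wg.Done()
			res := make(chan string)
			queue <- job{prompt: prompt, result: res}
			fmt.Println(<-res)
		}(p)
	}
	wg.Wait()
	close(queue)
	// Prints "echo: a", "echo: b", "echo: c" in some order.
}
```

With multiple `llama.cpp` instances, the Go layer could simply run one `serveLlama` goroutine per instance against the same channel, and the channel would act as the shared work queue.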