An asynchronous terminal server/multiple-client setup for conducting and managing chats with LLMs.
- client/server RPC-type architecture
- message signing
- ensure chunk ordering (signing and ordering sketched together below)
- basic chat persistence and management
- set and switch between saved system prompts (personalities)
- manage prompts like chats (as files)
- chat truncation to a token-length budget (sketched below)
- rename chat
- profiles (profile x personalities -> sets of chats)
- export chat to local file
- context workspace (load/drop files)
- client inject from file
- client inject from other sources, e.g. youtube (trag)
- templates for standard instruction requests (trag)
- context workspace - bench/suspend files (hidden by filename; sketched below)
- (auto) tools (evolve from llama-farm -> trag)
- user-defined tool plugins (sketched below)
- server use of vdb context at the LLM's discretion (tool)
- iterative workflows (refer to llama-farm)
- tool chains
- file edit/write
- file patch/diff
- allow model to manage workspace
- summaries and standard client instructions (trag)
- server use of vdb context on request
- consider the best method of pdf conversion/ingestion, out of band (OOB)
- full arXiv paper ingestion (fvdb) - consolidate into one LaTeX file, OOB
- vdb result reranking with context, and winnowing (sketched below)
- switch between Anthropic, OpenAI and tabbyAPI providers and models (sketched below)
- streaming
- syntax highlighting
- decent REPL
- REPL command mode
- cut/copy from output
- vimish keys in output
- client-side prompt editing
- client-side chat/message editing (how? temporarily set the input field history?)
- LaTeX rendering (tricky in the context of prompt-toolkit, but see flatlatex; sketched below)
- generation cancellation (sketched below)
- design with multimodal models in mind
- image sending and use
- use a proper config dir (group?)
- dump default conf if missing (config handling sketched below)
- audio streaming?
- workflows (tree of instruction templates)
- tasks
- arXiv paper -> latex / md
- pdf paper -> latex / md
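
A minimal sketch of the message-signing and chunk-ordering items, assuming a pre-shared HMAC key and a per-chunk sequence number (both assumptions; the real wire format may differ):

```python
import hashlib
import hmac
import json

SECRET = b"pre-shared-key"  # assumption: key distributed out of band

def sign_chunk(seq: int, body: str) -> dict:
    """Wrap a streamed chunk with a sequence number and an HMAC-SHA256 signature."""
    payload = json.dumps({"seq": seq, "body": body}, sort_keys=True).encode()
    return {"seq": seq, "body": body,
            "sig": hmac.new(SECRET, payload, hashlib.sha256).hexdigest()}

def verify_chunk(msg: dict) -> bool:
    payload = json.dumps({"seq": msg["seq"], "body": msg["body"]},
                         sort_keys=True).encode()
    good = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(good, msg["sig"])

class Reassembler:
    """Yields chunk bodies strictly in sequence order, buffering early arrivals."""
    def __init__(self):
        self.next_seq = 0
        self.pending = {}

    def feed(self, msg: dict):
        if not verify_chunk(msg):
            raise ValueError(f"bad signature on chunk {msg['seq']}")
        self.pending[msg["seq"]] = msg["body"]
        while self.next_seq in self.pending:
            yield self.pending.pop(self.next_seq)
            self.next_seq += 1
```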
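For truncation to a token budget, one plausible approach keeps the system prompt and walks backwards from the newest message; the 4-chars-per-token estimate is a stand-in for a real tokenizer:

```python
def truncate_chat(messages: list[dict], max_tokens: int, count=None) -> list[dict]:
    """Keep the system prompt plus as many recent messages as the budget allows."""
    count = count or (lambda text: max(1, len(text) // 4))  # crude ~4 chars/token
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count(m["content"]) for m in system)
    kept = []
    for m in reversed(rest):            # newest message first
        cost = count(m["content"])
        if cost > budget:
            break                       # stop at the first message that overflows
        kept.append(m)
        budget -= cost
    return system + kept[::-1]
```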
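The bench/suspend idea can ride on filenames alone; this sketch assumes a per-chat workspace directory and a leading underscore as the "hidden" marker (both placeholders):

```python
from pathlib import Path

WORKSPACE = Path("workspace")  # placeholder: per-chat workspace directory
BENCHED = "_"                  # placeholder: prefix marking a suspended file

def active_files() -> list[Path]:
    """Files currently loaded into the context; benched files are skipped."""
    return sorted(p for p in WORKSPACE.iterdir()
                  if p.is_file() and not p.name.startswith(BENCHED))

def bench(name: str) -> None:
    """Suspend a file: it stays in the workspace but drops out of the context."""
    (WORKSPACE / name).rename(WORKSPACE / (BENCHED + name))

def unbench(name: str) -> None:
    (WORKSPACE / (BENCHED + name)).rename(WORKSPACE / name)
```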
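User-defined tool plugins could be plain Python files in a plugins directory that register themselves via a decorator; the decorator name and directory layout here are assumptions, not a settled design:

```python
import importlib.util
from pathlib import Path

TOOLS = {}  # name -> callable

def tool(fn):
    """Decorator a plugin applies to expose a function as a model-callable tool."""
    TOOLS[fn.__name__] = fn
    return fn

def load_plugins(plugin_dir: str = "plugins") -> None:
    """Import every .py file in the plugin dir so its @tool functions register."""
    for path in Path(plugin_dir).glob("*.py"):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

def dispatch(name: str, arguments: dict):
    """Run a tool call the model requested, by name."""
    return TOOLS[name](**arguments)
```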
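Reranking and winnowing vdb hits might look like this: score each hit against an embedding of the current chat context, then drop everything below a floor. The `embedding` field and the cutoff values are assumptions:

```python
import numpy as np

def rerank_and_winnow(hits: list[dict], context_vec: np.ndarray,
                      keep: int = 5, floor: float = 0.25) -> list[dict]:
    """Re-rank vdb hits by cosine similarity to the chat context, keep the best."""
    def score(hit):
        v = np.asarray(hit["embedding"])   # assumption: hit carries its vector
        return float(v @ context_vec /
                     (np.linalg.norm(v) * np.linalg.norm(context_vec)))
    ranked = sorted(hits, key=score, reverse=True)
    return [h for h in ranked[:keep] if score(h) >= floor]
```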
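Provider switching can hide behind one call; tabbyAPI speaks the OpenAI wire format, so it reuses that client. The local URL and `max_tokens` value are placeholders:

```python
import anthropic
from openai import OpenAI

def make_client(provider: str):
    if provider == "openai":
        return OpenAI()
    if provider == "tabbyapi":  # OpenAI-compatible local server
        return OpenAI(base_url="http://localhost:5000/v1", api_key="unused")
    if provider == "anthropic":
        return anthropic.Anthropic()
    raise ValueError(f"unknown provider: {provider}")

def complete(provider: str, client, model: str, messages: list[dict]) -> str:
    """Normalise the two wire formats behind one call."""
    if provider == "anthropic":
        # Anthropic takes the system prompt as a separate parameter
        system = "\n".join(m["content"] for m in messages if m["role"] == "system")
        rest = [m for m in messages if m["role"] != "system"]
        resp = client.messages.create(model=model, system=system,
                                      messages=rest, max_tokens=1024)
        return resp.content[0].text
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content
```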
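Generation cancellation falls out of asyncio task cancellation if the stream consumer runs as a task; `fake_stream` is a stand-in for a provider's token stream:

```python
import asyncio

async def fake_stream():
    """Stand-in for a provider's token stream."""
    for i in range(100):
        await asyncio.sleep(0.1)
        yield f"token-{i} "

async def main():
    async def consume():
        async for tok in fake_stream():
            print(tok, end="", flush=True)

    task = asyncio.create_task(consume())
    await asyncio.sleep(0.35)   # pretend the user hits the cancel key here
    task.cancel()               # cancellation propagates into the generator
    try:
        await task
    except asyncio.CancelledError:
        print("\n[generation cancelled]")

asyncio.run(main())
```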
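flatlatex converts LaTeX math to plain unicode, which prompt-toolkit can display directly without any image support; that is the whole trick:

```python
import flatlatex

conv = flatlatex.converter()
print(conv.convert(r"\forall x \in \mathbb{R}: x^2 \geq 0"))
# ∀x∈ℝ:x²≥0 (exact glyphs depend on the flatlatex version)
```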
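Config handling for the config-dir and default-conf items, assuming platformdirs for the platform lookup and TOML as the format (both assumptions), with placeholder defaults:

```python
from pathlib import Path
import tomllib                       # Python 3.11+

from platformdirs import user_config_dir

DEFAULT_CONF = 'provider = "openai"\nmodel = "gpt-4o"\n'  # placeholder defaults

def load_config(app: str = "llm-chat") -> dict:
    conf = Path(user_config_dir(app)) / "config.toml"
    if not conf.exists():
        conf.parent.mkdir(parents=True, exist_ok=True)
        conf.write_text(DEFAULT_CONF)            # dump default conf if missing
    return tomllib.loads(conf.read_text())
```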