This repository provides a chatbot interface designed to interact with multimodal data (images and text) using vision-language models. The interface uses Gradio for a web-based front-end and communicates with models deployed via OpenAI-compatible APIs. It supports selecting models from multiple deployment endpoints, generating responses, and providing interactive feedback for image and text inputs.
-
Model Selection:
- Reads a YAML file to list available models deployed on specified IPs and ports. (TODO)
- Automatically detects available models using OpenAI-compatible APIs, creating a dictionary of accessible models. (DONE)
- Provides a dropdown in the Gradio interface to select models. (TODO)
-
Multimodal Input:
- Accepts both text and image inputs via a
MultimodalTextbox
. Users can interact with the chatbot by combining images and textual queries. - Allows for multiple image uploads in a single input. (DONE)
- Accepts both text and image inputs via a
-
Error Handling:
- Displays error messages in the front-end when model interactions fail. (TODO)
-
Interactive Parameters:
- Users can adjust parameters like temperature and maximum output tokens directly in the interface.
-
Example Inputs:
- Predefined examples guide users on how to interact with the interface using images and text.
- Python 3.8 or later
- Required Python packages:
pip install gradio openai pyyaml pillow
git clone https://github.com/your-repo/multimodal-chatbot.git
cd multimodal-chatbot
To manage models, define their IPs and ports in a YAML file. For example:
model:
llama-3.1-8b:
- ip: "localhost"
port: 18001
- ip: "100.1.100.122"
port: 8001
llama-3.1-70b: []
(Reading and using this file is currently marked as a TODO.)
Run the following command to launch the Gradio interface:
python chatbot.py --host localhost --port 19000
Open your browser and navigate to:
http://localhost:19000
- Model Name: Select the model to interact with.
- Model URL: Specify the base URL of the model deployment.
- API Key: Provide the API key for authentication.
- Temperature: Controls the randomness of the model's responses.
- Max Output Tokens: Limits the maximum number of tokens generated in the response.
- Example inputs guide users on how to interact with the chatbot using text and images.
├── chatbot.py # Main script for running the Gradio interface
├── configs/
│ └── gradio/
│ └── vision_language_model.yaml # Example YAML configuration for model endpoints
├── data/
│ └── images/ # Example images for predefined examples
└── README.md # Documentation for the repository
-
Read Model Endpoints from YAML: Automatically populate the dropdown with available models listed in the YAML file.
-
Error Display in Front-end: Show detailed error messages in the Gradio interface when model interactions fail.
-
Improved User Experience: Enhance the visual design and interactivity of the interface.
This project is licensed under the MIT License. See the LICENSE file for details.
Feel free to contribute to the repository by submitting issues or pull requests! 🚀