
---
title: Alexa Like Assistant
emoji: 🌍
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 4.4.1
app_file: app.py
pinned: false
license: apache-2.0
---

VoiceAI whisper-llm-gtts

Overview

VoiceAI integrates the power of Text-to-Speech (TTS), Speech-to-Text (STT), and Local Language Model (LLM) technologies. This advanced AI application enables seamless conversion of text to speech, transcription of audio to text, and interaction with a local language model through an intuitive interface.

Demo

The demo is built as a Gradio app to make audio input and output easy to use in Hugging Face Spaces.

Gradio demo

Screenshots

Screenshots 1–3 with descriptions (see the images in the repository).

System Flowchart

@startuml
actor User
entity "Whisper\n(Speech-to-Text)" as Whisper
entity "LLM\n(Local Language Model)" as LLM
entity "TTS\n(Text-to-Speech)" as TTS
entity "Memory" as Memory

User -> Whisper : speaks into microphone
Whisper -> LLM : transcribed text
LLM -> Memory : save response
Memory -> LLM : retrieve past response
LLM -> TTS : processed response
TTS -> User : speaks response
@enduml
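
The flowchart above maps almost one-to-one onto library calls. Below is a minimal sketch of one loop of the pipeline in Python; it is an illustration, not the repository's actual code (memory/history is omitted and the model filename is a placeholder):

import whisper
from llama_cpp import Llama
from gtts import gTTS

# Speech-to-text: transcribe the user's recording with Whisper
stt = whisper.load_model("base")
text = stt.transcribe("input.wav")["text"]

# LLM: generate a reply with the local model (path is a placeholder)
llm = Llama(model_path="models/your-model.gguf")
reply = llm(f"Q: {text}\nA:", max_tokens=256, stop=["Q:"])["choices"][0]["text"]

# Text-to-speech: synthesize the reply with gTTS and save it for playback
gTTS(reply).save("response.mp3")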


Getting Started

Prerequisites

  • Python 3.10 or higher
  • A GPU for running the LLM and Whisper efficiently
  • Docker for containerization

Installation

Clone the project repository:

git clone git@github.com:mohcineelharras/whisper-llm-gtts.git
cd whisper-llm-gtts

Install dependencies:

sudo apt-get install ffmpeg
pip install -r requirements_merged.txt

To enable GPU acceleration, build llama-cpp-python against cuBLAS:

CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir

If you encounter issues with GPU acceleration, try installing the CUDA toolkit:

conda install -c "nvidia/label/cuda-12.1.1" cuda-toolkit
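
To check that inference actually offloads to the GPU, a quick sanity check like the one below can help (the model path is a placeholder); with verbose=True, llama-cpp-python logs its CUDA/cuBLAS offload details at load time:

import torch
from llama_cpp import Llama

print("CUDA available:", torch.cuda.is_available())

# n_gpu_layers=-1 requests offloading all layers to the GPU
llm = Llama(model_path="models/your-model.gguf", n_gpu_layers=-1, verbose=True)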

Automatic Environment Setup

Set up the environment using the .envtemplate provided, then rename it to .env.
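
At startup the application reads its settings from this file. As an illustration of the mechanism only (the variable name below is hypothetical, not necessarily one defined in .envtemplate):

import os
from dotenv import load_dotenv

load_dotenv()  # loads key=value pairs from .env into the process environment
model_path = os.getenv("MODEL_PATH", "models/your-model.gguf")  # hypothetical variable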

To automate the creation of a conda environment called audio and install dependencies, run:

./install.sh

If you don't have conda, use the following script to set it up:

./install_conda.sh

Model Setup

Create a models folder in the root directory, download the desired LLM model, place it in the models folder, and adjust the .env file accordingly.
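
For example, a quantized GGUF model can be fetched with huggingface_hub; the repo and filename below are only examples, so swap in whichever model your .env references:

from huggingface_hub import hf_hub_download

# Example model; replace with the GGUF file your .env points to
hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    local_dir="models",
)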

Usage

To try the Gradio demo:

python app.py
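
For reference, a Gradio voice interface of this kind reduces to wiring audio in to audio out. This is a minimal sketch, not the repository's actual app.py; respond_to_voice stands in for the Whisper → LLM → gTTS pipeline shown earlier:

import gradio as gr

def respond_to_voice(audio_path):
    # Placeholder: the real app would transcribe, query the LLM, and synthesize a reply
    return audio_path  # echo the recording so the sketch runs end to end

gr.Interface(
    fn=respond_to_voice,
    inputs=gr.Audio(sources=["microphone"], type="filepath"),
    outputs=gr.Audio(),
).launch()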

To use whisper-llm-gtts, open two terminals:

In the first terminal, activate the audio environment and launch FastAPI:

conda activate audio
cd fastapi
python api_server.py

In the second terminal, activate the audio environment and start the Streamlit frontend:

conda activate audio
cd streamlit_app
streamlit run app.py
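
The Streamlit frontend talks to the FastAPI backend over HTTP. As an illustration only (the port, route, and field name below are assumptions; check api_server.py for the actual API):

import requests

# Hypothetical endpoint and port; see api_server.py for the real ones
with open("input.wav", "rb") as f:
    response = requests.post("http://localhost:8000/transcribe", files={"file": f})
print(response.json())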

To run the assistant directly in the terminal:

bash run_continious.sh
# or
./run_continious.sh

Dockerization

Before building the Docker image, ensure the Docker section in the .env file is uncommented. Create a models folder and download the model you wish to use.

Build and start the containers using Docker Compose:

docker-compose up --build

Technologies & Skills

VoiceAI whisper-llm-gtts employs various technologies and showcases multiple skills:

Libraries

  • FastAPI
  • Streamlit
  • Whisper
  • gTTS (Google Text-to-Speech)
  • PyTorch

Skills

  • API Development
  • Machine Learning
  • Full Stack Development
  • Dockerization
  • Audio Processing

Tools

  • Docker & Docker Compose
  • Git
  • Uvicorn
