Merge pull request #11 from Zackriya-Solutions/optimize/main

Pre release
Zackriya-Solutions · Feb 8, 2025 · 14920b3
2 parents 7a25f95 + 7ef68fc
Showing 35 changed files with 11,711 additions and 1,020 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,6 +1,6 @@
 # See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
 
-/experiment
+/experiments
 
 # dependencies
 /node_modules
@@ -44,3 +44,6 @@ yarn-error.log*
 # typescript
 *.tsbuildinfo
 next-env.d.ts
+
+# Audio files
+*.wav
diff --git a/.gitmodules b/.gitmodules
@@ -0,0 +1,4 @@
+[submodule "backend/whisper.cpp"]
+	path = backend/whisper.cpp
+	url = https://github.com/Zackriya-Solutions/whisper.cpp
+	branch = develop
diff --git a/PULLREQUEST.md b/PULLREQUEST.md
diff --git a/README.md b/README.md
@@ -1,10 +1,22 @@
-# Meeting Minutes - AI-Powered Meeting Assistant
-
-## Release 0.0.1
-
-A new release is available!
-
-Please check out the release [here](https://github.com/Zackriya-Solutions/meeting-minutes/releases/tag/v0.0.1).
+<div align="center" style="border-bottom: none">
+    <h1>
+        <br>
+        Meetily - AI-Powered Meeting Assistant
+    </h1>
+    <h3>
+    Open-source AI assistant for taking meeting notes
+    </h3>
+    <p align="center">
+    <a href="https://meetily.zackriy.com"><b>Website</b></a> •
+    <a href="https://meetily.zackriya.com"><b>Contact Us</b></a> •
+    <a href="https://x.com/sujithx007"><b>Author</b></a> •
+    <a href="https://zackriya.com"><b>This project is supported by Zackriya</b>
+    </a>
+</p>
+    <p align="center">
+ An AI-Powered Meeting Assistant that captures live meeting audio, transcribes it in real-time, and generates summaries while ensuring user privacy. Perfect for teams who want to focus on discussions while automatically capturing and organizing meeting content without the need for external servers or complex infrastructure. 
+</p>
+</div>
 
 ## Overview
 
@@ -19,25 +31,72 @@ While there are many meeting transcription tools available, this solution stands
 - **Customizable**: Self-host and modify for your specific needs
 - **Intelligent**: Built-in knowledge graph for semantic search across meetings
 
-> **Note**: We have an experimental Rust-based implementation that explores better performance and native integration. It currently implements:
-> - ✅ Real-time audio capture from both microphone and system audio
-> - ✅ Live transcription using locally-running Whisper
-> - ✅ Speaker diarization
-> - ✅ Rich text editor for notes
-> 
-> See [Rust Implementation](experiment/rust_based_implementation) for details.
-
-
 ## Features
 
 ✅ Modern, responsive UI with real-time updates
+
 ✅ Real-time audio capture (microphone + system audio)
+
 ✅ Live transcription using Whisper.cpp
 ✅ Speaker diarization
+
 ✅ Local processing for privacy
+
+✅ Packaged app for macOS
+
 🚧 Export to Markdown/PDF
 
+
+> **Note**: We have a Rust-based implementation that explores better performance and native integration. It currently implements:
+> - ✅ Real-time audio capture from both microphone and system audio
+> - ✅ Live transcription using locally-running Whisper
+> - ✅ Speaker diarization
+> - ✅ Rich text editor for notes
+> 
+> We are currently working on:
+> - 🚧 Export to Markdown/PDF
+> - 🚧 Export to HTML
+
+
+## Release 0.0.2
+
+A new release is available!
+
+Please check out the release [here](https://github.com/Zackriya-Solutions/meeting-minutes/releases/tag/v0.0.2).
+
+### What's New
+- Transcription quality is improved.
+- Bug fixes and improvements for frontend
+- Better backend app build process
+- Improved documentation
+- New `.dmg` package
+
+### What's next?
+- Database connection to save meeting minutes
+- Improve summarization quality for smaller LLMs
+- Add download option for meeting transcriptions
+- Add download option for summary
+
+### Known issues
+- Smaller LLMs can hallucinate, making summarization quality poor
+- The backend build requires CMake, a C++ compiler, and other tools, which makes it harder to build
+- The backend build requires Python 3.10 or newer
+- The frontend build requires Node.js
+
+### How to Get Started
+
+#### Frontend
+1. Download the app `dmg.zip` from [here](https://github.com/Zackriya-Solutions/meeting-minutes/releases/tag/v0.0.2)
+2. Double click the `Meeting Minutes.app` to run the app
+
+#### Backend
+1. Download the source code `Source Code.zip` from [here](https://github.com/Zackriya-Solutions/meeting-minutes/releases/tag/v0.0.2)
+2. Go to the `backend` directory: `cd backend`
+3. Install all prerequisites before proceeding: FFmpeg, CMake, a C++ compiler, and a Python version between 3.10 and 3.12
+4. Build the dependencies by running `build_whisper.sh`
+5. Run the server with `clean_start_backend.sh`
+
+
 ## LLM Integration
 
 The backend supports multiple LLM providers through a unified interface. Current implementations include:

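Reviewer note on the backend steps in the README hunk above: a pre-flight check could catch missing prerequisites before `build_whisper.sh` is run. The following is a minimal sketch, not part of this commit. The function names (`missing_tools`, `has_cpp_compiler`, `python_supported`) are hypothetical, and the executable names (`ffmpeg`, `cmake`, `git`, `c++`, `g++`, `clang++`) are assumptions about what is on `PATH` for a given platform.

```python
import shutil
import sys

# Assumed executable names; adjust for your platform.
REQUIRED_TOOLS = ("ffmpeg", "cmake", "git")
CPP_COMPILERS = ("c++", "g++", "clang++")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def has_cpp_compiler(candidates=CPP_COMPILERS):
    """Return True if at least one candidate compiler is on PATH."""
    return any(shutil.which(name) is not None for name in candidates)


def python_supported(version=sys.version_info, low=(3, 10), high=(3, 12)):
    """Return True if the major.minor version is within [low, high]."""
    return low <= tuple(version[:2]) <= high


if __name__ == "__main__":
    print("Missing tools:", missing_tools())
    print("C++ compiler found:", has_cpp_compiler())
    print("Python version supported:", python_supported())
```

The version bounds follow the "between 3.10 and 3.12" wording in the getting-started steps; the "Known issues" list says "3.10 or newer", so the upper bound may need adjusting.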
diff --git a/backend/.gitignore b/backend/.gitignore
@@ -9,6 +9,9 @@ instance/*
 transcripts/
 chroma/
 models/
+whisper.cpp/
+whisper-server*
+whisper-server-package/
 *.db
 *.json
 

diff --git a/backend/README.md b/backend/README.md
@@ -4,40 +4,75 @@ FastAPI backend for meeting transcription and analysis
 
 ## Features
 - Audio file upload and storage
-- Whisper-based transcription
-- Meeting analysis with LLMs
+- Real-time Whisper-based transcription with streaming support
+- Meeting analysis with LLMs (supports Claude, Groq, and Ollama)
 - REST API endpoints
 
 ## Requirements
 - Python 3.9+
 - FFmpeg
-- API Keys
+- C++ compiler (for Whisper.cpp)
+- CMake
+- Git (for submodules)
+- A running Ollama instance
+- API keys (for Claude or Groq), if you plan to use their APIs
 - ChromaDB
 
 ## Installation
+
+### 1. Environment Setup
+Create a `.env` file in the backend directory:
 ```bash
-pip install -r requirements.txt
+ANTHROPIC_API_KEY=your_key_here  # Optional, for Claude
+GROQ_API_KEY=your_key_here      # Optional, for Groq
 ```
 
-## Environment Setup (Optional if you are using API keys)
-Create `.env` file:
-```
-ANTHROPIC_API_KEY=your_key_here
-GROQ_API_KEY=your_key_here
+### 2. Build Whisper Server
+Run the build script which will:
+- Initialize and update git submodules
+- Build Whisper.cpp with custom server modifications
+- Set up the server package with required files
+- Download the selected Whisper model
+
+```bash
+./build_whisper.sh
 ```
 
-## Running the Server
+If no model is specified, the script will prompt you to choose one interactively.
+
+### 3. Running the Server
+The clean start script provides an interactive way to start the backend services:
+
 ```bash
 ./clean_start_backend.sh
 ```
 
+The script will:
+1. Check and clean up any existing processes
+2. Verify environment setup and required directories
+3. Check for existing Whisper models
+4. Download the selected model if not present
+5. Start the Whisper server
+6. Start the FastAPI backend in a Python virtual environment
+
+To stop all services, press Ctrl+C. The script will automatically clean up all processes.
+
 ## API Documentation
-Access Swagger UI at `http://localhost:8000/docs`
+Access Swagger UI at `http://localhost:5167/docs`
+
+## Services
+The backend runs two services:
+1. Whisper.cpp Server: Handles real-time audio transcription
+2. FastAPI Backend: Manages API endpoints, LLM integration, and data storage
 
-## Testing
-```bash
-pytest tests/
-```
 
-## Deployment
-See `deploy/` directory for Kubernetes manifests
+## Troubleshooting
+- If services fail to start, the script will automatically clean up processes
+- Check logs for detailed error messages
+- Ensure all ports (5167 for backend) are available
+- Verify API keys if using Claude or Groq
+- For Ollama, ensure the Ollama service is running and models are pulled
+- If build fails:
+  - Ensure all dependencies (CMake, C++ compiler) are installed
+  - Check if git submodules are properly initialized
+  - Verify you have write permissions in the directory
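Reviewer note on the "Ensure all ports (5167 for backend) are available" item: one way to test this before starting the services is to try binding the port. This is a minimal sketch, not part of this commit. `port_is_free` is a hypothetical helper, and the loopback host `127.0.0.1` is an assumption; only port 5167 comes from the README.

```python
import socket

BACKEND_PORT = 5167  # port named in the backend README


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
        return True


if __name__ == "__main__":
    state = "free" if port_is_free(BACKEND_PORT) else "in use"
    print(f"Port {BACKEND_PORT} is {state}")
```

A bind test only reflects the moment it runs, so another process could still claim the port before the backend starts.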