initial version

jhakulin · Oct 13, 2024 · 6a4cd9d
1 parent f082924
commit 6a4cd9d
Showing 25 changed files with 2,583 additions and 1 deletion.
diff --git a/.vscode/launch.json b/.vscode/launch.json
@@ -0,0 +1,18 @@
+{
+    // Use IntelliSense to learn about possible attributes.
+    // Hover to view descriptions of existing attributes.
+    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
+    "version": "0.2.0",
+    "configurations": [
+        {
+            "name": "Python: Current File",
+            "type": "debugpy",
+            "request": "launch",
+            "purpose": ["debug-test"],
+            "program": "${file}",
+            "console": "integratedTerminal",
+            "cwd": "${fileDirname}",
+            "justMyCode": false
+        }
+    ]
+}
diff --git a/README.md b/README.md
@@ -1 +1,93 @@
-# realtime-ai
+# Realtime AI Client
+
+## Overview
+
+This Python project demonstrates a modular approach to working with OpenAI's Realtime API over a WebSocket connection. It captures real-time audio from the microphone and streams it to the API for analysis or transcription.
+
+---
+
+#### Key Components
+
+1. **RealtimeAIClient**
+   - **Purpose**: Acts as the high-level orchestrator that ties together the service manager and the audio stream manager.
+   - **Main Features**:
+     - Coordinates the lifecycle and interactions among different managers.
+     - Provides synchronous methods to start and stop the client.
+     - Handles audio input and output, triggering the appropriate streaming commands.
+
+2. **RealtimeAIOptions**
+   - **Purpose**: Encapsulates configuration parameters for the OpenAI API, such as API keys, model choices, and reconnect settings.
+   - **Attributes**:
+     - `api_key`: The OpenAI API key for authentication.
+     - `model`: Specifies the model to be used.
+     - `instructions`: Initial instructions or prompts to the AI model.
+     - Retry settings that control reconnection attempts.
+
+3. **RealtimeAIServiceManager**
+   - **Purpose**: Interfaces with the WebSocketManager to handle event processing and communication logic.
+   - **Main Features**:
+     - Sends initial setup instructions to the API on connection.
+     - Queues incoming events for later processing.
+     - Handles message parsing based on received data types.
+
+4. **AudioStreamManager**
+   - **Purpose**: Streams real-time audio data to the OpenAI Realtime service via `RealtimeAIServiceManager`, which forwards the data over the WebSocket.
+   - **Main Features**:
+     - Uses asynchronous queues to manage audio data buffering.
+     - Encodes audio into an acceptable format and sends it to the API.
+     - Offers controls to start and stop audio streaming.
+
+5. **WebSocketManager**
+   - **Purpose**: Manages WebSocket connections, providing stability through reconnection strategies.
+   - **Main Features**:
+     - Establishes and maintains WebSocket connections using `asyncio`.
+     - Implements exponential backoff for reconnection attempts.
+     - Sends data to and receives data from the OpenAI API asynchronously.
+
+6. **Sample Script (`main.py`)**
+   - **Purpose**: Demonstrates capturing live audio from a microphone and sending it to the OpenAI API using the selected model.
+   - **Key Activities**:
+     - Utilizes `pyaudio` to capture real-time audio input.
+     - Sends captured audio to `RealtimeAIClient` for processing.
+     - Manages the audio stream and executes control functions, cleanly ending operations on user command.
+
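The component list says `WebSocketManager` reconnects with exponential backoff but does not show the retry schedule. The sketch below is one way such a schedule could look. `backoff_delays`, `connect_with_backoff`, and all parameters and default values are hypothetical names chosen for illustration, not the project's actual API, and the sketch assumes a failed connection attempt surfaces as `OSError`.

```python
import asyncio
import random


def backoff_delays(base=1.0, factor=2.0, max_delay=30.0, max_attempts=5):
    """Yield the wait time that follows each failed attempt, capped at max_delay."""
    for attempt in range(max_attempts):
        yield min(base * factor ** attempt, max_delay)


async def connect_with_backoff(connect, *, jitter=0.0, sleep=asyncio.sleep, **delay_opts):
    """Call `connect()` until it succeeds or the attempts run out.

    `connect` is any coroutine function (hypothetical) that raises OSError
    when the connection cannot be established. After every failure the
    helper waits for the next delay in the schedule, plus optional jitter.
    """
    last_error = None
    for delay in backoff_delays(**delay_opts):
        try:
            return await connect()
        except OSError as exc:
            last_error = exc
            await sleep(delay + random.uniform(0, jitter))
    raise ConnectionError("all reconnection attempts failed") from last_error
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. A small non-zero `jitter` spreads out reconnects when several clients lose the connection at the same moment.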
+---
+
+#### Summary of Features
+
+- **OpenAI's Realtime API Interaction**: Structured to support real-time interactions with OpenAI's services.
+- **Audio Handling**: Captures audio with `pyaudio` and processes it with NumPy to meet the OpenAI API's audio format requirements.
+- **Asynchronous and Synchronous Design**: Takes advantage of `asyncio` and threading to handle WebSocket communications efficiently.
+- **Scalability and Modularity**: Each component operates independently, fostering scalability and maintainability for various real-time audio applications.
+
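The feature summary mentions combining `asyncio` with threading, and the client is described as offering synchronous start and stop methods. One common way to bridge the two, shown here only as a sketch, is to run the event loop in a background thread and submit coroutines to it with `asyncio.run_coroutine_threadsafe`. The class name `BackgroundLoop` and its methods are hypothetical and are not taken from the project.

```python
import asyncio
import threading


class BackgroundLoop:
    """Runs an asyncio event loop in a daemon thread and exposes blocking calls."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)

    def start(self):
        """Start the loop thread; returns immediately."""
        self._thread.start()

    def run(self, coro, timeout=None):
        """Schedule a coroutine on the loop thread and block until it finishes."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def stop(self):
        """Ask the loop to stop, wait for the thread to exit, then close the loop."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()
```

With a wrapper like this, blocking code such as a `pyaudio` capture loop can hand work to asynchronous WebSocket code without running an event loop of its own.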
+---
+
+#### Getting Started
+
+1. **Installation**:
+   - Install dependencies using a package manager:
+     ```bash
+     pip install pyaudio numpy websockets
+     ```
+
+2. **Setup**:
+   - Replace placeholders such as `"YOUR_API_KEY"` in the sample script with your own values.
+   - Confirm that the system microphone is accessible and configured for the project's audio format (e.g., 16-bit PCM, 24 kHz, mono).
+
+3. **Execution**:
+   - Run the script via command-line or an IDE:
+     ```bash
+     python samples/main.py
+     ```
+
+4. **Handling**:
+   - Check the log output to confirm that the connection succeeds and that audio data is being transmitted.
+   - Extend the provided methods to add custom logic or further improvements.
+
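The setup step calls for 16-bit PCM, 24 kHz, mono audio, and the `AudioStreamManager` description says audio is encoded before it is sent. The sketch below shows one plausible encoding path using only the standard library: float samples are clipped, packed as little-endian 16-bit integers, and wrapped in a JSON event with a base64 `audio` field. The helper names `floats_to_pcm16` and `audio_append_event` are hypothetical. The `input_audio_buffer.append` event type follows the Realtime API's documented client events, but whether this project builds its messages this way is an assumption.

```python
import base64
import json
import struct


def floats_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def audio_append_event(pcm_bytes):
    """Wrap raw PCM bytes in a JSON event with a base64-encoded `audio` field."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_bytes).decode("ascii"),
    })
```

At 24 kHz mono with 2 bytes per sample, one second of audio is 48,000 bytes before base64 encoding, which helps when choosing a chunk size for streaming.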
+## Contributions
+
+Contributions in the form of issues or pull requests are welcome! Feel free to enhance functionality, fix bugs, or improve documentation.
+
+## License
+
+This project is licensed under the MIT License - see the LICENSE file for details.
diff --git a/samples/__init__.py b/samples/__init__.py
diff --git a/samples/async/__init__.py b/samples/async/__init__.py