Integrate OpenAI Audio Client and Introduce ChatClientBase for Enhanced Separation of Concerns #9

Merged
Cyb3rWard0g merged 3 commits into main from feature/oai-audio-client on Dec 23, 2024

Cyb3rWard0g
Owner

Summary

This PR integrates OpenAI’s audio client into the project, enabling speech generation, transcription, and translation. It also introduces a new ChatClientBase class that holds chat-specific logic, separating it from LLMClientBase to keep the code modular and maintainable.

Key Changes

  • OpenAI and Azure OpenAI Audio Client:
    • Unified client to support both OpenAI and Azure OpenAI APIs.
    • Configurable settings for API key, base URL, Azure-specific options (e.g., endpoint, deployment, version).
  • Speech Generation:
    • Converts text into audio using OpenAI endpoints.
    • Handles large text inputs by splitting them into manageable chunks.
    • Supports incremental file saving or in-memory audio composition.
  • Transcription and Translation:
    • Transcription: Converts audio files into text using OpenAI transcription models.
    • Translation: Translates audio content into English with OpenAI translation models.
  • Dynamic Request Validation:
    • Utilizes Pydantic models for validating and structuring requests.
    • Catches malformed requests before they are sent to the OpenAI APIs.
  • Improved Logging and Error Handling:
    • Detailed logging for tracking API interactions and debugging issues.
    • Robust error handling with meaningful feedback for failed operations.
  • Introduction of ChatClientBase:
    • New base class to encapsulate chat-specific functionality, such as Prompty integration and prompt templates.
    • Keeps LLMClientBase focused on general-purpose LLM features, avoiding chat-specific dependencies.
    • Supports loading and configuring Prompty sources for chat-based workflows.
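
The chunking step described under Speech Generation can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `split_text_into_chunks`, the sentence-boundary heuristic, and the default limit of 4096 characters are all assumptions made for the example.

```python
import re


def split_text_into_chunks(text: str, max_chars: int = 4096) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends.

    Hypothetical helper: the name and the default limit are assumptions.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""

    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Grow the current chunk while it still fits, otherwise start a new one.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence

    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be sent as its own speech request, with the resulting audio either appended to a file as it arrives or joined in memory, matching the two modes listed above.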

Impact

  • Enhanced Capabilities: Enables text-to-speech, audio-to-text, and audio translation use cases.
  • Separation of Concerns: Introducing ChatClientBase draws a clear line between general LLM functionality and chat-specific features.
  • Flexibility: Works with both OpenAI and Azure OpenAI, covering different deployment needs.
  • Ease of Use: Simplifies interaction with OpenAI’s audio APIs through structured requests and automated chunk handling.
  • Scalability: Handles large text inputs through chunked processing and incremental file saving.
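
To make the OpenAI versus Azure OpenAI configuration concrete, here is a small sketch of how one settings object might cover both. It uses only the standard library; the class name `AudioClientSettings`, its field names, and the rule that Azure requires a deployment and an API version are assumptions for illustration, not the settings model added in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AudioClientSettings:
    """Hypothetical settings container; field names are illustrative only."""

    api_key: str
    base_url: Optional[str] = None          # plain OpenAI: optional custom base URL
    azure_endpoint: Optional[str] = None    # Azure: resource endpoint
    azure_deployment: Optional[str] = None  # Azure: deployment name
    api_version: Optional[str] = None       # Azure: API version string

    @property
    def is_azure(self) -> bool:
        # Assumption: the presence of an Azure endpoint selects the Azure path.
        return self.azure_endpoint is not None

    def validate(self) -> None:
        """Raise ValueError if the settings are incomplete or contradictory."""
        if not self.api_key:
            raise ValueError("api_key is required")
        if not self.is_azure:
            return
        if self.base_url is not None:
            raise ValueError("set either base_url or azure_endpoint, not both")
        missing = [
            name
            for name in ("azure_deployment", "api_version")
            if getattr(self, name) is None
        ]
        if missing:
            raise ValueError(f"missing Azure settings: {', '.join(missing)}")
```

Calling `validate()` once at construction time would surface configuration mistakes before any audio request is attempted.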

These updates add audio processing capabilities and improve the project’s maintainability by moving chat-specific logic into a dedicated base class.
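
The split between the two base classes might look something like the sketch below. Only the names LLMClientBase and ChatClientBase come from this PR. Everything else is an assumption made for the example: that ChatClientBase inherits from LLMClientBase, the `provider` property, the `render_prompt` method, and the two demo subclasses. Plain `str.format` stands in for the Prompty and prompt-template handling, which is not reproduced here.

```python
from abc import ABC, abstractmethod
from typing import Optional


class LLMClientBase(ABC):
    """General-purpose client surface shared by every client type."""

    @property
    @abstractmethod
    def provider(self) -> str:
        """Name of the backing provider."""


class ChatClientBase(LLMClientBase):
    """Adds chat-only concerns (here, a prompt template) on top of the base."""

    def __init__(self, prompt_template: Optional[str] = None) -> None:
        self.prompt_template = prompt_template

    def render_prompt(self, **variables: str) -> str:
        # str.format is a stand-in for real prompt-template rendering.
        if self.prompt_template is None:
            raise ValueError("no prompt template configured")
        return self.prompt_template.format(**variables)


class DemoAudioClient(LLMClientBase):
    """Hypothetical non-chat client: inherits no prompt handling at all."""

    @property
    def provider(self) -> str:
        return "openai"


class DemoChatClient(ChatClientBase):
    """Hypothetical chat client: gains prompt handling from ChatClientBase."""

    @property
    def provider(self) -> str:
        return "openai"
```

With this layout, `DemoChatClient("Hello, {name}!").render_prompt(name="Ada")` yields a rendered prompt, while `DemoAudioClient` carries no prompt-related attributes, which is the kind of dependency isolation the description refers to.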

Cyb3rWard0g merged commit 3258b55 into main on Dec 23, 2024
Cyb3rWard0g deleted the feature/oai-audio-client branch on December 27, 2024 07:27