FloTorch.ai is an innovative product poised to transform the field of Generative AI by simplifying and optimizing the decision-making process for leveraging Large Language Models (LLMs) in Retrieval Augmented Generation (RAG) systems. In today’s fast-paced digital landscape, selecting the right LLM setup is critical for achieving efficiency, accuracy, and cost-effectiveness. However, this process often involves extensive trial-and-error, significant resource expenditure, and complex comparisons of performance metrics. Our solution addresses these challenges with a streamlined, user-friendly approach.
- Automated Evaluation of LLMs: FloTorch.ai evaluates multiple LLMs by analyzing combinations of hyperparameters defined by the end user.
- Performance Metrics: Produces detailed performance scores, including relevance, fluency, and robustness.
- Cost and Time Insights: Provides actionable insights into the pricing and execution times for each LLM configuration.
- Data-Driven Decision-Making: Empowers users to align LLM configurations with specific goals and budget constraints.
FloTorch.ai caters to a broad spectrum of users, including:
- Startups: Optimize AI-driven systems for rapid growth.
- Data Scientists: Simplify model selection and evaluation.
- Developers: Focus on deployment and innovation rather than experimentation.
- Researchers: Gain insights into LLM performance metrics effortlessly.
- Enterprises: Enhance customer experiences, improve content generation, and refine data retrieval processes.
- Eliminates Complexity: No more manual evaluations or tedious trial-and-error processes.
- Accelerates Selection: Streamlines the evaluation and decision-making process.
- Maximizes Efficiency: Ensures users achieve the best performance from their chosen LLMs.
- Focus on Innovation: Allows users to dedicate resources to innovation and deployment rather than experimentation.
By combining advanced evaluation capabilities with a focus on cost and time efficiency, FloTorch.ai provides a holistic solution for navigating the evolving RAG landscape. It empowers users to focus on innovation and deployment, setting a new standard for intelligent decision-making in AI-driven applications.
With FloTorch.ai, we aim to be a pivotal enabler of progress in the generative AI ecosystem, helping our users achieve excellence in their projects.
The CDK directory under the main directory in the FloTorch directory structure defines the AWS infrastructure creation for the FloTorch application using AWS CDK in Python. The infrastructure is designed with a suffix-based deployment strategy, allowing multiple isolated deployments to coexist in the same AWS account.
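As a rough illustration of what a suffix-based deployment can look like in CDK Python (the construct and parameter names below are assumptions, not FloTorch's actual code), a deployment suffix can be threaded into stack IDs and resource names so parallel deployments never collide:

```python
# Hypothetical sketch of a suffix-based naming scheme with AWS CDK in Python;
# names are illustrative, not FloTorch's actual code.
from aws_cdk import App, Stack, aws_s3 as s3
from constructs import Construct


class FlotorchVPCStack(Stack):
    def __init__(self, scope: Construct, suffix: str, **kwargs) -> None:
        # Embedding the suffix in the stack id keeps deployments isolated
        # within the same AWS account.
        super().__init__(scope, f"FlotorchVPCStack-{suffix}", **kwargs)
        # Bucket names must be globally unique, so the suffix helps here too.
        s3.Bucket(self, "DataBucket", bucket_name=f"flotorch-data-{suffix}")


app = App()
FlotorchVPCStack(app, suffix="dev01")
app.synth()
```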
- An AWS account with a valid payment method or sufficient credits.
- An EC2 instance (t2.large recommended; choose an Ubuntu or Amazon Linux 2 AMI) with an attached IAM role that grants the permissions required to deploy the FloTorch infrastructure.
- Docker 20.10.x or later
- AWS CLI v2
- Python 3.8+
- Node.js 16.x
- AWS CDK CLI 2.x
- jq (JSON processor)
Note: all of these are set up automatically by the installation script.
The installation script installs all prerequisites and provisions the necessary AWS infrastructure automatically before installing the FloTorch components.
- Connect to your AWS EC2 instance (using SSH, SSM, or Instance Connect).
- Clone the FloTorch git repository and run the following commands to automatically install FloTorch and its prerequisites on AWS:

```bash
git clone https://github.com/FissionAI/FloTorch.git
cd FloTorch
cd cdk
./deploy.sh
```
The `deploy.sh` script supports Ubuntu and Amazon Linux 2, automatically detecting the OS and using the appropriate package manager (`apt` for Ubuntu, `yum` for Amazon Linux). It also automatically installs and sets up the prerequisites FloTorch needs to run.
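For illustration, the OS detection boils down to checking which package manager is present. The actual `deploy.sh` implements this in shell; here is a minimal Python sketch of the same idea:

```python
# Minimal sketch (not the actual deploy.sh logic) of OS detection by
# package manager, shown in Python for illustration.
import shutil


def pick_package_manager() -> str:
    # Ubuntu ships apt; Amazon Linux 2 ships yum.
    for pm in ("apt", "yum"):
        if shutil.which(pm):
            return pm
    raise RuntimeError("Unsupported OS: neither apt nor yum found")


print(pick_package_manager())
```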
- Networking
  - VPC with public and private subnets
  - NAT Gateways for private subnet internet access
  - Security Groups for the various services
- OpenSearch Domain
  - Version: OpenSearch 2.15
  - Instance Type: r7g.large.search
  - Data Nodes: 3
  - Storage: 100GB GP3 EBS volumes
- DynamoDB Tables
  - Experiment Table
  - Metrics Table
  - Model Invocations Table
- S3 Buckets
  - Data bucket for storing application data
  - Versioning enabled
  - Encryption enabled
- ECS Configuration
  - Fargate Task Definitions
    - Memory: 32GB
    - CPU: 8 vCPUs
    - Task Role with necessary permissions
- Step Functions State Machine (sketched after this list)
  - Parallel execution of experiments
  - Map state with failure tolerance:
    - Maximum concurrency: 10
    - Continues processing even if individual experiments fail
  - Lambda function configurations:
    - Memory: 1024MB
- App Runner Service
  - 4 vCPUs and 12 GB memory
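As a hedged sketch of the Step Functions piece (the construct names and input shape are assumptions, not FloTorch's actual code), a Map state with maximum concurrency 10 that tolerates individual experiment failures could be declared with CDK in Python like this:

```python
# Hypothetical sketch of the experiments Map state with AWS CDK v2 in Python.
from aws_cdk import App, Stack
from aws_cdk import aws_stepfunctions as sfn
from constructs import Construct


class ExperimentStateMachineStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Map over the list of experiments, running at most 10 at a time.
        experiment_map = sfn.Map(
            self,
            "RunExperiments",
            max_concurrency=10,
            items_path="$.experiments",  # assumed input shape
        )

        # Placeholder for the real experiment task; a real task would attach
        # .add_catch(...) so an individual failure is absorbed and the Map
        # state keeps processing the remaining experiments.
        experiment_task = sfn.Pass(self, "RunSingleExperiment")
        experiment_map.item_processor(experiment_task)

        sfn.StateMachine(
            self,
            "ExperimentStateMachine",
            definition_body=sfn.DefinitionBody.from_chainable(experiment_map),
        )


app = App()
ExperimentStateMachineStack(app, "ExperimentStateMachineStack-demo")
app.synth()
```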
After successful deployment, you'll receive:
- Access Information:
  - Web UI URL (App Runner endpoint)
  - Nginx authentication credentials
  - OpenSearch domain endpoint
- Resource Details:
  - Stack name and ID
  - Region information
  - Created resource IDs
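If you need to recover these outputs later, one option (illustrative; the region and stack name suffix below are placeholders) is to read them back from CloudFormation with boto3:

```python
# Illustrative: list a deployed stack's outputs with boto3 to recover the
# access information; region and suffix are placeholders.
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")
stack = cfn.describe_stacks(StackName="FlotorchAppStack-dev01")["Stacks"][0]
for output in stack.get("Outputs", []):
    print(output["OutputKey"], "=>", output["OutputValue"])
```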
To remove FloTorch and its infrastructure:
- First, delete all images in each of the following Amazon ECR repositories in your deployed region:
  - flotorch-indexing-<suffix>
  - flotorch-retriever-<suffix>
  - flotorch-app-<suffix>
  - flotorch-evaluation-<suffix>
  - flotorch-runtime-<suffix>
- Then, in the AWS CloudFormation console in your deployed region, delete the following stacks in order:
  - FlotorchAppStack-<suffix>
  - FlotorchStateMachineStack-<suffix>
  - FlotorchVPCStack-<suffix>
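Emptying the repositories by hand can be tedious; here is a hedged boto3 sketch (region and suffix are placeholders, and pagination is ignored for brevity) that deletes all images from each repository:

```python
# Illustrative helper: empty the FloTorch ECR repositories before deleting
# the stacks. Region and suffix are placeholders; pagination is omitted.
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")
suffix = "dev01"
for name in ("flotorch-indexing", "flotorch-retriever", "flotorch-app",
             "flotorch-evaluation", "flotorch-runtime"):
    repo = f"{name}-{suffix}"
    image_ids = ecr.list_images(repositoryName=repo)["imageIds"]
    if image_ids:
        ecr.batch_delete_image(repositoryName=repo, imageIds=image_ids)
        print(f"Deleted {len(image_ids)} images from {repo}")
```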
After you log in to the App Runner instance hosting the FloTorch UI application, you will be greeted with a welcome page. Click 'Get started' to initiate your first project.
You’ll be taken to the "Projects" section to view all existing projects.
Each project is listed with details such as ID, Name, Region, Status, and Date of completion or initiation.
Example ID: 5GM2E
When creating a new project, you'll go through three main steps where you'll need to specify the necessary settings and options for your project.
You will also have the option to use a previously saved configuration file if you have one. Simply click on 'Upload config' and select a valid JSON file with all necessary parameters. The application will automatically load these parameters and display the available experiments you can run. If you don't have a configuration file, please proceed with manual setup.
- Click on "Create Project" to start a new project.
- Fill in required fields such as Project Name, Region, Knowledge Base Data, and Ground Truth Data.
In this page, you’ll be configuring experiment indexing-related settings. Define experiment parameters, including:
- Chunking Strategy
- Vector Dimension
- Chunk Size
- Chunk Overlap Percentage
- Indexing Algorithm (e.g., HNSW)
- Embedding Model (e.g., Titan Embeddings V2 - Text)
In this page, you’ll be configuring experiment retrieval-related settings. Define the parameters:
- N-shot prompt; provide a shot-prompt file if you're using a non-zero-shot prompt.
- KNN
- Inferencing LLM
- Inferencing LLM Temperature
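To make the shape of a saved configuration concrete, here is a purely hypothetical sketch of writing such a JSON file from Python. Every field name and value below is an assumption; the actual schema is whatever the application's 'Download config' button produces:

```python
# Hypothetical sketch of a FloTorch-style experiment configuration saved as
# JSON; every field name here is an assumption, not the documented schema.
import json

config = {
    "chunking_strategy": "fixed",        # assumed value
    "vector_dimension": 1024,
    "chunk_size": 512,
    "chunk_overlap_percentage": 10,
    "indexing_algorithm": "hnsw",
    "embedding_model": "amazon.titan-embed-text-v2:0",  # assumed identifier
    "n_shot_prompts": 0,                 # 0 = zero-shot
    "knn_num": 5,
    "inferencing_llm": "anthropic.claude-3-sonnet",     # assumed identifier
    "inferencing_llm_temperature": 0.1,
}

with open("flotorch_config.json", "w") as f:
    json.dump(config, f, indent=2)
```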
Once these are selected, all the valid configurations will be displayed on the next page based on the choices you’ve made.
You will have the option to save the valid configurations by clicking the ‘Download config’ button.
Please review the configurations, select all the experiments you'd like to run by marking their checkboxes, and click 'Run'. The experiments you marked will then be displayed in a table; review it and click 'Confirm' to start the experiments.
You’ll now be taken back to the projects page where you can monitor the status of experiments.
Each experiment is tracked by ID, embedding model used, indexing algorithm, and other parameters.
Example statuses include "Not Started", "In Progress", "Failed", and "Completed".
If you select an experiment that is in progress, you’ll be able to view its status in the experiment table.
Statuses include:
- "Not started"
- "Indexing in progress"
- "Retrieval in progress"
- "Completed"
Once an experiment is completed, an evaluation will be run based on a few metrics and the results will be displayed in the experiment table along with directional pricing and the duration.
The evaluation metrics include:
- Faithfulness
- Context Precision
- Aspect Critic
- Answer Relevancy
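These metric names match those popularized by the open-source ragas evaluation library; whether FloTorch computes them the same way is an assumption, but as a purely illustrative sketch, three of the four can be computed like this:

```python
# Illustrative only: metric names here come from the open-source "ragas"
# library; whether FloTorch uses ragas internally is an assumption.
# ragas needs an LLM backend configured (e.g., an OpenAI key) to score.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, context_precision, answer_relevancy

data = Dataset.from_dict({
    "question": ["What does FloTorch evaluate?"],
    "answer": ["FloTorch evaluates LLM configurations for RAG systems."],
    "contexts": [["FloTorch.ai evaluates multiple LLMs and hyperparameter combinations."]],
    "ground_truth": ["FloTorch evaluates LLM and hyperparameter combinations for RAG."],
})

scores = evaluate(data, metrics=[faithfulness, context_precision, answer_relevancy])
print(scores)
```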
If you’d like to see the answers the model generated, you can click on the experiment ID to view them along with the questions and the ground truth answers.
You’ll also have the option to view all the parameters of the experiment configuration; click the ‘details’ button on the same page.
FloTorch.ai is an innovative platform designed to simplify and optimize the selection and configuration of Large Language Models (LLMs) for use in Retrieval Augmented Generation (RAG) systems. It addresses the challenges of manual evaluation, resource-intensive experimentation, and complex performance comparisons by providing an automated, user-friendly approach to decision-making. The platform enables efficient and cost-effective utilization of LLMs, making it a valuable tool for startups, researchers, developers, and enterprises.
- Automated Evaluation of LLMs
  - FloTorch.ai analyzes multiple LLM configurations by evaluating combinations of hyperparameters specified by users.
  - It measures key performance metrics such as relevance, fluency, and robustness.
- Performance Insights
  - Provides detailed performance scores for each evaluated LLM configuration.
  - Offers actionable insights into execution times and pricing, helping users understand the cost-effectiveness of each setup.
- Cost and Time Efficiency
  - Streamlines the LLM selection process by eliminating manual trial-and-error.
  - Saves time and resources by automating comparisons and evaluations.
- User-Friendly Decision-Making
  - Helps users make data-driven decisions aligned with their specific goals and budgets.
  - Simplifies the complexity of optimizing LLMs in RAG systems.
- Broad Applicability
  - Caters to diverse use cases, such as enhancing customer experiences, improving content generation, and refining data retrieval processes.
- Targeted Support for Users
  - Designed for startups, data scientists, developers, researchers, and enterprises seeking to optimize AI-driven systems.
By focusing on performance, cost, and time efficiency, FloTorch.ai empowers users to maximize the potential of generative AI without the need for extensive experimentation.
FloTorch is hosted on AWS infrastructure and provides web-based access through an App Runner instance, which requires authentication for secure access.
After the installation is complete, users receive a web URL and credentials to log in to the FloTorch application, so secure login credentials are required for authentication.
If FloTorch.ai exposes an API for external integrations, credentials would likewise be required to authenticate API calls.
Since FloTorch runs on AWS, users must also have their AWS credentials configured locally to deploy and manage the infrastructure. This is separate from accessing the FloTorch application itself.
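A quick way to confirm your local AWS credentials are usable before deploying (illustrative, using the standard STS GetCallerIdentity call):

```python
# Sanity check that local AWS credentials are configured and valid.
import boto3

identity = boto3.client("sts").get_caller_identity()
print("Deploying as:", identity["Arn"])
```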
FloTorch.ai is designed primarily for deployment on AWS infrastructure, including the use of an EC2 instance for setup and hosting. However, whether it can be run from a local machine depends on the specific requirements of the tool and the intended use case.
FloTorch.ai includes sample data for the knowledge base and ground truth inputs; these example datasets and pre-configured templates help users get started.
5. Do I require AWS account access as a root user to be able to use the features and to be able to get the tool running?
No, you do not require root user access to an AWS account to use and set up FloTorch.ai. However, you do need an AWS account with sufficient permissions to perform specific actions necessary for deploying and managing the infrastructure. Using the root account directly is not recommended for security reasons. Instead, you should use an IAM user or role with the required permissions.
If you encounter errors while using FloTorch.ai, there are several avenues to seek help and resolve the issues:
- Check the installation guide, FAQs, and other resources provided in the FloTorch repository or official website.
- Review detailed error logs (if available) for clues. The application may log specific issues during infrastructure setup or usage.
- GitHub Repository:
  - The tool is hosted on GitHub (FissionAI/FloTorch); check the "Issues" tab.
  - You can report bugs or errors by creating a new issue with detailed information.
- FloTorch.ai Website
Evaluation refers to the process of analyzing the performance of the experiments conducted using Large Language Models (LLMs) for Retrieval Augmented Generation (RAG) tasks. The evaluation process measures how effectively the selected configurations and models meet specific criteria, such as relevance, fluency, robustness, and cost efficiency.
- Metrics Assessment:
  - Faithfulness
  - Context Precision
  - Aspect Critic
  - Answer Relevancy
- Cost and Time Analysis
- Configuration Validation
- Result Visualization
- Iterative Improvement
Data strategy and retrieval strategy are key components in configuring and optimizing experiments involving RAG systems.
Yes, the Knowledge Base dataset size is limited to 40MB.
11. I want to contribute to the GitHub source code of FloTorch.ai. Is there a way in which I can contribute?
This tool is governed by the terms and conditions of the APACHE 2.0 license.
It will be released shortly. Please stay tuned to the FloTorch.ai website for regular updates.
Directional pricing is the estimated effective price for an experiment to complete. It is the sum of the OpenSearch price, the embedding model price, and the retrieval and evaluation prices.
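With made-up component prices, the arithmetic is just a sum:

```python
# Toy numbers, purely illustrative: the directional price is the sum of the
# component prices named in this FAQ.
opensearch_price = 0.35   # USD, hypothetical
embedding_price = 0.12
retrieval_price = 0.08
evaluation_price = 0.05

directional_price = (opensearch_price + embedding_price
                     + retrieval_price + evaluation_price)
print(f"Directional price: ${directional_price:.2f}")  # $0.60
```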
Yes, the tool currently has the following restrictions:
- KB dataset limit: 40MB
- GT questions: 50
- Maximum number of experiments: 25
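If you script your uploads, a pre-flight check against these limits can fail fast. The helper below is illustrative; the limits come from this FAQ, while the file layout (a JSON list of ground truth questions) is an assumption:

```python
# Illustrative pre-flight checks mirroring the documented FloTorch limits.
import json
import os

MAX_KB_BYTES = 40 * 1024 * 1024   # 40MB knowledge base limit
MAX_GT_QUESTIONS = 50             # ground truth question limit


def check_inputs(kb_path: str, gt_path: str) -> None:
    if os.path.getsize(kb_path) > MAX_KB_BYTES:
        raise ValueError("KB dataset exceeds the 40MB limit")
    with open(gt_path) as f:
        questions = json.load(f)  # assumes a JSON list of questions
    if len(questions) > MAX_GT_QUESTIONS:
        raise ValueError("More than 50 ground truth questions")
```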
The next version of the product will be released shortly. Stay tuned to the FloTorch.ai website for regular updates.
The time it takes to complete an experiment in FloTorch.ai with a typical configuration can vary based on several factors, including the complexity of the configuration, the size of the data, the type of model used, and the available computational resources.
No, we do not have any default configuration available in Git.
19. Is the data which I use (project data and evaluation data) transferred to the cloud or made public?
The data is saved in an S3 bucket in your AWS account, but it is not made public. Access is limited to your AWS account based on your default IAM configurations and Security Group settings.
The license is Apache 2.0.
This document outlines the guidelines for contributing to the project to maintain consistency and code quality.
Follow the detailed steps provided in instructions.md.
- The `master` branch is the primary branch and should remain stable.
- Avoid pushing directly to the `master` branch. All changes must go through the pull request process.
- All new feature branches must be created from the `master` branch.
- Use descriptive names for feature branches. Example: `feature/bedrock_claude_inferencer`
- All code changes must be submitted as pull requests.
- Each pull request should be reviewed by at least one other developer.
- Keep pull requests small and focused on a specific feature or fix.
- Include relevant information in commit messages to provide context.
- Delete feature branches after they have been successfully merged into `master`.
- Before submitting a pull request, thoroughly test your changes locally to ensure they work as expected.
- Use snake_case for:
  - names
  - configuration variables
  - Python file names

Example: `example_snake_case`
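For instance, applied across the three cases (identifiers below are made up for illustration):

```python
# snake_case applied consistently, per the guideline above.
default_chunk_size = 512             # configuration variable


def load_ground_truth(path: str):    # function name
    ...

# A matching Python file name would be: ground_truth_loader.py
```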