AI API Eval Framework #618
Comments
Hi! @ashitaprasad I find this issue very interesting and would love to work on it as part of GSoC. The idea of building an AI API evaluation framework and integrating it into API Dash aligns well with my skills in AI, Python, and Flutter. I have experience in developing evaluation frameworks and data visualization tools. I'd like to discuss the project in more detail. Are there any specific AI APIs or benchmarks that should be prioritized? Also, should the evaluation framework support parallel execution for batch processing? Looking forward to contributing!
@f-ei8ht Currently, lm-evaluation-harness is the most popular LLM eval framework; it supports evaluating models served via several commercial APIs or local inference APIs. However, it is not user friendly and requires a coding background to use. This project aims to provide an easy way to evaluate AI API responses against any task benchmark. For background, read LLM Benchmarks Explained: Everything on MMLU, HellaSwag, BBH, and Beyond.

Let us take the MMLU benchmark as an example and test a model served via Ollama's local API. In this feature, the user should be able to select the benchmark (MMLU) against which the API is being evaluated. API Dash will read the benchmark dataset (downloading it if not available), process it, and create the API requests to be executed. Each API response received will be processed and used to calculate the benchmark score. Everything happens in a user-friendly manner: the user can see the progress of evaluations, pause/resume them, and easily visualize the end result.
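For reference, here is a rough Python sketch of that flow (not the proposed API Dash implementation), assuming the Hugging Face `datasets` package and a local Ollama server on its default `/api/generate` endpoint; the model name, prompt template, and answer extraction are illustrative placeholders only:

```python
# Minimal sketch: load an MMLU subset, turn each row into a request against a
# local Ollama server, and compute an accuracy score from the responses.
# Assumes `datasets` and `requests` are installed and Ollama is running locally;
# the model name and prompt format below are placeholders.
import requests
from datasets import load_dataset

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint
MODEL = "llama3"  # placeholder model name

def build_prompt(row):
    # Format one MMLU question as a multiple-choice prompt (A-D).
    options = "\n".join(f"{letter}. {choice}"
                        for letter, choice in zip("ABCD", row["choices"]))
    return (f"{row['question']}\n{options}\n"
            "Answer with a single letter (A, B, C, or D).")

def ask(prompt):
    # One non-streaming completion request to the local Ollama API.
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip()

def evaluate(subject="abstract_algebra", limit=20):
    # Evaluate a small slice of one MMLU subject and report accuracy.
    rows = load_dataset("cais/mmlu", subject, split=f"test[:{limit}]")
    correct = 0
    for row in rows:
        answer = ask(build_prompt(row))
        predicted = next((c for c in answer.upper() if c in "ABCD"), None)
        if predicted == "ABCD"[row["answer"]]:
            correct += 1
    return correct / len(rows)

if __name__ == "__main__":
    print(f"MMLU (abstract_algebra, 20 questions): {evaluate():.2%}")
```

In API Dash this logic would of course live behind the UI described above, with progress tracking, pause/resume, and result visualization instead of a one-shot script.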
Tell us about the task you want to perform but are unable to because the feature is not available
Develop an end-to-end AI API eval framework and integrate it into API Dash. This framework should (list is suggestive, not exhaustive):