[BFCL] Support Category-Specific Generation for OSS Model, Remove eval_data_compilation Step #512
Merged
Conversation
CharlieJCJ
approved these changes
Jul 14, 2024
Tested on Llama3-8B for data generation and eval checker, LGTM
ShishirPatil
approved these changes
Jul 17, 2024
This was referenced Jul 19, 2024
ShishirPatil
pushed a commit
that referenced
this pull request
Jul 20, 2024
… PR Merge (#536)

On July 16th, PR #516 and PR #512 were merged first. They introduced fixes that should be applied to all model handlers. Shortly after, on the same day, PR #525 was merged. The new model handler introduced in PR #525 is missing the fixes from the two previously merged PRs (it wasn't synced accordingly). This PR addresses that by applying the necessary fixes to the new model handler.
ShishirPatil
pushed a commit
that referenced
this pull request
Jul 24, 2024
…taset; Handle vLLM Benign Error (#540)

In this PR:

1. **Support Multi-Model, Multi-Category Generation**:
   - `openfunctions_evaluation.py` can now take a list of model names and a list of test categories as command-line input.
   - Partially addresses #501.
2. **Handle vLLM's Error**:
   - A benign error occurs during the cleanup phase after a generation task completes, causing the pipeline to fail even though the model results were generated. This issue stems from vLLM and is outside our control. [See this issue](vllm-project/vllm#6145) from the vLLM repo.
   - This is a problem because when users attempt category-specific generation for locally-hosted models (as supported in #512), only the first category result for the first model is generated; the error occurs immediately afterwards.
   - To improve the user experience, we now combine all selected test categories into one task, submit that single task to vLLM, and split the results afterwards.
   - Note: if multiple locally-hosted models are queued for inference, only the tasks for the first model will complete. Subsequent tasks will still fail because of the cleanup-phase error from the first model. We therefore recommend running the inference command for one model at a time until vLLM rolls out a fix.
3. **Add an Index to the Dataset**:
   - Each test file and possible_answer file now includes an index to help match entries.

This PR **will not** affect the leaderboard score.
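The combine-then-split workaround in point 2 can be sketched as follows. This is a hypothetical illustration, not the repository's actual code: the function names, the `test_category` tag, and the stand-in "inference" step are all assumptions.

```python
# Sketch of the workaround: instead of submitting one vLLM task per test
# category (which fails after the first task due to the cleanup-phase error),
# merge all selected categories into a single batch, then split the results
# back out per category using a tag attached to each entry.

def combine_categories(datasets):
    """Merge per-category entry lists into one flat batch, tagging each entry."""
    combined = []
    for category, entries in datasets.items():
        for entry in entries:
            combined.append({**entry, "test_category": category})
    return combined

def split_results(results):
    """Split a flat result list back into per-category lists, dropping the tag."""
    by_category = {}
    for result in results:
        category = result.pop("test_category")
        by_category.setdefault(category, []).append(result)
    return by_category

# Usage: merge two categories, run one "inference" pass, split back out.
datasets = {
    "simple": [{"id": 0, "question": "q0"}],
    "multiple": [{"id": 0, "question": "q1"}],
}
batch = combine_categories(datasets)   # a single task for vLLM
results = [dict(r) for r in batch]     # stand-in for the actual generation step
per_category = split_results(results)
```

The per-entry index added in point 3 serves the same purpose on disk: it lets split results be matched back to their test and possible_answer entries.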
aw632
pushed a commit
to vinaybagade/gorilla
that referenced
this pull request
Aug 22, 2024
…l_data_compilation Step (ShishirPatil#512)

Currently, for OSS models, users must run `eval_data_compilation.py` to merge all datasets into a single `data_total.json` file before running inference. This requires inferring on all datasets at once, with no option to run inference on individual datasets or subsets. This PR addresses the limitation by allowing users to perform inference on specific datasets directly, removing the need for the `eval_data_compilation` step. Note: hosted models don't have this limitation. Partially addresses ShishirPatil#501 and ShishirPatil#502.
Currently, for OSS models, users must run `eval_data_compilation.py` to merge all datasets into a single `data_total.json` file before running inference. This requires inferring on all datasets at once, with no option to run inference on individual datasets or subsets. This PR addresses the limitation by allowing users to perform inference on specific datasets directly, removing the need for the `eval_data_compilation` step.

Note: hosted models don't have this limitation.

Partially addresses #501 and #502.
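A minimal sketch of the category-specific invocation this PR enables. The flag names and model/category strings below are illustrative assumptions, not necessarily the exact CLI of `openfunctions_evaluation.py`:

```python
# Hypothetical sketch of a CLI that accepts lists of model names and test
# categories, so inference can run per category without first merging
# everything into a data_total.json file.
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Category-specific generation")
    # Flag names are illustrative; see openfunctions_evaluation.py for the real CLI.
    parser.add_argument("--model", nargs="+", required=True,
                        help="One or more model names to run inference with")
    parser.add_argument("--test-category", nargs="+", default=["all"],
                        help="One or more test categories (default: all)")
    return parser.parse_args(argv)

# Usage: run two categories for one locally-hosted model.
args = parse_args([
    "--model", "meta-llama/Meta-Llama-3-8B-Instruct",
    "--test-category", "simple", "multiple",
])
```

With `nargs="+"`, argparse collects each flag's values into a list, which is the shape the multi-model, multi-category loop needs.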