[BFCL] Multi Turn Dataset and Possible Answer Fix (Base Category) #719
Merged
CharlieJCJ
approved these changes
Oct 25, 2024
Suggested changes in Message API.
LGTM, clear from dataset validation tests, ran bfcl v3 end-to-end.
ShishirPatil
pushed a commit
that referenced
this pull request
Oct 29, 2024
For functions used in the multi turn categories, we use automated scripts to deterministically extract information from the function doc string and compile the corresponding function docs. This PR addresses a few issues with that pipeline.

1. [Fix] The doc strings were not in a uniform format, which made info extraction hard; they have been standardized.
2. [Fix] The compilation script was buggy and would sometimes output a wrong function doc with missing information.
3. [Fix] We have also added the compilation script to the `utils` folder, as it may be useful to the community as well.

Following #719, this is also part of the effort to thoroughly bug fix the multi turn categories. We will have more PRs coming in the next few days.

---------
Co-authored-by: Amitoj Singh <[email protected]>
Co-authored-by: Charlie Cheng-Jie Ji <[email protected]>
Co-authored-by: Fanjia-Yan <[email protected]>
Co-authored-by: VishnuSuresh27 <[email protected]>
---------
Co-authored-by: amitojsingh2022 <[email protected]>
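The doc string extraction described in the commit message above could look roughly like the sketch below. It is not the script that was added to the `utils` folder: the function name `compile_function_doc`, the `ARG_LINE` pattern, the assumed doc string layout (a summary line followed by an `Args:` section with `name (type): description` lines), and the shape of the returned dict are all assumptions made for illustration.

```python
import inspect
import re

# Hypothetical pattern for one "name (type): description" line under "Args:".
ARG_LINE = re.compile(r"^(\w+) \(([^)]+)\): (.+)$")


def compile_function_doc(func):
    """Build a function doc dict from a doc string in the assumed layout."""
    doc = inspect.getdoc(func) or ""  # getdoc strips the common indentation
    lines = doc.splitlines()
    description = lines[0].strip() if lines else ""
    properties = {}
    in_args = False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
            continue
        if in_args:
            match = ARG_LINE.match(stripped)
            if match is None:
                # A blank line or a new section header ends the Args block.
                in_args = False
                continue
            name, type_name, arg_description = match.groups()
            properties[name] = {"type": type_name, "description": arg_description}
    return {
        "name": func.__name__,
        "description": description,
        # The output shape below is illustrative, not the benchmark's schema.
        "parameters": {"type": "dict", "properties": properties},
    }


def cd(folder):
    """Change the current working directory.

    Args:
        folder (str): Name of the folder to move into.
    """


doc = compile_function_doc(cd)
```

A uniform doc string format matters here because a single pattern such as `ARG_LINE` silently skips any parameter line that deviates from it, which is one way a compiled function doc can end up with missing information.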
This was referenced Oct 29, 2024
ShishirPatil
pushed a commit
that referenced
this pull request
Oct 30, 2024
This PR fixes the ambiguous prompt issue and some wrong ground truth issues for the `multi_turn_base` category. After this PR, the `multi_turn_base` entries should be bug-free.

Following #719 and #722, this is also part of the effort to thoroughly bug fix the multi turn categories. We will have more PRs coming in the next few days.

---------
Co-authored-by: Charlie Cheng-Jie Ji <[email protected]>
Co-authored-by: Fanjia-Yan <[email protected]>
Co-authored-by: VishnuSuresh27 <[email protected]>
HuanzhiMao
added a commit
that referenced
this pull request
Oct 31, 2024
This PR updates the questions and ground truth for the `multi_turn_miss_func` and `multi_turn_long_context` categories, since they are augmented from `multi_turn_base` and the fix for the base entries was finalized in #723.

Following #719, #722, #723 and #725, this is also part of the effort to thoroughly bug fix the multi turn categories. There will be one more PR coming for the `multi_turn_miss_param` category fix.

---------
Co-authored-by: Charlie Cheng-Jie Ji <[email protected]>
Co-authored-by: Fanjia-Yan <[email protected]>
Co-authored-by: VishnuSuresh27 <[email protected]>
HuanzhiMao
added a commit
that referenced
this pull request
Oct 31, 2024
This PR updates the questions and ground truth for the `multi_turn_miss_param` category, since they are augmented from `multi_turn_base` and the fix for the base entries was finalized in #723.

Following #719, #722, #723, #725 and #728, this is also part of the effort to thoroughly bug fix the multi turn categories.

---------
Co-authored-by: Charlie Cheng-Jie Ji <[email protected]>
Co-authored-by: Fanjia-Yan <[email protected]>
Co-authored-by: VishnuSuresh27 <[email protected]>
HuanzhiMao
added a commit
that referenced
this pull request
Oct 31, 2024
In the current metric, for the `multi_turn_miss_func` and `multi_turn_miss_param` categories, the model is expected to output no function calls when a turn is missing necessary information (either a relevant function or parameter). This mirrors the standard for irrelevance detection in single-turn scenarios.

However, multi-turn interactions introduce additional complexity. For instance, if the user’s request is "go to the ABC folder and display content of the XYZ file" but the `cd` function isn’t provided, the model might reasonably attempt exploratory actions (like calling `pwd` or `ls` to check its current location) before recognizing that it cannot complete the task as requested. Ultimately, the model should recognize that the user’s task is unachievable given the context.

To address this, we've updated the metric: a dummy function, `flag_task_unachievable`, will now be provided for every multi-turn entry. If the model determines that one or more tasks are unachievable, it should explicitly invoke this function. During evaluation, any entry where the model calls this function will be marked as correct for irrelevance detection, even if other functions were called beforehand.

In addition, this PR addresses #664. The execution result for each turn (for both the model and the ground truth) is also included as part of the score output files to help with debugging.

Following #719, #722 and #723, this is also part of the effort to thoroughly bug fix the multi turn categories. We will have more PRs coming in the next few days.
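The updated irrelevance check described above can be sketched as follows. This is a minimal illustration, not the evaluation code from the repository: the helper name `irrelevance_turn_is_correct` and the representation of a turn as a list of called function names are hypothetical, and treating a turn with no calls at all as still correct is an assumption carried over from the earlier criterion.

```python
FLAG_FUNCTION = "flag_task_unachievable"


def irrelevance_turn_is_correct(model_calls):
    """Decide whether a missing-information turn passes irrelevance detection.

    model_calls is a list of the function names the model invoked, in order
    (a hypothetical representation chosen for this sketch).
    """
    if not model_calls:
        # Assumption: emitting no calls, the earlier criterion, still passes.
        return True
    # Updated criterion: exploratory calls are tolerated as long as the
    # model explicitly flags the task as unachievable at some point.
    return FLAG_FUNCTION in model_calls


explored_then_flagged = irrelevance_turn_is_correct(
    ["pwd", "ls", "flag_task_unachievable"]
)
explored_only = irrelevance_turn_is_correct(["pwd", "ls"])
```

Under this sketch, the `cd` example from the description passes when the model calls `pwd` and `ls` and then flags the task, and fails when it only explores without ever flagging.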
VishnuSuresh27
pushed a commit
to VishnuSuresh27/gorilla
that referenced
this pull request
Nov 11, 2024
…ishirPatil#719)

This PR addresses a few issues with the base multi turn entries:

1. Some initial config values are in the wrong format, and would result in an execution error even when running the ground truth function calls in order.
2. Some entries in the initial config are not used by the `_load_scenario` method.
3. Some ground truth function calls have wrong parameter values, and would result in `{"error": "xxx"}` when executed.
4. Some functions are unreasonable in a real-life setting, either in their parameters or in their functionality.

After this PR, all info in the initial config is used, and the ground truth function calls will not result in any error when executed in order.

This is part of the effort to thoroughly bug fix the multi turn categories. We will have more PRs coming in the next few days.

---------
Co-authored-by: Charlie Cheng-Jie Ji <[email protected]>
Co-authored-by: Fanjia-Yan <[email protected]>
Co-authored-by: VishnuSuresh27 <[email protected]>
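The property stated above, that ground truth calls executed in order never return `{"error": ...}`, can be checked with a small harness like the one sketched here. Everything except the `_load_scenario` method name and the `{"error": ...}` result shape is hypothetical: `ToyFileSystem`, `run_ground_truth`, the config keys, and the representation of calls as `(method_name, kwargs)` pairs are stand-ins chosen for illustration and do not reflect the repository's actual classes or data format.

```python
class ToyFileSystem:
    """Hypothetical stand-in for a multi turn API class."""

    def __init__(self):
        self.folders = set()
        self.current = "/"

    def _load_scenario(self, config):
        # Every key in the initial config should be consumed here.
        self.folders = set(config["folders"])

    def cd(self, folder):
        if folder not in self.folders:
            return {"error": f"no such folder: {folder}"}
        self.current = folder
        return {"current_working_directory": folder}


def run_ground_truth(instance, calls):
    """Execute (method_name, kwargs) pairs in order and collect error results."""
    errors = []
    for method_name, kwargs in calls:
        result = getattr(instance, method_name)(**kwargs)
        if isinstance(result, dict) and "error" in result:
            errors.append((method_name, result["error"]))
    return errors


fs = ToyFileSystem()
fs._load_scenario({"folders": ["documents"]})
good = run_ground_truth(fs, [("cd", {"folder": "documents"})])
bad = run_ground_truth(fs, [("cd", {"folder": "missing"})])
```

An empty list from `run_ground_truth` corresponds to an entry whose ground truth runs cleanly; a wrong parameter value, such as the `missing` folder here, shows up as a collected error.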
HuanzhiMao
added a commit
that referenced
this pull request
Nov 19, 2024
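This PR addresses a few issues with the base multi turn entries:

1. Some initial config values are in the wrong format, and would result in an execution error even when running the ground truth function calls in order.
2. Some entries in the initial config are not used by the `_load_scenario` method.
3. Some ground truth function calls have wrong parameter values, and would result in `{"error": "xxx"}` when executed.
4. Some functions are unreasonable in a real-life setting, either in their parameters or in their functionality.

After this PR, all info in the initial config is used, and the ground truth function calls will not result in any error when executed in order.

This is part of the effort to thoroughly bug fix the multi turn categories. We will have more PRs coming in the next few days.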
Co-authored-by: Charlie Cheng-Jie Ji [email protected]
Co-authored-by: Fanjia-Yan [email protected]
Co-authored-by: VishnuSuresh27 [email protected]