Leaderboard Update, in sync with BFCL April 9th Release #341

Merged on Apr 11, 2024 (5 commits)

@HuanzhiMao (Collaborator) commented Apr 11, 2024

This PR updates the leaderboard data, as mentioned in #338. As a result, some values/scores have changed.
Note that the model glaiveai/glaive-function-calling-v1 is excluded from the leaderboard: loading it with transformers raises AttributeError: 'ReplitLMTokenizer' object has no attribute 'sp_model'. This is a bug on the transformers side, specific to this tokenizer.
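
For readers who want to handle this kind of load failure in their own scripts, here is a minimal sketch of skipping models whose tokenizer cannot be loaded. It is not the BFCL evaluation code: `split_loadable` and `default_loader` are hypothetical names introduced for illustration, and the sketch assumes the failure surfaces as an `AttributeError` at load time, as in the message quoted above.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def default_loader(model_id: str) -> Any:
    """Load a tokenizer via Hugging Face transformers (imported lazily)."""
    from transformers import AutoTokenizer

    # trust_remote_code is assumed to be needed for models that ship a custom tokenizer.
    return AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)


def split_loadable(
    model_ids: Iterable[str],
    loader: Callable[[str], Any] = default_loader,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (loadable ids, {excluded id: error text}) for the given models."""
    loadable: List[str] = []
    excluded: Dict[str, str] = {}
    for model_id in model_ids:
        try:
            loader(model_id)
        except AttributeError as err:
            # Record the reason so the exclusion can be documented.
            excluded[model_id] = f"{type(err).__name__}: {err}"
        else:
            loadable.append(model_id)
    return loadable, excluded
```

Passing the loader as a parameter keeps the sketch easy to try out with a stub instead of downloading any model.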

@HuanzhiMao HuanzhiMao marked this pull request as ready for review April 11, 2024 09:39
@CharlieJCJ (Collaborator) left a comment

LGTM

ShishirPatil pushed a commit that referenced this pull request Apr 11, 2024
This PR is for the BFCL April 9th release:

1. Bug fix in the evaluation dataset. This involves modifying both
prompts and function docs.
2. Bug fix for possible answers.

The detailed breakdown is attached below. If you spot any issue with our
evaluation dataset and/or possible answers, please feel free to raise an
issue!

| Test Category     | Prompt/Func Doc Correction Count | Possible Answer Correction Count |
|-------------------|----------------------------------|----------------------------------|
| Simple            | 3                                | 16                               |
| Parallel          | 1                                | 16                               |
| Multiple          | 1                                | 11                               |
| Parallel Multiple | 10                               | 43                               |

This PR **DOES** change the leaderboard score. We will update the
leaderboard website shortly, in PR #341
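
As a quick arithmetic check on the breakdown table above, the per-category counts can be summed as follows. This is only an illustrative sketch: the dictionary and variable names are made up here, and the numbers are copied directly from the table.

```python
# (prompt/func doc corrections, possible answer corrections) per test category,
# copied from the breakdown table.
corrections = {
    "Simple": (3, 16),
    "Parallel": (1, 16),
    "Multiple": (1, 11),
    "Parallel Multiple": (10, 43),
}

prompt_total = sum(prompt for prompt, _ in corrections.values())
answer_total = sum(answer for _, answer in corrections.values())

print(f"Prompt/func doc corrections: {prompt_total}")   # 15
print(f"Possible answer corrections: {answer_total}")   # 86
```

Parallel Multiple accounts for 10 of the 15 prompt/func doc corrections and 43 of the 86 possible answer corrections.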

---------

Co-authored-by: Charlie Cheng-Jie Ji <[email protected]>
Co-authored-by: Fanjia Yan <[email protected]>
@ShishirPatil ShishirPatil merged commit 06c7f9d into ShishirPatil:gh-pages Apr 11, 2024
devanshamin pushed a commit to devanshamin/gorilla that referenced this pull request Jul 9, 2024
aw632 pushed a commit to vinaybagade/gorilla that referenced this pull request Aug 22, 2024
3 participants