[BFCL] Leaderboard Update - 2024/12/06 (Checkpoint d7e52e5) #800

Merged: HuanzhiMao merged 4 commits into ShishirPatil:gh-pages from the leaderboard-update branch on Dec 7, 2024

HuanzhiMao (Collaborator) commented on Nov 28, 2024

This PR updates the leaderboard to reflect the score changes caused by merging the following PRs:

  1. [BFCL] Minor Grammatical Corrections to DEFAULT_SYSTEM_PROMPT #747
  2. Fix handling of examples with no tools in Gemini #770
  3. Skip adding empty content from gemini #768
  4. [BFCL] Add claude-3-5-haiku-20241022, claude-3-5-haiku-20241022-FC, claude-3-5-sonnet-20241022, claude-3-5-sonnet-20241022-FC #750
  5. [BFCL Dataset Revamp 4/n] Live Irrelevance #763
  6. [BFCL Dataset Revamp 5/n] Multi-Turn Base WrapUp #772
  7. [BFCL] Add Unit Test to Check for Illegal Python Parameter Name #777
  8. [BFCL] Dataset and Possible Answer Fix (Live Categories) for Illegal Python Parameter Name #778
  9. [BFCL] some tiny fix in possible_answer #786
  10. [BFCL] Add New Model Qwen/Qwen2.5-72B-Instruct #787
  11. [BFCL] Add DeepSeek-V2.5, DeepSeek-Coder-V2-Instruct-0724, DeepSeek-Coder-V2-Lite-Instruct, DeepSeek-V2-Chat-0628, DeepSeek-V2-Lite-Chat #697
  12. Add minicpm3 4b FC model handler #718
  13. [BFCL] Add support for Writer models and Palmyra X 004 #755
  14. [BFCL] Fix Irrelevance Category Performance for DeepSeek Coder Handler #796
  15. [BFCL Dataset Revamp 6/n] Live Relevance Data Fix #789
  16. [BFCL Dataset Revamp 7/n] Augmented Multi-turn Dataset Fix #804
  17. [BFCL] Improve Latency Measurement Accuracy and Enable Default State Logging #808
  18. [BFCL] Resolve Issue in Gemini Model When No Model Output #809
  19. [BFCL] Replace 'class' with '_class' to Avoid Function Calling Formatting Error #811
  20. [BFCL] Added Grok Handler #810

Models were evaluated using checkpoint commit d7e52e5.

HuanzhiMao added the BFCL-Website (BFCL Leaderboard Website) label on Nov 28, 2024
CharlieJCJ changed the title from "[BFCL] Leaderboad Update" to "[BFCL] Leaderboard Update" on Dec 4, 2024
HuanzhiMao changed the title from "[BFCL] Leaderboard Update" to "[BFCL] Leaderboard Update - 2024/12/06 (Checkpoint 3d987cb)" on Dec 6, 2024
HuanzhiMao marked this pull request as ready for review on December 6, 2024 11:19
HuanzhiMao changed the title from "[BFCL] Leaderboard Update - 2024/12/06 (Checkpoint 3d987cb)" to "[BFCL] Leaderboard Update - 2024/12/06 (Checkpoint d7e52e5)" on Dec 7, 2024
Fanjia-Yan (Collaborator) left a comment

LGTM

HuanzhiMao merged commit 2dcccc1 into ShishirPatil:gh-pages on Dec 7, 2024
HuanzhiMao deleted the leaderboard-update branch on December 7, 2024 04:00