Leaderboard Update, in sync with BFCL April 27th Release #391

HuanzhiMao · 2024-04-27T08:13:15Z

As mentioned in #390, this PR fixes inconsistencies in the cost and latency calculation for open-source models. Both values are now measured while serving the model with vLLM on 8 V100 GPUs.

$$\text{Cost} = \text{Latency per 1000 function calls} \times \frac{\text{8xV100 Azure pay-as-you-go price per hour}}{3600}$$
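As a rough illustration of the cost formula, here is a minimal Python sketch. It assumes latency is measured in seconds (which the division by 3600 suggests, though the PR does not state the unit). The function name, parameter names, and the numbers in the usage example are hypothetical placeholders, not the leaderboard's actual code or the real Azure price.

```python
def cost_per_1000_calls(latency_s_per_1000_calls: float, hourly_price_usd: float) -> float:
    """Estimate the cost of 1000 function calls on a rented GPU node.

    latency_s_per_1000_calls: total latency for 1000 function calls,
        assumed here to be in seconds.
    hourly_price_usd: pay-as-you-go price of the 8xV100 node per hour.
        The real value must be looked up; it is not given in this PR.
    """
    if latency_s_per_1000_calls < 0 or hourly_price_usd < 0:
        raise ValueError("latency and price must be non-negative")
    # Convert the hourly price to a per-second price, then scale by latency.
    price_per_second = hourly_price_usd / 3600
    return latency_s_per_1000_calls * price_per_second


if __name__ == "__main__":
    # Placeholder numbers chosen only to make the arithmetic easy to follow.
    example_latency_s = 100.0   # hypothetical: 100 s for 1000 calls
    example_hourly_usd = 36.0   # hypothetical: NOT the actual Azure price
    print(cost_per_1000_calls(example_latency_s, example_hourly_usd))
```

With these placeholder inputs the per-second price is 36 / 3600 = 0.01, so 100 seconds of latency comes to roughly 1.0 in the same currency unit as the hourly price.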

We want to thank the community for pointing out this oversight. Thanks @abacaj and @Teknium1 for initially raising the issue, and thanks @natikgadzhi @HamelHusain @nicoritschel @winglian @olafgeibig and many others for joining the conversation. We are listening to community feedback and continuously improving our Berkeley Function Calling Leaderboard. Discussions like this serve as great examples. Let us know what you want us to include next!

This PR DOES change the leaderboard values for cost and latency, but it does NOT change the accuracy scores.

Co-authored-by: Charlie Cheng-Jie Ji [email protected]
Co-authored-by: Fanjia Yan [email protected]

CharlieJCJ

Numbers checked, LGTM

@abacaj

In this PR, we fix some inconsistencies in the cost and latency calculation for open-source models, which are now all calculated when serving the model with [vLLM](https://github.com/vllm-project/vllm) using 8 V100 GPUs.

$$\text{Cost} = \text{Latency per 1000 function calls} \times \frac{\text{8xV100 Azure pay-as-you-go price per hour}}{3600}$$

This PR **DOES** change the leaderboard values in the `cost` and `latency` columns, but it **DOES NOT** change the accuracy score. We will update the leaderboard in a different PR #391.

We want to thank the community for pointing out this oversight. Thanks [@abacaj](https://twitter.com/abacaj) and [@teknium1](https://twitter.com/Teknium1) for initially raising the issue, and thanks [@natikgadzhi](https://twitter.com/natikgadzhi) [@HamelHusain](https://twitter.com/HamelHusain) [@nicoritschel](https://twitter.com/nicoritschel) [@winglian](https://twitter.com/winglian) [@olafgeibig](https://twitter.com/olafgeibig) and many others for joining the conversation. We are listening to community feedback and continuously improving our Berkeley Function Calling Leaderboard. Discussions like [this](https://twitter.com/abacaj/status/1784003306508980250) serve as great examples. Let us know what you want us to include next!

---------

Co-authored-by: Charlie Cheng-Jie Ji <[email protected]>
Co-authored-by: Fanjia Yan <[email protected]>


HuanzhiMao added 2 commits April 27, 2024 01:06

add explanation in leaderboard cost and latency calculation

5a718f8

update data.csv

9950807

HuanzhiMao mentioned this pull request Apr 27, 2024

BFCL April 27th Release (Bug Fix in Cost/Latency Calculation) #390

Merged

HuanzhiMao added 3 commits April 27, 2024 01:21

update cost formula

42d8021

update cost formula in blog

a1fd1f5

update data.csv

71c68b4

CharlieJCJ approved these changes Apr 27, 2024

update last-update time

bc85341

ShishirPatil merged commit 2c87d43 into ShishirPatil:gh-pages Apr 27, 2024

HuanzhiMao deleted the llama-website branch April 27, 2024 08:48
