BFCL April 27th Release (Bug Fix in Cost/Latency Calculation) #390

HuanzhiMao · 2024-04-27T07:53:15Z

This PR fixes inconsistencies in the cost and latency calculation for open-source models. Both values are now measured while serving each model with vLLM on 8 V100 GPUs.

$$\text{Cost} = \text{Latency per 1000 function calls} \times (\text{8xV100 azure-pay-as-you-go-price per hour} / 3600)$$
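For readers who want to check a leaderboard entry by hand, the formula can be sketched in a few lines of Python. This is a minimal illustration, not the code changed in this PR: the function name `cost_per_1000_calls` and both input values are hypothetical, and latency is assumed to be in seconds (which the division by 3600 suggests). The hourly price is left as a parameter because this description does not state the Azure figure.

```python
SECONDS_PER_HOUR = 3600


def cost_per_1000_calls(latency_s_per_1000_calls: float,
                        price_per_hour_usd: float) -> float:
    """Cost of 1000 function calls, following the formula in this PR.

    latency_s_per_1000_calls: total latency for 1000 calls, assumed to be seconds.
    price_per_hour_usd: hourly price of the 8xV100 machine (caller-supplied).
    """
    if latency_s_per_1000_calls < 0 or price_per_hour_usd < 0:
        raise ValueError("latency and price must be non-negative")
    # Convert the hourly price to a per-second price, then scale by latency.
    return latency_s_per_1000_calls * (price_per_hour_usd / SECONDS_PER_HOUR)


# Hypothetical inputs, for illustration only (not real measurements or prices):
# 1800 seconds of latency at 20 USD/hour is half an hour of machine time,
# so the cost comes out at roughly 10 USD.
example_cost = cost_per_1000_calls(1800.0, 20.0)
print(f"{example_cost:.2f} USD per 1000 calls")
```

Keeping the price as an argument rather than a constant makes it easy to recompute the cost column if the pay-as-you-go price changes.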

This PR DOES change the leaderboard values in the cost and latency columns, but it DOES NOT change the accuracy scores. The leaderboard itself will be updated in a separate PR, #391.

We want to thank the community for pointing out this oversight. Thanks @abacaj and @Teknium1 for initially raising the issue, and thanks @natikgadzhi @HamelHusain @nicoritschel @winglian @olafgeibig and many others for joining the conversation. We are listening to community feedback and continuously improving our Berkeley Function Calling Leaderboard. Discussions like this serve as great examples. Let us know what you want us to include next!

Co-authored-by: Charlie Cheng-Jie Ji [email protected]
Co-authored-by: Fanjia Yan [email protected]

CharlieJCJ

LGTM

@abacaj

As mentioned in #390, in this PR, we fix some inconsistency issues in the cost and latency calculation for open-source models, which are now all calculated when serving the model with [vLLM](https://github.com/vllm-project/vllm) using 8 V100 GPUs.

$$\text{Cost} = \text{Latency per 1000 function calls} \times (\text{8xV100 azure-pay-as-you-go-price per hour} / 3600)$$

We want to thank the community for pointing out this oversight. Thanks [@abacaj](https://twitter.com/abacaj) and [@teknium1](https://twitter.com/Teknium1) for initially raising the issue, and thanks [@natikgadzhi](https://twitter.com/natikgadzhi) [@HamelHusain](https://twitter.com/HamelHusain) [@nicoritschel](https://twitter.com/nicoritschel) [@winglian](https://twitter.com/winglian) [@olafgeibig](https://twitter.com/olafgeibig) and many others for joining the conversation. We are listening to community feedback and continuously improving our Berkeley Function Calling Leaderboard. Discussions like [this](https://twitter.com/abacaj/status/1784003306508980250) serve as great examples. Let us know what you want us to include next!

This PR DOES change the leaderboard scores for `costs` and `latency`, but not `accuracy`.

---------

Co-authored-by: Charlie Cheng-Jie Ji [[email protected]](mailto:[email protected])
Co-authored-by: Fanjia Yan [[email protected]](mailto:[email protected])

@abacaj

…rPatil#390)

In this PR, we fix some inconsistency issues in the cost and latency calculation for open-source models, which are now all calculated when serving the model with [vLLM](https://github.com/vllm-project/vllm) using 8 V100 GPUs.

$$\text{Cost} = \text{Latency per 1000 function calls} \times (\text{8xV100 azure-pay-as-you-go-price per hour} / 3600)$$

This PR **DOES** change the leaderboard values in the `cost` and `latency` columns, but it **DOES NOT** change the accuracy score. We will update the leaderboard in a different PR ShishirPatil#391.

We want to thank the community for pointing out this oversight. Thanks [@abacaj](https://twitter.com/abacaj) and [@teknium1](https://twitter.com/Teknium1) for initially raising the issue, and thanks [@natikgadzhi](https://twitter.com/natikgadzhi) [@HamelHusain](https://twitter.com/HamelHusain) [@nicoritschel](https://twitter.com/nicoritschel) [@winglian](https://twitter.com/winglian) [@olafgeibig](https://twitter.com/olafgeibig) and many others for joining the conversation. We are listening to community feedback and continuously improving our Berkeley Function Calling Leaderboard. Discussions like [this](https://twitter.com/abacaj/status/1784003306508980250) serve as great examples. Let us know what you want us to include next!

---------

Co-authored-by: Charlie Cheng-Jie Ji <[email protected]>
Co-authored-by: Fanjia Yan <[email protected]>

HuanzhiMao added 4 commits April 27, 2024 00:25

update cost calculation logic

480b988

update NO_COST_MODELS

3669521

update change log

e9ab6df

update change log url

02c71b6

HuanzhiMao marked this pull request as ready for review April 27, 2024 08:01

add Gorilla latency info

9bd236b

HuanzhiMao mentioned this pull request Apr 27, 2024

Leaderboard Update, in sync with BFCL April 27th Release #391

Merged

HuanzhiMao added 5 commits April 27, 2024 01:17

add cost formula

1361880

update cost formula

dab4840

clean up

3369fd1

add mistral cost

813eb97

add gemini-1.5 cost

e5ad45c

ShishirPatil approved these changes Apr 27, 2024

CharlieJCJ approved these changes Apr 27, 2024

ShishirPatil merged commit cff48af into ShishirPatil:main Apr 27, 2024

HuanzhiMao deleted the llama-fix branch April 27, 2024 08:48
