[BFCL] Speed Up Locally-hosted Model Inference Process (ShishirPatil#671)

Fix ShishirPatil#649. Instead of sending requests to the vLLM server one by one in sequence, we now send all requests at once, letting vLLM exploit its batching and scheduling optimizations. Tested on 8 x A100 (40G) with Llama 3.1 70B. The inference speed on single-turn entries is roughly the same (within a 1-minute difference) as when using `llm.generate` before the BFCL V3 release in ShishirPatil#644. The multi-turn entries still take around 2 hours to complete, but that is largely due to the nature of the multi-turn dataset; this is much faster than before, when they took around 2 days to finish. This PR **will not** affect the leaderboard score.
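As a rough illustration of the change described above (not the actual PR diff), the sketch below submits all requests to a vLLM OpenAI-compatible server concurrently instead of one at a time, so the server's continuous batching can keep the GPUs saturated. The endpoint URL, model name, prompts, and sampling parameters are placeholder assumptions.

```python
# Illustrative sketch, assuming a vLLM server exposing the
# OpenAI-compatible API at localhost:8000. vLLM batches concurrent
# requests server-side, so the client just needs to issue them in parallel.
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def generate(prompt: str) -> str:
    # One blocking request; many of these run concurrently below.
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return response.choices[0].message.content

prompts = ["What is 2 + 2?", "Name a sorting algorithm."]  # placeholder inputs

# Fire all requests at once rather than sequentially; pool.map returns
# results in input order, which keeps bookkeeping simple.
with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    results = list(pool.map(generate, prompts))

for prompt, result in zip(prompts, results):
    print(prompt, "->", result)
```

Sending requests this way means total wall-clock time is bounded by the slowest request plus server scheduling, rather than the sum of all request latencies.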
1 parent 2abbfd6 · commit d9c0835
Showing 2 changed files with 50 additions and 20 deletions.