forked from ShishirPatil/gorilla
[BFCL] Fix Hanging Inference for OSS Models on GPU Platforms (ShishirPatil#663)

This PR addresses issues encountered when running locally-hosted models on GPU-renting platforms (e.g., Lambda Cloud). Specifically, output from `vllm` was not displayed correctly because these models are launched via subprocesses. Additionally, some multi-turn functions (such as `xargs`) rely on subprocesses themselves, which caused inference on certain test entries (such as `multi_turn_36`) to hang indefinitely and halt the pipeline. To fix this, the terminal logging logic now uses a separate thread to read from the subprocess pipe and print to the terminal. Also, for readability, the `_format_prompt` function has been moved to the `Prompting methods` section; this does not change the leaderboard score.
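The threaded logging approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual BFCL code: the launch command and the helper name `_stream_subprocess_output` are placeholders.

```python
import subprocess
import sys
import threading


def _stream_subprocess_output(pipe):
    """Continuously read lines from the subprocess pipe and echo them to the terminal."""
    for line in iter(pipe.readline, ""):
        sys.stdout.write(line)
        sys.stdout.flush()
    pipe.close()


# Launch the locally-hosted model server (command shown is illustrative only).
process = subprocess.Popen(
    ["vllm", "serve", "my-model"],  # placeholder command
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
)

# A daemon thread drains the pipe so its buffer never fills up and blocks the
# server, while the main process stays free to run inference (including
# multi-turn functions that spawn their own subprocesses).
log_thread = threading.Thread(
    target=_stream_subprocess_output,
    args=(process.stdout,),
    daemon=True,
)
log_thread.start()
```

Without a dedicated reader, a full pipe buffer can stall the subprocess and, in turn, the inference loop waiting on it, which is the hanging behavior this commit targets.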
1 parent d9c0835 · commit e110fbc
Showing 1 changed file with 38 additions and 10 deletions.