
[BFCL] Patch Generation Script for Locally Hosted OSS model #537

Merged
13 commits merged into ShishirPatil:main on Jul 24, 2024

Conversation

HuanzhiMao
Collaborator

This PR updates the `inference` method in the `oss_handler` to improve the generation task for locally hosted models. Previously, we used Ray for multi-node, single-GPU inference (pipeline parallel), which is uncommon since most setups are a single machine with multiple GPUs. That approach also led to out-of-memory errors for some large models.

In this PR:

  • Ray is removed and replaced with vLLM's built-in single-node multi-GPU inference method (tensor parallel); see the sketch below.
  • The `@torch.inference_mode()` decorator has been removed.
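
For reference, a minimal sketch of the vLLM tensor-parallel pattern the updated handler relies on; the model name, prompt, and sampling parameters below are illustrative placeholders, not the values used in `oss_handler`:

```python
# Minimal sketch (not the actual oss_handler code): single-node multi-GPU
# generation with vLLM's tensor parallelism.
from vllm import LLM, SamplingParams

# tensor_parallel_size shards each layer's weights across the GPUs of one
# machine (tensor parallel), replacing the previous Ray-based multi-node,
# single-GPU setup (pipeline parallel).
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model name
    tensor_parallel_size=4,                       # number of local GPUs
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=512)

prompts = ["What functions would you call to get the weather in Berkeley?"]  # placeholder

# vLLM runs its forward passes without gradient tracking internally, so no
# @torch.inference_mode() wrapper is needed around this call.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```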

@HuanzhiMao HuanzhiMao marked this pull request as ready for review July 21, 2024 04:56
HuanzhiMao added a commit to HuanzhiMao/gorilla that referenced this pull request Jul 21, 2024
Squashed commit of the following:

commit e65a108
Author: Huanzhi Mao <[email protected]>
Date:   Sat Jul 20 21:50:26 2024 -0700

    update README

commit 8034aed
Author: Huanzhi Mao <[email protected]>
Date:   Sat Jul 20 17:44:50 2024 -0700

    refactor glm_handler to simplify logic and apply fix

commit 83912f0
Author: Huanzhi Mao <[email protected]>
Date:   Sat Jul 20 17:31:33 2024 -0700

    polish process_input section

commit 7d08daf
Author: Huanzhi Mao <[email protected]>
Date:   Sat Jul 20 15:46:06 2024 -0700

    simplify _batch_generate logic; separate out process_input section

commit c5ac395
Author: Huanzhi Mao <[email protected]>
Date:   Sat Jul 20 15:27:42 2024 -0700

    remove outdated gemma model name

commit b59af2c
Author: Huanzhi Mao <[email protected]>
Date:   Sat Jul 20 14:32:23 2024 -0700

    revert, as vllm still requires ray

commit 7a275d7
Author: Huanzhi Mao <[email protected]>
Date:   Sat Jul 20 14:27:44 2024 -0700

    remove ray from requirements.txt

commit 0d1c478
Merge: 32c1ad4 7b230df
Author: Huanzhi (Hans) Mao <[email protected]>
Date:   Sat Jul 20 00:01:25 2024 -0700

    Merge branch 'main' into main

commit 32c1ad4
Author: Huanzhi Mao <[email protected]>
Date:   Fri Jul 19 23:36:42 2024 -0700

    remove ray; use vllm tensor_parallel_size

commit 5ff790e
Author: Huanzhi Mao <[email protected]>
Date:   Fri Jul 19 21:21:08 2024 -0700

    remove torch inference_mode
@HuanzhiMao
Collaborator Author

Sorry, this PR was accidentally closed.

@HuanzhiMao HuanzhiMao reopened this Jul 22, 2024
Collaborator

@CharlieJCJ CharlieJCJ left a comment


LGTM

@ShishirPatil ShishirPatil merged commit c0637bb into ShishirPatil:main Jul 24, 2024
aw632 pushed a commit to vinaybagade/gorilla that referenced this pull request Aug 22, 2024