
[Feature] add disable-custom-all-reduce #1148

Merged: 4 commits merged into sgl-project:main from add-disable_custom_all_reduce on Aug 20, 2024

Conversation

Xu-Chen (Contributor) commented Aug 19, 2024

Motivation

Sometimes we need to turn off custom all-reduce, especially on A800 GPUs with tensor parallelism, to avoid timeout problems caused by NCCL communication. The error looks like vllm-project/vllm#6614. Custom all-reduce may be the cause (this is not confirmed), but after setting disable-custom-all-reduce the problem no longer occurs.
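As a minimal sketch (assuming the standard sglang launch entry point; the model path and `--tp 4` are illustrative placeholders for a 4× A800 node):

```bash
# Launch with the flag added by this PR; model path and --tp value are placeholders.
python3 -m sglang.launch_server \
    --model-path meta-llama/Meta-Llama-3-8B-Instruct \
    --tp 4 \
    --disable-custom-all-reduce
```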

Modification

Checklist

  • Before submitting a PR for review, make sure it has at least passed verification in your local development environment.
  • Ensure `pre-commit run --all-files` or other linting tools are used to fix potential lint issues.
  • Confirm that modifications are covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • Modify documentation as needed, such as docstrings or example tutorials.


Xu-Chen force-pushed the add-disable_custom_all_reduce branch from c0d9374 to 5e065b3 on August 19, 2024 at 10:24
zhyncs (Member) commented Aug 19, 2024

@Xu-Chen Could you run `python3 -m sglang.check_env` for the A800 environment?

zhyncs self-assigned this on Aug 19, 2024
Xu-Chen (Contributor, Author) commented Aug 19, 2024

> @Xu-Chen Could you run `python3 -m sglang.check_env` for the A800 environment?

4× A800:

Python: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA A800-SXM4-80GB
GPU 0,1,2,3 Compute Capability: 8.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.1, V12.1.105
CUDA Driver Version: 525.125.06
PyTorch: 2.4.0+cu121
sglang: 0.2.12
flashinfer: 0.1.4+cu121torch2.4
triton: 3.0.0
transformers: 4.44.0
requests: 2.32.3
tqdm: 4.66.5
numpy: 1.26.4
aiohttp: 3.10.3
fastapi: 0.112.0
hf_transfer: Module Not Found
huggingface_hub: 0.24.5
interegular: 0.3.3
packaging: 24.1
PIL: 10.4.0
psutil: 6.0.0
pydantic: 2.8.2
uvicorn: 0.30.5
uvloop: 0.19.0
zmq: 26.1.0
vllm: 0.5.4
multipart: 0.0.9
openai: 1.40.6
anthropic: Module Not Found
litellm: Module Not Found
NVIDIA Topology:
	GPU0	GPU1	GPU2	GPU3	NIC0	NIC1	NIC2	NIC3	NIC4	NIC5	NIC6	NIC7	NIC8	CPU Affinity	NUMA Affinity
GPU0	 X 	NV8	NV8	NV8	NODE	PXB	PXB	NODE	NODE	SYS	SYS	SYS	SYS	0-31,64-95	0
GPU1	NV8	 X 	NV8	NV8	SYS	SYS	SYS	SYS	SYS	PXB	PXB	NODE	NODE	32-63,96-127	1
GPU2	NV8	NV8	 X 	NV8	SYS	SYS	SYS	SYS	SYS	NODE	NODE	PXB	PXB	32-63,96-127	1
GPU3	NV8	NV8	NV8	 X 	SYS	SYS	SYS	SYS	SYS	NODE	NODE	PXB	PXB	32-63,96-127	1
NIC0	NODE	SYS	SYS	SYS	 X 	NODE	NODE	NODE	NODE	SYS	SYS	SYS	SYS
NIC1	PXB	SYS	SYS	SYS	NODE	 X 	PIX	NODE	NODE	SYS	SYS	SYS	SYS
NIC2	PXB	SYS	SYS	SYS	NODE	PIX	 X 	NODE	NODE	SYS	SYS	SYS	SYS
NIC3	NODE	SYS	SYS	SYS	NODE	NODE	NODE	 X 	PIX	SYS	SYS	SYS	SYS
NIC4	NODE	SYS	SYS	SYS	NODE	NODE	NODE	PIX	 X 	SYS	SYS	SYS	SYS
NIC5	SYS	PXB	NODE	NODE	SYS	SYS	SYS	SYS	SYS	 X 	PIX	NODE	NODE
NIC6	SYS	PXB	NODE	NODE	SYS	SYS	SYS	SYS	SYS	PIX	 X 	NODE	NODE
NIC7	SYS	NODE	PXB	PXB	SYS	SYS	SYS	SYS	SYS	NODE	NODE	 X 	PIX
NIC8	SYS	NODE	PXB	PXB	SYS	SYS	SYS	SYS	SYS	NODE	NODE	PIX	 X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
  NIC2: mlx5_2
  NIC3: mlx5_3
  NIC4: mlx5_4
  NIC5: mlx5_5
  NIC6: mlx5_6
  NIC7: mlx5_7
  NIC8: mlx5_8


ulimit soft: 1024

Ying1123 previously approved these changes on Aug 20, 2024
zhyncs (Member) commented Aug 20, 2024

@Xu-Chen Could you try using `--enable-p2p-check` instead of `--disable-custom-all-reduce` in your environment? Does it work?
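For reference, a sketch of that alternative under the same illustrative assumptions as above:

```bash
# Keep custom all-reduce enabled but add the peer-to-peer access check instead.
python3 -m sglang.launch_server \
    --model-path meta-llama/Meta-Llama-3-8B-Instruct \
    --tp 4 \
    --enable-p2p-check
```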

Xu-Chen (Contributor, Author) commented Aug 20, 2024

> @Xu-Chen Could you try using `--enable-p2p-check` instead of `--disable-custom-all-reduce` in your environment? Does it work?

@zhyncs Unfortunately, this does not solve the problem; timeout problems still occur when there are too many requests.

merrymercy merged commit ff2cfdb into sgl-project:main on Aug 20, 2024 (1 of 5 checks passed)
merrymercy (Contributor) commented

@Xu-Chen Thanks for the contribution. It is merged.

Xu-Chen deleted the add-disable_custom_all_reduce branch on August 20, 2024 at 15:52
m0g1cian commented

@Xu-Chen I am wondering what the performance drop is after disabling custom_all_reduce?

Xu-Chen (Contributor, Author) commented Sep 20, 2024

> @Xu-Chen I am wondering what the performance drop is after disabling custom_all_reduce?

About 5% to 10%.
