A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

deepseek-ai/DualPipe

DualPipe

DualPipe is an innovative bidirectional pipeline parallelism algorithm introduced in the DeepSeek-V3 Technical Report. It achieves full overlap of the forward and backward computation-communication phases and also reduces pipeline bubbles. For detailed information on computation-communication overlap, please refer to the profile data.

Schedules

[Figure: DualPipe schedule]

Example DualPipe scheduling for 8 PP ranks and 20 micro-batches in two directions. The micro-batches in the reverse direction are symmetric to those in the forward direction, so their batch IDs are omitted for simplicity. Two cells enclosed by a shared black border have mutually overlapped computation and communication.

DualPipeV

DualPipeV is a concise V-shape schedule derived from DualPipe using a "cut-in-half" procedure, which Sea AI Lab introduced in their blog post. Thanks to them for this efficient schedule!

Schedules

[Figure: DualPipeV schedule]

Example DualPipeV scheduling for 4 PP ranks (8 PP stages) and 10 micro-batches.
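The relation between 4 PP ranks and 8 PP stages in the caption above can be illustrated with a small sketch. It assumes the usual V-shaped placement, where stages run down the ranks and then back up, so that rank r holds stage r and stage 2 * num_ranks - 1 - r. The function name `dualpipev_stage_placement` is hypothetical and is not part of this repository's API.

```python
def dualpipev_stage_placement(num_ranks: int) -> dict[int, tuple[int, int]]:
    """Map each PP rank to the two stages it holds in a V-shaped layout.

    Assumption (not taken from the library): stages go down the ranks and
    then back up, so rank r holds stage r and stage 2 * num_ranks - 1 - r.
    """
    if num_ranks < 1:
        raise ValueError("num_ranks must be positive")
    num_stages = 2 * num_ranks
    return {rank: (rank, num_stages - 1 - rank) for rank in range(num_ranks)}


# 4 PP ranks hold 8 PP stages, two per rank.
placement = dualpipev_stage_placement(4)
for rank, (down, up) in placement.items():
    print(f"rank {rank}: stages {down} and {up}")
```

Under this assumption the last rank holds two adjacent stages (3 and 4 here), which is the bottom of the "V".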

Pipeline Bubbles and Memory Usage Comparison (based on the same number of PP stages)

| Method | Bubble | Parameter Per Device | Activation Per Device | #Devices |
|---|---|---|---|---|
| 1F1B | (PP-1)(𝐹+𝐵) | 1× | PP | PP |
| ZB1P | (PP-1)(𝐹+𝐵-2𝑊) | 1× | PP | PP |
| DualPipe | (PP/2-1)(𝐹&𝐵+𝐵-3𝑊) | 2× | PP+1 | PP |
| DualPipeV | (PP/2-1)(𝐹&𝐵+𝐵-3𝑊) | 2× | PP+1 | PP/2 |

PP denotes the number of PP stages, which must be even. 𝐹 denotes the execution time of a forward chunk, 𝐵 denotes the execution time of a full backward chunk, 𝑊 denotes the execution time of a "backward for weights" chunk, and 𝐹&𝐵 denotes the execution time of two mutually overlapped forward and backward chunks.
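To get a feel for the bubble formulas, the sketch below evaluates them for one set of chunk timings. The function name `bubble_times` and the timing values are hypothetical and chosen only for illustration; they are not measurements from DeepSeek-V3 training.

```python
def bubble_times(pp: int, f: float, b: float, w: float, f_and_b: float) -> dict[str, float]:
    """Evaluate the bubble formulas from the comparison table.

    pp      : number of PP stages (must be even)
    f       : time of a forward chunk (F)
    b       : time of a full backward chunk (B)
    w       : time of a "backward for weights" chunk (W)
    f_and_b : time of two mutually overlapped forward and backward chunks (F&B)
    """
    if pp < 2 or pp % 2 != 0:
        raise ValueError("pp must be an even number of stages, at least 2")
    # DualPipe and DualPipeV share the same bubble formula.
    dual = (pp // 2 - 1) * (f_and_b + b - 3 * w)
    return {
        "1F1B": (pp - 1) * (f + b),
        "ZB1P": (pp - 1) * (f + b - 2 * w),
        "DualPipe": dual,
        "DualPipeV": dual,
    }


# Hypothetical timings in arbitrary units: F = 1, B = 2, W = 1, F&B = 3.
for method, bubble in bubble_times(pp=8, f=1.0, b=2.0, w=1.0, f_and_b=3.0).items():
    print(f"{method}: {bubble}")
```

With these assumed timings and 8 PP stages, the formulas give 21 for 1F1B, 7 for ZB1P, and 6 for DualPipe and DualPipeV. The actual ratio depends on how 𝐹, 𝐵, 𝑊, and 𝐹&𝐵 relate in a given model.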

Quick Start

The usage is shown in the following example:

python examples/example_dualpipe.py
python examples/example_dualpipev.py

Note: For real-world applications, you will need to implement a custom overlapped_forward_backward method tailored to your specific module.

Requirements

  • PyTorch 2.0 and above

Developers

DualPipe was created and developed by Jiashi Li, Chengqi Deng, and Wenfeng Liang.

Citation

@misc{deepseekai2024deepseekv3technicalreport,
      title={DeepSeek-V3 Technical Report}, 
      author={DeepSeek-AI},
      year={2024},
      eprint={2412.19437},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.19437}, 
}
