[Perf] Skip creating attention mask in llama dataloader #40

Open · wants to merge 1 commit into base: rocm_dev
Conversation

billishyahao

This patch skips creating the attention mask in the llama dataloader by adding the flag --no-create-attention-mask-in-dataloader. With this patch, we see the following benefits (a rough sketch of the idea is shown after the list):

  1. It brings a 4%~6% performance gain.
  2. It also addresses an observed dataloader crash when dealing with long sequences, e.g. [BUG] Long context training using context-parallel hangs/crashes NVIDIA/Megatron-LM#1025.
  3. Newer Megatron model examples also adopt this flag, e.g. https://github.com/NVIDIA/Megatron-LM/blob/40db706d37a25787b0fb6b7b561327e5d2b4b2e4/examples/mamba/train.sh#L102
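
Roughly, the flag avoids materializing the full [seq_len, seq_len] causal mask for every batch in the dataloader and instead relies on the attention backend (e.g. fused or flash-attention kernels) to apply causal masking implicitly. Below is a minimal sketch of the idea, assuming a simplified batch-building helper; `build_llama_batch` and its arguments are illustrative and not the actual Megatron-LM code:

```python
# Illustrative sketch only -- not the actual Megatron-LM dataloader code.
# build_llama_batch and its arguments are made up to show the idea behind
# --no-create-attention-mask-in-dataloader.
import torch


def build_llama_batch(tokens: torch.Tensor,
                      eod_token: int,
                      create_attention_mask: bool = True):
    """Build per-batch tensors for causal LM training.

    tokens: [batch, seq_len] int64 token ids.
    """
    batch_size, seq_len = tokens.shape

    # Labels are the inputs shifted left by one position.
    labels = torch.roll(tokens, shifts=-1, dims=1)

    # Do not compute loss on end-of-document tokens.
    loss_mask = torch.ones_like(tokens, dtype=torch.float)
    loss_mask[tokens == eod_token] = 0.0

    position_ids = torch.arange(seq_len, device=tokens.device).expand(batch_size, -1)

    attention_mask = None
    if create_attention_mask:
        # Materializing this [1, 1, seq_len, seq_len] lower-triangular mask
        # costs O(seq_len^2) memory and time per batch; with
        # --no-create-attention-mask-in-dataloader it is skipped and the
        # attention kernel applies causal masking itself.
        attention_mask = torch.tril(
            torch.ones((1, 1, seq_len, seq_len),
                       device=tokens.device, dtype=torch.bool))

    return tokens, labels, loss_mask, attention_mask, position_ids
```

For long-sequence training the quadratic mask is the dominant per-batch allocation in the dataloader, which is consistent with both the reported speedup and the long-context crash being avoided when it is skipped.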

@wenchenvincent wenchenvincent requested a review from lizamd January 22, 2025 20:32
@wenchenvincent
Collaborator

@lizamd Are you aware of this change? Do we use this setting for testing?

@wenchenvincent
Collaborator

@billishyahao Could you give some more details on the behavior of --no-create-attention-mask-in-dataloader? For example, if the attention mask is not created in the dataloader, where is it created?

@lizamd

lizamd commented Jan 24, 2025

@billishyahao Could you provide more data on the 4-5% perf gain and address @wenchenvincent's question? We can have a call too.
