Question about integration with DeepSpeed-Ulysses #679

Closed
zigzagcai opened this issue Nov 15, 2024 · 2 comments
Labels: question (Further information is requested)

Comments

@zigzagcai

Hi developers,

Thanks for such a great project demonstrating the power of newly released features in torch.

I want to run the llama2 model with a 128k sequence length. How can I enable that? I have some experience with DeepSpeed-Ulysses, so my question is: does torchtitan support DeepSpeed-Ulysses-style sequence parallelism?

Thanks!


gnadathur (Contributor) commented Nov 15, 2024

@zigzagcai torchtitan supports sequence and context parallelism for ultra-long sequence lengths. cc @XilunWu, could you share the details on the config?

Oh, I misread this comment: torchtitan supports native FSDP, and there is no DeepSpeed integration.


XilunWu (Contributor) commented Nov 15, 2024

Sequence parallelism should be enabled by default if TP is used.

To enable CP, set context_parallel_degree (see the PR #592 description for details and examples).
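
For concreteness, here is a minimal sketch of where those degrees might live in a torchtitan TOML training config. The section and field names here are assumptions, not confirmed by this thread; treat the PR #592 description and the train_configs shipped with your torchtitan version as the authoritative reference.

```toml
# Hypothetical excerpt from a torchtitan training config.
# Section/field placement is an assumption; verify against PR #592
# and the train_configs bundled with your torchtitan version.

[training]
# TP degree; per the comment above, sequence parallelism
# comes along with TP by default.
tensor_parallel_degree = 2

[experimental]
# CP degree: shards the sequence dimension across this many ranks,
# which is what makes ultra-long (e.g. 128k) sequences feasible.
context_parallel_degree = 4
```

If your launch script supports dotted command-line overrides (again an assumption), the same knob would look something like `--experimental.context_parallel_degree 4`.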

@tianyu-l added the question label on Nov 18, 2024