
Add APIs to detach/attach model to sdd pipeline #2076

Closed

Conversation

@sarckk (Member) commented Jun 5, 2024

Summary

Adds two new user-facing APIs to TrainPipelineSparseDist:

  • detach() -> torch.nn.Module. Detaches the original model so it can be used outside the current train pipeline.
  • attach(model: Optional[torch.nn.Module] = None) -> None. Attaches a model to the pipeline (i.e. overrides the TorchRec module forwards and input dist forwards). If no model is specified, re-attaches the original model.

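The detach/attach mechanics can be illustrated with a minimal pure-Python sketch. Pipeline and TinyModel below are hypothetical stand-ins for illustration only, not the torchrec TrainPipelineSparseDist implementation; the point is the forward-swap pattern: attach() saves the model's original forward and replaces it with a pipelined wrapper, and detach() restores the original so the model is usable standalone.

```python
from typing import Callable, Optional


class TinyModel:
    """Hypothetical stand-in for a (sharded) model."""

    def forward(self, x: int) -> int:
        return x * 2


class Pipeline:
    """Hypothetical stand-in for TrainPipelineSparseDist, showing only
    the attach/detach forward-swapping pattern."""

    def __init__(self, model: TinyModel) -> None:
        self._model = model
        self._original_forward: Optional[Callable] = None
        self.attach(model)

    def attach(self, model: Optional[TinyModel] = None) -> None:
        # If no model is given, re-attach the one we already hold.
        if model is not None:
            self._model = model
        # Save the unpipelined forward so detach() can restore it.
        self._original_forward = self._model.forward

        def pipelined_forward(x: int) -> int:
            # A real pipeline would overlap input dist / compute here;
            # this sketch just delegates to the saved forward.
            return self._original_forward(x)

        # Instance attribute shadows the class method.
        self._model.forward = pipelined_forward

    def detach(self) -> TinyModel:
        # Restore the original forward and hand the model back.
        self._model.forward = self._original_forward
        return self._model
```

With this sketch, a model attached to the pipeline runs through the pipelined forward; after detach() it behaves exactly as before attachment, so it can be evaluated outside the pipeline and re-attached later with attach().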
Sample use cases:

  • Bulk eval on TorchRec sharded modules (e.g. ShardedEBC) during or after pipelined training. Currently this causes issues because the model's forward is swapped with the pipelined forward call.
  • Train on one pipeline (e.g. full-sync SDD), then swap to another (e.g. semi-sync).
  • Swap out the model during training by calling attach() with another model (no current use case, but it is supported).

Differential Revision: D57882281

@facebook-github-bot added the CLA Signed label Jun 5, 2024
@facebook-github-bot (Contributor) commented:
This pull request was exported from Phabricator. Differential Revision: D57882281
