Microbatch Device Movement #3567

Merged: 10 commits merged on Aug 28, 2024


mvpatel2000 (Contributor) commented Aug 21, 2024

What does this PR do?

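Instead of moving the entire batch to the device at once, we now move each microbatch to the device individually. This reduces memory usage for large inputs (e.g., multimodal data) when training with many microbatches, because the full batch no longer has to reside on the device at the same time.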
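The effect on peak device memory can be illustrated with a toy simulation. This is a minimal sketch, not Composer's trainer code. `FakeDevice`, `split_microbatches`, and the two `step_*` functions are hypothetical, sample sizes are plain byte counts, and the sketch assumes a microbatch's memory is released as soon as it has been processed (a real accelerator allocator may cache freed memory).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeDevice:
    """Hypothetical stand-in for an accelerator that only tracks memory use."""
    resident_bytes: int = 0
    peak_bytes: int = 0

    def move(self, nbytes: int) -> None:
        # Simulate copying `nbytes` of data onto the device and record the peak.
        self.resident_bytes += nbytes
        self.peak_bytes = max(self.peak_bytes, self.resident_bytes)

    def free(self, nbytes: int) -> None:
        # Simulate releasing device memory once the data is no longer needed.
        self.resident_bytes -= nbytes


def split_microbatches(batch: List[int], microbatch_size: int) -> List[List[int]]:
    # Each element of `batch` is the size in bytes of one sample.
    return [batch[i:i + microbatch_size] for i in range(0, len(batch), microbatch_size)]


def step_move_whole_batch(device: FakeDevice, batch: List[int], microbatch_size: int) -> int:
    # Previous behaviour: move the full batch first, then split it into microbatches.
    device.move(sum(batch))
    for mb in split_microbatches(batch, microbatch_size):
        pass  # forward/backward on `mb` would run here
    device.free(sum(batch))
    return device.peak_bytes


def step_move_per_microbatch(device: FakeDevice, batch: List[int], microbatch_size: int) -> int:
    # New behaviour: split on the host and move one microbatch at a time.
    for mb in split_microbatches(batch, microbatch_size):
        device.move(sum(mb))
        # forward/backward on `mb` would run here
        device.free(sum(mb))
    return device.peak_bytes
```

With eight 100-byte samples and a microbatch size of 2, moving the whole batch peaks at 800 bytes, while moving each microbatch peaks at 200 bytes. The saving grows with the number of microbatches per batch.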

This change may affect callbacks that operate on the batch and expect it to already be on the accelerator, such as the two updated in this PR. We expect few callbacks to depend on this, so the change should be relatively safe.
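For a callback that previously assumed the batch was already on the accelerator, one option is for the callback to move the data it needs itself. The sketch below is hypothetical: `FakeTensor`, `move_batch_to_device`, `ImageLoggingCallback`, and the `on_batch_start` hook are illustrative stand-ins rather than Composer's actual callback API, and devices are plain strings.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical tensor stand-in that only records which device it lives on."""
    name: str
    device: str = "cpu"

    def to(self, device: str) -> "FakeTensor":
        # Return a copy on the target device, like a tensor's `.to(device)`.
        return FakeTensor(self.name, device)


def move_batch_to_device(batch: Any, device: str) -> Any:
    # Recursively move every tensor in a nested batch (dicts, lists, tuples).
    if isinstance(batch, FakeTensor):
        return batch.to(device)
    if isinstance(batch, dict):
        return {k: move_batch_to_device(v, device) for k, v in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(move_batch_to_device(v, device) for v in batch)
    return batch  # non-tensor leaves (ints, strings) are left untouched


class ImageLoggingCallback:
    """Hypothetical callback that used to assume the batch was already on the accelerator."""

    def __init__(self, device: str):
        self.device = device

    def on_batch_start(self, batch: Any) -> Any:
        # With per-microbatch movement the full batch may still be on the host here,
        # so the callback moves a copy to the device before operating on it.
        return move_batch_to_device(batch, self.device)
```

Because the callback works on a moved copy, the original host-side batch is left unchanged for the trainer to split into microbatches.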

Manual test showing the same loss curves:

[Image: loss curves]

mvpatel2000 marked this pull request as ready for review August 21, 2024 15:55
mvpatel2000 requested a review from a team as a code owner August 21, 2024 15:55
mvpatel2000 requested a review from dakinggg August 21, 2024 15:55
dakinggg previously approved these changes Aug 22, 2024
mvpatel2000 dismissed dakinggg’s stale review August 23, 2024 18:28

Dismissing while I verify that there is no regression on vision workloads.

mvpatel2000 merged commit 62b70f3 into mosaicml:main Aug 28, 2024
14 checks passed
mvpatel2000 deleted the mvpatel2000/microbatch-load branch August 28, 2024 15:34

2 participants