fix skip_first for resumption #2986

Merged
merged 6 commits into from
Feb 9, 2024

@bigning bigning commented Feb 9, 2024

What does this PR do?

Currently, the profiler's skip_first argument counts batches from the beginning of training rather than from the start of the current resumption. This PR makes skip_first count from the batch at which the most recent resumption started, so the first skip_first batches after each resume are skipped.
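
To illustrate the difference, here is a minimal standalone sketch. It does not use Composer's actual classes: `Action`, `action_old`, `action_new`, and `resumption_batch_idx` are hypothetical names chosen for this example, and the real schedule also handles wait, warmup, and active phases that are omitted here.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical stand-in for the profiler's per-batch decision.
    SKIP = "skip"
    ACTIVE = "active"


def action_old(batch_idx: int, skip_first: int) -> Action:
    """Old behavior: compare the global batch index against skip_first."""
    return Action.SKIP if batch_idx < skip_first else Action.ACTIVE


def action_new(batch_idx: int, skip_first: int, resumption_batch_idx: int) -> Action:
    """New behavior: count batches from where the current run resumed."""
    batches_since_resumption = batch_idx - resumption_batch_idx
    return Action.SKIP if batches_since_resumption < skip_first else Action.ACTIVE


# Assumed scenario: resume from a checkpoint taken at batch 100, skip_first=5.
resumed_at = 100
batches = range(resumed_at, resumed_at + 7)
old = [action_old(b, 5) for b in batches]
new = [action_new(b, 5, resumed_at) for b in batches]
```

In this scenario the old logic never skips anything after the resume, because the global batch index is already past skip_first, while the new logic skips the first five batches of the resumed run.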

Test

pytest tests/profiler/test_profiler.py -k test_skip_first_after_resumption

@bigning bigning requested a review from a team as a code owner February 9, 2024 05:12
@bigning bigning requested review from mvpatel2000 and cli99 February 9, 2024 05:13

@mvpatel2000 mvpatel2000 left a comment

LGTM! Nice test.

As a side note, the profiler is one of the earliest things we designed and is frankly poorly done. We really should have made it a callback directly (eg CheckpointSaver) and I don't really like the orchestrator profiler class...

composer/profiler/torch_profiler.py (outdated, resolved)
composer/profiler/torch_profiler.py (outdated, resolved)

mvpatel2000 commented Feb 9, 2024

Btw, you can run lint locally with: pre-commit run --all-files

@bigning bigning requested a review from mvpatel2000 February 9, 2024 18:34
@bigning bigning enabled auto-merge (squash) February 9, 2024 22:43
@bigning bigning merged commit b310a9b into dev Feb 9, 2024
14 checks passed
@bigning bigning deleted the profiler_schedule_skip_first branch February 9, 2024 22:43
bigning added a commit that referenced this pull request Feb 9, 2024
mvpatel2000 pushed a commit that referenced this pull request Feb 9, 2024