-
Notifications
You must be signed in to change notification settings - Fork 430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix skip_first for resumption #2986
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Nice test.
As a side note, the profiler is one of the earliest things we designed and is frankly poorly done. We really should have made it a callback directly (eg CheckpointSaver) and I don't really like the orchestrator profiler class...
Btw, you can run lint with |
…mposer into profiler_schedule_skip_first
What does this PR do?
currently the profiler’s skip_first argument counts batches from the beginning of training and not from the start of the current resumption. This PR make the skip_first count from the start of the last resumption.
test
pytest tests/profiler/test_profiler.py -k test_skip_first_after_resumption